HasGP-0.1: Gaussian processes in Haskell

HasGP.Parsers.SvmLight

A parser for files in SvmLight format, together with a bunch of basic functions for extracting interesting things from the output of the parser in a format that's likely to be a bit easier to use.

Classes are denoted by +1, -1 or 0. Targets either correspond to classes or, for regression problems, to doubles; feature values are doubles. We may or may not want to automatically convert an integer, depending on the circumstances. In particular, targets for regression problems can be doubles, so we don't want to read a class as a double.

A target is -1, +1, 0 or a floating point number; the parser pairs it with a Boolean that is True for one of the first three and False for the fourth. A look at Joachims' examples suggests the + is optional.

We need integers to number the features, although a feature in this format can in fact be either numbered or labelled as qid. Most of a line is taken up with :-separated features and their values.

Comments require some care, since we need to be able to read pretty much anything we like from them. The format describes comments in two ways: at the start of the file they are ignored, but at the end of a line a # followed by text needs to be read. I assume you only take the string up to the end of the line.

line
  Reads a single line denoting a single example.

file
  Reads a file, ignoring comments at the beginning.

sortExamples
  Sorts numbers and attributes at the same time, so that the feature numbers are ascending.

getExamples
  Gets the attribute vectors as a list of lists. Care is required here, as we need to insert 0 where there is no attribute.

dimensionsCorrect
  Does a matrix of Doubles make sense: that is, are all the rows the same length?

dimensions
  Finds the dimensions of a matrix represented as a list of lists of Doubles.

analyse
  Parses a file in SvmLight format and prints some information about it.

getMatrixExamplesFromFile
  Reads examples from a file in SvmLight format and produces a corresponding matrix and vector, for a classification problem. Includes checks that all examples have the same number of attributes, and that the file does in fact correspond to a classification problem.

HasGP.Support.Iterate

iterateOnce
  Takes a function to update a state and another to compute a value associated with a given state. Returns a state transformer performing the corresponding update - that is, one iteration.

iterateToConvergence
  Takes a state transformer, typically generated using iterateOnce; a convergence test that compares two values associated with the current and next states, returning True if we've converged; and an initial value. Returns a state transformer that performs iteration until convergence. When run from an initial state it returns the state at convergence and the corresponding value.

iterateToConvergence'
  The same as iterateToConvergence, but takes the state update and state value functions directly, so the resulting state transformer only requires a start state to be run.

iterateToConvergence''
  The same as iterateToConvergence, but does one update to obtain an initial value and continues from there. Consequently, no initial value is required, but you do one extra update.

HasGP.Types.MainTypes

Type synonyms (DVector, DMatrix, Input, Inputs, Outputs, Targets, CovarianceMatrix) defined to make functions more readable.

HasGP.Support.Functions

delta
  Standard delta function - 0/1 valued.

deltaBool
  Standard delta function - Boolean valued.

generalSigmoid
  General sigmoid function with variable slope.

sigmoid
  Standard sigmoid function.

phiIntegral
  Integral of the Gaussian density of mean 0 and variance 1 from -infinity to x.

n
  Value of the Gaussian density function for mean 0 and variance 1.
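For concreteness, here is a hedged sketch of how the last few of these might be defined. The bodies below are assumed reconstructions rather than HasGP's source; erf is taken from the erf package's Data.Number.Erf.

-- Assumed definitions, not HasGP's actual code.
import Data.Number.Erf (erf)

generalSigmoid :: Double -> Double -> Double
generalSigmoid c x = 1 / (1 + exp (negate (c * x)))

sigmoid :: Double -> Double
sigmoid = generalSigmoid 1

-- Standard Gaussian density at x.
n :: Double -> Double
n x = exp (negate (x * x) / 2) / sqrt (2 * pi)

-- Integral of the standard Gaussian density from -infinity to x,
-- using Phi(x) = (1 + erf (x / sqrt 2)) / 2.
phiIntegral :: Double -> Double
phiIntegral x = (1 + erf (x / sqrt 2)) / 2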
nOverPhi
  DANGER! You can't compute the ratio (n x) / (phiIntegral x) directly: although it has sensible values for negative x, the denominator gets small so fast that you quickly get Infinity turning up. GSL has the inverse Mills' ratio (the hazard function for the Gaussian distribution), and the ratio is equal to hazard(-x).

logPhi
  DANGER! See nOverPhi - you have to compute this carefully as well.

HasGP.Support.MatrixFunction

makeMatrixFromFunction
  Takes two vectors and a function. The vectors contain inputs 1 and 2; the function maps a pair of inputs to a value. Produces a matrix containing the values of the function at the relevant points.

makeMatrixFromPairs
  Takes a function and a matrix of instance vectors. Applies the function to each possible pair of instance vectors and returns the result as a matrix.

makeMatricesFromPairs
  The same as makeMatrixFromPairs, but the function returns a vector. In this case the output is a list of matrices, one for each element of the function value.

HasGP.Support.Linear

sumVector
  Sums the elements in a vector.

sumVectorDiv
  Sum of the elements in a vector, divided by an Int.

lengthV
  Length of a vector.

toVector
  Generates a vector equal to the first column of a matrix.

replaceInVector
  Replaces the element at a specified position in a vector v. NOTE: hmatrix numbers from 0, which is odd; this numbers from 1. The result is returned by overwriting v. This is implemented via runSTVector because the increase in efficiency is HUGE.

preMultiply
  Efficiently pre-multiplies by a diagonal matrix (passed as a vector).

postMultiply
  Efficiently post-multiplies by a diagonal matrix (passed as a vector).

xAxDiag
  Computes x^T A x when A is diagonal. The second argument is the diagonal of A.

abDiagOnly
  Computes only the diagonal of the product of two square matrices.

abaDiagDiag
  Computes ABA where A is diagonal. The first argument is the diagonal of A.

abaVV
  Computes aBa where a is a vector and B is a matrix.

HasGP.Support.Solve

It's not clear whether the use of linearSolve from hmatrix will induce a performance hit when the matrix is upper or lower triangular. Pro: it's a call to something presumably from LAPACK. Con: we've got some structure that should allow us to make it O(n^2) instead of O(n^3). To do: try some timed runs to see if these are needed.

upperSolve
  Solves an upper triangular system.

lowerSolve
  Solves a lower triangular system.

computeNthElement
  Used by upperSolve and lowerSolve. Computes the value of x_n when solving a lower triangular set of equations Mx = y. It is assumed that all values x_i where i < n are already in the vector x and that the rest of the elements of x are 0. Parameters: the nth row of M; y_n; n; the current x vector. Returns the x vector with x_n computed.

generalSolve
  General solver for linear equations of the relevant kind. The first parameter is either upperSolve or lowerSolve; the next two parameters are the upper/lower triangular matrix from the Cholesky decomposition, then another matrix. Returns the solution as a matrix.

cholSolve
  Finds the inverse of a matrix from its Cholesky decomposition.

HasGP.Support.Random

uniformMatrix
  Makes a random matrix whose elements are uniformly distributed between specified bounds. Returns the matrix and a new generator. Parameters: seed; range for the elements; number of rows; number of columns.

normalVectorSimple
  Produces vectors with normally distributed, independent elements of zero mean and specified variance. Parameters: seed; variance; number of elements in the vector.

normalList
  Produces lists with normally distributed, independent elements of zero mean and specified variance. Parameters: seed; variance; number of elements in the list.

normalVector
  Produces normally distributed vectors with specified mean and covariance. Parameters: seed; mean vector; covariance matrix.

normalMatrix
  Makes a matrix with normally distributed, independent elements of zero mean and specified variance. Parameters: seed; variance; rows; columns.
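As an illustration of the normalList behaviour described above, here is a hedged sketch built on System.Random and the Box-Muller transform; this is an assumed implementation, not HasGP's.

import System.Random

-- Infinite list of N(0, variance) samples via Box-Muller.
normals :: StdGen -> Double -> [Double]
normals g variance = go (randoms g)
  where
    sd = sqrt variance
    go (u1:u2:us) =
      let r = sqrt (negate 2 * log (1 - u1))  -- 1 - u1 avoids log 0
          t = 2 * pi * u2
      in sd * r * cos t : sd * r * sin t : go us
    go _ = []

-- Seed, variance, number of elements.
normalList :: Int -> Double -> Int -> [Double]
normalList seed variance k = take k (normals (mkStdGen seed) variance)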
HasGP.Covariance.Basic

The CovarianceFunction class wraps up a covariance function together with its hyperparameters. Its members are:

trueHyper
  The actual hyperparameter values.

covariance
  The covariance itself.

dCovarianceDParameters
  The derivative of the covariance with respect to the parameters.

makeCovarianceFromList
  Constructs a covariance function using log parameters.

makeListFromCovariance
  Gets the log parameters.

covarianceMatrix
  Constructs a matrix of covariances from a covariance function and a design matrix.

covarianceWithPoint
  Constructs the column vector required when a new input is included. Constructed as a matrix to avoid further work elsewhere.

covarianceWithPoints
  covarianceWithPoint applied to a list of points, producing a list of vectors.

HasGP.Likelihood.Basic

The LogLikelihood class allows arbitrary likelihoods, with or without parameters, to be wrapped up with their derivatives with respect to f (likelihood, dLikelihood, ddLikelihood, dddLikelihood) and passed to a function.

HasGP.Classification.Laplace.ClassificationLaplace

GP classification using the Laplace approximation.

LaplaceConvergenceTest
  A convergence test is a function that takes two consecutive values during iteration and works out whether you've converged or not.

LaplaceState
  The state is the vector f and the number of iterations.

LaplaceValue
  Computing the Laplace approximation requires us to deal with quite a lot of information, so to keep things straightforward we wrap it up in a type. The value associated with a state includes f (fValue), the evidence (eValue), the objective (psiValue), the derivative of the objective (dPsiValue), the vector a needed to compute the derivative of the evidence (aValue), and the number of iterations (count).

gpCLaplaceUpdate
  Computes the Laplace update for the latent variables f. Produces the new f, the log marginal likelihood, the objective, the derivative of the objective, and the vector a, which is needed to compute the derivative of the log marginal likelihood. Parameters: log likelihood; current f and n.

singleIteration
  Iteration to convergence is much nicer if the state is hidden using the State monad. This wraps the pure gpCLaplaceUpdate function up in a state transformer that's usable by the general functions in HasGP.Support.Iterate. Parameter: log likelihood.

gpCLaplaceLearn
  Uses a general function from HasGP.Support.Iterate to implement the learning algorithm. Convergence testing is done using a user-supplied function. Parameter: log likelihood.

convertToP_CG
  Converts the pairs of fStar and V produced by the prediction functions to actual probabilities, assuming the cumulative Gaussian likelihood was used.

gpCLaplacePredict
  Predicts using a GP classifier based on the Laplace approximation. Produces fStar and V rather than the actual probability, as further approximations are then required to compute this. Parameters: f; covariance matrix; covariance function; log likelihood; input to classify.

gpCLaplacePredict'
  The same as gpCLaplacePredict, but applies to a collection of new inputs supplied as the rows of a matrix, producing a list of pairs of fStar and V. Parameters: f; covariance function; log likelihood; inputs to classify.

gpCLaplaceLogEvidence
  Computes the log marginal likelihood and its first derivative for the Laplace approximation for GP classification. The convergence test input tests for convergence when using gpCLaplaceLearn. Note that a covariance function contains its own parameters and can compute its own derivative, so theta does not need to be passed separately. Outputs the NEGATIVE log marginal likelihood and a vector of its derivatives; the derivatives are with respect to the actual, NOT the log, parameters. Parameters: covariance function; log likelihood.
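To illustrate the kind of user-supplied convergence test that gpCLaplaceLearn expects, here is a hedged sketch; the record type and thresholds are illustrative stand-ins, not HasGP's actual LaplaceValue.

-- Illustrative stand-in for LaplaceValue: the objective and the
-- iteration count are all this test looks at.
data Value = Value { psiValue :: Double, count :: Int }

-- Stop when the objective barely moves, or after 100 iterations.
converged :: Value -> Value -> Bool
converged old new =
  abs (psiValue new - psiValue old) < 1e-6 || count new >= 100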
gpCLaplaceLogEvidenceList
  A version of gpCLaplaceLogEvidence that's usable by the conjugate gradient function included in the hmatrix library. Computes the log evidence and its first derivative for the Laplace approximation for GP classification. The issue is that while it makes sense for a covariance function to be implemented as a class, so that any can easily be used, we need to supply the evidence and its derivatives directly as functions of the hyperparameters, and these have to be supplied as vectors of Doubles. The solution is to include a function in the CovarianceFunction class that takes a list and returns a new covariance function of the required type having the specified hyperparameters. Parameters: the same as gpCLaplaceLogEvidence, plus the list of log hyperparameters. Outputs: the negative log marginal likelihood and a vector of its first derivatives. In addition to the above, this assumes that we want derivatives with respect to the log parameters, and so converts using df/d(log p) = p df/dp.

gpCLaplaceLogEvidenceVec
  The same as gpCLaplaceLogEvidenceList, but takes a vector instead of a list.

HasGP.Covariance.SquaredExp

Defines SquaredExponential, the squared exponential covariance function. Its hyperparameters are stored as log sigma_f^2 and log l.

HasGP.Covariance.SquaredExpARD

Defines SquaredExponentialARD, the automatic relevance determination (ARD) variant of the squared exponential covariance function.

HasGP.Data.Normalise

exampleMean
  Computes the mean for each attribute in a set of examples. Parameter: matrix, one row per example. Returns a vector of means, one per attribute.

exampleVariance
  Computes the variance for each attribute in a set of examples. Parameter: matrix, one row per example. Returns a vector of variances, one per attribute.

exampleMeanVariance
  Computes the mean and variance for each attribute in a set of examples. Parameter: matrix, one row per example. Returns the means and variances.

normaliseMeanVariance
  Normalises a set of examples to have specified means and variances. Parameters: vector of new means required; vector of new variances required; matrix, one row per example. Returns the normalised matrix.

normaliseMeanVarianceSimple
  The same as normaliseMeanVariance, but every column (attribute) is normalised in the same way. Parameters: new mean required; new variance required; matrix, one row per example. Returns the normalised matrix.

normaliseBetweenLimits
  Normalises a set of examples to have a specified maximum and minimum. Parameters: new min required; new max required; matrix, one row per example. Returns the normalised matrix.

findRedundantAttributes
  Finds the columns of a matrix in which all values are equal. Parameter: matrix, one row per example. Returns a list in which True elements mark redundancy.

listRedundantAttributes
  Lists the column numbers of redundant attributes. Parameter: matrix, one row per example. Returns a list of the positions of redundant attributes.

removeRedundantAttributes
  Removes any redundant columns from a matrix. Parameter: matrix, one row per example. Returns the modified matrix.

retainAttributes
  Takes a list of columns (matrix numbered from 1) and produces a matrix with ONLY those columns, in the order specified in the list. Parameters: list of columns to keep; matrix, one row per example. Returns the modified matrix.

confusionMatrix
  Computes the numbers for the confusion matrix. It is assumed that the classes are +1 (positive) and -1 (negative). The result is (a,b,c,d): a - correct negatives; b - predict positive when correct is negative; c - predict negative when correct is positive; d - correct positives.

printConfusionMatrix
  Prints the confusion matrix and some other statistics. Parameters: vector of targets; vector of actual outputs.

countLabels
  Assuming the labels are +1 or -1, counts how many there are of each.

HasGP.Data.RWData1

simpleClassificationData
  Generates training data for a simple classification problem, as in Rasmussen and Williams, page 62. Parameter: seed for the random number generator.
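The confusion-matrix convention above is easy to get wrong, so here is a hedged sketch on plain lists; the name confusion and the list-based types are illustrative, not HasGP's.

-- (a,b,c,d) as documented for confusionMatrix: classes are +1/-1.
confusion :: [Double] -> [Double] -> (Int, Int, Int, Int)
confusion targets outputs = foldl step (0, 0, 0, 0) (zip targets outputs)
  where
    step (a, b, c, d) (t, o)
      | t < 0 && o < 0 = (a + 1, b, c, d)  -- correct negative
      | t < 0          = (a, b + 1, c, d)  -- predicted +, target -
      | o < 0          = (a, b, c + 1, d)  -- predicted -, target +
      | otherwise      = (a, b, c, d + 1)  -- correct positive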
HasGP.Likelihood.LogPhi

Defines LogPhi: the value and first three derivatives of log Phi with respect to its parameter f, where log p(y|f) = log Phi(yf) and y is +1 or -1.

HasGP.Regression.Regression

gpRMain
  Computes the main quantities required to do regression: specifically, the Cholesky decomposition L of the covariance matrix, and the vector alpha such that (L L^T) alpha = y. Parameter: the log noise variance. Returns L and alpha.

gpRPredict
  Computes the expected value and variance for a collection of new points supplied as the rows of a matrix. Differs from gpRPredict' in that L and alpha need to be computed in advance. Parameters: L; alpha; the new inputs. Returns the means and variances.

gpRPredict'
  Computes the expected value and variance for a collection of new points supplied as the rows of a matrix. Parameters: the log noise variance; the new inputs. Returns the means and variances.

gpRLogEvidence
  Computes the log of the marginal likelihood. Parameters: L; alpha. Returns the log marginal likelihood.

gpRGradLogEvidence
  Computes the gradient of the log marginal likelihood. The output contains the derivative with respect to the noise variance, followed by the derivatives with respect to the hyperparameters in the covariance function. Parameters: the log noise variance; L; alpha. Returns the derivatives.

gpRLogHyperToEvidence
  Given the log parameters and the other necessary inputs, computes the NEGATIVE of the log marginal likelihood and its derivatives with respect to the LOG hyperparameters. Parameter: the log hyperparameters, noise variance first.
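To make the regression pipeline concrete, here is a small self-contained sketch using hmatrix's Numeric.LinearAlgebra. The names gpAlpha and gpMean, and the use of a direct solve instead of an explicit Cholesky factorisation, are illustrative assumptions rather than HasGP's actual code.

import Numeric.LinearAlgebra

-- alpha such that (K + noiseVar * I) alpha = y.
gpAlpha :: Double -> Matrix Double -> Vector Double -> Vector Double
gpAlpha noiseVar k y = (k + scale noiseVar (ident (rows k))) <\> y

-- Predictive means at new points: row i of kStar holds the
-- covariances between new input i and the training inputs.
gpMean :: Matrix Double -> Vector Double -> Vector Double
gpMean kStar alpha = kStar #> alpha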
HasGP.Likelihood.LogLogistic

Defines LogLogistic: the value and first three derivatives of log sigmoid with respect to its parameter f, where log p(y|f) = log sigmoid(yf) and y is +1 or -1.

HasGP.Classification.EP.ClassificationEP

GP classification using the Expectation Propagation (EP) approximation.

SiteOrder
  If we're updating sites in a random order then we need access to the random number generator.

EPState
  We hide the state used in performing EP using the State monad. We need to include a random number generator and the number of iterations.

EPSiteState
  When updating a single site at a time you keep track of var, tauTilde, mu, nuTilde, tauMinus and muMinus.

EPConvergenceTest
  By passing a function with this type we can specify arbitrary convergence tests.

EPValue
  A convergence test for EP usually depends on the evidence and the number of iterations so far. This allows us to specify completely arbitrary convergence tests.

generateInitialSiteState
  Generates a basic start state for the sites, with var equal to the covariance matrix and all vectors equal to 0. Parameter: number of sites.

cavityParameters, marginalMoments, siteParameters
  Updates for the EP version of Gaussian process classifiers: successive parts of the update for a single site, passing around the per-site quantities (varI, tauTildeI, muI, nuTildeI, muMinusI, tI, varMinusI, varHatI, tauMinusI, muHatI, nuMinusI).

updateOneSite
  Does a complete update for site i. Parameters: labels; number of sites; site to update.

randomPermutation
  Generates a random permutation. This is wrapped up in the state transformer generateRandomSiteOrder. Parameters: random number generator; size of list required. Returns a new generator and the result.

generateRandomSiteOrder
  We're often going to want to update the sites in a random order, so we need a state transformer that takes the current state (which includes a random number generator) and produces a random permutation.

generateFixedSiteOrder
  For completeness: just in case you want to update the sites in a non-random manner, this state transformer does exactly that.

updateAllSites
  Updates all the sites in the order specified by a list of Ints. Parameters: number of sites; sites to update.

recomputeApproximation
  Re-computes the approximation after updating all the sites. Outputs Sigma and mu. Parameters: number of sites; tauTilde; nuTilde.

gpClassifierEPEvidence
  Computes the approximation to the log marginal likelihood. Parameter: the L matrix. Returns the log marginal likelihood.

doOneUpdate
  As we're hiding the state using the State monad, we make a state transformer that uses updateAllSites and recomputeApproximation to do a complete single update. It makes use of an arbitrary state transformer to produce a list specifying the order in which to update the sites. The output is the L matrix produced when recomputing the approximation. Parameter: supplier of the update order.

gpClassifierEPLearn
  The learning algorithm. Takes an arbitrary function for convergence testing.

gpClassifierEPPredict
  Prediction with GP classifiers based on EP learning. Takes a matrix in which each row is an example to be classified. Parameters: inputs in the training set; covariance function; new inputs.

gpClassifierEPLogEvidence
  Computes the log evidence and its first derivative for the EP approximation for GP classification. Targets should be +1/-1. Outputs the negative log marginal likelihood and a vector of its derivatives. Parameter: covariance function.

gpClassifierEPLogEvidenceList
  Essentially the same as gpClassifierEPLogEvidence, but makes a covariance function using the hyperparameters supplied in a list and passes it on.

gpClassifierEPLogEvidenceVec
  Essentially the same as gpClassifierEPLogEvidence, but makes a covariance function using the hyperparameters supplied in a vector and passes it on.

HasGP.Demos

ClassificationDemo1 and ClassificationDemo2 define stopLaplace and stopEP, the functions that decide when iteration stops for the Laplace and EP versions respectively; RegressionDemo1 defines the corresponding stopping function for regression.
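For the site-ordering machinery above, here is a hedged sketch of a randomPermutation-style function using System.Random; it favours clarity over efficiency and is an assumed implementation, not HasGP's.

import System.Random

-- Repeatedly draw an index into the remaining items; a Fisher-Yates
-- shuffle would be more efficient, but this is easy to check.
randomPermutation :: StdGen -> Int -> (StdGen, [Int])
randomPermutation g n = go g [1 .. n]
  where
    go g' [] = (g', [])
    go g' xs =
      let (i, g'') = randomR (0, length xs - 1) g'
          (front, x : back) = splitAt i xs
          (gRest, rest) = go g'' (front ++ back)
      in (gRest, x : rest)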
Module index

  HasGP.Parsers.SvmLight
  HasGP.Support.Iterate
  HasGP.Types.MainTypes
  HasGP.Support.Functions
  HasGP.Support.MatrixFunction
  HasGP.Support.Linear
  HasGP.Support.Solve
  HasGP.Support.Random
  HasGP.Covariance.Basic
  HasGP.Covariance.SquaredExp
  HasGP.Covariance.SquaredExpARD
  HasGP.Likelihood.Basic
  HasGP.Likelihood.LogPhi
  HasGP.Likelihood.LogLogistic
  HasGP.Classification.Laplace.ClassificationLaplace
  HasGP.Classification.EP.ClassificationEP
  HasGP.Regression.Regression
  HasGP.Data.BishopData
  HasGP.Data.Normalise
  HasGP.Data.RWData1
  HasGP.Demos.ClassificationDemo1
  HasGP.Demos.ClassificationDemo2
  HasGP.Demos.RegressionDemo1

Further names exported by HasGP.Parsers.SvmLight include FullExample, FullFeature, FullTarget, oneZero, classTarget, positiveDouble, positiveDoubleNotInt, signedDouble, signedDoubleNotInt, target, integer, feature, featureValuePair, generalLetter, stringToLineEnd, info, line, file, split1, split2, split3, fullExamplesSeparate, classificationProblem, getClassificationTargets, regressionProblem, getRegressionTargets, noQid, getExampleRange, comp, sortExamples, insertZeros, getExamples, dimensionsCorrect, dimensions, getExamplesFromFile and fullExampleToMatrix.