Simseq - SIMulate SEQuences.  Yep, that's real creative.

Synopsis
--------

Generates a bunch of sequences from a set of reference sequences.  For ESTs, NCBI's RefSeq transcripts are probably good choices.

The sequences are generated using a model that specifies priming conditions and error generation.  Currently this is not very refined, but you can try

    simseq --model=sanger:n,d reference.fasta

where n is the number of sequences to generate (starting points are drawn from a uniform distribution), and d is the probability of a sequence being in the forward direction.

Or, even more experimentally:

    simseq --model=454:n,d

which implements a completely unfounded and baseless model of 454/Roche pyrosequencing.  (Okay, it is actually based on a paper by Margulies et al., but more data is definitely needed.)

Solexa will be supported as soon as anybody says something definitive about its error modes.

In any case, running out of sequence results in X's, indicating vector, which I hope makes sense, for Sanger at least.

Install
-------

The usual Cabal routine.  Get a working GHC compiler, install my 'bio' library, and do:

    chmod +x Setup.hs
    ./Setup.hs configure
    ./Setup.hs build
    sudo ./Setup.hs install

Mail me if it didn't work - .
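For readers curious about what the `sanger:n,d` model in the Synopsis does, here is a minimal sketch of its sampling step: a uniform start position per read, forward orientation with probability d, and 'X' padding when the reference runs out.  This is not simseq's actual code.  The names `Gen`, `uniform`, `sampleRead` and `simulate`, the fixed read length `readLen`, the reverse-strand convention (reverse complement reading back from the start position) and the tiny built-in random generator are all hypothetical choices made so the example needs only GHC's `base`.  Error generation is left out entirely.

```haskell
import Data.Char (toUpper)
import Data.Word (Word64)

-- Hypothetical sketch of the "sanger:n,d" sampling step; NOT simseq's code.
-- Assumptions: a fixed read length (readLen), no sequencing errors, and
-- reverse reads taken as the reverse complement ending at the start position.

-- Minimal 64-bit linear congruential generator (wraps modulo 2^64),
-- used only so the example depends on nothing but 'base'.
newtype Gen = Gen Word64

-- Uniform Double in [0,1) from the top 53 bits, plus the next state.
uniform :: Gen -> (Double, Gen)
uniform (Gen s) = (fromIntegral (s' `div` 2048) / 9007199254740992, Gen s')
  where
    s' = s * 6364136223846793005 + 1442695040888963407

-- Complement a nucleotide; anything unexpected becomes 'N'.
complement :: Char -> Char
complement c = case toUpper c of
  'A' -> 'T'; 'T' -> 'A'; 'C' -> 'G'; 'G' -> 'C'; _ -> 'N'

-- Take exactly len characters, padding with 'X' (vector) past the end.
takePadded :: Int -> String -> String
takePadded len s = take len (s ++ repeat 'X')

-- One read: uniform start position, forward with probability d.
sampleRead :: Int -> Double -> String -> Gen -> (String, Gen)
sampleRead readLen d ref g0 = (rd, g2)
  where
    (u1, g1) = uniform g0
    (u2, g2) = uniform g1
    n = length ref
    start = min (n - 1) (floor (u1 * fromIntegral n))
    rd
      | u2 < d    = takePadded readLen (drop start ref)
      | otherwise = takePadded readLen (reverse (map complement (take (start + 1) ref)))

-- Generate n reads by threading the generator state through sampleRead.
simulate :: Int -> Double -> Int -> String -> Gen -> [String]
simulate count d readLen ref g
  | count <= 0 = []
  | otherwise  = let (r, g') = sampleRead readLen d ref g
                 in r : simulate (count - 1) d readLen ref g'

main :: IO ()
main = mapM_ putStrLn (simulate 5 0.5 8 "ACGTACGTTGCA" (Gen 42))
```

Threading the generator explicitly keeps the sketch pure and reproducible: the same seed always gives the same reads.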