id	summary	reporter	owner	description	type	status	priority	resolution	keywords	cc	topic	difficulty	mentor
1121	bio library development	KetilMalde	none	"The ""bio"" package is a collection of useful functionality aimed at bioinformatics.  Its development has been largely driven by the immediate needs of the applications that use it, and its current contents reflect this.  Ideally, this would develop into a general, broadly scoped bioinformatics library (akin to bioperl and biopython).

The library can be extended in many directions, and this will to a large extent be dictated by student interests and background.  Some possibilities are:

 * sequence alignment:  some rudimentary alignment exists already, but more advanced methods and multiple alignment, phylogeny, etc. would be useful.  Haskell should also lend itself well to parallel alignment.  (In addition, laziness could possibly be used to make T-Coffee-type multiple alignments more efficient?)

 * machine learning: this is an important aspect, but as it is applicable to many different domains, I've added a separate ticket.  Interested students should probably apply for a pure/general machine learning library (ticket 1127) instead of restricting it to bioinformatics.

 * file formats: currently there's support for a handful of file formats, but there are many others, usually simple and text-based, for which a parser would be nice to have.  File format support is likely to be a part of other tasks, but is also useful in its own right.

 * suffix arrays: these give time- and space-efficient searching.  Efficient construction is likely to require some low-level hacking, but would improve many algorithms that currently use associative structures.  Also generally useful for many text-related problems.
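
To make the suffix array item concrete, here is a minimal sketch of the idea in Haskell.  It is deliberately naive: it sorts the suffixes directly, which is exactly the construction a real implementation would replace with something more efficient.  The names SuffixIndex, buildIndex and findAll are hypothetical and are not part of the bio package.

```haskell
import Data.Array (Array, bounds, listArray, (!))
import Data.List (isPrefixOf, sort, sortBy, tails)
import Data.Ord (comparing)

-- | Hypothetical index type: all non-empty suffixes in sorted order,
-- each paired with its start offset in the original text.
type SuffixIndex a = Array Int (Int, [a])

-- | Naive construction by sorting the suffixes directly.
-- Illustration only; this is not an efficient construction algorithm.
buildIndex :: Ord a => [a] -> SuffixIndex a
buildIndex xs = listArray (0, length sorted - 1) sorted
  where
    sorted = sortBy (comparing snd) (zip [0 ..] (init (tails xs)))

-- | Start offsets of every occurrence of the pattern, in ascending order.
-- A binary search finds the first suffix >= the pattern; the suffixes that
-- have the pattern as a prefix form a contiguous run starting there.
findAll :: Ord a => [a] -> SuffixIndex a -> [Int]
findAll pat idx = sort (map fst hits)
  where
    (lo, hi) = bounds idx
    hits = takeWhile (isPrefixOf pat . snd) (map (idx !) [start .. hi])
    start = lowerBound lo (hi + 1)
    lowerBound l h
      | l >= h = l
      | snd (idx ! m) < pat = lowerBound (m + 1) h
      | otherwise = lowerBound l m
      where
        m = (l + h) `div` 2
```

With this sketch, searching for ana in an index built from banana yields the offsets 1 and 3.  The lookup needs only a logarithmic number of suffix comparisons plus one prefix check per hit, which is the property that makes suffix arrays attractive compared to associative structures.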

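As an illustration of how laziness can help with alignment, the following sketch computes a global (Needleman-Wunsch style) alignment score using a lazily evaluated array as the dynamic programming table.  The function name alignScore and its linear gap penalty are assumptions made for this example; they do not describe the alignment code that already exists in the library.

```haskell
import Data.Array (Array, listArray, range, (!))

-- | Global alignment score with a linear gap penalty (hypothetical helper).
-- Arguments: match score, mismatch score, gap score, then the two sequences.
alignScore :: Eq a => Int -> Int -> Int -> [a] -> [a] -> Int
alignScore match mismatch gap xs ys = table ! (n, m)
  where
    n = length xs
    m = length ys
    xa = listArray (1, n) xs
    ya = listArray (1, m) ys
    bnds = ((0, 0), (n, m))

    -- Each cell is a lazy thunk that refers to its neighbours, so no
    -- explicit fill order is needed: demanding the last cell forces
    -- exactly the cells it depends on.
    table :: Array (Int, Int) Int
    table = listArray bnds (map cell (range bnds))

    cell (0, j) = j * gap
    cell (i, 0) = i * gap
    cell (i, j) =
      maximum
        [ table ! (i - 1, j - 1) + subst (xa ! i) (ya ! j)
        , table ! (i - 1, j) + gap
        , table ! (i, j - 1) + gap
        ]

    subst a b = if a == b then match else mismatch
```

With a match score of 1 and mismatch and gap scores of -1, aligning a seven-letter sequence with itself gives a score of 7.  Recovering the alignment itself, or using affine gap penalties, would require extending the table beyond what is shown here.
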
Application-driven library development may be a useful route, so library development as part of solving concrete (biological or otherwise) problems is welcome.

If this sounds interesting, please e-mail me at <ketil at malde dot org> to discuss the details."	proposed-project	new					Bioinformatics	unknown	not-accepted
