Thursday, 6 August 2009

Organizing Computational Biology Projects

I saw this interesting article in PLoS Computational Biology: A Quick Guide to Organizing Computational Biology Projects. It makes some useful points, and though most people reach the same conclusions after working in computational biology for a while, it might save some people from making the same mistakes over and over. Organizing computational work is a problem for everyone in the field, especially people just starting out.

The main problem is that we never feel we have time to organize and document computational analyses fully as we go. However, after several occasions of having to repeat an analysis because we forgot how we did it the first time, or of writing a script only to find we had already written one we just couldn't find, it becomes apparent that it is quicker, in the long run, to do things properly the first time.

Everyone does these things differently and will find their own way, but I think the article is a good description of best practices. I always try to ask, "Could someone look at this and repeat what I did?" Normally that someone is me a couple of months later, so it is well worth making sure they can.

I am not perfect, but the key for me is documentation: good notes within scripts about what they do, to what, and how. I also keep a computational version of a lab book, which I do in wiki form so that other people in my core could repeat my work if necessary. However, a simple text file does the job as well. Version control is another important aspect for me. I have only recently started to use Subversion to track changes to my code, but version control can be as simple as documenting which genome build you did some analysis with, or which version of a dataset you used, and noting its location.
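To make the "simple text file" idea concrete, here is a minimal sketch in Python of appending a dated entry to a plain-text lab book. The function names, the field layout, and the example values in the usage note (build name, dataset version, path) are all made up for illustration; it is one possible shape for such a file, not a description of my actual notes.

```python
from datetime import date
from pathlib import Path


def format_entry(day, description, genome_build, dataset_version, location):
    """Return one dated lab-book entry as plain text.

    The field names below are an illustrative assumption, not a standard.
    """
    lines = [
        f"== {day.isoformat()} ==",
        description,
        f"genome build: {genome_build}",
        f"dataset version: {dataset_version}",
        f"location: {location}",
        "",
    ]
    # A trailing blank line keeps consecutive entries visually separated.
    return "\n".join(lines) + "\n"


def append_entry(notebook_path, entry):
    """Append an entry to the lab-book file, creating the file if needed."""
    with Path(notebook_path).open("a", encoding="utf-8") as handle:
        handle.write(entry)
```

Calling `append_entry("labbook.txt", format_entry(date.today(), "Re-ran peak calling", "hg18", "v2", "/path/to/analysis"))` adds one entry to the end of the file, so the lab book stays in chronological order and nothing already written is touched.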

I think I might try to write some more about this, as formalizing my thoughts might help me improve my system too. I have particular problems keeping track of work that is spread across computer systems, on my desktop and on the cluster. I really need a more formal way of organizing that, maybe a file that indicates the location of the analysis on the other platform. Anyway, enough for now. Enjoy the article.
