I have been working more and more with ChIP sequencing data recently, and the files can be huge. Even simple tasks such as counting the lines in a file, sorting or filtering now take a considerable amount of time.
To work out the most efficient way of performing some of these operations, I have been using the time command at the command line. For example:
wc -l test.txt
19050959 test.txt
time sort test.txt >test_normalsort.txt
real 1m59.395s
time distSort test.txt
real 2m18.901s
In this case the normal sort was faster than a distributed sort and merge, but that could just be because our cluster was really busy when I ran it. Either way, time is very useful.
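Since a single timing on a busy machine can be misleading, one option is to repeat the measurement a few times and compare the runs. The following is only a rough sketch, assuming bash (it relies on the bash time keyword and its TIMEFORMAT variable); time_runs is a made-up helper name, not an existing tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command several times and print the
# wall-clock ("real") time of each run in seconds, one per line.
# Usage: time_runs N command [args...]
time_runs() {
    local runs=$1
    shift
    local i
    # bash-specific: make the `time` keyword report only elapsed seconds.
    local TIMEFORMAT='%R'
    for ((i = 1; i <= runs; i++)); do
        # The command's own output is discarded; the report from `time`
        # goes to the shell's stderr, which the braces redirect to stdout.
        { time "$@" > /dev/null 2>&1; } 2>&1
    done
}
```

For example, time_runs 3 sort test.txt would print three timings for the normal sort. Bear in mind that later runs may benefit from the file already being cached in memory, so the first run can look slower than the rest.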