Wednesday, 4 November 2009

Perl one-liner: Random Lines from a File

I have some BED files that are too large to process in a reasonable time, so I need to randomly sample lines from them to create files of a workable size.

I used some bash and perl magic for this.

for f in *.bed; do export WC=$(wc -l < "$f"); perl -i -ne 'print if rand() < 1500/$ENV{WC}' "$f"; done


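Basically, it counts the lines in the file and stores the result in the environment variable WC. It then reads the file line by line and prints a line only if a random number between 0 and 1 is less than the required size (1500 in this case) divided by the line count (WC). Each line is therefore kept with probability 1500/WC, so the result has roughly, not exactly, 1500 lines. Note that the -i switch makes perl edit each file in place, so the original is overwritten with the sample.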

The for loop runs this over every .bed file in the current directory.
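
The proportional test described above can also be wrapped in a small shell function that writes the sample to standard output instead of editing the file in place. This is only a minimal sketch: the function name sample_lines and its two arguments are made up for illustration and are not part of the one-liner.

```shell
# sample_lines TARGET FILE   (hypothetical helper, name chosen for illustration)
# Print each line of FILE with probability TARGET / line_count, so the output
# holds roughly TARGET lines; the exact count varies from run to run.
# FILE itself is left untouched.
sample_lines() {
    target=$1
    file=$2
    # Read the file on stdin so wc prints only the count, not the file name.
    total=$(wc -l < "$file")
    # Pass both numbers to perl through the environment.
    TARGET=$target TOTAL=$total perl -ne \
        'print if rand() < $ENV{TARGET} / $ENV{TOTAL}' "$file"
}
```

Something like `sample_lines 1500 input.bed > input.sample.bed` (file names are placeholders) would then keep the original and put the sample in a new file.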

Edit:
You could also do something like this:

perl -ne 'print rand; print "\t"; print;' FILENAME | sort | head -n 100 | cut -f 2- > NEWFILENAME


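This prefixes every line with a random number, sorts on that number, and keeps the first 100 lines, so it returns exactly 100 random lines from the file (or all of them if the file is shorter).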
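
The same decorate, sort and cut idea can be sketched as a reusable function. The name random_lines and its arguments are assumptions for illustration only. Because BED lines are themselves tab-separated, this sketch strips the random key with `cut -f 2-` so that every original column is kept.

```shell
# random_lines N FILE   (hypothetical helper, name chosen for illustration)
# Print exactly N randomly chosen lines of FILE, or every line in random
# order if FILE has fewer than N lines.
random_lines() {
    n=$1
    file=$2
    # 1. Prefix each line with a random key and a tab.
    # 2. Sort on the key, which puts the lines in random order.
    # 3. Keep the first N lines.
    # 4. Drop the key; "2-" keeps all remaining tab-separated fields.
    perl -ne 'print rand(), "\t", $_' "$file" |
        sort |
        head -n "$n" |
        cut -f 2-
}
```

For example, `random_lines 100 FILENAME > NEWFILENAME` would do the same job as the pipeline above.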
