I used a bit of bash and Perl magic for this.
for f in *.bed; do export WC=$(wc -l < "$f"); perl -i -ne 'print if rand() < 1500/$ENV{WC}' "$f"; done
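For each file, it counts the lines with wc and stores the count in the environment variable WC. Perl then reads the file line by line and prints a line only if a random number between 0 and 1 is less than 1500/WC, the ratio of the size we want (1500 in this case) to the file's length. Each line is kept independently with that probability, so you get roughly 1500 lines rather than exactly 1500, and a file with fewer than 1500 lines is kept whole. Because of the -i switch, Perl edits each file in place and overwrites the original; use -i.bak instead if you want a backup copy.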
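To put a number on how close to 1500 the result gets: if a file has n lines (n = WC) and we want k = 1500 of them, each line is kept independently with probability p = k/n, so the number of lines kept, K, follows a binomial distribution. Assuming n ≥ k:

```latex
\begin{aligned}
K &\sim \mathrm{Binomial}(n, p), \qquad p = \frac{k}{n} \\
\mathbb{E}[K] &= np = k = 1500 \\
\operatorname{Var}(K) &= np(1-p) = k\left(1 - \frac{k}{n}\right) \le k \\
\sigma_K &\le \sqrt{1500} \approx 38.7
\end{aligned}
```

So the sample size usually lands within a few dozen lines of 1500; by the normal approximation, about 95% of files end up within roughly ±77 lines (2σ) of the target.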
The for loop repeats this for every .bed file in the current directory.
Edit:
You could also do something like this:
perl -ne 'print rand(), "\t", $_' FILENAME | sort -g | head -n 100 | cut -f 2- > NEWFILENAME
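This writes exactly 100 random lines from FILENAME to NEWFILENAME (or all of them, shuffled, if the file is shorter). Perl puts a random number and a tab in front of every line, sort orders the lines by that number, which shuffles them, head keeps the first 100, and cut strips the random number off again.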
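On Linux, GNU coreutils ships shuf, which does the same shuffle-and-take in one step. A small sketch, assuming GNU shuf is installed (macOS does not include it by default); demo2.bed is a made-up BED-like file used only for illustration:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo input: 1000 tab-separated BED-like lines with four columns (made up)
seq 1 1000 | awk 'BEGIN { OFS = "\t" } { print "chr2", $1 * 10, $1 * 10 + 5, "feat" $1 }' > demo2.bed

# Pick 100 distinct random lines (no replacement); every column is kept intact
shuf -n 100 demo2.bed > demo2.sub.bed

# Optional: put the sample back into genomic order (chromosome, then start)
sort -k1,1 -k2,2n demo2.sub.bed > demo2.sub.sorted.bed
```

Since shuf works on whole lines, there is no random prefix to strip afterwards, so all the BED columns come through untouched.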