Sunday, 9 August 2009

One Liner: Count occurrences of multiple patterns in multiple files

There may be a more elegant solution to this, but I wanted to count the number of times each of several sequences occurs in each of several files. Replace FILES with the list of files you want to search (e.g. *.txt) and replace PATTERNS with a file containing the things you want to search for, one entry per line. This Bash one-liner does the rest:

for f in FILES; do while IFS= read -r seq; do grep -c -- "$seq" "$f" | xargs echo "$f" "$seq"; done < PATTERNS; done
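If you want to reuse the loop, it can be wrapped in a function. This is a minimal sketch under my own assumptions: the name count_patterns and its argument order (pattern file first, then the files to search) are hypothetical and not part of any tool. It quotes every variable, reads patterns with IFS= read -r so spaces and backslashes survive, skips blank pattern lines (an empty pattern would match every line), and uses printf instead of xargs.

```shell
# count_patterns PATTERN_FILE FILE...   (hypothetical helper name)
# Prints "FILE PATTERN COUNT" for every file/pattern pair, where COUNT is
# the number of lines in FILE that match PATTERN (what grep -c reports).
count_patterns() {
  local pattern_file=$1
  shift
  local f seq
  for f in "$@"; do
    # "|| [ -n "$seq" ]" still processes a last pattern line that has no
    # trailing newline; IFS= and -r keep the pattern text unchanged
    while IFS= read -r seq || [ -n "$seq" ]; do
      # skip blank lines: an empty pattern matches every line
      [ -z "$seq" ] && continue
      # "--" stops a pattern beginning with "-" being parsed as an option
      printf '%s %s %s\n' "$f" "$seq" "$(grep -c -- "$seq" "$f")"
    done < "$pattern_file"
  done
}
```

Usage: count_patterns PATTERNS *.txt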

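It is basically two loops: the outer one goes through the files and the inner one through each line of the PATTERNS file. xargs then appends the count to the file name and pattern, so each result is printed in a sensible order as "file pattern count". Note that grep counts matching lines, so a line that contains a pattern twice is only counted once. If you don't need a count for each individual pattern but just the total per file, the -f option to grep will work (e.g. grep -c -f PATTERNS FILES); a line matching several patterns is then counted once.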
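Here is a sketch of both alternatives, again using function names of my own invention (total_matches and count_occurrences are hypothetical). The first is the grep -f total described above. The second counts every match rather than every matching line: grep -o prints each non-empty match on its own line and wc -l counts them. The -o option is available in GNU and BSD grep, though it is not required by POSIX.

```shell
# total_matches PATTERN_FILE FILE...   (hypothetical helper name)
# Lines in each FILE matching at least one pattern in PATTERN_FILE.
# With more than one FILE, grep prefixes each count with "FILE:".
# Beware: a blank line in PATTERN_FILE makes every line match.
total_matches() {
  local pattern_file=$1
  shift
  grep -c -f "$pattern_file" -- "$@"
}

# count_occurrences PATTERN FILE   (hypothetical helper name)
# Counts every match of PATTERN in FILE, not just the lines containing it.
count_occurrences() {
  grep -o -- "$1" "$2" | wc -l
}
```

Usage: total_matches PATTERNS *.txt, or count_occurrences foo file.txt for a single pattern.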
