Thursday 16 December 2010

LSF: Using job arrays



Our cluster uses the LSF job scheduler. One feature that I find useful is the ability to create job arrays. These are similar jobs that differ in just one respect, such as the input file or a parameter, or they could be identical, as for simulations, modelling etc. The main benefit of job arrays, at least for me, is the ability to control the number of jobs running, and to change it on the fly. For example, I might need to run 500 jobs, but my group's queue on the cluster only allows 240 jobs at any one time, and taking all of them would stop others in my group from getting anything done. So I can use a job array to submit all 500 jobs but only allow 100 to run at any time. When one finishes, another starts, until they are all done.
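As a minimal sketch of the idea (the job name and command here are just placeholders), a submission like this would queue 500 elements but let only 100 run at once:

### submit a 500-element job array, limited to 100 concurrently running elements
bsub -J "myArray[1-500]%100" "echo running element \$LSB_JOBINDEX"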


Another benefit of job arrays is that they are submitted as a single job, so a job array with a thousand elements is submitted almost instantly, whereas submitting a thousand separate jobs would take a long time.


Below is an example bash script that does a distributed sort. It is designed to show how to use job arrays and dependencies, not necessarily how best to do sorting.

### Generate a random big file that we want to sort, 10 Million lines
perl -e 'for (1..1E7){printf("%.0f\n",rand()*1E7)};' > bigFile
### Split the file up into chunks with 10,000 lines in each chunk
split -a 3 -d -l 10000 bigFile split
### rename the files on a 1-1000 scheme not 0-999
for f in split*;do mv ${f} $(echo ${f} |perl -ne 'm/split(0*)(\d+)/g;print "Split",$2+1,"\n";');done
### submit a job array, allowing 50 jobs to be run at any one time
ID=$(bsub -J "sort[1-1000]%50" "sort -n Split\$LSB_JOBINDEX >Split\$LSB_JOBINDEX.sorted" |perl -ne 'm/<(\d+)>/;print "$1"')
### merge the sorted files together once all the jobs are finished using the -w dependency
ID2=$(bsub -w "done($ID)" "sort -n -m *.sorted >bigFile.sorted" |perl -ne 'm/<(\d+)>/;print "$1"')
### Delete the temp files, waits for the merge to finish first
bsub -w "done($ID2)" "rm -f Split*"

The main point is that the jobs differ only in the value passed to them via the $LSB_JOBINDEX environment variable. Each job gets a different value from the range specified in the square brackets, [1-1000] in this case. There is also additional notation for specifying steps, such as 10,20,30, and you can also just give a list of numbers, such as 1,5,10,22,999.
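If I remember the syntax right, the step and list forms look something like this (the job names are made up):

### every 10th index between 10 and 100, i.e. 10,20,30 ... 100
bsub -J "stepArray[10-100:10]" "echo \$LSB_JOBINDEX"
### an explicit list of indices
bsub -J "listArray[1,5,10,22,999]" "echo \$LSB_JOBINDEX"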


The hard part is mapping this simple number to something useful for your task. In this case it was easy, as I used split to name the files with sequential numbers, but perhaps you have 500 data-sets you want to perform the same analysis on. In that case you can either rename the data-sets with a sequential naming scheme, or use a lookup table to associate input files with the numbers given by $LSB_JOBINDEX, and have your analysis script convert the number from $LSB_JOBINDEX into an input filename or parameter.
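As a rough sketch of the lookup table approach (the data-set names and analyse.sh script are hypothetical), each element can use sed to pull out the line of the table matching its $LSB_JOBINDEX:

### build the lookup table, one data-set per line
ls -1 dataset_*.txt > file_list.txt
### each array element reads its own line from the table and passes it to the analysis script
bsub -J "analyse[1-500]%100" 'INPUT=$(sed -n "${LSB_JOBINDEX}p" file_list.txt); ./analyse.sh "$INPUT"'

The single quotes stop the submitting shell expanding $LSB_JOBINDEX, so it is only expanded when each element of the array actually runs.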
The key point in the code is the %50 notation, which chooses how many jobs to run at any one time. This can be changed with bmod, for example:


bmod -J"%100" JOBID This would now allow 100 jobs to be run simultaneously, rather then 50. Notice also the use of the perl one liner (I am sure awk would work too) to get the job ID and store it ready to use as a dependency for the next step. This is another benefit of the job array, in that there is just one job id, which makes modifying and killing jobs much easier.

You can monitor the status of job arrays with the -A flag to bjobs (bjobs -A), which will show you how many elements are pending, running, done, exited etc.

If you want to check the progress of a particular job in the array you can do a bpeek using its job ID and array index, e.g. bpeek 1234542[101]. The same notation works for bkill and bjobs.
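For example (the job ID is made up), you can kill a single element or the whole array in one go:

### kill just element 101 of the array (quoted so the shell does not try to glob the brackets)
bkill "1234542[101]"
### kill every element of the array via its single job ID
bkill 1234542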

Comments:

  1. Looks like this could be an interesting article, but the formatting is really messed up viewing with Chrome and Safari. Also, I don't see the script. -bob

    1. I see what you mean, Blogger seems to have added some formatting and removed my gist link. Hopefully it is legible now. Stew
