Your first Condor Submission
Condor is a batch system for running many jobs at the same time, which can significantly reduce your total computing time. The goal of this tutorial is to submit a set of ten jobs to Condor, each of which prints "Hello world!". The ingredients of this simple example will serve as the starting point for future submissions.
You will need...
- An executable - This is the program or script that you want to use condor to run.
- A .sub file - This file describes the job that will be running.
- A .dag file - This creates a list of the jobs and specifies file path inclusions.
- A wrapper - This is another script that is called by the .sub file and points to the executable we want to run.
The executable
Let's make a shell script that simply prints "Hello world!" to the screen. Copy and paste the code below into a file called "helloWorld.sh".
#!/bin/bash
#file name: helloWorld.sh
echo "Hello world!"
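If you prefer to stay in the terminal, the sketch below is one way to create the file and check it. It writes the same three lines shown above, sets the execute bit (the script has to be executable before it can be run directly), and runs it once. It assumes you are in the directory where you want helloWorld.sh to live.

```shell
# Write helloWorld.sh with the same contents as shown above.
cat > helloWorld.sh <<'EOF'
#!/bin/bash
#file name: helloWorld.sh
echo "Hello world!"
EOF

# Mark the script as executable, then run it once as a quick check.
chmod +x helloWorld.sh
./helloWorld.sh
```

If the greeting appears in the terminal, the executable is ready.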
The .sub file
universe = vanilla
executable = /home/sylvia.biscoveanu/condor/condor.sh
log = /home/sylvia.biscoveanu/condor/logs/condor.log
error = /home/sylvia.biscoveanu/condor/logs/condor_$(jobNumber).err
output = /home/sylvia.biscoveanu/condor/logs/condor_$(jobNumber).out
arguments = $(jobNumber)
accounting_group = ligo.prod.o1.sgwb.explore.test
notification = error
request_memory = 4000
queue 1
Save the above code as "condor.sub". Replace all file paths with your own locations; the file names can stay the same. The accounting_group tag is only needed when running Condor through the LIGO Data Grid, and its value depends on which group you are working with. You can calculate the appropriate tag here.
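The log, error and output lines in condor.sub all point into a logs directory. As far as we know, Condor does not create missing directories for these files, so it is safest to make the directory before submitting. A minimal sketch, where CONDOR_DIR is a placeholder name (not something Condor itself reads) for the base path you used in condor.sub, defaulting here to the current directory:

```shell
# CONDOR_DIR is a hypothetical variable for this sketch; set it to the
# directory that appears in the log/error/output paths of your condor.sub.
CONDOR_DIR="${CONDOR_DIR:-$PWD}"

# Create the logs directory if it does not exist yet (-p makes this safe to repeat).
mkdir -p "${CONDOR_DIR}/logs"

# Show the directory so you can confirm the path and permissions.
ls -ld "${CONDOR_DIR}/logs"
```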
The .dag file
Download the following Perl script to create your .dag file: dag.pl. Several fields need to be modified: replace all paths that include "sylvia.biscoveanu", and rename "sylvia.dag" in the line after the $LIB specification. $njobs can be changed, but for now it is set to 10. Running the script from the command line with perl dag.pl creates a new .dag file with the name you chose. It should look like this.
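For orientation, here is a rough shell sketch of what a generator for such a file could do. It is not the contents of dag.pl: it only assumes that the DAG has one JOB line (naming a node and its submit file) and one VARS line (setting the jobNumber variable used in condor.sub) per job, and that jobs are numbered from 0. DAG_FILE, SUB_FILE and NJOBS are illustrative names.

```shell
#!/bin/bash
# Sketch only: the real dag.pl may write additional lines or number jobs differently.
DAG_FILE="${DAG_FILE:-my.dag}"       # hypothetical output name
SUB_FILE="${SUB_FILE:-condor.sub}"   # submit file shared by every job
NJOBS="${NJOBS:-10}"                 # number of jobs, as with $njobs in dag.pl

: > "${DAG_FILE}"   # start from an empty file

for ((i = 0; i < NJOBS; i++)); do
  # JOB declares a node and the submit file it uses.
  echo "JOB ${i} ${SUB_FILE}" >> "${DAG_FILE}"
  # VARS defines $(jobNumber) for that node.
  echo "VARS ${i} jobNumber=\"${i}\"" >> "${DAG_FILE}"
done
```

With the defaults this writes ten JOB/VARS pairs to my.dag, so each job receives its own number as the argument passed on by condor.sub.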
The wrapper
While you could tell condor.sub to point directly to the helloWorld.sh executable, it is safer to point to a wrapper that calls helloWorld.sh indirectly. This allows us to set up environment variables and ensure that the first job always runs successfully. A sample wrapper can be downloaded here. The memory limit can be changed, but it is currently set to 4 GB, which is more than enough for our submission. Change the hostname and OUTPUTFILE paths in the wrapper to your own. OUTPUTFILE is where "Hello world!" will be printed along with the job number.
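To make the idea concrete, the sketch below shows the bare bones of such a wrapper. It is not the downloadable sample: it leaves out the memory limit and the hostname handling, and the function name run_wrapper and the EXECUTABLE default are our own placeholders. It only shows the core pattern of taking the job number as the first argument and appending the executable's output to OUTPUTFILE.

```shell
#!/bin/bash
# condor.sh (sketch): condor.sub passes the job number as the only argument.

run_wrapper() {
  local job_number="$1"
  # Placeholder defaults; point these at your own files.
  local executable="${EXECUTABLE:-./helloWorld.sh}"
  local outputfile="${OUTPUTFILE:-./output.log}"

  # Record the job number, then append whatever the executable prints.
  {
    echo "Job number ${job_number}"
    "${executable}"
  } >> "${outputfile}"
}

# Only run when a job number was actually supplied.
if [ "$#" -gt 0 ]; then
  run_wrapper "$@"
fi
```

Keeping the work in a function makes it easy to add environment setup before the call without touching the executable itself.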
Testing and submitting
Test your wrapper in the terminal before submitting to Condor by typing condor.sh 0 into the command line (or ./condor.sh 0 if the current directory is not on your PATH). Here 0 is the job number; for this test it can be any of the job numbers used in your .dag file. You should have a new file called output.log that looks like this:
Job number 0
"Hello world!"
If this file was produced, you are ready to try running your program on Condor! To do this, simply type the following into the command line: condor_submit_dag sylvia.dag but replace "sylvia.dag" with the name of your .dag file. You will see several new files created in the directory from which you ran Condor. The file called "sylvia.dag.dagman.out" (or the equivalent for whatever you called your .dag file) will be the most useful, since it contains the output that Condor produces when running the program. You can use this file to see whether there were any errors and to check on the progress of your submission.
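A small helper can save some scrolling when checking that file. The sketch below simply searches for lines containing the word "error" in any capitalization. It makes no assumption about the exact format of the DAGMan log, so it may also match harmless lines; check_dag_log is our own name, not a Condor command.

```shell
# Print every line mentioning "error" in the given log, or a short note if none match.
check_dag_log() {
  local log_file="$1"
  if [ ! -f "${log_file}" ]; then
    echo "No such file: ${log_file}" >&2
    return 1
  fi
  grep -i "error" "${log_file}" || echo "No lines mentioning 'error' in ${log_file}"
}

# Usage, with sylvia.dag replaced by the name of your own .dag file:
#   check_dag_log sylvia.dag.dagman.out
#   tail -f sylvia.dag.dagman.out    # follow the log as it is written
#   condor_q                         # list your jobs that are still in the queue
```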