Condor - simple guide
1. Introduction
This document provides a quick introduction to Condor: it introduces the most important concepts and gives examples to let users start using Condor as quickly as possible. Although this document is quite large, it should be less intimidating than the Condor Reference Manual! For any more advanced use, however, you will have to refer to the vast Condor Reference Manual.
You may also wish to look at the Condor Project Homepage
1.1. What is Condor
Condor is a specialized batch system for managing compute-intensive jobs. Users submit their compute jobs to Condor; Condor puts the jobs in a queue, runs them, and then informs the user of the result. The collection of inter-networked machines running Condor and controlled by a particular manager is known as a pool. Like most batch systems, Condor provides a queuing mechanism, scheduling policy, priority scheme, and resource classifications.
In slightly more detail: a user submits a job to Condor from one of a number of Submit machines. Condor finds an available Execute machine in the pool and begins running the job on that machine. Condor can detect that a machine running a Condor job is no longer available (perhaps because the owner of the machine came back from lunch and started typing on the keyboard). It may be able to checkpoint the job and move (migrate) it to a different machine which would otherwise be idle. If it has been able to checkpoint the job, Condor continues the job on the new machine from precisely where it left off.
Condor does not require an account (login) on machines where it runs a job. Condor can do this because it uses remote system calls which trap library calls for such operations as reading or writing from disk files. The calls are transmitted over the network to be performed on the machine where the job was submitted.
Every machine in a Condor pool can serve a variety of roles, and most machines will serve more than one role simultaneously, although certain roles can only be performed by a single machine in a pool. The following list describes the four different roles:
- Central Manager
The Manager machine is the collector of information, and the negotiator between resources and resource requests. There is only one central manager for a pool.
- Submit
Submit machines queue Condor jobs. There may be more than one. Users will only need to be able to log in to these.
- Execute
Execute machines actually run the Condor jobs. There may be more than one.
- Checkpoint Server
The checkpoint server is a centralized machine that stores all the checkpoint files for the jobs submitted in the pool. Only one machine in a pool can be configured as a checkpoint server, and its presence is optional.
1.2. Condor on the PLG Cluster
To see Condor's view of running machines, use condor_status. People wishing to use them for running jobs under Condor only need to be able to log in to pleiades.plg.inf.uc3.es. We have chosen to support only the Vanilla Universe on our Condor pool.
We have decided not to allow preemption on the local pool (this is not the default behaviour - see 4.1.2. User Priority for details).
1.2.1 Etiquette
Because we have turned off job preemption, it is possible for a single user to use the entire pool for long periods, thus preventing other people from getting any jobs to run. Condor does not have sophisticated scheduling mechanisms, so there is not much we can do about this! We have therefore adopted the policy that jobs should aim to finish within a reasonable amount of time - anyone needing to run very long jobs should contact sys-admin.
Users should add the line "nice_user = True" to their jobs as a matter of course. This ensures that, when a new job is to be started, niced jobs are only considered if there are no other jobs waiting to run. It means that if one user has submitted a large batch of niced jobs, other users can still get a small number of non-nice jobs through. This only works if most waiting jobs are niced. Note that even niced jobs, while running, stop other jobs from starting, so please try to ensure most jobs complete within a reasonable amount of time.
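As a minimal sketch, a well-behaved submit description file simply includes the nice_user line alongside the usual entries (the program and file names here are illustrative):

```
# Hypothetical well-behaved job: runs at the lowest priority
nice_user  = True
universe   = vanilla
executable = myprog
output     = myprog.out
log        = myprog.log
queue
```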
2. Using Condor
- Code Preparation.
A job run under Condor must be able to run as a background batch job. Condor runs the program unattended and in the background. A program that runs in the background will not be able to do interactive input and output. Condor can redirect console output (stdout and stderr) and keyboard input (stdin) to and from files for you. Create any needed files that contain the proper keystrokes needed for program input. Make certain the program/script will run correctly with these files on the submit machine.
- Submit description file.
A submit description file controls the details of job submission. The file will contain information about the job such as what executable to run, the files to use for keyboard and screen data, the platform type required to run the program, and where to send e-mail when the job completes. You can also tell Condor how many times to run a program; it is simple to run the same program multiple times with multiple data sets. Write a submit description file to go with the job, using the syntax description and the illustrative examples given here.
- Submit the Job.
Log in to a submit machine (see above) and submit the program to Condor with the condor_submit command.
When your program completes, Condor will tell you the exit status of your program and various statistics about its performance, including time used and I/O performed. If you are using a log file for the job (which is recommended) the exit status will be recorded in the log file. Alternatively you can view the history file for the job by typing condor_history, which will show something like:
 ID      OWNER    SUBMITTED     CPU_USAGE ST PRI SIZE CMD
  1.0   condor    6/13 10:58   0+00:00:00 C  0   0.9  job_blah

Notice that the status ("ST") is now C, for completed.
You can remove a job from the queue prematurely with condor_rm.
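For example (the cluster and process IDs here are illustrative):

```
condor_rm 57.0     # remove process 0 of cluster 57
condor_rm 57       # remove every job in cluster 57
```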
3. Submit File syntax
A submit description file controls the details of job submission. The syntax is simple; a list of the most important entries, grouped by concept, follows. This is by no means a full list (for that, see the condor_submit man page); the selection is intended mainly to make it easier to understand the examples given in other sections. Blank lines and lines beginning with a pound sign (#) character are ignored by the submit description file parser, and so may be used for comments.
3.1. Basic entries
- executable = name
The name of the executable file for this job cluster (for a definition of a job cluster see this example).
- arguments = argument_list
List of arguments to be supplied to the program named as the executable on its command line.
- input = pathname
- output = pathname
- error = pathname
Condor assumes that its jobs are long-running, and that the user will not wait at the terminal for their completion. Because of this, the standard files which normally access the terminal (stdin, stdout, and stderr) must refer to files. Thus, the file name specified with input should contain any keyboard input the program requires (that is, this file becomes stdin). Likewise with output and error. If not specified, the default value of /dev/null is used for submission to a Unix machine.
- universe = vanilla | standard | java
Specifies which Condor Universe to use when running this job. Remember that only vanilla is supported.
- initialdir = directory-path
Used to give jobs a directory with respect to file input and output. Also provides a directory (on the submit machine) for the user log.
- log = pathname
Use log to specify a file name where Condor will write a log file of what is happening with this job cluster. For example, Condor will log into this file when and where the job begins running, when the job completes, etc. Most users find specifying a log file to be very handy; its use is recommended. If no log entry is specified, Condor does not create a log for this cluster.
- queue [number-of-procs]
Places one or more copies of the job into the Condor queue. The optional argument number-of-procs specifies how many times to submit the job to the queue; it defaults to 1. If desired, any commands may be placed between subsequent queue commands, such as new input, output, error, initialdir, arguments, or executable commands. This is handy when submitting multiple runs into one cluster with one submit description file. Multiple clusters may be specified within a single submit description file by changing the executable between queue commands. Each time the executable command is issued (between queue commands), a new cluster is defined.
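The multiple-cluster behaviour of queue described above can be sketched as follows (the program and file names are hypothetical); each executable line starts a new cluster:

```
universe   = vanilla
executable = preprocess      # hypothetical program: first cluster
input      = raw.dat
output     = clean.dat
queue

executable = analyse         # hypothetical program: second cluster
input      = clean.dat
output     = results.dat
queue
```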
3.2. Job Ordering and location
- priority = priority
Condor job priorities default to 0. Jobs with higher numerical priority will run before jobs with lower numerical priority. Note that this priority is on a per-user basis; setting the priority will determine the order in which your own jobs are executed, but will have no effect on whether or not your jobs will run ahead of another user's jobs. See 4.1 Priority.
- nice_user = True | False
Normally, when a machine becomes available to Condor, Condor decides which job to run based upon user and job priorities. Setting nice_user equal to True tells Condor not to use your regular user priority, but that this job should have last priority among all users and all jobs. Jobs submitted in this fashion run only on machines which no other non-nice_user job wants. This is very handy if a user has some jobs they wish to run, but does not wish to use resources that could instead be used to run other people's Condor jobs. Jobs submitted in this fashion have "nice-user." prepended to the owner name when viewed with condor_q or condor_userprio. The default value is False.
- requirements = Boolean Expression
The requirements command is a boolean expression which uses C-like operators. In order for any job in this cluster to run on a given machine, this requirements expression must evaluate to true on the given machine. For example, to require that whatever machine executes your program has at least 64 MB of RAM and a MIPS performance rating greater than 45, use:

requirements = Memory >= 64 && Mips > 45

Only one requirements command may be present in a submit description file. Unless you request otherwise, Condor will by default give your job to machines with the same architecture and operating system version as the machine running condor_submit. See 4.2 Machine Attributes. In our current configuration all the machines have the same characteristics, so for the time being requirements can be ignored.
- rank = Float Expression
The argument is a floating-point expression that states how to rank machines which have already met the requirements expression. Essentially, rank expresses preference. A higher numeric value equals better rank. Condor will give the job to the machine with the highest rank. For example,

requirements = Memory > 60
rank = Memory

asks Condor to find all available machines with more than 60 megabytes of memory and give the job to the one with the most memory. See 4.3 Ranking. As with requirements, this will have no effect on a homogeneous computing cluster.
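The priority entry above sets the job priority at submission time; for a job that is already queued, the same per-user priority can be changed with condor_prio. For example (the job ID is illustrative):

```
condor_prio -p 10 57.0    # raise job 57.0 to priority 10 among your own jobs
```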
3.3. File Handling
- transfer_input_files = file1, file2, ...
A comma-delimited list of all the files and directories to be transferred into the working directory for the job, before the job is started. By default, the file specified in the executable command and any file specified in the input command (for example, stdin) are transferred. When a path to an input file or directory is specified, this specifies the path to the file on the submit side. The file is placed in the job's temporary scratch directory on the execute side, and it is named using the base name of the original path. For example, /path/to/input_file becomes input_file in the job's scratch directory.
A directory may be specified using a trailing path separator. An example of a trailing path separator is the slash character on Unix platforms; a directory example using a trailing path separator is input_data/. When a directory is specified with a trailing path separator, the contents of the directory are transferred, but the directory itself is not transferred. It is as if each of the items within the directory were listed in the transfer list. When there is no trailing path separator, the directory is transferred, its contents are transferred, and these contents are placed inside the transferred directory.
- transfer_output_files = file1, file2, ...
This command forms an explicit list of output files and directories to be transferred back from the temporary working directory on the execute machine to the submit machine. If there are multiple files, they must be delimited with commas. If transfer_output_files is not specified, Condor will automatically transfer back all files in the job's temporary working directory which have been modified or created by the job. Subdirectories are not scanned for output, so if output from subdirectories is desired, the output list must be explicitly specified. Another reason to explicitly list output files is for a job that creates many files, when the user wants only a subset transferred back.
When a path to an output file or directory is specified, it specifies the path to the file on the execute side. As a destination on the submit side, the file is placed in the job's initial working directory, and it is named using the base name of the original path. For example, path/to/output_file becomes output_file in the job's initial working directory. The name and path of the file that is written on the submit side may be modified by using transfer_output_remaps. Note that this remap function only works with files but not with directories.
A directory may be specified using a trailing path separator, with the same semantics as for transfer_input_files: with a trailing separator the contents of the directory are transferred but the directory itself is not; without one, the directory itself is transferred along with its contents.
Symbolic links to files are transferred as the files they point to. Transfer of symbolic links to directories is not currently supported.
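A sketch of these entries in a submit description file (the directory and file names are illustrative):

```
# Transfer the contents of input_data/ (not the directory itself, note
# the trailing slash), plus params.cfg, to the job's scratch directory.
transfer_input_files  = input_data/, params.cfg
# Bring back only the named result file, ignoring other files the job creates.
transfer_output_files = results.dat
```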
3.4. Job Information
- notification = when
Owners of Condor jobs are notified by email when certain events occur. If when is set to Always, the owner will be notified whenever the job is checkpointed, and when it completes. If when is set to Complete (the default), the owner will be notified when the job terminates. If when is set to Error, the owner will only be notified if the job terminates abnormally. If when is set to Never, the owner will not be mailed, regardless of what happens to the job.
- notify_user = email-address
Used to specify the email address to use when Condor sends email about a job. If not specified, Condor will default to using job-owner@UID_DOMAIN where UID_DOMAIN is specified by the Condor site administrator.
3.5. Environment
- environment = parameter_list
A list of environment variables which will be placed (as given) into the job's environment before execution. The list is of the form <parameter>=<value>. Multiple environment variables can be specified by separating them with a semicolon (;) when submitting from a Unix platform. The length of the list specified in the environment is currently limited to 10240 characters.
- getenv = True | False
If getenv is set to True, then condor_submit will copy all of the user's current shell environment variables at the time of job submission into the job description. The job will therefore execute with the same set of environment variables that the user had at submit time. Defaults to False.
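For example (the variable names and values are illustrative):

```
# Pass two variables explicitly; alternatively, getenv = True copies
# the entire shell environment at submit time.
environment = DATA_DIR=/scratch/data;THREADS=4
getenv      = False
```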
3.6. Macros
Parameterless macros in the form of $(macro_name) may be inserted anywhere in Condor submit description files. Macros can be defined by lines of the form

<macro_name> = <string>

Two pre-defined macros are supplied by the submit description file parser. The $(Cluster) macro supplies the number of the job cluster, and the $(Process) macro supplies the number of the job. These macros are intended to aid in the specification of input/output files, arguments, etc., for clusters with lots of jobs, and/or could be used to supply a Condor process with its own cluster and process numbers on the command line. For an example see 5.2. Multiple Submission - Different Inputs.
In addition to the normal macro, there is also a special kind of macro called a substitution macro that allows you to substitute expressions defined on the resource machine itself (gotten after a match to the machine has been performed) into specific expressions in your submit description file. The special substitution macro is of the form $$(attribute). It may only be used in three expressions in the submit description file: executable, environment, and arguments. Example:
executable = myprog.$$(opsys).$$(arch)

The opsys and arch attributes will be substituted at match time for any given resource. This will allow Condor to automatically choose the correct executable for the matched machine.
The environment macro, $ENV, allows the evaluation of an environment variable to be used in setting a submit description file command. The syntax used is
$ENV(variable)

For example:
log = $ENV(HOME)/jobs/logfile
4. Job scheduling - Priority, Requirements and Rank
The scheduling arrangements adopted by Condor control when, and on which machine, your jobs are run. Priority (both per-job and per-user) determines when a job will be run; ranking (which uses requirements and machine attributes) may be used to determine where a job is run. All machines in a Condor pool advertise their attributes, such as available RAM memory, CPU type and speed, virtual memory size, and current load average, along with other static and dynamic properties. This machine information also includes under what conditions a machine is willing to run a Condor job and what type of job it would prefer.
Likewise, when submitting a job, you can specify your requirements and preferences, for example, the type of machine you wish to use. You can also specify an attribute, for example, floating point performance, and have Condor automatically rank the available machines according to their values for this attribute. Condor plays the role of a matchmaker by continuously reading all the job requirements and all the machine information, matching and ranking jobs with machines.
4.1 Priority
4.1.1 Job Priority
Job priorities allow the assignment of a priority level to each submitted Condor job in order to control the order of execution - note that these are priorities between jobs of the same user only. To set a job priority, use the condor_prio command, or use the priority command in your submit description file. Job priorities do not impact user priorities in any fashion.
4.1.2 User Priority
The default behaviour of Condor is to allocate machines to users based upon a user's priority, which changes according to the number of resources the individual is using. It is possible to submit a job as a "nice" job. Setting nice_user in your submit description file tells Condor not to use your regular user priority, but that this job should have the least priority among all users and all jobs.
4.2 Machine Attributes
The attributes advertised by a machine can be seen with condor_status -l machine_name. Some of the listed attributes are used by Condor for scheduling; other attributes are for information purposes. An important point is that any of the attributes of a machine can be utilized at job submission time as part of a request or preference on which machine to use. Additional attributes can be easily added. For example, this is the output of condor_status -l for one processor of the machine pb001:
MyType = "Machine"
TargetType = "Job"
Name = "pb001@plg.inf.uc3m.es"
Machine = "pb001.plg.inf.uc3m.es"
Rank = 0.000000
CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.500000)
COLLECTOR_HOST_STRING = "pb001.plg.inf.uc3m.es"
CondorVersion = "$CondorVersion: 6.6.7 Oct 11 2004 $"
CondorPlatform = "$CondorPlatform: I386-LINUX_RH9 $"
VirtualMachineID = 1
VirtualMemory = 0
Disk = 467172
CondorLoadAvg = 0.000000
LoadAvg = 0.000000
KeyboardIdle = 25539262
ConsoleIdle = 25539262
Memory = 29994
Cpus = 1
StartdIpAddr = "<128.232.4.1:33071>"
Arch = "x86_64"
OpSys = "LINUX"
UidDomain = "plg.inf.uc3m.es"
FileSystemDomain = "plg.inf.uc3m.es"
Subnet = "128.232.4"
HasIOProxy = TRUE
TotalVirtualMemory = 0
TotalDisk = 934344
KFlops = 951601
Mips = 3370
LastBenchmark = 1103098732
TotalLoadAvg = 0.000000
TotalCondorLoadAvg = 0.000000
ClockMin = 678
ClockDay = 3
TotalVirtualMachines = 2
HasFileTransfer = TRUE
HasMPI = TRUE
HasJICLocalConfig = TRUE
HasJICLocalStdin = TRUE
JavaVendor = "Sun Microsystems Inc."
JavaVersion = "1.4.1_01"
JavaMFlops = 295.152039
HasJava = TRUE
HasPVM = TRUE
HasRemoteSyscalls = TRUE
HasCheckpointing = TRUE
StarterAbilityList = "HasFileTransfer,HasMPI,HasJICLocalConfig,HasJICLocalStdin,HasJava,HasPVM,HasRemoteSyscalls,HasCheckpointing"
CpuBusyTime = 0
CpuIsBusy = FALSE
State = "Unclaimed"
EnteredCurrentState = 1103041577
Activity = "Idle"
EnteredCurrentActivity = 1103098732
Start = TRUE
Requirements = START
CurrentRank = 0.000000
DaemonStartTime = 1103041099
UpdateSequenceNumber = 230
MyAddress = "<128.232.4.1:33071>"
LastHeardFrom = 1103109536
UpdatesTotal = 231
UpdatesSequenced = 230
UpdatesLost = 0
UpdatesHistory = "0x00000000000000000000000000000000"
4.3 Ranking
When considering the match between a job and a machine, rank is used to choose a match from among all machines that satisfy the job's requirements and are available to the user, after accounting for the user's priority and the machine's rank of the job. The rank expressions, simple or complex, define a numerical value that expresses preferences. The job's rank expression evaluates to one of three values: UNDEFINED, ERROR, or a floating-point value. If rank evaluates to a floating-point value, the best match will be the one with the largest positive value. If no rank is given in the submit description file, then Condor substitutes a default value of 0.0 when considering machines to match. If the job's rank of a given machine evaluates to UNDEFINED or ERROR, this same value of 0.0 is used. Therefore, the machine is still considered for a match, but has no rank above any other.
A boolean expression evaluates to the numerical value of 1.0 if true, and 0.0 if false.
Example 1: For a job that desires the machine with the most available memory:
Rank = memory

Example 2: For a job that prefers to run on Saturdays and Sundays:

Rank = ( (clockday == 0) || (clockday == 6) )

It is wise when writing a rank expression to check if the expression's evaluation will lead to the expected resulting ranking of machines. This can be accomplished using the condor_status command with the -constraint argument. This allows the user to see a list of machines that fit a constraint.
Example 1: To see which machines in the pool have kflops defined, use:
condor_status -constraint kflops

Example 2: If this is typed on a Wednesday it will show all of the machines in the pool; on any other day it will show none:
condor_status -constraint "(clockday == 3)"
5. Examples
5.1. A very simple job
Using the C program called hello.c:

#include <stdio.h>

int main(void)
{
    printf("hello, Condor\n");
    return 0;
}

The submit file, submit.hello, is:
########################
# Submit description file for hello program
########################
Executable = hello
nice_user  = True
Universe   = vanilla
Output     = hello.out
Log        = hello.log
Queue

The submit instruction is:
condor_submit submit.hello

and the output will look something like this:
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 57.

condor_q will say:
$ condor_q

-- Submitter: pb000.plg.inf.uc3m.es : <127.0.0.1:59865> : pb000.plg.inf.uc3m.es
 ID      OWNER    SUBMITTED     RUN_TIME ST PRI SIZE CMD
  57.0   ckh11    2/1  11:23   0+00:00:00 R  0   9.8  hello

1 jobs; 0 idle, 1 running, 0 held

The log file, hello.log, will show (something similar to):
000 (057.000.000) 02/01 11:23:57 Job submitted from host: <127.0.0.1:59865>
...
001 (057.000.000) 02/01 11:24:31 Job executing on host: <127.0.0.1:34755>
...
005 (057.000.000) 02/01 11:24:31 Job terminated.
        (1) Normal termination (return value 0)
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        816  -  Run Bytes Sent By Job
        1702035  -  Run Bytes Received By Job
        816  -  Total Bytes Sent By Job
        1702035  -  Total Bytes Received By Job
...

The output file, hello.out, will contain:
hello, Condor
5.2. Multiple Submission - Different Inputs
A common situation has one executable that is executed many times, each time with a different input set. This is called a job cluster. Each cluster has a "cluster ID", and within each cluster each job has a "process ID". If the program wants its input in a file with a fixed name, then the solution of choice runs each queued job in its own directory.
This particular example outputs the number of characters in an input file named mult_job_input. There are 5 different input files, so we need 5 jobs. Because the program uses a fixed name for its input file we do not need to specify an input in the submit description file. The 5 different but identically named input files are pre-staged in 5 directories before submitting the job. The directories are named job.0, job.1, job.2, job.3 and job.4. In addition to the input file, each directory will receive its own output in a file called mult_job_output, its own error messages will go into mult_job_error, and Condor will log each job's progress in the file called mult_job_log.
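The pre-staging described above could be scripted along these lines (the placeholder input text is illustrative; in practice each file would hold real data):

```shell
#!/bin/sh
# Create one directory per job and stage its identically named input file.
for i in 0 1 2 3 4; do
    mkdir -p "job.$i"
    printf 'sample input for job %s\n' "$i" > "job.$i/mult_job_input"
done
```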
The submit file, submit.mult_job, is:
####################
# Multiple jobs queued, each in its own directory
####################
nice_user  = True
universe   = vanilla
executable = mult_job
output     = mult_job_output
error      = mult_job_error
log        = mult_job_log
initialdir = job.$(Process)
queue 5

Note the initialdir line: it uses a simple macro to give a different directory name for each job to be queued.
The program source, mult_job.c, is:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *in;
    int ch, i = 0;                      /* getc() returns int, so ch must be int */
    const char *filename = "mult_job_input";

    if ((in = fopen(filename, "r")) == NULL) {
        printf("Can't open %s\n", filename);
        exit(1);
    }
    while ((ch = getc(in)) != EOF) { i++; }
    printf("i is %d\n", i);
    return 0;
}

Having set up the directories and input files, the submit instruction and output is:
$ condor_submit submit.mult_job
Submitting job(s).
Logging submit event(s).....
5 job(s) submitted to cluster 60.
5.3. Multiple Submission - Different Arguments
This example queues three jobs for execution by Condor. The first will be given command line arguments of 15 and 20, and it will write its standard output to msda.out1. The second will be given command line arguments of 30 and 20, and it will write its standard output to msda.out2. Similarly, the third will have arguments of 45 and 60, and it will use msda.out3 for its standard output.
The submit file, submit.msda, is:
####################
#
# Different command line arguments and output files.
#
####################
executable = msda
nice_user  = True
universe   = vanilla

arguments  = 15 20
output     = msda.out1
error      = msda.err1
queue

arguments  = 30 20
output     = msda.out2
error      = msda.err2
queue

arguments  = 45 60
output     = msda.out3
error      = msda.err3
queue

The source for msda is not given as it is trivial - it adds its two arguments and prints the result to stdout.
The submit instruction and output is:
condor_submit submit.msda
Submitting job(s)...
3 job(s) submitted to cluster 61.

Note that this time it does not mention logging, as we did not specify a log file.
5.4. Simple shell script
Any program can be run as a vanilla job, including shell scripts. The script "doloop" stays in a loop, prints out a number, then sleeps for a second. At the end, doloop.out should contain the values from 0 to 10 and the message "Normal End-of-Job".
The script, "doloop", is:
#!/bin/bash
x=0                       # initialize x to 0
while [ "$x" -le 10 ]; do
    echo "$x"
    # increment the value of x:
    x=$(expr $x + 1)
    sleep 1
done
echo "Normal End-of-Job"

The submit file, "submit.doloop", is:
####################
##
## Vanilla script test
##
####################
nice_user  = True
universe   = vanilla
executable = doloop
output     = doloop.out
error      = doloop.err
log        = doloop.log
arguments  = 10
queue
5.5. Matlab
The following example shows Matlab running a simple script file (also often known as an M-file). A Matlab script file is an external file that contains a sequence of Matlab statements; it can be executed interactively in Matlab simply by typing its name (without the extension) at the prompt. However, under Condor Matlab cannot be run interactively, so the script file needs to be executed from the command line by using the -r option to Matlab. It is also necessary to use the -nosplash, -nojvm and -nodesktop Matlab options to prevent unwanted windows from appearing.
Matlab will still try to open a display connection even if we don't want any windows to appear. Normally this would not be a problem, but as we run the Condor daemons as user "condor" instead of root there can be authentication issues. Thus an option such as -display yourhostname:0 or -nodisplay is also needed (the latter will result in some warning messages about broken X connections in your error file, which can be ignored). The fact that we run the Condor daemons as user "condor" instead of root can also cause file ownership problems in this particular example (see 1.2. Condor on the PLG Cluster): because we write to a file which will be owned by user "condor", we have to make the working directory world-writable.
The script file "matscripttest.m" in this example is:
load a.dat;
load b.dat;
matrR = a * b;
save matrR.dat;
exit;

Note the final exit - otherwise the script will never finish and Condor will hang. The files a.dat and b.dat must exist beforehand; the file matrR.dat will be created.
The submit file, "submit.matlab" will be
#
# Submit a matlab job
#
executable = /usr/bin/cl-matlab
arguments  = -nosplash -nojvm -nodesktop -nodisplay -r matscripttest
nice_user  = True
universe   = vanilla
getenv     = True          # MATLAB needs local environment
log        = mat.log
output     = mat.out
error      = mat.err
queue 1

Note the getenv = True - without it matlab will core dump!
Note also that the executable given is the full path name. Even if matlab is on your PATH you need to give the full pathname or condor will assume it is an executable in the current working directory, and condor_submit will report an error when it can't find it.
5.6. More matlab - a slight variant
A slight variant on the procedure in 5.5. Matlab is to create a small shell script, e.g. "matscripttest.sh", as a wrapper to Matlab:

#!/bin/sh
cl-matlab -nosplash -nojvm -nodesktop -nodisplay -r "matscripttest"
The submit file would be similar to the above, but the executable line would then be
executable = matscripttest.sh

and the arguments line would not be needed.
5.7. IPC Software
This is an example of how to use the International Planning Competition (IPC) software, submitting one job per domain of the last IPC for a given configuration.
The script, "script.sh", is:
#!/bin/bash
domains=("barman" "elevators" "floortile" "nomystery" "openstacks" "parking"
         "parcprinter" "pegsol" "scanalyzer" "sokoban" "transport" "tidybot"
         "visitall" "woodworking")
domain=${domains["$2"]}
exec_folder=exec_"$2"
mkdir "$exec_folder"
cd scripts/IPCData
./invokeplanner.py --ini ../../ipc.ini -e myaddress@mydomain.com -t seq -s opt \
    --timeout 1800 --memory 6 --directory ../../"$exec_folder" -l ./logfile \
    -D "$domain" --planner "$1"

The submit file, "exec.condor", is:
nice_user             = True
universe              = vanilla
notify_user           = myaddress@mydomain.com
notification          = Always
getenv                = TRUE
Executable            = script.sh
Output                = test.$(Cluster).$(Process).out
log                   = test.$(Cluster).$(Process).log
error                 = test.$(Cluster).$(Process).err
request_memory        = 7000
transfer_input_files  = ipc.ini, scripts
transfer_output_files = exec_$(PROCESS)
arguments             = mutex $(PROCESS)
queue 14
5.8. Urgent job
To send an urgent job, just set nice_user to "False":

####################
##
## Urgent job
##
####################
nice_user  = False
universe   = vanilla
executable = doloop
output     = doloop.out
error      = doloop.err
log        = doloop.log
arguments  = 10
queue
6. Summary of Useful Condor Commands
- condor_submit is the program for submitting jobs for execution under Condor.
- condor_q displays information about jobs in the Condor job queue. Use the -global option to see multiple machines.
- condor_status may be used to monitor and query resource information, submitter information, checkpoint server information, and daemon master information for the Condor pool.
- condor_prio changes the priority of one or more jobs in the condor queue.
- condor_userprio, with no arguments, lists the active users along with their priorities, in increasing priority order. The -all option can be used to display more detailed information.
- condor_history displays a summary of all condor jobs listed in the specified history files. If no history files are specified then the local history file as specified in Condor's configuration file is read.
- condor_rm removes one or more jobs from the Condor job queue.