In the previous section, we learned how to find Flux, get an allocation, and query the compute resources in that allocation. Now we are ready to launch work on those resources. In Flux, work takes the form of jobs, which can be either blocking or non-blocking: blocking jobs run to completion before control is returned to the terminal, whereas non-blocking jobs are enqueued, allowing you to immediately submit more work in the allocation.
Before we get into submitting and managing Flux jobs, we should also discuss Flux’s jobids, as they’re a bit different from what you’ll find in other resource management software. Remember that an allocation in Flux is a fully featured Flux instance. Rather than create sequential numeric ids within each instance, Flux combines the submit time, an instance id, and a sequence number to create identifiers that are unique for each job in a Flux instance. This avoids requiring a central allocator for jobids, which improves the scalability of job submission within instances. These identifiers can be displayed in a number of ways, but the default is an 8-character string prefixed with an f, e.g. fBsFXaow5 for the job submitted in the example below. For more details on Flux’s identifiers, see the FLUID documentation.
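If you need a jobid in another form, the flux job id command can convert between representations (the exact set of supported forms may vary with your Flux version). A minimal sketch:

# convert the default f58 form to other representations
flux job id --to=dec fBsFXaow5     # decimal integer form
flux job id --to=words fBsFXaow5   # mnemonic word form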
flux run
If you want your work to block until it completes, the flux run command will submit a job and then wait until it is complete before returning. For example, in a two-node allocation, we can launch an MPI program with 4 tasks:
sh-4.2$ flux run -n4 ./mpi_hellosleep
task 2 on rzalastor6 going to sleep
MASTER: Number of MPI tasks is: 4
task 0 on rzalastor5 going to sleep
task 3 on rzalastor6 going to sleep
task 1 on rzalastor5 going to sleep
task 2 on rzalastor6 woke up
task 0 on rzalastor5 woke up
task 3 on rzalastor6 woke up
task 1 on rzalastor5 woke up
sh-4.2$
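Because flux run blocks until the job finishes, it also propagates the job’s exit status back to the shell, so it can gate later steps in a script. A minimal sketch (the status check is illustrative, not part of the session above):

flux run -n4 ./mpi_hellosleep
echo "exit status: $?"   # nonzero if any task failed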
flux submit
If you just want to queue up work in a Flux instance, the flux submit command will submit the job and return immediately. As in the example above, here we will submit a 4-task MPI program in our two-node allocation:
sh-4.2$ flux submit -n4 --output=job_{{id}}.out ./mpi_hellosleep
fBsFXaow5
sh-4.2$ tail -f job_fBsFXaow5.out
MASTER: Number of MPI tasks is: 4
task 0 on rzalastor5 going to sleep
task 1 on rzalastor5 going to sleep
task 2 on rzalastor6 going to sleep
task 3 on rzalastor6 going to sleep
task 0 on rzalastor5 woke up
task 1 on rzalastor5 woke up
task 2 on rzalastor6 woke up
task 3 on rzalastor6 woke up
^C
sh-4.2$
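Since flux submit prints the new jobid on stdout, a common pattern is to capture it in a shell variable for later commands; in recent Flux versions, flux job last also returns the id of the most recently submitted job. A sketch, not part of the session above:

# capture the jobid at submission time
JOBID=$(flux submit -n4 ./mpi_hellosleep)
# or look up the most recently submitted job and stream its output
flux job attach $(flux job last)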
--dependency=
If you want to submit a Flux job that won’t start until another job has completed or reached some other state, you can add a --dependency= flag to flux run or any other job submission command. Flux currently supports five dependency conditions:
--dependency=after:JOBID - job will not start until JOBID has started
--dependency=afterany:JOBID - job will not start until JOBID has completed
--dependency=afterok:JOBID - job will not start until JOBID has completed successfully
--dependency=afternotok:JOBID - job will not start until JOBID has completed unsuccessfully
--dependency=begin-time:TIMESTAMP - job will not start until TIMESTAMP
These dependency conditions can be used with flux run, flux submit, and the flux batch and flux alloc commands described in Section 3.
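For example, the sketch below (with hypothetical preprocess and postprocess programs) chains two jobs so the second starts only if the first completes successfully:

# submit the first job and capture its jobid
JOBID=$(flux submit -n4 ./preprocess)
# the second job waits for the first to complete successfully
flux submit --dependency=afterok:$JOBID -n4 ./postprocess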
flux jobs and flux job
If you have multiple Flux jobs running and queued, you can list those jobs with the flux jobs command and manage them with flux job. For example, in a two-node instance with 20 cores per node, we can see the states of the jobs that we’ve submitted:
sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep
f7AC3114K
sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep
f7As1t7pB
sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep
f7BK65VFu
sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep
f7Bgjnzwy
sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep
f7WUMj9qH
sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep
f7WsybiDu
sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep
f7XUqnpcF
sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep
f7Y14ABhq
sh-4.2$ flux jobs
JOBID USER NAME ST NTASKS NNODES RUNTIME NODELIST
f7WUMj9qH day36 mpi_hellos PD 10 - - -
f7WsybiDu day36 mpi_hellos PD 10 - - -
f7XUqnpcF day36 mpi_hellos PD 10 - - -
f7Y14ABhq day36 mpi_hellos PD 10 - - -
f7Bgjnzwy day36 mpi_hellos R 10 1 48.41s rzalastor5
f7BK65VFu day36 mpi_hellos R 10 1 49.25s rzalastor5
f7As1t7pB day36 mpi_hellos R 10 1 50.27s rzalastor4
f7AC3114K day36 mpi_hellos R 10 1 51.8s rzalastor4
sh-4.2$
We can also see any output from a given job with flux job attach and kill a given job with flux job cancel:
sh-4.2$ flux job attach f7XUqnpcF
task 9 on rzalastor5 going to sleep
task 8 on rzalastor5 going to sleep
task 7 on rzalastor5 going to sleep
task 6 on rzalastor5 going to sleep
task 5 on rzalastor5 going to sleep
task 4 on rzalastor5 going to sleep
task 3 on rzalastor5 going to sleep
task 2 on rzalastor5 going to sleep
task 1 on rzalastor5 going to sleep
MASTER: Number of MPI tasks is: 10
task 0 on rzalastor5 going to sleep
^Z
[1]+ Stopped(SIGTSTP) flux job attach f7XUqnpcF
sh-4.2$ flux jobs
JOBID USER NAME ST NTASKS NNODES RUNTIME NODELIST
f7Y14ABhq day36 mpi_hellos R 10 1 47.18s rzalastor5
f7XUqnpcF day36 mpi_hellos R 10 1 47.99s rzalastor5
f7WsybiDu day36 mpi_hellos R 10 1 49.05s rzalastor4
f7WUMj9qH day36 mpi_hellos R 10 1 50.44s rzalastor4
sh-4.2$ flux job cancel f7XUqnpcF
sh-4.2$ flux jobs
JOBID USER NAME ST NTASKS NNODES RUNTIME NODELIST
f7Y14ABhq day36 mpi_hellos R 10 1 59.17s rzalastor5
f7WsybiDu day36 mpi_hellos R 10 1 1.017m rzalastor4
f7WUMj9qH day36 mpi_hellos R 10 1 1.04m rzalastor4
sh-4.2$
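By default, flux jobs lists only pending and running jobs; to review jobs that have already finished, including the one we just canceled, the -a option includes inactive jobs as well:

# include inactive (completed, failed, canceled) jobs in the listing
flux jobs -a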