Starting Flux and getting an allocation
Regardless of what resource management software a cluster runs, the first step in a multi-user environment is to get an allocation of hardware resources. In Flux, your allocation takes the form of a Flux instance in which you can use Flux commands to manage your workload on those resources. This section tells you where to find Flux and how to start it in an allocation, even if Flux is not the primary resource manager on the cluster you are running on.
Flux is included in the TOSS operating system on LC systems, so it should be available in your standard PATH. You can check on this with:
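A quick way to verify this, assuming a POSIX shell, is:

```shell
# Check whether flux is on your PATH and, if so, which version you have.
# Falls back to a message instead of failing when Flux is not installed.
command -v flux && flux --version || echo "flux not found in PATH"
```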
LC clusters running TOSS 4 track Flux releases closely, but on TOSS 3 or non-LC clusters you may want a newer version. You can install a local build of Flux with Spack or build it from source; see Appendix I for more details on those options.
If you’re on an LC cluster such as corona or tioga, where Flux is running as the system-level scheduler, you can skip this step: just use the flux mini alloc command to get an interactive allocation, or any of the batch commands described in Section 3.
If you are on a cluster that is running another resource manager, such as Slurm or LSF, you can still use Flux to run your workload. First get an allocation using the native resource manager's commands (e.g., salloc), then start Flux on all of the nodes in that allocation with the flux start command. This starts flux-broker processes on every node; the brokers gather information about the available hardware resources and communicate with each other to assign your workload to those resources. On a cluster running Slurm, this looks like:
[day36@rzalastor2:~]$ salloc -N2 --exclusive
salloc: Granted job allocation 234174
sh-4.2$ srun -N2 -n2 --pty --mpibind=off flux start
sh-4.2$ flux mini run -n 2 hostname
rzalastor6
The --mpibind=off flag affects an LC-specific plugin, and should not be used on non-LC clusters.
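The same pattern works non-interactively. Below is a sketch of a Slurm batch script that starts Flux on the allocation and runs the rest of the work inside it; the script name flux_workload.sh and the job sizes are hypothetical, and the --mpibind=off flag should be dropped on non-LC clusters:

```shell
#!/bin/sh
#SBATCH -N 2
#SBATCH --exclusive

# Start one flux-broker per node; flux_workload.sh (hypothetical) then runs
# under the new Flux instance and can use flux commands to launch work.
srun -N2 -n2 --mpibind=off flux start ./flux_workload.sh
```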
Showing the resources in your Flux instance
When started as a job in another resource manager, Flux uses hwloc to build an internal model of the hardware available in a Flux instance. You can see a view of what resources are allocated and available with flux resource list. For example, in the Flux instance started in the previous section, we have two nodes with 20 cores each:
sh-4.2$ flux resource list
     STATE NNODES NCORES NGPUS NODELIST
      free      2     40     0 rzalastor[5-6]
 allocated      0      0     0
      down      0      0     0
sh-4.2$ flux resource list -v
     STATE NNODES NCORES NGPUS LIST
      free      2     40     0 rank[0-1]/core[0-19]
 allocated      0      0     0
      down      0      0     0
Most Flux commands will give you a brief summary of their options if you add a --help flag. For more detailed help, you can easily access the man page for a given command with flux help COMMAND or man flux-COMMAND.