Using Python multiprocessing in Slurm, and which combination of ntasks or ncpus I need
I'm trying to run a Python script on a Slurm cluster, and I'm using Python's built-in multiprocessing module.

I'm using quite a simple setup; for testing purposes, the example is:
    len(arg_list)
    Out[2]: 5

    threads = multiprocessing.Pool(5)
    output = threads.map(func, arg_list)
So func gets applied 5 times in parallel on the 5 arguments in arg_list. What I want to know is how to allocate the correct number of CPUs/tasks in Slurm for this to work as expected. The relevant part of my Slurm batch script looks like this:
    #!/bin/bash

    # runtime and memory
    #SBATCH --time=90:00:00
    #SBATCH --mem-per-cpu=2G

    # parallel jobs
    #SBATCH --cpus-per-task=10
    ##SBATCH --nodes=2
    #SBATCH --ntasks=1
    ##SBATCH --ntasks-per-node=4

    #### shell commands below this line ####

    srun ./script_wrapper.py 'test'
As you can see, at the moment I have ntasks=1 and cpus-per-task=10. Note that the main bulk of func contains a SciPy routine which tends to run on two cores (i.e. it uses 200% CPU, which is why I want 10 CPUs and not 5).

Is this the correct way to allocate resources for my purposes? At the moment the job takes a lot longer than expected (it looks more like it's running in a single thread).
Do I need to set ntasks=5 instead? My impression from the online documentation is that ntasks=5 would instead call srun ./script_wrapper.py 'test' five times, which is not what I want. Am I right in that assumption?
Also, is there a way to check things like the CPU usage and process IDs of the Python tasks spawned by multiprocessing.Pool? At the moment I'm trying sacct -u <user> --format=JobID,JobName,MaxRSS,Elapsed,AveCPU, but the AveCPU and MaxRSS fields come back empty for some reason (?), and while I see the first script process, I don't see the 5 others that should be spawned by multiprocessing. Example:
           JobID    JobName     MaxRSS    Elapsed     AveCPU
    ------------ ---------- ---------- ---------- ----------
    16260892             gp              00:13:07
    16260892.0   script_wr+              00:13:07
Your Slurm task allocation looks correct to me. Python's multiprocessing will only run on a single machine, and it looks to me like you're allocating your 10 CPUs on one node correctly. What might be causing the problem is that multiprocessing's Pool.map by default works on "chunks" of the input list rather than one element at a time, which it does to minimise overhead when tasks are short. To force multiprocessing to work on one element of the list at a time, set the chunksize parameter of map to 1, e.g.

    threads.map(func, arg_list, 1)

See the multiprocessing documentation for more information.
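Putting the pieces together, here is a minimal sketch of what a script_wrapper.py along these lines could look like with an explicit chunksize; func and arg_list are placeholders standing in for whatever your real script computes:

    #!/usr/bin/env python
    import multiprocessing

    def func(x):
        # placeholder for the real SciPy-heavy work
        return x * x

    if __name__ == '__main__':
        arg_list = [1, 2, 3, 4, 5]
        threads = multiprocessing.Pool(5)
        # chunksize=1 hands each worker one element at a time instead of
        # letting map() batch the long-running jobs into chunks
        output = threads.map(func, arg_list, chunksize=1)
        print(output)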
Also, because you're using a multithreaded build of SciPy, you may want to check the relevant threading level for the underlying library. For instance, if your SciPy has been built against the Intel Math Kernel Library, try setting the OMP_NUM_THREADS and MKL_NUM_THREADS environment variables to make sure each process uses no more than 2 threads, so you make full use of (and don't over-subscribe) your allocated Slurm resources.
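As a sketch of one way to do that, you can pin the variables from inside the Python script before numpy/scipy are imported; exporting the same variables in the sbatch script should work just as well, and which variables actually matter depends on how your SciPy was built:

    import os

    # set before numpy/scipy initialise their threading backends
    os.environ.setdefault('OMP_NUM_THREADS', '2')
    os.environ.setdefault('MKL_NUM_THREADS', '2')

    import scipy  # imported only after the environment variables are set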
Edit: sacct is only going to give you running times for processes launched directly by srun, not for any subprocesses. Hence in your case you'll only have the one process from the single srun command. To monitor the subprocesses you may have to look into monitoring tools that operate at the system level rather than through Slurm.
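If you just want to confirm that the five workers really are running, one low-tech option is to have each worker report its own process ID and then match those PIDs against top or ps on the compute node while the job runs. A minimal sketch, with func again standing in for the real work:

    import os

    def func(x):
        # each Pool worker is a separate process, so its PID identifies it
        print('worker PID %d handling %r' % (os.getpid(), x))
        return x * x  # placeholder for the real SciPy routine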