Using Python multiprocessing in Slurm, and which combination of ntasks or ncpus do I need?


I'm trying to run a Python script on a Slurm cluster, and I'm using Python's built-in multiprocessing module.

I'm using quite a simple setup for testing purposes; the example is:

    len(arg_list)
    Out[2]: 5

    threads = multiprocessing.Pool(5)
    output = threads.map(func, arg_list)

So func is applied 5 times in parallel to the 5 arguments in arg_list. I want to know how to allocate the correct number of CPUs/tasks in Slurm for this to work as expected. The relevant part of my Slurm batch script looks like:

    #!/bin/bash

    # runtime and memory
    #SBATCH --time=90:00:00
    #SBATCH --mem-per-cpu=2G

    # parallel jobs
    #SBATCH --cpus-per-task=10
    ##SBATCH --nodes=2
    #SBATCH --ntasks=1
    ##SBATCH --ntasks-per-node=4

    #### shell commands below this line ####

    srun ./script_wrapper.py 'test'

As you can see, at the moment I have ntasks=1 and cpus-per-task=10. Note that the main bulk of func contains a scipy routine which tends to run on two cores (i.e. it uses 200% CPU, which is why I want 10 CPUs and not 5).

Is this the correct way to allocate resources for my purposes? At the moment the job takes a lot longer than expected (more like it's running in a single thread).

Do I need to set ntasks=5 instead? My impression from the online documentation is that with ntasks=5, srun ./script_wrapper.py 'test' would instead be called 5 times, which is not what I want. Am I right in that assumption?

Also, is there a way to check things like the CPU usage and process IDs of the Python tasks launched by multiprocessing.Pool? At the moment I'm trying sacct -u <user> --format=jobid,jobname,maxrss,elapsed,avecpu, but the AveCPU and MaxRSS fields come up empty for some reason (?), and while I see the first script as a process, I don't see the 5 others that should be called by multiprocessing. Example:

           JobID    JobName     MaxRSS    Elapsed     AveCPU
    ------------ ---------- ---------- ---------- ----------
    16260892             gp              00:13:07
    16260892.0   script_wr+              00:13:07

Your Slurm task allocation looks correct to me. Python's multiprocessing will only run on a single machine, and it looks to me like you're allocating the 10 CPUs on one node correctly. What might be causing the problem is that multiprocessing's Pool.map by default works on "chunks" of the input list rather than one element at a time. It does this to minimise overhead when the tasks are short. To force multiprocessing to work on one element of the list at a time, set the chunksize parameter of map to 1, e.g.

    threads.map(func, arg_list, 1)

See the multiprocessing documentation for more information.
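If it helps, here is a minimal, self-contained sketch of that pattern (the worker function and argument list are placeholders, not your actual code):

    import multiprocessing

    def func(x):
        # stand-in for the real work (e.g. the scipy routine)
        return x * x

    if __name__ == "__main__":
        arg_list = [1, 2, 3, 4, 5]
        pool = multiprocessing.Pool(5)
        # chunksize=1 hands one element of arg_list to each worker at a time
        output = pool.map(func, arg_list, chunksize=1)
        pool.close()
        pool.join()
        print(output)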

Because you're using a multithreaded build of scipy, you may also want to check the relevant threading level for the underlying library. For instance, if your scipy has been built against the Intel Math Kernel Library, try setting the OMP_NUM_THREADS and MKL_NUM_THREADS environment variables to make sure it's using no more than 2 threads per process, so you make full use (and not over-use) of your allocated Slurm resources.
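One way to do that, sketched here under the assumption that you set the limits from inside the Python script rather than with export lines in the batch script (either works), is:

    import os

    # Must be set before numpy/scipy are imported, otherwise the BLAS/MKL
    # thread pools may already have been initialised with their defaults.
    os.environ["OMP_NUM_THREADS"] = "2"
    os.environ["MKL_NUM_THREADS"] = "2"

    import scipy  # imported only after the thread limits are in place

With 5 workers at 2 threads each you then stay within the 10 CPUs requested via --cpus-per-task=10.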

edit: sacct going give running times processes launched directly srun, , not subprocesses. hence in case you'll have 1 process single srun command. monitor subprocesses may have monitoring tools operate @ system level rather through slurm.
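If you only want to see which worker handles which task, a hypothetical workaround (not something Slurm provides) is to log the process ID from inside the worker function and then watch those PIDs with system tools such as top or ps on the compute node:

    import os
    import multiprocessing

    def func(x):
        # report which worker process picked up this argument, so the PID
        # can be matched against top/ps output on the node
        name = multiprocessing.current_process().name
        print(f"{name} (pid {os.getpid()}) processing {x}")
        return x * x

    if __name__ == "__main__":
        with multiprocessing.Pool(5) as pool:
            print(pool.map(func, [1, 2, 3, 4, 5], chunksize=1))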

