Speed up nested for-loops in Python when iterating through a NumPy array
Say I have 4 NumPy arrays a, b, c, d, each of size (256,256,1792). I want to go through each element of the arrays and do something with it, but I need to do it in chunks of 256x256x256 cubes.
My code looks like this:
    for l in range(7):
        x, y, z, t = 0, 0, 0, 0
        for m in range(a.shape[0]):
            for n in range(a.shape[1]):
                for o in range(256*l, 256*(l+1)):
                    t += d[m,n,o] * constant
                    x += a[m,n,o] * d[m,n,o] * constant
                    y += b[m,n,o] * d[m,n,o] * constant
                    z += c[m,n,o] * d[m,n,o] * constant
        final = (x+y+z)/t
        dooutput(final)
The code works and outputs what I want, but it is awfully slow. I've read online that this kind of nested loop should be avoided in Python. What is the cleanest solution for it? (Right now I'm trying to write this part of the code in C and somehow import it via Cython or other tools, but I'd love a pure Python solution.)
Thanks
Add-on:
Willem Van Onsem's solution for the first part seems to work fine, and I think I comprehend it. But now I want to modify the values before summing them. It looks like this
(within the outer l loop):
    r = 0
    for m in range(a.shape[0]):
        for n in range(a.shape[1]):
            for o in range(256*l, 256*(l+1)):
                r += (d[m,n,o] * constant * (a[m,n,o]**2 + b[m,n,o]**2 + c[m,n,o]**2)/t - final**2)
    dooutput(r)
I can't just square the sum, as in x = (a[:a.shape[0],:a.shape[1],256*l:256*(l+1)]*dsub).sum()**2*constant, since (a²+b²) != (a+b)². How can I redo these last loops?
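To make the distinction concrete, here is a small sketch with made-up arrays `a` and `b` (illustrative values, not the real data): in NumPy, applying `**2` to the array before `.sum()` squares each element, while applying it after `.sum()` squares the total.

```python
import numpy as np

# Illustrative stand-ins for a and b (hypothetical values)
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

sum_of_squares = (a**2 + b**2).sum()   # square each element first, then sum
square_of_sum = (a + b).sum() ** 2     # sum first, then square the total
print(sum_of_squares, square_of_sum)   # 91.0 441.0
```

So element-wise squaring can stay inside the array expression and still be summed in one call.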
Since you update t for every element of m in range(a.shape[0]), n in range(a.shape[1]) and o in range(256*l,256*(l+1)), you can substitute:
    for m in range(a.shape[0]):
        for n in range(a.shape[1]):
            for o in range(256*l, 256*(l+1)):
                t += d[m,n,o]
with:
    t += d[:a.shape[0],:a.shape[1],256*l:256*(l+1)].sum()
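A quick way to convince yourself the substitution is exact is to run both forms on a small array. This sketch shrinks the data to a hypothetical shape (4, 5, 14) with a chunk width of 7 instead of 256, purely so the triple loop finishes instantly; the names `width`, `t_loop` and `t_vec` are made up for illustration.

```python
import numpy as np

# Small stand-in for d (hypothetical sizes; the real data is (256, 256, 1792))
rng = np.random.default_rng(0)
d = rng.random((4, 5, 14))
width = 7   # hypothetical chunk width, playing the role of 256
l = 1       # which chunk along the last axis

# Triple-loop version, as in the question
t_loop = 0.0
for m in range(d.shape[0]):
    for n in range(d.shape[1]):
        for o in range(width * l, width * (l + 1)):
            t_loop += d[m, n, o]

# Slice-and-sum version
t_vec = d[:, :, width * l:width * (l + 1)].sum()

print(np.isclose(t_loop, t_vec))  # True
```

The two totals can differ in the last few bits because the additions happen in a different order, which is why the comparison uses `np.isclose` rather than `==`.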
The same holds for the other assignments. So you can rewrite the code to:
    for l in range(7):
        dsub = d[:a.shape[0],:a.shape[1],256*l:256*(l+1)]
        x = (a[:a.shape[0],:a.shape[1],256*l:256*(l+1)]*dsub).sum()*constant
        y = (b[:a.shape[0],:a.shape[1],256*l:256*(l+1)]*dsub).sum()*constant
        z = (c[:a.shape[0],:a.shape[1],256*l:256*(l+1)]*dsub).sum()*constant
        t = dsub.sum()*constant
        final = (x+y+z)/t
        dooutput(final)
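Here is a minimal sketch that checks the rewrite against the original loops on small random data. The function names `chunk_finals_loop` and `chunk_finals_numpy`, the `width` parameter (standing in for 256) and the array sizes are hypothetical; both functions return the per-chunk values in a list instead of calling `dooutput`.

```python
import numpy as np

def chunk_finals_loop(a, b, c, d, constant, width):
    """Reference: the question's nested loops, generalized to any chunk width."""
    finals = []
    for l in range(d.shape[2] // width):
        x = y = z = t = 0.0
        for m in range(a.shape[0]):
            for n in range(a.shape[1]):
                for o in range(width * l, width * (l + 1)):
                    t += d[m, n, o] * constant
                    x += a[m, n, o] * d[m, n, o] * constant
                    y += b[m, n, o] * d[m, n, o] * constant
                    z += c[m, n, o] * d[m, n, o] * constant
        finals.append((x + y + z) / t)
    return finals

def chunk_finals_numpy(a, b, c, d, constant, width):
    """Same computation with one slice-and-sum per array per chunk."""
    m, n = a.shape[0], a.shape[1]
    finals = []
    for l in range(d.shape[2] // width):
        sl = slice(width * l, width * (l + 1))
        dsub = d[:m, :n, sl]
        x = (a[:m, :n, sl] * dsub).sum() * constant
        y = (b[:m, :n, sl] * dsub).sum() * constant
        z = (c[:m, :n, sl] * dsub).sum() * constant
        t = dsub.sum() * constant
        finals.append((x + y + z) / t)
    return finals

# Hypothetical small inputs: 3 chunks of width 4 along the last axis
rng = np.random.default_rng(1)
a, b, c, d = (rng.random((3, 4, 12)) for _ in range(4))
print(np.allclose(chunk_finals_loop(a, b, c, d, 2.5, 4),
                  chunk_finals_numpy(a, b, c, d, 2.5, 4)))  # True
```

Only the outer loop over chunks remains in Python; everything inside a chunk runs in NumPy.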
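Note that * in NumPy is element-wise multiplication, not the matrix product. You can do the multiplication by constant before the sum, but since the sum of values multiplied by a constant equals the constant multiplied by the sum, I think it is more efficient to do it once, after summing, instead of once per element.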
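Both points can be checked on a tiny example. The 2x2 arrays `p` and `q` and the scalar `k` below are made-up values: `p * q` multiplies matching elements, `p @ q` is the matrix product, and scaling before or after `.sum()` gives the same total up to floating-point rounding.

```python
import numpy as np

p = np.array([[1.0, 2.0], [3.0, 4.0]])
q = np.array([[5.0, 6.0], [7.0, 8.0]])

elementwise = p * q   # element-wise: 5, 12, 21, 32
matrix = p @ q        # matrix product: 19, 22, 43, 50
print(elementwise)
print(matrix)

k = 3.0
# Scaling every element and then summing equals scaling the sum once,
# so the scalar multiply can be moved after .sum()
print(np.isclose((p * q * k).sum(), (p * q).sum() * k))  # True
```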
If a.shape[0] is equal to d.shape[0], a.shape[1] is equal to d.shape[1], etc., you can use : instead of :a.shape[0]. Based on your question, that seems to be the case. So:
    # when a.shape[0] == d.shape[0] and a.shape[1] == d.shape[1] (likewise for b and c)
    for l in range(7):
        dsub = d[:,:,256*l:256*(l+1)]
        x = (a[:,:,256*l:256*(l+1)]*dsub).sum()*constant
        y = (b[:,:,256*l:256*(l+1)]*dsub).sum()*constant
        z = (c[:,:,256*l:256*(l+1)]*dsub).sum()*constant
        t = dsub.sum()*constant
        final = (x+y+z)/t
        dooutput(final)
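If all four arrays share the same shape, the remaining loop over l can also be removed. This is a different technique from the answer above, not part of it: reshape the last axis from 7*256 into (7, 256) so chunk l sits at index [:, :, l, :], then sum over axes (0, 1, 3) to get one total per chunk. It also uses x+y+z = sum((a+b+c)*d)*constant, which holds algebraically but may differ in the last bits. The function name `chunk_finals_reshaped`, the `width` parameter and the test sizes are hypothetical.

```python
import numpy as np

def chunk_finals_reshaped(a, b, c, d, constant, width):
    """Compute every chunk's `final` at once by splitting the last axis.

    Assumes a, b, c, d share one shape (M, N, K) with K divisible by width.
    """
    M, N, K = d.shape
    shape4 = (M, N, K // width, width)   # chunk l lives at [:, :, l, :]
    d4 = d.reshape(shape4)
    axes = (0, 1, 3)                     # sum over everything except the chunk index
    t = d4.sum(axis=axes) * constant
    xyz = ((a + b + c).reshape(shape4) * d4).sum(axis=axes) * constant
    return xyz / t                       # 1-D array, one value per chunk

# Hypothetical small inputs: 3 chunks of width 4
rng = np.random.default_rng(2)
a, b, c, d = (rng.random((3, 4, 12)) for _ in range(4))
finals = chunk_finals_reshaped(a, b, c, d, constant=2.5, width=4)
print(finals.shape)  # (3,)
```

The trade-off is memory: `a + b + c` and the product create full-size temporaries, which for (256,256,1792) float64 arrays is roughly 0.9 GB each, whereas the per-chunk loop only ever materializes one 256x256x256 slab at a time.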
Doing the .sum() at the NumPy level will boost performance, since the values do not have to be converted back and forth between NumPy and Python objects, and .sum() runs as a tight loop in compiled code.
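To see the difference on your own machine, a rough timing sketch with `timeit` follows. The array size (32, 32, 64) and the helper names `loop_sum` and `numpy_sum` are hypothetical, chosen so the pure-Python loop stays tolerable; no particular speed-up is claimed, since timings depend on hardware and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(3)
arr = rng.random((32, 32, 64))  # hypothetical small size

def loop_sum(x):
    """Pure-Python triple loop: every x[m, n, o] access builds a NumPy scalar object."""
    total = 0.0
    for m in range(x.shape[0]):
        for n in range(x.shape[1]):
            for o in range(x.shape[2]):
                total += x[m, n, o]
    return total

def numpy_sum(x):
    """One call; the summation loop runs inside NumPy's compiled code."""
    return x.sum()

# Same result (up to rounding), very different cost
assert np.isclose(loop_sum(arr), numpy_sum(arr))
print("loop :", timeit.timeit(lambda: loop_sum(arr), number=3))
print("numpy:", timeit.timeit(lambda: numpy_sum(arr), number=3))
```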
Edit:
Your updated question does not change much. You can use:
    # inside the for l loop, after t and final have been computed
    m, n, *_ = a.shape
    lo, hi = 256*l, 256*(l+1)
    r = (d[:m,:n,lo:hi]*constant*(a[:m,:n,lo:hi]**2 + b[:m,:n,lo:hi]**2 + c[:m,:n,lo:hi]**2)/t - final**2).sum()
    dooutput(r)
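A minimal sketch comparing the question's second loop with the vectorized line on small random data for one chunk. The names `spread_loop` and `spread_numpy`, the sizes and the chunk bounds are hypothetical; `t` and `final` are computed the same way as in the first part. Note that `- final**2` sits inside the summed expression, so it is subtracted once per element, exactly as in the loop.

```python
import numpy as np

def spread_loop(a, b, c, d, constant, lo, hi, t, final):
    """The question's second triple loop, for one chunk [lo, hi)."""
    r = 0.0
    for m in range(a.shape[0]):
        for n in range(a.shape[1]):
            for o in range(lo, hi):
                r += (d[m, n, o] * constant
                      * (a[m, n, o]**2 + b[m, n, o]**2 + c[m, n, o]**2) / t
                      - final**2)
    return r

def spread_numpy(a, b, c, d, constant, lo, hi, t, final):
    """Vectorized form: square element-wise, then sum once."""
    m, n, *_ = a.shape
    sq = a[:m, :n, lo:hi]**2 + b[:m, :n, lo:hi]**2 + c[:m, :n, lo:hi]**2
    return (d[:m, :n, lo:hi] * constant * sq / t - final**2).sum()

# Hypothetical small inputs and chunk bounds
rng = np.random.default_rng(4)
a, b, c, d = (rng.random((3, 4, 8)) for _ in range(4))
constant, lo, hi = 2.5, 4, 8
t = d[:, :, lo:hi].sum() * constant
final = ((a + b + c)[:, :, lo:hi] * d[:, :, lo:hi]).sum() * constant / t
print(np.isclose(spread_loop(a, b, c, d, constant, lo, hi, t, final),
                 spread_numpy(a, b, c, d, constant, lo, hi, t, final)))  # True
```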