Speed up nested for-loops in Python / going through a NumPy array
Say I have 4 numpy arrays a, b, c, d, each of size (256, 256, 1792). I want to go through each element of those arrays and do something with it, but I need to do it in chunks of 256x256x256 cubes.
My code looks like this:

for l in range(7):
    x, y, z, t = 0, 0, 0, 0
    for m in range(a.shape[0]):
        for n in range(a.shape[1]):
            for o in range(256*l, 256*(l+1)):
                t += d[m,n,o] * constant
                x += a[m,n,o] * d[m,n,o] * constant
                y += b[m,n,o] * d[m,n,o] * constant
                z += c[m,n,o] * d[m,n,o] * constant
    final = (x+y+z)/t
    dooutput(final)

The code works and outputs what I want, but it is awfully slow. I've read online that this kind of nested loop should be avoided in Python. What is the cleanest solution to it? (Right now I'm trying to do this part of my code in C and somehow import it via Cython or other tools, but I'd love a pure Python solution.)
thanks
Add-on

Willem Van Onsem's solution to the first part seems to work fine, and I think I comprehend it. But now I want to modify my values before summing them. It looks like this
(within the outer l loop):

for m in range(a.shape[0]):
    for n in range(a.shape[1]):
        for o in range(256*l, 256*(l+1)):
            r += (d[m,n,o] * constant * (a[m,n,o]**2
                  + b[m,n,o]**2 + c[m,n,o]**2)/t - final**2)
dooutput(r)

I can't just square the sum, as in x = (a[:a.shape[0],:a.shape[1],256*l:256*(l+1)]*dsub).sum()**2*constant, since (a²+b²) != (a+b)². How can I redo these last loops?
Since you update t for every element of m in range(a.shape[0]), n in range(a.shape[1]) and o in range(256*l,256*(l+1)), you can substitute:

for m in range(a.shape[0]):
    for n in range(a.shape[1]):
        for o in range(256*l, 256*(l+1)):
            t += d[m,n,o]

with:

t += d[:a.shape[0],:a.shape[1],256*l:256*(l+1)].sum()

The same holds for the other assignments. So you can rewrite your code to:

for l in range(7):
    dsub = d[:a.shape[0],:a.shape[1],256*l:256*(l+1)]
    x = (a[:a.shape[0],:a.shape[1],256*l:256*(l+1)]*dsub).sum()*constant
    y = (b[:a.shape[0],:a.shape[1],256*l:256*(l+1)]*dsub).sum()*constant
    z = (c[:a.shape[0],:a.shape[1],256*l:256*(l+1)]*dsub).sum()*constant
    t = dsub.sum()*constant
    final = (x+y+z)/t
    dooutput(final)

Note that * in numpy is element-wise multiplication, not a matrix product. You could multiply by constant before taking the sum, but since the sum of products with a constant equals that constant times the sum, I think it is more efficient to multiply once, after the summation.
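The substitution above can be checked on small arrays before running it on the full data. The following is a minimal sketch, assuming a reduced shape of (4, 4, 12) with chunks of 4 instead of 256; the function names `chunk_means_loop` and `chunk_means_sliced` and the random test data are illustrative and not part of the original code.

```python
import numpy as np

def chunk_means_loop(a, b, c, d, constant, chunk):
    """Reference version: explicit nested loops, as in the question."""
    results = []
    for l in range(a.shape[2] // chunk):
        x = y = z = t = 0.0
        for m in range(a.shape[0]):
            for n in range(a.shape[1]):
                for o in range(chunk * l, chunk * (l + 1)):
                    t += d[m, n, o] * constant
                    x += a[m, n, o] * d[m, n, o] * constant
                    y += b[m, n, o] * d[m, n, o] * constant
                    z += c[m, n, o] * d[m, n, o] * constant
        results.append((x + y + z) / t)
    return results

def chunk_means_sliced(a, b, c, d, constant, chunk):
    """Same computation, with the three inner loops replaced by slice + sum."""
    results = []
    for l in range(a.shape[2] // chunk):
        lo, hi = chunk * l, chunk * (l + 1)
        dsub = d[:, :, lo:hi]
        x = (a[:, :, lo:hi] * dsub).sum() * constant  # element-wise product
        y = (b[:, :, lo:hi] * dsub).sum() * constant
        z = (c[:, :, lo:hi] * dsub).sum() * constant
        t = dsub.sum() * constant
        results.append((x + y + z) / t)
    return results

# Small made-up test data (assumption: the real arrays are (256, 256, 1792)).
rng = np.random.default_rng(0)
shape = (4, 4, 12)
a, b, c = (rng.random(shape) for _ in range(3))
d = rng.random(shape) + 0.5  # keep the weights away from zero

loop_result = chunk_means_loop(a, b, c, d, 2.0, 4)
sliced_result = chunk_means_sliced(a, b, c, d, 2.0, 4)
print(np.allclose(loop_result, sliced_result))
```

The two versions add the terms in a different order, so they agree up to floating-point rounding rather than bit for bit, which is why the comparison uses `np.allclose`.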
If a.shape[0] is equal to d.shape[0], etc., you can use : instead of :a.shape[0]. Based on your question, that seems to be the case. So:

# when a.shape[0] == d.shape[0] and a.shape[1] == d.shape[1] (and likewise for a, b and c)
for l in range(7):
    dsub = d[:,:,256*l:256*(l+1)]
    x = (a[:,:,256*l:256*(l+1)]*dsub).sum()*constant
    y = (b[:,:,256*l:256*(l+1)]*dsub).sum()*constant
    z = (c[:,:,256*l:256*(l+1)]*dsub).sum()*constant
    t = dsub.sum()*constant
    final = (x+y+z)/t
    dooutput(final)

Processing the .sum() at the numpy level will boost performance, since values are not converted back and forth between numpy and Python objects, and .sum() runs as a tight loop in compiled code.
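As a possible further step, which is not part of the answer above, the remaining loop over l can also be removed by reshaping the last axis into (number of chunks, chunk length) and summing over every axis except the chunk index. The sketch below assumes the last axis is an exact multiple of the chunk length; `chunk_sums` and `chunk_means_reshaped` are hypothetical helper names, and the small arrays stand in for the real (256, 256, 1792) data.

```python
import numpy as np

def chunk_sums(arr, chunk):
    """Return one sum per (M, N, chunk) block along the last axis."""
    m, n, depth = arr.shape
    if depth % chunk:
        raise ValueError("last axis must be a multiple of chunk")
    # Element [i, j, l, k] of the reshaped array is arr[i, j, l*chunk + k].
    blocks = arr.reshape(m, n, depth // chunk, chunk)
    return blocks.sum(axis=(0, 1, 3))

def chunk_means_reshaped(a, b, c, d, constant, chunk):
    """Compute (x + y + z) / t for every chunk at once."""
    t = chunk_sums(d, chunk) * constant
    xyz = chunk_sums((a + b + c) * d, chunk) * constant  # x + y + z per chunk
    return xyz / t

# Small made-up test data (assumption: 3 chunks of length 4).
rng = np.random.default_rng(1)
shape = (4, 4, 12)
a, b, c = (rng.random(shape) for _ in range(3))
d = rng.random(shape) + 0.5

finals = chunk_means_reshaped(a, b, c, d, 2.0, 4)

# Cross-check against the slice-per-chunk version from the answer.
expected = []
for l in range(3):
    dsub = d[:, :, 4 * l:4 * (l + 1)]
    x = (a[:, :, 4 * l:4 * (l + 1)] * dsub).sum() * 2.0
    y = (b[:, :, 4 * l:4 * (l + 1)] * dsub).sum() * 2.0
    z = (c[:, :, 4 * l:4 * (l + 1)] * dsub).sum() * 2.0
    expected.append((x + y + z) / (dsub.sum() * 2.0))
print(np.allclose(finals, expected))
```

One trade-off: `(a + b + c) * d` builds full-size temporary arrays, whereas the per-chunk loop only creates chunk-sized ones. If the real arrays are float64, each full-size temporary would take roughly 0.9 GB, so the loop over seven chunks may well be the better choice when memory is tight.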
Edit:
Your updated question does not change much. You can use:

m, n, *_ = a.shape
lo, hi = 256*l, 256*(l+1)
r = (d[:m,:n,lo:hi]*constant*(a[:m,:n,lo:hi]**2+b[:m,:n,lo:hi]**2+c[:m,:n,lo:hi]**2)/t-final**2).sum()
dooutput(r)
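To see why this works when squaring the sum does not, note that the squares are taken element-wise before anything is summed. The sketch below compares the expression with the nested loop from the question on small arrays; the names `r_loop` and `r_vectorized`, the chunk bounds and the test data are assumptions made for illustration.

```python
import numpy as np

def r_loop(a, b, c, d, constant, t, final, lo, hi):
    """Reference version: the nested loops from the question."""
    r = 0.0
    for m in range(a.shape[0]):
        for n in range(a.shape[1]):
            for o in range(lo, hi):
                r += (d[m, n, o] * constant
                      * (a[m, n, o]**2 + b[m, n, o]**2 + c[m, n, o]**2) / t
                      - final**2)
    return r

def r_vectorized(a, b, c, d, constant, t, final, lo, hi):
    """Same computation: square element-wise first, then sum once."""
    sl = (slice(None), slice(None), slice(lo, hi))
    weighted = d[sl] * constant * (a[sl]**2 + b[sl]**2 + c[sl]**2) / t
    # final**2 is a scalar, so it is subtracted from every element.
    return (weighted - final**2).sum()

# Small made-up test data (assumption: one chunk covering o in [4, 8)).
rng = np.random.default_rng(2)
shape = (4, 4, 12)
a, b, c = (rng.random(shape) for _ in range(3))
d = rng.random(shape) + 0.5
constant, lo, hi = 2.0, 4, 8

# t and final for this chunk, computed as in the first part of the answer.
dsub = d[:, :, lo:hi]
t = dsub.sum() * constant
final = ((a[:, :, lo:hi] + b[:, :, lo:hi] + c[:, :, lo:hi]) * dsub).sum() * constant / t

r_ref = r_loop(a, b, c, d, constant, t, final, lo, hi)
r_vec = r_vectorized(a, b, c, d, constant, t, final, lo, hi)
print(np.isclose(r_ref, r_vec))
```

Because `final**2` sits inside the summed expression, it is effectively subtracted once per element of the chunk, exactly as in the loop version.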