Speed up nested for-loops in Python / going through a NumPy array

Say I have 4 NumPy arrays a, b, c, d, each of size (256, 256, 1792). I want to go through each element of those arrays and do something to it, but I need to do it in chunks of 256x256x256 cubes.

My code looks like this:

    for l in range(7):
        x, y, z, t = 0, 0, 0, 0
        for m in range(a.shape[0]):
            for n in range(a.shape[1]):
                for o in range(256*l, 256*(l+1)):
                    t += d[m,n,o] * constant
                    x += a[m,n,o] * d[m,n,o] * constant
                    y += b[m,n,o] * d[m,n,o] * constant
                    z += c[m,n,o] * d[m,n,o] * constant
        final = (x+y+z)/t
        dooutput(final)

The code works and outputs what I want, but it is awfully slow. I've read online that this kind of nested loop should be avoided in Python. What is the cleanest solution for it? (Right now I'm trying to write this part of my code in C and somehow import it via Cython or other tools, but I'd love a pure Python solution.)

Thanks

Add-on:

Willem Van Onsem's solution for the first part seems to work fine, and I think I comprehend it. But now I want to modify my values before summing them. It looks like this

(within the outer l loop):

    for m in range(a.shape[0]):
        for n in range(a.shape[1]):
            for o in range(256*l, 256*(l+1)):
                r += (d[m,n,o] * constant * (a[m,n,o]**2
                      + b[m,n,o]**2 + c[m,n,o]**2)/t - final**2)
    dooutput(r)

I can't just square the sum, x = (a[:a.shape[0],:a.shape[1],256*l:256*(l+1)]*dsub).sum()**2*constant, since (a²+b²) != (a+b)². How can I redo the last loops?
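To make the distinction in the question concrete, here is a small check with made-up arrays (the sizes and seed are arbitrary, not from the question): the loop accumulates d·a² element by element, which NumPy reproduces by squaring before calling .sum(), whereas squaring after the sum gives a different number.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((2, 3, 4))   # small stand-in for one slab of a
d = rng.random((2, 3, 4))   # small stand-in for one slab of d

# Square element-wise first, then reduce: this is what the loop computes
sum_of_squares = (d * a**2).sum()

# Squaring after summing gives (sum of d*a)**2, a different quantity
square_of_sum = (d * a).sum() ** 2

# Reference: the explicit triple loop
loop_total = 0.0
for m in range(a.shape[0]):
    for n in range(a.shape[1]):
        for o in range(a.shape[2]):
            loop_total += d[m, n, o] * a[m, n, o] ** 2

print(sum_of_squares, square_of_sum, loop_total)
```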

Since you update t for every element of m in range(a.shape[0]), n in range(a.shape[1]) and o in range(256*l,256*(l+1)), you can substitute:

    for m in range(a.shape[0]):
        for n in range(a.shape[1]):
            for o in range(256*l, 256*(l+1)):
                t += d[m,n,o]

with:

    t += d[:a.shape[0], :a.shape[1], 256*l:256*(l+1)].sum()
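A quick sanity check of that substitution on a small array. The sizes, the slab width `chunk` and the slab index are made up for illustration, and `d` is assumed to have the same shape as `a`, so plain `:` is used for the first two axes.

```python
import numpy as np

rng = np.random.default_rng(1)
chunk = 5                        # stand-in for the 256-wide slab
d = rng.random((3, 4, 2 * chunk))  # small stand-in for (256, 256, 1792)
l = 1                            # which slab along the last axis

# Triple loop, as in the question
t_loop = 0.0
for m in range(d.shape[0]):
    for n in range(d.shape[1]):
        for o in range(chunk * l, chunk * (l + 1)):
            t_loop += d[m, n, o]

# One slice plus one .sum(): the looping happens inside NumPy
t_vec = d[:, :, chunk * l:chunk * (l + 1)].sum()

print(t_loop, t_vec)
```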

The same holds for the other assignments. So you can rewrite your code to:

    for l in range(7):
        dsub = d[:a.shape[0], :a.shape[1], 256*l:256*(l+1)]
        x = (a[:a.shape[0], :a.shape[1], 256*l:256*(l+1)]*dsub).sum()*constant
        y = (b[:a.shape[0], :a.shape[1], 256*l:256*(l+1)]*dsub).sum()*constant
        z = (c[:a.shape[0], :a.shape[1], 256*l:256*(l+1)]*dsub).sum()*constant
        t = dsub.sum()*constant
        final = (x+y+z)/t
        dooutput(final)

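Note that * in NumPy is element-wise multiplication, not the matrix product. You could do the multiplication by constant before the sum, but since the sum of values multiplied by a constant equals the constant multiplied by the sum, I think it is more efficient to apply it once, after the sum (one scalar multiplication instead of one per element).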
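A small self-contained check that the slice-and-sum rewrite returns the same values as the original triple loop. The array sizes, the seed, the slab width `chunk` and the functions `loop_version`/`vectorized_version` are made up for illustration, and `dooutput` is replaced by collecting results in a list. The last lines also confirm that applying `constant` per element or once after the sum agree up to rounding.

```python
import numpy as np

def loop_version(a, b, c, d, constant, chunk, n_chunks):
    """Original triple loop; returns one `final` value per slab."""
    results = []
    for l in range(n_chunks):
        x = y = z = t = 0.0
        for m in range(a.shape[0]):
            for n in range(a.shape[1]):
                for o in range(chunk * l, chunk * (l + 1)):
                    t += d[m, n, o] * constant
                    x += a[m, n, o] * d[m, n, o] * constant
                    y += b[m, n, o] * d[m, n, o] * constant
                    z += c[m, n, o] * d[m, n, o] * constant
        results.append((x + y + z) / t)
    return results

def vectorized_version(a, b, c, d, constant, chunk, n_chunks):
    """Slice-and-sum rewrite from the answer; should give the same outputs."""
    results = []
    for l in range(n_chunks):
        s = slice(chunk * l, chunk * (l + 1))
        dsub = d[:a.shape[0], :a.shape[1], s]
        x = (a[:a.shape[0], :a.shape[1], s] * dsub).sum() * constant
        y = (b[:a.shape[0], :a.shape[1], s] * dsub).sum() * constant
        z = (c[:a.shape[0], :a.shape[1], s] * dsub).sum() * constant
        t = dsub.sum() * constant
        results.append((x + y + z) / t)
    return results

rng = np.random.default_rng(2)
shape = (3, 4, 7 * 5)            # small stand-in: 7 slabs of width 5
a, b, c, d = (rng.random(shape) for _ in range(4))
constant = 0.5

expected = loop_version(a, b, c, d, constant, chunk=5, n_chunks=7)
got = vectorized_version(a, b, c, d, constant, chunk=5, n_chunks=7)
print(np.allclose(expected, got))

# Multiplying by `constant` per element vs. once after the sum
dsub0 = d[:, :, 0:5]
per_element = (a[:, :, 0:5] * dsub0 * constant).sum()
once = (a[:, :, 0:5] * dsub0).sum() * constant
print(np.isclose(per_element, once))
```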

If a.shape[0] is equal to d.shape[0], etc., you can use : instead of :a.shape[0]. Based on your question, that seems to be the case. So:

    # when `a.shape[0] == d.shape[0], a.shape[1] == d.shape[1]` (and likewise for a, b, c)
    for l in range(7):
        dsub = d[:, :, 256*l:256*(l+1)]
        x = (a[:, :, 256*l:256*(l+1)]*dsub).sum()*constant
        y = (b[:, :, 256*l:256*(l+1)]*dsub).sum()*constant
        z = (c[:, :, 256*l:256*(l+1)]*dsub).sum()*constant
        t = dsub.sum()*constant
        final = (x+y+z)/t
        dooutput(final)
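The shape condition matters because `:` selects the full extent of d, not of a. A small sketch with made-up shapes: when the shapes agree, both slicings give the same block; if d were larger, `:` would select more rows and columns than a has, and the element-wise product would no longer broadcast.

```python
import numpy as np

a = np.zeros((2, 3, 8))
d = np.ones((2, 3, 8))       # same shape as a, as the answer assumes
lo, hi = 4, 8                # one slab of width 4 (stand-in for 256)

explicit = d[:a.shape[0], :a.shape[1], lo:hi]
full = d[:, :, lo:hi]
print(explicit.shape, full.shape)      # (2, 3, 4) (2, 3, 4)
print(np.array_equal(explicit, full))  # True

# If d were larger than a, ':' would select more rows/columns than a has
d_big = np.ones((5, 6, 8))
print(d_big[:a.shape[0], :a.shape[1], lo:hi].shape)  # (2, 3, 4)
print(d_big[:, :, lo:hi].shape)                       # (5, 6, 4)
# a[:, :, lo:hi] * d_big[:, :, lo:hi] would raise a broadcasting ValueError
```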

Doing the .sum() at the NumPy level boosts performance, since the values are not converted back and forth between NumPy and Python objects, and .sum() runs a tight loop in compiled code.
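Going one step further than the answer (this is not part of it): because 1792 = 7 × 256, the last axis can be reshaped into (7, 256) so that all seven slabs are reduced in a single call, removing the remaining Python loop over l. The sketch below uses small stand-in sizes; `finals_by_reshape`, `chunk` and `n_chunks` are names introduced here for illustration. It also relies on x + y + z being the sum of (a + b + c) * d times `constant`, and on `constant` cancelling in (x+y+z)/t (assuming it is nonzero).

```python
import numpy as np

def finals_by_reshape(a, b, c, d, chunk, n_chunks):
    """Return all n_chunks `final` values at once, without a Python loop over l."""
    m, n, depth = d.shape
    if depth != chunk * n_chunks:
        raise ValueError("last axis must equal chunk * n_chunks")
    # Split the last axis: index o = chunk*l + k becomes [l, k] (C order)
    new_shape = (m, n, n_chunks, chunk)
    d4 = d.reshape(new_shape)
    weighted = (a + b + c).reshape(new_shape) * d4
    # Sum over everything except the slab axis
    num = weighted.sum(axis=(0, 1, 3))
    den = d4.sum(axis=(0, 1, 3))
    return num / den   # `constant` cancels in (x+y+z)/t

rng = np.random.default_rng(4)
chunk, n_chunks = 5, 7
shape = (3, 4, chunk * n_chunks)
a, b, c, d = (rng.random(shape) for _ in range(4))
constant = 0.5

# Per-slab version from the answer, for comparison
per_slab = []
for l in range(n_chunks):
    s = slice(chunk * l, chunk * (l + 1))
    dsub = d[:, :, s]
    x = (a[:, :, s] * dsub).sum() * constant
    y = (b[:, :, s] * dsub).sum() * constant
    z = (c[:, :, s] * dsub).sum() * constant
    t = dsub.sum() * constant
    per_slab.append((x + y + z) / t)

all_at_once = finals_by_reshape(a, b, c, d, chunk, n_chunks)
print(np.allclose(per_slab, all_at_once))
```

One trade-off: at the real size (256 × 256 × 1792 ≈ 117 million elements), each full-size float64 temporary such as a + b + c takes roughly 0.9 GB, so the per-slab loop may be the better choice when memory is tight.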

Edit:

Your updated question does not change much. You can use:

    # inside the outer l loop, after t and final have been computed
    m, n, *_ = a.shape
    lo, hi = 256*l, 256*(l+1)
    r = (d[:m, :n, lo:hi]*constant*(a[:m, :n, lo:hi]**2 + b[:m, :n, lo:hi]**2
         + c[:m, :n, lo:hi]**2)/t - final**2).sum()
    dooutput(r)
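A small check that the vectorized r matches the loop from the updated question. The sizes, seed, slab width `chunk` and slab index are made up for illustration; t and final are computed as in the first part of the answer. Because the loop subtracts final**2 once per element, the vectorized form subtracts it element-wise before the sum as well.

```python
import numpy as np

rng = np.random.default_rng(5)
chunk, l = 5, 2
a, b, c, d = (rng.random((3, 4, 7 * chunk)) for _ in range(4))
constant = 0.5

lo, hi = chunk * l, chunk * (l + 1)
m, n, *_ = a.shape

# t and final from the first part of the answer, for slab l
dsub = d[:m, :n, lo:hi]
t = dsub.sum() * constant
final = ((a[:m, :n, lo:hi] + b[:m, :n, lo:hi] + c[:m, :n, lo:hi]) * dsub).sum() * constant / t

# Loop from the updated question (i, j used so m, n are not overwritten)
r_loop = 0.0
for i in range(m):
    for j in range(n):
        for o in range(lo, hi):
            r_loop += (d[i, j, o] * constant * (a[i, j, o]**2
                       + b[i, j, o]**2 + c[i, j, o]**2) / t - final**2)

# Vectorized: square element-wise, subtract final**2 per element, then sum
r_vec = (dsub * constant * (a[:m, :n, lo:hi]**2 + b[:m, :n, lo:hi]**2
                            + c[:m, :n, lo:hi]**2) / t - final**2).sum()

print(np.isclose(r_loop, r_vec))
```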
