gpgpu - Android RenderScript never runs on the GPU


Exactly as the title says.

I have a parallelized image creation/processing algorithm that I'd like to use. It is a kind of Perlin noise implementation.

// Logging is never used here.
#pragma version(1)
#pragma rs java_package_name(my.package.name)
#pragma rs_fp_full

float sizex, sizey;
float ratio;

static float fbm(float2 coord)
{
    ...
}

uchar4 RS_KERNEL root(uint32_t x, uint32_t y)
{
    float u = x / sizex * ratio;
    float v = y / sizey;

    float2 p = {u, v};

    float res = fbm(p) * 2.0f;   // RS: 8245 ms, FS: 8307 ms; FS: 9842 ms on a tablet

    float4 color = {res, res, res, 1.0f};
    //float4 color = {p.x, p.y, 0.0f, 1.0f};  // RS: 96 ms

    return rsPackColorTo8888(color);
}
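For reference, this is the per-cell math the kernel performs, sketched in plain desktop Java (the class and method names are mine, and fbm() is left out); it only shows what gets computed per pixel, it is not the Android implementation:

```java
// Plain-Java sketch of the kernel's per-pixel work: normalize the cell
// coordinate, then clamp-and-scale a color channel to 8 bits the way
// rsPackColorTo8888() does. fbm() is omitted.
public class NoiseReference {
    static final float SIZE_X = 1536f;            // matches sizex in the script
    static final float SIZE_Y = 2048f;            // matches sizey
    static final float RATIO  = SIZE_X / SIZE_Y;  // matches ratio

    // u = x / sizex * ratio; v = y / sizey — the same math as in root().
    static float[] normalizedCoord(int x, int y) {
        return new float[] { x / SIZE_X * RATIO, y / SIZE_Y };
    }

    // One channel of the 8888 packing: clamp to [0, 1], scale to 0..255.
    static int packChannel(float c) {
        float clamped = Math.max(0f, Math.min(1f, c));
        return Math.round(clamped * 255f);
    }

    public static void main(String[] args) {
        float[] uv = normalizedCoord(768, 1024);
        System.out.println("u=" + uv[0] + " v=" + uv[1]
                + " packed=" + packChannel(uv[0]));
    }
}
```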

As a comparison: the exact same algorithm runs at a minimum of 30 fps when implemented on the GPU via a fragment shader on a textured quad.

The overhead of running the RenderScript should be at most 100 ms, which I calculated by making a simple bitmap that just returns the x and y normalized coordinates.
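The numbers above come from simple wall-clock timing around the call; a minimal sketch of such a harness in plain Java (the helper name is mine):

```java
// Minimal wall-clock timing helper, of the kind used to get the
// millisecond figures quoted above.
public class Bench {
    // Runs the work once and returns the elapsed time in milliseconds.
    static long timeMs(Runnable work) {
        long t0 = System.nanoTime();
        work.run();
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) {
        long ms = timeMs(() -> {
            double acc = 0;
            for (int i = 0; i < 1_000_000; i++) acc += Math.sqrt(i);
            if (acc < 0) System.out.println(acc); // keep the loop live
        });
        System.out.println("took " + ms + " ms");
    }
}
```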

Which means that if it were using the GPU, it surely would not take 10 seconds.

The Java code I use to run the RenderScript:

// The non-support version gives at least a 25% performance boost.
import android.content.Context;
import android.graphics.Bitmap;
import android.renderscript.Allocation;
import android.renderscript.RenderScript;

public class RSNoise {

    private RenderScript renderScript;
    private ScriptC_noise noiseScript;

    private Allocation allOut;

    private Bitmap outBitmap;

    final int sizeX = 1536;
    final int sizeY = 2048;

    public RSNoise(Context context) {
        renderScript = RenderScript.create(context);

        outBitmap = Bitmap.createBitmap(sizeX, sizeY, Bitmap.Config.ARGB_8888);
        allOut = Allocation.createFromBitmap(renderScript, outBitmap,
                Allocation.MipmapControl.MIPMAP_NONE,
                Allocation.USAGE_GRAPHICS_TEXTURE);

        noiseScript = new ScriptC_noise(renderScript);
    }

    // The render() function is what I benchmark.
    public Bitmap render() {
        noiseScript.set_sizex((float) sizeX);
        noiseScript.set_sizey((float) sizeY);
        noiseScript.set_ratio((float) sizeX / (float) sizeY);

        noiseScript.forEach_root(allOut);

        allOut.copyTo(outBitmap);

        return outBitmap;
    }
}

If I change it to FilterScript, following this answer (https://stackoverflow.com/a/14942723/4420543), it gets several hundred milliseconds worse with the support library, and twice as slow with the non-support one. Changing the precision pragma did not influence the results.

I have checked every related question on Stack Overflow, but most of them are outdated. I have tried a Nexus 5 (OS version 7.1.1), among several other new devices, and the problem still remains.

So, when does RenderScript run on the GPU? It would be enough if someone could give me an example of a RenderScript that does run on the GPU.

Can you try running it with rs_fp_relaxed instead of rs_fp_full?

#pragma rs_fp_relaxed

rs_fp_full will force the script to run on the CPU, since most GPUs don't support full-precision floating-point operations.

