Android RenderScript never runs on the GPU
Exactly as the title says.
I have a parallelized image creation/processing algorithm that I'd like to use. It's a kind of Perlin noise implementation.
// Logging is never used here
#pragma version(1)
#pragma rs java_package_name(my.package.name)
#pragma rs_fp_full

float sizeX, sizeY;
float ratio;

static float fbm(float2 coord)
{ ... }

uchar4 RS_KERNEL root(uint32_t x, uint32_t y)
{
    float u = x / sizeX * ratio;
    float v = y / sizeY;
    float2 p = {u, v};
    float res = fbm(p) * 2.0f;   // rs.: 8245 ms, fs: 8307 ms; fs 9842 ms on tablet

    float4 color = {res, res, res, 1.0f};
    //float4 color = {p.x, p.y, 0.0, 1.0};   // rs.: 96 ms

    return rsPackColorTo8888(color);
}
As a comparison, the exact same algorithm runs at no less than 30 fps when implemented on the GPU via a fragment shader on a textured quad.
The overhead of running the RenderScript should be at most around 100 ms, which I estimated by rendering a simple bitmap that just returns the x and y normalized coordinates as a color (see the sketch below).
Which means that if it were using the GPU, it surely would not take 10 seconds.
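For reference, a minimal sketch of that kind of baseline kernel (this is not from the original post, just an illustration of the measurement described above: fbm() is skipped entirely and only the normalized coordinates are packed into a color):

#pragma version(1)
#pragma rs java_package_name(my.package.name)

float sizeX, sizeY;

// Trivial baseline kernel: no noise evaluation, just normalized coordinates as a color.
uchar4 RS_KERNEL root(uint32_t x, uint32_t y) {
    float4 color = { x / sizeX, y / sizeY, 0.0f, 1.0f };
    return rsPackColorTo8888(color);
}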
The code I am running the RenderScript with:
// The non-support version gives at least a 25% performance boost
import android.renderscript.Allocation;
import android.renderscript.RenderScript;

public class RSNoise {

    private RenderScript renderScript;
    private ScriptC_noise noiseScript;

    private Allocation allOut;
    private Bitmap outBitmap;

    final int sizeX = 1536;
    final int sizeY = 2048;

    public RSNoise(Context context) {
        renderScript = RenderScript.create(context);

        outBitmap = Bitmap.createBitmap(sizeX, sizeY, Bitmap.Config.ARGB_8888);
        allOut = Allocation.createFromBitmap(renderScript, outBitmap,
                Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_GRAPHICS_TEXTURE);

        noiseScript = new ScriptC_noise(renderScript);
    }

    // The render() function is what is benchmarked
    public Bitmap render() {
        noiseScript.set_sizeX((float) sizeX);
        noiseScript.set_sizeY((float) sizeY);
        noiseScript.set_ratio((float) sizeX / (float) sizeY);

        noiseScript.forEach_root(allOut);

        allOut.copyTo(outBitmap);

        return outBitmap;
    }
}
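As a rough illustration of how the benchmark numbers above could be taken, a hypothetical timing harness (not part of the original code; it assumes a Context is available, e.g. inside an Activity, and uses SystemClock/Log only for the printout):

// Hypothetical timing of the render() call above.
RSNoise noise = new RSNoise(context);
long start = android.os.SystemClock.elapsedRealtime();
Bitmap bitmap = noise.render();
long elapsed = android.os.SystemClock.elapsedRealtime() - start;
android.util.Log.d("RSNoise", "render() took " + elapsed + " ms");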
If I change to FilterScript, using this answer (https://stackoverflow.com/a/14942723/4420543), the result is several hundred milliseconds worse with the support library and about twice as slow with the non-support one. The precision did not influence the results.
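For context, the FilterScript variant mentioned above would move roughly the same kernel into a .fs file; a sketch only, assuming the same globals and the same fbm() helper (whose body is elided here just as in the question). In FilterScript, rs_fp_relaxed is implied and raw pointers are disallowed, and kernels are written in the forEach style:

// noise.fs -- FilterScript variant (sketch)
#pragma version(1)
#pragma rs java_package_name(my.package.name)

float sizeX, sizeY;
float ratio;

static float fbm(float2 coord)
{ ... }   // same implementation as in the .rs version

uchar4 __attribute__((kernel)) root(uint32_t x, uint32_t y) {
    float u = x / sizeX * ratio;
    float v = y / sizeY;
    float2 p = {u, v};
    float res = fbm(p) * 2.0f;
    float4 color = {res, res, res, 1.0f};
    return rsPackColorTo8888(color);
}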
I have checked every question on Stack Overflow, but most of them are outdated, and I have tried it on a Nexus 5 (OS version 7.1.1) among several other new devices; the problem still remains.
So, when does RenderScript run on the GPU? It would be enough if someone could give me an example of a RenderScript that actually runs on the GPU.
Can you try running it with rs_fp_relaxed instead of rs_fp_full?
#pragma rs_fp_relaxed
rs_fp_full will force your script to run on the CPU, since GPUs don't support full precision floating point operations.
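In the script from the question, that would just mean changing the precision pragma at the top, e.g.:

#pragma version(1)
#pragma rs java_package_name(my.package.name)
#pragma rs_fp_relaxed   // instead of rs_fp_full; relaxed precision is what the answer suggests may let the driver target the GPU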