What's wrong
It is evident that the loop selection method for scalar operations is inefficient: it consumes almost 4.2% of the time. It checks all the dtypes associated with a function one by one, walking through the types array. There is scope to make this much faster.
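To make the cost concrete, here is a minimal sketch of that kind of one-by-one scan for a binary function. It is a simplification, not NumPy's actual code: the name `linear_loop_index`, the `T_*` type numbers and the `demo_types` table are all hypothetical, and it only does exact matching on the two input types.

```c
#include <stddef.h>

/* Hypothetical type numbers for illustration only (not NumPy's real values). */
enum { T_BOOL, T_INT, T_FLOAT, T_DOUBLE };

/*
 * Linear loop selection for a binary function: `types` holds three
 * entries per loop (two inputs, one output), stored back to back.
 * Returns the index of the first loop whose inputs match, or -1.
 */
static int linear_loop_index(const char *types, int ntypes, int x, int y)
{
    const int nargs = 3;  /* 2 inputs + 1 output (assumed layout) */
    for (int i = 0; i < ntypes; i++) {
        const char *sig = types + (size_t)i * nargs;
        if (sig[0] == x && sig[1] == y) {
            return i;
        }
    }
    return -1;
}

/* Hypothetical types array with four loops. */
static const char demo_types[] = {
    T_BOOL,   T_BOOL,   T_BOOL,
    T_INT,    T_INT,    T_INT,
    T_FLOAT,  T_FLOAT,  T_FLOAT,
    T_DOUBLE, T_DOUBLE, T_DOUBLE,
};
```

With this table, a call for two `T_DOUBLE` arguments has to walk past every earlier loop before it matches the last one, and that walk is repeated on every scalar operation.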
Replacing the loop with specialized conditions
Most functions share identical signatures. For example, the sets (add, subtract) and (arccos, arcsin, arctan, arcsinh, arccosh) each share the same signature array. As far as I know, there are only 32 distinct signature arrays. I wrote a code generator that identifies them and emits a specialized condition for each distinct signature array. Hence an improvement of 4%.
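The grouping step can be sketched as follows. This is only an illustration of the idea, assuming each signature array is a flat run of type codes that can be compared byte for byte. The struct, the function name and the demo data are hypothetical and are not taken from generate_umath.py.

```c
#include <string.h>

/* A signature array: a flat run of type codes (hypothetical representation). */
struct sig_array {
    const char *codes;
    int len;
};

/*
 * Give each function the id of the first earlier function whose
 * signature array is byte-identical, or a fresh id otherwise.
 * Returns the number of distinct signature arrays found.
 */
static int assign_signature_ids(const struct sig_array *sigs, int n, int *ids)
{
    int distinct = 0;
    for (int i = 0; i < n; i++) {
        ids[i] = -1;
        for (int j = 0; j < i; j++) {
            if (sigs[i].len == sigs[j].len &&
                memcmp(sigs[i].codes, sigs[j].codes, (size_t)sigs[i].len) == 0) {
                ids[i] = ids[j];
                break;
            }
        }
        if (ids[i] < 0) {
            ids[i] = distinct++;
        }
    }
    return distinct;
}

/* Made-up data: the first two entries share one array, the third differs. */
static const struct sig_array demo_sigs[] = {
    { "iiifff", 6 },  /* stands in for add */
    { "iiifff", 6 },  /* stands in for subtract */
    { "ffdd",   4 },  /* stands in for arccos */
};
static int demo_ids[3];
```

Calling `assign_signature_ids(demo_sigs, 3, demo_ids)` groups the first two entries under one id, so only one lookup function would need to be generated for both.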
Implementation
 Most functions have uniform arguments, so it is better to check for those first.
 For each distinct signature array, autogenerate a lookup function made of if-else conditions that checks the argument types and returns the loop index. E.g. the following code is autogenerated to quickly return the inner-loop index for the add function:
/** Warning this file is autogenerated!!!
    Please make changes to the code generator program
    (numpy/core/code_generators/generate_umath.py) **/
static int
type21_id3_index(int x, int y)
{
    if (x == y) {
        if (x == NPY_HALF) { return 11; }
        if (x == NPY_TIMEDELTA) { return 19; }
        if (x >= NPY_BOOL && x <= NPY_ULONGLONG) { return x + (0); }
        if (x >= NPY_FLOAT && x <= NPY_CLONGDOUBLE) { return x + (1); }
        if (x == NPY_OBJECT) { return 21; }
    }
    if (x == NPY_DATETIME && y == NPY_TIMEDELTA) { return 18; }
    if (x == NPY_TIMEDELTA && y == NPY_DATETIME) { return 20; }
    return -1;  /* no match */
}
 Encapsulate the logic behind the ufunc object, so the lookup is called with
((PyUFuncObject *)f)->sig_index(arg1, arg2)
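The encapsulation in the last point can be pictured as a function pointer stored on the object. The sketch below uses a hypothetical `demo_ufunc` struct rather than the real `PyUFuncObject`, made-up type numbers, and assumes that a result of -1 means "no fast match, use the generic search".

```c
#include <stddef.h>

/* Signature of a generated lookup: two type numbers in, loop index out. */
typedef int (*sig_index_fn)(int, int);

/* Hypothetical stand-in for the ufunc object; only the hook matters here. */
struct demo_ufunc {
    sig_index_fn sig_index;  /* generated lookup, or NULL if none exists */
};

enum { T_INT = 0, T_FLOAT = 1 };  /* hypothetical type numbers */

/* Stand-in for a generated lookup: identical types map to loop index x. */
static int demo_lookup(int x, int y)
{
    if (x == y && x >= T_INT && x <= T_FLOAT) {
        return x;
    }
    return -1;
}

/* Ask the hook first; -1 tells the caller to fall back to the slow path. */
static int select_loop(const struct demo_ufunc *f, int x, int y)
{
    if (f->sig_index != NULL) {
        return f->sig_index(x, y);
    }
    return -1;
}

static const struct demo_ufunc demo_add = { demo_lookup };
static const struct demo_ufunc demo_plain = { NULL };
```

Keeping the hook on the object means the caller never needs to know which of the generated lookup functions belongs to which function. It only follows the pointer.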

The detailed implementation is shown in Fig. 1.
Fig. 1. Code generator implementation for quick loop selection
More
 The PR for this feature is #3535.
 The full diagram of the implementation is at http://goo.gl/pNIFL