Monday, July 22, 2013

Replacement for inefficient loop selection

What's wrong

It is evident that the loop selection method for scalar operations is inefficient and consumes almost 4.2% of the time. It checks all of a function's associated dtypes one by one from the types array. There is scope to make this much faster.
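
To make the cost concrete, here is a minimal sketch of a linear scan over a flat types array. It is a simplified model and not NumPy's actual code: the `loop_table` struct, the `T_*` type numbers, and `find_loop_linear` are hypothetical names, and only exact matches are checked (any casting rules are left out).

```c
#include <stddef.h>

/* Hypothetical stand-ins for type numbers; NumPy's real values are not
   reproduced here. */
enum { T_BOOL, T_INT, T_FLOAT, T_DOUBLE };

/* Simplified model of a function's flat types array: each loop occupies
   `nargs` consecutive entries (inputs first, then outputs). */
typedef struct {
    const char *types;
    int ntypes;  /* number of registered inner loops */
    int nin;     /* number of inputs */
    int nargs;   /* inputs + outputs */
} loop_table;

/* Linear scan: compare the requested input types against every loop in
   turn and return the index of the first exact match, or -1 if none. */
static int find_loop_linear(const loop_table *t, const char *in_types)
{
    for (int i = 0; i < t->ntypes; i++) {
        const char *sig = t->types + (size_t)i * (size_t)t->nargs;
        int match = 1;
        for (int j = 0; j < t->nin; j++) {
            if (sig[j] != in_types[j]) { match = 0; break; }
        }
        if (match) { return i; }
    }
    return -1;
}

/* Toy table for a binary function with four loops. */
static const char example_types[] = {
    T_BOOL,   T_BOOL,   T_BOOL,
    T_INT,    T_INT,    T_INT,
    T_FLOAT,  T_FLOAT,  T_FLOAT,
    T_DOUBLE, T_DOUBLE, T_DOUBLE,
};
static const loop_table example_table = { example_types, 4, 2, 3 };
```

In this model a request for the last loop has to walk past every earlier entry, so the cost grows with the number of registered loops.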

Replacing the loop with specialized conditions

Most functions share identical signatures. For example, the sets (add, subtract) and (arccos, arcsin, arctan, arcsinh, arccosh) each share the same signature array. As far as I know, there are only 32 distinct signature arrays. I wrote a code generator to identify them and generate a specialized condition for each distinct signature array. The result is an improvement of 4%.
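
The grouping step can be sketched as follows. The real generator is the Python script generate_umath.py; this C version only illustrates the idea, and the `func_sig` struct, `group_signatures`, and the toy signature strings are all made up for the example.

```c
#include <string.h>

/* Hypothetical record: a function name plus its flat signature array. */
typedef struct {
    const char *name;
    const char *sig;
    int len;
} func_sig;

/* Give every function the id of the first function with a byte-identical
   signature array. Returns the number of distinct signature arrays. */
static int group_signatures(const func_sig *f, int n, int *group_id)
{
    int distinct = 0;
    for (int i = 0; i < n; i++) {
        group_id[i] = -1;
        for (int j = 0; j < i; j++) {
            if (f[j].len == f[i].len &&
                memcmp(f[j].sig, f[i].sig, (size_t)f[i].len) == 0) {
                group_id[i] = group_id[j];  /* reuse the existing group */
                break;
            }
        }
        if (group_id[i] < 0) {
            group_id[i] = distinct++;       /* first time this array is seen */
        }
    }
    return distinct;
}

/* Toy data: the signature contents are invented, only the sharing matters. */
static const char sig_arith[] = "??bbddOO";
static const char sig_trig[]  = "ffddOO";
static const func_sig toy_funcs[] = {
    { "add",      sig_arith, 8 },
    { "subtract", sig_arith, 8 },
    { "arccos",   sig_trig,  6 },
    { "arcsin",   sig_trig,  6 },
};
```

One lookup function then needs to be generated per group id rather than per function.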

Implementation

  1. Most functions are called with arguments of the same type, so it is better to check that case first.
  2. For each distinct signature array, auto-generate a lookup function whose if-else conditions check the argument types and return the loop index. For example, the following code is auto-generated to quickly return the inner-loop index for the add function:
    /** Warning this file is autogenerated!!!
    
        Please make changes to the code generator program (numpy/core/code_generators/generate_umath.py)
    **/ 
    static  int type21_id3_index(int x, int y){ 
      if(x==y){ 
        if(x==NPY_HALF){ return 11;}
        if(x==NPY_TIMEDELTA){ return 19;}
        if(x>=NPY_BOOL && x<=NPY_ULONGLONG){ return x+(0);}
        if(x>=NPY_FLOAT && x<=NPY_CLONGDOUBLE){ return x+(1);}
        if(x==NPY_OBJECT){ return 21;} 
      }
      if(x==NPY_DATETIME && y==NPY_TIMEDELTA){ return 18;}
      if(x==NPY_TIMEDELTA && y==NPY_DATETIME){ return 20;} 
      return -1;
    }
    
  3. The lookup is encapsulated behind a function pointer and called as ((PyUFuncObject *)f)->sig_index(arg1, args2).
  4. The detailed implementation is shown in the figure.
    fig 1. Code generator implementation for quick loop selection
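
The function-pointer encapsulation from step 3 can be sketched like this. It does not use the real PyUFuncObject: `toy_ufunc`, `toy_add_index`, `select_loop`, and the `TOY_*` type numbers are hypothetical, and treating a -1 index as "no fast path" is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical type numbers for the toy example. */
enum { TOY_INT = 0, TOY_DOUBLE = 1 };

typedef double (*toy_loop)(double, double);

static double add_int(double a, double b) { return (double)((long)a + (long)b); }
static double add_double(double a, double b) { return a + b; }

/* Shaped like a generated lookup: same-type case first, -1 on a miss. */
static int toy_add_index(int x, int y)
{
    if (x == y) {
        if (x == TOY_INT) { return 0; }
        if (x == TOY_DOUBLE) { return 1; }
    }
    return -1;
}

/* Stand-in for the function object: it carries a pointer to its own
   lookup function, so callers never need to know which one it is. */
typedef struct {
    int (*sig_index)(int, int);
    const toy_loop *loops;
} toy_ufunc;

static const toy_loop toy_add_loops[] = { add_int, add_double };
static const toy_ufunc toy_add = { toy_add_index, toy_add_loops };

/* Returns the selected loop, or NULL when the lookup finds no match. */
static toy_loop select_loop(const toy_ufunc *f, int x, int y)
{
    int i = f->sig_index(x, y);
    return (i >= 0) ? f->loops[i] : NULL;
}
```

Because the caller only goes through `f->sig_index`, functions that share a signature array can point at the same generated lookup.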