Saturday, September 14, 2013

Speeding up UFUNC_CHECK_STATUS by avoiding expensive flag clearing

UFUNC_CHECK_STATUS is a single macro that both checks and clears the error flags. It clears the error flags every time after checking. We should avoid the clear operation when it is not needed, as it is somewhat expensive and takes a significant amount of time.

The way numpy detects divide-by-zero, overflow, underflow, etc., is that before each ufunc loop it clears the FP error flags, and then after the ufunc loop it checks whether any have become set, and clears them again. I have changed this to skip the clear when it is not needed, which saves time.
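The flag check described above is what a user sees through numpy's error handling. Here is a minimal sketch of that visible behaviour, using np.errstate to turn a set flag into an exception. The helper name divide_reports_error is made up for illustration and is not part of numpy.

```python
import numpy as np


def divide_reports_error(a, b):
    """Return True if np.divide(a, b) trips numpy's FP error check.

    Hypothetical helper, for illustration only.
    """
    # Ask numpy to raise instead of warn when the divide-by-zero
    # or invalid flag is found set after the ufunc loop.
    with np.errstate(divide="raise", invalid="raise"):
        try:
            np.divide(a, b)
        except FloatingPointError:
            return True
    return False


ones = np.array([1.0, 2.0])
print(divide_reports_error(ones, np.array([1.0, 2.0])))  # no flag set
print(divide_reports_error(ones, np.array([0.0, 2.0])))  # divide-by-zero flag set
```

The first call is the common case this change targets: the loop finishes with no flag set, so there is nothing that needs clearing.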

Improvement

Before each ufunc loop, PyUFunc_clearfperr() now checks the error flags first and clears them only if necessary. The result of the check in the macro is also no longer ignored, as it was before. Earlier, PyUFunc_clearfperr() and PyUFunc_getfperr() combined accounted for around 10% of the time; this has now dropped to about 1% for operations which don't raise any error.
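To make the difference concrete, here is a toy model of the two strategies. It is only a sketch: FlagRegister, DIVIDE_BY_ZERO and the function names are invented for illustration, and the real code is C working on the hardware FP status flags, not Python.

```python
DIVIDE_BY_ZERO = 1  # hypothetical flag bit, for the model only


class FlagRegister:
    """Toy stand-in for the FP status flags; counts clear operations."""

    def __init__(self):
        self.flags = 0
        self.clear_count = 0

    def raise_flag(self, bit):
        self.flags |= bit

    def test(self):
        return self.flags

    def clear(self):
        self.flags = 0
        self.clear_count += 1


def check_status_always_clear(reg):
    """Old strategy: read the flags, then clear unconditionally."""
    status = reg.test()
    reg.clear()
    return status


def check_status_clear_if_set(reg):
    """New strategy: read the flags, clear only if something is set."""
    status = reg.test()
    if status:
        reg.clear()
    return status


def count_clears(check, loops, error_every=0):
    """Run `loops` fake ufunc loops and return how many clears happened."""
    reg = FlagRegister()
    for i in range(loops):
        if error_every and i % error_every == 0:
            reg.raise_flag(DIVIDE_BY_ZERO)  # this loop "hit an error"
        check(reg)
    return reg.clear_count


print(count_clears(check_status_always_clear, 1000))  # 1000
print(count_clears(check_status_clear_if_set, 1000))  # 0
```

In the model, error-free loops cost no clears at all with the second strategy, and loops that do set a flag still get it cleared, so error reporting is unchanged.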

Callgraph comparing performance for:
x = np.asarray([1]); x+x;

More

The PR for this change is #3739.
