What's wrong
For scalar operations, NumPy first tries to extract the underlying C value from a Python integer. This creates a bottleneck, because NumPy first converts the Python scalar into its matching NumPy scalar (e.g. PyLong -> int32) and only then extracts the C value from that NumPy scalar.
Avoiding conversion
Avoiding this conversion improves speed significantly. For known integer types I skip the conversion by extracting the value directly.
For byte, short, int, long
#if PY_VERSION_HEX >= 0x03000000
    if (PyLong_CheckExact(a)) {
        *arg1 = PyLong_AsLong(a);
        return 0;
    }
#else
    if (PyInt_CheckExact(a)) {
        *arg1 = PyInt_AS_LONG(a);
        return 0;
    }
#endif
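To make the idea behind the check easier to see, here is a rough pure-Python model of the two paths. It is only an illustration: `BoxedInt32`, `slow_extract` and `fast_extract` are hypothetical names made up for this sketch, the real logic lives in NumPy's C code, and the model ignores C integer overflow.

```python
class BoxedInt32:
    """Hypothetical stand-in for a NumPy scalar that wraps a C value."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


def slow_extract(a):
    # Old path: build the matching boxed scalar first,
    # then read the value back out of it.
    boxed = BoxedInt32(int(a))
    return boxed.value


def fast_extract(a):
    # New path: for an exact Python int (not a subclass such as bool),
    # take the value directly. This mirrors the exact-type check
    # followed by the direct value extraction in the C snippet.
    if type(a) is int:
        return a
    # Anything else still goes through the generic conversion.
    return slow_extract(a)
```

Both functions return the same value for the same input; the fast one simply skips the temporary object when the argument is a plain Python integer.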
Performance
- 1.09 Array * Array
- 1.09 Array + Array
- 1.18 np.int * np.int
- 1.14 np.int + np.int
- 1.08 PyInt * Array
- 1.07 np.int * Array
- 1.10 PyInt + Array
- 1.09 np.int + Array
- 1.08 PyInt * PyInt
- 1.09 PyInt + PyInt
- 2.53 PyInt * np.int
- 2.69 PyInt + np.int
- 1.06 np.int * np.int
- 1.13 np.int + np.int
- 7.29 PyInt < np.int
- 2.38 np.int ** 2.
Array = np.array([2234,32342])
PyType = 3
NyType = Array[0]
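For anyone who wants to reproduce numbers of this kind, a minimal timing sketch with the standard `timeit` module could look like the following. It assumes the figures in the list are ratios of the time before the change to the time after it; `per_call_seconds` and `speedup` are hypothetical helper names, not part of NumPy or of the PR.

```python
import timeit


def per_call_seconds(stmt, setup="pass", number=100_000, repeat=5):
    """Return the best observed time per execution of stmt, in seconds."""
    timer = timeit.Timer(stmt, setup=setup)
    # Take the minimum over several runs to reduce noise from other processes.
    return min(timer.repeat(repeat=repeat, number=number)) / number


def speedup(seconds_before, seconds_after):
    """Ratio of old to new time; values above 1 mean the new build is faster."""
    return seconds_before / seconds_after
```

One way to use it: call `per_call_seconds("PyType * NyType", setup=...)` with the three assignments above (plus `import numpy as np`) as the setup string, once against each NumPy build, and pass the two results to `speedup`. The two builds have to be measured in separate interpreter sessions, since only one NumPy can be imported at a time.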
More
- The PR for this enhancement is #3567
- The profiled speedup datasheet is at http://goo.gl/U9R3DS
- Raul has also made an improvement for the float dtype in #2941