Monday, September 23, 2013

GSoC'13 Project Summary-1 : Numpy's profiling

Small numpy arrays are very similar to Python scalars, but numpy incurs a fair amount of extra overhead for simple operations. For large arrays this doesn't matter, but for code that manipulates a lot of small pieces of data, it can be a serious bottleneck.
For example:
In [1]: x = 1.0

In [2]: numpy_x = np.asarray(x)

In [3]: timeit x + x
10000000 loops, best of 3: 61 ns per loop

In [4]: timeit numpy_x + numpy_x
1000000 loops, best of 3: 1.66 us per loop

This project involved
  • profiling simple operations like the above
  • determining possible bottlenecks
  • devising improved algorithms to solve them, with the goal of getting the numpy time as close as possible to the Python time.
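
The gap above can be measured directly with timeit. The sketch below compares a plain float addition with the same addition on a 0-d array; the absolute numbers and the exact ratio depend on the machine and NumPy version:

```python
import timeit

n = 100000

# Time a plain Python float addition.
scalar_t = timeit.timeit('x + x', setup='x = 1.0', number=n)

# Time the same addition on a 0-d NumPy array.
array_t = timeit.timeit(
    'x + x',
    setup='import numpy as np; x = np.asarray(1.0)',
    number=n)

print('scalar: %.0f ns/op' % (scalar_t / n * 1e9))
print('array:  %.0f ns/op' % (array_t / n * 1e9))
print('overhead factor: %.1fx' % (array_t / scalar_t))
```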

Profiling tools

The very first step in finding a bottleneck is profiling, for either time or space. During the project I used a few tools to profile numpy's execution flow and visualize the resulting data.

Google profiling tool

This is a suite of tools provided by Google. It includes TCMalloc, a heap checker, a heap profiler and a CPU profiler. Since the goal of the project was to reduce running time, the CPU profiler was used.

Setting up Gperftools

Following are the steps used to set up a C-level profiler for Python on Ubuntu 13.04. (For options on other systems, see [1].)
  1. Build it from source. Check out the Subversion repository from http://gperftools.googlecode.com/svn/trunk/
  2. To build gperftools from a Subversion checkout, you need autoconf, automake and libtool installed.
  3. Run the ./autogen.sh script, which generates ./configure and other files. Then run ./configure.
  4. Run 'make check' to execute the self-tests that come with the package. This step is optional but recommended.
  5. Once all tests pass, run 'sudo make install' to install the programs along with any data files and documentation.

Running CPU profiler

I invoked the profiler manually before running the sample code. Suppose the Python code to be profiled is in a file named num.py.

$CPUPROFILE=num.py.prof LD_PRELOAD=/usr/lib/libprofiler.so python num.py

Alternatively, start the profiler from within the code as follows:
import ctypes
import timeit

# Load gperftools' CPU profiler and bracket the code of interest
profiler = ctypes.CDLL("libprofiler.so")
profiler.ProfilerStart(b"num.py.prof")  # the C API expects a char*
timeit.timeit('x + y', number=10000000,
       setup='import numpy as np; x = np.asarray(1.0); y = np.asarray(2.0)')
profiler.ProfilerStop()
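
For repeated experiments it can be convenient to wrap the two ctypes calls in a context manager. The helper below is a sketch (cpu_profile is a hypothetical name, not part of gperftools); it assumes libprofiler.so is loadable, and the library is only touched once the with-block is entered:

```python
import ctypes
from contextlib import contextmanager

@contextmanager
def cpu_profile(path, lib="libprofiler.so"):
    # Hypothetical helper: bracket a block of code with gperftools'
    # ProfilerStart/ProfilerStop so the profile is always flushed,
    # even if the profiled code raises.
    profiler = ctypes.CDLL(lib)
    profiler.ProfilerStart(path.encode())  # C API expects a char*
    try:
        yield
    finally:
        profiler.ProfilerStop()
```

With this in place, the timeit call above becomes `with cpu_profile("num.py.prof"): timeit.timeit(...)`.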

To analyze the stats, pass pprof the profiled binary (the Python interpreter, not the script) together with the profile file:

$pprof --gv $(which python) num.py.prof


Callgraph generated by gperftools. Each block represents a method, with its local and cumulative time percentages.



Oprofile

OProfile is a system-wide profiler for Linux systems, capable of profiling all running code at low overhead. OProfile is released under the GNU GPL.

Setting up Oprofile

  1. Get the source via Git: git clone git://git.code.sf.net/p/oprofile/oprofile
  2. Automake and autoconf are required.
  3. Run autogen.sh before building as usual.

Running CPU profiler

$opcontrol --callgraph=16
$opcontrol --start
$python num.py
$opcontrol --stop
$opcontrol --dump
$opreport -cgf | gprof2dot.py -f oprofile | dot -Tpng -o output.png

The callgraph is visualized with the help of the gprof2dot.py script.

Perf from linux-tools

Perf provides rich, generalized abstractions over hardware-specific capabilities. Among other things, it provides per-task, per-CPU and per-workload counters, sampling on top of these, and source-code event annotation.

Setting up perf

$sudo apt-get install linux-tools-common 
$sudo apt-get install linux-tools-<kernel-version>

Running Profiler and visualizing data as flame-graph

$perf record -a -g -F 1000 python num.py
$perf script | ./stackcollapse-perf.pl > out.perf-folded
$cat out.perf-folded | ./flamegraph.pl > perf-numpy.svg
The first command runs perf in sampling mode at 1000 Hz (-F 1000) across all CPUs (-a), capturing stack traces so that a call graph (-g) of function ancestry can be generated later. The samples are saved in a perf.data file.

The scripts used to generate the flame graph above are at https://github.com/brendangregg/FlameGraph.
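
The "folded" file produced by stackcollapse-perf.pl is plain text: one stack per line, frames joined by semicolons, followed by a sample count. That makes quick ad-hoc analysis possible without rendering the SVG. The sketch below, with made-up sample data, tallies samples by innermost frame:

```python
from collections import Counter

# Made-up folded stacks in the stackcollapse-perf.pl output format:
# "frame1;frame2;...;leaf count"
folded = [
    "python;PyEval_EvalFrameEx;PyNumber_Add 120",
    "python;PyEval_EvalFrameEx;PyNumber_Add;binary_op1 80",
    "python;PyEval_EvalFrameEx 40",
]

leaf_samples = Counter()
for line in folded:
    stack, count = line.rsplit(" ", 1)
    leaf_samples[stack.split(";")[-1]] += int(count)  # innermost frame

for func, n in leaf_samples.most_common():
    print(func, n)
# Prints PyNumber_Add first, since it has the most leaf samples (120)
```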
