Python and its legendary slowness: how can we remedy it?

10 October 2019
Python is a slow language. That is a fact. But it remains very widely used for large-scale projects. Let’s look at why...

The online file-sharing service “Dropbox”, the file comparison tool “Meld” (commonly installed on Linux), the distributed version control tool “Mercurial” and the art and animation programme “Blender” are just a few examples of products developed, wholly or in part, using Python. Yet Python is often disparaged for its slowness.

According to the site “The Computer Language Benchmarks Game”, which compares the performance of various programming languages, Python 3 is often several dozen times slower than C++ on the main test cases:

Comparative table of programming software

However, there are many ways to make Python code faster. The best-known method is to use the external library “NumPy”, which handles matrices very efficiently and lets Python present itself as a viable alternative to the very costly “Matlab” ecosystem. This method, however, applies only to matrix calculation.

So how can we optimise Python? Here are some methods actually used by the author, the last being their favourite.

 

Option 1: Use PyPy

Simply install PyPy and run your programme with it instead of the standard interpreter. PyPy features a “Just in Time” compiler, which translates frequently executed code into machine code while the programme runs.

AUSY recommends this solution for its ease of use. However, the gain is almost non-existent on the examples used previously.
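A quick way to check whether PyPy helps on a given workload is to time the same script with both interpreters. The sketch below is only an illustration: the `count_primes` function is a hypothetical, deliberately naive CPU-bound loop, not one of the author's examples, and the `pypy3` command name assumes a typical PyPy installation.

```python
import platform
import time


def count_primes(limit):
    """Count the primes below `limit` with a naive, CPU-bound loop (illustrative only)."""
    count = 0
    for n in range(2, limit):
        is_prime = True
        d = 2
        # Trial division up to the square root of n.
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


if __name__ == "__main__":
    start = time.perf_counter()
    result = count_primes(100_000)
    elapsed = time.perf_counter() - start
    # python_implementation() reports "CPython" or "PyPy", so the two runs are easy to tell apart.
    print(f"{platform.python_implementation()}: {result} primes in {elapsed:.2f} s")
```

Saved as, say, `bench.py`, the script can be run once with `python3 bench.py` and once with `pypy3 bench.py`. Whether PyPy wins depends heavily on the workload, which is consistent with the mixed results reported above.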

 

Option 2: Use Cython

This involves rewriting parts of the code in a specific language that is closer to C, giving a sort of hybrid between the two languages. Of course, we need to install Cython and put some effort into learning to use it.

Comparative table of Python and Cython

For a simple script (a single file), it is possible to obtain a result without modifying the script.

Cython appears to be difficult to use. The writer of this article has observed a low success rate (it worked only on a single-file programme). Given this poor success rate, the effort is not justified by the gain, which remains slight on the examples tested.

 

Option 3: Re-write functions in C

Python is an extensible language. This means that it can call modules written in lower-level languages, such as C.

After identifying the function that causes the problem, we can simply rewrite it in C or C++. Note that some adaptations are necessary to pass the parameters in and to retrieve the results.

Let’s see an example to illustrate this third solution:

Comparative table of Original Python and Modified Python

Of course we have made sure to compile the “make_lum” function into a “make_lum.so” file, using two commands, one to compile and one to link:

                gcc -c -O3 -Wall -fPIC -o make_lum.o make_lum.c

                gcc -O3 -shared -Wl,-soname,make_lum.so -o make_lum.so make_lum.o

This option is much better than the previous ones since it is quite easy to use and the gain is significant (provided that the situation is suited to it).
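The adaptations needed to pass parameters and retrieve results can be sketched with the standard `ctypes` module. This is an assumption about the mechanism, since the article does not say how “make_lum.so” is loaded, and the signature of “make_lum” is not given. The sketch therefore calls `strlen` from the C standard library as a stand-in; the `c_strlen` wrapper is a hypothetical name.

```python
import ctypes
import ctypes.util

# Load the C standard library. A library of your own would be loaded the same
# way, for example ctypes.CDLL("./make_lum.so") (path assumed for illustration).
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the C signature so that ctypes converts arguments and results correctly:
#   size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Hypothetical wrapper: encode the Python str to bytes, call C, return an int."""
    return libc.strlen(text.encode("utf-8"))


if __name__ == "__main__":
    print(c_strlen("caterpillar"))
```

Declaring `argtypes` and `restype` is the “adaptation” step: without it, `ctypes` assumes every function returns a C `int`, which silently truncates other return types.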

 

Option 4: Use parallelism

The Go language offers this method natively (thread parallelism).

It involves isolating the critical part in a list of data to be processed, then distributing the work in parallel among different small “processes”. In theory, the gain can be up to a factor equal to the number of processors of the machine.

Let’s look at a concrete example, simplified from a programme developed by our consultant, in which caterpillars must be placed on a scale:

Caterpillar table

To keep the code short, some functions have been omitted. The “try caterpillar” function assesses the utility of choosing a given caterpillar to place on the scale, and the “best one” function compiles these results to select the best one.
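The overall structure might be sketched as follows with the standard `multiprocessing` module. This is not the consultant's code: the bodies of `try_caterpillar` and `best_one`, the `TARGET` constant, the scoring rule and the `choose_parallel` helper are all hypothetical stand-ins chosen only to make the sketch runnable.

```python
from multiprocessing import Pool

TARGET = 42  # hypothetical value the scale should reach


def try_caterpillar(weight):
    """Hypothetical stand-in: score one caterpillar (higher is better).

    The real function would be expensive; here the score is simply how
    close the weight comes to TARGET.
    """
    return (-abs(TARGET - weight), weight)


def best_one(results):
    """Hypothetical stand-in: pick the weight with the highest score."""
    return max(results)[1]


def choose_parallel(weights, workers=None):
    """Evaluate every candidate in a separate worker process, then pick the best."""
    with Pool(processes=workers) as pool:
        # map() splits the list among the workers and gathers the results in order.
        results = pool.map(try_caterpillar, weights)
    return best_one(results)


if __name__ == "__main__":
    # The guard is required on platforms that start workers by re-importing this file.
    print(choose_parallel([10, 40, 45, 90]))
```

Each call to `try_caterpillar` must be independent of the others for this to work, since worker processes share no data. For very cheap evaluations, the cost of starting processes and transferring data can outweigh the gain.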

Note: Python does not let you save time with “thread” parallelism on calculations. This is an implementation issue: the GIL (“Global Interpreter Lock”) prevents several threads from executing Python code at the same time.

As a reminder, thread parallelism is a lighter form of parallelism in which the data is shared. Process parallelism is a heavier form in which nothing is shared.

Process parallelism is what we have used in our example. Thread parallelism can still speed up a Python script, provided that the bottleneck is not calculation but rather (for example) input/output.
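For the input/output case, a minimal sketch using the standard `concurrent.futures` module could look like this. The `fetch` function is hypothetical, and `time.sleep` merely stands in for a network or disk wait, during which the GIL is released and other threads can run.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(name):
    """Hypothetical I/O-bound task: the sleep stands in for waiting on a network or disk."""
    time.sleep(0.1)
    return name.upper()


def fetch_all(names, workers=4):
    """Run the waits concurrently in threads; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, names))


if __name__ == "__main__":
    start = time.perf_counter()
    print(fetch_all(["a", "b", "c", "d"]))
    print(f"elapsed: {time.perf_counter() - start:.2f} s")
```

With four threads, the four waits overlap instead of adding up. The same pattern applied to a pure calculation would bring no gain, for the reason given in the note above.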

This solution takes a long time to implement, but the gains can be impressive. They appear, however, only if we can break the programme down in the appropriate way and invest in a machine with more cores or processors.

 

Last but not least, Option 5: Optimise the programme itself

In order to optimise the programme itself, we use “profiling”. Profiling involves first determining the functions that take longest, then, if needed, the lines of code that take longest. We then modify the programme to reduce the calculation time. This is one of the most interesting and creative aspects of programming...

Profiling by function is very easy in Python: simply add a few lines of code to the programme.

Here is an example:

Table 5 Python

We then obtain this type of listing which indicates the functions that take the most time:

Table Python 5.2

We see that the bottleneck is the “install caterpillar()” function, defined on line 411 of the script. We may then need to examine the code of this function line by line.
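The few lines needed for profiling by function might look like the sketch below, assuming the standard `cProfile` and `pstats` modules (the article does not name the profiler it uses). The `install_caterpillar` body, `main` and the `profile_report` helper are hypothetical stand-ins.

```python
import cProfile
import io
import pstats


def install_caterpillar(n):
    """Hypothetical stand-in for the expensive function."""
    return sum(i * i for i in range(n))


def main():
    """Hypothetical programme entry point."""
    return [install_caterpillar(20_000) for _ in range(20)]


def profile_report(func, top=10):
    """Run `func` under the profiler and return the listing as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so that the costliest call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_report(main))
```

Each row of the listing gives the number of calls, the time spent in the function itself and the cumulative time including the functions it calls, followed by the file, line number and function name.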

Note: the approach described is not specific to Python. It is simply a version of the traditional approach that uses gprof on Linux for C/C++ programmes. Although adapted from another context, it works perfectly for Python.

Profiling is a very interesting intellectual exercise which produces a variable gain. We may be surprised when an expected gain does not materialise, and vice versa.

Profiling by line of code is also very simple but requires a tool that is not part of the Python standard library: “line_profiler”.

What is your favourite option?

 

 

Jérémie Lefrançois
Jérémie Lefrançois, an Ausy consultant since 2004, wrote his first programmes in Basic on a ZX Spectrum back in 1982. He gained a postgraduate qualification in IT in 1989, and has worked with many computer languages on a personal and professional basis. He enjoys exploring and comparing the possibilities of different languages - especially Python - which has been his preferred language since 2014. He thus takes a keen interest in monitoring developments in this dynamic language, which he likes for its ease of implementation.

 
