Plop: Low-overhead profiling for Python

Posted by Ben Darnell on July 10, 2012

It’s almost time for another Hack Week at Dropbox, and with that in mind I’d like to present one of the projects from our last Hack Week.

A profiler is an indispensable tool for optimizing programs.  Without a profiler, it’s hard to tell which parts of the code are consuming enough time to be worth looking at.  Python comes with a profiler called cProfile, but enabling it slows things down so much that it’s usually only used in development or simulated scenarios, which may differ from real-world usage.

At our last hack week, I set out to build a profiler that would be usable on live servers without impacting our users.  The result, Plop (Python Low Overhead Profiler), is now available on GitHub.

How it works

Plop is a sampling profiler, similar to Google’s gperftools.  Every 10 milliseconds, a timer fires and causes the program to record its current stack trace.  After 30 seconds, the collected samples are aggregated and saved.  There is a web-based viewer for the resulting call graphs (using d3.js).
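Plop’s actual implementation lives on GitHub; as a rough illustration of the mechanism described above (not Plop’s code), here is a minimal sampling-profiler sketch that uses a `SIGPROF` timer to capture the current stack every 10 milliseconds and aggregates the samples in a counter:

```python
import collections
import signal
import time

# Aggregated samples: each key is a full stack trace, the value is how
# many times that exact stack was observed.
samples = collections.Counter()

def _record(signum, frame):
    # Walk the interrupted stack and record it as (file, line, function).
    stack = []
    while frame is not None:
        code = frame.f_code
        stack.append((code.co_filename, frame.f_lineno, code.co_name))
        frame = frame.f_back
    samples[tuple(stack)] += 1

def profile(duration=0.5, interval=0.01):
    # Fire SIGPROF every `interval` seconds of CPU time.
    signal.signal(signal.SIGPROF, _record)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    deadline = time.time() + duration
    while time.time() < deadline:
        sum(i * i for i in range(1000))  # busy work to be sampled
    signal.setitimer(signal.ITIMER_PROF, 0, 0)  # stop the timer

profile()
```

After `profile()` returns, `samples` holds a weighted set of stack traces; the relative counts are what a viewer like Plop’s turns into bubble sizes and edge thicknesses. (`ITIMER_PROF` counts CPU time rather than wall time, which is one reason the overhead of this style of profiler is so low: an idle process is never interrupted.)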

Here’s a sample profile, from a simple Tornado-based web server. Click on the image for a full-size interactive view (it’s big, so you’ll need to either scroll around a lot or use your browser’s zoom controls).

Each bubble is a function; they’re color-coded by filename and you can mouse over them for more details. The size of the bubble represents the amount of time spent in that function.  The thickness of the lines connecting the bubbles represents how frequently that function call appears on the stack.  The disconnected bubbles around the edge are actually common utility functions that are called from many places throughout the code – they’re so common that drawing the connections to every function that calls them would make the graph unreadable.

Since hack week, we’ve integrated Plop into our servers and run it regularly.  Every time we push new code to the site, a script collects a profile for the new version.  This has proven not to be disruptive, with less than 2% CPU overhead while the profile is being collected. It’s still a work in progress (especially the viewer), but has already proven useful in identifying performance regressions.

  • Dan Hlavenka

    So much movement…

  • Graham Dumpleton

The alternative I use is to run cProfile on selected functions, but only do a full profile on calls at a certain interval. So, in a web application, you might leave at least five seconds between each full profile sample. The results get aggregated on each profile, and a checkpoint file with a snapshot of the data is written out periodically. Because it is a full profile, albeit sampled, you still get full detail and don’t miss anything. Being selective about the functions you profile, and doing it only every so often, means the overhead isn’t noticeable. The resulting data file can then be processed with the normal tools for handling Python profiling data.

I did try the thread-stack sampling technique you are using, but found it less useful because it generated so much data outside the context you were actually interested in. You also got a fair bit of confusing data in WSGI applications running on a C-based server, because you would see what appeared to be orphan calls when the WSGI server tried to yield the next item from the iterable returned by the WSGI application.

One could make it reasonably understandable with a decent visualisation tool like yours, something I didn’t have time to play with, but I found it worked better to use other performance monitoring tools to narrow down on the problem areas of the code and then apply the cProfile-based solution there. As I said, the overhead wasn’t noticeable due to the sampling, so a cProfile-based approach is still realistic in a production setting, as long as you have a means to easily turn it on and off at run time, as I did.
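The interval-sampled cProfile approach described in this comment might be sketched as follows; the decorator name and parameters are hypothetical, not from any particular library:

```python
import cProfile
import functools
import pstats
import time

def sampled_profile(min_interval=5.0):
    """Profile a selected function with cProfile, but only on calls that
    occur at least `min_interval` seconds after the last profiled call.
    Stats accumulate in one Profile object across all profiled calls."""
    profiler = cProfile.Profile()
    last = [0.0]  # time of the last profiled call

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            now = time.monotonic()
            if now - last[0] >= min_interval:
                last[0] = now
                profiler.enable()
                try:
                    return func(*args, **kwargs)
                finally:
                    profiler.disable()
            return func(*args, **kwargs)  # unprofiled fast path
        wrapper.profiler = profiler
        return wrapper
    return decorator

@sampled_profile(min_interval=0.0)  # 0 for demo; use e.g. 5.0 in production
def handler():
    return sum(range(10000))

handler()
stats = pstats.Stats(handler.profiler)
```

Because each profiled call is a complete cProfile run, the aggregated stats retain full call detail (unlike stack sampling), while the interval keeps the overhead bounded; `stats` can be dumped to a checkpoint file and inspected with the usual pstats tooling.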

  • Davi Ribeiro

Amazing tool! I’ll download it now and run some tests with graph generation… ;)

  • Zooko

See also statprof, which is a sampling profiler for Python. I’ve used statprof successfully.

  • Anonymous Coward

    A treemap would have been a much better option for representing this kind of data.

  • talha

Does the size of the bubble represent the time spent within a function, including its child calls?