← Articles

Mastering matplotlib: The First Month

A cold email, a move to the Midwest, and a month spent deciding that "mastering" means going down to the architecture — the layers, the event loops, the module graph itself.

A little over a month ago, an email arrived from an acquisition editor at Packt named Meeta Rajani, asking whether I might be interested in writing a book called Mastering matplotlib. I read it twice, mostly to make sure it was real. Then I sat with it for the better part of an afternoon, because saying yes to a book is not a small thing, and I wanted to be honest with myself about why I would.

The honest answer surprised me a little. I have not written Python for a living in some years now — my days lately are spent in Erlang, LFE, Common Lisp, and Clojure, which is about as far from pyplot as a person can wander and still be plotting anything at all. But I studied physics and mathematics before I wrote software for a living, and the love of scientific computing and of seeing data never really left. matplotlib was the tool I reached for back then, and I have a great deal of affection for the Python community that built and sustains it. A recent move from San Francisco to a much quieter corner of rural Minnesota had, among other things, given me something I had not had in a long while: the time to try something I had long wanted to try. Writing this book felt like a way to give a little back — to the craft of visualization, and to a community I owe more than I have repaid.

So I said yes. It's going to be a wild ride.

What "mastering" ought to mean

Packt's original brief contained a line I kept coming back to. The book, they said, should teach how matplotlib works, not just what to do with it. That is exactly the kind of sentence that is easy to nod along to and hard to actually honor, and I decided early that honoring it would be the whole point of the book.

There are already good matplotlib books — I'll be recommending a few of them in these pages, because pointing you toward other people's fine work is part of the job. Most of them, quite reasonably, teach you what to do: here is how you make this kind of plot, here is the incantation for that. A "mastering" book has to earn its title by going one level down. Not more recipes — the layers underneath the recipes. Not "call this function," but "here is the architecture that function lives in, and here is how you can go and read it yourself." My real goal, the one under all the chapters, is to make you no longer need the book.

That ambition showed up first in the outline, which I spent a hard few days on before I wrote a line of the book proper. An outline is not paperwork; it is a design act, and this one changed shape as I thought. I had planned separate chapters on architecture and internals, then found there was not enough daylight between them and merged the two. Reading through months of mailing-list and Stack Overflow questions, I noticed how many advanced users were still genuinely confused about matplotlib's several APIs — so I carved out a small dedicated chapter on exactly that, right after the architecture. I had wanted the cloud chapter to cover Elastic Beanstalk and Heroku and OpenShift besides, and admitted to myself that would take a second book, so it became two, done properly: plain EC2 and Google App Engine. And I still cannot decide whether big data or clustering should come first, because you cannot really explain one without the reader having felt the limits that motivate the other. I suspect the writing will decide it for me.

A month in the open

Here is a thing worth admitting about writing a technical book: the finished object is serene and orderly, and the making of it is nothing of the sort.

I set the book up as a family of small repositories — one per chapter — so that every example would be something you can clone and run, not something you have to reconstruct from a printed page. Reproducibility is not a nicety; it is the difference between a book you read and a book you use. The bulk of the structure came into being almost at once, stamped out from a single seed, and then I began filling it in — which is where the orderliness ends.

Because chapters do not fill in evenly. Some of them come pouring out; some of them sit there as an empty scaffold while you circle the topic waiting for the way in. As I write this, four of the nine chapters have real content and five are still skeletons with nothing in them but a heading and a promise. That is not a confession of failure — it is just what the middle of a book looks like, and I'd rather you saw it honestly than imagine the thing arrives fully formed.

The chapters that did come first were the early ones, and they came in two hard weeks. Chapter 1 needed real data to be worth anything, so I threw out the tidy little sample I'd started with — a hundred-odd rows of recent baseball statistics — and replaced it with the full historical record: just under twenty-two thousand player-seasons of batting, reaching all the way back to 1871, complete with the honest gaps where nineteenth-century leagues simply didn't record a number. There's not enough neatness in a data set like that to do tidy science, but there is an enormous amount to explore, which is rather the point.

And as those first finished chapters began making their way back to Packt for review, the book quietly gained a second editor — Sumeet Sawant — to shepherd the drafts as they come in. It's a small thing, an extra name in the email thread, but it's the kind of small thing that makes a project feel real. A month ago this was a single message in my inbox from Meeta Rajani; now there are people on the other end of it, waiting on pages.

An instrument, not a diagram

The chapter I am proudest of so far is the one on architecture, and the reason is a small tool I built for it.

It would have been easy to draw you a three-layer diagram of matplotlib — backend, artist, scripting — label the boxes, and move on. Diagrams like that are comforting and, taken too literally, quietly wrong; the running code does not actually partition itself along the lines the tidy picture suggests. So rather than hand you a diagram to trust, I wanted to hand you an instrument you could point at the framework yourself.

The tool subclasses Python's own modulefinder to watch matplotlib import itself, recording which module pulls in which and how often, and building that up into a weighted graph of the framework's internal dependencies. On top of the raw graph I laid a hand-made map of matplotlib's real submodules onto its architectural layers — backend, artist, scripting, configuration, utilities — so that the picture is not just "these files import each other" but "here is the shape of the thing, by role." Run it against an average plotting script and matplotlib's structure draws itself, which is a far more honest teacher than any diagram I could pose for you.

Teaching the event loop by building a worse one

The chapter I'm in the middle of right now is on event handling and interactive plots, and it has taught me — again — that the fastest way to explain something intimidating is to build a deliberately terrible version of it first.

An event loop sounds like systems-programming arcana. So the chapter starts with the most naive loop imaginable — a while loop that does nothing but spin — and then grows it, a piece at a time, into a small class that keeps a registry of handlers, runs until something tells it to stop, and knows how to bow out gracefully when you interrupt it. It is a toy. I say so plainly. But it is a toy with a purpose: it is very close, in spirit and in code, to the real event loop matplotlib runs underneath your interactive figures, and once you've built the toy the real thing stops being magic.

And then, because this voice does not oversell, an honest wrinkle: I wanted to show off keyboard interaction inside the IPython Notebook, and discovered that the notebook backend simply did not support keyboard events yet. No amount of wishing changed that. So the section does the grown-up thing and drops down to the native GUI backend to make the keyboard demo work, and tells you exactly why it had to. Your tools have edges; a book that pretends otherwise is worse than useless.

The best part: the people

The thing I did not expect to love most about writing this book is the excuse it gives me to write to strangers.

Part of what "mastering" a community's tool should mean is engaging with the community itself, so I've been reaching out — to physics professors, to working experimental scientists, to people whose example code I admired — asking permission to adapt their work and, where I can, to make it better. The generosity has already been humbling. One physics professor not only let me use code from a tutorial he'd written but sent along more; because his used a since-deprecated module, I modernized it, which felt like a small way of returning the favor. Another scientist, formerly of the National Radio Astronomy Observatory, gave his blessing and — it turns out he's writing a book of his own — offered better examples still. A third piece I needed had already been released under a permissive license by some generous folks at the University of Toronto, so I could simply use it with thanks.

I've written to a good many more people than have written back so far, and I have high hopes of hearing from them in the weeks to come. Every one of these examples will be credited, by name and link, where it lives in the code. Pointing outward like this is not politeness; it is the truest thing a book like this can do — remind you that the tool in your hands was made by people, and that you are now part of the same lineage.

Where things stand

So: four chapters written, five waiting; a tool that draws matplotlib's own architecture; a toy event loop with a real one hiding inside it; and a growing pile of gracious email from people I've never met. There is a great deal of book still ahead of me, and I have no illusions that the second half will be as tidy as this note makes the first half sound.

But the shape of the thing is clear now, and the shape is the ambition: not to tell you what to type, but to walk you down into the framework until you can find your own way around it — and then, gently, to let go of your hand.

More soon.