How should we be using matplotlib?

matplotlib isn’t a great interface, but it is (still?) the de-facto scipy standard and we are using it extensively.

Some issues have come up in the discussion of validate_drp that I’d like to raise here.

  • How should we set the backend? @jsick points out that we cannot rely on users having a “sensible” ~/.matplotlibrc setting, so we need to set a backend explicitly in our code. What should we do?
  • Should we use the object model? There are many documents on the web that propose that this is the one true way to use matplotlib, but my experience of the sorts of plots that are needed for exploratory data analysis, QA, and testing is that the pyplot interface is sufficient.

I am concerned about solutions to these questions that tie us more firmly to matplotlib.

I’d like to propose that:

  • We wrap any selection of a backend into a utility function that hides the matplotlib specifics from user code
  • We adopt the matplotlib.pyplot interface. I don’t think it’s worth writing a layer that pretends we are plotting-package agnostic, but using pyplot is pretty generic, and I can imagine moving on without too much agony (and I bet that the next-best Python plotting package will support 90+% of simple pyplot calls)
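A minimal sketch of what the first proposal's utility function might look like. The helper name `use_plot_backend` and the purpose names ("file", "notebook") are hypothetical, chosen only for illustration; `matplotlib.use` and `matplotlib.get_backend` are the real Matplotlib calls being hidden.

```python
import matplotlib

# Hypothetical mapping from what the caller wants to do to a Matplotlib
# backend name, so that user code never has to spell a backend name itself.
_BACKENDS = {
    "file": "agg",        # non-interactive, writes PNG/PDF files
    "notebook": "nbagg",  # interactive figures in a classic Jupyter notebook
}


def use_plot_backend(purpose="file"):
    """Select a Matplotlib backend for the given purpose and return its name.

    Hypothetical helper: the purpose names are illustrative, not an agreed API.
    """
    try:
        backend = _BACKENDS[purpose]
    except KeyError:
        raise ValueError(f"unknown plotting purpose {purpose!r}; "
                         f"expected one of {sorted(_BACKENDS)}") from None
    matplotlib.use(backend)
    return backend
```

Pipeline code would then call `use_plot_backend("file")`, and swapping plotting packages later only means changing this one function.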

A very common thing (including in the astropy world) is to do

import matplotlib
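This idiom is typically completed by calling `matplotlib.use` before `matplotlib.pyplot` is imported for the first time. A minimal sketch, assuming the non-interactive Agg backend (the usual choice when writing plot files on machines without a display):

```python
import matplotlib

# Choose the non-interactive Agg backend *before* pyplot is imported, so that
# no display (X11, macOS window server, ...) is needed to create figures.
matplotlib.use("Agg")

import matplotlib.pyplot as plt  # noqa: E402  (must come after matplotlib.use)

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
backend_name = matplotlib.get_backend()
plt.close(fig)
```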


In my experience, pyplot is fine for interactive analysis. However, the LSST Stack is not about interactive analysis; it’s a production-quality product that is most often used in a non-interactive computational context. I’ve personally ‘enjoyed’ using the full object-oriented API. The following snippet (I keep it stored in Vim as a template) lets you specify the backend not as a global setting but as an actual object, the FigureCanvas, and also gives you a GridSpec, which is awesome for composing plots:

from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
import matplotlib.gridspec as gridspec

plot_path = "my_plot"  # placeholder: output path without the extension

fig = Figure(figsize=(3.5, 3.5), frameon=False)
# Attach an Agg canvas to this figure only; the global backend is untouched
canvas = FigureCanvas(fig)
gs = gridspec.GridSpec(1, 1,
                       left=0.15, right=0.95, bottom=0.15, top=0.95,
                       wspace=None, hspace=None,
                       width_ratios=None, height_ratios=None)
ax = fig.add_subplot(gs[0])
# ax.plot(...)
canvas.print_figure(plot_path + ".png", format="png")

Doing this gives you full control over your plots, and you don’t pollute the global Matplotlib state by changing the default backend. That matters when a user has imported an LSST Python module as part of their own plotting code and is using a different backend.

Another recommendation is decomposing the plot code as much as possible such that the ax.plot(…) code is delegated to its own function (taking the axes as an argument). This helps separate the logic of plotting on an axes from the act of writing a plot to disk. Indeed, someone could call the axes plotting function from a Jupyter notebook and use a notebook-friendly backend rather than saving to a file.
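A minimal sketch of that split, with hypothetical function names `plot_sine` (draws on whatever Axes it is given) and `save_plot` (owns the Figure, the Agg canvas, and the file). Only the standard Matplotlib `Figure` and `FigureCanvasAgg` classes are used.

```python
import os
import tempfile

import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def plot_sine(ax, n_points=50):
    """Draw on an existing Axes; knows nothing about backends or files."""
    x = np.linspace(0, 2 * np.pi, n_points)
    ax.plot(x, np.sin(x), "r-o")
    ax.set_xlabel("x")
    ax.set_ylabel("sin(x)")
    return ax


def save_plot(plot_func, path, figsize=(3.5, 3.5)):
    """Create a figure with its own Agg canvas, let plot_func fill it, write it to path."""
    fig = Figure(figsize=figsize)
    canvas = FigureCanvasAgg(fig)
    ax = fig.add_subplot(1, 1, 1)
    plot_func(ax)
    canvas.print_figure(path, format="png")
    return fig


out_path = os.path.join(tempfile.mkdtemp(), "sine.png")
fig = save_plot(plot_sine, out_path)
```

In a notebook, the same drawing function could be reused with whatever backend is active, e.g. `fig, ax = plt.subplots(); plot_sine(ax)`, without touching `save_plot` at all.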

Finally, I’d recommend against using matplotlib.pylab.rcParams to set the plot’s style.

matplotlib.pylab.rcParams is a bit of an anti-pattern for any code that could be imported by another Python user, since these rcParams change Matplotlib’s global state, and therefore change how a user’s own plots might look in the same Python session. I think we should move towards using Matplotlib’s stylesheets. Essentially you’d put all the styles in a file embedded in an LSST eups package. Then while plotting you’d apply the style sheet with a context manager, e.g.,

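import numpy as np
import matplotlib.pyplot as plt

# "ggplot" is a built-in stand-in; the path to an LSST style sheet would go here
with plt.style.context("ggplot"):
    plt.plot(np.sin(np.linspace(0, 2*np.pi)), 'r-o')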
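A sketch of the file-based variant, assuming a hypothetical style sheet named `lsst.mplstyle` with illustrative contents. To stay self-contained it writes the file to a temporary directory; a real package would instead locate the file relative to its own install location. `plt.style.context` accepts a path to a `.mplstyle` file as well as a built-in style name.

```python
import os
import tempfile

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical contents of a style sheet that an LSST package might ship.
# .mplstyle files use the same "key : value" syntax as matplotlibrc.
STYLE_TEXT = """\
lines.linewidth : 2.5
axes.grid : True
font.size : 12
"""

style_path = os.path.join(tempfile.mkdtemp(), "lsst.mplstyle")
with open(style_path, "w") as f:
    f.write(STYLE_TEXT)

linewidth_before = mpl.rcParams["lines.linewidth"]

# The style applies only inside the with-block; the global rcParams are
# restored on exit, so a user's own plots are unaffected.
with plt.style.context(style_path):
    linewidth_inside = mpl.rcParams["lines.linewidth"]
    fig, ax = plt.subplots()
    ax.plot(np.sin(np.linspace(0, 2 * np.pi)), "r-o")
    plt.close(fig)

linewidth_after = mpl.rcParams["lines.linewidth"]
```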

Indeed, the same idiom:

import matplotlib

is what we use in MAF to make it possible to run unit tests on the build bot, where we don’t have all the backends.

I’ve grown to like the object-oriented plotting style, but it often breaks in notebooks, which is annoying.

I think it would be great if we had an eups package with some pre-made stylesheets and maybe a few preferred color tables. We could have one stylesheet for “publication-ready” plots and another for PowerPoint slides.
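A sketch of how purpose-specific styles could be selected by name. The `STYLES` dict, its values, and the `styled` helper are all hypothetical; in practice each entry could be an `.mplstyle` file in the package. `plt.style.context` accepts a plain dict of rcParams as well as a file path or style name.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

# Hypothetical purpose-specific styles; the values are illustrative only.
STYLES = {
    "publication": {"font.size": 8, "lines.linewidth": 1.0,
                    "figure.figsize": (3.5, 2.6), "savefig.dpi": 300},
    "slides": {"font.size": 18, "lines.linewidth": 3.0,
               "figure.figsize": (10, 7.5)},
}


def styled(purpose):
    """Return a context manager applying the named style (hypothetical helper)."""
    return plt.style.context(STYLES[purpose])


with styled("slides"):
    slide_font = mpl.rcParams["font.size"]
with styled("publication"):
    paper_font = mpl.rcParams["font.size"]
```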


I’d like to propose that three people who disagree be assigned to agree on a policy, and that we then implement it across the project.

I’m somewhat unhappy with almost any approach from an abstract perspective. But I’m happy to implement any specific standardized practice, even if I dislike it, in the interests of it being standard and such that it can be improved/wrapped/refactored later.

Interestingly, I filed an Epic last week about exactly this topic:

I wonder if we wouldn’t be better off adopting Seaborn or some other “frontend” to matplotlib that takes care of a lot of the “make default values better” things, while providing extra features.


@parejkoj It’d be interesting to compare Matplotlib stylesheets vs Seaborn as a way of getting good defaults.

The seaborn styles are built into core matplotlib itself now (try, or see previews here).

Lots of reasons to love and use seaborn, but if you just want the styles, you don’t have to add it as a dependency.
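A quick way to find and apply the bundled seaborn-like styles without importing seaborn. Since Matplotlib 3.6 these styles are named `seaborn-v0_8-*` (earlier releases used `seaborn-*`), so the sketch filters `plt.style.available` rather than hard-coding a name.

```python
import matplotlib.pyplot as plt

# Style names bundled with Matplotlib itself; no seaborn import is needed.
seaborn_styles = sorted(s for s in plt.style.available if "seaborn" in s)
print(seaborn_styles)

# Apply one of them temporarily, without installing or importing seaborn.
with plt.style.context(seaborn_styles[0]):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [2, 0, 1])
    plt.close(fig)
```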

seaborn is nice indeed, but it is not pre-installed with Anaconda IIRC.

I’m happy with @parejkoj’s proposal for a thin wrapper, which is (I think) consistent with my view that we should only use the pyplot-style interface in pipeline code.

I don’t quite understand @jsick’s comment re pyplot:

We agree that the stack isn’t interactive; the question is whether a pyplot-style interface is sufficient for what we need to do, and I think the answer is yes.

How should the backend be set by the Stack?

Let me rephrase: my argument against pyplot isn’t so much about the flexibility of its plotting interface. Indeed, once you do fig, ax = plt.subplots() you get an Axes instance, and you’re effectively doing ‘object-oriented style’ plotting anyway.
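A small illustration of that hybrid style: pyplot creates the Figure and Axes, and everything after that goes through the Axes methods. The data and title are arbitrary.

```python
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

fig, ax = plt.subplots()           # pyplot creates the figure...
ax.plot([1, 2, 3], [1, 4, 9])      # ...but all drawing goes through the Axes object
ax.set_title("pyplot-created, object-oriented drawing")
is_axes = isinstance(ax, Axes)
plt.close(fig)  # release pyplot's global reference to the figure
```

The only pyplot-specific pieces left are `plt.subplots()` and `plt.close()`, which is exactly where the global backend state comes in.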

My objection to pyplot is that it doesn’t let you choose the backend except globally, at import time, i.e.,

import matplotlib

I’m thinking of the case where we hard-code Agg for the pipeline to produce plot files, then somebody imports the LSST Stack in a Jupyter notebook and wants to use the nbagg backend. Will those global backend preferences conflict? The matplotlib.use() docs tell us that the global backend can only be set once.

Maybe it’s a non-issue, but I’m thinking this could be a source of confusion and bugs—especially considering that the science community, not DM, may be the primary users of the Stack inside Jupyter notebooks.

Perhaps it would be appropriate to introduce an LSST_MPL_BACKEND environment variable so that we could do something like

import os
import matplotlib
matplotlib.use(os.getenv('LSST_MPL_BACKEND', 'agg'))
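One way to soften the notebook conflict described above is to apply the environment variable only when nobody has chosen a backend yet. A minimal sketch, assuming the proposed (not existing) `LSST_MPL_BACKEND` variable and a hypothetical `choose_backend` helper. Treating an already-imported `matplotlib.pyplot` as a sign that the user has made their own choice is a heuristic, not a Matplotlib rule.

```python
import os
import sys

import matplotlib


def choose_backend(default="agg", env_var="LSST_MPL_BACKEND"):
    """Pick a backend from an environment variable, unless one is already in use.

    If matplotlib.pyplot has already been imported (e.g. a notebook session
    that enabled its own backend), leave that choice alone and report it.
    """
    if "matplotlib.pyplot" in sys.modules:
        return matplotlib.get_backend()
    backend = os.getenv(env_var, default)
    matplotlib.use(backend)
    return backend
```

With this guard, a pipeline run gets Agg (or whatever the variable says), while a notebook user who has already set up an interactive backend keeps it.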

Consistent context-based styling

Lastly, my preference for reducing the impact of global settings is also why I’d suggest we deprecate the use of matplotlib.pyplot.rc or matplotlib.rcParams for setting plot styles in our code.

Instead, we should be using context managers for styles, either via matplotlib.pyplot.rc_context or stylesheets.
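A sketch of the rc_context option. Besides the `with` form, `rc_context` can also decorate a plotting function, so the temporary settings apply exactly while that function runs. The function name `plot_with_local_style` and the rc values are illustrative.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

width_before = mpl.rcParams["lines.linewidth"]


@plt.rc_context({"lines.linewidth": 3, "axes.grid": True})
def plot_with_local_style(ax):
    """Draw with temporary rc settings; the global rcParams are restored afterwards."""
    ax.plot([0, 1, 2], [0, 1, 0])
    return mpl.rcParams["lines.linewidth"]


fig, ax = plt.subplots()
width_inside = plot_with_local_style(ax)
width_outside = mpl.rcParams["lines.linewidth"]
line_width = ax.lines[0].get_linewidth()
plt.close(fig)
```

The line keeps its width of 3 after the function returns, because Line2D reads rcParams when it is created, while the session-wide default goes back to what it was.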

I’ve taken another look at Seaborn after @ebellm mentioned it and I do like what they’ve done to make complex things like pairplot and jointplot trivial. Seaborn also supports context-based styling. For Seaborn we could create a Python package for LSST that merely contains a Seaborn-compatible dict with styles so that we can establish a consistent visual language across all of DM’s plots.

Yep, I think this is absolutely essential. I’d add that this Python package should be not only an EUPS package but also a standalone Python package, so that we can get visually consistent LSST plots outside of the EUPS/Stack context too.