matplotlib isn’t a great interface, but it is (still?) the de facto standard in the SciPy ecosystem, and we are using it extensively.
Some issues have come up in the discussion of validate_drp that I’d like to raise here.
How should we set the backend? @jsick points out that we cannot rely on users having “sensible” ~/.matplotlibrc settings, so we need to set a backend explicitly in our code. How should we do that?
Should we use the object model? There are many documents on the web that propose that this is the one true way to use matplotlib, but my experience of the sorts of plots that are needed for exploratory data analysis, QA, and testing is that the pyplot interface is sufficient.
I am concerned about solutions to these questions that tie us more firmly to matplotlib.
I’d like to propose that:
We wrap any selection of a backend in a utility function that hides the matplotlib specifics from user code.
We adopt the matplotlib.pyplot interface. I don’t think it’s worth writing a layer that pretends we are plotting-package agnostic, but pyplot is pretty generic, and I can imagine moving on without too much agony (I bet that the next best Python plotting package will support 90+% of simple pyplot calls).
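A minimal sketch of the first proposal, assuming a hypothetical helper named `use_backend` (not an existing LSST function) and an assumed default of Agg for non-interactive runs. The point is that the one matplotlib-specific call lives in a single place:

```python
import matplotlib

# Assumed default for non-interactive pipeline runs; not an agreed LSST convention.
DEFAULT_BACKEND = "Agg"


def use_backend(name=None):
    """Select a plotting backend so that user code never calls matplotlib.use().

    `use_backend` is a hypothetical helper used for illustration. Call it
    before matplotlib.pyplot is imported anywhere in the process, because
    the backend is process-wide state.
    Returns the name of the backend that is active afterwards.
    """
    matplotlib.use(name if name is not None else DEFAULT_BACKEND)
    return matplotlib.get_backend()
```

If the project ever moves away from matplotlib, only this function (and not every caller) would need to change.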
In my experience, pyplot is fine for interactive analysis. However, the LSST Stack is not about interactive analysis; it’s a production-quality product that is most often used in a non-interactive computational context. I’ve personally ‘enjoyed’ using the full object-oriented API. The following snippet (I keep it stored in Vim as a template) lets you specify the backend not as a global setting but as an actual object, via the FigureCanvas, and also gives you a GridSpec, which is awesome for composing plots:
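A sketch of that kind of template (a reconstruction under assumptions, not the original snippet): the function name `make_figure`, the figure size, and the two-panel GridSpec layout are illustrative choices. The Agg canvas is attached to this one Figure, so pyplot and matplotlib's global backend are never touched:

```python
import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure
from matplotlib.gridspec import GridSpec


def make_figure(output):
    """Build a two-panel figure without pyplot or a global backend.

    `output` may be a file name or a writable binary file object.
    """
    fig = Figure(figsize=(6, 5))
    # The backend is an explicit canvas object attached to this figure only.
    canvas = FigureCanvas(fig)

    # Two stacked panels, the top one three times taller (arbitrary layout).
    gs = GridSpec(2, 1, height_ratios=[3, 1], hspace=0.3)
    ax_top = fig.add_subplot(gs[0])
    ax_bottom = fig.add_subplot(gs[1], sharex=ax_top)

    x = np.linspace(0, 2 * np.pi, 100)
    ax_top.plot(x, np.sin(x), "k-")
    ax_top.set_ylabel("sin(x)")
    ax_bottom.plot(x, np.cos(x), "r--")
    ax_bottom.set_xlabel("x")
    ax_bottom.set_ylabel("cos(x)")

    # Render through the Agg canvas. The format comes from the file name,
    # or from rcParams["savefig.format"] (PNG by default) for file objects.
    canvas.print_figure(output)
    return fig
```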
Doing this gives you full control over your plots, and it avoids polluting the global matplotlib state by changing Matplotlib’s default backend. That matters when a user has imported an LSST Python module as part of their own plotting code and is using a different backend.
Another recommendation is decomposing the plot code as much as possible such that the ax.plot(…) code is delegated to its own function (taking the axes as an argument). This helps separate the logic of plotting on an axes from the act of writing a plot to disk. Indeed, someone could call the axes plotting function from a Jupyter notebook and use a notebook-friendly backend rather than saving to a file.
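One way that split could look, as a sketch with hypothetical function names (`plot_residuals`, `write_residual_plot`): the first function only draws on an Axes it is given, the second owns figure creation and the write to disk:

```python
import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def plot_residuals(ax, x, y, model):
    """Draw residuals onto an existing Axes; knows nothing about files or backends."""
    ax.plot(x, y - model, "k.")
    ax.axhline(0.0, color="r", linestyle="--")
    ax.set_xlabel("x")
    ax.set_ylabel("data - model")
    return ax


def write_residual_plot(output, x, y, model):
    """Own the figure lifecycle: create it, delegate the drawing, write it out."""
    fig = Figure(figsize=(5, 4))
    FigureCanvasAgg(fig)  # attach a non-interactive canvas to this figure only
    ax = fig.add_subplot(1, 1, 1)
    plot_residuals(ax, np.asarray(x), np.asarray(y), np.asarray(model))
    fig.savefig(output)
    return fig
```

In a notebook, the same `plot_residuals` could be called on an Axes from `fig, ax = plt.subplots()`, under whatever backend the notebook is using.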
Finally, I’d recommend against using matplotlib.pylab.rcParams to set the plot’s style.
matplotlib.pylab.rcParams is a bit of an anti-pattern for any code that could be imported by another Python user, since these rcParams change Matplotlib’s global state and therefore change how a user’s own plots might look in the same Python session. I think we should move towards using Matplotlib’s style sheets. Essentially, you’d put all the styles in a file embedded in an LSST EUPS package. Then, while plotting, you’d apply the style sheet with a context manager, e.g.,
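```python
import matplotlib.pyplot as plt
import numpy as np

# "ggplot" is a built-in matplotlib style standing in for an LSST style sheet;
# it applies only inside the with-block.
with plt.style.context("ggplot"):
    plt.plot(np.sin(np.linspace(0, 2 * np.pi)), "r-o")
```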
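A sketch of the “file embedded in a package” part, assuming a hypothetical style file named `lsst.mplstyle`. `plt.style.context` accepts a file path as well as a built-in style name, and restores the previous rcParams when the block exits. To stay runnable, this example writes the file to a temporary directory instead of reading it from a real package:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the example runs anywhere
import matplotlib.pyplot as plt

# Hypothetical contents of a style file shipped inside an LSST package.
STYLE_TEXT = """\
lines.linewidth: 2.5
axes.grid: True
font.size: 11
"""


def packaged_style_path():
    """Return the path of the style file.

    In a real package this might be
    os.path.join(os.path.dirname(__file__), "lsst.mplstyle"), with a
    hypothetical file name; here the file is written to a temp directory.
    """
    path = os.path.join(tempfile.mkdtemp(), "lsst.mplstyle")
    with open(path, "w") as f:
        f.write(STYLE_TEXT)
    return path


def styled_line_width():
    """Draw one line with the style applied and report the width it received."""
    with plt.style.context(packaged_style_path()):
        fig, ax = plt.subplots()
        (line,) = ax.plot([0, 1, 2], [0, 1, 4])
        plt.close(fig)
    return line.get_linewidth()
```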
Explicitly selecting a non-interactive backend (e.g. Agg) before pyplot is imported is what we use in MAF to make it possible to run unit tests on the build bot, where we don’t have all the backends.
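A minimal sketch of that pattern in a unit test (an illustration, not MAF’s actual code; `PlotSmokeTest` is a hypothetical name). The essential line is `matplotlib.use("Agg")` ahead of the pyplot import, so the test never needs a display:

```python
import io
import unittest

import matplotlib
matplotlib.use("Agg")  # must run before pyplot is imported
import matplotlib.pyplot as plt  # noqa: E402


class PlotSmokeTest(unittest.TestCase):
    def test_plot_renders_png(self):
        fig, ax = plt.subplots()
        ax.plot([0, 1, 2], [1, 0, 1])
        buf = io.BytesIO()
        fig.savefig(buf, format="png")  # render in memory, no file or display
        plt.close(fig)
        self.assertTrue(buf.getvalue().startswith(b"\x89PNG"))
```

Saved as a test module, this runs with `python -m unittest` or pytest on a headless machine.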
I’ve grown to like object-oriented-style plotting, but it often breaks in notebooks, which is annoying.
I think it would be great if we had an EUPS package with some pre-made stylesheets and maybe a few preferred color tables. We could have a stylesheet for “publication-ready” plots and another for PowerPoint slide plots.
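A sketch of how such presets might be organized, assuming plain rcParams dicts keyed by hypothetical preset names; every size and color below is a placeholder, not an agreed LSST choice. `matplotlib.rc_context` applies a preset only for the duration of a with-block:

```python
import matplotlib
from cycler import cycler  # cycler is installed as a matplotlib dependency

# Placeholder color table shared by the presets.
COLOR_CYCLE = cycler(color=["#1b9e77", "#d95f02", "#7570b3"])

# Hypothetical presets; names and values are illustrative only.
STYLES = {
    "publication": {
        "figure.figsize": (3.5, 2.6),
        "font.size": 8,
        "lines.linewidth": 1.0,
        "savefig.dpi": 300,
        "axes.prop_cycle": COLOR_CYCLE,
    },
    "slides": {
        "figure.figsize": (10.0, 7.5),
        "font.size": 18,
        "lines.linewidth": 2.5,
        "axes.prop_cycle": COLOR_CYCLE,
    },
}


def lsst_style(name):
    """Return a context manager that applies the named preset inside a with-block."""
    return matplotlib.rc_context(rc=STYLES[name])
```

Usage would be `with lsst_style("slides"): ...`. Because the presets are plain dicts, the same mapping could in principle also be passed to seaborn through its `rc` arguments.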
I’d like to propose that three people who disagree be assigned to settle on a policy and make a decision, and that we then implement it across the project.
I’m somewhat unhappy with almost any approach from an abstract perspective. But I’m happy to implement any specific standardized practice, even if I dislike it, in the interest of having a standard that can be improved, wrapped, or refactored later.
Let me rephrase: my argument against pyplot isn’t so much about the flexibility of its plotting interface. Indeed, once you do fig, ax = plt.subplots() you get an Axes instance and you’re effectively doing ‘object-oriented style’ plotting anyway.
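A short illustration of that point: pyplot is used only to create the Figure and Axes, and everything after that is a method call on those objects. Agg is selected here only so the example runs headless:

```python
import matplotlib
matplotlib.use("Agg")  # headless; any backend behaves the same for this example
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.figure import Figure

# pyplot creates the objects; the rest goes through their methods.
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], "o-")
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.close(fig)  # drop pyplot's reference to the figure when done
```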
My reason for recommending against pyplot is that it doesn’t let you choose the backend except at the global import level, i.e.,
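A sketch of why that global choice worries me, using two hypothetical stand-in functions: one plays an imported library that hard-codes a backend, the other plays a user’s own pyplot code in the same process, which silently inherits the library’s choice:

```python
import matplotlib


def library_module_setup():
    """Stand-in for an imported module that hard-codes a backend (hypothetical)."""
    matplotlib.use("Agg")


def backend_seen_by_user_code():
    """Stand-in for a user's own plotting code elsewhere in the same process."""
    import matplotlib.pyplot as plt

    fig = plt.figure()  # created with whatever backend is globally active
    plt.close(fig)
    return matplotlib.get_backend()


library_module_setup()
# The user never asked for Agg, but it is now the backend for all pyplot code.
active = backend_seen_by_user_code()
```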
I’m thinking of the case where we hard-code Agg for the pipeline to produce plot files, then somebody imports the LSST Stack in a Jupyter notebook and wants to use the nbagg backend. Will those global backend preferences conflict? The matplotlib.use() docs tell us that the global backend can only be set once.
Maybe it’s a non-issue, but I think this could be a source of confusion and bugs, especially since the science community, not DM, may be the primary users of the Stack inside Jupyter notebooks.
Perhaps it would be appropriate to introduce an LSST_MPL_BACKEND environment variable so that we could do something like
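A minimal sketch of how the proposed variable might be read, assuming a hypothetical helper `configure_backend` and an Agg fallback for pipeline runs when the variable is unset:

```python
import os

import matplotlib

# LSST_MPL_BACKEND is the variable proposed above; the "Agg" fallback is an
# assumption for non-interactive pipeline runs.
ENV_VAR = "LSST_MPL_BACKEND"


def configure_backend(default="Agg"):
    """Select the backend named by LSST_MPL_BACKEND, or `default` if it is unset.

    `configure_backend` is a hypothetical helper name.
    """
    backend = os.environ.get(ENV_VAR) or default
    matplotlib.use(backend)
    return backend
```

A notebook user could then set `LSST_MPL_BACKEND=nbagg` before importing the Stack, while pipeline runs keep Agg. matplotlib itself also reads a generic `MPLBACKEND` variable; an LSST-specific one would let the Stack’s default differ from a user’s global setting.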
I’ve taken another look at Seaborn after @ebellm mentioned it and I do like what they’ve done to make complex things like pairplot and jointplot trivial. Seaborn also supports context-based styling. For Seaborn we could create a Python package for LSST that merely contains a Seaborn-compatible dict with styles so that we can establish a consistent visual language across all of DM’s plots.
Yep, I think this is absolutely essential. I’d add that this Python package should not only be an EUPS package but also be a setup.py-installable package so that we can get visually consistent LSST plots outside of the EUPS/Stack context too.