Baby steps towards a more Pythonic API

This is in some sense an alternate proposal to the one on the table for RFC-81, but I didn’t want to hijack that issue page (and I wanted to get some more Discourse experience).

Before we start a big, stack-wide effort to improve our Python interfaces with a fundamental change to how we map C++ to Python, as proposed in RFC-81, I think we should consider actually implementing a number of changes that are either easier or less controversial, and then reassess where we stand with respect to providing a quality Python interface.

My list of such changes:

  • Turn on Swig’s support for keyword arguments in Python. This should give us keyword argument support for all non-overloaded methods, for very little additional effort.
  • Add properties (via Swig %extend blocks) for the most frequently-used getters and setters, including (but not limited to):
    • Image.array
    • MaskedImage.[image,mask,variance]
    • Point.[x,y]
    • Box.[min,max]
  • Replace heavily-overloaded constructors in afw.image with static method factories. This should produce more readable code in both languages, and avoiding overloading here will allow us to use keyword arguments automatically.
  • Replace usage of std::vector<T> for numeric scalar types in frequently used C++ classes with ndarray::Array (which will naturally convert to numpy.ndarray in Python).
  • Identify the top N (N=5?) most important unpythonic interfaces in need of improvement, and either:
    • attempt a C++ refactor with the aim of improving both interfaces (if the C++ interface is in bad shape, too)
    • add Swig %extend blocks to customize individual methods. This can include the addition of new helper methods or adding support for keyword arguments via %feature("shadow").
  • Generate reference documentation for Python code by introspecting the Python modules themselves (which would allow us to include docstrings added in the Swig layer, which are ignored entirely by Doxygen). I think this is relatively straightforward to do with Sphinx, at least as a proof-of-concept.
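
To make the property and factory items above a bit more concrete, here is a rough pure-Python sketch of what the user-facing result might look like. The Point and Image classes below are stand-ins I made up for illustration, not the real afw classes, and the factory names (fromDimensions, fromRows) are hypothetical. In practice the property lines would presumably live in a %pythoncode section of a Swig %extend block rather than in the class body.

```python
class Point:
    """Stand-in for a Swig-generated proxy with C++-style accessors."""

    def __init__(self, x=0.0, y=0.0):
        self._x = float(x)
        self._y = float(y)

    def getX(self):
        return self._x

    def setX(self, value):
        self._x = float(value)

    def getY(self):
        return self._y

    def setY(self, value):
        self._y = float(value)

    # Properties layered on top of the existing getters and setters,
    # so old code calling getX()/setX() keeps working unchanged.
    x = property(getX, setX, doc="x coordinate")
    y = property(getY, setY, doc="y coordinate")


class Image:
    """Stand-in image class; pixels are stored as a list of rows."""

    def __init__(self, rows):
        self._rows = [list(row) for row in rows]

    # Hypothetical named factories in place of overloaded constructors.
    # Because they are not overloaded, keyword arguments work naturally.
    @staticmethod
    def fromDimensions(width, height, fill=0):
        return Image([[fill] * width for _ in range(height)])

    @staticmethod
    def fromRows(rows):
        return Image(rows)

    def getArray(self):
        return self._rows

    # Read-only property wrapping the getter.
    array = property(getArray, doc="pixel data as a list of rows")


p = Point(1.0, 2.0)
p.x = 3.5  # goes through setX
img = Image.fromDimensions(width=3, height=2, fill=7)
print(p.x, p.y, img.array)
```

The point of the sketch is only that both changes are additive: the getters and setters stay in place, so nothing that already uses them has to change.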

My personal hope for our C++/Python bindings is to move away from Swig towards Boost.Python or something like it, so I’m not suggesting we go crazy in trying to use Swig %extend blocks across the full stack. But I think we should go far enough to compare that approach to a shim or Boost.Python approach to customizing the mapping. More importantly, I think it’s very important we disentangle the multiple different factors that currently contribute to our poor Python interfaces:

  • interfaces that are bad in both languages
  • interfaces that are natural in C++ that simply don’t translate well to Python (regardless of tools)
  • interfaces that are bad because of the limitations of the tools we’re using to map from C++ to Python
  • interfaces that are bad because of the way we’re using those tools

This all sounds great. I think that there are two parts to the RFC. One is an agreement that we actually need to change the Python interface, and the second is how we go about doing it.

I do wonder if a class-by-class migration from Swig to Boost.Python would be the way to go.

I put some effort into trying this a long while ago, and I do think it’s an option worth considering (I dropped it before because we didn’t have a consensus to adopt such a big change even if I’d gotten it done). If we don’t want to do it in a single giant change, though, I think we need to solve a tricky technical problem: how to get Boost.Python to recognize and use Swig objects (if we start the conversion at the top of the dependency tree) or vice versa (if we start at the bottom). I have some vague ideas about the latter involving Swig typemaps, but really getting that working well may be harder than just converting everything at once. I think the former may be impossible, at least without a lot of fragile reverse engineering of Swig.

At the risk of exposing my C++ naïveté, have we thoroughly considered what Cython could do for us? The Cython project has recently become more complete and explicit in its C++ support and documentation. See:

I believe SciPy and Astropy, among others, use Cython extensively. I do know we’re at a different level of C++ integration, but it could be worth at least a cursory investigation.

I think it’s fair to say we have not thoroughly considered Cython. The last I looked into it at all was about 2 years ago, and while its C++ support was not in good shape at all then, that’s plenty of time for things to have improved considerably. I think @timj has looked into it since then, but my impression was that it wasn’t from this angle (i.e. as a tool to wrap an existing C++ library).

I also have some philosophical disagreements with some of Cython’s architecture (I don’t like the fact that so much of the logic is in a code generator you can’t easily step through), but that’s not the kind of concern that should be considered a dealbreaker, especially given that all of the other options also have pretty deep problems of one kind or another. And C-level interoperability with other scientific Python libraries could turn out to be useful.