Report from LSST/AstroPy Summit, 3/26-3/27/2016

jbosch · March 31, 2016, 5:33pm

Following the Python in Astronomy and Science Pipelines Working Group Meetings at UW last week, we held a weekend meeting between various DM participants and a few of the more active AstroPy developers.

This report is a high-level summary, focused on action items; more complete notes can be found on the GitHub wiki for the in-progress SPIE Paper on LSST/AstroPy integration.

Packaging: Python end-users typically want to use pip for simplified installation and dependency management; LSST production and DM developers want the added power and flexibility of eups. Several proposals to make these coexist within DM packages seem feasible, with the most likely being (at least short-term) generating setup.py files with dependencies taken from eups table files. It’s essentially up to LSST whether to attempt any of this work (benefit is easier install, cost is work needed to implement). Might need to solve at least some of this for regions spin-off proposal (below).
Tables: LSST and AstroPy both have table libraries, but these have different data models and hence different strengths. LSST’s are better for row-based access and provide a limited ORM layer that’s valuable for pipeline use, while AstroPy’s excel at column-based access and use in interactive analysis. We anticipate being able to do a lot by just creating AstroPy views to LSST table objects, and we were able to put together a proof-of-concept implementation for doing this at the meeting. A more polished implementation will probably be available on the master branch of both products in a month or so.
Images/NDData: LSST’s Exposure and MaskedImage have a lot in common with AstroPy’s NDData hierarchy, as well as a few critical differences: AstroPy lacks “xy0”, has strictly boolean masks, and is more permissive about arithmetic operations on images with metadata. It also has a somewhat more general approach for dealing with uncertainty, with a somewhat less mature implementation. There was a lot of enthusiasm for resolving these differences at the meeting, with the next action being an AstroPy Proposal for Enhancement (APE) to propose making NDData objects behave much more like LSST objects. Given the feedback at the meeting, we expect most aspects of this proposal to be well-received by the AstroPy community, but there are many details to be worked out and other AstroPy stakeholders to be consulted.
Mappings/WCS: there was broad agreement that the astronomy community needs a WCS library and standard based on composable mappings, and that it’d be highly desirable for LSST, AstroPy, and JWST to lead this effort. It is unclear whether the pure-Python gWCS library currently in development will be usable by LSST, but we hope to at least agree on a data model and serialization format(s). It is possible that the serialization format could be broadly similar to a VO standard, perhaps by converting STC-S to YAML. Next step is consulting other stakeholders and discussing possible implementations and formats in the new Astro Mappings category.
Modeling: AstroPy has a set of generic classes for representing modeling and fitting problems, which is used heavily in the gWCS library. LSST is skeptical of general solutions for such a broad domain, but could learn a lot from trying to use these classes to represent the more complex future version of the jointcal fitting problem. AstroPy is eager to learn of any limitations in astropy.modeling that may be revealed by such an experiment, in the hopes they could be addressed.
Regions: AstroPy team presented a nice vision for how geometry classes of different varieties could work together, and LSST team identified a few places where existing LSST code could provide useful implementations and extensions; this is essentially the contents of afw.geom, as well as afw.detection.Footprint. There was broad enthusiasm for “spinning-off” this LSST code into an AstroPy affiliate package (and rewriting other LSST code to use it), but some concern about whether LSST can justify doing the work (there are clear benefits for LSST MREFC mandate, but it’s unclear whether they’re worth the effort). Next step is for LSST to do a more careful accounting of the work involved and decide whether to allocate resources. We also noted that interoperating on spherical geometry and sky pixellization would be valuable, but we didn’t have the right personnel present to discuss these in detail, and we agreed they were lower priority for LSST than getting our Cartesian region code integrated with AstroPy.
Coordinates: AstroPy has a coordinate class that may do everything we need, as long as we can replace our C++ Coord class with a simpler and more efficient spherical point class. Action is for LSST to investigate further whether this is feasible, and if so, schedule the work. May involve submitting some proposals for changes to AstroPy code.
Middleware: LSST gave presentations on Task, pex_config, and Butler functionality, and AstroPy developers were generally very enthusiastic about using them - once ease-of-use, installation, and maturity issues were addressed. These could also be excellent candidates for spin-off to AstroPy, especially because they’re already pure-Python, but (apps) LSST developers didn’t want to speak too much for middleware developers. STScI has their own pipeline-stage system, stpipe, but this is generally less ambitious than LSST’s Tasks, and there’s some early AstroPy development on a Butler-like object (NdMapper) that a Butler AstroPy spin-off might usefully pre-empt. Next action is to bring this to the attention of people like @ktl, @npease, and maybe @mgckind to see if they have any interest in developing LSST middleware tools towards a more general audience.
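
As a rough illustration of the short-term packaging idea above (deriving setup.py dependencies from eups table files), here is a minimal sketch. It assumes dependencies are declared with `setupRequired(...)` and `setupOptional(...)` lines. The function names, the `pip_names` mapping from eups product names to pip package names, and the example table contents are all hypothetical and not part of any existing LSST tool.

```python
import re

# Matches eups table-file dependency declarations such as
#   setupRequired(afw)   or   setupOptional(matplotlib -j)
_DEP_RE = re.compile(r"^\s*setup(Required|Optional)\(\s*([A-Za-z0-9_.\-]+)")


def parse_table_dependencies(table_text):
    """Return (required, optional) product names found in a table file."""
    required, optional = [], []
    for line in table_text.splitlines():
        line = line.split("#", 1)[0]  # ignore comments
        match = _DEP_RE.match(line)
        if not match:
            continue
        kind, name = match.groups()
        (required if kind == "Required" else optional).append(name)
    return required, optional


def setup_kwargs(name, table_text, pip_names=None):
    """Build keyword arguments that a generated setup.py could pass to setup().

    `pip_names` is a hypothetical mapping from eups product names to pip
    package names, for products whose names differ between the two systems.
    """
    pip_names = pip_names or {}
    required, optional = parse_table_dependencies(table_text)
    return {
        "name": name,
        "install_requires": [pip_names.get(dep, dep) for dep in required],
        "extras_require": {
            "optional": [pip_names.get(dep, dep) for dep in optional]
        },
    }


# Hypothetical table file contents, for illustration only.
EXAMPLE_TABLE = """\
# example table file
setupRequired(utils)
setupRequired(numpy)
setupOptional(matplotlib)
"""

kwargs = setup_kwargs("example_pkg", EXAMPLE_TABLE, {"utils": "lsst-utils"})
```

A real solution would also need to handle version constraints and C++ dependencies that pip cannot install, which this sketch ignores.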

rowen · March 31, 2016, 5:59pm

Regarding NDData: the base class can use non-bool masks as long as they can be cast to bool, but the support for named mask planes has not yet jelled and the AstroPy representatives were quite enthusiastic about offering something like our implementation: deeper masks with named mask planes. Nonetheless, they are keen to preserve a simple way of extracting a boolean array for those who don’t want to bother with the details. This suggests the need for some kind of default bitmask to identify mask planes that indicate bad data. This makes me a bit nervous (the last thing I want is users setting and saving mask planes as state in an Exposure-like object), but as long as it is easy to continue to use a more nuanced approach we should be OK.

Perry Greenfield is writing an APE with input from myself and @parejkoj. I think Perry is not at all convinced that the equivalent to our ExposureInfo class should be a separate class from the data (they use our original model for Exposure as an object containing an image, a WCS, and miscellaneous extra information). @parejkoj and I prefer our model because it allows unpersisting the info without the pixel data, which is all many algorithms need (though that won’t really work properly until we figure out how to add the bounding box to ExposureInfo). In any case, there is good hope that NDData will evolve a class we can use as a view to Exposure.

timj · March 31, 2016, 6:03pm

This is exactly what the Starlink NDF standard allowed. You could access bit planes by name or you could specify a bit mask that would effectively turn the bits into a simple boolean mask. It was a really useful feature (for example, people had the option of simply asking for the data array and having the automask applied directly based on the bit mask without having to look at the mask array at all).

rowen · March 31, 2016, 7:46pm

This feature is clearly useful. However, most of our algorithms use different bit masks for different purposes, so I would be very disappointed if the user needed to change some persistent bit of state every time a boolean mask was wanted. Perhaps saving one default mask makes sense, but I would expect that users would not touch that. For non-default behavior the user should create a bit mask and use it as an explicit argument to a function that returns the boolean mask. That’s approximately what we do now, though we probably just apply bit operations to the full mask array. Simple bit operations are simple, and this allows flexibility in how different bit planes are applied.
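
To make the idea concrete, here is a minimal sketch using NumPy. The plane names, bit positions, and function names are made up for illustration and are not the actual LSST or NDData API. The default bit mask is a constant that users would normally leave alone, and any non-default selection is passed as an explicit argument rather than stored as state.

```python
import numpy as np

# Hypothetical plane names and bit positions, for illustration only.
MASK_PLANES = {"BAD": 0, "SAT": 1, "CR": 2, "DETECTED": 3}


def bitmask_for(plane_names, planes=MASK_PLANES):
    """Combine named mask planes into a single integer bit mask."""
    bits = 0
    for name in plane_names:
        bits |= 1 << planes[name]
    return bits


# Default selection of planes that indicate bad data.
DEFAULT_BAD = bitmask_for(["BAD", "SAT", "CR"])


def boolean_mask(mask, bitmask=DEFAULT_BAD):
    """Return True wherever any bit selected by `bitmask` is set in `mask`."""
    return (mask & bitmask) != 0


# A tiny integer mask array: one pixel each with no bits, BAD, CR, DETECTED.
mask = np.array([[0, 1], [4, 8]], dtype=np.uint16)

default_bad = boolean_mask(mask)                   # uses the default planes
only_cr = boolean_mask(mask, bitmask_for(["CR"]))  # explicit, no saved state
```

Nothing persistent changes between the two calls, so different algorithms can ask for different planes without interfering with each other.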