Refactoring rubin_sim for 1.0 release

yoachim · July 29, 2022, 12:20am

Hi all,

We are planning on refactoring rubin_sim in the coming months. The rubin_sim repo is now home to the scheduler algorithm, MAF (for analyzing simulated surveys), photUtils, our sky brightness model, and more. This represents a large collection of code written by numerous authors over many years. As such, the code does not follow consistent naming and style conventions, making it tough for new (and old) users to read and contribute.

Our plan is to go through and refactor the code to follow the standard PEP-8 conventions. Namely:
• Python classes use PascalCase
• methods, functions, and variables use snake_case
• We will continue to run black on the code to enforce consistent style

While we’re refactoring, figured we might as well also:
• change all file and directory names to lower case
• shell command flags should all be --lower or --snake_case
• make sure all internal variables that are angles are in radians and have _rad on them
• Make other useful name changes for clarity (maybe Core_scheduler → ObservationScheduler)
• Update the database schema used for saving simulated observations
• Update docstrings and unit tests

This post is here to:

Warn users that if you have code that uses rubin_sim we will almost certainly be breaking it in the near future by renaming things. The code is on GitHub and conda-forge, so you can always install a previous version if you want to run your old code without updating.
Solicit any comments or suggestions. If anyone has good ideas on what something should be renamed, or if you’d like to help beta-test after the refactor we’d love to hear it.

timj · November 7, 2022, 4:02pm

Or use astropy units and angle?

yoachim · November 7, 2022, 6:34pm

I always worry about speed when moving things to astropy. And I don’t like forcing users to import astropy and set up their args as things with units. But I should probably give it a try in a few places and see if there’s a speed penalty.

timj · November 7, 2022, 6:54pm

We use astropy.units in astro_metadata_translator and all DM now depends on the astropy package. @parejkoj may also have some opinions on requiring users to pass in astropy quantities rather than floats.

parejkoj · November 7, 2022, 7:16pm

Are you doing units calculations on lots (thousands) of single numbers, just a handful of numbers, or on arrays of numbers? The only place where astropy units has any significant performance penalty is e.g. for x in array: x.to(u.otherUnit) type of calculations. array.to(u.otherUnit) is comparable in speed to doing the raw math with numpy.

yoachim · November 7, 2022, 8:21pm

Usually lots of large array calculations.

timj · November 7, 2022, 8:37pm

That’s fine. Astropy Quantity is designed to allow you to attach units to numpy arrays.

ehneilsen · November 7, 2022, 9:01pm

Doing this effectively is sometimes tricky. I’ve spent too much time tracking down performance bugs related to this, for example because an instance of SkyCoord with arrays of coordinates got unexpectedly transformed into a numpy array of objects, each of which was its own separate instance of SkyCoord, with unfortunate consequences. There are also issues with playing nice with pandas.

My point is not that we shouldn’t do it, I am personally ambivalent. We should just be aware that the cost in learning to use it effectively is higher than it appears, and we should expect to occasionally expect to need to track down related performance issues. It may not even be a bad idea make a habit of occasionally running a profiler to verify that only a small fraction of time is spent in units related code, so that we can refactor things when we’ve done them stupidly.

ljones · November 29, 2022, 12:38am

This has been completed and merged!
v1.0.0 of rubin_sim includes these changes; v0.14.0 is “the same-ish but pre-refactor”.