We are planning on refactoring rubin_sim in the coming months. The rubin_sim repo is now home to the scheduler algorithm, MAF (for analyzing simulated surveys), photUtils, our sky brightness model, and more. This represents a large collection of code written by numerous authors over many years. As such, the code does not follow consistent naming and style conventions, making it tough for new (and old) users to read and contribute.
Our plan is to go through and refactor the code to follow the standard PEP-8 conventions. Namely:
• Python classes use PascalCase
• methods, functions, and variables use snake_case
• We will continue to run black on the code to enforce consistent style
While we’re refactoring, figured we might as well also:
• change all file and directory names to lower case
• shell command flags should all be --lower or --snake_case
• make sure all internal variables that are angles are in radians and have _rad on them
• Make other useful name changes for clarity (maybe Core_scheduler → ObservationScheduler)
• Update the database schema used for saving simulated observations
• Update docstrings and unit tests
This post is here to:
Warn users that if you have code that uses rubin_sim we will almost certainly be breaking it in the near future by renaming things. The code is on GitHub and conda-forge, so you can always install a previous version if you want to run your old code without updating.
Solicit any comments or suggestions. If anyone has good ideas on what something should be renamed, or if you’d like to help beta-test after the refactor we’d love to hear it.
I always worry about speed when moving things to astropy. And I don’t like forcing users to import astropy and set up their args as things with units. But I should probably give it a try in a few places and see if there’s a speed penalty.
We use astropy.units in astro_metadata_translator and all DM now depends on the astropy package. @parejkoj may also have some opinions on requiring users to pass in astropy quantities rather than floats.
Are you doing units calculations on lots (thousands) of single numbers, just a handful of numbers, or on arrays of numbers? The only place where astropy units has any significant performance penalty is e.g. for x in array: x.to(u.otherUnit) type of calculations. array.to(u.otherUnit) is comparable in speed to doing the raw math with numpy.
Doing this effectively is sometimes tricky. I’ve spent too much time tracking down performance bugs related to this, for example because an instance of SkyCoord with arrays of coordinates got unexpectedly transformed into a numpy array of objects, each of which was its own separate instance of SkyCoord, with unfortunate consequences. There are also issues with playing nice with pandas.
My point is not that we shouldn’t do it, I am personally ambivalent. We should just be aware that the cost in learning to use it effectively is higher than it appears, and we should expect to occasionally expect to need to track down related performance issues. It may not even be a bad idea make a habit of occasionally running a profiler to verify that only a small fraction of time is spent in units related code, so that we can refactor things when we’ve done them stupidly.