One problem with using pandas in sims is that we haven’t really worked out what is required to add it as a dependency. Pandas is provided by anaconda, so most of our users will have it (since the official LSST python is anaconda, right?), but it is not part of the Jenkins build system (which uses miniconda and thus only adds packages which are approved … although this is also kind of fuzzy). There are other python packages which are not LSST-supported or LSST-distributed third party packages, notably astropy and scipy, which sims can currently use, so it’s been hard to understand where the line is drawn.
The third party packages currently supported are listed at
https://confluence.lsstcorp.org/display/DM/DM+Third+Party+Software
There are three levels of ‘supported third party packages’: (a) official, distributed packages (like pyephem, healpy, sqlalchemy, pymssql), (b) officially used but not distributed packages (ds9 and ws4py – intended to be things we’re using for development but may change in the future?), and (c) third party packages for developer use and not distributed (like scipy).
I think all of the third party packages sims currently uses are listed there, except astropy. Note that we use scipy in an integral way in the sims stack, although it’s only supposed to be for developer use in the third party package lists above.
Note that distributed third party packages are supposed to have an eups-packaged repo in github/lsst (see https://github.com/lsst/sqlalchemy for the sqlalchemy package; this seems to be pretty straightforward in most cases). Scipy and astropy do not currently have third party eups packages.
There is a Confluence page describing how to add third party packages to the stack:
https://confluence.lsstcorp.org/display/LDMDG/Adding+a+new+package+to+the+build
This suggests that first you file an RFC (see https://confluence.lsstcorp.org/display/LDMDG/Discussion+and+Decision+Making+Process) which basically means filing a JIRA ticket, at which point you’re saying you want a particular package and are willing to do the work to make it happen, and to maintain the package for LSST.
If the RFC passes (doesn’t receive any objections), then you create a third party package following the instructions here:
https://confluence.lsstcorp.org/display/LDMDG/Distributing+third-party+packages+with+EUPS … it looks like we could actually do this fairly simply for pandas, if we assume that scipy stays in its “used by developers but not released, so users have to provide this themselves” box (which is not really how we’re actually using it, but perhaps would be good enough? Note that we already require it in sims packages)
and then add it to the lsstsw/etc/repos.yaml file.
So it seems like we could package up astropy and pandas like this, file an RFC, and add them to the distributed third party packages. I think to do it easily we’d have to assume scipy is user-provided (at least, this is my impression from talking to Simon). It does make me wonder why we’re making the effort to do this for these packages (and for sqlalchemy), given that sqlalchemy, astropy and pandas come standard with anaconda. However, it is necessary if we want to support users who are not using anaconda. [Do we have any sims users who are not on anaconda?]
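One way to get a feel for that last question would be to ask users to run a small check in their own python environment. The following is only a sketch: the function name `check_packages` and the list of package names are my own choices for illustration, not part of any LSST tooling, and it only tells you whether a package is importable, not how it was installed.

```python
import importlib
import importlib.util


def check_packages(names):
    """Return {name: version or None}.

    None means the package cannot be imported in this environment.
    'unknown' means it imports but does not expose __version__.
    (Hypothetical helper for illustration only.)
    """
    report = {}
    for name in names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Installed but broken (e.g. a missing compiled dependency).
            report[name] = None
            continue
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    # Package list is an assumption: the ones discussed in this post.
    wanted = ["pandas", "astropy", "scipy", "sqlalchemy"]
    for name, version in check_packages(wanted).items():
        print(f"{name}: {version if version is not None else 'MISSING'}")
```

Anything reported as MISSING is a package that user would need us to distribute (or would have to install themselves).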
My point here is to try to document what we’d have to do. If I’ve missed anything, please add it in the comments. Also, please feel free to comment on what other paths forward we might follow. Sims is actually a bit different from DM, but since sims packages are built with Jenkins (and we want them to be built with Jenkins), we have to make sure to support that use too.