Testing the lsst_sims stack

Understanding how to test an lsst_sims installation is becoming more urgent now that square is starting to handle our releases, particularly the conda binary releases. (Sidenote: this is a good thing, and I'm very happy square is extending their work to us.)

DM has a separate "demo" which is used to test the DM software stack after installation. This, along with the validate_drp package, also provides the opportunity to do things like regression testing, to see the impact of changes to the code.

Things are a little more complicated, conceptually, for sims, as we have a very diverse set of packages which are not necessarily related. For example, sims_maf and sims_catUtils have dependencies in common, but do not actually operate on the same data: one runs metrics on opsim (or other) outputs, while the other generates catalogs of objects.

When building the release from source, each package has unit tests which test that package and its dependencies.
If a user installs from source, these unit tests are run.
If a user installs from the conda binaries, however, the unit tests are not run on the user's computer, and there could potentially be problems. In particular, it's hard for square to know whether the conda binary build was a success.
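(For concreteness, even a trivial smoke test that just imports the top-level sims packages after the conda install would catch gross breakage. A minimal sketch - the lsst.sims.* import paths here are my assumption about what the binaries actually ship, and would need checking:)

```python
# Minimal post-install smoke test (a sketch, not an official check).
# The lsst.sims.* import paths below are assumptions about what the
# conda binaries actually ship; adjust to match the real namespaces.
import importlib
import sys

PACKAGES = ["lsst.sims.maf", "lsst.sims.catUtils"]

failures = []
for name in PACKAGES:
    try:
        mod = importlib.import_module(name)
        print("OK   %s (%s)" % (name, getattr(mod, "__file__", "?")))
    except Exception as exc:   # keep going so we see every broken package
        failures.append(name)
        print("FAIL %s: %s" % (name, exc))

if failures:
    sys.exit(1)
```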

So – long-term, I think we need to build a “demo” package for sims that will exercise various pieces of the sims software.

In the meantime, to make @frossie's and @jmatt's lives easier and happier, what should we do?

One option I wondered about is whether the unit tests themselves could be used to do a simple test of the binaries after they are built.
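(If the tests directories are shipped with, or checked out alongside, the installed products - which I'm not sure they are for the conda binaries - that might look something like the sketch below. The eups-style SIMS_*_DIR environment variables are an assumption here.)

```python
# Sketch: point pytest at a couple of the packages' own unit test
# directories after installing the binaries.  Assumes the tests/
# directories are available on disk and that the eups-style
# SIMS_*_DIR environment variables are set - both assumptions.
import os
import sys
import pytest

test_dirs = [
    os.path.join(os.environ["SIMS_MAF_DIR"], "tests"),
    os.path.join(os.environ["SIMS_CATUTILS_DIR"], "tests"),
]

sys.exit(pytest.main(["-q"] + test_dirs))
```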

Another option is for us to hurry up and put a set of scripts together. I have added making a proper demo to the list for next year's planning (I could probably get to this in the fall), but if we need something sooner, we could probably pull the guts out of a few ipython notebooks. The rushed version would be quite non-comprehensive, though.

Thoughts?

I don’t have a real answer to this quite yet. I’ll say that I think the “demo” for lsst_apps is a bit of an anachronistic misnomer at this point. It used to be an actual demo that people could learn from, but I don’t think it will necessarily be used for tutorials going forward. (and indeed, the future tutorials for lsst_apps will all have integration tests in CI).

Instead, I think I’d focus on the idea of ‘integration’ tests that exercise functionality in a sims installation. I think it would be too much work and possibly be quite fraught to completely combine documentation/tutorial efforts with integration testing efforts.

Agreed - our “demo” would definitely be integration tests rather than documentation.
We have other documentation.

It’s just that the easiest way for us to put together integration tests in a hurry would be to pull those out from some subset of our ipython notebooks (written for documentation purposes).

Okay - so if we put together some scripts that just run various pieces of sims, is the idea that we should put those into a git repo? And should we include the expected outputs? How would those outputs be compared to the outputs of the various scripts?
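(For the numeric pieces, I could imagine it being as simple as keeping small reference arrays in the repo and asserting against them within some tolerance - a sketch, with purely illustrative file names:)

```python
# Sketch of comparing a demo script's output to stored reference values.
# File name, tolerance, and the shape of "result" are all illustrative.
import numpy as np

def check_against_reference(result, reference_file, rtol=1e-6):
    """Compare a script's numeric output to a reference array kept in the repo."""
    expected = np.load(reference_file)
    np.testing.assert_allclose(result, expected, rtol=rtol)

# e.g. after a script computes some metric values:
# check_against_reference(metric_values, "expected/metric_values.npy")
```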

On the basis that the perfect is the enemy of the good here…

I think the lsst "demo" script is not so much a thorough test (it's a pale simulacrum of one), but it's sufficient for a couple of purposes: basically to make sure the dependencies are reasonably well installed, and also to uncover basic numeric imprecision errors in portability testing (like different definitions of int and so on) and the breaking (or not) of some important interfaces.

So what I'd concentrate on for the near term is a script that does a "typical" user operation for the software in question - schedule a night, calculate an orbit, or something like that. In the longer term, as the DM verification platform matures, we can be more thorough. Right now it would be nice to have a "runtime" test as opposed to a "compile" test.
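(For sims_maf, for instance, a "typical operation" test could be as small as running one metric over a baseline opsim output - roughly like the sketch below. The module paths, class names, and keyword arguments are from memory and are assumptions that would need checking against the installed version; the database file name is made up.)

```python
# Rough sketch of a "typical user operation" for sims_maf: count visits
# per HEALPix pixel in one filter from an opsim output.  Module paths,
# class names, and keywords here are assumptions from memory and need
# checking against the installed sims_maf; the .db file name is made up.
import lsst.sims.maf.db as db
import lsst.sims.maf.metrics as metrics
import lsst.sims.maf.slicers as slicers
import lsst.sims.maf.metricBundles as metricBundles

opsdb = db.OpsimDatabase("baseline_opsim.db")
bundle = metricBundles.MetricBundle(
    metrics.CountMetric(col="expMJD"),
    slicers.HealpixSlicer(nside=64),
    'filter = "r"')
group = metricBundles.MetricBundleGroup({"rCount": bundle}, opsdb, outDir="demo_out")
group.runAll()
```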

There are a fair number of scripts and jupyter notebooks in the example directories. A current problem is that these sometimes fall behind code changes, since they are not checked in the same manner as the unit tests. Maybe running these as an integration test could be a first step towards killing two birds with one stone? If people have longer analyses they have been running, those could be integration tests as well.
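(Mechanically, that could be as simple as executing the notebooks headlessly with nbconvert and failing the build on any error - a rough sketch, with illustrative paths:)

```python
# Sketch: execute the existing example notebooks headlessly as a crude
# integration test.  Assumes jupyter/nbconvert are in the installed
# environment; the examples/ path is illustrative.
import glob
import subprocess
import sys

failures = []
for nb in sorted(glob.glob("examples/*.ipynb")):
    print("Executing", nb)
    rc = subprocess.call(
        ["jupyter", "nbconvert", "--to", "notebook", "--execute",
         "--ExecutePreprocessor.timeout=600", "--output-dir", "/tmp", nb])
    if rc != 0:
        failures.append(nb)

if failures:
    print("Failed notebooks:", failures)
    sys.exit(1)
```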

Sure - but I think they need to be moved into a separate package which square can clone and use. I don’t think it’s that easy for them to do a conda install of the binaries, and then also clone all of the individual packages to run these tests.

I don't think Jupyter notebooks are convenient for these tests (although if they are fair game, that would be good to know).
I also think that the analyses should probably not be super long; the "demo" ought to be fairly easy to run with each build (yes, @frossie?).

I agree. I support any sort of demo or test since it will be an improvement over nothing.

Both integration tests and a demo would be great, and each has its place. The DM stack demo has become an integration test. Now that we know that, it may help inform the creation of future demos.

I share @rbiswas's concern that whatever we use needs to rely on stable parts of sims and be easily maintainable. My expectation is that such a test library will be run automatically all the time by CI, or when creating artifacts (binaries, etc.), similar to the DM stack demo - so if it breaks, people will notice. Run-time expectations are on the order of minutes, rather than hours, to build and run.

I have already discussed this with @jsick and heard his greater vision, and I completely agree. I think these initial steps are in line with where we want to end up: a public-facing site that provides interactive, stable docs and demos - something that is not a one-off from our development and CI, but is instead sourced from and integrated with it.

@ljones Thanks for bringing this up and creating the topic.