I’m working on creating a set of regression tests for the scarlet deblending repo and I’d like to solicit some feedback as to the best way to accomplish this. What we want to do is very similar to what SQuaSH can already do, but unfortunately not similar enough to work right now, as far as I know. So first I’ll describe the problem, then how it’s currently implemented, the problems with that implementation, and some proposed future solutions.
The problem
The scarlet repo will always be separate from the DM stack because the co-creator is off project and is now a faculty member with postdocs and students who are actively working on improving the code and its scientific output. So having a DM branch of the repo will allow us to keep up to date with the latest features. However, a problem that we’re increasingly running into is that changes made by researchers off project can have adverse effects on the results of scarlet (on HSC data for now, Rubin in the future). So I started creating a testing framework, scarlet_test, which uses a set of ~100 HSC blends with fake sources injected and runs a set of measurements on all of the fake sources. Because the fake sources are limited in the information that we can test (for example they are all parametric models with little substructure), we also need to generate residual plots for a subset of ~15 blends to track changes in the deblending performance for different configurations of complicated blends with real spirals and irregular galaxies.
My initial thought was to use the SQuaSH tool but it doesn’t quite fit with the desired design:
- A scarlet developer creates a PR in the third-party scarlet repo.
- Travis automatically generates the measurements for that branch and generates plots to compare to previously merged branches.
- The scarlet docs are updated with the new branch plots, and a comparison of the residuals for the PR branch and master for the subset of special blends is displayed.
My understanding of SQuaSH is that it tracks metrics by date (but not by another key, like branch name) and also doesn’t have a way to store and display the residual plots. Am I correct on both of those counts, or am I lacking a proper understanding of its full set of features?
The current solution
To solve the problem I created the scarlet_test package, which does the following:
- Runs the default configuration of scarlet that we use in the stack on the new PR.
- Generates the measurements that we want to track.
- Stores the measurements for each branch.
- Uses netlify to build the docs, which displays the measurements for the last 10 PRs and displays the comparison between the master branch and PR residual for each blend (see https://scarlettest.netlify.app/).
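The "store per branch, display the last 10" steps above could be sketched roughly as follows. This is only an illustration: the one-JSON-file-per-branch layout, the function names, and the metric names are all hypothetical and are not taken from scarlet_test.

```python
import json
from pathlib import Path


def save_measurements(root, branch, measurements):
    """Write one JSON file of measurements per branch (hypothetical layout)."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    # Branch names may contain "/", which cannot appear in a file name.
    path = root / (branch.replace("/", "__") + ".json")
    path.write_text(json.dumps({"branch": branch, "measurements": measurements}))
    return path


def load_recent(root, merge_order, n=10):
    """Load stored measurements for the last n branches.

    merge_order is a list of branch names, oldest first (an assumption;
    the real ordering could come from the git history instead).
    """
    records = []
    for branch in merge_order[-n:]:
        path = Path(root) / (branch.replace("/", "__") + ".json")
        if path.exists():
            records.append(json.loads(path.read_text()))
    return records
```

Keying the files by branch name rather than by date is the part that matters here, since that is the lookup the comparison plots need.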
This works, but there are three problems with the current workflow that we don’t like:
- When the developer makes the PR in scarlet the measurements are made and pushed to a new branch in scarlet_test, however the developer still has to trigger the PR in GitHub in order to build the docs.
- The size of each residual plot is ~100 KB, so each PR has ~1.5 MB of image data that will cause the repo to quickly grow in size. To prevent this locally we only store the previous residual plot and current residual plot for each blend in a separate commit, using git fixup to overwrite the old residual plots and new residual plots without having the data grow in size.
- Netlify doesn’t have an easy way to track the location of a new branch by name. For example the preview docs for the testing branch are stored at https://deploy-preview-8--scarlettest.netlify.app/, so there isn’t a simple way to link the docs from the PR to the scarlet docs to make it easy for a researcher to find them.
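If the 8 in that preview URL is the number of the PR in scarlet_test (an assumption, though as far as I can tell that is how Netlify names its deploy previews), then the link could at least be built from the PR number once it is known. A minimal sketch, with a hypothetical function name:

```python
def deploy_preview_url(pr_number, site="scarlettest"):
    """Build a preview URL following the pattern of the example URL above.

    Assumes the number in the URL is the PR number in scarlet_test.
    """
    if pr_number < 1:
        raise ValueError("pr_number must be a positive integer")
    return f"https://deploy-preview-{pr_number}--{site}.netlify.app/"
```

This doesn’t solve the lookup by branch name; something still has to map the scarlet branch to the scarlet_test PR number.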
Possible improvements
- A preferable solution would be to somehow leverage SQuaSH. This would require the ability to plot by a keyword on the x-axis, namely the branch name, and the ability to push updates to SQuaSH from Travis. We would also need some way to compare residual plots. If this is a possibility it might be worth just keeping what we have for now and waiting for an upgrade to SQuaSH.
- If a SQuaSH solution isn’t possible our first thought is to use AWS or Google Cloud to store the measurement data and residual plots that are created by Travis and dynamically load the plots when someone reads the docs.
- The easiest solution to implement with the current framework is probably to use gitpython and git request-pull to automatically generate the PR in scarlet_test from Travis, after first checking that a PR doesn’t already exist for the branch. Then we could use git-lfs to store the residual plots, so at least the main repo doesn’t get bloated.
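For the "check that a PR doesn’t already exist" step, one option is the GitHub REST API, which can list open pull requests filtered by head branch. The sketch below is only that, a sketch: the function names are hypothetical, the owner is a placeholder to be filled in, and a real Travis job would probably need an authentication token. Note also that, as far as I know, git request-pull only prints a summary of the pending changes and does not itself open a GitHub PR, so the creation step would likely have to go through the same API.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.github.com"


def pulls_query_url(owner, repo, branch):
    """URL that lists open PRs whose head is owner:branch."""
    query = urllib.parse.urlencode({"head": f"{owner}:{branch}", "state": "open"})
    return f"{API}/repos/{owner}/{repo}/pulls?{query}"


def fetch_json(url):
    """Unauthenticated GET returning parsed JSON (rate limited by GitHub)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def pr_exists(owner, repo, branch, fetch=fetch_json):
    """Return True if at least one open PR already exists for the branch.

    fetch is injectable so the check can be tested without network access.
    """
    return len(fetch(pulls_query_url(owner, repo, branch))) > 0
```

Travis would call pr_exists(owner, "scarlet_test", branch) and only create a new PR when it returns False.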
But this is outside anything that I’ve done in a while, so if people have other ideas I’d be happy to hear them.