Implementation of specialized SQuaSH-like metrics and residual plots

I’m working on creating a set of regression tests for the scarlet deblending repo, and I’d like to solicit some feedback on the best way to accomplish this. What we want to do is very similar to what SQuaSH can already do but, as far as I know, unfortunately not similar enough to work right now. So first I’ll describe the problem, how it’s currently implemented, the problems with that implementation, and some proposed future solutions.

The problem

The scarlet repo will always be separate from the DM stack, because its co-creator is off project and is now a faculty member with postdocs and students who are actively working on improving the code and its scientific output. Keeping a DM branch of the repo will therefore allow us to stay up to date with the latest features. However, a problem we’re increasingly running into is that changes made by researchers off project can adversely affect the results of scarlet (on HSC data for now, on Rubin data in the future). So I started creating a testing framework, scarlet_test, which uses a set of ~100 HSC blends with injected fake sources and runs a set of measurements on all of the fake sources. Because the fake sources limit what we can test (for example, they are all parametric models with little substructure), we also need to generate residual plots for a subset of ~15 blends to track changes in deblending performance for different configurations of complicated blends containing real spiral and irregular galaxies.

My initial thought was to use the SQuaSH tool but it doesn’t quite fit with the desired design:

  1. A scarlet developer creates a PR in the third-party scarlet repo.
  2. Travis automatically generates the measurements for that branch and makes plots comparing them to previously merged branches.
  3. The scarlet docs are updated with the plots for the new branch, and a comparison of the residuals of the PR branch and master for the subset of special blends is displayed.
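Keying results by branch name rather than by date starts with knowing which branch a CI run belongs to. A minimal sketch, assuming the Travis CI environment variables `TRAVIS_PULL_REQUEST`, `TRAVIS_BRANCH` and `TRAVIS_PULL_REQUEST_BRANCH`; the helper name `branch_key` is hypothetical:

```python
import os


def branch_key(env=None):
    """Return the scarlet branch name a CI run should be recorded under.

    Travis sets TRAVIS_PULL_REQUEST to the PR number for pull-request
    builds and to the string "false" for ordinary push builds.
    """
    env = os.environ if env is None else env
    if env.get("TRAVIS_PULL_REQUEST", "false") != "false":
        # In a PR build TRAVIS_BRANCH is the *target* branch (e.g. master),
        # so use the branch the PR comes from instead.
        return env["TRAVIS_PULL_REQUEST_BRANCH"]
    # Push build: the branch that was pushed.
    return env["TRAVIS_BRANCH"]
```

Merges to master would then be recorded under `master`, and each PR under its own branch name.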

My understanding of SQuaSH is that it tracks metrics by date (but not by another key, like branch name) and also doesn’t have a way to store and display the residual plots. Am I correct on both counts, or am I missing some of its features?

The current solution

To solve the problem I created the scarlet_test package, which does the following:

  1. Runs the default configuration of scarlet that we use in the stack on the new PR.
  2. Generates the measurements that we want to track.
  3. Stores the measurements for each branch.
  4. Uses Netlify to build the docs, which display the measurements for the last 10 PRs and the comparison between the master and PR residuals for each blend (see https://scarlettest.netlify.app/).
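Steps 3 and 4 could look something like the following minimal sketch, assuming one JSON file per branch in a hypothetical measurements directory; the layout and the names `save_measurements` and `recent_branches` are illustrative, not scarlet_test’s actual code:

```python
import json
import time
from pathlib import Path


def save_measurements(root, branch, metrics, timestamp=None):
    """Write one branch's metrics to <root>/<branch>.json (hypothetical layout)."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    # Branch names such as "tickets/DM-12345" contain slashes, so flatten them.
    path = root / (branch.replace("/", "__") + ".json")
    record = {
        "branch": branch,
        "timestamp": time.time() if timestamp is None else timestamp,
        "metrics": metrics,
    }
    path.write_text(json.dumps(record, indent=2))
    return path


def recent_branches(root, n=10):
    """Return the records of the n most recently measured branches, newest first."""
    records = [json.loads(p.read_text()) for p in Path(root).glob("*.json")]
    records.sort(key=lambda r: r["timestamp"], reverse=True)
    return records[:n]
```

Storing the timestamp inside each record, instead of relying on file modification times, keeps the ordering stable when the files are checked out fresh from git.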

This works, but there are three problems with the current workflow that we don’t like:

  • When the developer makes a PR in scarlet, the measurements are made and pushed to a new branch in scarlet_test; however, the developer still has to manually create the corresponding PR on GitHub in order to build the docs.
  • Each residual plot is ~100 KB, so each PR adds ~1.5 MB of image data (for ~15 blends), which will make the repo grow quickly. To prevent this, we only store the previous and current residual plots for each blend, in a separate commit, and use git fixup to overwrite the old residual plots with the new ones so the data doesn’t keep growing.
  • Netlify doesn’t have an easy way to locate the preview for a new branch by name. For example, the preview docs for the testing branch are at https://deploy-preview-8--scarlettest.netlify.app/ , so there isn’t a simple way to link to a PR’s preview docs from the scarlet docs to make them easy for a researcher to find.
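The preview URL in the last bullet is predictable from the scarlet_test PR number, so the missing piece is a record mapping each scarlet branch to that number. A minimal sketch, assuming the `deploy-preview-<N>--<site>` pattern from the example URL and a hypothetical `pr_numbers` mapping that the CI job would have to maintain:

```python
def netlify_preview_url(pr_number, site="scarlettest"):
    """Deploy-preview URL following the pattern of
    https://deploy-preview-8--scarlettest.netlify.app/."""
    return f"https://deploy-preview-{int(pr_number)}--{site}.netlify.app/"


def preview_for_branch(branch, pr_numbers, site="scarlettest"):
    """Look up a branch in a hypothetical {branch: scarlet_test PR number}
    mapping and return its preview URL, or None if no PR exists yet."""
    number = pr_numbers.get(branch)
    return None if number is None else netlify_preview_url(number, site)
```

If branch deploys are enabled for the site, Netlify can also serve them on a `<branch>--<site>` subdomain, which would give a name-based URL directly; that is worth checking in the Netlify documentation before building the mapping.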

Possible improvements

  • A preferable solution would be to somehow leverage SQuaSH. This would require the ability to plot by a keyword on the x-axis (namely the branch name) and to push updates to SQuaSH from Travis. We would also need some way to compare residual plots. If this is a possibility, it might be worth keeping what we have for now and waiting for an upgrade to SQuaSH.

  • If a SQuaSH solution isn’t possible, our first thought is to use AWS or Google Cloud to store the measurement data and residual plots created by Travis and load the plots dynamically when someone reads the docs.

  • The easiest solution to implement with the current framework is probably to use GitPython and git request-pull to automatically generate the PR in scarlet_test from Travis, after first checking that a PR doesn’t already exist for the branch. We could then use git-lfs to store the residual plots, so that at least the main repo doesn’t get bloated.
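One caveat on the last option: `git request-pull` only prints a plain-text summary of changes intended for e-mail; it does not open a pull request on GitHub. The "check first, then create" step would more likely go through GitHub’s REST API. A minimal sketch of the check, assuming the "List pull requests" endpoint with its `head=user:branch` filter; the owner name and helper names are hypothetical, and creating the PR (a `POST` to the same `/pulls` path) is left out:

```python
import json
import urllib.parse
import urllib.request

API = "https://api.github.com"


def open_pulls_url(owner, repo, branch):
    """URL listing open PRs in owner/repo whose head is owner:branch."""
    query = urllib.parse.urlencode({"state": "open", "head": f"{owner}:{branch}"})
    return f"{API}/repos/{owner}/{repo}/pulls?{query}"


def pr_already_open(pulls_json, branch):
    """True if the decoded JSON list from the endpoint has a PR from `branch`."""
    return any(pr["head"]["ref"] == branch for pr in pulls_json)


def fetch_open_pulls(owner, repo, branch, token):
    """Network call (not exercised here); `token` would be a CI secret."""
    request = urllib.request.Request(
        open_pulls_url(owner, repo, branch),
        headers={"Authorization": f"token {token}",
                 "Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping the URL building and the decision separate from the network call makes the logic testable without GitHub credentials.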

But this is outside anything that I’ve done in a while, so if people have other ideas I’d be happy to hear them.

Your initial questions were:

  1. Can SQuaSH curate (and display) arbitrary visualizations?
  2. Can SQuaSH make plots against something other than time?

The answer to 1) is definitely no. SQuaSH was intended to track scalar values over time, not to solve the problem of data visualization and drill-down. The QA working group suggested that other avenues should be used for interactive visualization.

The answer to 2) is more complicated. The current version of the Chronograf interface only supports plotting against time. However, it does support tags as metadata on measurements. If you include the PR in the metadata, you can use that functionality to select only the relevant PRs to plot. This allows plotting measurements of the same metric for several PRs on the same visualization for comparison (though the x-axis will still be time). The newest version of Chronograf does have some minimal scatter-plot capability, but I don’t have enough experience with it to know whether that will support your use case.
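To illustrate "including the PR in the metadata": in InfluxDB line protocol a point is `measurement,tag=value field=value timestamp`, and tags are what GROUP BY operates on. A minimal sketch, assuming a hypothetical measurement `scarlet_metrics`, tag key `pr`, and field `flux_residual`; whether scarlet results should be written like this directly or through SQuaSH’s own upload path is an open question:

```python
def _escape_key(s):
    # Tag keys, tag values and field keys: escape commas, equals signs, spaces.
    return str(s).replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Format one point in InfluxDB line protocol with numeric (float) fields."""
    name = measurement.replace(",", r"\,").replace(" ", r"\ ")
    # Sorting tags by key is what InfluxDB recommends for write performance.
    tag_part = "".join(f",{_escape_key(k)}={_escape_key(v)}"
                       for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_escape_key(k)}={float(v)!r}"
                          for k, v in fields.items())
    line = f"{name}{tag_part} {field_part}"
    if timestamp_ns is not None:
        line += f" {int(timestamp_ns)}"
    return line
```

With the PR stored as a tag, an InfluxQL query such as `SELECT mean("flux_residual") FROM "scarlet_metrics" GROUP BY "pr"` returns one series per PR.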

If the solution to 2) doesn’t need to use Chronograf, InfluxDB, where these values are stored, has a nice Python client that can return query results as pandas DataFrames, which you can use to plot whatever you wish.
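Plotting against branch name instead of time then amounts to collapsing the query results to one value per branch, in the order the branches first appeared. A minimal sketch, assuming rows shaped like query output with a `time` column, the `pr` tag and a hypothetical `flux_residual` field (the exact shape depends on the client and query); plain dicts stand in for a DataFrame to keep it dependency-free:

```python
from statistics import mean


def metric_by_branch(rows, field="flux_residual"):
    """Return (branch, mean value) pairs ordered by each branch's first
    appearance in time, ready for a categorical x-axis."""
    groups = {}
    # ISO-8601 timestamps in a uniform format sort correctly as strings.
    for row in sorted(rows, key=lambda r: r["time"]):
        # dicts keep insertion order, so branches stay in first-seen order.
        groups.setdefault(row["pr"], []).append(row[field])
    return [(branch, mean(values)) for branch, values in groups.items()]
```

Matplotlib treats string x values as categories, so `plt.plot([b for b, _ in pairs], [v for _, v in pairs], "o")` would give one point per branch with the branch names on the axis.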

As for presenting your individual regression-test visualizations, we do have a system for rendering templated notebooks called nbreport, which could replace your Netlify app. Find instructions for use here.

SQuaSH alone can’t do all that you need, but a combination of SQuaSH to store scalar metric values, Chronograf to visualize metric values by PR (using GROUP BY on the PR name stored as a tag), standalone notebooks using the InfluxDB client, and nbreport for templated report generation may do a lot of what you need.

I’m happy to talk more about it.

Thanks Simon. The actual visualizations aren’t a problem, since they can easily be implemented in the scarlet docs, similar to the way they are implemented with Netlify. It’s really data storage that’s the issue, and if InfluxDB can’t support uploading an array or blob (the residual image), I’ll probably look into other solutions for now.
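InfluxDB field values are scalars (floats, integers, strings, booleans), so a residual image can’t go in as an array. A common workaround is to keep the PNG in separate storage (an S3/GCS bucket or git-lfs) and record only a short string pointing at it next to the metrics. A minimal sketch, assuming a hypothetical content-addressed key layout and field name `residual_png`:

```python
import hashlib


def residual_pointer(blend_id, png_bytes):
    """Content-addressed storage key for one residual image (hypothetical
    layout). Identical images map to the same key, so an unchanged residual
    is stored only once no matter how many branches produce it."""
    digest = hashlib.sha256(png_bytes).hexdigest()[:16]
    return f"residuals/{blend_id}/{digest}.png"


def pointer_field(key):
    """Line-protocol string field holding the key; string field values are
    double-quoted, with backslashes and double quotes escaped."""
    escaped = key.replace("\\", "\\\\").replace('"', '\\"')
    return f'residual_png="{escaped}"'
```

The docs page would then read the key for the master and PR branches and load both images from the bucket, which keeps image data out of both InfluxDB and the git history.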