Seattle Face-To-Face Dec 2015

For the face-to-face meeting in December (15-18), I propose we split the days into mornings with talks and discussions and afternoons with hacks (code or design). The discussion topics and estimated times are below; the person listed will lead each discussion:

  • review of the sky brightness model (Peter: 2 hrs)
  • walkthrough of opsim v4 design (Michael: 2-3 hrs)
  • look at pandas and how it fits with the sims code (?: 1 hour)
  • look at use cases for the parameter database (Scott: 1 hour)
  • overview of where we stand on rolling cadence (Kem: 1 hour)
  • overview of outcome from survey strategy workshop (Zeljko: 1 hour)
  • agree on what algorithms should be shared between sims components (Scott: 2 hours)

Please add any additional topics that would benefit from having everyone present for the discussion.

Possible hacks:

  • Get Opsim running at NERSC on docker
  • Import bright stars into the db
  • Design how the sky model will interface to opsim V4
  • Review what we know about weather prediction models
  • Complete metrics from the survey strategy workshop

Please add any hack topics that you would like to work on.

Adding, for the discussion of observing strategy:

  • what are the driving metrics from the observing strategy workshop that we should focus on to guide the validation and verification of the scheduler?
  • a possible hack to triage the current observing strategy metrics and work out the common features that we could optimize (to help people write their metrics efficiently)

Possible hack:

  • design a comparison of the summary tables in MAF (and possibly trim the summary tables)

More hacks:

Pore through the sims_photUtils package and see what we would like to simplify or change about the API. Compare to GalSim throughputs (and possibly something similar in astropy, if there's anything, or IDL, or synphot?).
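As a starting point for that comparison, here is a minimal sketch of loading the same throughput curve through each API (the file name is a placeholder, and both calls should be double-checked against current package versions):

```python
# Load one throughput curve via each API to compare ergonomics.
# 'baseline/total_r.dat' is a placeholder path.
from lsst.sims.photUtils import Bandpass
import galsim

# sims_photUtils: construct-then-read, wavelength units implicit.
bp_sims = Bandpass()
bp_sims.readThroughput('baseline/total_r.dat')

# GalSim: single constructor, wavelength units stated explicitly.
bp_galsim = galsim.Bandpass('baseline/total_r.dat', wave_type='nm')
```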

Go through the various kinds of catalogs that we would like to be able to generate and lay out their requirements or characteristics. Can this help simplify catsim or give it new functionality? (e.g. SourceCatalogs, ReferenceCatalogs, the current InstanceCatalog).

Travel Schedule Information

| Name | Arriving | Departing |
| ------- | ------ | ----------- |
| Cathy | Monday Evening | Friday Morning |
| Michael | Monday Evening | Friday Morning |
| Steve | Monday Afternoon | Friday Morning |
| Kem | Monday Evening | Thursday Evening |

Possible hack:

Set up a framework to study correlations of MAF(able) metrics, particularly sliced on time windows. This could be useful in understanding how well fast heuristics (which are useful as features) reproduce the behavior of 'slow' metrics/calculations. This information is incomplete, since it is conditioned on the distribution of OpSim visits, but it would still be extremely useful.
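A minimal sketch of what the core of such a framework might look like (pandas assumed; the column names are placeholders for whatever we would extract from a MAF run):

```python
import pandas as pd

def windowed_correlation(df, window='90D',
                         fast='fast_metric', slow='slow_metric'):
    """Spearman correlation of a fast heuristic metric against a
    slow calculation, computed separately in each time window.

    df must carry a DatetimeIndex of visit times; 'fast_metric'
    and 'slow_metric' are placeholder column names.
    """
    grouped = df.groupby(pd.Grouper(freq=window))
    return grouped.apply(
        lambda g: g[fast].corr(g[slow], method='spearman'))
```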

Tuesday:
8.45am - 9.00am: Welcome
9.00am - 11.00am: walkthrough of opsim v4 design (Michael: 2 hrs)
11.00am - 11.15am: break
11.15am - noon: overview of where we stand on rolling cadence (Kem: 1 hour)
noon - 1.00pm: lunch
1.00pm - 2.00pm Hack description and selection
2.00pm - Hacks

Wednesday:
9.00am - 10.00am: overview of outcome from survey strategy workshop: are there common surveys that we will need to support/run (Zeljko: 1 hour)
10.00am - 11.00am: look at pandas and how it fits with the sims code (Rahul: 1 hour)
11.00am - 11.15am: break
11.15am - 12.30pm: What algorithms should be shared between sims components (Scott: 1 hour)
12.30pm - 1.30pm: lunch
1.30pm - 2.00pm: Hack review and updates
2.00pm - Hacks

Thursday:
9.00am - 11.00am: review of the sky brightness model (Peter: 2 hrs)
11.00am - 11.15am: break
11.15am - 12.30pm: look at use cases for the parameter database (Scott: 1 hour)
12.30pm - 1.30pm Lunch
1.30pm - 2.00pm Hack review and updates
2.00pm - Hacks
5.30pm Christmas party

Another hack? Version control of config files with opsim (and tracking which versions of opsim-simulator, opsim-modifySchema, and MAF were actually run).
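A hypothetical sketch of the tracking half (the repo names, paths, and record keys are all made up; the point is just to snapshot git SHAs and config contents alongside each run):

```python
import json
import subprocess
import time

def git_sha(repo_path):
    """Return the current commit SHA of a local git checkout."""
    return subprocess.check_output(
        ['git', 'rev-parse', 'HEAD'],
        cwd=repo_path).strip().decode('utf-8')

def write_provenance(outfile, repos, config_files):
    """Snapshot versions and configs for one run.

    repos: dict mapping a name (e.g. 'opsim-simulator') to a
    checkout path; config_files: config files used for the run.
    """
    record = {
        'timestamp': time.time(),
        'versions': {name: git_sha(path)
                     for name, path in repos.items()},
        'configs': {path: open(path).read()
                    for path in config_files},
    }
    with open(outfile, 'w') as fh:
        json.dump(record, fh, indent=2)
```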

I realize that this is a special case of a few such goals, all of which would require similar underlying software. The goals I have in mind are:

  • (above post) Correlate a fast metric based on heuristics in time slices with a calculation based on simulation on the same time windows. As an example, consider as a fast metric something like the current transient metric, which assumes a transient is well measured if a threshold number of points with a minimum SNR is met. How well does this represent a calculation where several transients are simulated in a field over time and measured? If we could simulate a number of these and quantify how well they are measured, we could see how the heuristic-based metric correlates with this calculated quantity.
  • Study (maybe by visualization or computing summary statistics) how correlated different metrics are. We know that many of the metrics that we are coding up encode similar information, and it would be useful to see how close or different they are.

This should provide a general way of creating a (previously defined) set of self-adjusting (with further possibility of user tuning) plots and a few simple statistics over different functions of the variables (metrics/calculations/etc.) calculated from datasets and a set of fields. A very useful feature of these plots could be interaction. For example, consider the MAF metric maps we are all used to, and (a) being able to zoom into one part of the map, or (b) being able to point to a pixel with the mouse and get its time history over the 10 years.

On the front end, for making the actual plots, we could try looking at bokeh (http://bokeh.pydata.org/en/latest/) or seaborn (http://stanford.edu/~mwaskom/software/seaborn/), and maybe also plotly. I have also seen a few tools created here at UW, which I have never personally used but which might be quite interesting:

https://idl.cs.washington.edu/papers/voyager/

I may have misunderstood this feature, but these apparently have the ability to suggest a function of variables that would be more interesting.
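As a flavor of what bokeh gives us out of the box, here is a minimal sketch (the data are random stand-ins for a metric map; real MAF output would be substituted):

```python
import numpy as np
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, HoverTool

# Random stand-in for a metric evaluated at field centers.
source = ColumnDataSource(data=dict(
    ra=np.random.uniform(0.0, 360.0, 500),
    dec=np.random.uniform(-90.0, 10.0, 500),
    metric=np.random.normal(0.0, 1.0, 500),
))

# box_zoom gives the "zoom into part of the map" interaction;
# the hover tool is a first step toward per-pixel inspection.
p = figure(tools='pan,box_zoom,reset',
           x_axis_label='RA (deg)', y_axis_label='Dec (deg)')
p.circle('ra', 'dec', source=source, size=5)
p.add_tools(HoverTool(tooltips=[('metric', '@metric')]))
show(p)
```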

Benchmarking Sims Operations: Trying to use vbench or asv with sims:

We know that there are critical calculations that will have to be done many times when catsim is used in studies requiring large statistics. For example, think of how quickly we can pull up a combination of various objects over a focal-plane-sized area from the catalogs. The performance of such operations depends on both the code and the hardware; it would be good to have benchmarks that can be rerun when either changes.

vbench: https://github.com/pydata/vbench or asv: https://github.com/spacetelescope/asv/
(for example, asv is used here: http://www.astropy.org/photutils-benchmarks/)
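A minimal sketch of what an asv benchmark file could look like for us (asv discovers methods whose names start with time_; the query here is a NumPy stand-in for an actual catsim pull):

```python
# benchmarks/bench_catalog.py
import numpy as np

class TimeCatalogQuery:
    """Placeholder for timing a focal-plane-sized catalog pull."""

    def setup(self):
        # Stand-in for opening a catsim connection / loading a chunk.
        self.data = np.random.random((100000, 6))

    def time_focal_plane_pull(self):
        # Stand-in for selecting all objects in a small sky region;
        # asv reports the wall-clock time of this method.
        mask = (self.data[:, 0] < 0.1) & (self.data[:, 1] < 0.1)
        self.data[mask]
```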

Design a better way to catalog, track, annotate, and download data and analysis for various sets of simulations and, ultimately, a large number of simulations. Automation could be incorporated into the current production script, which is under development. The hack would be to define/design the outward-facing portals and functionality:

  • determining where data is hosted, which collections will be supported/curated, and on which ports
  • how the Run Log connects with MAF while syncing startup comments, allowing edits and updates, and providing descriptions or documentation (like Confluence Tier 1)
  • version control for opsim.py, modifySchema.sh, and MAF
  • how to present the MAF Run List most usefully (e.g. sort on most recent?)
  • how to present the Run Log most usefully (links to data, analyses, filter list, sort list, hide/delete entries, maintenance)
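To make the "catalog, track, annotate" part concrete, here is a purely hypothetical sketch of a minimal run-log record (none of these column names exist anywhere yet; they just mirror the functionality listed above):

```python
import sqlite3

conn = sqlite3.connect('run_log.db')
conn.execute("""
    CREATE TABLE IF NOT EXISTS run_log (
        run_id        TEXT PRIMARY KEY,   -- e.g. an opsim run name
        host          TEXT,               -- where the data is hosted
        data_url      TEXT,               -- link to the output database
        maf_url       TEXT,               -- link to MAF analyses
        startup_note  TEXT,               -- synced startup comment
        description   TEXT,               -- editable documentation
        hidden        INTEGER DEFAULT 0   -- soft hide/delete flag
    )
""")
conn.commit()
```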

To help the science WGs understand how to write a metric, reverse engineer an existing metric into pseudo code, and then into a written description of the metric, so the WGs can see how to go about thinking through and writing a metric of their own.
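For reference while reverse engineering, the typical shape of a MAF metric is roughly the following (this particular metric, its column name, and its threshold are invented for illustration):

```python
import numpy as np
from lsst.sims.maf.metrics import BaseMetric

class FractionAboveThresholdMetric(BaseMetric):
    """Illustrative metric: fraction of visits in a slice whose
    5-sigma depth exceeds a threshold."""

    def __init__(self, m5Col='fiveSigmaDepth', threshold=24.5, **kwargs):
        self.m5Col = m5Col
        self.threshold = threshold
        super(FractionAboveThresholdMetric, self).__init__(
            col=[m5Col], **kwargs)

    def run(self, dataSlice, slicePoint=None):
        # dataSlice holds the visits that fall in this slice;
        # a metric returns one value per slice.
        return np.mean(dataSlice[self.m5Col] > self.threshold)
```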


We could talk about how DM is considering doing some "documentation hackathons" and whether we want to do something similar in sims. That might be a good way to:

  • Get all the sims packages using sphinx
  • Make sure all the doc strings are properly formatted and up-to-date
  • Set up repos and outline papers for sims codes

Another hack: Expanding Lynne’s new notebook on comparing OpSim runs. We might want to think about how we could standardize metrics so our display tools can know if “bigger is better” or “smaller is better”. Mixing those two on the same display is pretty confusing.
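One possible convention (purely a sketch, not existing MAF API): tag each summary metric with its sense of "good", and let display code flip signs so bigger is always better:

```python
# Hypothetical direction tags; the metric names are illustrative.
GOOD_DIRECTION = {
    'Median airmass': 'smaller',
    'Median coadded depth': 'bigger',
}

def oriented(value, metric_name):
    """Flip sign where needed so larger always means better."""
    if GOOD_DIRECTION.get(metric_name) == 'smaller':
        return -value
    return value
```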

