SMWLV observing strategy task force

jgizis · October 15, 2019, 6:29pm

Dear SMWLV colleagues,

We should start up the task force to look at the observing strategy (cadence) simulations. This is urgent because many simulations already exist and reports may start as soon as January 2020. I understand that decisions about how often to observe both high-latitude and low-latitude fields (and related questions) will be based on the suite of simulations.

The current status of the runs is available here:

I think what we really need are:

People to look at the output of the current simulations, there is a lot of info out there now that we can use.
People to write new metrics (MAF) IF ANY ARE NEEDED.
A discussion of what we think about the different proposals in terms of our science. That could lead to new “Figures of Merit” to express this conclusion.

I think it would be good to have this happen here on community rather than email for a variety of reasons.

John Gizis

dgmonet · October 16, 2019, 1:21pm

Gasp. What a large inventory of cadences. Are there clues for the clueless as to which to look at first?

In the Bad Old Days, cadence output appeared in MySQL format. The current ones appear as .db files. Is there a pointer to documents on what platform is needed and how to proceed? Is this a Jupyter thing? Sorry, I don’t work on LSST every day and the learning curve is steep.

jgizis · October 16, 2019, 5:36pm

Yes, it’s amazing!

The MAF (in Python) is the way to read these. Maybe someone (like @yoachim) can post a link to a tutorial?

The SAC made recommendation on what to run (and why) after reading all the cadence white papers so I think reading through their report would be the best guide:

https://project.lsst.org/groups/sac/node/44

ljones · October 16, 2019, 7:11pm

Hi @dgmonet - the cadence outputs are now plain SQLite database files, not a full MySQL database. Hopefully this is even easier to access, as you don’t have to set up a MySQL database for yourself (see ‘sqlite3’ at the command line or ‘import sqlite3’ in Python).
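As a minimal sketch of the `import sqlite3` route, the function below uses only the Python standard library to list every table in a .db file and the column names of each, which is a quick way to see what a run contains before writing a metric. The function name `describe_db` is hypothetical, and it does not assume any particular OpSim table name.

```python
import sqlite3


def describe_db(path):
    """Return {table_name: [column names]} for every table in a SQLite file."""
    conn = sqlite3.connect(path)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        layout = {}
        for table in tables:
            # PRAGMA table_info returns one row per column; field 1 is the name
            cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            layout[table] = [c[1] for c in cols]
        return layout
    finally:
        conn.close()
```

Calling `describe_db("path/to/run.db")` on a downloaded run prints nothing by itself; loop over the returned dict to inspect the tables.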

But yes, I would definitely encourage you to use MAF (there are a lot of tutorials and links to more resources at https://github.com/LSST-nonproject/sims_maf_contrib/tree/master/tutorials) – especially if you want us to be able to recreate your analysis on any new runs in the future (there will be at least another one or two big sets of runs like this). It helps us a lot.

That said, I know you worked with @yoachim in the past to set up proper motion and parallax evaluations – those metrics are still in MAF (https://github.com/lsst/sims_maf/blob/master/python/lsst/sims/maf/metrics/calibrationMetrics.py) and are being run on the existing runs. Some of the results are available in the MAF analysis we have online currently (look for “ScienceRadar” in the MAF comment line, then click on the run name you want to investigate, and then click on “Milky Way; Astrometry” on the left column … an example of this in the current ‘baseline’ is http://astro-lsst-01.astro.washington.edu:8081/allMetricResults?runId=2#Milky%20Way).

Happy to help set up new metrics as needed.
Lynne

willclarkson · October 17, 2019, 2:35am

Hi @ljones cc @yoachim - I am attempting to use sims with Docker to short-cut the various dependency installations (my machine is old). Is Owen Boberg’s Docker page still the place to look for instructions on running sims within Docker, and does the sims version there understand the latest round of OpSim outputs? https://hub.docker.com/r/oboberg/maf/

Thanks!! – Will

willclarkson · October 17, 2019, 12:19pm

Hi @jgizis @pmmcgehee - I agree with John’s 3-point list in the message at the top of this thread about what is needed.

Perhaps a good way to jumpstart this process would be for the CF task force leads to contact the lead authors of the latest round of cadence whitepapers (cWPs) to check in about whether the existing MAFs are sufficient to assess observing strategies - and, if not, what precisely is needed to make the assessment on a timescale of a couple of months.

Some cWP groups have already run their MAFs on older sims (so in theory they could be re-run on the latest round of OpSims), while some still need the MAFs to be coded from the specs in their whitepapers (and/or the target MAFs may have evolved or been simplified since the cWPs were written). Given the time-frame, I think it’s likely that the latest round of MAFs will be produced by dissecting the nice set of tutorial notebooks @yoachim and @ljones have produced in maf_contrib on github (unless someone has had a PhD student working on metrics that we don’t know about).

I think the same goes for the SMWLV metrics in the COSEP - it should be determined which of those MAFs really are the critical ones for a Jan-2020 (!!) time-frame. The emphasis should be on simple metrics.

@jgizis , do you have a record of who volunteered to lead the task forces at the PCW in August?

Cheers - Will

pmmcgehee · October 17, 2019, 1:00pm

@willclarkson - here are the notes from @jgizis's SC e-mail of 26 August 2019:

"Here are some important updates from the LSST Project and Community Workshop held earlier this month in Tucson:

Crowded Field Photometry: The project is making a plan to process crowded stellar fields with a special deblender, i.e., a DAOPhot-like system integrated into the LSST DM pipeline, and not the same deblender being developed to deal with galaxies at high galactic latitude. They believe an in-house DM solution has significant advantages over using existing external codes. A document (plan) explaining their approach is expected around September 1. We will have two months to provide feedback. [See Task Forces below]
New “Task Forces” for 2019-2020: We had a meeting of SMWLV members. We decided to form a number of task forces, each with a defined mission and finite lifetime. We have a number of volunteers, but we badly need more. Please sign up here:

The proposed task forces are:

CADENCE SIMULATIONS: Tasks: Review existing cadence metrics and identify new metrics. Work with “LSST MAF” group to implement metrics. Complete this within 4 months. After feedback from our collaboration, analyze results of simulations. Volunteers: Olsen, Clarkson, Leo, Marcio.

CROWDED FIELD: Read and comment on DM crowded field photometry plan by November. Volunteers: Clarkson, Nidever, Adriano Pieres

COMMISSIONING: Review LSST official document on commissioning tests. Lead discussion of tests that we feel are missing. Identify participants to directly participate in commissioning, early access sprints, data preview sprints. Many timescales, but reading and planning can occur in next six months. We anticipate most work is in 2020-2021. Volunteers: Olsen.

ASTROMETRY: Investigate Differential Color Refraction (DCR). Identify data and develop proposal if necessary. Volunteers: D. Monet, Jullio Camargo

CALIBRATION: Investigate water bands and calibration (y) filter. New calibration data expected in spring 2020, so start then. (Should also reach out to supernovae people). Volunteer: Pat Boeshaar.

DATA CHALLENGE: Plan sprint or data challenges of our own to do science. Volunteer: Clarkson.

Other task forces are welcome. The first three were considered the most urgent and important. Please volunteer!"

The Project intends to ask Science Collaborations to evaluate in-kind contributions from international contributors. What this means is yet to be worked out.

willclarkson · October 17, 2019, 1:02pm

@pmmcgehee cc @jgizis aha, thanks! So in this present thread I think we’re talking about the “Cadence Simulations” task force, for which it looks like I, @knutago, Leo and Marcio volunteered.

ljones · October 18, 2019, 12:32am

Oof, we haven’t kept up with generating docker images of sims … that said, I think this would still work (especially if you did a git pull of the MAF repo - the ‘pull_repos’ command mentioned on Owen’s page).
I’ll see if we can spend a bit of time generating a new docker image. It’ll have to move locations because Owen is no longer with LSST (he moved on to a great data scientist job in Indiana).

willclarkson · October 18, 2019, 2:28am

Thanks @ljones - although the “pull_repos” command on Owen’s page didn’t work on my system, we were able to git pull a very recent maf_contrib & use some eups incantations (thanks to @yoachim) to get recent MAF to work in docker (we were able to run the CameraRotDist.ipynb from beginning to end). A record of the steps we took can be found at the google doc at the following link: https://docs.google.com/document/d/1wjXJnJsXuD1JQFwvThnjNcWwRoLaev_KWPWS9H04w6E/edit

lgirardi · October 18, 2019, 8:14am

Hi all, just to clarify my place in this matter: I hope to look at the metrics for “non-variable stellar science”, more specifically at the expected distribution of limiting magnitudes, star counts, maximum reached distances, etc., for the different cases and filters, for single visits and stacks. My time is limited and work will not start for another 2 weeks. Everything will depend very much on my being able to run MAF remotely (thanks, Will, for the instructions, which I have not yet tried).
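For the single-visit versus stacked limiting magnitudes mentioned above, a commonly used estimate (sketched here, assuming background-limited noise and inverse-variance-weighted stacking) follows from the 5-sigma noise scaling as 10^(-0.4 m5): summing inverse variances gives 10^(0.8 m5_coadd) = Σ 10^(0.8 m5_i), i.e. m5_coadd = 1.25 log10(Σ 10^(0.8 m5_i)). For N identical visits this reduces to m5 + 1.25 log10(N). The function name `coadd_m5` is hypothetical.

```python
import math


def coadd_m5(single_visit_m5):
    """Estimate the 5-sigma depth of a stack from single-visit 5-sigma depths.

    Assumes background-limited noise and inverse-variance weighting, so the
    inverse variances of the individual visits add linearly.
    """
    if not single_visit_m5:
        raise ValueError("need at least one visit")
    return 1.25 * math.log10(sum(10 ** (0.8 * m) for m in single_visit_m5))
```

Four visits at m5 = 24.0 give about 24.75, i.e. stacking gains 1.25 log10(4) magnitudes over a single visit.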

jgizis · October 18, 2019, 3:13pm

One thing I would like us to check is whether the Figure of Merit suggested by Loredana Prisinzano and Laura Magrini in their white paper on Star Formation and Open Clusters has been implemented.

Their science case refers to gri bands.

jgizis · October 18, 2019, 3:39pm

One set of metrics/figures of merit I would like to explore would be based around the actual positions of star-forming regions, open clusters, the bulge, known satellite galaxies, or whatever (including both the Northern and Southern Hemispheres). They would also need depth, astrometry, or time-coverage metrics for the various science cases.

I think this would be valuable because it could be helpful in looking at the Big Sky Cadence, WFD, Galactic Plane both inner and anti-center, South Celestial Cap, etc. in a fair way.
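A minimal sketch of evaluating a metric at a user-supplied list of target positions rather than over the whole sky: it counts the visit pointings that fall within a field-of-view radius of each named target, using the haversine great-circle separation. This is not MAF code; the target dictionary, the function names, and the 1.75 degree radius (a rough approximation to the LSST field-of-view radius) are all assumptions, and a real figure of merit would swap the simple count for a depth, astrometry, or time-coverage metric.

```python
import math


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def visits_per_target(targets, visits, radius_deg=1.75):
    """Count visit pointings within radius_deg of each named target.

    targets: {name: (ra_deg, dec_deg)}; visits: iterable of (ra_deg, dec_deg).
    radius_deg defaults to ~1.75, a rough LSST field-of-view radius (assumption).
    """
    visits = list(visits)
    return {
        name: sum(1 for vra, vdec in visits
                  if angular_sep_deg(ra, dec, vra, vdec) <= radius_deg)
        for name, (ra, dec) in targets.items()
    }
```

The haversine form handles the RA wrap at 0/360 degrees, so a target at RA = 0 correctly picks up pointings at RA = 359.5.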

lprisinzano · November 29, 2019, 8:03am

Hi John,
do you have any news about simulations of the metrics/figure-of-merits described in our White paper ( “Investigating the population of Galactic star formation regions and star clusters within a Wide-Fast-Deep Coverage of the Galactic Plane”)?

ljones · January 29, 2020, 8:37pm

Hi Loredana,

We have several new simulations that would be relevant for the galactic plane survey strategy questions. However, I looked at the metrics in your white paper and have questions:
We can certainly just look at the limiting depth in each part of the sky, but this doesn’t seem to really be the thing that we should be measuring - because dust extinction is not included in that simple limiting magnitude. Your figures of merit included some additional conversion between simply the coadded depth and the final figures of merit, which included something I can’t quite decipher about the distribution of the stellar population and the distribution of dust. Can you help us make better metrics that will include both of these?
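To illustrate why dust matters beyond the raw coadded depth, here is a sketch that finds the largest distance at which a star of given absolute magnitude stays brighter than the limiting magnitude, when extinction grows along the line of sight. It assumes a deliberately crude uniform dust model, A(d) = k · d / 1 kpc, and is not the white paper's figure of merit; the function name and the k values used are hypothetical. Because the apparent magnitude M + 5 log10(d / 10 pc) + A(d) increases monotonically with distance, a simple bisection finds the crossing.

```python
import math


def max_distance_pc(m_lim, abs_mag, ext_per_kpc=0.0, d_max_pc=1e7, tol_pc=1e-3):
    """Largest distance (pc) at which a star of absolute magnitude abs_mag is
    still at or brighter than the limiting magnitude m_lim.

    Assumes extinction grows linearly with distance:
    A(d) = ext_per_kpc * d / 1000 (a crude, hypothetical dust model).
    """
    def apparent(d_pc):
        return abs_mag + 5 * math.log10(d_pc / 10.0) + ext_per_kpc * d_pc / 1000.0

    lo, hi = 1e-3, d_max_pc
    if apparent(lo) > m_lim:
        return 0.0  # too faint to be detected even at tiny distances
    if apparent(hi) <= m_lim:
        return hi  # still detectable at the search limit
    # apparent() increases monotonically with distance, so bisection converges
    while hi - lo > tol_pc:
        mid = 0.5 * (lo + hi)
        if apparent(mid) <= m_lim:
            lo = mid
        else:
            hi = mid
    return lo
```

With no extinction, m_lim = 24.5 and M = 4.5 give the familiar 100 kpc; adding 1 mag/kpc pulls the reach in to roughly 6 kpc. A real version would replace the linear A(d) with a 3D dust map along each line of sight.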

willclarkson · July 28, 2020, 9:52pm

Hi @jgizis @lgirardi @pmmcgehee @dgmonet @knutago @lprisinzano @nidever and all interested in the observing strategy from a Milky Way perspective: with this message I am attempting to re-activate the SMWLV Observing Strategy Task Force.

Leads of the SMWLV-relevant 2018 Cadence Whitepapers will be contacted by SMWLV by email over the next few days, but to start the conversation here: how goes the effort to evaluate metrics and figures of merit for your favorite science cases?

If you are interested in this topic, please make sure you’re aware of the MAF metrics hackathon organized by @rstreet and @fed - there are still places left if you want to register!

[Edit: forgot the hackathon link - here it is: LSST MAF Metrics Hackathon August 6-7]

Cheers - Will

willclarkson · August 5, 2020, 7:59pm

Hi @ljones @yoachim - we want to calibrate a particular LSST metric based on experience with observations, but we want to account for conditions (particularly seeing) during those observations. What is the best way to obtain LSST synthetic metadata that most closely match our comparison observations?

I think one way might be to select from OpSIM output the set that most closely matches our condition-information from the observations (presumably easier for cases in which the comparison observations have many fewer rows than the OpSIM output for the same locations). Is there an existing maf method that does anything like this?
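The first option above (picking the OpSim visits that most closely match the real conditions) could be sketched as a greedy nearest-seeing match without replacement: for each real observation, take the still-unused simulated visit at that location whose seeing is closest. This is not an existing MAF method; the function name is hypothetical, and the default seeing key `seeingFwhmEff` is my assumption about the OpSim column name, so check it against the run's schema. Greedy matching depends on the order of the real observations; an optimal assignment would need something like the Hungarian algorithm.

```python
def closest_in_seeing(real_seeing, sim_visits, seeing_key="seeingFwhmEff"):
    """For each real observation's seeing (arcsec), pick the unused simulated
    visit with the closest seeing (greedy, without replacement).

    sim_visits: list of dicts, e.g. rows from an OpSim query at one location.
    seeing_key: assumed OpSim column name; adjust to the actual schema.
    """
    available = list(sim_visits)
    matches = []
    for s in real_seeing:
        if not available:
            raise ValueError("more real observations than simulated visits")
        best = min(range(len(available)),
                   key=lambda i: abs(available[i][seeing_key] - s))
        matches.append(available.pop(best))
    return matches
```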

If not then an alternative might be to request an OpSIM that matches our observations… does OpSIM have the ability to generate observations patterned after an existing set of observations in this way?

ljones · August 5, 2020, 9:24pm

I’m not sure I’m totally clear on what you’re asking for.
It sounds like you have a series of (real) observations where you know the (real performance) answer, and you have a metric which you think will give you something which should approximate the performance, but that you still need to figure out fudge factors or something that will relate the metric output to real performance.

So you have metadata from the real observations – why not just input those into MAF?
Your metric requires particular columns, and it sounds like you’re maybe going to run this on one point in the sky, so you could just format your observation metadata into a numpy recarray with the required columns, and then use the metric “run” method directly. Kind of like this:

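import numpy as np
m = TestMetric()  # placeholder: substitute your own MAF metric class
# rows: one tuple per observation; dtype: one (name, type) pair per column the metric needs, e.g.
dataSlice = np.array(rows, dtype=[('fiveSigmaDepth', float), ('filter', 'U1')])
m.run(dataSlice)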
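To try the single-point idea without a MAF install, here is a self-contained sketch in which the metric is a hypothetical stand-in class with a MAF-like `run(dataSlice, slicePoint=None)` method (a real MAF metric would subclass MAF's base metric class instead). The rows are made-up values for illustration, and the column names are assumptions. Note that `np.array(..., dtype=[...])` produces a structured array whose columns are accessed by name; `.view(np.recarray)` gives attribute-style access if needed.

```python
import numpy as np


class GoodSeeingCountMetric:
    """Hypothetical stand-in with a MAF-like run() method: counts visits whose
    seeing is at or better than a threshold."""

    def __init__(self, seeing_col="seeingFwhmEff", max_seeing=1.0):
        self.seeing_col = seeing_col
        self.max_seeing = max_seeing

    def run(self, dataSlice, slicePoint=None):
        # boolean mask over the named column, summed to a visit count
        return int(np.sum(dataSlice[self.seeing_col] <= self.max_seeing))


# one tuple per real observation (made-up values); one (name, type) per column
rows = [(59000.1, 0.8, "r"), (59001.2, 1.3, "r"), (59003.0, 0.95, "g")]
dataSlice = np.array(rows, dtype=[("observationStartMJD", float),
                                  ("seeingFwhmEff", float),
                                  ("filter", "U1")])
print(GoodSeeingCountMetric().run(dataSlice))  # 2
```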

Of course, maybe you’re looking to run this over more than one point in the sky? If so, then you’ll need to do a bit more.
In that case, I would get all of your observation metadata into a numpy recarray, set up a HealpixSlicer (as normal), set up your metric (as normal), then set up a MetricBundle and MetricBundleGroup (as normal, but pass opsdb=None).
Then instead of doing metricbundlegroup.runAll(), use metricbundlegroup.runCurrent(None, simdata=).

Would that work?
(I’m just thinking that using your actual metadata might work even better than using simulated metadata which is close but not quite the same. Presumably you already know any translations needed, if you can choose a subset of opsim pointings that match).

(also note: numpy recarrays aren’t terrible to set up by hand, but if you already have your data in a pandas dataframe because those are easy to read from CSV or other sources, you can turn that into a recarray just by using the to_records method on the dataframe, e.g. df.to_records(index=False)).
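A minimal sketch of the CSV to DataFrame to record-array route. `DataFrame.to_records(index=False)` is used here (rather than `to_numpy()`, which returns a plain 2-D array without column names) so that each column becomes a named field; `index=False` keeps the DataFrame index from becoming an extra field. The CSV contents are made-up values and the column names are assumptions.

```python
import io

import pandas as pd

# stand-in for a CSV file of observation metadata (made-up values)
csv_text = """observationStartMJD,fieldRA,fieldDec,fiveSigmaDepth
59000.10,270.5,-29.0,23.8
59001.20,270.6,-29.1,24.1
"""
df = pd.read_csv(io.StringIO(csv_text))

# one named field per DataFrame column; the index is dropped
obs = df.to_records(index=False)
print(obs.dtype.names)
```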

willclarkson · August 5, 2020, 9:38pm

Thanks, @ljones - this helps! We do indeed want to do this over more than one point in the sky (like >100 square degrees). It’s really good to know that MAF can work with metadata that happens not to have been built by OpSIM, and it sounds like this is the path we’d want to take in this case.

Just to confirm: does MAF require that all the metadata columns be populated? Or can we insert only the columns relevant to the metrics we want to run? Is there a minimum set of columns that MAF looks for before it will proceed?

(Also - thanks for the pandas dataframe tip, I think you have given me a reason to finally learn how to use pandas… )

ljones · August 6, 2020, 2:21am

You just need the columns required for the metric and the slicer - and if they’re called different names, that’s OK: just configure the MAF pieces accordingly.
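Configuring the metric and slicer column names, as suggested above, is the MAF-side route. An alternative sketch, shown here with numpy only, is to rename your own fields to whatever names the metric expects by default, using `numpy.lib.recfunctions.rename_fields`. The target names below (`fieldRA`, `fieldDec`, `fiveSigmaDepth`) are assumptions; check the defaults of the metric and slicer you actually use. The data values are made up.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# made-up observation metadata using our own column names
obs = np.array([(270.5, -29.0, 23.8), (270.6, -29.1, 24.1)],
               dtype=[("ra", float), ("dec", float), ("m5", float)])

# map our names onto the (assumed) names a metric/slicer expects
renamed = rfn.rename_fields(obs, {"ra": "fieldRA", "dec": "fieldDec",
                                  "m5": "fiveSigmaDepth"})
print(renamed.dtype.names)  # ('fieldRA', 'fieldDec', 'fiveSigmaDepth')
```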

There are plenty of things I find frustrating about pandas, but it is quite nice overall. And read/write is pretty nice.