Data Challenge to characterize transient and variable objects


On behalf of the Transient and Variable Stars science collaboration, we would like to create a data challenge to develop techniques to
characterize alerts. This includes what alert data are
necessary/sufficient to characterize them as well as what ancillary
data (beyond what LSST provides) are necessary/sufficient.

We need the community to contribute data on their favorite objects
(multi-band light curves are highly desired) as well as what other
contextual information is useful for characterization. Contextual
information includes nearby objects, multi-wavelength associations,
etc.

The ultimate goal is to generate a set of features for transient and
variable objects that will enable rapid characterization for further
follow up and, eventually, more detailed classification.

We will collect data here:

Thanks,
Tom Matheson @tmatheson
Federica Bianco @fed
Ashish Mahabal @ashish


Hi Dr Matheson, I’m interested in contributing on AGN variability.

I notice the call says data can be in ‘any’ format, with a request for a readme file, which is cool and convenient. I was wondering whether we want to also suggest a nominal or preferred format, to ease conversion problems later, and if so, whether there is a standard light-curve file format that, e.g., many LC fitters use?

Good point/question, @MelissaGraham.

The minimum set of fields for a light curve would be: time, flux, flux error, filter, and (perhaps) class.
Suggestions for units are given below, with meta-comments marking where it would be good to have input.
What was done when light curves were collected in the past to work with LSST Sims? @KSK may know …

  • time of observation (MJD, with fraction to the second)
  • flux (mags) [some non-optical data may use numbers expressed in terms of J or eV (but not crabs)]
  • error (mags) [same units as the previous one. What is to be done when this is not available, or the flux is an upper limit?]
  • filter [how is the waveband to be conveyed?]
  • class? [to what granularity?]

Possibly optional fields are the source of the light curve (survey), its ID in that survey, RA, and Dec.
The survey/source of the light curves should certainly be in the readme file.
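As a purely illustrative sketch of the minimal record described above, the fields could be serialized as plain CSV; the column names and the `upper_limit` flag here are placeholders, not an agreed standard:

```python
import csv
import io

# Minimal light-curve record with the fields suggested above.
# Column names and units are placeholders, not an agreed standard.
FIELDS = ["mjd", "mag", "mag_err", "band", "upper_limit"]

rows = [
    {"mjd": 59000.12345, "mag": 18.42, "mag_err": 0.03, "band": "g", "upper_limit": False},
    # An upper limit: flagged explicitly, with no error given.
    {"mjd": 59002.11872, "mag": 19.10, "mag_err": "", "band": "g", "upper_limit": True},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
text = buf.getvalue()
```

A flag column like this is one way to keep upper limits in the file without forcing anyone to use them.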

Comments? Further suggestions?

I think you all probably know this already, but just so it doesn’t fall through the cracks…

The baseline for the alert data will include many pre-computed features. We currently assume they will be the features presented in Richards et al. 2011, but this will change based on new work like the work going on here.
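For illustration, a few simple statistics loosely in the spirit of the Richards et al. (2011) feature set (these are not the actual feature definitions from that paper) might be computed like this:

```python
import statistics

def simple_features(mags):
    """A few illustrative light-curve features, loosely in the spirit of
    Richards et al. (2011); not the actual definitions from that paper."""
    med = statistics.median(mags)
    std = statistics.stdev(mags)
    amplitude = (max(mags) - min(mags)) / 2.0
    # Fraction of points more than one standard deviation from the mean.
    mean = statistics.fmean(mags)
    beyond1std = sum(abs(m - mean) > std for m in mags) / len(mags)
    return {"median": med, "std": std,
            "amplitude": amplitude, "beyond1std": beyond1std}

mags = [18.2, 18.5, 18.1, 19.0, 18.3, 18.4, 18.6, 18.2]
feats = simple_features(mags)
```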

We did a round of collecting light curves a while back. The vast majority came in as text files: either one file per band or all bands stacked in a single file. Most did not come with errors, as these were to be taken as truth light curves. Most periodic light curves were normalized to the period; the aperiodic ones typically had times in units of days.

The notable exception is that the AGN were specified in terms of a structure function, so the lightcurve is realized at runtime (with a fixed seed to make them deterministic).
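As a sketch of what "realized at runtime with a fixed seed" could look like, here is a damped-random-walk (OU-process) light-curve generator; the `tau` and `sigma` values are illustrative, not the parameters actually used in the sims:

```python
import math
import random

def drw_lightcurve(times, tau=200.0, sigma=0.2, mean_mag=19.0, seed=42):
    """Realize a damped-random-walk (OU process) light curve in magnitudes.
    A fixed seed makes the realization deterministic, as described above.
    tau (days) and sigma (mag) are illustrative values only."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, sigma)  # start from the stationary distribution
    mags = [mean_mag + x]
    prev_t = times[0]
    for t in times[1:]:
        dt = t - prev_t
        decay = math.exp(-dt / tau)
        # OU update: exponential decay toward the mean plus fresh noise.
        x = x * decay + rng.gauss(0.0, sigma * math.sqrt(1.0 - decay * decay))
        mags.append(mean_mag + x)
        prev_t = t
    return mags

times = [float(t) for t in range(0, 1000, 10)]
lc = drw_lightcurve(times)
```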

I don’t think that probably helps much, but that’s what we did in the past.

– time of observation (MJD with fraction to the second)
OR time relative to the beginning of the transient, since we do accept theoretical models.
– Upper limits should be clearly tagged as such, but I think we won’t use them. Still, people can include them; they may be useful in the future.
– filter [how is the waveband to be conveyed?]
Tricky, because accurate conversions always require detailed information. But again, for our purposes we can be more relaxed; we are not doing precision cosmology. The filter band and photometric system are necessary.

Can you clarify what you mean by class?

Hi Simon,

can you share the data you have collected?

I am updating the format in real time in the GitHub repo readme as I follow this discussion. I think some guidelines would be important early on, but I don’t think putting some placeholders in the repo closes the conversation.

Perhaps we could recommend all contributors use the “sncosmo” format for photometric data as an easy way to homogenize the data?
https://sncosmo.readthedocs.io/en/v1.3.x/photdata.html

(It’s basically the same columns as what Ashish suggested).
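For reference, sncosmo’s photometric data are essentially a table with columns `time`, `band`, `flux`, `fluxerr`, `zp`, and `zpsys`; a minimal sketch of that layout (as a plain dict-of-lists, so the example does not require sncosmo or astropy to run) could look like:

```python
# Columns expected by sncosmo's photometric-data interface
# (time, band, flux, fluxerr, zp, zpsys), written here as a plain
# dict-of-lists so the sketch does not require sncosmo itself.
# Band names and values are illustrative.
photdata = {
    "time":    [55070.0, 55072.0, 55074.0],  # MJD
    "band":    ["sdssg", "sdssr", "sdssi"],
    "flux":    [0.36, 0.71, 0.58],
    "fluxerr": [0.05, 0.04, 0.06],
    "zp":      [25.0, 25.0, 25.0],
    "zpsys":   ["ab", "ab", "ab"],
}
# With astropy installed this could become a Table:
#   from astropy.table import Table
#   data = Table(photdata)
# which could then be passed to sncosmo's light-curve fitters.
```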



I’d agree to recommend (without strictly enforcing)

@fed all the files were eventually stuck in a database. I don’t know that we have the original files around anymore. @danielsf may be able to easily pull the lightcurves out. What do you think, Scott?

Regarding the AGN, do you want some simulated lightcurves or the structure function parameters?

And I agree that the bandpass is a really tricky thing. It makes things way easier if you can require a particular system (e.g. AB) for all the contributed lightcurves.
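If a single system such as AB were required, conversions do become simple, since AB magnitudes are defined directly against a 3631 Jy zero point; a minimal sketch:

```python
import math

AB_ZEROPOINT_JY = 3631.0  # flux density of an m_AB = 0 source, in janskys

def ab_mag_to_jy(mag):
    """Convert an AB magnitude to a flux density in janskys."""
    return AB_ZEROPOINT_JY * 10.0 ** (-0.4 * mag)

def jy_to_ab_mag(flux_jy):
    """Convert a flux density in janskys to an AB magnitude."""
    return -2.5 * math.log10(flux_jy / AB_ZEROPOINT_JY)
```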

I think for AGN we can get our hands on the Time Delay Challenge data (I may have the inside track: it was Dobler et al. 2013). Those are simulated data.

The light curves we collected are all distributed in the sims_sed_library package

(that github repo really just points to some bash scripts that curl the actual light curves down from the server where they are stored).

If you are interested in producing “realistic” light curves (i.e. theoretical light curves observed with an officially simulated LSST Cadence), the simulations software stack does contain a tool for doing so. It is demonstrated in a Jupyter notebook here.

I opened an issue on spatially-correlated variables - is this something that this particular data challenge will avoid, for simplicity?

Also, presumably in the end we will need CatSim code modules that will be able to generate model light curves, rather than just examples of light curve data. Supporting this from an early stage could be really helpful. @danielsf, how hard would it be to define very simple (i.e., for astronomers :-)) Transient and Variable Python classes that would import easily into CatSim? I guess we would all want to do something like `from tvs.sources import SNIb` etc. (where SNIb would inherit from Transient).

One crucial thing would be that the tvs.sources package that contains these classes should not need to import the DM stack… They would just need to have the methods and attributes that CatSim needs, and I guess follow the CatSim coding conventions.
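As a purely hypothetical sketch of what such a `tvs.sources` module could look like (all names, methods, and light-curve shapes here are placeholders, not an actual CatSim API), with no DM-stack imports:

```python
# Hypothetical sketch of a tvs.sources module: plain Python classes with
# only the methods a CatSim-like framework would need, and no DM-stack
# imports. All names and numbers here are placeholders.

class Transient:
    """Base class: a source with a finite-duration light curve."""
    def __init__(self, t0, peak_mag):
        self.t0 = t0              # epoch of first light (MJD)
        self.peak_mag = peak_mag  # magnitude at maximum

    def mag(self, mjd, band):
        """Return the magnitude at a given epoch and band."""
        raise NotImplementedError

class SNIb(Transient):
    """Toy Type Ib supernova: linear rise over 15 days, then linear decline.
    The shape is illustrative only, not a physical model."""
    rise_days = 15.0
    decline_rate = 0.05  # mag/day, illustrative

    def mag(self, mjd, band):
        phase = mjd - self.t0
        if phase < 0:
            return float("inf")  # not yet exploded
        if phase < self.rise_days:
            return self.peak_mag + 3.0 * (1.0 - phase / self.rise_days)
        return self.peak_mag + self.decline_rate * (phase - self.rise_days)

sn = SNIb(t0=59000.0, peak_mag=18.0)
```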


Hi all,

I know that @fed is already aware of this, but I just wanted to let everyone else know that for supernovae, novae, and tidal disruption events we have separate AstroCats catalogs with large collections of LC data available. By far the supernova catalog is the most mature and complete (several thousand LCs), but the tidal disruption and novae catalogs also have a large amount of photometric data available that’s not readily available elsewhere. The outputs from these catalogs are stored on GitHub in a series of repositories (listed below), and are all in the same JSON format which is described in a schema file.
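For illustration, reading photometry out of an AstroCats-style JSON file might look like the following; the field names reflect my reading of the format (a top-level key per event, with a "photometry" list of observation dicts), and the values are invented, so both should be checked against the actual schema file:

```python
import json

# Minimal snippet in the AstroCats-style JSON layout. Field names follow
# my reading of the schema and the values are invented for illustration.
snippet = """
{
  "SN2011fe": {
    "name": "SN2011fe",
    "photometry": [
      {"time": "55797.2", "magnitude": "17.5", "e_magnitude": "0.05", "band": "g"},
      {"time": "55799.1", "magnitude": "16.4", "e_magnitude": "0.04", "band": "g"}
    ]
  }
}
"""

catalog = json.loads(snippet)
event = catalog["SN2011fe"]
# Pull out (time, magnitude, band) tuples for one event.
lc = [(float(p["time"]), float(p["magnitude"]), p["band"])
      for p in event["photometry"]]
```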

Filter and photometric system details are currently limited but are provided when known; I am currently working with Dan Scolnic to improve the completeness of the filter/system for most of the observations.

Supernovae:






Tidal disruptions:

Novae:


Two weeks back, the Opening Workshop of the nine-month-long SAMSI ASTRO program took place. Together with Jogesh Babu, I am leading a group on Synoptic Time Domain Surveys. One of the subgroups is going to work on issues related to putting such a data challenge together. There will be good opportunities for cross-fertilization of ideas. More details on that should emerge soon.

Hi all,

I would like to share with you the preliminary outcome of several weeks of brainstorming among participants of the 2016-17 Program on Statistical, Mathematical and Computational Methods for Astronomy (ASTRO) opening workshop that took place last August at SAMSI (https://www.samsi.info/programs-and-activities/research-workshops/2016-17-astro-opening-workshop-august-22-26-2016/). As such, this is work in progress by a subgroup of Working Group 2 of the SAMSI program, Synoptic Time Domain Surveys. That subgroup is working towards the development of a light-curve classification data challenge in preparation for LSST, as are others in this group. We are now sharing these ideas with the LSST community to add momentum to this effort, to increase the flow of ideas, and to incorporate existing ones. This is not intended as a final document, but as an informed starting point that should benefit from your input. Please feel free to comment on the document: