Update on Simulated Alert Stream?

alec_b612 · April 24, 2024, 4:15pm

Hello!
This line from the latest news digest piqued my interest:

The team was able to generate a full suite of science data products from the two data streams, including alerts from difference imaging, and demonstrated the ability to monitor science performance in near real-time, identify anomalies, and work together to resolve the issues during a 24-hour cycle.

In particular, are there plans to release a full-scale simulated dataset containing realistic solar system objects and labels? We are building a pipeline around the alert stream, and I want to estimate our performance as a function of the number of diaSource alerts, especially the ratio of those with an ssObjectId attached to those with a diaObjectId.
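To make the quantity concrete, here is a minimal sketch of the tally I have in mind. It assumes each alert is a plain dict with a `diaSource` record carrying nullable `diaObjectId` and `ssObjectId` fields, and that an unset ID shows up as `None` or `0`. The function names and the sample alerts are made up for illustration and are not real stream data.

```python
from collections import Counter


def tally_associations(alerts):
    """Count diaSource alerts by which association ID is set.

    Assumption: an unset ID arrives as None or 0, so a truthiness
    check is enough to tell "attached" from "not attached".
    """
    counts = Counter()
    for alert in alerts:
        src = alert.get("diaSource", {})
        if src.get("ssObjectId"):
            counts["ssObject"] += 1
        elif src.get("diaObjectId"):
            counts["diaObject"] += 1
        else:
            counts["unassociated"] += 1
    return counts


def ss_to_dia_ratio(counts):
    """Ratio of ssObject-associated to diaObject-associated alerts.

    Returns None when there are no diaObject-associated alerts.
    """
    dia = counts["diaObject"]
    return counts["ssObject"] / dia if dia else None


# Hypothetical sample alerts, only to exercise the functions.
sample_alerts = [
    {"diaSource": {"diaObjectId": 1, "ssObjectId": None}},
    {"diaSource": {"diaObjectId": 2, "ssObjectId": 0}},
    {"diaSource": {"diaObjectId": None, "ssObjectId": 77}},
    {"diaSource": {"diaObjectId": None, "ssObjectId": None}},
]
sample_counts = tally_associations(sample_alerts)
sample_ratio = ss_to_dia_ratio(sample_counts)
```

With a realistic simulated stream, the same tally would give us the split we need for sizing.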

Thank you

ebellm · April 24, 2024, 11:33pm

Hi Alec, the simulated images used in the rehearsal included simulated solar system objects. We are aiming to attribute those in alerts in a future rehearsal, presently scheduled for June.

ljones · April 25, 2024, 1:59am

It may be worth noting that the operations rehearsal only ran for 3 nights and used simulated images based on the ComCam camera … Operations Rehearsal 4 will be similar.

It’s extremely useful for things like testing the pipelines, but I don’t know if it will help you estimate the total number of diaSource alerts, since only a simple mock-up of general variable objects was included. I’m also not sure it will help with the ratio of ssObjectId to diaObjectId, which will depend on whether all of the simulated solar system objects are assumed to be known already or not …

If you haven’t already found these tech notes, they may be useful:
https://dmtn-102.lsst.io - estimates of the number of alerts for different kinds of objects
and
https://dmtn-109.lsst.io - in particular section 7, with the rate of alerts due to solar system objects and whether they’re attributed or not.

alec_b612 · April 25, 2024, 1:12pm

Thank you both, Eric and Lynne. My goal is to properly estimate the computational load of running the THOR algorithm on different subsets of the alert stream. Because of how THOR uses clustering, I won’t really be able to estimate load without a dataset that represents the shape of the data. An additional challenge is that we will likely be linking over rolling periods of 2-4 weeks.
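As a rough illustration of the rolling-period part only, the sketch below sums per-night detection counts over every run of consecutive nights of a given length. It says nothing about THOR's clustering cost itself. The function name and the nightly counts are placeholders I made up, not Rubin estimates.

```python
def rolling_window_counts(nightly_counts, window_nights):
    """Total detections in each run of `window_nights` consecutive nights.

    Uses a running sum so each night is added once and dropped once.
    """
    if window_nights <= 0:
        raise ValueError("window_nights must be positive")
    totals = []
    running = 0
    for i, count in enumerate(nightly_counts):
        running += count
        if i >= window_nights:
            # Drop the night that just fell out of the window.
            running -= nightly_counts[i - window_nights]
        if i >= window_nights - 1:
            totals.append(running)
    return totals


# Placeholder nightly detection counts (not real or estimated values).
example_nightly = [10, 20, 30, 40, 50]
example_totals = rolling_window_counts(example_nightly, 3)
```

Feeding in nightly counts from a realistic simulation, with windows of 14 to 28 nights, would give the input sizes each linking run has to handle.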

Thanks for pointing me to the estimates. Is there a utility open to collaborations that does the synthetic alert generation? Perhaps we can use it to generate our own synthetic dataset. Anything to save on dev time.