LSST Answers to Community Broker FAQs

(Melissa Graham) #1

Contributing Authors: @ebellm, @swinbank, @leanne, @ktl, @ctslater, @gpdf, and @MelissaGraham

Introduction: The following is a list of “Frequently Asked Questions” that broker developers and the science community posed to the LSST Project during the LSST Community Brokers Workshop in Seattle in June 2019. Answers have been developed by the Data Management team. Please feel free to ask additional questions and/or provide comments on this topic.

Q1. When can brokers expect to get the first stream of simulated/real alerts from LSST?

A simulated alert stream that is based on LSST-processed data could be made available as early as Q3 2019 (LDM-612). Such a simulated stream would be generated from archival data (and therefore have no guaranteed latency). An alert stream from commissioning data is expected to be available with substantial latency (i.e., not within 60 seconds of readout) three months after the start of sustained observing with ComCam, as early as Q4 2020 (LDM-612). The full timeline is available in LDM-612.

Q2. Where has LSST provided the alert packet schema and/or a single example of an Alert Packet, and/or of a SSObject?

Alert schemas are described in DMTN-093, which also includes a link to a preliminary schema*. As discussed in DMTN-093, schemas are not immutable and are subject to change. Simulated example alert packets could be made available as early as Q3 2019 (LDM-612). The schema for an SSObject will be available in Q4 2019 at the earliest (the SSObject schema is currently being updated).
*That link to the alert schema is not permanent: if it turns into a dead end, contact Eric Bellm for the new destination.

Q3. Can a broker co-locate at NCSA?

Co-locating a broker at NCSA requires the negotiation and conclusion of an agreement between the broker proposers (or proposing team) and NCSA (contact Margaret Gelman), and would be independent of the LSST Construction Project. Whether co-location at NCSA would allow more brokers to be selected is TBD.

Q4. Will the LOI be shared publicly?

The Letters of Intent (and all other proposal-related communications) from parties building broker infrastructure and/or requesting to receive the LSST alert stream are considered private communications and will not be made publicly available by the LSST Project.

Q5. How many full-stream brokers will be supported?

It is a requirement that the Data Management System be able to support the distribution of at least 5 full streams (soon to be added to LSE-61). This number is limited by the outbound bandwidth from NCSA (currently 10 Gb/s; LDM-148). Options to increase the number of brokers to which alerts may be distributed are under investigation by the DM team, and include pre-filtered streams, removing image stamps from alert packets, and/or increasing the time window for alert distribution. The final number of full-stream brokers that will be supported is the responsibility of the Broker Selection Panel (Section 4.3 of LDM-612) and the LSST Operations team.
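As a rough illustration of how the outbound bandwidth bounds the number of full streams, the sketch below estimates how long one visit's worth of alerts takes to leave NCSA. It borrows the per-visit alert count and packet size quoted in the answer to Q12 (10,000 alerts, 82 kB each) and assumes, purely for illustration, that the whole 10 Gb/s link is available to alerts with no protocol overhead. The function name `burst_seconds` is hypothetical.

```python
# Back-of-envelope sketch, not an LSST tool.
# Assumptions: decimal units (1 kB = 1000 bytes), the full link is
# dedicated to alerts, and no protocol or compression effects.
ALERTS_PER_VISIT = 10_000   # from the answer to Q12
PACKET_KB = 82              # from the answer to Q12
LINK_GBPS = 10              # outbound bandwidth from NCSA (LDM-148)


def burst_seconds(n_brokers, alerts=ALERTS_PER_VISIT,
                  packet_kb=PACKET_KB, link_gbps=LINK_GBPS):
    """Seconds needed to send one visit's alerts to n_brokers full streams."""
    bits_per_stream = alerts * packet_kb * 1e3 * 8
    return n_brokers * bits_per_stream / (link_gbps * 1e9)


for n in (1, 5, 10):
    print(f"{n:2d} full streams: {burst_seconds(n):.2f} s per visit")
```

Lowering `packet_kb` (for example, to model packets without stamps) or spreading the transfer over a longer window are the same levers the DM team lists above.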

Q6. Could stamps instead be made immediately available in a database?

The LSST DM team is working to establish the cost and schedule impacts of migrating to an Alert Production pipeline with an alternative storage and distribution system for the stamps (e.g., generated and kept in an accessible database, or made available via a cutout service). Broker teams should continue to provide feedback on their optimal mode of stamp access, while also preparing for alert packets with stamps. DM’s progress on studying these options can be followed here. Pending a technical design review of that study’s findings, any decisions will be announced, e.g., in the solicitation for full proposals.

Q7. How do users access the Prompt Products Database (PPDB) when they are working with alerts via a community broker?

External entities (e.g., brokers) may interface with the PPDB via the Web API aspect of the LSST Science Platform, and use the TAP interface to query the PPDB catalogs by, e.g., using DIAObject/DIASource IDs from the alert packet as keys (LSE-319, LDM-542, LDM-554). The PPDB catalogs are updated with new data within L1PublicT of image acquisition (currently 24h; LSE-29). More information about TAP can be found on the IVOA and CADC websites.
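To make the TAP access pattern concrete, here is a minimal sketch that builds a synchronous TAP query URL for all DIASources belonging to one DIAObject. The `REQUEST`, `LANG`, and `QUERY` parameters and the `/sync` resource come from the IVOA TAP standard; the base URL, the table name `ppdb.DiaSource`, and the column name `diaObjectId` are assumptions for illustration, since the actual PPDB endpoint and schema are not given here. The sketch only constructs the URL and does not contact any service.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real LSST Science Platform TAP URL is not
# specified in this document.
TAP_BASE_URL = "https://example.org/api/tap"


def build_tap_sync_url(dia_object_id, base_url=TAP_BASE_URL,
                       table="ppdb.DiaSource"):
    """Build a TAP /sync URL selecting the sources of one DIAObject.

    The table and column names are placeholders, not the published schema.
    """
    adql = f"SELECT * FROM {table} WHERE diaObjectId = {int(dia_object_id)}"
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return f"{base_url}/sync?{urlencode(params)}"


# The ID would normally be read from a received alert packet.
print(build_tap_sync_url(12345))
```

A broker would typically issue such a query only for alerts that pass its own filters, rather than for every packet in the stream.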

Q8. How do users connect to a community broker, or subscribe to Alerts, from the LSST Science Platform?

Each community broker defines its own user interfaces through which individuals can access the broker’s alert stream filters, value-added data products, queryable database, etc. This access might be via, e.g., web browsers or desktop clients. It is expected that most scientists will access LSST alerts via community brokers (note that users will not be able to subscribe to full, unfiltered alert streams coming directly from LSST). While it is conceivable that users could ingest a set of broker-filtered alerts to their user account in the LSST Science Platform (LSP) – and note that this would be subject to the same data storage limits as any other uploaded data set – a more efficient use of the LSP would be to directly access the original Prompt data products from which the alert packets are derived (i.e., the images and catalogs described in Section 3 of LSE-163). The contents of the Alerts and the Prompt products database are essentially identical.

The LSST Alert Filtering Service will provide basic, limited-capacity access to the LSST alert stream; it is sized to allow 100 simultaneous user-generated filters to return 20 alerts per visit (Section 2.2.4, LSE-61). It is expected that users of the LSP will be able to define an alert stream filter in the LSP environment, and have it installed in the LSST Alert Filtering Service, which is separate from the LSP. (LSP facilities are for analysis and queries of the LSST data products and not for continuously-running processes such as alert stream filters.) The LSST Alert Filtering Service is expected to provide VOEvent format alerts (or similar; Section 3.5.2 of LSE-163). Users may receive their filtered alerts from the LSST Alert Filtering Service by, e.g., a simple user interface provided in the LSP via the Portal Aspect (Section 3.9 of LDM-542; Section 2.9.5 of LDM-554), and/or a direct connection using standard IVOA protocols to a third-party system (e.g., VTP to a private server). It is important to note that although it should be possible to query the Alerts Database from the LSP (LDM-542), the Alerts Database might only support queries by alert ID.
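The sizing above (100 filters, 20 alerts per visit each) can be pictured with a small sketch of a capped user filter. This is not the Alert Filtering Service API, which is not specified here: the function `apply_user_filter`, the alert fields `alertId` and `snr`, and the choice to keep the first 20 matches are all assumptions made for illustration.

```python
from itertools import islice

MAX_FILTERS = 100            # simultaneous user filters (LSE-61)
MAX_ALERTS_PER_VISIT = 20    # alerts returned per filter per visit (LSE-61)


def apply_user_filter(visit_alerts, predicate, cap=MAX_ALERTS_PER_VISIT):
    """Return at most `cap` alerts from one visit that satisfy `predicate`.

    Keeping the first matches is an assumption; the real service may
    select or rank alerts differently.
    """
    return list(islice((a for a in visit_alerts if predicate(a)), cap))


# Synthetic stand-in for one visit of 10,000 alerts (fields are hypothetical).
visit = [{"alertId": i, "snr": 5 + i % 50} for i in range(10_000)]
bright = apply_user_filter(visit, lambda a: a["snr"] > 40)

print(len(bright), "alerts returned by this filter")
print(MAX_FILTERS * MAX_ALERTS_PER_VISIT, "alerts per visit at most, over all filters")
```

Even when a predicate matches thousands of alerts in a visit, the cap keeps the service's total output to a small fraction of the full stream, which is why the full stream is reserved for community brokers.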

More information about VOEvents, the IVOA (International Virtual Observatory Alliance), and VTP (VOEvent Transport Protocol) can be found in Seaman et al. (2011) and Allan et al. (2017).

Q9. Every year when a new data release (DR) is issued, should brokers simply replace the aggregate set of alerts with the DR? With the Zwicky Transient Facility (ZTF) alerts, the DR was entirely separate.

Brokers would not replace the aggregate set of alerts on a yearly basis because there is no direct analogue of alert packets in a data release (LSE-163). DIAObjects and DIASources will be produced in the data release, but will not replace those in the Prompt Products Database because their processing details will differ. Users may query whichever database is most useful and appropriate for their science.

Q10. Is the LSST Prompt Products Database (PPDB) different from the aggregate set of alerts that were previously sent out? If yes, how do brokers access the PPDB? If not, should brokers mirror the PPDB?

The PPDB is a superset of the contents of the alert packets because it contains forced photometry data, which will not always be included in the alert packets. For example, a new DIASource which is not cross-matched to any DIAObject in the PPDB will have 30 days of precovery forced photometry generated for it, which is stored in the PPDB. However, the PPDB keeps only a ~12 month history of the products of Difference Imaging Analysis (DIA). If there is not another DIASource at that location within the next year, that forced photometry will never be released in an alert packet. More information on the data products and the alert packet contents can be found in LSE-163.

Individual users and brokers will access the PPDB through the three LSST Science Platform interfaces: Portal, Notebook, and API (LSE-319). Policies for mirroring the PPDB, and for supporting the bulk data exports that mirroring would require, are in development by the operations teams. The nightly updates to the PPDB are estimated to be ~100 GB, which is derived from the estimated increase in PPDB size of 30 TB/year, divided by 300 observing nights per year (LDM-141).
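The nightly-update estimate can be reproduced in a few lines. The sketch below also converts it into the average transfer rate a mirror would need in order to keep up within 24 hours; that second figure is an illustrative extension assuming decimal units and a perfectly even transfer, not a number from LDM-141.

```python
# Reproduce the ~100 GB/night estimate quoted above (decimal units assumed).
PPDB_GROWTH_TB_PER_YEAR = 30
OBSERVING_NIGHTS_PER_YEAR = 300

nightly_gb = PPDB_GROWTH_TB_PER_YEAR * 1000 / OBSERVING_NIGHTS_PER_YEAR

# Illustrative only: average rate needed to copy one night's update in 24 h.
SECONDS_PER_DAY = 86_400
mirror_mbps = nightly_gb * 8e3 / SECONDS_PER_DAY  # GB -> megabits, per second

print(f"Nightly PPDB update: {nightly_gb:.0f} GB")
print(f"Average rate to mirror it in 24 h: {mirror_mbps:.1f} Mbps")
```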

Q11. What will be provided in the alert packets for the “variety of variability metrics computed on the updated DIASource lightcurve” mentioned in LDM-612 (page 7 bullet IV)?

LDM-151 provides a baselined set of light curve features that could be updated as part of Prompt Processing and released in the alert packet. Refining this set of features is the topic of a current study by the Data Management System Science Team, who will bring specific proposals and questions back to the science collaborations later this year. Suggestions and comments about these light curve features are very welcome, and should be directed to Eric Bellm and Melissa Graham.

Q12. Can we say LSST is about 30 times ZTF, in rate and bulk? Or maybe 20, 50, 100 is a better estimate of the multiplier?

ZTF averages 1025 alerts/image, and takes one image every 40 seconds (30 second integration, 10 second read/slew). Alerts average about 70 kB (although they contain less catalog information than LSST alerts, they have an extra cutout and an uncompressed Avro schema). That implies an average data rate of 14.4 Mbps.

For LSST, the corresponding numbers are 10,000 alerts/image, one image every 34 seconds, and 82 kB packets, for an average data rate of 193 Mbps: a factor of 11.5 larger than ZTF in numeric alert rate and a factor of 13.4 larger in data rate. The ZTF calculation above is for all images; the public surveys use 40% of the ZTF time, so the integrated data volume of ZTF public alerts is 1/33.5 that of LSST.
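The comparison above can be checked with a short calculation. This sketch uses only the figures quoted in this answer and assumes decimal units (1 kB = 1000 bytes, 1 Mbps = 10^6 bits per second); the helper name `rates` is just for illustration.

```python
def rates(alerts_per_image, cadence_s, packet_kb):
    """Return (alerts per second, average data rate in Mbps)."""
    alerts_per_s = alerts_per_image / cadence_s
    mbps = alerts_per_s * packet_kb * 8 / 1000  # kB/s -> Mbps
    return alerts_per_s, mbps


ztf_rate, ztf_mbps = rates(1025, 40, 70)
lsst_rate, lsst_mbps = rates(10_000, 34, 82)

ZTF_PUBLIC_FRACTION = 0.4  # share of ZTF time used by the public surveys

print(f"ZTF:  {ztf_mbps:.1f} Mbps")
print(f"LSST: {lsst_mbps:.1f} Mbps")
print(f"Alert-rate ratio: {lsst_rate / ztf_rate:.1f}")
print(f"Data-rate ratio:  {lsst_mbps / ztf_mbps:.1f}")
print(f"LSST vs. ZTF public volume: "
      f"{lsst_mbps / (ztf_mbps * ZTF_PUBLIC_FRACTION):.1f}")
```

Carrying unrounded intermediate values gives a public-volume ratio of about 33.6, versus the 33.5 obtained by dividing the rounded 13.4 by 0.4; the difference is rounding only.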

Q13. With respect to the changing data policy, can an external broker be considered as, or included within, “in-kind” products?

LSST does not yet know whether brokers will qualify as in-kind contributions, but they may not (for various reasons, including the public nature of the stream). It will take some time before the agencies are prepared to consider what kinds of value-added in-kind items are acceptable. It is expected that evaluation and selection of brokers will proceed without regard to their potential status as in-kind contributions. All inquiries about in-kind contributions for data rights should be directed to Robert Blum.

Q14. How long will the alert system retention period be (that is, the time an alert is stored on a topic before it is discarded by the alert system)? What will be the procedure for a broker to retrieve missing alerts after this period of time?

The retention period in the alert distribution system is still to be determined, but given the alert stream data rate of ~800 GB/night it is currently reasonable to expect a retention period of ~7 days.
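As a quick sanity check on what such a retention period implies, the sketch below multiplies the nightly volume by the number of retained nights. The replication factor is a hypothetical parameter added for illustration (distributed message systems often keep several copies), not a stated design value.

```python
NIGHTLY_STREAM_GB = 800   # alert stream volume per night, quoted above
RETENTION_DAYS = 7        # expected retention period, quoted above


def retention_storage_tb(days=RETENTION_DAYS, nightly_gb=NIGHTLY_STREAM_GB,
                         replication=1):
    """Storage (TB, decimal) needed to hold `days` nights of alerts.

    `replication` is a hypothetical knob for the number of stored copies.
    """
    return days * nightly_gb * replication / 1000


print(f"One copy:     {retention_storage_tb():.1f} TB")
print(f"Three copies: {retention_storage_tb(replication=3):.1f} TB")
```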

Brokers with data rights can retrieve old alert packets from the Alerts Database. However, note that the requirements on user access to the Alerts Database are not yet fully developed, and access may be limited in terms of how alert packets may be queried (e.g., only by alert ID), or by the bulk download capabilities (DMTN-102).

Q15. How many topics are envisaged by the alert system? One single “raw” alert stream? Or will there be multiple topics available for the community brokers?

The primary mode envisioned is that the full raw alert stream, which may be partitioned into nightly topics, will be made available to all community brokers. Broker teams may request to receive a filtered subset of the alerts as part of their proposal process (LDM-612). More information about alert stream topics can be found in DMTN-093.

Q16. How will the collaboration between LSST project and the community brokers be organized?

As described in LDM-612, the formal component of this collaboration is that “Selected brokers will be expected to sign a Memorandum of Understanding codifying agreement to respect LSST Data Rights policies (where relevant) and outlining expected interfaces, support, and Service-Level Agreements for both parties.” However, ideas surrounding hierarchical, peer-to-peer, and decentralized networks of brokers – which emerged in discussions of VOEventNet over 10 years ago – were discussed at the recent LSST Community Broker Workshop (CBW). Given that such architectures could facilitate a much wider access to the LSST alert stream, but also may potentially require that “downstream” brokers be subject to LSST policies, this formal component might be revisited and revised by the LSST Project. Any changes would be communicated to the brokers who have submitted an LOI.

Informally, this collaboration will be strengthened via interaction at meetings such as the CBW; documentation such as the forthcoming white paper from that meeting; and communications through the Science Collaborations as their members become initial users of the brokers’ software and begin to provide feedback. Furthermore, as the DM team proceeds with the various investigations regarding, e.g., filtered streams and postage stamps (as described in the answers to other questions), broker teams may be contacted to provide input. There will also be opportunities for brokers to communicate their needs to the LSST Project as part of the full application process.

Cited Resources
DMTN-093 “Design of the LSST Alert Distribution System”
DMTN-102 “LSST Alerts: Key Numbers”
LDM-141 “Alerts Sizing Spreadsheet”
LDM-148 “Data Management System Design”
LDM-151 “Data Management Science Pipelines Design”
LDM-542 “Science Platform Design”
LDM-554 “Data Management LSST Science Platform Requirements”
LDM-612 “Plans and Policies for LSST Alert Distribution”
LSE-29 “LSST System Requirements (LSR)”
LSE-30 “Observatory System Specifications (OSS)”
LSE-61 “Data Management System (DMS) Requirements”
LSE-163 “Data Products Definition Document (DPDD)”
LSE-319 “LSST Science Platform Vision Document”
Seaman et al. 2011, “Sky Event Reporting Metadata Version 2.0”
Allan et al. 2017, “VOEvent Transport Protocol”

Find more LSST documentation here.