Collaboration with non-data-rights holders on statistical algorithm development

Dear all,

I would like to collaborate with applied mathematics colleagues on an interesting anomaly detection problem stemming from our work with LSST data. These researchers are not from Chile or the US, and are not on the International In-Kind Data Rights Holders list.

We have successfully collaborated on similar projects in the past, applying statistical algorithms to astrophysics datasets. Our previous work resulted in two publications: one led by our (astrophysics) group for an astrophysics journal, and one led by their group for a statistics journal. We would like to do the same for LSST.

After reviewing the Rubin Data Policy (Section 8.1), I understand that publishing a paper led by our data-rights-holding group, with our colleagues as co-authors and using LSST-DP1 data for scientific applications, should be permissible.

My question concerns the second work: a paper led by the non-data-rights-holding collaborators, focused on the algorithm, and aimed at a statistics/CS/mathematics audience. Because the algorithm is designed to solve a problem posed by LSST data products, validating it would ideally mean reporting performance metrics computed on LSST data (accessed through our research group as intermediaries).

  • Would this type of publication be acceptable under the existing data policy?
  • Should the collaborators avoid touching LSST data at all in such a paper, or are there specific guidelines for acknowledgment and attribution?

Specifically, the data products we are working with are DiaSources, many of which are unconfirmed detections or noise, so I’m not sure how the “1000 objects” limit would apply.
I appreciate any guidance you can provide.

Thank you!

Hi @deppep, thanks for this question, happy to try and clarify the Rubin Data Policy (ls.st/rdo-013).

Would this type of publication be acceptable under the existing data policy?

A paper that is led by non-data-rights holders, and contains LSST data that was accessed and analyzed by co-authors with data rights, is OK as long as the co-authors with data rights do not share any proprietary LSST data products with the co-authors who do not have data rights (Section 8.1, Scenario 2). The one exception is that coordinate lists for up to 1000 objects can be shared to enable follow-up observations by co-authors without data rights (Section 8.1, Scenario 1), but it does not sound like this exception applies to your case.

Should the collaborators avoid touching LSST data at all in such a paper, or are there specific guidelines for acknowledgment and attribution?

Yes, that is correct: the co-authors without data rights cannot touch the proprietary LSST data products accessed by the co-authors with data rights (Section 8.1, Scenario 2).

Everyone should follow the guidelines for citing LSST and a given data release (DPOL-306).

There is one more aspect to consider here: which LSST data products are proprietary, and which are public (DPOL-301). The term public means that a data product can be shared with anyone, anywhere, worldwide. The term proprietary means that a data product cannot be shared with anyone who does not have data rights. You mention that the data product you want to use is the DiaSource table. There will be multiple versions of this table, some public and some proprietary. The versions produced by the Data Release Processing (DRP) pipelines, and released as part of Data Preview 1, Data Preview 2, and Data Release 1, are proprietary (DPOL-506, Section 5.1). However, the DiaSources produced by the Prompt Processing pipeline and released either as alerts or as part of the Prompt Products Database (PPDB) are public (DPOL-503 and DPOL-504, Section 5.1).

I’m going to mark this response as the solution, but if this doesn’t fully answer your data rights questions please don’t hesitate to reply in thread – happy to keep discussing.
