Catalog from proprietary data

I read the Rubin Observatory Data Policy.

I understand from the data policy that data rights holders may not be able to share derived catalogs with their collaborators. How does Rubin view the case where they want to publish papers that include the derived data products? Does the document discuss this kind of case anywhere?

More specifically, how does Rubin view posting a draft containing the derived product on arXiv (that is, making it public rather than sharing it only with collaborators)?

Hi Yousuke, thanks for this question.

Sections 6, 7, and 8 of the Rubin Data Policy (https://docushare.lsst.org/docushare/dsweb/Get/RDO-013) have the information you’re looking for.

To summarize, Rubin data rights holders may produce derived data products from the proprietary data that can be shared with their colleagues without data rights, and published in papers (DPOL-601). Colleagues without data rights may also co-author papers that use results from proprietary LSST data (DPOL-702). The document provides some specific examples in Section 8 to illustrate these policies in practice.

I’m pretty sure this answers your question, so I’m going to mark this post as the solution for this topic – but please feel free to reply if more clarification is needed.

I don’t think RDO-013 answers the case that I am worried about. Consider a data rights holder who wants to publish a paper with a catalog of 100,000 galaxies (clearly over the 1000 threshold) across the southern sky. The catalog is built using a sophisticated method for selecting a certain type of galaxy, and it would be useful for transient surveys. The author needs to present the galaxies’ positions (RA, Dec), magnitudes, and colors. The author does not need to share the catalog with colleagues who don’t have data rights; they just want to publish it.

How does this scenario work?

Hi Yousuke, thanks for clarifying.

In this case where a paper presents an analysis of 100,000 galaxies from the proprietary Rubin data, based on the coordinates and magnitudes of those galaxies, the paper should not reproduce a table of RA, Dec, and mag for the 100,000 galaxies.

The relevant policy in the document is DPOL-703: “for the purposes of publication, reproducibility of scientific results should be accomplished by including, describing or citing the queries used to generate the LSST data set on which the analysis was made, and not by posting or publishing copies of the proprietary LSST data”. Another option is to publish the list of object identifiers for the galaxies (the objectId column).
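As a rough illustration of the two options above (one reading of them, not official guidance), the sketch below keeps only the query text and the objectId list from a private query result and drops the coordinates and magnitudes. The table name, the magnitude columns, the color cut, and the sample rows are all made up for the example; only the objectId column name comes from this thread.

```python
# Hypothetical ADQL query. The table name, magnitude columns, and color cut
# are placeholders, not confirmed Rubin schema names or a real selection.
SELECTION_QUERY = """
SELECT objectId, ra, dec, mag_g, mag_r
FROM hypothetical_schema.Object
WHERE mag_g - mag_r > 0.8
""".strip()


def publishable_artifacts(query, rows):
    """Return the query text and object IDs only (no positions or magnitudes)."""
    # Deduplicate and sort so the published ID list is stable between runs.
    object_ids = sorted({row["objectId"] for row in rows})
    return {"query": query, "object_ids": object_ids}


# Made-up rows standing in for a private query result.
rows = [
    {"objectId": 1002, "ra": 55.1, "dec": -30.2, "mag_g": 22.4, "mag_r": 21.5},
    {"objectId": 1001, "ra": 55.3, "dec": -30.4, "mag_g": 23.0, "mag_r": 22.1},
]

artifacts = publishable_artifacts(SELECTION_QUERY, rows)
print(artifacts["object_ids"])
```

Someone with data rights could then recover the positions and magnitudes by rerunning the query or by looking up the listed IDs, while the paper itself carries no copy of the proprietary values.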

The Rubin team is working towards the capability for users to be able to issue DOIs for queries and for lists of object IDs, so that actual tables need not take up space in a paper.

Does this make sense, and is it closer to the answer you were looking for?