Parquet files, analysis and DP0.2


A colleague mentioned that Parquet files had come up at PCW (Time series in the RSP?), in the context of an analysis platform that would access data via that format rather than Qserv.

I presume Parquet files and the analysis functionality are not yet available via the RSP for DP0.2, and that the kind of bulk analysis functionality envisaged is still some way off?

We’re down to publish LSST/infrared fusion data from a UK work-package. We’re doing this in Qserv, but the data volume is comparable to LSST releases, with thousands of columns. We’re thinking of storing a subset of the most useful attributes in Qserv, with the other attributes available in Parquet/Butler, so the availability of such an analysis platform within the RSP is very interesting.


We create Parquet files as part of DRP, so those files are there in DP0.2. Parquet is the primary format that we pass to the Qserv ingest system.

User-driven bulk processing of those Parquet files is not part of DP0.2, though.


Thanks @timj.

The spatial sharding of the Parquet files in DP0.2 should not be assumed to be the one that will be used during operations. There are examples in the DP0.2 tutorial materials that show Parquet files being retrieved by coadd-patch ID; this may not be the final approach taken.