Parquet files and metadata

For Roman, we are debating how/whether to store metadata in our parquet tables of object catalogs. What is LSST doing in this regard? (When we initially generate these, they are associated with an individual coadd, so at that stage we could consider putting some of the relevant coadd metadata in the parquet file itself. Not sure if there is an equivalent for this in LSST?)

For LSST I did the same thing that I did for the astropy parquet support. There’s a parquet schema metadata field, table_meta_yaml, that holds a YAML representation of the astropy table.meta (since you can do astropy.table.meta.get_yaml_from_table()). This is byte-encoded on persistence and decoded on read. The LSST code is here: daf_butler/python/lsst/daf/butler/formatters/parquet.py at main · lsst/daf_butler · GitHub. The same keyword and method are used in astropy: astropy/astropy/io/misc/parquet.py at 0c0557ca2f23a334791691b7764660b13444ef17 · astropy/astropy · GitHub.

Since pandas DataFrames don’t allow metadata to be attached, there’s no pandas compatibility for this.
