Database ingest of exposure metadata

I think the problem is mainly that ingestProcessed.py is still operating in the same way as the previous source ingestion scripts. That is, it outputs rows conforming to an externally defined schema using available metadata rather than using the available metadata to determine the output schema. As you say, the metadata and schemas are camera specific, so you end up with a bunch of IFs and potentially have to modify code every time you encounter a new camera.

My thoughts were that we would either:

  • have pipeline runs output afw tables containing the desired metadata, and ingest exposure metadata by feeding these tables to the existing ingestCatalogTask
  • or, write an ingestExposureTask which builds a schema based on the metadata present in FITS files. The task config probably needs to be able to explicitly list the interesting metadata keys or exclude undesirable keys.
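The second option could be sketched roughly as below. This is a minimal illustration only, using a plain dict to stand in for a parsed FITS header; the `include`/`exclude` config fields and the `infer_schema` function are invented for the sketch, not part of any actual task API.

```python
# Sketch of schema inference for a hypothetical ingestExposureTask.
# A plain dict stands in for a parsed FITS header; the include/exclude
# config fields are assumptions for illustration.
RESERVED = {"SIMPLE", "BITPIX", "NAXIS", "EXTEND", "COMMENT", "HISTORY"}

def infer_schema(header, include=None, exclude=RESERVED):
    """Derive a (column name -> Python type) mapping from the metadata present."""
    schema = {}
    for key, value in header.items():
        if key in exclude:
            continue
        if include is not None and key not in include:
            continue
        schema[key] = type(value)  # single pass: the first value's type wins
    return schema

header = {"EXPTIME": 30.0, "FILTER": "r", "AIRMASS": 1.2, "NAXIS": 2}
print(infer_schema(header))
# NAXIS is excluded; EXPTIME and AIRMASS become float columns, FILTER str
```

Note the "first value's type wins" comment: that single-pass shortcut is exactly what causes the typing problem described below.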

In the latter case, I’d rather the exposure metadata ingestion script not be tasked with computing sky coordinates for the 4 corners of the exposure, or FWHMs, even though the code for that seems to be identical across all cameras supported thus far. I’d prefer to see that sort of thing done by the pipeline code.

Does that seem acceptable?

For the record, one issue with using a FITS image header to create a table schema is that you have to rely on keyword value formatting to determine the column type, or else do multiple passes over the data. For example, imagine there is a FITS keyword FOO which in general has some floating point value. Say FOO = 2 in the first exposure processed by the script: the FITS header might encode this value as either 2 or 2.0. If it does the former, and the script is using metadata from the first image to create the database schema, it would erroneously create a FOO column of integer type. Another thing I worry about is whether I can count on FOO always being present in the exposure metadata (with some sort of sentinel value indicating whether or not the value is valid). If not, then an ingestion task that doesn't do two passes over all the images to ingest would have to be told about FOO and its type up front.
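The FOO pitfall can be made concrete with a toy parser. The parsing and type-promotion logic here is illustrative only, not taken from the actual ingestion script:

```python
# Sketch of the keyword-formatting pitfall: the same physical quantity
# encoded as "2" vs "2.0" yields different inferred column types.
def parse_card(text):
    """Parse a raw FITS-style value string into a Python value (toy version)."""
    try:
        return int(text)
    except ValueError:
        try:
            return float(text)
        except ValueError:
            return text.strip("'").strip()

first, second = parse_card("2"), parse_card("2.0")
# Single-pass schema inference from the first exposure alone would pick int,
# even though FOO is really a float-valued keyword.

def widen(a, b):
    """Second-pass type reconciliation: promote int -> float -> str."""
    order = [int, float, str]
    return order[max(order.index(a), order.index(b))]

print(widen(type(first), type(second)))  # a second pass recovers float
```

This is why avoiding two passes forces the task to be told about FOO and its type explicitly.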

Hi @smonkewitz, I think we should first specify the schema of the image metadata, covering both single-visit and co-add processing. For single-visit processing we probably want two metadata tables: ccd (with image quality information, such as the median and rms derived from the src catalogs) and visit (with filter, exposure time, mjd, pointing ra and dec, hour angle, airmass, etc.). I like the idea of mapping the header keys to a table and column name in the task configuration, but the schema should be camera independent, so that the column datatypes are already predefined. And @frossie reminded me that this discussion is relevant for the QA database.
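To illustrate the split between a fixed, camera-independent schema and a per-camera header-key mapping, here is a rough sketch. All table, column, and header-key names are placeholders, not a proposed schema:

```python
# Camera-independent part: fixed tables and predefined column datatypes,
# so type inference from headers is never needed. Names are illustrative.
SCHEMA = {
    "visit": {"filter": str, "exptime": float, "mjd": float,
              "ra": float, "dec": float, "airmass": float},
    "ccd": {"fwhm_median": float, "fwhm_rms": float},
}

# Per-camera part: only this header-key mapping would live in task config.
HEADER_MAP = {  # header key -> (table, column)
    "FILTER": ("visit", "filter"),
    "EXPTIME": ("visit", "exptime"),
    "MJD-OBS": ("visit", "mjd"),
    "AIRMASS": ("visit", "airmass"),
}

def map_header(header):
    """Convert a header dict into per-table rows, coercing to schema types."""
    rows = {table: {} for table in SCHEMA}
    for key, value in header.items():
        if key not in HEADER_MAP:
            continue
        table, column = HEADER_MAP[key]
        rows[table][column] = SCHEMA[table][column](value)
    return rows

rows = map_header({"FILTER": "r", "EXPTIME": "30", "AIRMASS": "1.2"})
print(rows["visit"])  # exptime coerced to float even if the header said "30"
```

Supporting a new camera then means writing a new `HEADER_MAP`, never touching the schema or the ingestion code.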

This came up at the database group meeting today. A fixed schema for a common subset of what cameras provide may be workable. It may be possible to fit all desirable parameters into a camera/observatory data model (e.g. CAOM). DM-5501 is also highly likely to be relevant for exposure ingestion.