Changes to how defects are handled: implementation of RFC-595

Defects are changing

Since RFC-595 we have been working toward a system that allows defects (and potentially other user curated calibration products) to be handled in a more consistent manner.

When the word defect is used in this posting it refers specifically to strictly static, rectangular regions of pixels.
Examples are single hot/cold pixels, bad columns, and anomalous pixels at the edges of the sensors.
To date, these have been represented in several different ways: text files, FITS binary tables of bounding boxes, and bad pixel masks.
This work is not intended to extend the types of defects, but it is intended to reduce the number of ways they are encoded.

The target is to merge at the beginning of next week: 17 June 2019.

What is changing?

  1. We are standardizing on a single text format as the human readable version of defects.
  2. This human readable version is stored in a package separate from the package that holds configuration overrides and special case code.
  3. The standard text format files are ingested into calib repositories for use with command line tasks.

The standard format

  • The standard is to produce text files in the ECSV format.
    This format is easy for a person to read, supports metadata, and is machine readable through astropy.table.
  • The standard is to store four columns: x0, y0, width, height.
    These represent the location of the pixel nearest the origin and the extents in the x and y coordinates respectively.
  • Three pieces of user curated metadata are required to exist in the file header.
    These metadata must also agree with the layout of the files in the data package as described in the next section.
    When using the writeText method on the Defects object, the other required metadata will be added automatically.
    Other arbitrary metadata may be added.
    Specifically, we suggest adding a DEFECTTYPE key.
    • INSTRUME: The instrument name e.g. decam.
    • DETECTOR: The index of the detector to which to apply these defects.
    • CALIBDATE: An ISO compatible string that is unique for a particular level (typically detector) across all validity ranges. This cannot generically be assumed to be the valid start time.
    • DEFECTTYPE (optional): A string describing what kind of defects these are.
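
As a rough illustration of the rules above, the following sketch checks that the three required metadata keys are present and expands one defect row into the pixel ranges it covers. The helper names are hypothetical and not part of the Defects API, the CALIBDATE value is only an example, and the half-open interpretation of width and height is an assumption.

```python
# Required user curated metadata keys, as listed above.
REQUIRED_KEYS = ("INSTRUME", "DETECTOR", "CALIBDATE")


def missing_metadata(meta):
    """Return the required keys that are absent from a metadata mapping."""
    return [key for key in REQUIRED_KEYS if key not in meta]


def box_pixel_ranges(x0, y0, width, height):
    """Return the x and y pixel ranges covered by one defect row.

    Assumes (x0, y0) is the pixel nearest the origin and that the box
    spans `width` pixels in x and `height` pixels in y.
    """
    return range(x0, x0 + width), range(y0, y0 + height)


# Illustrative metadata and defect rows (x0, y0, width, height).
meta = {"INSTRUME": "test", "DETECTOR": 0, "CALIBDATE": "1970-01-01T00:00:00"}
defects = [(0, 0, 1, 1), (10, 0, 2, 100)]  # one hot pixel, one bad column pair

print(missing_metadata(meta))
for row in defects:
    xs, ys = box_pixel_ranges(*row)
    print(row, "covers", len(xs) * len(ys), "pixels")
```

In practice the files would be read and written through astropy.table or Defects.writeText rather than assembled by hand.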

Note: As originally written, I had assumed one could use the valid start time as the CALIBDATE. That cannot be enforced in general since different calibration pipelines define the CALIBDATE in different ways. I have updated the description to reflect this.

Here is an example defects file from the obs_test_data repository.

The obs data repository

A primary goal of this work was to provide guidance on how to break calibration-like data out of the obs_* packages and into separate repositories.
The following guidelines are specifically for this work with defects, but should be extensible to other types of versioned, human curated data.

  • The package is the name of the obs_ package appended with _data.
    E.g. obs_test has a corresponding obs_test_data.
  • Each instrument in the obs_ package should have a corresponding directory in the obs_*_data package.
    E.g. obs_subaru has both hsc and suprimecam so will have each of those as top level directories in obs_subaru_data.
    These names must correspond to the INSTRUME metadata in the defects files.
  • Each type of curated calibration data will have a separate directory for each instrument.
    In this case there will be a defects directory for each instrument in a given obs_*_data package.
  • Each sensor in the instrument array will have a directory containing a file for each validity range.
    The directory name is the name of the sensor as given by Detector.getName().lower().
    By convention, these will be lower case to avoid problems with case sensitive vs. non-case sensitive file systems.
    An example in the obs_test_data repository is: test/defects/0/19700101T000000.ecsv.
    In this case, Detector.getName() returns the string 0 and Detector.getId() returns the integer 0, so the directory is ambiguous.
    Most cameras return a more readable value for the detector name.
    The directory name (the detector name) and the DETECTOR metadata in the defect file (the detector ID) must refer to the same detector.
  • The file name for individual validity ranges will consist of an ISO compliant date string corresponding to the beginning of the validity range and the .ecsv extension.
    The string date in the file name must correspond to the CALIBDATE metadata in the file.
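
The layout rules above can be sketched as a small path builder. The function name is hypothetical, and the compact timestamp format is inferred from the obs_test_data example path rather than taken from a specification.

```python
from datetime import datetime
from pathlib import PurePosixPath


def defect_path(instrument, detector_name, calib_date):
    """Build the relative path of a defects file inside an obs_*_data package.

    Layout: <instrument>/defects/<lower-cased detector name>/<date>.ecsv
    The date format (YYYYMMDDTHHMMSS) is an assumption based on the
    example test/defects/0/19700101T000000.ecsv.
    """
    stamp = calib_date.strftime("%Y%m%dT%H%M%S")
    return PurePosixPath(instrument, "defects", detector_name.lower(), stamp + ".ecsv")


path = defect_path("test", "0", datetime(1970, 1, 1))
print(path)

# Because the timestamp is fixed width with the most significant field first,
# sorting file names alphabetically also sorts them chronologically.
names = ["20150301T000000.ecsv", "19700101T000000.ecsv", "20131025T000000.ecsv"]
print(sorted(names))
```

This ordering property is what makes a plain directory listing sufficient for reviewing the validity history of a sensor.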

Using the standardized files

The standard text files are intended to be easy for a person to curate.
Specifically, they should be easy to sort by validity date on the command line with ls.
It should be clear what file goes with which sensor.
Further, the format is meant to be editable for simple changes and easy to generate from code via Defects.writeText.

In this form, the files do not constitute a proper calibration repository.
They must be ingested into a calibration repository to be used by the butler.
This is accomplished with the ingestDefects.py command line task.
This task translates the files from their text form to a binary form (FITS) and uses a Butler to put them in the correct location.

A momentary diversion for three implementation details:

  1. For technical reasons, the files are written to a temporary location and then moved into the appropriate location for the calibration repository.
    This means that the --mode option is not available.
    Specifically, it is not possible to use --mode=link.
    This shouldn’t be an issue for defects as they will be relatively small.
  2. The representation of the defects inside the calibration repository is in the FITS region format.
    This means they are parsable natively by both astropy and ds9.
  3. Each defect file has a validity range extending to the next valid defect file. For the last in the sequence the validEnd is set to the end of Unix time.
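
The validity range rule in item 3 can be sketched as follows. This is not the ingestDefects.py implementation; the function name is hypothetical, and the value used for "the end of Unix time" (the signed 32-bit limit) is an assumption.

```python
from datetime import datetime

# Assumed sentinel for "the end of Unix time": the signed 32-bit limit.
END_OF_UNIX_TIME = datetime(2038, 1, 19, 3, 14, 7)


def validity_ranges(starts, end_of_time=END_OF_UNIX_TIME):
    """Pair each validity start with the start of the next defect file.

    The last file in the sequence is valid until `end_of_time`.
    """
    ordered = sorted(starts)
    ends = ordered[1:] + [end_of_time]
    return list(zip(ordered, ends))


# Illustrative start dates for one detector, deliberately out of order.
starts = [datetime(2015, 3, 1), datetime(1970, 1, 1), datetime(2013, 10, 25)]
for valid_start, valid_end in validity_ranges(starts):
    print(valid_start.isoformat(), "->", valid_end.isoformat())
```

Deriving each end date from the next start date means the history for a detector never contains gaps.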

The obs_ packages translated for this activity (obs_decam, obs_lsst, and obs_subaru) include a SConscript that will ingest the defects from the appropriate obs_*_data package into a temporary calibration repository inside the obs_ package.
These are not automatically discovered by the butler.

The defects can be used by other command line tasks via one of two options:

  1. The calibration repository created at scons time can be used to seed a calibration repository into which other products like flats may be ingested.
  2. The defects can be ingested directly into an existing calibration repository.
    An example command line for obs_test_data defects is (note that ingestDefects.py lives in pipe_tasks):
    ingestDefects.py path_to_butler_repository $OBS_TEST_DATA_DIR/test/defects --calib=path_to_calib_repo

A few questions about this:

Do we need to use FITS-style metadata names, instead of e.g. Instrument and CalibStartDate? These aren’t FITS files.

Why is DEFECTTYPE optional? I would think that should be mandatory, so that downstream code can always make use of it.

Is there a reason a CalibEndDate is not required, or does that get added by writeText? Similarly for the DEFECTS_SCHEMA_VERSION that appears in your obs_test_data example.

Are the obs_*_data packages expected to contain other types of data in the future, for example bias/dark/flat files?

These headers migrate directly to the output FITS files. I don’t want to have to include mapping tables to try to convert Instrument to INSTRUME and it’s easier for everyone if the metadata they read from the Defects object looks like the metadata read from other calibration data files.

The assumption is that later defects define the end date. Otherwise you risk having unexplained gaps in the history.

I believe this is discussed on the RFC. The data packages will contain the definitions of what goes into master flats (e.g. exposure IDs) but not the actual master flats.

At the moment they’re just arbitrary strings, so they wouldn’t be machine-readable anyway…

Edit: should have said they “sound like” arbitrary strings. I don’t actually know.

In that case, does every defect turn into the same mask plane bit? It seems like we should either choose to require defect type and decide how to use it later, or not bother because we’ll have plenty of files without between now and when we want to do more with it.

Currently every defect turns into the same mask plane bit. This was added at the request of @RHL. I don’t have a way to validate the DEFECTTYPE at the moment, so I don’t think we can require it.

Note that there are some minor tweaks to the implementation as described in this post: Extension of the standardized curated calibrations system.