Removing an exposure dimension record after ingestion

I am using weekly w.2022.45 of the science pipelines. I’d like to remove an exposure dimension record from the Butler registry. I am ingesting a second version of a DECam FITS file in which some of the metadata (at least the exposure timespan) has changed, but the exposure number is the same. This produces a conflict when ingesting with butler ingest-raws:

lsst.ingest WARNING: Exposure DECam:ct4m20140331t051219 could not be registered: Conflict in sync for table exposure on column(s) timespan: Timespan(begin=astropy.time.Time('2014-03-31 05:12:54.091401', scale='tai', format='iso'), end=astropy.time.Time('2014-03-31 05:14:24.091401', scale='tai', format='iso')) != Timespan(begin=astropy.time.Time('2014-03-31 05:12:54.091401', scale='tai', format='iso'), end=astropy.time.Time('2014-03-31 05:15:18.000000', scale='tai', format='iso')).

I can’t find a good way to remove a dimension record from the registry: there is no butler remove-dimension-record command and, in Python, no butler.registry.removeDimensionRecord(). How should I go about this?

As an aside, I am finding the ingestion code extremely opaque: it is hard to find and search through it for relevant configs that would set e.g. update=True in butler.Registry.syncDimensionData, which I think would solve my issue.
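To make the behaviour I am hoping for concrete, here is a toy model in plain Python of "sync with an optional update". It is only a sketch and not the Butler implementation: sync_record, the dict used as a table, and the local ConflictingDefinitionError class are hypothetical stand-ins, and the return convention (True for insert, False for no change, a dict of old values for an update) is my assumption about how such a call could report what it did.

```python
class ConflictingDefinitionError(Exception):
    """Stand-in exception: an existing record disagrees with the new one."""


def sync_record(table, key, record, update=False):
    """Insert `record` under `key`, or compare it with what is already there.

    Returns True if a new row was inserted, False if nothing differed, or a
    dict of {column: old_value} if update=True overwrote differing columns.
    """
    existing = table.get(key)
    if existing is None:
        table[key] = dict(record)
        return True
    # Collect the old values of every column that differs.
    changed = {k: existing.get(k) for k, v in record.items() if existing.get(k) != v}
    if not changed:
        return False
    if not update:
        raise ConflictingDefinitionError(
            f"Conflict in sync on column(s) {', '.join(sorted(changed))}"
        )
    existing.update(record)
    return changed


# Timespans copied from the warning above (TAI, time of day only).
exposures = {}
key = "ct4m20140331t051219"
old = {"timespan": ("05:12:54.091401", "05:14:24.091401")}
new = {"timespan": ("05:12:54.091401", "05:15:18.000000")}

sync_record(exposures, key, old)           # first ingest: inserted
try:
    sync_record(exposures, key, new)       # second ingest: conflict
    conflict = False
except ConflictingDefinitionError:
    conflict = True
replaced = sync_record(exposures, key, new, update=True)  # explicit overwrite
```

With update left at its default the second call fails in the same way as my ingest does; only the explicit update=True call replaces the stored timespan.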

The long log hints that I should start looking at line 1182 of ingest.py:

WARNING 2022-11-21T13:54:40.498-08:00 lsst.ingest ()(ingest.py:1182) - Exposure DECam:ct4m20140331t051219 could not be registered: Conflict in sync for table exposure on column(s) timespan: Timespan(begin=astropy.time.Time('2014-03-31 05:12:54.091401', scale='tai', format='iso'), end=astropy.time.Time('2014-03-31 05:14:24.091401', scale='tai', format='iso')) != Timespan(begin=astropy.time.Time('2014-03-31 05:12:54.091401', scale='tai', format='iso'), end=astropy.time.Time('2014-03-31 05:15:18.000000', scale='tai', format='iso')).

But I can’t find a relevant ingest.py (or lsst.ingest package) on GitHub. Additionally, I can’t find any source code for DecamParseTask (on GitHub or in the docs), which I was hoping would give me insight into how the ingestion code works.

Does anyone have tips or links with a high-level overview of how FITS ingestion (through to dataset registration with the registry) is done in the science pipelines?

Thanks.

Do you know how this is possible?

It’s in obs_base:

The problem is that we don’t make that update option available in the butler ingest-raws command at the moment, because it seemed like a very dangerous thing to do and we wanted to think it over before adding it. The current advice is to run the Task directly on the file whose existing definition you want to override. You can see the code here:

That is Gen2 code that has now been deleted. The metadata translation system changed for Gen3, and the DECam translator is at:

If the second file has an incorrect header for some reason, it is possible to specify a header override file for it. You can also run the astrometadata write-sidecar command and then edit the translated content before running ingest. Which approach is right depends partly on knowing which file has the correct header information.

It’s all generic code. We use astro_metadata_translator to extract the relevant exposure record information and then create the exposure records and ingest the files into butler.

Thanks, Tim, for the helpful information. This is exactly what I needed to do. Here’s what I ran:

from lsst.daf.butler import Butler
from lsst.obs.base import ingest
from lsst.pipe.base.configOverrides import ConfigOverrides

# Open the repository in writeable mode so registry records can be changed.
_butler = Butler("./repo", writeable=True)
# Default ingest config; the overrides object is empty, so nothing is changed.
config = ingest.RawIngestConfig()
configOverrides = ConfigOverrides()
configOverrides.applyTo(config)
task = ingest.RawIngestTask(config=config, butler=_butler)
# update_exposure_records=True lets the new file overwrite the conflicting record.
task.run(
    ["./data/images/mar31/object/c4d_140331_051219_ori.fits.fz"],
    run="mar31/raw/object",
    skip_existing_exposures=True,
    update_exposure_records=True,
)

One file I downloaded from the NOIRLab image archive was corrupted/unreadable. They re-ingested the original raw data into the archive under a different name with new tooling (updating some headers) and deleted the old file. It looks like this updated the exposure timespan. The diff of the FITS headers is (diff old new):

< DATE-OBS= '2014-03-31T05:12:19.091401'  /  UTC epoch                            
---
> DATE-OBS= '2014-03-31T05:12:19.091401' / UTC epoch                              
36c36
< DTPI    = 'Heinze            '  /  Principal Investigator                       
---
> DTPI    = 'Heinze  '           / Principal Investigator                         
131,139c131,168
< CHECKSUM= 'nDg2q9Z2nAf2n9Z2'   / HDU checksum updated 2022-11-16T12:09:25       
< DATASUM = '0         '         / data unit checksum updated 2022-11-16T12:09:25 
< DTSITE  = 'ct                '  /  observatory location                         
< DTTELESC= 'ct4m              '  /  telescope identifier                         
< DTINSTRU= 'decam             '  /  instrument identifier                        
< DTCALDAT= '2014-03-30        '  /  calendar date from observing schedule        
< ODATEOBS= '                  '  /  previous DATE-OBS                            
< DTPROPID= '2014A-0496        '  /  observing proposal ID                        
< DTACQNAM= '/data_local/images/DTS/2014A-0496/DECam_00297828.fits.fz'  /  file na
---
> CHECKSUM= 'a6Maa6JZa6Jaa6JY'   / HDU checksum updated 2022-11-21T13:47:36       
> DATASUM = '0       '           / data unit checksum updated 2022-11-21T13:47:36 
> DTSITE  = 'ct      '           / observatory location                           
> DTTELESC= 'ct4m    '           / telescope identifier                           
> DTINSTRU= 'decam   '           / instrument identifier                          
> DTCALDAT= '2014-03-30'         / calendar date from observing schedule          
> ODATEOBS= '2014-03-31T05:12:19.091401' / previous DATE-OBS                      
> DTUTC   = '2014-03-31T05:14:43'  /  post exposure UTC epoch from DTS            
> DTOBSERV= 'NOAO    '           / scheduling institution                         
> DTPROPID= '2014A-0496'         / observing proposal ID                          
> DTPIAFFL= '                  '  /  PI affiliation                               
> DTTITLE = '                  '  /  title of observing proposal                  
> DTCOPYRI= 'AURA    '           / copyright holder of data                       
> DTACQUIS= 'pipeline4.ctio.noao.edu'  /  host name of data acquisition computer  
> DTACCOUN= 'sispi             '  /  observing account name                       
> DTACQNAM= '/data_local/images/DTS/2014A-0496/DECam_00297828.fits.fz' / file na  
> DTNSANAM= 'c4d_140331_051219_ori.fits.fz' / file name in NOAO Science Archive   
> DT_RTNAM= 'c4d_140331_051443_ori'  /  NSA root name                             
> DTQUEUE = 'decam             '  /  DTS queue (17555)                            
> DTSTATUS= 'done              '  /  data transport status                        
> SB_HOST = 'pipeline4.ctio.noao.edu'  /  iSTB client host                        
> SB_ACCOU= 'sispi             '  /  iSTB client user account                     
> SB_SITE = 'ct                '  /  iSTB host site                               
> SB_LOCAL= 'dec               '  /  locale of iSTB daemon                        
> SB_DIR1 = '20140330          '  /  level 1 directory in NSA DS                  
> SB_DIR2 = 'ct4m              '  /  level 2 directory in NSA DS                  
> SB_DIR3 = '2014A-0496        '  /  level 3 directory in NSA DS                  
> SB_RECNO=               165255  /  iSTB sequence number                         
> SB_ID   = 'dec165255         '  /  unique iSTB identifier                       
> SB_NAME = 'c4d_140331_051443_ori.fits'  /  name assigned by iSTB                
> SB_RTNAM= 'c4d_140331_051443_ori'  /  NSA root name                             
> RMCOUNT =                    0  /  remediation counter                          
> RECNO   =               165255  /  NOAO Science Archive sequence number         
> COMMENT MODIFIED:DATE-OBS,DTACQNAM,DTCALDAT,DTCOPYRI,DTINSTRU,DTNSANAM,DTOBSERV,
> COMMENT DTPI,DTPIAFFL,DTPROPID,DTSITE,DTTELESC,DTTITLE,INSTRUME,OBSERVAT,OBSID,O
> COMMENT BSTYPE,ODATEOBS,PROCTYPE,PRODTYPE,PROPID,SIMPLE,TELESCOP,TIME-OBS       
> HISTORY Applied DTCALDATfromDATEOBSchile which added/modified fields (set()). Ol
> HISTORY d values were:                                                          

Looking at the metadata translator, it seems like the presence of the DTUTC header in the new vs old file changes how the exposure duration is defined.
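As a sanity check on that reading, the arithmetic below compares the header values with the two timespans in the ingest warning. It is only a sketch: I assume TAI - UTC = 35 s (the offset in effect between July 2012 and July 2015), and the 90 s duration is inferred from the old timespan in the log rather than read from an exposure-time header. I have not confirmed the translator's actual logic.

```python
from datetime import datetime, timedelta

# Assumption: TAI - UTC offset in effect on 2014-03-31.
TAI_MINUS_UTC = timedelta(seconds=35)

# Header values from the diff above (UTC).
date_obs_utc = datetime.fromisoformat("2014-03-31T05:12:19.091401")
dtutc = datetime.fromisoformat("2014-03-31T05:14:43")

# Timespan end points from the ingest warning (TAI).
old_end_tai = datetime.fromisoformat("2014-03-31 05:14:24.091401")
new_end_tai = datetime.fromisoformat("2014-03-31 05:15:18.000000")

begin_tai = date_obs_utc + TAI_MINUS_UTC   # start shared by both timespans
dtutc_tai = dtutc + TAI_MINUS_UTC          # DTUTC converted to TAI

old_duration = (old_end_tai - begin_tai).total_seconds()
new_duration = (new_end_tai - begin_tai).total_seconds()

print(begin_tai.isoformat(sep=" "))
print(old_duration, new_duration)
print(dtutc_tai == new_end_tai)
```

DATE-OBS plus 35 s reproduces the begin time in both timespans, the old end is exactly 90 s after it, and DTUTC plus 35 s lands exactly on the new end. That is consistent with the new end time being taken from DTUTC, although it does not prove it.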


And for future readers: since I had run butler define-visits, I additionally had to run defineVisits.py, adding update_records=True:

task.run(
    butler.registry.queryDataIds(
        ["exposure"],
        dataId={"instrument": instr.getName()},
        collections=collections,
        datasets=raw_name,
        where=where,
    ),
    collections=collections,
    update_records=True,
)