We have been running pipetask with quantum-backed butlers and merging the resulting datasets into our butler after processing.
When transferring the graph outputs into the butler database, we get the following error:
butler --log-level=ERROR --long-log transfer-from-graph --update-output-chain qgraphs-BUTLER-20240709/processed/processCcd330994_53.qgraph BUTLER-20240709
lsst.daf.butler.registry._exceptions.ConflictingDefinitionError: Existing dataset type and dataId does not match new dataset: {'dataset_type_id': 33, 'instrument': 'HSC', 'detector': 53, 'exposure': 330994, 'dataset_id': UUID('cfb67d32-225b-43f3-8c16-2cd182988e77'), 'new dataset_id': UUID('ff3e8529-48eb-4611-8be4-e8edc4e7ed3f'), 'collection_id': 21, 'new collection_id': 21}
I suspect we have mistakenly processed the same dataset twice in the same ‘run’, clobbering the previously produced datasets, and now the butler SQL database has the wrong data. This appears to have occurred for a few hundred datasets.
Is there a process to replace the db content with what is on disk?
You would get an error like this if you regenerated the graph but used the same RUN collection. The UUIDs would change, so the system could no longer tell that the dataset was already present. If you reuse a RUN you now have to use the same graph; if you regenerate the graph it has to go into a new RUN. pipetask update-graph-run can be used to rewrite the RUN and dataset UUIDs in an existing graph.
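As a rough sketch, the invocation would look something like the following (check pipetask update-graph-run --help on your version for the exact argument order; the new RUN collection name and output filename here are just placeholders):

pipetask update-graph-run qgraphs-BUTLER-20240709/processed/processCcd330994_53.qgraph HSC/runs/newrun-20240710 processCcd330994_53-newrun.qgraph

You would then run transfer-from-graph on the rewritten graph file so the datasets land in the new RUN collection.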
v26 is old enough that I am not really sure what situation you are in.