How do we guarantee that a source ID won’t clash with a DIAObject or Object ID? Two bits for ID type?
The wording of the DPDD is slightly unclear, but as I read it, it does not mandate this separation.
Also note that the DPDD specifies separate conceptual catalog entries for Source ID and “ccdVisitId”. With the proposals here, the latter could be extracted from the former using a simple UDF.
Presumably there are engineering requirements pertaining to this? The comments above mingle representation and numeric issues. Do the semantics require a sequential count? There is convenience in mapping the exposure ID not just to the calendar, but also to the clock. The maximum number of exposures per day is constrained by exposure + readout time. If the latter will always be greater than a second, no matter how short the exposure, then 17 bits is sufficient for a daily count (2^17 = 131,072, which exceeds the 86,400 seconds in a day). The remaining 15 bits suffice for 89 years of operation. Converting a 32-bit integer into an ISO-8601 string requires computation in all cases, so just start the calendar count at 0h UTC (or TAI or local time or whatever) on day 0 of the LE (LSST Era). One doubts there is a project requirement to embed knowledge of the Gregorian calendar in every ID; rather, this belongs in the representation layer. If the ID is otherwise “much larger than 32 bits”, allocate more space.
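To make the 15 + 17 bit split concrete, here is a minimal Python sketch. The epoch date, the choice of UTC, and the function names are all assumptions made for illustration; nothing here is an agreed LSST format.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical "day 0 of the LSST Era"; the real epoch and time scale are undecided.
EPOCH = datetime(2022, 1, 1, tzinfo=timezone.utc)

SECOND_BITS = 17  # 2**17 = 131072 covers the 86400 seconds in a day
DAY_BITS = 15     # 2**15 = 32768 days, roughly 89.7 years


def pack_exposure_id(day: int, second: int) -> int:
    """Pack a day count and a second-of-day into one 32-bit integer."""
    if not 0 <= day < 2**DAY_BITS:
        raise ValueError(f"day {day} does not fit in {DAY_BITS} bits")
    if not 0 <= second < 86400:
        raise ValueError(f"second {second} is not a valid second of day")
    return (day << SECOND_BITS) | second


def unpack_exposure_id(exposure_id: int) -> tuple[int, int]:
    """Return (day, second) from a packed exposure ID."""
    return exposure_id >> SECOND_BITS, exposure_id & (2**SECOND_BITS - 1)


def to_iso8601(exposure_id: int) -> str:
    """Representation layer: the calendar lives here, not in the ID."""
    day, second = unpack_exposure_id(exposure_id)
    return (EPOCH + timedelta(days=day, seconds=second)).isoformat()
```

The ID itself carries no Gregorian knowledge; only `to_iso8601` does, and the largest packable value still fits in an unsigned 32-bit integer.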
Many telescopes include the date of acquisition in the observation identifier (normally a string and conventionally called OBSID in FITS). The issue here is a unique integer identifier for every CCD exposure. Human-readable integers can be useful when glancing at an error report without having to run up a tool to convert ID 1234564 to some day in 2026.
Using days since some epoch + seconds in day (we definitely can’t take more than 86400 exposures per day) + up to 999 detectors results in 41-bit detector_exposure_ids for 50 years.
If, on the other hand, the acquisition system kept track of how many exposures were taken and used that as the exposure_id, then even if we took 86400 exposures every day for 50 years the count would fit in 31 bits (so in reality 32 bits is more than enough in that scheme).
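The bit counts for these two schemes can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the 50-year span is approximated as 365.25 days per year, detectors are assumed to be numbered 0 to 999, and the helper name is made up.

```python
import math

DAYS = math.ceil(50 * 365.25)  # about 50 years of nights, leap days included
SECONDS_PER_DAY = 86400        # upper bound on exposures per day
DETECTORS = 1000               # detector numbers 0..999


def bits_needed(n_values: int) -> int:
    """Bits required to represent n_values distinct values (0 .. n_values - 1)."""
    return (n_values - 1).bit_length()


# Day, second and detector combined as one mixed-radix integer.
composite_bits = bits_needed(DAYS * SECONDS_PER_DAY * DETECTORS)

# Same three fields, each given its own fixed-width bit field.
fixed_field_bits = (
    bits_needed(DAYS) + bits_needed(SECONDS_PER_DAY) + bits_needed(DETECTORS)
)

# A plain running count of exposures, at the maximum rate for 50 years.
sequential_bits = bits_needed(DAYS * SECONDS_PER_DAY)

print(composite_bits, fixed_field_bits, sequential_bits)
```

Under these assumptions the mixed-radix composite needs 41 bits, separate bit fields would need 42 (15 + 17 + 10), and the running count needs 31, which is comfortably inside a 32-bit integer.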
After some outside discussion on Slack with @ctslater, @jbosch, and @ktl it seems like we are edging towards:
- Source IDs need 20 bits to reflect the number of sources that can be found on a single CCD.
- We may want to consider relaxing the requirement that source IDs must include the data release number (which would use 5 bits to cover us for 32 releases).
- The ID relating the source on this detector to the exposure ID can therefore use 44 bits.
- We do not necessarily require that the detector_exposure_id described here has to be integrated as-is into the source ID. In particular, decoupling the two means that we can have more than 44 bits in the detector_exposure_id and use a human-readable version of that and exposure_id, and use @ktl’s proposed hex-packed version (which covers 256 years if you drop data releases) for Source IDs themselves. The butler registry should have enough information in it to know what is required for uniqueness.
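The bit budget in the list above (20 bits for the per-CCD source number, leaving 44 bits of a 64-bit ID once the data release field is dropped) could be sketched as follows. The field order, the function names, and the use of a plain shift are assumptions for illustration; this is not @ktl's hex-packed proposal, whose layout is not spelled out here.

```python
SOURCE_BITS = 20    # up to 2**20 = 1,048,576 sources per CCD
DETEXP_BITS = 44    # what is left of 64 bits without a data release field
SOURCE_MASK = (1 << SOURCE_BITS) - 1


def make_source_id(packed_detector_exposure: int, source_number: int) -> int:
    """Combine a (compact, at most 44-bit) detector-exposure key with a source number."""
    if not 0 <= packed_detector_exposure < (1 << DETEXP_BITS):
        raise ValueError("detector-exposure key does not fit in 44 bits")
    if not 0 <= source_number < (1 << SOURCE_BITS):
        raise ValueError("source number does not fit in 20 bits")
    return (packed_detector_exposure << SOURCE_BITS) | source_number


def split_source_id(source_id: int) -> tuple[int, int]:
    """Recover (detector-exposure key, source number); a UDF could do the same shift."""
    return source_id >> SOURCE_BITS, source_id & SOURCE_MASK
```

Extracting the detector-exposure part is then a single right shift, which is the kind of operation a simple database UDF could provide.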
This all means that I think we can proceed with things more or less as they currently are: retain YYYYMMDD but drop the extra 0 from the detector number and allow a maximum of 99,999 exposures per day.
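For illustration, a human-readable decimal ID along these lines might be built and parsed as below. The digit layout (YYYYMMDD, then a 5-digit daily sequence number, then a 3-digit detector number) and the function names are assumptions; the thread does not fix the exact field order.

```python
from datetime import date


def make_detector_exposure_id(day_obs: date, seq_num: int, detector: int) -> int:
    """Build a decimal ID that reads as YYYYMMDD + NNNNN + DDD."""
    if not 0 <= seq_num <= 99_999:
        raise ValueError("at most 99,999 exposures per day")
    if not 0 <= detector <= 999:
        raise ValueError("detector number must fit in 3 digits")
    yyyymmdd = day_obs.year * 10_000 + day_obs.month * 100 + day_obs.day
    exposure_id = yyyymmdd * 100_000 + seq_num
    return exposure_id * 1_000 + detector


def parse_detector_exposure_id(det_exp_id: int) -> tuple[date, int, int]:
    """Split the decimal ID back into (date, sequence number, detector)."""
    exposure_id, detector = divmod(det_exp_id, 1_000)
    yyyymmdd, seq_num = divmod(exposure_id, 100_000)
    year, mmdd = divmod(yyyymmdd, 10_000)
    month, day = divmod(mmdd, 100)
    return date(year, month, day), seq_num, detector


example = make_detector_exposure_id(date(2026, 3, 14), 42, 189)
print(example)               # 2026031400042189
print(example.bit_length())  # 51
```

An ID of this form can be read directly off an error report, at the cost of needing about 51 bits rather than 44, which is why it helps to decouple it from the Source ID packing.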
Some were present when the FITS convention was set. Originally called RECID, a string, this was paired with an integer RECNO. The reason I replied is that you appear to be trying to combine the semantics of both. Does the solution described in the subsequent message meet requirements? Are there requirements broader than LSST? OBSID at NOAO, where it originated, included fields for telescope and optionally instrument.