Are ObjectIDs ordered by tract?

mwv · May 27, 2021, 12:17pm

Are ObjectIDs ordered across tracts in some predictable way?

Or, more precisely, are the ObjectIDs from a given tract drawn from a contiguous block such that no blocks overlap across tracts?

I would like to use Dask to analyze the 166 tracts from DESC DC2. There is a natural partitioning by tract for the original Parquet file access. But then I would like to use objectID as my DataFrame index. Performance would be much better if I can avoid having to do a shuffle when I set the index. Can I make assumptions that objectIDs are assigned in some ordered way across tracts?

I don’t think I need there to be any particular relationship between the actual value of the tract number and the ObjectID. I just need each tract to have a block of ObjectIDs, with no ObjectIDs from other tracts falling in the range of that block. If I can guarantee that, then setting the index shouldn’t be painful. I may have to work a little harder to convince Dask that this is the case, but I think it’s quite possible.
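To make the condition above concrete, here is a minimal sketch in plain Python of how one might check it: compute the (min, max) ObjectID per tract, sort the ranges, and confirm that none overlap. The helper names (`tract_id_ranges`, `ranges_are_disjoint`, `divisions_from_ranges`) and the toy tract numbers and IDs are made up for illustration; they are not part of any DESC or LSST tooling.

```python
from typing import Dict, Iterable, List, Tuple

Range = Tuple[int, int, int]  # (min_id, max_id, tract)


def tract_id_ranges(ids_by_tract: Dict[int, Iterable[int]]) -> List[Range]:
    """Return (min_id, max_id, tract) for each non-empty tract, sorted by min_id."""
    ranges = []
    for tract, ids in ids_by_tract.items():
        ids = list(ids)
        if not ids:
            continue  # an empty tract contributes no range
        ranges.append((min(ids), max(ids), tract))
    return sorted(ranges)


def ranges_are_disjoint(ranges: List[Range]) -> bool:
    """True if each range ends strictly before the next one starts."""
    return all(prev[1] < nxt[0] for prev, nxt in zip(ranges, ranges[1:]))


def divisions_from_ranges(ranges: List[Range]) -> List[int]:
    """Lower bound of every partition, plus the upper bound of the last one."""
    return [lo for lo, _, _ in ranges] + [ranges[-1][1]]


# Toy data (hypothetical values): tract number -> ObjectIDs found in that tract.
toy = {
    3000: [205, 201, 210],
    3001: [105, 100, 199],
    3002: [300, 350],
}

ranges = tract_id_ranges(toy)
disjoint = ranges_are_disjoint(ranges)
divisions = divisions_from_ranges(ranges)
print(ranges, disjoint, divisions)
```

If the check passes, the list returned by `divisions_from_ranges` has the shape Dask expects for divisions (one lower bound per partition plus a final upper bound). Assuming the partitions are then arranged in that order and each one is sorted by ObjectID, it should be possible to pass `sorted=True` and `divisions=...` to Dask's `set_index` and skip the shuffle.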

jbosch · May 27, 2021, 12:44pm

Object IDs do indeed come from separate, non-overlapping per-tract ranges of integers. Within a tract, the same is true of patches.