Thanks for looking at this. It’s really great. Some comments:
- What does `create` mean for ECHO? Are you using a local SQLite file but an ECHO datastore?
- What is being imported in `ingest-raws`: are you ingesting from local disk to ECHO, or ingesting from the raw files in the ECHO bucket? If ingesting from local disk then there will be the transfer overhead (and I haven’t parallelized the S3 copies). If ingesting from the bucket, each file has to be downloaded so that its header can be read to extract the registry information. This can be sped up considerably by writing index files for the raw files (`astrometadata write-index`) and putting those in the bucket as well. The `ingest-raws` command will then download the small JSON files rather than the huge FITS files.
- `define-visits` is entirely registry based, so I would not expect any difference between the two.
- `write-curated-calibrations` reads lots of ECSV/YAML files into memory and then writes out transformed FITS files. This will write each file locally and then transfer it to the bucket, so I assume that’s where the slowdown is coming from. It does not attempt to batch up those transfers (`datastore.put` can’t be given multiple
- `make-discrete-skymap` is going to be slower because currently we do not cache the file locally when using S3. This means that the get of `calexp.wcs` downloads the full file, reads the WCS, and then deletes the local file, and then `calexp.bbox` does the same thing. You could enable caching for `calexp` by changing the configuration file for datastore, but I haven’t enabled it because I don’t have cache expiry. One thing I would be very interested in is how your timing changes with ECHO if you use composite disassembly: in that datastore mode I wrote components out as separate files such that `calexp.wcs` only downloads the WCS part. You can see how to do this by looking at pipelines_check.
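
To make the download cost in the last point concrete, here is a toy sketch in plain Python. It is not butler code: `FakeBucket`, `CachedReader`, the `calexp.<component>` key naming, and the payload sizes are all made up for illustration, and pickle stands in for the real file format. It just counts bytes transferred when reading `wcs` and `bbox` under the three approaches described above (no cache, local cache, disassembled components).

```python
import pickle


class FakeBucket:
    """Hypothetical stand-in for an object store that counts downloaded bytes."""

    def __init__(self):
        self.objects = {}
        self.bytes_downloaded = 0

    def put(self, key, payload):
        self.objects[key] = payload

    def get(self, key):
        payload = self.objects[key]
        self.bytes_downloaded += len(payload)
        return payload


# Made-up component sizes: the pixel data dominates the file.
COMPONENTS = {"wcs": b"w" * 100, "bbox": b"b" * 10, "image": b"i" * 10_000}


def read_whole_uncached(bucket, component):
    """Download the full file on every component read, then discard it."""
    return pickle.loads(bucket.get("calexp"))[component]


class CachedReader:
    """Download the full file once and reuse it (no expiry, as in the post)."""

    def __init__(self, bucket):
        self.bucket = bucket
        self._cache = {}

    def read(self, key, component):
        if key not in self._cache:
            self._cache[key] = self.bucket.get(key)
        return pickle.loads(self._cache[key])[component]


def read_disassembled(bucket, component):
    """Each component is its own object, so only that object is downloaded."""
    return bucket.get(f"calexp.{component}")


def measure():
    """Return bytes downloaded when reading wcs and bbox under each approach."""
    wanted = ("wcs", "bbox")

    whole = FakeBucket()
    whole.put("calexp", pickle.dumps(COMPONENTS))
    for name in wanted:
        read_whole_uncached(whole, name)

    cached = FakeBucket()
    cached.put("calexp", pickle.dumps(COMPONENTS))
    reader = CachedReader(cached)
    for name in wanted:
        reader.read("calexp", name)

    split = FakeBucket()
    for name, payload in COMPONENTS.items():
        split.put(f"calexp.{name}", payload)
    for name in wanted:
        read_disassembled(split, name)

    return {
        "uncached": whole.bytes_downloaded,
        "cached": cached.bytes_downloaded,
        "disassembled": split.bytes_downloaded,
    }


if __name__ == "__main__":
    print(measure())
```

In this toy model the uncached case transfers the full file twice, the cached case once, and the disassembled case only the two small components, which is why I would expect disassembly to help most when only `wcs` and `bbox` are needed.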