How to mock data from a repo?

Is there a straightforward way to create a butler that points to in-memory mocked data? It would be very useful for testing code that retrieves data from a butler.

If this is something that does not yet exist, I would like to request it for the new butler.

I can see two desired modes:

  • Populate some data in advance
  • Allow a callback function to retrieve other data (e.g. large images) when it is requested, so that code can generate it on the fly to avoid running out of memory
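The two modes above could be combined in one object. Here is a minimal sketch, assuming only that the mock needs a butler-like `get(datasetType, dataId)` interface; `MockButler`, its argument names, and the callback signature are all hypothetical, not part of any real butler API.

```python
class MockButler:
    """In-memory stand-in for a butler.

    Serves pre-populated datasets first, then falls back to a user-supplied
    callback that generates data on the fly (e.g. large images), so nothing
    big is held in memory until it is actually requested.
    """

    def __init__(self, datasets=None, callback=None):
        # datasets maps (datasetType, frozen dataId items) -> python object
        self._datasets = dict(datasets or {})
        self._callback = callback

    def _key(self, datasetType, dataId):
        return (datasetType, frozenset(dataId.items()))

    def get(self, datasetType, dataId):
        key = self._key(datasetType, dataId)
        if key in self._datasets:
            return self._datasets[key]
        if self._callback is not None:
            return self._callback(datasetType, dataId)
        raise LookupError(f"no mock data for {datasetType} {dataId}")

    def put(self, obj, datasetType, dataId):
        self._datasets[self._key(datasetType, dataId)] = obj
```

Tests would then construct a `MockButler` with a few canned datasets plus a callback for anything generated on demand.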

You can create your own Mapper subclass that uses bypass_* functions to generate and return data objects. Is that sufficient?
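To illustrate the `bypass_*` idea without any of the real machinery: the butler looks for a method named `bypass_<datasetType>` on the mapper and, if present, uses its return value directly instead of unpersisting anything. This toy sketch shows only that dispatch pattern; it is not the real `lsst.daf.persistence` API, and every class and method name here (other than the `bypass_` prefix convention) is invented.

```python
class ToyMapper:
    """Stand-in mapper whose bypass_* methods fabricate data in memory."""

    def bypass_fakeExposure(self, datasetType, dataId):
        # Generate and return the data object directly; nothing is read
        # from disk or from a registry.
        return {"visit": dataId["visit"], "pixels": [[0.0] * 4 for _ in range(4)]}


class ToyButler:
    """Minimal butler-like dispatcher for the bypass_* convention."""

    def __init__(self, mapper):
        self._mapper = mapper

    def get(self, datasetType, dataId):
        bypass = getattr(self._mapper, "bypass_" + datasetType, None)
        if bypass is not None:
            return bypass(datasetType, dataId)
        raise LookupError(f"no bypass_{datasetType} method on mapper")
```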

I don’t know if it’s enough for what you want to do, but mocking ButlerDataRef instead may be worth considering.

Making a Mapper subclass from scratch sounds a bit scary, especially since we want to create the butler without an on-disk repository and associated database file.

Mocking data references is a start, and what I would do now. It’s just a bit clumsy if the code uses much of the power of data references.
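For simple cases, mocking a data reference needs nothing beyond the standard library. The sketch below assumes the calling code only touches `get` and `dataId` on the data reference; richer uses of data references (sub-references, `put`, `datasetExists`, etc.) are where this approach gets clumsy.

```python
from unittest import mock

# Canned datasets keyed by dataset type name.
canned = {"calexp": "fake exposure", "src": ["fake", "sources"]}

# Build a mock that answers get() from the canned dict.
dataRef = mock.Mock()
dataRef.dataId = {"visit": 1, "ccd": 2}
dataRef.get.side_effect = lambda datasetType, **kwargs: canned[datasetType]


def process(dataRef):
    """Hypothetical code under test: reads its input via the data reference."""
    return dataRef.get("calexp")
```

The mock also records calls, so a test can assert which dataset types were requested.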

If your Mapper subclass doesn’t need to be a CameraMapper, it should be much easier to implement. Even a CameraMapper subclass doesn’t need a registry or any persistent storage (although you might have to build a paf string into the code).

That sounds fairly promising, except the paf string. I hope this will be easier with the new butler.

It won’t be paf, but I’m pretty sure there will still be a need to specify some configuration information for each dataset type for the CameraMapper. If that information does not come from a repository (where it would have been recorded from a dynamic output dataset type definition/prototype), it will have to be specified manually.

Again, bare Mapper subclasses shouldn’t need any of this.

I am hoping that the new butler + camera mapper system will define standard types of dataset types (preferably replacing “dataset type” with a term that does not include the word “type”), such as “exposure”, “source catalog”, or “config”. At that point some code would need to tie dataset type names to those types: the code that persists and unpersists, or perhaps defaults specified somewhere (e.g. “calexp is an exposure”). With such a system the amount of configuration required would be minimal and easy to manage in mock code.
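The “calexp is an exposure” idea could be as small as a default mapping from dataset type names to standard kinds, with per-repo overrides; mock code would then only configure the entries it actually uses. Everything in this sketch is invented for illustration.

```python
# Hypothetical defaults tying dataset type names to standard kinds.
DATASET_KINDS = {
    "calexp": "exposure",
    "src": "source_catalog",
    "processCcd_config": "config",
}


def kind_of(datasetTypeName, overrides=None):
    """Look up a dataset type's standard kind, allowing local overrides."""
    if overrides and datasetTypeName in overrides:
        return overrides[datasetTypeName]
    return DATASET_KINDS[datasetTypeName]
```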

Yes, your “standard types of dataset types” are the genres/prototypes, which contain defaults for things like serialization. Tying them to actual dataset types is the responsibility of the dynamic dataset type definition interface. Whether this will be sufficiently minimal for mock code is yet to be determined.