How to mock data from a repo?

rowen · April 26, 2016, 10:57pm

Is there a straightforward way to create a butler that points to in-memory mock data? It would be very useful for testing code that retrieves data from a butler.

If this is something that does not yet exist, I would like to request it for the new butler.

I can see two desired modes:

Populate some data in advance
Allow a callback function to retrieve other data (e.g. large images) when it is requested, so that code can generate it on the fly to avoid running out of memory
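The two modes can be sketched without any LSST code at all. The class below is hypothetical (it is not part of the butler API), and its `get`/`put` call shape is only loosely modeled on the butler's dataset-type-plus-data-ID style. It serves preloaded objects first and falls back to an optional callback for anything else. Generated objects are deliberately not cached, so large images can be rebuilt on demand rather than held in memory.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical callback type: (datasetType, dataId) -> object
Factory = Callable[[str, Dict[str, Any]], Any]


class InMemoryButler:
    """Hypothetical in-memory butler stand-in for tests (not an LSST class).

    Mode 1: objects stored in advance with put().
    Mode 2: an optional factory callback builds anything not stored.
    """

    def __init__(self, factory: Optional[Factory] = None) -> None:
        self._store: Dict[Tuple[str, frozenset], Any] = {}
        self._factory = factory

    @staticmethod
    def _key(datasetType: str, dataId: Dict[str, Any]) -> Tuple[str, frozenset]:
        # Data IDs are dicts; freeze them so they can serve as dict keys.
        return (datasetType, frozenset(dataId.items()))

    def put(self, obj: Any, datasetType: str,
            dataId: Optional[Dict[str, Any]] = None, **rest: Any) -> None:
        merged = dict(dataId or {}, **rest)
        self._store[self._key(datasetType, merged)] = obj

    def get(self, datasetType: str,
            dataId: Optional[Dict[str, Any]] = None, **rest: Any) -> Any:
        merged = dict(dataId or {}, **rest)
        key = self._key(datasetType, merged)
        if key in self._store:
            return self._store[key]
        if self._factory is not None:
            # Not cached: the object is regenerated on every request.
            return self._factory(datasetType, merged)
        raise KeyError(f"no {datasetType!r} for data ID {merged}")


def make_fake_image(datasetType: str, dataId: Dict[str, Any]) -> list:
    # Hypothetical generator: a small zero-filled "image".
    return [[0.0] * 4 for _ in range(4)]


butler = InMemoryButler(factory=make_fake_image)
butler.put({"zeroPoint": 27.0}, "calexp", visit=1, ccd=2)
stored = butler.get("calexp", {"visit": 1, "ccd": 2})  # the dict stored above
generated = butler.get("raw", visit=1, ccd=2)          # built by make_fake_image
```

The dataset type names here are only placeholders; nothing in the class knows what "calexp" or "raw" means.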

ktl · April 26, 2016, 11:23pm

You can create your own Mapper subclass that uses bypass_* functions to generate and return data objects. Is that sufficient?
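A rough illustration of the bypass idea follows, under stated assumptions. The real `lsst.daf.persistence` Butler and Mapper are not imported, and the real `bypass_*` hooks may receive different arguments than the `(datasetType, dataId)` pair assumed here. `MockMapper`, `BypassButler`, and the placeholder return values are all hypothetical. The sketch shows only the name-based dispatch: a request for dataset type `src` is served by a method named `bypass_src`, with no repository or registry involved.

```python
from typing import Any, Dict, Optional


class MockMapper:
    """Hypothetical mapper: each bypass_<datasetType> method builds the
    object directly, so no repository, registry, or files are needed."""

    def bypass_calexp(self, datasetType: str, dataId: Dict[str, Any]) -> Dict[str, Any]:
        # Placeholder "exposure": a dict holding a tiny zero-filled image.
        return {"visit": dataId["visit"], "image": [[0.0] * 3 for _ in range(3)]}

    def bypass_src(self, datasetType: str, dataId: Dict[str, Any]) -> list:
        # Placeholder "source catalog": three fake sources.
        return [{"id": i, "visit": dataId["visit"]} for i in range(3)]


class BypassButler:
    """Toy butler that only dispatches to the mapper's bypass_* methods."""

    def __init__(self, mapper: Any) -> None:
        self.mapper = mapper

    def get(self, datasetType: str,
            dataId: Optional[Dict[str, Any]] = None, **rest: Any) -> Any:
        merged = dict(dataId or {}, **rest)
        bypass = getattr(self.mapper, "bypass_" + datasetType, None)
        if bypass is None:
            raise KeyError(f"{type(self.mapper).__name__} has no bypass_{datasetType}")
        return bypass(datasetType, merged)


butler = BypassButler(MockMapper())
catalog = butler.get("src", visit=42)  # three fake sources for visit 42
```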

smonkewitz · April 26, 2016, 11:45pm

I don’t know if it’s enough for what you want to do, but mocking ButlerDataRef instead may be worth considering.
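One way to mock a data reference is Python's standard `unittest.mock`. The helper below is hypothetical and assumes that the code under test only reads `dataId` and calls `get()` and `datasetExists()`. Passing `spec=` makes access to any other attribute raise `AttributeError` instead of silently returning another `Mock`, and the mock also records calls so a test can check what was requested.

```python
from typing import Any, Dict
from unittest import mock


def make_mock_dataref(dataId: Dict[str, Any], datasets: Dict[str, Any]) -> mock.Mock:
    """Hypothetical helper: a Mock standing in for a ButlerDataRef.

    Only dataId, get() and datasetExists() are provided; spec= makes any
    other attribute access fail loudly.
    """
    ref = mock.Mock(spec=["dataId", "get", "datasetExists"], name="MockDataRef")
    ref.dataId = dict(dataId)

    def fake_get(datasetType: str, **rest: Any) -> Any:
        if datasetType not in datasets:
            raise KeyError(f"mock data ref has no {datasetType!r}")
        return datasets[datasetType]

    ref.get.side_effect = fake_get
    ref.datasetExists.side_effect = lambda datasetType, **rest: datasetType in datasets
    return ref


# Hypothetical function under test.
def count_sources(dataRef: Any) -> int:
    return len(dataRef.get("src"))


ref = make_mock_dataref({"visit": 1, "ccd": 2}, {"src": [10, 11, 12]})
print(count_sources(ref))  # prints 3
ref.get.assert_called_once_with("src")
```

The drawback raised later in the thread shows up directly here: every data reference method the code under test touches has to be faked by hand.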

rowen · April 26, 2016, 11:55pm

Making a Mapper subclass from scratch sounds a bit scary, especially since we want to create the butler without an on-disk repository and associated database file.

Mocking data references is a start, and it is what I would do now. It’s just a bit clumsy if the code relies on many features of data references.

ktl · April 26, 2016, 11:58pm

If your Mapper subclass doesn’t need to be a CameraMapper, it should be much easier to implement. Even a CameraMapper subclass doesn’t need a registry or any persistent storage (although you might have to build a paf string into the code).

rowen · April 27, 2016, 12:00am

That sounds fairly promising, except for the paf string. I hope this will be easier with the new butler.

ktl · April 27, 2016, 12:05am

It won’t be paf, but I’m pretty sure there will still be a need to specify some configuration information for each dataset type for the CameraMapper. If that information is not coming from a repository (in which it was recorded from a dynamic output dataset type definition/prototype), it will have to be specified manually.

Again, bare Mapper subclasses shouldn’t need any of this.

rowen · April 27, 2016, 4:12pm

I am hoping that the new butler + camera mapper system will define standard types of dataset types (preferably replacing “dataset type” with a term that does not include the word “type”), such as “exposure”, “source catalog”, or “config”. At that point, some code needs to tie the names of dataset types to those types: either the code that persists and unpersists data, or defaults specified somewhere (e.g. “calexp is an exposure”). With such a system, the amount of configuration required would be minimal and easy to manage in mock code.

ktl · April 27, 2016, 6:19pm

Yes, your “standard types of dataset types” are the genres/prototypes, which contain defaults for things like serialization. Tying them to actual dataset types is the responsibility of the dynamic dataset type definition interface. Whether this will be sufficiently minimal for mock code is yet to be determined.
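To make “calexp is an exposure” concrete, here is a sketch of how genre defaults might combine with a one-line-per-dataset-type mapping. Every name in it (`Genre`, `GENRES`, `DATASET_GENRES`, `definition_for`) and every default value (the Python type names and storage formats) is hypothetical; the thread leaves the actual dynamic dataset type definition interface undecided.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Genre:
    """Hypothetical genre/prototype: defaults shared by many dataset types."""
    name: str
    pythonType: str  # name of the in-memory class (placeholder strings)
    storage: str     # default serialization (assumed values, for illustration)


GENRES: Dict[str, Genre] = {
    g.name: g
    for g in (
        Genre("exposure", "Exposure", "fits"),
        Genre("source catalog", "SourceCatalog", "fits"),
        Genre("config", "Config", "pickle"),
    )
}

# The only per-dataset-type configuration mock code would need to supply.
DATASET_GENRES: Dict[str, str] = {
    "calexp": "exposure",
    "src": "source catalog",
    "processCcd_config": "config",
}


def definition_for(datasetType: str, **overrides: Any) -> Dict[str, Any]:
    """Combine the genre defaults with any per-dataset-type overrides."""
    genre = GENRES[DATASET_GENRES[datasetType]]
    definition = {
        "datasetType": datasetType,
        "genre": genre.name,
        "pythonType": genre.pythonType,
        "storage": genre.storage,
    }
    definition.update(overrides)
    return definition


print(definition_for("calexp")["genre"])                   # prints exposure
print(definition_for("src", storage="memory")["storage"])  # prints memory
```

Whether a real interface would be this minimal is exactly the open question in the reply above; the sketch only shows the shape such a mapping could take.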