I’ve just merged DM-13854, which introduces a new class, lsst::utils::Cache, that I hope may be of use beyond this one ticket. It’s essentially a dict that keeps only the N most-recently accessed elements. It has the usual add, operator[] and contains methods you might expect, but it also has a handy operator() which takes a key and a callable that can generate a value given the key, e.g., using a lambda:
value = cache(key, [extra1, extra2](Key const& key) { return expensiveCalculation(key, extra1, extra2); });
(See also here.) The idea is that the function to create the value, which is presumably expensive, only fires if the appropriate value is not available in the cache. This makes it very simple to modify operational code to use the cache.
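To make that concrete, here is a minimal sketch of what converting a piece of operational code to use the cache might look like. Only the methods described above come from the class itself; the header path lsst/utils/Cache.h, the constructor argument setting the maximum number of cached elements, and the Worker/expensiveCalculation names are my assumptions for illustration.

#include <string>
#include "lsst/utils/Cache.h"  // assumed header location for lsst::utils::Cache

// Stand-in for some expensive operation (hypothetical).
std::string expensiveCalculation(int key, double extra) {
    return std::to_string(key * extra);
}

class Worker {
public:
    // The constructor argument is assumed to set the maximum number of
    // cached elements (the "N" described above).
    explicit Worker(double extra) : _extra(extra), _cache(100) {}

    std::string get(int key) {
        // The lambda only runs on a cache miss; a hit returns the stored value.
        return _cache(key, [this](int const& k) { return expensiveCalculation(k, _extra); });
    }

private:
    double _extra;
    lsst::utils::Cache<int, std::string> _cache;
};

The point is that the call site barely changes: the expensive call is just wrapped in the lambda passed to operator().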
There are also python bindings, and the __call__ method in python works just the same as the operator() in C++ (i.e., the cache, though written in C++, can run the python code via pybind11), e.g.,
value = cache(key, lambda key: expensiveCalculation(key, extra1, extra2))
In order for this to work, your Key and Value objects must be copyable, and Key objects have to be hashable and comparable. Here’s how to do the latter parts:
#include "lsst/utils/hashCombine.h"
namespace myNamespace {
struct MyKey {
double x;
int y;
// Compare equal
bool operator==(MyKey const& other) const {
return x == other.x && y == other.y;
}
// Streaming; not necessary except for debugging
friend std::ostream & operator<<(std::ostream & os, MyKey const& key) {
return os << key.x << "," << key.y;
}
};
} // namespace myNamespace
namespace std {
// Define hash of MyKey
template <>
struct hash<myNamespace::MyKey> {
std::size_t operator()(myNamespace::MyKey const& key) const {
std::size_t seed = 0;
return lsst::utils::hashCombine(seed, key.x, key.y);
}
};
} // namespace std
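With equality and std::hash defined, MyKey can be used directly as the cache’s key type. A brief sketch (again, the header path and the constructor argument setting the capacity are my assumptions; contains and operator() are as described above):

#include "lsst/utils/Cache.h"  // assumed header location

int main() {
    // Assumes the constructor argument sets the maximum number of elements.
    lsst::utils::Cache<myNamespace::MyKey, double> cache(10);

    myNamespace::MyKey key{1.5, 2};
    // The first call evaluates the lambda; later calls with an equal key do not.
    double value = cache(key, [](myNamespace::MyKey const& k) { return k.x * k.y; });
    bool cached = cache.contains(key);  // true after the call above
    return (cached && value == 3.0) ? 0 : 1;
}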
There’s also a debugging utility: if you #define LSST_CACHE_DEBUG and rebuild, the cache code is instrumented to record all cache queries once you call the enableDebugging method to activate it. The list of queries is dumped on destruction of the cache to lsst-cache-<className>-<unique id>.dat. There’s a simple python script, simulateCache.py, that will allow you to simulate the cache and report its performance with different cache sizes.
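For completeness, here is roughly how I would expect the debugging hooks to be wired up. The LSST_CACHE_DEBUG macro and the enableDebugging method are from the description above, but whether defining the macro before including the header in your own code is sufficient, or whether the package itself needs rebuilding with it set, is an assumption to check against the header.

#define LSST_CACHE_DEBUG  // assumed to take effect if defined before including the header
#include "lsst/utils/Cache.h"  // assumed header location

int main() {
    lsst::utils::Cache<int, double> cache(100);
    cache.enableDebugging();  // start recording queries
    for (int i = 0; i < 1000; ++i) {
        cache(i % 150, [](int const& k) { return k * 2.0; });
    }
    // On destruction the recorded queries are dumped to
    // lsst-cache-<className>-<unique id>.dat, which can then be fed to
    // simulateCache.py to compare performance at different cache sizes.
    return 0;
}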