macOS Ventura and the Science Pipelines

If you are using macOS Ventura, or considering updating to it, be aware that there seems to be a bug in this release's dynamic library loader that causes difficulties when building the LSST Science Pipelines. The problem occurs on both Intel and Apple Silicon computers.

The Science Pipelines software itself runs, but on rare occasions you may get a crash on startup. The problem is exacerbated during builds, where the code is imported many times, sometimes in parallel because our tests use multiprocessing.

The problem manifests itself in three ways during a build:

  • The “…” feedback never ends for a particular package.
  • A test failure with a traceback that may include dyld4.
  • Worker processes dying such that no tests are run.

In all these cases, restarting the build will likely get everything to succeed eventually. It can sometimes help to reduce the number of parallel processes (e.g., by setting EUPSPKG_NJOBS). To repeat, the software can be used, and if you install prebuilt binaries you will likely not encounter a problem.
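The restart-and-reduce-parallelism advice can be automated. What follows is a minimal sketch in Python, assuming the build is started by a single command. The function name `run_with_retries`, the default attempt count, the default job count, and the optional timeout are all illustrative choices and not part of the Science Pipelines tooling; only the EUPSPKG_NJOBS variable comes from the advice above.

```python
import os
import subprocess
import sys


def run_with_retries(command, max_attempts=3, njobs=2, timeout=None):
    """Run a build command, retrying on failure with EUPSPKG_NJOBS set.

    Returns the 1-based number of the attempt that succeeded,
    or 0 if every attempt failed or timed out.
    """
    env = dict(os.environ)
    # Limit build parallelism. The default of 2 is an arbitrary example value.
    env["EUPSPKG_NJOBS"] = str(njobs)

    for attempt in range(1, max_attempts + 1):
        try:
            # timeout (in seconds) lets a build that never finishes be
            # treated as a failure; None means wait indefinitely.
            result = subprocess.run(command, env=env, timeout=timeout)
        except subprocess.TimeoutExpired:
            print(f"Attempt {attempt} timed out", file=sys.stderr)
            continue
        if result.returncode == 0:
            return attempt
        print(
            f"Attempt {attempt} failed with exit code {result.returncode}",
            file=sys.stderr,
        )
    return 0
```

Pass your usual build command as a list of arguments, for example `run_with_retries(["my-build-command", "arg"])`, where `my-build-command` is a placeholder. A hung build is only detected if you supply a `timeout`, and a sensible value depends on how long your packages normally take to build.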

We have filed feedback with Apple (thank you @erykoff), but we have no idea whether Apple will ever fix this or even consider it a problem. We are working on mitigation strategies, but much of this is outside of our control.

Technical Details

We are discussing this problem in DM-37301.

In macOS Ventura, Apple rewrote the dynamic library loader (dyld4). The problem seems to be related to loading many shared libraries in a very short period of time. For example, running the tests in drp_pipe results in more than 800 libraries being loaded, and when parallel testing processes are involved this can easily trigger the problem (so drp_pipe is one package where you may need to reduce the number of processes to get it to build).
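To get a rough feel for how many shared libraries an import pulls in, you can inspect `sys.modules` before and after. This is only a sketch: it counts Python extension modules and misses any further libraries those modules link against, so it gives a lower bound rather than the totals quoted above. The helper names are made up for this example.

```python
import importlib
import importlib.machinery
import sys


def loaded_extension_modules():
    """Return sorted names of imported modules backed by a shared library."""
    # Platform-specific suffixes for compiled extension modules.
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    names = []
    for name, module in list(sys.modules.items()):
        path = getattr(module, "__file__", None)
        if path and path.endswith(suffixes):
            names.append(name)
    return sorted(names)


def extension_modules_added_by(module_name):
    """Import a module and return the extension modules it newly loaded."""
    before = set(loaded_extension_modules())
    importlib.import_module(module_name)
    return sorted(set(loaded_extension_modules()) - before)
```

For example, `len(extension_modules_added_by("scipy.stats"))` in a fresh interpreter reports how many compiled modules that one import loaded. The number depends on the installed versions and on what was already imported.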

Our code accounts for about 200 shared libraries, and we are working on reducing that number, which should reduce the likelihood of crashes. The remaining 600 are always going to be a problem unless we can convince the developers of packages such as scipy, skimage, and sklearn (200 libraries just for those) to start combining libraries. We do sometimes see crashes in daf_butler testing, and that package does not use any of our shared libraries.
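If you want to see how many shared libraries a given package ships in your own environment, a sketch like the following counts them on disk without importing the package. The function names and the choice of file suffixes (`.so` and `.dylib`) are assumptions made for this example. The count covers files that are installed, which can be more than what a particular test run actually loads.

```python
import importlib.util
from pathlib import Path

# File suffixes treated as shared libraries (an assumption for this sketch).
SHARED_LIB_SUFFIXES = (".so", ".dylib")


def count_shared_libraries(root):
    """Count shared library files under a directory, skipping symlinks."""
    return sum(
        1
        for path in Path(root).rglob("*")
        if path.is_file()
        and not path.is_symlink()
        and path.name.endswith(SHARED_LIB_SUFFIXES)
    )


def count_for_package(package_name):
    """Count shared libraries shipped inside an installed top-level package.

    Returns None if the package is not installed or is a single-file module.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return sum(
        count_shared_libraries(location)
        for location in spec.submodule_search_locations
    )
```

Calling `count_for_package` for each of `"scipy"`, `"skimage"`, and `"sklearn"` gives per-package figures for your installation. These vary between versions and platforms.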

One silver lining is that reducing the number of shared libraries in our code does reduce the package import times.