I am currently migrating a workflow from the RSP Jupyter portal to batch processing on NERSC. I have run into the issue that the TAP service has rate limits, so my workflow (which accesses various tables: Object, Visit, etc.), multiplied across N jobs, immediately gets errors for overloading TAP. I had understood that the reason for putting the Rubin data and the compute right next to each other (i.e. at the RSP and at NERSC) was to mitigate the cost of transferring data over the internet and to avoid exactly these kinds of strict rate limits.
Should I rewrite everything to work with the Butler? Does that behave better, since it accesses the data directly on NERSC rather than going through a portal? I have looked into it, and the Butler seems much less efficient than TAP: you often have to download whole columns and do the selection client side rather than server side. Another point, beyond the inefficiency of loading extra data into memory, is that reproducing TAP-style ADQL queries with the Butler requires much more code. I understand the Butler has access to the tables/catalogues, so are there plans to add ADQL queries as an option?
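For concreteness, here is roughly the shape of the two approaches as I understand them. The dataset type `objectTable_tract`, the data ID values, and the column names are taken from the DP0.2 tutorials and may not match every repo, so treat them as an illustration rather than working code:

```python
# The cut I want, expressed once as ADQL: the TAP server does the
# selection and only matching rows come back.
ADQL = """\
SELECT objectId, coord_ra, coord_dec, r_psfFlux
FROM dp02_dc2_catalogs.Object
WHERE r_psfFlux > 1e5
"""

def via_tap(service):
    # One call; only the selected rows and columns cross the wire.
    return service.search(ADQL).to_table()

def via_butler(butler):
    # The whole per-tract object table lands in memory first (as a
    # pandas DataFrame), then the same cut happens client side.
    obj = butler.get("objectTable_tract", tract=3828, skymap="DC2")
    return obj[obj["r_psfFlux"] > 1e5]
```

The second version also has to be repeated per tract (and the results concatenated) to cover the same sky area that a single ADQL query handles, which is where the extra code piles up.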
As a final note, it was very jarring when migrating from the RSP to NERSC (and presumably to any other IDAC) to find that the lsst stack is different on the RSP from anywhere else. I loaded the exact same version on the RSP and on NERSC (v29.1.1), yet I was unable to do `from lsst.rsp import get_tap_service`. If there are going to be RSP-specific tools, I think they should live in their own package, rather than being added to the lsst stack only sometimes.
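In the meantime I have been working around it with a small shim, though I do not know if this is the intended pattern; the anonymous TAP URL below is my assumption, and in practice authentication with a token is presumably still needed:

```python
def make_tap_service():
    """Return a TAP service handle whether or not lsst.rsp is installed."""
    try:
        # Only available inside the RSP environment, as far as I can tell.
        from lsst.rsp import get_tap_service
        return get_tap_service("tap")
    except ImportError:
        # Fall back to plain pyvo against the RSP TAP endpoint;
        # the URL and the auth handling here are assumptions on my part.
        import pyvo
        return pyvo.dal.TAPService("https://data.lsst.cloud/api/tap")
```

Having the RSP-specific helpers in a separately versioned package would make this kind of guard unnecessary.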
All of this comes with the caveat that I am very new to this, so perhaps I am missing something and the above points (except the lsst.rsp one) are moot.