GPU and parallel computing resources in the RSP

TomLoredo · November 12, 2023, 6:38pm

Hello (again), RSP experts—

I dimly recall this question being tentatively addressed in a Q&A (perhaps at an online talk or a DP0 delegate meeting), but I don’t recall the answer: Will the RSP offer access to GPUs in any way (even just via pre-compiled capability in Python packages like the GPU capability in JAX or PyTorch, or via PyCUDA)?

In an in-kind Q&A here on Community, FAQ: In-kind Contributions and Data Rights Agreements, it’s noted that “There are no plans on the part of Rubin to use GPUs in the science pipelines,” but also that “GPUs could be beneficial for certain computationally bound use cases.” That thread concerns whether IDACs might exploit GPUs.

Many demographic/population modeling tasks would be pretty straightforward to parallelize (for GPUs or even just multiple cores); e.g., they often have “embarrassingly parallel” structure (for an example and some general discussion, see [2105.08026] GPU-Accelerated Hierarchical Bayesian Inference with Application to Modeling Cosmic Populations: CUDAHM). So I’m wondering if there might be GPU support in future incarnations of the RSP, which could accelerate demographic analyses.

Regarding multicore CPU parallelism, I see that current large server instances have 4 cores, so I presume the usual Python/IPython-accessible parallel computing tools will work. But if there are any obstacles to that, I’d like to learn of them. Also, is 4 cores likely to remain the limit, or might there be more available in the future?
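To make the multicore question concrete, here is a minimal sketch of the "embarrassingly parallel" pattern using only the Python standard library. Everything in it is an illustrative assumption rather than anything RSP-specific: the Gaussian per-object likelihood, the function names `object_log_likelihood` and `total_log_likelihood`, and the default of 4 workers (chosen to match the 4-core instances mentioned above).

```python
import math
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def object_log_likelihood(task):
    """Gaussian log-likelihood of one measurement (hypothetical toy model).

    `task` is a tuple (x, sigma, mu): measured value, its uncertainty,
    and the population mean being tested.
    """
    x, sigma, mu = task
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def total_log_likelihood(data, mu, max_workers=4, executor_cls=ProcessPoolExecutor):
    """Sum independent per-object terms, farming them out to worker processes.

    `data` is a list of (x, sigma) pairs. Each term depends on one object
    only, so the terms can be evaluated in any order on any core.
    """
    tasks = [(x, sigma, mu) for x, sigma in data]
    # Send objects to workers in batches to reduce inter-process overhead.
    chunksize = max(1, len(tasks) // (4 * max_workers))
    with executor_cls(max_workers=max_workers) as pool:
        return sum(pool.map(object_log_likelihood, tasks, chunksize=chunksize))
```

Calling `total_log_likelihood(data, mu)` spreads the per-object terms over up to 4 worker processes. Passing `executor_cls=ThreadPoolExecutor` switches to threads, which avoids pickling but only helps when the per-object work releases the GIL. Whether a given RSP container actually exposes all of its cores to such workers is exactly the kind of obstacle the question above asks about.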

—Tom

frossie · November 12, 2023, 6:54pm

Hi, I just want to explain what may seem like mixed messaging about GPUs.

Right now Rubin is still in a construction phase, and hence we are largely working on capabilities that are specified in the requirements. GPUs are not in the requirements, and the formal answer is that IDACs have been solicited to expose GPU capabilities to the community.

At the same time, we recognize GPUs are very attractive, and the advantages of using commodity cloud in our hybrid science platform do include a certain elasticity in how compute is provided.

So when I say GPUs are “not planned” it means literally that: there is no date in a plan for the delivery of GPUs in the “flagship” general science platform. We do intend to investigate this further once the survey starts, and I would be very surprised if we cannot provide some form of GPU service at some point in the duration of the survey. How, where, and to what extent it would be generally available are all issues that need discussion. Right now we are ignorant of the most basic questions about what the user data access patterns will be once the survey starts, so I’d rather nail those things down before opening new fronts.

I am always super careful about promising things to the community, because people will go get grad students on the basis of the timelines given, and then if you’re not ready when you said you would be, you mess up people’s lives as well as demoralize your developers.

tl;dr: I cannot give you a date or capability description of a GPU service at this time. The project is absolutely monitoring emerging community requirements in this area and at some point in the future there will be a technical response to those requirements.

frossie · November 12, 2023, 7:01pm

Oh, on the issue of parallel cores, which is a different issue, there are a number of plans.

- The project is planning to expose a “user batch” service on top of its data processing infrastructure. My understanding is that the service will be by application since it is intended for heavy duty reprocessing. This is not an RSP service per se, so I don’t know the timeline. It is in the requirements though, so it is planned.
- Not in the requirements, but it is clear that dask would be of great utility to the Notebook Aspect users and we have done enough prototyping work that I am confident we can provide it as a service by Data Release 1.
- We are also investigating other axes of parallelisation that are accessible to non-experts, such as batch notebook execution. That’s lower in the priority list as it is speculative, and we don’t know what resources will be left over after real usage patterns are established. It’s a “stay tuned” thing.
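
For readers unfamiliar with dask, the following is a rough sketch of what chunked, task-based parallelism looks like with its `delayed` interface. It is only an illustration: this thread does not say how dask would be exposed on the RSP, the helper names (`split`, `chunk_sum_of_squares`, `sum_of_squares`) are hypothetical, and the sum of squares is a toy stand-in for real per-object work. The sketch falls back to a serial loop if dask is not installed.

```python
import math

try:
    from dask import compute, delayed  # third-party; assumed to be installed
    HAVE_DASK = True
except ImportError:
    HAVE_DASK = False


def chunk_sum_of_squares(chunk):
    """Toy per-chunk reduction standing in for real per-object work."""
    return sum(x * x for x in chunk)


def split(values, n_chunks):
    """Split a list into at most n_chunks contiguous pieces of similar size."""
    size = max(1, math.ceil(len(values) / n_chunks))
    return [values[i:i + size] for i in range(0, len(values), size)]


def sum_of_squares(values, n_chunks=4):
    """Reduce each chunk as an independent task, then combine the results."""
    chunks = split(values, n_chunks)
    if not HAVE_DASK:
        # Serial fallback so the sketch still runs without dask.
        return sum(chunk_sum_of_squares(c) for c in chunks)
    # Build one lazy task per chunk; nothing runs until compute() is called.
    tasks = [delayed(chunk_sum_of_squares)(c) for c in chunks]
    return sum(compute(*tasks))
```

Note that `delayed` tasks run on dask's threaded scheduler by default, so pure-Python work like this toy example gains little. Real speed-ups come from work that releases the GIL (for example NumPy operations) or from a process-based or distributed scheduler.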

MelissaGraham · November 12, 2023, 7:05pm

I’m going to add this link here to an FAQ on RSP future functionality, which seems relevant to the conversation and also mentions GPUs (albeit with less detail than what Frossie’s already provided above): RSP Future Functionality FAQ — Vera C. Rubin Observatory Documentation for Data Preview 0.2

TomLoredo · November 12, 2023, 7:17pm

Thanks, @frossie and @MelissaGraham , for the amazingly quick and helpful responses. I wish I could check them all as “Solutions;” you’ll have to settle for hearts on some of them.