Using a custom conda environment in the RSP notebook aspect (R experiment)

TomLoredo · May 16, 2023, 5:57am

Hello, RSP gurus-

On behalf of some Penn State Center for Astrostatistics (CASt) colleagues who are R fans, I wanted to do some simple testing of R on the RSP by installing R via conda. For some quick command-line testing, the following rpy-test conda environment works:

conda create --name rpy-test -c conda-forge -c defaults \
python r-base ipython jupyter scipy matplotlib pandas \
astropy r-essentials rpy2

It combines Python, R, and Jupyter, and includes RPy2, a Python package that lets Python access R from within a script, module, or notebook.

Once activated, this lets me use R (the command-line interpreter) on the command line (including installing new R packages via install.packages), and from an interactive Python session, it lets me import rpy2.
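
As an illustration of that last step, here is a minimal sketch of an rpy2 smoke test, assuming the rpy-test environment is active. The function name r_smoke_test is made up for this example; rpy2.robjects is rpy2's high-level interface.

```python
def r_smoke_test():
    """Evaluate a trivial R expression from Python via rpy2.

    Returns the result as a Python float, or None if rpy2 (or the R
    installation it needs) cannot be loaded in the current environment.
    """
    try:
        # Importing rpy2.robjects initializes an embedded R session.
        import rpy2.robjects as ro
    except Exception as exc:  # rpy2 missing, or R not found at import time
        print(f"rpy2 is not usable here: {exc!r}")
        return None

    # ro.r(...) evaluates a string of R code and returns an R vector;
    # indexing with [0] extracts the first element as a Python value.
    return ro.r("sum(c(1, 2, 3))")[0]


if __name__ == "__main__":
    print(r_smoke_test())
```

If this prints a number rather than None, Python and R are talking to each other in the active environment.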

I’m not planning on doing anything beyond the most basic testing at this point (I’m not an R expert myself). But I’m already stymied by the next steps. I’d appreciate tips or specific pointers to the docs (I did already have a quick look!) regarding the following:

Is it possible to use a custom conda environment for a notebook, either by launching a notebook in the env, or by selecting a kernel from the desired env?

If so, how, and what other packages should the environment include to enable access to DP0 data?

If there isn’t a way to use a notebook with a custom env, is the only way to accomplish something like this to add packages to the base conda environment? That seems problematic. Conda had a lot of trouble solving this environment. An initial attempt, pinning Python at 3.10 and R at 4.3 (both available on conda-forge), had dependency conflicts; I quit an attempted install after ~30 min. The unpinned environment above ended up using Python 3.11 and R 4.2.3.

Alternatively, I suppose one could just write scripts and run them in the custom env. In that case, I’d still need to know the recommended packages for Rubin data access.

Any advice appreciated!

Cheers,
Tom

ktl · May 16, 2023, 5:08pm

It is possible to start a notebook using a kernel based on a custom conda environment. Following some of the instructions here, I did this (at the USDF RSP, but should work elsewhere):

source /opt/lsst/software/stack/loadLSST.bash
mamba create --name rpy-test -c conda-forge -c defaults \
    python r-base ipython jupyter scipy matplotlib pandas \
    astropy r-essentials rpy2
conda activate rpy-test
python -m ipykernel install --user --name rpy-test --display-name RPy

(I used mamba for faster environment solving and creation.) After stopping and restarting my notebook server, I get:
[Image: Screenshot 2023-05-16 at 10.07.43]
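
If the new kernel does not show up after the restart, one way to check that it was registered is to look for its kernel.json file. The sketch below is not an official RSP tool. It assumes the kernel was installed with --user and that the per-user Jupyter data directory is the usual Linux location, ~/.local/share/jupyter/kernels; the function name list_user_kernels is made up for this example.

```python
import json
from pathlib import Path

# Assumption: the usual per-user Jupyter data directory on Linux.
DEFAULT_KERNELS_DIR = Path.home() / ".local" / "share" / "jupyter" / "kernels"


def list_user_kernels(kernels_dir=DEFAULT_KERNELS_DIR):
    """Map each kernel name (its directory name) to its display name."""
    kernels = {}
    kernels_dir = Path(kernels_dir)
    if not kernels_dir.is_dir():
        return kernels
    # Each registered kernel lives in <kernels_dir>/<name>/kernel.json.
    for spec_file in sorted(kernels_dir.glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text())
        kernels[spec_file.parent.name] = spec.get("display_name", "")
    return kernels


if __name__ == "__main__":
    for name, display_name in list_user_kernels().items():
        print(f"{name}: {display_name}")
```

With the commands above, the listing would be expected to include an rpy-test entry with display name RPy. Running jupyter kernelspec list in a terminal gives similar information.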

TomLoredo · May 16, 2023, 6:22pm

Thank you, @ktl—very helpful! I knew about mamba but didn’t think to check whether it was already available in the RSP. The Jupyter kernel selection document you found at NERSC is also helpful. Perhaps some of this should be in the RSP docs.

Do you (or any other readers) know the proper set of packages to include in an environment in order to access DP0 data (to use the Butler, etc.)? Is that already in the docs somewhere I’ve missed?

ktl · May 16, 2023, 8:51pm

To access and work with DP0 data alongside your own custom packages, it may be best to use the full rubin-env-rsp metapackage. We have tried to make rubin-env-rsp compatible with many external packages by limiting its dependency constraints.

In particular, it is possible to create an environment containing python, r-base, ipython, jupyter, scipy, matplotlib, pandas, astropy, r-essentials, rpy2, rubin-env-rsp using only the conda-forge channel (partly because most of those packages are already in rubin-env-rsp).
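
For concreteness, here is a small sketch that assembles the corresponding mamba create command from that package list. It only builds and prints the command; nothing is executed. The environment name rpy-rsp and the helper mamba_create_command are hypothetical placeholders, not anything defined by the RSP.

```python
import shlex

# Package list taken from the post above.
PACKAGES = [
    "python", "r-base", "ipython", "jupyter", "scipy", "matplotlib",
    "pandas", "astropy", "r-essentials", "rpy2", "rubin-env-rsp",
]


def mamba_create_command(env_name, packages=PACKAGES, channel="conda-forge"):
    """Return the `mamba create` invocation as a list of arguments."""
    return ["mamba", "create", "--name", env_name, "-c", channel, *packages]


if __name__ == "__main__":
    # "rpy-rsp" is a hypothetical environment name.
    # Print a copy-pasteable shell command; nothing is executed here.
    print(shlex.join(mamba_create_command("rpy-rsp")))
```

The printed command can be pasted into a terminal after sourcing loadLSST.bash, followed by the same ipykernel install step as before.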

And I should give credit to @cwalter who originally pointed out how to make a kernel from a conda environment.

TomLoredo · May 17, 2023, 12:19am

Thanks again, K-T (and Chris)—just the info I was looking for. I appreciate you taking the time to respond.