The RSP server fails to start when I choose the weekly 2023_16 image. The following image shows the messages I get when it fails. I should mention that I am trying to start the server on data.lsst.cloud. Could you please help me figure out how this can be fixed, or who I should contact?
I can confirm that starting a new server with the 2023_16 image has worked fine for me twice today, just now and 2 hours ago. But from the way you say "when it fails", it sounds like you've encountered this issue multiple times?
I'm going to bring this to the attention of our RSP developers for you, because I can't recreate the issue or fix it. As a heads-up, it might help them to know all the (other) dates and times you encountered the failure, and whether you were fully logged out of the Notebook Aspect when it occurred. If you can respond with that extra information here in the thread, that might be helpful.
Since the container image itself is starting, something is keeping the JupyterLab process inside the container from calling back to the Hub.
First try logging out (with "/logout") and then back in to get a new token. If it still doesn't start after that, but fails the same way, then…
Did you install some software locally, with pip or conda? If so, it is likely that you have installed something that is conflicting with the other packages in the RSP. Try checking "Reset user environment" on the spawner screen and see if that helps.
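If it helps, a quick way to check for local installs from a terminal is something like the sketch below (assuming the usual locations: user pip installs under ~/.local and per-user conda state under ~/.conda; exact paths may differ on your instance):

```bash
# List packages pip-installed into the per-user site (~/.local), which can
# shadow the packages shipped in the RSP image.
pip list --user

# Look for per-user conda state (environments, package cache, config).
ls ~/.conda 2>/dev/null
cat ~/.condarc 2>/dev/null
```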
I should say that I haven't been able to start the server on the 2023_16 image since last Thursday. I have tried multiple times since then, but failed!
Since I can start the server on the 2023_15 image, I have logged in and out several times since then.
So, you didn't mean the "Log Out" option when you advised me to log out with "/logout"?
If so, I am not sure what "/logout" is. Could you please clarify that?
I should add that I get this message `2023-04-27T22:27:39Z [Warning] Back-off restarting failed container` just before getting the failure message that I shared in the above screenshot.
Thanks for your advice, @adam. @bazkiaei and I have been trying to get to the bottom of this. For reference, the above error message was returned after we first tried moving temporary files in Amir's homespace into a scratch folder, typing `logout` in the terminal, and shutting down the instance with "Save all, Exit, and Log Out". The temporary files look like `.user_env.20230427TTTTTT`.
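Concretely, the cleanup step looked roughly like this (a sketch; the scratch folder name and location are just examples of what we used, and the timestamp suffix is elided by using a glob):

```bash
# Move the stray .user_env.* temporary files out of the home directory
# into a scratch folder so they are preserved but out of the way.
mkdir -p ~/scratch
mv ~/.user_env.2023* ~/scratch/
```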
Amir can spin up w_2023_15 (and anything we've tried earlier than w15), but w_2023_16 and w_2023_17 are a no-go. There's nothing mission-critical in Amir's homespace right now; if we had to wipe everything and start over, that would be an option, unless you have an alternative?
There was a `.conda` file in my home directory for some reason (I had made it!), and after moving it out of the home directory I could start the server with the 2023_17 image successfully.
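In case it helps anyone else, the workaround amounted to something like this from a terminal in a working image (a sketch; the backup location is arbitrary, not anything the RSP requires):

```bash
# Move the stale per-user Conda state (~/.conda) out of the home directory
# rather than deleting it, so it can be restored if anything is missed.
mkdir -p ~/conda_backup
mv ~/.conda ~/conda_backup/
# Then log out (e.g. via https://data.lsst.cloud/logout) and spawn the newer image again.
```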
w_2023_16 and 17 use rubin-env 6.0.0, so it makes some sense that a local .conda dating from the preceding rubin-env 5.1.0 might interfere. Glad you found the solution.
I am not sure what "/logout" is. Could you please clarify that?
This means to go to https://data.lsst.cloud/logout (or the equivalent URL for other instances of the RSP) to remove your auth tokens. This is the same URL reachable from the "Logout" entry in the menu with your username on the "landing page".