Starting the server failed on `data.lsst.cloud`

Hi everyone,

The RSP server fails to start when I choose the Weekly 2023_16 image. The following image shows the messages I get when it fails. I should mention that I am trying to start the server on data.lsst.cloud. Could you please help me figure out how this can be fixed, or tell me whom I should contact?

Thanks @bazkiaei.

I can confirm that starting a new server with the 2023_16 image has worked fine for me twice today, just now and 2 hours ago. But from the way you say “when it fails”, it sounds like you’ve encountered this issue multiple times?

I’m going to bring this to the attention of our RSP developers for you, because I can’t recreate the issue or fix it. As a heads-up, it might help them to know all the (other) dates and times you encountered the failure, and whether you were fully logged out of the Notebook Aspect when it occurred. If you can respond with that extra information here in the thread, that might be helpful.

Since it’s starting the container image, something is keeping the JupyterLab process inside the container from calling back to the Hub.

First try logging out (with “/logout”) and then back in to get a new token. If it still doesn’t start after that, but fails the same way, then…

Did you install some software locally, with pip or conda? If so it is likely that you have installed something that is fighting with the other packages in the RSP. Try checking “Reset user environment” on the spawner screen and see if that helps.
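If you are not sure whether you have anything installed locally, a quick check from a terminal or notebook can help. The sketch below is only an illustration: the function name `find_user_overrides` and the list of names it looks for are assumptions made for this example, not an official RSP list. It relies on two general facts: `pip install --user` writes under `~/.local` by default on Linux, and conda keeps user-level environments and package caches under `~/.conda`.

```python
from pathlib import Path

# Illustrative (hypothetical) list of user-level locations that can shadow
# the packages shipped in the RSP image. This is not an official RSP list.
CANDIDATES = (".local", ".conda", ".condarc")


def find_user_overrides(home: Path) -> list[str]:
    """Return the names from CANDIDATES that exist directly under `home`."""
    return [name for name in CANDIDATES if (home / name).exists()]
```

Calling `find_user_overrides(Path.home())` returns the subset of those names present in your home directory. An empty list does not prove the environment is clean, it only rules out these particular locations.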

Hi @MelissaGraham ,

I should say that I haven’t been able to start the server on the 2023_16 image since last Thursday. I have tried multiple times since then, but it has failed every time!

Since I can start the server on the 2023_15 image, I have logged in and logged out several times since then.

Ok, this is interesting!
When I try to log out after running the server, if I do “Log Out”, it redirects me to a page with this message:

" 404 : Not Found

You are requesting a page that does not exist!"

But if I do “Save All, Exit, and Log Out” it works well! However, it is still not working for the 2023_16 image.

And yes, I think I have installed some software locally! I tried checking “Reset user environment” as you suggested, but that didn’t work either.

Yes, the fact that the top “Log Out” menu item doesn’t work is known, and we are investigating how to disable it. See Jira ticket DM-38149, “"Log Out" menu item in JL fails”.

So, you didn’t mean the top “Log Out” when you advised me to log out with “/logout”?
If so, I am not sure what “/logout” is. Could you please clarify that?

I should add that I get the message `2023-04-27T22:27:39Z [Warning] Back-off restarting failed container` just before getting the failure message that I shared in the above screenshot.

Thanks for your advice, @adam. @bazkiaei and I have been trying to get to the bottom of this. For reference, the above error message was returned after we first tried moving temporary files in Amir’s homespace into a scratch folder, typing logout in the terminal, and shutting down the instance with “Save All, Exit, and Log Out”. The temporary files look like: .user_env.20230427TTTTTT.

Amir can spin up w_2023_15 (and anything we’ve tried earlier than w15), but w_2023_16 and w_2023_17 are a no-go. There’s nothing mission-critical in Amir’s homespace right now - if we had to wipe everything and start over, that would be an option, unless you have an alternative?

Ok, we have solved the problem.

There was a .conda file in my home directory for some reason (I had made it!), and after moving it out of the home directory I could start the 2023_17 image successfully.
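For anyone who hits the same thing, here is a minimal sketch of moving such an entry aside rather than deleting it, so it can be restored later. The helper name `move_aside` and the timestamp-suffix naming are choices made for this example and are not taken from the RSP's own reset code. It works whether `.conda` is a file or a directory.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def move_aside(path: Path, now: Optional[datetime] = None) -> Optional[Path]:
    """Rename `path` to `<name>.<UTC timestamp>` in the same directory.

    Returns the new path, or None if `path` does not exist.
    The naming scheme is an assumption made for this sketch.
    """
    if not path.exists():
        return None
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%S")
    target = path.with_name(f"{path.name}.{stamp}")
    path.rename(target)  # a rename keeps the contents, so nothing is lost
    return target
```

Running `move_aside(Path.home() / ".conda")` before starting a new server leaves the old contents in place under the new name, so they can be renamed back if moving them turns out not to help.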

w_2023_16 and 17 use rubin-env 6.0.0, so it makes some sense that a local .conda dating from the preceding rubin-env 5.1.0 might interfere. Glad you found the solution.

I am not sure what “/logout” is. Could you please clarify that?

This means to go to https://data.lsst.cloud/logout (or the equivalent URL for other instances of the RSP) to remove your auth tokens. This is the same URL reachable from the “Logout” entry in the menu with your username on the “landing page”.

Thanks for the clarification.

Glad you found the issue. I should probably add .conda to the list of things moved aside when you clear your user environment.
