Issue affecting RSP deployment at NCSA (lsp-stable) - RESOLVED

[This applies to one of our internal RSP deployments for project staff at NCSA. It does not affect users of the Google-based deployment at data.lsst.cloud or of other internal deployments.]

For infrastructure reasons that are still being investigated [see the end of this post if you want the technical explanation], the RSP at NCSA is sometimes unable to show the correct list of images (recommended, latest weekly, etc.) on the menu page that appears when you request a new notebook server (aka “the spawner page”). If this happens to you, you can still get the image you want by selecting it from the drop-down menu. When you do, remember to also select the radio button for that option.

I Can Handle The Geek

Somewhat simplified explanation: an RSP service pre-populates the “popular” Docker images (recommended, the last two weeklies, etc.) on the cluster nodes so that when you ask for a notebook you do not suffer the sizeable delay of downloading a large image on demand. Another service then asks each node which of those images it has and generates the menu form you see. For reasons we don’t completely understand at this point, when a service asks the nodes whether they have those images, the nodes claim they don’t, even when they do. The spawner page handled this “no images are cached” answer poorly; it has now been fixed to at least let you proceed by selecting an uncached image. Depending on which node you land on, you may see no problem, a partial menu of cached images, or no cached images at all.
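To make the menu behaviour described above a bit more concrete, here is a minimal Python sketch of how a menu could be split into “cached” entries and drop-down entries based on what a node reports. It is not the RSP’s actual code: the function name build_menu, the image names, and the data shapes are hypothetical, and the real drop-down offers more images than just the pre-populated ones. It only illustrates why a node that under-reports its images produces a full, partial, or empty cached menu.

```python
from typing import Dict, List


def build_menu(prepulled: List[str], reported: List[str]) -> Dict[str, List[str]]:
    """Hypothetical sketch: split pre-populated images by what the node reports.

    `prepulled` is the list of "popular" images the pre-populating service
    tries to cache; `reported` is what the node claims to hold.
    """
    reported_set = set(reported)
    # Only images the node admits to having appear as cached (fast) entries.
    cached = [img for img in prepulled if img in reported_set]
    # Everything else can still be chosen, just via the slower drop-down path.
    dropdown = [img for img in prepulled if img not in reported_set]
    return {"cached": cached, "dropdown": dropdown}


prepulled = ["recommended", "weekly_a", "weekly_b"]  # hypothetical names
print(build_menu(prepulled, prepulled))     # node reports everything: full menu
print(build_menu(prepulled, ["weekly_a"]))  # node under-reports: partial menu
print(build_menu(prepulled, []))            # node reports nothing: no cached images
```

Note that in every case the images remain selectable; only whether they are shown as cached changes, which is why picking from the drop-down works around the problem.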

This issue is now understood and has been temporarily addressed, so things should be back to normal for users.

Many thanks to @adam, @rra, Yan, and @ktl for investigating.

For the curious: it turns out Kubernetes has a configuration parameter (the kubelet’s nodeStatusMaxImages setting, also available as the --node-status-max-images flag) that caps the number of images a node reports when you query it for its images. The default is 50, and for various reasons we had gone past that limit at NCSA.
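A toy simulation shows how such a cap makes a node “forget” images it really has. This is a hedged Python sketch, not Kubernetes code: the report_images helper, the 60-image node, and the image names are made up for illustration, and which images get dropped in practice depends on the order in which the node lists them, which is simply assumed here to be the list order. A negative cap is treated as “no cap”, mirroring the kubelet’s documented use of -1.

```python
from typing import List

DEFAULT_MAX_IMAGES = 50  # the default cap on reported images


def report_images(on_disk: List[str], max_images: int = DEFAULT_MAX_IMAGES) -> List[str]:
    """Toy model of a node answering "which images do you have?".

    A negative max_images disables the cap. Truncating from the end of the
    list is an assumption made for this illustration.
    """
    if max_images < 0:
        return list(on_disk)
    return on_disk[:max_images]


# Hypothetical node: 59 older images plus the "recommended" one, listed last.
on_disk = [f"old_image_{i}" for i in range(59)] + ["recommended"]
reported = report_images(on_disk)

print(len(on_disk), len(reported))  # 60 50
print("recommended" in on_disk)     # True: the image really is on the node
print("recommended" in reported)    # False: but the node does not report it
```

With fewer than 50 images on a node the cap never bites, which fits the problem appearing only once the NCSA nodes had accumulated more images than the limit.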