Installing packages on a Lustre-managed file system


Hello,

I ran into a problem when trying to install the LSST packages with the “rebuild lsst_distrib” command. The error message is as follows:

~/lsstsw$ rebuild lsst_distrib
flock-fd: 200: Function not implemented
a rebuild is already in process.

The flock error points to the Lustre file system used on our computing cluster:

(lsst-scipipe-984c9f7) jahumada@leftraru3:~/lsstsw$ pwd
/home/jahumada/lsstsw
(lsst-scipipe-984c9f7) jahumada@leftraru3:~/lsstsw$ mount | grep home
systemd-1 on /home type autofs (rw,relatime,fd=22,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=29609)
172.22.30.1@o2ib:172.22.30.2@o2ib:/home on /home type lustre (rw,lazystatfs)
172.22.30.1@o2ib:172.22.30.2@o2ib:/home on /mnt/flock type lustre (rw,localflock,lazystatfs)

So it might be that lsstsw is currently not compatible with this type of file system. Do you know of any way to work around this issue?

Thanks very much!

Many of our build steps and (SQLite) Butler registries assume the use of filesystem-based locking: https://pipelines.lsst.io/install/prereqs.html#filesystem-prerequisites
If you have an alternative way of doing locking, we could look into incorporating it, but we will need some kind of lock mechanism.
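For context, here is a minimal sketch of the kind of descriptor-based locking that the "flock-fd 200" call in the error message suggests: a script opens a lock file on file descriptor 200 and takes a non-blocking exclusive lock on it, refusing to proceed if that fails. The function names acquire_lock and release_lock and the lock file path are hypothetical; this assumes util-linux flock and is an illustration, not the actual lsstsw code. Note that when flock is unsupported on the file system, the lock attempt fails just as it would if another process held the lock, which is why the message claims a rebuild is already running.

```shell
#!/bin/bash
# Hypothetical sketch of descriptor-based locking (not the lsstsw source).
# Assumes util-linux flock, which can lock an already-open descriptor.

# acquire_lock LOCKFILE: open LOCKFILE on fd 200 and try to take an
# exclusive lock without waiting. Returns non-zero if another process
# holds the lock, or if flock itself fails (e.g. "Function not implemented").
acquire_lock() {
    local lockfile=$1
    # Open (creating if needed) the lock file on descriptor 200.
    exec 200>"$lockfile" || return 1
    # -n: fail immediately rather than block; the default mode is exclusive.
    flock -n 200
}

# release_lock: closing descriptor 200 drops the lock
# (once no child process still holds a copy of the descriptor).
release_lock() {
    exec 200>&-
}

# Example (hypothetical lock path):
# acquire_lock "$HOME/lsstsw/.rebuild.lock" || { echo "a rebuild is already in process."; exit 1; }
```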

It’s a little strange that your file system seems to be mounted with localflock, which is supposed to support the flock call, yet flock-fd, which calls Python’s fcntl.flock(), reports that the function is not implemented.
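It may be worth checking which mount point actually supports flock. In the mount output above, localflock is listed only for /mnt/flock; the /home mount, where lsstsw lives, shows only rw,lazystatfs. On Lustre, flock support depends on the client mount options (flock or localflock), so the “Function not implemented” error would be consistent with /home being mounted without either. The helper below, probe_flock (a hypothetical name), is a minimal sketch assuming util-linux flock and mktemp: it creates a scratch file in each directory given and tries to take a non-blocking lock on it.

```shell
#!/bin/bash
# Hypothetical helper: report whether flock(2) works in each given directory.
# Usage: probe_flock DIR [DIR...]
probe_flock() {
    local dir probe
    for dir in "$@"; do
        # Create a scratch file on the file system being tested.
        if ! probe=$(mktemp "$dir/.flock-probe.XXXXXX"); then
            echo "$dir: cannot create a test file"
            continue
        fi
        # Take (and immediately release) a non-blocking lock by running "true".
        if flock -n "$probe" true 2>/dev/null; then
            echo "$dir: flock OK"
        else
            echo "$dir: flock FAILED"
        fi
        rm -f "$probe"
    done
}
```

For example, "probe_flock ~/lsstsw /mnt/flock/jahumada" would compare the two mounts, assuming the same directory tree is visible under /mnt/flock (both mounts show the same Lustre source, 172.22.30.1@o2ib:172.22.30.2@o2ib:/home).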

In this particular case, if you’re calling rebuild manually, you could try working around this by hacking lsstsw/bin/flock-fd, or possibly by defining a shell function to override it:

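$ flock-fd () {
    # No-op stand-in: always report that the lock was acquired
    /bin/true
}
$ export -f flock-fd   # export it so the rebuild script (a child process) sees it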

Implementing flock-fd as the following script:

$ cat bin/flock-fd
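#!/bin/bash

# Take an exclusive lock on an already-open file descriptor (e.g. 200).
if [ $# -ne 1 ]; then
    echo "usage: flock-fd " >&2
    exit 1
fi

# -w 0: give up immediately instead of waiting for the lock
flock -w 0 "$1"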

gives the following output:

$ rebuild lsst_distrib
flock: 200: Function not implemented
a rebuild is already in process.

So it seems it’s not Python’s problem: flock itself is not available (despite localflock). I have no idea why.

I will replace it with a dummy function and hope that works. I just thought this might help.