Recently I’ve been trying to install and run some tests with the latest version of Qserv (at the time of writing, b3a533f).
I’m doing a simple multi-node installation on my CentOS 7 localhost using the deployment tool in the Qserv repo, admin/tools/docker/deployment/localhost/run-multinode-tests.sh. This is my env.sh:
```bash
VERSION=b3a533f
NB_WORKERS=2

# Set node names
DNS_DOMAIN=localdomain
MASTER=master."$DNS_DOMAIN"
for i in $(seq 1 "$NB_WORKERS");
do
    WORKERS="$WORKERS worker${i}.$DNS_DOMAIN"
done
```
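For reference, here is a self-contained sketch of just the node-naming part of env.sh, written with straight ASCII quotes. If the curly quotes around $NB_WORKERS in the loop are really in the file (rather than introduced by the forum's formatting), seq receives the literal string “2”, rejects it, and no worker names are generated. The explicit WORKERS="" initialization is an addition for this sketch, not part of the original file.

```shell
#!/bin/bash
# Node-naming section of env.sh, reproduced in isolation to check its expansion.
NB_WORKERS=2
DNS_DOMAIN=localdomain
MASTER=master."$DNS_DOMAIN"

# Added for this sketch: start from an empty list instead of any inherited value.
WORKERS=""
for i in $(seq 1 "$NB_WORKERS"); do
    # Append one fully qualified worker name per iteration.
    WORKERS="$WORKERS worker${i}.$DNS_DOMAIN"
done

echo "MASTER=$MASTER"
echo "WORKERS=$WORKERS"
```

The leading space in WORKERS is harmless when the variable is later expanded unquoted, since word splitting discards it.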
The containers at the SHA you mention were chronologically the most recent ones built by Travis, but they happen to have been built from one of Fabrice’s as-yet-unmerged branches, so they aren’t guaranteed to be in working order.
Our latest SHA on master is currently 93c3b68. Could you please give it another shot with those containers and see whether you get a better result? If not, follow up here and we’ll be glad to help figure it out!
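One minimal way to switch to the master build is to point VERSION in env.sh at the new SHA before rerunning run-multinode-tests.sh. The set_qserv_version helper below is a hypothetical sketch, not part of the Qserv tooling; it assumes GNU sed (as shipped with CentOS 7) and an env.sh containing a single VERSION=... line.

```shell
#!/bin/bash
# Hypothetical helper: rewrite the VERSION=... line of an env.sh file in place.
set_qserv_version() {
    local env_file="$1" sha="$2"
    # Replace the whole VERSION line; all other lines are left untouched.
    # The -i (in-place) form used here is GNU sed syntax.
    sed -i "s/^VERSION=.*/VERSION=${sha}/" "$env_file"
}

# Usage (the path is a placeholder for wherever your env.sh lives):
#   set_qserv_version path/to/env.sh 93c3b68
```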
Hi Teng, thanks for checking master; that will make it easier for us to troubleshoot.
We’d be interested to see:
- the core file
- the two called-out log files
- /qserv/run/etc/xrdssi.cnf from inside the container
If you’d like to try email for the core file, you can reach me at fritzm@slac.stanford.edu. Otherwise, if you put it somewhere on AFS, we’ll come and get it.
I managed to get a backtrace from that core file, and it shows a boost::uuids::entropy_error being thrown from boost::uuids::random_generator().
This is likely because the Qserv worker Docker container is running on an older host kernel that lacks some recent syscalls. Could you let us know which kernel version your Docker host is running?
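Because containers share the host's kernel, running uname -r on the Docker host or inside the worker container reports the same kernel release. The sketch below prints that release and compares it against a minimum. The 3.17 threshold is an assumption: it is the release that introduced getrandom(2), one plausible candidate for the missing syscall behind the entropy_error, but the thread itself doesn't name the syscall. The kernel_at_least helper is hypothetical and relies on GNU sort -V.

```shell
#!/bin/bash
# Hypothetical helper: succeed if kernel release $1 is at least version $2.
kernel_at_least() {
    local have="$1" want="$2"
    # Keep only the numeric prefix, e.g. 3.10.0-957.el7.x86_64 -> 3.10.0
    have="${have%%-*}"
    # sort -V orders version strings; if $want sorts first (or ties), have >= want.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

release="$(uname -r)"
echo "Kernel release: $release"
# Assumed threshold: getrandom(2) first appeared in Linux 3.17.
if kernel_at_least "$release" 3.17; then
    echo "Kernel is 3.17 or newer; getrandom(2) should be available"
else
    echo "Kernel predates 3.17; getrandom(2) is likely missing"
fi
```

Stock CentOS 7 kernels are from the 3.10 series, which would fall below this assumed threshold.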