Verification Cluster (LSST Compute) Advancement

The construction and configuration of the Verification Cluster (LSST Compute) at NCSA has advanced to provide access to more than 1,000 cores through the Simple Linux Utility for Resource Management (SLURM) scheduler. Computation on these cores is supported by a GPFS file system for data and software. Users can begin working with this resource by logging in to the head/controller node lsst-dev7.ncsa.illinois.edu with their standard NCSA ID and credentials (the same as for the lsst-dev system) and submitting SLURM jobs that use the GPFS area on this system. Per-user directories in a /scratch space on the GPFS have been created and are available for use. A shared software stack within the GPFS space has been provided by Science Pipelines.
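For anyone who wants to try it, here is a minimal submission sketch; the /scratch layout and job parameters are assumptions for illustration, so please check the documentation linked below for the real values.

```bash
#!/bin/bash
# Minimal SLURM batch-script sketch for the Verification Cluster.
# The /scratch path layout is an assumption for illustration; check the
# cluster documentation for actual values.
#SBATCH --job-name=demo
#SBATCH --nodes=1
#SBATCH --ntasks=24
#SBATCH --time=00:30:00

# Work in the per-user scratch area on GPFS (path layout assumed).
cd /scratch/$USER

# Run one copy of the command per allocated task.
srun hostname
```

Submit from the head node with `sbatch demo.sl` and watch the queue with `squeue -u $USER`.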

Further details are provided in our first draft of the documentation at
https://developer.lsst.io/services/lsst-dev7.html .

Thanks, @daues! Great news.

Two questions:

What are the actual CPU chips that are in these servers?

Is there local disk available on the servers for applications where one doesn’t want hundreds of cores all writing to the same directories in the shared file system at once? In that case a common trick is to write to local disk and then do a fast bulk copy to the shared file system at the end of the job.

Often system administrators do not want people to use /tmp for this sort of thing, as it can interfere with other system functions if, for instance, it fills up.
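For concreteness, the pattern I mean looks something like this inside a job script; the local path and “myapp” are placeholders, and whether $TMPDIR points at local disk is site-dependent:

```bash
# Staging pattern: write to node-local disk, bulk-copy to GPFS at job end.
# Whether TMPDIR points at local disk is site-dependent; the paths and
# "myapp" are placeholders.
LOCAL="${TMPDIR:-/tmp}/job_${SLURM_JOB_ID}"   # job-scoped node-local directory
DEST="/scratch/$USER/${SLURM_JOB_ID}"         # final destination on shared GPFS

mkdir -p "$LOCAL" "$DEST"
myapp --output "$LOCAL"         # heavy per-core writes go to local disk
cp -r "$LOCAL"/. "$DEST"/       # one fast bulk copy at the end of the job
rm -rf "$LOCAL"                 # clean up the local scratch
```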

Thanks,
Gregory

If I have it right, the CPU is the
[Intel® Xeon® CPU E5-2680 v3](http://ark.intel.com/products/81908/Intel-Xeon-Processor-E5-2680-v3-30M-Cache-2_50-GHz).

I don’t think local disk is available, but I can inquire about what would be recommended as far as writing to /tmp, etc. In the spirit of ‘demonstrating that the current setup is unsuitable’, maybe the admins will welcome some vigorous load on the GPFS to understand and measure its limitations.
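If it would help, a crude way to generate that load is a SLURM job array in which every task writes a sizable file into the same shared directory at once; the array width and file size below are arbitrary illustrative choices, not recommendations:

```bash
#!/bin/bash
# Crude GPFS write-load sketch: every array task writes a 1 GB file into
# the same shared directory simultaneously. Array width and file size are
# arbitrary illustrative choices.
#SBATCH --job-name=gpfs-load
#SBATCH --array=1-100
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

OUT="/scratch/$USER/loadtest"   # shared target directory (layout assumed)
mkdir -p "$OUT"

# Timing the writes gives a rough measure of aggregate throughput
# under contention.
time dd if=/dev/zero of="$OUT/task_${SLURM_ARRAY_TASK_ID}" bs=1M count=1024
```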

Greg

Thanks, Greg. For me the link got somewhat corrupted, so I repeat it here:
http://ark.intel.com/products/81908/Intel-Xeon-Processor-E5-2680-v3-30M-Cache-2_50-GHz

That’s a Haswell processor, i.e., with AVX2 instructions including FMA3. Good (though I don’t know whether the heavy-duty math in the stack is compiling to those instructions at the moment).
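One quick way to check, assuming the stack’s compiled libraries are on disk, is to disassemble a hot numerical library and count FMA mnemonics (vfmadd*); the library path below is purely illustrative:

```bash
# Look for FMA3/AVX2 fused multiply-add instructions in a compiled library.
# The path is illustrative -- substitute a hot numerical library from the
# actual stack installation on GPFS.
objdump -d /path/to/stack/lib/libsomething.so | grep -c 'vfmadd'
```

A nonzero count suggests the build is emitting FMA; with GCC that typically requires compiling with -march=haswell (or -march=native on these nodes).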

5.3 GB RAM per core. Do you know whether you are running with hyper-threading enabled?
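For what it’s worth, this can be checked directly from a shell on a compute node:

```bash
# "Thread(s) per core: 2" means hyper-threading is enabled.
lscpu | grep 'Thread(s) per core'
```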

Gregory

@ktl pointed out that this was covered in the document that @daues mentioned. Answer: yes (for now). With hyper-threading enabled, each physical core presents two hardware threads, so the 5.3 GB/core above becomes about 2.7 GB per “core”.

Note that the current home of this documentation seems to be https://developer.lsst.io/services/verification.html.