The construction and configuration of the Verification Cluster (LSST Compute) at NCSA has advanced to the point of providing access to more than 1,000 cores through the Simple Linux Utility for Resource Management (SLURM) scheduler. Computation on these cores is supported by a GPFS file system for data and software. Users can begin working with this resource by logging in to the head/controller node lsst-dev7.ncsa.illinois.edu with their standard NCSA ID and credentials (the same as for the lsst-dev system) and submitting SLURM jobs that use the GPFS area on this system. Per-user directories in a /scratch space on GPFS have been created and are available for use. A shared software stack within the GPFS space has been provided by Science Pipelines.
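For anyone getting started, a minimal batch script along these lines should work; note that the stack path, module setup, and /scratch layout below are placeholders based on the description above, not the exact configured values:

```bash
#!/bin/bash
# Minimal sketch of a SLURM batch job for the Verification Cluster.
# The output path, shared-stack location, and product name are assumptions
# -- substitute whatever the admins have actually configured.
#SBATCH --job-name=stack-test
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:30:00
#SBATCH --output=/scratch/%u/slurm-%j.out   # per-user scratch area on GPFS

# Activate the shared Science Pipelines stack (path is a placeholder).
source /software/lsstsw/stack/loadLSST.bash
setup lsst_distrib

# Run the actual work from the per-user scratch directory.
cd /scratch/$USER
python -c "import lsst.afw; print('stack is importable')"
```

Submit with `sbatch myjob.sl` from the head node and monitor with `squeue -u $USER`.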
What are the actual CPU chips that are in these servers?
Is there local disk available on the servers for applications where one doesn’t want hundreds of cores all writing to the same directories in the shared file system at once? In that case a common trick is to write to local disk and then do a fast bulk copy to the shared file system at the end of the job.
Often system administrators do not want people to use /tmp for this sort of thing, as it can interfere with other system functions if, for instance, it fills up.
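As a concrete illustration of that write-locally-then-copy pattern, a job script could look roughly like the sketch below. Whether the compute nodes actually have usable local disk is exactly the open question here, so the local path is a placeholder, and the application command is hypothetical:

```bash
#!/bin/bash
#SBATCH --job-name=local-scratch-demo
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# SLURM_TMPDIR is set on some clusters but is not guaranteed; fall back to /tmp,
# and keep everything under a job-specific directory so /tmp is not littered.
LOCAL_ROOT=${SLURM_TMPDIR:-/tmp}
WORKDIR=$LOCAL_ROOT/$USER-$SLURM_JOB_ID
FINAL=/scratch/$USER/job-$SLURM_JOB_ID      # destination on the shared GPFS

mkdir -p "$WORKDIR" "$FINAL"
trap 'rm -rf "$WORKDIR"' EXIT               # clean up local disk even on failure

cd "$WORKDIR"
my_pipeline_step --output .                 # hypothetical application command

# One bulk copy back to GPFS at the end instead of many small writes.
rsync -a "$WORKDIR"/ "$FINAL"/
```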
I don’t think local disk is available, but I can inquire about what would be recommended as far as writing to /tmp, etc. In the spirit of ‘demonstrating that the current setup is unsuitable’, maybe the admins will welcome some vigorous load on the GPFS to understand and measure its limitations.
Thanks, Greg. For me the link got somewhat corrupted, so I repeat it here:
That’s a Haswell processor, i.e., with AVX2 instructions including FMA3. Good (though I don’t know whether the heavy-duty math in the stack is currently being compiled to use those instructions).
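One way to check would be to disassemble one of the numerically heavy stack libraries and look for FMA instructions; the library path here is just a placeholder:

```bash
# Count FMA instructions in a built library; zero suggests the code was not
# compiled for Haswell (e.g. with -O3 -march=haswell, or -mavx2 -mfma).
objdump -d /path/to/stack/lib/libsomething.so | grep -c -E 'vfmadd|vfnmadd'
```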
5.3 GB RAM per core. Do you know whether you are running with hyper-threading enabled?
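If it helps, something like the following run on a compute node would answer both the CPU-model and hyper-threading questions ("Thread(s) per core: 2" means hyper-threading is on, "1" means it is off):

```bash
# Allocate one node interactively and report the CPU topology.
srun -N1 --pty lscpu | grep -E 'Model name|Thread\(s\) per core|Core\(s\) per socket|Socket\(s\)'
```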