Running the stack on HSC data for a DESC PSF project

Hi all,

For a DESC project on PSF requirements we have obtained some HSC data of a star field and will be measuring PSF fields on it. Ideally we’d like to do this using (parts of) the DM stack. This topic is to document my progress and blockers, and to ask for help, as I’m not experienced in this area and am mostly cobbling things together from bits of the docs.

Successful steps so far:

  • Installing the stack v12 on my laptop and on NERSC (cori)
  • Finding, installing, and running the ci_hsc repository (github lsst/ci_hsc)
  • Ingesting the star field data into a new repository with this command:
    ingestImages.py $PWD/data $PWD/raw/*.fits --create --mode=link -c clobber=True register.ignore=True --doraise
    (this makes various subdirectories, including AG, BIAS, DARK, FOCUSING, and the names of the target fields; a quick check of the resulting registry is sketched below)
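
For reference, here is the quick check I’ve been running to see what ended up in the registry (just a sketch: it assumes the registry landed at data/registry.sqlite3 and has the usual raw table with visit, filter, and field columns written by the HSC mapper):

import sqlite3

# Sketch only: list the distinct visit/filter/field combinations that were ingested.
# Assumes the default registry location and the standard column names created by the HSC mapper.
conn = sqlite3.connect("data/registry.sqlite3")
for visit, filt, field in conn.execute(
        "SELECT DISTINCT visit, filter, field FROM raw ORDER BY visit"):
    print("%s %s %s" % (visit, filt, field))
conn.close()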

I gather the next stage is to generate the calibration repository, but this exists already in the ci_hsc example. I’m afraid I’m completely ignorant of how to do this - can someone advise?

Many thanks,
Joe

You need the pipe_drivers package. Here’s an example of creating the calibs I used in testing DM-3373:

constructBias.py /tigress/HSC/HSC --calib /tigress/pprice/dm-3373/CALIB --rerun price/dm-3373 --id visit=10382..10386:2 --calibId calibVersion=dm-3373 --job bias --cores 10 --batch-type=smp

ingestCalibs.py /tigress/HSC/HSC --rerun price/dm-3373 --calib=/tigress/pprice/dm-3373/CALIB --validity 720 `find CALIB  -name "BIAS-*.fits"`

constructDark.py /tigress/HSC/HSC --calib /tigress/pprice/dm-3373/CALIB --rerun price/dm-3373 --id visit=10372..10376:2 --calibId calibVersion=dm-3373 --job dark --cores 16 --batch-type=smp

ingestCalibs.py /tigress/HSC/HSC --rerun price/dm-3373 --calib=/tigress/pprice/dm-3373/CALIB --validity 720 `find CALIB  -name "DARK-*.fits"`

constructFlat.py /tigress/HSC/HSC --calib /tigress/pprice/dm-3373/CALIB --rerun price/dm-3373 --id visit=10484..10488:2 --calibId calibVersion=dm-3373 --job flat-g --cores 16 --batch-type=smp
constructFlat.py /tigress/HSC/HSC --calib /tigress/pprice/dm-3373/CALIB --rerun price/dm-3373 --id visit=10464..10468:2 --calibId calibVersion=dm-3373 --job flat-r --cores 16 --batch-type=smp
constructFlat.py /tigress/HSC/HSC --calib /tigress/pprice/dm-3373/CALIB --rerun price/dm-3373 --id visit=10352..10356:2 --calibId calibVersion=dm-3373 --job flat-i --cores 16 --batch-type=smp
constructFlat.py /tigress/HSC/HSC --calib /tigress/pprice/dm-3373/CALIB --rerun price/dm-3373 --id visit=10418..10422:2 --calibId calibVersion=dm-3373 --job flat-z --cores 16 --batch-type=smp
constructFlat.py /tigress/HSC/HSC --calib /tigress/pprice/dm-3373/CALIB --rerun price/dm-3373 --id visit=10444..10448:2 --calibId calibVersion=dm-3373 --job flat-y --cores 16 --batch-type=smp

ingestCalibs.py /tigress/HSC/HSC --rerun price/dm-3373 --calib=/tigress/pprice/dm-3373/CALIB --validity 720 `find CALIB  -name "FLAT-*.fits"`

constructFringe.py /tigress/HSC/HSC --calib /tigress/pprice/dm-3373/CALIB --rerun price/dm-3373 --id visit=11718..11728:2 --calibId calibVersion=dm-3373 --job fringe-y --cores 16 --batch-type=smp

ingestCalibs.py /tigress/HSC/HSC --rerun price/dm-3373 --calib=/tigress/pprice/dm-3373/CALIB --validity 720 `find CALIB  -name "FRINGE-*.fits"`
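
If you want to double-check what ended up in the calib registry afterwards, something like this should work (a sketch only; the table and column names assume the standard schema that ingestCalibs.py writes, with one table per calib type):

import sqlite3

# Sketch: list the bias frames and their validity ranges in the calib registry.
# Table and column names assume the standard calib registry schema.
conn = sqlite3.connect("/tigress/pprice/dm-3373/CALIB/calibRegistry.sqlite3")
for ccd, calibDate, validStart, validEnd in conn.execute(
        "SELECT ccd, calibDate, validStart, validEnd FROM bias ORDER BY ccd"):
    print("%s %s %s %s" % (ccd, calibDate, validStart, validEnd))
conn.close()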

Brilliant - thanks so much! I will give this a go and report back.

I should also add that we have lots of calibs from processing the HSC SSP data, and would be happy to make them available for download if that would help.

Here are our calibs: http://tigress-web.princeton.edu/~pprice/CALIB-LSST-20160419/ . It’s 733 GB total, but perhaps you can pick and choose the dates and filters you care about if you want to reduce the volume.

Oh, and here’s the ticket where those calibs were constructed: DM-5124.

Please make sure you use the ‘master’ version of meas_extensions_psfex in your PSF experiments - we’ve just disabled oversampling, which should make for better PSF models.

I have tried the bias correction but found it was running extremely slowly - this may of course have been the insane Edison file system. Also, the script seems to use a stack-installed mpiexec to launch the job, whereas NERSC uses srun from SLURM; I’m not sure whether that matters.

Anyway, I’ve now switched to our local cluster. After eventually realising that the VISITS list should only contain the relevant visits (e.g. the bias ones) and not all of them(!), I have it running okay on a single process. But when I try to run under PBS I get this error:

OpenBLAS: pthread_creat error in blas_thread_init function. Error code:11

This could be something odd on our cluster (some offending library somewhere), but just in case I thought I should check here. The suggestion I found on Stack Overflow did not help.

Thanks!
Joe

Bias creation is fairly I/O intensive: it reads in a CCD, subtracts the overscan, and writes it out again; later it reads those overscan-subtracted CCDs back in piece by piece. All of that can be a bit overwhelming on some filesystems.

We use mpiexec just fine on our Slurm clusters. Note that you can also run those commands on a single node by specifying --batch-type=smp or just leaving --batch-type off completely since smp is the default.

I don’t remember having seen this before. Have you done export OMP_NUM_THREADS=1? It’s possible that OpenBLAS is trying to use threads, which we don’t want; setting that should suppress it.

Brilliant - seems to be working nicely now on our local cluster with OMP_NUM_THREADS=1. Thanks!

The data set we were given didn’t seem to include any FLAT, FRINGE, or BFKERNEL files (or at least ingestImages.py didn’t make any links in those categories). So I’ve downloaded the ones you linked to on tigress above. Would the first step with those be ingestImages.py, or should I skip straight to ingestCalibs.py for some reason?

I have no idea what data you were given, but I don’t think there are standard calibs, so you would either need to build your own from raw calibration exposures (if the data you were given contains biases, darks, and flats) or download some built by someone else.

If you download our calibs, you shouldn’t have to do any ingestImages.py or ingestCalibs.py because they have already been ingested (put in calibRegistry.sqlite3 — make sure you also get that file!). You can jump directly to processing your data with processCcd.py or singleFrameDriver.py.
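
For example, the first call might look roughly like this (just a sketch; the repository path, rerun name, and data ID are placeholders, and running processCcd.py from the shell with the same arguments is equivalent):

from lsst.pipe.tasks.processCcd import ProcessCcdTask

# Sketch only: process one CCD of one visit against the downloaded calibs.
# The repository path, calib path, rerun name, and data ID are placeholders.
ProcessCcdTask.parseAndRun(args=[
    "/path/to/repo",
    "--calib", "/path/to/CALIB",
    "--rerun", "mytest",
    "--id", "visit=1234", "ccd=50",
])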

Okay, brilliant - thanks - my problem was that the data I was given had biases and darks but no flats, as far as I could tell. So I decided to try ignoring the calibs from there and just using the ones you built.

I downloaded the parts of your calibrations that were close in time to my test visit and in the same filter, and removed everything I didn’t download from calibRegistry.sqlite3 with some SQL DELETE statements.

Things seem to go okay but I now hit a seg fault:

...  Lots of reasonable-looking output, then:
processCcd.calibrate.detectAndMeasure.deblend: Deblended: of 485 sources, 45 were deblended, creating 108 children, total 593 sources
processCcd.calibrate.detectAndMeasure.measurement: Measuring 593 sources (485 parents, 108 children) 
processCcd.calibrate.detectAndMeasure.measurement: Measuring 593 sources (485 parents, 108 children) 
processCcd.calibrate.detectAndMeasure.measurement.applyApCorr: Applying aperture corrections to 2 flux fields
processCcd.calibrate.detectAndMeasure.measurement.applyApCorr: Use naive flux sigma computation
processCcd.calibrate.detectAndMeasure.measurement: Measuring 593 sources (485 parents, 108 children) 
processCcd.calibrate: Copying flags from icSourceCat to sourceCat for 121 sources
processCcd.calibrate.astrometry.refObjLoader: Loading reference objects using center (1023.5, 2087.5) pix = Fk5Coord(248.4861175, 12.8905772, 2000.00) sky and radius 0.109876613994 deg
JAZ anDir = /home/zuntz/lsst/ci_hsc/sdss-dr9-fink-v5b
JAZ: andConfigPath: /home/zuntz/lsst/ci_hsc/sdss-dr9-fink-v5b/andConfig.py
Program received signal SIGSEGV, Segmentation fault.

Those last two lines are from me debugging an earlier problem - no changes apart from the printout. I ran under gdb and got this traceback:

#0  0x00007fffffff4dc0 in ?? ()
#1  0x00000031eb23493b in msort_with_tmp () from /lib64/libc.so.6
#2  0x00000031eb2348b8 in msort_with_tmp () from /lib64/libc.so.6
#3  0x00000031eb234f1c in qsort_r () from /lib64/libc.so.6
#4  0x00007fffd101f69f in permuted_sort (realarray=realarray@entry=0x7fffffff4ec0, array_stride=array_stride@entry=8, compare=<optimized out>, perm=perm@entry=0x7fffffff4e50, N=N@entry=4)
    at permutedsort.c:93
#5  0x00007fffd101eccf in healpix_distance_to_xyz (hp=hp@entry=263, Nside=Nside@entry=8, xyz=xyz@entry=0x7fffffff4f50, closestxyz=closestxyz@entry=0x7fffffff4f70) at healpix.c:1355
#6  0x00007fffd101f0e8 in healpix_distance_to_radec (hp=263, Nside=8, ra=248.48611753584646, dec=<optimized out>, closestradec=0x0) at healpix.c:1419
#7  0x00007fffd0b13ce4 in healpixDistance (hp=hp@entry=263, Nside=Nside@entry=8, coord=...) at python/lsst/meas/astrom/astrometry_net_wrap.cc:5886
#8  0x00007fffd0b14912 in _wrap_healpixDistance (args=<optimized out>) at python/lsst/meas/astrom/astrometry_net_wrap.cc:28049
#9  0x00007ffff7cfe412 in ext_do_call (nk=-889630080, na=<optimized out>, flags=<optimized out>, pp_stack=0x7fffffff51a8, func=0x7fffd0f32ab8) at Python/ceval.c:4661
 ...  followed by the usual many layers of PyEval_EvalFrameEx

Any advice? Many thanks again for all your help!

Most of the segfaults we see come from version conflicts - somewhere in the dependency tree, someone has two versions of a package (usually one living outside the LSST stack), and the wrong one is being picked up by the linker. I’d look in the dependencies of your astrometry.net package for anything you also have installed outside the stack (e.g. Python, wcslib, cfitsio), and see what your linker thinks about them (i.e. ldd, nm, or the OS X equivalents). If you’ve got multiple LSST versions of these packages installed, you might want to try running your EUPS setup lines again (or in a new terminal), and if you think that is the problem, you could post those lines here.

I can’t see any external libraries with lsof -p on the running process. Here’s a pastebin of its output:
http://pastebin.com/Ts5Shcpn

and:

 ldd /home/zuntz/lsst/anaconda/opt/lsst/meas_astrom/python/lsst/meas/astrom/_astrometry_net.so | grep -v anaconda

also only reveals the obvious system libraries.

My ASTROMETRY_NET_DATA_DIR is one possible suspect - I found that if I didn’t set it manually, the system picked up the wrong one (from the conda package). Possibly some of my loading is in the wrong order? Here is the script I source to set up:

export PATH=/home/zuntz/lsst/anaconda/bin:$PATH
source activate lsst
source /home/zuntz/lsst/anaconda/bin/eups-setups.sh
setup lsst
setup obs_sdss
setup lsst_apps
setup obs_subaru
setup pipe_drivers

setup -j -r /home/zuntz/lsst/ci_hsc/
setup -j -r /home/zuntz/lsst/psfex/

setup astrometry_net_data

export ASTROMETRY_NET_DATA_DIR=/home/zuntz/lsst/ci_hsc/sdss-dr9-fink-v5b
export OMP_NUM_THREADS=1

Joe

The reference catalog included in ci_hsc is drastically pared down to reduce the package size. It doesn’t include all of SDSS-DR9, which may be causing this segfault (if so, we should fix this so it fails nicely). I’ve put up the entire sdss-dr9-fink-v5b catalog for download: http://tigress-web.princeton.edu/~pprice/sdss-dr9-fink-v5b/

More discussion about reference catalogs (including the reference to cite for the sdss-dr9-fink-v5b catalog) is available here.

Okay, thanks for the help - I’ve made a lot of progress.
I’ve managed to run processCcd.py on all the data I have, though I had to set:

config.calibrate.doAstrometry = False
config.calibrate.doPhotoCal = False

as otherwise I couldn’t get rid of the astrometry.net healpix segfaults. I also had to roll my own parallelization, as the built-in MPI and SMP approaches with singleFrameDriver.py both had their own issues.

So now I have a bunch of files in a rerun subdirectory, in some_number/HSC-I/output and some_number/HSC-I/corr. There seems to be PSF information in the corr/CORR* files, but I can’t quite figure out how to deserialize it (my goal is to get to the point where I can call some function and get a PSF image at a chosen location in the plane).

I thought it might be something to do with lsst.meas.extensions.psfex.PsfexPsf.readFits, but I couldn’t get that to return anything other than None.

Any help would be amazing!

Cheers,
Joe

I’m afraid that at the moment you’ll have to read the full images in and then extract PSFs from them. This is something we’re actively working on fixing, but it’s a bit structural, so it may not be immediate.

That will look something like:

import lsst.daf.persistence
import lsst.afw.geom

butler = lsst.daf.persistence.Butler("/path/to/rerun")
calexp = butler.get("calexp", visit=NNN, ccd=XXX, immediate=True)
psf = calexp.getPsf()
# get an lsst.afw.image.ImageD of the PSF at point (x, y), centered on the middle pixel of the image.
image = psf.computeKernelImage(lsst.afw.geom.Point2D(x, y))
array = image.getArray()  # get a numpy array

You can use computeImage instead of computeKernelImage if you want the PSF image to be centered at (x, y), so you could e.g. use it to subtract a star at that position. That will still return a postage stamp; Image objects have an “xy0” integer offset that will be set here to indicate how to align the postage stamp with the larger image. (For computeKernelImage, that offset is set to put the origin of the postage stamp on the middle pixel).
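
For example, here is a rough sketch of what that star subtraction might look like (the position and flux are made up, the alignment is done by hand through numpy rather than via afw image arithmetic, and it ignores the case where the stamp runs off the edge of the image):

import lsst.afw.geom

# Rough sketch: subtract a PSF model scaled to a star's flux from the calexp.
# The star position and flux below are placeholders.
x, y, flux = 1023.4, 2087.9, 1.0e4
psfImage = psf.computeImage(lsst.afw.geom.Point2D(x, y))  # stamp centered at (x, y)
stamp = psfImage.getArray()                               # PSF model, normalized to unit sum
x0, y0 = psfImage.getX0(), psfImage.getY0()               # the stamp's xy0 offset in the calexp frame
ny, nx = stamp.shape

# Operate on a numpy view of the calexp pixels and subtract the scaled PSF in place.
pixels = calexp.getMaskedImage().getImage().getArray()
pixels[y0:y0 + ny, x0:x0 + nx] -= flux * stamp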

Brilliant - thanks Jim - that seems to work nicely.

It would be good to get to the bottom of these issues so they won’t bite others. Did you use the full SDSS-DR9 catalog I provided? What was the problem with singleFrameDriver.py?

I used the full SDSS-DR9 catalog you pointed to, yes. I suppose the target regions might be outside the DR9 footprint; I’m not sure.

I just get the message “Aborted (signal 6)” from the PBS version of singleFrameDriver.py. There are various error-looking messages in the output, but I can’t tell which ones are expected and which are red flags. I will email you the full output from the job (the observer asked that we keep the target names private for now).

Cheers,
Joe