Running processFile with LSST Stack v12.0

josePhoenix · July 1, 2016, 10:54pm

I’m attempting to use the processFile scripts from https://github.com/lsst-dm/processFile to process some FITS images we put together. (We’re specifically interested in the performance of the deblender, which I’m not sure is run by processFile. Insights appreciated!)

I’ve gotten as far as installing the 12.0 stack from the conda channel and cloning the processFile repo. The README says the next step is to run scons, but I get stuck on the lack of C++11 support in our (admittedly old) RHEL server. Any workarounds for this? The processFile code seems to be pure Python, I think.

(Perhaps a similar/related query to "How to run the DM stack on simulated FITS images"?)

josePhoenix · July 1, 2016, 10:56pm

Ah, I forgot to include the actual output. Here it is.

(lsst)18:55:22 science6:processFile[master] jlong$ source eups-setups.sh
(lsst)18:55:34 science6:processFile[master] jlong$ setup -r .
(lsst)18:55:39 science6:processFile[master] jlong$ scons
scons: Reading SConscript files ...
EUPS integration: enabled
Checking who built the CC compiler...error: no result
CC is gcc version 4.4.7
Checking for C++11 support
Checking whether the C++ compiler works... no
C++11 extensions could not be enabled for compiler 'gcc'
(lsst)18:55:58 science6:processFile[master] jlong$

josePhoenix · July 2, 2016, 12:01am

Okay, I’ve forked the processFile code to https://github.com/josePhoenix/processFile for fiddling with. So far I’ve renamed TraceLevelAction to LogLevelAction to track a rename elsewhere.

Now I’m running python bin/processFile.py /dev/null --show config=*.do* as described in the docs to print all the config options.

I was getting

Traceback (most recent call last):
  File "bin/processFile.py", line 343, in <module>
    config.calibrate.detection.reEstimateBackground = False
AttributeError: 'CalibrateConfig' object has no attribute 'detection'

Commenting out that line seems to let the config listing part of the task run. I can’t figure out what it should be though.

cc @fjaviersanchez

timj · July 2, 2016, 9:55am

Regarding the compiler, you need gcc 4.8 or newer; otherwise scons pre-emptively complains (as you found). We use devtoolset-3 to get that; see "C++11/14 (gcc 4.8) now the baseline".
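A quick way to check a machine before running scons is to parse the compiler's version banner and compare it against the 4.8 baseline. This is a minimal sketch and not part of the stack's build system: the names `parse_gcc_version` and `meets_baseline` are made up for illustration, and the regex assumes the banner contains a dotted version string like the "gcc version 4.4.7" line in the scons output earlier in this thread.

```python
import re

# Baseline from this thread: gcc 4.8 or newer is needed for C++11.
BASELINE = (4, 8)


def parse_gcc_version(banner):
    """Return the first dotted version in a banner as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def meets_baseline(banner, baseline=BASELINE):
    """True if the major.minor version found in the banner is at least the baseline."""
    version = parse_gcc_version(banner)
    return version is not None and version[:2] >= baseline


if __name__ == "__main__":
    # Banner text copied from the scons output posted above.
    print(meets_baseline("CC is gcc version 4.4.7"))
```

In practice the banner would come from running `gcc --version` on the build host; a banner without a dotted version is treated as not meeting the baseline.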

jsick · July 2, 2016, 7:21pm

This is a good reminder that I need to overhaul the READMEs of all packages to point to installation docs (such as https://pipelines.lsst.io/install/newinstall.html#prerequisites). Since the Conda installation (currently) does not contain several important packages, a large fraction of our user base still needs to install packages from source and we aren’t providing documentation for that pathway. DM-4619 may cover this work.

ljones · July 3, 2016, 6:41pm

I thought @mjuric had added gcc to the latest conda installation?

josePhoenix · July 3, 2016, 6:55pm

I don’t believe so. At least, which gcc / which g++ don’t point to things in my conda environment. I’m installing according to https://pipelines.lsst.io/install/conda.html.

KSK · July 7, 2016, 1:21am

It appears that processFile was not updated when we overhauled our processing pipelines. I worry that it may take a bit of work to do this. I wonder if @rowen might be able to comment on how hard this would be to do.

rowen · July 7, 2016, 5:18pm

I assume the point is to avoid a butler, which is a great pity, since so much code has to be duplicated. The subtasks that it calls now want a butler in order to retrieve the reference catalog, though I guess you can pass in a reference catalog loader instead. Whether it’s worth the hassle I can’t say. It seems to me that a much more flexible solution is to have a way to make a butler in this situation.

KSK · July 7, 2016, 5:30pm

Russell makes a good point. Using a proper data repository would help with this. @josePhoenix: where are the FITS files you are using coming from? We should be able to help put together a data repository fairly easily.

fjaviersanchez · July 7, 2016, 7:16pm

Sorry I just saw this.

It seems to be a small bug in the code. I updated it here:

Basically, you have to change calibrate.detection to just detection. After this, you will run into more trouble. I don’t know what to do about this, but it looks like a lot of things have to be changed to make it run on the new pipeline.

My problems now are:

AttributeError: 'CalibrateTask' object has no attribute 'getCalibKeys'

and when I comment out the lines that make use of this, I run into this:

AttributeError: 'FakeAmp' object has no attribute 'getSuspectLevel'
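One way to keep a script working across a rename like calibrate.detection to detection is to try each candidate path in turn and set the option on the first one that exists. The sketch below uses plain stand-in objects rather than the real config classes, so `set_first_existing` and the `SimpleNamespace` layout are assumptions for illustration only. It does not address the getCalibKeys or getSuspectLevel errors.

```python
from functools import reduce
from types import SimpleNamespace


def set_first_existing(config, candidate_paths, value):
    """Set `value` at the first dotted path that already exists on `config`.

    Returns the path that was used, or None if no candidate exists.
    """
    for path in candidate_paths:
        *parents, leaf = path.split(".")
        try:
            # Walk down to the object that should own the final attribute.
            owner = reduce(getattr, parents, config)
        except AttributeError:
            continue
        if hasattr(owner, leaf):
            setattr(owner, leaf, value)
            return path
    return None


if __name__ == "__main__":
    # Stand-in for a newer layout where `detection` sits at the top level.
    config = SimpleNamespace(
        calibrate=SimpleNamespace(),
        detection=SimpleNamespace(reEstimateBackground=True),
    )
    used = set_first_existing(
        config,
        ["calibrate.detection.reEstimateBackground",
         "detection.reEstimateBackground"],
        False,
    )
    print(used, config.detection.reEstimateBackground)
```

Returning the path that matched makes it easy to log which layout the installed stack actually has.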

josePhoenix · July 7, 2016, 10:46pm

What I’m really trying to do is take some postage-stamp type images of blended sources, run them through the deblender implementation as it currently stands, and examine the measurements that the deblender gets out, whatever they may be. I’m not sure what the easiest path to this is. It seems like passing arbitrary images into the deblending task requires a fair bit of mocking of interfaces, and I found processFile while looking to see if someone had already done this mocking.

The images all have some known angular scale, and filter transmission curve information. Some of them have WCS information, but others are from simulations (processed into simulated observations). I’m not sure that’s enough in common to make building a data store worthwhile. (When I looked at it, it seemed to include things like specifying camera geometry, which due to the heterogeneity of the data doesn’t make much sense. I might be misremembering!)

fjaviersanchez · July 7, 2016, 11:21pm

I am trying to do pretty much the same. I’m trying to test the deblender implementation with simulated images from GalSim/WeakLensingDeblending. I added WCS information in the headers but I am having trouble trying to use processFile. I also tried processCcd and processEimage with no success.

KSK · July 8, 2016, 4:42pm

@josePhoenix and @fjaviersanchez: Thanks for the added info. I am looking into how to resurrect processFile. In your particular use case, I’d like to be able to run just the bits you need. I’ll look into that too. Is it possible to attach an example image that I can play with?

josePhoenix · July 8, 2016, 5:32pm

Sure, here’s a typical blended object postage stamp from my simulation/analysis pipeline. This one is from a CANDELS mosaic, degraded to 0.7" seeing. example_blend_EGS_F606W_0.7arcsec.fits (14.1 KB)

fjaviersanchez · July 8, 2016, 6:11pm

Thanks a lot, this is one of the simulated CCD images from the WeakLensingDeblending package: https://drive.google.com/open?id=0Byv2YnL50yMsR3M1QTFCM0Q3TGs

mjuric · July 8, 2016, 8:51pm

I believe gcc is used to build the conda-distributed binaries, but only the runtime gets installed when you install the stack with conda (which is the proper behavior, IMHO – you shouldn’t need the whole toolchain to just run the code).

If you want to also build your own code using the conda-delivered gcc, run conda install gcc. Note that this is not an officially supported way to do it; very few people use it for daily development, so I’m not sure how well we’d be able to help you if you run into problems. It’s easy to try, though, so it may be worth a shot.

KSK · July 9, 2016, 8:14pm

@josePhoenix: I don’t think I can make the pipeline work on such small postage stamps. The problem is that the deblender requires a list of peaks, and the list of peaks comes from the detection phase. The detection phase depends on background and PSF estimation. With such a small postage stamp it’s hard to estimate the background and there are no PSF sources, so things fall apart.

The options are to use larger cutouts of the mosaic so that we get a few stars in there or to try to feed these postage stamps to the deblender directly. If we go with the latter, we’ll need a catalog of peaks to go with each postage stamp. With the former, I think we can just run things through the way we do normal processing.
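If the stamps are fed to the deblender directly, each one needs a peak list. As a rough illustration of what building such a list involves, the sketch below finds strict local maxima above a threshold in a small 2-D array. This is a stand-in and not the stack's detection algorithm: `find_peaks`, the 8-neighbour rule, and the threshold value are all assumptions, and converting the result into whatever catalog structure the deblender expects is not shown.

```python
def find_peaks(image, threshold):
    """Return (row, col, value) for pixels above `threshold` that are
    strictly brighter than all of their in-bounds 8-connected neighbours."""
    n_rows = len(image)
    n_cols = len(image[0]) if n_rows else 0
    peaks = []
    for r in range(n_rows):
        for c in range(n_cols):
            value = image[r][c]
            if value <= threshold:
                continue
            neighbours = [
                image[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, n_rows))
                for cc in range(max(c - 1, 0), min(c + 2, n_cols))
                if (rr, cc) != (r, c)
            ]
            if all(value > other for other in neighbours):
                peaks.append((r, c, value))
    # Brightest peak first.
    return sorted(peaks, key=lambda peak: peak[2], reverse=True)


if __name__ == "__main__":
    # Toy stamp with two blended "sources" (hypothetical pixel values).
    stamp = [
        [0, 0, 0, 0, 0],
        [0, 9, 2, 0, 0],
        [0, 2, 1, 3, 0],
        [0, 0, 3, 7, 0],
        [0, 0, 0, 0, 0],
    ]
    print(find_peaks(stamp, threshold=5))
```

Because the rule requires a strict maximum, two adjacent pixels with exactly equal values would both be rejected; real data would need a tie-breaking choice.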

KSK · July 9, 2016, 11:18pm

@fjaviersanchez: I now have a solution that I hope will work for you. It needs some more work, but I thought I’d put it out there now so I can get comments. This solution makes it fairly trivial to ingest arbitrary files into LSST data repositories so you can treat them like any other image in the stack. In order to use it, you’ll need the tickets/DM-6924 branch of both obs_file and pipe_tasks.

Running processFile is a three-step process. The first two steps only need to be done once if your input data are static. The following assumes you have a relatively recent master build and the ticket branches set up.

1. Create a repository to feed the ingest. This involves making a directory containing a file called _mapper. I made a directory called test_out. The content of the file is the single line lsst.obs.file.FileMapper.

2. Ingest the files. This puts the files in the location expected by the rest of the machinery and builds a database to look up the available images.

   $> ingestFiles.py test_out test_imgs/test.fits.gz

   The first argument is a valid repository and the rest of the arguments will be treated as images to ingest. If no output is specified, the images will be ingested into the repository specified (i.e. test_out).

3. Run processCcd.

   $> processCcd.py test_out/ --id filename='test.fits.gz' --config isr.noise=100000 isr.addNoise=True --output test_out

   Just specify the file you want to process. In the case of your data, I also had to add background noise; otherwise certain algorithms, e.g. cosmic ray detection, do not behave well. The output argument is required.
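The three steps above can be sketched as a small helper that writes the _mapper file and assembles the two command lines. This is only an illustration assuming Python 3: the function names are made up, the trailing newline in _mapper is an assumption, and the commands are built as argument lists without being executed, since they need the stack and the ticket branches to be set up.

```python
import os
import shlex
import tempfile

# Content of the _mapper file, as given in the steps above.
MAPPER_LINE = "lsst.obs.file.FileMapper"


def make_repo(repo_dir):
    """Create `repo_dir` and write a _mapper file naming the mapper class."""
    os.makedirs(repo_dir, exist_ok=True)
    mapper_path = os.path.join(repo_dir, "_mapper")
    with open(mapper_path, "w") as handle:
        handle.write(MAPPER_LINE + "\n")  # trailing newline is an assumption
    return mapper_path


def ingest_command(repo_dir, images):
    """Argument list for the ingest step: repository first, then images."""
    return ["ingestFiles.py", repo_dir] + list(images)


def process_command(repo_dir, filename, noise=100000):
    """Argument list for the processCcd step, mirroring the options above."""
    return [
        "processCcd.py", repo_dir,
        "--id", "filename=" + filename,
        "--config", "isr.noise=%d" % noise, "isr.addNoise=True",
        "--output", repo_dir,
    ]


if __name__ == "__main__":
    repo = os.path.join(tempfile.mkdtemp(), "test_out")
    make_repo(repo)
    for cmd in (ingest_command(repo, ["test_imgs/test.fits.gz"]),
                process_command(repo, "test.fits.gz")):
        # Print a copy-pasteable command line instead of running it.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the commands as lists means they could later be handed to `subprocess.run` without extra shell quoting.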

Below is an example produced by the displaySources.py utility. If you have display_ds9 set up, you can specify a repository and a file and it will plot the parent sources as blue circles and the children as red plusses. Looking at this image, there are some things to tune. For example, I see some faint sources I would have expected to be detected, but maybe that is an artifact of my arbitrary choice of background noise. FYI, the blue patches are the pixels flagged as detected.

josePhoenix · July 11, 2016, 3:10am

Thanks Simon! That’s good to know. Would it be totally crazy to just stick our synthetic PSF and a bit of representative background on the side of a tile? Depending on how strongly the deblender’s performance depends on the PSF estimation, we might mislead ourselves in our analysis, I suppose.
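To make the idea concrete, here is a rough sketch of widening a stamp with a strip of background and pasting a synthetic PSF into that strip. Everything in it is an assumption for illustration: the function names, the Gaussian noise model, and the toy PSF values are made up, and it says nothing about whether the pipeline's background or PSF estimation would actually succeed on such a tile.

```python
import random


def pad_with_background(stamp, pad, sigma, seed=0):
    """Return a new image with `pad` columns of Gaussian noise added on the right."""
    rng = random.Random(seed)
    return [
        list(row) + [rng.gauss(0.0, sigma) for _ in range(pad)]
        for row in stamp
    ]


def paste(image, patch, row0, col0):
    """Add `patch` into `image` in place, top-left corner at (row0, col0)."""
    for dr, patch_row in enumerate(patch):
        for dc, value in enumerate(patch_row):
            image[row0 + dr][col0 + dc] += value
    return image


if __name__ == "__main__":
    stamp = [[1.0] * 4 for _ in range(4)]      # stand-in for a blend stamp
    psf = [[0, 1, 0], [1, 4, 1], [0, 1, 0]]    # toy PSF, not a real model
    tile = pad_with_background(stamp, pad=5, sigma=0.1)
    paste(tile, psf, row0=0, col0=5)
    print(len(tile), len(tile[0]))
```

The caller has to make sure the patch fits inside the padded strip; `paste` raises an IndexError if it runs off the right or bottom edge.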