Query failing for DP0.3 when trying to calculate summary statistics over 5 years

mschwamb · November 28, 2023, 8:40pm

Hi,

This might be one for @mjuric.

One of the masters students in my research group (@Birr0) is trying to query the DP0.3 ten year table with

query = """
        select sss.ssObjectId,
        band,
        count(phaseAngle), 
        min(phaseAngle) as minPhase, 
        max(phaseAngle) as maxPhase, 
        (max(phaseAngle) - min(phaseAngle))/count(sss.ssObjectId) as phaseDensity, 
        SUM(CASE WHEN phaseAngle < 2 THEN 1 ELSE 0 END) AS oppCOV, 
        SUM(CASE WHEN phaseAngle > 2 AND phaseAngle < 15 THEN 1 ELSE 0 END) AS modCOV, 
        SUM(CASE WHEN phaseAngle > 15 THEN 1 ELSE 0 END) AS highCOV,
        AVG(mag) as avgMag,
        STDDEV(mag) as magSTD,
        AVG(magErr) as avgMagErr,
        STDDEV(magErr) as MagErrSTD
        from dp03_catalogs_10yr.SSSource as sss
        left join dp03_catalogs_10yr.DiaSource as dia on sss.diaSourceId = dia.diaSourceId
        where dia.midPointMjdTai BETWEEN 60218.00491 and 62044.25491
        group by sss.ssObjectId, band
"""

and we are finding it breaks on the RSP. This is getting some summary statistics about the observations per object to identify at different points in the survey which Solar System objects would have enough observations for good phase curves. The above query is for calculating the stats for 5 years and fails.

The one year version of this query works fine and finishes in 7 minutes, so we’re confused why the 5 year version of this query fails after awhile (about 13 minutes). Any thoughts about what’s going on and whether this is something we can do different in our ADQL?

I’ve put the notebook I’ve been trying to run on the RSP here so you can view what we’re doing and the error that’s coming up.

Thanks,
~Meg

TomLoredo · November 28, 2023, 8:56pm

Perhaps this is related to the query size issue @MelissaGraham just posted about: What is the limit for lists passed to ADQL?. So it may be worth watching that thread.

mjuric · November 28, 2023, 8:58pm

Hmm, it looks like it may be a database performance question – pinging @MelissaGraham to take a look and/or direct it to the DB team.

MelissaGraham · November 28, 2023, 9:19pm

Hello, and thank you for posting this question and including the ADQL and a link to the error, that was all very helpful.

Thank you Tom but I suspect the error is not related to limits for lists, because the query is not passing a list.

The final part of the error “DALFormatError: ProtocolError: ('Connection broken: IncompleteRead” leads me to believe this is the same error as we’ve encountered before, where the memory is overwhelmed: New DALFormatError for a DP0.3 query. Except in this case, it was overwhelmed by a single query, and not a group of people. As I understand it, there are currently limits on the size of the results that can be retrieved from DP0.3 tables (I believe this is around 30GB).

A workaround might be to do multiple smaller queries (I know this is not ideal), but I am going to bring this to the attention of RSP developer team to get more insight here or at least a confirmation. I am unsure whether multiple smaller queries in rapid succession might just lead the same issue…

frossie · November 28, 2023, 10:16pm

Just for clarity, this is a current limitation with our TAP service (not the database). We are aware of the issue and are considering a number of solutions for operations.

MelissaGraham · November 28, 2023, 11:00pm

Thanks Frossie.

And Meg, I marked my previous response as the solution, because it was confirmed to be a memory issue and because currently, the only workaround is to do multiple smaller queries (e.g., one year at a time, as the 1-year version of the query worked fine).

frossie · December 20, 2023, 9:57pm

We have made some improvements to the DP0.3 TAP server that should improve its memory performance under large query results. You’re welcome to try it and give us feedback.