DP0.2 TAP query error

Hi

I’m trying a notebook TAP query, joining DiaObject and ForcedSourceOnDiaObject, using service.search

select f.* 
from dp02_dc2_catalogs.DiaObject as o join dp02_dc2_catalogs.ForcedSourceOnDiaObject as f on o.diaObjectId=f.diaObjectId
where o.nDiaSources > 389 or (o.nDiaSources = 389 and o.diaObjectId <= 1253460847850684465 )

After some minutes it fails with the below output. Not sure if it’s because I’m hitting too much data being returned or something I’m doing wrong with the query structure itself, a similar query of DiaObject joined with DiaSource works. The query that fails seemed to work through the portal initially but when I increased the row limit from 50,000 to 5,000,000 it seems to be hanging. I’m also running the query as a count(*) through the portal and that’s hanging too.

I suspect user error but if anyone has any ideas what’s causing this that’d be useful.

Thanks
Mike

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File /opt/lsst/software/stack/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe-7.0.1/lib/python3.11/site-packages/pyvo/dal/query.py:241, in DALQuery.execute_votable(self, post)
    240 try:
--> 241     return votableparse(self.execute_stream(post=post).read)
    242 except Exception as e:

File /opt/lsst/software/stack/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe-7.0.1/lib/python3.11/site-packages/astropy/utils/decorators.py:604, in deprecated_renamed_argument.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
    602         warnings.warn(msg, warning_type, stacklevel=2)
--> 604 return function(*args, **kwargs)

File /opt/lsst/software/stack/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe-7.0.1/lib/python3.11/site-packages/astropy/io/votable/table.py:177, in parse(source, columns, invalid, verify, chunk_size, table_number, table_id, filename, unit_format, datatype_mapping, _debug_python_based_parser)
    174 with iterparser.get_xml_iterator(
    175     source, _debug_python_based_parser=_debug_python_based_parser
    176 ) as iterator:
--> 177     return tree.VOTableFile(config=config, pos=(1, 1)).parse(iterator, config)

File /opt/lsst/software/stack/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe-7.0.1/lib/python3.11/site-packages/astropy/io/votable/tree.py:3858, in VOTableFile.parse(self, iterator, config)
   3856 config["_current_table_number"] = 0
-> 3858 for start, tag, data, pos in iterator:
   3859     if start:

ValueError: 1:0: syntax error

During handling of the above exception, another exception occurred:

DALServiceError                           Traceback (most recent call last)
Cell In[37], line 23
     21 #  source_results = service.search(source_query)
     22   print("querying Forced",forced_query)
---> 23   forced_results = service.search(forced_query)
     24   if loop==1:
     25       break

File /opt/lsst/software/stack/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe-7.0.1/lib/python3.11/site-packages/pyvo/dal/tap.py:257, in TAPService.run_sync(self, query, language, maxrec, uploads, **keywords)
    228 def run_sync(
    229         self, query, language="ADQL", maxrec=None, uploads=None,
    230         **keywords):
    231     """
    232     runs sync query and returns its result
    233 
   (...)
    253     TAPResults
    254     """
    255     return self.create_query(
    256         query, language=language, maxrec=maxrec, uploads=uploads,
--> 257         **keywords).execute()

File /opt/lsst/software/stack/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe-7.0.1/lib/python3.11/site-packages/pyvo/dal/tap.py:1076, in TAPQuery.execute(self)
   1062 def execute(self):
   1063     """
   1064     submit the query and return the results as a TAPResults instance
   1065 
   (...)
   1074        for errors parsing the VOTable response
   1075     """
-> 1076     return TAPResults(self.execute_votable(), url=self.queryurl, session=self._session)

File /opt/lsst/software/stack/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe-7.0.1/lib/python3.11/site-packages/pyvo/dal/query.py:243, in DALQuery.execute_votable(self, post)
    241     return votableparse(self.execute_stream(post=post).read)
    242 except Exception as e:
--> 243     self.raise_if_error()
    244     raise DALFormatError(e, self.queryurl)

File /opt/lsst/software/stack/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe-7.0.1/lib/python3.11/site-packages/pyvo/dal/query.py:252, in DALQuery.raise_if_error(self)
    250 if self._ex:
    251     e = self._ex
--> 252     raise DALServiceError.from_except(e, self.queryurl)

DALServiceError:  for https://data.lsst.cloud/api/tap/sync

Thank you for your report. We are investigating; superficially the behavior you are seeing is surprising.

In the mean time, it appears that you can get the results you are looking for by breaking the query apart into two pieces: query DiaObject for the diaObjectIds that you want, and then query ForcedSourceOnDiaObject along these lines:

SELECT * FROM dp02_dc2_catalogs.ForcedSourceOnDiaObject
  WHERE diaObjectId IN (1251824774548553839, 1253100208036773889, 1250153516874334229,1253460847850684465, 1253065023664685204)

One question to help us look more closely at this: would you be able to pin down the exact time (at least down to the minute, say) when you ran the notebook query that ended with the traceback reported above?

Interesting, thanks Mike for posting this.

I think it’s not necessarily user error. I tested a version of the query that includes a region constraint and uses a limit of 300 on nDiaSources instead of 389 (to make it a faster test).

SELECT f.* 
FROM dp02_dc2_catalogs.DiaObject AS o 
JOIN dp02_dc2_catalogs.ForcedSourceOnDiaObject AS f 
ON o.diaObjectId = f.diaObjectId
WHERE CONTAINS(POINT('ICRS', o.ra, o.decl), CIRCLE('ICRS', 62, -37, 0.4)) = 1 
AND ((o.nDiaSources > 300) OR (o.nDiaSources = 300 AND o.diaObjectId <= 1253460847850684465))

The above query takes only a few seconds and returns 4505 rows. That’s over 0.5 square degrees, so by rough approximation, your full-area query is trying to return 2 million rows. Which should be possible.

In the meantime while this error is being investigated, I put below an example of how to create the type of list that Gregory mentions.

Demo of how to make and pass long object id lists to queries.

The following demonstrates how to create a string list of comma-separated object identifiers, in parentheses, that can be passed to an ADQL query.

However, be aware that there is a limit on the list size, which seems to be around 50000, and so this might not work for your use case.

# import packages
from lsst.rsp import get_tap_service
import numpy as np

# set up the TAP service
service = get_tap_service("tap")

# define the first query only on diaObject table
# use a small area and a lower max nDiaSources for this quick test
query = '''
SELECT diaObjectId, nDiaSources
FROM dp02_dc2_catalogs.DiaObject 
WHERE CONTAINS(POINT('ICRS', ra, decl), 
CIRCLE('ICRS', 62, -37, 0.2)) = 1 
AND nDiaSources > 200
'''

# execute the first query (it returns 44 diaObjects)
results = service.search(query).to_table()
print(len(results))

# create a string list of the 44 diaObjectIds
temp = np.asarray(results['diaObjectId'][:], dtype='int')
mylist = "(" + ','.join(['%20i' % num for num in temp]) + ")"
print(mylist)

# define the second query only on ForcedSourceOnDiaObject
query2 = "SELECT * FROM dp02_dc2_catalogs.ForcedSourceOnDiaObject "\
         "WHERE diaObjectId IN "+mylist

# execute the second query (it returns 17705 sources)
results2 = service.search(query2).to_table()
print(len(results2))

Thanks @gpdf

I think the error posted came from my notebook execution at,

Last executed at 2023-11-30 21:07:52 in 30m 0.22s

note I’m on UK time. This was probably by 2nd or 3rd attempt (reducing the anticipated resultset size) and I’m afraid I don’t have a record of the first run producing the error.

My portal query is still showing as running though I can’t seem to get the UWS info page or cancel.

Phase:
EXECUTING
Start Time:
2023-11-30T17:44:41.983871127Z
UWS Job URL:
Copy to clipboard: https://data.lsst.cloud/api/tap/async/ke92r6bmobebz8bg
https://data.lsst.cloud/api/tap/async/ke92r6bmobebz8bg
Show
ID:
1701366281983

Thanks @MelissaGraham , funnily enough I thought about doing a long object Id list query as have used this for other use cases where I couldn’t express it as a join. I thought the above approach was more “correct” and generally I like to steer away from large WHERE IN clauses but I’ll give that a go.