What is the limit for lists passed to ADQL?

MelissaGraham · November 28, 2023, 1:25am

The advice for composing an ADQL query to retrieve data for a set of known objects identified by, e.g., their objectId, is to use a WHERE ... IN ().

By advice I mean the recommendations in the documentation for DP0.2 and for DP0.3 queries.

There seems to be a limit on how long the list can be, but I’m not sure exactly what the limit is. The DP0.3 advice mentions 50,000 objects. Is this known?

For example, in a query using ssObjectId on the MPCORB table, I’ve found that passing a list of ~10,000 ssObjectId works fine, and a list of ~50,000 didn’t cause an immediate error but did take >6 min so I stopped it, and that a list of ~100,000 caused an immediate error (pasted below).

Full version of the immediate error from a 105,133 object list.

---------------------------------------------------------------------------
DALQueryError                             Traceback (most recent call last)
Cell In[25], line 4
      1 query2 = "SELECT mpcDesignation, mpcNumber, ssObjectId, fullDesignation "\
      2          "FROM dp03_catalogs_1yr.MPCORB "\
      3          "WHERE ssObjectId IN " + tempstring
----> 4 results2 = service.search(query2).to_table()
      5 results2

File /opt/lsst/software/stack/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe-7.0.1/lib/python3.11/site-packages/pyvo/dal/tap.py:257, in TAPService.run_sync(self, query, language, maxrec, uploads, **keywords)
    228 def run_sync(
    229         self, query, language="ADQL", maxrec=None, uploads=None,
    230         **keywords):
    231     """
    232     runs sync query and returns its result
    233 
   (...)
    253     TAPResults
    254     """
    255     return self.create_query(
    256         query, language=language, maxrec=maxrec, uploads=uploads,
--> 257         **keywords).execute()

File /opt/lsst/software/stack/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe-7.0.1/lib/python3.11/site-packages/pyvo/dal/tap.py:1076, in TAPQuery.execute(self)
   1062 def execute(self):
   1063     """
   1064     submit the query and return the results as a TAPResults instance
   1065 
   (...)
   1074        for errors parsing the VOTable response
   1075     """
-> 1076     return TAPResults(self.execute_votable(), url=self.queryurl, session=self._session)

File /opt/lsst/software/stack/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe-7.0.1/lib/python3.11/site-packages/pyvo/dal/adhoc.py:111, in AdhocServiceResultsMixin.__init__(self, votable, url, session)
    110 def __init__(self, votable, url=None, session=None):
--> 111     super().__init__(votable, url=url, session=session)
    113     self._adhocservices = list(
    114         resource for resource in votable.resources
    115         if resource.type == "meta" and resource.utype == "adhoc:service"
    116     )

File /opt/lsst/software/stack/conda/miniconda3-py38_4.9.2/envs/lsst-scipipe-7.0.1/lib/python3.11/site-packages/pyvo/dal/query.py:319, in DALResults.__init__(self, votable, url, session)
    317 self._status = self._findstatus(votable)
    318 if self._status[0].lower() not in ("ok", "overflow"):
--> 319     raise DALQueryError(self._status[1], self._status[0], url)
    321 if self._status[0].lower() == "overflow":
    322     warn("Partial result set. Potential causes MAXREC, async storage space, etc.",
    323          category=DALOverflowWarning)

DALQueryError: missing required parameter: LANG

This question has also come up in this Support topic: ADQL question: Selecting a random subset of objects in DP0.

MelissaGraham · January 12, 2024, 7:14pm

While not explicitly verified to be a hard limit at exactly 50,000, the provided guidance is to use the `WHERE … IN ()``` method for up to 50,000 objects.