DALServiceError / DALQueryError with user-uploaded table

Hi everyone,

Looking forward to DP1, I have been playing with the RSP and I am hitting a snag.

I've verified that the DP03_06 notebook runs as expected. However, I have uploaded my own table of RA and Dec and changed the query to just:

query = """
        SELECT dias.ra, dias.dec, dias.ssObjectId,
        ut1.ra AS ut1_ra, ut1.dec AS ut1_dec
        FROM dp03_catalogs_10yr.DiaSource AS dias, TAP_UPLOAD.ut1 AS ut1
        WHERE CONTAINS(POINT('ICRS', dias.ra, dias.dec),
        CIRCLE('ICRS', ut1.ra, ut1.dec, 0.00278))=1
        ORDER BY dias.ssObjectId
        """

and when running the job, I hit this error:

job = rsp_tap.submit_job(query, uploads={"ut1": ut1})
job.run()

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
File /opt/lsst/software/stack/conda/envs/lsst-scipipe-10.0.0/lib/python3.12/site-packages/pyvo/dal/tap.py:906, in AsyncTAPJob.run(self)
    904     response = self._session.post(
    905         '{}/phase'.format(self.url), data={"PHASE": "RUN"})
--> 906     response.raise_for_status()
    907 except requests.RequestException as ex:

File /opt/lsst/software/stack/conda/envs/lsst-scipipe-10.0.0/lib/python3.12/site-packages/requests/models.py:1024, in Response.raise_for_status(self)
   1023 if http_error_msg:
-> 1024     raise HTTPError(http_error_msg, response=self)

HTTPError: 404 Client Error:  for url: https://data.lsst.cloud/api/ssotap/async/phase

During handling of the above exception, another exception occurred:

DALServiceError                           Traceback (most recent call last)
Cell In[24], line 2
      1 job = rsp_tap.submit_job(query, uploads={"ut1": ut1})
----> 2 job.run()

File /opt/lsst/software/stack/conda/envs/lsst-scipipe-10.0.0/lib/python3.12/site-packages/pyvo/dal/tap.py:908, in AsyncTAPJob.run(self)
    906     response.raise_for_status()
    907 except requests.RequestException as ex:
--> 908     raise DALServiceError.from_except(ex, self.url)
    910 return self

DALServiceError: not found: phase
 for https://data.lsst.cloud/api/ssotap/async

Any help would be much appreciated!
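In case it helps with reproducing this, here is roughly how the service and the uploaded table are set up earlier in the notebook, assuming the same setup as DP03_06 (the coordinate values below are just placeholders; in my actual run, ut1 is the catalog described further down):

from lsst.rsp import get_tap_service
from astropy.table import Table

# Solar System Objects TAP service on the RSP, as in the DP03_06 notebook
rsp_tap = get_tap_service("ssotap")

# Placeholder upload table; the real one holds my own RA/Dec values
ut1 = Table({"ra": [10.0, 11.0], "dec": [-5.0, -4.5]})

# Submit the query above as an async job with the table attached as an upload
job = rsp_tap.submit_job(query, uploads={"ut1": ut1})
job.run()
job.wait(phases=["COMPLETED", "ERROR"])
job.raise_if_error()
results = job.fetch_result().to_table()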

And further, the user-uploaded table also seems to be limited to 100_000 rows:

DALQueryError                             Traceback (most recent call last)
Cell In[260], line 1
----> 1 job.raise_if_error()

File /opt/lsst/software/stack/conda/envs/lsst-scipipe-10.0.0/lib/python3.12/site-packages/pyvo/dal/tap.py:996, in AsyncTAPJob.raise_if_error(self)
    994     msg = self._job.errorsummary.message.content
    995 msg = msg or "<No useful error from server>"
--> 996 raise DALQueryError("Query Error: " + msg, self.url)

DALQueryError: Query Error: IllegalArgumentException:Row count exceeds maximum of 100000

To follow up: trying another approach, I run into a different error when the user-uploaded table exceeds a certain size:

---------------------------------------------------------------------------
DALQueryError                             Traceback (most recent call last)
Cell In[168], line 1
----> 1 job.raise_if_error()

File /opt/lsst/software/stack/conda/envs/lsst-scipipe-10.0.0/lib/python3.12/site-packages/pyvo/dal/tap.py:996, in AsyncTAPJob.raise_if_error(self)
    994     msg = self._job.errorsummary.message.content
    995 msg = msg or "<No useful error from server>"
--> 996 raise DALQueryError("Query Error: " + msg, self.url)

DALQueryError: Query Error: IllegalArgumentException:Size of upload file exceeds maximum of 33554432 bytes.

It’s possible these are related.
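In the meantime, one workaround I am considering is to split the upload into chunks that stay under both limits, run the same query per chunk, and stack the results afterwards. A rough, untested sketch (the chunk size here is just a guess, and the ORDER BY would then only apply within each chunk):

from astropy.table import vstack

chunk_size = 50_000  # guess at a size under both the row and byte limits
pieces = []
for start in range(0, len(ut1), chunk_size):
    chunk = ut1[start:start + chunk_size]
    job = rsp_tap.submit_job(query, uploads={"ut1": chunk})
    job.run()
    job.wait(phases=["COMPLETED", "ERROR"])
    job.raise_if_error()
    pieces.append(job.fetch_result().to_table())

matched = vstack(pieces)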

Hi @steventgk, thanks for your questions on this. Hmm, so I tried running your query in DP03_06 and didn't run into any issues, but perhaps you could share the table that you uploaded? You might also want to try running again to see whether the error is still occurring.

I’ll ask around as well about the user-uploaded size / row limits.


Actually, did you want to just try running your notebook on a shorter table and see if that works?

Hi @ryanlau, thanks for responding to this. I did try shortening the table, and that's how I ran into the other errors. Making it smaller, I eventually hit the IllegalArgumentException:Size error; shrinking further, I hit the IllegalArgumentException:Row error.

Hi Steven,

Hmm, so I got clarification from @frossie that the limit is on size, not rows, and that the limit is 32 MB.

So you tried to shrink the table to go below 32 MB and you were still encountering the size and row errors?

Yes, I stripped my table down to a Gaia table of just 'source_id', 'ra', 'dec', which for ~300_000 sources is smaller than the 32 MB limit but larger than the 100_000-row limit the error seems to report.

Hmm, not sure where the row limit is coming from. We will investigate and report back. If you don't mind sharing your catalog, can you put it on /scratch and DM me to let me know where it is? If that's a problem, never mind; we can generate one.

DM'd you also, but to keep the loop going: I generated the catalog using the pyvo TAP service on the RSP directly.


Ok:

  • We have found the errant row limit in an external codebase and removed it.
  • We have confirmed that your catalog loads correctly in our staging environment.
  • The fix will be deployed on data.lsst.cloud during the Patch Thursday upgrades later today.

This question triggered some internal debate as to whether the upload limit should be size-based or row-based, which means we will be revisiting this in the future. For now, though, after today you should only see the 32 MB limit.

Thank you for the report and the information to reproduce it.


Thanks for your rapid response and solution, @frossie!

I imagine that discussion will be difficult. To add my two cents:

  • It would be nice to have a convenient way to check the upload size before running the query; this seems to be non-trivial for an Astropy Table. I have tried estimating the number of bytes from the number of rows and the data types, but there appears to be additional overhead in the upload produced by the methods described here (a rough sketch of one way to estimate the size is after this list).

  • Related to the above point, without knowing the table size well, I trial-and-errored with different table sizes and found I was limited to around 250_000 rows of a table containing only:

source_id    ra         dec
int64        float64    float64

This is not necessarily a complaint (obviously, being able to do more is nice), but it is an indicator of a possible limitation that users may need to be aware of.
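For reference, one way to estimate the upload size is to serialize the table to a VOTable in memory and measure that, on the assumption that pyvo sends inline uploads as VOTables; the actual payload may still carry a little extra overhead, and the helper name below is just for illustration:

import io

def estimated_upload_bytes(table):
    # Serialize the table to a VOTable in memory and measure it; this should
    # be close to, but not exactly, the size of the actual upload payload
    buf = io.BytesIO()
    table.write(buf, format="votable")
    return buf.getbuffer().nbytes

# Compare against the 33554432-byte (32 MB) server limit before submitting
print(estimated_upload_bytes(ut1))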

Many thanks again!
