Uppercase characters in Rubin table schemas and TAP services

Hi everyone!
I have a couple of questions about the use of uppercase characters in Rubin table names, column names, and TAP services. I noticed that the documentation (https://sdm-schemas.lsst.io/) lists many fields whose names contain uppercase characters (e.g. dp1 → Object table → coord_decErr). The TAP service exposes the same names when listing the available tables.
The tutorial notebooks (e.g. dp1 102_*) show examples where the queries use those exact names (something like “SELECT TOP 10 coord_ra, coord_dec, g_cModelMag FROM dp1.Object”), although I tried “SELECT TOP 10 coord_ra, coord_dec, g_cmodelmag FROM dp1.Object” and it also works. The column names in each query result match the column names as written in the query.
When looking at other TAP services around the world, the column names are always (as far as I’ve seen) in lowercase. Why did Rubin decide to use mixed casing? It looks like a combination of snake_case and camelCase.
I’m developing some services for users of the ALeRCE broker and I’d like to keep the same naming convention as Rubin, but at the same time that convention behaves differently from other astronomical services. When querying “select top 10 ObS_Id, S_ra, s_dEc from ivoa_des_dr2.obscore” from https://datalab.noirlab.edu/tap, it works, but I get the answer in lowercase. This is fine because SQL is supposed to be case-insensitive and the column names reported by their TAP service (as is the case for any other TAP service) are in lowercase.
I have a couple of design alternatives on the ALeRCE side. One is enforcing the use of the exact Rubin names, but that would require annoying things like forcing our users (who are Rubin users) to put quotes around the fields they want to retrieve. Another is the standard behavior, where the user can write the query in any casing, but our service metadata shows the names in lowercase, and the fields in the answer we provide are also named in lowercase.
It would be great to understand the reason behind the choice of this naming style in Rubin to help guide our decision, and also whether it required special considerations from your teams behind RSP, qserv and other services. I’m also interested to know what behavior users would prefer.
Cheers,

Ignacio Reyes Jainaga
The ALeRCE broker

While searching through the lsst.io documentation, I found that @svoutsinas has worked on many TAP-related things in Rubin. Did you have to make any special considerations regarding the “mixed casing” of column names?

@frossie @MelissaGraham, given your experience developing, using, and teaching about the RSP, what do you think the behavior of our services should be? Should we force users to match the exact field names (e.g.,
SELECT ra, "raErr", dec, "decErr" FROM alerce.lsst_object)
or follow the more “standard” TAP behavior (e.g.,
SELECT ra, raErr, dec, decErr FROM alerce.lsst_object, or even
SELECT ra, raerr, dec, decerr FROM alerce.lsst_object, but always returning the result columns in lowercase)?
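
To make the two behaviors concrete, here is a minimal Python sketch of how a service might resolve the identifiers in the queries above. It is only an illustration: the function names and the column list are hypothetical (the column names are taken from the example queries), and it is not how ALeRCE, Rubin, or any particular database actually implements this. The strict variant assumes the common rule that unquoted identifiers are folded to lowercase, which is why mixed-case names would need quotes.

```python
# Hypothetical stored column names, taken from the example queries above.
CANONICAL_COLUMNS = ["ra", "raErr", "dec", "decErr"]


def is_quoted(identifier):
    """Return True for a delimited identifier such as "raErr"."""
    return len(identifier) >= 2 and identifier[0] == '"' and identifier[-1] == '"'


def resolve_strict(identifier, columns=CANONICAL_COLUMNS):
    """Option 1: users must match the exact stored name.

    Assumption: unquoted identifiers are folded to lowercase before lookup,
    and quoted identifiers are compared exactly.
    """
    wanted = identifier[1:-1] if is_quoted(identifier) else identifier.lower()
    if wanted not in columns:
        raise KeyError(f"unknown column: {identifier}")
    return wanted


def resolve_relaxed(identifier, columns=CANONICAL_COLUMNS):
    """Option 2: any casing is accepted for unquoted identifiers.

    Returns (stored_name, output_name); the output name is always lowercase.
    """
    if is_quoted(identifier):
        stored = identifier[1:-1]
        if stored not in columns:
            raise KeyError(f"unknown column: {identifier}")
    else:
        by_lower = {name.lower(): name for name in columns}
        if identifier.lower() not in by_lower:
            raise KeyError(f"unknown column: {identifier}")
        stored = by_lower[identifier.lower()]
    return stored, stored.lower()
```

With the strict variant, `resolve_strict('"raErr"')` succeeds while `resolve_strict("raErr")` raises `KeyError`. With the relaxed variant, `raErr`, `raerr` and `RAERR` all resolve to the same stored column. Note that the relaxed lookup only works if no two stored column names differ solely by case.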

Cheers,
Ignacio

I can’t answer about TAP specifically, but I suspect the reason that most of these columns were made camelCase in the first place is because that was the DM convention until RFC-623, where snake_case was formally allowed in coding guidelines and recommended for new code. Converting the entire codebase to snake_case was (rightly, IMO) deemed not to be worthwhile at the time.

I don’t remember anyone arguing strongly for column names to be converted to lowercase as part of that RFC, probably because Python and astropy/pandas column names in particular are case sensitive and so even just dropping all of the capital letters from object table column names would be a very disruptive change.
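
As a small illustration of that point, here is a sketch that uses a plain Python dict as a stand-in for a table row (astropy and pandas column lookups are likewise case sensitive). The values are placeholders, not real measurements, and `lowercase_keys` is a hypothetical helper that mimics renaming every column to lowercase.

```python
# Placeholder row keyed by the mixed-case column names used in the tutorials.
row = {"coord_ra": 1.0, "coord_dec": 2.0, "g_cModelMag": 3.0}


def lowercase_keys(record):
    """Mimic a schema change that lowercases every column name."""
    return {name.lower(): value for name, value in record.items()}


lowered = lowercase_keys(row)

# Existing user code that asks for the mixed-case name no longer finds it.
still_works = "g_cModelMag" in lowered
# The data is still there, but only under the new all-lowercase name.
renamed_value = lowered["g_cmodelmag"]
```

Here `still_works` ends up `False`, so any notebook written against `g_cModelMag` would need to be updated after such a rename.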

As Dan said, the mixed-case naming is a historical relic – our tabular data model has had this combination of underscores and capitalization for a very long time (probably going back to at least 2007) and we’ve just made our services work with that. Abandoning capitalization would likely require adding more underscores to many column names, in order to make the names easy for users to parse. We just haven’t had the time or appetite to make such changes.

That said, my understanding is that the behavior (queries are case-insensitive) you describe is expected, and that we are complying with TAP standards even if it is a little unusual.

I think it is up to you how your services handle things. Catalogs will continue to contain mixed case for Prompt Products and Data Preview 2, but we may be able to take community feedback into account when designing Data Release 1 data products.

Hi again, Ignacio –

Note that I marked the above as a “solution” because technically it answers your main questions, but others are still welcome to chime in here with more details or suggestions.