Saving strings/objects in TAP results to FITS

Hi,
We've encountered a minor issue with saving TAP results containing string/object columns as FITS files.
Attached is a simple notebook that replicates the error, and the code is also pasted below. It just does a small area search of ObsCore, converts the pyVO results to an astropy table, and attempts to save that as FITS.

import numpy as np
from lsst.rsp import get_tap_service

service = get_tap_service("tap")
assert service is not None

# Small region search of ObsCore around a single point
query = "SELECT * FROM ivoa.ObsCore WHERE CONTAINS(POINT('ICRS', 59.42914386, -48.90777109), s_region)=1"
results = service.run_async(query)
table = results.to_table()

# Show column names and summary info
print(table.colnames)
table.info()

# This is the line that fails
table.write("testResults.fits", format="fits", overwrite=True)

The error is below; it seems the writer does not know the length of the string/object columns and fails.

I can get around the problem by changing all the object-type astropy columns to fixed-length strings (a rough sketch is below). However, I'm wondering if anything has changed in one of the layers, as we think this used to work?

Or whether there is a better, more robust way of going from TAP results to a FITS file?

Or whether there is a better way of publishing the data (Felis YAML file) that allows the string lengths to be passed through?
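For reference, the fixed-length-string conversion I mentioned is roughly the following (simplified; masked columns may need extra handling):

# Convert any object-dtype (variable-length string) columns to
# fixed-length numpy strings before writing to FITS.
for name in table.colnames:
    if table[name].dtype == object:
        table[name] = table[name].astype(str)

table.write("testResults.fits", format="fits", overwrite=True)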

Thanks
Mike

---------------------------------------------------------------------------
VerifyError                               Traceback (most recent call last)
Cell In[26], line 1
----> 1 table.write("testResults.fits", format="fits", overwrite=True)

File /opt/lsst/software/stack/conda/envs/lsst-scipipe-10.1.0/lib/python3.12/site-packages/astropy/table/connect.py:130, in TableWrite.__call__(self, serialize_method, *args, **kwargs)
    128 instance = self._instance
    129 with serialize_method_as(instance, serialize_method):
--> 130     self.registry.write(instance, *args, **kwargs)

File /opt/lsst/software/stack/conda/envs/lsst-scipipe-10.1.0/lib/python3.12/site-packages/astropy/io/registry/core.py:386, in UnifiedOutputRegistry.write(self, data, format, *args, **kwargs)
    381     format = self._get_valid_format(
    382         "write", data.__class__, path, fileobj, args, kwargs
    383     )
    385 writer = self.get_writer(format, data.__class__)
--> 386 return writer(data, *args, **kwargs)

File /opt/lsst/software/stack/conda/envs/lsst-scipipe-10.1.0/lib/python3.12/site-packages/astropy/io/fits/connect.py:457, in write_table_fits(input, output, overwrite, append)
    454 # Encode any mixin columns into standard Columns.
    455 input = _encode_mixins(input)
--> 457 table_hdu = table_to_hdu(input, character_as_bytes=True)
    459 # Check if output file already exists
    460 if isinstance(output, (str, os.PathLike)) and os.path.exists(output):

File /opt/lsst/software/stack/conda/envs/lsst-scipipe-10.1.0/lib/python3.12/site-packages/astropy/io/fits/convenience.py:544, in table_to_hdu(table, character_as_bytes)
    540             tarray.fill_value[colname] = ""
    542 # TODO: it might be better to construct the FITS table directly from
    543 # the Table columns, rather than go via a structured array.
--> 544 table_hdu = BinTableHDU.from_columns(
    545     tarray.filled(), header=hdr, character_as_bytes=character_as_bytes
    546 )
    547 for col in table_hdu.columns:
    548     # Binary FITS tables support TNULL *only* for integer data columns
    549     # TODO: Determine a schema for handling non-integer masked columns
    550     # with non-default fill values in FITS (if at all possible).
    551     # Be careful that we do not set null for columns that were not masked!
    552     int_formats = ("B", "I", "J", "K")

File /opt/lsst/software/stack/conda/envs/lsst-scipipe-10.1.0/lib/python3.12/site-packages/astropy/io/fits/hdu/table.py:145, in _TableLikeHDU.from_columns(cls, columns, header, nrows, fill, character_as_bytes, **kwargs)
     86 @classmethod
     87 def from_columns(
     88     cls,
   (...)     94     **kwargs,
     95 ):
     96     """
     97     Given either a `ColDefs` object, a sequence of `Column` objects,
     98     or another table HDU or table data (a `FITS_rec` or multi-field
   (...)    143     ``__init__`` may also be passed in as keyword arguments.
    144     """
--> 145     coldefs = cls._columns_type(columns)
    146     data = FITS_rec.from_columns(
    147         coldefs, nrows=nrows, fill=fill, character_as_bytes=character_as_bytes
    148     )
    149     hdu = cls(
    150         data=data, header=header, character_as_bytes=character_as_bytes, **kwargs
    151     )

File /opt/lsst/software/stack/conda/envs/lsst-scipipe-10.1.0/lib/python3.12/site-packages/astropy/io/fits/column.py:1495, in ColDefs.__init__(self, input, ascii)
   1492     self._init_from_coldefs(input._coldefs)
   1493 elif isinstance(input, np.ndarray) and input.dtype.fields is not None:
   1494     # Construct columns from the fields of a record array
-> 1495     self._init_from_array(input)
   1496 elif np.iterable(input):
   1497     # if the input is a list of Columns
   1498     self._init_from_sequence(input)

File /opt/lsst/software/stack/conda/envs/lsst-scipipe-10.1.0/lib/python3.12/site-packages/astropy/io/fits/column.py:1572, in ColDefs._init_from_array(self, array)
   1569     elif "K" in format:
   1570         bzero = np.uint64(2**63)
-> 1572 c = Column(
   1573     name=cname,
   1574     format=format,
   1575     array=array.view(np.ndarray)[cname],
   1576     bzero=bzero,
   1577     dim=dim,
   1578 )
   1579 self.columns.append(c)

File /opt/lsst/software/stack/conda/envs/lsst-scipipe-10.1.0/lib/python3.12/site-packages/astropy/io/fits/column.py:675, in Column.__init__(self, name, format, unit, null, bscale, bzero, disp, start, dim, array, ascii, coord_type, coord_unit, coord_ref_point, coord_ref_value, coord_inc, time_ref_pos)
    672     for val in invalid_kwargs.values():
    673         msg.append(indent(val[1], 4 * " "))
--> 675     raise VerifyError("\n".join(msg))
    677 for attr in KEYWORD_ATTRIBUTES:
    678     setattr(self, attr, valid_kwargs.get(attr))

VerifyError: The following keyword arguments to Column were invalid:
    Column format option (TFORMn) failed verification: Illegal format `P5A()`. The invalid value will be ignored for the purpose of formatting the data in this column.

@AndyJWil
TestStringsInTAP.ipynb (21.8 KB)

Hi @MRead, thank you for the question. It looks like the issue comes from astropy, perhaps in how astropy deals with FITS table columns (it seems the P5A format could not be processed); it could also be a general question about TAP. Could you try other data structures for storing/saving the data? Maybe a pandas DataFrame, or saving it into a Python pickle, or an HDF5 file.
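For example, a rough (untested) sketch going through pandas might look like this; to_hdf needs the optional PyTables package, and the file names are just placeholders:

# Route around the FITS writer by converting to a pandas DataFrame first.
df = results.to_table().to_pandas()

df.to_pickle("testResults.pkl")               # Python pickle
df.to_hdf("testResults.h5", key="obscore")    # HDF5 via PyTables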


Hi @MRead !

Thanks for reporting this issue.
Having briefly investigated it, my theory is that this may be a bug in astropy’s FITS writer when handling variable-length strings.

The TAP service correctly returns strings with arraysize="*" in the VOTable, and pyVO converts these to a Python object dtype. However, when astropy tries to write these object-dtype columns to FITS, it attempts to use the P format (variable-length arrays) but generates an invalid format specification: P5A() instead of what I think should be PA(max_length). I will bring this up with the astropy team, either by raising an issue or proposing a PR, once I do a bit more investigation to confirm the theory.
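For the record, the kind of TAP-independent reproduction I'm planning to test looks roughly like this (whether it raises exactly the same VerifyError may depend on the astropy version):

import numpy as np
from astropy.table import Table

# Build a table with an object-dtype string column, mimicking what pyVO
# produces for VOTable char fields declared with arraysize="*".
t = Table()
t["s"] = np.array(["short", "a much longer string value"], dtype=object)

# If the theory holds, this write fails just like the ObsCore example.
t.write("repro.fits", format="fits", overwrite=True)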

This may indeed have worked with previous versions of pyVO if they handled variable-length strings differently, perhaps coercing them to a different dtype.

In the meantime I think your workaround of converting to fixed-length strings seems the most reasonable, but hopefully it is a temporary one if we can fix this upstream.

Thanks again for catching it!


Hi @stvoutsin , thanks for looking into it.

We think (not sure!) that when it worked on our RSP (11 months ago), the strings in the FITS table came through with the correct lengths as declared in the SQL, e.g.

TTYPE7  = 'cc_flags'                                                            
TFORM7  = '16A     '                                                            
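(Those cards were pulled from the binary table header of an old results file, roughly like this; the file name is just a placeholder.)

from astropy.io import fits

# Dump the TTYPEn/TFORMn cards from the binary table extension.
with fits.open("oldResults.fits") as hdul:
    header = hdul[1].header
    for n in range(1, header["TFIELDS"] + 1):
        print(header.get(f"TTYPE{n}"), header[f"TFORM{n}"])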

In the Felis YAML file the cc_flags column is/was described as

  - name: cc_flags
    "@id": "#matches_source.cc_flags"
    datatype: string
    description: cc_flags
    length: 16
    mysql:datatype: VARCHAR(16)

Is it possible that previously the TAP service retained the length=16 info, i.e. arraysize="16" or some such, rather than arraysize="*"?
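A quick way to check what the service currently reports (assuming I have the pyVO attribute names right):

# Inspect the column metadata pyVO parsed from the TAP VOTable response,
# to see whether char columns come back with a fixed arraysize or "*".
for field in results.fielddescs:
    if field.datatype == "char":
        print(field.name, field.datatype, field.arraysize)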

Through the portal, when I save as VOTable-FITS it works, and again an actual array size is written for the character columns:

<FIELD ID="col_0" arraysize="5" datatype="char" name="dataproduct_type" ucd="meta.code.class" utype="ObsDataset.dataProductType">
<DESCRIPTION>Data product (file content) primary type</DESCRIPTION>
</FIELD>
<FIELD ID="col_1" arraysize="21" datatype="char" name="dataproduct_subtype" ucd="meta.code.class" utype="ObsDataset.dataProductSubtype">
<DESCRIPTION>Data product specific type</DESCRIPTION>
</FIELD>

Anyhow, we have a few workarounds, but it'd be good if these were not necessary :slight_smile: