Doxygen markup seems to cause py3 string parsing to sometimes fail

natelust · August 23, 2016, 4:50pm

When porting code to Python 3, I came across a situation where Doxygen markup in a docstring seems to tell Python that we are trying to write a unicode escape, and importing the file fails with this error:

Traceback (most recent call last):
  File "tests/testPsfIO.py", line 44, in <module>
    import lsst.meas.algorithms as algorithms
  File "/Users/nate/repos_lsst/meas_algorithms/python/lsst/meas/algorithms/__init__.py", line 29, in <module>
    from .detection import *
  File "/Users/nate/repos_lsst/meas_algorithms/python/lsst/meas/algorithms/detection.py", line 211
    """
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2129-2130: truncated \uXXXX escape

The docstring itself can be seen on GitHub. Prefixing the docstring with r (telling Python that a raw string is desired) fixes the problem. I am wary of escaping the backslash in the docstring itself (writing \\until), as it may interfere with Doxygen generation. Does anyone have thoughts or preferences?
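A minimal reproduction of the failure and the raw-string fix, assuming a made-up module rather than the real detection.py docstring: the same source is compiled with and without the r prefix. The function name `run` and its docstring text are hypothetical.

```python
import textwrap

# Source text of a hypothetical module whose docstring uses the Doxygen
# \until command. "\\until" in this (non-raw) literal puts a single
# backslash into the generated source, just as in the real file.
PLAIN_SRC = textwrap.dedent('''\
    def run():
        """Run the task.

        \\until return
        """
''')

# The same module with the docstring prefixed by r (a raw string).
RAW_SRC = PLAIN_SRC.replace('"""Run', 'r"""Run')


def try_compile(src):
    """Compile src; return (True, "") on success or (False, message)."""
    try:
        compile(src, "<example>", "exec")
    except SyntaxError as err:
        return False, err.msg
    return True, ""


ok, msg = try_compile(PLAIN_SRC)
print(ok, msg)  # expected: False and a "truncated \uXXXX escape" message

namespace = {}
exec(compile(RAW_SRC, "<example>", "exec"), namespace)
print(repr(namespace["run"].__doc__))  # the backslash survives
```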

timj · August 23, 2016, 4:55pm

Specifically, it’s the \until line in the string. That obviously isn’t allowed in a Python 3 string, because \u starts a \uXXXX unicode escape.

>>> type("\until")
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape

It looks like r"" might be the only option until we implement RFC-214.

>>> r"\until"
'\\until'
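The two candidate fixes (raw string vs. doubling the backslash) produce the same runtime string; they differ only in the source text. A quick check, nothing LSST-specific:

```python
# Both spellings evaluate to one backslash followed by "until".
raw = r"\until"
escaped = "\\until"

assert raw == escaped
assert len(raw) == 6      # '\' plus the five letters of 'until'
print(repr(raw))          # '\\until' (repr shows the backslash doubled)
```

Since Doxygen parses the file rather than the evaluated docstring, r"\until" leaves the command spelled exactly as Doxygen expects, whereas "\\until" puts a doubled backslash in the source, which Doxygen would likely read as an escaped backslash followed by plain text.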

jsick · August 23, 2016, 4:56pm

I looked at what PEP 257 recommends and they say this:

For consistency, always use """triple double quotes""" around docstrings. Use r"""raw triple double quotes""" if you use any backslashes in your docstrings. For Unicode docstrings, use u"""Unicode triple-quoted strings""".

So it seems that using raw strings for the current generation of Doxygen-marked-up docstrings is the way to go. Numpydoc won’t have this issue (except maybe for LaTeX math in docstrings, but I don’t recall ever doing anything special in those cases).
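Following the PEP 257 advice, a sketch of raw docstrings in both styles; the function names and docstring text are hypothetical. The last function shows why LaTeX math can matter too: without the r prefix, the \a in \alpha is a valid Python escape (the BEL control character), so the text is silently changed instead of raising an error.

```python
def detect_raw():
    r"""Detect sources (Doxygen style, raw docstring).

    \until return
    """


def smooth_math():
    r"""Smooth with a Gaussian of width :math:`\sigma` (numpydoc style).

    The kernel is :math:`\exp(-r^2 / 2\sigma^2)`.
    """


def smooth_math_plain():
    """Width :math:`\alpha`."""  # no r prefix: "\a" becomes the BEL character


print(repr(smooth_math_plain.__doc__))  # shows '\x07lpha', not '\\alpha'
```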

natelust · August 23, 2016, 5:00pm

That is what I figured, but I wanted to get other input before I just did it.

KSK · August 23, 2016, 5:06pm

I’m fairly surprised this hasn’t come up before. We use \until liberally in doxygen, so this will come up again.
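Since this will come up again, one option is a small check that flags non-raw string literals Python 3 rejects or warns about. This is a sketch, not existing LSST tooling; the function name `find_bad_strings` is hypothetical. It relies on `tokenize` not decoding escapes, so it still works on files that fail to import.

```python
import ast
import io
import tokenize
import warnings

# Letters that may appear in a string prefix (any case combination).
PREFIX_CHARS = "rRbBuUfF"


def find_bad_strings(source):
    """Return (line, message) pairs for non-raw string literals in `source`
    that Python 3 rejects or warns about, such as a docstring containing
    a Doxygen \\until command."""
    problems = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        text = tok.string
        prefix = text[: len(text) - len(text.lstrip(PREFIX_CHARS))].lower()
        if "r" in prefix or "f" in prefix:
            # Raw strings never interpret escapes; f-strings are skipped
            # in this sketch because literal_eval cannot evaluate them.
            continue
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            try:
                ast.literal_eval(text)
            except SyntaxError as err:
                # Hard failure, e.g. "truncated \uXXXX escape".
                problems.append((tok.start[0], err.msg))
                continue
        # Unrecognized escapes such as "\s" only produce warnings today.
        for w in caught:
            problems.append((tok.start[0], str(w.message)))
    return problems
```

To scan a package, each .py file could be read with `open(path, encoding="utf-8")` and its text passed to `find_bad_strings`; how (or whether) that is wired into the build is left open.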

rowen · August 23, 2016, 5:10pm

Why not just use @util instead of \util and so on for all doxygen commands?

KSK · August 23, 2016, 5:11pm

Was that an intentional change in spelling? Is \until actually identical to @until?

jbosch · August 23, 2016, 5:11pm

+1

I tend to prefer “@” over “\” in Doxygen universally (even in C++) for similar reasons: almost everything tries to interpret “\”, but only Doxygen (of the things we run on our source files) pays attention to “@”.
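To illustrate the point about “\” being interpreted by everything, a sketch that evaluates a plain (non-raw) docstring with each prefix. The command list is just a sample: \until is a hard error, while \note and \brief are silently turned into a newline and a backspace, because \n and \b are ordinary Python escapes. The @ forms pass through unchanged.

```python
# Sample Doxygen commands; \until is the one from this thread.
COMMANDS = ["until", "note", "brief"]


def docstring_value(prefix, command):
    """Evaluate a plain (non-raw) triple-quoted literal containing
    prefix + command and return its value, or the SyntaxError message."""
    src = '"""' + prefix + command + ' something"""'
    try:
        return eval(src)
    except SyntaxError as err:
        return "SyntaxError: " + err.msg


for cmd in COMMANDS:
    # \until fails, \note becomes "\n" + "ote", \brief becomes "\b" + "rief";
    # the @ spellings come back unchanged.
    print(cmd, repr(docstring_value("\\", cmd)), repr(docstring_value("@", cmd)))
```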

timj · August 23, 2016, 5:26pm

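Python 2, as is its wont, is far more relaxed about unicode in general: \u is only treated as an escape inside u"" literals, so in an ordinary string "\until" silently stays a backslash followed by "until". In Python 3 every str literal is unicode, so \u always starts an escape and the parser complains.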
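The difference can be seen without a Python 2 interpreter: Python 3 bytes literals keep the old str behavior for \u. A small sketch contrasting the two literal types (the bytes case is only an analogy for Python 2’s str, not a proposed fix):

```python
import warnings

# Python 3 bytes literals, like Python 2's plain str literals, do not
# recognize \u as an escape, so the backslash is kept as-is.
with warnings.catch_warnings():
    # Recent Python 3 versions warn about the unrecognized escape in bytes.
    warnings.simplefilter("ignore")
    as_bytes = eval(r'b"\until"')

# In a Python 3 str literal, \u always starts a \uXXXX escape.
try:
    eval(r'"\until"')
    str_error = None
except SyntaxError as err:
    str_error = err.msg

print(as_bytes)   # b'\\until'
print(str_error)
```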