Methods for Stringification

jbosch · January 23, 2017, 7:52pm

Continuing the discussion from Three things for productivity:

I don’t have a strong preference, but my sense is that for many small classes they can be the same or be closely related (one can delegate to the other). In rarer cases where stringification of an object could be multiple lines, I imagine we’d want to have two distinct methods in C++ as well.
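On the Python side, this kind of delegation is built in: a class that defines only `__repr__` gets the same string from `str()`, because the default `object.__str__` falls back to `repr()`. A minimal sketch, using a hypothetical `Angle` class that is not from any existing codebase:

```python
class Angle:
    """Hypothetical small value class, used only for illustration."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __repr__(self):
        # Only __repr__ is defined; str() will delegate to it.
        return f"Angle({self.degrees!r})"


a = Angle(30.0)
print(repr(a))
print(str(a))  # same text as repr(a), via the default object.__str__
```

For a small class like this, a separate `__str__` is only needed when the printed form should differ from the repr.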

I’d love to have someone better versed than me in the Python conventions for str and repr put together a straw-man for how we should define them in general and how they should relate to any C++ stringification methods. It seems like having them present across the stack is something a lot of people want, and it’s not all that difficult.

AFAIK, there’s a relatively strong recommendation for how repr should behave, but there isn’t one for str, and it’s never been clear to me what to do with repr when it isn’t possible to make the string eval to construction.

pschella · January 24, 2017, 12:34am

In general I think __repr__ should always include the name of the type (and potentially / preferably nothing else), since this is what pybind11 prints in case a method is called with the wrong arguments. It is really inconvenient if it then says something like: available overloads int and double and called with <insert insanely long string that does not include the type name here>. Of course it doesn’t print exactly this, but you get the idea.

timj · January 24, 2017, 4:06pm

I absolutely agree with this. There is very clear guidance from the Python community that repr() should not be the same as str().

Python 2 docs say:

Return a string containing a printable representation of an object. This is the same value yielded by conversions (reverse quotes). It is sometimes useful to be able to access this operation as an ordinary function. For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval(), otherwise the representation is a string enclosed in angle brackets that contains the name of the type of the object together with additional information often including the name and address of the object.

Python 3 docs say:

Return the canonical string representation of the object.

For many object types, including most builtins, eval(repr(obj)) == obj.

See also

Which can be summarized as: the goal of repr() is to be unambiguous and the goal of str() is to be readable.
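The standard library's `datetime.date` is a compact illustration of that split. The snippet below is only a sketch of the convention, not a recommendation for any particular class:

```python
import datetime

d = datetime.date(2017, 1, 24)

print(repr(d))  # datetime.date(2017, 1, 24)  (unambiguous, names the type)
print(str(d))   # 2017-01-24                  (readable)

# The repr round-trips through eval() as long as `datetime` is in scope.
assert eval(repr(d)) == d
```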

jbosch · January 24, 2017, 8:45pm

I think what I really want to know is which of these should be concise and which should be multi-line for heavyweight objects (e.g. a Schema or a Footprint); the problem with the official and broadly accepted Python guidelines (at least as I read them) is that they don’t actually tell me how to make that call. Given the places repr is used (especially by pybind11), it seems like it needs to be concise, and certainly not multi-line. But a short summary isn’t unambiguous, and most examples in those guidelines involve very simple classes with a small repr and an even smaller str.

timj · January 24, 2017, 8:59pm

I would say that __str__ should be as verbose as you are comfortable with appearing if someone just wants to print it to the screen. The Python 2 docs seem to suggest that it’s preferable for the __repr__ to return something that is evalable even if that means it’s quite long. If that is not practical then they want something concise (with the class name). It seems angle brackets are how Python 2 signals that a representation is informational rather than evalable.
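One way to read that guidance, sketched with two hypothetical classes (`Point` and `Table` are made up for illustration and are not existing types): a small class gets an evalable `__repr__`, while a heavyweight class gets a one-line angle-bracket summary that names the type and leaves the multi-line dump to `__str__`.

```python
class Point:
    """Hypothetical small class: the repr is short enough to be evalable."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"{type(self).__name__}({self.x!r}, {self.y!r})"

    def __str__(self):
        return f"({self.x}, {self.y})"


class Table:
    """Hypothetical heavyweight class: the repr is a one-line summary."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [tuple(r) for r in rows]

    def __repr__(self):
        # Angle brackets signal "informational, not evalable".
        return (f"<{type(self).__name__} with {len(self.columns)} columns, "
                f"{len(self.rows)} rows>")

    def __str__(self):
        # Multi-line output is acceptable here: this is what print() shows.
        lines = ["\t".join(self.columns)]
        lines.extend("\t".join(str(v) for v in row) for row in self.rows)
        return "\n".join(lines)


p = Point(1, 2)
t = Table(["a", "b"], [(1, 2), (3, 4)])
print(repr(p))  # Point(1, 2)
print(repr(t))  # <Table with 2 columns, 2 rows>
print(t)        # header line followed by one line per row
```

Both reprs stay on a single line and start with or contain the type name, which is the property that matters for error messages that embed `repr()` output.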

It’s not entirely clear to me that numpy is a good example. For small arrays numpy shows all the contents; for large arrays it uses ... in the middle.

parejkoj · January 24, 2017, 11:15pm

I’d argue that numpy is a good example: it prints enough to be useful, and prints a summary when that’s not feasible. Plus, it is configurable.
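The same "full output when small, summary when large, and configurable" behaviour can be sketched without numpy using the standard library's `reprlib` module. This is an analogue rather than numpy's own mechanism: `reprlib` truncates at the end instead of eliding the middle, and the limit of 3 below is an arbitrary choice for illustration.

```python
import reprlib

small = [1, 2, 3]
large = list(range(100))

# Default limits: short lists are shown in full, long ones are truncated.
print(reprlib.repr(small))  # [1, 2, 3]
print(reprlib.repr(large))  # [0, 1, 2, 3, 4, 5, ...]

# The limits are configurable per Repr instance.
short_repr = reprlib.Repr()
short_repr.maxlist = 3
print(short_repr.repr(large))  # [0, 1, 2, ...]
```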

pschella · February 28, 2017, 3:43pm

See RFC-298 for my suggestion.