Python properties and mutability/constness

As we inch towards more Pythonic interfaces, we’re going to repeatedly encounter problems in which a property needs to expose an immutable view to a mutable object.

For instance, let’s imagine that we have a Point class with x and y get/set properties, and a Box class with a get-only max property that returns a new Point (perhaps because it’s computed from the minimum point and the dimensions, rather than held by the box). This situation leads to the following surprising behavior:

box = Box(Point(2, 3), Point(5, 6))
box.max.x = 1    # silently does nothing
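
To make the failure concrete, here is a minimal sketch. The Point and Box classes are hypothetical stand-ins, not the actual afw classes. This Box stores a minimum point plus dimensions and computes max on every access, so the assignment lands on a temporary object.

```python
class Point:
    """Hypothetical mutable 2-d point with get/set x and y properties."""

    def __init__(self, x, y):
        self._x = x
        self._y = y

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @property
    def y(self):
        return self._y

    @y.setter
    def y(self, value):
        self._y = value


class Box:
    """Hypothetical box stored as a minimum point plus dimensions."""

    def __init__(self, minimum, maximum):
        self._min = Point(minimum.x, minimum.y)
        self._width = maximum.x - minimum.x
        self._height = maximum.y - minimum.y

    @property
    def max(self):
        # Computed on every access, so each call returns a brand-new Point.
        return Point(self._min.x + self._width, self._min.y + self._height)


box = Box(Point(2, 3), Point(5, 6))
box.max.x = 1       # modifies a temporary Point that is immediately discarded
print(box.max.x)    # still 5
```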

This is a problem intrinsic to Python, and I think it’s one of the reasons many of Python’s built-in types (str, complex) are immutable, and part of why there are immutable versions of other types (set and frozenset).

We already have this same problem with our getters, of course, but getters at least somewhat imply that the returned object is a copy; to my eyes, the following doesn’t read like something that should be expected to work automatically, even though it has the same behavior (and the same syntax would work for other getters):

box.getMax().setX(1)  # also silently does nothing, but maybe that's not as bad?

I think we have a few options here:

  1. Ignore this problem, aside from documenting it everywhere we can. This would be a horrible sin in C++, but maybe it’s just one part of Python’s “you can’t expect the language to stop you from shooting yourself in the foot” philosophy?

  2. Make all of our small, frequently-used classes (like both Point and Box) immutable in Python, turning those silent failures into helpful exceptions. Of course, this comes at the expense of never being able to modify a Point or Box in-place in Python, which might be a significant inconvenience in other code.

  3. Make immutable and mutable versions of all of our small, frequently-used classes. That’s a bit more work in the wrappers, and it still leaves one edge case that’s slightly confusing:

    point = box.max  # this returns FrozenPoint, but we probably wanted a copy
    ... # lots of other code, in which we forget where "point" came from
    point.x = 1  # this now throws an exception
    

I think I have a slight preference for (3), at least for classes as ubiquitous as Point and Box, but my mind is really not made up. For complex classes like Psf and Wcs that are nevertheless held frequently by other objects, I’d lean towards (2).
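
Options (2) and (3) can be sketched with the standard library's dataclasses. This is an assumption about how a pure-Python version might look, not how the C++ wrappers would implement it: option (2) amounts to declaring Point with `@dataclass(frozen=True)`, while option (3) adds a mutable twin. The `freeze`/`thaw` names are hypothetical.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass
class Point:
    """Mutable point: the type users construct and edit directly."""
    x: float
    y: float

    def freeze(self):
        """Return an immutable copy."""
        return FrozenPoint(self.x, self.y)


@dataclass(frozen=True)
class FrozenPoint:
    """Immutable point: assigning to x or y raises FrozenInstanceError."""
    x: float
    y: float

    def thaw(self):
        """Return a mutable copy."""
        return Point(self.x, self.y)


class Box:
    """Hypothetical box stored as a minimum point plus dimensions."""

    def __init__(self, minimum, maximum):
        self._min = minimum.freeze()
        self._width = maximum.x - minimum.x
        self._height = maximum.y - minimum.y

    @property
    def max(self):
        # Option (3): return a frozen object so in-place edits fail loudly.
        return FrozenPoint(self._min.x + self._width, self._min.y + self._height)


box = Box(Point(2, 3), Point(5, 6))
try:
    box.max.x = 1                  # raises instead of silently doing nothing
except FrozenInstanceError:
    print("FrozenPoint cannot be modified in place")

point = box.max.thaw()             # explicit copy when a mutable Point is wanted
point.x = 1                        # modifies only the copy
```

The explicit `thaw()` call is what addresses the edge case in (3): code that wants a mutable copy has to ask for one.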

Does anyone else have an opinion on this, or some wisdom from other Python projects?

This is a fun question!

My first thought is to sidestep the problem altogether by not using properties to return derived values in the way you’re suggesting.

This is consistent with the library code: we don’t access my_list.max, but rather call max(my_list); we don’t access my_string.islower, but call my_string.islower().

The property construction gives us the capability to make derived values that look like attributes. But values like max in the example you give don’t behave like attributes, so taking advantage of this capability is just confusing, not “pythonic”.
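
One reading of this suggestion, sketched with a hypothetical Box: stored state stays a property, while the derived corner becomes a method. The call syntax then signals a freshly computed result, just as `max(my_list)` does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


class Box:
    """Hypothetical box whose stored state is a minimum point plus dimensions."""

    def __init__(self, minimum, width, height):
        self._min = minimum
        self._width = width
        self._height = height

    @property
    def min(self):
        # Stored state: exposed as an attribute-like property.
        return self._min

    def max(self):
        # Derived value: a method call, like max(my_list), signals a computed result.
        return Point(self._min.x + self._width, self._min.y + self._height)


box = Box(Point(2, 3), 3, 3)
upper = box.max()
```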

I would recommend making as many of your Python classes immutable as possible. The way Python syntax works (as you point out here, but also in other subtle ways), things just work better if you treat all objects as immutable. Mostly this is because everything in Python is a shared pointer, which means it is sometimes hard to keep track of which copies are the same and which are different. So if you change something, which other things got changed too? If everything, or at least most things, are immutable, then it’s not so complicated to anticipate what the code does. Functions that make some change return a new object with the requested changes, e.g. sed = sed.atRedshift(z), not sed.redshift = z.
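
A minimal sketch of the "return a new object" style. The Sed class here is a hypothetical stand-in that models only a name and a redshift; it is not GalSim's SED. `dataclasses.replace` builds the modified copy, and the list at the end shows the aliasing surprise that immutability avoids.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Sed:
    """Hypothetical stand-in for an SED; only a name and a redshift are modeled."""
    name: str
    redshift: float = 0.0

    def atRedshift(self, z):
        # Return a modified copy; self is never changed.
        return replace(self, redshift=z)


sed = Sed("flat")
alias = sed                      # a second name for the same object
sed = sed.atRedshift(1.5)        # rebinds `sed`; `alias` still sees the original
print(alias.redshift, sed.redshift)   # 0.0 1.5

# Contrast with a mutable object, where a change is visible through every name.
a = [1, 2]
b = a
b.append(3)
print(a)                         # [1, 2, 3]
```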

When I realized this some time ago, I went through all the GalSim code and made everything immutable except for Image (since it would be onerous to make copies, and writing to images is kind of the point of GalSim after all) and the Random Deviates (since they are constantly changing their internal state every time you use them). Everything else has no mutating methods, which makes the code flow much more clearly than when some of them had mutating methods.

We don’t always enforce the immutability, so the user can shoot themselves in the foot by changing attribute values. But the recommended usage implies that this isn’t something that you should do. And of course, if you are worried about it, you can (sort of) enforce it with properties, giving things getters but no setters.
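
Here is what "getters but no setters" looks like, and why it is only "sort of" enforcement. The ReadOnlyPoint class is hypothetical. Assigning to a property without a setter raises AttributeError, but the underscored attribute behind it remains writable by convention only.

```python
class ReadOnlyPoint:
    """Hypothetical value class with getter-only properties."""

    def __init__(self, x, y):
        self._x = x
        self._y = y

    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y


p = ReadOnlyPoint(1.0, 2.0)
try:
    p.x = 5.0                    # no setter defined, so Python raises AttributeError
except AttributeError as err:
    print("refused:", err)

p._x = 5.0                       # the "sort of": the private attribute is still writable
```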

I’m in the “make as many things immutable as possible” camp as well, I think.

That said, for your particular example, the numpy version is illustrative:

In [1]: x=np.random.random(10)

In [2]: x.max
Out[2]: <function max>

In [3]: x.max()
Out[3]: 0.98590735501586324

So, it’s not clear that max should be a property. A function that returns a (new, possibly immutable) point seems appropriate here.

Similarly, if it really should be modifiable, a property is a good fit, and numpy’s shape gives useful guidance:

In [4]: x.shape
Out[4]: (10,)

In [5]: x.shape = (5,2)

In [6]: x.shape
Out[6]: (5, 2)

In [7]: x
Out[7]:
array([[ 0.53197771,  0.89145647],
       [ 0.56458395,  0.98590736],
       [ 0.17718251,  0.16371767],
       [ 0.36506045,  0.21573979],
       [ 0.44252732,  0.33397569]])

In [8]: x.shape = (3,2)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-3457d4669faf> in <module>()
----> 1 x.shape = (3,2)

ValueError: total size of new array must be unchanged
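
The shape behavior above is a property with a validating setter. A minimal pure-Python sketch of the same pattern, using a hypothetical Grid class over a flat list (this is not how numpy implements it):

```python
from math import prod


class Grid:
    """Hypothetical container mimicking ndarray.shape semantics on a flat list."""

    def __init__(self, values):
        self._values = list(values)
        self._shape = (len(self._values),)

    @property
    def shape(self):
        return self._shape

    @shape.setter
    def shape(self, new_shape):
        new_shape = tuple(new_shape)
        # Like numpy, refuse any reshape that would change the number of elements.
        if prod(new_shape) != len(self._values):
            raise ValueError("total size of new array must be unchanged")
        self._shape = new_shape


g = Grid(range(10))
g.shape = (5, 2)      # accepted
try:
    g.shape = (3, 2)  # rejected, mirroring the numpy session above
except ValueError as err:
    print(err)
```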

In general, +1 on immutability. Update-in-place poses provenance and other problems for persisted data; while the same principles need not apply directly to in-memory objects, I think they are still useful.

Immutability is pretty popular here, and I expect that I’ll eventually get around to writing a detailed RFC for making many important afw classes immutable - unless someone else beats me to it (you’re all welcome to).

I suppose the next question is whether they should be truly immutable in both C++ and Python, or just immutable in Python? I believe the latter would still address the “silent no-op” concerns I brought up, but full immutability is necessary to deal with the aliasing/predictability problems @rmjarvis brought up. I don’t think there are any hard rules you can make here - it needs to be a class-by-class decision - but I think it’s a good general principle to aim for full immutability for things you pass by shared_ptr in C++ (because aliasing questions are more common) and Python-only immutability for things you pass by value in C++ (because mostly you’ll have deep copies and hence aliasing isn’t a problem). Are there any other arguments on this subject that I’ve missed?

As for avoiding properties for computed values, I think that’s a good principle in general, but I don’t like how it applies to the Box example for two reasons:

  • It’s an implementation detail whether Box uses {min, dimensions} or {min, max} (or something else) as its internal representation, so choosing any of {min, max, dimensions} to be a method while the others are properties is encapsulation breaking. Maybe that just means none of them can be properties, but I suspect people will intuitively expect them to have attribute-like access (since any of them could in fact be an intrinsic attribute), and intuitive interfaces are good.
  • It’s easy to imagine a Box class in which all of {min, max, dimensions} are get/set properties, even if only two of them are used in the internal representation, if the setter for the third is non-trivial. If we have both a getter and a setter for a quantity, most users will be even more bothered by the lack of a property; we don’t want everyone to have to go through this conversation to understand this.
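
A sketch of the second bullet, assuming a hypothetical mutable Box stored as (min, dimensions) and a continuous-box convention where max = min + dimensions (afw's integer boxes may define this differently). All three quantities are get/set properties; only the max setter is non-trivial. The setter policies in the comments are illustrative choices, not a proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


@dataclass(frozen=True)
class Extent:
    x: int
    y: int


class Box:
    """Hypothetical mutable box; callers cannot tell which two of
    min, max and dimensions make up the internal representation."""

    def __init__(self, minimum, dimensions):
        self._min = minimum
        self._dims = dimensions

    @property
    def min(self):
        return self._min

    @min.setter
    def min(self, value):
        # Stored directly; dimensions are kept, so the box is translated.
        self._min = value

    @property
    def dimensions(self):
        return self._dims

    @dimensions.setter
    def dimensions(self, value):
        self._dims = value

    @property
    def max(self):
        return Point(self._min.x + self._dims.x, self._min.y + self._dims.y)

    @max.setter
    def max(self, value):
        # Non-trivial setter: keep min fixed and recompute the dimensions.
        self._dims = Extent(value.x - self._min.x, value.y - self._min.y)


box = Box(Point(2, 3), Extent(3, 3))
box.max = Point(10, 10)
print(box.dimensions)   # Extent(x=8, y=7)
```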

Ok, your last two bullets then strongly suggest to me that we should have Box and ImmutableBox (or somesuch), with the same properties, but Immutable doesn’t have the setters.

@jbosch Your last two bullets make me think that Box should be immutable, with {min, max, dimensions} as properties and constructors (or factory functions) that can create them from different combinations thereof. Analogous to a tuple.
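
A tuple-like immutable Box might be sketched as a frozen dataclass with alternative constructors. The classmethod names `from_min_max` and `from_max_dimensions` are hypothetical, and max = min + dimensions is an assumed convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


@dataclass(frozen=True)
class Extent:
    x: int
    y: int


@dataclass(frozen=True)
class Box:
    """Hypothetical immutable box; like a tuple, it cannot change after construction."""
    min: Point
    dimensions: Extent

    @property
    def max(self):
        return Point(self.min.x + self.dimensions.x, self.min.y + self.dimensions.y)

    @classmethod
    def from_min_max(cls, minimum, maximum):
        return cls(minimum, Extent(maximum.x - minimum.x, maximum.y - minimum.y))

    @classmethod
    def from_max_dimensions(cls, maximum, dimensions):
        return cls(Point(maximum.x - dimensions.x, maximum.y - dimensions.y), dimensions)


a = Box.from_min_max(Point(2, 3), Point(5, 6))
b = Box.from_max_dimensions(Point(5, 6), Extent(3, 3))
print(a == b)   # True
```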

I don’t think my last post really makes a strong case either for or against making Box immutable - rather, it makes a strong case for making Point immutable: it’s code like this:

box.min.x = 3

that we specifically need to disallow (since it’s impossible to make it do the right thing), regardless of whether

box.min = Point(3, 4)

is supported.
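
The distinction can be sketched with an immutable Point and a hypothetical Box that still allows whole-Point assignment. Modifying the returned Point in place becomes an error, while replacing it outright keeps working.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Point:
    """Immutable point (hypothetical)."""
    x: int
    y: int


class Box:
    """Hypothetical Box whose min can be replaced but not edited in place."""

    def __init__(self, minimum, maximum):
        self._min = minimum
        self._max = maximum

    @property
    def min(self):
        return self._min

    @min.setter
    def min(self, value):
        self._min = value


box = Box(Point(1, 2), Point(5, 6))
try:
    box.min.x = 3          # now an error instead of a silent no-op
except FrozenInstanceError:
    print("cannot modify a Point in place")

box.min = Point(3, 4)      # replacing the whole Point is still possible
```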

I do happen to think Box should be immutable as well - or at least there should be an immutable Box, even if it isn’t the only one - but to make that case, you need to look at e.g. Image:

image.bbox.dimensions = Extent(5, 6)

causes a similar problem, as there’s no good way to make that do the right thing.

I am strongly in favor of making Point (and Angle) immutable. I worry a bit about making Box immutable for one reason: a common idiom for constructing a box is:

box = Box()
loop over some data:
  compute a point or box, possibly using a lot of code
  box.include(point-or-box)

There are obvious ways to support this with an immutable Box, but the ones I have thought of seem a bit clumsier:

  • Box.include returns a new Box (which it would have to do in any case). This is natural but I worry about inefficiency for a large number of points.
  • We could offer a Box constructor that takes a vector of ptr-to-Point (or a vector of ptr-to-Box). In C++ this is a bit clumsy, as we have to declare a vector of ptr-to-Point (or Box) and populate it before constructing the Box.
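
The first option, in a minimal Python sketch with a hypothetical frozen Box given by min and max corners. The accumulation loop rebinds the name on every step, so one new Box is created per point; that per-point allocation is the inefficiency being worried about.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


@dataclass(frozen=True)
class Box:
    """Hypothetical immutable box given by min and max corners."""
    min: Point
    max: Point

    def include(self, point):
        # Cannot modify self, so return a new, possibly larger Box.
        return Box(Point(min(self.min.x, point.x), min(self.min.y, point.y)),
                   Point(max(self.max.x, point.x), max(self.max.y, point.y)))


points = [Point(1, 5), Point(4, 2), Point(-1, 3)]
box = Box(points[0], points[0])
for p in points[1:]:
    box = box.include(p)   # rebinds the name; one new Box per point
print(box)   # Box(min=Point(x=-1, y=2), max=Point(x=4, y=5))
```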

Anyway, this one objection aside, I would be thrilled to have Box immutable.

This is my main concern about a purely immutable Box as well, and I think the most elegant solution is probably to have both Box and ImmutableBox.

It seems a shame to have a mutable Box just for this one use case, but you are probably right. Two naming suggestions:

  • Box is immutable (because it will be the more common object); MutableBox is the mutable variant.
  • Name the immutable version FrozenBox, like Python’s frozenset.
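
Under the first naming suggestion, the two-flavor approach might look like this sketch. MutableBox acts as a builder that grows in place using plain coordinates and then produces an immutable Box. The `freeze()` method name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


@dataclass(frozen=True)
class Box:
    """Immutable box: the common case, so it gets the short name."""
    min: Point
    max: Point


class MutableBox:
    """Hypothetical mutable builder used only while accumulating points."""

    def __init__(self):
        self._empty = True
        self._x0 = self._y0 = self._x1 = self._y1 = 0

    def include(self, point):
        # Grow in place rather than creating a new Box per point.
        if self._empty:
            self._x0, self._y0 = point.x, point.y
            self._x1, self._y1 = point.x, point.y
            self._empty = False
        else:
            self._x0 = min(self._x0, point.x)
            self._y0 = min(self._y0, point.y)
            self._x1 = max(self._x1, point.x)
            self._y1 = max(self._y1, point.y)

    def freeze(self):
        if self._empty:
            raise ValueError("cannot freeze an empty box")
        return Box(Point(self._x0, self._y0), Point(self._x1, self._y1))


builder = MutableBox()
for p in [Point(1, 5), Point(4, 2), Point(-1, 3)]:
    builder.include(p)
box = builder.freeze()
```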

Seems like @rowen’s use case is easily supported with a Box constructor from a list.

included_points = []
loop over some data:
    compute a point or box, possibly using a lot of code
    included_points.append(point-or-box)
box = Box(included_points)  # or Box.from_list(included_points)

This doesn’t seem any less clean than the workflow you suggested, but would allow Box to be immutable.

My concern was more about C++ than Python, e.g. the hassle of declaring a vector of points. But declaration isn’t so bad with auto. On the whole I agree with you. I guess I’d rather make Box immutable than have two flavors.

Good point, but I think it’d be important to make sure the Box constructor actually takes an iterator (or iterator range in C++), rather than a container, to make sure this approach scales well when the number of points is very large. Making that work across the C++/Python boundary is a bit trickier, but still quite doable.
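
On the Python side, accepting any iterable rather than a container could look like this sketch. The classmethod `Box.from_points` is hypothetical. It makes a single pass with `iter`/`next`, so it accepts a list as well as a generator that never materializes all the points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


@dataclass(frozen=True)
class Box:
    """Hypothetical immutable box given by min and max corners."""
    min: Point
    max: Point

    @classmethod
    def from_points(cls, points):
        """Build the bounding box of any iterable of Points in a single pass."""
        it = iter(points)
        try:
            first = next(it)
        except StopIteration:
            raise ValueError("from_points requires at least one point") from None
        x0 = x1 = first.x
        y0 = y1 = first.y
        for p in it:
            x0, x1 = min(x0, p.x), max(x1, p.x)
            y0, y1 = min(y0, p.y), max(y1, p.y)
        return cls(Point(x0, y0), Point(x1, y1))


# A generator expression: points are produced one at a time and never stored.
box = Box.from_points(Point(i % 7, i // 7) for i in range(100_000))
```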

With C++11, I think we can use initializer lists to make the syntax really quite nice:

Box box({point1, point2, point3, point4});

@jbosch I suspect we can get away with not supporting iterators in Python as arguments to Box, if it proves too messy. We don’t presently have any efficient way to create a box in Python that contains a large number of points; supporting a sequence of points would be an improvement, even if they all had to be in memory.

I feel if we’re worrying about the performance of Box in Python it points to a design issue (or an attempt to think of python as C++).

E.g., should the particular loop example up there be expressed using array arithmetic?
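
For comparison, here is the bounding-box computation as array arithmetic next to the equivalent Python loop. This is a sketch on made-up random data, and it only applies when all the points are already available as one NumPy array. It does not cover the case where each point takes a lot of code to compute.

```python
import numpy as np

rng = np.random.default_rng(42)
xy = rng.random((1000, 2))        # made-up data: one (x, y) row per point

# Vectorized: the bounding box is just the column-wise min and max.
lower = xy.min(axis=0)
upper = xy.max(axis=0)

# Equivalent Python loop over the same points, for comparison.
x0 = y0 = float("inf")
x1 = y1 = float("-inf")
for x, y in xy:
    x0, y0 = min(x0, x), min(y0, y)
    x1, y1 = max(x1, x), max(y1, y)
```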

I agree that we probably only care about this performance detail in C++. But Pythonism has a very strong preference for not using sequences when iterators will do, and from an ideal design perspective this is no exception. Whether it’s easy enough for us to implement is another question, and one highly dependent on whether we’re doing it in Swig (not worth it), pybind11 (easy), or Cython (probably easy?).

My (not at all universally-held) opinion is that something like this should almost never be written as an array operation in Python, even if it’s faster.

If it needs to go faster than a Python for loop would allow, it should be moved down to C++. NumPy array vectorization is great for simple mathematical operations and boolean indexing, but if it leads to Python code that is harder to read than the equivalent C++, it’s probably a bad idea.