Falling out of love with C++

jbosch · October 10, 2018, 4:25pm

This post is in a spirit somewhere between #software-dev and #dm-tea-time; it’s only tangentially relevant to current LSST DM work, and I’m not actually proposing any changes or actions here.

I’ve spent the last few years quite excited about all of the changes happening in C++ since the C++11 standard: the language is finally addressing a lot of long-standing issues and catching up with a lot of newer languages in terms of providing now-standard features.

It wasn’t until a year or so ago, however, that I started really putting in a concerted effort to improve my ability to use all of that new stuff via more than just learning-while-coding. I can happily say that other DM developers played a big role in nudging me out of what was actually some pretty serious complacency (particularly discussions with @natelust about new languages, with @pschella about new C++ language features, and with @kfindeisen about thread- and exception-safety). And after paying a lot more attention to a lot more resources on the web, the result is that I’m absolutely a much better C++ developer than I was two or three years ago, and I think the last time I could honestly say that was probably in the late 1990s.

Unfortunately, I’m also less confident than I’ve ever been that the code I’m now writing is good code. That’s partly a statement about learning some humility, but it’s largely about not knowing what “good code” means in modern C++, especially in the scientific software world. Does it mean heavily using new language features (lambdas, rvalue references, noexcept, emplace, algebraic data types, …) that can make code safer, more performant, and more expressive for experienced eyes that can see past the awkward syntax and visual clutter? Or does it mean using those features only when there’s a demonstrated benefit relative to a simpler, more old-fashioned style? Does it mean using more functional programming idioms, since that’s where the language seems to be heading, even if that means it’ll be less comprehensible to most readers [at first]? All I’m sure of is that it means a lot more comments than I’m in the habit of writing right now, because someone reading my code is going to either be confused about what my code does or be confused about why I didn’t do it the recommended-by-expert-X way.
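
For readers less familiar with the features listed above, here is a minimal C++17 sketch that puts several of them side by side (a lambda, move semantics, noexcept, emplace, and std::variant as a small algebraic data type). It is illustrative only: Source, Measurement, fluxOr, makeCatalog, and brightestFirst are hypothetical names, not LSST code.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical record type, used only for illustration.
struct Source {
    std::string name;
    double flux;

    // Takes the name by value and moves it into the member, so a caller
    // passing a temporary string avoids a copy.
    Source(std::string n, double f) noexcept : name(std::move(n)), flux(f) {}
};

// A small "algebraic data type": a measurement is either a value (double)
// or a failure message (std::string).
using Measurement = std::variant<double, std::string>;

inline double fluxOr(Measurement const& m, double fallback) noexcept {
    // std::get_if returns nullptr when the variant holds the other alternative.
    if (auto const* value = std::get_if<double>(&m)) return *value;
    return fallback;
}

inline std::vector<Source> makeCatalog() {
    std::vector<Source> sources;
    // emplace_back forwards its arguments to the Source constructor and
    // builds the element directly inside the vector.
    sources.emplace_back("faint", 1.5);
    sources.emplace_back("bright", 9.0);
    return sources;
}

inline std::vector<Source> brightestFirst(std::vector<Source> sources) {
    // A lambda used as the comparison function for std::sort.
    std::sort(sources.begin(), sources.end(),
              [](Source const& a, Source const& b) { return a.flux > b.flux; });
    return sources;  // moved out, not copied
}
```

Each feature is small on its own; the readability question raised above is about what happens when all of them appear together in a few dozen lines.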

This is a pretty common concern in the broader C++ world recently; it’s not unrelated to Stroustrup’s recent Remember the Vasa essay, and I think it’s fair to say that it’s the problem the C++ Core Guidelines are being created to address. But over the past few months I’ve been reading the Core Guidelines front-to-back, and that just left me more depressed about the state of C++ style; I learned a ton, but:

I discovered a lot of areas where it seems the community of experts who know more than I do can’t decide what to recommend (for good reasons, not just bikeshedding);
What they do recommend differs quite a bit from what anyone could have done before C++14 (and by extension a lot of what’s in the Meyers books), and in many places focuses on anticipating language features that won’t arrive until C++20;
What they do recommend differs quite a bit from our own style guide, especially on topics we’ve updated since C++11 (some examples: C2, C20, ES23);
I cannot imagine a scientist or scientific software developer who doesn’t also treat C++ as a hobby learning all of this (e.g. the quite reasonable but extremely subtle distinctions in rules R32-R36, which also happen to differ from our own much-simpler-but-99%-ok recommendation on that topic).

That last point is the big one. I didn’t bring up the Core Guidelines because I think we need to update our style guide to improve compatibility with them (though perhaps we should); I’m bringing them up because I’m really starting to doubt that C++ can be safely and scalably used by big scientific projects, given the “more interested in science/algorithms” profile of the developers we can realistically hire and retain.

When you also factor in the disaster that is hybrid C++/Python build/package/distribute tooling, I’m really not sure how I’d approach a project like ours anymore, if I was starting from zero. I used to be an unabashed advocate of our style of hybrid Python/C++, in which you write the core data structures and primitives in C++ and wrap them so developers in both languages can use them, motivated by the notion that even if most algorithmic code isn’t in C++, the part that is in C++ is important and difficult to get right, so it’s worth the effort to give it access to nice data structures and primitives. But that involves a lot more people writing C++ than the Astropy/Cython approach, where the compiled code is minimized but it only has plain old data and raw array pointers to work with. I’m still pretty confident that the latter is not for me - figuring out how to do “zero-cost abstractions” and “don’t pay for what you don’t use” (two core C++ development philosophies) when solving those important, difficult, not-for-Python problems is frankly a big part of what makes development fun for me. But I’m no longer at all certain that the approach I like is a good one for most people or projects.

That’s probably a moot point for LSST Construction - I think it’s too late to change tack completely now, and I’m by no means convinced that the Astropy/Cython approach would have been a better one for us, even if we’d adopted it years ago. The grass looks pretty brown on both sides of this fence.

But I’m bothered by not knowing what I’d recommend for new projects, in part because LSST is going to last a while, and a lot of the code we have now may end up being replaced at some point; I’d like to have a sense that that’s moving in the right direction. It also bothers me because I used to be sufficiently confident in the heavy-C++ with Python model that I put what limited free open-source dev time I had into making it better (via ndarray), and I’m just not excited about doing that anymore.

Maybe the C++ standard will settle down after C++20, and the Core Guidelines (and associated tooling) will improve. Maybe the answer for future projects is hybrid Rust/Python, which I find intriguing but have only barely played with. Maybe the answer involves Julia, Go, Swift, or something in JVM-land, none of which I’ve explored well enough to fairly judge (though I’ll admit to a possibly-irrational bias against any language that treats “within a factor of a few of C” as a success).

Anyhow, sorry about all that bad C++ code I wrote years ago. And sorry about all the better-but-still bad code I’m probably writing now. I’m curious to hear how you all think we can make it better.

price · October 10, 2018, 4:49pm

I still remember the first time I reviewed @jbosch’s C++ code: it was the most beautiful C++ I had ever read. If he’s upset with that, then have pity on me for the junk I write.

pschella · October 10, 2018, 4:55pm

Thank you for writing this. I fully agree. C++ is slowly morphing into a different language, albeit while retaining compatibility with older-style code, and it is still an open question whether it will be good for our purpose when (and if) it ever gets there.
I got the feeling (at CppCon 2017) that the wider C++ community cares about the inconsistency and complexity problem and is trying to develop some more cohesion, but it is not entirely clear (to me) yet if it will manage to do so.
Unfortunately I also share your frustration about no longer knowing what to recommend to new projects. Clearly C++ with Python (and pybind11) is not ideal. But I do believe it is currently still the best of a list of bad options.
I actually think the main problem is that it is far too easy to write slow code in Python, and what is needed is a new simple language that addresses this problem specifically for modular algorithmic code.
I was hoping this would be Julia, but it seems to have drifted away from us a bit.

parejkoj · October 10, 2018, 4:58pm

As someone who, until joining LSST, knew only enough to write C++ code that looked like C-with-classes, I really appreciate this post.

The “magic factor” (how likely it is that a not-total-newbie developer would be able to write acceptable code) of much of what is “good C++” code feels very high to me. Some of that is certainly due to inexperience on my part, and some is due to how quickly the language is evolving, but it definitely presents a high bar for new LSST developers. I care a lot about the project producing understandable, maintainable code, but requiring a very deep knowledge of the language to even begin to understand how some constructs work makes long term maintainability much harder. I wonder what some of the big data companies (Amazon, Google, Tableau) have to say about this topic? They can afford to pay C++ developers more, and the C++ pool is certainly much larger outside of astronomy.

I’m with you on not being sure of the “correct” answer: the astropy/cython model has a lot of things going for it in terms of code simplicity, but not all problems map well into that space, and memory constraints can become a major factor when working with our quantity of data. Plus the unfortunate situation with python threads/processes makes for unpleasant multiprocessing options.

parejkoj · October 10, 2018, 5:19pm

To clarify my “magic factor” comment a bit as I think about it more: when pair coding with Jim, I often find myself feeling simultaneously that the code we’re writing really is “good C++”, and that I’m always just on the cusp of fully understanding what it’s doing, but that I’ll never be able to get there. That makes me fear that either C++ is altogether too clever for its own long-term good, or that I’m just fundamentally not smart enough to ever totally grok it.

frossie · October 10, 2018, 5:39pm

Regarding the “what would you recommend to a future project” - if LSST was starting today I wouldn’t hesitate to recommend “do it all in Python”. Not because it is the right answer for everything right now (though sure, cython helps a bit) but because I do have the feeling something is coming (maybe rust will take off or Julia or something else entirely or something better in the cython space or some amazing infrastructural advance where it doesn’t matter or maybe python will evolve - who knows) and so the simplicity of a single popular language stack will make the introduction of a new language to that stack simpler down the road.

I suspect lambdas are here to stay whatever the language, and would take them out of the list of things that we want to protect regular developers from.

From the consumer point of view, you are right. When I was running telescope ops I could troubleshoot bugs in a number of languages, a far bigger list than I would feel comfortable implementing something from scratch in; I could fix a bug with the java code even though I am not a java programmer because I could read it, even though I couldn’t really write it. Our C++ code defeats me. This is not necessarily a terrible thing - LSST is a large enough project that one can’t expect a few people to “fix everything” and sometimes difficult problems have difficult solutions. My main worry is that we don’t have the minimum C++ possible (for understandable reasons, but there we are) so we have made the problem slightly worse than it needs to be.

adam · October 10, 2018, 5:48pm

I have a difficult time wrapping my head around Rust.

I’m a big fan of Go, because it’s like C with 30 years of experience about where C is fragile and with an emphasis on making life easy for the programmer rather than the compiler. But still: it’s like C, it’s strongly opinionated (in that there’s usually one obvious way to do things, there is One True Formatting Style which is what go fmt implements), and it makes it really easy to write straightforwardly maintainable code.

So I’d be interested to play with Cgo + Python bindings, to the degree that anything other than Python is actually needed in today’s computing world.

Yes, I know: no templates. Yes, I know: lots of boilerplate around error handling.

Yes, I know: in most programming languages the type system tells you when you’ve made a mistake. In Go, the type system tells you when Brian Kernighan made a mistake.

But it’s a small language that mostly fits in your head, like C, and very much not like C++.

kfindeisen · October 10, 2018, 5:59pm

A very interesting essay. As somebody who first learned to program in C++ before polishing his OOP credentials with Java, I can sympathize with some of the comments here (in particular, that while C++ has a lot of nice features, they are often handled clumsily for historical reasons).

For the question of “good code”, I still subscribe to the philosophy that the most important property a program can have is ease of modification (not least because it lets you add other desirable traits like user-friendliness or even speed). Simplicity is a big part of what makes code maintainable, and always has been, and that may well mean not relying on new features like rvalue references (though I disagree with @jbosch’s implication that “old-fashioned” coding style is necessarily simple).

However, part of the problem is deciding what counts as “needs experienced eyes” – for example, I personally consider noexcept no harder conceptually than declaring parameter and return types, but I’m used to thinking about code in terms of interfaces and contracts. To a programmer who doesn’t think of exceptions as part of a function’s API, noexcept would seem much less reasonable.
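
To make the “noexcept as part of the interface” point concrete, here is a minimal C++17 sketch. It relies only on standard facilities (the noexcept operator and std::is_nothrow_move_constructible); the functions scale and parse and the types Buffer and LegacyBuffer are hypothetical and exist only for illustration.

```cpp
#include <string>
#include <type_traits>

// Promises callers that no exception will escape, much as the signature
// promises a double is returned.
inline double scale(double x, double factor) noexcept { return x * factor; }

// Makes no such promise: std::stod throws on malformed input.
inline double parse(std::string const& text) { return std::stod(text); }

// The promise can be queried at compile time, like any other part of the
// signature. The operand of noexcept(...) is not evaluated.
static_assert(noexcept(scale(1.0, 2.0)), "scale is declared noexcept");
static_assert(!noexcept(parse("1.0")), "parse may throw");

// Hypothetical types showing that the same promise on a move constructor
// is visible to generic code through a standard type trait.
struct Buffer {
    Buffer() = default;
    Buffer(Buffer const&) = default;
    Buffer(Buffer&&) noexcept = default;
};

struct LegacyBuffer {
    LegacyBuffer() = default;
    LegacyBuffer(LegacyBuffer const&) {}
    LegacyBuffer(LegacyBuffer&&) {}  // user-provided and not noexcept
};

static_assert(std::is_nothrow_move_constructible<Buffer>::value,
              "Buffer advertises a non-throwing move");
static_assert(!std::is_nothrow_move_constructible<LegacyBuffer>::value,
              "LegacyBuffer does not");
```

Standard containers consult this kind of trait: std::vector, for example, moves elements during reallocation only when the move is known not to throw (or the type cannot be copied), and otherwise copies them.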

For the choice of project tooling, I think any time you try to combine multiple languages, you’re going to have problems at the interfaces, especially when those languages lie on different parts of the compiled vs. runtime spectrum. So I suppose for new projects I’d recommend picking one language that has most of the traits you want, and sticking with it. There’s no right tool for every job.

natelust · October 10, 2018, 6:00pm

As a comment, I have written a decent amount of both rust and go. I really really like both, but for different things.

Go to me seems like python cross c, which can be a great thing (especially for developer time), and I even hear go 2.0 might have some form of generics and better error handling but who knows.

Rust did take me a little bit to get my head around, simply because it is something “new”. I put new in quotes as to some extent nothing is new under the programming sun, but rust puts things together in a different sort of way than most languages. Once you stop thinking you want it to be a C variant, it makes a lot of sense, is a lot of fun, and is some of the most robust, elegant, safe, and fun code I have written. I have not done any wrapping of python in rust, but I have looked at it, and it seems to be what I would expect, and enjoy working with.

That said, I think both are great, and I am not for one over the other, or even endorsing either; I am just sharing my experience.

kfindeisen · October 10, 2018, 6:03pm

As for @jbosch’s concerns about where C++ is evolving and the ensuing debates over what should be recommended, I’d like to offer a bit of historical perspective: classes and objects were first introduced in a form we’d recognize in the 1960s. However, it took us literally decades to understand how to use them effectively: for example, the interface segregation principle, which allows object-oriented programming to actually create modular software, was introduced in the 80s or 90s (I’m having trouble pinning down a date), and the Liskov substitution principle, which says when inheritance is the right tool for a job, was developed 1988-1994.

Now that C++ seems to be specializing in relatively unexplored territory like template programming (and metaprogramming), I think we’re once again in a regime where available tools have moved past the theory that lets us really understand them. People will use these tools in the real world, they’ll find out what works and what was a really bad idea, and eventually they’ll formalize the results in new programming paradigms.

Do we want to be the early adopters who have to figure everything out the hard way? Not my call.

pschella · October 10, 2018, 6:27pm

Although I’m not convinced Go is the right programming language for most Science code I must say that writing in it (for a year or so) was a very enjoyable experience. It is highly intuitive and easy to learn. Almost every time when I had to do something new my guess as to how to write it would be correct first time and a quick glance at the (excellent) documentation would confirm it.
Moreover, it was designed from the ground up for scalability. Not just in code base size, but also in number of developers. Its structure and tooling take away a lot of the fuss and (pointless in a Go world) discussions about style.
However, Rust is cool too

pschella · October 13, 2018, 5:37pm

@kfindeisen, I like your suggestion of “picking one language that has most of the traits you want, and sticking with it”. However, when picking any interpreted language (such as Python) that is not actually possible (at least, not that I know). If we could get away with using only the bits that others wrote for us in a different language (e.g. use numpy, scipy and such, but don’t extend them) then that would be true. But I have yet to see a project where that worked for everything without great cost in speed at some parts. Now you could argue that those could be written in Cython or something, but then we are technically back to two languages again. What could be done, however, is to push the threshold for when to switch to a different language much further than we have, and really only do it when all other avenues for solving the problem have been exhausted and when a clear component to be accelerated has been identified (preferably one that can be reused in many places). To make it more Pythonic I think one of the things that then also has to be let go is the framework nature of things, in favor of having more, smaller, loosely coupled components.
One, perhaps extreme, way to encourage this in LSST would be to require an FAQ (or perhaps even something stricter) before anything can be written in C++ instead of the default Python.
Another direction to take would be to pick a language on the compiled end of the spectrum (C++, Java, Go, Rust, take your pick) and write everything in that. But I suspect that doing so, with C++/Java at least, will slow down progress for scientist-coders unless all needed components are somewhat known up front (e.g. rewriting an existing production pipeline into it would probably work fine, and may be a worthwhile endeavor).

pschella · October 13, 2018, 5:41pm

To add a quick thought: to enforce loose coupling, it could be required that any interface accessible from Python only accepts and returns either primitive types or ndarrays thereof.
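
As a rough sketch of what the compiled side of such a boundary might look like, assuming a hypothetical function mean_flux: only primitives and a raw (pointer, length) view of an array cross the interface, and any richer C++ stays behind it. The binding layer that would hand an ndarray's buffer to this function (ctypes, Cython, pybind11, or similar) is not shown.

```cpp
#include <cstddef>
#include <numeric>

// Hypothetical boundary function. The C linkage and the plain signature
// mean a caller needs to know nothing about C++ classes or templates.
extern "C" double mean_flux(double const* values, std::size_t n) noexcept {
    // Defined result for empty or missing input, so nothing throws
    // across the language boundary.
    if (values == nullptr || n == 0) return 0.0;
    // Implementation details (here just a standard algorithm) are free
    // to use any C++ features without affecting the interface.
    return std::accumulate(values, values + n, 0.0) / static_cast<double>(n);
}
```

The trade-off is the one discussed earlier in the thread: the interface is trivially easy to wrap and reason about, but the compiled code has only plain old data and raw array pointers to work with.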