This post is in a spirit somewhere between #software-dev and #dm-tea-time; it’s only tangentially relevant to current LSST DM work, and I’m not actually proposing any changes or actions here.
I’ve spent the last few years quite excited about all of the changes happening in C++ since the C++11 standard: the language is finally addressing a lot of long-standing issues and catching up with a lot of newer languages by providing features that are now standard elsewhere.
It wasn’t until a year or so ago, however, that I started putting in a concerted effort to improve my ability to use all of that new stuff via more than just learning-while-coding. I can happily say that other DM developers played a big role in nudging me out of what was actually some pretty serious complacency (particularly discussions with @natelust about new languages, with @pschella about new C++ language features, and with @kfindeisen about thread- and exception-safety). And after paying a lot more attention to a lot more resources on the web, the result is that I’m absolutely a much better C++ developer than I was two or three years ago; I think the last time I could honestly say that was probably in the late 1990s.
Unfortunately, I’m also less confident than I’ve ever been that the code I’m now writing is good code. That’s partly a statement about learning some humility, but it’s largely about not knowing what “good code” means in modern C++, especially in the scientific software world. Does it mean heavily using new language features (lambdas, rvalue references, noexcept, emplace, algebraic data types, …) that can make code safer, more performant, and more expressive for experienced eyes that can see past the awkward syntax and visual clutter? Or does it mean using those features only when there’s a demonstrated benefit relative to a simpler, more old-fashioned style? Does it mean using more functional programming idioms, since that’s where the language seems to be heading, even if that means it’ll be less comprehensible to most readers [at first]? All I’m sure of is that it means a lot more comments than I’m in the habit of writing right now, because someone reading my code is going to either be confused about what my code does or be confused about why I didn’t do it the recommended-by-expert-X way.
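A contrived illustration of the trade-off I have in mind (the Point type and both functions are invented for this post, and neither version is meant as a recommendation): both functions append a point and return the first x value past a threshold, but one leans on newer features while the other reads like C++98.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Point {
    Point(double x_, double y_) : x(x_), y(y_) {}
    double x, y;
};

// "Modern" style: emplace, auto, and a noexcept lambda with std::find_if.
double firstXPast(std::vector<Point>& points, double threshold) {
    points.emplace_back(1.0, 2.0);  // construct in place, no temporary Point
    auto const iter = std::find_if(
        points.begin(), points.end(),
        [threshold](Point const& p) noexcept { return p.x > threshold; }
    );
    return iter != points.end() ? iter->x : 0.0;
}

// Old-fashioned style: push_back and an explicit loop; longer, but there is
// nothing a reader has to look up.
double firstXPastClassic(std::vector<Point>& points, double threshold) {
    points.push_back(Point(1.0, 2.0));
    for (std::size_t i = 0; i < points.size(); ++i) {
        if (points[i].x > threshold) {
            return points[i].x;
        }
    }
    return 0.0;
}
```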
This is a pretty common concern in the broader C++ world recently; it’s not unrelated to Stroustrup’s recent Remember the Vasa essay, and I think it’s fair to say that it’s the problem the C++ Core Guidelines are being created to address. But over the past few months I’ve been reading the Core Guidelines front-to-back, and that has just left me more depressed about the state of C++ style; I learned a ton, but:
- I discovered a lot of areas where it seems the community of experts who know more than I do can’t decide what to recommend (for good reasons, not just bikeshedding);
- What they do recommend differs quite a bit from what anyone could have done before C++14 (and by extension a lot of what’s in the Meyers books), and in many places focuses on anticipating language features that won’t arrive until C++20;
- What they do recommend differs quite a bit from our own style guide, especially on topics we’ve updated since C++11 (some examples: C.2, C.20, ES.23);
- I cannot imagine a scientist or scientific software developer who doesn’t also treat C++ as a hobby learning all of this (e.g. the quite reasonable but extremely subtle distinctions rules R.32–R.36 draw about passing smart pointers, which also happen to differ from our own much-simpler-but-99%-ok recommendation on that topic; see the sketch after this list).
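To make that last example concrete, here is my paraphrase of the kind of distinction R.32–R.36 draw (the Widget type and function names are mine, not the Guidelines’): the parameter type alone is supposed to document the ownership contract, and the differences are easy to get wrong.

```cpp
#include <memory>

struct Widget {};

// Each signature is meant to say something different about ownership:
void sink(std::unique_ptr<Widget> w);                 // R.32: callee takes ownership
void reseat(std::unique_ptr<Widget>& w);              // R.33: callee may reseat the pointer itself
void share(std::shared_ptr<Widget> w);                // R.34: callee takes shared ownership
void reseatShared(std::shared_ptr<Widget>& w);        // R.35: callee may reseat the shared pointer
void maybeRetain(std::shared_ptr<Widget> const& w);   // R.36: callee might keep a reference count

// ...and none of the above applies when the callee just uses the object:
void justUse(Widget const& w);
```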
That last point is the big one. I didn’t bring up the Core Guidelines because I think we need to update our style guide to improve compatibility with them (though perhaps we should); I brought them up because I’m really starting to doubt that C++ can be safely and scalably used by big scientific projects, given the “more interested in science/algorithms” profile of the developers we can realistically hire and retain.
When you also factor in the disaster that is hybrid C++/Python build/package/distribute tooling, I’m really not sure how I’d approach a project like ours anymore if I were starting from zero. I used to be an unabashed advocate of our style of hybrid Python/C++, in which you write the core data structures and primitives in C++ and wrap them so developers in both languages can use them, motivated by the notion that even if most algorithmic code isn’t in C++, the code that is in C++ is important and difficult to get right, so it’s worth the effort to give it access to nice data structures and primitives. But that involves a lot more people writing C++ than the Astropy/Cython approach, where the compiled code is minimized and has only plain old data and raw array pointers to work with. I’m still pretty confident that the latter is not for me: figuring out how to deliver “zero-cost abstractions” and “don’t pay for what you don’t use” (two core C++ development philosophies) when solving those important, difficult, not-for-Python problems is frankly a big part of what makes development fun for me. But I’m no longer at all certain that the approach I like is a good one for most people or projects.
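For concreteness, this is roughly what the “wrap the C++ core” model looks like at the binding layer. The Image class below is a made-up stand-in for a real primitive, and I’m using pybind11 syntax only as one example of such a layer; the specific binding tool isn’t the point.

```cpp
#include <pybind11/pybind11.h>
#include <vector>

namespace py = pybind11;

// A core data structure implemented once in C++...
class Image {
public:
    Image(int width, int height) : _width(width), _height(height), _pixels(width * height, 0.0) {}
    double get(int x, int y) const { return _pixels[y * _width + x]; }
    void set(int x, int y, double value) { _pixels[y * _width + x] = value; }
    int width() const { return _width; }
    int height() const { return _height; }
private:
    int _width, _height;
    std::vector<double> _pixels;
};

// ...and exposed to Python so algorithm code in either language can use it.
PYBIND11_MODULE(example, m) {
    py::class_<Image>(m, "Image")
        .def(py::init<int, int>(), py::arg("width"), py::arg("height"))
        .def("get", &Image::get)
        .def("set", &Image::set)
        .def_property_readonly("width", &Image::width)
        .def_property_readonly("height", &Image::height);
}
```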
That’s probably a moot point for LSST Construction: I think it’s too late to change tack completely now, and I’m by no means convinced that the Astropy/Cython approach would have been a better one for us, even if we’d adopted it years ago. The grass looks pretty brown on both sides of this fence.
But I’m bothered by not knowing what I’d recommend for new projects, in part because LSST is going to last a while and a lot of the code we have now may end up being replaced at some point; I’d like to have a sense that any such replacement would be moving in the right direction. It also bothers me because I used to be sufficiently confident in the heavy-C++-with-Python model that I put what limited free open-source dev time I had into making it better (via ndarray), and I’m just not excited about doing that anymore.
Maybe the C++ standard will settle down after C++20, and the Core Guidelines (and associated tooling) will improve. Maybe the answer for future projects is hybrid Rust/Python, which I find intriguing but have only barely played with. Maybe the answer involves Julia, Go, Swift, or something in JVM-land, none of which I’ve explored well enough to fairly judge (though I’ll admit to a possibly-irrational bias against any language that treats “within a factor of a few of C” as a success).
Anyhow, sorry about all that bad C++ code I wrote years ago. And sorry about all the better-but-still-bad code I’m probably writing now. I’m curious to hear how you all think we can make it better.