Format C++ code using clang-format

pschella · December 7, 2016, 6:26pm

Currently our C++ style guide (https://developer.lsst.io/coding/cpp_style_guide.html) has lots of rules relating to formatting. Following these rules consistently takes time. When it is not done consistently the code looks different resulting in a higher cognitive load when reading. Moreover, digesting and the style guide itself also costs time.

In other languages (namely Go) it is common practice to let a tool (e.g. gofmt) handle all code formatting. From my own experience I can say this is extremely convenient. All code will always automatically be formatted exactly the same, and complying with the standards costs zero time. It can even be integrated with an editor save hook to automatically run the formatter when saving a file. This allows you to not worry at all about things like indentation, include ordering, etc.

For C++ the clang-format (http://clang.llvm.org/docs/ClangFormat.html) tool enables this for C++. It has standardized profiles complying with the LLVM, Google and Mozilla coding styles and a flexible configuration system (http://clang.llvm.org/docs/ClangFormatStyleOptions.html) to further specialize those styles. If more control is required the libformat offers that as well.

I would like to suggest that we use this tool to format all our C++ code. Following the recent change of the Python style guide to be a diff against PEP8, the C++ style guide can then just be a diff against whichever style we adopt for clang-format.

Based on the feedback here I would like to RFC this suggestion.

jbosch · December 7, 2016, 6:39pm

Do you have any experience with clang-format itself? This seems like a good idea in principle, and my only concern is whether it’s intelligent enough in practice to make the code as readable as we’d like. I could certainly believe that automatic formatting of go could be an intrinsically easier problem to solve than automatic formatting of C++.

pschella · December 7, 2016, 7:38pm

I have only limited experience using clang-format but from what I have seen it works fine. However, it depends on your expectation.

Because it uses clang to parse the code, it will format the code correctly (i.e. will not break it). However, if you expect it to format code exactly like you would have done it by hand, you will be likely be disappointed. For instance, it will not pick a different allignment style for things like matrices or something as one would perhaps have chosen by hand (although it is possible to disable formatting of a block with a comment directive). Moreover, although configurable, it is probably difficult to reproduce exactly the rules in the current style guide. We could, with the library, but that would in my opinion be a waste of effort. Better pick a base style with some minor tweaks (which I’m aware is in itself a change that some people might object to).

However, the same issues arise with gofmt. Anyone using that at first finds some things they don’t like, or some corner cases where they think different formatting would look better. But the big thing is that you don’t have to worry about it. It will immediately end all debates on formatting and all decision making associated with it while coding. In the end, just being able to write code not thinking about alignment and seeing it all line up consistently (even if it is ugly at times) on save is really really nice.

So, will any individual file look as readable as a handcrafted version? No.
But I think the consistency in itself makes the codebase as a whole more readable.
Moreover, the time gained by not having to think about formatting while coding is (IMHO) substantial.

parejkoj · December 7, 2016, 7:53pm

I very strongly support this idea. I think the first step, though, would be to do the same thing we did with python: figure out what our style guide is mostly closely related to, rewrite it as a diff against that, and then pick a formatter. I don’t know enough about C++ formatters to have an opinion on clang-format vs. others.

The hard part is figuring out what “standard” format our style guide is most similar to and re-writing it as a diff against that. Here’s how we did it for python, using pep8 as the base: https://jira.lsstcorp.org/browse/RFC-162

rowen · December 8, 2016, 12:13am

I agree. I’d love to have a formatter, but I’d like to make sure it is a good one that produces readable code. The pybind11 wrappers seem to produce really long lines that might be just the thing for turture-testing a few candidates or configurations. If we can figure out a configuration for clang-format we are happy with then I’m all for this suggestion.

pschella · December 8, 2016, 4:16pm

For pybind11 I am already quite happy with what clang-format -i -style=Google produces. Feel free to try it out.

pschella · December 8, 2016, 4:40pm

@parejkoj, we could follow that approach. However, unlike what was the case with Python, I suspect that this will be very hard to do in practice given the complexity of the language and the diversity in styles. Compounded by the fact that there isn’t such a thing as the “one true standard” (i.e. PEP8) for C++. Reading all the available style guides alone will be a big task. Coming up with some way to evaluate the relative closeness even more so.

On the plus side, choosing any style even if it is radically different from our current one isn’t a problem. Code reformatting is done fully automatically, and complying with the new style is always just one carriage return away.

Therefore I would suggest that we don’t evaluate styles based on how close they are to our current style, but rather on how sane and readable the resulting code is. And possibly, but not necessarily, on how widely used it is in industry or the open source community. As well as how easy it is to use with whatever tool we pick.

Note that when I say “style”, I don’t mean adopting the full style guide (of say Google). I mean just adopting the part that concerns formatting and is covered by the automatic tool. We can, and should, still have our own style guide for everything else. Basically the tool would replace section 6 (layout and formatting) from the DM C++ Style Guide.

The advantage that clang-format has over other code formatters is that it uses the same parsers as clang and is thus far more likely (I think) to cover all language complexities now and in the future.

price · December 8, 2016, 5:06pm

Our current style was chosen because we find it sane and readable. I like this idea if and only if the style it produces is very similar to what we currently have.

fred3m · December 8, 2016, 6:19pm

I am also a strong supporter of this idea. My only request is that this should be run automatically, such as with a githook. One of the things I don’t like about running a linter in python is that if I make several commits and forget to run the linter then I have a commit that is just formatting code, which is (IMO) highly undesirable. Ideally we would require that both autopep8 and clang-format run automatically every time we make a commit, so that each commit is guaranteed to be formatted properly.

parejkoj · December 8, 2016, 6:37pm

A solution to that is to have the linter run by your editor every time you hit save.

fred3m · December 8, 2016, 6:45pm

Not all editors have linters (and not all editor linters are created equal). I’m thinking that we could have a set of githooks that LSST developers are required to have on their machines.

timj · December 8, 2016, 7:44pm

Can @pschella send round some example formatted code so we can have a look. You might like it.

swinbank · December 8, 2016, 10:31pm

Please mandate outcome, not implementation. If you want to configure your system with a set of hooks, that’s great; if I achieve the same by teaching my performing parrot to retype my code, that should also be great, as long as we get the same result.

fred3m · December 9, 2016, 12:18am

Let me know where to find that parrot! Perhaps “require” was a poor choice of words.

What I meant is that I would like an official LSST python and C++ linter script that is easy to setup on our machines that would make sure that every commit is formatted in accordance with our style guide, and that everyone is expected to adhere to the same standard. If they have their own linter or parrot that can do this, that’s great. Although without mandating the script (either on the developers machine or on the server), it still places the reviewer responsible for checking formatting, which is the situation I wish to avoid. At the very least I would like to see code pushed to master automatically formatted to take the onus off the reviewer.

As a reviewer it’s frustrating reviewing code that has an extra commit to fix linting errors that have nothing to do with the current ticket. If we pick a set of automatic scripts we can run them on the entire stack and because the script is run with every commit or push to master, the formatting will always be correct.

Otherwise, what happens if my cat eats their parrot?

jbosch · December 9, 2016, 3:16pm

One big thing we lose by doing automatic reformatting of existing code is the ability to easily do git blame. I’m not sure that concern outweighs the advantages that have been expressed here, but it makes me lean a little bit towards an incremental/new-code approach to using automatic formatting tools (and linters, for that matter).

pschella · December 9, 2016, 3:32pm

I agree, losing the ability to use git-blame would be sad indeed.

timj · December 9, 2016, 4:46pm

you wouldn’t lose it as such. The information is still there, but you’d need to trampoline through the history a bit. It’s a shame git blame doesn’t have an ignore whitespace changes option.

kfindeisen · December 9, 2016, 7:18pm

I imagine some auto-formatters are more stable (in the sense that, once formatted, it won’t make further changes to indentation etc.) than others. I have no idea how clang-format scores in that respect.

jbosch · December 9, 2016, 7:49pm

I had been assuming it was pretty stable, but I don’t actually know that. I was much more concerned about making it harder to git blame old code by reformatting even when we weren’t changing anything else.

fred3m · December 9, 2016, 8:10pm

To a certain extent this has already happened on the python side. When we did the python 3 port we ran autopep8 on all of the python code. Even if people are just reformatting the files they are currently working on this still has the same affect on the ability to run git blame as it would to reformat the entire stack. In the latter case it would just happen faster.