Format C++ code using clang-format

I haven’t tested it, but git blame -w can supposedly ignore white space and git blame -w -M can detect line movement. This doesn’t help with blame in github but the UI is powerful in that case anyway.

Note that just as with python before we ran autopep8, our C++ almost certainly does not currently conform to our current coding standards. So, whether we automatically reformat or have people do it manually, there’s going to be a lot of code to cleanup to even the current standard.

Having pre-commit hooks would be nice, but doing it inline with editor linters is also extremely nice, and helps train you to write it correctly in the first place.

To give some flavour of what things would look like, I ran two files from afw (Image.h and Image.cc) through clang-format in three configurations:

  • style=google
  • style=google but with 4 spaces (instead of 2) for indentation
  • style=file with a custom style (derived from the Google style) as close to LSST’s as I could make it without spending a lot of time on it

The results are attached:

google-4space-Image.cc (35.5 KB)
google-4space-Image.h (26.4 KB)
google-Image.cc (35.2 KB)
google-Image.h (25.9 KB)
lsst-Image.cc (35.5 KB)
lsst-Image.h (30.6 KB)
orig-Image.cc (34.5 KB)
orig-Image.h (28.2 KB)

1 Like

Thanks, @pschella!

The actual styles seem perfectly fine to me. While they do differ from what we’re doing now, they’re no less readable, and certainly the custom-style one is similar enough to what we’re doing now that I don’t think it’ll lead to any slowdown in cognition, even we kept existing files as-is until we had cause to modify them.

What’s striking to me is how much apparently isn’t handled by clang-format, in that it’s apparently left inconsistent within these files (which were good choices, by the way, given how much internal inconsistency they started with!):

  • Sometimes the stuff within a namespace block is indented, sometimes it isn’t.

  • Sometimes there’s an extra newline between functions, sometimes there isn’t.

Do you know if these are just aspects that clang-format won’t ever pay any attention to? Or are they just things missing from the google style specification?

The namespaces inconsistency is just an option (I think). But my quick search didn’t yet reveal an easy way to set the number of blank lines between function declarations. That does not mean it is impossible though.

Just in case people want to try it out I attached the format file that closest matches the LSST style. To use it rename it to .clang-format (without the extension) and copy it into the top level directory of your source tree. Then run clang-format -i -style=file mycode.h.

When given the file option it will look for a .clang-format file in the current directory and then all the way up the tree.

clang-format-lsst.txt (2.7 KB)

Thank you @pschella. I agree with @jbosch that these all look fine to me. I would like to see the effects on some pybind11 code containing lambda functions because that tends to have very long lines. But my suspicion is we can live with any of these.

For the record: I would rather have all our code cleaned up to be readable and uniform, rather than have a mishmash of old and new. Using flags with git blame seems a small price to pay for readability.

So, I’d like to wrap this up in an RFC. In accordance with @swinbank’s and @fred3m’s comments I think we should add a style file to the DM guide (version controlled with the guide itself) and then change section 6 (concerning code layout) to read “code shall be formatted identically to that produced by running it through clang-format with the specified style file”.

3 Likes

I imagine you have to retain bits in the code layout part of the style guide that won’t be managed by clang-format (or make clear that the only rules are those enforced by clang-format).

Yes, and I would go with the former option. Remaining rules are in addition to running clang-format (or having your parrot type the code in the same way).

Sounds wonderful to me.

(I wrote “great” first time, but I needed to be more effusive in my praise in order to meet the minimum character count!)

It looks like @pschella’s example LSST-style formatting puts spaces around the * operator, which is verboten for us.

If we put clang-format in a commit hook, how do we take care of the differences we have in our standard compared to what it produces?

What is involved in building clang-format? If it’s going to be run by the commit hook, we’re all going to have to build it.

Verboten for some of us (who have been pretty vocal and influential in the past). If these spaces can’t be disabled without significantly decreasing the utility of the tool overall, then I would lean towards giving up this LSST-unique rule.

That rule also causes us problem with PEP8 in Python.

Yes, but at least the tool can be configured to disable the space addition. It wasn’t clear that clang-format could be configured that way (I looked at the style file but not the documentation).

In the python case, E226 disables application of the “whitespace around operators” rule for all operators, not just * and /.

I’d be more than happy to drop the spaces-around-operators rules if we switch to automatic formatting.

If we don’t switch to automatic formatting, I’d still like to remove them or at least downgrade them to recommendations; in some contexts adding spaces around * and / can greatly improve readability, especially when there is no + or - in the expression or the operands themselves involve operators like [].

4 Likes

It doesn’t look like it is possible to disable spacing around specific operators with clang-format. I’m sure we could patch the tool to do it and commit those changes upstream (or build our own tool with libformat). But it looks like none of the major C++ styles have this rule, so I personally would prefer dropping it.

1 Like

I would be sorry to see the rule about spaces around operators be simply discarded out-of-hand. I won’t rehash the same old arguments, but merely say that I think it’s important for the readability of our scientific code. I will admit that there are times when being flexible about it makes things more readable (exceptions that prove the rule?), but I would hate to see it broken completely and forever.

Yes, but it has the side effect of not fixing things at all if they are not compliant with our standard.