Format C++ code using clang-format

fred3m · December 9, 2016, 12:18am

Let me know where to find that parrot! Perhaps “require” was a poor choice of words.

What I meant is that I would like an official LSST python and C++ linter script that is easy to setup on our machines that would make sure that every commit is formatted in accordance with our style guide, and that everyone is expected to adhere to the same standard. If they have their own linter or parrot that can do this, that’s great. Although without mandating the script (either on the developers machine or on the server), it still places the reviewer responsible for checking formatting, which is the situation I wish to avoid. At the very least I would like to see code pushed to master automatically formatted to take the onus off the reviewer.

As a reviewer it’s frustrating reviewing code that has an extra commit to fix linting errors that have nothing to do with the current ticket. If we pick a set of automatic scripts we can run them on the entire stack and because the script is run with every commit or push to master, the formatting will always be correct.

Otherwise, what happens if my cat eats their parrot?

jbosch · December 9, 2016, 3:16pm

One big thing we lose by doing automatic reformatting of existing code is the ability to easily do git blame. I’m not sure that concern outweighs the advantages that have been expressed here, but it makes me lean a little bit towards an incremental/new-code approach to using automatic formatting tools (and linters, for that matter).

pschella · December 9, 2016, 3:32pm

I agree, losing the ability to use git-blame would be sad indeed.

timj · December 9, 2016, 4:46pm

you wouldn’t lose it as such. The information is still there, but you’d need to trampoline through the history a bit. It’s a shame git blame doesn’t have an ignore whitespace changes option.

kfindeisen · December 9, 2016, 7:18pm

I imagine some auto-formatters are more stable (in the sense that, once formatted, it won’t make further changes to indentation etc.) than others. I have no idea how clang-format scores in that respect.

jbosch · December 9, 2016, 7:49pm

I had been assuming it was pretty stable, but I don’t actually know that. I was much more concerned about making it harder to git blame old code by reformatting even when we weren’t changing anything else.

fred3m · December 9, 2016, 8:10pm

To a certain extent this has already happened on the python side. When we did the python 3 port we ran autopep8 on all of the python code. Even if people are just reformatting the files they are currently working on this still has the same affect on the ability to run git blame as it would to reformat the entire stack. In the latter case it would just happen faster.

brianv0 · December 9, 2016, 8:38pm

I haven’t tested it, but git blame -w can supposedly ignore white space and git blame -w -M can detect line movement. This doesn’t help with blame in github but the UI is powerful in that case anyway.

parejkoj · December 14, 2016, 1:23am

Note that just as with python before we ran autopep8, our C++ almost certainly does not currently conform to our current coding standards. So, whether we automatically reformat or have people do it manually, there’s going to be a lot of code to cleanup to even the current standard.

Having pre-commit hooks would be nice, but doing it inline with editor linters is also extremely nice, and helps train you to write it correctly in the first place.

pschella · December 14, 2016, 5:51pm

To give some flavour of what things would look like, I ran two files from afw (Image.h and Image.cc) through clang-format in three configurations:

style=google
style=google but with 4 spaces (instead of 2) for indentation
style=file with a custom style (derived from the Google style) as close to LSST’s as I could make it without spending a lot of time on it

The results are attached:

google-4space-Image.cc (35.5 KB)
google-4space-Image.h (26.4 KB)
google-Image.cc (35.2 KB)
google-Image.h (25.9 KB)
lsst-Image.cc (35.5 KB)
lsst-Image.h (30.6 KB)
orig-Image.cc (34.5 KB)
orig-Image.h (28.2 KB)

jbosch · December 14, 2016, 6:03pm

Thanks, @pschella!

The actual styles seem perfectly fine to me. While they do differ from what we’re doing now, they’re no less readable, and certainly the custom-style one is similar enough to what we’re doing now that I don’t think it’ll lead to any slowdown in cognition, even we kept existing files as-is until we had cause to modify them.

What’s striking to me is how much apparently isn’t handled by clang-format, in that it’s apparently left inconsistent within these files (which were good choices, by the way, given how much internal inconsistency they started with!):

Sometimes the stuff within a namespace block is indented, sometimes it isn’t.
Sometimes there’s an extra newline between functions, sometimes there isn’t.

Do you know if these are just aspects that clang-format won’t ever pay any attention to? Or are they just things missing from the google style specification?

pschella · December 14, 2016, 6:46pm

The namespaces inconsistency is just an option (I think). But my quick search didn’t yet reveal an easy way to set the number of blank lines between function declarations. That does not mean it is impossible though.

pschella · December 14, 2016, 9:25pm

Just in case people want to try it out I attached the format file that closest matches the LSST style. To use it rename it to .clang-format (without the extension) and copy it into the top level directory of your source tree. Then run clang-format -i -style=file mycode.h.

When given the file option it will look for a .clang-format file in the current directory and then all the way up the tree.

clang-format-lsst.txt (2.7 KB)

rowen · December 14, 2016, 9:47pm

Thank you @pschella. I agree with @jbosch that these all look fine to me. I would like to see the effects on some pybind11 code containing lambda functions because that tends to have very long lines. But my suspicion is we can live with any of these.

For the record: I would rather have all our code cleaned up to be readable and uniform, rather than have a mishmash of old and new. Using flags with git blame seems a small price to pay for readability.

pschella · December 14, 2016, 10:22pm

So, I’d like to wrap this up in an RFC. In accordance with @swinbank’s and @fred3m’s comments I think we should add a style file to the DM guide (version controlled with the guide itself) and then change section 6 (concerning code layout) to read “code shall be formatted identically to that produced by running it through clang-format with the specified style file”.

timj · December 14, 2016, 10:23pm

I imagine you have to retain bits in the code layout part of the style guide that won’t be managed by clang-format (or make clear that the only rules are those enforced by clang-format).

pschella · December 14, 2016, 10:26pm

Yes, and I would go with the former option. Remaining rules are in addition to running clang-format (or having your parrot type the code in the same way).

swinbank · December 14, 2016, 10:56pm

Sounds wonderful to me.

(I wrote “great” first time, but I needed to be more effusive in my praise in order to meet the minimum character count!)

price · December 15, 2016, 2:52am

It looks like @pschella’s example LSST-style formatting puts spaces around the * operator, which is verboten for us.

If we put clang-format in a commit hook, how do we take care of the differences we have in our standard compared to what it produces?

What is involved in building clang-format? If it’s going to be run by the commit hook, we’re all going to have to build it.

ktl · December 15, 2016, 4:03am

Verboten for some of us (who have been pretty vocal and influential in the past). If these spaces can’t be disabled without significantly decreasing the utility of the tool overall, then I would lean towards giving up this LSST-unique rule.