Fixing PEP8 technical debt at scale with sqre-codekit/autopep8

jsick · March 27, 2016, 4:20am

In RFC-162 we determined the set of PEP8 exclusions that are implied by our existing DM Python Style Guide (thanks @parejkoj, @timj and @frossie!). As @rowen noted, now that we can use linters in our editors for DM python code (and soon we’ll have automated linting of pull requests), there is a large amount of technical debt associated with giving our codebase a clean PEP8 checkout. At this point, PEP8 fixes must be made before any new code can be merged.

autopep8 gives us a way of fixing PEP8 issues against the DM Style Guide:

autopep8 . --in-place --recursive \
   --ignore E133,E226,E228,E251,N802,N803,W391 \
   --max-line-length 110

Given the large number of DM GitHub repos, we can take this automation one step further. In sqre-codekit we developed the infrastructure to clone all DM repos, make and commit code changes, and push those changes to a new branch from a single command. I’ve extended codekit to run autopep8:

lsst-autopep8 -u jonathansick --org lsst --team 'Data Management' \
    --branch tickets/DM-NNNN

(alternatively you can run lsst-autopep8 for a single repo with a --repo argument).

See the code at https://github.com/lsst-sqre/sqre-codekit/pull/3.

An example of what autopep8 does to LSST repos can be found in the https://github.com/jonathansick-shadow test organization (see u/jonathansick/autopep8 branches; here’s afw in particular: https://github.com/jonathansick-shadow/afw/commit/660287ac88f56b5a8e0b506364b3845611f2962c)

Given this:

Do we want to take an automated approach to fixing our PEP8 technical debt? If so,
What is the best way to coordinate the execution of lsst-autopep8 so that there are as few merge conflicts as possible for those with existing tickets in these repos?

Note that lsst-autopep8 only creates a new branch with PEP8 fixes; a manual merge step is still required. One can even imagine pushing additional formatting changes to the branch after autopep8 has done the initial work.

timj · March 27, 2016, 4:37am

It is tempting to take the opportunity to do the license fixes and futurize -1 on the same timescale.

Coordinating the changes for minimal impact with existing ticket branches may be a thorny problem. Ideally we’d sneak it in at the end of one ticket branch and before someone else starts something. That might be feasible for some repos but not all. On the plus side, since it’s automated doing the fixes again on the few ticket branches before merging might be doable.

parejkoj · March 27, 2016, 5:46am

Yes, yes, a thousand times yes.

Is this another “wait until the HSC merge is complete”, thing?

timj · March 27, 2016, 5:54am

That’s pretty much done so no worries there.

jsick · March 27, 2016, 6:33am

Perhaps the best approach is

Poll everyone on what repos they’re actively developing for,
Run autopep8, RFC-45 licensing, and futurize -1 on ‘dormant’ repos and get those merges in.
Work with primary developers on a case-by-case basis to get these changes made to the active repos. Certainly we could ask for autopep8 to be run at the end of ticket branches.
Have some sort of record keeping (ideally we can CI the PEP8/license stuff automatically) to ensure these changes make it by the end of the X2016 cycle.

jsick · March 27, 2016, 7:19am

Brain-dump of other things that could benefit from a mass codebase edit cycle:

Adding a CONTRIBUTING template
Add setup.cfg with PEP8 exception list
Add doc/xml to .gitignore

timj · March 27, 2016, 3:41pm

If we are going to bulk convert I’d probably just remove W391 from the ignore list as well. The main reason for adding it was not existing coding style but prevalence in code base.