Requesting Comments for Design Documentation Format for DM

We’re going to be updating the DM design documentation substantially over the next few months. In order to manage these documents in a controlled manner, storing them in GitHub repositories and accepting changes via pull request seems like a good strategy, making Word and other binary formats undesirable. We also need to link directly to a section of the (rendered) document (e.g. from a JIRA issue).

The DPDD and Applications design document (LDM-151) are currently written in LaTeX. Might it be better to write them in GitHub Flavored Markdown to get instant rendering and linkable section titles? I think the lack of rendered math, numbered sections, footnotes, etc. makes this infeasible. GitHub can also render reStructuredText and AsciiDoc, but I think GitHub’s HTML postprocessor strips <script> tags and thus breaks math rendering, no matter what the input language.

I hate to be the person with a hammer who sees lots of nails, but would you consider reST and Sphinx? With http://readthedocs.org we can have a GitHub commit hook that automatically refreshes the document’s HTML rendering.

Sphinx can certainly do math, numbered sections, and footnotes/sidenotes. Sphinx is also great at providing anchor links at each section so it’s easy to deep-link into a single page. Given this is how our technical docs will (probably) be made, it may make sense to use the same ecosystem for the design document.

A commit hook sounds like just the right solution to get dynamic rendering with full capability, as long as we can get it to work nicely with branches, not just master.

RTD provides multiple built versions that are easy to switch between from the build page itself, see: http://read-the-docs.readthedocs.org/en/latest/features.html?highlight=branch#versions We can have built versions for the HEADs of any branch we specify, or for any tags we specify, in addition to master. For an example, see the bottom right corner of http://docs.astropy.org/en/stable/

If you want, I can bootstrap a Sphinx repo + RTD account for you. It should only take an hour to get an MVP going. Sound good, @ktl and @frossie?

How difficult would it be to translate https://github.com/lsst-dm/dm_applications_design/blob/master/DM_Applications_Design.tex into reST for us to try things out?

Not hard at all (maybe pandoc can even do it automatically as a first cut). I can seed the LDM-151 repo today. What do you want the repo to be called and in what org? (or just create a GitHub repo and I’ll use it)
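
As a rough sketch of that first cut, assuming pandoc is installed and on the PATH, the conversion could be scripted as below. The helper names and the output file name are made up for illustration; `-f latex`, `-t rst` and `-o` are standard pandoc options. Custom macros, figures and citations will probably still need cleanup by hand.

```python
import shutil
import subprocess


def pandoc_command(tex_path, rst_path):
    """Build the pandoc argument list for a LaTeX -> reST conversion."""
    return ["pandoc", "-f", "latex", "-t", "rst", "-o", rst_path, tex_path]


def convert(tex_path, rst_path):
    """Run pandoc if it is available; return True if the conversion ran."""
    if shutil.which("pandoc") is None:
        return False  # pandoc is not installed, so nothing was converted
    subprocess.run(pandoc_command(tex_path, rst_path), check=True)
    return True
```

For the file discussed here, the call would be something like `convert("DM_Applications_Design.tex", "DM_Applications_Design.rst")`, where the output name is hypothetical.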

(oh, silly comment, I’ll try Sphinx out as a branch of dm_applications_design). Consider it done.

Can @mjuric merge my pull request first? :smile:

The main worry I have about keeping things in $\LaTeX$ is the potential difficulty of linking to sections of the rendered document. We did some experiments that seemed to show that Firefox could link to a section of a PDF but that Safari could not. I believe the Markdown/reST world is much better about that. Perhaps we can do $\LaTeX$/HTML in addition to $\LaTeX$/PDF?

While this is a very worthwhile long-term discussion/brainstorming to have, there are so many other documents (and readers/contributors) that these documents will flow into (or have to be read by/contributed to), and that are all in LaTeX, that I’d be extremely hesitant to change that right now.

For example, the DPDD and the apps design doc (LDM-151) texts need to feed ~10 or so (white)papers that the PST has identified will need to be written (which will all be in LaTeX). The DPDD will also be (informally) RFC-ed with the science collaborations, with a request for patches. There are also many “traditional” scientists in those groups – not everyone knows reST, but LaTeX is well understood. Also, we’ve already had fairly successful examples of using LaTeX+git to develop this level of document (from the SRD, to the DPDD itself, to the science book, and most recently to the cadence whitepapers last week). Other than occasional annoyances and a few issues (e.g., deep-linking into sections), LaTeX generally worked well.

Therefore, while there may be a better solution out there for developing these types of documents, I don’t think we know enough about the problem (or to what extent we even have one) to go ahead and develop solutions. I also fear that making such changes right now (we’re tool builders – we love to build tools) will distract us from getting the content in (which is the real burning issue).

So let’s stick to LaTeX for now, learn more about our usage patterns, and evolve the solution as we gather more experience.

Ah, I see – my notes say you wanted to use Markdown for the Middleware and Infrastructure design docs, not the Apps ones. I shouldn’t have gone by memory. (But I’m still very concerned about being able to link.)

“Wanted” is probably not the right word – I think the sense of my comment was that I didn’t want to force LaTeX onto a subculture that doesn’t already expect/use it.

I think the real discussion there (and the driver for thinking about other markup languages, Markdown included) was whether to continue using MS Word, as one can’t easily PR/diff Word documents on GitHub.

Anything that lets us get away from Word Track Changes to Git is a good thing in my opinion. LaTeX can be fine, and tex4ht is excellent these days (and I think it may support proper linking to sections in the HTML output).

Sorry for the delay, but here’s a first cut of LDM-151 rendered as HTML via reST/Sphinx on readthedocs:

http://dm-applications-design.readthedocs.org/en/tickets-dm-3546/

The source is at:

I’ll leave the judgement of whether this should be the medium of choice for DM design documents to others, but I will say it was a useful Sphinx exercise for me.

The version on readthedocs uses RTD’s default theme. If you don’t like what it looks like, rest assured that it can be changed if we decide to invest time into design work. If you build the docs locally (see the README.md for instructions), you’ll see the docs rendered with a different theme.

I ran into a few interesting issues with Sphinx:

  • Sphinx can’t support bold+italics natively. The work-around would be to add semantics that create a <span> tag in the HTML with a class that can be styled as bold+italic, but I haven’t done this yet.
  • Section numbers are a bit tricky to do, surprisingly. There is a set of extensions for thesis styling (https://github.com/jterrace/sphinxtr/tree/master/extensions) but I haven’t tried that out.
  • Deep link anchors can either be automatically named or manually specified via a .. _my_label: directive appearing before the content/figure.
  • Sphinx has nice glossary support with the :term: role that auto links to the glossary.
  • I’d also like better BibTeX/natbib integration if we’re going to use Sphinx as a LaTeX replacement. Again, some research into what was done in https://github.com/jterrace/sphinxtr might be useful.
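
A couple of the points above map onto a handful of `conf.py` settings. This is a minimal sketch, not the configuration used for the demo: the `bolditalic` role name and the `custom.css` file are hypothetical, and `html_css_files` assumes a reasonably recent Sphinx release.

```python
# conf.py (fragment): hypothetical settings, not the demo's actual configuration

# Render math in the HTML output via MathJax.
extensions = ["sphinx.ext.mathjax"]

# Define a custom interpreted-text role in every source file.
# :bolditalic:`text` is then emitted as <span class="bolditalic">text</span>.
rst_prolog = """
.. role:: bolditalic
   :class: bolditalic
"""

# Ship a stylesheet that styles the span, for example:
#   .bolditalic { font-weight: bold; font-style: italic; }
html_static_path = ["_static"]
html_css_files = ["custom.css"]  # hypothetical file placed under _static/
```

The role only attaches a class to the text; the bold+italic appearance comes entirely from the stylesheet, so it affects the HTML build and not other output formats.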

There are also a couple of bugs in readthedocs that we’ve found with this experiment.

  • The ‘Edit in GitHub’ link is broken. This is because RTD converts ‘/’ in branch names to ‘-’ (presumably because they actually store the versions in directories on a filesystem), and the RTD JS forgets that the GitHub branch name may not correspond to their version name. Hence the GitHub edit button works for ‘master’ but not for ‘tickets/DM-3546’.
  • The RTD sidebar TOC JavaScript doesn’t seem to handle the document well; all document levels are expanded when they shouldn’t be.
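
The branch-name mismatch behind the broken edit link can be illustrated with a small sketch. The mapping below is a guess inferred only from the one example in this thread (`tickets/DM-3546` shows up as `tickets-dm-3546` in the URL above); it is not RTD’s actual code, and the function names are hypothetical.

```python
def guess_rtd_version(branch):
    """Guess the RTD version name for a Git branch.

    Assumed rule: lowercase the name and replace '/' with '-'.
    """
    return branch.lower().replace("/", "-")


def edit_link_would_break(branch):
    """Return True when the version name differs from the branch name.

    In that case a GitHub link built from the version name points at a
    branch that does not exist.
    """
    return guess_rtd_version(branch) != branch


print(guess_rtd_version("tickets/DM-3546"))      # tickets-dm-3546
print(edit_link_would_break("master"))           # False
print(edit_link_would_break("tickets/DM-3546"))  # True
```

Since the mapping loses information (case and the ‘/’ separator), the edit link would have to be built from the stored Git branch name instead of the version name.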

Overall, the issues you see in the online document probably can be fixed with a bit of thought. This wouldn’t be a blocker since I’ll need to solve similar problems for the technical docs.

I’m surprised at how little math there is in the document. The TOC bar doesn’t seem to work properly for me (appears flat rather than expanding/collapsing). But overall, quite nice.

If we can link to an HTML rendering of $\LaTeX$, then we may not need this.

You’re right, the JavaScript that readthedocs’ theme uses to expand/collapse the sidebar TOC is quite broken.

It’s true; tex4ht may be the way to go to stay tex-native. It is a bit of work to make Sphinx feature-complete for replicating what LaTeX is good at (especially bibliography generation).

So I have two real problems with @mjuric’s argument.

One is the claim that the DPDD, which as far as I know is a change-controlled document, will somehow be arbitrarily changed by senior scientists (who are somehow more conversant with LaTeX than Word) because it “flows” into other documents. Surely the fact that reST is a very simple markup format makes it easier to cut and paste into Word or LaTeX or PowerPoint or whatever they want? Moreover, I don’t even understand the “flow” part, to be honest; surely people will just refer to it via LDM or DOI (if @timj has his way) in papers.

The other is the claim that the science collaborations as a whole will prefer LaTeX to the default Python markup. This one is incredibly easy to resolve: now that we have two examples up, I’ll just ask Beth to run a poll on her collaboration (I assume someone will grant me that one collaboration is a reasonable sample and that one’s markup preference does not change with physical scale).

I’m not a senior scientist [1], but I am on the hook for making substantial changes to this document over the next month (and, presumably, less substantial editing into the future). For what it’s worth, I have a fairly mild preference for working in LaTeX over reST. If there were a compelling argument for using the latter, I wouldn’t object, but I’ve not seen one in the above.

Tangentially: if we were seriously going to invest effort on tooling for documents like this, I’d start by questioning how we can make them as reusable as possible. We wouldn’t accept the same level of copy & paste duplication and lack of provenance tracking in our code as we do in our documents, and that’s something I’d really like to fix, but it would take more than simply changing the format and is clearly out of scope here.

[1] Actually, according to the spreadsheet that Jeff gave me, I appear to be an Applications Manager, a Sr. Software Engineer, and a Sr. Scientist all at the same time. But other than marvelling at my ability to wear multiple hats simultaneously, I don’t think that’s relevant.

Yes, that’s the content problem (though it’s also partly because we refer to papers or other documents with more in-depth descriptions, where available).

Sorry if I described this too clumsily – there will definitely be no arbitrary changes to these documents; everything will go through the normal change-control process, where changes are proposed, debated, and accepted. My desire, however, is to make this process into, essentially, a pull request – and using a markup language that git can deal with helps there (as opposed to Word).

I also want it to be possible (and easy) for anyone in the scientific community to propose changes (though my expectations about the number of such proposals are appropriately realistic). Using a widely understood format (LaTeX) lowers the barrier to entry. Writing in LaTeX is also expected; though I think it’s far from perfect, in the astro community LaTeX is the lingua franca of communication for well thought-out ideas (and I acknowledge it may be different elsewhere, as I clarified in my reply to @ktl ).

Regarding the other part of your question – it’s about the reuse of the actual text (and the ability to import it from elsewhere). A few examples:

  • The DPDD (and LDM-151) are generally thought of as being difficult to understand and read. This is based on ~2 years of experience and anecdotal feedback, and feedback from our own PST (and @ivezic and @RHL can chime in here). A large part of it seems to be that these are long, fairly dry, technical documents. I’ve been asked to develop a series of smaller, more focused documents (papers, really) aiming to explain how the LSST data products will enable science in particular areas (e.g., “Minor bodies of the Solar System”, “Resolved stellar populations”, “Microlensing with LSST”, …). To do this, @ctslater and I will rely heavily on the actual text from the DPDD and LDM-151 (and, vice versa, we’ll move some of the text from these papers into the DPDD, as appropriate). Those papers have to be in LaTeX (they will be published, as proceedings or otherwise); keeping the same format for the source material will be very helpful to us.

  • We are very often asked to write (or contribute to) short whitepapers for various venues (e.g., a recent public example is the LSST/Euclid/WFIRST joint reprocessing whitepaper). As above, keeping our sources in LaTeX makes this very easy.

  • More broadly, there are a number of other documents in the science community we “interoperate” with (e.g., the cadence whitepapers, and the DESC roadmap that’s currently being written). Speaking the same language helps.

We could, of course, convert back-and-forth between various markups, but this would only make sense if there was sufficient value to be reaped from it to offset the negatives and the extra work.

Understanding the markups preferred by various sections of the community would be a fun project (though a very difficult one; I suspect getting a representative sample would be non-trivial)! But, as mentioned above, this is not just about the science collaborations. Also, a large fraction of people whose inputs are most needed for these kinds of documents skew more “traditional”.

Bottom line: this has been a great discussion (and I’m amazed and in total awe at how quickly @jsick turned around the LaTeX->Sphinx demo!!), but we do need to move on. Let’s continue with the present practice of using LaTeX for DM Project Science-related documents. We can revisit it in ~a year or so, once we gather more experience with the pros and cons.