Thoughts on the Command Line (Super)Task UX

Preface

Command line tasks are likely to be the first point of contact many astronomers have with the LSST Stack. This means that the experience of using command line tasks is extremely important.

If the command line experience is bad, frustrating, or even just unpolished, we’re probably going to lose that astronomer forever. Certainly that astronomer will be reluctant to invest in learning our Python API if we let them believe that our entire stack is badly designed and executed.

With that in mind, I wanted to start a conversation about how we can deliver the best command line experience possible. My comments here are mostly agnostic of the actual architecture; I’m focussing entirely on the look and feel of a command line task. In the title I deliberately used the software hipster term ‘UX’ (meaning user experience) since I believe that we should treat the medium of the command line with the same reverence as tech companies treat iPhone screens or browsers.

We’ll know we’ve succeeded in designing the command line task experience when we can give a demo and hear the audience mutter “whoa, that’s cool!” This is the design we should strive for.

I also want to make two disclaimers:

  1. I mean no offence to those whose existing code I might be claiming to be anti-patterns. I just want to help make things better.
  2. I know these suggestions are outside the scope of the current SuperTask design. I think it’s worth starting this discussion now, though, to ensure that our overall task roadmap takes UX into consideration.

Some issues with tasks

Task names aren’t always coherent

One of the first things that struck me about our command line tasks is that they look messy. See the task list in the pipe_tasks bin directory. And by messy, I mean that the names and verbs of the tasks don’t present a coherent vocabulary. To me, command line tasks look like an afterthought.

Command line vocabularies can be beautiful. For example, here is the vocabulary for vagrant:

vagrant box
vagrant init
vagrant up
vagrant connect
vagrant suspend

With a controlled vocabulary like this, the vagrant app suddenly looks simple and knowable.

For the stack, it’s unclear what many tasks do from their name alone. dumpTaskMetadata.py is self-described in its own docstring as a tool to

Select images and report which tracts and patches they are in

I would never have guessed that. Unspecific names also make the docs harder to use: a reader can’t find the keywords they expect while scanning the table of contents.

Task documentation is lacking

Even if the user has found the right task, we have the problem of documentation. We fundamentally need all tasks to be comprehensively documented in task docstrings and rendered to the LSST Stack Handbook.

But even then tasks are challenging to document because they are so configurable. It’s possible for sub-tasks to be redirected. Thus any ‘static’ documentation can be contradicted by task redirection done by the user.

A vision of the command line experience

Here I present a vision of what our command line experience could be like.

The lsst command

When we tell a new user about LSST’s task pipeline we tell them one thing: “check out the lsst app.”

> lsst

All tasks are namespaced into subcommands, in the same way as sprawling command line applications like git, or vagrant from the example above.

Tasks would then be sub-commands

> lsst process-ccd [args]

Tasks provided with instruments or by community packages would have their own command space

sdss process-ccd [args]
decam process-ccd [args]
megacam process-ccd [args]

You’ll also notice that in such a command architecture, we’ve done away with amateur-looking taskName.py script names. The lsst task signature signals to the user: these aren’t cobbled scripts; this is a well-engineered application.
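To make the subcommand idea concrete, here is a minimal sketch in Python using argparse. Everything here (the registry structure, the task names, the handler) is invented for illustration; it is not an existing Stack API.

```python
# Hypothetical sketch of an `lsst` dispatcher built on argparse.
# Each registered task becomes a subcommand of the root `lsst` command.
import argparse

# Toy stand-in for real task discovery/registration.
REGISTRY = {
    "process-ccd": {"summary": "Process a single CCD",
                    "run": lambda args: print("processing ccd")},
}


def build_parser(registry):
    """Build the root `lsst` parser with one subcommand per task."""
    parser = argparse.ArgumentParser(prog="lsst")
    subparsers = parser.add_subparsers(dest="task")
    for name, task in registry.items():
        sub = subparsers.add_parser(name, help=task["summary"])
        sub.set_defaults(run=task["run"])
    return parser


args = build_parser(REGISTRY).parse_args(["process-ccd"])
args.run(args)  # dispatches to the process-ccd handler
```

Invoking the parser with no subcommand would then be the natural hook for printing a task listing and a pointer to the docs.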

When you run the root command

A user knows that the LSST pipeline is packaged in the lsst command. But that user doesn’t really know anything else, let alone how to run the pipeline.

The natural thing is to just type at the command line:

> lsst

When this happens, we help the user! The root command prints out a small help message pointing to the online task documentation. It also goes a step further, and prints out a list of all available command line tasks.

Now without even reading the docs, the user has a list of commands to try.

(Note: to be idiomatic and safe, lsst --help will do the same thing)

Getting help on running a task

So the user knows the commands, but how are they run and what do they do? The initial help message will tell the user to try any command with the help verb, as in:

> lsst help process-ccd

This will show user-oriented task documentation, including a usage example, and a list of arguments and their defaults.

Note that this documentation should include a schematic flow of any subtasks called. The argument list should include arguments associated with the subtasks.

Since the total collection of arguments might be overwhelming, some arguments may be labeled as ‘superuser arguments’ whose defaults can usually be assumed to be correct. By default these superuser arguments would be omitted from the help printout.

> lsst help --all process-ccd

would reveal them.

Similarly, a user might want to filter the command line help to just the base package or a certain subtask. These commands would help with that:

# only arguments for process-ccd itself
> lsst help --base process-ccd

# help for isr,calibrate subtasks.
> lsst help --sub isr,calibrate process-ccd
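One possible mechanism for hiding superuser arguments (purely illustrative; --isr-fwhm is an invented argument, and this is not an existing Stack feature) is argparse’s SUPPRESS marker, which keeps an argument functional while omitting it from the help printout:

```python
# Sketch: hide "superuser" arguments from help output unless requested.
# The argument names here are invented for illustration.
import argparse


def make_parser(show_all=False):
    parser = argparse.ArgumentParser(prog="lsst process-ccd")
    parser.add_argument("--output", help="output data repository")
    parser.add_argument(
        "--isr-fwhm", type=float, default=1.0,
        help="assumed PSF FWHM for ISR" if show_all else argparse.SUPPRESS)
    return parser


# `lsst help process-ccd` would use make_parser();
# `lsst help --all process-ccd` would use make_parser(show_all=True).
```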

Graphical task help

The terminal has limited information bandwidth. For richer documentation, we could use the show verb:

> lsst show process-ccd [args]

This launches a local static web page showing the pipeline, including task help and the values of arguments as currently set on the command line.

This is an improvement over the static documentation we can ship with the LSST Stack docs, because this page will reflect the actual state of a task given the current configuration, including redirected tasks and which arguments have been set to non-default values.

Graphical task composition

The static web page help provided by lsst show process-ccd is nice, but why settle?

The command

> lsst compose process-ccd

launches a graphical task composer. That is, a local python server is booted up. In this local web app, the user can actually configure and preview the task pipeline.

The user could graphically redirect a subtask to another one and dynamically see the new options that are needed.

The user could also see exactly what data would be processed given Butler data id selectors.

Once the user was satisfied, that pipeline configuration could be exported from the local web app so that the user could immediately run the pipeline in the command line.

Architectural requirements

This discussion is deliberately not about implementation, but rather about experience. Nonetheless, the experience requires these pieces of infrastructure to be implemented:

  • There needs to be a task registry that not only LSST stack tasks plug into, but that any third-party obs_ tasks plug into as well. This will allow the lsst command to show a listing of all commands, and allow lsst compose to help a user redirect subtasks by showing the available tasks.
  • Tasks would no longer exist as command line scripts, but as Python modules that follow a task protocol/API.
  • There needs to be an API for tasks to expose their processing task pipeline DAG, as currently configured.
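As a rough illustration of the first two bullets (all names are invented; typing.Protocol stands in for whatever task protocol/API is eventually defined):

```python
# Hypothetical sketch of a task protocol and registry; not a real Stack API.
from typing import Dict, List, Protocol


class CommandLineTask(Protocol):
    """Protocol that any task (LSST or third-party obs_) would follow."""
    name: str
    summary: str

    def run(self, argv: List[str]) -> None: ...

    def pipeline_dag(self) -> Dict[str, List[str]]:
        """Expose the configured subtask DAG as {task: [subtasks]}."""
        ...


# Global registry the `lsst` command would consult to list all tasks.
TASK_REGISTRY: Dict[str, CommandLineTask] = {}


def register(task: CommandLineTask) -> CommandLineTask:
    """Plug a task into the registry so `lsst` can list and introspect it."""
    TASK_REGISTRY[task.name] = task
    return task


class ProcessCcd:
    name = "process-ccd"
    summary = "Process a single CCD"

    def run(self, argv):
        pass  # real work would go here

    def pipeline_dag(self):
        return {"process-ccd": ["isr", "calibrate"]}


register(ProcessCcd())
```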

Closing

I’ve designed a command line task architecture not by considering the implementation details, but by instead considering the user experience. I’ve given a realization of what UX thinking might give you. But even if this specific command line UI is not adopted, I stress that UX thinking should be used when implementing any changes to the command line task architecture.

I also think that tasks should be viewed and designed as a cohesive whole. Tasks shouldn’t just be created to suit a need and figuratively thrown into a bin/ directory. Tasks should serve as a unified vocabulary for processing data.

I’m not saying this is how it should be, just providing a data point: in the PICARD pipeline environment I had a single command that did not know about instruments/cameras. It read the data files it was given and decided which of the “command spaces” to enable. This was all part of a system that had a metadata translation system baked in.

Okay, I agree. It would be awesome if the user didn’t have to worry that the task comes from the obs_decam package, etc. I guess I envisioned this for the case where configuration was a nightmare and a lot of custom work was needed to run pipelines for specific instruments. :slight_smile:

We need to get to this. There will be times when data from more than one instrument needs to be combined.

--show config already does something like this, but it’s only a baby step.

The command-line wrapper is meant to be (and usually is) a one-liner. So this is mostly the case already. The API needs to be nailed down further now that we know more about what we need to do, including providing more introspection.

The bigger worry I have is that we have several different types of configuration for tasks: algorithmic, I/O (Butler), and logging/debugging, at least. Right now, arguments for controlling these configurations are mixed and only provide limited functionality. How can we best handle these?

(Why SQuaRE Lounge and not a wider audience?)

PICARD had generic versions of “tasks” but would use instrument-specific versions in preference.

I think @jsick was asking for internal feedback first before going public.

Yes! It would be really nice to classify tasks into different roles (with declared interface protocols) so that I can write clear documentation that annotates tasks with, e.g., ‘expects X input protocol, makes Y output protocol.’ I’d love to see something like an Apple Automator where it’s fool-proof to know how to link things up. (Okay, maybe Automator has rough UX edges too.)

Besides your list, I also think there should be a well-defined protocol for tasks that work on a single exposure (e.g., calibration) vs. tasks that take many exposures and output one (e.g., coaddition).

I wonder if some sort of User Story Mapping would be useful here. We could actually go through various conceivable pipelines (i.e., pipelines that people would reasonably want to write) and ask ourselves how our task infrastructure could handle them.

First, whoops, I forgot that I had left the SQuaRE Lounge open to Admins, but also dramatically opened up the number of admins in the forum (including to all T/CAMs). We really like you folks, but don’t be offended if at some point the SQuaRE Lounge drops off your radar :wink: I’ll keep it open for the sake of this discussion now.

And second, yes, I wanted to get internal feedback before raising this point because a) until the bootcamp I had no knowledge of the real details of tasks, so my initial reactions would likely be silly, b) ‘track record’, and c) I didn’t want to bikeshed on the SuperTask design until I had something really substantial.

I like the design, very slick. We do need to make it very useable and intuitive.

I’d just caution that many products and entire companies have gone down the drain chasing the graphical-pipeline-generator, tasks-with-typed-ports vision (history lesson: look up Stardent AVS). I’d much rather adopt something from the outside than build something ourselves in this space, if we want it at all – and I don’t think anything has taken over in the outside world. (How many people actually use Automator?)

AVS was nice though. IBM Data Explorer had the same scheme as well (see e.g. Starlink Cookbook 2). Somehow IDL won the day! @frossie also tried to design the ORAC-DR recipe system to allow for a graphical interface that would allow primitives to be dragged around and connected up.

Bumping this topic (this was originally an internal SQuaRE discussion).

This is almost exactly one of the things I’m doing with @gpdf in the process of redesigning pipe_base and (Super)Task. An initial implementation of this approach exists; instead of lsst it is called CmdLineActivator. It takes two sets of arguments, one for the activator and a second set for the task, and it can even list the available tasks (no matter whether they live in pipe.tasks, obs.sdss, etc.). You would run it like:

$ CmdLineActivator processCcdTask --extras input --id filter=g …

Here is the link to the branch. It is still under development, and we are in the process of documenting everything for a broader “proposal”.

More to come soon…


Just to complement that: in this proposed framework, whenever a workflow or a set of tasks is composed, it would also produce a dot file and/or a tree representation of the task, with each node being an atomic unit of processing…
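For example, emitting a dot file from a task DAG could look roughly like this (a sketch only; the {task: [subtasks]} mapping is an assumed structure, not the branch’s actual code):

```python
# Illustrative: serialize a task DAG to Graphviz dot. The DAG structure
# ({task: [subtasks]}) is an assumption, not the real implementation.
def dag_to_dot(dag, name="pipeline"):
    lines = ["digraph %s {" % name]
    for task, subtasks in dag.items():
        if not subtasks:
            lines.append('    "%s";' % task)  # leaf node
        for sub in subtasks:
            lines.append('    "%s" -> "%s";' % (task, sub))
    lines.append("}")
    return "\n".join(lines)


dag = {"processCcd": ["isr", "calibrate"], "isr": [], "calibrate": []}
dot = dag_to_dot(dag)  # feed to `dot -Tpng` to render the pipeline graph
```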

I’m a big fan of a lot of ideas here, and aside from the disruption caused by changing some commonly-used interfaces, I think much of this can be accomplished without any major architectural changes - just a lot of detailed incremental-change work.

The new command structure is great, and I think it’ll be pretty easy to do with some fairly non-disruptive changes in pipe_base. I think that’s basically what @gpdf and @mgckind are already working on, plus a registry for all top-level tasks. That work may also present an opportunity to make the top-level SuperTasks themselves retargetable via config for a particular command, which would allow obs* packages to customize top-level SuperTasks via override and hence ensure we have the same interface for all cameras (like @timj and @ktl, I do not want instrument-specific commands, and I consider the ones we have a temporary workaround for other deficiencies).

We may also want to consider adding all retargetable non-super Tasks to registries, and using RegistryField instead of ConfigurableField to nest them - that creates a lot of syntax messiness in config that we’d need to find a solution for, but it makes the space of possible configuration options much better defined, and it should help us define interfaces for those pluggable components much more explicitly.

For help on a top-level task, I think one success of the current design is that there’s a nice split between “regular” command-line arguments, data ID arguments, and configuration arguments. The first two categories are either the same or nearly the same for every top-level task, and I think they’re fairly easy to understand (though the data ID syntax is a bit arcane). The problem is in the configuration arguments, which don’t just vary between top-level Tasks, but can actually change when the configuration is itself modified. And there are way too many of them, and they’re organized in a way that makes sense to the implementer, not the user.

This is a much harder problem to solve - I think it requires a team of smart people rethinking every aspect of our configuration system, from the Python-language files to how we handle plugging in external code. But I think we could get a lot of usability more easily by adding some property-like descriptors to the top-level configuration classes, which would represent the most important configurations and act purely by modifying one or more child configuration options simultaneously. That requires some real work on each concrete top-level SuperTask, but I don’t actually think it’s very much. It may also exacerbate an existing problem in which the configuration schemas for lower-level pluggable components are not well-defined, but I think that’s a problem we need to solve anyway…
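The property-like descriptor idea might look roughly like this sketch, with toy config classes standing in for the real pex_config API (the quickLook setting and all option names are invented):

```python
# Sketch: a top-level "property" that fans one assignment out to several
# child configuration options. Toy classes only; not the pex_config API.
class FanOutProperty:
    def __init__(self, *paths, doc=""):
        self.paths = paths  # dotted paths into child configs
        self.__doc__ = doc

    def __set__(self, config, value):
        # Write the value through to every targeted child option.
        for path in self.paths:
            obj = config
            *parents, attr = path.split(".")
            for name in parents:
                obj = getattr(obj, name)
            setattr(obj, attr, value)


class IsrConfig:
    doBias = True


class CalibrateConfig:
    doPhotoCal = True


class ProcessCcdConfig:
    quickLook = FanOutProperty("isr.doBias", "calibrate.doPhotoCal",
                               doc="Disable slow steps for a quick-look run.")

    def __init__(self):
        self.isr = IsrConfig()
        self.calibrate = CalibrateConfig()


config = ProcessCcdConfig()
config.quickLook = False  # one assignment updates both child options
```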

I think graphical task representation is something we should save until we’ve had time to do some work auditing and cleaning up those lower-level non-super Task interfaces; I think we’ll just have a better understanding of how to describe those interfaces in a general way once they’re in better shape. The good news is that this work is already in @jdswinbank’s plan for DRP work in the next cycle (see https://jira.lsstcorp.org/browse/DM-3580; sorry there’s no description yet to make it obvious that’s what this is), and it’s something many people (especially @RHL) have been advocating as a high priority.

I think that cleanup work is also a bit of a prerequisite for figuring out where to draw the lines that decide which components should be non-super Tasks and which should be SuperTasks. Things that are SuperTasks will be usable in more contexts and more introspectable (for e.g. graphical descriptions), but they’ll also make it harder to follow the code that uses them, because they’ll be connected via e.g. workflow systems rather than just Python function calls, and their signatures will be generic instead of explicitly declaring their inputs and outputs in code form. And of course which of those is the right choice for any particular component is very much a case-by-case decision.
