Should we make *all* --show config comparisons be case insensitive?

Sorry to revisit RFC-108, but Russell and I disagree about a stylistic point that came up after the RFC was accepted.

I added:
“The only difference is that I plan to make --show config=xxx be case-insensitive only if the pattern xxx is all lower-case. If people really want to match "background" lower-case only, they can say --show config=xxx:NOIGNORECASE”
and Russell would prefer either that all searches be case-insensitive, or that I make that last option config=xxx:case. My argument is that if someone goes to the trouble of using upper case we should obey their wishes, and that the NOIGNORECASE case is so rare that the obvious re.IGNORECASE-like syntax is good enough (but that we need to permit people to get exactly what they want – computers work for us, not vice-versa).
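
For concreteness, here is a minimal sketch of the rule described in this post. It is not the code on the ticket: the helper name `compile_config_pattern`, the use of `fnmatch.translate` for globbing, and the two config names are assumptions made up for illustration. It also assumes the pattern itself never contains a colon.

```python
import fnmatch
import re


def compile_config_pattern(arg):
    """Compile the xxx in --show config=xxx (hypothetical helper).

    Rule sketched here: ignore case only when the glob is entirely
    lower-case; a trailing :NOIGNORECASE forces an exact-case match.
    """
    pattern, sep, modifier = arg.partition(":")
    if sep and modifier != "NOIGNORECASE":
        raise ValueError("unknown modifier %r" % modifier)
    force_case = modifier == "NOIGNORECASE"

    # An upper-case letter anywhere in the glob is taken as a request
    # for case-sensitive matching.
    ignore_case = pattern == pattern.lower() and not force_case
    flags = re.IGNORECASE if ignore_case else 0

    # fnmatch.translate turns a shell-style glob into a regex string.
    return re.compile(fnmatch.translate(pattern), flags)


# Made-up config names, for illustration only.
names = ["calibrate.background.binSize", "calibrate.doSubtractBackground"]
for arg in ("*background*", "*Background*", "*background*:NOIGNORECASE"):
    regex = compile_config_pattern(arg)
    print(arg, [n for n in names if regex.match(n)])
```

With these two names, the all-lower-case glob matches both, `*Background*` matches only the second, and `*background*:NOIGNORECASE` matches only the first.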

Do people care? If not, I’m going to merge my version.

I’m worried that it’s very complicated to explain your version in --help output. Simpler is better, and leaving out very rare cases is sometimes justifiable in the interest of simplicity. Why not just case-insensitive unless “:NOIGNORECASE” is appended?

RFC-108 has not been accepted…

How about using a boolean flag instead of creating an LSST specific search syntax?

I’m not thrilled with a separate boolean flag because it would only apply to this particular argument value. Seems even messier.

I could have sworn I hit that button. It was clearly accepted based on the responses I got.

I agree with KT on this one.

I remain concerned that if someone uses MixedCase we should obey it. If we don’t, NOIGNORECASE seems OK to me.

I think it’s quite likely that someone might use uppercase letters in a guess (“I think it was something like *Flux”), not intending any special meaning, and then miss out on relevant results. This is particularly true with mixedCase variable names. I think complete case-insensitivity would match the novice user’s expectations much better.

I agree with @ktl. My objections to @rhl’s implementation are:

  • I feel it is too complicated and hard to explain. My guess is the only folks who will know how to use it are those who read the code.
  • It is too magic – very nice when it does what you want, but hard to figure out when it doesn’t
  • It provides two different ways to do the same thing, one of which is very ugly
  • It requires a lot of code

My suggestion is to require that all searches be case-blind. Simple and easy to explain and probably perfectly adequate. I suggest we try to live with that for now and see if it’s a problem.
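
As a rough illustration of how little code the case-blind option needs, assuming glob-style patterns, something like the following would do. The function name and the config names are hypothetical, not taken from the ticket.

```python
import fnmatch


def config_name_matches(name, pattern):
    """Case-blind glob match (hypothetical helper, not the pipeline's code)."""
    # Lower-casing both sides makes the comparison independent of case;
    # fnmatchcase is used so the result does not depend on the platform.
    return fnmatch.fnmatchcase(name.lower(), pattern.lower())


# Made-up config names, for illustration only.
print(config_name_matches("calibrate.doSubtractBackground", "*background*"))
print(config_name_matches("calibrate.background.binSize", "*Background*"))
```

Both calls return True, so a guess such as *Flux or *Background* finds the same entries as its lower-case spelling.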

If we do get a strong call for case-sensitive searches then I agree they will probably have to be “per-pattern”. Some possible solutions (but please, only if we find we need such a solution later) include:

  • Use a different word than config for case-sensitive config searches, e.g. --show Config=caseSensitive
  • Use a special delimiter. If the command parser leaves double quotes alone, then use those, otherwise perhaps colons, e.g.: --show config=:caseSensitive:
  • Use a trailing colon-separated word, like @rhl’s solution, but something short such as :case

Again, I think the best thing to do is to keep it simple and not support case-sensitive searches initially.

OK. How about I make the default case-insensitive, warn about strings that are not all lower-case, and support xxx:NOIGNORECASE?
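
A sketch of what this compromise might look like, under the same assumptions as before: glob patterns, no colon inside the pattern itself, and a hypothetical helper name (`compile_show_pattern`) and warning text that are not from the ticket.

```python
import fnmatch
import re
import warnings


def compile_show_pattern(arg):
    """Case-insensitive by default; warn on upper case (hypothetical helper)."""
    pattern, sep, modifier = arg.partition(":")
    if sep and modifier != "NOIGNORECASE":
        raise ValueError("unknown modifier %r" % modifier)
    case_sensitive = modifier == "NOIGNORECASE"

    # Tell the user when upper-case letters in the pattern are being ignored.
    if not case_sensitive and pattern != pattern.lower():
        warnings.warn(
            "pattern %r contains upper-case letters but is matched "
            "case-insensitively; append :NOIGNORECASE to match case" % pattern
        )

    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(fnmatch.translate(pattern), flags)


# Made-up config name: this warns, then still matches the lower-case entry.
regex = compile_show_pattern("*Background*")
print(bool(regex.match("calibrate.background.binSize")))
```

The warning keeps the default simple to document while still signalling that the capital letters had no effect.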

@rhl do you have an important use case for xxx:NOIGNORECASE? If not, I would be much more comfortable not supporting it at all. If you must support it, I would like something easier to type.

That computers should do what you tell them to do. And that things like "*background*" match lots of things that "*Background*" doesn’t.

I don’t understand your objecting so strongly to the name. You just argued that it was of marginal use, so the length’s not a problem to you, and it maps trivially to python’s re.IGNORECASE.

I do not consider :NOIGNORECASE a sane thing to type on the command line. It’s too long, it’s hard to type because it is all uppercase, and it’s hard to parse due to the leading “NO”. It’s nearly impossible to remember. All of the solutions I mentioned are shorter and easier to document.

Source code is write once, read many, so long names are reasonable. Very long names are much less desirable for command line options. Also, in source code the NO would be absent, making the name clearer.

Printing a warning when you type an uppercase letter seems reasonable.

I tried your example on processCcd.py with lsstSim data and it is enlightening. There are too many config parameters, including a lot of clutter added by the way we handle dictionaries. I worry that glob expressions may not be good enough, but it’s better than nothing. I don’t think case sensitivity will help much, but maybe something is better than nothing.

As an aside, it might also help if we had a way of printing a briefer version of configs that excluded dict entries we were not using. Some of our configs have a lot of entries we don’t use. Of course then we’d probably want a way of demanding the full config. In any case that would be a different ticket.

Useful links: https://jira.lsstcorp.org/browse/RFC-108 and https://jira.lsstcorp.org/browse/DM-4217 (the ticket where the work is being done and the discussion started)