After some misadventures involving `git revert`, I’ve just merged DM-38953, which provides a lot more support for `PipelineTask` connections that depend on configuration for more than just the dataset type name. This includes changing task dimensions, dataset type dimensions, storage classes, connection types (e.g. you can change an `Input` into a `PrerequisiteInput`), and creating completely new connections in `__init__`.
It all works through regular attribute assignment and deletion syntax:
```python
from lsst.pipe.base import PipelineTaskConnections
from lsst.pipe.base.connectionTypes import Input, Output, PrerequisiteInput


class SomeConnections(PipelineTaskConnections, dimensions=()):
    a = Input(...)
    b = Input(...)

    def __init__(self, *, config):
        # Remove an existing connection.
        del self.a
        # Replace an existing connection.
        self.b = PrerequisiteInput(...)
        # Add a brand new connection.
        self.c = Output(...)
        # Change the task dimensions.
        self.dimensions.update({"patch", "band"})
```
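To make that more concrete, here’s a minimal sketch of a connections class whose shape actually depends on its configuration. Everything in it is purely illustrative: the `SketchConnections`/`SketchConfig` names, the `useExternalImage` and `doWriteBackground` config fields, and the dataset type names are made up for this example, not anything defined by DM-38953.

```python
from lsst.pipe.base import PipelineTaskConfig, PipelineTaskConnections
from lsst.pipe.base.connectionTypes import Input, Output, PrerequisiteInput
import lsst.pex.config as pexConfig


class SketchConnections(
    PipelineTaskConnections, dimensions=("instrument", "visit", "detector")
):
    image = Input(
        doc="Image to process.",
        name="calexp",
        storageClass="ExposureF",
        dimensions=("instrument", "visit", "detector"),
    )
    catalog = Output(
        doc="Measurements made on the image.",
        name="sketch_catalog",
        storageClass="SourceCatalog",
        dimensions=("instrument", "visit", "detector"),
    )
    background = Output(
        doc="Fitted background model.",
        name="sketch_background",
        storageClass="Background",
        dimensions=("instrument", "visit", "detector"),
    )

    def __init__(self, *, config=None):
        if config.useExternalImage:
            # Hypothetical switch: require the image to exist before the run
            # instead of being produced by an upstream task in the pipeline.
            self.image = PrerequisiteInput(
                doc="Externally provided image to process.",
                name="calexp",
                storageClass="ExposureF",
                dimensions=("instrument", "visit", "detector"),
            )
        if not config.doWriteBackground:
            # Hypothetical switch: drop the optional output entirely.
            del self.background


class SketchConfig(PipelineTaskConfig, pipelineConnections=SketchConnections):
    useExternalImage = pexConfig.Field(
        dtype=bool, default=False, doc="Hypothetical switch; see the connections class."
    )
    doWriteBackground = pexConfig.Field(
        dtype=bool, default=True, doc="Hypothetical switch; see the connections class."
    )
```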
Some additional notes:
- Delegating to `super().__init__` is harmless, but it now does nothing: the first step of initialization happens in the metaclass.
- Removing a connection via e.g. `self.inputs.remove("a")` still works, and we have no plans to drop support for it, but we prefer the `del self.a` approach as more intuitive and readable. We don’t currently see any reason for new code to interact with the `inputs`, `outputs`, `initInputs`, etc. sets at all, but they’re all still there for backwards compatibility.
- The `dimensions` attribute on connections objects is a `set` that may be modified in-place or replaced with another set-like object. After `__init__` it will be turned into a `frozenset` (as are the `inputs`, `outputs`, etc. sets).
- The only breaking changes were to connections classes that were assigning to the `self.allConnections` mapping in `__init__` (which was never supported, but it was the only hack that worked for some problems before). This is now a read-only mapping view that is updated automatically when connection attributes are added, removed, or replaced. The sketch after this list shows both the frozen sets and the read-only view.
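To show the last two notes in action, here’s a small sketch of inspecting a connections instance after construction, using the illustrative `SketchConnections`/`SketchConfig` classes from the earlier sketch; the values in the comments are what the behavior described above implies, not output copied from a real run.

```python
config = SketchConfig()
config.useExternalImage = True
config.doWriteBackground = False

connections = SketchConnections(config=config)

# The task dimensions and the per-category sets are frozen after __init__.
assert isinstance(connections.dimensions, frozenset)
assert isinstance(connections.outputs, frozenset)
print(sorted(connections.outputs))             # ['catalog'] (background was deleted)
print(sorted(connections.prerequisiteInputs))  # ['image'] (it became a PrerequisiteInput)

# allConnections is a read-only view kept in sync with the changes made in __init__.
print(sorted(connections.allConnections))      # ['catalog', 'image']
image_connection = connections.allConnections["image"]   # reading entries is fine
# connections.allConnections["image"] = Input(...)       # but assignment would fail
```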