I’ve been using bps.ctrl.parsl to submit batch jobs via Slurm and keep running into the same issue: the generated submission scripts contain the #SBATCH --exclusive flag, which keeps the jobs stuck in the queue because of QOS limits at my institution. For the time being I’ve resorted to the not-so-great workaround of hard-coding exclusive=False in SlurmProvider on my installation (adding it to the provider’s arguments at lines 161-167 of ctrl_bps_parsl/python/lsst/…/sites/slurm.py) just so that I can keep submitting jobs, but I’ve been wondering whether there is a better solution.
I’ve tried adjusting the .yaml config for bps, including:
Along with the following (which I know shouldn’t work… but decided to try it just in case):
...
provider_options: {"exclusive" : False}
But the --exclusive flag still persists unless exclusive=False is hard-coded in SlurmProvider. Is there a mistake that I’ve been making, or is this a bug? The issue has been present since bps.ctrl.parsl was added to the official releases (v25_0_2); I’m currently using v26_0_0.
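For anyone wanting to confirm the symptom, a small helper like the one below can check whether a generated submit script carries the directive. This is only a sketch: has_exclusive_directive and the example script text are made up for illustration and are not part of parsl or ctrl_bps_parsl.

```python
def has_exclusive_directive(script_text: str) -> bool:
    """Return True if any line is an ``#SBATCH --exclusive`` directive."""
    for line in script_text.splitlines():
        tokens = line.split()
        # Match "#SBATCH --exclusive" as well as variants like "--exclusive=user".
        if len(tokens) >= 2 and tokens[0] == "#SBATCH" and tokens[1].startswith("--exclusive"):
            return True
    return False


# Illustrative script text only; a real submit script will contain more directives.
example_script = """#!/bin/bash
#SBATCH --job-name=example
#SBATCH --nodes=1
#SBATCH --exclusive
"""

print(has_exclusive_directive(example_script))
```

Reading the text of an actual submit script into the function would show whether a given configuration change had any effect.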
That’s correct. We could add support for that (I wonder how the values would be validated, but maybe that’s not important if the user takes complete responsibility for it), or you could add a subclass of Slurm to support your particular site.
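On the validation question, one possibility would be an allow-list of option names and expected types that user-supplied values are checked against before they are merged over the site defaults. The following is only a rough sketch of that idea: ALLOWED_PROVIDER_OPTIONS and merge_provider_options are hypothetical names, not part of ctrl_bps_parsl, and the list of options is an assumption chosen for illustration.

```python
from typing import Any, Dict, Optional

# Hypothetical allow-list (option name -> expected type); not part of ctrl_bps_parsl.
ALLOWED_PROVIDER_OPTIONS: Dict[str, type] = {
    "exclusive": bool,
    "init_blocks": int,
    "min_blocks": int,
    "max_blocks": int,
}


def merge_provider_options(
    defaults: Dict[str, Any], user_options: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Return ``defaults`` updated with validated ``user_options``."""
    merged = dict(defaults)
    for key, value in (user_options or {}).items():
        expected = ALLOWED_PROVIDER_OPTIONS.get(key)
        if expected is None:
            raise ValueError(f"Unsupported provider option: {key!r}")
        # bool is a subclass of int, so reject True/False for int options explicitly.
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        )
        if wrong_type:
            raise TypeError(f"{key!r} must be {expected.__name__}, got {type(value).__name__}")
        merged[key] = value
    return merged


# User-supplied values win over the site defaults.
print(merge_provider_options({"exclusive": True, "max_blocks": 2}, {"exclusive": False}))
```

Whether strict checking like this is worth it, or whether values should simply be passed through at the user's own risk, is a design choice; the sketch only shows that the strict variant is cheap to write.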
Here’s a subclass implementation I’m using that modifies provider_options the way you want:
# hyak.py
from typing import TYPE_CHECKING, List

from parsl.executors.base import ParslExecutor
from parsl.launchers import SrunLauncher

from lsst.ctrl.bps.parsl.configuration import get_bps_config_value
from lsst.ctrl.bps.parsl.sites import Slurm

if TYPE_CHECKING:
    # Only needed for the type annotation on select_executor.
    from lsst.ctrl.bps.parsl.job import ParslJob

__all__ = ("Hyak",)


class Hyak(Slurm):
    def get_executors(self) -> List[ParslExecutor]:
        # Read max_blocks from the site configuration, defaulting to 2.
        max_blocks = get_bps_config_value(self.site, "max_blocks", int, 2)
        return [
            self.make_executor(
                "hyak",
                nodes=1,
                provider_options=dict(
                    init_blocks=1,
                    min_blocks=1,
                    max_blocks=max_blocks,
                    parallelism=0.75,
                    launcher=SrunLauncher(overrides="-K0 -k"),
                    # Keeps "#SBATCH --exclusive" out of the submit script.
                    exclusive=False,
                ),
            )
        ]

    def select_executor(self, job: "ParslJob") -> str:
        """Get the ``label`` of the executor to use to execute a job.

        Parameters
        ----------
        job : `ParslJob`
            Job to be executed.

        Returns
        -------
        label : `str`
            Label of executor to use to execute ``job``.
        """
        return "hyak"
I keep this file in a module called proc_lsst and make it importable by setting PYTHONPATH=/path/to/proc_lsst (after setting up the LSST stack). I then run bps submit submit.yaml with a submit YAML that specifies the site like so: