Non-Exclusive Nodes with bps.ctrl.parsl

Hello!

I’ve been using bps.ctrl.parsl to submit batch jobs via Slurm and keep running into the same issue: the generated submission scripts include the #SBATCH --exclusive flag, which prevents the job from leaving the queue because of QOS limits at my institution. For the time being I’ve been using the not-so-great workaround of hard-coding exclusive=False in SlurmProvider on my installation (appending it to the provider’s arguments at lines 161-167 of ctrl_bps_parsl/python/lsst/…/sites/slurm.py) just so that I can keep submitting jobs, but I’ve been wondering whether there is a better solution.

I’ve tried adjusting the .yaml config for bps, including:

computeSite: slurm
site:
  slurm:
    class: lsst.ctrl.bps.parsl.sites.Slurm
    nodes: 5
    cores_per_node: 5
    mem_per_node: 35
    walltime: "24:00:00"
    provider_options:
        exclusive: False

Along with the following (which I know shouldn’t work… but decided to try it just in case):

...
        provider_options: {"exclusive" : False}

But the --exclusive flag persists unless exclusive=False is hard-coded in SlurmProvider. Is there a mistake I’ve been making, or is this a bug? The issue has been present since bps.ctrl.parsl was added to official releases (v25_0_2); I’m currently using v26_0_0.

Thank you for the help!

It looks like the current version of ctrl_bps_parsl does not support passing custom provider options to its Slurm executor.

Disclaimer: I’m neither the author of the plugin nor a frequent user of it, so there’s a non-zero chance that I’ve missed something.


That’s correct. We could add support for that (I wonder how the values would be validated, but maybe that’s not important if the user takes complete responsibility for it), or you could add a subclass of Slurm to support your particular site.
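On the validation question, one lightweight possibility is to check the user-supplied option names against the provider constructor's signature before passing them through. The sketch below is only an illustration of that idea: StandInProvider and validate_provider_options are hypothetical names invented here, not part of parsl or ctrl_bps_parsl, and the check covers option names only, not their types or values.

```python
import inspect
from typing import Any, Dict


class StandInProvider:
    """Hypothetical stand-in for a provider class; not parsl's SlurmProvider."""

    def __init__(self, partition: str = "", exclusive: bool = True, walltime: str = "00:10:00"):
        self.partition = partition
        self.exclusive = exclusive
        self.walltime = walltime


def validate_provider_options(provider_cls: type, options: Dict[str, Any]) -> Dict[str, Any]:
    """Reject option names that the provider's constructor does not accept."""
    params = inspect.signature(provider_cls.__init__).parameters
    accepted = {name for name in params if name != "self"}
    unknown = sorted(set(options) - accepted)
    if unknown:
        raise ValueError(f"Unknown provider options: {unknown}")
    # Return a copy so later changes to the caller's dict have no effect.
    return dict(options)


# User-supplied options, e.g. parsed from the site section of a submit YAML.
user_options = validate_provider_options(StandInProvider, {"exclusive": False})
provider = StandInProvider(**user_options)
```

A misspelled key such as "exclusve" would raise a ValueError at submission time instead of being silently ignored, which is the failure mode described in the original question.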

Here’s a subclass implementation I’m using that modifies provider_options the way you want:

# hyak.py
from typing import TYPE_CHECKING, List

from parsl.executors.base import ParslExecutor
from parsl.launchers import SrunLauncher

from lsst.ctrl.bps.parsl.configuration import get_bps_config_value
from lsst.ctrl.bps.parsl.sites import Slurm

if TYPE_CHECKING:
    # Needed only for the type annotation on select_executor.
    from lsst.ctrl.bps.parsl.job import ParslJob

__all__ = ("Hyak",)


class Hyak(Slurm):
    """Slurm site that requests non-exclusive nodes."""

    def get_executors(self) -> List[ParslExecutor]:
        # Read max_blocks from the site configuration, defaulting to 2.
        max_blocks = get_bps_config_value(self.site, "max_blocks", int, 2)
        return [
            self.make_executor(
                "hyak",
                nodes=1,
                provider_options=dict(
                    init_blocks=1,
                    min_blocks=1,
                    max_blocks=max_blocks,
                    parallelism=0.75,
                    launcher=SrunLauncher(overrides="-K0 -k"),
                    # Keep "#SBATCH --exclusive" out of the submit script.
                    exclusive=False,
                ),
            )
        ]

    def select_executor(self, job: "ParslJob") -> str:
        """Get the ``label`` of the executor to use to execute a job

        Parameters
        ----------
        job : `ParslJob`
            Job to be executed.

        Returns
        -------
        label : `str`
            Label of executor to use to execute ``job``.
        """
        return "hyak"

I keep this file in a module called proc_lsst and make it importable by setting PYTHONPATH=/path/to/proc_lsst (after LSST stack setup). I then run bps submit submit.yaml with a submit YAML that specifies the site like so:

# submit.yaml
computeSite: hyak
site:
  hyak:
    class: proc_lsst.Hyak
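Note that the class value is a dotted path: everything before the last dot names a module and the final component names an attribute of it. So proc_lsst.Hyak only resolves if Hyak is reachable as an attribute of the proc_lsst module itself (for example, re-exported from the package's __init__.py, which is not shown above). The sketch below illustrates this kind of lookup with the standard library only; resolve_class is a hypothetical helper written for illustration, and the plugin's actual loading code may differ.

```python
import importlib


def resolve_class(dotted_path: str) -> type:
    """Split 'package.module.ClassName' and import the named attribute."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    # Raises AttributeError if the module does not expose the class.
    return getattr(module, class_name)


# Demonstrated with a standard-library class so the example runs anywhere.
cls = resolve_class("collections.OrderedDict")
```

If the import fails with ModuleNotFoundError, the directory on PYTHONPATH is the first thing to check; an AttributeError instead suggests the module was found but does not expose the class under that name.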

Hi @antenglert, just wanted to follow up on your post and check whether @stevenstetzler’s response has addressed your issue.