Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caching SLURM account information for use in app forms #1970

Closed
Micket opened this issue Apr 23, 2022 · 11 comments · Fixed by #2479
Closed

Caching SLURM account information for use in app forms #1970

Micket opened this issue Apr 23, 2022 · 11 comments · Fixed by #2479
Labels
component/batch_connect enhancement This is functionality that builds upon existing functionality.
Milestone

Comments

@Micket
Copy link
Contributor

Micket commented Apr 23, 2022

Feature request:
I currently have a dashboard initializer apps/dashboard/initializers/ood.rb which does;

class SlurmData
  def self.accounts
    username = Etc.getlogin
    accounts = []
    IO.popen 'sacctmgr -np show accounts withassoc format=account,user' do |io|
      io.each do |line|
        acc = line.split('|')
        if acc[1] == username
          accounts.append(acc[0])
        end
      end
    end
    return accounts
  end
end

(which could of course be cleaned up (I think it should have used CurrentUser?), extended to fetch partitions, reservations, clusters. but I have not yet had any such need).

It lets me switch away from free text field as input and instead conveniently configure my apps to just do;

  bc_account:
    widget: "select"
    options: [<%= SlurmData.accounts.join(', ') %>]
    ...

which conveniently only presents the user with valid options; and for most users there is just one option which gets auto-selected.

In fact, this seems so generally applicable, I can't think of a reason why you wouldn't always want to do it like this, having all of this just be the default behavior.
Extending it to a multi cluster setup also seems feasible.

Is there any broader interest in this? I can work on a PR if there is (though my ruby is weak)

┆Issue is synchronized with this Asana task by Unito

@osc-bot osc-bot added this to the Backlog milestone Apr 23, 2022
@johrstrom
Copy link
Contributor

Yes I think we'd want this. We're adding auto_primary_group right now in #1964. so this may be auto_accounts?

That said, I don't think we'd want to use sacctmgr or if we did we'd have to push it to https://github.com/OSC/ood_core where all our cross scheduler logic is.

My guess is a simple group filter would do the trick. For us at OSC it'd be ^P*. All our chargeback projects start with a P like PZS0714. A group filter like ^P* at OSC would filter out all the other entitlement groups I'm also in like oscstaff. Would such a group filter work for you as well?

@Micket
Copy link
Contributor Author

Micket commented Apr 25, 2022

Unfortunately, it would not (I have groups, but not a pattern isolated to just slurm projects, and they might be present some time after expired projects).
Also, even for those using predictable unix-groups, you would still not have partitions, QoS, reservations available that could be easily fetched from sacctmgr, which would also be nice to be able to present, e.g. auto_group etc.
It would also force everyone setting up OOD to figure out what this means and hope such a predictable group and regex exist.

It just seems to me that the using the actual slurm account database just guarantees this is simply unconditionally correct for every site using OOD.
Is there an issue pushing it to core?

What actually prompted me to write this feature request was due to seeing #1964 and thinking that this would basically render auto_primary_group unnecessary.

@johrstrom
Copy link
Contributor

you would still not have partitions, QoS, reservations available

These would be separate fields. auto_queues would come have to come from ood_core, but i don't know about QoS or reservations. I'll have to check if those are cross scheduler fields and how that may fit. I've also got #1763 for things that are more static (of QoS is generally static).

Is there an issue pushing it to core?

No, not at all if it fits there.

this would basically render auto_primary_group unnecessary.

This a choice you're presenting to the user. auto_primary_group doesn't even show up in the form as an option. So the behavior of submitting the job with the primary group as the accounting_id is transparent to the user.

@Micket
Copy link
Contributor Author

Micket commented Apr 25, 2022

My goal here is to be able to stay as close to upstream as possible (since maintaining a fork is painful), so my hope is to find something pretty much everyone could be happy with to eliminate the need for forking off code.

I have a few question marks/design decisions;

  1. Would we keep some "queue accounting information"-struct, similarly to e.g. OodFilesApp, to be some globally accessible variable which could be used for all sort of things (perhaps some additional site-specific logic inside a submit.yml.erb file)?
    And also having these smart attributes as an additional feature on top of that?

  2. Slurm basically has fairshares on each (cluster,account,partition,qos)-tuple; meaning there may (perhaps even likely) that not all combinations exists; perhaps most commonly, a project might not have access to some particular partition.
    Similarly to how a node configurations isn't necessarily available for a given cluster (or partition).

I suppose the main 2 options for problem 2 would be;
a. Flatten it; just list all the possible options independently with the expectation that the user should know what combination of partition and account they can choose.

b. Keep track of what combinations exist. Not at all difficult in the code; basically just a hierarchy e.g.

{
    'cluster_a': {
        'account_a': ['partition_a', 'partition_b']
        'account_b': ['partition_a']
    }
    'cluster_b': {
        'account_c': ['partition_c']
    }
}

I've seen there is data-option-for-cluster-xxx tags (https://github.com/OSC/bc_osc_jupyter/blob/master/form.yml.erb#L63) Could this be extended here? (I only have 1 cluster, so, i haven't really been able to test this at all).
The struct above would be ideally boil down to something like

attributes:
  account:
    widget: "select"
    options:
      - [ "account_a", data-option-for-cluster-cluster_a: true ]
      - [ "account_b", data-option-for-cluster-cluster_a: true ]
      - [ "account_c", data-option-for-cluster-cluster_b: true ]
  partition:
    widget: "select"
    options:
      - [ "partition_a", data-options-for-account-account_a: true, data-options-for-account-account_b: true ]
      - [ "partition_b", data-options-for-account-account_a: true ]
      - [ "partition_c", data-options-for-account-account_c: true ]

If other schedulers actually don't have this split of project allocations per partition, then, one could still just populate the struct completely.

QoS is probably overkill (I don't want to present it), but I see no reason why it couldn't be done generally like this.

@johrstrom
Copy link
Contributor

We don't use partitions like that, so I'll have to think on how to compose these relationships. Ours are more like 'you request a gpu' so you get put into the gpuserial partition. It's sort transparent to the user. They get assigned the partition based off of the resource requests.

That said - I think the interface would look more like this in ood_core. The objects you're really interacting with are queues that may or may not have restrictions. So for cross-scheduler compatibility, other schedulers may not have group restrictions and they just leave that empty.

I'll submit a ticket to it for the same and link it here. That way the adapters (slurm in this case) have to worry about how to implement it and OOD (in this ticket) can then stitch it all together because it is cluster + account aware.

def queues
  [ { 
       name: 'partition_a',
       accounts: [ 'account_a', 'account_b' ]
   }]
end

Could this be extended here?

absolutely see this below.

{ "data-option-for-cluster-#{cluster.id}": false } unless mod.on_cluster?(cluster.id)

@johrstrom
Copy link
Contributor

Already had the ticket actually but now I can refine it.

OSC/ood_core#741

@Micket
Copy link
Contributor Author

Micket commented Apr 25, 2022

Sorry for a million questions here; the link to auto_modules.rb; isn't that just for automatically adding things to the form?
I was thinking about how it was handled on the web page, presumably via some javascript.
I thought perhaps some additional code was necessary in order for data-option-for-account-PZS0714: true would work?

I'm also a bit confused because I see both


and also
https://github.com/OSC/bc_osc_jupyter/blob/master/form.yml.erb#L134

both the variants
data-option-for-cluster-xxx
and
data-option-for-xxx

which made me even more confused in how it could work for bc_account field

@johrstrom
Copy link
Contributor

I see how you got there. Disregard that fixture file. That was copied from our real rstudio app which had it's own javascript at the time (so cluster was assumed or the only thing toggling).

This is the fixture file we use to test these features. I'm not sure what commit you'd linked but it seemed older.

https://github.com/OSC/ondemand/blob/master/apps/dashboard/test/fixtures/sys_with_gateway_apps/bc_jupyter/form.yml

I thought perhaps some additional code was necessary in order for data-option-for-account-PZS0714: true would work?

This is exactly how this would work. Also look at data-set-* functionality to set a value based on some other value.

Here are the docs on this feature.

https://osc.github.io/ood-documentation/develop/how-tos/app-development/interactive/dynamic-form-widgets.html

@Micket
Copy link
Contributor Author

Micket commented Apr 26, 2022

Thank you for the link! I tried searching but the RTD style page is pretty terrible when it comes to searching for dashes (https://osc.github.io/ood-documentation/develop/search.html?q=data-option-for)

So then the only step required for this particular PR would be to implement the smart attributes. I think i have the full picture now.

@lukew3 lukew3 added the enhancement This is functionality that builds upon existing functionality. label Jun 30, 2022
@johrstrom
Copy link
Contributor

So then the only step required for this particular PR would be to implement the smart attributes. I think i have the full picture now.

Sorry for the delay. No steps required - just submit a pull request! I did however get around to auto_groups in #2370 though they're unix groups.

I'm looking more into getting the available queues from a system. The sticking point is that while Slurm can do this, other systems may not be able to. In any case, it'll be limited to slurm support if we can pull queues & accounts from Slurm.

@Micket
Copy link
Contributor Author

Micket commented Nov 25, 2022

I had been meaning to have a go at this myself, but severe staff shortage ha forced me to completely drop everything non vital. Looking forward to see something like this upstream, as it would let me minimize my local changes.

@Oglopf Oglopf added component/dashboard bug Existing functionality not working as expected and removed bug Existing functionality not working as expected component/dashboard labels Dec 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/batch_connect enhancement This is functionality that builds upon existing functionality.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants