Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

user.present group regression in 2017.7.5+ if gid_from_name is True #48640

Closed
boltronics opened this issue Jul 18, 2018 · 6 comments
Closed

user.present group regression in 2017.7.5+ if gid_from_name is True #48640

boltronics opened this issue Jul 18, 2018 · 6 comments
Labels
Bug broken, incorrect, or confusing behavior fixed-pls-verify fix is linked, bug author to confirm fix P3 Priority 3 severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Milestone

Comments

@boltronics
Copy link
Contributor

boltronics commented Jul 18, 2018

Description of Issue/Question

There is a regression when upgrading from 2017.7.4 to 2017.7.5+ which results in a change in behaviour to the user.present functionality. My understanding is that this should never happen in a bugfix release unless it cannot be avoided due to a security issue.

I was building a new host with the following state applied (as rendered via state.show_sls users):

    myuser1:
        ----------
        __env__:
            base
        __sls__:
            users
        user:
            |_
              ----------
              fullname:
                  User 1
            |_
              ----------
              shell:
                  /bin/bash
            |_
              ----------
              gid_from_name:
                  True
            |_
              ----------
              uid:
                  None
            |_
              ----------
              gid:
                  myuser1
            |_
              ----------
              groups:
                  - adm
                  - sudo
                  - staff
            |_
              ----------
              home:
                  /home/myuser1
            |_
              ----------
              createhome:
                  None
            - present
            |_
              ----------
              order:
                  10018

Note I have replaced the id, fullname and gid arguments in all these examples to protect the innocent. I am focusing on just one specific user here for clarity.

This has worked for many years, and results in the following successful output on <=2017.7.4:

          ID: myuser1
    Function: user.present
      Result: True
     Comment: New user myuser1 created
     Started: 11:07:51.917005
    Duration: 82.221 ms
     Changes:   
              ----------
              fullname:
                  User 1
              gid:
                  1001
              groups:
                  - myuser1
                  - adm
                  - staff
                  - sudo
              home:
              homephone:
              name:
                  myuser1
              passwd:
                  x
              roomnumber:
              shell:
                  /bin/bash
              uid:
                  1001
              workphone:

The gid_from_name argument has historically functioned in more or less the same way as useradd's -U argument, which naturally makes sense given the module name is called useradd.

$ useradd --help | grep -- -U
  -U, --user-group              create a group with the same name as the user

The original documentation for this option in Salt said:
If True, the default group id will be set to the id of the group with the same name as the user, Default is False.
To me, this implies that it will create the user for the group - which has always been how useradd has functioned, and is the most logical behaviour since the user has already indicated they want the group.

Well, despite all of this, it appears this behaviour changed in this PR - including the documentation!

I don't understand why this change in functionality was permitted in a point release since those are supposed to be stable. A request to update documentation to reflect new behaviour in a point release should be a red flag! Instead we are left with being unable to log into a new system until we change all affected configurations.

----------
          ID: myuser1
    Function: user.present
      Result: False
     Comment: Default group with name "myuser1" is not present
     Started: 11:19:46.648672
    Duration: 2.019 ms
     Changes:  

I tried manually correcting this:

$ sudo salt 'host-in-question' group.add myuser1 1000
$ sudo salt 'host-in-question' group.add myuser2 1001
$ # etc.

I then re-ran my users state responsible for the above state sls data, and this still didn't work correctly! That left me with this situation:

$ sudo salt 'host-in-question' cmd.run 'getent passwd | grep myuser | head -n 2'
    myuser2:x:1000:1001:User 2,,,:/home/user2:/bin/bash
    myuser1:x:1001:1000:User 1,,,:/home/user1:/bin/bash

How is that even possible? Well let's look at the updated docs for gid_from_name:

If True, the default group id will be set to the id of the group with the same name as the user. If the group does not exist the state will fail.

Well yes it's technically working as documented... but the user id now doesn't match up with the group id!

There must be a way to fix that... looking at the uid argument documentation:

The user id to assign. If not specified, and the user does not exist, then the next available uid will be assigned.

So now there's no way to just say "match the group name". We can't embed some jinja to look it up, because that will be rendered before the group is created and it won't find anything!

This is all turning out to be quite a mess, and I'm at a loss as to how this was never picked up over the last few months by anyone. The way I see it I now have two options - write a custom state module, or replace the user.present sls blocks with cmd.run blocks. If I'm missing a more elegant solution here, please let me know.

Obviously nobody likes bugs and I've reported quite a few over the years, but this one makes me especially sad to see this broken so badly, in a point release, after it's been running for years without issue, in such a core piece of functionality that affects every host I manage.

Setup

This will probably suffice to trigger the main problem.

myuser1:
  user.present:
    - gid_from_name: True
    - gid: myuser1

Steps to Reproduce Issue

  1. Update to 2017.7.5 or higher.
  2. Remove the existing user.
    sudo salt 'host-in-question' user.delete myuser1 remove=True
    Note this command also removes the user group! I believe managing the user group is the more convenient behaviour here too - although this really should be documented since in some cases the user might not want or expect the group to be deleted.
  3. sudo salt 'host-in-question' state.sls users

Versions Report

    Salt Version:
               Salt: 2017.7.7
     
    Dependency Versions:
               cffi: Not Installed
           cherrypy: Not Installed
           dateutil: 2.5.3
          docker-py: Not Installed
              gitdb: Not Installed
          gitpython: Not Installed
              ioflo: Not Installed
             Jinja2: 2.8
            libgit2: Not Installed
            libnacl: Not Installed
           M2Crypto: Not Installed
               Mako: Not Installed
       msgpack-pure: Not Installed
     msgpack-python: 0.4.8
       mysql-python: Not Installed
          pycparser: Not Installed
           pycrypto: 2.6.1
       pycryptodome: Not Installed
             pygit2: Not Installed
             Python: 2.7.13 (default, Nov 24 2017, 17:33:09)
       python-gnupg: 0.3.9
             PyYAML: 3.12
              PyZMQ: 16.0.2
               RAET: Not Installed
              smmap: Not Installed
            timelib: Not Installed
            Tornado: 4.4.3
                ZMQ: 4.2.1
     
    System Versions:
               dist: debian 9.5 
             locale: ANSI_X3.4-1968
            machine: x86_64
            release: 4.9.0-6-amd64
             system: Linux
            version: debian 9.5
@boltronics
Copy link
Contributor Author

In practise with my current state layout with user.present dependent on a new group.present block (both in a jinja loop over all users defined in pillar), it appears from my brief testing that the groups are always created in the same order as the users, causing user/group ids to match.

I guess this could be forced by manually setting an order for each block in the loop with an incrementing counter. This will suffice for now.

@Ch3LL
Copy link
Contributor

Ch3LL commented Jul 18, 2018

ping @meaksh can you comment here since this was your change.

@Ch3LL Ch3LL added the Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged label Jul 18, 2018
@Ch3LL Ch3LL added this to the Approved milestone Jul 18, 2018
@boltronics
Copy link
Contributor Author

Looking over #45365 some more, and looking over @meaksh's profile, I now have a better understanding of the problem.

I rolled back to 2017.7.4 and ran the example state from that PR that is supposed to have problems (which I called test.sls):

test:
  user.present:
    - gid_from_name: True

This is what I got:

root@debian-stretch-test-0:# salt-call state.sls test
[INFO    ] Loading fresh modules for state activity
[INFO    ] Fetching file from saltenv 'base', ** done ** 'test.sls'
[INFO    ] Running state [test] at time 10:39:25.973468
[INFO    ] Executing state user.present for [test]
[INFO    ] Executing command ['useradd', '-m', 'test'] in directory '/root'
[INFO    ] {'shell': '', 'workphone': '', 'uid': 1002, 'passwd': 'x', 'roomnumber': '', 'groups': ['test'], 'home': '/home/test', 'name': 'test', 'gid': 1002, 'fullname': '', 'homephone': ''}
[INFO    ] Completed state [test] at time 10:39:26.268421 duration_in_ms=294.954
local:
----------
          ID: test
    Function: user.present
      Result: True
     Comment: New user test created
     Started: 10:39:25.973467
    Duration: 294.954 ms
     Changes:   
              ----------
              fullname:
              gid:
                  1002
              groups:
                  - test
              home:
                  /home/test
              homephone:
              name:
                  test
              passwd:
                  x
              roomnumber:
              shell:
              uid:
                  1002
              workphone:

Summary for local
------------
Succeeded: 1 (changed=1)
Failed:    0
------------
Total states run:     1
Total run time: 294.954 ms
root@debian-stretch-test-0:# getent passwd test
test:x:1002:1002::/home/test:
root@debian-stretch-test-0:# getent group test
test:x:1002:
root@debian-stretch-test-0:# salt-call state.sls test
[INFO    ] Loading fresh modules for state activity
[INFO    ] Running state [test] at time 10:40:59.266846
[INFO    ] Executing state user.present for [test]
[INFO    ] User test is present and up to date
[INFO    ] Completed state [test] at time 10:40:59.279379 duration_in_ms=12.534
local:
----------
          ID: test
    Function: user.present
      Result: True
     Comment: User test is present and up to date
     Started: 10:40:59.266845
    Duration: 12.534 ms
     Changes:   

Summary for local
------------
Succeeded: 1
Failed:    0
------------
Total states run:     1
Total run time:  12.534 ms
root@debian-stretch-test-0:# dpkg -l | grep salt
ii  salt-common                   2017.7.4+ds-sp1                all          shared libraries that salt requires for all packages
ii  salt-minion                   2017.7.4+ds-sp1                all          client package for salt, the distributed remote execution system
root@debian-stretch-test-0:# 

As you can see, this was already working perfectly - no changes required.

However now with the PR merged, Salt checks for the return value of __salt__['file.group_to_gid'](name) which of course returns nothing since the group doesn't normally preexist, and then aborts with an error. Hence this was actually originally all good, and now is broken.

Then I looked at meaksh's GitHub profile and noticed he's running SuSE. As we probably all know, RPM-based distros like RedHat and SuSE do things a bit differently with regards to user management. These distributions don't create one group per user by default as is the norm elsewhere, but instead create a users group or some such, and have all users created with that as the default group.

How does this happen? It basically comes down to /etc/login.defs which is what commands like useradd reference to determine their default behaviour. The important setting here is USERGROUPS_ENAB which defaults to yes on Debian, and I'm betting defaults to no on SuSE. Hence the useradd command isn't creating groups by default on SuSE. Apparently the gid_from_name argument for user.present is broken in that environment.

I've given the Salt code and changes a quick skim just now, and I didn't notice anywhere where a group is explicitly created by Salt, or anywhere where the -U argument to the useradd command is specified, which I had always assumed was the case. I believe if gid_from_name is enabled, useradd should get the -U argument, which should effectively cause the value of USERGROUPS_ENAB to be ignored, but that wasn't happening.

This begs the question; what did gid_from_name actually do originally if it was the default on Debian and isn't working on SuSE? I created a new test and adjusted my test.sls state as follows:

test:
  user.present:
    - groups:
      - staff

Here is the result on Debian Stretch with Salt 2017.7.4:

root@debian-stretch-test-1:~# salt-call state.sls test
[INFO    ] Loading fresh modules for state activity
[INFO    ] Fetching file from saltenv 'base', ** done ** 'test.sls'
[INFO    ] Running state [test] at time 11:00:12.896379
[INFO    ] Executing state user.present for [test]
[INFO    ] Executing command ['useradd', '-m', 'test'] in directory '/root'
[INFO    ] Executing command ['usermod', '-G', 'staff', 'test'] in directory '/root'
[INFO    ] {'shell': '', 'workphone': '', 'uid': 1002, 'passwd': 'x', 'roomnumber': '', 'groups': ['staff', 'test'], 'home': '/home/test', 'name': 'test', 'gid': 1002, 'fullname': '', 'homephone': ''}
[INFO    ] Completed state [test] at time 11:00:12.972447 duration_in_ms=76.067
local:
----------
          ID: test
    Function: user.present
      Result: True
     Comment: New user test created
     Started: 11:00:12.896380
    Duration: 76.067 ms
     Changes:   
              ----------
              fullname:
              gid:
                  1002
              groups:
                  - staff
                  - test
              home:
                  /home/test
              homephone:
              name:
                  test
              passwd:
                  x
              roomnumber:
              shell:
              uid:
                  1002
              workphone:

Summary for local
------------
Succeeded: 1 (changed=1)
Failed:    0
------------
Total states run:     1
Total run time:  76.067 ms
root@debian-stretch-test-1:~# getent passwd test
test:x:1002:1002::/home/test:
root@debian-stretch-test-1:~# getent group test
test:x:1002:
root@debian-stretch-test-1:~# 

So the answer: absolutely nothing! So what the PR should have done is fix that by making it use the -U argument to useradd.

I'm actually confused now about what gid_from_name is supposed to do now in 2017.7.5+. What was the intended behaviour for this command in the PR? Again, the docs now say:
If True, the default group id will be set to the id of the group with the same name as the user. If the group does not exist the state will fail.
and they say for gid:
The id of the default group to assign to the user. Either a group name or gid can be used. If not specified, and the user does not exist, then he next available gid will be assigned.
So based on this, if we want to have a new user account use a group with the same name, why not just write:

test:
  user.present:
    - gid: test

? AFAIC the PR makes gid_from_name completely redundant now - which is more evidence that the original intent of that argument was to add a -U argument to useradd to have the group automatically created.

Even this bit in the gid docs says so: If not specified, and the user does not exist, then the next available gid will be assigned. - how can that work if user.present isn't creating a group? It's not going to assign a gid to a non-existent group - that would be crazy!

@c4t3l
Copy link

c4t3l commented Jul 24, 2018

My team recently upgraded from 2017.7.0 to 2018.3.2 and this behavior also exists.

@Ch3LL
Copy link
Contributor

Ch3LL commented Jul 24, 2018

@boltronics thanks for all your work on this. we really appreciate it. @c4t3l looks like there is a proposed PR for this issue in #48704

@Ch3LL Ch3LL added Bug broken, incorrect, or confusing behavior severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around P3 Priority 3 fixed-pls-verify fix is linked, bug author to confirm fix and removed Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged labels Jul 24, 2018
@boltronics
Copy link
Contributor Author

Since the resolution involved adding a new feature, I don't expect the PR to be merged into a stable release until Fluorine, which I guess we will see sometime later this year.

I don't know if there are any plans to revert to the original 2017.7.4 behaviour in the next point release.

dafyddj added a commit to dafyddj/vault-formula that referenced this issue Jul 8, 2019
dafyddj added a commit to dafyddj/vault-formula that referenced this issue Jul 8, 2019
dafyddj added a commit to dafyddj/vault-formula that referenced this issue Jul 9, 2019
dafyddj added a commit to dafyddj/vault-formula that referenced this issue Jul 9, 2019
dafyddj added a commit to dafyddj/vault-formula that referenced this issue Jul 10, 2019
dafyddj added a commit to dafyddj/vault-formula that referenced this issue Jul 10, 2019
saltstack-formulas-travis pushed a commit to saltstack-formulas/vault-formula that referenced this issue Jul 10, 2019
# [1.1.0](v1.0.6...v1.1.0) (2019-07-10)

### Bug Fixes

* **package:** explicitly require package providing setcap ([d476700](d476700))
* **user:** handle removal of `gid_from_name` in Salt develop branch ([dee3748](dee3748)), closes [saltstack/salt#48640](saltstack/salt#48640)

### Code Refactoring

* **defaults:** place common values in defaults.yaml ([3656e31](3656e31))

### Continuous Integration

* **kitchen+travis:** bring into line with `template-formula` ([34f05bd](34f05bd))

### Features

* add support for openSUSE ([76b8ac3](76b8ac3))

### Tests

* **user+group:** test for vault user/group existence ([ff5cdf9](ff5cdf9))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior fixed-pls-verify fix is linked, bug author to confirm fix P3 Priority 3 severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Projects
None yet
Development

No branches or pull requests

3 participants