LDAP authentication with active directory [$10] #1491

Closed
tpetrosy opened this issue Nov 25, 2015 · 97 comments

Comments

@tpetrosy

Hello,
We are trying to integrate Rocket.Chat with AD using LDAP.
Login works, but we have a problem with active sessions.
It seems main.js creates a new session with the LDAP server for each user login and keeps the connection open.
After 15 minutes the LDAP server sends an RST packet to the application and drops the established connection.
As soon as the LDAP server drops the session with the application, all connected clients lose their connection to the Rocket.Chat server.
Here is what I get from the logs when it happens:

Error: read ECONNRESET
at errnoException (net.js:905:11)
at TCP.onread (net.js:559:19)

/var/www/rocket.chat/bundle/programs/server/packages/meteor.js:974
throw new Error("Meteor code must always run within a Fiber. " +
^
Error: Meteor code must always run within a Fiber. Try wrapping callbacks that you pass to non-Meteor libraries with Meteor.bindEnvironment.
at Object.Meteor.nodeCodeMustBeInFiber (packages/meteor/dynamics_nodejs.js:9:1)
at [object Object]._.extend.get (packages/meteor/dynamics_nodejs.js:21:1)
at Object.Meteor.isRestricted (packages/dispatch_run-as-user/packages/dispatch_run-as-user.js:137:1)
at [object Object].Mongo.Collection.(anonymous function) as update
at Object.UserPresence.removeConnectionsByInstanceId (packages/konecty_user-presence/packages/konecty_user-presence.js:88:1)
at process. (packages/konecty_user-presence/packages/konecty_user-presence.js:223:1)
at process.emit (events.js:117:20)
at process.exit (node.js:740:17)
at process.catchException (/usr/lib/node_modules/pm2/node_modules/pmx/lib/notify.js:52:15)
at process.g (events.js:180:16)

There is a $10 open bounty on this issue. Add to the bounty at Bountysource.

@srevereault

Hello,
I can confirm I have the same problem:
Error: read ECONNRESET
at errnoException (net.js:905:11)
at TCP.onread (net.js:559:19)

RocketChat worked fine until I connected it to an AD server. I'm running the latest Docker image with docker-compose.

@tpetrosy
Author

Hi,
I tried the latest version from GitHub and got the same problem: the daemon drops all user connections.

@Megatronic79

I’ve just updated to the latest build; LDAP authentication is still working properly and there are no crashes.

I have just upgraded the production version as well, with the same result: LDAP continues to work and RC is stable. This is 2003 and it's a non-Docker install.

Since I am not able to reproduce this: was LDAP working for you at all? Is the time on the chat server and the AD server in sync? Can you test your LDAP query in Apache Directory Studio to confirm it is correct?
I have also seen similar crashes before when AD is used with a local location for avatars. Have your users changed avatars?
Did you set the LDAP sync option?

@tpetrosy
Author

Hello,
Thank you for your reply. Here are the answers to your questions.

  1. I am using AD 2008 R2.
  2. I tried both a Docker install and a non-Docker install.
  3. Time sync is OK on both servers.
  4. The LDAP query is correct (I am able to log in. Even if the query were not correct, why would Rocket.Chat crash because of that?)
  5. Nobody tried to change avatars.
  6. I tried with LDAP sync and without; the result is the same.

Here is more info which may be useful.

  1. Rocket.Chat opens a connection to the LDAP server on every login attempt and does not close it (even if I use a bad user and password). I was able to open more than 200 connections just by clicking the login button, which is a vulnerability.
  2. On the non-Docker installation the application crashes as well, but it recovers by itself after 2-3 seconds.
    Below is system output sampled in a loop every 2 seconds when it happens (login time is 21:48:55, crash time is 22:03:38).

Connection status:
Thu Nov 26 22:03:35 UTC 2015
tcp 0 0 10.136.2.161:39138 10.136.0.101:389 ESTABLISHED 12766/main.js
Thu Nov 26 22:03:37 UTC 2015
tcp 0 0 10.136.2.161:39138 10.136.0.101:389 ESTABLISHED 12766/main.js
Thu Nov 26 22:03:39 UTC 2015
Thu Nov 26 22:03:41 UTC 2015


Open port status:
Thu Nov 26 22:03:35 UTC 2015
tcp 0 0 0.0.0.0:3000 0.0.0.0:* LISTEN 12766/main.js
Thu Nov 26 22:03:37 UTC 2015
tcp 0 0 0.0.0.0:3000 0.0.0.0:* LISTEN 12766/main.js
Thu Nov 26 22:03:39 UTC 2015
Thu Nov 26 22:03:41 UTC 2015
tcp 0 0 0.0.0.0:3000 0.0.0.0:* LISTEN 22220/main.js
Thu Nov 26 22:03:43 UTC 2015
tcp 0 0 0.0.0.0:3000 0.0.0.0:* LISTEN 22220/main.js


Process list status:
Thu Nov 26 22:03:35 UTC 2015
12766 Ssl 0:15 node /var/www/rocket.chat/bundle/main.js
Thu Nov 26 22:03:37 UTC 2015
12766 Ssl 0:15 node /var/www/rocket.chat/bundle/main.js
Thu Nov 26 22:03:39 UTC 2015
22220 Rsl 0:01 node /var/www/rocket.chat/bundle/main.js
Thu Nov 26 22:03:41 UTC 2015
22220 Ssl 0:03 node /var/www/rocket.chat/bundle/main.js


Error output from connected client browser:
GET http://rocketchat.myorg.org:3000/sockjs/info?cb=z9110hil5c 503 (Service Unavailable)y._start @ 8924e40c19fca54679d60bca73797c01ec281b56.js?meteor_js_resource=true:45(anonymous function) @ 8924e40c19fca54679d60bca73797c01ec281b56.js?meteor_js_resource=true:45


You can reproduce the case quickly: run tcpkill and then try to log in. tcpkill will close the newly created session between the application and the LDAP server. (Rocket.Chat will crash on every login attempt.)

@Megatronic79

If using LDAP with wrong credentials lets you in to RC, then it looks like there is an error in the filter and it is not actually doing any LDAP lookup. (I have seen this when no bind is used.)

Can you post your Bind Search field (changing the username and password, of course)?

Also, can you check that your RC server can actually resolve your AD server?

@tpetrosy
Author

OK, here is the Bind Search.

But if the filter is not correct, why does the login page work correctly?
When I enter the correct username and password I get into the system successfully; if not, it gives me a bad username or password message.

{"filter": "(&(objectCategory=person)(objectclass=user)(memberOf=CN=GRP,OU=Groups,OU=LCC,DC=myorg,DC=org)(mail=#{username}))", "scope": "sub", "userDN": "myuser", "password": "mypassword"}

@Megatronic79

I think there is a default action to accept the logon when the initial LDAP bind fails, probably left over from testing. @rodrigok is best placed to answer that one. We should probably add some form of test to confirm the bind is successful before accepting the changes.

So, looking at your bind, you are using email to log in, right?

Here is one from my dev server. (This one can authenticate with either email or username, and the user must be in the group called RC_Users.)

In this case:

Bind Account (proxy user) = [email protected]
Proxy user password = password
Domain = domain.com
Group = RC_Users,OU=Services,DC=domain,DC=com

{"filter": "(&(objectCategory=person)(objectclass=user)(memberOf=CN=RC_Users,OU=Services,DC=domain,DC=com)(|(mail=#{username})(sAMAccountName=#{username})))", "scope": "sub", "userDN": "[email protected]", "password": "password"}
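The filters in this thread contain a `#{username}` placeholder, and the logs further down show it replaced by the login name before the search runs. A minimal sketch of that substitution follows. The helper names `escapeFilterValue` and `buildFilter` are hypothetical and not taken from the Rocket.Chat source, and the escaping step (per RFC 4515) is an assumption about what a careful implementation would do, not a claim about what Rocket.Chat did at the time.

```javascript
// Hypothetical helper, not from the Rocket.Chat source.
// Escapes the characters RFC 4515 reserves inside a filter value
// (backslash, asterisk, parentheses, NUL) as a backslash plus two hex digits.
function escapeFilterValue(value) {
  return String(value).replace(/[\\*()\0]/g, function (ch) {
    return '\\' + ch.charCodeAt(0).toString(16).padStart(2, '0');
  });
}

// Replaces every #{username} placeholder with the escaped login name.
function buildFilter(template, username) {
  return template.split('#{username}').join(escapeFilterValue(username));
}

const template =
  '(&(objectclass=user)(|(mail=#{username})(sAMAccountName=#{username})))';

console.log(buildFilter(template, 'jdoe'));
// A login name containing filter syntax is neutralised instead of
// changing the structure of the query.
console.log(buildFilter(template, 'j*)(doe'));
```

Escaping matters here because the same input is used for both the `mail` and `sAMAccountName` branches, so an unescaped `*` would match every user in the group.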

@tpetrosy
Author

RC doesn't let me into the system with wrong credentials.
It just creates a TCP session with the LDAP server and keeps it established; it creates a new TCP session for every login attempt.

@tpetrosy
Author

Yes, you are right: I use email as the user login parameter, and authentication works fine.

@Megatronic79

OK, so the issue for you right now is that after 15 minutes the session is dropped for all connected users?

@Megatronic79

And there appears to be no handling of the closure of an LDAP query?

@tpetrosy
Author

I see 2 issues here.

  1. RC creates a new TCP session with the LDAP server for every login attempt and keeps it open.
  2. As soon as the LDAP server closes one of the established connections with RC, RC crashes and restarts.

@Megatronic79

Are you using LDAPS or LDAP?

@tpetrosy
Author

I am using LDAP

@tpetrosy
Author

One difference is that I did a manual installation inside a self-built Docker container. I am not sure whether this can be the reason.

@Megatronic79

Strange, I can only see an initial 389 connection to the DC. If I kill it, that one session will reconnect, but all other sessions stay up.

@Megatronic79

I'll try a few things and see if I can reproduce it.

@tpetrosy
Author

Thanks a lot!

@Megatronic79

OK, I can confirm that LDAP connections are not closed, even on a failed logon.

netstat -antup | grep 389

The count increases over time for the same process if I enter an incorrect password over and over, so we need to check the connection clean-up there.

I'm still not able to reproduce the session closure for all users, though. I can of course manually kill -9 the connection, which forces all clients to terminate and reconnect within a few seconds, but I'm not clear why the server side is resetting this connection. Do you have any throttling or limits set on the server side? How many connections are active before it resets?

@tpetrosy
Author

OK, here is what I was able to get from AD.
In the LDAP connection policy we have the parameter
MaxConnIdleTime 900
I think this causes the session closure.
Anyway, I think RC must create the TCP connection and clean it up after authentication, because it will not use it any more afterwards. Alternatively, it could create a few connections and use them only for new authentication requests.
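The first option described above (open a connection, authenticate, then release it) could look roughly like the sketch below. It is an illustration only: `createFakeClient` is a stand-in so the snippet runs without a directory server, and `authenticate` is a hypothetical helper, not Rocket.Chat code. The `bind(dn, password, callback)` and `unbind(callback)` shapes mirror the ldapjs client; the DN and password are made up.

```javascript
// Stand-in for an LDAP client so the sketch runs without a server.
// It invokes its callbacks synchronously to keep the example short.
function createFakeClient(validPassword) {
  return {
    open: true,
    bind(dn, password, callback) {
      callback(password === validPassword ? null : new Error('Invalid credentials'));
    },
    unbind(callback) {
      this.open = false; // the connection is released here
      callback(null);
    },
  };
}

// Hypothetical helper: authenticate, then always release the connection,
// whether the bind succeeded or failed.
function authenticate(client, dn, password, done) {
  client.bind(dn, password, function (bindErr) {
    client.unbind(function () {
      done(bindErr, !bindErr);
    });
  });
}

const client = createFakeClient('secret');
authenticate(client, 'CN=jdoe,DC=example,DC=org', 'wrong', function (err, ok) {
  console.log('authenticated:', ok, 'connection open:', client.open);
});
```

With this shape a failed login leaves no established connection behind, so an idle timeout such as MaxConnIdleTime has nothing to reset later.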

Now, how to reproduce the connection closure. Killing the process will not give you what you are looking for.

  1. Log in to RC and keep the session open.
  2. Run this command on your RC Linux host:
    tcpkill -i eth0 host "ldapserverIP"
  3. Then open a new browser and try to log in.

tcpkill will detect TCP activity from RC to AD and send RST to both hosts. This forces the hosts to close the TCP session immediately.

This will crash RC, and you will see that both clients lose their connection to the server.
After a few seconds the server recovers by itself (you can see that the process PID has changed), and then the clients reconnect to the server.

@tpetrosy
Author

As for the number of active sessions, it doesn't matter. It happens even with 1 session.

@Megatronic79

OK, then we need to look at why your server is terminating the connection. Sure, we can handle this better with LDAP, but I cannot see the behaviour you are seeing in any of my deployments with AD.

Is anyone else able to reproduce this?

@Megatronic79

I'll create a fresh install this weekend to see if I get the same results.

@tpetrosy
Author

As I said,
on AD the LDAP connection policy is
MaxConnIdleTime 900
and this causes the connection to be closed.
The problem is not why the server closes the connection; the problem is why RC crashes because of it.
The connection can be closed for many reasons (network problem, restart of AD, etc.).
Why does all of RC restart because of that? :)

@Megatronic79

I also have the same default timeouts but no crashes. That's why I said I will create a fresh install and see if I can reproduce it.

I just jumped onto the dev domain, logged on as user X on RC, then completely disconnected the virtual DC from the domain just after logon (to simulate a network problem, restart of AD, etc.). The user did not disconnect, nor did RC crash. Of course no more users can log on at that stage (as expected), but I do not see any crashes when a DC is removed from RC.

@Megatronic79

If I run tcpkill -i eth0 host "ldapserverIP" and let it listen and kill the connection as it comes in, then yes, I can confirm RC crashes in that instance. So that definitely needs further investigation and should be marked as a BUG. @rodrigok is the best person to comment on the LDAP code.

@Megatronic79

Related to auto-connect on socket error (seems undocumented but possible to handle it gracefully?)

ldapjs/node-ldapjs#318
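One plausible reading of the crash, offered as a sketch and not as a diagnosis of the Rocket.Chat code: in Node.js, an EventEmitter that emits 'error' with no listener attached throws, and an uncaught throw ends the process. The ldapjs client is an EventEmitter, so a socket reset with no 'error' handler would behave like the `unguarded` emitter below. The plain emitters here are stand-ins for a real client.

```javascript
const { EventEmitter } = require('events');

// Stand-in for an LDAP client connection with no 'error' listener.
const unguarded = new EventEmitter();
let crashed = false;
try {
  // With no listener, emit throws the error; outside a try block this
  // would terminate the Node.js process.
  unguarded.emit('error', new Error('read ECONNRESET'));
} catch (err) {
  crashed = true;
}

// The same event on an emitter that has a listener is simply delivered.
const guarded = new EventEmitter();
let handled = null;
guarded.on('error', function (err) {
  handled = err.message;
});
guarded.emit('error', new Error('read ECONNRESET'));

console.log({ crashed: crashed, handled: handled });
```

Registering an 'error' listener on each client, and failing only the login that owns that connection, would keep one dropped LDAP session from taking down every connected user.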

@miscs

miscs commented Dec 21, 2015

Hi,

I use only LDAP users for testing. My miscs LDAP entry has [email protected] as mail and miscs as sAMAccountName. So I use this LDAP query to enable login with miscs OR [email protected] for my user. As I said, using miscs works but [email protected] crashes RC.

{"filter": "(&(objectCategory=person)(objectclass=user)(memberOf=CN=Employees,OU=Groups,DC=corp,DC=xxx,DC=com)(|(mail=#{username})(sAMAccountName=#{username})))", "scope": "sub", "userDN": "[email protected]", "password": "mypass"}

Rocket Log output for login with [email protected] is the same as in my comment above:

Bind before search [email protected] mypass
LDAP search dn DC=corp,DC=xxx,DC=com
LDAP search options { filter: '(&(objectCategory=person)(objectclass=user)(memberOf=CN=Employees,OU=Groups,DC=corp,DC=xxx,DC=com)(|([email protected])([email protected])))',
  scope: 'sub' }
Attempt to bind DC=corp,DC=xxx,DC=com

events.js:72
        throw er; // Unhandled 'error' event
              ^
OperationsError: 00002020: Operation unavailable without authentication
    at messageCallback (/app/bundle/programs/server/npm/rocketchat_ldap/node_modules/ldapjs/lib/client/client.js:1419:45)
    at Parser.onMessage (/app/bundle/programs/server/npm/rocketchat_ldap/node_modules/ldapjs/lib/client/client.js:1089:14)
    at Parser.emit (events.js:95:17)
    at Parser.write (/app/bundle/programs/server/npm/rocketchat_ldap/node_modules/ldapjs/lib/messages/parser.js:117:8)
    at Socket.onData (/app/bundle/programs/server/npm/rocketchat_ldap/node_modules/ldapjs/lib/client/client.js:1076:22)
    at Socket.emit (events.js:95:17)
    at Socket.<anonymous> (_stream_readable.js:765:14)
    at Socket.emit (events.js:92:17)
    at emitReadable_ (_stream_readable.js:427:10)
    at emitReadable (_stream_readable.js:423:5)

But even if I simplify my LDAP search to

{"filter": "(&(objectCategory=person)(objectclass=user)(memberOf=CN=Employees,OU=Groups,DC=corp,DC=xxx,DC=com)(sAMAccountName=#{username}))", "scope": "sub", "userDN": "[email protected]", "password": "mypass"}

RC crashes with the same error when I try to log in with [email protected]. But in that case it should return "User not found".
The Samba log is always the same as in my comment above:

[2015/12/21 21:19:22.651375,  3] ../source4/auth/ntlm/auth.c:270(auth_check_password_send)
  auth_check_password_send: Checking password for unmapped user [xxx]\[rocketchat_service]@[(null)]
  auth_check_password_send: mapped user is: [xxx]\[rocketchat_service]@[(null)]
[2015/12/21 21:19:22.683116,  3] ../source4/auth/ntlm/auth.c:270(auth_check_password_send)
  auth_check_password_send: Checking password for unmapped user [xxx]\[]@[(null)]
  auth_check_password_send: mapped user is: [xxx]\[]@[(null)]

Using miscs works fine (using LDAP, and it's even the same LDAP user). So I think RC has a problem if LDAP can't map any user?

@miscs

miscs commented Dec 21, 2015

That seems to be the problem. If I use "ANonExistingUsername", RC always crashes.
So any login with a username that does not exist in LDAP brings RC down :(

@Megatronic79

Hi @miscs - I think your LDAP queries may not be quite right. From the looks of this you are using OpenLDAP, but you are also using the attribute sAMAccountName=#{username}

sAMAccountName is Active Directory specific.

Shouldn't you be using the following?

uid=#{username}

making your simplified ldap query this:

{"filter": "(&(objectCategory=person)(objectclass=user)(memberOf=CN=Employees,OU=Groups,DC=corp,DC=xxx,DC=com)(uid=#{username}))", "scope": "sub", "userDN": "[email protected]", "password": "mypass"}

Also, just a note: OpenLDAP doesn't natively use the "memberOf" overlay, so did you add this?

The easiest confirmation is to use Apache Directory Studio to check the queries.

@miscs

miscs commented Dec 21, 2015

The LDAP queries are working. We use Samba4 as the backend, which supports sAMAccountName=#{username}. If the query did not work, I would not be able to log in using only "miscs".

I use these queries in a lot of other applications and they work everywhere just fine :)

Even in RC they work as long as I use an existing LDAP user. The crash only occurs if I use a username which is not in LDAP and therefore cannot be mapped...

@Megatronic79

Hmm, OK, what is Samba4 sending back to RC as a response? Can you inspect the traffic and post back?

I've just tested the latest 0.10 and caused a failed lookup; an incorrect config just results in the correct LDAP response of user not found, prompting the following:

failed

I guess we need to know a bit more about how Samba is responding to find out why RC crashes.

@engelgabriel
Member

You now have a new setting on the Admin -> General panel

image

@Megatronic79

Thanks @engelgabriel

@miscs

miscs commented Dec 21, 2015

With the debug level set to all, I get:

[methods] UserPresence:online -> userId: null , arguments:  {}
Bind before search [email protected] mypass
LDAP search dn DC=corp,DC=xxx,DC=com
LDAP search options { filter: '(&(objectCategory=person)(objectclass=user)(memberOf=CN=Employees,OU=Groups,DC=corp,DC=xxx,DC=com)(sAMAccountName=checked))',
  scope: 'sub' }
Attempt to bind DC=corp,DC=xxx,DC=com

events.js:72
        throw er; // Unhandled 'error' event
              ^
OperationsError: 00002020: Operation unavailable without authentication
    at messageCallback (/app/bundle/programs/server/npm/rocketchat_ldap/node_modules/ldapjs/lib/client/client.js:1419:45)
    at Parser.onMessage (/app/bundle/programs/server/npm/rocketchat_ldap/node_modules/ldapjs/lib/client/client.js:1089:14)
    at Parser.emit (events.js:95:17)
    at Parser.write (/app/bundle/programs/server/npm/rocketchat_ldap/node_modules/ldapjs/lib/messages/parser.js:117:8)
    at Socket.onData (/app/bundle/programs/server/npm/rocketchat_ldap/node_modules/ldapjs/lib/client/client.js:1076:22)
    at Socket.emit (events.js:95:17)
    at Socket.<anonymous> (_stream_readable.js:765:14)
    at Socket.emit (events.js:92:17)
    at emitReadable_ (_stream_readable.js:427:10)
    at emitReadable (_stream_readable.js:423:5)

So if I am reading the log the right way, I have no user (null), which is correct since I used a non-existing login. But then [methods] UserPresence:online -> userId: null , arguments: {} is called, which won't work with userId: null ...

@miscs

miscs commented Dec 21, 2015

To intercept the Samba4/LDAP response I have to wait for our admin :( Sorry. But maybe we can log the response using RC?

@Megatronic79

I'll spin up a Samba4 LDAP tomorrow when I get to the office and see if I can help you out.

@miscs

miscs commented Dec 22, 2015

Thank you for all your help figuring this out!
But if it is too much hassle, I'll do it with my admin next year (he is already on holiday)...

@miscs

miscs commented Jan 4, 2016

Happy new year to all of you!

We just tested a little more and checked the responses with Wireshark. What we see is the following:

Samba4 Backend, Using LDAP

LogIn with existing User (Success)

  • Bind with service user : Success
  • Search for User : Success -> 1 Result
  • Bind with User-DN from Search-Result : Success
  • Fetch Attributes for User

LogIn with non-existing User (Server-Crash)

  • Bind with service user : Success
  • Search for User : Success -> 0 Result
  • Bind with Base-DN from Search-Result
  • Fetch Attributes for User -> "OperationsError: 00002020: Operation unavailable without authentication"

What RC should do is test whether the result set for the user search contains exactly 1 entry. If not, it should throw a User not found error and abort the login.

Hope this helps :)

@engelgabriel
Member

@miscs thanks for the extra info. Hopefully that will help @rodrigok :)

@Megatronic79

Great, @miscs - strange that it behaves like this for Samba\LDAP but not for OpenLDAP and Active Directory.

Anyway, @rodrigok, let me know if you need a Samba\LDAP backend to test against.

@rodrigok
Member

rodrigok commented Jan 4, 2016

@Megatronic79 yes, I need :)

I'm trying to simulate this with some online "demo" server but without success.

@Megatronic79

sure thing :) , will get this online when I get to the office in the morning and will pm the details to ya.

@miscs

miscs commented Jan 4, 2016

@rodrigok if you have a static IP I can also open our Samba4/LDAP server for that IP. Maybe that's the easiest?

@Megatronic79

cool.

btw @miscs what kind of setup is your samba4\ldap?

@rodrigok
Member

rodrigok commented Jan 4, 2016

@Megatronic79 thanks

@miscs I do not have a static IP, but I can tell you my dynamic IP, which should work for some time.

@hameno

hameno commented Jan 5, 2016

If I read the code at https://github.com/RocketChat/Rocket.Chat/blob/develop/packages/rocketchat-ldap/ldap_server.js#L158 correctly, I cannot see any check that the returned search result is not empty. Could that be it?

@miscs

miscs commented Jan 5, 2016

I just modified the code that @hameno mentioned a bit (inside my Docker image) and the server won't crash any more :)

client.search(options.ldapOptions.dn, opts, function(err, res) {
    if (err) {
        console.log('LDAP: Search Error', err);
        return ldapAsyncFut.return({
            error: err
        });
    }
    // Count the entries so that an empty result is never used for a bind.
    var res_count = 0;
    var dn = self.options.dn;
    res.on('searchEntry', function(entry) {
        res_count = res_count + 1;
        dn = entry.object.dn;
    });
    res.on('error', function(err) {
        console.log('LDAP: Search on Error', err);
        ldapAsyncFut.return({
            error: err
        });
    });
    res.on('end', function(result) {
        // Bind only when the search matched exactly one user.
        if (res_count === 1) {
            bind(dn);
        }
        // else {
        //     var err = new Error('User not Found');
        //     ldapAsyncFut.return({
        //         error: err
        //     });
        // }
    });
});

But my changes are very dirty and additionally I am too stupid to return a correct LDAP error in case res_count != 1 (this is why the else flow is commented out).
Without the else-flow the server stays UP but of course hangs during login and the site needs to be refreshed :(
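The hang described above comes from the login callback never being answered when the search returns no entry. A minimal sketch of answering it in every case follows. `resolveSingleDn` is a hypothetical helper, not the code that was merged, and the plain EventEmitter stands in for the ldapjs search response, which emits 'searchEntry', 'error' and 'end' as in the snippet above.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: collects search entries and reports either exactly
// one DN or an error, so the caller is always answered.
function resolveSingleDn(res, callback) {
  const dns = [];
  let finished = false;
  function finish(err, dn) {
    if (finished) return; // answer the caller only once
    finished = true;
    callback(err, dn);
  }
  res.on('searchEntry', function (entry) {
    dns.push(entry.object.dn);
  });
  res.on('error', function (err) {
    finish(err);
  });
  res.on('end', function () {
    if (dns.length === 1) return finish(null, dns[0]);
    finish(new Error(dns.length === 0 ? 'User not found' : 'Multiple users found'));
  });
}

// Simulate a search that matches nobody.
const emptySearch = new EventEmitter();
resolveSingleDn(emptySearch, function (err, dn) {
  console.log(err ? err.message : dn);
});
emptySearch.emit('end');
```

The caller would bind only when a DN comes back and treat any error as a failed login, which avoids both the bind against the base DN and the stuck login page.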

@Megatronic79

Great!

Maybe @rodrigok can clean that code up for you and pass the correct LDAP value back but looks like you found the issue. :)

@rodrigok
Member

rodrigok commented Jan 5, 2016

@miscs I implemented your code.

As far as I know, the error that is returned is not relevant; it will just cancel the login.

@miscs

miscs commented Jan 6, 2016

cool. thanks a lot!!!
any chance to get this pushed to master soon? ;)

@engelgabriel
Member

We push changes to master on Mondays, is that OK?

@miscs

miscs commented Jan 6, 2016

of course, thanks for the feedback!

@miscs

miscs commented Jan 8, 2016

I just pulled the latest develop docker image and everything works fine!!!
Many thanks to all, especially @Megatronic79 @rodrigok and @engelgabriel !


10 participants