CommsCentral

My Technology Adventures

Issues with NTLM when behind AWS Elastic Load Balancers - Cause and solution

Posted at: 2016-02-26 @ 23:50:38

Recently I was troubleshooting an issue after deploying Microsoft Dynamics CRM 2015 behind Amazon Web Services (AWS) Elastic Load Balancers (ELBs), and it prompted some investigation.

The application layout is shown below in a simplified diagram. The important part is two servers running CRM Frontend services behind an ELB.
CRM-Diagram-network-flow.png

The ELB was configured with HTTP listeners. In this mode the ELB acts as a reverse proxy: the client's HTTP request is terminated at the ELB, and the ELB then opens its own connection to the HTTP server. Confused? This diagram should help:
CRM-Reverse-proxy-tcp-flow.png

When visiting the site in IE you'd be prompted for login credentials. On submitting the username and password, another prompt would immediately come up. After clicking OK a few times, part of the CRM page would error with HTTP Error 401.1 - Unauthorized: Access is denied.

----
The Cause
So to understand why this doesn't work we have to understand NTLM a little bit.

When a connection is made, the server denies the first request and asks the client to authenticate. The client sends an NTLMSSP_NEGOTIATE, and the server answers with an HTTP 401 Unauthorized carrying an NTLMSSP_CHALLENGE.
The client then retries with an NTLMSSP_AUTH.

Once the authentication is completed, that TCP connection is authenticated. This holds only as long as the TCP session is the same: same source address, same destination address, same source port and same destination port.
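
To make the "same TCP session" idea concrete, here is a small sketch that opens two connections from one client to one server on loopback and prints the four values that identify each session. The listener is a throwaway socket on 127.0.0.1 with an OS-assigned port, purely an assumption for the demo and nothing to do with CRM; four_tuple is an illustrative helper name.

```python
import socket


def four_tuple(sock):
    """Return (src_addr, src_port, dst_addr, dst_port) for a connected TCP socket."""
    src_addr, src_port = sock.getsockname()[:2]
    dst_addr, dst_port = sock.getpeername()[:2]
    return (src_addr, src_port, dst_addr, dst_port)


# Throwaway listener on loopback; port 0 lets the OS pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(2)
server_addr = server.getsockname()

# Two connections from the same client to the same server.
conn_a = socket.create_connection(server_addr)
conn_b = socket.create_connection(server_addr)

tuple_a = four_tuple(conn_a)
tuple_b = four_tuple(conn_b)
print(tuple_a)
print(tuple_b)

for s in (conn_a, conn_b, server):
    s.close()
```

The two tuples share the addresses and the destination port and differ only in the source port. That one difference is enough to make them separate TCP sessions, so each needs its own NTLM handshake.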

The packet capture shows a browser connecting to CRM directly. On the first TCP connection the browser opens, it gets challenged for auth; once that is complete, all requests on that same TCP connection are authorized.
The browser also opens another TCP connection. It is challenged again on that connection, and the browser supplies the username and password again as required for the new connection.
This-CRM-Working-server.png


If you think back to the ELB TCP connection diagram above, you can probably start to see why there might be a problem when using HTTP listeners or any HTTP reverse proxy.

The second capture screenshot below is via an ELB with HTTP listeners enabled.

At the start it all looks OK: new connection, auth prompt, browser supplies username/password just like last time. All good.
Then, on a TCP connection that is already authenticated, the browser sends an NTLMSSP_NEGOTIATE... Wait, wouldn't it only do that for a NEW TCP session?
This-crm-broken-server.png

The end user's browser has actually switched source ports, i.e. established another new TCP connection. However, the ELB is proxying that new connection back to the CRM server on the same TCP ports as the previous connection.

This is 100% supported for pure HTTP but not for NTLM. In NTLM it causes the existing connection to be invalidated and reauthenticated. The browser thinks it has two authenticated TCP connections running, when in fact, on the server side, there is only one.

The end result: you get prompted for username/password credentials again and again. The browser gets an unauthorized response on a TCP connection it had already authenticated with NTLM, so it assumes the server isn't accepting the credentials supplied.
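
The failure can be reproduced with a toy model. The sketch below is not how IIS or the ELB are implemented; it only assumes what the captures suggest: the server keeps NTLM state per backend TCP connection, and a fresh NTLMSSP_NEGOTIATE on a connection throws away that connection's earlier authentication. The class, function and connection names are all made up for the example.

```python
class NtlmBackend:
    """Toy model of a server that tracks NTLM state per TCP connection."""

    def __init__(self):
        self.authenticated = {}  # backend connection id -> bool

    def handle(self, conn_id, message):
        if message == "NTLMSSP_NEGOTIATE":
            # Assumption: a new handshake discards any earlier auth on this connection.
            self.authenticated[conn_id] = False
            return "401 NTLMSSP_CHALLENGE"
        if message == "NTLMSSP_AUTH":
            self.authenticated[conn_id] = True
            return "200 OK"
        # Ordinary request: allowed only if this connection is authenticated.
        return "200 OK" if self.authenticated.get(conn_id) else "401 Unauthorized"


def run(backend_conn_for):
    """Replay one browser session; backend_conn_for maps client conn -> backend conn."""
    backend = NtlmBackend()
    script = [
        ("client-1", "NTLMSSP_NEGOTIATE"),
        ("client-1", "NTLMSSP_AUTH"),
        ("client-1", "GET /page"),
        ("client-2", "NTLMSSP_NEGOTIATE"),  # browser opens a second connection
        ("client-1", "GET /script.js"),     # first connection keeps sending requests
    ]
    return [backend.handle(backend_conn_for[c], m) for c, m in script]


# Direct (or TCP listener): each client connection has its own backend connection.
direct = run({"client-1": "backend-1", "client-2": "backend-2"})
# HTTP listener: the proxy reuses one backend connection for both client connections.
proxied = run({"client-1": "backend-1", "client-2": "backend-1"})

print(direct[-1])   # 200 OK
print(proxied[-1])  # 401 Unauthorized
```

In the proxied run the last request goes out on a connection the browser believes is authenticated, yet it comes back unauthorized, which matches the repeated credential prompts.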

----
The Solution
The solution is very simple. Disable HTTP listeners on the ELB and use TCP listeners only!
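
For anyone scripting this, here is a rough sketch of what TCP-only listeners could look like when creating a Classic ELB with boto3. The load balancer name, subnets and port 80 are placeholders I've assumed for illustration, not values from my environment, and create_tcp_elb needs boto3 installed plus valid AWS credentials before it will actually run.

```python
def tcp_listeners(ports):
    """Build Classic ELB listener definitions that pass TCP straight through."""
    return [
        {
            "Protocol": "TCP",
            "LoadBalancerPort": port,
            "InstanceProtocol": "TCP",
            "InstancePort": port,
        }
        for port in ports
    ]


def create_tcp_elb(name, subnets, ports=(80,)):
    """Create a Classic ELB with TCP-only listeners (placeholder arguments)."""
    import boto3  # imported here so the listener builder works without boto3

    elb = boto3.client("elb")
    return elb.create_load_balancer(
        LoadBalancerName=name,
        Listeners=tcp_listeners(ports),
        Subnets=list(subnets),
    )


print(tcp_listeners([80]))
```

With both Protocol and InstanceProtocol set to TCP the ELB stops acting as an HTTP reverse proxy, so each client connection should get its own connection to the backend.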


What fun! So for those who care, now you know NTLM doesn't work with ELB HTTP listeners, and why!
This actually applies to any pure HTTP load balancer that doesn't have native support for NTLM.

Caveman

----
Some handy references
https://msdn.microsoft.com/en-us/library/dd925287%28v=office.12%29.aspx
https://github.com/nodejitsu/node-http-proxy/issues/362
https://pubs.vmware.com/NSX-62/index.jsp?topic=%2Fcom.vmware.nsx.admin.doc%2FGUID-A781BD86-A40E-4B71-8634-5677CDD52664.html

https://s3.amazonaws.com/quickstart-reference/microsoft/sharepoint/latest/doc/Microsoft_SharePoint_2013_on_AWS.pdf -- This document states NTLM isn't supported on ELB. At the time of writing I've got NTLM workloads behind ELBs in production working without issue using TCP listeners.






© 2015 CommsCentral