Haproxy for Active Directory LDAPS
One more issue to solve...
We have one domain controller at work that is kinda critical to the entire operation. It's not the only AD server on the site, but it somehow became the only LDAPS server responsible for authentication for non-AD-enabled services.
Like VPN. Like our ERP.
So when it's down, it's a bit of a problem. We tend to avoid taking it down during operational hours, but nothing is 100% infallible. Between trying to be more hands-off with hosts (I prefer cattle to pets) and the steady stream of Active Directory-specific CVEs coming out, this one machine is being a bit of a pain in the ass.
How to make one service more reliable
Since all the DCs in a site (well, all in the organization) serve the same directory, the thought crosses the mind: swap in a proxy host at the critical IP address, move the critical host to a new IP address, then configure the proxy to talk to whatever host it can connect to.
I've been aware of `HAProxy` as a means to proxy traffic to multiple backend servers, but we never really had an application to use it on, until this project.
A better way
Honestly, a better way would be to configure the appliances that are doing the LDAPS lookups to be HA themselves, but we can't all be winners. Some appliances don't support it, other appliances have some weird configuration gotchas...
And in other cases, inter-company office politics make the change too hard.
Getting started, but with ANSIBLE!
I've been trying to get better with Ansible, so today I'm starting my configuration from a blank OS and a working ansible controller node.
First step, inventory.
Adding to inventory is easy enough - just add the host. I ran into a small issue where DNS resolution wasn't working yet for the host (DNS replication delays), but we can fix this by setting `ansible_host` to the corresponding IP address.
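As a sketch, the inventory entry can pin the IP directly until DNS catches up (the hostname, group layout and address here are placeholders, not the real environment):

```yaml
# inventory.yml - hypothetical hostname and IP
all:
  children:
    haproxy:
      hosts:
        haproxy01.example.com:
          ansible_host: 192.0.2.10   # connect by IP until DNS replicates
```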
Second step, getting connected
I wanted to figure out how to get the server managed without fiddling a lot with the host first. Normally I would connect to the host, install my own SSH keys, then connect from ansible, but I think I can get ansible to do this for me.
This wound up setting a bunch of extra variables on the command line:
```shell
ansible-playbook -l haproxy all-host-configure.yml \
  --extra-vars "ansible_user=timatlee ansible_password=XXXXX ansible_sudo_pass=XXXXX"
```
I needed to install `sshpass` on the controller node, but otherwise - I was good to go.
Second + n steps, writing the playbook
My last few playbooks have been thin wrappers that simply call a role, with all the heavy lifting in the role itself. I feel like this makes it easier to declare dependencies on other roles or collections, set variables at a larger scope, and so on. So really, I wind up with a playbook that reads something like:
```yaml
# haproxy.yml
---
- name: HAProxy configuration
  hosts: haproxy
  become: true

  roles:
    - haproxy
```
`haproxy` refers to a group within the inventory - the idea being that maybe I want more than one of these hosts.
Writing the role
This took some iteration.
Installing from Debian's repos
Out of the gate, I started by installing `haproxy` from Debian's repositories. This seemed to work well enough, but I started getting some odd errors connecting to the LDAPS backend, similar to:
```
[WARNING] (10) : Health check for server ldap/openldap succeeded, reason: Layer7 check passed, code: 0, info: "Success", check duration: 0ms, status: 3/3 UP.
<133>Sep 13 19:39:29 haproxy: Health check for server ldap/openldap succeeded, reason: Layer7 check passed, code: 0, info: "Success", check duration: 0ms, status: 3/3 UP.
[WARNING] (10) : Health check for server ldap/ad-ldap failed, reason: Layer7 invalid response, info: "Not LDAPv3 protocol", check duration: 0ms, status: 0/2 DOWN.
[WARNING] (10) : Server ldap/ad-ldap is DOWN. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
<133>Sep 13 19:39:29 haproxy: Health check for server ldap/ad-ldap failed, reason: Layer7 invalid response, info: "Not LDAPv3 protocol", check duration: 0ms, status: 0/2 DOWN.
<129>Sep 13 19:39:29 haproxy: Server ldap/ad-ldap is DOWN. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
```
These messages were taken from the GitHub issue at https://github.com/haproxy/haproxy/issues/1390. This led me to a related issue, which made reference to fixes not being backported into version 1.8 of `haproxy`... and guess what version comes with the Debian apt repositories...
Installing from Docker
Oh yeah, back to Docker. The role's tasks change a bit:
- Install Docker
- Install a docker-compose file
- Bring it all up.
Sounds simple, right? Turns out.. it kinda is. Mostly. Sort of.
Jeff Geerling has a Docker role for Ansible that did most of what I wanted. The standalone `docker-compose` binary needed to *not* be installed so that Python's `docker` library works; that's an easy flag to set for the role.
`pip` also needs to be installed so that we can get Python's `docker-compose` library installed. These are required for Ansible's interaction with Docker. I went ahead and got `pip` from Debian's repositories, but wonder if there's a more "current" version to get.
In any case, installing `pip` this way solved the problem.. so onwards.
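As a sketch, the relevant variables might look like the following - assuming `geerlingguy.docker` and `geerlingguy.pip` are the roles in play (check those roles' docs for the current variable names):

```yaml
# group_vars/haproxy.yml - assumes the geerlingguy.docker and geerlingguy.pip roles
docker_install_compose: false   # skip the standalone docker-compose binary

pip_install_packages:
  - name: docker           # Python SDK that Ansible's docker_* modules use
  - name: docker-compose   # lets Ansible drive compose files
```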
After the software installation, it's just a matter of creating some directories and copying over some config files for `/etc/haproxy/errors/*.http` (which I wound up commenting out from the configuration anyways..)
The docker-compose file is pretty straightforward, thankfully.
Haproxy doesn't seem to need much environment configuration, but there were a few additions:
- `haproxy`'s stats page is enabled, and I'm exposing that on port 8404. I'm leaving it up to the operating system to firewall it (instead of restricting it in `haproxy` itself).
- I only care about proxying LDAPS traffic. We've moved away from LDAP, so there's no reason to be proxying port 389.
- I wasn't able to start the container on port 636 without the `net.ipv4.ip_unprivileged_port_start=0` line. This is because ports below 1024 are considered privileged, and by default only root can bind them.
```yaml
# docker-compose.yml
---
version: '3.3'

services:
  haproxy:
    container_name: haproxy
    restart: always
    image: haproxy
    volumes:
      - '/etc/haproxy:/usr/local/etc/haproxy:ro'
    ports:
      - '636:636'
      - '8404:8404'
    sysctls:
      - net.ipv4.ip_unprivileged_port_start=0
```
The haproxy configuration itself took a bit of work and fiddling around. I needed a few things to happen here:
- Backend servers should be checked, and if offline, be taken out of the rotation
- SSL should not terminate at the proxy; instead it should be passed through to the backend server (and terminate there). This removes the need for `haproxy` to hold its own certificate, and allows me to continue to use the automatic certificate renewal on a domain controller
- Connections should be prioritized at one physical site, but fail over to our other physical site. These are in separate IP address spaces
Some learnings along the way:
- I struggled quite a bit with the SSL piece, and in the end more or less gave up on validating SSL checks - I opted to add the `ssl-server-verify none` option in the global section. I should revisit this: the traffic between `haproxy` and my domain controllers is still encrypted, but by disabling SSL verification it is open to a MITM attack (though we would be in a BAD place if that happened...)
- Logging needs to go to stdout so that `docker container logs` can read it.
- I took the SSL bind options, ciphersuites and ciphers directly from the comments in the original haproxy.cfg file (specifically https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/). This also should be revisited, as the Mozilla list seems to be more comprehensive.
- Most of the defaults were left alone, though I did remove the error pages and changed the default mode to `tcp`.
- On that thought, `mode tcp` is required to pass the LDAPS right through to the backend server. This has some unintended consequences when running in HTTP mode (missing some headers like `X-Forwarded` and such), but these weren't a concern for this application.
- The `frontend stats` page needs to be `mode http` for somewhat obvious reasons. This generates the basic stats page from within `haproxy` and is enough to see what's going on at a high level. I've left the firewalling of that up to the OS, but some level of authentication should be set up here.
- `frontend ldaps-in` is nothing too remarkable...
- `backend ldaps-servers` is where some of the fun begins...
- The default behaviour is for `haproxy` to round-robin between the servers listed, but I want to prioritize the servers "closest" to `haproxy` - so dc1, dc3 and dc4 are it.
- dc4 doesn't actually exist - that's Google's DNS server - but it provides me an easy way of checking the behaviour when a server is offline or unreachable. This host basically goes offline after 2 seconds of `haproxy` being awake, and isn't used in the connection pool.
- e-dc2 and e-dc3 both have the `backup` tag on them, and my understanding is that the first backup server gets used when all the non-backup servers are offline. This is desirable, as I shouldn't have all my DCs offline within a site. If we've hit that condition, I'm prioritizing getting the site operational, and the services that depend on that authentication can just.. wait.
- I was using `option ldap-check`, but Active Directory does not allow anonymous binds by default - and we prefer to leave it that way. Instead, we can emulate a connection to the domain controller that doesn't do anything. The check was found in a gist, and after some travels, I found the original mailing list post.
```
# haproxy.cfg

global
    log stdout format raw daemon debug
    daemon
    ssl-server-verify none

    # Default ciphers to use on SSL-enabled listening sockets.
    # For more information, see ciphers(1SSL). This list is from:
    # https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
    # An alternative list with additional directives can be obtained from
    # https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=haproxy
    # ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
    # ssl-default-bind-options no-sslv3
    ssl-default-bind-options ssl-min-ver TLSv1.2 prefer-client-ciphers
    ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-bind-ciphers ECDH+AESGCM:ECDH+CHACHA20:ECDH+AES256:ECDH+AES128:!aNULL:!SHA1:!AESCCM

    ssl-default-server-options ssl-min-ver TLSv1.2
    ssl-default-server-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-server-ciphers ECDH+AESGCM:ECDH+CHACHA20:ECDH+AES256:ECDH+AES128:!aNULL:!SHA1:!AESCCM

    tune.ssl.default-dh-param 2048


defaults
    log global
    mode tcp
    option tcplog
    option dontlognull
    timeout connect 1s
    timeout client 20s
    timeout server 20s

frontend stats
    mode http
    option httplog
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if LOCALHOST

frontend ldaps-in
    mode tcp
    option tcplog
    bind *:636
    default_backend ldaps-servers

backend ldaps-servers
    mode tcp

    server dc1 192.168.10.253:636 check
    server dc3 192.168.10.218:636 check
    server dc4 126.96.36.199:636 check
    server e-dc2 192.168.20.213:636 check backup
    server e-dc3 192.168.20.214:636 check backup

#   option ldap-check
    # Below, ldap check procedure :
    option tcp-check
    tcp-check connect port 636 ssl
    tcp-check send-binary 300c0201 # LDAP bind request "<ROOT>" simple
    tcp-check send-binary 01 # message ID
    tcp-check send-binary 6007 # protocol Op
    tcp-check send-binary 0201 # bind request
    tcp-check send-binary 03 # LDAP v3
    tcp-check send-binary 04008000 # name, simple authentication
    tcp-check expect binary 0a0100 # bind response + result code: success
    tcp-check send-binary 30050201034200 # unbind request
```
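Those `send-binary` hex chunks are just a BER-encoded anonymous LDAPv3 simple bind, split up so each piece can carry a comment. As a sanity check, here's a small sketch (plain Python, no LDAP library) that rebuilds the same bytes from the BER structure:

```python
# Rebuild the anonymous LDAPv3 simple-bind request that the tcp-check sends,
# to show what each hex chunk in the haproxy config actually encodes.
def ber(tag: int, payload: bytes) -> bytes:
    # Minimal BER TLV encoder (short-form length only - fine for tiny messages)
    return bytes([tag, len(payload)]) + payload

message_id = ber(0x02, b"\x01")                # INTEGER 1  (message ID)
version    = ber(0x02, b"\x03")                # INTEGER 3  (LDAP v3)
name       = ber(0x04, b"")                    # OCTET STRING ""  (anonymous bind DN)
auth       = ber(0x80, b"")                    # [0] simple authentication, empty password
bind_req   = ber(0x60, version + name + auth)  # [APPLICATION 0] BindRequest
ldap_msg   = ber(0x30, message_id + bind_req)  # SEQUENCE LDAPMessage

print(ldap_msg.hex())  # -> 300c020101600702010304008000
```

Concatenating the config's chunks - `300c0201` + `01` + `6007` + `0201` + `03` + `04008000` - yields exactly the same byte string.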
So while the whole thing works, there are a few things yet to fix:
- Fixing some variables in
- Revisiting the SSL connection verification to the domain controllers
- Expanding (and updating) the SSL configuration to be more current
And of course, testing...