HAProxy for Active Directory LDAPS

One more issue to solve...

We have one domain controller at work that is kinda critical to the entire operation. I mean, it's not like this particular server is the only AD server on the site, but it somehow became the only LDAPS server responsible for authentication for non-AD-enabled services.

Like VPN. Like our ERP.

So when it's down, it's a bit of a problem. We tend to avoid taking it down during operational hours, but nothing is 100% infallible. Plus, between trying to be more hands-off with hosts (I prefer cattle to pets) and more and more Active Directory-specific CVEs coming out, this one machine is being a bit of a pain in the ass.

How to make one service more reliable

Since all the DCs in a site (well, all in the organization) serve the same directory, the thought crosses the mind to swap in a proxy host at the critical server's IP address, move the critical host to a new IP address, then configure the proxy to talk to whatever host it can connect to.

I've been aware of HAProxy as a means to proxy traffic to multiple backend servers, but we've never really had an application to use it on, until this project.

A better way

Honestly, a better way would be to configure the appliances that are doing the LDAPS lookups to be HA themselves, but we can't all be winners. Some appliances don't support it, other appliances have some weird configuration gotchas...

And in other cases, inter-company office politics make the change too hard.

Getting started, but with ANSIBLE!

I've been trying to get better with Ansible, so today I'm starting my configuration from a blank OS and a working ansible controller node.

First step, inventory.

Adding to inventory is easy enough - just add the host. I ran into a small issue where DNS resolution wasn't working for the host yet (DNS replication delays), but that's fixed by setting ansible_host to the corresponding IP address.
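For reference, the inventory entry winds up looking something like this (the hostname and IP address here are made up):

# inventory - hypothetical host; ansible_host sidesteps the DNS lookup
[haproxy]
haproxy01.example.com ansible_host=192.168.10.50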

Second step, getting connected

I wanted to figure out how to get the server managed without fiddling a lot with the host first. Normally I would connect to the host, install my own SSH keys, then connect from ansible, but I think I can get ansible to do this for me.

This wound up being a matter of setting a bunch of extra variables on the command line:

ansible-playbook -l haproxy all-host-configure.yml --extra-vars "ansible_user=timatlee ansible_password=XXXXX ansible_sudo_pass=XXXXX"

I needed to install sshpass on the controller node, but otherwise - I was good to go.

Second + n steps, writing the playbook

My last few playbooks have been structured to simply call a role, with all the heavy lifting going into that role. I feel like this makes it easier to declare dependencies on other roles or collections, set variables at a larger scope, and so on. So really, I wind up with a playbook that reads something like:

# haproxy.yml
---
- name: HAProxy configuration
  hosts: haproxy
  become: true

  roles:
    - haproxy

The hosts value, haproxy, refers to a group within the inventory - the idea being that maybe I want more than one of these hosts.

Writing the role

This took some iteration.

Installing from Debian's repos

Out of the gate, I started with installing haproxy from Debian's repositories. This seemed to work well enough, but I started getting some odd errors connecting to the LDAPS backend, similar to:

[WARNING]  (10) : Health check for server ldap/openldap succeeded, reason: Layer7 check passed, code: 0, info: "Success", check duration: 0ms, status: 3/3 UP.
<133>Sep 13 19:39:29 haproxy[10]: Health check for server ldap/openldap succeeded, reason: Layer7 check passed, code: 0, info: "Success", check duration: 0ms, status: 3/3 UP.
[WARNING]  (10) : Health check for server ldap/ad-ldap failed, reason: Layer7 invalid response, info: "Not LDAPv3 protocol", check duration: 0ms, status: 0/2 DOWN.
[WARNING]  (10) : Server ldap/ad-ldap is DOWN. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
<133>Sep 13 19:39:29 haproxy[10]: Health check for server ldap/ad-ldap failed, reason: Layer7 invalid response, info: "Not LDAPv3 protocol", check duration: 0ms, status: 0/2 DOWN.
<129>Sep 13 19:39:29 haproxy[10]: Server ldap/ad-ldap is DOWN. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

These messages were taken from the github issue (at https://github.com/haproxy/haproxy/issues/1390). This led me to a linked issue, which made reference to fixes not being backported into version 1.8 of haproxy... and guess what version comes with the Debian apt repositories...

Installing from Docker

Oh yeah, back to Docker. The role's tasks change a bit:

  • Install Docker
  • Install a docker-compose file
  • Bring it all up.

Easy, right?

Turns out... it kinda is. Mostly. Sort of.

Jeff Geerling has a Docker role for Ansible that did most of what I wanted. The standalone docker-compose binary needed to not be installed so that Python's docker library gets used instead. This is an easy flag to set for the role.
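Pulling that role in looks roughly like this - docker_install_compose is the flag in question, though double-check the role's README for the current variable name:

# Hypothetical snippet, assuming: ansible-galaxy install geerlingguy.docker
- hosts: haproxy
  become: true
  roles:
    - role: geerlingguy.docker
      vars:
        # skip the standalone docker-compose binary
        docker_install_compose: false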

Installing pip

Python's pip also needs to be installed so that we can get Python's docker and docker-compose packages installed. These are required for Ansible's interaction with Docker. I went ahead and got pip from Debian's repositories, but I wonder if there's a more "current" version to get.
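In task form, that's something like the below (a minimal sketch; the task names are mine):

- name: Install pip from Debian's repositories
  ansible.builtin.apt:
    name: python3-pip
    state: present

- name: Install Python's docker and docker-compose libraries
  ansible.builtin.pip:
    name:
      - docker
      - docker-compose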

In any case, installing pip this way solved the problem... so onwards.

Configurations

After the software installation, it's just a matter of creating some directories and copying over some config files for docker and haproxy (sketched as tasks after this list):

  • /usr/local/src/docker-compose-haproxy/docker-compose.yml
  • /etc/haproxy/haproxy.cfg
  • /etc/haproxy/errors/*.http (which I wound up commenting out of the configuration anyway...)
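A minimal sketch of those tasks, assuming the config files live in the role's files directory (task names are mine):

- name: Create the docker-compose and haproxy config directories
  ansible.builtin.file:
    path: "{{ item }}"
    state: directory
    mode: '0755'
  loop:
    - /usr/local/src/docker-compose-haproxy
    - /etc/haproxy

- name: Copy over the docker-compose file
  ansible.builtin.copy:
    src: docker-compose.yml
    dest: /usr/local/src/docker-compose-haproxy/docker-compose.yml
    mode: '0644'

- name: Copy over the haproxy configuration
  ansible.builtin.copy:
    src: haproxy.cfg
    dest: /etc/haproxy/haproxy.cfg
    mode: '0644'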

Docker-compose

The docker-compose file is pretty straightforward, thankfully. HAProxy doesn't seem to need much environment configuration, but there were a few additions:

  • haproxy's stats page is enabled, and I'm exposing that on port 8404. I'm leaving it up to the operating system to firewall it (instead of using something like traefik)
  • I only care about proxying LDAPS traffic. We've moved away from LDAP, so there's no reason to be proxying port 389.
  • I wasn't able to start the container on port 636 without the net.ipv4.ip_unprivileged_port_start=0 sysctl. This is due to ports below 1024 being considered privileged, and by default only bindable by root.

# docker-compose.yml
---
version: '3.3'

services:
  haproxy:
    container_name: haproxy
    restart: always
    image: haproxy
    volumes:
      - '/etc/haproxy:/usr/local/etc/haproxy:ro'
    ports:
      - '636:636'
      - '8404:8404'
    sysctls:
      - net.ipv4.ip_unprivileged_port_start=0
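Bringing it all up from Ansible can go through the community.docker collection - roughly this, assuming the collection is installed and Python's docker-compose library from earlier is present:

- name: Bring up the haproxy container
  community.docker.docker_compose:
    project_src: /usr/local/src/docker-compose-haproxy
    state: present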

HAProxy

This took a bit of work and fiddling around. I needed to have a few things happen here:

  • Backend servers should be checked, and if offline, be taken out of the rotation
  • SSL should not terminate at the proxy; instead it should be passed through to the backend server (and terminate there). This removes the need for haproxy to hold its own certificate, and allows me to keep using the automatic certificate renewal on the domain controllers
  • Connections should be prioritized at one physical site, but failover to our other physical site. These are in separate IP address spaces

Some learnings along the way:

  • I struggled quite a bit with the SSL piece, and in the end more or less gave up on validating SSL in the health checks - and opted to add the ssl-server-verify none option in the global section. I should revisit this: the traffic between haproxy and my domain controllers is still encrypted, but with SSL verification disabled it is open to a MITM attack (though we would be in a BAD place if that happened...)
  • Logging needs to come out to stdout so that docker's container logging can pick it up.
  • I took the SSL bind options, ciphersuites and ciphers directly from the comments in the original haproxy.cfg file (specifically https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/). This also should be revisited, as the Mozilla list seems to be more comprehensive.
  • Most of the defaults were left alone, though I did remove the error pages and changed the mode to tcp.
  • On that thought, mode tcp is required to pass the LDAPS traffic right through to the backend server. This means giving up some of what HTTP mode provides (X-Forwarded-* headers and such), but those weren't a concern for this application.
  • The frontend stats page needs to be mode http for somewhat obvious reasons. This generates the basic stats page from within haproxy and is enough to see what's going on at a high level. I've left the firewalling of that up to the OS, but some level of authentication should be set up here.
  • frontend ldaps-in is nothing too remarkable...
  • backend ldaps-out is where some of the fun begins...
    • The default behaviour is for haproxy to round-robin connect between the servers listed, but I want to prioritize the servers "closest" to haproxy - so dc1, dc3 and dc4 are it.
    • dc4 doesn't actually exist - that's Google's DNS server - but it gives me an easy way of checking the behaviour when a server is offline or can't be connected to. This host basically goes offline after 2 seconds of haproxy being awake, and isn't used in the connection pool.
    • e-dc2 and e-dc3 both have the backup tag on them, and my understanding is that the first backup server gets used when all the non-backup servers are offline. This is desirable, as I shouldn't have all my DCs offline within a site. If we've hit this condition, I'm prioritizing getting the site operational, and the services that depend on that authentication can just.. wait.
    • I was using option ldap-check, but Active Directory does not allow anonymous binds by default - and we prefer to leave it that way. Instead, we can emulate a connection to the domain controller that doesn't do anything. I found the check on a gist, and after some travels, found the original mailing list post it came from.

# haproxy.cfg

global
    log stdout format raw daemon debug
    daemon
    ssl-server-verify none

    # Default ciphers to use on SSL-enabled listening sockets.
    # For more information, see ciphers(1SSL). This list is from:
    #  https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
    # An alternative list with additional directives can be obtained from
    #  https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=haproxy
    # ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
    # ssl-default-bind-options no-sslv3
    ssl-default-bind-options ssl-min-ver TLSv1.2 prefer-client-ciphers
    ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-bind-ciphers ECDH+AESGCM:ECDH+CHACHA20:ECDH+AES256:ECDH+AES128:!aNULL:!SHA1:!AESCCM

    ssl-default-server-options ssl-min-ver TLSv1.2
    ssl-default-server-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-server-ciphers ECDH+AESGCM:ECDH+CHACHA20:ECDH+AES256:ECDH+AES128:!aNULL:!SHA1:!AESCCM

    tune.ssl.default-dh-param 2048

defaults
    log     global
    mode    tcp
    option  tcplog
    option  dontlognull
    timeout connect 1s
    timeout client  20s
    timeout server  20s

frontend stats
    mode http
    option httplog
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if LOCALHOST

frontend ldaps-in
    mode    tcp
    option  tcplog
    bind *:636
    default_backend ldaps-servers

backend ldaps-servers
    mode tcp

    server dc1 192.168.10.253:636 check
    server dc3 192.168.10.218:636 check
    server dc4 8.8.8.8:636 check
    server e-dc2 192.168.20.213:636 check backup
    server e-dc3 192.168.20.214:636 check backup

#    option ldap-check
    # Below, ldap check procedure:
    option                tcp-check
    tcp-check             connect port 636 ssl
    tcp-check             send-binary 300c0201            # LDAP bind request "<ROOT>" simple
    tcp-check             send-binary 01                  # message ID
    tcp-check             send-binary 6007                # protocol Op
    tcp-check             send-binary 0201                # bind request
    tcp-check             send-binary 03                  # LDAP v3
    tcp-check             send-binary 04008000            # name, simple authentication
    tcp-check             expect binary 0a0100            # bind response + result code: success
    tcp-check             send-binary 30050201034200      # unbind request
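With the container running, those health check messages from earlier are visible straight out of Docker, since logging goes to stdout (the container name comes from the compose file above):

docker logs -f haproxy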

Done?

So while the whole thing works, there are a few things yet to fix:

  • Fixing some variables in docker-compose
  • Revisiting the SSL connection verification to the domain controllers
  • Expanding (and updating) the SSL configuration to be more current

And of course, testing...
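For what it's worth, a quick smoke test from any box with ldap-utils looks something like this (the hostname, bind account, and base DN are made up; with SSL passthrough, the certificate presented is the DC's, so it won't match the proxy's hostname unless that name is in the certificate - hence the relaxed certificate check for the test):

LDAPTLS_REQCERT=never ldapsearch -H ldaps://haproxy01.example.com:636 \
    -D 'svc-ldap@example.com' -w 'XXXXX' \
    -b 'dc=example,dc=com' '(sAMAccountName=someuser)'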