Tinkering With My Home Network

I am not very good at networking stuff together. I mean, if it's a simple network, I can usually manage. When it comes to thinking about VLANs and similar topics, I start to get a bit hazy.. but when it gets into configuring devices to carry VLAN traffic, I really need to get back to basics and do a lot of "Ok, make sure X, Y and Z work here. Done.. now, make sure X, Y and Z work there.." and so on.

Basic troubleshooting stuff.

However, because this isn't my bread and butter and I'm just not completely awesome at it (and it doesn't help that the different Ubiquiti products I have in my home all have a different CLI ..), I've needed a large-ish change window to make some adjustments.

The holidays provide me that window.

Problem to solve: I want to run VMs on dedicated hardware connected to my work's IPSec VPN

When COVID hit, I bought, installed and configured an EdgeRouter X - a lovely little (literally) network router that comes with the healthy feature set of a full-on router. I wired that up to my home network, forwarded some ports, plugged my computer in .. and I was effectively on my work's network at home - without the need to run a VPN client.

This was helpful, especially when I needed to service end-user computers. I could more or less do anything I needed to do as if I were in the office, including new builds using MDT and using other pre-boot (but network-enabled) tools. It proved to be hugely helpful, as I could just have stuff shipped to my home, do work, then ship it back to an end-user's house when I was done.

The design wasn't ideal (spoiler: it still isn't) - this network was behind two NATs, and all the services that I wanted to run needed to be in my office space, which wasn't exactly huge. DNS was cumbersome, as I needed access to some on-prem stuff, but needed stuff to route over the IPSec wire.

Original Network Diagram

Things have changed, and now that I'm in a hybrid work environment, the need to ship computers around isn't there as much. The need to run additional services (test computers, containers, infrastructure proof of concepts, etc.) has replaced that previous need though.

Step 0: Stand up VM Infrastructure

I've been wanting to have a small on-premise VM cluster for a bit. I'd like to tinker with Kubernetes at some point, and having the ability to snapshot a host and revert it is really, really nice.

I wound up setting up 2 computers in a Proxmox cluster. I'm not going to go into details here, because configuring and setting them up was entirely unremarkable.. it's generally worked well out of the box, and other than an issue with Ceph that I need to solve, it's been just swell.

Step 1: Configure a new VLAN on the Unifi equipment, connect to IPSec.

The first couple of steps here are pretty straightforward. Create a new network, let it hand out IP addresses. Set up Wifi on the network.

The IPSec termination turned out to be a bigger issue. There's evidently some issue with how strongSwan (the IPSec software in Unifi) interacts with policy-based VPNs with multiple SAs (see https://community.ui.com/questions/Problems-with-site-to-site-tunnel-after-replacing-USG-Pro-with-UXG-Pro/86f68d36-dc3a-488a-a718-fd847c56b838#answer/d59aab80-2496-4fac-8f11-29d152f25751 for a more in-depth conversation).

So, when I would connect the VPN, I would see phase 1 of the VPN come up, but I would get intermittent results with phase 2 - one network would be routable, but the other network would not. If I removed the working network (to see if I could get the other network working), the first network would still get connected - but not actually be routable.

This wasn't going to work out the way I wanted to, so on to a plan B.

Step 1b: Change up the configuration - make a separate network

A lot of people post about pfSense as a network router / firewall, so I wondered if this might be an option for me.

With this in mind, I change the network I defined in Unifi so that it's defined as a "VLAN Only" network.

Defining a VLAN Only network

Ensuring that Proxmox is VLAN aware is as trivial as a checkbox (well, one per node, but it's really that easy..):

Enabling VLAN on each Proxmox node

pfSense gets installed as a Proxmox guest. It carries two network interfaces - one for the WAN, one for the LAN. Its WAN IP is an address on my internal network (in the 192.168.b.0/24 subnet), and the LAN network is the one work is routing to me, sitting on the same VLAN defined earlier. I'll set IPSec forwarding to the WAN IP address of pfSense. pfSense will handle the DHCP, DNS and other basic network functionality that network will need:

pfSense proxmox hardware configuration

pfSense WAN and LAN IPs, as shown on the console

Wifi is still handled by Unifi, and it's as easy as creating a wireless network and assigning it to the newly created VLAN-only network.

Turning on a container VM (oh I love these things.. so lightweight) with its only NIC in the VLAN gives me the IP I expect and some semblance of a DNS configuration (that'll need tweaking later..), and I can prove that traffic goes from the VLAN to pfSense to Unifi and then out of the network.

Time for IPSec.

Step 1c: IPSec for the VLAN

The IPSec configuration on pfSense suffered a similar issue to the one on Unifi, except in this case, I'm presented with considerably more configuration options to try to make it go.

In the end, the winning option seemed to be this little checkbox here:

Split Connections

Upon connection, I see multiple Phase 1 connections, each with their own SA (Phase 2 information). This.. mostly? works. Every now and then, one tunnel doesn't seem to come up, but a service restart seems to resolve this - at least for a period of time.

pfSense's documentation (https://docs.netgate.com/pfsense/en/latest/vpn/ipsec/configure-p1.html) identifies this as a scenario that applies to me:

(IKEv2 Only) By default when an IKEv2 tunnel has multiple phase 2 definitions the settings are collapsed in the IPsec configuration such that all phase 2 combinations are held in a single child SA.

Split Connections changes this behavior to be more like IKEv1 where each phase 2 entry is configured by the daemon as its own separate child SA.

Certain scenarios require this behavior, such as:

  • The remote peer does not properly handle multiple addresses in single traffic selectors. This is especially common in Cisco, Checkpoint, Fortinet, and Juniper equipment.

  • Each child SA must have unique traffic selector or proposal settings. This could be due to the peer only allowing specific combinations of local/remote subnet pairs or different encryption options for each child SA.
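
To make the difference concrete, here's a rough Python sketch of how phase 2 entries end up grouped into child SAs with and without Split Connections. It only models the behaviour the docs describe above; it isn't what pfSense or strongSwan actually run, and the Phase2 class, the child_sas function and the subnets are all made-up placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase2:
    local: str   # local subnet (placeholder values below)
    remote: str  # remote subnet (placeholder values below)


def child_sas(phase2_entries, split_connections):
    """Return a list of child SAs, each a (local_selectors, remote_selectors) pair."""
    if split_connections:
        # One child SA per phase 2 entry, IKEv1-style.
        return [((p.local,), (p.remote,)) for p in phase2_entries]
    # Default IKEv2 behaviour: collapse everything into a single child SA
    # whose traffic selectors list every local and remote subnet once.
    local_ts = tuple(dict.fromkeys(p.local for p in phase2_entries))
    remote_ts = tuple(dict.fromkeys(p.remote for p in phase2_entries))
    return [(local_ts, remote_ts)]


entries = [
    Phase2("10.21.0.0/24", "10.100.0.0/16"),
    Phase2("10.21.0.0/24", "10.200.0.0/16"),
]
collapsed = child_sas(entries, split_connections=False)
split = child_sas(entries, split_connections=True)
print(len(collapsed), len(split))  # prints: 1 2
```

A peer that can't cope with several subnets in one traffic selector would choke on the single collapsed SA, which lines up with only one of my two networks being routable.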

The long-term fix is that I would need to create a virtual interface on both ends of the tunnel, and set up routing rules. This is likely doable, but requires some additional work on the work end of things, something that I'm specifically trying to avoid.

Step 2: Switching!

This soaked up a lot more time than I expected.

The layout of my house, and the positions of equipment, means a few switches in weird places in the basement, and a haul up to the Cloffice (closet + office) that ends at the EdgeRouter X. So.. how hard can this be? I'll trunk the default and VLAN 21 around, and boom, we're done!

Well, after losing access to the switches.. and a factory reset (or two) later.. it was time to take a step back.

Thinking about layer 1, connectivity looks a bit like this:

Layer 1 map

Connectivity between the Telus router and the Unifi is simple - no VLANs or special routing there. The Unifi is a client of the Telus router (and the Telus router simply sends everything to Unifi).

Since all the ports on the Unifi are occupied (and it's in an inconvenient place to physically access), I'm not concerned about having static access to VLAN 21 at the Unifi.

Plugging my computer into the EdgeSwitch, and ensuring I can get both VLAN 21 and the default VLAN, will be important.

The trick with the EdgeSwitch was configuring the uplink port (in my case, port 1) as a trunk port, and the port that goes to the EdgeRouter as:

  • Default VLAN as "untagged". This becomes the default VLAN if none is defined by the connecting client.
  • VLAN 21 as "tagged". This is available to the connecting client, provided it is VLAN aware.

EdgeSwitch Switch Configuration

Once I'm happy that the EdgeSwitch is handling traffic the way I'm wanting it to, it's on to the EdgeRouter.

I ran the EdgeRouter through the "Switch" wizard. I allow the management IP to be picked up from DHCP (which I'll make static later), enable VLAN awareness, and let 'er rip.

Once it's up, there are a few steps to configure:

  • From the Dashboard, 'Add Interface', and create a VLAN. ID is 21, and the interface is switch0.
  • Once created, I need to reconfigure the physical switch0 (not the newly created switch0.21).
    • eth0 needs to be aware of what VLANs it carries, so these need to go in the vid field. Leave PVID alone - this allows communication on the default VLAN.
    • eth1 (and other ports that'll be connected into VLAN21) get the VLAN set in the pvid field - this sets the Port's VLAN ID to 21.
    • Any interface that is on the default VLAN, but can be tagged by the client, basically needs to mirror the configuration for eth0 - set the vid value to whatever VLAN.

EdgeRouter Switch configuration
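
If the pvid / vid distinction is as slippery for you as it was for me, this little Python model might help. It's a simplified sketch of how a VLAN-aware port sorts incoming frames, not EdgeOS code: the Port class and ingress_vlan function are invented for illustration, and I'm assuming the default VLAN is VLAN 1.

```python
from dataclasses import dataclass, field
from typing import Optional

DEFAULT_VLAN = 1  # assumption: the untagged "default" network is VLAN 1


@dataclass
class Port:
    name: str
    pvid: int = DEFAULT_VLAN               # VLAN given to untagged frames
    vid: set = field(default_factory=set)  # tagged VLANs the port carries


def ingress_vlan(port: Port, tag: Optional[int]) -> Optional[int]:
    """Return the VLAN a frame lands in, or None if this model drops it."""
    if tag is None:
        return port.pvid  # untagged traffic joins the port's PVID
    if tag in port.vid:
        return tag        # tagged traffic has to be listed in vid
    return None


eth0 = Port("eth0", vid={21})  # uplink: default VLAN untagged, VLAN 21 tagged
eth1 = Port("eth1", pvid=21)   # access port: everything plugged in is VLAN 21

print(ingress_vlan(eth0, None), ingress_vlan(eth0, 21), ingress_vlan(eth1, None))
# prints: 1 21 21
```

This is the same idea as "untagged" and "tagged" on the EdgeSwitch: pvid is the untagged VLAN for the port, and vid lists what a VLAN-aware client is allowed to tag.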

Step 3: Cleanup

Finally, I'll set some DHCP assignments from Unifi - so that the VLAN 21 router, EdgeSwitch and EdgeRouter occupy IP space that isn't in the DHCP pool.
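
A quick sanity check for this can be sketched in Python with the standard ipaddress module. The pool range and device addresses below are invented for the example (they are not my real addressing), and outside_pool is just a helper name I made up.

```python
import ipaddress


def outside_pool(address, pool_start, pool_end):
    """True if address is NOT within the inclusive DHCP range."""
    ip = ipaddress.ip_address(address)
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    return not (start <= ip <= end)


# Hypothetical values, for illustration only.
POOL = ("192.168.1.100", "192.168.1.200")
STATIC = {
    "vlan21-router": "192.168.1.10",
    "edgeswitch": "192.168.1.11",
    "edgerouter": "192.168.1.12",
}

# Any device listed here has a fixed address that collides with the pool.
clashes = [name for name, ip in STATIC.items() if not outside_pool(ip, *POOL)]
print(clashes)  # prints: []
```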

Exporting working configs from all the network devices will keep me from having to remember that I posted this blurb to my website if I'm ever left in a position to rebuild it later...

Step: whatever

There are some final issues here that need resolving:

  • It would be great if the VPN tunnel could just terminate at the Unifi. It would eliminate the need for the pfSense VM, and simplify some configuration, especially the port forwarding bits. The VPN endpoint being behind two NATs still isn't ideal, and I think you could probably find a holy war on Slashdot or some other forum debating the success of this implementation.
  • The VPN tunnel doesn't survive the expiration of the Life Time value. Every 24 hours, the tunnel drops and must be manually restarted. There are some further settings that can be flipped to see if this behaviour can be resolved, but the end result may be a small shell script to restart the service if we can determine the tunnel is down.
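
I said shell script, but the idea is easier to sketch (and test) in Python. This is a minimal sketch, not something I've deployed: tunnel_is_up and watchdog_once are names I made up, the probe host is a placeholder, and the restart step is deliberately left as a callable you supply, since I haven't settled on the right command to restart IPSec on pfSense.

```python
import subprocess
import time


def tunnel_is_up(probe_host, ping=None):
    """Return True if a host on the far side of the tunnel answers one ping."""
    if ping is None:
        def ping(host):
            # -c 1 sends a single echo request; exit status 0 means it was answered.
            result = subprocess.run(
                ["ping", "-c", "1", host],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            return result.returncode == 0
    return ping(probe_host)


def watchdog_once(probe_host, restart, ping=None, attempts=3, delay=5.0):
    """Probe a few times; call restart() only if every probe fails.

    Returns True if a restart was triggered, False if the tunnel answered.
    """
    for attempt in range(attempts):
        if tunnel_is_up(probe_host, ping):
            return False
        if attempt < attempts - 1:
            time.sleep(delay)  # give a flapping tunnel a moment to recover
    restart()
    return True
```

Run from cron every few minutes, with probe_host set to something that only answers over the tunnel, this would at least take the manual restart out of the daily routine. Requiring several failed probes keeps a single dropped ping from bouncing a working tunnel.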