[3] On clean slate, adoption and VPC peering

There are few things that an infrastructure engineer loves more than getting the chance to build infra from scratch when starting a new job. Not having to deal with legacy code, historical reasons™, known unknowns… However, reality is a harsh mistress, so getting a clean slate is rarely an option. An alternative, still very good in my book, is being able to ignore what is currently running, create a new, cleaner, better™ setup in parallel, and then move things over one at a time.

I expect neither of these when I start a new job.

I got adopted

With this in mind, I started a new adventure last November. I decided to join ArK Kapital, which was creating a lot of buzz in the Nordics and had managed to bring together some really impressive names from the Stockholm IT scene.

I was not expecting a greenfield opportunity because the company had been operating for a while and was about to launch the second version of its cool financial insights platform called AiM. Yet, the gods of infrastructure were in a good mood that month, so I was given a chance to build new things in parallel, test them properly and then phase out the old setup. Say no more!

I was also lucky to be kind of adopted by the backend team, since I was the first full-time infrastructure engineer in the company, a team in the making if you will. The backend team had a deep understanding of what was currently running and what would make it better and ready for the future ahead of us. We took a step back and looked at the big picture. We didn’t like what we saw. It was good for what it was supposed to do up until 2022, but it was not ready for the big aspirations ArK had.

We knew that we needed to be ready for growth, we just didn’t know how big that growth would be, or how fast. We knew that new markets around the world would start opening for us, but what that would mean for AiM and its underlying infrastructure was in the domain of the “known unknowns”. With that in mind we knew that we wanted to move away from GCP Cloud Run and start using Kubernetes. I, for one, wanted my developers to be able to work better and faster with their applications. In the long run I wanted to allow my developers to have their own namespace in a cluster where they would be able to mess around with the whole service setup without interrupting anyone. Of course, I also wanted to not burn a ton of money along the way. ArK’s management left me with the impression that they really think through every krona the company spends. Getting a budget for fancy tools is not a problem, but do you really need it? I certainly like that approach to money management. So, make it fast, scalable, light, maintainable, developer-friendly and not too expensive? Challenge accepted!

I have been doing this work for ~20 years, and I’d like to think that I know what I’m doing. One of the things keeping me sane all these years was contemplating the work that I’ve done. I constantly keep asking myself “how will this thing that I’m doing today come to kick me in the rear end in 6 months or a year?”. It is a good way to stop oneself from becoming too cocky about what you’re doing, because, let’s face it, it will come back, and it will kick you; the only question is how painful it will be. So, even though I was hired to be a “senior infrastructure engineer”, that does not mean that I have all the answers in my back pocket. Having someone to bounce ideas and thoughts off of was of the utmost importance. So being adopted by the backend team was what made my work so much easier. It took us ~2 months of super fun but super hard work to get to a setup that ticks all the checkboxes.

The purpose of this blog post is not to deep-dive into every single thing we’ve done but to address one particular problem we hit early on. I might write about other solutions and decisions we’ve made in the future, but what I’d like to address today was probably the most annoying issue we faced.

Ingen fara på taket

In my time as an infrastructure engineer (sysadmin, DevOps, SRE and all the other names popular at the time) I’ve seen startup companies take one of two approaches to building and securing things. One is to keep things open, make every developer an admin in GCP/AWS/Provider-X and then, at some point in the future, when things seemingly start to work, begin closing the infrastructure off so that it becomes secure. I have always put such companies in the category “talk about security first, implement it last”. I’m not saying that it’s not doable, but in the long run it makes things harder. Especially if such a company allows things to be created manually (because it’s faster) and avoids using infra-as-code from the very start. Things can and will get ugly.

Then there are other companies. Those that have a more “shoot first, ask questions later” approach to security. Their road is a bit slower because you really have to think things through and make sure that access is granted only where it’s needed. If possible, not a single permission more than is actually needed, with a clear audit trail.

To my great surprise, ArK was mostly in the second group. Security was taken very seriously early on. There was room for improvement, but the foundations upon which the house was being built were very sane. That made my job much easier, since I didn’t have to spend meetings upon meetings explaining to my CTO why we should use Terraform and not click around the UI, and why people should have the least privilege possible in the system. She had already enforced those ideas before I joined.

The joy of computer networking

One of the early decisions I made was that I wanted completely private Kubernetes clusters. Not just the nodes, but the control plane too. In an ideal world I would have one GCP project acting as a gateway into the other projects, meaning that if I wanted to access Project B I would have to go through Project A, which acted as a VPC peering gateway for private infrastructure. That is what I’ve always done in AWS. However, GCP would not let me do it the way I wanted. You see, GCP does not support transitive VPC peering:

Only directly peered networks can communicate. Transitive peering is not supported. In other words, if VPC network N1 is peered with N2 and N3, but N2 and N3 are not directly connected, VPC network N2 cannot communicate with VPC network N3 over VPC Network Peering.

Now that is a problem. If we were not using Kubernetes this would not be an issue. However, when you set up a GKE cluster some magical things happen. Your nodes run in the VPC you’ve set up yourself, and then Google creates a control plane in a completely different network, owned by them, and automatically creates a VPC peering between your VPC and that other network. That is how many GCP managed services run. In most cases that is invisible to the end user and all you have to worry about is potentially creating firewall rules for those networks. However, when you use private Kubernetes that becomes a problem, because of the quote I’ve added above. In practice this means that if I were to create the earlier-mentioned Project A to access all the other projects through VPC peering, I would not be able to reach the Kubernetes control plane due to this limitation.
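
To make that concrete, here is roughly what creating such a private cluster looks like with gcloud. The cluster, network and subnet names are placeholders (not our actual setup), and the /28 is the range Google carves out for the control plane on its side of that automatic peering:

# Sketch only: a GKE cluster with private nodes and a private control plane endpoint
gcloud container clusters create "our-cool-cluster" \
  --project "our-cool-gcp-project" \
  --region "europe-west1" \
  --network "our-cool-vpc" \
  --subnetwork "our-cool-subnet" \
  --enable-ip-alias \
  --enable-private-nodes \
  --enable-private-endpoint \
  --master-ipv4-cidr "172.16.0.32/28"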

At this point we had identified two problems. Problem number one: how do we access the Kubernetes cluster once it’s running? Problem number two: how will our CI/CD pipeline, which we planned to run in Project A, access the cluster? The obvious answer to me was: VPN. Google’s answer to this problem was to set up network proxies. I didn’t like that one very much, and it was not an option for the CD tool we had decided to use.

I didn’t feel like fully self-hosting a VPN for that, and I didn’t want to trust the office and use a site-to-site VPN, so after some testing we decided to use Tailscale to access infrastructure from anywhere (more on that some other time). It comes with a wonderful ACL system which makes deciding who can access what really easy.

One Tailscale to go, please

The setup of Tailscale was very straightforward. We set up the smallest VM we could find in GCP and ran the following on it:

  1. Get into the instance
gcloud compute ssh --zone "europe-west1-b" "our-cool-instance" --tunnel-through-iap --project "our-cool-gcp-project"

The instance is in a private subnet, so it is not visible on the Internet. However, remember that you need a firewall rule allowing SSH from Google’s IAP range (35.235.240.0/20) for the command above to work; see the sketch after these steps.

  2. Update the instance and set up IP forwarding
sudo apt update && sudo apt upgrade
echo 'net.ipv4.ip_forward = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
echo 'net.ipv6.conf.all.forwarding = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
sudo sysctl -p /etc/sysctl.d/99-tailscale.conf
  3. Install Tailscale
curl -fsSL https://tailscale.com/install.sh | sh
  4. Run tailscale with a list of subnets that you want accessible through the VPN
sudo tailscale up --advertise-routes=10.100.10.0/24,172.16.0.32/28
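
As promised above, here is a minimal sketch of the firewall rule that allows IAP-tunneled SSH to the VM; the VPC name and the ssh-via-iap target tag are made up for illustration:

# Sketch only: allow SSH to tagged instances from Google's IAP range
gcloud compute firewall-rules create allow-ssh-from-iap \
  --project "our-cool-gcp-project" \
  --network "our-cool-vpc" \
  --direction INGRESS \
  --action ALLOW \
  --rules tcp:22 \
  --source-ranges 35.235.240.0/20 \
  --target-tags ssh-via-iap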

The two IP ranges passed to --advertise-routes are a private range from our VPC, 10.100.10.0/24, and the range used for the GKE control plane, 172.16.0.32/28. It is with the latter network that Google creates that magical VPC peering I’ve mentioned before. Once you run the tailscale up command you will be presented with a URL that opens your Tailscale dashboard, where you need to approve this machine and then approve the advertised routes.

And that’s it. If you are on the Tailscale network you will now be able to run kubectl against the GKE control plane, as you can reach the 172.16.0.32/28 network. With Tailscale access controls you can limit which clients have access to that IP range and port 443 (the GKE API). Even though all employees need to be authenticated with gcloud to be able to do anything, why trust them to even reach the API, right?
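
As a quick sanity check (cluster and project names are placeholders again), you can fetch credentials for the private endpoint and talk to the API over the tailnet; the --internal-ip flag tells gcloud to use the cluster’s private control plane address:

gcloud container clusters get-credentials "our-cool-cluster" --region "europe-west1" --project "our-cool-gcp-project" --internal-ip
kubectl get nodes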

Jason and the Argonauts

I have always loved how the Kubernetes community plays with nautical terminology for tools developed around and for Kubernetes, starting with the name of the tool itself. I am also a big fan of Greek mythology. Whether people are aware of it or not, Ancient Greek culture has influenced our civilization in many ways. And it continues to do so. This, however, didn’t affect my decision to use ArgoCD as the CD tool at ArK. OK, it did, just a little. The fact is that ArgoCD (and the other tools the project is building) is awesome in many ways and I really enjoy using it.

Sadly, ArgoCD was not built with fully private clusters in mind. You will find a lot of workarounds people have come up with for this. ArgoCD doesn’t properly support proxies, so Google’s suggestion for private clusters would not work in our case. And I really didn’t like the proxy idea anyway, so that was off the table. So, what options did we have?

The most obvious one is to run ArgoCD in each cluster we have. I didn’t like that at all. If you have only one project/cluster, sure, but we have more than one. I also like allowing my developers access to it, and I like having GitHub trigger webhooks to deploy changes. Perhaps I wanted too much? Initially we went with multiple ArgoCD installations, but I couldn’t let it go. I continued contemplating the subject. So I started thinking from ArgoCD’s point of view, as if I were in its place. What do I need to access other clusters? I need to be able to reach a Kubernetes API that is on a network I can’t VPC peer into. What if I were a VPN client? You see where I’m going with this?

I set up a new ArgoCD in Project A and, without granting it any access privileges, added one of the private clusters to it. Things failed, of course, but I needed to understand which of ArgoCD’s various components actually needed that access at all. It came down to just two: argocd-server and argocd-application-controller.

Knowing that Tailscale works in Kubernetes, I went on to read the ArgoCD Helm chart in detail to understand if I could add a Tailscale sidecar to those two services, making them part of my VPN network and allowing them to access the API of the clusters they need to reach. Once I understood what the ArgoCD Helm chart allows me to do (thank you, ArgoCD people!), all I had to do was update my values.yaml with the following:

  controller:
    extraContainers:
      - name: ts-sidecar
        imagePullPolicy: Always
        image: "ghcr.io/tailscale/tailscale:v1.36.2"
        env:
          # Store the state in a k8s secret
          - name: TS_KUBE_SECRET
            value: tailscale-application-controller
          # Use kernel networking instead of userspace mode (needs NET_ADMIN below)
          - name: TS_USERSPACE
            value: "false"
          # Accept the subnet routes advertised by our relay VM,
          # including the GKE control plane range
          - name: TS_EXTRA_ARGS
            value: "--accept-routes"
        securityContext:
          capabilities:
            add:
              - NET_ADMIN

  server:
    extraContainers:
      - name: ts-sidecar
        imagePullPolicy: Always
        image: "ghcr.io/tailscale/tailscale:v1.36.2"
        env:
          # Store the state in a k8s secret
          - name: TS_KUBE_SECRET
            value: tailscale-server
          - name: TS_USERSPACE
            value: "false"
          - name: TS_EXTRA_ARGS
            value: "--accept-routes"
        securityContext:
          capabilities:
            add:
              - NET_ADMIN

ArgoCD could now reach the GKE API IP range and I could add the clusters using the argocd CLI tool. I could have a single installation of ArgoCD, get all the features I wanted and keep my setup completely locked down. It was a good day.
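
Registering a private cluster then looks like any other cluster add with the argocd CLI, run from a machine whose kubeconfig context points at the private endpoint. The server hostname and context name below are made-up examples, not our actual names:

# Log in to the ArgoCD instance running in Project A
argocd login argocd.internal.example.com

# Register the private cluster via the context created by
# gcloud container clusters get-credentials ... --internal-ip
argocd cluster add gke_our-cool-gcp-project_europe-west1_our-cool-cluster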

Die unendliche Geschichte

Working as an infrastructure engineer is a never-ending story. The setup we’ve created at ArK is pretty nice, but it’s far from over for us. There are more things to tweak, more privileges to be removed and more things to be locked down. Yet, the journey so far has been very fun. If there is one thing I will take from this little essay, it is that it takes two to break the silos in a company. After this initial work I went on to work very closely with other teams and, in the process, broke the silos that might have existed between infra and the other teams, setting the foundation for any new infra engineers who join ArK.

Now I will step back and contemplate the work that we’ve done, trying to figure out all the ways this setup will come to kick me in the rear end in 6 months or a year.