tailscale/tailscale

Incorrect TargetPort when user port conflicts with health check port

Summary

  • Context: The provision() function in cmd/k8s-operator/egress-services.go manages port mappings between ClusterIP Services and proxy containers for egress traffic routing.

  • Bug: When a user specifies a Service port that matches the port number of a previously-configured health check port, the function reuses the old health check port configuration instead of allocating a new container port.

  • Actual vs. expected: The user’s port retains TargetPort: 9002 (pointing to the health check endpoint) instead of being allocated a unique container port in the range [10000, 11000).

  • Impact: Cluster traffic sent to the user’s port is routed to the health check endpoint instead of being proxied to the tailnet target, breaking the egress service functionality for that port.

Code with bug

The port cleanup loop in provision() matches ports using only Port and Protocol fields, without considering whether the matched port is a health check port that requires special handling:

for i := len(clusterIPSvc.Spec.Ports) - 1; i >= 0; i-- {
    pm := clusterIPSvc.Spec.Ports[i]
    found := false

    for _, wantsPM := range svc.Spec.Ports {
        if wantsPM.Port == pm.Port &&
            strings.EqualFold(string(wantsPM.Protocol), string(pm.Protocol)) {

            // Updates name but doesn't update TargetPort
            // <-- BUG 🔴 Health check ports should be removed, not renamed
            if wantsPM.Name != "" {
                clusterIPSvc.Spec.Ports[i].Name = wantsPM.Name
            } else {
                clusterIPSvc.Spec.Ports[i].Name = "tailscale-unnamed"
            }

            found = true
            break
        }
    }

    if !found {
        clusterIPSvc.Spec.Ports = slices.Delete(
            clusterIPSvc.Spec.Ports,
            i,
            i+1,
        )
    }
}

When a user’s port matches an existing health check port (default: 9002), the loop keeps the old port entry and updates only its Name field from "tailscale-health-check" to the user’s specified name. The TargetPort remains set to 9002 instead of being allocated a new container port.
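The behavior described above can be reproduced in isolation. The following is a minimal sketch, not the operator's code: the port struct is a hypothetical, simplified stand-in for corev1.ServicePort, cleanupBuggy is a hypothetical helper that mirrors only the matching logic quoted above, and the target port 10000 for port 80 is an illustrative value.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// port is a hypothetical, simplified stand-in for corev1.ServicePort.
type port struct {
	Name       string
	Port       int32
	Protocol   string
	TargetPort int32
}

// cleanupBuggy mirrors the matching logic quoted above: entries are matched
// on Port and Protocol only, and a matched entry keeps its TargetPort.
func cleanupBuggy(existing, wanted []port) []port {
	for i := len(existing) - 1; i >= 0; i-- {
		found := false
		for _, w := range wanted {
			if w.Port == existing[i].Port &&
				strings.EqualFold(w.Protocol, existing[i].Protocol) {
				// Only the name changes; TargetPort is left untouched.
				if w.Name != "" {
					existing[i].Name = w.Name
				} else {
					existing[i].Name = "tailscale-unnamed"
				}
				found = true
				break
			}
		}
		if !found {
			existing = slices.Delete(existing, i, i+1)
		}
	}
	return existing
}

func main() {
	// State after a first reconcile (illustrative values).
	existing := []port{
		{Name: "http", Port: 80, Protocol: "TCP", TargetPort: 10000},
		{Name: "tailscale-health-check", Port: 9002, Protocol: "TCP", TargetPort: 9002},
	}
	// The user now also asks for port 9002.
	wanted := []port{
		{Name: "http", Port: 80, Protocol: "TCP"},
		{Name: "custom", Port: 9002, Protocol: "TCP"},
	}
	for _, p := range cleanupBuggy(existing, wanted) {
		fmt.Printf("%s port=%d targetPort=%d\n", p.Name, p.Port, p.TargetPort)
	}
}
```

In this sketch the former health check entry survives under the user's name ("custom") while still targeting 9002, which is the symptom the report describes.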

Example

A test reproducing the bug shows the incorrect behavior when a user adds port 9002 after an initial reconcile:

First reconcile (user has port 80):


Second reconcile (user adds port 9002):

Input ExternalName Service specifies:


Expected ClusterIP Service result:


Actual ClusterIP Service result:


The user’s port 9002 has TargetPort: 9002, which points to the health check endpoint. Cluster traffic sent to port 9002 will reach the health check instead of being proxied to the intended tailnet target.

Recommended fix

The cleanup loop should check whether a matched port is a health check port (identifiable by Name: "tailscale-health-check") and remove it instead of keeping it. This ensures user ports receive fresh container port allocations:

for i := len(clusterIPSvc.Spec.Ports) - 1; i >= 0; i-- {
    pm := clusterIPSvc.Spec.Ports[i]
    found := false

    for _, wantsPM := range svc.Spec.Ports {
        if wantsPM.Port == pm.Port &&
            strings.EqualFold(string(wantsPM.Protocol), string(pm.Protocol)) {

            // Check if this is a health check port that should be removed
            if pm.Name == "tailscale-health-check" { // <-- FIX 🟢 Remove health check ports that conflict
                break // Don't mark as found, will be deleted below
            }

            if wantsPM.Name != "" {
                clusterIPSvc.Spec.Ports[i].Name = wantsPM.Name
            } else {
                clusterIPSvc.Spec.Ports[i].Name = "tailscale-unnamed"
            }

            found = true
            break
        }
    }

    if !found {
        clusterIPSvc.Spec.Ports = slices.Delete(clusterIPSvc.Spec.Ports, i, i+1)
    }
}

With this fix, when a user specifies port 9002, the old health check port would be removed, and the subsequent port allocation logic would assign a proper container port (e.g., 10001) as the TargetPort for the user’s port.
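The effect of the proposed check can be sketched end to end under some loud assumptions: port is a hypothetical, simplified stand-in for corev1.ServicePort, cleanupFixed mirrors only the loop shown above, and allocate is a hypothetical stand-in for the operator's allocation step that simply picks the lowest unused target port in [10000, 11000). The real allocation strategy may differ, and the sketch does not model how the health check port itself is added back.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

const healthCheckName = "tailscale-health-check"

// port is a hypothetical, simplified stand-in for corev1.ServicePort.
type port struct {
	Name       string
	Port       int32
	Protocol   string
	TargetPort int32
}

func samePort(a, b port) bool {
	return a.Port == b.Port && strings.EqualFold(a.Protocol, b.Protocol)
}

// cleanupFixed mirrors the proposed loop: a matched entry that is the health
// check port is not marked as found, so it is deleted.
func cleanupFixed(existing, wanted []port) []port {
	for i := len(existing) - 1; i >= 0; i-- {
		found := false
		for _, w := range wanted {
			if samePort(w, existing[i]) {
				if existing[i].Name == healthCheckName {
					break // leave found == false so the entry is removed
				}
				if w.Name != "" {
					existing[i].Name = w.Name
				} else {
					existing[i].Name = "tailscale-unnamed"
				}
				found = true
				break
			}
		}
		if !found {
			existing = slices.Delete(existing, i, i+1)
		}
	}
	return existing
}

// allocate is a hypothetical allocation step: each wanted port that has no
// entry yet gets the lowest unused target port in [10000, 11000).
func allocate(existing, wanted []port) []port {
	used := map[int32]bool{}
	for _, p := range existing {
		used[p.TargetPort] = true
	}
	for _, w := range wanted {
		if slices.ContainsFunc(existing, func(p port) bool { return samePort(p, w) }) {
			continue
		}
		for tp := int32(10000); tp < 11000; tp++ {
			if !used[tp] {
				used[tp] = true
				w.TargetPort = tp
				existing = append(existing, w)
				break
			}
		}
	}
	return existing
}

func main() {
	existing := []port{
		{Name: "http", Port: 80, Protocol: "TCP", TargetPort: 10000},
		{Name: healthCheckName, Port: 9002, Protocol: "TCP", TargetPort: 9002},
	}
	wanted := []port{
		{Name: "http", Port: 80, Protocol: "TCP"},
		{Name: "custom", Port: 9002, Protocol: "TCP"},
	}
	for _, p := range allocate(cleanupFixed(existing, wanted), wanted) {
		fmt.Printf("%s port=%d targetPort=%d\n", p.Name, p.Port, p.TargetPort)
	}
}
```

Under these assumptions the user's port 9002 ends up with target port 10001 rather than 9002, because the stale health check entry is gone before allocation runs.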