Debugging Kubernetes Pods with Netshoot

The kubectl debug command is a powerful tool for troubleshooting Kubernetes pods by either adding an ephemeral container to an existing pod or creating a copy of the pod with debugging modifications. Below is a step-by-step explanation of how it works, using a real-world example.


Real-World Example: Debugging a Crashing Node.js Application

Scenario: A pod named nodejs-app is repeatedly crashing. The logs show a generic error, but the container lacks tools to diagnose further (e.g., curl, netstat, or cat). You need to inspect the file system and network connections.


Option 1: Using an Ephemeral Container

Step 1: Run the kubectl debug Command

kubectl debug pod/nodejs-app -it \
  --image=busybox:latest \
  --target=nodejs-container \
  -- sh
  • Flags:

    • -it: Attach an interactive terminal.

    • --image=busybox: Use a lightweight image with debugging tools.

    • --target=nodejs-container: Run the ephemeral container in the process namespace of nodejs-container. The container runtime must support this.

Step 2: Kubernetes Adds the Ephemeral Container

  1. The Kubernetes API server adds the container to the pod’s spec.ephemeralContainers list through the pod’s ephemeralcontainers subresource. Most other fields of a running pod’s spec cannot be changed.

  2. The kubelet on the node starts the busybox container alongside the original nodejs-container.

  3. The ephemeral container shares:

    • The process namespace (via --target), allowing you to see processes from nodejs-container.

    • The network namespace, so it can access the same network interfaces (e.g., localhost).

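    • Volumes, but only if they are explicitly mounted into the ephemeral container. kubectl debug does not mount the pod’s volumes by default.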
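
To confirm that the ephemeral container was added, you can read it back from the pod spec. The following is a minimal sketch: the helper name list_ephemeral_containers is hypothetical, and it assumes kubectl is already pointed at the right cluster.

```shell
# Hypothetical helper: print name, image, and target container of each
# ephemeral container in a pod.
# Arguments: pod name, optional namespace (defaults to "default").
list_ephemeral_containers() {
  pod="$1"
  ns="${2:-default}"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .spec.ephemeralContainers[*]}{.name}{"\t"}{.image}{"\t"}{.targetContainerName}{"\n"}{end}'
}

# Usage (not run here):
#   list_ephemeral_containers nodejs-app
```

An empty result means no ephemeral container has been attached to the pod yet.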

Step 3: Debug Inside the Ephemeral Container

Once inside the busybox container:

# Inspect processes from the target container:
ps aux

# Check open ports/networking:
netstat -tulpn

# Access shared volumes (if mounted):
ls /path/to/app/logs
  • Example Diagnosis: You discover a missing file in /app/config or a port conflict.
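
When the pod's volumes are not mounted into the debug container, the target container's files can often still be reached through /proc/<pid>/root, which becomes visible once the process namespace is shared via --target. The sketch below assumes the application process is named node and that the debug container's user is allowed to read the target's /proc entry (this can fail when the two containers run as different users). target_path is a hypothetical helper.

```shell
# Hypothetical helper: build the path under which a file from the target
# container is visible inside the debug container.
# Arguments: PID of a process in the target container, absolute path in that container.
target_path() {
  pid="$1"
  path="$2"
  printf '/proc/%s/root%s\n' "$pid" "$path"
}

# Usage inside the debug shell (not run here; assumes a process named "node"):
#   ls -l "$(target_path "$(pgrep -o node)" /app/config)"
```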

Step 4: Exit and Cleanup

  • Exit the shell with exit.

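  • The ephemeral container moves to a terminated state but is not removed. Ephemeral containers cannot be deleted or restarted, so it stays listed in the pod’s spec and status (visible in kubectl describe pod) until the pod itself is deleted.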


Option 2: Creating a Pod Copy

Use this if the pod is in a crash loop: the target container keeps restarting, so a debugging session attached to it with --target is unlikely to last long enough to be useful.

Step 1: Run the kubectl debug Command

kubectl debug pod/nodejs-app -it \
  --copy-to=nodejs-app-debug \
  --container=nodejs-container \
  -- sh
  • Flags:

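    • --copy-to=nodejs-app-debug: Create a copy of the pod named nodejs-app-debug.

    • --container=nodejs-container followed by -- sh: Replace the command of nodejs-container in the copy with sh, so the container stays up instead of crashing on startup.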

Step 2: Debug the Copied Pod

  1. The copied pod nodejs-app-debug starts with the same configuration, except that the command of nodejs-container is replaced by sh.

  2. The -it flags already attach you to that shell. To open another shell in the copied pod’s container, exec into it:

kubectl exec -it nodejs-app-debug -c nodejs-container -- sh
  3. Diagnose issues (e.g., test file permissions or run the app manually):
# Check file permissions:
ls -l /app

# Manually start the app to see errors:
node server.js
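
If you detach from the initial session, a small wrapper can wait for the copy to reach the Running phase before opening a new shell. This is only a sketch: open_debug_shell is a hypothetical name, the 60-second timeout is an arbitrary choice, and it assumes a kubectl version whose wait command supports --for=jsonpath.

```shell
# Hypothetical helper: wait until the copied pod is Running, then exec into it.
# Arguments: name of the copied pod, name of the container to enter.
open_debug_shell() {
  copy="$1"
  container="$2"
  # Wait on the pod phase rather than the Ready condition: the copy may never
  # become Ready if its readiness probe depends on the app being up.
  kubectl wait --for=jsonpath='{.status.phase}'=Running "pod/$copy" --timeout=60s &&
    kubectl exec -it "$copy" -c "$container" -- sh
}

# Usage (not run here):
#   open_debug_shell nodejs-app-debug nodejs-container
```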

Step 3: Cleanup

Delete the copied pod after debugging:

kubectl delete pod/nodejs-app-debug

Key Differences Between the Two Methods

Ephemeral Container                           | Pod Copy
Attaches to the live pod.                     | Creates a new pod.
Lightweight and fast.                         | Better for crash-looping pods.
Shares namespaces with the target container.  | Runs in its own namespaces, so the live pod’s volumes and network state must be inspected separately.

Under the Hood: How It Works

  1. Ephemeral Containers:

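    • Added via the Kubernetes API with a PATCH request to the pod’s ephemeralcontainers subresource, which appends to spec.ephemeralContainers.

    • Requires the EphemeralContainers feature gate on older clusters (enabled by default since v1.23; the feature has been stable since v1.25).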

    • The kubelet dynamically starts the new container without restarting the pod.

  2. Pod Copy:

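    • Creates a new pod in which the command of the container named by --container is overridden with whatever follows -- (sh in the example above).

    • Copies the pod spec, including volumes and environment variables, from the original pod. By default the copy does not keep the original’s labels, so Services do not route traffic to it.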
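
For the ephemeral-container path described above, the API call can be sketched by hand. The two helpers below are hypothetical and only build the REST path and a strategic-merge-patch body. The container name debugger is an assumption, and the exact request kubectl sends may differ between versions.

```shell
# Hypothetical helper: REST path of the ephemeralcontainers subresource.
# Arguments: namespace, pod name.
ephemeral_api_path() {
  ns="$1"
  pod="$2"
  printf '/api/v1/namespaces/%s/pods/%s/ephemeralcontainers\n' "$ns" "$pod"
}

# Hypothetical helper: patch body that adds one ephemeral container.
# Arguments: debug container name, image, target container name.
ephemeral_patch_body() {
  printf '{"spec":{"ephemeralContainers":[{"name":"%s","image":"%s","targetContainerName":"%s","stdin":true,"tty":true}]}}\n' \
    "$1" "$2" "$3"
}

# Usage sketch (not run here), assuming "kubectl proxy" is listening on its
# default port 8001:
#   curl -X PATCH \
#     -H 'Content-Type: application/strategic-merge-patch+json' \
#     --data "$(ephemeral_patch_body debugger busybox nodejs-container)" \
#     "http://127.0.0.1:8001$(ephemeral_api_path default nodejs-app)"
```

In practice kubectl debug is the simpler route; the sketch is only meant to show that nothing more than a patch to one subresource is involved.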


Using “netshoot” for debugging

What is Netshoot?

Netshoot is a troubleshooting toolbox for Kubernetes and container networking. It is a container image (nicolaka/netshoot) packed with networking utilities such as tcpdump, dig, nslookup, curl, and netcat to help debug network issues inside containers and Kubernetes clusters.

How Does Netshoot Work?

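Netshoot does not need to run as a privileged container. Attached to a pod as an ephemeral container, it joins that pod’s network namespace, so its tools see the same interfaces, routes, and ports as the application. Some tools, such as tcpdump, need extra Linux capabilities (for example NET_RAW).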

How to Use Netshoot with kubectl debug?

1️⃣ Attach Netshoot to a Running Pod

If a pod (e.g., my-app-pod) is running and you need to debug it, run:

kubectl debug my-app-pod -it --image=nicolaka/netshoot

💡 What Happens?

  • This adds an ephemeral Netshoot container to my-app-pod.

  • It provides access to networking tools without modifying the original container.

2️⃣ Debugging DNS Issues in the Pod

Once inside the ephemeral Netshoot container, you can run:

nslookup my-service.default.svc.cluster.local
dig my-service.default.svc.cluster.local
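
DNS failures often come down to whether the short name or the fully qualified name is being resolved. The sketch below builds the full name from its parts. svc_fqdn is a hypothetical helper, and cluster.local is assumed as the cluster domain (some clusters use a different one).

```shell
# Hypothetical helper: build a Service's fully qualified DNS name.
# Arguments: service, namespace (default "default"), cluster domain (default "cluster.local").
svc_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "${2:-default}" "${3:-cluster.local}"
}

# Usage inside the Netshoot container (not run here):
#   cat /etc/resolv.conf                  # nameserver and search domains the pod uses
#   dig +short "$(svc_fqdn my-service)"   # resolve the full name
#   dig +search +short my-service         # resolve the short name via the search list
```

If the full name resolves but the short name does not, the search domains in /etc/resolv.conf are a good place to look next.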

3️⃣ Checking Pod Connectivity

ping another-pod-ip
curl http://another-service:8080
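
When several endpoints need checking, a small loop saves retyping. This is a sketch that assumes the nc (netcat) binary in the Netshoot image accepts the -z and -w options. The helper name and the host:port targets are placeholders.

```shell
# Hypothetical helper: report whether a TCP connection can be opened to each
# host:port argument, using a 2-second timeout per target.
check_endpoints() {
  for target in "$@"; do
    host="${target%:*}"   # everything before the last colon
    port="${target##*:}"  # everything after the last colon
    if nc -z -w 2 "$host" "$port" 2>/dev/null; then
      echo "OK   $target"
    else
      echo "FAIL $target"
    fi
  done
}

# Usage inside the Netshoot container (not run here):
#   check_endpoints another-service:8080 my-service.default.svc.cluster.local:443
```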

4️⃣ Capturing Network Traffic

tcpdump -i eth0 -n -c 10
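
For anything beyond a quick look, it usually helps to filter on one port and write the packets to a file for later analysis. The wrapper below is a sketch: capture_port is a hypothetical name, and the interface eth0 and the /tmp output path are assumptions. tcpdump needs the NET_RAW capability, so on some clusters the debug container may have to be started with a more permissive debugging profile.

```shell
# Hypothetical helper: capture traffic on one port into a pcap file.
# Arguments: port number, optional packet count (default 100).
capture_port() {
  port="$1"
  count="${2:-100}"
  tcpdump -i eth0 -n -c "$count" -w "/tmp/port-$port.pcap" "port $port"
}

# Usage inside the Netshoot container (not run here):
#   capture_port 8080        # first 100 packets on port 8080
#   capture_port 53 20       # first 20 DNS packets
```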

When to Use kubectl debug with Netshoot?

  • Pod is running but lacks debugging tools (e.g., distroless/alpine-based containers).

  • Need to inspect networking issues (e.g., DNS failures, packet drops).

  • Want to avoid modifying a production pod while troubleshooting.

By combining kubectl debug with Netshoot, you get powerful network debugging inside Kubernetes without altering your application containers! 🚀

References

https://kubernetes.io/docs/reference/kubectl/generated/kubectl_debug/

https://sia-ai.medium.com/introduction-to-debugging-locally-and-live-on-kubernetes-8c8ecd3acbaa

https://medium.com/@danielepolencic/isolating-kubernetes-pods-for-debugging-5fe41e630e9