Pod Troubleshooting Commands

This guide provides actionable commands and best practices for troubleshooting pods in Kubernetes clusters (AKS, EKS, GKE, and on-prem). Use these steps for real-life incident response and GitOps workflows.

Common Troubleshooting Commands

List all Pods in all Namespaces:

kubectl get pods --all-namespaces

Check Resource Consumption (requires metrics-server):

kubectl top pods --all-namespaces
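kubectl top pods -A --sort-by=memory already sorts the list. To keep only the heaviest consumers, the output can also be filtered by a memory threshold. This is a minimal sketch. It assumes the default column layout (NAMESPACE NAME CPU(cores) MEMORY(bytes)) and memory values suffixed with Ki, Mi or Gi. The function name `pods_over_memory` is hypothetical.

```shell
# Hypothetical helper (not part of kubectl): print pods whose memory usage is
# above a threshold given in MiB. Reads `kubectl top pods -A` output on stdin
# and assumes the columns NAMESPACE NAME CPU(cores) MEMORY(bytes).
# Usage: kubectl top pods -A | pods_over_memory 500
pods_over_memory() {
  awk -v limit="$1" '
    $1 == "NAMESPACE" { next }                 # skip the header row
    {
      mem = $4
      unit = substr(mem, length(mem) - 1)      # last two characters: Ki, Mi or Gi
      value = substr(mem, 1, length(mem) - 2) + 0
      if (unit == "Gi")      value = value * 1024
      else if (unit == "Ki") value = value / 1024
      else if (unit != "Mi") next              # skip values in unexpected units
      if (value > limit + 0) print $1 "/" $2, mem
    }'
}
```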

Describe a Pod:

kubectl describe pod <pod-name> -n <namespace>

View Pod Logs (add -c <container-name> for multi-container pods):

kubectl logs <pod-name> -n <namespace>

Follow Pod Logs (stream in real-time):

kubectl logs -f <pod-name> -n <namespace>
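In a crash-looping container, the current log is often nearly empty because the container has just restarted. The kubectl logs flag --previous prints the log of the last terminated instance instead. This is a minimal sketch of a helper that shows both. The function name `pod_logs` and the 100-line tail are hypothetical choices.

```shell
# Hypothetical helper (not part of kubectl): print the last 100 log lines of the
# previous (crashed) container instance, then of the current one.
# Usage: pod_logs <pod-name> <namespace> [container-name]
pod_logs() {
  local pod="$1" ns="$2" container="${3:-}"
  set -- logs "$pod" -n "$ns" --tail=100
  if [ -n "$container" ]; then
    set -- "$@" -c "$container"           # pick one container in multi-container pods
  fi
  printf '%s\n' '--- previous instance ---'
  kubectl "$@" --previous || printf '%s\n' '(no previous instance)'
  printf '%s\n' '--- current instance ---'
  kubectl "$@"
}
```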

Exec into a Pod:

kubectl exec -it <pod-name> -n <namespace> -- <command>

Get Events for a Pod:

kubectl get events --field-selector involvedObject.name=<pod-name> -n <namespace>

Check Pod Health (Readiness/Liveness):

kubectl describe pod <pod-name> -n <namespace> | grep -E -A 6 'Liveness:|Readiness:|Conditions:'
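The describe output shows how the probes are configured. The live readiness flag and restart count of each container can be read straight from the pod status with JSONPath. This is a minimal sketch that prints only the containers that look unhealthy. The function name `container_health` is hypothetical.

```shell
# Hypothetical helper (not part of kubectl): print containers that are not
# ready or have restarted, using the live pod status instead of describe output.
# Usage: container_health <pod-name> <namespace>
container_health() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.ready}{"\t"}{.restartCount}{"\n"}{end}' |
  awk -F '\t' '$2 != "true" || $3 + 0 > 0 { print $1, "ready=" $2, "restarts=" $3 }'
}
```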

Retrieve Pod IP and Node:

kubectl get pod <pod-name> -n <namespace> -o wide

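Restart a Pod (only pods managed by a controller, such as a Deployment or StatefulSet, are recreated; a standalone pod is not):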

kubectl delete pod <pod-name> -n <namespace>
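This is a minimal sketch that checks .metadata.ownerReferences before deleting, so a standalone pod is not removed by accident. The function name `safe_delete_pod` is hypothetical. For a Deployment, kubectl rollout restart deployment/<deployment-name> -n <namespace> replaces all of its pods with a rolling update instead.

```shell
# Hypothetical helper (not part of kubectl): delete a pod only when a
# controller owns it, so that the pod is recreated.
# Usage: safe_delete_pod <pod-name> <namespace>
safe_delete_pod() {
  local owner
  # Empty if the pod has no owner or the lookup fails.
  owner=$(kubectl get pod "$1" -n "$2" -o jsonpath='{.metadata.ownerReferences[0].kind}')
  if [ -z "$owner" ]; then
    printf 'refusing: pod %s has no owner and would not be recreated\n' "$1" >&2
    return 1
  fi
  printf 'pod %s (owner: %s): deleting\n' "$1" "$owner"
  kubectl delete pod "$1" -n "$2"
}
```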

Check Pod Status (READY, STATUS and RESTARTS columns):

kubectl get pod <pod-name> -n <namespace>

List Pod Events (sorted):

kubectl get events --field-selector involvedObject.name=<pod-name> -n <namespace> --sort-by='.metadata.creationTimestamp'
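Warning events such as FailedScheduling, BackOff or FailedMount usually carry the root cause. Field selectors can be combined with commas, and events support selecting on type, so the sorted query can be narrowed to warnings only. This is a minimal sketch. The function name `pod_warnings` is hypothetical.

```shell
# Hypothetical helper (not part of kubectl): list only Warning events for one
# pod, oldest first.
# Usage: pod_warnings <pod-name> <namespace>
pod_warnings() {
  kubectl get events -n "$2" \
    --field-selector "involvedObject.name=$1,type=Warning" \
    --sort-by='.metadata.creationTimestamp'
}
```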

Verify Pod Affinity/Anti-Affinity (kubectl describe does not print affinity rules, so read them from the pod spec):

kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.affinity}'

Check Resource Requests and Limits:

kubectl describe pod <pod-name> -n <namespace> | grep -E -A 3 'Limits:|Requests:'
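Reading requests and limits from the pod spec gives one line per container. Containers that set neither are easy to spot this way. This is a minimal sketch. The function name `container_resources` is hypothetical, and the maps are printed exactly as kubectl's JSONPath output renders them.

```shell
# Hypothetical helper (not part of kubectl): print requests and limits per
# container, showing "none" where a container sets no values.
# Usage: container_resources <pod-name> <namespace>
container_resources() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.requests}{"\t"}{.resources.limits}{"\n"}{end}' |
  awk -F '\t' '{ printf "%s requests=%s limits=%s\n", $1, ($2 == "" ? "none" : $2), ($3 == "" ? "none" : $3) }'
}
```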

Identify Why a Pod Is Stuck (show its most recent event):

kubectl get events --field-selector involvedObject.name=<pod-name> -n <namespace> --sort-by='.metadata.creationTimestamp' | tail -n 1
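A pod stays in phase Pending while it waits to be scheduled and while its images are pulled and containers created. Selecting on status.phase=Pending therefore finds most stuck pods cluster-wide. This is a minimal sketch that runs the latest-event query for each of them. The function name `stuck_pods` is hypothetical. Events expire after a while, so some pods may show no event.

```shell
# Hypothetical helper (not part of kubectl): for every Pending pod in the
# cluster, print its most recent event (may be empty once events expire).
# Usage: stuck_pods
stuck_pods() {
  kubectl get pods -A --field-selector=status.phase=Pending \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' |
  while read -r ns pod; do
    printf '== %s/%s\n' "$ns" "$pod"
    kubectl get events -n "$ns" --field-selector "involvedObject.name=$pod" \
      --sort-by='.metadata.creationTimestamp' </dev/null | tail -n 1
  done
}
```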

Real-Life Troubleshooting Workflow

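Identify the failing pod (any STATUS other than Running or Completed; for Running pods, also check that READY shows all containers ready):
```
kubectl get pods -A | grep -Ev 'Running|Completed'
```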
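Filtering on the STATUS text misses pods that are Running but not ready. This is a minimal sketch that parses the default kubectl get pods -A columns (NAMESPACE NAME READY STATUS RESTARTS AGE) and reports both cases. The function name `unhealthy_pods` is hypothetical, and the parsing assumes the default (non-wide) output format.

```shell
# Hypothetical helper (not part of kubectl): list pods that are not healthy.
# Reads `kubectl get pods -A` output on stdin.
# Usage: kubectl get pods -A | unhealthy_pods
unhealthy_pods() {
  awk '
    $1 == "NAMESPACE" { next }                  # skip the header row
    $4 == "Completed" { next }                  # finished Job pods are fine
    $4 != "Running"   { print $1 "/" $2, $4; next }
    {
      split($3, ready, "/")                     # READY column is "ready/total"
      if (ready[1] + 0 < ready[2] + 0) print $1 "/" $2, "NotReady(" $3 ")"
    }'
}
```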

Check pod status and events:

kubectl describe pod <pod-name> -n <namespace>
kubectl get events --field-selector involvedObject.name=<pod-name> -n <namespace>

Inspect logs:
```
kubectl logs <pod-name> -n <namespace>
```

Check resource usage:

kubectl top pod <pod-name> -n <namespace>

Exec into the pod for deeper inspection:

kubectl exec -it <pod-name> -n <namespace> -- /bin/sh
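Minimal images (for example distroless images) often ship without /bin/sh, so kubectl exec fails. kubectl debug can attach an ephemeral container instead; ephemeral containers are stable since Kubernetes 1.25. This is a minimal sketch that tries exec first and falls back to kubectl debug. The function name `pod_shell` and the busybox:1.36 image tag are hypothetical choices.

```shell
# Hypothetical helper (not part of kubectl): open a shell in a container,
# falling back to an ephemeral debug container when the image has no /bin/sh.
# Usage: pod_shell <pod-name> <namespace> <container-name>
pod_shell() {
  if kubectl exec "$1" -n "$2" -c "$3" -- /bin/sh -c 'exit 0' >/dev/null 2>&1; then
    kubectl exec -it "$1" -n "$2" -c "$3" -- /bin/sh
  else
    # --target shares the process namespace of the named container
    kubectl debug -it "$1" -n "$2" --image=busybox:1.36 --target="$3" -- sh
  fi
}
```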

Review affinity, resource limits, and node assignment:

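kubectl describe pod <pod-name> -n <namespace> | grep -E -A 2 'Node:|Node-Selectors:|Tolerations:|Limits:|Requests:'
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.affinity}'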

If you use GitOps, check whether the manifest in Git matches the running pod. If it does not, investigate drift or failed syncs in the ArgoCD or Flux dashboards.
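Drift can also be checked from a shell. kubectl diff compares a manifest with the live object, and its documented exit status is 0 for no differences, 1 when differences were found, and greater than 1 on error. This is a minimal sketch that turns that status into a message. The function name `check_drift` is hypothetical, and it assumes a plain manifest file checked out from Git. ArgoCD (argocd app diff) and Flux (flux diff kustomization) offer their own equivalents.

```shell
# Hypothetical helper (not part of kubectl): compare a manifest checked out
# from Git with the live cluster state. `kubectl diff` exits 0 when nothing
# differs, 1 when differences were found, and >1 when kubectl or diff failed.
# Usage: check_drift path/to/manifest.yaml
check_drift() {
  local status=0
  kubectl diff -f "$1" || status=$?
  case "$status" in
    0) printf 'in sync: %s\n' "$1" ;;
    1) printf 'drift detected: %s\n' "$1" ;;
    *) printf 'diff failed (exit %s): %s\n' "$status" "$1" >&2 ;;
  esac
  return "$status"
}
```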

Best Practices (2025)

Always check pod events and logs before restarting or deleting pods
Use kubectl get events sorted by timestamp for recent issues
Validate resource requests/limits to avoid OOMKilled or throttling
Use LLMs (Copilot, Claude) to generate troubleshooting scripts or analyze logs
Document recurring issues and solutions in your team knowledge base

Common Pitfalls

Ignoring events (they often contain the root cause)
Restarting pods without root cause analysis
Not checking for node-level issues (disk, network, taints)
Manual changes outside Git in GitOps-managed clusters
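Node-level problems show up in node conditions. The standard ones are Ready, MemoryPressure, DiskPressure, PIDPressure and NetworkUnavailable. Ready should be True and the others False. This is a minimal sketch that flags any node violating that. The function name `node_problems` is hypothetical. Taints are listed by kubectl describe node <node-name>.

```shell
# Hypothetical helper (not part of kubectl): flag nodes that are not Ready or
# report any other condition (MemoryPressure, DiskPressure, ...) as True.
# Usage: node_problems
node_problems() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{range .status.conditions[*]}{" "}{.type}{"="}{.status}{end}{"\n"}{end}' |
  awk '{
    for (i = 2; i <= NF; i++) {
      split($i, kv, "=")                       # kv[1] = condition type, kv[2] = status
      if ((kv[1] == "Ready" && kv[2] != "True") || (kv[1] != "Ready" && kv[2] == "True"))
        print $1, $i
    }
  }'
}
```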
