Pod Troubleshooting Commands
This guide provides actionable commands and best practices for troubleshooting pods in Kubernetes clusters (AKS, EKS, GKE, and on-prem). Use these steps for real-life incident response and GitOps workflows.
Common Troubleshooting Commands
List all Pods in all Namespaces:
kubectl get pods --all-namespaces
Check Resource Consumption (requires metrics-server):
kubectl top pods --all-namespaces
Describe a Pod:
kubectl describe pod <pod-name> -n <namespace>
View Pod Logs:
kubectl logs <pod-name> -n <namespace>
Follow Pod Logs (stream in real-time):
kubectl logs -f <pod-name> -n <namespace>
Exec into a Pod:
kubectl exec -it <pod-name> -n <namespace> -- <command>
Get Events for a Pod:
kubectl get events --field-selector involvedObject.name=<pod-name> -n <namespace>
Check Pod Health (Readiness/Liveness):
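kubectl describe pod <pod-name> -n <namespace> | grep -iE 'liveness|readiness'
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{range .status.conditions[*]}{.type}{"="}{.status}{"\n"}{end}'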
Retrieve Pod IP and Node:
kubectl get pod <pod-name> -n <namespace> -o wide
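Restart a Pod (only pods managed by a controller such as a Deployment or StatefulSet are recreated; a standalone pod is deleted permanently):
kubectl delete pod <pod-name> -n <namespace>
Restart all Pods of a Deployment:
kubectl rollout restart deployment/<deployment-name> -n <namespace>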
Check Pod Status (full status block, including container states and last termination reasons):
kubectl get pod <pod-name> -n <namespace> -o yaml
List Pod Events (sorted):
kubectl get events --field-selector involvedObject.name=<pod-name> -n <namespace> --sort-by='.metadata.creationTimestamp'
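Verify Pod Affinity/Anti-Affinity (kubectl describe pod does not print affinity rules, so read them from the spec; empty output means none are set):
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.affinity}'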
Check Resource Requests and Limits (per container):
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.resources}{"\n"}{end}'
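Identify Stuck Pods (the Pending phase includes pods that are unschedulable, pulling images, or still in ContainerCreating):
kubectl get pods -A --field-selector=status.phase=Pending
Show the Most Recent Event for a Stuck Pod:
kubectl get events --field-selector involvedObject.name=<pod-name> -n <namespace> --sort-by='.metadata.creationTimestamp' | tail -n 1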
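When many pods are involved, the listing from kubectl get pods -A can be narrowed to the ones that need attention. The following is a minimal sketch, not a kubectl feature: filter_unhealthy_pods is a hypothetical shell function that assumes the default -A column order (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE) and prints pods whose STATUS is neither Running nor Completed, plus Running pods whose READY count is incomplete.

```shell
#!/bin/sh
# filter_unhealthy_pods: hypothetical helper, not part of kubectl.
# Reads `kubectl get pods -A` output on stdin and prints
# "namespace/name STATUS" for pods that need attention.
# Assumes the default -A column order:
#   NAMESPACE NAME READY STATUS RESTARTS AGE
filter_unhealthy_pods() {
  awk 'NR > 1 {
    split($3, ready, "/")   # READY column, e.g. "1/2" -> ready[1]=1, ready[2]=2
    if ($4 != "Running" && $4 != "Completed") {
      # Pending, CrashLoopBackOff, ImagePullBackOff, Error, ...
      print $1 "/" $2 " " $4
    } else if ($4 == "Running" && ready[1] != ready[2]) {
      # Running, but at least one container is not passing readiness
      print $1 "/" $2 " " $4 " (not ready " $3 ")"
    }
  }'
}

# Usage (requires cluster access):
#   kubectl get pods -A | filter_unhealthy_pods
```

Because the function only reads stdin, it can be tried on saved output before pointing it at a live cluster. Column positions shift without -A, so single-namespace output would need the field numbers adjusted.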
Real-Life Troubleshooting Workflow
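- Identify the failing pod (anything not Running or Completed):
kubectl get pods -A | grep -Ev 'Running|Completed'
- Check pod status and events:
kubectl describe pod <pod-name> -n <namespace>
kubectl get events --field-selector involvedObject.name=<pod-name> -n <namespace>
- Inspect logs (add --previous to see output from the last crashed container):
kubectl logs <pod-name> -n <namespace>
- Check resource usage (requires metrics-server):
kubectl top pod <pod-name> -n <namespace>
- Exec into the pod for deeper inspection (if the image has no shell, attach an ephemeral debug container instead with kubectl debug -it <pod-name> -n <namespace> --image=busybox --target=<container-name>):
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh
- Review affinity, resource limits, and node assignment:
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.nodeName}{"\n"}{.spec.affinity}{"\n"}{.spec.containers[*].resources}{"\n"}'
- If using GitOps: check whether the manifest in Git matches the running pod. If not, investigate drift or failed syncs (Argo CD/Flux dashboards).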
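The workflow above can be captured in one pass so that the evidence survives a later restart or rollout. A sketch under stated assumptions: collect_pod_diagnostics is a hypothetical function name, the output file names are arbitrary, and kubectl top pod only succeeds when metrics-server is installed (otherwise top.txt records the error).

```shell
#!/bin/sh
# collect_pod_diagnostics: hypothetical helper that snapshots the workflow
# steps into one directory before anyone restarts the pod.
# Usage: collect_pod_diagnostics <pod-name> <namespace> [output-dir]
collect_pod_diagnostics() {
  local pod="$1" ns="$2"
  local out="${3:-./diag-$2-$1}"
  if [ -z "$pod" ] || [ -z "$ns" ]; then
    echo "usage: collect_pod_diagnostics <pod-name> <namespace> [output-dir]" >&2
    return 2
  fi
  mkdir -p "$out" || return 1

  # Status, events and logs; stderr is kept so failures are visible in the files.
  kubectl describe pod "$pod" -n "$ns" > "$out/describe.txt" 2>&1
  kubectl get events --field-selector "involvedObject.name=$pod" -n "$ns" \
    --sort-by='.metadata.creationTimestamp' > "$out/events.txt" 2>&1
  kubectl logs "$pod" -n "$ns" --all-containers > "$out/logs.txt" 2>&1
  # Logs from the previous container instance (useful after a crash).
  kubectl logs "$pod" -n "$ns" --all-containers --previous > "$out/logs-previous.txt" 2>&1
  # Needs metrics-server; the file records the error if it is missing.
  kubectl top pod "$pod" -n "$ns" > "$out/top.txt" 2>&1
  # Node assignment, affinity and per-container resources.
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.spec.nodeName}{"\n"}{.spec.affinity}{"\n"}{.spec.containers[*].resources}{"\n"}' \
    > "$out/placement.txt" 2>&1

  echo "$out"
}
```

The function prints the directory it wrote, so its output can be attached to an incident ticket or diffed against a later capture.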
Best Practices (2025)
- Always check pod events and logs before restarting or deleting pods
- Use kubectl get events sorted by timestamp for recent issues
- Validate resource requests/limits to avoid OOMKilled or throttling
- Use LLMs (Copilot, Claude) to generate troubleshooting scripts or analyze logs
- Document recurring issues and solutions in your team knowledge base
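One quick check for the requests/limits practice is to list pods whose containers were last terminated with reason OOMKilled. A minimal sketch, assuming the tab-separated jsonpath output shown in the usage comment; find_oomkilled is a hypothetical helper. It only sees each container's most recent termination, and CPU throttling is not reported this way because throttled containers are not terminated.

```shell
#!/bin/sh
# find_oomkilled: hypothetical helper, not part of kubectl.
# Input (stdin), one line per pod:
#   <namespace>/<pod-name><TAB><reason> <reason> ...
# where each reason is a container's lastState.terminated.reason (may be empty).
# Prints namespace/pod for every pod with at least one OOMKilled container.
find_oomkilled() {
  awk -F'\t' '$2 ~ /OOMKilled/ { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{range .status.containerStatuses[*]}{.lastState.terminated.reason}{" "}{end}{"\n"}{end}' \
#     | find_oomkilled
```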
Common Pitfalls
- Ignoring events (often contain the root cause)
- Restarting pods without root cause analysis
- Not checking for node-level issues (disk, network, taints)
- Manual changes outside Git in GitOps-managed clusters
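Node-level problems often show up as node conditions. The sketch below assumes a jsonpath listing of each node's conditions as Type=Status pairs (shown in the usage comment) and a hypothetical flag_node_conditions helper. It treats Ready as healthy only when True and every other condition as a problem when True, which matches the built-in MemoryPressure, DiskPressure, PIDPressure and NetworkUnavailable conditions; custom conditions added by other tools may not follow that convention.

```shell
#!/bin/sh
# flag_node_conditions: hypothetical helper, not part of kubectl.
# Input (stdin), one line per node:
#   <node-name> <Type>=<Status> <Type>=<Status> ...
# Prints "<node-name> <Type>=<Status>" for each condition that looks unhealthy.
flag_node_conditions() {
  awk '{
    for (i = 2; i <= NF; i++) {
      split($i, kv, "=")            # "DiskPressure=True" -> kv[1], kv[2]
      if (kv[1] == "Ready") {
        if (kv[2] != "True") print $1 " " $i   # False or Unknown
      } else if (kv[2] == "True") {
        print $1 " " $i                         # e.g. DiskPressure=True
      }
    }
  }'
}

# Usage (requires cluster access):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{range .status.conditions[*]}{" "}{.type}{"="}{.status}{end}{"\n"}{end}' \
#     | flag_node_conditions
```

Taints are not covered by this sketch; kubectl describe node <node-name> lists them in its Taints field next to the same Conditions table.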
Related Topics
- Pod Troubleshooting Commands - Specific tools for debugging pods
- Kubernetes Core Concepts - Understanding fundamentals helps troubleshooting
- Logging - Collecting logs from Kubernetes
- Metrics - Monitoring Kubernetes performance