Google Kubernetes Engine (GKE)
Google Kubernetes Engine (GKE) is Google Cloud's managed Kubernetes service that provides a secure, production-ready environment for deploying containerized applications. This guide focuses on practical deployment scenarios using Terraform and the gcloud CLI.
Key Features
- Autopilot: Fully managed Kubernetes experience with hands-off operations
- Standard: More control over cluster configuration and node management
- GKE Enterprise: Advanced multi-cluster management and governance features
- Auto-scaling: Automatic scaling of node pools based on workload demand
- Auto-upgrade: Automated Kubernetes version upgrades
- Multi-zone/region: Deploy across zones/regions for high availability
- VPC-native networking: Uses alias IP ranges for pod networking
- Container-Optimized OS: Secure by default OS for GKE nodes
- Workload Identity: Secure access to Google Cloud services from pods
Deploying GKE with Terraform
Standard Cluster Deployment
resource "google_container_cluster" "primary" {
  name     = "my-gke-cluster"
  location = "us-central1-a"

  # Drop the default node pool; a separately managed pool is defined below
  remove_default_node_pool = true
  initial_node_count       = 1

  # Enable Workload Identity
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  # Network configuration
  network    = google_compute_network.vpc.name
  subnetwork = google_compute_subnetwork.subnet.name

  # IP allocation policy for VPC-native networking. A bare netmask makes GKE
  # create new secondary ranges of that size; to reuse the "pods" and "services"
  # ranges defined on the subnet, set cluster_secondary_range_name and
  # services_secondary_range_name instead.
  ip_allocation_policy {
    cluster_ipv4_cidr_block  = "/16"
    services_ipv4_cidr_block = "/22"
  }

  # Private cluster configuration
  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  # Release channel for auto-upgrades
  release_channel {
    channel = "REGULAR"
  }

  # Maintenance window: 24 hours starting at 00:00 UTC every Saturday and Sunday
  maintenance_policy {
    recurring_window {
      start_time = "2022-01-01T00:00:00Z"
      end_time   = "2022-01-02T00:00:00Z"
      recurrence = "FREQ=WEEKLY;BYDAY=SA,SU"
    }
  }
}
resource "google_container_node_pool" "primary_nodes" {
  name     = "primary-node-pool"
  location = "us-central1-a"
  cluster  = google_container_cluster.primary.name

  # Starting size only; node_count should not be set together with autoscaling
  initial_node_count = 3

  management {
    auto_repair  = true
    auto_upgrade = true
  }

  autoscaling {
    min_node_count = 1
    max_node_count = 10
  }

  node_config {
    machine_type = "e2-standard-4"
    disk_size_gb = 100
    disk_type    = "pd-standard"

    # Google recommends custom service accounts with minimal permissions
    service_account = google_service_account.gke_sa.email
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]

    # Enable workload identity on node pool
    workload_metadata_config {
      mode = "GKE_METADATA"
    }

    labels = {
      env = "production"
    }

    tags = ["gke-node", "production"]
  }
}
resource "google_service_account" "gke_sa" {
  account_id   = "gke-service-account"
  display_name = "GKE Service Account"
}

resource "google_project_iam_member" "gke_sa_roles" {
  for_each = toset([
    "roles/logging.logWriter",
    "roles/monitoring.metricWriter",
    "roles/monitoring.viewer",
    "roles/artifactregistry.reader"
  ])

  role    = each.key
  member  = "serviceAccount:${google_service_account.gke_sa.email}"
  project = var.project_id
}

resource "google_compute_network" "vpc" {
  name                    = "gke-vpc"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "subnet" {
  name          = "gke-subnet"
  ip_cidr_range = "10.10.0.0/16"
  region        = "us-central1"
  network       = google_compute_network.vpc.id

  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.20.0.0/16"
  }

  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = "10.30.0.0/16"
  }
}
Autopilot Cluster Deployment
resource "google_container_cluster" "autopilot" {
  name     = "autopilot-cluster"
  location = "us-central1"

  # Enable Autopilot mode
  enable_autopilot = true

  # Network configuration
  network    = google_compute_network.vpc.name
  subnetwork = google_compute_subnetwork.subnet.name

  # IP allocation policy for VPC-native
  ip_allocation_policy {
    cluster_ipv4_cidr_block  = "/16"
    services_ipv4_cidr_block = "/22"
  }

  # Release channel (required for Autopilot)
  release_channel {
    channel = "REGULAR"
  }

  # Workload identity
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }
}
Deploying GKE with gcloud CLI
Creating a Standard Cluster
# Resolve the active project once and reuse it below
PROJECT_ID="$(gcloud config get-value project)"

# Create VPC
gcloud compute networks create gke-vpc --subnet-mode=custom

# Create subnet
gcloud compute networks subnets create gke-subnet \
  --network=gke-vpc \
  --region=us-central1 \
  --range=10.10.0.0/16 \
  --secondary-range=pods=10.20.0.0/16,services=10.30.0.0/16

# Create service account
gcloud iam service-accounts create gke-sa --display-name="GKE Service Account"

# Assign roles
for role in roles/logging.logWriter roles/monitoring.metricWriter roles/monitoring.viewer roles/artifactregistry.reader
do
  gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member="serviceAccount:gke-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
    --role="${role}"
done

# Create GKE cluster
gcloud container clusters create my-gke-cluster \
  --zone=us-central1-a \
  --network=gke-vpc \
  --subnetwork=gke-subnet \
  --cluster-secondary-range-name=pods \
  --services-secondary-range-name=services \
  --enable-ip-alias \
  --enable-private-nodes \
  --master-ipv4-cidr=172.16.0.0/28 \
  --enable-master-global-access \
  --no-enable-basic-auth \
  --release-channel=regular \
  --workload-pool="${PROJECT_ID}.svc.id.goog" \
  --no-issue-client-certificate \
  --num-nodes=1 \
  --enable-autoscaling \
  --min-nodes=1 \
  --max-nodes=10 \
  --machine-type=e2-standard-4 \
  --disk-size=100 \
  --disk-type=pd-standard \
  --service-account="gke-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --scopes=https://www.googleapis.com/auth/cloud-platform \
  --metadata=disable-legacy-endpoints=true \
  --tags=gke-node,production \
  --node-labels=env=production \
  --enable-autoupgrade \
  --enable-autorepair
Creating an Autopilot Cluster
# Create VPC and subnet (same as above)

# Create Autopilot cluster
gcloud container clusters create-auto autopilot-cluster \
  --region=us-central1 \
  --network=gke-vpc \
  --subnetwork=gke-subnet \
  --cluster-secondary-range-name=pods \
  --services-secondary-range-name=services \
  --enable-master-global-access \
  --release-channel=regular \
  --workload-pool=$(gcloud config get-value project).svc.id.goog
Real-World Example: Deploying a Microservice Application
This example demonstrates deploying a complete microservices application to GKE:
Step 1: Create GKE infrastructure with Terraform
# main.tf - GKE Infrastructure
provider "google" {
  project = var.project_id
  region  = var.region
}

# VPC Network
resource "google_compute_network" "vpc" {
  name                    = "microservices-vpc"
  auto_create_subnetworks = false
}

# Subnet
resource "google_compute_subnetwork" "subnet" {
  name          = "microservices-subnet"
  ip_cidr_range = "10.0.0.0/16"
  region        = var.region
  network       = google_compute_network.vpc.id

  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "192.168.0.0/16"
  }

  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = "172.16.0.0/16"
  }
}

# NAT Router and Gateway for private clusters
resource "google_compute_router" "router" {
  name    = "microservices-router"
  region  = var.region
  network = google_compute_network.vpc.id
}

resource "google_compute_router_nat" "nat" {
  name                               = "microservices-nat"
  router                             = google_compute_router.router.name
  region                             = var.region
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}

# Service Account
resource "google_service_account" "gke_sa" {
  account_id   = "microservices-gke-sa"
  display_name = "Microservices GKE Service Account"
}

# IAM roles for the Service Account
resource "google_project_iam_member" "gke_sa_roles" {
  for_each = toset([
    "roles/logging.logWriter",
    "roles/monitoring.metricWriter",
    "roles/monitoring.viewer",
    "roles/artifactregistry.reader"
  ])

  role    = each.key
  member  = "serviceAccount:${google_service_account.gke_sa.email}"
  project = var.project_id
}
# GKE Cluster
resource "google_container_cluster" "primary" {
  name     = "microservices-cluster"
  location = var.region

  # We create a separate node pool below
  remove_default_node_pool = true
  initial_node_count       = 1

  network    = google_compute_network.vpc.name
  subnetwork = google_compute_subnetwork.subnet.name

  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    # Must not overlap any range in the VPC; 172.16.0.32/28 would fall inside
    # the 172.16.0.0/16 services range, so a separate block is used here
    master_ipv4_cidr_block = "10.1.0.0/28"
  }

  # Enable Binary Authorization
  binary_authorization {
    evaluation_mode = "PROJECT_SINGLETON_POLICY_ENFORCE"
  }

  # Enable Workload Identity
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  # Release channel
  release_channel {
    channel = "REGULAR"
  }
}
# Node Pools
resource "google_container_node_pool" "general" {
  name     = "general"
  location = var.region
  cluster  = google_container_cluster.primary.name

  # In a regional cluster these limits apply per zone
  autoscaling {
    min_node_count = 1
    max_node_count = 5
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }

  node_config {
    machine_type    = "e2-standard-4"
    disk_size_gb    = 100
    disk_type       = "pd-standard"
    service_account = google_service_account.gke_sa.email
    oauth_scopes    = ["https://www.googleapis.com/auth/cloud-platform"]

    workload_metadata_config {
      mode = "GKE_METADATA"
    }

    labels = {
      role = "general"
    }
    # No taints: any pod may be scheduled on this pool
  }
}

# Create a dedicated node pool for database workloads
resource "google_container_node_pool" "database" {
  name     = "database"
  location = var.region
  cluster  = google_container_cluster.primary.name

  autoscaling {
    min_node_count = 1
    max_node_count = 3
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }

  node_config {
    machine_type    = "e2-highmem-4"
    disk_size_gb    = 200
    disk_type       = "pd-ssd"
    service_account = google_service_account.gke_sa.email
    oauth_scopes    = ["https://www.googleapis.com/auth/cloud-platform"]

    workload_metadata_config {
      mode = "GKE_METADATA"
    }

    labels = {
      role = "database"
    }

    # Only pods that tolerate this taint are scheduled here
    taint {
      key    = "workloadType"
      value  = "database"
      effect = "NO_SCHEDULE"
    }
  }
}
# Artifact Registry (for storing container images)
resource "google_artifact_registry_repository" "repo" {
  location      = var.region
  repository_id = "microservices"
  format        = "DOCKER"

  # Encryption using CMEK (Customer-Managed Encryption Keys).
  # The Artifact Registry service agent needs
  # roles/cloudkms.cryptoKeyEncrypterDecrypter on this key.
  kms_key_name = google_kms_crypto_key.artifact_key.id
}

# KMS Key for encrypting Artifact Registry
resource "google_kms_key_ring" "keyring" {
  name     = "microservices-keyring"
  location = var.region
}

resource "google_kms_crypto_key" "artifact_key" {
  name     = "artifact-key"
  key_ring = google_kms_key_ring.keyring.id
}
Step 2: Create Kubernetes manifests for the application
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: microservices
  labels:
    istio-injection: enabled
---
# frontend.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: microservices
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
        - name: frontend
          image: us-central1-docker.pkg.dev/PROJECT_ID/microservices/frontend:latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 200m
              memory: 256Mi
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /readiness
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
      serviceAccountName: frontend-sa
---
# frontend-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: frontend
  namespace: microservices
spec:
  selector:
    app: frontend
  ports:
    - port: 80
      targetPort: 8080
  type: ClusterIP
---
# backend.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-api
  namespace: microservices
spec:
  replicas: 3
  selector:
    matchLabels:
      app: backend-api
  template:
    metadata:
      labels:
        app: backend-api
    spec:
      containers:
        - name: backend-api
          image: us-central1-docker.pkg.dev/PROJECT_ID/microservices/backend:latest
          ports:
            - containerPort: 8081
          env:
            - name: DB_HOST
              valueFrom:
                configMapKeyRef:
                  name: app-config
                  key: db_host
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: password
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              cpu: 500m
              memory: 1Gi
      serviceAccountName: backend-sa
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - backend-api
                topologyKey: "kubernetes.io/hostname"
---
# database.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: database
  namespace: microservices
spec:
  serviceName: database
  replicas: 1
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database
    spec:
      containers:
        - name: database
          image: us-central1-docker.pkg.dev/PROJECT_ID/microservices/postgres:13
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: username
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: password
            - name: POSTGRES_DB
              value: app
            # Use a subdirectory so initdb does not fail on the volume's lost+found directory
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
      # Schedule onto the dedicated, tainted database node pool
      nodeSelector:
        role: database
      tolerations:
        - key: workloadType
          operator: Equal
          value: database
          effect: NoSchedule
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: "premium-rwo"
        resources:
          requests:
            storage: 100Gi
---
# ingress.yaml (using Ingress-NGINX controller)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: microservices-ingress
  namespace: microservices
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  # Replaces the deprecated kubernetes.io/ingress.class annotation
  ingressClassName: nginx
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend
                port:
                  number: 80
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend-api
                port:
                  number: 80
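The manifests above reference several objects that they do not define: the `frontend-sa` and `backend-sa` service accounts, the `app-config` ConfigMap, a `backend-api` Service (the Ingress target), and the headless `database` Service named by the StatefulSet. The following is a minimal sketch of those objects. The file name, the Google service account names in the Workload Identity annotations, and the `db_host` value are assumptions for illustration and are not part of the original example.

```yaml
# supporting-objects.yaml (hypothetical file name)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: frontend-sa
  namespace: microservices
  annotations:
    # Assumed Google service account; replace with one that exists in your project
    iam.gke.io/gcp-service-account: frontend-gsa@PROJECT_ID.iam.gserviceaccount.com
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: backend-sa
  namespace: microservices
  annotations:
    # Assumed Google service account; replace with one that exists in your project
    iam.gke.io/gcp-service-account: backend-gsa@PROJECT_ID.iam.gserviceaccount.com
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: microservices
data:
  # Cluster DNS name of the headless database Service defined below
  db_host: database.microservices.svc.cluster.local
---
apiVersion: v1
kind: Service
metadata:
  name: backend-api
  namespace: microservices
spec:
  selector:
    app: backend-api
  ports:
    - port: 80          # port the Ingress routes to
      targetPort: 8081  # containerPort of the backend-api Deployment
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: database
  namespace: microservices
spec:
  clusterIP: None       # headless Service, as expected by the StatefulSet's serviceName
  selector:
    app: database
  ports:
    - port: 5432
      targetPort: 5432
```

For the annotations to take effect, each Google service account also needs an IAM binding that grants `roles/iam.workloadIdentityUser` to the matching Kubernetes service account.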
Step 3: Create Deployment Pipeline (Cloud Build)
# cloudbuild.yaml
steps:
  # Build the container images
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'us-central1-docker.pkg.dev/${PROJECT_ID}/microservices/frontend:${_VERSION}', './frontend']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'us-central1-docker.pkg.dev/${PROJECT_ID}/microservices/backend:${_VERSION}', './backend']
  # Push the container images to Artifact Registry
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'us-central1-docker.pkg.dev/${PROJECT_ID}/microservices/frontend:${_VERSION}']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'us-central1-docker.pkg.dev/${PROJECT_ID}/microservices/backend:${_VERSION}']
  # Deploy to GKE
  - name: 'gcr.io/cloud-builders/kubectl'
    args:
      - 'apply'
      - '-f'
      - 'kubernetes/namespace.yaml'
    env:
      - 'CLOUDSDK_COMPUTE_REGION=us-central1'
      - 'CLOUDSDK_CONTAINER_CLUSTER=microservices-cluster'
  # Create secrets. A pipe cannot be passed as a list argument, so the
  # command runs through bash and fetches cluster credentials itself.
  - name: 'gcr.io/cloud-builders/kubectl'
    entrypoint: 'bash'
    args:
      - '-c'
      - |
        gcloud container clusters get-credentials microservices-cluster --region=us-central1
        kubectl create secret generic db-credentials \
          --namespace=microservices \
          --from-literal=username=admin \
          --from-literal=password='${_DB_PASSWORD}' \
          --dry-run=client -o yaml | kubectl apply -f -
  # Update kubernetes manifests with the new image version
  # (sed is run from the gcloud builder image via a custom entrypoint)
  - name: 'gcr.io/cloud-builders/gcloud'
    entrypoint: 'sed'
    args:
      - '-i'
      - 's|us-central1-docker.pkg.dev/PROJECT_ID/microservices/frontend:latest|us-central1-docker.pkg.dev/${PROJECT_ID}/microservices/frontend:${_VERSION}|g'
      - 'kubernetes/frontend.yaml'
  - name: 'gcr.io/cloud-builders/gcloud'
    entrypoint: 'sed'
    args:
      - '-i'
      - 's|us-central1-docker.pkg.dev/PROJECT_ID/microservices/backend:latest|us-central1-docker.pkg.dev/${PROJECT_ID}/microservices/backend:${_VERSION}|g'
      - 'kubernetes/backend.yaml'
  # Apply the Kubernetes manifests
  - name: 'gcr.io/cloud-builders/kubectl'
    args:
      - 'apply'
      - '-f'
      - 'kubernetes/.'
    env:
      - 'CLOUDSDK_COMPUTE_REGION=us-central1'
      - 'CLOUDSDK_CONTAINER_CLUSTER=microservices-cluster'
substitutions:
  _VERSION: '1.0.0'
  _DB_PASSWORD: 'changeme' # Should be set via Cloud Build triggers or Secret Manager
options:
  dynamic_substitutions: true
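The comment on `_DB_PASSWORD` suggests Secret Manager as the source of the database password. As a sketch of that approach, the fragment below shows a variant of the secret-creation step that reads the value through Cloud Build's `availableSecrets` field instead of a substitution. The secret name `db-password` is hypothetical, and `PROJECT_ID` is a placeholder to replace with your project ID.

```yaml
# cloudbuild.yaml (fragment): read the database password from Secret Manager
steps:
  - name: 'gcr.io/cloud-builders/kubectl'
    entrypoint: 'bash'
    args:
      - '-c'
      - |
        gcloud container clusters get-credentials microservices-cluster --region=us-central1
        # $$ escapes the dollar sign so bash, not Cloud Build, expands the variable
        kubectl create secret generic db-credentials \
          --namespace=microservices \
          --from-literal=username=admin \
          --from-literal=password="$$DB_PASSWORD" \
          --dry-run=client -o yaml | kubectl apply -f -
    # Expose the secret to this step only
    secretEnv: ['DB_PASSWORD']
availableSecrets:
  secretManager:
    # "db-password" is an assumed secret name
    - versionName: projects/PROJECT_ID/secrets/db-password/versions/latest
      env: 'DB_PASSWORD'
```

With this variant the `_DB_PASSWORD` substitution is no longer needed, and the password does not appear in the build configuration. The service account that runs the build needs `roles/secretmanager.secretAccessor` on the secret.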
Best Practices
- Security
  - Use private clusters with no public endpoint
  - Implement Workload Identity for pod-level access to Google Cloud resources
  - Apply the principle of least privilege for service accounts
  - Enable Binary Authorization to secure the software supply chain
  - Keep nodes and the control plane up to date by enrolling in a release channel
- Reliability
  - Deploy across multiple zones/regions for high availability
  - Use Pod Disruption Budgets to ensure availability during maintenance
  - Implement proper health checks and readiness/liveness probes
  - Set appropriate resource requests and limits
  - Use node auto-provisioning to handle fluctuating workloads
- Cost Optimization
  - Use Autopilot for hands-off management and optimized costs
  - Leverage Spot VMs for batch or fault-tolerant workloads
  - Set up cluster autoscaler to scale nodes based on demand
  - Use horizontal pod autoscaling (HPA) based on CPU/memory/custom metrics
  - Use node selectors, taints, and tolerations to ensure pods run on appropriate nodes
- Monitoring and Logging
  - Enable Cloud Monitoring and Logging during cluster creation
  - Set up custom dashboards for cluster and application metrics
  - Create log-based alerts for critical issues
  - Use Cloud Trace and Profiler for application performance monitoring
  - Implement distributed tracing using OpenTelemetry
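Two of the practices above, Pod Disruption Budgets and horizontal pod autoscaling, can be sketched for the `backend-api` Deployment from the microservices example. The object names and the thresholds (two pods always available, 3 to 10 replicas, 70% CPU) are illustrative assumptions rather than recommended values.

```yaml
# Keep at least two backend-api pods running during voluntary disruptions
# such as node upgrades and drains
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: backend-api-pdb
  namespace: microservices
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: backend-api
---
# Scale backend-api between 3 and 10 replicas, targeting 70% average CPU
# utilization measured against the containers' CPU requests
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-api-hpa
  namespace: microservices
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Because utilization is computed relative to resource requests, the autoscaler only works for pods that declare a CPU request, which ties this back to setting requests and limits.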
Common Issues and Troubleshooting
Networking Issues
- Ensure pod CIDR ranges don't overlap with VPC subnets
- Check firewall rules for master-to-node and node-to-node communication
- Verify kube-proxy is running correctly for service networking
- Use Network Policy to control pod-to-pod traffic
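As an illustration of the last point, the sketch below restricts ingress to the `database` pods from the microservices example so that only `backend-api` pods can reach PostgreSQL. The policy name is hypothetical, and the policy only has an effect when network policy enforcement is enabled on the cluster (for example through GKE Dataplane V2), which the Terraform in this guide does not configure.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: database-allow-backend   # hypothetical name
  namespace: microservices
spec:
  # The policy applies to the database pods
  podSelector:
    matchLabels:
      app: database
  policyTypes:
    - Ingress
  ingress:
    # Allow only backend-api pods in the same namespace, on the PostgreSQL port
    - from:
        - podSelector:
            matchLabels:
              app: backend-api
      ports:
        - protocol: TCP
          port: 5432
```

Once a pod is selected by an Ingress policy, all other inbound traffic to it is denied, so any additional clients of the database need their own `from` entries.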
Performance Problems
- Review pod resource settings (requests/limits)
- Check for node resource exhaustion (CPU, memory)
- Look for noisy neighbor issues on shared nodes
- Monitor network throughput and latency
Deployment Failures
- Verify service account permissions
- Check image pull errors (registry access, image existence)
- Examine pod events with kubectl describe pod
- Review logs with kubectl logs or Cloud Logging
Scaling Issues
- Ensure cluster autoscaler is properly configured
- Check if pods have appropriate resource requests
- Verify node resource availability
- Look for pod affinity/anti-affinity conflicts