Cloud Architect Interview Questions — 35 Real Questions & Answers (2026) | TechCerted

Cloud architect interviews in 2026 follow a 4-6 round loop over 2-4 weeks: recruiter screen, hiring manager screen, IaC technical screen (Terraform/Pulumi), platform deep-dive (AWS/Azure/GCP specifics), system design whiteboard, and behavioral. Security questions now appear in every round, not just a dedicated stage. Cost is treated as an architectural constraint from day one. Interviewers intentionally leave constraints ambiguous to test whether you ask clarifying questions before designing.

AWS architecture questions

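'Design a VPC for a multi-tier web application.' — Public subnets in at least two Availability Zones for the ALB, private subnets for the app and DB tiers, NAT Gateway for outbound traffic from the private subnets, gateway VPC endpoints for S3 and DynamoDB so that traffic stays off the NAT path. Security groups (stateful, attached to network interfaces) plus NACLs (stateless, subnet-level) for layered defense.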
'A developer needs temporary access to a production S3 bucket. How?' — IAM role with STS AssumeRole, scoped policy with time-bound session, CloudTrail logging. Never distribute access keys. Use IAM Identity Center for human access.
'Your Lambda function has cold start issues impacting UX. Fix it.' — Provisioned Concurrency for predictable workloads, optimize package size, keep the function warm with scheduled invocations, consider container-based Lambda for larger runtimes.
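'Design a disaster recovery strategy with RPO of 15 minutes and RTO of 1 hour.' — Multi-region active-passive (warm standby) with Aurora Global Database for cross-region replication, S3 Cross-Region Replication (with Replication Time Control if the 15-minute RPO must also hold for objects), Route 53 health checks with failover routing, and infrastructure-as-code so the standby region can be scaled up or recreated within the RTO. Rehearse the failover; an untested runbook does not meet an RTO.
'How do you optimize a $50K/month AWS bill?' — Right-size instances from CPU and memory utilization (memory metrics need the CloudWatch agent), Reserved Instances or Savings Plans for steady workloads, Spot for interruptible batch processing, S3 lifecycle policies, cleanup of unattached EBS volumes and unassociated Elastic IPs, and AWS Cost Anomaly Detection alongside Cost Explorer.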
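
For the VPC design question above, it helps to show the address plan rather than only name the tiers. The following is a minimal sketch, assuming a 10.0.0.0/16 VPC, two Availability Zones, and /20 subnets; the CIDR, AZ names, tier names, and the plan_subnets helper are illustrative assumptions, not values from any real account.

```python
import ipaddress


def plan_subnets(vpc_cidr, azs, tiers, new_prefix=20):
    """Carve one equally sized subnet per (tier, AZ) pair out of the VPC CIDR."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)  # generator of non-overlapping blocks
    plan = {}
    for tier in tiers:
        for az in azs:
            try:
                plan[(tier, az)] = next(blocks)
            except StopIteration:
                raise ValueError("VPC CIDR is too small for this layout") from None
    return plan


# Hypothetical layout: three tiers spread over two AZs.
plan = plan_subnets(
    "10.0.0.0/16",
    azs=["us-east-1a", "us-east-1b"],
    tiers=["public", "app", "db"],
)
for (tier, az), cidr in plan.items():
    print(f"{tier:<6} {az}  {cidr}")
```

Only the public subnets would route to an internet gateway; the app and db subnets would route outbound traffic through the NAT Gateway. Leaving unallocated blocks in the /16 keeps room for a later tier or a third AZ.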

Kubernetes and container questions

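'A pod is in CrashLoopBackOff. Walk me through debugging.' — kubectl describe pod (check events and the last termination reason), kubectl logs --previous (see why the previous container crashed), check resource limits (OOMKilled?), verify that liveness and startup probes are not killing a slow-starting container, then check the command, config, and mounted secrets the container needs at startup. Image pull failures surface as ImagePullBackOff rather than CrashLoopBackOff.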
'Explain the difference between a Deployment, StatefulSet, and DaemonSet.' — Deployment: stateless replicas, rolling updates. StatefulSet: stable network identities, ordered deployment, persistent volumes (databases). DaemonSet: one pod per node (monitoring agents, log collectors).
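'How do you handle secrets in Kubernetes?' — External Secrets Operator syncing from AWS Secrets Manager or Vault, Sealed Secrets for GitOps, encryption at rest for etcd, and RBAC limiting who can read Secret objects. Never store secrets in ConfigMaps or as plain-text environment variables in manifests; native Secrets are only base64-encoded, not encrypted, by default.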
'Design a multi-cluster Kubernetes architecture.' — Federation or fleet management (GKE Enterprise, EKS Anywhere), service mesh for cross-cluster communication, GitOps with ArgoCD for consistent deployments, centralized observability.
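
The CrashLoopBackOff walkthrough above can be sketched as a small triage script. This is a rough illustration, assuming the Pod object has been fetched with kubectl get pod <name> -o json and parsed into a dict; the triage_pod helper and the sample pod are hypothetical, and the hints are starting points rather than a complete diagnosis.

```python
def triage_pod(pod):
    """Return one hint per unhealthy container in a parsed Pod object."""
    hints = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        name = cs.get("name", "?")
        waiting = cs.get("state", {}).get("waiting", {})
        last = cs.get("lastState", {}).get("terminated", {})
        reason = waiting.get("reason")

        if reason in ("ImagePullBackOff", "ErrImagePull"):
            hints.append(f"{name}: image pull failure, check image tag and imagePullSecrets")
        elif reason == "CrashLoopBackOff":
            if last.get("reason") == "OOMKilled":
                hints.append(f"{name}: OOMKilled, raise the memory limit or reduce usage")
            elif last.get("exitCode", 0) != 0:
                hints.append(
                    f"{name}: exited with code {last['exitCode']}, "
                    "read `kubectl logs --previous`"
                )
            else:
                hints.append(f"{name}: exits cleanly, is the command a long-running process?")
    return hints


# Hypothetical sample: a container that was killed for exceeding its memory limit.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "api",
                "restartCount": 7,
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            }
        ]
    }
}
print(triage_pod(sample_pod))
```

The script reads lastState rather than state because a container in CrashLoopBackOff is waiting to restart; the evidence of the crash lives in the previous termination record.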

Terraform and IaC questions

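'How do you manage Terraform state in a team environment?' — Remote backend with locking (S3, with DynamoDB or, in recent Terraform versions, S3-native lock files), state isolation per environment (workspaces or separate state files), state encryption at rest, restricted access to the state bucket, import existing resources before modifying.
'What is Terraform drift and how do you detect it?' — Drift occurs when real infrastructure diverges from what the Terraform state and configuration describe, usually through manual changes. Detect with a scheduled terraform plan in CI/CD (-detailed-exitcode returns 2 when changes are pending), or tools like Spacelift/env0 for continuous drift detection. Fix by reconciling: revert the manual change with apply, adopt it in code, import unmanaged resources, or force recreation with terraform apply -replace (which supersedes the deprecated terraform taint).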
'Explain Terraform modules and when NOT to use them.' — Modules encapsulate reusable infrastructure patterns. Do not use for simple, one-off resources (over-abstraction). Do not nest modules more than 2 levels deep. Keep modules focused on a single responsibility.
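'How do you handle sensitive values in Terraform?' — Mark variables and outputs as sensitive (this only redacts CLI output; values still land in state in plain text), store values in a secrets manager and read them at apply time (not in tfvars files), use SOPS for encrypted values in Git, encrypt and restrict access to the remote state, and never commit state files.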
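
The drift-detection answer above can be made concrete with a CI step that inspects a saved plan. This sketch assumes the plan was produced with terraform plan -out=tfplan and exported with terraform show -json tfplan; the changed_resources helper and the sample resource addresses are hypothetical.

```python
import json


def changed_resources(plan_json):
    """Map resource address -> planned actions, skipping no-ops and data reads."""
    plan = json.loads(plan_json)
    changes = {}
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions not in (["no-op"], ["read"]):
            changes[rc["address"]] = actions
    return changes


# Hypothetical excerpt of `terraform show -json tfplan` output.
sample_plan = json.dumps(
    {
        "resource_changes": [
            {"address": "aws_s3_bucket.logs", "change": {"actions": ["no-op"]}},
            {"address": "aws_security_group.web", "change": {"actions": ["update"]}},
            {"address": "aws_instance.app", "change": {"actions": ["delete", "create"]}},
        ]
    }
)

drift = changed_resources(sample_plan)
for address, actions in drift.items():
    print(f"{address}: {'/'.join(actions)}")
```

A scheduled job could fail or open a ticket whenever the returned dict is non-empty. A pending change is not always drift (it can also be an unapplied code change), so the check is most meaningful when run against the branch that was last applied.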

System design and whiteboard

'Design the infrastructure for a real-time analytics platform processing 1M events/second.' — Kinesis or Kafka for ingestion, Flink for stream processing, DynamoDB or Redis for real-time queries, S3 + Athena for historical analysis. Discuss partitioning strategy, exactly-once semantics, and cost at scale.
'Design a global CDN-backed web application with sub-100ms latency.' — CloudFront with edge caching, origin shield, Lambda@Edge for personalization, multi-region backends with Global Accelerator, Route 53 latency-based routing.
'A startup needs to go from 0 to production in 2 weeks. Design the infrastructure.' — Start with managed services: ECS Fargate or App Runner, RDS, S3, CloudFront. IaC from day one with Terraform. CI/CD with GitHub Actions. Monitoring with CloudWatch. Do not over-engineer: no Kubernetes, no multi-region, no custom service mesh.
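
For the 1M events/second question above, interviewers usually expect the partitioning discussion to include rough numbers. The following is a back-of-the-envelope sketch, assuming Kinesis Data Streams provisioned mode with per-shard write limits of 1,000 records/second and 1 MB/second; treat those defaults, the event sizes, and the min_shards helper as assumptions to verify against current service quotas.

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding float rounding."""
    return -(-a // b)


def min_shards(events_per_sec, avg_event_bytes,
               records_per_shard=1_000, bytes_per_shard=1_000_000):
    """Lower bound on shards: whichever write limit is hit first decides."""
    by_count = ceil_div(events_per_sec, records_per_shard)
    by_bytes = ceil_div(events_per_sec * avg_event_bytes, bytes_per_shard)
    return max(by_count, by_bytes)


# Hypothetical event sizes; small events are bound by record count,
# large events by throughput.
small = min_shards(1_000_000, avg_event_bytes=500)
large = min_shards(1_000_000, avg_event_bytes=2_000)
print(small, large)
```

This is a floor, not a recommendation: a skewed partition key creates hot shards well before the average is reached, and producer-side aggregation of several events into one record changes the record-count bound. The same arithmetic applies to Kafka partitions with different per-partition limits.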

AI infrastructure questions (new for 2026)

'Design GPU infrastructure for serving an LLM with 100 requests/second.' — GPU instance selection (A100 vs H100), model parallelism strategy, batching with vLLM or TensorRT-LLM, auto-scaling based on queue depth, cost optimization with Spot GPU instances for batch workloads.
'How do you manage costs for AI/ML workloads in the cloud?' — Spot instances for training, reserved capacity for inference, right-sizing GPU instances, model quantization to use smaller GPUs, FinOps dashboards for per-model cost attribution.
'Design a RAG system architecture on AWS.' — S3 for document storage, OpenSearch or Pinecone for vector search, Lambda or ECS for embedding pipeline, API Gateway for serving, SageMaker for model hosting, CloudWatch for monitoring retrieval quality.
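
For the LLM serving question above, a first-pass replica count can be estimated with Little's law (requests in flight = arrival rate × average latency). This is a sizing sketch only; the 2-second latency, the batch capacity of 32 concurrent sequences per replica, the 70% utilization target, and the replicas_needed helper are all assumptions that must be replaced with measured values for a given model and GPU.

```python
import math


def replicas_needed(req_per_sec, avg_latency_sec, max_batch_per_replica, headroom=0.7):
    """Estimate replicas from Little's law, keeping utilization below `headroom`."""
    in_flight = req_per_sec * avg_latency_sec          # average concurrent requests
    usable = max_batch_per_replica * headroom          # concurrency we plan to use
    return math.ceil(in_flight / usable)


# Hypothetical inputs: 100 req/s from the question, the rest are guesses.
n = replicas_needed(req_per_sec=100, avg_latency_sec=2.0, max_batch_per_replica=32)
print(n)
```

The estimate explains why queue depth is a better auto-scaling signal than GPU utilization: once in-flight requests exceed the usable batch capacity, requests queue and latency grows, which in turn raises the in-flight count further.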

Behavioral questions

'Tell me about a time an architecture decision you made caused a production outage.' — Show you take ownership, describe root cause analysis, and explain what architectural guardrails you added to prevent recurrence.
'How do you handle a situation where a developer wants to bypass security controls for speed?' — Show you balance security and velocity. Propose alternatives that maintain security while reducing friction (automated compliance checks, pre-approved patterns).
'Describe how you evaluate build vs buy decisions.' — Framework: team capacity, maintenance burden, differentiation value, vendor lock-in risk, total cost of ownership over 3 years, security implications. Show you consider both technical and business factors.

