AI/ML Engineer Interview Questions — 35 Real Questions & Answers (2026) | TechCerted

AI/ML engineer interviews in 2026 follow a 4-6 round structure: recruiter screen, coding (LeetCode medium-hard), ML system design, ML depth/breadth technical, and behavioral. Google runs 5 rounds, Meta runs 4 virtual onsite rounds (2 coding + 1 system design + 1 behavioral), and Amazon emphasizes Leadership Principles alongside LeetCode easy-medium. New for 2026: LLM and GenAI questions now appear in nearly every ML interview loop.

ML fundamentals (asked everywhere)

'Explain the bias-variance tradeoff and how you would diagnose which one is the problem.' — High bias: underfitting (training and validation error both high). High variance: overfitting (low training error, high validation error). Fix bias with more complex models; fix variance with regularization or more data.
'Walk me through gradient descent. What are the differences between batch, mini-batch, and stochastic?' — Batch uses all data per step (stable but slow), stochastic uses one sample per step (noisy but fast), and mini-batch is the practical middle ground. Mention learning rate scheduling and the Adam optimizer.
'How do you handle class imbalance in a classification problem?' — Resampling (SMOTE, undersampling), class weights, threshold tuning, ensemble methods, or focal loss. The right approach depends on the severity and the cost matrix of false positives vs false negatives.
'Explain L1 vs L2 regularization. When would you use each?' — L1 (Lasso) can drive coefficients exactly to zero, which acts as feature selection. L2 (Ridge) shrinks all coefficients toward zero, penalizing large ones most, but rarely makes any exactly zero. Use L1 when you suspect many irrelevant features; L2 when most features contribute.
'What is the difference between precision and recall? When does each matter more?' — Precision = of predicted positives, how many are correct. Recall = of actual positives, how many did you find. Spam filter: optimize precision. Cancer screening: optimize recall.
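The precision and recall definitions above, and the threshold tuning mentioned for class imbalance, can be illustrated with a minimal sketch. The labels and scores are hypothetical toy data, and the function name precision_recall is an illustrative choice, not a library API.

```python
def precision_recall(labels, scores, threshold):
    """Return (precision, recall) when predicting positive for score >= threshold."""
    tp = fp = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1  # correctly flagged positive
        elif predicted_positive and y == 0:
            fp += 1  # false alarm
        elif not predicted_positive and y == 1:
            fn += 1  # missed positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: 1 = positive class, scores are model probabilities.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.95, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]

for threshold in (0.3, 0.5, 0.7):
    p, r = precision_recall(labels, scores, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, raising the threshold from 0.3 to 0.7 moves precision from 0.67 to 1.00 while recall falls from 1.00 to 0.50. That is the tradeoff behind choosing precision for a spam filter and recall for cancer screening.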

ML system design (FAANG favorite)

'Design a product recommendation system for an e-commerce platform.' (Meta/Amazon) — Candidate retrieval with embeddings, ranking with a learned model, re-ranking for diversity/business rules. Discuss offline vs online evaluation, A/B testing framework, and cold-start problem.
'Design a content moderation system at scale.' (Google/Meta) — Multi-modal pipeline: text classifier, image classifier, video frame sampling. Discuss precision-recall tradeoffs for different violation types, human-in-the-loop for edge cases, and appeal workflow.
'Design a fraud detection system for a payments platform.' (Amazon/Stripe) — Real-time feature computation, ensemble model with rule-based overrides, feedback loop from manual reviews. Discuss latency constraints, feature stores, and concept drift monitoring.
'Design a search autocomplete system.' (Google/Amazon) — Trie-based candidate generation, personalized ranking model, caching layer. Discuss how to handle trending queries, typo correction, and offensive content filtering.
'Design a video recommendation feed like TikTok.' — Two-tower retrieval model, contextual bandit for exploration, session-based features. Discuss engagement vs quality tradeoffs, filter bubbles, and real-time inference.
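For the search autocomplete question above, the trie-based candidate generation step can be sketched as follows. This is a simplified illustration under stated assumptions: the class names are hypothetical, and a raw query count stands in for the personalized ranking model. Caching, typo correction, and content filtering are left out.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.count = 0      # how often a query ending at this node was seen


class Autocomplete:
    def __init__(self):
        self.root = TrieNode()

    def add(self, query, count=1):
        """Record a query and how many times it was observed."""
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.count += count

    def suggest(self, prefix, k=3):
        """Return up to k completions of prefix, most frequent first."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # no stored query starts with this prefix
        # Depth-first walk of the subtree below the prefix to collect candidates.
        found = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.count > 0:
                found.append((text, current.count))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        # Rank by count (descending), breaking ties alphabetically.
        found.sort(key=lambda item: (-item[1], item[0]))
        return [text for text, _ in found[:k]]


ac = Autocomplete()
ac.add("machine learning", 50)
ac.add("machine translation", 20)
ac.add("mac mini", 35)
ac.add("matrix", 5)
print(ac.suggest("mac", k=2))
```

Walking the whole subtree on every keystroke would be too slow at scale. A common refinement, worth raising in the interview, is to precompute and store the top-k completions at each node.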

LLM and GenAI questions (new for 2026)

'When would you use RAG vs fine-tuning? Design a pipeline for each.' — RAG for dynamic knowledge, fine-tuning for behavior/style. RAG pipeline: chunk documents, embed with a model, store in vector DB, retrieve top-k, inject into prompt. Fine-tuning: curate dataset, choose base model, train with LoRA/QLoRA, evaluate on held-out set.
'How do you evaluate LLM output quality at scale?' — Automated metrics (RAGAS for RAG, ROUGE for summarization, BLEU for translation), LLM-as-judge with calibrated rubrics, and human evaluation of sampled outputs. Discuss the tradeoffs of each and when you need all three.
'How do you defend against prompt injection?' — Input sanitization, system prompt isolation, output filtering, canary tokens, instruction hierarchy. Discuss the fundamental tension: LLMs are instruction-following by design.
'Design an AI agent system with tool use.' — Router model decides which tools to call, tool execution sandbox, result validation, retry logic, memory/context management. Discuss error handling and cost budgets.
'How do you optimize LLM inference cost and latency?' — Model distillation, quantization (INT8/INT4), KV cache optimization, batching strategies, model routing (small model for easy queries, large for hard). Discuss the cost-quality Pareto frontier.
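The RAG pipeline steps listed in the first question of this section (chunk documents, embed, store, retrieve top-k, inject into prompt) can be sketched end to end. Everything here is an assumption made for illustration: the embed function is a toy bag-of-words stand-in for a real embedding model, VectorStore is an in-memory list rather than a vector database, and the prompt template is hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=12):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """In-memory stand-in for a vector database (brute-force search)."""

    def __init__(self):
        self.items = []  # list of (chunk_text, embedding)

    def add(self, text):
        self.items.append((text, embed(text)))

    def top_k(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question, store, k=2):
    """Inject the retrieved chunks into a (hypothetical) prompt template."""
    context = "\n".join(f"- {c}" for c in store.top_k(question, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


# Hypothetical documents for illustration.
docs = [
    "Refunds are processed within five business days of the return being received.",
    "Standard shipping takes three to seven days depending on the destination region.",
    "Gift cards never expire and cannot be exchanged for cash.",
]
store = VectorStore()
for doc in docs:
    for piece in chunk(doc):
        store.add(piece)

print(build_prompt("How long do refunds take?", store, k=1))
```

In an interview answer, each stand-in is a place to discuss real choices: chunk size and overlap, the embedding model, approximate nearest-neighbor search, and how many chunks fit in the context window.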

Coding questions

'Implement K-Means clustering from scratch.' (Google) — Initialize centroids, assign points to nearest centroid, recompute centroids, repeat until convergence. Discuss initialization strategies (K-Means++) and convergence criteria.
'Implement logistic regression with gradient descent.' (Amazon/Meta) — Sigmoid function, binary cross-entropy loss, gradient computation, weight update loop. Discuss learning rate selection and feature scaling.
'Compute AUC-ROC from scratch given predictions and labels.' (Uber) — Sort by predicted probability, sweep threshold, compute TPR and FPR at each point, integrate. Discuss interpretation and when AUC is misleading.
'Build rolling window features for time series.' (Amazon/Spotify) — Sliding window aggregations (mean, std, min, max) with proper handling of window edges. Discuss leakage prevention.
'Implement a simple neural network with backpropagation in NumPy.' (Google/Apple) — Forward pass, loss computation, backward pass with chain rule, weight updates. Discuss vanishing gradients and activation function choices.
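For the AUC-ROC question above, the described steps (sort by predicted probability, sweep the threshold, compute TPR and FPR at each point, integrate) can be sketched in plain Python. This is one possible approach rather than a reference solution. The function name roc_auc is an illustrative choice, and integrating with the trapezoidal rule while grouping tied scores is one reasonable convention.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via a threshold sweep and the trapezoidal rule."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("AUC is undefined when only one class is present")

    # Sort examples by predicted score, highest first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    area = 0.0
    i = 0
    while i < len(ranked):
        # Lower the threshold past every example sharing this score, so tied
        # scores produce one diagonal segment instead of an arbitrary staircase.
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr = tp / positives
        fpr = fp / negatives
        # Trapezoid between the previous ROC point and the current one.
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_tpr, prev_fpr = tpr, fpr
    return area


# Hypothetical toy data.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The toy call prints 0.75: of the four positive-negative pairs, three are ranked correctly. This pairwise reading (the probability that a random positive outscores a random negative) is a useful interpretation to give alongside the code.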

Behavioral questions for ML roles

'Tell me about a model you deployed that failed in production.' — They want to hear about monitoring, debugging methodology, and what you changed. Not having a failure story is a red flag.
'How do you handle the tradeoff between model accuracy and inference latency?' — Show you think about business constraints, not just model performance. Discuss distillation, quantization, and acceptable accuracy loss.
'Describe a time you disagreed with a stakeholder about a data-driven decision.' — Show you can defend your analysis while remaining open to new information.
'How do you stay current with ML research?' — Mention specific papers, conferences (NeurIPS, ICML), or newsletters. Generic answers like 'I read blogs' are insufficient.
'What is your biggest technical regret?' — Shows self-awareness. Discuss what you would do differently and why.

Common mistakes

Production blindness — discussing models without mentioning deployment, monitoring, or maintenance.
Shallow tool-centric answers — saying 'I use XGBoost' without explaining WHY it fits the problem.
Ignoring data quality — jumping to model selection without discussing data cleaning, labeling, and validation.
Metric selection as afterthought — choosing accuracy for imbalanced datasets is an instant credibility hit.
Not asking clarifying questions in system design — the ambiguity is intentional. Ask about scale, latency requirements, and success metrics before designing.
