Principal Data Scientist – Conversational AI | USA

Filled
February 24, 2026

Job Description

Walmart, a global retail and technology leader, is seeking a Principal Data Scientist to join its Next Gen Commerce team in Sunnyvale, California. This high-impact leadership role focuses on building the future of conversational shopping through intelligent AI agents that reason, recommend, and proactively assist customers. You will serve as the technical authority for defining, measuring, and improving AI quality using advanced evaluation frameworks, LLM-as-a-judge systems, and automated pipelines.

The ideal candidate will bring deep expertise in Generative AI, large language models, and evaluation methodologies, along with strong hands-on technical leadership. You will collaborate closely with engineering and product teams to translate subjective quality objectives into measurable metrics that drive continuous model improvement and safe deployment at scale.

Key Responsibilities

Design and implement advanced evaluation architectures for conversational AI systems using hybrid scoring and LLM-as-a-judge frameworks

Develop high-precision prompts for evaluator models and calibrate them against human benchmarks to ensure reliability

Lead model distillation and optimization efforts to create scalable and cost-efficient evaluation models

Curate large-scale datasets and “Golden Set” benchmarks from conversational logs to standardize evaluation processes

Integrate quality metrics into CI/CD pipelines for automated regression testing and production monitoring

Conduct deep failure analysis on AI agents, including hallucinations, safety risks, and tool misuse

Leverage evaluation insights to influence modeling teams and prioritize system improvements

Mentor senior data scientists and establish best practices for AI evaluation across the organization

Contribute thought leadership through research publications, patents, or conference presentations

Required Qualifications

Advanced degree (Master’s or PhD) in Computer Science, Statistics, Mathematics, Computational Linguistics, or a related field

7+ years of experience in data science or machine learning with a focus on NLP, deep learning, or AI evaluation

Strong expertise in Large Language Models, prompt engineering, and instruction tuning

Proficiency in Python and core ML libraries such as PyTorch, NumPy, Pandas, and Scikit-learn

Experience designing evaluation metrics for non-deterministic AI outputs such as summarization or conversational responses

Knowledge of scalable data pipelines and distributed ML systems

Preferred Qualifications

PhD in Machine Learning, NLP, or a related quantitative discipline

Experience with conversational AI, retrieval-augmented generation (RAG), or recommendation systems in e-commerce environments

Knowledge of model distillation, LoRA, parameter-efficient tuning, or instruction optimization techniques

Publications, patents, or open-source contributions in AI or LLM evaluation

Familiarity with subjective evaluation frameworks for open-ended AI outputs

Compensation & Benefits

The position offers a competitive annual salary ranging from $143,000 to $286,000, along with performance bonuses, stock opportunities, and a comprehensive benefits package. Benefits include medical, dental, and vision coverage, retirement plans, paid time off, parental leave, disability coverage, employee discounts, and education assistance programs.