AI-Native Data Engineer

The AI-Native Data Engineer is a new breed of data engineer: one who builds data infrastructure specifically designed to power AI systems, not just analytics dashboards. Where a traditional data engineer builds pipelines for BI reporting, the AI-Native Data Engineer builds the feature stores, embedding pipelines, RAG data layers and model training infrastructure that AI products depend on.

Role & Responsibilities:
• Design and build data pipelines that feed AI systems: document ingestion pipelines, embedding generation pipelines, feature engineering for ML models and real-time data streams for agentic AI
• Build and maintain vector databases and semantic search infrastructure using Azure AI Search, Pinecone or pgvector
• Implement RAG data infrastructure on Databricks: chunking strategies, metadata enrichment, document processing pipelines using Delta Lake as the foundation
• Build feature stores for ML models using Databricks Feature Store, designing reusable features that data scientists can consume
• Develop data quality frameworks specifically for AI: ensuring data fed to LLMs is accurate, current and appropriately formatted
• Implement data observability for AI pipelines: monitoring embedding drift, detecting data quality degradation that affects model performance
• Build evaluation datasets and test pipelines for AI systems: generating ground truth data, running automated evals and tracking model performance
• Collaborate with AI engineers on context window optimisation: understanding token limits, chunking trade-offs and retrieval quality metrics

Required Skills & Experience:
• 4+ years of data engineering experience with at least 1 year focused on AI/ML data infrastructure
• Deep Databricks expertise: Delta Lake, Workflows, Feature Store, MLflow (used in production)
• Strong Python skills with experience building data pipelines for LLM applications
• Experience with embedding models and vector search: you have built and optimised a RAG pipeline end to end
• Azure data platform knowledge: ADLS Gen2, Azure AI Search, Azure Functions, Event Hub
• Understanding of LLM context management: token counting, chunking strategies, prompt construction from retrieved data
• Databricks Certified Data Engineer Associate or Professional preferred

What We Offer:
• Work at the frontier of AI data infrastructure
• Salary ₹40–60L based on experience
• Remote-first with flexible working
• Access to cutting-edge AI tooling and Databricks preview features

The AI-Native Data Engineer builds data pipelines that feed LLMs rather than dashboards. If you understand why RAG data quality is an engineering problem, not a model problem, this role was built for people like you.

Remote · India / UK / US | ₹40–60L

Databricks
LLM Pipelines
Azure
Python
Delta Lake