Farrukh Nauman

AI Engineering & Agentic Systems · Production Data Platforms · Time Series ML

I design and ship AI-agent systems — Text-to-SQL, coding agents, evaluation harnesses, tool contracts — wired to production data platforms like Snowflake, Databricks, and SQL databases.

AI engineer and principal consultant. I build agentic systems that work against real schemas and real data: enforced retrieval, deterministic guardrails, result-level evaluation, and the test discipline that makes agents trustworthy in production.

~1 week: Agent-driven migration validator, from zero to live dashboard

50%+ compute savings: SQL pipeline redesigned to incremental by an AI-agent workflow; 5–10x faster, with row-level correctness proof

10–100x faster: Time series classification for industrial telemetry and edge deployment

~11M SEK: Led a Vinnova/EU AI portfolio; interim lead of a DS & analytics team

What I help with

AI Engineering & Agentic Systems

The problem: You want to use AI agents for real engineering work — wired into your data platform, your CLIs, and your real schemas — not another chat demo that falls over on production data. You need someone who has shipped agentic systems with the test discipline to make them reliable.

What I do: Design and build agentic systems end to end: coding agents wired to Databricks/Snowflake CLIs and APIs, Text-to-SQL over real schemas, tool and API contracts, evals, run manifests, and sandboxed execution. I write the harness and the operating constraints that keep agents trustworthy in production.

Example outcomes: Built a full agent-driven validation workflow — discovery, per-pipeline scripts, unified validator, historical storage, and live dashboards. Shipped Text-to-SQL against a production ERP with result-level evaluation. Open-sourced ts-agents, a framework with strict JSON tool contracts and sandboxed execution.

Migration Assurance

The problem: You are migrating between data platforms and nobody can answer “is the target data correct?” with confidence. Spot-checks and row counts are not enough.

What I do: Build agent-driven cross-platform validation — automated comparison of schemas, row counts, key distributions, and date ranges across dozens of tables. The coding agent discovers pipelines, writes per-table validators, and deploys dashboards the team can trust.

Example outcomes: Validated dozens of tables across Databricks and Snowflake in under a week using a coding agent. Surfaced a row-count discrepancy of more than 2x on the first automated run.

Warehouse Cost & Performance

The problem: Your warehouse bill is growing faster than your data. Pipelines run on brute-force full scans. Nobody has profiled the actual bottlenecks or tested whether an incremental approach is both faster and correct.

What I do: Use AI-agent workflows to profile pipelines, benchmark alternatives at multiple data scales and warehouse sizes, and validate correctness with row-level comparisons — not just aggregate checksums. The agent parallelizes experiments that would take a human weeks.

Example deliverables: Agent-driven profiling reports, benchmark harnesses across warehouse sizes, correctness-validation checks, and rollout plans for incremental pipeline changes.
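To illustrate why row-level comparison matters more than an aggregate checksum, here is a minimal sketch in Python. The function names (`aggregate_matches`, `row_level_diff`) and the sample rows are hypothetical, chosen only for illustration; this is not the tooling used in the engagements described above, and a real check would run inside the warehouse rather than in memory.

```python
from collections import Counter


def aggregate_matches(source, target, column):
    """Aggregate check: compare only the column totals (hypothetical helper)."""
    return sum(row[column] for row in source) == sum(row[column] for row in target)


def row_level_diff(source, target):
    """Row-level check: compare full rows as multisets, ignoring row order."""
    src = Counter(tuple(sorted(row.items())) for row in source)
    tgt = Counter(tuple(sorted(row.items())) for row in target)
    return {
        "missing_in_target": sorted((src - tgt).elements()),
        "unexpected_in_target": sorted((tgt - src).elements()),
    }


# Illustrative data: the amounts are swapped between the two ids,
# so the column total is unchanged while every row is wrong.
source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
target = [{"id": 1, "amount": 20}, {"id": 2, "amount": 10}]

print("aggregate check passes:", aggregate_matches(source, target, "amount"))
print("row-level differences:", row_level_diff(source, target))
```

In this example the aggregate check passes even though no target row matches its source row, which is the kind of error a row-level comparison is meant to catch.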

AI Enablement & Change

The problem: Your teams are experimenting with AI but are stuck between interesting demos and production. Nobody owns the path from “we should use this” to “it is in daily use,” and stakeholders across business, IT, and data are not aligned.

What I do: Run AI opportunity discovery, prioritize high-friction workflows, align business/IT/data stakeholders, lead PoCs through to production, and build internal capability — with governance and ROI framing, not hype.

Example outcomes: Interim lead of a 4–7 person DS & analytics team owning roadmap and stakeholders. Project lead across an ~11M SEK Vinnova/EU AI portfolio. Established an AI mentorship program that built durable in-house capability.

Time Series / Telemetry ML

The problem: You have sensor, telemetry, or time series data and need production ML — activity recognition, anomaly detection, forecasting, or classification — but your team’s ML experience is limited or focused elsewhere.

What I do: Design and deliver time series ML systems from evaluation harness to production deployment. I specialize in approaches that are fast enough for edge and IoT (ROCKET family, lightweight models) while maintaining accuracy.

Example outcomes: Delivered a time series classification system with state-of-the-art accuracy and 10-100x faster inference than deep learning baselines. Established in-house activity recognition from sensor data.

Featured work

Text-to-SQL Against a Real ERP

Built natural-language querying against a wide production ERP — enforced retrieval, deterministic guardrails, value-aware evaluation with Wilson confidence bounds, and a streaming full-stack TypeScript/React app. Open-sourced a synthetic rebuild preserving the real traps.

Read the essay
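For readers unfamiliar with the Wilson confidence bounds mentioned in the Text-to-SQL work, here is a minimal sketch of the standard Wilson score interval for a pass rate. The function name `wilson_bounds`, the default `z = 1.96` (roughly a 95% interval), and the example counts are assumptions for illustration; this is not the project's evaluation code.

```python
import math


def wilson_bounds(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (hypothetical helper)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    centre = (p + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Illustrative numbers only: 45 correct answers out of 50 evaluated questions.
lower, upper = wilson_bounds(45, 50)
print(f"observed pass rate 0.90, interval [{lower:.3f}, {upper:.3f}]")
```

Unlike the simpler normal-approximation interval, the Wilson interval stays within [0, 1] and behaves sensibly for small evaluation sets, which makes the lower bound a more conservative summary of accuracy than the raw pass rate.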

SQL Pipeline Optimization

Redesigned a years-old daily full-recompute pipeline into a validated incremental one: 5–10x faster, with 50%+ compute savings and a row-level correctness proof. The fastest approach tested produced incorrect results; multi-scale benchmarks and parallel AI-agent workflows found the design that was both fast and correct.

Read the case study

Cross-Platform Migration Validation

Built a production-grade validation system for a large Databricks-to-Snowflake migration: automated comparison across dozens of tables, historical result storage, and a live team dashboard — from zero to deployed in about a week using a coding agent.

Read the case study

Time Series Classification for Production

Delivered a time series classification system for industrial telemetry: state-of-the-art accuracy with 10-100x faster inference than deep learning, designed for edge and IoT deployment.

Read more

Migration Testing: What Counts as Evidence

A practitioner’s reference for migration test suites: five layers (unit, parity, manifest integrity, end-to-end, performance), why every reference artifact carries a cryptographic hash, and how to sequence the suite so the oracle leads the implementation. The follow-up to “Don’t Port the Syntax. Port the Evidence.”

Read the essay
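The idea that every reference artifact carries a cryptographic hash can be sketched as a small manifest check. This is an illustration under stated assumptions: SHA-256 is assumed as the hash, and the helper names (`build_manifest`, `verify_manifest`) and artifact names are hypothetical rather than taken from the essay.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest of the artifact bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(artifacts: dict[str, bytes]) -> dict[str, str]:
    """Record one hash per reference artifact, keyed by name."""
    return {name: sha256_hex(content) for name, content in sorted(artifacts.items())}


def verify_manifest(artifacts: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the artifacts match the manifest."""
    problems = []
    for name, expected in manifest.items():
        if name not in artifacts:
            problems.append(f"missing: {name}")
        elif sha256_hex(artifacts[name]) != expected:
            problems.append(f"hash mismatch: {name}")
    for name in artifacts:
        if name not in manifest:
            problems.append(f"unlisted: {name}")
    return problems


# Hypothetical reference outputs captured from the legacy system.
reference = {"orders_expected.csv": b"id,amount\n1,10\n2,20\n"}
manifest = build_manifest(reference)

# A silently edited reference file is caught before any parity test runs.
tampered = {"orders_expected.csv": b"id,amount\n1,10\n2,21\n"}
print(verify_manifest(reference, manifest))
print(verify_manifest(tampered, manifest))
```

Checking the manifest first means a parity failure can be attributed to the implementation under test, not to a reference file that changed without anyone noticing.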

Who I work best with

Teams building AI agents against real data — Text-to-SQL, coding agents, RAG over schemas — who need someone who has shipped agentic systems with evaluation discipline, not just prompts.

Data platform & engineering leaders who want AI-agent workflows for migration validation, pipeline optimization, or analytics automation on Snowflake/Databricks/SQL databases.

Leaders driving AI adoption who are stuck between demos and production and need someone to run discovery, align stakeholders, and ship real agentic workflows.

Teams with telemetry or time series data that need production ML — activity recognition, anomaly detection, or forecasting — not a research paper.

More writing on the blog →

Need AI-agent systems that hold up in production?

Let’s talk about the agentic system, data platform, or evaluation problem you’re solving.

Copyright 2026, Farrukh Nauman