Farrukh Nauman
Enterprise Data Platforms, Coding Agents, and Time Series ML
I help enterprise data teams de-risk migrations, cut warehouse cost, and automate analytics workflows with coding agents.
Principal consultant with deep experience across enterprise data platforms, time series / telemetry ML, and hands-on AI delivery in messy real-world environments.
What I help with
Migration Assurance
The problem: You are migrating between data platforms and nobody can answer “is the target data correct?” with confidence. Spot-checks and row counts are not enough.
What I do: Build systematic cross-platform validation — automated comparison of schemas, row counts, key distributions, and date ranges across dozens of tables. I design the framework, deploy dashboards the team can trust, and surface real defects early.
Example outcomes: Validated dozens of tables across Databricks and Snowflake in under a week. Surfaced a row-count discrepancy of more than 2x on the first automated run.
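As a rough illustration of what one automated check in such a framework might look like, the sketch below compares a per-table summary taken from each platform. Everything in it is hypothetical: the `TableProfile` fields, the `compare_profiles` helper, and the sample numbers are invented for illustration, and in practice the profiles would be filled from queries against each platform rather than typed in by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableProfile:
    """Summary of one table on one platform (hypothetical structure)."""
    columns: dict   # column name -> normalized type name
    row_count: int
    min_date: str   # ISO-8601 dates, so string comparison is safe
    max_date: str


def compare_profiles(source: TableProfile, target: TableProfile) -> list:
    """Return human-readable findings; an empty list means the profiles agree."""
    findings = []

    # Schema: columns present on one side only.
    missing = sorted(set(source.columns) - set(target.columns))
    extra = sorted(set(target.columns) - set(source.columns))
    if missing:
        findings.append(f"columns missing in target: {missing}")
    if extra:
        findings.append(f"unexpected columns in target: {extra}")

    # Schema: shared columns whose normalized types disagree.
    for name in sorted(set(source.columns) & set(target.columns)):
        if source.columns[name] != target.columns[name]:
            findings.append(
                f"type mismatch on {name}: "
                f"{source.columns[name]} vs {target.columns[name]}"
            )

    # Row counts and date coverage.
    if source.row_count != target.row_count:
        findings.append(
            f"row count differs: {source.row_count} vs {target.row_count}"
        )
    if (source.min_date, source.max_date) != (target.min_date, target.max_date):
        findings.append(
            f"date range differs: {source.min_date}..{source.max_date} "
            f"vs {target.min_date}..{target.max_date}"
        )
    return findings


# Made-up sample data, not taken from any real engagement.
source = TableProfile(
    columns={"order_id": "int", "amount": "decimal", "order_date": "date"},
    row_count=1000,
    min_date="2023-01-01",
    max_date="2023-12-31",
)
target = TableProfile(
    columns={"order_id": "int", "amount": "float", "order_date": "date"},
    row_count=2100,
    min_date="2023-01-01",
    max_date="2023-12-31",
)

findings = compare_profiles(source, target)
for finding in findings:
    print(finding)
```

Running the same comparison over every table and storing the findings per run is what makes the results trackable over time; the sketch covers only the comparison step.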
Warehouse Cost & Performance
The problem: Your warehouse bill is growing faster than your data. Pipelines run on brute-force full scans. Nobody has profiled the actual bottlenecks or tested whether an incremental approach is both faster and correct.
What I do: Profile pipeline execution step by step, benchmark alternatives at multiple data scales and warehouse sizes, and validate correctness with row-level comparisons — not just aggregate checksums.
Example deliverables: Profiling reports, benchmark harnesses across warehouse sizes, correctness-validation checks, and rollout plans for incremental pipeline changes.
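To see why aggregate checksums alone can mislead, consider the minimal sketch below. The two result sets, the `checksum` helper, and the `row_level_diff` helper are all hypothetical and exist only to show the idea: a row count and a column sum can match exactly while individual rows differ.

```python
from collections import Counter


def checksum(rows):
    """Aggregate check: row count plus the sum of the amount column."""
    return (len(rows), sum(amount for _, amount in rows))


def row_level_diff(expected, actual):
    """Multiset comparison: rows only in `expected` and rows only in `actual`."""
    exp, act = Counter(expected), Counter(actual)
    return {
        "missing": sorted((exp - act).elements()),
        "unexpected": sorted((act - exp).elements()),
    }


# Made-up outputs of a full recompute and a candidate incremental pipeline.
full_recompute = [("a", 10), ("b", 20), ("c", 30)]
incremental = [("a", 10), ("b", 25), ("c", 25)]

# The aggregate check passes: same row count, same total.
print(checksum(full_recompute) == checksum(incremental))

# The row-level check shows that two rows are wrong.
diff = row_level_diff(full_recompute, incremental)
print(diff)
```

Using a multiset (`Counter`) rather than a set means duplicated rows are caught as well, which matters when an incremental merge accidentally double-inserts records.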
Analytics Automation with Coding Agents
The problem: Your analytics team spends too much time on manual notebook runs, repetitive SQL, and copy-paste workflows. You have heard about coding agents but need someone who has shipped real work with them, not just demos.
What I do: Design and implement coding-agent workflows for migration, validation, and pipeline modernization. I write the harness, the test discipline, and the operating constraints that make agents reliable in production.
Example outcomes: Used a coding agent to build a full validation workflow — discovery, per-pipeline scripts, unified validator, historical storage, and live dashboards — across seven working sessions.
Time Series / Telemetry ML
The problem: You have sensor, telemetry, or time series data and need production ML — activity recognition, anomaly detection, forecasting, or classification — but your team’s ML experience is limited or focused elsewhere.
What I do: Design and deliver time series ML systems from evaluation harness to production deployment. I specialize in approaches that are fast enough for edge and IoT (ROCKET family, lightweight models) while maintaining accuracy.
Example outcomes: Delivered a time series classification system with state-of-the-art accuracy and 10–100x faster inference than deep learning baselines. Established in-house activity recognition from CAN/telemetry data.
Featured work
SQL Pipeline Optimization
Redesigned a years-old daily full-recompute pipeline into a validated incremental one: 5–10x faster, with row-level proof of correctness. The fastest approach tested produced incorrect results; multi-scale benchmarks and parallel AI-agent workflows identified a design that was both fast and correct.
Cross-Platform Migration Validation
Built a production-grade validation system for a large Databricks-to-Snowflake migration: automated comparison across dozens of tables, historical result storage, and a live team dashboard — from zero to deployed in about a week using a coding agent.
Time Series Classification for Production
Delivered a time series classification system for industrial telemetry: state-of-the-art accuracy with 10–100x faster inference than deep learning baselines, designed for edge and IoT deployment.
Who I work best with
Data platform & engineering leaders migrating between warehouses who need systematic validation — not spot-checks and hope.
Technical sponsors & architects who need a senior practitioner to lead or unblock a workstream — someone who owns deliverables, not slide decks.
Teams with telemetry or time series data that need production ML — activity recognition, anomaly detection, or forecasting — not a research paper.
Working on a migration, cost blowup, or analytics automation problem?
Let’s talk.