Lessons from ts-agents: a CLI-first time-series automation toolkit
2026-03-01
CLI-first
Stable contract for tools
Composes with shell/CI
Artifacts are paths
Skills
SKILL.md runbooks
Domain priors + checklists
Works with many harnesses
Sandboxes
Isolate messy deps
Scale compute when needed
Save logs + outputs
Make quick-and-dirty analysis fast, repeatable, and hackable.
Add richer GUI interactions (artifact browser + job monitor + review UX), but keep the CLI as the stable spine.
CLI + skills + sandboxes + optional UI/agents
One command surface (ts-agents ...) for scripting, composing, automation.
Sandboxes: local, subprocess, docker, daytona, modal.
Design choice: artifacts over chat. Tools write plots/reports to disk; the agent returns paths + summaries.
Agents and UIs are optional front-ends.
┌──────────────────────────────────┐ ┌──────────────────────────────┐
│ ts-agents CLI (contract) │ │ Front-ends (swappable) │
│ workflow / tool / sandbox / │ │ │
│ skills / data / agent │ │ • Claude Code / Codex CLI │
└──────────┬───────────────────────┘ │ • Custom agents (simple + │
│ │ deep via LangChain) │
▼ │ • Gradio UI │
┌──────────────────────────────────┐ └──────────────────────────────┘
│ Tool registry + wrappers │ ▲
│ metadata: params + cost + │ │
│ timeouts ├───────────────┘
│ wrap: LangChain / deepagent │
└──────────┬───────────────────────┘
│
▼
┌──────────────────────────────────┐
│ Execution layer (sandboxes) │
│ local • subprocess • docker • │
│ daytona • modal │
└──────────┬───────────────────────┘
│
▼
┌──────────────────────────────────┐
│ Artifacts (outputs/…) │
│ plots • tables • JSON • │
│ QMD/PDF • logs │
└──────────────────────────────────┘
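The artifacts-over-chat design at the bottom of the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the repo's actual API: `run_tool` and its return payload are hypothetical.

```python
import json
from pathlib import Path

def run_tool(name: str, params: dict, out_dir: str = "outputs") -> dict:
    """Hypothetical sketch: a tool writes its artifact to disk, and the
    agent-facing layer returns only a path plus a short summary."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    artifact = out / f"{name}.json"
    # Stand-in for real tool work: persist the result as JSON on disk.
    artifact.write_text(json.dumps({"tool": name, "params": params}))
    # The agent never receives the full payload, only a pointer + summary.
    return {"artifact": str(artifact), "summary": f"{name} wrote {artifact.name}"}

result = run_tool("forecast_theta", {"horizon": 30})
print(result["artifact"])
```

The key property: any front-end (chat agent, UI, CI job) consumes the same path-plus-summary contract.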
Five common integration styles (and why they feel different).
| Approach | What the agent sees | Tradeoffs |
|---|---|---|
| Function tool-calls (LangChain, deepagents) | JSON-schema tools; direct function calls | Great framework UX; wrappers/parsers can be brittle |
| Single CLI contract (ts-agents) | One command surface + stable artifacts | Composable, debuggable, harness-friendly |
| Many small CLIs | Dozens of commands and pipes | Strong Unix composability; weaker discoverability |
| Service/API (HTTP/gRPC) | Network calls + JSON | Good for multi-user governance; infra/auth overhead |
| Notebook/interpreter | Inline Python/cells | Flexible exploration; weaker repeatability |
Key idea: pick a stable contract first (CLI or API), then add wrappers as convenience layers.
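One way such a convenience wrapper can look: the framework-facing "tool" is just a subprocess call against the stable CLI contract. This is a sketch with a stand-in `echo` command rather than the real `uv run ts-agents` invocation; `cli_tool` is a hypothetical helper, not the repo's wrapper.

```python
import json
import subprocess

def cli_tool(args: list[str]) -> str:
    """Thin wrapper: run the CLI as a subprocess and hand its stdout
    (text or JSON) back to the agent framework."""
    proc = subprocess.run(args, capture_output=True, text=True, check=True)
    return proc.stdout.strip()

# Stand-in for `uv run ts-agents tool run ...` so the sketch is runnable:
out = cli_tool(["echo", json.dumps({"status": "ok", "artifact": "outputs/theta.json"})])
payload = json.loads(out)
print(payload["artifact"])
```

Because the wrapper only shells out, swapping LangChain for another framework means rewriting ~10 lines, not the time-series code.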
Path A: CLI as the primary interface + wrappers for agent frameworks inside ts-agents.
Path B: SKILL.md runbooks for reliable execution from an external harness.
Why this matters: you can swap the "brain" (Claude/Codex/LangChain) without rewriting the time-series code.
Both paths exist in ts-agents; the point is to compare tradeoffs.
SKILL.md runbooks define workflows + outputs; the external harness drives ts-agents in a terminal.
Practical recommendation: start with Path B. Build Path A only when you need strict policies, custom UX, or deep integrations.
Plain-text runbooks are a high-leverage domain prior.
A skill is a cheap way to inject domain expertise: tool order, commands, artifacts, and done criteria.
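A hypothetical runbook might look like the following. The skill name, steps, and done criteria are illustrative, not copied from the repo; the command reuses flags shown later in this document.

```markdown
# Skill: baseline-forecast

## When to use
The user asks for a forecast and no method has been tried yet.

## Steps
1. Run the cheap baseline first:
   `uv run ts-agents tool run forecast_theta_with_data --param horizon=30 --save outputs/theta.json`
2. Escalate to expensive methods only if the baseline looks inadequate.

## Done criteria
- `outputs/theta.json` exists.
- A one-line summary with the artifact path is reported back.
```

Note how the runbook encodes exactly the four things a skill should carry: tool order, commands, artifacts, and done criteria.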
These are agent runtimes, not just chat UIs.
| Capability | Claude Code CLI | Codex CLI |
|---|---|---|
| Repo-aware editing | Multi-file in-terminal edits | Full-screen TUI + diffs |
| Command execution | Shell commands with logs | Command runs with transcript |
| Permissions/sandbox | Permission modes + sandbox | Approval modes (read-only/auto/full) |
| Long-running work | Background bash + subagents | Background mode + cloud execution |
| Session persistence | Resume/continue sessions | Persistent interactive sessions |
| Context management | Auto-compaction near limits | Long-session context management |
| Extensibility | Hooks + plugins + subagents | Skills + MCP + scripting |
| Review workflow | Git-oriented flows | Built-in review presets |
Implementing wait/resume/background/compaction/permissions/diffs in a custom harness is real engineering work.
Multi-tool + multi-artifact workflows need a workbench.
stdout is a universal UI (text/JSON); tools communicate via stdout/stderr and JSON.
Time-series stacks are fragmented; one env rarely covers everything (numpy/torch/protobuf, etc.).
Minutes-to-hours jobs need async, progress, and resumption.
Submit → Execute → Progress → Artifacts → Resume
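The Submit → Execute → Progress → Artifacts → Resume loop can be sketched as below. The checkpoint layout and file names are illustrative, not ts-agents internals: progress goes to stdout, state goes to a checkpoint file, and the final artifact is the interface back to the agent.

```python
import json
import time
from pathlib import Path

STATE = Path("outputs/job_state.json")

def submit(job_id: str, steps: int) -> None:
    # Resume from a prior checkpoint if one exists; otherwise start fresh.
    done = json.loads(STATE.read_text())["done"] if STATE.exists() else 0
    STATE.parent.mkdir(exist_ok=True)
    for step in range(done, steps):
        time.sleep(0.01)  # stand-in for real compute
        # Checkpoint after every step so a killed job can resume here.
        STATE.write_text(json.dumps({"job": job_id, "done": step + 1}))
        print(f"progress {step + 1}/{steps}")
    # Final artifact on disk is what the agent reports back.
    Path("outputs/result.json").write_text(json.dumps({"job": job_id, "ok": True}))

submit("demo", steps=3)
```

Re-running `submit("demo", steps=3)` after an interruption skips completed steps, which is the whole point of the Resume stage.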
Cost metadata (LOW → VERY_HIGH) + approval gates.
With many similar tools, LLMs often choose slow or mediocre defaults.
Typical failure: given many classifiers, an agent jumps straight to expensive methods before cheap baselines.
Tool bundles (minimal / demo / full) to reduce the choice set.
SKILL.md decision trees ("try cheap baselines first…").
In practice: tool ordering + cost flags + skills.
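A minimal sketch of cheap-first routing, assuming a hypothetical registry where each tool declares a cost tier (the tool names here are invented):

```python
COST = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "VERY_HIGH": 3}

# Hypothetical registry entries: cost metadata lives next to each tool.
TOOLS = [
    {"name": "deep_transformer_forecast", "cost": "VERY_HIGH"},
    {"name": "forecast_theta", "cost": "LOW"},
    {"name": "gradient_boosted_forecast", "cost": "HIGH"},
]

def order_by_cost(tools):
    """Cheap-baselines-first ordering; an approval gate could sit
    above HIGH so expensive tools require explicit sign-off."""
    return sorted(tools, key=lambda t: COST[t["cost"]])

plan = [t["name"] for t in order_by_cost(TOOLS)]
print(plan)  # cheapest tool comes first
```

The same metadata can drive both the ordering shown here and the approval gates mentioned above.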
Isolation + scalability + safety for tool execution.
ts-agents backends: local · subprocess · docker · daytona · modal
Rule of thumb: default local; burst to sandbox/cloud for heavy or messy deps; keep artifacts as the interface.
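The rule of thumb can be written down as a tiny policy function. The policy itself is a sketch; only the backend names come from the ts-agents list above.

```python
def pick_backend(needs_isolation: bool, heavy_compute: bool) -> str:
    """Default local; burst to a sandbox or the cloud only when needed."""
    if heavy_compute:
        return "modal"   # scale compute out to the cloud
    if needs_isolation:
        return "docker"  # messy/conflicting deps go in a container
    return "local"       # fast path for everything else

print(pick_backend(needs_isolation=False, heavy_compute=False))
print(pick_backend(needs_isolation=True, heavy_compute=False))
```

Whichever backend runs the tool, the artifacts directory stays the interface, so callers never care where execution happened.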
Hackable batteries-included beats reinventing the harness.
# chain tools via artifacts
uv run ts-agents tool run forecast_theta_with_data \
--run Re200Rm200 \
--var bx001_real \
--param horizon=30 \
--save outputs/theta.json
# extract and post-process
cat outputs/theta.json | \
jq '.forecast[0:10]' \
> outputs/preview.json
# make it repeatable
make demo-forecasting
Contrast: general-purpose CLIs are powerful, but they don't ship your domain tools.
Where to invest next to make DS agents feel real.
North star: keep the CLI contract stable while iterating agents + UI around it.
Invest in CLI tools + SKILL.md runbooks — let mature CLI agents be your harness.
Treat artifacts as first-class outputs — chat is just the control plane.
Dependencies are the #1 silent failure mode — sandboxes reduce pain but add latency/cost.
Long-running compute needs runtime features (background/progress/resume), not bigger prompts.
Tool routing needs domain priors: cost metadata, bundles, and decision trees.
A stable CLI contract keeps the system hackable and benchmark-friendly.
Repo + docs: github.com/fnauman/ts-agents
# workflow-first demo (no API key)
uv run ts-agents workflow run forecast-series \
--input-json '{"series":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]}' \
--horizon 5
# agent run (requires OPENAI_API_KEY)
uv run ts-agents agent run \
"Compare forecasting methods for bx001_real" \
--type deep
# skills export for Claude/Codex
uv run ts-agents skills export --all-agents