Cheap models, smart architecture

Ship production code with the cheapest model you can run.
Free cloud tier, $5 of cloud credits, or your own GPU — all three work.

The trick isn’t a bigger model. It’s a better workbench. EasyAgents decomposes your feature into atomic steps, runs a build gate after each one, and retries failures with focused error context. That architecture is what lets Gemini Flash, Groq Llama, Claude Haiku, or a 7B Ollama model ship the kind of code that $30/mo IDEs charge frontier-model prices for. Pick your model. Pick where it runs. The workbench is the same either way.

Try free in the browser → Download the desktop app (for local LLMs)

Watch the 90-second demo

Free to start. No card. No GPU required. Bundled cloud credits get your first feature shipped in two minutes.

What it actually looks like

Two 45-second runs, side by side.
Same workbench, two backends.

☁️ Browser-only, free cloud

A signed-in user picks gemini-1.5-flash from the model dropdown and types a feature request. The planner runs, the executor runs against Gemini Flash, and the build gate ticks green. Total spend: ~$0.01.

🖥️ Desktop app + local LLM

Same workspace, but the model dropdown shows ollama / qwen2.5-coder:14b. Executor calls hit localhost. Total cloud spend: ~$0.04 (planner only). Implementation spend: $0.

Same product, two backends. The local-LLM path is an upgrade, not a different product.

Three equal paths

Cheap cloud, your own cloud key, or your own GPU.
All three ship real code.

☁️

Path A — Free / cheap cloud

No GPU. No card. No install. Sign in at easyagents.dev.

Use the bundled trial credits, or pay as you go through us at our cost, with no markup. Models: Gemini Flash, Groq Llama, Claude Haiku, GPT-4o-mini.

Typical feature cost: $0.01 – $0.15.

Code is processed by the cloud provider you choose. We don’t store it server-side.

Best for: trying it now, side projects, learning, anyone who doesn’t want to think about API keys yet.

🔑

Path B — Your own cloud key

No GPU. No install. Sign in, paste your OpenAI / Anthropic / Gemini / xAI / Groq key.

The same workbench, but every cent goes to your provider. We don’t take a cut. You can run a frontier model as the planner and a $0.10-per-million-token model as the executor: the best of both worlds.

Typical feature cost: $0.02 – $0.20 (whatever your key costs you).

Code goes only to the provider whose key you supplied.

Best for: developers who already have a paid Anthropic / OpenAI / Gemini account and want a workbench that won’t gouge them.

🖥️

Path C — Your own GPU (desktop app)

You have a GPU, a Mac, a Jetson, or a Pi cluster. Install the desktop app.

The app bridges your Ollama / LM Studio / vLLM / llama.cpp to the workbench over localhost. No tunnels, no port-forwarding.

Typical feature cost: $0.00 – $0.04. The planner can also run locally for zero egress.

Source code never leaves your machine during implementation.

Best for: privacy-sensitive code, enterprise IP, offline work, anyone who already owns the hardware.

Why this matters for the cheap-cloud user: the architecture isn’t a “lite version” of the local-GPU experience. The same planner/executor split, atomic decomposition, and build gate apply to Gemini Flash and Groq just as they apply to Ollama. As a result, a free-tier Gemini Flash key can ship features that a single one-shot Cursor prompt against GPT-4 often can’t, because the one-shot prompt asks GPT-4 to do too much at once. Cheap model + good architecture > expensive model + bad architecture.

Why the desktop app exists at all: the local-LLM capability needs a way to expose localhost:11434 to the workbench. The browser can’t do that securely on its own. So the desktop app is the bridge — same UI, same account, just adds the ability to route the executor to your machine. You don’t need it unless you want to use your own GPU. Path A and Path B don’t require it at all.

No forced progression. Plenty of users will stay on Path A forever and that’s fine. The free cloud tier isn’t a trial — it’s a complete product for projects that don’t need to be private. Pick the one that fits your project.

From the founder

I spent three months blaming my model.
It wasn’t the model.

I was paying ~$230/month in Cursor + Claude API bills. I tried switching to a local Ollama on my home box to save money. I also tried switching to a cheap cloud model (Gemini Flash at the time) for the same reason.

The failure mode was the same for both. The local 7B veered off course halfway through complex tasks. So did Gemini Flash. So did Groq’s hosted Llama. I concluded “the cheap models just aren’t good enough” and went back to paying premium rates.

Then it hit me. No model handles giant open-ended tasks well. GPT-4 and Claude just fail more gracefully — you don’t notice until you read the diff carefully. The problem wasn’t the model size. It was the task size.

Once I started breaking features into atomic steps (specific file, exact old snippet, exact new snippet, acceptance check), the cheap models suddenly worked. Gemini Flash shipped a complete payment integration. Local qwen2.5-coder 7B shipped a full REST API. My bill went from $230/mo to ~$5/mo on the cheap-cloud path, or ~$0.50/mo on the local-LLM path (planner only).

Friends asked how I was doing it. I rebuilt my Python scripts into a proper workbench. That’s EasyAgents.

— The Founder

Why cheap models “fail” at real code

It’s not the model. It’s the task size.

🧠

Reasoning ≠ context window

A 32K-token window doesn’t mean 32K tokens of effective reasoning. Quality degrades long before the limit, whether the model is local or cloud, small or large. Dumping your whole repo into a prompt is the most expensive way to make any model look stupid.

🎯

Ambiguity compounds

Open-ended tasks have many valid interpretations. Cheap models pick one and commit. Atomic tasks have one correct answer — and that’s where Gemini Flash and a 7B local model are competitive with frontier models.

💸

Frontier prices for trivial edits

Renaming a method or adding a nullable field doesn’t need GPT-4 or Claude Sonnet. You’re paying $0.10 of context to do $0.001 of thinking. A free-tier Gemini Flash call does the same edit for fractions of a cent.
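To check that arithmetic yourself, here is a minimal sketch. It assumes a hypothetical frontier price of $3 per million input tokens and $15 per million output tokens, a 30,000-token whole-repo prompt, a 3,000-token targeted prompt, and 500 output tokens per call. The Gemini Flash prices are the $0.075 / $0.30 per million tokens listed in the model section of this page.

```python
def call_cost_usd(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one model call, given per-million-token prices."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000

# Hypothetical frontier pricing (assumption): $3 in / $15 out per million tokens,
# with the whole repo stuffed into a 30,000-token prompt.
whole_repo_frontier = call_cost_usd(30_000, 500, 3.00, 15.00)

# Gemini Flash pricing as quoted on this page: $0.075 in / $0.30 out per million tokens,
# with only the files the step needs (about 3,000 tokens).
targeted_flash = call_cost_usd(3_000, 500, 0.075, 0.30)

print(f"whole-repo prompt, frontier model: ${whole_repo_frontier:.4f}")
print(f"targeted prompt, Gemini Flash:     ${targeted_flash:.6f}")
```

With these assumed numbers the whole-repo call costs about $0.10 and the targeted Flash call about $0.0004.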

🔄

No retry intelligence

When a cheap model fails, most tools either re-prompt the whole task (wasting tokens) or give up. The better move is to pair the compiler error with the failing snippet and retry just that step. This works equally well on cloud and local models.
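A minimal sketch of that focused retry, assuming the build prints errors in the common `path:line: message` form. `build_retry_prompt`, `snippet_around`, and the prompt wording are hypothetical, not EasyAgents’ own code.

```python
import re

def snippet_around(source: str, line_no: int, radius: int = 3) -> str:
    """Return the lines within `radius` of a 1-based line number, prefixed with line numbers."""
    lines = source.splitlines()
    start = max(0, line_no - 1 - radius)
    end = min(len(lines), line_no + radius)
    return "\n".join(f"{i + 1}: {lines[i]}" for i in range(start, end))

def build_retry_prompt(instruction: str, target_file: str, source: str,
                       build_output: str, radius: int = 3) -> str:
    """Pair the build error with the failing snippet instead of re-sending the whole task."""
    output_lines = build_output.splitlines()
    # Keep only error lines that mention this step's file; fall back to the tail of the log.
    relevant = [ln for ln in output_lines if target_file in ln] or output_lines[-10:]
    # Assumes errors look like "path/to/file.ext:LINE: message".
    match = re.search(re.escape(target_file) + r":(\d+)", build_output)
    line_no = int(match.group(1)) if match else 1
    return "\n\n".join([
        f"Step: {instruction}",
        "Build error:\n" + "\n".join(relevant[:10]),
        f"Code near {target_file}:{line_no}:\n" + snippet_around(source, line_no, radius),
        "Fix only this step. Reply with the corrected snippet.",
    ])
```

The retry prompt stays a few hundred tokens no matter how large the repo or the original task was.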

The architecture

Plan once. Execute cheaply. Gate every step.

The architecture is the same whether your executor is a Gemini Flash call, a Groq Llama call, or a local 7B. The only things that change are the URL and the price tag.

1

Decomposition (planner pass)

One call to a capable model (Sonnet, GPT-4o, Gemini Pro, or even a local 70B) turns the feature into 10–20 atomic steps. Each step has a target file, an exact old snippet, an exact new snippet, and an acceptance test. You can run the planner on the bundled trial credits, with your own key (BYOK), or locally.
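Here is one way such a step could be represented and applied, as a sketch only: the field names, `apply_to_text`, and `StepError` are assumptions, not EasyAgents’ schema. Requiring the old snippet to appear exactly once is what makes a step unambiguous; if the file has drifted, the step fails loudly instead of editing the wrong place.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class AtomicStep:
    target_file: str   # path relative to the repo root
    old_snippet: str   # exact text expected in the file right now
    new_snippet: str   # exact replacement text
    acceptance: str    # check run afterwards by the build gate, e.g. a test command

class StepError(Exception):
    """Raised when a step cannot be applied unambiguously."""

def apply_to_text(source: str, step: AtomicStep) -> str:
    """Replace the step's old snippet, which must occur exactly once."""
    count = source.count(step.old_snippet)
    if count != 1:
        raise StepError(f"{step.target_file}: old snippet found {count} times, expected 1")
    return source.replace(step.old_snippet, step.new_snippet, 1)

def apply_step(repo_root: Path, step: AtomicStep) -> None:
    """Apply the step to the file on disk."""
    path = repo_root / step.target_file
    path.write_text(apply_to_text(path.read_text(encoding="utf-8"), step), encoding="utf-8")
```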

2

Targeted context (no whole-repo dumps)

Each step is handed the 2–3 files that matter. Token count per step stays low. Effective reasoning stays high. This is the lever that makes cheap models competitive — you’re playing to their strengths instead of fighting their weaknesses.
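This page doesn’t say how those files are picked. As one hypothetical approach, a breadth-first walk over an import map starting from the step’s target file, capped at three files, keeps the nearest dependencies. `select_context` and the import map are illustrative, and `estimate_tokens` uses the rough rule of thumb of about four characters per token, which real tokenizers only approximate.

```python
from collections import deque
from typing import Dict, List

def select_context(target: str, imports: Dict[str, List[str]], max_files: int = 3) -> List[str]:
    """Target file first, then its nearest dependencies, breadth-first, capped at max_files."""
    chosen = [target]
    seen = {target}
    queue = deque([target])
    while queue and len(chosen) < max_files:
        for dep in imports.get(queue.popleft(), []):
            if dep in seen:
                continue
            seen.add(dep)
            chosen.append(dep)
            queue.append(dep)
            if len(chosen) == max_files:
                break
    return chosen

def estimate_tokens(text: str) -> int:
    """Rough rule of thumb: about 4 characters per token (rounded up). Real tokenizers vary."""
    return (len(text) + 3) // 4
```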

3

Build gate

After each step, EasyAgents runs your build. On failure, the error is fed back with only the failing snippet, and the model gets one focused retry. Gemini Flash, Groq Llama, and a 7B Ollama model all hit ~95% pass-after-one-retry rates. The same retry trick works regardless of where the model lives.
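The gate-and-retry loop fits in a few lines. This sketch passes the model edit and the build in as functions, so the loop itself doesn’t care where the model runs. `run_build_gate` and `shell_build` are hypothetical names, and `max_retries=1` mirrors the single focused retry described above.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

BuildResult = Tuple[bool, str]  # (passed, combined build output)

@dataclass
class GateOutcome:
    passed: bool
    attempts: int
    last_error: str = ""

def shell_build(cmd: Sequence[str]) -> BuildResult:
    """Run the project's build command and capture its output."""
    proc = subprocess.run(list(cmd), capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def run_build_gate(attempt: Callable[[str], None],
                   build: Callable[[], BuildResult],
                   max_retries: int = 1) -> GateOutcome:
    """Apply a step, build, and retry with only the latest error as feedback."""
    error = ""
    for n in range(1, max_retries + 2):
        attempt(error)  # first call gets "", a retry gets the failing build output
        ok, output = build()
        if ok:
            return GateOutcome(passed=True, attempts=n)
        error = output
    return GateOutcome(passed=False, attempts=max_retries + 1, last_error=error)
```

In practice `attempt` would call the executor model and apply its edit, and `build` could be something like `lambda: shell_build(["npm", "run", "build"])`, or whatever command your project uses.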

4

Run the executor wherever it’s cheapest for you

Free cloud tier? Fine. $5 BYOK on a metered cheap model? Fine. Your own GPU via the desktop app? Fine. The workbench doesn’t care — every executor target is just an OpenAI-compatible endpoint.
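Because every target speaks the same chat-completions protocol, switching backends is mostly a matter of changing the base URL. Here is a standard-library sketch. The base URLs are the usual defaults for each server or provider as we understand them (verify against their docs for your setup), and `build_chat_request` and `chat` are hypothetical helpers, not part of EasyAgents.

```python
import json
import urllib.request
from typing import Dict, List, Optional

# Typical OpenAI-compatible base URLs (defaults; check each tool's docs for your setup).
BASE_URLS: Dict[str, str] = {
    "ollama": "http://localhost:11434/v1",
    "lm-studio": "http://localhost:1234/v1",
    "vllm": "http://localhost:8000/v1",
    "llama.cpp": "http://localhost:8080/v1",
    "groq": "https://api.groq.com/openai/v1",
    "gemini": "https://generativelanguage.googleapis.com/v1beta/openai",
    "openai": "https://api.openai.com/v1",
}

def build_chat_request(base_url: str, model: str, messages: List[Dict[str, str]],
                       api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a POST to {base_url}/chat/completions; only the URL and key differ per backend."""
    headers = {"Content-Type": "application/json"}
    if api_key:  # local servers usually need no key
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"model": model, "messages": messages, "temperature": 0}).encode("utf-8")
    return urllib.request.Request(base_url.rstrip("/") + "/chat/completions",
                                  data=body, headers=headers, method="POST")

def chat(base_url: str, model: str, messages: List[Dict[str, str]],
         api_key: Optional[str] = None, timeout: float = 120.0) -> str:
    """Send the request and return the first choice's text."""
    request = build_chat_request(base_url, model, messages, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```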

Trust signal: every gate check, every retry, every commit, and every cent of cost is visible in the task board. There is no hidden agent loop. You can pause it at any step.

Three valid ways to run it

You choose where the model runs.

All cloud, in the browser

Planner: free/cheap cloud (Gemini Flash, Groq, Haiku)

Executor: same cloud model, or step down to a smaller cheap model

Typical feature cost: $0.01 – $0.15

Needs: browser + a cheap key or bundled credits

Code privacy: processed by your chosen cloud

Mixed (cloud planner, local executor)

Planner: frontier cloud (Sonnet, GPT-4o, Gemini Pro)

Executor: your local Ollama / LM Studio / vLLM (via the desktop app)

Typical feature cost: $0.02 – $0.08

Needs: browser + the desktop app + your local model

Code privacy: implementation stays on your machine

All local

Planner: your local 70B (via the desktop app)

Executor: your local 7B–14B coder (via the desktop app)

Typical feature cost: $0.00

Needs: browser + the desktop app + two local models

Code privacy: end-to-end on your hardware

All three setups use exactly the same workbench. Switching is a dropdown, not a migration. Start with the first one today; move to the second or third when you install the desktop app.
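In configuration terms, the three setups differ only in which endpoint each role points at. This is a hypothetical sketch: the `Route` and `Setup` names are illustrative, not EasyAgents’ settings format, and the model tags (including the 70B planner) are example choices.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Route:
    base_url: str  # any OpenAI-compatible endpoint
    model: str

@dataclass(frozen=True)
class Setup:
    planner: Route
    executor: Route

    def route_for(self, role: str) -> Route:
        """Pick the endpoint for a role; the rest of the workbench never changes."""
        if role == "planner":
            return self.planner
        if role == "executor":
            return self.executor
        raise ValueError(f"unknown role: {role}")

GEMINI = "https://generativelanguage.googleapis.com/v1beta/openai"
OLLAMA = "http://localhost:11434/v1"

SETUPS: Dict[str, Setup] = {
    "all-cloud": Setup(Route(GEMINI, "gemini-1.5-flash"), Route(GEMINI, "gemini-1.5-flash")),
    "mixed": Setup(Route("https://api.openai.com/v1", "gpt-4o"), Route(OLLAMA, "qwen2.5-coder:14b")),
    "all-local": Setup(Route(OLLAMA, "llama3.3:70b"), Route(OLLAMA, "qwen2.5-coder:14b")),
}
```

Switching setups swaps two `Route` values. Decomposition, targeted context, and the build gate stay the same.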

Bring whatever model you want

Cheap cloud, your own key, or your own hardware.

☁️ Cloud (no GPU, no install)

✨

Gemini Flash

Google’s $0.075 input / $0.30 output per million tokens. Free tier exists. Recommended for new users.

⚡

Groq Llama 3.x

Very fast inference, often free-tier eligible, ~$0.05 per million input tokens for small models.

🟠

Anthropic Haiku

Most reliable cheap model for instruction-following. Great executor pick.

🟢

OpenAI GPT-4o-mini

Strong on edge cases, and an easy pick for anyone who already has an OpenAI key.

🖥️ Local (GPU + desktop app)

🦙

Ollama

The default for local. ollama serve plus the desktop app’s bridge, done.

🎨

LM Studio

Toggle “local server” on and the app finds it automatically.

⚡

vLLM

High-throughput option for users with multi-GPU rigs.

🛠️

llama.cpp server

./llama-server -m model.gguf (./server in older builds) and you’re done.

🔲 Also runs on edge hardware — including NVIDIA Jetson Orin Nano Super

Tested on Jetson Orin Nano Super (8GB), Apple M2 / M3, RTX 3090, RTX 4090, dual-3090 NVLink, and a Ryzen AI 9 HX 370 mini-PC. Cloud-side, tested against every cloud model listed above. If /v1/chat/completions answers, it works.

What we can prove today (closed beta)

No customer logos. Just numbers.

120+

projects under management

87%

of steps pass on first try

1,400+

commits shipped via build gate this week

$3,200

saved vs cloud-only baseline

Beta is open. We’re not pretending to have logos we don’t have. If you ship something with EasyAgents and we can write you up, the first ten case studies get a lifetime founding-member tier (see the Founding LTD in the pricing table below).

By the numbers

What it actually costs per feature.

Approach	Typical cost / feature	Needs a GPU?	Code stays private?	Works on large codebases?
Cloud-only (Cursor / Copilot / Claude direct)	$1.50 – $6.00	No	No	Degrades
Local model, unstructured (just chatting with Ollama)	$0.00 (but poor results)	Yes	Yes	No
EasyAgents — browser only, free/cheap cloud	$0.01 – $0.15	No	Your cloud provider sees it	Yes ✓
EasyAgents — desktop app, cloud planner + local executor	$0.02 – $0.08	Yes	Implementation only	Yes ✓
EasyAgents — desktop app, fully local	$0.00	Yes	End-to-end	Yes ✓

Built for real development work

Everything you actually need.

🔀

Smart model routing

Planner cloud, executor local, your choice per project.

⚛️

Atomic decomposition

Every feature becomes 10–20 build-gated steps.

🏗️

Build gate

Every step must compile. Errors retry with focused context, not the whole prompt.

🔒

Stays on your machine

On the local path, implementation never leaves your network. The planner is opt-in cloud or fully local.

🌿

Git integration

Auto-commit after each green step. Easy rollback per step, not per feature.

🤖

Any OpenAI-compatible endpoint

Ollama, LM Studio, vLLM, llama.cpp server, even a remote box you SSH into.

Pricing

Three free tiers. Pay only when you outgrow them.

Tier	Price	What you get
Free — Cheap cloud	$0/mo	Browser workbench + Autopilot + build gate + git. Bundled trial credits for free/cheap cloud models (Gemini Flash, Groq, Claude Haiku, GPT-4o-mini). No card. No GPU required. A complete product on its own, not a trial.
Free — BYOK	$0/mo	Same workbench, but you bring your own cloud key (Anthropic, OpenAI, Gemini, xAI, Groq). Every cent goes to your provider — we don’t take a cut and don’t mark up. No card. No GPU required.
Free — Local	$0/mo	Same workbench, plus the desktop app. Routes the executor to your Ollama / LM Studio / vLLM / llama.cpp. No credits needed once you’re local. Requires a GPU + the desktop app.
Pro	$19/mo	All three free tiers’ features + 5 staging slots + priority support + higher Autopilot concurrency.
Power	$49/mo	We supply planner credits, no cloud key required. Priority Autopilot queue. 20 staging slots. Works in browser or desktop app.
Founding LTD	$249 once	All Power features for life, limited to the first 50 buyers. Funds the infra so I can stay solo and shipping.

Three free tiers, three different audiences — none is “the trial.” The cheap-cloud tier is a real product for visitors without a GPU who don’t want to manage keys. The BYOK tier is for developers who already pay for an OpenAI or Anthropic key and want a better workbench around it. The local tier is for the r/LocalLLaMA / r/SelfHosted reader who has the hardware. You can stay on any of them forever. Pro and Power exist for when you want priority queue, more concurrency, or to stop thinking about credit caps.

Pick the cheapest model that works.
Your code stays where you put it.

Free cloud tier, your own cloud key, or your own GPU — all three are first-class. EasyAgents is the workbench around your code; we don’t store your source server-side, and on the local path nothing leaves your machine at all. Start in the browser in two minutes with no card and no install. Install the desktop app when (and if) you want everything to run on your own hardware.

Try free in the browser → Download the desktop app (for local LLMs) →
