Ship production code with the cheapest model you can run.
Free cloud tier, $5 of cloud credits, or your own GPU — all three work.
The trick isn’t a bigger model. It’s a better workbench. EasyAgents decomposes your feature into atomic steps, runs a build gate after each one, and retries failures with focused error context. That’s the architecture that makes Gemini Flash, Groq Llama, Haiku-mini, or a 7B Ollama model ship code that the $30/mo IDEs charge frontier-model prices for. Pick your model. Pick where it runs. Same workbench either way.
Two 45-second runs, side by side.
Same workbench, two backends.
Signed-in user picks gemini-1.5-flash from the model dropdown,
types a feature request, planner runs, executor runs against Gemini Flash, build gate ticks green.
Total spend: ~$0.01.
Same workspace, but the model dropdown shows ollama / qwen2.5-coder:14b.
Executor calls hit localhost.
Total cloud spend: ~$0.04 (planner only). Implementation spend: $0.
Same product, two backends. The local-LLM path is an upgrade, not a different product.
Cheap cloud, your own cloud key, or your own GPU.
All three ship real code.
No GPU. No card. No install. Sign in at easyagents.dev.
Use the bundled trial credits, or pay-as-you-go through us at our cost — no markup. Models: Gemini Flash, Groq Llama, Haiku-mini, GPT-4o-mini.
Typical feature cost: $0.01 – $0.15.
Code is processed by the cloud provider you choose. We don’t store it server-side.
Best for: trying it now, side projects, learning, anyone who doesn’t want to think about API keys yet.
No GPU. No install. Sign in, paste your OpenAI / Anthropic / Gemini / xAI / Groq key.
The same workbench, but every cent goes to your provider. We don’t take a cut. You can run a frontier model as the planner and a $0.10-per-million model as the executor — best of both worlds.
Typical feature cost: $0.02 – $0.20 (whatever your key costs you).
Code goes only to the provider whose key you supplied.
Best for: developers who already have a paid Anthropic / OpenAI / Gemini account and want a workbench that won’t gouge them.
You have a GPU, a Mac, a Jetson, or a Pi cluster. Install the desktop app.
The app bridges your Ollama / LM Studio / vLLM / llama.cpp to the workbench over localhost. No tunnels, no port-forwarding.
Typical feature cost: $0.00 – $0.04. Planner can also go local for zero egress.
Source code never leaves your machine during implementation.
Best for: privacy-sensitive code, enterprise IP, offline work, anyone who already owns the hardware.
Why this matters for the cheap-cloud user: the architecture isn’t a “lite version” of the local-GPU experience. The same planner/executor split, the same atomic decomposition, the same build gate apply to Gemini Flash and Groq just as well as they apply to Ollama. The result is that a free-tier Gemini Flash key can ship features that a one-shot Cursor prompt against GPT-4 can’t — because Cursor is asking GPT-4 to do too much in one prompt. Cheap model + good architecture > expensive model + bad architecture.
Why the desktop app exists at all: the local-LLM capability needs a way to expose localhost:11434 to
the workbench. The browser can’t do that securely on its own. So the desktop app is the bridge — same UI, same
account, just adds the ability to route the executor to your machine.
You don’t need it unless you want to use your own GPU. Path A and Path B don’t require it at all.
No forced progression. Plenty of users will stay on Path A forever and that’s fine. The free cloud tier isn’t a trial — it’s a complete product for projects that don’t need to be private. Pick the one that fits your project.
I spent three months blaming my model.
It wasn’t the model.
I was paying ~$230/month in Cursor + Claude API bills. I tried switching to a local Ollama on my home box to save money. I also tried switching to a cheap cloud model (Gemini Flash at the time) for the same reason.
The failure mode was the same for both. The local 7B veered off course halfway through complex tasks. So did Gemini Flash. So did Groq’s hosted Llama. I concluded “the cheap models just aren’t good enough” and went back to paying premium rates.
Then it hit me. No model handles giant open-ended tasks well. GPT-4 and Claude just fail more gracefully — you don’t notice until you read the diff carefully. The problem wasn’t the model size. It was the task size.
Once I started breaking features into atomic steps (specific file, exact old snippet, exact new snippet, acceptance check), the cheap models suddenly worked. Gemini Flash shipped a complete payment integration. Local qwen2.5-coder 7B shipped a full REST API. My bill went from $230/mo to ~$5/mo on the cheap-cloud path, or ~$0.50/mo on the local-LLM path (planner only).
Friends asked how I was doing it. I rebuilt my Python scripts into a proper workbench. That’s EasyAgents.
— The Founder
It’s not the model. It’s the task size.
A 32K window doesn’t mean 32K of effective reasoning. Quality degrades long before the limit — whether the model is local or cloud, small or large. Dumping your whole repo into a prompt is the most expensive way to make any model look stupid.
Open-ended tasks have many valid interpretations. Cheap models pick one and commit. Atomic tasks have one correct answer — and that’s where Gemini Flash and a 7B local model are competitive with frontier models.
Renaming a method or adding a nullable field doesn’t need GPT-4 or Claude Sonnet. You’re paying $0.10 of context to do $0.001 of thinking. A free-tier Gemini Flash call does the same edit for fractions of a cent.
When a cheap model fails, most tools either re-prompt the whole task (wasting tokens) or give up. The right move is diff the compiler error against the failing snippet and retry just that step. Works equally well on cloud and local.
Plan once. Execute cheaply. Gate every step.
The architecture is the same whether your executor is a Gemini Flash call, a Groq Llama call, or a local 7B. The only thing that changes is the URL and the price tag.
One call to a capable model — Sonnet, GPT-4o, Gemini Pro, or even a local 70B — turns the feature into 10–20 atomic steps. Each step has a target file, an exact old snippet, an exact new snippet, and an acceptance test. You can run the planner on the bundled trial credits, on your own BYOK, or locally.
Each step is handed the 2–3 files that matter. Token count per step stays low. Effective reasoning stays high. This is the lever that makes cheap models competitive — you’re playing to their strengths instead of fighting their weaknesses.
After each step, EasyAgents runs your build. On failure, the error is fed back with only the failing snippet and the model gets one focused retry. Gemini Flash, Groq Llama, and a 7B Ollama all hit ~95% pass-after-one-retry rates — the same retry trick works regardless of where the model lives.
Free cloud tier? Fine. $5 BYOK on a metered cheap model? Fine. Your own GPU via the desktop app? Fine. The workbench doesn’t care — every executor target is just an OpenAI-compatible endpoint.
You choose where the model runs.
Planner: free/cheap cloud (Gemini Flash, Groq, Haiku)
Executor: same cloud model, or step down to a smaller cheap model
Typical feature cost: $0.01 – $0.15
Needs: browser + a cheap key or bundled credits
Code privacy: processed by your chosen cloud
Planner: frontier cloud (Sonnet, GPT-4o, Gemini Pro)
Executor: your local Ollama / LM Studio / vLLM (via the desktop app)
Typical feature cost: $0.02 – $0.08
Needs: browser + the desktop app + your local model
Code privacy: implementation stays on your machine
Planner: your local 70B (via the desktop app)
Executor: your local 7B–14B coder (via the desktop app)
Typical feature cost: $0.00
Needs: browser + the desktop app + two local models
Code privacy: end-to-end on your hardware
All three columns use exactly the same workbench. Switching is a dropdown, not a migration. Start in the first column today; move to the second or third when you install the desktop app.
Cheap cloud, your own key, or your own hardware.
ollama serve plus the desktop app’s bridge, done../server -m model.gguf and you’re done.
Tested on Jetson Orin Nano Super (8GB), Apple M2 / M3, RTX 3090, RTX 4090, dual-3090 NVLink, and a Ryzen-AI 9 HX 370
mini-PC. Cloud-side, tested against every model in Row 1 above.
If /v1/chat/completions answers, it works.
No customer logos. Just numbers.
What it actually costs per feature.
| Approach | Typical cost / feature | Needs a GPU? | Code stays private? | Works on large codebases? |
|---|---|---|---|---|
| Cloud-only (Cursor / Copilot / Claude direct) | $1.50 – $6.00 | No | No | Degrades |
| Local model, unstructured (just chatting with Ollama) | $0.00 (but poor results) | Yes | Yes | No |
| EasyAgents — browser only, free/cheap cloud | $0.01 – $0.15 | No | Your cloud provider sees it | Yes ✓ |
| EasyAgents — desktop app, cloud planner + local executor | $0.02 – $0.08 | Yes | Implementation only | Yes ✓ |
| EasyAgents — desktop app, fully local | $0.00 | Yes | End-to-end | Yes ✓ |
Everything you actually need.
Three free tiers. Pay only when you outgrow them.
| Tier | Price | What you get |
|---|---|---|
| Free — Cheap cloud | $0/mo | Browser workbench + autopilot + build gate + git. Bundled trial credits for free/cheap cloud models (Gemini Flash, Groq, Haiku-mini, GPT-4o-mini). No card. No GPU required. A complete product on its own, not a trial. |
| Free — BYOK | $0/mo | Same workbench, but you bring your own cloud key (Anthropic, OpenAI, Gemini, xAI, Groq). Every cent goes to your provider — we don’t take a cut and don’t mark up. No card. No GPU required. |
| Free — Local | $0/mo | Same workbench, plus the desktop app. Routes the executor to your Ollama / LM Studio / vLLM / llama.cpp. No credits needed once you’re local. Requires a GPU + the desktop app. |
| Pro | $19/mo | All three free tiers’ features + 5 staging slots + priority support + higher Autopilot concurrency. |
| Power | $49/mo | We supply planner credits, no cloud key required. Priority Autopilot queue. 20 staging slots. Works in browser or desktop app. |
| Founding LTD | $249 once | All Power features for life, limited to first 50. Funds the infra so I can stay solo and shipping. |
Pick the cheapest model that works.
Your code stays where you put it.
Free cloud tier, your own cloud key, or your own GPU — all three are first-class. EasyAgents is the workbench around your code; we don’t store your source server-side, and on the local path nothing leaves your machine at all. Start in the browser in two minutes with no card and no install. Install the desktop app when (and if) you want everything to run on your own hardware.