Stop Vibe Coding. Start Vibe Engineering.

I bank with Bancolombia. Their statement exports are XLSX files where the same workbook mixes US locale (1,375,571.00) and Colombian locale (1.375.571,00) for two different fields. Credit card statements come in three layouts depending on the network (Visa, Mastercard, Amex), and the multi-currency cards split into PESOS and DOLARES sheets that share a credit limit. Installment plans show up encoded as cuotas X/Y in a single cell. Reversals appear as ghost duplicates with matching authorization codes. Every existing personal finance app either ignores the format or asks for my banking credentials.
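To make the locale mess concrete, here is a tiny Kotlin helper (hypothetical, not from the Mintroot source) that normalizes both formats by treating whichever separator appears last as the decimal mark:

```kotlin
import java.math.BigDecimal

// Hypothetical helper: normalize an amount string that may use US formatting
// ("1,375,571.00") or Colombian formatting ("1.375.571,00").
// Heuristic: the separator that appears last in the string is the decimal mark.
fun parseAmount(raw: String): BigDecimal {
    val s = raw.trim()
    val lastComma = s.lastIndexOf(',')
    val lastDot = s.lastIndexOf('.')
    return if (lastComma > lastDot) {
        // Colombian: '.' groups thousands, ',' marks decimals
        BigDecimal(s.replace(".", "").replace(',', '.'))
    } else {
        // US: ',' groups thousands, '.' marks decimals
        BigDecimal(s.replace(",", ""))
    }
}
```

The heuristic breaks on ambiguous strings like "1.375" with no decimal part, so a real parser would also lean on the known locale of each column.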
So I built Mintroot. Native Android. Kotlin + Jetpack Compose. Encrypted local database. Gemini for classification with a user-supplied key. Forty-eight hours, end to end, one conversation with Opus 4.7 over a 1M-token context window. Signed APK at hour 47.
That's the part that gets attention. It's not the story.
The story is what happened in the first three hours of that conversation, before a single line of Kotlin existed. I named the problem. I scoped it. I wrote a list of what I would not build. I sketched ten domain entities. I rejected three features the model offered. By the time the first BancolombiaSavingsParser got generated, the architecture was already decided — not by the model, by me, with the model as a thinking partner.
That gap — between "let the model decide" and "decide with the model" — is the difference between vibe coding and vibe engineering. And it is the entire reason this project shipped instead of becoming another half-working repo on my laptop.
Vibe Coding vs Vibe Engineering
Let me define both before anyone reads their own definition into them.
Vibe coding is prompt-and-pray. You open a chat, describe a thing, accept whatever lands, glue it together, hope it runs. The artifact is the goal. The model is the architect, the developer, and the QA. You are the typist. If it works, you ship. If it doesn't, you regenerate.
Vibe engineering treats AI as leverage on the parts of engineering that scale: typing, recall, exploration, refactoring. The judgment — what to build, why, how it fits, what to cut — stays with you. The model writes the code. You own the problem.
The trap is that vibe coding feels productive. Files appear. The terminal scrolls. Something compiles. Dopamine. But producing artifacts is not the same as solving a problem, and a repo full of generated code that doesn't ship is not engineering — it's just a tour of what the model felt like writing today.
The Four Moves
Vibe engineering, the way I run it, is four moves in order. I do them with the model in the room, but I do them. Skip any of the first three and you are vibe coding with extra steps.
Identify
Name a real, specific pain. Not a market — a problem you or someone you know experiences weekly. If you cannot describe it in one sentence with a named victim, you do not have a project yet. You have a vibe.
Planify
Write the "not building" list before the "building" list. Decide the scope of rejection first. Constraints, non-goals, and explicit out-of-scope items are what protect the project from the model's seductive helpfulness later.
Structure
Sketch the domain, the modules, the boundaries, the state machines. The architecture should make most future decisions for you. The model proposes; you sign off. This is the part that lets the 1M context window actually pay off.
Create
Only now does code get written. With full scope, full architecture, and full taste applied to every accept. The model is a very good autocomplete on top of a plan. It is a very bad architect. Give it the plan; let it write.
1. Identify
Before any code, I want one sentence: who hurts, and how. Not "an app for finances." A real, irritating, named pain.
Mintroot's pain has a name and a shape. Bancolombia's XLSX exports come in three structural layouts that share almost nothing:
- Savings is one sheet: header rows 2–12, movements rows 14+, date format `D/MM` with the year inferred by walking forward through the rows and detecting month wraparounds, since the period can cross a calendar quarter. The same statement repeats `Información Cliente` / `Movimientos` header bands every ~50 rows because Excel doesn't paginate well.
- Credit cards are one or two sheets named `PESOS` and `DOLARES`. The dollar sheet can be empty but still indicates a multi-currency card. Authorization codes are 6-digit numeric for Visa/Amex but letter-prefixed alphanumeric for Mastercard, and the Mastercard prefix encodes the operation type (`R*` purchase, `C*` payment, `T*` installment). `Pago mínimo` uses US locale; `Pago total` uses Colombian locale; same workbook, same column.
- Investment funds use date format `YYYYMMDD` with no separators, and yield accrual is invisible in the movement table — it compounds into the unit value and surfaces only in the Resumen block.
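The savings-sheet year inference can be sketched in a few lines. Everything here is illustrative: `DayMonth` and `inferYears` are made-up names, and the real parser presumably handles more edge cases than a single month-wraparound check.

```kotlin
// Sketch of the year-inference rule: statement rows carry only day/month ("D/MM"),
// so we walk forward and bump the year whenever the month number wraps around
// (e.g. 12 -> 1), starting from the statement period's start year.
data class DayMonth(val day: Int, val month: Int)

fun inferYears(rows: List<DayMonth>, startYear: Int): List<String> {
    var year = startYear
    var prevMonth = rows.firstOrNull()?.month ?: return emptyList()
    return rows.map { dm ->
        if (dm.month < prevMonth) year++   // wraparound: December -> January
        prevMonth = dm.month
        "%04d-%02d-%02d".format(year, dm.month, dm.day)
    }
}
```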
That's a problem with a shape. I can describe done in one line: import Bancolombia XLSX of any of those three shapes, classify the transactions, and answer questions about my money in plain Spanish. If I cannot do that in one sentence, I do not start.
If you can't name the pain in one sentence, you don't have a project. You have a vibe.
2. Planify
Scope is a list of what you will not build. I wrote that list first, in hour two of the conversation, and the model never saw a "build the app" prompt until the list was locked.
Building:

- Bancolombia XLSX import (savings, CC multi-currency, funds)
- Auto-detection of statement type, account, period
- Encrypted Room DB (SQLCipher) on device
- Gemini classifier with user-supplied API key
- Confidence-aware approval queue (threshold 0.75)
- ClassificationCache with EXACT / PREFIX / REGEX patterns
- Internal transfer matching (TC payments, FIC contributions)
- Installment-plan and reversal modeling
- Compose dashboards: net worth, ratios, top merchants
- Chat with tool calls against the local DB
- CSV + JSON export from day one
Not building:

- Other banks (BBVA, Davivienda, etc.)
- PDF parsing — XLSX only in v1
- Cloud sync, accounts, or backend
- Multi-user / couple-finance features
- Onboarding wizard — start in the app
- Light mode — dark-first
- Play Store distribution — APK via GitHub
- Hard budgets and goals — reflect, do not coach
- Subscription detection as a discrete feature
- Push reminders for upcoming payments
- Categories editor V1 — auto-classification only
Constraints:

- Privacy non-negotiable: bank data never leaves device unencrypted
- No subscription, no ads, no broker
- Min SDK 31 (Android 12); target latest stable
- BYO Gemini key — no infrastructure I have to run
- 48 hours end to end, signed APK or it does not count
- Source files immutable; DB derived; rebuildable from archive
- Single Android process, single user, single conversation
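One virtue of locking constraints this early is that some of them reduce to trivially small code. The 0.75 approval threshold, for instance, becomes a single routing function. The names, and the choice of "at or above the threshold auto-applies", are my assumptions, not the shipped code:

```kotlin
// Hypothetical routing for the confidence-aware approval queue: classifications
// at or above the 0.75 threshold auto-apply; anything below lands in review.
const val APPROVAL_THRESHOLD = 0.75

enum class Route { AUTO_APPLY, REVIEW_QUEUE }

fun route(confidence: Double, userValidated: Boolean = false): Route =
    when {
        userValidated -> Route.AUTO_APPLY          // user corrections are always final
        confidence >= APPROVAL_THRESHOLD -> Route.AUTO_APPLY
        else -> Route.REVIEW_QUEUE
    }
```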
The "not building" column is what saved this project. Every time the model suggested a feature — and Opus is generous with suggestions — I checked it against that list. Multi-bank adapter abstraction? Not building. Cloud backup? Not building. Onboarding wizard? Not building. The list outranked the model every time.
This is the move vibe coding skips. When you don't write down what you will not build, the model fills the silence with everything it can imagine, and you end up shipping nothing because you tried to ship everything.
3. Structure
Once the scope is locked, I draw the system. Not in detail — bones first. Where the layers are, who owns what data, where the boundaries are, what state machines govern which entities.
The boring rules: ui/ never imports data/ directly. domain/ is pure Kotlin, no Android imports. The Gemini client is one place, so the prompts are one place, so the taxonomy is one place. The parser is mechanical, the classifier is adaptive, and they meet at one interface: ParsedStatement → List<StagedTransaction>.
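That single seam can be sketched as a Kotlin interface. All field names below are hypothetical; the only thing the text fixes is the shape `ParsedStatement → List<StagedTransaction>`:

```kotlin
// Sketch of the parser/classifier boundary: the mechanical parser produces a
// ParsedStatement, the adaptive classifier stages transactions for approval.
data class RawRow(val date: String, val description: String, val amount: String)

data class ParsedStatement(
    val accountId: String,
    val periodStart: String,
    val periodEnd: String,
    val rows: List<RawRow>,
)

data class StagedTransaction(
    val date: String,
    val merchant: String,
    val amountSigned: String,  // single signed amount, no debit/credit columns
    val category: String?,     // null until classified
    val confidence: Double,
)

interface StatementClassifier {
    fun classify(statement: ParsedStatement): List<StagedTransaction>
}
```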
This is also where I locked the most decision-dense part of the project: the data model. Ten primary entities, all of them argued for in conversation before any of them got generated.
I also wrote down a thesis for the AI half of the system, in hour three, that the entire chat then operated under:
> The LLM is the cognitive engine. It does the thinking. The database is its memory. The user is the editor of last resort. The cache means the LLM does not classify the same merchant 200 times — the first time it sees `RAPPI COLOMBIA*DL` it produces a structured classification with reasoning, and from then on the cache answers.
That single thesis pre-empted dozens of downstream questions. Should we cache classifications? Yes. Should the LLM ever be called for a transaction it has classified before? No, unless the user invalidates. Should the cache generalize? Yes, via periodic generalization passes that propose PREFIX_MATCH and REGEX_LEARNED entries. Should user corrections be sticky? Yes — user_validated is the only immutable flag in classification. All of that flowed from one design principle, decided once.
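A minimal sketch of that lookup order (exact, then prefix, then regex, then the LLM) with hypothetical types; the real cache presumably lives in the Room database, not in a list:

```kotlin
// Hypothetical ClassificationCache lookup: try an exact descriptor match,
// then learned prefixes, then learned regexes. Only a triple miss reaches the LLM.
enum class MatchType { EXACT, PREFIX_MATCH, REGEX_LEARNED }

data class CacheEntry(val type: MatchType, val pattern: String, val category: String)

fun lookup(descriptor: String, cache: List<CacheEntry>): String? {
    cache.firstOrNull { it.type == MatchType.EXACT && it.pattern == descriptor }
        ?.let { return it.category }
    cache.firstOrNull { it.type == MatchType.PREFIX_MATCH && descriptor.startsWith(it.pattern) }
        ?.let { return it.category }
    cache.firstOrNull { it.type == MatchType.REGEX_LEARNED && Regex(it.pattern).matches(descriptor) }
        ?.let { return it.category }
    return null // cache miss: fall through to the LLM classifier
}
```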
This is where the 1M context window starts paying for itself. The thesis I wrote at hour three was still loaded at hour thirty. The model did not reinvent the cache shape in a fresh session because there was no fresh session. Every new file got generated with full awareness of every previous file. That's not magic — it's what happens when you stop fragmenting context across chats.
4. Create
Only now does code get written. And here's the part vibe coders miss: writing code is the easy half. By the time I asked for the first BancolombiaSavingsParser, the model didn't need much. It had the scope, the architecture, the taxonomy, the constraints. The prompt was three lines. The output was almost shippable.
The model is a very good autocomplete on top of a plan. It is a very bad architect. Ask "build me a finance app" and you get a vibe. Ask "implement this contract, with these constraints, against this fixture, per the architecture we agreed at hour three" and you get code.
What the 1M Context Window Actually Did
Now the part everyone wants. The single-conversation, 1M-token thing.
It did not write better code. Opus writes the same Kotlin in a 200K session as it does in a 1M session. What 1M gave me was continuity — the elimination of every micro-tax that fragments AI development across sessions.
- Classic multi-session: a re-prime at every session boundary, decisions re-explained from memory, context decaying with every new chat.
- Single-shot 1M context: zero re-primes; every decision from hour three still loaded at hour thirty.
The big number is not the token count. The big number is the zero in "zero re-primes." Every session boundary in classic AI development is a place where context decays, decisions get re-explained badly, and the model gradually starts contradicting itself. The 1M context window does not make the model smarter. It removes the failure mode where the model forgets what it agreed to four hours ago.
That said: 1M context is not a license to vibe code longer. If your first three hours are bad, the next twenty-seven are bad too, with full fidelity. The window is a multiplier. It multiplies whatever you put in.
The Moments I Said No
The cleanest signal that you are vibe engineering instead of vibe coding is how often you reject the model's suggestions. Mine, for this project: the multi-bank adapter abstraction, cloud backup, the onboarding wizard, a categories editor, coaching-style budgets. Each one checked against the "not building" list and declined.
Every "no" was a feature. Saying no is engineering. Saying yes to everything the model proposes is dictation.
Where the Plan Earned Its Keep
The structural decisions made in hours 1–3 paid off everywhere in hours 4–48. A few of the moments where I was glad past-me had decided:
- `Account.metadata_json` instead of more columns. Credit cards have a credit limit (cupo) and statement cycle dates; funds have unit value and rentability; loans have rate and term. Different fields per type. Stuffing them in as columns produces a sparse, ugly schema; a typed JSON column kept the relational schema clean. Decided at hour two, never revisited.
- `source_file_hash` for dedup. Every import re-checks the SHA-256 against existing `Statement.source_file_hash`. Re-importing the same XLSX is a no-op. This is the kind of correctness property you cannot retrofit after the fact — it has to be in the schema on day one.
- Single signed `Transaction.amount`. Two-column debit/credit schemas are double-entry accounting legacy that does not suit a personal-app DB. A single signed decimal is unambiguous and simpler. Decided once, propagated to every metric query for free.
- `classification_history_json` as an append-only log. Every re-classification appends to the transaction's own audit trail. Five years from now, the user can ask "why is this categorized as Mercado?" and get a chain of decisions, including which prompt version classified it and which user correction overrode it. The cost: one JSON column. The value: auditability forever.
- Four-layer matching algorithm. High-confidence auto-match → probable match (review) → LLM arbitration → user confirmation. Designed once, used by both internal transfer detection and loan payment matching. One algorithm, two callers, zero duplication.
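The four-layer matching cascade can be sketched as a single dispatch function. The score thresholds here are illustrative assumptions, not Mintroot's shipped values:

```kotlin
// Hypothetical dispatch for the four-layer matching cascade.
enum class MatchOutcome { AUTO_MATCH, PROBABLE_REVIEW, LLM_ARBITRATION, USER_CONFIRM }

fun matchLayer(score: Double): MatchOutcome = when {
    score >= 0.95 -> MatchOutcome.AUTO_MATCH       // layer 1: high-confidence auto-match
    score >= 0.80 -> MatchOutcome.PROBABLE_REVIEW  // layer 2: probable, flagged for review
    score >= 0.50 -> MatchOutcome.LLM_ARBITRATION  // layer 3: ask the model to arbitrate
    else          -> MatchOutcome.USER_CONFIRM     // layer 4: user is the editor of last resort
}
```

Because the cascade is one function, both callers (transfer detection and loan payment matching) share it for free, which is the "zero duplication" point above.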
None of these were decisions the model made for me. They were decisions I made with the model. That distinction is the entire point.
Useful Is the Only Bar
Here's the part I want to land hard. It does not matter whether Mintroot is useful to the community. It is useful to me. I bank with Bancolombia. I have real XLSX files. I have real money to track. I will use this app every month. If three other people on the internet also use it, that is gravy. If zero do, the project still succeeded the moment I imported my first statement and saw a year of spending categorized in 90 seconds.
This is the part that gets lost in the "AI lets you ship faster" discourse. Speed is not the unlock. Speed is just the visible part. The actual unlock is that AI lowers the activation energy for solving real, specific, personal problems — the long tail of pain points that were never going to justify a startup but absolutely justify a weekend.
Vibe coders build apps that already exist, badly, because the model defaults to averages. Vibe engineers build apps that don't exist yet, well enough to use, because they brought the problem and used the model for leverage.
Did you name the pain in one sentence? Did you write the "not building" list before generating anything? Did you sign off on the architecture yourself? If the answer to any of those is no, you did not engineer anything. You generated something.
What This Looks Like as a Practice
If you want to try this on your next project, the loop is small enough to write on a sticky note:
- Find a problem you have this week. Not a market. A pain. Your own, preferably. Name the victim and the irritation in one sentence.
- Write the "not building" list before the "building" list. The scope you reject is the scope you ship. Lock constraints early; the model will respect them if they exist.
- Decide the architecture in conversation, but sign off yourself. Sketch the modules, the boundaries, the state machines, the data model. The model proposes; you dispose. Three hours of design saves three days of regret.
- Only after the first three: generate code. With full context. With taste applied to every accept. Treat every "do you want me to also..." as a budget decision.
- Reject the seductive abstraction. Two callers is a coincidence; wait for three. Premature factories are how solo projects collapse.
- Ship the smallest thing that works on Monday. Iterate from there or don't. Either is fine. Useful beats complete every time.
The 1M context window is a force multiplier on this loop. It is not a substitute for it. If you walk in without a plan, you walk out with a longer pile of code that still does not solve your problem — just generated with more continuity.
The Pitch
Mintroot exists because I needed it, scoped it ruthlessly, structured it before generating it, and then used Opus 4.7's 1M context window to hold the whole thing in one conversational head for two days. Not because I am special. Because the four moves are simple and the model is patient.
Vibe coding produces artifacts. Vibe engineering produces solutions.
The model will happily do either. The choice is yours, every conversation, every accept, every "do you want me to also..." that lands in your chat. Most of those should be no. Most of the rest should be later. The few that survive that filter are the ones worth typing yes to.
Go solve something specific. Bring the problem. Let the model do the typing.
Mintroot is open source on GitHub. MIT licensed. Bring your own Gemini key. Built in 48 hours, one conversation, zero vibes.