Mohammad Jalili | Mohism

Two Weeks, a Hackathon, and a Multi-Model Trading Agent in Rust

What we built at the Agora Agents Hackathon, why we used three AI models instead of one, and whether Rust in a 2-week sprint was the right call.

Every automated portfolio tool we’d seen was either a black box you hand your money to, or a rules-based bot that does exactly what you configured it to do three months ago. My friend Mahdi Zarrintareh and I wanted something in between: an agent that reads the market, designs a rebalance, explains its reasoning, and then waits for you to say yes or no before touching anything. Human-in-the-loop not as a safety disclaimer, but as the actual product.

That was the idea we took into the Agora Agents Hackathon: two weeks, Circle’s blockchain infrastructure, $50k in prizes.

We called it Aegis.

What it is

Aegis manages a USDC portfolio across two blockchains (Arc and Base). You set a goal: a risk profile (conservative, balanced, growth), a time horizon, and an objective. Then it runs on auto-pilot. The agent reads price data and market conditions every few minutes, classifies the current regime, and rebalances toward your target allocation on its own.

It only stops to ask you when a guardrail trips: a constitution check, a single-asset concentration cap, a stable-reserve floor, or an active depeg. Outside those, it executes without interrupting you. You can flip back to approving every move at any time.
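
To make the pause conditions concrete, here is a minimal sketch of a guardrail check. It is not Aegis's actual code: the type names, the `Limits` fields, and the choice to apply the concentration cap only to non-stable assets are all assumptions made for illustration.

```rust
/// Reasons the agent would stop and ask for approval (hypothetical names).
#[derive(Debug, PartialEq)]
pub enum Guardrail {
    ConstitutionViolation,
    ConcentrationCap,
    StableReserveFloor,
    ActiveDepeg,
}

/// Illustrative limits; the real thresholds are not given in this post.
pub struct Limits {
    pub max_single_asset_weight: f64,
    pub min_stable_weight: f64,
}

pub struct Proposal<'a> {
    /// Target weights as (symbol, fraction of portfolio).
    pub target_weights: &'a [(&'a str, f64)],
    pub stable_symbols: &'a [&'a str],
    pub violates_constitution: bool,
    pub depeg_active: bool,
}

/// Returns every guardrail the proposal trips; an empty list means
/// the agent may execute without interrupting the user.
pub fn tripped_guardrails(p: &Proposal, limits: &Limits) -> Vec<Guardrail> {
    let mut stable_weight = 0.0;
    let mut over_cap = false;
    for &(symbol, weight) in p.target_weights {
        if p.stable_symbols.iter().any(|s| *s == symbol) {
            stable_weight += weight;
        } else if weight > limits.max_single_asset_weight {
            // Assumption: the cap is only enforced on non-stable assets.
            over_cap = true;
        }
    }

    let mut tripped = Vec::new();
    if p.violates_constitution {
        tripped.push(Guardrail::ConstitutionViolation);
    }
    if over_cap {
        tripped.push(Guardrail::ConcentrationCap);
    }
    if stable_weight < limits.min_stable_weight {
        tripped.push(Guardrail::StableReserveFloor);
    }
    if p.depeg_active {
        tripped.push(Guardrail::ActiveDepeg);
    }
    tripped
}
```

Returning the full list rather than a single boolean means the approval prompt can tell the user exactly why the agent stopped.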

The moment that made the concept click for me was around day eight. One of our test users set a balanced goal and went to sleep. The agent caught a regime shift at 2am, proposed a defensive rebalance, and (because we’d defaulted to auto-pilot by then) just ran it. Nobody approved it. Nobody was awake. The decision log showed the model, the regime, the confidence score. It was just there in the morning. That’s the version of the product we wanted to build.

Every decision records which model ran it, what regime it saw, and its confidence level so you can audit the full history. The agent also has a memory: it compresses each 24-hour period into a summary it carries forward, so it’s not making decisions in a vacuum every time. It can see its own past reasoning and whether the moves it proposed actually played out the way it expected.

That’s the product. Now for the parts that were more interesting to build.

Three models instead of one

The most deliberate technical decision was using different models for different tasks in the agent loop. Here’s how it breaks down:

The regime classifier runs on every price tick. Its job is to look at volatility, correlation, and drawdown data and output one of three labels: RiskOn, Neutral, RiskOff. This is essentially a structured JSON classification task. It doesn’t need deep reasoning; it needs to be fast and cheap. We use Haiku for this, which at $0.25 per million input tokens is practically free to run continuously.
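
Because the classifier's output is one of three fixed labels, it maps naturally onto a closed enum. The sketch below assumes the label has already been pulled out of the JSON response as a string. The `next_regime` fallback (keep the previous regime when the output is unparseable) is a hypothetical policy, not something described above.

```rust
use std::str::FromStr;

/// The three regime labels named in the post.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Regime {
    RiskOn,
    Neutral,
    RiskOff,
}

impl FromStr for Regime {
    type Err = String;

    /// Accepts exactly the three known labels and rejects anything else,
    /// so a malformed model response cannot turn into a made-up regime.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim() {
            "RiskOn" => Ok(Regime::RiskOn),
            "Neutral" => Ok(Regime::Neutral),
            "RiskOff" => Ok(Regime::RiskOff),
            other => Err(format!("unknown regime label: {:?}", other)),
        }
    }
}

/// Hypothetical policy: if the model output cannot be parsed,
/// keep the previous regime rather than guessing.
pub fn next_regime(previous: Regime, model_output: &str) -> Regime {
    model_output.parse().unwrap_or(previous)
}
```

With a parser this strict, the worst a confused classifier can do is leave the regime unchanged for one tick.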

The strategist only runs when the regime has shifted or the portfolio has drifted past a threshold. It’s doing the actual work: reading the current allocation, the historical performance, the market regime, any yield opportunities, and proposing target weights. This needs real reasoning capability. We route this to Opus or Sonnet depending on the complexity of the portfolio state.
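
The trigger condition can be sketched in a few lines. This assumes drift is measured as the largest absolute gap between an asset's current weight and its target weight. That definition, and the function names, are illustrative guesses rather than the actual Aegis logic.

```rust
/// Looks up a symbol's weight, treating a missing asset as 0.
fn weight_of(weights: &[(&str, f64)], symbol: &str) -> f64 {
    weights
        .iter()
        .find(|(s, _)| *s == symbol)
        .map(|(_, w)| *w)
        .unwrap_or(0.0)
}

/// Largest absolute difference between current and target weight
/// across every asset that appears on either side.
pub fn max_drift(current: &[(&str, f64)], target: &[(&str, f64)]) -> f64 {
    current
        .iter()
        .chain(target.iter())
        .map(|(s, _)| *s)
        .map(|s| (weight_of(current, s) - weight_of(target, s)).abs())
        .fold(0.0, f64::max)
}

/// The expensive strategist call is only worth making when the regime
/// changed or the portfolio has drifted past the threshold.
pub fn should_run_strategist(regime_changed: bool, drift: f64, threshold: f64) -> bool {
    regime_changed || drift > threshold
}
```

For example, a portfolio at 50% USDC and 50% ETH against a target of 40% USDC, 45% ETH, and 15% BTC has a maximum drift of 0.15, driven by the BTC position it does not hold yet.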

The critic gets the strategist’s proposal and attacks it. Does this increase concentration risk? Is there a better allocation given the tax lot situation? Is this actually consistent with the stated goal? If it finds something meaningful, the strategist gets one revision pass. We use GPT-5 here. Using a different model family for the critic wasn’t just a cost decision. There’s a version of this where the strategist and critic are the same model arguing with itself, and we were skeptical that would surface genuine disagreement.
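
The propose, critique, revise-once flow can be sketched with two traits. Everything here is hypothetical scaffolding: in practice each trait would wrap an API client for a different model family, and proposals would be structured data rather than plain strings.

```rust
/// The critic either accepts the proposal or raises one objection.
pub enum Critique {
    Approve,
    Object(String),
}

/// Hypothetical interface over the strategist model.
pub trait Strategist {
    fn propose(&self, context: &str) -> String;
    fn revise(&self, context: &str, proposal: &str, objection: &str) -> String;
}

/// Hypothetical interface over the critic model (a different model family).
pub trait Critic {
    fn review(&self, context: &str, proposal: &str) -> Critique;
}

pub struct Outcome {
    pub proposal: String,
    pub revised: bool,
}

/// One proposal, one critique, and at most one revision pass.
/// The revised proposal is not sent back to the critic, which bounds
/// both cost and latency per decision.
pub fn propose_with_review(
    strategist: &dyn Strategist,
    critic: &dyn Critic,
    context: &str,
) -> Outcome {
    let first = strategist.propose(context);
    match critic.review(context, &first) {
        Critique::Approve => Outcome { proposal: first, revised: false },
        Critique::Object(objection) => Outcome {
            proposal: strategist.revise(context, &first, &objection),
            revised: true,
        },
    }
}
```

Keeping the two roles behind separate traits is also what makes swapping the critic's model family a one-line change at the call site.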

This routing adds complexity. You’re managing three different API clients, handling different response formats, and the orchestration logic in the agent module ends up being the most sensitive code in the system. But the alternative is asking one model to be cheap, fast, and deeply reasoning at the same time, and that’s not how any of these models work.

The Rust backend in a two-week sprint

Yes, we wrote the backend in Rust. It’s an Axum server with SQLx and PostgreSQL. The agent loop, the Circle API integrations, the SSE streaming endpoint, all of it.

The honest answer to “was this the right call” is: it depends on your definition of right. We didn’t move as fast in the first three days as we would have in Node or Python. On the other hand, the type system caught several things that would have been runtime errors in a language that doesn’t enforce ownership or null safety at compile time. Whether those saves were worth the slower initial velocity is genuinely hard to measure.

What I can say is that the real-time parts worked on the first try. The SSE endpoint streams price ticks, regime flips, agent decisions, and rebalance status updates to the frontend. No race conditions, no mysterious state divergence. When the agent loop and the streaming server are running concurrently, Rust’s ownership rules and its Send and Sync traits make sure you can’t accidentally share mutable state across them in ways that cause data races.
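
As an illustration of that pattern, here is a standard-library-only sketch in which an agent-loop thread hands events to a streaming side over a channel instead of sharing mutable state. The real backend uses Axum. The event type, the event names, and the payload format below are assumptions; only the `event:` / `data:` framing with a blank line after each message comes from the SSE format itself.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical event type covering the four kinds of update mentioned above.
#[derive(Debug, Clone)]
pub enum AgentEvent {
    PriceTick { symbol: String, price_usd: f64 },
    RegimeFlip { to: String },
    Decision { summary: String },
    RebalanceStatus { status: String },
}

/// Formats one event as a Server-Sent Events frame:
/// an `event:` line, a `data:` line, then a blank line.
pub fn to_sse_frame(event: &AgentEvent) -> String {
    let (name, data) = match event {
        AgentEvent::PriceTick { symbol, price_usd } => {
            ("price_tick", format!("{} {}", symbol, price_usd))
        }
        AgentEvent::RegimeFlip { to } => ("regime_flip", to.clone()),
        AgentEvent::Decision { summary } => ("decision", summary.clone()),
        AgentEvent::RebalanceStatus { status } => ("rebalance_status", status.clone()),
    };
    format!("event: {}\ndata: {}\n\n", name, data)
}

/// Runs a stand-in agent loop on its own thread and collects the frames
/// the streaming side would write to the client.
pub fn stream_events(events: Vec<AgentEvent>) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<AgentEvent>();

    // Each event is moved into the channel: the producer thread gives up
    // ownership, so both sides can never mutate the same value.
    let agent_loop = thread::spawn(move || {
        for event in events {
            if tx.send(event).is_err() {
                break; // receiver went away; stop producing
            }
        }
        // `tx` is dropped here, which ends the receiver's iterator.
    });

    let frames = rx.iter().map(|e| to_sse_frame(&e)).collect();
    agent_loop.join().expect("agent loop thread panicked");
    frames
}
```

Trying to keep using an event after sending it is a compile error, which is the kind of mistake that would otherwise show up as a subtle bug at runtime.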

For a hackathon, where you’re moving fast and sleep-deprived, that forced correctness has value. You’re not debugging thread safety issues at 1am.

What Circle’s stack actually felt like to use

This hackathon was specifically built around Circle’s infrastructure: their L1 blockchain (Arc), cross-chain transfer protocol (CCTP V2), smart account wallets, and Paymaster for gas abstraction.

The Paymaster was genuinely useful. Users pay gas in USDC rather than needing a native gas token. For a stablecoin-native product, this matters: your user has USDC, they want to manage USDC, and the UX shouldn’t require them to acquire a separate token just to pay for transactions.

CCTP V2’s “Fast Transfer with Hooks” for moving USDC between Arc and Base was the part that surprised us most. Cross-chain transfers are usually the part of a DeFi product that introduces the most latency and error surface. The hook mechanism lets you attach logic to the transfer completion, which made the executor code significantly cleaner than we expected.

The wallet setup (Circle’s modular smart accounts) took longer than expected to wire up correctly. The documentation covers the happy path well. The basic “create wallet, fund it, sign a transaction” flow is clear and the examples work. What it doesn’t cover is what happens when you’re building on top of it. Account initialization with an existing signer, recovery flows, the correct order of operations when a smart account hasn’t been deployed yet. None of that was documented. We pieced it together from SDK source code and one helpful thread in the Discord. If you’re building with Circle’s smart accounts, budget time for that gap. The core primitives are solid; the integration surface is where you’ll lose hours.

What running it for real found

Three people used Aegis end to end on the Arc and Base testnets, each with their own Circle wallet. Not simulated sessions: they onboarded, set a goal, and ran real multi-leg plans. CCTP bridge plus a USDC swap, gas paid through Paymaster, actual on-chain state.

The count is small. What matters is that running it with real users changed the product in ways code review wouldn’t have. The original design required approval for every move. Users found it tedious. We flipped the default to auto-pilot and moved manual approval to opt-in. That’s the version that shipped.

Running it also surfaced two execution bugs. A stale balance read was oversizing cross-chain transfers: the agent was sizing a bridge leg against a balance that hadn’t reflected a recent move yet. And a curation gap in the token registry was letting the agent propose assets with no live on-chain liquidity. Both were real failures, not hypotheticals, and both got fixed. That’s the argument for putting real transactions in front of real wallets as early as possible, even when the user count is three.
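
One way to guard against the stale-balance problem is to size each leg against the confirmed balance minus any moves still in flight. This is a sketch of that idea, not necessarily how the bug was fixed in Aegis. Amounts are in USDC base units (6 decimals), and the function names are hypothetical.

```rust
/// Balance that is actually safe to spend: the last confirmed on-chain
/// balance minus outflows that were submitted but are not reflected yet.
pub fn spendable(confirmed_balance: u64, pending_outflows: &[u64]) -> u64 {
    let pending = pending_outflows
        .iter()
        .fold(0u64, |acc, amount| acc.saturating_add(*amount));
    confirmed_balance.saturating_sub(pending)
}

/// Caps the desired bridge amount at the spendable balance so a leg
/// can never be sized against funds that are already on the move.
pub fn size_bridge_leg(desired: u64, confirmed_balance: u64, pending_outflows: &[u64]) -> u64 {
    desired.min(spendable(confirmed_balance, pending_outflows))
}
```

With a confirmed balance of 1,000 USDC and a pending 400 USDC move, a desired 800 USDC bridge leg is cut down to 600 USDC instead of being oversized.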

What we’d do differently

The agent’s memory compression is too aggressive. We summarize 24 hours into a single structured document, which means the agent loses detail about what actually happened during that period. A more layered approach, keeping recent decisions at higher fidelity and compressing older history more aggressively, would give it better short-term recall.
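
A rough sketch of what that layering could look like: decisions inside a recent window are kept verbatim, and anything older is folded into one digest per 24-hour day. The types are hypothetical, and the digest here is just a decision count standing in for a model-written summary.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Decision {
    /// Hours since the agent started (illustrative clock).
    pub hour: u64,
    pub summary: String,
}

#[derive(Debug, PartialEq)]
pub enum MemoryEntry {
    /// Recent decision kept at full fidelity.
    Full(Decision),
    /// Older history compressed to one entry per 24-hour day.
    DayDigest { day: u64, decisions: usize },
}

/// Splits the decision log into compressed old days followed by
/// uncompressed recent decisions.
pub fn layer_memory(log: &[Decision], now_hour: u64, recent_hours: u64) -> Vec<MemoryEntry> {
    let cutoff = now_hour.saturating_sub(recent_hours);
    let mut digests: BTreeMap<u64, usize> = BTreeMap::new();
    let mut recent = Vec::new();

    for decision in log {
        if decision.hour >= cutoff {
            recent.push(MemoryEntry::Full(decision.clone()));
        } else {
            *digests.entry(decision.hour / 24).or_insert(0) += 1;
        }
    }

    // BTreeMap iterates in key order, so digests come out oldest day first.
    let mut memory: Vec<MemoryEntry> = digests
        .into_iter()
        .map(|(day, decisions)| MemoryEntry::DayDigest { day, decisions })
        .collect();
    memory.extend(recent);
    memory
}
```

The `recent_hours` window is the knob that trades prompt size against short-term recall.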

The frontend is doing too much state management. We used Zustand, which is fine, but the real-time state (SSE events) and the server state (React Query) ended up partially duplicating each other in ways that created some confusing update sequences. If we were starting over, the real-time layer would own its own store that’s read-only to the UI.

We also left tax-loss harvesting and USYC yield integration as planned features. Given that we had 1099-DA export in the architecture diagrams and zero of it implemented, I’d scope more ruthlessly next time and cut earlier.

The part that held up

The transparency held up. Every rebalance, whether auto-executed or manually approved, shows the model that ran it, the regime it saw, and its confidence. That audit trail was the thing people actually wanted. Not control over every transaction, but the ability to see what the agent was thinking and why it acted.

Auto-pilot turned out to be the right default once we stopped insisting that approval was a feature and started treating it as friction. The agent doing its job quietly, with a full decision log you can inspect anytime, is more useful than an agent that asks permission for every move. People will trust a system that shows its work. They won’t use one that makes them approve it.

We’re still building. The tax harvesting layer, the USYC yield sleeve, more layered memory. There’s a real product here and the hackathon was two weeks of it.

