These days, an interesting experiment is drawing attention both in China and abroad — nof1.ai has launched a challenge to test how well different AI models can trade and invest in the cryptocurrency market. Right now, six top AI models are actually trading with real money to see who can earn the most profit. It’s not a simulation — it’s real money, real trades.
DeepSeek took the lead early on but has since been surpassed by Alibaba’s Qwen 3 Max, which is up over $3,000. The rules are simple: each AI starts with $10,000 of real capital, analyzes six major crypto assets such as Bitcoin and Ethereum, and makes its own buy/sell and leverage decisions. Whoever ends up with the most money wins.
The fascinating part is that these are general-purpose AI models — not specifically trained for trading. This makes it a true test of whether large models can develop real-world investment intuition. Each AI acts independently, without coordination or sharing strategies. If AI can consistently make money, it could eventually manage investment pools, assist DAOs in decision-making, and spot arbitrage opportunities automatically.
- Alpha Arena is the first live-money benchmark to evaluate AI’s investing ability. More info: Nof1 / Alpha Arena.
- Each model receives $10,000 in real capital to trade crypto perpetual contracts on Hyperliquid. This article analyzes their trading styles, risk profiles, and performance based on live transaction data.
DeepSeek was leading at one point but has since been overtaken by Qwen 3 Max. GPT-5 trades too frequently, almost like a “restless scalper,” paying heavily in transaction fees. Once again, overtrading tends to accelerate losses: in markets, fewer trades often mean lower risk.
I can’t help but wonder — what if the AI simply did nothing? If it just held that $10,000, it would neither lose nor gain. But perhaps the prompt forbids pure “HODL” behavior — otherwise, it wouldn’t be much of a competition.
It seems even AI can’t escape the human trader’s curse: overtrading in search of stability.
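The fee drag from overtrading is easy to quantify. Here is a minimal sketch; the 0.05% taker fee, trade count, and notional sizes are illustrative assumptions, not Hyperliquid’s actual fee schedule:

```python
# Sketch: cumulative fee drag from frequent trading.
# The 0.05% per-side taker fee and trade sizes are illustrative
# assumptions, not Hyperliquid's actual fee schedule.

def fee_drag(notional_per_trade: float, trades: int,
             taker_fee: float = 0.0005) -> float:
    """Total fees paid across `trades` executions at a flat taker rate."""
    return notional_per_trade * taker_fee * trades

# A scalper firing 40 trades a day at $5,000 notional each:
daily = fee_drag(5_000, 40)
print(f"Daily fee drag: ${daily:.2f}")                           # $100.00 per day
print(f"As a share of a $10,000 account: {daily / 10_000:.1%}")  # 1.0%
```

At these assumed numbers the scalper bleeds roughly 1% of the account per day before making a single directional call, which is why frequent small losses compound so quickly.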
Competition Rules and Objectives
- Starting Capital: $10,000 real funds per model
- Market: Crypto perpetuals on Hyperliquid
- Objective: Maximize risk-adjusted returns with full transparency
- Transparency: All model trades and outputs are public
- Autonomy: Each AI must independently generate alpha, manage position sizing, timing, and risk
- Duration: Season 1 runs until November 3, 2025, 5:00 PM EST
Current Standings (as of Oct 23)
| Model | Approx. Net Value | Performance Summary |
|---|---|---|
| Qwen 3 Max | $14,287.91 | Leader with strong timing and position management |
| DeepSeek V3.1 Chat | $12,766.00 | Stable, systematic trading with good risk control |
| Claude 4.5 Sonnet | $8,734.66 | Moderate trend-following style, balanced frequency |
| Grok 4 | $8,500.46 | Aggressive directional trading; large single-trade volatility |
| Gemini 2.5 Pro | $3,607.77 | Frequent misjudgment of market direction; high drawdowns |
| GPT-5 | $2,714.07 | High-frequency scalping with continuous losses; weak risk control |
Highlighted Live Trades and Analysis
- Grok 4 — Long BNB (10/23 16:11)
- Entry $1,076.9, Exit $1,143; Quantity 7.07; Notional $7,614 → $8,081
- Holding time 136h36m; Net P&L: +$463.13
- Analysis: Long-term holding with eventual profit shows Grok’s patience and trend detection, though capital utilization and drawdown risk remain concerns.
- GPT-5 — Multiple Shorts/Scalps (10/23, various)
- Example: Short BNB at $1,103 → $1,124.6 (-$40.14); other ETH, SOL, and BTC trades also slightly negative
- Analysis: GPT-5’s style is short-term and high-frequency; execution is fast but lacks conviction or trend-following. Frequent small losses compound over time.
- Claude 4.5 Sonnet — BNB / ETH / SOL (10/23 Various)
- BNB Long +$175.62; small losses in ETH and SOL trades
- Analysis: Claude prefers gentle trend-following with conservative position sizing and stop rules.
- Gemini 2.5 Pro — Multiple Shorts/Longs, Consistent Small Losses
- Across BTC, BNB, DOGE, SOL, many small-to-mid losses
- Analysis: Likely relying on short-term momentum or mean reversion models that fail under high volatility.
- DeepSeek V3.1 Chat — Long XRP (10/22)
- 61h38m holding, -$455.66 net loss
- Analysis: Despite loss, DeepSeek demonstrates strong consistency and position control — loss due to directional exposure in volatile conditions.
AI Strategy Types and Weaknesses
- High-Frequency Short-Term (e.g. GPT-5)
- Pros: Reacts quickly to micro-movements
- Cons: Vulnerable to noise, high cost from fees and slippage
- Mid/Long-Term Trend (e.g. Grok 4, Qwen 3 Max)
- Pros: Captures major moves and favorable risk/reward
- Cons: Requires patience, exposed to drawdowns and capital lock-up
- Systematic and Controlled (e.g. DeepSeek V3.1)
- Pros: Smooth equity curve, strong risk control
- Cons: Can be too conservative to exploit fast-moving opportunities
- Statistical / Momentum (e.g. Gemini 2.5 Pro)
- Pros: Effective in structured or trending markets
- Cons: Weak in chaotic or noisy markets; repeated stop losses eat capital
Why Use Real Markets as a Training Ground?
- Markets are dynamic, adversarial, and ever-evolving — as AI improves, the challenge itself becomes harder, driving continuous progress.
- Real money introduces genuine friction: fees, slippage, capital usage, and risk constraints that simulations ignore.
- Human behavior, emotion, and randomness make markets the ultimate test of adaptive intelligence.
Improvement Directions and Research Insights
- Enhanced Risk Management: Layered stop-losses, dynamic position sizing (based on volatility), and drawdown-triggered limits.
- Hybrid Strategy: Combine short-term signals with long-term trend logic via multi-agent or hierarchical decision systems.
- Online Learning: Continual fine-tuning using self-generated data under strict anti-overfitting rules.
- Stress Testing: Use synthetic “black swan” events to evaluate model robustness under tail-risk scenarios.
- Transaction Cost Modeling: Incorporate fees, slippage, and execution delays directly into reward functions.
Conclusion: Capital Allocation as the True Test of Intelligence
- Alpha Arena is more than a contest of profit — it’s a live experiment in defining intelligence through capital allocation.
- The early results suggest that patience, risk control, and noise filtering define practical investment intelligence.
- For those exploring AI-driven investing, Alpha Arena offers a rare, transparent, and high-stakes environment for research.
- For collaborations and hiring, visit Nof1 / Alpha Arena.
Selected Trade Logs (Excerpt, Reverse Chronological)
| Time (UTC) | Model | Asset | Direction | Entry → Exit | Qty | Notional | Duration | Net P&L |
|---|---|---|---|---|---|---|---|---|
| 10/23 16:11 | Grok 4 | BNB | Long | $1,076.9 → $1,143 | 7.07 | $7,614 → $8,081 | 136h36m | $463.13 |
| 10/23 16:10 | GPT 5 | BNB | Short | $1,103 → $1,124.6 | -1.81 | $1,996 → $2,036 | 7h35m | -$40.14 |
| 10/23 15:20 | Claude 4.5 | SOL | Long | $190.16 → $188.4 | 37.02 | $7,040 → $6,975 | 53m | -$70.76 |
| 10/23 14:10 | GPT 5 | ETH | Long | $3,891.1 → $3,834.5 | 1.40 | $5,448 → $5,368 | 4h45m | -$82.06 |
| 10/22 22:39 | DeepSeek V3.1 | XRP | Long | $2.4666 → $2.3397 | 3,542 | $8,737 → $8,287 | 61h38m | -$455.66 |
| 10/22 22:11 | Grok 4 | ETH | Long | $3,851.2 → $3,724.4 | 5.06 | $19,487 → $18,845 | 118h33m | -$657.41 |
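The logged figures in the table above can be sanity-checked from entry, exit, and quantity; the gap between gross and net P&L is the implied trading cost (an inference from the rows, not a figure published by nof1):

```python
# Sanity-checking a logged trade. Gross P&L follows from entry, exit,
# and quantity; the gap to the logged net P&L is the implied
# fee/slippage cost (inferred, not from nof1's data).

def gross_pnl(entry: float, exit_price: float, qty: float) -> float:
    """Gross P&L of a position; a negative qty denotes a short."""
    return (exit_price - entry) * qty

# Grok 4 long BNB: entry $1,076.9, exit $1,143, qty 7.07, logged net +$463.13
g = gross_pnl(1076.9, 1143.0, 7.07)
print(round(g, 2))           # gross P&L, about $467.33
print(round(g - 463.13, 2))  # implied fees/slippage, about $4.20
```

Running the same check on the GPT-5 BNB short (entry $1,103, exit $1,124.6, qty -1.81) gives a gross loss of about $39.10 against the logged -$40.14, again roughly a dollar of implied costs.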
Grok’s trading style is the most aggressive of all. Its drawdowns are brutal — when the market turns against Grok, it doesn’t back down. It keeps using extremely high leverage, sometimes even going 20x long, doubling down on volatility.
The worst performers so far? GPT-5 and Gemini.
Once all these large models have been running for a full month, the data will become far more meaningful.
Just yesterday, these AI traders were the darlings of the crypto market. Today, they’ve all crashed.
In Alpha Arena, each AI model is given $10,000 of real capital to trade freely in real markets — fully autonomous, no human intervention. After just two days, DeepSeek was leading the pack with over +40% gains, sitting comfortably at #1.
But early this morning, the market suddenly tanked. The AIs didn’t react in time — they just kept holding on stubbornly, and all got trapped in heavy losses. DeepSeek, the former champion, lost 31% in a single day, while even the usually steady Qwen 3 Max dropped 20%.
What went wrong?
When the competition started on the 18th, the models happened to enter at a market low. The best performers used 10–15x leverage, riding a strong uptrend. Seeing the rally, most AIs piled into long positions. But when the market suddenly reversed, the AIs, unable to read news or sense sentiment shifts, simply executed their algorithms mechanically, without timely stop-losses. With leverage that high, even a small adverse move can trigger liquidation.
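The leverage math behind those liquidations is unforgiving. A back-of-envelope sketch, which ignores maintenance margin, funding, and fees (so real liquidation thresholds are somewhat tighter):

```python
# Back-of-envelope liquidation math. This ignores maintenance margin,
# funding, and fees (a simplification), so real liquidation thresholds
# on any exchange are somewhat tighter than these figures.

def margin_wipe_drop(leverage: float) -> float:
    """Adverse price move that erases the full margin of a long at `leverage`x."""
    return 1.0 / leverage

for lev in (5, 10, 15, 20):
    print(f"{lev:>2}x long: a ~{margin_wipe_drop(lev):.0%} drop wipes the margin")
```

At the 10–15x leverage the leaders were running, a routine 7–10% pullback is enough to erase the entire position, which is consistent with the simultaneous crash described above.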
And then there’s Gemini, which made things worse by trading too frequently — its losses from transaction fees alone were substantial.
In the end, all six models crashed almost simultaneously. Still, this is only the third day of the experiment. The timeframe is too short — over the long run, it’s still anyone’s game.
What’s fascinating about this project is that it forces AIs to fight real battles — exposing weaknesses that static benchmarks never reveal. Who adapts fastest under pressure? Who breaks first when volatility spikes? And the ultimate question — would you trust an AI to trade your money?
This is arguably the first product to combine AI, Crypto, and Web3 in one arena, and it has captured global attention. With just $60,000 and two weeks of work, the team built a world-class platform where six top Chinese and Western AI models trade autonomously, 24/7, in real time. It is addictive: you can’t help checking who’s winning or losing. If they added a prediction-market feature where users could bet on outcomes, it would go viral instantly. Whoever the product manager is, they are an absolute genius.
If someone turns this into an actual trading tool, I’d invest right away. Rumor has it, someone in China is already open-sourcing it.
DeepSeek, this is your home turf — you know this game.
Real-money trading has its own kind of thrill — pure adrenaline.
At the bottom of the leaderboard are GPT-5 and Gemini, each down around $3,000 in four days.
On Hyperliquid, the AIs are trading crypto perpetuals fully on-chain — transparent and traceable. In the end, whoever earns the most wins.
Although there’s no API access to see how the models “think” or analyze trades internally, you can still view every transaction. Each model behaves like a trader with a distinct personality.
DeepSeek Chat v3.1 trades like a disciplined long-biased strategist — calm and steady, no high-frequency noise. Grok 4 embraces volatility and can stomach big swings. Qwen remains conservative — steady but flat. GPT-5 and Gemini 2.5 Pro, on the other hand, are like hyperactive day traders — overtrading, countertrend, stumbling in and out, and bleeding money fast.
In the past, we thought AI’s biggest achievements were writing papers, generating images, making videos, or coding. But all of those happen in sterile, predictable environments. The crypto market is nothing like that — it’s a live, zero-sum battlefield.
Financial markets are the ultimate world-modeling engine, and the only benchmark that gets harder as AI gets smarter.
In markets, the logic is simple: volatility, reaction, punishment, reward. The next generation of AI won’t be judged by who labels data better but by three new abilities:
- How fast it interprets volatility
- How well it balances risk
- How quickly and accurately it corrects mistakes
That’s the new benchmark, and it changes the game: new rules, new standards, and a new era that will redefine how we evaluate AI systems and their real-world intelligence.
The real market is the ultimate test of intelligence. Whether it’s a horse or a donkey — you’ll only know once you run it. But tell me, would you dare let an AI trade your real money?
Post: AI Trading in Real Markets? Would you put your money on it?
–EOF (The Ultimate Computing & Technology Blog) —