2026-05-15 · cross-LLM analysis
Published today · xai-org/x-algorithm

X just open-sourced its algorithm.
Here is what three frontier LLMs see in it.

On May 15 2026, xAI published xai-org/x-algorithm — a Rust-first rewrite of X's recommendation system with a Grok-1 transformer at the core. We ran the release past GPT-5, Gemini 2.5 Pro, and Grok 4, then layered our own analysis. Below: what they all agreed on, where each one saw something different, and what it means for anyone selling X engagement data.

Codebase · 57% Rust, 43% Python, gRPC service boundaries
Model backbone · Grok-1 transformer ported into recsys ranking
Released model · 256-d, 2-layer, 3 GB toy via Git LFS; production weights withheld
Commit history · 2 commits; fresh release or aggressive squash
What is in the box

Five components, two languages, one production stack.

The repo ships five working modules, one of which is a reusable Rust trait framework. Models, training, and serving infra are not included — and that absence is the tell.

home-mixer · orchestration layer
gRPC endpoint that blends candidates, injects ads, applies brand-safety tracking.
phoenix · ML retrieval + ranking
Grok-1 transformer adapted for recsys. Multi-action prediction heads. Attention-masked for cacheable scoring.
grox · content understanding
Spam + PTOS enforcement classifiers. Probably the most policy-relevant module.
thunder · in-network store
Kafka-ingested feature store. Schema references included; producer/consumer implementations are not.
candidate-pipeline · reusable framework
Rust trait-based stages. Composable retrieval → filter → rank → mix.
What is missing · the actual IP
Full training pipelines, real Phoenix weights, production embedding indices, Kafka serving code, raw engagement data.
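The candidate-pipeline staging is easy to picture in miniature. A minimal Python sketch of the retrieval → filter → rank → mix flow (the repo's actual abstractions are Rust traits; every name and toy stage below is illustrative, not lifted from the code):

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    id: str
    score: float = 0.0

def run_pipeline(user_id, retrievers, filters, ranker, mixer):
    """Mirror the stage order: each retriever builds a pool, every
    filter prunes every pool, the ranker orders each pool, and the
    mixer blends the pools into one feed."""
    pools = [retrieve(user_id) for retrieve in retrievers]
    for f in filters:
        pools = [f(pool) for pool in pools]
    pools = [ranker(pool) for pool in pools]
    return mixer(pools)

# Toy stages: two sources, a spam filter, id-ordered ranking, flat mix.
follows   = lambda uid: [Candidate("b"), Candidate("a")]
trending  = lambda uid: [Candidate("spam"), Candidate("c")]
drop_spam = lambda pool: [c for c in pool if c.id != "spam"]
by_id     = lambda pool: sorted(pool, key=lambda c: c.id)
flatten   = lambda pools: [c for pool in pools for c in pool]

feed = run_pipeline("u1", [follows, trending], [drop_spam], by_id, flatten)
```

The appeal of trait-based stages is exactly this composability: any stage can be swapped without touching the orchestration.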
Where all three LLMs agreed

Five things GPT-5, Gemini, and Grok 4 all flagged independently.

We sent the same prompt to all three. These five points came back from every one of them, in different words but identical substance. That convergence is the signal.

01
Attention masking is the genuinely novel insight
Candidates cannot attend to each other in-batch. Scores become batch-agnostic, cacheable, and resistant to manipulation via batch composition.
3/3 flagged
02
"Eliminated every hand-engineered feature" is marketing
The Author Diversity Scorer is a post-ranking heuristic. Brand-safety coupling is another. The claim does not survive contact with the repo.
3/3 flagged
03
"End-to-end" is structurally hollow
No training pipelines. No data. No production indices. You cannot reproduce, audit, or verify anything material.
3/3 flagged
04
The multi-action weights are the audit target
The relative weights on like, reply, and click versus block, mute, and report decide whether the algorithm rewards outrage or thoughtful content. None are published.
3/3 flagged
05
Transparency as competitive pressure
Meta and TikTok must now either match the disclosure or explain why they will not. A reputational gambit aimed squarely at competitors.
3/3 flagged
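Point 01 is concrete enough to sketch. Assuming a sequence laid out as [user-history tokens | candidate tokens], the mask lets each candidate attend to the full history and to itself, never to another candidate. The layout and names here are our assumption, not taken from the repo:

```python
def candidate_mask(n_history: int, n_candidates: int) -> list[list[bool]]:
    """mask[q][k] is True when query position q may attend to key k.
    Sequence layout: [history tokens | candidate tokens]."""
    n = n_history + n_candidates
    mask = [[False] * n for _ in range(n)]
    for q in range(n_history):            # history: ordinary causal attention
        for k in range(q + 1):
            mask[q][k] = True
    for i in range(n_candidates):         # candidates: history + self only
        q = n_history + i
        for k in range(n_history):
            mask[q][k] = True
        mask[q][q] = True
    return mask

m = candidate_mask(n_history=3, n_candidates=2)
```

Because a candidate's row is identical no matter which other candidates share the batch, its forward pass, and therefore its score, is batch-agnostic. That is what makes per-(user, candidate) caching sound.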
Where they diverged

Three frontier models. Three distinctive lenses.

Same prompt, same repo, three different angles. Worth noticing what each chose to elevate — and where they all stayed quiet.

OpenAI GPT-5
via api.openai.com
Most thorough on governance
"Brand safety and PTOS classifier thresholds, error costs, and their coupling to ranking — false positives/negatives by topic or dialect. Robustness: feature missingness handling, spam defenses, and places a spammer could exploit candidate masking/caching."
policy · manipulation surface · enterprise lens
Google Gemini 2.5 Pro
via generativelanguage.googleapis.com
Most measured and political
"This is a talent acquisition play and a transparency gambit. It positions X/xAI as a serious engineering organization, using Rust and a unified Grok-based ML stack to attract top talent."
strategic framing · diplomatic · talent gambit
xAI Grok 4
via api.x.ai
Most direct — and self-critical
"256-dim / 2-layer model is a toy; the actual system runs on larger un-released weights. Phoenix still inherits Grok-1 inductive biases plus post-ranking diversity and brand-safety logic."
technical bluntness · parent-co self-critique · noted bias
Telling absence

None of the three frontier models touched the political / moderation-policy context — verified-account amplification, content-moderation rollback under Musk, hate-speech policy changes since 2024. All three stayed in pure ML/engineering critique. Either alignment training is making them shy on contested topics, or those features simply are not in this release. Worth knowing which.

Ultrathink — what the LLMs missed

Four deeper reads on what just shipped.

Beyond what the frontier LLMs converged on, four insights that change how to interpret the release.

01

The real story is architectural unification, not transparency.

xAI took Grok-1's transformer and made it the recommendation ranker's backbone. Every marginal dollar spent scaling Grok now improves both the LLM and the X feed. This is the same compute-flywheel play that TPUs gave Google and PyTorch gave Meta. If it works, "Grok improves → feed improves" becomes a structural advantage.

02

The release is audit-shaped without being auditable.

The 256-dim, 2-layer model is a runnable demo to silence "where's the code." The actual production system runs on weights that are not in the repo. You can inspect the architecture but you cannot validate any behavioral claim against the real model. Transparency theatre, not transparency.

03

The cacheable scoring is also an anti-manipulation defense.

Independent candidate scores mean you cannot game ranking through batch composition — no faking co-occurrence patterns to surface low-signal content. Quietly the most underrated piece of the design. Twitter has historically struggled with this exact attack class.
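Batch-agnostic scores are what make the cache trivial: the key is just (user, candidate), with no batch context in it. A toy illustration, where the scoring function is a deterministic stand-in for a model forward pass, not anything from the repo:

```python
from functools import lru_cache

@lru_cache(maxsize=100_000)
def score(user_id: str, post_id: str) -> float:
    # Stand-in for a Phoenix forward pass; deterministic per pair.
    return sum(map(ord, user_id + post_id)) % 100 / 100

# The same (user, post) pair yields the same score in any batch,
# so one cache entry serves every batch composition.
batch_a = [score("u1", p) for p in ("t1", "t9")]
batch_b = [score("u1", p) for p in ("t9", "t7", "t2")]
```

An attacker who controls which candidates co-occur in a request gains nothing: no batch composition can move any individual score.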

04

The multi-action heads are an actionable map.

Multi-action prediction with explicit negative weights for block/mute/report tells you exactly what the algorithm optimizes for and against. For anyone selling X engagement data, this rewrites how to score "quality" engagement vs raw volume. Suppression detection becomes possible.
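The blend amounts to a weighted sum over the heads' predicted probabilities. The weights below are invented for illustration; the real ratios are exactly what the repo withholds:

```python
# Hypothetical per-action weights -- the production values are NOT public.
WEIGHTS = {
    "like": 1.0, "reply": 2.0, "repost": 1.5, "click": 0.3,
    "block": -10.0, "mute": -5.0, "report": -15.0,
}

def blended_score(action_probs: dict[str, float]) -> float:
    """Collapse the multi-action heads into one ranking score."""
    return sum(WEIGHTS[action] * p for action, p in action_probs.items())

# Same positive engagement, very different negative-signal exposure:
thoughtful = blended_score({"like": 0.30, "reply": 0.10, "block": 0.001})
outrage    = blended_score({"like": 0.30, "reply": 0.10, "block": 0.050})
```

Even a modest block probability wipes out strong positive engagement, which is why the unpublished weight ratios are the real audit target.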

What this unlocks for LunarCrush

X just told us how to score our own data.

Three direct implications for LunarCrush — and any institutional buyer of X social engagement signals.

Quality engagement scoring
Multi-action heads with explicit negative weights for block/mute/report let us refine "engagement quality" scoring beyond raw volume. Sellable to hedge funds: "this creator's reach is being algorithmically supported / suppressed."
Creator algorithmic ceiling
The Author Diversity Scorer attenuates repeat-author exposure per session. High raw engagement no longer implies algorithmic favor. Our creator analytics need to factor in a per-session ceiling.
Public grox taxonomy
X's spam + PTOS classifier architecture is now public reference. We can legitimately mirror their content-quality taxonomy in our X ingestion pipeline — better noise filtering, fewer false-engagement signals reaching customers.
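The per-session ceiling can be modeled as multiplicative decay on repeat authors. A sketch with an invented decay constant; the repo does not publish the actual attenuation curve:

```python
def attenuate_repeat_authors(ranked, decay: float = 0.5):
    """Multiply each item's score by decay**k, where k counts how many
    earlier items in the session came from the same author.
    The decay value is hypothetical."""
    seen: dict[str, int] = {}
    out = []
    for author, score in ranked:
        k = seen.get(author, 0)
        out.append((author, score * decay ** k))
        seen[author] = k + 1
    return out

session = [("alice", 1.0), ("alice", 0.9), ("bob", 0.8)]
adjusted = attenuate_repeat_authors(session)
```

Under these toy numbers, alice's second post (raw 0.9) lands below bob's (0.8): high raw engagement no longer implies algorithmic favor.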