Search Bench

Search Bench is a community-driven experiment that compares search engines without showing which engine produced which results. Inspired by LLM Arena, it presents two anonymous result sets side-by-side, asks you to pick which you prefer (or mark them as similar), and aggregates those blind votes into a public leaderboard.

Methodology

Each vote is a pairwise comparison: you see results from two engines and choose one, or call it a tie (counted as 0.5 win each).

Scoring uses a Bradley–Terry model:

  • Ability scores are updated iteratively and normalized by geometric mean.
  • Final scores are log-scaled: 1500 + 400 × log₁₀(ability), producing an Elo-like scale.
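The scoring described above can be sketched as a small Python routine. This is an illustrative reconstruction, not the site's actual code: the vote format, iteration count, and function names are assumptions, but the mechanics match the description — iterative Bradley–Terry ability updates (with ties counted as half a win each), normalization by geometric mean, and the 1500 + 400 × log₁₀(ability) display scale.

```python
import math
from collections import defaultdict

def bradley_terry(votes, iterations=100):
    """Fit Bradley-Terry abilities from pairwise votes.

    votes: list of (engine_a, engine_b, result), where result is
    1.0 if engine_a won, 0.0 if engine_b won, 0.5 for a tie.
    (This input format is an assumption for the sketch.)
    """
    engines = sorted({e for a, b, _ in votes for e in (a, b)})
    wins = defaultdict(float)     # fractional win count per engine
    games = defaultdict(float)    # games played per unordered pair
    for a, b, r in votes:
        wins[a] += r              # a tie adds 0.5 to each side
        wins[b] += 1.0 - r
        games[frozenset((a, b))] += 1.0

    ability = {e: 1.0 for e in engines}
    for _ in range(iterations):
        new = {}
        for e in engines:
            # standard minorization-maximization update for Bradley-Terry
            denom = sum(
                n / (ability[e] + ability[next(iter(pair - {e}))])
                for pair, n in games.items() if e in pair
            )
            new[e] = wins[e] / denom if denom else ability[e]
        # normalize so the geometric mean of abilities is 1
        gm = math.exp(sum(math.log(p) for p in new.values()) / len(new))
        ability = {e: p / gm for e, p in new.items()}
    return ability

def display_score(ability):
    # Elo-like scale described above: 1500 + 400 * log10(ability)
    return 1500 + 400 * math.log10(ability)
```

Because abilities are normalized to a geometric mean of 1, an average engine lands at exactly 1500 on the display scale, and each factor-of-10 advantage in ability adds 400 points.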

Matchup selection is adaptive — it prioritizes under-sampled search engines and close matchups via an uncertainty × closeness weighting, so the most informative comparisons are shown more often.
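One way to realize that weighting is sampling pairs with probability proportional to uncertainty × closeness. The sketch below is a plausible reading, not the deployed selector: the exact uncertainty and closeness terms (inverse square root of the less-sampled engine's game count, and inverse score gap) are assumptions.

```python
import math
import random

def pick_matchup(scores, games_played, rng=random):
    """Sample the next pair of engines to compare.

    scores: engine -> leaderboard score
    games_played: engine -> number of votes involving that engine
    The specific weighting terms here are illustrative assumptions.
    """
    engines = sorted(scores)
    pairs, weights = [], []
    for i, a in enumerate(engines):
        for b in engines[i + 1:]:
            # under-sampled engines contribute a larger uncertainty term
            uncertainty = 1.0 / math.sqrt(1 + min(games_played[a], games_played[b]))
            # closely scored engines contribute a larger closeness term
            closeness = 1.0 / (1.0 + abs(scores[a] - scores[b]) / 100.0)
            pairs.append((a, b))
            weights.append(uncertainty * closeness)
    return rng.choices(pairs, weights=weights, k=1)[0]
```

Under this weighting, a fresh pair of near-equal engines is sampled far more often than a heavily voted, lopsided one, which is exactly the "most informative comparisons shown more often" behavior described above.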

Caveats

Search Bench is intentionally not an objective ranking: queries and voters are self-selected, results vary by context, and “prefer” is subjective. The goal is to explore whether removing brand bias reveals a real preference signal at scale, and how that signal shifts as more independent voters contribute.

See also the launch blog post.

Search Bench main page & leaderboard

Technologies used:

  • Next.js
  • Tailwind CSS
  • SQLite

I launched it in January 2026.