What this is

Raw Sports Vault is a premium baseball data library covering 2010 through 2026, with historical records dating back to 1871. More sports are coming in future releases. Today the catalog is seven bundles of pre-cleaned, pre-enriched datasets ranging from $59 (CSV/Excel of the historical record) to $449 (every dataset in every format, including a pre-loaded SQLite database).

The product isn't the data — the data is public. The product is what we did to it: cleaning, deduplication, schema standardization, leakage-checked feature engineering, and packaging in formats you can actually use.

If you've ever spent a weekend trying to merge a historical archive with pitch-by-pitch tracking and an odds feed, then given up, that's the problem we solved.

What's in the library

  • Hundreds of cleaned baseball tables today: 27 historical tables (records dating back to 1871), 12 seasons of pitch-by-pitch tracking, enriched player stats, all major odds sources, weather, umpires, lineups, and ~50 derived datasets. More sports will be added in future releases.
  • Pre-computed derived features: cumulative career WAR, head-to-head pitcher×batter matchups (637,696 unique pairs), velocity fatigue curves by pitch-count bucket, batter platoon splits, umpire K-boost, and more.
  • Deep historical coverage: core data spans 2010–2026 and is built with leakage-aware joins (rolling stats use shift-1 to exclude the current game; prior-season stats join on season - 1). The foundational tables carry historical records back to 1871.
  • Updated annually: every package is refreshed each November. The latest version is always available at current pricing.
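
For readers who want to see what the leakage-aware joins mentioned in the list above look like in practice, here is a minimal pandas sketch. It is an illustration under stated assumptions, not our production pipeline: the table, the column names (player_id, season, game_date, hits), the 3-game window, and the sample values are all hypothetical.

```python
import pandas as pd

# Hypothetical game log: one row per player per game.
# Column names and values are illustrative assumptions, not the shipped schema.
games = pd.DataFrame({
    "player_id": [1, 1, 1, 1, 2, 2],
    "season":    [2023, 2023, 2024, 2024, 2024, 2024],
    "game_date": pd.to_datetime([
        "2023-04-01", "2023-04-02", "2024-04-01", "2024-04-02",
        "2024-04-01", "2024-04-02",
    ]),
    "hits":      [1, 3, 0, 2, 2, 4],
})
games = games.sort_values(["player_id", "game_date"]).reset_index(drop=True)

# Rolling stat with shift-1: shifting before rolling means the window for
# game N ends at game N-1, so the current game's outcome never leaks in.
games["hits_roll3_prior"] = (
    games.groupby("player_id")["hits"]
    .transform(lambda s: s.shift(1).rolling(3, min_periods=1).mean())
)

# Prior-season join: aggregate by season, then relabel season S totals as
# season S + 1 so that a plain join on "season" attaches season - 1 stats.
season_totals = (
    games.groupby(["player_id", "season"], as_index=False)["hits"].sum()
    .rename(columns={"hits": "prev_season_hits"})
)
season_totals["season"] = season_totals["season"] + 1

features = games.merge(season_totals, on=["player_id", "season"], how="left")
print(features)
```

In this sketch a player's first game has no rolling value, and a player's first season has no prior-season value. Both show up as missing rather than being back-filled, which is the point of the exercise.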

Where the data comes from

Every byte we ship is sourced from a public origin. The work product we charge for is what's done in between: cleaning, joining, deduplicating, enriching, and reformatting. Sources:

  • Historical baseball public records — comprehensive batting, pitching, fielding, teams, salaries, awards, and postseason results, 1871–2025.
  • Retrosheet — game logs and play-by-play.
  • Baseball Savant / Statcast — pitch-by-pitch 2015–2026, leaderboards (sprint speed, OAA, EV/barrels, expected stats, percentiles, catcher framing).
  • FanGraphs — season-level batter and pitcher statistics, Stuff+, velocity trends.
  • Baseball Reference / Sean Smith WAR — career WAR for batters and pitchers.
  • The Odds API + SBR + public sportsbook archives — historical moneylines, totals, line movement.
  • Visual Crossing / public weather feeds — game-day weather aligned to venue + date.
  • MLB Stats API — game logs, lineups, umpire assignments.
  • TeamRankings, ActionNetwork-equivalent feeds — supplemental team-level metrics.

If you're a source maintainer and have questions about how we use your data, email us — we'll happily walk through specifics.

Disclaimer

Data compiled and enriched from publicly available sources for research and analytical purposes. Not affiliated with or endorsed by any sports data provider or league. All derived features, enrichments, and computed statistics are original work product.
All sales are final. Digital data files cannot be returned once accessed.

What you're licensing is our compiled bundle. Use it for research, modeling, betting, fantasy, editorial, and internal analytics. Don't redistribute the raw files as a competing product. Keep your receipt email safe — it contains your download link. See the FAQ for full usage terms and policies.

Contact

Questions, custom dataset requests, broken-file reports, enterprise licenses, or feedback:

support@rawsportsvault.com

We answer within one business day. For download-link, receipt, or payment issues, contact payhip.com directly — we do not have access to your payment or account information.

Pick a package

Or grab the free sample first if you want to see the schema before committing.