// Calendar-year 2025 data cash costs for OpenAI, Anthropic, Google DeepMind.
// Buckets: licenses, dataVendors (vendors producing/scoring data + RL gyms), internalGeneration (in-house human data).
// For OpenAI: estimate vendors first from sector shares, then internal = anchor - licenses - vendors.
// ===== Anchors & shared sector priors =====
@name("OpenAI 2025 total data costs ($)")
@doc("The Information: 'OpenAI plans to pay around $1B this year in data-related costs' (includes human experts + RL gyms). Tight core, with a tail.")
openaiDataTotal2025 = mx([950M to 1.05B, 800M to 1.2B], [0.7, 0.3])
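// A minimal sketch of the residual step described at the top of the file: OpenAI's
// internal-generation bucket is whatever remains of the anchor after licenses and
// vendor spend. The function name and its arguments are placeholders standing in for
// the license and vendor estimates built up below.
openaiInternalResidual(licenses, vendors) = openaiDataTotal2025 - licenses - vendors
// Writing this as a function lets the anchor section state the arithmetic without
// needing the bucket values, which are only defined later in the model.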
@name("Anthropic RL gyms next-12mo ($)")

// ===== How many AI digital workers? =====
// Key question: on tasks that AI can currently solve, how many digital workers could OpenAI deploy given their current inference compute stocks?
// Method (sketched in code below):
// 1. We start with our estimates of OpenAI's compute stocks (currently around 1M H100-equivalents, split roughly equally between Hoppers and GB200s).
// 2. From this, we account for utilization and the fraction of compute going to inference, and get a daily amount of inference FLOP.
// 3. We then make assumptions about GPT-5's active params, and based on these estimate (a) the number of GPT-5 inference tokens OpenAI could generate in a day. We ensemble this with another estimate of (a), based on OpenAI's announcement that it processes 3 billion messages a day.
// 4. We then estimate (b) the number of "tokens" spent by a human employee in a day, based on estimates of human thinking speed, as well as some token usage data from METR.
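// A minimal Squiggle sketch of steps 1-4 (plus the final ratio), for orientation only.
// All "sketch*" names are hypothetical; apart from the ~1M H100-equivalents figure
// quoted above, every input is an illustrative placeholder rather than one of the
// estimates this model develops, and the 3B messages/day cross-check from step 3 is
// omitted here.
sketchH100e = 0.8M to 1.2M                  // step 1: compute stock in H100-equivalents (~1M)
sketchFlopPerH100ePerSec = 1e15             // ~1e15 dense BF16 FLOP/s per H100-equivalent
sketchUtilization = 0.2 to 0.5              // placeholder: realized hardware utilization
sketchInferenceShare = 0.3 to 0.7           // step 2: placeholder share of compute serving inference
sketchInferenceFlopPerDay = sketchH100e * sketchFlopPerH100ePerSec * 86400 * sketchUtilization * sketchInferenceShare
sketchActiveParams = 50B to 300B            // step 3: placeholder for GPT-5 active parameters
sketchTokensPerDay = sketchInferenceFlopPerDay / (2 * sketchActiveParams) // (a): ~2 FLOP per active param per token
sketchHumanTokensPerDay = 30k to 300k       // step 4: placeholder "tokens" per human worker-day (b)
sketchDigitalWorkers = sketchTokensPerDay / sketchHumanTokensPerDay // digital workers supportable per day = (a) / (b)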