DeepSeek-V3 Technical Report
2,788,000 GPU-hours * 350W TDP of H800 = 975,800,000 GPU Watt-hours
975,800,000 GPU Wh * (1.2 to account for non-GPU hardware) * (1.3 PUE[1]) = 1,522,248,000 Total Wh, or 1,522,248 kWh to train DeepSeek-V3
(1,522,248 kWh) * (0.582kg CO2eq/kWh in China[2]) = 885,948 kg CO2 equivalents to train DeepSeek-V3
A typical US passenger vehicle emits about 4.6 metric tons of CO2 per year.[3]
885,948 kg CO2 per DeepSeek / 4,600 kg CO2 per car = 192.6 cars per DeepSeek
So the final training run for DeepSeek-V3 emitted about as much greenhouse gas as putting roughly 193 additional cars on the road for a year.
I also did some more math and found that this training run used about as much electricity as 141 US households would use over the course of a year.[4]
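The arithmetic above can be reproduced in a few lines. This is a sketch under the same assumptions stated above (350 W TDP per H800, a 1.2x non-GPU hardware multiplier, 1.3 PUE, 0.582 kg CO2eq/kWh for China's grid, 4,600 kg CO2/yr per car); the per-household figure of 10,791 kWh/yr is my reading of the EIA page linked in [4] and may differ slightly from the exact value used:

```python
# Back-of-envelope estimate of DeepSeek-V3 training energy and emissions.
gpu_hours = 2_788_000                 # from the V3 technical report
tdp_watts = 350                       # H800 TDP
non_gpu_overhead = 1.2                # CPUs, networking, storage, etc.
pue = 1.3                             # data-center power usage effectiveness [1]
grid_kg_co2_per_kwh = 0.582           # China grid average [2]
car_kg_co2_per_year = 4_600           # typical US passenger vehicle [3]
household_kwh_per_year = 10_791       # assumed US average, per the EIA FAQ [4]

gpu_wh = gpu_hours * tdp_watts                       # 975,800,000 Wh
total_kwh = gpu_wh * non_gpu_overhead * pue / 1000   # ~1,522,248 kWh
kg_co2 = total_kwh * grid_kg_co2_per_kwh             # ~885,948 kg CO2eq
cars = kg_co2 / car_kg_co2_per_year                  # ~192.6 car-years
households = total_kwh / household_kwh_per_year      # ~141 household-years

print(f"{total_kwh:,.0f} kWh, {kg_co2:,.0f} kg CO2eq, "
      f"{cars:.1f} car-years, {households:.0f} household-years")
```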
[1] https://enviliance.com/regions/east-asia/cn/report_10060
[2] https://ourworldindata.org/grapher/carbon-intensity-electric...
[3] https://www.epa.gov/greenvehicles/greenhouse-gas-emissions-t...
[4] divided total kWh by the value here: https://www.eia.gov/tools/faqs/faq.php?id=97&t=3
DeepSeek would have to fully train a brand-new V3 every week to approach the power consumption of individual Bitcoin mining facilities.
The energy use from BTC is ludicrous.
(I'm assuming 155 TWh/yr for Bitcoin, using the low-end estimate from here: https://www.polytechnique-insights.com/en/columns/energy/bit... )
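The scale gap is easy to check. A minimal sketch comparing the annualized energy of one hypothetical V3 training run per week against the 155 TWh/yr low-end Bitcoin network estimate cited above (the per-run kWh figure carries over from the earlier calculation):

```python
# Scale comparison: weekly V3 retrains vs. the Bitcoin network (low end).
training_kwh = 1_522_248                   # one full V3 training run (from above)
weekly_retrains_kwh_per_year = training_kwh * 52   # ~79 GWh/yr
btc_network_kwh_per_year = 155e9                   # 155 TWh/yr, low-end estimate

ratio = btc_network_kwh_per_year / weekly_retrains_kwh_per_year
print(f"Weekly retrains: ~{weekly_retrains_kwh_per_year/1e6:,.1f} GWh/yr; "
      f"the BTC network uses roughly {ratio:,.0f}x that")
```

Even at one full training run per week, the whole-network Bitcoin figure is still three orders of magnitude larger.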
A group at Stanford has been benchmarking model providers by transparency here: https://crfm.stanford.edu/fmti/May-2024/index.html
I think a great way to create positive change in the world is to pressure OpenAI, Anthropic, Google, XAI, and Meta to all share details about the energy cost of training and inference for their models. If every major provider provided this transparency, it would be less valuable to keep that info secret from a "keep your competitors in the dark" perspective. It would also allow customers to make decisions based on more than just performance and cost.
A cluster of 2,000 GPUs is what a second-tier AI lab has access to. And it shows that you can play in the state-of-the-art LLM game with some capital and a lot of brains.
I don't know what your household budget is, but $60M might not be what most people associate with "some capital".
And the GPUs would be a shared resource, so what you should calculate is what it would have cost to rent them, probably something like $2M.
That being said, I'm amazed how far 1B models have come. I remember when TinyLlama came out a few years ago; it was not great ($40K training cost, IIRC).
That was a 1B model, but these days even 0.5B models are remarkably coherent.