Whether it’s writing an email or planning a vacation, about a quarter of Americans say they interact with artificial intelligence several times a day, while another 28% say they use it about once a day.
But many people may be unaware of the environmental impact of their searches. A request made using ChatGPT, for example, consumes 10 times the electricity of a Google search, according to the International Energy Agency. In addition, data centers, which are essential for powering AI models, accounted for 4.4% of all the electricity consumed in the U.S. in 2023, and by 2028 they’re expected to consume approximately 6.7% to 12% of the country’s electricity. That share is likely only going to grow: the number of data centers worldwide has risen from 500,000 in 2012 to over 8 million as of September 2024.
A new study, published in Frontiers, aims to draw more attention to the issue. Researchers analyzed the number of “tokens”—the smallest units of data that a language model uses to process and generate text—required to produce responses, and found that certain prompts can release up to 50 times more CO2 emissions than others.
Read more: The AI Revolution Isn’t Possible Without an Energy Revolution
Parameters are the internal variables that a model learns during training and then uses to produce results. Different AI models use different numbers of parameters, and those with more parameters often perform better. The study examined 14 large language models (LLMs) ranging from seven billion to 72 billion parameters, asking each the same 1,000 benchmark questions across a range of subjects.
Reasoning-enabled models, which can perform more complex tasks, generated an average of 543.5 “thinking” tokens per question (additional units of data that reasoning LLMs produce before arriving at an answer). More concise models, by comparison, required just 37.7 tokens per question. The more tokens a model used, the higher its emissions, regardless of whether or not the answer was correct.
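A quick back-of-the-envelope calculation shows what those token counts imply. This is an illustrative sketch only: the token averages come from the study, but the assumption that emissions scale linearly with tokens generated is a simplification, and no real per-token emission factor is used here.

```python
# Average tokens per question reported in the study
REASONING_TOKENS_PER_Q = 543.5  # reasoning-enabled models ("thinking" tokens)
CONCISE_TOKENS_PER_Q = 37.7     # more concise models

def relative_emissions(tokens_a: float, tokens_b: float) -> float:
    """Ratio of emissions between two models, assuming emissions
    scale linearly with the number of tokens generated."""
    return tokens_a / tokens_b

ratio = relative_emissions(REASONING_TOKENS_PER_Q, CONCISE_TOKENS_PER_Q)
print(f"Reasoning models generate roughly {ratio:.1f}x the tokens per question")
```

Under that linear assumption, a reasoning model emits on the order of 14 times more per question than a concise one, before accounting for differences in model size or hardware.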
The subject matter also affected the amount of emissions produced. Questions on straightforward topics, like high school history, produced up to six times fewer emissions than questions on subjects like abstract algebra or philosophy, which require lengthy reasoning.
Many current models have an inherent “accuracy-sustainability trade-off,” the researchers say. The model they deemed most accurate, the reasoning-enabled Cogito model, produced three times more CO2 emissions than similarly sized models that generated more concise answers. The challenge, then, in the current landscape of AI models, is to optimize for both energy efficiency and accuracy. “None of the models that kept emissions below 500 grams of CO₂ equivalent achieved higher than 80% accuracy on answering the 1,000 questions correctly,” first author Maximilian Dauner, a researcher at Hochschule München University of Applied Sciences, said in a press release.
Read more: AI Could Reshape Everything We Know About Climate Change
It’s not just the type of question asked or the accuracy of the answer that drives differences in emissions; the models themselves matter. Researchers found that some language models produce more emissions than others. For DeepSeek R1 (70 billion parameters) to answer 600,000 questions would create CO2 emissions equal to a round-trip flight from London to New York, while Qwen 2.5 (72 billion parameters) can answer about 1.9 million questions, more than three times as many, with similar accuracy and the same emissions.
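The comparison above can be checked with simple arithmetic. The sketch below treats the round-trip flight’s emissions as a single fixed budget (no real CO2 figure is assumed); only the question counts come from the study.

```python
# Questions each model can answer for the same fixed emissions budget
DEEPSEEK_R1_QUESTIONS = 600_000   # DeepSeek R1 (70B parameters)
QWEN_2_5_QUESTIONS = 1_900_000    # Qwen 2.5 (72B parameters)

# For a fixed emissions budget, per-question emissions are inversely
# proportional to the number of questions answered.
efficiency_ratio = QWEN_2_5_QUESTIONS / DEEPSEEK_R1_QUESTIONS
print(f"Qwen 2.5 answers about {efficiency_ratio:.1f}x as many questions "
      "per unit of CO2")
```

That works out to roughly 3.2 times as many questions per unit of emissions, which is why the researchers describe the two similarly sized models as markedly different in efficiency.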
The researchers hope that users might be more mindful of the environmental impact of their AI use. “If users know the exact CO₂ cost of their AI-generated outputs, such as casually turning themselves into an action figure,” said Dauner, “they might be more selective and thoughtful about when and how they use these technologies.”