Texas State University deems GSF SCI an effective metric to evaluate the carbon impact of software

Posted on October 7th, 2022

Texas State University has released its report on a study evaluating the software carbon intensity (SCI) of foundation models. Among other findings, the study confirms that the Green Software Foundation (GSF) SCI specification is suitable for measuring the carbon impact of software.

What are foundation models?

Foundation models are large software applications for artificial intelligence (AI) computations such as natural language processing, computer vision, and voice recognition. They store and work with vast amounts of data, so running AI applications consumes massive amounts of energy. That is reason enough to study whether optimizations can reduce their carbon footprint.

Explosion of AI applications in the past decade

Added capabilities have led to an exponential increase in model size and complexity over the past ten years, and larger, more complex models require more computation. OpenAI reported that the compute required to train state-of-the-art deep learning models had increased 300,000-fold since 2012. For example, training the GPT-3 model consumed approximately 190,000 kWh of energy and produced 85,000 kg of CO2.
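To put the two GPT-3 figures in relation, the short sketch below divides the quoted emissions by the quoted energy to get the implied emissions per kWh. The variable names are illustrative, and the result is simple arithmetic on the numbers above, not a figure reported by the study.

```python
# Figures quoted above for GPT-3 training.
training_energy_kwh = 190_000
training_emissions_kg = 85_000

# Implied average emissions per unit of energy (kg CO2 per kWh).
implied_kg_per_kwh = training_emissions_kg / training_energy_kwh

print(f"{implied_kg_per_kwh:.3f} kg CO2 per kWh")  # 0.447 kg CO2 per kWh
```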

But what about after the training phase, when a model is deployed? This is where there has been an information gap. Previous studies have primarily focused on energy use during the training phase of the model. The Texas State University (TSU) study bridges the gap by examining the inference stage, where the model is applied to live data to produce actionable output.

Natural Language Processing (NLP) focus of study

To narrow the project scope, the study focuses on one specific type of AI model: Natural Language Processing (NLP) models. NLP models allow for accurate power measurements because they can be run on physical commodity hardware. The study therefore investigated energy consumption and carbon emissions at the inference stage of pre-trained open-source foundation models. The selected models were GPT-J 6B, GPT-Neo 2.7B, GPT-Neo 1.3B, GPT-Neo 125M, and GPT-2.

SCI - Measuring carbon intensity

Measuring and reducing carbon emissions from foundation models is a daunting task. The study used the GSF SCI specification to quantitatively evaluate the carbon impact of the foundation models. The SCI expresses a rate: carbon emissions per functional unit R. In the study, R is one request sent to the model that produces the relevant output.

SCI = ((E * I) + M) per R

E = Energy consumed by a software system

I = Location-based marginal carbon emissions

M = Embodied emissions of a software system
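As a minimal sketch of how the formula can be evaluated, the Python function below computes SCI per request from E, I, M, and R. The function name, parameter names, units, and input values are assumptions chosen for illustration; none of the numbers come from the study.

```python
def sci_per_request(energy_kwh, intensity_g_per_kwh, embodied_g, requests):
    """Return SCI in grams of CO2-equivalent per request: ((E * I) + M) / R."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    operational_g = energy_kwh * intensity_g_per_kwh  # E * I
    return (operational_g + embodied_g) / requests    # (E * I + M) per R


# Hypothetical inputs for illustration only (not measurements from the study).
example_sci = sci_per_request(
    energy_kwh=0.5,             # E: energy used while serving the requests
    intensity_g_per_kwh=400.0,  # I: carbon emitted per kWh at the location
    embodied_g=50.0,            # M: share of hardware embodied emissions
    requests=1000,              # R: number of requests served
)

print(f"{example_sci:.3f} gCO2e per request")  # 0.250 gCO2e per request
```

Because R is the denominator, two models can be compared directly as long as both are scored over the same kind of request.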

Study conclusions

Comparing the carbon emissions of the foundation models, the study finds that some have more adverse environmental effects than others because they consume more energy. The key was applying the SCI approach to calculate and compare the metric for each model. GPT-Neo 125M, for example, shows massive energy savings compared to the other models, although with lower-quality results. At the same time, GPT-J 6B produces nearly 100% more carbon emissions than GPT-Neo 2.7B, that is, almost twice as much.

The study also compares the deployment of foundation models on CPUs versus GPUs. The choice of hardware did not affect the output quality, but GPU acceleration significantly improved both response time and energy efficiency.

The study concludes that:

The environmental impact of foundation models can be quantitatively measured and compared.
SCI is an effective metric to evaluate the carbon impact of different foundation models at the inference stage.
It is possible to replace carbon-intensive foundation models with more efficient ones without sacrificing model quality.
Deploying foundation models on more efficient hardware (e.g. GPUs) can significantly reduce SCI.

“We find that carbon intensive models do not necessarily yield better quality. For example, we observe that when being asked the same set of questions, the answers generated by GPT-Neo 1.3B have similar quality of answers generated by GPT-J 6B but GPT-Neo 1.3B only consumes 27% of energy. Replacing GPT-J 6B with GPT-Neo 1.3B will help mitigate energy requirements and reduce carbon waste without compromising quality.”
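The 27% figure in the quote can be turned into a saving with one line of arithmetic, sketched below. The function name is hypothetical. If the grid carbon intensity is assumed to be the same for both models, the operational part of the SCI (E * I) shrinks by the same fraction; embodied emissions are not covered by this calculation.

```python
def energy_saving_fraction(replacement_share):
    """Fraction of energy saved when a replacement model uses
    `replacement_share` of the baseline model's energy."""
    if not 0.0 <= replacement_share <= 1.0:
        raise ValueError("replacement_share must be between 0 and 1")
    return 1.0 - replacement_share


# GPT-Neo 1.3B consumes 27% of the energy of GPT-J 6B, per the quote above.
saving = energy_saving_fraction(0.27)

print(f"{saving:.0%} less energy")  # 73% less energy
```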

You can find a full copy of the final report here:

https://github.com/Green-Software-Foundation/eval_sci_of_foundation_models/blob/main/Report/Final_Report.pdf

This article is licenced under Creative Commons (CC BY 4.0)