Maintenance Mindset: From big data to bigger data — The zettascale revolution has arrived
Welcome to Maintenance Mindset, our editors’ takes on things going on in the worlds of manufacturing and asset management that deserve some extra attention. This will appear regularly in the Member’s Only section of the site.
This month Oracle began taking orders for its first zettascale supercomputer. What is zettascale, you ask? I certainly had to look it up. In short, it is a measure of supercomputer performance, and it represents a really big number. A zettascale system can perform at least 10²¹ operations per second. That’s a sextillion, or 1,000,000,000,000,000,000,000. For me, it’s hard to even comprehend a number that big. But right now, the strength of our most powerful supercomputers is dictating not only the power and speed of our artificial intelligence (AI) applications but also how quickly we can develop them. Essentially, AI’s strength lies in working with numbers and information at a scale humans simply cannot comprehend.
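To put that in perspective, here’s a back-of-the-envelope comparison of my own (an illustration, not a benchmark from Oracle or Nvidia): how long a machine doing a billion operations per second would need to run to match a single second of zettascale work.

```python
# Rough scale comparison: how long would a gigascale machine (10^9 ops/sec)
# need to match ONE second of zettascale work (10^21 ops)?
zettascale_ops = 10**21            # operations in one zettascale-second
gigascale_rate = 10**9             # a billion operations per second
seconds_needed = zettascale_ops / gigascale_rate      # 10^12 seconds
years_needed = seconds_needed / (365.25 * 24 * 3600)  # convert to years

print(f"{seconds_needed:.0e} seconds, or about {years_needed:,.0f} years")
# prints roughly 31,688 years of gigascale computing for one zettascale second
```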
The zettascale supercomputer is a cloud computing cluster from Oracle Cloud Infrastructure (OCI), which works with Nvidia’s Blackwell platform and is available with up to 131,072 Nvidia graphics processing units (GPUs). GPUs were originally created to accelerate computer graphics, but they have since proven useful for other kinds of calculation as well, including training neural networks, the machine learning models loosely modeled on the brain’s web of neurons.
“As businesses, researchers and nations race to innovate using AI, access to powerful computing clusters and AI software is critical,” said Ian Buck, vice president of hyperscale and high-performance computing at Nvidia. “Nvidia’s full-stack AI computing platform on Oracle’s broadly distributed cloud will deliver AI compute capabilities at unprecedented scale to advance AI efforts globally and help organizations everywhere accelerate research, development and deployment.”
OCI highlighted two customers using its high-performance AI infrastructure: WideLabs, a Brazilian startup training Amazonia IA, one of the country’s largest large language models (LLMs), and the collaboration platform Zoom.
WideLabs also developed bAIgrapher, an application that uses its LLM to generate biographical content based on data collected from patients with Alzheimer’s disease to help them preserve important memories.
In August, Zoom Video Communications rolled out Zoom Docs, its AI-first collaborative docs solution powered by Zoom AI Companion. It transforms content from Zoom Meetings into actionable documents such as meeting summaries, business proposals or reports, and probably anything else you can dream up in text form.
Nvidia founder and CEO Jensen Huang spoke earlier this year about “AI factories” and the new data infrastructure for accelerated AI production. These next-gen data centers host the supercomputer platforms needed for the most computationally intensive tasks.
Hyperscalers like these serve many industries, such as financial services, retail, telecommunications and higher education, but manufacturing and industrial processing are big markets too.
In 2021, Nvidia said “the world’s largest manufacturing players” were tapping its AI platform, and it partnered with Sight Machine to tackle the biggest challenge enterprise manufacturers face in scaling AI: asset tags. Manufacturers routinely work with millions of tags created over many years, often without a guiding standard or clear records, and the situation is compounded by data historians and data lakes that can obscure tag origins and meaning. Nvidia’s work with Sight Machine uses AI to automatically introspect and label those tags across vast amounts of data.
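Neither company spells out its method in the announcement, but to make “automated labeling” concrete, here is a purely hypothetical sketch: a tiny text classifier that learns from a handful of hand-mapped historian tags and guesses standardized labels for undocumented ones. All tag names and categories below are invented for illustration.

```python
# Hypothetical sketch of automated tag labeling; real plants have millions of
# tags and far messier naming conventions than this toy example.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# A handful of tags a site engineer has already mapped by hand
labeled_tags = ["PMP-101_DISCH_PRESS", "TT-204_BRG_TEMP", "FT-310_FLOW",
                "PMP-102_SUCT_PRESS", "TT-207_MTR_TEMP", "FT-311_STEAM_FLOW"]
labels = ["pressure", "temperature", "flow",
          "pressure", "temperature", "flow"]

# Character n-grams help cope with abbreviations like PRESS vs. PRS
model = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    MultinomialNB(),
)
model.fit(labeled_tags, labels)

# Propagate labels to tags nobody has documented yet
print(model.predict(["PMP-205_DISCH_PRS", "TT-412_GBX_TMP"]))
```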
That same year, Nvidia released a case study with Shell highlighting how the energy company was using AI and high-performance computing to process and deliver real-time analytics for subsurface modeling, improving hydrocarbon extraction. Shell also used AI to predict the behavior of large-scale reservoirs, helping it decide where to find oil and how best to extract it.
Shell is also developing lower-carbon energy products from sustainable feedstocks and designing new industrial reactors to process them. Correctly sizing new facilities around those reactors, while accounting for feedstock and project logistics, is a delicate balance to strike for optimal operations. Digital simulations help identify the best options, and with Nvidia GPUs powering the calculations, run times were cut from two weeks to five days.
Most recently, Nvidia has been diving into predictive maintenance for manufacturing. The case study, released in August, focuses on companies that manufacture and lease computing devices, or Desktop-as-a-Service (DaaS). Nvidia built a predictive model to estimate the remaining useful life of computing assets based on operational parameters, sensor data, and historical maintenance records. Data from thermal, battery, fan, disk, and CPU sensors, which measure temperature, cycle counts, and other aspects of the machine, is aggregated and fed into a forecasting model, and Nvidia used GPU-accelerated data science to speed up implementation.
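As a rough illustration of that general approach (not Nvidia’s actual implementation), the sketch below trains a regression model to estimate remaining useful life from aggregated sensor features; all column names and values are invented. It uses pandas and scikit-learn for readability, while Nvidia’s RAPIDS libraries (cuDF, cuML) offer GPU-accelerated counterparts, which is the kind of acceleration the case study leans on.

```python
# Minimal sketch of a remaining-useful-life (RUL) model for leased computing
# devices. Features and data are invented; a real system would train on
# fleet-scale telemetry and hold out data for validation.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical historical snapshots: aggregated sensor readings plus the
# remaining useful life (in days) eventually observed for each device.
history = pd.DataFrame({
    "avg_cpu_temp_c":   [62, 75, 58, 81, 70, 66],
    "battery_cycles":   [310, 820, 150, 940, 600, 420],
    "fan_speed_rpm":    [2100, 3400, 1900, 3600, 2800, 2400],
    "disk_reallocated": [0, 14, 0, 37, 5, 2],
    "age_days":         [200, 700, 90, 820, 540, 365],
    "rul_days":         [900, 260, 1100, 120, 420, 640],
})

X = history.drop(columns="rul_days")
y = history["rul_days"]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

# Estimate RUL for a device currently in the field
new_device = pd.DataFrame([{"avg_cpu_temp_c": 78, "battery_cycles": 850,
                            "fan_speed_rpm": 3500, "disk_reallocated": 20,
                            "age_days": 760}])
print(f"Estimated remaining useful life: {model.predict(new_device)[0]:.0f} days")
```

Swapping pandas for cuDF and scikit-learn for cuML would move the same workflow onto GPUs, which is where the speedup Nvidia describes comes from.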
Nvidia founder and CEO Huang said, “The AI factory will become the bedrock of modern economies across the world.” Data goes in; intelligence comes out. These data center facilities will house the supercomputers and storage needed to accelerate AI adoption in every other industry.
We probably don’t even have a word yet for the numbers that will represent the compute power of future AI factories. Maybe something beyond a googol (1 followed by 100 zeros), a number said to surpass the total count of particles in the observable universe, or even a googolplex (10 to the power of a googol). Perhaps some humans can comprehend numbers that large (not me), but it is certainly AI’s playground.