Lessons for Geoscientists from the book Real World AI: A Practical Guide for Responsible Machine Learning

In this blog article, Enthought Energy Solutions Vice President Mason Dykstra looks at the recently published book “Real World AI: A Practical Guide for Responsible Machine Learning” in the context of both the technical challenges geoscientists face and the challenge of scaling AI across an organization.

Author: Mason Dykstra, Ph.D., Vice President, Energy Solutions 

In the newly released book titled “Real World AI: A Practical Guide for Responsible Machine Learning,” Alyssa Simpson Rochwerger and Wilson Pang share a number of examples of how organizations have succeeded – and failed – at integrating machine learning initiatives. Among them, “Only 20 percent of AI in pilot stages at major companies make it to production, and many fail to serve their customers as well as they could. In some cases, it’s because they’re trying to solve the wrong problem. In others, it’s because they fail to account for all the variables—or latent biases—that are crucial to a model’s success or failure.”

Artificial intelligence is pushing the boundaries of what is possible: live facial recognition, autocorrecting language editors, fraud detection, customer service chatbots and more. Advances in autonomous technologies are changing the way people and machines operate. And people are increasingly reliant on the power of AI technology to improve decision-making, safety, communication and productivity. In fact, today’s AI has advanced so much that most people are impacted by some form of AI every day without realizing it.

But using AI to solve complex, real-world business problems comes with a unique set of challenges that are absent in academic and scientific research settings. Overcoming these challenges requires resources, skills, collaboration and knowledge, of which data science is only one. Used appropriately, applied machine learning can help companies accelerate and transform their businesses in ways unimagined a few years ago. 

With today’s combination of open source big data, faster processing speeds and transformative advances in cloud computing, businesses have the potential to implement and scale their AI faster than ever. However, deploying AI initiatives is a huge undertaking and a treacherous process. According to industry analysts, more than 80% of AI projects never make it past the pilot stage. Why? 

The 4 Biggest Business Challenges of Deploying AI and the Geoscientist

“Real World AI” highlights four key challenges organizations face when leveraging AI. This categorization serves very well in providing guidance to geoscientists working to deliver value to their organizations through AI/machine learning technologies and workflows. 

1. Defining the problem

The first step to overcoming any challenge is identifying the problem. AI is no exception. Once you have a defined objective, you can implement a specialized approach that takes existing data, operational constraints and risk into consideration.

In “Real World AI,” defining the problem also includes determining how well you want to solve the problem. In health care, a cancer detection neural network requires human-machine interaction and a high degree of certainty to reduce risk and improve final outcomes, while many other life sciences applications have less consequential health risks. 

In the energy industry, geoscientists must draw on practical knowledge and experience to define both the problem and the level of accuracy the solution requires. Shallow drilling hazards and uncertainty in pore pressure predictions are critical to operations and safety, while field development plans and reserves calculations have a significant impact on financial decisions, so results for problems such as these need to be accurate. On the other hand, less rigor may be acceptable for problems that don’t carry a high safety or financial risk.
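One way to make "how well do we need to solve this" concrete is to tie the decision threshold to the cost of being wrong. The sketch below is illustrative only and is not taken from the book: it assumes a model that outputs a probability, assumes correct decisions cost nothing, and uses made-up cost ratios. The function name `decision_threshold` is hypothetical.

```python
def decision_threshold(cost_false_alarm: float, cost_missed_event: float) -> float:
    """Probability above which acting on a prediction minimizes expected cost.

    Acting costs (1 - p) * cost_false_alarm in expectation; not acting costs
    p * cost_missed_event. Acting is cheaper once p exceeds the ratio below.
    Assumes correct decisions cost nothing and both costs are positive.
    """
    if cost_false_alarm <= 0 or cost_missed_event <= 0:
        raise ValueError("costs must be positive")
    return cost_false_alarm / (cost_false_alarm + cost_missed_event)


# Hypothetical cost ratios, for illustration only.
hazard_threshold = decision_threshold(cost_false_alarm=1.0, cost_missed_event=99.0)
screening_threshold = decision_threshold(cost_false_alarm=1.0, cost_missed_event=1.0)

print(f"drilling hazard: act when P(hazard) > {hazard_threshold:.2f}")
print(f"low-stakes screening: act when P(event) > {screening_threshold:.2f}")
```

With a missed hazard assumed to cost 99 times as much as a false alarm, it pays to act on even a 1% predicted probability, whereas an evenly balanced problem can tolerate a 50% threshold. The real cost figures are exactly what the domain expert has to supply.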

For geoscientists working to introduce AI into their workflows, it is best to start small, understand the risks and how critical accuracy is, and have a clear line of sight to business value for the selected problem. It’s also helpful to develop a plan early on to scale the new workflow across the organization, something often missing from AI pilot projects.

2. Gathering training data

One of the biggest challenges associated with machine learning is gathering and organizing the right data to train models. In the real world, high-quality, accurate data is incredibly important – and incredibly difficult to collect. 

“When creating AI in the real world, the data used to train the model is far more important than the model itself,” Rochwerger and Pang write. “This is a reversal of the typical paradigm represented by academia, where data science PhDs spend most of their focus and effort on creating new models. But the data used to train models in academia are only meant to prove the functionality of the model, not solve real problems. Out in the real world, high-quality and accurate data that can be used to train a working model is incredibly tricky to collect.”

In upstream oil and gas, accessing and integrating accurate, up-to-date data is one of the industry’s biggest challenges, particularly for geoscientists. Data on reservoirs is added continuously at highly varying scales in space and time, from initial seismic and other remotely sensed data, to exploration log and core data, to production data. AI workflows and data infrastructure must be built with these physical and time scales in mind. The OSDU Data Platform is a significant step in addressing this challenge. 

In the future, it is reasonable to expect the energy industry to develop analogues to ImageNet that provide training data for any number of geoscience applications. As more open source data becomes available, companies with exclusive access to massive data sets will find that access less and less of a competitive advantage.

Considering the rapid advances in AI, if a business opportunity presents itself from exclusive data, it is best to move on it quickly before the advantage disappears. 

3. Maintaining machine learning models

To provide maximum value and maintain confidence in results, AI requires continuous human-machine collaboration. Regardless of the algorithm, machine learning is only as accurate as the labeled data and human input. If the data is of questionable accuracy or obsolete, the resulting model will be of limited use.

Rochwerger and Pang continue, “Don’t forget to allocate resources for the ongoing training of your model. Models have to be trained continually, or they’ll become less accurate over time as the real world changes around them.”
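A simple way to notice that "the real world has changed" is to compare the data a model sees today against the data it was trained on. The sketch below uses the Population Stability Index (PSI), a common drift measure. It is one option among many, not the authors' method. The function names, the synthetic values and the 0.2 retraining threshold (a widely used rule of thumb) are all assumptions for illustration.

```python
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float], current: Sequence[float], n_bins: int = 10
) -> float:
    """PSI between a reference (training) sample and a current sample."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # degenerate reference: every value lands in one bin

    def bin_fractions(values: Sequence[float]) -> list:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values to edge bins
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref = bin_fractions(reference)
    cur = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def needs_retraining(reference, current, threshold: float = 0.2) -> bool:
    """Flag drift when PSI exceeds the (assumed) rule-of-thumb threshold."""
    return population_stability_index(reference, current) > threshold


# Synthetic stand-ins for a normalized input feature, e.g. a well log reading.
training_values = [i / 100 for i in range(100)]
new_values = [0.5 + i / 200 for i in range(100)]  # distribution has shifted upward

print(needs_retraining(training_values, training_values))  # unchanged data
print(needs_retraining(training_values, new_values))       # shifted data
```

A check like this only says that the inputs have moved. Deciding whether the shift matters, and relabeling data for retraining, still needs the domain expert.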

Machine learning models are increasingly becoming commodities. More and more data is becoming available, and analogues to ImageNet are starting to appear for various industries and domains. For subsurface data, ImageNet analogues are on the horizon, albeit initially only for domain-specific workflows such as seismic or well log interpretation.

More sophisticated models that take all available subsurface data into consideration are the next challenge, first for research and pilot projects and then, if those succeed, for scaling across organizations. Major operators with their broad and deep data sets may hold a competitive advantage here. How long such an advantage will persist is another question.

A critical element for maintaining any machine learning model will be a highly intuitive user interface, specifically for integrating and labeling data for new generations of workflows. 

A first step in this direction is found in our machine learning application, SubsurfaceAI Seismic, where geoscientists are able to train the AI to interpret the way they want the interpretation done. In this cloud-enabled application, domain experts provide ‘near real-time’ QC feedback on predictions made by the machine learning models. No data science or IT knowledge is required of the domain expert, a feature we expect future applications to share.

4. Gathering the right team

The fourth main challenge “Real World AI” sets out is gathering the right team. Successfully solving real-world problems with AI, including scaling across the organization, requires deep, cross-functional collaboration. This collaboration extends from domain experts who ultimately discover the value, to parts of the organization often not considered by those closest to the challenge.  

“A business problem that can be solved by a model alone is very unusual. Most problems are multifaceted and require an assortment of skills—data pipelines, infrastructure, UX, business risk analysis,” Rochwerger and Pang observe. “Even with a wonderful business strategy, a well-articulated, specific problem, and a great team, it’ll be impossible to achieve success without access to the data, tools, and infrastructure necessary to ingest each dataset, save it, move it to the right place, and manipulate it.” 

Geoscientists today are in transition from machine learning pilot projects to scaling across the organization. Domain experts often spend significant amounts of time interacting with data scientists and IT departments rather than working on their domain challenges. Deploying external software with a strong AI component can be more challenging than deploying traditional software because of platform integration issues, particularly if plans include migrating to the cloud. The OSDU Data Platform is important to consider when planning to scale new-generation workflows.

In summary, this book is a worthwhile read for anyone with significant involvement in scaling advanced scientific software and new workflows across an organization. One comment to add from the Enthought experience across multiple industries: start with a small project, keep a clear line of sight to business value and develop plans to scale across the organization as the project progresses. Look for pilot locations within the business where there is clear value and where domain experts are committed to achieving success.

We’d welcome a conversation about our experience. 

Resources

The Challenges of Applied Machine Learning on TechTalks 

Real World AI: A Practical Guide for Responsible Machine Learning on Amazon 

VentureBeat article

The OSDU Data Platform – open source, standards-based, technology-agnostic data platform 

The Enthought SubsurfaceAI Seismic custom deep learning application 

About the Author

Mason Dykstra is Enthought’s Vice President of Energy Solutions. As an intuitive thought leader with previous experience in academia, Statoil and Anadarko, he helps oil and gas companies connect the dots between science, engineering, technology and business needs. Mason leads the Enthought team of experts in tackling problems that contribute to the bottom lines of its customers. Connect with Mason on LinkedIn at linkedin.com/in/mason-dykstra-a304b25/ to join his online conversations.
