In this blog article, Enthought Energy Solutions Vice President Mason Dykstra looks at the recently published book “Real World AI: A Practical Guide for Responsible Machine Learning” in the context of the technical challenges geoscientists face and how to scale AI across an organization.
Author: Mason Dykstra, Ph.D., Vice President, Energy Solutions
In the newly released book titled “Real World AI: A Practical Guide for Responsible Machine Learning,” Alyssa Simpson Rochwerger and Wilson Pang share a number of examples of how organizations have succeeded – and failed – at integrating machine learning initiatives. As they note, “Only 20 percent of AI in pilot stages at major companies make it to production, and many fail to serve their customers as well as they could. In some cases, it’s because they’re trying to solve the wrong problem. In others, it’s because they fail to account for all the variables—or latent biases—that are crucial to a model’s success or failure.”
Artificial intelligence is pushing the boundaries of what is possible: live facial recognition, autocorrecting language editors, fraud detection, customer service chatbots and more. Advances in autonomous technologies are changing the way people and machines operate. And people are increasingly reliant on the power of AI technology to improve decision-making, safety, communication and productivity. In fact, today’s AI has advanced so much that most people are impacted by some form of AI every day without realizing it.
But using AI to solve complex, real-world business problems comes with a unique set of challenges that are absent in academic and scientific research settings. Overcoming these challenges requires resources, collaboration and a range of skills and knowledge, of which data science is only one. Used appropriately, applied machine learning can help companies accelerate and transform their businesses in ways unimagined a few years ago.
With today’s combination of open source big data, faster processing speeds and transformative advances in cloud computing, businesses have the potential to implement and scale their AI faster than ever. However, deploying AI initiatives is a huge undertaking and a treacherous process. According to industry analysts, more than 80% of AI projects never make it past the pilot stage. Why?
The 4 Biggest Business Challenges of Deploying AI and the Geoscientist
“Real World AI” highlights four key challenges organizations face when leveraging AI. This categorization provides useful guidance for geoscientists working to deliver value to their organizations through AI/machine learning technologies and workflows.
1. Defining the problem
The first step to overcoming any challenge is identifying the problem. AI is no exception. Once you have a defined objective, you can implement a specialized approach that takes existing data, operational constraints and risk into consideration.
In “Real World AI,” defining the problem also includes determining how well you want to solve the problem. In health care, a cancer detection neural network requires human-machine interaction and a high degree of certainty to reduce risk and improve final outcomes, while many other life sciences applications have less consequential health risks.
In the energy industry, the geoscientist must use practical knowledge and experience to define both the problem and the level of accuracy the solution requires. Shallow drilling hazards and uncertainty in pore pressure predictions are critical to operations and safety, while field development plans and reserves calculations have a significant impact on financial decisions; results for problems such as these need to be accurate. On the other hand, less rigor may be acceptable for the many problems that don’t carry a high safety or financial risk.
For geoscientists working to introduce AI into their workflows, it is best to start small, understand the risks and how critical accuracy is, and have a clear line of sight to business value for the selected problem. It’s also helpful to develop a plan early on to scale the new workflow across the organization, something often missing from AI pilot projects.
2. Gathering training data
One of the biggest challenges associated with machine learning is gathering and organizing the right data to train models. In the real world, high-quality, accurate data is incredibly important – and incredibly difficult to collect.
“When creating AI in the real world, the data used to train the model is far more important than the model itself,” Rochwerger and Pang write. “This is a reversal of the typical paradigm represented by academia, where data science PhDs spend most of their focus and effort on creating new models. But the data used to train models in academia are only meant to prove the functionality of the model, not solve real problems. Out in the real world, high-quality and accurate data that can be used to train a working model is incredibly tricky to collect.”
In upstream oil and gas, accessing and integrating accurate, up-to-date data is one of the industry’s biggest challenges, particularly for geoscientists. Data on reservoirs is added continuously at highly varying scales in space and time, from initial seismic and other remotely sensed data, to exploration log and core data, to production data. AI workflows and data infrastructure must be built with these physical and time scales in mind. The OSDU Data Platform is a significant step in addressing this challenge.
In the future, it is reasonable to expect the energy industry to develop analogues to ImageNet (the large, publicly available collection of labeled images widely used to train computer vision models) as training data for any number of geoscience applications. As more open source data becomes available, companies with exclusive access to massive data sets will find that access less and less of a competitive advantage.
Considering the rapid advances in AI, if exclusive data presents a business opportunity, it is best to act on it quickly before the advantage disappears.
3. Maintaining machine learning models
To provide maximum value and maintain confidence in results, AI requires continuous human-machine collaboration. Regardless of the algorithm, machine learning is only as accurate as the labeled data and human input. If the data is of questionable accuracy or obsolete, the resulting model will be of limited use.
Rochwerger and Pang continue, “Don’t forget to allocate resources for the ongoing training of your model. Models have to be trained continually, or they’ll become less accurate over time as the real world changes around them.”
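To make the retraining point concrete, here is a minimal Python sketch of one common way to decide when a model needs retraining; it is not taken from the book or from any Enthought product. It tracks the model’s accuracy on a rolling window of recently labeled examples (for instance, predictions that a domain expert has QC’d) and flags retraining when that accuracy drops below a chosen threshold. The class name `RetrainMonitor`, the window size and the 0.9 threshold are hypothetical placeholders; in practice the threshold would come from the acceptable level of accuracy defined for the problem.

```python
from collections import deque


class RetrainMonitor:
    """Hypothetical helper: flags a model for retraining when its accuracy
    on recently labeled data falls below a chosen threshold."""

    def __init__(self, threshold=0.9, window=100, min_samples=20):
        self.threshold = threshold            # minimum acceptable accuracy (assumed value)
        self.min_samples = min_samples        # don't judge until enough labels arrive
        self.results = deque(maxlen=window)   # rolling record of correct/incorrect predictions

    def record(self, prediction, label):
        """Store whether a prediction matched the expert-provided label."""
        self.results.append(prediction == label)

    def accuracy(self):
        """Accuracy over the rolling window, or None if no labels yet."""
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def needs_retraining(self):
        """True once enough labels exist and accuracy is below the threshold."""
        if len(self.results) < self.min_samples:
            return False
        return self.accuracy() < self.threshold


# Usage: simulate a model whose predictions start drifting away from reality.
monitor = RetrainMonitor(threshold=0.9, window=50, min_samples=10)
for i in range(50):
    label = "fault"
    prediction = "fault" if i < 40 else "horizon"  # last 10 predictions are wrong
    monitor.record(prediction, label)

print(monitor.accuracy())          # 0.8
print(monitor.needs_retraining())  # True
```

Using a rolling window rather than all historical predictions means the monitor reflects how the model performs on the data it sees now, which is the point of the authors’ warning that models degrade as the real world changes around them.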
Machine learning models are increasingly becoming commodities. More and more data is becoming available, and analogues to ImageNet are beginning to appear for various industries and domains. For subsurface data, ImageNet analogues are on the horizon, albeit initially only for domain-specific workflows such as seismic or well log interpretation.
More sophisticated models that take all available subsurface data into consideration are the next challenge for research and pilot projects and, if those succeed, for scaling across organizations. Major operators with their broad and deep data sets may hold a competitive advantage here. How long such an advantage will persist is another question.
A critical element for maintaining any machine learning model will be a highly intuitive user interface, specifically for integrating and labeling data for new generations of workflows.
A first step in this direction is our machine learning application, SubsurfaceAI Seismic, where geoscientists are able to train the AI in the way they want the interpretation to be done. In this cloud-enabled application, domain experts provide ‘near real-time’ QC feedback on predictions made by the machine learning models. No data science or IT knowledge is required of the domain expert, which will be another feature of future applications.
4. Gathering the right team
The fourth main challenge “Real World AI” sets out is gathering the right team. Successfully solving real-world problems with AI, including scaling across the organization, requires deep, cross-functional collaboration. This collaboration extends from domain experts, who ultimately discover the value, to parts of the organization often not considered by those closest to the challenge.
“A business problem that can be solved by a model alone is very unusual. Most problems are multifaceted and require an assortment of skills—data pipelines, infrastructure, UX, business risk analysis,” Rochwerger and Pang observe. “Even with a wonderful business strategy, a well-articulated, specific problem, and a great team, it’ll be impossible to achieve success without access to the data, tools, and infrastructure necessary to ingest each dataset, save it, move it to the right place, and manipulate it.”
Geoscientists today are in transition from machine learning pilot projects to scaling across the organization. Domain experts often spend significant amounts of time interacting with data scientists and IT departments rather than working on their domain challenges. Deploying external software with a strong AI component can be more challenging than deploying traditional software because of platform integration issues, particularly if plans include migrating to the cloud. The OSDU Data Platform is important to consider when planning to scale new-generation workflows.
In summary, this book is a worthwhile read for anyone significantly involved in scaling advanced scientific software and new workflows across an organization. One comment to add from Enthought’s experience across multiple industries: start with a small project, keep a clear line of sight to business value and develop plans to scale across the organization as the project progresses. Look for pilot locations within the business where there is clear value and where domain experts are committed to achieving success.
We’d welcome a conversation about our experience.
The Challenges of Applied Machine Learning on TechTalks
Real World AI: A Practical Guide for Responsible Machine Learning on Amazon
The OSDU Data Platform – open source, standards-based, technology-agnostic data platform
The Enthought SubsurfaceAI Seismic custom deep learning application
About the Author
Mason Dykstra is Enthought’s Vice President of Energy Solutions. As an intuitive thought leader with previous experience in academia, Statoil and Anadarko, he helps oil and gas companies connect the dots between science, engineering, technology and business needs. Mason leads the Enthought team of experts in tackling problems that contribute to the bottom lines of its customers. Connect with Mason on LinkedIn at linkedin.com/in/mason-dykstra-a304b25/ to join his online conversations.