Beyond seismic: a thin section example, where machine learning techniques can be applied across many images that previously went unused because of the heavy demands on expert time and the difficulty of organizing and sharing data. See a demo at: https://www.enthought.com/industries/oil-and-gas/core-analysis/
Author: Brendon Hall, Ph.D., Director, Energy Solutions
The SEG 2020 Virtual Conference is now a wrap. This blog post takes the opportunity provided by the event to present a selection of questions from attendees who reached us through the Virtual Expo chat feature and the Enthought website.
Why is it important to have a domain-specific labeling tool for Artificial Intelligence or Machine Learning?
Generic image labeling tools exist, but it is tedious to label seismic data using these. A seismic-specific labeling tool provides a convenient interface for labeling common interpretation objects, like horizons, faults and salt. This allows an interpreter to quickly create data that can be used to train a machine learning model.
Related to this, a problem often encountered when working with scientific images is a lack of training data. For example, there isn’t an equivalent of ImageNet for seismic or thin section images. Domain experts are needed to manually generate suitable training data, which points to the need for both a labeling capability and models that perform well on small data sets.
What are the differences between available machine learning models?
It is difficult to provide a succinct answer to this common question, as there are many ways machine learning models can differ. Some models are useful for classification, like identifying objects in images. Others are designed for predicting values, or clustering data in relevant groups.
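As a rough illustration of these three kinds of task, the sketch below uses only the Python standard library and made-up one-dimensional numbers. The function names (`fit_centroids`, `classify`, `fit_line`, `two_means`), the rock-type labels and the data values are all hypothetical, chosen for illustration; this is not Enthought's tooling and real models are far more capable.

```python
from statistics import mean

# Toy, made-up measurements with expert labels (illustrative values only).
samples = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["shale", "shale", "shale", "sand", "sand", "sand"]


def fit_centroids(xs, ys):
    """Classification: learn one mean value per labelled class."""
    return {
        label: mean(x for x, y in zip(xs, ys) if y == label)
        for label in set(ys)
    }


def classify(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def fit_line(xs, ys):
    """Regression: ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def two_means(xs, iterations=10):
    """Clustering: 1-D k-means with k=2; no labels are used.

    Assumes xs contains at least two distinct values.
    """
    c0, c1 = min(xs), max(xs)
    for _ in range(iterations):
        group0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        group1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = mean(group0), mean(group1)
    return c0, c1


centroids = fit_centroids(samples, labels)
predicted_class = classify(centroids, 4.5)              # a category
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])  # a numeric relationship
cluster_centers = two_means(samples)                     # groups found without labels
```

The classifier and the regression both learn from labelled or paired examples, while the clustering function sees only the raw values, which is the practical distinction between the three task types.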
Random Forest models are able to make reasonable predictions with relatively small data sets. Deep learning models are capable of identifying complex features in images, but many require massive amounts of data. Some models can be interrogated to understand how they arrived at decisions. Others are black boxes that are difficult to interpret. When choosing a model, it is important to consider its business purpose, the amount of data available, and the accuracy and robustness required.
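To make the idea of a model that can be interrogated more concrete, here is a minimal sketch of a one-split decision tree (a "stump"), the kind of building block that tree ensembles such as Random Forests combine in large numbers. It is not a Random Forest, and the feature name, data values and function names (`fit_stump`, `explain`) are hypothetical, invented for this example.

```python
def fit_stump(values, labels):
    """Find the single threshold that misclassifies the fewest samples.

    Returns (threshold, label_below, label_above, errors).
    Assumes values contains at least two distinct numbers.
    """
    classes = sorted(set(labels))
    ordered = sorted(set(values))
    # Candidate thresholds sit halfway between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    best = None
    for threshold in candidates:
        for below in classes:
            for above in classes:
                errors = sum(
                    (below if v <= threshold else above) != y
                    for v, y in zip(values, labels)
                )
                if best is None or errors < best[3]:
                    best = (threshold, below, above, errors)
    return best


def explain(stump, feature_name):
    """Render the learned rule as a sentence a domain expert can check."""
    threshold, below, above, _ = stump
    return f"if {feature_name} <= {threshold:g}: {below}, else: {above}"


# Toy, made-up porosity values with expert labels (illustrative only).
porosity = [0.05, 0.08, 0.10, 0.22, 0.25, 0.30]
rock_type = ["shale", "shale", "shale", "sand", "sand", "sand"]

stump = fit_stump(porosity, rock_type)
rule = explain(stump, "porosity")
```

Because the whole model is one readable rule, an expert can inspect it directly and judge whether the threshold makes physical sense. A deep network trained on images offers no comparably direct view of its reasoning.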
What is the difference between autotrackers, which can be quite effective as an AI technique in seismic, and what Enthought is doing?
A very good question, and a deeper dive into wavelets and autotrackers would be a great topic for a future blog post. However, for a brief answer:
There are different types of autotrackers, and on many data sets they can be effective. To generalize, autotrackers are based on wavelet characteristics. What Enthought has done is take the approach of building the AI/ML models to recognize patterns in the facies, much as experts do.
It is early days, but the approach is promising. Some results on open source data sets are available to view. See the video ‘Introducing Pattern Recognition AI’.
What is different about the scientific software developed by Enthought from other industry packages?
First, our starting point is a client or unique industry problem where AI/machine/deep learning can make a significant difference, usually both in expert efficiency gain and depth of understanding. We have access to a significant set of digital tools and foundational code, developed over many years working with science-driven industries.
To get started, we look at the business problem and how data flows through the current process. During a collaborative discovery process, we analyze which parts of the process are manual and tedious. These can usually be automated, which is the first win.
We then consider the steps that require expert judgement and interpretation. Is it possible to design an AI assistant that can generate different scenarios or suggestions that can shorten the interpretation time? If the interpretation task requires annotating data, perhaps an AI assistant can learn by observing the interpreter over time.
Then comes what we call ‘applied digital innovation’. We work collaboratively with clients to develop a strategy for collecting, cleaning and curating ML-ready data assets. We iterate and rapidly deploy tools on the cloud or on-premises resources.
We understand the role of the scientist in interpretation workflows, and craft AI tools to assist and collaborate with them to generate business results faster, more accurately and reproducibly.
The final step has the work ready for import into industry standard packages, for example Petrel or DecisionSpace.
What are your thoughts on the ability of industry standard software packages to deliver on the potential of AI/ML?
For some problems, such as forecasting time series or predicting machine failure, there are good solutions available. For complex cognitive tasks like interpreting seismic data, it will take some time before standard tools are able to make robust predictions. There is limited suitable training data, although some companies do have access to large sets.
The other issue is that seismic data is not standardized in a meaningful way. There are many tasks involved in interpreting seismic (e.g. finding faults, mapping horizons). These tasks are somewhat subjective, and results can depend on the interpreter, company knowledge base, and procedures.
This makes it difficult to create off-the-shelf machine learning models that are able to consistently deliver business value, and then to integrate these into established software.
This is why the Enthought approach is to collaborate with operator experts using their data and their real-world problems, starting from an extensive foundational machine learning toolkit. From there, fast prototyping and innovation become possible. This is more difficult in established software packages.
How is the cloud going to impact workflows in upstream oil and gas?
We can get some sense of what is possible by considering how the cloud is already transforming common office software. Collaboration will be effortless. We will be able to leave comments and ask questions that can be answered in real time, and scientific applications will be integrated with communication tools.
Data will no longer be an artefact that resides in a specific application or has an owner. It will become a resource that will continually flow through a system of workflows to produce data products that support improved business decision making. This is a key advance.
The nature of software applications will also change. Desktop software is relatively static, bound to release cycles. Cloud-based applications can be continuously improved. (This is something Enthought does to great effect in collaborations using our deep learning toolkit.) This will enable a system of workflows that can be optimized over time based on business needs.
How important is infrastructure to innovation, often at an individual expert level, through AI/machine learning?
Infrastructure is critical to facilitating the innovation required for digital transformation. Software components need to be secure and scalable. Data need to be stored in a consistent, searchable manner that can be accessed by machines and humans.
The infrastructure must enable the tools to be crafted and used by domain experts to enable sustainable, continuous improvement and transformation. This could be as simple as leveraging a cloud provider and working with partners who can help organize and safeguard data.
We find that some subsurface teams have not had to engage their IT departments at the level of change necessary to progress advanced scientific computing. This can be one of the bigger challenges in a project. Enthought built its own infrastructure to enable advanced computing, to test and customize in support of projects. This has proved useful in helping certain client IT functions find the way forward.
What is Enthought? What do they do?
Since 2001, Enthought has been building customized scientific software across multiple science-driven industries, including energy, targeting early and ongoing business impact. We are experts in using Python to build solutions, and have developed training programs to prepare enterprise scientists to build their own tools.
These cutting-edge scientific solutions are built collaboratively with clients, pairing domain expertise with ‘scientists who code’, and using the latest AI, machine learning, modeling and simulation technology. Our iterative approach focuses on quickly proving concepts and developing intuitive user interfaces, so that the resulting software tools deliver business impact from the start.
To ask your own question or learn more, send us an email at firstname.lastname@example.org.
About the Author
Brendon Hall, Ph.D., Director, Energy Solutions at Enthought holds a Ph.D. in mechanical engineering from the University of California Santa Barbara, and a B.Eng. in mechanical engineering as well as B.Sc. in computer science from Western University in Canada.
With more than 10 years of industry experience, Brendon is a recognized leader in machine learning who understands the science of oil and gas exploration. His experience as a computational geologist at ExxonMobil and as a research geophysicist at ION Geophysical adds to his background in software development, enabling cross-functional discussions that drive business value across oil and gas organizations.