Geophysics in the Cloud Competition

Join the 2021 GSH Geophysics in the Cloud Competition. Build a novel seismic inversion app and access all the data on demand with serverless cloud storage. Example notebooks show how to access this data and use AWS SageMaker to build your ML models. Prizes are on offer.

Author: Ben Lasscock, Ph.D.

The 2021 Houston GSH Geophysics in the Cloud Competition is sponsored by AWS Energy and Enthought. This competition will allow teams and individuals to develop new and innovative solutions for seismic inversion. We’re going all in on cloud. You will be provided with the latest technologies for serverless access to big data, AWS SageMaker examples that show how to build ML models in the cloud, and a gather.town space where we can work and collaborate in 8-bit.

Access All the Data

A common theme when discussing AI/ML in exploration geophysics has been that only a very small percentage of available data is used in analysis and decision making. One of the goals of this competition is to make ALL data available to the participants, on demand.

This competition presents logistical and technical challenges for organizers and participants alike. The seismic datasets are large. Downloading this data would typically take hours, a cost multiplied across every participant. As organizers, we don’t want to see the work of loading and manipulating large SEG-Y format data replicated across the teams. More overhead loading data means less time (and less fun) developing ML for seismic inversion.

While we want participants to have access to ALL the data, we expect they will only use what they find to be relevant in solving the competition problem. This detail is important when using specialist GPUs and tools like AWS SageMaker to build models. We don’t want to be wasting valuable compute time doing I/O.

Going Serverless

Competition datasets will be made available to the participants through a convenient API, with the data reformatted for efficient serverless access. Serverless means that the data can be accessed directly from blob storage (S3). For the organizers, it means we don’t have to manage an extra server to provide access to data. For the participants, it means efficient access to the parts of the dataset they want, on demand.

One such efficient format for seismic data is OpenVDS. OpenVDS provides fast access to slices (inline, crossline, and time) and 3D chunks. The upcoming release of OpenVDS+, by Bluware, provides an easy-to-use, pip-installable library that participants can use in their notebooks. OpenVDS is also part of the OSDU Data Platform, so we should be seeing a lot more of it in the future.
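To make the idea of on-demand slice access concrete, here is a minimal sketch of why a chunked layout suits blob storage. It is a toy illustration in plain Python, not the OpenVDS API: the names `BRICK`, `build_store`, and `read_inline` are hypothetical, the dictionary stands in for an object store such as S3, and the amplitudes are synthetic. The point is only that reading one inline touches the few chunks that intersect it rather than the whole volume.

```python
from itertools import product

BRICK = 4  # hypothetical chunk edge length, in samples


def build_store(shape):
    """Split a synthetic volume into chunks keyed by chunk index.

    The dict stands in for an object store: one key per chunk.
    """
    store = {}
    n_il, n_xl, n_t = shape
    for il, xl, t in product(range(n_il), range(n_xl), range(n_t)):
        key = (il // BRICK, xl // BRICK, t // BRICK)
        chunk = store.setdefault(key, {})
        # Synthetic amplitude that encodes its own position in the volume.
        chunk[(il % BRICK, xl % BRICK, t % BRICK)] = il * 10000 + xl * 100 + t
    return store


def read_inline(store, shape, inline):
    """Return one inline slice (crossline x time) and the chunk keys it needed."""
    _, n_xl, n_t = shape
    fetched = set()
    rows = []
    for xl in range(n_xl):
        row = []
        for t in range(n_t):
            key = (inline // BRICK, xl // BRICK, t // BRICK)
            fetched.add(key)  # each distinct key would be one object request
            row.append(store[key][(inline % BRICK, xl % BRICK, t % BRICK)])
        rows.append(row)
    return rows, fetched


shape = (16, 8, 8)  # inlines, crosslines, time samples (made-up sizes)
store = build_store(shape)
section, fetched = read_inline(store, shape, inline=5)
print(f"chunks fetched: {len(fetched)} of {len(store)}")
```

With these made-up sizes, the inline read needs only the chunks along a single inline row of the chunk grid. A real format adds compression, metadata, and parallel requests, but the access pattern is the same in spirit.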

Get Started

The problem of assembling an AI/ML ready data set has been solved by using a serverless model, making the most of the scarce resources available for the competition.

This story really isn’t too different from what we see in the industry at large: how to get the most innovation with the least expenditure while making highly efficient use of expert time.

Let the competition begin. Entries close 26 March, and the competition begins 1 April. No foolin’.

Visit the website to learn more and enter the competition.

About the Author

Ben Lasscock holds a Ph.D. and a B.Sc. in theoretical physics as well as a B.Sc. in physics and theoretical physics from the University of Adelaide. Before coming to geoscience, Ben worked as a portfolio manager at a large hedge fund in Australia. He has publications in the areas of high-energy physics, Bayesian time series analysis, and geophysics.
