How IT Contributes to Successful Science

 

With the increasing importance of AI and machine learning in science and engineering, it is critical that the leaders of R&D and IT groups at innovative companies are aligned. Inappropriate budgeting, policies, or vendor choices can unnecessarily block critical research programs; conversely, an "anything goes" approach can squander valuable resources or leave an organization open to novel security threats.

At the heart of the tension between R&D and IT is the fundamental fact that the needs of research groups are qualitatively different from those of most standard business activities, often in ways that are unfamiliar to IT:

  1. R&D changes workflows and adopts new technologies frequently as new and different approaches to problems are tried;
  2. those technologies are often atypical and specialized;
  3. and they require closer interaction and collaboration between researchers and IT staff to support.

Some of the new technologies that researchers are adopting, such as advanced AI, also hold the promise of solving some of the challenges that IT faces in providing safe and scalable solutions for research groups.

Research IT Requires Flexibility

An effective research organization should always be trying new things, whether different methods, different materials, different designs, different equipment or different algorithms. This is not just for the sake of doing new things but because new, relevant workflows and technologies should at a minimum be evaluated and, if effective, adopted. R&D's frequent flux stands in contrast to many other parts of a business: financial reports should be consistent over time, data warehouse schemas should only evolve gradually, manufacturing processes shouldn't change from day to day, and so forth.

Unfortunately, novelty can strain standard IT policies and resources. For example, a great deal of modern lab equipment is essentially a PC with specialized instruments attached to it, often running an outdated OS and software, all too frequently with known security vulnerabilities, which cannot be upgraded easily. Or the adoption of a novel lab methodology generates a new type of semi-structured data that must be searchable, requiring a new database. Or researchers want to be able to run complex third-party research code to replicate or adapt a new analysis.

Often, exceptions need to be granted to policies, or IT resources spent, for R&D to be successful.

Modern machine learning technologies, such as Large Language Models (LLMs), exemplify the need for flexibility in processes. To make use of them, either an IT organization needs to be able to deploy and scale third-party, GPU-based, containerized compute resources along with the large data sources they require, or policies need to be developed and enforced that allow the use of proprietary data with AI providers' APIs. Because this is a fast-moving area of work, best practices and preferred services are themselves changing from month to month.

When research is a critical need for a business, as in materials science and the life sciences, IT needs the flexibility, expertise and resources to adapt to the needs of R&D.

One Size Does Not Fit All

Managing an organization's IT needs is a complex operation, and there are playbooks for controlling complexity: standardizing portfolios, removing redundant solutions, reducing opportunities for security issues, providing common solutions to needs, and so on. For most business users some easy decisions can be made, such as using only the most recent versions of a preferred OS, providing users with a standard laptop configuration from a single vendor, having a single data lake for the entire company, and strictly controlling available software and internet access.

But to perform their jobs, researchers frequently need access to resources which are not typical for an everyday business use-case. It could be something as simple as larger displays to support high-density data visualization, access to sites that may not be on typical business-oriented whitelists, or higher-speed network access for data transfer. But it can also be machines with specific GPU capabilities, the ability to compile and run arbitrary code, or the ability to spin up containerized infrastructure. Atypical use-cases need to be recognized when deciding policies and assigning budgets: R&D users' needs are different.

Additionally, the resources that researchers require also vary over time: a researcher may not need anything unusual in terms of computing power 90% of the time, but every few months they need a week of time on a GPU cluster to train a machine learning model; or every week they need to upload large amounts of data from an instrument, but otherwise have normal network usage.

Even stepping outside the constraints of individual users' needs, R&D groups tend to have different needs than other business use-cases. For example, scientific data tends not to fit either a fixed-schema data warehouse or a data lake of text documents, but instead consists of collections of highly structured array or image data. Typical off-the-shelf data storage solutions frequently don't work well with research data.

Care needs to be taken when looking at external vendors: many IT and software consulting firms do not have any experience with the particular needs of scientific research and can't converse with researchers on their own terms. When looking for management or software consulting services for R&D needs, success is more likely when working with an external partner with deep expertise in interfacing with scientists and engineers. Enthought has over 20 years of experience helping companies in this way.

When it comes to R&D, IT needs to be prepared and equipped to support a diversity of hardware, operating systems, software, data storage and other technologies.

Education and Support

Most scientists and engineers are confident computer users; many know how to code and are comfortable with SQL, HTML and many other standard technologies. But every researcher's knowledge has limits: they may not be familiar with containerized deployment; they may need help installing complex machine learning libraries like TensorFlow into their analysis environments; they may need help accessing and provisioning cloud services; or they may not know how to access third-party large language model APIs.

Additionally, researchers may not be as familiar with best practices and security threats that come with these technological choices. From query and prompt injection, to OAuth identity management, to typosquatting in open-source package management systems, researchers need to be aware of the potential issues that their workflows may come with. IT needs to provide support via education, monitoring and timely vulnerability notifications.
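As one illustration of the kind of lightweight check that could accompany this education, the sketch below flags requested package names that closely resemble, but do not exactly match, a vetted package. It is a minimal sketch under stated assumptions rather than a recommended tool: the allowlist contents, the function names and the 0.85 similarity threshold are all hypothetical, and a real deployment would more likely rely on a curated internal package index.

```python
from difflib import SequenceMatcher

# Hypothetical allowlist of packages the organization has already vetted.
APPROVED_PACKAGES = {"numpy", "pandas", "requests", "tensorflow", "scipy"}


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1 for two package names."""
    return SequenceMatcher(None, a, b).ratio()


def check_package(name, approved=APPROVED_PACKAGES, threshold=0.85):
    """Classify a requested package name.

    Returns ("approved", name) for an exact match, ("suspicious", lookalike)
    when the name is a near miss of an approved package (a possible
    typosquat), and ("unknown", None) otherwise.
    The threshold is an assumed value chosen for illustration only.
    """
    normalized = name.strip().lower().replace("_", "-")
    if normalized in approved:
        return ("approved", normalized)
    # Find the approved package whose name is closest to the request.
    best = max(approved, key=lambda pkg: similarity(normalized, pkg))
    if similarity(normalized, best) >= threshold:
        return ("suspicious", best)
    return ("unknown", None)


if __name__ == "__main__":
    for requested in ["numpy", "reqeusts", "tensorfow", "matplotlib"]:
        print(requested, check_package(requested))
```

A name flagged as "suspicious" is not necessarily malicious; it is simply a prompt for a researcher or IT staff member to look twice before installing.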

Companies that want to support modern AI workflows must have IT and DevOps employees who can work closely with researchers to provide the support, services and education they need, particularly when it comes to cloud computing, containerization and orchestration, and management of analysis environments.

Modern Solutions to Modern Problems 

Ironically, the new technologies which cause some of these challenges can also be a key to providing solutions for them. Cloud technologies, whether internal or external, permit unprecedented flexibility in providing computing or data resources.

Previously, a researcher or lab might need to be provided with a dedicated GPU workstation that was capable of handling their peak computing workloads. Now it is possible to provide dedicated compute capability that can handle day-to-day workloads, coupled with access to cloud resources that can be used to cover occasional surges in demand or the need for GPUs.
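The arithmetic behind this trade-off can be sketched in a few lines. The example below is illustrative only: the function name, the weekly demand figures and the 40-hour local capacity are hypothetical, and it ignores pricing, queueing and data-transfer considerations that a real capacity plan would include.

```python
def split_gpu_demand(weekly_demand_hours, local_capacity_hours):
    """Split each week's demand into locally served hours and cloud-burst hours."""
    plan = []
    for demand in weekly_demand_hours:
        local = min(demand, local_capacity_hours)  # served by dedicated hardware
        burst = demand - local                     # overflow sent to the cloud
        plan.append((local, burst))
    return plan


# Hypothetical demand: mostly light use, with one week of model training.
weekly_demand = [10, 12, 8, 160, 11, 9]
plan = split_gpu_demand(weekly_demand, local_capacity_hours=40)
total_burst = sum(burst for _, burst in plan)

if __name__ == "__main__":
    for week, (local, burst) in enumerate(plan, start=1):
        print(f"week {week}: local={local}h cloud={burst}h")
    print(f"total cloud-burst hours: {total_burst}")
```

Under these assumed numbers, only the training week exceeds local capacity, so dedicated hardware sized for everyday use plus occasional cloud hours covers the same demand as a machine sized for the peak.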

Web-based analysis technologies like JupyterLab and JupyterHub offer the promise of being able to provide self-contained, sandboxed environments for users. Researchers can work with familiar notebook-based tools and have full control over their environments, installing whatever packages they need, but in a way that is isolated from internal corporate networks. If such an environment is compromised it may be serious for the worker and their immediate research, but it is unlikely to become a company-wide breach.

Businesses that can acquire these capabilities will have a distinct advantage in harnessing the new tools of scientific computing. As AI and machine learning become more mainstream, research-oriented companies in life sciences and materials technology will need to ensure that their IT and R&D groups are aligned to provide the data and cloud computing capabilities needed to support researchers' needs. And inevitably the alignment will need to grow even closer for whatever technologies emerge next.

 
