DeepModeling

Define the future of scientific computing together

On the journey toward developing a Large Atomic Model (LAM), the core Deep Potential development team has launched the OpenLAM initiative for the community. OpenLAM’s slogan is "Conquer the Periodic Table!" The project aims to create an open-source ecosystem centered on microscale large models, providing new infrastructure for microscopic scientific research and driving transformative advancements in microscale industrial design across fields such as materials, energy, and biopharmaceuticals.

Codes

The code underwent a large-scale restructuring, and the DeePMD-kit v3 alpha version was successfully released in early March. Compared to v2, DeePMD-kit v3 allows users to train and run deep potential models on either the TensorFlow or PyTorch framework, facilitating broader compatibility across downstream applications. DeePMD-kit v3 also adds support for the DPA-2 model, marking a new chapter for Large Atomic Models (LAM). For a more detailed report, see: https://github.com/deepmodeling/deepmd-kit/discussions/3401.

The latest 2024 Q1 branch version of DeePMD-kit v3 (available at https://github.com/deepmodeling/deepmd-kit/tree/2024Q1) includes the following new features:

DeepSpin Upgrade: The PyTorch version now supports all descriptors, including DPA-2, and allows integration with model structures like type embedding, enabling the development of higher-precision magnetic models. This is particularly advantageous for studies involving magnetic systems. Example input: https://github.com/deepmodeling/deepmd-kit/blob/2024Q1/examples/spin/se_e2_a/input_torch.json.
Multitask Feature Upgrade: The PyTorch version now supports multitask fine-tuning, allowing users to fine-tune a pre-trained model on multiple downstream systems simultaneously. Documentation is available at: https://github.com/deepmodeling/deepmd-kit/blob/devel/doc/train/finetuning.md#multi-task-fine-tuning.
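The multi-task setup described above is configured through a JSON input file. The sketch below assembles a minimal two-task configuration in Python. The field names ("shared_dict", "model_dict", "loss_dict", "data_dict") and all numeric settings are assumptions based on the linked documentation, not a verified schema; consult the fine-tuning docs for the authoritative format.

```python
import json

# Hypothetical sketch of a multi-task training input for DeePMD-kit v3
# (PyTorch backend). Field names and values are assumptions; consult
# the linked documentation for the authoritative schema.
config = {
    "model": {
        "shared_dict": {
            # A descriptor shared by every task head (placeholder settings).
            "my_descriptor": {"type": "se_e2_a", "rcut": 6.0, "sel": "auto"},
        },
        "model_dict": {
            "water": {
                "descriptor": "my_descriptor",
                "fitting_net": {"neuron": [240, 240, 240]},
            },
            "alloy": {
                "descriptor": "my_descriptor",
                "fitting_net": {"neuron": [240, 240, 240]},
            },
        },
    },
    "loss_dict": {
        "water": {"type": "ener"},
        "alloy": {"type": "ener"},
    },
    "training": {
        "data_dict": {
            "water": {"training_data": {"systems": ["data/water"]}},
            "alloy": {"training_data": {"systems": ["data/alloy"]}},
        },
        "numb_steps": 100000,
    },
}

with open("input_multitask.json", "w") as f:
    json.dump(config, f, indent=2)

# Fine-tuning from a pre-trained checkpoint would then look roughly like:
#   dp --pt train input_multitask.json --finetune pretrained.pt
```

The idea is that both task heads reuse one shared descriptor while keeping separate fitting networks, losses, and data, which is what makes simultaneous fine-tuning on multiple downstream systems possible.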

Data

  • A data-cleaning workflow was initially established (https://github.com/zjgemi/lam-data-cleaning), and 74 datasets selected from the ColabFit collection (https://materials.colabfit.org/) were incorporated into training. Tests indicate that the model achieved the expected accuracy in multitask training at a scale of 100 tasks.

  • A new chalcogenide solid electrolyte dataset was added, covering 26 types of sulfides. Compared to other general-purpose force fields, the DPA-SSE model more accurately predicts dynamic properties such as the diffusion coefficient of Li ions and ionic conductivity.

  • The MPtraj dataset was cleaned and categorized, and training and testing were conducted on the updated DPA-2 model structure. Initial results indicate an improvement in model performance.

Models

The pre-trained model OpenLAM_2.1.0_27heads_2024Q1.pt, compatible with the latest 2024 Q1 branch of DeePMD-kit v3, was updated and uploaded to the AIS-Square website: https://aissquare.com/models/detail?pageType=models&name=DPA-2.1.0-2024Q1&id=244. Tests confirmed that migrating the code to DeePMD-kit v3 did not affect model accuracy. Additionally, tasks like ANI-1x and Transition-1x, previously used as downstream test sets, were included in the pre-trained model, and more extensive training improved model accuracy.

Usage guides were added for zero-shot inference, single-task fine-tuning, and multi-task fine-tuning with pre-trained models. For any questions on model usage, community members can connect with developers in the GitHub discussion forum: https://github.com/deepmodeling/deepmd-kit/discussions/3772.

Infrastructure

  • The cloud-native AI4Science workflow framework Dflow continues to openly drive the development of a broad range of community software ecosystems.

    • Preprint of the paper: https://arxiv.org/abs/2404.18392
  • dpgen2 now supports DeePMD-kit v3 and introduces new features, including DP-GEN fine-tuning of DPA-2 pre-trained models, large-model distillation workflows, and multi-task DP-GEN.

    • Project repository: https://github.com/deepmodeling/dpgen2
  • By combining the crystal structure prediction algorithm CALYPSO with DP-GEN, a model construction scheme suited for high-throughput structure prediction has been designed. Additionally, dpgen2 is integrated into the structure search software CALYPSO as an optional configuration space exploration engine.

    • Article link: https://journals.aps.org/prb/abstract/10.1103/PhysRevB.109.094117
  • The open-source cloud-native alloy property calculation workflow APEX V1.2.0 has been released, effectively supporting the evaluation of the OpenLAM Large Atomic Model.
    • Project repository: https://github.com/deepmodeling/apex
    • Preprint of the paper: https://arxiv.org/abs/2404.17330

The slogan for OpenLAM is "Conquer the Periodic Table!" We hope to provide a new infrastructure for microscale scientific research and drive the transformation of microscale industrial design in fields such as materials, energy, and biopharmaceuticals by establishing an open-source ecosystem around large microscale models. Relevant models, data, and workflows will be consolidated around the AIS Square; related software development will take place in the DeepModeling open-source community. At the same time, we welcome open interaction from different communities in model development, data sharing, evaluation, and testing.

See AIS Square for more details.

Read more »


Peter Thiel once said, "We wanted flying cars, instead we got 140 characters (Twitter)." Over the past decade, we have made great strides at the bit level (internet), but progress at the atomic level (cutting-edge technology) has been relatively slow.

The accumulation of linguistic data has propelled the development of machine learning and ultimately led to the emergence of Large Language Models (LLMs). With the push from AI, progress at the atomic level is also accelerating. Methods like Deep Potential, by learning quantum mechanical data, have increased the space-time scale of microscopic simulations by several orders of magnitude and have made significant progress in fields like drug design, material design, and chemical engineering.

The accumulation of quantum mechanical data is gradually covering the entire periodic table, and the Deep Potential team has begun building DPA pre-trained models. Analogous to the progress of LLMs, we are on the eve of the emergence of a general Large Atomic Model (LAM). At the same time, we believe that open source and openness will play an increasingly important role in the development of LAMs.

Against this backdrop, the core developer team of Deep Potential is launching the OpenLAM Initiative to the community. This plan is still in the draft stage and is set to officially start on January 1, 2024. We warmly and openly welcome opinions and support from all parties.

The slogan for OpenLAM is "Conquer the Periodic Table!" We hope to provide a new infrastructure for microscale scientific research and drive the transformation of microscale industrial design in fields such as materials, energy, and biopharmaceuticals by establishing an open-source ecosystem around large microscale models. Relevant models, data, and workflows will be consolidated around the AIS Square; related software development will take place in the DeepModeling open-source community. At the same time, we welcome open interaction from different communities in model development, data sharing, evaluation, and testing.

OpenLAM's goals for the next three years are: In 2024, to effectively cover the periodic table with first-principles data and achieve a universal property learning capability; in 2025, to combine large-scale experimental characterization data and literature data to achieve a universal cross-modal capability; and in 2026, to realize a target-oriented atomic scale universal generation and planning capability. Ultimately, within 5-10 years, we aim to achieve "Large Atom Embodied Intelligence" for atomic-scale intelligent scientific discovery and synthetic design.

OpenLAM's specific plans for 2024 include:

  • Model Update and Evaluation Report Release:

    • Starting from January 1, 2024, driven by the Deep Potential team, with participation from all LAM developers welcomed.
    • Every three months, a major model version update will take place, with updates that may include model architecture, related data, training strategies, and evaluation test criteria.
  • AIS Cup Competition:

    • Initiated by the Deep Potential team and supported by the Bohrium Cloud Platform, starting in March 2024 and concluding at the end of the year;
    • The goal is to promote the creation of a benchmarking system focused on several application-oriented metrics.
  • Domain Data Contribution:

    • Seeking collaboration with domain developers to establish "LAM-ready" datasets for pre-training and evaluation.
    • Domain datasets for iterative training of the latest models will be updated every three months.
  • Domain Application and Evaluation Workflow Contribution:

    • The domain application and evaluation workflows will be updated and released every three months.
  • Education and Training:

    • Planning a series of educational and training events aimed at LAM developers, domain developers, and users to encourage advancement in the field.
  • How to Contact Us:

    • Direct discussions are encouraged in the DeepModeling community.
    • For more complex inquiries, please contact the project leads, Han Wang (王涵, wang_han@iapcm.ac.cn) and Linfeng Zhang (张林峰, zhanglf@aisi.ac.cn). Here's to a new future of science!

Lecture 1: Deep Potential Method for Molecular Simulation, Roberto Car

Lecture 2: Deep Potential at Scale, Linfeng Zhang

Lecture 3: Towards a Realistic Description of H3O+ and OH- Transport, Robert A. DiStasio Jr.

Lecture 4: Next Generation Quantum and Deep Learning Potentials, Darrin York

Lecture 5: Linear Response Theory of Transport in Condensed Matter, Stefano Baroni

Lecture 6: Deep Modeling with Long-Range Electrostatic Interactions, Chunyi Zhang

Hands-on session 4: Machine learning of Wannier centers and dipoles

Hands-on session 5: Long range electrostatic interactions with DPLR

Hands-on session 6: Concurrent learning with DP-GEN

Are you prepared to read a long article just to install a piece of software? Just as we can teach you how to set up a DeePMD-kit training run in 5 minutes, we can also teach you how to install DeePMD-kit in 5 minutes. The installation methods are introduced below:

Install with conda

After you install conda, you can install the CPU version with the following command:

conda install deepmd-kit=*=*cpu lammps-dp=*=*cpu -c deepmodeling

To install the GPU version containing CUDA 10.1:

conda install deepmd-kit=*=*gpu lammps-dp=*=*gpu -c deepmodeling

If you want a specific version, just replace * with the version number:

conda install deepmd-kit=1.3.3=*cpu lammps-dp=1.3.3=*cpu -c deepmodeling
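To confirm which version actually got installed, a quick standard-library check works without launching dp itself. The distribution name "deepmd-kit" is an assumption here; it may differ depending on how the package was built.

```python
from importlib import metadata
from typing import Optional

def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# "deepmd-kit" as the distribution name is an assumption.
print(installed_version("deepmd-kit") or "deepmd-kit not found")
```

Returning None instead of raising makes the check safe to run before or after installation.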

Install with offline packages

Download offline packages from the Releases page, or use wget:

wget https://github.com/deepmodeling/deepmd-kit/releases/download/v1.3.3/deepmd-kit-1.3.3-cuda10.1_gpu-Linux-x86_64.sh -O deepmd-kit-1.3.3-cuda10.1_gpu-Linux-x86_64.sh

Taking v1.3.3 as an example, execute the following command and follow the prompts.

sh deepmd-kit-1.3.3-cuda10.1_gpu-Linux-x86_64.sh

With Docker

To pull the CPU version:

docker pull ghcr.io/deepmodeling/deepmd-kit:1.2.2_cpu

To pull the GPU version:

docker pull ghcr.io/deepmodeling/deepmd-kit:1.2.2_cuda10.1_gpu

Tips

dp is the DeePMD-kit executable and lmp is the LAMMPS executable. You can check their command-line options with:

dp -h
lmp -h

The GPU version bundles the CUDA Toolkit. Note that each CUDA version supports only certain NVIDIA driver versions; see the NVIDIA documentation for details.
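A defensive way to see which driver is present before picking a GPU build is to query nvidia-smi and fall back gracefully when no driver is installed. This is a convenience sketch, not part of DeePMD-kit; it assumes nvidia-smi is on PATH whenever a driver exists.

```python
import shutil
import subprocess
from typing import Optional

def nvidia_driver_version() -> Optional[str]:
    """Return the NVIDIA driver version string, or None if no driver is found."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    return out.stdout.strip() or None if out.returncode == 0 else None

version = nvidia_driver_version()
print(version or "No NVIDIA driver detected; use the CPU build.")
```

If this prints a driver version, cross-check it against NVIDIA's CUDA compatibility tables before installing a CUDA-specific package.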

Hurry up and try this convenient installation process! One reminder, though: the methods above only install official DeePMD-kit releases. If you need the devel version, you still have to go through the longer compilation process; please refer to the installation manual.

DeePMD-kit is a software package that implements Deep Potential. There is a lot of information about it on the Internet, but few tutorials for beginners, and the official guide is long. Today, we'll get you started with DeePMD-kit in 5 minutes.

Let's take a look at the training process of DeePMD-kit:

Prepare data → Training → Freeze the model

What? Only three steps? Yes, it's that simple.
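The three steps above can be sketched concretely. The snippet below prepares a toy dataset in DeePMD-kit's raw text format (step 1); the numbers are placeholders, not real physics, and the dp commands in the comments are the usual invocations for steps 2 and 3 (exact flags may vary by version).

```python
import os

# Step 1, "Prepare data": a single frame of a toy 2-atom system
# in DeePMD-kit's raw text format. All numbers are placeholders.
os.makedirs("data/toy", exist_ok=True)

def write(name: str, text: str) -> None:
    with open(os.path.join("data/toy", name), "w") as f:
        f.write(text)

write("type.raw", "0 1\n")                         # per-atom type indices
write("type_map.raw", "O\nH\n")                    # type index -> element name
write("coord.raw", "0.0 0.0 0.0  0.0 0.0 0.97\n")  # flattened coordinates (Angstrom)
write("box.raw", "10 0 0  0 10 0  0 0 10\n")       # 3x3 cell, row-major
write("energy.raw", "-467.3\n")                    # one energy per frame
write("force.raw", "0 0 0.1  0 0 -0.1\n")          # flattened per-atom forces

# Steps 2 and 3 are then single commands (flags may differ by version):
#   dp train input.json        # Training
#   dp freeze -o graph.pb      # Freeze the model for use in LAMMPS etc.
```

Each .raw file holds one row per frame, so appending more frames to the same files grows the training set without changing the layout.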

Read more »

The integration of machine learning and physical modeling is changing the paradigm of scientific research. Those who hope to extend the frontier of science and solve challenging practical problems through computational modeling are coming together in ways never seen before. This calls for a new infrastructure: new platforms for collaboration, new coding frameworks, new data processing schemes, and new ways of using computing power. It also calls for a new culture: a culture of working together closely for the benefit of all, of freely exchanging and sharing knowledge and tools, of respecting and appreciating each other's work, and of pursuing harmony in diversity.

The DeepModeling community is a community of such a group of people.

Read more »