TorchMD-NET: an open-source program for training neural network potentials

1. Software introduction

The program and source code downloads are provided at the end of the article.

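TorchMD-NET provides state-of-the-art neural network potentials (NNPs) and a mechanism to train them. It offers efficient and fast implementations of several NNPs and is integrated into GPU-accelerated molecular dynamics codes such as ACEMD, OpenMM and TorchMD. TorchMD-NET exposes its NNPs as PyTorch modules.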

2. Available architectures

Equivariant Transformer (ET)
Transformer (T)
Graph Neural Network (GN)
TensorNet

3. Installation

TorchMD-Net is available as a pip-installable wheel and on conda-forge.

TorchMD-Net provides builds for CPU-only, CUDA 11.8 and CUDA 12.4. The CPU builds are provided only as a reference, since their performance is extremely limited. Depending on which variant you want, install it with one of the following commands:

# The following will install the CUDA 12.4 version by default
pip install torchmd-net 
# The following will install the CUDA 11.8 version
pip install torchmd-net --extra-index-url https://download.pytorch.org/whl/cu118 --extra-index-url https://us-central1-python.pkg.dev/pypi-packages-455608/cu118/simple
# The following will install the CUDA 12.4 version
pip install torchmd-net --extra-index-url https://download.pytorch.org/whl/cu124 --extra-index-url https://us-central1-python.pkg.dev/pypi-packages-455608/cu124/simple
# The following will install the CPU only version (not recommended)
pip install torchmd-net --extra-index-url https://download.pytorch.org/whl/cpu --extra-index-url https://us-central1-python.pkg.dev/pypi-packages-455608/cpu/simple

Alternatively, it can be installed with conda or mamba using one of the following commands. We recommend using Miniforge instead of Anaconda.

mamba install torchmd-net cuda-version=11.8
mamba install torchmd-net cuda-version=12.4
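
After installing, it can be useful to confirm that the package is importable before launching a training run. The following is a minimal sketch using only the Python standard library. The helper name is_installed is hypothetical, and torchmdnet is assumed to be the import name, as in torchmdnet.datasets.Custom.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "torchmdnet" is the assumed import name of the installed package.
    for name in ("torch", "torchmdnet"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

This only checks that the modules can be found. It does not verify that the CUDA variant matches the local driver.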

Install from source

TorchMD-Net is installed using pip, but you will need to install some dependencies first. Check this documentation page.

4. Usage

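Training arguments can be specified either in a YAML configuration file or directly as command-line arguments. Several examples of architecture and training specifications for some models and datasets can be found in examples/. Note that if a parameter is present in both the YAML file and the command line, the command-line value takes precedence. GPUs can be selected by setting the CUDA_VISIBLE_DEVICES environment variable. Otherwise, the argument --ngpus can be used to select the number of GPUs to train on (the default, -1, uses all available GPUs or the ones specified in CUDA_VISIBLE_DEVICES). Keep in mind that the GPU ID reported by nvidia-smi might not be the same as the one CUDA_VISIBLE_DEVICES uses.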
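
The GPU numbering remark can be illustrated with a small sketch. Inside a process, the devices listed in CUDA_VISIBLE_DEVICES are renumbered from 0 in the order they are listed. The function name below is hypothetical and nothing here calls CUDA; it only shows the renumbering.

```python
from typing import Dict


def logical_to_listed(visible_devices: str) -> Dict[int, str]:
    """Map in-process device indices to the entries of CUDA_VISIBLE_DEVICES.

    The process sees the listed devices as 0, 1, 2, ... in listing order.
    An empty string means no device is visible.
    """
    entries = [e.strip() for e in visible_devices.split(",") if e.strip()]
    return dict(enumerate(entries))


if __name__ == "__main__":
    # With CUDA_VISIBLE_DEVICES=2,3 the process addresses them as devices 0 and 1.
    for logical, listed in logical_to_listed("2,3").items():
        print(f"in-process device {logical} -> listed entry {listed}")
```

The listed entries follow CUDA's own enumeration, which by default may differ from the PCI bus order shown by nvidia-smi. Setting CUDA_DEVICE_ORDER=PCI_BUS_ID is a common way to make the two agree.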
For example, to train the Equivariant Transformer on the QM9 dataset with the architectural and training hyperparameters described in the paper, run:

mkdir output
CUDA_VISIBLE_DEVICES=0 torchmd-train --conf torchmd-net/examples/ET-QM9.yaml --log-dir output/

Run torchmd-train --help to see all available options and their descriptions.
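
The rule that command-line values override the YAML file can be sketched as follows. This is an illustration of the precedence rule only, not the actual torchmd-train implementation. The function resolve_args is hypothetical, the two options are chosen just for the example, and a plain dict stands in for the parsed YAML file.

```python
import argparse
from typing import Any, Dict, List


def resolve_args(yaml_values: Dict[str, Any], argv: List[str]) -> Dict[str, Any]:
    """Merge YAML values with command-line values; the command line wins."""
    parser = argparse.ArgumentParser()
    # default=None distinguishes "not given on the command line" from a real value.
    parser.add_argument("--lr", type=float, default=None)
    parser.add_argument("--batch-size", type=int, default=None)
    cli = vars(parser.parse_args(argv))

    merged = dict(yaml_values)
    for key, value in cli.items():
        if value is not None:
            merged[key] = value
    return merged


if __name__ == "__main__":
    from_yaml = {"lr": 1e-3, "batch_size": 32}  # stand-in for a parsed YAML file
    print(resolve_args(from_yaml, ["--lr", "0.0004"]))
```

Here lr comes from the command line, while batch_size keeps its YAML value because it was not passed explicitly.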

Creating a new dataset

If you want to train on custom data, first have a look at torchmdnet.datasets.Custom, which provides functionality for loading a NumPy dataset consisting of atom types and coordinates, with energies, forces or both as the labels. Alternatively, you can implement a custom class following the torch-geometric way of implementing a dataset. That is, derive from the Dataset or InMemoryDataset class and implement the necessary functions (more info here). The dataset must return torch-geometric Data objects containing at least the keys z (atom types) and pos (atomic coordinates), as well as y (label), neg_dy (negative derivative of the label with respect to the atomic coordinates) or both.
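
The key requirements above can be checked with a short sketch. A plain dict with nested lists stands in for a torch-geometric Data object, so nothing here depends on torch. The function check_sample is hypothetical, the numbers are made-up toy values, and the shape check on neg_dy assumes a scalar label, in which case its derivative has the same shape as pos.

```python
from typing import Any, Mapping


def check_sample(sample: Mapping[str, Any]) -> None:
    """Raise ValueError if a sample lacks the fields a dataset must provide."""
    for key in ("z", "pos"):
        if key not in sample:
            raise ValueError(f"missing required key: {key}")
    if "y" not in sample and "neg_dy" not in sample:
        raise ValueError("need at least one label: 'y' or 'neg_dy'")

    n_atoms = len(sample["z"])
    pos = sample["pos"]
    if len(pos) != n_atoms or any(len(p) != 3 for p in pos):
        raise ValueError("pos must have shape (n_atoms, 3)")

    if "neg_dy" in sample:
        neg_dy = sample["neg_dy"]
        if len(neg_dy) != n_atoms or any(len(f) != 3 for f in neg_dy):
            raise ValueError("neg_dy must have shape (n_atoms, 3)")


if __name__ == "__main__":
    toy = {
        "z": [8, 1, 1],  # atom types
        "pos": [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
        "y": [0.0],  # placeholder label
    }
    check_sample(toy)
    print("sample has the required keys and shapes")
```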

Custom prior models

In addition to implementing a custom dataset class, it is also possible to add a custom prior model to the model. This can be done by implementing a new prior model class in torchmdnet.priors and adding the argument --prior-model <PriorModelName>. As an example, have a look at torchmdnet.priors.Atomref.
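
As a conceptual illustration, an atomic-reference prior can be thought of as adding a fixed, per-element reference value to each atom's predicted contribution. The sketch below shows only that idea in plain Python. It does not use the torchmdnet.priors base class, whose interface is not shown here, and the function name and numbers are hypothetical.

```python
from typing import Dict, List


def add_atomref(per_atom_pred: List[float], z: List[int], atomref: Dict[int, float]) -> List[float]:
    """Add a fixed per-element reference value to each atom's prediction."""
    return [pred + atomref[number] for pred, number in zip(per_atom_pred, z)]


if __name__ == "__main__":
    # Made-up reference values, keyed by atomic number.
    reference = {1: -1.0, 8: -10.0}
    shifted = add_atomref([0.5, 0.25, 0.25], [8, 1, 1], reference)
    print(sum(shifted))  # molecular value after adding the prior
```

With such a prior, the network only has to learn the remainder on top of the per-element offsets.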

Multi-Node Training

To train models on multiple nodes, some environment variables have to be set that provide all necessary information to PyTorch Lightning. The following example bash script starts training on two machines with two GPUs each. The script has to be started once on each node. Once torchmd-train is started on all nodes, a network connection between the nodes will be established using NCCL.

In addition to the environment variables, the argument --num-nodes has to be specified with the number of nodes involved in training.

export NODE_RANK=0
export MASTER_ADDR=hostname1
export MASTER_PORT=12910

mkdir -p output
CUDA_VISIBLE_DEVICES=0,1 torchmd-train --conf torchmd-net/examples/ET-QM9.yaml --num-nodes 2 --log-dir output/

NODE_RANK: Integer indicating the node index. Must be 0 for the main node and incremented by one for each additional node.
MASTER_ADDR: Hostname or IP address of the main node. Must be the same on all involved nodes.
MASTER_PORT: A free network port for communication between nodes. PyTorch Lightning suggests port 12910 as a default.
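
The per-node settings can be sketched in Python. The helpers below are hypothetical and do not start any training; they build the environment each node would export. They also show the resulting process layout under the usual distributed data parallel convention of one process per GPU with consecutive global ranks on each node, which is an assumption here.

```python
from typing import Dict, List


def node_env(node_rank: int, master_addr: str, master_port: int = 12910) -> Dict[str, str]:
    """Environment variables to export on one node before starting torchmd-train."""
    return {
        "NODE_RANK": str(node_rank),
        "MASTER_ADDR": master_addr,
        "MASTER_PORT": str(master_port),
    }


def global_ranks(node_rank: int, num_nodes: int, gpus_per_node: int) -> List[int]:
    """Global process ranks hosted by a node, assuming one process per GPU."""
    if not 0 <= node_rank < num_nodes:
        raise ValueError("node_rank must be in [0, num_nodes)")
    first = node_rank * gpus_per_node
    return list(range(first, first + gpus_per_node))


if __name__ == "__main__":
    # Two machines with two GPUs each, as in the bash example.
    for rank in range(2):
        print(node_env(rank, "hostname1"), global_ranks(rank, num_nodes=2, gpus_per_node=2))
```

Only NODE_RANK changes between nodes; MASTER_ADDR and MASTER_PORT stay identical everywhere.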

Known Limitations

Due to the way PyTorch Lightning calculates the number of required DDP processes, all nodes must use the same number of GPUs. Otherwise training will not start or will crash.
We observe a 50x decrease in performance when mixing nodes with different GPU architectures (tested with RTX 2080 Ti and RTX 3090).
Some CUDA systems might hang during multi-GPU parallel training. Try export NCCL_P2P_DISABLE=1, which disables direct peer-to-peer GPU communication.
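
The first limitation can be caught before launching with a small preflight check. The sketch below takes the CUDA_VISIBLE_DEVICES string planned for each node and verifies that every node exposes the same number of GPUs. The function names and the host name hostname2 are hypothetical.

```python
from typing import Dict


def gpu_count(visible_devices: str) -> int:
    """Number of devices listed in a CUDA_VISIBLE_DEVICES string."""
    return len([d for d in visible_devices.split(",") if d.strip()])


def check_equal_gpus(per_node: Dict[str, str]) -> int:
    """Return the common GPU count, or raise ValueError if the nodes differ."""
    counts = {host: gpu_count(devices) for host, devices in per_node.items()}
    if len(set(counts.values())) != 1:
        raise ValueError(f"nodes must use the same number of GPUs, got {counts}")
    return next(iter(counts.values()))


if __name__ == "__main__":
    plan = {"hostname1": "0,1", "hostname2": "2,3"}
    print("GPUs per node:", check_equal_gpus(plan))
```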

Cite

If you use TorchMD-NET in your research, please cite the following papers:

Main reference

@misc{pelaez2024torchmdnet,
  title={TorchMD-Net 2.0: Fast Neural Network Potentials for Molecular Simulations},
  author={Raul P. Pelaez and Guillem Simeon and Raimondas Galvelis and Antonio Mirarchi and Peter Eastman and Stefan Doerr and Philipp Th{\"o}lke and Thomas E. Markland and Gianni De Fabritiis},
  year={2024},
  eprint={2402.17660},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

TensorNet

@inproceedings{simeon2023tensornet,
  title={TensorNet: Cartesian Tensor Representations for Efficient Learning of Molecular Potentials},
  author={Guillem Simeon and Gianni De Fabritiis},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023},
  url={https://openreview.net/forum?id=BEHlPdBZ2e}
}

Equivariant Transformer

@inproceedings{tholke2021equivariant,
  title={Equivariant Transformers for Neural Network based Molecular Potentials},
  author={Philipp Th{\"o}lke and Gianni De Fabritiis},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=zNHzqZ9wrRB}
}

Graph Network

@article{Majewski2023,
  title = {Machine learning coarse-grained potentials of protein thermodynamics},
  volume = {14},
  ISSN = {2041-1723},
  url = {http://dx.doi.org/10.1038/s41467-023-41343-1},
  DOI = {10.1038/s41467-023-41343-1},
  number = {1},
  journal = {Nature Communications},
  publisher = {Springer Science and Business Media LLC},
  author = {Majewski, Maciej and Pérez, Adrià and Th\"{o}lke, Philipp and Doerr, Stefan and Charron, Nicholas E. and Giorgino, Toni and Husic, Brooke E. and Clementi, Cecilia and Noé, Frank and De Fabritiis, Gianni},
  year = {2023},
  month = sep
}

Developer guide

Implementing a new architecture

To implement a new architecture, follow these steps:
1. Create a new class in torchmdnet.models that inherits from torch.nn.Module. Use TorchMD_ET as a template. The following is a minimal implementation of a model:

from typing import Optional, Tuple

from torch import Tensor, nn


class MyModule(nn.Module):
    def __init__(self, parameter1, parameter2):
        super(MyModule, self).__init__()
        # Define your model here
        self.layer1 = nn.Linear(10, 10)
        ...
        # Initialize your model parameters here
        self.reset_parameters()

    def reset_parameters(self):
        # Initialize your model parameters here
        nn.init.xavier_uniform_(self.layer1.weight)
        ...

    def forward(
        self,
        z: Tensor,  # Atomic numbers, shape (n_atoms, 1)
        pos: Tensor,  # Atomic positions, shape (n_atoms, 3)
        batch: Tensor,  # Batch vector, shape (n_atoms, 1). All atoms in the same molecule have the same value and are contiguous.
        q: Optional[Tensor] = None,  # Atomic charges, shape (n_atoms, 1)
        s: Optional[Tensor] = None,  # Atomic spins, shape (n_atoms, 1)
    ) -> Tuple[Tensor, Tensor, Tensor, Tensor, Tensor]:
        # Define your forward pass here
        scalar_features = ...
        vector_features = ...
        # Return the scalar and vector features, as well as the atomic numbers, positions and batch vector
        return scalar_features, vector_features, z, pos, batch

2. Add the model to the __all__ list in torchmdnet.models.__init__.py. This will make the tests pick up your model.
3. Tell models.model.create_model how to initialize your module by adding a new entry, for instance:

    elif args["model"] == "mymodule":
        from torchmdnet.models.torchmd_mymodule import MyModule

        is_equivariant = False  # Set to True if your model is equivariant
        representation_model = MyModule(
            parameter1=args["parameter1"],
            parameter2=args["parameter2"],
            **shared_args,  # Arguments typically shared by all models
        )

4. Add any new parameters required to initialize your module to scripts.train.get_args. For instance:

  parser.add_argument('--parameter1', type=int, default=32, help='Parameter1 required by MyModule')
  ...

5. Add an example configuration file to torchmd-net/examples that uses your model.
6. Make the tests use your configuration file by adding a case to tests.utils.load_example_args. For instance:

if model_name == "mymodule":
    config_file = join(dirname(dirname(__file__)), "examples", "MyModule-QM9.yaml")

At this point, if your module is missing some feature, the tests will let you know and you can add it. If you add a new feature to the package, please add a test for it.
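
How steps 3 and 4 fit together can be sketched in a self-contained way. The code below is a toy stand-in, not the real torchmdnet code. Its get_args and create_model only mirror the names used above, MyModule is a plain class instead of an nn.Module, and the default for --parameter2 is invented for the example.

```python
import argparse
from typing import Any, Dict, List


class MyModule:
    """Stand-in for the real nn.Module subclass; it only stores its parameters."""

    def __init__(self, parameter1: int, parameter2: int) -> None:
        self.parameter1 = parameter1
        self.parameter2 = parameter2


def get_args(argv: List[str]) -> Dict[str, Any]:
    """Toy version of step 4: declare the parameters the module needs."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, default="mymodule")
    parser.add_argument("--parameter1", type=int, default=32, help="Parameter1 required by MyModule")
    parser.add_argument("--parameter2", type=int, default=8, help="Parameter2 required by MyModule")
    return vars(parser.parse_args(argv))


def create_model(args: Dict[str, Any]) -> MyModule:
    """Toy version of step 3: dispatch on the model name and build the module."""
    if args["model"] == "mymodule":
        return MyModule(parameter1=args["parameter1"], parameter2=args["parameter2"])
    raise ValueError(f"Unknown model: {args['model']}")


if __name__ == "__main__":
    model = create_model(get_args(["--parameter1", "64"]))
    print(model.parameter1, model.parameter2)
```

The point of the pattern is that every constructor argument of the new module has a matching command-line option, so the same values can also come from a YAML configuration file.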

Code style

We use black. Please run black on your modified files before committing.

Testing

To run the tests, install the package and run pytest in the root directory of the repository. The tests are also a good source of knowledge on how to use the different components of the package.