吳恩達神經網絡1-2-2
預測毒性 (Predicting Toxicity)
相關資料 (Related Material)
Jupyter Notebook for the article
Jupyter Notebook的文章
Drug Discovery with Graph Neural Networks — part 1
圖神經網絡進行藥物發現-第1部分
Introduction to Cheminformatics
化學信息學導論
Deep learning on graphs: successes, challenges, and next steps (article by prof Michael Bronstein)
圖上的深度學習:成功,挑戰和下一步 (邁克爾·布朗斯坦教授的文章)
Towards Explainable Graph Neural Networks
走向可解釋的圖形神經網絡
目錄 (Table of Contents)
- Introduction 介紹
- Approaching the Problem with Graph Neural Networks 圖神經網絡解決問題
- Hands-on Part with Deepchem Deepchem的動手部分
- About Me 關于我
介紹 (Introduction)
In this article, we will cover another crucial factor that determines whether the drug can pass safety tests — toxicity. In fact, the toxicity accounts for 30% of rejected drug candidates making it one of the most important factors to consider during the drug development stage [1]. Machine learning will prove here very beneficial as it can filter out toxic drug candidates in the early stage of the drug discovery process.
在本文中,我們將介紹另一個決定藥物是否可以通過安全性測試的關鍵因素- 毒性 。 實際上,毒性占被拒絕藥物候選者的30%,這使其成為藥物開發階段要考慮的最重要因素之一[1]。 機器學習在這里將被證明是非常有益的,因為它可以在藥物發現過程的早期篩選出有毒的候選藥物。
I will assume that you’ve read my previous article which explains some topics and terms that I will be using in this article :) Let’s get started!
我假設您已經閱讀了上一篇文章 ,其中解釋了本文中將使用的一些主題和術語:)讓我們開始吧!
圖神經網絡解決問題 (Approaching the Problem with Graph Neural Networks)
The feature engineering part is pretty much the same as in part 1 of the series. To convert molecular structure into an input for GNNs, we can create molecular fingerprints, or feed it into graph neural network using adjacency matrix and feature vectors. This features can be automatically generated by external software such as RDKit or Deepchem so we don’t have to worry much about it.
特征工程部分與本系列的第1部分幾乎相同。 要將分子結構轉換為GNN的輸入,我們可以創建分子指紋,或使用鄰接矩陣和特征向量將其輸入到圖神經網絡中。 此功能可由RDKit或Deepchem等外部軟件自動生成,因此我們不必為此擔心。
毒性 (Toxicity)
The biggest difference is in the machine learning task itself. Toxicity prediction is a classification task, in contrary to the solubility prediction which is a regression task as we might recall from the previous article. There are many different toxicity effects such as carcinogenicity, respiratory toxicity, irritation/corrosion, and others [2]. This makes it a slightly more complicated challenge to work with as we might have to cope also with the imbalanced classes.
最大的區別在于機器學習任務本身。 毒性預測是分類任務,與溶解度預測相反,溶解度預測是回歸任務,正如我們可能從上一篇文章中回憶的那樣。 有許多不同的毒性作用,例如致癌性,呼吸毒性,刺激/腐蝕等[2]。 這使工作變得更加復雜,因為我們可能還必須應對不平衡的班級。
Fortunately, the toxicity datasets are often considerably bigger than the solubility counterparts. For example, the Tox21 dataset has ~12k training samples when the Delaney dataset used for solubility prediction has only ~3k training samples. This makes neural networks architectures a more promising approach to use as it can capture more hidden information.
幸運的是,毒性數據集通常比溶解度對應數據大得多。 例如,當用于溶解度預測的Delaney數據集只有約3k訓練樣本時,Tox21數據集具有約1.2萬訓練樣本。 這使得神經網絡體系結構可以捕獲更多隱藏信息,因此成為一種更有希望的方法。
Tox21數據集 (Tox21 Dataset)
Tox21 dataset was created as a project challenging researchers to develop machine learning models that achieve the highest performance on the given data. It contains 12 distinct labels and each indicates a different toxicity effect. Overall, the dataset has 12,060 training samples and 647 test samples.
Tox21數據集是作為一個項目而創建的,該項目挑戰研究人員開發可在給定數據上實現最高性能的機器學習模型。 它包含12個不同的標簽,每個標簽都表示不同的毒性作用。 總體而言,數據集包含12,060個訓練樣本和647個測試樣本。
The winning approach for this challenge was DeepTox [3] which is a deep learning pipeline that utilizes chemical descriptors to predict the toxicity classes. It highly suggests that deep learning is the most effective approach and that graph neural networks have potential to achieve even higher performance.
應對這一挑戰的成功方法是DeepTox [3],它是一種深度學習管道,利用化學描述符來預測毒性等級。 它強烈表明深度學習是最有效的方法,并且圖神經網絡有潛力獲得更高的性能。
Deepchem的動手部分 (Hands-on Part with Deepchem)
Colab notebook that you can run by yourself is here.
您可以自己運行的Colab筆記本在這里。
Firstly, we import the necessary libraries. Nothing new here — we will be using Deepchem to train a GNN model on Tox21 data. The GraphConvModel is an architecture that was created by Duvenaud, et al. It uses a modified version of fingerprint algorithms to make them differentiable (so we can do a gradient update). It is one of the first GNN architectures that were designed to handle molecular structures as graphs.
首先,我們導入必要的庫。 這里沒什么新鮮的 -我們將使用Deepchem在Tox21數據上訓練GNN模型。 GraphConvModel是Duvenaud等人創建的架構。 它使用指紋算法的修改版本以使其具有差異性(因此我們可以進行梯度更新)。 它是最早設計用于將分子結構作為圖形處理的GNN架構之一。
# Importing required libraries and its utilities
import numpy as npnp.random.seed(123)
import tensorflow as tftf.random.set_seed(123)
import deepchem as dc
from deepchem.molnet import load_tox21
from deepchem.models.graph_models import GraphConvModel
Deepchem contains a convenient API to load the Tox21 for us with .load_tox21 function. We choose a featurizer as GraphConv — it will create chemical descriptors (i.e. features) to match the input requirements for our model. As this is a classification task, ROC AUC score will be used as a metric.
Deepchem包含一個方便的API,可使用來為我們加載Tox21。 load_tox21函數。 我們選擇一個特征化器作為GraphConv —它會創建化學描述符(即特征)以匹配模型的輸入要求。 由于這是分類任務,因此ROC AUC得分將用作度量。
# Tox21 is a part of Deepchem library
# so we can convieniently download it using load_tox21 function
tox21_tasks, tox21_datasets, transformers = load_tox21(featurizer='GraphConv')
train_dataset, valid_dataset, test_dataset = tox21_datasets# Define metric for the model
metric = dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean, mode="classification")
The beauty of the Deepchem is that the models use Keras-like API. We can train the model with .fit function. We pass len(tox21_tasks) into model’s arguments, which is a number of labels (12 in this case). This will set the output size of the final layer as 12. We use a batch size of 32 to speed up the computation time and to specify that the model is used for the classification task. The model takes several minutes to train on the Google Colab notebooks.
Deepchem的優點在于模型使用類似Keras的API。 我們可以用訓練模型。 擬合函數。 我們將len(tox21_tasks)傳遞給模型的arguments ,它是許多標簽(在這種情況下為12)。 這會將最終層的輸出大小設置為12。我們使用32的批處理大小來加快計算時間,并指定將模型用于分類任務。 該模型需要幾分鐘才能在Google Colab筆記本上進行訓練。
# Define and fit the model
model = GraphConvModel(len(tox21_tasks), batch_size=32, mode='classification')
print("Fitting the model")
model.fit(train_dataset, nb_epoch=10)
After the training is complete, we can evaluate the model. Nothing difficult here again— we can still use the Keras API for that part. The ROC AUC scores are obtained with the .evaluate function.
訓練完成后,我們可以評估模型。 在這里沒什么困難的-我們仍然可以使用Keras API進行該部分。 ROC AUC得分是通過.evaluate函數獲得的。
print("Evaluating model with ROC AUC")
train_scores = model.evaluate(train_dataset, [metric], transformers)
valid_scores = model.evaluate(valid_dataset, [metric], transformers)print("Train scores")
print(train_scores)print("Validation scores")
print(valid_scores)
In my case, the train ROC AUC score was higher than the validation ROC AUC score. This might indicate that model is overfitting to some molecules.
在我的案例中,火車的ROC AUC分數高于驗證的ROC AUC分數。 這可能表明模型對某些分子過度擬合。

You can do much more with Deepchem that. It contains several different GNN models that are as easy to use as in this tutorial. I highly suggest looking at their tutorials. For the toxicity task, they have gathered several different examples that run with different models. You can find it here.
利用Deepchem,您可以做更多的事情。 它包含幾種不同的GNN模型,這些模型與本教程一樣易于使用。 我強烈建議您看一下他們的教程。 對于毒性任務,他們收集了使用不同模型運行的幾個不同示例。 你可以在這里找到它。
Thank you for reading the article, I hope it was useful for you!
感謝您閱讀本文,希望對您有所幫助!
關于我 (About Me)
I am an MSc Artificial Intelligence student at the University of Amsterdam. In my spare time, you can find me fiddling with data or debugging my deep learning model (I swear it worked!). I also like hiking :)
我是阿姆斯特丹大學的人工智能碩士研究生。 在業余時間,您會發現我不喜歡數據或調試我的深度學習模型(我發誓它能工作!)。 我也喜歡遠足:)
Here are my social media profiles, if you want to stay in touch with my latest articles and other useful content:
如果您想與我的最新文章和其他有用內容保持聯系,這是我的社交媒體個人資料:
Medium
中
Linkedin
領英
Github
Github
Personal Website
個人網站
翻譯自: https://towardsdatascience.com/drug-discovery-with-graph-neural-networks-part-2-b1b8d60180c4
吳恩達神經網絡1-2-2
本文來自互聯網用戶投稿,該文觀點僅代表作者本人,不代表本站立場。本站僅提供信息存儲空間服務,不擁有所有權,不承擔相關法律責任。 如若轉載,請注明出處:http://www.pswp.cn/news/391562.shtml 繁體地址,請注明出處:http://hk.pswp.cn/news/391562.shtml 英文地址,請注明出處:http://en.pswp.cn/news/391562.shtml
如若內容造成侵權/違法違規/事實不符,請聯系多彩編程網進行投訴反饋email:809451989@qq.com,一經查實,立即刪除!