吳恩達神經網絡1-2-2_圖神經網絡進行藥物發現-第2部分

吳恩達神經網絡1-2-2

預測毒性 (Predicting Toxicity)

相關資料 (Related Material)

  • Jupyter Notebook for the article

    Jupyter Notebook的文章

  • Drug Discovery with Graph Neural Networks — part 1

    圖神經網絡進行藥物發現-第1部分

  • Introduction to Cheminformatics

    化學信息學導論

  • Deep learning on graphs: successes, challenges, and next steps (article by prof Michael Bronstein)

    圖上的深度學習:成功,挑戰和下一步 (邁克爾·布朗斯坦教授的文章)

  • Towards Explainable Graph Neural Networks

    走向可解釋的圖形神經網絡

目錄 (Table of Contents)

  • Introduction

    介紹
  • Approaching the Problem with Graph Neural Networks

    圖神經網絡解決問題
  • Hands-on Part with Deepchem

    Deepchem的動手部分
  • About Me

    關于我

介紹 (Introduction)

In this article, we will cover another crucial factor that determines whether the drug can pass safety tests — toxicity. In fact, the toxicity accounts for 30% of rejected drug candidates making it one of the most important factors to consider during the drug development stage [1]. Machine learning will prove here very beneficial as it can filter out toxic drug candidates in the early stage of the drug discovery process.

在本文中,我們將介紹另一個決定藥物是否可以通過安全性測試的關鍵因素- 毒性 。 實際上,毒性占被拒絕藥物候選者的30%,這使其成為藥物開發階段要考慮的最重要因素之一[1]。 機器學習在這里將被證明是非常有益的,因為它可以在藥物發現過程的早期篩選出有毒的候選藥物。

I will assume that you’ve read my previous article which explains some topics and terms that I will be using in this article :) Let’s get started!

我假設您已經閱讀了上一篇文章 ,其中解釋了本文中將使用的一些主題和術語:)讓我們開始吧!

圖神經網絡解決問題 (Approaching the Problem with Graph Neural Networks)

The feature engineering part is pretty much the same as in part 1 of the series. To convert molecular structure into an input for GNNs, we can create molecular fingerprints, or feed it into graph neural network using adjacency matrix and feature vectors. This features can be automatically generated by external software such as RDKit or Deepchem so we don’t have to worry much about it.

特征工程部分與本系列的第1部分幾乎相同。 要將分子結構轉換為GNN的輸入,我們可以創建分子指紋,或使用鄰接矩陣和特征向量將其輸入到圖神經網絡中。 此功能可由RDKit或Deepchem等外部軟件自動生成,因此我們不必為此擔心。

毒性 (Toxicity)

The biggest difference is in the machine learning task itself. Toxicity prediction is a classification task, in contrary to the solubility prediction which is a regression task as we might recall from the previous article. There are many different toxicity effects such as carcinogenicity, respiratory toxicity, irritation/corrosion, and others [2]. This makes it a slightly more complicated challenge to work with as we might have to cope also with the imbalanced classes.

最大的區別在于機器學習任務本身。 毒性預測是分類任務,與溶解度預測相反,溶解度預測是回歸任務,正如我們可能從上一篇文章中回憶的那樣。 有許多不同的毒性作用,例如致癌性,呼吸毒性,刺激/腐蝕等[2]。 這使工作變得更加復雜,因為我們可能還必須應對不平衡的班級。

Fortunately, the toxicity datasets are often considerably bigger than the solubility counterparts. For example, the Tox21 dataset has ~12k training samples when the Delaney dataset used for solubility prediction has only ~3k training samples. This makes neural networks architectures a more promising approach to use as it can capture more hidden information.

幸運的是,毒性數據集通常比溶解度對應數據大得多。 例如,當用于溶解度預測的Delaney數據集只有約3k訓練樣本時,Tox21數據集具有約1.2萬訓練樣本。 這使得神經網絡體系結構可以捕獲更多隱藏信息,因此成為一種更有希望的方法。

Tox21數據集 (Tox21 Dataset)

Tox21 dataset was created as a project challenging researchers to develop machine learning models that achieve the highest performance on the given data. It contains 12 distinct labels and each indicates a different toxicity effect. Overall, the dataset has 12,060 training samples and 647 test samples.

Tox21數據集是作為一個項目而創建的,該項目挑戰研究人員開發可在給定數據上實現最高性能的機器學習模型。 它包含12個不同的標簽,每個標簽都表示不同的毒性作用。 總體而言,數據集包含12,060個訓練樣本和647個測試樣本。

The winning approach for this challenge was DeepTox [3] which is a deep learning pipeline that utilizes chemical descriptors to predict the toxicity classes. It highly suggests that deep learning is the most effective approach and that graph neural networks have potential to achieve even higher performance.

應對這一挑戰的成功方法是DeepTox [3],它是一種深度學習管道,利用化學描述符來預測毒性等級。 它強烈表明深度學習是最有效的方法,并且圖神經網絡有潛力獲得更高的性能。

Deepchem的動手部分 (Hands-on Part with Deepchem)

Colab notebook that you can run by yourself is here.

您可以自己運行的Colab筆記本在這里。

Firstly, we import the necessary libraries. Nothing new here — we will be using Deepchem to train a GNN model on Tox21 data. The GraphConvModel is an architecture that was created by Duvenaud, et al. It uses a modified version of fingerprint algorithms to make them differentiable (so we can do a gradient update). It is one of the first GNN architectures that were designed to handle molecular structures as graphs.

首先,我們導入必要的庫。 這里沒什么新鮮的 -我們將使用Deepchem在Tox21數據上訓練GNN模型。 GraphConvModel是Duvenaud等人創建的架構。 它使用指紋算法的修改版本以使其具有差異性(因此我們可以進行梯度更新)。 它是最早設計用于將分子結構作為圖形處理的GNN架構之一。

# Importing required libraries and its utilities
import numpy as npnp.random.seed(123)
import tensorflow as tftf.random.set_seed(123)
import deepchem as dc
from deepchem.molnet import load_tox21
from deepchem.models.graph_models import GraphConvModel

Deepchem contains a convenient API to load the Tox21 for us with .load_tox21 function. We choose a featurizer as GraphConv — it will create chemical descriptors (i.e. features) to match the input requirements for our model. As this is a classification task, ROC AUC score will be used as a metric.

Deepchem包含一個方便的API,可使用來為我們加載Tox21。 load_tox21函數。 我們選擇一個特征化器作為GraphConv —它會創建化學描述符(即特征)以匹配模型的輸入要求。 由于這是分類任務,因此ROC AUC得分將用作度量。

# Tox21 is a part of Deepchem library
# so we can convieniently download it using load_tox21 function
tox21_tasks, tox21_datasets, transformers = load_tox21(featurizer='GraphConv')
train_dataset, valid_dataset, test_dataset = tox21_datasets# Define metric for the model
metric = dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean, mode="classification")

The beauty of the Deepchem is that the models use Keras-like API. We can train the model with .fit function. We pass len(tox21_tasks) into model’s arguments, which is a number of labels (12 in this case). This will set the output size of the final layer as 12. We use a batch size of 32 to speed up the computation time and to specify that the model is used for the classification task. The model takes several minutes to train on the Google Colab notebooks.

Deepchem的優點在于模型使用類似Keras的API。 我們可以用訓練模型。 擬合函數。 我們將len(tox21_tasks)傳遞給模型的arguments 它是許多標簽(在這種情況下為12)。 這會將最終層的輸出大小設置為12。我們使用32的批處理大小來加快計算時間,并指定將模型用于分類任務。 該模型需要幾分鐘才能在Google Colab筆記本上進行訓練。

# Define and fit the model
model = GraphConvModel(len(tox21_tasks), batch_size=32, mode='classification')
print("Fitting the model")
model.fit(train_dataset, nb_epoch=10)

After the training is complete, we can evaluate the model. Nothing difficult here again— we can still use the Keras API for that part. The ROC AUC scores are obtained with the .evaluate function.

訓練完成后,我們可以評估模型。 在這里沒什么困難的-我們仍然可以使用Keras API進行該部分。 ROC AUC得分是通過.evaluate函數獲得的。

print("Evaluating model with ROC AUC")
train_scores = model.evaluate(train_dataset, [metric], transformers)
valid_scores = model.evaluate(valid_dataset, [metric], transformers)print("Train scores")
print(train_scores)print("Validation scores")
print(valid_scores)

In my case, the train ROC AUC score was higher than the validation ROC AUC score. This might indicate that model is overfitting to some molecules.

在我的案例中,火車的ROC AUC分數高于驗證的ROC AUC分數。 這可能表明模型對某些分子過度擬合。

Image for post

You can do much more with Deepchem that. It contains several different GNN models that are as easy to use as in this tutorial. I highly suggest looking at their tutorials. For the toxicity task, they have gathered several different examples that run with different models. You can find it here.

利用Deepchem,您可以做更多的事情。 它包含幾種不同的GNN模型,這些模型與本教程一樣易于使用。 我強烈建議您看一下他們的教程。 對于毒性任務,他們收集了使用不同模型運行的幾個不同示例。 你可以在這里找到它。

Thank you for reading the article, I hope it was useful for you!

感謝您閱讀本文,希望對您有所幫助!

關于我 (About Me)

I am an MSc Artificial Intelligence student at the University of Amsterdam. In my spare time, you can find me fiddling with data or debugging my deep learning model (I swear it worked!). I also like hiking :)

我是阿姆斯特丹大學的人工智能碩士研究生。 在業余時間,您會發現我不喜歡數據或調試我的深度學習模型(我發誓它能工作!)。 我也喜歡遠足:)

Here are my social media profiles, if you want to stay in touch with my latest articles and other useful content:

如果您想與我的最新文章和其他有用內容保持聯系,這是我的社交媒體個人資料:

  • Medium

  • Linkedin

    領英

  • Github

    Github

  • Personal Website

    個人網站

翻譯自: https://towardsdatascience.com/drug-discovery-with-graph-neural-networks-part-2-b1b8d60180c4

吳恩達神經網絡1-2-2

本文來自互聯網用戶投稿,該文觀點僅代表作者本人,不代表本站立場。本站僅提供信息存儲空間服務,不擁有所有權,不承擔相關法律責任。
如若轉載,請注明出處:http://www.pswp.cn/news/391562.shtml
繁體地址,請注明出處:http://hk.pswp.cn/news/391562.shtml
英文地址,請注明出處:http://en.pswp.cn/news/391562.shtml

如若內容造成侵權/違法違規/事實不符,請聯系多彩編程網進行投訴反饋email:809451989@qq.com,一經查實,立即刪除!

相關文章

android初學者_適用于初學者的Android廣播接收器

android初學者Let’s say you have an application that depends on a steady internet connection. You want your application to get notified when the internet connection changes. How do you do that?假設您有一個依賴穩定互聯網連接的應用程序。 您希望您的應用程序在…

Android熱修復之 - 阿里開源的熱補丁

1.1 基本介紹     我們先去github上面了解它https://github.com/alibaba/AndFix 這里就有一個概念那就AndFix.apatch補丁用來修復方法,接下來我們看看到底是怎么實現的。1.2 生成apatch包      假如我們收到了用戶上傳的崩潰信息,我們改完需要修復…

leetcode 456. 132 模式(單調棧)

給你一個整數數組 nums &#xff0c;數組中共有 n 個整數。132 模式的子序列 由三個整數 nums[i]、nums[j] 和 nums[k] 組成&#xff0c;并同時滿足&#xff1a;i < j < k 和 nums[i] < nums[k] < nums[j] 。 如果 nums 中存在 132 模式的子序列 &#xff0c;返回…

seaborn分類數據可視:散點圖|箱型圖|小提琴圖|lv圖|柱狀圖|折線圖

一、散點圖stripplot( ) 與swarmplot&#xff08;&#xff09; 1.分類散點圖stripplot( ) 用法stripplot(xNone, yNone, hueNone, dataNone, orderNone, hue_orderNone,jitterTrue, dodgeFalse, orientNone, colorNone, paletteNone,size5, edgecolor"gray", linewi…

數據圖表可視化_數據可視化十大最有用的圖表

數據圖表可視化分析師每天使用的最佳數據可視化圖表列表。 (List of best data visualization charts that Analysts use on a daily basis.) Presenting information or data in a visual format is one of the most effective ways. Researchers have proved that the human …

javascript實現自動添加文本框功能

轉自&#xff1a;http://www.cnblogs.com/damonlan/archive/2011/08/03/2126046.html 昨天&#xff0c;我們公司的網絡小組決定為公司做一個內部的網站&#xff0c;主要是為員工比如發布公告啊、填寫相應信息、投訴、問題等等需求。我那同事給了我以下需求&#xff1a; 1.點擊一…

從Mysql slave system lock延遲說開去

本文主要分析 sql thread中system lock出現的原因&#xff0c;但是筆者并明沒有系統的學習過master-slave的代碼&#xff0c;這也是2018年的一個目標&#xff0c;2018年我都排滿了&#xff0c;悲劇。所以如果有錯誤請指出&#xff0c;也作為一個筆記用于后期學習。同時也給出筆…

傳智播客全棧_播客:從家庭學生到自學成才的全棧開發人員

傳智播客全棧In this weeks episode of the freeCodeCamp podcast, Abbey chats with Madison Kanna, a full-stack developer who works remotely for Mediavine. Madison describes how homeschooling affected her future learning style, how she tackles imposter syndrom…

leetcode 82. 刪除排序鏈表中的重復元素 II(map)

解題思路 map記錄數字出現的次數&#xff0c;出現次數大于1的數字從鏈表中移除 代碼 /*** Definition for singly-linked list.* public class ListNode {* int val;* ListNode next;* ListNode() {}* ListNode(int val) { this.val val; }* ListNode(…

python 列表、字典多排序問題

版權聲明&#xff1a;本文為博主原創文章&#xff0c;遵循 CC 4.0 by-sa 版權協議&#xff0c;轉載請附上原文出處鏈接和本聲明。本文鏈接&#xff1a;https://blog.csdn.net/justin051/article/details/84289189Python使用sorted函數來排序&#xff1a; l [2,1,3,5,7,3]print…

接facebook廣告_Facebook廣告分析

接facebook廣告Is our company’s Facebook advertising even worth the effort?我們公司的Facebook廣告是否值得努力&#xff1f; 題&#xff1a; (QUESTION:) A company would like to know if their advertising is effective. Before you start, yes…. Facebook does ha…

如何創建自定義進度欄

Originally published on www.florin-pop.com最初發布在www.florin-pop.com The theme for week #14 of the Weekly Coding Challenge is: 每周編碼挑戰第14周的主題是&#xff1a; 進度條 (Progress Bar) A progress bar is used to show how far a user action is still in…

基于SpringBoot的CodeGenerator

title: 基于SpringBoot的CodeGenerator tags: SpringBootMybatis生成器PageHelper categories: springboot date: 2017-11-21 15:13:33背景 目前組織上對于一個基礎的crud的框架需求較多 因此選擇了SpringBoot作為基礎選型。 Spring Boot是由Pivotal團隊提供的全新框架&#xf…

seaborn線性關系數據可視化:時間線圖|熱圖|結構化圖表可視化

一、線性關系數據可視化lmplot( ) 表示對所統計的數據做散點圖&#xff0c;并擬合一個一元線性回歸關系。 lmplot(x, y, data, hueNone, colNone, rowNone, paletteNone,col_wrapNone, height5, aspect1,markers"o", sharexTrue,shareyTrue, hue_orderNone, col_orde…

hdu 1257

http://acm.hdu.edu.cn/showproblem.php?pid1257 題意&#xff1a;有個攔截系統&#xff0c;這個系統在最開始可以攔截任意高度的導彈&#xff0c;但是之后只能攔截不超過這個導彈高度的導彈&#xff0c;現在有N個導彈需要攔截&#xff0c;問你最少需要多少個攔截系統 思路&am…

eda可視化_5用于探索性數據分析(EDA)的高級可視化

eda可視化Early morning, a lady comes to meet Sherlock Holmes and Watson. Even before the lady opens her mouth and starts telling the reason for her visit, Sherlock can tell a lot about a person by his sheer power of observation and deduction. Similarly, we…

我的AWS開發人員考試未通過。 現在怎么辦?

I have just taken the AWS Certified Developer - Associate Exam on July 1st of 2019. The result? I failed.我剛剛在2019年7月1日參加了AWS認證開發人員-聯考。結果如何&#xff1f; 我失敗了。 The AWS Certified Developer - Associate (DVA-C01) has a scaled score …

關系數據可視化gephi

表示對象之間的關系&#xff0c;可通過gephi軟件實現&#xff0c;軟件下載官方地址https://gephi.org/users/download/ 如何來表示兩個對象之間的關系&#xff1f; 把對象變成點&#xff0c;點的大小、顏色可以是它的兩個參數&#xff0c;兩個點之間的關系可以用連線表示。連線…

Hyperledger Fabric 1.0 從零開始(十二)——fabric-sdk-java應用

Hyperledger Fabric 1.0 從零開始&#xff08;十&#xff09;——智能合約&#xff08;參閱&#xff1a;Hyperledger Fabric Chaincode for Operators——實操智能合約&#xff09; Hyperledger Fabric 1.0 從零開始&#xff08;十一&#xff09;——CouchDB&#xff08;參閱&a…

css跑道_如何不超出跑道:計劃種子的簡單方法

css跑道There’s lots of startup advice floating around. I’m going to give you a very practical one that’s often missed — how to plan your early growth. The seed round is usually devoted to finding your product-market fit, meaning you start with no or li…