COVID-19 Research Assistant

Scientists, researchers, doctors, and other medical professionals are struggling to find answers to their high-priority scientific questions.

The rapid growth of new coronavirus literature makes it difficult for the medical research community to keep up. There is therefore a growing urgency for Natural Language Processing and AI approaches that help medical professionals generate new insights in support of the ongoing fight against this infectious disease.

Objective:

We aim to help medical professionals accelerate their work in the fight against COVID-19. Giving them access to a wider range of research resources, all in one place, should reduce the time they spend searching.

Datasets challenge:

Kaggle provides freely accessible datasets from the COVID-19 Open Research Dataset (CORD-19).

[Figure: Open Research Dataset Challenge (CORD-19)]

The CORD-19 resource offers more than 158,000 scholarly articles, including over 75,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses.

We found these datasets well suited to being searched with the Watson Discovery AI search engine.

Watson Discovery is a search tool powered by machine learning that can continue to learn and improve over time.

Of the 158,000 scholarly articles provided, we have so far prepared only the “comm_use_subset”, which contains 9,120 articles, to feed into Watson Discovery.

Solution:

We are building a smart conversational AI chatbot assistant to answer users’ high-priority scientific questions.

Step 1: Data analysis: extract the text-only content from the JSON files:

We extract the full text of each article from the JSON files and save the results as TXT or HTML.

Watson Discovery limits every document to 50k characters, and the datasets are provided as JSON files that all exceed 50k characters because of the JSON markup. We therefore used the simple Python script below to extract the full-text article from each JSON file and save the result as TXT or HTML.

In our case we saved the results in HTML format, because WD (Watson Discovery) does not support the TXT format. WD only supports document formats such as PDF, Word, Excel, PowerPoint, HTML, PNG, JPEG, and JSON.

The script below writes each output file in the following order:

  • title
  • abstract
  • full-text article

This step gave us clean data formatted as text documents, which makes it easy to keep track of the number of characters in each file.
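The 50k-character limit described earlier can be checked on the cleaned files before they are uploaded. The following is a minimal sketch, assuming the cleaned files already sit in an output folder; `oversized_files` and `MAX_CHARS` are hypothetical names introduced here for illustration, not part of the original project.

```python
from pathlib import Path

# Assumption: the per-document character limit quoted in the text (50k).
MAX_CHARS = 50_000


def oversized_files(output_dir, pattern="*.html", limit=MAX_CHARS):
    """Return (file name, character count) pairs for files longer than `limit`."""
    too_long = []
    for path in sorted(Path(output_dir).glob(pattern)):
        n_chars = len(path.read_text(encoding="utf8"))
        if n_chars > limit:
            too_long.append((path.name, n_chars))
    return too_long
```

Any file this reports would still need to be split or trimmed before it fits under the limit.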

Python script:

import glob
import json
import os

# Directory containing the article JSON files:
articles_dir = 'D:/DEV/Kaggle/CORD-19-research-challenge/comm_use_subset/comm_use_subset'
# Output directory for processed files:
output_dir = 'D:/DEV/Kaggle/CORD-19-research-challenge/output'

# Make sure the output directory exists before writing to it
os.makedirs(output_dir, exist_ok=True)

os.chdir(articles_dir)
for file in glob.glob('*.json'):
    print('Processing file: ', file)
    with open(file, 'r', encoding='utf8') as article_file:
        article = json.load(article_file)

    title = article['metadata']['title']

    # Section names are kept in lists to preserve their original order
    abstract_sections = []
    abstract_texts = dict()

    body_sections = []
    body_texts = dict()

    # Read the abstract, merging paragraphs that share a section name
    for abst in article['abstract']:
        abst_section = abst['section']
        abst_text = abst['text']
        if abst_section not in abstract_sections:
            abstract_sections.append(abst_section)
            abstract_texts[abst_section] = abst_text
        else:
            abstract_texts[abst_section] = abstract_texts[abst_section] + '\n' + abst_text

    # Read the body, merging paragraphs that share a section name
    for body in article['body_text']:
        body_section = body['section']
        body_text = body['text']
        if body_section not in body_sections:
            body_sections.append(body_section)
            body_texts[body_section] = body_text
        else:
            body_texts[body_section] = body_texts[body_section] + '\n' + body_text

    # Write title, abstract, and body to "clean.<name>.html" in the output directory
    out_path = os.path.join(output_dir, 'clean.' + file.replace('.json', '.html'))
    with open(out_path, 'w', encoding='utf8') as out_file:
        out_file.write(title)
        out_file.write('\n\n')
        # Write the abstract sections
        for a_section in abstract_sections:
            out_file.write('\n\n')
            out_file.write(a_section)
            out_file.write('\n')
            out_file.write(abstract_texts[a_section])
        # Write the body sections
        for b_section in body_sections:
            out_file.write('\n\n')
            out_file.write(b_section)
            out_file.write('\n')
            out_file.write(body_texts[b_section])
            out_file.write('\n')
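The script above writes plain text into files with an .html extension. If actual HTML markup is wanted, the same title and sections could be wrapped in tags. This is only a sketch of one way to do that: `to_html` is a hypothetical helper, and whether the extra markup improves Watson Discovery's results is an assumption that the original article does not test.

```python
import html


def to_html(title, sections):
    """Render a title and (heading, text) pairs as a minimal HTML page.

    `sections` is a list of (section_heading, section_text) tuples,
    for example the abstract sections followed by the body sections.
    """
    parts = [
        '<html><head><meta charset="utf-8">',
        f"<title>{html.escape(title)}</title></head><body>",
        f"<h1>{html.escape(title)}</h1>",
    ]
    for heading, text in sections:
        if heading:
            parts.append(f"<h2>{html.escape(heading)}</h2>")
        # One <p> element per non-empty line of the section text
        for paragraph in text.split("\n"):
            if paragraph.strip():
                parts.append(f"<p>{html.escape(paragraph)}</p>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

Escaping with `html.escape` keeps characters such as `<` and `&` in the article text from being read as markup.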

Step 2: Feed Watson Discovery:

Create your free IBM Cloud account: https://ibm.biz/BdqbAU

With the Watson Discovery smart AI search engine, we ingested the documents, trained it with our queries, and rated the results so that WD's machine learning could learn from them.
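The article does not say how the files were loaded into Watson Discovery; they may well have been uploaded through the web tooling. As a hedged sketch, a scripted upload with the `ibm-watson` Python SDK's `DiscoveryV1.add_document` call could look like this. `upload_documents` is a hypothetical helper, and the credentials, version date, and IDs in the commented wiring are placeholders.

```python
import os


def upload_documents(discovery, environment_id, collection_id, paths):
    """Upload each HTML file and return {file name: HTTP status code}.

    `discovery` is expected to be an ibm_watson.DiscoveryV1 client
    (or any object exposing a compatible add_document method).
    """
    statuses = {}
    for path in paths:
        name = os.path.basename(path)
        with open(path, "rb") as handle:
            response = discovery.add_document(
                environment_id,
                collection_id,
                file=handle,
                filename=name,
                file_content_type="text/html",
            )
        statuses[name] = response.get_status_code()
    return statuses


# Example wiring (placeholders, not real credentials):
# from ibm_watson import DiscoveryV1
# from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
# discovery = DiscoveryV1(version="2019-04-30",
#                         authenticator=IAMAuthenticator("<api-key>"))
# discovery.set_service_url("<service-url>")
# upload_documents(discovery, "<environment-id>", "<collection-id>",
#                  ["clean.example.html"])
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before any real credentials are involved.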

Rate the most relevant article for an example question that a researcher might ask.

This task required a lot of reading and understanding of academic and scientific articles; we have built around 100 queries so far.
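Each training query pairs a natural-language question with documents that have been rated for relevance. The sketch below shows one way to represent such a query as plain data. The field names follow what we understand the Discovery v1 training-data API to expect, and the 10 / 0 scores mirror the convention we believe the tooling uses for "relevant" / "not relevant"; both are assumptions, as are the helper name and the example document IDs.

```python
# Assumed relevance scores for "relevant" and "not relevant" ratings.
RELEVANT = 10
NOT_RELEVANT = 0


def build_training_query(question, rated_documents):
    """Build one training query from a question and rated documents.

    `rated_documents` maps a document ID to True (relevant) or False.
    """
    examples = [
        {
            "document_id": doc_id,
            "relevance": RELEVANT if is_relevant else NOT_RELEVANT,
        }
        for doc_id, is_relevant in rated_documents.items()
    ]
    return {"natural_language_query": question, "examples": examples}


# Hypothetical usage with made-up document IDs:
query = build_training_query(
    "What is the incubation period of the virus?",
    {"doc-1": True, "doc-2": False},
)
```

Keeping the roughly 100 rated queries in a structure like this would make them easy to review or store before they are submitted for training.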

Expected questions from the user, with the most relevant answers.

Step 4: Integrate Watson Assistant with Watson Discovery:

Watson Assistant is a conversational AI platform that helps you provide customers with fast, straightforward, and accurate answers to their questions, across any application, device, or channel.

Calling Watson Assistant from JavaScript for the server connection:

const AssistantV1 = require('ibm-watson/assistant/v1');
const { getAuthenticatorFromEnvironment } = require('ibm-watson/auth');

// need to manually set url and disableSslVerification to get around
// current Cloud Pak for Data SDK issue IF user uses
// `CONVERSATION_` prefix in run-time environment.
let auth;
let url;
let disableSSL = false;

try {
  // ASSISTANT should be used
  auth = getAuthenticatorFromEnvironment('ASSISTANT');
  url = process.env.ASSISTANT_URL;
  if (process.env.ASSISTANT_DISABLE_SSL == 'true') {
    disableSSL = true;
  }
} catch (e) {
  // but handle if alternate CONVERSATION is used
  auth = getAuthenticatorFromEnvironment('CONVERSATION');
  url = process.env.CONVERSATION_URL;
  if (process.env.CONVERSATION_DISABLE_SSL == 'true') {
    disableSSL = true;
  }
}

// Note: this prints the authenticator object, which may include credentials.
console.log('Assistant auth:', JSON.stringify(auth, null, 2));

const assistant = new AssistantV1({
  version: '2020-03-01',
  authenticator: auth,
  url: url,
  disableSslVerification: disableSSL
});

// SDK uses workspaceID, but Assistant tooling refers to this value as the SKILL ID.
assistant.workspaceId = process.env.ASSISTANT_SKILL_ID;

module.exports = assistant;

Step 5: Test the app

The methodology is defined as:

  • The user interacts with Watson Assistant.
  • Watson Assistant invokes Watson Discovery.
  • Watson Discovery finds the best results for the query and responds to the Assistant.
  • Watson Assistant displays the results to the user.
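The flow above can be illustrated with a small stand-in. In the deployed app Watson Assistant and Watson Discovery perform these steps through their own integration; the sketch below only mimics the hand-off with a pluggable `search` function. `answer_question`, the stub search, and the result fields `title` and `text` are hypothetical names chosen for illustration.

```python
def answer_question(question, search, max_results=3):
    """Forward a user question to a search backend and format the top hits.

    `search` is any callable that takes the question and returns a list of
    dicts with 'title' and 'text' keys, best match first.
    """
    results = search(question)[:max_results]
    if not results:
        return "Sorry, I could not find a relevant article."
    lines = [
        f"{rank}. {hit['title']}: {hit['text']}"
        for rank, hit in enumerate(results, start=1)
    ]
    return "\n".join(lines)


def stub_search(question):
    """Stand-in for the Discovery query step; returns canned results."""
    return [{"title": "Example article", "text": "Example passage."}]


reply = answer_question("How does the virus spread?", stub_search)
```

Swapping `stub_search` for a function that queries a real collection would leave the formatting step unchanged.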

Finally, we integrated Watson Assistant with Watson Discovery, configured the front-end app with Watson Assistant, and deployed it on IBM Cloud. The app is running live, and we are going to keep it up for a while.

Live demo: https://covid19assistantcfc.mybluemix.net/

We are still generating real crisis questions from the abstracts and the articles, so we can keep training Discovery and rating the best answers for the bot.

Project Demo:

Demo link

Conclusion:

To conclude, a conversational AI chatbot like this one can help scientists and doctors in the research community save time and accelerate their work in the fight against COVID-19.

GitHub Repository for this project:

[1] https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge

[2] https://www.semanticscholar.org/cord19

[3] https://ai2-semanticscholar-cord-19.s3-us-west-2.amazonaws.com/historical_releases.html

[4] https://www.statnews.com/2020/03/16/database-launched-to-spur-ai-tools-to-fight-coronavirus/

[5] https://github.com/Call-for-Code/Solution-Starter-Kit-Communication-2020#the-idea

Translated from: https://medium.com/swlh/covid-19-research-assistant-using-ai-watson-discovery-to-analyze-open-research-dataset-by-kaggle-9807cf467626
