Scientists, researchers, doctors, and other medical professionals currently struggle to find answers to their high-priority scientific questions.
The rapid growth of new coronavirus literature makes it difficult for the medical research community to keep up. There is therefore a growing urgency for Natural Language Processing and AI approaches that help medical professionals generate new insights in support of the ongoing fight against this infectious disease.
Objective:
We aim to help medical professionals accelerate their work in the fight against COVID-19. Giving them access to a wider range of research resources, all in one place, should reduce the time they spend searching.
Datasets challenge:
Kaggle provides freely accessible datasets for the COVID-19 Open Research Dataset (CORD-19).

The CORD-19 resource offers more than 158,000 scholarly articles, including over 75,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses.
We found these datasets well suited to being searched with the Watson Discovery AI search engine.
Watson Discovery is a search tool powered by machine learning that can continue to learn and improve over time.
Of the 158,000 scholarly articles provided, we have so far prepared only the “comm_use_subset”, which contains 9,120 articles, to feed into Watson Discovery.

Solution:
We are looking into building a smart conversational AI assistant (a chatbot) that answers the user’s high-priority scientific questions.
Step 1: Data analysis: extract the text-only content from the JSON files:
We extract the full-text article from each JSON file and save the result as TXT or HTML.
Watson Discovery limits every document to 50k characters, while the datasets are provided as JSON files that all exceed 50k characters because of the JSON markup. We therefore applied the simple Python script below to extract the full-text article from each JSON file and save the result as TXT or HTML.
In our case, we saved the results in HTML format because Watson Discovery (WD) does not support the TXT format. WD only supports document formats such as PDF, Word, Excel, PowerPoint, HTML, PNG, JPEG, and JSON.
The script below writes each processed file in the following order:
- title
- abstract
- full-text article
This task gave us clean data formatted as text documents, which makes it easy to manage the number of characters in each file.
Python script:
import glob
import json
import os

# Directory containing the article JSON files:
articles_dir = 'D:/DEV/Kaggle/CORD-19-research-challenge/comm_use_subset/comm_use_subset'
# Output directory for the processed files:
output_dir = 'D:/DEV/Kaggle/CORD-19-research-challenge/output'

# Create the output directory if it does not exist yet.
os.makedirs(output_dir, exist_ok=True)
os.chdir(articles_dir)

for file in glob.glob('*.json'):
    print('Processing file: ', file)
    with open(file, 'r', encoding='utf8') as article_file:
        article = json.load(article_file)

    title = article['metadata']['title']
    abstract_sections = []   # section names, in order of first appearance
    abstract_texts = dict()  # section name -> concatenated paragraphs
    body_sections = []
    body_texts = dict()

    # Read the abstract, grouping paragraphs by section.
    for abst in article['abstract']:
        abst_section = abst['section']
        abst_text = abst['text']
        if abst_section not in abstract_sections:
            abstract_sections.append(abst_section)
            abstract_texts[abst_section] = abst_text
        else:
            abstract_texts[abst_section] = abstract_texts[abst_section] + '\n' + abst_text

    # Read the body, grouping paragraphs by section.
    for body in article['body_text']:
        body_section = body['section']
        body_text = body['text']
        if body_section not in body_sections:
            body_sections.append(body_section)
            body_texts[body_section] = body_text
        else:
            body_texts[body_section] = body_texts[body_section] + '\n' + body_text

    # Write title, abstract, and body to a 'clean.<name>.html' file.
    with open(output_dir + '/clean.' + file.replace('.json', '.html'), 'w', encoding='utf8') as out_file:
        out_file.write(title)
        out_file.write('\n\n')
        # Write the abstract sections.
        for a_section in abstract_sections:
            out_file.write('\n\n')
            out_file.write(a_section)
            out_file.write('\n')
            out_file.write(abstract_texts[a_section])
        # Write the body sections.
        for b_section in body_sections:
            out_file.write('\n\n')
            out_file.write(b_section)
            out_file.write('\n')
            out_file.write(body_texts[b_section])
        out_file.write('\n')
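Because the goal of this step is to stay under the Watson Discovery character limit, it can be useful to verify the cleaned files before uploading them. The following is a minimal sketch, not part of the original script: the function names `find_oversized` and `split_text`, the `clean.*.html` file pattern, and the idea of splitting long files into chunks are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: the 50k-character limit quoted in this article.
CHAR_LIMIT = 50_000


def find_oversized(output_dir, limit=CHAR_LIMIT, pattern="clean.*.html"):
    """Return (file name, character count) pairs for files longer than `limit`."""
    oversized = []
    for path in sorted(Path(output_dir).glob(pattern)):
        n_chars = len(path.read_text(encoding="utf8"))
        if n_chars > limit:
            oversized.append((path.name, n_chars))
    return oversized


def split_text(text, limit=CHAR_LIMIT):
    """Split `text` into consecutive chunks of at most `limit` characters.

    Hypothetical helper: it cuts at fixed offsets, so a chunk may end mid-word.
    """
    return [text[i:i + limit] for i in range(0, len(text), limit)]
```

Calling `find_oversized(output_dir)` after the script has run lists any cleaned article that is still too long, which could then be shortened or passed through `split_text` before it is fed to Watson Discovery.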
Step 2: Feed Watson Discovery:
Create your free IBM Cloud account: https://ibm.biz/BdqbAU
With the Watson Discovery smart AI search engine, we fed in our queries and rated the results to train WD's machine learning.


Rate the most relevant article for an example question that a researcher might ask.

This task required a lot of reading and understanding of academic and scientific articles; we have built around 100 queries so far.
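To keep track of rated queries like these outside the Watson Discovery tooling, one option is a small record per question. The sketch below is an illustration only: the class names, the example question, the document IDs, and the 0/10 relevance scale are assumptions, and the field names are chosen to resemble a query-plus-rated-examples layout rather than to reproduce any confirmed API payload.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RatedExample:
    document_id: str  # hypothetical ID of a cleaned article
    relevance: int    # assumed scale: 0 = not relevant, 10 = relevant


@dataclass
class TrainingQuery:
    natural_language_query: str
    examples: list = field(default_factory=list)

    def rate(self, document_id, relevant):
        """Record whether a returned document was relevant to this query."""
        self.examples.append(RatedExample(document_id, 10 if relevant else 0))


# Hypothetical question and document IDs, for illustration only.
query = TrainingQuery("What is the incubation period of COVID-19?")
query.rate("clean.article-001", relevant=True)
query.rate("clean.article-002", relevant=False)

payload = json.dumps(asdict(query), indent=2)
print(payload)
```

Storing the ratings as plain JSON makes it straightforward to review the roughly 100 queries later or to re-enter them if the collection has to be rebuilt.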
Expected questions from the user, with the most relevant answers.

Step 4: Integrate Watson Assistant with Watson Discovery:
Watson Assistant is a conversational AI platform that helps you give customers fast, straightforward, and accurate answers to their questions across any application, device, or channel.
Calling Watson Assistant from JavaScript for the server connection:
const AssistantV1 = require('ibm-watson/assistant/v1');
const { getAuthenticatorFromEnvironment } = require('ibm-watson/auth');

// need to manually set url and disableSslVerification to get around
// current Cloud Pak for Data SDK issue IF user uses
// `CONVERSATION_` prefix in run-time environment.
let auth;
let url;
let disableSSL = false;

try {
  // ASSISTANT should be used
  auth = getAuthenticatorFromEnvironment('ASSISTANT');
  url = process.env.ASSISTANT_URL;
  if (process.env.ASSISTANT_DISABLE_SSL == 'true') {
    disableSSL = true;
  }
} catch (e) {
  // but handle if alternate CONVERSATION is used
  auth = getAuthenticatorFromEnvironment('CONVERSATION');
  url = process.env.CONVERSATION_URL;
  if (process.env.CONVERSATION_DISABLE_SSL == 'true') {
    disableSSL = true;
  }
}
console.log('Assistant auth:', JSON.stringify(auth, null, 2));

const assistant = new AssistantV1({
  version: '2020-03-01',
  authenticator: auth,
  url: url,
  disableSslVerification: disableSSL
});

// SDK uses workspaceID, but Assistant tooling refers to this value as the SKILL ID.
assistant.workspaceId = process.env.ASSISTANT_SKILL_ID;

module.exports = assistant;
Step 5: Test the app
The methodology is defined as:
- The user interacts with Watson Assistant.
- Watson Assistant invokes Watson Discovery.
- Watson Discovery finds the optimal results for the queries and responds to the Assistant.
- Watson Assistant displays the results to the user.
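The request flow above can be sketched with two plain Python functions standing in for the two services. This is a toy model under stated assumptions: `discovery_search` and `assistant_reply` are hypothetical names, the documents are made up, and simple word overlap replaces Watson Discovery's actual ranking, which this sketch does not attempt to reproduce.

```python
def discovery_search(question, documents, top_n=3):
    """Stand-in for Watson Discovery: rank documents by words shared with the question."""
    terms = set(question.lower().split())
    scored = []
    for doc_id, text in documents.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document ID for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_n]]


def assistant_reply(question, search):
    """Stand-in for Watson Assistant: forward the question and format the results."""
    results = search(question)
    if not results:
        return "Sorry, I could not find a relevant article."
    return "Most relevant articles: " + ", ".join(results)


# Hypothetical documents, for illustration only.
documents = {
    "doc-a": "incubation period of the virus",
    "doc-b": "transmission in hospital settings",
}

reply = assistant_reply("what is the incubation period",
                        lambda q: discovery_search(q, documents))
print(reply)
```

Passing the search function into `assistant_reply` mirrors the separation in the deployed app: the assistant only handles the conversation, while the search service decides which articles are relevant.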
Finally, we integrated Watson Assistant with Watson Discovery, configured the front-end app with Watson Assistant, and deployed it on IBM Cloud. The app is running live, and we are going to keep it up for a while.
Live demo: https://covid19assistantcfc.mybluemix.net/
We are still generating real crisis questions from the abstracts and the articles so that we can keep training Discovery and rating the best answers for the bot.

Project Demo:
Conclusion:
To conclude, this conversational AI chatbot can help scientists and doctors in the research community save time and accelerate their work in the fight against COVID-19.
GitHub Repository for this project:
[1] https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge
[2] https://www.semanticscholar.org/cord19
[3] https://ai2-semanticscholar-cord-19.s3-us-west-2.amazonaws.com/historical_releases.html
[4] https://www.statnews.com/2020/03/16/database-launched-to-spur-ai-tools-to-fight-coronavirus/
[5] https://github.com/Call-for-Code/Solution-Starter-Kit-Communication-2020#the-idea
Original article: https://medium.com/swlh/covid-19-research-assistant-using-ai-watson-discovery-to-analyze-open-research-dataset-by-kaggle-9807cf467626