What is NCBI Virus?
Main features:
- Compare your sequence to those in the NCBI Virus database using the NCBI BLAST algorithm.
- Search, view and download nucleotide and protein sequences using a virus name or taxonomy group.
- Quickly access common data sets for all viruses, all human viruses, bacteriophages, or sequences released in the past month.
- Explore the massive, normalized datasets and identify data trends.
Ways to access NCBI Virus data
Select one of the three options to access NCBI Virus data.
Option 1:
In the Find data tab of the navigation menu, select one of the drop-down links:
Search by sequence - to use the virus-specific NCBI BLAST tool.
Search by virus - to search for virus sequences by virus name or taxonomy.
All viruses, Human viruses, Bacteriophages, New sequences (past one month) and Available SARS-CoV-2 sequences - to view preselected data sets.
Option 2:
The same functionality can be accessed through the Search by sequence and Search by virus buttons on the NCBI Virus home page.
The results can be viewed in the Results Table and further refined using the sequence attributes (metadata) in the Refine Results panel located on the right side of the table. You can also download the results, run multiple sequence alignments, and generate phylogenetic trees from the selected results.
Option 3:
Through the NCBI Visual Data Dashboard, via the statistics buttons located in the top row of the dashboard.
NCBI Virus BLAST® tool
The NCBI Virus BLAST® tool provides rapid insight into query sequences by presenting BLASTn and BLASTp results alongside normalized metadata, when available. These attributes include isolation source, host, country, collection and release dates, as well as taxonomy and genetic attributes such as completeness, and segment or protein names where applicable. The normalized metadata is generated by an internal, curator-guided data-processing pipeline that maps sequence-record attributes to standardized vocabularies, providing a user-friendly view of the data.
Compare your sequence to those in the NCBI Virus database using the BLAST algorithm
Press the Search by sequence button (or select this option from the Find data navigation tab at the top of the page).
Select the Nucleotide or Protein tab. The Nucleotide tab performs a BLASTn search (against all NCBI Virus nucleotide sequences). The Protein tab performs a BLASTp search (against all NCBI Virus protein sequences). Read more about BLAST® searches in the NCBI BLAST Guide.
In the NCBI Virus Search by sequence input form, enter an NCBI sequence accession or a sequence in plain text or FASTA format, and click Start search.
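To illustrate the FASTA layout the form accepts (a ">" header line followed by sequence lines), here is a minimal sketch that converts a plain-text sequence into FASTA. The header text and the 60-character line width are arbitrary choices for illustration, not requirements of the NCBI Virus form, and the helper names are hypothetical.

```python
import textwrap


def to_fasta(sequence: str, header: str = "query", width: int = 60) -> str:
    """Wrap a plain-text sequence as a single FASTA record.

    Anything that is not a letter (spaces, line breaks, position numbers
    copied from a GenBank ORIGIN block) is dropped and letters are upper-cased.
    """
    cleaned = "".join(ch for ch in sequence if ch.isalpha()).upper()
    if not cleaned:
        raise ValueError("empty sequence")
    lines = textwrap.wrap(cleaned, width)  # splits the long letter run into fixed-width lines
    return ">" + header + "\n" + "\n".join(lines) + "\n"


def looks_like_fasta(text: str) -> bool:
    """True if the first non-blank line is a FASTA header line."""
    for line in text.splitlines():
        if line.strip():
            return line.lstrip().startswith(">")
    return False


if __name__ == "__main__":
    print(to_fasta("atgttt gtttttcttg\n ttttattgcc", header="my_query"), end="")
```

Note that this sketch also strips gap ("-") and stop ("*") symbols, which is acceptable for a plain query but not for pre-aligned input.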
The BLAST search results will open in a new window in tabular format (the Results Table).
Compare your sequences to the sequences in the up-to-date Betacoronavirus database
To accommodate the SARS-CoV-2 outbreak, the Betacoronavirus BLAST database was created. It is regularly updated and includes all sequences from the genus Betacoronavirus. To search your sequence against the Betacoronavirus database using BLAST:
Press the Search by sequence button (or select this option from the Find data navigation tab at the top of the page).
Select the Nucleotide or Protein tab.
In the NCBI Virus Search by sequence input form, enter an NCBI sequence accession or a sequence in plain text or FASTA format, and click the Search up-to-date Betacoronavirus DB button.
The BLAST search results will open in a separate window in tabular format (the Results Table).
Compare BLAST results in the Results Table
The Nucleotide tab performs a BLASTn search (using Megablast, which is optimized for highly similar sequences) against all NCBI Virus nucleotide sequences.
The Protein tab performs a BLASTp search against all NCBI Virus protein sequences. Read more about BLAST algorithms in the NCBI BLAST help documentation.
In the BLAST search Results Table, you can compare search results using the following sortable default columns:
Accession - the NCBI accession number of the NCBI Virus database sequence. Reference sequence accessions are marked with the label “RefSeq”.
Coverage - query coverage.
Identity - the highest percent identity of all query-subject alignments.
Submitters - the authors who submitted the sequence. Only the first submitter's name is displayed in the column (for example, Baranov,P.V., et al.). To obtain the full list of submitters, click the sequence accession number to open the details panel, then click the accession number in the details panel to open the GenBank Entrez page with all information available for the selected sequence. Alternatively, use the Download button with the CSV format option: the “Submitters” column in the downloaded table contains the names of all authors who submitted each sequence.
Release date - the date when the sequence was released (first appeared publicly) in GenBank or another INSDC database.
Isolate - the individual isolate from which the sequence was obtained, typically an alphanumeric sample ID. The isolate name is parsed from the “/isolate” field of the GenBank record. SARS-CoV-2 isolate names are formatted according to the definitions of the Coronaviridae Study Group of the International Committee on Taxonomy of Viruses (ICTV).
The isolate carries distinguishing information about the sample that helps tell multiple samples apart. For example, “isolate: Han” indicates that the sample came from a specific population, and “isolate: Prostate Cancer Cell Line” indicates that the sample came from a specific cell type.
Species – virus species name.
Molecule type - viral nucleic acid type. The molecule type is provided by the International Committee on Taxonomy of Viruses (ICTV) in the Master Species List and maintained in the NCBI Taxonomy database. RefSeqs with an “Unknown” molecule type belong to taxonomic groups not yet recognized by the ICTV.
Length - sequence length.
Geo Location - country/region where the virus specimen was collected. May contain additional geographic information, for example, the US state.
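The Coverage and Identity columns can be made concrete with a small calculation. This sketch assumes each alignment segment (HSP) is given as query start/end coordinates plus its number of identical positions and alignment length. It computes query coverage as the percentage of query positions covered by at least one segment, and identity as the highest per-segment percent identity, following the column descriptions above. The `Hsp` type is hypothetical, and NCBI's exact computation and rounding may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hsp:
    """One query-subject alignment segment (hypothetical structure)."""
    q_start: int     # 1-based, inclusive
    q_end: int       # 1-based, inclusive
    identities: int  # number of identical positions
    align_len: int   # alignment length, including gaps


def query_coverage(hsps: List[Hsp], query_len: int) -> float:
    """Percent of query positions covered by at least one HSP (overlaps counted once)."""
    intervals = sorted((min(h.q_start, h.q_end), max(h.q_start, h.q_end)) for h in hsps)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is not None and start <= cur_end + 1:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend current run
        else:
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / query_len


def best_identity(hsps: List[Hsp]) -> float:
    """Highest percent identity over all HSPs."""
    return max(100.0 * h.identities / h.align_len for h in hsps)
```

For two segments covering query positions 1-400 and 301-600 of a 1,000-nt query, the overlap is counted once, so coverage is 60%.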
BLAST results can be customized by adding or removing columns from the Results Table in the Select columns drop-down menu.
Additional columns include:
USA. If the sample was collected in the United States, the column shows the state abbreviation.
Host – virus isolation host, that is, the scientific (Latin) name of the natural (non-laboratory) host species from which the sample was obtained (read more about isolation host vocabulary mapping). If the isolation host is unknown (the /host field of the GenBank record is empty) but a laboratory host is present (as indicated in the /lab_host field of the GenBank record), the laboratory host is shown in the Host column of the Results Table. If both the isolation host and the laboratory host can be mapped, only the isolation host is shown in the Host column.
Collection Date – virus specimen collection date.
SRA accession - NCBI Sequence Read Archive (SRA) accession number.
Score - the total alignment score (Total score) over all alignment segments.
Genus.
Family.
Sequence type – complete/partial/proviral/refseq (read more about sequence type here).
Nuc completeness - nucleotide completeness (note: this is preliminary data and not always accurate).
Genotype.
Segment – segment name, for segmented viruses.
Publications - links to PubMed publications associated with the sequence.
Country - country of specimen collection (country only, without any additional information).
Isolation source – sequence isolation source (read more about isolation source here).
BioSample – NCBI BioSample accession number.
BioProject – NCBI BioProject accession number.
GenBank title.
The default number of rows displayed in the Results Table is 200. You can change the number of rows by selecting the number of results per page (200, 100, 50 or 25) in the Select Columns menu.
View BLAST Alignment of selected sequences
To compare search results in pairwise alignments:
Select the sequences to display.
Click the View BLAST Alignment of selected sequences link displayed in the center of the Info panel above the Results Table.
A new page will show a graphical view of pairwise alignments between the selected BLAST results and the query, along with a feature map of the query (if available) at the top of the view.
To learn more about using the alignment viewer, refer to the NCBI Multiple Sequence Alignment Viewer documentation.
Build multiple sequence alignment of selected BLAST results
To build a multiple sequence alignment of selected BLAST results:
Select the sequences that you want to align.
Press the Align button above the Results Table, on the right.
The multiple sequence alignment will open in a new page. Multiple sequence alignments are calculated using MUSCLE.
To learn more about using the alignment viewer, refer to the NCBI Multiple Sequence Alignment Viewer documentation.
Build phylogenetic tree of selected BLAST results
To build a phylogenetic tree showing the relationships of the selected sequences:
Select the sequences to display.
Press the Build Phylogenetic Tree button above the Results Table, on the right.
The tree will be calculated and displayed in the Tree Viewer on a separate page.
For more about the Tree Viewer and how to use it, refer to the NCBI Tree Viewer help documentation located here.
Refine tabular BLAST results via filters:
1. Virus name or taxonomy
To restrict search results to a particular virus group:
On the BLAST results page, in the Refine Results panel (upper left corner), click Virus.
In the text box, paste or start typing a single virus taxonomy name or taxid (only the top 5 taxa will be shown).
Select your taxid (NCBI Taxonomy database ID) from the fly-out menu.
The filtered results will be presented in the Results Table with the following default sortable columns: accession, coverage, identity, species, country, host, collection date. Additional columns with linked metadata can be added via the Customize Table menu. The query sequence will be highlighted in the first row of the table.
2. Accession
You can search for particular accessions in the Results Table by entering them in the search form under the Accession filter. The results in the table will be limited to the entered accession numbers.
3. Sequence length
To restrict your results to a particular sequence length, enter the minimum and maximum length in nucleotides (for a nucleotide search) or amino acids (for a protein search).
4. Ambiguous Characters
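Sets the maximum number of ambiguous characters (N in nucleotide sequences or X in protein sequences) allowed in each sequence in the Results Table.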
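Filters 3 and 4 together amount to a simple per-sequence test. The sketch below applies a length range and a maximum count of ambiguous characters, counting only N for nucleotides and X for proteins as described above. The function name and parameters are illustrative, not part of the NCBI Virus interface.

```python
from typing import Optional


def passes_length_and_ambiguity(
    seq: str,
    is_protein: bool,
    min_len: Optional[int] = None,
    max_len: Optional[int] = None,
    max_ambiguous: Optional[int] = None,
) -> bool:
    """Return True if seq satisfies the length range and the ambiguity limit.

    Length is measured in nucleotides or amino acids. Ambiguous characters
    are counted as 'N' (nucleotide) or 'X' (protein), case-insensitively.
    Any bound left as None is not applied.
    """
    length = len(seq)
    if min_len is not None and length < min_len:
        return False
    if max_len is not None and length > max_len:
        return False
    if max_ambiguous is not None:
        ambiguous = "X" if is_protein else "N"
        if seq.upper().count(ambiguous) > max_ambiguous:
            return False
    return True
```

In a protein sequence, N is asparagine rather than an ambiguity code, which is why the character counted depends on `is_protein`.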
5. Sequence type
All sequences (nucleotide or protein) available in the NCBI Virus resource can be filtered by the following sequence types: GenBank and RefSeq.
GenBank sequences include all sequences available in GenBank, except RefSeqs.
RefSeq-filtered nucleotide sequences include all reference sequences for the selected virus. Note that a few RefSeqs are partial genomes, based on International Committee on Taxonomy of Viruses (ICTV) proposals.
6. RefSeq genome completeness
Complete or partial RefSeq genomes - filter for all complete (or partial) genomes, reference records (RefSeqs), and proteins from these RefSeqs. For segmented viruses, complete genomes contain all genome segments. Most RefSeq records are complete, but a few RefSeqs are partial, based on International Committee on Taxonomy of Viruses (ICTV) proposals.
7. Nucleotide completeness
Complete nucleotide sequences - filter for all NCBI viral nucleotide sequences whose GenBank ASN.1 record contains the descriptor descr/molinfo/completeness=complete, or whose definition line (defline) contains the word ‘complete’. This also includes complete reference records (RefSeqs).
Partial nucleotide sequences – filter for sequences that are not complete according to the definition above.
If the Protein tab is selected and the complete nucleotide sequence filter is applied, the results will include all proteins from complete genomes or, for segmented viruses, from individual complete segments.
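The completeness rule above can be written as a test on two record fields. In this sketch, the `molinfo_completeness` and `defline` arguments are hypothetical stand-ins for the ASN.1 descriptor and the definition line (NCBI's pipeline is internal). The whole-word, case-insensitive match is an assumption made so that "incomplete" is not counted. Taken literally, the rule also counts wording such as "complete cds".

```python
import re
from typing import Optional

# Whole-word match, so "incomplete" does not count; case-insensitivity is an assumption.
_COMPLETE_WORD = re.compile(r"\bcomplete\b", re.IGNORECASE)


def is_complete_nucleotide(molinfo_completeness: Optional[str], defline: str) -> bool:
    """Complete if descr/molinfo/completeness is 'complete' or the
    definition line contains the word 'complete'."""
    if molinfo_completeness is not None and molinfo_completeness.lower() == "complete":
        return True
    return bool(_COMPLETE_WORD.search(defline))


def is_partial_nucleotide(molinfo_completeness: Optional[str], defline: str) -> bool:
    """Partial means: not complete by the rule above."""
    return not is_complete_nucleotide(molinfo_completeness, defline)
```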
8. Isolate
Isolate - the individual isolate from which the sequence was obtained, typically an alphanumeric sample ID. The isolate name is parsed from the “/isolate” field of the GenBank record. SARS-CoV-2 isolate names are formatted according to the definitions of the Coronaviridae Study Group of the International Committee on Taxonomy of Viruses (ICTV).
9. Proteins
Protein name parsed from the “/product=” field of GenBank nucleotide and protein records.
10. Provirus
Provirus sequences - filter for sequences that have the “/proviral” source qualifier in the GenBank record.
11. Geographic region
The Geographic region filter allows you to type a country of interest in the text box or select continents of interest. Selecting a continent automatically selects all the countries within that continent.
Clicking the arrow next to a continent's name opens a secondary menu for selecting or deselecting the countries belonging to that continent. The selected countries are listed below the continent name.
If an entire continent is selected, the continent's name is shown in a pillbox below, indicating that all countries of that continent are selected. If at least one country is selected, the corresponding continent is no longer displayed; instead, a pillbox for each selected country is shown below the associated continent. Each continent behaves independently of the other continents.
Selections can be removed by clicking the pillboxes, and multiple concurrent selections are supported.
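One reading of the pillbox behaviour above: for each continent, show a single continent pillbox when every one of its countries is selected, and otherwise one pillbox per selected country. The sketch below encodes that interpretation. `TOY_CONTINENTS` is a deliberately tiny, hypothetical table, not NCBI's geographic vocabulary.

```python
from typing import Dict, List, Set

# Toy subset for illustration only; not NCBI's geographic vocabulary.
TOY_CONTINENTS: Dict[str, List[str]] = {
    "Europe": ["France", "Germany", "Italy"],
    "Oceania": ["Australia", "Fiji", "New Zealand"],
}


def select_continent(selected: Set[str], continents: Dict[str, List[str]], continent: str) -> Set[str]:
    """Selecting a continent selects all of its countries."""
    return selected | set(continents[continent])


def remove_pillbox(selected: Set[str], continents: Dict[str, List[str]], label: str) -> Set[str]:
    """Clicking a pillbox deselects either a whole continent or one country."""
    if label in continents:
        return selected - set(continents[label])
    return selected - {label}


def pillboxes(selected: Set[str], continents: Dict[str, List[str]]) -> List[str]:
    """Pillbox labels, computed independently for each continent."""
    labels: List[str] = []
    for continent, countries in continents.items():
        chosen = [c for c in countries if c in selected]
        if not chosen:
            continue
        if len(chosen) == len(countries):
            labels.append(continent)   # whole continent selected
        else:
            labels.extend(chosen)      # partial selection: one pillbox per country
    return labels


if __name__ == "__main__":
    sel = select_continent(set(), TOY_CONTINENTS, "Europe") | {"Fiji"}
    print(pillboxes(sel, TOY_CONTINENTS))  # ['Europe', 'Fiji']
```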
12. Isolation host or taxonomy
Enter a host name or taxid in the text box, and several host terms will be suggested (only the top 20 taxids will be shown). Select the desired host term and press Enter. The results will be restricted to sequences in the database with the indicated host term. Multiple hosts can be filtered simultaneously by adding more host terms to the filter.
The terms for isolation host are parsed from the source/host field of a sequence's GenBank record. Parsed terms are mapped to a standardized vocabulary, which curators derived by aggregating the variety of terms found in GenBank files. Common misspellings are also included in this mapping strategy. For example, “Accipter cooperii” is mapped to “Accipiter cooperii”.
The isolation host terms are displayed in the Host column of the Results Table. If the isolation host is unknown but a laboratory host is present (as indicated in the /lab_host field of the GenBank record), the laboratory host is shown in the Host column of the Results Table. If both the isolation host and the laboratory host can be mapped, only the isolation host is shown in the table (Host column).
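The host normalization and the Host column fallback described above can be sketched as a vocabulary lookup followed by a choice. The `HOST_VOCAB` dictionary below is hypothetical: it contains only the misspelling example from the text plus two identity entries, whereas the real curated vocabulary is internal to NCBI Virus. How a present but unmappable /host value is handled is not stated above. This sketch falls back to the lab host in that case, which is an assumption.

```python
from typing import Dict, Optional

# Hypothetical excerpt of a curated vocabulary: lower-cased raw term -> standardized term.
HOST_VOCAB: Dict[str, str] = {
    "accipter cooperii": "Accipiter cooperii",  # common misspelling
    "accipiter cooperii": "Accipiter cooperii",
    "homo sapiens": "Homo sapiens",
}


def map_host(raw: Optional[str]) -> Optional[str]:
    """Map a raw /host or /lab_host value to its standardized term, or None."""
    if not raw:
        return None
    return HOST_VOCAB.get(raw.strip().lower())


def host_column(host: Optional[str], lab_host: Optional[str]) -> Optional[str]:
    """Host column value: the mapped isolation host if available,
    otherwise the mapped laboratory host (assumed fallback)."""
    mapped_host = map_host(host)
    if mapped_host is not None:
        return mapped_host
    return map_host(lab_host)
```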
13. Submitters
To search for sequences submitted by particular authors, enter the authors' last names with or without initials.
The following formats are supported:
Chiang,T.Y. Forsyth,K.A. Knittig,L.C. Lim,O.P.
Chiang,T.Y., Forsyth,K.A., Knittig,L.C., Lim,O.P.
Chiang Forsyth Knittig Lim
Chiang, Forsyth, Knittig, Lim
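All four accepted formats reduce to a list of last names with optional initials. The sketch below shows one hypothetical way to normalize such input before matching. It is not NCBI's parser, and it assumes that last names contain no spaces or commas.

```python
from typing import List, Tuple


def parse_submitters(text: str) -> List[Tuple[str, str]]:
    """Split submitter input into (last_name, initials) pairs.

    Each author is a whitespace-separated token, either 'Last,I.I.' or a
    bare last name; a trailing comma between authors is ignored.
    """
    authors: List[Tuple[str, str]] = []
    for token in text.split():
        token = token.rstrip(",")       # drop the separator comma after an author
        if not token:
            continue
        last, _, initials = token.partition(",")  # 'Chiang,T.Y.' -> ('Chiang', 'T.Y.')
        authors.append((last, initials))
    return authors
```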
14. Isolation source
The terms for isolation source are parsed from the isolation source field of a sequence's GenBank record. Examples of parsed terms are serum and plasma, which are both mapped to the standardized vocabulary term blood.
Common misspellings as well as regional spelling differences are included in the mapping strategy. Multiple terms can be selected.
15. Sample collection date
Collection date (From, To) - the collection date of the sample from which the sequence was derived.
By default, the To: date is set to the current date.
Use the mm/dd/yyyy or yyyy format, or click the calendar icon and select dates.
16. Sequence release date
Release date (From, To) – the date when the sequence was released (first appeared publicly) in GenBank or another INSDC database.
By default, the To: date is set to the current date.
Use the mm/dd/yyyy or yyyy format, or click the calendar icon and select dates.
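Both date filters accept mm/dd/yyyy or yyyy, with To defaulting to today. The sketch below shows one way to turn such input into a concrete date range. Expanding a bare year to January 1 (for From) or December 31 (for To) is an assumption made for illustration, not documented NCBI behaviour, and the function names are hypothetical.

```python
from datetime import date, datetime
from typing import Optional, Tuple


def parse_bound(text: str, is_end: bool) -> date:
    """Parse 'mm/dd/yyyy' or 'yyyy'.

    A bare year expands to Jan 1 (start bound) or Dec 31 (end bound),
    which is an assumption of this sketch.
    """
    text = text.strip()
    if len(text) == 4 and text.isdigit():
        year = int(text)
        return date(year, 12, 31) if is_end else date(year, 1, 1)
    return datetime.strptime(text, "%m/%d/%Y").date()


def date_range(from_text: Optional[str], to_text: Optional[str],
               today: Optional[date] = None) -> Tuple[Optional[date], date]:
    """Return (from, to); a blank 'to' defaults to today, a blank 'from' is open."""
    today = today or date.today()
    start = parse_bound(from_text, is_end=False) if from_text else None
    end = parse_bound(to_text, is_end=True) if to_text else today
    return start, end


def in_range(d: date, start: Optional[date], end: date) -> bool:
    """Inclusive range check on a collection or release date."""
    return (start is None or d >= start) and d <= end
```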
17. Environmental source
The Environmental source filter allows you to select virus sequences isolated from environmental sources. Generally, environmental isolates are identified by searching the /isolation_source and /note fields of GenBank records for key terms, such as sewage or ocean water, when the /host field is empty.
Select Include - to include all sequences isolated from environmental sources in the Results Table.
Select Exclude - to exclude all sequences isolated from environmental sources from the Results Table.
Select Only - to view only sequences isolated from environmental sources.
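The Include/Exclude/Only options, shared by the Environmental source, Laboratory samples and Vaccine strain filters, act like a three-way switch over a yes/no flag. The sketch below pairs that switch with a keyword-based environmental check as described above. The keyword list and the record dictionary keys are hypothetical; the actual term list used by NCBI Virus is not given in this documentation.

```python
from enum import Enum
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, Optional[str]]


class Mode(Enum):
    INCLUDE = "include"  # keep flagged and unflagged records
    EXCLUDE = "exclude"  # drop flagged records
    ONLY = "only"        # keep flagged records only


# Hypothetical keyword list for illustration.
ENV_KEYWORDS = ("sewage", "wastewater", "ocean water", "seawater", "soil")


def is_environmental(record: Record) -> bool:
    """Environmental if /host is empty and /isolation_source or /note
    mentions one of the keywords."""
    if record.get("host"):
        return False
    text = " ".join(filter(None, (record.get("isolation_source"), record.get("note")))).lower()
    return any(keyword in text for keyword in ENV_KEYWORDS)


def apply_mode(records: Iterable[Record], flag: Callable[[Record], bool], mode: Mode) -> List[Record]:
    """Apply an Include/Exclude/Only switch using the given flag function."""
    result: List[Record] = []
    for record in records:
        flagged = flag(record)
        if mode is Mode.EXCLUDE and flagged:
            continue
        if mode is Mode.ONLY and not flagged:
            continue
        result.append(record)
    return result
```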
18. Laboratory samples
The Lab host filter allows you to view virus sequences isolated in the laboratory. A lab host is identified by searching for a lab host name in the /lab_host field of the GenBank record. Additionally (for bacteriophages only), if the /host and /lab_host fields are empty, the lab host is identified by parsing the lab host name from the bacteriophage organism name in the GenBank record.
Select Include - to include all laboratory-isolated virus sequences in the Results Table.
Select Exclude - to exclude all laboratory-isolated virus sequences from the Results Table.
Select Only - to view only laboratory-isolated virus sequences.
Note: the lab host name is shown in the Results Table (Host column) only when the isolation host cannot be identified (the /host field of the GenBank record is empty).
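The lab host rule can be sketched as follows: use /lab_host if it is present. Otherwise, for a bacteriophage whose /host and /lab_host fields are both empty, derive a host from the organism name. The sketch assumes phage names follow the common "<host genus> phage <name>" pattern (for example, "Escherichia phage T4"). How NCBI Virus actually parses organism names is not described here, so this helper is hypothetical.

```python
import re
from typing import Optional

# Assumed naming pattern: "<Host genus> phage <name>".
_PHAGE_NAME = re.compile(r"^\s*([A-Z][a-z]+)\s+phage\b")


def lab_host(lab_host_field: Optional[str], host_field: Optional[str],
             organism: str, is_bacteriophage: bool) -> Optional[str]:
    """Return the laboratory host for a record, or None if none is found."""
    if lab_host_field:
        return lab_host_field.strip()
    if is_bacteriophage and not host_field:
        match = _PHAGE_NAME.match(organism)
        if match:
            return match.group(1)  # host genus taken from the phage name
    return None
```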
19. Vaccine strain
The Vaccine strain filter allows you to find virus vaccine strain sequences. Vaccine strains are identified by searching for vaccine strain terms in the /isolation_source, /note and /host fields and in the definition line of the GenBank record.
Select Include - to include all virus vaccine strain sequences in the Results Table.
Select Exclude - to exclude all virus vaccine strain sequences from the Results Table.
Select Only - to view only virus vaccine strain sequences.
Search for sequences by virus name or taxonomy group
Find your virus sequence(s)
Option 1:
Select the Search by virus drop-down option from the Find data tab of the navigation menu on any NCBI Virus page. This will open the selection menu.
Start typing in the text box, then select your taxid (NCBI Taxonomy database ID). To select all viral sequences, enter and then select the term viruses.
The results will be shown in the table.
Note: a list of all viral taxonomy terms can be viewed on the NCBI Taxonomy pages.
Option 2:
Click the Search by virus button in the central part of the NCBI Virus home page.
Start typing in the text box, then select your taxid (NCBI Taxonomy database ID).
This will open the tabular interface with sequences from the selected taxonomy group.
Compare results in the Results Table
Click the Nucleotide tab to access genomic sequences, the Protein tab to access amino acid sequences of individual proteins, or the RefSeq Genome tab to access RefSeq genomes. For segmented viruses, each RefSeq genome includes all segments of the virus.
In the virus search Results Table, you can compare search results using the following sortable default columns:
Accession - the NCBI accession number of the NCBI Virus database sequence.
Submitters - the authors who submitted the sequence. Only the first submitter's name is displayed in the column (for example, Baranov,P.V., et al.). To obtain the full list of submitters, click the sequence accession number to open the details panel, then click the accession number in the details panel to open the GenBank Entrez page with all information available for the selected sequence. Alternatively, use the Download button with the CSV format option: the "Submitters" column in the downloaded table contains the names of all authors who submitted each sequence.
Release date - the date when the sequence was released (first appeared publicly) in GenBank or another INSDC database.
Isolate - the individual isolate from which the sequence was obtained, typically an alphanumeric sample ID. The isolate name is parsed from the "/isolate" field of the GenBank record. SARS-CoV-2 isolate names are formatted according to the definitions of the Coronaviridae Study Group of the International Committee on Taxonomy of Viruses (ICTV).
Species – virus species name.
Molecule type - viral nucleic acid type. The molecule type is provided by the International Committee on Taxonomy of Viruses (ICTV) in the Master Species List and maintained in the NCBI Taxonomy database. RefSeqs with an "Unknown" molecule type belong to taxonomic groups not yet recognized by the ICTV.
Length - sequence length.
Geo Location – country/region of virus specimen collection.
USA. If the sample was collected in the United States, the column shows the state abbreviation.
Host – virus isolation host (read more about isolation host vocabulary mapping here). If the isolation host is unknown (the /host field of the GenBank record is empty) but a laboratory host is present (as indicated in the /lab_host field of the GenBank record), the laboratory host is shown in the Host column of the Results Table. If both the isolation host and the laboratory host can be mapped, only the isolation host is shown in the Host column.
Search results can be customized by adding or removing columns from the Results Table in the Select Columns drop-down menu.
Additional columns include:
Isolation source – sequence isolation source (read more about isolation source here).
Collection Date – virus specimen collection date.
SRA accession - NCBI Sequence Read Archive (SRA) accession number.
Genus.
Family.
Sequence type – complete/partial/refseq (read more about sequence type here).
Nuc completeness - nucleotide completeness (note: this is preliminary data and not always accurate).
Genotype.
Segment – segment name, for segmented viruses.
Publications - links to PubMed publications associated with the sequence.
BioSample – NCBI BioSample accession number.
BioProject – NCBI BioProject accession number.
GenBank title.
The default number of rows displayed in the Results Table is 200. You can change the number of rows by selecting the number of results per page (200, 100, 50 or 25) in the Select Columns menu.
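When results are downloaded with the CSV option, the full submitter lists can be read programmatically. The sketch below uses Python's standard `csv` module. It assumes the file has an "Accession" header column (an assumption) and a "Submitters" column (named in the text above). It does not split the Submitters string into individual names, because that field's internal separator is not documented here.

```python
import csv
import io
from typing import Dict, TextIO


def submitters_by_accession(f: TextIO) -> Dict[str, str]:
    """Map each accession to its full Submitters string from a downloaded CSV."""
    reader = csv.DictReader(f)  # the header row supplies the column names
    return {row["Accession"]: row["Submitters"] for row in reader}


if __name__ == "__main__":
    # Inline sample standing in for a downloaded file (made-up values).
    sample = io.StringIO('Accession,Submitters\nXX000001.1,"Smith,A., Jones,B."\n')
    print(submitters_by_accession(sample))
```

For a real download, open the file with `open(path, newline="", encoding="utf-8")` and pass the file object instead of the `StringIO` sample.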
Build multiple sequence alignment of selected results
Please refer to Build multiple sequence alignment of selected BLAST results; the functionality is the same.
Build phylogenetic tree of selected results
Please refer to Build phylogenetic tree of selected BLAST results; the functionality is the same.
Refine tabular results via filters
Please refer to Refine tabular BLAST results via filters; the functionality is the same.
How to find, view and download SARS-CoV-2 sequences and related metadata?
To provide free and easy access to SARS-CoV-2 genome and protein sequences and their associated metadata, we created a dedicated Severe acute respiratory syndrome coronavirus 2 data hub.
You can access the Results Table of the SARS-CoV-2 data hub by clicking the “RefSeq genomes”, “nucleotide” or “protein” links on the announcement banner on the NCBI home page, through the “Find data” navigation menu, or by using the “Up-to-date SARS-CoV-2” shortcut button in the “Search by virus” form.
The SARS-CoV-2 data hub allows you to search, retrieve, analyze and visualize SARS-CoV-2 data available in GenBank. This page also provides links to Betacoronavirus BLAST, SARS-CoV-2 articles in PubMed, SRA data, NCBI SARS-CoV-2 resources, the Data Sets command line and CDC outbreak information.
The SARS-CoV-2 data hub Results Table has a “Pangolin” column that is specific to SARS-CoV-2 data. Pango lineages are determined by Pangolin (Phylogenetic Assignment of Named Global Outbreak Lineages). All SARS-CoV-2 GenBank records are reprocessed nightly by the Pangolin pipeline using UShER. The field is empty if the sequence was released after that day's Pangolin run. The field shows unclassifiable if the sequence does not meet the requirements for processing, and unassigned if the Pangolin tool cannot determine the lineage of the sequence. You can view the Pango version by downloading results in CSV format; the version strings are in the Pango Versions column. Each string includes the versions of the following sources: pangolin/pangolin-data/constellations/scorpio. For example, 4.0.6/1.8/v0.1.8/0.3.17.
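A Pango Versions string from the CSV download can be split into its four named components, and the Pangolin column value can be interpreted using the three special cases described above. This is a minimal sketch; the function names and the status wording are illustrative.

```python
from typing import Dict

# Component order as documented: pangolin/pangolin-data/constellations/scorpio
PANGO_COMPONENTS = ("pangolin", "pangolin-data", "constellations", "scorpio")


def parse_pango_versions(value: str) -> Dict[str, str]:
    """Split a Pango Versions string into named components."""
    parts = value.strip().split("/")
    if len(parts) != len(PANGO_COMPONENTS):
        raise ValueError(f"expected {len(PANGO_COMPONENTS)} components, got {len(parts)}: {value!r}")
    return dict(zip(PANGO_COMPONENTS, parts))


def lineage_status(pangolin_field: str) -> str:
    """Interpret the Pangolin column value."""
    value = pangolin_field.strip()
    if not value:
        return "released after the latest Pangolin run"
    if value.lower() == "unclassifiable":
        return "did not meet processing requirements"
    if value.lower() == "unassigned":
        return "lineage could not be determined"
    return "assigned"
```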
There are two filters on “Refine Results” panel which are specific only to SARS-CoV-2 data:
“優化結果”面板上有兩個過濾器,僅適用于嚴重急性呼吸系統綜合征冠狀病毒2型數據:
Pango lineage(血統 ; 世系 ; 家系 ; 宗系) - allows to filter sequences a particular Pango lineage assigned. Pango lineages are determined by Pangolin (Phylogenetic Assignment of Named Global Outbreak Lineages). All SARS-CoV-2 GenBank records reprocessed nightly by Pangolin pipeline using UShER pipeline. The field is empty if the sequence is unclassifiable or if it was released after a UShER run that day. You can view Pango version by downloading results in CSV format. You can view version strings in Pango Versions column. Each string includes the following sources: pangolin/pangolin-data/constellations/scorpio. For example, 4.0.6/1.8/v0.1.8/0.3.17.
Pango譜系-允許過濾特定Pango譜系分配的序列。Pango譜系由Pangolin(名為全球爆發譜系的系統發育分配)決定。Pangolin管道使用UShER管道每晚重新處理所有嚴重急性呼吸系統綜合征冠狀病毒2型GenBank記錄。如果序列不可分類,或者在當天UShER運行后發布,則該字段為空。您可以通過下載CSV格式的結果來查看Pango版本。您可以在Pango Versions列中查看版本字符串。每個字符串包括以下來源:pangolin/pangolin-data/constellations/scorpio。例如,4.0.6/1.8/v0.1.8/0.3.17。
Random sampling - allows you to filter sequences that were collected randomly for the purpose of baseline surveillance. For example, this filter can help if you want to know which lineages are increasing in frequency, or if you are looking for a rough estimate of the infection rate in geographic regions where that data isn't available yet. Randomly sampled specimens (e.g., not those collected for vaccine-breakthrough or localized outbreak investigations) make these estimates more representative.
NCBI Virus scans SARS-CoV-2 GenBank records and any linked BioSample records. If either of the following field/value pairs is found, the sequence is included in our “Random sampling” filter:
GenBank: KEYWORDS - purposeofsampling:baselinesurveillance
BioSample: purpose of sequencing - Baseline surveillance
Select Include - to include all randomly sampled SARS-CoV-2 sequences in the Results Table.
Select Exclude - to exclude all randomly sampled SARS-CoV-2 sequences from the Results Table.
Select Only - to view only randomly sampled SARS-CoV-2 sequences.
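The inclusion rule and the three filter choices above can be expressed as a small predicate plus a filter. This is a sketch under assumptions: the record layout (`genbank_keywords`, `biosample`) is hypothetical, and the case-insensitive comparison is a guess rather than documented NCBI Virus behavior.

```python
from typing import Iterable

# Hypothetical record layout: "genbank_keywords" (list of str) and "biosample"
# (dict of attribute name -> value) are illustrative keys, not an NCBI schema.
GENBANK_KEYWORD = "purposeofsampling:baselinesurveillance"
BIOSAMPLE_ATTRIBUTE = "purpose of sequencing"
BIOSAMPLE_VALUE = "baseline surveillance"


def is_random_sampling(record: dict) -> bool:
    """True if either documented field/value pair is present (compared case-insensitively)."""
    keywords = {k.strip().lower() for k in record.get("genbank_keywords", [])}
    if GENBANK_KEYWORD in keywords:
        return True
    biosample = record.get("biosample") or {}
    return biosample.get(BIOSAMPLE_ATTRIBUTE, "").strip().lower() == BIOSAMPLE_VALUE


def random_sampling_filter(records: Iterable[dict], mode: str) -> list[dict]:
    """Mimic the Include / Exclude / Only choices of the Random sampling filter."""
    records = list(records)
    if mode == "include":
        # Randomly sampled sequences are kept alongside all other sequences.
        return records
    if mode == "exclude":
        return [r for r in records if not is_random_sampling(r)]
    if mode == "only":
        return [r for r in records if is_random_sampling(r)]
    raise ValueError(f"unknown mode: {mode!r}")
```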
For descriptions of the other filters, please refer to Refine tabled BLAST results via filters, since the functionality is the same.
Click the “SARS-CoV-2 interactive dashboard” link in the announcement banner on the NCBI home page to access geographic and time distribution graphs. You can also access the dashboard through the SARS-CoV-2 data hub.
Where can I find SARS-CoV-2 lineage-related information?
You can explore lineage, geo-temporal, and mutation data using the interactive SARS-CoV-2 Variants Overview dashboard, which is accessible through the announcement banner on the NCBI home page.
Learn more in the SARS-CoV-2 Variants Overview help center.
View and download specific virus sequence sets
Find specific data sets
Option 1:
From the Find data tab in the navigation menu, select the desired group of viruses: All viruses, Human viruses, Bacteriophages, New sequences (past one month), or Available SARS-CoV-2 sequences to view preselected data sets.
Bacteriophages include virus groups with the following NCBI Taxonomy IDs: 10472, 10656, 10659, 10841, 10860, 10877, 11989, 28883, 1714270, 12333, 79205, 2136181
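One way to use the Bacteriophages taxid list in your own scripts is a membership test against a record's taxonomic lineage. Whether NCBI Virus builds the preselected set exactly this way is an assumption, and the `lineage_taxids` input (all taxids from the record's own taxid up to the root) is hypothetical:

```python
from typing import Iterable

# Taxonomy IDs listed above for the Bacteriophages preselected data set.
BACTERIOPHAGE_GROUP_TAXIDS = frozenset({
    10472, 10656, 10659, 10841, 10860, 10877,
    11989, 28883, 1714270, 12333, 79205, 2136181,
})


def is_bacteriophage(lineage_taxids: Iterable[int]) -> bool:
    """True if any taxid on the record's lineage is one of the listed groups.

    `lineage_taxids` is a hypothetical input; resolve it however your
    pipeline queries NCBI Taxonomy.
    """
    return not BACTERIOPHAGE_GROUP_TAXIDS.isdisjoint(lineage_taxids)
```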
You can also access selected virus groups through the “Popular Searches” panel located on the Results Table. The following virus groups are available:
Influenza virus - allows access to data for the following genera: Alphainfluenzavirus, Betainfluenzavirus, Gammainfluenzavirus and Deltainfluenzavirus. Capital letters A, B, C and D in brackets indicate the predominant species in each genus.
Rotavirus
Dengue virus
West Nile virus
Zika virus
MERS coronavirus
Ebolavirus
SARS-CoV-2 coronavirus
Option 2:
Click the Search by sequence button in the central part of the NCBI Virus home page.
Select the button for the desired popular virus search group, located beneath the text box.
Both options open a tabular display with information about the viruses in the selected group.
Learn more about how to compare results in the tabular display, build a multiple sequence alignment of selected results, build a phylogenetic tree of selected results, or refine the Results Table via filters.
Option 3:
Use the NCBI Visual Data Dashboard to explore, view, and download the massive, normalized datasets. Learn more.
Download sequences
To download sequences in a variety of formats (FASTA, accession list, or the Results Table as CSV or XML), choose the Nucleotide, Protein, or RefSeq Genomes tab and, optionally, select individual sequences to download.
You can also specify whether you want to download a randomized or a stratified randomized sequence set.
Download a randomized sequence set
Disclaimers
Please note that our current platform cannot generate repeatable randomized searches. We recognize the importance of repeatability to the scientific community and are working diligently to include this feature in upcoming updates.
Randomized subsets can currently be downloaded in FASTA or accession-list format for nucleotide, protein, and assembly records. We are working to make them available for coding region records in the future.
A randomized subset of sequences (also referred to as ‘downsampling’) lets you work with a smaller set of sequences selected at random from a larger dataset, as an approximation of the full dataset.
A smaller, representative sequence set can make downstream analysis faster and less computationally intensive, while still allowing interpretation of the larger collection. When you download a randomized subset, the file name includes the date of download and the randomization seed used.
Filters can be applied before downsampling, as described here. After you click the Download button, a menu lets you select the download format; a second step then offers the option to download a randomized subset of all records in your filtered dataset. Randomized sequence sets can be downloaded in a variety of formats (FASTA, accession list, or the Results Table in CSV or XML format). Before opening the “Download” menu, make sure to select the tab above the Results Table that corresponds to the data type you want to download:
- Nucleotide tab: randomized data can be downloaded only in FASTA Nucleotide, Nucleotide Accession List, XML, and CSV formats.
- Protein tab: only in FASTA Protein, Protein Accession List, XML, and CSV formats.
- RefSeq Genomes tab: only in Assembly Accession List, XML, and CSV formats.
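Because the platform does not yet support repeatable randomized searches, you may prefer to downsample a full download locally. The sketch below draws a simple random sample without replacement, capped at the 2,000-record limit described in Step 2, and returns the seed so the same draw can be repeated later. The file-naming pattern only imitates the date-plus-seed idea; it is hypothetical, not the actual NCBI Virus file name format.

```python
import random
from datetime import date
from typing import Optional, Sequence

MAX_RANDOM_SUBSET = 2000  # upper limit for randomized subsets stated in Step 2


def random_subset(accessions: Sequence[str], n: int,
                  seed: Optional[int] = None) -> tuple[list[str], int]:
    """Draw up to n accessions uniformly at random, without replacement.

    Returns the subset and the seed, so the same draw can be repeated locally.
    """
    if seed is None:
        seed = random.randrange(2**32)
    k = max(0, min(n, MAX_RANDOM_SUBSET, len(accessions)))
    rng = random.Random(seed)
    return rng.sample(list(accessions), k), seed


def subset_filename(seed: int, extension: str, on: Optional[date] = None) -> str:
    """Hypothetical file name embedding the download date and the seed."""
    on = on or date.today()
    return f"random_subset_{on.isoformat()}_seed{seed}.{extension}"
```

Reusing the returned seed with the same input list reproduces the subset exactly, which is the property the disclaimer above says the web platform does not yet guarantee.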
Download a stratified randomized sequence set
Randomized subsets of sequences can be stratified, meaning equally distributed over the values of a category field (also referred to as ‘stratified downsampling’). This lets you work with a subset that contains equal numbers of sequences for each value of a selected category, as an approximation of the larger sequence collection. The fields currently available for stratification are Country and Host. As with randomized sets, select the tab above the Results Table that corresponds to the data type you want before opening the “Download” menu: the Nucleotide tab offers FASTA Nucleotide, Nucleotide Accession List, XML, and CSV formats; the Protein tab offers FASTA Protein, Protein Accession List, XML, and CSV formats; and the RefSeq Genomes tab offers Assembly Accession List, XML, and CSV formats.
When downloading a stratified randomized subset, the file name will include the date of download and the randomization seed used.
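Stratified downsampling can be sketched the same way: group records by the stratification field, then draw up to the per-value limit from each group. This is an illustrative Python version, not the NCBI Virus implementation; it assumes records are dicts with hypothetical `country`/`host` keys and uses the limit of 20 records per value described in Step 2.

```python
import random
from collections import defaultdict
from typing import Optional

STRATIFY_FIELDS = ("country", "host")  # the two fields documented as available
MAX_PER_VALUE = 20                      # documented per-value limit


def stratified_subset(records: list[dict], field: str, per_value: int,
                      seed: Optional[int] = 0) -> list[dict]:
    """Randomly pick up to `per_value` records for each distinct value of `field`."""
    if field not in STRATIFY_FIELDS:
        raise ValueError(f"field must be one of {STRATIFY_FIELDS}, got {field!r}")
    per_value = min(per_value, MAX_PER_VALUE)
    groups: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        # Records without a value share one bucket; how NCBI Virus treats them is unknown.
        groups[rec.get(field) or "(missing)"].append(rec)
    rng = random.Random(seed)
    subset: list[dict] = []
    for value in sorted(groups):  # fixed group order keeps a given seed reproducible
        members = groups[value]
        subset.extend(rng.sample(members, min(per_value, len(members))))
    return subset
```

Groups smaller than the limit contribute all of their records, so the result is only "roughly equal" per value, consistent with the wording in Step 2.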
Step by step instructions how to download sequences
Click the Download button on the upper left side of the NCBI Virus Results Table page.
This opens the download menu, which consists of three steps.
Step 1: Select Data Type.
Nucleotide, protein, or coding region sequence (CDS) in FASTA format. Please note that randomized subsets are currently not available for coding region sequence (CDS) FASTA files.
Accession list for nucleotide, protein, or assembly records. Please note that randomized subsets are currently not available for coding region sequence (CDS) accession lists.
Results Table – the contents of the Results Table, including the metadata, in CSV (comma-separated values) or XML format.
Step 2: Select Records.
Select which records you would like to download:
- Only selected records, i.e. records selected with the checkboxes in the Results Table.
- All records in the Results Table.
- A randomized subset of up to 2,000 records in the Results Table (for Nucleotide FASTA, Protein FASTA, Nucleotide Accession List, Protein Accession List, Assembly Accession List, CSV, and XML formats only).
Randomized subsets contain a limited number of sequences randomly selected from all the available sequences in the Results Table. Optionally, you can stratify your subset by a field (up to 20 records per country or per host), meaning that a roughly equal number of sequences is randomly selected for each value of that field.
To use the options for randomized subsets, select ‘Download a randomized subset of records (up to 2,000)’, and then select either a fully randomized subset or a stratified subset. Enter the number of randomly sorted records you want to download in the input box (up to 2,000 for a randomized subset, and up to 20 records per value for a stratified subset), and select the category you want to stratify across from the dropdown.
The fields currently available for stratification are Country and Host.
Click “Next” and follow the prompts in the third step of the menu to begin your download.
Step 3.
If you selected Sequence Data (FASTA format) in Step 1, in Step 3 you can choose the FASTA definition line for the sequences you are going to download.
If nucleotide or protein sequence data were selected in Step 1, the default FASTA definition line has the format (accession) | (GenBank title) and includes the GenBank sequence accession number and the GenBank title:
AAO17794 |VP4 spike protein[Human rotavirus A].
If the coding region option was selected, the default definition line format is (nucleotide accession)(cds coordinates)| (GenBank title), and includes the related GenBank nucleotide sequence accession number, an indication that this is a coding region (cds), the related GenBank protein accession number, and the related protein's GenBank title:
NC_045425.1:319…1659 |replication endonuclease [Thermus phage phiOH3].
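If you post-process downloaded FASTA files, default definition lines like the two examples above can be generated or split with a few string operations. This sketch follows those examples; the ':' and '..' coordinate separators for CDS deflines are assumptions (the example above renders the range as 319…1659), so check them against a real download. The function names are hypothetical.

```python
def default_defline(accession: str, title: str) -> str:
    """Nucleotide/protein default header: '(accession) |(GenBank title)'."""
    return f">{accession} |{title}"


def cds_defline(nuc_accession: str, start: int, end: int, title: str,
                range_sep: str = "..") -> str:
    """Coding-region default header: '(nucleotide accession)(cds coordinates) |(title)'.

    The ':' before the coordinates and the '..' range separator are assumptions.
    """
    return f">{nuc_accession}:{start}{range_sep}{end} |{title}"


def parse_defline(header: str) -> tuple[str, str]:
    """Split a default-style header into (identifier, title) at the first ' |'."""
    ident, sep, title = header.lstrip(">").partition(" |")
    if not sep:
        raise ValueError(f"no ' |' separator in {header!r}")
    return ident, title.strip()
```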
You can change this default defline to fit your needs by selecting the Build custom sequence title option. There you can select the following options (columns):
Assembly
SRA accession
Submitters
Release date
Pangolin
Random Sampling
Isolate
Species
Genus
Family
Molecule type
Length
Sequence type
Nucleotide Completeness
Genotype
Segment
Publication
Geo Location
Country
Host
Isolation source
Collection date
BioSample
BioProject
A description of each option is available in the description of the Results Table columns.
If you selected the Accession list in Step 1, you can download nucleotide, protein, and RefSeq genome assembly accession numbers with or without the version number. For example: NC_045512 (without version) or NC_045512.2 (with version).
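When you compare an accession list downloaded without versions to one downloaded with versions, a small normalization helper is handy. A minimal sketch with hypothetical helper names; note that dropping versions deliberately merges different versions of the same record:

```python
def strip_version(accession: str) -> str:
    """'NC_045512.2' -> 'NC_045512'; an accession without a version is returned as-is."""
    acc = accession.strip()
    base, dot, version = acc.rpartition(".")
    return base if dot and version.isdigit() else acc


def unversioned(accessions: list[str]) -> list[str]:
    """Drop versions and de-duplicate, keeping the original order."""
    return list(dict.fromkeys(strip_version(a) for a in accessions))
```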
If you selected the Results Table in CSV format in Step 1, the downloaded results contain the data for all selected columns. You can modify the selection and choose the columns you need in Step 3: Select columns to include in results set. You can also choose whether to include accession numbers with or without the version number.
NCBI Visual Data Dashboards
NCBI Virus visual data dashboards support data exploration and discovery across our normalized datasets. They can be used to identify trends in data and to select specific subsets based on those trends.
Visual dashboards in NCBI Virus include:
- Dashboard located on the NCBI Virus home page, which provides virus sequence statistics, a Virus Taxonomy Sunburst Chart, and a Host Distribution Bar Chart.
- Dashboard “Visual Filters for GenBank Sequences”, which displays data for specific viral taxa and includes Sequence Type links with calculated virus sequence statistics, a Geographic Distribution choropleth map showing the geographic distribution of sequence records based on collection locations, and Collection Date and Release Date time sliders that dynamically show the number of sequences in each time interval.
1: Home Page Dashboard
Use the buttons in the top row to access sequence data for the following statistics:
RefSeq Nucleotides - all viral nucleotide reference sequences available at NCBI (learn more about reference sequences here).
All Proteins - all NCBI viral protein sequences, including RefSeq proteins.
All Nucleotides – all viral nucleotide records available at NCBI, including RefSeqs.
RefSeq Proteins - all viral protein reference sequences available at NCBI.
Complete Nucleotides – all NCBI viral nucleotide sequences whose GenBank ASN.1 record contains the descriptor descr/molinfo/completeness=complete, or whose definition line (defline) contains the word ‘complete’. This set also includes complete reference records (RefSeqs).
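The Complete Nucleotides rule can be approximated locally as a predicate. In this sketch, matching ‘complete’ as a whole, case-insensitive word (so that ‘incomplete’ does not count) is an assumption about how “a word present in the defline” is interpreted, and the function name is hypothetical:

```python
import re
from typing import Optional

# Whole word 'complete', any case; 'incomplete' does not match.
_COMPLETE_WORD = re.compile(r"\bcomplete\b", re.IGNORECASE)


def counts_as_complete(molinfo_completeness: Optional[str], defline: str) -> bool:
    """Sketch of the 'Complete Nucleotides' rule.

    molinfo_completeness: value of descr/molinfo/completeness, if present.
    defline: the record's definition line.
    """
    if (molinfo_completeness or "").strip().lower() == "complete":
        return True
    return _COMPLETE_WORD.search(defline) is not None
```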
Clicking a button shows a Results Table with the corresponding sequences. You can refine these results further with filters for various sequence attributes (metadata), located on the left side of the Results Table page (learn more here).
Explore virus taxonomy hierarchy using sunburst chart
Virus taxonomy can be explored via an interactive sunburst chart. The default view represents the classification of all available NCBI viral taxa. The inner layer (ring) represents four non-taxonomic groups of viruses: RNA viruses, DNA viruses, DNA/RNA viruses (which include reverse-transcribing viruses), and Unclassified viruses. Only four levels of the hierarchy are visible on the plot at any given time.
To explore virus taxonomy, click any slice (section) of any layer on the sunburst chart. The plot then zooms into the selected taxon and displays the taxa below it. Each viral taxon name is displayed on the corresponding slice, or can be viewed in the tooltip that appears when you place your cursor over the slice. Dynamic breadcrumbs with viral taxon names are located above the sunburst plot. The breadcrumbs also act as a secondary navigation system: they show the location of the taxon in the hierarchy, and clicking one refocuses the plot on that taxon. You can also see the breadcrumbs by hovering over any slice in the sunburst. Clicking the center of the sunburst chart returns you to the parent taxon.
Select a specific virus taxonomy group and view statistics for specific sequence sets, with quick links to download them
After selecting a specific taxonomy group on the sunburst chart, you can view and explore the updated statistics in the top row of the dashboard.
Select a host term from the Host Distribution bar chart and see the distribution of that host among the various viral taxa
The interactive Host Distribution chart shows the distribution of virus host species. Each host bar is proportional to the number of virus sequences isolated from this host. The total number of virus sequences for each bar can be viewed by hovering over the bar.
To select a host species, click a bar or the corresponding host name. This highlights the selected host, as well as all virus taxonomy groups containing sequences isolated from that host. Only one host can be selected at a time. Clicking the selected host a second time deselects it; alternatively, use the Reset option in the top right corner of the host chart. The statistics in the top row of the dashboard are updated based on the selected host.
You can find a host species by scrolling the Host Distribution chart, or by using the keyboard shortcut “CTRL+F”.
You can reset the Host Distribution chart to its original view by pressing the “Reset” button in the upper right corner of the chart.
Explore the viral taxonomy hierarchy within a given taxon highlighted by the host selection
Click a highlighted taxonomy group to explore the viral taxonomy hierarchy further on the sunburst chart. Lower layers that include taxa with sequences from the selected host are highlighted. As you zoom in, taxa that do not include sequences from the selected host are not highlighted.
2: “Visual Filters for GenBank Sequences” Dashboard
“Visual Filters for GenBank Sequences” is a dashboard that lets you filter your virus search results by important attributes, such as geographic location, collection date, and release date, using visual, graphical filters.
How to access “Visual Filters for GenBank Sequences”?
There are several ways to access Visual Filters for GenBank Sequences.
1. From the NCBI Virus home page, follow the steps below:
Select ‘Search by Virus’.
Type a virus name, then select an option from the autocomplete list.
View the Results Table for your virus of interest.
Find the tab named “Visual Filters for GenBank Sequences” above the Results Table.
Click the “Visual Filters for GenBank Sequences” tab to switch to visual filtering.
2. From the Results Table page, open the “Visual Filters for GenBank Sequences” tab in the header above the Results Table.
Please note: if any filters were applied to the Results Table, switching to the “Visual Filters for GenBank Sequences” dashboard resets all filters except the virus name.
3. By adding an NCBI Taxonomy ID (“taxid”) directly to the page URL: https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/dashboard?taxid=
For example, for Zika virus (taxid=64320), enter the following URL: https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/dashboard?taxid=64320
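The dashboard URL can also be built programmatically from a taxid. A small sketch (the helper name is hypothetical); note that `?taxid=` appears after the `#`, so it is part of the URL fragment rather than a server-side query string, which suggests the page reads it client-side:

```python
from urllib.parse import urlsplit

DASHBOARD_BASE = "https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/dashboard"


def dashboard_url(taxid: int) -> str:
    """Build the "Visual Filters for GenBank Sequences" URL for an NCBI taxid."""
    if isinstance(taxid, bool) or not isinstance(taxid, int) or taxid <= 0:
        raise ValueError(f"taxid must be a positive integer, got {taxid!r}")
    return f"{DASHBOARD_BASE}?taxid={taxid}"


if __name__ == "__main__":
    url = dashboard_url(64320)  # Zika virus, as in the example above
    parts = urlsplit(url)
    # '?taxid=...' comes after '#', so urlsplit reports it in the fragment, not the query.
    print(url)
    print(repr(parts.query), parts.fragment)
```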
How to use Visual Filters for GenBank Sequences?
Visual filters let you filter your search by geographic location, collection time, and release time. The filtering features on the dashboard are interactive and linked, so a filter applied in one feature is also reflected in the others. As you use these filters, the summary section at the top updates automatically to show the number of records in the NCBI RefSeq, Nucleotide, and Protein sets of the NCBI Virus database that match the combined conditions of your search.
The Geographic Distribution choropleth map lets you select sequence records collected at a given location.
Click a geographic location to filter sequences by collection location.
The map lets you select multiple international locations or multiple locations in the USA. Selections are reset if you switch between the International and USA maps.
To select a single location, start typing the name of the region and select it from the dropdown list.
Please note that color shades on the map are based on the number of nucleotide records for the virus: darker shades correspond to higher numbers, and lighter shades to lower numbers.
The Collection Time and Release Time sliders show a histogram of the number of nucleotide records in each time interval.
Use the sliders or click date columns to select records by the sample collection date or the GenBank release date. Weekly, monthly and yearly time intervals can be selected.
Collection Time graph:
Select a collection date range for the samples either by selecting a single time-interval bar or by dragging the ends of the slider.
The slider displays data from the earliest collection year for this virus to the current year.
If the collection date of a record is incomplete, we collapse it as follows: if the record only has a year, it is shown as January 1 of that year; if the record only has a year and month, it is shown on the first day of that month.
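The collapsing rule for incomplete collection dates, together with the weekly, monthly, and yearly histogram intervals, can be sketched in Python. This sketch assumes ISO-style strings ('YYYY', 'YYYY-MM', 'YYYY-MM-DD') and Monday-start weeks; both are assumptions, and GenBank collection dates in other styles are not handled. One side effect worth knowing: collapsed records pile up on January 1 or on the first of a month, which can produce spikes in those bins.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable


def collapse_collection_date(value: str) -> date:
    """'YYYY' -> Jan 1, 'YYYY-MM' -> 1st of that month, 'YYYY-MM-DD' -> unchanged."""
    parts = value.strip().split("-")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported date string: {value!r}")
    year = int(parts[0])
    month = int(parts[1]) if len(parts) > 1 else 1
    day = int(parts[2]) if len(parts) > 2 else 1
    return date(year, month, day)


# Each bin is labelled by its start date; Monday-start weeks are an assumption.
_BIN_START = {
    "year": lambda d: date(d.year, 1, 1),
    "month": lambda d: date(d.year, d.month, 1),
    "week": lambda d: d - timedelta(days=d.weekday()),
}


def bin_dates(dates: Iterable[date], interval: str = "month") -> Counter:
    """Count dates per yearly, monthly or weekly bin, approximating the time-slider histograms."""
    if interval not in _BIN_START:
        raise ValueError(f"interval must be one of {sorted(_BIN_START)}, got {interval!r}")
    start_of = _BIN_START[interval]
    return Counter(start_of(d) for d in dates)
```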
Release Time graph:
Select a release date range for the samples either by selecting a single bar or by dragging the ends of the slider.
The slider displays data from the year in which data for this virus was first released to the current year.
You can also select different bi-yearly intervals, which show the portion of the graph for that time frame. However, you still have to click a bar or select the time interval with the sliders to apply filtering.
The top header of the Dashboard includes a link back to the Results Table page where you can review your results in tabular format, apply more filters, and download FASTA sequences, an accession list, or the table itself.
Note that all filters applied in the graphical view remain in effect on the Results Table page. However, if you switch from the Results Table page back to the visual filters, all applied filters are lost except the selected virus name.
How to find, view and download HIV-1 sequences and related metadata?
Public HIV-1 nucleotide and protein sequence data are displayed in the HIV-1 data hub.
The HIV-1 data hub can be accessed by typing and selecting HIV-1 in the Search by virus name or taxonomy input form.
Alternatively, you can access it from the NCBI home page by typing HIV-1 in the search box. This opens another page with HIV-1 genome assembly information; press the NCBI Virus button there to access the HIV-1 data hub.
These are early days for HIV-1 data support in NCBI Virus. Please stay tuned for updates and further details relevant to HIV-1.