Biopython Newbie's Guide - Part 1

When you hear the word Biopython, what is the first thing that comes to mind? A Python library for handling biological data? You are correct! Biopython provides a set of tools for bioinformatics computations on biological data such as DNA and protein sequences. I have been using Biopython ever since I started studying bioinformatics, and its functions have never let me down. The library covers a wide range of tasks, from reading large files of biological data to aligning sequences. In this article, I will introduce some basic Biopython functions that reduce common tasks to a single call.

Getting started

At the time of writing, the latest version is Biopython 1.77, released in May 2020.

You can install Biopython using pip

pip install biopython

or using conda.

conda install -c conda-forge biopython

You can test whether Biopython is properly installed by executing the following line in the python interpreter.

import Bio

If you get an error such as ImportError: No module named Bio (reported as ModuleNotFoundError in recent Python 3 versions), then Biopython is not properly installed in your working environment. If no error message appears, we are good to go.

In this article, I will be walking you through some examples where Seq, SeqRecord and SeqIO come in handy. We will go through the functions that perform the following tasks.

  1. Creating a sequence
  2. Getting the reverse complement of a sequence
  3. Counting the number of occurrences of a nucleotide
  4. Finding the starting index of a subsequence
  5. Reading a sequence file
  6. Writing sequences to a file
  7. Converting a FASTQ file to a FASTA file
  8. Separating sequences by ids from a list of ids

1. Creating a sequence

To create your own sequence, you can use the Biopython Seq object. Here is an example.

>>> from Bio.Seq import Seq
>>> my_sequence = Seq("ATGACGTTGCATG")
>>> print("The sequence is", my_sequence)
The sequence is ATGACGTTGCATG
>>> print("The length of the sequence is", len(my_sequence))
The length of the sequence is 13
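
As far as I know, Seq does not check that its letters are valid nucleotides, so a typo in the input string can go unnoticed. Here is a minimal plain-Python sketch of a pre-check you could run before creating the Seq object. The helper name is_valid_dna and the allowed alphabet (A, C, G, T plus N for unknown bases) are my own assumptions for illustration, not part of Biopython.

```python
# Hypothetical helper (not part of Biopython): check a string before wrapping it in Seq.
VALID_DNA_LETTERS = set("ACGTN")  # assumed alphabet: four bases plus N for "unknown"


def is_valid_dna(sequence: str) -> bool:
    """Return True if the string is non-empty and uses only the assumed DNA letters."""
    return bool(sequence) and set(sequence.upper()) <= VALID_DNA_LETTERS


print(is_valid_dna("ATGACGTTGCATG"))  # True
print(is_valid_dna("ATGXCGT"))        # False
```

If your data uses the full IUPAC ambiguity codes, the allowed set would need to be extended accordingly.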

2. Get the reverse complement of a sequence

You can get the reverse complement of a sequence with a single call to reverse_complement().

>>> print("The reverse complement of the sequence is", my_sequence.reverse_complement())
The reverse complement of the sequence is CATGCAACGTCAT
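
To see what the reverse complement actually is, here is a plain-Python sketch that computes it for an ordinary string. It is only an illustration of the idea (complement each base, then reverse), handles the four standard bases only, and is not how Biopython implements reverse_complement().

```python
# Plain-Python sketch of a reverse complement (standard bases only, no ambiguity codes).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Complement each base (A<->T, C<->G), then reverse the result."""
    return sequence.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGACGTTGCATG"))  # CATGCAACGTCAT
```

Applying the function twice gives back the original sequence, which is a handy sanity check.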

3. Count the number of occurrences of a nucleotide

You can get the number of occurrences of a particular nucleotide using the count() function.

>>> print("The number of As in the sequence", my_sequence.count("A"))
The number of As in the sequence 3
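
If you need the counts of all nucleotides at once, one option is to count over the plain string with collections.Counter from the standard library. The sketch below does that and also derives the GC fraction. The helper names base_counts and gc_share are hypothetical and not Biopython functions.

```python
from collections import Counter


def base_counts(sequence: str) -> Counter:
    """Count every letter in the sequence in a single pass."""
    return Counter(sequence.upper())


def gc_share(sequence: str) -> float:
    """Fraction of G and C letters; 0.0 for an empty sequence."""
    if not sequence:
        return 0.0
    counts = base_counts(sequence)
    return (counts["G"] + counts["C"]) / len(sequence)


counts = base_counts("ATGACGTTGCATG")
print(counts["A"], counts["C"], counts["G"], counts["T"])  # 3 2 4 4
```

Keep in mind that counting a longer pattern with Python's str.count() is non-overlapping, so "AAAA".count("AA") gives 2, not 3.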

4. Find the starting index of a subsequence

You can find the starting index (counted from 0) of the first occurrence of a subsequence using the find() function. It returns -1 if the subsequence is not present.

>>> print("Found TTG in the sequence at index", my_sequence.find("TTG"))
Found TTG in the sequence at index 6
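
find() reports only the first match. If you want every starting index, including overlapping matches, a small loop is enough. The sketch below is written for plain Python strings, and the helper name find_all is my own, not a Biopython function.

```python
def find_all(sequence: str, sub: str) -> list:
    """Return the starting index of every occurrence of sub, overlaps included."""
    positions = []
    start = sequence.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search one position later so overlapping hits are found too.
        start = sequence.find(sub, start + 1)
    return positions


print(find_all("ATGACGTTGCATG", "ATG"))  # [0, 10]
```

An empty result list plays the same role as the -1 that find() returns when nothing matches.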

5. Reading a sequence file

Biopython’s SeqIO (Sequence Input/Output) interface can be used to read sequence files. The parse() function takes a file (a filename or an open file handle) and a format name, and returns an iterator of SeqRecord objects. The following example reads a FASTA file.

from Bio import SeqIO

for record in SeqIO.parse("example.fasta", "fasta"):
    print(record.id)

record.id will return the identifier of the sequence. record.seq will return the sequence itself. record.description will return the sequence description.
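
To make the three fields concrete, here is a deliberately simplified plain-Python FASTA reader. It is a sketch for illustration only and not Biopython's parser. It assumes the usual convention that the identifier is the first word of the header line and the description is the whole header line; the function name parse_fasta and the sample data are made up.

```python
import io


def _make_record(header, chunks):
    """Build an (id, description, sequence) tuple from a header and its sequence lines."""
    words = header.split()
    return (words[0] if words else "", header, "".join(chunks))


def parse_fasta(handle):
    """Yield (id, description, sequence) tuples from FASTA-formatted text."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield _make_record(header, chunks)


fasta_text = ">seq1 first example\nATGACG\nTTGCATG\n>seq2\nGGTGCA\n"
records = list(parse_fasta(io.StringIO(fasta_text)))
for seq_id, description, sequence in records:
    print(seq_id, "|", description, "|", sequence)
```

Like SeqIO.parse(), the function is a generator, so records are produced one at a time rather than loaded into memory all at once.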

6. Writing sequences to a file

Biopython’s SeqIO (Sequence Input/Output) interface can also be used to write sequences to files. In the following example, a list of sequences is written to a FASTA file.

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Bio.Alphabet (generic_dna) was removed in Biopython 1.78, so the Seq objects are created without an alphabet.
sequences = ["AAACGTGG", "TGAACCG", "GGTGCA", "CCAATGCG"]
records = (SeqRecord(Seq(seq), str(index)) for index, seq in enumerate(sequences))

with open("example.fasta", "w") as output_handle:
    SeqIO.write(records, output_handle, "fasta")

This code will result in a FASTA file with sequence ids starting from 0. If you want to give a custom id and a description you can create the records as follows.

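sequences = ["AAACGTGG", "TGAACCG", "GGTGCA", "CCAATGCG"]
new_sequences = []
i = 1
for sequence in sequences:
    # Placeholder id and description for illustration; substitute your own values.
    record = SeqRecord(Seq(sequence), id="seq_" + str(i), description="example sequence")
    new_sequences.append(record)
    i += 1

with open("example.fasta", "w") as output_handle:
    SeqIO.write(new_sequences, output_handle, "fasta")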

The SeqIO.write() function will return the number of sequences written.
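
To make the file layout and the return value concrete, here is a plain-Python sketch of a FASTA writer. It is not Biopython's writer. The function name write_fasta is hypothetical, and the default line width of 60 is an assumption chosen to resemble the wrapping I believe Biopython applies to FASTA output.

```python
import io


def write_fasta(records, handle, width=60):
    """Write (id, sequence) pairs as FASTA and return the number of records written."""
    count = 0
    for seq_id, sequence in records:
        handle.write(">" + seq_id + "\n")
        # Wrap long sequences onto lines of at most `width` letters.
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")
        count += 1
    return count


sequences = ["AAACGTGG", "TGAACCG", "GGTGCA", "CCAATGCG"]
buffer = io.StringIO()  # in-memory stand-in for an open output file
written = write_fasta(((str(i), s) for i, s in enumerate(sequences)), buffer)
print(written)  # 4
print(buffer.getvalue())
```

Returning the count makes it easy to confirm that every record you expected actually reached the file.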

7. Convert a FASTQ file to FASTA file

Some applications require converting sequence data from one file format to another. For example, we can convert a FASTQ file to FASTA as follows.

from Bio import SeqIO

with open("path/to/fastq/file.fastq", "r") as input_handle, open("path/to/fasta/file.fasta", "w") as output_handle:
    sequences = SeqIO.parse(input_handle, "fastq")
    count = SeqIO.write(sequences, output_handle, "fasta")

print("Converted %i records" % count)
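
Conceptually, going from FASTQ to FASTA means keeping the header and the sequence of each read and dropping its quality line. The plain-Python sketch below shows that idea under a strong assumption: every record occupies exactly four lines (header, sequence, "+", qualities). It is an illustration, not Biopython's converter, and the function name fastq_to_fasta is made up.

```python
import io


def fastq_to_fasta(input_handle, output_handle):
    """Convert simple four-line FASTQ records to FASTA; return the record count."""
    count = 0
    while True:
        header = input_handle.readline()
        if not header:
            break  # end of input
        if not header.strip():
            continue  # skip blank lines between records
        if not header.startswith("@"):
            raise ValueError("FASTQ header must start with '@': " + header.strip())
        sequence = input_handle.readline().strip()
        input_handle.readline()  # '+' separator line, ignored
        input_handle.readline()  # quality string, not stored in FASTA
        output_handle.write(">" + header[1:].strip() + "\n" + sequence + "\n")
        count += 1
    return count


fastq_text = "@read1\nACGT\n+\nIIII\n@read2 sample\nGGCA\n+\nFFFF\n"
output = io.StringIO()
converted = fastq_to_fasta(io.StringIO(fastq_text), output)
print("Converted %i records" % converted)  # Converted 2 records
```

The conversion only works in this direction: FASTA carries no quality scores, so a FASTA file cannot be turned back into a real FASTQ file without inventing them.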

To convert a GenBank file to FASTA format, parse the input with the "genbank" format instead:

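from Bio import SeqIO

# Placeholder paths for illustration; substitute your own GenBank input and FASTA output files.
with open("path/to/genbank/file.gb", "r") as input_handle, open("path/to/fasta/file.fasta", "w") as output_handle:
    sequences = SeqIO.parse(input_handle, "genbank")
    count = SeqIO.write(sequences, output_handle, "fasta")

print("Converted %i records" % count)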

8. Separate sequences by ids from a list of ids

Assume that you have a file named list.lst containing one sequence identifier per line, and you want to extract the corresponding sequences from a sequence file (a FASTQ file named list_sequences.fq in this example). You can run the following code to write those sequences to a new file.

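from Bio import SeqIO

path = ""  # placeholder: directory prefix for the files below; set it for your environment

# Read the wanted ids, stripping the newline from each line.
with open(path + "list.lst") as id_handle:
    ids = set(line.strip() for line in id_handle)

# mode="a" appends, so an existing list.fq is extended rather than overwritten.
with open(path + "list.fq", mode="a") as my_output:
    for seq in SeqIO.parse(path + "list_sequences.fq", "fastq"):
        if seq.id in ids:
            my_output.write(seq.format("fastq"))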
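
The core of this task is a set lookup, which the following plain-Python sketch isolates from the file handling. The records here are simple (id, sequence) tuples with made-up values, and the helper names read_ids and filter_by_id are hypothetical.

```python
import io


def read_ids(handle):
    """Collect non-empty, whitespace-stripped identifiers, one per line."""
    return {line.strip() for line in handle if line.strip()}


def filter_by_id(records, wanted_ids):
    """Yield only the (id, sequence) pairs whose id is in wanted_ids."""
    for seq_id, sequence in records:
        if seq_id in wanted_ids:
            yield seq_id, sequence


ids = read_ids(io.StringIO("read1\nread3\n\n"))
records = [("read1", "ACGT"), ("read2", "GGCA"), ("read3", "TTAG")]
selected = list(filter_by_id(records, ids))
print(selected)  # [('read1', 'ACGT'), ('read3', 'TTAG')]
```

Storing the ids in a set keeps each membership test fast even when the id list is long, which matters when the sequence file holds millions of reads.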

Final Thoughts

I hope this article gave you an idea of how to use Biopython's Seq, SeqRecord and SeqIO, and that they will be useful in your research work.

Thank you for reading. I would love to hear your thoughts. Stay tuned for the next part of this article with more usages and Biopython functions.

Cheers, and stay safe!

Translated from: https://medium.com/computational-biology/newbies-guide-to-biopython-part-1-9ec82c3dfe8f
