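Dali (Distance-matrix ALIgnment) is a widely used protein structure alignment tool for comparing the three-dimensional structures of proteins. It scores similarity by comparing the intramolecular distance matrices of the structures and reports the resulting alignments.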
1. Installation
wget http://ekhidna2.biocenter.helsinki.fi/dali/DaliLite.v5.tar.gz
tar -zxvf DaliLite.v5.tar.gz
cd /home/you/DaliLite.v5/bin
make clean
make # ignore Warnings
Full installation notes: http://ekhidna2.biocenter.helsinki.fi/dali/README.v5.html
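2. Prepare the directories for the Dali databases
# Directories that will hold the Dali database files
mkdir -p dali/dali_query_db dali/dali_target_db
cd dali
# Directories for the original .pdb (or .ent) files
mkdir query_struct target_struct
# Copy the structure files into the matching directories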
cp ../rag2_structures/* query_struct/
cp ../hits_AF_structure/* target_struct/
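3. Build the name mapping
Dali requires structure files to be named according to the PDB database convention. Structures predicted by AF, or files you named yourself, must be renamed first.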
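As a quick pre-check, the sketch below tests whether a file name already looks PDB-style. It assumes the convention in question is the pdbXXXX.ent pattern (four alphanumeric characters) that the conversion script in this section produces; the function name is_pdb_style_name and the sample file names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the file name looks like pdbXXXX.ent,
# where XXXX is four alphanumeric characters (an assumed reading of the
# naming convention, matching what the conversion script generates).
is_pdb_style_name() {
    local name
    name=$(basename "$1")
    [[ "$name" =~ ^pdb[0-9A-Za-z]{4}\.ent$ ]]
}

# Made-up file names, for illustration only.
for f in pdb1abc.ent my_model.pdb; do
    if is_pdb_style_name "$f"; then
        echo "$f: ok"
    else
        echo "$f: needs renaming"
    fi
done
```

Files reported as "needs renaming" are the ones the conversion step is meant for.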
Conversion script:
vim prepare_ln_for_dali_db.sh
Contents:
#!/bin/bash
# Usage: ./prepare_ln_for_dali_db.sh /path/to/src_dir [prefix]
# Example: ./prepare_ln_for_dali_db.sh /home/user/structures rag2

# Input: source directory containing .pdb files
SRC_DIR="$1"
PREFIX="$2"

if [[ -z "$SRC_DIR" || ! -d "$SRC_DIR" ]]; then
    echo "Error: please provide a valid source directory containing .pdb files."
    echo "Usage: $0 /path/to/src_dir [prefix]"
    exit 1
fi

# Use directory name as default prefix if not provided
if [[ -z "$PREFIX" ]]; then
    PREFIX=$(basename "$SRC_DIR")
fi

# Output files are written inside SRC_DIR
LINK_DIR="$SRC_DIR/${PREFIX}_renamed_pdbs"
LIST_FILE="$SRC_DIR/${PREFIX}_pdb_list.txt"
MAPPING_FILE="$SRC_DIR/${PREFIX}_pdb_id_mapping.tsv"

mkdir -p "$LINK_DIR"
: > "$LIST_FILE"
: > "$MAPPING_FILE"

# Generate a random 4-character ID from 0-9 and A-Z (not a real PDB ID)
generate_pdb_id() {
    local chars=( {0..9} {A..Z} )
    local id="" i
    for ((i = 0; i < 4; i++)); do
        id="${id}${chars[$(( RANDOM % ${#chars[@]} ))]}"
    done
    echo "$id"
}

used_ids=()

for pdb_file in "$SRC_DIR"/*.pdb; do
    [[ -e "$pdb_file" ]] || continue  # Skip if no pdb files
    orig_name=$(basename "$pdb_file")

    # Draw IDs until one is unused
    while true; do
        new_id=$(generate_pdb_id)
        if [[ ! " ${used_ids[*]} " =~ " ${new_id} " ]]; then
            used_ids+=("$new_id")
            break
        fi
    done

    # Link the original file under its PDB-style name and record the mapping
    new_name="pdb${new_id}.ent"
    ln -sf "$(realpath "$pdb_file")" "$LINK_DIR/$new_name"
    echo "$LINK_DIR/$new_name" >> "$LIST_FILE"
    echo -e "$new_name\t$orig_name" >> "$MAPPING_FILE"
done

echo "Soft links created in: $LINK_DIR"
echo "PDB list: $LIST_FILE"
echo "ID mapping: $MAPPING_FILE"
Make the script executable:
chmod +x prepare_ln_for_dali_db.sh
Run:
./prepare_ln_for_dali_db.sh query_struct query
./prepare_ln_for_dali_db.sh target_struct target
4. Build the Dali databases
import.pl --pdblist query_struct/query_pdb_list.txt --dat dali_query_db
import.pl --pdblist target_struct/target_pdb_list.txt --dat dali_target_db
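5. Prepare the search lists
ls dali_query_db | awk -F '.' '{print $1}' > query.lst
ls dali_target_db | awk -F '.' '{print $1}' > target.lst
Note: each line of query.lst and target.lst is a structure name followed by a chain name, e.g. 0MDTA. Here 0MDT is the mapped structure name, which follows the PDB naming convention (it is randomly assigned, not a real PDB ID), and A denotes chain A.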
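To make the entry format concrete, here is a minimal sketch that splits one list entry into its two parts. It assumes the structure name is always the first four characters and the rest is the chain name, as in the 0MDTA example; split_entry is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: split an entry such as 0MDTA into the 4-character
# structure name and the chain name, separated by a tab.
split_entry() {
    local entry="$1"
    printf '%s\t%s\n' "${entry:0:4}" "${entry:4}"
}

split_entry 0MDTA   # prints 0MDT and A, separated by a tab
```

Applied to a whole list, this could be run as: while read -r e; do split_entry "$e"; done < query.lst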
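6. Search for homologous structures
dali.pl --query query.lst --db target.lst --dat1 dali_query_db --dat2 dali_target_db
# dali.pl --query query.lst --db target.lst --dat1 dali_query_db --dat2 dali_target_db --np 64
Note: the default build cannot run in parallel. If you need parallel runs (the commented --np variant above), build with the following at install time: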
# if using openmpi (check OPENMPI_PATH in Makefile)
make parallel
The results are written to query_name.txt and query_name.html, e.g. 1SDPA.txt and 1SDPA.html.
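Because the result files are named after the randomly assigned IDs, it helps to translate an entry back to the original file name. The sketch below does this with the mapping TSV written by the renaming script (columns: pdbXXXX.ent, original name). The helper name original_name and the demo mapping contents are made up for illustration, and it assumes the ID keeps the same letter case in the result file name as in the mapping.

```shell
#!/bin/bash
# Hypothetical helper: print the original file name for a Dali entry
# (e.g. 1SDPA) by looking up pdb<ID>.ent in the mapping TSV.
original_name() {
    local entry="$1" mapping="$2"
    awk -F '\t' -v key="pdb${entry:0:4}.ent" '$1 == key { print $2 }' "$mapping"
}

# Demo with a throwaway mapping file (contents are made up).
demo=$(mktemp)
printf 'pdb1SDP.ent\texample_model.pdb\n' > "$demo"
original_name 1SDPA "$demo"   # prints example_model.pdb
rm -f "$demo"
```

With the files from this walkthrough, the call would be: original_name 1SDPA query_struct/query_pdb_id_mapping.tsv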
Reference:
https://ekhidna2.biocenter.helsinki.fi/dali/