fastGEO v1.7.0 大更新,支持PCA、差異分析、火山圖、熱圖、差異箱線圖、去批次等分析

前言

之前一篇文章【fastGEO V1.6.1 這個版本強的可怕,GEO數據自動下載、探針注釋、Shiny App】介紹了fastGEO用于GEO數據下載和探針注釋的核心功能。

雖然是付費50獲取安裝包(剛開始是20),但也深受歡迎,說明這個R包的確好用。

尤其是我自己,用了開發的這個R包后做數據挖掘方便了太多太多。

需要的小伙伴可以直接聯系我付費50獲取,價格只會漲不會跌,購買后永久免費獲取最新版本。

也收到了一些用戶的反饋,希望加入一些分析功能,其實這些在之前的推文【fastGEO v1.01,快速下載GEO表達譜、差異分析、火山圖、熱圖】也展示過,只不過還沒有放在R包里。

所以這一次更新,把PCA、差異分析、火山圖、熱圖、差異箱線圖、去批次這些分析通通加到R包里,僅使用我的R包就可以快速完成這些流程。

更新展示

格式懶得改了,可以移步到Gitee里在瀏覽,或點擊【閱讀原文】,效果更佳:fastGEO安裝和使用教程。

下游分析

get_GEO_group
  • 根據樣本信息提取并整理表達譜和分組信息
  • 獲取表達譜和分組信息后就可以做一系列的分析了
  • 指定作為分組依據的臨床信息表的列名, 及每個分組的匹配模式和分組名稱即可
  • 可指定兩組或多組
obj = download_GEO("GSE18105", out_dir = "test/00_GEO_data_GSE18105")
Find local annotation file, will be used preferably!
INFO [2025-08-02 15:00:34] Step1: Download GEO data ...
INFO [2025-08-02 15:00:34] Querying dataset: GSE18105 ...- Use local curl- Found 1 GPL- Found 111 samples, 39 metas.- Writting sample_anno to test/00_GEO_data_GSE18105/GSE18105_sample_anno.csv - Normalize between arrays ...- Successed, file save to test/00_GEO_data_GSE18105/GSE18105_GPL570.RData.
INFO [2025-08-02 15:00:55] Step2: Annotate probe GPL570 ...
INFO [2025-08-02 15:00:55] Use built-in annotation file ...
- All porbes matched!
- All porbes annotated!
INFO [2025-08-02 15:00:55] Removing duplicated genes by method: max ...
INFO [2025-08-02 15:01:20] Done.
expM = obj$expM
sample_anno = obj$sample_anno
hd(expM)
dim: 22881 × 111 
A data.frame: 5 × 5
GSM452552GSM452553GSM452554GSM452555GSM452556
<dbl><dbl><dbl><dbl><dbl>
A1BG 3.8841673.8059433.7793923.9612853.980595
A1BG-AS1 4.7556355.0219914.9928245.0026804.811076
A1CF 7.6039068.9070488.0351918.6039417.788007
A2M10.9725957.6647119.7034059.6846169.976007
A2M-AS1 4.3499034.1279304.2403154.5377124.449675
hd(sample_anno)
dim: 111 × 39 
A data.frame: 5 × 5
titlegeo_accessionstatussubmission_datelast_update_date
<chr><chr><chr><chr><chr>
GSM452552patient 100, cancer, LCMGSM452552Public on Feb 04 2010Sep 14 2009Aug 28 2018
GSM452553patient 101, cancer, LCMGSM452553Public on Feb 04 2010Sep 14 2009Aug 28 2018
GSM452554patient 102, cancer, LCMGSM452554Public on Feb 04 2010Sep 14 2009Aug 28 2018
GSM452555patient 103, cancer, LCMGSM452555Public on Feb 04 2010Sep 14 2009Aug 28 2018
GSM452556patient 104, cancer, LCMGSM452556Public on Feb 04 2010Sep 14 2009Aug 28 2018
table_GEO_clinical(obj$sample_anno)
$characteristics_ch1metastasis: metastasis metastasis: metastatic recurrence 26                                18 metastasis: none 67 $characteristics_ch1.1tissue: cancer, homogenized         tissue: cancer, LCM 17                          77 
tissue: normal, homogenized 17 
# 兩組
group_list = get_GEO_group(obj, group_name = "characteristics_ch1.1", "cancer, homogenized" = "Cancer", "normal, homogenized" = "Normal")
group = group_list$group 
table(group)
group
Cancer Normal 17     17 
SID = group_list$SID
hd(SID)
dim: 34 
  1. 'GSM452646'
  2. 'GSM452647'
  3. 'GSM452648'
  4. 'GSM452649'
  5. 'GSM452650'
expM = group_list$expM
hd(expM)
dim: 22881 × 34 
A data.frame: 5 × 5
GSM452646GSM452647GSM452648GSM452649GSM452650
<dbl><dbl><dbl><dbl><dbl>
A1BG4.2183754.102014 4.0801644.0171333.958630
A1BG-AS14.8134364.965083 4.8545584.6858864.796381
A1CF8.8662997.978485 8.9470889.3733719.222967
A2M6.5133928.51780811.5186168.2768357.645857
A2M-AS13.8807563.910804 5.1145584.0815893.817199
sample_anno = group_list$sample_anno
hd(sample_anno)
dim: 34 × 39 
A data.frame: 5 × 5
titlegeo_accessionstatussubmission_datelast_update_date
<chr><chr><chr><chr><chr>
GSM452646patient 006, cancer, homogenizedGSM452646Public on Feb 04 2010Sep 14 2009Aug 28 2018
GSM452647patient 011, cancer, homogenizedGSM452647Public on Feb 04 2010Sep 14 2009Aug 28 2018
GSM452648patient 024, cancer, homogenizedGSM452648Public on Feb 04 2010Sep 14 2009Aug 28 2018
GSM452649patient 027, cancer, homogenizedGSM452649Public on Feb 04 2010Sep 14 2009Aug 28 2018
GSM452650patient 028, cancer, homogenizedGSM452650Public on Feb 04 2010Sep 14 2009Aug 28 2018
# 多組
group_list = get_GEO_group(obj, group_name = "characteristics_ch1.1", "cancer, homogenized" = "Cancer", "normal, homogenized" = "Normal","cancer, LCM" = "Cancer_LCM")
table(group_list$group)

?
Cancer Cancer_LCM Normal
17 77 17

run_PCA
  • 執行PCA分析并繪圖
  • 默認添加橢圓、默認不添加樣本標簽
  • 可輸出圖片(PDF/PNG), 返回ggplot對象, 可自行修改
group = group_list$group 
expM = group_list$expM
SID = group_list$SID
table(group)
groupCancer Cancer_LCM     Normal 17         77         17 
run_PCA(expM, group, out_name = "./test/00_GEO_data_GSE18105/01_PCA")

?

?

# 修改屬性
run_PCA(expM, group, title = "PCA Plot", legend.title = "Group", text.size = 30, pt.size = 5, key.size = 5, color = c("red", "green", "blue"))

?

?

  • 添加label, 適合樣本較少的情況, 可以更清晰地展示離群樣本名
  • 這里每組只提取前5個樣本展示
SID2 = unlist(lapply(group_list$SID_list, function(x) x[1:5]))
group2 = group[match(SID2, SID)]
expM2 = expM[, SID2]
table(group2)                     
group2Cancer Cancer_LCM     Normal 5          5          5 
run_PCA(expM2, group2, label = TRUE)

?

?

run_DEG_limma
  • 快速進行limma差異分析, 適合標準化的芯片數據或高通量數據
  • 可設置閾值, 默認 |log2FC| > 1, Padj < 0.05
  • 可設置不使用P值矯正
  • 可添加基因label, 可自定義基因
DEG_tb = run_DEG_limma(expM, group, Case = "Cancer", Control = "Normal", out_name = "./test/00_GEO_data_GSE18105/02_DEG")
Number of Up regulated genes: 1006 
Number of Down regulated genes: 1096 
Number of Not Change regulated genes: 20779 
head(DEG_tb)
A data.frame: 6 × 7
logFCAveExprtP.Valueadj.P.ValBDEG
<dbl><dbl><dbl><dbl><dbl><dbl><chr>
FOXQ15.7163548.77703013.7020082.117724e-158.075941e-1224.97073Up
CLDN14.6408078.50236919.3303546.543012e-201.497107e-1534.74020Up
LGR54.4854297.35480310.2233196.637827e-121.421583e-0917.14899Up
KRT234.4453036.613692 8.9371071.917386e-101.589992e-0813.84479Up
DPEP14.2603638.24049010.1225208.572521e-121.662270e-0916.89826Up
CEMIP4.1478648.04887712.2184765.487202e-145.879765e-1121.82719Up
# 最低閾值
DEG_tb2 = run_DEG_limma(expM, group, Case = "Cancer", Control = "Normal", log2FC = 0.5, padj = FALSE, out_name = "./test/00_GEO_data_GSE18105/03_DEG")
Number of Up regulated genes: 2969 
Number of Down regulated genes: 2724 
Number of Not Change regulated genes: 17188 
plot_volcano_limma
  • 根據差異表格繪制火山圖
group_list = get_GEO_group(obj, group_name = "characteristics_ch1.1", "cancer, homogenized" = "Cancer", "normal, homogenized" = "Normal","cancer, LCM" = "Cancer_LCM")
table(group_list$group)

?
Cancer Cancer_LCM Normal
17 77 17

group = group_list$group 
expM = group_list$expM
SID = group_list$SID
table(group)
groupCancer Cancer_LCM     Normal 17         77         17 
DEG_tb = run_DEG_limma(expM, group, Case = "Cancer", Control = "Normal", out_name = "./test/00_GEO_data_GSE18105/02_DEG")
Number of Up regulated genes: 1006 
Number of Down regulated genes: 1096 
Number of Not Change regulated genes: 20779 
plot_volcano_limma(DEG_tb, out_name = "./test/00_GEO_data_GSE18105/04_volcano")
[1m[22mScale for [32my[39m is already present.
Adding another scale for [32my[39m, which will replace the existing scale.
Warning message:
"[1m[22mDuplicated aesthetics after name standardisation: [32msegment.colour[39m"

# 顯示閾值
plot_volcano_limma(DEG_tb, caption = TRUE)
[1m[22mScale for [32my[39m is already present.
Adding another scale for [32my[39m, which will replace the existing scale.
Warning message:
"[1m[22mDuplicated aesthetics after name standardisation: [32msegment.colour[39m"

# 選擇 top logFC 前10的基因進行顯示
plot_volcano_limma(DEG_tb, caption = TRUE, method = "logFC")
[1m[22mScale for [32my[39m is already present.
Adding another scale for [32my[39m, which will replace the existing scale.
Warning message:
"[1m[22mDuplicated aesthetics after name standardisation: [32msegment.colour[39m"

# 不顯示基因名
plot_volcano_limma(DEG_tb, caption = TRUE, label = FALSE)
[1m[22mScale for [32my[39m is already present.
Adding another scale for [32my[39m, which will replace the existing scale.

# 自定義label基因, 每組隨機挑選5個
set.seed(1234)
gene_slect = as.character(aggregate(rownames(DEG_tb), by = list(DEG_tb$DEG), function(x) sample(x, 5))[, 2])
gene_slect
  1. 'CPM'
  2. 'STX16'
  3. 'TTC27'
  4. 'FFAR4'
  5. 'RP11-31K23.2'
  6. 'MCM2'
  7. 'GNA11'
  8. 'PIK3IP1'
  9. 'RFC3'
  10. 'CAPN9'
  11. 'TFPI'
  12. 'ASCC3'
  13. 'DMXL2'
  14. 'GK5'
  15. 'SLC2A1'
plot_volcano_limma(DEG_tb, caption = TRUE, label = gene_slect)
[1m[22mScale for [32my[39m is already present.
Adding another scale for [32my[39m, which will replace the existing scale.
Warning message:
"[1m[22mDuplicated aesthetics after name standardisation: [32msegment.colour[39m"

  • 如果差異分析的閾值發生改變, 這里也要進行一樣的設置
# 最低閾值
DEG_tb2 = run_DEG_limma(expM, group, Case = "Cancer", Control = "Normal", log2FC = 0.5, padj = FALSE, out_name = "./test/00_GEO_data_GSE18105/03_DEG")
Number of Up regulated genes: 2969 
Number of Down regulated genes: 2724 
Number of Not Change regulated genes: 17188 
plot_volcano_limma(DEG_tb2, log2FC = 0.5, padj = FALSE)
[1m[22mScale for [32my[39m is already present.
Adding another scale for [32my[39m, which will replace the existing scale.
Warning message:
"[1m[22mDuplicated aesthetics after name standardisation: [32msegment.colour[39m"

plot_heatmap_DEG
  • 根據差異分析結果和表達譜繪制差異基因熱圖
# 僅支持輸入兩種分組
group2 = group[group != "Cancer_LCM"]
expM2 = expM[, group != "Cancer_LCM"]
table(group2)
group2
Cancer Normal 17     17 
plot_heatmap_DEG(expM2, DEG_tb, group2, out_name = "./test/00_GEO_data_GSE18105/04_heatmap")

?

?

set_image(7, 10)
plot_heatmap_DEG(expM2, DEG_tb, group2, ntop = 30, w = 15)

?

?

plot_boxplot
gene_select = c(head(rownames(DEG_tb), 3), tail(rownames(DEG_tb), 3))
gene_select
  1. 'FOXQ1'
  2. 'CLDN1'
  3. 'LGR5'
  4. 'GCG'
  5. 'ZG16'
  6. 'CLCA4'
source("W:/02_Study/R_build/build/fastGEO/functions/analysis.R")
set_image(14, 6)
plot_boxplot(expM2, group2, genes = gene_select,out_name = "./test/00_GEO_data_GSE18105/06_boxplot", w = 10, h = 5)

?

?

set_image(14, 6)
plot_boxplot(expM2, group2, genes = gene_select, breaks = c("Normal", "Cancer"), # 修改x軸順序ylab = "log2(FPKM + 1)", # 修改y軸標題color = c("blue", "red"), # 修改顏色w = 10, h = 5)

?

?

# 多組
set_image(14, 6)
plot_boxplot(expM, group, breaks = c("Normal", "Cancer", "Cancer_LCM"), # 修改x軸順序comparisons = list(c("Cancer", "Normal"), c("Cancer_LCM", "Normal")), # 手動指定分組genes = gene_select)

?

?

run_DEG_deseq2
  • 快速使用DESeq2進行差異分析, 適合count數據
  • 有count數據優先使用DESeq2
load2("./test/TCGA/00_TCGA_data.RData")
Loading objects:expMexpM_FPKMgroupsample_anno
hd(expM)
dim: 59427 × 562 
A data.frame: 5 × 5
TCGA-60-2712-01A-01R-0851-07TCGA-56-7221-01A-11R-2045-07TCGA-21-A5DI-01A-31R-A26W-07TCGA-43-7657-01A-31R-2125-07TCGA-94-7033-01A-11R-1949-07
<int><int><int><int><int>
5_8S_rRNA 0 0 0 0 0
5S_rRNA 0 0 0 4 4
7SK 0 3 0 0 0
A1BG 7 1 0 6 5
A1BG-AS18134314124
table(group)
groupLUAD Normal 511     51 
sum(group == "Normal")

51

# 取子集測試
set.seed(1234)
SID_tumor = sample(colnames(expM)[group == "LUAD"], sum(group == "Normal"))
expM2 = expM[, c(SID_tumor, colnames(expM)[group == "Normal"])]
group2 = group[match(colnames(expM2), colnames(expM))]
table(group2)
group2LUAD Normal 51     51 
DEG_tb_count = run_DEG_deseq2(expM2, group2, Case = "LUAD", Control = "Normal", out_name = "./test/TCGA/TCGA_LUAD_DEG")
estimating size factorsestimating dispersionsgene-wise dispersion estimatesmean-dispersion relationshipfinal dispersion estimatesfitting model and testing-- replacing outliers and refitting for 3453 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)estimating dispersionsfitting model and testing

?

Number of Up regulated genes: 6032 
Number of Down regulated genes: 11035 
Number of Not Change regulated genes: 37154 
count_to_exp
  • 用于將count矩陣轉為標準化的表達譜, 可用于繪制PCA、熱圖和箱線圖等
  • 不依賴于基因長度, 省去計算TPM/FPKM的麻煩
  • 使用 DESeq2 的 VST 標準化方法
expM_vst = count_to_exp(expM2)
hd(expM_vst)
dim: 59427 × 102 
A data.frame: 5 × 5
TCGA-58-A46K-01A-11R-A24H-07TCGA-68-A59J-01A-21R-A26W-07TCGA-66-2753-01A-01R-0980-07TCGA-77-8136-01A-11R-2247-07TCGA-46-3765-01A-01R-0980-07
<dbl><dbl><dbl><dbl><dbl>
5_8S_rRNA1.8852981.8852981.8852981.8852981.885298
5S_rRNA3.1842271.8852982.5866421.8852981.885298
7SK1.8852981.8852981.8852981.8852981.885298
A1BG3.4522683.8816633.2503464.2692593.959024
A1BG-AS15.4023775.8044805.6396236.7559545.297581
expM_FPKM2 = expM_FPKM[, colnames(expM2)]
hd(expM_FPKM2)
dim: 59427 × 102 
A data.frame: 5 × 5
TCGA-58-A46K-01A-11R-A24H-07TCGA-68-A59J-01A-21R-A26W-07TCGA-66-2753-01A-01R-0980-07TCGA-77-8136-01A-11R-2247-07TCGA-46-3765-01A-01R-0980-07
<dbl><dbl><dbl><dbl><dbl>
5_8S_rRNA0.00000000.00000000.000000000.00000000.0000000
5S_rRNA1.71874520.00000000.631337140.00000000.0000000
7SK0.00000000.00000000.000000000.00000000.0000000
A1BG0.12578330.17976580.081612320.27774680.1615653
A1BG-AS10.84149010.94553280.870187151.59129640.5865966
  • 測試vst算法與TCGA官網提供的FPKM的相關性
  • 結果顯示Sperman相關性達到0.984, 說明相關性極高, 可以等價
# 加載必需的包(確保版本兼容)
if (!require("dplyr", quietly = TRUE)) {install.packages("dplyr")library(dplyr)
}
if (!require("ggplot2", quietly = TRUE)) {install.packages("ggplot2")library(ggplot2)
}# 確保兩個矩陣的基因和樣本匹配
common_genes <- intersect(rownames(expM_vst), rownames(expM_FPKM2))
common_samples <- intersect(colnames(expM_vst), colnames(expM_FPKM2))expM_vst_filtered <- expM_vst[common_genes, common_samples]
expM_FPKM_filtered <- expM_FPKM2[common_genes, common_samples]# 轉換為長格式(不使用rownames_to_column)
# 處理VST數據:手動添加基因名列
vst_df <- as.data.frame(expM_vst_filtered)
vst_df$gene <- rownames(vst_df)  # 手動將行名轉為gene列
vst_long <- tidyr::pivot_longer(vst_df, cols = -gene, names_to = "sample", values_to = "vst_value")# 處理FPKM數據:手動添加基因名列
fpkm_df <- as.data.frame(expM_FPKM_filtered)
fpkm_df$gene <- rownames(fpkm_df)  # 手動將行名轉為gene列
fpkm_long <- tidyr::pivot_longer(fpkm_df, cols = -gene, names_to = "sample", values_to = "fpkm_value")# 合并數據
combined_data <- inner_join(vst_long, fpkm_long, by = c("gene", "sample"))# 計算相關性
cor_pearson <- cor(combined_data$vst_value, combined_data$fpkm_value, method = "pearson")
cor_spearman <- cor(combined_data$vst_value, combined_data$fpkm_value, method = "spearman")
hd(combined_data)
dim: 6061554 × 4 
A data.frame: 5 × 4
genesamplevst_valuefpkm_value
<chr><chr><dbl><dbl>
15_8S_rRNATCGA-58-A46K-01A-11R-A24H-071.8852980
25_8S_rRNATCGA-68-A59J-01A-21R-A26W-071.8852980
35_8S_rRNATCGA-66-2753-01A-01R-0980-071.8852980
45_8S_rRNATCGA-77-8136-01A-11R-2247-071.8852980
55_8S_rRNATCGA-46-3765-01A-01R-0980-071.8852980
# 繪制散點圖
set.seed(1234)
combined_data = data.frame(combined_data)
combined_data2 = combined_data[sample(1:nrow(combined_data), 10000), ]
# 計算相關性
cor_pearson <- cor(combined_data2$vst_value, combined_data2$fpkm_value, method = "pearson")
cor_spearman <- cor(combined_data2$vst_value, combined_data2$fpkm_value, method = "spearman")
ggplot(combined_data2, aes(x = fpkm_value, y = vst_value)) +geom_point(alpha = 0.3, size = 1) +geom_smooth(method = "lm", color = "red", se = FALSE) +labs(x = "FPKM Expression",y = "VST Normalized Expression",title = paste0("Correlation between VST and FPKM\n","Pearson: ", round(cor_pearson, 3), " | ","Spearman: ", round(cor_spearman, 3))) +theme_minimal() +theme(plot.title = element_text(hjust = 0.5))
[1m[22m`geom_smooth()` using formula = 'y ~ x'

  • 繪制的熱圖也看不出差異
plot_heatmap_DEG(expM_vst, DEG_tb_count, group2)

?

?

plot_heatmap_DEG(expM_FPKM2, DEG_tb_count, group2)

?

?

run_ComBat
  • 去除多個數據集之間的批次效應, 并繪制去批次前后的PCA圖
# 下載處理 GSE108109 芯片數據
obj = download_GEO("GSE108109", out_dir = "./test/run_ComBat")
INFO [2025-08-02 21:47:24] Step1: Download GEO data ...
INFO [2025-08-02 21:47:24] Querying dataset: GSE108109 ...- Use local curl- Found 1 GPL- Found 111 samples, 34 metas.- Writting sample_anno to ./test/run_ComBat/GSE108109_sample_anno.csv - Normalize between arrays ...- Successed, file save to ./test/run_ComBat/GSE108109_GPL19983.RData.
INFO [2025-08-02 21:47:35] Step2: Annotate probe GPL19983 ...
INFO [2025-08-02 21:47:36] Use built-in annotation file ...
- All porbes matched!
- Warnning: 732/25582 probes fail to annotated!
INFO [2025-08-02 21:47:36] Removing duplicated genes by method: max ...
INFO [2025-08-02 21:48:01] Done.
obj_list = get_GEO_group(obj, group_name = "source_name_ch1","Membranous Nephropathy" = "MN","Living donor" = "Control")
expM_GSE108109 = obj_list$expM
group_GSE108109 = obj_list$group
#下載處理 GSE216841 高通量
sample_anno = get_GEO_pheno("GSE216841")
INFO [2025-08-02 21:52:25] Querying dataset: GSE216841 ...- Use local curl- Found 1 GPL- Found 34 samples, 40 metas.- Writting sample_anno to 00_GEO_data/GSE216841_sample_anno.csv - Can't found expression profile, please seehttps://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE216841 - Return sample annotation!
hd(sample_anno)
dim: 34 × 40 
A data.frame: 5 × 5
titlegeo_accessionstatussubmission_datelast_update_date
<chr><chr><chr><chr><chr>
GSM6696604Normal control 1GSM6696604Public on Jan 25 2023Oct 28 2022Jan 25 2023
GSM6696605Normal control 2GSM6696605Public on Jan 25 2023Oct 28 2022Jan 25 2023
GSM6696606Normal control 3GSM6696606Public on Jan 25 2023Oct 28 2022Jan 25 2023
GSM6696607Normal control 4GSM6696607Public on Jan 25 2023Oct 28 2022Jan 25 2023
GSM6696608Normal control 5GSM6696608Public on Jan 25 2023Oct 28 2022Jan 25 2023
expM_raw = read.faster("./test/run_ComBat/GSE216841_MNd_ncounts_annot.txt.gz")
hd(expM_raw)
dim: 20321 × 37 
A data.frame: 5 × 5
ensembl_gene_idhgnc_GRCh38p12descriptionN_CTRL1N_CTRL2
<chr><chr><chr><dbl><dbl>
ENSG00000000003ENSG00000000003TSPAN6 tetraspanin 6 [Source:HGNC Symbol;Acc:HGNC:11858] 92.874228309.359955
ENSG00000000005ENSG00000000005TNMD tenomodulin [Source:HGNC Symbol;Acc:HGNC:17757] 1.020596 1.041616
ENSG00000000419ENSG00000000419DPM1 dolichyl-phosphate mannosyltransferase subunit 1, catalytic [Source:HGNC Symbol;Acc:HGNC:3005]255.148977 1.041616
ENSG00000000457ENSG00000000457SCYL3 SCY1 like pseudokinase 3 [Source:HGNC Symbol;Acc:HGNC:19285] 373.538103887.456841
ENSG00000000460ENSG00000000460C1orf112chromosome 1 open reading frame 112 [Source:HGNC Symbol;Acc:HGNC:25565] 110.224358289.569251
countM = expM_raw[, -c(1:3)]
name_list = c("N_CTRL" = "Normal control", "MEMBR" = "Idiopathic membranous nephropathy", "MCD" = "Minimal change disease")
new_name = paste(name_list[paste0(gsub("[0-9]", "", colnames(countM)))], gsub("[A-Z_]", "", colnames(countM)))
colnames(countM) = sample_anno$geo_accession[match(new_name, sample_anno$title)]
countM = ceiling(countM)
countM = data.frame(SYMBOL = expM_raw$hgnc_GRCh38p12, countM)
countM = aggregate(. ~ SYMBOL, data = countM, function(x) x[which.max(x)])
countM = col2rownames(countM)
expM = count_to_exp(countM)
converting counts to integer mode

?

group = subString(sample_anno[colnames(countM), "title"], -1, " ", rev = TRUE, collapse = " ")
table(group)
group
Idiopathic membranous nephropathy            Minimal change disease 12                                14 Normal control 8 
SID_MN = colnames(expM)[group == "Idiopathic membranous nephropathy"]
SID_Control = colnames(expM)[group == "Normal control"]
expM_GSE216841 = expM[, c(SID_MN, SID_Control)]
group_GSE216841 = rep_by_len(c("MN", "Control"), list(SID_MN, SID_Control))
expM_list = list2(expM_GSE108109, expM_GSE216841)
names(expM_list) = subString(names(expM_list), 2, "_")
names(expM_list)
  1. 'GSE108109'
  2. 'GSE216841'
group_merge = Reduce(c, sapply(paste0("group_", names(expM_list)), get))
table(group_merge)
group_merge
Control      MN 14      56 
set_image(9, 10)
obj = run_ComBat(expM_list, group_merge, out_name = "./test/run_ComBat/01_Combat")
Warning message in MASS::cov.trob(data[, vars]):
"Probable convergence failure"
Warning message in MASS::cov.trob(data[, vars]):
"Probable convergence failure"
Warning message in MASS::cov.trob(data[, vars]):
"Probable convergence failure"

# 提取結果
expM = obj$expM
group = obj$group
boxplot(expM)

table(group)
group
Control      MN 14      56 

本文來自互聯網用戶投稿,該文觀點僅代表作者本人,不代表本站立場。本站僅提供信息存儲空間服務,不擁有所有權,不承擔相關法律責任。
如若轉載,請注明出處:http://www.pswp.cn/pingmian/91592.shtml
繁體地址,請注明出處:http://hk.pswp.cn/pingmian/91592.shtml
英文地址,請注明出處:http://en.pswp.cn/pingmian/91592.shtml

如若內容造成侵權/違法違規/事實不符,請聯系多彩編程網進行投訴反饋email:809451989@qq.com,一經查實,立即刪除!

相關文章

LLM 典型模型技術特性及項目落地全流程實踐

在大語言模型(LLM)技術快速迭代的當下,開發者面臨的核心挑戰已從 “是否使用” 轉變為 “如何正確選型并高效落地”。本文將系統剖析當前主流 LLM 的技術特性,結合實際項目架構,提供從模型選型、接口集成到性能優化的全流程技術方案,并附關鍵代碼實現,為工業級 LLM 應用…

機器學習消融實驗:方法論演進、跨領域應用與前沿趨勢

一、定義與起源 消融實驗&#xff08;Ablation Study&#xff09;是一種系統性移除或修改模型關鍵組件以評估其對整體性能貢獻的實驗方法論。其術語源于神經科學和實驗心理學&#xff08;20世紀60-70年代&#xff09;&#xff0c;指通過切除動物腦區研究行為變化的實驗范式。2…

北京-4年功能測試2年空窗-報培訓班學測開-今天來聊聊我的痛苦

最近狀態很不對勁&#xff0c;因為我很少花時間好好思考&#xff0c;只是處于執行狀態&#xff0c;甚至也不太寫筆記了&#xff0c;我原以為這樣會更高效&#xff0c;現在想想&#xff0c;開始不愿花時間深思才是斷弦的開始吧而且從結課后我有了隱瞞&#xff0c;我不想過多透露…

深度解析 | AI 幻覺的形成和應對路徑

寫這一篇的緣由一是因為我也在摸索如何降低 AI 幻覺提升 AI 工具使用效率&#xff0c;二是因為前兩周在MIT學習時老師講的一節課&#xff0c;剛好也解釋了這個問題&#xff0c;所以一并做個總結&#xff0c;分享給大家。 近幾年&#xff0c;大型語言模型&#xff08;LLM&#…

Java把word轉HTML格式

Java把word轉HTML格式&#xff0c;兩種方式方式一&#xff1a;maven引入依賴,pom.xml<dependency><groupId>e-iceblue</groupId><artifactId>spire.office.free</artifactId><version>5.3.1</version> </dependency>然后代碼讀…

#C語言——學習攻略:探索字符函數和字符串函數(一)--字符分類函數,字符轉換函數,strlen,strcpy,strcat函數的使用和模擬實現

&#x1f31f;菜鳥主頁&#xff1a;晨非辰的主頁 &#x1f440;學習專欄&#xff1a;《C語言學習》 &#x1f4aa;學習階段&#xff1a;C語言方向初學者 ?名言欣賞&#xff1a;"編程的本質是理解問題&#xff0c;然后把它分解成可執行的步驟。" 目錄 1. 字符分類函…

(吃飯)質數時間

題目描述如果把一年之中的某個時間寫作 a 月 b 日 c 時 d 分 e 秒的形式&#xff0c;當這五個數都為質數時&#xff0c;我們把這樣的時間叫做質數時間&#xff0c;現已知起始時刻是 2022 年的 a 月 b 日 c 時 d 分 e 秒&#xff0c;終止時刻是 2022 年的 u 月 v 日 w 時 x 分 y…

【RK3568 RTC 驅動開發詳解】

RK3568 RTC 驅動開發詳解一、Linux RTC 子系統架構?二、設備樹配置?三、驅動四、時間相關命令實時時鐘&#xff08;RTC&#xff09;是嵌入式系統中不可或缺的硬件模塊&#xff0c;負責在系統斷電后繼續計時&#xff0c;為設備提供穩定的時間基準。本文將以瑞芯微 RK3568 平臺…

文本編碼檢測庫`chardet` 和 `uchardet`對比使用示例及注意事項

在處理未知編碼的二進制數據時&#xff0c;chardet 和 uchardet 是兩個非常實用的字符編碼自動檢測庫&#xff0c;尤其適用于從衛星通信、文件、網絡流等來源獲取的未標明編碼的文本數據。一、chardet&#xff08;Python版&#xff09; ? 簡介 chardet 是一個用 Python 編寫的…

[Windows]Postman-app官方歷史版本下載方法

Postman-app官方歷史版本下載方法最新版&歷史版本官網地址最新版本下載歷史版本下載禁止自動更新方法Postman最新版安裝后必須要登錄才能使用某些特定功能&#xff0c;多有不便&#xff0c;因此花了點時間整理了一下歷史版本如何下載的方法&#xff0c;鏈接均為官網鏈接&am…

【Spring Boot 快速入門】三、分層解耦

目錄分層解耦案例&#xff1a;將 emp.xml 中的數據解析并響應三層架構分層解耦IOC & DI 入門IOC 詳解DI 詳解分層解耦 案例&#xff1a;將 emp.xml 中的數據解析并響應 emp.xml 內容如下&#xff1a; <emps><emp><name>Tom</name><age>18…

井云科技2D交互數字人:讓智能服務觸手可及的實用方案

在如今的數字化時代&#xff0c;智能交互已成為各行業提升服務質量的重要方向。而井云 2D 交互數字人系統憑借其獨特的技術優勢&#xff0c;正逐漸成為眾多企業實現智能服務升級的優選。它無需復雜的操作和高昂的成本&#xff0c;就能讓數字人在各類線下場景中發揮重要作用&…

本地部署VMware ESXi,并實現無公網IP遠程訪問管理服務器

ESXi&#xff08;VMware ESXi&#xff09;是VMware公司推出的一款企業級虛擬化平臺&#xff0c;基于裸機&#xff08;bare-metal&#xff09;安裝的虛擬化操作系統。它可以在一臺物理服務器上運行多個虛擬機&#xff0c;廣泛應用于數據中心和云計算環境中。很多公司為了方便管理…

讓科技之光,溫暖銀齡歲月——智紳科技“智慧養老進社區”星城國際站溫情紀實

七月的風&#xff0c;帶著夏日的熱情&#xff0c;輕輕拂過邯鄲星城國際社區蔥郁的綠意。2025年7月30日&#xff0c;一個以“幸福晚景&#xff0c;樂享銀齡—智慧養老進社區”為主題的活動&#xff0c;如一股暖流&#xff0c;浸潤了社區的長者們。智紳科技懷揣著“科技賦能養老&…

Java單元測試和設計模式

單元測試 . 測試分類 什么是測試? 測試的目的是盡可能多的發現軟件中存在的BUG,而不是為了隱藏BUG。事實上測試有很多種類,比如:邊界測試,壓力測試,性能測試等 黑盒測試 黑盒測試也叫功能測試,主要關注軟件每個功能是否實現,并不關注軟件代碼是否有錯誤;測試人員…

UOS統信桌面系統解決編譯錯誤:C compiler cc is not found指南

一、系統環境 1.操作系統版本2.編譯環境 PC:~$ gcc --version gcc (Uos 8.3.0.13-deepin1) 8.3.0 Copyright (C) 2018 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY o…

深入理解 Docker 容器網絡:為什么用 host 網絡模式能解決連通性問題?

Docker 已經成為現代應用部署的標配&#xff0c;大家都知道它的網絡隔離做得很好&#xff0c;既安全又靈活。不過&#xff0c;在實際用 Docker 部署服務的過程中&#xff0c;相信很多人都遇到過這樣的情況&#xff1a;主機上能連通的外部服務&#xff0c;一到容器里卻死活連不上…

Spring Boot 異常處理:從全局捕獲到優化用戶體驗!

全文目錄&#xff1a;開篇語**前言****1. Spring Boot 異常處理的基本概念****2. 使用 ExceptionHandler 局部處理異常****示例&#xff1a;局部異常處理****優化建議&#xff1a;****3. 使用 ControllerAdvice 和 RestControllerAdvice 進行全局異常處理****示例&#xff1a;全…

vue3.0 + TypeScript 中使用 axios 同時進行二次封裝

項目背景是vite搭建的vue3.0 TypeScript 的項目&#xff0c;需要統一處理和統一維護就對axios進行了二次封裝 axios的安裝 npm install axios定義http文件夾然后內部定義index.ts文件&#xff0c;內部開始封裝 import axios, {type AxiosInstance} from "axios";…

ESP32- 項目應用1 音樂播放器之sd的驅動配置 #1

音樂播放器 ESP32- 項目應用1 音樂播放器之sd的驅動配置 #1 文章目錄 音樂播放器 1 sd卡介紹 1.1 SDCARD介紹 1.2 物理結構 1.3 協議說明 1.4 sd 卡模式 1.5 數據模式 1.6 sdio 初始化流程 1.7 SPI 模式下的 SD 卡初始化 2 原理圖 2.1 sd原理圖 2.2 esp32的接口 3 代碼配置 3.…