大数跨境

基于Seurat v5的SCOP彻底搞定了,单细胞随便跑。现在v4的SCP和v5的SCOP都有了

基于Seurat v5的SCOP彻底搞定了,单细胞随便跑。现在v4的SCP和v5的SCOP都有了 生信钱同学
2026-01-20
399
导读:要整就把关键的功能整全
SCOP这个包装阉割版的很容易,但是可能也就只能画画图那些,一些核心的功能,像内部基于python和R的多种双细胞方法(scDblFinder
Scrublet)、多种整合方法任选(Uncorrected、SeuratscVI,HarmonyScanoramaBBKNN)、RNA速率、PAGA、Slingshot、等这些关键功能能否跑通,才是这个包的关键,但是这个往往需要解决很多bug才能装成。装成了,真的很省事,就比如跑RNA速率这个事,基本上好的文章经常能看到,但真要自己基于python取单独跑,还是挺麻烦的,现在全部都基于Seurat对象,直接输入,还是非常方便的;我们已经把SCOP做成一个app在桌面上,根据我们的全套教程,打开即用;
目前无论你是习惯了用Seurat v4(SCP的基础单细胞分析Seaurt v4,v5和SCP包都装好了,我们提供了详细的代码解读和视频)还是Seurat v5(SC0P的基础),我们都已经全部安装好了,这也是大家在使用过程中需要注意的最大的区别;
除了常规的Rstudio网页版、jupyter网页版和ssh(带有root权限),我们还提供了下面这个VNC桌面系统,他不会出现各种网页链接,ssh链接老是中断造成的困扰,任何远程连接,都避免不了这个困扰,都没有桌面系统使用的实在,而且不卡顿。在这里面跑任务,自己的电脑关机了,任务都不会停,与你自己本地的电脑无关,告别终端需要后台提交任务的繁杂步骤,打开R或者Positron直接跑,后续的一些列新工具的部署,我们都将通过VNC系统给大家推出Positron教程 |完胜Rstudio、Jupyter,就没有搞不定的包,R和Python环境随意切换,实现装包自由
找到代码位置,复制到一个新的代码窗口就可以自己改动了,“生信钱同学代码共享”文件夹是公共文件夹大家没有改动和保存的权限,自己另存到其他地方,就能修改这个脚本了
# 此脚本运行是有关scop(全能型单细胞测序分析的工具),由“生信钱同学”公众号提供;# scop的使用发文请引用https://github.com/mengxu98/scop# 此脚本需借助“生信钱同学云计算”服务器中的VNC桌面系统中的SCOP软件才能正常运行# 其他地方无法保证全部正常运行# 如何使用VNC,请看https://mp.weixin.qq.com/s/TblLF90KvJ0NG1P_4P6ZAA# 所有图表都将保存在“plots”目录中,按顺序编号运行,保证可产生此代码全部结果,不会报错。# 此脚本名称会带有日期标志,我们会定期维护,请用最新的代码。# ==========================================================================
# 此脚本如果你把所有整合方法都跑一遍,Seurat对象容量会不断累积,# 仅仅此套示例数据就将达到60G内存,请严格注意自己的内存使用情况# 怎么“最直观的查看自己内存使用情况”可以看⬇️笔记链接# https://notes.r-py.com/003-%E6%9C%8D%E5%8A%A1%E5%99%A8%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98%E6%B1%87%E6%80%BB/008-%E6%9C%80%E7%9B%B4%E8%A7%82%E7%9A%84%E6%9F%A5%E7%9C%8B%E8%87%AA%E5%B7%B1%E7%9A%84%E5%86%85%E5%AD%98%E7%9A%84%E4%BD%BF%E7%94%A8%E6%96%B9%E6%B3%95.html
# 此scop环境是一个功能强大的集成的单细胞分析环境,为保证环境稳定性,不支持大家再装包进去# 只允许运行此环境现有的功能,如需其他功能分析,可自建环境
# 如果你是新手,必须从头到位运行,否则极易出错,不要跳过任何计算的步骤# 必须全部运行# 必须强调的是SCOP是基于Seurat v5做,不要用Seurat v4的对象的数据去跑,绝对报错,这也是SCOP与SCP最大的区别# Seurat v4的对象的数据请用我们的SCP教程去做options(scop_env_init = TRUE, reticulate.conda_binary = "/scp-pub000/scop202512/bin/conda", scop_envname = "scop_env")options(future.globals.maxSize = 12000 * 1024 ^ 2)current_user <- Sys.info()[["user"]]user_lib_path <- file.path("/home", current_user, "R/x86_64-conda-linux-gnu-library/4.4")# 必须全部运行.libPaths(c(  "/scp-pub000/scop202512/envs/scop/lib/R/library"  user_lib_path                                    ))1. 环境设置# --------------------------------------------------------------------------# 此脚本需借助“生信钱同学云计算”服务器中的VNC桌面系统中的SCOP软件才能正常运行library(reticulate)library(scop)##功能强大,但是数据对象会越堆越大,注意自己的内存使用library(Seurat)library(thisutils)library(slingshot)
# 此脚本需借助“生信钱同学云计算”服务器中的VNC桌面系统中的SCOP软件才能正常运行# 如何使用VNC,请看https://mp.weixin.qq.com/s/TblLF90KvJ0NG1P_4P6ZAA
# scop功能强大,我们保证当前代码全部正常运行,但此工具参数众多,此脚本之外的使用,请自行尝试解决。# 请注意,你的所有结果,会放在右侧的plot文件夹中,可自行查看dir.create("plots", showWarnings = FALSE)
# --------------------------------------------------------------------------2. 数据加载、预处理与探索# --------------------------------------------------------------------------print("--- 2. 正在加载数据, 运行标准流程并进行初步探索 ---")data(pancreas_sub)pancreas_sub <- standard_scop(srt = pancreas_sub)# 此脚本需借助“生信钱同学云计算”服务器中的VNC桌面系统中的SCOP软件才能正常运行print(pancreas_sub)
# --- 初步探索性数据分析与可视化 ---print("---    2a. 初步探索性数据分析与可视化 ---")
# [序号 01] 按细胞类型和亚型分组png("plots/01_CellDimPlot_CellType_SubCellType.png", width = 10, height = 8, units = "in", res = 300)CellDimPlot(  srt = pancreas_sub,  group.by = c("CellType""SubCellType"),  reduction = "UMAP",  theme_use = "theme_blank")dev.off()
# [序号 02] 按亚型分组,按细胞周期着色png("plots/02_CellDimPlot_SubCellType_by_Phase.png", width = 10, height = 8, units = "in", res = 300)CellDimPlot(  srt = pancreas_sub,  group.by = "SubCellType",  stat.by = "Phase",  reduction = "UMAP",  theme_use = "theme_blank")dev.off()
# [序号 03] 特征基因表达png("plots/03_FeatureDimPlot_Markers.png", width = 10, height = 8, units = "in", res = 300)FeatureDimPlot(  srt = pancreas_sub,  features = c("Sox9""Neurog3""Fev""Rbp4"),  reduction = "UMAP",  theme_use = "theme_blank")dev.off()
# [序号 04] 多个特征比较png("plots/04_FeatureDimPlot_Compare_Markers.png", width = 10, height = 8, units = "in", res = 300)FeatureDimPlot(  srt = pancreas_sub,  features = c("Ins1""Gcg""Sst""Ghrl"),  compare_features = TRUE,  label = TRUE,  label_insitu = TRUE,  reduction = "UMAP",  theme_use = "theme_blank")dev.off()
# [序号 05] 分组热图ht <- GroupHeatmap(  srt = pancreas_sub,  features = c(    "Sox9""Anxa2"    "Neurog3""Hes6"    "Fev""Neurod1"    "Rbp4""Pyy"    "Ins1""Gcg""Sst""Ghrl"   ),  group.by = c("CellType""SubCellType"),  heatmap_palette = "YlOrRd",  cell_annotation = c("Phase""G2M_score""Cdh2"),  cell_annotation_palette = c("Dark2""Paired""Paired"),  show_row_names = TRUE, row_names_side = "left",  add_dot = TRUE, add_reticle = TRUE)png("plots/05_GroupHeatmap_Markers.png", width = 12, height = 10, units = "in", res = 300)print(ht$plot)dev.off()
# --- 标准流程UMAP图 ---print("---    2b. 保存标准流程UMAP图 ---")# [序号 06] 标准流程UMAPpng("plots/06_StandardUMAP_CellType.png", width = 10, height = 8, units = "in", res = 300)CellDimPlot(  srt = pancreas_sub,  group.by = c("CellType""SubCellType"),  reduction = "StandardUMAP2D",  theme_use = "theme_blank")dev.off()
# --------------------------------------------------------------------------3. 细胞质控(CellQC)# --------------------------------------------------------------------------print("--- 3. 正在运行细胞质控并保存图表 ---")# 提供了两种非常常用的双细胞计算方法,任选其一;pancreas_sub <- RunCellQC(srt = pancreas_sub, db_method="scDblFinder")#pancreas_sub <- RunCellQC(srt = pancreas_sub, db_method="Scrublet")
# [序号 07] 质控UMAPpng("plots/07_CellQC_UMAP.png", width = 10, height = 8, units = "in", res = 300)CellDimPlot(srt = pancreas_sub, group.by = "CellQC", reduction = "UMAP")dev.off()
# [序号 08] 质控统计条形图png("plots/08_CellStatPlot_QC_by_CellType.png", width = 10, height = 8, units = "in", res = 300)CellStatPlot(srt = pancreas_sub, stat.by = "CellQC", group.by = "CellType", label = TRUE)dev.off()
# [序号 09] 质控Upset图png("plots/09_CellStatPlot_QC_Upset.png", width = 10, height = 8, units = "in", res = 300)CellStatPlot(  srt = pancreas_sub,  stat.by = c(    "db_qc""outlier_qc",    "umi_qc""gene_qc",    "mito_qc""ribo_qc",    "ribo_mito_ratio_qc""species_qc"  ),  plot_type = "upset",  stat_level = "Fail")dev.off()
# --------------------------------------------------------------------------4. 多样本数据整合# 我们提供了多种多样本整合去除批次效应的方法,任选其一就行# 请注意后续的脚本都是基于integration_method = "Seurat"结果做的# 如果你使用了其他方法整合,后续脚本可能需要修改参数,请自行到scop官网学习# scop的使用请引用https://github.com/mengxu98/scop
# --------------------------------------------------------------------------print("--- 4. 正在运行多样本数据整合流程 ---")data("panc8_sub")panc8_sub <- standard_scop(srt = panc8_sub)panc8_sub <- integration_scop(srt_merge = panc8_sub, batch = "tech", integration_method = "Seurat")#panc8_sub <- integration_scop(srt_merge = panc8_sub, batch = "tech", integration_method = "Uncorrected")#scVI巨慢无比,非必需不建议大家跑#panc8_sub <- integration_scop(srt_merge = panc8_sub, batch = "tech", integration_method = "scVI")#panc8_sub <- integration_scop(srt_merge = panc8_sub, batch = "tech", integration_method = "Harmony")#panc8_sub <- integration_scop(srt_merge = panc8_sub, batch = "tech", integration_method = "Scanorama")#这个也较慢,请注意#panc8_sub <- integration_scop(srt_merge = panc8_sub, batch = "tech", integration_method = "BBKNN")
# ⚠️⚠️⚠️# 请注意后续的脚本都是基于integration_method = "Seurat"结果做的# 如果你使用了其他方法整合,后续脚本可能需要修改参数,请自行到scop官网学习# scop的使用请引用https://github.com/mengxu98/scop# [序号 10] Seurat整合结果png("plots/10_Integration_Seurat_UMAP.png", width = 10, height = 8, units = "in", res = 300)CellDimPlot(  srt = panc8_sub,  group.by = c("celltype""tech"),  reduction = "SeuratUMAP2D",  title = "Seurat",  theme_use = "theme_blank")dev.off()
# --------------------------------------------------------------------------5. 细胞注释与投射# --------------------------------------------------------------------------print("--- 5. 正在进行细胞注释与投射 ---")
# --- 细胞投射 ---print("---    5a. 运行细胞投射 (Cell Projection) ---")genenames <- make.unique(  thisutils::capitalize(    rownames(panc8_sub[["RNA"]]),    force_tolower = TRUE))names(genenames) <- rownames(panc8_sub)
panc8_rename <- RenameFeatures(  srt = panc8_sub,  newnames = genenames,  assays = "RNA")
srt_query <- RunKNNMap(  srt_query = pancreas_sub,  srt_ref = panc8_rename,  ref_umap = "SeuratUMAP2D")
# [序号 11] 投射图png("plots/11_ProjectionPlot.png", width = 10, height = 8, units = "in", res = 300)ProjectionPlot(  srt_query = srt_query,  srt_ref = panc8_rename,  query_group = "SubCellType",  ref_group = "celltype")dev.off()
# --- 基于Bulk RNA-seq数据的注释 ---print("---    5b. 使用Bulk RNA-seq数据集进行细胞注释 ---")data("ref_scMCA")pancreas_sub <- RunKNNPredict(  srt_query = pancreas_sub,  bulk_ref = ref_scMCA,  filter_lowfreq = 20)
# [序号 12] Bulk注释结果png("plots/12_Annotation_Bulk_UMAP.png", width = 10, height = 8, units = "in", res = 300)CellDimPlot(  srt = pancreas_sub,  group.by = "KNNPredict_classification",  reduction = "UMAP",  label = TRUE)dev.off()
# --- 基于单细胞数据集的注释 ---print("---    5c. 使用单细胞数据集进行细胞注释 ---")pancreas_sub <- RunKNNPredict(  srt_query = pancreas_sub,  srt_ref = panc8_rename,  ref_group = "celltype",  filter_lowfreq = 20)
# [序号 13] 单细胞注释结果png("plots/13_Annotation_SC_UMAP.png", width = 10, height = 8, units = "in", res = 300)CellDimPlot(  srt = pancreas_sub,  group.by = "KNNPredict_classification",  reduction = "UMAP",  label = TRUE)dev.off()
# [序号 14] 相关性热图ht <- CellCorHeatmap(  srt_query = pancreas_sub,  srt_ref = panc8_rename,  query_group = "SubCellType",  ref_group = "celltype",  nlabel = 3,  label_by = "row",  show_row_names = TRUE,  show_column_names = TRUE)png("plots/14_CellCorHeatmap.png", width = 10, height = 8, units = "in", res = 300)print(ht$plot)dev.off()
# --------------------------------------------------------------------------6. 拟时序分析与动态基因分析# --------------------------------------------------------------------------print("--- 6. 正在运行拟时序分析与动态基因分析 ---")
# --- PAGA (细胞群关联图) ---print("---    6a. 运行PAGA分析 ---")pancreas_sub <- RunPAGA(  srt = pancreas_sub,  group_by = "SubCellType",  linear_reduction = "PCA",  nonlinear_reduction = "UMAP")
# [序号 15] PAGA图png("plots/15_PAGA_Plot.png", width = 10, height = 8, units = "in", res = 300)PAGAPlot(  srt = pancreas_sub,  reduction = "UMAP",  label = TRUE,  label_insitu = TRUE,  label_repel = TRUE)dev.off()
# --- RNA Velocity (RNA速率) ---##计算很耗时,正常现象print("---    6b. 运行RNA Velocity分析 ---")pancreas_sub <- RunSCVELO(  srt = pancreas_sub,  group_by = "SubCellType",  linear_reduction = "PCA",  nonlinear_reduction = "UMAP")
# [序号 16] RNA速率向量图png("plots/16_VelocityPlot_Vectors.png", width = 10, height = 8, units = "in", res = 300)VelocityPlot(  srt = pancreas_sub,  reduction = "UMAP",  group_by = "SubCellType")dev.off()
# [序号 17] RNA速率流线图png("plots/17_VelocityPlot_Stream.png", width = 10, height = 8, units = "in", res = 300)VelocityPlot(  srt = pancreas_sub,  reduction = "UMAP",  plot_type = "stream")dev.off()
# --- Slingshot (轨迹推断) ---print("---    6c. 运行Slingshot轨迹推断 ---")pancreas_sub <- RunSlingshot(  srt = pancreas_sub,  group.by = "SubCellType",  reduction = "UMAP")
# [序号 18] Slingshot谱系图png("plots/18_Slingshot_Lineages.png", width = 10, height = 8, units = "in", res = 300)FeatureDimPlot(  pancreas_sub,  features = paste0("Lineage"1:3),  reduction = "UMAP",  theme_use = "theme_blank")dev.off()
# --- 动态特征、差异表达与富集分析 ---print("---    6d. 分析动态基因、差异表达与功能富集 ---")
pancreas_sub <- RunDynamicFeatures(  srt = pancreas_sub,  lineages = c("Lineage1""Lineage2"),  n_candidates = 200)
##计算很耗时,正常现象pancreas_sub <- RunDEtest(  srt = pancreas_sub,  group_by = "CellType",  fc.threshold = 1,  only.pos = FALSE)
# [序号 19]  差异表达火山图png("plots/19_VolcanoPlot_DE.png", width = 12, height = 8, units = "in", res = 300)VolcanoPlot(  srt = pancreas_sub,  group_by = "CellType")dev.off()
##计算很耗时,正常现象pancreas_sub <- RunEnrichment(  srt = pancreas_sub,  group_by = "CellType",  db = c("TF""CSPA","GO_BP","GO_CC","GO_MF","KEGG"),  species = "Mus_musculus",  DE_threshold = "avg_log2FC > log2(1.5) & p_val_adj < 0.05")
# [序号 20] 富集分析条形图png("plots/20_Enrichment_Bar.png", width = 10, height = 8, units = "in", res = 300)EnrichmentPlot( srt = pancreas_sub, group_by = "CellType", group_use = c("Ductal""Endocrine"), plot_type = "bar")dev.off()
# [序号 21] 富集分析词云图png("plots/21_Enrichment_Wordcloud.png", width = 10, height = 8, units = "in", res = 300)EnrichmentPlot( srt = pancreas_sub, group_by = "CellType", group_use = c("Ductal""Endocrine"), plot_type = "wordcloud")dev.off()
##计算很耗时,正常现象pancreas_sub <- RunGSEA(  srt = pancreas_sub,  group_by = "CellType",  db = "GO_BP",  species = "Mus_musculus",  DE_threshold = "p_val_adj < 0.05")
# [序号 22]  GSEA富集图png("plots/22_GSEAPlot.png", width = 10, height = 8, units = "in", res = 300)GSEAPlot( srt = pancreas_sub, group_by = "CellType", group_use = "Endocrine", id_use = "GO:0007186")dev.off()
# --- 可视化动态特征 ---
# [序号 23] 动态热图ht <- DynamicHeatmap(  srt = pancreas_sub,  lineages = c("Lineage1""Lineage2"),  use_fitted = TRUE,  n_split = 6,  reverse_ht = "Lineage1",  species = "Mus_musculus",  db = c("TF""CSPA""GO_BP"),  anno_terms = TRUE,  anno_keys = TRUE,  anno_features = TRUE,  heatmap_palette = "viridis",  cell_annotation = "SubCellType",  separate_annotation = list(    "SubCellType"c("Nnat""Irx1")  ),  separate_annotation_palette = c("Paired""Set1"),  feature_annotation_palcolor = list(    c("gold""steelblue"), c("forestgreen")  ),  pseudotime_label = 25,  pseudotime_label_color = "red",  height = 5,  width = 12)png("plots/23_DynamicHeatmap.png", width = 18, height = 10, units = "in", res = 300)print(ht$plot)dev.off()
# [序号 24]  动态基因变化曲线png("plots/24_DynamicPlot_Lineages.png", width = 12, height = 8, units = "in", res = 300)DynamicPlot(  srt = pancreas_sub,  lineages = c("Lineage1""Lineage2"),  group.by = "SubCellType",  features = c(    "Plk1""Hes1""Neurod2""Ghrl""Gcg""Ins2"  ),  compare_lineages = TRUE,  compare_features = FALSE)dev.off()

【声明】内容源于网络
0
0
生信钱同学
北京大学在读博士生,记录自己的学习日常🌞分享生信知识:如单细胞和空间测序、多组学分析、宏基因组、病理组学、影像组学等生物信息学、机器学习和深度学习内容🌬
内容 541
粉丝 1
生信钱同学 北京大学在读博士生,记录自己的学习日常🌞分享生信知识:如单细胞和空间测序、多组学分析、宏基因组、病理组学、影像组学等生物信息学、机器学习和深度学习内容🌬
总阅读1.7k
粉丝1
内容541