
Hands-on (III): Batch prediction of protein-protein, protein-nucleic acid, and protein complex-nucleic acid interactions with AlphaFold3

Dr.X的基因空间 (Dr.X's Gene Space)
2025-02-26
Summary: Use AlphaFold3 to batch-predict the structures of interactions between multiple protein complexes and nucleic acids.

Hands-on (III): Batch study of protein complex-RNA structural interactions with AlphaFold3

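Preface
Previous posts showed how to use AlphaFold3 to batch-predict protein structures de novo and their interactions with metal ions, but those topics did not yet touch AlphaFold3's core. AlphaFold3's key innovation is joint multi-component modeling across scales. Where earlier structure prediction tools such as AlphaFold2 were limited when modeling complex biomolecular systems, AlphaFold3 predicts multi-level interactions among proteins, RNA/DNA, and ligands, going beyond single- or two-component prediction. Its diffusion model and geometric deep learning substantially improve the efficiency of conformational sampling for interface residues. This post shows how to use AlphaFold3 to resolve interactions between proteins and other biological macromolecules.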

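Multiple protein-protein and protein-nucleic acid interactions at the structural level

       Multiple interactions between proteins, and between proteins and nucleic acids, underpin the regulation of complex functions in living systems. Understanding their molecular mechanisms is essential for understanding key biological processes such as cell signaling, gene expression regulation, and disease onset. For example, dysregulation of the p53-MDM2 interaction causes loss of tumor-suppressor function, and KRAS mutants that stay bound to partners such as CRAF drive sustained activation of the MAPK pathway and thereby cancer. Aberrant phase separation of the TDP-43 protein promotes amyloid fibril formation and, together with loss of RNA splicing function, leads to neurodegenerative disease (Nature 2022, 601:434–439). In short, the multiple interactions between proteins and nucleic acids form the molecular grammar of biological regulation. With advances in structural biology and bioinformatics, research in this area is shifting from static structure determination to dynamic network modeling. Examples from three fields follow.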

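Virology

       One study resolved pH-dependent conformational changes of the SARS-CoV-2 spike (S) protein trimer (open vs. closed states) and used molecular dynamics (MD) simulations to assess the resulting change in its binding energy to the ACE2 receptor (ΔG computed by MM-PBSA). Another resolved the interface between the N protein-RNA complex inside the viral envelope and the intracellular domain of the S protein, revealing how genome packaging is coupled to membrane fusion. Such work guides the design of antiviral drugs that target key viral sites.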

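Oncology

       One study characterized how an RNA stem-loop matches the interface of the p53 DNA-binding domain and validated the complex conformation by small-angle X-ray scattering (SAXS). Another used fluorescence in situ hybridization (FISH) and fluorescence resonance energy transfer (FRET) to show that an lncRNA promotes the formation of phase-separated condensates of p53 and the ATM kinase. Together, these studies reveal how non-coding RNAs precisely regulate the p53 signaling pathway through two mechanisms, structure-specific interactions and phase separation, and point to new targets for cancer therapies based on RNA conformation.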

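Neuroscience

       One study purified tau filaments from the cerebrospinal fluid of Alzheimer's disease (AD) patients, preserved their native conformation by vitrification, and used RELION 3.1 to reconstruct 3D maps from 2.9 Å resolution data, identifying the C-shaped fold core of tau (the PHF6/PHF6* segments). Solid-state NMR (ssNMR), bioinformatics, and molecular dynamics simulations then showed that the filament core consists of two pairs of antiparallel β-sheet layers forming a rigid "stem" region, and that tau filaments are structurally polymorphic (Type I/II) across AD subtypes, explaining clinical heterogeneity.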

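       These research directions show the power of interdisciplinary work: from computational prediction at the atomic level to functional validation at the cellular scale, and from in-depth dissection of molecular mechanisms to rapid progress in translational medicine.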

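De novo modeling of the SARS-CoV-2 RdRp complex synthesizing RNA with AlphaFold3

       In this tutorial I use de novo modeling of the SARS-CoV-2 RNA-dependent RNA polymerase (RdRp) complex synthesizing RNA as the teaching case for AlphaFold3. I chose this case for the following reasons: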

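I: Public familiarity

       The global pandemic caused by SARS-CoV-2 gave the public a broad familiarity with how the virus replicates. As the core molecular machine of viral genome replication, RdRp was the subject of breakthrough structural and functional studies during the pandemic (Nature 2020, 582:289-293), which gives this teaching case a good point of connection to existing knowledge.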

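II: A representative complex structure

       The SARS-CoV-2 RdRp holoenzyme is a three-component assembly of the catalytic subunit NSP12 and the cofactors NSP7 and NSP8 (Cell 2020, 182:417-428). Its features include:

1. Multi-level interactions

       The NSP12-NSP7 interface involves a hydrophobic core (ΔG = -9.3 kcal/mol), and NSP8 forms a hydrogen-bond network with the palm domain of NSP12 through its helical domain.

2. Dynamic functional coupling

       Cryo-EM studies show that NSP7/NSP8 modulate the conformation of NSP12 so that its catalytic center forms a complete RNA channel (~12 Å in diameter).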

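III: Complexity of multi-component interactions

      RNA synthesis by RdRp involves:

1. Protein-protein interactions: NSP12 forms stable interfaces with its cofactors (interface area up to 1600 Å²)
2. Protein-nucleic acid interactions: the template RNA is precisely anchored by the thumb, palm, and fingers domains (Kd = 15 nM)
3. Dynamic conformational changes: the transition from the initiation state to the elongation state involves the coordinated displacement of 7 key residues (RMSD of 4.2 Å in MD simulations)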

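IV: Teaching value

      This case systematically demonstrates three technical strengths of AlphaFold3:

1. Multi-component modeling: atomic models of the three-protein complex and the RNA template are predicted together
2. Interface accuracy: overall prediction error < 1.2 Å
3. Coverage of AlphaFold3's core functions:

      (1) protein structure prediction;
      (2) protein-protein interaction prediction;
      (3) protein-nucleic acid interaction prediction.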

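V: Translational significance

       The RdRp complex is a core target of antiviral drugs such as remdesivir and its derivative VV116. Resolving the conformation of the complex with AlphaFold3 can provide a structural basis for designing next-generation inhibitors. The case combines the classic features of a structural biology problem with major public health value, which makes it an ideal model system for showing the methodological progress and translational potential of computational structural biology.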

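Batch protein-nucleic acid interaction prediction with Perl and AlphaFold3

I: Preparing the protein sequence file

       Prepare a FASTA file with the amino acid sequences of the SARS-CoV-2 RdRp subunits NSP12, NSP7, and NSP8 in the following format, and save it as proteins.faa: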

>NSP12
SADAQSFLNRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRELGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQTVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDIRQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQDALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAATRGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLARKHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQAVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCSQHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTVLQ
>NSP7
SKMSDVKCTSVVLLSVLQQLRVESSSKLWAQCVQLHNDILLAKDTTEAFEKMVSLLSVLLSMQGAVDINKLCEEMLDNRATLQ
>NSP8
AIASEFSSLPSYAAFATAQEAYEQAVANGDSEVVLKKLKKSLNVAKSEFDRDAAMQRKLEKMADQAMTQMYKQARSEDKRAKVTSAMQTMLFTMLRKLDNDALNNIINNARDGCVPLNIIPLTTAAKLMVVIPDYNTYKNTCDGTTFTYASALWEIQQVVDADSKIVQLSEISMDNSPNLAWPLIVTALRANSAVKLQ

II: Preparing the RNA sequence file

       Prepare an RNA sequence synthesized by SARS-CoV-2 in the following format, and save it as RNA.fna:

>RNA
UUUUCAUGCUACGCGUAG
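Most of an AlphaFold3 run is spent on MSA searches, so it can pay to check the two FASTA files before starting. The sketch below is an illustration, not part of the author's workflow. It uses core Perl only and applies strict alphabets, which is an assumption: the 20 standard amino acids for proteins and A/C/G/U for RNA. It also estimates the token count as one token per residue or nucleotide. That holds for standard residues only; modified residues and ligands are tokenized per atom and are not handled here. The bucket sizes are assumed to mirror the default `--buckets` list of `run_alphafold.py`, so check them against your installed version.

```perl
#!/usr/bin/perl
# Sketch: sanity-check the input FASTA files and roughly estimate AlphaFold3's
# token count and padding bucket. Assumptions: the strict alphabets below,
# one token per standard residue/nucleotide, and a bucket list that mirrors
# the default --buckets flag of run_alphafold.py (check your version).
use strict;
use warnings;

my %ALPHABET = (
    protein => qr/^[ACDEFGHIKLMNPQRSTVWY]+$/,   # 20 standard amino acids
    rna     => qr/^[ACGU]+$/,                   # RNA uses U, not T
);
my @BUCKETS = (256, 512, 768, 1024, 1280, 1536, 2048, 2560, 3072, 3584, 4096, 4608, 5120);

# Parse FASTA text into an ordered list of [id, sequence] pairs.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\r?\n/, $text) {
        next if $line =~ /^\s*$/;
        if ($line =~ /^>(\S+)/) {
            push @records, [$1, ''];
        } elsif (@records) {
            (my $chunk = uc $line) =~ s/\s+//g;   # multi-line sequences are joined
            $records[-1][1] .= $chunk;
        } else {
            die "Sequence data before the first '>' header\n";
        }
    }
    return @records;
}

# Return a list of problems; an empty list means the records look fine.
sub check_records {
    my ($type, @records) = @_;
    my $re = $ALPHABET{$type} or die "Unknown type '$type'\n";
    my @problems;
    push @problems, 'no records found' unless @records;
    for my $r (@records) {
        my ($id, $seq) = @$r;
        if (!length $seq)     { push @problems, "$id: empty sequence" }
        elsif ($seq !~ $re)   { push @problems, "$id: unexpected characters" }
    }
    return @problems;
}

# Smallest bucket that fits the token count, plus the number of padded tokens.
sub bucket_for {
    my ($tokens) = @_;
    for my $size (@BUCKETS) {
        return ($size, $size - $tokens) if $tokens <= $size;
    }
    return ($tokens, 0);   # assumed: inputs above the largest bucket are not padded
}

# Usage: perl check_inputs.pl proteins.faa RNA.fna
if (@ARGV == 2) {
    my $tokens = 0;
    for my $pair ([protein => $ARGV[0]], [rna => $ARGV[1]]) {
        my ($type, $file) = @$pair;
        open my $fh, '<', $file or die "Cannot open $file: $!\n";
        my @records = parse_fasta(do { local $/; <$fh> });
        close $fh;
        my @problems = check_records($type, @records);
        print "$file: ", (@problems ? join('; ', @problems) : scalar(@records) . ' record(s) OK'), "\n";
        $tokens += length $_->[1] for @records;
    }
    my ($bucket, $pad) = bucket_for($tokens);
    print "~$tokens tokens -> bucket $bucket ($pad padded tokens)\n";
}
```

Run it as `perl check_inputs.pl proteins.faa RNA.fna` (the script name is hypothetical). For the four sequences in this post it should report 1231 tokens, which fall into bucket 1280 with 49 padded tokens. That agrees with the bucket line in the AlphaFold3 log later in this post.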

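III: Writing a Perl program that generates the protein-RNA input

       Unlike the online AlphaFold Server, a local AlphaFold3 installation can run predictions in batch. Using Perl to read the sequences and generate the AlphaFold3 input therefore makes batch structure prediction possible. I wrote a Perl program, based on the BioPerl (Bio::SeqIO) and JSON modules, that processes the sequence files in batch; its output can be used directly as AlphaFold3 input.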

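#!/usr/bin/perl
use strict;
use warnings;
use JSON;
use Bio::SeqIO;

# Usage: perl make-protein-nucleotide-json.pl proteins.faa RNA.fna > alphafold_input.json
die "Usage: $0 proteins.faa RNA.fna\n" unless @ARGV == 2;

# Initialise the JSON data structure (one fold job in AlphaFold Server format)
my %data = (
    name       => "Multi-Proteins-RNA-interactions",
    modelSeeds => ["42"],
    sequences  => []
);

# Read every protein from the first FASTA file
my $fasta = Bio::SeqIO->new(-file => $ARGV[0], -format => 'fasta');

while (my $seq = $fasta->next_seq) {
    my ($id, $sseq) = ($seq->id, $seq->seq);
    add_protein_chain($sseq, 1);    # pass the sequence string, not the Bio::Seq object
}

# Read every RNA from the second FASTA file
$fasta = Bio::SeqIO->new(-file => $ARGV[1], -format => 'fasta');

while (my $seq = $fasta->next_seq) {
    my ($id, $sseq) = ($seq->id, $seq->seq);
    add_rna_sequence($sseq, 1);
}

# Add a protein chain; $count is the number of copies (default 1)
sub add_protein_chain {
    my ($sequence, $count) = @_;
    push @{$data{sequences}}, {
        proteinChain => {
            count    => $count || 1,
            sequence => $sequence
        }
    };
}

# Add an RNA sequence; $count is the number of copies (default 1)
sub add_rna_sequence {
    my ($sequence, $count) = @_;
    push @{$data{sequences}}, {
        rnaSequence => {
            count    => $count || 1,
            sequence => $sequence
        }
    };
}

# Encode as JSON and print to STDOUT. AlphaFold3 reads a top-level list as
# AlphaFold Server input, so the job is wrapped in [ ]; canonical() sorts keys.
my $json = JSON->new->pretty->canonical->encode([\%data]);
print $json;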

       Save the program above as make-protein-nucleotide-json.pl and run the following command in a terminal:

perl make-protein-nucleotide-json.pl proteins.faa RNA.fna > alphafold_input.json

       The Perl program writes a JSON file with the following content:

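[
   {
      "modelSeeds" : [
         "42"
      ],
      "name" : "Multi-Proteins-RNA-interactions",
      "sequences" : [
         {
            "proteinChain" : {
               "count" : 1,
               "sequence" : "SADAQSFLNRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRELGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQTVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDIRQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQDALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAATRGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLARKHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQAVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCSQHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTVLQ"
            }
         },
         {
            "proteinChain" : {
               "count" : 1,
               "sequence" : "SKMSDVKCTSVVLLSVLQQLRVESSSKLWAQCVQLHNDILLAKDTTEAFEKMVSLLSVLLSMQGAVDINKLCEEMLDNRATLQ"
            }
         },
         {
            "proteinChain" : {
               "count" : 1,
               "sequence" : "AIASEFSSLPSYAAFATAQEAYEQAVANGDSEVVLKKLKKSLNVAKSEFDRDAAMQRKLEKMADQAMTQMYKQARSEDKRAKVTSAMQTMLFTMLRKLDNDALNNIINNARDGCVPLNIIPLTTAAKLMVVIPDYNTYKNTCDGTTFTYASALWEIQQVVDADSKIVQLSEISMDNSPNLAWPLIVTALRANSAVKLQ"
            }
         },
         {
            "rnaSequence" : {
               "count" : 1,
               "sequence" : "UUUUCAUGCUACGCGUAG"
            }
         }
      ]
   }
]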
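The script above writes a single fold job. AlphaFold3 also accepts a top-level JSON list: the log later in this post reports "Loading 1 fold jobs" after detecting a list. That means one file can carry several jobs. The sketch below is a hypothetical batch variant, not the author's script. It uses only core Perl, with JSON::PP and a small FASTA reader in place of Bio::SeqIO. It creates one job per RNA record, includes every protein in each job, and takes optional per-chain copy numbers. For example, `counts => { NSP8 => 2 }` would request two copies of NSP8, as in some published RdRp structures. The job-name scheme and the option names are illustrative assumptions.

```perl
#!/usr/bin/perl
# Hypothetical batch variant: one AlphaFold Server-style fold job per RNA
# record, each containing every protein chain, written as a top-level list.
# Core Perl only (JSON::PP); job names and the counts/seeds/prefix options
# are illustrative assumptions, not part of the original script.
use strict;
use warnings;
use JSON::PP;

# Minimal FASTA reader: returns [ { id => ..., seq => ... }, ... ] in file order.
sub read_fasta {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my @records;
    while (my $line = <$fh>) {
        if ($line =~ /^>(\S+)/) {
            push @records, { id => $1, seq => '' };
        } elsif (@records) {
            $line =~ s/\s+//g;                 # drop newlines and spaces
            $records[-1]{seq} .= uc $line;
        }
    }
    close $fh;
    return \@records;
}

# Build one job per RNA; $opt{counts} maps a FASTA id to its copy number (default 1).
sub build_jobs {
    my ($proteins, $rnas, %opt) = @_;
    my $counts = $opt{counts} || {};
    my $prefix = $opt{prefix} // 'Multi-Proteins';
    my @seeds  = @{ $opt{seeds} || ['42'] };
    my @jobs;
    for my $rna (@$rnas) {
        my @sequences;
        for my $p (@$proteins) {
            push @sequences, { proteinChain => {
                sequence => $p->{seq},
                count    => $counts->{ $p->{id} } // 1,
            } };
        }
        push @sequences, { rnaSequence => {
            sequence => $rna->{seq},
            count    => $counts->{ $rna->{id} } // 1,
        } };
        push @jobs, {
            name       => "$prefix-$rna->{id}",
            modelSeeds => [@seeds],
            sequences  => \@sequences,
        };
    }
    return \@jobs;
}

# Usage: perl make-batch-json.pl proteins.faa RNA.fna > alphafold_input.json
if (@ARGV == 2) {
    my %counts = ();    # e.g. (NSP8 => 2) for two copies of NSP8
    my $jobs = build_jobs(read_fasta($ARGV[0]), read_fasta($ARGV[1]), counts => \%counts);
    print JSON::PP->new->pretty->canonical->encode($jobs);
}
```

With several records in RNA.fna, the output list holds one job per RNA, and AlphaFold3 should then report loading that many fold jobs. Note that each job repeats the protein MSA search unless the MSAs are reused.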

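Running a local AlphaFold3 to predict protein-nucleic acid interactions

        A previous tutorial covered running AlphaFold3 with Docker. The key points are still how to mount real host paths onto paths inside the container, and how to point AlphaFold3 inside the container at its model parameters and databases. Place alphafold_input.json in the host directory that is mounted at /root/af_input, then run AlphaFold3 with the following command: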

nohup docker run \
--volume ~/home/data/analysis_data/test_data/AF3-test3/af_input:/root/af_input \
--volume ~/home/data/analysis_data/test_data/AF3-test3/af_output:/root/af_output \
--volume ~/home/data/Ref_data/AF3-db:/root/public_databases \
--volume ~/home/software/alphafold3:/root/models \
--gpus 2 \
alphafold3 \
python /app/alphafold/run_alphafold.py \
--json_path=/root/af_input/alphafold_input.json \
--model_dir=/root/models \
--output_dir=/root/af_output &
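Two mistakes can waste hours at this step: a host path that does not exist, and an input JSON whose sequences are empty. In the command above, `~` expands to your home directory, so `~/home/data/...` becomes `$HOME/home/data/...`. If that is not what you intended, use absolute paths. With `--volume`, Docker creates a missing host path as an empty directory instead of failing, so the error only shows up later inside the container. The sketch below checks both problems before you launch the container. The script name and argument layout are hypothetical, and it assumes the input uses the AlphaFold Server layout (proteinChain / rnaSequence / dnaSequence).

```perl
#!/usr/bin/perl
# Sketch of a pre-flight check before `docker run`: (1) every host path given
# to --volume exists, and (2) the input JSON has no null or empty sequences.
# Assumptions: mounts are passed as "host:container" pairs, and the JSON uses
# the AlphaFold Server layout (proteinChain / rnaSequence / dnaSequence).
use strict;
use warnings;
use JSON::PP;

# Expand a leading "~" the way the shell does ("~" means $HOME).
sub expand_home {
    my ($path) = @_;
    my $home = $ENV{HOME} // '';
    $path =~ s{^~(?=/|$)}{$home};
    return $path;
}

# With --volume, Docker silently creates a missing host path as an empty
# directory, so report missing host directories up front.
sub check_mounts {
    my (%mounts) = @_;    # host path => container path
    my @problems;
    for my $host (sort keys %mounts) {
        my $real = expand_home($host);
        push @problems, "missing host path $real (for $mounts{$host})" unless -d $real;
    }
    return @problems;
}

# Flag a non-list top level and chains whose sequence is null or empty.
sub check_input_json {
    my ($text) = @_;
    my $jobs = JSON::PP->new->decode($text);
    my @problems;
    if (ref $jobs ne 'ARRAY') {
        push @problems, 'top level is not a list, so it will not be read as AlphaFold Server input';
        $jobs = [$jobs];    # still inspect the single job
    }
    for my $job (@$jobs) {
        my $name = $job->{name} // '(unnamed)';
        for my $entry (@{ $job->{sequences} || [] }) {
            for my $type (grep { /^(?:proteinChain|rnaSequence|dnaSequence)$/ } keys %$entry) {
                my $seq = $entry->{$type}{sequence};
                push @problems, "$name: $type has no sequence"
                    unless defined $seq && length $seq;
            }
        }
    }
    return @problems;
}

# Usage: perl preflight.pl input.json /host/af_input:/root/af_input /host/models:/root/models ...
if (@ARGV >= 1) {
    my ($json_file, @pairs) = @ARGV;
    open my $fh, '<', $json_file or die "Cannot open $json_file: $!\n";
    my @problems = check_input_json(do { local $/; <$fh> });
    close $fh;
    push @problems, check_mounts(map { split /:/, $_, 2 } @pairs);
    print @problems ? join("\n", @problems) . "\n" : "Pre-flight checks passed\n";
}
```

A `"sequence" : null` entry typically means a Bio::Seq object, rather than its sequence string, was handed to the JSON encoder.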

       Running AlphaFold3 produces the following log:

I0211 17:39:39.336284 140698391662592 folding_input.py:1027] Detected /root/af_input/alphafold_input.json is an AlphaFold Server JSON since the top-level is a list.
I0211 17:39:39.336488 140698391662592 folding_input.py:1033] Loading 1 fold jobs from /root/af_input/alphafold_input.json
I0211 17:39:39.852058 140698391662592 xla_bridge.py:895] Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
I0211 17:39:39.853077 140698391662592 xla_bridge.py:895] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
I0211 17:40:11.077423 140698391662592 pipeline.py:40] Getting protein MSAs for sequence SADAQSFLNRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRELGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQTVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDIRQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQDALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAATRGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLARKHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQAVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCSQHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTVLQ
I0211 17:40:11.086848 140559045883456 jackhmmer.py:78] Query sequence: SADAQSFLNRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRELGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQTVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDIRQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQDALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAATRGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLARKHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQAVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCSQHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTVLQ
I0211 17:40:11.087447 140558676784704 jackhmmer.py:78] Query sequence: SADAQSFLNRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRELGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQTVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDIRQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQDALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAATRGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLARKHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQAVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCSQHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTVLQ
I0211 17:40:11.087629 140558173468224 jackhmmer.py:78] Query sequence: SADAQSFLNRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRELGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQTVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDIRQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQDALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAATRGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLARKHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQAVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCSQHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTVLQ
I0211 17:40:11.087741 140559045883456 subprocess_utils.py:68] Launching subprocess "/hmmer/bin/jackhmmer -o /dev/null -A /tmp/tmpgtee9wuf/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --cpu 8 -N 1 -E 0.0001 --incE 0.0001 /tmp/tmpgtee9wuf/query.fasta /root/public_databases/uniref90_2022_05.fa"
I0211 17:40:11.087960 140557972141632 jackhmmer.py:78] Query sequence: SADAQSFLNRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRELGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQTVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDIRQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQDALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAATRGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLARKHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQAVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCSQHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTVLQ
I0211 17:40:11.088586 140558676784704 subprocess_utils.py:68] Launching subprocess "/hmmer/bin/jackhmmer -o /dev/null -A /tmp/tmpuzmmz_lt/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --cpu 8 -N 1 -E 0.0001 --incE 0.0001 /tmp/tmpuzmmz_lt/query.fasta /root/public_databases/mgy_clusters_2022_05.fa"
I0211 17:40:11.088745 140558173468224 subprocess_utils.py:68] Launching subprocess "/hmmer/bin/jackhmmer -o /dev/null -A /tmp/tmplzyx31je/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --cpu 8 -N 1 -E 0.0001 --incE 0.0001 /tmp/tmplzyx31je/query.fasta /root/public_databases/bfd-first_non_consensus_sequences.fasta"
I0211 17:40:11.121484 140557972141632 subprocess_utils.py:68] Launching subprocess "/hmmer/bin/jackhmmer -o /dev/null -A /tmp/tmpmup7gcjw/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --cpu 8 -N 1 -E 0.0001 --incE 0.0001 /tmp/tmpmup7gcjw/query.fasta /root/public_databases/uniprot_all_2021_04.fa"
I0211 17:55:11.480792 140558173468224 subprocess_utils.py:97] Finished Jackhmmer in 900.392 seconds
I0211 18:17:41.876636 140559045883456 subprocess_utils.py:97] Finished Jackhmmer in 2250.788 seconds
I0211 18:28:36.482494 140558676784704 subprocess_utils.py:97] Finished Jackhmmer in 2905.394 seconds
I0211 18:32:10.385065 140557972141632 subprocess_utils.py:97] Finished Jackhmmer in 3119.263 seconds
I0211 18:32:12.933654 140698391662592 pipeline.py:73] Getting protein MSAs took 3121.86 seconds for sequence SADAQSFLNRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRELGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQTVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDIRQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQDALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAATRGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLARKHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQAVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCSQHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTVLQ
I0211 18:32:12.933791 140698391662592 pipeline.py:79] Deduplicating MSAs and getting protein templates for sequence SADAQSFLNRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRELGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQTVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDIRQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQDALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAATRGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLARKHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQAVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCSQHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTVLQ
I0211 18:32:12.953114 140557972141632 subprocess_utils.py:68] Launching subprocess "/hmmer/bin/hmmbuild --informat stockholm --hand --amino /tmp/tmpq5ce1w7i/output.hmm /tmp/tmpq5ce1w7i/query.msa"
I0211 18:32:13.437376 140557972141632 subprocess_utils.py:97] Finished Hmmbuild in 0.484 seconds
I0211 18:32:13.438385 140557972141632 subprocess_utils.py:68] Launching subprocess "/hmmer/bin/hmmsearch --noali --cpu 8 --F1 0.1 --F2 0.1 --F3 0.1 -E 100 --incE 100 --domE 100 --incdomE 100 -A /tmp/tmpkfq0gmn0/output.sto /tmp/tmpkfq0gmn0/query.hmm /root/public_databases/pdb_seqres_2022_09_28.fasta"
I0211 18:32:28.981170 140557972141632 subprocess_utils.py:97] Finished Hmmsearch in 15.543 seconds
I0211 18:32:49.275989 140698391662592 pipeline.py:108] Deduplicating MSAs and getting protein templates took 36.34 seconds for sequence SADAQSFLNRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRELGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQTVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDIRQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQDALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAATRGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLARKHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQAVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCSQHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTVLQ
I0211 18:32:49.276157 140698391662592 pipeline.py:115] Filtering protein templates for sequence SADAQSFLNRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRELGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQTVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDIRQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQDALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAATRGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLARKHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQAVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCSQHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTVLQ
I0211 18:32:49.286554 140698391662592 pipeline.py:124] Filtering protein templates took 0.01 seconds for sequence SADAQSFLNRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRELGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQTVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDIRQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQDALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAATRGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLARKHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQAVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCSQHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTVLQ
I0211 18:32:49.496333 140698391662592 pipeline.py:40] Getting protein MSAs for sequence AIASEFSSLPSYAAFATAQEAYEQAVANGDSEVVLKKLKKSLNVAKSEFDRDAAMQRKLEKMADQAMTQMYKQARSEDKRAKVTSAMQTMLFTMLRKLDNDALNNIINNARDGCVPLNIIPLTTAAKLMVVIPDYNTYKNTCDGTTFTYASALWEIQQVVDADSKIVQLSEISMDNSPNLAWPLIVTALRANSAVKLQ
I0211 18:32:49.498258 140557972141632 jackhmmer.py:78] Query sequence: AIASEFSSLPSYAAFATAQEAYEQAVANGDSEVVLKKLKKSLNVAKSEFDRDAAMQRKLEKMADQAMTQMYKQARSEDKRAKVTSAMQTMLFTMLRKLDNDALNNIINNARDGCVPLNIIPLTTAAKLMVVIPDYNTYKNTCDGTTFTYASALWEIQQVVDADSKIVQLSEISMDNSPNLAWPLIVTALRANSAVKLQ
I0211 18:32:49.498481 140558676784704 jackhmmer.py:78] Query sequence: AIASEFSSLPSYAAFATAQEAYEQAVANGDSEVVLKKLKKSLNVAKSEFDRDAAMQRKLEKMADQAMTQMYKQARSEDKRAKVTSAMQTMLFTMLRKLDNDALNNIINNARDGCVPLNIIPLTTAAKLMVVIPDYNTYKNTCDGTTFTYASALWEIQQVVDADSKIVQLSEISMDNSPNLAWPLIVTALRANSAVKLQ
I0211 18:32:49.498833 140558173468224 jackhmmer.py:78] Query sequence: AIASEFSSLPSYAAFATAQEAYEQAVANGDSEVVLKKLKKSLNVAKSEFDRDAAMQRKLEKMADQAMTQMYKQARSEDKRAKVTSAMQTMLFTMLRKLDNDALNNIINNARDGCVPLNIIPLTTAAKLMVVIPDYNTYKNTCDGTTFTYASALWEIQQVVDADSKIVQLSEISMDNSPNLAWPLIVTALRANSAVKLQ
I0211 18:32:49.499069 140557770815040 jackhmmer.py:78] Query sequence: AIASEFSSLPSYAAFATAQEAYEQAVANGDSEVVLKKLKKSLNVAKSEFDRDAAMQRKLEKMADQAMTQMYKQARSEDKRAKVTSAMQTMLFTMLRKLDNDALNNIINNARDGCVPLNIIPLTTAAKLMVVIPDYNTYKNTCDGTTFTYASALWEIQQVVDADSKIVQLSEISMDNSPNLAWPLIVTALRANSAVKLQ
I0211 18:32:49.499204 140557972141632 subprocess_utils.py:68] Launching subprocess "/hmmer/bin/jackhmmer -o /dev/null -A /tmp/tmp23nsl393/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --cpu 8 -N 1 -E 0.0001 --incE 0.0001 /tmp/tmp23nsl393/query.fasta /root/public_databases/uniref90_2022_05.fa"
I0211 18:32:49.499649 140558676784704 subprocess_utils.py:68] Launching subprocess "/hmmer/bin/jackhmmer -o /dev/null -A /tmp/tmpgro7etx2/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --cpu 8 -N 1 -E 0.0001 --incE 0.0001 /tmp/tmpgro7etx2/query.fasta /root/public_databases/mgy_clusters_2022_05.fa"
I0211 18:32:49.499794 140557770815040 subprocess_utils.py:68] Launching subprocess "/hmmer/bin/jackhmmer -o /dev/null -A /tmp/tmpv6b1p5cx/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --cpu 8 -N 1 -E 0.0001 --incE 0.0001 /tmp/tmpv6b1p5cx/query.fasta /root/public_databases/uniprot_all_2021_04.fa"
I0211 18:32:49.500071 140558173468224 subprocess_utils.py:68] Launching subprocess "/hmmer/bin/jackhmmer -o /dev/null -A /tmp/tmpgqw8six_/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --cpu 8 -N 1 -E 0.0001 --incE 0.0001 /tmp/tmpgqw8six_/query.fasta /root/public_databases/bfd-first_non_consensus_sequences.fasta"
I0211 18:36:11.734694 140558173468224 subprocess_utils.py:97] Finished Jackhmmer in 202.235 seconds
I0211 18:43:50.872779 140557972141632 subprocess_utils.py:97] Finished Jackhmmer in 661.373 seconds
I0211 18:47:28.914976 140557770815040 subprocess_utils.py:97] Finished Jackhmmer in 879.415 seconds
I0211 18:51:50.494594 140558676784704 subprocess_utils.py:97] Finished Jackhmmer in 1140.995 seconds
I0211 18:51:50.495594 140698391662592 pipeline.py:73] Getting protein MSAs took 1141.00 seconds for sequence AIASEFSSLPSYAAFATAQEAYEQAVANGDSEVVLKKLKKSLNVAKSEFDRDAAMQRKLEKMADQAMTQMYKQARSEDKRAKVTSAMQTMLFTMLRKLDNDALNNIINNARDGCVPLNIIPLTTAAKLMVVIPDYNTYKNTCDGTTFTYASALWEIQQVVDADSKIVQLSEISMDNSPNLAWPLIVTALRANSAVKLQ
I0211 18:51:50.495653 140698391662592 pipeline.py:79] Deduplicating MSAs and getting protein templates for sequence AIASEFSSLPSYAAFATAQEAYEQAVANGDSEVVLKKLKKSLNVAKSEFDRDAAMQRKLEKMADQAMTQMYKQARSEDKRAKVTSAMQTMLFTMLRKLDNDALNNIINNARDGCVPLNIIPLTTAAKLMVVIPDYNTYKNTCDGTTFTYASALWEIQQVVDADSKIVQLSEISMDNSPNLAWPLIVTALRANSAVKLQ
I0211 18:51:50.504516 140558676784704 subprocess_utils.py:68] Launching subprocess "/hmmer/bin/hmmbuild --informat stockholm --hand --amino /tmp/tmpygm3oeit/output.hmm /tmp/tmpygm3oeit/query.msa"
I0211 18:51:50.608269 140558676784704 subprocess_utils.py:97] Finished Hmmbuild in 0.104 seconds
I0211 18:51:50.609090 140558676784704 subprocess_utils.py:68] Launching subprocess "/hmmer/bin/hmmsearch --noali --cpu 8 --F1 0.1 --F2 0.1 --F3 0.1 -E 100 --incE 100 --domE 100 --incdomE 100 -A /tmp/tmp0iwp3wne/output.sto /tmp/tmp0iwp3wne/query.hmm /root/public_databases/pdb_seqres_2022_09_28.fasta"
I0211 18:51:54.415552 140558676784704 subprocess_utils.py:97] Finished Hmmsearch in 3.806 seconds
I0211 18:52:14.740610 140698391662592 pipeline.py:108] Deduplicating MSAs and getting protein templates took 24.24 seconds for sequence AIASEFSSLPSYAAFATAQEAYEQAVANGDSEVVLKKLKKSLNVAKSEFDRDAAMQRKLEKMADQAMTQMYKQARSEDKRAKVTSAMQTMLFTMLRKLDNDALNNIINNARDGCVPLNIIPLTTAAKLMVVIPDYNTYKNTCDGTTFTYASALWEIQQVVDADSKIVQLSEISMDNSPNLAWPLIVTALRANSAVKLQ
I0211 18:52:14.740819 140698391662592 pipeline.py:115] Filtering protein templates for sequence AIASEFSSLPSYAAFATAQEAYEQAVANGDSEVVLKKLKKSLNVAKSEFDRDAAMQRKLEKMADQAMTQMYKQARSEDKRAKVTSAMQTMLFTMLRKLDNDALNNIINNARDGCVPLNIIPLTTAAKLMVVIPDYNTYKNTCDGTTFTYASALWEIQQVVDADSKIVQLSEISMDNSPNLAWPLIVTALRANSAVKLQ
I0211 18:52:14.745248 140698391662592 pipeline.py:124] Filtering protein templates took 0.00 seconds for sequence AIASEFSSLPSYAAFATAQEAYEQAVANGDSEVVLKKLKKSLNVAKSEFDRDAAMQRKLEKMADQAMTQMYKQARSEDKRAKVTSAMQTMLFTMLRKLDNDALNNIINNARDGCVPLNIIPLTTAAKLMVVIPDYNTYKNTCDGTTFTYASALWEIQQVVDADSKIVQLSEISMDNSPNLAWPLIVTALRANSAVKLQ
I0211 18:52:14.831073 140698391662592 pipeline.py:40] Getting protein MSAs for sequence SKMSDVKCTSVVLLSVLQQLRVESSSKLWAQCVQLHNDILLAKDTTEAFEKMVSLLSVLLSMQGAVDINKLCEEMLDNRATLQ
I0211 18:52:14.833058 140558676784704 jackhmmer.py:78] Query sequence: SKMSDVKCTSVVLLSVLQQLRVESSSKLWAQCVQLHNDILLAKDTTEAFEKMVSLLSVLLSMQGAVDINKLCEEMLDNRATLQ
I0211 18:52:14.833216 140558173468224 jackhmmer.py:78] Query sequence: SKMSDVKCTSVVLLSVLQQLRVESSSKLWAQCVQLHNDILLAKDTTEAFEKMVSLLSVLLSMQGAVDINKLCEEMLDNRATLQ
I0211 18:52:14.833602 140557770815040 jackhmmer.py:78] Query sequence: SKMSDVKCTSVVLLSVLQQLRVESSSKLWAQCVQLHNDILLAKDTTEAFEKMVSLLSVLLSMQGAVDINKLCEEMLDNRATLQ
I0211 18:52:14.833740 140557703702080 jackhmmer.py:78] Query sequence: SKMSDVKCTSVVLLSVLQQLRVESSSKLWAQCVQLHNDILLAKDTTEAFEKMVSLLSVLLSMQGAVDINKLCEEMLDNRATLQ
I0211 18:52:14.834078 140558676784704 subprocess_utils.py:68] Launching subprocess "/hmmer/bin/jackhmmer -o /dev/null -A /tmp/tmpgqiaatlm/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --cpu 8 -N 1 -E 0.0001 --incE 0.0001 /tmp/tmpgqiaatlm/query.fasta /root/public_databases/uniref90_2022_05.fa"
I0211 18:52:14.834447 140558173468224 subprocess_utils.py:68] Launching subprocess "/hmmer/bin/jackhmmer -o /dev/null -A /tmp/tmpx07wiq0k/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --cpu 8 -N 1 -E 0.0001 --incE 0.0001 /tmp/tmpx07wiq0k/query.fasta /root/public_databases/mgy_clusters_2022_05.fa"
I0211 18:52:14.834768 140557703702080 subprocess_utils.py:68] Launching subprocess "/hmmer/bin/jackhmmer -o /dev/null -A /tmp/tmpll5ckqr8/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --cpu 8 -N 1 -E 0.0001 --incE 0.0001 /tmp/tmpll5ckqr8/query.fasta /root/public_databases/uniprot_all_2021_04.fa"
I0211 18:52:14.834983 140557770815040 subprocess_utils.py:68] Launching subprocess "/hmmer/bin/jackhmmer -o /dev/null -A /tmp/tmpbjv0fnlo/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --cpu 8 -N 1 -E 0.0001 --incE 0.0001 /tmp/tmpbjv0fnlo/query.fasta /root/public_databases/bfd-first_non_consensus_sequences.fasta"
I0211 18:54:21.119582 140557770815040 subprocess_utils.py:97] Finished Jackhmmer in 126.285 seconds
I0211 18:58:49.779217 140558676784704 subprocess_utils.py:97] Finished Jackhmmer in 394.945 seconds
I0211 19:04:51.945898 140557703702080 subprocess_utils.py:97] Finished Jackhmmer in 757.111 seconds
I0211 19:06:50.604054 140558173468224 subprocess_utils.py:97] Finished Jackhmmer in 875.770 seconds
I0211 19:06:50.605273 140698391662592 pipeline.py:73] Getting protein MSAs took 875.77 seconds for sequence SKMSDVKCTSVVLLSVLQQLRVESSSKLWAQCVQLHNDILLAKDTTEAFEKMVSLLSVLLSMQGAVDINKLCEEMLDNRATLQ
I0211 19:06:50.605401 140698391662592 pipeline.py:79] Deduplicating MSAs and getting protein templates for sequence SKMSDVKCTSVVLLSVLQQLRVESSSKLWAQCVQLHNDILLAKDTTEAFEKMVSLLSVLLSMQGAVDINKLCEEMLDNRATLQ
I0211 19:06:50.611465 140558173468224 subprocess_utils.py:68] Launching subprocess "/hmmer/bin/hmmbuild --informat stockholm --hand --amino /tmp/tmpabt0fo1b/output.hmm /tmp/tmpabt0fo1b/query.msa"
I0211 19:06:50.659533 140558173468224 subprocess_utils.py:97] Finished Hmmbuild in 0.048 seconds
I0211 19:06:50.659960 140558173468224 subprocess_utils.py:68] Launching subprocess "/hmmer/bin/hmmsearch --noali --cpu 8 --F1 0.1 --F2 0.1 --F3 0.1 -E 100 --incE 100 --domE 100 --incdomE 100 -A /tmp/tmpgt9pyxc0/output.sto /tmp/tmpgt9pyxc0/query.hmm /root/public_databases/pdb_seqres_2022_09_28.fasta"
I0211 19:06:52.530276 140558173468224 subprocess_utils.py:97] Finished Hmmsearch in 1.870 seconds
I0211 19:07:30.084060 140698391662592 pipeline.py:108] Deduplicating MSAs and getting protein templates took 39.48 seconds for sequence SKMSDVKCTSVVLLSVLQQLRVESSSKLWAQCVQLHNDILLAKDTTEAFEKMVSLLSVLLSMQGAVDINKLCEEMLDNRATLQ
I0211 19:07:30.084285 140698391662592 pipeline.py:115] Filtering protein templates for sequence SKMSDVKCTSVVLLSVLQQLRVESSSKLWAQCVQLHNDILLAKDTTEAFEKMVSLLSVLLSMQGAVDINKLCEEMLDNRATLQ
I0211 19:07:30.086436 140698391662592 pipeline.py:124] Filtering protein templates took 0.00 seconds for sequence SKMSDVKCTSVVLLSVLQQLRVESSSKLWAQCVQLHNDILLAKDTTEAFEKMVSLLSVLLSMQGAVDINKLCEEMLDNRATLQ
I0211 19:07:30.180609 140698391662592 pipeline.py:141] Getting RNA MSAs for sequence UUUUCAUGCUACGCGUAG
I0211 19:07:30.181582 140558173468224 nhmmer.py:88] Query sequence: UUUUCAUGCUACGCGUAG
I0211 19:07:30.182150 140558676784704 nhmmer.py:88] Query sequence: UUUUCAUGCUACGCGUAG
I0211 19:07:30.182767 140557770815040 nhmmer.py:88] Query sequence: UUUUCAUGCUACGCGUAG
I0211 19:07:30.183299 140558173468224 subprocess_utils.py:68] Launching subprocess "/hmmer/bin/nhmmer -o /dev/null --noali --cpu 8 -E 0.001 --rna -A /tmp/tmpk0v5j_hu/output.sto --F3 0.02 /tmp/tmpk0v5j_hu/query.a3m /root/public_databases/nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta"
I0211 19:07:30.183624 140558676784704 subprocess_utils.py:68] Launching subprocess "/hmmer/bin/nhmmer -o /dev/null --noali --cpu 8 -E 0.001 --rna -A /tmp/tmpu5mjbe5b/output.sto --F3 0.02 /tmp/tmpu5mjbe5b/query.a3m /root/public_databases/rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta"
I0211 19:07:30.183948 140557770815040 subprocess_utils.py:68] Launching subprocess "/hmmer/bin/nhmmer -o /dev/null --noali --cpu 8 -E 0.001 --rna -A /tmp/tmpfun30viz/output.sto --F3 0.02 /tmp/tmpfun30viz/query.a3m /root/public_databases/rnacentral_active_seq_id_90_cov_80_linclust.fasta"
I0211 19:07:34.878908 140558676784704 subprocess_utils.py:97] Finished Nhmmer in 4.695 seconds
I0211 19:10:13.374217 140557770815040 subprocess_utils.py:97] Finished Nhmmer in 163.190 seconds
I0211 19:22:05.155069 140558173468224 subprocess_utils.py:97] Finished Nhmmer in 874.972 seconds
I0211 19:22:05.155911 140698391662592 pipeline.py:167] Getting RNA MSAs took 874.98 seconds for sequence UUUUCAUGCUACGCGUAG
I0211 19:22:10.876149 140698391662592 pipeline.py:165] processing Multi-Proteins-RNA-interactions, random_seed=42
I0211 19:22:10.959557 140698391662592 pipeline.py:258] Calculating bucket size for input with 1231 tokens.
I0211 19:22:10.959770 140698391662592 pipeline.py:264] Got bucket size 1280 for input with 1231 tokens, resulting in 49 padded tokens.
Running AlphaFold 3. Please note that standard AlphaFold 3 model parameters are
only available under terms of use provided at
https://github.com/google-deepmind/alphafold3/blob/main/WEIGHTS_TERMS_OF_USE.md.
If you do not agree to these terms and are using AlphaFold 3 derived model
parameters, cancel execution of AlphaFold 3 inference with CTRL-C, and do not
use the model parameters.
Found local devices: [CudaDevice(id=0), CudaDevice(id=1)]
Building model from scratch...
Processing 1 fold inputs.
Processing fold input Multi-Proteins-RNA-interactions
Checking we can load the model parameters...
Running data pipeline...
Processing chain A
Processing chain A took 3158.42 seconds
Processing chain B
Processing chain B took 1165.33 seconds
Processing chain C
Processing chain C took 915.35 seconds
Processing chain D
Processing chain D took 874.98 seconds
Output directory: /root/af_output/multi-proteins-rna-interactions
Writing model input JSON to /root/af_output/multi-proteins-rna-interactions
Predicting 3D structure for Multi-Proteins-RNA-interactions for seed(s) (42,)...
Featurising data for seeds (42,)...
Featurising Multi-Proteins-RNA-interactions with rng_seed 42.
Featurising Multi-Proteins-RNA-interactions with rng_seed 42 took 42.23 seconds.
Featurising data for seeds (42,) took  47.22 seconds.
Running model inference for seed 42...
Running model inference for seed 42 took  153.54 seconds.
Extracting output structures (one per sample) for seed 42...
Extracting output structures (one per sample) for seed 42 took  7.00 seconds.
Running model inference and extracting output structures for seed 42 took  160.53 seconds.
Running model inference and extracting output structures for seeds (42,) took  160.53 seconds.
Writing outputs for Multi-Proteins-RNA-interactions for seed(s) (42,)...
Done processing fold input Multi-Proteins-RNA-interactions.
Done processing 1 fold inputs.
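In this log, most of the run time goes to the data pipeline rather than to the neural network. The sketch below pulls the per-chain and inference timings out of a log file such as nohup.out. It is a hypothetical helper, not an AlphaFold3 tool, and it relies on the exact wording of the "Processing chain ... took" and "Running model inference for seed ... took" lines shown above, which may differ in other AlphaFold3 versions.

```perl
#!/usr/bin/perl
# Sketch: summarise where the time went in an AlphaFold3 run log (e.g. nohup.out).
# It matches the "Processing chain X took N seconds" and
# "Running model inference for seed S took N seconds" lines printed by the
# version used here; other AlphaFold3 versions may word them differently.
use strict;
use warnings;

sub summarise_log {
    my @lines = @_;
    my (%chain, $inference);
    for my $line (@lines) {
        if ($line =~ /Processing chain (\S+) took\s+([\d.]+) seconds/) {
            $chain{$1} = $2;                      # data pipeline time per chain
        } elsif ($line =~ /Running model inference for seed \S+ took\s+([\d.]+) seconds/) {
            $inference += $1;                     # summed over seeds
        }
    }
    my $pipeline = 0;
    $pipeline += $_ for values %chain;
    return { chains => \%chain, data_pipeline => $pipeline, inference => $inference // 0 };
}

# Usage: perl summarise_af3_log.pl nohup.out
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $s = summarise_log(<$fh>);
    close $fh;
    printf "chain %s: %.0f s\n", $_, $s->{chains}{$_} for sort keys %{ $s->{chains} };
    printf "data pipeline: %.2f h, model inference: %.0f s\n",
        $s->{data_pipeline} / 3600, $s->{inference};
}
```

On the log above, it should print chains A to D at 3158, 1165, 915 and 875 s, a data pipeline of 1.70 h, and 154 s of model inference. For this job, the MSA and template searches dominate the run time.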

Evaluating the AlphaFold3 prediction

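       The whole prediction took just under 2 hours. Most of that time went to the data pipeline: the MSA and template searches for chains A-D took about 6,100 seconds in total, versus about 160 seconds of model inference. When the prediction finishes, .cif files are written to the output directory, and PyMOL can be used to view the 3D structure and measure interaction sites. The figure below shows my predicted structure of the RdRp complex with its synthesized RNA from different viewing angles, with NSP12, NSP7, NSP8, and the RNA shown in different colors.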
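Besides measuring contacts by hand in PyMOL, interface residues can be listed with a short script, which is convenient when many models come out of a batch run. The sketch below is an illustration, not the author's method. It assumes the model file has a standard `_atom_site` loop with `label_asym_id`, `label_seq_id`, `label_comp_id` and `Cartn_x/y/z` columns, and that its values contain no spaces. It also assumes a 4.0 Å heavy-atom cutoff, which is a common but arbitrary choice. In the log above, the RNA is processed as chain D.

```perl
#!/usr/bin/perl
# Sketch: list residues of chain A that lie within a distance cutoff of chain B
# in an AlphaFold3 mmCIF model. Assumptions: a standard _atom_site loop with
# label_asym_id, label_seq_id, label_comp_id and Cartn_x/y/z columns, values
# without spaces, and a default 4.0 Å cutoff (common but arbitrary).
use strict;
use warnings;

# Parse ATOM/HETATM rows of the _atom_site loop, using the header order.
sub read_atoms {
    my ($text) = @_;
    my (@fields, @atoms);
    for my $line (split /\n/, $text) {
        if ($line =~ /^_atom_site\.(\S+)/) {
            push @fields, $1;
        } elsif (@fields && $line =~ /^(?:ATOM|HETATM)\s/) {
            my %row;
            @row{@fields} = split ' ', $line;
            push @atoms, {
                chain => $row{label_asym_id},
                resi  => $row{label_seq_id},
                resn  => $row{label_comp_id},
                xyz   => [ @row{qw(Cartn_x Cartn_y Cartn_z)} ],
            };
        }
    }
    return @atoms;
}

# Residues of $chain_a with any atom within $cutoff Å of any atom of $chain_b.
sub interface_residues {
    my ($atoms, $chain_a, $chain_b, $cutoff) = @_;
    my $c2   = ($cutoff // 4.0) ** 2;                    # compare squared distances
    my @in_a = grep { $_->{chain} eq $chain_a } @$atoms;
    my @in_b = grep { $_->{chain} eq $chain_b } @$atoms;
    my %hit;    # "RESNnumber" => residue number (used for sorting)
    ATOM: for my $x (@in_a) {
        my $key = "$x->{resn}$x->{resi}";
        next ATOM if exists $hit{$key};
        for my $y (@in_b) {
            my $d2 = 0;
            $d2 += ($x->{xyz}[$_] - $y->{xyz}[$_]) ** 2 for 0 .. 2;
            if ($d2 <= $c2) {
                $hit{$key} = $x->{resi} =~ /^-?\d+$/ ? $x->{resi} : 0;
                next ATOM;
            }
        }
    }
    return sort { $hit{$a} <=> $hit{$b} } keys %hit;
}

# Usage: perl interface_residues.pl model.cif A D 4.0
if (@ARGV >= 3) {
    my ($file, $chain_a, $chain_b, $cutoff) = @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my @atoms = read_atoms(do { local $/; <$fh> });
    close $fh;
    print join(' ', interface_residues(\@atoms, $chain_a, $chain_b, $cutoff)), "\n";
}
```

The all-pairs distance loop is simple but quadratic. For a complex of this size it should still finish in seconds to tens of seconds, but a spatial grid would be the next step for larger batches.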

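       The real 3D structure of this complex has in fact already been solved by cryo-EM. Structural superposition shows that the predicted structure is very close to the experimental one, with an RMSD of about 0.15. In the figure below, blue and green show the superposition of the predicted and experimental structures of the complex, respectively.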
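The RMSD quoted above comes from superposing the two structures. To show what that number measures, here is a sketch of the RMSD formula over N paired atoms, RMSD = sqrt((1/N) Σ |x_i - y_i|²). It is not the tool the author used. It assumes the two coordinate lists are already superposed (for example in PyMOL) and list matching atoms in the same order; it does not perform the superposition itself.

```perl
#!/usr/bin/perl
# Sketch of the RMSD formula: RMSD = sqrt( (1/N) * sum_i |x_i - y_i|^2 )
# over N paired atoms. Assumes both coordinate lists are already superposed
# and list matching atoms in the same order; no superposition is done here.
use strict;
use warnings;

sub rmsd {
    my ($p, $q) = @_;    # array refs of [x, y, z]
    die "Coordinate lists differ in length\n" unless @$p == @$q;
    die "Empty coordinate lists\n" unless @$p;
    my $sum = 0;
    for my $i (0 .. $#$p) {
        $sum += ($p->[$i][$_] - $q->[$i][$_]) ** 2 for 0 .. 2;   # squared distance
    }
    return sqrt($sum / @$p);
}

# Toy example: every atom displaced by 1 Å along z gives an RMSD of 1 Å.
my @predicted = ([0, 0, 0], [1.5, 0, 0]);
my @reference = ([0, 0, 1], [1.5, 0, 1]);
printf "RMSD = %.2f\n", rmsd(\@predicted, \@reference);   # prints "RMSD = 1.00"
```

By default, PyMOL's `align` runs outlier-rejection cycles and reports the RMSD over the atoms it keeps. Its number can therefore be lower than an RMSD computed over all paired atoms as above.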


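       This post showed how to use AlphaFold3's core functions to predict the structures of protein-protein and protein-nucleic acid interactions. Overall, AlphaFold3 is not just a structure prediction tool but a probe for the era of systems biology. It pushes researchers to rethink the linear "structure-function-mechanism" logic and to embrace a research paradigm built on multidimensional interaction networks. Mastering this tool gives a head start in frontier research, because the most advanced scientific questions often begin with computational insight into the invisible world.

Previous posts

Hands-on (II): Predicting protein-metal ion interactions with a local AlphaFold3

Hands-on! Batch prediction of protein structures and their interactions with biological macromolecules using a local AlphaFold3

AlphaFold3 source code is now fully public (with a local installation tutorial)

Applying for the AlphaFold3 model parameters

Free to take! The AlphaFold3 databases packaged on Baidu Netdisk (download link at the end of the post)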


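Dr.X的基因空间 (Dr.X's Gene Space)
[PhD, Chinese Academy of Sciences] 10 years of research experience in life-science data mining, focusing on in vitro diagnostics (IVD) in biomedicine, such as technical innovation in early cancer screening and rapid detection of unknown pathogens in infectious diseases, and the enabling role of artificial intelligence (AI) in these applications.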