Aspera: Moving the bio-omics data at maximum speed

Dr.X的基因空间
2021-07-04
Summary: A guide to using the Aspera software to transfer large biological files

Preface
I want to develop this subscription account into a platform that helps biological researchers in developing countries apply bioinformatics to analyze their data. I also hope it can become a kind of "Belt and Road" for bioinformatics. Each WeChat post will come in both Chinese and English versions. This post introduces Aspera, a high-speed data transfer tool, and I hope it serves as a useful guide to installing and using the software correctly.

Introduction

    With the development of bioinformatics and the falling cost of high-throughput sequencing, more and more multi-omics data are being archived. Biological laboratories face growing challenges in transferring large files and massive data sets quickly and reliably between institutions and individuals around the world. Failing to meet these challenges could limit a laboratory's ability to meet the scientific imperatives that lead to major findings.
    Existing open-source alternatives such as Tsunami may work under special, controlled network conditions, but they come at a high cost to network efficiency. Aspera's FASP technology was therefore developed to transfer large files efficiently and at low cost.

Installations

    Aspera can be installed on Linux, Windows, and Mac systems. Here we focus on installing it on Linux, since most analysis pipelines run on that system. If your laptop runs one of the other two systems, it is highly recommended to set up a Unix-like environment in advance. The installation commands are as follows:

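# Download the Aspera Connect installer archive
wget https://ak-delivery04-mul.dhe.ibm.com/sar/CMA/OSA/092u0/0/ibm-aspera-connect-3.10.0.180973-linux-g2.12-64.tar.gz
# Unpack the archive to obtain the installer script
tar -xzvf ibm-aspera-connect-3.10.0.180973-linux-g2.12-64.tar.gz
# Run the installer as a regular user
bash ibm-aspera-connect-3.10.0.180973-linux-g2.12-64.sh
# Change to the directory that holds the ascp binary (typically ~/.aspera/connect/bin; adjust if yours differs)
cd ~/.aspera/connect/bin
# Append that directory to PATH for future shells (plain double quotes, not curly quotes)
echo "export PATH=\$PATH:$PWD" >> ~/.bashrc
source ~/.bashrc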
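
After reloading ~/.bashrc, it can be worth confirming that the shell actually finds ascp. The following is a minimal sketch, not part of the Aspera distribution: check_tool is a hypothetical helper name introduced here for illustration, and it relies only on the standard `command -v` lookup.

```shell
# check_tool: hypothetical helper that reports whether a command is on PATH.
check_tool() {
    tool=$1
    if command -v "$tool" >/dev/null 2>&1; then
        # Print where the shell resolved the command
        echo "$tool found at: $(command -v "$tool")"
        return 0
    fi
    echo "$tool not found; check the PATH line added to ~/.bashrc" >&2
    return 1
}

# Check ascp without aborting the shell if it is missing
check_tool ascp || true
```

If ascp is reported as missing, the directory written to ~/.bashrc probably does not contain the binary.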

Application example

    NCBI and EBI are the two leading biological data repositories. Here, let's start by downloading raw genomic sequencing data from EBI with Aspera.
    Take the sequencing data submitted by Keohane et al. as an example [1]. Open the EBI website and search for the BioProject accession PRJEB36820. Download the text file from the location marked with the red arrow in the screenshot to obtain the download links.
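
The links in the downloaded text file may be FTP-style rather than in the user@host:path form that the ascp command in this post uses. The sketch below shows one way to convert them. It assumes the links look like ftp.sra.ebi.ac.uk/vol1/... (that host name is an assumption, not stated in this post), and to_aspera_source is a hypothetical helper name.

```shell
# to_aspera_source: hypothetical helper that rewrites an EBI FTP link
# into the source form used by ascp in this post.
# Assumption: links look like ftp.sra.ebi.ac.uk/vol1/... (with or without ftp://).
to_aspera_source() {
    link=${1#ftp://}                 # drop an optional ftp:// prefix
    path=${link#ftp.sra.ebi.ac.uk}   # keep only the /vol1/... part
    printf 'era-fasp@fasp.sra.ebi.ac.uk:%s\n' "$path"
}

# Example call using the file downloaded later in this post
to_aspera_source "ftp.sra.ebi.ac.uk/vol1/fastq/ERR395/007/ERR3957750/ERR3957750_1.fastq.gz"
```

If the text file already lists Aspera-style paths, this step is unnecessary.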

    The download command is as follows:

# Give -i the path to the asperaweb_id_dsa.putty file
ascp -Q -T -l 100M -P 33001 -i asperaweb_id_dsa.putty era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/ERR395/007/ERR3957750/ERR3957750_1.fastq.gz .
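
In the command above, -Q selects the fair transfer policy, -T disables encryption, -l caps the transfer rate, -P sets the TCP port for the session, and -i points to the private key file. A project usually has many files, so the same command can be wrapped in a loop. This is only a sketch under stated assumptions: fetch_all, KEY, DRY_RUN, and the one-source-per-line list file are hypothetical names and conventions introduced here, and by default the loop only prints the commands instead of running them.

```shell
# Hypothetical settings; adjust to your own setup.
KEY="${KEY:-asperaweb_id_dsa.putty}"   # key file from the command above
DRY_RUN="${DRY_RUN:-1}"                # 1 = only print the commands

# fetch_all: hypothetical helper; reads one Aspera source per line from
# the file given as $1 and downloads each into the current directory.
fetch_all() {
    while IFS= read -r src || [ -n "$src" ]; do
        [ -n "$src" ] || continue      # skip blank lines
        if [ "$DRY_RUN" = "1" ]; then
            echo "ascp -Q -T -l 100M -P 33001 -i $KEY $src ."
        else
            # </dev/null keeps ascp from consuming the list being read
            ascp -Q -T -l 100M -P 33001 -i "$KEY" "$src" . </dev/null
        fi
    done < "$1"
}

# Usage (sources.txt is a hypothetical file name):
#   fetch_all sources.txt              # print the commands
#   DRY_RUN=0; fetch_all sources.txt   # actually download
```

Printing the commands first makes it easy to spot a wrong key path or a malformed source before starting a long transfer.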

    In my tests, the average speed reached at least 10 Mb per second, and sometimes exceeded 100 Mb per second.

[1] Keohane DM, Ghosh TS, Jeffery IB, Molloy MG, O'Toole PW, Shanahan F. Microbiome and health implications for ethnic minorities after enforced lifestyle changes. Nat Med. 2020 Jul;26(7):1089-1095. doi: 10.1038/s41591-020-0963-8. Epub 2020 Jul 6. PMID: 32632193.

