A Brief Introduction to Local Alignment: BLAST

Dr.X的基因空间
2022-02-01
Summary: A closer look at how BLAST works, from an algorithmic perspective.

Preface
Sequence alignment is a common and classical method in bioinformatics research, and many alignment methods have been developed over the years. BLAST (Basic Local Alignment Search Tool) is a pairwise local sequence alignment algorithm proposed by Altschul et al. in 1990. It combines a short-fragment matching algorithm with an effective statistical model to find the best local alignments between a query sequence and the sequences in a database. This post briefly explains BLAST from an algorithmic perspective.

Basic idea of BLAST

        BLAST improves matching by producing fewer but higher-quality candidate hits. It first builds a hash table that indexes the positions of short words in the query sequence. It then scans every sequence in the database against this index to find exact matches, called "seeds". With each seed as the center, it uses dynamic programming to extend the match in both directions into a longer region, and it outputs the alignments that stay within a given score tolerance. The database sequence with the highest-scoring alignment is the one most similar to the query.
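The seeding idea can be sketched in a few lines of Python. This is a minimal illustration, not the actual BLAST implementation: the function names `build_word_index` and `find_seeds`, the word length `k`, and the two toy sequences are all assumptions made for the example.

```python
from collections import defaultdict


def build_word_index(query, k):
    """Hash table: every length-k word in the query -> list of start positions."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    return index


def find_seeds(query, subject, k=4):
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    index = build_word_index(query, k)
    seeds = []
    for j in range(len(subject) - k + 1):
        # Look up the subject word in the query index; no match gives an empty list.
        for i in index.get(subject[j:j + k], []):
            seeds.append((i, j))
    return seeds


# Toy sequences (hypothetical data, chosen only for illustration).
seeds = find_seeds("ACGTTGCA", "TTACGTTG", k=4)
print(seeds)  # [(0, 2), (1, 3), (2, 4)]
```

All three seeds here lie on the same diagonal (subject position minus query position equals 2), which is the kind of cluster that the extension step would then try to grow.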

Algorithm process of BLAST

1. BLAST first finds sub-sequences of equal length in the two sequences that form a perfect match without gaps, that is, pairs of sequence fragments.
2. Next, it keeps all fragment pairs between the two sequences whose score exceeds a certain threshold.
3. Each fragment pair is then extended according to a given similarity threshold. The resulting pair of similar fragments of a certain length is called a high-scoring segment pair (HSP).
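The extension step above can be sketched as follows. The stopping rule used here (stop once the running score falls more than `x_drop` below the best score seen so far) is an assumption for the example, as are the +1/-1 match/mismatch scores, the function names, and the toy sequences. The sketch extends without gaps, which is the simplest case.

```python
def score_pair(a, b, match=1, mismatch=-1):
    """Toy scoring: a stand-in for a real substitution matrix."""
    return match if a == b else mismatch


def _extend_one_way(query, subject, i, j, direction, x_drop):
    """Walk along the diagonal; return (best score gain, length at that best)."""
    best_gain, best_len = 0, 0
    gain, length = 0, 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        gain += score_pair(query[i], subject[j])
        length += 1
        if gain > best_gain:
            best_gain, best_len = gain, length
        elif best_gain - gain > x_drop:
            break  # score dropped too far below the best: stop extending
        i += direction
        j += direction
    return best_gain, best_len


def extend_seed(query, subject, qpos, spos, k, x_drop=2):
    """Extend an exact length-k seed in both directions without gaps."""
    seed_score = sum(score_pair(query[qpos + t], subject[spos + t]) for t in range(k))
    right_gain, right_len = _extend_one_way(query, subject, qpos + k, spos + k, +1, x_drop)
    left_gain, left_len = _extend_one_way(query, subject, qpos - 1, spos - 1, -1, x_drop)
    return {
        "score": seed_score + left_gain + right_gain,
        "query_start": qpos - left_len,
        "subject_start": spos - left_len,
        "length": left_len + k + right_len,
    }


# Seed "CGT" sits at position 3 in both toy sequences.
hsp = extend_seed("CCACGTAGGT", "TTACGTAGCA", qpos=3, spos=3, k=3)
print(hsp)  # {'score': 6, 'query_start': 2, 'subject_start': 2, 'length': 6}
```

The extended region is "ACGTAG" in both sequences. Whether it would be reported as an HSP then depends on comparing its score with a chosen threshold.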

        Specifically, the search space between two sequences can be visualized as a graph with one sequence along the X-axis and the other along the Y-axis (as shown in Figure 2). Each point in this space represents a pairing of two letters, one from each sequence. Each pair of letters has a score taken from a scoring matrix whose values were derived empirically. An alignment is a sequence of paired letters that may contain gaps. Ungapped alignments appear as diagonal lines in the search space, and the score of an ungapped alignment is simply the sum of the scores of the individual letter pairs. Alignments containing gaps appear as broken diagonals in the search space, and their score is the sum of the letter-pair scores minus the gap costs, which usually penalize opening a gap more heavily than extending one. In a BLAST report, unaligned regions aren't displayed, and gaps are represented by dashes. Whether an alignment comes out gapped or ungapped is not fixed: a simple change in parameters can change one into the other. Figure 2 shows only one gapped alignment.
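To make the scoring rule concrete, here is a small sketch that scores an already-aligned pair of rows. The match/mismatch values stand in for a real scoring matrix, and the gap costs (5 for the first gap position, 1 for each further position) are hypothetical numbers chosen for the example. Other conventions for counting the gap-opening cost exist.

```python
def alignment_score(row_a, row_b, match=2, mismatch=-1, gap_open=5, gap_extend=1):
    """Score two equal-length aligned rows in which '-' marks a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    in_gap_a = in_gap_b = False  # was the previous column a gap in row a / row b?
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if a == "-":
            score -= gap_extend if in_gap_a else gap_open
            in_gap_a, in_gap_b = True, False
        elif b == "-":
            score -= gap_extend if in_gap_b else gap_open
            in_gap_a, in_gap_b = False, True
        else:
            score += match if a == b else mismatch
            in_gap_a = in_gap_b = False
    return score


ungapped = alignment_score("ACGTAG", "ACGTAG")      # 6 matches * 2
gapped = alignment_score("ACGT--AG", "ACGTTTAG")    # 6 matches * 2 - (5 + 1)
print(ungapped, gapped)  # 12 6
```

With these numbers, one gap of length two costs 6 points, whereas two separate gaps of length one would cost 10, which illustrates why opening a gap is penalized more than extending one.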

Figure 2

        However, unlike other alignment algorithms, BLAST does not explore the entire search space between two sequences. Minimizing the search space is the key to its speed, but it comes at the cost of some sensitivity. This speed/sensitivity trade-off is a key concept when designing BLAST experiments. How exactly does BLAST find similarities without exploring the entire search space? It uses three layers of rules to sequentially refine potential high-scoring segment pairs (HSPs). These heuristic layers, known as seeding, extension, and evaluation, form a stepwise refinement procedure that allows BLAST to sample the entire search space without wasting time on dissimilar regions.
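For the evaluation layer, a common way to judge an HSP is the Karlin-Altschul expected number of chance hits, E = K * m * n * exp(-lambda * S), where S is the raw score, m the query length, and n the database length. The sketch below only evaluates that formula. The default values of `lam` and `k` are illustrative placeholders, not values taken from this article; in practice they depend on the scoring system in use.

```python
import math


def expected_hits(raw_score, query_len, db_len, lam=0.318, k=0.134):
    """Karlin-Altschul E-value: expected number of chance HSPs with score >= raw_score.

    lam and k are placeholder parameters (assumptions for this sketch).
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Hypothetical query of 250 letters searched against 1,000,000 database letters.
weak = expected_hits(raw_score=30, query_len=250, db_len=1_000_000)
strong = expected_hits(raw_score=80, query_len=250, db_len=1_000_000)
print(weak > strong)  # True: a higher score is less likely to arise by chance
```

The formula also shows why the same score is less significant in a larger database: E grows linearly with the database length.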

Epilogue
Understanding the algorithm therefore leads to the conclusion that BLAST is not the most accurate sequence alignment approach, but a pragmatic one: it is designed to find the most reasonable alignment of a query sequence against a large set of sequences within a limited time.

