The Research and Application Progress of Transcriptome Sequencing Technology (I)

Ivan Chen
5 min read · May 28, 2020


Transcriptomics is the discipline that systematically studies gene transcription profiles at the whole-transcriptome level and reveals the molecular mechanisms of complex biological pathways and trait regulatory networks. Before high-throughput sequencing was developed, the main methods for studying gene expression in animal and plant tissues were high-throughput gene expression arrays, which rely on cDNA hybridization and fluorescence detection, and serial analysis of gene expression (SAGE).

In the 1970s, the emergence of first-generation sequencing technology (Sanger dideoxy sequencing) made it possible to sequence nucleic acids. However, the throughput of the Sanger method is too low to meet the requirements of large-scale sequencing, so it is difficult to apply to omics research. High-throughput sequencing, also known as next-generation sequencing (NGS), made sequencing both high-throughput and automated, which accelerated the development of transcriptomics research. At present, the main next-generation sequencing platforms are the 454 sequencing technology launched by 454 Life Sciences and the Solexa and SOLiD sequencing technologies launched by Illumina and ABI, respectively. Third-generation sequencing, also called single-molecule sequencing, offers ultra-long reads (average read length 10–15 kb, maximum read length up to 60 kb) and is free of PCR amplification bias and GC preference. It is considered an ideal platform for de novo genome assembly, full-length transcript sequencing, and epigenetic sequencing.

  1. Transcriptome Sequencing

Transcriptome sequencing (RNA-seq) is a technique that uses high-throughput sequencing to sequence and analyze all or part of the mRNA, small RNA, and non-coding RNA in a cell or tissue. The general flow of transcriptome sequencing and analysis is shown in Figure 1. Currently, most transcriptome sequencing is based on next-generation sequencing technology, with Illumina's platform as the mainstream. In this method, RNA samples are processed according to the purpose of the experiment, some or all of the mRNA, miRNA, and lncRNA are reverse transcribed into a cDNA library, and the library is then sequenced on a high-throughput platform. The fragment size of the library is usually chosen according to the length of the molecules being sequenced. For mRNA sequencing, a library of fragments several hundred bp long is usually constructed, and paired-end (bidirectional) sequencing is preferred. For miRNA sequencing, the miRNA is usually isolated first, a separate small-fragment library is built, and single-end (unidirectional) sequencing is performed. Long non-coding RNA (lncRNA) is transcribed in both the sense and antisense directions, so strand-specific library sequencing is often used.
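The library choices described above can be summarized as a small lookup table. The following Python sketch only restates the paragraph; the names `LibraryPlan`, `LIBRARY_PLANS`, and `plan_for` are hypothetical, and fields the article does not specify are marked as such rather than filled in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryPlan:
    """One row of the summary: how a library is typically built for an RNA class."""
    fragment_size: str
    read_layout: str
    strand_specific: bool


# Hypothetical table that paraphrases the text above; it is not a lab protocol.
LIBRARY_PLANS = {
    "mRNA": LibraryPlan("several hundred bp", "paired-end (bidirectional)", False),
    "miRNA": LibraryPlan("small fragments", "single-end (unidirectional)", False),
    # The article only states that lncRNA libraries are often strand-specific.
    "lncRNA": LibraryPlan("not stated", "not stated", True),
}


def plan_for(rna_type: str) -> LibraryPlan:
    """Return the summarized library plan, or raise for an RNA class not covered."""
    try:
        return LIBRARY_PLANS[rna_type]
    except KeyError:
        raise ValueError(f"no library plan summarized for {rna_type!r}") from None


if __name__ == "__main__":
    for name in LIBRARY_PLANS:
        print(name, plan_for(name))
```

Keeping unspecified fields as "not stated" avoids suggesting parameters that the text does not give.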

Figure 1. The general flow of transcriptome sequencing and analysis

1.1. mRNA sequencing

For mRNA sequencing, the poly-A tail at the 3' end of mRNA is used to enrich the mature, intron-free mRNA molecules transcribed by specific tissues or cells under specific temporal and spatial conditions; the enriched mRNA is then reverse transcribed into cDNA for library construction and sequencing. The resulting reads can be aligned accurately to the reference genome sequence to determine the boundaries between exons and introns. For species without a reference genome, transcript sequences can be obtained by de novo assembly of the reads. Studying the transcriptomes of different species, tissues, and developmental stages can reveal species specificity and spatial and temporal differences in gene transcription, and provides transcriptome-level clues for an in-depth understanding of the molecular mechanisms behind species differences and traits.
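As a toy illustration of the idea that mRNA is selected by its 3' poly-A tail, the sketch below filters sequences in software by the length of their trailing run of A residues. This is only an analogy: the actual enrichment is a wet-lab step, and the function names and the `min_tail` threshold of 10 are assumptions made for the example, not values from the article.

```python
from typing import Dict, List


def poly_a_tail_length(seq: str) -> int:
    """Count the uninterrupted run of 'A' at the 3' end (right end) of a sequence."""
    seq = seq.upper()
    return len(seq) - len(seq.rstrip("A"))


def select_polyadenylated(transcripts: Dict[str, str], min_tail: int = 10) -> List[str]:
    """Return the names of sequences whose 3' A-run is at least min_tail long.

    min_tail is an arbitrary illustrative threshold.
    """
    return [
        name
        for name, seq in transcripts.items()
        if poly_a_tail_length(seq) >= min_tail
    ]


if __name__ == "__main__":
    example = {
        "with_tail": "ACGTGCTAGC" + "A" * 20,
        "without_tail": "ACGTGCTAGCTTGACC",
    }
    print(select_polyadenylated(example))
```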

1.2. Small RNA sequencing

Small RNA refers to RNA molecules 20–50 nt in length, including miRNA, siRNA, snoRNA, and piRNA, which regulate biological processes by participating in mRNA degradation, inhibiting translation, and promoting heterochromatin formation and epigenetic modification of DNA. Sequencing adapters are ligated to small RNAs by exploiting their characteristic 5'-end phosphate group and 3'-end hydroxyl group, and the resulting small RNA library is then size-selected for sequencing. The biological functions of miRNA are relatively conserved among species, which makes miRNA the focus of small RNA sequencing research.

1.3. lncRNA sequencing

Long non-coding RNA (lncRNA) is a class of RNA molecules longer than 200 nt that do not encode proteins and often show strong species and tissue specificity. Some lncRNAs are located in the enhancer regions of genes and carry out the enhancer function through their own transcription. lncRNA is regulated in a variety of ways and is widely present in many types of animal and plant cells. It can regulate the functions of various biological molecules by participating in the formation of chromosome structure and by binding to transcription factors, proteins, RNA precursors, and miRNA. Some lncRNAs carry a poly-A tail, so part of the lncRNA sequence information is often included in mRNA sequencing results. Current lncRNA research typically starts by finding differentially expressed lncRNA molecules and then predicts their regulatory relationship with key coding genes, mainly on the basis of the positional relationship between the two.
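The length ranges quoted in this article (20–50 nt for small RNA, more than 200 nt for lncRNA) can be written as a simple first-pass rule for non-coding transcripts. The sketch below is a minimal illustration under those two thresholds only; the function name is hypothetical, and real annotation would also need evidence such as coding-potential assessment, which the sketch does not attempt.

```python
def classify_noncoding_by_length(length_nt: int) -> str:
    """Label a non-coding transcript using only the length ranges given in the text."""
    if length_nt <= 0:
        raise ValueError("length must be a positive number of nucleotides")
    if 20 <= length_nt <= 50:
        return "small RNA"
    if length_nt > 200:
        return "lncRNA candidate"
    # Lengths outside both ranges are not covered by the article's definitions.
    return "unclassified"


if __name__ == "__main__":
    for length in (22, 120, 1500):
        print(length, classify_noncoding_by_length(length))
```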

1.4. circRNA sequencing

Circular RNA (circRNA) has a particularly stable closed-loop structure and is not easily degraded by RNases, so it is believed to be capable of long-term transcriptional regulation in vivo. The same genomic sequence may produce multiple types of circRNA molecules, because different combinations of exon and intron splicing allow a circRNA to contain multiple exon or intron sequences. circRNA acts as a "sponge" that adsorbs miRNA molecules and thereby intervenes in the miRNA regulation of mRNA. circRNA also has a competitive inhibitory effect on mRNA transcription at the same genomic location, and circRNAs that contain exons may open their loops and be translated. Because circRNA is stable in the organism, has tissue-specific expression patterns, and shows little correlation with host gene expression, it is considered to have great potential as a clinical diagnostic tool, as a molecular marker for disease prevention, and as a drug treatment target.

To be continued in Part II…
