Background and overview
An exome is the sum of all protein-coding sequences (i.e., exons) in the genomic DNA of an individual. The human exome accounts for only about 1% of the human genome, roughly 30 Mb, and comprises about 180,000 exons. An estimated 85% of human pathogenic mutations are located in this 1% of protein-coding sequence. Sequencing and analyzing the exomes of patients with various diseases therefore targets the region most relevant to disease and captures most of the disease-causing mutation information. When localizing the causative genes of single-gene disorders, it is common practice to assume that the disease-causing mutation is rare in patients and absent from commonly used databases. The usual strategy is therefore to filter the variants obtained by sequencing against commonly used databases (such as the dbSNP database, the HapMap project database and the 1000 Genomes Project database). Some researchers also filter against their own in-house databases.
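The filtering strategy described above can be illustrated with a minimal Python sketch. It assumes each variant is represented by its chromosome, position, reference allele and alternate allele, and that each database has already been loaded into a set. The `Variant` type, the `filter_novel` function and all coordinates are hypothetical and for illustration only; real pipelines work on VCF files and usually filter by population allele frequency rather than by exact presence.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position
    ref: str   # reference allele
    alt: str   # alternate allele


def filter_novel(variants, *known_databases):
    """Keep only variants that are absent from every supplied database."""
    known = set().union(*known_databases)
    return [v for v in variants if v not in known]


# Made-up coordinates for illustration; these are not real variants.
patient_variants = [
    Variant("1", 1000, "A", "G"),
    Variant("2", 2000, "C", "T"),
    Variant("X", 3000, "G", "A"),
]
# Stand-in for public resources such as dbSNP, HapMap or the 1000 Genomes Project.
public_db = {Variant("1", 1000, "A", "G")}
# Stand-in for a laboratory's own in-house database.
in_house_db = {Variant("2", 2000, "C", "T")}

candidates = filter_novel(patient_variants, public_db, in_house_db)
print(candidates)
```

Only the variant that appears in neither database remains as a candidate. Matching on the full (chromosome, position, ref, alt) tuple is a deliberate simplification, since a variant at the same position with a different alternate allele is treated as distinct.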
Whole exome sequencing (WES) is a novel genomic analysis technique that targets only the DNA of exon regions. It first captures and enriches exon-region DNA from the whole genome, then performs high-throughput sequencing, and finally applies bioinformatics analysis to identify pathogenic genes associated with rare and common diseases. Compared with whole genome sequencing, WES currently offers lower cost and higher efficiency. Since 2009, WES has been widely used in the study of Mendelian genetic diseases, rare syndromes and complex diseases, with considerable success. In recent years, exome sequencing has been used to localize and clone the pathogenic genes of many intractable diseases, and exome capture sequencing has demonstrated its value in basic and translational medicine.
Applications of WES in diseases
After exome sequencing was successfully applied to Freeman-Sheldon syndrome and Miller syndrome, it was used to identify MLL2 as the disease-causing gene of Kabuki syndrome. Target sequence capture sequencing has also been applied successfully to identify the pathogenic genes of single-gene genetic diseases, which further reduces the cost of pathogenic gene cloning. For example, X-exome capture sequencing was applied to the study of terminal osseous dysplasia: by sequencing two unrelated patients, FLNA was identified as the causative gene. Exome capture sequencing has also achieved significant results in the study of polygenic diseases.
In 2011, the Chen Saijuan group of the School of Medicine of Shanghai Jiaotong University found, by exome sequencing of acute monocytic leukemia samples, that DNMT3A mutations occur at high frequency in patients with this type of leukemia. Further studies confirmed that DNA methylation is related to the occurrence of the disease. Exome sequencing of clear cell renal cell carcinoma tissues identified the SWI/SNF chromatin remodeling complex gene PBRM1 as the second major cancer gene of this carcinoma, further confirming the role of somatic mutation in the disease and emphasizing the role of chromatin biology in the pathogenesis of cancer.
Sequencing of 10 pancreatic neuroendocrine tumor tissues showed that MEN1, DAXX and ATRX, genes involved in chromatin remodeling, carry somatic mutations at high frequency, as do genes in the mTOR pathway.
By sequencing two cases of metastatic uveal melanoma, Harbour et al. found that the BAP1 gene is somatically mutated at high frequency.
In 2011, the Cai Zhiming group of Peking University Shenzhen Hospital sequenced the exomes of eight human bladder transitional cell carcinoma tissues and found that the UTX, MLL-MLL3, CREBBP-EP300, NCOR1, ARID1A and CHD6 genes are mutated at high frequency in these tissues, providing new ideas for the study of the genetic basis of transitional cell carcinoma.
Studies that resequenced ion channel-related genes suggest that obtaining variants by large-sample sequencing is only the first step in disease research, and that functional studies in cells or model systems are more critical. Exome sequencing studies of polygenic diseases, including cancer, are currently carried out at the tissue level, and studies at the germline level have not been reported. In the study of polygenic diseases, exome sequencing tends to serve as an exploratory approach: a small sample is exome-sequenced first, and the findings are then verified in a large sample by target sequence capture or direct Sanger sequencing, which greatly reduces the cost of research.
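The two-stage design described above (small-sample discovery followed by large-sample validation) can be sketched in Python as follows. This is an illustrative outline under simplifying assumptions, not a published analysis method: each sample is reduced to the set of genes in which it carries a mutation, and the function names, the `min_samples` threshold and the placeholder gene labels (`GENE_A`, `GENE_B`, `GENE_C`) are all hypothetical.

```python
from collections import Counter


def recurrent_genes(discovery_samples, min_samples=2):
    """Return genes mutated in at least `min_samples` discovery samples."""
    counts = Counter(gene for genes in discovery_samples for gene in set(genes))
    return {gene for gene, n in counts.items() if n >= min_samples}


def validation_frequency(candidates, validation_samples):
    """Fraction of validation samples mutated in each candidate gene."""
    total = len(validation_samples)
    if total == 0:
        raise ValueError("validation cohort is empty")
    return {
        gene: sum(gene in genes for genes in validation_samples) / total
        for gene in sorted(candidates)
    }


# Stage 1: small discovery cohort with exome-wide calls (placeholder data).
discovery = [
    {"GENE_A", "GENE_B"},
    {"GENE_A", "GENE_C"},
    {"GENE_A", "GENE_B"},
]
# Stage 2: larger validation cohort, assayed only for the candidate genes,
# for example by target sequence capture or Sanger sequencing.
validation = [{"GENE_A"}, {"GENE_B"}, {"GENE_A", "GENE_B"}, set()]

candidates = recurrent_genes(discovery, min_samples=2)
frequencies = validation_frequency(candidates, validation)
print(frequencies)
```

Restricting the second stage to genes that recur in the discovery cohort is what keeps the validation assay small, which is the source of the cost saving mentioned in the text.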
A brief introduction to sequencing platforms
Currently, the three major exome capture platforms in commercial use are SureSelect Human All Exon 50Mb (Agilent), SeqCap EZ Exome Library v2.0 (Roche NimbleGen) and TruSeq Exome Enrichment (Illumina). In 2011, Clark et al. of Stanford University School of Medicine conducted a comprehensive evaluation of these three commercial platforms.
Further reading: Principles and Workflow of Whole Exome Sequencing