中国普通外科杂志 (China Journal of General Surgery), 2021, Vol. 30, Issue (3): 276-285. doi: 10.7659/j.issn.1005-6947.2021.03.005

• Special Topic Research •

Identification of hub genes in pancreatic adenocarcinoma and construction of a support vector machine diagnostic classifier based on bioinformatics approaches

ZHANG Bo1, XU Tao1, XU Hao2, XIA Yu1, ZHOU Wence1,2

  1. The First Clinical Medical College, Lanzhou University, Lanzhou 730000, China; 2. Department of General Surgery, the First Hospital of Lanzhou University, Lanzhou 730000, China
  • Received: 2020-10-29 Revised: 2021-02-12 Online: 2021-03-25 Published: 2021-04-06
  • Corresponding author: ZHOU Wence, Email: zhouwc@lzu.edu.cn
  • About the author: ZHANG Bo, resident physician at the First Clinical Medical College, Lanzhou University, mainly engaged in clinical and basic research on hepatobiliary and pancreatic diseases.
  • Funding:
    Supported by the Key Research and Development Program of Gansu Province (17YF1FA128) and the Talent Innovation and Entrepreneurship Fund of Lanzhou City, Gansu Province (2017-RC-37).

Abstract: Background and Aims: Pancreatic cancer is a common malignant tumor of the digestive tract, and its main pathological type is pancreatic adenocarcinoma (PAAD). Because early diagnosis is difficult and effective treatments are lacking, the prognosis of PAAD is extremely poor, so identifying new diagnostic and therapeutic targets is of great significance. This study used bioinformatics analysis to screen hub genes related to the diagnosis and prognosis of PAAD and constructed a support vector machine (SVM) model to classify PAAD and normal pancreatic samples, with the aim of providing a basis for research on the diagnosis, treatment, and mechanisms of PAAD.
Methods: Three microarray datasets (GSE28735, GSE62165, GSE62452) were downloaded from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) between PAAD tissue and normal pancreatic tissue were screened using the limma package in R. GO and KEGG pathway enrichment analyses of the DEGs were performed using the STRING database. A protein-protein interaction (PPI) network of the DEGs was then built with STRING and visualized in Cytoscape, and key subnetworks were identified with the MCODE plug-in. The R package survival was used to screen prognosis-related key nodes in the PPI network and key subnetworks, and these key nodes were uploaded to Metascape for functional enrichment analysis. The recursive feature elimination (RFE) algorithm in the R package caret was used to select the optimal feature genes among the key nodes, and their differential expression was verified in the GEPIA database. An SVM classifier based on the optimal feature genes was constructed with the R package e1071 and validated in the 3 microarray datasets using the R package pROC. In the TCGA database, the R package survminer was used to select, from the optimal feature genes, those related to PAAD prognosis as the hub genes.
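As a rough illustration of the DEG screening step, the sketch below flags genes by log2 fold change and a Welch t statistic. It is a simplified stand-in for limma's moderated statistics, not the authors' procedure. The function names, gene labels, toy expression values, and both cutoffs are assumptions, since the abstract does not state its thresholds.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumes non-constant values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def screen_degs(tumor, normal, lfc_cut=1.0, t_cut=2.0):
    """Return {gene: "up" | "down"} for genes passing both cutoffs.

    tumor / normal map a gene label to a list of log2 expression values.
    lfc_cut and t_cut are illustrative assumptions, not values from the abstract.
    """
    calls = {}
    for gene in tumor:
        lfc = mean(tumor[gene]) - mean(normal[gene])  # log2 fold change
        t = welch_t(tumor[gene], normal[gene])
        if abs(lfc) >= lfc_cut and abs(t) >= t_cut:
            calls[gene] = "up" if lfc > 0 else "down"
    return calls


# Hypothetical toy data: three genes, four samples per group.
tumor = {
    "GENE_A": [8.1, 8.4, 7.9, 8.6],
    "GENE_B": [5.0, 5.2, 4.9, 5.1],
    "GENE_C": [3.0, 2.8, 3.3, 2.9],
}
normal = {
    "GENE_A": [6.0, 6.2, 5.9, 6.1],
    "GENE_B": [5.1, 5.0, 5.2, 4.9],
    "GENE_C": [5.1, 4.9, 5.3, 5.0],
}
degs = screen_degs(tumor, normal)
print(degs)
```

In practice limma additionally shrinks per-gene variances and corrects for multiple testing, which matters when thousands of probes are tested on few samples.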
Results: A total of 257 DEGs were identified, including 168 up-regulated and 89 down-regulated genes. GO analysis showed that the DEGs were mainly involved in extracellular matrix organization, cell adhesion, and serine-type peptidase activity. KEGG analysis showed that the DEGs were mainly enriched in protein digestion and absorption, pancreatic secretion, focal adhesion, and the PI3K-Akt signaling pathway. Survival analysis identified 14 key nodes associated with prognosis in both GSE28735 and GSE62452 (all P<0.05), and these genes play a role in tumor invasion and tumorigenesis. RFE selected 8 optimal feature genes: LAMA3, FN1, ITGA3, MET, PLAU, CENPF, MMP14, and OAS2. Validation in the GEPIA database showed that all 8 were significantly up-regulated in PAAD tissues (all P<0.01). The AUCs of the ROC curves for the SVM model built from these genes were 0.898, 1.000, and 0.905 in the 3 microarray datasets, respectively. Validation in the TCGA database showed that up-regulation of LAMA3, ITGA3, MET, PLAU, CENPF, and OAS2 was associated with poor prognosis of PAAD (all P<0.05).
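To make the reported AUC values concrete, here is a minimal sketch of how an ROC AUC can be computed from classifier scores, using its rank interpretation: the probability that a randomly chosen tumor sample scores higher than a randomly chosen normal sample. This is not the pROC implementation. The function name, labels, and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for tumor, 0 for normal. Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: one tumor sample (0.4) ranks below one normal sample (0.5).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

Under this reading, an AUC of 1.000 would mean that, up to rounding, every tumor sample in that dataset received a higher score than every normal sample.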
Conclusion: The hub genes LAMA3, ITGA3, MET, PLAU, CENPF, and OAS2 may be new targets for the diagnosis or treatment of PAAD. The SVM model based on the 8 optimal feature genes offers an effective tool for diagnosing PAAD.
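The feature selection and classification idea can be sketched as follows, assuming a linear SVM whose absolute weights rank the features. This is one common RFE variant and not necessarily the ranking function or kernel used through caret and e1071, which the abstract does not specify. All names, hyperparameters, and toy values below are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss.

    X: list of feature rows; y: labels in {+1, -1}.
    lam, lr, and epochs are illustrative assumptions.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample violates the margin, so the hinge loss is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, row):
    """Classify one sample: +1 (tumor) or -1 (normal)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, row)) + b >= 0 else -1


def rfe(X, y, names, keep):
    """Repeatedly refit and drop the feature with the smallest |weight|."""
    active = list(range(len(names)))
    while len(active) > keep:
        sub = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(sub, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return [names[j] for j in active]


# Hypothetical toy data: GENE_1 and GENE_3 separate the classes, GENE_2 does not.
names = ["GENE_1", "GENE_2", "GENE_3"]
X = [[2.0, 0.5, 1.0], [2.0, -0.5, 1.0], [-2.0, 0.5, -1.0], [-2.0, -0.5, -1.0]]
y = [1, 1, -1, -1]

selected = rfe(X, y, names, keep=2)
cols = [names.index(g) for g in selected]
w, b = train_linear_svm([[row[j] for j in cols] for row in X], y)
predictions = [predict(w, b, [row[j] for j in cols]) for row in X]
print(selected, predictions)
```

A real analysis would standardize expression values, tune hyperparameters by cross-validation, and evaluate on samples not used for feature selection, none of which this sketch attempts.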

Key words: Pancreatic Neoplasms, Gene Expression Profiling, Support Vector Machine, Computational Biology

CLC number: R735.9