A Big Data Approach in Mutation Analysis and Prediction

  • Silvana Albert, Department of Computer Science, Faculty of Mathematics and Computer Science, Babeș-Bolyai University, Cluj-Napoca, Romania


Although technology has advanced rapidly in the last few years, many medical problems still lack an accessible solution. One of these is a problem facing genetics: the absence of a tool for inspecting previously reported genetic mutations. To confirm a mutation, specialists must narrow it down based on their experience and, where available, the few documented precedent cases. This paper presents a solution for analyzing large amounts of historical genetic data in an efficient, fast and user-friendly way. As a proof of concept, it demonstrates the significant role that Big Data can play in aggregating genetic mutations, and it can serve as a starting point for similar solutions that aim to continuously innovate genetics. The effectiveness of our proposal is highlighted by comparing it with similar existing solutions.


How to Cite
ALBERT, Silvana. A Big Data Approach in Mutation Analysis and Prediction. Studia Universitatis Babeș-Bolyai Informatica, [S.l.], v. 62, n. 1, p. 75-89, may 2017. ISSN 2065-9601. Available at: <http://www.cs.ubbcluj.ro/~studia-i/journal/journal/article/view/7>. Date accessed: 29 nov. 2020. doi: https://doi.org/10.24193/subbi.2017.1.06.