A Big Data Approach in Mutation Analysis and Prediction
Although technology has advanced rapidly in the last few years, many medical problems still lack an accessible solution. One such problem arises in genetics: there is no practical way to inspect previously reported genetic mutations. To confirm a mutation, specialists must narrow it down based on their own experience and, where available, the few documented precedent cases. This paper presents a solution for analyzing large amounts of historical genetic data in an efficient, fast and user-friendly way. As a proof of concept, it demonstrates the significant role that Big Data can play in aggregating genetic mutations, and it can serve as a starting point for similar solutions that aim to continuously innovate in genetics. The effectiveness of our proposal is highlighted by comparing it with similar existing solutions.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
When the article is accepted for publication, I, as the author and representative of the coauthors, hereby agree to transfer to Studia Universitatis Babes-Bolyai, Series Informatica, all rights, including those pertaining to electronic forms and transmissions, under existing copyright laws, except for the following, which the author specifically retains: the right to make further copies of all or part of the published article for my use in classroom teaching; the right to reuse all or part of this material in a review or in a textbook of which I am the author; the right to make copies of the published work for internal distribution within the institution that employs me.