STC 2015

Abstract Details

10/14/2015 | 2:00 PM - 2:45 PM | Atlantic II

A Framework to Maintain Data Relevance in a Data Lake

A Data Lake, by definition, accepts data from any source, in any format, arriving at any time. As a result, a Data Lake will eventually have to deal with issues such as data quality, data validity, and data misuse. In our approach, various types of data were generated to meet the entry criteria of the Data Lake. The header associated with each input was parsed to identify the data source, the validity of the subscription, the category of the data, the type and size of the data, and the duration for which the data should be persisted. We created a multiple-data-entry environment along with a database for persistence. The Primary Node recognizes the input data stream and directs it for classification and storage based on a secondary attribute and a name. The framework identifies the various types of data in the incoming stream and stores them in the repository in a structured, organized manner. By enforcing subscription, hierarchy, and the validity period of the data, the framework prevents indiscriminate dumping of data into the Data Lake. Data is segregated and stored based on internal logical separation, which eliminates the risk of the Data Lake turning into a Data Swamp and helps maintain the relevance of the data.
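The abstract does not specify the header format, the subscription registry, or the storage layout. The Python sketch below is a minimal illustration under those gaps, assuming a hypothetical `key=value;...` header carrying the fields the abstract lists (source, subscription, category, type, size, retention period). It shows one way a primary node could reject data with an unknown or lapsed subscription, file admitted data under a category/source/type hierarchy, and purge data whose validity period has passed. All names (`Header`, `parse_header`, `PrimaryNode`, the header keys) and the size check are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple


@dataclass
class Header:
    """Hypothetical ingest header; field names are illustrative assumptions."""
    source: str
    subscription_id: str
    category: str
    data_type: str
    size_bytes: int
    retention_days: int


def parse_header(raw: str) -> Header:
    """Parse an assumed 'key=value;key=value' header into a Header."""
    fields = dict(part.split("=", 1) for part in raw.strip().split(";") if part)
    return Header(
        source=fields["source"],
        subscription_id=fields["subscription"],
        category=fields["category"],
        data_type=fields["type"],
        size_bytes=int(fields["size"]),
        retention_days=int(fields["retention_days"]),
    )


class PrimaryNode:
    """Admits data only with a valid subscription, then files it by hierarchy."""

    def __init__(self, subscriptions: Dict[str, datetime]):
        # Hypothetical registry: subscription_id -> subscription expiry time
        self.subscriptions = subscriptions
        # Storage path -> list of (expires_at, payload) records
        self.store: Dict[str, List[Tuple[datetime, bytes]]] = {}

    def ingest(self, raw_header: str, payload: bytes, now: datetime) -> Optional[str]:
        """Return the storage path on success, or None if the data is rejected."""
        header = parse_header(raw_header)
        expiry = self.subscriptions.get(header.subscription_id)
        if expiry is None or expiry < now:
            return None  # reject: unknown or lapsed subscription
        if header.size_bytes != len(payload):
            return None  # reject (assumed rule): declared size does not match payload
        # Logical separation by hierarchy: category / source / type
        path = f"{header.category}/{header.source}/{header.data_type}"
        expires_at = now + timedelta(days=header.retention_days)
        self.store.setdefault(path, []).append((expires_at, payload))
        return path

    def purge_expired(self, now: datetime) -> int:
        """Drop records whose validity period has passed; return how many were dropped."""
        dropped = 0
        for path in list(self.store):
            kept = [rec for rec in self.store[path] if rec[0] > now]
            dropped += len(self.store[path]) - len(kept)
            if kept:
                self.store[path] = kept
            else:
                del self.store[path]
        return dropped
```

For example, with `PrimaryNode({"sub-1": datetime(2016, 1, 1)})`, ingesting a 2-byte payload whose header names `category=sensor`, `source=s1`, and `type=json` on 2015-10-14 files it under `sensor/s1/json`, while the same data under an unregistered subscription is rejected. Recording the expiry time at ingest means a periodic `purge_expired` pass only has to compare timestamps, which is one simple way to enforce a validity period.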

Tejas Dharamsi (Primary Presenter, Author), P.E.S Institute of Technology, tejasdharamsi@pesit.pes.edu;
Tejas Dharamsi is currently pursuing his Bachelor's degree in Computer Science Engineering at PES Institute of Technology, Bangalore, India. He is currently a Member of Technical Staff at Ordell Ugo, an environment that fosters undergraduate research. He was a student developer for Freenet Project Inc. in Google Summer of Code 2014 and is the Google Student Ambassador (India, 2014) for PES Institute of Technology. He was also a research intern with the Carnegie Mellon University Electrical and Computer Engineering Department during the summer of 2015.

Satyanarayana Srinivas (Primary Presenter, Author), P.E.S University, satyasrinivas@pes.edu;
Satya Srinivas is a Research Scholar at PES University whose research interests are Data Mining & Analytics and Big Data Analytics with Visualisation. He teaches undergraduate and postgraduate students and guides them in research. His keen interest stems from the fact that the data explosion has already happened and the data needs to be contained in order to study its hidden nuggets. He has been in the industry for 25 years and has led a number of multi-million-dollar deals with Fortune 500 companies.

Sadiur Rahman (Co-Presenter, Author, Co-Author), P.E.S Institute of Technology, sadirahman1618@gmail.com;
Sadiur Rahman is currently pursuing his Bachelor's degree in Computer Science Engineering at PES Institute of Technology, Bangalore, India. He is interested in Data Mining, Big Data, and its analytics. He is currently a Mozilla Firefox Student Ambassador for P.E.S Institute of Technology. He is also the main web developer for Mera Medicare, a U.S.-based pharmaceutical firm.

Rupam Rai (Author, Co-Author), P.E.S Institute of Technology, write2rupam855@gmail.com;
Rupam Rai is currently pursuing her Bachelor's degree in Computer Science Engineering at PES Institute of Technology, Bangalore, India. She is interested in Data Mining, Big Data, and its analytics. She was previously an intern in the C&IT department at S.A.I.L.