Big Data Analytics: Predicting Academic Course Preference Using Hadoop-Inspired MapReduce


  • K. Manasa
  • Md. Asim


With the rapid development of new technologies, new academic courses are being introduced into the educational system. This produces large volumes of unregulated data, and it becomes challenging for students to choose the courses that will best improve their career prospects. A further challenge is converting this unregulated data into structured, meaningful information, which calls for data mining tools. The Hadoop Distributed File System (HDFS) is used to hold large amounts of data, storing files redundantly across multiple machines. Because the data is unstructured, extracting information from it is complex and difficult to do in a short time. To handle large amounts of data, data mining systems rely on such file systems for decision-making. Knowledge extracted using MapReduce will help students decide which courses to choose for industrial training. MapReduce jobs run over Hadoop clusters by splitting the big data into small chunks and processing those chunks in parallel across the distributed cluster. The present work holds that the efficiency of HDFS tools in data handling and extraction can only be evaluated on large volumes of data, not on systems where the data is minimal, as in our project, where the course data available to students is too small. It is also observed that HDFS-based systems are not as effective for batch processing and real-time applications. To overcome these two problems, we propose to work on the UIDAI Aadhaar real-time dataset and perform data analysis using the Apache Spark ecosystem.
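The split, map, shuffle, and reduce steps described above can be illustrated with a minimal single-machine sketch in Python. It assumes hypothetical input lines of the form `student_id,course` and counts how many students prefer each course. The chunk size, the thread pool standing in for cluster nodes, the sample records, and all function names are illustrative assumptions; this is neither the authors' implementation nor the Hadoop API.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator


def split_into_chunks(records: list, chunk_size: int) -> Iterator[list]:
    """Split the input into fixed-size chunks, loosely mimicking HDFS input splits."""
    it = iter(records)
    while chunk := list(islice(it, chunk_size)):
        yield chunk


def map_phase(chunk: list) -> list:
    """Emit (course, 1) for every line of the assumed form 'student_id,course'."""
    pairs = []
    for line in chunk:
        _student, course = line.split(",", 1)
        pairs.append((course.strip(), 1))
    return pairs


def shuffle(mapped: Iterable[list]) -> dict:
    """Group intermediate values by key, as the shuffle/sort step does."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups: dict) -> dict:
    """Sum the intermediate counts for each course."""
    return {course: sum(values) for course, values in groups.items()}


def count_preferences(records: list, chunk_size: int = 2, workers: int = 4) -> dict:
    chunks = list(split_into_chunks(records, chunk_size))
    # Threads stand in for cluster nodes running map tasks in parallel (assumption).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical toy records; not data from the study.
    sample = ["s1,Data Science", "s2,Cloud Computing", "s3,Data Science",
              "s4,Cyber Security", "s5,Data Science"]
    counts = count_preferences(sample)
    print(Counter(counts).most_common(1))  # [('Data Science', 3)]
```

On a real cluster the map tasks would run on the nodes holding each data block and the framework would perform the shuffle; the sketch only keeps the same three-stage data flow so the logic can be followed on one machine.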