Data Preprocessing: A preliminary step for web data mining

  • Huma Jamshed
  • M. Sadiq Ali Khan
  • Muhammad Khurram
  • Syed Inayatullah
  • Sameen Athar

Abstract

Recent years have seen immense growth in data volume (big data), which holds promise for a brighter and more optimized future. Big data demands large computational infrastructure with high-performance processing capabilities. Preparing big data for mining and analysis is challenging: raw data must be preprocessed to improve its quality, because the representation and quality of the data instances are of foremost importance. Data preprocessing is a preliminary data mining practice in which raw data is transformed into a format suitable for subsequent processing. It improves data quality by cleaning, normalizing, and transforming raw data and by extracting relevant features from it. Data preprocessing significantly improves the performance of machine learning algorithms, which in turn leads to more accurate data mining. Knowledge discovery from noisy, irrelevant, and redundant data is difficult; consequently, precisely identifying extreme values and outliers and filling in missing values pose challenges. This paper discusses various big data preprocessing techniques used to prepare data for mining and analysis tasks.
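
As a rough illustration of three of the steps named in the abstract (filling missing values, identifying outliers, and normalizing), the following minimal Python sketch may help. It is not taken from the paper: the function names, the median imputation strategy, the 1.5 × IQR outlier rule, min-max scaling, and the sample values are all assumptions chosen for illustration only.

```python
from statistics import median, quantiles

# Hypothetical helpers for illustration; not the paper's method.

def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Made-up sample: one missing reading and one extreme value.
raw = [12.0, 15.0, None, 14.0, 13.0, 250.0, 16.0]

filled = fill_missing(raw)                           # cleaning: impute the gap
outliers = iqr_outliers(filled)                      # flag extreme values
cleaned = [v for v in filled if v not in outliers]   # drop flagged values
scaled = min_max_normalize(cleaned)                  # normalize to [0, 1]
```

Whether an outlier should be dropped, capped, or kept depends on the data and the mining task; dropping is used here only to keep the sketch short.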

Published
2019-05-17