The effect of data cleaning on the prediction accuracy of machine learning models

  • Zidan Firdaus Tirta, Program Studi Teknik Informatika, Universitas Islam Negeri Maulana Malik Ibrahim Malang
Keywords: data imputation; mean absolute error; linear regression

Abstract

This study evaluates the impact of data cleaning on the accuracy of machine learning predictions using the "World Energy Consumption" dataset, which covers energy consumption from 1965 to 2023. Two approaches to handling missing values were compared: imputation with the forward fill (ffill) method, and replacing missing values with zero. A linear regression model was used to predict energy consumption for 2023, and the Mean Absolute Error (MAE) was calculated to assess model performance. The MAE of the model trained on forward-filled data was higher than that of the model trained on data with missing values replaced by zero. This finding suggests that ffill imputation can be suboptimal when the preceding values are not relevant to the missing ones, whereas replacing missing values with zero gave more stable results in this experiment. The study recommends exploring alternative imputation methods and testing additional datasets to improve prediction accuracy in future work.
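The comparison described in the abstract can be illustrated with a minimal sketch. It is not the author's code: the series, the years, the 2023 targets, and the helper names (`forward_fill`, `zero_fill`, `fit_line`, `mae`, `evaluate`) are all hypothetical, and a per-series least-squares line over the year index stands in for the study's linear regression setup, which the abstract does not specify. MAE is taken as the mean of the absolute differences between actual and predicted values.

```python
from statistics import mean


def forward_fill(values):
    """Replace each None with the most recent non-missing value.

    Leading gaps stay None, as there is no earlier value to carry forward.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def zero_fill(values):
    """Replace each None with 0.0."""
    return [0.0 if v is None else v for v in values]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def mae(y_true, y_pred):
    """Mean Absolute Error between two equal-length sequences."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical data: consumption history per series (None = missing)
# and the held-out 2023 value. These numbers are invented for illustration.
years = [2018, 2019, 2020, 2021, 2022]
series = {
    "A": ([10.0, None, 12.0, 13.0, None], 15.0),
    "B": ([50.0, 52.0, None, None, 58.0], 60.0),
}


def evaluate(fill):
    """Fill gaps with `fill`, fit one line per series, score the 2023 forecasts."""
    actual, predicted = [], []
    for history, target in series.values():
        a, b = fit_line(years, fill(history))
        predicted.append(a + b * 2023)
        actual.append(target)
    return mae(actual, predicted)


mae_ffill = evaluate(forward_fill)
mae_zero = evaluate(zero_fill)
print(f"MAE with forward fill: {mae_ffill:.3f}")
print(f"MAE with zero fill:    {mae_zero:.3f}")
```

Which strategy scores lower depends entirely on the data: on these invented series the outcome says nothing about the study's result, and the sketch only shows how the two cleaning strategies feed the same model and metric.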

Published
2024-12-02
How to Cite
Tirta, Z. F. (2024). Pengaruh pembersihan data terhadap akurasi prediksi model machine learning. Maliki Interdisciplinary Journal, 2(11). Retrieved from https://urj.uin-malang.ac.id/index.php/mij/article/view/10708
Section
Articles