The effect of data cleaning on the prediction accuracy of machine learning models
Abstract
This study evaluates how data cleaning affects the accuracy of machine learning predictions, using the "World Energy Consumption" dataset, which covers energy consumption from 1965 to 2023. Two approaches to handling missing values were compared: imputation with the forward fill (ffill) method, and replacing missing values with zero. A linear regression model was used to predict energy consumption for 2023, and the Mean Absolute Error (MAE) was calculated to assess model performance. The model trained on ffill-imputed data had a higher MAE than the model trained on data with missing values replaced by zero. This finding suggests that ffill imputation may not be optimal when the preceding values are not relevant to the missing ones, whereas replacing missing values with zero gave more stable results on this dataset. The study recommends exploring alternative imputation methods and testing on a variety of datasets to improve prediction accuracy.
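The comparison described in the abstract can be illustrated with a minimal sketch. It is not the study's code: the series values, region names, and helper functions below are hypothetical, and a hand-written least-squares fit stands in for whatever linear regression implementation the author used. It fills gaps with either forward fill or zero, fits a line to the years before 2023, predicts 2023, and reports the MAE for each strategy.

```python
# Illustrative sketch only: toy data and helper names are assumptions,
# not taken from the "World Energy Consumption" dataset or the study.

def forward_fill(values):
    """Replace each None with the most recent non-missing value.
    Leading gaps have no previous value and therefore stay None."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def zero_fill(values):
    """Replace each None with 0.0."""
    return [0.0 if v is None else v for v in values]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def evaluate(fill, series, train_years, target_year):
    """Fit one line per region on filled data and return the MAE
    of the predictions for target_year across all regions."""
    actual, predicted = [], []
    for history, target_value in series.values():
        filled = fill(history)
        # Drop points that are still missing (leading gaps after ffill).
        pairs = [(x, y) for x, y in zip(train_years, filled) if y is not None]
        slope, intercept = fit_line([x for x, _ in pairs], [y for _, y in pairs])
        actual.append(target_value)
        predicted.append(slope * target_year + intercept)
    return mean_absolute_error(actual, predicted)

# Hypothetical yearly consumption for 2018-2022, plus the known 2023 value.
TRAIN_YEARS = [2018, 2019, 2020, 2021, 2022]
SERIES = {
    "region_a": ([100.0, None, 104.0, 106.0, None], 110.0),
    "region_b": ([50.0, 52.0, None, None, 58.0], 60.0),
}

mae_ffill = evaluate(forward_fill, SERIES, TRAIN_YEARS, 2023)
mae_zero = evaluate(zero_fill, SERIES, TRAIN_YEARS, 2023)
print(f"MAE with forward fill: {mae_ffill:.2f}")
print(f"MAE with zero fill:    {mae_zero:.2f}")
```

Which strategy yields the lower MAE depends on the data: the ranking produced by this toy input need not match the ranking reported in the study. With a tabular dataset, the two fill steps would more commonly be done with pandas (`DataFrame.ffill()` and `DataFrame.fillna(0)`).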
Copyright (c) 2024 Zidan Firdaus Tirta

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work’s authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.



