A Study on Data Scaling Methods for Machine Learning

Authors

DOI:

https://doi.org/10.55938/ijgasr.v1i1.4

Keywords:

Scalability in Machine Learning, Data Scaling For Machine Learning, Utility of Scalability in Machine-Learning, Scalability Methods of Machine-Learning

Abstract

Machine learning (ML), a computational self-learning approach, is expected to be applied in a wide variety of settings. Unlike traditional software, which is written line by line as explicit instructions, ML relies on models built with a learning structure. These models are trained on historical data and then used to produce results. Scalability is a major challenge in real-world ML applications. Many ML-based systems must analyze new data and produce forecasts quickly, because a forecast becomes meaningless after a few ticks (consider real-time settings such as stock markets and clickstream data). Many other ML applications must scale to gigabytes or terabytes of data during model training (for example, when a model is learned from a web-scale image corpus). High-dimensional problems pose further obstacles for ML practitioners, who are increasingly concerned with scalability as well as algorithm quality. Against this backdrop, this overview article collects, investigates, and analyzes the current state, aspects, and perspectives of scalability in ML platforms, and the various ways it can be incorporated to improve efficiency and reliability when processing large amounts of data.

Published

2022-02-23

How to Cite

Sharma, V. (2022). A Study on Data Scaling Methods for Machine Learning. International Journal for Global Academic & Scientific Research, 1(1), 31–42. https://doi.org/10.55938/ijgasr.v1i1.4