Reliability of Training Data Sets for ML Classifiers: A Lesson Learned from Mechanical Engineering

Juric, Radmila; Danilchanka, Natallia

dc.contributor.author	Juric, Radmila
dc.contributor.author	Danilchanka, Natallia
dc.date.accessioned	2021-04-06T12:24:13Z
dc.date.available	2021-04-06T12:24:13Z
dc.date.created	2021-03-10T12:19:27Z
dc.date.issued	2020
dc.identifier.citation	Juric, R., Danilchanka, N., & Mousavi, M. G. (2020, January). Reliability of Training Data Sets for ML Classifiers: A Lesson Learned from Mechanical Engineering. In T. X. Bui (Ed.), Proceedings of the 53rd Hawaii International Conference on System Sciences (pp. 891-900).	en_US
dc.identifier.isbn	978-0-9981331-3-3
dc.identifier.uri	https://hdl.handle.net/11250/2736402
dc.description.abstract	The popularity of learning and predictive technologies across many problem domains is unprecedented. It is often underpinned by the fact that we can compute efficiently with vast amounts of data and many data types, and should therefore be able to resolve problems that we could not solve in the past. This view is particularly common among scientists who believe that the excessive amount of data we generate in real life is ideal for performing predictions and training algorithms. However, the truth might be quite different. The paper illustrates the process of preparing a training data set for an ML classifier that should predict certain conditions in mechanical engineering. The difficulty did not lie in defining and choosing classifiers that would secure safe predictions. The problem was our inability to create a safe, reliable and trustworthy training data set from scientifically proven experiments. This casts serious doubt on the way we use learning and predictive technologies today. It remains debatable what the next step should be. However, if the semantics built into data sets influence the definition of ML algorithms, and of classifiers in particular, it would be very difficult to evaluate and rely on these algorithms before we fully understand the data semantics. In other words, we still do not know how the semantics, sometimes hidden in a data set, can adversely affect the algorithms trained on it.	en_US
dc.language.iso	eng	en_US
dc.relation.ispartof	Proceedings of the 53rd Hawaii International Conference on System Sciences (HICSS 2020)
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/deed.no	*
dc.title	Reliability of Training Data Sets for ML Classifiers: A Lesson Learned from Mechanical Engineering	en_US
dc.type	Chapter	en_US
dc.description.version	publishedVersion	en_US
dc.source.pagenumber	891-900	en_US
dc.identifier.doi	https://doi.org/10.24251/HICSS.2020.111
dc.identifier.cristin	1896950
cristin.ispublished	true
cristin.fulltext	original

Associated file(s)

Filename: 2020Juric%2CDanilchanka%26Mous ...
Size: 383.1Kb
Format: PDF
Description: 2020JuricReliability

This item appears in the following collection(s)

Institutt for realfag og industrisystemer (Department of Science and Industry Systems)
Publikasjoner fra CRIStin (Publications from CRIStin)

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International