Dresden 2020 – Scientific Programme
The DPG Spring Meeting in Dresden had to be cancelled!
SYBD: Symposium Big data driven materials science
SYBD 1: Big Data Driven Materials Science
SYBD 1.4: Invited Talk
Tuesday, 17 March 2020, 11:15–11:45, HSZ 02
Identifying Domains of Applicability of Machine Learning Models for Materials Science
•Mario Boley (1), Christopher Sutton (2), Luca M. Ghiringhelli (2), Matthias Rupp (3), Jilles Vreeken (4), and Matthias Scheffler (2)
(1) Monash University, Melbourne, Australia
(2) Fritz Haber Institute of the Max Planck Society, Berlin, Germany
(3) Citrine Informatics, Redwood City, California
(4) Helmholtz Center for Information Security, Saarbrücken, Germany
Machine learning (ML) promises to accelerate the discovery of novel materials by enabling rapid screening of compounds at a computational cost orders of magnitude lower than that of first-principles electronic-structure approaches. A critical obstacle to the development of novel ML models is that the complex choices involved in designing them are currently based on a simplistic metric: the average model test error. Treating a model as a black box that produces a single error statistic can make it appear insufficient for certain screening tasks even though it actually predicts the target property accurately in sub-domains of the considered materials. We present an alternative diagnostic tool, based on subgroup discovery, that detects domains of applicability of ML models. These domains are given as a combination of simple conditions on the unit-cell structure (e.g., on the lattice vectors, lattice angles, and bond distances) under which the model error is substantially lower than its global average over the complete materials class. Such descriptions make it possible to understand and subsequently address systematic shortcomings of the investigated ML model, and to focus the sampling of candidate materials on regions of low expected error.
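The idea of a domain of applicability can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes a simple objective (coverage times relative error reduction), candidate thresholds at feature quartiles, and an exhaustive search over short conjunctions. The function name `find_domain`, the feature names, and all numbers are hypothetical and synthetic, chosen only for illustration.

```python
from itertools import combinations

def find_domain(features, errors, names, max_conditions=2):
    """Search conjunctions of threshold conditions for a low-error subgroup.

    Assumed objective (illustrative only): coverage * relative error reduction.
    Returns (score, conditions), each condition being (name, op, threshold).
    """
    n = len(errors)
    global_err = sum(errors) / n

    # Candidate conditions: "feature <= t" and "feature >= t" at rough quartiles.
    candidates = []
    for j, name in enumerate(names):
        column = sorted(row[j] for row in features)
        for q in (0.25, 0.5, 0.75):
            t = column[int(q * (n - 1))]
            candidates.append((j, name, "<=", t))
            candidates.append((j, name, ">=", t))

    def holds(row, cond):
        j, _, op, t = cond
        return row[j] <= t if op == "<=" else row[j] >= t

    best = (0.0, [])
    for k in range(1, max_conditions + 1):
        for conds in combinations(candidates, k):
            selected = [e for row, e in zip(features, errors)
                        if all(holds(row, c) for c in conds)]
            if not selected:
                continue
            coverage = len(selected) / n
            reduction = (global_err - sum(selected) / len(selected)) / global_err
            score = coverage * reduction
            # Strict ">" keeps the shortest description among ties.
            if score > best[0]:
                best = (score, [(name, op, t) for _, name, op, t in conds])
    return best

# Synthetic toy data: columns are a lattice vector length and a lattice angle.
names = ["a_angstrom", "gamma_deg"]
features = [[3.0, 85.0], [3.2, 88.0], [3.5, 90.0], [3.1, 89.0],
            [4.0, 100.0], [4.2, 110.0], [3.3, 105.0], [3.9, 120.0]]
# Synthetic absolute model errors, low only for the small-angle structures.
errors = [0.01, 0.02, 0.01, 0.02, 0.20, 0.25, 0.18, 0.22]

score, conditions = find_domain(features, errors, names)
print(score, conditions)
```

On this toy data the search selects the single condition `gamma_deg <= 90.0`, which covers half of the samples at a mean error far below the global average. An exhaustive search like this only scales to a handful of candidate conditions; practical subgroup discovery tools use more efficient search strategies.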