Dresden 2020 – Scientific Programme
The DPG Spring Meeting in Dresden had to be cancelled.
CPP: Chemical and Polymer Physics Division
CPP 68: Topical Session: Data Driven Materials Science - Descriptors (joint session MM/CPP)
CPP 68.2: Talk
Wednesday, 18 March 2020, 12:00–12:15, BAR 205
Information-theory-driven identification of compact descriptors for accurate machine-learning predictions — •Benjamin Regler, Matthias Scheffler, and Luca M. Ghiringhelli — Fritz Haber Institute of the Max Planck Society, Berlin, Germany
Machine learning (ML) is useful for predicting materials behavior by relating physical and chemical properties (features) of known materials to the property of interest (target). Aiming at a rational, unbiased, and data-driven identification of relevant features, we combine statistical and information-theoretical techniques to identify the subset of features that unequivocally represent each material in the data set and contribute most to predicting the target property. The novelty and power of our approach lie in the fact that it does not assume any specific functional form of the “features → target” relationship. Based on the concept of cumulative mutual information, our framework assigns a quantitative score to the “strength” of each feature’s contribution, ranks the features by their scores, and selects the highest-scoring features as relevant before any subsequent data analysis. The scoring and selection algorithm is then supplemented by a purely ML procedure built on the selected, compact feature subset. We identify compact feature subsets for predicting (i) the ground-state crystal structure of octet-binary compound semiconductors and (ii) the elastic properties of inorganic crystalline compounds. In each case, we show that only a few features are actually required to obtain accurate predictions, thereby reducing the complexity of the ML model and its sensitivity to the availability of materials data.
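The score-and-rank idea can be illustrated with a minimal sketch. It is not the authors' method: it uses a plain histogram estimate of Shannon mutual information rather than cumulative mutual information, and it scores each feature on its own rather than as part of a subset. The function names (`mutual_information`, `rank_features`), the bin count, and the synthetic data are all assumptions made for illustration. The target depends quadratically on the first feature, so no linear form is assumed by the score.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Histogram estimate of Shannon mutual information I(X;Y) in nats.

    Illustrative stand-in only; NOT cumulative mutual information.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()              # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y, shape (1, bins)
    nonzero = pxy > 0                      # skip empty cells to avoid log(0)
    ratio = pxy[nonzero] / (px @ py)[nonzero]
    return float(np.sum(pxy[nonzero] * np.log(ratio)))


def rank_features(X, y, names, bins=8):
    """Score each column of X against y and sort by descending score."""
    scores = [mutual_information(X[:, j], y, bins) for j in range(X.shape[1])]
    order = np.argsort(scores)[::-1]
    return [(names[j], scores[j]) for j in order]


# Synthetic data (hypothetical): only f0 carries information about the target,
# through a nonlinear (quadratic) relationship.
rng = np.random.default_rng(0)
n_samples = 2000
X = rng.normal(size=(n_samples, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=n_samples)

ranking = rank_features(X, y, ["f0", "f1", "f2"])
for name, score in ranking:
    print(f"{name}: {score:.3f} nats")
```

In this toy setting the informative feature should receive a clearly larger score than the two noise features. Histogram estimates are biased upward for small samples and depend on the bin count, which is one reason a real application would need a more careful estimator.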