Preprints

2024

Adaptive catalyst discovery using multicriteria Bayesian optimization with representation learning

Jie Chen, Pengfei Ou, Yuxin Chang, Hengrui Zhang, Xiao-Yan Li, Edward H. Sargent, and Wei Chen

arXiv preprint, 2024

High-performance catalysts are crucial for sustainable energy conversion and human health. However, the discovery of catalysts faces challenges due to the absence of efficient approaches to navigating vast and high-dimensional structure and composition spaces. In this study, we propose a high-throughput computational catalyst screening approach integrating density functional theory (DFT) and Bayesian Optimization (BO). Within the BO framework, we propose an uncertainty-aware atomistic machine learning model, UPNet, which enables automated representation learning directly from high-dimensional catalyst structures and achieves principled uncertainty quantification. Utilizing a constrained expected improvement acquisition function, our BO framework simultaneously considers multiple evaluation criteria. Using the proposed methods, we explore catalyst discovery for the CO2 reduction reaction. The results demonstrate that our approach achieves high prediction accuracy, facilitates interpretable feature extraction, and enables multicriteria design optimization, leading to significant reduction of computing power and time (10x reduction of required DFT calculations) in high-performance catalyst discovery.
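
The abstract does not spell out the acquisition function. As a minimal sketch, assuming the commonly used form in which expected improvement is multiplied by the probability that a constraint is satisfied, and assuming independent Gaussian predictions for the objective and the constraint, the calculation could look like the following. All function and parameter names are hypothetical and are not taken from the paper.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best):
    """EI for minimization, given a Gaussian prediction N(mu, sigma^2)
    and the best (lowest) objective value observed so far."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    return (best - mu) * normal_cdf(z) + sigma * normal_pdf(z)


def probability_feasible(mu_c, sigma_c, threshold):
    """P(c <= threshold) for a constraint prediction N(mu_c, sigma_c^2)."""
    if sigma_c <= 0.0:
        return 1.0 if mu_c <= threshold else 0.0
    return normal_cdf((threshold - mu_c) / sigma_c)


def constrained_ei(mu, sigma, best, mu_c, sigma_c, threshold):
    """Assumed form: EI weighted by the probability of feasibility."""
    return expected_improvement(mu, sigma, best) * probability_feasible(
        mu_c, sigma_c, threshold
    )
```

In a loop of this kind, each candidate would be scored with `constrained_ei` using the surrogate model's predictions, and the highest-scoring candidate would be sent to the expensive evaluation (here, a DFT calculation).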

Journal Papers

2025

Emerging microelectronic materials by design: Navigating combinatorial design space with scarce and dispersed data

Hengrui Zhang, Alexandru B. Georgescu, Suraj Yerramilli, Christopher Karpovich, Daniel W. Apley, Elsa A. Olivetti, James M. Rondinelli, and Wei Chen

Accounts of Materials Research, 2025

The increasing demands of sustainable energy, electronics, and biomedical applications call for next-generation functional materials with unprecedented properties. Of particular interest are emerging materials that display exceptional physical properties, making them promising candidates in energy-efficient microelectronic devices. As the conventional Edisonian approach becomes significantly outpaced by growing societal needs, emerging computational modeling and machine learning (ML) methods are employed for the rational design of materials. However, the complex physical mechanisms, cost of first-principles calculations, and the dispersity and scarcity of data pose challenges to both physics-based and data-driven materials modeling. Moreover, the combinatorial composition-structure design space is high-dimensional and often disjoint, making design optimization nontrivial. In this Account, we review a team effort toward establishing a framework that integrates data-driven and physics-based methods to address these challenges and accelerate materials design. We begin by presenting our integrated materials design framework and its three components in a general context. We then provide an example of applying this materials design framework to metal-insulator transition (MIT) materials, a specific type of emerging materials with practical importance in next-generation memory technologies. We identify multiple new materials which may display this property and propose pathways for their synthesis. Finally, we identify some outstanding challenges in data-driven materials design, such as materials data quality issues and property-performance mismatch. We seek to raise awareness of these overlooked issues hindering materials design, thus stimulating efforts toward developing methods to mitigate the gaps.

Graph representation of local environments for learning high-entropy alloy properties

Hengrui Zhang, Ruishu Huang, Jie Chen, James M. Rondinelli, and Wei Chen

Machine Learning: Science and Technology, 2025

Graph neural networks (GNNs) have excelled in predictive modeling for both crystals and molecules, owing to the expressiveness of graph representations. High-entropy alloys (HEAs), however, lack chemical long-range order, limiting the applicability of current graph representations. To overcome this challenge, we propose a representation of HEAs as a collection of local environment (LE) graphs. Based on this representation, we introduce the LESets machine learning model, an accurate, interpretable GNN for HEA property prediction. We demonstrate the accuracy of LESets in modeling the mechanical properties of quaternary HEAs. Through analyses and interpretation, we further extract insights into the modeling and design of HEAs. In a broader sense, LESets extends the potential applicability of GNNs to disordered materials with combinatorial complexity formed by diverse constituents and their flexible configurations.

2024

High-entropy alloy electrocatalysts screened using machine learning informed by quantum-inspired similarity analysis

Yuxin Chang, Ian Benlolo, Yang Bai, Christoff Reimer, Daojin Zhou, Hengrui Zhang, Hidetoshi Matsumura, Hitarth Choubisa, Xiao-Yan Li, Wei Chen, Pengfei Ou, Isaac Tamblyn, and Edward H. Sargent

Matter, 2024

The discovery of new electrocatalysts can be aided by density functional theory (DFT) computation of overpotentials based on the energies of chemical intermediates on prospective adsorption sites. We hypothesize that when training a machine learning model on DFT data, one could improve accuracy by introducing a quantitative measure of similarity among adsorption sites. When we augment a graph neural network-based machine learning workflow using similarity as an input feature, we find that the required training dataset size is decreased from 1,600 to 800, leading to a 2× acceleration: the number of DFT calculations required to train to a given level of accuracy is cut in half. This approach identifies Fe_0.125Co_0.125Ni_0.229Ir_0.229Ru_0.292 as a promising oxygen reduction reaction catalyst with an overpotential of 0.24 V, outperforming a Pt/C benchmark. We examine, by studying experimentally four additional HEAs, the predictive power of the computational approach.

Learning molecular mixture property using chemistry-aware graph neural network

Hengrui Zhang, Tianxing Lai, Jie Chen, Arumugam Manthiram, James M. Rondinelli, and Wei Chen

PRX Energy, 2024

Recent advances in machine learning (ML) have expedited materials discovery and design. One significant challenge faced in ML for materials is the expansive combinatorial space of potential materials formed by diverse constituents and their flexible configurations. This complexity is particularly evident in molecular mixtures, a frequently explored space for materials such as battery electrolytes. Owing to the complex structures of molecules and the sequence-independent nature of mixtures, conventional ML methods have difficulties in modeling such systems. Here we present MolSets, a specialized ML model for molecular mixtures. Representing individual molecules as graphs and their mixture as a set, MolSets leverages a graph neural network and the deep sets architecture to extract information at the molecule level and aggregate it at the mixture level, thus addressing local complexity while retaining global flexibility. We demonstrate the efficacy of MolSets in predicting the conductivity of lithium battery electrolytes and highlight its benefits in virtual screening of the combinatorial chemical space.
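
The set-based aggregation described above can be illustrated with a toy deep sets model. This is only a sketch under stated assumptions: the per-molecule encoder `phi` is a fixed random linear map standing in for the graph neural network, the mixture is pooled by a fraction-weighted sum, and all names, shapes, and weights are hypothetical rather than taken from MolSets.

```python
import numpy as np

rng = np.random.default_rng(0)
W_phi = rng.normal(size=(4, 8))  # hypothetical per-molecule encoder weights
W_rho = rng.normal(size=8)       # hypothetical mixture-level readout weights


def phi(features):
    """Encode each molecule independently (stand-in for a GNN)."""
    return np.tanh(features @ W_phi)


def rho(pooled):
    """Map the pooled mixture embedding to a scalar property."""
    return float(pooled @ W_rho)


def mixture_property(features, fractions):
    """Deep sets style prediction: encode, pool over the set, read out.

    features:  (n_molecules, 4) array of per-molecule descriptors
    fractions: (n_molecules,) array of mixing fractions
    """
    features = np.asarray(features, dtype=float)
    fractions = np.asarray(fractions, dtype=float)
    embeddings = phi(features)       # shape (n_molecules, 8)
    pooled = fractions @ embeddings  # weighted sum, independent of ordering
    return rho(pooled)
```

Because the pooling step is a sum over the set, reordering the molecules (together with their fractions) leaves the prediction unchanged up to floating-point rounding, which is the sequence-independence property the abstract refers to.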

Bayesian optimization of environmentally sustainable graphene inks produced by wet jet milling

Lindsay E. Chaney, Anton van Beek, Julia R. Downing, Jinrui Zhang, Hengrui Zhang, Janan Hui, E. Alexander Sorensen, Maryam Khalaj, Jennifer B. Dunn, Wei Chen, and Mark C. Hersam

Small, 2024

Liquid phase exfoliation (LPE) of graphene is a potentially scalable method to produce conductive graphene inks for printed electronic applications. Among LPE methods, wet jet milling (WJM) is an emerging approach that uses high-speed, turbulent flow to exfoliate graphene nanoplatelets from graphite in a continuous flow manner. Unlike prior WJM work based on toxic, high-boiling-point solvents such as N-methyl-2-pyrrolidone (NMP), this study uses the environmentally friendly solvent ethanol and the polymer stabilizer ethyl cellulose (EC). Bayesian optimization and iterative batch sampling are employed to guide the exploration of the experimental phase space (namely, concentrations of graphite and EC in ethanol) in order to identify the Pareto frontier that simultaneously optimizes three performance criteria (graphene yield, conversion rate, and film conductivity). This data-driven strategy identifies vastly different optimal WJM conditions compared to literature precedent, including an optimal loading of 15 wt% graphite in ethanol compared to 1 wt% graphite in NMP. These WJM conditions provide superlative graphene production rates of 3.2 g hr^-1 with the resulting graphene nanoplatelets being suitable for screen-printed micro-supercapacitors. Finally, life cycle assessment reveals that ethanol-based WJM graphene exfoliation presents distinct environmental sustainability advantages for greenhouse gas emissions, fossil fuel consumption, and toxicity.

2023

Automated crystal system identification from electron diffraction patterns using multiview opinion fusion machine learning

Jie Chen, Hengrui Zhang, Carolin B. Wahl, Wei Liu, Chad A. Mirkin, Vinayak P. Dravid, Daniel W. Apley, and Wei Chen

Proceedings of the National Academy of Sciences, 2023

A bottleneck in high-throughput nanomaterials discovery is the pace at which new materials can be structurally characterized. Although current machine learning (ML) methods show promise for the automated processing of electron diffraction patterns (DPs), they fail in high-throughput experiments where DPs are collected from crystals with random orientations. Inspired by the human decision-making process, a framework for automated crystal system classification from DPs with arbitrary orientations was developed. A convolutional neural network was trained using evidential deep learning, and the predictive uncertainties were quantified and leveraged to fuse multiview predictions. Using vector map representations of DPs, the framework achieves a testing accuracy of 0.94 in the examples considered, is robust to noise, and retains remarkable accuracy using experimental data. This work highlights the ability of ML to be used to accelerate experimental high-throughput materials data analytics.

Automated crystal system identification from four-dimensional scanning transmission electron microscopy data using brain-inspired artificial intelligence

Carolin B. Wahl, Jie Chen, Hengrui Zhang, Wei Liu, Shengtong Zhang, Jiezhong Wu, Chad A. Mirkin, Vinayak P. Dravid, Daniel W. Apley, and Wei Chen

Microscopy and Microanalysis, 2023

ET-AL: Entropy-targeted active learning for bias mitigation in materials data

Hengrui Zhang, Wei Chen, James M. Rondinelli, and Wei Chen

Applied Physics Reviews, 2023

Growing materials data and data-driven informatics drastically promote the discovery and design of materials. While there are significant advancements in data-driven models, the quality of data resources is less studied despite its huge impact on model performance. In this work, we focus on data bias arising from uneven coverage of materials families in existing knowledge. Observing different diversities among crystal systems in common materials databases, we propose an information entropy-based metric for measuring this bias. To mitigate the bias, we develop an entropy-targeted active learning (ET-AL) framework, which guides the acquisition of new data to improve the diversity of underrepresented crystal systems. We demonstrate the capability of ET-AL for bias mitigation and the resulting improvement in downstream machine learning models. This approach is broadly applicable to data-driven materials discovery, including autonomous data acquisition and dataset trimming to reduce bias, as well as data-driven informatics in other scientific domains.
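
As a rough illustration of an information entropy-based diversity measure, the sketch below computes the Shannon entropy of a list of categorical labels. It is an assumption-laden stand-in, not the metric defined in the paper; the function name and the example labels are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in nats) of the empirical label distribution.

    Higher values indicate that the labels are spread more evenly
    across categories; zero means every label is the same.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical example: structure-type labels within two groups of entries.
diverse_group = ["A", "B", "C", "D"]
narrow_group = ["A", "A", "A", "B"]
print(shannon_entropy(diverse_group) > shannon_entropy(narrow_group))
```

Under this kind of measure, a group whose entries concentrate in a few categories scores lower than one with even coverage, which is the sort of imbalance a targeted data acquisition strategy would aim to reduce.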

2022

Uncertainty-aware mixed-variable machine learning for materials design

Hengrui Zhang, Wei Chen, Akshay Iyer, Daniel W. Apley, and Wei Chen

Scientific Reports, 2022

Data-driven design shows the promise of accelerating materials discovery but is challenging due to the prohibitive cost of searching the vast design space of chemistry, structure, and synthesis methods. Bayesian Optimization (BO) employs uncertainty-aware machine learning models to select promising designs to evaluate, hence reducing the cost. However, BO with mixed numerical and categorical variables, which is of particular interest in materials design, has not been well studied. In this work, we survey frequentist and Bayesian approaches to uncertainty quantification of machine learning with mixed variables. We then conduct a systematic comparative study of their performances in BO using a popular representative model from each group, the random forest-based Lolo model (frequentist) and the latent variable Gaussian process model (Bayesian). We examine the efficacy of the two models in the optimization of mathematical functions, as well as properties of structural and functional materials, where we observe performance differences as related to problem dimensionality and complexity. By investigating the machine learning models’ predictive and uncertainty estimation capabilities, we provide interpretations of the observed performance differences. Our results provide practical guidance on choosing between frequentist and Bayesian uncertainty-aware machine learning models for mixed-variable BO in materials design.

High-throughput investigation of structural evolution upon solid-state in Cu–Cr–Co combinatorial multilayer thin-film

Jian Hui, Qingyun Hu, Hengrui Zhang, Jie Zhao, Yuxi Luo, Yang Ren, Zhan Zhang, and Hong Wang

Materials & Design, 2022

Cu–Cr–Co combinatorial multilayer thin-films were prepared by a high-throughput ion beam sputtering system. Based on the thickness ratio among the individual nanoscale monolayers (Cu, Cr, Co), the resulting stoichiometry covered the entire phase diagram. The chemical composition and structure of the Cu–Cr–Co combinatorial chip upon solid-state reaction were studied by lab-based micro-X-ray fluorescence (μ-XRF) and high-throughput synchrotron X-ray diffraction (XRD), respectively. A composition-structure map for the Cu–Cr–Co combinatorial chip was developed through automated data analysis employing hierarchical clustering techniques. The structural evolution of the Cu–Cr–Co combinatorial chip as a function of heat-treatment temperature, time, and modulation period was studied systematically. Furthermore, the effect of the elemental distribution in the depth direction was investigated to gain more insights regarding phase transformation. This work provides an efficient method and new perspectives for the design and optimization of the composition and structure of high-performance thin-films.

2021

Demand-driven materials design

Hengrui Zhang

Journal of Shanghai Jiao Tong University, 2021

Emerging paradigms provide a promising solution to current challenges faced by materials science and engineering. In this paper, the progress in materials genome engineering is reviewed, and its facilitation of materials design is discussed. Based on the computational and data-driven methodologies, materials could be integrated into the engineering design cycle to realize demand-driven materials design, so as to accelerate the discovery and application of materials.

2020

Investigation of synchrotron X-ray induced oxidation of Ag–Cu thin-film

Jian Hui, Hengrui Zhang, Qingyun Hu, Zhan Zhang, Yang Ren, Lanting Zhang, and Hong Wang

Materials Letters, 2020

Combinatorial Ag–Cu thin-films were irradiated by synchrotron X-ray in air to investigate the beam damage on the surface of the thin-film. The main effect was found to be oxidation, with the oxidation state primarily depending on film composition: CuO formed on the pure Cu film, whereas Cu2O formed in the presence of Ag. Meanwhile, formation of crystalline Ag2O2 was favored, preferentially in the Ag-rich area. These results are of great importance in studying the oxidation of noble metal nano-films, and in the characterization of nano-films using synchrotron X-ray.

2019

High-throughput investigation of crystal-to-glass transformation of Ti–Ni–Cu ternary alloy

Jian Hui, Haiqian Ma, Zheyu Wu, Zhan Zhang, Yang Ren, Hengrui Zhang, Lanting Zhang, and Hong Wang

Scientific Reports, 2019

A high-throughput investigation of metallic glass formation via solid-state reaction is reported in this paper. Combinatorial multilayered thin-film chips covering the entire Ti–Ni–Cu ternary system were prepared using the ion beam sputtering technique. Microbeam synchrotron X-ray diffraction (XRD) and X-ray fluorescence (XRF) measurements were conducted, with 1,325 data points collected from each chip, to map out the composition and the phase constitution before and after annealing at 373 K for 110 hours. The composition dependence of the crystal-to-glass transition by solid-state reaction was surveyed using this approach. The resulting composition–phase map is consistent with previously reported results. Time-of-flight secondary ion mass spectroscopy (ToF-SIMS) was performed on the representative compositions to determine the inter-diffusion between layers; the results show that the diffusion of Ti is the key factor for the crystal-to-glass transition. In addition, both layer thickness and layer sequence play important roles. This work demonstrates that the combinatorial chip technique is an efficient way for systematic and rapid study of crystal-to-glass transition for multi-component alloy systems.

Conference Papers

2025

Adaptive uncertainty-aware deep learning for materials discovery with high-dimensional design inputs

Jie Chen, Pengfei Ou, Yuxin Chang, Hengrui Zhang, Xiao-Yan Li, Edward H. Sargent, and Wei Chen

In International Design Engineering Technical Conferences, 2025

High-dimensional structure and composition spaces present a major challenge in materials discovery due to the difficulty of efficiently navigating vast and complex design space. Additionally, most existing machine learning approaches lack the capability to quantify epistemic uncertainty, which arises from limited data, a critical limitation for materials discovery tasks involving high-dimensional representations such as atomic structures. To address these challenges, we introduce UPNet, an uncertainty-aware atomistic machine learning model within a Bayesian Optimization (BO) framework. UPNet enables automated representation learning directly from high-dimensional atomic structures while providing principled uncertainty quantification through the use of Spectral-normalized Neural Gaussian Process (SNGP). By incorporating a constrained expected improvement acquisition function, our BO framework optimizes multiple evaluation criteria simultaneously. We demonstrate the effectiveness of our approach in catalyst discovery for the CO₂ reduction reaction. The results show that the UPNet model achieves high prediction accuracy. The model can also extract interpretable latent features due to its ability to preserve the similarities of the materials structures from the input space to the latent space. The developed constrained BO method outperforms unconstrained BO and random search in terms of the quantity and quality of the discovered materials with top performance. Our method reduces computational cost and time, achieving a 10× reduction in the number of required simulation calculations. Beyond catalysis, this framework offers a broadly applicable solution for accelerating materials discovery in various domains that involve high-dimensional design inputs and expensive physics-based simulations.

2024

Supervised contrastive learning for electric motor bearing fault detection

Hengrui Zhang and Bingnan Wang

In International Conference on Electrical Machines, 2024

Various faults can cause electric machine failures, causing downtime and asset losses. Fault detection technologies are highly desirable in the industry to predict and prevent such failures. Recent advances in machine learning have enabled data-driven models that identify faults from signals monitored in the motors. However, those signals could be complex and the features that indicate faults are subtle. Therefore, effective methods for extracting informative features relevant to faults from signals are desired. In this paper, we explore the use of contrastive learning in the detection of bearing faults from phase current signals. We develop a model architecture consisting of two parts, a feature extractor and a classifier, where the feature extractor is pre-trained using supervised contrastive learning. Tested on the Paderborn University bearing fault dataset, our model attains a high fault classification accuracy of 87%, which outperforms the conventional machine learning models. We also perform ablation tests to demonstrate the importance of contrastive learning-based training in this model. By investigating the classification results and extracted features of the models, we further verify the effectiveness of contrastive learning in extracting features that distinguish different classes. We anticipate that contrastive learning can lay the foundation of more accurate fault detection models and be extended to other practical fault detection tasks.