Publications
The best is yet to come.
Preprints
2024
- Do graph neural networks work for high-entropy alloys? Hengrui Zhang, Ruishu Huang, Jie Chen, James M. Rondinelli, and Wei Chen. arXiv preprint, 2024
Graph neural networks (GNNs) have excelled in predictive modeling for both crystals and molecules, owing to the expressiveness of graph representations. High-entropy alloys (HEAs), however, lack chemical long-range order, limiting the applicability of current graph representations. To overcome this challenge, we propose a representation of HEAs as a collection of local environment (LE) graphs. Based on this representation, we introduce the LESets machine learning model, an accurate, interpretable GNN for HEA property prediction. We demonstrate the accuracy of LESets in modeling the mechanical properties of quaternary HEAs. Through analyses and interpretation, we further extract insights into the modeling and design of HEAs. In a broader sense, LESets extends the potential applicability of GNNs to disordered materials with combinatorial complexity formed by diverse constituents and their flexible configurations.
- Adaptive catalyst discovery using multicriteria Bayesian optimization with representation learning. Jie Chen, Pengfei Ou, Yuxin Chang, Hengrui Zhang, Xiao-Yan Li, Edward H. Sargent, and Wei Chen. arXiv preprint, 2024
High-performance catalysts are crucial for sustainable energy conversion and human health. However, the discovery of catalysts faces challenges due to the absence of efficient approaches to navigating vast and high-dimensional structure and composition spaces. In this study, we propose a high-throughput computational catalyst screening approach integrating density functional theory (DFT) and Bayesian Optimization (BO). Within the BO framework, we propose an uncertainty-aware atomistic machine learning model, UPNet, which enables automated representation learning directly from high-dimensional catalyst structures and achieves principled uncertainty quantification. Utilizing a constrained expected improvement acquisition function, our BO framework simultaneously considers multiple evaluation criteria. Using the proposed methods, we explore catalyst discovery for the CO2 reduction reaction. The results demonstrate that our approach achieves high prediction accuracy, facilitates interpretable feature extraction, and enables multicriteria design optimization, leading to significant reduction of computing power and time (10x reduction of required DFT calculations) in high-performance catalyst discovery.
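For readers unfamiliar with the constrained expected improvement acquisition function mentioned in the abstract above, one common textbook form multiplies the expected improvement of the objective by the probability that each constraint is satisfied. The sketch below assumes Gaussian predictive distributions, independent constraints, and a minimization objective. All function names are hypothetical, and nothing here is taken from the paper or from UPNet.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean: float, std: float, best: float) -> float:
    """Expected improvement over `best` for a minimization objective,
    assuming the prediction is Gaussian N(mean, std^2)."""
    if std <= 0.0:
        return max(best - mean, 0.0)
    z = (best - mean) / std
    return (best - mean) * normal_cdf(z) + std * normal_pdf(z)


def probability_feasible(mean: float, std: float, threshold: float) -> float:
    """Probability that a Gaussian-predicted constraint value is <= threshold."""
    if std <= 0.0:
        return 1.0 if mean <= threshold else 0.0
    return normal_cdf((threshold - mean) / std)


def constrained_ei(objective, best, constraints):
    """Expected improvement weighted by the feasibility probability of every constraint.

    objective: (mean, std) of the predicted objective.
    constraints: iterable of (mean, std, threshold), assumed independent.
    """
    score = expected_improvement(objective[0], objective[1], best)
    for mean, std, threshold in constraints:
        score *= probability_feasible(mean, std, threshold)
    return score
```

In this form a candidate with a promising predicted objective still scores low if it is unlikely to meet the other criteria, which is how a single acquisition value can account for several evaluation criteria at once.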
Journal Papers
2024
- High-entropy alloy electrocatalysts screened using machine learning informed by quantum-inspired similarity analysis. Yuxin Chang, Ian Benlolo, Yang Bai, Christoff Reimer, Daojin Zhou, Hengrui Zhang, Hidetoshi Matsumura, Hitarth Choubisa, Xiao-Yan Li, Wei Chen, Pengfei Ou, Isaac Tamblyn, and Edward H. Sargent. Matter, 2024
The discovery of new electrocatalysts can be aided by density functional theory (DFT) computation of overpotentials based on the energies of chemical intermediates on prospective adsorption sites. We hypothesize that when training a machine learning model on DFT data, one could improve accuracy by introducing a quantitative measure of similarity among adsorption sites. When we augment a graph neural network-based machine learning workflow using similarity as an input feature, we find that the required training dataset size is decreased from 1,600 to 800, leading to a 2× acceleration: the number of DFT calculations required to train to a given level of accuracy is cut in half. This approach identifies Fe0.125Co0.125Ni0.229Ir0.229Ru0.292 as a promising oxygen reduction reaction catalyst with an overpotential of 0.24 V, outperforming a Pt/C benchmark. We examine, by studying experimentally four additional HEAs, the predictive power of the computational approach.
- Learning molecular mixture property using chemistry-aware graph neural network. Hengrui Zhang, Tianxing Lai, Jie Chen, Arumugam Manthiram, James M. Rondinelli, and Wei Chen. PRX Energy, 2024
Recent advances in machine learning (ML) have expedited materials discovery and design. One significant challenge faced in ML for materials is the expansive combinatorial space of potential materials formed by diverse constituents and their flexible configurations. This complexity is particularly evident in molecular mixtures, a frequently explored space for materials such as battery electrolytes. Owing to the complex structures of molecules and the sequence-independent nature of mixtures, conventional ML methods have difficulties in modeling such systems. Here we present MolSets, a specialized ML model for molecular mixtures. Representing individual molecules as graphs and their mixture as a set, MolSets leverages a graph neural network and the deep sets architecture to extract information at the molecule level and aggregate it at the mixture level, thus addressing local complexity while retaining global flexibility. We demonstrate the efficacy of MolSets in predicting the conductivity of lithium battery electrolytes and highlight its benefits in virtual screening of the combinatorial chemical space.
- Bayesian optimization of environmentally sustainable graphene inks produced by wet jet milling. Lindsay E. Chaney, Anton van Beek, Julia R. Downing, Jinrui Zhang, Hengrui Zhang, Janan Hui, E. Alexander Sorensen, Maryam Khalaj, Jennifer B. Dunn, Wei Chen, and Mark C. Hersam. Small, 2024
Liquid phase exfoliation (LPE) of graphene is a potentially scalable method to produce conductive graphene inks for printed electronic applications. Among LPE methods, wet jet milling (WJM) is an emerging approach that uses high-speed, turbulent flow to exfoliate graphene nanoplatelets from graphite in a continuous flow manner. Unlike prior WJM work based on toxic, high-boiling-point solvents such as N-methyl-2-pyrrolidone (NMP), this study uses the environmentally friendly solvent ethanol and the polymer stabilizer ethyl cellulose (EC). Bayesian optimization and iterative batch sampling are employed to guide the exploration of the experimental phase space (namely, concentrations of graphite and EC in ethanol) in order to identify the Pareto frontier that simultaneously optimizes three performance criteria (graphene yield, conversion rate, and film conductivity). This data-driven strategy identifies vastly different optimal WJM conditions compared to literature precedent, including an optimal loading of 15 wt% graphite in ethanol compared to 1 wt% graphite in NMP. These WJM conditions provide superlative graphene production rates of 3.2 g hr⁻¹ with the resulting graphene nanoplatelets being suitable for screen-printed micro-supercapacitors. Finally, life cycle assessment reveals that ethanol-based WJM graphene exfoliation presents distinct environmental sustainability advantages for greenhouse gas emissions, fossil fuel consumption, and toxicity.
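As a small illustration of the Pareto frontier idea in the abstract above, the sketch below filters a set of candidates down to those not dominated in any of three criteria, all treated as quantities to maximize. It shows only the generic non-dominated filter, not the Bayesian optimization or batch sampling procedure used in the paper; the function names are hypothetical.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` in every criterion
    and strictly better in at least one (all criteria maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

With tuples ordered as (yield, conversion rate, conductivity), a candidate stays on the frontier as long as no other candidate matches or beats it on all three criteria while exceeding it on at least one.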
2023
- Automated crystal system identification from electron diffraction patterns using multiview opinion fusion machine learning. Jie Chen, Hengrui Zhang, Carolin B. Wahl, Wei Liu, Chad A. Mirkin, Vinayak P. Dravid, Daniel W. Apley, and Wei Chen. Proceedings of the National Academy of Sciences, 2023
A bottleneck in high-throughput nanomaterials discovery is the pace at which new materials can be structurally characterized. Although current machine learning (ML) methods show promise for the automated processing of electron diffraction patterns (DPs), they fail in high-throughput experiments where DPs are collected from crystals with random orientations. Inspired by the human decision-making process, a framework for automated crystal system classification from DPs with arbitrary orientations was developed. A convolutional neural network was trained using evidential deep learning, and the predictive uncertainties were quantified and leveraged to fuse multiview predictions. Using vector map representations of DPs, the framework achieves a testing accuracy of 0.94 in the examples considered, is robust to noise, and retains remarkable accuracy using experimental data. This work highlights the ability of ML to be used to accelerate experimental high-throughput materials data analytics.
- ET-AL: Entropy-targeted active learning for bias mitigation in materials data. Hengrui Zhang, Wei Chen, James M. Rondinelli, and Wei Chen. Applied Physics Reviews, 2023
Growing materials data and data-driven informatics drastically promote the discovery and design of materials. While there are significant advancements in data-driven models, the quality of data resources is less studied despite its huge impact on model performance. In this work, we focus on data bias arising from uneven coverage of materials families in existing knowledge. Observing different diversities among crystal systems in common materials databases, we propose an information entropy-based metric for measuring this bias. To mitigate the bias, we develop an entropy-targeted active learning (ET-AL) framework, which guides the acquisition of new data to improve the diversity of underrepresented crystal systems. We demonstrate the capability of ET-AL for bias mitigation and the resulting improvement in downstream machine learning models. This approach is broadly applicable to data-driven materials discovery, including autonomous data acquisition and dataset trimming to reduce bias, as well as data-driven informatics in other scientific domains.
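To make the notion of an entropy-based diversity measure in the ET-AL abstract more concrete, the sketch below computes the Shannon entropy of a list of category labels and greedily picks the candidate whose addition raises that entropy the most. This is a generic illustration under simplifying assumptions (discrete labels, one candidate at a time); it is not the metric or acquisition rule defined in the paper, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in nats) of the empirical distribution of `labels`."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def most_diversifying(existing, candidates):
    """Candidate label whose addition to `existing` yields the highest entropy."""
    return max(candidates, key=lambda c: shannon_entropy(list(existing) + [c]))
```

A uniform spread over categories gives the maximum entropy, so preferring entropy-increasing acquisitions steers new data toward categories that are currently underrepresented.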
2022
- Uncertainty-aware mixed-variable machine learning for materials design. Hengrui Zhang, Wei Chen, Akshay Iyer, Daniel W. Apley, and Wei Chen. Scientific Reports, 2022
Data-driven design shows the promise of accelerating materials discovery but is challenging due to the prohibitive cost of searching the vast design space of chemistry, structure, and synthesis methods. Bayesian Optimization (BO) employs uncertainty-aware machine learning models to select promising designs to evaluate, hence reducing the cost. However, BO with mixed numerical and categorical variables, which is of particular interest in materials design, has not been well studied. In this work, we survey frequentist and Bayesian approaches to uncertainty quantification of machine learning with mixed variables. We then conduct a systematic comparative study of their performances in BO using a popular representative model from each group, the random forest-based Lolo model (frequentist) and the latent variable Gaussian process model (Bayesian). We examine the efficacy of the two models in the optimization of mathematical functions, as well as properties of structural and functional materials, where we observe performance differences as related to problem dimensionality and complexity. By investigating the machine learning models’ predictive and uncertainty estimation capabilities, we provide interpretations of the observed performance differences. Our results provide practical guidance on choosing between frequentist and Bayesian uncertainty-aware machine learning models for mixed-variable BO in materials design.
- High-throughput investigation of structural evolution upon solid-state in Cu–Cr–Co combinatorial multilayer thin-film. Jian Hui, Qingyun Hu, Hengrui Zhang, Jie Zhao, Yuxi Luo, Yang Ren, Zhan Zhang, and Hong Wang. Materials & Design, 2022
Cu–Cr–Co combinatorial multilayer thin-films were prepared by a high-throughput ion beam sputtering system. Based on the thickness ratio among the individual nanoscale monolayers (Cu, Cr, Co), the resulting stoichiometry covered the entire phase diagram. The chemical composition and structure of the Cu–Cr–Co combinatorial chip upon solid-state reaction were studied by lab-based micro-X-ray fluorescence (μ-XRF) and high-throughput synchrotron X-ray diffraction (XRD), respectively. A composition-structure map for the Cu–Cr–Co combinatorial chip was developed through automated data analysis employing hierarchical clustering techniques. The structural evolution of the Cu–Cr–Co combinatorial chip as a function of heat-treatment temperature, time, and modulation period was studied systematically. Furthermore, the effect of the elemental distribution in the depth direction was investigated to gain more insights regarding phase transformation. This work provides an efficient method and new perspectives for the design and optimization of the composition and structure of high-performance thin-films.
2021
- Demand-driven materials design. Hengrui Zhang. Journal of Shanghai Jiao Tong University, 2021
Emerging paradigms provide a promising solution to current challenges faced by materials science and engineering. In this paper, the progress in materials genome engineering is reviewed, and its facilitation of materials design is discussed. Based on the computational and data-driven methodologies, materials could be integrated into the engineering design cycle to realize demand-driven materials design, so as to accelerate the discovery and application of materials.
2020
- Investigation of synchrotron X-ray induced oxidation of Ag–Cu thin-film. Jian Hui, Hengrui Zhang, Qingyun Hu, Zhan Zhang, Yang Ren, Lanting Zhang, and Hong Wang. Materials Letters, 2020
Combinatorial Ag–Cu thin-films were irradiated by synchrotron X-ray in air to investigate the beam damage on the surface of the thin-film. The main effect was found to be oxidation, with the oxidation state primarily depending on film composition: CuO formed on the pure Cu film, whereas Cu2O formed in the presence of Ag. Meanwhile, crystalline Ag2O2 formed preferentially in the Ag-rich area. These results are of great importance in studying the oxidation of noble metal nano-films, and in the characterization of nano-films using synchrotron X-ray.
2019
- High-throughput investigation of crystal-to-glass transformation of Ti–Ni–Cu ternary alloy. Jian Hui, Haiqian Ma, Zheyu Wu, Zhan Zhang, Yang Ren, Hengrui Zhang, Lanting Zhang, and Hong Wang. Scientific Reports, 2019
A high-throughput investigation of metallic glass formation via solid-state reaction is reported in this paper. Combinatorial multilayered thin-film chips covering the entire Ti–Ni–Cu ternary system were prepared using an ion beam sputtering technique. Microbeam synchrotron X-ray diffraction (XRD) and X-ray fluorescence (XRF) measurements were conducted, with 1,325 data points collected from each chip, to map out the composition and the phase constitution before and after annealing at 373 K for 110 hours. The composition dependence of the crystal-to-glass transition by solid-state reaction was surveyed using this approach. The resulting composition–phase map is consistent with previously reported results. Time-of-flight secondary ion mass spectroscopy (ToF-SIMS) was performed on the representative compositions to determine the inter-diffusion between layers; the results show that the diffusion of Ti is the key factor for the crystal-to-glass transition. In addition, both layer thickness and layer sequence play important roles. This work demonstrates that the combinatorial chip technique is an efficient way for systematic and rapid study of crystal-to-glass transition for multi-component alloy systems.
Conference Papers
2024
- Supervised contrastive learning for electric motor bearing fault detection. Hengrui Zhang and Bingnan Wang. In International Conference on Electrical Machines, 2024
Various faults can cause electric machine failures, causing downtime and asset losses. Fault detection technologies are highly desirable in the industry to predict and prevent such failures. Recent advances in machine learning have enabled data-driven models that identify faults from signals monitored in the motors. However, those signals could be complex and the features that indicate faults are subtle. Therefore, effective methods for extracting informative features relevant to faults from signals are desired. In this paper, we explore the use of contrastive learning in the detection of bearing faults from phase current signals. We develop a model architecture consisting of two parts, a feature extractor and a classifier, where the feature extractor is pre-trained using supervised contrastive learning. Tested on the Paderborn University bearing fault dataset, our model attains a high fault classification accuracy of 87%, which outperforms the conventional machine learning models. We also perform ablation tests to demonstrate the importance of contrastive learning-based training in this model. By investigating the classification results and extracted features of the models, we further verify the effectiveness of contrastive learning in extracting features that distinguish different classes. We anticipate that contrastive learning can lay the foundation of more accurate fault detection models and be extended to other practical fault detection tasks.
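For context on the supervised contrastive pre-training mentioned in the abstract above, the sketch below implements one widely used form of the supervised contrastive loss: each anchor is pulled toward samples of the same class and pushed away from all others. It assumes L2-normalized embeddings given as plain Python lists, and the default temperature is an arbitrary illustrative value. The loss actually used in the paper may differ, and the function name is hypothetical.

```python
import math


def supcon_loss(embeddings, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors with at least one positive.

    embeddings: list of L2-normalized vectors (lists of floats).
    labels: one class label per embedding.
    """
    n = len(embeddings)
    losses = []
    for i in range(n):
        # Temperature-scaled similarity of the anchor to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(embeddings[i], embeddings[j])) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in logits if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without same-class partners contributes nothing
        # Log-sum-exp over all non-anchor samples, stabilized by the max logit.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        losses.append(-sum(logits[p] - log_denom for p in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0
```

The loss is small when embeddings of the same class are close together and well separated from other classes, which is the property a downstream classifier can then exploit.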