Publications
The best is yet to come.
Preprints
2024
- Adaptive catalyst discovery using multicriteria Bayesian optimization with representation learning. Jie Chen, Pengfei Ou, Yuxin Chang, Hengrui Zhang, Xiao-Yan Li, Edward H. Sargent, and Wei Chen. arXiv preprint, 2024
High-performance catalysts are crucial for sustainable energy conversion and human health. However, the discovery of catalysts faces challenges due to the absence of efficient approaches to navigating vast and high-dimensional structure and composition spaces. In this study, we propose a high-throughput computational catalyst screening approach integrating density functional theory (DFT) and Bayesian Optimization (BO). Within the BO framework, we propose an uncertainty-aware atomistic machine learning model, UPNet, which enables automated representation learning directly from high-dimensional catalyst structures and achieves principled uncertainty quantification. Utilizing a constrained expected improvement acquisition function, our BO framework simultaneously considers multiple evaluation criteria. Using the proposed methods, we explore catalyst discovery for the CO2 reduction reaction. The results demonstrate that our approach achieves high prediction accuracy, facilitates interpretable feature extraction, and enables multicriteria design optimization, leading to significant reduction of computing power and time (10x reduction of required DFT calculations) in high-performance catalyst discovery.
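As a rough illustration of the constrained expected improvement idea mentioned in the abstract above, the sketch below uses one common formulation: the expected improvement (EI) of the objective multiplied by the probability that a constraint is satisfied. The abstract does not state the exact form used in the paper, so the function names, the single Gaussian constraint, and the maximization convention are all assumptions made for illustration.

```python
import math


def normal_pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best):
    """EI for maximization, given a Gaussian prediction (mean, std)
    and the best objective value observed so far."""
    if std <= 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    return (mean - best) * normal_cdf(z) + std * normal_pdf(z)


def constrained_ei(mean, std, best, c_mean, c_std, c_limit):
    """EI weighted by the probability that a Gaussian-predicted
    constraint value stays at or below c_limit (assumed form)."""
    if c_std <= 0.0:
        p_feasible = 1.0 if c_mean <= c_limit else 0.0
    else:
        p_feasible = normal_cdf((c_limit - c_mean) / c_std)
    return expected_improvement(mean, std, best) * p_feasible
```

A candidate whose constraint prediction is confidently infeasible receives a score near zero regardless of its predicted objective, which is how a single acquisition value can account for more than one criterion.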
Journal Papers
2024
- Learning molecular mixture property using chemistry-aware graph neural network. Hengrui Zhang, Tianxing Lai, Jie Chen, Arumugam Manthiram, James M. Rondinelli, and Wei Chen. PRX Energy, 2024 (In press)
Recent advances in machine learning (ML) have expedited materials discovery and design. One significant challenge faced in ML for materials is the expansive combinatorial space of potential materials formed by diverse constituents and their flexible configurations. This complexity is particularly evident in molecular mixtures, a frequently explored space for materials such as battery electrolytes. Owing to the complex structures of molecules and the sequence-independent nature of mixtures, conventional ML methods have difficulties in modeling such systems. Here we present MolSets, a specialized ML model for molecular mixtures. Representing individual molecules as graphs and their mixture as a set, MolSets leverages a graph neural network and the deep sets architecture to extract information at the molecule level and aggregate it at the mixture level, thus addressing local complexity while retaining global flexibility. We demonstrate the efficacy of MolSets in predicting the conductivity of lithium battery electrolytes and highlight its benefits in virtual screening of the combinatorial chemical space.
- Bayesian optimization of environmentally sustainable graphene inks produced by wet jet milling. Lindsay E. Chaney, Anton van Beek, Julia R. Downing, Jinrui Zhang, Hengrui Zhang, Janan Hui, E. Alexander Sorensen, Maryam Khalaj, Jennifer B. Dunn, Wei Chen, and Mark C. Hersam. Small, 2024
Liquid phase exfoliation (LPE) of graphene is a potentially scalable method to produce conductive graphene inks for printed electronic applications. Among LPE methods, wet jet milling (WJM) is an emerging approach that uses high-speed, turbulent flow to exfoliate graphene nanoplatelets from graphite in a continuous flow manner. Unlike prior WJM work based on toxic, high-boiling-point solvents such as N-methyl-2-pyrrolidone (NMP), this study uses the environmentally friendly solvent ethanol and the polymer stabilizer ethyl cellulose (EC). Bayesian optimization and iterative batch sampling are employed to guide the exploration of the experimental phase space (namely, concentrations of graphite and EC in ethanol) in order to identify the Pareto frontier that simultaneously optimizes three performance criteria (graphene yield, conversion rate, and film conductivity). This data-driven strategy identifies vastly different optimal WJM conditions compared to literature precedent, including an optimal loading of 15 wt% graphite in ethanol compared to 1 wt% graphite in NMP. These WJM conditions provide superlative graphene production rates of 3.2 g hr⁻¹ with the resulting graphene nanoplatelets being suitable for screen-printed micro-supercapacitors. Finally, life cycle assessment reveals that ethanol-based WJM graphene exfoliation presents distinct environmental sustainability advantages for greenhouse gas emissions, fossil fuel consumption, and toxicity.
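The Pareto frontier referred to in the abstract above can be illustrated with a minimal sketch. It assumes every criterion is to be maximized and that each candidate is a tuple of criterion values; the function names are hypothetical and the brute-force filter is shown for clarity only, not as the procedure used in the paper.

```python
def dominates(a, b):
    """True if a is at least as good as b in every criterion and
    strictly better in at least one (all criteria maximized)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

With three criteria per candidate, `pareto_front` keeps every candidate that cannot be improved in one criterion without giving something up in another; any values passed to it here would be placeholders, not data from the study.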
2023
- Automated crystal system identification from electron diffraction patterns using multiview opinion fusion machine learning. Jie Chen, Hengrui Zhang, Carolin B. Wahl, Wei Liu, Chad A. Mirkin, Vinayak P. Dravid, Daniel W. Apley, and Wei Chen. Proceedings of the National Academy of Sciences, 2023
A bottleneck in high-throughput nanomaterials discovery is the pace at which new materials can be structurally characterized. Although current machine learning (ML) methods show promise for the automated processing of electron diffraction patterns (DPs), they fail in high-throughput experiments where DPs are collected from crystals with random orientations. Inspired by the human decision-making process, a framework for automated crystal system classification from DPs with arbitrary orientations was developed. A convolutional neural network was trained using evidential deep learning, and the predictive uncertainties were quantified and leveraged to fuse multiview predictions. Using vector map representations of DPs, the framework achieves a testing accuracy of 0.94 in the examples considered, is robust to noise, and retains remarkable accuracy using experimental data. This work highlights the ability of ML to be used to accelerate experimental high-throughput materials data analytics.
- ET-AL: Entropy-targeted active learning for bias mitigation in materials data. Hengrui Zhang, Wei Chen, James M. Rondinelli, and Wei Chen. Applied Physics Reviews, 2023
Growing materials data and data-driven informatics drastically promote the discovery and design of materials. While there are significant advancements in data-driven models, the quality of data resources is less studied despite its huge impact on model performance. In this work, we focus on data bias arising from uneven coverage of materials families in existing knowledge. Observing different diversities among crystal systems in common materials databases, we propose an information entropy-based metric for measuring this bias. To mitigate the bias, we develop an entropy-targeted active learning (ET-AL) framework, which guides the acquisition of new data to improve the diversity of underrepresented crystal systems. We demonstrate the capability of ET-AL for bias mitigation and the resulting improvement in downstream machine learning models. This approach is broadly applicable to data-driven materials discovery, including autonomous data acquisition and dataset trimming to reduce bias, as well as data-driven informatics in other scientific domains.
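To give a sense of what an information entropy-based diversity measure looks like, the sketch below computes the plain Shannon entropy of a list of categorical labels. The abstract above does not give the exact definition of the ET-AL metric, so this is only the textbook quantity, and the function name and label inputs are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in nats) of the empirical label distribution."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

A group whose members spread evenly over many distinct values has high entropy, while a group concentrated on a single value has entropy zero, so comparing this quantity across groups is one simple way to flag the less diverse ones.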
2022
- Uncertainty-aware mixed-variable machine learning for materials design. Hengrui Zhang, Wei Chen, Akshay Iyer, Daniel W. Apley, and Wei Chen. Scientific Reports, 2022
Data-driven design shows the promise of accelerating materials discovery but is challenging due to the prohibitive cost of searching the vast design space of chemistry, structure, and synthesis methods. Bayesian Optimization (BO) employs uncertainty-aware machine learning models to select promising designs to evaluate, hence reducing the cost. However, BO with mixed numerical and categorical variables, which is of particular interest in materials design, has not been well studied. In this work, we survey frequentist and Bayesian approaches to uncertainty quantification of machine learning with mixed variables. We then conduct a systematic comparative study of their performances in BO using a popular representative model from each group, the random forest-based Lolo model (frequentist) and the latent variable Gaussian process model (Bayesian). We examine the efficacy of the two models in the optimization of mathematical functions, as well as properties of structural and functional materials, where we observe performance differences as related to problem dimensionality and complexity. By investigating the machine learning models’ predictive and uncertainty estimation capabilities, we provide interpretations of the observed performance differences. Our results provide practical guidance on choosing between frequentist and Bayesian uncertainty-aware machine learning models for mixed-variable BO in materials design.
- High-throughput investigation of structural evolution upon solid-state in Cu–Cr–Co combinatorial multilayer thin-film. Jian Hui, Qingyun Hu, Hengrui Zhang, Jie Zhao, Yuxi Luo, Yang Ren, Zhan Zhang, and Hong Wang. Materials & Design, 2022
Cu–Cr–Co combinatorial multilayer thin-films were prepared by a high-throughput ion beam sputtering system. Based on the thickness ratio among the individual nanoscale monolayers (Cu, Cr, Co), the resulting stoichiometry covered the entire phase diagram. The chemical composition and structure of the Cu–Cr–Co combinatorial chip upon solid-state reaction were studied by lab-based micro-X-ray fluorescence (μ-XRF) and high-throughput synchrotron X-ray diffraction (XRD), respectively. A composition-structure map for the Cu–Cr–Co combinatorial chip was developed through automated data analysis employing hierarchical clustering techniques. The structural evolution of the Cu–Cr–Co combinatorial chip as a function of heat-treatment temperature, time, and modulation period was studied systematically. Furthermore, the effect of the elemental distribution in the depth direction was investigated to gain more insights regarding phase transformation. This work provides an efficient method and new perspectives for the design and optimization of the composition and structure of high-performance thin-films.
2021
- Demand-driven materials design. Hengrui Zhang. Journal of Shanghai Jiao Tong University, 2021
Emerging paradigms provide a promising solution to current challenges faced by materials science and engineering. In this paper, the progress in materials genome engineering is reviewed, and its facilitation of materials design is discussed. Based on computational and data-driven methodologies, materials could be integrated into the engineering design cycle to realize demand-driven materials design, so as to accelerate the discovery and application of materials.
2020
- Investigation of synchrotron X-ray induced oxidation of Ag–Cu thin-film. Jian Hui, Hengrui Zhang, Qingyun Hu, Zhan Zhang, Yang Ren, Lanting Zhang, and Hong Wang. Materials Letters, 2020
Combinatorial Ag–Cu thin-films were irradiated by synchrotron X-ray in air to investigate the beam damage on the surface of the thin-film. The main effect was found to be oxidation, with the oxidation state primarily depending on film composition: CuO formed on the pure Cu film, whereas Cu2O formed in the presence of Ag. Meanwhile, formation of crystalline Ag2O2 was favored, preferentially in the Ag-rich area. These results are of great importance in studying the oxidation of noble metal nano-films, and in the characterization of nano-films using synchrotron X-ray.
2019
- High-throughput investigation of crystal-to-glass transformation of Ti–Ni–Cu ternary alloy. Jian Hui, Haiqian Ma, Zheyu Wu, Zhan Zhang, Yang Ren, Hengrui Zhang, Lanting Zhang, and Hong Wang. Scientific Reports, 2019
A high-throughput investigation of metallic glass formation via solid-state reaction is reported in this paper. Combinatorial multilayered thin-film chips covering the entire Ti–Ni–Cu ternary system were prepared using an ion beam sputtering technique. Microbeam synchrotron X-ray diffraction (XRD) and X-ray fluorescence (XRF) measurements were conducted, with 1,325 data points collected from each chip, to map out the composition and the phase constitution before and after annealing at 373 K for 110 hours. The composition dependence of the crystal-to-glass transition by solid-state reaction was surveyed using this approach. The resulting composition–phase map is consistent with previously reported results. Time-of-flight secondary ion mass spectroscopy (ToF-SIMS) was performed on the representative compositions to determine the inter-diffusion between layers; the results show that the diffusion of Ti is the key factor for the crystal-to-glass transition. In addition, both layer thickness and layer sequence play important roles. This work demonstrates that the combinatorial chip technique is an efficient way to study the crystal-to-glass transition systematically and rapidly in multi-component alloy systems.