Search | VHL Regional Portal

1.

Publisher Correction: Exploiting redundancy in large materials datasets for efficient machine learning with less data.

Li, Kangming; Persaud, Daniel; Choudhary, Kamal; DeCost, Brian; Greenwood, Michael; Hattrick-Simpers, Jason.

Nat Commun ; 15(1): 284, 2024 Jan 04.

Article in English | MEDLINE | ID: mdl-38177139

2.

Exploiting redundancy in large materials datasets for efficient machine learning with less data.

Li, Kangming; Persaud, Daniel; Choudhary, Kamal; DeCost, Brian; Greenwood, Michael; Hattrick-Simpers, Jason.

Nat Commun ; 14(1): 7283, 2023 Nov 10.

Article in English | MEDLINE | ID: mdl-37949845

ABSTRACT

Extensive efforts to gather materials data have largely overlooked potential data redundancy. In this study, we present evidence of a significant degree of redundancy across multiple large datasets for various material properties, by revealing that up to 95% of data can be safely removed from machine learning training with little impact on in-distribution prediction performance. The redundant data is related to over-represented material types and does not mitigate the severe performance degradation on out-of-distribution samples. In addition, we show that uncertainty-based active learning algorithms can construct much smaller but equally informative datasets. We discuss the effectiveness of informative data in improving prediction performance and robustness and provide insights into efficient data acquisition and machine learning training. This work challenges the "bigger is better" mentality and calls for attention to the information richness of materials data rather than a narrow emphasis on data volume.
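
The idea of building a smaller but equally informative training set with uncertainty-based selection can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names are hypothetical, and the distance to the nearest already-selected point is only a stand-in for a model's predictive uncertainty, assumed here so that the example stays self-contained.

```python
from typing import Callable, List, Sequence


def nearest_distance(x: float, selected: Sequence[float]) -> float:
    """Toy uncertainty proxy (assumption): distance from x to the closest selected point."""
    return min(abs(x - s) for s in selected)


def select_informative(
    pool: Sequence[float],
    seed: Sequence[float],
    budget: int,
    uncertainty: Callable[[float, Sequence[float]], float] = nearest_distance,
) -> List[float]:
    """Greedily grow a training set by always adding the most uncertain candidate."""
    selected = list(seed)
    remaining = [x for x in pool if x not in selected]
    while remaining and len(selected) < budget:
        # Score every remaining candidate against the current training set
        # and move the highest-scoring one into the selection.
        best = max(remaining, key=lambda x: uncertainty(x, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # A dense cluster near 0 (an over-represented region) plus two isolated points.
    pool = [0.0, 0.1, 0.2, 0.3, 5.0, 10.0]
    print(select_informative(pool, seed=[0.0], budget=3))
```

With this toy proxy, the isolated points are chosen before any further member of the dense cluster, which mirrors the abstract's observation that redundant data tends to come from over-represented material types. In practice the proxy would be replaced by a real uncertainty estimate from the trained model.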

3.

AtomVision: A Machine Vision Library for Atomistic Images.

Choudhary, Kamal; Gurunathan, Ramya; DeCost, Brian; Biacchi, Adam.

J Chem Inf Model ; 63(6): 1708-1722, 2023 03 27.

Article in English | MEDLINE | ID: mdl-36857727

ABSTRACT

Computer vision techniques have immense potential for materials design applications. In this work, we introduce an integrated and general-purpose AtomVision library that can be used to generate and curate microscopy image (such as scanning tunneling microscopy and scanning transmission electron microscopy) data sets and apply a variety of machine learning techniques. To demonstrate the applicability of this library, we (1) establish an atomistic image data set of about 10 000 materials with large structural and chemical diversity, (2) develop and compare convolutional and atomistic line graph neural network models to classify the Bravais lattices, (3) demonstrate the application of fully convolutional neural networks using U-Net architecture to pixelwise classify atom versus background, (4) use a generative adversarial network for super resolution, (5) curate an image data set on the basis of natural language processing using an open-access arXiv data set, and (6) integrate the computational framework with experimental microscopy images for Rh, Fe3O4, and SnS systems. The AtomVision library is available at https://github.com/usnistgov/atomvision.

Subjects

Image Processing, Computer-Assisted , Neural Networks, Computer , Image Processing, Computer-Assisted/methods , Machine Learning , Microscopy , Gene Library

4.

Leveraging Theory for Enhanced Machine Learning.

Audus, Debra J; McDannald, Austin; DeCost, Brian.

ACS Macro Lett ; 11(9): 1117-1122, 2022 09 20.

Article in English | MEDLINE | ID: mdl-36018715

ABSTRACT

The application of machine learning to the materials domain has traditionally struggled with two major challenges: a lack of large, curated data sets and the need to understand the physics behind the machine-learning prediction. The former problem is particularly acute in the polymers domain. Here we aim to simultaneously tackle these challenges through the incorporation of scientific knowledge, thus, providing improved predictions for smaller data sets, both under interpolation and extrapolation, and a degree of explainability. We focus on imperfect theories, as they are often readily available and easier to interpret. Using a system of a polymer in different solvent qualities, we explore numerous methods for incorporating theory into machine learning using different machine-learning models, including Gaussian process regression. Ultimately, we find that encoding the functional form of the theory performs best followed by an encoding of the numeric values of the theory.

Subjects

Machine Learning , Polymers , Normal Distribution , Solvents

5.

Uncertainty Prediction for Machine Learning Models of Material Properties.

Tavazza, Francesca; DeCost, Brian; Choudhary, Kamal.

ACS Omega ; 6(48): 32431-32440, 2021 Dec 07.

Article in English | MEDLINE | ID: mdl-34901594

ABSTRACT

Uncertainty quantification in artificial intelligence (AI)-based predictions of material properties is of immense importance for the success and reliability of AI applications in materials science. While confidence intervals are commonly reported for machine learning (ML) models, prediction intervals, i.e., the evaluation of the uncertainty on each prediction, are not as frequently available. In this work, we compare three different approaches to obtain such individual uncertainty, testing them on 12 ML-physical properties. Specifically, we investigated using the quantile loss function, machine learning the prediction intervals directly, and using Gaussian processes. We identify each approach's advantages and disadvantages and end up slightly favoring the modeling of the individual uncertainties directly, as it is the easiest to fit and, in most of the cases, minimizes over- and underestimation of the predicted errors. All data for training and testing were taken from the publicly available JARVIS-DFT database, and the codes developed for computing the prediction intervals are available through the JARVIS-tools package.
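
As a rough illustration of the first approach named in the abstract, the quantile (pinball) loss can be written in a few lines. This is a generic textbook form, not the JARVIS-tools implementation; the function names and the coverage helper are assumptions made for this sketch.

```python
from typing import Sequence


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], q: float) -> float:
    """Mean quantile (pinball) loss for a target quantile q in (0, 1)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction (diff >= 0) is weighted by q,
        # over-prediction (diff < 0) by (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


def interval_coverage(
    y_true: Sequence[float], lower: Sequence[float], upper: Sequence[float]
) -> float:
    """Fraction of targets that fall inside their predicted [lower, upper] interval."""
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)


if __name__ == "__main__":
    print(pinball_loss([1.0, 2.0], [0.0, 3.0], q=0.9))
    print(interval_coverage([1.0, 2.0, 3.0], [0.0, 2.5, 2.0], [2.0, 3.0, 4.0]))
```

Fitting one model at a low quantile and another at a high quantile gives the two ends of a prediction interval. The coverage helper then checks how often the true value actually lands inside it, which is one way to detect over- or underestimated errors.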

6.

Aggressively optimizing validation statistics can degrade interpretability of data-driven materials models.

Lei, Katherine; Joress, Howie; Persson, Nils; Hattrick-Simpers, Jason R; DeCost, Brian.

J Chem Phys ; 155(5): 054105, 2021 Aug 07.

Article in English | MEDLINE | ID: mdl-34364331

ABSTRACT

One of the key factors in enabling trust in artificial intelligence within the materials science community is the interpretability (or explainability) of the underlying models used. By understanding what features were used to generate predictions, scientists are then able to critically evaluate the credibility of the predictions and gain new insights. Here, we demonstrate that ignoring hyperparameters viewed as less impactful to the overall model performance can deprecate model explainability. Specifically, we demonstrate that random forest models trained using unconstrained maximum depths, in accordance with accepted best practices, often can report a randomly generated feature as being one of the most important features in generated predictions for classifying an alloy as being a high entropy alloy. We demonstrate that this is the case for impurity, permutation, and Shapley importance rankings, and the latter two showed no strong structure in terms of optimal hyperparameters. Furthermore, we demonstrate that, for the case of impurity importance rankings, only optimizing the validation accuracy, as is also considered standard in the random forest community, yields models that prefer the random feature in generating their predictions. We show that by adopting a Pareto optimization strategy to model performance that balances validation statistics with the differences between the training and validation statistics, one obtains models that reject random features and thus balance model predictive power and explainability.
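
The Pareto strategy described in the abstract, balancing validation statistics against the train-validation difference, can be sketched as a simple non-dominated filter. The `Candidate` record, the score values, and the model names below are hypothetical and only illustrate the selection step, not the authors' code.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One hyperparameter setting with its training and validation scores."""
    name: str
    train_score: float
    val_score: float

    @property
    def gap(self) -> float:
        # Difference between training and validation performance.
        return abs(self.train_score - self.val_score)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both objectives and better on one."""
    no_worse = a.val_score >= b.val_score and a.gap <= b.gap
    strictly_better = a.val_score > b.val_score or a.gap < b.gap
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


if __name__ == "__main__":
    models = [
        Candidate("deep", 1.00, 0.90),     # best validation score, largest gap
        Candidate("medium", 0.92, 0.89),   # slightly lower score, much smaller gap
        Candidate("shallow", 0.80, 0.79),  # smallest gap, lowest score on the front
        Candidate("poor", 0.85, 0.78),     # dominated by "medium"
    ]
    print([m.name for m in pareto_front(models)])
```

Choosing only by validation score would pick the first model, while the front exposes alternatives that give up a little accuracy for a much smaller train-validation gap.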

7.

On-the-fly closed-loop materials discovery via Bayesian active learning.

Kusne, A Gilad; Yu, Heshan; Wu, Changming; Zhang, Huairuo; Hattrick-Simpers, Jason; DeCost, Brian; Sarker, Suchismita; Oses, Corey; Toher, Cormac; Curtarolo, Stefano; Davydov, Albert V; Agarwal, Ritesh; Bendersky, Leonid A; Li, Mo; Mehta, Apurva; Takeuchi, Ichiro.

Nat Commun ; 11(1): 5966, 2020 Nov 24.

Article in English | MEDLINE | ID: mdl-33235197

ABSTRACT

Active learning, the field of machine learning (ML) dedicated to optimal experiment design, has played a part in science as far back as the 18th century when Laplace used it to guide his discovery of celestial mechanics. In this work, we focus a closed-loop, active learning-driven autonomous system on another major challenge, the discovery of advanced materials against the exceedingly complex synthesis-processes-structure-property landscape. We demonstrate an autonomous materials discovery methodology for functional inorganic compounds which allows scientists to fail smarter, learn faster, and spend fewer resources in their studies, while simultaneously improving trust in scientific results and machine learning tools. This robot science enables science-over-the-network, reducing the economic impact of scientists being physically separated from their labs. The real-time closed-loop, autonomous system for materials exploration and optimization (CAMEO) is implemented at the synchrotron beamline to accelerate the interconnected tasks of phase mapping and property optimization, with each cycle taking seconds to minutes. We also demonstrate an embodiment of human-machine interaction, where human-in-the-loop is called to play a contributing role within each cycle. This work has resulted in the discovery of a novel epitaxial nanocomposite phase-change memory material.

8.

A High-Throughput Structural and Electrochemical Study of Metallic Glass Formation in Ni-Ti-Al.

Joress, Howie; DeCost, Brian L; Sarker, Suchismita; Braun, Trevor M; Jilani, Sidra; Smith, Ryan; Ward, Logan; Laws, Kevin J; Mehta, Apurva; Hattrick-Simpers, Jason R.

ACS Comb Sci ; 22(7): 330-338, 2020 07 13.

Article in English | MEDLINE | ID: mdl-32496755

ABSTRACT

On the basis of a set of machine learning predictions of glass formation in the Ni-Ti-Al system, we have undertaken a high-throughput experimental study of that system. We utilized rapid synthesis followed by high-throughput structural and electrochemical characterization. Using this dual-modality approach, we are able to better classify the amorphous portion of the library, which we found to be the portion with a full width at half maximum (fwhm) of >0.42 Å⁻¹ for the first sharp X-ray diffraction peak. Proper phase labeling is important for future machine learning efforts. We demonstrate that the fwhm and corrosion resistance are correlated but that, while chemistry still plays a role in corrosion resistance, a large fwhm, attributed to a glassy phase, is necessary for the highest corrosion resistance.

Subjects

Aluminum/chemistry , Electrochemical Techniques , High-Throughput Screening Assays , Nickel/chemistry , Titanium/chemistry , Glass/chemistry , Machine Learning , Molecular Structure , X-Ray Diffraction
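
The fwhm criterion in the abstract above can be made concrete with a short sketch that measures the full width at half maximum of a sampled peak and applies the reported 0.42 Å⁻¹ threshold. The function names are hypothetical, and the sketch assumes a single, background-subtracted peak sampled on an increasing q grid. The study's actual peak-fitting procedure is not described in the abstract.

```python
import math
from typing import List, Sequence


def fwhm(q: Sequence[float], intensity: Sequence[float]) -> float:
    """Full width at half maximum of a single background-subtracted peak."""
    values: List[float] = list(intensity)
    peak = max(values)
    half = peak / 2.0
    i_peak = values.index(peak)

    def crossing(i: int, j: int) -> float:
        # Linear interpolation between samples i and j for the half-maximum position.
        return q[i] + (half - values[i]) * (q[j] - q[i]) / (values[j] - values[i])

    # Walk outward from the peak until the signal drops to half maximum or below.
    left = i_peak
    while left > 0 and values[left] > half:
        left -= 1
    right = i_peak
    while right < len(values) - 1 and values[right] > half:
        right += 1
    if values[left] > half or values[right] > half:
        raise ValueError("peak does not fall to half maximum within the scan")
    return crossing(right - 1, right) - crossing(left, left + 1)


def is_amorphous(width: float, threshold: float = 0.42) -> bool:
    """Apply the fwhm > 0.42 inverse-angstrom criterion reported in the abstract."""
    return width > threshold


if __name__ == "__main__":
    # Synthetic Gaussian peak (illustrative values only): center 3.0, sigma 0.2.
    grid = [2.0 + 0.001 * i for i in range(2001)]
    signal = [math.exp(-((x - 3.0) ** 2) / (2 * 0.2 ** 2)) for x in grid]
    width = fwhm(grid, signal)
    print(round(width, 3), is_amorphous(width))
```

For a Gaussian, the width should come out close to 2·sqrt(2·ln 2)·sigma, which gives a quick sanity check on the interpolation.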

9.

An Inter-Laboratory Study of Zn-Sn-Ti-O Thin Films using High-Throughput Experimental Methods.

Hattrick-Simpers, Jason R; Zakutayev, Andriy; Barron, Sara C; Trautt, Zachary T; Nguyen, Nam; Choudhary, Kamal; DeCost, Brian; Phillips, Caleb; Kusne, A Gilad; Yi, Feng; Mehta, Apurva; Takeuchi, Ichiro; Perkins, John D; Green, Martin L.

ACS Comb Sci ; 21(5): 350-361, 2019 05 13.

Article in English | MEDLINE | ID: mdl-30888788

ABSTRACT

High-throughput experimental (HTE) techniques are an increasingly important way to accelerate the rate of materials research and development for many technological applications. However, there are very few publications on the reproducibility of the HTE results obtained across different laboratories for the same materials system, and on the associated sample and data exchange standards. Here, we report a comparative study of Zn-Sn-Ti-O thin-film materials using high-throughput experimental methods at the National Institute of Standards and Technology (NIST) and the National Renewable Energy Laboratory (NREL). The thin film sample libraries were synthesized by combinatorial physical vapor deposition (cosputtering and pulsed laser deposition) and characterized by spatially resolved techniques for composition, structure, thickness, optical, and electrical properties. The results of this study indicate that all these measurement techniques performed at two different laboratories show excellent qualitative agreement. The quantitative similarities and differences vary by measurement type, with 95% confidence intervals of 0.1-0.2 eV for the band gap, 24-29 nm for film thickness, and 0.08 to 0.37 orders of magnitude for sheet resistance. Overall, this work serves as a case study for the feasibility of a High-Throughput Experimental Materials Collaboratory (HTE-MC) by demonstrating the exchange of high-throughput sample libraries, workflows, and data.

Subjects

Alloys/chemistry , Oxides/chemistry , Tin/chemistry , Titanium/chemistry , Zinc/chemistry , Combinatorial Chemistry Techniques , High-Throughput Screening Assays , Lasers , Materials Testing , Small Molecule Libraries/chemistry

10.

High Throughput Quantitative Metallography for Complex Microstructures Using Deep Learning: A Case Study in Ultrahigh Carbon Steel.

DeCost, Brian L; Lei, Bo; Francis, Toby; Holm, Elizabeth A.

Microsc Microanal ; 25(1): 21-29, 2019 02.

Article in English | MEDLINE | ID: mdl-30869574

ABSTRACT

We apply a deep convolutional neural network segmentation model to enable novel automated microstructure segmentation applications for complex microstructures typically evaluated manually and subjectively. We explore two microstructure segmentation tasks in an openly available ultrahigh carbon steel microstructure dataset: segmenting cementite particles in the spheroidized matrix, and segmenting larger fields of view featuring grain boundary carbide, spheroidized particle matrix, particle-free grain boundary denuded zone, and Widmanstätten cementite. We also demonstrate how to combine these data-driven microstructure segmentation models to obtain empirical cementite particle size and denuded zone width distributions from more complex micrographs containing multiple microconstituents. The full annotated dataset is available on materialsdata.nist.gov.

11.

Corrigendum to "A large dataset of synthetic SEM images of powder materials and their ground truth 3D structures" [Data Brief 9 (2016) 727-731].

DeCost, Brian L; Holm, Elizabeth A.

Data Brief ; 16: 1103, 2018 02.

Article in English | MEDLINE | ID: mdl-29854900

ABSTRACT

[This corrects the article DOI: 10.1016/j.dib.2016.10.011.].

12.

Machine learning with force-field inspired descriptors for materials: fast screening and mapping energy landscape.

Choudhary, Kamal; DeCost, Brian; Tavazza, Francesca.

Phys Rev Mater ; 2(8), 2018.

Article in English | MEDLINE | ID: mdl-32166213

ABSTRACT

We present a complete set of chemo-structural descriptors to significantly extend the applicability of machine learning (ML) in material screening and mapping the energy landscape for multicomponent systems. These new descriptors allow differentiating between structural prototypes, which is not possible using the commonly used chemical-only descriptors. Specifically, we demonstrate that the combination of pairwise radial, nearest neighbor, bond-angle, dihedral-angle and core-charge distributions plays an important role in predicting formation energies, bandgaps, static refractive indices, magnetic properties, and modulus of elasticity for three-dimensional (3D) materials as well as exfoliation energies of two-dimensional (2D) layered materials. The training data consists of 24549 bulk and 616 monolayer materials taken from the JARVIS-DFT database. We obtained very accurate ML models using a gradient boosting algorithm. Then we use the trained models to discover exfoliable 2D-layered materials satisfying specific property requirements. Additionally, we integrate our formation energy ML model with a genetic algorithm for structure search to verify if the ML model reproduces the DFT convex hull. This verification establishes a more stringent evaluation metric for the ML model than what is commonly used in data sciences. Our learnt model is publicly available on the JARVIS-ML website (https://www.ctcms.nist.gov/jarvisml) for property predictions of generalized materials.

13.

A large dataset of synthetic SEM images of powder materials and their ground truth 3D structures.

DeCost, Brian L; Holm, Elizabeth A.

Data Brief ; 9: 727-731, 2016 Dec.

Article in English | MEDLINE | ID: mdl-27830168

ABSTRACT

This data article presents a data set comprised of 2048 synthetic scanning electron microscope (SEM) images of powder materials and descriptions of the corresponding 3D structures that they represent. These images were created using open source rendering software, and the generating scripts are included with the data set. Eight particle size distributions (PSDs) are represented with 256 independent images from each. The particle size distributions are relatively similar to each other, so that the dataset offers a useful benchmark to assess the fidelity of image analysis techniques. The characteristics of the PSDs and the resulting images are described and analyzed in more detail in the research article "Characterizing powder materials using keypoint-based computer vision methods" (B.L. DeCost, E.A. Holm, 2016) [1]. These data are freely available in a Mendeley Data archive "A large dataset of synthetic SEM images of powder materials and their ground truth 3D structures" (B.L. DeCost, E.A. Holm, 2016) located at http://dx.doi.org/10.17632/tj4syyj9mr.1 [2] for any academic, educational, or research purposes.
