CKineticsDB─An Extensible and FAIR Data Management Framework and Datahub for Multiscale Modeling in Heterogeneous Catalysis

Abstract
A great advantage of computational research is its reproducibility and reusability. However, an enormous amount of computational research data in heterogeneous catalysis is barricaded due to logistical limitations. Sufficient provenance and characterization of data and computational environment, with uniform organization and easy accessibility, can allow the development of software tools for integration across the multiscale modeling workflow. Here, we develop the Chemical Kinetics Database, CKineticsDB, a state-of-the-art datahub for multiscale modeling, designed to be compliant with the FAIR guiding principles for scientific data management. CKineticsDB utilizes a MongoDB back-end for extensibility and adaptation to varying data formats, with a referencing-based data model to reduce redundancy in storage. We have developed a Python software program for data processing operations and with built-in features to extract data for common applications. CKineticsDB evaluates the incoming data for quality and uniformity, retains curated information from simulations, enables accurate regeneration of publication results, optimizes storage, and allows the selective retrieval of files based on domain-relevant catalyst and simulation parameters. CKineticsDB provides data from multiple scales of theory (ab initio calculations, thermochemistry, and microkinetic models) to accelerate the development of new reaction pathways, kinetic analysis of reaction mechanisms, and catalysis discovery, along with several data-driven applications. Abstract Graphic available at: https://doi.org/10.1021/acs.jcim.3c00123
Description
This document is the Accepted Manuscript version of a Published Work that appeared in final form in Journal of Chemical Information and Modeling, copyright © 2023 American Chemical Society after peer review and technical editing by the publisher. To access the final edited and published work see https://doi.org/10.1021/acs.jcim.3c00123. This article will be embargoed until 07/24/2024.
Keywords
Citation
Lambor, Siddhant M., Sashank Kasiraju, and Dionisios G. Vlachos. “CKineticsDB─An Extensible and FAIR Data Management Framework and Datahub for Multiscale Modeling in Heterogeneous Catalysis.” Journal of Chemical Information and Modeling 63, no. 14 (July 24, 2023): 4342–54. https://doi.org/10.1021/acs.jcim.3c00123.