Reproducibility in Quantum Chemistry
Scientific teams and organizations should embrace reproducible workflows in which all data can be exported, shared, and peer reviewed. For computational chemistry, this means capturing the full flow of data from initial structures through to final coordinates and energies, along with the software versions and binary environments used. The Jupyter project provides many of the components needed, such as an extensible programming interface and visualization and analysis of data in a common format, but it lacks workflows specific to quantum chemistry. This project adds those workflows, couples notebooks with a data server, and uses an extensible data format definition for static export suitable for long-term archiving of results.
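The flow described above can be sketched as a single archivable record. The field names and numeric values below are illustrative only, not a published schema; the point is that input structure, final results, and provenance travel together in one self-describing document.

```python
import json
import platform

# Hypothetical record layout (field names are illustrative, not a
# standard): input structure, final results, and the software
# environment are captured together for long-term archiving.
record = {
    "input": {
        "smiles": "O",  # water, as a SMILES string
        "initial_coordinates": [
            ["O", 0.0000, 0.0000, 0.1173],
            ["H", 0.0000, 0.7572, -0.4692],
            ["H", 0.0000, -0.7572, -0.4692],
        ],
    },
    "result": {
        "final_coordinates": [
            ["O", 0.0000, 0.0000, 0.1193],
            ["H", 0.0000, 0.7610, -0.4690],
            ["H", 0.0000, -0.7610, -0.4690],
        ],
        # Placeholder energy value, for illustration only.
        "total_energy_hartree": -76.026,
    },
    "provenance": {
        "code": "nwchem",  # illustrative program name
        "version": "7.0.2",
        "python": platform.python_version(),
    },
}

# Static export: serialize to JSON, then verify the round trip.
archived = json.dumps(record, indent=2)
restored = json.loads(archived)
assert restored == record
```

Because the export is plain JSON, the archived artifact remains readable without the original software stack, which is what makes it suitable for peer review and long-term storage.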
The target audience ranges from quantum chemistry code developers publishing new methods, for whom the platform can show the input, execution, and output of a development snapshot, through to end users running calculations with production codes. The ability to specify the organization, container name, and version makes it possible to use known versions of codes, and even to rerun calculations when fixes are made. It also enhances the peer-review and publication process by offering a full record of what was done computationally, along with the results obtained and a recipe to replicate them. As a community we must move towards the routine publication of all of these steps, and consider data standards along with software platforms to reduce the upfront costs of doing so.
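Specifying the organization, container name, and version amounts to constructing a fully qualified, version-pinned image reference. A minimal sketch, assuming the common registry/organization/name:tag convention; the organization and image names below are hypothetical:

```python
def container_reference(organization: str, name: str, version: str,
                        registry: str = "docker.io") -> str:
    """Build a fully qualified, version-pinned container reference.

    Pinning an exact version (rather than a floating tag such as
    'latest') is what makes a rerun repeatable: the same reference
    resolves to the same binaries later.
    """
    for part in (organization, name, version):
        if not part:
            raise ValueError("organization, name, and version are required")
    return f"{registry}/{organization}/{name}:{version}"

# Hypothetical organization and image names, for illustration only.
print(container_reference("openchemistry", "psi4", "1.3.2"))
# -> docker.io/openchemistry/psi4:1.3.2
```

When a fix is released, only the version component changes, so the published record both identifies the exact binaries used and serves as the recipe for rerunning with a corrected release.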
The Space Telescope Science Institute\cite{spacetelescopenotebooks} has selected Jupyter as its primary analysis platform for many of the same reasons it was used in this project. Fields of scientific research must converge on shared platforms in which reproducibility is built in, sharing the cost of improving them and customizing them for each field where it makes sense. The platform described can interface with public databases such as PubChem and QCArchive to import existing data, and can produce new data with structures suitable for wider dissemination. The data and metadata standards discussed embrace federated storage and the goals of FAIR data, making all data produced more discoverable. The use of established open standards such as InChI, InChIKey, and SMILES links data produced in individual instances to the global data commons with minimal ambiguity.
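Linking federated records through a shared identifier can be sketched in a few lines. The InChIKeys below are the standard keys for water and methane; the record contents and source names are hypothetical, and stand in for results held in separate instances:

```python
from collections import defaultdict

# Records from two hypothetical sources. The InChIKey acts as the
# join field that links a local result to the global data commons.
local_results = [
    {"inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N", "energy_hartree": -76.026},
    {"inchikey": "VNWKTOKETHGBQD-UHFFFAOYSA-N", "energy_hartree": -40.199},
]
public_records = [
    {"inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N", "name": "water"},
    {"inchikey": "VNWKTOKETHGBQD-UHFFFAOYSA-N", "name": "methane"},
]

# Merge all sources keyed on the shared identifier.
merged = defaultdict(dict)
for source in (local_results, public_records):
    for rec in source:
        merged[rec["inchikey"]].update(rec)

print(merged["XLYOFNOQVPJJNP-UHFFFAOYSA-N"]["name"])
# -> water
```

Because the InChIKey is derived deterministically from the structure, two instances that never coordinated can still agree on the key, which is what makes federated linking possible with minimal ambiguity.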
A remaining challenge is presenting ``permanent'' interactive artifacts: a highly interactive client-server deployment offers customization, but its deep software stack may lack permanence. The data format described enables static archiving of results, while interactive Binder images can recreate the live environment on demand.