See Requirements for the Central Service. Please address your strategy for implementing each requirement, referring to the numbering described in that section. If applying as a consortium, indicate which organizations would be responsible for which components.
Each response below is tagged “Done” for capabilities already in production or “ToDo” for those to be built, followed by the party leading execution (“COS” or “partners”).
Preprints Commons Database
Act as a repository for life sciences preprints, which includes:
            Author’s original manuscript (.doc, .tex etc.) and the converted manuscript (XML);
[Done: COS] OSF preprints become part of the OSF repository along with any associated data, materials, protocols, and connections with other repositories. Metadata is aggregated by SHARE (11 preprint sources, >2 million preprints).
[ToDo: COS] COS will ingest preprint full text when allowed by license, index it for discovery, and archive it for preservation and to facilitate programmatic access for data-mining research applications.
All files associated with that manuscript, such as figures and any supplementary data; including video and datasets, or links to data stored in appropriate external repositories;
[Done: COS] As an interface to the OSF, The Commons can make use of all its repository capabilities. Users can add supplementary data, materials, links, or other files. When made publicly accessible, files are displayed with the preprint and available via API. OSF can surface links to external repositories or integrate the external repositories so that they appear native to the OSF via add-ons. Eight storage add-ons are available now (figshare, Dataverse, Box, Dropbox, Google Drive, GitHub, ownCloud, and Amazon S3), and five more are scheduled for Q2 2017 (Dryad, Bitbucket, GitLab, OneDrive, and Fedora).
On registration (i.e., timestamped project snapshotting) in the OSF, data is copied from all connected services into a preservation environment. In the future, that environment can be an integrated, external repository. For example, on sharing a preprint, (1) authors may connect private storage to facilitate later sharing, or (2) authors may transfer data from active storage to preservation storage.
[ToDo: COS] We will add domain-specific repositories. Letters of support from 15 such repositories are attached: Dryad, Protein Data Bank, TalkBank, NIAGADS, NeuroMorpho, NAHDAP, ICPSR, Figshare, FlyBase, Mouse Phenome Database, Dataverse, Protocols, Sage Bionetworks (Synapse), DIP, and VectorBase. An integrated, seamless mechanism for depositing data, materials, and protocols in domain-relevant repositories will increase discovery and deposit by authors and discovery by readers.
Appropriate metadata
[Done: COS] The submission process collects metadata including title, authors, abstract, discipline, keywords, peer-reviewed DOI, and license. Metadata is available via API and HTML meta tags for discoverability (e.g., Google Scholar, FAIR).
SHARE harvests data across preprint services regardless of metadata schema. The data is queryable and normalized to a schema developed for working with diverse sources (it draws heavily from schema.org and DataCite). SHARE is relatively schema-agnostic and can accommodate new fields as sources are added.
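To illustrate the kind of normalization described above, the following is a minimal sketch, not SHARE's actual implementation. The source names (`source_a`, `source_b`), their field names, and the common field names are all hypothetical and chosen only for illustration. Each source's fields are mapped onto one common record shape, and any unmapped fields are retained rather than dropped, which is one way a schema could stay flexible to additions.

```python
from typing import Any, Dict, List

# Hypothetical mappings: common field name -> field name used by each source.
# Neither the sources nor the field names reflect SHARE's real schema.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "source_a": {
        "title": "title",
        "contributors": "authors",
        "description": "abstract",
        "date_published": "posted",
    },
    "source_b": {
        "title": "dc:title",
        "contributors": "dc:creator",
        "description": "dc:description",
        "date_published": "dc:date",
    },
}


def normalize(source: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Map one harvested record onto the common (illustrative) schema."""
    mapping = FIELD_MAPS[source]
    # Missing source fields become None instead of raising an error.
    normalized: Dict[str, Any] = {
        common: record.get(raw) for common, raw in mapping.items()
    }

    # Contributors are always a list, whether the source sent
    # nothing, a single string, or a sequence of names.
    contributors = normalized["contributors"]
    if contributors is None:
        names: List[str] = []
    elif isinstance(contributors, str):
        names = [contributors]
    else:
        names = list(contributors)
    normalized["contributors"] = names

    # Keep fields the mapping does not know about, so that new
    # source-specific metadata is preserved rather than discarded.
    mapped_raw = set(mapping.values())
    normalized["extra"] = {k: v for k, v in record.items() if k not in mapped_raw}
    normalized["source"] = source
    return normalized
```

Under these assumptions, supporting a new preprint service amounts to adding one entry to `FIELD_MAPS`; records from every service can then be queried through the same field names.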
[ToDo: COS] The GB will define metadata standards for The Commons. Fields will be added to relevant workflows (e.g., submission, moderation).
Includes a stable, long-term preservation strategy.
[ToDo: COS] We will add domain-specific repositories. Letters of support from 15 such repositories are attached: Dryad, Protein Data Bank, TalkBank, NIAGADS, NeuroMorpho, NAHDAP, ICPSR, Figshare, FlyBase, Mouse Phenome Database, Dataverse, Protocols, Sage Bionetworks (Synapse), DIP, and VectorBase. An integrated, seamless mechanism for depositing data, materials, and protocols in domain-relevant repositories will increase discovery and deposit by authors and discovery by readers.
[ToDo: COS] COS is exploring technologies and partnerships to add preservation features: torrent protocols for distributing publicly stored data; a blockchain to create a guaranteed-immutable provenance record; mirroring of public content so that individuals could back it up or host it on cheap, commodity hardware; and partnerships with organizations such as the Internet Archive. Such partnerships include connecting institutional systems for automatic preservation (e.g., CurateND at Notre Dame).
At inception of the preprint service, the database should be populated with legacy preprints and associated metadata from existing approved servers. Conversion of these legacy files to XML could be considered but is not a specific requirement for this application.
[Done: COS] The GB will decide eligibility for aggregated search. Metadata is already harvested by SHARE; adding preprint services is trivial.
[ToDo: COS] Before launch, COS will harvest full-text preprints from eligible services to accompany metadata harvested by SHARE. Documents will run through a text-extraction pipeline and be indexed for discovery purposes. Word and LaTeX should be convertible to XML; PDF will be difficult (see section 4).
The provider(s) of these services are also required to remove or flag any manuscripts that violate the standards set forth by the Governance Body.
[Done: COS] Moderation is manual; an administrator moderation dashboard is prototyped and scheduled for July 2017 release.
[ToDo: COS] The GB will define moderation standards for The Commons. The dashboard will enable administrators to apply those standards to new submissions. The GB will decide whether to include preprints from services that do not provide the necessary metadata (e.g., license information). Ideally, The Commons will facilitate standards for metadata across external services.
Preprints Commons Human Interface
            Provide a web interface for browsing and searching.
                        Display abstracts and links to source
                        Potentially provide download functionality for content held in the CS in a variety of formats: PDF, XML, HTML, etc
                        Display snippets (like Google Books) to place full-text search results in context
                        If the source is not available elsewhere, or with consent of the ingestion source, display the full manuscript and figures/supplementary files.
                        Display a clear statement indicating that the material is a preprint
                        Link to other versions of the manuscript elsewhere, especially journal versions, using Crossref metadata or information from other sources
                        Schema.org tags
                        Make available metrics and anonymized usage data to humans.
                        Be developed in line with good user-experience web principles, be fully responsive
                        Support login functionality - via ORCID
[Done: COS] The preprints search interface offers aggregated search, faceting by preprint service and discipline, and sorting by search relevance or upload date (Figure \ref{493406}).
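As a rough illustration of the faceting and sorting behavior described above, the sketch below filters an in-memory list of records and counts the remaining facet values. It is not the production search code: the record fields (`service`, `discipline`, `uploaded`, `score`) and the sample data are assumptions made for the example, and a real deployment would delegate this work to a search index.

```python
from collections import Counter
from datetime import date
from typing import Any, Dict, List, Optional, Tuple

# Illustrative records only; "score" stands in for a search-relevance value.
PREPRINTS: List[Dict[str, Any]] = [
    {"title": "Paper 1", "service": "Service A", "discipline": "Neuroscience",
     "uploaded": date(2017, 1, 5), "score": 0.4},
    {"title": "Paper 2", "service": "Service B", "discipline": "Neuroscience",
     "uploaded": date(2017, 3, 1), "score": 0.9},
    {"title": "Paper 3", "service": "Service A", "discipline": "Genetics",
     "uploaded": date(2016, 11, 20), "score": 0.7},
]


def search(
    records: List[Dict[str, Any]],
    service: Optional[str] = None,
    discipline: Optional[str] = None,
    sort_by: str = "relevance",
) -> Tuple[List[Dict[str, Any]], Dict[str, Counter]]:
    """Filter by the selected facets, sort, and count the facet values left."""
    if sort_by not in ("relevance", "date"):
        raise ValueError("sort_by must be 'relevance' or 'date'")

    # Apply whichever facet selections the user has made.
    hits = [
        r for r in records
        if (service is None or r["service"] == service)
        and (discipline is None or r["discipline"] == discipline)
    ]

    # Highest relevance first, or most recent upload first.
    key = "score" if sort_by == "relevance" else "uploaded"
    hits.sort(key=lambda r: r[key], reverse=True)

    # Facet counts let the interface show how many results each choice holds.
    facets = {
        "service": Counter(r["service"] for r in hits),
        "discipline": Counter(r["discipline"] for r in hits),
    }
    return hits, facets
```

For example, `search(PREPRINTS, discipline="Neuroscience", sort_by="date")` would return the two neuroscience records with the most recent upload first, along with per-service counts for those results.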