COAR Notify

Led by Paul Walk, Martin Klein

The problem: There is significant and growing interest in connecting preprints to peer-review, publication and endorsement services, in what is known as the "publish, then review" model. Some integrations of this kind have already been prototyped and developed, but these are by nature point-to-point solutions, linking, for example, a single large preprint repository to a single review service. Ideally, overlay journals should be able to review and publish preprints available in any preprint server or repository.

Many repositories are developed, deployed and managed in low-resource conditions. Development resources are scarce, and it is not viable to create an individual, service-specific solution for each and every integration requirement.

If we are to connect preprints in many distributed repositories to peer reviews and similar resources in a wide range of services, we need a general, interoperable protocol for linking resources across a distributed service environment.
The solution: The resource-oriented nature of the Web is well suited to an environment in which control of resources is distributed across a large number of repositories. In such an environment, it makes sense to take a pass-by-reference approach to interaction between networked services, rather than relying on machine- or human-mediated processes to pass copies of resources around the network.

Resources in repositories have stable URIs that can be used for referencing. A request for review can therefore be sent as a standards-based notification that carries a resource's stable URI to the inbox of a review service, and the review service can obtain the resource to be reviewed by dereferencing that URI. More generally, it becomes possible to invoke remote services on the network by passing them instructions together with, crucially, the URIs identifying particular resources.
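As a concrete illustration of this pass-by-reference pattern, the sketch below sends a review request as a Linked Data Notification: an Activity Streams 2.0 "Offer" activity, POSTed as JSON-LD to a review service's inbox. It is a minimal sketch only; the inbox URL, the service identifiers and the exact payload shape are illustrative assumptions, not the COAR Notify specification.

```python
import json
import uuid

import requests  # third-party HTTP client: pip install requests

# Hypothetical endpoints: the review service's LDN inbox and the
# stable URI of a preprint held in a repository.
INBOX = "https://review-service.example.org/inbox"
PREPRINT_URI = "https://repository.example.org/record/12345"

# An Activity Streams 2.0 "Offer": the repository offers the preprint,
# identified only by its stable URI, for review. No copy of the
# resource itself is transmitted.
notification = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "id": f"urn:uuid:{uuid.uuid4()}",
    "type": "Offer",
    "actor": {"id": "https://repository.example.org/", "type": "Service"},
    "object": {"id": PREPRINT_URI, "type": "Document"},
    "target": {"id": "https://review-service.example.org/", "type": "Service"},
}

# Linked Data Notifications receivers accept a POST of JSON-LD to the inbox.
response = requests.post(
    INBOX,
    data=json.dumps(notification),
    headers={"Content-Type": "application/ld+json"},
)
response.raise_for_status()
print("Notification accepted:", response.status_code)
```

The same exchange can be driven from the command line, e.g. `curl -X POST -H "Content-Type: application/ld+json" --data @notification.json https://review-service.example.org/inbox`.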
Proposed work at Sprint: The COAR Notify project is developing a model for such notifications (https://notify.coar-repositories.org), building on the W3C Linked Data Notifications protocol and the W3C Activity Streams 2.0 vocabulary.

We have some early development partners helping us to explore preliminary use cases, and we expect to deploy a reference implementation of one of these use cases later in 2021.

Having done this, the next step will be to invite wider participation. We believe the eLife Sprint offers a perfect opportunity to engage with like-minded technical people who can see the potential in what we are building, and who could work with us to develop more use cases and prototypes.

As part of our project (in advance of the Sprint), we are developing a notification "inbox" for testing and experimentation; this will be made available to the Sprint as a target for notifications. We are also developing documentation and example code so that people can quickly assemble client software in the programming language of their choice (or even use desktop HTTP client tools such as curl).

We would design the Sprint activity to deliver two things:

- Feedback on existing notification payload specifications and their suitability for more use cases
- One or more prototyped notification exchanges demonstrating the use of Notify to address new resource-linking opportunities or requirements between repositories and other services.
MERITS: Metaresearch Evaluation Repository for Identifying Trustworthy Science
Led by Cooper Smout, Dawn Holford, Paola Masuzzo

The problem: In recent years there has been an explosion of interest in post-publication peer review, with many models proposing multidimensional article-level ratings (e.g. Kriegeskorte, 2012, Frontiers in Computational Neuroscience) as an alternative to unidimensional journal-level metrics such as the journal impact factor. In line with these ideas, a growing number of preprint review platforms now solicit reviewers' ratings of preprints on multiple dimensions (e.g. PREreview, Plaudit, Scibase, Rapid Reviews Covid-19, Crowdpeer; see Reimagine Review for more examples). Presently, however, these ratings remain siloed within each project, which limits interoperability, searchability and comparison between sites, and prevents research that could otherwise be conducted into the nature of these ratings and how they relate to real-world outcomes (e.g. citations, patents, replicability).

Additionally, it remains difficult for stakeholders (e.g. researchers, journalists, administrative staff) to identify which preprints and articles have been peer-rated, or where to find such ratings. This lack of transparency limits the exposure of the evaluations and makes it hard for readers to tell whether a newly published preprint has been evaluated. Surfacing this information could help limit the impact of low-quality research, such as the many low-quality preprints promoted by the media throughout the COVID-19 pandemic.

The solution: We propose to develop a central database for organising and storing multidimensional, point-based article ratings. The database will serve two key purposes: (1) to amalgamate ratings from different preprint review platforms into a single location, and (2) to help researchers and journalists identify and find peer ratings of articles and preprints. Ratings will be stored in a common machine-readable format and linked to both the original article and the source of the rating (i.e. the review platform). Articles will be identified using DOIs.

In the future, we anticipate that this database could be expanded to serve other purposes, for example storing evaluation data collected during meta-research experiments (e.g. replicability ratings collected under the repliCATS program), allowing peer reviewers at traditional journals to enter ratings submitted during the review process, and/or aggregating multiple ratings across different sites (subject to appropriate research on this topic).

Proposed work at Sprint: For each deliverable, we will identify key users (e.g. meta-researchers, librarians, journalists), write user stories to understand their needs, and conduct functionality tests with those users.

Deliverable 1: Prototype database

- Establish project goals, a code of conduct and contributor roles
- Identify the types of data (e.g. from PREreview, Plaudit, DocMaps) to import into the database (we will bring examples to the Sprint) and determine a common syntax for them.
- Determine a platform to host the database (note: we have funding available to cover hosting costs)
- Build the database and launch a prototype version on the host platform
- Create a project README, licence and roadmap
- Develop code to import data without duplication and to search the database by DOI (see the sketch after this list)
- Write contribution guidelines for future data imports
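A minimal sketch of what the prototype store could look like, assuming a SQLite backing store and an illustrative ratings schema (DOI, rating dimension, score, source platform, review URL). The field names and the uniqueness rule used to prevent duplicates are our assumptions, not a settled design.

```python
import sqlite3

# Illustrative schema: one row per (article, dimension, platform, review).
# The UNIQUE constraint is what prevents duplicate imports.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ratings (
    doi        TEXT NOT NULL,   -- article/preprint identifier
    dimension  TEXT NOT NULL,   -- e.g. 'rigour', 'novelty'
    score      REAL NOT NULL,   -- normalised to a common scale
    source     TEXT NOT NULL,   -- review platform the rating came from
    review_url TEXT,            -- link back to the original review
    UNIQUE (doi, dimension, source, review_url)
);
"""

def import_rating(conn, doi, dimension, score, source, review_url=None):
    """Insert a rating, silently skipping exact duplicates."""
    conn.execute(
        "INSERT OR IGNORE INTO ratings VALUES (?, ?, ?, ?, ?)",
        (doi, dimension, score, source, review_url),
    )

def find_ratings(conn, doi):
    """Return all ratings recorded for a given DOI."""
    return conn.execute(
        "SELECT dimension, score, source, review_url FROM ratings WHERE doi = ?",
        (doi,),
    ).fetchall()

conn = sqlite3.connect("merits.db")
conn.execute(SCHEMA)
import_rating(conn, "10.1234/example", "rigour", 4.0, "PREreview",
              "https://prereview.example/reviews/1")
print(find_ratings(conn, "10.1234/example"))
conn.commit()
```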
Deliverable 2 (subject to time and relevant skills): Database API

- Determine the API framework (e.g. OpenAPI 3.0)
- Develop the API (a minimal sketch follows this list)
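To show how small the first read-only version of such an API could be, here is a sketch assuming FastAPI (which generates an OpenAPI 3 description automatically) on top of the SQLite store sketched above; the endpoint path and response shape are assumptions. The `{doi:path}` converter is used because DOIs contain slashes.

```python
import sqlite3

from fastapi import FastAPI, HTTPException

app = FastAPI(title="MERITS prototype API")

@app.get("/ratings/{doi:path}")
def get_ratings(doi: str):
    """Return all stored ratings for a DOI, or 404 if none exist."""
    conn = sqlite3.connect("merits.db")
    rows = conn.execute(
        "SELECT dimension, score, source, review_url FROM ratings WHERE doi = ?",
        (doi,),
    ).fetchall()
    conn.close()
    if not rows:
        raise HTTPException(status_code=404, detail="No ratings for this DOI")
    return [
        {"dimension": d, "score": s, "source": src, "review_url": url}
        for d, s, src, url in rows
    ]

# Run with: uvicorn merits_api:app --reload
# The generated OpenAPI 3 document is then served at /openapi.json
```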
Deliverable 3 (subject to time and relevant skills): Web interface to visualise the API

- Design the web interface
- Design a visual representation of the rating data, including links to the original reviews
- Write website text
- Develop and launch the web interface
Genestorian
Led by Manuel Lera Ramirez

The problem: Genotypes of model organism strains are recorded by researchers as manually entered text, loosely following an allele nomenclature that becomes increasingly obsolete as the diversity of genetic modifications grows. When consulting a publication or a laboratory's strain database, it can be hard or impossible to understand how a strain was generated. The best way to record the genotype of a strain is to document the sequence modifications with respect to the reference genome, together with the biological resources used to produce them. Electronic Laboratory Notebooks do not provide this functionality.

Documenting the generation of recombinant DNA resources is possible with proprietary software, but these services do not allow the history of recombinant DNA entries to be exported in an open, machine-readable format. This prevents the integration of collections with other bioinformatic tools and limits the usability of the data produced by researchers. Conversely, Open Source libraries for in silico genetic engineering have their own limitations: besides requiring researchers to know how to code, they provide no functionality for inventorying biological entities or the relationships between them.

A simple tool to document genetic engineering unambiguously would improve the reproducibility of science, fostering collaboration and knowledge transfer.
The solution: I work in a fission yeast lab, and in my free time I have been developing Genestorian, an Open Source web application to document genetic modifications in model organism collections. Genestorian will keep a laboratory inventory of strains, oligonucleotides and plasmids with sequence traceability, using a relational database, in silico molecular biology libraries and genome databases. New plasmid and allele sequences will be generated in silico from sequences retrieved from the laboratory inventory and the model organism genome database. Strains and plasmids will be linked to the resources and genetic-engineering methods used to produce them, and to the experimental data validating the genetic operations.

Genestorian will let users document strain and plasmid generation unambiguously in the browser through an intuitive web interface. It will also provide programmatic access through an API and allow data export in a machine-readable format. This information can then be used to document the history of a biological resource for publications, repositories or collaborators.

Genestorian is at a very early stage of development, but a landing page with a brief description and a video of a prototype is available at https://www.genestorian.org/
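To make "sequence traceability" concrete, here is a minimal sketch of the kind of data model the description implies: every inventory item records the resources and method used to produce it, so the full derivation of a strain can be walked back programmatically. The class and field names are illustrative assumptions, not Genestorian's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    """Any inventory item: strain, plasmid or oligonucleotide (assumed model)."""
    identifier: str   # laboratory inventory ID
    kind: str         # 'strain' | 'plasmid' | 'oligo'
    sequence: str = ""  # sequence, or modification w.r.t. the reference genome
    method: str = ""    # genetic-engineering step, e.g. 'PCR', 'transformation'
    parents: list["Resource"] = field(default_factory=list)  # resources used to make this one

def print_derivation(resource: Resource, depth: int = 0) -> None:
    """Print the full history of a resource by walking its parents."""
    step = f" (via {resource.method})" if resource.method else ""
    print("  " * depth + f"{resource.identifier} [{resource.kind}]{step}")
    for parent in resource.parents:
        print_derivation(parent, depth + 1)

# Example: a strain made by transforming a PCR product into the background strain.
fwd = Resource("oligo-001", "oligo", sequence="ATGGCA...")
rev = Resource("oligo-002", "oligo", sequence="TTACGG...")
cassette = Resource("frag-010", "plasmid", method="PCR", parents=[fwd, rev])
strain = Resource("yMLR-042", "strain", method="transformation", parents=[cassette])
print_derivation(strain)
```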
Proposed work at Sprint: I am a self-taught programmer, and although I write software for my research, I have never made major contributions to software projects with many users. I am also new to web development, so I lack the experience to make the technical decisions that will affect security and how easy the application is to use and deploy.

I see the Sprint as an opportunity to discuss the design of the application with experts at an early stage, so that I can incorporate their feedback into the planning of the software. I would also like to discuss how to incorporate user feedback into the development cycle, in order to produce an easy-to-use tool that meets the needs of the end user.

Finally, I am trying to find funding for the development of this project. I have applied for a grant to develop this tool in the lab of Jürg Bähler at UCL, but we do not yet have an answer from the funding agency. Should we not receive the grant, I hope to learn more at the Sprint about funding opportunities for Open Source research software.
Octopus
Led by Alexandra Freeman

The problem: Octopus is a platform designed to be the new primary research record. Initiated at the eLife Sprint in 2018, it is now hoped to launch by the end of 2021.

The platform itself is nearly ready, but in Octopus every publication needs to be linked to an existing one. The problem is therefore to create a framework of linked publications, extracted from the Open Access literature, to which all new publications can be linked when Octopus launches.

In Octopus, publications are one of eight types, and the framework we will be creating is one of Problems: research questions. These need to be generated automatically, extracted from the existing Open Access corpus using natural language processing, and then hierarchically linked to form a branching structure.

With that framework created, Octopus will be ready for launch: everyone arriving with a new publication will be able to find the most closely related research question to link it to. This branching structure means that, in future, scientific work will be much more easily discovered and navigated.

The solution: The solution we envisage is:

- To take CORE, JISC's Open Access corpus (JISC is a partner to Octopus)
- To pre-filter the corpus by the presence of an abstract
- To extract the sentences most likely to define the research question of each paper, using adaptations of an algorithm developed by Kevin Heffernan, originally for extracting a paper's main findings, and to combine these with the title, journal and author keywords of the relevant paper (keeping the DOI)
- To cluster those research questions by semantic similarity using existing NLP algorithms, run iteratively to produce hierarchical clusters (see the sketch after this list)
- To form a 'research question' (or Problem) for each cluster, which need not be perfect English but must be understandable, and to keep the DOIs of the papers that link to that Problem
- To feed that framework of Problems into the Octopus database, along with the linked papers.
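A minimal sketch of the clustering step, assuming sentence-transformers embeddings and SciPy's agglomerative clustering; the model name and distance threshold are illustrative assumptions, and the real pipeline would run over millions of extracted sentences rather than a toy list. Agglomerative clustering builds the full tree in one pass, and cutting it at successive heights yields the nested, hierarchical clusters described above.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sentence_transformers import SentenceTransformer

# Toy stand-ins for research-question sentences extracted from the corpus.
questions = [
    "How does sleep deprivation affect working memory?",
    "What is the impact of sleep loss on short-term memory?",
    "Which genes regulate flowering time in Arabidopsis?",
]

# Embed each candidate research-question sentence.
model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
embeddings = model.encode(questions)

# Build the hierarchy, then cut it at a chosen cosine-distance threshold.
tree = linkage(embeddings, method="average", metric="cosine")
labels = fcluster(tree, t=0.4, criterion="distance")  # threshold is an assumption

for question, label in zip(questions, labels):
    print(label, question)
```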
Proposed work at Sprint: It is not clear how far through the above process we will be by October, when the Sprint happens. We hope to have refined the algorithms and perhaps be ready to run them en masse on the full corpus. The result will be a node-and-link map of all the current problems in science: quite something to behold!

Whatever stage we are at, the knowledge, expertise and time of a group of participants will be incredibly useful in getting the Problem framework into Octopus ready for launch.

If this first milestone is achieved, the final stage will be to create automatic emails to the authors of the papers used from the corpus, asking them to check the classification of their own paper and the formation of the research question to which it has been linked. We will thus be crowd-sourcing a human touch to perfect the Problem framework prior to launch.
R4E curriculum: For Community-led Reproducibility Education
Led by April Clyburne-Sherin

The problem: Today there is no reliable source for the average researcher to tap when they need introductory-level, short-duration content on reproducible methods and tools. While initiatives like The Turing Way and The Carpentries provide excellent onramps, researchers first require context to understand how these initiatives apply to them, their research and their needs. A critical gap exists at the introductory level. Our global community-led reproducibility education project, Reproducibility for Everyone (R4E), builds an introductory layer upon these other initiatives and can help expand their reach and impact.

We need to identify where gaps exist in researchers' awareness of open practices and fill them with timely, targeted educational offerings. R4E aims to scale open research practices by linking researchers to the tools, communities and practices that will help them work openly and effectively. R4E volunteer researchers run practical introductory workshops covering a conceptual framework for reproducibility, together with fundamental methods, tools and initiatives for improving it. During the eLife Sprint, R4E aims to seed, enhance and develop curriculum designed for wide reuse at scale, to help fill educational gaps and introduce more researchers to reproducibility.

The solution: Our project applies an open-source development approach to community-led reproducibility education. The landscape of reproducibility changes quickly, and we learn more from our participants and instructors with every R4E workshop. At the eLife Sprint, we aim to seed and develop new curriculum modules with the greater researcher community, and to recruit contributors to review and revise existing modules. The R4E curriculum is iterative: instructors add discipline-specific tools and methods, modules that run long can be tightened, and popular modules can be expanded. We hope to open this iteration up to a larger open-source community beyond our existing R4E volunteers. Designing an open contribution process and testing it during the eLife Sprint will improve the quality of the curriculum through a larger pool of diverse reviewers, and will increase the variety of training materials by seeding new curriculum ideas to be developed and tested. To achieve this, we will create and document a process for virtual collaboration on module creation, from proposal to review to publication, using online tools such as GitHub, Slack and the Open Science Framework.

Proposed work at Sprint: The main aim of participating in the Sprint is to recruit diverse open-source contributors to R4E curriculum development and improvement. A secondary aim is to receive feedback on the contribution process so that we can improve our documentation and processes for new contributors. Milestones we hope to achieve include:

- Seed new R4E curriculum modules
- Expand or adapt existing R4E curriculum modules to include new audiences, topics or disciplines
- Review and revise existing R4E curriculum modules to include new tools, methods and vocabulary
- Translate existing curriculum modules into new languages
- Gather feedback and improve contributor guides, workshop guides and other onboarding materials
- Identify gaps in documentation and barriers to participation
Research Group Handbook
Led by Natalie Thurlby, James Thomas, Alastair Tanner

The problem: The research culture of a group affects both the people working within it (especially students and postdocs) and the quality of the work they create together. This culture is created through everyday decisions: whether and how team meetings are run, how the group approaches reproducibility and publication practices, what is in the lab code of conduct, who does the administrative work, and so on.

Often these things are not written down in a group handbook, since writing one is a prohibitive time commitment for stretched researchers. When they are written down, it is usually not as a collaborative effort.

When these things aren't written down:

- People in the team can't easily get the support they need.
- Practices are more difficult to change for the better (e.g. making the group more inclusive or reproducible). The team may suffer from the Tyranny of Structurelessness (https://www.jofreeman.com/joreen/tyranny.htm): since the process for decision-making isn't known, it's difficult for group members with less power to influence it.
The solution: We want to create a template research group handbook, and materials to help people use it. We hope this will help people take ownership of these decisions by lowering the barrier to thinking about research culture. Both of these paired resources will be on GitHub and use Jupyter Book, and they will build on and link to other Open Source resources.

The template handbook will reduce the time required for research groups to co-create a handbook by providing templates and activities for resources such as the team's roles and responsibilities or group values (e.g. "We work openly, and here's how"). The template repository (at a very early stage) is on GitHub, including plans for what it might contain: https://github.com/very-good-science/our-handbook-template/issues/1

The how-to-use guidebook will explain how to make the best use of the template, both technically (e.g. how to create a repository from a template) and practically (e.g. how to run an annual team meeting to maintain the handbook).

The placeholder repository is here: https://github.com/very-good-science/our-meta-handbook

As a bonus, a writing (rather than coding) task works excellently as an introduction to online collaboration on GitHub!