Current content discovery mechanisms rely on indexing services that users have a stake in but little influence over, and these services offer no seamless or transparent way to identify content related to whatever the user is currently viewing. The proposed solution is a mechanism that (1) analyses the text of the article currently open in the browser, (2) extracts its key terms, and (3) queries a web-based phrase index service to suggest a list of related content items to the user. If the current article is not already in the index, its identifying metadata (at minimum its URL and/or DOI) and its extracted phrases will be added, so users will enlarge the index simply by using the service. The index will need to be seeded with a substantial set of article data so that early users derive value from the service and are motivated to engage with it. The phrase index could also support other novel content discovery tools and approaches.
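The three-step client workflow can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a design: key terms are taken to be the most frequent unigrams and bigrams containing no stopwords (a simple stand-in, since the proposal does not name an extraction technique), the web-based index service is replaced by an in-memory class, and the names `extract_key_phrases`, `PhraseIndex` and `suggest_related` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately short stopword list; a real service would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is",
             "are", "was", "were", "be", "by", "with", "as", "that", "this", "it",
             "at", "from"}


def extract_key_phrases(text, max_phrases=10):
    """Steps (1)-(2): return the most frequent stopword-free unigrams and bigrams.

    Punctuation is discarded, so in this naive version a bigram may span a
    sentence boundary.
    """
    words = re.findall(r"[a-z][a-z\-]+", text.lower())
    counts = Counter()
    for i, word in enumerate(words):
        if word not in STOPWORDS:
            counts[word] += 1
        if i + 1 < len(words):
            pair = (word, words[i + 1])
            if not any(w in STOPWORDS for w in pair):
                counts[" ".join(pair)] += 1
    return [phrase for phrase, _ in counts.most_common(max_phrases)]


class PhraseIndex:
    """In-memory stand-in for the web-based phrase index service (hypothetical)."""

    def __init__(self):
        self.articles = {}                     # identifier (URL or DOI) -> phrases
        self.phrase_to_ids = defaultdict(set)  # phrase -> identifiers

    def add_if_absent(self, identifier, phrases):
        """Store an article's phrases unless the identifier is already indexed."""
        if identifier in self.articles:
            return False
        self.articles[identifier] = list(phrases)
        for phrase in phrases:
            self.phrase_to_ids[phrase].add(identifier)
        return True

    def related(self, phrases, exclude=None, limit=5):
        """Rank stored articles by how many of the given phrases they share."""
        scores = Counter()
        for phrase in phrases:
            for identifier in self.phrase_to_ids.get(phrase, ()):
                if identifier != exclude:
                    scores[identifier] += 1
        return [identifier for identifier, _ in scores.most_common(limit)]


def suggest_related(index, identifier, text):
    """Step (3): query the index, then add the current article if it is new."""
    phrases = extract_key_phrases(text)
    suggestions = index.related(phrases, exclude=identifier)
    index.add_if_absent(identifier, phrases)
    return suggestions


if __name__ == "__main__":
    index = PhraseIndex()
    index.add_if_absent("https://example.org/a1", ["phrase index", "content discovery"])
    index.add_if_absent("https://example.org/a2", ["protein folding"])
    text = ("A phrase index supports content discovery. "
            "Content discovery tools query the phrase index.")
    print(suggest_related(index, "doi:10.0000/x", text))
    # prints ['https://example.org/a1']
```

Querying before adding means an article never suggests itself, and the `add_if_absent` call is how each user visit grows the index.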
Work at the Sprint
- Fleshing out the basic client/server architecture
- Setting up web server-based index structures (bespoke binary files vs. an RDBMS; the choice needs to be informed by experiment)
- Developing a roadmap leading to cloud hosting of the indexing service
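To make the relational-database side of the index-structure experiment concrete, one possible starting point is a schema with a table of articles keyed by URL or DOI, a table of distinct phrases, and a join table linking them, whose composite primary key on `(phrase_id, article_id)` also serves phrase-to-article lookups. This sketch uses Python's built-in `sqlite3` module only because it needs no server; the table and function names are hypothetical, and nothing here says which option will perform better.

```python
import sqlite3

# Hypothetical schema: one row per article, one row per distinct phrase,
# and a join table recording which phrases were extracted from which article.
SCHEMA = """
CREATE TABLE IF NOT EXISTS article (
    id         INTEGER PRIMARY KEY,
    identifier TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS phrase (
    id   INTEGER PRIMARY KEY,
    text TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS article_phrase (
    phrase_id  INTEGER NOT NULL REFERENCES phrase(id),
    article_id INTEGER NOT NULL REFERENCES article(id),
    PRIMARY KEY (phrase_id, article_id)
);
"""


def open_index(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_article(conn, identifier, phrases):
    """Store an article and its phrases; return False if it is already indexed."""
    with conn:  # commits on success, rolls back on error
        if conn.execute("SELECT 1 FROM article WHERE identifier = ?",
                        (identifier,)).fetchone():
            return False
        article_id = conn.execute("INSERT INTO article (identifier) VALUES (?)",
                                  (identifier,)).lastrowid
        for text in set(phrases):  # de-duplicate to respect the join-table key
            conn.execute("INSERT OR IGNORE INTO phrase (text) VALUES (?)", (text,))
            (phrase_id,) = conn.execute("SELECT id FROM phrase WHERE text = ?",
                                        (text,)).fetchone()
            conn.execute("INSERT INTO article_phrase (phrase_id, article_id) VALUES (?, ?)",
                         (phrase_id, article_id))
    return True


def related(conn, phrases, exclude=None, limit=5):
    """Rank stored articles by the number of query phrases they share."""
    if not phrases:
        return []
    placeholders = ",".join("?" for _ in phrases)
    rows = conn.execute(
        f"""
        SELECT a.identifier, COUNT(*) AS shared
        FROM article_phrase ap
        JOIN phrase p ON p.id = ap.phrase_id
        JOIN article a ON a.id = ap.article_id
        WHERE p.text IN ({placeholders}) AND a.identifier IS NOT ?
        GROUP BY a.id
        ORDER BY shared DESC, a.identifier
        LIMIT ?
        """,
        (*phrases, exclude, limit),
    ).fetchall()
    return [identifier for identifier, _ in rows]


if __name__ == "__main__":
    conn = open_index()
    add_article(conn, "https://example.org/a1", ["phrase index", "content discovery"])
    add_article(conn, "https://example.org/a2", ["content discovery", "protein folding"])
    print(related(conn, ["phrase index", "content discovery"], exclude="doi:10.0000/x"))
    # prints ['https://example.org/a1', 'https://example.org/a2']
```

A bespoke binary format would need to reproduce the same two operations (insert-if-absent and shared-phrase ranking), which gives the experiment a common interface to benchmark against.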
I am looking for…