
PhraseScope


Led by Alexander Powell

Current content discovery mechanisms are based on indexing services that users have a stake in but little influence over. PhraseScope will provide an intuitive discovery tool for identifying content items related to whatever research content the user happens to be currently viewing, whilst placing indexing in the hands of the user community. 

Aim

Current content discovery mechanisms are based on indexing services that users have a stake in but little influence over, and which fail to provide a seamless or transparent way of identifying content related to whatever the user happens to be currently viewing. The proposed solution is a mechanism for (1) analysing the text of the article currently being viewed in the browser, (2) extracting its key terms, and (3) interrogating a web-based phrase index service in order to suggest to the user a list of related content items. Identifying metadata (URL and/or DOI at minimum) and the extracted phrases for the current article will be added to the web index if they are not already stored. In this way, users will enlarge the index as they use the service. It will be necessary to seed the index with a substantial set of article data, to ensure that early users are able to derive value from the service and are motivated to engage with it. The phrase index could also support other novel content discovery tools and approaches.
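The three steps and the "add if not already stored" behaviour can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the project's existing code: the phrase extractor simply counts runs of two or three non-stopwords, the index is an in-memory dictionary standing in for the web-based service, and the names `extract_phrases`, `PhraseIndex` and `suggest` are all hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real tool would need a fuller one).
STOPWORDS = {"a", "an", "and", "are", "as", "at", "by", "for", "from", "in",
             "is", "it", "of", "on", "or", "that", "the", "to", "with"}

def extract_phrases(text, max_len=3, top_n=5):
    """Steps 1 and 2: return the most frequent 2..max_len word phrases in the text."""
    tokens = re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())
    runs, current = [], []
    for tok in tokens:
        # Punctuation and stopwords both end the current candidate run.
        if not tok.isalnum() or tok in STOPWORDS:
            if current:
                runs.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        runs.append(current)
    counts = Counter()
    for run in runs:
        for n in range(2, max_len + 1):
            for i in range(len(run) - n + 1):
                counts[" ".join(run[i:i + n])] += 1
    return [phrase for phrase, _ in counts.most_common(top_n)]

class PhraseIndex:
    """In-memory stand-in for the phrase index service."""
    def __init__(self):
        self.by_phrase = {}   # phrase -> set of article identifiers (URL or DOI)
        self.articles = {}    # article identifier -> set of phrases

    def add(self, article_id, phrases):
        """Store the article unless it is already indexed; return True if added."""
        if article_id in self.articles:
            return False
        self.articles[article_id] = set(phrases)
        for phrase in phrases:
            self.by_phrase.setdefault(phrase, set()).add(article_id)
        return True

    def related(self, article_id, phrases, limit=10):
        """Rank other articles by the number of phrases they share with this one."""
        scores = Counter()
        for phrase in phrases:
            for other in self.by_phrase.get(phrase, ()):
                if other != article_id:
                    scores[other] += 1
        return [other for other, _ in scores.most_common(limit)]

def suggest(index, article_id, text):
    """Step 3: look up related items, then contribute this article to the index."""
    phrases = extract_phrases(text)
    related = index.related(article_id, phrases)
    index.add(article_id, phrases)
    return related
```

Calling `suggest` for each article a user views both returns suggestions and grows the index, which is why an empty index gives early users nothing and a seeded one is needed.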

Work at the Sprint

I envisage:

  1. Fleshing out the basic client/server architecture
  2. Developing JavaScript to implement page indexing/phrase identification (which will involve porting existing Python code)
  3. Setting up web server-based index structures (bespoke binary files vs. RDBMS; experimentally informed decision needed)
  4. Developing a roadmap leading to cloud hosting of the indexing service 
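As a starting point for the comparison in item 3, the relational option might look roughly like the sketch below, which uses Python's built-in `sqlite3` module. It does not prejudge the choice between bespoke files and a database. The table names, column names and the functions `open_index`, `add_article` and `related` are all hypothetical, and a hosted service would presumably use a server database rather than SQLite.

```python
import sqlite3

def open_index(path=":memory:"):
    """Open (or create) the index; the schema here is an assumed, minimal one."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS article (
            id INTEGER PRIMARY KEY,
            identifier TEXT UNIQUE NOT NULL      -- URL or DOI
        );
        CREATE TABLE IF NOT EXISTS phrase (
            article_id INTEGER NOT NULL REFERENCES article(id),
            text TEXT NOT NULL,
            PRIMARY KEY (article_id, text)
        );
        CREATE INDEX IF NOT EXISTS phrase_text ON phrase(text);
    """)
    return conn

def add_article(conn, identifier, phrases):
    """Store an article and its phrases unless the identifier is already indexed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO article(identifier) VALUES (?)", (identifier,))
        if cur.rowcount == 0:
            return False  # already stored, nothing to do
        conn.executemany(
            "INSERT OR IGNORE INTO phrase(article_id, text) VALUES (?, ?)",
            [(cur.lastrowid, p) for p in phrases])
    return True

def related(conn, identifier, phrases, limit=10):
    """Return identifiers of other articles, ordered by number of shared phrases."""
    phrases = list(phrases)
    if not phrases:
        return []
    marks = ",".join("?" * len(phrases))
    rows = conn.execute(
        "SELECT a.identifier, COUNT(*) AS shared "
        "FROM phrase p JOIN article a ON a.id = p.article_id "
        f"WHERE p.text IN ({marks}) AND a.identifier != ? "
        "GROUP BY a.identifier ORDER BY shared DESC, a.identifier LIMIT ?",
        (*phrases, identifier, limit)).fetchall()
    return [row[0] for row in rows]
```

Timing `add_article` and `related` against a seeded data set would be one simple way to gather the experimental evidence the decision calls for.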

I am looking for…

Contributors skilled in JavaScript development, the practical use of cloud infrastructure, and UX.