Why Conventional Search is Doomed When It Comes to Science

There are some challenges to overcome when you aim for maximal accuracy:

1. Many different designations of the same entity

Old names, aliases, and IDs from different nomenclatures all refer to the same entity, and each of these textual designations has multiple spelling variants. A search for any single formulation therefore cannot reach all the publications and other sources that use one of the others.

1.1. Multiple spelling variations of these designations

nuclear factor [kappa] B [subunit] [1]

Common spelling variations of this gene name include [kappa] written with a single p, as the Greek letter κ, or as the Latin letter K. Sometimes [subunit] is omitted, and [1] is replaced with a Latin letter. Example: “nuclear factor kB 1”.
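To make the point concrete, here is a minimal sketch of what it takes to catch just the variants listed above with a hand-written pattern. The pattern and the function name mentions_nfkb1 are illustrative assumptions, not part of BioSeek, and the pattern covers only this one designation of one gene.

```python
import re

# Hypothetical pattern for illustration only. It covers the variants of
# "nuclear factor [kappa] B [subunit] [1]" described in the text.
NFKB1_PATTERN = re.compile(
    r"nuclear\s+factor[\s-]+"   # fixed prefix
    r"(?:kapp?a|κ|k)[\s-]*"     # kappa, kapa (single p), Greek κ, or Latin k/K
    r"b"                        # the letter B
    r"(?:\s+subunit)?"          # "subunit" is optional
    r"[\s-]*1\b",               # trailing 1, not followed by another digit
    re.IGNORECASE,
)


def mentions_nfkb1(text: str) -> bool:
    """Return True if the text contains one of the covered spelling variants."""
    return NFKB1_PATTERN.search(text) is not None


for sample in [
    "nuclear factor kappa B subunit 1",
    "nuclear factor kB 1",
    "nuclear factor kapa B 1",
    "nuclear factor kappa B subunit 2",
]:
    print(sample, "->", mentions_nfkb1(sample))
```

Even this small pattern is hard to read, and it would have to be repeated for every designation of every entity, which is why maintaining such rules by hand does not scale.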

2. Same designation, different meanings (homonyms)

NAFLD - Non-alcoholic fatty liver disease

Tsc2 - tuberin, for which NAFLD is an old name
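The homonym problem can be sketched as a lookup in which one designation maps to more than one entity. The data structure and the names ENTITIES, build_index, and candidates are assumptions made for this illustration; the two entries come from the example above.

```python
from collections import defaultdict

# Toy entity list built from the example above; labels and types are illustrative.
ENTITIES = [
    {
        "label": "Non-alcoholic fatty liver disease",
        "type": "disease",
        "designations": ["NAFLD"],
    },
    {
        "label": "Tsc2 (tuberin)",
        "type": "gene",
        "designations": ["Tsc2", "tuberin", "NAFLD"],
    },
]


def build_index(entities):
    """Map each lowercased designation to every entity that uses it."""
    index = defaultdict(list)
    for entity in entities:
        for name in entity["designations"]:
            index[name.lower()].append(entity)
    return index


def candidates(index, query):
    """Return the labels of all entities that share the queried designation."""
    return [entity["label"] for entity in index.get(query.lower(), [])]


INDEX = build_index(ENTITIES)
print(candidates(INDEX, "NAFLD"))    # two candidates: a disease and a gene
print(candidates(INDEX, "tuberin"))  # one candidate
```

A plain keyword search has no way of telling these two candidates apart; something has to ask the user, or use the context, to decide which entity is meant.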

How does BioSeek solve these challenges?

1. Continuous indexation - BioSeek continuously identifies and indexes the bio-terms within the aggregated scientific publications, patents, clinical trials, etc., and incorporates them into the graph database, making them easy to track down efficiently.

2. Dynamic suggestions - when you start typing in the search field, a reactive drop-down list of tag suggestions for your query appears, allowing you to select the exact entity you are searching for and obtain the most accurate search results.
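As a rough illustration of the second point, prefix-based suggestions over a sorted tag list might look like the sketch below. This is not BioSeek's implementation; the tag list, the function name suggest, and the limit parameter are hypothetical.

```python
import bisect

# Hypothetical tag list; a real system would draw these from its own index.
TAGS = sorted(
    [
        "NFKB1 (gene)",
        "NFKB2 (gene)",
        "NFKBIA (gene)",
        "NAFLD (disease)",
        "NAFLD (old gene name)",
    ],
    key=str.lower,
)
KEYS = [tag.lower() for tag in TAGS]  # lowercased keys, same order as TAGS


def suggest(prefix, limit=5):
    """Return up to `limit` tags that start with the typed prefix."""
    typed = prefix.lower()
    start = bisect.bisect_left(KEYS, typed)  # first key >= the typed prefix
    results = []
    for key, tag in zip(KEYS[start:], TAGS[start:]):
        if not key.startswith(typed) or len(results) == limit:
            break
        results.append(tag)
    return results


print(suggest("nfkb"))
print(suggest("naf"))
```

Because each suggestion names the entity type, typing "naf" lets the user choose between the disease and the old gene name instead of searching for an ambiguous string.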

For example, the gene NFKB1 has 233 different designations on NCBI. That is plenty already, and once we add the different spelling options for each of these designations (nfkb 1, nf kappa B 1, NFkappabeta1, etc.), things get really messy. Search efficiency drops significantly when information retrieval is based on exact matches only.

Accurate and exhaustive search results play a key role in increasing productivity and efficiency in any process, and they have a particularly high impact in scientific research. Obtaining such results requires a much deeper understanding of the terms being searched than standard keyword-based search can offer. Keywords in a keyword search carry no connected meaning: they retrieve formal matches only, without considering any context or relations pertaining to the term being searched. Standard keyword-based search in relational databases simply doesn’t deliver all relevant results, because the non-formal matches are left out.

Bringing it all together:
BioSeek significantly improves the way researchers search through the fast-growing corpus of published research (and we are going to add unpublished research to our database soon). We establish connections between the different designations of a given entity, all of them pointing to the same concept, so that regardless of the particular designation you put in your search query, you also get all results containing the other designations of this entity. We created the Prometheus engine to perform text analysis and extract all bio-terms from the text (terms and notions specific to the life sciences), match them against documents, and deliver those that contain any of their designations.
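The idea of designations pointing to one concept can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how the Prometheus engine works: the concept ID "GENE:NFKB1", the short designation list, and the crude normalization (which can produce false matches on real text) are all hypothetical.

```python
import re


def normalize(text):
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


# Hypothetical concept table: one concept ID, several designations.
CONCEPTS = {
    "GENE:NFKB1": ["NFKB1", "nfkb 1", "nf kappa B 1", "NFkappabeta1"],
}

# Reverse lookup from any normalized designation to its concept ID.
LOOKUP = {
    normalize(name): concept_id
    for concept_id, names in CONCEPTS.items()
    for name in names
}


def search(query, documents):
    """Return documents that mention any designation of the queried concept."""
    concept_id = LOOKUP.get(normalize(query))
    if concept_id is None:
        return []
    keys = {normalize(name) for name in CONCEPTS[concept_id]}
    return [doc for doc in documents if any(k in normalize(doc) for k in keys)]


DOCS = [
    "NF-kappa B 1 activation in macrophages",
    "Expression of NFKB1 was reduced",
    "Tuberin regulates mTOR signalling",
]
print(search("nfkb 1", DOCS))
```

Whichever designation the user types, the query is resolved to the concept first and only then matched against documents, so both NFKB1 documents are returned even though neither contains the literal string "nfkb 1".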