Cronycle is an information workflow application powered by Right Relevance (a subsidiary of Cronycle), a topical information search and relevance platform.

Topics and Influencers (per topic) form the backbone of the search and relevance technology.

  1. Topics (over 50 thousand), including metadata such as related topics and semantics such as synonyms and acronyms.
  2. Topical influencers (over 2.5M), each with a score and rank.

Topics are identified by algorithmically mining over 10M unstructured documents on the web and leveraging Wikipedia and Right Relevance topical graph neighborhood techniques. Relationships and semantics are derived from this process with manual corrections and injections for the last mile.

Topical influencer mining is fully algorithmic and primarily graph-based. The methodology leverages ML, semantic analysis and NLP on unstructured data at scale and involves a two-level proprietary people rank (a custom PageRank for social graphs):

Stage 1. A global PR reduces a ~300M-node graph to ~6M (for now) globally ranked influencers. This is a first-level reduction with no topical context, and we don’t expose its scores.

Stage 2. The ~6M connected nodes from stage 1 are partitioned across our ~50K structured topic space using the unstructured data assigned to each node. This yields ~50K per-topic sub-graphs, and a secondary PR is applied within each to determine every node’s topic score. This secondary PR score is normalized to calculate the Right Relevance topic score and to rank influencers for every structured topic in our platform.
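The two stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the proprietary people rank: it uses textbook PageRank by power iteration, the function names (`pagerank`, `two_stage_topic_scores`), the `keep_top` cutoff, the `node_topics` mapping and the 0 to 100 normalization are all hypothetical, and a real system would assign topics from unstructured data rather than a hand-written dictionary.

```python
from collections import defaultdict


def pagerank(edges, nodes, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank restricted to `nodes`.

    `edges` is an iterable of (follower, followed) pairs; edges that
    leave the node set are ignored.
    """
    nodes = sorted(set(nodes))
    if not nodes:
        return {}
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        if src in out_links and dst in out_links:
            out_links[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for src, targets in out_links.items():
            if not targets:
                continue
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


def two_stage_topic_scores(edges, node_topics, keep_top):
    """Stage 1: global rank and cut. Stage 2: per-topic rank, normalized."""
    edges = list(edges)
    all_nodes = {v for edge in edges for v in edge}

    # Stage 1: global PageRank with no topical context; keep the top nodes.
    global_rank = pagerank(edges, all_nodes)
    ordered = sorted(global_rank, key=global_rank.get, reverse=True)
    survivors = set(ordered[:keep_top])

    # Stage 2: partition survivors by topic (a node may sit in several).
    members = defaultdict(set)
    for node in survivors:
        for topic in node_topics.get(node, ()):
            members[topic].add(node)

    # Secondary PageRank inside each topical sub-graph, scaled so the top
    # node of each topic scores 100 (the scale is an assumption).
    scores = {}
    for topic, group in members.items():
        local = pagerank(edges, group)
        top = max(local.values())
        scores[topic] = {v: round(100.0 * r / top, 1) for v, r in local.items()}
    return scores
```

Calling `two_stage_topic_scores(edges, node_topics, keep_top=3)` on a small follower graph returns a dictionary keyed by topic, each holding per-influencer scores. Because the same node can appear under several topics, it can carry a different score and rank in each community.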

Our custom PR algorithm is derived from Google PageRank but is specialized for social graphs (rather than webpages and links), with many important differences applicable to social networks.

The Right Relevance score of an expert/influencer for a topic represents that influencer’s authority within the topical community, e.g. ‘machine learning’. This per-topic measure of influence is termed ‘topical influence’, and the topical communities formed are termed “Tribes”.

Once we have the scored and ranked influencer community for a particular topic (e.g. machine learning, behavioral science, big data, emergency medicine, oil and gas, angularjs, social media marketing), we mine the web for content. The numeric influence from topics and influencers is inductively applied to this content to measure relevance, and it forms a critical part of the search. We download ~600K articles daily, drawn from ~2M websites every month. Topical content and information are available in the form of articles, videos and conversations.
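One simple way to picture influence being carried over to content is to let each article inherit the topic scores of the influencers connected to it. The sketch below is an assumption for illustration only: the function name `rank_articles`, the use of "who shared the article" as the connecting signal, and the plain sum are all hypothetical, and the actual relevance computation is not described here.

```python
def rank_articles(shares, topic_scores):
    """Rank articles for one topic by summed influencer topic scores.

    shares:       article id -> set of influencer ids who shared it
                  (hypothetical signal)
    topic_scores: influencer id -> topic score for the topic in question
    """
    relevance = {
        article: sum(topic_scores.get(person, 0.0) for person in sharers)
        for article, sharers in shares.items()
    }
    # Highest relevance first; ties keep their input order.
    return sorted(relevance.items(), key=lambda item: item[1], reverse=True)
```

In this sketch, an article shared only by people outside the topical community ends up with a relevance of zero, regardless of how many of them there are.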

Points to note:

  • We dampen noisy signals such as follower count and tweet count, and focus much more on the topical network itself.
  • Each influencer can belong to multiple topical sub-graphs (communities) and have a different score and rank within each. This is exposed in our apps via scored tags.
  • Other, non-structured topics work via free-form search, but the relevance may not be of the same quality. This shows up as a score of ‘10’, which (admittedly not the clearest signal) means we didn’t find a community for the topic.
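To illustrate what dampening a noisy signal such as follower count can look like, here is a minimal sketch using a logarithm. The choice of `log1p` and the function name `dampen` are assumptions made for the example; the first point above does not say which dampening function is actually used.

```python
import math


def dampen(count):
    """Compress a raw count with log(1 + count) so large values grow slowly."""
    return math.log1p(max(count, 0))


# Raw follower counts that differ by a factor of 1000 ...
small, large = 1_000, 1_000_000
# ... differ by only about a factor of two once dampened.
ratio = dampen(large) / dampen(small)
```

With a transform like this, an account with a million followers no longer overwhelms one with a thousand, which leaves room for the structure of the topical network to drive the score.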

Both the topic and influencer graphs are mined and built algorithmically at scale, with quality improving after every iteration.