How is the Similarity Index calculated for each of the reports?
The Similarity Index does not change when viewing different report modes. To calculate the Similarity Index, our system first makes a digital fingerprint of the submitted document’s text and searches it against each of the repositories selected for the search. Once the document has been scanned against the selected content repositories, our system divides the number of matching words it found in the document by the document's total word count to produce the Similarity Index percentage for the report.
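As a rough illustration of the arithmetic described above, the following minimal sketch computes a percentage from two word counts. The function name and the handling of an empty document are assumptions made for this example; any rounding the actual system applies is not shown.

```python
def similarity_index(matching_words: int, total_words: int) -> float:
    """Return matching words as a percentage of the document's total word count."""
    # Assumption for this sketch: an empty document yields 0 rather than an error.
    if total_words <= 0:
        return 0.0
    return 100.0 * matching_words / total_words


# Example: 250 matching words in a 1,000-word document.
print(similarity_index(250, 1000))  # 25.0
```

In this example, a quarter of the document's words match a source, so the sketch reports 25 percent.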
If exclusion options are applied to the document, our system recalculates the Similarity Index percentage, removing all matches that the exclusion options excluded from the report.
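The recalculation can be pictured with a small sketch. Everything here is hypothetical: the Match record, the "kind" labels, and the choice to keep the document's total word count as the denominator are assumptions for illustration, not a description of the actual exclusion logic. The sketch also assumes the matches do not overlap.

```python
from dataclasses import dataclass


@dataclass
class Match:
    source: str
    word_count: int
    kind: str  # hypothetical label, e.g. "body", "quote", "bibliography"


def recalculated_index(matches, total_words, excluded_kinds):
    """Drop excluded matches, then recompute the percentage."""
    # Assumption: matches do not overlap, so their word counts can be summed.
    kept = [m for m in matches if m.kind not in excluded_kinds]
    matching_words = sum(m.word_count for m in kept)
    # Assumption: the denominator stays the document's total word count.
    if total_words <= 0:
        return 0.0
    return 100.0 * matching_words / total_words


matches = [
    Match("source-a", 200, "body"),
    Match("source-b", 50, "quote"),
    Match("source-c", 50, "bibliography"),
]
before = recalculated_index(matches, 1000, set())
after = recalculated_index(matches, 1000, {"quote", "bibliography"})
print(before, after)
```

With no exclusions, all 300 matching words count toward the percentage; once the two hypothetical exclusion kinds are applied, only the 200 words matched in the body remain.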
How is the Similarity Report constructed? What matches get hidden by other matches?
As much of the information surrounding the Similarity Report is proprietary, the following is only a brief description of how the report is constructed. A document submission is converted into what we refer to as a digital fingerprint, which our system then uses to search our content databases using our proprietary algorithm. Our system then paints the document with highlights for each section of text that matches a source within our repositories. The sources our system deems to be the best matches for those sections of text are listed in the report sidebar. Although only the best matches are listed there, hundreds or even thousands of other sources may match the document’s text. These underlying sources are listed in the Content Tracking mode.
What if two sources have the exact same amount of matching text; which source would be displayed in the Similarity Report as a best match?
This depends entirely on which repository the document matched. For example, if two internet sources were found to match the identical section of text, the most recently crawled internet source would be displayed as the best match. If an internet source and a publication source were found to match an identical section of text, the publication source would be displayed as the best match.
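The two tie-breaking examples above can be expressed as a sort key. This is a sketch under stated assumptions: the Source record, the REPO_RANK table, and the best_match function are hypothetical names, and only the two cases described above (internet versus internet, internet versus publication) are modeled. How ties involving other repositories are resolved is not covered.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Source:
    name: str
    repository: str  # "internet" or "publication" in this sketch
    matching_words: int
    crawled: Optional[date] = None  # crawl date, used for internet sources


# Hypothetical ranking: a publication source beats an internet source on a tie.
REPO_RANK = {"internet": 0, "publication": 1}


def best_match(candidates):
    """Pick the source to display for one matched section of text."""
    def key(s):
        # Compare by amount of matching text first, then repository,
        # then most recent crawl date (missing dates sort earliest).
        return (s.matching_words, REPO_RANK[s.repository], s.crawled or date.min)
    return max(candidates, key=key)


old_site = Source("old-site", "internet", 40, date(2020, 1, 1))
new_site = Source("new-site", "internet", 40, date(2023, 6, 1))
journal = Source("journal", "publication", 40)

print(best_match([old_site, new_site]).name)
print(best_match([old_site, new_site, journal]).name)
```

Between the two internet sources, the sketch selects the more recently crawled one; once the publication source is added with the same amount of matching text, it is selected instead.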