This paper proposes a new similarity metric, the Vector Similarity Metric (VSM), which is as simple as the popular Cosine Similarity Metric (CSM). The CSM has a major deficiency: as long as the angle between two vectors is the same, it yields the same value no matter how much the vectors differ in size. This deficiency persists even when Natural Language Processing is used to associate semantic meanings with words and phrases, and when term frequencies are weighted by Inverse Document Frequency. It becomes a serious concern when comparing the risk profile of one company with that of another, or when investigating how a company's risk profile changes from one year to the next. The VSM, by contrast, is based on the difference between the two vectors. The paper demonstrates the superiority of VSM over CSM both analytically and through real-world examples.
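As a minimal sketch of the CSM deficiency described above, the code below compares two hypothetical term-frequency vectors that point in the same direction but differ tenfold in size. Cosine similarity reports a perfect match, while a simple difference-based measure (the Euclidean length of `a - b`) does not. The function `difference_norm` and the example vectors are illustrative assumptions only; the exact VSM formula is not given here, so this should not be read as the paper's metric.

```python
import math


def cosine_similarity(a, b):
    """Standard cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def difference_norm(a, b):
    """Euclidean length of the difference vector a - b.

    Hypothetical, illustrative measure only; NOT the paper's VSM formula.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical term-frequency vectors for two risk profiles
small = [1.0, 2.0, 3.0]
large = [10.0, 20.0, 30.0]  # same direction, ten times the size

print(round(cosine_similarity(small, large), 6))  # 1.0: the angle is zero
print(round(difference_norm(small, large), 6))    # about 33.67: reflects the size gap
```

Because cosine similarity divides out both vector lengths, any positive rescaling of either vector leaves it unchanged; a measure built on the difference vector keeps that size information.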
