Word Sense Disambiguation & its approaches in NLP

Fahad Ashiq ⚡
8 min read · Sep 27, 2020



Word Sense Disambiguation (WSD) is one of the oldest problems in the computer treatment of language, dating back to the early 1950s, when computers were first seen as the best and fastest means of communication. It is a crucial but very challenging technique in natural language processing. A number of algorithms and approaches are available, but very little work has been done on classifying WSD algorithms according to the technique they use. In this paper we review two basic WSD algorithms: Lesk, which we place in the unsupervised category, and Naive Bayes, which falls in the supervised category. We also briefly discuss related research and ask which of the two is the better and more efficient choice for implementing WSD applications. WSD is mainly used in Information Retrieval, Information Extraction, Machine Translation and Question Answering. [1]

Background:

Natural Language Processing (NLP) is a promising domain of computer science, associated with Artificial Intelligence, that deals with the interaction between computer systems and human languages. Without NLP, a machine's interpretation of language is incomplete. NLP helps machines analyze words that have more than one meaning. Natural language is inherently ambiguous: most commonly used words have several meanings. To identify the intended meaning of a word, one has to analyze the context in which it appears, exploiting information from raw and semantic texts. The task of automatically assigning predefined meanings to words in context, known as Word Sense Disambiguation, is a very basic task in computational lexical semantics (Navigli, 2009). [2]

Introduction to Word Sense Disambiguation:

One of the very first problems that any natural language processing system faces is lexical ambiguity, whether syntactic or semantic. The problem of resolving semantic ambiguity is generally known as word sense disambiguation, and it has proved more difficult than syntactic disambiguation. Many words possess more than one sense, and WSD is concerned with a word that carries more than one possible meaning in an ordinary sentence. This is why WSD matters in NLP applications: sometimes a program picks up the intended sense of a word, but often it is difficult for it to detect the actual meaning in the given context. With WSD, each word can be assigned the meaning that fits its use. How people disambiguate words is itself an interesting problem, studied in psycholinguistics, that can provide insight here. WSD is evaluated on different tasks, one of which is the Lexical Sample Task: it is applied to a small set of pre-selected words, and the sense of each word may differ from sentence to sentence and from context to context. Many algorithms are available in the research literature. In this review we discuss two main ones, which together illustrate the main aspects of WSD. There are three approaches to implementing WSD:

  • Supervised Approach.
  • Unsupervised Approach.
  • Knowledge Based Approach.

Comparison Between the Lesk and Naive Bayes Algorithms


The Lesk algorithm is an unsupervised approach to WSD, introduced in 1986. It proposed a method that measures the degree of overlap between the glosses (dictionary definitions) of the target word and those of the context words. The Lesk algorithm has been widely cited and extended in the WSD community and is used in many WSD applications. It is very sensitive to the exact wording of definitions, so the absence of a specific word can radically change the results. Even so, it plays a vital role in distinguishing words that have more than one meaning in real-life scenarios. In the extraordinarily rich literature on WSD, we focus our review on the work closest to the topic of Lesk and Naive Bayes models (NBMs). In particular, we opt for the “simplified Lesk” (Kilgarriff and Rosenzweig, 2000). This particular variant prevents proliferation of gloss comparison on larger contexts (Mihalcea et al., 2004) and is shown to outperform the original Lesk algorithm (Vasilescu et al., 2004).

NBMs have been employed exclusively as classifiers in WSD, in contrast to their use as a similarity measure in [3]. Gale et al. (1992) used a Naive Bayes classifier resembling an information retrieval system: a WSD instance is regarded as a document, and candidate senses are scored in terms of “relevance” to it. When evaluated on a WSD benchmark (Vasilescu et al., 2004), the algorithm compared favorably to Lesk variants (as expected for a supervised method). Pedersen (2000) proposed an ensemble model with multiple NB classifiers differing by context window size. Hristea (2009) trained an unsupervised NB classifier using the EM algorithm and empirically demonstrated the benefits of WordNet-assisted (Fellbaum, 1998) feature selection over local syntactic features. [4]
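To make the gloss-overlap idea concrete, here is a minimal sketch of a simplified-Lesk-style scorer, which compares each candidate gloss directly with the words of the surrounding sentence. It is an illustration under stated assumptions, not a reference implementation: the two glosses for "bank", the stopword list, and the names `TOY_GLOSSES`, `tokenize` and `simplified_lesk` are all invented for this example and are not taken from WordNet or from any of the cited papers.

```python
# Toy sense inventory. The glosses are illustrative assumptions, not WordNet entries.
TOY_GLOSSES = {
    "bank": {
        "bank_finance": "an institution that accepts deposits of money and lends money",
        "bank_river": "sloping land beside a river or other body of water",
    }
}

# A small hand-picked stopword list (also an assumption of this sketch).
STOPWORDS = {
    "a", "an", "the", "of", "or", "and", "that", "to", "in", "on",
    "at", "by", "he", "she", "it", "his", "her", "was", "is",
}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def simplified_lesk(word, sentence, glosses=TOY_GLOSSES):
    """Return (sense, overlap) for the sense whose gloss shares most words with the sentence."""
    context = tokenize(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in glosses[word].items():
        overlap = len(context & tokenize(gloss))
        if overlap > best_overlap:  # on a tie, the first listed sense wins
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap


print(simplified_lesk("bank", "He sat on the bank of the river and watched the water"))
# ('bank_river', 2)
print(simplified_lesk("bank", "She went to the bank to deposit money"))
# ('bank_finance', 1)
```

The second call also shows the sensitivity to exact wording mentioned above: "deposit" in the sentence does not match "deposits" in the gloss, so only "money" contributes to the overlap.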

Naive Bayes:

The Naive Bayes algorithm is effective as a classifier for word sense disambiguation, and it is mostly applied to WordNet examples. In the Naive Bayes model, the words that occur in the same context as the target word are treated as evidence for its sense, and the sense they best support is chosen over the competing meanings. A famous example is to determine the sense of pen in the following passage (Bar-Hillel 1960): "Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy." In this example the word pen has five meanings, so it stays ambiguous until the context settles which one is intended.
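As a rough illustration of how a supervised Naive Bayes classifier picks a sense, the sketch below scores each sense by its prior probability times the smoothed probability of each context word. Everything in it is an assumption made for the example: the four labeled sentences for "pen" are invented, only two of the word's senses are covered, and the names `TRAINING_DATA`, `train` and `classify` are hypothetical. It uses add-one (Laplace) smoothing and ignores words never seen in training.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sense-labeled sentences for "pen", invented for illustration.
TRAINING_DATA = [
    ("writing", "she signed the letter with a pen and ink"),
    ("writing", "the pen ran out of ink on the paper"),
    ("enclosure", "the sheep were kept in a pen on the farm"),
    ("enclosure", "the baby played with toys in the pen"),
]


def train(examples):
    """Count how often each sense occurs and which words appear with it."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, sentence in examples:
        words = sentence.lower().split()
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab


def classify(sentence, model):
    """Return the sense with the highest log prior plus summed log likelihoods."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    scores = {}
    for sense, count in sense_counts.items():
        score = math.log(count / total)  # log P(sense)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in sentence.lower().split():
            if word in vocab:  # skip words never seen in training
                # log P(word | sense) with add-one smoothing
                score += math.log((word_counts[sense][word] + 1) / denom)
        scores[sense] = score
    return max(scores, key=scores.get)


model = train(TRAINING_DATA)
print(classify("the box was in the pen", model))     # enclosure
print(classify("he wrote with pen and ink", model))  # writing
```

With this toy data the sentence from the passage above is labeled "enclosure" mainly because "in" co-occurs with that sense in the training sentences. A real system would need far more labeled data and a full sense inventory.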

Applications of Word Sense Disambiguation:

Many applications need WSD in order to understand text that contains ambiguous words whose intended sense has to be determined correctly. Some of them are mentioned in this section.

  • Machine Translation (MT): MT uses WSD to resolve ambiguity in word meaning so that a sentence is translated accurately. For example, the English sentences "He scored a goal" and "It was his goal in life" cannot be translated correctly without identifying the correct sense of the word "goal", because it has a different sense in each sentence (Pal and 2015)
  • Information Extraction (IE): WSD is used in IE and text mining for the accurate analysis of text. In general, semantic analysis is very useful in IE because senses and synonyms play a main role in it
  • Content Analysis (CA): WSD is a very important phase in content analysis. It can help to categorize data according to user requirements and solve many related problems
  • Information Retrieval (IR): IR is one of the main real-world applications of Word Sense Disambiguation. It is used for retrieving a set of documents that are semantically linked to a particular user query. WSD helps to increase the accuracy of Information Retrieval

Limitations and Challenges of Word Sense Disambiguation:

Word Sense Disambiguation faces a number of challenges and limitations. Some of those that remain open for research are listed here:

  • Definition of the problem. Some authors claim that the meaning of a word is discrete.
  • Sense inventory and granularity. The task depends on the sense inventory that is applied.
  • Limits of learning algorithms. Results from the ML community show that even the most sophisticated methods do not fully solve the task.

Many other challenges come up in everyday text, because word sense disambiguation has to disambiguate words that occur in the same context. The existing limitations and challenges remain an important part of research on the problem.

Future Initiatives of Word Sense Disambiguation:

Several lines of existing research point to future initiatives in word sense disambiguation. Because the sense distribution in the training set is pivotal to boosting the performance of WSD systems, the work on Train-O-Matic also presents two unsupervised and language-independent methods that automatically induce a sense distribution when given a simple corpus of sentences. When the learned distributions are taken into account in generating the training sets, the performance of supervised methods is further enhanced. The reported experiments show that Train-O-Matic, on its own and coupled with word sense distribution learning methods, leads a supervised system to achieve state-of-the-art performance consistently across gold standard datasets and languages. Importantly, the sense distribution learning techniques help Train-O-Matic scale well over domains, without any extra human effort. Work has also already been done on Urdu word sense disambiguation, and many other tasks in this research area have already been accomplished.


Conclusion:

We have examined two of the main algorithms for Word Sense Disambiguation and discussed the applications and future initiatives related to WSD, which may be helpful in the near future. The research reviewed suggests that the best approaches are corpus based and knowledge based, because of their close relation to the supervised and unsupervised methods, and these are the approaches we recommend when implementing word sense applications in the real world. The Naive Bayes model of [3] is a general-purpose model for measuring association between two sets of random events. It replaces string matching in the Lesk algorithm with a probabilistic measure of gloss-context overlap. The base model on average more than doubled the accuracy of Lesk in Senseval-2 on both fine- and coarse-grained tracks. Future work could cover other tasks, including open-text WSD, so as to compare with more recent Lesk variants, and could improve probability estimation and inference. Other NLP problems involving compositionality in general might also benefit from the proposed many-to-many similarity measure.


[1] Word Sense Disambiguation and Its Approaches, by Vimal Dixit, Kamlesh Dutta and Pardeep Singh

[2] Unsupervised Word Sense Disambiguation Using Neighborhood Knowledge, by Huang Heyan, Yang Zhizhuo and Jian Ping

[3] Applying a Naive Bayes Similarity Measure to Word Sense Disambiguation

[4] Word Sense Disambiguation: Survey Study, by Ahmed H. Aliwy and Hawraa A. Tah


