Celebal Technologies interview question

stemming, lemmatization and tokenization

Interview answer

Anonymous

14 Sep 2022

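Tokenization - the process of splitting a text into its smallest meaningful units, called tokens. Words, numbers, and punctuation marks can each be a token.

Stemming - the process of reducing a word to its stem, usually by stripping suffixes with heuristic rules. The result is not always a real word: a stemmer may turn "languages" into "languag".

Lemmatization - the process of mapping a word to its dictionary form (its lemma), for example "languages" to "language" or "better" to "good". Unlike stemming, it relies on a vocabulary and morphological analysis rather than simple rule-based cutting, so it is usually slower than stemming.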
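The three steps described above can be sketched in plain Python. This is only a toy illustration, not how a production library does it: `tokenize`, `toy_stem`, `toy_lemmatize`, and the `TOY_LEMMAS` table are hypothetical names made up for this example, the suffix list is a crude stand-in for a real stemming algorithm such as Porter's, and the lookup table stands in for a real dictionary.

```python
import re

def tokenize(text):
    # Each run of letters/digits is one token; each punctuation mark is its own token.
    return re.findall(r"\w+|[^\w\s]", text)

def toy_stem(word):
    # Crude suffix stripping (assumption: NOT the Porter algorithm).
    # A suffix is removed only if at least 3 characters remain.
    word = word.lower()
    for suffix in ("ing", "ies", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Tiny hand-written lookup standing in for a real dictionary (hypothetical data).
TOY_LEMMAS = {
    "studies": "study",
    "languages": "language",
    "was": "be",
    "better": "good",
}

def toy_lemmatize(word):
    # Dictionary lookup; unknown words are returned lowercased and unchanged.
    return TOY_LEMMAS.get(word.lower(), word.lower())

tokens = tokenize("She studies 3 languages.")
stems = [toy_stem(t) for t in tokens]
lemmas = [toy_lemmatize(t) for t in tokens]

print(tokens)  # ['She', 'studies', '3', 'languages', '.']
print(stems)   # ['she', 'stud', '3', 'languag', '.']
print(lemmas)  # ['she', 'study', '3', 'language', '.']
```

Note how the stems "stud" and "languag" are not real words, while the lemmas "study" and "language" are dictionary forms. In practice one would likely reach for a library such as NLTK or spaCy instead of hand-written rules.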