Preprocessing Text Data for Machine Learning
5 min read · Jan 5, 2021
Machine learning models cannot work directly with text data. You need to encode your text in some numeric form first.
Any text document is essentially a sequence of words, which you can tokenize into individual words. After transforming your document into a list of words, you can represent each word in numeric form using some kind of numeric encoding.
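To make the tokenize-then-encode idea concrete, here is a minimal sketch using only the Python standard library. The function names (`tokenize`, `build_vocabulary`, `encode`) and the sample sentence are made up for illustration, and the integer-ID scheme shown is just one simple choice of numeric encoding, not the only one.

```python
import re

def tokenize(document):
    # Lowercase the text and extract runs of letters, digits, and apostrophes.
    # This is a deliberately simple tokenizer; real pipelines often do more.
    return re.findall(r"[a-z0-9']+", document.lower())

def build_vocabulary(tokens):
    # Assign each distinct word an integer ID in order of first appearance.
    vocabulary = {}
    for token in tokens:
        if token not in vocabulary:
            vocabulary[token] = len(vocabulary)
    return vocabulary

def encode(tokens, vocabulary):
    # Replace every word with its integer ID.
    return [vocabulary[token] for token in tokens]

document = "The cat sat on the mat."  # hypothetical sample text
tokens = tokenize(document)
vocabulary = build_vocabulary(tokens)
encoded = encode(tokens, vocabulary)

print(tokens)   # ['the', 'cat', 'sat', 'on', 'the', 'mat']
print(encoded)  # [0, 1, 2, 3, 0, 4]
```

Note that both occurrences of "the" map to the same ID, so the encoded list preserves word order while replacing each word with a number.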