Preprocessing Text Data for Machine Learning

AnisKHELOUFI
5 min read · Jan 5, 2021

Machine learning models cannot work directly with text data. You first need to encode your text in some numeric form.

https://pixabay.com/fr/users/gdj-1086657/

Any text document is essentially a sequence of words, which you can tokenize into individual words. After transforming your document into a list of words, you can represent each word numerically using some kind of numeric encoding.
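To make the two steps concrete, here is a minimal sketch in plain Python. It is only one possible approach: the helper names (`tokenize`, `build_vocabulary`, `encode`), the regex-based splitting rule, and the sample sentence are all assumptions made for illustration, and the encoding shown (one integer ID per distinct word) is just the simplest of several numeric encodings you could choose.

```python
import re

def tokenize(text):
    # Hypothetical tokenizer: lowercase the text, then keep runs of
    # letters, digits, or apostrophes as individual word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocabulary(tokens):
    # Assign each distinct word an integer ID in order of first appearance.
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab

def encode(tokens, vocab):
    # Replace every word with its integer ID from the vocabulary.
    return [vocab[token] for token in tokens]

# Sample document, invented for this example.
document = "The cat sat on the mat. The mat was warm."

tokens = tokenize(document)
vocab = build_vocabulary(tokens)
encoded = encode(tokens, vocab)

print(tokens)   # ['the', 'cat', 'sat', 'on', 'the', 'mat', 'the', 'mat', 'was', 'warm']
print(encoded)  # [0, 1, 2, 3, 0, 4, 0, 4, 5, 6]
```

Repeated words such as "the" and "mat" map to the same ID, which is what lets a model treat them as the same input feature. In practice you would likely rely on a library tokenizer rather than a hand-written regex, but the idea is the same.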

AnisKHELOUFI

Data engineer and machine learning enthusiast with a great interest in cloud technologies