Page 78 - E-Book Kecerdasan Buatan Dandung PTI 1A
8.2 The Natural Language Processing Process
The stages of text processing are shown in Figure 8.1.
Figure 8.1 Text Processing Stages
As Figure 8.1 shows, the first stage is preprocessing. Preprocessing cleans a text,
for example by removing symbols, punctuation, conjunctions, and so on. Typical
preprocessing steps are:
Tokenization: the text is split into tokens, such as words
Lemmatization: each word is reduced to its lemma (dictionary form)
Morphological analyzer: each word is broken down into its root word and its affixes
Stemming: affixes are stripped from each word to leave its stem
Lowercase: all words are converted to lowercase
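The preprocessing steps listed above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not the method used in this book: the stop-word list, the lemma table, the suffix list, and all function names are hypothetical and far smaller than what a real NLP library provides, and the sample words are English.

```python
import re

# Hypothetical, tiny stop-word list (conjunctions and other function words).
STOP_WORDS = {"and", "or", "but", "the", "a", "an", "of", "to", "in", "is", "are"}

# Hypothetical lemma lookup; a real lemmatizer relies on a full dictionary.
LEMMAS = {"mice": "mouse", "ran": "run", "better": "good"}

# Toy suffix list for the analyzer; longer suffixes are tried first.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Symbols and punctuation are dropped because only runs of
    letters and digits are kept.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatize(token):
    """Return the lemma from the lookup table, or the token unchanged."""
    return LEMMAS.get(token, token)


def analyze(token):
    """Naive morphological analysis: split a token into (root, suffix).

    A suffix is only split off if at least three characters remain.
    """
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)], suffix
    return token, ""


def stem(token):
    """Return only the root part found by the naive analyzer."""
    return analyze(token)[0]


def preprocess(text):
    """Tokenize, lowercase, remove stop words, lemmatize, then stem."""
    tokens = remove_stop_words(tokenize(text))
    return [stem(lemmatize(t)) for t in tokens]


print(preprocess("The mice ran, and the cats were jumping!"))
# → ['mouse', 'run', 'cat', 'were', 'jump']
```

The sketch applies lemmatization and stemming one after the other only to show both steps; in practice a pipeline usually picks one of the two, and an Indonesian text would need its own stop-word list and affix rules.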