Page 79 - E-Book Kecerdasan Buatan Dandung PTI 1A
Stopword elimination: words are filtered by a stop word list
POS tagging and stopword elimination: run a POS tagger and filter words based on
their POS tags
Spelling correction: misspelled words (including informal spellings) are corrected
Word normalization: acronym
Entity masking: words fulfilling certain p
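As a rough illustration of two of the steps above (stopword elimination and acronym normalization), here is a minimal Python sketch. The stop word list, the acronym table, and the function name `preprocess` are hypothetical and chosen only for this example; a real system would use a much larger, language-specific list.

```python
# Hypothetical stop word list and acronym table (illustrative assumptions only).
STOP_WORDS = {"to", "your", "the", "a"}
ACRONYMS = {"asap": "as soon as possible"}


def preprocess(tokens):
    """Lowercase tokens, expand known acronyms, then drop stop words."""
    result = []
    for token in tokens:
        token = token.lower()
        # Word normalization: replace an acronym with its expanded form.
        expanded = ACRONYMS.get(token, token).split()
        # Stopword elimination: keep only words not in the stop word list.
        result.extend(word for word in expanded if word not in STOP_WORDS)
    return result


tokens = ["Reply", "ASAP", "to", "claim", "your", "prize"]
print(preprocess(tokens))
# ['reply', 'as', 'soon', 'as', 'possible', 'claim', 'prize']
```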
The second stage is Feature Extraction
Convert token features into numbers → this is called vectorization
Example using unigram words as token features for spam filtering:
▪ Text: “complimentary Ibiza Holiday needs your URGENT collection”
▪ Input: list of tokens: complimentary, ibiza, holiday, needs, your, urgent, collection
▪ Output: Training/Testing data
Features as Bag of Words
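A minimal Python sketch of bag-of-words vectorization, using the token list from the spam filtering example. The second message (`other_tokens`) and the helper names `build_vocabulary` and `vectorize` are assumptions added for illustration; they do not come from the source material.

```python
from collections import Counter


def build_vocabulary(documents):
    """Collect every distinct token across all documents, in sorted order."""
    return sorted({token for doc in documents for token in doc})


def vectorize(tokens, vocabulary):
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Token list from the example text.
spam_tokens = ["complimentary", "ibiza", "holiday", "needs",
               "your", "urgent", "collection"]
# Hypothetical second message, added only so the vocabulary has more than one source.
other_tokens = ["your", "holiday", "photos"]

vocabulary = build_vocabulary([spam_tokens, other_tokens])
print(vocabulary)
# ['collection', 'complimentary', 'holiday', 'ibiza', 'needs', 'photos', 'urgent', 'your']
print(vectorize(spam_tokens, vocabulary))   # [1, 1, 1, 1, 1, 0, 1, 1]
print(vectorize(other_tokens, vocabulary))  # [0, 0, 1, 0, 0, 1, 0, 1]
```

Each message becomes a fixed-length vector of word counts, and these vectors can then serve as rows of the training or testing data.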