Page 79 - E-Book Kecerdasan Buatan Dandung PTI 1A

                         Stopword elimination: words are filtered out using a stop word list

                         POS tagging and stopword elimination: run a POS tagger and filter words based on
                          their POS tags

                         Spelling correction: misspelled words (including informal spellings) are corrected
                         Word normalization: acronym

                         Entity masking: words fulfilling certain p
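
As a rough illustration of the first preprocessing step above (stopword elimination), here is a minimal sketch. The stop word list `STOP_WORDS` and the helper `remove_stopwords` are hypothetical names made up for this example; a real system would typically use a much larger, language-specific list.

```python
# Hypothetical, deliberately tiny stop word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "is", "of", "to", "your"}

def remove_stopwords(tokens, stop_words=STOP_WORDS):
    """Return the tokens that do not appear in the stop word list.

    The comparison is case-insensitive; the original token casing is kept.
    """
    return [tok for tok in tokens if tok.lower() not in stop_words]

# Illustrative token list for a short message.
tokens = ["complimentary", "ibiza", "holiday", "needs", "your", "urgent", "collection"]
filtered = remove_stopwords(tokens)
print(filtered)
```

With this particular list only "your" is dropped; which words count as stop words depends entirely on the list chosen.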
                    The second stage is Feature Extraction

                         Convert token features into numbers; this step is called vectorization

                         Example using unigram words as token features for spam filtering:
                          ▪ Text: “complimentary Ibiza Holiday needs your URGENT collection”

                          ▪ Input (list of tokens): complimentary, ibiza, holiday, needs, your, urgent, collection
                          ▪ Output: numeric feature vectors used as training/testing data
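
The example above can be sketched in code as follows. This is only one simple way to vectorize unigram tokens (a count per vocabulary word), not necessarily the method used in the original material. The function names `tokenize`, `build_vocabulary`, and `vectorize`, and the second message, are assumptions introduced for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric unigram tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab

def vectorize(tokens, vocab):
    """Turn a token list into a count vector; tokens outside the vocabulary are ignored."""
    vector = [0] * len(vocab)
    for tok, count in Counter(tokens).items():
        if tok in vocab:
            vector[vocab[tok]] = count
    return vector

text = "complimentary Ibiza Holiday needs your URGENT collection"
tokens = tokenize(text)
vocab = build_vocabulary([tokens])
vector = vectorize(tokens, vocab)

# Hypothetical second message, vectorized against the same vocabulary.
other_vector = vectorize(tokenize("your holiday collection"), vocab)

print(tokens)
print(vector)
print(other_vector)
```

Each message becomes a fixed-length list of numbers, one position per vocabulary word, which is the form a classifier expects as training or testing data.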

                    Features as Bag of Words