Page 14 - Greenstone tutorial exercises
7. Difficult PDF documents
25. Build a fresh Greenstone collection from the two files in sample_files\difficult_documents.
Use the default collection configuration: that is, simply gather the files into a new
collection, and build it.
These files are called No extractable text.pdf and Weird characters.pdf—their names hint at the
problems they will cause!
26. Now preview the collection. The titles and filenames lists show only one of the documents.
When you click the “text” icon to look at the text extracted from that document, it is
unreadable garbage. During the building process this message appeared: “One document was
processed and included in the collection; one was rejected.”
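To make "garbage" concrete, here is a minimal sketch of how you might test extracted text for readability outside Greenstone. It is not part of Greenstone and does not reflect how Greenstone decides to reject a document. The function name looks_like_garbage, the 0.7 threshold, and the assumption that readable text is mostly ASCII are all hypothetical choices made for illustration; text in other scripts would need a different test.

```python
import string

# Hypothetical assumption: readable text consists mostly of ASCII letters,
# digits, punctuation and whitespace. This is NOT Greenstone's own logic.
READABLE = set(
    string.ascii_letters + string.digits + string.punctuation + string.whitespace
)


def looks_like_garbage(text: str, threshold: float = 0.7) -> bool:
    """Return True if text is empty or mostly unreadable characters."""
    stripped = text.strip()
    if not stripped:
        # Nothing was extracted at all.
        return True
    # Fraction of characters that fall in the readable set.
    ratio = sum(ch in READABLE for ch in stripped) / len(stripped)
    return ratio < threshold


if __name__ == "__main__":
    print(looks_like_garbage("Greenstone builds digital library collections."))
    print(looks_like_garbage("\x01\x02\ufffd\ufffd\x03\x04 ab"))
```

A check like this could be run on the text of each PDF before gathering it into a collection, so that problem files are spotted before the build.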
Modes in the Librarian Interface
The Librarian Interface can operate in different modes. So far, you have been using the default
mode, called “Librarian.”
27. Use the Preferences item on the File menu to switch to Expert mode, then build the
collection again. The Create panel looks different in Expert mode because it offers more
options: locate the Build Collection button, near the bottom of the window, and click it.
This time a message appears saying that the rejected file could not be processed, and why.
28. We recommend that you switch back to Librarian mode for subsequent exercises, to avoid
confusion.