Text in PDF files is not stored as UTF-8 or any other Unicode encoding. PDETextItemCopyText copies the text as it is stored, without recoding it. There is a huge gap between that internal text format and Unicode: the bytes are character codes whose meaning depends on the font and its encoding. To bridge that gap yourself you need to understand the text encoding issues described in the PDF specification. If you only need to extract text (without editing), use a different API, such as a UCS WordFinder.
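To make the gap concrete, here is a toy Python sketch (not Acrobat SDK code). It assumes a simple font with one-byte character codes, and the `TO_UNICODE` table and `decode_codes` helper are made up for illustration. In a real file the mapping would have to be derived from the font's encoding information as described in the PDF specification.

```python
# Toy illustration only: this table is hypothetical, not taken from a real PDF.
# In a real file the mapping comes from the font's encoding data.
TO_UNICODE = {0x01: "H", 0x02: "e", 0x03: "l", 0x04: "o"}


def decode_codes(raw: bytes, to_unicode: dict) -> str:
    """Map each one-byte character code to Unicode; unknown codes become U+FFFD."""
    return "".join(to_unicode.get(code, "\ufffd") for code in raw)


# Bytes as a raw copy might return them: character codes, not Unicode text.
raw = bytes([0x01, 0x02, 0x03, 0x03, 0x04])

naive = raw.decode("latin-1")            # treating the codes as text gives garbage
decoded = decode_codes(raw, TO_UNICODE)  # mapping through the font table gives text

print(repr(naive))
print(decoded)
```

The same bytes under a different font could map to entirely different characters, which is why a raw copy of the text cannot simply be reinterpreted as Unicode.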