Hi All,
I know it is not a good idea to (just) refry PDF files (PDF -> EPS -> PDF), especially when the PDF contains subset-embedded fonts. Chances are you will end up with a PDF file which does not contain valid (searchable) text.
I did not know the opposite could also be true. The following zip file contains two PDF files, each containing two words: the original and the refried version.
When I select text in the original PDF file (using Acrobat 6 through X), the extracted text is incorrect, in this case with invalid capitals. If I try the same in the refried version, the extracted text is correct.
It seems strange to me that a process which can only result in loss of information "fixes" this text issue. Somewhere the correct text must be hidden in the original PDF file. Not only capitals seem to be affected, but also random characters, which likewise seem to be fixed once refried.
Could anyone think of an explanation?
Is there a workaround that does not require refrying the PDF (refrying often results in loss of information)? I have no influence on the PDF files I receive, so I cannot embed the full fonts.
I am using the C++ SDK for Acrobat to write plugins.
Any pointers would be great!
Kind regards,
Robert