Problem with Font encoding (I think)
Posted: 23 November 2009 10:10 PM
Newbie
Total Posts:  12
Joined  2008-04-03

The attached PDF looks fine when opened in a PDF reader or editor.  If you copy the heading and paste it elsewhere, the result is what you expect.  If you copy the body text, you get gobbledygook.

The problem seems to be that the heading is encoded as “Roman”, while the body text is encoded as “built-in”.  I can’t understand why the text can be displayed correctly but can’t be copied to the clipboard correctly.  Is there something that can be done in PDFClerk to fix the file so that the text can be copied correctly?

File Attachments
Test.pdf  (File Size: 65KB - Downloads: 239)
 
 
Posted: 23 November 2009 10:24 PM   [ # 1 ]
Administrator
Total Posts:  407
Joined  2007-03-23

Welcome to the not-so-wonderful world of the PDF format. In my opinion, it’s a mess. It’s entirely possible to build a PDF that displays normally but carries no information about what the rendered symbols actually mean (and that’s just a small part of why it’s a mess). Basically, the software that renders the pages does not know what it is rendering. The issue here is with the encoding of the PDF, not with the rendering software, so you’ll get the same behaviour in Apple’s own Preview, in PDFPen, or in Adobe Reader. The only way around this would be to OCR the page. Not an attractive prospect. (You’re correct, by the way, that this is a font encoding issue. The supplied fonts do not contain character code information.)
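
To make the display-versus-copy split concrete, here is a toy model in Python. It is only a sketch, not a real PDF parser: the names `GLYPH_SHAPES`, `CONTENT_CODES`, `render` and `copy_text` are made up for illustration, and the Latin-1 fallback is just one plausible guess a viewer might make. In a real PDF, the role of the `to_unicode` table is played by an optional code-to-character map stored with the font (the /ToUnicode CMap); I am assuming, without having inspected it, that the body font in Test.pdf has nothing usable of that kind.

```python
# Toy model of why text can render correctly yet copy as garbage.
# All names are hypothetical; this does not read real PDF files.

# The embedded font: glyph code -> outline to draw. Each outline is
# represented here by the letter it would look like on screen.
GLYPH_SHAPES = {0x21: "H", 0x22: "e", 0x23: "l", 0x24: "o"}

# The page content stores only glyph codes, not characters.
CONTENT_CODES = bytes([0x21, 0x22, 0x23, 0x23, 0x24])


def render(codes, shapes):
    """What the viewer draws: one outline per glyph code."""
    return "".join(shapes[code] for code in codes)


def copy_text(codes, to_unicode=None):
    """What ends up on the clipboard.

    With a code -> character table the text is recoverable. Without
    one, the viewer has to guess; this sketch assumes it falls back
    to reading the raw codes as Latin-1.
    """
    if to_unicode is not None:
        return "".join(to_unicode[code] for code in codes)
    return codes.decode("latin-1")


# A font that does ship character information for its codes.
TO_UNICODE = {0x21: "H", 0x22: "e", 0x23: "l", 0x24: "o"}

if __name__ == "__main__":
    print(render(CONTENT_CODES, GLYPH_SHAPES))   # what you see on the page
    print(copy_text(CONTENT_CODES, TO_UNICODE))  # heading-like case
    print(copy_text(CONTENT_CODES))              # body-text-like case
```

Rendering never consults the character table, which is why the page looks fine either way; only copying needs it, and without it nothing in the file says which characters the codes stand for.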

[ Edited: 23 November 2009 10:26 PM by Tonio ]

António Nunes
SintraWorks
