It's a "problem" that often happens accidentally, but it is also used intentionally to prevent copying and indexing of PDF files, especially when they are posted online.

Fonts in PDF files are stored with two tables: one contains the glyphs (the character shapes) and one contains a "toUnicode" map, which says what character each glyph represents. Acrobat uses the first table to draw the page, so it doesn't actually know what the text "says", only which patterns of shapes to draw. When you copy or search the file, the second lookup table is used to work out what the text says. For example, in the word APPLE the first table says the second shape looks like "P", and even if the shapes aren't stored in alphabetical order, the toUnicode table says the second letter is 0x0050, a capital P.

If this toUnicode map is corrupted or missing, the PDF will render to screen (and print) just fine, but Acrobat has no idea what the shapes mean. The result when you screenread, export, search or copy/paste is a default set of mappings. It will be a 1:1 relationship (every "A" will become the same character), but the pairing is not predictable, so it cannot be repaired automatically. You can repair it using plugins, but you would have to manually work out what each pair should be and recreate the map table a letter at a time.

When this happens intentionally, it means the document author has removed or rewritten the toUnicode map using a plugin.
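To make the two-table idea concrete, here is a minimal toy model in Python. It is only a sketch: real PDF fonts store glyph outlines and a ToUnicode CMap stream rather than Python dictionaries, and every name below (`glyphs`, `to_unicode`, `scrambled`, `draw`, `extract_text`) is made up for illustration. It does not model the default mappings a viewer falls back on when the map is missing altogether.

```python
# Toy model only: the dict layout and all names are hypothetical,
# not how any real PDF library represents fonts.

# Table 1: glyph code -> shape (a label standing in for the drawn outline).
# Note the codes are not in alphabetical order.
glyphs = {1: "shape-of-P", 2: "shape-of-A", 3: "shape-of-L", 4: "shape-of-E"}

# Table 2: glyph code -> Unicode code point (the "toUnicode" map).
to_unicode = {1: 0x0050, 2: 0x0041, 3: 0x004C, 4: 0x0045}

# The page stores glyph codes, not characters. This sequence is "APPLE".
page = [2, 1, 1, 3, 4]


def draw(page, glyphs):
    """Rendering only needs table 1: look up the shape for each code."""
    return [glyphs[code] for code in page]


def extract_text(page, to_unicode):
    """Copy/search only needs table 2: look up the character for each code."""
    return "".join(chr(to_unicode[code]) for code in page)


# A rewritten map: still one-to-one, but the pairings are wrong.
scrambled = {1: 0x0051, 2: 0x0058, 3: 0x004A, 4: 0x005A}

print(draw(page, glyphs))             # the shapes drawn are the same either way
print(extract_text(page, to_unicode)) # APPLE
print(extract_text(page, scrambled))  # XQQJZ
```

The scrambled map still sends both "P" glyphs to the same character, which is the 1:1 relationship described above, but nothing in the file says which character that should have been.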