OiO.lk Community platform!

Oio.lk is an excellent forum for developers, providing a wide range of resources, discussions, and support for those in the developer community. Join oio.lk today to connect with like-minded professionals, share insights, and stay updated on the latest trends and technologies in the development field.
  You need to log in or register to access the solved answers to this problem.
  • You have reached the maximum number of guest views allowed
  • Please register below to remove this limitation

Where can I a mapping of Identity-H encoded characters to ASCII or Unicode characters?

  • Thread starter Thread starter Chas. Owens
  • Start date Start date
C

Chas. Owens

Guest
I have a PDF generated by a third party. I am trying to get the text out of it, but neither pdf2text nor copying and pasting results in readable text. After a little digging in the output (of either of two) I found that each character on the screen is made up of three bytes. For example, "A" is the bytes ef, 81, and 81. Looking at the metadata on the PDF it claims to be encoded in Identity-H, so I assume what I am seeing is a set of characters encoded in Identity-H. I have a partial mapping based on the documents I already have, but I want to make a more complete mapping. To do that I need something like an ASCII table for Identity-H.
<p>I have a PDF generated by a third party. I am trying to get the text out of it, but neither <code>pdf2text</code> nor copying and pasting results in readable text. After a little digging in the output (of either of two) I found that each character on the screen is made up of three bytes. For example, "A" is the bytes <code>ef</code>, <code>81</code>, and <code>81</code>. Looking at the metadata on the PDF it claims to be encoded in Identity-H, so I assume what I am seeing is a set of characters encoded in Identity-H. I have a partial mapping based on the documents I already have, but I want to make a more complete mapping. To do that I need something like an ASCII table for Identity-H.</p>
Continue reading...
 

Latest posts

Top