
Extracting text from a PDF with a custom font

by admin
October 16, 2024

I have a large PDF file containing text with specialised scientific notation, and I'm trying to extract the text using pdfplumber.

At first, I noticed that some symbols are extracted as capital Latin letters, and that the output also contains stray characters such as '[' and codes such as (cid:8). Moreover, the same extracted code often corresponds to different symbols in the file, depending on the font. I solved this problem by collecting not only the text representation of each character but also the name of its font.
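A minimal sketch of that workaround, assuming each character is available as a dict with "text" and "fontname" keys (the shape of the entries in pdfplumber's page.chars). The font names, the translation table, and the helper names decode and load_chars are hypothetical and only illustrate the idea; a real table has to be built by inspecting the document.

```python
def decode(chars, table):
    """Translate characters with a table keyed by (fontname, text).

    Characters without an entry in the table are passed through unchanged.
    """
    return "".join(
        table.get((ch["fontname"], ch["text"]), ch["text"]) for ch in chars
    )


def load_chars(path, page_index=0):
    """Read the character dicts of one page with pdfplumber (third-party)."""
    import pdfplumber  # imported lazily so decode() works without it

    with pdfplumber.open(path) as pdf:
        return list(pdf.pages[page_index].chars)


# Hypothetical sample: the same extracted text "A" stands for different
# symbols depending on the font it was drawn with.
sample_chars = [
    {"text": "A", "fontname": "AAAAAA+Sym1"},
    {"text": "A", "fontname": "BBBBBB+Sym2"},
    {"text": "(cid:8)", "fontname": "AAAAAA+Sym1"},
    {"text": "x", "fontname": "CCCCCC+Body"},
]

# Hypothetical hand-built table: (fontname, extracted text) -> real symbol.
sample_table = {
    ("AAAAAA+Sym1", "A"): "α",
    ("BBBBBB+Sym2", "A"): "∑",
    ("AAAAAA+Sym1", "(cid:8)"): "∂",
}

decoded = decode(sample_chars, sample_table)
```

Keying the table on the pair rather than on the text alone is what lets the same code resolve to different symbols.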
However, I now wonder whether it is possible to extract the encoding directly from the PDF file. I mean getting information in the form: {'symbol': 'e', 'font': 'ejdeij+4brane'} is displayed as a particular glyph.
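One place such a mapping can live inside a PDF is a font's ToUnicode CMap, a small text stream that lists character codes and the Unicode text they stand for. The sketch below parses the bfchar and bfrange sections of such a stream using only the standard library. It assumes the stream has already been located and decompressed (how to do that depends on the library and is not shown), and that the font has a ToUnicode entry at all; codes like (cid:8) in the output may well mean that it is missing or incomplete. The function name parse_tounicode and the sample CMap are made up for illustration.

```python
import re

HEX = r"<([0-9A-Fa-f]+)>"
BFCHAR = re.compile(HEX + r"\s*" + HEX)
# A range maps <lo>..<hi> either to one start value or to an array of values.
BFRANGE = re.compile(HEX + r"\s*" + HEX + r"\s*(?:" + HEX + r"|\[(.*?)\])", re.S)


def _utf16(hex_digits, offset=0):
    """Decode a hex string as UTF-16BE, optionally adding an offset first."""
    if len(hex_digits) % 2:  # an odd-length hex string is padded with 0
        hex_digits += "0"
    size = len(hex_digits) // 2
    value = int(hex_digits, 16) + offset
    return value.to_bytes(size, "big").decode("utf-16-be", errors="replace")


def parse_tounicode(cmap_text):
    """Return {character code: unicode text} from ToUnicode CMap text."""
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in BFCHAR.findall(block):
            mapping[int(src, 16)] = _utf16(dst)
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for lo, hi, dst, arr in BFRANGE.findall(block):
            first, last = int(lo, 16), int(hi, 16)
            if dst:  # form: <lo> <hi> <start>, consecutive values
                for i in range(last - first + 1):
                    mapping[first + i] = _utf16(dst, i)
            else:  # form: <lo> <hi> [<v1> <v2> ...], one value per code
                for i, item in enumerate(re.findall(HEX, arr)):
                    mapping[first + i] = _utf16(item)
    return mapping


# Made-up CMap fragment for illustration only.
sample_cmap = """
2 beginbfchar
<01> <0041>
<08> <2202>
endbfchar
2 beginbfrange
<10> <12> <03B1>
<20> <21> [<0061> <0062>]
endbfrange
"""

code_to_text = parse_tounicode(sample_cmap)
```

Running this once per font and storing the result under the font name would give a lookup of the kind asked about: font plus code on one side, displayed symbol on the other. Fonts without a ToUnicode CMap would still need the manual table.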
