Quantcast
Channel: Adobe Community: Message List - Acrobat SDK
Viewing all articles
Browse latest Browse all 10848

Re: How to read line number text from PDF using plugin?

$
0
0

Ok, some background reading of the PDF Reference will help you understand why this is so difficult. PDF files are not organised into lines. It is best to think of each word or character on the page as being a graphic with its own position. The human eye sees lines where a series of graphics (words) are roughly in the same horizontal region.

 

In the general case it is difficult or even impossible to answer this. You may have columns with different spacing (but the PDF stores no information on what is a column). You may have subscripts and superscripts. You may have text in graphics coinciding with other text. Commonly, there may be titles, headings or page numbers which are just ordinary text and might count as lines.


That said, what you need to do is extract the text on the page and its positions. The WordFinder APIs are the way to do that. Now, sort all the words out, using the Y coordinates and size to try and guess what makes a "line". Now you are in a position to find the text (divided into words, not strings) and report the "line number" you have estimated.


Viewing all articles
Browse latest Browse all 10848

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>