Channel: Adobe Community: Message List - Acrobat SDK

Ligature text expansion issue


Hi,

I am successfully extracting text from a PDF with PDWordFinder, but there is an issue with ligature text.

Can anyone help, or let me know whether it is possible to stop ligature expansion?

There is a word "office" in my PDF file, and it is getting expanded as "offi ce".

Here is my code:

    PDWordFinderConfigRec wfConfig;    /* WordFinder configuration record */

    memset(&wfConfig, 0, sizeof(PDWordFinderConfigRec));
    wfConfig.noXYSort = true;
    wfConfig.noLigatureExp = false;    /* false: ligature expansion is not disabled */

    wordFinder = PDDocCreateWordFinderEx(pdDoc, WF_LATEST_VERSION, toUnicode, &wfConfig);

    pageNum = AVPageViewGetPageNum(pageView);
    PDWordFinderAcquireWordList(wordFinder, pageNum, &wInfo, NULL, NULL, &count);

    for (i = 0; i < count; i++)
    {
        memset(str, '\0', MAX_PATH);
        word = PDWordFinderGetNthWord(wordFinder, i);
        PDWordGetString(word, str, PDWordGetLength(word));

        attrib = PDWordGetAttrEx(word, 0);

        /* Append a space only when the word is adjacent to a space, does not
           end the line, and is not flagged as containing a ligature */
        if ((attrib & WXE_ADJACENT_TO_SPACE) && !(attrib & WXE_LAST_WORD_ON_LINE) && !(attrib & WXE_HAS_LIGATURE))
            strcat(str, " ");

        fprintf(pFileTexts, "%s", str);
    }

 

In fact, (attrib & WXE_HAS_LIGATURE) is never true for any word,
so I am not able to detect words that contain ligatures.
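
As a cross-check that does not depend on the attribute flag, the extracted string itself could be scanned for the Unicode ligature code points U+FB00 to U+FB06 (ff, fi, fl, ffi, ffl, long s-t, st). The following is only a sketch under assumptions: it assumes the word text is already available as NUL-terminated UTF-8, which may not match what the toUnicode output above actually produces, and count_ligatures and expand_ligatures are hypothetical helper names, not Acrobat SDK functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ASCII replacements for U+FB00..U+FB06. U+FB05 (long s-t) is mapped to
   "st" here as a simplification. */
static const char *const kLigatureExpansion[] = {
    "ff", "fi", "fl", "ffi", "ffl", "st", "st"
};

/* Returns 0..6 if the bytes at p encode U+FB00..U+FB06 in UTF-8
   (EF AC 80 .. EF AC 86), otherwise -1. The short-circuit evaluation
   keeps the reads inside a NUL-terminated string. */
static int ligature_index(const unsigned char *p)
{
    if (p[0] == 0xEF && p[1] == 0xAC && p[2] >= 0x80 && p[2] <= 0x86)
        return p[2] - 0x80;
    return -1;
}

/* Counts ligature code points in a NUL-terminated UTF-8 string. */
size_t count_ligatures(const char *utf8)
{
    const unsigned char *p = (const unsigned char *)utf8;
    size_t n = 0;

    assert(utf8 != NULL);
    while (*p != '\0') {
        if (ligature_index(p) >= 0) {
            n++;
            p += 3;
        } else {
            p++;
        }
    }
    return n;
}

/* Copies src to dst, replacing each ligature with its ASCII letters.
   Returns 0 on success, or -1 if dst is too small for the result. */
int expand_ligatures(const char *src, char *dst, size_t dst_size)
{
    const unsigned char *p = (const unsigned char *)src;
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    if (dst_size == 0)
        return -1;

    while (*p != '\0') {
        int idx = ligature_index(p);
        if (idx >= 0) {
            size_t len = strlen(kLigatureExpansion[idx]);
            if (out + len >= dst_size)   /* keep room for the terminator */
                return -1;
            memcpy(dst + out, kLigatureExpansion[idx], len);
            out += len;
            p += 3;
        } else {
            if (out + 1 >= dst_size)
                return -1;
            dst[out++] = (char)*p++;
        }
    }
    dst[out] = '\0';
    return 0;
}
```

Under those assumptions, a non-zero result from count_ligatures on a word's text would indicate that the ligature glyph came through unexpanded, independent of what WXE_HAS_LIGATURE reports.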

