Channel: Adobe Community: Message List - Acrobat SDK

How to find sentences from PDF?


I think finding paragraphs is out of the question. So instead I ask: is it possible to break the text content into sentences, maybe using a sentence-end marker such as the period (.)?

I'm not sure this is the right way to start, but what if I begin with the word list acquired from the PDWordFinder (code below)?

Maybe I could then split the text content on a delimiter?

Any help and directions greatly appreciated.

 

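  ASInt32 numWords;
  PDWord wordInfo;
  PDWord *pXYSortTable;

  // Acquire the word list for the page. The XY-sorted table is requested;
  // the reading-order table is not (NULL).
  PDWordFinderAcquireWordList(pdWordFinder, pageNum, &wordInfo, &pXYSortTable, NULL, &numWords);

  for (int nWordCounter = 0; nWordCounter < numWords; nWordCounter++)
  {
    PDWord pdNWord = PDWordFinderGetNthWord(pdWordFinder, nWordCounter);

    // Get the word as a string
    char stringBuffer[125];
    //ASUns8 pdwordLength = PDWordGetLength (pdNWord);
    PDWordGetString(pdNWord, stringBuffer, sizeof(stringBuffer));

    // Append the word and a separating space to the corpus
    pdfCorpus << stringBuffer;
    pdfCorpus << " ";
  }

  // Release the word list once this page's words have been copied out
  PDWordFinderReleaseWordList(pdWordFinder, pageNum);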
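
The delimiter idea could be sketched roughly as follows, independent of the Acrobat SDK. This is only a minimal sketch: it assumes the strings copied out of the word list have been collected into a std::vector<std::string> and that they still carry their trailing punctuation (worth verifying against the word finder configuration). The names endsSentence and splitIntoSentences are hypothetical helpers, not SDK calls.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true if the word ends with a sentence-final mark
// (. ! ?), ignoring any closing quotes or brackets that follow the mark.
static bool endsSentence(const std::string& word)
{
    std::size_t i = word.size();
    while (i > 0 && (word[i - 1] == '"' || word[i - 1] == '\'' ||
                     word[i - 1] == ')' || word[i - 1] == ']'))
    {
        --i;
    }
    if (i == 0)
        return false;
    const char c = word[i - 1];
    return c == '.' || c == '!' || c == '?';
}

// Hypothetical helper: joins words with single spaces and starts a new
// sentence after every word for which endsSentence() returns true.
// Any trailing words without a terminator form a final sentence.
std::vector<std::string> splitIntoSentences(const std::vector<std::string>& words)
{
    std::vector<std::string> sentences;
    std::string current;
    for (const std::string& w : words)
    {
        if (w.empty())
            continue;              // skip empty strings
        if (!current.empty())
            current += ' ';
        current += w;
        if (endsSentence(w))
        {
            sentences.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        sentences.push_back(current);
    return sentences;
}
```

In the loop above, each stringBuffer would be pushed into the vector instead of (or in addition to) being streamed into pdfCorpus, and splitIntoSentences would be called once per page. A plain end-marker test like this will split wrongly on abbreviations ("e.g.", "Dr.") and on decimal numbers, and the word order matters: the XY-sorted order is not necessarily reading order, so sentences may come out scrambled on multi-column pages.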

