I think finding paragraphs is out of the question. So let me ask: is it possible to break the text content into sentences, maybe using a sentence-end marker such as the period (.)?
I'm not sure this is the right way to start, but what if I work from the word list acquired from the PDWordFinder?
Maybe I could use a delimiter on the text content?
Any help or direction is greatly appreciated.
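ASInt32 numWords;
PDWord wordInfo;
PDWord *pXYSortTable;
// Acquire the word list for the page; numWords receives the word count
PDWordFinderAcquireWordList(pdWordFinder, pageNum, &wordInfo, &pXYSortTable, NULL, &numWords);
for (ASInt32 nWordCounter = 0; nWordCounter < numWords; nWordCounter++)
{
    PDWord pdNWord = PDWordFinderGetNthWord(pdWordFinder, nWordCounter);
    // Get the word as a string (copies at most sizeof(stringBuffer) bytes)
    char stringBuffer[125];
    //ASUns8 pdwordLength = PDWordGetLength (pdNWord);
    PDWordGetString(pdNWord, stringBuffer, sizeof(stringBuffer));
    // Append the word and a separating space to the corpus stream
    pdfCorpus << stringBuffer;
    pdfCorpus << " ";
}
// Release the word list once the page has been processed
PDWordFinderReleaseWordList(pdWordFinder, pageNum);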
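
To make the delimiter idea concrete, here is a rough sketch of what I have in mind. It is plain C++ and does not touch the Acrobat SDK: splitIntoSentences is a hypothetical helper of my own, and it assumes the words have already been copied into a std::vector<std::string> and that each word string still carries its trailing punctuation, which I have not verified for the strings returned by PDWordGetString.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of the Acrobat SDK): groups words into
// sentences, closing a sentence whenever a word ends in '.', '!' or '?'.
// Assumption: each word keeps its trailing punctuation.
std::vector<std::string> splitIntoSentences(const std::vector<std::string>& words)
{
    std::vector<std::string> sentences;
    std::string current;

    for (const std::string& word : words) {
        if (word.empty()) {
            continue;  // skip empty entries
        }
        if (!current.empty()) {
            current += ' ';  // single space between words
        }
        current += word;

        const char last = word.back();
        if (last == '.' || last == '!' || last == '?') {
            sentences.push_back(current);  // sentence-end marker reached
            current.clear();
        }
    }

    // Keep any trailing words that never reached an end marker
    if (!current.empty()) {
        sentences.push_back(current);
    }
    return sentences;
}
```

In the loop above, each stringBuffer could be pushed into the vector instead of being streamed into pdfCorpus. The obvious weakness is that abbreviations such as "e.g." or "Dr." would end a sentence too early, so a period alone is probably not a reliable marker.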