For my class, I have to take all of the Sherlock Holmes texts and find all of the dialogue. Dialogue is supposed to appear between two double quote marks (").
With everything I have tried, I can't consistently extract the dialogue. Looking at the text given to us in the assignment, and at the URL it came from, there are occasionally quote marks missing. This is apparently due to grammar: if a character talks for longer than one paragraph, an opening quote mark appears at the start of each new paragraph, but the closing quote mark only appears at the end of the last paragraph of the speech.
Because this happens unpredictably, the task has seemed impossible, but I have found one possible solution: if my quote state variable equals one and a second quote mark appears, I must check whether it is preceded by two newline characters. If it is, that quote mark doesn't count, and I continue until the real closing quote.
I have been trying to use multiple file stream pointers to find the current character, previous character, and previous minus one character but I can't find a way to do this. It seems that it is not possible to make multiple pointers to the same file but I am not 100% sure. Could this in fact be impossible?
Summary: when I run my program, I can never seem to extract the dialogue consistently. It seems impossible because the grammar rules allow closing quote marks to be omitted unpredictably, which makes me think that at best I can only get a mix of dialogue and narration. I have tried creating multiple file stream pointers to read the current, previous, and previous-minus-one characters, and it isn't working.
Specific function in question for my C program:
void findDialogueInFile(char* filename)
{
    FILE *newlyWrittenFile = fopen(filename, "r");
    if (newlyWrittenFile == NULL)
    {
        printf("\nFile could not be opened");
    }
    else
    {
        printf("\nNewlyWrittenFile is readable");
    }
    char charIterator;
    int doubleQuoteCounter = 0;
    FILE *quoteCheckerFile = fopen("quoteChecker.txt", "w");
    if (quoteCheckerFile == NULL)
    {
        printf("\nFile could not be opened");
    }
    else
    {
        printf("\nquoteChecker is writeable");
    }
    int singleQuoteCounter = 0;
    FILE *previousElementOfStreamPointer = fopen("quoteChecker.txt", "r");
    FILE *ElementMinus2OfStreamPointer = fopen("quoteChecker.txt", "r");
    char previousCharElement;
    char previousCharElementMinus2;
    int lengthOfStringArrayCounter = 0;
    if (previousElementOfStreamPointer == NULL)
    {
        printf("\nFile could not be opened");
    }
    else
    {
        printf("\nquoteChecker is readable with previousElementOfStreamPointer");
    }
    if (ElementMinus2OfStreamPointer == NULL)
    {
        printf("\nFile could not be opened");
    }
    else
    {
        printf("\nquoteChecker is readable with elementMinus2OfStreamPointer");
    }
    /// vvvvv where the magic happens vvvvv
    while( (charIterator = fgetc(newlyWrittenFile)) != EOF )
    {
        fseek(previousElementOfStreamPointer, -1L, SEEK_CUR);
        fseek(ElementMinus2OfStreamPointer, -2L, SEEK_CUR);
        previousCharElement = fgetc(previousElementOfStreamPointer);
        previousCharElementMinus2 = fgetc(ElementMinus2OfStreamPointer);
        if (charIterator == '\"')
        {
            if(previousCharElement == '\n' && previousCharElementMinus2 == '\n')
            {
                printf("\nFOUND DIALOGUE LONGER THAN A PARAGRAPH\n");
                continue;
            }
            fprintf(quoteCheckerFile, "%c", charIterator);
            doubleQuoteCounter++;
        }
        else if (singleQuoteCounter >= 2)
        {
            fprintf(quoteCheckerFile, "\n");
            singleQuoteCounter = 0;
            //doubleQuoteCounter = 0;
            continue;
        }
        else if (doubleQuoteCounter == 1)
        {
            if (charIterator == '\'')
            {
                singleQuoteCounter++;
            }
            fprintf(quoteCheckerFile, "%c", charIterator);
        }
        else if (doubleQuoteCounter >= 2)
        {
            fprintf(quoteCheckerFile, "\n\n");
            doubleQuoteCounter = 0;
        }
    }
    fclose(newlyWrittenFile);
    fclose(quoteCheckerFile);
    return ;
}
I was expecting to be able to find different positions in a file at one time with multiple pointers and fseek to look for previous and previous minus one character. It seems to not be working and my logs to check if these things work are not printing to terminal.
"This is apparently due to grammar..." No. This is how continuous speech is signalled to the reader of written dialogue.
The code you've presented is too long and muddled to tweak into a usable form (sorry).
Three(!) FILE pointers, and counters and flags... It's all too much!
As stated in a comment, your code only needs to keep track of a couple of recently seen characters to determine if a double quote signals the end of a bit of speech, or the continuance of a monologue by one character.
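To make the "couple of recently seen characters" idea concrete, here is a minimal sketch rather than a definitive implementation. The names struct recent, recent_push, is_continuation_quote and count_speeches are hypothetical, invented for this illustration. It assumes plain ASCII double quotes and paragraphs separated by exactly "\n\n" (no "\r\n" line endings, no curly quotes).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: remembers the two most recently seen characters. */
struct recent {
    int prev1; /* the character just before the current one */
    int prev2; /* the character before that */
};

/* Shift the history along by one character. */
static void recent_push(struct recent *r, int c)
{
    r->prev2 = r->prev1;
    r->prev1 = c;
}

/* Assumption: a quote mark seen while speech is already open, and
 * directly after a blank line ("\n\n"), re-opens the same speech in a
 * new paragraph instead of closing it. */
static bool is_continuation_quote(const struct recent *r, bool in_speech)
{
    return in_speech && r->prev1 == '\n' && r->prev2 == '\n';
}

/* Counts completed speeches in a string; a multi-paragraph monologue
 * counts as one speech. */
size_t count_speeches(const char *text)
{
    struct recent r = { 0, 0 };
    bool in_speech = false;
    size_t count = 0;

    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '"') {
            if (is_continuation_quote(&r, in_speech)) {
                /* same speaker carries on: the state does not change */
            } else if (in_speech) {
                in_speech = false; /* genuine closing quote */
                count++;
            } else {
                in_speech = true;  /* opening quote */
            }
        }
        recent_push(&r, (unsigned char)*p);
    }
    return count;
}
```

No second or third FILE pointer is needed: the history travels along with the single pass over the text.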
Another simplification is to write the program as a 'filter' so that you don't need to fuss with file names and pointers. Let the OS and the C library carry some of the load.
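As a rough illustration of the filter idea (a sketch, not the code originally posted with this answer), the function below reads one stream and writes only the quoted text to another. The name filter_dialogue is hypothetical, and the same assumptions apply: ASCII double quotes, and a blank line ("\n\n") immediately before a quote that continues a monologue.

```c
#include <stdio.h>

/* Copies only the text found between double quotes from `in` to `out`,
 * ending each speech with a newline. The quote marks themselves are
 * not copied. */
void filter_dialogue(FILE *in, FILE *out)
{
    int c;                    /* int, so that EOF can be told apart from data */
    int prev1 = 0, prev2 = 0; /* the two most recently read characters */
    int in_speech = 0;        /* non-zero while inside quoted speech */

    while ((c = fgetc(in)) != EOF) {
        if (c == '"') {
            if (in_speech && prev1 == '\n' && prev2 == '\n') {
                /* continuation of a monologue: stay inside the speech */
            } else {
                if (in_speech)
                    fputc('\n', out); /* closing quote ends the speech */
                in_speech = !in_speech;
            }
        } else if (in_speech) {
            fputc(c, out);
        }
        prev2 = prev1;
        prev1 = c;
    }
}
```

Calling filter_dialogue(stdin, stdout) from main turns this into a filter, so the shell does the file handling, for example ./dialogue < holmes.txt > quoteChecker.txt (the program name and holmes.txt are placeholders).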
Below is a tiny excerpt from Harper Lee's "To Kill A Mockingbird", used as a sample text. (The final sentence has been split out to have a "monologue continuation" example.)
And here is some code:
And here is the result:
The code works but, as you can see, phrases have been broken up, confounding the understanding of who says what to whom. Fixing this is beyond the scope of the OP's question. A post-processing filter could be written to strip out blank lines and connect up phrases that the author had split in the original text.
EDIT:
One of the many problems with a morass of code is its ability to hide small but consequential bugs...
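One such bug in the question's loop is that the return value of fgetc is stored in a char (charIterator) before being compared with EOF. fgetc returns an int precisely so that EOF can lie outside the range of valid byte values. The sketch below, with the hypothetical name count_bytes, shows the usual pattern. With a char, a 0xFF byte in the data could be mistaken for EOF where char is signed, or EOF might never match where char is unsigned.

```c
#include <stdio.h>

/* Counts the bytes remaining in a stream. `c` is an int, so a data
 * byte of 0xFF (returned as 255) is never confused with EOF. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c; /* not char: must hold every byte value plus EOF */

    while ((c = fgetc(fp)) != EOF)
        n++;
    return n;
}
```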
Repeated SO many times, here on SO: "EOF is NOT a single byte char"