Saving multiple occurrences of strstr() from a line in C?


This is my first time posting here so I'm sorry if the formatting is a little wrong.

Basically my work has asked me to read through an XML (with invalid tags so using a library may be out of the question - I have no control over the XML files), take specific strings from the tags, and output them to a CSV.

So far I've been able to parse a portion of the program but I run into problems when the desired tags occur more than once in a line.

This is the general format of the XML:

<my:LineItem>
            <my:LineNumber>1</my:LineNumber>
            <my:PartNumber></my:PartNumber>
            <my:Quantity>1</my:Quantity>
            <my:UOM>EA</my:UOM>
            <my:UnitCost>1</my:UnitCost>
            <my:ExtendedCost>1</my:ExtendedCost>
            <my:CostCentre>801090 - CG Collab - Feretti -Core 1</my:CostCentre>
            <my:ExpenseCode>86130 - Lab Equipment Rental</my:ExpenseCode>
            <my:CostExpenseMerge>801090.86130</my:CostExpenseMerge>
            <my:Description>123</my:Description>
            <my:Comments>1</my:Comments>
        </my:LineItem><my:LineItem><my:LineNumber>2</my:LineNumber><my:PartNumber></my:PartNumber><my:Quantity>2</my:Quantity><my:UOM>BX</my:UOM><my:UnitCost>2</my:UnitCost><my:ExtendedCost xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">4</my:ExtendedCost><my:CostCentre>800186 University of Maastricht - Bou</my:CostCentre><my:ExpenseCode>86110 - Glass/Plastic Washing S</my:ExpenseCode><my:CostExpenseMerge>800186.86110</my:CostExpenseMerge><my:Description></my:Description><my:Comments>2</my:Comments></my:LineItem><my:LineItem><my:LineNumber>3</my:LineNumber><my:PartNumber></my:PartNumber><my:Quantity>3</my:Quantity><my:UOM>CA</my:UOM><my:UnitCost>3</my:UnitCost><my:ExtendedCost xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">9</my:ExtendedCost><my:CostCentre>800180 J Bartlett PCC – PRONTO Team Grant</my:CostCentre><my:ExpenseCode>81920 - Mechanical Supplies</my:ExpenseCode><my:CostExpenseMerge>800180.81920</my:CostExpenseMerge><my:Description></my:Description><my:Comments>3</my:Comments></my:LineItem>

And I only need to save specific values such as Quantity, CostExpenseMerge, and Description.

So far I am able to read the first two occurrences of <my:Quantity> because they occur on separate lines. My problem now is: how do I save multiple occurrences of my desired tags in one line?

The XML seems to randomly force more than one entry into a line (see items 2 and 3 in my input file).

This is what I have for reading:

char buffer[1024];
    const char * startTag = "<my:Quantity>";
    const char * endTag = "</my:Quantity>";
    char * start, * end;
    char * tempString, * target=NULL;

while(fgets(buffer, sizeof(buffer), entry_file)){

        if((start=strstr(buffer,startTag))){
                start+=strlen(startTag);
                if((end=strstr(start,endTag))){
                    target = (char*)malloc(end-start+1);
                    memcpy(target, start, end-start);
                    target[end-start]='\0';
                    if(target)printf("%s\n", target);   
                }
        }
        }

And my output is:

1
2

Which means that the third occurrence of <my:Quantity> didn't get read (it's supposed to be "3").

Help please!


There are 2 answers

Brian Sidebotham

Rather than if(), use a while() for the start-tag search so that every match on the line is found, not just the first. Here's a test program that works on your data:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char buffer[1024];
const char * startTag = "<my:Quantity>";
const char * endTag = "</my:Quantity>";
char * start, * end;
char * target=NULL;
int line_offset = 0;

int main( int argc, char* argv[] )
{
    FILE* xml = fopen( "data.xml", "r" );

    if( xml == NULL )
    {
        fprintf( stderr, "Could not load data.xml\n" );
        exit( EXIT_FAILURE );
    }

    while(fgets(buffer, sizeof(buffer), xml)){
        /* Restart the search at the beginning of each new line */
        line_offset = 0;
        while((start=strstr(&buffer[line_offset],startTag))){
                start+=strlen(startTag);
                if((end=strstr(start,endTag))){
                    target = (char*)malloc(end-start+1);
                    if(target){  /* check the allocation before writing to it */
                        memcpy(target, start, end-start);
                        target[end-start]='\0';
                        printf("%s\n", target);
                        free(target);
                    }
                }

            /* start is an absolute position in buffer, so assign rather than accumulate */
            line_offset = start - buffer;
        }
    }

    fclose( xml );
    return 0;
}
Paul Ogilvie

You must process the tags in a loop because there can be multiple tags on a line:

while(fgets(buffer, sizeof(buffer), entry_file)){
        start= buffer;
        while ((start=strstr(start,startTag))){
                start+=strlen(startTag);
                if((end=strstr(start,endTag))){
                    target = (char*)malloc(end-start+1);
                    if(target){  /* check the allocation before using it */
                        memcpy(target, start, end-start);
                        target[end-start]='\0';
                        printf("%s\n", target);
                        free(target);
                    }
                    start= end+strlen(endTag);  /* resume after the closing tag */
                }
                else break;  /* no closing tag on this line */
        }
}
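
Since the question mentions several tags (Quantity, CostExpenseMerge, Description), the same loop could be pulled into a small reusable helper. The following is only a sketch: next_tag_value and print_tag_values are hypothetical names, not part of either answer, and it assumes a complete start/end tag pair fits inside one buffer (a tag that straddles two fgets() chunks would still be missed).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of the text between the next
 * startTag/endTag pair found at or after *cursor, and moves *cursor past the
 * end tag. Returns NULL when no complete pair remains or malloc fails.
 * The caller must free() the result. */
char *next_tag_value(const char **cursor, const char *startTag, const char *endTag)
{
    assert(cursor != NULL && *cursor != NULL && startTag != NULL && endTag != NULL);

    const char *start = strstr(*cursor, startTag);
    if (start == NULL)
        return NULL;
    start += strlen(startTag);

    const char *end = strstr(start, endTag);
    if (end == NULL)
        return NULL;

    size_t len = (size_t)(end - start);
    char *value = malloc(len + 1);
    if (value == NULL)
        return NULL;

    memcpy(value, start, len);
    value[len] = '\0';
    *cursor = end + strlen(endTag);   /* next search resumes after the end tag */
    return value;
}

/* Hypothetical helper: prints every value of one tag found in line,
 * one per row, and returns how many were found. */
int print_tag_values(const char *line, const char *startTag, const char *endTag, FILE *out)
{
    int count = 0;
    const char *cursor = line;
    char *value;

    while ((value = next_tag_value(&cursor, startTag, endTag)) != NULL) {
        fprintf(out, "%s\n", value);
        free(value);
        count++;
    }
    return count;
}
```

Inside the fgets() loop you would then call print_tag_values(buffer, "<my:Quantity>", "</my:Quantity>", stdout), and do the same for the other tags. An empty element such as <my:Description></my:Description> comes back as an empty string rather than NULL, so it can still be written as an empty CSV field.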