Issue with identifying invalid lexemes in C initialization statement


I have been working on a code snippet to identify invalid lexemes in a C initialization statement. The code is intended to check for invalid data types, invalid identifiers, and invalid constant values. However, once it encounters the first invalid lexeme, it incorrectly reports the subsequent lexemes as invalid identifiers, and it also seems to treat the first whitespace after the data type as an invalid lexeme.

Here's a version of the code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The methods below are static members of my class (the class itself is omitted).

private static ArrayList<String> findInvalidLexemes(String[] lexemes) {
    ArrayList<String> invalidLexemes = new ArrayList<>();

    // Check for invalid data types and invalid identifiers
    boolean isInvalidIdentifier = false;
    for (String lexeme : lexemes) {
        if (!(lexeme.equals("int") || lexeme.equals("float"))) {
            if (!isValidIdentifier(lexeme)) {
                invalidLexemes.add("Invalid identifier: " + lexeme);
                isInvalidIdentifier = true;
            } else {
                if (!isInvalidIdentifier) {
                    invalidLexemes.add(lexeme);
                }
                isInvalidIdentifier = false;
            }
        }
    }

    // Check for invalid constant values
    for (String lexeme : lexemes) {
        if (lexeme.matches("\\d+(\\.\\d+)?")) {
            if (!isValidFloat(lexeme)) {
                invalidLexemes.add("Constant(F)");
            } else if (!isValidInt(lexeme)) {
                invalidLexemes.add("Constant(I)");
            }
        }
    }

    return invalidLexemes;
}

// Not called anywhere yet
private static boolean isValidLexeme(String lexeme) {
    // Add validation logic
    return !lexeme.isBlank();
}

private static boolean isValidIdentifier(String identifier) {
    String regex = "^([a-zA-Z_$][a-zA-Z\\d_$]*)$";
    Pattern p = Pattern.compile(regex);

    String[] reservedKeywords = {"int", "float", "double", "char", "short", "long", "unsigned", "signed",
        "void", "for", "while", "do", "if", "else", "switch", "case", "break", "continue",
        "return", "goto", "struct", "union", "typedef", "enum", "static", "extern", "const",
        "volatile", "register", "auto"};

    if (identifier == null) {
        return false;
    }

    Matcher m = p.matcher(identifier);
    if (m.matches()) {
        if (Arrays.asList(reservedKeywords).contains(identifier)) {
            return false;
        }
        return true;
    }

    return false;
}

private static boolean isValidFloat(String lexeme) {
    try {
        float floatValue = Float.parseFloat(lexeme);
        return floatValue >= 0 && floatValue <= 3.4028235E38;
    } catch (NumberFormatException e) {
        return false;
    }
}

private static boolean isValidInt(String lexeme) {
    try {
        int intValue = Integer.parseInt(lexeme);
        return intValue >= 0 && intValue <= 2147483647;
    } catch (NumberFormatException e) {
        return false;
    }
}
```
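Separately from the loop logic, `isValidIdentifier` recompiles its regex and rebuilds the keyword array on every call, its pattern accepts `$` (accepted by GCC as an extension, but not part of the portable ISO C identifier character set), and its keyword list omits several C99 keywords such as `sizeof`, `default`, and `inline`. A minimal sketch of a tightened version, assuming ISO C99 rules are the target (the class name `IdentifierCheck` is hypothetical):

```java
import java.util.Set;
import java.util.regex.Pattern;

public class IdentifierCheck {
    // Compiled once and reused, instead of on every call.
    // Assumption: portable ISO C identifiers only, so '$' is not accepted.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // The 37 keywords of C99 (C11 adds more, such as _Alignas and _Static_assert).
    private static final Set<String> C99_KEYWORDS = Set.of(
            "auto", "break", "case", "char", "const", "continue", "default", "do",
            "double", "else", "enum", "extern", "float", "for", "goto", "if",
            "inline", "int", "long", "register", "restrict", "return", "short",
            "signed", "sizeof", "static", "struct", "switch", "typedef", "union",
            "unsigned", "void", "volatile", "while", "_Bool", "_Complex", "_Imaginary");

    static boolean isValidIdentifier(String s) {
        // matches() requires the whole string to match, so no ^...$ anchors are needed.
        return s != null && IDENTIFIER.matcher(s).matches() && !C99_KEYWORDS.contains(s);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"y", "notvalid", "2323", "int", "sizeof", " ", "="}) {
            System.out.println("'" + s + "' -> " + isValidIdentifier(s));
        }
    }
}
```

Note that `notvalid` passes this check: it is a syntactically valid C identifier, so flagging it needs context about where it appears, not just its spelling.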

When I run the code with the input "int 2323 = 23. y = notvalid, z;", I get the following output:
```
Invalid identifier: 
Invalid identifier: 2323
Invalid identifier: 
Invalid identifier: =
Invalid identifier: 
Invalid identifier: 23
Invalid identifier: .
Invalid identifier: 
Invalid identifier: 
Invalid identifier: =
Invalid identifier: 
Invalid identifier: ,
Invalid identifier: 
Invalid identifier: ;
```
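The `lexemes` array is built by code that isn't shown here, but the blank `Invalid identifier: ` lines suggest that the splitting step keeps whitespace as separate lexemes. As a sketch, assuming a regex-based tokenizer is acceptable, `Matcher.find()` can extract only the tokens and skip whitespace entirely (the `Tokenizer` class and its token pattern are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Tokenizer {
    // Hypothetical token pattern, alternatives tried left to right at each position:
    //   \d+\.\d*        a number with a decimal point, e.g. "23." or "3.14"
    //   [A-Za-z0-9_$]+  a word: keyword, identifier, integer, or malformed name like "2abc"
    //   \S              any other single non-whitespace character (=, ',', ;, ...)
    // Whitespace matches none of these, so find() never returns it as a token.
    private static final Pattern TOKEN = Pattern.compile("\\d+\\.\\d*|[A-Za-z0-9_$]+|\\S");

    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("int 2323 = 23. y = notvalid, z;"));
    }
}
```

Running `main` prints `[int, 2323, =, 23., y, =, notvalid, ,, z, ;]`. Keeping `23.` as one token follows C, where `23.` is a valid floating constant.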

It seems that the logic for handling valid identifiers is causing subsequent lexemes to be incorrectly marked as invalid. I have tried adjusting the code, but I couldn't find a solution.
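Looking at the first loop in isolation, two things produce the extra lines: blank and punctuation lexemes go straight to `isValidIdentifier`, and the `isInvalidIdentifier` flag together with the `else` branch can add *valid* identifiers to the result list. A minimal patch of just that loop, assuming the lexemes contain single-space entries and one-character operators (the class name, method name, and sample array are hypothetical, and the keyword check is left out for brevity):

```java
import java.util.ArrayList;
import java.util.List;

public class FirstLoopPatch {
    // Stand-in for the question's isValidIdentifier; the keyword check is omitted here.
    static boolean isValidIdentifier(String s) {
        return s.matches("[A-Za-z_$][A-Za-z0-9_$]*");
    }

    static List<String> findInvalidIdentifiers(String[] lexemes) {
        List<String> invalid = new ArrayList<>();
        for (String lexeme : lexemes) {
            if (lexeme.isBlank()) continue;                                // whitespace lexemes
            if (lexeme.equals("int") || lexeme.equals("float")) continue; // data types
            if (lexeme.matches("[=,;.]")) continue;                        // operators and separators
            if (!isValidIdentifier(lexeme)) {
                invalid.add("Invalid identifier: " + lexeme);
            }
            // Valid identifiers are not added, so no flag has to carry state between iterations.
        }
        return invalid;
    }

    public static void main(String[] args) {
        // Hypothetical lexemes for "int 2323 = 23. y = notvalid, z;" with whitespace kept.
        String[] lexemes = {"int", " ", "2323", " ", "=", " ", "23", ".", " ", "y", " ",
                "=", " ", "notvalid", ",", " ", "z", ";"};
        System.out.println(findInvalidIdentifiers(lexemes));
    }
}
```

With this sample it prints `[Invalid identifier: 2323, Invalid identifier: 23]`: the blanks and punctuation are gone, but the initializer `23` is still flagged and `notvalid` is not, because a pass that looks at one lexeme at a time cannot tell a declared name from an initializer value.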

Can someone please help me understand what's wrong with the current logic and suggest a fix for this issue? I would greatly appreciate any insights or suggestions. Thank you.


Expected output for the given input "int 2323 = 23. y = notvalid, z;":

```
Invalid identifier: 2323
Invalid identifier: notvalid
0
```
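Producing the two `Invalid identifier` lines requires knowing each token's position: a name being declared (the token before `=`, `,` or `;`) must be a valid identifier, while an initializer (the token after `=`) must be a number or a name declared earlier in the same statement. Treating an undeclared initializer as invalid is an assumption about why `notvalid` appears in the expected output; the class name, token pattern, and abbreviated keyword set are hypothetical too. The sketch makes no attempt to reproduce the final `0` line, since its meaning isn't stated.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeclarationCheck {
    // Hypothetical token pattern: a number with a decimal point, a word, or any single
    // non-whitespace character. Whitespace is never returned as a token.
    private static final Pattern TOKEN = Pattern.compile("\\d+\\.\\d*|[A-Za-z0-9_$]+|\\S");
    private static final Pattern NUMBER = Pattern.compile("\\d+(\\.\\d*)?");
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
    // Abbreviated keyword set for the sketch; the question's full list could be used instead.
    private static final Set<String> KEYWORDS = Set.of("int", "float", "double", "char", "long", "short");

    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    static boolean isValidIdentifier(String s) {
        return IDENTIFIER.matcher(s).matches() && !KEYWORDS.contains(s);
    }

    static List<String> findInvalidIdentifiers(String statement) {
        List<String> tokens = tokenize(statement);
        List<String> report = new ArrayList<>();
        Set<String> declared = new HashSet<>();
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            String prev = i > 0 ? tokens.get(i - 1) : "";
            String next = i + 1 < tokens.size() ? tokens.get(i + 1) : "";
            if (prev.equals("=")) {
                // Initializer: must be a numeric constant or a name declared earlier.
                if (!NUMBER.matcher(token).matches() && !declared.contains(token)) {
                    report.add("Invalid identifier: " + token);
                }
            } else if (next.equals("=") || next.equals(",") || next.equals(";")) {
                // Declarator: the name being declared must be a valid identifier.
                if (isValidIdentifier(token)) {
                    declared.add(token);
                } else {
                    report.add("Invalid identifier: " + token);
                }
            }
            // Other tokens (the type keyword, '=', ',', ';') are not checked here.
        }
        return report;
    }

    public static void main(String[] args) {
        findInvalidIdentifiers("int 2323 = 23. y = notvalid, z;").forEach(System.out::println);
    }
}
```

For the input above it prints `Invalid identifier: 2323` and `Invalid identifier: notvalid`. It only handles simple `type name = value, name, ...;` forms: it does not check the punctuation between declarators (so `23. y` is not reported), and an initializer such as `-5` or `a + 1` would be reported as invalid.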
