Java: suggestions for sanitizing ArrayList records

I am looking for ideas on how to accomplish this task. I'll start with how my program works.

My program reads a CSV file. Each line is a key-value pair separated by a comma.

  L1234456,ygja-3bcb-iiiv-pppp-a8yr-c3d2-ct7v-giap-24yj-3gie
  L6789101,zgna-3mcb-iiiv-pppp-a8yr-c3d2-ct7v-gggg-zz33-33ie

etc

The function takes a file, parses it into an ArrayList of String[], and returns that ArrayList.

    public ArrayList<String[]> parseFile(File csvFile) {
        Scanner scan = null;
        try {
            scan = new Scanner(csvFile);
        } catch (FileNotFoundException e) {

        }

        ArrayList<String[]> records = new ArrayList<String[]>();
        String[] record = new String[2];
        while (scan.hasNext()) {
            record = scan.nextLine().trim().split(",");
            records.add(record);
        }
        return records;
    }

Here is the code where I call parseFile and pass in the CSV file.

  ArrayList<String[]> Records = parseFile(csvFile);

I then created another ArrayList for records that could not be parsed.

  ArrayList<String> NotParsed = new ArrayList<String>();

The program then sanitizes the key-value pairs. We start with the first element of each record, e.g. L1234456. If it cannot be sanitized, the program replaces it with the "CouldNotBeParsed" text.

    for (int i = 0; i < Records.size(); i++) {
        if (!validateRecord(Records.get(i)[0].toString())) {
            Logging.info("Records could not be parsed " + Records.get(i)[0]);
            NotParsed.add(Records.get(i)[0].toString());
            Records.get(i)[0] = "CouldNotBeParsed";
        } else {
            Logging.info(Records.get(i)[0] + " has been sanitized");
        }
    }

Next we do the same for the second element of the key-value pair, e.g. ygja-3bcb-iiiv-pppp-a8yr-c3d2-ct7v-giap-24yj-3gie

    for (int i = 0; i < Records.size(); i++) {
        if (!validateRecordKey(Records.get(i)[1].toString())) {
            Logging.info("Record Key could not be parsed " + Records.get(i)[0]);
            NotParsed.add(Records.get(i)[1].toString());
            Records.get(i)[1] = "CouldNotBeParsed";
        } else {
            Logging.info(Records.get(i)[1] + " has been sanitized");
        }
    }

The problem is that I need both parts of each key-value pair to be sanitized, and I need two lists: one of the pairs that could not be sanitized and one of the pairs that were sanitized, so that the latter can be inserted into a database. The ones that could not be sanitized will be printed out to the user.

I thought about looping through the records and removing those with the "CouldNotBeParsed" text, which would leave only the ones that could be parsed. I also tried removing records during the for loop with Records.remove(i). However, that messes up the loop: if the first record cannot be sanitized and is removed, the next record shifts into its index and is skipped on the next iteration. That's why I went with replacing the text instead.

Actually, I need two lists: one for the records that were sanitized and another for those that weren't.

I was thinking there must be a better way to do this, such as sanitizing both parts of the key-value pair at the same time. Suggestions?

There is 1 answer

Answer by Sergey Kalinichenko (best answer)

Start by changing the data structure: rather than using a list of two-element String[] arrays, define a class for your key-value pairs:

class KeyValuePair {
    private final String key;
    private final String value;
    public KeyValuePair(String k, String v) { key = k; value = v; }
    public String getKey() { return key; }
    public String getValue() { return value; }
}

Note that the class is immutable.

Now make an object with three lists of KeyValuePair objects:

class ParseResult {
    // final fields are assigned exactly once, in the constructor
    private final List<KeyValuePair> sanitized;
    private final List<KeyValuePair> badKey;
    private final List<KeyValuePair> badValue;
    public ParseResult(List<KeyValuePair> s, List<KeyValuePair> bk, List<KeyValuePair> bv) {
        sanitized = s;
        badKey = bk;
        badValue = bv;
    }
    public List<KeyValuePair> getSanitized() { return sanitized; }
    public List<KeyValuePair> getBadKey() { return badKey; }
    public List<KeyValuePair> getBadValue() { return badValue; }
}

Finally, populate these three lists in a single loop that reads from the file:

// Needs java.io.File, java.io.FileNotFoundException,
// java.util.ArrayList, java.util.List and java.util.Scanner.
public static ParseResult parseFile(File csvFile) throws FileNotFoundException {
    final List<KeyValuePair> sanitized = new ArrayList<KeyValuePair>();
    final List<KeyValuePair> badKey = new ArrayList<KeyValuePair>();
    final List<KeyValuePair> badValue = new ArrayList<KeyValuePair>();
    // FileNotFoundException is not caught here, so the caller deals with it.
    // try-with-resources closes the Scanner once the file has been read.
    try (Scanner scan = new Scanner(csvFile)) {
        while (scan.hasNextLine()) {
            String line = scan.nextLine().trim();
            String[] tokens = line.split(",");
            if (tokens.length != 2) {
                // Malformed line: this version logs a message and continues.
                // Throwing an exception is the other option.
                Logging.info("Skipping malformed line: " + line);
                continue;
            }
            KeyValuePair kvp = new KeyValuePair(tokens[0], tokens[1]);
            // Do the validation on the spot. As in the question, validateRecord
            // checks the key and validateRecordKey checks the value.
            if (!validateRecord(kvp.getKey())) {
                badKey.add(kvp);
            } else if (!validateRecordKey(kvp.getValue())) {
                badValue.add(kvp);
            } else {
                sanitized.add(kvp);
            }
        }
    }
    return new ParseResult(sanitized, badKey, badValue);
}

Now you have a single function that produces a single result with all your records cleanly separated into three buckets, i.e. sanitized records, records with bad keys, and records with good keys but bad values.
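
If only the two lists from the question are needed (records to insert and records to report), the same single-pass idea can be sketched in a self-contained form. This is only an illustration under assumptions: the class name PartitionSketch is hypothetical, and the two regular expressions are guessed from the two sample rows in the question. They stand in for the real validateRecord and validateRecordKey, which the question does not show.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PartitionSketch {
    // ASSUMPTION: key format guessed from the samples ("L" followed by 7 digits).
    static final Pattern KEY_FORMAT = Pattern.compile("L\\d{7}");
    // ASSUMPTION: value format guessed from the samples
    // (10 hyphen-separated groups of 4 lowercase letters or digits).
    static final Pattern VALUE_FORMAT = Pattern.compile("[a-z0-9]{4}(-[a-z0-9]{4}){9}");

    // Hypothetical stand-ins for the validators named in the question.
    static boolean validateRecord(String key) {
        return KEY_FORMAT.matcher(key).matches();
    }

    static boolean validateRecordKey(String value) {
        return VALUE_FORMAT.matcher(value).matches();
    }

    // Rows that passed both checks, ready for the database insert.
    final List<String[]> sanitized = new ArrayList<>();
    // Raw lines that failed either check, kept intact for reporting.
    final List<String> rejected = new ArrayList<>();

    // One pass over the input: nothing is removed or overwritten,
    // so there is no index shifting to worry about.
    static PartitionSketch partition(List<String> lines) {
        PartitionSketch result = new PartitionSketch();
        for (String line : lines) {
            String[] tokens = line.trim().split(",");
            boolean ok = tokens.length == 2
                    && validateRecord(tokens[0])
                    && validateRecordKey(tokens[1]);
            if (ok) {
                result.sanitized.add(tokens);
            } else {
                result.rejected.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "L1234456,ygja-3bcb-iiiv-pppp-a8yr-c3d2-ct7v-giap-24yj-3gie",
                "X999,zgna-3mcb-iiiv-pppp-a8yr-c3d2-ct7v-gggg-zz33-33ie",
                "L6789101,not-a-valid-value",
                "no comma here");
        PartitionSketch result = partition(lines);
        System.out.println("Records to insert: " + result.sanitized.size());
        for (String bad : result.rejected) {
            System.out.println("Could not be sanitized: " + bad);
        }
    }
}
```

Because each line goes into exactly one list as it is read, there is no need for the "CouldNotBeParsed" marker or for removing elements while looping.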