I have an input GEDCOM file with tons of individual/family records. The goal is to convert their data into this form:

name(p6, 'Harry Buis'). birth(p6, date(1927,11,17)). death(p6, date(2001,08,21)). famc(p6, f3). fams(p6, f2).

I have been able to pull out each person's number and name and print them to an output file, but I am having trouble parsing the birth/death dates. I want to use substring to assign birthDay, birthMonth, and birthYear as integers so I can print them to the output file. They must be integers so I can sort by date. Here is a sample of one person's data from the input file.

0 @P6@ INDI 
1 BIRT 
2 DATE 17 Nov 1924
1 NAME Harry /Buis/
1 DEAT Age: 76
2 DATE 21 Aug 2001
1 SEX M
1 FAMC @F3@
1 FAMS @F2@

And here is my source code of what I have so far:

public class Main {

static Scanner scan;
static BufferedWriter outFile;
static int birthYear = 0;
static int birthMonth = 0;
static String birthDay = "";
static int deathYear = 0;
static int deathMonth = 0;
static int deathDay = 0;
static String name = "";
static String person = "";
static String sex = "";
static String famC = "";
static String famS = "";
static String man = "";
static String woman = "";
static String child = "";

public static void parse() throws IOException {
    scan = new Scanner(new FileReader("pbuis.ged"));
    outFile = new BufferedWriter(new FileWriter("output.txt"));
    String reader = scan.nextLine();
    int count = 0;

    while (scan.hasNextLine()) {

        if (reader.contains("NAME") && count < 1) {
            reader = reader.substring(1).replace("/", "");
            count++;
            System.out.println(reader);
            name = reader.replace("NAME", "");
        }

        if (reader.startsWith("0")) {
            person = reader.trim().substring(2, 7).replace("@", "")
                    .replace("I", "").trim().toLowerCase();
            System.out.print(person);
            count = 0;
        }

        if (reader.contains("BIRT")) {
            scan.nextLine();
            birthDay = Integerreader.substring(6, 9).trim();
        }

        if (reader.equalsIgnoreCase("") || reader.equalsIgnoreCase(" ")) {
            outFile.write("name(" + person + ", " + "'" + name.trim() + "'"
                    + ")." + "\n" + birthDay);

        }

        reader = scan.nextLine();
    }
}

public static void main(String[] args) throws IOException {
    parse();

}

}

Without the if statement (contains "BIRT"), and "birthDay" not in the outFile.write() method, my output looks like this:

name(p1, 'Paul Edward Buis').
name(p2, 'Thomas Edward Buis').
name(p3, 'Jennifer Joy Buis').
name(p4, 'Daniel Paul Buis').
name(p5, 'Barbara Joy VanderWall').
name(p6, 'Harry Buis').

which is a good start.

But when I have that if statement, I get an error like this, and nothing prints:

p1Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 9
    at java.lang.String.substring(Unknown Source)
    at Main.parse(Main.java:50)
    at Main.main(Main.java:64)

Now, I have tried every combination of substring index values, and nothing seems to work. Any idea how I can fix this?

Thanks in advance.

There are 2 answers

PearsonArtPhoto On

I suggest you use a Date object. Dates can be sorted more easily than separate year/month/day fields. If you really want, store them as milliseconds since the epoch.

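To parse the date, use a SimpleDateFormat. I believe something like this would work:

// imports: java.text.ParsePosition, java.text.SimpleDateFormat, java.util.Date, java.util.Locale
SimpleDateFormat dateFormat = new SimpleDateFormat("dd MMM yyyy", Locale.ENGLISH);
Date birth = dateFormat.parse("17 Jul 1984", new ParsePosition(0));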

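Once you get it into the Date format, you can do a lot of neat things, like these:

Date date1 = dateFormat.parse("17 Nov 1924");
Date date2 = dateFormat.parse("21 Aug 2001");
boolean isAfter = date1.after(date2);   // false, date1 is earlier
int order = date1.compareTo(date2);     // negative, so date1 sorts first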

You could even get the minutes or seconds, but I don't recommend that. Note that the 0 is the index in the string where parsing starts, so you can pass the index where the date begins in the line, and you're good. Overall, I think this is a lot cleaner.
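To tie this back to the question, here is a minimal sketch of parsing the "2 DATE ..." line with a start index and then pulling out integer year/month/day values. It assumes the date always follows the literal prefix "2 DATE " and is in dd MMM yyyy form; the class name GedcomDateLine and the method toDateTerm are made up for illustration. As far as I can tell from the posted code, the exception comes from discarding the result of scan.nextLine(), so substring(6, 9) still runs on the short "1 BIRT" line rather than on the DATE line.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class, not part of the original program.
class GedcomDateLine {

    private static final String PREFIX = "2 DATE ";
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    /** Turns a line such as "2 DATE 17 Nov 1924" into "date(1924,11,17)", or null if it does not parse. */
    static String toDateTerm(String line) {
        if (!line.startsWith(PREFIX)) {
            return null; // not a level-2 DATE line, e.g. "1 BIRT"
        }
        SimpleDateFormat format = new SimpleDateFormat("dd MMM yyyy", Locale.ENGLISH);
        format.setTimeZone(UTC);
        format.setLenient(false);
        // Start parsing right after the prefix instead of cutting the string by hand.
        Date parsed = format.parse(line, new ParsePosition(PREFIX.length()));
        if (parsed == null) {
            return null;
        }
        Calendar cal = new GregorianCalendar(UTC);
        cal.setTime(parsed);
        int year = cal.get(Calendar.YEAR);
        int month = cal.get(Calendar.MONTH) + 1; // Calendar months are zero-based
        int day = cal.get(Calendar.DAY_OF_MONTH);
        return "date(" + year + "," + month + "," + day + ")";
    }

    public static void main(String[] args) {
        System.out.println(toDateTerm("2 DATE 17 Nov 1924")); // date(1924,11,17)
        System.out.println(toDateTerm("2 DATE 21 Aug 2001")); // date(2001,8,21)
    }
}
```

In the question's loop, you would keep the line returned by scan.nextLine() after seeing BIRT or DEAT and hand that line to a method like this, rather than calling substring on the current line.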

Frizbog On

Date parsing from GEDCOM files is tricky. You can use a SimpleDateFormat for any dates that are in dd MMM yyyy format (like 26 SEP 2015), but GEDCOM supports a lot of weird variations, including imprecise dates where you only have the month and year, or just the year. It also allows prefixes like "ABT" to indicate that something occurred around a specific date, allows for ranges ("BET date1 AND date2") and periods ("FROM date1 TO date2"), and a lot of other complex behavior (French Republican or Hebrew calendars, anyone?).
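If you only need to cope with the simpler variations, a fallback chain of patterns is one way to do it. The sketch below is an illustration under stated assumptions, not a full GEDCOM date parser: it handles only full dates, month-and-year, year-only, and an optional "ABT" prefix, and it returns null for everything else (ranges, periods, other calendars). The class name GedcomDateFallback, the method parseYmd, and the convention of using 0 for a missing month or day are all made up for this example.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class covering a small subset of GEDCOM date forms.
class GedcomDateFallback {

    // Tried in order, from most to least specific.
    private static final String[] PATTERNS = {"dd MMM yyyy", "MMM yyyy", "yyyy"};
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    /** Returns {year, month, day}, with 0 for parts the value does not state, or null if unsupported. */
    static int[] parseYmd(String value) {
        String text = value.trim();
        if (text.toUpperCase(Locale.ROOT).startsWith("ABT ")) {
            text = text.substring(4).trim(); // drop the "about" qualifier
        }
        for (int i = 0; i < PATTERNS.length; i++) {
            SimpleDateFormat format = new SimpleDateFormat(PATTERNS[i], Locale.ENGLISH);
            format.setTimeZone(UTC);
            format.setLenient(false);
            ParsePosition pos = new ParsePosition(0);
            Date parsed = format.parse(text, pos);
            // Accept a pattern only if it consumed the whole value.
            if (parsed == null || pos.getIndex() != text.length()) {
                continue;
            }
            Calendar cal = new GregorianCalendar(UTC);
            cal.setTime(parsed);
            int year = cal.get(Calendar.YEAR);
            int month = (i <= 1) ? cal.get(Calendar.MONTH) + 1 : 0;
            int day = (i == 0) ? cal.get(Calendar.DAY_OF_MONTH) : 0;
            return new int[] {year, month, day};
        }
        return null; // ranges, periods, other calendars, and so on
    }

    public static void main(String[] args) {
        for (String sample : new String[] {"26 SEP 2015", "SEP 2015", "ABT 1900", "BET 1900 AND 1910"}) {
            int[] ymd = parseYmd(sample);
            System.out.println(sample + " -> " + (ymd == null ? "unsupported" : ymd[0] + "," + ymd[1] + "," + ymd[2]));
        }
    }
}
```

Checking that the pattern consumed the entire string matters here, because SimpleDateFormat is otherwise happy to parse just a prefix of the text.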

I would recommend using gedcom4j (http://gedcom4j.org), a Java library you can link into your program to load your data into Java objects and then do what you need. The DateParser class in that library can interpret your string values and turn them into java.util.Date values, so you can do what you're describing.