I need to modify only the metadata of an OpenOffice file. How can I do it without loading the entire file (file.odt) into memory? I only need to work with the meta.xml file and its tag: ... metadata ...
I'm using Apache ODF Toolkit 0.5-incubating. My code loads the meta.xml file, but I cannot get the metadata:
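```java
import java.io.File;
import org.odftoolkit.odfdom.pkg.OdfPackage;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

OdfPackage pkg = OdfPackage.loadPackage(new File("file.odt"));
Node d = pkg.getDom("meta.xml").getElementsByTagName("office:document-meta").item(0);
// Lists only the attributes of the root element, not its child elements
NamedNodeMap attrs = d.getAttributes();
for (int i = 0; i < attrs.getLength(); i++) {
    String nombre = attrs.item(i).getNodeName();
    String valor = attrs.item(i).getNodeValue();
    System.out.println("Key: " + nombre + " value: " + valor);
}
```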
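The loop above only prints the attributes of the `office:document-meta` root element, which in a typical meta.xml are namespace declarations plus `office:version`. The metadata itself lives in the child elements of `office:meta` (for example `dc:title` or `meta:initial-creator`). As a minimal sketch that uses only the JDK instead of the ODF Toolkit, and assumes the standard ODF layout with `meta.xml` at the root of the package, you could open the file with `java.util.zip.ZipFile`, which reads the zip's central directory and streams just the one entry, then walk those children. The class `OdfMetaReader` and its method `readMeta` are hypothetical names.

```java
import java.io.File;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class OdfMetaReader {

    // Namespace bound to the "office:" prefix in ODF documents
    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";

    /** Reads only the meta.xml entry and returns the text of each child element of office:meta. */
    public static Map<String, String> readMeta(File odf) throws Exception {
        Map<String, String> result = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(odf)) {
            ZipEntry entry = zip.getEntry("meta.xml");
            if (entry == null) {
                return result; // this package has no metadata stream
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc;
            try (InputStream in = zip.getInputStream(entry)) {
                doc = builder.parse(in); // parses meta.xml only, not the whole document
            }
            NodeList metas = doc.getElementsByTagNameNS(OFFICE_NS, "meta");
            if (metas.getLength() == 0) {
                return result;
            }
            NodeList children = metas.item(0).getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    // Key is the qualified name, e.g. "dc:title". Repeated elements such as
                    // meta:keyword keep only the last value, and attribute-only elements
                    // such as meta:document-statistic map to an empty string.
                    result.put(child.getNodeName(), child.getTextContent());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (Map.Entry<String, String> e : readMeta(new File(args[0])).entrySet()) {
            System.out.println("Key: " + e.getKey() + " value: " + e.getValue());
        }
    }
}
```

Writing changes back is a separate problem: a zip entry cannot be edited in place, so updating meta.xml means writing a new archive that copies the other entries unchanged (keeping the `mimetype` entry first and uncompressed, as ODF requires).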
If you want to work with a range of file formats, Apache Tika is your best bet. Tika provides a common interface for extracting text and metadata from a large number of formats, and it hides the differences between those formats from you.
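On the command line, you can extract the metadata from a file with the Tika app's `--metadata` option, e.g. `java -jar tika-app.jar --metadata file.odt` (the actual jar name includes the version you downloaded). You'd get back a huge amount of metadata.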
From Java, you can get the same information with just a few lines of code, and the metadata ends up in the `Metadata` object.
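A minimal sketch of the Java side, assuming Tika's `AutoDetectParser` with `tika-core` and a Tika parsers module on the classpath; the class name `TikaMetadataDump` is hypothetical. Passing a SAX `DefaultHandler` as the content handler discards the extracted body text, so only the `Metadata` object is kept.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.xml.sax.helpers.DefaultHandler;

public class TikaMetadataDump {

    /** Detects the format, parses the stream and returns only the collected metadata. */
    public static Metadata extract(InputStream in) throws Exception {
        Metadata metadata = new Metadata();
        // DefaultHandler ignores all SAX events, i.e. the document text is thrown away
        new AutoDetectParser().parse(in, new DefaultHandler(), metadata, new ParseContext());
        return metadata;
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            Metadata metadata = extract(in);
            for (String name : metadata.names()) {
                System.out.println(name + ": " + metadata.get(name));
            }
        }
    }
}
```

Run it as `java TikaMetadataDump file.odt`. Keep in mind that Tika only extracts: it still parses the whole document, and it won't write modified metadata back into the file.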