I am trying to write a script that takes a folder containing any number of .odt files and computes the number of words in each one. It has to write the counts to a CSV file, together with the date.
I tried to do it using odfpy.
```python
import glob

import pandas as pd
from odf.opendocument import load as load_odt

filenames = []
word_counts = []
for f in glob.glob('*.odt'):
    print(f)
    doc = load_odt(f)
    n = 0  # reset for every file, even one whose body has no child nodes
    if doc.text.hasChildNodes():
        # Only the top-level children of the document body are inspected.
        for e in doc.text.childNodes:
            if ":text:" in e.qname[0]:  # element is in the ODF text namespace
                words = [w for w in str(e).split(" ") if len(w) > 0]
                n += len(words)
            else:
                print(e.qname[0])  # namespace of an element that is not counted, e.g. a table
    filenames.append(f)
    word_counts.append(n)

df = pd.DataFrame({'date': [pd.Timestamp.now() for i in range(len(filenames))],
                   'filename': filenames,
                   'word_count': word_counts})
print(df)
csv_filename = 'word_count.csv'
df.to_csv(csv_filename, index=False)
```
It somehow works, but some files are missing from the CSV. Any ideas?
It looks like this works:
It's not exactly the same word count as LibreOffice gives, but it will be enough.
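For comparison, here is a minimal sketch that does not use odfpy at all. It relies on the fact that an `.odt` file is a ZIP archive whose body text lives in `content.xml`, and parses that file with the standard library. This is a swapped-in technique, not the code referred to above; `count_words`, `main` and `BREAKS` are hypothetical names. It walks all text under `office:text` (so tables and footnotes are counted too, unlike the top-level `childNodes` loop in the question), splits on any whitespace, and prints files it cannot read instead of silently leaving them out of the CSV. The result is still only an approximation of LibreOffice's count.

```python
import csv
import glob
import zipfile
import xml.etree.ElementTree as ET
from datetime import datetime

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
# Hypothetical helper set: elements that end a word (paragraphs, headings,
# and the ODF space / tab / line-break markers).
BREAKS = {f"{{{TEXT_NS}}}{tag}" for tag in ("p", "h", "s", "tab", "line-break")}


def _collect(elem, parts):
    """Append the text under `elem` to `parts`, adding a space after BREAKS elements."""
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        _collect(child, parts)
        if child.tag in BREAKS:
            parts.append(" ")
        if child.tail:
            parts.append(child.tail)


def count_words(path):
    """Approximate word count of an .odt body (headers/footers live in styles.xml and are skipped)."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    body = root.find(f"{{{OFFICE_NS}}}body/{{{OFFICE_NS}}}text")
    if body is None:
        return 0
    parts = []
    _collect(body, parts)
    # str.split() with no argument splits on any run of spaces, tabs or newlines.
    return len("".join(parts).split())


def main(pattern="*.odt", csv_filename="word_count.csv"):
    """Count words in every file matching `pattern` and write date, filename, count rows."""
    now = datetime.now().isoformat(timespec="seconds")
    rows = []
    for path in sorted(glob.glob(pattern)):
        try:
            rows.append((now, path, count_words(path)))
        except (zipfile.BadZipFile, KeyError, ET.ParseError) as exc:
            # Report unreadable files instead of dropping them silently.
            print(f"skipping {path}: {exc}")
    with open(csv_filename, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["date", "filename", "word_count"])
        writer.writerows(rows)
    return rows
```

Calling `main()` from the folder with the files writes `word_count.csv`. When rows go missing, it is also worth checking the file pattern: `glob.glob('*.odt')` only looks in the current directory and, on case-sensitive file systems, does not match names ending in `.ODT`. Splitting with `str.split()` instead of `split(" ")` also treats tabs and newlines as separators, although LibreOffice applies its own word-boundary rules, so some difference can remain.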