I have created a very simple application in C# which reads an OpenDocument Spreadsheet file using DotNetZipLib and the XmlDocument
class. This has been relatively straightforward since formatting and styles are not relevant for my application.
The format includes several elements of interest to this question: <table:table-column>, <table:table-row>, <table:table-cell> and <table:covered-table-cell>. The number of column elements does not necessarily correspond to the actual number of columns within the spreadsheet, even when column repetition is considered. Likewise, each row element contains a differing number of cell elements.
I have taken into account the fact that, as stated in the OpenDocument specification, rows, columns and cells may be repeated. This is working well, since the data is being read into the correct cells of my data format.
With my current understanding of the specification, it seems that the only way to count the number of columns in the spreadsheet is to enumerate each row and count its cells. Whilst this is relatively easy, it would be convenient to know the column count before filling my data structure.
Is there a way to efficiently determine the number of columns in the spreadsheet without having to consider each row individually?
I have come to the realisation that to determine the total number of columns in an OpenDocument spreadsheet, you must first read each row whilst keeping a running count:
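A minimal sketch of that first pass, assuming the sheet's <table:table> element has already been loaded into an XmlDocument. The class and method names (OdsColumnCounter, CountColumns) are hypothetical, not part of any library. It also assumes that a trailing run of childless cells is only filler, so those cells are left out of the count, and that the table contains no nested sub-tables:

```csharp
using System;
using System.Xml;

public static class OdsColumnCounter
{
    // Namespace URI bound to the "table" prefix in OpenDocument content.xml.
    public const string TableNs = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";

    // Returns the width of the widest row in the given <table:table> element,
    // honouring table:number-columns-repeated on every cell.
    public static int CountColumns(XmlElement table)
    {
        int maxColumns = 0;

        // Finds rows at any depth, so rows inside row groups are included.
        foreach (XmlNode rowNode in table.GetElementsByTagName("table-row", TableNs))
        {
            int position = 0; // columns consumed so far in this row
            int lastUsed = 0; // position just after the last non-empty cell

            foreach (XmlNode child in rowNode.ChildNodes)
            {
                XmlElement cell = child as XmlElement;
                if (cell == null || cell.NamespaceURI != TableNs) continue;

                bool covered = cell.LocalName == "covered-table-cell";
                if (cell.LocalName != "table-cell" && !covered) continue;

                // A missing or invalid attribute means the cell occurs once.
                string text = cell.GetAttribute("number-columns-repeated", TableNs);
                int repeated = int.TryParse(text, out int n) && n > 0 ? n : 1;
                position += repeated;

                // Assumption: a cell with child nodes (such as <text:p>) holds data,
                // and a covered cell belongs to a merged region that is in use.
                if (cell.HasChildNodes || covered) lastUsed = position;
            }

            maxColumns = Math.Max(maxColumns, lastUsed);
        }

        return maxColumns;
    }
}
```

Calling OdsColumnCounter.CountColumns(tableElement) once per sheet gives the running maximum; in a real reader the same loop would also store each row's cells as it goes, so the file is only traversed once.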
After rows have been read, and the maximum length is known, add empty cells to each of the read rows:
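The padding step could look roughly like this. The row representation (a List<List<string>>, one inner list per row) and the names RowPadding and PadToWidest are assumptions made for illustration; substitute whatever cell type your own data structure uses:

```csharp
using System.Collections.Generic;

public static class RowPadding
{
    // Pads every row with empty cells until all rows are as long as the
    // longest one, and returns that common width.
    public static int PadToWidest(List<List<string>> rows)
    {
        // First pass: find the maximum row length.
        int width = 0;
        foreach (List<string> row in rows)
        {
            if (row.Count > width) width = row.Count;
        }

        // Second pass: append empty cells to the shorter rows.
        foreach (List<string> row in rows)
        {
            while (row.Count < width) row.Add(string.Empty);
        }

        return width;
    }
}
```

After the call, every row has the same number of cells, so the result can be copied into a rectangular structure such as a two-dimensional array.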