Background: I want to read some data from a text file into a polars dataframe. The data starts at the line containing the string foo and stops at the first empty line after it. Example file test.txt:
stuff to skip
more stuff to skip
skip me too
foo bar foobar
1 2 A
4 5 B
7 8 C
other stuff
stuff
pl.read_csv has args skip_rows and n_rows. Thus, if I can find the line number of foo and the line number of the first empty line afterwards, I should be able to read the data into a polars dataframe. How can I do that? I'm able to find skip_rows:
from pathlib import Path

file_path = Path('test.txt')
with open(file_path, 'r') as file:
    skip_rows = 0
    n_rows = 0
    for line_number, line in enumerate(file, 1):
        if 'foo' in line:
            skip_rows = line_number - 1
But how can I also find n_rows without scanning the file again? The solution must also handle the case where no line contains foo, e.g.
stuff to skip
more stuff to skip
skip me too
1 2 A
4 5 B
7 8 C
other stuff
stuff
In that case, I would like to either return a value indicating that foo was not found, or raise an exception so that the caller knows something went wrong (maybe a ValueError exception?).
EDIT: I forgot an edge case. Sometimes the data may continue until the end of the file:
stuff to skip
more stuff to skip
skip me too
foo bar foobar
1 2 A
4 5 B
7 8 C
You can try:
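A minimal sketch of a single-pass scan, assuming that an "empty line" is one containing only whitespace and that the columns are separated by single spaces. The names find_data_range, read_block and marker are hypothetical, chosen for illustration. The scan records the index of the first line containing the marker, then counts lines until the first empty line or the end of the file, and raises ValueError if the marker never appears:

```python
def find_data_range(path, marker="foo"):
    """Return (skip_rows, n_rows) for the block that starts at the first
    line containing `marker` and ends before the next empty line or EOF."""
    skip_rows = None  # stays None until the marker line is seen
    n_rows = 0
    with open(path, "r") as file:
        for line_number, line in enumerate(file):
            if skip_rows is None:
                if marker in line:
                    # number of lines that precede the header line
                    skip_rows = line_number
            elif not line.strip():
                # first empty line after the header ends the data
                break
            else:
                n_rows += 1
    if skip_rows is None:
        raise ValueError(f"no line containing {marker!r} in {path}")
    return skip_rows, n_rows


def read_block(path, marker="foo"):
    # polars is imported here so the scan above has no third-party dependency
    import polars as pl

    skip_rows, n_rows = find_data_range(path, marker)
    # the marker line is used as the header; n_rows counts the rows below it
    return pl.read_csv(path, skip_rows=skip_rows, n_rows=n_rows, separator=" ")
```

If test.txt has an empty line right after 7 8 C, find_data_range('test.txt') returns (3, 3). The same result is returned when the data runs to the end of the file, because the loop then simply ends without hitting the break.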
Prints (with the first input from your question):