How to count words per chapter in txt-file using Python (and islice)?


As a research case I have a literary novel with three main characters, each of whom has their own chapters. That is: the first chapter is for character X (Aaron), the second for character Y (Sigerius), the third for character Z (Joni), the fourth for character X, the fifth for character Y, the sixth for character Z, and so on. I want to count the number of words in all the chapters dedicated to character X, character Y and character Z.

This is the Python code I am currently working on with regards to the chapters of one specific character (Aaron):

from itertools import islice

with open(textfile, 'rt', encoding='utf-8') as f:
    # Computes the total word count of the file
    text = f.read()
    words = text.split()
    wordCount = len(words)
    print ("The total word count is:", wordCount)


    # Aaron's chapters

    chapterAaron1 = islice(f, 0, 123)
    chapterAaron4 = islice(f, 223 ,326)
    chapterAaron6 = islice(f, 639, 772)
    chapterAaron10 = islice(f, 1125, 1249)
    chapterAaron12 = islice(f, 1370, 1455)
    chapterAaron15 = islice(f, 1657, 1717)
    chapterAaron19 = islice(f, 2088, 2138)
    chaptersAaron = (chapterAaron1, chapterAaron4, chapterAaron6,    chapterAaron10,  chapterAaron12, chapterAaron12, chapterAaron15, chapterAaron19)

    # Computes the total word count of Aaron's chapters (does not work)

    wordsAaron = chaptersAaron.split()
    wordCountAaron = len(wordsAaron)
    print ("The total word count of Aaron's chapters is:", wordCountAaron)

I have manually determined the lines of the txt file on which the different chapters (per character) begin and end. I use islice to split the txt file into specific chapters (contained between specific line numbers) in order to count the words between those line numbers (i.e. in the chapters). However, I can't find the right way to use islice for this purpose. I get this AttributeError: 'tuple' object has no attribute 'split'. What I want is to store all chapters of a specific character in one variable (e.g. chaptersAaron), so that I can do things with it, e.g. count the total number of words and search for occurrences of specific words in it.

  • Does anyone have a suggestion with regards to the correct usage of islice for my purposes? Alternative options to split the text into chapters are also very welcome.

There is 1 answer

Answer by Markus Dutschke:

The solution should be:

chaptersAaron = []
# Each islice consumes f, so rewind before every slice to keep the line
# numbers absolute (this also resets f after the earlier f.read()).
f.seek(0)
chapterAaron1 = [elem for elem in islice(f, 0, 123)]
chaptersAaron += chapterAaron1
f.seek(0)
chapterAaron4 = [elem for elem in islice(f, 223, 326)]
chaptersAaron += chapterAaron4
f.seek(0)
chapterAaron6 = [elem for elem in islice(f, 639, 772)]
chaptersAaron += chapterAaron6
f.seek(0)
chapterAaron10 = [elem for elem in islice(f, 1125, 1249)]
chaptersAaron += chapterAaron10
f.seek(0)
chapterAaron12 = [elem for elem in islice(f, 1370, 1455)]
chaptersAaron += chapterAaron12
f.seek(0)
chapterAaron15 = [elem for elem in islice(f, 1657, 1717)]
chaptersAaron += chapterAaron15
f.seek(0)
chapterAaron19 = [elem for elem in islice(f, 2088, 2138)]
chaptersAaron += chapterAaron19

The problem with your code example is that you mix iterators, lists and tuples: islice(f, 1125, 1249) is an iterator, chaptersAaron = (chapterAaron1, ...) is a tuple, and you want to use both as a list.
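To make the difference between these types concrete, here is a small sketch. It uses an in-memory io.StringIO with ten made-up lines as a stand-in for the opened novel, so the data and the variable names are assumptions for illustration only. It also shows a pitfall that is easy to miss: slicing the same file-like iterator twice counts from the current position, not from the top of the file.

```python
from io import StringIO
from itertools import islice

# Hypothetical stand-in for the opened text file: ten numbered lines.
f = StringIO("".join(f"line {i}\n" for i in range(10)))

first = islice(f, 0, 3)       # a lazy iterator; nothing has been read yet
first_lines = list(first)     # reading happens here: lines 0, 1 and 2

# A second slice on the same f starts where the first one stopped,
# so indices 0-2 now refer to "line 3" and "line 4".
second_lines = list(islice(f, 0, 2))

chapters = (first_lines, second_lines)    # a tuple holding two lists
has_split = hasattr(chapters, "split")    # split() is a str method only
print(first_lines, second_lines, has_split)
```

So a tuple of islice objects holds no text yet, and neither a tuple nor a list offers split(); only the individual strings do.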

The idea in my solution is to start with an empty list, chaptersAaron = []. Transform each iterator into a list with [elem for elem in islice(f, 0, 123)] and concatenate the lists with chaptersAaron += chapterAaron1.
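Note that the resulting list contains lines, not words, so calling split() on it would still fail. A minimal sketch of one way to get the word count, assuming the lines are first read into a list (for the real novel, something like lines = f.readlines() inside the with-block): the helper count_words, the tiny sample text and the ranges below are all hypothetical. Ranges are zero-based and end-exclusive, exactly as in islice.

```python
from itertools import islice

def count_words(lines, ranges):
    """Count whitespace-separated words in the given (start, stop) line ranges."""
    total = 0
    for start, stop in ranges:
        # islice over a list does not consume it, so every range is absolute.
        for line in islice(lines, start, stop):
            total += len(line.split())
    return total

# Tiny made-up text standing in for the lines of the novel.
lines = ["one two\n", "three\n", "four five six\n", "seven\n"]

# Hypothetical chapter ranges for one character.
ranges_aaron = [(0, 1), (2, 4)]

word_count_aaron = count_words(lines, ranges_aaron)
print("The total word count of Aaron's chapters is:", word_count_aaron)
```

With the real file, ranges_aaron would hold the pairs from the question, e.g. (0, 123), (223, 326) and so on, and one list of ranges per character keeps the three counts separate.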