I am currently working on a project that analyzes the quality of examination paper questions. I am using Python 3.4 with NLTK.
First, I want to extract each question separately from the text. The question paper format is given below.
(Q1). What is web 3.0?
(Q2). Explain about blogs.
(Q3). What is mean by semantic web?
and so on ........
Now I want to extract the questions one by one, without the question number (the question number format is always the same as shown above). My result should look something like this:
What is web 3.0?
Explain about blogs.
What is mean by semantic web?
So how can I tackle this problem with Python 3.4 and NLTK?
Thank you
You'll probably need to detect lines containing a question, then extract the question and drop the question number. The regexp for detecting a question label is
You can use it to pull out the questions like this:
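A minimal sketch of this approach, assuming every label looks like `(Q1).` at the start of a line. The pattern `LABEL` and the helper name `extract_questions` are illustrative assumptions, not the original snippet, and only the standard `re` module is used (NLTK is not needed for this step):

```python
import re

# Assumed label format: "(Q<number>)." at the start of a line,
# optionally preceded by whitespace and followed by spaces.
LABEL = re.compile(r"^\s*\(Q\d+\)\.\s*")


def extract_questions(text):
    """Return the questions found in `text` with their labels removed.

    `text` is any iterable of lines: a list of strings or an open file.
    Lines that do not start with a question label are skipped.
    """
    questions = []
    for line in text:
        if LABEL.match(line):
            # Drop the label, then trim the trailing newline and spaces.
            questions.append(LABEL.sub("", line).strip())
    return questions


# Sample input taken from the question.
text = [
    "(Q1). What is web 3.0?",
    "(Q2). Explain about blogs.",
    "(Q3). What is mean by semantic web?",
]

for question in extract_questions(text):
    print(question)
```

If a question can span several lines, this sketch keeps only the first line of each; handling continuation lines would need extra logic.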
Obviously, `text` must be a list of lines or a file open for reading.
But if you had no idea how to approach this, you have your work cut out for you with the rest of the assignment. I recommend spending some time on the Python tutorial or other introductory materials.