Reading complex and uneven data from text using Java

Question

169 views Asked by Debanjan Banik At 02 December 2014 at 02:10

I have to read text from a file whose lines are uneven and somewhat complex. Basically, the entries are in this order:

Index . word / DOC_id : position1 position2 (....and so on), DOC_id : position1 position2 (....and so on),

So a word can appear in any number of documents, and any number of times within a document. As an example, I am copying a small section of the file; I cannot include words that occur too many times because of space constraints.

Example:

13137 . speeding / D85 : 5999  , 
13138 . spell / D53 : 1513  , 
13139 . spelling / D3 : 344 351  , 
13140 . spending / D71 : 398  , 
13141 . spiderman / D60 : 650 733 997 1023 1053 1133 1152 1169  , 
13142 . spiders / D75 : 704  , D91 : 19834  ,
(...and so on)

Could anyone please help me with this? Also, could I format the file in a better way? Since I generated this file myself, maybe I can reformat it and generate a better-formatted text file.

Thank You :)

There is 1 answer

**Vino** · Accepted Answer · 2014-12-02T02:27:12+00:00

Perhaps you should use the newline as the delimiter, with one entry per line. Here's what I mean:

13137 . speeding / D85 : 5999
13138 . spell / D53 : 1513 
13139 . spelling / D3 : 344 351
13140 . spending / D71 : 398
13141 . spiderman / D60 : 650 733 997 1023 1053 1133 1152 1169
13142 . spiders / D75 : 704 , D91 : 19834

In other words, a format of the following nature

Index . word / DOC_id : position1 position2 ... , DOC_id : position1 ...
Index . word / DOC_id : position1 position2 ... , DOC_id : position1 ...
Index . word / DOC_id : position1 position2 ... , DOC_id : position1 ...
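With one entry per line, reading the file reduces to a plain line loop. The following is only a minimal sketch of that idea: the class name `IndexLineReader` and the method `readEntries` are hypothetical, and it assumes that each entry may still end with the trailing "," seen in the question's sample, which it strips.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the class and method names are illustrative only.
class IndexLineReader {

    // Reads every non-blank line, trimming surrounding whitespace and the
    // trailing "," that the original file carries at the end of each entry.
    static List<String> readEntries(Reader source) {
        List<String> entries = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.endsWith(",")) {
                    line = line.substring(0, line.length() - 1).trim();
                }
                if (!line.isEmpty()) {
                    entries.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        // Two lines copied from the question's sample.
        String sample = "13139 . spelling / D3 : 344 351  , \n"
                      + "13142 . spiders / D75 : 704  , D91 : 19834  ,\n";
        for (String entry : readEntries(new StringReader(sample))) {
            System.out.println(entry);
        }
    }
}
```

For a real file, the same method could be given a reader from `java.nio.file.Files.newBufferedReader` instead of the `StringReader` used here.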

Edit

Now that you can retrieve one line at a time, push each line into a Scanner or StringTokenizer, or even use String.split, remembering that Scanner and StringTokenizer split on whitespace by default. Parse through each token, keeping track of the separators ".", "/", ":" and ",". You already know the format of each line and which separators are used; use that information and proceed.
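One possible way to act on this with String.split is sketched below. It is not the only approach, and the class `IndexEntry` with its fields and `parse` method is hypothetical. It assumes that the word itself contains no "/", that every DOC_id is followed by at least one position, and that a DOC_id appears at most once per line.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser: class, field, and method names are illustrative only.
class IndexEntry {
    final int index;
    final String word;
    // Document id -> positions, in the order they appear on the line.
    final Map<String, List<Integer>> postings = new LinkedHashMap<>();

    IndexEntry(int index, String word) {
        this.index = index;
        this.word = word;
    }

    // Parses one line of the form
    //   Index . word / DOC_id : pos pos ... , DOC_id : pos ...
    static IndexEntry parse(String line) {
        // Split "Index . word" from the postings at the first "/".
        String[] headAndBody = line.split("/", 2);
        // Split the index from the word at the first ".".
        String[] head = headAndBody[0].split("\\.", 2);
        IndexEntry entry = new IndexEntry(
                Integer.parseInt(head[0].trim()), head[1].trim());

        // Each comma-separated chunk is "DOC_id : pos pos ...".
        for (String chunk : headAndBody[1].split(",")) {
            chunk = chunk.trim();
            if (chunk.isEmpty()) {
                continue; // skips the blank piece left by a trailing ","
            }
            String[] docAndPositions = chunk.split(":", 2);
            List<Integer> positions = new ArrayList<>();
            for (String p : docAndPositions[1].trim().split("\\s+")) {
                positions.add(Integer.parseInt(p));
            }
            entry.postings.put(docAndPositions[0].trim(), positions);
        }
        return entry;
    }

    public static void main(String[] args) {
        IndexEntry e = parse("13142 . spiders / D75 : 704  , D91 : 19834  ,");
        // prints: 13142 spiders {D75=[704], D91=[19834]}
        System.out.println(e.index + " " + e.word + " " + e.postings);
    }
}
```

Splitting on the structural separators first ("/", then ".", ",", ":") and on whitespace last keeps each step simple, and the trailing "," from the original file is tolerated without special handling.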
