How to process a multiline CSV input file in Hadoop MapReduce?


I have a CSV input file that contains several records. Each record spans an arbitrary number of lines (1, 2, 5, or more). The one thing that is certain is that each record has 24 fields, separated by "::". Every record starts on a new line, but not every new line starts a new record.

The default record reader fails on this input because not every new line starts a new record.

**How do I handle input splits? For a 3-line record, 1 line could end up in one block and the other 2 lines in the next block.**

**How should I distinguish between records before they are passed as input to the map method?**

I believe it has something to do with the InputFormat and the RecordReader. Any suggestions or help would be much appreciated.

Here is sample data:

review_id::text::business_id::full_address::schools::longitude::average_stars::date::user_id::open::categories::photo_url::city::review_count::name::neighborhoods::url::votes.cool::votes.funny::state::stars::latitude::type::votes.useful

NaN::NaN::NaN::NaN::NaN::NaN::3.5::NaN::cxInT2YC-tuyGwKpEKAuEw::NaN::NaN::NaN::NaN::8::Jane A.::NaN::http://www.yelp.com/user_details?userid=cxInT2YC-tuyGwKpEKAuEw::2::1::NaN::NaN::NaN::user::5

NaN::NaN::NaN::NaN::NaN::NaN::3.0::NaN::OfAuGRtKoUmwEujBoD1mfw::NaN::NaN::NaN::NaN::4::Amy B.::NaN::http://www.yelp.com/user_details?userid=OfAuGRtKoUmwEujBoD1mfw::1::1::NaN::NaN::NaN::user::6

fu7TcxnAOdnbdLcyFhMmZg::Pretty great! Okay, so this place is obviously not Vegan since they have a bunch of cheese and egg offerings, BUT I see that they do offer plenty of vegan alternatives.

I was sort of skeptical being here because the prices were pretty hefty, I felt.
Anyway, their homemade hot sauce is AMAZING. I got the eggs benedict for dinner and J got an omelet. Both were really good. I do love their homefries.. but the next time I come here, I want onion rings or fries. Those onion rings looked amazing.

Lastly, the food came relatively quickly.

Not a fan of the service. They tried to seat us at this edge facing the stoves, without asking, so I asked for a booth. Then at the booth, the server didn't refill waters very well but didn't feel bad emphasizing over and over whether or not we wanted their $5-7 desserts. Honestly, a slice of pie for $6.50? Veggie Galaxy, you are t r i p p i n !

But great food! (especially breaky!)::qw5gR8vW7mSOK4VROSwdMA::NaN::NaN::NaN::NaN::2011-11-12::Z_WAxc4RUpKp3y12BH1bEg::NaN::NaN::NaN::NaN::NaN::NaN::NaN::NaN::0::1::NaN::4::NaN::review::0

85TbS2RT5f6kqZ5l7_jfRw::Great place!

I have to say the menu and the outdoor seating keep us coming back. The food is good -- had breakfast both times but some friends had lunch items. Definitely a great selection. We've been at off-peak times so no waiting and better service.

All in all, it's no DZ Akins but it's definitely worth trying!::-tphABJRkegXV4Fr1ke4FQ::NaN::NaN::NaN::NaN::2010-09-19::1IzWxAfxuHTnzKOupUOB5Q::NaN::NaN::NaN::NaN::NaN::NaN::NaN::NaN::0::0::NaN::4::NaN::review::0
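One way to approach the record-boundary part of the question, sketched here in plain Java without any Hadoop dependency: keep appending lines until 23 "::" delimiters (that is, 24 fields) have been seen, then emit the accumulated text as one record. The class and method names below are hypothetical. The sketch assumes that "::" never occurs inside a field, that the last field never contains a line break, and that blank lines between records can be skipped. In a custom RecordReader, the same loop could sit inside `nextKeyValue()`, reading lines from the split instead of from a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Hadoop: groups physical lines into logical records.
class MultilineRecordAssembler {
    static final String DELIMITER = "::";
    static final int FIELDS_PER_RECORD = 24; // field count stated in the question

    // Counts non-overlapping occurrences of DELIMITER in one line.
    static int countDelimiters(String line) {
        int count = 0;
        int from = 0;
        while ((from = line.indexOf(DELIMITER, from)) != -1) {
            count++;
            from += DELIMITER.length();
        }
        return count;
    }

    // Joins consecutive lines until FIELDS_PER_RECORD - 1 delimiters have been seen.
    // Assumption: a trailing incomplete record at the end of the input is dropped.
    static List<String> assemble(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int seen = 0;
        for (String line : lines) {
            // Assumption: blank lines between records carry no data.
            if (current.length() == 0 && line.isEmpty()) {
                continue;
            }
            if (current.length() > 0) {
                current.append('\n'); // keep line breaks that belong to a field
            }
            current.append(line);
            seen += countDelimiters(line);
            if (seen >= FIELDS_PER_RECORD - 1) {
                records.add(current.toString());
                current.setLength(0);
                seen = 0;
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String tail = String.join(DELIMITER, Collections.nCopies(22, "NaN"));
        List<String> lines = Arrays.asList(
                String.join(DELIMITER, Collections.nCopies(24, "NaN")), // single-line record
                "",
                "id1::Great place!",                                     // record start
                "",
                "Worth trying!::" + tail);                               // record end
        System.out.println(assemble(lines).size() + " records");
    }
}
```

This does not solve the split problem by itself: a reader that starts in the middle of a file cannot tell from the delimiter count alone whether its first line begins a record. A simple, if less parallel, option is to override `isSplitable` in the `FileInputFormat` subclass to return false, so that each file is read from its beginning by a single mapper.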
