I have a CSV-like input data file containing many records. Each record can span any number of lines (1, 2, 5, or more). The one certainty is that each record has 24 fields, separated by "::". Every record starts on a new line, but not every new line starts a new record.
The default record reader fails for this data because it treats every line as a separate record.
**How do I handle input splits? For a record of 3 lines, it may happen that 1 line is in one block and the other 2 are in the next block.
How should I distinguish between records before they are passed as input to the map method?**
I believe the solution involves a custom `InputFormat` and `RecordReader`. Any suggestions or help would be much appreciated.
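
To make the boundary rule concrete, here is a minimal sketch in plain Java (no Hadoop classes) of how lines could be grouped into records by counting delimiters. It is only an illustration under some assumptions: the names `MultiLineRecordAssembler`, `assemble`, and `countDelimiters` are hypothetical, no field value ever contains "::" itself, and the last field never spans lines. It accumulates lines until 23 delimiters (that is, 24 fields) have been seen. It says nothing about split boundaries, which is the part I am unsure about.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups physical lines into logical 24-field records. */
class MultiLineRecordAssembler {
    static final String DELIMITER = "::";
    static final int FIELDS_PER_RECORD = 24;

    /** Counts non-overlapping occurrences of DELIMITER in one line. */
    static int countDelimiters(String line) {
        int count = 0;
        int from = 0;
        int idx;
        while ((idx = line.indexOf(DELIMITER, from)) >= 0) {
            count++;
            from = idx + DELIMITER.length();
        }
        return count;
    }

    /**
     * Joins consecutive lines (with '\n') until a record holds
     * FIELDS_PER_RECORD - 1 delimiters, then emits it.
     * A trailing incomplete record is silently dropped.
     */
    static List<String> assemble(Iterable<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int linesInRecord = 0;
        int delimitersSeen = 0;
        for (String line : lines) {
            if (linesInRecord > 0) {
                current.append('\n'); // keep the original line break inside the field
            }
            current.append(line);
            linesInRecord++;
            delimitersSeen += countDelimiters(line);
            if (delimitersSeen >= FIELDS_PER_RECORD - 1) {
                records.add(current.toString());
                current.setLength(0);
                linesInRecord = 0;
                delimitersSeen = 0;
            }
        }
        return records;
    }
}
```

Calling `MultiLineRecordAssembler.assemble(lines)` on the lines of the file would return one string per logical record. Note that the header line also has 24 fields, so it would come back as the first "record" and would need to be skipped by the caller.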
Here is sample data:
review_id::text::business_id::full_address::schools::longitude::average_stars::date::user_id::open::categories::photo_url::city::review_count::name::neighborhoods::url::votes.cool::votes.funny::state::stars::latitude::type::votes.useful
NaN::NaN::NaN::NaN::NaN::NaN::3.5::NaN::cxInT2YC-tuyGwKpEKAuEw::NaN::NaN::NaN::NaN::8::Jane A.::NaN::http://www.yelp.com/user_details?userid=cxInT2YC-tuyGwKpEKAuEw::2::1::NaN::NaN::NaN::user::5
NaN::NaN::NaN::NaN::NaN::NaN::3.0::NaN::OfAuGRtKoUmwEujBoD1mfw::NaN::NaN::NaN::NaN::4::Amy B.::NaN::http://www.yelp.com/user_details?userid=OfAuGRtKoUmwEujBoD1mfw::1::1::NaN::NaN::NaN::user::6
fu7TcxnAOdnbdLcyFhMmZg::Pretty great! Okay, so this place is obviously not Vegan since they have a bunch of cheese and egg offerings, BUT I see that they do offer plenty of vegan alternatives.
I was sort of skeptical being here because the prices were pretty hefty, I felt.
Anyway, their homemade hot sauce is AMAZING. I got the eggs benedict for dinner and J got an omelet. Both were really good. I do love their homefries.. but the next time I come here, I want onion rings or fries. Those onion rings looked amazing.
Lastly, the food came relatively quickly.
Not a fan of the service. They tried to seat us at this edge facing the stoves, without asking, so I asked for a booth. Then at the booth, the server didn't refill waters very well but didn't feel bad emphasizing over and over whether or not we wanted their $5-7 desserts. Honestly, a slice of pie for $6.50? Veggie Galaxy, you are t r i p p i n !
But great food! (especially breaky!)::qw5gR8vW7mSOK4VROSwdMA::NaN::NaN::NaN::NaN::2011-11-12::Z_WAxc4RUpKp3y12BH1bEg::NaN::NaN::NaN::NaN::NaN::NaN::NaN::NaN::0::1::NaN::4::NaN::review::0
85TbS2RT5f6kqZ5l7_jfRw::Great place!
I have to say the menu and the outdoor seating keep us coming back. The food is good -- had breakfast both times but some friends had lunch items. Definitely a great selection. We've been at off-peak times so no waiting and better service.
All in all, it's no DZ Akins but it's definitely worth trying!::-tphABJRkegXV4Fr1ke4FQ::NaN::NaN::NaN::NaN::2010-09-19::1IzWxAfxuHTnzKOupUOB5Q::NaN::NaN::NaN::NaN::NaN::NaN::NaN::NaN::0::0::NaN::4::NaN::review::0