I have an .xdf file on an HDFS cluster which is around 10 GB and has nearly 70 columns. I want to read it into an R object so that I can perform some transformation and manipulation. I tried Googling it and came across two functions:
rxReadXdf
rxXdfToDataFrame
Could anyone tell me which function is preferred, given that I want to read the data and perform the transformations in parallel on each node of the cluster?
Also, if I read and transform the data in chunks, do I have to merge the output of each chunk?
Thanks for your help in advance.
Cheers, Amit
Note that `rxReadXdf` and `rxXdfToDataFrame` have different arguments and do slightly different things:

- `rxReadXdf` has a `numRows` argument, so use this if you want to read the top 1000 (say) rows of the dataset
- `rxXdfToDataFrame` supports `rxTransforms`, so use this if you want to manipulate your data in addition to reading it
- `rxXdfToDataFrame` also has the `maxRowsByCols` argument, which is another way of capping the size of the input

So in your case, you want to use `rxXdfToDataFrame` since you're transforming the data in addition to reading it. `rxReadXdf` is a bit faster in the local compute context if you just want to read the data (no transforms). This is probably also true for HDFS, but I haven't checked this.

However, are you sure that you want to read the data into a data frame? You can use `rxDataStep` to run (almost) arbitrary R code on an xdf file, while still leaving your data in that format. See the linked documentation page for how to use the transforms arguments.
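To make the options above concrete, here is a minimal sketch of all three calls. It assumes the RevoScaleR package is available; the file paths and the column name `x` used in the transforms are placeholders, not from your dataset:

```r
library(RevoScaleR)

# Placeholder data source; substitute your actual xdf path
xdf <- RxXdfData("/path/to/mydata.xdf")

# 1. Read only the first 1000 rows, no transforms: rxReadXdf
head1k <- rxReadXdf(xdf, numRows = 1000)

# 2. Read into a data frame, applying a transform on the way in,
#    and cap the input size: rxXdfToDataFrame
df <- rxXdfToDataFrame(xdf,
                       transforms = list(logX = log(x)),  # 'x' is a placeholder column
                       maxRowsByCols = 1e8)

# 3. Transform while keeping the data in xdf format: rxDataStep
rxDataStep(inData = xdf,
           outFile = RxXdfData("/path/to/mydata_transformed.xdf"),
           transforms = list(logX = log(x)),
           overwrite = TRUE)
```

With `rxDataStep`, the chunking and merging are handled for you: each chunk is transformed and written to the output xdf, so there is no separate merge step to perform yourself.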