Understanding some basics of Spark SQL

Question

2k views Asked by lars At 06 January 2017 at 21:13

I'm following http://spark.apache.org/docs/latest/sql-programming-guide.html

After typing:

val df = spark.read.json("examples/src/main/resources/people.json")

// Displays the content of the DataFrame to stdout
df.show()
// +----+-------+
// | age|   name|
// +----+-------+
// |null|Michael|
// |  30|   Andy|
// |  19| Justin|
// +----+-------+

I have some questions that I didn't see the answers to.

First, what is the $-notation? As in

 df.select($"name", $"age" + 1).show()

Second, can I get the data from just the second row, without knowing in advance what that row contains?

Third, how would you read in a color image with Spark SQL?

Fourth, I'm still not sure what the difference is between a Dataset and a DataFrame in Spark. The variable df is a DataFrame, so could I change "Michael" to the integer 5? Could I do that in a Dataset?

Original Q&A

There are 2 answers

Vishnu Subramanian On 07 January 2017 at 02:46

1) For question 1, the $ sign is a shortcut for selecting a column and applying functions to it. For example:

df.select($"id".isNull).show

which can otherwise be written as follows (after import org.apache.spark.sql.functions.col):

df.select(col("id").isNull).show

2) Spark does not have indexing, but for prototyping you can use df.take(10)(i), where i is the zero-based index of the row you want (so i must be less than 10 here). Note: the result may differ between runs, because the underlying data is partitioned and row order is not guaranteed.
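
Both points can be tried out with a minimal sketch like the one below. It assumes a local SparkSession and a small in-memory DataFrame standing in for people.json; the object name, the app name, and the rowAt helper are illustrative assumptions and not part of Spark's API.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.col

object ColumnAndRowSketch {
  // Hypothetical helper: the i-th row (0-based) among the first n rows, if there is one.
  // take(n) collects at most n rows to the driver, so this is for prototyping only.
  def rowAt(df: DataFrame, i: Int, n: Int = 10): Option[Row] = {
    val rows = df.take(n)
    if (i >= 0 && i < rows.length) Some(rows(i)) else None
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for experimentation only.
    val spark = SparkSession.builder()
      .appName("column-and-row-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // brings the $"..." syntax and toDF into scope

    // Assumption: in-memory stand-in for examples/src/main/resources/people.json.
    val df = Seq(
      (None: Option[Long], "Michael"),
      (Some(30L), "Andy"),
      (Some(19L), "Justin")
    ).toDF("age", "name")

    // 1) $"age" and col("age") both produce a Column expression.
    df.select($"name", $"age" + 1).show()
    df.select(col("name"), col("age") + 1).show()

    // 2) Positional access to a row after collecting a small prefix of the data.
    rowAt(df, 1).foreach(println)

    spark.stop()
  }
}
```

Returning an Option avoids an ArrayIndexOutOfBoundsException when the DataFrame has fewer rows than expected. For anything beyond prototyping, an explicit orderBy before take would make the notion of "second row" well defined.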

user7337271 On 06 January 2017 at 21:24 BEST ANSWER

1) $ is not an annotation. It is a method call (a shortcut for new ColumnName("name")).
2) You wouldn't. Spark SQL has no notion of row indexing.
3) You wouldn't. You can use the low-level RDD API with specific input formats (such as those from the HIPI project) and then convert the result to a DataFrame.
4) See the related question "Difference between DataSet API and DataFrame".
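
The description of $ as a method call can be illustrated without Spark at all. The sketch below is a simplified model of the mechanism that import spark.implicits._ brings into scope: an implicit class that adds a method named $ to StringContext. The ColumnName here is a hypothetical stand-in that only stores the name; it is not Spark's class.

```scala
object DollarSketch {
  // Hypothetical stand-in for Spark's ColumnName; it only records the column name.
  final case class ColumnName(name: String)

  // Adding a method named `$` to StringContext makes $"..." valid syntax:
  // the compiler rewrites $"name" to StringContext("name").$().
  implicit class StringToColumn(val sc: StringContext) extends AnyVal {
    def $(args: Any*): ColumnName = ColumnName(sc.s(args: _*))
  }

  def main(args: Array[String]): Unit = {
    val viaInterpolator = $"name"
    val viaConstructor = ColumnName("name")
    println(viaInterpolator == viaConstructor) // prints "true"
  }
}
```

This is why $"name" only compiles after the implicits are imported: without them, StringContext has no $ method.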
