Non-uniform distribution of tasks and data on PySpark executors


I am running an application on PySpark. Below is a snapshot of the executor distribution for this application. The tasks and data look non-uniformly distributed across executors. Can someone have a look and tell me where the problem is?

[screenshot: executor summary showing the uneven task/data distribution]

Description and my problem:

I am running my application on a huge amount of data, filtering and joining 3 datasets. After that, I cache the joined dataset and use it to generate and aggregate features for different time periods (i.e. the cached dataset is reused to generate features in a loop). Finally, I try to store these features in a Parquet file, and this Parquet write is taking too much time. A rough sketch of the pipeline is below.
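A minimal sketch of what I am doing; the table paths, column names, and time windows are placeholders, not my real ones:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("feature-generation").getOrCreate()

# Hypothetical inputs; filter and join 3 datasets.
a = spark.read.parquet("/data/a").filter(F.col("status") == "active")
b = spark.read.parquet("/data/b")
c = spark.read.parquet("/data/c")

joined = a.join(b, "id").join(c, "id").cache()
joined.count()  # materialize the cache once before the loop

# Generate features for different time periods in a loop.
feature_frames = []
for days in [7, 30, 90]:  # example time windows
    windowed = joined.filter(
        F.col("event_date") >= F.date_sub(F.current_date(), days)
    )
    feats = windowed.groupBy("id").agg(F.count("*").alias(f"events_last_{days}d"))
    feature_frames.append(feats)

# Combine the per-window features and write them out; this write is the slow part.
features = feature_frames[0]
for f_df in feature_frames[1:]:
    features = features.join(f_df, "id", "outer")

features.write.mode("overwrite").parquet("/data/features")
```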

Can anyone help me solve this? Let me know if you need further information.

There are 2 answers

Answer by Jack_The_Ripper:

My initial suggestion would be to use as few shuffle operations (such as joins) as possible. However, if you wish to persist with this approach, you can tune your Spark configuration in the following ways (a sketch of the corresponding settings follows the list):

  • Use Kryo Serializer
  • Compress data before sending over the network
  • Play around with your JVM garbage collection
  • Increase your shuffle memory
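One way (not the only way) to apply these suggestions when building the session; the specific values are assumptions to be tuned against your own workload:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tuned-app")
    # Kryo is usually faster and more compact than Java serialization
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    # Compress shuffle data before it goes over the network
    .config("spark.shuffle.compress", "true")
    .config("spark.shuffle.spill.compress", "true")
    # G1GC often behaves better than the default collector for large heaps
    .config("spark.executor.extraJavaOptions", "-XX:+UseG1GC")
    # Give execution/shuffle a larger share of executor memory
    # (on Spark 1.6+ this is controlled by spark.memory.fraction)
    .config("spark.memory.fraction", "0.7")
    .getOrCreate()
)
```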
Answer by Alper t. Turker:

As you stated (emphasis mine):

I am filtering and joining 3 datasets. After that, I am caching joined data set for generating and aggregating features

Both joins and, to a lesser extent, aggregations can result in a skewed distribution of data if the join keys or grouping columns are not uniformly distributed. This is a natural consequence of the required shuffles.

In the general case there is very little you can do about it. In specific cases it is possible to gain a little with broadcasting or salting (sketched below), but the skew doesn't look particularly severe in your case.
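A hedged sketch of both mitigations; the table paths and the "key" column are assumptions, and the salt factor N is something you would tune:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
large = spark.read.parquet("/data/large")   # skewed on "key"
small = spark.read.parquet("/data/small")

# 1) Broadcast join: skips shuffling the large side entirely,
#    provided the small side fits in executor memory.
broadcast_joined = large.join(F.broadcast(small), "key")

# 2) Salting: spread a hot key over N buckets so no single task receives it all.
N = 10
salted_large = large.withColumn("salt", (F.rand() * N).cast("int"))
salted_small = small.withColumn(
    "salt", F.explode(F.array([F.lit(i) for i in range(N)]))
)
salted_joined = salted_large.join(salted_small, ["key", "salt"]).drop("salt")
```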