I have an AWS ElastiCache Redis cluster with 2 nodes. I am using the redis-py library for Python (version 5.0.1) and see the following error in a PySpark app:
  File "/usr/local/lib/python3.9/dist-packages/redis/commands/core.py", line 4946, in hget
    return self.execute_command("HGET", name, key)
  File "/usr/local/lib/python3.9/dist-packages/redis/client.py", line 536, in execute_command
    return conn.retry.call_with_retry(
  File "/usr/local/lib/python3.9/dist-packages/redis/retry.py", line 46, in call_with_retry
    return do()
  File "/usr/local/lib/python3.9/dist-packages/redis/client.py", line 537, in <lambda>
    lambda: self._send_command_parse_response(
  File "/usr/local/lib/python3.9/dist-packages/redis/client.py", line 513, in _send_command_parse_response
    return self.parse_response(conn, command_name, **options)
  File "/usr/local/lib/python3.9/dist-packages/redis/client.py", line 553, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python3.9/dist-packages/redis/connection.py", line 524, in read_response
    raise response
redis.exceptions.ResponseError: MOVED 393 redis-replication-0001-001.xxx.amazonaws.com:6379
Code that gives the MOVED error:
import redis
class redis_conn_pool:
    def __init__(self, host, password, username):
        self.host_ = host
        self.pd_ = password
        self.user_ = username
    def connect(self):
        pool = redis.ConnectionPool(host=self.host_,
                                    port=6379,
                                    password=self.pd_,
                                    username=self.user_,
                                    connection_class=redis.SSLConnection,
                                    decode_responses=True,
                                    )
        conn = redis.RedisCluster(connection_pool=pool, host=self.host_, reinitialize_steps=1)
        self.conn = conn
I tried different values for reinitialize_steps, but the error persists.
The following code works:
import redis
class redis_conn:
    def __init__(self, host, password, username):
        self.host_ = host
        self.port_ = 6379
        self.pd_ = password
        self.user_ = username
    def connect(self):
        conn = redis.RedisCluster(host=self.host_,
                                  port=self.port_,
                                  password=self.pd_,
                                  username=self.user_,
                                  ssl=True,
                                  decode_responses=True,
                                  skip_full_coverage_check=True)
        self.conn = conn
Any ideas why it doesn't work with ConnectionPool?
A Redis cluster shards your data over all nodes in your cluster.
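To illustrate the sharding, here is a minimal sketch of how a key maps to a hash slot, assuming the standard Redis Cluster rule (CRC16 of the key modulo 16384, with hash tags in braces). The number in `MOVED 393 ...` is such a slot. The helper name `key_slot` is hypothetical and only for illustration; it is not redis-py's public API.

```python
from binascii import crc_hqx

HASH_SLOTS = 16384  # number of hash slots in a Redis Cluster


def key_slot(key: str) -> int:
    """Return the cluster hash slot for a key (illustrative helper)."""
    data = key.encode("utf-8")
    # Hash tags: if the key contains a non-empty {...} section,
    # only the text between the first '{' and the next '}' is hashed.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # crc_hqx with an initial value of 0 is the CRC16 variant Redis uses.
    return crc_hqx(data, 0) % HASH_SLOTS


if __name__ == "__main__":
    print(key_slot("foo"))
    # Keys sharing a hash tag land in the same slot, hence on the same node.
    print(key_slot("{user1000}.following") == key_slot("{user1000}.followers"))
```

Each node owns a range of these slots, so a client that only talks to one node will get a MOVED reply for every key whose slot belongs to another node.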
ConnectionPool is not cluster-aware and only connects to a single node. When the requested key resides on another node, that node answers with a MOVED redirection that points to the correct node, which is the error in your traceback. If you provide a ConnectionPool, I suspect that RedisCluster only uses that one node even though there are other nodes in the cluster.
RedisCluster creates a cluster-aware client and performs the redirection behind the scenes, and can therefore return data from all nodes in the cluster. If you create a RedisCluster without a ConnectionPool, it automatically creates the connection pools it needs.
To avoid establishing new connections, keep the client alive in your code. Here is a quote on how pooling works internally in redis-py: