I have an AWS ElastiCache Redis cluster with 2 nodes, and I am using the redis-py library for Python (version 5.0.1). I am seeing the following error in a PySpark app:
```
  File "/usr/local/lib/python3.9/dist-packages/redis/commands/core.py", line 4946, in hget
    return self.execute_command("HGET", name, key)
  File "/usr/local/lib/python3.9/dist-packages/redis/client.py", line 536, in execute_command
    return conn.retry.call_with_retry(
  File "/usr/local/lib/python3.9/dist-packages/redis/retry.py", line 46, in call_with_retry
    return do()
  File "/usr/local/lib/python3.9/dist-packages/redis/client.py", line 537, in <lambda>
    lambda: self._send_command_parse_response(
  File "/usr/local/lib/python3.9/dist-packages/redis/client.py", line 513, in _send_command_parse_response
    return self.parse_response(conn, command_name, **options)
  File "/usr/local/lib/python3.9/dist-packages/redis/client.py", line 553, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python3.9/dist-packages/redis/connection.py", line 524, in read_response
    raise response
redis.exceptions.ResponseError: MOVED 393 redis-replication-0001-001.xxx.amazonaws.com:6379
```
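The `MOVED 393 redis-replication-0001-001.xxx.amazonaws.com:6379` reply means the requested key hashes to slot 393, and that slot is served by the node named in the message. Redis Cluster maps every key to one of 16384 slots by taking CRC16 of the key modulo 16384, hashing only the `{...}` hash-tag part when one is present. The sketch below illustrates that rule and splits a `MOVED` message into its parts. It is a minimal illustration of the published key-distribution rule under that assumption; `key_slot` and `parse_moved` are hypothetical helper names, not part of the redis-py API.

```python
import binascii
import re

CLUSTER_SLOTS = 16384  # Redis Cluster always uses 16384 hash slots


def key_slot(key: str) -> int:
    """Return the hash slot for a key (illustrative helper, not redis-py API)."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Hash only a non-empty {tag}; otherwise the whole key is hashed.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    # crc_hqx with an initial value of 0 computes CRC16-CCITT (XMODEM),
    # the CRC variant described in the Redis Cluster specification.
    return binascii.crc_hqx(raw, 0) % CLUSTER_SLOTS


_MOVED_RE = re.compile(r"^MOVED (\d+) (.+):(\d+)$")


def parse_moved(message: str) -> tuple[int, str, int]:
    """Split a 'MOVED <slot> <host>:<port>' error message into (slot, host, port)."""
    match = _MOVED_RE.match(message.strip())
    if match is None:
        raise ValueError(f"not a MOVED reply: {message!r}")
    slot, host, port = match.groups()
    return int(slot), host, int(port)
```

For example, `key_slot("foo")` returns 12182, which should agree with `CLUSTER KEYSLOT foo` on a real cluster. Running `key_slot` on the key passed to `hget` shows which slot, and therefore which node, the request belongs to; a cluster-aware client is expected to resend the command to the node named in the `MOVED` reply instead of raising it.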
Code that gives the `MOVED` error:
```python
import redis

class redis_conn_pool:
    def __init__(self, host, password, username):
        self.host_ = host
        self.pd_ = password
        self.user_ = username

    def connect(self):
        pool = redis.ConnectionPool(host=self.host_,
                                    port=6379,
                                    password=self.pd_,
                                    username=self.user_,
                                    connection_class=redis.SSLConnection,
                                    decode_responses=True,
                                    )
        conn = redis.RedisCluster(connection_pool=pool, host=self.host_, reinitialize_steps=1)
        self.conn = conn
```
I tried different values for `reinitialize_steps`.

The following code works:
```python
import redis

class redis_conn:
    def __init__(self, host, password, username):
        self.host_ = host
        self.port_ = 6379
        self.pd_ = password
        self.user_ = username

    def connect(self):
        conn = redis.RedisCluster(host=self.host_,
                                  port=self.port_,
                                  password=self.pd_,
                                  username=self.user_,
                                  ssl=True,
                                  decode_responses=True,
                                  skip_full_coverage_check=True)
        self.conn = conn
```
Any ideas why it doesn't work with a `ConnectionPool`?
A Redis cluster shards your data over all the nodes in the cluster. `ConnectionPool` is not cluster-aware and can only connect to one node. When a key lives on another node, the node you are connected to does not return the data; instead it replies with a `MOVED` redirection that names the node owning the key's hash slot. If you provide a `ConnectionPool`, I suspect that `RedisCluster` only uses that one node, even though there are other nodes in the cluster.

`RedisCluster` creates a cluster-aware client that performs the redirection behind the scenes and can therefore return data from all nodes in the cluster. If you create a `RedisCluster` without a `ConnectionPool`, it automatically creates the connection pools it needs. To avoid establishing new connections, keep the client alive in your code. Here is a quote on how pooling works internally in redis-py: