I have a batch application which reads a large file from Amazon S3. My S3 config:
@Configuration
public class S3Configuration {

    @Bean
    public S3Client s3Client() {
        return S3Client.builder()
                .credentialsProvider(DefaultCredentialsProvider.create())
                .region(Region.AP_EAST_1)
                .overrideConfiguration(ClientOverrideConfiguration.builder()
                        .apiCallAttemptTimeout(Duration.ofHours(6))
                        .build())
                .build();
    }
}
And for reading the file:
GetObjectRequest getObjectRequest = GetObjectRequest.builder()
        .bucket(bucketName)
        .key(key)
        .build();
ResponseInputStream<GetObjectResponse> responseInputStream = s3Client.getObject(getObjectRequest);
But I'm getting a connection timeout error after about half an hour. Attaching the stack trace:
Caused by: java.net.SocketException: Connection reset
at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:323) ~[na:na]
at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:350) ~[na:na]
at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:803) ~[na:na]
at java.base/java.net.Socket$SocketInputStream.read(Socket.java:966) ~[na:na]
at java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:484) ~[na:na]
at java.base/sun.security.ssl.SSLSocketInputRecord.readFully(SSLSocketInputRecord.java:467) ~[na:na]
at java.base/sun.security.ssl.SSLSocketInputRecord.decodeInputRecord(SSLSocketInputRecord.java:243) ~[na:na]
at java.base/sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:181) ~[na:na]
at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:111) ~[na:na]
at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1509) ~[na:na]
at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1480) ~[na:na]
at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1065) ~[na:na]
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) ~[httpcore-4.4.16.jar:4.4.16]
at org.apache.http.impl.io.SessionInputBufferImpl.read(SessionInputBufferImpl.java:197) ~[httpcore-4.4.16.jar:4.4.16]
at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:176) ~[httpcore-4.4.16.jar:4.4.16]
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135) ~[httpclient-4.5.13.jar:4.5.13]
at java.base/java.io.FilterInputStream.read(FilterInputStream.java:132) ~[na:na]
at software.amazon.awssdk.services.s3.checksums.ChecksumValidatingInputStream.read(ChecksumValidatingInputStream.java:112) ~[s3-2.20.144.jar:na]
at java.base/java.io.FilterInputStream.read(FilterInputStream.java:132) ~[na:na]
at software.amazon.awssdk.core.io.SdkFilterInputStream.read(SdkFilterInputStream.java:66) ~[sdk-core-2.20.144.jar:na]
at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:270) ~[na:na]
at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:313) ~[na:na]
at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:188) ~[na:na]
at java.base/java.io.InputStreamReader.read(InputStreamReader.java:177) ~[na:na]
at java.base/java.io.BufferedReader.fill(BufferedReader.java:162) ~[na:na]
at java.base/java.io.BufferedReader.readLine(BufferedReader.java:329) ~[na:na]
at java.base/java.io.BufferedReader.readLine(BufferedReader.java:396) ~[na:na]
at org.springframework.batch.item.file.FlatFileItemReader.readLine(FlatFileItemReader.java:216) ~[spring-batch-infrastructure-5.0.3.jar:5.0.3]
I have tried apiCallAttemptTimeout, apiCallTimeout, retryPolicy, etc., but nothing has worked for me. Can someone please help me resolve this issue?
The timeout issue is likely caused by the delay your ItemWriter introduces while updating the database after each chunk of lines is read from the file: the open S3 connection sits idle during those database writes until the server resets it. To work around this you could either
a) Implement an ItemReader that uses byte-range fetches to read the S3 file in chunks (see https://docs.aws.amazon.com/whitepapers/latest/s3-optimizing-performance-best-practices/use-byte-range-fetches.html); or
b) Download the file to a local temporary file in a preliminary step (via a tasklet), then read that local temporary file in the chunk-oriented step that updates the database.
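For option (a), the core of a byte-range reader is splitting the object into HTTP `Range` header values; each value can then be passed to `GetObjectRequest.builder().range(...)` so every chunk is a short, fresh request instead of one long-lived connection. A minimal sketch of the range computation (the class name `ByteRangeCalculator` is mine, not from any SDK):

```java
import java.util.ArrayList;
import java.util.List;

public class ByteRangeCalculator {

    /**
     * Splits an object of objectSize bytes into HTTP Range header values
     * ("bytes=start-end", end inclusive) of at most chunkSize bytes each.
     */
    static List<String> ranges(long objectSize, long chunkSize) {
        List<String> out = new ArrayList<>();
        for (long start = 0; start < objectSize; start += chunkSize) {
            long end = Math.min(start + chunkSize, objectSize) - 1;
            out.add("bytes=" + start + "-" + end);
        }
        return out;
    }

    public static void main(String[] args) {
        // Each returned value would be used per request, e.g.:
        // GetObjectRequest.builder().bucket(b).key(k).range(r).build()
        System.out.println(ranges(25, 10)); // [bytes=0-9, bytes=10-19, bytes=20-24]
    }
}
```

The object size can be obtained up front with a HeadObject call (`s3Client.headObject(...)` returns a `contentLength()`); the reader then fetches one range at a time, so no connection is ever held open across a database write.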