Getting connection timeout while reading file from s3 using aws sdk 2x


I have a batch application that reads a large file from Amazon S3. My S3 config:

@Configuration
public class S3Configuration {

    @Bean
    public S3Client s3Client() {
        return S3Client.builder()
                .credentialsProvider(DefaultCredentialsProvider.create())
                .region(Region.AP_EAST_1)
                .overrideConfiguration(ClientOverrideConfiguration.builder()
                        .apiCallAttemptTimeout(Duration.ofHours(6))
                        .build())
                .build();
    }
}

And for reading the file:

GetObjectRequest getObjectRequest = GetObjectRequest.builder()
        .bucket(bucketName)
        .key(key)
        .build();

ResponseInputStream<GetObjectResponse> responseInputStream = s3Client.getObject(getObjectRequest);

But I'm getting a connection timeout error after about half an hour. Stack trace attached:

Caused by: java.net.SocketException: Connection reset
at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:323) ~[na:na]
at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:350) ~[na:na]
at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:803) ~[na:na]
at java.base/java.net.Socket$SocketInputStream.read(Socket.java:966) ~[na:na]
at java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:484) ~[na:na]
at java.base/sun.security.ssl.SSLSocketInputRecord.readFully(SSLSocketInputRecord.java:467) ~[na:na]
at java.base/sun.security.ssl.SSLSocketInputRecord.decodeInputRecord(SSLSocketInputRecord.java:243) ~[na:na]
at java.base/sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:181) ~[na:na]
at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:111) ~[na:na]
at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1509) ~[na:na]
at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1480) ~[na:na]
at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1065) ~[na:na]
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) ~[httpcore-4.4.16.jar:4.4.16]
at org.apache.http.impl.io.SessionInputBufferImpl.read(SessionInputBufferImpl.java:197) ~[httpcore-4.4.16.jar:4.4.16]
at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:176) ~[httpcore-4.4.16.jar:4.4.16]
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135) ~[httpclient-4.5.13.jar:4.5.13]
at java.base/java.io.FilterInputStream.read(FilterInputStream.java:132) ~[na:na]
at software.amazon.awssdk.services.s3.checksums.ChecksumValidatingInputStream.read(ChecksumValidatingInputStream.java:112) ~[s3-2.20.144.jar:na]
at java.base/java.io.FilterInputStream.read(FilterInputStream.java:132) ~[na:na]
at software.amazon.awssdk.core.io.SdkFilterInputStream.read(SdkFilterInputStream.java:66) ~[sdk-core-2.20.144.jar:na]
at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:270) ~[na:na]
at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:313) ~[na:na]
at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:188) ~[na:na]
at java.base/java.io.InputStreamReader.read(InputStreamReader.java:177) ~[na:na]
at java.base/java.io.BufferedReader.fill(BufferedReader.java:162) ~[na:na]
at java.base/java.io.BufferedReader.readLine(BufferedReader.java:329) ~[na:na]
at java.base/java.io.BufferedReader.readLine(BufferedReader.java:396) ~[na:na]
at org.springframework.batch.item.file.FlatFileItemReader.readLine(FlatFileItemReader.java:216) ~[spring-batch-infrastructure-5.0.3.jar:5.0.3]

I have tried apiCallAttemptTimeout, apiCallTimeout, retryPolicy, etc., but nothing has worked for me. Can someone please help me resolve this issue?

1 Answer

Answered by httPants:

The timeout is most likely caused by the delay your ItemWriter introduces by updating the database after each chunk of lines is read from the file: while the writer runs, the S3 connection sits idle until the server eventually resets it. To work around this, you could either:

a) implement an ItemReader that uses byte-range fetches to read the S3 file in chunks (see https://docs.aws.amazon.com/whitepapers/latest/s3-optimizing-performance-best-practices/use-byte-range-fetches.html); or
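A minimal sketch of option (a), assuming the AWS SDK v2 S3 client from the question. Each ranged GET is a short-lived request, so no connection is left open while the writer commits to the database. The class name, the 8 MiB chunk size, and the helper shape are illustrative only; a real Spring Batch reader would also need to stitch together lines that span chunk boundaries and implement ItemStream for restartability.

```java
import software.amazon.awssdk.core.ResponseBytes;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;
import software.amazon.awssdk.services.s3.model.HeadObjectRequest;

public class RangeFetchingS3Reader {

    private static final long CHUNK_SIZE = 8 * 1024 * 1024; // 8 MiB per request

    private final S3Client s3Client;
    private final String bucketName;
    private final String key;
    private final long objectSize;
    private long position = 0;

    public RangeFetchingS3Reader(S3Client s3Client, String bucketName, String key) {
        this.s3Client = s3Client;
        this.bucketName = bucketName;
        this.key = key;
        // HEAD the object once to learn its total size.
        this.objectSize = s3Client.headObject(
                HeadObjectRequest.builder().bucket(bucketName).key(key).build())
                .contentLength();
    }

    /** Returns the next chunk of bytes, or null when the object is exhausted. */
    public byte[] nextChunk() {
        if (position >= objectSize) {
            return null;
        }
        long end = Math.min(position + CHUNK_SIZE, objectSize) - 1;
        GetObjectRequest request = GetObjectRequest.builder()
                .bucket(bucketName)
                .key(key)
                .range("bytes=" + position + "-" + end) // RFC 7233 Range header
                .build();
        // Ranged GET completes quickly and closes; nothing idles during DB writes.
        ResponseBytes<GetObjectResponse> bytes = s3Client.getObjectAsBytes(request);
        position = end + 1;
        return bytes.asByteArray();
    }
}
```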

b) download the file to a local temporary file in a separate step (via a tasklet), then read that local file in the chunk-oriented step that updates the database.
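A sketch of option (b), assuming the same AWS SDK v2 client and Spring Batch 5 (as in the question's stack trace). The tasklet streams the whole object to a temp file before the chunk-oriented step starts, so the later step reads purely from local disk. The ExecutionContext key name is a made-up example for passing the path to the next step's FlatFileItemReader.

```java
import java.nio.file.Files;
import java.nio.file.Path;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;

public class S3DownloadTasklet implements Tasklet {

    private final S3Client s3Client;
    private final String bucketName;
    private final String key;

    public S3DownloadTasklet(S3Client s3Client, String bucketName, String key) {
        this.s3Client = s3Client;
        this.bucketName = bucketName;
        this.key = key;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        Path tempFile = Files.createTempFile("s3-input-", ".tmp");
        Files.delete(tempFile); // getObject(request, path) refuses to overwrite an existing file

        // Stream the object straight to disk in one continuous download;
        // no connection is held open across database writes.
        s3Client.getObject(
                GetObjectRequest.builder().bucket(bucketName).key(key).build(),
                tempFile);

        // Publish the path ("localInputFile" is an example key) so the
        // FlatFileItemReader in the next step can pick it up.
        chunkContext.getStepContext().getStepExecution()
                .getJobExecution().getExecutionContext()
                .putString("localInputFile", tempFile.toString());

        return RepeatStatus.FINISHED;
    }
}
```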