
I have a batch application which reads a large file from Amazon S3. My S3 config:

import java.time.Duration;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider;
import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;

@Configuration
public class S3Configuration {
    @Bean
    public S3Client s3Client() {
        return S3Client.builder()
                .credentialsProvider(DefaultCredentialsProvider.create())
                .region(Region.AP_EAST_1)
                .overrideConfiguration(ClientOverrideConfiguration.builder()
                        .apiCallAttemptTimeout(Duration.ofHours(6))
                        .build())
                .build();
    }
}

And for reading the file:

GetObjectRequest getObjectRequest = GetObjectRequest.builder()
                .bucket(bucketName).key(key)
                .build();

ResponseInputStream<GetObjectResponse> getObjectResponseResponseInputStream =
        s3Client.getObject(getObjectRequest);

But I'm getting a connection reset error after about half an hour. Attaching the stack trace:

Caused by: java.net.SocketException: Connection reset
at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:323) ~[na:na]
at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:350) ~[na:na]
at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:803) ~[na:na]
at java.base/java.net.Socket$SocketInputStream.read(Socket.java:966) ~[na:na]
at java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:484) ~[na:na]
at java.base/sun.security.ssl.SSLSocketInputRecord.readFully(SSLSocketInputRecord.java:467) ~[na:na]
at java.base/sun.security.ssl.SSLSocketInputRecord.decodeInputRecord(SSLSocketInputRecord.java:243) ~[na:na]
at java.base/sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:181) ~[na:na]
at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:111) ~[na:na]
at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1509) ~[na:na]
at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1480) ~[na:na]
at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1065) ~[na:na]
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) ~[httpcore-4.4.16.jar:4.4.16]
at org.apache.http.impl.io.SessionInputBufferImpl.read(SessionInputBufferImpl.java:197) ~[httpcore-4.4.16.jar:4.4.16]
at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:176) ~[httpcore-4.4.16.jar:4.4.16]
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135) ~[httpclient-4.5.13.jar:4.5.13]
at java.base/java.io.FilterInputStream.read(FilterInputStream.java:132) ~[na:na]
at software.amazon.awssdk.services.s3.checksums.ChecksumValidatingInputStream.read(ChecksumValidatingInputStream.java:112) ~[s3-2.20.144.jar:na]
at java.base/java.io.FilterInputStream.read(FilterInputStream.java:132) ~[na:na]
at software.amazon.awssdk.core.io.SdkFilterInputStream.read(SdkFilterInputStream.java:66) ~[sdk-core-2.20.144.jar:na]
at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:270) ~[na:na]
at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:313) ~[na:na]
at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:188) ~[na:na]
at java.base/java.io.InputStreamReader.read(InputStreamReader.java:177) ~[na:na]
at java.base/java.io.BufferedReader.fill(BufferedReader.java:162) ~[na:na]
at java.base/java.io.BufferedReader.readLine(BufferedReader.java:329) ~[na:na]
at java.base/java.io.BufferedReader.readLine(BufferedReader.java:396) ~[na:na]
at org.springframework.batch.item.file.FlatFileItemReader.readLine(FlatFileItemReader.java:216) ~[spring-batch-infrastructure-5.0.3.jar:5.0.3]

I have tried apiCallAttemptTimeout, apiCallTimeout, retryPolicy, etc., but nothing has worked for me. Can someone please help me resolve this issue?

  • Where have you deployed your application? Is it Lambda or ECS? Commented Dec 17, 2023 at 14:53
  • Try using the S3 Transfer Manager API. See docs.aws.amazon.com/sdk-for-java/latest/developer-guide/… Commented Dec 17, 2023 at 15:16
  • Also -- how large is the object you are using? Commented Dec 17, 2023 at 15:27
  • @RohitAgarwal It's in ECS. Commented Dec 18, 2023 at 5:13
  • @smac2020 The file size is around 6 GB. Commented Dec 18, 2023 at 5:13
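
As a rough illustration of the Transfer Manager suggestion in the comments, a download could look like the sketch below. This is a minimal sketch, assuming the software.amazon.awssdk:s3-transfer-manager module is on the classpath; the bucket name, key, and destination path are placeholders, and the parallel ranged-GET behaviour requires the AWS CRT-based async client to be available.

import java.nio.file.Paths;

import software.amazon.awssdk.transfer.s3.S3TransferManager;
import software.amazon.awssdk.transfer.s3.model.DownloadFileRequest;
import software.amazon.awssdk.transfer.s3.model.FileDownload;

public class TransferManagerDownload {
    public static void main(String[] args) {
        // Transfer Manager can split the download into multiple ranged requests
        // (with the CRT-based client), so no single connection has to survive
        // the whole 6 GB transfer.
        try (S3TransferManager transferManager = S3TransferManager.create()) {
            DownloadFileRequest request = DownloadFileRequest.builder()
                    .getObjectRequest(get -> get.bucket("my-bucket").key("my-key")) // illustrative names
                    .destination(Paths.get("/tmp/input-file"))                      // illustrative path
                    .build();
            FileDownload download = transferManager.downloadFile(request);
            download.completionFuture().join(); // block until the download finishes
        }
    }
}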

1 Answer


The timeout issue is most likely caused by the delay introduced by your ItemWriter updating the database after each chunk of lines is read from the file: while the writer is working, the open S3 connection sits idle until the server resets it. To work around this you could either:

a) Implement an ItemReader that uses byte-range fetches to read the S3 file in chunks (see https://docs.aws.amazon.com/whitepapers/latest/s3-optimizing-performance-best-practices/use-byte-range-fetches.html), so each HTTP request is short-lived; a sketch of the fetch logic is shown after these options; or

b) Download the file to a local temporary file in a separate step (via a tasklet), then read that local temporary file in the chunk-oriented step that updates the database; a sketch of such a tasklet also follows.
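
For option a), the core of the reader is fetching one byte range per request. This is a minimal sketch, not a full ItemReader: the class name ByteRangeS3Fetcher and the 8 MiB chunk size are my own choices, and a real reader would still need to split each chunk into lines and carry a partial trailing line over to the next chunk.

import software.amazon.awssdk.core.ResponseBytes;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;
import software.amazon.awssdk.services.s3.model.HeadObjectRequest;

public class ByteRangeS3Fetcher {
    private static final long CHUNK_SIZE = 8 * 1024 * 1024; // 8 MiB per request

    private final S3Client s3Client;
    private final String bucket;
    private final String key;
    private final long objectSize; // cached so only one HEAD request is issued

    public ByteRangeS3Fetcher(S3Client s3Client, String bucket, String key) {
        this.s3Client = s3Client;
        this.bucket = bucket;
        this.key = key;
        this.objectSize = s3Client.headObject(HeadObjectRequest.builder()
                .bucket(bucket).key(key).build()).contentLength();
    }

    /** True while there is still data beyond the given offset. */
    public boolean hasMore(long start) {
        return start < objectSize;
    }

    /** Fetches one chunk via an HTTP Range header; each call is a short-lived request. */
    public byte[] fetchChunk(long start) {
        long end = Math.min(start + CHUNK_SIZE, objectSize) - 1;
        GetObjectRequest request = GetObjectRequest.builder()
                .bucket(bucket).key(key)
                .range("bytes=" + start + "-" + end) // e.g. "bytes=0-8388607"
                .build();
        ResponseBytes<GetObjectResponse> chunk = s3Client.getObjectAsBytes(request);
        return chunk.asByteArray();
    }
}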
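
And for option b), a download tasklet could look roughly like this. The class name S3DownloadTasklet and the constructor arguments are illustrative; getObject(request, path) is the SDK overload that streams the object straight to disk.

import java.nio.file.Files;
import java.nio.file.Path;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;

public class S3DownloadTasklet implements Tasklet {

    private final S3Client s3Client;
    private final String bucket;
    private final String key;
    private final Path localFile; // e.g. Files.createTempFile("batch-input", ".csv")

    public S3DownloadTasklet(S3Client s3Client, String bucket, String key, Path localFile) {
        this.s3Client = s3Client;
        this.bucket = bucket;
        this.key = key;
        this.localFile = localFile;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        Files.deleteIfExists(localFile); // getObject(request, path) fails if the file already exists
        GetObjectRequest request = GetObjectRequest.builder()
                .bucket(bucket).key(key)
                .build();
        // The download is the only work in this step, so the S3 connection
        // is never left idle while database writes happen.
        s3Client.getObject(request, localFile);
        return RepeatStatus.FINISHED;
    }
}

The chunk-oriented step then points its FlatFileItemReader at localFile instead of the S3 response stream.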
