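The most surprising thing about chunked S3 uploads is that, when the parts are sent one at a time, they can be slower than a single large upload: every part is its own request, and the upload needs extra calls to start and finish. You do them anyway because they’re more resilient and offer a better user experience.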

Let’s see it in action. Imagine a user uploading a 500MB video file. Without chunking, if the network connection drops halfway through, the whole upload fails, and they have to start from scratch. With chunking, we break that 500MB file into, say, 5MB pieces (100 of them). If the connection drops, we only need to re-upload the piece that was in flight, not the whole thing. This is crucial for large files or unreliable networks.
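
To make the arithmetic concrete, here is a minimal sketch of how a file could be split into byte ranges. The helper name planParts and its return shape are assumptions made for illustration, not part of any AWS API.

```javascript
// Hypothetical helper: split a file of `fileSize` bytes into byte ranges of
// at most `partSize` bytes each. `start` is inclusive, `end` is exclusive.
function planParts(fileSize, partSize) {
    const ranges = [];
    let partNumber = 1;
    for (let start = 0; start < fileSize; start += partSize) {
        ranges.push({ partNumber, start, end: Math.min(start + partSize, fileSize) });
        partNumber++;
    }
    return ranges;
}

const MB = 1024 * 1024;
const plan = planParts(500 * MB, 5 * MB);
console.log(plan.length); // 100
console.log(plan[0]);     // first range, starting at byte 0
```

Because each range is independent, a failed piece can be re-read and re-sent on its own. If you read a range with fs.createReadStream, note that its end option is inclusive, so you would pass end - 1.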

Here’s a simplified Node.js snippet using aws-sdk v2 (for production, you’d likely use a more robust library like tus-js-client or aws-sdk’s ManagedUpload):

const AWS = require('aws-sdk');
const fs = require('fs');

AWS.config.update({ region: 'us-east-1' });
const s3 = new AWS.S3();

async function uploadFileChunked(filePath, bucketName, key) {
    const fileSize = fs.statSync(filePath).size;
    // 5 MiB is S3's minimum size for every part except the last one.
    const chunkSize = 5 * 1024 * 1024;

    // Step 1: Initiate Multipart Upload
    let uploadId;
    try {
        const multipartUpload = await s3.createMultipartUpload({
            Bucket: bucketName,
            Key: key
        }).promise();
        uploadId = multipartUpload.UploadId;
        console.log(`Initiated upload with UploadId: ${uploadId}`);
    } catch (err) {
        console.error("Error initiating multipart upload:", err);
        return;
    }

    // highWaterMark makes the stream hand us chunkSize-byte chunks for a
    // regular local file, instead of the 64 KiB default.
    const fileStream = fs.createReadStream(filePath, { highWaterMark: chunkSize });
    const parts = [];
    let uploadedBytes = 0;
    let partNumber = 1;

    try {
        // Step 2: Upload Parts, one at a time. `for await` does not read the
        // next chunk until the current part has finished uploading, so part
        // numbers always match the order of the data in the file.
        for await (const chunk of fileStream) {
            const uploadPartResult = await s3.uploadPart({
                Bucket: bucketName,
                Key: key,
                UploadId: uploadId,
                PartNumber: partNumber,
                Body: chunk
            }).promise();
            console.log(`Uploaded part ${partNumber} with ETag: ${uploadPartResult.ETag}`);
            parts.push({ PartNumber: partNumber, ETag: uploadPartResult.ETag });
            partNumber++;
            uploadedBytes += chunk.length;
            const progress = (uploadedBytes / fileSize) * 100;
            console.log(`Upload Progress: ${progress.toFixed(2)}%`);
        }

        // Step 3: Complete Multipart Upload (only reached once every part is stored)
        const completionResult = await s3.completeMultipartUpload({
            Bucket: bucketName,
            Key: key,
            UploadId: uploadId,
            MultipartUpload: { Parts: parts }
        }).promise();
        console.log("Upload complete:", completionResult.Location);
    } catch (err) {
        // Covers file stream errors, failed parts, and a failed completion.
        console.error(`Upload failed (last part attempted: ${partNumber}):`, err);
        // In a real app, you'd retry the failed part before giving up.
        // Abort so the parts already stored don't keep costing money.
        try {
            await s3.abortMultipartUpload({
                Bucket: bucketName,
                Key: key,
                UploadId: uploadId
            }).promise();
            console.log("Upload aborted.");
        } catch (abortErr) {
            console.error("Error aborting upload:", abortErr);
        }
    }
}

// Example usage:
// uploadFileChunked('./my-large-video.mp4', 'my-s3-bucket-name', 'videos/my-large-video.mp4');
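
The snippet leaves retries as a comment. One possible shape for them is a small wrapper with exponential backoff, sketched below. The helper name withRetry, its option names, and the default values are all assumptions chosen for illustration.

```javascript
// Hypothetical helper: call an async function up to `attempts` times,
// waiting baseDelayMs, then 2x, then 4x, ... between failed tries.
async function withRetry(fn, { attempts = 3, baseDelayMs = 500 } = {}) {
    let lastError;
    for (let attempt = 1; attempt <= attempts; attempt++) {
        try {
            return await fn();
        } catch (err) {
            lastError = err;
            if (attempt < attempts) {
                const delay = baseDelayMs * 2 ** (attempt - 1);
                await new Promise((resolve) => setTimeout(resolve, delay));
            }
        }
    }
    // Every attempt failed: surface the last error to the caller.
    throw lastError;
}

// In the upload code, the uploadPart call could then be wrapped like this:
// const uploadPartResult = await withRetry(
//     () => s3.uploadPart({ /* same parameters as before */ }).promise()
// );
```

Only the part that failed is sent again; parts that S3 has already acknowledged stay where they are.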

The core of this process is S3’s Multipart Upload API. It’s not just about breaking files into pieces; it’s a stateful operation on S3’s side.

  1. Initiate Multipart Upload: You tell S3 you’re starting a multipart upload for a specific object. S3 responds with a unique UploadId. This ID is your handle for all subsequent operations related to this specific upload.
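  2. Upload Parts: You send individual parts of your file to S3, along with the UploadId and a PartNumber (any integer from 1 to 10,000; the numbers set the order of the parts in the final object and don’t have to be consecutive). Every part except the last must be at least 5 MiB. S3 stores each part and returns an ETag for it. The ETag identifies the content of that specific part, and you send it back when you complete the upload.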
  3. Complete Multipart Upload: Once all parts are uploaded, you send a request to S3 with the UploadId and a list of all Parts (their PartNumber and ETag). S3 then assembles these parts in the correct order into the final object.
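
The 10,000-part ceiling means a fixed 5MB chunk size stops working for very large files. A minimal sketch of picking a part size that stays within the limit follows; the helper name choosePartSize and the rounding to whole MiB are assumptions made for illustration.

```javascript
const MiB = 1024 * 1024;
const MIN_PART_SIZE = 5 * MiB; // S3 minimum for every part except the last
const MAX_PARTS = 10000;       // S3 maximum number of parts in one upload

// Hypothetical helper: smallest part size, in whole MiB, that keeps the
// number of parts at or below MAX_PARTS.
function choosePartSize(fileSize) {
    const needed = Math.ceil(fileSize / MAX_PARTS);
    const roundedUp = Math.ceil(needed / MiB) * MiB;
    return Math.max(MIN_PART_SIZE, roundedUp);
}

console.log(choosePartSize(500 * MiB) / MiB);        // 5
console.log(choosePartSize(100 * 1024 * MiB) / MiB); // 11
```

For the 500MB video the minimum part size is enough, while a 100 GiB file needs larger parts to fit in 10,000 of them.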

The UploadId is critical. If your client crashes or the network disconnects before you complete the upload, those uploaded parts just sit in S3, potentially costing you money. You need a mechanism to clean these up. S3 has lifecycle policies for this: you can configure a bucket to automatically abort incomplete multipart uploads after a certain number of days.
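
As a rough sketch, such a lifecycle rule could be described with the parameters below. The helper name, the rule ID, and the 7-day window are placeholder assumptions; the request shape follows S3’s putBucketLifecycleConfiguration call.

```javascript
// Hypothetical helper: build the parameters for putBucketLifecycleConfiguration
// with one rule that aborts multipart uploads left incomplete for `days` days.
function abortIncompleteUploadsRule(bucketName, days) {
    return {
        Bucket: bucketName,
        LifecycleConfiguration: {
            Rules: [{
                ID: 'abort-incomplete-multipart-uploads', // placeholder name
                Status: 'Enabled',
                Filter: { Prefix: '' }, // empty prefix: applies to the whole bucket
                AbortIncompleteMultipartUpload: { DaysAfterInitiation: days }
            }]
        }
    };
}

const lifecycleParams = abortIncompleteUploadsRule('my-s3-bucket-name', 7);
console.log(JSON.stringify(lifecycleParams, null, 2));

// With an aws-sdk v2 S3 client named `s3`, the rule would be applied with:
// await s3.putBucketLifecycleConfiguration(lifecycleParams).promise();
```

Be aware that this call replaces the bucket’s existing lifecycle configuration, so any rules already in place need to be included in the same request.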

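A common pitfall is not correctly handling the PartNumber. Parts can be uploaded in any order, even in parallel, but uploading a second part with a PartNumber you’ve already used overwrites the first, and the Parts list you send to CompleteMultipartUpload must be in ascending PartNumber order or the request fails. The ETag is crucial for integrity too: if the ETag you send for a part doesn’t match what S3 stored, the completion will fail.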
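
One way to guard against these mistakes is to normalize the list of parts just before completing the upload. The sketch below is a hypothetical helper, not part of the SDK; it assumes results are passed in the order the uploads finished, so the last ETag recorded for a PartNumber is the one kept.

```javascript
// Hypothetical helper: turn collected uploadPart results into the Parts
// array for completeMultipartUpload, sorted by ascending PartNumber.
function buildPartsList(results) {
    const etagByNumber = new Map();
    for (const { PartNumber, ETag } of results) {
        if (!Number.isInteger(PartNumber) || PartNumber < 1 || PartNumber > 10000) {
            throw new RangeError(`Invalid PartNumber: ${PartNumber}`);
        }
        // A repeated PartNumber replaces the earlier entry.
        etagByNumber.set(PartNumber, ETag);
    }
    return [...etagByNumber.entries()]
        .sort(([a], [b]) => a - b)
        .map(([PartNumber, ETag]) => ({ PartNumber, ETag }));
}

const completedParts = buildPartsList([
    { PartNumber: 2, ETag: '"etag-2"' },
    { PartNumber: 1, ETag: '"etag-1-old"' },
    { PartNumber: 1, ETag: '"etag-1-new"' }
]);
console.log(completedParts);
```

The result can be passed as MultipartUpload.Parts in the completion request. The ETag values here are made-up strings; real ones come from the uploadPart responses.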

The next concept you’ll run into is managing the state of these uploads on the server side, or using a dedicated service like AWS Transfer Family to offload the complexity.
