Download file from AWS S3 using Python

Introduction

The standard way to download files from Amazon S3 using Python is the Boto3 library, the official AWS SDK for Python. Boto3 provides two interfaces: a low-level client API and a higher-level, object-oriented resource API. Both expose download_file() and download_fileobj(). For most use cases, calling download_file(bucket, key, local_path) on a client is the simplest approach. For objects above a configurable size threshold (8 MB by default), Boto3 automatically splits the transfer into parts and downloads them in parallel.

Prerequisites

bash
# Install Boto3
pip install boto3

# Configure AWS credentials (one-time setup)
aws configure
# AWS Access Key ID: AKIA...
# AWS Secret Access Key: wJal...
# Default region name: us-east-1
# Default output format: json

Credentials can also be set via environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) or an IAM role attached to an EC2 instance.

Basic Download with Client

python
import boto3

s3 = boto3.client('s3')

# Download a file from S3 to a local path
s3.download_file(
    Bucket='my-bucket',
    Key='data/reports/2025-01-report.csv',
    Filename='/tmp/report.csv'
)
print("Download complete")

download_file() handles the entire transfer, including multipart downloads for large files. The Key is the full S3 object path (not including the bucket name).
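
If your code receives locations as full s3:// URIs, they need to be split into a bucket and a key before calling download_file(). The helper below is a minimal sketch of that split; split_s3_uri is a hypothetical name introduced here for illustration and is not part of Boto3.

```python
def split_s3_uri(uri):
    """Split 's3://bucket/path/to/key' into (bucket, key).

    Hypothetical helper for illustration; it is not part of Boto3.
    """
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # Everything up to the first '/' is the bucket; the rest is the key.
    bucket, _, key = uri[len(scheme):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs both a bucket and a key: {uri!r}")
    return bucket, key


bucket, key = split_s3_uri("s3://my-bucket/data/reports/2025-01-report.csv")
print(bucket)  # my-bucket
print(key)     # data/reports/2025-01-report.csv
```

The resulting pair can then be passed as the Bucket and Key arguments shown above.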

Download with Resource API

python
import boto3

s3 = boto3.resource('s3')

# Access the bucket and object
bucket = s3.Bucket('my-bucket')
bucket.download_file('data/reports/2025-01-report.csv', '/tmp/report.csv')

# Alternative: access object directly
obj = s3.Object('my-bucket', 'data/reports/2025-01-report.csv')
obj.download_file('/tmp/report.csv')

The resource API provides a more object-oriented interface. Both approaches call the same underlying S3 API.

Download to Memory (BytesIO)

python
import boto3
from io import BytesIO

s3 = boto3.client('s3')

# Download to an in-memory buffer instead of a file
buffer = BytesIO()
s3.download_fileobj('my-bucket', 'data/image.png', buffer)

# Read the content
buffer.seek(0)
content = buffer.read()
print(f"Downloaded {len(content)} bytes")

# Or use get_object for direct streaming
response = s3.get_object(Bucket='my-bucket', Key='data/config.json')
body = response['Body'].read().decode('utf-8')
print(body)

download_fileobj() writes to any file-like object. get_object() returns a streaming response body that you read directly.
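
Because the body returned by get_object() is a stream, it can also be consumed in fixed-size chunks rather than with a single read(). The sketch below hashes a stream that way. It only assumes the body exposes read(n), which the get_object() response body does; sha256_of_stream is a hypothetical helper name, and a BytesIO object stands in for the S3 response so the example runs without AWS access.

```python
import hashlib
from io import BytesIO


def sha256_of_stream(body, chunk_size=1024 * 1024):
    """Hash a file-like body chunk by chunk, never holding it all in memory."""
    digest = hashlib.sha256()
    while True:
        chunk = body.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        digest.update(chunk)
    return digest.hexdigest()


# Stand-in for response['Body']; any object with read(n) works the same way.
fake_body = BytesIO(b"hello world")
print(sha256_of_stream(fake_body, chunk_size=4))
```

With a real response, the call would be sha256_of_stream(response['Body']).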

Download with Progress Callback

python
import threading

import boto3

s3 = boto3.client('s3')

# Get the object size first so progress can be shown as a percentage
head = s3.head_object(Bucket='my-bucket', Key='data/large-file.zip')
total_size = head['ContentLength']

downloaded = 0
lock = threading.Lock()  # the callback may run on several transfer threads

def progress_callback(bytes_transferred):
    global downloaded
    with lock:
        downloaded += bytes_transferred
        # Guard against zero-byte objects to avoid dividing by zero
        pct = (downloaded / total_size) * 100 if total_size else 100.0
        print(f"\rDownloading: {pct:.1f}%", end="", flush=True)

s3.download_file(
    'my-bucket',
    'data/large-file.zip',
    '/tmp/large-file.zip',
    Callback=progress_callback
)
print("\nDone")

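Boto3 invokes the Callback with the number of bytes transferred since the previous call, so you accumulate those values to display progress. Large transfers run on multiple threads, so the callback should be thread-safe.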

Download Multiple Files

python
import boto3
import os

s3 = boto3.client('s3')

def download_directory(bucket, prefix, local_dir):
    """Download all files under an S3 prefix to a local directory."""
    paginator = s3.get_paginator('list_objects_v2')

    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            key = obj['Key']
            # Skip "directory" markers
            if key.endswith('/'):
                continue

            # Build local file path
            relative_path = key[len(prefix):].lstrip('/')
            local_path = os.path.join(local_dir, relative_path)

            # Create local directories
            os.makedirs(os.path.dirname(local_path), exist_ok=True)

            print(f"Downloading {key} -> {local_path}")
            s3.download_file(bucket, key, local_path)

# Download all files under a prefix
download_directory('my-bucket', 'data/reports/', '/tmp/reports')

A single list_objects_v2 call returns at most 1,000 objects. Using a paginator follows the continuation tokens for you, so prefixes with more than 1,000 objects are handled correctly.
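
The key-to-path mapping in the loop can be pulled out into a small function that is easy to test on its own. The sketch below is one possible version, not the only reasonable one: local_path_for_key is a hypothetical helper (not part of Boto3), and the check that rejects keys containing ".." segments is an extra safeguard assumed here for buckets whose contents you do not fully control.

```python
import os


def local_path_for_key(key, prefix, local_dir):
    """Map an S3 key under `prefix` to an absolute path inside `local_dir`.

    Returns None for "directory" marker keys and for the prefix itself.
    Raises ValueError if the key would resolve outside `local_dir`.
    Hypothetical helper for illustration; it is not part of Boto3.
    """
    if not key.startswith(prefix):
        raise ValueError(f"Key {key!r} is not under prefix {prefix!r}")

    relative = key[len(prefix):].lstrip("/")
    if not relative or relative.endswith("/"):
        return None  # nothing to download for this key

    root = os.path.abspath(local_dir)
    # S3 keys always use '/', so split on it and join with the local separator.
    candidate = os.path.abspath(os.path.join(root, *relative.split("/")))
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError(f"Key escapes the target directory: {key!r}")
    return candidate


print(local_path_for_key("data/reports/2025/jan.csv", "data/reports/", "/tmp/reports"))
print(local_path_for_key("data/reports/archive/", "data/reports/", "/tmp/reports"))
```

Inside download_directory(), a None result would mean "skip this key", and any other result would be the path passed to os.makedirs() and download_file().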

Download with Custom Configuration

python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client('s3')

# Custom transfer configuration for large files
config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,  # 8 MB: use multipart above this size
    max_concurrency=10,                   # parallel download threads
    multipart_chunksize=8 * 1024 * 1024,  # 8 MB per chunk
    use_threads=True
)

s3.download_file(
    'my-bucket',
    'data/huge-dataset.tar.gz',
    '/tmp/huge-dataset.tar.gz',
    Config=config
)

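TransferConfig controls multipart thresholds and concurrency. The values shown above match Boto3's defaults, so change them to see an effect: raising max_concurrency or multipart_chunksize can speed up large file downloads on fast networks, at the cost of more threads and memory.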
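
To get a feel for what multipart_chunksize means, it can help to work out how an object of a given size divides into parts. The sketch below shows only that arithmetic, expressed as HTTP Range header values. It is an illustration of the idea, not Boto3's internal implementation, and byte_ranges is a hypothetical helper name.

```python
def byte_ranges(total_size, chunk_size):
    """Yield inclusive (start, end) byte offsets that cover total_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total_size, chunk_size):
        # The last part is shorter when total_size is not a multiple of chunk_size.
        end = min(start + chunk_size, total_size) - 1
        yield start, end


MB = 1024 * 1024
for start, end in byte_ranges(20 * MB, 8 * MB):
    print(f"bytes={start}-{end}")
# bytes=0-8388607
# bytes=8388608-16777215
# bytes=16777216-20971519
```

A 20 MB object with an 8 MB chunk size therefore yields three parts, the last one 4 MB long.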

Generate Presigned URL (No SDK Needed on Client)

python
import boto3

s3 = boto3.client('s3')

# Generate a temporary download URL (valid for 1 hour)
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'data/report.pdf'},
    ExpiresIn=3600  # seconds
)
print(url)
# https://my-bucket.s3.amazonaws.com/data/report.pdf?AWSAccessKeyId=...&Signature=...&Expires=...

Presigned URLs let users download files without AWS credentials. Useful for sharing files temporarily or serving downloads from a web application.
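
On the receiving side, a presigned URL is fetched with a plain HTTP GET, so the downloader needs nothing beyond an HTTP client. The sketch below uses only the Python standard library; download_url is a hypothetical helper name, and the call at the bottom is left as a comment because it needs a real URL.

```python
import os
import shutil
import urllib.request


def download_url(url, local_path, chunk_size=64 * 1024):
    """Stream the response for `url` to `local_path`; return bytes written."""
    with urllib.request.urlopen(url) as response, open(local_path, "wb") as out:
        # copyfileobj reads and writes in chunks, so memory use stays flat.
        shutil.copyfileobj(response, out, chunk_size)
    return os.path.getsize(local_path)


# Example (needs a real presigned URL, so it is not executed here):
# size = download_url(url, "/tmp/report.pdf")
```

If the link has expired or the signature does not match, the request fails with an HTTP error, which urlopen() raises as urllib.error.HTTPError.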

Common Pitfalls

  • Forgetting to create local directories: download_file() does not create parent directories. If the local path is /tmp/data/report.csv and /tmp/data/ does not exist, the call raises FileNotFoundError. Use os.makedirs(os.path.dirname(path), exist_ok=True).
  • Confusing Key with the full S3 URI: The Key parameter is just the object path within the bucket (e.g., data/file.csv), not s3://bucket/data/file.csv. Do not include the bucket name or the s3:// prefix in the key.
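  • Not handling ClientError for missing objects: Downloading a non-existent key raises botocore.exceptions.ClientError. With download_file() the error code is "404"; with get_object() it is "NoSuchKey". If the caller lacks permission to list the bucket, S3 reports a 403 instead of a 404. Wrap downloads in a try/except block and inspect e.response['Error']['Code'], or check existence with head_object first.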
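  • Using get_object with .read() for large files: get_object() returns a streaming body, but calling .read() with no arguments loads the entire object into memory. For files larger than available RAM, read the body in chunks, or use download_file() or download_fileobj(), which stream to the destination in chunks.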
  • Assuming credentials are always in ~/.aws/credentials: On EC2 instances, ECS tasks, or Lambda functions, credentials come from IAM roles — no credentials file exists. Boto3 checks the credential chain automatically (env vars, config file, IAM role, etc.).
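
Two of the pitfalls above lend themselves to small helpers. The sketch below is one way to write them, under stated assumptions: ensure_parent_dir and classify_s3_error are hypothetical names, the classifier relies only on the response dictionary that ClientError carries, and FakeClientError is a stand-in so the example runs without AWS access.

```python
import os


def ensure_parent_dir(path):
    """Create the parent directory of `path` if it has one."""
    parent = os.path.dirname(path)
    if parent:  # dirname is '' for a bare filename, and makedirs('') would fail
        os.makedirs(parent, exist_ok=True)


def classify_s3_error(error):
    """Return 'missing', 'forbidden', or 'other' for a ClientError-like exception."""
    details = getattr(error, "response", {}).get("Error", {})
    code = str(details.get("Code", ""))
    if code in {"404", "NoSuchKey"}:
        return "missing"
    if code in {"403", "AccessDenied"}:
        return "forbidden"
    return "other"


class FakeClientError(Exception):
    """Stand-in that mimics the `response` attribute of botocore's ClientError."""

    def __init__(self, code):
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


print(classify_s3_error(FakeClientError("404")))           # missing
print(classify_s3_error(FakeClientError("NoSuchKey")))     # missing
print(classify_s3_error(FakeClientError("AccessDenied")))  # forbidden
```

In a real script, ensure_parent_dir(local_path) would run just before download_file(), and classify_s3_error(e) would be called inside an except ClientError as e: block, with ClientError imported from botocore.exceptions.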

Summary

  • Use s3.download_file(bucket, key, local_path) for the simplest file download
  • Use download_fileobj() or get_object() to download into memory or a file-like object
  • Boto3 automatically handles multipart downloads for large files — customize with TransferConfig
  • Use a paginator with list_objects_v2 to download all files under a prefix
  • Generate presigned URLs for temporary, credential-free download links
  • Always handle missing objects with try/except and create local directories before downloading
