Transferring S3 Bucket Contents Between Accounts with S3cmd

March 12, 2013

I recently needed to transfer ownership of an Amazon S3 bucket, but found that Amazon doesn't provide a way to do this. So I was forced to transfer the contents of the bucket rather than the bucket itself. This is relatively easy to do on a small scale, but when that bucket contains 1.3 million images, services like Bucket Explorer tend to break down. A quick search online led me to the answer: s3cmd. S3cmd is a command-line tool written in Python that lets you interact with S3 buckets and their contents. The installation instructions describe building the application on a Linux machine, but I found it easy enough to install on my Mac (Snow Leopard) with the following steps:

First, clone the repo locally:

$ git clone git@github.com:s3tools/s3cmd.git

Move into the new s3cmd directory and install:

$ cd s3cmd

$ sudo python setup.py install

Then we need to add our credentials. The following command will prompt you for your S3 access key and secret key. The prompt for 'Path to GPG program:' confused me a bit; after looking up what it meant (GPG is only used if you want s3cmd to encrypt files before uploading them), I decided I didn't need it and left the field blank.

$ s3cmd --configure
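If all goes well, your answers end up in a ~/.s3cfg file that looks roughly like the following. The keys below are placeholders, and the exact set of options varies between s3cmd versions:

```ini
[default]
access_key = YOUR_ACCESS_KEY
secret_key = YOUR_SECRET_KEY
gpg_command =
use_https = True
```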

We're installed and configured so we should just be able to start syncing some buckets, right?

$ s3cmd sync --skip-existing --recursive s3://my-source-bucket-name s3://my-target-bucket-name

Not so fast...

ERROR: S3 error: 403 (AccessDenied): Access Denied

My source bucket contained publicly readable directories and files, and my target bucket was using credentials that should give it access. So what gives? It was back to searching, and I happened upon this blog post: S3 Bucket Copying with Multiple Accounts. Although the post made sense, I still didn't feel the extra step should be necessary. Since I was out of options, I tried it anyway. I added a bucket policy to both the source and target buckets, granting each account full access to the other's bucket. The source bucket policy looked like:

{
    "Version": "2008-10-17",
    "Id": "Policy1321826983372",
    "Statement": [
        {
            "Sid": "Stmt1321826980370",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<target account number>:root"
            },
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::<source bucket name>/*"
        }
    ]
}

and the target bucket policy looked like:

{
    "Version": "2008-10-17",
    "Id": "Policy1321826983372",
    "Statement": [
        {
            "Sid": "Stmt1321826980370",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<source account number>:root"
            },
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::<target bucket name>/*"
        }
    ]
}

One note here: the account number is not your API key; it is the 12-digit number (xxxx-xxxx-xxxx) located under the "Welcome <your name> | Sign Out" line on your account page (https://portal.aws.amazon.com/gp/aws/manageYourAccount).
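For reference, applying a policy from the command line can be sketched like this. The file name, bucket name, and 111122223333 account number are all placeholders, and the `setpolicy` subcommand only exists in newer s3cmd releases, so it's left commented out here; pasting the JSON into the bucket's policy editor in the S3 console works just as well:

```shell
# Write the source-bucket policy to a file (placeholder account number
# and bucket name -- substitute your own).
cat > source-policy.json <<'EOF'
{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::my-source-bucket-name/*"
        }
    ]
}
EOF

# Newer s3cmd versions can apply the file directly; otherwise paste it
# into the bucket's policy editor in the S3 console:
# s3cmd setpolicy source-policy.json s3://my-source-bucket-name
```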

Finally, I ran it again (not all 1.3M files needed to be copied, as some had already been brought over by other methods):

$ s3cmd sync --skip-existing --recursive s3://my-source-bucket-name s3://my-target-bucket-name

Summary: 953831 source files to copy, 0 files at destination to delete

After about three and a half days it finished. Voilà:

Done. Copied 953831 files in 299608.7 seconds, 3.18 files/s
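As a sanity check, the summary line's own numbers agree with each other:

```shell
# 953831 files over 299608.7 seconds is the reported transfer rate,
# and 299608.7 seconds is indeed about three and a half days.
awk 'BEGIN { printf "%.2f files/s\n", 953831 / 299608.7 }'
awk 'BEGIN { printf "%.1f days\n", 299608.7 / 86400 }'
# prints "3.18 files/s" then "3.5 days"
```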