Transferring S3 Bucket Contents Between Accounts with S3cmd

March 12, 2013

I recently needed to transfer ownership of an Amazon S3 bucket, but found that Amazon doesn't offer a way to do this. So I was forced to transfer the contents of the bucket rather than the bucket itself. This is relatively easy to do on a small scale, but when the bucket contains 1.3 million images, services like Bucket Explorer tend to break down. A quick search online led me to the answer: s3cmd. S3cmd is a command-line tool written in Python that lets you interact with S3 buckets and their contents. Its installation instructions describe building the application on a Linux machine, but I found it easy enough to install on my Mac (Snow Leopard) through the following steps:

First, clone the repo locally:

$ git clone git@github.com:s3tools/s3cmd.git

Move into the new s3cmd directory and install:

$ cd s3cmd

$ sudo python setup.py install

Then we need to add our credentials. The following command will prompt you for your S3 key and secret. The prompt for 'Path to GPG program:' confused me a bit and after looking up what it meant, I decided I didn't need it and simply entered nothing in this field.

$ s3cmd --configure

We're installed and configured so we should just be able to start syncing some buckets, right?

$ s3cmd sync --skip-existing --recursive s3://my-source-bucket-name s3://my-target-bucket-name

Not so fast...

ERROR: S3 error: 403 (AccessDenied): Access Denied

My source bucket contained publicly readable directories and files, and my target bucket was being fed credentials that should have given it access. So what gives? Back to searching I went, and I happened upon this blog post: S3 Bucket Copying with Multiple Accounts. Although the post made sense, I still didn't feel like it was necessary. Since I was out of options, I tried it anyway. I added a bucket policy to each of the source and target buckets, granting the other account full access to that bucket's contents. The source bucket policy looked like:

{
    "Version": "2008-10-17",
    "Id": "Policy1321826983372",
    "Statement": [
        {
            "Sid": "Stmt1321826980370",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<target account number>:root"
            },
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::<source bucket name>/*"
        }
    ]
}

and the target bucket policy looked like:

{
    "Version": "2008-10-17",
    "Id": "Policy1321826983372",
    "Statement": [
        {
            "Sid": "Stmt1321826980370",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<source account number>:root"
            },
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::<target bucket name>/*"
        }
    ]
}

One note here: the account number is not your API key; it is the 12-digit number (xxxx-xxxx-xxxx) located under the "Welcome <your name> | Sign Out" line on your account page (https://portal.aws.amazon.com/gp/aws/manageYourAccount). Enter it in the policy without the dashes.
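The two policies differ only in the account number and the bucket name, so they could be generated from one template. Here is a minimal Ruby sketch of that idea. The method name cross_account_policy and the sample account numbers are my own placeholders, not anything from s3cmd or AWS tooling, and the sketch leaves out the optional "Id" and "Sid" fields.

```ruby
require 'json'

# Build a bucket policy that grants another AWS account full access to the
# objects in a bucket, mirroring the structure of the policies shown above.
# The method name and all sample values are illustrative placeholders.
def cross_account_policy(bucket_name, other_account_number)
  # ARNs take the 12-digit account number without dashes.
  account_id = other_account_number.delete('-')
  {
    'Version' => '2008-10-17',
    'Statement' => [
      {
        'Effect' => 'Allow',
        'Principal' => { 'AWS' => "arn:aws:iam::#{account_id}:root" },
        'Action' => 's3:*',
        'Resource' => "arn:aws:s3:::#{bucket_name}/*"
      }
    ]
  }
end

# Each bucket's policy names the *other* account as the principal.
source_policy = cross_account_policy('my-source-bucket-name', '1111-2222-3333')
target_policy = cross_account_policy('my-target-bucket-name', '4444-5555-6666')

puts JSON.pretty_generate(source_policy)
puts JSON.pretty_generate(target_policy)
```

The printed JSON can be pasted into each bucket's policy editor once the placeholder values are swapped for real ones.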

Finally, I ran it again (not all 1.3M files needed to be copied as some had already been brought over with other methods):

$ s3cmd sync --skip-existing --recursive s3://my-source-bucket-name s3://my-target-bucket-name

Summary: 953831 source files to copy, 0 files at destination to delete

After about three and a half days it finished. Voila:

Done. Copied 953831 files in 299608.7 seconds, 3.18 files/s
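
As a quick sanity check on those numbers, the rate and the "three and a half days" figure can be recomputed from the summary line. This is just arithmetic in Ruby on the two values s3cmd printed:

```ruby
files   = 953_831    # files copied, from the s3cmd summary
seconds = 299_608.7  # elapsed time, from the s3cmd summary

files_per_second = files / seconds
days             = seconds / 86_400 # 86,400 seconds in a day

puts format('%.2f files/s', files_per_second) # 3.18 files/s
puts format('%.2f days', days)                # 3.47 days
```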


August 28 - October 3

October 05, 2011

I'm a Ruby developer and recently I built a pretty cool education tool with the guys at Rocket, here in Boise. I've learned a ton from these guys and gained a much greater understanding of application architecture, testing, writing efficient SQL queries, and a lot of jQuery (with some Haml and Sass/SCSS tossed in there).

So when the project ended in August, what did I do with my new web development skills? I wrote a native iPhone app. No, you're right, the two really don't have anything to do with each other, but I had the time, the interest and most importantly, help from a few guys that know more than me when I needed it.

Before August 28, I had only ever opened Xcode once, for about fifteen minutes, before I gave up. Back then even the simplest app seemed daunting. Now, having submitted my first app to the App Store a little over a month after I started it, I can't believe I waited this long.

It wasn't easy, but it wasn't that difficult either. I started the same way I learned Ruby: by coming up with a simple project, then brute forcing my way to completion. The code isn't perfect and I'm not terribly efficient with Objective-C yet, but the app works as it should and is finished, all in 37 days.

One thing that helped me a lot (besides having experts on call): the Seinfeld method. I didn't actually use a paper calendar, but I made sure to spend some amount of time reading and writing code every day until I finished on October 3rd. Oh, the app? Tag Along.


GitHub

February 10, 2011

As I've been researching how to find a Ruby on Rails job, I've come across one standard requirement in the industry: a GitHub account. Now, I've had a GitHub account for over two years, but I had never stored anything there. I simply used it to keep track of gems that I use in projects. I had also never known the proper process for using and editing gems. So when I needed to edit the content of a gem, I downloaded it as a plugin, did my editing and moved on. I had no idea that I should have forked the project, made my edits and then pushed them back for everyone else to see and use.

So when I learned what I should have been doing, I went back to that gem, only to find that someone else had already made similar changes. Since I wanted to have some code viewable publicly I glanced through the todo lists of some other projects and ended up deciding instead to build a simple app and share that.

So I built a small app for captains of adult league sports teams. I play on three different indoor soccer teams and at one point got suckered into being the captain of one of them. The problem is that every so often we would have to play a game shorthanded or with no subs because I never knew who was actually coming to the games. I looked into a few different ways to manage this and found a great program called Orange Slices. The only problem was that the reminders didn't work and nobody appeared to be supporting it (or at least responding to support emails). So then I looked at using Facebook's private groups and adding events that way, but not everyone on the team was on Facebook. As I have a tendency to do in these situations, I came up with the idea to build my own solution.

A weekend project that overflowed into the week became AlfiesTeam (soon to be at alfieste.am but Armenian registrars take a little longer than most). I know, the name is ridiculous, but: (a) I figured that I'm probably the only person that will use it; (b) it was mainly built as an example app; and (c) if anyone else uses it, I'll become famous (well maybe not, but it worked for Craig Newmark - okay, he had an arguably better app).

The site is live and the code is on GitHub. I'll gladly take feedback if someone actually reads through it. I'll be updating the README file to better explain how I built it and why I did what I did.


HTML5 Audio Tag and the iPad

January 13, 2011

On one of my sites, we use a Flash slideshow that allows us to include an audio track to play during the slideshow. However, with the growing popularity of non-Flash devices (read: iPad), we have to have a JavaScript slideshow to fall back on. The problem is that none of the available slideshows have audio track players included and our customers wanted their audio to play on the iPad as well.

Enter the HTML5 <audio> tag. This tag allows you to very easily add audio and play/pause controls to a page with one line:

<audio src="/great_background_music.mp3" controls loop autoplay></audio>

Great. Now when someone views a slideshow on a non-Flash device, they'll have the same experience that they would have had with Flash: background music while viewing their dream home on the lake. A fix that only took a few minutes. I checked my work in Safari's developer mode to simulate the iPad/iPhone and everything looked good.

Then I flipped on my iPad and waited for the background music to start. And...nothing. After double checking that I had written everything correctly I searched for answers online. Apparently, Apple doesn't want developers to be able to "autoplay" audio on the iPad because it would suck up too much bandwidth, so they disabled the feature. Well, anytime an authority disallows something, someone will find a way to do it anyway, so I kept searching.

There were two schools of thought on getting around this limitation: first, write some JavaScript to simulate a user touching the play button; second, write some JavaScript to insert the <audio> tag after the page has loaded. As of iOS 4.2.1, neither of them works.

Now this is where I struggle with some features on my sites. Do I continue to spend hours trying to find a loophole (what I really want to do) or reanalyze the situation and determine how this will affect my users (what I should do, and ultimately do)?

The accepted solution: add a custom play and pause button to the iPad version of the slideshow and allow the user to turn it on and off as they please. If a real estate agent (not our user, but the end customer of our service) wants to set up an iPad at an open house and have it play music, they can.


Copy, Paste, Learn

November 18, 2010

Since I started learning Ruby on Rails in order to build a program that I wanted to use, I spent a lot of time copying and pasting code that I didn't understand in order to produce the results I wanted more quickly. Of course, as soon as I ran into a bug, this habit bit me in the ass as I had to go back and learn what the copied code did in order to fix my bug. And the bug always ended up being something debilitating that pissed off a user or two (which at the time was around half the user base). Now, I know better. I still copy code snippets (and use plugins) but I make sure I know what I'm copying and how it works. I may not have known how to write it, but I can read it and explain it to myself (and in most cases, edit it without breaking it).

Today, I spent almost three hours chasing an issue that would have been fixed in moments if I had followed my rule above. I needed to remove spaces and other illegal characters from a string. Not being a regex expert, I searched for the pattern I needed and found something along the lines of (simplified for the example):

new_str = str.sub(/\W/,'_')

which worked fine, or so I thought. I knew about and have used "gsub" plenty of times before, so when I saw the use of "sub" instead, I went to the Ruby docs to sort out the difference. I quickly glanced at the headings of the two different methods and found the following:

str.sub(pattern, replacement) => new_str
str.gsub(pattern, replacement) => new_str

Ah, so they're the same, I thought. This seemed reasonable as I've seen other instance methods that are identical (e.g. to_s and to_str), so I carried on. I tested it out on a string with a space and it worked. Then a user tried a string with three spaces in it and everything broke. Of course, had I read the documentation more closely I would have seen that "sub" only replaces the first occurrence of the pattern and "gsub" replaces every occurrence of the pattern.
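
To make the difference concrete, here is a small sketch run against a string with more than one illegal character. The sample filename is made up for illustration:

```ruby
str = 'my file name.jpg'

# sub replaces only the first match of the pattern.
first_only = str.sub(/\W/, '_')

# gsub replaces every match of the pattern.
all_matches = str.gsub(/\W/, '_')

puts first_only   # my_file name.jpg
puts all_matches  # my_file_name_jpg
```

Note that \W matches any non-word character, so gsub also swaps out the period before the extension. If the extension should survive, the pattern needs to be narrower.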

The lesson here?  Well, I'm still far from being able to write everything I need without a little help from the google.  However, I need to pay better attention to the details of the code I didn't write myself.

Oh and here is the final result of what I was creating in the first place:


Hello World!

October 12, 2010

I realize that I should probably change the title of this post, but it seems so appropriate. I've written blogs before (here, here, here and helped my wife with this one), so I am fully aware that WordPress simply puts this placeholder post here to give you a more detailed introduction to their fantastic software. However, I am leaving the title as-is because this is my first post on a blog devoted to my learning from and giving back to the Ruby on Rails programming community.

See, I've learned most of what I know about rails from "aha!" blog posts that others have written when faced with a difficult problem.  Not wanting to forget their solution (which I now realize is easy to do), they write out the answer to their problem for guys like me to stumble upon when faced with the same issue.

My hope is that this blog will serve as a searchable index of code that I am sure to forget, and a small contribution to the open source community that has allowed me to do way more than I should be able to do two years in.  I also hope to use it as somewhat of a resume.  My entrepreneurial ways have kept me self-employed since 2004 and I've stated many times that I hope to never need a resume.  Well, if I want to pursue programming I will need to show what I am capable of and this seems like a good launching point for those needing to confirm my worth.