Copying objects between two different Amazon S3 accounts

I have recently taken on ownership of a website, and as part of the migration I've had to copy over a few artefacts; I will be posting another blog about what I have learnt from this process.

The website uses Amazon S3 to store users' uploaded photos and documents, and I was faced with the task of copying these files over to my own S3 account.

There are quite a number of ways to achieve this, but I stumbled on a link to a Python script that copies objects between two buckets in the same account.

I then updated that script to handle copying objects between two different accounts: it opens a separate connection for each set of credentials and issues the copy through the destination connection.

The edited script is as follows:

from boto.s3.connection import S3Connection
from boto.s3.key import Key
from Queue import LifoQueue
import threading

source_aws_key = '*******************'
source_aws_secret_key = '*******************'
dest_aws_key = '*******************'
dest_aws_secret_key = '*******************'
srcBucketName = '*******************'
dstBucketName = '*******************'

class Worker(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        # Each worker thread opens its own pair of connections, one per account.
        self.source_conn = S3Connection(source_aws_key, source_aws_secret_key)
        self.dest_conn = S3Connection(dest_aws_key, dest_aws_secret_key)
        self.srcBucket = self.source_conn.get_bucket(srcBucketName)
        self.dstBucket = self.dest_conn.get_bucket(dstBucketName)
        self.queue = queue

    def run(self):
        while True:
            key_name = self.queue.get()
            # Fetch both keys so their etags are populated; get_key returns
            # None when the object does not exist in the destination bucket.
            k = self.srcBucket.get_key(key_name)
            dist_key = self.dstBucket.get_key(key_name)
            if dist_key is None or k.etag != dist_key.etag:
                print 'copy: ' + k.key
                self.dstBucket.copy_key(k.key, srcBucketName, k.key, storage_class=k.storage_class)
            else:
                print 'exists and etag matches: ' + k.key

            self.queue.task_done()

def copyBucket(maxKeys = 1000):
    print 'start'

    s_conn = S3Connection(source_aws_key, source_aws_secret_key)
    srcBucket = s_conn.get_bucket(srcBucketName)

    resultMarker = ''
    q = LifoQueue(maxsize=5000)

    # Start a pool of ten daemon worker threads to do the actual copying.
    for i in range(10):
        print 'adding worker'
        t = Worker(q)
        t.daemon = True
        t.start()

    # Page through the source bucket's keys and queue each key name for the workers.
    while True:
        print 'fetch next 1000, backlog currently at %i' % q.qsize()
        keys = srcBucket.get_all_keys(max_keys = maxKeys, marker = resultMarker)
        for k in keys:
            q.put(k.key)
        if len(keys) < maxKeys:
            print 'Done'
            break
        resultMarker = keys[maxKeys - 1].key

    q.join()
    print 'done'

if __name__ == "__main__":
    copyBucket()
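
One thing worth calling out with the two-account setup: the copy_key call above goes through the destination connection, so the destination account's credentials must also be allowed to read the source bucket, otherwise S3 rejects the copy with a 403 Access Denied. The sketch below is my own illustration (not part of the original script) of granting that read access with a bucket policy via boto; the account ID and starred placeholders are values you would replace with your own.

import json
from boto.s3.connection import S3Connection

# Placeholders: fill in your own values.
source_aws_key = '*******************'
source_aws_secret_key = '*******************'
srcBucketName = '*******************'
dest_account_id = '111111111111'   # 12-digit AWS account ID of the destination account

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowDestinationAccountRead",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::%s:root" % dest_account_id},
        "Action": ["s3:ListBucket", "s3:GetObject"],
        "Resource": [
            "arn:aws:s3:::" + srcBucketName,
            "arn:aws:s3:::" + srcBucketName + "/*"
        ]
    }]
}

# Attach the policy to the source bucket using the source account's credentials.
source_conn = S3Connection(source_aws_key, source_aws_secret_key)
source_conn.get_bucket(srcBucketName).set_policy(json.dumps(policy))

You can of course set the same policy from the S3 console instead; the point is simply that the destination credentials need read access (s3:GetObject, and s3:ListBucket helps) on the source bucket for the server-side copy to work.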

The prerequisites are that you have Python installed, along with the boto S3 library.

And obviously you have to put your own credentials and bucket names in there.
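
If you want to sanity-check both sets of credentials before kicking off a long copy, a quick throwaway sketch like this (again with placeholder values) will fail fast with an S3ResponseError if either account cannot see its bucket:

from boto.s3.connection import S3Connection

# Placeholder credentials and bucket names, as in the main script.
checks = [
    ('source', '*******************', '*******************', '*******************'),
    ('destination', '*******************', '*******************', '*******************'),
]

for label, key, secret, bucket_name in checks:
    conn = S3Connection(key, secret)
    # get_bucket() validates the bucket by default and raises S3ResponseError
    # if the credentials cannot access it.
    bucket = conn.get_bucket(bucket_name)
    print label + ' bucket is reachable: ' + bucket.name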


2 Responses to Copying objects between two different Amazon S3 accounts

  1. Bruce says:

    Thanks for posting exactly what I needed, but I have a question regarding the setup required for this to work.
    What AWS account credentials are you using? I've tried with the root account and with an admin group user that has full admin access, and I always get the following error on the copy_key method:
    AccessDenied: Access Denied (RequestId: C864816EDE369292, HostId: WPq+O5Z9xBA5nLUozHdoNgQbNrbd5JvR601XBPoIkd2gW/rFsBu6AXCdwrIyPl+l)

  2. Bhavya says:

    The script works fine only when the bucket in the other region is empty. Once the files exist, it fails.

    start
    adding worker
    adding worker
    adding worker
    adding worker
    adding worker
    adding worker
    adding worker
    adding worker
    adding worker
    adding worker
    fetch next 1000, backlog currently at 0
    Done
    Exception in thread Thread-1:
    Traceback (most recent call last):
    File "/usr/local/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
    File "/root/jtmp/stage/prod_to_stage_s3.py", line 29, in run
    if not dist_key.exists() or k.etag != dist_key.etag:
    File "/opt/pythonenvs/system/lib/python2.7/site-packages/boto/s3/key.py", line 516, in exists
    return bool(self.bucket.lookup(self.name, headers=headers))
    File "/opt/pythonenvs/system/lib/python2.7/site-packages/boto/s3/bucket.py", line 143, in lookup
    return self.get_key(key_name, headers=headers)
    File "/opt/pythonenvs/system/lib/python2.7/site-packages/boto/s3/bucket.py", line 193, in get_key
    key, resp = self._get_key_internal(key_name, headers, query_args_l)
    File "/opt/pythonenvs/system/lib/python2.7/site-packages/boto/s3/bucket.py", line 235, in _get_key_internal
    response.status, response.reason, '')
    S3ResponseError: S3ResponseError: 403 Forbidden

    Can you please help with why this might be the case?
