
A few weeks ago, I put together a tutorial on how to get S3 data into TensorFlow and I used pico swiftstack as an example data source. This article goes into a little more detail about what pico swiftstack is and how I used it to quickly test the S3 API.

First off, pico swiftstack is a Docker image that lets you quickly run the access layer of SwiftStack so you can test for S3 or Swift API compatibility. It can also be used when integrating with a CI/CD system. Essentially, if your application works with pico swiftstack, it will work with the full SwiftStack Storage platform. pico swiftstack is freely available, and if you need to test the S3 or Swift API in a private environment, I recommend giving it a try.

 

Step 1: Install Docker and AWS CLI

Ubuntu

sudo apt install docker.io

CentOS/RHEL

https://docs.docker.com/install/linux/docker-ce/centos/

AWS CLI Install: both Ubuntu and CentOS

pip install awscli

 

Step 2: Run the picoswiftstack container and get the credentials

sudo docker run -d --rm -p 8080:8080 --hostname="picoswiftstack" --name="picoswiftstack" swiftstack/picoswiftstack

sudo docker exec picoswiftstack get_auth

Note down the output, which should look something like this:

 

======================= CLUSTER AUTHENTICATION =======================

 

Swift API Auth: http://127.0.0.1:8080/auth/v1.0 or http://<your VM IP>:<your exposed port>/auth/v1.0

SwiftStack Auth Username: test

SwiftStack Auth Password: test

S3 API URL: http://127.0.0.1 or http://<your VM ip>

S3 API Region: us-east-1

S3 Access Key: test

S3 Secret Key: 78cffadc0bea806e405e9615ce8dbb0e

 

======================================================================
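If you would rather not copy the secret key by hand, here is a minimal sketch that pulls it out of the get_auth output. It assumes the output keeps the "S3 Secret Key: <value>" line format shown above; the function name extract_s3_secret and the variable S3_SECRET are my own placeholders, not part of pico swiftstack.

```shell
# Hypothetical helper: read `get_auth` output on stdin and print the S3 secret key.
# Assumes the "S3 Secret Key: <value>" line format from the sample output above.
extract_s3_secret() {
    sed -n 's/^[[:space:]]*S3 Secret Key:[[:space:]]*//p' | head -n 1
}

# Usage (needs the running container from Step 2):
#   S3_SECRET="$(sudo docker exec picoswiftstack get_auth | extract_s3_secret)"
```

The value captured this way can be pasted into the credentials file in the next step.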

 

Step 3: Create credentials files for the AWS CLI client

mkdir -p ~/.aws

 

cat <<EOF|tee ~/.aws/config

[default]

region=us-east-1

output=json

EOF

 

cat <<EOF|tee ~/.aws/credentials

[default]

aws_access_key_id=test

aws_secret_access_key=[PUT YOUR 'S3 Secret Key' here]

EOF

 

Step 4: Verify functionality

Assuming that all worked as anticipated, you should be able to run:

 

aws --endpoint-url http://127.0.0.1:8080 s3 ls

 

and get no errors (and no content either, since no buckets exist yet). If you run into errors, the most likely cause is a value that is not set properly in the ~/.aws/config or ~/.aws/credentials file. You can also specify these values manually when calling the AWS client for troubleshooting. Otherwise, verify that the Docker container is running, that you specified the correct IP and port with the AWS client, and that the endpoint is accessible.
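For troubleshooting, one way to bypass the files in ~/.aws/ is to pass the values through the environment variables the AWS CLI reads (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION). The sketch below wraps that in a small function; the names aws_pico, S3_SECRET, and PICO_ENDPOINT are hypothetical ones I made up for this example.

```shell
# Hypothetical wrapper: run the AWS CLI against pico swiftstack with explicit
# credentials. The ( ... ) body runs in a subshell, so the exports do not leak
# into your interactive session.
aws_pico() (
    export AWS_ACCESS_KEY_ID="test"
    # S3_SECRET is an assumed variable holding your 'S3 Secret Key' from Step 2.
    export AWS_SECRET_ACCESS_KEY="${S3_SECRET:?set S3_SECRET to your S3 Secret Key}"
    export AWS_DEFAULT_REGION="us-east-1"
    # PICO_ENDPOINT is an assumed override; it defaults to the local container.
    aws --endpoint-url "${PICO_ENDPOINT:-http://127.0.0.1:8080}" "$@"
)

# Usage:
#   S3_SECRET=<your secret key> aws_pico s3 ls
```

If this works while the plain command fails, the problem is most likely in the config or credentials file rather than in the container.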

 

Step 5: Create a bucket

In the Colab example, we use ‘mnist’ as the bucket name, but you can call it whatever you want as long as you set the matching variables in the Colab code.

 

aws --endpoint-url http://127.0.0.1:8080 s3 mb s3://mnist

 

Output should look like:

 

make_bucket: mnist

 

Step 6: Fetch and unzip data for upload

mkdir mnist

cd mnist/

wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz

wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz

wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz

wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz

gunzip *.gz

rm -f *.gz
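Before uploading, it can be worth confirming that the downloads unpacked cleanly. This is a rough sketch that compares each file's size with the byte counts that appear in the bucket listing at the end of Step 7; check_size is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if FILE is exactly EXPECTED bytes long.
check_size() {
    file="$1"; expected="$2"
    # tr strips the padding some wc implementations add around the count.
    actual="$(wc -c < "$file" | tr -d '[:space:]')"
    if [ "$actual" -eq "$expected" ]; then
        echo "ok: $file"
    else
        echo "size mismatch: $file ($actual bytes, expected $expected)" >&2
        return 1
    fi
}

# Run from inside the mnist/ directory once the files are unpacked:
#   check_size t10k-images-idx3-ubyte  7840016
#   check_size t10k-labels-idx1-ubyte  10008
#   check_size train-images-idx3-ubyte 47040016
#   check_size train-labels-idx1-ubyte 60008
```

A mismatch usually points to a truncated or failed download, in which case re-fetching that file is the simplest fix.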

 

Step 7: Upload the MNIST data

for f in *; do aws --endpoint-url http://127.0.0.1:8080 s3 cp "$f" s3://mnist; done

 

The output should look like:

upload: ./t10k-images-idx3-ubyte to s3://mnist/t10k-images-idx3-ubyte

upload: ./t10k-labels-idx1-ubyte to s3://mnist/t10k-labels-idx1-ubyte

upload: ./train-images-idx3-ubyte to s3://mnist/train-images-idx3-ubyte

upload: ./train-labels-idx1-ubyte to s3://mnist/train-labels-idx1-ubyte

 

Verify everything was uploaded properly:

aws --endpoint-url http://127.0.0.1:8080 s3 ls mnist

 

The output should look like:

2019-07-08 22:15:25    7840016 t10k-images-idx3-ubyte

2019-07-08 22:15:26      10008 t10k-labels-idx1-ubyte

2019-07-08 22:15:27   47040016 train-images-idx3-ubyte

2019-07-08 22:15:28      60008 train-labels-idx1-ubyte
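If you want to script this check (for example in a CI job), the listing can be scanned for the four expected object names. This is only a sketch: missing_objects is a hypothetical helper, and it assumes the object name is the last field on each line, as in the listing above.

```shell
# Hypothetical helper: read an `aws s3 ls` listing on stdin and print every
# expected MNIST object that is absent. Prints nothing when all four are present.
missing_objects() {
    listing="$(cat)"
    for name in t10k-images-idx3-ubyte t10k-labels-idx1-ubyte \
                train-images-idx3-ubyte train-labels-idx1-ubyte; do
        # awk exits 0 only if some line's last field equals the expected name.
        printf '%s\n' "$listing" \
            | awk -v n="$name" '$NF == n { found = 1 } END { exit !found }' \
            || echo "$name"
    done
}

# Usage:
#   aws --endpoint-url http://127.0.0.1:8080 s3 ls mnist | missing_objects
```

Empty output means all four files made it into the bucket.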

 

As always, if there’s any way we can help, please feel free to reach out.



About Author

Jon Kelly


Jon Kelly is Director of Machine Learning Solutions at SwiftStack. He has worked in emerging areas of the IT industry for N years, where N is a number larger than he cares to admit. His hobbies include obstinately pursuing demanding physical activities beyond the point at which any rational being would have stopped, and spending time with his family in a variety of activities such as attending tea parties, engaging in hijinks in the world of Minecraft, and empirically demonstrating just how much more energy a child whose age is in the single digits has as compared to an adult.