Contributed by Dirk Petersen, Scientific Computing Director, Fred Hutchinson Cancer Research Center
Today the OpenStack Swift API is widely used, and developers enjoy how easily and quickly they can interact with stored objects. If you have ever tried to pull a listing of 100,000 files from a POSIX filesystem via NFS, you will be pleasantly surprised that you can pull a list of 1M objects from Swift in a couple of seconds. However, there are many cases where end users and even developers would like to access Swift more interactively, such as browsing through objects in a container and easily picking files for upload and download.
Today we have a number of tools such as ExpanDrive (Windows, Mac), Cloudberry Drive (Windows), Cyberduck (Windows, Mac) and the OpenStack Drive (Windows, Mac is coming) from Storage Made Easy. The OpenStack Drive is of particular interest because it focuses entirely on supporting the Swift API well (instead of supporting multiple APIs such as S3 and Google). Except for Cyberduck, all clients mount Swift as a drive in Windows Explorer or integrate with the Mac Finder.
However, all tools that mount Swift as a drive have one limitation in common: all data that is uploaded has to go through a cache on the local disk first. This is great for interactive performance with small amounts of data, but not so great if large amounts of data from scientific instruments (e.g. sequencing or imaging data) fill up a laptop that is equipped with a small SSD.
Over the last year we found that we got the best performance, consistency and reliability when we used the standard OpenStack python-swiftclient. The python-swiftclient is (aside from curl) the most mature and battle-hardened client for Swift. Unfortunately there is no GUI for it, and the command line is relatively complex for end users because you need to set multiple command options to make it work as desired. Another fairly mundane limitation is that it’s hard to install on Windows. Once you get it to work and successfully upload a huge folder with lots of files in subfolders, you quickly run into another unexpected problem.
By default the swiftclient lists all objects in a container, and it is fairly complicated to filter that listing by pseudo folders (which are actually just forward slashes in the object name). One of the huge benefits of object storage systems is the ability to set arbitrary metadata, yet the only GUI client that supports setting metadata for uploaded data is Cyberduck (which hides this option in a non-obvious place). Given these shortcomings, I wondered how hard it could be to address these issues with a pure Python solution that uses the tested python-swiftclient machinery and is really easy to use.
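To make the pseudo folder idea concrete, here is a minimal sketch in plain Python of what filtering a flat object listing by prefix and delimiter means. The function name `list_pseudo_folder` and the sample object names are made up for illustration; this is not code from the GUI. Swift's container listing accepts `prefix` and `delimiter` parameters that do this on the server side, so treat the sketch as an explanation of the idea rather than something you would need to run against a real cluster.

```python
def list_pseudo_folder(object_names, prefix="", delimiter="/"):
    """Return (subfolders, objects) found directly under `prefix`.

    Hypothetical helper: it emulates, on a plain list of names, a listing
    that is filtered by a prefix and cut off at the next delimiter.
    """
    subfolders = set()
    objects = []
    for name in object_names:
        if not name.startswith(prefix):
            continue  # object lives in a different pseudo folder
        rest = name[len(prefix):]
        head, sep, _tail = rest.partition(delimiter)
        if sep:
            # A delimiter remains, so the object sits in a deeper pseudo folder.
            subfolders.add(prefix + head + delimiter)
        else:
            objects.append(name)
    return sorted(subfolders), sorted(objects)


# Sample names, invented for this example.
names = [
    "Downloads/a.txt",
    "Downloads/photos/1.jpg",
    "Downloads/photos/2.jpg",
    "Documents/cv.pdf",
    "readme.txt",
]
top_level = list_pseudo_folder(names)
downloads = list_pseudo_folder(names, prefix="Downloads/")
```

Calling the helper again with one of the returned subfolders as the new prefix is all it takes to walk down the tree one level at a time.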
What I Tried
I really wanted to achieve something while writing little code (often in my spare time), and I wanted users to be able to use it productively within 10 seconds. I also wanted it to work not only on Windows but also on Mac and Linux. I used the following Python components:
- cx_freeze: turns your script into a binary executable (e.g. an exe on Windows) for Windows, Linux or Mac and packages Python, the modules used by your script and optionally some binary libraries such as Windows DLLs into an installable package (e.g. a Windows msi).
- easygui: an extremely simple GUI toolkit that uses the Tcl/Tk support that comes with Python and offers message boxes, list boxes, password prompts and file-open dialogs.
These two components are really all you need to create a very simple but functional and installable GUI. The result is available on GitHub: https://github.com/FredHutch/swiftclient-gui . To install this software on Windows, use the msi installer in the msi subdirectory.
After you have installed “Openstack Swift Client GUI”, launch the app.
You’ll then get a simple login screen. If you use SwiftStack with LDAP or Active Directory authentication and want to access shared storage to which all members of a workgroup have access, enter an LDAP security group as the Tenant.
Once you have authenticated successfully, you will see this simple interface, through which you can upload or download folders to and from Swift. But wait, there is an even simpler way. Hit Cancel:
Then open Windows Explorer. Simply select a folder (such as Downloads), right-click and select Swift: Upload folder.
You will see a list of existing containers in the current Swift account / tenant.
Now you have two options: you can either select an existing container, in which case the folder you selected on your computer (e.g. Downloads from Windows Explorer) is uploaded to a new pseudo directory inside that container, or you can upload to a newly created container ‘Downloads’.
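The two options differ only in where the folder name ends up. The sketch below shows one plausible way to compute the target container and object names for each case. The function `plan_upload` and its naming rules are assumptions made for illustration, not the GUI's actual code: with an existing container the folder name becomes a pseudo directory prefix, and without one the folder name becomes the new container's name.

```python
import os


def plan_upload(folder, existing_container=None):
    """Return (container, {local_path: object_name}) for uploading `folder`.

    Assumption for illustration: with an existing container the folder name
    becomes a pseudo directory prefix; otherwise the folder name becomes the
    name of a new container and objects keep their relative paths.
    """
    folder = os.path.abspath(folder)
    folder_name = os.path.basename(folder)
    if existing_container:
        container, prefix = existing_container, folder_name + "/"
    else:
        container, prefix = folder_name, ""

    names = {}
    for root, _dirs, files in os.walk(folder):
        for filename in files:
            local_path = os.path.join(root, filename)
            # Pseudo folders are delimited by forward slashes, also on Windows.
            relative = os.path.relpath(local_path, folder).replace(os.sep, "/")
            names[local_path] = prefix + relative
    return container, names
```

For a folder `Downloads` containing `sub/b.txt`, this yields the object `Downloads/sub/b.txt` in the chosen existing container, or the object `sub/b.txt` in a new container named `Downloads`.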
After hitting OK you will see another, optional dialog that allows you to add arbitrary metadata to your upload. If the uploaded data is related to a project called “Hawaii”, you can simply add a line (called a key:value pair) like this:
This will allow you to retrieve your data more easily later by filtering on metadata.
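Under the hood, Swift stores custom object metadata as HTTP headers whose names start with `X-Object-Meta-`. As a rough sketch, key:value lines like the one described above could be turned into such headers as follows. The function name `metadata_headers` and the sample line `project:Hawaii` are my own assumptions for illustration, since the exact text typed into the dialog is not shown here.

```python
def metadata_headers(text):
    """Turn 'key:value' lines into Swift object metadata headers.

    Hypothetical helper: blank lines are skipped, and a line without a
    colon or without a key raises ValueError.
    """
    headers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or not key:
            raise ValueError(f"expected key:value, got {line!r}")
        headers[f"X-Object-Meta-{key}"] = value
    return headers


# Assumed example input for a project called "Hawaii".
headers = metadata_headers("project:Hawaii\n")
```

A dictionary like this can be handed to whatever performs the upload; the swift command line, for example, accepts extra headers through its `--header` option.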
Hit OK to start the upload.
Now the upload starts. We simply redirect the python-swiftclient console output to a little tool that works like Unix tail and refreshes in real time. A log file is created for each uploaded folder in case troubleshooting is required later.
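A tail-like viewer boils down to remembering how far you have read and picking up whatever was appended since. The following is a minimal sketch of that idea using only the standard library. The function `read_new_lines` is a made-up name and not the tool shipped with the GUI.

```python
def read_new_lines(path, offset=0):
    """Return (lines, new_offset) for complete lines appended after `offset`.

    Hypothetical helper: a trailing partial line (no newline yet) is left
    in place and picked up on a later call, once the writer has finished it.
    """
    with open(path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
    # Keep everything up to and including the last newline.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

A GUI would call this on a timer, append the returned lines to a text widget, and pass the returned offset into the next call.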
After a successful upload, let’s try downloading data. Again, we select our “Downloads” folder. (If you were confused earlier about why I would upload the “Downloads” folder, now it all makes sense.)
Now you can actually browse the structure of container/pseudo/folder etc. and pick a subfolder for download.
With luck you’ll quickly see a successful download:
How You Can Help
Use it, test it, give feedback! The application works under Linux (see below) but has not been widely tested. It should also work on a Mac, but that has not been tested yet.
Please fork it on GitHub, make it work on a Mac, integrate it into Finder, etc. The source code is only 600-something lines, so you should not find it hard to make changes to it. There are also a number of enhancements where I would appreciate some help: https://github.com/FredHutch/swiftclient-gui/issues
Other Cool Stuff
Index search features (e.g. Elasticsearch) are part of the discussion around monitoring and around setting metadata through the swiftclient. Martin Lanner presented a session on this topic at the latest OpenStack Summit; you can watch the replay here.