Metadata Life Cycle

Organizations of today can easily generate and persist large amounts of data, but this does not come without a challenge:  The question, “How can we store all of this data?” has been replaced with, “How can we derive value from all of this data?”  Without a logical way to locate something or to perform analysis across a pile of data, the value of the data is lost.  Metadata solves that problem.

Metadata is information that describes data, and it can include any variety of things—like the GPS coordinates or aperture settings of a camera when a photo is taken, or the patient ID associated with an MRI, or the list of actors and actresses in a video file.  A single file may have tens or even hundreds of pieces of metadata describing it, and—when that file is one of millions or billions of files in a storage repository—the metadata provides a fast way to search for specific files or even to perform analysis across a group of data files.

When you think of metadata, consider two important phases: first, the creation of the metadata, and second, the utilization of the metadata.  The creation of metadata can be automatic, like when a camera records its location or aperture settings as a picture is taken, or it can be manual and after the fact, like when someone adds a comment or “tags” a person in a picture in their photo library.  Utilization comes into play once the metadata exists, like when someone searches their photo library for all of the pictures tagged with a particular friend’s face. (Take a look at our simple photo-mapping web application as an example.)  It goes without saying that you can’t utilize metadata that hasn’t been created, and simply creating metadata adds no value unless someone or something plans to utilize it.


Now, let’s bridge from our simple analogies to what we see happening in the multi-cloud world around us.  Technically speaking, metadata is usually in the form of small alphanumeric data types that can be easily indexed by relational databases, so—rather than indexing all of the data itself—the databases make it quick and easy to search just the metadata. As we said before, this metadata information can be added manually or programmatically: Manual metadata added “by hand” typically describes the content of the data, while programmatic metadata most often refers to something calculable and may be more specific to each type of file.  At large scale (e.g., hundreds of thousands or millions of files), programmatic addition of metadata that requires no additional human effort is preferable, and this is where new technologies come into play.
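To make the indexing idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It is only an illustration, not how SwiftStack or any particular product stores metadata: the table name object_metadata, the helper functions build_index and find_objects, and the sample file names and values are all hypothetical.

```python
import sqlite3


def build_index(records):
    """Load {object_name: {key: value}} metadata into an indexed in-memory table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per metadata key/value pair.
    conn.execute(
        "CREATE TABLE object_metadata (object_name TEXT, key TEXT, value TEXT)"
    )
    # The index covers only the small metadata fields, never the data itself.
    conn.execute("CREATE INDEX idx_key_value ON object_metadata (key, value)")
    rows = [
        (name, key, str(value))
        for name, meta in records.items()
        for key, value in meta.items()
    ]
    conn.executemany("INSERT INTO object_metadata VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def find_objects(conn, key, value):
    """Return the names of all objects whose metadata has key == value."""
    cursor = conn.execute(
        "SELECT object_name FROM object_metadata "
        "WHERE key = ? AND value = ? ORDER BY object_name",
        (key, str(value)),
    )
    return [row[0] for row in cursor.fetchall()]


# Made-up sample metadata, echoing the photo and MRI examples above.
records = {
    "photos/beach.jpg": {"camera_aperture": "f/2.8", "tagged_person": "alice"},
    "photos/party.jpg": {"camera_aperture": "f/4", "tagged_person": "alice"},
    "scans/mri-0001.dcm": {"patient_id": "P-1042"},
}

conn = build_index(records)
alice_photos = find_objects(conn, "tagged_person", "alice")
mri_scans = find_objects(conn, "patient_id", "P-1042")
print(alice_photos)
print(mri_scans)
```

The queries touch only the small key/value rows, so the files themselves never need to be opened to answer the search.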

Artificial Intelligence (AI) and Machine Learning (ML) blur the line between manual and programmatic generation of metadata:  They can create content-specific descriptions of data automatically at machine speed.  SwiftStack supports both manual and programmatic metadata types with end-to-end solutions for machine learning.  Tools like TensorFlow and Turi Create can use data stored on SwiftStack to train AI models, which can then be used to generate metadata for the objects stored in SwiftStack, and more tools like these are becoming available every day in on-premises software and public-cloud services.  In fact, public cloud vendors like Amazon, Google, and Microsoft are differentiating their clouds based on some of these AI and ML tools.

We’ll explore more about how SwiftStack’s multi-cloud data management software leverages and enables these tools in real-world workflows in the next post in this series, but for starters, take a look at the demo we showed at NAB in April, which uses Google’s transcription service to generate text from an audio track and insert that text as searchable metadata within a media asset library.  Then, think about applying cloud-based facial recognition to your video surveillance data or object identification to your photos, or think about the ability to search metadata across all of your cloud storage, both on-premises and in multiple public clouds, to analyze a collection of data as if it were in a single location.
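As a rough illustration of what searching metadata across locations means, the Python sketch below treats each storage location as a simple in-memory catalog and runs one query over all of them. Everything in it is assumed for the example: the search_all helper, the location names, and the sample objects are hypothetical, and a real deployment would query each location's metadata service rather than a dictionary.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: {object_name: {metadata_key: metadata_value}}
Catalog = Dict[str, Dict[str, str]]


def search_all(
    catalogs: Dict[str, Catalog], key: str, value: str
) -> List[Tuple[str, str]]:
    """Return (location, object_name) pairs whose metadata has key == value."""
    hits = []
    for location, catalog in catalogs.items():
        for object_name, metadata in catalog.items():
            if metadata.get(key) == value:
                hits.append((location, object_name))
    # Sort so results read as one combined list, regardless of location order.
    return sorted(hits)


# Made-up catalogs standing in for an on-premises cluster and a public cloud.
catalogs = {
    "on-premises": {
        "video/cam1.mp4": {"face": "alice"},
        "video/cam2.mp4": {"face": "bob"},
    },
    "public-cloud-a": {
        "archive/cam7.mp4": {"face": "alice"},
    },
}

results = search_all(catalogs, "face", "alice")
print(results)
```

The caller gets a single result list and does not need to know in advance which location holds a matching object.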


SwiftStack makes this possible.  Stay tuned for more…


About Author

Ted Butler

A computer engineer who ventured into the magical world of production, post-production, and broadcasting. I enjoy everything from architecting solutions that combine products from different vendors to developing software that streamlines workflows and delivers the productivity today's expectations demand.