Data volumes are growing dramatically; there's no question about that. According to Domo, 2.5 quintillion bytes of data were created every day in 2017, and that number has only grown since. The Internet of Things (IoT) is projected to include up to 200 billion devices by 2020, many of them continually collecting and generating data.
But data volume is only part of the story. Data scientists like to talk about the three Vs — volume, velocity and variety. Velocity refers to the speed at which new data is generated, and that speed is accelerating. In a 2013 whitepaper, Facebook revealed that users had uploaded more than 250 billion photos and were continuing to upload 350 million new photos every day. By 2015, Facebook users were uploading more than 900 million photos each day.
Variety refers to the fact that data is becoming more and more diverse. The vast majority of it is unstructured, not fitting neatly into the rows and columns of a database. In addition to photographs, we have audio, video, IoT sensor data and many other data types.
Simply storing this data is expensive, both in terms of raw hardware capacity and the IT resources needed to manage the environment. That's why forward-thinking companies aren't content merely to warehouse their data: they analyze it for business insight, turning a cost center into a source of value.
Trouble is, most data storage systems were not designed for today's volume, velocity and variety. Sure, storage hardware has gained performance and capacity, but it's still not enough. Traditional file systems use location-based addressing that depends largely on hierarchical paths and simple file metadata. Problems arise when you have huge repositories of unstructured data, such as thousands of images with nearly identical metadata.
Object storage has been touted as the answer to this problem. It puts a file and all its associated metadata into a container and assigns it a unique 128-bit identifier, eliminating the need for an application to find the physical location of information. However, object storage typically treats objects as immutable, so even a small change means rewriting the entire object. That overhead makes it unsuitable for high-performance applications and for unstructured data that changes frequently.
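To make the object-storage model concrete, here is a minimal sketch of the idea: each object bundles its data and metadata under a flat 128-bit identifier, so callers retrieve it by ID alone rather than by a path to a physical location. The `ObjectStore` class and its method names are illustrative only, not any vendor's actual API.

```python
import uuid


class ObjectStore:
    """Toy flat-namespace object store (illustrative, not a real product API)."""

    def __init__(self):
        self._objects = {}  # 128-bit ID -> (data, metadata)

    def put(self, data: bytes, metadata: dict) -> str:
        """Store data plus its metadata together; return a unique 128-bit ID."""
        object_id = uuid.uuid4().hex  # 128 bits, rendered as 32 hex characters
        self._objects[object_id] = (data, metadata)
        return object_id

    def get(self, object_id: str):
        """Retrieve an object by ID alone; no directory path is needed."""
        return self._objects[object_id]


# Usage: the caller never learns (or cares) where the bytes physically live.
store = ObjectStore()
oid = store.put(b"\x89PNG...", {"type": "photo", "camera": "X100"})
data, metadata = store.get(oid)
```

Note that `put` always writes a whole object; updating a single byte of stored data would mean writing a new object, which is the immutability overhead described above.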
I’m excited about Qumulo’s approach to storing and managing large volumes of data. The Qumulo File Fabric (QF2) is a unified file system designed to handle billions of files across the data center and the public cloud. On-premises hardware and cloud instances work together to create a storage fabric with the highest levels of performance and scalability. Real-time analytics enable administrators to more effectively control and manage storage usage anywhere in the world.
QF2 is now the platform of choice for industry sectors that require high-performance, high-volume storage. Medical research institutions are using QF2 for next-generation genomic sequencing, which involves the analysis of billions of small files, and research imaging, which requires many random reads from large files. In the oil and gas sector, QF2 supports document repositories with millions of digitized files, including large 3-D renderings.
Qumulo is also enabling film and video production studios to eliminate costly storage-area networks, Fibre Channel switches and associated adapters and software while supporting more editors. Machine learning capabilities enable editors to work with uncompressed 4K video on hybrid storage and 6K video on all-flash NVMe storage.
Big data isn’t just about larger storage capacity. It’s about greater performance and scale and the ability to support unstructured files. Qumulo has come up with a unique approach that allows organizations to tap the value of data while reducing costs and enabling real-time control.