From the Blogosphere
The Hottest Panels of this Fall!
It’s probably a good idea to state I wrote this blog while employed by Amplidata
By: Tom Leyden
Nov. 8, 2012 09:00 AM
It’s probably a good idea to state I wrote this blog while employed by Amplidata, but during my own time. This article reflects my own opinion, not necessarily that of Amplidata or its partners.
As I am writing this, I am crossing the Atlantic for the seventh time in about two months. I’m on my way to CloudExpo West in Santa Clara, one of the few technology trade shows that are still growing. At the event I will be sitting on the last Object Storage for Big Data panel of the season. Robin Harris – aka StorageMojo – and I have been working hard this fall educating the industry on the benefits, challenges and opportunities of Object Storage. We’ve been trying to explain how the current generation of Object Storage platforms is so much different from the first attempt at it (EMC’s Centera), how it enables companies cope with the massive amounts of unstructured data that we are all generating and how companies can even monetize archived data by re-activating their archives.
Unlike StorageMojo and some other people who I have been working with lately, I don’t have decades of experience in the storage industry. However, being located in Belgium, I’ve had the privilege of working with people who used to be part of the Filepool team (and spent years at EMC after the acquisition). Those were the earliest object storage days, I had no idea of what was coming. Later, at Sun, I learned a lot about Object Storage when we were working on the Sun Cloud project. The architecture (ZFS) was different of what we are seeing on the market today, but the concept was – as was often the case at Sun – promising. This article is not another take at describing Object Storage and the benefits it brings, it’s more an overview of what we have learned at the past four Object Storage for Big Data panels. The setup for each of the panels was mostly the same: Robin Harris would challenge between 4 and 6 Object Storage specialists (technology vendors or users) and try to have the audience participate with. We did expect the topics of the panels to be different as we were hosted by trade shows with different audiences, but we never expected the discussions to vary as much as they did.
The common thread for each panel was the challenge companies have to store different types of Big Data and more particularly Big Unstructured Data. The latter represents up to 90% of the digital data that we will be generating over the next decades and will put traditional storage technologies under heavy stress as they are hitting their scalability limits. Unstructured data is currently mostly stored in file system based storage infrastructures. File systems will not only be unable to scale as required – try setting up a file structure for 5 petabytes of data – but they will also become obsolete as applications can provide a lot more features to keep your unstructured data organized (structured?), to analyze that information and potentially monetize what is today stored in (dead) tape archives. Rich applications that talk directly to a large and (infinitely) scalable storage pool make a lot more sense than maintenance-intensive files systems. Also, properly designed Object Storage (with erasure coding technology instead of RAID to protect the data) requires a lot less overhead, consumes a lot less power, can easily be implemented over multiple sites and does not require migration to new systems when a system cannot be further scaled. So what else did we discuss at the panels?
The first panel after summer was at Intel’s IDF in San Francisco. Panel members came from Intel and Quanta, who with Amplidata built an Object Storage reference architecture. We also had Michelle Munson of Aspera, who presented a couple of perfect use cases of Object Storage in the media and entertainment industry. Aspera developed a very smart way to transfer large amounts of data over the WAN in a much more efficient way than how it is currently done. Aspera’s bandwidth optimization software practically enables this new generation of Object Storage by taking away the latency issue, e.g. to stream high res movies over a long distance. Once we had explained the drivers for Object Storage, the opportunities and best practices, most of the discussion (questions from the audience) was about why RAID is not the right technology to architect an Object Storage platform with. We discussed the benefits of erasure coding in much detail and spent a lot of time on the differences with RAID. In short: in Erasure Coding based systems, all disks are equal (all parity) and there is no need to rebuild a disk when broken: when codes are lost due to bit errors or hardware failures, new codes can be generated spread over the whole pool, not just one system. A recent and very good independent deepdive in the Amplidata erasure coding technology can be found here.
A lot less RAID and erasure coding at the Createasphere DAM Show in New York a few weeks later. The show focusses on Digital Asset Management and the attendees are more interested in the applications and content than the actual data. That did not make the discussion any less interesting. From Sarah Berndt of Johnson Space Center we learned a *lot* about the importance of metadata, an issue that would be discussed at SNW Europe as well (see further). Interesting newcomer on the panel was Dalet, a DAM vendor who integrate with many Object Storage platforms and see a clear benefit of having their platform interface with a scale-out storage pool directly (REST) rather than through an additional file system. Dalet is the perfect valet in my car analogy that is becoming more and more popular: a file system is like a public parking lot where you have to go find your car yourself (this once took me a few hours in Paris’ CDG airport). Object storage is much more like valet parking, where you get a ticket when you leave your car and use that ticket to get it back later. The application, Dalet, is the valet.
At SNWUSA in Santa Clara in October we had David Chapa of Quantum on board for the firs time. David is an authority to explain the use cases where tape is the better alternative and when it is better to use Object Storage, or Wide Area Storage (WAS) as Quantum calls it. WAS is Quantum’s attempt to take away the confusion caused by the name Object Storage, a term first used by EMC almost a decade ago. I think it’s a good idea of Quantum to try to introduce a new term, I’m not sure WAS is the best choice though. Maybe something new will come up next month at Greg Duplessie’s Object Storage summit, although I doubt it. Once we kind of agreed that this generation of Object Storage, or whatever it will be called later, has very little or nothing to do with EMC’s product line that was most famous for locking-in customers, the conversation took a very sudden change. In an attempt to spice up the discussion, Ranajit Nevatia of Panzura claimed Object Storage provides very bad performance. This was very much true for the first generation of Object Storage platforms we just discussed and might be true of the platforms they currently promote (including Atmos, EMC’s second attempt at Object Storage), but not at all for the technologies that are most successful on the market today. Scality have been promoting their high IOPS (smaller files, IO intensive workloads). Amplidata focus more on large file storage, which is IMO the more obviouse use case for Object Storage, but I may be biassed. In a recent independent test, Amplidata demonstrated throughout numbers that can only be called “extremely high-performant”. Howard Marks confirmed Amplidata provides 1 GB/s of throughput with a single controller. But it gets better: Amplidatas scale throughput linearly by adding more controllers. So a system with 6 controllers provides 6 GB/s of throughput.
Last week’s panel at SNW Europe, which is traditionally well attended by press and analysts, was again very interactive. Robin Harris set the stage explaining how this generation of Object Storage is different from earlier products. This led to a lengthy discussion about API’s, a call for one standard API (I say let’s just all standardize on Amazon) and complaints about lock-ins by … yes, EMC. Vendors be warned, that trick is getting old and is not getting any respect. The audience included some of the better analysts and bloggers, including the451′s Simon Robinson and Storagebod. The latter, known for being a critic of the Object Storage paradigm (with great arguments), helped us bring the discussion to the next level by bringing up interesting topics such as the importance of metadata for the applications: who/what will enter metadata? The application? People? The panel acknowledged that, while applications already generate quite some metadata, companies will have to make business decisions on how much metadata they need. Adding more metadata comes at a cost as it will require manual work. The day after the panel, it was interesting to see Chris Mellor be critical of Object Storage in his review of the show (how dare the Object Storage vendors doubt the many benefits of tape?). Chris, join us on the panel next time!
Cloud Expo Breaking News
Top Stories for Cloud Expo 2012 East
In this Big Data Power Panel at the 10th International Cloud Expo, moderated by Cloud Expo Conference Chair Jeremy Geelan, Govind Rangasamy, Director of Product Management at Eucalyptus Systems; Kevin Brown; CEO of Coraid, Inc.; Christos Tryfonas, CTO and Co-Founder of Cetas; and Max Riggsbee, CMO and VP of Products for WhipTail, discussed such topics as: Big Data has existed since the early days of computing; why, then, do you think there is such an industry buzz around it right now? How is Big Data impacting storage and networking architecture in data centers? How about the intersection of Big Data Analytics and Cloud Computing - how big a sector is that and why? What's the difference between Big Data and Fast Data? ... (more)
Best Recent Articles on Cloud Computing & Big Data Topics
The Arlington, Virginia-based National Science Foundation has just released its "Report on Support for Cloud Computing" - in response to the America Competes Reauthorization Act of 2010, Section 524. It is an absolute must-read for all concerned with current and future research projects in Cloud Computing.
"The volume of data we're generating now from machines pales in comparison to the volume of data we'll soon generate from our own bodies," says data security expert Dave Asprey. Writing in a Trend Micro blog, Asprey - who is one of the leaders in the emerging Quantified Self movement - explains his vision of a world in which personal biometrical data is shared via the cloud.
Cloud computing has caught the attention of business leaders around the world in every industry because of its enormous transformative potential. Visionary companies know that the value of the cloud is far greater than the current focus solely on technology and operating costs: when combined with a collaborative approach to designing processes, cloud computing will change how we do business.
Want to make sense of the hottest new concept in Enterprise IT? Want to understand in just hours what experts have spent many hundreds of days deciphering? Cloud computing is a technology that has rapidly evolving peppered with a lot of hype along the way. Customers find it hard to navigate through this and make sense of what aspects of this technology will give them real business benefit. Cloud Computing Bootcamp, led by our 2013 Bootcamp Instructor Larry Carvalho, is a great way to get a practical understanding of this technology. We offer multiple days of actionable insight into what vendor offerings are currently available and help you comprehend their strategy. The ever-popular Bootcamp, which is now held regularly around the world, is being held in conjunction with the 12th Cloud Expo, June 10-13, 2013, at the Javits Center, New York, NY.
Did you know that ninety percent of the data in the world has been created in the last two years? Every day, we create 2.5 quintillion (or 2.518) bytes of data, according to IBM. As corporations across all industries globally are struggling with how to retain, aggregate and analyze this mounting volume of what the industry refers to as Big Data, it also provides a unique opportunity for innovative startups that recognize the business prospects Big Data presents. Big Data is not just unlocking new information but new sources of economic and business value. Interactivity is driving Big Data, with people and machines both consuming and creating it. Digital companies focused on becoming good at aggregating and analyzing the data created by the end users of their product, who then provide their customers with solid insights taken from that data are at a distinct competitive advantage over others in the marketplace.
Industry-specific clouds are those PaaS, IaaS, and PaaS services that are tailored for a specific vertical, such as transportation, retail, finance, and health care. IDC sees a $65 billion market in these industry solutions for 2013, rising to $100 billion in 2016. The value of industry-specific clouds is that businesses within a vertical can connect to applications, processes, and databases that are pre-defined for that vertical within a public or private cloud. They can extend processes and databases into the business domain, versus defining the data and processes within a generic cloud-based platform. So, are industry specific clouds right for your business? What options are out there? How do you figure out the ROI?
SYS-CON Events announced today that Rackspace Hosting, the open cloud company, has been named "Platinum Plus Sponsor" of SYS-CON's 12th International Cloud Expo, which will take place on June 10-13, 2013, at the Javits Center in New York City, New York. Rackspace® Hosting (NYSE: RAX) is the open cloud company, delivering open technologies and powering more than 205,000 customers worldwide. Rackspace provides its renowned Fanatical Support® across a broad portfolio of IT products, including Public Cloud, Private Cloud, Hybrid Hosting and Dedicated Hosting. Rackspace has been recognized by Bloomberg BusinessWeek as a Top 100 Performing Technology Company, is featured on Fortune's list of 100 Best Companies to Work For and is included on the Dow Jones Sustainability Index. Rackspace was positioned in the Leaders Quadrant by Gartner Inc. in the "2011 Magic Quadrant for Managed Hosting." Rackspace is headquartered in San Antonio with offices and data centers around the world.
10th International Cloud Expo, held on June 11-14, 2012 at the Javits Center in New York City, featured four content-packed days with a rich array of sessions about the business and technical value of cloud computing led by exceptional speakers from every sector of the cloud computing ecosystem. The Cloud Expo series is the fastest-growing Enterprise IT event in the past 10 years, devoted to every aspect of delivering massively scalable enterprise IT as a service. We invite you to enjoy our photo album of the show - we'll be adding new images all week.
Ulitzer.com announced "the World's 30 most influential Cloud bloggers," who collectively generated more than 24 million Ulitzer page views. Ulitzer's annual "most influential Cloud bloggers" list was announced at Cloud Expo, which drew more delegates than all other Cloud-related events put together worldwide. "The world's 50 most influential Cloud bloggers 2010" list will be announced at the Cloud Expo 2010 East, which will take place April 19-21, 2010, at the Jacob Javitz Convention Center, in New York City, with more than 5,000 expected to attend.
Cloud computing is becoming one of the next industry buzz words. It joins the ranks of terms including: grid computing, utility computing, virtualization, clustering, etc. Cloud computing overlaps some of the concepts of distributed, grid and utility computing, however it does have its own meaning if contextually used correctly. The conceptual overlap is partly due to technology changes, usages and implementations over the years. Trends in usage of the terms from Google searches shows Cloud Computing is a relatively new term introduced in the past year. There has also been a decline in general interest of Grid, Utility and Distributed computing. Likely they will be around in usage for quit a while to come. But Cloud computing has become the new buzz word driven largely by marketing and service offerings from big corporate players like Google, IBM and Amazon.
SYS-CON Events announced today that Dell Inc. has been named "Silver Sponsor" of SYS-CON's 12th International Cloud Expo, which will take place on June 10-13, 2013, at the Javits Center in New York City, New York. For more than 28 years, Dell has empowered countries, communities, customers and people everywhere to use technology to realize their dreams. Customers trust Dell to deliver technology solutions that help them do and achieve more, whether they're at home, work, school or anywhere in their world. Learn more about Dell's story, purpose and people behind its customer-centric approach.
One of the most compelling promises of the cloud is that you can pull out a credit card and be working in minutes. No purchase orders to fill out, no equipment to wait for on the loading dock. Just instant access to the resources you need, when you need them. But accessibility comes at a price, and an unintentional consequence may be that you create yet another orphaned identity silo. Enterprise IT has spent years consolidating its mishmash of directories, only to discover that cloud now threatens to turn back their hard-won victories. In his session at the 12th International Cloud Expo, Scott Morrison, CTO and Chief Architect at Layer 7 Technologies, will look at strategies to incorporate identity into cloud applications. Enterprise identity or social login can both be a part of your go-to-cloud strategy, but you must plan for this upfront, rather than try to retrofit identity and access control at a later date.
Cloud Expo, Cloud Expo East, Cloud Expo West, Cloud Expo Silicon Valley, Cloud Expo Europe, Cloud Expo Tokyo, Cloud Expo Prague, Cloud Expo Hong Kong, Cloud Expo Sao Paolo are trademarks and /or registered trademarks (USPTO serial number 85009040) of Cloud Expo, Inc.
The World's Most Influential Blogs