
Hybrid Clouds: A Comparison of Cloud Toolkits


In the last few years, the importance of the Internet has risen constantly, making it indispensable for businesses and most individuals to be online around the clock. One of the greatest drivers of this development was, and still is, the shift from the traditional one-to-many Web to an advanced, participatory version of the World Wide Web. Rather than only making editorial information accessible to many users, the Web 2.0 encourages participation and enables user-generated contributions. Leveraging this new paradigm, services like Flickr, Facebook, and Twitter have become prominent examples of this development.

An essential part of this evolution, but mostly hidden from the end consumer, is the set of tools that enables these large-scale applications. Cloud computing is a relatively new technology that serves as the underlying architecture for most of these platforms. By providing virtualized computing resources as a service in a pay-as-you-go manner, cloud computing enables new business models and cost-effective resource usage. Instead of having to maintain their own data center, companies can concentrate on their core business and purchase resources when needed. Especially when a privately maintained virtual infrastructure is combined with publicly accessible clouds in a hybrid cloud, the technology can open up new opportunities for businesses and help consolidate resources.

However, since cloud computing is a very young field, there are as many definitions of its components as there are opinions about its usefulness. Most of the corresponding technologies are only a few years old, and the toolkits lack maturity and interoperability.

This article introduces the basic concepts of cloud computing and discusses the technical requirements for setting up a hybrid cloud. It briefly looks into security concerns and outlines the status quo of current cloud technologies. In particular, it evaluates several existing cloud toolkits with regard to their requirements, common problems, and interoperability.


Download as PDF: This article is a slightly shortened version of my seminar paper. Feel free to download the original PDF version, or the presentation slides.


1. Cloud Computing

1.1. Status Quo

As the newest concept in the development of distributed computing, cloud computing is often believed to be “the next step in the evolution of the Internet” (Open Cloud Manifesto). As the foundation and enabler of Software as a Service, it delivers computing resources over the Internet and provides elastic scalability for any kind of application. While cluster and grid computing already allowed multiple computers to work together on complex tasks in a distributed manner, the cloud concept extends this idea even further: instead of regarding individual machines, cloud computing treats resources as a utility. That is, computing time and storage are provisioned on demand and paid per usage, without the need for any upfront commitment.

As one of the first commercial providers of cloud services, Amazon launched a beta version of its Elastic Compute Cloud (EC2) in August 2006 and announced production stability in October 2008. Google followed with a public beta of App Engine in April 2008, and Microsoft made its cloud platform Windows Azure publicly available in February 2010. The well-known alternatives to the commercial solutions are several open source cloud toolkits. Prominent examples include OpenNebula, a project started in 2008 by researchers of the universities of Chicago and Madrid, as well as the Eucalyptus cloud software, initiated by the University of California, Santa Barbara, in 2007.

Even though there are already many commercial and open source cloud solutions, all of them are fairly young and have yet to prove their acceptance and durability. According to the Open Cloud Manifesto, the technology “is still in its early stages, with much to learn and more experimentation to come”. This particularly includes challenges that have yet to be overcome, e.g. data security within the cloud, or interoperability between different clouds.

1.2. Definitions and Key Characteristics

Because the topic is so new, there are several opinions about what cloud computing and its corresponding terms comprise. Some experts see the technology as “one of the foundations of next generation computing” (Tim O’Reilly, CEO of O’Reilly Media, 2008), while others believe the term is just a buzzword for “everything that we currently do” (Larry Ellison, CEO of Oracle Corp., 2008).

However, while the term is being criticized, there are still many overlapping definitions describing the technology. Some are broader than others and include not only the technical part, but also the services enabled by the cloud, i.e. SaaS applications. For the Laboratory of Distributed Systems of the University of California, Berkeley, for instance, “cloud computing refers to both the applications delivered as services over the Internet and the hardware and systems software in the data centers that provide those services” (Armbrust et al., 2009). Not contradicting that definition, but limiting the term to hardware and software, IBM describes cloud computing as “an emerging computing paradigm where data and services reside in massively scalable data centers and can be ubiquitously accessed from any connected devices over the Internet” (O’Neill, 2009).

In this article, cloud computing is defined by the three key characteristics shared by many experts:

  • Resource abstraction: resources inside the cloud are not directly observable by the cloud user, but are virtualized using technologies like Xen or KVM, and can be accessed via an application programming interface (API).
  • Elastic capacity: the cloud appears to its users as a pool of infinite capacity, able to allocate and free resources on demand. Scaling up and down avoids over- and under-utilization and thereby allows an optimal load.
  • Utility-based pricing: storage, CPU time, and network bandwidth are charged by the hour using a pay-per-use pricing model. Resources can be allocated almost instantaneously without any upfront commitment.
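The utility-based pricing model is easy to illustrate with a short calculation. The sketch below charges for instance-hours, storage, and transferred data separately; all rates are hypothetical and merely modeled on the style of early IaaS price lists, not taken from any actual provider:

```python
# Illustrative pay-per-use cost calculation; all rates are hypothetical.
RATES = {
    "instance_hour": 0.10,     # $ per instance-hour of CPU time
    "storage_gb_month": 0.15,  # $ per GB-month of storage
    "transfer_gb": 0.12,       # $ per GB of network transfer
}

def monthly_cost(instance_hours, storage_gb, transfer_gb):
    """Return the total monthly cost under a pay-per-use model."""
    return (instance_hours * RATES["instance_hour"]
            + storage_gb * RATES["storage_gb_month"]
            + transfer_gb * RATES["transfer_gb"])

# One instance running around the clock for 30 days, 50 GB of storage,
# and 100 GB of outbound traffic:
cost = monthly_cost(30 * 24, 50, 100)
print(round(cost, 2))  # prints 91.5
```

The point of the model is visible in the function signature: the bill is a pure function of consumption, with no upfront or fixed component.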

1.3. Classifications

Similar to the attempts to define the term cloud computing, categorizing it is rather difficult, if not “impossible in the currently rapid evolution of the cloud landscape” (Lenk et al., 2009). However, many papers classify cloud systems by their level of abstraction and their exposure to the Internet.

1.3.1. Service Models: Abstraction Classes

In order to create the illusion of infinite resources and elasticity, virtualization technology is needed. Depending on how abstracted the resources are, different service models are distinguished (cf. NIST, 2009, and Armbrust et al., 2009):

  • Software as a Service (SaaS): at the highest level of abstraction, users are mostly unaware of the fact that they are using cloud-enabled applications, and are hence not able to control the underlying resources. Instead, they simply use client interfaces such as web browsers. A popular example is the salesforce.com CRM system.
  • Platform as a Service (PaaS): users are able to develop and deploy applications within the provider’s hosting environment, e.g. a Java application framework. Low-level resources are not controlled by the cloud user. A prominent example is the Google App Engine.
  • Infrastructure as a Service (IaaS): at the lowest level of abstraction, cloud users have access to virtualized resources such as processing time, networking or storage. They are provided with virtual machines and can run arbitrary software. A famous example is Amazon EC2.

1.3.2. Deployment Models: Exposure Classes

Not every cloud is available for public use: depending on the level of exposure to the Internet, the following deployment models are differentiated (NIST, 2009):

  • Public Cloud: the cloud infrastructure is publicly accessible via an API. It is hosted by a cloud provider who sells its capacity using a pay-per-use payment model. All of the above-mentioned examples are public clouds.
  • Private Cloud: the cloud infrastructure is hosted within the data center of an organization and used by local users only. It focuses on providing a flexible virtualized infrastructure rather than on selling the cloud resources.
  • Hybrid Cloud: the hybrid cloud approach extends the private cloud model by using both local and remote resources. It is typically used to handle flash crowds by scaling out when the local capacity is exhausted. This so-called cloudbursting enables highly elastic environments. The key difference between private and hybrid clouds is “the extension of service provider-oriented low cost cloud storage to the enterprise” (Lesem, 2010). That is, remote cloud resources are seamlessly integrated into the private cloud, thereby creating a hybrid cloud.
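The cloudbursting behavior described above can be sketched in a few lines: a placement routine starts new virtual machines on the private cloud while capacity remains, and "bursts" to a public provider once the local infrastructure is exhausted. The class and its thresholds below are hypothetical simplifications; real toolkits such as OpenNebula implement comparable logic in their schedulers and drivers:

```python
# Minimal cloudbursting placement sketch (hypothetical, for illustration):
# prefer the private cloud, fall back to a public provider when the
# local capacity is exhausted.

class HybridScheduler:
    def __init__(self, local_capacity):
        self.local_capacity = local_capacity  # max VMs the private cloud can host
        self.local_running = 0
        self.public_running = 0

    def place_vm(self):
        """Start one VM and return where it was placed."""
        if self.local_running < self.local_capacity:
            self.local_running += 1
            return "private"
        self.public_running += 1  # cloudburst: scale out to the public cloud
        return "public"

    def release_vm(self):
        """Terminate one VM, public instances first (they are billed per hour)."""
        if self.public_running > 0:
            self.public_running -= 1
        elif self.local_running > 0:
            self.local_running -= 1

# A flash crowd forces the fourth VM onto the public cloud:
s = HybridScheduler(local_capacity=3)
placements = [s.place_vm() for _ in range(4)]
print(placements)  # ['private', 'private', 'private', 'public']
```

Releasing public instances before private ones reflects the economics of the hybrid model: the remote, pay-per-use resources are the elastic overflow buffer, not the baseline.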

>> Next chapter: Hybrid Clouds

