
Updated from Wikipedia, last check: May 31, 2012 17:12 UTC

From Wikipedia, the free encyclopedia

Apache Cassandra
Original author(s): Avinash Lakshman, Prashant Malik
Developer(s): Apache Software Foundation
Initial release: 2008
Stable release: 0.5.1 / February 26, 2010
Written in: Java
Operating system: Cross-platform
Available in: English
Development status: Active
Type: Database
License: Apache License 2.0 (free software)
Website: cassandra.apache.org

Cassandra is an open-source distributed database management system. It is an Apache Software Foundation top-level project[1] designed to handle very large amounts of data spread across many commodity servers, while providing a highly available service with no single point of failure. It is a NoSQL solution that was initially developed by Facebook, where it powers the Inbox Search feature[2]. Jeff Hammerbacher, who led the Facebook Data team at the time, has described Cassandra as a BigTable data model running on an Amazon Dynamo-like infrastructure[3].

Cassandra provides a structured key-value store with eventual consistency[4]. Keys map to multiple values, which are grouped into column families. The column families are fixed when a Cassandra database is created, but columns can be added to a family at any time. Furthermore, columns are added only to specified keys, so different keys can have different numbers of columns in any given family. The values from a column family for each key are stored together, making Cassandra a hybrid between a column-oriented DBMS and a row-oriented store.
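The description of keys, column families and per-key columns can be pictured as a nested map. The Python sketch below is a toy, in-memory model built on that assumption. `ToyColumnStore` and its methods are hypothetical names, not Cassandra's API. The sketch ignores distribution, persistence and consistency entirely.

```python
from collections import defaultdict


class ToyColumnStore:
    """Hypothetical in-memory model: row key -> column family -> {column: value}."""

    def __init__(self, column_families):
        # The set of column families is fixed when the store is created.
        self.column_families = frozenset(column_families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def insert(self, key, column_family, column, value):
        if column_family not in self.column_families:
            raise KeyError(f"unknown column family: {column_family}")
        # Columns are created on demand, and only for the row being written.
        self.rows[key][column_family][column] = value

    def get_row(self, key, column_family):
        # All of a key's columns in one family live together, so one lookup returns them.
        return dict(self.rows.get(key, {}).get(column_family, {}))


store = ToyColumnStore(["Users"])
store.insert("alice", "Users", "email", "alice@example.com")
store.insert("alice", "Users", "city", "Paris")
store.insert("bob", "Users", "email", "bob@example.com")

print(store.get_row("alice", "Users"))  # {'email': 'alice@example.com', 'city': 'Paris'}
print(store.get_row("bob", "Users"))    # {'email': 'bob@example.com'}
```

The two rows share a column family but hold different numbers of columns. This mirrors the point that columns are added only to the keys that are written.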


History

Cassandra was developed at Facebook by Avinash Lakshman (one of the authors of Amazon's Dynamo) and Prashant Malik (a Facebook engineer) to power the Inbox Search feature. It was open-sourced and released on Google Code in July 2008[3]. In March 2009 it became an Apache Incubator project[5], and on February 17, 2010 it graduated to a top-level project[1].

Features

Decentralized

Every node in the cluster is identical, so there is no single point of failure.

Fault-tolerant

Data is automatically replicated to multiple nodes for fault tolerance. Replication across multiple data centers is supported, and failed nodes can be replaced with no downtime.
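Replication without a master can be illustrated with a Dynamo-style token ring, in which any node can work out which nodes hold a given key. The sketch below assumes three things:

- each node is assigned a token;
- a key is stored on the first node whose token is at or after the key's token;
- further replicas go to the next nodes clockwise.

This roughly resembles Cassandra's simplest, rack-unaware placement. However, `Ring`, `key_token` and the use of MD5 here are illustrative assumptions, not Cassandra's code.

```python
import bisect
import hashlib


def key_token(key: str) -> int:
    # Hash a row key onto the ring. MD5 is an illustrative choice here.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical token ring: each node owns the range ending at its token."""

    def __init__(self, node_tokens):
        # node_tokens maps node name -> token; keep entries sorted by token.
        self.entries = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.entries]

    def replicas_for_token(self, tok, replication_factor):
        # First replica: first node whose token >= tok, wrapping past the end.
        start = bisect.bisect_left(self.tokens, tok) % len(self.entries)
        count = min(replication_factor, len(self.entries))
        # Remaining replicas: the next nodes walking clockwise around the ring.
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]

    def replicas(self, key, replication_factor):
        return self.replicas_for_token(key_token(key), replication_factor)


ring = Ring({"A": 0, "B": 100, "C": 200, "D": 300})
print(ring.replicas_for_token(150, 3))  # ['C', 'D', 'A']
print(ring.replicas_for_token(350, 2))  # ['A', 'B']
```

Placement depends only on the tokens, so every node computes the same replica list. A key therefore remains reachable while one replica is down, provided enough of its replicas are up for the requested consistency level.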

Tunable consistency

Reads and writes each take a tunable ConsistencyLevel, ranging from "writes never fail" to "block for all replicas to be readable," with the quorum level in the middle.
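The quorum level can be made concrete with the overlap rule commonly used for Dynamo-style systems. With N replicas, a read that waits for R of them is guaranteed to share at least one replica with a write acknowledged by W of them whenever R + W > N. The helper names below are hypothetical. ONE, QUORUM and ALL refer to the corresponding ConsistencyLevel values.

```python
def quorum(replication_factor: int) -> int:
    # A quorum is a strict majority of the N replicas.
    return replication_factor // 2 + 1


def reads_overlap_writes(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    # When R + W > N, any set of R replicas shares at least one member with any
    # set of W replicas, so a read reaches a replica that acknowledged the write.
    return read_replicas + write_replicas > replication_factor


N = 3
print(quorum(N))                                      # 2
print(reads_overlap_writes(quorum(N), quorum(N), N))  # True: QUORUM reads + QUORUM writes
print(reads_overlap_writes(1, 1, N))                  # False: ONE + ONE can return stale data
print(reads_overlap_writes(1, N, N))                  # True: ONE reads after ALL writes
```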

Elasticity

Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.
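The paragraph above does not describe how capacity is added. Assume a Dynamo-style token ring, in which each node owns the range of tokens ending at its own token. In that model, a joining node takes over part of a single neighbor's range, and every other key stays where it is. The toy functions below are hypothetical. They count ownership changes only and say nothing about throughput.

```python
import bisect


def owner(sorted_tokens, key_tok):
    # A key belongs to the first node token >= the key's token, wrapping around.
    i = bisect.bisect_left(sorted_tokens, key_tok) % len(sorted_tokens)
    return sorted_tokens[i]


def keys_moved_by_join(node_tokens, new_token, key_tokens):
    before = sorted(node_tokens)
    after = sorted(list(node_tokens) + [new_token])
    # Only keys whose owner differs before and after the join must be transferred.
    return [k for k in key_tokens if owner(before, k) != owner(after, k)]


keys = list(range(0, 400, 10))
moved = keys_moved_by_join([0, 100, 200, 300], 150, keys)
print(moved)  # [110, 120, 130, 140, 150]
```

Here only 5 of the 40 keys change hands, and all of them move to the new node.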

Data model

A table in Cassandra is a distributed multidimensional map indexed by a key. The value is a highly structured object. The row key in a table is a string with no size restriction, although it is typically 16 to 36 bytes long. Every operation under a single row key is atomic per replica, no matter how many columns are being read or written. Columns are grouped into sets called column families, much as in the BigTable system. Cassandra exposes two kinds of column families: Simple and Super column families. A Super column family can be visualized as a column family within a column family.

Applications can also specify the sort order of columns within a Super or Simple column family: columns can be sorted either by time or by name. Applications such as Facebook Inbox Search exploit time sorting, because their results are always displayed in time-sorted order. A column within a simple column family is addressed using the convention column_family : column. A column within a super column family is addressed using the convention column_family : super_column : column.
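Nested dictionaries can mock up the two column-family kinds, the name and time sort orders, and the two addressing conventions. Everything in the sketch below is hypothetical:

- the `inbox_search` and `users` data;
- the `(value, timestamp)` pairs;
- the helpers `get_path` and `ordered_columns`, which are not Cassandra APIs;
- newest-first time ordering.

```python
# Hypothetical super column family: row key -> super column -> column -> (value, timestamp).
inbox_search = {
    "user42": {
        "hello": {                          # super column: a search term (illustrative)
            "msg1": ("Hi there", 1001),     # column: message id -> (value, timestamp)
            "msg7": ("Hello again", 1005),
        },
    },
}

# Hypothetical simple column family: row key -> column -> value.
users = {"user42": {"email": "user42@example.com"}}


def get_path(column_family, row_key, *path):
    # Simple family: path is (column,), i.e. column_family : column.
    # Super family: path is (super_column, column), i.e. column_family : super_column : column.
    node = column_family[row_key]
    for part in path:
        node = node[part]
    return node


def ordered_columns(columns, by="name"):
    # Order column names by name, or by timestamp with the newest first (an assumed display order).
    if by == "name":
        return sorted(columns)
    if by == "time":
        return sorted(columns, key=lambda name: columns[name][1], reverse=True)
    raise ValueError(f"unsupported sort order: {by}")


print(get_path(users, "user42", "email"))                           # user42@example.com
print(get_path(inbox_search, "user42", "hello", "msg7"))            # ('Hello again', 1005)
print(ordered_columns(inbox_search["user42"]["hello"], by="time"))  # ['msg7', 'msg1']
```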

Typically, applications use a dedicated Cassandra cluster and manage it as part of their service. Although the system supports the notion of multiple tables, all deployments have only one table in their schema.

Prominent users

  • Facebook uses Cassandra to power Inbox Search, with over 200 nodes deployed.[2]
  • Digg, the largest social news website, announced on September 9, 2009 that it was rolling out Cassandra.[6]
  • Twitter switched over to Cassandra because it can run on large server clusters and can ingest very large amounts of data at a time.[7]
  • Rackspace is known to use Cassandra internally.[8]
  • Cisco's WebEx uses Cassandra to store user feeds and activity in near real time.[9]
  • IBM has researched building a scalable email system based on Cassandra.[10]
  • Reddit switched to Cassandra from MemcacheDB on March 12, 2010.[11]
  • Cloudkick relies on Cassandra's scalability to store billions of metrics.[12]

References







