| Original author(s) | Avinash Lakshman, Prashant Malik |
|---|---|
| Developer(s) | Apache Software Foundation |
| Initial release | 2008 |
| Stable release | 0.5.1 / February 26, 2010 |
| Written in | Java |
| Operating system | Cross-platform |
| Available in | English |
| Development status | Active |
| Type | Database |
| License | Apache License 2.0 (free software) |
| Website | cassandra.apache.org |
Cassandra is an open-source distributed database management system. It is an Apache Software Foundation top-level project[1] designed to handle very large amounts of data spread across many commodity servers while providing a highly available service with no single point of failure. It is a NoSQL solution that was initially developed by Facebook, where it powers the Inbox Search feature[2]. Jeff Hammerbacher, who led the Facebook Data team at the time, has described Cassandra as a BigTable data model running on an Amazon Dynamo-like infrastructure[3].
Cassandra provides a structured key-value store with eventual consistency[4]. Keys map to multiple values, which are grouped into column families. The column families are fixed when a Cassandra database is created, but columns can be added to a family at any time. Furthermore, columns are added only to specified keys, so different keys can have different numbers of columns in any given family. The values from a column family for each key are stored together, making Cassandra a hybrid between a column-oriented DBMS and a row-oriented store.
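The layout described above can be illustrated with a small in-memory model. This is a hypothetical sketch, not Cassandra's API or storage engine: the class name `ToyStore` and its methods are invented for illustration. It shows only three properties from the text: column families are fixed when the store is created, a column is added on demand to one key only, and a key's columns within a family are kept together.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


class ToyStore:
    """Hypothetical model of the data layout; not Cassandra's real API."""

    def __init__(self, column_families: Iterable[str]) -> None:
        # The set of column families is fixed when the store is created.
        self._families: Dict[str, Dict[str, Dict[str, Any]]] = {
            name: defaultdict(dict) for name in column_families
        }

    def insert(self, family: str, key: str, column: str, value: Any) -> None:
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        # A new column is created on demand, and only for this key.
        self._families[family][key][column] = value

    def get_row(self, family: str, key: str) -> Dict[str, Any]:
        # All of one key's columns in a family live together in one dict.
        return dict(self._families[family].get(key, {}))


store = ToyStore(["Users"])
store.insert("Users", "alice", "email", "alice@example.com")
store.insert("Users", "alice", "city", "Paris")
store.insert("Users", "bob", "email", "bob@example.com")
# "alice" now has two columns in "Users", while "bob" has only one.
```

Inserting into an undeclared family such as "Posts" raises `KeyError`, mirroring the rule that column families are fixed while columns are not.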
Cassandra was developed at Facebook by Avinash Lakshman (one of the authors of Amazon's Dynamo) and Prashant Malik (a Facebook engineer) to power the Inbox Search feature. It was open-sourced and released on Google Code in July 2008[3]. In March 2009 it became an Apache Incubator project[5], and on February 17, 2010 it graduated to a top-level project[1].
Every node in the cluster plays the same role, so there is no single point of failure.
Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.
Reads and writes each take a tunable ConsistencyLevel, ranging from "writes never fail" to "block for all replicas to be readable," with the quorum level in between.
Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.
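The trade-off behind the tunable ConsistencyLevel described above can be sketched as replica-count arithmetic. This is a simplified, hypothetical model: it covers only the ONE, QUORUM and ALL levels (the weakest write levels are omitted), assumes a replication factor N, and ignores node failures and repair. A quorum is a strict majority, N // 2 + 1; whenever the replicas a read waits for plus the replicas a write waits for exceed N, the two sets must share at least one replica.

```python
from enum import Enum


class Level(Enum):
    # Subset of consistency levels; the weakest write-only levels are omitted.
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def required_acks(level: Level, replication_factor: int) -> int:
    """Replicas that must respond before the operation is considered done."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    if level is Level.ONE:
        return 1
    if level is Level.QUORUM:
        # Strict majority of the replicas.
        return replication_factor // 2 + 1
    return replication_factor  # Level.ALL waits for every replica


def quorums_overlap(read: Level, write: Level, replication_factor: int) -> bool:
    """True if every read replica set must intersect every write replica set."""
    needed = (required_acks(read, replication_factor)
              + required_acks(write, replication_factor))
    return needed > replication_factor


# With N = 3: QUORUM reads plus QUORUM writes (2 + 2 > 3) always overlap,
# while ONE plus ONE (1 + 1 <= 3) may miss the latest write.
print(quorums_overlap(Level.QUORUM, Level.QUORUM, 3))  # True
print(quorums_overlap(Level.ONE, Level.ONE, 3))        # False
```

This is why quorum sits "in the middle": under these assumptions it guarantees that reads and writes overlap without blocking on every replica.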
A table in Cassandra is a distributed multidimensional map indexed by a key. The value is a highly structured object. The row key in a table is a string with no size restrictions, although it is typically 16 to 36 bytes long. Every operation under a single row key is atomic per replica, no matter how many columns are read or written. Columns are grouped into sets called column families, much as in the BigTable system. Cassandra exposes two kinds of column families: simple and super column families. A super column family can be visualized as a column family within a column family.
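The "multidimensional map" view can be made concrete with nested dictionaries. This is a hypothetical illustration only: the `put` helper and the family names "Profile" and "Posts" are invented, and Cassandra does not store data this way on disk. A simple column family adds one level of nesting under the row key, and a super column family adds one more.

```python
from typing import Any, Dict, Optional

# Hypothetical nested-dict view of one table (not Cassandra's storage format):
#   simple family: table[row_key][family][column] -> value
#   super family:  table[row_key][family][super_column][column] -> value
Table = Dict[str, Dict[str, Dict[str, Any]]]


def put(table: Table, row_key: str, family: str, column: str, value: Any,
        super_column: Optional[str] = None) -> None:
    families = table.setdefault(row_key, {})
    entries = families.setdefault(family, {})
    if super_column is None:
        entries[column] = value
    else:
        # A super column is itself a named map of columns: one extra level.
        entries.setdefault(super_column, {})[column] = value


table: Table = {}
put(table, "user42", "Profile", "name", "Alice")
put(table, "user42", "Posts", "post-1", "Hello", super_column="2010-02")
print(table["user42"]["Profile"]["name"])             # Alice
print(table["user42"]["Posts"]["2010-02"]["post-1"])  # Hello
```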
Applications can also specify the sort order of columns within a super column family or a simple column family. The system allows columns to be sorted either by time or by name. Time sorting of columns is exploited by applications such as Facebook Inbox Search, which always displays results in time-sorted order. A column in a simple column family is accessed using the convention column_family : column, and a column in a super column family is accessed using the convention column_family : super_column : column.
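Both sort orders and the two addressing conventions can be sketched in a few lines. The `Column` record, its integer `timestamp`, the newest-first direction for time order, and the `resolve` helper are assumptions made for illustration, not Cassandra's API. Paths are split on ":" with surrounding spaces ignored, so "column_family : column" and "column_family:column" resolve the same way.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # assumed integer write time


def sort_columns(columns: List[Column], by: str = "name") -> List[Column]:
    if by == "name":
        return sorted(columns, key=lambda c: c.name)
    if by == "time":
        # Newest first (assumed display order for a time-sorted view).
        return sorted(columns, key=lambda c: c.timestamp, reverse=True)
    raise ValueError("by must be 'name' or 'time'")


def resolve(row: Dict[str, Any], path: str) -> Any:
    """Follow 'family:column' or 'family:super_column:column' within one row."""
    node: Any = row
    for part in path.split(":"):
        node = node[part.strip()]
    return node


cols = [Column("b", "x", 9), Column("a", "y", 5), Column("c", "z", 1)]
print([c.name for c in sort_columns(cols)])             # ['a', 'b', 'c']
print([c.name for c in sort_columns(cols, by="time")])  # ['b', 'a', 'c']

row = {"Profile": {"name": "Alice"}, "Posts": {"2010-02": {"post-1": "Hello"}}}
print(resolve(row, "Profile : name"))                   # Alice
print(resolve(row, "Posts : 2010-02 : post-1"))         # Hello
```

A two-part path stops at a simple column, while a three-part path passes through a super column first, matching the two conventions in the text.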
Typically, applications use a dedicated Cassandra cluster and manage it as part of their service. Although the system supports multiple tables, all deployments have used only one table in their schema.