| Developer(s) | Yunhong Gu |
|---|---|
| Stable release | 1.24a / 2009-09-19 |
| Written in | C++ |
| Operating system | Cross-platform |
| Development status | Active |
| Type | Distributed File System |
| License | BSD License |
| Website | http://sector.sourceforge.net/ |
Sector/Sphere is an open source software suite for high-performance distributed data storage and processing. It is broadly comparable to Google's GFS/MapReduce stack. Sector is a distributed file system that stores data across a large number of commodity computers. Sphere is the programming framework that supports generic MapReduce-style parallel data processing. Sector/Sphere is also distinguished by its ability to operate across a wide area network (WAN).
Sector/Sphere consists of four components. The security server maintains the system security policies such as user accounts and the IP access control list. One or more master servers control operations of the overall system in addition to responding to various user requests. The slave nodes store the data files and process them upon request. The clients are the users' computers from which system access and data processing requests are issued.

Sector is a user-space file system that relies on the local (native) file system of each node to store uploaded files.
Sector provides fault tolerance at the file system level through replication, so it does not require hardware fault tolerance such as RAID, which is usually expensive. The default replication strategy places replicas as far from each other as possible so that the system can support data distribution over wide area networks.
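The "as far from each other as possible" strategy can be illustrated with a greedy farthest-first selection. This is a minimal sketch, not Sector's actual placement code: the `(site, rack, node)` topology tuples, the `distance` metric, and the `place_replicas` function are all assumptions made for illustration.

```python
# Hypothetical sketch of farthest-first replica placement.
# Node addresses are (site, rack, node) tuples; this layout is assumed,
# not taken from Sector.

def distance(a, b):
    """Topology distance: path levels remaining below the common prefix."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return max(len(a), len(b)) - common


def place_replicas(nodes, first, count):
    """Pick `count` nodes, starting from `first`; each new replica is the
    candidate whose minimum distance to the chosen replicas is largest."""
    chosen = [first]
    candidates = [n for n in nodes if n != first]
    while len(chosen) < count and candidates:
        best = max(candidates,
                   key=lambda n: min(distance(n, c) for c in chosen))
        chosen.append(best)
        candidates.remove(best)
    return chosen


NODES = [
    ("chicago", "rack1", "n1"),
    ("chicago", "rack1", "n2"),
    ("chicago", "rack2", "n3"),
    ("tokyo", "rack1", "n4"),
]

replicas = place_replicas(NODES, NODES[0], 3)
```

With the sample topology, the second replica goes to the other site and the third to a different rack at the first site, so no two replicas share a rack.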
Sector does not split user files into blocks; instead, a user file is stored intact on the local file system of one or more slave nodes. Because each file must fit on a single node, Sector has a file size limitation, and whether that limit matters is application specific. The advantages, however, are that the Sector file system is very simple and that it gives better performance in Sphere parallel data processing, because less data is transferred between nodes. It also allows uploaded data to be accessed from outside the Sector system.
There are three methods for interacting with the Sector file system:
Sphere is a parallel data processing engine similar to MapReduce, but it uses generic User Defined Functions (UDFs) instead of the map and reduce functions. A UDF can act as a map function, a reduce function, or some other operation. In effect, Sphere decomposes MapReduce into generic UDFs that developers can choose and combine with greater flexibility.
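The idea of chaining generic UDFs can be sketched as follows. This is an illustration only and does not use the Sphere API: the convention that a UDF takes one record and returns a list of output records, and the names `run_stage`, `run_pipeline`, `tokenize`, `keep_long`, and `upper`, are assumptions.

```python
# Hypothetical sketch: a "UDF" here is any function that maps one input
# record to a list of zero or more output records.

def run_stage(udf, records):
    """Apply one UDF to every record and concatenate the outputs."""
    out = []
    for record in records:
        out.extend(udf(record))
    return out


def run_pipeline(udfs, records):
    """Feed the output of each UDF stage into the next one."""
    for udf in udfs:
        records = run_stage(udf, records)
    return records


def tokenize(line):
    """Map-like UDF: one line in, many words out."""
    return line.split()


def keep_long(word):
    """Filter-like UDF: drop words of four characters or fewer."""
    return [word] if len(word) > 4 else []


def upper(word):
    """Transform UDF: one word in, one word out."""
    return [word.upper()]


result = run_pipeline([tokenize, keep_long, upper],
                      ["sector stores files", "sphere runs udfs"])
```

Here no stage is tied to a fixed "map" or "reduce" role; stages can be reordered or dropped without changing the driver.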
Sphere uses hashing to simulate the Reduce operation. In a Sphere UDF, each data unit can be assigned a bucket ID, and data units with the same bucket ID are written to the same destination file.
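The bucket mechanism can be sketched as a hash-based shuffle followed by a per-bucket aggregation. This is a minimal sketch under stated assumptions: in-memory lists stand in for destination files, and `bucket_id`, `shuffle`, `reduce_bucket`, and the choice of CRC32 as the hash are illustrative, not taken from Sphere.

```python
import zlib
from collections import defaultdict

# Hypothetical sketch: buckets are in-memory lists standing in for the
# destination files described above.

def bucket_id(key, num_buckets):
    """Deterministic bucket ID for a key (CRC32 chosen for illustration)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def shuffle(records, num_buckets):
    """Group (key, value) records by bucket ID."""
    buckets = defaultdict(list)
    for key, value in records:
        buckets[bucket_id(key, num_buckets)].append((key, value))
    return dict(buckets)


def reduce_bucket(bucket):
    """Reduce-like step: sum the values per key within one bucket."""
    totals = {}
    for key, value in bucket:
        totals[key] = totals.get(key, 0) + value
    return totals


records = [("a", 1), ("b", 1), ("a", 1), ("c", 1)]
buckets = shuffle(records, 4)

# Every occurrence of a key hashes to the same bucket, so per-bucket
# results can be merged without combining partial sums.
merged = {}
for bucket in buckets.values():
    merged.update(reduce_bucket(bucket))
```

Because all records sharing a key land in one bucket, each bucket can be reduced independently, which is what lets hashing stand in for the Reduce phase.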