The Full Wiki

Sector/Sphere: Wikis




Updated live from Wikipedia, last check: June 04, 2012 14:31 UTC

From Wikipedia, the free encyclopedia

Sector/Sphere Distributed Data Storage and Processing System
Developer(s): Yunhong Gu
Stable release: 1.24a / 2009-09-19
Written in: C++
Operating system: Cross-platform
Development status: Active
Type: Distributed file system
License: BSD License
Website: http://sector.sourceforge.net/

Sector/Sphere is an open-source software suite for high-performance distributed data storage and processing. It can be broadly compared to Google's GFS/MapReduce stack. Sector is a distributed file system that stores data across a large number of commodity computers. Sphere is the programming framework that supports generic, MapReduce-style parallel data processing. Additionally, Sector/Sphere is designed to operate across wide area networks (WANs).


Architecture

Sector/Sphere consists of four components. The security server maintains system security policies, such as user accounts and the IP access control list. One or more master servers control the operation of the overall system and respond to user requests. The slave nodes store the data files and process them upon request. The clients are the users' computers, from which system access and data processing requests are issued.

[Figure: Sector/Sphere architecture (Sector-arch.jpg)]

Sector Distributed File System

Sector is a user-space file system that relies on the local (native) file system of each node to store uploaded files.

Sector provides fault tolerance at the file system level through replication, so it does not require hardware fault tolerance such as RAID, which is usually very expensive. The default replication strategy places replicas as far from each other as possible, so that the system can support data distribution over wide area networks.
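The "as far from each other as possible" strategy can be illustrated with a small sketch. This is not Sector's actual placement code: it assumes each node's location is described by a hypothetical path such as (country, city, rack), measures distance as the number of hops between two paths in that hierarchy, and uses a simple greedy farthest-first selection. The names `topology_distance`, `place_replicas`, and `example_nodes` are invented for illustration.

```python
def topology_distance(a, b):
    """Hops between two locations given as paths, e.g. ("us", "chicago", "rack1").

    Assumption: distance is the number of path components that differ once
    the shared prefix is removed (an illustrative metric, not Sector's).
    """
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def place_replicas(nodes, replicas):
    """Greedy farthest-first choice of nodes to hold replicas.

    nodes: dict mapping node name -> location path.
    Starts from the alphabetically first node, then repeatedly adds the
    node whose nearest already-chosen replica is farthest away.
    """
    if replicas <= 0 or not nodes:
        return []
    names = sorted(nodes)
    chosen = [names[0]]
    while len(chosen) < min(replicas, len(names)):
        candidates = [n for n in names if n not in chosen]
        best = max(
            candidates,
            key=lambda n: min(topology_distance(nodes[n], nodes[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen


# Hypothetical four-node deployment spanning two countries.
example_nodes = {
    "a": ("us", "chicago", "rack1"),
    "b": ("us", "chicago", "rack2"),
    "c": ("us", "sandiego", "rack1"),
    "d": ("jp", "tokyo", "rack1"),
}
```

With three replicas, `place_replicas(example_nodes, 3)` picks `a`, then `d` (a different country), then `c` (a different city), and leaves out `b`, which sits next to `a` in the same city.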

Sector does not split user files into blocks; instead, each user file is stored intact on the local file system of one or more slave nodes. As a result, Sector has a file size limitation that is application specific, since a file must fit on a single slave node. The advantages, however, are that the Sector file system is very simple and that Sphere parallel data processing performs better because less data is transferred between nodes. Storing files intact also allows uploaded data to be accessed from outside the Sector system.

There are three methods for interacting with the Sector file system:

  • Sector provides an API for application development, which allows user applications to interact directly with Sector,
  • Sector comes prepackaged with a set of command-line tools for accessing the file system, and
  • Sector supports the FUSE interface, presenting a mountable file system that is accessible via standard command-line tools.

Sphere Parallel Data Processing Engine

Sphere is a parallel data processing engine similar to MapReduce, but it uses generic User Defined Functions (UDFs) instead of fixed map and reduce functions. A UDF can be a map function, a reduce function, or some other kind of function. In effect, Sphere breaks MapReduce down into generic UDFs, which developers can choose and combine with greater flexibility.
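The idea of combining generic UDFs can be sketched in a few lines. The following is an illustration of the concept only, not Sphere's API (Sphere itself is written in C++). It assumes a UDF is any function that takes one record and yields zero or more output records; the names `split_words`, `drop_short`, `to_lower`, `apply_udf`, and `run_pipeline` are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List

# Assumption: a UDF maps one input record to zero or more output records.
UDF = Callable[[str], Iterable[str]]


def split_words(line: str) -> Iterator[str]:
    """Map-like UDF: one line in, many words out."""
    yield from line.split()


def drop_short(word: str) -> Iterator[str]:
    """Filter-like UDF: emits nothing for words of three letters or fewer."""
    if len(word) > 3:
        yield word


def to_lower(word: str) -> Iterator[str]:
    """Transform UDF: one record in, one record out."""
    yield word.lower()


def apply_udf(records: Iterable[str], udf: UDF) -> Iterator[str]:
    """Run a single UDF over every record in a stream."""
    for record in records:
        yield from udf(record)


def run_pipeline(records: Iterable[str], udfs: List[UDF]) -> List[str]:
    """Chain any number of UDFs, feeding each one's output to the next."""
    stream: Iterable[str] = records
    for udf in udfs:
        stream = apply_udf(stream, udf)
    return list(stream)
```

For example, `run_pipeline(["Sector is a file system"], [split_words, drop_short, to_lower])` returns `["sector", "file", "system"]`. Because every stage has the same shape, stages can be reordered, dropped, or extended without a fixed map-then-reduce structure.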

Sphere uses hashing to simulate the Reduce operation. In a Sphere UDF, each data unit can be assigned a bucket ID, and data units with the same bucket ID are written to the same destination file.
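A minimal sketch of this bucket mechanism follows, under stated assumptions: in-memory lists stand in for destination files, the bucket ID is a CRC32 hash of the record modulo the number of buckets, and the function names (`bucket_id`, `shuffle_into_buckets`, `count_bucket`) are hypothetical rather than part of Sphere.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def bucket_id(key: str, num_buckets: int) -> int:
    """Assign a stable bucket ID to a key (CRC32 is an illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def shuffle_into_buckets(words: Iterable[str], num_buckets: int) -> Dict[int, List[str]]:
    """First stage: tag each data unit with a bucket ID and group by it.

    Each list plays the role of one destination file.
    """
    buckets: Dict[int, List[str]] = defaultdict(list)
    for word in words:
        buckets[bucket_id(word, num_buckets)].append(word)
    return dict(buckets)


def count_bucket(records: Iterable[str]) -> Dict[str, int]:
    """Second stage: process one bucket on its own.

    All copies of a word share a bucket, so counting within a single
    bucket yields that word's complete total, as a reduce step would.
    """
    counts: Dict[str, int] = {}
    for record in records:
        counts[record] = counts.get(record, 0) + 1
    return counts
```

Since equal keys always hash to the same bucket, each bucket can be processed independently and in parallel, and the per-bucket results can simply be merged without any further combining.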
