Database: Wikis

  
  
  
  

Note: Many of our articles have direct quotes from sources you can cite, within the Wikipedia article! This article doesn't yet, but we're working on it! See more info or our list of citable articles.

Did you know ...


More interesting facts on Database

Include this on your site/blog:

Encyclopedia

From Wikipedia, the free encyclopedia

A database is a collection of data for one or more multiple uses. One way of classifying databases involves the type of content, for example: bibliographic, full-text, numeric, image. Other classification methods start from examining database models or database architectures: see below. Software organizes the data in a database according to a database model. As of 2010 the relational model occurs most commonly. Other models such as the hierarchical model and the network model use a more explicit representation of relationships.

Contents

Architecture

A number of database architectures exist. Many databases use a combination of strategies.

Databases consist of software-based "containers" that are structured to collect and store information so users can retrieve, add, update or remove such information in an automatic fashion. Database programs are designed for users so that they can add or delete any information needed. The structure of a database is tabular, consisting of rows and columns of information.

Online Transaction Processing systems (OLTP) often use a "row oriented" or an "object oriented" data store architecture, whereas data-warehouse and other retrieval focused applications like Google's BigTable, or bibliographic database (library catalog) systems may use a Column oriented DBMS architecture.

Document-Oriented, XML, knowledgebases, as well as frame databases and RDF-stores (also known as triple stores), may also use a combination of these architectures in their implementation.

Not all databases have or need a database schema ("schema-less databases").

Over many years general-purpose database systems have dominated the database industry. These offer a wide range of functions, applicable to many, if not most circumstances in modern data processing. These have been enhanced with extensible datatypes (pioneered in the PostgreSQL project) to allow development of a very wide range of applications.

There are also other types of databases which cannot be classified as relational databases. Most notable is the object database management system, which stores language objects natively without using a separate data definition language and without translating into a separate storage schema. Unlike relational systems, these object databases store the relationship between complex data types as part of their storage model in a way that does not require runtime calculation of related data using relational algebra execution algorithms.

Database management systems

A database management system (DBMS) consists of software that organizes the storage of data. A DBMS controls the creation, maintenance, and use of the database storage structures of social organizations and of their users. It allows organizations to place control of organization wide database development in the hands of Database Administrators (DBAs) and other specialists. In large systems, a DBMS allows users and other software to store and retrieve data in a structured way.

Database management systems are usually categorized according to the database model that they support, such as the network, relational or object model. The model tends to determine the query languages that are available to access the database. One commonly used query language for the relational database is SQL, although SQL syntax and function can vary from one DBMS to another. A common query language for the object database is OQL, although not all vendors of object databases implement this, majority of them do implement this method. A great deal of the internal engineering of a DBMS is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between the products.

A relational database management system (RDBMS) implements features of the relational model. In this context, Date's "Information Principle" states: "the entire information content of the database is represented in one and only one way. Namely as explicit values in column positions (attributes) and rows in relations (tuples). Therefore, there are no explicit pointers between related tables." This contrasts with the object database management system (ODBMS), which does store explicit pointers between related types.

Components of DBMS

According to the wikibooks open-content textbooks, "Design of Main Memory Database System/Overview of DBMS", most DBMS as of 2009 implement a relational model. Other less-used DBMS systems, such as the object DBMS, generally operate in areas of application-specific data management where performance and scalability take higher priority than the flexibility of ad hoc query capabilities provided via the relational-algebra execution algorithms of a relational DBMS.

RDBMS components

  • Interface drivers - A user or application program initiates either schema modification or content modification. These drivers are built on top of SQL. They provide methods to prepare statements, execute statements, fetch results, etc. Examples include DDL, DCL, DML, ODBC, and JDBC. Some vendors provide language-specific proprietary interfaces. For example MySQL provides drivers for PHP, Python, etc.
  • SQL engine - This component interprets and executes the SQL query. It comprises three major components (compiler, optimizer, and execution engine).
  • Transaction engine - Transactions are sequences of operations that read or write database elements, which are grouped together.
  • Relational engine - Relational objects such as Table, Index, and Referential integrity constraints are implemented in this component.
  • Storage engine - This component stores and retrieves data records. It also provides a mechanism to store metadata and control information such as undo logs, redo logs, lock tables, etc.

ODBMS components

  • Language drivers - A user or application program initiates either schema modification or content modification via the chosen programming language. The drivers then provide the mechanism to manage object lifecycle coupling of the application memory space with the underlying persistent storage. Examples include C++, Java, .NET, and Ruby.
  • Query engine - This component interprets and executes language-specific query commands in the form of OQL, LINQ, JDOQL, JPAQL, others. The query engine returns language specific collections of objects which satisfy a query predicate expressed as logical operators e.g >, <, >=, <=, AND, OR, NOT, GroupBY, etc.
  • Transaction engine - Transactions are sequences of operations that read or write database elements, which are grouped together. The transaction engine is concerned with such things as data isolation and consistency in the driver cache and data volumes by coordinating with the storage engine.
  • Storage engine - This component stores and retrieves objects in an arbitrarily complex model. It also provides a mechanism to manage and store metadata and control information such as undo logs, redo logs, lock graphs,

Primary tasks of DBMS packages

  • Database Development: used to define and organize the content, relationships, and structure of the data needed to build a database.
  • Database Interrogation: can access the data in a database for information retrieval and report generation. End users can selectively retrieve and display information and produce printed reports and documents.
  • Database Maintenance: used to add, delete, update, correct, and protect the data in a database.
  • Application Development: used to develop prototypes of data entry screens, queries, forms, reports, tables, and labels for a prototyped application. Or use 4GL or 4th Generation Language or application generator to develop program codes.

Types

Operational database

These databases store detailed data needed to support the operations of an entire organization. They are also called subject-area databases (SADB), transaction databases, and production databases. For example:

  • customer databases
  • personal database
  • inventory databases
  • accounting databases

Analytical database

These databases store data and information extracted from selected operational and external databases. They consist of summarized data and information most needed by an organization's management and other end-users. Some people refer to analytical databases as multidimensional databases, management databases, or information databases.

Data warehouse

A data warehouse stores data from current and previous years — data extracted from the various operational databases of an organization. It becomes the central source of data that has been screened, edited, standardized and integrated so that it can be used by managers and other end-user professionals throughout an organization. Data warehouses are characterized by being slow to insert into but fast to retrieve from. Recent developments in data warehousing have led to the use of a Shared nothing architecture to facilitate extreme scaling.

Distributed database

These are databases of local work-groups and departments at regional offices, branch offices, manufacturing plants and other work sites. These databases can include segments of both common operational and common user databases, as well as data generated and used only at a user’s own site.

End-user database

These databases consist of a variety of data files developed by end-users at their workstations. Examples of these are collections of documents in spreadsheets, word processing and even downloaded files.

External database

These databases provide access to external, privately-owned data online — available for a fee to end-users and organizations from commercial services. Access to a wealth of information from external database is available for a fee from commercial online services and with or without charge from many sources in the Internet.

Hypermedia databases on the web

These are a set of interconnected multimedia pages at a web-site. They consist of a home page and other hyperlinked pages[citation needed] of multimedia or mixed media such as text, graphic, photographic images, video clips, audio etc.

Navigational database

In navigational databases, queries find objects primarily by following references from other objects. Traditionally navigational interfaces are procedural, though one could characterize some modern systems like XPath as being simultaneously navigational and declarative.

In-memory databases

In-memory databases primarily rely on main memory for computer data storage. This contrasts with database management systems which employ a disk-based storage mechanism. Main memory databases are faster than disk-optimized databases since[citation needed] the internal optimization algorithms are simpler and execute fewer CPU instructions. Accessing data in memory provides faster and more predictable performance than disk. In applications where response time is critical, such as telecommunications network equipment that operates emergency systems, main memory databases are often used.

Document-oriented databases

Document-oriented databases are computer programs designed for document-oriented applications. These systems may be implemented as a layer above a relational database or an object database. As opposed to relational databases, document-based databases do not store data in tables with uniform sized fields for each record. Instead, they store each record as a document that has certain characteristics. Any number of fields of any length can be added to a document. Fields can also contain multiple pieces of data.

Real-time databases

A real-time database is a processing system designed to handle workloads whose state may change constantly. This differs from traditional databases containing persistent data, mostly unaffected by time. For example, a stock market changes rapidly and dynamically. Real-time processing means that a transaction is processed fast enough for the result to come back and be acted on right away. Real-time databases are useful for accounting, banking, law, medical records, multi-media, process control, reservation systems, and scientific data analysis. As computers increase in power and can store more data, real-time databases become integrated into society and are employed in many applications.

Relational Database

The standard of business computing as of 2009, relational databases are the most commonly used database today[citation needed]. It uses the table to structure information so that it can be readily and easily searched through.

Models

Post-relational database models

Products offering a more general data model than the relational model are sometimes classified as post-relational[1]. The data model in such products incorporates relations but is not constrained by the Information Principle, which requires the representation of all information by data values in relation to it.

Some of these extensions to the relational model actually integrate concepts from technologies that pre-date the relational model. For example, they allow representation of a directed graph with trees on the nodes.

Some products implementing such models do so by extending relational database systems with non-relational features. Others, however, have arrived in much the same place by adding relational features to pre-relational systems. Paradoxically, this allows products that are historically pre-relational, such as PICK and MUMPS, to make a plausible claim to be post-relational in their current architecture.

Object database models

In recent years, the object-oriented paradigm has been applied in areas such as engineering and spatial databases, telecommunications and in various scientific domains. The conglomeration of object oriented programming and database technology leads to new programming models known as object databases. These databases attempt to bring the database world and the application-programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.

A variety of these ways have been tried for storing objects in a database. Some products have approached the problem from the application-programming side, by making the objects manipulated by the program persistent. This also typically requires the addition of some kind of query language, since conventional programming languages do not provide language-level functionality for finding objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.

Storage structures

Databases may store relational tables/indexes in memory or on hard disk in one of many forms:

These have various advantages and disadvantages - discussed further in the articles on each topic. The most commonly used[citation needed] are B+ trees and ISAM.

Object databases use a range of storage mechanisms. Some use virtual memory-mapped files to make the native language (C++, Java etc.) objects persistent. This can be highly efficient but it can make multi-language access more difficult. Others break the objects down into fixed- and varying-length components that are then clustered tightly together in fixed sized blocks on disk and reassembled into the appropriate format either for the client or in the client address space. Another popular technique involves storing the objects in tuples (much like a relational database) which the database server then reassembles for the client.

Other important design choices relate to the clustering of data by category (such as grouping data by month, or location), creating pre-computed views known as materialized views, partitioning data by range or hash. Memory management and storage topology can be important design choices for database designers as well. Just as normalization is used to reduce storage requirements and improve the extensibility of the database, conversely denormalization is often used to reduce join complexity and reduce execution time for queries.[2]

Indexing

All of these databases can take advantage of indexing to increase their speed. This technology has advanced tremendously since its early uses in the 1960s and 1970s. The most common[citation needed] kind of index uses a sorted list of the contents of some particular table column, with pointers to the row associated with the value. An index allows a set of table rows matching some criterion to be quickly located. Typically, indexes are also stored in the various forms of data-structure mentioned above (such as B-trees, hashes, and linked lists). Usually, a database designer selects specific techniques to increase efficiency in the particular case of the type of index required.

Most relational DBMSs and some object DBMSs have the advantage that indexes can be created or dropped without changing existing applications making use of them, The database chooses between many different strategies based on which one it estimates will run the fastest. In other words, indexes act transparently to the application or end-user querying the database; while they affect performance, any SQL command will run with or without indexes to compute the result of an SQL statement. The RDBMS will produce a query plan of how to execute the query: often generated by analyzing the run times of the different algorithms and select the quickest process. Some of the key algorithms that deal with joins are nested loop join, sort-merge join and hash join. Which of these an RDBMS selects may depend on whether an index exists, what type it is, and its cardinality.

An index speeds up access to data, but it has disadvantages as well. First, every index increases the amount of storage used on the hard drive which is also necessary for the database file, and second, the index must be updated each time the data are altered, and this costs time. (Thus an index saves time in the reading of data, but it costs time in entering and altering data. It thus depends on the use to which the data are to be put whether an index is overall a net plus or minus in the quest for efficiency.)

A special case of an index is a primary index based on a primary key: a primary index must ensure a unique reference to a record. Often, for this purpose one simply uses a running index-number (ID number). Primary indexes play a significant role in relational databases, and they can speed up access to data considerably.

Transactions and concurrency

In addition to their data model, most practical databases ("transactional databases") attempt to enforce database transactions. Ideally, the database software should enforce the ACID rules, summarized here:

  • Atomicity: Either all the tasks in a transaction must happen, or none of them. The transaction must be completed, or else it must be undone (rolled back).
  • Consistency: Every transaction must preserve the integrity constraints — the declared consistency rules — of the database. It cannot leave the data in a contradictory state.
  • Isolation: Two simultaneous transactions cannot interfere with one another. Intermediate results within a transaction must remain invisible to other transactions.
  • Durability: Completed transactions cannot be aborted later or their results discarded. They must persist through (for instance) restarts of the DBMS after crashes.

In practice, many DBMSs allow the selective relaxation of most of these rules — for better performance.

Concurrency control ensures that transactions execute in a safe manner and follow the ACID rules. The DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.

Replication

Replication of databases often relates closely to transactions. If a database can log its individual actions, one can create a duplicate of the data in real time. DBAs can use the duplicate to improve performance and/or the availability of the whole database system.

Common replication concepts include:

  • master/slave Replication: All write-requests are performed on the master and then replicated to the slave(s)
  • quorum: The result of Read and Write requests are calculated by querying a "majority" of replicas
  • multimaster: Two or more replicas sync each other via a transaction identifier

Parallel synchronous replication of databases enables the replication of transactions on multiple servers simultaneously, which provides a method for backup and security as well as data availability. This is commonly referred to as "database clustering".[3] [4]

Security

Database security denotes the system, processes, and procedures that protect a database from unintended activity. Enforcing security is one of the major tasks of the DBA.

DBMSs usually enforce security through access control, auditing, and encryption:

  • Access control ensures and restricts who can connect and what they can do to the database.
  • Auditing logs what action or change has been performed, when and by whom.
  • Encryption: many commercial databases include built-in encryption mechanisms to encode data natively into tables and to decipher information "on the fly" when a query comes in. DBAs can also secure and encrypt connections if required using DSA, MD5, SSL or legacy encryption standards.

In the United Kingdom, legislation protecting the public from unauthorized disclosure of personal information held on databases falls under the Office of the Information Commissioner. Organizations based in the United Kingdom and holding personal data in electronic format (databases for example) must register with the Data Commissioner.[5]

Locking

Databases handle multiple concurrent operations with locking. This is how concurrency and some form of basic integrity are managed within the database system. Such locks can be applied on a row level, or on other levels like page (a basic data block), extent (multiple array of pages) or even an entire table. This helps maintain the integrity of the data by ensuring that only one process at a time can modify the same data.

In basic filesystem files or folders, only one lock at a time can be set[citation needed], restricting the usage to one process only. Databases, on the other hand, can set and hold mutiple locks at the same time on the different levels of the physical data structure. The database engine locking scheme determines how to set and maintain locks based on the submitted SQL or transactions by the users. Generally speaking, any activity on the database should involve some or extensive locking.

As of 2009 most DBMS systems use shared and exclusive locks. Exclusive locks mean that no other lock can acquire the current data object as long as the exclusive lock lasts. DBMSs usually set exclusive locks when the database needs to change data, as during an UPDATE or DELETE operation.

Shared locks can take ownership one from the other of the current data structure.[6] Shared locks are usually used while the database is reading data (during a SELECT operation). The number, nature of locks and time the lock holds a data block can have a huge impact on the database performances. Bad locking can lead to disastrous performance response (usually the result of poor SQL requests, or inadequate database physical structure)

The isolation level of the data server enforces default locking behavior. Changing the isolation level will affect how shared or exclusive locks must be set on the data for the entire database system. Default isolation is generally 1, where data can not be read while it is modified, forbidding the return of "ghost data" to end users.

At some point intensive or inappropriate exclusive locking can lead to a "deadlock" situation between two locks, where none of the locks can be released because they try to acquire resources mutually from each other. The database should have a fail-safe mechanism which will automatically "sacrifice" one of the locks, thus releasing the resource. Processes or transactions involved in the "deadlock" get rolled back.

Databases can also be locked for other reasons, like access restrictions for given levels of user. Some DBAs also lock databases for routine maintenance, which prevents changes being made during the maintenance. See "Locking tables and databases" (section in some documentation / explanation from IBM) for more detail.) However, many modern databases do not lock the database during routine maintenance. e.g. "Routine Database Maintenance" for PostgreSQL.

Applications

Databases function in many applications, spanning virtually the entire range of computer software. Databases have become the preferred method of storage for large multiuser applications, where coordination between many users is needed. Even individual users find them convenient, and many electronic mail programs and personal organizers are based on standard database technology. Software database drivers are available for most database platforms so that application software can use a common API to retrieve the information stored in a database. Commonly used database APIs include JDBC and ODBC.

Databases in new media

Within new media, databases collect items on which the user can carry out various operations such as viewing, navigating, create, and searching. Though there are various types of items within the database, each item has the same significance. Unlike a narrative or film, the collections are computerized and therefore may offer a unique experience with each view. This form of data may present a unique presentation of what the world is like. Databases can be seen as a symbolic form of the computer age.[citation needed]

See also

References

  1. ^ Introducing databases by Stephen Chu, in Conrick, M. (2006) Health informatics: transforming healthcare with technology, Thomson, ISBN 0-17-012731-1, p. 69.
  2. ^ Lightstone 2007, p. ?.
  3. ^ MySQL Cluster
  4. ^ Oracle Real Application Cluster (RAC)
  5. ^ Information Commissioner's Office - ICO
  6. ^ "Information on Shared Locks"

Further reading

  • Ling Liu and Tamer M. Özsu (Eds.) (2009). "Encyclopedia of Database Systems, 4100 p. 60 illus. ISBN 978-0-387-49616-0. Table of Content available at http://refworks.springer.com/mrw/index.php?id=1217
  • Beynon-Davies, P. (2004). Database Systems. 3rd Edition. Palgrave, Houndmills, Basingstoke.
  • Connolly, Thomas and Carolyn Begg. Database Systems. New York: Harlow, 2002.
  • Date, C. J. An Introduction to Database Systems, Eighth Edition, Addison Wesley, 2003.
  • Galindo, J.; Urrutia, A.; Piattini, M. Fuzzy Databases: Modeling, Design and Implementation (FSQL guide). Idea Group Publishing Hershey, USA, 2006.
  • Galindo, J., Ed. Handbook on Fuzzy Information Processing in Databases. Hershey, PA: Information Science Reference (an imprint of Idea Group Inc.), 2008.
  • Gray, J. and Reuter, A. Transaction Processing: Concepts and Techniques, 1st edition, Morgan Kaufmann Publishers, 1992.
  • Kroenke, David M. Database Processing: Fundamentals, Design, and Implementation (1997), Prentice-Hall, Inc., pages 130-144.
  • Kroenke, David M. and David J. Auer. Database Concepts. 3rd ed. New York: Prentice, 2007.
  • Lightstone, S.; T. Teorey and T. Nadeau, Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more, Morgan Kaufmann Press, 2007. ISBN 0-12-369389-6.
  • O'Brien, James. "Management Information Systems". New York 1999
  • Shih, J. "Why Synchronous Parallel Transaction Replication is Hard, But Inevitable?", white paper, 2007.
  • Teorey, T.; Lightstone, S. and Nadeau, T. Database Modeling & Design: Logical Design, 4th edition, Morgan Kaufmann Press, 2005. ISBN 0-12-685352-5
  • Tukey, John W. Exploratory Data Analysis. Reading, MA: Addison Wesley, 1977.
  • Manovich, Lev.Database as a Symbolic Form, Cambridge: MIT press, 2001

External links


Simple English

A database is a system for organizing data. It is a collection of raw data that can be manipulated, sorted, and questioned to produce information. The data can be stored in many ways. Before computers, card files, printed books and other methods were used. Now most data is kept on computer files.

A database system is a computer program for managing electronic databases. A very simple example of a database system would be an electronic address book.

The data in a database is organized in some way. Before there were computers, employee data was often kept in file cabinets. There was usually one card for each employee. On the card, information such as the date of birth or the name of the employee could be found. A database also has such "cards". To the user, the card will look the same as it did in old times, only this time it will be on the screen. To the computer, the information on the card can be stored in different ways. Each of these ways is known as a database model. The most commonly used database model is called relational database model; it uses relations and sets to store the data. Normal users talking about the database model will not talk about relations, they will talk about database tables.

Contents

Database systems are used for:

Database systems are used for:

  • They store data
  • They store special information used to manage the data. This information is called metadata and it is not shown to all the people looking at the data.
  • They can solve cases where many users want to access (and possibly change) the same entries of data.
  • They manage access rights (who is allowed to see the data, who can change it)
  • When there are many users asking questions to the database, the questions must be answered faster. So, the last person to ask a question, can get an answer in reasonable time.
  • Certain attributes are more important than others, they can be used to find other data. This is called indexing. An index contains all the important data and can be used to find the other data.
  • They ensure that the data always makes sense. There are certain rules that can be added to tell the database system if the data makes sense. One of the rules might say November has 30 days. This means if someone wants to enter November 31 as a date, this change will be rejected.

Changing data

In databases, some data changes occasionally. There may be problems when data is changed, an error might have occcurred. The error might make the data useless. The database system looks at the data, it must fulfill certain requirements. It does this by using a transaction. There are two points in time in the database, the time before the data was changed, and the time after the data was changed. If something goes wrong when changing the data, the database system simply puts the database back into the state before the change happened. This is called rollback. After all the changes are done successfully, they are committed. This means that the data makes sense again; committed changes can no longer be undone

In order to be able to do this, databases follow the ACID principle:

  • All. Either all tasks of a given set (called transaction) are done, or none of them is. Known as Atomicity
  • Complete. The data in the database always makes sense. There is no half-done (invalid) data. Known as Consistency
  • independent. If many people work on the same data, they will not see (or impact) each other. Each of them has their own view of the database, which is independent of the others. Known as Isolation
  • Done. Transactions must be committed, when they are done. Once the committed, they can not be undone. Known as Durability.

Database model

There are different ways how to represent the data.

  • Simple files (called flat files): Simply write the data into a spreadsheet, then save it in the database program.
  • Hierarchical model: The data is organized like a tree structure. The interesting data is at the leaves of the tree
  • Network model: Use records and sets to store the data
  • Relational model: This uses set theory and predicate logic. It is widely used. Data looks like it is organized in tables. These tables can then be joined together so that simple queries can be chosen from them.
  • Object relational model: This uses the same data types for the database, as for the (object-oriented) application.

Ways to organize the data

Like in real life, the same data can be looked at from different perspectives, and it can be organized in different ways. There are different things to consider, when organizing the data:

  • Each item of data should be stored as few times as possible. Imagine that an unmarried woman is listed in the county records, State Motor Vehicle Dept, Federal Social Security Dept and International Passport Dept. Once she marries, and decides to change her name, all the these departments have to be notified. If all the departments were linked, and her name stored in only one place, then updating is easy.
  • If the data is stored in several different databases, it may contradict itself.
  • This problem makes finding data slower. If there is a lot of data, this problem of storing one piece of data in many places, will take up a lot of space. In our example there were 4 databases for one person. That will be 8 changes made, if a second person has the exact same problem.
  • If you have this problem, a method called Database Normalisation was developed to solve it. Currently there are 5 Normal forms. These are ways to make a database faster, and make the data take less space.








Got something to say? Make a comment.
Your name
Your email address
Message