From Wikipedia, the free encyclopedia
A document (noun) is a bounded physical
representation of a body of information designed with the capacity (and
usually intent) to communicate. A document may manifest symbolic, diagrammatic or
sensory-representational information. To document (verb)
is to produce a document artifact by collecting and representing
information. In prototypical usage, a document is
understood as a paper artifact, containing information in the form
of ink marks. Increasingly documents are also understood as digital
artifacts.
Colloquial usage is revealed by the connotations and denotations
that appear in a Web search for document.
From these usages, one can infer the following typical
connotations:
- Writing that provides information about a person's thinking by means of
symbolic marks.
- A written account of ownership or obligation.
- To record in detail; "The parents documented every step of
their child's development".
- A digital file in a particular format.
- To support or supply with references; "Can you document your
claims?".
- An artifact that meets a legal notion of document for purposes
of discovery in litigation.
- A document is a practical construct for describing
matter in different forms
that retain information for a reasonable period of time,
during which it can be perceived by a sentient observing entity.
The variety of usage reveals that the notion of document has rich
social and cultural aspects besides the physical, functional and
operational aspects.
Introduction
- Document is just a practical concept which presently would be
defined narrowly based on human understanding and perception of the
external world.
- Document in its wider connotation could include matter in all
its forms, even a universe could be perceived as a document on a
wider scale.
- The practical construct requires the retention of information,
but not its relevance: utility and value are not
decided, as these depend upon the objectives of the user and the
purpose for which the information is accessed.
- The information must also, with reference to the observing
entity, be retained for a reasonable period of time in which it can
be observed. Fleeting images which cannot be seen are almost as if
never observed.
Conceptualization in analytical philosophy
The notion of document admits both an empirical (in terms of a
fuzzy set of real-world instances) and analytical characterization.
The analytical characterization hinges on the semantic
character of the word document, as well as the use of a primitive
notion of document in accounts of larger communication
constructs such as discourses, or related constructs such as
language games.
The nominal 'document', like other nominals, exhibits familiar
patterns of polysemy (a kind of ambiguity). For example, "document"
might be used on an occasion to denote a certain body of
information independently of how that information is physically
rendered (as in 'the Bible is my favorite document.'; 'Have you
finished reading all the documents for Monday's class yet?'), or it
might be used to denote a particular physical instantiation of a
body of information (as in 'that document is worn and needs to be
re-bound.'; 'Return the documents you borrowed to the reference
desk.'). This kind of polysemy bears some similarity to what
Nunberg (1979) termed "container/contents polysemy" (as in 'Mary
broke the bottle' versus 'the baby finished the bottle'). These
patterns of polysemy exhibited by 'document' matter for the
following reason. A certain document qua body of information (e.g.
the Bible, not a particular bound copy thereof) will have different
properties than a document qua physical rendering of a body of
information (e.g. a particular bound copy of the Bible).
Importantly, the latter would have the property of being a static,
physically bounded thing. The former would have the properties of
being able to evolve over time, being susceptible of certain
changes to information content, and being capable of supporting
multiple physical instantiations that have allowable differences in
information content. This distinction is relevant to the discussion
of aspects and history of documents below.
Empirical characterization
In light of the polysemy of the core concept of document, it
is useful to note a number of examples ranging from instances
commonly understood as prototypical documents, to instances that
are understood as documents only in specialized or rare
situations.
- Prototypical Documents: Letters, memos, legal forms, owner's manuals
- Documents of Record: Newspapers, magazines
- Books: Textbooks, novels, cookbooks, encyclopedias, comic books
- Canonical Documents: Code of law, statute, constitution, religious text
- Transactional Documents: Cheques, contracts, medical prescriptions, receipts, forms, postage stamps
- Functional Documents: Portable Document Format (PDF) files, PostScript files, XML files, email
- Non-Prototypical Documents: Post-it notes, fortune cookie
strips, maps, paintings, milk cartons, cereal boxes
- Non-Classical Digital Documents: Web pages, blogs, wikis
- Boundary Examples: The Pioneer plaque on the Pioneer 11 spacecraft,
designed by astronomer Carl Sagan and using information assumed to
be universal, is an extreme example of a document that is intended
to communicate with aliens. Conversely, the recorded and printed
signals of the SETI project would constitute documents if they were
discovered to contain alien communication.
Social aspects of documents
Documents play a key role in the construction of social reality
(Searle, 1997) and therefore play a part in accounts of every
important aspect of human society and culture. An example of this
type of account is the seminal treatment of the role of print in
political evolution, Imagined Communities (Anderson, B.,
2006). More direct examples include the works of Marshall
McLuhan (McLuhan, 1964 and 1969). Many key social aspects of
documents arise from their historically unchanging
character. This aspect leads to a definition of a document as a
talking thing (Levy, D., 2003), whose strengths and
weaknesses both arise from its relative (historical) immutability
with respect to oral forms of communication. The relative
immutability of documents has thus historically been important for
establishing a record of transient events, or for preserving
information whose precise linguistic form is of ritual or practical
importance (such as religious texts or legal documents). Note
though, that historically many societies have accorded greater
authority to disciplined oral traditions as more reliable than
parallel written ones. With this caveat in mind, the following
social aspects of information may be noted.
- Social Value: The information in documents as
well as documents themselves are often valuable; the information
because of the influence represented, and the document itself when
it is believed to be a rare or unique and authentic representation
of the information it contains.
- Manifestation of authority: Documents are
often produced to provide a record that will be considered
authoritative in the future, particularly with respect to
government. Consider receipts, titles, and deeds as examples of
proof of ownership, and passports or driver's licenses as proof of
identity.
- Conventional: Documents inherit a key feature
of language-based communication in general: they are denoted as
documents by convention (Lewis, 2002). Virtually any medium can
constitute a document provided the people involved can agree on the
meaning represented. Hence cave drawings, hieroglyphics, scrolls of
sheepskin, sheets of papyrus, ink on paper, magnetic tape and
electronic files are all documents under certain accounts.
- Manifestation of economic labor: Historically,
the effort required to produce a document has been significant, so
only the most important documents were created. The illuminated manuscript of the
pre-Gutenberg era demonstrates the cost (and associated imputed
value) of documents. Historically, the cost of producing documents
has declined, while their functional characteristics ("affordances"
in the sense of Sellen and Harper, 2001) have become richer.
- Manifestation of business processes: Documents
play many roles in the internal management of a business as well as in
the interfaces between businesses and their suppliers, employees,
and customers. Current trends toward longer value chains and
increased regulation increase the number of documents that must be
generated and processed.
- Instruments of Governance and Law: The
unchanging aspect of documents is crucial to the consistent
communication of policy and administration of law to citizens.
Documents that play such roles include constitutions, corporate
annual reports and religious texts.
- Analytical philosophical character: The notion
of document plays a role in political philosophy (for example, the
notion of social contract as a primitive construct), as well as in
the philosophy of law.
- Role in Religion: Documents play a key role in
religion, and constitute canonical content. Document-related terms
such as dogma and doctrine have today acquired
pejorative connotations primarily due to historical events
associated with religious documents.
- Cultural Significance: Documents play a
central role in art of all varieties. In the movie Office Space for
instance, central plot elements are frustration with bureaucratic
process involving the fictional "TPS reports" and a malfunctioning
printer.
- Metaphoric Significance: Metaphors based on
documents permeate our thinking, ranging from the obvious ("let's
start with a clean sheet for this design", "this is a new chapter
in my life" and "she wrote the book on that") to the highly
allegorical ("All mankind is of one author, and is one volume; when
one man dies, one chapter is not torn out of the book, but
translated into a better language; and every chapter must be so
translated" — John Donne).
Functional characteristics
Documents also manifest several more localized characteristics
that determine how we use them in everyday life:
- Manifest nature: Information is physical, i.e.
it always must exist in a tangible form, even when digital. IBM
physicist Rolf Landauer is credited with this
observation and with working out its implications. By virtue of being
realizations of chunks of information, documents are necessarily
physical in all their forms.
- Contextuality and Situatedness: All
communication takes place in a context, which includes at least the
shared understanding of the parties communicating (Lewis, 2002).
Explicit and implicit references to the context can convey a large
amount of meaning by building on the shared understanding, but that
meaning is lost to another party that does not share that context.
For example, Shakespeare in the original would be incomprehensible
to modern readers simply because of the evolution of language and
spelling since the seventeenth century, and modern readers (besides
Shakespeare scholars) normally read modernized versions. Similarly,
hypertext documents exist in a context which is lost if printed,
leading to a different offline reading context.
- Evolvability: When we think of a document as a
definitive source containing the best known information about a
topic, there is a need to change that information as more is learned.
This is frequently done by revising the document into a new version
or edition. Typically, older versions are archived to facilitate
understanding how the document has changed. In modern contexts,
when technologies such as wikis or software source code are under
discussion, this evolvability can require very sophisticated
version control technologies.
- Renderability: Every abstract entity that is understood to be a
document in some context can be rendered, often in more than one
way. A rendition of a document refers to a particular physical or
electronic representation of the information from the document. For
example, a Portable Document Format (PDF) representation and a web page may
contain the same information but have substantially different
properties and appearances. We think of them as different
renditions (or renderings) of the same document. We might similarly
consider different translations of a document to be the same
document although differences in language context and structure may
make it impossible to express precisely the same meaning in both
languages.
- Affordances: Documents in digital and physical
forms manifest various "affordances" (Sellen and Harper, 2001;
Gladwell, 2002). The affordances of a particular rendition of a
document determine its uses. For example, paper has the affordances
of allowing flipping and easy tactile manipulation, while digital
forms are easier to edit.
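The idea of renderability described above can be illustrated with a minimal sketch in Python. It assumes a deliberately simplified model in which a document is only a title and a list of paragraphs. The names Document, render_text and render_html are hypothetical and chosen for illustration; they do not correspond to any particular document system.

```python
from dataclasses import dataclass
from html import escape
from typing import List


@dataclass(frozen=True)
class Document:
    """Hypothetical abstract document: a body of information, not a rendition."""
    title: str
    paragraphs: List[str]


def render_text(doc: Document) -> str:
    """Plain-text rendition: title underlined with '=', blank line between blocks."""
    heading = doc.title + "\n" + "=" * len(doc.title)
    return "\n\n".join([heading, *doc.paragraphs])


def render_html(doc: Document) -> str:
    """HTML rendition of the same information, escaping special characters."""
    body = "".join(f"<p>{escape(p)}</p>" for p in doc.paragraphs)
    return f"<h1>{escape(doc.title)}</h1>{body}"


memo = Document("Memo", ["Ink & paper."])
text_rendition = render_text(memo)
html_rendition = render_html(memo)
```

Both renditions carry the same information, but they differ in appearance and in what a reader or program can do with them, which is the distinction the renderability and affordance entries draw.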
Classical roles and workflows in document production
There are a number of roles in which people are involved in the
creation and distribution of traditional paper documents (Romano,
1989); some, but not all, documents are processed by people acting
in each role, each of which may be performed by an individual or a
group. Books are a well known example of documents that require an
extensive publication process, but many other documents undergo
at least some of the same processes as book publication.
Each of these roles is considered to improve or add value to a
document. These roles are generally understood as being clustered
in various phases in the production of a classical document,
including authorship, editing and prepress. Roles and workflows in
the production of modern digital documents are more variable and
are discussed in the section on future documents.
- An author
selects the content to be communicated and performs the initial
organization and recording of the content. A document in this state
is often called a manuscript.
- A reviewer reads the content and
evaluates it with respect to the intended audience. Reviewers often
recommend only the best documents to be published. Documented
reviews are frequently published as guidelines for document
consumers as well.
- An editor helps to organize and express
the content so that the meaning is clear and understandable, and
follows the conventions of the symbolic representation such as
spelling and grammar.
- A publisher orchestrates the process
of producing a document, often decides whether a document is worth
the effort of publishing (usually an economic decision), and
collects and disseminates the profits from sales of a produced
document.
- A printer formats the
document into a comfortable form such as a bound book. Printing can
be a very complex and elaborate process, including
- pagination - function performed by
an individual who takes on the tasks of organizing text, fonts,
images, headings, footnotes, chapters and sections to accommodate
the physical constraints of a printed page aesthetically.
- pre-press - function performed by print shops in preparing paper
documents for production.
- imposition - organizing desired
pages on a larger sheet such that when folded and trimmed the pages
will be upright and in order.
- printing - marking paper with ink or
toner
- folding pages into sections
- binding pages together and covering
- trimming
- packaging
- A distributor manages inventory and
physical distribution of printed documents to retailers.
- A retailer manages a local inventory
and sales to consumers, and often is familiar with the content and
can make appropriate recommendations.
- A librarian organizes, tracks
borrowing of, and archives documents.
A publication process enables a consumer to purchase or borrow, read
and learn from documents. Consumers are often the intended audience
of the publication process.
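The imposition step listed among the printer's tasks above can be made concrete with a small sketch in Python. It assumes one common case only: a saddle-stitched booklet in which each sheet carries two pages per side and the sheets are nested and folded once. The function name booklet_imposition is hypothetical, and real prepress software handles many more layouts than this.

```python
from typing import List, Optional, Tuple

# One side of a sheet: (left page, right page); None marks a blank page.
Side = Tuple[Optional[int], Optional[int]]


def booklet_imposition(page_count: int) -> List[Tuple[Side, Side]]:
    """Return (front, back) page pairs for each sheet, outermost sheet first.

    Assumes a saddle-stitched booklet: four pages per sheet, sheets nested
    inside one another and folded once down the middle.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    # Pad up to a multiple of four; the extra pages are left blank.
    total = -(-page_count // 4) * 4

    def page(n: int) -> Optional[int]:
        return n if n <= page_count else None

    sheets = []
    for i in range(total // 4):
        front = (page(total - 2 * i), page(2 * i + 1))
        back = (page(2 * i + 2), page(total - 2 * i - 1))
        sheets.append((front, back))
    return sheets


layout = booklet_imposition(8)
```

For an eight-page booklet the outer sheet carries pages 8 and 1 on its front and pages 2 and 7 on its back, so that the pages read in order once the sheets are nested and folded.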
Document production technology
Document production technology has evolved significantly through
history. While a great deal can be said about ancient production
technologies including papyrus, palm leaves, stone tablets and
marking devices ranging from quills to chisels, the modern form of
the document has evolved largely under the influence of printing
technologies. The illuminated manuscript of Europe
is a useful prototypical instance of the document at the end of its
evolution before the widespread use of printing. The associated
technology was largely a human one. Other cultures at this stage
used other forms of pre-print era documents. The history of
printing can be traced as follows:
Bronze Age civilizations made extensive use of seals for commercial and transactional
purposes. The signet ring was of particular
importance, and personal seals are still used in place of signatures in East
Asian countries like Korea, where it is common for individuals to
carry a seal.
Chinese Woodblock printing was the first
widespread technology that automated important parts of the
document production process.
The Gutenberg Printing Press (McLuhan, M., 1969) enabled
the mass production of faithful copies of documents, and hence the
widespread dissemination of information. The widespread access to
information enabled (and necessitated) fundamental changes to
society in religion, government, law, business, and entertainment.
Prior to the press, the huge effort required to faithfully hand-copy
documents severely limited the number of documents available, and hence
access to the information contained therein. The effort
to set type and prepare a document for reproduction was still high,
but many high-fidelity copies could be produced.
The development of Lithography constituted the next great
advance in document production technology and continues today to
dominate the economic landscape of document production, an economic
sector estimated to be of the order of $1 trillion. Lithography
brought economies of scale and extremely high quality and low cost
to documents.
The typewriter
improved the accessibility of document production technologies and
enabled them to enter mainstream workplaces. Carbon paper enabled a modest number of
copies to be produced concurrently with the original. A brief era
of photography-based technologies flourished (including the
photostat and cyclostyle processes) in parallel with the age of
typewriters.
The Xerox Copier became a major milestone in document
production by eliminating the typesetting effort required by a
printing press. The Xerographic ("dry writing") technology (also
referred to as electrophotography) could produce durable and
economical copies of a paper document easily and quickly. Modern
digital printers from Xerox and other companies such as HP, Canon
and Ricoh can produce more than 240 black-and-white or 170 color copies
of a page each minute, and work with up to 6 colors and dry and wet
inks. This technology supports a $100 billion market in digital
printing, particularly in domains where lithography has clear
limitations.
Computers enabled information to be stored
electronically in databases and electronic files on magnetic tapes,
drums, and disks. This led to a radical disruption of all document
production technologies. Initially most of this information was
printed onto paper by teletypes (automated typewriters), but
computer printers rapidly became faster and more sophisticated.
Computers, by controlling lasers in xerography, micro-nozzles in
inkjet systems, and tiny solenoids in mechanical systems, became
capable of being serially embedded in the document production
process. Computers are also critical to modern lithography.
A whole interaction style with computers was
developed around the metaphor of working with
documents and folders on a desktop, to the point that the word
document is now commonly associated with the information
stored in a computer file according to the metaphor. Today, electronic
paper is viewed as one potential future evolutionary physical
form of the prototypical document, as it can present the electronic
document with the readability of printed paper.
Document life cycle management technology
Technology to manage documents has evolved in parallel with
documents themselves. Of particular importance are practices
concerning the preservation, archiving, destruction and management
of documents. These constitute what is known as the "document life
cycle":
- Physical preservation: Documents in both
traditional physical forms and in digital physical forms such as
magnetic media must be physically preserved. This aspect of
document management deals with such issues as the aging of paper
(the innovation of acid-free paper is an advance in preservation)
and obsolescence of magnetic media.
- Storage: This aspect includes management of
scarce resources such as shelf space and disk space, and associated
technologies such as optimal space utilization. Modern libraries
such as those at the University of Nevada and the University of Michigan
often use complex space-saving technologies such as robotic
retrieval systems for stacks and moving bookshelves. In the digital
realm, the entire discipline of compression technologies can be
viewed as concerned with the storage of documents.
- Cultural Preservation: This function,
traditionally ascribed to librarians, involves the selection,
arrangement and storage of documents in safe places. The importance
of this part of document life cycle management can be seen in the
impact of historical events such as the destruction of books in
ancient China and the burning of the library at Alexandria. Today,
library and information science has evolved into an important
academic discipline.
- Bibliometrics: This aspect of document
management involves functions of indexing, generating statistics
and taxonomies, and improving the usability of large collections of
documents. The modern history of this management technology dates
back to Melvil Dewey and the Dewey Decimal
System. Today, the science of bibliometrics is largely
concerned with managing the impact of electronic technologies. This
aspect must also deal with ISBN numbers, Library of Congress data
and other standards.
- Digital Content Management: The explosion of
digital content has resulted in technologies to manage large
collections of digital information generated by organizations. Such
systems must manage access control and privileges, support multiple
electronic formats, interface with printing infrastructures and
enable collaborative work flows around documents.
- Digital-Physical Interaction Management: As
long as both paper and digital documents continue to have value,
the modern management technologies to manage their interaction will
continue. Key to this management is the management of large scale
and systematic scanning of physical documents (such as the Google
book scanning project).
- Destruction: With the increased cost of
identity theft, corporate scandals and privacy concerns, the
destruction of both paper and electronic documents has become
increasingly important to manage. Technologies such as shredders
play a role, as do verifiable processes of destruction of
electronic documents to ensure compliance with privacy laws.
- Security: Shannon's information theory has led
to an entire discipline that concerns itself with the security of
documents, and associated technologies such as encryption, as well
as more physical security features such as watermarks and making
currency documents safe from counterfeiting.
- Transportation: The entire postal system, as
well as modern courier systems, is largely built on the need to
move documents physically from one location to the other.
The document economy
The economics of the production and management of documents
indirectly impacts every economic sector. While the total economic
value of the document economy is hard to estimate, the economic
sectors with business models directly dependent on documents
include:
- Document Authoring Technology: This sector
supports a huge variety of digital and physical production
technologies, ranging from Microsoft Word to LaTeX to advanced layout software.
- Education: The production and processing of
documents is so critical that entire educational disciplines have
evolved around writing, editing, layout and design of documents.
The information sciences are also part of the document
economy.
- Electronic Document Management: Managing
documents within organizations and in public and personal contexts
supports a huge industry in content management systems, ranging
from free public infrastructure such as Wikipedia to proprietary enterprise
applications such as Docushare and Documentum.
- Physical Document Management: Large
manufacturing sectors producing everything from 3-ring binders to
filing cabinets and office desks exist largely due to the need to
process documents.
- Media: The paper industry exists to support
the document economy.
- Print equipment: From lithography and
xerography to pencils and crayons, an extraordinarily diverse set
of equipment industries depend on documents.
- Document Services: In large organizations, the
life of documents in the work flows and processes of daily activity
represent an enormous locus of value addition and cost reduction,
which has led to a burgeoning industry in managed document
services, ranging from specialized niches (such as payroll
management by Paychex Inc.) to managed office printing.
- Retail Production: From large chains such as
Kinko's in the United States to small copy shops and offset print
shops, documents support a large production sector for the end
user.
- Publishing: All publishing, ranging from
offset-based newspaper and magazine printing, to highly customized
modern publishing using publish-on-demand digital print technology,
is part of the document economy. The publishing industry includes
major sub-areas such as the writer's market, small, medium and
large publishing houses, small and large distributors and a vast
network of independent and chain bookstores, online retailers, a
large used-documents market and subscription-based markets.
- Document Transportation: The international
postal system, as well as the commercial package transportation
systems represented by companies such as DHL and UPS, have economic
models based largely on the demand for document
transportation.
Future of documents
Since the advent of the digital era, documents have been rapidly
evolving, and may require fundamental reconceptualization (Wesch,
2006). Efforts at this reconceptualization include Vannevar Bush's
initial conceptualization of hypertext (Bush, V., 1945). The impact of
digital technology can be understood in terms of several key
aspects:
- Blurring the notion of document boundary:
hypertext and Web content make it hard to determine what is being
denoted by the term document. While the early days of the
Web resulted in documents that mimicked their physical ancestors,
Web content rapidly took on new characteristics.
Reconceptualization of the notion of "boundary" is a key
intellectual challenge (Sweet, 2002).
- Increasing structure and openness: The
document is going from an opaque container of information to a much
more open, structured document. XML
underlies most document formats today (such as OpenDocument or Office Open XML). In the future, documents
will become even more queryable, with their individual elements
being tagged (e.g. HR-XML).
- Dynamic nature: Web analogs of traditional
paper documents like a newspaper column have taken on a dynamic
character due to the impact of technology enabling the addition of
comments from readers. The document will increasingly become
"virtual", bringing up-to-date information from various sources in
one container (a la "mash-up"); as such, it will be kept
evergreen.
- Paper and electronic are reconciling: Paper
has traditionally been a gap in document processing workflows.
Technologies such as OCR, OMR, or 2D Barcodes
are helping get its content back into the electronic world. In the
future, however, not only will that transition be seamless, but it
will also be possible to track paper while in the "physical" world
through RFID or MemorySpot.
- Hybrid automated/human authorship: authorship
workflows for digital documents have evolved to include the
computer in a key role. Dynamic Web pages may be viewed as the
joint output of a human author (who produces a template) and a
software system (that fills in the template). Sophisticated
examples of this phenomenon can be found in recent evolutions in
paper documents as well. Variable data technology, for instance,
allows creators of direct mail marketing documents to vary the
content of every piece in a print run using technologies such as
DesignMerge or Xmpie.
- Prosumer workflows: Content repositories such
as Wikipedia radically
alter traditional document production workflows by blurring roles
such as author and editor.
- Customizability: Digital technology allows
users to actively participate in the construction of documents they
see, realizing the postmodern notion of construction of meaning in
an unexpectedly literal way.
- Long Tail Economics: Technologies such as
blogs have allowed document production economics to operate with
such radically cheap cost structures that single individuals can
derive an income from a global audience with low capital expenses.
This has led to an explosion of niche content.
- Blurring of Documents and Interfaces:
Technologies such as Ajax or Apollo blur the distinction
between documents and user interfaces to "intelligent"
technologies, leading to a whole class of smart documents that can
go beyond the passive nature of traditional documents.
- Fluidity and Dynamic Microstructure: Distinct
from the impact of hypertext on the notion of document is the fluid
potential of modern documents at the microlevel, which allows an
enormous variety of word and sentence level dynamic phenomenology
(Kelly, K., 2006).
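The hybrid automated/human authorship described above, in which a human writes a template and software fills it in, can be sketched in a few lines of Python using the standard library's string.Template. This is only an illustration of the general idea behind dynamic pages and variable data printing; it does not reflect how DesignMerge, Xmpie or any other product works, and the field names and records are invented for the example.

```python
from string import Template
from typing import Dict, List

# The human-authored part: fixed wording with named placeholders.
letter = Template("Dear $name, your copy of $title is ready for collection.")

# The machine-supplied part: one record per piece in the run (invented data).
records: List[Dict[str, str]] = [
    {"name": "Ada", "title": "Imagined Communities"},
    {"name": "Lin", "title": "Scrolling Forward"},
]


def merge(template: Template, rows: List[Dict[str, str]]) -> List[str]:
    """Produce one finished document per record by filling in the template."""
    return [template.substitute(row) for row in rows]


pieces = merge(letter, records)
```

Each element of pieces is a separate document sharing the author's wording but carrying data selected by the software. substitute raises KeyError when a record lacks a field, which is a reasonable default for a print run where a missing value should stop production rather than print a blank.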
References
- Sellen, A. J. and Harper, R. H. R., 2001, The Myth of the
Paperless Office
- McLuhan, M., 1969, The Gutenberg Galaxy
- McLuhan, M., 1964, Understanding Media: The Extensions of
Man
- Landow, G. P., 2006, Hypertext 3.0: Critical Theory and New
Media in an Era of Globalization
- Bush, V., 1945, As We May Think, Atlantic Monthly, http://www.theatlantic.com/doc/194507/bush
- Kelly, K. 2006, Scan This Book!, New York Times Magazine, http://www.kk.org/writings/scan_this_book.php
- Owen, D., 2004, Copies in Seconds: How a Lone Inventor and
an Unknown Company Created the Biggest Communication Breakthrough
Since Gutenberg — Chester Carlson and the Birth of the Xerox
Machine
- Searle, J. R., 1997, The Construction of Social
Reality
- Anderson, B., 2006, Imagined Communities: Reflections on
the Origin and Spread of Nationalism, New Edition
- Levy, D., 2003, Scrolling Forward: Making Sense of
Documents in the Digital Age
- Gladwell, M., 2002, The Social Life of Paper, New Yorker Magazine, http://www.gladwell.com/2002/2002_03_25_a_paper.htm
- Lewis, D. K., 2002, Convention: A Philosophical Study
(Revised edition)
- Pedauque, R. T., Document: Form, Sign and Medium, as
Reformulated for Electronic Documents
- Romano, F., 1989, Pocket Guide to Digital
Prepress
- Sweet, J., 2003, Document Boundaries, Master's Thesis,
Rochester Institute of Technology
- Wesch, M., 2006, The Machine is Us/ing Us, video short
documentary, http://www.youtube.com/watch?v=6gmP4nk0EOE