From Wikipedia, the free encyclopedia
Cyc is an artificial intelligence project that
attempts to assemble a comprehensive ontology and knowledge base of everyday common sense knowledge, with the goal of
enabling AI applications to perform
human-like reasoning. The project was started in 1984 by Douglas Lenat at MCC
and is developed by the company Cycorp. Parts of the project are released
as OpenCyc, which provides an API, RDF endpoint, and data dump under an open source license.
The project was started in 1984 as part of Microelectronics
and Computer Technology Corporation. The objective was to
codify, in machine-usable form, millions of pieces of knowledge
that comprise human common sense. CycL presented a proprietary
knowledge representation schema that utilized first-order
relationships. In 1986, Doug Lenat estimated the effort to complete
Cyc would be 250,000 rules and 350 man-years of effort. The
Cyc Project was spun off into Cycorp, Inc. in Austin, Texas in 1994.
The name "Cyc" (from "encyclopedia", pronounced like
syke) is a registered trademark owned by Cycorp. The
original knowledge base is proprietary, but a smaller version of
the knowledge base, intended to establish a common vocabulary for
automatic reasoning, was released as OpenCyc under an open source (Apache)
license. More recently, Cyc has been made available to AI
researchers under a research-purposes license as ResearchCyc.
Typical pieces of knowledge represented in the database are
"Every tree is a plant" and "Plants die eventually". When asked
whether trees die, the inference engine can draw the obvious
conclusion and answer the question correctly. The Knowledge Base
(KB) contains over one million human-defined assertions, rules or
common sense ideas. These are formulated in the language CycL, which is based on predicate calculus and has a syntax similar to that of the Lisp programming language.
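The tree example above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Cyc inference engine is implemented: the names GENERALIZATIONS, PROPERTIES and has_property are hypothetical, and the only facts used are the two quoted above.

```python
# Illustrative sketch only; the data structures and names are assumptions,
# not part of Cyc.

# "Every tree is a plant": each kind maps to its more general kinds.
GENERALIZATIONS = {"tree": {"plant"}}

# "Plants die eventually": properties stated directly for a kind.
PROPERTIES = {"plant": {"dies eventually"}}


def has_property(kind, prop):
    """Return True if prop is stated for kind or for any more general kind."""
    seen = set()
    stack = [kind]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if prop in PROPERTIES.get(current, set()):
            return True
        # Continue the search in the more general kinds.
        stack.extend(GENERALIZATIONS.get(current, set()))
    return False


print(has_property("tree", "dies eventually"))
```

The question "do trees die?" is answered by walking from "tree" up to "plant" and finding the property there; nothing about dying is stored for trees themselves.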
Much of the current work on the Cyc project continues to be knowledge engineering,
representing facts about the world by hand, and implementing
efficient inference mechanisms on that knowledge. Increasingly,
however, work at Cycorp involves giving the Cyc system the ability
to communicate with end users in natural language, and to assist with
the knowledge formation process via machine learning.
Like many companies, Cycorp has ambitions to use the Cyc natural language
understanding tools to parse the entire internet to extract
structured data.
In 2008, Cyc resources were mapped to many Wikipedia articles,
potentially easing connection with other open datasets such as DBpedia and Freebase.
The concept names in Cyc are known as constants.
Constants start with an optional "#$" and are case-sensitive. There
are constants for:
- Individual items known as individuals, such as
#$BillClinton or #$France.
- Collections, such as #$Tree-ThePlant (containing all
trees) or #$EquivalenceRelation (containing all equivalence relations). A member
of a collection is called an instance of that collection.
- Truth Functions which can be applied to one or more
other concepts and return either true or false. For example
#$siblings is the sibling relationship, true if the two arguments
are siblings. By convention, truth function constants start with a
lower-case letter. Truth functions may be broken down into logical
connectives (such as #$and, #$or, #$not, #$implies), quantifiers
(#$forAll, #$thereExists, etc.) and predicates.
- Functions, which produce new terms from given ones.
For example, #$FruitFn, when provided with an argument describing a
type (or collection) of plants, will return the collection of its
fruits. By convention, function constants start with an upper-case
letter and end with the string "Fn".
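The naming conventions listed above can be checked mechanically. The following Python sketch guesses the kind of a constant from its spelling alone; classify_constant is a hypothetical helper, not part of any Cyc API, and because these are only conventions the result is a guess rather than a guarantee. Individuals and collections cannot be told apart by name, so they share one answer.

```python
def classify_constant(name):
    """Guess the kind of a Cyc constant from its spelling alone.

    Hypothetical helper based only on the conventions described in the text.
    """
    # The "#$" prefix is optional, so drop it if present.
    bare = name[2:] if name.startswith("#$") else name
    if not bare:
        raise ValueError("empty constant name")
    # Truth function constants start with a lower-case letter.
    if bare[0].islower():
        return "truth function"
    # Function constants start with an upper-case letter and end in "Fn".
    if bare[0].isupper() and bare.endswith("Fn"):
        return "function"
    # The spelling does not distinguish individuals from collections.
    return "individual or collection"


for constant in ["#$siblings", "#$FruitFn", "#$BillClinton", "#$Tree-ThePlant"]:
    print(constant, "->", classify_constant(constant))
```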
The most important predicates are #$isa and #$genls. The first
states that one item is an instance of some
collection, the second that one collection is a subcollection
of another. Facts about concepts are asserted using certain
CycL sentences. Predicates are written before their
arguments, in parentheses:
(#$isa #$BillClinton #$UnitedStatesPresident)
"Bill Clinton belongs to the collection of U.S. presidents"
(#$genls #$Tree-ThePlant #$Plant)
"All trees are plants".
(#$capitalCity #$France #$Paris)
"Paris is the capital of France."
Sentences can also contain variables, strings starting with "?".
These sentences are called "rules". One important rule asserted
about the #$isa predicate reads
(#$implies
  (#$and
    (#$isa ?OBJ ?SUBSET)
    (#$genls ?SUBSET ?SUPERSET))
  (#$isa ?OBJ ?SUPERSET))
with the interpretation "if OBJ is an instance of the collection
SUBSET and SUBSET is a
subcollection of SUPERSET, then OBJ is an instance of the
collection SUPERSET". Another typical example is
(#$relationAllExists #$biologicalMother #$ChordataPhylum #$FemaleAnimal)
which means that for every instance of the collection
#$ChordataPhylum (i.e. for every chordate), there exists a female animal
(instance of #$FemaleAnimal) which is its mother (described by the
predicate #$biologicalMother).
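The #$isa rule given earlier (an instance of SUBSET is also an instance of every SUPERSET of SUBSET) can be applied mechanically. The Python sketch below stores assertions as pairs and applies the rule until nothing new appears. It is an illustration under stated assumptions, not Cyc's inference code: derive_isa is a hypothetical name, and #$SomeOakTree is an invented individual used only for the example.

```python
def derive_isa(isa_facts, genls_facts):
    """Apply the isa/genls rule until no new (obj, collection) pairs appear.

    isa_facts:   set of (obj, collection) pairs, read as (#$isa obj collection)
    genls_facts: set of (sub, super) pairs, read as (#$genls sub super)
    """
    derived = set(isa_facts)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for obj, subset in list(derived):
            for sub, superset in genls_facts:
                if sub == subset and (obj, superset) not in derived:
                    derived.add((obj, superset))
                    changed = True
    return derived


# #$SomeOakTree is a hypothetical individual, not a constant from the text.
isa_facts = {("#$SomeOakTree", "#$Tree-ThePlant")}
genls_facts = {("#$Tree-ThePlant", "#$Plant")}

for fact in sorted(derive_isa(isa_facts, genls_facts)):
    print("(#$isa %s %s)" % fact)
```

Repeating the loop until no fact is added means that chains of #$genls assertions are followed to any depth, at the cost of rescanning all facts on each pass.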
The knowledge base is divided into microtheories (Mt), collections
of concepts and facts typically pertaining to one particular realm
of knowledge. Unlike the knowledge base as a whole, each
microtheory is required to be free from contradictions. Each
microtheory has a name which is a regular constant; microtheory
constants contain the string "Mt" by convention. An example is
#$MathMt, the microtheory containing mathematical knowledge. The
microtheories can inherit from each other and are organized in a
hierarchy: one specialization of #$MathMt is #$GeometryGMt, the
microtheory about geometry.
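Inheritance between microtheories can be pictured as a lookup that also consults the more general microtheories. The Python sketch below assumes that a specialization sees every assertion of the microtheories it inherits from; the names PARENTS, ASSERTIONS and visible_assertions are hypothetical, and the assertion strings are placeholders rather than real Cyc content.

```python
# Each microtheory maps to the microtheories it inherits from.
# The single link below comes from the example in the text.
PARENTS = {
    "#$MathMt": [],
    "#$GeometryGMt": ["#$MathMt"],
}

# Placeholder assertions; these strings are illustrative only.
ASSERTIONS = {
    "#$MathMt": {"placeholder fact about numbers"},
    "#$GeometryGMt": {"placeholder fact about triangles"},
}


def visible_assertions(mt, parents=PARENTS, assertions=ASSERTIONS):
    """Collect the assertions of mt and of every microtheory it inherits from."""
    seen = set()
    result = set()
    stack = [mt]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        result |= assertions.get(current, set())
        stack.extend(parents.get(current, []))
    return result


print(sorted(visible_assertions("#$GeometryGMt")))
```

Keeping assertions local to a microtheory and resolving inheritance at query time is one way to let two unrelated microtheories hold conflicting statements without either one becoming inconsistent.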
An inference engine is a computer program
that tries to derive answers from a knowledge base. The Cyc
inference engine performs general logical deduction (including modus ponens, modus
tollens, universal quantification and existential quantification).
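Modus ponens and modus tollens, two of the deduction steps mentioned here, can each be written as a one-step function. The Python sketch below is a minimal illustration, not Cyc's engine: propositions are plain strings, an implication is a (premise, conclusion) pair, a negation is the pair ("not", proposition), and the function names are hypothetical.

```python
def modus_ponens(implication, fact):
    """From (p implies q) and p, return q; return None if the step does not apply."""
    premise, conclusion = implication
    return conclusion if fact == premise else None


def modus_tollens(implication, negated_fact):
    """From (p implies q) and (not q), return (not p); return None otherwise."""
    premise, conclusion = implication
    if negated_fact == ("not", conclusion):
        return ("not", premise)
    return None


rule = ("x is a tree", "x is a plant")
print(modus_ponens(rule, "x is a tree"))
print(modus_tollens(rule, ("not", "x is a plant")))
```

Note that knowing only the conclusion ("x is a plant") licenses neither step, which is why both functions return None in that case.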
The latest version of OpenCyc, 2.0, was released in July 2009.
OpenCyc 1.0 includes the entire Cyc ontology containing hundreds of
thousands of terms, along with millions of assertions relating the
terms to each other; however, these are mainly taxonomic assertions,
not the complex rules available in Cyc. The knowledge base contains
47,000 concepts and 306,000 facts and can be browsed on the OpenCyc website.
The first version of OpenCyc was released in spring 2002 and
contained only 6,000 concepts and 60,000 facts. The knowledge base
is released under the Apache License. Cycorp has stated its
intention to release OpenCyc under parallel, unrestricted licences
to meet the needs of its users. The CycL and SubL
interpreter (the program that allows you to browse and edit the
database as well as to draw inferences) is released free of charge,
but only as a binary, without source code. It is available for Linux and Microsoft
Windows. The open source Texai project
has released the RDF-compatible content
extracted from OpenCyc.
In July 2006, Cycorp released the binaries of ResearchCyc
1.0, a version of Cyc aimed at the research community, at no
charge. (ResearchCyc was in beta stage of development during all of
2004; a beta version was released in February 2005.) In addition to
the taxonomic information contained in OpenCyc, ResearchCyc
includes significantly more semantic knowledge (i.e., additional
facts) about the concepts in its knowledge base, and includes a
large lexicon, English parsing
and generation tools, and Java based interfaces for
knowledge editing and querying.
Terrorism Knowledge Base
The comprehensive Terrorism Knowledge Base is an application of
Cyc in development that will try to ultimately contain all relevant
knowledge about terrorist groups, their members, leaders, ideology,
founders, sponsors, affiliations, facilities, locations, finances,
capabilities, intentions, behaviors, tactics, and full descriptions
of specific terrorist events. The knowledge is stored as statements
in mathematical logic, suitable for computer understanding and
reasoning.
Cyclopedia is being developed; it superimposes Cyc keywords on
pages taken from Wikipedia.
Criticisms of the Cyc project
The Cyc project has been described as "one of the most
controversial endeavors of the artificial intelligence
history", so it has inevitably garnered its share of criticism.
Criticisms include:
- The complexity of the system - arguably necessitated by its
encyclopedic ambitions - and the consequent difficulty in adding to
the system by hand
- Scalability problems from widespread reification,
especially as constants
- Unsatisfactory treatment of the concept of substance and
the related distinction between intrinsic and extrinsic properties
- The lack of any meaningful benchmark or comparison for the
efficiency of Cyc's inference engine
- The current incompleteness of the system in both breadth and
depth and the related difficulty in measuring its completeness
- Limited documentation
- The lack of up-to-date on-line training material makes it
difficult for new people to learn the systems
- A large number of gaps in not only the ontology of ordinary
objects but an almost complete lack of relevant assertions
describing such objects