The Full Wiki

Data mining: Wikis

  
  

From Wikipedia, the free encyclopedia

Data mining is the process of extracting patterns from data.

Data mining is becoming an increasingly important tool for transforming data into information.

It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.

Data mining can be used to uncover patterns in data but is often carried out only on samples of data.

The mining process will be ineffective if the samples are not a good representation of the larger body of data.

Data mining cannot discover patterns that may be present in the larger body of data if those patterns are not present in the sample being "mined". Inability to find patterns may become a cause for some disputes between customers and service providers.

Therefore data mining is not foolproof but may be useful if sufficiently representative data samples are collected.

The discovery of a particular pattern in a particular set of data does not necessarily mean that the pattern holds elsewhere in the larger body of data from which that sample was drawn.

An important part of the process is the verification and validation of patterns on other samples of data.

The related terms data dredging, data fishing and data snooping refer to the application of data mining techniques to samples that are (or may be) too small for statistical inferences to be made about the validity of any patterns discovered (see also data-snooping bias).

Data dredging may, however, be used to develop new hypotheses, which must then be validated with sufficiently large sample sets.
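The risk of data dredging can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the target and all candidate variables are independent random noise, and the sample sizes, candidate count and function names (`pearson`, `dredge`) are hypothetical choices, not part of any standard. It searches many unrelated variables for the one that correlates best with the target on a tiny sample, then re-tests that same variable on a much larger sample.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def dredge(n_small=8, n_large=2000, n_candidates=500, seed=0):
    """Pick the candidate that looks best on a small sample, then
    measure the same candidate on a separate, larger sample."""
    rng = random.Random(seed)
    total = n_small + n_large
    target = [rng.gauss(0, 1) for _ in range(total)]
    best_small, best_large = 0.0, 0.0
    for _ in range(n_candidates):
        # Each candidate is pure noise, unrelated to the target.
        cand = [rng.gauss(0, 1) for _ in range(total)]
        r_small = pearson(cand[:n_small], target[:n_small])
        if abs(r_small) > abs(best_small):
            best_small = r_small
            best_large = pearson(cand[n_small:], target[n_small:])
    return best_small, best_large


r_small, r_large = dredge()
print(f"best correlation found on the small sample: {r_small:+.2f}")
print(f"same variable on the large sample:          {r_large:+.2f}")
```

Because every candidate is noise, any strong correlation found on the small sample is a chance artefact, and it should largely vanish on the larger sample. This is why a dredged pattern is at most a hypothesis until it is validated on sufficient data.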

Background

Humans have been "manually" extracting patterns from data for centuries, but the increasing volume of data in modern times has called for more automated approaches.

Early methods of identifying patterns in data include Bayes' theorem (1700s) and regression analysis (1800s).

The proliferation, ubiquity and increasing power of computer technology have increased data collection and storage.

As data sets have grown in size and complexity, direct hands-on data analysis has increasingly been augmented with indirect, automatic data processing.

This has been aided by other discoveries in computer science, such as neural networks, clustering, genetic algorithms (1950s), decision trees (1960s) and support vector machines (1990s).

Data mining is the process of applying these methods to data with the intention of uncovering hidden patterns.[1]

It has been used for many years by businesses, scientists and governments to sift through volumes of data such as airline passenger trip records, census data and supermarket scanner data to produce market research reports.

(Note, however, that reporting is not always considered to be data mining.)

A primary reason for using data mining is to assist in the analysis of collections of observations of behaviour.

Such data are vulnerable to collinearity because of unknown interrelations.
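One simple way to look for collinearity is to compute pairwise correlations between the recorded variables and flag pairs that move almost in lockstep. The following is a minimal sketch, not a complete diagnostic: the observations, the variable names and the 0.98 threshold are all hypothetical, chosen only to illustrate the idea.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def collinear_pairs(columns, names, threshold=0.98):
    """Return (name_a, name_b, r) for every pair whose |r| >= threshold."""
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            r = pearson(columns[i], columns[j])
            if abs(r) >= threshold:
                pairs.append((names[i], names[j], round(r, 3)))
    return pairs


# Hypothetical behavioural observations: (visits, page_views, purchases).
# page_views is always three times visits, an interrelation that an
# analyst might not know about in advance.
observations = [
    (1, 3, 0), (2, 6, 1), (3, 9, 1), (4, 12, 2), (5, 15, 2), (6, 18, 4),
]
names = ["visits", "page_views", "purchases"]
columns = list(zip(*observations))  # one tuple per variable

pairs = collinear_pairs(columns, names)
print(pairs)
```

Pairwise correlation only detects linear relationships between two variables at a time; dependencies involving three or more variables need other checks.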

An unavoidable fact of data mining is that the (sub-)set(s) of data being analysed may not be representative of the whole domain, and therefore may not contain examples of certain critical relationships and behaviours that exist across other parts of the domain.

To address this sort of issue, the analysis may be augmented using experiment-based and other approaches, such as Choice Modelling for human-generated data.

In these situations, inherent correlations can be either controlled for, or removed altogether, during the construction of the experimental design.

There have been some efforts to define standards for data mining, for example the 1999 European Cross Industry Standard Process for Data Mining (CRISP-DM 1.0) and the 2004 Java Data Mining standard (JDM 1.0).

These are evolving standards; later versions of these standards are under development.

Independent of these standardization efforts, freely available open-source software systems like the R Project, Weka, KNIME, RapidMiner and others have become an informal standard for defining data-mining processes.

The first three of these systems are able to import and export models in PMML (Predictive Model Markup Language), which provides a standard way to represent data mining models so that they can be shared between different statistical applications.

PMML is an XML-based language developed by the Data Mining Group (DMG)[2], an independent group composed of many data mining companies.

PMML version 4.0 was released in June 2009.[2][3][4]
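Because PMML is XML, a model file can be inspected with any XML parser. The sketch below is illustrative only: the embedded document is a made-up fragment containing just a header and a data dictionary (no model element), its field names are hypothetical, and it is not claimed to be schema-valid. The namespace URI is assumed to be the one used by PMML 4.0.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment for illustration; a real PMML file
# exported by a tool would also contain at least one model element.
PMML_TEXT = """
<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header description="toy example"/>
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>
"""

# Assumed namespace for PMML 4.0 documents.
NS = {"pmml": "http://www.dmg.org/PMML-4_0"}

root = ET.fromstring(PMML_TEXT.strip())
version = root.get("version")

# Collect (name, optype, dataType) for every declared field.
fields = [
    (f.get("name"), f.get("optype"), f.get("dataType"))
    for f in root.findall("pmml:DataDictionary/pmml:DataField", NS)
]

print("PMML version:", version)
for name, optype, data_type in fields:
    print(f"  field {name}: {optype}, {data_type}")
```

Reading the data dictionary first is a reasonable starting point when exchanging models, since it shows which input fields the receiving application must supply.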

Research and evolution

In addition to industry-driven demand for standards and interoperability, professional and academic activity has also made considerable contributions to the evolution and rigour of the methods and models. An article published in a 2008 issue of the International Journal of Information Technology and Decision Making summarises the results of a literature survey which traces and analyses this evolution.[5]

The premier professional body in the field is the Association for Computing Machinery's Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD).[citation needed] Since 1989 they have hosted an annual international conference and published its proceedings,[6] and since 1999 have published a biannual academic journal titled "SIGKDD Explorations".[7] Other Computer Science conferences on data mining include:

Process

Knowledge Discovery in Databases (KDD) is the name coined by Gregory Piatetsky-Shapiro in 1989 to describe the process of finding interesting, interpreted, useful and novel data.

There are many nuances to this process, but roughly the steps are to preprocess raw data, mine the data, and interpret the results.[10]

Pre-processing

Once the objective for the KDD process is known, a target data set must be assembled.

As data mining can only uncover patterns already present in the data, the target dataset must be large enough to contain these patterns while remaining concise enough to be mined in an acceptable timeframe.

A common source for data is a datamart or data warehouse.

The target set is then cleaned. Cleaning removes the observations with noise and missing data.
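A cleaning step of this kind can be sketched in a few lines. The example below makes two assumptions that are not in the text: missing values are represented as `None`, and "noise" is approximated by values outside a plausible range supplied by the analyst. The `clean` function and the sample readings are hypothetical.

```python
def clean(observations, valid_range):
    """Drop observations that contain a missing value (None) or a value
    outside valid_range, which stands in here for 'noise'."""
    low, high = valid_range
    kept = []
    for obs in observations:
        if any(value is None for value in obs):
            continue  # missing data
        if any(not (low <= value <= high) for value in obs):
            continue  # implausible value, treated as noise
        kept.append(obs)
    return kept


# Hypothetical raw observations: (temperature, pulse).
raw = [
    (36.6, 72.0),
    (None, 80.0),   # missing temperature
    (37.1, 65.0),
    (36.9, 999.0),  # implausible pulse
    (36.4, None),   # missing pulse
]

cleaned = clean(raw, valid_range=(0.0, 250.0))
print(f"kept {len(cleaned)} of {len(raw)} observations")
```

Dropping whole observations is the simplest policy and matches the description above; in practice it may discard useful data, so the choice of range and of what counts as noise deserves care.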

The clean data are reduced into feature vectors, one vector per observation.

A feature vector is a summarised version of the raw data observation. For example, a black and white image of a face which is 100px by 100px would contain 10,000 bits of raw data.

This might be turned into a feature vector by locating the eyes and mouth in the image.

Doing so would reduce the data for each vector from 10,000 bits to three codes for the locations, dramatically reducing the size of the dataset to be mined, and hence reducing the processing effort.
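The size reduction in the face example can be made concrete. The sketch below does not detect anything: the landmark coordinates are hypothetical values standing in for the output of some eye and mouth locator, and encoding each location as a single integer is just one possible choice of "code".

```python
WIDTH = HEIGHT = 100

# A black and white image stored as one bit per pixel.
image = [[0] * WIDTH for _ in range(HEIGHT)]
raw_bits = WIDTH * HEIGHT

# Hypothetical (row, column) locations, assumed to come from a
# separate landmark-detection step that is not shown here.
landmarks = {"left_eye": (35, 30), "right_eye": (35, 70), "mouth": (75, 50)}


def encode(row, col, width=WIDTH):
    """Encode a pixel location as a single integer code."""
    return row * width + col


feature_vector = [
    encode(*landmarks[name]) for name in ("left_eye", "right_eye", "mouth")
]

# Bits needed to store any one location code in a 100 x 100 image.
bits_per_code = (WIDTH * HEIGHT - 1).bit_length()
feature_bits = bits_per_code * len(feature_vector)

print(f"raw observation: {raw_bits} bits")
print(f"feature vector:  {len(feature_vector)} codes, {feature_bits} bits")
```

Under these assumptions each observation shrinks from 10,000 bits to three small integers, which is the kind of reduction the paragraph above describes.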

The feature(s) selected will depend on what the objective(s) is/are; obviously, selecting the "right" feature(s) is fundamental to successful data mining.

The feature vectors are divided into two sets, the "training set" and the "test set". The training set is used to "train" the data mining algorithm(s), while the test set is used to verify the accuracy of any patterns found.
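The division into training and test sets can be sketched as follows. Everything specific here is an assumption made for illustration: the data are synthetic one-feature vectors with a 0/1 label, the split is 70/30, and the "algorithm" is a deliberately trivial threshold classifier rather than any particular data mining method.

```python
import random


def train_test_split(vectors, test_fraction=0.3, seed=0):
    """Shuffle a copy of the data and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(vectors)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def train_threshold(train):
    """'Train' by placing a threshold halfway between the class means."""
    pos = [x for x, label in train if label == 1]
    neg = [x for x, label in train if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, data):
    """Fraction of examples whose label matches the threshold rule."""
    correct = sum((x > threshold) == (label == 1) for x, label in data)
    return correct / len(data)


# Synthetic feature vectors: (feature, label).
rng = random.Random(1)
data = [(rng.gauss(0, 1), 0) for _ in range(100)]
data += [(rng.gauss(3, 1), 1) for _ in range(100)]

train, test = train_test_split(data)
threshold = train_threshold(train)

print(f"training accuracy: {accuracy(threshold, train):.2f}")
print(f"test accuracy:     {accuracy(threshold, test):.2f}")
```

The test accuracy is the figure to report, because the test vectors played no part in choosing the threshold.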

Data mining

Data mining commonly involves four classes of tasks:[10]
.
  • Classification - Arranges the data into predefined groups.^ Classification maps the data in predefined classes.

    ^ Then, as time permits, I will describe some of my group's recent work on link-based classification and entity resolution in linked data.
    • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

    ^ Clustering differs from classification in that it does not rely on predefined classes or characteristics for each group.

    .For example, an email program might attempt to classify an email as legitimate or spam.^ For example, you might have the requirement to classify new customers regarding their risk of default.
    • Using DB2 XQuery to extract data mining results stored as PMML 19 January 2010 18:018 UTC www.ibm.com [Source type: FILTERED WITH BAYES]

    .Common algorithms include decision tree learning, nearest neighbor, naive Bayesian classification and neural networks.
  • Clustering - Is like classification but the groups are not predefined, so the algorithm will try to group similar items together.
  • Regression - Attempts to find a function which models the data with the least error.
  • Association rule learning - Searches for relationships between variables.^ This is to help you remember that that this model is using a decision trees algorithm.
    • Predicting Customer Profitability – A First Data Mining Model 19 January 2010 18:018 UTC technet.microsoft.com [Source type: FILTERED WITH BAYES]

    ^ And then find the best function of this type that models the given data.

    .For example a supermarket might gather data on customer purchasing habits.^ The structure of the Internet log data is similar to what a commercial organization might have for new customer requests.
    • Predicting Customer Profitability – A First Data Mining Model 19 January 2010 18:018 UTC technet.microsoft.com [Source type: FILTERED WITH BAYES]

    ^ Unfortunately, to determine what is unique about this second group of purchasers will require all of the data available on each customer and several complex models.
    • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

    .Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes.^ A sequential pattern function will analyze such collections of related records and will detect frequently occurring patterns of products bought over time.
    • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]

    ^ The mined rules can in turn be used to improve the accuracy of information extraction.
    • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

    This is sometimes referred to as market basket analysis.
  • See also structured data analysis.
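
As an illustration of the association rule learning task in the list above, the following sketch counts how often pairs of products occur together in supermarket baskets and keeps the rules that are both frequent and reliable. It is a simplified example restricted to item pairs, not a full algorithm such as Apriori; the function name `pair_rules`, the thresholds, and the basket contents are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) for pairs of items."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))                 # ignore repeats within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))  # every unordered pair once
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                         # share of baskets with both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]   # share of lhs baskets that also hold rhs
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical purchase records
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "butter"],
    ["bread", "butter", "jam"],
]
rules = pair_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With this data the rule "bread -> butter" has support 0.60 and confidence 0.75: three of the five baskets contain both items, and three of the four baskets containing bread also contain butter.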

Results validation

.The final step of knowledge discovery from data is to verify that the patterns produced by the data mining algorithms occur in the wider data set.^ That is, data mining attempts to extract knowledge from data.
  • AITopics / DataMining 19 January 2010 18:018 UTC www.aaai.org [Source type: Academic]

.Not all patterns found by the data mining algorithms are necessarily valid.^ First of all, what are data mining and text mining?
  • AITopics / DataMining 19 January 2010 18:018 UTC www.aaai.org [Source type: Academic]

^ They are similar in that they both 'mine' large amounts of data, looking for meaningful patterns.
  • AITopics / DataMining 19 January 2010 18:018 UTC www.aaai.org [Source type: Academic]

.It is common for data mining algorithms to find patterns in the training set which are not present in the general data set; this is called overfitting.^ Summary: Data mining is the process of finding rules and patterns in structured data.
  • Using DB2 XQuery to extract data mining results stored as PMML 19 January 2010 18:018 UTC www.ibm.com [Source type: FILTERED WITH BAYES]

^ The IDU Data Mining (DM) technical area is about techniques for processing and combining raw data -- from large, distributed, heterogeneous, multidimensional data sets with complex spatial and/or temporal dynamics -- to detect patterns and regularities.
  • AITopics / DataMining 19 January 2010 18:018 UTC www.aaai.org [Source type: Academic]

.To overcome this, the evaluation uses a test set of data which the data mining algorithm was not trained on.^ When testing the validity of your model it is important to test the model with data that it has not used for training.
  • Predicting Customer Profitability – A First Data Mining Model 19 January 2010 18:018 UTC technet.microsoft.com [Source type: FILTERED WITH BAYES]

^ A measure often used in data mining algorithms that measures the disorder of a set of data.
  • A Data MiningGlossary 19 January 2010 18:018 UTC www.thearling.com [Source type: Reference]

^ Data mining is widely being used by the industry sectors such as retail, financial, communication, and marketing organizations where consumer focus is a concern.
  • Data Mining Research Services-Web Projects and Internet Database Mining-Data Mine Applications 19 January 2010 18:018 UTC www.dataentrysolution.com [Source type: FILTERED WITH BAYES]

.The learnt patterns are applied to this test set and the resulting output is compared to the desired output.^ In the data output view, you can see the resulting set with global statistics of the model, as shown in Figure 3.
  • Using DB2 XQuery to extract data mining results stored as PMML 19 January 2010 18:018 UTC www.ibm.com [Source type: FILTERED WITH BAYES]

^ It is then useful to compare the results of the mining operations across several data sets.
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]

^ Data miners will often try different algorithms and settings, and inspect the resulting models and test results to select the best algorithm and settings.
  • AITopics / DataMining 19 January 2010 18:018 UTC www.aaai.org [Source type: Academic]

.For example, a data mining algorithm trying to distinguish spam from legitimate emails would be trained on a training set of sample emails.^ Note, in some data mining tasks there is no desire to predict a variable, for example some clustering exercises do not make predictions, they just cluster.
  • Predicting Customer Profitability – A First Data Mining Model 19 January 2010 18:018 UTC technet.microsoft.com [Source type: FILTERED WITH BAYES]

^ First applied in banking, data mining uses a variety of algorithms to sift through storehouses of data in search of 'noisy' patterns and relationships among the different silos of information.
  • AITopics / DataMining 19 January 2010 18:018 UTC www.aaai.org [Source type: Academic]

^ Multiple trees were built from this data, with each tree only considering some percent of the training set (Kamath 17).

.Once trained, the learnt patterns would be applied to the test set of emails on which the algorithm had not been trained. The accuracy of these patterns can then be measured from how many emails they correctly classify.^ Some researchers have started to think about how they might better find meaning in these new mountains of data, and if possible to set up plans for future actions based on the growth of present data.

^ It is a fairly recent topic in computer science but applies many older computational techniques from statistics, information retrieval, machine learning and pattern recognition.

^ In this way, they use all of the data for both training and testing their model.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]
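
The spam example can be made concrete with a deliberately overfitted "model" that memorises its training emails word for word. The sketch below is illustrative only: the helper names `accuracy` and `train_memoriser` and the tiny hand-written email sets are assumptions, and a real spam filter would learn from features rather than exact text.

```python
from collections import Counter


def accuracy(predict, labelled_emails):
    """Fraction of (text, label) pairs that the predictor classifies correctly."""
    correct = sum(1 for text, label in labelled_emails if predict(text) == label)
    return correct / len(labelled_emails)


def train_memoriser(training_set):
    """An intentionally overfitted model: it stores every training email verbatim."""
    seen = dict(training_set)
    # Unseen emails fall back to the most common training label.
    default = Counter(label for _, label in training_set).most_common(1)[0][0]
    return lambda text: seen.get(text, default)


# Hypothetical sample emails
training_set = [
    ("win a free prize now", "spam"),
    ("meeting moved to friday", "legitimate"),
    ("free offer just for you", "spam"),
    ("lunch tomorrow?", "legitimate"),
    ("quarterly report attached", "legitimate"),
]
test_set = [
    ("claim your free prize", "spam"),
    ("free tickets inside", "spam"),
    ("report for friday meeting", "legitimate"),
    ("see you at lunch", "legitimate"),
]

memoriser = train_memoriser(training_set)
train_accuracy = accuracy(memoriser, training_set)
test_accuracy = accuracy(memoriser, test_set)
print(f"training accuracy: {train_accuracy:.2f}, test accuracy: {test_accuracy:.2f}")
```

The memoriser classifies every training email correctly but only half of the unseen test emails, which is the gap that evaluation on a separate test set is meant to expose.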

.A number of statistical methods, such as ROC curves, may be used to evaluate the algorithm.^ In the simplest case, regression uses standard statistical techniques such as linear regression.

^ Genetic algorithms : Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.
  • An Introduction to Data Mining 19 January 2010 18:018 UTC www.thearling.com [Source type: FILTERED WITH BAYES]

^ Building a complex model with many variables, even when the number of rows is small, may be computationally intensive and will therefore benefit from parallel data mining algorithms.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]
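
A ROC curve plots the true positive rate against the false positive rate as the decision threshold on a classifier's score is lowered. The following sketch computes the curve's points and the area under it for a handful of scored examples; the function names `roc_points` and `auc` and the scores themselves are assumptions chosen for illustration.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points, one per distinct threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples sharing a score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical classifier scores (higher = more likely spam) and true labels (1 = spam)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
area = auc(points)
print(f"area under the ROC curve: {area:.3f}")
```

An area of 1.0 would mean every spam email is scored above every legitimate one, while 0.5 corresponds to random ranking; here eight of the nine spam/legitimate pairs are ordered correctly, giving roughly 0.889.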

.If the learnt patterns do not meet the desired standards, then it is necessary to reevaluate and change the preprocessing and data mining.^ A data warehouse is not necessary for data mining.

.If the learnt patterns do meet the desired standards then the final step is to interpret the learnt patterns and turn them into knowledge.^ Knowledge and Information Systems - An International Journal : IEEE Transactions on Pattern Analysis and Machine Intelligence : Data Mining Group : a data mining standardization organization (e.g.
  • Data mining, data mining course, graduate data mining, financial data mining, machine learning, neural networks, genetic programs, decision trees, WEKA 19 January 2010 18:018 UTC sce.uhcl.edu [Source type: Academic]

^ This paper highlights the pre-processing of raw data that the program performs, describes the data mining aspects of the software and how the interpretation of patterns supports the process of knowledge discovery.
  • AITopics / DataMining 19 January 2010 18:018 UTC www.aaai.org [Source type: Academic]

Notable uses

Games

.Since the early 1960s, the availability of oracles for certain combinatorial games, also called tablebases (e.g. for 3x3-chess with any beginning configuration, small-board dots-and-boxes, small-board hex, and certain endgames in chess, dots-and-boxes, and hex), has opened up a new area for data mining.^ The course will begin with a tutorial on data mining.
  • Security Issues in Data Mining 19 January 2010 18:018 UTC www.cs.purdue.edu [Source type: Academic]

^ Data mining is an important method for extracting valuable information from all sizes of databases: large and small.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

This is the extraction of human-usable strategies from these oracles. .Current pattern recognition approaches do not seem to have the high level of abstraction required to be applied successfully.^ It is a fairly recent topic in computer science but applies many older computational techniques from statistics, information retrieval, machine learning and pattern recognition.

^ A course in artificial intelligence, machine learning, pattern recognition, algorithms, or statistics would be helpful, but is not required.
  • Data mining, data mining course, graduate data mining, financial data mining, machine learning, neural networks, genetic programs, decision trees, WEKA 19 January 2010 18:018 UTC sce.uhcl.edu [Source type: Academic]

Instead, extensive experimentation with the tablebases, combined with an intensive study of tablebase answers to well-designed problems and with knowledge of prior art (i.e. pre-tablebase knowledge), is used to yield insightful patterns. .Berlekamp (in dots-and-boxes etc.) and John Nunn (in chess endgames) are notable examples of researchers doing this work, though they were not and are not involved in tablebase generation.^ For instance, in Spain a lot of researchers earn less money than if they were working in a supermarket or driving a taxi, occupations with fewer responsibilities and less impact on society.

^ Using examples drawn from leading web sites, I will provide an overview of how some of these systems work and how they generate great value for customers.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

Business

.Data mining in customer relationship management applications can contribute significantly to the bottom line.^ Three applications of data mining principles .
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ The Data mining applications .

^ Data mining has got the diversity in the field of application.

.[citation needed] Rather than randomly contacting a prospect or customer through a call center or sending mail, a company can concentrate its efforts on prospects that are predicted to have a high likelihood of responding to an offer.^ Predict which customers will respond to mailing.
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]

^ Go to White Paper Voice of the Customer: Text Analytics for the Responsive Enterprise [ Source: Alta Plana Corporation ] July 2008- Text analytics helps insurance company business users discern and capture the Voice of the Customer from online media such as blogs, forum postings and news articles; from email, chat interactions and contact-center dialogues; and from surveys and other mechanisms for collecting customer feedback: from the totality of enterprise information sources.
  • Data Mining - White Papers, Case Studies, Videos, Webcasts - TechWeb Digital Library 19 January 2010 18:018 UTC www.informationweek.com [Source type: Academic]

^ The purpose of this paper is to walk you through a complete, real-life scenario for using data mining to predict customer profitability.
  • Predicting Customer Profitability – A First Data Mining Model 19 January 2010 18:018 UTC technet.microsoft.com [Source type: FILTERED WITH BAYES]

.More sophisticated methods may be used to optimise resources across campaigns so that one may predict which channel and which offer an individual is most likely to respond to — across all potential offers.^ The marketing of products to select groups of consumers that are more likely than average to be interested in the offer.
  • A Data MiningGlossary 19 January 2010 18:018 UTC www.thearling.com [Source type: Reference]

^ All of these services offer the user access to multiple surveys; however, the task of integrating all the catalogs into one source is far from over.

^ In most catalogs; however, the information available may be contained on only a few spectra, to be truly useful, multiple surveys at different wavelengths have to be considered.

.Additionally, sophisticated applications could be used to automate the mailing.^ There are thousands of ways that you could use a very similar structure to predict a continuous number for a different real-world application.
  • Predicting Customer Profitability – A First Data Mining Model 19 January 2010 18:018 UTC technet.microsoft.com [Source type: FILTERED WITH BAYES]

^ In the future, I may make more sophisticated maps using additional data.
  • Data Mining 101: Finding Subversives with Amazon Wishlists | Applefritter 19 January 2010 18:018 UTC www.applefritter.com [Source type: General]

^ For example, a neural net application could build multiple models using different architectures (e.g., with a different number of nodes or hidden layers) simultaneously on each processor.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

.Once the results from data mining (potential prospect/customer and channel/offer) are determined, this "sophisticated application" can automatically send either an e-mail or regular mail.^ Three applications of data mining principles .
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ If the result of this query returns a number of customers that match the available budget for mailing promotions, the process ends.
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]

^ It is IBM's strategy to provide a data mining solution that complements the capabilities our customers have in their Information Warehouse product family.
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]

.Finally, in cases where many people will take an action without an offer, uplift modeling can be used to determine which people will have the greatest increase in responding if given an offer.^ Mixture models as probabilistic clustering; finally an answer to "how many clusters?"
  • Statistics 36-350: Data Mining (Fall 2009) 19 January 2010 18:018 UTC www.stat.cmu.edu [Source type: Academic]

^ Example: A software vendor develops a model that can be used to control the settings of a vat in a chemical plant to increase yield.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

^ For example, for k-nearest nearest neighbor the calculation time increases as the factorial of the total number of points, and the calculation must be made every time the model is used.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]
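
One simple way to estimate uplift is to compare response rates between people who received an offer and a control group who did not, separately for each customer segment. The sketch below follows that approach under stated assumptions: the function name `uplift_by_segment`, the segment labels, and the response records are invented for the example, and practical uplift models are usually fitted per individual rather than per segment.

```python
from collections import defaultdict


def uplift_by_segment(records):
    """records: iterable of (segment, received_offer, responded) tuples.

    Returns {segment: treated response rate - control response rate}.
    """
    # Per segment, keep [responders, total] for the treated and the control group.
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, responded in records:
        cell = counts[segment][bool(treated)]
        cell[0] += int(responded)
        cell[1] += 1
    uplift = {}
    for segment, groups in counts.items():
        (t_resp, t_total), (c_resp, c_total) = groups[True], groups[False]
        if t_total == 0 or c_total == 0:
            continue  # uplift cannot be estimated without both groups
        uplift[segment] = t_resp / t_total - c_resp / c_total
    return uplift


# Hypothetical campaign results
records = (
    [("loyal", True, True)] * 3 + [("loyal", True, False)] * 1
    + [("loyal", False, True)] * 3 + [("loyal", False, False)] * 1
    + [("occasional", True, True)] * 2 + [("occasional", True, False)] * 2
    + [("occasional", False, False)] * 4
)
uplift = uplift_by_segment(records)
print(uplift)
```

In this made-up data the "loyal" segment responds at the same rate with or without the offer (uplift 0.0), so the offer is better spent on the "occasional" segment, where it raises the response rate by 0.5.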

.Data clustering can also be used to automatically discover the segments or groups within a customer data set.^ Another use of these tools is to detect trends and patterns in customer data that will help answer some questions about the business.
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]

^ This new analysis (conducted using 37 large data sets, comparing classification and ranking performance) shows some remarkable things.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ Data mining tools discover useful facts buried in the raw data (thus the term discovery model.
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]
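
Customer segments of this kind can be discovered with a clustering algorithm such as k-means, which repeatedly assigns each customer to the nearest cluster centre and then moves each centre to the mean of its members. The sketch below is a minimal version with a naive initialisation; the function name `kmeans`, the two features (annual spend and visits per month), and the customer values are assumptions for illustration.

```python
def kmeans(points, k, iterations=20):
    """Cluster points (tuples of numbers) into k groups; returns (centroids, assignment)."""
    centroids = [points[i] for i in range(k)]  # naive initialisation: the first k points
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid by squared Euclidean distance.
        assignment = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, assignment


# Hypothetical customers: (annual spend, visits per month)
customers = [(100, 1), (2000, 12), (120, 2), (2100, 10), (90, 1), (1900, 11)]
centroids, assignment = kmeans(customers, k=2)
print(assignment)
```

The algorithm separates the low-spend, infrequent customers from the high-spend, frequent ones without being told the groups in advance. In practice the features would usually be rescaled first, since here the spend values dominate the distance calculation.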

.Businesses employing data mining may see a return on investment, but they also recognise that the number of predictive models can quickly become very large.^ There are a number of data mining methods.
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]

^ Link mining includes both descriptive and predictive modeling of link data.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ Click on Mining Model Prediction tab.
  • Predicting Customer Profitability – A First Data Mining Model 19 January 2010 18:018 UTC technet.microsoft.com [Source type: FILTERED WITH BAYES]

.Rather than one model to predict which customers will churn, a business could build a separate model for each region and customer type.^ IBM has the breadth to take this solution end-to-end: we understand our customers' business problems and can help build their business models; we can analyze and build their data warehouse, integrating the technology components and assist in helping analyze the results - leaving behind an operational system.
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]

^ It simply says that if the same technique were used on a succession of databases to build a model, the average error rate would be close to the one obtained this way.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

^ Table 3 shows another common scenario for building models: predict what is going to happen in the future.
  • An Introduction to Data Mining 19 January 2010 18:018 UTC www.thearling.com [Source type: FILTERED WITH BAYES]

.Then, instead of sending an offer to all people that are likely to churn, it may want to send offers only to customers who are likely to take the offer.^ And it's about raising people's consciousness and also giving them mechanisms to opt in to giving certain information, or more anonymized information or statistical features of that information instead of just having opt-in be an all-or-nothing.
  • Data Mining Spurs Innovation, Threatens Privacy : NPR 19 January 2010 18:018 UTC www.npr.org [Source type: General]

.And finally, it may also want to determine which customers are going to be profitable over a window of time and only send the offers to those that are likely to be profitable.^ WiseGuys CRM Software Award winning CRM sends promo at right time to right customer.
  • Data Mining Software Information | Business.com 19 January 2010 18:018 UTC www.business.com [Source type: Academic]

^ It is important to remember that the amount of data you are going to mine is likely to increase over time.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

^ For example, imagine a scenario where you have data from bank customers from which you want to use clustering to determine the different customer segments.
  • Using DB2 XQuery to extract data mining results stored as PMML 19 January 2010 18:018 UTC www.ibm.com [Source type: FILTERED WITH BAYES]

.In order to maintain this quantity of models, they need to manage model versions and move to automated data mining.^ For example, fields that are primary keys in DB2 that always have different values are automatically moved to supplementary fields because they cannot provide insights to the clusters.
  • Using DB2 XQuery to extract data mining results stored as PMML 19 January 2010 18:018 UTC www.ibm.com [Source type: FILTERED WITH BAYES]

^ Digimine Internet company that provides fully managed data warehousing and data mining solutions for eBusiness Intelligence.
  • Data Mining Software Information | Business.com 19 January 2010 18:018 UTC www.business.com [Source type: Academic]

.Data mining can also be helpful to human-resources departments in identifying the characteristics of their most successful employees.^ The Data Mine : An index of resources for data mining.
  • Data mining, data mining course, graduate data mining, financial data mining, machine learning, neural networks, genetic programs, decision trees, WEKA 19 January 2010 18:018 UTC sce.uhcl.edu [Source type: Academic]

^ Data Mining has emerged as one of the most exciting and dynamic fields in computer science.
  • Data mining, data mining course, graduate data mining, financial data mining, machine learning, neural networks, genetic programs, decision trees, WEKA 19 January 2010 18:018 UTC sce.uhcl.edu [Source type: Academic]

^ Generally, we used to feed a lot of information about the variety of situation where an answer is known and then we will run the data mining software through that data on our computer and fetch the characteristic of the data that would help to construct the model.

.Information obtained, such as universities attended by highly successful employees, can help HR focus recruiting efforts accordingly.^ Detailed measurement and transparency of results can help focus efforts.
  • Data-Mi.ning - The aim of data mining is to make sense of large amounts of data 19 January 2010 18:018 UTC data-mi.ning.com [Source type: General]

^ Fortunately, advances in a field known as data mining are helping customers leverage their data more effectively and obtain insightful information that can give them a competitive edge.
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]

^ Successes along the way help build momentum and continued focus.
  • Data-Mi.ning - The aim of data mining is to make sense of large amounts of data 19 January 2010 18:018 UTC data-mi.ning.com [Source type: General]

.Additionally, Strategic Enterprise Management applications help a company translate corporate-level goals, such as profit and margin share targets, into operational decisions, such as production plans and workforce levels.^ Also, the model may help the non-profit raise more money, and reduce paper waste thanks to better targeting.
  • How to Protect Yourself Against Data Mining | Small Business Trends 19 January 2010 18:018 UTC smallbiztrends.com [Source type: General]

^ As such, good facilities to perform queries and data visualization as well as the availability of powerful data mining operators should be part of a well architected Decision Support environment.
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]

^ Furthermore, the data analysis and data mining applications make it impossible for the corporate computer to maintain its level of transaction throughput.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

[11]
.Another example of data mining, often called market basket analysis, relates to its use in retail sales.^ Businesses interested in data mining not related to numbers might want to consider Alyuda from Neo Digital, Inc.
  • Data Mining Software Information | Business.com 19 January 2010 18:018 UTC www.business.com [Source type: Academic]

^ As noted earlier, a simple and common data mining application involves the analysis of sales data in retail environments; much of this is said to involve "market basket" analysis, as it requires understanding purchases made by a customer in a single transaction (or what items he/she placed in a shopping cart or market basket for purchase).
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]

^ Leinweber's exercise isn't much more absurd than some actual examples of data mining.
  • Data Mining Isn't A Good Bet - WSJ.com 19 January 2010 18:018 UTC online.wsj.com [Source type: General]

.If a clothing store records the purchases of customers, a data-mining system could identify those customers who favour silk shirts over cotton ones.^ Purchase software for mining data.
  • Data Mining Software Information | Business.com 19 January 2010 18:018 UTC www.business.com [Source type: Academic]

^ Suppose a data mining utility unearthed a pattern in the data which indicated that customers who shopped on Saturday afternoons and who made their initial purchase of the day in the shoe department tended to make, on average, 4 additional purchases from other departments and that the average member of this group spent more per visit than the typical shopper.
  • AITopics / DataMining 19 January 2010 18:018 UTC www.aaai.org [Source type: Academic]

Although some explanations of relationships may be difficult, taking advantage of them is easier. .The example deals with association rules within transaction-based data.^ RedShed Software Software that aids in the discovery of relationship rules between data sets; based in Victoria, Australia.
  • Data Mining Software Information | Business.com 19 January 2010 18:018 UTC www.business.com [Source type: Academic]

^ "Data mining is the semi-automatic discovery of patterns, associations, changes, anomalies, rules, and statistically significant structures and events in data.
  • AITopics / DataMining 19 January 2010 18:018 UTC www.aaai.org [Source type: Academic]

^ Using the same data as source, it was determined that using associations, a set of dynamic rules could be generated which would allow orders to either be accumulated or picked based on the likelihood of another identical order occurring within the next few days.
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]

.Not all data are transaction-based, and logical or inexact rules may also be present within a database.^ Data mining is an important method for extracting valuable information from all sizes of databases: large and small.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

^ A variety of data sources may be used to form the base of data to be mined.
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]

.In a manufacturing application, an inexact rule may state that 73% of products which have a specific defect or problem will develop a secondary problem within the next six months.^ We developed a safe, effective vaccine in six months – three months less than it usually takes.
  • Data-Mi.ning - The aim of data mining is to make sense of large amounts of data 19 January 2010 18:018 UTC data-mi.ning.com [Source type: General]

.Market basket analysis has also been used to identify the purchase patterns of the Alpha consumer.^ Retailers use it to predict consumer buying patterns, and credit card companies use it to detect fraud.
  • AITopics / DataMining 19 January 2010 18:018 UTC www.aaai.org [Source type: Academic]

^ For example, to discover product affinities in market basket analysis one may include information about advertising and shelf placement.
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]

^ Some of these early customer tests involved market basket analysis applications (described previously), and it's worth taking a closer look at how IBM's efforts were used to satisfy these application requirements.
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]
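
Market basket analysis is commonly summarised with three measures per item pair: support, confidence and lift. The sketch below computes them with the Python standard library; the baskets and the function name pair_statistics are hypothetical, and real tools typically use a frequent-itemset algorithm rather than this exhaustive pair count.

```python
from collections import Counter
from itertools import combinations


def pair_statistics(baskets):
    """Return {(x, y): (support, confidence, lift)} for the rule x -> y."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    stats = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]  # P(y in basket | x in basket)
            lift = confidence / (item_counts[y] / n)  # > 1 means affinity
            stats[(x, y)] = (support, confidence, lift)
    return stats


# Hypothetical shopping baskets, for illustration only.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "jam"],
    ["bread", "butter", "jam"],
]
stats = pair_statistics(baskets)
support, confidence, lift = stats[("butter", "bread")]
print(support, confidence, lift)
```

A lift above 1 suggests the two products are bought together more often than their individual popularity would predict.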

Alpha Consumers are people who play a key role in connecting with the concept behind a product, then adopting that product, and finally validating it for the rest of society. .Analyzing the data collected on this type of user has allowed companies to predict future buying trends and forecast supply demands.^ This talk will review these types of data and prior applications of machine learning to gene chip data.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ If a site bans data mining, then the user can choose to play on another site which allows data mining.
  • Online Poker -- The Data Mining Dilemma - Poker News - CardPlayer.com 19 January 2010 18:018 UTC www.cardplayer.com [Source type: General]

^ Another use of these tools is to detect trends and patterns in customer data that will help answer some questions about the business.
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]

.Data Mining is a highly effective tool in the catalog marketing industry.^ The data mining tools can make this leap.
  • An Introduction to Data Mining 19 January 2010 18:018 UTC www.thearling.com [Source type: FILTERED WITH BAYES]

^ The following list summarizes those features of a data mining tool that make it scalable.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

^ Data mining can yield exciting results for almost every organization that collects data on its customers, markets, products or processes.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

Catalogers have a rich history of customer transactions on millions of customers dating back several years. .Data mining tools can identify patterns among customers and help identify the customers most likely to respond to upcoming mailing campaigns.^ Identify behavior patterns of risky customers.
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]

^ Predict which customers will respond to mailing.
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]

^ The data mining tools can make this leap.
  • An Introduction to Data Mining 19 January 2010 18:018 UTC www.thearling.com [Source type: FILTERED WITH BAYES]

.An example of data mining on an integrated-circuit production line is described in the paper "Mining IC Test Data to Optimize VLSI Testing."^ Businesses interested in data mining not related to numbers might want to consider Alyuda from Neo Digital, Inc.
  • Data Mining Software Information | Business.com 19 January 2010 18:018 UTC www.business.com [Source type: Academic]

^ Ansari, Suhail, et al., Integrating E-Commerce and Data Mining: Architectures and Challenges , ICDM 2001.
  • Data mining, data mining course, graduate data mining, financial data mining, machine learning, neural networks, genetic programs, decision trees, WEKA 19 January 2010 18:018 UTC sce.uhcl.edu [Source type: Academic]

[12] .In this paper the application of data mining and decision analysis to the problem of die-level functional test is described.^ Three applications of data mining principles .
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ This paper explores the role of data mining in decision support systems ....
  • Data Mining - White Papers, Case Studies, Videos, Webcasts - TechWeb Digital Library 19 January 2010 18:018 UTC www.informationweek.com [Source type: Academic]

.Experiments described in this paper demonstrate that a system can mine historical die-test data to create a probabilistic model of die-failure patterns, which is then used to decide in real time which die to test next and when to stop testing.^ Models and issues in data stream systems .

^ Data mining uses a different model for the creation of information about data.
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]

^ Data Mining applied to File Integrity.
  • Security Issues in Data Mining 19 January 2010 18:018 UTC www.cs.purdue.edu [Source type: Academic]

.This system has been shown, based on experiments with historical test data, to have the potential to improve profits on mature IC products.^ Data analysis that predicts future trends, behaviors, or events based on historical data.
  • An Introduction to Data Mining 19 January 2010 18:018 UTC www.thearling.com [Source type: FILTERED WITH BAYES]

^ The second example is also very elementary, but not as trivial: - The knowledge base has been imported from a test-example in many Expert System Textbooks : The “animal expert system”.
  • Assembly Language Extensions of Visual Prolog For Data Mining 19 January 2010 18:018 UTC omadeon.com [Source type: FILTERED WITH BAYES]

^ Based on some test data, a model has been developed that will predict the status of an item without destructive testing.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

Science and engineering

.In recent years, data mining has been widely used in areas of science and engineering, such as bioinformatics, genetics, medicine, education and electrical power engineering.^ Data mining is widely being used by the industry sectors such as retail, financial, communication, and marketing organizations where consumer focus is a concern.
  • Data Mining Research Services-Web Projects and Internet Database Mining-Data Mine Applications 19 January 2010 18:018 UTC www.dataentrysolution.com [Source type: FILTERED WITH BAYES]

^ Data Mining has emerged as one of the most exciting and dynamic fields in computer science.
  • Data mining, data mining course, graduate data mining, financial data mining, machine learning, neural networks, genetic programs, decision trees, WEKA 19 January 2010 18:018 UTC sce.uhcl.edu [Source type: Academic]

^ The rapid growth of computerized data, and the computer power available to analyze it, creates great opportunities for data mining in business, medicine, science, government, etc.
  • Statistics 36-350: Data Mining (Fall 2009) 19 January 2010 18:018 UTC www.stat.cmu.edu [Source type: Academic]

.In the study of human genetics, an important goal is to understand the mapping relationship between the inter-individual variation in human DNA sequences and variability in disease susceptibility.^ In recent years several novel types of genetic data have become available, and a primary application area for these types of data is in the study of cancer.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ Have a working knowledge of some of the more significant current research in the area of data mining and ML. Be aware of various data mining data repositories for the study of data mining.
  • Data mining, data mining course, graduate data mining, financial data mining, machine learning, neural networks, genetic programs, decision trees, WEKA 19 January 2010 18:018 UTC sce.uhcl.edu [Source type: Academic]

.In lay terms, it is to find out how the changes in an individual's DNA sequence affect the risk of developing common diseases such as cancer.^ Find out how easily you can use DB2 XQuery to create your own access methods based on your data mining results.
  • Using DB2 XQuery to extract data mining results stored as PMML 19 January 2010 18:018 UTC www.ibm.com [Source type: FILTERED WITH BAYES]

^ We also gave out more than $5 billion in new NIH grants to help bring us closer to the cures and treatments of the future for diseases from cancer to autism.
  • Data-Mi.ning - The aim of data mining is to make sense of large amounts of data 19 January 2010 18:018 UTC data-mi.ning.com [Source type: General]

.This is very important to help improve the diagnosis, prevention and treatment of the diseases.^ We also gave out more than $5 billion in new NIH grants to help bring us closer to the cures and treatments of the future for diseases from cancer to autism.
  • Data-Mi.ning - The aim of data mining is to make sense of large amounts of data 19 January 2010 18:018 UTC data-mi.ning.com [Source type: General]

.The data mining technique that is used to perform this task is known as multifactor dimensionality reduction.^ The technique that is used to perform these feats in data mining is called modeling.
  • An Introduction to Data Mining 19 January 2010 18:018 UTC www.thearling.com [Source type: FILTERED WITH BAYES]

^ Use data mining techniques to detect fraud .
  • data mining | ITworld 19 January 2010 18:018 UTC www.itworld.com [Source type: News]

^ Data mining is widely being used by the industry sectors such as retail, financial, communication, and marketing organizations where consumer focus is a concern.
  • Data Mining Research Services-Web Projects and Internet Database Mining-Data Mine Applications 19 January 2010 18:018 UTC www.dataentrysolution.com [Source type: FILTERED WITH BAYES]

[13]
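
The core step of multifactor dimensionality reduction can be sketched briefly: every combination of genotypes at the chosen loci is pooled into a "high" or "low" risk group by comparing its case-to-control ratio with a threshold, which reduces several factors to a single binary attribute. The Python sketch below covers only that pooling step; the locus names, the sample data and the default threshold (the overall case-to-control ratio) are assumptions for illustration, and the usual search over locus combinations with cross-validation is omitted.

```python
from collections import defaultdict


def mdr_pool(samples, loci, threshold=None):
    """Label each multilocus genotype cell as 'high' or 'low' risk.

    samples:   list of (genotypes_dict, is_case) pairs
    loci:      tuple of locus names to combine
    threshold: case:control ratio above which a cell is high risk;
               defaults to the overall ratio in the data (an assumption)
    """
    cases = defaultdict(int)
    controls = defaultdict(int)
    for genotypes, is_case in samples:
        cell = tuple(genotypes[locus] for locus in loci)
        if is_case:
            cases[cell] += 1
        else:
            controls[cell] += 1
    if threshold is None:
        threshold = sum(cases.values()) / max(sum(controls.values()), 1)
    labels = {}
    for cell in set(cases) | set(controls):
        # Multiply instead of dividing so that empty control cells are safe.
        high = cases[cell] > threshold * controls[cell]
        labels[cell] = "high" if high else "low"
    return labels


def make_group(a, b, n_cases, n_controls):
    """Build hypothetical samples sharing one genotype combination."""
    genotypes = {"snp_a": a, "snp_b": b}
    return [(genotypes, True)] * n_cases + [(genotypes, False)] * n_controls


# Invented data with an interaction: risk is raised only when exactly one
# of the two loci carries the variant.
samples = (
    make_group(0, 0, 2, 8)
    + make_group(0, 1, 8, 2)
    + make_group(1, 0, 8, 2)
    + make_group(1, 1, 2, 8)
)
labels = mdr_pool(samples, ("snp_a", "snp_b"))
print(sorted(labels.items()))
```

Neither locus is informative on its own in this example, since each genotype has equal numbers of cases and controls; only the pooled two-locus attribute separates them.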
.In the area of electrical power engineering, data mining techniques have been widely used for condition monitoring of high voltage electrical equipment.^ Use data mining techniques to detect fraud .
  • data mining | ITworld 19 January 2010 18:018 UTC www.itworld.com [Source type: News]

^ His textbook "Data Mining: Concepts and Techniques" (Morgan Kaufmann, 2001) has been popularly adopted for data mining courses in universities.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ Fortunately, we have reached a point in terms of computational power, storage capacity and cost that enables us to gather, analyze and mine unprecedented amounts of data.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

The purpose of condition monitoring is to obtain valuable information on the health status of the equipment's insulation. .Data clustering techniques such as the self-organizing map (SOM) have been applied to the vibration monitoring and analysis of transformer on-load tap-changers (OLTCs).^ Data mining is widely being used by the industry sectors such as retail, financial, communication, and marketing organizations where consumer focus is a concern.
  • Data Mining Research Services-Web Projects and Internet Database Mining-Data Mine Applications 19 January 2010 18:018 UTC www.dataentrysolution.com [Source type: FILTERED WITH BAYES]

^ In some situations, the data mining model is applied to one event or transaction at a time, such as scoring a loan application for risk.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

^ The resulting analytic data warehouse can be applied to improve business processes throughout the organization, in areas such as promotional campaign management, fraud detection, new product rollout, and so on.
  • An Introduction to Data Mining 19 January 2010 18:018 UTC www.thearling.com [Source type: FILTERED WITH BAYES]

.Using vibration monitoring, it can be observed that each tap change operation generates a signal that contains information about the condition of the tap changer contacts and the drive mechanisms.^ The driving force for data mining is the presence of petabyte-scale online archives that potentially contain valuable bits of information hidden in them.
  • Data mining, data mining course, graduate data mining, financial data mining, machine learning, neural networks, genetic programs, decision trees, WEKA 19 January 2010 18:018 UTC sce.uhcl.edu [Source type: Academic]

^ In most catalogs; however, the information available may be contained on only a few spectra, to be truly useful, multiple surveys at different wavelengths have to be considered.

^ Reallocate service investments away from things customers don’t care about (e.g., brochures that contain mostly known information) and toward things that improve the customer experience.
  • Data-Mi.ning - The aim of data mining is to make sense of large amounts of data 19 January 2010 18:018 UTC data-mi.ning.com [Source type: General]

Obviously, different tap positions will generate different signals. However, there was considerable variability amongst normal condition signals for the exact same tap position. SOM has been applied to detect abnormal conditions and to estimate the nature of the abnormalities.[14]
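
The way a SOM can flag abnormal conditions may be illustrated with a small sketch: the map is trained on signals recorded under normal conditions, and a new signal is treated as suspicious when its distance to the closest map node (the quantization error) exceeds anything seen in training. The Python code below is a minimal one-dimensional SOM written with the standard library only (math.dist needs Python 3.8 or later); the two-feature "signals", the map size and the training schedule are all assumptions made for illustration, not the setup used in the cited work.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def quantization_error(weights, x):
    """Distance from x to its closest node; large values look abnormal."""
    return min(math.dist(w, x) for w in weights)


def train_som(data, n_nodes=4, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a one-dimensional SOM and return its node weight vectors."""
    rng = random.Random(seed)
    # Initialise each node from a randomly chosen training vector.
    weights = [list(rng.choice(data)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius shrink as training proceeds.
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.5)
        for x in rng.sample(data, len(data)):
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood: nodes near the winner move the most.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


# Hypothetical feature vectors for normal signals at two tap positions.
data_rng = random.Random(1)
normal = [
    (cx + data_rng.uniform(-0.1, 0.1), cy + data_rng.uniform(-0.1, 0.1))
    for cx, cy in [(0.0, 0.0), (1.0, 1.0)]
    for _ in range(30)
]
som = train_som(normal)

# Anything further from the map than the worst normal signal is flagged.
threshold = max(quantization_error(som, x) for x in normal)
suspect = (5.0, -5.0)
is_abnormal = quantization_error(som, suspect) > threshold
print(is_abnormal)
```

Because normal signals for the same tap position vary, the threshold is taken from the training data rather than fixed in advance; a practical system would also need a way to estimate the nature of the abnormality, which this sketch does not attempt.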
.Data mining techniques have also been applied for dissolved gas analysis (DGA) on power transformers.^ Fortunately, we have reached a point in terms of computational power, storage capacity and cost that enables us to gather, analyze and mine unprecedented amounts of data.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

^ His textbook "Data Mining: Concepts and Techniques" (Morgan Kaufmann, 2001) has been popularly adopted for data mining courses in universities.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ As we have seen, many data mining problems involve large, complex databases, complicated modeling techniques and substantial computer processing.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

DGA has been available as a diagnostic for power transformers for many years. .Data mining techniques such as SOM have been applied to analyse data and to determine trends which are not obvious to standard DGA ratio techniques such as the Duval Triangle.^ His textbook "Data Mining: Concepts and Techniques" (Morgan Kaufmann, 2001) has been popularly adopted for data mining courses in universities.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ One of the frequently encountered challenges in data mining is the problem of representing unstructured information such as words in text, action codes, and similar non numerical information.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ In some situations, the data mining model is applied to one event or transaction at a time, such as scoring a loan application for risk.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

[14]
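
For comparison with the data mining approach, the Duval Triangle places a gas sample according to the relative proportions of three dissolved gases: methane (CH4), ethylene (C2H4) and acetylene (C2H2). The short Python sketch below computes only those three percentages; the function name and the sample concentrations are hypothetical, and the fault-zone boundaries of the triangle are deliberately not reproduced here.

```python
def duval_coordinates(ch4_ppm, c2h4_ppm, c2h2_ppm):
    """Return (%CH4, %C2H4, %C2H2) relative to the sum of the three gases."""
    total = ch4_ppm + c2h4_ppm + c2h2_ppm
    if total <= 0:
        raise ValueError("at least one gas concentration must be positive")
    return tuple(100.0 * gas / total for gas in (ch4_ppm, c2h4_ppm, c2h2_ppm))


# Hypothetical DGA reading in parts per million.
coords = duval_coordinates(60, 30, 10)
print(coords)
```

A ratio method such as this looks at one sample at a time, whereas clustering a history of samples can reveal gradual trends.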
.A fourth area of application for data mining in science/engineering is within educational research, where data mining has been used to study the factors leading students to choose to engage in behaviors which reduce their learning[15] and to understand the factors influencing university student retention.^ Three applications of data mining principles .
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ Machine learning and data mining applications are central to eCommerce.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ His textbook "Data Mining: Concepts and Techniques" (Morgan Kaufmann, 2001) has been popularly adopted for data mining courses in universities.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

[16] .A similar example of the social application of data mining is its use in expertise finding systems, whereby descriptors of human expertise are extracted, normalised and classified so as to facilitate the finding of experts, particularly in scientific and technical fields.^ Three applications of data mining principles .
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ Data Mining has emerged as one of the most exciting and dynamic fields in computer science.
  • Data mining, data mining course, graduate data mining, financial data mining, machine learning, neural networks, genetic programs, decision trees, WEKA 19 January 2010 18:018 UTC sce.uhcl.edu [Source type: Academic]

.In this way, data mining can facilitate Institutional memory.^ They provide an excellent way of constructing certain kinds of AI systems (e.g., speech recognizers, handwriting recognizers, data mining systems, etc.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ According to the General Accountability Office and the CATO Institute, the government currently has 52 different agencies using almost 200 different data mining programs.
  • Data Mining Giving Privacy the Shaft? - InternetNews.com 19 January 2010 18:018 UTC www.internetnews.com [Source type: News]

^ But instead, leave it in individual organizations and then use what people call privacy preserving data mining methods to essentially just do the data mining in a distributed and cryptic - encrypted way.
  • Data Mining Spurs Innovation, Threatens Privacy : NPR 19 January 2010 18:018 UTC www.npr.org [Source type: General]

.Other examples of applying data mining techniques are mining biomedical data facilitated by domain ontologies,[17] mining clinical trial data,[18] and traffic analysis using SOM,[19] among others.^ Three applications of data mining principles .
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ Web mining is the application of data mining techniques to acquire this knowledge for e-business.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ His current interests are Data Mining techniques and applications.
  • Data-Mi.ning - The aim of data mining is to make sense of large amounts of data 19 January 2010 18:018 UTC data-mi.ning.com [Source type: General]

.In adverse drug reaction surveillance, the Uppsala Monitoring Centre has, since 1998, used data mining methods to routinely screen for reporting patterns indicative of emerging drug safety issues in the WHO global database of 4.6 million suspected adverse drug reaction incidents.^ There are a number of data mining methods.
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]

^ They report that data mining methods are cumbersome and useless, and they do not find patterns in the data.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

^ Data mining is widely being used by the industry sectors such as retail, financial, communication, and marketing organizations where consumer focus is a concern.
  • Data Mining Research Services-Web Projects and Internet Database Mining-Data Mine Applications 19 January 2010 18:018 UTC www.dataentrysolution.com [Source type: FILTERED WITH BAYES]

[20] .Recently, similar methodology has been developed to mine large collections of electronic health records for temporal patterns associating drug prescriptions to medical diagnoses.^ A sequential pattern function will analyze such collections of related records and will detect frequently occurring patterns of products bought over time.
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]

^ Recently, it has gained a lot of attention under the name "Data Mining", with the new twist being an increased emphasis on analyzing large datasets.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ IBM has developed technologies that allow for the implementation of very powerful Association and Sequential Pattern functions (see section Solutions.
  • Data Mining: Extending the Information Warehouse Framework 19 January 2010 18:018 UTC www.almaden.ibm.com [Source type: Reference]

[21]
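
Screening of the kind described above for adverse drug reaction reports is often based on disproportionality: a drug-reaction pair stands out when it is reported more often than would be expected if the drug and the reaction were independent. The Python sketch below computes a log2 observed-to-expected score with a small additive correction; the function name, the 0.5 correction and the report counts are assumptions for illustration, and the sketch is not presented as the Uppsala Monitoring Centre's actual procedure.

```python
import math


def disproportionality_score(n_both, n_drug, n_reaction, n_total, shrink=0.5):
    """log2 of observed over expected report counts for a drug-reaction pair.

    The expected count assumes the drug and the reaction are reported
    independently; `shrink` damps scores that rest on very few reports.
    """
    expected = n_drug * n_reaction / n_total
    return math.log2((n_both + shrink) / (expected + shrink))


# Hypothetical counts: 10,000 reports in total, 100 mention the drug,
# 200 mention the reaction, and 20 mention both.
score = disproportionality_score(20, 100, 200, 10_000)
print(round(score, 2))
```

A score near zero means the pair is reported about as often as expected; clearly positive scores would be passed on for clinical review rather than treated as proof of causation.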

Spatial Data mining

.Spatial data mining is the application of data mining techniques to spatial data.^ Three applications of data mining principles .
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ His current interests are Data Mining techniques and applications.
  • Data-Mi.ning - The aim of data mining is to make sense of large amounts of data 19 January 2010 18:018 UTC data-mi.ning.com [Source type: General]

^ Web mining is the application of data mining techniques to acquire this knowledge for e-business.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

.Spatial data mining follows the same functions as data mining in general, with the end objective of finding patterns in geography.^ Summary: Data mining is the process of finding rules and patterns in structured data.
  • Using DB2 XQuery to extract data mining results stored as PMML 19 January 2010 18:018 UTC www.ibm.com [Source type: FILTERED WITH BAYES]

^ Moreover, many applications may require real time mining of unusual patterns in data streams, including finding unusual network or telecommunication traffic, real-time pattern mining in video surveillance, detecting suspicious on-line transactions or terrorist activities, and so on.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ The following list summarizes those features of a data mining tool that make it scalable.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

.So far, data mining and Geographic Information Systems (GIS) have existed as two separate technologies, each with its own methods, traditions and approaches to visualization and data analysis.^ This is the practice that people have been following for a long time as before the advent of data mining technology.

^ It helps in extracting data from both software and hardware platforms and can be applied on new systems in order to develop the new products and upgrade the existing platforms.
  • Data Mining Research Services-Web Projects and Internet Database Mining-Data Mine Applications 19 January 2010 18:018 UTC www.dataentrysolution.com [Source type: FILTERED WITH BAYES]

^ This talk presents an approach for corpus-based text classification based on WHIRL, a database system that augments traditional relational database technology with textual-similarity operations developed in the information retrieval community.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

In particular, most contemporary GIS have only very basic spatial analysis functionality. .The immense explosion in geographically referenced data occasioned by developments in IT, digital mapping, remote sensing, and the global diffusion of GIS emphasises the importance of developing data-driven inductive approaches to geographical analysis and modeling.^ Three factors, however, make developing a data mining model a potentially lengthy process: .
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

^ Global statistics result set You can also use the InfoSphere Warehouse Data tooling to develop and debug UDFs and STPs.
  • Using DB2 XQuery to extract data mining results stored as PMML 19 January 2010 18:018 UTC www.ibm.com [Source type: FILTERED WITH BAYES]

^ Both of these approaches are analogous to how data mining models are parallelized.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

.Data mining, which is the partially automated search for hidden patterns in large databases, offers great potential benefits for applied GIS-based decision-making.^ The aim of data mining is to make sense of large amounts of data .
  • Data-Mi.ning - The aim of data mining is to make sense of large amounts of data 19 January 2010 18:018 UTC data-mi.ning.com [Source type: General]

^ As we shall see, mining large databases and constructing complex models call for lots of computing power, as do sampling, testing and validating.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

^ Data mining is an important method for extracting valuable information from all sizes of databases: large and small.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

.Recently, the task of integrating these two technologies has become critical, especially as various public and private sector organisations possessing huge databases with thematic and geographically referenced data begin to realise the huge potential of the information hidden there.^ There are two query languages that can be used to extract information from XML documents.
  • Using DB2 XQuery to extract data mining results stored as PMML 19 January 2010 18:018 UTC www.ibm.com [Source type: FILTERED WITH BAYES]

^ The driving force for data mining is the presence of petabyte-scale online archives that potentially contain valuable bits of information hidden in them.
  • Data mining, data mining course, graduate data mining, financial data mining, machine learning, neural networks, genetic programs, decision trees, WEKA 19 January 2010 18:018 UTC sce.uhcl.edu [Source type: Academic]

^ All of these services offer the user access to multiple surveys; however, the task of integrating all the catalogs into one source is far from over.

Among those organisations are:
  • offices requiring analysis or dissemination of geo-referenced statistical data
  • public health services searching for explanations of disease clusters
  • environmental agencies assessing the impact of changing land-use patterns on climate change
  • geo-marketing companies doing customer segmentation based on spatial location.
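
As a small illustration of looking for patterns in geography, such as the disease clusters mentioned in the list above, the following Python sketch bins point locations into a regular grid and reports the cells that contain unusually many points. The coordinates, cell size and count threshold are hypothetical, and real spatial data mining would also account for the underlying population and for neighbouring cells.

```python
from collections import Counter


def grid_hotspots(points, cell_size, min_count):
    """Return {cell: count} for grid cells holding at least min_count points."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )
    return {cell: n for cell, n in counts.items() if n >= min_count}


# Hypothetical case locations in arbitrary map units.
points = [
    (2.1, 3.2), (2.4, 3.7), (2.8, 3.1), (2.5, 3.5),  # a tight group
    (7.0, 1.0), (0.5, 9.5),                          # isolated cases
]
hotspots = grid_hotspots(points, cell_size=1.0, min_count=3)
print(hotspots)
```
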
Challenges
.Geospatial data repositories tend to be very large.^ Segmentation or clustering of large data sets with thousands of dimensions is very challenging because of sparsity problems arising from the ``curse of dimensionality''.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ For a very large database, we may set aside 50% or more of the data.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

.Moreover, existing GIS datasets are often splintered into feature and attribute components that are conventionally archived in hybrid data management systems.^ It helps in extracting data from both software and hardware platforms and can be applied on new systems in order to develop the new products and upgrade the existing platforms.
  • Data Mining Research Services-Web Projects and Internet Database Mining-Data Mine Applications 19 January 2010 18:018 UTC www.dataentrysolution.com [Source type: FILTERED WITH BAYES]

^ He has been working on research into data mining, data warehousing, database systems, spatial and multimedia databases, deductive and object-oriented databases, Web databases, bio-medical databases, etc.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

^ Off-the-shelf machine-learning algorithms now are used widely both for manual data analysis and as important components for building intelligent systems.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

.Algorithmic requirements differ substantially for relational (attribute) data management and for topological (feature) data management [22].^ This attribute can denote truth or existence of any property, of any type of data in relation to calendar time.
  • Assembly Language Extensions of Visual Prolog For Data Mining 19 January 2010 18:018 UTC omadeon.com [Source type: FILTERED WITH BAYES]

^ It also shows how mining your data may require many different models.
  • Two Crows white paper: "Scalable Data Mining" 19 January 2010 18:018 UTC www.twocrows.com [Source type: FILTERED WITH BAYES]

^ Classification and clustering in linked relational domains require new data mining models and algorithms.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

Related to this is the range and diversity of geographic data formats, which also presents unique challenges. .The digital geographic data revolution is creating new types of data formats beyond the traditional "vector" and "raster" formats.^ Create a new Data Development Project by selecting File > New > Data Development Project .
  • Using DB2 XQuery to extract data mining results stored as PMML 19 January 2010 18:018 UTC www.ibm.com [Source type: FILTERED WITH BAYES]

.Geographic data repositories increasingly include ill-structured data such as imagery and geo-referenced multi-media [23].^ Her current work includes research on link mining, statistical relational learning and representing uncertainty in structured and semi-structured data.
  • UT-Austin Data Mining Seminar Schedule Abstracts 19 January 2010 18:018 UTC www.cs.utexas.edu [Source type: Academic]

There are several critical research challenges in geographic knowledge discovery and data mining.
Miller and Han [24] offer the following list of emerging research topics in the field:
  • Developing and supporting geographic data warehouses - Spatial properties are often reduced to simple aspatial attributes in mainstream data warehouses. Creating an integrated geographic data warehouse (GDW) requires solving issues in spatial and temporal data interoperability, including differences in semantics, referencing systems, geometry, accuracy and position.
  • Better spatio-temporal representations in geographic knowledge discovery - Current geographic knowledge discovery (GKD) techniques generally use very simple representations of geographic objects and spatial relationships. Geographic data mining techniques should recognise more complex geographic objects (lines and polygons) and relationships (non-Euclidean distances, direction, connectivity and interaction through attributed geographic space such as terrain). Time needs to be more fully integrated into these geographic representations and relationships.
  • Geographic knowledge discovery using diverse data types - GKD techniques should be developed that can handle diverse data types beyond the traditional raster and vector models, including imagery and geo-referenced multimedia, as well as dynamic data types (video streams, animation).
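
To make the "non-Euclidean distances" and "direction" relationships in the list above concrete, here is a minimal Python sketch. It is an illustration only, not a method taken from Miller and Han: it assumes a spherical Earth, uses the standard haversine and initial-bearing formulas, and the function names and the radius constant are choices made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2 (0 = north, 90 = east)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    x = math.sin(dlam) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0


# Example call with made-up coordinates (roughly two points in western Europe)
print(haversine_km(51.5, 0.0, 48.9, 2.4))
print(initial_bearing_deg(51.5, 0.0, 48.9, 2.4))
```

Treating latitude and longitude as plain x/y coordinates would distort both quantities, which is one reason spatial relationships need their own representation rather than being handled as ordinary attributes.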

Surveillance

Previous U.S. government data mining programs intended to stop terrorism include the Total Information Awareness (TIA) program, Secure Flight (formerly known as the Computer-Assisted Passenger Prescreening System (CAPPS II)), Analysis, Dissemination, Visualization, Insight, Semantic Enhancement (ADVISE [25]), and the Multistate Anti-Terrorism Information Exchange (MATRIX) [26]. These programs have been discontinued due to controversy over whether they violate the Fourth Amendment to the U.S. Constitution, although many programs that were formed under them continue to be funded by different organisations, or under different names [27].
Two plausible data mining techniques in the context of combating terrorism include "pattern mining" and "subject-based data mining".
Pattern mining
"Pattern mining" is a data mining technique that involves finding existing patterns in data.
In this context, "patterns" often means association rules. The original motivation for searching for association rules came from the desire to analyze supermarket transaction data, that is, to examine customer behaviour in terms of the purchased products.
For example, an association rule "beer ⇒ crisps (80%)" states that four out of five customers that bought beer also bought crisps.
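
The 80% figure in the rule above is usually called the rule's confidence. The following Python sketch shows one way it could be computed from a list of shopping baskets. The basket data are invented for illustration and the function names are hypothetical; practical association-rule miners (for example the algorithms in [28]) use far more efficient search strategies than this direct count.

```python
def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    items = set(items)
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Share of the transactions containing `antecedent` that also contain `consequent`."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    if not with_antecedent:
        return 0.0
    both = sum(1 for t in with_antecedent if consequent <= set(t))
    return both / len(with_antecedent)


# Hypothetical baskets: five contain beer, and four of those also contain crisps.
baskets = [
    {"beer", "crisps"},
    {"beer", "crisps", "bread"},
    {"beer", "crisps"},
    {"beer", "crisps", "milk"},
    {"beer", "bread"},
    {"milk", "bread"},
]

print(confidence(baskets, {"beer"}, {"crisps"}))  # 0.8, i.e. "beer => crisps (80%)"
print(support(baskets, {"beer", "crisps"}))       # share of all baskets holding both items
```

Confidence alone can mislead when the consequent is bought by nearly everyone anyway, so it is normally reported together with support.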

In the context of pattern mining as a tool to identify terrorist activity, the National Research Council provides the following definition: "Pattern-based data mining looks for patterns (including anomalous data patterns) that might be associated with terrorist activity — these patterns might be regarded as small signals in a large ocean of noise."[28][29][30] Pattern mining also includes newer areas such as Music Information Retrieval (MIR), where patterns seen in both the temporal and non-temporal domains are imported to classical knowledge discovery search techniques.
Subject-based data mining
"Subject-based data mining" is a data mining technique involving the search for associations between individuals in data.
In the context of combatting terrorism, the National Research Council provides the following definition: "Subject-based data mining uses an initiating individual or other datum that is considered, based on other information, to be of high interest, and the goal is to determine what other persons or financial transactions or movements, etc., are related to that initiating datum."[29]
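
One simple reading of this definition is a graph search that starts at the initiating datum and follows recorded links outwards. The Python sketch below uses a breadth-first search with a hop limit. The link data, the entity names and the function `related_within` are all hypothetical and chosen for illustration; the source does not describe how any real program implements the search.

```python
from collections import deque


def related_within(links, start, max_hops):
    """Breadth-first search: map each entity reachable from `start`
    in at most `max_hops` links to its hop count."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in links.get(node, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    del hops[start]  # report only related entities, not the initiating datum itself
    return hops


# Hypothetical, made-up link data: each key is linked to the listed entities.
links = {
    "person_A": ["account_1", "person_B"],
    "person_B": ["person_A", "person_C"],
    "account_1": ["person_A", "person_D"],
    "person_C": ["person_B"],
    "person_D": ["account_1", "person_E"],
    "person_E": ["person_D"],
}

print(related_within(links, "person_A", 2))
```

The hop limit matters: the number of reachable entities typically grows quickly with each extra hop, so a wide search sweeps in many people with no meaningful connection to the starting point.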

Privacy concerns and ethics

Some people believe that data mining itself is ethically neutral [31]. However, the ways in which data mining can be used can raise questions regarding privacy, legality, and ethics [32]. In particular, data mining government or commercial data sets for national security or law enforcement purposes, such as in the Total Information Awareness Program or in ADVISE, has raised privacy concerns [33][34].
Data mining requires data preparation, which can uncover information or patterns that may compromise confidentiality and privacy obligations.
A common way for this to occur is through data aggregation. Data aggregation is when the data are accrued, possibly from various sources, and put together so that they can be analyzed.[35] This is not data mining per se, but a result of the preparation of data before and for the purposes of the analysis.
The threat to an individual's privacy comes into play when the data, once compiled, enable the data miner, or anyone who has access to the newly compiled data set, to identify specific individuals, especially when the data were originally anonymous.
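
The mechanism behind this threat can be sketched in a few lines of Python: a data set without names is joined to a second, named data set on attributes the two share. Everything here is hypothetical, including the records, the field names and the function `link_records`; it is a toy illustration of aggregation, not a description of any specific incident.

```python
def link_records(anonymous_rows, named_rows, keys):
    """Join two data sets on shared attributes; keep only unambiguous matches."""
    index = {}
    for row in named_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    matches = []
    for row in anonymous_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # exactly one named record fits this combination
            matches.append({**row, "name": candidates[0]["name"]})
    return matches


# Hypothetical survey data released without names
survey = [
    {"zip": "10001", "birth_year": 1970, "sex": "F", "response": "A"},
    {"zip": "10002", "birth_year": 1985, "sex": "M", "response": "B"},
    {"zip": "10003", "birth_year": 1990, "sex": "F", "response": "C"},
]

# Hypothetical second source that carries names and the same three attributes
directory = [
    {"name": "Resident 1", "zip": "10001", "birth_year": 1970, "sex": "F"},
    {"name": "Resident 2", "zip": "10002", "birth_year": 1985, "sex": "M"},
    {"name": "Resident 3", "zip": "10002", "birth_year": 1985, "sex": "M"},
]

print(link_records(survey, directory, ["zip", "birth_year", "sex"]))
```

Only the first survey row is re-identified: the second matches two directory entries and the third matches none. Neither data set reveals a named response on its own; the combination does.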

It is recommended that an individual be made aware of the following before data are collected:
  • the purpose of the data collection and any data mining projects,
  • how the data will be used,
  • who will be able to mine the data and use them,
  • the security surrounding access to the data, and in addition,
  • how collected data can be updated [35].
In the United States, privacy concerns have been somewhat addressed by the U.S. Congress via the passage of regulatory controls such as the Health Insurance Portability and Accountability Act (HIPAA).
HIPAA requires individuals to give their "informed consent" regarding any information that they provide and its intended future uses by the facility receiving that information.
According to an article in Biotech Business Week, “In practice, HIPAA may not offer any greater protection than the longstanding regulations in the research arena, says the AAHC. More importantly, the rule's goal of protection through informed consent is undermined by the complexity of consent forms that are required of patients and participants, which approach a level of incomprehensibility to average individuals.” [36] This underscores the necessity for data anonymity in data aggregation practices.
One may additionally modify the data so that they are anonymous, so that individuals may not be readily identified [35]. However, even de-identified data sets can contain enough information to identify individuals, as occurred when journalists were able to find several individuals based on a set of search histories that were inadvertently released by AOL [37].
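
One common way to modify data towards anonymity is to coarsen identifying attributes and then check how many records share each remaining combination. The Python sketch below does this with a smallest-group check in the style of k-anonymity. That technique is not named in the source and is brought in here only as an illustration; the records, field names and generalisation rules are all hypothetical.

```python
from collections import Counter


def generalise(row):
    """Coarsen identifying attributes: keep a 3-digit ZIP prefix and a birth decade."""
    return {
        "zip": row["zip"][:3] + "**",
        "birth_decade": row["birth_year"] // 10 * 10,
        "response": row["response"],
    }


def smallest_group(rows, keys):
    """Size of the smallest group of rows that share the same values for `keys`."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return min(counts.values()) if counts else 0


# Hypothetical records, invented for this example
rows = [
    {"zip": "10001", "birth_year": 1971, "response": "A"},
    {"zip": "10002", "birth_year": 1975, "response": "B"},
    {"zip": "10003", "birth_year": 1978, "response": "C"},
    {"zip": "10027", "birth_year": 1982, "response": "A"},
    {"zip": "10029", "birth_year": 1986, "response": "B"},
]

print(smallest_group(rows, ["zip", "birth_year"]))       # 1: every record is unique
coarse = [generalise(r) for r in rows]
print(smallest_group(coarse, ["zip", "birth_decade"]))   # 2: each record now shares its group
```

A larger smallest group makes linking harder but does not guarantee anonymity. Free-text fields such as search queries, as in the AOL case, are not protected by this kind of generalisation at all.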

Marketplace surveys

Each year a number of organisations survey the marketplace and produce reports on current data mining marketplace requirements, along with comparisons of the tools and vendors that address them.
Some of these annual reports include:

Groups and Associations

  • SIGKDD, the ACM Special Interest Group on Knowledge Discovery and Data Mining.

See also

Applications

Methods

Miscellaneous

Data mining is about analysing data; for information about extracting information out of data, see:

References

  1. Kantardzic, Mehmed (2003). Data Mining: Concepts, Models, Methods, and Algorithms. John Wiley & Sons. ISBN 0471228524. OCLC 50055336.
  2. The Data Mining Group (DMG). The DMG is an independent, vendor-led group which develops data mining standards, such as the Predictive Model Markup Language (PMML).
  3. PMML Project Page
  4. Alex Guazzelli, Michael Zeller, Wen-Ching Lin, Graham Williams. PMML: An Open Standard for Sharing Models. The R Journal, vol 1/1, May 2009.
  5. Y. Peng, G. Kou, Y. Shi, Z. Chen (2008). "A Descriptive Framework for the Field of Data Mining and Knowledge Discovery". International Journal of Information Technology and Decision Making 7 (4): 639–682. doi:10.1142/S0219622008003204.
  6. Proceedings, International Conferences on Knowledge Discovery and Data Mining, ACM, New York.
  7. SIGKDD Explorations, ACM, New York.
  8. International Conference on Data Mining: 5th (2009); 4th (2008); 3rd (2007); 2nd (2006); 1st (2005)
  9. IEEE International Conference on Data Mining: ICDM09, Miami, FL; ICDM08, Pisa (Italy); ICDM07, Omaha, NE; ICDM06, Hong Kong; ICDM05, Houston, TX; ICDM04, Brighton (UK); ICDM03, Melbourne, FL; ICDM02, Maebashi City (Japan); ICDM01, San Jose, CA.
  10. Fayyad, Usama; Gregory Piatetsky-Shapiro, and Padhraic Smyth (1996). "From Data Mining to Knowledge Discovery in Databases". http://www.kdnuggets.com/gpspubs/aimag-kdd-overview-1996-Fayyad.pdf. Retrieved 2008-12-17.
  11. Ellen Monk, Bret Wagner (2006). Concepts in Enterprise Resource Planning, Second Edition. Thomson Course Technology, Boston, MA. ISBN 0-619-21663-8. OCLC 224465825.
  12. Tony Fountain, Thomas Dietterich & Bill Sudyka (2000) Mining IC Test Data to Optimize VLSI Testing, in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. (pp. 18-25). ACM Press.
  13. Xingquan Zhu, Ian Davidson (2007). Knowledge Discovery and Data Mining: Challenges and Realities. Hershey, New York. p. 18. ISBN 978-159904252-7.
  14. A.J. McGrail, E. Gulski et al. "Data Mining Techniques to Assess the Condition of High Voltage Electrical Plant". CIGRE WG 15.11 of Study Committee 15.
  15. R. Baker. "Is Gaming the System State-or-Trait? Educational Data Mining Through the Multi-Contextual Application of a Validated Behavioral Model". Workshop on Data Mining for User Modeling 2007.
  16. J.F. Superby, J-P. Vandamme, N. Meskens. "Determination of factors influencing the achievement of the first-year university students using data mining methods". Workshop on Educational Data Mining 2006.
  17. Xingquan Zhu, Ian Davidson (2007). Knowledge Discovery and Data Mining: Challenges and Realities. Hershey, New York. pp. 163–189. ISBN 978-159904252-7.
  18. ibid. pp. 31–48.
  19. Yudong Chen, Yi Zhang, Jianming Hu, Xiang Li. "Traffic Data Analysis Using Kernel PCA and Self-Organizing Map". Intelligent Vehicles Symposium, 2006 IEEE.
  20. Bate A, Lindquist M, Edwards IR, Olsson S, Orre R, Lansner A, De Freitas RM. A Bayesian neural network method for adverse drug reaction signal generation. Eur J Clin Pharmacol. 1998 Jun;54(4):315-21.
  21. Norén GN, Bate A, Hopstadius J, Star K, Edwards IR. Temporal Pattern Discovery for Trends and Transient Effects: Its Application to Patient Records. Proceedings of the Fourteenth International Conference on Knowledge Discovery and Data Mining SIGKDD 2008, pages 963-971. Las Vegas NV, 2008.
  22. Healey, R., 1991, Database Management Systems. In Maguire, D., Goodchild, M.F., and Rhind, D., (eds.), Geographic Information Systems: Principles and Applications (London: Longman).
  23. Câmara, A. S. and Raper, J., (eds.), 1999, Spatial Multimedia and Virtual Reality, (London: Taylor and Francis).
  24. Miller, H. and Han, J., (eds.), 2001, Geographic Data Mining and Knowledge Discovery, (London: Taylor & Francis).
  25. Government Accountability Office, Data Mining: Early Attention to Privacy in Developing a Key DHS Program Could Reduce Risks, GAO-07-293, Washington, D.C.: February 2007.
  26. Secure Flight Program report, MSNBC.
  27. "Total/Terrorism Information Awareness (TIA): Is It Truly Dead?". Electronic Frontier Foundation (official website). 2003. http://w2.eff.org/Privacy/TIA/20031003_comments.php. Retrieved 2009-03-15.
  28. R. Agrawal et al., Fast discovery of association rules, in Advances in knowledge discovery and data mining pp. 307-328, MIT Press, 1996.
  29. National Research Council, Protecting Individual Privacy in the Struggle Against Terrorists: A Framework for Program Assessment, Washington, DC: National Academies Press, 2008.
  30. Stephen Haag et al. (2006). Management Information Systems for the information age. Toronto: McGraw-Hill Ryerson. p. 28. ISBN 0-07-095569-7. OCLC 63194770.
  31. William Seltzer. The Promise and Pitfalls of Data Mining: Ethical Issues. http://www.amstat.org/committees/ethics/linksdir/Jsm2005Seltzer.pdf.
  32. Chip Pitts (March 15, 2007). "The End of Illegal Domestic Spying? Don't Count on It". Washington Spectator. http://www.washingtonspectator.com/articles/20070315surveillance_1.cfm.
  33. K.A. Taipale (December 15, 2003). "Data Mining and Domestic Security: Connecting the Dots to Make Sense of Data". Columbia Science and Technology Law Review 5 (2). SSRN 546782 / OCLC 45263753. http://www.stlr.org/cite.cgi?volume=5&article=2.
  34. John Resig, Ankur Teredesai (2004). "A Framework for Mining Instant Messaging Services". In Proceedings of the 2004 SIAM DM Conference. http://citeseer.ist.psu.edu/resig04framework.html.
  35. Think Before You Dig: Privacy Implications of Data Mining & Aggregation, NASCIO Research Brief, September 2004.
  36. Biotech Business Week Editors. (June 30, 2008). BIOMEDICINE; HIPAA Privacy Rule Impedes Biomedical Research. Biotech Business Week. Retrieved 17 Nov 2009 from LexisNexis Academic.
  37. AOL search data identified individuals, SecurityFocus, August 2006.
  38. Gareth Herschel (1 July 2008) Magic Quadrant for Customer Data-Mining Applications, Gartner Inc.
  39. Karl Rexer, Paul Gearan, & Heather Allen (2008) 2008 Data Miner Survey Summary, presented at SPSS Directions Conference, Oct. 2008, and Oracle BIWA Summit, Nov. 2008.

Further reading

  • Bhagat, Phiroz Pattern Recognition in Industry, Elsevier, ISBN 0-08-044538-1.
  • Cabena, Peter, Pablo Hadjinian, Rolf Stadler, Jaap Verhees and Alessandro Zanasi (1997) Discovering Data Mining: From Concept to Implementation, Prentice Hall, ISBN 0137439806.
  • Dummer, Stephen W., False Positives And Secure Flight Using Dataveillance When Viewed Through The Ever Increasing Likelihood Of Identity Theft, 11 J. of Tech. Law & Pol’y 259 (2006).
  • Dummer, Stephen W., Comment: Secure Flight and Dataveillance, A New Type Of Civil Liberties Erosion: Stripping Your Rights When You Don’t Even Know It, 75 MISS. L.J. 583 (2005).
  • Feldman, Ronen and James Sanger The Text Mining Handbook, Cambridge University Press, ISBN 9780521836579.
  • Guo, Yike and Robert Grossman, editors (1999) High Performance Data Mining: Scaling Algorithms, Applications and Systems, Kluwer Academic Publishers.
  • Hastie, Trevor, Robert Tibshirani and Jerome Friedman (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, ISBN 0387952845.
  • Hornick, Mark F., Erik Marcade and Sunil Venkayala Java Data Mining: Strategy, Standard, and Practice: A Practical Guide for Architecture, Design, and Implementation (paperback).
  • Bing Liu (2007). Web Data Mining: Exploring Hyperlinks, Contents and Usage Data. Springer, ISBN 3540378812.
  • Mierswa, Ingo, Michael Wurst, Ralf Klinkenberg, Martin Scholz and Timm Euler (2006) YALE: Rapid Prototyping for Complex Data Mining Tasks, in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-06).
  • Mucherino, Antonio, Petraq Papajorgji and Panos Pardalos (2009) Data Mining in Agriculture, Springer.
  • Nisbet, Robert, John Elder and Gary Miner (2009) Handbook of Statistical Analysis & Data Mining Applications, Academic Press/Elsevier, ISBN 9780123747655.
  • Poncelet, Pascal, Florent Masseglia and Maguelonne Teisseire, editors (October 2007) Data Mining Patterns: New Methods and Applications, Information Science Reference, ISBN 978-1599041629.
  • Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Introduction to Data Mining (2005), ISBN 0-321-32136-7
  • Sergios Theodoridis, Konstantinos Koutroumbas (2009) Pattern Recognition, 4th Edition, Academic Press, ISBN 978-1-59749-272-0.
  • Wang, X.Z.; Medasani, S.; Marhoon, F; Al-Bazzaz, H. (2004) Multidimensional visualisation of principal component scores for process historical data analysis, Industrial & Engineering Chemistry Research, 43(22), pp. 7036–7048.
  • Wang, X.Z. (1999) Data mining and knowledge discovery for process monitoring and control. Springer, London.
  • Weiss and Indurkhya Predictive Data Mining, Morgan Kaufmann.
  • Witten, Ian and Eibe Frank (2000) Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, ISBN 1-55860-552-5. (See also Free Weka software.)

Simple English

Data mining is a term from computer science. Sometimes it is also called knowledge discovery in databases (KDD). Data mining is about finding new information in a lot of data. The information obtained from data mining is hopefully both new and useful.

In many cases, data is stored so it can be used later. The data is saved with a goal. For example, a store wants to save a record of what has been bought. The store does this to know how much stock it should buy, so that it has enough to sell later. Saving this information makes a lot of data. The data is usually saved in a database. The reason why the data is saved is called the first use.

Later, the same data can also be used to get other information that was not needed for the first use. The store might now want to know what kinds of things people buy together. (For example, someone who buys pasta often also buys mushrooms.) That kind of information is in the data, and is useful, but it was not the reason why the data was saved. This information is new and can be useful. It is a second use for the same data.
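
The store example can be sketched in a few lines of Python. This is only a small illustration, not a real data mining tool: the shopping baskets and the function name `pair_counts` are made up for the example. It counts how often two things are bought together.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; the items are made up for illustration.
baskets = [
    {"pasta", "mushrooms", "cheese"},
    {"pasta", "mushrooms"},
    {"bread", "cheese"},
    {"pasta", "tomatoes", "mushrooms"},
    {"bread", "milk"},
]

def pair_counts(baskets):
    """Count how many baskets contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so each pair is always written in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
most_common_pair, times = counts.most_common(1)[0]
print(most_common_pair, times)
```

With these made-up baskets, the pair bought together most often is pasta and mushrooms. A real store would have far more data and would need faster methods.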

Finding new and useful information in data is called data mining.

Different kinds of data mining

There are a lot of different kinds of data mining for getting new information from data. Usually, prediction is involved. There is uncertainty in the predicted results. The examples below are based on the observation of a small green apple. Some of the kinds of data mining are:

  • Pattern recognition (Trying to find similarities between the rows in the database, in the form of rules. For example, small -> green: small apples are often green.)
  • Using a Bayesian network (Trying to make a model that says how the different data attributes are connected and influence each other. The size and the colour are related, so if you know something about the size, you can guess the colour.)
  • Using a neural network (Trying to make a model that works like a brain. The model is hard to understand, but if we tell the computer that the apple is green, it can tell us that the apple has a higher chance of being sour. This is a black box model: we do not know how it works, but it works.)
  • Using a classification tree (Using everything else we know about a thing to say what one other attribute of it will be. Here is an apple with a size, a colour and a shininess: what will it taste like?)
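
A rule such as small -> green can be checked against data by counting. The sketch below is a simplified illustration with invented apple records. The function name `rule_confidence` is an assumption made for this example and does not come from any library. It works out how often the "then" part of a rule is true when the "if" part is true.

```python
# Hypothetical apple records; the values are invented for illustration.
apples = [
    {"size": "small", "colour": "green", "taste": "sour"},
    {"size": "small", "colour": "green", "taste": "sour"},
    {"size": "small", "colour": "red", "taste": "sweet"},
    {"size": "large", "colour": "red", "taste": "sweet"},
    {"size": "large", "colour": "green", "taste": "sour"},
    {"size": "large", "colour": "red", "taste": "sweet"},
]

def rule_confidence(rows, if_attr, if_value, then_attr, then_value):
    """Share of rows matching the 'if' part that also match the 'then' part."""
    matching = [r for r in rows if r[if_attr] == if_value]
    if not matching:
        return 0.0  # the rule never applies, so there is nothing to support it
    hits = sum(1 for r in matching if r[then_attr] == then_value)
    return hits / len(matching)

small_green = rule_confidence(apples, "size", "small", "colour", "green")
green_sour = rule_confidence(apples, "colour", "green", "taste", "sour")
print(small_green, green_sour)
```

A value close to 1 means the rule is almost always right in this data. A value below 1 shows the uncertainty mentioned above: in these records, small apples are often green, but not always.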
