Monday, November 30, 2009
Market Basket Analysis
What is it?
Market Basket Analysis is a modelling technique based upon the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items. For example, if you are in an English pub and you buy a pint of beer but don't buy a bar meal, you are more likely to buy crisps (US: chips) at the same time than somebody who didn't buy beer.
The set of items a customer buys is referred to as an itemset, and market basket analysis seeks to find relationships between purchases.
Typically the relationship will be in the form of a rule:
IF {beer, no bar meal} THEN {crisps}.
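The probability that a customer will buy beer without a bar meal (i.e. that the antecedent is true) is referred to as the support for the rule. The conditional probability that a customer will purchase crisps, given that the antecedent is true, is referred to as the confidence. (Many texts instead define the support of a rule as the probability that the antecedent and the consequent occur together.)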
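As a rough illustration of these two measures, here is a minimal Python sketch that counts them over a handful of baskets. The baskets, the function name `rule_stats`, and its parameters are all made up for this example; they do not come from AlbionResearch. The sketch reports support as defined above (the probability that the antecedent is true) and also the joint support (antecedent and consequent together) that many other texts use.

```python
# Hypothetical pub transactions; each set is one customer's basket.
baskets = [
    {"beer", "crisps"},
    {"beer", "crisps"},
    {"beer", "bar meal"},
    {"beer"},
    {"wine", "crisps"},
    {"wine", "bar meal"},
    {"cola"},
    {"beer", "crisps", "bar meal"},
]


def rule_stats(baskets, include, exclude, consequent):
    """Return (antecedent support, joint support, confidence) for the rule
    IF {include, no exclude} THEN {consequent}."""
    n = len(baskets)
    # Baskets where the antecedent is true: every included item is present
    # and no excluded item is present.
    matching = [b for b in baskets if include <= b and not (exclude & b)]
    # Of those, the baskets that also contain the consequent.
    both = [b for b in matching if consequent <= b]
    antecedent_support = len(matching) / n
    joint_support = len(both) / n
    confidence = len(both) / len(matching) if matching else 0.0
    return antecedent_support, joint_support, confidence


support, joint, confidence = rule_stats(
    baskets, include={"beer"}, exclude={"bar meal"}, consequent={"crisps"}
)
print(f"support (antecedent): {support:.3f}")  # 3 of 8 baskets -> 0.375
print(f"support (joint):      {joint:.3f}")    # 2 of 8 baskets -> 0.250
print(f"confidence:           {confidence:.3f}")  # 2 of 3 -> 0.667
```

With real data the baskets would come from till receipts or a transactions table, and a dedicated association-rule tool would search over many candidate rules rather than score a single one.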
Other Application Areas
Although Market Basket Analysis conjures up pictures of shopping carts and supermarket shoppers, it is important to realize that there are many other areas in which it can be applied. These include:
Analysis of credit card purchases.
Analysis of telephone calling patterns.
Identification of fraudulent medical insurance claims (consider cases where common rules are broken).
Analysis of telecom service purchases.
(AlbionResearch)
Thursday, November 19, 2009
What is Business Intelligence?
Business Intelligence is the process of gathering meaningful information at any time to answer questions and identify significant trends or patterns, giving key stakeholders the ability to make better decisions.
(Aexis)
Monday, November 2, 2009
Learning Data Mining Easily Using the iDA Excel Add-in
One of the easiest ways to learn data mining is to use the free trial of iDA (Intelligent Data Analyzer).
iDA requires Java. Your computer most likely already has it; if not, please install Java before installing iDA. You can get Java for free from http://java.com/en/
You can get iDA software from:
a. InfoAcumen, i.e., the maker of iDA:
http://www.infoacumen.com/ (click "free download" on the left side of your screen). I strongly recommend this web site.
You can also get the tutorial in PDF from that web site.
The PDF file is actually chapter 4 of the following book:
“Data Mining: A Tutorial Based Primer” by Richard Roiger and Michael Geatz. Publisher: Addison Wesley/Pearson, ISBN-10: 0201741288; ISBN-13: 978-0201741285
b. Richard Roiger's website.
http://krypton.mnsu.edu/~roiger/supp.htm (click IDA download).
You can also get PowerPoint slides, additional software, etc. from that web site.
After installing iDA, you can find the data in C:\iDA\Samples.
Unfortunately, iDA only works with Windows XP and Windows 7.
I do not know how to make iDA work with Windows Vista, even when I use Vista's compatibility function.
Monday, October 26, 2009
What is Data Mining and Text Mining?
From raw data to smarter business decisions
Every organization accumulates huge volumes of data from a variety of sources on a daily basis. Data mining is an iterative process of creating predictive and descriptive models, by uncovering previously unknown trends and patterns in vast amounts of data from across the enterprise, in order to support decision making. Text mining applies the same analysis techniques to text-based documents. The knowledge gleaned from data and text mining can be used to fuel strategic decision making.
(SAS)
Thursday, October 22, 2009
Book: Table of Contents - Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining
Preface.
1. Introduction.
1.1 Overview.
1.2 Problem definition.
1.3 Data preparation.
1.4 Implementation of the analysis.
1.5 Deployment of the results.
1.6 Book outline.
1.7 Summary.
1.8 Further reading.
2. Definition.
2.1 Overview.
2.2 Objectives.
2.3 Deliverables.
2.4 Roles and responsibilities.
2.5 Project plan.
2.6 Case study.
2.6.1 Overview.
2.6.2 Problem.
2.6.3 Deliverables.
2.6.4 Roles and responsibilities.
2.6.5 Current situation.
2.6.6 Timetable and budget.
2.6.7 Cost/benefit analysis.
2.7 Summary.
2.8 Further reading.
3. Preparation.
3.1 Overview.
3.2 Data sources.
3.3 Data understanding.
3.3.1 Data tables.
3.3.2 Continuous and discrete variables.
3.3.3 Scales of measurement.
3.3.4 Roles in analysis.
3.3.5 Frequency distribution.
3.4 Data preparation.
3.4.1 Overview.
3.4.2 Cleaning the data.
3.4.3 Removing variables.
3.4.4 Data transformations.
3.4.5 Segmentation.
3.5 Summary.
3.6 Exercises.
3.7 Further reading.
4. Tables and graphs.
4.1 Introduction.
4.2 Tables.
4.2.1 Data tables.
4.2.2 Contingency tables.
4.2.3 Summary tables.
4.3 Graphs.
4.3.1 Overview.
4.3.2 Frequency polygrams and histograms.
4.3.3 Scatterplots.
4.3.4 Box plots.
4.3.5 Multiple graphs.
4.4 Summary.
4.5 Exercises.
4.6 Further reading.
5. Statistics.
5.1 Overview.
5.2 Descriptive statistics.
5.2.1 Overview.
5.2.2 Central tendency.
5.2.3 Variation.
5.2.4 Shape.
5.2.5 Example.
5.3 Inferential statistics.
5.3.1 Overview.
5.3.2 Confidence intervals.
5.3.3 Hypothesis tests.
5.3.4 Chi-square.
5.3.5 One-way analysis of variance.
5.4 Comparative statistics.
5.4.1 Overview.
5.4.2 Visualizing relationships.
5.4.3 Correlation coefficient (r).
5.4.4 Correlation analysis for more than two variables.
5.5 Summary.
5.6 Exercises.
5.7 Further reading.
6. Grouping.
6.1 Introduction.
6.1.1 Overview.
6.1.2 Grouping by values or ranges.
6.1.3 Similarity measures.
6.1.4 Grouping approaches.
6.2 Clustering.
6.2.1 Overview.
6.2.2 Hierarchical agglomerative clustering.
6.2.3 K-means clustering.
6.3 Associative rules.
6.3.1 Overview.
6.3.2 Grouping by value combinations.
6.3.3 Extracting rules from groups.
6.3.4 Example.
6.4 Decision trees.
6.4.1 Overview.
6.4.2 Tree generation.
6.4.3 Splitting criteria.
6.4.4 Example.
6.5 Summary.
6.6 Exercises.
6.7 Further reading.
7. Prediction.
7.1 Introduction.
7.1.1 Overview.
7.1.2 Classification.
7.1.3 Regression.
7.1.4 Building a prediction model.
7.1.5 Applying a prediction model.
7.2 Simple regression models.
7.2.1 Overview.
7.2.2 Simple linear regression.
7.2.3 Simple nonlinear regression.
7.3 K-nearest neighbors.
7.3.1 Overview.
7.3.2 Learning.
7.3.3 Prediction.
7.4 Classification and regression trees.
7.4.1 Overview.
7.4.2 Predicting using decision trees.
7.4.3 Example.
7.5 Neural networks.
7.5.1 Overview.
7.5.2 Neural network layers.
7.5.3 Node calculations.
7.5.4 Neural network predictions.
7.5.5 Learning process.
7.5.6 Backpropagation.
7.5.7 Using neural networks.
7.5.8 Example.
7.6 Other methods.
7.7 Summary.
7.8 Exercises.
7.9 Further reading.
8. Deployment.
8.1 Overview.
8.2 Deliverables.
8.3 Activities.
8.4 Deployment scenarios.
8.5 Summary.
8.6 Further reading.
9. Conclusions.
9.1 Summary of process.
9.2 Example.
9.2.1 Problem overview.
9.2.2 Problem definition.
9.2.3 Data preparation.
9.2.4 Implementation of the analysis.
9.2.5 Deployment of the results.
9.3 Advanced data mining.
9.3.1 Overview.
9.3.2 Text data mining.
9.3.3 Time series data mining.
9.3.4 Sequence data mining.
9.4 Further reading.
Appendix A Statistical tables.
A.1 Normal distribution.
A.2 Student’s t-distribution.
A.3 Chi-square distribution.
A.4 F-distribution.
Appendix B Answers to exercises.
Glossary.
Bibliography.
Index
Book: Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining
Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining
Glenn J. Myatt
ISBN: 978-0-470-07471-8
Paperback
292 pages
November 2006
========
A practical, step-by-step approach to making sense out of data
Making Sense of Data educates readers on the steps and issues that need to be considered in order to successfully complete a data analysis or data mining project. The author provides clear explanations that guide the reader to make timely and accurate decisions from data in almost every field of study. A step-by-step approach aids professionals in carefully analyzing data and implementing results, leading to the development of smarter business decisions. With a comprehensive collection of methods from both data analysis and data mining disciplines, this book successfully describes the issues that need to be considered, the steps that need to be taken, and appropriately treats technical topics to accomplish effective decision making from data.
Readers are given a solid foundation in the procedures associated with complex data analysis or data mining projects and are provided with concrete discussions of the most universal tasks and technical solutions related to the analysis of data, including:
* Problem definitions
* Data preparation
* Data visualization
* Data mining
* Statistics
* Grouping methods
* Predictive modeling
* Deployment issues and applications
Throughout the book, the author examines why these multiple approaches are needed and how these methods will solve different problems. Processes, along with methods, are carefully and meticulously outlined for use in any data analysis or data mining project.
From summarizing and interpreting data, to identifying non-trivial facts, patterns, and relationships in the data, to making predictions from the data, Making Sense of Data addresses the many issues that need to be considered as well as the steps that need to be taken to master data analysis and mining.
Success in Enterprise Information Management - Seven Causes
by Anne Marie Smith, Ph. D.
Many enterprise information management (EIM) or data management projects don't live up to their potential. EIM technologies (data dictionaries, meta data management products, data modeling, data warehousing and business intelligence, data quality) have been around for a long time. Enterprise Data Management is a mature field, even if it has been called by different names, and the field is founded on strong principles. The approaches are well-structured, cover a wide variety of situations and have worked well for many organizations. Additionally, project management processes, tools and technologies are mature and well established. So the question arises: why do data management projects and programs fail?
www.eiminstitute.org/
Managing Unstructured Data
by Larissa Moss
According to a 2003 study by the University of California at Berkeley, about 5 exabytes (an exabyte is roughly the equivalent of 1,000 petabytes, 1 million terabytes, or 1 billion gigabytes) of unique analog and digital information were produced worldwide in 2002, twice the amount produced in 1999. That's a data explosion equivalent to half a million new libraries the size of the print collection of the Library of Congress, and this number will continue to expand exponentially. Although we haven't seen any further studies, today, in 2009, after the massive adoption of social networks such as Facebook, YouTube, MySpace and Twitter, this number must be incredible! IBM estimates that about 85 percent of all data is unstructured and about 50 percent of the unstructured data is duplicated. Therefore, any discussion about a data strategy is incomplete without formulating a tactic for maintaining unstructured data.
www.eiminstitute.org/
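The figures quoted in the abstract above can be checked with a few lines of arithmetic. The Python sketch below assumes decimal (SI) units, which matches the "1,000 petabytes" wording; the variable names are made up for illustration, and the per-library size is only what the quoted numbers imply, not a figure stated in the abstract.

```python
# Decimal (SI) units, assumed from the "1,000 petabytes" wording.
GB = 10**9
TB = 10**12
PB = 10**15
EB = 10**18

# One exabyte expressed in the smaller units quoted in the abstract.
print(EB // PB)  # 1000 petabytes
print(EB // TB)  # 1000000 terabytes
print(EB // GB)  # 1000000000 gigabytes

# About 5 exabytes produced in 2002, twice the 1999 amount.
produced_2002 = 5 * EB
produced_1999_eb = produced_2002 / 2 / EB  # implied: 2.5 exabytes

# "Half a million new libraries" implies this size per library.
libraries = 500_000
implied_library_size_tb = produced_2002 // libraries // TB  # 10 terabytes

# 85% of all data is unstructured, and 50% of that is duplicated, so the
# duplicated unstructured data is this share of all data.
unstructured_share = 0.85
duplicated_share_of_unstructured = 0.50
duplicated_unstructured_share = unstructured_share * duplicated_share_of_unstructured

print(produced_1999_eb, implied_library_size_tb)
print(f"{duplicated_unstructured_share:.1%}")  # 42.5%
```

Taken at face value, the IBM estimates mean that roughly two fifths of all data is duplicated unstructured data.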
Meta Data Silos- Part I
by David Marco
Meta data management and its use in enterprise information management has become one of the critical information technology (IT) focuses for both Global 2000 corporations and large government agencies. As these entities look to reduce their IT portfolio and control their escalating IT costs, they are turning to the technical functionality that a managed meta data environment (MME) can provide them. The organizations that have built well-architected enterprise-wide MMEs have achieved a tremendous amount of success. Unfortunately, as with most popular IT trends, companies are making key mistakes in building and moving forward on their meta data management investments.
www.eiminstitute.org/
Wednesday, October 21, 2009
SAS Data Mining and Text Mining
Data Mining and Text Mining
From raw data to smarter business decisions
Every organization accumulates huge volumes of data from a variety of sources on a daily basis. Data mining is an iterative process of creating predictive and descriptive models, by uncovering previously unknown trends and patterns in vast amounts of data from across the enterprise, in order to support decision making. Text mining applies the same analysis techniques to text-based documents. The knowledge gleaned from data and text mining can be used to fuel strategic decision making.
"SAS provides a solid and efficient linkage between our quantitative expertise and our clinical expertise. It allows us to take the data and to quickly and efficiently produce models that can be implemented to allow our clinicians to be with the right person at the right time."
—Adam Hobgood
Director of Statistics, Center for Health Research, Healthways
Components of Data Mining and Text Mining
Data mining – Go from raw data to accurate, business-driven analytical models with a seamless, efficient process.
Scoring acceleration – Maximize the performance and accuracy of your analytic models.
Text mining – Uncover key text-based information in large document collections, and integrate that information with your structured data.
For customers who are interested in in-database analytics with a Teradata data warehouse, be sure to learn about SAS Analytic Advantage for Teradata.
How SAS® Is Different
Only SAS offers a rich suite of integrated data mining tools that provide:
Unprecedented ease of use.
Ability to explore and exploit corporate data for strategic business advantage – all in a single environment.
Reduced time to decisions and a more accurate organizational view.
Services and training to help organizations get started right away.