Visual data mining : techniques and tools for data visualization and mining
Personal Author:
Soukup, Tom, 1962-
Publication Information:
New York : Wiley Pub., [2002]

Physical Description:
xxxi, 382 pages, 8 unnumbered pages of plates : illustrations (some color), maps (some color) ; 24 cm

Call Number: QA76.9.D343 S68 2002
Material Type: Adult Non-Fiction
Home Location: Central Closed Stacks


Marketing analysts use data mining techniques to gain a reliable understanding of customer buying habits, then use that information to develop new marketing campaigns and products. Visual mining tools open these techniques to a much broader, non-technical audience and help that audience solve common business problems.

The book explains how to select the appropriate data sets for analysis, transform the data sets into usable formats, and verify that the sets are error-free. It reviews how to choose the right model for the specific type of analysis project, how to analyze the model, and how to present the results for decision making. It shows how to solve numerous business problems by applying various tools and techniques. A companion Web site offers links to data visualization and visual data mining tools, along with real-world success stories that use visual data mining.

Author Notes

TOM SOUKUP has more than fifteen years of experience in data management and analysis. He is currently with Konami Gaming, Inc., where he is involved in data mining and data warehousing projects for the gaming industry.
IAN DAVIDSON, PhD, has worked on commercial data mining applications, including insurance claim fraud detection, product cross-sell, customer retention, and credit card fraud detection. He is currently an Assistant Professor of Computer Science at the State University of New York, Albany.

Table of Contents

Acknowledgments
About the Authors
Trademarks
Introduction
Part 1 Introduction and Project Planning Phase
Chapter 1 Introduction to Data Visualization and Visual Data Mining
Visualization Data Sets
Visualization Data Types
Visual versus Data Dimensions
Data Visualization Tools
Multidimensional Data Visualization Tools
Column and Bar Graphs
Distribution and Histogram Graphs
Box Graphs
Line Graphs
Scatter Graphs
Pie Graphs
Hierarchical and Landscape Data Visualization Tools
Tree Visualizations
Map Visualizations
Visual Data Mining Tools
Summary
Chapter 2 Step 1: Justifying and Planning the Data Visualization and Data Mining Project
Classes of Projects
Project Justifications
Dayton Hudson Corp. Success Story
Marketing Dynamics Success Story
Sprint Success Story
Success Story
Challenges to Visual Data Mining
Data Visualization, Analysis, and Statistics are Meaningless
Why Are the Predictions Not 100 Percent Accurate?
Our Data Can't Be Visualized or Mined
Closed-Loop Business Model
Using the Closed-Loop Business Model
Project Timeline
Project Resources and Roles
Data and Business Analyst Team
Domain Expert Team
Decision Maker Team
Operations Team
Data Warehousing Team
Project Justification and Plan for the Case Study
Summary
Chapter 3 Step 2: Identifying the Top Business Questions
Choosing the Top Business Questions
Problems Data Mining Does Not Address
Data Visualization Problem Definitions
Multidimensional or Comparative Visualization Problem Definitions
Geographic or Spatial Data Visualization Problem Definitions
Visual Data Mining Problem Definitions
Classification Data Mining Problem Definitions
Estimation Data Mining Problem Definitions
Association Grouping Data Mining Problem Definitions
Clustering and Segmentation Data Mining Problem Definitions
Prediction Data Mining Problem Definitions
Which Data Mining Techniques Can Address a Business Issue?
Mapping the ROI Targets
Determining the Visualization and Data Mining Analysis Goals and Success Criteria
Problem and Objective Definitions for the Case Study
Summary
Part 2 Data Preparation Phase
Chapter 4 Step 3: Choosing the Business Data Set
Identifying the Operational Data
Exploratory Data Mart
Business Data Sets
Data Types
Experimental Unit
Surveying Discrete and Continuous Columns with Visualizations
Selecting Columns from the Operational Data Sources
Encoded Data Dimensions
Data Dimension Consistency
Business Rule Consistency
Unique Columns
Duplicate Columns
Correlated Columns
Insignificant Columns
Developing and Documenting the ECTL Procedures
Data Cleaning
Techniques for Handling Data Noise, NULLs, and Missing Values
Handling NULLs
Sampling the Operational Data Sources
Avoiding Biased Sampling
Available ECTL Tools
Documenting the ECTL Procedures
Choosing the Business Data Set for the Case Study
Identifying the Operational Data Sources
ECTL Processing of the Customer File
Documenting ECTL Procedure for the Customer File
ECTL Processing of the Contract File
Documenting ECTL Procedure for the Contract File
ECTL Processing of the Invoice File
Documenting ECTL Procedure for the Invoice File
ECTL Processing of the Demographic File
Documenting ECTL Procedure for the Demographic File
Creating the Production Business Data Set
Review of the ECTL Procedures for the Case Study
Summary
Chapter 5 Step 4: Transforming the Business Data Set
Types of Logical Transformations
Table-Level Logical Transformations
Transforming Weighted Data Sets
Transforming Column Weights
Transforming Record Weights
Transforming Time Series Data Sets
Aggregating the Data Sets
Filtering Data Sets
Column-Level Logical Transformations
Simple Column Transformations
Column Grouping Transformations
Documenting the Logical Transformations
Logically Transforming the Business Data Set for the Customer Retention VDM Case Study
Logically Transforming the customer_join Business Data Set
Documenting the Logical Transformations for the Business Data Set customer_join
Logically Transforming the customer_demographic Business Data Set
Documenting the Logical Transformations for the Business Data Set customer_demographic
Review of the Logical Transformation Procedures for the Case Study
Summary
Chapter 6 Step 5: Verify the Business Data Set
Verification Process
Verifying the Integrity of the Data Preparation Operations
Discrete Column Verification Techniques
Continuous Column Verification Techniques
Verifying Common ECTL Data Preparation Operations
Verifying the Logic of the Data Preparation Operations
Verifying Common Logical Transformation Operations
Data Profiling Tools
Verifying the Data Set for the Case Study
Verifying the ECTL Procedures
Verifying the ECTL Data Preparation Step for the Customer Table
Verifying the ECTL Data Preparation Step for the Contract Table
Verifying the ECTL Data Preparation Step for the Invoice Table
Verifying the ECTL Data Preparation Step for the Demographic Table
Verifying the Logical Transformations
Summary
Part 3 Data Analysis Phase and Beyond
Chapter 7 Step 6: Choosing the Visualization or Visual Mining Tool
Choosing the Right Data Visualization Tool
Multidimensional Visualizations
Column and Bar Graphs
Area, Line, High-Low-Close, and Radar Graphs
Histogram, Distribution, Pie, and Doughnut Graphs
Scatter Graphs
Specialized Landscape and Hierarchical Visualizations
Map Graphs
Tree Graphs
Choosing the Right Data Mining Tool
Which Subset of the Available Tools Is Applicable?
Business Questions to Address
How Is the Model to Be Used?
Supervised and Unsupervised Learning
Supervised Learning Tools
Decision Trees and Rule Set Models
Neural Network Models for Classification
Linear Regression Models
Logistic Regression
Unsupervised Learning Tools
Association Rules
K-Means and Clustering
Kohonen Self-Organizing Maps
Tools to Solve Typical Problems
Which of the Applicable Tools Are Best for My Situation?
How the Different Techniques Handle Data Types
Choosing the Visualization or Mining Tool for the Case Study
Choosing the Data Visualization Tools
Choosing the Data Mining Tools
Tuning the Data Mining Tool Selection
Summary
Chapter 8 Step 7: Analyzing the Visualization or Mining Tool
Analyzing the Data Visualizations
Using Frequency Graphs to Discover and Evaluate Key Business Indicators
Using Pareto Graphs to Discover and Evaluate the Importance of Key Business Indicators
Using Radar Graphs to Spot Seasonal Trends and Problem Areas
Using Line Graphs to Analyze Time Relationships
Using Scatter Graphs to Evaluate Cause-and-Effect Relationships
Analyzing the Data Mining Models
Visualizations to Understand the Performance of the Core Data Mining Tasks
Classification
Estimation
Association Grouping
Clustering and Segmenting
Using Visualization to Understand and Evaluate Supervised Learning Models
Decision Trees
Neural Networks
Uses of Visualizations after Model Deployment
Analyzing the Visualization or Mining Tools for the Case Study
Using Frequency Graphs with Trend Lines to Analyze Time Relationships
Using Pareto Graphs to Discover and Evaluate the Importance of Key Business Indicators
Using Scatter Graphs to Evaluate Cause-and-Effect Relationships
Using Data Mining Tools to Gain an Insight into Churn
Profiling the Ones That Got Away
Trying to Predict the Defectors
Explaining Why People Leave
Predicting When People Will Leave
Summary
Chapter 9 Step 8: Verifying and Presenting the Visualizations or Mining Models
Verifying the Data Visualizations and Mining Models
Verifying Logical Transformations to the Business Data Set
Verifying Your Business Assumptions
Organizing and Creating the Business Presentation
Parts of the Business Presentation
Description of the VDM Project Goals
Highlights of the Discoveries and Data Mining Models
Call to Action
VDM Project Implementation Phase
Create Action Plan
Approve Action Plan
Implement Action Plan
Measure the Results
Verifying and Presenting the Analysis for the Case Study
Verifying Logical Transformations to the Business Data Set
Verifying the Business Assumptions
The Business Presentation
Customer Retention Project Goals and Objectives
Highlights of the Discoveries
Call to Action
Summary
Chapter 10 The Future of Visual Data Mining
The Project Planning Phase
The Data Preparation Phase
The Data Analysis Phase
Trends in Commercial Visual Data Mining Software
More Chart Types and User-Defined Layouts
Dynamic Visualizations That Allow User Interaction
Size and Complexity of Data Structures Visualized
Standards That Allow Exchanges between Tools
Summary
Glossary
References
Index