Book Information
Mining of Massive Datasets
- Authors: Anand Rajaraman; Jeffrey D. Ullman
- Publisher: Cambridge University Press
- ISBN: 1107015357
- Publication year: 2012
- Listed page count: 316 pages
Table of Contents
1 Data Mining
1.1 What is Data Mining?
1.2 Statistical Limits on Data Mining
1.3 Things Useful to Know
1.4 Outline of the Book
1.5 Summary of Chapter 1
1.6 References for Chapter 1
2 Large-Scale File Systems and Map-Reduce
2.1 Distributed File Systems
2.2 Map-Reduce
2.3 Algorithms Using Map-Reduce
2.4 Extensions to Map-Reduce
2.5 Efficiency of Cluster-Computing Algorithms
2.6 Summary of Chapter 2
2.7 References for Chapter 2
3 Finding Similar Items
3.1 Applications of Near-Neighbor Search
3.2 Shingling of Documents
3.3 Similarity-Preserving Summaries of Sets
3.4 Locality-Sensitive Hashing for Documents
3.5 Distance Measures
3.6 The Theory of Locality-Sensitive Functions
3.7 LSH Families for Other Distance Measures
3.8 Applications of Locality-Sensitive Hashing
3.9 Methods for High Degrees of Similarity
3.10 Summary of Chapter 3
3.11 References for Chapter 3
4 Mining Data Streams
4.1 The Stream Data Model
4.2 Sampling Data in a Stream
4.3 Filtering Streams
4.4 Counting Distinct Elements in a Stream
4.5 Estimating Moments
4.6 Counting Ones in a Window
4.7 Decaying Windows
4.8 Summary of Chapter 4
4.9 References for Chapter 4
5 Link Analysis
5.1 PageRank
5.2 Efficient Computation of PageRank
5.3 Topic-Sensitive PageRank
5.4 Link Spam
5.5 Hubs and Authorities
5.6 Summary of Chapter 5
5.7 References for Chapter 5
6 Frequent Itemsets
6.1 The Market-Basket Model
6.2 Market Baskets and the A-Priori Algorithm
6.3 Handling Larger Datasets in Main Memory
6.4 Limited-Pass Algorithms
6.5 Counting Frequent Items in a Stream
6.6 Summary of Chapter 6
6.7 References for Chapter 6
7 Clustering
7.1 Introduction to Clustering Techniques
7.2 Hierarchical Clustering
7.3 K-means Algorithms
7.4 The CURE Algorithm
7.5 Clustering in Non-Euclidean Spaces
7.6 Clustering for Streams and Parallelism
7.7 Summary of Chapter 7
7.8 References for Chapter 7
8 Advertising on the Web
8.1 Issues in On-Line Advertising
8.2 On-Line Algorithms
8.3 The Matching Problem
8.4 The Adwords Problem
8.5 Adwords Implementation
8.6 Summary of Chapter 8
8.7 References for Chapter 8
9 Recommendation Systems
9.1 A Model for Recommendation Systems
9.2 Content-Based Recommendations
9.3 Collaborative Filtering
9.4 Dimensionality Reduction
9.5 The NetFlix Challenge
9.6 Summary of Chapter 9
9.7 References for Chapter 9
Index