Book Information

Mining of Massive Datasets

Authors: Anand Rajaraman; Jeffrey D. Ullman
Publisher: Cambridge University Press
ISBN: 1107015357
Publication year: 2012
Listed page count: 316 pages
File size: 39 MB
File page count: 328 pages

Table of Contents

1 Data Mining

1.1 What is Data Mining?

1.2 Statistical Limits on Data Mining

1.3 Things Useful to Know

1.4 Outline of the Book

1.5 Summary of Chapter 1

1.6 References for Chapter 1

2 Large-Scale File Systems and Map-Reduce

2.1 Distributed File Systems

2.2 Map-Reduce

2.3 Algorithms Using Map-Reduce

2.4 Extensions to Map-Reduce

2.5 Efficiency of Cluster-Computing Algorithms

2.6 Summary of Chapter 2

2.7 References for Chapter 2

3 Finding Similar Items

3.1 Applications of Near-Neighbor Search

3.2 Shingling of Documents

3.3 Similarity-Preserving Summaries of Sets

3.4 Locality-Sensitive Hashing for Documents

3.5 Distance Measures

3.6 The Theory of Locality-Sensitive Functions

3.7 LSH Families for Other Distance Measures

3.8 Applications of Locality-Sensitive Hashing

3.9 Methods for High Degrees of Similarity

3.10 Summary of Chapter 3

3.11 References for Chapter 3

4 Mining Data Streams

4.1 The Stream Data Model

4.2 Sampling Data in a Stream

4.3 Filtering Streams

4.4 Counting Distinct Elements in a Stream

4.5 Estimating Moments

4.6 Counting Ones in a Window

4.7 Decaying Windows

4.8 Summary of Chapter 4

4.9 References for Chapter 4

5 Link Analysis

5.1 PageRank

5.2 Efficient Computation of PageRank

5.3 Topic-Sensitive PageRank

5.4 Link Spam

5.5 Hubs and Authorities

5.6 Summary of Chapter 5

5.7 References for Chapter 5

6 Frequent Itemsets

6.1 The Market-Basket Model

6.2 Market Baskets and the A-Priori Algorithm

6.3 Handling Larger Datasets in Main Memory

6.4 Limited-Pass Algorithms

6.5 Counting Frequent Items in a Stream

6.6 Summary of Chapter 6

6.7 References for Chapter 6

7 Clustering

7.1 Introduction to Clustering Techniques

7.2 Hierarchical Clustering

7.3 K-means Algorithms

7.4 The CURE Algorithm

7.5 Clustering in Non-Euclidean Spaces

7.6 Clustering for Streams and Parallelism

7.7 Summary of Chapter 7

7.8 References for Chapter 7

8 Advertising on the Web

8.1 Issues in On-Line Advertising

8.2 On-Line Algorithms

8.3 The Matching Problem

8.4 The Adwords Problem

8.5 Adwords Implementation

8.6 Summary of Chapter 8

8.7 References for Chapter 8

9 Recommendation Systems

9.1 A Model for Recommendation Systems

9.2 Content-Based Recommendations

9.3 Collaborative Filtering

9.4 Dimensionality Reduction

9.5 The NetFlix Challenge

9.6 Summary of Chapter 9

9.7 References for Chapter 9

Index