
I’ve been a Windows developer since 3.0 and caught the Visual Basic wave early with v1. I’ve released a “production” application in every version of VB since then (except VB for DOS). Focusing on enterprise, line-of-business development, I’ve built call center applications, mortgage finance systems and customer relationship management tools, and more recently I’ve been working in the litigation support/electronic data discovery/electronically stored information space. Greg is a DZone MVB, is not an employee of DZone, and has posted 476 posts at DZone. You can read more from him at his website.

Free Big Data EBook: 'Mining of Massive Datasets'

08.14.2012

The book has now been published by Cambridge University Press. The publisher is offering a 20% discount to anyone who buys the hard copy here. By agreement with the publisher, you can still download it free from this page. Cambridge University Press does, however, retain copyright on the work, and we expect that you will obtain their permission and acknowledge our authorship if you republish parts or all of it. We are sorry to have to mention this point, but we have evidence that other items we have published on the Web have been appropriated and republished under other names. It is easy to detect such misuse, by the way, as you will learn in Chapter 3.

Download Version 1.0

The following materials are equivalent to the published book, with errata corrected through July 4, 2012. This version has been frozen while we revise the book. The evolving book can be downloaded as "Version 1.1" below.

Download the Complete Book (340 pages, approximately 2MB) [GD: Click through for all the downloads]

Download chapters of the book:

Preface and Table of Contents
Chapter 1 Data Mining
Chapter 2 Large-Scale File Systems and Map-Reduce
Chapter 3 Finding Similar Items
Chapter 4 Mining Data Streams
Chapter 5 Link Analysis
Chapter 6 Frequent Itemsets
Chapter 7 Clustering
Chapter 8 Advertising on the Web
Chapter 9 Recommendation Systems
Index

Download Version 1.1

Below is a draft, evolving version of the MMDS book. We have added Jure Leskovec as a coauthor, and at this point added only one new chapter, on mining large graphs. However, we will be making available new chapters on large-scale machine-learning algorithms and dimensionality reduction, as well as expanding Chapter 2 on map-reduce algorithm design.

Download the Complete Book (395 pages, approximately 2.4MB)

Download chapters of the book:

Preface and Table of Contents
Chapter 1 Data Mining
Chapter 2 Large-Scale File Systems and Map-Reduce
Chapter 3 Finding Similar Items
Chapter 4 Mining Data Streams
Chapter 5 Link Analysis
Chapter 6 Frequent Itemsets
Chapter 7 Clustering
Chapter 8 Advertising on the Web
Chapter 9 Recommendation Systems
Chapter 10 Mining Social-Network Graphs
Index

From the Preface of v1.1

This book evolved from material developed over several years by Anand Rajaraman and Jeff Ullman for a one-quarter course at Stanford. The course CS345A, titled “Web Mining,” was designed as an advanced graduate course, although it has become accessible and interesting to advanced undergraduates. When Jure Leskovec joined the Stanford faculty, we reorganized the material considerably. He introduced a new course CS224W on network analysis and added material to CS345A, which was renumbered CS246. The three authors also introduced a large-scale data-mining project course, CS341. The book now contains material taught in all three courses.

What the Book Is About
At the highest level of description, this book is about data mining. However, it focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of our examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to “train” a machine-learning engine of some sort. The principal topics covered are:

1. Distributed file systems and map-reduce as a tool for creating parallel algorithms that succeed on very large amounts of data.
2. Similarity search, including the key techniques of minhashing and locality-sensitive hashing.
3. Data-stream processing and specialized algorithms for dealing with data that arrives so fast it must be processed immediately or lost.
4. The technology of search engines, including Google’s PageRank, link-spam detection, and the hubs-and-authorities approach.
5. Frequent-itemset mining, including association rules, market-baskets, the A-Priori Algorithm and its improvements.
6. Algorithms for clustering very large, high-dimensional datasets.
7. Two key problems for Web applications: managing advertising and recommendation systems.
8. Algorithms for analyzing and mining the structure of very large graphs, especially social-network graphs.

If you're really into big data, or want to be, this eBook looks like it's just for you.

(via Jason Haley - Interesting Finds: August 12, 2012)

Published at DZone with permission of Greg Duncan, author and DZone MVB.


Comments

Mrunal Shah replied on Tue, 2012/08/14 - 2:16pm

I am very interested in this subject matter; however, I couldn't find the free download link you mentioned on this page :(

Kingshuk Chatterjee replied on Tue, 2012/08/14 - 5:08pm in response to: Mrunal Shah

I just googled it and found this link here: http://i.stanford.edu/~ullman/mmds/book.pdf

Swathi Venkatachala replied on Thu, 2012/08/16 - 5:07am

Nice post! Thanks for the share :)

Cheers!
