Preface
1. Why Mongo?
The Problem of Objects and Relational Data Structures
The Problem with ORMs
ORMs Are Hairy and Complex
ORMs Aren’t Performant
ORMs Neutered SQL
Complicated Architecture
PHP Is Mostly CRUD
MongoDB, Optimized for Operation
MongoDB Is a Document Database
Document == Array
MongoDB Is Optimized for CRUD Operations
Optimal Interface for Developers
Optimal Performance
Optimal Simplicity
The Value of Consistency
2. PHP, MongoDB, and You
Installing the Driver on Linux or Mac OS X
Checking for the Driver
Installing the Driver
Upgrading the Driver
Installing the Driver on Windows
Connecting to a Database
Connecting to a MongoDB Database Server
Selecting a Database
The Basics (CRUD Operations)
Creating/Selecting a Collection
Creating a Document
Primary Keys and ObjectIds
Reading a Document
Updating a Document
Saving a Document
Deleting a Document
The MongoDB Shell
mongo
Using the Shell
Administrative Commands
Working with Sets
Querying Sets
Finding (Querying) Data in MongoDB
Pagination with the Cursor
Ranges
Working with Arrays
Conditionals
Working with Multiple Documents
Working with Indexes
Setting Indexes
Index Order
About Indexes
Compound Indexes
Indexing Arrays
Indexes and Memory
Database References
References Are Not Foreign Keys
When to Use References or Reference versus Embed
How to Create References
How to Access DBRefs
Dates and Times
3. Advanced MongoDB
Regular Expressions
Creating a MongoDB Regular Expression
Regular Expressions and Indexes
Aggregation Commands
The Distinct Command
The Group Command
MapReduce
findAndModify
GridFS
What Is GridFS?
Using GridFS
Mongofiles
Replication
High Availability
Why Three Nodes?
Really Easy Configuration
Checking the Replica Set Status
Sharding
Gotchas
The $ Problem
The Array != Array Problem
Request Injection Attacks
4. PHP Libraries and Tools
Object Document Mappers (ODM)
Doctrine MongoDB ODM
Active Mongo
Mandango
Tools
MongoQueue
Genghis
RockMongo
Frameworks
Symfony2
Lithium
Zend
Fuel
FatFree Framework
5. Conclusion
Preface
Once every decade or so, a technology comes along that is so revolutionary that it
fundamentally alters the way we approach everything we do. The world itself has
changed. As I think back to 1995 when I first started developing Internet applications,
our data needs were relatively simple. For the next 10 years, little changed; more and
more people were using the Internet, and consequently data stores needed to scale to
larger workloads, but caching largely took care of that, as all users were accessing the
same set of data. As social media came to fruition, it was clear that the approach that
had worked for the prior 30 years was no longer sufficient. In the future, all data and
experience would need to be personalized—on a large scale. It was out of this need that
MongoDB was created. A database for today’s applications, a database for today’s
challenges, a database for today’s scale: MongoDB has that disruptive potential that
will fundamentally change the way you approach developing applications.
I’d like to publicly thank my wife and four children for being patient with me as I spent
most of my free time over the past few months writing this book.
Why Mongo?
One of the problems that led to the first dot-com crash was the huge expense of
development, especially server software. A new and viable set of open source tools
emerged from the ashes of the first dot-com and became the foundation for the next
generation of the Internet. In the summer of 2001, a new acronym emerged;
LAMP—Linux, Apache, MySQL and PHP—became the platform of choice for an entire
generation of developers. And like that, PHP and MySQL were married (they were right
next to each other, after all). The two seemed destined to go together forever.
The Problem of Objects and Relational Data Structures
There was only one problem. PHP, which started as a templating language, matured
and gradually embraced objects. PHP was being used in more complex applications,
and the language consistently changed to meet these ever-increasing demands.
The practice of writing raw SQL queries in template files quickly became unacceptable
(some say it was never acceptable). As the problems became more and more complex,
tools were written to solve the constantly growing trouble of PHP using objects (or
arrays) and MySQL (and the other relational databases) using tables, rows, and
columns.
This isn’t a problem specific to PHP. For decades, people have built tools and libraries
to automate the process of translating objects to relational data structures. The most
popular set is called Object Relational Mappers (ORMs). ORMs were built to solve the
problem of SQL. Their sales pitch is: use an ORM because it masks all the nasty details
of the datastore, so all you ever need to touch is your friendly PHP objects. Although
tools emerged that did a reasonable job of making good on that promise, they never
really worked perfectly. First, you always needed to remember that there was a relational
database behind these objects that spoke in terms of tables, rows, and columns.
Second, these ORMs came at a high cost. They added a lot of complexity and overhead
to applications and supported only a subset of SQL’s features. As they developed, it
quickly became the case that learning an ORM was far more time-consuming than
learning SQL in the first place. It is sufficient to say that although the ORMs largely
fixed the problems of SQL, they brought with them the problems of ORMs.
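To make the mismatch concrete, here is a minimal sketch of the translation an ORM automates. It is an illustration only: the `$post` structure, the `toRows()` helper, and the `posts` and `comments` table names are all hypothetical and do not come from any particular ORM.

```php
<?php
// Hypothetical application data: a blog post with its comments nested inside.
$post = [
    'id'       => 1,
    'title'    => 'Why Mongo?',
    'comments' => [
        ['author' => 'alice', 'body' => 'Nice post'],
        ['author' => 'bob',   'body' => 'Agreed'],
    ],
];

/**
 * Flatten the nested post into rows for two hypothetical tables,
 * `posts` and `comments`. An ORM performs this kind of mapping for you.
 */
function toRows(array $post): array
{
    $rows = [
        'posts'    => [['id' => $post['id'], 'title' => $post['title']]],
        'comments' => [],
    ];

    foreach ($post['comments'] as $comment) {
        // Each nested comment becomes its own row, tied back by a foreign key.
        $rows['comments'][] = [
            'post_id' => $post['id'],
            'author'  => $comment['author'],
            'body'    => $comment['body'],
        ];
    }

    return $rows;
}

$rows = toRows($post);
```

One nested array turns into one `posts` row and two `comments` rows. Reading the post back requires the reverse trip: querying both tables (or joining them) and reassembling the nested structure.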
The Problem with ORMs
The objective of an ORM may be simple, but the solution never is.
ORMs Are Hairy and Complex
Propel and Doctrine are the two most popular ORMs for PHP. Propel follows the Active
Record pattern; Doctrine is modeled on Hibernate. Both projects are quite large, comprising
tens of thousands of lines of code. Doctrine also provides its own SQL-like query language
called DQL, so you need to know both SQL and DQL to use Doctrine.
ORMs Aren’t Performant
The core objective of an ORM is developer convenience: ORMs are built to translate the
database’s tables, rows, and columns into your language’s objects. The most common
approach is called Active Record. It is especially easy to use, but that ease comes with
some of the worst performance compromises. This is universally true, but especially in PHP.
Typically, ORMs perform reasonably well with low activity, but as load or data size
increases, their performance compromises become a large hindrance. A common criticism
is that Ruby on Rails doesn’t scale and is best used as a prototyping environment. This is
an accurate criticism, but it is important to recognize that the place where it doesn’t scale
isn’t the controller or view; it’s the Active Record layer. Not only do ORMs add a layer of
overhead at runtime, but they also consume a lot of memory.
ORMs Neutered SQL
It wasn’t just that the ORMs made it so that SQL was hidden; they stripped it down to
its most basic features. ORMs made it really quite simple to do the operational stuff
like reading and writing objects, commonly called CRUD (Create, Read, Update, Delete)
operations, but failed in large part to support any of SQL’s advanced features. If you
don’t believe me, try to do a left outer join with an ORM or an aggregate function like
an average across a set of data. Many have even failed to provide support for database
transactions, passing along the responsibility to the application.
Complicated Architecture
In an effort to address some of the performance shortcomings of ORMs and relational
databases in general, MemCache was built. MemCache was so effective at speeding up
data retrieval that it was quickly adopted across the industry. It soon became a necessary
