William Stein
The last database we'll talk about is MongoDB. It's another example of a NoSQL database (so SQL isn't used). It's far more powerful than the key:value stores we've talked about, since documents can be nested, queried by field, and indexed. It is also more scalable and extremely efficient. I ran many benchmarks comparing MongoDB with other very fast databases (e.g., Tokyo Cabinet), and, amazingly, for large numbers of small records MongoDB is as good as or better.
MongoDB does not come with Sage. To use it, you must install it yourself.
mkdir -p /tmp/mongotest
mongod --dbpath /tmp/mongotest/ --bind_ip localhost --port 29000
Now:
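The Python side then needs a handle on a collection in that server. What follows is only a minimal sketch, assuming that PyMongo is installed and using its `MongoClient` class. The helper `get_collection` is hypothetical, and the names `db` and `factorizations` are taken from the mongo shell session later in these notes rather than from any Python code shown here.

```python
# Sketch only: assumes PyMongo is installed and the mongod started above is running.
HOST = "localhost"   # matches --bind_ip above
PORT = 29000         # matches --port above


def get_collection(db_name="db", coll_name="factorizations"):
    """Return a collection handle (hypothetical helper; plays the role of F)."""
    # Imported lazily so that this file can be loaded without PyMongo present.
    from pymongo import MongoClient

    client = MongoClient(HOST, PORT)
    # Databases and collections come into existence on the first write.
    return client[db_name][coll_name]


# Typical use, once the server is up:
#   F = get_collection()
```

The cells in these notes call `F.insert`, `F.count` and `F.ensure_index`. Recent PyMongo releases have replaced those with `insert_one`/`insert_many`, `count_documents` and `create_index`, so with a current driver the calls would need to be adapted accordingly.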
You typically query the database by constructing a query dictionary.
{{{id=15|
query = {'n':str(2^29-1)}
results = F.find(query); results
///
}}}

The result of the query is an iterable:
{{{id=13|
results.next()
///
{u'n': u'536870911', u'shape': u'2^29-1', u'_id': ObjectId('4cf87bcb8c667a74970000ba'), u'factor': [[u'233', 1], [u'1103', 1], [u'2089', 1]]}
}}}

{{{id=23|
results.next()
///
Traceback (most recent call last):
...
}}}

You can insert any documents you want into F. There's no fixed "schema" that they all have to have.
{{{id=25|
F.insert({"description":"This is a collection of factorizations of numbers.", "author":"William Stein"})
///
ObjectId('4cf87f598c667a7497000186')
}}}

{{{id=22|
F.count()
///
36
}}}

{{{id=36|
%time
# First, we make a list of the factorizations. We do not bother to set the 'shape'
# field below, since these are not factorizations of a special form. Note that we take
# care to only store basic types (strings, ints, lists, dicts, etc.) in the database.
v = []
for n in range(1,10^5):
    f = factor(n)
    v.append({'n':str(n), 'factor':[(str(q),int(e)) for (q,e) in f]})
///
CPU time: 16.60 s, Wall time: 16.62 s
}}}

We do the actual insert in one single call to the database.
{{{id=28|
# manipulate=False speeds it up a little
time z = F.insert(v, manipulate=False, check_keys=False)
///
Time: CPU 1.04 s, Wall: 1.04 s
}}}

{{{id=30|
F.find({'description': {'$exists':True}}).next()
///
{u'_id': ObjectId('4cf87be48c667a74970000d4'), u'description': u'This is a collection of factorizations of numbers.', u'author': u'William Stein'}
}}}

The queries you can run are extremely powerful: besides exact matches on a field, operators such as $exists let you select documents by their structure. See the MongoDB documentation's query page.
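To make the meaning of a query dictionary concrete, here is a toy pure-Python model of how such a dictionary selects documents. It is not how MongoDB evaluates queries internally, and it covers only exact matches plus the `$exists` and `$in` operators. The helpers `matches` and `find` are hypothetical names introduced for this illustration, and the sample documents are copied from the outputs above with the `_id` fields left out.

```python
def matches(doc, query):
    """Return True if doc satisfies every condition in query (toy semantics)."""
    for field, cond in query.items():
        if isinstance(cond, dict) and any(k.startswith("$") for k in cond):
            # Operator form, e.g. {'$exists': True} or {'$in': [...]}.
            for op, arg in cond.items():
                if op == "$exists":
                    if (field in doc) != bool(arg):
                        return False
                elif op == "$in":
                    if field not in doc or doc[field] not in arg:
                        return False
                else:
                    raise ValueError("operator not modelled here: %r" % op)
        else:
            # Plain form: the field must be present and equal to the value.
            if field not in doc or doc[field] != cond:
                return False
    return True


def find(docs, query):
    """Yield matching documents lazily, loosely mimicking a cursor."""
    return (d for d in docs if matches(d, query))


docs = [
    {"n": "2010", "factor": [["2", 1], ["3", 1], ["5", 1], ["67", 1]]},
    {"n": "536870911", "shape": "2^29-1",
     "factor": [["233", 1], ["1103", 1], ["2089", 1]]},
    {"description": "This is a collection of factorizations of numbers.",
     "author": "William Stein"},
]

by_n = list(find(docs, {"n": str(2**29 - 1)}))
with_description = list(find(docs, {"description": {"$exists": True}}))
some_n = list(find(docs, {"n": {"$in": ["2010", "7"]}}))
```

Because there is no schema, the third document simply lacks an `n` field, so queries on `n` skip it without raising an error.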
You can also make an index on a given field, which matters greatly once the database gets bigger. To illustrate this, let's insert about 100,000 factorizations.
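{{{id=20|
F.count()
///
100035
}}}

{{{id=18|
timeit("F.find({'n':str(randint(2,10000))})")
///
625 loops, best of 3: 15.7 µs per loop
}}}

{{{id=19|
F.ensure_index('n')
///
u'n_1'
}}}

{{{id=27|
timeit("F.find({'n':str(randint(2,10000))})")
///
625 loops, best of 3: 15.5 µs per loop
}}}

The two timings are essentially identical. One hundred thousand records isn't enough to make much of a difference with such simple keys. Note also that find only constructs a cursor and fetches results lazily, so these timings largely measure cursor creation; timing a call that actually retrieves a document, such as F.find_one(...), gives a fairer comparison. For larger data sets, ensure_index is absolutely critical.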
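Why an index helps can be sketched without a database at all. The following is a rough pure-Python analogy, not MongoDB's implementation: MongoDB indexes are B-trees, whereas this sketch uses a dict. The point is only that an unindexed lookup inspects documents one by one, while an indexed lookup goes straight to the match. The helpers `trial_factor` and `scan` are hypothetical, and trial division stands in for Sage's `factor` so that the block runs in plain Python.

```python
def trial_factor(n):
    """Factor n >= 1 by trial division; returns [(prime, exponent), ...]."""
    out = []
    p = 2
    while p * p <= n:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        if e:
            out.append((p, e))
        p += 1
    if n > 1:
        out.append((n, 1))
    return out


# Documents shaped like those in the notes: only strings, ints and lists.
docs = [{"n": str(n), "factor": [[str(q), e] for q, e in trial_factor(n)]}
        for n in range(1, 5000)]


def scan(docs, n):
    """Unindexed lookup: inspect documents one by one until 'n' matches."""
    for d in docs:
        if d["n"] == n:
            return d
    return None


# "Building an index" on 'n': one pass over the data, then direct lookups.
index_on_n = {d["n"]: d for d in docs}

hit_scan = scan(docs, "2010")
hit_index = index_on_n.get("2010")
```

Both lookups return the same document, but the scan does work proportional to the number of documents, which is why the gap widens as the collection grows.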
We can also do almost everything from the mongo command line prompt, which is literally a JavaScript interpreter.
MongoDB shell version: 1.6.3
connecting to: localhost:29000/test
> use db
switched to db db
> db.factorizations
db.factorizations
> f = db.factorizations
db.factorizations
> z=0; for(i=1;i<=4;i++) { z+=i; }; z   /* Look, javascript! */
10
> f.find({'n':'2010'})
{ "_id" : ObjectId("4cf87de0f94482457dbfbbbe"), "factor" : [ [ "2", 1 ], [ "3", 1 ], [ "5", 1 ], [ "67", 1 ] ], "n" : "2010" }
> f.find({'shape':'2^29-1'})
{ "_id" : ObjectId("4cf87dd78c667a749700016c"), "shape" : "2^29-1", "factor" : [ [ "233", 1 ], [ "1103", 1 ], [ "2089", 1 ] ], "n" : "536870911" }
> f.find({"description":{"\$exists":true}})
{ "_id" : ObjectId("4cf87f598c667a7497000186"), "description" : "This is a collection of factorizations of numbers.", "author" : "William Stein" }
Next lecture: discussion of the overall architecture for the modular forms and L-functions database project: database + webserver, etc.