Answer to a question that came up in lecture on Wednesday about file io:

{{{id=44| file = '/tmp/test' if os.path.exists(file): os.unlink(file) open(file,'w').write(""" This is a test.""") /// }}} {{{id=47| f = open(file,'r+') f.seek(11) f.write('mess!') f.close() /// }}} {{{id=48| print open(file).read() /// This is a mess! }}} {{{id=46| /// }}} {{{id=45| /// }}}

Math 480: Lecture 27 -- Databases, part 2: SQLite

 

Using SQLite in Sage

Check out the SQLite website.   Some key points:

 

Here's a complete example of using SQLite to make a database of integer factorizations.

{{{id=41| # sqlite3 is a standard Python module import sqlite3 # Make sure the database file isn't left over from a previous demo... file = '/tmp/sqlite0' if os.path.exists(file): os.unlink(file) /// }}} {{{id=28| # open the database file -- zero configuration! db = sqlite3.connect(file) # get a "cursor" cursor = db.cursor() # start executing SQL commands cursor.execute("""CREATE TABLE factorizations (number INTEGER, factorization TEXT, UNIQUE(number))""") cursor.execute("CREATE INDEX factorizations_idx ON factorizations(number)") # commit our changes -- SQL uses transactions db.commit() /// }}} {{{id=27| t = ('6', '[(2,1),(3,1)]') cursor.execute('INSERT INTO factorizations VALUES(?,?)', t) /// }}} {{{id=25| db.commit() /// }}}

We can look at our new database on the command line, completely independently of Sage/Python:

boxen:~ wstein\$ sage -sh
(sage subshell)\$ sqlite3 /tmp/sqlite1
SQLite version 3.4.2
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .schema
CREATE TABLE factorizations 
         (number TEXT, factorization TEXT, UNIQUE(number));
CREATE INDEX factorizations_idx ON factorizations(number);
sqlite> select * from factorizations;
6|[(2,1),(3,1)]

By the way, the UNIQUE above makes it so you can't enter another factorization of the same number.

{{{id=35| t = ('6', '[(2,1),(3,1)]') cursor.execute('INSERT INTO factorizations VALUES(?,?)', t) /// Traceback (most recent call last): File "", line 1, in File "_sage_input_76.py", line 10, in exec compile(u'open("___code___.py","w").write("# -*- coding: utf-8 -*-\\n" + _support_.preparse_worksheet_cell(base64.b64decode("dCA9ICgnNicsICdbKDIsMSksKDMsMSldJykKY3Vyc29yLmV4ZWN1dGUoJ0lOU0VSVCBJTlRPIGZhY3Rvcml6YXRpb25zIFZBTFVFUyg/LD8pJywgdCk="),globals())+"\\n"); execfile(os.path.abspath("___code___.py"))' + '\n', '', 'single') File "", line 1, in File "/tmp/tmp37ss5c/___code___.py", line 3, in exec compile(u"cursor.execute('INSERT INTO factorizations VALUES(?,?)', t)" + '\n', '', 'single') File "", line 1, in sqlite3.IntegrityError: column number is not unique }}} {{{id=69| %time for n in range(1,10000): f = str(list(factor(n))).replace(' ','') try: t = (str(n), f) z = cursor.execute('INSERT INTO factorizations VALUES(?,?)', t) except: print "Unable to insert factorization of %s"%n /// Unable to insert factorization of 6 CPU time: 0.63 s, Wall time: 0.63 s }}} {{{id=68| time db.commit() /// Time: CPU 0.01 s, Wall: 0.00 s }}} {{{id=53| a = cursor.execute('SELECT * FROM factorizations ORDER BY number;') /// }}} {{{id=54| i = 0 for x in a: print x i += 1 if i>10: break /// (1, u'[]') (2, u'[(2,1)]') (3, u'[(3,1)]') (4, u'[(2,2)]') (5, u'[(5,1)]') (6, u'[(2,1),(3,1)]') (7, u'[(7,1)]') (8, u'[(2,3)]') (9, u'[(3,2)]') (10, u'[(2,1),(5,1)]') (11, u'[(11,1)]') }}} {{{id=52| /// }}}

We use the command line again (we do not have to exit or reload!) and find:

sqlite> SELECT * FROM factorizations where number<10;
1|[]
2|[(2,1)]
3|[(3,1)]
4|[(2,2)]
5|[(5,1)]
6|[(2,1),(3,1)]
7|[(7,1)]
8|[(2,3)]
9|[(3,2)]

Obviously, to use SQLite effectively, it helps enormously to know the SQL language.   Fortunately, you don't need to know very much, there are tons of examples on the web, many tutorials, books, and SQL isn't hard to learn.  

 

Python documentation for the sqlite3 module: http://docs.python.org/library/sqlite3.html

{{{id=49| /// }}}

SQLAlchemy

Next we'll spend a few moments on SQLAlchemy, which is a Python package included standard with Sage, which can also be installed easily into any Python install. 

  • SQLAlchemy is the canonical "object relational database mapper" for Python.
  • SQLAlchemy abstracts away the database backend, so the same code/application can work with SQLite, PostgreSQL, Oracle, MySQL, etc.
  • SQLAlchemy has a large test suite, good documentation, and is a high quality polished product. 
  • SQLAlchemy is MIT licensed (so very open source)

 

WARNING: As of this writing (May 27, 2011) the version of SQLAlchemy in the newest Sage (which is Sage-4.7) is the "ancient" 0.5.8 version.  So make sure to look at the right version of the SQLAlchemy docs here: http://www.sqlalchemy.org/docs/05/

{{{id=58| import sqlalchemy sqlalchemy.__version__ /// '0.5.8' }}}

We will use the file /tmp/sqlite1 for our demo.  Make sure it is deleted.

{{{id=76| file = '/tmp/sqlite1' if os.path.exists(file): os.unlink(file) /// }}}

Create a SQLite engine, which SQLalchemy will use.  This is the only place below that SQLite is explicitly mentioned.

{{{id=57| from sqlalchemy import create_engine engine = create_engine('sqlite:///%s'%file) #, echo=True) /// }}}

Use SQLalchemy to declare a new Python class, which will get mapped to a table in the above SQLlite database.

{{{id=56| from sqlalchemy.ext.declarative import declarative_base from sqlalchemy import Column Base = declarative_base() class IntFac(Base): __tablename__ = 'factorizations' number = Column(sqlalchemy.Integer, primary_key=True) factorization = Column(sqlalchemy.String) def __init__(self, number): self.number = int(number) self.factorization = str(list(factor(number))).replace(' ','') def __repr__(self): return '%s: %s'%(self.number, self.factorization) /// }}}

Make a particular session that connects to the database.

{{{id=55| from sqlalchemy.orm import sessionmaker session = sessionmaker(bind=engine)() /// }}}

Create the tables.  In this case, there is exactly one, which corresponds to the IntFac class above.

{{{id=66| Base.metadata.create_all(engine) /// }}}

Now create an integer factorization object.

{{{id=40| f = IntFac(6); f /// 6: [(2,1),(3,1)] }}}

And add it to our session, so it will get tracked by the database.

{{{id=59| session.add(f) /// }}}

Commit everything we have done so far.  After this commit, the database exists separately on a disk on file, and we can inspect it using the sqlite3 command line program.

{{{id=65| session.commit() /// }}}
wstein@boxen:/tmp\$ ls -lh /tmp/sqlite1
-rw-r--r-- 1 sagenbflask sagenbflask 2.0K 2011-05-27 13:46 /tmp/sqlite1
wstein@boxen:/tmp\$ sqlite3 /tmp/sqlite1
SQLite version 3.4.2
Enter ".help" for instructions
sqlite> .schema
CREATE TABLE factorizations (
	number INTEGER NOT NULL, 
	factorization VARCHAR, 
	PRIMARY KEY (number)
);
sqlite> select * from factorizations;
6|[(2,1),(3,1)]

We try a query on the session:

{{{id=63| session.query(IntFac).first() /// 6: [(2,1),(3,1)] }}}

We try adding the factorization of 6 again.  This should give an error because number is the primary key, hence must be unique.

{{{id=60| session.add(IntFac(6)) /// }}} {{{id=61| session.commit() /// Traceback (most recent call last): File "", line 1, in File "_sage_input_127.py", line 10, in exec compile(u'open("___code___.py","w").write("# -*- coding: utf-8 -*-\\n" + _support_.preparse_worksheet_cell(base64.b64decode("c2Vzc2lvbi5jb21taXQoKQ=="),globals())+"\\n"); execfile(os.path.abspath("___code___.py"))' + '\n', '', 'single') File "", line 1, in File "/tmp/tmpkPWLRS/___code___.py", line 2, in exec compile(u'session.commit()' + '\n', '', 'single') File "", line 1, in File "/sagenb/flask/sage-4.6.2/local/lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/session.py", line 671, in commit self.transaction.commit() File "/sagenb/flask/sage-4.6.2/local/lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/session.py", line 378, in commit self._prepare_impl() File "/sagenb/flask/sage-4.6.2/local/lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/session.py", line 362, in _prepare_impl self.session.flush() File "/sagenb/flask/sage-4.6.2/local/lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/session.py", line 1354, in flush self._flush(objects) File "/sagenb/flask/sage-4.6.2/local/lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/session.py", line 1432, in _flush flush_context.execute() File "/sagenb/flask/sage-4.6.2/local/lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/unitofwork.py", line 261, in execute UOWExecutor().execute(self, tasks) File "/sagenb/flask/sage-4.6.2/local/lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/unitofwork.py", line 753, in execute self.execute_save_steps(trans, task) File "/sagenb/flask/sage-4.6.2/local/lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/unitofwork.py", line 768, in execute_save_steps self.save_objects(trans, task) File "/sagenb/flask/sage-4.6.2/local/lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/unitofwork.py", line 759, in save_objects task.mapper._save_obj(task.polymorphic_tosave_objects, trans) File "/sagenb/flask/sage-4.6.2/local/lib/python2.6/site-packages/SQLAlchemy-0.5.8-py2.6.egg/sqlalchemy/orm/mapper.py", line 1303, in _save_obj (state_str(state), instance_key, state_str(existing))) sqlalchemy.orm.exc.FlushError: New instance with identity key (, (6,)) conflicts with persistent instance }}}

Once an error occurs the only option is to rollback the whole transaction.

{{{id=73| session.rollback() /// }}}

Let's make a few thousand factorization (like we did above) and include them all in one transaction in the database.

{{{id=67| time v = [IntFac(n) for n in [1..5] + [7..10000]] /// Time: CPU 1.98 s, Wall: 1.98 s }}}

Using add_all should be more efficient than calling add many times. 

{{{id=70| time session.add_all(v) /// Time: CPU 0.35 s, Wall: 0.36 s }}} {{{id=71| time session.commit() /// Time: CPU 6.59 s, Wall: 6.59 s }}}

Now we have factorizations of all integers up to 10000.  We can do a query like above.

{{{id=75| for X in session.query(IntFac).filter('number<10'): print X /// 1: [] 2: [(2,1)] 3: [(3,1)] 4: [(2,2)] 5: [(5,1)] 6: [(2,1),(3,1)] 7: [(7,1)] 8: [(2,3)] 9: [(3,2)] }}}

And, we can do the same on the command line:

sqlite> select * from factorizations where number<10;
1|[]
2|[(2,1)]
3|[(3,1)]
4|[(2,2)]
5|[(5,1)]
6|[(2,1),(3,1)]
7|[(7,1)]
8|[(2,3)]
9|[(3,2)]
{{{id=99| /// }}} {{{id=98| /// }}} {{{id=96| /// }}}