Math 480: Lecture 22 -- Interfaces

 

The goal of this lecture is to give you a deeper understanding of some of the fundamental and unique architectural issues involved in Sage.

I built Sage partly from other complete mathematical software systems because I wanted to finish Sage in at most 5 years. 

"Building the car instead of reinventing the wheel."

Some of the major components included in Sage are:

Each of the above is a full standalone project with its own custom programming language, history, culture, etc.  And each has unique, powerful, debugged code that I don't want to have to rewrite from scratch, since it would take too long, and writing code from scratch is incredibly difficult and frustrating. 

I also wanted to make it easy to call the following systems from Sage for the purposes of benchmarking, porting, migration of users and code, optional functionality, etc.:

 

The Big Problem: How can we make use of the above systems from Python?

This question is difficult partly because there are so many answers, each with pros and cons, and I had to choose (then be criticized for my choices).

 

 

{{{id=1| /// }}}

Problem 1: Availability of a specific known version of a third party software package.

Even if we solve the big problem above, a "vendor" will often just release a new version of their software with numerous changes that break our solution.  This happens constantly.  And the actual versions of software that people have installed (under OS X, various Linuxes, Solaris, etc.) will be widely distributed over versions of software from the last decade.    

Solution: For the free open systems that (1) we really need, and (2) we can build from source easily enough, we ship and build a very specific version as part of Sage.   This completely solves problem 1, at the expense of a lot of (misplaced) criticism from people who don't understand the problem; at the same time, this accidentally creates a solution to a different problem (easy-to-install distribution of a bunch of useful math software), which many people greatly appreciate. 

For the non-free systems or the free systems that are hard to build, the problem just doesn't get solved.  And indeed our interfaces and code that rely on those systems are sadly fairly brittle; often a new version of Magma just stops working with Sage.   Fortunately, very little functionality in Sage depends on such systems. 

 

 

{{{id=4| /// }}}

Problem 2: Make a specific version of some mathematics software (call it M) usable from Python.

Here are some of the many potential approaches to this problem:

  1. Naive Subprocess. Start up M, tell it to read in a file, and save the results in a file, and terminate.  This doesn't preserve state between calls, and startup time can easily be seconds, so this is not viable.
  2. Create network protocols.  Define an OpenMath/XML based protocol for well-defined communication of "arbitrary mathematics" (whatever that is) between software, e.g., between M and Python.  Design and implement client and server protocols.  The SCIEnce project, started in 2006, and costing many millions of dollars, is an example.   This seems like the right approach, but it is very slow in benchmarks and complicated to develop.  It's probably very useful for something (e.g., writing research papers), but is massively too complicated for what we need for Sage, which is focused on what is practical now.
  3. Pseudo-ttys (pty) = pexpect.   Create a simulated command line prompt that is controlled by Python.  Absolutely anything that one can do at the command line with M immediately becomes usable from a Python program.    This is relatively easy to implement and extremely flexible -- one can create a useful interface to any math software system out there in a day.  It is slow in some sense, but still much better than (1) and (2).   This approach has been used in Sage for a long time.
  4. C/C++ library interfaces.  Create a C/C++ library interface and link the other program into Python itself, using Cython.   This is extremely difficult, because none of the M's (except PARI) are designed to be used this way.  However, it is extremely fast.  For basic arithmetic, it can be several hundred times faster than (3) above.  

As of now, people have written fairly polished versions of both (3) and (4) for all of: PARI, GAP, Singular, Maxima, and R.    In the case of (4), these are all hard work, and aren't necessarily used much in Sage yet, or even included in Sage, but they exist, and are on the way in. 
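To make the drawback of approach (1) concrete, here is a minimal sketch using only Python's standard `subprocess` module.  It is not how Sage does anything; a child Python interpreter stands in for M (an assumption made so the example runs anywhere), and `naive_eval` is a hypothetical helper name.  Each call starts a fresh process, so a variable defined in one call is gone in the next.

```python
import subprocess
import sys

# Stand-in for "M": a child Python interpreter. This is an assumption made
# only so the sketch runs anywhere; a real M would be Maxima, GAP, etc.
M = [sys.executable, "-c"]

def naive_eval(code):
    """Approach 1: start M, run one piece of code, capture output, exit."""
    result = subprocess.run(M + [code], capture_output=True, text=True)
    return result.stdout.strip()

# The first call defines a and uses it within the same process.
first = naive_eval("a = 2; print(a + 3)")

# The second call runs in a brand new process, so a no longer exists.
second = naive_eval("print('a' in globals())")

print(first)
print(second)
```

Every call also pays the full startup cost of the child, which for a real system can be far larger than for this stand-in.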

{{{id=6| /// }}}

The rest of this worksheet is about how to use (3) above: the pexpect based interfaces.   This is well worth learning, because these interfaces all work in almost exactly the same way, and there are interfaces to pretty much every math software system out there.   Sage is the only software in existence that can talk to so many other math software systems.  

Here are the basic points, which we'll follow with several examples illustrating them.  Suppose m is one of the math software systems, e.g., r or singular or maxima:

And that is pretty much it.  

WARNING: There is latency.  Any time you call any function involving a pexpect interface, expect it to take on the order of at least 1ms (=one millisecond), even if the actual operation in the system M takes almost no time.    For comparison, adding or multiplying most simple objects in Python/Sage takes about 1 microsecond (i.e., 1/1000 the time of a call involving pexpect), and adding/multiplying objects in Cython can take only a few nanoseconds (roughly 1/1000000 the time of a pexpect call).  
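The following sketch gives a rough feel for where this latency comes from.  It is an illustration under stated assumptions, not a benchmark of Sage: a child Python process stands in for M, the parent talks to it over plain pipes (not a pseudo-tty, so the round trip here will typically be cheaper than a real pexpect call), and `remote_add` and `seconds_per_call` are hypothetical helper names.  The actual numbers depend entirely on your machine.

```python
import subprocess
import sys
import time

# Child program standing in for "M": reads one expression per line,
# evaluates it, and writes the result back. Purely illustrative.
CHILD = r'''
import sys
for line in iter(sys.stdin.readline, ""):
    print(eval(line), flush=True)
'''

proc = subprocess.Popen(
    [sys.executable, "-c", CHILD],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True, bufsize=1,
)

def remote_add(a, b):
    """Send 'a + b' to the child and wait for the answer (one round trip)."""
    proc.stdin.write(f"{a} + {b}\n")
    proc.stdin.flush()
    return int(proc.stdout.readline())

def seconds_per_call(func, n):
    """Average wall-clock time of func(2, 3) over n calls."""
    start = time.perf_counter()
    for _ in range(n):
        func(2, 3)
    return (time.perf_counter() - start) / n

# Warm up: the very first call also waits for the child to start.
answer = remote_add(2, 3)

remote = seconds_per_call(remote_add, 2000)
local = seconds_per_call(lambda a, b: a + b, 200000)

proc.stdin.close()
proc.wait()

print(f"round trip to child: {remote * 1e6:.1f} microseconds per call")
print(f"in-process addition: {local * 1e6:.3f} microseconds per call")
```

The practical consequence is the same as for the real interfaces: do as much work as possible per call, rather than making many tiny calls in a loop.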

{{{id=8| /// }}}

Examples

Another note: the very first time you do m.eval(...) it may take surprisingly long, since another program is starting up.

We use Maxima to illustrate evaluation of a simple string:

{{{id=13|
s = maxima.eval("2 + 3")
type(s)
///
}}}

{{{id=17|
s
///
'5'
}}}

{{{id=72|
maxima.eval("""
a : 2;
b : 3;
c : a + b;
""")
maxima.eval('c')
///
'5'
}}}

{{{id=71| /// }}}

There is now a separate Maxima subprocess running.  Each process has an id number associated to it:

{{{id=12| maxima.pid() # the process id of the subprocess /// 9259 }}}

Next we illustrate creating a Python object that wraps an expression in Maxima.

{{{id=16|
s = maxima('sin(x^3) * tan(y)')
type(s)
///
}}}

The name of the object in the corresponding Maxima session:

{{{id=22| s.name() /// 'sage6' }}}

The object prints nicely:

{{{id=15| s /// sin(x^3)*tan(y) }}}

Latex output happens to be supported:

{{{id=24| show(s) ///
\newcommand{\Bold}[1]{\mathbf{#1}}\sin x^3\,\tan y
}}}

You can call functions on objects in a Pythonic way.

{{{id=11| s.integrate('y') /// sin(x^3)*log(sec(y)) }}}

Or use maxima.function(...)

{{{id=10| maxima.integrate(s, 'y') /// sin(x^3)*log(sec(y)) }}}

The result is another Python object (which wraps another object defined in Maxima).  We can call functions on that object as well.

{{{id=31|
z = s.integrate('y')
type(z)
///
}}}

{{{id=33|
z
///
sin(x^3)*log(sec(y))
}}}

{{{id=34|
z.diff('y')
///
sin(x^3)*tan(y)
}}}

{{{id=53| /// }}}
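The pattern in the examples above (evaluate a string, wrap a result in a named session variable, forward method calls) can be sketched in a few dozen lines.  This is a toy under stated assumptions, not Sage's implementation: a child Python process stands in for M, communication uses plain pipes with a one-line-in, one-line-out protocol instead of pexpect and prompt matching, and `ToyInterface`, `ToyElement`, and the `toy1`, `toy2`, ... variable names are all hypothetical.

```python
import subprocess
import sys

# Stand-in for "M": a child Python process that keeps one namespace alive
# and answers each line of input with exactly one line of output.
CHILD = r'''
import sys
ns = {}
for line in iter(sys.stdin.readline, ""):
    try:
        out = repr(eval(line, ns))
    except SyntaxError:
        exec(line, ns)   # statements such as assignments produce no output
        out = ""
    print(out, flush=True)
'''

class ToyInterface:
    """Hypothetical miniature of an interface object such as maxima."""

    def __init__(self):
        self._proc = subprocess.Popen(
            [sys.executable, "-c", CHILD],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            text=True, bufsize=1,
        )
        self._counter = 0

    def eval(self, code):
        """Send a string to the session; return its output as a string."""
        self._proc.stdin.write(code + "\n")
        self._proc.stdin.flush()
        return self._proc.stdout.readline().rstrip("\n")

    def __call__(self, code):
        """Store the value of code in a fresh session variable and wrap it."""
        self._counter += 1
        name = f"toy{self._counter}"
        self.eval(f"{name} = {code}")
        return ToyElement(self, name)

    def quit(self):
        """End the session; existing ToyElement objects become invalid."""
        self._proc.stdin.close()
        self._proc.wait()

class ToyElement:
    """Python-side handle for a variable that lives in the child session."""

    def __init__(self, parent, name):
        self._parent, self._name = parent, name

    def name(self):
        return self._name

    def __repr__(self):
        return self._parent.eval(self._name)

    def __getattr__(self, method):
        # s.method(args) becomes "name.method(args)" inside the session,
        # and the result is wrapped as a new element.
        def call(*args):
            arglist = ", ".join(repr(a) for a in args)
            return self._parent(f"{self._name}.{method}({arglist})")
        return call

m = ToyInterface()
five = m.eval("2 + 3")    # plain string in, plain string out
s = m("'sage'")           # wraps a session variable named toy1
t = s.upper()             # method call forwarded to the session
shown = repr(t)
m.quit()

print(five, s.name(), t.name(), shown)
```

Note that every operation, including printing an element, is a round trip to the child process, which is exactly where the latency discussed earlier comes from.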

Conclusion: If you understand the above, you are in extremely good shape.  All the other interfaces work the same way.   The examples below are just to illustrate some subtle points and show how interfaces are useful.  

{{{id=30| /// }}}

It is possible in some systems to seriously mess things up and get things "out of sync".  This is nearly impossible with Maxima, since we use it so heavily and have debugged the heck out of it.  However, with other systems (like Magma) this can happen.  If it does, run, e.g., maxima.quit().  This completely kills the subprocess, invalidates any Python objects that wrap variables in that session, and starts a brand new fresh session.  

{{{id=36| /// }}}

Here is an example with each of the five big systems included in Sage:

{{{id=55|
maxima('2+3') # maxima
///
5
}}}

{{{id=52|
gp('2+3') # pari/gp
///
5
}}}

{{{id=51|
singular('2+3')
///
5
}}}

{{{id=50|
gap('2+3')
///
5
}}}

{{{id=41|
r('2 + 3')
///
[1] 5
}}}

You can follow standard R tutorials and expect all of the computations (except graphics at present) to "just work".  (Unlike the potentially confusing rpy2.) 

{{{id=58|
x = r('c(1,3,2,10,5)'); y = r('1:5')
print x
print y
///
[1] 1 3 2 10 5
[1] 1 2 3 4 5
}}}

{{{id=57|
x + y
///
[1] 2 5 5 14 10
}}}

{{{id=40|
x/y
///
[1] 1.0000000 1.5000000 0.6666667 2.5000000 1.0000000
}}}

{{{id=62|
x.length()
///
[1] 5
}}}

{{{id=63|
x > 3
///
[1] FALSE FALSE FALSE TRUE TRUE
}}}

{{{id=61|
x[x > 3]
///
[1] 10 5
}}}

There is also an interface to Octave, which is very similar to Matlab (but free).

{{{id=68|
A = octave('rand(3)'); A
///
0.401446 0.286955 0.396858
0.606625 0.371021 0.515619
0.96863 0.683554 0.837288
}}}

{{{id=66|
A*A
///
0.719642 0.492938 0.639562
0.968042 0.664185 0.863772
1.61454 1.1039 1.43791
}}}

{{{id=69|
A.rref()
///
1 0 0
0 1 0
0 0 1
}}}

{{{id=70| /// }}}

{{{id=65| /// }}}

{{{id=59| /// }}}

Bonus: There is even a pexpect interface to Sage itself.   (Trivia: this is used in the implementation of the Sage notebook.)

{{{id=39|
sage0('2 + 3')
///
5
}}}

{{{id=38|
A = sage0('matrix(QQ, 3, [1..9])'); A
///
[1 2 3]
[4 5 6]
[7 8 9]
}}}

{{{id=3|
type(A)
///
}}}

{{{id=43|
A.echelon_form()
///
[ 1  0 -1]
[ 0  1  2]
[ 0  0  0]
}}}

Let's get crazy: a pexpect interface inside a pexpect interface.  And of course, this code is going from the notebook to Sage via yet another interface.

{{{id=46|
sage0.eval('sage0 = Sage()')
z = sage0('sage0("3+5")')
///
}}}

{{{id=45|
type(z)
///
}}}

{{{id=44|
z
///
8
}}}

{{{id=47|
sage0.type(z)
///
}}}

{{{id=48| /// }}}