Math 480: 2010-05-14

Fast Function Evaluation and Some More Cython

William Stein

 

 

{{{id=5| /// }}}

Fast Evaluation of Expressions

Sage has a powerful fast_float function that, given a symbolic expression, returns a very fast callable object. This is really useful to know about when implementing root finding, or whenever you pass a function to scipy. At present, fast_float can be hundreds of times faster than evaluating the symbolic expression directly!  Here is an example:

{{{id=34| f(x) = cos(x)^2 - x^3 + x - 5
parent(f) /// Callable function ring with arguments (x,) }}} {{{id=36| a = float(e); a /// 2.7182818284590451 }}} {{{id=37| f(a) /// -21.5359963633559 }}} {{{id=89| type(f(a)) /// <type 'sage.symbolic.expression.Expression'> }}} {{{id=33| timeit('float(f(a))') /// 625 loops, best of 3: 198 µs per loop }}}

198 microseconds per call may sound fast, but if we're calling this function millions of times, e.g., applying it to entries of a 1000x1000 matrix for some image processing application, or drawing a fractal, etc., then 198 microseconds is dog slow:

{{{id=150| t = 1000. * 1000 * 198 * units.si_prefixes.micro * units.time.second; t /// 1.98000000000000e8*micro*second }}} {{{id=151| t.convert(units.time.minute) /// 3.30000000000000*minute }}} {{{id=148| /// }}}

Pure Python is also very good in this case, because its evaluation of float arithmetic is highly optimized.

Method 1: Use "raw" literals and explicitly call the cos function from the math library.

The "r" suffix below is special to Sage.  It means "please, do not preparse this number.  Leave it as a Python float or int."

{{{id=140| def f_python(x): return math.cos(x)^2.0r - x^3.0r + x - 5.0r /// }}} {{{id=144| f_python(a) /// -21.535996363355856 }}} {{{id=145| type(f_python(a)) /// <type 'float'> }}} {{{id=146| timeit('f_python(a)') /// 625 loops, best of 3: 808 ns per loop }}} {{{id=147| 198/.808 /// 245.049504950495 }}}

Method 2: Use a %python block (equivalently, put your code in a .py file and import it).  Note that here the Sage preparser isn't invoked, so be careful to use ** instead of ^, etc.

{{{id=165| %python
import math
def f_python(x):
    return math.cos(x)**2.0 - x**3.0 + x - 5.0
/// }}} {{{id=167| timeit('f_python(a)') /// 625 loops, best of 3: 920 ns per loop }}} {{{id=141| /// }}}

Now we try fast_float, and see how it does.

{{{id=95| g = fast_float(f, x) /// }}} {{{id=54| g /// }}} {{{id=38| type(g(a)) /// <type 'float'> }}} {{{id=32| timeit('g(a)') /// 625 loops, best of 3: 427 ns per loop }}} {{{id=126| 198/.427 /// 463.700234192037 }}} {{{id=56| g.op_list() /// [('load_arg', 0), ('ipow', 3), 'neg', ('load_arg', 0), 'cos', ('ipow', 2), 'add', ('load_arg', 0), 'add', ('load_const', -5.0), 'add', 'return'] }}}

In our imaginary image processing example, we would thus get a speedup, and instead of the poor user waiting over 3 minutes, they wait less than half a second.

{{{id=156| t = 1000. * 1000 * 427 * units.si_prefixes.nano * units.time.second; t /// 4.27000000000000e8*nano*second }}} {{{id=57| t.convert(units.time.second) /// 0.427000000000000*second }}} {{{id=159| /// }}} {{{id=158| /// }}}

If your function is sufficiently simple, you can also code it up using Cython.  When this works, it is blazingly fast, though you have the overhead of compiling the code in the first place.  Also, for optimized function evaluation you have to use Cython much like you would use C, so to get good performance you need to know the basics of C programming:

  • C datatypes like float and int
  • the C library, e.g., math.h
{{{id=133| %cython
cdef extern from "math.h":
    double cos(double)
def f_cython(double x):
    return cos(x)*cos(x) - x*x*x + x - 5
/// }}} {{{id=132| a = float(e)
f_cython(a) /// -21.535996363355856 }}} {{{id=58| timeit('f_cython(a)') /// 625 loops, best of 3: 184 ns per loop }}}

Damn, that was fast!!!!  This is over 1000 times faster than our original symbolic function f!

{{{id=135| 198/.184 /// 1076.08695652174 }}} {{{id=134| /// }}}

FTW: In our imaginary image processing example, we would thus get a speedup of a factor of 1000, and instead of the poor user waiting over 3 minutes, they wait 0.184 seconds.  

{{{id=152| t = 1000. * 1000 * 184 * units.si_prefixes.nano * units.time.second; t /// 1.84000000000000e8*nano*second }}} {{{id=42| t.convert(units.time.second) /// 0.184000000000000*second }}} {{{id=155| /// }}}

By the way, fast_float works with symbolic functions of any number of variables:

{{{id=35| f(x,y) = x^3 + y^2 - cos(x-y)^3 /// }}} {{{id=39| a = float(e)
timeit('f(a,a)') /// 625 loops, best of 3: 250 µs per loop }}} {{{id=40| g = fast_float(f, x,y) /// }}} {{{id=41| timeit('g(a,a)') /// 625 loops, best of 3: 449 ns per loop }}}

To figure out how to use "pow", I typed 

       man 3 pow

at the bash shell command line.
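The relevant line of the man page gives the C prototype, which is exactly what the cdef extern block below declares:

       double pow(double x, double y);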

{{{id=2| %cython
cdef extern from "math.h":
    double cos(double)
    double pow(double, double)
def f_cython(double x, double y):
    return pow(x,3) + y*y - pow(cos(x-y),3)
/// }}} {{{id=43| timeit('f_cython(a,a)') /// 625 loops, best of 3: 232 ns per loop }}} {{{id=162| %python
import math
def f_python(x,y):
    return x**3 + y*y - math.cos(x-y)**3
/// }}} {{{id=161| timeit('f_python(a,a)') /// 625 loops, best of 3: 829 ns per loop }}} {{{id=170| /// }}}

Problem:

  1. Write symbolic code, pure Python code, fast_float code, and Cython code, as above, to evaluate the expression $$f(x,y,z) = x\cos(y) - xyz + x^3 + y^{10}$$ efficiently.  (A starting template is sketched just after this problem.)
  2. Compare timings of evaluation of this function using each method.
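If you want a template to start from, here is a sketch of the symbolic and fast_float pieces; the %python and %cython versions follow the same pattern as the cells above:

{{{id=200| # starter sketch: symbolic and fast_float versions of f(x,y,z)
f(x,y,z) = x*cos(y) - x*y*z + x^3 + y^10
g = fast_float(f, x, y, z)
a = float(e)
timeit('f(a,a,a)')   # symbolic substitution on every call: slow
timeit('g(a,a,a)')   # compiled straight-line float code: fast
/// }}}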
{{{id=169| /// }}} {{{id=168| /// }}} {{{id=163| /// }}} {{{id=98| /// }}}

Root Finding: A basic numerical algorithm

Suppose $f(x)$ is a continuous real-valued function on an interval $[a,b]$ and that $f(a)f(b) < 0$, i.e., there is a sign change. Then by calculus (the Intermediate Value Theorem) $f$ has a zero on this interval.

{{{id=3| f(x) = cos(x) - x
show(plot(f, 0.1, 1.5), xmin=0) /// }}} {{{id=172| /// }}}

The main goal is to understand various ways of numerically finding a point $c \in [a,b]$ such that $f(c)$ is very close to 0.

Bisection

Perhaps the simplest method is bisection. Given $\epsilon>0$, the following algorithm finds a value $c\in [a,b]$ such that $|f(c)| < \epsilon$.

  1. Let $c = (a+b)/2$. If $|f(c)| < \epsilon$, output $c$ and terminate.
  2. Otherwise, chop the interval $[a,b]$ into the two parts $[a,c] \cup [c,b]$.
  3. If $f$ has a sign change on $[a,c]$, replace $[a,b]$ by $[a,c]$; otherwise, replace $[a,b]$ by $[c,b]$. Then go to step 1.

At every step of the algorithm the current interval contains a zero of $f$, because the sign change is preserved. Since the width of the interval halves each time, the intervals shrink down around some root of $f$, and since $f$ is continuous, after a finite number of steps we find a $c$ such that $|f(c)| < \epsilon$.
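Concretely, after $k$ steps the interval has width $(b-a)/2^k$, so roughly $\log_2\bigl((b-a)/\epsilon\bigr)$ bisections pin the root down to within $\epsilon$. For example, shrinking $[0,1]$ to width below $10^{-8}$ takes:

{{{id=201| ceil(log(1e8)/log(2.0))   # about 27 bisections
/// }}}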

{{{id=11| def bisect_method(f, a, b, eps):
    try:
        f = fast_float(f, f.variables()[0])
    except AttributeError:
        print "WARNING: Not using fast_float."
    intervals = [(a,b)]
    two = float(2); eps = float(eps)
    while True:
        c = (a+b)/two
        fa = f(a); fb = f(b); fc = f(c)
        if abs(fc) < eps:
            return c, intervals
        if fa*fc < 0:
            a, b = a, c
        elif fc*fb < 0:
            a, b = c, b
        else:
            raise ValueError, "f must have a sign change in the interval (%s,%s)"%(a,b)
        intervals.append((a,b))

html("<h1>Double Precision Root Finding Using Bisection</h1>")
@interact
def _(f = cos(x) - x, a = float(0), b = float(1), eps=(-3,(-16..-1))):
    eps = 10^eps
    print "eps = %s"%float(eps)
    try:
        time c, intervals = bisect_method(f, a, b, eps)
    except ValueError:
        print "f must have opposite sign at the endpoints of the interval"
        show(plot(f, a, b, color='red'), xmin=a, xmax=b)
    else:
        print "root =", c
        print "f(c) = %r"%f(c)
        print "iterations =", len(intervals)
        P = plot(f, a, b, color='red')
        h = (P.ymax() - P.ymin())/(1.5*len(intervals))
        L = sum(line([(c,h*i), (d,h*i)]) for i, (c,d) in enumerate(intervals))
        L += sum(line([(c,h*i-h/4), (c,h*i+h/4)]) for i, (c,d) in enumerate(intervals))
        L += sum(line([(d,h*i-h/4), (d,h*i+h/4)]) for i, (c,d) in enumerate(intervals))
        show(P + L, xmin=a, xmax=b)
/// }}} {{{id=97| /// }}} {{{id=99| /// }}} {{{id=21| import scipy.optimize /// }}} {{{id=121| scipy.optimize. /// }}} {{{id=120| scipy.optimize.bisect? /// }}} {{{id=29| f(x) = cos(x) - x
scipy.optimize.bisect(f, 0, 1) /// 0.73908513321566716 }}} {{{id=100| a=float(0); b=float(1)
timeit('scipy.optimize.bisect(f, a, b)') /// 125 loops, best of 3: 7.05 ms per loop }}} {{{id=27| g = fast_float(f, x)
timeit('scipy.optimize.bisect(g, a, b)') /// 625 loops, best of 3: 16.8 µs per loop }}}

The bisect we implemented above is slower:

{{{id=61| bisect_method(f, a, b, 1e-8)[0] /// 0.73908513784408569 }}} {{{id=102| timeit('bisect_method(g, a, b, 1e-8)[0]') /// 625 loops, best of 3: 446 µs per loop }}} {{{id=30| /// }}}

Scipy's bisect is a lot faster than ours, of course. It's optimized code written in C. Let's look.

  1. Extract the scipy spkg:
    wstein> ls spkg/standard/scipy*
    spkg/standard/scipy-0.7.p4.spkg
    spkg/standard/scipy_sandbox-20071020.p4.spkg
    wstein> tar jxvf spkg/standard/scipy-0.7.p4.spkg # may have to get from http://sagemath.org/packages/standard/
  2. Track down the bisect code. This just takes some guessing and knowing how to find your way around a source tree. I found the bisect code in
      scipy-0.7.p4/src/scipy/optimize/Zeros/bisect.c
    
    Here it is:
    /* Written by Charles Harris charles.harris@sdl.usu.edu */
    
    #include "zeros.h"
    
    double
    bisect(callback_type f, double xa, double xb, double xtol, double rtol, int iter, default_parameters *params)
    {
        int i;
        double dm,xm,fm,fa,fb,tol;
        
        tol = xtol + rtol*(fabs(xa) + fabs(xb));
        
        fa = (*f)(xa,params);
        fb = (*f)(xb,params);
        params->funcalls = 2;
        if (fa*fb > 0) {ERROR(params,SIGNERR,0.0);}
        if (fa == 0) return xa;
        if (fb == 0) return xb;
        dm = xb - xa;
        params->iterations = 0;
    for(i=0; i<iter; i++) {
        params->iterations++;
            dm *= .5;
            xm = xa + dm;
            fm = (*f)(xm,params);
            params->funcalls++;
            if (fm*fa >= 0) {
                xa = xm;
            }
            if (fm == 0 || fabs(dm) < tol)
                return xm;
        }
        ERROR(params,CONVERR,xa);
    }
    

We can implement this directly in Sage using Cython.

{{{id=60| %cython
def bisect_cython(f, double a, double b, double eps):
    cdef double two = 2, fa, fb, fc, c, fabs
    cdef int iterations = 0
    fa = f(a); fb = f(b)
    while 1:
        iterations += 1
        c = (a+b)/two
        fc = f(c)
        fabs = -fc if fc < 0 else fc
        if fabs < eps:
            return c, iterations
        if fa*fc < 0:
            a, b = a, c
            fb = fc
        elif fc*fb < 0:
            a, b = c, b
            fa = fc
        else:
            raise ValueError, "f must have a sign change in the interval (%s,%s)"%(a,b)
/// }}}

Problem: Click on each of the .html links above to see the autogenerated code (double click on a line of Cython to see the corresponding C code).

{{{id=105| /// }}} {{{id=64| f = cos(x) - x
g = fast_float(f, x)
print bisect_cython(g, 0.0, 1.0, 1e-16)
print scipy.optimize.bisect(g, 0.0, 1.0, maxiter=40) /// (0.73908513321516067, 52)
0.739085133216 }}} {{{id=106| /// }}}

Note: Putting r after a number keeps the Sage preparser from turning it into a Sage arbitrary precision real number; it stays a standard Python float.

{{{id=59| print "scipy" timeit('scipy.optimize.bisect(g, 0.0r, 1.0r, maxiter=40)') print "sage/python" timeit('bisect_method(g, 0.0r, 1.0r, 1e-16r)') print "sage/cython" timeit('bisect_cython(g, 0.0r, 1.0r, 1e-16r)') /// scipy 625 loops, best of 3: 17.4 µs per loop sage/python 625 loops, best of 3: 883 µs per loop sage/cython 625 loops, best of 3: 15.1 µs per loop }}} {{{id=63| /// }}} {{{id=68| /// }}} {{{id=65| /// }}} {{{id=10| /// }}}

Newton's Method

{{{id=66| %hide
%html
Often $f$ is differentiable and its values can help us give an even more efficient root finding algorithm called Newton's Method.

The equation of the tangent line to the graph of $f(x)$ at the point $(c,f(c))$ is $$ y = f(c) + f'(c)(x-c). $$ This is because the above line goes through the point $(c,f(c))$ and has slope $f'(c)$.

Newton's method: Approximate $f(x)$ by the line above, find a root of that line, then replace $c$ by that root. Iterate. In particular, by easy algebra, the root of the linear function is $$c - \frac{f(c)}{f'(c)}.$$

As an algorithm:
  1. Choose a starting value $c$.
  2. Let $c = c - f(c)/f'(c)$.
  3. If $|f(c)| < \epsilon$, output $c$ and terminate. Otherwise, go to step 2.
/// }}} {{{id=73| x = var('x')
@interact
def _(f = cos(x) - x, c = float(1/2), a=float(0), b=float(1)):
    show(plot(f,a,b) + point((c,f(c)),rgbcolor='black',pointsize=20) + \
         plot(f(c) + f.derivative()(c)*(x-c), a,b,color='red'), a,b)
/// }}} {{{id=26| def newton_method(f, c, eps, maxiter=100):
    x = f.variables()[0]
    fprime = f.derivative(x)
    try:
        g = f._fast_float_(x)
        gprime = fprime._fast_float_(x)
    except AttributeError:
        g = f; gprime = fprime
    iterates = [c]
    for i in xrange(maxiter):
        fc = g(c)
        if abs(fc) < eps:
            return c, iterates
        c = c - fc/gprime(c)
        iterates.append(c)
    return c, iterates

html("<h1>Double Precision Root Finding Using Newton's Method</h1>")
var('x')
@interact
def _(f = x^2 - 2, c = float(0.5), eps=(-3,(-16..-1)), interval=float(0.5)):
    eps = 10^(eps)
    print "eps = %s"%float(eps)
    time z, iterates = newton_method(f, c, eps)
    print "root =", z
    print "f(c) = %r"%f(z)
    n = len(iterates)
    print "iterations =", n
    html(iterates)
    P = plot(f, z-interval, z+interval, rgbcolor='blue')
    h = P.ymax(); j = P.ymin()
    L = sum(point((w,(n-1-float(i))/n*h), rgbcolor=(float(i)/n,0.2,0.3), pointsize=10) + \
            line([(w,h),(w,j)],rgbcolor='black',thickness=0.2) for i,w in enumerate(iterates))
    show(P + L, xmin=z-interval, xmax=z+interval)
/// }}} {{{id=25| float(sqrt(2)) /// 1.4142135623730951 }}} {{{id=93| /// }}} {{{id=117| scipy.optimize.newton? /// }}} {{{id=78| x = var('x')
f = x^2 - 2
g = f._fast_float_(x)
gprime = f.derivative()._fast_float_(x)
scipy.optimize.newton(g, 1, gprime) /// 1.4142135623730951 }}}

Interestingly, newton in scipy is written in PURE PYTHON, so it might (!?) be easy to beat using Cython.

{{{id=79| scipy.optimize.newton?? /// }}}

Write our own Newton method in Cython:

{{{id=23| %cython
def newton_cython(f, double c, fprime, double eps, int maxiter=100):
    cdef double fc, absfc
    cdef int i
    for i from 0 <= i < maxiter:
        fc = f(c)
        absfc = -fc if fc < 0 else fc
        if absfc < eps:
            return c
        c = c - fc/fprime(c)
    return c
/// }}}

Test them all:

{{{id=80| x = var('x')
f = (x^2 - cos(x) + sin(x^2))^3
g = f._fast_float_(x)
gprime = f.derivative()._fast_float_(x)
timeit('scipy.optimize.newton(g, 0.5r, gprime)')
timeit('newton_cython(g, 0.5r, gprime, 1e-13)') /// 625 loops, best of 3: 127 µs per loop
625 loops, best of 3: 37.6 µs per loop }}} {{{id=82| newton_cython(g, 0.5r, gprime, 1e-13) /// 0.63810381041535857 }}} {{{id=6| plot(g, 0,1) /// }}} {{{id=81| /// }}}

Easy wrappers

Sage has code that uses Scipy, fast_float, etc., behind the scenes.  With your help it could have a lot more!

{{{id=86| var('x')
f = cos(x) - x /// }}} {{{id=85| f.find_root(0,1) # fast /// 0.7390851332151559 }}} {{{id=109| /// }}}

Problem: Find a root of something else on another interval.

{{{id=83| /// }}}

Problem: Type f.find_root?? to read the source code.  The function just calls into another module.  Type sage.numerical.optimize.find_root?? to read the source code of that, which does involve calling scipy.

{{{id=114| /// }}} {{{id=16| /// }}} {{{id=115| /// }}} {{{id=75| /// }}} {{{id=125| /// }}} {{{id=124| /// }}} {{{id=123| /// }}} {{{id=122| /// }}} {{{id=77| /// }}}