Math 480: 2010-05-14
Fast Function Evaluation and Some More Cython
William Stein
Sage has a powerful fast_float function that, given a symbolic expression, returns a very fast callable object. This is really useful to know about when implementing root finding, or whenever you call functions from scipy. At present, fast_float can be 100 times faster than evaluating the symbolic expression directly! Here is an example:
{{{id=34|
f(x) = cos(x)^2 - x^3 + x - 5
parent(f)
/// Callable function ring with arguments (x,) }}}

{{{id=36| a = float(e); a /// 2.7182818284590451 }}}

{{{id=37| f(a) /// -21.5359963633559 }}}

{{{id=89| type(f(a)) /// }}}
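How slow is a single call? We can time it (a sketch of the timing cell, with output omitted; the discussion below works with roughly 198 microseconds per call):

{{{
timeit('f(a)')
///
}}}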
198 microseconds per call may sound fast, but if we're calling this function millions of times, e.g., applying it to the entries of a 1000x1000 matrix for some image-processing application, or drawing a fractal, etc., then 198 microseconds per call is dog slow:

{{{id=150| t = 1000. * 1000 * 198 * units.si_prefixes.micro * units.time.second; t /// 1.98000000000000e8*micro*second }}}

{{{id=151| t.convert(units.time.minute) /// 3.30000000000000*minute }}}

Pure Python is also very good in this case, because Python's float arithmetic is highly optimized.
Method 1: Use "raw" literals and explicitly call the cos function from the math library.
The "r" suffix below is special to Sage. It means "please, do not preparse this number. Leave it as a Python float or int."
{{{id=140| def f_python(x): return math.cos(x)^2.0r - x^3.0r + x - 5.0r /// }}}

{{{id=144| f_python(a) /// -21.535996363355856 }}}

{{{id=145| type(f_python(a)) /// }}}

Method 2: Use a %python block (equivalently, put your code in a .py file and import it). Note that here the Sage preparser isn't invoked, so be careful to use ** instead of ^, etc.
{{{id=165|
%python
import math
def f_python(x):
    return math.cos(x)**2.0 - x**3.0 + x - 5.0
/// }}}

{{{id=167| timeit('f_python(a)') /// 625 loops, best of 3: 920 ns per loop }}}

Now we try fast_float and see how it does.
{{{id=95| g = fast_float(f, x) /// }}}

{{{id=54| g /// }}}
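Now time the fast_float version (again a sketch of the timing cell, output omitted; the arithmetic below works with roughly 427 ns per call):

{{{
timeit('g(a)')
///
}}}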
In our imaginary image-processing example we would thus get a big speedup: instead of the poor user waiting over 3 minutes, they wait less than half a second.

{{{id=156| t = 1000. * 1000 * 427 * units.si_prefixes.nano * units.time.second; t /// 4.27000000000000e8*nano*second }}}

{{{id=57| t.convert(units.time.second) /// 0.427000000000000*second }}}

If your function is sufficiently simple, you can also code it up using Cython. When this works, it is blazingly fast, though you have the overhead of compiling the code in the first place. Also, for optimized function evaluation you have to use Cython like the C language, so to get good performance you need to know the basics of C programming:
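Here is a sketch of such a Cython version of f; the name f_cython1 is ours for illustration, and the timing output is omitted (the arithmetic below works with about 184 ns per call):

{{{
%cython
cdef extern from "math.h":
    double cos(double)

def f_cython1(double x):
    # all arithmetic happens on C doubles, so there is no Python-object overhead;
    # note ^ is XOR in Cython, hence the explicit multiplications
    cdef double c = cos(x)
    return c*c - x*x*x + x - 5.0
///
}}}

{{{
timeit('f_cython1(a)')
///
}}}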
Damn, that was fast!!!! This is over 1000 times faster than our original symbolic function f!
{{{id=135| 198/.184 /// 1076.08695652174 }}}

FTW: In our imaginary image-processing example we would thus get a speedup by a factor of over 1000, and instead of the poor user waiting over 3 minutes, they wait 0.184 seconds.
{{{id=152| t = 1000. * 1000 * 184 * units.si_prefixes.nano * units.time.second; t /// 1.84000000000000e8*nano*second }}}

{{{id=42| t.convert(units.time.second) /// 0.184000000000000*second }}}

By the way, fast_float works with symbolic functions of any number of variables:
{{{id=35| f(x,y) = x^3 + y^2 - cos(x-y)^3 /// }}}

{{{id=39|
a = float(e)
timeit('f(a,a)')
/// 625 loops, best of 3: 250 µs per loop }}}

{{{id=40| g = fast_float(f, x,y) /// }}}

{{{id=41| timeit('g(a,a)') /// 625 loops, best of 3: 449 ns per loop }}}

To figure out how to use "pow", I typed
man 3 pow
at the bash shell command line.
{{{id=2|
%cython
cdef extern from "math.h":
    double cos(double)
    double pow(double, double)

def f_cython(double x, double y):
    return pow(x,3) + y*y - pow(cos(x-y),3)
/// }}}

{{{id=43| timeit('f_cython(a,a)') /// 625 loops, best of 3: 232 ns per loop }}}

{{{id=162|
%python
import math
def f_python(x,y):
    return x**3 + y*y - math.cos(x-y)**3
/// }}}

{{{id=161| timeit('f_python(a,a)') /// 625 loops, best of 3: 829 ns per loop }}}
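As an aside, pow(x,3) goes through the general-purpose C pow routine; for small integer exponents, explicit multiplication is often a bit faster. A sketch of that variant (the name f_cython2 and the speed claim are our assumptions, worth timing rather than taking on faith):

{{{
%cython
cdef extern from "math.h":
    double cos(double)

def f_cython2(double x, double y):
    # replace pow(x,3) with x*x*x and cube cos(x-y) by hand
    cdef double c = cos(x-y)
    return x*x*x + y*y - c*c*c
///
}}}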
Problem:

Suppose $f(x)$ is a continuous real-valued function on an interval $[a,b]$ and that $f(a)f(b) < 0$, i.e., there is a sign change. Then by the Intermediate Value Theorem from calculus, $f$ has a zero in this interval.
{{{id=3|
f(x) = cos(x) - x
show(plot(f, 0.1,1.5), xmin=0)
/// }}}

The main goal is to understand various ways of numerically finding a point $c \in [a,b]$ such that $f(c)$ is very close to 0.
Perhaps the simplest method is bisection. Given $\epsilon>0$, the following algorithm finds a value $c\in [a,b]$ such that $|f(c)| < \epsilon$.
At every step of the algorithm the current interval contains a point $c$ with $f(c) = 0$. Since the width of the interval halves at each step and every interval contains a zero of $f$, the sequence of intervals converges to a root of $f$; concretely, after $n$ steps the bracketing interval has width $(b-a)/2^n$, so about $\log_2((b-a)/\epsilon)$ steps suffice to localize a root to within $\epsilon$. Since $f$ is continuous, after finitely many steps we find a $c$ such that $|f(c)| < \epsilon$.
{{{id=11|
def bisect_method(f, a, b, eps):
    # use fast_float if f is symbolic; otherwise fall back to calling f directly
    try:
        f = fast_float(f, f.variables()[0])
    except AttributeError:
        print "WARNING: Not using fast_float."
    intervals = [(a,b)]
    two = float(2); eps = float(eps)
    while True:
        c = (a+b)/two
        fa = f(a); fb = f(b); fc = f(c)
        if abs(fc) < eps:
            return c, intervals
        if fa*fc < 0:
            a, b = a, c
        elif fc*fb < 0:
            a, b = c, b
        else:
            raise ValueError, "f must have a sign change in the interval (%s,%s)"%(a,b)
        intervals.append((a,b))
/// }}}

The bisect we implemented above is slower:
{{{id=61| bisect_method(f, a, b, 1e-8)[0] /// 0.73908513784408569 }}}

{{{id=102| timeit('bisect_method(g, a, b, 1e-8)[0]') /// 625 loops, best of 3: 446 µs per loop }}}

Scipy's bisect is a lot faster than ours, of course. It's optimized code written in C. Let's look.
wstein> ls spkg/standard/scipy*
spkg/standard/scipy-0.7.p4.spkg
spkg/standard/scipy_sandbox-20071020.p4.spkg
wstein> tar jxvf spkg/standard/scipy-0.7.p4.spkg # may have to get from http://sagemath.org/packages/standard/
scipy-0.7.p4/src/scipy/optimize/Zeros/bisect.c

Here it is:
/* Written by Charles Harris charles.harris@sdl.usu.edu */

#include "zeros.h"

double
bisect(callback_type f, double xa, double xb, double xtol, double rtol,
       int iter, default_parameters *params)
{
    int i;
    double dm, xm, fm, fa, fb, tol;

    tol = xtol + rtol*(fabs(xa) + fabs(xb));
    fa = (*f)(xa, params);
    fb = (*f)(xb, params);
    params->funcalls = 2;
    if (fa*fb > 0) {ERROR(params, SIGNERR, 0.0);}
    if (fa == 0) return xa;
    if (fb == 0) return xb;
    dm = xb - xa;
    params->iterations = 0;
    for (i = 0; i < iter; i++) {
        params->iterations++;
        dm *= .5;
        xm = xa + dm;
        fm = (*f)(xm, params);
        params->funcalls++;
        if (fm*fa >= 0) {
            xa = xm;
        }
        if (fm == 0 || fabs(dm) < tol) return xm;
    }
    ERROR(params, CONVERR, xa);
}
We can implement this directly in Sage using Cython.
{{{id=60|
%cython
cdef extern from "math.h":
    double abs(double)

def bisect_cython(f, double a, double b, double eps):
    cdef double two = 2, fa, fb, fc, c, dm, fabs
    cdef int iterations = 0
    fa = f(a); fb = f(b)
    while 1:
        iterations += 1
        c = (a+b)/two
        fc = f(c)
        fabs = -fc if fc < 0 else fc
        if fabs < eps:
            return c, iterations
        if fa*fc < 0:
            a, b = a, c
            fb = fc
        elif fc*fb < 0:
            a, b = c, b
            fa = fc
        else:
            raise ValueError, "f must have a sign change in the interval (%s,%s)"%(a,b)
/// }}}

Problem: Click on each of the .html links above to see the autogenerated code (double click on a line of Cython to see the corresponding C code).
{{{id=64|
f = cos(x) - x
g = fast_float(f, x)
print bisect_cython(g, 0.0, 1.0, 1e-16)
print scipy.optimize.bisect(g, 0.0, 1.0, maxiter=40)
///
(0.73908513321516067, 52)
0.739085133216
}}}

Note: Putting r after a number keeps the Sage preparser from turning it into a Sage arbitrary-precision real number, so it stays a standard Python float (or int).
{{{id=59|
print "scipy"
timeit('scipy.optimize.bisect(g, 0.0r, 1.0r, maxiter=40)')
print "sage/python"
timeit('bisect_method(g, 0.0r, 1.0r, 1e-16r)')
print "sage/cython"
timeit('bisect_cython(g, 0.0r, 1.0r, 1e-16r)')
///
scipy
625 loops, best of 3: 17.4 µs per loop
sage/python
625 loops, best of 3: 883 µs per loop
sage/cython
625 loops, best of 3: 15.1 µs per loop
}}}

Interestingly, newton in scipy is written in PURE PYTHON, so it might (!?) be easy to beat using Cython.
{{{id=79| scipy.optimize.newton?? /// }}}
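Recall the Newton iteration: starting from a guess $c$, repeatedly replace $c$ by $c - f(c)/f'(c)$. Here is a minimal pure-Python sketch of that iteration (our own illustration of the idea, not scipy's exact code):

{{{
def newton_python(f, c, fprime, eps=1e-13, maxiter=100):
    # basic Newton iteration: c <- c - f(c)/f'(c)
    for i in range(maxiter):
        fc = f(c)
        if abs(fc) < eps:
            return c
        c = c - fc/fprime(c)
    return c
///
}}}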
Write our own Newton method in Cython:

{{{id=23|
%cython
def newton_cython(f, double c, fprime, double eps, int maxiter=100):
    cdef double fc
    cdef int i
    for i from 0 <= i < maxiter:
        fc = f(c)
        absfc = -fc if fc < 0 else fc
        if absfc < eps:
            return c
        c = c - fc/fprime(c)
    return c
/// }}}

Test them all:
{{{id=80|
x = var('x')
f = (x^2 - cos(x) + sin(x^2))^3
g = f._fast_float_(x)
gprime = f.derivative()._fast_float_(x)
timeit('scipy.optimize.newton(g, 0.5r, gprime)')
timeit('newton_cython(g, 0.5r, gprime, 1e-13)')
///
625 loops, best of 3: 127 µs per loop
625 loops, best of 3: 37.6 µs per loop
}}}

{{{id=82| newton_cython(g, 0.5r, gprime, 1e-13) /// 0.63810381041535857 }}}

{{{id=6| plot(g, 0,1) /// }}}

Sage has code that uses scipy, fast_float, etc., behind the scenes. With your help it could have a lot more!
{{{id=86|
var('x')
f = cos(x)-x
/// }}}

{{{id=85| f.find_root(0,1) # fast /// 0.7390851332151559 }}}

Problem: Find a root of something else on another interval.
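For example, one possible answer (the choice of function here is ours; output omitted):

{{{
h = e^x + x - 3   # h(0) = -2 < 0 and h(2) = e^2 - 1 > 0, so there is a sign change
h.find_root(0, 2)
///
}}}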
Problem: Type f.find_root?? to read the source code. That function just calls into another module; type sage.numerical.optimize.find_root?? to read the source of that, which does call into scipy.
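That scipy call boils down to Brent's method; here is a sketch of the equivalent direct call (assuming scipy.optimize.brentq, which is what sage.numerical.optimize.find_root uses in this era of Sage):

{{{
import scipy.optimize
g = fast_float(cos(x) - x, x)   # cos(0) - 0 > 0 and cos(1) - 1 < 0: a sign change
scipy.optimize.brentq(g, 0.0r, 1.0r)
///
}}}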