See http://rpy.sourceforge.net/rpy2/doc-2.0/html/introduction.html.
{{{id=1| %auto import rpy2.robjects as robjects R = robjects.r /// }}}We get pi from the R namespace.
{{{id=5| v = R['pi']; v ///Note that we have to explicitly use print to see a nice representation:
{{{id=6| print v /// [1] 3.141593 }}}There is a pexpect interface to r called "r" by default when you start Sage. This tutorial is not about that interface, but instead about the C library interface called rpy2, which is much faster and more robust.
{{{id=135| r /// R Interpreter }}} {{{id=19| r('2 + 3') # the pexpect interface /// [1] 5 }}} {{{id=130| import rpy2.robjects as robjects R = robjects.r print R('2 + 3') # the rpy2 cython interface (note the import!) /// [1] 5 }}} {{{id=136| R(""" a = 5 b = 7 c = a + b""") /// }}} {{{id=137| print R("c") /// [1] 12 }}} {{{id=131| timeit("r('2+3')") /// 625 loops, best of 3: 1.44 ms per loop }}} {{{id=18| timeit("R('2+3')") /// 625 loops, best of 3: 650 µs per loop }}} {{{id=133| timeit("pari('2+3')") /// 625 loops, best of 3: 5.72 µs per loop }}}(frankly, I'm shocked at how slow the rpy2 interface actually is...!)
{{{id=132| /// }}}This is how to get started with rpy2:
Beware the preparser:
{{{id=138| v = R['pi']; v ///And note again that v is a vector not a number.
{{{id=7| print v + int(3) /// [1] 3.141593 3.000000 }}} {{{id=8| print v[0] + int(3) /// 6.14159265359 }}}WARNING: Python indexing starts at 0 and R indexing starts at 1.
{{{id=82| print R('c(5,2,-3)[1]') /// [1] 5 }}} {{{id=80| /// }}} {{{id=145| timeit('R("f <- function(r) { 2 * pi * r }")') /// 625 loops, best of 3: 460 µs per loop }}}Define a function in R:
{{{id=17| R("f <- function(r) { 2 * pi * r }") ///Now call the function:
{{{id=16| print R("f(3)") /// [1] 18.84956 }}}The function is now defined in the global R namespace:
{{{id=147| r_f = R['f'] /// }}} {{{id=146| print r_f(int(3)) /// [1] 18.84956 }}} {{{id=148| timeit('r_f(int(3))') /// 625 loops, best of 3: 41.4 µs per loop }}} {{{id=15| print R("f") /// function(r) { 2 * pi * r } }}} {{{id=24| /// }}}Most R objects have a string representation that can be directly parsed by R, which can be handy.
{{{id=9| letters = R['letters'] /// }}} {{{id=22| print letters.r_repr() /// c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z") }}}Here is an example of how we might use this:
{{{id=23| rcode = 'paste(%s, collapse="-")' %(letters.r_repr()) print R(rcode) /// [1] "a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z" }}} {{{id=149| timeit('robjects.IntVector(range(10))') /// 625 loops, best of 3: 9.65 µs per loop }}} {{{id=150| time w = robjects.IntVector(range(10^6)) /// Time: CPU 0.74 s, Wall: 0.74 s }}} {{{id=154| time print R['mean'](w) /// [1] 499999.5 Time: CPU 0.16 s, Wall: 0.17 s }}} {{{id=155| time print R['sd'](w) /// [1] 288675.3 Time: CPU 0.18 s, Wall: 0.18 s }}} {{{id=152| time w = r(range(10^3)) /// Time: CPU 1.12 s, Wall: 2.56 s }}} {{{id=153| /// }}} {{{id=151| time z = pari(range(10^6)) /// Traceback (most recent call last): File "Vectors are an important basic data structure in R:
{{{id=30| print robjects.StrVector(['abc', 'def']) /// [1] "abc" "def" }}} {{{id=31| print robjects.IntVector([1, 2, 3]) /// [1] 1 2 3 }}} {{{id=33| print robjects.FloatVector([1.1, 2.2, 3.3]) /// [1] 1.1 2.2 3.3 }}}You can also create R matrices, which are R vectors with a dim attribute:
{{{id=36| v = robjects.FloatVector([1.1, 2.2, 3.3, 4.4, 5.5, 6.6]) m = R['matrix'](v, nrow = int(2)) print m /// [,1] [,2] [,3] [1,] 1.1 3.3 5.5 [2,] 2.2 4.4 6.6 }}}The above illustrates how to call an R function. You get it from the R namespace, then call it in the standard way. Here's another example:
{{{id=157| v = robjects.IntVector([1..10]) /// }}} {{{id=158| print R['sum'] /// function (..., na.rm = FALSE) .Primitive("sum") }}} {{{id=159| ans = R['sum'](v) ans ///You can also pass in keywords:
{{{id=46| rsort = R['sort'] print rsort(v, decreasing=True) /// [1] 10 9 8 7 6 5 4 3 2 1 }}} {{{id=99| /// }}}GOTCHA: In R variable names with dots in them are allowed, but in Python they are not. The example below illustrates how to deal with this (use **kwds). In this example, we make an R vector with a "NA" in it, which means we don't know that entry; the na.rm option to R's sum command controls how it behaves on lists with NA's in them.
{{{id=100| v = R('c(1,NA,2,3)') print v /// [1] 1 NA 2 3 }}} {{{id=162| print R['sum'] /// function (..., na.rm = FALSE) .Primitive("sum") }}} {{{id=98| rsum = R['sum'] print rsum(v) /// [1] NA }}}Directly in R, we would just type na.rm=TRUE. In Python this does not make sense.
{{{id=102| print R('sum( c(1,NA,2,3), na.rm=TRUE )') /// [1] 6 }}} {{{id=106| print rsum(v, na.rm=True) # boom! /// Traceback (most recent call last): File "So we use **kwds, which works fine:
{{{id=101| a = {'na.rm':True} print R['sum'](v, **a) /// [1] 6 }}} {{{id=165| def f(a, b, c): return a + 2*b + 3*c args = (5,) kwds = {'b':7, 'c':13} f(*args, **kwds) /// 58 }}} {{{id=169| def g(*scott, **alex): print scott, alex return f(*scott, **alex) /// }}} {{{id=170| g(1,2,c=3) /// (1, 2) {'c': 3} 14 }}} {{{id=164| f( *(3, 8), **{'c':2}) /// 25 }}} {{{id=97| f( 2, *(5,), **{'c':1}) /// 15 }}}IMPORTANT: This must all happen in the same notebook cell. Otherwise the output file gets written in temp directory for a cell that was already evaluated, and the plot may not appear.
{{{id=54| x = robjects.IntVector(range(50)) y = R.rnorm(len(x)) # normal random numbers # 300r = "raw Python int" (no preparser) R.png('sage.png', width=600r, height=300r) R.plot(x, y, xlab="x", ylab="rnorm", col="red") _ = R['dev.off']() # "_ =" to suppress printing /// }}}Interact works, of course.
{{{id=61| @interact def _(points=(10..1000)): x = robjects.IntVector(range(points)); y = R.rnorm(int(points)) R.png('sage.png', width=600r, height=300r) R.plot(x, y, xlab="x", ylab="rnorm", col="blue") R['dev.off']() /// }}} {{{id=60| /// }}}Warning again -- Do NOT do this: call dev.off in a separate cell!
{{{id=57| # intentionally broken! x = robjects.IntVector(range(50)) y = R.rnorm(len(x)) # normal random numbers R.png('sage.png', width=600r, height=300r) R.plot(x, y, xlab="runif", ylab="foo/bar", col="red") ///
This is how we would do this directly in R, which we can use from Sage by using the "%r" mode in the notebook (or r.eval("""...""")):
{{{id=71| %r ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) group <- gl(2, 10, 20, labels = c("Ctl","Trt")) weight <- c(ctl, trt) anova(lm.D9 <- lm(weight ~ group)) summary(lm.D90 <- lm(weight ~ group - 1))# omitting intercept /// Analysis of Variance Table Response: weight Df Sum Sq Mean Sq F value Pr(>F) group 1 0.6882 0.68820 1.4191 0.249 Residuals 18 8.7293 0.48496 Call: lm(formula = weight ~ group - 1) Residuals: Min 1Q Median 3Q Max -1.0710 -0.4938 0.0685 0.2462 1.3690 Coefficients: Estimate Std. Error t value Pr(>|t|) groupCtl 5.0320 0.2202 22.85 9.55e-15 *** groupTrt 4.6610 0.2202 21.16 3.62e-14 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.6964 on 18 degrees of freedom Multiple R-squared: 0.9818, Adjusted R-squared: 0.9798 F-statistic: 485.1 on 2 and 18 DF, p-value: < 2.2e-16 }}}Next, we do the same computation, but via rpy2 (which is unfortunately more complicated):
{{{id=74| ctl = robjects.FloatVector([4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14]) trt = robjects.FloatVector([4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69]) group = R.gl(2r, 10r, 20r, labels = ["Ctl","Trt"]) weight = ctl + trt robjects.globalEnv["weight"] = weight robjects.globalEnv["group"] = group lm_D9 = R.lm("weight ~ group") print(R.anova(lm_D9)) /// Analysis of Variance Table Response: weight Df Sum Sq Mean Sq F value Pr(>F) group 1 0.6882 0.68820 1.4191 0.249 Residuals 18 8.7293 0.48496 }}} {{{id=66| lm_D90 = R.lm("weight ~ group - 1") v = R.summary(lm_D90) print(v) /// Call: function (formula, data, subset, weights, na.action, method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, contrasts = NULL, offset, ...) { ret.x <- x ret.y <- y cl <- match.call() mf <- match.call(expand.dots = FALSE) m <- match(c("formula", "data", "subset", "weights", "na.action", "offset"), names(mf), 0L) mf <- mf[c(1L, m)] mf$drop.unused.levels <- TRUE mf[[1L]] <- as.name("model.frame") mf <- eval(mf, parent.frame()) if (method == "model.frame") return(mf) else if (method != "qr") warning(gettextf("method = '%s' is not supported. Using 'qr'", method), domain = NA) mt <- attr(mf, "terms") y <- model.response(mf, "numeric") w <- as.vector(model.weights(mf)) if (!is.null(w) && !is.numeric(w)) stop("'weights' must be a numeric vector") offset <- as.vector(model.offset(mf)) if (!is.null(offset)) { if (length(offset) != NROW(y)) stop(gettextf("number of offsets is %d, should equal %d (number of observations)", length(offset), NROW(y)), domain = NA) } if (is.empty.model(mt)) { x <- NULL z <- list(coefficients = if (is.matrix(y)) matrix(, 0, 3) else numeric(0L), residuals = y, fitted.values = 0 * y, weights = w, rank = 0L, df.residual = if (is.matrix(y)) nrow(y) else length(y)) if (!is.null(offset)) { z$fitted.values <- offset z$residuals <- y - offset } } else { x <- model.matrix(mt, mf, contrasts) z <- if (is.null(w)) lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) else lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok, ...) } class(z) <- c(if (is.matrix(y)) "mlm", "lm") z$na.action <- attr(mf, "na.action") z$offset <- offset z$contrasts <- attr(x, "contrasts") z$xlevels <- .getXlevels(mt, mf) z$call <- cl z$terms <- mt if (model) z$model <- mf if (ret.x) z$x <- x if (ret.y) z$y <- y if (!qr) z$qr <- NULL z }(formula = "weight ~ group - 1") Residuals: Min 1Q Median 3Q Max -1.0710 -0.4938 0.0685 0.2462 1.3690 Coefficients: Estimate Std. Error t value Pr(>|t|) groupCtl 5.0320 0.2202 22.85 9.55e-15 *** groupTrt 4.6610 0.2202 21.16 3.62e-14 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.6964 on 18 degrees of freedom Multiple R-squared: 0.9818, Adjusted R-squared: 0.9798 F-statistic: 485.1 on 2 and 18 DF, p-value: < 2.2e-16 }}} {{{id=76| print(lm_D9.names) /// [1] "coefficients" "residuals" "effects" "rank" [5] "fitted.values" "assign" "qr" "df.residual" [9] "contrasts" "xlevels" "call" "terms" [13] "model" }}} {{{id=65| print(lm_D9.r['coefficients']) /// $coefficients (Intercept) groupTrt 5.032 -0.371 }}}You could also use rpy2 as follows to do this computation:
{{{id=78| R(""" ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) group <- gl(2, 10, 20, labels = c("Ctl","Trt")) weight <- c(ctl, trt) print(anova(lm.D9 <- lm(weight ~ group))) print(summary(lm.D90 <- lm(weight ~ group - 1))) """) /// Analysis of Variance Table Response: weight Df Sum Sq Mean Sq F value Pr(>F) group 1 0.6882 0.68820 1.4191 0.249 Residuals 18 8.7293 0.48496 Call: lm(formula = weight ~ group - 1) Residuals: Min 1Q Median 3Q Max -1.0710 -0.4938 0.0685 0.2462 1.3690 Coefficients: Estimate Std. Error t value Pr(>|t|) groupCtl 5.0320 0.2202 22.85 9.55e-15 *** groupTrt 4.6610 0.2202 21.16 3.62e-14 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.6964 on 18 degrees of freedom Multiple R-squared: 0.9818, Adjusted R-squared: 0.9798 F-statistic: 485.1 on 2 and 18 DF, p-value: < 2.2e-16 }}} {{{id=77| /// }}}In R a "data frame" is an array of values with labeled rows and columns (like part of a spreadsheet). Typically one thinks of a data frame as a table where the rows are observations and the columns are variables.
You can create a data frame using the data.frame R function:
{{{id=58| d = {'value': robjects.IntVector((24,25,26)), 'letter': robjects.StrVector(('x', 'y', 'z'))} dataf = R['data.frame'](**d) print(dataf) /// letter value 1 x 24 2 y 25 3 z 26 }}} {{{id=85| type(dataf) ///Get each column:
{{{id=86| print dataf.r['letter'] /// letter 1 x 2 y 3 z }}} {{{id=87| print dataf.r['value'] /// value 1 24 2 25 3 26 }}}Labels for the rows:
{{{id=88| print dataf.rownames() /// [1] "1" "2" "3" }}}Labels for the columns:
{{{id=89| print dataf.colnames() /// [1] "letter" "value" }}} {{{id=96| /// }}}If you are using rpy2 and Sage together to deal with large real-world data sets, then it is critical that you can quickly move data back and forth. If you're working with big data in Sage, you're probably using numpy arrays. Fortunately, there is a way to very quickly convert a big numpy array to an R vector and conversely, as we illustrate below.
NOTE: The rpy2 documentation suggests doing "import rpy2.robjects.numpy2ri" but this is broken (at least with the versions of R, rpy2, and numpy in Sage), and gives totally wrong results. So just explicitly use FloatVector, etc., as illustrated below.
{{{id=109| import numpy a = numpy.array([[1,2],[3,4]], dtype=float) v = numpy.arange(5) /// }}} {{{id=126| print R(v) /// Traceback (most recent call last): File "... CRAP, this seems to be just totally broken in rpy2. Maybe it is fixed in a newer version. Sorry folks.
{{{id=127| /// }}} {{{id=129| /// }}}