
Jonathan Callahan received his Ph.D. in Physical Chemistry from the University of Washington in 1993. After two years as a post-doc in a magnetic resonance imaging laboratory, Jonathan joined NOAA's Pacific Marine Environmental Laboratory to work on analysis and visualization software for oceanographic climate and model data. Since 2007 Jonathan has worked as an independent consultant for NOAA, NASA and the US EPA. His areas of expertise include: data management; data visualization; statistical analysis using R; interface design; data mining; web services architecture. Jonathan writes occasional articles on data management at Working With Data. Jonathan is a DZone MVB and is not an employee of DZone and has posted 13 posts at DZone.

Using R — .Call(“hello”)

02.10.2013

In an introductory post on R APIs to C code, Calling C Code ‘Hello World!’, we explored the .C() function with some ‘Hello World!’ baby steps.  In this post we will make a leap forward by implementing the same functionality using the .Call() function.

Is .Call() better than .C()?

A heated but friendly conversation took place on the r-devel email forum this past March about R’s copying of arguments and the merits of .C() and .Call().  It is perhaps best to just include a highlight from this exchange.  Here is Simon Urbanek responding to Hervé Pagès:

> My understanding is that most packages use the .C interface
> because it's simpler to deal with and because they don't need
> to pass complicated objects at the C level, just atomic vectors.
> My guess is that it's probably rarely the case that the cost
> of copying the arguments passed to .C is significant, but,
> if that was the case, then they could always call .C() with
> DUP=FALSE. However, using DUP=FALSE is dangerous (see Warning
> section in the man page).
>
> No need to switch to .Call
>
I strongly disagree. I'm appalled to see that sentence here. The overhead is
significant for any large vector and it is in particular unnecessary since in
.C you have to allocate *and copy* space even for results (twice!). Also it is
very error-prone, because you have no information about the length of vectors
so it's easy to run out of bounds and there is no way to check. IMHO .C should
not be used for any code written in this century (the only exception may be if
you are passing no data, e.g. if all you do is to pass a flag and expect no
result, you can get away with it even if it is more dangerous). It is a legacy
interface that dates way back and is essentially just re-named .Fortran
interface. Again, I would strongly recommend the use of .Call in any recent
code because it is safer and more efficient (if you don't care about either
attribute, well, feel free ;)).

The important differences between the two R interfaces to C code are summarized here:

.C()

  • allows you to write simple C code that knows nothing about R
  • only simple data types can be passed
  • all argument type conversion and checking must be done in R
  • all memory allocation must be done in R
  • all arguments are copied locally before being passed to the C function (memory bloat)

.Call()

  • allows you to write simple R code
  • allows for complex data types
  • allows for a C function return value
  • allows C function to allocate memory
  • does not require wasteful argument copying
  • requires much more knowledge of R internals
  • is the recommended, modern approach for serious C programmers
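To make the first .C() bullet concrete, here is a minimal sketch of what the C side of a .C() call typically looks like: the function returns void, every argument arrives as a pointer, and the result is written into space the caller has already allocated. The name count_chars_dotC and its parameter names are made up for this illustration and do not come from the previous post.

```c
#include <string.h>

/* Hypothetical .C()-style routine: plain C that knows nothing about R.
 *   strings : array of *n NUL-terminated strings
 *   n       : pointer to the number of strings
 *   counts  : caller-allocated array that receives one result per string
 * Nothing here tells the function how long `counts` really is,
 * so it has to trust *n. */
void count_chars_dotC(char **strings, int *n, int *counts)
{
    for (int i = 0; i < *n; i++) {
        counts[i] = (int) strlen(strings[i]);
    }
}
```

From R, a call would look something like .C("count_chars_dotC", as.character(x), as.integer(length(x)), counts = integer(length(x)))$counts, which shows why all type conversion and memory allocation has to happen on the R side.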

To allow readers to compare for themselves how difficult or easy it is to switch from .C() to .Call() we will re-implement our three “Hello World!” examples using the .Call() interface.

Getting used to SEXP

The first thing you have to embrace when using the .Call() interface is the new way of dealing with R objects inside your C code.  Excellent introductory information and example code is available here:

In preparation for working with .Call() you will want to familiarize yourself with the location of R’s include files.  The following Unix shell commands show how to find where R is installed and then look at the contents of the include directory:

$ R RHOME
/usr/lib/R
$ ls -1 `R RHOME`/include
Rconfig.h
Rdefines.h
Rembedded.h
R_ext
R.h
Rinterface.h
Rinternals.h
Rmath.h
Rversion.h
S.h

Here’s what they contain:

Rconfig.h      various configuration flags
Rdefines.h     lots of macros of interest, includes Rinternals.h
Rembedded.h    function declarations for embedding R in C programs
R_ext          directory of include files for specific data types, etc.
R.h            includes many of the files found in R_ext
Rinterface.h   provides hooks for external GUIs
Rinternals.h   core R data structures
Rmath.h        math constants and function declarations
Rversion.h     version string components
S.h            macros for S/R compatibility

With the .Call() interface, the C function must return a SEXP, which is a pointer to a SEXPREC or Simple EXPression RECord.  We’ll get the definition of SEXP and everything else we need by including both R.h and Rdefines.h in our code.  So here is the C code for our first, brain-dead C function, helloA1.c:

#include <R.h>
#include <Rdefines.h>
#include <stdio.h>

/* print a greeting and return R's NULL */
SEXP helloA1() {
  printf("Hello World!\n");
  return(R_NilValue);
}

Note that, even though we are returning R_NilValue (aka NULL), the function is declared to be of type SEXP.  The function will always be of type SEXP, as will any arguments.  It will be up to the C code to convert other data types into and out of SEXP.  As in the previous post, you should compile this code with R CMD SHLIB helloA1.c.  Here is the very simple R function we need to add to wrappers.R:

# wrapper function to invoke helloA1
dyn.load("helloA1.so")
helloA1 <- function() {
  result <- .Call("helloA1")
}

Finally, what does it look like when invoked from R?

> source('wrappers.R')
> greeting <- helloA1()
Hello World!
> class(greeting)
[1] "NULL"

Whew!  That was a lot of complexity just to run “Hello World!”.  However, the value of this complexity will become apparent as we move forward.

PROTECT against garbage collection

One of the things R does well is pick up the garbage we leave lying around.  (If you’ve ever lived through a garbage haulers’ strike you know this is a good thing.)  Unused objects are disposed of after they are no longer needed (i.e. after there are no more active references to them) to free up memory.  As we write C code that uses R functions and structures we need to make sure that R knows when it should not toss something out and, after we are done, when it is again OK.  This is done with the PROTECT and UNPROTECT functions.
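One way to picture PROTECT and UNPROTECT is as pushes and pops on a stack of pointers that the garbage collector treats as still in use. The sketch below is a toy model in plain C, not R's actual implementation; toy_protect, toy_unprotect and toy_is_protected are hypothetical names invented for this illustration.

```c
#include <stddef.h>

#define TOY_STACK_SIZE 16

static void *toy_stack[TOY_STACK_SIZE];  /* pointers the collector must keep */
static int toy_top = 0;                  /* number of protected pointers */

/* Push an object onto the protection stack and hand it back, so the call
 * can wrap an allocation. This toy model simply ignores overflow. */
void *toy_protect(void *obj)
{
    if (toy_top < TOY_STACK_SIZE) {
        toy_stack[toy_top++] = obj;
    }
    return obj;
}

/* Pop the n most recently protected objects. Only a count is given,
 * so pushes and pops have to balance within each function. */
void toy_unprotect(int n)
{
    toy_top = (n > toy_top) ? 0 : toy_top - n;
}

/* What a collector would ask before freeing obj. */
int toy_is_protected(const void *obj)
{
    for (int i = 0; i < toy_top; i++) {
        if (toy_stack[i] == obj) {
            return 1;
        }
    }
    return 0;
}
```

Because the pop takes only a count, forgetting one call or popping too many leaves the stack out of step with the code, which is why the number passed to UNPROTECT has to match the number of PROTECT calls.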

Here is our next iteration of “Hello World!” where we will allocate space for an R character vector, assign our greeting to the first element and then return the vector:

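#include <R.h>
#include <Rdefines.h>

SEXP helloB1() {
  SEXP result;
  /* allocate a character vector of length 1 and protect it from the GC */
  PROTECT(result = NEW_CHARACTER(1));
  SET_STRING_ELT(result, 0, mkChar("Hello World!"));
  /* release the one object protected above */
  UNPROTECT(1);
  return(result);
}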

Note that we allocate memory for a character vector of length # with NEW_CHARACTER(#).  It is worth taking a look in the R include files to see how this and similar macros are defined:

$ grep NEW_ /usr/lib/R/include/*.h
/usr/lib/R/include/Rdefines.h:#define NEW_LOGICAL(n)  allocVector(LGLSXP,n)
/usr/lib/R/include/Rdefines.h:#define NEW_INTEGER(n)  allocVector(INTSXP,n)
/usr/lib/R/include/Rdefines.h:#define NEW_NUMERIC(n)  allocVector(REALSXP,n)
/usr/lib/R/include/Rdefines.h:#define NEW_CHARACTER(n)  allocVector(STRSXP,n)
/usr/lib/R/include/Rdefines.h:#define NEW_COMPLEX(n)  allocVector(CPLXSXP,n)
/usr/lib/R/include/Rdefines.h:#define NEW_LIST(n) allocVector(VECSXP,n)
/usr/lib/R/include/Rdefines.h:#define NEW_STRING(n) NEW_CHARACTER(n)
/usr/lib/R/include/Rdefines.h:#define NEW_RAW(n)  allocVector(RAWSXP,n)
/usr/lib/R/include/Rdefines.h:/* NEW_OBJECT is recommended; NEW is for green book compatibility */
/usr/lib/R/include/Rdefines.h:#define NEW_OBJECT(class_def) R_do_new_object(class_def)

So we could have used allocVector(STRSXP,1) instead of NEW_CHARACTER(1) and you will see plenty of the former in R source code and packages.  Similarly you can grep for “_ELT” or “mkChar” and learn about those.  There really isn’t any definitive source for information and you will have to get comfortable googling, poking around source code examples, examining the R include files and even checking the R-devel mailing list to get a sense of the R functions that are available for getting C code to work with R objects.  I would recommend spending some time with Rinternals.h and Rdefines.h.

After R CMD SHLIB‘ing we will again create a very simple wrapper and then run the code from R:

# wrapper function to return a greeting.
dyn.load("helloB1.so")
helloB1 <- function() {
  result <- .Call("helloB1")
  return(result)
}

> source('wrappers.R')
> greeting <- helloB1()
> class(greeting)
[1] "character"
> greeting
[1] "Hello World!"

Double Whew!  So far it still seems like .Call() is a big headache.  But we haven’t really tried to do anything in our C code yet.  The complexity/benefit balance evens out a little in our final example.

Casting about in the R header files

The title of this section really says it all.  As you start to do more in your C code you will need to learn how to cast character strings into SEXP objects, SEXP objects into integers, etc.  There is a finite, but large, amount to know before you become expert.  The two links in the “Getting used to SEXP” section above have excellent examples, as does Programming with Data: Using and Extending R by Dirk Eddelbuettel.

Here is our last “Hello World!” example, the one that counts the characters in incoming greetings.  This example shows how R macros defined in Rdefines.h are used to extract elements from a vector, how vector elements are cast into char and int and how you need to UNPROTECT the same number of elements that you placed on the PROTECT stack.

#include <R.h>
#include <Rdefines.h>
#include <string.h>

SEXP helloC1(SEXP greeting) {
  int i, vectorLength, stringLength;
  SEXP result;
  /* coerce the argument to a character vector and protect the result */
  PROTECT(greeting = AS_CHARACTER(greeting));
  vectorLength = LENGTH(greeting);
  PROTECT(result = NEW_INTEGER(vectorLength));
  for (i = 0; i < vectorLength; i++) {
    /* strlen() reports the length of each element in bytes */
    stringLength = strlen(CHAR(STRING_ELT(greeting, i)));
    INTEGER(result)[i] = stringLength;
  }
  /* two PROTECT calls above, so unprotect two objects */
  UNPROTECT(2);
  return(result);
}

After R CMD SHLIB, here is the wrapper and the R session:

# wrapper function to invoke helloC1
dyn.load("helloC1.so")
helloC1 <- function(greeting) {
  result <- .Call("helloC1", greeting)
  return(result)
}

> source('wrappers.R')
> greeting <- c("Hello World!", "Bonjour tout le monde!", "Привет мир!")
> helloC1(greeting)
[1] 12 22 20
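Note that the last greeting is reported as 20 even though “Привет мир!” has only 11 characters.  That is because strlen() counts bytes, and in a UTF-8 session (which the output above suggests) each Cyrillic letter occupies two bytes.  The sketch below, in plain C with no R headers, shows one way to count characters instead; utf8_code_points is a hypothetical helper written for this illustration and it assumes valid UTF-8 input.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count code points in a UTF-8 string by skipping
 * continuation bytes, which always have the bit pattern 10xxxxxx.
 * Assumes the input is valid UTF-8. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char) *s & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For the UTF-8 encoding of “Привет мир!”, strlen() gives 20 while utf8_code_points() gives 11; for the two ASCII-only greetings both agree.  On the R side, nchar(greeting, type = "bytes") and nchar(greeting, type = "chars") expose the same distinction.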

Yes, it’s still at the double Whew! level but we did some worthwhile things like allocate space for R objects and correctly harness garbage collection.  If there were any halfway decent API docs for all this I would have no hesitation in recommending the .Call() interface to anyone writing C code.  As it is, however, there will be a painful learning curve.  If all you are doing is processing a vector of numbers and returning a simple scalar or vector result then the .C() interface will certainly be much easier — assuming you can take the memory hit.  If, on the other hand, you are doing things like using a C library to convert a bunch of raw data into more complex structures then you are going to have to learn to do things the R way.

But there is hope!  In the next post we will investigate using the Rcpp package to simplify this robust but complex interface to C code.  Hopefully we won’t have to become C++ wizards to do so.

Example Packages using .Call()

The .Call() interface is heavily used in many R packages.  Along with poring over the Writing R Extensions document, it is important to have some example code to work from.  Here is a running list of the packages I found with useful example code:

  • Rcsdp — R interface to the CSDP semidefinite programming library.

More Information

Hadley Wickham has written an excellent tutorial on using the .Call() interface.

Published at DZone with permission of Jonathan Callahan, author and DZone MVB. (source)
