As a first example, consider a function to calculate the two-sample t-statistic, showing "all the steps". This is an artificial example, of course, since there are other, simpler ways of achieving the same end.
The function is defined as follows:
> twosam <- function(y1, y2) {
    n1  <- length(y1); n2  <- length(y2)    # sample sizes
    yb1 <- mean(y1);   yb2 <- mean(y2)      # sample means
    s1  <- var(y1);    s2  <- var(y2)       # sample variances
    s <- ((n1-1)*s1 + (n2-1)*s2)/(n1+n2-2)  # pooled variance
    tst <- (yb1 - yb2)/sqrt(s*(1/n1 + 1/n2))
    tst
  }
With this function defined, you could perform two-sample t-tests using a call such as
tstat <- twosam(data$male, data$female); tstat
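One of the "simpler ways" mentioned above is the built-in t.test() function. The following sketch checks the hand-written statistic against it, assuming the equal-variance (pooled) form of the test. The vectors male and female are made-up illustration data, not taken from any real data set.

```r
# Pooled-variance two-sample t-statistic, written out step by step.
twosam <- function(y1, y2) {
  n1  <- length(y1); n2  <- length(y2)
  yb1 <- mean(y1);   yb2 <- mean(y2)
  s1  <- var(y1);    s2  <- var(y2)
  s <- ((n1 - 1)*s1 + (n2 - 1)*s2)/(n1 + n2 - 2)   # pooled variance
  (yb1 - yb2)/sqrt(s*(1/n1 + 1/n2))
}

# Hypothetical measurements, for illustration only.
male   <- c(5.1, 4.9, 6.2, 5.8, 6.0)
female <- c(4.2, 4.8, 5.0, 4.4, 4.6, 4.9)

by_hand  <- twosam(male, female)
# var.equal = TRUE requests the pooled-variance test, matching twosam().
built_in <- t.test(male, female, var.equal = TRUE)$statistic

# The two values should agree up to floating-point error.
all.equal(by_hand, unname(built_in))
```

Note that t.test() uses the Welch (unequal variance) form by default, so without var.equal = TRUE the two statistics would generally differ.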
As a second example, consider a function to emulate directly the MATLAB backslash command, which returns the coefficients of the orthogonal projection of the vector y onto the column space of the matrix X. (These are ordinarily called the least squares estimates of the regression coefficients.) This would ordinarily be done with the qr() function; however, this is sometimes a bit tricky to use directly, and it pays to have a simple function such as the following to use it safely.
Thus given an n by 1 vector y and an n by p matrix X, then X \ y is defined as (X'X)^- X'y, where (X'X)^- is a generalised inverse of X'X.
> bslash <- function(X, y) {
    X <- qr(X)
    qr.coef(X, y)
  }
After this object is created it is permanent, like all objects, and may be used in statements such as
regcoeff <- bslash(Xmat, yvar)
and so on.
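As a quick sanity check, the sketch below compares bslash() with the normal-equations solution on a small, made-up full-rank problem. The data values and the names Xmat and yvar are illustrative assumptions only. When X has full column rank, the generalised inverse reduces to the ordinary inverse of X'X, so the two results should agree.

```r
# Least squares coefficients via the QR decomposition.
bslash <- function(X, y) {
  X <- qr(X)
  qr.coef(X, y)
}

# Hypothetical design matrix (intercept plus two predictors) and response.
x1   <- c(1, 2, 3, 4, 5, 6)
x2   <- c(2, 1, 4, 3, 6, 5)
Xmat <- cbind(1, x1, x2)
yvar <- c(1.1, 1.9, 3.2, 3.8, 5.1, 6.2)

regcoeff <- bslash(Xmat, yvar)

# Normal equations: solve (X'X) b = X'y directly.
# Numerically less stable than QR, used here only as a cross-check.
normal_eq <- solve(crossprod(Xmat), crossprod(Xmat, yvar))

# as.vector() drops names and dimensions so only the values are compared.
all.equal(as.vector(regcoeff), as.vector(normal_eq))
```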
The classical R function lsfit() does this job quite well, and more. It in turn uses the functions qr() and qr.coef() in the slightly counterintuitive way above to do this part of the calculation. Hence there is probably some value in having just this part isolated in a simple-to-use function if it is going to be in frequent use. If so, we may wish to make it a matrix binary operator for even more convenient use.
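A minimal sketch of both points follows. It first checks bslash() against lsfit() called with intercept = FALSE (so that lsfit() does not add its own column of ones), and then wraps the same computation as a binary operator. The operator name %bs% and the data are hypothetical choices made for this illustration.

```r
# Least squares coefficients via the QR decomposition.
bslash <- function(X, y) {
  X <- qr(X)
  qr.coef(X, y)
}

# Hypothetical full-rank design matrix and response.
Xmat <- cbind(1, c(1, 2, 3, 4, 5, 6), c(2, 1, 4, 3, 6, 5))
yvar <- c(1.1, 1.9, 3.2, 3.8, 5.1, 6.2)

# lsfit() adds an intercept by default; Xmat already has one.
ls_coef <- lsfit(Xmat, yvar, intercept = FALSE)$coefficients
all.equal(as.vector(bslash(Xmat, yvar)), as.vector(ls_coef))

# The same computation as a binary operator. Any name of the
# form %anything% may be used; %bs% is an arbitrary choice.
"%bs%" <- function(X, y) qr.coef(qr(X), y)

op_coef <- Xmat %bs% yvar
all.equal(as.vector(op_coef), as.vector(bslash(Xmat, yvar)))
```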