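The template for a statistical model is a linear regression model with
independent, homoscedastic errors

$$ y_i = \sum_{j=0}^{p} \beta_j x_{ij} + e_i, \qquad \operatorname{E}(e_i) = 0, \quad \operatorname{Var}(e_i) = \sigma^2, \qquad i = 1, \ldots, n. $$

In matrix terms this would be written

$$ y = X\beta + e, $$

where $y$ is the response vector and $X$ is the model
matrix or design matrix, with columns
$x_0, x_1, \ldots, x_p$, the determining variables. Very often
$x_0$ will be a column of 1s defining an intercept term.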
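To make the matrix form concrete, here is a minimal sketch, assuming R as the implementation of the formula language. The data are invented purely for illustration. It builds the model matrix for a single determining variable and checks that solving the normal equations gives the same coefficients as `lm()`.

```r
# Hypothetical data, invented purely for illustration.
x <- c(1, 2, 3, 4)
y <- c(2.1, 3.9, 6.2, 7.8)

# Model matrix for y ~ x: a column of 1s (x_0) followed by x (x_1).
X <- model.matrix(~ x)
print(X)

# Least-squares estimate of beta from the normal equations X'X beta = X'y.
beta_hat <- solve(crossprod(X), crossprod(X, y))

# lm() fits the same model, so the two sets of coefficients should agree.
fit <- lm(y ~ x)
print(cbind(normal_eq = drop(beta_hat), lm = coef(fit)))
```

Solving the normal equations directly is used here only because it mirrors the matrix notation; `lm()` itself uses a QR decomposition, which is numerically more stable.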
Examples.
Before giving a formal specification, a few examples may usefully set the
picture.
Suppose y, x, x0, x1, x2, ... are
numeric variables, X is a matrix and A, B, C,
... are factors. The following formulæ on the left side below
specify statistical models as described on the right.
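As an illustrative sketch, a few formulae of this kind are listed here, each with the model it describes as a comment. The particular formulae are chosen for illustration and assume R's implementation of the formula syntax. The loop then reports the terms each formula contributes and whether an intercept is present.

```r
# Formula objects are not evaluated when created, so no data are needed here.
examples <- list(
  y ~ x,             # simple linear regression of y on x (intercept implicit)
  y ~ 1 + x,         # the same model with the intercept written explicitly
  y ~ x - 1,         # regression of y on x through the origin
  log(y) ~ x1 + x2,  # multiple regression of log(y) on x1 and x2
  y ~ A,             # one-way analysis of variance, classes given by A
  y ~ A + x,         # analysis of covariance: classes A, covariate x
  y ~ A * B          # two factors with interaction, same as y ~ A + B + A:B
)

# List the terms of each formula and its intercept flag (1 = present).
for (f in examples) {
  tt <- terms(f)
  cat(deparse(f), "->", attr(tt, "term.labels"),
      "| intercept:", attr(tt, "intercept"), "\n")
}
```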
The operator ~ is used to define a model formula. The
form, for an ordinary linear model, is

    response ~ op_1 term_1 op_2 term_2 op_3 term_3 ...

- response
- is a vector or matrix (or an expression evaluating to
a vector or matrix) defining the response variable(s).
- op_i
- is an operator, either + or -, implying
the inclusion or exclusion of a term in the model (the first is optional).
- term_i
- is either
  - a vector or matrix expression, or 1,
  - a factor, or
  - a formula expression consisting of factors, vectors or matrices
    connected by formula operators.
In all cases each term defines a collection of columns either to be added
to or removed from the model matrix. A 1 stands for an intercept
column and is by default included in the model matrix unless explicitly
removed.
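The effect of adding and removing terms can be seen by inspecting the columns of the resulting model matrix. The following is a minimal sketch, assuming R and using invented variables `x` and `A`. The names of the columns generated for a factor depend on the contrast coding in force, so only the column count is checked for the last case.

```r
# Hypothetical variables, invented purely for illustration.
x <- c(1, 2, 3, 4)
A <- factor(c("a", "a", "b", "b"))

# The intercept column is included by default ...
m_default <- model.matrix(~ x)
colnames(m_default)

# ... and is dropped when the term 1 is explicitly removed.
m_no_int <- model.matrix(~ x - 1)
colnames(m_no_int)

# A two-level factor adds one column next to the intercept and x.
m_factor <- model.matrix(~ A + x)
dim(m_factor)
```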
The formula operators are similar in effect to the Wilkinson and
Rogers notation used by such programs as Glim and Genstat.
One inevitable change is that the operator '.' becomes ':',
since the period is a valid character in variable names. The notation is
summarised in the Table
(based on Chambers & Hastie,
p. 29).
Table:
Summary of model operator semantics
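How some of these operators expand into individual terms can be explored directly. The sketch below assumes R, where `terms()` reports the term labels of a formula. The helper `labels_of` is a hypothetical name introduced here for illustration, and the comments show the labels R returns.

```r
# Hypothetical helper: the term labels a formula expands to.
labels_of <- function(f) attr(terms(f), "term.labels")

labels_of(y ~ A + B)          # "A" "B"
labels_of(y ~ A:B)            # "A:B"
labels_of(y ~ A * B)          # "A" "B" "A:B"
labels_of(y ~ A / B)          # "A" "A:B"
labels_of(y ~ (A + B + C)^2)  # "A" "B" "C" "A:B" "A:C" "B:C"
labels_of(y ~ A * B - A:B)    # "A" "B"
```

No data are needed, because `terms()` works on the symbolic formula alone.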

Note that inside the parentheses that usually enclose function arguments
all operators have their normal arithmetic meaning. The function
I() is an identity function used only to allow terms in model formulæ
to be defined using arithmetic operators.
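The difference between `+` as a formula operator and `+` as arithmetic can be checked on a small case. This is a minimal sketch, assuming R and two invented numeric variables `x1` and `x2`.

```r
# Hypothetical numeric variables, invented purely for illustration.
x1 <- c(1, 2, 3, 4)
x2 <- c(2, 1, 4, 3)

# As a formula operator, + adds two separate columns to the model matrix.
m_two <- model.matrix(~ x1 + x2)
colnames(m_two)   # "(Intercept)" "x1" "x2"

# Inside I(), + is ordinary addition: one column holding the sum x1 + x2.
m_sum <- model.matrix(~ I(x1 + x2))
colnames(m_sum)   # "(Intercept)" "I(x1 + x2)"

# Inside any function call the operators are also arithmetic.
m_log <- model.matrix(~ log(x1 + x2))
colnames(m_log)   # "(Intercept)" "log(x1 + x2)"
```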
Note particularly that the model formulæ specify the columns of
the model matrix; the specification of the parameters is implicit. This is
not the case in other contexts, for example in fitting nonlinear models.
Jeff Banfield
2/13/1998