https://mailman.stat.ethz.ch/pipermail/r-sig-mixed-models/2009q1/001802.html

[R-sig-ME] Is it right to specify a random slope for the dummy variables
Douglas Bates  bates at stat.wisc.edu
Thu Jan 1 20:44:48 CET 2009

This is an interesting question in that it provides an opportunity for philosophical musings on fixed and random effects and on the way that categorical covariates are incorporated into model matrices. Allow me to expand a bit (well, maybe more than a bit) on Reinhold's remarks below.

On Wed, Dec 31, 2008 at 5:35 AM, Reinhold Kliegl wrote:
> By default, a categorical variable (factor) with n levels comes with
> n-1 treatment contrasts. Therefore, in the fixed-effects part of the
> model the intercept represents the reference level and the n-1
> contrasts represent the mean differences between the other levels and
> the reference level, assuming a balanced design. You can check your
> specification with contrasts(factor). Of course, you should change
> from treatment to other contrasts as required by your hypotheses. See
> ?contrasts.
>
> Now suppose you have the variable group as a random factor in the model
> and you include the variable factor also in the random-effects part:
>
>   lmer(y ~ factor + (factor|group))
>
> Then you can estimate the variance of the intercept (i.e., the variance
> of the reference level for groups), the variances of the n-1 difference
> scores for group, and the correlations between intercept and difference
> scores as random effects (i.e., you estimate varying intercepts and
> varying differences and the correlations between them).
>
> Thus, with categorical variables you are mostly looking at the
> variance and correlation of difference scores between levels of a
> factor rather than the variance and correlation of slopes (which are also
> a kind of difference score, of course).

Reinhold is exactly correct in stating that the coefficients associated with a categorical factor are usually expressed as an intercept and a set of k-1 "contrasts", where k is the number of levels in the factor. There are alternatives, however, and it can be interesting to explore them.

For brevity let me write f for a factor whose levels are considered fixed and repeatable, g for a factor whose levels represent a sample from a, possibly infinite, set of potential levels, x for a continuous covariate and y for the response. The number of levels for factor f is k.

As most R users are (or should be) aware, a model formula, which is used to generate a model matrix from a data set, includes an implicit intercept term. Thus the formula

    y ~ x

is equivalent to the formula

    y ~ 1 + x

In fact, many people prefer to use the second form because it more clearly illustrates that there will be two coefficients estimated, the first associated with a constant term and the second associated with the numeric values of x. If we wish to suppress the intercept we can replace the '1' by a '0' to obtain the formula

    y ~ 0 + x
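As a small, hedged illustration of the implicit intercept and of the default treatment contrasts that Reinhold mentions, here is a base-R sketch with a made-up data frame (the names dat, x and f are hypothetical, not objects from the original message):

    ## a made-up data frame with a continuous covariate and a 3-level factor
    set.seed(1)
    dat <- data.frame(x = rnorm(6), f = gl(3, 2, labels = c("a", "b", "c")))

    contrasts(dat$f)                  # default treatment contrasts: k - 1 = 2 columns
    head(model.matrix(~ x, dat))      # same as ~ 1 + x: intercept column plus x
    head(model.matrix(~ 1 + x, dat))  # the intercept made explicit
    head(model.matrix(~ 0 + x, dat))  # intercept suppressed: a single column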
When we have only continuous covariates in the model formula, the underlying models represented by 1 + x and 0 + x are different but the parameterization is the same. That statement may seem opaque but bear with me and I will try to explain what I mean. The "underlying model" is the set of all possible fitted values from the model.

One of the characteristics of linear models is that the model is defined by the set of all possible predictions, which must be a linear subspace of the response space. For those who know the linear algebra terminology, this subspace is the column span of the model matrix. The coefficients are associated with a particular parameterization but the model itself exists independently of the parameterization. (It is this concept that Don Watts and I ported over to nonlinear regression models to explain some of the effects of nonlinearity.) So what I was saying about 1 + x versus 0 + x is that the column spans of the model matrices are different; the first is two dimensional and the second is one dimensional, but the parameterization for the column spans is the same.

This is not the case for a categorical factor. The model formula

    y ~ 1 + f

and the model formula

    y ~ 0 + f

generate the same model. Each generates a model matrix whose column span is the k-dimensional span of the indicator columns for the levels of f. For the second formula the parameterization is derived from the indicator columns. We can think of the parameterization for the first model formula as resulting from the constant column followed by the complete set of indicator columns, generating a model matrix with k + 1 columns and rank k. To obtain a full-rank matrix we must drop some linear combination of the columns. The default method is to drop the indicator of the first level of the factor but any one will do. That is, we have a well-defined column span but with an arbitrary parameterization.

There is an interesting point here regarding the difference between fixed-effects and random-effects terms. The coefficients in fixed-effects terms are estimated by least squares, which I think of as a rigid criterion. The model matrix for the fixed-effects terms must have full column rank. If it doesn't then the coefficient estimates are undetermined. Thus, to obtain a well-defined and unique set of coefficient estimates we must check the rank of the model matrix and adjust the form of the matrix (and hence the parameterization of the model) if we detect a rank-deficient matrix.

Much of the checking and adjustment can be and is done at the symbolic level, which is why the code for model.matrix is much more subtle than most would suspect (and also why just about any formula about analysis of variance given in an introductory book is not really the way that analysis of variance results are calculated). The model formula -> model frame -> model matrix path in the S language is almost never recognized for the great accomplishment that it is. However, even the symbolic analysis doesn't account for all cases, which is why there is a secondary check on the numerical rank of the model matrix - something that is not nearly as simple as it sounds. As anyone who has taken a linear algebra class knows, the rank of a matrix is a well-defined property that is easily evaluated. As anyone who has taken a numerical linear algebra class knows, the rank of a matrix is essentially undefined and impossible to evaluate reliably. One side note: it is exactly the need to assign a rank to a model matrix and to take corrective action for rank-deficient cases that keeps us using Linpack code for the QR decomposition instead of the numerically superior and more efficient Lapack code.
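Continuing the hypothetical data frame from the sketch above, the point that the two parameterizations span the same column space, and that rank is something we check numerically, can be seen directly (again only a sketch, using x as a stand-in response):

    X1 <- model.matrix(~ 1 + f, dat)  # intercept plus k - 1 treatment contrasts
    X0 <- model.matrix(~ 0 + f, dat)  # the k indicator columns themselves

    c(qr(X1)$rank, qr(X0)$rank)            # both have numerical rank k = 3
    all.equal(fitted(lm(x ~ 1 + f, dat)),  # and both describe the same model:
              fitted(lm(x ~ 0 + f, dat)))  # identical fitted values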
Anyway, back to the fixed-effects versus random-effects computational issues.

The coefficients for a random-effects term are "estimated" by penalized least squares, which I think of as a flexible criterion. (I would embark on analogies about least squares being the oak and penalized least squares being the willow in the wind storm, but that is probably a little too much.) The penalty term takes care of any problems with rank deficiencies. We say that it "regularizes" the calculation in the sense that the conditional means of the random effects are "shrunken" toward zero relative to the least squares estimates. It is exactly this regularization that allows the models to be expressed in the natural parameterization. That is, although the model matrix for the formula

    y ~ 1 + f

cannot be expressed as a constant and the complete set of indicator columns, because it would be rank deficient, the model matrix for the formula

    y ~ 1 + (1 | g)

is expressed as a constant and the complete set of indicator columns for the levels of g. Furthermore all of these coefficients are estimable.

Finally I get to the point of Zhijie's question about incorporating coefficients for a factor f in the random effects associated with the levels of g. Reinhold explained what the formula that could be written as

1)  y ~ 1 + f + (1 + f | g)

would generate. This model is equivalent to

2)  y ~ 1 + f + (0 + f | g)

or

3)  y ~ 0 + f + (0 + f | g)

and I would recommend one of these forms for interpretability. The parameterization of the fixed effects is unimportant, so either 2) or 3) will do. The parameterization of the random effects is similarly unimportant in that formulas 1), 2) and 3) all generate the same model fits, but I feel there is an advantage in using the indicator-column representation from 2) and 3). All three of these formulas will result in k variances and k(k-1)/2 covariances (or correlations) being estimated and up to k random effects for each level of g. In 2) and 3) the variances reported are the variance of the effect of level i of factor f on level j of factor g (or vice versa).

The disadvantage of all three formulations is that when k is moderate the number of variance/covariance parameters being estimated, k(k+1)/2, can be large. An alternative model, which I prefer as a starting point, is

4)  y ~ 0 + f + (1 | g) + (1 | f:g)

Model 4) allows for an additive shift for each level of g and an additive shift for each (observed) combination of levels of f and g. There are actually more random effects in 4) than in 1), 2) or 3) but many fewer variance/covariance parameters. Model 4) has only two variance parameters to be estimated and no covariances. Model 4) corresponds to model 3) subject to the constraint that the variance-covariance matrix of the random effects has a compound symmetry pattern.

Some of these issues are illustrated in the section "Interactions of grouping factors and other covariates" in the slides at http://www.stat.wisc.edu/~bates/UseR2008/WorkshopD.pdf (slides 84 to 95).
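In lme4 syntax the comparison between these formulations might look like the sketch below; the data frame d and its columns y, f (with k levels) and g are assumed placeholders, not objects from the original message:

    library(lme4)

    m2 <- lmer(y ~ 1 + f + (0 + f | g), data = d)          # formula 2)
    m3 <- lmer(y ~ 0 + f + (0 + f | g), data = d)          # formula 3): same fit as 2)
    m4 <- lmer(y ~ 0 + f + (1 | g) + (1 | f:g), data = d)  # formula 4)

    VarCorr(m3)  # a k-by-k variance-covariance matrix: k(k+1)/2 parameters
    VarCorr(m4)  # two scalar variances: the compound-symmetry special case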
> On Wed, Dec 31, 2008 at 9:45 AM, zhijie zhang wrote:
>> Dear all,
>> Today, I was thinking about the following question.
>> We know that variables may be classified into continuous, ordinal, and
>> categorical variables. I was confused about how to handle
>> categorical variables in multi-level models.
>> For fixed effects, the categorical variables are always treated as dummy
>> variables; my questions are:
>> 1. Can a random slope be specified for categorical variables that are
>> always changed into the form of dummy variables?
>> 2. If a random slope can be specified for categorical variables, how should
>> it be explained? It seems a little different from the continuous variables.
>> I tried the GLIMMIX procedure in SAS. It seems that SAS treats categorical
>> variables as continuous variables, while in MLWin it seems that a random
>> slope can be specified for the dummy variables.
>> Any ideas on it are greatly appreciated.


[R-sig-ME] Teaching Mixed Effects
Douglas Bates  bates at stat.wisc.edu
Thu Jan 22 23:40:11 CET 2009

My thanks to Andrew for starting this thread and to Ben for his responses. I will add more responses below. I'm sorry that my responses have been delayed - it happens that this is the first week of our spring semester and I have been tied up getting courses underway.

On Tue, Jan 20, 2009 at 8:59 AM, Bolker, Benjamin Michael wrote:
> Some comments (others will certainly have more to add)
> ________________________________________
> From: r-sig-mixed-models-bounces at r-project.org [r-sig-mixed-models-bounces at r-project.org] On Behalf Of Andrew Beckerman [a.beckerman at sheffield.ac.uk]
> Sent: Tuesday, January 20, 2009 6:47 AM
> To: R Models Mixed
> Subject: [R-sig-ME] Teaching Mixed Effects
>
> Dear R-Mixed people -
>
> I am about to embark on a day of attempting to teach some aspects of
> mixed models using R to PhD students. I was wondering if anyone would
> be willing to indulge in this summary below, developed through reading
> threads on R-Mixed and R-Help over the past few months, and vet my
> list of issues/questions/topics (4) associated with mixed models?
>
> Let me reduce any rising blood pressure by saying that I understand
> (possibly) and accept why there are no p-values in lmer, and NONE of
> the comments/questions below are about why lmer does not produce
> sensible df's and p-values to calculate significance (Phew).
>
> #######################
>
> First, a technical question:
>
> Based on these two threads:
> https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q4/001459.html
> https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q4/001456.html
>
> IS mcmcsamp() broken for "complicated" random effects? Is it in good
> enough shape to teach for "simple" gaussian mixed models, to
> demonstrate the principle?
>
> BMB: good question. I'd like to know, although I would
> guess that it would be OK for demonstrating the principle.
> You could try it out and see if it's sensible ...

I have not verified the results from the current mcmcsamp, even for simple Gaussian models. They seem reasonable for these models but I need to look at them much more closely before I could advise trusting those results.

The problem with designing an mcmcsamp method is that the variances of the random effects can legitimately be zero and often have a non-negligible probability of assuming the value of zero during the MCMC iterations. However, most methods of sampling from the distribution of a variance are based on sampling from the distribution of a multiplier of the current value. If the current value is zero, you end up stuck there.
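A tiny, self-contained illustration (made-up numbers, not code from lme4) of why a multiplicative update gets stuck at zero:

    sigma2 <- 0                 # a variance component whose current value is zero
    for (i in 1:100)
        sigma2 <- sigma2 * rgamma(1, shape = 2, rate = 2)  # propose a multiplier of the current value
    sigma2                      # still exactly 0 - the sampler can never leave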
> #######################
>
> Now, here is what I am possibly going to talk about.....
>
> 0) Rule number 1 is to design experiments well, and aim for
> orthogonal, well replicated and balanced designs. If you get data
> that conforms to all of that, old school F-ratios CAN be used. If
> not, see 1-4 below (we will assume that Rule number 1 will be broken).
>
> BMB: good idea. In a designed experiment you can do this.

In an observational study you can't expect to end up with balanced data. I also tell my students that assuming a balanced design will always produce balanced data is contrary to Murphy's Law. Balance is fragile. I think it is unwise to depend on balance being achieved.

> 1) It is agreed that the Laplacian methods for estimating terms and
> "likelihoods" in mixed-effects models are considered most reliable
> (accurate and precise).
>
> BMB: I believe (but stand ready to be corrected) that
> PQL vs Laplace vs adaptive Gauss-Hermite quadrature
> (AGHQ) is an issue for GLMMs, not so much for LMMs.
> AGHQ is generally even better (but slower) than Laplace,
> which is a special case.

Exactly. Adaptive Gauss-Hermite quadrature uses a quadrature formula centered at the conditional modes of the random effects and scaled by an approximation to the conditional standard deviations. (That's where the term "adaptive" comes from.) The quadrature formula depends on the number of quadrature points. Generally we use an odd number of points so that one of the evaluations is at the conditional mode. Thus you could use a 3-point evaluation or a 5-point evaluation. The simplest formula, the 1-point Gauss-Hermite evaluation, is exactly the Laplace approximation.

It turns out that the Laplace approximation is the only feasible such method (at least the only one that I can imagine) when you have more than one grouping factor for the random effects. Even with just one grouping factor, if you have vector-valued random effects (meaning that you have more than one random effect associated with each level of the grouping factor) then the complexity of AGHQ is the number of quadrature points raised to the dimension of the vector-valued random effect. Thus if you have a random intercept and a random slope for each subject, say, and you choose a 7-point formula, then you must do 49 evaluations of the deviance residuals for each AGHQ evaluation. Oliver Schabenberger's paper on SAS PROC GLIMMIX, which Ben mentions below, discusses the problem of proliferation of the number of evaluations required by AGHQ.

> R (lme4) and ADMB model builder use these
> methods. SAS nlmixed does, but SAS proc mixed does not appear to.
> STATA can. Genstat does/can (see final note below**).
>
> BMB: SAS PROC MIXED does, as of version 9.2
> see www2.sas.com/proceedings/forum2007/177-2007.pdf

I think you mean SAS PROC GLIMMIX, not SAS PROC MIXED.

Regarding the comparison of methods provided by different software, remember that details of the implementation can be important. The term "adaptive Gauss-Hermite quadrature" describes a technical approach but there can be many variations on the implementation, with important consequences for precision, speed and accuracy. It is a gross oversimplification to imagine that such a technique is implemented by handing a paper with some formulas to a "programmer" and declaring the job done. Comparing implementations, including looking at the intermediate steps, is important but only possible for open-source implementations.
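As a concrete, hedged illustration of the Laplace-versus-AGHQ point: current versions of lme4 expose the number of quadrature points through the nAGQ argument of glmer(), restricted to a single scalar random effect for exactly the cost reason described above. The cbpp data set ships with the package, so this sketch should run as written:

    library(lme4)

    ## nAGQ = 1 is the Laplace approximation; nAGQ = 9 uses nine quadrature
    ## points for the single scalar random effect (1 | herd)
    m_laplace <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
                       data = cbpp, family = binomial, nAGQ = 1)
    m_aghq    <- update(m_laplace, nAGQ = 9)

    c(laplace = deviance(m_laplace), aghq = deviance(m_aghq))  # slightly different approximations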
> 2) It is agreed that the appropriate test for fixed effects in mixed
> models should be between nested models. However, there is no
> agreement as to how to characterise the distributions that would be used
> to generate p-values. This is the crux of the Bates et al argument:
> Likelihood Ratio Tests, Wald tests etc all need to assume a
> distribution and some degrees of freedom. But, in many mixed models,
> the distribution need not conform to any of our standard ones (t, F,
> chi-square etc), especially when the number of subjects in the random
> effects is small. Moreover, the relationship between fixed and random
> effects means that it is nearly impossible, and perhaps not worthwhile,
> to calculate what might be appropriate "degrees of freedom".

I'm afraid my response will be rather long-winded, for which I apologize.

I feel that as statisticians we have done a disservice to those who use statistics by taking complex problems (tests of hypotheses for terms in complicated model structures) for which there is an enormous simplification in the central special case (linear dependence of the mean on the model parameters, "spherical" Gaussian distribution) and describing certain computational shortcuts that can be used only in this special case. For the most part we don't even hint at the general problem or even discuss the approach used in the special case. We jump right into a particular form of a summary of a computational shortcut, the analysis of variance table.

I am very pleased to see you describe the situation as a comparison of two nested models. That is indeed how we should approach the problem. We have a general model and a specific model that is a special case of the general model. Obviously the general model will fit the data at least as well as the more specialized model. We wish to determine if the fit is sufficiently better to justify using the more general model. To do so we must decide how to judge the extent to which the general model fits better and how much more complex it is, so we can form some kind of cost/benefit criterion. Then we must assess the value of this test statistic by comparing it to a reference distribution.
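(In lme4 such a comparison of a specific model nested within a general one is usually set up as in the hedged sketch below; fit_specific and fit_general are hypothetical maximum-likelihood fits, and the chi-square reference distribution that anova() reports is exactly the approximation discussed further below.)

    library(lme4)

    ## two nested model fits (hypothetical); REML = FALSE so that the
    ## likelihoods, and hence the likelihood ratio, are comparable
    fit_specific <- lmer(y ~ 1 + x + (1 | g),     data = d, REML = FALSE)
    fit_general  <- lmer(y ~ 1 + x + f + (1 | g), data = d, REML = FALSE)

    anova(fit_specific, fit_general)  # deviance difference with a chi-square approximation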
In the special case of a Gaussian linear model it all works out beautifully and we can summarize the results very compactly with t statistics or F statistics and their degrees of freedom. But this case is special. It is misleading to believe that things will simplify like this in more general cases.

Consider the question of how we measure the comparative complexity of two models. Typically we can measure the difference in the complexity of the models in terms of the number of additional parameters in the general model. In the central special case (Gaussian linear models) there is a geometric representation of the model where the general model corresponds to a linear subspace of the sample space and the specific model corresponds to a subspace contained in the general model. The spherical Gaussian assumption (i.e. a Gaussian distribution with variance-covariance matrix sigma^2 I, for which the contours of constant probability density are spheres) links the probability model to the Euclidean geometry of the space. From those two assumptions the methods of testing a general linear hypothesis can be derived geometrically. Gosset derived a statistic that is equivalent to the t statistic using analytic means, but the current form of the t statistic and its relation to degrees of freedom came from Fisher and were based on his geometric insight (see the Wikipedia article on Gosset). And, of course, Fisher was able to generalize the t distribution to the F distribution, again based on geometric principles. In the first chapter of our 1988 book "Nonlinear Regression Analysis and Its Applications" Don Watts and I illustrate the geometric approach to the t and F statistics for linear models.

Fisher's genius was to see that questions about comparative model fits, which are related to distances in the geometric representation, can be transformed into questions about angles, related to the ratios of distances or, equivalently, squared distances of orthogonal components of a response vector. The use of the ratio allows one scale factor (a variance component, in statistical terminology) to be canceled out. That is, the null distribution of an F ratio can be expressed without needing to specify the unknown error variance.

Regrettably, the "use a ratio to eliminate a variance component" trick only works once. The first variance component is free but not subsequent ones. If you have a perfectly balanced, orthogonal design then you can apply the trick multiple times by isolating certain orthogonal submodels and applying the trick within each submodel. That is, you can use estimates from different "error strata" in ratios for different terms. However, that approach based on certain mean squares and expected mean squares doesn't generalize well. The computationally tractable approach to estimation of parameters in mixed models is maximum likelihood or REML estimation.

The reason for my long-winded explanation is your saying "This is the crux of the Bates et al argument: Likelihood Ratio Tests, Wald tests etc all need to assume a distribution and some degrees of freedom", which is a natural statement given the way that we teach analysis of variance. We teach the "what" (i.e. create a table of sums of squares, degrees of freedom, mean squares, F ratios, p-values) and not the "why". If you only see the "what" then it is natural to assume that there are some magical properties associated with sums of squares and degrees of freedom and all we need to do is to figure out which sums of squares and which degrees of freedom to use. The magical properties are actually the simplified geometric representation (orthogonal linear subspaces, Euclidean geometry) that is unique to the Gaussian linear model. The beauty of that model is that, no matter how complicated the representation of a test as a formula may be, the geometric representation is always the same, as the ratio of the normalized squared lengths of two orthogonal components of the response vector.

When we step away from that Gaussian linear model the simplifications all break down. I spent the early part of my career thinking about what parts of the Gaussian linear model can be transferred over to the Gaussian nonlinear model and what that would mean for inference. This is why I concentrated so much on the geometry of models based on the spherical Gaussian distribution. Generalized linear models retain the linear predictor, transformed through an inverse link function to the appropriate scale for the mean, but allow for distributions other than the Gaussian. They require another way of thinking. I think of mixed models as being based on two random vectors, the response vector and the unobserved random effects.
In the case of a Gaussian linear mixed model the conditional distribution of the response, given the random effects, is a spherical Gaussian and the unconditional distribution of the random effects is multivariate Gaussian, but our inferences require the marginal distribution of the response. For a linear mixed model this is Gaussian but with more than one variance component. Getting rid of just one variance component won't do, yet the t and F derivations depend strongly on having just one variance component that can be removed by considering a ratio.

If we want to perform a hypothesis test related to a fixed-effects term in a mixed model (and, for the moment, I will not go into the question of whether statistical inferences should always be phrased as the result of hypothesis tests) I would claim we should start at the beginning, which is considering two models for the data at hand, one model being a special case of the other. We need to decide how we measure the quality of the fit of the general model relative to the more specific model and how we measure the additional cost of the general model. Then we need to formulate a test statistic. If we are incredibly lucky, the null distribution of this test statistic will be well-defined (that is, it will not depend on the values of other, unknown parameters) and we can evaluate probabilities associated with it. That does happen in the case of the Gaussian linear model.

I personally don't think it will be possible to provide a general approach that isolates the effect of a fixed-effects term in a linear mixed model using a statistic that does not depend on the values of other parameters. I would be delighted if someone could do it but I think there is too much that goes right in the case of the Gaussian linear model to expect that the same incredible simplifications will apply to other models. I don't feel that the holy grail of inference in mixed-effects models should be a magical formula for degrees of freedom to be associated with some ratio that looks like a t or an F statistic (despite the strongly held beliefs of those in the First Church of the Kenward-Roger Approximation).

Certainly there has been a lot of statistical research related to approximating a difficult distribution by a more common distribution, but I view this approach as belonging to an earlier era. It is the approach embodied in software like SAS, whose purpose often seems to be to evaluate difficult formulas and provide reams of output including every number that could possibly be of interest. I think we should use the power of current and future computers to interact with and explore data and models for the data. MCMC is one way to do this. In nonlinear regression Don and I advocated profiling the sum of squares function with respect to the values of individual parameters as another way of assessing the actual behavior of the model versus trying to formulate an approximation. I'm sure that creative people will come up with many other ways to use the power of computers to this end. The point is to explore the actual behavior of the model/data combination, not to do just one fit, calculate a bunch of summary statistics, apply approximate distributions to get p-values and go home.
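As one concrete example of exploring the behavior rather than quoting a single table: current versions of lme4 provide likelihood profiles and a parametric-bootstrap option for confidence intervals. A hedged sketch, where fit is an assumed lmer fit:

    library(lme4)

    pr <- profile(fit)                         # profile the deviance over each parameter
    confint(pr)                                # profile-based confidence intervals
    confint(fit, method = "boot", nsim = 500)  # parametric-bootstrap intervals
    if (require(lattice)) xyplot(pr)           # plot the profile traces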
If we want to generalize methods of inference we should consider the whole chain of reasoning that leads us to the result rather than concentrating only on the last step, which is "now that I have the value of a statistic, how do I convert it to a p-value?" or, even more specifically, "I have calculated something that looks like a t-ratio, so I am going to assume that its distribution is indeed a t-distribution, which leaves me with only one question, and that is how many degrees of freedom to use".

I appreciate that this is inconvenient for those applying such models to their data. Philosophical discussions of the fundamentals of statistical inference are all well and good, but when the referees on your paper say you have to provide a p-value for a particular term or test, it is a practical matter, not an academic, theoretical debate. Those with sufficient energy and skill plus a stellar reputation as a researcher may be able to convince editors that p-values are not the "be all and end all" of data analysis - Reinhold Kliegl has been able to do this in some cases - but that is definitely not the path of least resistance. The sad reality is that p-values have taken on a role as the coin of the realm in science that is far beyond what any statistician would have imagined. (Apparently the default "significance level" of 5%, which is considered in some disciplines to be carved in stone, resulted from a casual comment by Fisher to the effect that he might regard an outcome that would be expected to occur less than, say, 1 time in 20 as "significant".)

It is unhelpful of me not to provide p-values in the lmer summaries, but I develop the software out of interest in doing it as well as I possibly can and not because someone assigns me a task to compute something. I really don't know of a good way of assigning p-values to t-ratios or F-ratios, so I don't. I still report the ratio of the estimate divided by its standard error, and even call it a t-ratio, because I think it is informative.

> BMB: and specifically, if you happily use the anova() method
> to calculate LRTs it will do so -- but Pinheiro and Bates 2000 expressly
> warn against the results/show that they can be "anticonservative"
> for small sample sizes. If you have huge sample sizes (i.e.
> you're not a field ecologist) then LRT may be OK (I seem to remember
> Bates using it without comment in an analysis of a (large)
> Bangladeshi arsenic data set).

I think that was data on artificial contraception use obtained as part of a fertility survey in Bangladesh.

I feel that the likelihood ratio is a perfectly reasonable way of comparing two model fits where one is a special case of the other. In fact, if the models have been fit by maximum likelihood, the likelihood ratio would, I think, be the first candidate for a test statistic. The problem with likelihood ratio tests is not the likelihood ratio, per se -- it is converting the likelihood ratio to a p-value. You need to be able to evaluate the distribution of the likelihood ratio under the null hypothesis. The chi-square approximation to that distribution is exactly that - an approximation - and its validity depends on not testing at the boundary and on having a large sample, in some sense of the sample size. If I were really interested in evaluating a p-value for the likelihood ratio I would probably try a parametric bootstrap to get a reference distribution.
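A hedged sketch of such a parametric bootstrap with lme4; fit0 (the null model) and fit1 (the general model) are assumed to be nested lmer fits made with REML = FALSE, and simulate()/refit() are used to build the null reference distribution:

    library(lme4)

    obsLR <- as.numeric(2 * (logLik(fit1) - logLik(fit0)))  # observed likelihood ratio statistic
    simLR <- replicate(999, {
        ysim <- simulate(fit0)[[1]]                         # simulate a response under the null model
        as.numeric(2 * (logLik(refit(fit1, ysim)) - logLik(refit(fit0, ysim))))
    })
    (1 + sum(simLR >= obsLR)) / (1 + length(simLR))         # bootstrap p-value for the LR statistic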
> 2.1) However, Bates et al have mentioned the restricted likelihood
> ratio test. There is a package in R implementing some of these tools
> (RLRsim), but these appear to be limited to and/or focused on tests of
> random effects.
>
> BMB: You can test fixed effects, apparently, but only *in combination with*
> a test of the random effect (the null hypothesis is always a model
> without random effects). Also limited to LMMs, and a single
> random effect.
>
> 2.2) What some "other" packages do: SAS can produce Wald tests and
> LRTs if you want, and can implement the kenward-rogers adjustment.
>
> BMB: Kenward-Roger ! (not Rogers)
>
> There is some theory behind the K-R, but it is still not dealing with
> the crux of the problem (see 2). Genstat uses Wald tests and warns
> you that with small numbers of subjects these are not reliable.
> Genstat is also experimenting with HGLM by Nelder and Lee (see **).
>
> 2.3) "Testing" random effects is considered inappropriate (but see 2.1
> for methods?).
>
> BMB: I don't think this is necessarily true. Admittedly it is a point
> null hypothesis (variance will never be _exactly_ zero), but I
> can certainly see cases ("does variation among species contribute
> significantly to the overall variance observed?") where one would
> want to test this question. This is a bit murky but I think the
> distinction is often between random effects as part of an experimental
> design (no point in testing, not interesting) and random effects
> as observational data.

Actually the MLE or the REML estimate of a variance component can indeed be zero. The residual variance (i.e. the variance of the "per observation" noise term) is only zero for artificial data, but the estimates of other variance components can be exactly zero. I think of such situations as an indication that I should simplify the model by eliminating such a term. I have spent most of my career at the University of Wisconsin-Madison in a statistics department founded by George Box, who famously said "All models are wrong; some models are useful." I don't expect a model to be correct, I am only interested in whether the terms in the model are useful in explaining the observed data.
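A hedged sketch of what that looks like in practice with current lme4; fit is an assumed lmer fit containing the random terms (1 | g) and (1 | f:g) from the first message:

    library(lme4)

    VarCorr(fit)       # inspect the estimated variance components
    isSingular(fit)    # TRUE when an estimate sits on the boundary (e.g. exactly zero)

    ## if the f:g variance is estimated as zero, drop that term and refit
    fit_reduced <- update(fit, . ~ . - (1 | f:g))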
> 3) Because of 2, there is the resounding argument that Bayesian and/or
> simulation/bootstrapping tools be used to evaluate fixed effects.
>
> BMB: I don't know about "resounding", but OK. Probably
> the best option.
>
> Current methods proposed and coded, but in various states of
> usefulness, are:
>
> mcmcsamp() and HPDinterval() from lme4 + Baayen's *.fnc's from languageR,
> BUGS and WinBUGS,
> RH Baayen's simulation tools (e.g. page 307 method),
> Andrew Gelman and Jennifer Hill's tools (e.g. sim() method from
> package arm),
> Ben Bolker's suggestions in this list for GLMMs (thread: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q4/001459.html)
>
> 3.1) These all evaluate "simple" tests of whether the betas and intercept
> are different from 0, and are linked to the contrasts. There is no
> emerging method equivalent to a LRT (but see 2.1 and **Final Note
> Below).
>
> BMB: I think if you want to calculate the parametric bootstrap/
> null-hypothesis simulation of the change in deviances between
> nested models, it's actually reasonably straightforward. See
> examples on glmm.wikidot.com, especially
> http://glmm.wikidot.com/basic-glmm-simulation
> [geared toward GLMMs but even easier for LMMs]
>
> 4) Andrew Gelman et al also suggest AIC-based methods and model
> averaging for model inference, given constant random effects. I think
> their argument about AIC is that if the "likelihood" is estimated
> well, relative differences in AIC will be constant, irrespective of
> any adjustment made to the number of parameters used in calculating AIC:
> i.e. as long as the random-effects structure stays the same, the
> relative differences between nested models will not change if the
> number of parameters is estimated consistently. These methods still do
> not produce p-values.
>
> BMB: and AIC is an asymptotic method anyway, like
> LRTs ... which means it is likely to have the same problems, but I don't
> think anyone has evaluated them. If you want to use the finite-size
> corrections (AICc) then you are back in the situation of guessing
> at residual degrees of freedom ...
>
> **Final Note Below - I have noticed a relative lack of discussion of
> Nelder et al's H-likelihood and their methods to generate a general
> method for all hierarchical modelling (HGLM?!). Would anybody be able
> to comment? A recent paper (http://www.springerlink.com/content/17p17r046lx4053r/fulltext.pdf)
> that is somewhat beyond my skills indicates the use of Laplace
> methods to estimate likelihoods in hierarchical models and various
> capacity for model inference.
>
> BMB: I think HGLMs are a very promising way of *estimating*
> the parameters of GLMMs (I too wonder why they don't seem to
> be discussed much), but I don't think they get us any farther forward
> with inference.
>
> Thanks again, in advance, to anyone who took this on..... apologies
> for any glaring errors or assignment of ideas to people incorrectly.
>
> Andrew
>
> ---------------------------------------------------------------------------------
> Dr. Andrew Beckerman
> Department of Animal and Plant Sciences, University of Sheffield,
> Alfred Denny Building, Western Bank, Sheffield S10 2TN, UK
> ph +44 (0)114 222 0026; fx +44 (0)114 222 0002
> http://www.beckslab.staff.shef.ac.uk/
> http://www.flickr.com/photos/apbeckerman/
> http://www.warblefly.co.uk