---
title: "Grading the Professor"
author: 'Group Names:'
date: 'Due: Monday, April 23 (in class)'
output:
  pdf_document: default
  html_document: default
  word_document: default
fig_width: 3
fig_height: 4
---

# Grading the Professor

Many college courses conclude by giving students the opportunity to evaluate the course and the instructor "anonymously". However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching-related characteristics, such as the physical appearance of the instructor. The article "Beauty in the classroom: instructors' pulchritude and putative pedagogical productivity" (Hamermesh and Parker, 2005) found that instructors who are viewed to be better looking tend to receive higher instructional ratings (Daniel S. Hamermesh, Amy Parker, Beauty in the classroom: instructors' pulchritude and putative pedagogical productivity, Economics of Education Review, Volume 24, Issue 4, August 2005, Pages 369-376).

In this lab we will analyze the data from this study in order to learn what goes into a positive professor evaluation.

## Data

The data were gathered from end-of-semester student evaluations for a large sample of professors from the University of Texas at Austin. In addition, six students rated the professors' physical appearance. The result is a data frame where each row contains a different course and columns represent variables about the course and its associated professor.
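Before turning to the codebook below, it can help to confirm this one-row-per-course structure directly. A minimal sketch (it loads the same `evals.csv` file that the next chunk reads):

```{r}
## sketch: each row is a course, each column a variable about it
evals <- read.csv("evals.csv", header = TRUE)
dim(evals)    ## number of courses (rows) and variables (columns)
names(evals)
```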
```{r, tidy = TRUE}
evals <- read.csv("evals.csv", header = TRUE)
```

variable \hspace{4cm} | description \hspace{4cm}
-----------------------------|--------------------------
score | average professor evaluation score: (1) very unsatisfactory - (5) excellent \vspace{0.25cm}
rank | rank of professor: teaching, tenure track, tenured \vspace{0.25cm}
ethnicity | ethnicity of professor: not minority, minority \vspace{0.25cm}
gender | gender of professor: female, male \vspace{0.25cm}
language | language of school where professor received education: English or non-English \vspace{0.25cm}
age | age of professor \vspace{0.25cm}
cls_perc_eval | percent of students in class who completed evaluation \vspace{0.25cm}
cls_did_eval | number of students in class who completed evaluation \vspace{0.25cm}
cls_students | total number of students in class \vspace{0.25cm}
cls_level | class level: lower, upper \vspace{0.25cm}
cls_profs | number of professors teaching sections in course in sample: single, multiple \vspace{0.25cm}
cls_credits | number of credits of class: one credit (lab, PE, etc.), multi credit \vspace{0.25cm}
bty_f1lower | beauty rating of professor from lower-level female: (1) lowest - (10) highest \vspace{0.25cm}
bty_f1upper | beauty rating of professor from upper-level female: (1) lowest - (10) highest \vspace{0.25cm}
bty_f2upper | beauty rating of professor from second upper-level female: (1) lowest - (10) highest \vspace{0.25cm}
bty_m1lower | beauty rating of professor from lower-level male: (1) lowest - (10) highest \vspace{0.25cm}
bty_m1upper | beauty rating of professor from upper-level male: (1) lowest - (10) highest \vspace{0.25cm}
bty_m2upper | beauty rating of professor from second upper-level male: (1) lowest - (10) highest \vspace{0.25cm}
bty_avg | average beauty rating of professor \vspace{0.25cm}
pic_outfit | outfit of professor in picture: not formal, formal \vspace{0.25cm}
pic_color | color of professor's picture: color, black & white

## Exploring the Data

1. __Is this an observational study or an experiment? The original research question posed in the paper is whether beauty leads directly to differences in course evaluations. Given the study design, is it possible to answer this question as it is phrased? If not, rephrase the question.__

\pagebreak

2. __Based on the plot below, describe the distribution of evaluation scores. Is the distribution skewed? What does that tell you about how students rate courses? Is this what you expected to see? Why, or why not?__

\vspace{2cm}

```{r, fig.align = 'center', out.width = '0.75\\linewidth'}
hist(evals$score, xlab = "Evaluation Scores", main = "", nclass = 25)
```

## Simple Linear Regression

The fundamental phenomenon suggested by the study is that better-looking teachers are evaluated more favorably. Let's create a scatterplot to see if this appears to be the case:

```{r, fig.align = 'center', out.width = '0.75\\linewidth'}
plot(jitter(evals$score) ~ evals$bty_avg,
     xlab = "Average Beauty Score", ylab = "Evaluation Score")
```

3. __What relationship do you see in the scatterplot above?__

\vspace{2cm}

Let's see if the apparent trend in the plot is something more than natural variation. Fit the linear model called `m_bty` to predict average professor score from average beauty rating.

\vspace{0.5cm}

```{r}
m_bty <- lm(score ~ bty_avg, data = evals)
summary(m_bty)$coefficients
```

\vspace{0.5cm}

Now, we can add this regression line to the scatterplot using `abline(m_bty)`.

\vspace{0.5cm}

```{r, fig.align = 'center', out.width = '0.75\\linewidth'}
plot(jitter(evals$score) ~ evals$bty_avg,
     xlab = "Average Beauty Score", ylab = "Evaluation Score")
abline(m_bty)
```

\vspace{0.5cm}

4. __Write out the *estimated* equation for the linear model *and* interpret the slope.__

\vspace{4cm}

5. __Is average beauty score a "statistically significant" predictor? Does it appear to be a practically significant predictor?
(hint: we describe predictors as "practically significant" if they have a "large" estimated effect)__

\pagebreak

6. __Use diagnostic plots and critical thinking to evaluate whether the conditions of simple linear regression are reasonably satisfied. Provide plots and comments for each one.__

\vspace{0.5cm}

```{r, fig.width = 8, fig.height = 5}
par(mfrow = c(2, 2))
plot(m_bty)
```

* __Independence:__
\vspace{1cm}
* __Normality of Residuals:__
\vspace{1cm}
* __Constant Variance:__
\vspace{1cm}
* __Linear Relationship:__
\vspace{1cm}
* __No Influential Observations:__
\vspace{1cm}
* __No Multicollinearity:__

\pagebreak

## Multiple Linear Regression

The data set contains several variables on the beauty score of the professor: individual ratings from each of the six students who were asked to score the physical appearance of the professors, and the average of these six scores. Let's take a look at the relationship between one of these scores and the average beauty score.

\vspace{0.5cm}

```{r}
plot(evals$bty_avg ~ evals$bty_f1lower,
     xlab = "Lower Level Beauty Rating", ylab = "Average Beauty Score")
cor(evals$bty_avg, evals$bty_f1lower)
```

\vspace{0.5cm}

As expected, the relationship is quite strong - after all, the average score is calculated using the individual scores. We can actually take a look at the relationships between all beauty variables (columns 13 through 19) by making a scatterplot matrix.

\pagebreak

```{r, warning = FALSE}
library(psych)
pairs.panels(evals[, 13:19], ellipses = FALSE)
```

7. __What statistical term do we use to describe when there are "large" correlations between explanatory variables?__

\vspace{1cm}

## The Search for the Best Model

We will start with a full model that predicts professor score based on all of the available quantitative predictors: age, the proportion of students that filled out evaluations, the number of students that did the evaluation, class size, and all 7 of the available beauty ratings. Let's run the model.
```{r, tidy = TRUE}
m_full <- lm(score ~ age + cls_perc_eval + cls_did_eval + cls_students +
               bty_f1lower + bty_f1upper + bty_f2upper +
               bty_m1lower + bty_m1upper + bty_m2upper + bty_avg,
             data = evals)
summary(m_full)
```

\vspace{0.5cm}

8. __What do you notice about the standard errors for the beauty variables?__

\vspace{2cm}

When the explanatory variables are highly correlated with each other, the standard errors are inflated. We can actually measure how much each of the standard errors is inflated because of multicollinearity with the other variables in the model, using what are called *variance inflation factors* (or VIFs). VIFs provide a way to assess the multicollinearity in the MLR model that is caused by including specific explanatory variables. The amount of information that is shared between a single explanatory variable and the others can be found by regressing that variable on the others and calculating the $R^2$ for that model. The easy way to obtain VIFs is with the `vif` function from the `car` package (Fox, 2003). Run the following code!

```{r, message = FALSE, warning = FALSE}
library(car)
vif(m_full)  ## VIFs
```

\pagebreak

Basically, large VIFs are bad: the rule of thumb is that values over 5 or 10 are considered "large", indicating high multicollinearity in the model for __that particular variable__. We use this scale to determine whether multicollinearity is a problem for a variable of interest.

```{r}
sqrt(vif(m_full))  ## square roots of VIFs
```

If we take the square root of the VIF next to each variable, we can interpret this number as "the number of times larger the standard error for the slope for __that variable__ is, due to collinearity with the other variables in the model."

9. __Based on the above VIF output, which variables have "large" multicollinearity problems?__

\vspace{2cm}

Having more than one of the beauty variables in the model seems like a bad choice, since they are all highly correlated with each other.
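To see where these numbers come from, the VIF for a single variable can be computed by hand from the $R^2$ described above. A sketch for `bty_avg`, assuming `evals` and `m_full` have been created as in the earlier chunks:

```{r}
## regress bty_avg on the other ten predictors from the full model
r2 <- summary(lm(bty_avg ~ age + cls_perc_eval + cls_did_eval + cls_students +
                   bty_f1lower + bty_f1upper + bty_f2upper +
                   bty_m1lower + bty_m1upper + bty_m2upper,
                 data = evals))$r.squared
1 / (1 - r2)  ## VIF = 1 / (1 - R^2); matches the bty_avg entry from vif(m_full)
```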
In this application, and with these highly correlated predictors, I would choose the average beauty score as the single representative of these variables. Since the correlations between the individual beauty variables and the average beauty score are the highest (all correlations > 0.75), it seems to be a reasonable choice.

10. __Drop all of the beauty variables except the average beauty score from the model.__

```{r, tidy = TRUE}
## new MLR model with ONLY bty_avg, cls_perc_eval, cls_did_eval, cls_students, age AS EXPLANATORY VARIABLES
```

11. __Did the standard errors of the explanatory variables change? How did they change?__

\vspace{2cm}

12. __Drop the variable with the highest p-value (as long as it is above 0.05) and re-fit the model. Did the coefficients and significance of the other explanatory variables change? If not, what does this say about whether or not the dropped variable was collinear with the other explanatory variables?__

\vspace{4cm}

```{r}
# run a model with the variables you left in
# run a summary of your new model
```

A model selection method used in statistics is called "backward selection". The process is as follows:

* Fit the full model (all possible quantitative variables).
* Find the p-value of each variable in the model.
* Delete the __one__ variable with the __largest__ p-value, as long as that p-value is larger than your specified significance level (say, $\alpha = 0.05$).
* Re-fit the model without the deleted variable.
* Find the p-value of each variable in the __new__ model.
* Delete the __one__ variable with the __largest__ p-value, as long as that p-value is larger than your specified significance level.

This process continues until __all__ of the variables included in the model have p-values less than your specified significance level (say, $\alpha = 0.05$).

13. __Using this procedure as the selection method, determine the best model.
You do not need to show all steps in your answer, just the output for the final model.__

```{r}
## code for model fitting here!
## include your code for the FINAL model you choose!
```

14. __Based on your final model, describe the characteristics of a professor and course at the University of Texas at Austin that would be associated with a high evaluation score.__

\vspace{4cm}

15. __The original paper describes how these data were gathered by taking a sample of professors from the University of Texas at Austin and including all courses that they have taught. Considering that each row represents a course, could this new information have an impact on any of the conditions of linear regression?__
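For reference, the backward-selection procedure described above can be automated with a short loop. This is only a sketch, starting from the candidate predictors of question 10; the stopping rule and $\alpha = 0.05$ follow the bulleted steps:

```{r}
## sketch: backward selection by largest p-value, alpha = 0.05
vars <- c("bty_avg", "cls_perc_eval", "cls_did_eval", "cls_students", "age")
repeat {
  fit <- lm(reformulate(vars, response = "score"), data = evals)
  pvals <- summary(fit)$coefficients[-1, 4]  ## p-values, intercept row dropped
  if (max(pvals) < 0.05) break               ## all predictors significant: stop
  vars <- vars[-which.max(pvals)]            ## otherwise drop the weakest one
}
summary(fit)$coefficients
```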