Part of the end-of-term project requires providing answers to specific questions: correct answers will be marked as correct, incorrect ones will not. The rest will be graded on the basis of the student’s capacity to present and interpret statistical outputs.

The rationale behind the project:
• it will test all the stats skills you acquired this term, from formulating hypotheses to visualising relationships, running statistical analysis, and presenting and interpreting the results, as well as data management, such as recoding variables where needed;
• you have some freedom over the analysis you will run: you have to pick one of the 3 datasets available to you, and you get to pick the variables you will use in the analysis;
• rather than telling you exactly what methods to apply, you will need to think about the variables you are using and which statistical techniques are appropriate to test the relationship(s) between the variables you chose;
• think about it as a miniature research project, but one in which you don’t need a theory and literature review part.

A word on R code:
• It is not mandatory to add your R code to the assignment, but it is recommended. It does not count towards the word limit (which is not strict, anyway) and you will not be marked on it. However, it helps us when marking the assignment.
• If you produce your document in Word, then you can add the code at the end of the assignment.
• If you produce the assignment using RMarkdown, then you don’t need to include the code at the end, as it is part of the document.

Formulate hypotheses
1. Pick a dataset among gss, nes and world. Inspect it, and have a look at the variables it contains and at the codebook. Select an outcome and a predictor variable. These will be the central elements of your assignment. Remember that the outcome variable needs to be interval, ratio or high-level ordinal (what we call a continuous variable). Feel free to recode variables where you need to.
Formulate the working and the null hypotheses. (15 points)

Univariate statistics and visualisations
2. Describe the two variables. Create appropriate visualisations for each variable, accompanied by the appropriate descriptive statistics (hint: it all depends on the level of measurement). (15 points)

Visualise a bivariate relationship
3. Thinking about the type of variables you selected, create a graph that illustrates the relationship between your dependent and independent variables. Remember that visualisations have to be nice to look at, represent the data truthfully, and be clear and informative. In other words, do not forget to add titles, labels and so on. (15 points)

Hypothesis testing with a t-test or a non-parametric test
4. Test the hypothesis you formulated in Step 1 using a t-test or a non-parametric test, depending on which one is appropriate (hint: remember it depends on whether the variable is normally distributed or not). Report the test statistic and its associated p-value. Use the .05 cut-off point for statistical significance and interpret the results. (15 points)

Bivariate regression
5. Test the hypothesis you formulated in Step 1 using a regression model. Present the regression results in a table and interpret them. Use the .05 cut-off point for statistical significance. (15 points)

Multiple regression
6. Expand on the relationship you tested above by choosing another two variables that could improve your model. Feel free to recode variables.
6a. Create hypotheses for each new variable (and your outcome variable). (5 points)
6b. Present univariate analysis on the new variables (descriptive statistics and visualisations). (5 points)
6c. Run a regression model that includes the new variables. Present the regression results in a table and interpret them. Use the .05 cut-off point for statistical significance. Run regression diagnostics for your model and discuss whether your model respects OLS assumptions.
If it violates any assumptions, you need to indicate how you would fix the issue. You don’t need to re-run the model. (10 points)
6d. Compare the new regression model to the model from Step 5, using the appropriate statistical test. Report the results and interpret them. Is the second regression model more informative? (5 points)
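
As a rough illustration of how Steps 4 to 6d can fit together in R, the sketch below runs the whole sequence on a stand-in dataset. Everything specific in it is an assumption made for the example: it uses `mtcars` (which ships with base R) rather than gss, nes or world, it treats `mpg` as the outcome and the two-group variable `am` as the predictor, and it picks `wt` and `hp` as the two additional variables. It also assumes a Shapiro-Wilk test is an acceptable way to check normality. Your own variables and choices will differ.

```r
# Stand-in data: mtcars ships with base R. The course datasets are NOT used here.
dat <- mtcars
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Step 4: check normality of the outcome within each group, then choose the test.
shapiro_p <- tapply(dat$mpg, dat$am, function(x) shapiro.test(x)$p.value)
if (all(shapiro_p > 0.05)) {
  # No evidence against normality in either group: Welch two-sample t-test.
  step4 <- t.test(mpg ~ am, data = dat)
} else {
  # Otherwise fall back to the Wilcoxon rank-sum (Mann-Whitney) test.
  step4 <- wilcox.test(mpg ~ am, data = dat, exact = FALSE)
}
print(step4)

# Step 5: bivariate regression of the outcome on the predictor.
m1 <- lm(mpg ~ am, data = dat)
print(summary(m1))

# Step 6c: multiple regression adding two further (illustrative) variables.
m2 <- lm(mpg ~ am + wt + hp, data = dat)
print(summary(m2))

# One simple diagnostic: test whether the residuals look normally distributed.
print(shapiro.test(residuals(m2)))

# Step 6d: F-test comparing the nested models m1 and m2.
comparison <- anova(m1, m2)
print(comparison)
```

In the `anova()` output, a p-value below .05 in the second row suggests that the added variables improve the fit beyond the bivariate model. For a fuller set of diagnostics, base R's `plot()` method for `lm` objects (for example `plot(m2)`) draws the standard residual plots.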