# You should turn in any R code that you used together with a separate text, word, or pdf file…

There is a data set in two formats, datasets.csv (a comma-delimited text file) and datasets.xlsx (an Excel file). The data in the files are identical. There are 4 groups of data sets: AA, BB, CC, and DD. Answer the questions below about each data set.

You should turn in any R code that you used together with a separate text, word, or pdf file containing the answers to the questions below. Have the file organized so that I can easily find your answer to each question. Please do not make me search through a large file with both your code and answers and certainly do not require me to run your code to find your answers.

**Subsamples**

You may have covered this in the lab sessions and have alternative techniques for getting subsamples. If not, several of the data sets require a random subsample of a given size and a regression run on that subsample. This can be accomplished with the sample() function in R. For instance, if df is a data frame with variables Y, X1, and X2, each with 1000 observations, and one needs a random subsample of size 50, then one could use the following code:

subsample <- sample(1:1000, 50)

model <- lm(Y ~ X1 + X2, data = df[subsample, ])

Alternatively, one could use the following code:

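subsample <- sample(1:1000, 50)

YSS <- df$Y[subsample]

X1SS <- df$X1[subsample]

X2SS <- df$X2[subsample]

model <- lm(YSS ~ X1SS + X2SS)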

This last bit of code is of use when doing weighted least squares on a subsample. There, one has to perform an auxiliary regression, which is difficult using the first form.
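As a self-contained illustration of the two forms, the sketch below builds a simulated stand-in for df (the coefficients and the seed are arbitrary assumptions, not the assignment data) and checks that both approaches give the same fit on the same subsample.

```r
# Simulated stand-in for df; the real data would be read from datasets.csv.
set.seed(123)                        # arbitrary seed so the draw is reproducible
df <- data.frame(X1 = rnorm(1000), X2 = rnorm(1000))
df$Y <- 1 + 2 * df$X1 - df$X2 + rnorm(1000)

# Draw 50 distinct row indices.
subsample <- sample(nrow(df), 50)

# Form 1: hand the selected rows to lm() directly.
model1 <- lm(Y ~ X1 + X2, data = df[subsample, ])

# Form 2: extract the subsampled variables first, which keeps them
# available for a later auxiliary regression.
YSS  <- df$Y[subsample]
X1SS <- df$X1[subsample]
X2SS <- df$X2[subsample]
model2 <- lm(YSS ~ X1SS + X2SS)

# The coefficients agree; only their labels differ.
same_fit <- all.equal(unname(coef(model1)), unname(coef(model2)))
```

Calling set.seed() before sample() is optional, but it makes the reported answers reproducible when the code is rerun.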

**Data set A**

A.1: Using the full sample, regress YAA on X1AA and X2AA and then test for heteroskedasticity using your preferred test. Is there any evidence of heteroskedasticity?
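One possible test for A.1 is the Breusch-Pagan test in its studentized (n times R-squared) form. The sketch below computes it in base R on simulated data; the helper name bp_test and the simulated variables are hypothetical, and the course may use a different implementation (for example bptest() from the lmtest package).

```r
# Studentized Breusch-Pagan statistic: n * R^2 from regressing the squared
# OLS residuals on the original regressors. Assumes the model has an
# intercept in the first column of its design matrix.
bp_test <- function(model) {
  X   <- model.matrix(model)[, -1, drop = FALSE]  # regressors, intercept dropped
  u2  <- resid(model)^2
  aux <- lm(u2 ~ X)                               # auxiliary regression
  stat <- length(u2) * summary(aux)$r.squared
  dof  <- ncol(X)
  c(statistic = stat, df = dof,
    p.value = pchisq(stat, dof, lower.tail = FALSE))
}

# Simulated example (not the assignment data).
set.seed(1)
n  <- 500
X1 <- runif(n, 1, 5)
X2 <- rnorm(n)
Y_homo   <- 1 + 2 * X1 - X2 + rnorm(n)             # constant error variance
Y_hetero <- 1 + 2 * X1 - X2 + rnorm(n, sd = X1^2)  # error sd grows with X1

bp_homo   <- bp_test(lm(Y_homo ~ X1 + X2))
bp_hetero <- bp_test(lm(Y_hetero ~ X1 + X2))
```

A small p-value is evidence against homoskedasticity; a large one only means the test found no variance pattern that is linear in the regressors.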

A.2: Compute the OLS estimates of the coefficients and their 95% confidence intervals from the regression in A.1 and report your results.

A.3: Take a random subsample of 20 observations from the full sample and repeat A.1 and A.2 for this subsample. Do the confidence intervals for the subset of observations contain the confidence intervals for the full sample?
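A minimal sketch of the comparison in A.2 and A.3, using confint() on simulated data (the data frame, coefficients, and seed are assumptions for illustration only). It checks, coefficient by coefficient, whether the subsample interval covers the full-sample interval.

```r
# Simulated stand-in for the AA data.
set.seed(42)
n  <- 1000
df <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
df$Y <- 1 + 2 * df$X1 - df$X2 + rnorm(n)

full_fit <- lm(Y ~ X1 + X2, data = df)
sub_fit  <- lm(Y ~ X1 + X2, data = df[sample(n, 20), ])

ci_full <- confint(full_fit, level = 0.95)   # columns: lower, upper
ci_sub  <- confint(sub_fit,  level = 0.95)

# TRUE where the subsample interval covers the entire full-sample interval.
contains <- ci_sub[, 1] <= ci_full[, 1] & ci_sub[, 2] >= ci_full[, 2]

# How much wider the subsample intervals are.
width_ratio <- (ci_sub[, 2] - ci_sub[, 1]) / (ci_full[, 2] - ci_full[, 1])
```

With only 20 observations the intervals are much wider, so they often (but not always) contain the full-sample intervals; the outcome depends on the particular draw.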

**Data set B**

B.1: Select a random subsample of 100 observations from the full sample. Run the regression of YBB on X1BB and X2BB. Test for heteroskedasticity using your preferred test and report the results. Compute the OLS estimates and their 95% confidence intervals. Did you use the robust standard errors or not? Why?
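If the test suggests heteroskedasticity, confidence intervals can be based on robust standard errors. The sketch below computes the HC1 variant by hand in base R; the helper name robust_ci and the simulated data are hypothetical, and in practice a package such as sandwich (vcovHC() with type = "HC1") is commonly used instead.

```r
# Heteroskedasticity-robust (HC1) confidence intervals computed by hand.
robust_ci <- function(model, level = 0.95) {
  X <- model.matrix(model)
  u <- resid(model)
  n <- nrow(X)
  k <- ncol(X)
  bread <- solve(crossprod(X))        # (X'X)^{-1}
  meat  <- crossprod(X * u)           # X' diag(u^2) X; row i of X scaled by u_i
  vcov_hc1 <- n / (n - k) * bread %*% meat %*% bread
  se   <- sqrt(diag(vcov_hc1))
  crit <- qt(1 - (1 - level) / 2, df = n - k)
  cbind(estimate = coef(model),
        lower    = coef(model) - crit * se,
        upper    = coef(model) + crit * se)
}

# Simulated example with error sd proportional to X1 (an assumption).
set.seed(3)
n  <- 100
X1 <- runif(n, 1, 5)
X2 <- rnorm(n)
Y  <- 1 + 2 * X1 - X2 + rnorm(n, sd = X1)

fit          <- lm(Y ~ X1 + X2)
ci_classical <- confint(fit)      # assumes homoskedastic errors
ci_robust    <- robust_ci(fit)    # valid under heteroskedasticity
```

The point estimates are identical in both cases; only the standard errors, and therefore the interval widths, change.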

B.2: Using the same subsample as in B.1, run the regression of YBB on X1BB, X2BB, and X3BB. Test for heteroskedasticity using your preferred test and report the results. Compute the OLS estimates and their 95% confidence intervals. Did you use the robust standard errors or not? Why?

B.3: Redo B.1 and B.2 using the full sample. Do you think any of the coefficients are biased if the variable X3BB is omitted, and if so, which ones? What is the correlation between the independent variables? Can it really be the case that there is no heteroskedasticity when X3BB is omitted, but there is heteroskedasticity when it is included? How?
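To see how an omitted variable can shift the other coefficients, the sketch below simulates a case where a third regressor is correlated with X1 but not with X2. The data-generating process here is an assumption chosen for illustration; the BB data may behave differently.

```r
# Simulated example: X3 is correlated with X1 and unrelated to X2.
set.seed(7)
n  <- 2000
X1 <- rnorm(n)
X2 <- rnorm(n)
X3 <- 0.8 * X1 + rnorm(n, sd = 0.6)
Y  <- 1 + 2 * X1 - X2 + 1.5 * X3 + rnorm(n)

# Pairwise correlations between the independent variables.
cor_matrix <- cor(cbind(X1, X2, X3))

# Short regression (X3 omitted) versus long regression (X3 included).
short <- coef(lm(Y ~ X1 + X2))
long  <- coef(lm(Y ~ X1 + X2 + X3))

# Change in the X1 and X2 coefficients caused by omitting X3.
shift <- short[c("X1", "X2")] - long[c("X1", "X2")]
```

In this simulation the X1 coefficient absorbs part of the effect of X3 when X3 is left out, while the X2 coefficient barely moves. Applying cor() to the actual regressors shows which coefficients are at risk in the real data.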

**Data set C**

C.1: Select a random subsample of 100 observations from the full sample. Run the regression of YCC on X1CC and X2CC. Test for heteroskedasticity using your preferred test and report the results. Compute the OLS estimates and their 95% confidence intervals. Did you use the robust standard errors or not? Why?

C.2: One can use feasible generalized least squares (FGLS) to try to correct for heteroskedasticity. In class, we used two different techniques for obtaining the weights:

- Regressing the log of the squared residuals on all the independent variables (FGLS.r, line 28)
- Regressing the absolute value of the residuals on all the independent variables (FGLS-2.r, line 29).

For the subsample in C.1, apply each of these techniques and report your coefficient estimates along with 95% confidence intervals.
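The two weighting schemes can be sketched as follows. This is not the code from FGLS.r or FGLS-2.r, which is not reproduced here; it is a base-R illustration on simulated data, and the variable names and data-generating process are assumptions.

```r
# Simulated example with error sd proportional to X1.
set.seed(11)
n  <- 400
X1 <- runif(n, 1, 4)
X2 <- runif(n, 1, 4)
Y  <- 1 + 2 * X1 - X2 + rnorm(n, sd = 0.5 * X1)

ols <- lm(Y ~ X1 + X2)
u   <- resid(ols)

# Technique 1: regress log(u^2) on the regressors; exp(fitted) estimates
# the error variance, so the weights are its reciprocal.
aux_log  <- lm(log(u^2) ~ X1 + X2)
w_log    <- 1 / exp(fitted(aux_log))
fgls_log <- lm(Y ~ X1 + X2, weights = w_log)

# Technique 2: regress |u| on the regressors; the fitted values estimate
# the error standard deviation, so the weights are 1 / fitted^2.
aux_abs <- lm(abs(u) ~ X1 + X2)
sd_hat  <- fitted(aux_abs)
sd_hat[sd_hat <= 0] <- min(sd_hat[sd_hat > 0])   # guard against nonpositive fits
w_abs    <- 1 / sd_hat^2
fgls_abs <- lm(Y ~ X1 + X2, weights = w_abs)

ci_log <- confint(fgls_log)
ci_abs <- confint(fgls_abs)
```

The log form always yields positive weights, whereas the absolute-value form can produce nonpositive fitted values that need handling, as in the guard above.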

C.3: Redo C.1 and C.2 with the full sample. Which of the two FGLS techniques do you prefer? Why?

C.4: Plot the residuals from the regression of YCC on X1CC and X2CC against X1CC and then against X2CC (there should be two plots and the residuals should be on the y-axis). Use these plots to explain why one of the FGLS techniques seems to be doing better.
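A minimal sketch of the two residual plots using base graphics, again on simulated data (the variable names and the variance pattern are assumptions, not the CC data):

```r
# Simulated example in which the error sd grows with X1 only.
set.seed(5)
n  <- 300
X1 <- runif(n, 1, 4)
X2 <- runif(n, 1, 4)
Y  <- 1 + 2 * X1 - X2 + rnorm(n, sd = 0.5 * X1)

fit <- lm(Y ~ X1 + X2)
res <- resid(fit)

# Two panels side by side, residuals on the y-axis in both.
op <- par(mfrow = c(1, 2))
plot(X1, res, xlab = "X1", ylab = "OLS residuals", main = "Residuals vs X1")
abline(h = 0, lty = 2)
plot(X2, res, xlab = "X2", ylab = "OLS residuals", main = "Residuals vs X2")
abline(h = 0, lty = 2)
par(op)                               # restore the previous layout
```

A fan or funnel shape in one panel suggests the error variance depends on that regressor, and the form of the spread (linear or curved) hints at which auxiliary regression fits it better.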

**Data set D**

D.1: Using the full sample, regress YDD on X1DD and X2DD and then test for heteroskedasticity using your preferred test. Is there any evidence of heteroskedasticity?

D.2: Plot the residuals from the regression in D.1 against X1DD and then against X2DD (there should be two plots and the residuals should be on the y-axis). Does either of these plots show evidence of heteroskedasticity? Why?

D.3: There is another test for heteroskedasticity due to White (1980). In the Breusch-Pagan test, one regresses the squared residuals on all the independent variables. In the White test, one regresses the squared residuals on all the independent variables, their squares, and their cross-products. The asymptotic distribution of the White test statistic is chi-square with the number of degrees of freedom equal to the number of regressors in the auxiliary regression. The following R code implements the White test in this example.

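modelDD <- lm(YDD ~ X1DD + X2DD)

X11DD <- X1DD^2

X22DD <- X2DD^2

X12DD <- X1DD * X2DD

white <- lm(resid(modelDD)^2 ~ X1DD + X2DD + X11DD + X22DD + X12DD)

white_stat <- length(resid(modelDD)) * summary(white)$r.squared

pval <- 1 - pchisq(white_stat, df = 5)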

cat("White test for heteroskedasticity\nW = ", white_stat, " p-value = ", pval, "\n\n")

Is there any evidence for heteroskedasticity with this test? Compute the OLS estimates and their 95% confidence intervals. Did you use the robust standard errors or not? Why? The true data generating process was

YDD = 2 + 3 * X1DD - X2DD

Do your confidence intervals contain the true coefficient values?

**Last Question:** Given what you learned in D.1 through D.3, can you figure out a feasible way to use weighted least squares to correct for the heteroskedasticity? If so, explain the correction and report the 95% confidence intervals for your technique. Do the confidence intervals contain the true coefficient values, and are they contained in the confidence intervals from D.3?

So, I need answers to all the questions here, and I also need the R code used to produce them (follow the instructions). I need the answers in the order of the questions.

I attach 3 files!
