STAT 431 Exam 1

For each task you should:

· Summarize your data in verbal, tabular, or graphical form, and perform exploratory data analysis (EDA).

· Explain what the testing problem is about.

· Clearly state the null hypothesis and what its rejection means.

XXXXXXXXXX points) Returns on the major U.S. stock exchanges (New York Stock Exchange (NYSE), American Stock Exchange (AMEX), and NASDAQ) for the period 12/31/1926 through 12/31/2018 are in the file exam2.indexReturns.csv in this exam's data directory on Canvas (Canvas/Files/Data/_Exam 2 Data). NOTE: For your convenience, we also provide the long-form dataset that R requires, in the file exam2.indexReturns.long.csv.

Universe refers to the major U.S. stock exchanges (NYSE, AMEX, and NASDAQ), each of which lists and trades thousands of securities. The return data are the annual percent returns of each exchange's index (NYSE, AMEX, NASDAQ) for the indicated year-end date.

There are four ways to calculate index returns; these are the levels of the "type" factor. Each level corresponds to a different scheme for weighting the constituent stocks when computing the index level, from which the annual returns (in percent) are derived. The levels are:

· vwretd: market capitalization-weighted return with dividends

· vwretx: market capitalization-weighted return without dividends

· ewretd: equal-weighted return with dividends

· ewretx: equal-weighted return without dividends
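The four weighting schemes can be illustrated with a simplified, one-period Python sketch. It assumes hypothetical stocks with a known start-of-period market capitalization, price return, and dividend yield (all in percent). The exam does not describe the actual index methodology (rebalancing, timing, dividend reinvestment), so this shows only the basic weighting idea.

```python
def index_returns(price_ret, div_yield, market_cap):
    """One-period index return (percent) under each of the four 'type' schemes.

    price_ret, div_yield: per-stock price return and dividend yield, in percent.
    market_cap: per-stock market capitalization at the start of the period.
    Simplified illustration only; not the official index methodology.
    """
    n = len(price_ret)
    total_cap = sum(market_cap)
    w_vw = [c / total_cap for c in market_cap]          # cap weights
    w_ew = [1 / n] * n                                   # equal weights
    total_ret = [p + d for p, d in zip(price_ret, div_yield)]  # with dividends

    def weighted(w, r):
        return sum(wi * ri for wi, ri in zip(w, r))

    return {
        "vwretd": weighted(w_vw, total_ret),
        "vwretx": weighted(w_vw, price_ret),
        "ewretd": weighted(w_ew, total_ret),
        "ewretx": weighted(w_ew, price_ret),
    }

# Hypothetical: one large stock (cap 900) and one small stock (cap 100).
print(index_returns([10.0, 30.0], [2.0, 0.0], [900.0, 100.0]))
```

Equal weighting gives the small stock's 30% price return far more influence than cap weighting does. This is one reason the EW and MW series can differ systematically.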

The questions we seek to answer include:

· Is there a difference across the universes considered?

· Is there a difference between types of market returns, i.e., EW (equal-weighted) and MW (market capitalization-weighted), with and without dividends?

· Are there interactions between the universe and the return type, and what is their meaning?

In addition, please answer the following questions (10 points each):

a. Obtain the mean, median, and geometric mean (CAGR) annual returns for each universe and return type. Discuss these results.

b. Describe the data: time range, frequency, summary statistics, etc. Note that R's Anova and other functions require long-form data, which has been provided to you.

c. Check all major parametric Anova assumptions. You are familiar with normality and homogeneity-of-variance (HOV) diagnostics; independence can be checked with a runs test. Be sure to order the levels of your universe factor as NYSE, AMEX, and NASDAQ; otherwise R orders them alphabetically, and we want to make comparisons relative to the NYSE.

d. Assuming you reject the omnibus hypothesis (be sure to state it), perform post-hoc tests to determine which regions of the market differ significantly.

e. Using R's {pwr} package, an online power calculator, or G*Power (freely available for Windows and macOS; see the tutorial provided in the exam data directory), perform a power analysis for this problem. What power did we achieve with these data? Determine the sample size required to detect a ±4% difference in mean returns with a probability (power) of 80%.
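For item a, the arithmetic behind the three summaries can be sketched in Python for one universe/type series (the exam itself expects R). The sketch assumes returns are stored in percent, as the data description states. It computes CAGR as the geometric mean of the growth factors 1 + r/100, minus 1.

```python
import math
import statistics

def summarize_returns(pct_returns):
    """Mean, median, and geometric mean (CAGR) of annual percent returns.

    Assumes returns are in percent (e.g. 12.5 means +12.5%).
    CAGR = (prod(1 + r_t/100))^(1/T) - 1, reported back in percent.
    """
    growth = [1 + r / 100 for r in pct_returns]
    if any(g <= 0 for g in growth):
        raise ValueError("a return of -100% or worse makes CAGR undefined")
    # Averaging log growth avoids overflow/underflow from multiplying ~90 factors.
    mean_log = statistics.fmean([math.log(g) for g in growth])
    return {
        "mean": statistics.fmean(pct_returns),
        "median": statistics.median(pct_returns),
        "cagr": (math.exp(mean_log) - 1) * 100,
    }

# Hypothetical three-year series, not taken from the exam data.
print(summarize_returns([10.0, -5.0, 20.0]))
```

The CAGR can never exceed the arithmetic mean (AM-GM inequality). With volatile returns the gap between the two can be substantial, which is worth discussing in your answer.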
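For item c, the logic of a runs test for independence can be shown as a minimal Python sketch: the Wald-Wolfowitz test about the median, using the normal approximation. It assumes you apply it to a series in time order, for example one group's residuals by year. Dropping values equal to the median is one common convention, and R implementations may handle ties differently.

```python
import math
import statistics

def runs_test(x):
    """Two-sided Wald-Wolfowitz runs test about the median.

    Returns (number of runs, z statistic, approximate p-value).
    Values equal to the median are dropped.
    """
    med = statistics.median(x)
    signs = [v > med for v in x if v != med]
    n1 = sum(signs)            # count above the median
    n2 = len(signs) - n1       # count below the median
    if n1 == 0 or n2 == 0:
        raise ValueError("need observations on both sides of the median")
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    n = n1 + n2
    mu = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mu) / math.sqrt(var)
    p = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
    return runs, z, p

print(runs_test([1, 3, 1, 3, 1, 3, 1, 3]))  # too many runs: alternation
print(runs_test([1, 1, 1, 1, 3, 3, 3, 3]))  # too few runs: clustering
```

Too many runs (positive z) suggests negative autocorrelation. Too few runs (negative z) suggests positive autocorrelation or trend. Either way, a small p-value is evidence against independence.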
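For item e, a rough cross-check of the required sample size is possible with the normal-approximation formula for a two-sided, two-group comparison of means. This is a swapped-in simplification: {pwr}'s pwr.t.test() uses the noncentral t and typically gives a slightly larger n, and an Anova-based power analysis uses the noncentral F. The standard deviation below is a hypothetical placeholder, not an estimate from the exam data.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate per-group n to detect a mean difference `delta`.

    Normal approximation, two-sided test, common SD `sigma`:
    n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2, rounded up.
    """
    z = NormalDist().inv_cdf
    n = 2 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2
    return math.ceil(n)

# Hypothetical: detect a 4-percentage-point difference when sigma = 20.
print(n_per_group(delta=4, sigma=20))
```

Because n scales with (sigma/delta)², the noisy annual returns typical of equity indexes push the required number of years up quickly.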

2. (40 points) An observational study obtained sound pressure level (SPL) data, in decibels, for an exponential horn (a type of loudspeaker) as a function of distance from the audio source. The data are available in Canvas/Files/Data/_Exam 2 Data/spl.txt.

a. Prepare a comparative analysis of different regression models for these data. Consider the linear model as well as curvilinear models such as polynomial, spline, and LOESS (local) regression. Evaluate the model fits and state which models best capture the data-generating process. Your quantitative criteria should include MAE and RMS error, and you should give your recommendation for the final model. Devise a tabular comparison to facilitate review of your models.

b. Unfortunately, the SPL sensor is subject to random bursts of energy, which cause occasional abnormally high readings. The data including these bad readings are in the file spl.contaminated.txt. After inspecting these data visually, repeat your previous models and also include robust regression, such as quantile regression (at least try the median) and Kendall-Theil regression. Be sure to provide a pseudo R² (one option is the nagelkerke function in the {rcompanion} package). Perform a complete analysis and comparison, and give your conclusions and recommendation for the best model to use for this system.
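For item a, the MAE/RMSE comparison table can be sketched in plain Python. It assumes each model's fitted values are already available; the model names and numbers below are hypothetical. These are in-sample errors, which tend to favor the most flexible fits, so a hold-out or cross-validated version is often more informative.

```python
import math

def mae(y, yhat):
    """Mean absolute error between observations y and fitted values yhat."""
    if len(y) != len(yhat):
        raise ValueError("y and yhat must have the same length")
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    """Root-mean-square error between observations y and fitted values yhat."""
    if len(y) != len(yhat):
        raise ValueError("y and yhat must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def comparison_table(y, fits):
    """Rows of (model name, MAE, RMSE), sorted from lowest to highest RMSE."""
    rows = [(name, mae(y, f), rmse(y, f)) for name, f in fits.items()]
    return sorted(rows, key=lambda row: row[2])

# Hypothetical observations and fitted values from two models.
y = [1.0, 2.0, 3.0]
fits = {"model_A": [1.0, 2.0, 5.0], "model_B": [2.0, 3.0, 4.0]}
print(f"{'model':<10}{'MAE':>8}{'RMSE':>8}")
for name, m, r in comparison_table(y, fits):
    print(f"{name:<10}{m:>8.3f}{r:>8.3f}")
```

The two criteria can rank models differently. Here model_A has the smaller MAE but the larger RMSE, because RMSE penalizes its single large miss more heavily. This matters when contaminated high readings are present.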
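For item b, the core idea of Kendall-Theil (Theil-Sen) regression is that the slope is the median of all pairwise slopes, which makes it resistant to a few abnormally high readings. A minimal Python sketch follows. Taking the intercept as the median of y - slope·x is one common convention, and the data are hypothetical.

```python
import statistics
from itertools import combinations

def kendall_theil(x, y):
    """Kendall-Theil line fit: returns (intercept, slope).

    slope = median of pairwise slopes (pairs with tied x are skipped);
    intercept = median of (y - slope * x), one common convention.
    """
    pts = list(zip(x, y))
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(pts, 2) if x2 != x1]
    slope = statistics.median(slopes)
    intercept = statistics.median([yi - slope * xi for xi, yi in pts])
    return intercept, slope

# Hypothetical points on y = 1 + 2x, with one contaminated high reading.
print(kendall_theil([1, 2, 3, 4, 5], [3, 5, 7, 9, 100]))  # → (1.0, 2.0)
```

The single spike at x = 5 would pull an ordinary least-squares slope far upward. The median of pairwise slopes ignores it here.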

3. This question concerns expected mean squares. One way to state the Anova hypothesis is in terms of the expected mean squares for treatment and error. Although you will rarely need to know the expected value of MST or MSE, it is important to see that both expected values are the same when the null hypothesis is true and that the expected value of MST is larger when the null hypothesis is false.

Under $H_0$, $E(MS_{tmt}) = E(MS_{err}) = \sigma^2$, and under $H_1$, $E(MS_{tmt}) > \sigma^2$, so that the resulting F ratio can increase. You may recall that

$$SS_{tmt} = \sum_i n_i(\bar{x}_i - \bar{x})^2 = \sum_i n_i \bar{x}_i^2 - n\bar{x}^2.$$
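The two forms of the treatment sum of squares can be checked numerically with a minimal Python sketch. Here n is the total sample size across groups, and the group data are arbitrary illustration values.

```python
def ss_treatment(groups):
    """Return SS_tmt computed by the deviation form and the computing form."""
    n = sum(len(g) for g in groups)                 # total sample size
    grand = sum(sum(g) for g in groups) / n         # grand mean
    means = [sum(g) / len(g) for g in groups]       # group means
    deviation_form = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    computing_form = sum(len(g) * m ** 2 for g, m in zip(groups, means)) - n * grand ** 2
    return deviation_form, computing_form

print(ss_treatment([[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]))
```

Both forms agree, including with unequal group sizes. The computing form is often easier when taking expectations term by term.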

a. For a single factor Anova, find/derive E(MSerr).

b. Find/derive E(MStmt) both when H0 is true and when it is false.
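A simulation can illustrate the claim that both expected mean squares equal σ² under H0, while E(MStmt) grows when H0 is false. The Python sketch below assumes normal errors, equal group sizes, σ = 1, and hypothetical treatment effects. It illustrates the result; it is not a derivation.

```python
import random
import statistics

def mean_squares(groups):
    """Return (MS_tmt, MS_err) for a single-factor layout."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [statistics.fmean(g) for g in groups]
    ss_tmt = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_err = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return ss_tmt / (k - 1), ss_err / (n - k)

def simulate(effects, n_per=10, sigma=1.0, reps=2000, seed=1):
    """Average MS_tmt and MS_err over many simulated data sets."""
    rng = random.Random(seed)
    tmt, err = [], []
    for _ in range(reps):
        groups = [[rng.gauss(tau, sigma) for _ in range(n_per)] for tau in effects]
        ms_t, ms_e = mean_squares(groups)
        tmt.append(ms_t)
        err.append(ms_e)
    return statistics.fmean(tmt), statistics.fmean(err)

print(simulate([0, 0, 0]))    # H0 true: both averages near sigma^2 = 1
print(simulate([-1, 0, 1]))   # H0 false: average MS_tmt well above 1
```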

4. Polynomial regression with interactions. This problem uses the pollution dataset exam2.pollute.txt in the exam data directory.

Find a parsimonious model for these data, and write the estimating equation for your final model. Use R² and AIC to evaluate your stopping point. Do not rely blindly on stepwise search. Consider including polynomial predictors, and be sure to consider interactions.

Fully interpret your resulting model. You should be able to obtain an R² of at least 0.76 with an AIC of 326 or lower.
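The two stopping criteria can be computed directly from a least-squares fit's observed and fitted values, shown here as a minimal Python sketch. The AIC uses the Gaussian log-likelihood with one extra parameter for the error variance. To our understanding, this matches how R's AIC() treats lm fits, but confirm against AIC() on your own model. The numbers are hypothetical.

```python
import math

def r2_and_aic(y, fitted, n_coef):
    """R^2 and Gaussian AIC for an ordinary least-squares fit.

    n_coef counts regression coefficients including the intercept;
    one more parameter is added for the error variance.
    """
    n = len(y)
    ybar = sum(y) / n
    rss = sum((a - b) ** 2 for a, b in zip(y, fitted))
    tss = sum((a - ybar) ** 2 for a in y)
    loglik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    aic = -2 * loglik + 2 * (n_coef + 1)
    return 1 - rss / tss, aic

# Hypothetical straight-line fit (intercept + slope = 2 coefficients).
print(r2_and_aic([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8], n_coef=2))
```

Lower AIC is better. Adding a polynomial or interaction term always raises R² but increases AIC's penalty by 2, so a term earns its place only if it reduces the deviance by more than that. AIC values are comparable only across models fit to the same response and observations.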
