R Code Applied to Statistics

David Casado de Lucas
Department of Statistics and Operational Research II
Faculty of Economic and Business Sciences, Complutense University of Madrid
24 March 2017

You can decide not to print this file and consult it in digital format; paper and ink will be saved. Otherwise, print it on recycled paper, double-sided and with less ink. Be ecological. Thank you very much.

Errata and linguistic errors are corrected as soon as possible. You may want to update (download and overwrite) the version of this file you might have.

Contents
Basic Use and Main Functions
Mathematical Analysis
  Functions
  Interpolation and Approximation Methods
Probability Theory
  Random Numbers
  Plotting Density and Distribution Functions
Inference Theory
  Sampling
  Correction Factor
Descriptive Statistics
  Some Measures and Plots
Point Estimations
  Basic Estimators
  Statistics to Study the Mean
  Moving Average
  Autocorrelation Functions
Confidence Intervals
  Confidence Intervals for the Mean
Hypothesis Tests
  Tests on the Mean
  Equivalence of the Two Methodologies
  Power Function
  Significance Level
Regression Methods
  Simple Linear Regression (SLR)
  SLR: Quantity of Data
  SLR: Quality of Data
  SLR: Y independent
  SLR: Robustness
  SLR: Transformation of Data
  SLR: Spurious Correlation
  Linear Regression Diagnosis
Appendixes
  Framework
  Real Data
  Package stats
  Package R Commander
  Some References

Prologue
This file contains the code that I am writing for teaching purposes. Users are assumed to have some basic, even if limited, knowledge of what the computer can be used for in Statistics.
I wrote some of these ideas for the practicals (in the appendixes at the end of each chapter) of:
http://www.casado-d.org/edu/NotesStatisticalInference-Slides.pdf

Acknowledgements
This document has been created with Linux, LibreOffice, OpenOffice, GIMP and R. I thank those who make this software available for free. I donate funds to these kinds of projects from time to time.

Note: When using the built-in functions of R, we assume that their help pages will be consulted.

References – My Documents
[1] A Brief Guide for Students. http://www.Casado-D.org/edu/GuideForStudents-Slides.pdf
[2] Notes of Probability Theory. http://www.Casado-D.org/edu/NotesProbabilityTheory-Slides.pdf
[3] Notes of Statistical Inference. http://www.Casado-D.org/edu/NotesStatisticalInference-Slides.pdf
[4] Solved Exercises and Problems of Statistical Inference. http://www.Casado-D.org/edu/ExercisesProblemsStatisticalInference.pdf
[5] R Code Applied to Statistics. http://www.Casado-D.org/edu/CodeAppliedToStatistics-Slides.pdf

Basic Use and Main Functions
  Some Functions
  Some Useful Methods and Prefixes
  How to Define Functions
    One Variable
    Several Variables
  Managing Data
  Managing Packages
  Managing Code
  Probability Theory
  Descriptive Statistics
  Statistical Inference

Some Functions
; ... ' #
getwd() # To see the current working directory
setwd() # To set the working directory
help(Distributions) # To consult the help related to 'Distributions'
?Distributions # To consult the help related to 'Distributions'
help.search('test') # To search the help system for 'test'
??Distributions # To search for the string 'Distributions' in the help pages
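As a small illustration of the working-directory functions, the following sketch (base R only; using tempdir() as a scratch location is an illustrative choice, not taken from the text) moves to R's temporary directory and back. It relies on the fact that setwd() invisibly returns the directory that was current before the change, which makes restoring it easy.

```r
# Store the current working directory
old_dir = getwd()

# setwd() changes the directory and invisibly returns the previous one
prev_dir = setwd(tempdir())   # tempdir(): R's temporary directory for this session
getwd()                       # now inside the temporary directory

# Return to the original working directory
setwd(prev_dir)
identical(getwd(), old_dir)   # back where we started
```

This pattern is handy inside scripts that need to work in another folder temporarily without leaving the session in an unexpected directory.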
apropos('print') # To search for functions containing 'print' in their name
library(help = 'stats') # To consult the help of the package 'stats'
ls() # To list all the variables existing at the moment
rm(namevariable) # To remove the variable 'namevariable'
remove(namevariable) # To remove the variable 'namevariable'
rm(list = ls()) # To remove all existing variables
length(namevector) # The length of the vector
dim(namearray) # The dimensions of the object
names(nameobject) # To access the names of the elements (fields) of the object
namefunction # To see the code of the function 'namefunction' (its name without parentheses)

a:b # To generate the sequence from a to b (in steps of 1)
seq() # To generate sequences of numbers
rep() # To repeat numbers, strings, etc.
max() # To identify the maximum
min() # To identify the minimum
sum() # To sum the elements
sort() # To sort the elements
rank() # To obtain the ranks of the elements
x[x>=0] # To select the elements that are nonnegative
which() # To get the indices of the elements satisfying a logical condition
vector() # To create a vector
matrix() # To create a matrix
array() # To create an array
is.numeric() # To check whether an object is numeric
as.numeric() # To convert into numeric
is.character() # To check whether an object is a string
as.character() # To convert into a string
is.infinite() # To check which elements are infinite (Inf or -Inf)
list() # To construct a list
unlist() # To flatten a list into a vector

c(object1,...,objectX) # To combine numbers, strings, etc.
c(3,5,NA,4) # There may be missing data (NA: 'not available')
cbind(object1,...,objectX) # To combine objects as columns
rbind(object1,...,objectX) # To combine objects as rows
paste() # To concatenate vectors after converting them into strings
apply(x1,n1,function1) # To apply a function over the margins of an array (n1 = 1 for rows, 2 for columns)
tapply(x1,list1,function1) # To apply a function to each group of x1 defined by the factors in list1
ave(x1,y1) # To replace each value of x1 by the mean of its group, the groups being defined by y1
by() # To apply a function to a data frame split by factors
tabulate() # To count the occurrences of each positive integer in a vector
table() # To create a contingency table (by using factors)
cumsum(), cumprod(), cummin(), cummax() # To get a vector of the cumulative sums, products, minima or maxima of the elements of the argument
x11() # To open a new graphics window (otherwise, the current one is used); dev.new() is the platform-independent alternative
plot() # To plot objects
matplot() # To plot the columns of one matrix against the columns of another
par(mfcol=c(4,2), lty=1, lwd=3, bty='n') # To change plotting parameters
Sys.sleep() # To suspend execution for a given number of seconds
call('namefunction', argument1, argument2) # To build a call to a function without executing it yet
eval(expression) # To evaluate an unevaluated expression or call; for a string s, use eval(parse(text = s))
assign('name', value, envir = .GlobalEnv) # To assign a value to a variable in an environment

Some Useful Methods and Prefixes
print.   as.   plot.   is.   summary.
For example: print.default(), as.numeric(), plot.default(), is.numeric(), summary.lm()

How to Define Functions
f1 = function(x) (1/2)^x
f2 = function(x) { a=2^2; a*x }
f3 = function(x) f1(x)+f2(x)

Some Functions of Our Own
Now we define some functions that will be used several times throughout this document:
ourSampleMean = function(x) sum(x)/length(x)
ourSampleSD1 = function(x) sqrt(mean((x-mean(x))^2))
ourSampleSD2 = function(x) sqrt(sum((x-sum(x)/length(x))^2)/length(x))
ourSampleQuasiSD1 = function(x) sqrt(mean((x-mean(x))^2)*length(x)/(length(x)-1))
ourSampleQuasiSD2 = function(x) sqrt(sum((x-sum(x)/length(x))^2)/(length(x)-1))
ourSampleSkewness1 = function(x) mean((x-mean(x))^3)/(mean((x-mean(x))^2)^1.5)
ourSampleSkewness2 = function(x) mean((x-mean(x))^3)/(sd(x)^3)
ourSampleKurtosis1 = function(x) mean((x-mean(x))^4)/(mean((x-mean(x))^2)^2)
ourSampleKurtosis2 = function(x) mean((x-mean(x))^4)/(sd(x)^4)
Because sd() divides by n - 1 instead of n, the second versions of the skewness and the kurtosis provide values slightly different from the first versions. Several estimators of the (population) skewness and kurtosis exist; some of them are biased, while others are simpler and “more natural”.
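The remark about sd() can be checked numerically. The following sketch reuses some of the functions defined above on a simulated sample (the seed and the sample size are arbitrary choices, not taken from the text). Since sd()^2 equals the mean squared deviation multiplied by n/(n - 1), the second versions should equal the first ones multiplied by ((n - 1)/n)^1.5 for the skewness and by ((n - 1)/n)^2 for the kurtosis.

```r
# Some of the document's own estimators (copied from the text above)
ourSampleMean      = function(x) sum(x)/length(x)
ourSampleSD1       = function(x) sqrt(mean((x-mean(x))^2))
ourSampleQuasiSD2  = function(x) sqrt(sum((x-sum(x)/length(x))^2)/(length(x)-1))
ourSampleSkewness1 = function(x) mean((x-mean(x))^3)/(mean((x-mean(x))^2)^1.5)
ourSampleSkewness2 = function(x) mean((x-mean(x))^3)/(sd(x)^3)
ourSampleKurtosis1 = function(x) mean((x-mean(x))^4)/(mean((x-mean(x))^2)^2)
ourSampleKurtosis2 = function(x) mean((x-mean(x))^4)/(sd(x)^4)

set.seed(1)      # arbitrary seed, only for reproducibility
x = rnorm(20)    # simulated sample, for illustration only
n = length(x)

all.equal(ourSampleMean(x), mean(x))
# sd() divides by n-1, so it coincides with the quasi-standard deviation...
all.equal(ourSampleQuasiSD2(x), sd(x))
# ...while the version dividing by n is sd() times sqrt((n-1)/n)
all.equal(ourSampleSD1(x), sd(x)*sqrt((n-1)/n))
# Consequently, the second versions are rescalings of the first ones
all.equal(ourSampleSkewness2(x), ourSampleSkewness1(x)*((n-1)/n)^1.5)
all.equal(ourSampleKurtosis2(x), ourSampleKurtosis1(x)*((n-1)/n)^2)
```

Each comparison should print TRUE; all.equal() is used instead of == because the two sides are computed with different floating-point operations. The rescaling factors tend to 1 as n grows, which is why the differences are only "slight" for large samples.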
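Users should consult the literature before relying on any of them in important tasks.

# Our mass and distribution functions for the negative hypergeometric distribution (for future use)
# X = number of draws, without replacement, needed to obtain the nu-th success
#     from a population of N elements that contains kappa successes
dnhyperOUR = function(x, N, kappa, nu) {
  (choose(x-1,nu-1)*choose(N-x,kappa-nu))/choose(N,kappa)
}
# The cumulative sums give the distribution function only when x is the
# consecutive sequence of values nu, nu+1, ... (the support starts at nu)
pnhyperOUR = function(x, N, kappa, nu) {
  cumsum(dnhyperOUR(x, N, kappa, nu))
}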
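Because pnhyperOUR() accumulates the mass over the values it receives, it returns the distribution function only when it is given the support in order. The following sketch checks that the mass function adds up to 1 and shows a hypothetical variant, pnhyperAny() (a name introduced here, not in the text), that evaluates P(X ≤ q) at arbitrary points by summing the mass from nu up to q. The parameter values N = 10, kappa = 4 and nu = 2 are illustrative only.

```r
# Mass and distribution functions as defined in the text
dnhyperOUR = function(x, N, kappa, nu) {
  (choose(x-1,nu-1)*choose(N-x,kappa-nu))/choose(N,kappa)
}
pnhyperOUR = function(x, N, kappa, nu) {
  cumsum(dnhyperOUR(x, N, kappa, nu))
}

# Hypothetical variant (not from the text): P(X <= q) at arbitrary points q,
# summing the mass over the support values nu, ..., q
pnhyperAny = function(q, N, kappa, nu) {
  sapply(q, function(qi) {
    if (qi < nu) return(0)                   # below the support
    upper = min(floor(qi), N - kappa + nu)   # do not go beyond the support
    sum(dnhyperOUR(nu:upper, N, kappa, nu))
  })
}

N = 10; kappa = 4; nu = 2                    # illustrative parameters only
support = nu:(N - kappa + nu)                # values that X can take

sum(dnhyperOUR(support, N, kappa, nu))       # total probability, should be 1
pnhyperOUR(support, N, kappa, nu)            # text's version on the whole support
pnhyperAny(support, N, kappa, nu)            # the same values
pnhyperAny(c(1, 3.5, 20), N, kappa, nu)      # points below, between and above the support
```

The variant trades the single cumsum() for one sum per requested point, which is slower for long vectors but does not depend on the order or spacing of the input.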
Description: