STA 4504/5503: Outline of Lecture Notes, © Alan Agresti

Categorical Data Analysis

Introduction

Methods for a response (dependent) variable Y having a
• scale that is a set of categories
• Explanatory variables may be categorical or continuous or both

Example
Y = vote in election (Democrat, Republican, Independent)
x's: income, education, gender, race

Two types of categorical variables
• Nominal: unordered categories
• Ordinal: ordered categories

Examples

Ordinal
• patient condition (excellent, good, fair, poor)
• government spending (too high, about right, too low)

Nominal
• transport to work (car, bus, bicycle, walk, . . . )
• favorite music (rock, classical, jazz, country, folk, pop)

We pay special attention to binary variables (success/failure), for which the nominal-ordinal distinction is unimportant.

Probability Distributions for Categorical Data

The binomial distribution (and its generalization, the multinomial distribution) plays the role for a categorical response that the normal distribution plays for a continuous response.

Binomial Distribution
• n Bernoulli trials: two possible outcomes for each (success, failure)
• π = P(success), 1 − π = P(failure) for each trial
• Y = number of successes out of n trials
• Trials are independent

Then Y has the binomial distribution

    P(y) = \frac{n!}{y!\,(n-y)!}\,\pi^{y}(1-\pi)^{n-y}, \qquad y = 0, 1, 2, \ldots, n,

where y! = y(y-1)(y-2)\cdots(1), with 0! = 1 (factorial).

Example
Vote (Democrat, Republican). Suppose π = P(Democrat) = 0.50. For a random sample of size n = 3, let y = number of Democratic votes. Then

    p(y) = \frac{3!}{y!\,(3-y)!}\,(0.5)^{y}(0.5)^{3-y}

    p(0) = \frac{3!}{0!\,3!}\,(0.5)^{0}(0.5)^{3} = (0.5)^{3} = 0.125

    p(1) = \frac{3!}{1!\,2!}\,(0.5)^{1}(0.5)^{2} = 3(0.5)^{3} = 0.375

    y    P(y)
    0    0.125
    1    0.375
    2    0.375
    3    0.125
         1.000

Note
• E(Y) = nπ
• Var(Y) = nπ(1 − π), so σ(Y) = \sqrt{nπ(1 − π)}
• p = Y/n = proportion of successes (also denoted π̂)
• E(p) = E(Y/n) = π, and σ(p) = \sqrt{π(1 − π)/n}
• When each trial has more than 2 possible outcomes, the numbers of outcomes in the various categories have a multinomial distribution.

Inference for a Proportion

We conduct inference about parameters using maximum likelihood.

Definition: The likelihood function is the probability of the observed data, expressed as a function of the parameter value.

Example: Binomial, n = 2, observe y = 1:

    p(1) = \frac{2!}{1!\,1!}\,\pi^{1}(1-\pi)^{1} = 2\pi(1-\pi) = \ell(\pi),

the likelihood function, defined for π between 0 and 1.

If π = 0, the probability of getting y = 1 is ℓ(0) = 0.
If π = 0.5, the probability of getting y = 1 is ℓ(0.5) = 0.5.

Definition: The maximum likelihood (ML) estimate is the parameter value at which the likelihood function takes its maximum.

Example: ℓ(π) = 2π(1 − π) is maximized at π̂ = 0.5; that is, y = 1 success in n = 2 trials is most likely when π = 0.5. The ML estimate of π is π̂ = 0.50.

Note
• For the binomial, π̂ = y/n = proportion of successes.
• If y_1, y_2, \ldots, y_n are independent observations from a normal distribution (or from many other distributions, such as the Poisson), the ML estimate of the mean is μ̂ = ȳ.
• In ordinary regression (Y ~ normal), "least squares" estimates are ML.
• For large n, for any distribution, ML estimates are optimal (no other estimator has smaller standard error).
• For large n, ML estimators have approximately normal sampling distributions (under weak conditions).

ML Inference about a Binomial Parameter

    π̂ = p = y/n

Recall that E(p) = π and σ(p) = \sqrt{π(1 − π)/n}.

Note
• σ(p) decreases as n increases, so p → π (law of large numbers; true in general for ML estimators).
• p is a sample mean of (0,1) data, so by the Central Limit Theorem the sampling distribution of p is approximately normal for large n (true in general for ML estimators).
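
Worked computational sketches

The binomial probability table above (n = 3, π = 0.5) can be checked numerically. A minimal Python sketch, not part of the original notes; the helper name binom_pmf is just illustrative:

    from math import comb

    def binom_pmf(y, n, pi):
        # P(y) = [n! / (y! (n-y)!)] * pi^y * (1 - pi)^(n - y)
        return comb(n, y) * pi**y * (1 - pi)**(n - y)

    n, pi = 3, 0.5
    for y in range(n + 1):
        print(y, binom_pmf(y, n, pi))   # 0.125, 0.375, 0.375, 0.125

    # Moments: E(Y) = n*pi and Var(Y) = n*pi*(1 - pi)
    print("E(Y) =", n * pi, " Var(Y) =", n * pi * (1 - pi))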
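
The ML estimate in the n = 2, y = 1 example can likewise be found numerically by evaluating ℓ(π) = 2π(1 − π) over a grid of π values; a rough sketch (the grid resolution is arbitrary):

    import numpy as np

    # Likelihood for binomial with n = 2, observed y = 1
    pi_grid = np.linspace(0.0, 1.0, 1001)
    lik = 2 * pi_grid * (1 - pi_grid)

    pi_hat = pi_grid[np.argmax(lik)]
    print(pi_hat)   # 0.5, matching the closed form pi_hat = y/n = 1/2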
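
Finally, the law-of-large-numbers and Central Limit Theorem behavior of p = Y/n claimed in the last Note can be seen by simulation. A sketch under assumed settings (the seed, n, and number of replications are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)          # arbitrary seed for reproducibility
    pi, n, reps = 0.5, 1000, 10_000
    p = rng.binomial(n, pi, size=reps) / n  # 10,000 simulated sample proportions

    print(p.mean())   # close to pi = 0.5
    print(p.std())    # close to sqrt(pi*(1 - pi)/n) = 0.0158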