Chapter 1 Analyzing One-Variable Data

Lesson 1.1 Statistics: The Science and Art of Data
Lesson 1.2 Displaying Categorical Data
Lesson 1.3 Displaying Quantitative Data: Dotplots
Lesson 1.4 Displaying Quantitative Data: Stemplots
Lesson 1.5 Displaying Quantitative Data: Histograms
Lesson 1.6 Measuring Center
Lesson 1.7 Measuring Variability
Lesson 1.8 Summarizing Quantitative Data: Boxplots and Outliers
Lesson 1.9 Describing Location in a Distribution
Chapter 1 Main Points
Chapter 1 Review Exercises
Chapter 1 Practice Test

(C) 2017 BFW Publishers Starnes_3e_CH01_002-093_v4.indd 2 27/06/16 1:50 PM

© 68/Ocean/Corbis

STATS applied! Does hand sanitizer work?

Is soap better than hand sanitizer for getting rid of unwanted bacteria? Daniel and Kate designed an experiment to find out. Using 30 identical petri dishes, they randomly assigned 10 students to press one hand in a dish after washing with soap, 10 students to press one hand in a dish after using hand sanitizer, and 10 students to press one hand in a dish after using nothing. After three days of incubation, they counted the number of bacteria colonies on each petri dish.¹

Which petri dishes had the most bacteria colonies? What conclusion did Daniel and Kate make based on the data?

We’ll revisit STATS applied! at the end of the chapter, so you can use what you have learned to help answer these questions.

Lesson 1.1 Statistics: The Science and Art of Data

LEARNING TARGETS
• Identify the individuals and variables in a data set, then classify the variables as categorical or quantitative.
• Summarize the distribution of a variable with a frequency table or a relative frequency table.

We live in a world of data.
Every day, the media report poll results, outcomes of medical studies, and analyses of data on everything from gasoline prices to standardized test scores to consumption of bottled water to new technology. The data are trying to tell us a story. To understand what the data are saying, you need to learn more about statistics.

DEFINITION Statistics
Statistics is the science and art of collecting, analyzing, and drawing conclusions from data.

A solid understanding of statistics will help you make good decisions based on data in your daily life.

ACTIVITY The “1 in 6 wins” game
This activity will give you a “taste” of what statistics is about: drawing conclusions from data.

As a special promotion for its 20-ounce bottles of soda, a soft-drink company printed a message on the inside of each bottle cap. Some of the caps said, “Please try again!” while others said, “You’re a winner!” The company advertised the promotion with the slogan “1 in 6 wins a prize.” The prize is a free 20-ounce bottle of soda.

Jorge’s statistics class wonders if the company’s claim holds true at a nearby convenience store. To find out, all 30 students in the class go to the store, and each buys one 20-ounce bottle of the soda. Two of them get caps that say, “You’re a winner!” Does this result give convincing evidence that the company’s 1-in-6 claim is false? You and your classmates will perform a simulation to help answer this question.

For now, let’s assume that the company is telling the truth and that every 20-ounce bottle of soda it fills has a 1-in-6 chance of getting a cap that says, “You’re a winner!” We can model the status of an individual bottle with a six-sided die: Let 1 through 5 represent “Please try again!” and 6 represent “You’re a winner!”

1. Roll your die 30 times to imitate the process of the students in Jorge’s statistics class buying their sodas. How many of them won a prize?
2. Your teacher will draw and label axes for a class dotplot. Plot the number of prize winners you got in Step 1 on the graph.
3. Have some students repeat Steps 1 and 2 until you have a total of at least 40 repetitions of the simulation for your class.
4. Discuss the results with your classmates. What percent of the time did Jorge’s statistics class get two or fewer prizes, just by chance? Does it seem plausible (believable) that the company is telling the truth but that the class just got unlucky? Explain.

The previous activity outlines the steps in the statistical problem-solving process. You’ll learn more about the details of this process in future lessons.

DEFINITION Statistical problem-solving process²
• Ask Questions: Clarify the research problem and ask one or more valid statistics questions.
• Collect Data: Design and carry out an appropriate plan to collect the data.
• Analyze Data: Use appropriate graphical and numerical methods to analyze the data.
• Interpret Results: Draw conclusions based on the data analysis. Be sure to answer the research question(s)!

Classifying Data
The table displays data on several roller coasters that have opened since April 2014.³

Roller coaster   Type    Height (ft)   Design     Speed (mph)   Duration (s)
Wildfire         Wood    187           Sit down   70.2          120
Skyline          Steel   131.3         Inverted   50            90
Goliath          Wood    165           Sit down   72            105
Helix            Steel   134.5         Sit down   62.1          130
Banshee          Steel   167           Inverted   68            160
Black Hole       Steel   22.7          Sit down   25.5          75

Most data tables follow this format: each row describes an individual, and each column holds the values of a variable. (Sometimes the individuals in a data set are called cases or observational units.)

DEFINITION Individual, Variable
An individual is a person, animal, or thing described in a set of data.
A variable is any attribute that can take different values for different individuals.
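To make the row/column format concrete, here is a minimal Python sketch that assumes each row of the roller coaster table is stored as a dictionary. The names `coasters`, `variables`, and `column` are hypothetical and chosen only for illustration. Each dictionary is one individual; reading the same key from every row gives the values of one variable, which is a column of the table.

```python
# Each dict is one row of the roller coaster table: it describes one individual.
# Every key other than "name" is a variable; the values are copied from the table.
coasters = [
    {"name": "Wildfire",   "type": "Wood",  "height_ft": 187,   "design": "Sit down", "speed_mph": 70.2, "duration_s": 120},
    {"name": "Skyline",    "type": "Steel", "height_ft": 131.3, "design": "Inverted", "speed_mph": 50,   "duration_s": 90},
    {"name": "Goliath",    "type": "Wood",  "height_ft": 165,   "design": "Sit down", "speed_mph": 72,   "duration_s": 105},
    {"name": "Helix",      "type": "Steel", "height_ft": 134.5, "design": "Sit down", "speed_mph": 62.1, "duration_s": 130},
    {"name": "Banshee",    "type": "Steel", "height_ft": 167,   "design": "Inverted", "speed_mph": 68,   "duration_s": 160},
    {"name": "Black Hole", "type": "Steel", "height_ft": 22.7,  "design": "Sit down", "speed_mph": 25.5, "duration_s": 75},
]

# The name only labels the individual, so every other key is a variable (a column).
variables = [key for key in coasters[0] if key != "name"]


def column(rows, variable):
    """Return the values that one variable takes across all individuals."""
    return [row[variable] for row in rows]


if __name__ == "__main__":
    print(len(coasters), "individuals")
    print(len(variables), "variables:", variables)
    print("type:", column(coasters, "type"))
    print("height_ft:", column(coasters, "height_ft"))
```

Running it reports 6 individuals and 5 variables. Keeping one record per individual keeps each row intact, while a column is simply the same key looked up in every row.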
For the roller coaster data set, the individuals are the 6 roller coasters. The five variables recorded for each coaster are type, height (in feet), design, speed (in miles per hour), and duration (in seconds). Type and design are categorical variables. Height, speed, and duration are quantitative variables.

DEFINITION Categorical variable, Quantitative variable
A categorical variable assigns labels that place individuals into particular groups.
A quantitative variable takes number values for which it makes sense to find an average.

Not every variable that takes number values is quantitative. Zip code is one example.

CAUTION! Although zip codes are numbers, it doesn’t make sense to talk about the average zip code. In fact, zip codes place individuals (people or dwellings) into categories based on location.

EXAMPLE So you want to be happy?
Individuals and variables

PROBLEM: The American Statistical Association sponsors a Web-based project that collects data about primary and secondary school students using surveys. We used the site’s “Random Sampler” to choose 40 U.S. high school students who completed the survey in a recent year.⁴ The table displays data for the first 10 students chosen. The rightmost column gives students’ answers to the question: Which would you prefer to be? Select one.
____ Rich   ____ Happy   ____ Famous   ____ Healthy

State   Grade level   Gender   Age   Birth month   Height (cm)   Arm span (cm)   Preferred status
SC      12            Male     17    January       177           161             Famous
UT      9             Female   14    March         162           153             Healthy
NM      12            Female   17    August        164           167             Healthy
CA      12            Female   17    April         153           154             Famous
GA      12            Female   17    June          172           169             Happy
MI      11            Male     17    March         170           173             Famous
IN      12            Female   18    January       168           163             Happy
CO      9             Female   14    June          152           160             Happy
NJ      10            Female   16    November      165           174             Famous
CO      9             Male     15    January       190           177             Rich
. . .
Identify the individuals and variables in this data set. Classify each variable as categorical or quantitative.

SOLUTION:
Individuals: 40 U.S. high school students who completed an online survey.
Variables:
• Categorical: State where student lives, grade level, gender, birth month, preferred status
• Quantitative: Age (years), height (centimeters), arm span (centimeters)

Grade level is a categorical variable even though it takes number values. The numbers place the students into categories: 9 = freshman, 10 = sophomore, 11 = junior, and 12 = senior.

FOR PRACTICE TRY EXERCISE 1.

The proper method of data analysis depends on whether a variable is categorical or quantitative. For that reason, it is important to distinguish between these two types of variables. To make life simpler, we sometimes refer to “categorical data” or “quantitative data” instead of identifying the variable as categorical or quantitative.

Summarizing Data
A variable generally takes values that vary from one individual to another. That’s why we call it a variable! The distribution of a variable describes the pattern of variation of these values.

DEFINITION Distribution
The distribution of a variable tells us what values the variable takes and how often it takes these values.

We can summarize a variable’s distribution with a frequency table or a relative frequency table.

DEFINITION Frequency table, Relative frequency table
A frequency table shows the number of individuals having each data value.
A relative frequency table shows the proportion or percent of individuals having each data value.

Some people use the terms “frequency distribution” and “relative frequency distribution” instead. To make either kind of table, start by tallying the number of times that the variable takes each value.

EXAMPLE Would you rather be happy or rich?
Frequency and relative frequency tables

PROBLEM: Here are the data on preferred status for all 40 students in the sample from the previous example:

Famous Healthy Healthy Famous Happy Famous Happy Happy Famous Rich
Happy Happy Rich Happy Happy Happy Rich Happy Famous Healthy
Rich Happy Happy Rich Happy Happy Rich Healthy Happy Happy
Rich Happy Happy Rich Happy Famous Famous Happy Happy Happy

Summarize the distribution of preferred status with a frequency table and a relative frequency table.

SOLUTION:
Start by tallying the number of students in each preferred status category.

Preferred status   Tally
Famous             |||| ||
Happy              |||| |||| |||| |||| |
Healthy            ||||
Rich               |||| |||

The frequency table shows the number of students who chose each status.

Frequency table
Preferred status   Frequency
Famous             7
Happy              21
Healthy            4
Rich               8
Total              40

The relative frequency table shows the proportion or percent of students who chose each status.

Relative frequency table
Preferred status   Relative frequency
Famous             7/40 = 0.175 or 17.5%
Happy              21/40 = 0.525 or 52.5%
Healthy            4/40 = 0.100 or 10.0%
Rich               8/40 = 0.200 or 20.0%
Total              40/40 = 1.000 or 100%

FOR PRACTICE TRY EXERCISE 5.

The same process can be used to summarize the distribution of a quantitative variable. Of course, it would be hard to make a frequency table or a relative frequency table for quantitative data that take many different values, like the ages of people attending a high school band concert. We’ll look at a better option for quantitative variables with many possible values in Lesson 1.5.

LESSON APP 1.1 What are my classmates like?
On the first day of a statistics course, the instructor gave all 40 students in the class a survey. The table shows data from the first 10 students on the class roster.
DGLimages/iStockphoto/Getty Images

Gender   Class   GPA    Pulse rate   Dominant hand   Children in family   Homework last night (min)   Sleep (h)   Have a smartphone?
F        Fr      3.22   72           R               3                    0–14                        10          Y
F        Fr      2.3    110          L               3                    0–14                        8           N
M        Ju      3.8    60           L               6                    15–29                       7           Y
M        So      3.1    72           R               2                    15–29                       7.5         Y
F        So      4.0    51           R               1                    45–59                       7           Y
F        So      3.4    68           R               4                    0–14                        8.5         Y
F        So      3.0    80           R               3                    30–44                       7           Y
M        So      3.5    59           R               2                    30–44                       7           Y
M        Fr      3.9    65           R               2                    15–29                       6           Y
M        Sr      3.5    104          R               2                    0–14                        7           N
. . .

1. Identify the individuals and variables in this data set. Classify each variable as categorical or quantitative.
2. Here are the ages of the 40 students in the class:
17 16 17 17 17 16 18 14 16 15
16 16 17 18 17 16 17 16 15 14
17 14 14 17 17 17 16 15 17 17
17 18 18 14 15 18 17 17 17 16
Summarize the distribution of age with a frequency table and a relative frequency table.

Lesson 1.1 WHAT DID YOU LEARN?

LEARNING TARGET                                                                                        EXAMPLES   EXERCISES
Identify the individuals and variables in a data set, then classify the variables as categorical or quantitative.   p. 6       1–4
Summarize the distribution of a variable with a frequency table or a relative frequency table.                      p. 7       5–8

Exercises Lesson 1.1
The solutions to all exercises numbered in red are found in the Solutions Appendix, starting on page S-1.

Mastering Concepts and Skills

1. Box-office smash According to the Internet Movie Database, Avatar is tops based on box-office receipts worldwide. The table displays data on several popular movies.⁵ Identify the individuals and variables in this data set. Classify each variable as categorical or quantitative.

Movie                                          Year   Rating   Time (min)   Genre       Box office ($)
Avatar                                         2009   PG-13    162          Action      2,783,918,982
Titanic                                        1997   PG-13    194          Drama       2,207,615,668
Star Wars: The Force Awakens                   2015   PG-13    136          Adventure   2,040,375,795
Jurassic World                                 2015   PG-13    124          Action      1,669,164,161
Marvel’s The Avengers                          2012   PG-13    142          Action      1,519,479,547
Furious 7                                      2015   PG-13    137          Action      1,516,246,709
The Avengers: Age of Ultron                    2015   PG-13    141          Action      1,404,705,868
Harry Potter and the Deathly Hallows: Part 2   2011   PG-13    130          Fantasy     1,328,111,219
Frozen                                         2013   PG       108          Animation   1,254,512,386
Iron Man 3                                     2013   PG-13    129          Action      1,172,805,920

2. Tournament time A high school’s lacrosse team is planning to go to Buffalo for a three-day tournament. The tournament’s sponsor provides a list of available hotels, along with some information about each hotel. The following table displays data about hotel options. Identify the individuals and variables in this data set. Classify each variable as categorical or quantitative.

Hotel                    Pool   Exercise room?   Internet ($/day)   Restaurants   Distance to site (mi)   Room service?   Room rate ($/day)
Comfort Inn              Out    Y                0                  1             8.2                     Y               149
Fairfield Inn & Suites   In     Y                0                  1             8.3                     N               119
Baymont Inn & Suites     Out    Y                0                  1             3.7                     Y               60
Chase Suite Hotel        Out    N                15                 0             1.5                     N               139
Courtyard                In     Y                0                  1             0.2                     Dinner          114
Hilton                   In     Y                10                 2             0.1                     Y               156
Marriott                 In     Y                9.95               2             0.0                     Y               145

3. Portraits in data The table displays data on 10 randomly selected U.S. residents from a recent census. Identify the individuals and variables in this data set. Classify each variable as categorical or quantitative.

State          Number of family members   Age   Gender   Marital status          Yearly income   Travel time to work (min)
Kentucky       2                          61    Female   Married                 $31,000         20
Florida        6                          27    Female   Married                 $31,300         20
Wisconsin      2                          27    Male     Married                 $40,000         5
California     4                          33    Female   Married                 $36,000         10
Michigan       3                          49    Female   Married                 $25,100         25
Virginia       3                          26    Female   Married                 $35,000         15
Pennsylvania   4                          44    Male     Married                 $73,000         10
Virginia       4                          22    Male     Never married/single    $13,000         0
California     1                          30    Male     Never married/single    $50,000         15
New York       4                          34    Female   Separated               $40,000         40

4. Who buys cars? A new-car dealer keeps records on car buyers for future marketing purposes. The table gives information on the last 4 buyers. Identify the individuals and variables in this data set. Classify each variable as categorical or quantitative.

Buyer’s name   Zip code   Gender   Buyer’s distance from dealer (mi)   Car model   Engine type (cylinders)   Price
P. Smith       27514      M        13                                  Fiesta      4                         $26,375
K. Ewing       27510      M        10                                  Mustang     8                         $39,500
L. Shipman     27516      F        2                                   Fusion      4                         $38,400
S. Reice       27243      F        4                                   F-150       6                         $56,000

5. Choose your power The online survey (page 6) also asked which superpower students would choose to have: fly, freeze time, invisibility, super strength, or telepathy (ability to read minds). Here are the responses from the 40 students in the sample. Summarize the distribution of superpower preference with a frequency table and a relative frequency table.

Fly              Freeze time      Telepathy     Fly           Telepathy
Super strength   Telepathy        Telepathy     Fly           Super strength
Invisibility     Freeze time      Fly           Telepathy     Freeze time
Telepathy        Super strength   Fly           Freeze time   Telepathy
Freeze time      Freeze time      Freeze time   Fly           Fly
Fly              Freeze time      Invisibility  Fly           Invisibility
Telepathy        Telepathy        Fly           Telepathy     Fly
Fly              Telepathy        Telepathy     Fly           Fly

6. Birth months Here are the reported birth months for the 40 students in the online sample. Summarize the distribution of birth month with a frequency table and a relative frequency table.

January    March      August     April      June
March      January    June       November   January
July       December   April      April      January
December   May        December   December   December
June       August     March      January    July
April      July       April      June       May
January    August     April      October    January
December   March      February   July       June

7. Get some sleep The online survey also asked how much sleep students got on a typical school night. Here are the responses from the 40 students in the sample (in hours). Summarize the distribution of sleep amount with a frequency table and a relative frequency table.

9 8 6 7.5 7 8 4 7 7 8
8 8 6 7 8 8 7 7 6 8
9 7 6 5 7 8 8.5 7 9 6
6 6.5 8 9 5 8 7 7 7 7

8. Crowded house? The online survey also asked how many people lived in the student’s home. Here are the responses from the 40 students in the sample. Summarize the distribution of household size with a frequency table and a relative frequency table.

3 5 3 2 4 6 4 4 3 5
4 4 2 2 4 4 3 4 3 3
5 3 5 5 4 4 4 5 3 3
3 4 3 3 4 3 2 6 2 4

Applying the Concepts

9. Where did you go? June and Barry are interested in where students at their school travel for spring break. So they survey 100 classmates who took a trip during spring break this year. Then they make a spreadsheet that includes the state or country visited, how many nights they spent there, mode of transportation to get to the destination, distance from home, and average cost per night for each student’s trip. Identify the individuals in this data set. Classify each variable as categorical or quantitative.

10. Protecting history How can we help wood surfaces resist weathering, especially when restoring historic wooden buildings? Researchers prepared wooden panels and then exposed them to the weather. Here are some of the variables recorded: type of wood (yellow poplar, pine, cedar); type of water repellent (solvent-based, water-based); paint thickness (in millimeters); paint color (white, gray, light blue); weathering time (in months). Identify the individuals in this data set. Classify each variable as categorical or quantitative.

11. Numerical but not quantitative Give two examples of variables that take numerical values but are categorical.

12. Quantigorical? In most data sets, age is classified as a quantitative variable. Explain how age could be classified as a categorical variable.

13. Car stats Popular magazines rank car models based on their overall quality. Describe two categorical variables and two quantitative variables that might be considered in determining the rankings.

14. Social media You are preparing to study the social media habits of high school students. Describe two categorical variables and two quantitative variables that you might record for each student.

Lesson 1.2 Displaying Categorical Data

LEARNING TARGETS
• Make and interpret bar charts of categorical data.
• Interpret pie charts.
• Identify what makes some graphs of categorical data deceptive.

A frequency table or relative frequency table summarizes a variable’s distribution with numbers. For instance, the Current Population Survey conducted by the U.S. Census Bureau collected data on the highest educational level achieved by U.S. 25- to 34-year-olds in 2014. The relative frequency table summarizes the data.⁶ To display the distribution more clearly, use a graph.

Level of education      Percent
Less than high school   13.2
High school graduate    22.6
Some college            28.7
Bachelor’s degree       24.9
Advanced degree         10.6

You can make a bar chart or a pie chart for categorical data. (Bar charts are sometimes called bar graphs. Pie charts are sometimes referred to as circle graphs.) We’ll discuss graphs for quantitative data in the next few lessons.

DEFINITION Bar chart, Pie chart
A bar chart shows each category as a bar. The heights of the bars show the category frequencies or relative frequencies.
A pie chart shows each category as a slice of the “pie.” The areas of the slices are proportional to the category frequencies or relative frequencies.

Figure 1.1 shows a bar chart and a pie chart of the data on the educational achievement of U.S. 25- to 34-year-olds in 2014. You can see that the most common level of education for this age group was “some college.”
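The arithmetic behind both kinds of graph is short enough to sketch. This minimal Python example is an illustration only, not how Figure 1.1 was drawn. It uses the percents from the education table: each text bar’s length is proportional to its category’s percent (one `#` per 2 percentage points, an arbitrary scale assumed here), and each pie slice’s central angle is that category’s share of 360 degrees, which makes the slice areas proportional to the percents. The function names `text_bar_chart` and `pie_slice_angles` are hypothetical.

```python
# Relative frequency table: percent of U.S. 25- to 34-year-olds by education level, 2014.
education = {
    "Less than high school": 13.2,
    "High school graduate": 22.6,
    "Some college": 28.7,
    "Bachelor's degree": 24.9,
    "Advanced degree": 10.6,
}


def text_bar_chart(percents, scale=2.0):
    """Return one text bar per category; each bar's length is proportional to its percent."""
    width = max(len(label) for label in percents)
    lines = []
    for label, pct in percents.items():
        bar = "#" * round(pct / scale)  # one '#' per `scale` percentage points
        lines.append(f"{label:<{width}} | {bar} {pct}%")
    return lines


def pie_slice_angles(percents):
    """Central angle (in degrees) of each slice: the category's share of 360 degrees."""
    total = sum(percents.values())  # about 100 for a relative frequency table in percents
    return {label: 360 * pct / total for label, pct in percents.items()}


if __name__ == "__main__":
    for line in text_bar_chart(education):
        print(line)
    for label, angle in pie_slice_angles(education).items():
        print(f"{label}: {angle:.1f} degrees")
    # The tallest bar and the largest slice both belong to the most common category.
    print("Most common:", max(education, key=education.get))
```

For example, “Some college” at 28.7% gets the longest bar (14 marks) and a slice of about 103.3 degrees, the largest in the pie. Dividing by the total rather than by 100 keeps the angles summing to 360 degrees even if the percents were rounded.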