
You Will Soon Analyze Categorical Data (Classifying Fortune Cookie Fortunes)

Mary Richardson
Grand Valley State University

Published: May 2014

Overview of Lesson Plan

In this activity students will have the opportunity to collect and explore real data using two different brands of fortune cookies. Students will open each brand of fortune cookie and classify their fortunes into one of four categories. Students will then construct a two-way frequency table to display their data and they will investigate their results using joint relative frequencies and marginal and conditional distributions. In an extension students will use a chi-square test of homogeneity to determine if the proportions of fortunes within the categories differ for the two brands.

GAISE Components

This activity follows all four components of statistical problem solving put forth in the Guidelines for Assessment and Instruction in Statistics Education (GAISE) Report. The four components are: formulate a question, design and implement a plan to collect data, analyze the data by measures and graphs, and interpret the results in the context of the original question. The main activity is a GAISE Level B Activity. The extension of the activity is a GAISE Level C Activity.

Common Core State Standards for Mathematical Practice

1. Make sense of problems and persevere in solving them.
2. Reason abstractly and quantitatively.
4. Model with mathematics.
5. Use appropriate tools strategically.
6. Attend to precision.

Common Core State Standard Grade Level Content (High School)

S-ID. 5. Summarize categorical data for two categories in two-way frequency tables. Interpret relative frequencies in the context of the data (including joint, marginal, and conditional relative frequencies). Recognize possible associations and trends in the data.

S-IC. 1. Understand statistics as a process for making inferences about population parameters based on a random sample from that population.

STatistics Education Web: Online Journal of K-12 Statistics Lesson Plans
http://www.amstat.org/education/stew/
Contact Author for permission to use materials from this STEW lesson in a publication

NCTM Principles and Standards for School Mathematics

Data Analysis and Probability Standards for Grades 9-12

Formulate questions that can be addressed with data and collect, organize, and display relevant data to answer them:
• understand the meaning of measurement data and categorical data, of univariate and bivariate data, and of the term variable.

Select and use appropriate statistical methods to analyze data:
• display and discuss bivariate data where at least one variable is categorical.

Prerequisites

For the activity students must know how to calculate relative frequencies. For the extension, some exposure to hypothesis testing would be helpful.

Learning Targets

After completing the activity, students will be able to create a two-way frequency table from raw data and proceed to examine marginal and conditional distributions in order to help answer a question of interest. If the extension is completed students will learn how to perform the chi-square test of homogeneity and will be able to distinguish between the chi-square test of homogeneity and the chi-square test of independence.

Time Required

The time required for the activity is roughly 1 class period.

Materials Required

Students will need a copy of the Activity Sheet (see the end of the lesson); to complete the lesson interactively, each student will need two or three of each of two brands of fortune cookies.

Note: (1) A case of fortune cookies, containing 100 cookies, can be purchased for roughly $15. (2) With monetary constraints in mind, a collection of fortune cookie sayings for two different brands of fortune cookies appears at the end of this lesson.
The teacher could potentially provide each student with a single fortune cookie and use the sayings that are included with this lesson as part of the data collection process. (3) Some top selling fortune cookie brands are: Golden Bowl (made by Wonton Foods, Inc.), Shang Pin, and Peking Noodle.

Instructional Lesson Plan

The GAISE Statistical Problem-Solving Procedure

I. Formulate Question(s)

Begin the activity by discussing some history on fortune cookies. Some historical background is provided on the activity worksheet. The worksheet also introduces, defines, and gives examples of the four categories of fortunes that will be used in the activity: Prophecy, Compliment, Advice, and Wisdom.

Explain to students that there are two brands of fortune cookies available and that we would like to determine if the percentage of fortunes falling into the four categories differs for the two brands.

II. Design and Implement a Plan to Collect the Data

Have students open their fortune cookies, read the fortunes, and tally them into the categories: Prophecy, Advice, Wisdom, and Misc. Note that the Misc. category was created to incorporate Compliments and ‘Other’ types of fortunes. Create regions on the white board where the students can put their tallies.

The following table contains example data that might be collected when completing this activity. To replicate this data, each student will need to be given 3 or 4 of each brand of fortune cookie. Text of the individual fortunes extracted from these cookies is provided at the end of the activity worksheet.

Table 1. Two-way frequency table for example class data.

                  Type of Fortune
Brand of Cookie   Prophecy   Advice   Wisdom   Misc.*   Row Totals
Shang Pin               16       34       49        4          103
Golden Bowl             15       21       52        4           92
Column Totals           31       55      101        8          195

*The Misc. category includes Compliments and Other (such as this fortune from a Golden Bowl cookie: “Great! You’re ready for a party.”).

III./IV. Analyze the Data/Interpret the Results

In order to help determine if the two brands of fortune cookies have similar fortunes, students are led through a series of questions.

Students begin by calculating the marginal distribution of the Type of Fortune. Students determine that the percentage of all of the fortune cookie sayings that are Prophecy is 16%. The corresponding percentages for Advice, Wisdom, and Misc. are: 28%, 52%, and 4%.

Discuss with students that these percentages collectively make up what is called the marginal distribution of the Type of Fortune and ask students to explain why it makes sense to call these percentages a marginal distribution. The term marginal seems appropriate since the percentages were calculated using the table column totals divided by the overall total number of fortunes. The column totals appear in the margin of the table.

Next, students are asked to calculate selected joint percentages. For example, the percentage of all of the fortunes that came from a Golden Bowl cookie and contained a Prophecy is 8%. The percentage of all of the fortunes that came from a Shang Pin cookie and contained Wisdom is 25%. Discuss with students that percentages such as these are referred to as joint percentages (relative frequencies) and ask them to explain why it makes sense to call these percentages joint.
The percentages describe two characteristics: Brand of Cookie and Type of Fortune, so it seems reasonable to refer to them as joint.

Next, students will calculate the conditional distribution of the Type of Fortune given the Brand of fortune cookie. That is, for each brand, the percentages of the Types of Fortunes will be calculated. Note that when the conditional distribution is calculated the Row Totals should be approximately 100%. Table 2 contains the conditional distribution for the data appearing in Table 1.

Table 2. Conditional distribution of Type of Fortune given Brand of fortune cookie.

                  Type of Fortune
Brand of Cookie   Prophecy   Advice   Wisdom    Misc.   Row Totals
Shang Pin              15%      33%      48%       4%         100%
Golden Bowl            16%      23%      57%       4%         100%

Based upon the conditional distribution ask students if they think that the two brands Shang Pin and Golden Bowl have the same Type of Fortunes. Of course, if the fortunes for Shang Pin and Golden Bowl were exactly the same, then all of the conditional percentages shown in the table above would be equal. In this case, we can see that Shang Pin and Golden Bowl tend to have the same percentage of fortunes that are Prophetic and that fall into the Misc. category. However, the Shang Pin cookie fortunes have a higher percentage of Advice, by 10 percentage points, and a lower percentage of Wisdom, by 9 percentage points. So, the two brands may not have the same types of fortunes.

Finally, students are referred to the results obtained by Yin and Miike when they analyzed the text of fortune cookie sayings in the article “A Textual Analysis of Fortune Cookie Sayings: How Chinese Are They?” For their data collection, Yin and Miike categorized 595 fortune cookies from a variety of Chinese restaurants.
The results of their analysis appear in the table below:

Table 3. The results obtained by Yin and Miike. Categories and Themes of Fortune Cookie Sayings (p. 22)

Categories     Numbers (%)
Prophecy       367 (61.7)
Compliments     66 (11.1)
Advice          72 (12.1)
Wisdom          90 (15.1)
Total          595 (100)

Tell students that we want to see if our data collection produced results comparable to Yin and Miike. In order to make this comparison first have students combine their results for the Shang Pin and Golden Bowl fortune cookies. Have them fill in the 15 cells in the following table.

Table 4. Two-way frequency table of class results and Yin and Miike’s results.

                         Type of Fortune
Brand of Cookie          Prophecy   Advice   Wisdom    Misc.   Row Totals
Shang Pin/Golden Bowl          31       55      101        8          195
Yin and Miike’s Brands        367       72       90       66          595
Column Totals                 398      127      191       74          790

Ask students to explain what types of percentages should be used to compare the class results for Shang Pin and Golden Bowl cookies to the results of Yin and Miike: marginal, joint, or conditional. They should respond that the appropriate percentages to use to make this comparison are conditional percentages. After a brief discussion, have them calculate the conditional distribution of Type of Fortune given Brand of cookie. The conditional distribution is shown in Table 5.

Table 5. Conditional distribution of Type of Fortune given Brand of cookie.

                         Type of Fortune
Brand of Cookie          Prophecy   Advice   Wisdom    Misc.   Row Totals
Shang Pin/Golden Bowl         16%      28%      52%       4%         100%
Yin and Miike’s Brands        62%      12%      15%      11%         100%

After they calculate the conditional distribution students should discuss if they think that the class data collection produced results that are comparable to the results of Yin and Miike. Obviously, the class results are not comparable. Yin and Miike’s cookies overwhelmingly produced Prophetic fortunes whereas the Shang Pin/Golden Bowl cookies’ fortunes were predominantly fortunes that contained Wisdom.

Ask students to provide a possible explanation for the discrepancies in the Types of Fortunes. One thing that comes to mind is that we are not certain of the brands of cookies that Yin and Miike extracted fortunes from. It does not seem as though they were Shang Pin or Golden Bowl cookies.

Assessment

In the General Social Survey, respondents were asked, “Do you agree with the following statement? ‘In spite of what some people say, the lot (situation/condition) of the average man is getting worse, not better.’” The results, for 990 respondents by gender, are shown below.

          “Lot is getting worse”
Gender    Agree   Disagree   Total
Female      357        200     557
Male        234        199     433
Total       591        399     990

1. What percentage of the respondents were female and believed that the lot of the average man is getting worse, not better?
2. Calculate the marginal distribution of gender.
3. Calculate the conditional distribution of opinion of the lot of the average man, given gender.
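
The marginal, joint, and conditional percentages used throughout the activity can be checked with a short script. The following is a minimal sketch in Python and is not part of the original lesson; the function and variable names are illustrative assumptions, and the counts are the example class data from Table 1.

```python
# Example class data from Table 1 (rows: brand of cookie, columns: type of fortune).
CATEGORIES = ["Prophecy", "Advice", "Wisdom", "Misc."]
class_counts = {
    "Shang Pin": [16, 34, 49, 4],
    "Golden Bowl": [15, 21, 52, 4],
}


def grand_total(table):
    """Overall number of observations in the two-way table."""
    return sum(sum(counts) for counts in table.values())


def marginal_distribution(table):
    """Column totals divided by the overall total."""
    n = grand_total(table)
    column_totals = [sum(column) for column in zip(*table.values())]
    return [total / n for total in column_totals]


def joint_distribution(table):
    """Each cell count divided by the overall total."""
    n = grand_total(table)
    return {row: [c / n for c in counts] for row, counts in table.items()}


def conditional_distribution(table):
    """Each cell count divided by its own row total (column variable given row)."""
    return {row: [c / sum(counts) for c in counts] for row, counts in table.items()}


def as_percentages(proportions):
    """Format a list of proportions as labelled percentages with one decimal."""
    return ", ".join(
        f"{name}: {100 * p:.1f}%" for name, p in zip(CATEGORIES, proportions)
    )


marginal = marginal_distribution(class_counts)
joint = joint_distribution(class_counts)
conditional = conditional_distribution(class_counts)

print("Marginal distribution of Type of Fortune")
print("  " + as_percentages(marginal))
print("Joint percentages (share of all fortunes)")
for brand, row in joint.items():
    print(f"  {brand}: " + as_percentages(row))
print("Conditional distribution of Type of Fortune given Brand")
for brand, row in conditional.items():
    print(f"  {brand}: " + as_percentages(row))
```

Because the script prints one decimal place, its output can differ slightly from the whole-number percentages shown in the tables. The same three functions can be applied to any two-way table of counts, including the General Social Survey table in the Assessment, by replacing the dictionary of counts and the list of category names.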

Answers

1. 357/990 = .3606 so 36.06%
2. Female: 557/990 = .5626 or 56.26% and Male: 433/990 = .4374 or 43.74%
3.
          “Lot is getting worse”
Gender    Agree                    Disagree                 Total
Female    357/557 = .6409 or 64%   200/557 = .3591 or 36%   100%
Male      234/433 = .5404 or 54%   199/433 = .4596 or 46%   100%

Extension of Introductory Activity

Typically a two-way frequency table analysis will be extended to a chi-square hypothesis test. When analyzing data from a frequency table, there are two types of chi-square tests that might be utilized.

A test of independence answers the question, “Are the two categorical variables independent for a population under study?” It assesses whether there is a relationship between two variables for a single population. The null hypothesis for the test of independence is that the two categorical variables are not related (independent) for the population of interest.

A test of homogeneity answers the question, “Do two or more populations have the same distribution for one categorical variable?” It assesses whether a single categorical variable is distributed the same in two (or more) different populations. The null hypothesis for the test of homogeneity is that the distribution of the categorical variable is the same for the two (or more) populations.

The mechanics of tests of independence and tests of homogeneity are the same. The distinction is the way in which the data were collected. If two categorical variables are collected for each subject, then a test of independence should be performed. If a single categorical variable is collected for each of two (or more) groups, then a test of homogeneity should be performed.
Students first determine the null and alternative hypotheses to be tested in order to answer our question: Do Shang Pin and Golden Bowl fortune cookies have the same distribution of Type of Fortune? The null hypothesis is that the percentages of the fortunes that are Prophecy, Advice, Wisdom, and Misc. are the same for Shang Pin and Golden Bowl fortune cookies. The alternative hypothesis is that the percentages of the fortunes that are Prophecy, Advice, Wisdom, and Misc. are not the same for Shang Pin and Golden Bowl fortune cookies.

Then, students are introduced to the necessary data conditions along with the formula for calculating the chi-square test statistic. The necessary data conditions for the chi-square test of homogeneity are that: (1) all expected counts are greater than 1 and (2) at least 80% of the table cells have an expected count greater than 5.

To compute the expected count for each table cell the following formula is applied:

$$\text{Expected count} = \frac{\text{Row Total} \times \text{Column Total}}{\text{Total } n}$$

Once the expected counts have been calculated, they are used to calculate the chi-square test statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

Explain to students that the chi-square test statistic measures the difference between the observed counts and the counts that would be expected if the null hypothesis were true. So, a large difference between the counts is evidence against the null hypothesis (or in other words a large test statistic value is evidence against the null hypothesis).

Students are asked to calculate the expected counts for the class two-way frequency table. The expected counts are shown in Table 6.

Table 6. Expected cell counts for the example class data.

                  Type of Fortune
Brand of Cookie   Prophecy                 Advice                   Wisdom                    Misc.                  Row Totals
Shang Pin         (103 × 31)/195 = 16.37   (103 × 55)/195 = 29.05   (103 × 101)/195 = 53.35   (103 × 8)/195 = 4.23   103
Golden Bowl       (92 × 31)/195 = 14.63    (92 × 55)/195 = 25.95    (92 × 101)/195 = 47.65    (92 × 8)/195 = 3.77    92
Column Totals     31                       55                       101                       8                      195

Students see that none of the expected counts are less than 1. However, only 75% of the expected counts (6 of the 8 cells) are greater than 5. After noting that for the class data the necessary conditions have not been met for the chi-square test of homogeneity, explain to students that the test will be performed anyway, for purposes of illustration.

Applying the formula for the chi-square test statistic to the example class data:

$$\chi^2 = \frac{(16-16.37)^2}{16.37} + \frac{(34-29.05)^2}{29.05} + \frac{(49-53.35)^2}{53.35} + \frac{(4-4.23)^2}{4.23} + \frac{(15-14.63)^2}{14.63} + \frac{(21-25.95)^2}{25.95} + \frac{(52-47.65)^2}{47.65} + \frac{(4-3.77)^2}{3.77} \approx 2.58$$

In order to have students calculate the p-value ask them to recall that a large test statistic is evidence against the null hypothesis. Thus the p-value will be the probability that the chi-square test statistic could have been as large or larger if the null hypothesis were true.

On the TI-84 PLUS calculator students can use the test statistic value to find the corresponding p-value. Select 2nd → DISTR → χ²cdf( → ENTER. Within the parentheses, the students need to enter the lower bound, upper bound, and degrees of freedom. The lower bound will always be the test statistic due to the shape of the chi-square distribution. For the upper bound, students can enter any very large number such as 10000000.
The degrees of freedom are found using the formula $df = (r - 1)(c - 1)$, where r is the number of rows in the table and c is the number of columns. Note that in our two-by-four table, the degrees of freedom are equal to 3. So for our example class data the p-value is .4610.

Based upon the p-value, students decide whether or not to reject the null hypothesis and provide a conclusion in this problem’s context. Since the p-value is rather large, at any reasonable level of significance, the null hypothesis will not be rejected. The data do not provide significant evidence to indicate that the percentages of the fortunes that are Prophecy, Advice, Wisdom, and Misc. differ for Shang Pin and Golden Bowl fortune cookies.

Finally, discuss with students that the assumptions, test statistic calculation, and p-value calculation are the same for the chi-square test of homogeneity and the chi-square test of independence. The distinction lies in how the data were collected and in the formulation of the hypotheses.
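
For classes without a TI-84, the extension's calculations (expected counts, the data-condition check, the test statistic, degrees of freedom, and p-value) can be sketched in Python. This is an illustrative sketch and not part of the original lesson: the function names are hypothetical, and the p-value is computed with a standard recurrence for the chi-square upper tail with whole-number degrees of freedom rather than with the calculator's χ²cdf function.

```python
import math

# Observed counts from Table 1: rows are Shang Pin and Golden Bowl,
# columns are Prophecy, Advice, Wisdom, Misc.
observed = [
    [16, 34, 49, 4],
    [15, 21, 52, 4],
]


def expected_counts(table):
    """Expected count = row total * column total / overall total, per cell."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in column_totals] for r in row_totals]


def chi_square_statistic(table, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(table, expected)
        for o, e in zip(observed_row, expected_row)
    )


def chi_square_upper_tail(x, df):
    """P(chi-square with integer df >= x), via the recurrence
    Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1), with z = x / 2."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # starting value for even df
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # starting value for odd df
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


expected = expected_counts(observed)
cells = [e for row in expected for e in row]

# Data conditions stated in the lesson.
all_above_1 = all(e > 1 for e in cells)
share_above_5 = sum(e > 5 for e in cells) / len(cells)

statistic = chi_square_statistic(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = chi_square_upper_tail(statistic, df)

print("Expected counts:", [[round(e, 2) for e in row] for row in expected])
print("All expected counts > 1:", all_above_1)
print("Share of expected counts > 5:", share_above_5)
print(f"Chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.4f}")
```

The script uses unrounded expected counts, so its statistic and p-value may differ in the last decimal place from the hand calculation above, which works from expected counts rounded to two decimals.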
