Galaxy Hunter
Teacher Page: Lesson Plan

Goal/Purpose:

Galaxy Hunter uses real data from the Hubble Deep Fields to strengthen students' understanding of simple random samples. Students will investigate bias in sampling techniques and use min/max plots to determine optimal sample size based on the variability of different-sized samples. They will use statistical data in order to make conjectures about the universe.

Desired Learning Outcomes:

• Analyze sampling methods for bias.
• Take a simple random sample to draw conclusions from the data.
• Use sample variability to determine optimal sample size.
• Compare sample data with the population parameter to determine accuracy of sampling techniques.
• Use statistical data to make conjectures about the universe.

Prerequisites:

Before attempting to complete this lesson, the student should:

• be able to construct and interpret frequency tables
• demonstrate knowledge of simple random samples
• be able to define range, mean, and median
• demonstrate knowledge of min/max plots

New Vocabulary:

Math Terms

Bias-
A systematic error in sample statistics that can occur from the use of poor sampling methods.

Data-
Information (such as age, color, or shape) gathered concerning a sample or population.

Mean-
The average value found by adding the values for all the members of the sample and dividing the sum by the number of members. The mean can be skewed to one side or the other by an extreme score - one that is out of line with the others.

Measures of Central Tendency-
Numerical values that are located, in some sense, in the middle of a sample or population. These include the mean, median, and mode.

Median-
The value that falls in the middle of a sample; there are as many members above the median as below it. Since the median falls in the middle of the sample, it is not skewed by extreme scores like the mean.

Min/max plot-
A visual representation of the range. It shows the dispersion of the data by using a line to connect the lowest-valued (smallest) piece of data to the highest-valued (largest) one. For example, the min/max plot for the data: 35, 47, 58, 70, 71, 75 would look like this:

 35 |----------------------------------------| 75
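For teachers who want to produce such a plot for their own data, here is a minimal Python sketch. The function name `min_max_plot`, the 0 to 100 scale, and the text-based drawing are illustrative assumptions and are not part of the lesson software.

```python
def min_max_plot(values, scale_min=0, scale_max=100, width=50):
    """Return a one-line text min/max plot for the given data."""
    low, high = min(values), max(values)
    span = scale_max - scale_min
    # Map the smallest and largest values onto character positions.
    start = round((low - scale_min) / span * width)
    end = round((high - scale_min) / span * width)
    # Draw a bar from the minimum to the maximum.
    bar = " " * start + "|" + "-" * max(end - start - 1, 0) + "|"
    return f"{bar}  min={low}, max={high}, range={high - low}"


data = [35, 47, 58, 70, 71, 75]  # the example data from the definition above
print(min_max_plot(data))
```

The length of the bar corresponds to the range (here 75 - 35 = 40), which is the quantity students compare across sample sizes later in the lesson.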

Mode-
The number, or range of numbers, in a set that occurs most frequently.

Population-
All members of some set that, as a whole, is generally too large to study. Sometimes the term population refers to people or objects; other times it refers to measurements of those people or objects. Both meanings are used and the context usually tells which one is meant.

Population Parameter-
The theoretical descriptive measure on the scores of the population. Generally a population is too large to record all the scores but theoretically, the parameters such as the mean, median and standard deviation for the population exist. The goal of sample statistics is to be able to infer something meaningful about a population parameter based on a sample statistic.

Random Number Table-
A list of random numbers from which single digit (0-9), double digit (00 - 99) or multi-digit numbers can be selected (usually by randomly pointing to a spot on the table). These numbers can then be used to identify the members of a population to be included in a sample or to simulate an experiment. Most random number tables include a suggestion for how to use them.

Range-
The simplest measure of dispersion. It is the interval between the highest-valued (largest) and lowest-valued (smallest) pieces of data, expressed as the difference between them.

Sample-
A subset of the population obtained by collecting information only about some members of the population. Sometimes the term sample refers to people or objects; other times it refers to measurements of those people or objects. Both meanings are used and the context usually tells which one is meant.

Sample statistic-
Any descriptive measure on the scores of a sample, such as the mean, median, or standard deviation.

Simple random sample (SRS)-
A simple random sample is gathered by selecting each member without replacement, which means that a member is not available to be selected more than once. The members are selected by chance so that each member has an equal opportunity of being selected. The sample, of size n, is representative of the population as a whole.

Smallest Reasonable Sample-
Statisticians look for the smallest sample that is reasonable. The smallest sample minimizes time and costs. A reasonable sample is one that is unbiased and large enough to adequately represent the population it's derived from.

Science Terms

Bulge-
The bulge is a spherical structure at the center of spiral galaxies composed primarily of old stars, with a little gas and dust. The bulge of the Milky Way is about 30,000 light-years across.

Disk-
The disk is a pancake-shaped structure in spiral galaxies and is composed primarily of young and middle-aged stars, with abundant gas and dust. Some old stars are also present. The disk contains the spiral arms and slices through the bulge and halo of a spiral galaxy. The disk in the Milky Way is about 100,000 light-years across and 2,000 light-years thick.

Elliptical Galaxy-
A galaxy having an oval or nearly spherical shape. Some are more elongated than others. Resembling a bulge and halo, it is composed mostly of old stars and contains very little gas and dust. The smallest elliptical galaxies (called "dwarf ellipticals") are probably the most common type of galaxy in the nearby universe.

Galaxy-
A galaxy is an enormous collection of a few million to trillions of stars, gas, and dust held together by gravity. They can be several thousand to hundreds of thousands of light-years across.

Halo-
The halo extends outward from the bulge and disk, and contains clusters of old stars ("globular clusters"), individual stars, and a small amount of gas and dust. In the Milky Way, the halo measures about 130,000 light-years across.

Hubble Deep Fields (HDFs)-
These are the deepest, sharpest, multi-color, optical wavelength images of the faintest universe. The images were made by aiming the telescope at one point in the sky for 10 days. There are two HDFs - one in the northern sky and one in the southern sky.

Hubble Space Telescope-
An automated reflecting telescope which orbits the Earth, built by the National Aeronautics and Space Administration and the European Space Agency. It contains instruments capable of receiving many types of radiation.

Irregular Galaxy-
A galaxy whose shape is neither elliptical nor spiral. It contains both young and old stars and is often rich in gas and dust. These galaxies often have active regions of star formation. Sometimes the irregular shape of these galaxies results from interactions or collisions between galaxies. Observations such as the Hubble Deep Fields show that irregular galaxies were more common in the distant (early) universe.

Light-year-
The distance traveled by light in a full year - some 10 trillion kilometers (about 6 trillion miles). To calculate the distance light travels in one year, multiply the speed of light, about 300,000 km/s, by the length of a year expressed in seconds: 1 LY = (300,000 km/s)(31,536,000 s) ≈ 9.5 trillion kilometers, or roughly 10 trillion kilometers.

Milky Way-
The specific galaxy to which our solar system belongs, so named because most of its visible stars appear overhead on a clear, dark night as a milky band of light extending across the sky. The Milky Way is a spiral galaxy.

Redshift-
When an object is moving away from an observer, the light it emits appears to be redder than it would be if the object were at rest. This effect is similar to the apparent change in pitch associated with moving objects that emit sound waves (the Doppler effect). For example, the pitch of a fire truck's siren sounds higher (having a shorter wavelength) as it approaches a car stopped along the side of the road. The pitch drops significantly as the truck passes the car and sounds much lower (having a longer wavelength) as the truck continues to move away. When dealing with light, the shift to longer wavelengths is referred to as redshift because red light has the longest wavelength within the optical spectrum. (The wavelength shift occurring when objects are moving toward an observer is called a blueshift.) The cosmological redshift tells astronomers how fast the universe is expanding.

Spiral Arms-
Pinwheel shaped features located in the disk of a spiral galaxy. The new bright blue stars that are born there make the spiral pattern visible.

Spiral Galaxy-
A galaxy made up of a disk with spiral (pinwheel-shaped) arms, a bulge near its center, and a halo. The sizes of the disk and bulge vary. The galaxy is composed of a mixture of old and young stars as well as gas and dust. The spiral arms are sites of active star formation. The majority of large galaxies in the nearby universe are spirals.

Whole population of HDFs-
Astronomers counted and classified all the galaxies that they could reliably count and classify in each of the Deep Fields.

General Misconceptions:

Math Misconceptions

Students may think that any way they choose data will give results representative of a population. Humans are not good at selecting data in a truly random fashion, since they tend to introduce unconscious bias, whereas computers do not.

Students might think that closing their eyes and picking a sample by pointing a finger produces an unbiased sample. Bias is introduced when the student decides which item the finger is covering. Also, the student will have a mental image of the area from which the sample is taken and may consciously move the finger to different locations in order to avoid being biased, which itself introduces bias.

Science Misconceptions

Students may think that galaxies are static and do not change with time. In fact, galaxies are dynamic and change over millions of years.

Preparation Time:

1. Teachers should allow time to preview the lesson and to read the science background pages. These pages will provide additional content that will help teachers to answer questions posed by students.
2. By previewing the lesson plan, teachers will be able to select an engagement activity, identify follow-up activities, allow time for gathering supplies such as the student travelogue (worksheet), and determine time needed by students to complete the lesson.

Execution Time by Module:

The amount of time needed to complete any of these modules will vary depending on the length of available teaching time and the ratio of computers to students in the class. One possible way to jumpstart your lesson and eliminate the trial-and-error that is sometimes needed to become familiar with a new lesson is to have the students do just one activity or a part of a module. Use an overhead, an LCD, or a TV monitor to project the lesson to the class. The following are estimated times:

• Start Safari 10-15 minutes
• Bias 10-15 minutes
• Sample Size 15-20 minutes
• HDF-N vs. HDF-S 10-15 minutes
• Last Stop 5-10 minutes

Physical Layout of Room:

Students can work in groups of two or individually in a computer lab. Adaptations can be made to accommodate classrooms with only a single computer having Internet access. This might include using an overhead projector with an LCD that projects the computer image on a screen or a hookup from a computer to a television monitor.

You can also do Galaxy Hunter off-line. Various software programs provide off-line access to the Internet by letting you save web pages to your local hard drive. Using your Netscape browser, you can then open the web pages locally and experience the lesson as if you were on the Internet. Using this option, however, will deny students access to the references (identified in the Grab Bag pages) available on the World Wide Web.

Materials:

This lesson requires a computer with a color monitor and Internet connection. The Web browser used must have at least the capability of Netscape's Navigator 4.0 or Internet Explorer 4.0. For additional information, see the Computer Needs section.

Teachers might want to use the Student Travelogue (worksheet) found in the Grab Bag section of this lesson, under Downloadable Documents, to monitor student progress as they record data and answer questions.

Procedure / Directions:

This is a self-directed interactive computer activity. Students may work independently or in small groups to complete each activity.

Engagement Activity:

Step-by-step Instructions:

These are self-directed activities. Students can work independently or in pairs to complete each lesson activity. The "One-Computer Classroom" section contains optional discussion topics for certain points in the lesson.

Start Safari
After choosing one of the Deep Fields and hand-picking a sample, students compare their result to that for the whole field. This leads students to explore bias and sample size, both of which affect the validity of results.

Bias
Bias is explored when the students decide which of several sampling methods are biased. They see how bias affects the percentage of irregular galaxies determined to be in the sample from the Deep Field.

Sample Size
The optimal sample size is determined by exploring sample variability. The concept of variability is introduced through a min/max plot of 5 samples of the same size as the students' initial selection. The concept is then extended to the whole population, using increasingly larger sample sizes. The mean and median are added to the smaller sample sizes in order to pinpoint the spot where variability settles down and the measures of central tendency approach a constant value. The point where this first occurs is the smallest reasonable sample size. Students' understanding is checked by asking them to find the best sample size from real data taken from the relevant HDF.

HDF-N vs. HDF-S
In this assessment, students generate a sample from the other Deep Field and compare the two fields to decide if they are similar. This provides students with a real-life example of how statistics can be used by scientists.

The Last Stop
One of the fundamental questions in astronomy asks how galaxies formed and evolved. The Hubble Deep Fields provide a glimpse into the history of our universe, showing what galaxies looked like 2 to 10 billion years ago. Students will be asked whether or not they think the distant universe looks like the nearby universe. The activity concludes by revealing that irregular galaxies are much more common in the distant universe than they are in the nearby universe. This is one of the most intriguing results from the Hubble Deep Fields.

Evaluation / Assessment:

There are several assessment activities/questions within this lesson. Students are asked to decide between biased and unbiased sampling methods and queried about a reasonable sample size.

The Galaxy Hunter Travelogue provides additional questions to assess students' understanding and information-processing skills, and to encourage critical thinking.

Solutions to the Student Travelogue

1. What method, if any, did you use to select your sample of galaxies?

Students may have selected the galaxies by size, shape, color, or location on the map. They may have tried to make the selection random by closing their eyes and pointing to different locations.

2. Record your data from the HDF- N or S (circle one) in the table at left, below.

Students' answers will vary depending on the size of their sample and how they selected their sample.

3. Record the actual result for percentage of irregulars in the table above, right. Describe the variation between your result for percentage of irregular galaxies and the actual result. What do you think accounts for this difference?

The actual percentage of irregulars for the whole HDF-N is 70.7%; for the whole HDF-S, it is 67.2%. Variations between the accepted result and the students' results could be caused by too few galaxies in their samples and/or bias in their sampling methods. For example, the brightest galaxies in either HDF tend to be spirals and ellipticals. If the student chose bright galaxies, their percentage of irregulars would be lower than the value for the whole HDF. If they chose only dim galaxies, they might come up with a value very close to or even higher than that for the whole HDF.

4. Define bias.

Bias is a systematic error in sample statistics that can occur from the use of poor sampling methods. The students might include a statement about humans unconsciously basing choices on a trait or quality of the sample member.

5. Based on your current knowledge, was your method of selecting the sample of galaxies described in question 1 biased? What effect might this have had on your results?

Students will probably say that their method was biased and that the results are not reliable due to this bias. Student results will probably be too low but they could be very close to the actual results or even slightly higher depending on how they selected their galaxies.

6. If a computer randomly chooses a fixed number of galaxies, 5 different times, would you expect it to keep getting the same result for percentage of irregulars each time, since the sample sizes were the same each time? Why or why not?

The results will vary each time a sample is taken because, with small sample sizes, the likelihood of selecting the same proportion of irregulars each time is low. With sample sizes of 5 or 10, the variability of results will be great, with percentages of irregulars ranging from 30% or less up to 90% or more. The smaller the sample size, the greater the variability of results. If the sample size is larger than 35, the students will not see as great a variability as they would if the sample size were lower, but there will still be some variability. At a sample size of 35, percentages of irregulars will generally range from 60-85%.
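The variability described in this answer can be demonstrated with a short simulation. The sketch below is a minimal illustration, not the lesson's own software: it builds a stand-in population of 1067 galaxies in which about 70.7% are irregular (both figures come from the lesson's HDF-N results), and the function names and the seed are assumptions chosen for the example.

```python
import random

POPULATION_SIZE = 1067       # galaxies counted in HDF-N (from the lesson)
IRREGULAR_FRACTION = 0.707   # share of irregulars in HDF-N (from the lesson)


def build_population():
    """Stand-in population: True marks an irregular galaxy."""
    n_irregular = round(POPULATION_SIZE * IRREGULAR_FRACTION)
    return [True] * n_irregular + [False] * (POPULATION_SIZE - n_irregular)


def percent_irregular(sample):
    """Percentage of irregular galaxies in a sample."""
    return 100 * sum(sample) / len(sample)


def five_samples(sample_size, rng):
    """Draw 5 simple random samples and return each percentage of irregulars."""
    population = build_population()
    # rng.sample draws without replacement, as a simple random sample requires.
    return [percent_irregular(rng.sample(population, sample_size))
            for _ in range(5)]


rng = random.Random(2024)  # fixed seed (arbitrary) so a run can be repeated
for size in (5, 10, 35):
    results = five_samples(size, rng)
    print(size, [round(r, 1) for r in results])
```

Running it with different seeds lets students see that the five percentages rarely agree, and that they tend to spread out more for the smaller sample sizes.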

7. Compare the positions of the mean and median on any single min/max plot. How would those positions change if the lowest value on the min/max plot were zero?

The positions of the mean and median may be close to each other if the distribution of results is close to a bell curve. For example, if the five values are 40%, 50%, 60%, 70% and 80%, then the median is 60% and the mean is also 60%. If the data is skewed, then the median and mean will be farther apart. For example, if the five values are 60%, 60%, 60%, 80% and 90%, then the median would be 60% and the mean would be 70%. If the lowest value on the min/max plot were zero, the median would probably remain the same but the mean would be lower.
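The two examples in this answer can be checked with Python's standard `statistics` module. The third data set, in which the lowest value of the skewed example is replaced by zero, is a hypothetical case added here to illustrate the last sentence of the answer.

```python
from statistics import mean, median

symmetric = [40, 50, 60, 70, 80]   # first example from the answer
skewed = [60, 60, 60, 80, 90]      # second example from the answer
with_zero = [0, 60, 60, 80, 90]    # hypothetical: lowest value replaced by zero

for name, values in [("symmetric", symmetric),
                     ("skewed", skewed),
                     ("with zero", with_zero)]:
    # The median depends only on the middle value; the mean uses every value.
    print(f"{name}: mean = {mean(values)}, median = {median(values)}")
```

In the hypothetical third case the median stays at 60% while the mean drops from 70% to 58%, which is the behavior the answer describes.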

8. Explain why the variability of sample results approaches zero as the sample size approaches the population size.

When the whole population is sampled, there is no variability. As the sample size approaches the population size, the likelihood of obtaining a different sample population decreases because only a few of the members are "left out" of the sample. Since so many of the members of the population are included, missing a few won't greatly affect the proportions of the sample, so the variability is slight.
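A simulation makes this shrinking variability concrete. The sketch below is an illustration under stated assumptions: the population is a stand-in of 1067 galaxies with 754 irregulars (about 70.7% of 1067, matching the lesson's HDF-N figures), and the function name, sample sizes, and seed are chosen only for the example.

```python
import random


def range_of_results(population, sample_size, rng, repeats=5):
    """Length of the min/max bar for repeated samples of one size."""
    percents = []
    for _ in range(repeats):
        # Sampling without replacement, as in a simple random sample.
        sample = rng.sample(population, sample_size)
        percents.append(100 * sum(sample) / sample_size)
    return max(percents) - min(percents)


# Stand-in population: 1 = irregular galaxy, 0 = any other type.
population = [1] * 754 + [0] * 313   # 1067 galaxies in total

rng = random.Random(7)  # fixed seed (arbitrary) so a run can be repeated
for size in (10, 100, 500, 1000, len(population)):
    print(size, round(range_of_results(population, size, rng), 1))
```

When the sample size equals the population size, every sample contains all 1067 galaxies, so the range is exactly zero.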

9. Predict a range of acceptable percentages for irregulars based on the length of the min/max bar for your best sample size. The range of percentages for the whole chart is 40% to 100%.

Estimates will vary based on student sample size. Generally speaking, the min/max bar for the best sample size is 10 units long, so the range is about 10%.

10. Copy the table, making sure you fill in your sample size and the full name of the HDF you used. Compare your value for the percentage of irregulars with your predicted range of acceptable percentages from above.

Student values will vary for their sample. Students' values should fall within 10% of the astronomers' results for the whole HDF. The astronomers' results for the whole HDFs are as follows:

Actual Percent of Galaxies for HDF-N and HDF-S

Galaxy Type    HDF-N    HDF-S
Ellipticals     4.6%     5.1%
Spirals        24.7%    27.7%
Irregulars     70.7%    67.2%

11. Now that you know about sample size and bias, do you think the percentage of irregulars in your original sample is within acceptable range of the actual results? Explain. (Your results and the actual value for percentage of irregulars can be found in question 2 above.)

Students will probably decide that their original sample size was too small and subject to bias so their results were not within the acceptable range.

12. State the sample size you will use and explain why you chose this size.

Students should choose the same-sized sample for the other HDF as they chose for the first one, since the populations of the two fields are about the same.

13. Do you think your results are close to the astronomers'? Explain.

If the students selected the correct sample size, their results will fall close to those of the astronomers. We are considering their results close if they are within a few percentage points of the astronomers' results.

14. Comparing the two HDFs, could you say that the universe probably looks the same in these two directions? Explain.

Since the proportions of galaxies for the two fields are similar, students should conclude that the universe probably looks the same in these two directions.

15. Using what you've learned about sample size and bias, could you use the HDFs to make a general statement concerning the uniformity of the universe? Explain.

No. Each HDF represents only a tiny fraction of the sky, and could easily be subject to bias. In order to make a general statement about the uniformity of the sky, many more HDFs would need to be randomly selected from the whole sky. The fact that the two fields are similar to each other is consistent with astronomers' supposition that the universe looks the same in all directions.

16. What is the most common type of galaxy in (a) the local universe and (b) the faraway universe pictured in the HDFs?

(a) When one counts both large and small galaxies, dwarf ellipticals (small ellipticals) are probably the most common type of galaxy in the nearby universe. Since these galaxies are small and faint, the exact number of these galaxies is not well known. The majority of large, bright galaxies in the nearby universe are spirals. Large bright elliptical galaxies are relatively rare. (b) In the faraway universe pictured in the HDFs, the most common type of galaxy is irregular. Astronomers hope that this information will help them understand the fundamental question of how galaxies form and evolve.

Follow-up Activities / Interdisciplinary Connections:

Activity 1 should only be attempted if students have previous experience with confidence intervals.

1. Suppose scientists could take a random sample of the entire sky. How many galaxies would need to be included in order for the sample to be representative of the whole population of galaxies in the universe? More advanced students can calculate this minimum sample size by using the following formula where the sample proportion value is unknown:

 $$n = \frac{(z_{\alpha/2})^2 \, p \, q}{E^2}$$

 Where:
 $n$ = sample size
 $z_{\alpha/2}$ = critical value of z
 $E$ = maximum error of estimate
 $p$ = estimated population proportion
 $q = 1 - p$

If no approximate value of p is known, use p = 0.5, so that q = 1 - 0.5 = 0.5.

There are some interesting websites listed in the Grab Bag that will calculate sample sizes. These require entering a confidence level and interval.

The answer to the equation above - a value of about 350 - allows us to conclude that the HDFs are large enough to be representative of the universe as a whole, since the population for HDF-N is 1067 and the population for HDF-S is 1275. However, the galaxies within each HDF were not selected randomly, so the sample could be subject to bias. Note: In the lesson, the students are determining the minimum sample size that is representative of the HDFs as a whole, not the universe.
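The sample-size formula from Activity 1 can be evaluated with a few lines of Python. This is a minimal sketch: the lesson does not state which confidence level or maximum error lies behind its value of about 350, so the 95% confidence level and 5-percentage-point error used below are assumptions for illustration, and the function name `min_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def min_sample_size(confidence, max_error, p=0.5):
    """Minimum n from n = z^2 * p * q / E^2, rounded up to a whole galaxy."""
    # Two-tailed critical value z_(alpha/2) for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    q = 1 - p
    return math.ceil(z ** 2 * p * q / max_error ** 2)


# Assumed inputs: 95% confidence, maximum error of 0.05, and p = 0.5.
print(min_sample_size(0.95, 0.05))  # prints 385
```

With these assumed inputs the formula gives 385, the same order of magnitude as the value quoted above. A different confidence level or maximum error changes the result, which is why the online calculators mentioned above ask for both.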

2. Use the Web to explore how astronomers assign coordinates to different points in the sky.

3. Use the Web to find out what mathematics is necessary for a career in astronomy.

4. Use a graphic organizer to compare how science fiction stories, movies and TV shows have depicted deep space, in contrast to reality, as in the example below:

Fiction                  Reality
All stars look white.    Stars come in a variety of colors: blue, white, yellow, orange, and red.

One-Computer Classroom:

It is recommended that teachers project the images from the computer onto a classroom screen using an overhead LCD or use a large-screen television monitor.

Here are two suggestions to facilitate a large group presentation and to avoid last minute glitches that can happen when using the Internet:

1. Bookmark the lesson and any links you may find useful.
2. Use the student worksheet (Travelogue) to accompany each part of the lesson.

Take advantage of the enforced pace of a one-computer classroom lesson by inserting more discussion at key points in the activity. Some discussion points are listed below.

1. In the section, "Start Safari", after students have generated data from their clickable maps (title of page: Photo Safari Results), the teacher can collect the data and lead a discussion on sampling methods used in the class. In the case of a one-computer classroom, use the "Back" button and allow several students to generate samples and record the data. A box plot of this data for one type of galaxy could be constructed and would show the variation. Note: This data is biased and no averaging of values should be performed here!
2. In the section "Bias," before leaving the page titled "Banishing Bias" and moving to the page titled "Comparing Sampling Methods," the teacher can have students predict what the percentage of irregulars might be for each method of choosing galaxies. Record these values and then move on to "Comparing Sampling Methods" to see how close the students' predictions were.
3. In the section "Bias," record data from the page titled "Comparing Sampling Methods," which shows the results for different sampling methods. Teachers can have students generate a box plot of the different percentages for each galaxy. These box plots should show how widely the data is spread for these different sampling techniques. Note: This data is biased and no averaging of values or discussion of median should be presented here! The box plot is used to graphically display the great variation between methods.

Classrooms without Computers:

Use the Grab Bag section in this lesson to download the databases used for the HDF-N and HDF-S. The Grab Bag also has downloadable images of selected pages (Banishing Bias, Why 'Choosing with Eyes Closed' is Biased, Comparing Sampling Methods, Variability vs. Sample Size, Where's the Best Sample Size, HDF- N: Where's the Best Sample Size?, Stare & Compare: Astronomer's Results, Does the Universe Look the Same in all Directions?, and The Universe Looks Different at Different Depths), which the teacher can use to prepare overhead transparencies for use in classrooms without computers.

1. Present students with an image of the HDF-N along with the database, which can be downloaded from the Grab Bag. Ask them to design a method of selecting a sample of galaxies from this information. {Display the Hubble Deep Field, either as a poster or an 8 x 10 image, and ask students what kinds of statistical treatments might be possible. Posters can be obtained from a NASA Educator Resource Center (http://spacelink.nasa.gov/Instructional.Materials/NASA.Educational.Products/Accessing.NASA.Education.Brochure/erc_nasa.html). The lithograph (an 8 x 10 image) can be downloaded from the Space Science Education Resource Directory (http://teachspacescience.org). (Search for "Hubble's Galaxy Gallery".)}

2. Prepare a transparency of "Banishing Bias" as a reinforcement of valid sampling technique. Go through this activity as a group, putting sampling methods in the correct category. A show of hands could be used as a voting tool. Note: Many students will think that "choosing with eyes closed" is not biased. You can use the transparency, "Why 'Choosing with Eyes Closed' is Biased", to correct this misconception.

3. Prepare a transparency of "Comparing Sampling Methods". Lead students to predict values for each method. Reveal the actual percentages one-by-one for student analysis.

4. Revise the sampling method designed in step 1. Break the students into 5 or 7 groups and have each group take an unbiased sample of 5-10 galaxies from one HDF using the downloaded database. Ask the students to determine the percentage of irregular galaxies in their sample. (Note: There is a random number generator in the Grab Bag which the teacher could use to generate samples for the students so that they would only have to look up the values corresponding to the galaxy's number. There are 1067 galaxies in the HDF-N and 1275 in the HDF-S.)

5. Display the results of each group's sample and discuss why there are differences. Have students determine the mean and median for the grouped samples.

6. Use transparencies of the hypothetical data "Variability vs. Sample Size" to discuss how the range decreases as the size of the sample increases. Next, use "Where's the Best Sample Size" and add the mean or median to the plots to find the point where the measure of central tendency settles into a constant value. The point where this first occurs is the smallest reasonable sample size.

7. Display a transparency of the real data "HDF-N: Where's the Best Sample Size?" and ask students to select the smallest reasonable sample size based on the variability of the data.

8. Once a sample size is established, have half the class generate a random sample from each HDF using that size and determine the percentage of each galaxy type. Compare the student results with the astronomers' results for the whole HDF-N and the whole HDF-S. Use the transparency "Stare & Compare: Astronomer's Results" for this.

9. Using the same transparency, "Stare & Compare: Astronomer's Results," discuss whether the results for each HDF indicate that the sky looks the same in these two directions. Then use the information contained in "Does the Universe Look the Same in All Directions?" to discuss whether the results support the supposition that the universe looks largely the same in all directions. Then use "The Universe Looks Different at Different Depths" to discuss how the galaxy composition of the local universe compares with that of the HDF views. The answers to questions 14-16 in the Galaxy Hunter Travelogue (found in this section of the Teacher Pages) contain information to assist you in this discussion.
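Step 4 above mentions using a random number generator to pick galaxies for each group. If the Grab Bag generator is not at hand, a teacher could produce the lists with a short script like the one below. It is a sketch under assumptions: galaxies are assumed to be numbered 1 through N in the downloaded database, and the function name `draw_galaxy_numbers` is hypothetical.

```python
import random

# Galaxy counts given in the lesson.
HDF_SIZES = {"HDF-N": 1067, "HDF-S": 1275}


def draw_galaxy_numbers(field, sample_size, seed=None):
    """Return a sorted simple random sample of galaxy numbers for one field."""
    rng = random.Random(seed)
    # Assumption: galaxies are numbered 1..N in the database.
    # rng.sample never repeats a number, so no galaxy is picked twice.
    numbers = rng.sample(range(1, HDF_SIZES[field] + 1), sample_size)
    return sorted(numbers)


# One list of 10 galaxy numbers for each of 5 groups.
for group in range(1, 6):
    print(f"Group {group}:", draw_galaxy_numbers("HDF-N", 10))
```

Each group then looks up the galaxy type for its numbers in the database and computes the percentage of irregulars in its sample.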

Home Schooler:

This lesson is easily followed without additional teacher support if the prerequisites are met. Some of the follow-up activities require experience with confidence intervals and hypothesis testing and are labeled as such. They should not be attempted without adequate prior instruction. Parents can preview the lesson and examine the teacher pages ahead of time. A wealth of information can be found at Hubblesite, the Hubble Space Telescope's website at the Space Telescope Science Institute. Here you can find background information on the telescope, pictures and news releases of past and present stories, education activities, and other science resources.