Contents 


Contents 
Table of contents 
Categories: STASTE
Statistics is a part of mathematics. Many fields of science do tests or measurements. Statistics is a collection, analysis, interpretation or explanation of data.^{[1]} Statistics can help in two basic ways:
Contents 
Statistics has been in use for a long time. The first known statistics are census data. The Babylonians did a census around 3500BC, the Egyptians around 2500 BC, and the Ancient Chinese around 1000 BC.
Before we can describe the world with statistics, we must collect data. The data that we collect in statistics are called measurements. After we collect data, we use one or more numbers to describe each observation or measurement. For example, suppose we want to find out how popular a certain TV show is. We can pick a group of people (called a sample) out of the total population of viewers. Then we ask each one how often they watch the show, or (better) we measure it by attaching an apparatus to each of their television sets. For another example, if we want to know whether a certain drug can help lower blood pressure, we could give the drug to people for some time and measure their blood pressure before and after.
Most often we collect statistical data by doing surveys or experiments. To do a survey, we pick a small number of people and ask them questions. Then, we use their answers as the data.
The choice of which individuals to take for a survey or data collection is important, as it directly influences the statistics. When the statistics are done, it can no longer be determined which individuals are taken. Suppose we want to measure the water quality of a big lake. If we take samples next to the waste drain, we will get different results than if the samples are taken in a far away, hard to reach, spot of the lake.
There are two kinds of problems which are commonly found when taking samples:
We can reduce chance errors by taking a larger sample, and we can avoid some bias by choosing randomly. However, sometimes large random samples are hard to take. And bias can happen if some people refuse to answer our questions, or if they know they are getting a fake treatment. These problems can be hard to fix. See also standard error.
The middle of the data is called an average. The average tells us about a typical individual in the population. There are three kinds of average that are often used: the mean, the median and the mode.
The examples below use this sample data:
Name  A B C D E F G H I J  score 23 26 49 49 57 64 66 78 82 92
The formula for the mean is
$\backslash bar\; x\; =\; \backslash frac\{1\}\{N\}\backslash sum\_\{i=1\}^N\; x\_i\; =\; \backslash frac\{x\_1+x\_2+\backslash cdots+x\_N\}\{N\}$
Where $x\_1,\; x\_2,\; \backslash ldots,\; x\_N$ are the data and $N$ is the population size. (see Sigma Notation).
This means that you add up all the values, and then divide by the number of values.
In our example $\backslash bar\; x\; =\; (23+26+49+49+57+64+66+78+82+92)/10\; =\; 58.6$
The problem with the mean is that it does not tell anything about how the values are distributed. Values that are very large or very small change the mean a lot. In statistics, these extreme values might be errors of measurement, but sometimes the population really does contain these values. For example, if in a room there are 10 people who make $10/day and 1 who makes $1,000,000/day. The mean of the data is $90,918/day. Even though it is the average amount, the mean in this case is not the amount any single person makes, and is probably useless.
The median is the middle item of the data. To find the median we sort the data from the smallest number to the largest number and then choose the number in the middle. If there is an even number of data, there will not be a number right in the middle, so we choose the two middle ones and calculate their mean. In our example there are 10 items of data, the two middle ones are "57" and "64", so the median is (57+64)/2 = 60.5. Another example, like the income example presented for the mean, consider a room with 10 people who have incomes of $10, $20, $20, $40, $50, $60, $90, $90, $90, $100, and $1,000,000, the median is $55 because $55 is the average of the two middle numbers, $50 and $60. If the extreme value of $1,000,000 is ignored, the mean is $57. In this case, the median is close to the value obtained when the extreme value is thrown out. The median solves the problem of extreme values as described in the definition of mean above.
The mode is the most frequent item of data. For example the most common letter in English is the letter "e". We would say that "e" is the mode of the distribution of the letters.
For example, if in a room there are 10 people with incomes of $10, $20, $20, $40, $50, $60, $90, $90, $90, $100, and $1,000,000, the mode is $90 because $90 occurs three times and all other values occur fewer than three times.
There can be more than one mode. For example, if in a room there are 10 people with incomes of $10, $20, $20, $20, $50, $60, $90, $90, $90, $100, and $1,000,000, the modes are $20 and $90. This is bimodal, or has two modes. Bimodality is very common and often indicates that the data is the combination of two different groups. For instance, the average height of all adults in the U.S. has a bimodal distribution. This is because males and females have separate average heights of 1.763 m (5 ft 9+1/2 in) for men and 1.763 m (5 ft 9+1/2 in) for women. These peaks are apparent when both groups are combined.
The mode is the only form of average that can be used for data that can not be put in order.
Another thing we can say about a set of data is how spread out it is. A common way to describe the spread of a set of data is the standard deviation. If the standard deviation of a set of data is small, then most of the data is very close to the average. If the standard deviation is large, though, then a lot of the data is very different from the average.
If the data follows the common pattern called the normal distribution, then it is very useful to know the standard deviation. If the data follows this pattern (we would say the data is normally distributed), about 68 of every 100 pieces of data will be off the average by less than the standard deviation. Not only that, but about 95 of every 100 measurements will be off the average by less that two times the standard deviation, and about 997 in 1000 will be closer to the average than three standard deviations.
We also can use statistics to find out that some percent, percentile, number, or fraction of people or things in a group do something or fit in a certain category.
For example, social scientists used statistics to find out that 49% of people in the world are males.
Error creating thumbnail: sh: convert: command not found 
Here are sentences from other pages on Statistics, which are similar to those in the above article.
