In general, the purpose of statistical tests is to determine whether some hypothesis is extremely unlikely given observed data. Note that there are two philosophical approaches to such tests. The first approach, significance testing (due to Fisher), is aimed at quantifying the evidence against a particular hypothesis being true. The second approach, hypothesis testing (due to Neyman and Pearson), is aimed at making a simple decision as to whether to reject or retain a hypothesis. The difference is important: the evaluation of evidence takes place in context and leads to opinions that can be revised when new evidence becomes available, whereas a decision, once taken, is final and cannot be changed. This difference is frequently overlooked, and statisticians often treat the terms "significance test" and "hypothesis test" as though they were interchangeable. They are not!
A data analyst frequently wants to know whether there is a difference between two sets of data, and whether that difference is likely to occur due to random fluctuations, or is instead unusual enough that random fluctuations rarely cause such differences.
In particular, we frequently wish to know something about the average (or mean), or about the variability (as measured by the variance or standard deviation).
Statistical tests are carried out by first making some assumption, called the Null Hypothesis, and then determining whether the observed data would be unlikely to occur given that assumption. If the probability of seeing data at least as extreme as those observed is small enough under the assumed Null Hypothesis, then the Null Hypothesis is rejected.
A simple example might help. We wish to determine if men and women are the same height on average. We select and measure 20 women and 20 men. We assume the Null Hypothesis that there is no difference between the average heights of men and women. We can then use the t-test to determine whether our sample of 40 heights would be unlikely to occur given this assumption. The basic idea is to assume heights are normally distributed, and to assume that the means and standard deviations are the same for women and for men. Then we calculate the average of our 20 men and of our 20 women, and we also calculate the sample standard deviation for each. Then, using the t-test of two means with 40 − 2 = 38 degrees of freedom, we can determine whether the difference in heights between the sample of men and the sample of women is sufficiently large to make it unlikely that they both came from the same normal population.
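The calculation in the height example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the heights are simulated (not real measurements), the function name pooled_t_statistic is hypothetical, the test is the equal-variance (pooled) form of the two-sample t-test, and the 5% two-sided significance level with its critical value of about 2.024 for 38 degrees of freedom is a conventional choice, not something fixed by the example above.

```python
import math
import random
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for the equal-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    # Pool the two variances, since the null hypothesis assumes a common spread.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Simulated heights in centimetres: assumed values for illustration only.
rng = random.Random(0)
men = [rng.gauss(176.0, 7.0) for _ in range(20)]
women = [rng.gauss(163.0, 6.5) for _ in range(20)]

t_stat, df = pooled_t_statistic(men, women)

# Approximate two-sided 5% critical value of the t distribution with 38
# degrees of freedom (assumed significance level).
CRITICAL_T_38_DF = 2.024
reject_null = abs(t_stat) > CRITICAL_T_38_DF

print(f"t = {t_stat:.3f} with {df} degrees of freedom")
print("Reject the Null Hypothesis" if reject_null else "Retain the Null Hypothesis")
```

If the absolute value of t exceeds the critical value, a difference this large would occur in fewer than 5% of samples drawn from a single normal population, so the Null Hypothesis of equal means is rejected at that level.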
