video 1.1. intro to stats

Intro, descriptive and inferential statistics, populations and samples.
Closed captioning text:

Hello. Welcome to the first video of the Statistics Video Textbook. This is an open educational resource version of a textbook, where instead of the textbook being written, it is in videos. You should be able to find the link to the whole textbook in the description below on YouTube.

Overall, what this covers is the content for an introductory statistics course in psychology. That is a little bit different from the statistics you might get in a math program, which will probably talk a lot more about conditional probabilities, for example. We will not do that here because psychologists do not use those very often. What we will do in this class is talk about variables and descriptive statistics ... distributions and the logic of inferential tests, or statistical inference. That is the whole first half of this class. The second half of the class introduces the standard inferential tests that are used by psychologists: t tests, ANOVAs, correlation, and regression. We will also talk just a little bit at the end about non-parametric tests, in particular chi-square tests.

As somebody taking an introductory statistics class, you might wonder why statistics are important. They are important primarily in two ways. One is that we are all consumers of claims that are based on statistics. Scientific research uses statistics. When you are reading about a claim that is made based on numbers, the more you understand statistics, the better you are able to understand and critically evaluate that claim. The other main use for statistics is by producers of knowledge. These are folks who are research scientists, or people who are collecting polling data, or anything like that. I am a research scientist and I use statistics all the time. When we collect our data, we need to make sense of that data, and that is what statistics do for us. They help us summarize and describe our data, but they also give us insight into what kinds of claims we can make about the world based on our data. So, these are the two main perspectives from which you might use statistics in your life.

To tell you a little bit about myself, my name is Bryan Koenig. I have a PhD from New Mexico State University in Social Psychology. I am an Assistant Professor of Psychology at Southern Utah University down in Cedar City, Utah.

In this video, I will introduce the two main kinds of statistics: descriptive statistics and inferential statistics. Then I will try to clarify what those mean a little bit more by talking about populations and samples. At a very broad level, statistics are ways of making sense of numbers. There are two main kinds of statistics. One kind is called descriptive statistics. These are numbers that summarize data sets. For example, if you had the heights of all American adult women and you calculated the average, that would be a descriptive statistic that summarizes that data set: the average height of American women. The other main kind of statistic is called inferential statistics. These are mathematical procedures that allow researchers to make decisions or inferences about populations based on sample data. For example, if you are trying to figure out how tall female students are at Southern Utah University (SUU), you could get a sample with a sample size of 30. You could measure the height of 30 SUU women and use that sample's average height as your best guess at the overall average for all SUU women in the population, even though you did not measure all those women. You just measured a sample of the population, and that would be a way to guess at that population. So that would be an inferential statistical approach. That particular kind of guess is called a point estimate.
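To make the point-estimate idea concrete, here is a minimal Python sketch. The heights are invented purely for illustration (they are not real SUU measurements), the list is shorter than the 30 women in the example to keep it readable, and the function name `point_estimate_of_mean` is a hypothetical helper, not something from the video.

```python
from statistics import mean

# Hypothetical heights in inches for a small sample.
# These numbers are made up for illustration only.
sample_heights = [62.0, 64.5, 66.0, 63.5, 65.0, 67.0]


def point_estimate_of_mean(sample):
    """Return the sample mean, used as a single best guess at the population mean."""
    if not sample:
        raise ValueError("sample must contain at least one measurement")
    return mean(sample)


# The sample mean is a descriptive statistic of the sample itself.
# Using it as a guess about the whole population is the inferential step.
estimate = point_estimate_of_mean(sample_heights)
print(f"Point estimate of the population mean: {estimate:.2f} inches")
```

The same number plays two roles here: it describes the sample, and it serves as the guess about the population that was never fully measured.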

Two critical ideas in statistics are a population and a sample. A population is everybody you want to know about or make a claim about. For example, I might want to know the average height of American women. In that case, the population is American women. If somehow I had a way to measure the height of every American woman, I would have done a census, just like the government does a census every ten years. They are measuring everybody in the population. Most researchers, though, are unable to measure everybody that they are interested in. So we researchers get stuck working with samples. Samples are subsets of the population that are in our study. For example, we might be interested in the average height of American women, but because we are unable to access the whole population of American women, we might end up with a sample of 30 Southern Utah University undergraduate women. Your intuition might be that this sample is not really representative of this population, and you would be right. The relation between the sample and the population turns out to be a really important issue, but for now I just wanted to introduce these topics of populations and samples.

Now I want to try to illustrate some of these ideas that I have been talking about. Let us say we want to know the average height of American women. So, we want to know about a population. I will draw some circles or put some dots on here. Let us say that those are all of the individuals in the population. Just bear with me here: clearly that is not all American women, but let us treat these dots as all American women and how tall they are. What we want to know is their average height. So "mu" is the symbol for a population mean, a population average. This is what we want to know. We want to know the population mean for American women's height. But when you are doing research, you almost never are able to measure everybody in the population, to do a census, and usually you end up working with a sample. So, if these are all the people in the population, what we have is going to be a sample. Ideally, the way that you will get individuals from the population into your sample will be with random selection. It is also called random sampling. All that random sampling or random selection means is that each person in the population has the same chance as every other person in the population of being in your sample. Let us say I am randomly picking these folks, and that is how we get our sample: each person has the same chance. And we end up having those folks coming over and being in our sample. Remember that a sample is just a subset of the population that is in your study. So if this is the population, and we have got a random sample, let us say that our sample has 30 American women in it. We will say that this is 30 individuals. Then what we can do is measure the height of each of those 30 people that are in our study, and that average height is going to be called "X-bar." So X with a line over the top, that is a sample mean or average. There is a different symbol for the population mean compared to the sample mean.
Notice that both of these are means, and therefore they are both descriptive statistics: X-bar is an average that summarizes the sample data set, and mu is an average that summarizes the population data set, although we do not actually have access to that whole data set. Remember we are interested in the population mean, but we do not have all the people in the population in our sample. We just have this subset of the population. What we do as researchers is use the information in our sample to make decisions or guesses or inferences about the population based on our sample data. This is statistical inference.
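The picture described here, a population of dots with a random sample pulled out of it, can be sketched in Python. Everything in this sketch is an assumption made for illustration: the population is simulated rather than real, the mean of 64 inches and spread of 2.5 inches are arbitrary choices, and the function names are hypothetical. Random selection is done with `random.sample`, which draws without replacement so that every individual has the same chance of being picked.

```python
import random
from statistics import mean


def simulate_population(size, rng):
    """Build a hypothetical population of heights in inches (simulated, not real data)."""
    return [rng.gauss(64.0, 2.5) for _ in range(size)]


def draw_random_sample(population, n, rng):
    """Simple random sampling without replacement: each individual has the same chance."""
    return rng.sample(population, n)


rng = random.Random(42)  # fixed seed so the sketch gives the same result each run
population = simulate_population(10_000, rng)
sample = draw_random_sample(population, 30, rng)

mu = mean(population)  # population mean; a real researcher usually cannot compute this
x_bar = mean(sample)   # sample mean; this is what the researcher actually has

print(f"mu    (population mean) = {mu:.2f}")
print(f"x_bar (sample mean)     = {x_bar:.2f}")
```

In a real study only `x_bar` would be available. The simulation computes `mu` as well so the two can be compared: the sample mean typically lands near the population mean but not exactly on it.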

All of the inferential statistical tests we talk about in the class are going to allow us to work with our sample data and make inferences about populations based on that data. If all that you are doing is taking a sample mean and using that as your best guess at the population mean, which is reasonable if you do not have any other information about that population mean, that statistical inference process is called a "point estimate." It is a way to guess at a population parameter, a population mean in this case, given your sample data. That is pretty common, although most of the inferential tests we will talk about in this class are not quite like that. But they do use sample data to make inferences about populations. Another thing I want to point out is that the symbols for samples, like X-bar, are usually Roman letters, which are just the regular English letters that you are most familiar with already. Most of the time the symbols for populations, like mu here, are going to be Greek letters. There are a few exceptions to this, but as a rule of thumb, Roman letters are sample symbols and Greek letters are population symbols.
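Written out as formulas, the two means and the point estimate look roughly like this. The symbols $N$ (population size), $n$ (sample size), $x_i$ (one individual's height), and the hat notation $\hat{\mu}$ for "estimate of mu" are standard conventions assumed here; only "mu" and "X-bar" are named in the video.

```latex
% Population mean: average over all N individuals in the population
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i

% Sample mean: average over the n individuals who are in the sample
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i

% Point estimate: the sample mean is used as the best guess at the population mean
\hat{\mu} = \bar{X}
```

The Greek letter sits on the population side and the Roman letter on the sample side, matching the rule of thumb above.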

To recap, researchers always want to know things about populations, but we almost never have access to everybody in a population. We cannot do a census, so we will get a sample, that is a subset of the population, and that is who is in our study. We will measure those folks, or do experiments with them, or whatever, and then what we will do is we will use inferential statistical processes to make inferences from samples to populations. This is the stuff that you will read about in the newspaper or in research articles as the result of this statistical inference process, which is really the main topic of this whole class.
