R Tutorial: Overview of the Measure Development Process


---

Now that you know how to conduct a single-factor EFA, you're ready to start thinking about how it fits into the process of developing a measure.

When you're creating a measure of an unobservable variable or variables, you typically want to follow this process. Steps 1, 2, and 4 are theoretical and don't involve any coding, so we won't spend much time on those.

Once you've determined which construct you want to measure, the next step is to write questions, or items, for your measure. Always write more items than you think you'll need, since they probably won't all perform as well as you'd like.

After you've got enough items developed, the next step is to collect pilot data from a representative sample to test the measure. These data will be used to examine your measure and see how it is functioning before you use it for real. You can think of the gcbs dataset as pilot data for these 15 items.

Once you've got pilot data, the first thing to do is check what the dataset looks like.

To get a sense for your data, use the describe() function to see basic information about each of the items in the dataset.
If you look at the first row of output, you can see that item Q1 has 2,495 responses (the n column), a mean of 3.47, and a standard deviation of 1.46.
You'll notice that the minimum and maximum are 0 and 5 for these items because they've been scored on a Likert scale.
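As a rough sketch of that call (assuming the psych package is loaded and the pilot responses are stored in a data frame called gcbs, as in the video):

library(psych)

# Item-level summaries: n, mean, sd, min, max, skew, and more
describe(gcbs)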

Now that you know some basic features of your dataset, you have to consider which analyses you want to run. If you want to both develop and confirm a theory of how items are related to underlying factors, you'll want to use both EFA and CFA on your dataset. In that case, you'll need to split the dataset - step 5 here.

To do that, create a set of random indices, then use them to divide the dataset.

First, you'll determine the number of rows, N, in the dataset and set up a sequence from 1 to N. Next, you'll use the sample() function to select half of those numbers at random, then assign them to an object called indices_EFA. The other half of the numbers in the sequence are assigned to another object called indices_CFA.
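A minimal sketch of that step (again assuming the pilot data are in gcbs):

# Establish two sets of random indices to split the dataset
N <- nrow(gcbs)
indices <- seq(1, N)
indices_EFA <- sample(indices, floor(0.5 * N))
indices_CFA <- indices[!(indices %in% indices_EFA)]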

Once those sets of indices are established, you'll use them to create two datasets: one for your EFA, and one for your CFA. By creating a theory with half of the data, then testing it on the other half, you'll avoid overfitting your model or falsely confirming your theory.
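Using those indices, the split itself might look like this (the names gcbs_EFA and gcbs_CFA are just illustrative):

# Use the indices to create one dataset for EFA and one for CFA
gcbs_EFA <- gcbs[indices_EFA, ]
gcbs_CFA <- gcbs[indices_CFA, ]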

Now that your sample is split into random halves, you'll want to make sure the two halves are similar - step 6 in this process. If the halves aren't similar, they aren't good representations of the population and aren't appropriate for evaluating your measure.

The psych package provides some convenience functions to examine a dataset according to a grouping variable. To accomplish this, create a grouping variable from the indices you created to split the data.

The grouping variable can then be bound onto the gcbs dataset as a new column using the cbind() function.
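One way to sketch those two steps (the names group_var and gcbs_grouped are just illustrative):

# Create a grouping variable: 1 for rows in the EFA half, 2 for rows in the CFA half
group_var <- vector("numeric", nrow(gcbs))
group_var[indices_EFA] <- 1
group_var[indices_CFA] <- 2

# Bind the grouping variable onto the dataset as a new column
gcbs_grouped <- cbind(gcbs, group_var)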

This grouping variable provides the information needed by describeBy() and statsBy(), which you can use to view some key summary statistics. Watch out though - while the group argument of describeBy() has to be a vector, the group argument of statsBy() has to be the name of a column in your dataframe.

Let's see these functions in action.
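Continuing the sketch from above (group_var and gcbs_grouped are the illustrative objects created earlier):

# describeBy() takes the grouping variable as a vector
describeBy(gcbs_grouped, group = group_var)

# statsBy() takes the name of a column in the dataframe
statsBy(gcbs_grouped, group = "group_var")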

#DataCamp #RTutorial #FactorAnalysisinR