This page is under development. Stay tuned!
This vignette gives an overview of how the surveygraph package preprocesses data, and of the optional arguments that control how particular values are handled.
We’ll start by loading surveygraph, and assume a data set S that we attempt to supply to surveygraph.
df <- data.frame(
item1 = c(2, -99, 1, 1, 100, 5, 5, 4, 3),
item2 = c(1, 3, 1, 2, 4, 3, 4, 5, 4),
item3 = c(2, 1, 3, -99, 5, 6, 8, 4, 10)
)
df
#>   item1 item2 item3
#> 1     2     1     2
#> 2   -99     3     1
#> 3     1     1     3
#> 4     1     2   -99
#> 5   100     4     5
#> 6     5     3     6
#> 7     5     4     8
#> 8     4     5     4
#> 9     3     4    10
The first thing we check is that the input data S is a data frame. If it is not, the program halts and an error is raised. Future versions may attempt to coerce other formats to data frames.
For instance, if we attempt to run the make_projection() routine on a list, we get the following error.
make_projection(list(c(1, 2, 3)))
#> Error in make_projection(list(c(1, 2, 3))): Input data must be provided as a data frame.
Similarly, an error is output if an empty data frame is provided.
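The two checks described above can be sketched in base R as follows. This is only an illustration: check_input() is a hypothetical helper, not a surveygraph function, and the wording of the empty-data-frame message is an assumption (only the first message appears in the output above).

```r
# Hypothetical helper, not part of surveygraph: mimics the two input checks.
check_input <- function(data) {
  if (!is.data.frame(data)) {
    stop("Input data must be provided as a data frame.")
  }
  if (nrow(data) == 0 || ncol(data) == 0) {
    # The wording of this message is an assumption.
    stop("Input data frame must not be empty.")
  }
  invisible(data)
}

# A list is rejected; tryCatch() captures the message instead of halting.
msg <- tryCatch(
  check_input(list(c(1, 2, 3))),
  error = function(e) conditionMessage(e)
)
msg
```

Wrapping the call in tryCatch() is just a convenience for inspecting the message in a script; calling check_input() directly would halt with the error.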
Our approach is to coerce all data to floating point types, and to set any entry that cannot be coerced to NA.
If columns happen to be string literals of numeric data, these are coerced to floating point numbers; otherwise they are set to NA.
If survey entries contain TRUE or FALSE, then these are coerced to 1 and 0, respectively.
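The coercion rules above can be reproduced with base R's as.numeric(). The sketch below is not the package's internal code; to_numeric() and the raw data frame are hypothetical names introduced for illustration.

```r
# Hypothetical helper, not surveygraph's implementation: coerce one column
# to double, turning anything that cannot be parsed as a number into NA.
to_numeric <- function(x) {
  if (is.factor(x)) x <- as.character(x)
  # as.numeric() warns when it introduces NA; silence that warning here.
  suppressWarnings(as.numeric(x))
}

raw <- data.frame(
  a = c("1", "2.5", "n/a"),   # numeric strings plus one unparseable entry
  b = c(TRUE, FALSE, TRUE),   # logical responses
  stringsAsFactors = FALSE
)

# Apply the coercion column by column.
clean <- as.data.frame(lapply(raw, to_numeric))
clean
```

Here the strings "1" and "2.5" become the numbers 1 and 2.5, "n/a" becomes NA, and the logical column becomes 1, 0, 1.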
This is a flag that, if set to TRUE, dummy codes every value that falls outside the range specified by the likert argument.
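As a rough illustration of what dummy coding out-of-range values can look like, the sketch below adds one indicator column per distinct out-of-range value and blanks those entries in the original column. This is one common reading of "dummy coding"; whether surveygraph represents the result this way is an assumption, and dummy_code() is a hypothetical helper.

```r
# Hypothetical helper: one 0/1 indicator column per distinct value that
# falls outside [minval, maxval]. Not taken from the surveygraph source.
dummy_code <- function(x, minval, maxval) {
  flags <- sort(unique(x[!is.na(x) & (x < minval | x > maxval)]))
  value <- x
  value[x %in% flags] <- NA            # out-of-range entries become NA
  out <- data.frame(value = value)
  for (f in flags) {
    # 1 where the entry equals this flag value, 0 elsewhere
    out[[paste0("flag_", f)]] <- as.numeric(!is.na(x) & x == f)
  }
  out
}

# item1 from the example data, with 1 to 10 taken as the valid range
coded <- dummy_code(c(2, -99, 1, 1, 100, 5, 5, 4, 3), minval = 1, maxval = 10)
coded
```

With this input, the result has a value column plus two indicator columns, one for -99 and one for 100.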
The likert optional argument allows us to specify the range of values that we interpret as valid input data. The idea is that anything that falls outside of this range is set to NA, or is dummy coded.
l <- data.frame(
minval = apply(df, 2, min, na.rm = TRUE),
maxval = apply(df, 2, max, na.rm = TRUE)
)
This creates the following data frame.
l
#>       minval maxval
#> item1    -99    100
#> item2      1      5
#> item3    -99     10
The idea is that by visually inspecting the limiting values for each item, it is obvious which columns contain flags, such as -99 and 100 in our data. As such, we might set
# set the minimum value of items one and three to 1
l$minval[1] <- 1
l$minval[3] <- 1
# set the maximum value of item one to 10
l$maxval[1] <- 10
Following these changes, we interpret the Likert ranges to be
l
#>       minval maxval
#> item1      1     10
#> item2      1      5
#> item3      1     10
Now, we provide the Likert specification l to make_projection to tell surveygraph how to handle the outliers.
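To make the effect of such a specification concrete, here is a minimal base R sketch that sets every entry outside its item's range to NA. It does not call make_projection(); mask_outside() is a hypothetical helper, and the limits are the adjusted values chosen above.

```r
df <- data.frame(
  item1 = c(2, -99, 1, 1, 100, 5, 5, 4, 3),
  item2 = c(1, 3, 1, 2, 4, 3, 4, 5, 4),
  item3 = c(2, 1, 3, -99, 5, 6, 8, 4, 10)
)

# Adjusted Likert ranges: one row per item, in column order.
l <- data.frame(
  minval = c(1, 1, 1),
  maxval = c(10, 5, 10),
  row.names = names(df)
)

# Hypothetical helper, not surveygraph's implementation: replace entries
# outside [minval, maxval] with NA, column by column.
mask_outside <- function(data, limits) {
  for (j in seq_along(data)) {
    x <- data[[j]]
    outside <- which(x < limits$minval[j] | x > limits$maxval[j])
    x[outside] <- NA
    data[[j]] <- x
  }
  data
}

masked <- mask_outside(df, l)
masked
```

In this sketch the flags -99 and 100 in item1 and -99 in item3 become NA, while item2 is unchanged because all of its values already lie between 1 and 5.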