library(elicitr)
#> Error in get(paste0(generic, ".", class), envir = get_method_env()) :
#> object 'type_sum.accel' not found
Many of the concepts introduced in
vignette("continuous_variables")
are also applicable to
categorical variables, and the name of the functions are the same but
have the prefix cat
instead of cont
. However,
there are some differences in the workflow for loading and analysing
data collected during the elicitation of categorical variables. This
vignette will guide you through the process of loading and analysing
categorical data.
Datasets
There are three datasets included in the package for demonstration
purposes: ?topic_1
, ?topic_2
, and
?topic_3
:
topic_1
#> # A tibble: 120 × 5
#> name category option confidence estimate
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 Derek Maclellan category_1 option_1 15 0.08
#> 2 Derek Maclellan category_2 option_1 15 0
#> 3 Derek Maclellan category_3 option_1 15 0.85
#> 4 Derek Maclellan category_4 option_1 15 0.02
#> 5 Derek Maclellan category_5 option_1 15 0.05
#> 6 Derek Maclellan category_1 option_2 35 0.02
#> 7 Derek Maclellan category_2 option_2 35 0.11
#> 8 Derek Maclellan category_3 option_2 35 0.18
#> 9 Derek Maclellan category_4 option_2 35 0.02
#> 10 Derek Maclellan category_5 option_2 35 0.67
#> # ℹ 110 more rows
topic_2
#> # A tibble: 100 × 5
#> name category option confidence estimate
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 Christopher Felix category_1 option_1 100 0.09
#> 2 Christopher Felix category_2 option_1 100 0.21
#> 3 Christopher Felix category_3 option_1 100 0.11
#> 4 Christopher Felix category_4 option_1 100 0.59
#> 5 Christopher Felix category_5 option_1 100 0
#> 6 Christopher Felix category_1 option_2 20 0.09
#> 7 Christopher Felix category_2 option_2 20 0.05
#> 8 Christopher Felix category_3 option_2 20 0.33
#> 9 Christopher Felix category_4 option_2 20 0.24
#> 10 Christopher Felix category_5 option_2 20 0.29
#> # ℹ 90 more rows
topic_3
#> # A tibble: 90 × 5
#> name category option confidence estimate
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 Derek Maclellan category_1 option_1 80 0.02
#> 2 Derek Maclellan category_2 option_1 80 0.02
#> 3 Derek Maclellan category_3 option_1 80 0.01
#> 4 Derek Maclellan category_4 option_1 80 0.87
#> 5 Derek Maclellan category_5 option_1 80 0.08
#> 6 Derek Maclellan category_1 option_2 50 0.11
#> 7 Derek Maclellan category_2 option_2 50 0.09
#> 8 Derek Maclellan category_3 option_2 50 0.17
#> 9 Derek Maclellan category_4 option_2 50 0.09
#> 10 Derek Maclellan category_5 option_2 50 0.54
#> # ℹ 80 more rows
In each dataset the first column contains the name of the expert and the second the categories of the categorical variable. Each category can have different options, saved in column three. The fourth column contains the expert’s confidence, and the fifth the expert’s estimate.
Load data
We start by creating the ?elic_cat
object with the
function cat_start()
. As for the continuous variables, this
objects stores the metadata of the elicitation process:
my_categories <- c("category_1", "category_2", "category_3",
"category_4", "category_5")
my_options <- c("option_1", "option_2", "option_3", "option_4")
my_topics <- c("topic_1", "topic_2", "topic_3")
my_elicitation <- cat_start(categories = my_categories,
options = my_options,
experts = 6,
topics = my_topics)
#> ✔ <elic_cat> object for "Elicitation" correctly initialised
my_elicitation
#>
#> ── Elicitation ──
#>
#> • Categories: "category_1", "category_2", "category_3", "category_4", and
#> "category_5"
#> • Options: "option_1", "option_2", "option_3", and "option_4"
#> • Number of experts: 6
#> • Topics: "topic_1", "topic_2", and "topic_3"
#> • Data available for 0 topics
This elicitation process is for a categorical variables with 5 categories estimated for four options and three topics by six experts.
Similarly as we did for continuous variables, we can load the data
with the function cat_load()
:
my_elicitation <- cat_add_data(my_elicitation,
data_source = topic_1,
topic = "topic_1") |>
cat_add_data(data_source = topic_2, topic = "topic_2") |>
cat_add_data(data_source = topic_3, topic = "topic_3")
#> ✔ Data added to Topic "topic_1" from "data.frame"
#> ✔ Data added to Topic "topic_2" from "data.frame"
#> ✔ Data added to Topic "topic_3" from "data.frame"
Again, metadata are used to validate the data. If the data is not consistent with the metadata, an error message will be displayed. For example, if we try to load data with a category not defined in the metadata:
malformed_data <- topic_1
malformed_data[1, 2] <- "category_6"
cat_add_data(my_elicitation,
data_source = malformed_data,
topic = "topic_1")
#> Error in `cat_add_data()`:
#> ! The column with the name of the categories contains unexpected values:
#> ✖ The value "category_6" is not valid.
#> ℹ Check the metadata in the <elic_cat> object.
Get data
Data can be retrieved from the elic_cat
object with the
cat_get_data()
function:
cat_get_data(my_elicitation, topic = "topic_1")
#> # A tibble: 120 × 5
#> id category option confidence estimate
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 5ac97e0 category_1 option_1 15 0.08
#> 2 5ac97e0 category_2 option_1 15 0
#> 3 5ac97e0 category_3 option_1 15 0.85
#> 4 5ac97e0 category_4 option_1 15 0.02
#> 5 5ac97e0 category_5 option_1 15 0.05
#> 6 5ac97e0 category_1 option_2 35 0.02
#> 7 5ac97e0 category_2 option_2 35 0.11
#> 8 5ac97e0 category_3 option_2 35 0.18
#> 9 5ac97e0 category_4 option_2 35 0.02
#> 10 5ac97e0 category_5 option_2 35 0.67
#> # ℹ 110 more rows
Notice that the name of the expert has been anonymised and assigned
to the column id
. Data can be retrieved only for given
options:
cat_get_data(my_elicitation, topic = "topic_2", option = "option_1")
#> # A tibble: 25 × 5
#> id category option confidence estimate
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 e51202e category_1 option_1 100 0.09
#> 2 e51202e category_2 option_1 100 0.21
#> 3 e51202e category_3 option_1 100 0.11
#> 4 e51202e category_4 option_1 100 0.59
#> 5 e51202e category_5 option_1 100 0
#> 6 e78cbf4 category_1 option_1 75 0.31
#> 7 e78cbf4 category_2 option_1 75 0.27
#> 8 e78cbf4 category_3 option_1 75 0.09
#> 9 e78cbf4 category_4 option_1 75 0.17
#> 10 e78cbf4 category_5 option_1 75 0.16
#> # ℹ 15 more rows
Data analysis
Contrary to continuous variables, there is not yet a function for plotting the raw data. However, we can plot the distribution of the sampled data.
Sample data
Data can be sampled using the function cat_sample()
(see
the variable documentation for the explanation of the sampling methods).
Here we sample 100 values for each option:
samp <- cat_sample_data(my_elicitation,
method = "basic",
topic = "topic_1",
n_votes = 100)
#> ✔ Data sampled successfully using "basic" method.
samp
#> # A tibble: 2,400 × 7
#> id option category_1 category_2 category_3 category_4 category_5
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 5ac97e0 option_1 4.13e-14 0 0.996 5.48e-107 3.79e- 3
#> 2 5ac97e0 option_1 9.12e- 1 0 0.0881 8.28e- 20 6.40e- 8
#> 3 5ac97e0 option_1 4.57e- 4 0 1.00 6.44e- 15 8.17e-14
#> 4 5ac97e0 option_1 6.03e- 2 0 0.940 3.20e- 5 1.45e- 9
#> 5 5ac97e0 option_1 9.50e- 3 0 0.988 1.73e- 3 3.37e- 4
#> 6 5ac97e0 option_1 3.31e- 3 0 0.997 6.76e- 18 1.95e-31
#> 7 5ac97e0 option_1 4.81e- 6 0 1.00 1.28e- 14 2.11e-12
#> 8 5ac97e0 option_1 1.73e- 4 0 0.998 9.17e- 15 2.22e- 3
#> 9 5ac97e0 option_1 1.15e-19 0 1.00 5.78e- 15 4.62e- 6
#> 10 5ac97e0 option_1 2.59e- 6 0 1.00 1.36e- 30 1.20e- 8
#> # ℹ 2,390 more rows
Sampled data can be summarised for any option:
summary(samp, option = "option_1")
#> # A tibble: 5 × 7
#> Category Min Q1 Median Mean Q3 Max
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 category_1 7.88e- 29 1.63e- 5 7.04e- 3 0.132 0.142 0.999
#> 2 category_2 0 1.25e-73 4.31e-18 0.0498 0.000236 0.974
#> 3 category_3 2.44e- 21 2.65e- 3 1.35e- 1 0.343 0.701 1.00
#> 4 category_4 2.69e-234 1.44e-22 4.07e- 4 0.153 0.167 0.999
#> 5 category_5 3.74e- 49 5.44e- 4 1.21e- 1 0.323 0.650 1.00
And plotted as violin plot:
plot(samp)