While the Introductory vignette gives an overview of core labelr functions, here, we offer an ad hoc dive into a range of miscellaneous special topics and additional functionalities. Let’s go.
labelr is not intended for “large” data.frames, which is a fuzzy concept. To give a sense of what labelr can handle, let’s see it in action with the NYC Flights 2013 data set: a moderate-not-big data.frame of ~340K rows.
Let’s load labelr and the nycflights13 package.
We’ll assign the data.frame to one we call df.
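The chunk for these two steps does not appear above, so here is a minimal sketch of what it presumably looks like. We are assuming that both packages are installed and that df is simply a copy of nycflights13::flights.

```r
# Assumed setup (a sketch; the original chunk is not shown here)
library(labelr)
library(nycflights13)

# flights is a tibble with roughly 340K rows
df <- flights
dim(df)
```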
We’ll add a “frame label,” which describes the data.frame overall.
df <- add_frame_lab(df, frame.lab = "On-time data for all flights that
departed NYC (i.e. JFK, LGA or EWR) in 2013.")
### > Warning in as_base_data_frame(data):
### > data argument object coerced from augmented to conventional (Base R) data.frame.
Note that the source data.frame (nycflights13::flights) is a tibble. The labelr package coerces augmented data.frames, such as
tibbles and data.tables, into “pure” Base R data.frames – and alerts you
that it has done so. The intent is to avoid the dependencies, errors, or
inconsistent and unpredictable behaviors that might result from labelr
trying to integrate with or make sense of these or other competing,
alternative data.frame constructs, which (a) by design behave
differently from standard R data.frames in various subtle or
not-so-subtle ways and which (b) may continue to evolve in the
future.
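To make the coercion concrete, here is a Base R sketch of the general idea. It fakes a tibble-like object by prepending extra classes to an ordinary data.frame and then strips them with as.data.frame(). This illustrates the concept only; we are not claiming it is what labelr does internally.

```r
# Stand-in for an "augmented" data.frame: same data, extra classes up front
aug <- data.frame(x = 1:3, y = c("a", "b", "c"))
class(aug) <- c("tbl_df", "tbl", "data.frame")

# as.data.frame() drops the classes that precede "data.frame"
plain <- as.data.frame(aug)
class(plain)
```

The data themselves are untouched; only the class vector changes.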
Let’s see what this did.
attr(df, "frame.lab") # check for attribute
### > [1] "On-time data for all flights that departed NYC (i.e. JFK, LGA or EWR) in 2013."
get_frame_lab(df) # return frame.lab alongside data.frame name as a data.frame
### > data.frame
### > 1 df
### > frame.lab
### > 1 On-time data for all flights that departed NYC (i.e. JFK, LGA or EWR) in 2013.
get_frame_lab(df)$frame.lab
### > [1] "On-time data for all flights that departed NYC (i.e. JFK, LGA or EWR) in 2013."
Now, let’s assign variable NAME labels.
names_labs_vec <- c(
"year" = "Year of departure",
"month" = "Month of departure",
"day" = "Day of departure",
"dep_time" = "Actual departure time (format HHMM or HMM), local tz",
"arr_time" = "Actual arrival time (format HHMM or HMM), local tz",
"sched_dep_time" = "Scheduled departure times (format HHMM or HMM)",
"sched_arr_time" = "Scheduled arrival time (format HHMM or HMM)",
"dep_delay" = "Departure delays, in minutes",
"arr_delay" = "Arrival delays, in minutes",
"carrier" = "Two letter airline carrier abbreviation",
"flight" = "Flight number",
"tailnum" = "Plane tail number",
"origin" = "Flight origin airport code",
"dest" = "Flight destination airport code",
"air_time" = "Minutes spent in the air",
"distance" = "Miles between airports",
"hour" = "Hour of scheduled departure time",
"minute" = "Minutes component of scheduled departure time",
"time_hour" = "Scheduled date and hour of the flight as a POSIXct date"
)
df <- add_name_labs(df, name.labs = names_labs_vec)
get_name_labs(df) # show that they've been added
### > var lab
### > 1 year Year of departure
### > 2 month Month of departure
### > 3 day Day of departure
### > 4 dep_time Actual departure time (format HHMM or HMM), local tz
### > 5 sched_dep_time Scheduled departure times (format HHMM or HMM)
### > 6 dep_delay Departure delays, in minutes
### > 7 arr_time Actual arrival time (format HHMM or HMM), local tz
### > 8 sched_arr_time Scheduled arrival time (format HHMM or HMM)
### > 9 arr_delay Arrival delays, in minutes
### > 10 carrier Two letter airline carrier abbreviation
### > 11 flight Flight number
### > 12 tailnum Plane tail number
### > 13 origin Flight origin airport code
### > 14 dest Flight destination airport code
### > 15 air_time Minutes spent in the air
### > 16 distance Miles between airports
### > 17 hour Hour of scheduled departure time
### > 18 minute Minutes component of scheduled departure time
### > 19 time_hour Scheduled date and hour of the flight as a POSIXct date
Let’s add variable VALUE labels for variable “carrier.” Helpfully, a mapping of airlines’ carrier codes to their full names ships with the nycflights13 package itself.
airlines <- nycflights13::airlines
head(airlines)
### > # A tibble: 6 × 2
### > carrier name
### > <chr> <chr>
### > 1 9E Endeavor Air Inc.
### > 2 AA American Airlines Inc.
### > 3 AS Alaska Airlines Inc.
### > 4 B6 JetBlue Airways
### > 5 DL Delta Air Lines Inc.
### > 6 EV ExpressJet Airlines Inc.
The carrier field of airlines matches the carrier column of df (formerly, flights).
The name field of airlines gives us the full airline names.
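The assignments that create these two vectors are not shown above. A minimal sketch, assuming they are taken straight from the two columns of airlines (ny_val and ny_lab are the names used in the add_val1() call that follows):

```r
airlines <- nycflights13::airlines

ny_val <- airlines$carrier # carrier codes, e.g., "9E", "AA"
ny_lab <- airlines$name    # full airline names, in the same row order
```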
Let’s use these vectors to add value labels to df. We’ll demo add_val1(), which accepts only one variable but allows you to pass its name unquoted.
df <- add_val1(df,
var = carrier, vals = ny_val,
labs = ny_lab,
max.unique.vals = 20
)
### > Warning in add_val1(df, var = carrier, vals = ny_val, labs = ny_lab, max.unique.vals = 20):
### >
### > Note: labelr is not optimized for data.frames this large.
(Side note on warnings: The package issues the first in what will become a series of potentially annoying warnings that you are applying value labels to a larger data.frame than labelr was built to handle. There is a reason that this is a warning, not an error: labelr will work on larger data.frames until it doesn’t, which is to say that the burdens of computational intensiveness will become a drag on speed and R’s in-session memory capacity. In the present case, labelr handles the data.frame just fine, but things take a little longer, and labelr seizes most opportunities to remind you that you’re making it work overtime.)
Okay, back to the value-labeling. Our data.frame also has a month variable, expressed in integer terms (e.g., 1 indicates January, 9 indicates September). We will “hand-jam” month value labels, using add_val_labs(). This command is equivalent to add_val1(), except that it requires variable names to be quoted but allows you to supply more than one of them at a time (i.e., you can supply a character vector of variable names). In this case, we’ll use it on just one variable.
First, we’ll create our vectors of unique values and labels.
ny_month_vals <- c(1:12) # values
ny_month_labs <- c(
"JAN", "FEB", "MAR", "APR", "MAY", "JUN",
"JUL", "AUG", "SEP", "OCT", "NOV", "DEC"
) # labels
Note that order is important here: We need to supply exactly as many values as value labels, with each value label being uniquely associated with the value that shares its index. For example, in the above case, ny_month_vals[3] (here, 3) is associated with ny_month_labs[3] (here, "MAR").
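One way to picture this index-based pairing is with a plain Base R lookup. This is only an illustration of the pairing logic, not a description of labelr's internals (some_months is a hypothetical name).

```r
ny_month_vals <- 1:12
ny_month_labs <- c(
  "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
  "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"
)

# the vectors must be the same length for the pairing to make sense
stopifnot(length(ny_month_vals) == length(ny_month_labs))

# look up the label that shares each value's index
some_months <- c(3, 9, 12)
ny_month_labs[match(some_months, ny_month_vals)]
#> [1] "MAR" "SEP" "DEC"
```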
Now, let’s use these two vectors to add value labels for the variable “month”.
df <- add_val_labs(df,
vars = "month",
vals = ny_month_vals,
labs = ny_month_labs,
max.unique.vals = 20
)
### > Warning in add_val_labs(df, vars = "month", vals = ny_month_vals, labs = ny_month_labs, :
### >
### > Note: labelr is not optimized for data.frames this large.
Finally, we’ll use add_quant_labs() to provide numerical range value labels for five quintiles of the variable “dep_time.”
df <- add_quant_labs(df, "dep_time", qtiles = 5)
### > Warning in add_quant_labs(df, "dep_time", qtiles = 5):
### >
### > Note: labelr is not optimized for data.frames this large.
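For intuition about what quantile-based value labels mean, here is a Base R sketch that bins a numeric vector into quintiles and names each bin after the quantile that caps it. The q020 through q100 label style mimics labelr's; the binning logic itself is our assumption for illustration and may differ from how add_quant_labs() computes its thresholds.

```r
x <- mtcars$mpg # any numeric vector will do

# upper thresholds of the five quintile bins
probs <- seq(0.2, 1, by = 0.2)
thresholds <- quantile(x, probs = probs, na.rm = TRUE)

# name each bin after the quantile that caps it
q_labs <- sprintf("q%03d", as.integer(round(probs * 100)))

# assign each value to the first bin whose threshold it does not exceed
x_lab <- as.character(cut(x, breaks = c(-Inf, thresholds), labels = q_labs))

table(x_lab)
```

Each label therefore reads as "at or below this quantile, and above the previous one."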
Let’s see where these value-labeling operations have left us.
get_val_labs(df)
### > var vals labs
### > 1 month 1 JAN
### > 2 month 2 FEB
### > 3 month 3 MAR
### > 4 month 4 APR
### > 5 month 5 MAY
### > 6 month 6 JUN
### > 7 month 7 JUL
### > 8 month 8 AUG
### > 9 month 9 SEP
### > 10 month 10 OCT
### > 11 month 11 NOV
### > 12 month 12 DEC
### > 13 month NA NA
### > 14 dep_time 827 q020
### > 15 dep_time 1200 q040
### > 16 dep_time 1536 q060
### > 17 dep_time 1830 q080
### > 18 dep_time 2400 q100
### > 19 dep_time NA NA
### > 20 carrier 9E Endeavor Air Inc.
### > 21 carrier AA American Airlines Inc.
### > 22 carrier AS Alaska Airlines Inc.
### > 23 carrier B6 JetBlue Airways
### > 24 carrier DL Delta Air Lines Inc.
### > 25 carrier EV ExpressJet Airlines Inc.
### > 26 carrier F9 Frontier Airlines Inc.
### > 27 carrier FL AirTran Airways Corporation
### > 28 carrier HA Hawaiian Airlines Inc.
### > 29 carrier MQ Envoy Air
### > 30 carrier OO SkyWest Airlines Inc.
### > 31 carrier UA United Air Lines Inc.
### > 32 carrier US US Airways Inc.
### > 33 carrier VX Virgin America
### > 34 carrier WN Southwest Airlines Co.
### > 35 carrier YV Mesa Airlines Inc.
### > 36 carrier NA NA
We can use head() to get a baseline look at select rows and variables.
head(df[c("origin", "dep_time", "dest", "year", "month", "carrier")])
### > origin dep_time dest year month carrier
### > 1 EWR 517 IAH 2013 1 UA
### > 2 LGA 533 IAH 2013 1 UA
### > 3 JFK 542 MIA 2013 1 AA
### > 4 JFK 544 BQN 2013 1 B6
### > 5 LGA 554 ATL 2013 1 DL
### > 6 EWR 554 ORD 2013 1 UA
Now, let’s do the same for a version of df that we’ve modified with use_val_labs(), which converts all values of value-labeled variables to their corresponding labels.
df_swapd <- use_val_labs(df)
### > Warning in use_val_labs(df):
### > Note: labelr is not optimized for data.frames this large.
head(df_swapd[c("origin", "dep_time", "dest", "year", "month", "carrier")])
### > origin dep_time dest year month carrier
### > 1 EWR q020 IAH 2013 JAN United Air Lines Inc.
### > 2 LGA q020 IAH 2013 JAN United Air Lines Inc.
### > 3 JFK q020 MIA 2013 JAN American Airlines Inc.
### > 4 JFK q020 BQN 2013 JAN JetBlue Airways
### > 5 LGA q020 ATL 2013 JAN Delta Air Lines Inc.
### > 6 EWR q020 ORD 2013 JAN United Air Lines Inc.
Instead of replacing values using use_val_labs() (something we can’t directly undo), it might be safer to simply add “value-labels-on” character variables to the data.frame, while preserving the parent variables. This adds roughly 1M new cells to our df (!), but let’s throw caution to the wind with add_lab_cols().
df_plus <- add_lab_cols(df, vars = c("carrier", "month", "dep_time"))
### > Warning in add_lab_cols(df, vars = c("carrier", "month", "dep_time")):
### >
### > Note: labelr is not optimized for data.frames this large.
head(df_plus[c(
"origin", "dest", "year",
"month", "month_lab",
"dep_time", "dep_time_lab",
"carrier", "carrier_lab"
)])
### > origin dest year month month_lab dep_time dep_time_lab carrier
### > 1 EWR IAH 2013 1 JAN 517 q020 UA
### > 2 LGA IAH 2013 1 JAN 533 q020 UA
### > 3 JFK MIA 2013 1 JAN 542 q020 AA
### > 4 JFK BQN 2013 1 JAN 544 q020 B6
### > 5 LGA ATL 2013 1 JAN 554 q020 DL
### > 6 EWR ORD 2013 1 JAN 554 q020 UA
### > carrier_lab
### > 1 United Air Lines Inc.
### > 2 United Air Lines Inc.
### > 3 American Airlines Inc.
### > 4 JetBlue Airways
### > 5 Delta Air Lines Inc.
### > 6 United Air Lines Inc.
We can use flab() to filter df based on the labels of value-labeled variables such as month and carrier, even when value labels are “invisible” (i.e., existing only as attributes() meta-data).
# labels are not visible (they exist only as attributes() meta-data)
head(df[c("carrier", "arr_delay")])
### > carrier arr_delay
### > 1 UA 11
### > 2 UA 20
### > 3 AA 33
### > 4 B6 -18
### > 5 DL -25
### > 6 UA 12
# we still can use them to filter (note: we're filtering on "JetBlue Airways",
# ...NOT its obscure code "B6")
df_fl <- flab(df, carrier == "JetBlue Airways" & arr_delay > 20)
### > Warning in use_val_labs(data):
### > Note: labelr is not optimized for data.frames this large.
# here's what's returned when we filtered on "JetBlue Airways" using flab()
head(df_fl[c("carrier", "arr_delay")])
### > carrier arr_delay
### > 70 B6 44
### > 129 B6 24
### > 174 B6 40
### > 203 B6 42
### > 292 B6 29
### > 314 B6 38
# double-check that this is JetBlue
head(use_val_labs(df_fl)[c("carrier", "arr_delay")])
### > carrier arr_delay
### > 70 JetBlue Airways 44
### > 129 JetBlue Airways 24
### > 174 JetBlue Airways 40
### > 203 JetBlue Airways 42
### > 292 JetBlue Airways 29
### > 314 JetBlue Airways 38
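Conceptually, this kind of filtering amounts to looking up each row's label, testing the condition against the labels, and then subsetting the original, unlabeled rows. The Base R sketch below does this by hand on a tiny, made-up data.frame (flights_mini and carrier_labs are hypothetical names); it is meant to convey the idea, not to reproduce flab()'s implementation.

```r
# tiny made-up example in the spirit of the flights data
flights_mini <- data.frame(
  carrier = c("UA", "B6", "DL", "B6"),
  arr_delay = c(11, 44, -25, 5),
  stringsAsFactors = FALSE
)

# value -> label lookup, stored apart from the data
carrier_labs <- c(
  UA = "United Air Lines Inc.",
  B6 = "JetBlue Airways",
  DL = "Delta Air Lines Inc."
)

# test the condition against labels, then subset the ORIGINAL rows
row_labs <- unname(carrier_labs[flights_mini$carrier])
keep <- row_labs == "JetBlue Airways" & flights_mini$arr_delay > 20
flights_mini[keep, ]
```

The returned rows still show the code "B6", just as the flab() result does.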
How long did this entire NYC Flights session take (results will vary)?
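The timing code and its result are not shown here. One common Base R pattern for this kind of measurement, offered as a sketch rather than as the approach actually used for this session, is to record Sys.time() at the start and take the difference at the end (start_time and elapsed are hypothetical names).

```r
start_time <- Sys.time() # record this before the first labelr call

# ... the labeling, filtering, and printing steps would run here ...
Sys.sleep(0.1) # placeholder so that the sketch has something to time

elapsed <- difftime(Sys.time(), start_time, units = "secs")
elapsed
```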
labelr is not a fan of NA values or other “irregular” values, which are defined as infinite values, not-a-number values, and character values that look like them (e.g., “NAN”, “INF”, “inf”, “Na”).
When value-labeling a column / variable, such values are automatically given the catch-all label “NA” (which will be converted to an actual NA in any columns created by add_lab_cols() or use_val_labs()). You do not need (and should not try) to specify this yourself, and you should not try to over-ride labelr on this. If you want to use labelr AND your data contain these sorts of values, your options are to accept the default “NA” label or convert these sorts of values to something else before labeling.
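If you would like to find such values before labeling, a small Base R helper can flag them. This is a rough sketch under our own assumptions about what counts as a "character variant" (a case-insensitive match on "NA", "NAN", "INF", "-INF"); is_irregular() is a hypothetical name, and labelr's own check may be broader or narrower.

```r
is_irregular <- function(x) {
  if (is.numeric(x)) {
    # is.na() is TRUE for both NA and NaN
    return(is.na(x) | is.infinite(x))
  }
  x_chr <- toupper(trimws(as.character(x)))
  is.na(x) | x_chr %in% c("NA", "NAN", "INF", "-INF")
}

is_irregular(c(1, NA, Inf, -Inf, NaN))
#> [1] FALSE  TRUE  TRUE  TRUE  TRUE
is_irregular(c("A", "NA", "Inf", "nan", NA))
#> [1] FALSE  TRUE  TRUE  TRUE  TRUE
```

Something like sapply(your_df, function(col) sum(is_irregular(col))) would then give a per-column count.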
With that said, let’s see how labelr handles this, with an assist from our old friend mtcars (packaged with R’s base distribution).
First, let’s assign mtcars to a new data.frame object that we will besmirch.
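The assignment itself is not shown above; given the object name used in the code that follows, it is presumably a plain copy:

```r
mtbad <- mtcars # work on a copy so that mtcars itself stays untouched
```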
Let’s get on with the besmirching.
mtbad[1, 1:11] <- NA
rownames(mtbad)[1] <- "Missing Car"
mtbad[2, "am"] <- Inf
mtbad[3, "gear"] <- -Inf
mtbad[5, "carb"] <- NaN
mtbad[2, "mpg"] <- Inf
mtbad[3, "mpg"] <- NaN
# add a character variable, for demonstration purposes
# if it makes you feel better, you can pretend these are Consumer Reports or
# ...JD Power ratings or something
set.seed(9202) # for reproducibility
mtbad$grade <- sample(c("A", "B", "C"), nrow(mtbad), replace = TRUE)
mtbad[4, "grade"] <- NA
mtbad[5, "grade"] <- "NA"
mtbad[6, "grade"] <- "Inf"
# see where this leaves us
head(mtbad)
### > mpg cyl disp hp drat wt qsec vs am gear carb grade
### > Missing Car NA NA NA NA NA NA NA NA NA NA NA B
### > Mazda RX4 Wag Inf 6 160 110 3.90 2.875 17.02 0 Inf 4 4 C
### > Datsun 710 NaN 4 108 93 3.85 2.320 18.61 1 1 -Inf 1 C
### > Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 <NA>
### > Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 NaN NA
### > Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 Inf
sapply(mtbad, class)
### > mpg cyl disp hp drat wt
### > "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
### > qsec vs am gear carb grade
### > "numeric" "numeric" "numeric" "numeric" "numeric" "character"
Now, let’s add value labels to this unruly data.frame.
mtlabs <- mtbad |>
add_val1(grade,
vals = c("A", "B", "C"),
labs = c("Gold", "Silver", "Bronze")
) |>
add_val1(am,
vals = c(0, 1),
labs = c("auto", "stick")
) |>
add_val1(carb,
vals = c(1, 2, 3, 4, 6, 8), # not the most inspired use of labels
labs = c(
"1c", "2c", "3c",
"4c", "6c", "8c"
)
) |>
add_val1(gear,
vals = 3:5, # again, not the most compelling use case
labs = c(
"3-speed",
"4-speed",
"5-speed"
)
) |>
add_quant1(mpg, qtiles = 4) # add quartile-based value labels
get_val_labs(mtlabs, "am") # NA values were detected and dealt with
### > var vals labs
### > 6 am 0 auto
### > 7 am 1 stick
### > 8 am NA NA
Let’s streamline the data.frame with sselect() to make it more manageable.
mtless <- sselect(mtlabs, mpg, cyl, am, gear, carb, grade) # safely select
head(mtless, 5) # note that the irregular values are still here
### > mpg cyl am gear carb grade
### > Missing Car NA NA NA NA NA B
### > Mazda RX4 Wag Inf 6 Inf 4 4 C
### > Datsun 710 NaN 4 1 -Inf 1 C
### > Hornet 4 Drive 21.4 6 0 3 1 <NA>
### > Hornet Sportabout 18.7 8 0 3 NaN NA
Notice how all irregular values are coerced to NA when we substitute labels for values with use_val_labs().
head(use_val_labs(mtless), 5) # but they all go to NA if we `use_val_labs`
### > mpg cyl am gear carb grade
### > Missing Car <NA> NA <NA> <NA> <NA> Silver
### > Mazda RX4 Wag <NA> 6 <NA> 4-speed 4c Bronze
### > Datsun 710 <NA> 4 stick <NA> 1c Bronze
### > Hornet 4 Drive q075 6 auto 3-speed 1c <NA>
### > Hornet Sportabout q050 8 auto 3-speed <NA> <NA>
Now, let’s try an add_lab_cols() view.
mtlabs_plus <- add_lab_cols(mtlabs, c("mpg", "am")) # creates, adds "mpg_lab" and "am_lab" cols
mtlabs_plus <- sselect(mtlabs_plus, mpg, mpg_lab, am, am_lab) # select cols
head(mtlabs_plus) # where we landed
### > mpg mpg_lab am am_lab
### > Missing Car NA <NA> NA <NA>
### > Mazda RX4 Wag Inf <NA> Inf <NA>
### > Datsun 710 NaN <NA> 1 stick
### > Hornet 4 Drive 21.4 q075 0 auto
### > Hornet Sportabout 18.7 q050 0 auto
### > Valiant 18.1 q050 0 auto
What if we had tried to explicitly label the NA values and/or irregular values themselves? We would have failed.
# Trying to Label an Irregular Value (-Inf)
mtbad <- add_val1(
data = mtcars,
var = gear,
vals = -Inf,
labs = c("neg.inf")
)
### > Error in add_val1(data = mtcars, var = gear, vals = -Inf, labs = c("neg.inf")):
### > Cannot supply NA, NaN, Inf, or character variants as a val or lab arg.
### > These are handled automatically.
# Trying to Label an Irregular Value (NA)
mtbad <- add_val_labs(
data = mtbad,
vars = "grade",
vals = NA,
labs = c("miss")
)
### > Error in add_val_labs(data = mtbad, vars = "grade", vals = NA, labs = c("miss")):
### > Cannot supply NA, NaN, Inf, or character variants as a val or lab arg.
### > These are handled automatically.
# Trying to Label an Irregular Value (NaN)
mtbad <- add_val_labs(
data = mtbad,
vars = "carb",
vals = NaN,
labs = c("nan-v")
)
### > Error in add_val_labs(data = mtbad, vars = "carb", vals = NaN, labs = c("nan-v")):
### > Cannot supply NA, NaN, Inf, or character variants as a val or lab arg.
### > These are handled automatically.
# labelr also treats "character variants" of irregular values as irregular values.
mtbad <- add_val1(
data = mtbad,
var = carb,
vals = "NAN",
labs = c("nan-v")
)
### > Error in add_val1(data = mtbad, var = carb, vals = "NAN", labs = c("nan-v")):
### > Cannot supply NA, NaN, Inf, or character variants as a val or lab arg.
### > These are handled automatically.
Again, labelr handles NA and irregular values and resists our efforts to take such matters into our own hands.
R’s concept of a factor variable shares some affinities with the concept of a value-labeled variable and can be viewed as one approach to value labeling. However, factors can manifest idiosyncratic and surprising behaviors depending on the function to which you’re trying to apply them. They are character-like, but they are not character values. They are built on top of integers, but they won’t submit to all of the operations that integers do. They do some very handy things in certain model-fitting applications, but their behavior “under the hood” can be counter-intuitive or opaque. Simply put, they are their own thing.
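A classic Base R illustration of this integer-backed behavior (our own small example, not taken from labelr):

```r
f <- factor(c("10", "20", "10"))

as.character(f) # looks like character data: "10" "20" "10"
as.integer(f)   # but the integers underneath are level codes: 1 2 1

# to recover the numbers, go through character first
as.numeric(as.character(f)) # 10 20 10
```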
So, while factors have their purposes, it would be nice to associate value labels with the distinct values of data.frame variables in a manner that preserves the integrity and transparency of the underlying values (factors tend to be a bit opaque about this) and that allows you to view or use the labels in flexible ways.
And if you wanted to work with a factor, it would be nice if you could add value labels to it without it ceasing to exist and behave as a factor.
With that said, let’s see if we can have our label-factor cake and eat it, too, using the iris data.frame that comes pre-packaged with R.
unique(iris$Species)
### > [1] setosa versicolor virginica
### > Levels: setosa versicolor virginica
sapply(iris, class) # nothing up our sleeve -- "Species" is a factor
### > Sepal.Length Sepal.Width Petal.Length Petal.Width Species
### > "numeric" "numeric" "numeric" "numeric" "factor"
Let’s add value labels to “Species” and assign the result to a new data.frame that we’ll call irlab. For our value labels, we’ll use “se”, “ve”, and “vi”, which are not adding much new information, but they will help to illustrate what we can do with labelr and a factor variable.
irlab <- add_val_labs(iris,
vars = "Species",
vals = c("setosa", "versicolor", "virginica"),
labs = c("se", "ve", "vi")
)
# this also would've worked
# irlab_dos <- add_val1(iris, Species,
# vals = c("setosa", "versicolor", "virginica"),
# labs = c("se", "ve", "vi")
# )
Note that we could have just as (or even more) easily used add_val1(), which works for a single variable at a time and allows us to avoid quoting our column name, if that matters to us. In contrast, add_val_labs() requires us to put our variable name(s) in quotes, but it also gives us the option to apply a common value-label scheme to several variables at once (e.g., Likert-style survey responses). We’ll see an example of this type of use case in action in a little bit.
For now, though, let’s prove that the iris and irlab data.frames are functionally identical.
First, note that irlab looks and acts just like iris in the usual ways that matter.
summary(iris)
### > Sepal.Length Sepal.Width Petal.Length Petal.Width
### > Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
### > 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
### > Median :5.800 Median :3.000 Median :4.350 Median :1.300
### > Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
### > 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
### > Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
### > Species
### > setosa :50
### > versicolor:50
### > virginica :50
### >
### >
### >
summary(irlab)
### > Sepal.Length Sepal.Width Petal.Length Petal.Width
### > Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
### > 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
### > Median :5.800 Median :3.000 Median :4.350 Median :1.300
### > Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
### > 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
### > Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
### > Species
### > setosa :50
### > versicolor:50
### > virginica :50
### >
### >
### >
head(iris, 4)
### > Sepal.Length Sepal.Width Petal.Length Petal.Width Species
### > 1 5.1 3.5 1.4 0.2 setosa
### > 2 4.9 3.0 1.4 0.2 setosa
### > 3 4.7 3.2 1.3 0.2 setosa
### > 4 4.6 3.1 1.5 0.2 setosa
head(irlab, 4)
### > Sepal.Length Sepal.Width Petal.Length Petal.Width Species
### > 1 5.1 3.5 1.4 0.2 setosa
### > 2 4.9 3.0 1.4 0.2 setosa
### > 3 4.7 3.2 1.3 0.2 setosa
### > 4 4.6 3.1 1.5 0.2 setosa
lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
### >
### > Call:
### > lm(formula = Sepal.Length ~ Sepal.Width + Species, data = iris)
### >
### > Coefficients:
### > (Intercept) Sepal.Width Speciesversicolor Speciesvirginica
### > 2.2514 0.8036 1.4587 1.9468
lm(Sepal.Length ~ Sepal.Width + Species, data = irlab) # values are same
### >
### > Call:
### > lm(formula = Sepal.Length ~ Sepal.Width + Species, data = irlab)
### >
### > Coefficients:
### > (Intercept) Sepal.Width Speciesversicolor Speciesvirginica
### > 2.2514 0.8036 1.4587 1.9468
Note also that irlab’s “Species” is still a factor, just like its iris counterpart/parent.
sapply(irlab, class)
### > Sepal.Length Sepal.Width Petal.Length Petal.Width Species
### > "numeric" "numeric" "numeric" "numeric" "factor"
levels(irlab$Species)
### > [1] "setosa" "versicolor" "virginica"
But irlab’s “Species” has value labels!
get_val_labs(irlab, "Species")
### > var vals labs
### > 1 Species setosa se
### > 2 Species versicolor ve
### > 3 Species virginica vi
### > 4 Species NA NA
And they work.
head(use_val_labs(irlab))
### > Sepal.Length Sepal.Width Petal.Length Petal.Width Species
### > 1 5.1 3.5 1.4 0.2 se
### > 2 4.9 3.0 1.4 0.2 se
### > 3 4.7 3.2 1.3 0.2 se
### > 4 4.6 3.1 1.5 0.2 se
### > 5 5.0 3.6 1.4 0.2 se
### > 6 5.4 3.9 1.7 0.4 se
ir_v <- flab(irlab, Species == "vi")
head(ir_v, 5)
### > Sepal.Length Sepal.Width Petal.Length Petal.Width Species
### > 101 6.3 3.3 6.0 2.5 virginica
### > 102 5.8 2.7 5.1 1.9 virginica
### > 103 7.1 3.0 5.9 2.1 virginica
### > 104 6.3 2.9 5.6 1.8 virginica
### > 105 6.5 3.0 5.8 2.2 virginica
Our take-aways so far? Factors can be value-labeled while staying factors, and we can use the labels to do labelr-y things with those factors. We can have both.
We may want to go further and add the labeled variable alongside the factor version.
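The call that creates this augmented data.frame is not shown above. A minimal sketch, assuming it uses add_lab_cols() in the same way as the earlier flights example (the name irlab_aug is taken from the code that follows):

```r
library(labelr)

# recreate irlab so that this sketch runs on its own
irlab <- add_val_labs(iris,
  vars = "Species",
  vals = c("setosa", "versicolor", "virginica"),
  labs = c("se", "ve", "vi")
)

# keep the factor "Species" and add a character column of its labels
irlab_aug <- add_lab_cols(irlab, vars = "Species")
names(irlab_aug)
```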
This gives us a new variable called “Species_lab”. Let’s get select rows of the resulting data.frame, since we want to see all the different species.
set.seed(231)
sample_rows <- sample(seq_len(nrow(irlab)), 10, replace = FALSE)
irlab_aug[sample_rows, ]
### > Sepal.Length Sepal.Width Petal.Length Petal.Width Species Species_lab
### > 7 4.6 3.4 1.4 0.3 setosa se
### > 91 5.5 2.6 4.4 1.2 versicolor ve
### > 41 5.0 3.5 1.3 0.3 setosa se
### > 133 6.4 2.8 5.6 2.2 virginica vi
### > 130 7.2 3.0 5.8 1.6 virginica vi
### > 19 5.7 3.8 1.7 0.3 setosa se
### > 104 6.3 2.9 5.6 1.8 virginica vi
### > 43 4.4 3.2 1.3 0.2 setosa se
### > 8 5.0 3.4 1.5 0.2 setosa se
### > 68 5.8 2.7 4.1 1.0 versicolor ve
sapply(irlab_aug, class)
### > Sepal.Length Sepal.Width Petal.Length Petal.Width Species Species_lab
### > "numeric" "numeric" "numeric" "numeric" "factor" "character"
with(irlab_aug, table(Species, Species_lab))
### > Species_lab
### > Species se ve vi
### > setosa 50 0 0
### > versicolor 0 50 0
### > virginica 0 0 50
Caution: Replacing the entire data.frame using use_val_labs() WILL coerce factors to character, since the value labels are character values, not recognized factor levels.
ir_char <- use_val_labs(irlab) # we assign this to a new data.frame
sapply(ir_char, class)
### > Sepal.Length Sepal.Width Petal.Length Petal.Width Species
### > "numeric" "numeric" "numeric" "numeric" "character"
head(ir_char, 3)
### > Sepal.Length Sepal.Width Petal.Length Petal.Width Species
### > 1 5.1 3.5 1.4 0.2 se
### > 2 4.9 3.0 1.4 0.2 se
### > 3 4.7 3.2 1.3 0.2 se
class(ir_char$Species) # it's character
### > [1] "character"
Of course, even then, we could explicitly coerce the labels to be factors if we wanted.
ir_fact <- use_val_labs(irlab)
ir_fact$Species <- factor(ir_fact$Species,
levels = c("se", "ve", "vi"),
labels = c("se", "ve", "vi")
)
head(ir_fact, 3)
### > Sepal.Length Sepal.Width Petal.Length Petal.Width Species
### > 1 5.1 3.5 1.4 0.2 se
### > 2 4.9 3.0 1.4 0.2 se
### > 3 4.7 3.2 1.3 0.2 se
class(ir_fact$Species) # it's a factor
### > [1] "factor"
levels(ir_fact$Species) # and the labels are now its levels
### > [1] "se" "ve" "vi"
We’ve recovered.
Value labels work with ordered factors, too. Let’s make a fictional ordered factor and add it to ir_ord, a copy of iris. We can pretend that this is some sort of judge’s overall quality rating, if that helps.
ir_ord <- iris
set.seed(293)
qrating <- c("AAA", "AA", "A", "BBB", "AA", "BBB", "A")
ir_ord$qrat <- sample(qrating, 150, replace = TRUE)
ir_ord$qrat <- factor(ir_ord$qrat,
ordered = TRUE,
levels = c("AAA", "AA", "A", "BBB")
)
Where do we stand with this factor?
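The check itself is not shown here. A Base R sketch of the sort of inspection we have in mind (the specific calls are our choice) rebuilds the factor and confirms its class and level order:

```r
# rebuild the ordered factor so that this sketch runs on its own
ir_ord <- iris
set.seed(293)
qrating <- c("AAA", "AA", "A", "BBB", "AA", "BBB", "A")
ir_ord$qrat <- factor(sample(qrating, 150, replace = TRUE),
  ordered = TRUE,
  levels = c("AAA", "AA", "A", "BBB")
)

class(ir_ord$qrat)      # "ordered" "factor"
levels(ir_ord$qrat)     # "AAA" "AA" "A" "BBB"
is.ordered(ir_ord$qrat) # TRUE
table(ir_ord$qrat)      # counts per level
```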
Now, let’s add value labels to it.
ir_ord <- add_val_labs(ir_ord,
vars = "qrat",
vals = c("AAA", "AA", "A", "BBB"),
labs = c(
"unimpeachable",
"excellent",
"very good",
"meh"
)
)
Let’s add a separate column with those labels as a distinct (character) variable unto itself, existing in addition to (not replacing) “qrat”.
ir_ord <- add_lab_cols(ir_ord, vars = "qrat")
head(ir_ord, 10)
### > Sepal.Length Sepal.Width Petal.Length Petal.Width Species qrat qrat_lab
### > 1 5.1 3.5 1.4 0.2 setosa AA excellent
### > 2 4.9 3.0 1.4 0.2 setosa AA excellent
### > 3 4.7 3.2 1.3 0.2 setosa AA excellent
### > 4 4.6 3.1 1.5 0.2 setosa AAA unimpeachable
### > 5 5.0 3.6 1.4 0.2 setosa AA excellent
### > 6 5.4 3.9 1.7 0.4 setosa BBB meh
### > 7 4.6 3.4 1.4 0.3 setosa AAA unimpeachable
### > 8 5.0 3.4 1.5 0.2 setosa AA excellent
### > 9 4.4 2.9 1.4 0.2 setosa A very good
### > 10 4.9 3.1 1.5 0.1 setosa A very good
with(ir_ord, table(qrat_lab, qrat))
### > qrat
### > qrat_lab AAA AA A BBB
### > excellent 0 49 0 0
### > meh 0 0 0 43
### > unimpeachable 11 0 0 0
### > very good 0 0 47 0
class(ir_ord$qrat)
### > [1] "ordered" "factor"
levels(ir_ord$qrat)
### > [1] "AAA" "AA" "A" "BBB"
class(ir_ord$qrat_lab)
### > [1] "character"
get_val_labs(ir_ord, "qrat") # labs are still there for qrat
### > var vals labs
### > 1 qrat A very good
### > 2 qrat AA excellent
### > 3 qrat AAA unimpeachable
### > 4 qrat BBB meh
### > 5 qrat NA NA
get_val_labs(ir_ord, "qrat_lab") # no labs here; this is just a character var
### > Warning in get_val_labs(ir_ord, "qrat_lab"):
### >
### > No val.labs found.
### > [1] var vals labs
### > <0 rows> (or 0-length row.names)
labelr offers some additional facilities for working with factors and categorical variables. For example, functions add_lab_dummies() (alias ald()) and add_lab_dumm1() (alias ald1()) will generate and assign a dummy (aka binary aka indicator) variable for each unique value label of a value-labeled variable, factor or otherwise. Alternatively, lab_int_to_factor() (alias int2f()) allows you to convert a value-labeled integer variable (or other non-decimal-having numeric column) to a factor, while factor_to_lab_int() (alias f2int()) allows you to convert a factor to a value-labeled integer variable. Note that the latter is NOT a straightforward “undo” for the former: the resulting unique integer values and their ordering may differ, as we demonstrate.
First, let’s convert a factor to a value-labeled integer.
class(iris[["Species"]])
### > [1] "factor"
iris_df <- factor_to_lab_int(iris, Species)
class(iris_df[["Species"]])
### > [1] "integer"
head(iris_df$Species)
### > [1] 1 1 1 1 1 1
get_val_labs(iris_df, "Species")
### > var vals labs
### > 1 Species 1 setosa
### > 2 Species 2 versicolor
### > 3 Species 3 virginica
### > 4 Species NA NA
Now, let’s value-label an integer and convert it to a factor. Note that our variable is not a strict as.integer() integer, but it’s a numeric variable with no decimal values, and that’s good enough for lab_int_to_factor().
carb_orig <- add_val_labs(
data = mtcars,
vars = "carb",
vals = c(1, 2, 3, 4, 6, 8),
labs = c(
"1c", "2c", # a tad silly, but these value labels will demo the principle
"3c", "4c",
"6c", "8c"
)
)
# carb as labeled numeric
is.integer(carb_orig$carb) # note: carb not technically an "as.integer()" integer
### > [1] FALSE
class(carb_orig$carb) # but it IS numeric
### > [1] "numeric"
has_decv(carb_orig$carb) # and does NOT have decimals; so, lab_int_to_factor() works
### > [1] FALSE
levels(carb_orig$carb) # none, not a factor
### > NULL
head(carb_orig$carb, 3)
### > [1] 4 4 1
mean(carb_orig$carb) # compare to carb_to_int (below)
### > [1] 2.8125
lm(mpg ~ carb, data = carb_orig) # compare to carb_to_int (below)
### >
### > Call:
### > lm(formula = mpg ~ carb, data = carb_orig)
### >
### > Coefficients:
### > (Intercept) carb
### > 25.872 -2.056
# note this for comparison to below
(adj_r2_orig <- summary(lm(mpg ~ carb, data = carb_orig))$adj.r.squared)
### > [1] 0.2803024
# compare to counterparts below
AIC(lm(mpg ~ carb, data = carb_orig))
### > [1] 199.1807
# Make carb a factor
carb_fac <- lab_int_to_factor(carb_orig, carb) # alias int2f() also works
class(carb_fac$carb) # now it's a factor
### > [1] "factor"
levels(carb_fac$carb) # like any good factor, it has levels
### > [1] "1c" "2c" "3c" "4c" "6c" "8c"
head(carb_fac$carb, 3)
### > [1] 4c 4c 1c
### > Levels: 1c 2c 3c 4c 6c 8c
lm(mpg ~ carb, data = carb_fac) # again: carb is a factor
### >
### > Call:
### > lm(formula = mpg ~ carb, data = carb_fac)
### >
### > Coefficients:
### > (Intercept) carb2c carb3c carb4c carb6c carb8c
### > 25.343 -2.943 -9.043 -9.553 -5.643 -10.343
# compare these model fit stats to counterparts above and below
(adj_r2_fac <- summary(lm(mpg ~ carb, data = carb_fac))$adj.r.squared)
### > [1] 0.3377081
# compare to counterparts above and below
AIC(lm(mpg ~ carb, data = carb_fac))
### > [1] 199.9415
Note that we can use factor_to_lab_int() to convert “carb” from a factor to a labeled integer variable. However, this is not a straightforward “undo” of what we just did: the resulting labeled integer won’t be identical to the “carb” column of mtcars that we started with, because factor_to_lab_int() converts the supplied factor variable’s values to sequentially ordered integers (from 1 to k, where k is the number of unique factor levels), ordered in terms of the levels of the factor variable being converted.
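Base R's own factor machinery shows why the round trip is lossy. In this sketch, which uses only factor() and as.integer() and is not labelr code, the carb values 6 and 8 become level codes 5 and 6 (carb_codes is a hypothetical name):

```r
carb <- mtcars$carb
sort(unique(carb)) # 1 2 3 4 6 8

# level codes run 1..k in level order, so 6 -> 5 and 8 -> 6
carb_codes <- as.integer(factor(carb))
sort(unique(carb_codes)) # 1 2 3 4 5 6

mean(carb)       # 2.8125
mean(carb_codes) # 2.71875
```

The gaps in the original values (no 5, no 7) are closed up, which is why means and regression slopes shift.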
# ??"back"?? to integer? Not quite. Compare below to carb_orig above
carb_to_int <- factor_to_lab_int(carb_fac, carb) # alias f2int() also works
class(carb_to_int$carb) # Is an integer
### > [1] "integer"
levels(carb_to_int$carb) # NOT a factor
### > NULL
mean(carb_to_int$carb) # NOT the same as carb_orig
### > [1] 2.71875
identical(carb_to_int$carb, carb_orig$carb) # really!
### > [1] FALSE
lm(mpg ~ carb, data = carb_to_int) # NOT the same as carb_orig
### >
### > Call:
### > lm(formula = mpg ~ carb, data = carb_to_int)
### >
### > Coefficients:
### > (Intercept)         carb
### >      27.330       -2.663
# Compare to counterpart calls from earlier iterations of carb (above)
(adj_r2_int <- summary(lm(mpg ~ carb, data = carb_to_int))$adj.r.squared)
### > [1] 0.3470751
AIC(lm(mpg ~ carb, data = carb_to_int))
### > [1] 196.0649
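The sequential 1-to-k recoding described above can be imitated in Base R, which may help explain why the mean shifts. This is a sketch of the idea only, not labelr's internal code; carb_f and carb_codes are illustrative names.

```r
# Base R analogue of the recoding (an assumption, not labelr's implementation)
carb_f <- factor(mtcars$carb)     # levels: 1, 2, 3, 4, 6, 8
carb_codes <- as.integer(carb_f)  # level positions: 1, 2, 3, 4, 5, 6

mean(mtcars$carb) # original values: 2.8125
mean(carb_codes)  # sequential codes: 2.71875

# The original values 6 and 8 have become codes 5 and 6
sort(unique(carb_codes))
```

Because 6 and 8 collapse to 5 and 6, the recoded variable is evenly spaced where the original was not, which is why the regression on it differs from the one on carb_orig.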
Now, let’s quickly demo add_lab_dummies(). To do so, we’ll revisit the
“Species” column of irlab, our factor variable from iris that we
value-labeled a few moments ago. It’s still here and still has value
labels.
get_val_labs(irlab, "Species")
### > var vals labs
### > 1 Species setosa se
### > 2 Species versicolor ve
### > 3 Species virginica vi
### > 4 Species NA NA
Let’s use add_lab_dummies() to create a dummy variable for each of its
labels.
irl_dumm <- add_lab_dummies(irlab, "Species")
head(irl_dumm) # they're there!
### > Sepal.Length Sepal.Width Petal.Length Petal.Width Species species_1 species_2
### > 1 5.1 3.5 1.4 0.2 setosa 1 0
### > 2 4.9 3.0 1.4 0.2 setosa 1 0
### > 3 4.7 3.2 1.3 0.2 setosa 1 0
### > 4 4.6 3.1 1.5 0.2 setosa 1 0
### > 5 5.0 3.6 1.4 0.2 setosa 1 0
### > 6 5.4 3.9 1.7 0.4 setosa 1 0
### > species_3
### > 1 0
### > 2 0
### > 3 0
### > 4 0
### > 5 0
### > 6 0
tail(irl_dumm) # again, they're there!
### > Sepal.Length Sepal.Width Petal.Length Petal.Width Species species_1
### > 145 6.7 3.3 5.7 2.5 virginica 0
### > 146 6.7 3.0 5.2 2.3 virginica 0
### > 147 6.3 2.5 5.0 1.9 virginica 0
### > 148 6.5 3.0 5.2 2.0 virginica 0
### > 149 6.2 3.4 5.4 2.3 virginica 0
### > 150 5.9 3.0 5.1 1.8 virginica 0
### > species_2 species_3
### > 145 0 1
### > 146 0 1
### > 147 0 1
### > 148 0 1
### > 149 0 1
### > 150 0 1
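For a sense of what these dummy columns contain, here is a rough Base R sketch that builds comparable 0/1 indicators from plain iris. It only mimics the result shown above: the species_1 to species_3 column names are copied from that output, and iris_dumm is an illustrative name.

```r
# One 0/1 indicator per level of Species (Base R sketch, not labelr's code)
species_levels <- levels(iris$Species)
dummies <- sapply(species_levels, function(lev) as.integer(iris$Species == lev))

# Name the columns the way the output above names them
colnames(dummies) <- paste0("species_", seq_along(species_levels))

iris_dumm <- cbind(iris, dummies)
head(iris_dumm, 3)
colSums(iris_dumm[, colnames(dummies)]) # 50 rows per species
```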
We can use add_lab_dumm1() to achieve the same result without quoting
the column name. The countervailing advantage of add_lab_dummies() is
that it lets you create dummy variables for more than one value-labeled
variable at a time (add_lab_dumm1() does not).
irl_dumm2 <- add_lab_dumm1(irlab, Species)
head(irl_dumm2) # again, they're there!
### > Sepal.Length Sepal.Width Petal.Length Petal.Width Species species_1 species_2
### > 1 5.1 3.5 1.4 0.2 setosa 1 0
### > 2 4.9 3.0 1.4 0.2 setosa 1 0
### > 3 4.7 3.2 1.3 0.2 setosa 1 0
### > 4 4.6 3.1 1.5 0.2 setosa 1 0
### > 5 5.0 3.6 1.4 0.2 setosa 1 0
### > 6 5.4 3.9 1.7 0.4 setosa 1 0
### > species_3
### > 1 0
### > 2 0
### > 3 0
### > 4 0
### > 5 0
### > 6 0
tail(irl_dumm2) # again, they're there!
### > Sepal.Length Sepal.Width Petal.Length Petal.Width Species species_1
### > 145 6.7 3.3 5.7 2.5 virginica 0
### > 146 6.7 3.0 5.2 2.3 virginica 0
### > 147 6.3 2.5 5.0 1.9 virginica 0
### > 148 6.5 3.0 5.2 2.0 virginica 0
### > 149 6.2 3.4 5.4 2.3 virginica 0
### > 150 5.9 3.0 5.1 1.8 virginica 0
### > species_2 species_3
### > 145 0 1
### > 146 0 1
### > 147 0 1
### > 148 0 1
### > 149 0 1
### > 150 0 1
Functions for adding value labels (e.g., add_val_labs, add_quant_labs
and add_m1_lab) will do partial matching if the partial argument is set
to TRUE. Let’s use labelr’s make_likert_data() function to generate some
fake Likert scale-style survey data to demonstrate this more fully.
set.seed(272) # for reproducibility
dflik <- make_likert_data(scale = 1:7) # another labelr function
head(dflik)
### > id x1 x2 x3 x4 x5 y1 y2 y3 y4 y5
### > U-1 1 5 7 2 2 2 7 1 1 4 2
### > O-2 2 6 2 7 6 2 3 5 4 1 4
### > H-3 3 7 7 5 5 6 6 4 1 5 7
### > Z-4 4 4 5 5 4 5 6 3 7 3 4
### > C-5 5 3 3 3 1 6 2 7 6 3 5
### > P-6 6 7 3 5 3 7 5 7 1 6 2
We’ll put the values we wish to label and the labels we wish to use in
stand-alone vectors, which we will supply to add_val_labs in a moment.
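The definitions of these two vectors are not shown here. The sketch below is a reconstruction, not the original code: the object names come from the add_val_labs() call that follows, and the values and labels are inferred from what get_val_labs() reports for “x1” further down.

```r
# Assumed definitions, inferred from the value labels displayed later on
vals2label <- 1:7
labs2use <- c("VSD", "SD", "D", "N", "A", "SA", "VSA")

# Each value needs exactly one label, in matching order
stopifnot(length(vals2label) == length(labs2use))
```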
Now, let’s associate/apply the value labels to ALL vars with “x” in their name and also to var “y3.” Note: partial = TRUE.
dflik <- add_val_labs(
data = dflik, vars = c("x", "y3"), ### note the vars args
vals = vals2label,
labs = labs2use,
partial = TRUE # applying to all cols with "x" or "y3" substring in names
)
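To see which columns a substring match on “x” and “y3” would pick up, here is a small Base R sketch. It illustrates the idea with fixed-string grepl() calls. Whether labelr matches in exactly this way is an assumption, and col_names, patterns, hits, and matched are illustrative names.

```r
# Column names as they appear in head(dflik)
col_names <- c("id", paste0("x", 1:5), paste0("y", 1:5))
patterns <- c("x", "y3")

# One logical column per pattern: does the name contain that substring?
hits <- sapply(patterns, function(p) grepl(p, col_names, fixed = TRUE))

# Keep any name that matched at least one pattern
matched <- col_names[rowSums(hits) > 0]
matched
```

Under this logic, “id”, “y1”, “y2”, “y4”, and “y5” are left alone because none of them contains either substring.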
Let’s compare dflik with value labels present but “off” to labels “on.”
First, present but “off.”
head(dflik)
### > id x1 x2 x3 x4 x5 y1 y2 y3 y4 y5
### > U-1 1 5 7 2 2 2 7 1 1 4 2
### > O-2 2 6 2 7 6 2 3 5 4 1 4
### > H-3 3 7 7 5 5 6 6 4 1 5 7
### > Z-4 4 4 5 5 4 5 6 3 7 3 4
### > C-5 5 3 3 3 1 6 2 7 6 3 5
### > P-6 6 7 3 5 3 7 5 7 1 6 2
Now, let’s “turn on” (use) these value labels.
lik1 <- uvl(dflik) # assign to new object, since we can't "undo"
head(lik1) # we could have skipped previous call by using labelr::headl(dflik)
### > id x1 x2 x3 x4 x5 y1 y2 y3 y4 y5
### > U-1 1 A VSA SD SD SD 7 1 VSD 4 2
### > O-2 2 SA SD VSA SA SD 3 5 N 1 4
### > H-3 3 VSA VSA A A SA 6 4 VSD 5 7
### > Z-4 4 N A A N A 6 3 VSA 3 4
### > C-5 5 D D D VSD SA 2 7 SA 3 5
### > P-6 6 VSA D A D VSA 5 7 VSD 6 2
Yea, verily: All variables with “x” in their name (and “y3”) got the labels!
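Conceptually, “turning on” value labels amounts to a value-to-label lookup. The Base R sketch below reproduces the first six “x1” labels shown above. It is an illustration of the mapping only, not how uvl() is implemented, and the two lookup vectors are assumed from the labels displayed in this section.

```r
# Assumed value-to-label mapping (values 1 through 7)
vals2label <- 1:7
labs2use <- c("VSD", "SD", "D", "N", "A", "SA", "VSA")

# First six x1 values, copied from head(dflik)
x1 <- c(5, 6, 7, 4, 3, 7)

# Look up each value's position, then pull the label at that position
x1_labeled <- labs2use[match(x1, vals2label)]
x1_labeled
```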
Suppose we want to drop these value labels for a select few, but not
all, of these variables. drop_val_labs can get the job done.
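The call that creates dfdrop is not shown here. What follows is a plausible reconstruction, assuming drop_val_labs() accepts data, vars, and partial arguments in the same way as the add_val_labs() call above. The setup lines simply rebuild the labeled dflik so that the sketch runs on its own.

```r
library(labelr)

# Rebuild the labeled Likert data used in this section
set.seed(272)
dflik <- make_likert_data(scale = 1:7)
dflik <- add_val_labs(
  data = dflik, vars = c("x", "y3"),
  vals = 1:7,
  labs = c("VSD", "SD", "D", "N", "A", "SA", "VSA"),
  partial = TRUE
)

# Assumed call: drop value labels from exactly "x2" and "y3" (no partial matching)
dfdrop <- drop_val_labs(data = dflik, vars = c("x2", "y3"), partial = FALSE)
```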
Most of our previously labeled columns remain so; but not “x2” and “y3.”
get_val_labs(dfdrop, c("x2", "y3"))
### > Warning in get_val_labs(dfdrop, c("x2", "y3")):
### >
### > No val.labs found.
### > [1] var vals labs
### > <0 rows> (or 0-length row.names)
Compare this to the value labels for variable “x1” (we did not drop value labels from this one).
get_val_labs(dfdrop, "x1")
### > var vals labs
### > 1 x1 1 VSD
### > 2 x1 2 SD
### > 3 x1 3 D
### > 4 x1 4 N
### > 5 x1 5 A
### > 6 x1 6 SA
### > 7 x1 7 VSA
### > 8 x1 NA NA
Just like we did with add_val_labs(), we also can use a single command
to drop value labels from all variables with “x” in their variable
names.
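The call that creates dfxgone is likewise not shown. A plausible reconstruction, assuming drop_val_labs() supports the same partial = TRUE substring matching as add_val_labs(), might look like this. The setup lines rebuild the labeled dflik so that the sketch is self-contained.

```r
library(labelr)

# Rebuild the labeled Likert data used in this section
set.seed(272)
dflik <- make_likert_data(scale = 1:7)
dflik <- add_val_labs(
  data = dflik, vars = c("x", "y3"),
  vals = 1:7,
  labs = c("VSD", "SD", "D", "N", "A", "SA", "VSA"),
  partial = TRUE
)

# Assumed call: drop value labels from every column with "x" in its name
dfxgone <- drop_val_labs(data = dflik, vars = "x", partial = TRUE)
```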
“y3” still has value labels, but now all “x” var value labels are gone.
get_val_labs(dfxgone)
### > var vals labs
### > 1 y3 1 VSD
### > 2 y3 2 SD
### > 3 y3 3 D
### > 4 y3 4 N
### > 5 y3 5 A
### > 6 y3 6 SA
### > 7 y3 7 VSA
### > 8 y3 NA NA
tabl()
Finally, let’s get to know labelr’s tabl() function, which supports
count or proportion tabulations with labels turned “on” or “off” and
offers some other functionalities.
set.seed(4847) # for reproducibility
df <- make_demo_data(n = 1000) # make a fictional n = 1000 data set
df <- add_val1(df, # data.frame
var = raceth, # var to label, unquoted since this is add_val1()
vals = c(1:7), # label values 1 through 7, inclusive
labs = c(
"White", "Black", "Hispanic", # ordered labels for sequential vals 1-7
"Asian", "AIAN", "Multi", "Other"
)
)
df <- add_val1(
data = df,
var = gender,
vals = c(0, 1, 2, 3, 4), # the values to be labeled
labs = c("M", "F", "TR", "NB", "Diff-Term"), # labs order should reflect vals order
max.unique.vals = 10
)
# label values of var "x1" according to quantile ranges
df <- add_quant1(
data = df,
var = x1, # apply quantile range value labels to this var
qtiles = 3 # first, second, and third tertiles
)
# apply many-vals-get-one-label labels to "edu" (note vals 3-5 all get same lab)
df <- add_m1_lab(df, "edu", vals = c(3:5), lab = "Some College+")
df <- add_m1_lab(df, "edu", vals = 1, lab = "Not HS Grad")
df <- add_m1_lab(df, "edu", vals = 2, lab = "HSG, No College")
# show value labels
get_val_labs(df)
### > var vals labs
### > 1 gender 0 M
### > 2 gender 1 F
### > 3 gender 2 TR
### > 4 gender 3 NB
### > 5 gender 4 Diff-Term
### > 6 gender NA NA
### > 7 raceth 1 White
### > 8 raceth 2 Black
### > 9 raceth 3 Hispanic
### > 10 raceth 4 Asian
### > 11 raceth 5 AIAN
### > 12 raceth 6 Multi
### > 13 raceth 7 Other
### > 14 raceth NA NA
### > 15 edu 1 Not HS Grad
### > 16 edu 2 HSG, No College
### > 17 edu 3 Some College+
### > 18 edu 4 Some College+
### > 19 edu 5 Some College+
### > 20 edu NA NA
### > 21 x1 92.63 q033
### > 22 x1 108.83 q067
### > 23 x1 157.06 q100
### > 24 x1 NA NA
With tabl(), tables can be generated…
…in terms of values
tabl(df, vars = "gender", labs.on = FALSE)
### > gender n
### > 1 1 460
### > 2 0 431
### > 3 3 56
### > 4 2 40
### > 5 4 13
…or in terms of labels
tabl(df, vars = "gender", labs.on = TRUE) # labs.on = TRUE is the default
### > gender n
### > 1 F 460
### > 2 M 431
### > 3 NB 56
### > 4 TR 40
### > 5 Diff-Term 13
…in proportions
tabl(df, vars = c("gender", "edu"), prop.digits = 3)
### > gender edu n
### > 1 F Some College+ 0.307
### > 2 M Some College+ 0.296
### > 3 F HSG, No College 0.139
### > 4 M HSG, No College 0.124
### > 5 NB Some College+ 0.032
### > 6 TR Some College+ 0.031
### > 7 NB HSG, No College 0.024
### > 8 F Not HS Grad 0.014
### > 9 Diff-Term Some College+ 0.011
### > 10 M Not HS Grad 0.011
### > 11 TR HSG, No College 0.007
### > 12 Diff-Term HSG, No College 0.002
### > 13 TR Not HS Grad 0.002
### > 14 Diff-Term Not HS Grad 0.000
### > 15 NB Not HS Grad 0.000
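For comparison, a long-format proportions table of this general shape can be approximated in Base R. This sketch uses mtcars rather than our labeled df, and it is only a rough analogue of the output above, not a description of how tabl() computes it. The names counts and prop are illustrative.

```r
# Count every am-by-gear combination in long format
counts <- as.data.frame(
  table(am = mtcars$am, gear = mtcars$gear),
  responseName = "n"
)

# Convert counts to proportions of all rows, rounded to 3 digits
counts$prop <- round(counts$n / sum(counts$n), 3)

# Sort from most to least common combination
counts[order(-counts$n), ]
```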
…cross-tab style
head(tabl(df, vars = c("raceth", "edu"), wide.col = "gender"), 20)
### > raceth edu F M NB TR Diff-Term
### > 1 Multi Some College+ 55 49 1 5 1
### > 2 AIAN Some College+ 53 44 3 2 2
### > 3 White Some College+ 40 50 3 8 1
### > 4 Hispanic Some College+ 49 35 2 2 2
### > 5 Black Some College+ 35 44 9 7 1
### > 6 Asian Some College+ 41 42 7 2 1
### > 7 Other Some College+ 34 32 7 5 3
### > 8 Hispanic HSG, No College 19 26 3 0 1
### > 9 Multi HSG, No College 26 17 2 0 0
### > 10 White HSG, No College 23 22 3 2 0
### > 11 AIAN HSG, No College 20 18 3 0 0
### > 12 Other HSG, No College 20 15 4 3 0
### > 13 Asian HSG, No College 17 11 4 0 1
### > 14 Black HSG, No College 14 15 5 2 0
### > 15 Asian Not HS Grad 5 1 0 0 0
### > 16 Other Not HS Grad 2 4 0 1 0
### > 17 Black Not HS Grad 3 0 0 0 0
### > 18 Multi Not HS Grad 1 3 0 0 0
### > 19 AIAN Not HS Grad 1 2 0 1 0
### > 20 Hispanic Not HS Grad 1 1 0 0 0
…with non-value-labeled data.frames
tabl(iris, "Species") # explicit vars arg with one-var ("Species")
### > Species n
### > 1 setosa 50
### > 2 versicolor 50
### > 3 virginica 50
# many-valued numeric vars automatically converted to quantile categories
tabl(mtcars, c("am", "gear", "cyl", "disp", "mpg"),
qtiles = 4, zero.rm = TRUE
)
### > am gear cyl disp mpg n
### > 1 0 3 8 q100 q025 5
### > 2 1 4 4 q025 q100 4
### > 3 0 3 8 q075 q050 3
### > 4 0 3 8 q075 q025 2
### > 5 0 3 8 q100 q050 2
### > 6 0 4 6 q050 q050 2
### > 7 1 4 6 q050 q075 2
### > 8 1 5 4 q025 q100 2
### > 9 0 3 4 q025 q075 1
### > 10 0 3 6 q075 q050 1
### > 11 0 3 6 q075 q075 1
### > 12 0 4 4 q050 q075 1
### > 13 0 4 4 q050 q100 1
### > 14 1 4 4 q025 q075 1
### > 15 1 4 4 q050 q075 1
### > 16 1 5 6 q050 q075 1
### > 17 1 5 8 q075 q025 1
### > 18 1 5 8 q100 q050 1
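The quantile categories above can be approximated in Base R with quantile() and cut(). This is a sketch of the general idea. The cutpoint and boundary rules that tabl() applies may differ, and the q025 to q100 labels are simply copied from the output above.

```r
# Quartile cutpoints for mpg (0%, 25%, 50%, 75%, 100%)
breaks <- quantile(mtcars$mpg, probs = seq(0, 1, by = 0.25))

# Bin each mpg value into its quartile, keeping the minimum in the first bin
mpg_q <- cut(
  mtcars$mpg,
  breaks = breaks,
  include.lowest = TRUE,
  labels = c("q025", "q050", "q075", "q100")
)

table(mpg_q)
```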
This is the suitably abrupt ending to our choppy, ad hoc overview of some additional labelr capabilities and special topics. Thanks for reading.