Title: | Functions to Make Surveys Processing Easier |
---|---|
Description: | Set of functions to make the processing and analysis of surveys easier : interactive shiny apps and addins for data recoding, contingency tables, dataset metadata handling, and several convenience functions. |
Authors: | Julien Barnier [aut, cre], François Briatte [aut], Joseph Larmarange [aut] |
Maintainer: | Julien Barnier <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.7.8.9000 |
Built: | 2024-10-23 04:45:28 UTC |
Source: | https://github.com/juba/questionr |
This function modifies a factor by turning NA into an extra level (so that NA values are counted in tables, for instance). This version of addNA extends the function of the same name provided in base R by allowing a string name to be specified for the extra level (see examples).
addNAstr(x, value = "NA", ...)
x |
a vector of data, usually taking a small number of distinct values. |
value |
string to use for the extra level name. If NULL, the extra level is created as NA, and the result is the same as the one of the |
... |
arguments passed to |
an object of class "factor", original missing values being coded as an extra level named NA if value is NULL, or named with the string given in value otherwise ("NA" by default).
Adapted from James (https://stackoverflow.com/a/5817181) by Joseph Larmarange <[email protected]>
addNA (base).
f <- as.factor(c("a","b",NA,"a","b"))
f
addNAstr(f)
addNAstr(f, value="missing")
addNAstr(f, value=NULL)
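The behaviour described above can be approximated in base R. The following is only a minimal sketch under stated assumptions: `add_na_level` and its `label` argument are hypothetical names chosen for illustration, and this is not the package's own implementation.

```r
# Hypothetical helper (not part of questionr): turn NA into a named level.
add_na_level <- function(x, label = "NA") {
  x <- as.factor(x)
  if (is.null(label)) {
    # Fall back to base R: the extra level is a real NA
    return(addNA(x))
  }
  lev <- levels(x)
  x <- as.character(x)
  x[is.na(x)] <- label
  # unique() guards against a label that is already an existing level
  factor(x, levels = unique(c(lev, label)))
}

f <- as.factor(c("a", "b", NA, "a", "b"))
g <- add_na_level(f, "missing")
table(g)
```

Because the missing values become an ordinary level, `table()` counts them without needing `useNA`.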
Some fictitious results from a fertility survey.
a data frame containing one record for each child of the surveyed women in the fertility survey.
Return the raw, standardized or Pearson's residuals (the default) of a chi-squared test on a two-way frequency table.
chisq.residuals(tab, digits = 2, std = FALSE, raw = FALSE)
tab |
frequency table |
digits |
number of digits to display |
std |
if |
raw |
if |
This function is just a wrapper around the chisq.test base R function. See this function's help page for details on the computation.
## Sample table
data(Titanic)
tab <- apply(Titanic, c(1,4), sum)
## Pearson residuals
chisq.residuals(tab)
## Standardized residuals
chisq.residuals(tab, std = TRUE)
## Raw residuals
chisq.residuals(tab, raw = TRUE)
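Since the function is described as a wrapper around chisq.test, the three kinds of residuals can also be read directly from the base R test object. This is a sketch of that correspondence, not the package's code; the variable names are illustrative.

```r
data(Titanic)
tab <- apply(Titanic, c(1, 4), sum)

test <- chisq.test(tab)

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson <- test$residuals
# Standardized residuals
standardized <- test$stdres
# Raw residuals: observed minus expected counts
raw <- test$observed - test$expected

round(pearson, 2)
```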
This function transforms its argument to HTML with knitr::kable and then copies it to the clipboard or to a file for later use in an external application.
clipcopy(obj, ...)

## Default S3 method:
clipcopy(
  obj,
  append = FALSE,
  file = FALSE,
  filename = "temp.html",
  clipboard.size = 4096,
  ...
)

## S3 method for class 'proptab'
clipcopy(obj, percent = NULL, digits = NULL, justify = "right", ...)
obj |
object to be copied |
... |
arguments passed to |
append |
if TRUE, append to the file instead of replacing it |
file |
if TRUE, export to a file instead of the clipboard |
filename |
name of the file to export to |
clipboard.size |
under Windows, size of the clipboard in kB |
percent |
whether to add a percent sign in each cell |
digits |
number of digits to display |
justify |
justification |
Under Linux, this function requires that xclip is installed on the system to copy to the clipboard.
NULL
data(iris)
tab <- table(cut(iris$Sepal.Length, 8), cut(iris$Sepal.Width, 4))
## Not run:
copie(tab)
## End(Not run)
ptab <- rprop(tab, percent = TRUE)
## Not run:
clipcopy(ptab)
## End(Not run)
Return the column percentages of a two-way frequency table with formatting and printing options.
cprop(tab, ...)

## S3 method for class 'table'
cprop(
  tab,
  digits = 1,
  total = TRUE,
  percent = FALSE,
  drop = TRUE,
  n = FALSE,
  ...
)

## S3 method for class 'data.frame'
cprop(
  tab,
  digits = 1,
  total = TRUE,
  percent = FALSE,
  drop = TRUE,
  n = FALSE,
  ...
)

## S3 method for class 'matrix'
cprop(
  tab,
  digits = 1,
  total = TRUE,
  percent = FALSE,
  drop = TRUE,
  n = FALSE,
  ...
)

## S3 method for class 'tabyl'
cprop(tab, digits = 1, total = TRUE, percent = FALSE, n = FALSE, ...)
tab |
frequency table |
... |
parameters passed to other methods. |
digits |
number of digits to display |
total |
if |
percent |
if |
drop |
if |
n |
if |
The result is an object of class table and proptab.
rprop, prop, table, prop.table
## Sample table
data(Titanic)
tab <- apply(Titanic, c(4,1), sum)
## Column percentages
cprop(tab)
## Column percentages with custom display
cprop(tab, digits=2, percent=TRUE, total=FALSE)
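For comparison, the underlying column percentages (without the totals, formatting and printing options) can be obtained with base R's prop.table. This is a minimal sketch, and `col_pct` is an illustrative name.

```r
data(Titanic)
tab <- apply(Titanic, c(4, 1), sum)

# margin = 2 divides each cell by its column total
col_pct <- prop.table(tab, margin = 2) * 100
round(col_pct, 1)
```

Each column of `col_pct` sums to 100, which is what distinguishes column percentages from row percentages (`margin = 1`) or overall percentages (no margin).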
This function computes Cramer's V for a two-way frequency table
cramer.v(tab)
tab |
table on which to compute the statistic |
data(Titanic)
tab <- apply(Titanic, c(4,1), sum)
print(tab)
cramer.v(tab)
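Cramér's V is usually defined as sqrt(chi2 / (n * (min(rows, columns) - 1))). The sketch below computes it from base R's chisq.test under that standard definition; `cramers_v` is a hypothetical helper, and the choice of `correct = FALSE` (no continuity correction) is an assumption rather than something stated in this manual.

```r
# Hypothetical helper: Cramér's V from the chi-squared statistic.
cramers_v <- function(tab) {
  # Assumption: no continuity correction; warnings about small
  # expected counts are silenced for the sake of the example
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE))$statistic
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab))
  unname(sqrt(chi2 / (n * (k - 1))))
}

data(Titanic)
tab <- apply(Titanic, c(4, 1), sum)
cramers_v(tab)

# A perfectly associated 2x2 table gives V = 1
cramers_v(diag(c(10, 10)))
```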
This function generates a two-way frequency table from a multiple-choice question and a factor. The question's answers must be stored in a series of binary variables.
cross.multi.table(
  df,
  crossvar,
  weights = NULL,
  digits = 1,
  freq = FALSE,
  tfreq = "col",
  n = FALSE,
  na.rm = TRUE,
  ...
)
df |
data frame with the binary variables |
crossvar |
factor to cross the multiple choices question with |
weights |
optional weighting vector |
digits |
number of digits to keep in the output |
freq |
display percentages |
tfreq |
type of percentages to compute ("row" or "col") |
n |
if |
na.rm |
Remove any NA values in |
... |
arguments passed to |
See the multi.table help page for details on handling of the multiple-choice question and corresponding binary variables.
If freq is set to TRUE, the resulting table gives the column percentages based on the contingency table of crossvar in the respondent population.
Object of class table.
multi.table, multi.split, table
## Sample data frame
set.seed(1337)
sex <- sample(c("Man","Woman"),100,replace=TRUE)
jazz <- sample(c(0,1),100,replace=TRUE)
rock <- sample(c(TRUE, FALSE),100,replace=TRUE)
electronic <- sample(c("Y","N"),100,replace=TRUE)
weights <- runif(100)*2
df <- data.frame(sex,jazz,rock,electronic,weights)
## Two-way frequency table on 'music' variables by sex
cross.multi.table(df[,c("jazz", "rock","electronic")], df$sex, true.codes=list("Y"))
## Column percentages based on respondents
cross.multi.table(df[,c("jazz", "rock","electronic")], df$sex, true.codes=list("Y"), freq=TRUE)
## Row percentages based on respondents
cross.multi.table(df[,c("jazz", "rock","electronic")], df$sex, true.codes=list("Y"), freq=TRUE, tfreq="row", n=TRUE)
This function describes the variables of a vector or a dataset that might include labels imported with the haven package.
describe(x, ...)

## S3 method for class 'factor'
describe(x, n = 10, show.length = TRUE, freq.n.max = 10, ...)

## S3 method for class 'numeric'
describe(x, n = 10, show.length = TRUE, freq.n.max = 10, ...)

## S3 method for class 'character'
describe(x, n = 10, show.length = TRUE, freq.n.max = 10, ...)

## Default S3 method:
describe(x, n = 10, show.length = TRUE, freq.n.max = 10, ...)

## S3 method for class 'haven_labelled'
describe(x, n = 10, show.length = TRUE, freq.n.max = 10, ...)

## S3 method for class 'data.frame'
describe(x, ..., n = 10, freq.n.max = 0)

## S3 method for class 'description'
print(x, ...)
x |
object to describe |
... |
further arguments passed to or from other methods, see details |
n |
number of first values to display |
show.length |
display length of the vector? |
freq.n.max |
display a frequency table if the number of unique values is less than this value, 0 to hide |
When describing a data.frame, you can provide variable names as character strings. Using the "*" or "|" wildcards in a variable name will search for it using a regex match. The search will also take into account variable labels, if any. See examples.
an object of class description.
Joseph Larmarange <[email protected]>
data(hdv2003)
describe(hdv2003$sexe)
describe(hdv2003$age)
describe(hdv2003)
describe(hdv2003, "cuisine", "heures.tv")
describe(hdv2003, "trav*")
describe(hdv2003, "trav|lecture")
describe(hdv2003, "trav", "lecture")
data(fertility)
describe(women$residency)
describe(women)
describe(women, "id")
The native duplicated function determines which elements of a vector or data frame are duplicates of elements already observed in the vector or the data frame provided. Therefore, only the second occurrence (or third or nth) of an element is considered as a duplicate. duplicated2 is similar but will also mark the first occurrence as a duplicate (see examples).
duplicated2(x)
x |
a vector, a data frame or a matrix |
A logical vector indicating which elements are duplicated in x.
https://forums.cirad.fr/logiciel-R/viewtopic.php?p=2968
df <- data.frame(x = c("a", "b", "c", "b", "d", "c"), y = c(1, 2, 3, 2, 4, 3))
df
duplicated(df)
duplicated2(df)
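One common way to get this "mark every occurrence" behaviour in base R is to combine a forward and a backward scan with duplicated. The sketch below illustrates that idea; `all_duplicates` is a hypothetical name and this is not necessarily how duplicated2 is written.

```r
# Hypothetical helper: flag every element that occurs more than once,
# including its first occurrence.
all_duplicates <- function(x) {
  duplicated(x) | duplicated(x, fromLast = TRUE)
}

df <- data.frame(x = c("a", "b", "c", "b", "d", "c"),
                 y = c(1, 2, 3, 2, 4, 3))

duplicated(df)       # FALSE FALSE FALSE  TRUE FALSE  TRUE
all_duplicates(df)   # FALSE  TRUE  TRUE  TRUE FALSE  TRUE
```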
Some fictitious results from a fertility survey.
a data frame containing one record for each child of the surveyed women in the fecondite survey.
Escape regex special characters. Code directly taken from Hmisc::escapeRegex.
escape_regex(s)
s |
string to escape regex special chars from |
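Escaping of this kind is typically done by prefixing each regex metacharacter with a backslash. The sketch below shows one way to do that with gsub; `escape_specials` is a hypothetical name, and the exact set of characters escaped here is an assumption for illustration.

```r
# Hypothetical helper: put a backslash in front of regex metacharacters.
escape_specials <- function(s) {
  gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", s)
}

pattern <- escape_specials("trav.satisf")
pattern

# The escaped dot now only matches a literal dot
grepl(pattern, c("trav.satisf", "travXsatisf"))
```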
Some fictitious results from a fertility survey, with French labels.
3 data frames with labelled data (as if the data had been imported from SPSS with haven):
menages contains some information from the households selected for the survey;
femmes contains the questionnaire administered to all 15-49 years old women living in the selected households;
enfants contains one record for each child of the surveyed women.
Data can be linked using the variables id_menage and id_femme.
See fertility for an English version of this dataset.
data(fecondite)
describe(menages)
describe(femmes)
describe(enfants)
Some fictitious results from a fertility survey.
a data frame containing the questionnaire administered to all 15-49 years old women living in the selected households for the fecondite survey.
Some fictitious results from a fertility survey, with English labels.
3 data frames with labelled data (as if the data had been imported from SPSS with haven):
households contains some information from the households selected for the survey;
women contains the questionnaire administered to all 15-49 years old women living in the selected households;
children contains one record for each child of the surveyed women.
Data can be linked using the variables id_household and id_woman.
See fecondite for a French version of this dataset.
data(fertility)
describe(households)
describe(women)
describe(children)
Return first non-null of two values
x %||% y
x |
first object |
y |
second object |
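An operator of this kind is conventionally a one-line null check. The definition below is a sketch of that convention, not a copy of the package source.

```r
# Return x unless it is NULL, in which case return y
`%||%` <- function(x, y) if (is.null(x)) y else x

NULL %||% "default"      # "default"
"value" %||% "default"   # "value"
```

Note that only NULL triggers the fallback: NA, FALSE and empty strings are returned unchanged.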
Format an object of class proptab for printing depending on its attributes.
## S3 method for class 'proptab'
format(x, digits = NULL, percent = NULL, justify = "right", ...)
x |
object of class proptab |
digits |
number of digits to display |
percent |
if not NULL, add a percent sign after each value |
justify |
justification of character vectors. Passed to |
... |
other arguments to pass to |
This function is designed for internal use only.
Generate and format frequency tables from a variable or a table, with percentages and formatting options.
freq(
  x,
  digits = 1,
  cum = FALSE,
  total = FALSE,
  exclude = NULL,
  sort = "",
  valid = !(NA %in% exclude),
  levels = c("prefixed", "labels", "values"),
  na.last = TRUE
)
x |
either a vector to be tabulated, or a table object |
digits |
number of digits to keep for the percentages |
cum |
if TRUE, display cumulative percentages |
total |
if TRUE, add a final row with totals |
exclude |
vector of values to exclude from the tabulation (if |
sort |
if specified, sorts the table by increasing ("inc") or decreasing ("dec") frequencies |
valid |
if TRUE, display valid percentages |
levels |
the desired levels for the factor in case of labelled vector (labelled package must be installed): "labels" for value labels, "values" for values or "prefixed" for labels prefixed with values |
na.last |
if TRUE, NA values are always the last table row |
The result is an object of class data.frame.
# factor
data(hdv2003)
freq(hdv2003$qualif)
freq(hdv2003$qualif, cum = TRUE, total = TRUE)
freq(hdv2003$qualif, cum = TRUE, total = TRUE, sort ="dec")
# labelled data
data(fecondite)
freq(femmes$region)
freq(femmes$region, levels = "l")
freq(femmes$region, levels = "v")
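To make the difference between plain and valid percentages concrete, here is a small base R sketch that builds a comparable table by hand. The column names `n`, `pct` and `valid_pct` are illustrative assumptions and may not match the names used in the data frame returned by freq.

```r
x <- factor(c("a", "b", NA, "a", "b", "a"))

# Count every value, including NA
counts <- table(x, useNA = "ifany")
n <- as.vector(counts)
is_na_row <- is.na(names(counts))

tab <- data.frame(
  n = n,
  # Percentages over all observations
  pct = round(100 * n / sum(n), 1),
  # Valid percentages: over non-missing observations only
  valid_pct = round(100 * n / sum(!is.na(x)), 1),
  row.names = ifelse(is_na_row, "NA", names(counts))
)
tab$valid_pct[is_na_row] <- NA
tab
```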
Generate a frequency table of missing values as raw counts and percentages.
freq.na(data, ...)
data |
either a vector or a data frame object |
... |
if |
The result is an object of class data.frame.
data(hdv2003)
## Examine a single vector.
freq.na(hdv2003$qualif)
## Examine a data frame.
freq.na(hdv2003)
## Examine several variables.
freq.na(hdv2003, "nivetud", "trav.satisf")
## To see only the variables with the most missing values
head(freq.na(hdv2003))
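The same kind of summary can be sketched in base R by counting NA values per column. The toy data frame and the column names `missing` and `pct` are assumptions made for this illustration.

```r
df <- data.frame(
  x = c(1, NA, 3, NA),
  y = c("a", "b", NA, "d"),
  z = 1:4
)

# Number of missing values per variable
missing <- colSums(is.na(df))

na_tab <- data.frame(
  missing = missing,
  pct = round(100 * missing / nrow(df))
)

# Variables with the most missing values first
na_tab <- na_tab[order(-na_tab$missing), ]
na_tab
```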
A function to facilitate ggplot2 graphs using a survey object. It will initiate a ggplot and map survey weights to the corresponding aesthetic.
ggsurvey(design = NULL, mapping = NULL, ...)
design |
A survey design object, usually created with
|
mapping |
Default list of aesthetic mappings to use for plot,
to be created with |
... |
Other arguments passed on to methods. Not currently used. |
Graphs will be correct as long as only weights are required to compute the graph. However, statistics or geometries requiring correct variance computation (like ggplot2::geom_smooth()) will be statistically incorrect.
if (require(survey) && require(ggplot2)) {
  data(api)
  dstrat <- svydesign(
    id = ~1, strata = ~stype,
    weights = ~pw, data = apistrat,
    fpc = ~fpc
  )
  ggsurvey(dstrat) +
    aes(x = cnum, y = dnum) +
    geom_count()

  d <- as.data.frame(Titanic)
  dw <- svydesign(ids = ~1, weights = ~Freq, data = d)
  ggsurvey(dw) +
    aes(x = Class, fill = Survived) +
    geom_bar(position = "fill")
}
This data extract is taken from Hadley Wickham's productplots package. The original description follows, with minor edits.
The data is a small sample of variables related to happiness from the General Social Survey (GSS). The GSS is a yearly cross-sectional survey of Americans, run from 1972. We combine data for 25 years to yield 51,020 observations, and of the over 5,000 variables, we select nine related to happiness:
A data frame with 51020 rows and 10 variables
age. age in years: 18–89.
degree. highest education: lt high school, high school, junior college, bachelor, graduate.
finrela. relative financial status: far above, above average, average, below average, far below.
happy. happiness: very happy, pretty happy, not too happy.
health. health: excellent, good, fair, poor.
marital. marital status: married, never married, divorced, widowed, separated.
sex. sex: female, male.
wtsall. probability weight. 0.43–6.43.
Smith, Tom W., Peter V. Marsden, Michael Hout, Jibum Kim. General Social Surveys, 1972-2006. [machine-readable data file]. Principal Investigator, Tom W. Smith; Co-Principal Investigators, Peter V. Marsden and Michael Hout, NORC ed. Chicago: National Opinion Research Center, producer, 2005; Storrs, CT: The Roper Center for Public Opinion Research, University of Connecticut, distributor. 1 data file (57,061 logical records) and 1 codebook (3,422 pp).
Sample from 2000 people and 20 variables taken from the Histoire de Vie survey, produced in France in 2003 by INSEE.
A data frame with 2000 rows and 20 variables
https://www.insee.fr/fr/statistiques/2532244
Some fictitious results from a fertility survey.
a data frame containing some information from the households selected for the fertility survey.
This function launches a shiny app in a web browser in order to do interactive conversion of a numeric variable into a categorical one.
icut(obj = NULL, var_name = NULL)
obj |
vector to recode or data frame to operate on |
var_name |
if obj is a data frame, name of the column to be recoded, as a character string (possibly without quotes) |
The function launches a shiny app in the system web browser. The recoding code is returned in the console when the app is closed with the "Done" button.
## Not run:
data(hdv2003)
icut(hdv2003, "age")
icut(hdv2003, heures.tv) ## this also works
## End(Not run)
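Converting a numeric variable into a categorical one is usually done in base R with cut. The snippet below is only a hypothetical illustration of that kind of recoding, with made-up values and breaks; the code actually produced by the app may look different.

```r
age <- c(18, 25, 37, 52, 64, 80)

# Intervals are closed on the left: [18,30), [30,50), [50,65), [65,97]
age_rec <- cut(
  age,
  breaks = c(18, 30, 50, 65, 97),
  include.lowest = TRUE,
  right = FALSE
)

table(age_rec)
```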
This function launches a shiny app in a web browser in order to do interactive reordering of the levels of a categorical variable (character or factor).
iorder(obj = NULL, var_name = NULL)
obj |
vector to recode or data frame to operate on |
var_name |
if obj is a data frame, name of the column to be recoded, as a character string (possibly without quotes) |
The generated code converts the variable into a factor, as only factors allow for level ordering.
The function launches a shiny app in the system web browser. The reordering code is returned in the console when the app is closed with the "Done" button.
## Not run:
data(hdv2003)
iorder(hdv2003, "qualif")
## End(Not run)
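In base R, reordering levels amounts to rebuilding the factor with an explicit levels argument. The values below are invented for illustration, and the code returned by the app may differ in form.

```r
qualif <- c("Worker", "Manager", "Employee", "Worker")

# Convert to a factor with the levels in the desired order
qualif <- factor(qualif, levels = c("Worker", "Employee", "Manager"))

levels(qualif)
table(qualif)
```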
This function launches a shiny app in a web browser in order to do interactive recoding of a categorical variable (character or factor).
irec(obj = NULL, var_name = NULL)
obj |
vector to recode or data frame to operate on |
var_name |
if obj is a data frame, name of the column to be recoded, as a character string (possibly without quotes) |
The function launches a shiny app in the system web browser. The recoding code is returned in the console when the app is closed with the "Done" button.
## Not run:
data(hdv2003)
irec()
v <- sample(c("Red", "Green", "Blue"), 50, replace = TRUE)
irec(v)
irec(hdv2003, "qualif")
irec(hdv2003, sexe) ## this also works
## End(Not run)
This function is a wrapper around xtabs, automatically adding value labels for labelled vectors if the labelled package is installed.
ltabs(
  formula,
  data,
  levels = c("prefixed", "labels", "values"),
  variable_label = TRUE,
  ...
)
formula |
a formula object (see |
data |
a data frame |
levels |
the desired levels in case of labelled vector: "labels" for value labels, "values" for values or "prefixed" for labels prefixed with values |
variable_label |
display variable label if available? |
... |
additional arguments passed to |
data(fecondite)
ltabs(~radio, femmes)
ltabs(~radio+tv, femmes)
ltabs(~radio+tv, femmes, "l")
ltabs(~radio+tv, femmes, "v")
ltabs(~radio+tv+journal, femmes)
ltabs(~radio+tv, femmes, variable_label = FALSE)
Some fictitious results from a fertility survey.
a data frame containing some information from the households selected for the fecondite survey.
Split a multiple-choice variable into a series of binary variables
multi.split(var, split.char = "/", mnames = NULL)
var |
variable to split |
split.char |
character to split at |
mnames |
names to give to the produced variables. If NULL, the names are computed from the original variable name and the answers. |
This function takes as input a multiple-choice variable where choices are recorded as a string and separated with a fixed character. For example, if the question is about favourite colors, answers could be "red/blue", "red/green/yellow", etc. This function splits the variable into as many variables as the number of different choices. Each of these variables has a 1 or 0 value corresponding to the choice of this answer. They are returned as a data frame.
Returns a data frame.
v <- c("red/blue","green","red/green","blue/red")
multi.split(v)
## One-way frequency table of the result
multi.table(multi.split(v))
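The splitting logic described above can be sketched with strsplit in base R. `split_choices` is a hypothetical helper; unlike multi.split, it names the columns after the bare answers and does not handle the mnames argument.

```r
# Hypothetical helper: one 0/1 column per distinct choice.
split_choices <- function(v, split.char = "/") {
  # List of the answers given by each respondent
  answers <- strsplit(v, split.char, fixed = TRUE)
  choices <- sort(unique(unlist(answers)))
  out <- sapply(choices, function(ch) {
    as.integer(vapply(answers, function(a) ch %in% a, logical(1)))
  })
  as.data.frame(out)
}

v <- c("red/blue", "green", "red/green", "blue/red")
res <- split_choices(v)
res
```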
This function generates a frequency table from a multiple-choice question. The question's answers must be stored in a series of binary variables.
multi.table(df, true.codes = NULL, weights = NULL, digits = 1, freq = TRUE)
df |
data frame with the binary variables |
true.codes |
optional list of values considered as 'true' for the tabulation |
weights |
optional weighting vector |
digits |
number of digits to keep in the output |
freq |
add a percentage column |
The function is applied to a series of binary variables, each one corresponding to a choice of the question. For example, if the question is about movies seen among a list of movies, each binary variable would correspond to a movie of the list and be true or false depending on the respondent's answer.
By default, only '1' and 'TRUE' are considered as 'true' values for the binary variables, and counted in the frequency table. It is possible to specify other values to be counted with the true.codes argument. Note that '1' and 'TRUE' are always considered as true values even if true.codes is provided.
If freq is set to TRUE, a percentage column is added to the resulting table. This percentage is computed by dividing the number of TRUE answers for each value by the total number of (potentially weighted) observations. Thus, the sum of these percentages can be greater than 100.
Object of class table.
cross.multi.table, multi.split, table
## Sample data frame
set.seed(1337)
sex <- sample(c("Man","Woman"),100,replace=TRUE)
jazz <- sample(c(0,1),100,replace=TRUE)
rock <- sample(c(TRUE, FALSE),100,replace=TRUE)
electronic <- sample(c("Y","N"),100,replace=TRUE)
weights <- runif(100)*2
df <- data.frame(sex,jazz,rock,electronic,weights)
## Frequency table on 'music' variables
multi.table(df[,c("jazz", "rock","electronic")], true.codes=list("Y"))
## Weighted frequency table on 'music' variables
multi.table(df[,c("jazz", "rock","electronic")], true.codes=list("Y"), weights=df$weights)
## No percentages
multi.table(df[,c("jazz", "rock","electronic")], true.codes=list("Y"), freq=FALSE)
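The counting rule described above (1 and TRUE always count, plus any extra true.codes, with percentages over all observations) can be sketched as follows. `count_true` and the `pct` column name are hypothetical, and the result is a plain matrix rather than the table object returned by multi.table.

```r
# Hypothetical helper: count 'true' answers per binary variable.
count_true <- function(df, true.codes = NULL, weights = NULL) {
  # '1' and 'TRUE' always count as true; extra codes are added to them
  true_values <- c("1", "TRUE", as.character(unlist(true.codes)))
  if (is.null(weights)) weights <- rep(1, nrow(df))
  n <- sapply(df, function(col) {
    sum(weights[as.character(col) %in% true_values])
  })
  # Percentages are over all observations, so they can sum to more than 100
  cbind(n = n, pct = round(100 * n / sum(weights), 1))
}

music <- data.frame(
  jazz = c(1, 0, 1, 1),
  rock = c(TRUE, FALSE, FALSE, TRUE),
  electronic = c("Y", "N", "N", "N")
)
res <- count_true(music, true.codes = list("Y"))
res
```

Here the percentages are 75, 50 and 25, which sum to 150: a respondent who ticks several answers is counted once per answer.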
na.rm is similar to na.omit but lets you specify the list of variables to take into account.
na.rm(x, v = NULL)
x |
a data frame |
v |
a list of variables |
If v is not specified, the result of na.rm will be the same as na.omit. If a list of variables is specified through v, only observations with a missing value (NA) for one of the specified variables will be removed from x. See examples.
Joseph Larmarange <[email protected]>
df <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z = c("a", NA, "b"))
df
na.omit(df)
na.rm(df)
na.rm(df, c("x", "y"))
na.rm(df, "z")
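The rule above (drop a row only if it has an NA in one of the listed variables) maps naturally onto complete.cases in base R. This is a sketch with a hypothetical helper name, `drop_missing`, not the package's implementation.

```r
# Hypothetical helper: remove rows with NA in the variables listed in v.
drop_missing <- function(x, v = NULL) {
  # With no variables given, consider all of them (like na.omit)
  if (is.null(v)) v <- names(x)
  x[complete.cases(x[, v, drop = FALSE]), , drop = FALSE]
}

df <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z = c("a", NA, "b"))

drop_missing(df)                # keeps row 1 only
drop_missing(df, c("x", "y"))   # keeps rows 1 and 2
drop_missing(df, "z")           # keeps rows 1 and 3
```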
S3 method for odds ratio
odds.ratio(x, ...)

## S3 method for class 'glm'
odds.ratio(x, level = 0.95, ...)

## S3 method for class 'multinom'
odds.ratio(x, level = 0.95, ...)

## S3 method for class 'factor'
odds.ratio(x, fac, level = 0.95, ...)

## S3 method for class 'table'
odds.ratio(x, level = 0.95, ...)

## S3 method for class 'matrix'
odds.ratio(x, level = 0.95, ...)

## S3 method for class 'numeric'
odds.ratio(x, y, level = 0.95, ...)

## S3 method for class 'odds.ratio'
print(x, signif.stars = TRUE, ...)
x |
object from which the odds ratio will be computed |
... |
further arguments passed to or from other methods |
level |
the confidence level required |
fac |
a second factor object |
y |
a second numeric object |
signif.stars |
logical; if |
For models calculated with glm, x should have been calculated with family=binomial. p-values are the same as summary(x)$coefficients[,4]. Odds ratios could also be obtained with exp(coef(x)) and confidence intervals with exp(confint(x)).
For models calculated with multinom (nnet), p-values are calculated according to https://stats.oarc.ucla.edu/r/dae/multinomial-logistic-regression/.
For a 2x2 table, factor or matrix, odds.ratio uses fisher.test to compute the odds ratio.
Returns a data.frame of class odds.ratio with odds ratios, their confidence intervals and p-values.
If x and y are proportions, odds.ratio simply returns the value of the odds ratio, with no confidence interval.
Joseph Larmarange <[email protected]>
fisher.test in the stats package.
printCoefmat in the stats package.
data(hdv2003)
reg <- glm(cinema ~ sexe + age, data=hdv2003, family=binomial)
odds.ratio(reg)
odds.ratio(hdv2003$sport, hdv2003$cuisine)
odds.ratio(table(hdv2003$sport, hdv2003$cuisine))
M <- matrix(c(759, 360, 518, 363), ncol = 2)
odds.ratio(M)
odds.ratio(0.26, 0.42)
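The base R equivalents mentioned above can be assembled by hand. The sketch below uses the built-in mtcars data (an assumption made so the example runs without questionr) for the glm case, and the matrix M from the examples for the fisher.test case.

```r
## Logistic regression: odds ratios, confidence intervals and p-values
fit <- glm(vs ~ mpg, data = mtcars, family = binomial)

or <- exp(coef(fit))
ci <- exp(suppressMessages(confint(fit)))
p <- summary(fit)$coefficients[, 4]

cbind(OR = or, ci, p = p)

## 2x2 matrix: odds ratio and confidence interval from Fisher's exact test
M <- matrix(c(759, 360, 518, 363), ncol = 2)
ft <- fisher.test(M)
ft$estimate
ft$conf.int
```

On older R versions, confint on a glm object relies on the MASS package being installed.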
Print an object of class proptab.
## S3 method for class 'proptab'
print(x, digits = NULL, percent = NULL, justify = "right", ...)
x |
object of class proptab |
digits |
number of digits to display |
percent |
if not NULL, add a percent sign after each value |
justify |
justification of character vectors. Passed to |
... |
other arguments to pass to |
Return the percentages of a two-way frequency table with formatting and printing options.
prop(tab, ...)

prop_table(
  tab,
  digits = 1,
  total = TRUE,
  percent = FALSE,
  drop = TRUE,
  n = FALSE,
  ...
)

## S3 method for class 'data.frame'
prop(
  tab,
  digits = 1,
  total = TRUE,
  percent = FALSE,
  drop = TRUE,
  n = FALSE,
  ...
)

## S3 method for class 'matrix'
prop(
  tab,
  digits = 1,
  total = TRUE,
  percent = FALSE,
  drop = TRUE,
  n = FALSE,
  ...
)

## S3 method for class 'tabyl'
prop(tab, digits = 1, total = TRUE, percent = FALSE, n = FALSE, ...)
tab |
frequency table |
... |
parameters passed to other methods |
digits |
number of digits to display |
total |
if |
percent |
if |
drop |
if |
n |
if |
The result is an object of class table
and proptab
.
rprop
, cprop
, table
, prop.table
## Sample table
data(Titanic)
tab <- apply(Titanic, c(1,4), sum)
## Percentages
prop(tab)
## Percentages with custom display
prop(tab, digits=2, percent=TRUE, total=FALSE, n=TRUE)
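As a rough illustration of what "percentages of a two-way table" means here, the sketch below reproduces the idea with base R only. It is an assumption-laden equivalent built on `prop.table` and `addmargins`, not the code `prop` actually runs, and it ignores the formatting options.

```r
# Base R sketch (not questionr's implementation): total percentages
# of a two-way table, with row and column totals added.
data(Titanic)
tab <- as.table(apply(Titanic, c(1, 4), sum))

# Each cell as a percentage of the grand total
pct <- 100 * prop.table(tab)

# Add a "Sum" row and a "Sum" column, comparable to total = TRUE
pct_with_totals <- addmargins(pct)
round(pct_with_totals, 1)
```

The bottom-right cell of `pct_with_totals` is the grand total, which should be 100.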
This function quickly loads one or more packages, installing them quietly if necessary.
qload(..., load = TRUE, silent = TRUE)
... |
the packages to load/install. Packages are loaded with |
load |
load the packages. Set to |
silent |
keep output as silent as possible.
Defaults to |
The function probably requires R 3.0.0 or above to make use of the quiet
argument when calling install.packages
. It is not clear what the argument
previously achieved in older versions of R.
The result is a list of packages cited in the scripts.
François Briatte
qscan
, install.packages
, library
qload("questionr")
qload("questionr", silent = FALSE)
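The "load, installing if necessary" pattern described above can be sketched in base R as follows. `load_or_install` is a hypothetical helper written for this illustration; it is not part of questionr and may differ from what `qload` does internally.

```r
# Hypothetical helper (not part of questionr): attach each package,
# installing it first when it is not available.
load_or_install <- function(pkgs, quiet = TRUE) {
  vapply(pkgs, function(pkg) {
    # Install only when the package cannot be found
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg, quiet = quiet)
    }
    # Attach the package; returns TRUE on success, FALSE otherwise
    suppressPackageStartupMessages(
      require(pkg, character.only = TRUE, quietly = quiet)
    )
  }, logical(1))
}

# Both packages ship with R, so nothing is installed here
loaded <- load_or_install(c("stats", "utils"))
loaded
```

Returning a named logical vector makes it easy to spot which packages failed to load.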
This function scans one or more R scripts and tries to quick-load/install
the packages mentioned by library
or require
functions.
qscan(..., load = TRUE, detail = TRUE)
... |
the scripts to scan. Defaults to all R scripts in the current working directory. |
load |
quick-load/install the cited packages (see details).
Defaults to |
detail |
show the list of packages found in each script.
Defaults to |
The function calls the qload
function to quick-load/install the packages.
The result is a list of packages cited in the scripts.
François Briatte
## Scan the working directory.
## Not run:
qscan()
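To make the scanning step concrete, here is a minimal base R sketch that extracts package names from `library()` and `require()` calls in a script. `scan_packages` and its regular expression are assumptions made for illustration; the matching rules `qscan` really uses may differ.

```r
# Hypothetical helper (not part of questionr): list the packages
# mentioned in library() or require() calls of one script.
scan_packages <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # Capture the package name, with or without surrounding quotes
  pattern <- "(library|require)\\([[:space:]]*[\"']?([[:alnum:].]+)"
  hits <- regmatches(lines, regexec(pattern, lines))
  pkgs <- vapply(
    hits,
    function(m) if (length(m) == 3) m[3] else NA_character_,
    character(1)
  )
  unique(pkgs[!is.na(pkgs)])
}

# Write a small throwaway script and scan it
script <- tempfile(fileext = ".R")
writeLines(c("library(stats)", "require(\"utils\")", "x <- 1"), script)
found <- scan_packages(script)
found
```

This simple version only sees the first call on each line and does not skip commented-out code.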
This function transforms a quantitative variable into a qualitative one by breaking it into classes with the same frequencies.
quant.cut(var, nbclass, include.lowest = TRUE, right = FALSE, dig.lab = 5, ...)
var |
variable to transform |
nbclass |
number of classes |
include.lowest |
argument passed to the |
right |
argument passed to the |
dig.lab |
argument passed to the |
... |
arguments passed to the |
This is just a simple wrapper around the cut
and quantile
functions.
The result is a factor.
data(iris)
sepal.width3cl <- quant.cut(iris$Sepal.Width,3)
table(sepal.width3cl)
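Since the function is described as a wrapper around `cut` and `quantile`, the same idea can be sketched directly in base R. The way the break points are computed below is an assumption for illustration and may not match `quant.cut` exactly.

```r
# Base R sketch (assumed equivalent, not questionr's code):
# split a numeric variable into 3 classes of similar frequency.
data(iris)
nbclass <- 3

# Break points at the 0, 1/3, 2/3 and 1 quantiles
breaks <- quantile(
  iris$Sepal.Width,
  probs = seq(0, 1, length.out = nbclass + 1),
  na.rm = TRUE
)

# Same defaults as in the usage above
classes <- cut(
  iris$Sepal.Width,
  breaks = breaks,
  include.lowest = TRUE,
  right = FALSE,
  dig.lab = 5
)
table(classes)
```

Class sizes are only approximately equal when the variable has many tied values.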
This function recodes selected values of a quantitative or qualitative variable by matching its levels to exact or regular expression matches.
recode.na(x, ..., verbose = FALSE, regex = TRUE, as.numeric = FALSE)
x |
variable to recode. The variable is coerced to a factor if necessary. |
... |
levels to recode as missing in the variable. The values are coerced to character strings, meaning that you can pass numeric values to the function. |
verbose |
print a table of missing levels before recoding them as missing. Defaults to |
regex |
use regular expressions to match values that include the "*" or "|" wildcards. Defaults to |
as.numeric |
coerce the recoded variable to |
The result is a factor with properly encoded missing values. If the recoded variable contains only numeric values, it is converted to an object of class numeric
.
François Briatte
data(hdv2003)
## With exact string matches.
hdv2003$nivetud = recode.na(hdv2003$nivetud, "Inconnu")
## With regular expressions.
hdv2003$relig = recode.na(hdv2003$relig, "[A|a]ppartenance", "Rejet|NSP")
## Showing missing values.
hdv2003$clso = recode.na(hdv2003$clso, "Ne sait pas", verbose = TRUE)
## Test results with freq.
freq(recode.na(hdv2003$trav.satisf, "Equilibre"))
## Truncate a count variable (recommends numeric conversion).
freq(recode.na(hdv2003$freres.soeurs, 5:22))
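The underlying idea, turning the levels that match a pattern into missing values, can be sketched in base R as below. The data and the pattern are made up for illustration, and this is not how `recode.na` is implemented.

```r
# Base R sketch (illustrative data, not questionr's code):
# recode every value matching a regular expression as NA.
x <- factor(c("Yes", "No", "Don't know", "Refused", "Yes"))
missing_pattern <- "Don't know|Refused"

x_chr <- as.character(x)
x_chr[grepl(missing_pattern, x_chr)] <- NA

# Rebuilding the factor also drops the now-unused levels
x_recoded <- factor(x_chr)
table(x_recoded, useNA = "ifany")
```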
Rename a data frame column
rename.variable(df, old, new)
df |
data frame |
old |
old name |
new |
new name |
A data frame with the column named "old" renamed as "new"
data(iris)
str(iris)
iris <- rename.variable(iris, "Species", "especes")
str(iris)
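For comparison, the same renaming can be written in base R by assigning into `names()`. This is a sketch of an equivalent operation on a copy of the data, not the function's source.

```r
# Base R sketch: rename one column of a data frame copy
df <- head(iris)
names(df)[names(df) == "Species"] <- "especes"
names(df)
```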
This function removes unused levels from a factor or from the factor variables of a data frame. See examples.
rm.unused.levels(x, v = NULL)
x |
a factor or a data frame |
v |
a list of variables (optional, if |
If x
is a data frame, only factor variables of x
will be impacted.
If a list of variables is provided through v
, only the unused levels of the
specified variables will be removed.
Joseph Larmarange
df <- data.frame(v1 = c("a", "b", "a", "b"), v2 = c("x", "x", "y", "y"))
df$v1 <- factor(df$v1, c("a", "b", "c"))
df$v2 <- factor(df$v2, c("x", "y", "z"))
df
str(df)
str(rm.unused.levels(df))
str(rm.unused.levels(df, "v1"))
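Base R's `droplevels` offers comparable behaviour, which may help clarify what is being removed. The sketch below mirrors the two cases above (all factors, then a single variable); it is an assumed equivalent and not the function's implementation.

```r
# Base R sketch: drop unused factor levels with droplevels()
df <- data.frame(
  v1 = factor(c("a", "b", "a", "b"), levels = c("a", "b", "c")),
  v2 = factor(c("x", "x", "y", "y"), levels = c("x", "y", "z"))
)

# All factor columns at once
all_dropped <- droplevels(df)

# Only one selected column
only_v1 <- df
only_v1$v1 <- droplevels(only_v1$v1)

str(all_dropped)
str(only_v1)
```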
Sample from the 2012 national French census. It contains results for every French city of more than 2000 inhabitants, and a small subset of variables, both in population counts and proportions.
A data frame with 5170 rows and 60 variables
https://www.insee.fr/fr/information/2008354
Sample from the 2018 national French census. It contains results for every French city of more than 2000 inhabitants, and a small subset of variables, both in population counts and proportions.
A data frame with 5417 rows and 62 variables
https://www.insee.fr/fr/information/5369871
Return the row percentages of a two-way frequency table with formatting and printing options.
rprop(tab, ...)

## S3 method for class 'table'
rprop(tab, digits = 1, total = TRUE, percent = FALSE, drop = TRUE, n = FALSE, ...)

## S3 method for class 'data.frame'
rprop(tab, digits = 1, total = TRUE, percent = FALSE, drop = TRUE, n = FALSE, ...)

## S3 method for class 'matrix'
rprop(tab, digits = 1, total = TRUE, percent = FALSE, drop = TRUE, n = FALSE, ...)

## S3 method for class 'tabyl'
rprop(tab, digits = 1, total = TRUE, percent = FALSE, n = FALSE, ...)
tab |
frequency table |
... |
parameters passed to other methods. |
digits |
number of digits to display |
total |
if |
percent |
if |
drop |
if |
n |
if |
The result is an object of class table
and proptab
.
cprop
, prop
, table
, prop.table
## Sample table
data(Titanic)
tab <- apply(Titanic, c(1,4), sum)
## Row percentages
rprop(tab)
## Row percentages with custom display
rprop(tab, digits=2, percent=TRUE, total=FALSE)
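For readers who want to check the numbers by hand, row percentages can be obtained in base R with `prop.table` and `margin = 1`. This is an assumed equivalent for illustration, without the totals and formatting that `rprop` adds.

```r
# Base R sketch (not questionr's implementation): row percentages,
# where each row of the table sums to 100.
data(Titanic)
tab <- as.table(apply(Titanic, c(1, 4), sum))

row_pct <- 100 * prop.table(tab, margin = 1)
round(row_pct, 1)
```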
Generate a table of multiple weighted cross-tabulations (the full sample is the first column). kable(), from the knitr package, is recommended for use with R Markdown.
tabs(
  df,
  x,
  y,
  type = "percent",
  percent = FALSE,
  weight = NULL,
  normwt = FALSE,
  na.rm = TRUE,
  na.show = FALSE,
  exclude = NULL,
  digits = 1
)
df |
A data.frame that contains |
x |
variable name (found in |
y |
one (or more) variable names, e.g. tabs(my.data, x = 'q1', y = c('sex', 'job')). |
type |
'percent' (default ranges 0-100), 'proportion', or 'counts' (type of table returned). |
percent |
if |
weight |
variable name for weight (found in |
normwt |
if TRUE, normalize weights so that the total weighted count is the same as the unweighted one |
na.rm |
if TRUE, remove NA values before computation |
na.show |
if TRUE, show NA count in table output |
exclude |
values to remove from x and y. To exclude NA, use na.rm argument. |
digits |
Number of digits to display; see ?format.proptab for formatting details. |
tabs calls wtd.table on 'x
' and, as applicable, each variable named by 'y
'.
Pete Mohanty
data(hdv2003)
tabs(hdv2003, x = "relig", y = c("qualif", "trav.imp"), weight = "poids")
result <- tabs(hdv2003, x = "relig", y = c("qualif", "trav.imp"), type = "counts")
format(result, digits = 3)
# library(knitr)
# xt <- tabs(hdv2003, x = "relig", y = c("qualif", "trav.imp"), weight = "poids")
# kable(format(xt)) # to use with RMarkdown...
Some fictitious results from a fertility survey.
a data frame containing the questionnaire administered to all women aged 15-49 living in the selected households for the fertility survey.
Compute the weighted mean or weighted variance of a vector. Exact copies of Hmisc functions.
wtd.mean(x, weights = NULL, na.rm = TRUE)
x |
Numeric data vector |
weights |
Numeric weights vector. Must be the same length as |
na.rm |
if |
If weights
is NULL
, then a uniform weighting is applied.
These functions are exact copies of the wtd.mean
and wtd.var
functions from the Hmisc package (documented there under wtd.stats). They were written by
Frank Harrell, Department of Biostatistics, Vanderbilt University School of
Medicine.
mean
, var
, wtd.table
and the survey
package.
data(hdv2003)
mean(hdv2003$age)
wtd.mean(hdv2003$age, weights=hdv2003$poids)
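The weighted mean itself is the sum of weight times value divided by the sum of weights. The sketch below checks that formula against base R's `weighted.mean` on made-up numbers; it illustrates the definition and is not a copy of the Hmisc code.

```r
# Base R sketch with illustrative values: weighted mean by hand
x <- c(10, 20, 30)
w <- c(1, 1, 2)

# (1*10 + 1*20 + 2*30) / (1 + 1 + 2) = 90 / 4 = 22.5
manual <- sum(w * x) / sum(w)
builtin <- weighted.mean(x, w)

c(manual = manual, builtin = builtin)
```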
Generate weighted frequency tables, both for one-way and two-way tables.
wtd.table(
  x,
  y = NULL,
  weights = NULL,
  digits = 3,
  normwt = FALSE,
  useNA = c("no", "ifany", "always"),
  na.rm = TRUE,
  na.show = FALSE,
  exclude = NULL
)
x |
a vector |
y |
another optional vector for a two-way frequency table. Must be the same length as |
weights |
vector of weights, must be the same length as |
digits |
Number of significant digits. |
normwt |
if TRUE, normalize weights so that the total weighted count is the same as the unweighted one |
useNA |
whether to include NA values in the table |
na.rm |
(deprecated) if TRUE, remove NA values before computation |
na.show |
(deprecated) if TRUE, show NA count in table output |
exclude |
values to remove from x and y. To exclude NA, use na.rm argument. |
If weights
is not provided, a uniform weighting is used.
If some weights are missing ('NA'), they are converted to zero. In case of missing weights with 'normwt=TRUE', the observations with missing weights are still counted in the unweighted count. You have to filter them out before using this function if you don't want them to be taken into account when using 'normwt'.
If y
is not provided, returns a weighted one-way frequency table
of x
. Otherwise, returns a weighted two-way frequency table of
x
and y.
wtd.table
, table
, and the survey
package.
data(hdv2003)
wtd.table(hdv2003$sexe, weights=hdv2003$poids)
wtd.table(hdv2003$sexe, weights=hdv2003$poids, normwt=TRUE)
table(hdv2003$sexe, hdv2003$hard.rock)
wtd.table(hdv2003$sexe, hdv2003$hard.rock, weights=hdv2003$poids)
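The handling of missing weights and of normalization described in the details can be sketched in base R as follows. The data are made up, and the rescaling formula is an assumption derived from the description of normwt (weighted total equal to the unweighted count), not taken from the function's source.

```r
# Base R sketch (illustrative data, not questionr's code):
# a weighted one-way table with one missing weight.
x <- factor(c("a", "b", "a", "b", "a"))
w <- c(1.5, 2, NA, 0.5, 2)

# Missing weights are converted to zero
w[is.na(w)] <- 0

# Weighted counts: sum of weights within each level
weighted <- tapply(w, x, sum)

# Normalized weights: rescale so the weighted total equals the number
# of observations, which still includes the one with a missing weight
w_norm <- w * length(x) / sum(w)
weighted_norm <- tapply(w_norm, x, sum)

weighted
weighted_norm
```

Dropping the observation with the missing weight before rescaling would give a normalized total of 4 instead of 5, which is the effect the details paragraph warns about.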