Title: | Blocked Weighted Bootstrap |
---|---|
Description: | The blocked weighted bootstrap (BBW) is an estimation technique for use with data from two-stage cluster sampled surveys in which either prior weighting (e.g. population-proportional sampling or PPS as used in Standardized Monitoring and Assessment of Relief and Transitions or SMART surveys) or posterior weighting (e.g. as used in rapid assessment method or RAM and simple spatial sampling method or S3M surveys) is implemented. See Cameron et al (2008) <doi:10.1162/rest.90.3.414> for application of bootstrap to cluster samples. See Aaron et al (2016) <doi:10.1371/journal.pone.0163176> and Aaron et al (2016) <doi:10.1371/journal.pone.0162462> for application of the blocked weighted bootstrap to estimate indicators from two-stage cluster sampled surveys. |
Authors: | Mark Myatt [aut, cph], Ernest Guevarra [aut, cre, cph] |
Maintainer: | Ernest Guevarra <[email protected]> |
License: | GPL-3 |
Version: | 0.3.0.9000 |
Built: | 2025-01-16 10:27:33 UTC |
Source: | https://github.com/rapidsurveys/bbw |
This set of functions is an alternative to the bootBW() function. It aims to make the blocked weighted bootstrap algorithm more efficient through vectorisation and parallelisation. The function syntax has been kept consistent with bootBW() for ease of transition. A more in-depth discussion of the efficiencies gained from this alternative is available here.
boot_bw(
  x, w, statistic, params, outputColumns = params, replicates = 400,
  strata = NULL, parallel = FALSE,
  cores = parallelly::availableCores(omit = 1)
)

boot_bw_parallel(
  x, w, statistic, params, outputColumns = params, replicates = 400,
  strata = NULL,
  cores = parallelly::availableCores(omit = 1)
)

boot_bw_sequential(
  x, w, statistic, params, outputColumns = params, replicates = 400,
  strata = NULL
)

boot_bw_weight(w)

boot_bw_sample_clusters(x, w, index = FALSE)

boot_bw_sample_within_clusters(cluster_df)
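x | A data.frame() of survey data with the primary sampling unit (PSU) identifier in a column named psu |
w | A data.frame() of PSU-level weighting data with the PSU identifier in a column named psu and the PSU population in a column named pop |
statistic | An estimator function operating on variables in x, such as bootClassic() or bootPROBIT() |
params | Parameters specified as names of columns in x |
outputColumns | Names to be used for columns in output. Defaults to params. |
replicates | Number of bootstrap replicates to be performed. Default is 400. |
strata | A character value for the name of the variable in x by which the data are grouped for resampling. Default is NULL. |
parallel | Logical. Should resampling be done in parallel? Default is FALSE. |
cores | The number of computer cores to use or number of child processes to be run simultaneously. Defaults to one less than the number of cores available on the current machine. |
index | Logical. Should index values be returned or a list of data.frame()s? Default is FALSE. |
cluster_df | A list of data.frame()s for the selected clusters, as returned by boot_bw_sample_clusters() with index = FALSE |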
For boot_bw(), a data.frame() with number of columns equal to the length of outputColumns; number of rows equal to the number of replicates; and names of variables equal to the values of outputColumns. For boot_bw_weight(), a data.frame() based on w with two additional variables, weight and cumWeight. For boot_bw_sample_clusters(), either a vector of integers corresponding to the primary sampling unit (PSU) identifiers of the selected clusters (when index = TRUE) or a list of data.frame()s corresponding to the data for the selected clusters (when index = FALSE). For boot_bw_sample_within_clusters(), a matrix similar in structure to x of resampled data from each selected cluster.
boot_bw(
  x = indicatorsHH, w = villageData, statistic = bootClassic,
  params = "anc1", replicates = 9, parallel = TRUE
)
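The helper functions described above return weights, selected clusters, and resampled data. The following base R sketch shows one way those three steps could fit together on toy data. It is an illustration only, not the package's internal code: the toy data frames, the scaling of weight as pop divided by its total, and the use of the cumulative weight to draw clusters are all assumptions.

```r
# Toy inputs (hypothetical data, not the package datasets)
set.seed(42)
w <- data.frame(psu = 1:4, pop = c(100, 300, 200, 400))
x <- data.frame(psu = rep(1:4, each = 5), y = rbinom(20, 1, 0.5))

# Step 1 (assumed): scale cluster populations into weights and accumulate them
w$weight <- w$pop / sum(w$pop)
w$cumWeight <- cumsum(w$weight)

# Step 2 (assumed): draw clusters with replacement, with probability
# proportional to weight, by locating uniform draws on the cumulative scale
draws <- runif(nrow(w))
pos <- findInterval(draws, c(0, w$cumWeight), rightmost.closed = TRUE)
pos <- pmin(pos, nrow(w))  # guard against floating-point overshoot
selected_psu <- w$psu[pos]

# Step 3: collect the data for each selected cluster, then resample rows
# with replacement within each cluster
cluster_list <- lapply(
  selected_psu, function(p) x[x$psu == p, , drop = FALSE]
)
resampled <- do.call(
  rbind,
  lapply(
    cluster_list,
    function(df) df[sample(nrow(df), replace = TRUE), , drop = FALSE]
  )
)
```

A statistic function such as a column mean can then be applied to resampled, and the whole procedure repeated once per replicate.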
Estimate median and confidence intervals from bootstrap replicates
boot_bw_estimate(boot_df)
boot_df | A data.frame() of bootstrap replicates, for example the output of boot_bw() |
A data.frame() with rows equal to the number of columns of boot_df and 4 columns for indicator, estimate, 95% lower confidence limit, and 95% upper confidence limit.
boot_df <- boot_bw(
  x = indicatorsHH, w = villageData, statistic = bootClassic,
  params = "anc1", parallel = TRUE, replicates = 9
)

boot_bw_estimate(boot_df)
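To make the shape of the result concrete, here is a minimal base R sketch of a median and percentile-interval summary over bootstrap replicates. It assumes the estimate is the 50th percentile and the limits are the 2.5th and 97.5th percentiles of each column; the simulated replicates and the output column names (indicator, est, lcl, ucl) are hypothetical and may differ from what boot_bw_estimate() actually returns.

```r
# Simulated bootstrap replicates (hypothetical values), one column per indicator
set.seed(1)
boot_df <- data.frame(
  anc1 = runif(400, min = 0.2, max = 0.4),
  anc2 = runif(400, min = 0.1, max = 0.2)
)

# Median and 95% percentile limits for each column
q <- t(
  sapply(boot_df, quantile, probs = c(0.5, 0.025, 0.975), na.rm = TRUE)
)

# One row per indicator, four columns (column names are assumptions)
est_df <- data.frame(
  indicator = rownames(q),
  est = unname(q[, 1]),
  lcl = unname(q[, 2]),
  ucl = unname(q[, 3])
)
```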
The blocked weighted bootstrap (BBW) is an estimation technique for use with data from two-stage cluster sampled surveys in which either prior weighting (e.g. population proportional sampling or PPS as used in SMART surveys) or posterior weighting (e.g. as used in RAM and S3M surveys) is implemented.
bootBW(x, w, statistic, params, outputColumns = params, replicates = 400)
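x | A data.frame() of survey data with the primary sampling unit (PSU) identifier in a column named psu |
w | A data.frame() of PSU-level weighting data with the PSU identifier in a column named psu and the PSU population in a column named pop |
statistic | An estimator function operating on variables in x, such as bootClassic() or bootPROBIT() |
params | Parameters specified as names of columns in x |
outputColumns | Names to be used for columns in output. Defaults to params. |
replicates | Number of bootstrap replicates to be performed. Default is 400. |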
A data.frame() with: number of columns equal to the length of outputColumns; number of rows equal to the number of replicates; and names equal to outputColumns.
# Example call to bootBW function using RAM-OP test data:
bootP <- bootBW(
  x = indicatorsHH, w = villageData, statistic = bootClassic,
  params = "anc1", outputColumns = "anc1", replicates = 9
)

# Example estimate with 95% CI:
quantile(bootP$anc1, probs = c(0.500, 0.025, 0.975), na.rm = TRUE)
Simple proportion statistics function for bootstrap estimation
bootClassic(x, params)
x | A data frame with primary sampling unit (PSU) in column named psu |
params | A vector of column names corresponding to the binary variables of interest contained in x |
A numeric vector of the mean of each binary variable of interest with length equal to length(params).
# Example call to bootClassic function
sampled_clusters <- boot_bw_sample_clusters(
  x = indicatorsHH, w = boot_bw_weight(villageData)
)

boot <- boot_bw_sample_within_clusters(sampled_clusters)

bootClassic(boot, "anc1")
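As the return value is described as the mean of each binary variable, a statistic of this kind can be sketched in a few lines of base R. The function name classic_sketch, the toy data, and the choice to drop missing values with na.rm = TRUE are assumptions made for illustration; this is not the source of bootClassic().

```r
# Hypothetical stand-in for a "classic" proportion estimator:
# the mean of each 0/1 column named in params
classic_sketch <- function(x, params) {
  vapply(params, function(p) mean(x[[p]], na.rm = TRUE), numeric(1))
}

# Toy data (hypothetical), with one missing value in anc2
toy <- data.frame(
  psu = c(1, 1, 2, 2),
  anc1 = c(1, 0, 1, 1),
  anc2 = c(0, 0, NA, 1)
)

props <- classic_sketch(toy, c("anc1", "anc2"))
```

Any function with the same (x, params) signature that returns one number per parameter could be passed as the statistic argument in the same way.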
PROBIT statistics function for bootstrap estimation
bootPROBIT(x, params, threshold = THRESHOLD)
x | A data frame with primary sampling unit (PSU) in column named psu |
params | A vector of column names corresponding to the continuous variables of interest contained in x |
threshold | Cut-off value for the continuous variable to differentiate case and non-case |
A numeric vector of the PROBIT estimate of each continuous variable of interest with length equal to length(params).
# Example call to bootPROBIT function:
sampled_clusters <- boot_bw_sample_clusters(
  x = indicatorsCH1, w = boot_bw_weight(villageData)
)

boot <- boot_bw_sample_within_clusters(sampled_clusters)

bootPROBIT(x = boot, params = "muac1", threshold = 115)
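A PROBIT-style estimate treats the continuous variable as approximately normal and reads the proportion below the threshold from the normal cumulative distribution. The sketch below assumes a robust formulation in which location is the median and spread is the interquartile range divided by 1.34898 (the interquartile range of a standard normal). Whether bootPROBIT() uses exactly this formulation is an assumption, and probit_sketch and the toy data are hypothetical.

```r
# Hypothetical PROBIT-style estimator: proportion below `threshold` under a
# normal distribution with robust location (median) and scale (IQR / 1.34898)
probit_sketch <- function(x, params, threshold) {
  vapply(
    params,
    function(p) {
      d <- x[[p]]
      m <- median(d, na.rm = TRUE)
      s <- IQR(d, na.rm = TRUE) / 1.34898
      pnorm(q = threshold, mean = m, sd = s)
    },
    numeric(1)
  )
}

# Toy MUAC values in mm (hypothetical)
toy <- data.frame(
  psu = c(1, 1, 1, 2, 2, 2, 2),
  muac1 = c(110, 120, 125, 130, 135, 140, 150)
)

p_at_median <- probit_sketch(toy, "muac1", threshold = 130)
p_below_115 <- probit_sketch(toy, "muac1", threshold = 115)
```

A threshold equal to the median gives one half by construction, which is a quick sanity check on any implementation of this kind.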
Post-stratification analysis
estimate_total(est_df, pop_df, strata)
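est_df | A data.frame() of stratified indicator estimates, for example the output of boot_bw_estimate() |
pop_df | A data.frame() of population by stratum; in the example, it has columns named strata and pop |
strata | A character value of the variable name in est_df that identifies the strata |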
A vector of values for the overall estimate, overall 95% lower confidence limit, and overall 95% upper confidence limit for each of the strata in est_df.
est_df <- boot_bw(
  x = indicatorsHH, w = villageData, statistic = bootClassic,
  params = "anc1", strata = "region", replicates = 9, parallel = TRUE
) |>
  boot_bw_estimate()

## Add population ----
pop_df <- somalia_population |>
  subset(select = c(region, total))

names(pop_df) <- c("strata", "pop")

estimate_total(est_df, pop_df, strata = "region")
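One simple way to combine stratum-level results into an overall figure is a population-weighted average. The sketch below assumes that approach and applies it to the estimate and to both confidence limits, which is a crude treatment of the limits. It is not taken from the source of estimate_total(), and the toy values and column names (est, lcl, ucl) are hypothetical.

```r
# Toy stratum-level estimates and populations (hypothetical values)
est_df <- data.frame(
  strata = c("A", "B"),
  est = c(0.2, 0.4),
  lcl = c(0.1, 0.3),
  ucl = c(0.3, 0.5)
)
pop_df <- data.frame(strata = c("A", "B"), pop = c(100, 300))

# Match each stratum estimate to its population
merged <- merge(est_df, pop_df, by = "strata")

# Population-weighted average of the estimate and of each limit (assumed)
overall <- vapply(
  merged[c("est", "lcl", "ucl")],
  function(v) sum(v * merged$pop) / sum(merged$pop),
  numeric(1)
)
```

Here stratum B carries three times the weight of stratum A, so the overall estimate sits closer to 0.4 than to 0.2.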
Child indicators on morbidity, health service coverage and anthropometry calculated from data collected in a survey conducted in 4 districts from 3 regions in Somalia.
indicatorsCH1
A data frame with 16 columns and 3090 rows.
Variable | Description |
---|---|
region | Region in Somalia to which the cluster belongs |
district | District in Somalia to which the cluster belongs |
psu | The PSU identifier. This must use the same coding system used to identify the PSUs in the villageData dataset |
mID | The mother identifier |
cID | The child identifier |
ch1 | Diarrhoea in the past 2 weeks (0/1) |
ch2 | Fever in the past 2 weeks (0/1) |
ch3 | Cough in the past 2 weeks (0/1) |
ch4 | Immunisation card (0/1) |
ch5 | BCG immunisation (0/1) |
ch6 | Vitamin A coverage in the past month (0/1) |
ch7 | Anti-helminth coverage in the past month (0/1) |
sex | Sex of child |
muac1 | Mid-upper arm circumference in mm |
muac2 | Mid-upper arm circumference in mm |
oedema | Oedema (0/1) |
Mother and child health and nutrition survey in 3 regions of Somalia
indicatorsCH1
Infant and young child feeding indicators using the infant and child feeding index (ICFI) by Arimond and Ruel, calculated from data collected in a survey conducted in 4 districts from 3 regions in Somalia.
indicatorsCH2
A data frame with 15 columns and 2083 rows.
Variable | Description |
---|---|
region | Region in Somalia to which the cluster belongs |
district | District in Somalia to which the cluster belongs |
psu | The PSU identifier. This must use the same coding system used to identify the PSUs in the villageData dataset |
mID | The mother identifier |
cID | The child identifier |
ebf | Exclusive breastfeeding (0/1) |
cbf | Continued breastfeeding (0/1) |
ddd | Dietary diversity (0/1) |
mfd | Meal frequency (0/1) |
icfi | Infant and child feeding index (from 0 to 6) |
iycf | Good IYCF |
icfiProp | Good ICFI |
age | Child's age |
bf | Child is breastfeeding (0/1) |
bfStop | Age in months child stopped breastfeeding |
Mother and child health and nutrition survey in 3 regions of Somalia
indicatorsCH2
Mother indicators for health and nutrition calculated from data collected in a survey conducted in 4 districts from 3 regions in Somalia.
indicatorsHH
A data frame with 26 columns and 2136 rows:
Variable | Description |
---|---|
region | Region in Somalia to which the cluster belongs |
district | District in Somalia to which the cluster belongs |
psu | The PSU identifier. This must use the same coding system used to identify the PSUs in the villageData dataset |
mID | The mother identifier |
mMUAC | Mothers with mid-upper arm circumference < 230 mm (0/1) |
anc1 | At least 1 antenatal care visit with a trained health professional (0/1) |
anc2 | At least 4 antenatal care visits with any service provider (0/1) |
anc3 | FeFol coverage (0/1) |
anc4 | Vitamin A coverage (0/1) |
wash1 | Improved sources of drinking water (0/1) |
wash2 | Improved sources of other water (0/1) |
wash3 | Probable safe drinking water (0/1) |
wash4 | Number of litres of water collected in a day |
wash5 | Improved toilet facilities (0/1) |
wash6 | Human waste disposal practices / behaviour (0/1) |
wash7a | Handwashing score (from 0 to 5) |
wash7b | Handwashing score of 5 (0/1) |
hhs1 | Household hunger score (from 0 to 6) |
hhs2 | Little or no hunger (0/1) |
hhs3 | Moderate hunger (0/1) |
hhs4 | Severe hunger (0/1) |
mfg | Mother's dietary diversity score |
pVitA | Plant-based vitamin A-rich foods (0/1) |
aVitA | Animal-based vitamin A-rich foods (0/1) |
xVitA | Any vitamin A-rich foods (0/1) |
iron | Iron-rich foods (0/1) |
Mother and child health and nutrition survey in 3 regions of Somalia
indicatorsHH
Utility function that recodes variables based on user recode specifications. Handles both numeric and factor variables.
recode(var, recodes, afr, anr = TRUE, levels)
var | Variable to recode |
recodes | Character string of recode specifications, written as semicolon-separated old=new pairs (for example "1:3=1;4:6=2;else=NA") |
afr | Return a factor. Default is TRUE if var is a factor. |
anr | Coerce result to numeric (default is TRUE) |
levels | Order of the levels in the returned factor; the default is to use the sort order of the level names. |
Recoded variable
# Recode values from 1 to 9 to various specifications
var <- sample(x = 1:9, size = 100, replace = TRUE)

# Recode single values
recode(var = var, recodes = "9=NA")

# Recode set of values
recode(var = var, recodes = "c(1,2,5)=1")

# Recode range of values
recode(var = var, recodes = "1:3=1;4:6=2;7:9=3")

# Recode other values
recode(var = var, recodes = "c(1,2,5)=1;else=NA")
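To clarify what these specification strings mean, the sketch below reproduces three of the mappings with base R only. It relies on a reading of the specification syntax inferred from the examples above (old values on the left of =, new value on the right, else as a catch-all), which is an assumption, and it uses a small fixed vector rather than a random sample so the results are easy to check by eye.

```r
# Small fixed input (hypothetical) with values between 1 and 9
var <- c(1, 3, 4, 6, 7, 9, 5)

# Equivalent of "9=NA": a single value becomes missing, others unchanged
single <- replace(var, var == 9, NA)

# Equivalent of "1:3=1;4:6=2;7:9=3": three ranges map to 1, 2 and 3
# (valid only for values from 1 to 9)
ranges <- findInterval(var, c(1, 4, 7))

# Equivalent of "c(1,2,5)=1;else=NA": listed values become 1, the rest NA
set_else <- ifelse(var %in% c(1, 2, 5), 1, NA)
```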
A data.frame with 19 rows and 18 columns:
somalia_population
An object of class data.frame with 19 rows and 18 columns.
Variable | Description |
---|---|
region | Region name |
total | Total population |
urban | Total urban population |
rural | Total rural population |
idp | Total IDP population |
urban_stressed | Total urban population - stressed |
rural_stressed | Total rural population - stressed |
idp_stressed | Total IDP population - stressed |
urban_crisis | Total urban population - crisis |
rural_crisis | Total rural population - crisis |
idp_crisis | Total IDP population - crisis |
urban_emergency | Total urban population - emergency |
rural_emergency | Total rural population - emergency |
idp_emergency | Total IDP population - emergency |
urban_catastrophe | Total urban population - catastrophe |
rural_catastrophe | Total rural population - catastrophe |
idp_catastrophe | Total IDP population - catastrophe |
percent_at_least_crisis | Percentage of population that are at least in crisis |
https://fsnau.org/downloads/2022-Gu-IPC-Population-Tables-Current.pdf
Dataset containing cluster population weights for use in performing posterior weighting with the blocked weighted bootstrap approach. This dataset is from a mother and child health and nutrition survey conducted in 4 districts from 3 regions in Somalia.
villageData
A data frame with 6 columns and 117 rows:
Variable | Description |
---|---|
region | Region in Somalia to which the cluster belongs |
district | District in Somalia to which the cluster belongs |
psu | The PSU identifier. This must use the same coding system used to identify the PSUs in the indicators datasets |
lon | Longitude coordinate of the cluster |
lat | Latitude coordinate of the cluster |
pop | Population size of the cluster |
Mother and child health and nutrition survey in 3 regions of Somalia
villageData