R/generate_democracy_scores_dataset.R
generate_democracy_scores_dataset.Rd
This function automates the process of joining together all the archived and downloadable datasets in this package, producing a comprehensive data frame with every democracy measure. It automatically converts each democracy measure to a numeric value, ensures that larger values are associated with more democracy, and that all country-years are appropriately matched.
generate_democracy_scores_dataset(
  datasets,
  selection,
  output_format = "long",
  use_extended = TRUE,
  include_extra_pmm = FALSE,
  verbose = TRUE,
  target_system = c("GWn", "cown"),
  force_redownload = FALSE,
  scale_scores = FALSE,
  keep_only_last_in_year = TRUE,
  exclude_downloadable = FALSE
)
Character vector indicating which datasets to use in producing the combined data frame. Can be any or all of (an unambiguous abbreviation of) "anckar", "anrr", "LIED", "PIPE", "arat_pmm", "blm", "blm_pmm", "bmr", "bnr", "bnr_extended", "bti", "bollen_pmm", "doorenspleet", "eiu", "fh_pmm", "gwf_all", "gwf_all_extended", "hadenius_pmm", "kailitz", "magaloni", "magaloni_extended", "mainwaring", "mainwaring_pmm", "munck_pmm", "pacl", "pacl_pmm", "pacl_update", "peps", "pitf", "polity_pmm", "polyarchy", "polyarchy_dimensions", "polyarchy_pmm", "prc_gasiorowski", "prc_pmm", "svmdi", "svolik_regime", "uds_2010", "uds_2011", "uds_2014", "ulfelder", "ulfelder_extended", "utip", "vanhanen", "vdem", "wahman_teorell_hadenius", "reign" or "REIGN", "polityIV", "polity" (or "polity_annual"), "fh", "fh_electoral", "wgi". Default is all of them.
A regular expression for selecting among the datasets. Optional.
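As a rough illustration of how the two selection mechanisms could behave, the sketch below uses base R's pmatch() for unambiguous abbreviations and grep() for the regular expression. The dataset names are a small subset of the list above; whether the function relies on these exact base R calls internally is an assumption.

```r
# A small subset of the dataset names accepted by `datasets`
dataset_names <- c("anckar", "anrr", "blm", "blm_pmm", "fh_pmm", "vdem")

# Abbreviations: "anc" and "vd" each match a single name,
# while "an" is ambiguous (anckar or anrr) and therefore yields NA
abbreviations <- c("anc", "vd", "an")
matched <- dataset_names[pmatch(abbreviations, dataset_names)]
print(matched)

# Regular expression: keep every name containing "pmm"
selected <- grep("pmm", dataset_names, value = TRUE)
print(selected)
```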
Character. Whether to output a "wide" (each measure of democracy in a separate column) or a "long" (a column with measure names, a column with values) version of the data frame. Default is "long".
Whether to use "extended" (that is, including values before 1945 for some regimes) versions of some datasets (gwf, ulfelder, bnr, and magaloni). Default is TRUE.
Whether to include versions of some measures found in Pemstein, Meserve, and Melton's replication dataset for their 2010 piece introducing the Unified Democracy Scores (Pemstein, Meserve, and Melton 2010, 2013). See blm_pmm, prc_pmm, fh_pmm, pacl_pmm, vanhanen_pmm, and polity_pmm for details. This is included mostly to extend or replicate the uds scores.
Provides a running commentary on what the function is doing. Default is TRUE.
Character vector describing which state system to use for the combined file, and which country codes to use. Can be one or both of "GWn" (Gleditsch and Ward, numeric codes) and "cown" (numeric codes for the Correlates of War system). The default is both Gleditsch-Ward and Correlates of War (c("GWn", "cown")).
Whether to re-download all datasets that can be re-downloaded, including those archived with this package. Used only for debugging; default is FALSE.
Whether to scale each measure (subtracting its mean and dividing by its standard deviation). Default is FALSE.
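A minimal sketch of per-measure z-scoring on toy data, in base R. The column names `measure` and `value`, the use of the sample standard deviation, and the handling of missing values are assumptions for illustration; the package's own implementation may differ.

```r
# Toy long-format scores for two hypothetical measures
scores <- data.frame(
  measure = c("m1", "m1", "m1", "m2", "m2", "m2"),
  value   = c(0, 1, 1, 2, 5, 8)
)

# z-score: subtract the mean, divide by the (sample) standard deviation
z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# Apply the transformation separately within each measure
scores$value_z <- ave(scores$value, scores$measure, FUN = z_score)
print(scores)
```

Scaling within each measure puts indexes with very different ranges (say, 0 to 1 and 0 to 10) on a common footing before they are compared.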
Whether to keep only the last regime measurement in a given country-year. Some datasets (e.g., prc, reign) contain more than one regime measurement per country-year in some cases (if the regime changed multiple times during the year); setting this to TRUE discards all except the regime measurement as of 31 December of the year, the standard practice in most datasets. Default is TRUE. This setting is only of interest if you set output_format = "long", since it is ignored when output_format = "wide", which automatically discards all regime measurements except the last in the year.
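The idea of keeping only the last measurement in each country-year can be sketched in base R as follows. The data frame and its columns (`country`, `year`, `date`, `value`) are hypothetical, and the package may implement this step differently.

```r
# Toy data: country A has two regime measurements in 1990
regimes <- data.frame(
  country = c("A", "A", "A", "B"),
  year    = c(1990, 1990, 1991, 1990),
  date    = as.Date(c("1990-03-01", "1990-11-15", "1991-12-31", "1990-12-31")),
  value   = c(1, 4, 4, 1)
)

# Sort chronologically within each country-year
regimes <- regimes[order(regimes$country, regimes$year, regimes$date), ]

# Flag the last row of every country-year and keep only those rows
is_last <- !duplicated(regimes[c("country", "year")], fromLast = TRUE)
last_in_year <- regimes[is_last, ]
print(last_in_year)
```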
Whether to exclude all datasets that must be explicitly downloaded (polity, fh, wgi), using only archived datasets. This speeds up the process considerably, at the cost of losing some of the more important democracy measures. Default is FALSE.
A tibble with the selected democracy measures and state system information, in two versions: "long" and "wide". These contain the following variables:
The name of the country in the Gleditsch-Ward system of states, or the official name of the entity (for non-sovereign entities and states not in the Gleditsch and Ward system of states) or else a common name for disputed cases that do not have an official name (e.g., Western Sahara, Hyderabad). The Gleditsch and Ward scheme sometimes indicates the common name of the country and (in parentheses) the name of an earlier incarnation of the state: thus, they have Germany (Prussia), Russia (Soviet Union), Madagascar (Malagasy), etc. For details, see Gleditsch, Kristian S. & Michael D. Ward. 1999. "Interstate System Membership: A Revised List of the Independent States since 1816." International Interactions 25: 393-413. The list can be found at http://privatewww.essex.ac.uk/~ksg/statelist.html.
Gleditsch and Ward's numeric country code, from the Gleditsch and Ward list of independent states.
The Correlates of War numeric country code, 2016 version. This differs from Gleditsch and Ward's numeric country code in a few cases. See http://www.correlatesofwar.org/data-sets/state-system-membership for the full list.
Whether the state is "in system" (that is, is independent and sovereign), according to Gleditsch and Ward, for this particular date. Matches at the end of the year; so, for example, South Vietnam 1975 is FALSE because, according to Gleditsch and Ward, the country ended in April 1975 (being absorbed by North Vietnam). It is also TRUE for dates beyond 2012 for countries that did not end by then, despite the fact that the Gleditsch and Ward list has not been updated since.
In the "long" version of the dataset (output_format = "long"), the output data frame also contains the following variables:
The calendar year. Most measures of democracy reflect the country's situation as of 31 December of the year. If keep_only_last_in_year = FALSE (and output_format = "long"), a single country-year may nevertheless contain more than one measurement for some measures (e.g., prc).
The name of the measure (e.g., "blm", "fh_total_reversed").
The numerical value of the measure, in the original scale (if scale_scores = FALSE) or as a z-score (if scale_scores = TRUE).
The index type (dichotomous, trichotomous, ordinal/graded, continuous).
The name of the dataset.
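The relationship between the "long" and "wide" layouts can be sketched with base R's reshape(). The column names used here (`country`, `year`, `measure`, `value`) and the measure names `m1` and `m2` are hypothetical placeholders, not necessarily the names in the returned tibble.

```r
# Toy long-format data: one row per country-year-measure
long <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2000, 2000, 2000),
  measure = c("m1", "m2", "m1", "m2"),
  value   = c(0, 0.5, 1, 1),
  stringsAsFactors = FALSE
)

# Spread the measures into separate columns: one row per country-year
wide <- reshape(long,
                idvar = c("country", "year"),
                timevar = "measure",
                direction = "wide")

# reshape() prefixes the new columns with "value."; drop that prefix
names(wide) <- sub("^value\\.", "", names(wide))
print(wide)
```

A wide layout of this kind needs exactly one value per country-year and measure, which is consistent with the "wide" output keeping only the last measurement in each year.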
In the "wide" version of the dataset (output_format = "wide"), the output data frame can also contain any of the following variables (in the scales described below, unless scale_scores = TRUE, in which case the measures are converted to z-scores):
The calendar year. In the "wide" version, all measures of democracy reflect the country's situation as of 31 December of the year as much as possible.
The anckar measure of democracy, as a numeric value. Up to 2010 this should be identical to bmr_democracy_omitteddata. 0 = non-democracy, 1 = democracy.
The anrr measure of democracy, as a numeric value. 0 = non-democracy, 1 = democracy.
The blm measure of democracy, as a numeric value. Can be 0 (authoritarian), 0.5 (hybrid), or 1 (democracy).
The bmr measure of democracy, as a numeric value. Can be 0 (non-democracy) or 1 (democracy).
According to the bmr codebook, "this is the same measure as bmr, except it records an NA for countries occupied during an international war (e.g., the Netherlands 1940-44) or experiencing state collapse during a civil war (e.g., Lebanon 1976-89). The democracy variable instead fills in these years as continuations of the same regime type." There are some -1 values that I've converted to NA (the measure is supposed to be between 0 and 1).
According to the bmr codebook, this is the same measure as bmr, except that it also requires that at least half of adult women have the right to vote. 30 countries change values.
The bnr event measure of democracy, reversed, so that 0
indicates non-democracy and 1 indicates democracy. Since the event
variable of the bnr dataset only codes terminations of democracy
(ignoring years when the country is non-democratic), this variable is
mostly equal to 1.
The bnr_extended variable, with 0 indicating non-democracy and 1 indicating democracy. See the documentation for bnr_extended for details of how this variable extends bnr, filling in years of non-democracy back to 1913.
The bti - Bertelsmann Transformation Democracy Status index. Ranges from 1 to 10.
The doorenspleet measure of democracy, with 1 indicating non-democracy and 2 indicating democracy.
The eiu measure of democracy, ranging from 0 (least democratic) to 10 (most democratic). The report says that the index "is based on five categories: electoral process and pluralism; civil liberties; the functioning of government; political participation; and political culture", which form "one interrelated whole".
The fh_electoral measure of electoral democracy, with 0 indicating a lack of electoral democracy, 1 indicating electoral democracy status.
The Freedom House combined measure of political and civil rights, reversed, so that 0 is least free and 12 is most free.
A measure of democracy from gwf_all, obtained by coding all democracies as 1 and all non-democracies, including all non-democratic nonautocracies, as 0.
A measure of democracy from gwf_all_extended, obtained by coding all democracies as 1 and all non-democracies, including all non-democratic nonautocracies, as 0. See the documentation for gwf_all_extended for details on how the dataset was extended to the period before 1945 for some countries.
A measure of democracy from gwf_all, obtained by coding all democracies as 1 and all non-democracies as 0. It is NA for all non-democratic non-autocracies (e.g., warlord regimes, foreign occupation, etc.).
A measure of democracy from gwf_all_extended, obtained by coding all democracies as 1 and all non-democracies as 0. It is NA for all non-democratic non-autocracies (e.g., warlord regimes, foreign occupation, etc.). See the documentation for gwf_all_extended for details on how the dataset was extended to the period before 1945 for some countries.
A measure of democracy from kailitz, obtained by coding all liberal democracies as 1 and all other regimes as 0.
A trichotomous measure of democracy from kailitz, obtained by coding all liberal democracies as 2, all electoral autocracies as 1, and all other regimes as 0.
The LIED measure of electoral democracy, ranging from 0 to 6.
The LIED measure of polyarchy, ranging from 0 to 7 (including political liberties).
A measure of democracy from magaloni, obtained by coding all democracies as 1 and all non-democracies as 0.
A measure of democracy from magaloni_extended, obtained by coding all democracies as 1 and all non-democracies as 0. See the documentation for magaloni_extended for details on how the dataset was extended to the period before 1950 for some countries.
The measure of democracy from mainwaring, where 0 is non-democracy, 0.5 represents hybrid regimes, and 1 is democracy.
The measure of democracy from pacl, where 0 is non-democracy, and 1 is democracy.
The measure of democracy from pacl_update, where 0 is non-democracy, and 1 is democracy.
The measures of democracy from peps. Higher values indicate more democracy.
The measures of democracy from PIPE. Higher values indicate more democracy. See the documentation of PIPE for large caveats - these measures are calculated on the basis of unclear instructions and may contain errors.
The measures of democracy from pitf, converted to numeric values. Higher values indicate more democracy.
The measure of democracy in arat_pmm.
The measure of democracy in blm_pmm.
The measure of democracy in bollen_pmm.
The measure of democracy in fh_pmm. Check the documentation of fh_pmm for caveats.
The measure of democracy in hadenius_pmm.
The measure of democracy in mainwaring_pmm.
The measure of democracy in munck_pmm.
The measure of democracy in pacl_pmm.
The measure of democracy in polity_pmm. Check the documentation of polity_pmm for caveats.
The measure of democracy in polyarchy_pmm.
The measure of democracy in prc_pmm. Check the documentation of prc_pmm for caveats.
The measure of democracy in vanhanen_pmm. Check the documentation of vanhanen_pmm for caveats.
The polity measure of democracy in polity (version 5, with NAs for -88, -77, -66).
The polity2 measure of democracy in polity (version 5, interpolated for periods of interregnum, occupation, and the like - see documentation in the polity manual).
The polity measure of democracy in polity (version IV, with NAs for -88, -77, -66).
The polity2 measure of democracy in polity (version IV, interpolated for periods of interregnum, occupation, and the like - see documentation in the polity manual).
The contestation dimension (CONTEST) in polyarchy_dimensions.
The inclusion dimension (INCLUS) in polyarchy_dimensions.
The contestation dimension (cont) in polyarchy.
The original polyarchy scale (poly) in polyarchy, reversed so that higher values imply more democracy. The codebook suggests this was superseded by polyarchy_original_contestation.
The measure of democracy in prc, where 1 indicates non-democracy, 3 a hybrid regime, and 4 democracy. Transitional regimes (2 in the original scale) are coded NA.
A measure of democracy from reign, obtained by coding all presidential and parliamentary democracies as 1, all other regimes as 0.
The continuous Support Vector Machine democracy index from svmdi, 2020 version.
The continuous Support Vector Machine democracy index from svmdi, 2016 version.
The dichotomous Support Vector Machine democracy index from svmdi, 2020 version.
A measure of democracy from svolik_regime. 0 = non-democracy, 1 = democracy.
The mean and median posterior scores of the uds index (2010, 2011, and 2014 releases). Higher values are more democratic.
The dichotomous measure of democracy in ulfelder.
The dichotomous measure of democracy in ulfelder_extended.
A dichotomous measure of democracy from utip. Equals 1 if the regime is a social democracy, conservative democracy, or one party democracy, 0 otherwise.
A dichotomous measure of democracy from utip. Equals 1 if the regime is a social democracy or conservative democracy, 0 otherwise.
A trichotomous measure of democracy from utip. Equals 2 if the regime is a social democracy or conservative democracy, 1 if it is a one-party democracy, 0 otherwise.
The competition index from vanhanen.
The democratization index from vanhanen.
The participation index from vanhanen.
The additive polyarchy index from vdem.
The deliberative democracy index from vdem.
The egalitarian democracy index from vdem.
The liberal democracy index from vdem.
The multiplicative polyarchy index from vdem.
The participative democracy index from vdem.
The polyarchy (electoral democracy) index from vdem.
The voice and accountability index from wgi.
A dichotomous measure of democracy from wahman_teorell_hadenius, obtained by coding as 1 all democracies according to the regime1ny variable and as 0 all other regimes.
A dichotomous measure of democracy from wahman_teorell_hadenius, obtained by coding as 1 all democracies according to the regimenyrobust variable and as 0 all other regimes.
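Several of the measures listed above are reversed or recoded so that larger values indicate more democracy. The sketch below illustrates two such transformations on toy vectors. It assumes that the Freedom House combined score is the sum of the political rights and civil liberties ratings (each from 1 to 7, so the sum runs from 2 to 14) and that the reversal is a simple subtraction from 14; the variable names `pr`, `cl`, and `prc_raw` are hypothetical, and the package's actual code may differ.

```r
# Freedom House: lower original ratings mean more freedom
pr <- c(1, 4, 7)                     # political rights ratings (assumed 1 to 7)
cl <- c(1, 5, 7)                     # civil liberties ratings (assumed 1 to 7)
fh_total <- pr + cl                  # ranges from 2 (most free) to 14 (least free)
fh_total_reversed <- 14 - fh_total   # now 0 is least free and 12 is most free
print(fh_total_reversed)

# prc: transitional regimes (2 in the original scale) become NA
prc_raw <- c(1, 2, 3, 4)
prc <- prc_raw
prc[prc == 2] <- NA
print(prc)
```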
D. Pemstein, S. Meserve, and J. Melton. "Democratic Compromise: A Latent Variable Analysis of Ten Measures of Regime Type". In: Political Analysis 18.4 (2010), pp. 426-449. DOI: 10.1093/pan/mpq020.
D. Pemstein, S. A. Meserve, and J. Melton. Replication data for: Democratic Compromise: A Latent Variable Analysis of Ten Measures of Regime Type. 2013. http://hdl.handle.net/1902.1/PMM.
democracy_data_long_no_download <- generate_democracy_scores_dataset(
  exclude_downloadable = TRUE,
  keep_only_last_in_year = FALSE
)
#> Adding fh_pmm data
#> Adding old polityIV data
#> Adding polity_pmm data
#> Adding Anckar-Fredriksson data
#> Adding ANRR data
#> Adding Arat data
#> Adding BLM data
#> Adding blm_pmm data
#> Adding BMR data
#> Adding BNR extended data
#> Adding BNR data
#> Adding BTI data
#> Adding Bollen data
#> Adding Doorenspleet data
#> Adding EIU data
#> Adding GWF data
#> Adding GWF data
#> Adding Hadenius data
#> Adding Kailitz data
#> Adding LIED data
#> Adding Magaloni extended data
#> Adding Magaloni data
#> Adding Mainwaring data
#> Adding mainwaring_pmm data
#> Adding Munck data
#> Adding PACL data
#> Adding pacl_pmm data
#> Adding Update of PACL data by Bjornskov and Rode
#> Adding PEPS data
#> Adding PITF data
#> Adding Polyarchy original data
#> Adding polyarchy_pmm data
#> Adding Polyarchy Dimensions data
#> Adding PRC/Gasiorowski data
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `prc = (structure(function (..., .x = ..1, .y = ..2, . = ..1)
#> ...`.
#> Caused by warning:
#> ! NAs introduced by coercion
#> Adding prc_pmm data
#> Adding PIPE data
#> Adding REIGN data
#> Adding SVMDI data
#> Adding SVMDI 2016 data
#> Adding Svolik data
#> Adding UDS 2014 data
#> Adding UDS 2011 data
#> Adding UDS 2010 data
#> Adding extended Ulfelder data
#> Adding Ulfelder data
#> Adding UTIP data
#> Adding Vanhanen data
#> Adding Vanhanen_pmm data
#> Adding vdem data
#> Adding Wahman, Teorell, and Hadenius data
#> Finalizing
# You can select only some datasets
democracy_data_gwf <- generate_democracy_scores_dataset(
  datasets = c("gwf_all"),
  output_format = "wide"
)
#> Adding GWF data
#> Adding GWF data
#> Finalizing
# all PMM datasets
democracy_data_pmm <- generate_democracy_scores_dataset(selection = "pmm")
#> Adding fh_pmm data
#> Adding polity_pmm data
#> Adding Arat data
#> Adding blm_pmm data
#> Adding Bollen data
#> Adding Hadenius data
#> Adding mainwaring_pmm data
#> Adding Munck data
#> Adding pacl_pmm data
#> Adding polyarchy_pmm data
#> Adding prc_pmm data
#> Adding Vanhanen_pmm data
#> Finalizing
if (FALSE) {
  # This produces scaled scores
  generate_democracy_scores_dataset(
    exclude_downloadable = TRUE,
    keep_only_last_in_year = FALSE,
    scale_scores = TRUE
  )
  # These require downloads:
  democracy_data_long <- generate_democracy_scores_dataset()
  democracy_data_wide <- generate_democracy_scores_dataset(output_format = "wide")
}