This function joins all the archived and downloadable datasets in this package into a single data frame containing every democracy measure. It automatically converts each democracy measure to a numeric value, ensures that larger values are associated with more democracy, and matches country-years appropriately across datasets.

generate_democracy_scores_dataset(
  datasets,
  selection,
  output_format = "long",
  use_extended = TRUE,
  include_extra_pmm = FALSE,
  verbose = TRUE,
  target_system = c("GWn", "cown"),
  force_redownload = FALSE,
  scale_scores = FALSE,
  keep_only_last_in_year = TRUE,
  exclude_downloadable = FALSE
)

Arguments

datasets

Character vector indicating which datasets to use in producing the combined data frame. Can be any or all of (an unambiguous abbreviation of) "anckar", "anrr", "LIED", "PIPE", "arat_pmm", "blm", "blm_pmm", "bmr", "bnr", "bnr_extended", "bti", "bollen_pmm", "doorenspleet", "eiu", "fh_pmm", "gwf_all", "gwf_all_extended", "hadenius_pmm", "kailitz", "magaloni", "magaloni_extended", "mainwaring", "mainwaring_pmm", "munck_pmm", "pacl", "pacl_pmm", "pacl_update", "peps", "pitf", "polity_pmm", "polyarchy", "polyarchy_dimensions", "polyarchy_pmm", "prc_gasiorowski", "prc_pmm", "svmdi", "svolik_regime", "uds_2010", "uds_2011", "uds_2014", "ulfelder", "ulfelder_extended", "utip", "vanhanen", "vdem", "wahman_teorell_hadenius", "reign" or "REIGN", "polityIV", "polity" (or "polity_annual"), "fh", "fh_electoral", "wgi". Default is all of them.

selection

A regular expression for selecting among the datasets by name (e.g., "pmm" selects all the PMM datasets). Optional.
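
For instance, selection = "pmm" in the examples below picks up every dataset whose name contains "pmm". The following is a minimal base R sketch of that kind of regular-expression filtering. It assumes selection is matched against the dataset names listed under datasets. The short name vector is a hypothetical subset, and grep() is not necessarily how the package applies the pattern.

```r
# Hypothetical subset of the dataset names accepted by `datasets`
dataset_names <- c("anckar", "arat_pmm", "blm", "blm_pmm", "bollen_pmm",
                   "fh_pmm", "polity", "uds_2010", "uds_2014", "vdem")

# Keep only the names matching each regular expression
# (assumed to mirror what `selection` does)
pmm_only <- grep("pmm", dataset_names, value = TRUE)
uds_only <- grep("^uds_", dataset_names, value = TRUE)

print(pmm_only)
print(uds_only)
```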

output_format

Character. Whether to output a "wide" (each measure of democracy in a separate column) or a "long" (a column with measure names, a column with values) version of the data frame. Default is "long".
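
The following sketch pivots a toy long-format data frame into the wide layout with base R's reshape(), to make the two layouts concrete. The data values are made up, and the column names follow the descriptions under Value. The package may build its wide output differently.

```r
# Toy long-format data: one row per country-year-measure (values made up)
long <- data.frame(
  extended_country_name = "Country A",
  year = c(1990, 1990, 1991, 1991),
  measure = c("blm", "pacl", "blm", "pacl"),
  variable = c(0.5, 1, 1, 1)
)

# Pivot to wide: one row per country-year, one column per measure
wide <- reshape(long,
                idvar = c("extended_country_name", "year"),
                timevar = "measure",
                direction = "wide")

# reshape() names the new columns "variable.blm" and "variable.pacl";
# strip the prefix so each column is named after its measure
names(wide) <- sub("^variable\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```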

use_extended

Whether to use "extended" versions of some datasets (gwf, ulfelder, bnr, and magaloni). These versions add values for some countries in years before the original dataset begins (e.g., before 1945 for gwf and before 1950 for magaloni). Default is TRUE.

include_extra_pmm

Whether to include versions of some measures found in Pemstein, Meserve, and Melton's replication dataset for their 2010 article introducing the Unified Democracy Scores (Pemstein, Meserve, and Melton 2010, 2013). See blm_pmm, prc_pmm, fh_pmm, pacl_pmm, vanhanen_pmm, and polity_pmm for details. This is mostly useful for extending or replicating the UDS scores. Default is FALSE.

verbose

Whether to provide a running commentary on what the function is doing. Default is TRUE.

target_system

Character vector describing which state system to use for the combined file, and which country codes to use. Can be one or both of "GWn" (Gleditsch and Ward numeric codes) and "cown" (Correlates of War numeric codes). The default is both Gleditsch-Ward and Correlates of War (c("GWn", "cown")).

force_redownload

Whether to re-download all datasets that can be re-downloaded, including those archived with this package. Used only for debugging; default is FALSE.

scale_scores

Whether to scale each measure (subtracting its mean and dividing by its standard deviation). Default is FALSE.
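
The following is a minimal sketch of the scaling. It assumes each measure is standardized separately over all its country-years using the sample standard deviation, as base R's sd() and scale() do. The toy values and the z_score() helper are hypothetical, not the package's code.

```r
# Toy long-format data with two measures on very different scales (values made up)
long <- data.frame(
  measure = c("blm", "blm", "blm", "polity2", "polity2", "polity2"),
  variable = c(0, 0.5, 1, -10, 0, 10)
)

# Hypothetical helper: subtract the mean, divide by the sample standard deviation
z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# Scale each measure separately, so each ends up with mean 0 and sd 1
long$variable_z <- ave(long$variable, long$measure, FUN = z_score)
print(long)
```

Scaling within each measure is what makes, say, a 0/1 dichotomous index and a -10 to 10 index comparable on the same z-score axis.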

keep_only_last_in_year

Whether to keep only the last regime measurement in a given country-year. Some datasets (e.g., prc, reign) contain more than one regime measurement per country-year in some cases (if the regime changed one or more times during the year). Setting this to TRUE discards all except the regime measurement as of 31 December of the year, which is the standard practice in most datasets. Default is TRUE. This setting matters only when output_format = "long". It is ignored when output_format = "wide", which always discards all regime measurements except the last in the year.
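
The following sketch shows the filtering on a toy long-format data frame. The date column is a hypothetical stand-in for however the order of measurements within a year is recorded. The code illustrates keeping the last measurement per country-year and measure. It is not the package's implementation.

```r
# Toy long-format data with two prc measurements in the same country-year.
# The `date` column is hypothetical; values are made up.
long <- data.frame(
  extended_country_name = "Country A",
  year = c(1991, 1991, 1992),
  date = as.Date(c("1991-01-01", "1991-06-30", "1992-01-01")),
  measure = "prc",
  variable = c(4, 1, 3)
)

# Sort chronologically, then keep the last row per country-year-measure
long <- long[order(long$extended_country_name, long$measure, long$date), ]
key <- long[c("extended_country_name", "year", "measure")]
last_in_year <- long[!duplicated(key, fromLast = TRUE), ]
print(last_in_year)
```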

exclude_downloadable

Whether to exclude all datasets that must be explicitly downloaded (polity, fh, wgi), using only archived datasets. This speeds up the process considerably, but you lose some of the more important democracy measures out there. Default is FALSE.

Value

A tibble with the selected democracy measures and state system information, in either "long" or "wide" format depending on output_format. It contains the following variables:

Standard descriptive variables (generated by this package)

extended_country_name

The name of the country in the Gleditsch-Ward system of states, or the official name of the entity (for non-sovereign entities and states not in the Gleditsch and Ward system of states) or else a common name for disputed cases that do not have an official name (e.g., Western Sahara, Hyderabad). The Gleditsch and Ward scheme sometimes indicates the common name of the country and (in parentheses) the name of an earlier incarnation of the state: thus, they have Germany (Prussia), Russia (Soviet Union), Madagascar (Malagasy), etc. For details, see Gleditsch, Kristian S. & Michael D. Ward. 1999. "Interstate System Membership: A Revised List of the Independent States since 1816." International Interactions 25: 393-413. The list can be found at http://privatewww.essex.ac.uk/~ksg/statelist.html.

GWn

Gleditsch and Ward's numeric country code, from the Gleditsch and Ward list of independent states.

cown

The Correlates of War numeric country code, 2016 version. This differs from Gleditsch and Ward's numeric country code in a few cases. See http://www.correlatesofwar.org/data-sets/state-system-membership for the full list.

in_GW_system

Whether the state is "in system" (that is, independent and sovereign), according to Gleditsch and Ward, for this particular date. Membership is assessed at the end of the year. For example, South Vietnam in 1975 is FALSE because, according to Gleditsch and Ward, the country ended in April 1975 (being absorbed by North Vietnam). It is also TRUE for dates after 2012 for countries that had not ended by then, despite the fact that the Gleditsch and Ward list has not been updated since.
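
The end-of-year rule can be sketched as a date comparison. The spell dates below are illustrative assumptions for an entity that ends in April 1975, as South Vietnam does in the example above. The package's own matching may be implemented differently.

```r
# Illustrative state-system spell; both dates are assumptions chosen
# only so that the entity ends partway through 1975
spell_start <- as.Date("1955-01-01")
spell_end <- as.Date("1975-04-30")

years <- 1973:1976
year_end <- as.Date(paste0(years, "-12-31"))

# A country-year is "in system" only if the spell covers 31 December
in_system <- year_end >= spell_start & year_end <= spell_end
print(data.frame(year = years, in_GW_system = in_system))
```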

Long version

In the "long" version of the dataset (output_format = "long"), the output data frame also contains the following variables:

year

The calendar year. Most measures of democracy reflect the country's situation as of 31 December of the year. If keep_only_last_in_year = FALSE (and output_format = "long"), a single country-year may nevertheless contain more than one measurement for some measures (e.g., prc).

measure

The name of the measure (e.g., "blm", "fh_total_reversed").

variable

The numerical value of the measure, in the original scale (if scale_scores = FALSE) or as a z-score (if scale_scores = TRUE).

index_type

The index type (dichotomous, trichotomous, ordinal/graded, continuous).

dataset

The name of the dataset.

Wide version

In the "wide" version of the dataset (output_format = "wide"), the output data frame can also contain any of the following variables (in the scales described below, unless scale_scores = TRUE, in which case the measures are converted to z-scores):

year

The calendar year. In the "wide" version, all measures of democracy reflect the country's situation as of 31 December of the year as much as possible.

anckar_democracy

The anckar measure of democracy, as a numeric value. Up to 2010 this should be identical to bmr_democracy_omitteddata. 0 = non-democracy, 1 = democracy.

anrr

The anrr measure of democracy, as a numeric value. 0 = non-democracy, 1 = democracy.

blm

The blm measure of democracy, as a numeric value. Can be 0 (authoritarian), 0.5 (hybrid), or 1 (democracy).

bmr_democracy

The bmr measure of democracy, as a numeric value. Can be 0 (non-democracy) or 1 (democracy).

bmr_democracy_omitteddata

According to the bmr codebook, "this is the same measure as bmr, except it records an NA for countries occupied during an international war (e.g., the Netherlands 1940-44) or experiencing state collapse during a civil war (e.g., Lebanon 1976-89). The democracy variable instead fills in these years as continuations of the same regime type." There are some -1 values that I've converted to NA (the measure is supposed to be between 0 and 1).

bmr_democracy_femalesuffrage

According to the bmr codebook, this is the same measure as bmr, except that it also requires that at least half of adult women have the right to vote. 30 countries change values.

bnr

The bnr event measure of democracy, reversed, so that 0 indicates non-democracy and 1 indicates democracy. Since the event variable of the bnr dataset only codes terminations of democracy (ignoring years when the country is non-democratic), this variable is mostly equal to 1.

bnr_extended

The bnr_extended variable, with 0 indicating non-democracy and 1 indicating democracy. See the documentation for bnr_extended for details of how this variable extends bnr, filling in years of non-democracy back to 1913.

bti

The Bertelsmann Transformation Index (bti) Democracy Status index. Ranges from 1 to 10.

doorenspleet

The doorenspleet measure of democracy, with 1 indicating non-democracy and 2 indicating democracy.

eiu

The eiu measure of democracy, ranging from 0 (least democratic) to 10 (most democratic). The report says that the index "is based on five categories: electoral process and pluralism; civil liberties; the functioning of government; political participation; and political culture", which form "one interrelated whole".

fh_electoral

The fh_electoral measure of electoral democracy, with 0 indicating a lack of electoral democracy and 1 indicating electoral democracy status.

fh_total_reversed

The Freedom House combined measure of political and civil rights, reversed, so that 0 is least free and 12 is most free.
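
The documented range follows from simple arithmetic under an assumption made here: that the measure is built from the standard Freedom House political rights and civil liberties ratings, each running from 1 (most free) to 7 (least free). Their sum runs from 2 to 14, and subtracting it from 14 yields the 0 to 12 range, with higher values meaning more freedom. The following sketch uses toy ratings and is not necessarily the package's code.

```r
# Toy Freedom House ratings: 1 = most free, 7 = least free (assumed scale)
fh <- data.frame(
  pr = c(1, 4, 7),  # political rights
  cl = c(1, 3, 7)   # civil liberties
)

# The combined score runs from 2 (most free) to 14 (least free);
# subtracting it from 14 gives 0 (least free) to 12 (most free)
fh$fh_total <- fh$pr + fh$cl
fh$fh_total_reversed <- 14 - fh$fh_total
print(fh)
```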

gwf_democracy

A measure of democracy from gwf_all, obtained by coding all democracies as 1 and all non-democracies, including all non-democratic non-autocracies, as 0.

gwf_democracy_extended

A measure of democracy from gwf_all_extended, obtained by coding all democracies as 1 and all non-democracies, including all non-democratic non-autocracies, as 0. See the documentation for gwf_all_extended for details on how the dataset was extended to the period before 1945 for some countries.

gwf_democracy_strict

A measure of democracy from gwf_all, obtained by coding all democracies as 1 and all non-democracies as 0. It is NA for all non-democratic non-autocracies (e.g., warlord regimes, foreign occupation, etc.).
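
The only difference between gwf_democracy and gwf_democracy_strict is how non-democratic non-autocracies are treated. The following sketch uses hypothetical regime labels. The actual gwf_all categories and the package's recoding code may differ.

```r
# Hypothetical regime labels, not the actual gwf_all coding
regime <- c("democracy", "party", "military", "warlord", "foreign-occupied")
is_autocracy <- regime %in% c("party", "military", "personal", "monarchy")

# gwf_democracy: 1 for democracies, 0 for everything else
gwf_democracy <- as.numeric(regime == "democracy")

# gwf_democracy_strict: as above, but NA for non-democratic non-autocracies
gwf_democracy_strict <- ifelse(regime == "democracy", 1,
                               ifelse(is_autocracy, 0, NA))
print(data.frame(regime, gwf_democracy, gwf_democracy_strict))
```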

gwf_democracy_extended_strict

A measure of democracy from gwf_all_extended, obtained by coding all democracies as 1 and all non-democracies as 0. It is NA for all non-democratic non-autocracies (e.g., warlord regimes, foreign occupation, etc.). See the documentation for gwf_all_extended for details on how the dataset was extended to the period before 1945 for some countries.

kailitz_binary

A measure of democracy from kailitz, obtained by coding all liberal democracies as 1 and all other regimes as 0.

kailitz_tri

A trichotomous measure of democracy from kailitz, obtained by coding all liberal democracies as 2, all electoral autocracies as 1, and all other regimes as 0.

lexical_index

The LIED measure of electoral democracy, ranging from 0 to 6.

lexical_index_plus

The LIED measure of polyarchy, ranging from 0 to 7 (including political liberties).

magaloni_democracy

A measure of democracy from magaloni, obtained by coding all democracies as 1 and all non-democracies as 0.

magaloni_democracy_extended

A measure of democracy from magaloni_extended, obtained by coding all democracies as 1 and all non-democracies as 0. See the documentation for magaloni_extended for details on how the dataset was extended to the period before 1950 for some countries.

mainwaring

The measure of democracy from mainwaring, where 0 is non-democracy, 0.5 represents hybrid regimes, and 1 is democracy.

pacl

The measure of democracy from pacl, where 0 is non-democracy, and 1 is democracy.

pacl_update

The measure of democracy from pacl_update, where 0 is non-democracy, and 1 is democracy.

PEPS1i, PEPS1q, PEPS1v, PEPS2i, PEPS2q, PEPS2v

The measures of democracy from peps. Higher values indicate more democracy.

PIPE_democracy, PIPE_regime

The measures of democracy from PIPE. Higher values indicate more democracy. See the documentation of PIPE for important caveats: these measures are calculated on the basis of unclear instructions and may contain errors.

pitf, pitf_binary

The measures of democracy from pitf, converted to numeric values. Higher values indicate more democracy.

pmm_arat

The measure of democracy in arat_pmm.

pmm_blm

The measure of democracy in blm_pmm.

pmm_bollen

The measure of democracy in bollen_pmm.

pmm_fh

The measure of democracy in fh_pmm. Check the documentation of fh_pmm for caveats.

pmm_hadenius

The measure of democracy in hadenius_pmm.

pmm_mainwaring

The measure of democracy in mainwaring_pmm.

pmm_munck

The measure of democracy in munck_pmm.

pmm_pacl

The measure of democracy in pacl_pmm.

pmm_polity

The measure of democracy in polity_pmm. Check the documentation of polity_pmm for caveats.

pmm_polyarchy

The measure of democracy in polyarchy_pmm.

pmm_prc

The measure of democracy in prc_pmm. Check the documentation of prc_pmm for caveats.

pmm_vanhanen

The measure of democracy in vanhanen_pmm. Check the documentation of vanhanen_pmm for caveats.

polity

The polity measure of democracy in polity (version 5, with NAs for -88, -77, -66).
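
The following base R sketch recodes the Polity special codes to NA: -66 for interruption, -77 for interregnum, and -88 for transition. The scores are toy values, and the code illustrates the recoding rather than reproducing the package's implementation.

```r
# Toy polity scores, including the special codes
# -66 (interruption), -77 (interregnum) and -88 (transition)
polity_raw <- c(-7, 10, -66, -77, -88, 6)

# Replace the special codes with NA; ordinary -10 to 10 scores are unchanged
polity <- replace(polity_raw, polity_raw %in% c(-66, -77, -88), NA)
print(polity)
```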

polity2

The polity2 measure of democracy in polity (version 5, interpolated for periods of interregnum, occupation, and the like - see documentation in the polity manual).

polityIV

The polity measure of democracy in polity (version IV, with NAs for -88, -77, -66).

polity2IV

The polity2 measure of democracy in polity (version IV, interpolated for periods of interregnum, occupation, and the like - see documentation in the polity manual).

polyarchy_contestation_dimension

The contestation dimension (CONTEST) in polyarchy_dimensions.

polyarchy_inclusion_dimension

The inclusion dimension (INCLUS) in polyarchy_dimensions.

polyarchy_original_contestation

The contestation dimension (cont) in polyarchy.

polyarchy_original_polyarchy

The original polyarchy scale (poly) in polyarchy, reversed so that higher values imply more democracy. The codebook suggests this was superseded by polyarchy_original_contestation.

prc

The measure of democracy in prc, where 1 indicates non-democracy, 3 hybrid regimes, and 4 democracy. Transitional regimes (2 in the original scale) are coded NA.
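
The following sketch drops the transitional category while keeping the other codes in order, using a named lookup vector. The toy values assume the 1 to 4 coding described here and are not taken from the prc data.

```r
# Original codes (assumed): 1 = non-democracy, 2 = transitional,
# 3 = hybrid, 4 = democracy; toy values only
prc_raw <- c(1, 2, 3, 4, 2)

# Lookup table from original code to recoded value; transitional becomes NA
prc_lookup <- c("1" = 1, "2" = NA, "3" = 3, "4" = 4)
prc <- unname(prc_lookup[as.character(prc_raw)])
print(prc)
```

Because the remaining codes keep their relative order, larger values still mean more democracy.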

reign_democracy

A measure of democracy from reign, obtained by coding all presidential and parliamentary democracies as 1 and all other regimes as 0.

csvmdi

The continuous Support Vector Machine democracy index from svmdi, 2020 version.

svmdi_2016

The continuous Support Vector Machine democracy index from svmdi, 2016 version.

dsvmdi

The dichotomous Support Vector Machine democracy index from svmdi, 2020 version.

svolik_democracy

A measure of democracy from svolik_regime. 0 = non-democracy, 1 = democracy.

uds_2010_mean, uds_2010_median, uds_2011_mean, uds_2011_median, uds_2014_mean, uds_2014_median

The mean and median posterior scores of the uds index (2010, 2011, and 2014 releases). Higher values are more democratic.

ulfelder_democracy

The dichotomous measure of democracy in ulfelder.

ulfelder_democracy_extended

The dichotomous measure of democracy in ulfelder_extended.

utip_dichotomous

A dichotomous measure of democracy from utip. Equals 1 if the regime is a social democracy, conservative democracy, or one-party democracy, 0 otherwise.

utip_dichotomous_strict

A dichotomous measure of democracy from utip. Equals 1 if the regime is a social democracy or conservative democracy, 0 otherwise.

utip_trichotomous

A trichotomous measure of democracy from utip. Equals 2 if the regime is a social democracy or conservative democracy, 1 if it is a one-party democracy, 0 otherwise.
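
All three utip measures can be derived from a single regime classification. The following sketch uses hypothetical regime labels. The strings in the actual utip data may differ.

```r
# Hypothetical regime labels, not the actual utip strings
regime <- c("social democracy", "conservative democracy",
            "one-party democracy", "dictatorship")

full_democracy <- regime %in% c("social democracy", "conservative democracy")
one_party <- regime == "one-party democracy"

utip_dichotomous <- as.numeric(full_democracy | one_party)
utip_dichotomous_strict <- as.numeric(full_democracy)
# 2 = social or conservative democracy, 1 = one-party democracy, 0 = otherwise
utip_trichotomous <- 2 * full_democracy + 1 * one_party
print(data.frame(regime, utip_dichotomous, utip_dichotomous_strict, utip_trichotomous))
```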

vanhanen_competition

The competition index from vanhanen.

vanhanen_democratization

The democratization index from vanhanen.

vanhanen_participation

The participation index from vanhanen.

v2x_api

The additive polyarchy index from vdem.

v2x_delibdem

The deliberative democracy index from vdem.

v2x_egaldem

The egalitarian democracy index from vdem.

v2x_libdem

The liberal democracy index from vdem.

v2x_mpi

The multiplicative polyarchy index from vdem.

v2x_partipdem

The participative democracy index from vdem.

v2x_polyarchy

The polyarchy (electoral democracy) index from vdem.

wgi_democracy

The voice and accountability index from wgi.

wth_democ1

A dichotomous measure of democracy from wahman_teorell_hadenius, obtained by coding all democracies according to the regime1ny variable as 1 and all other regimes as 0.

wth_democrobust

A dichotomous measure of democracy from wahman_teorell_hadenius, obtained by coding all democracies according to the regimenyrobust variable as 1 and all other regimes as 0.

References

D. Pemstein, S. Meserve, and J. Melton. "Democratic Compromise: A Latent Variable Analysis of Ten Measures of Regime Type". In: Political Analysis 18.4 (2010), pp. 426-449. DOI: 10.1093/pan/mpq020.

D. Pemstein, S. A. Meserve, and J. Melton. Replication data for: Democratic Compromise: A Latent Variable Analysis of Ten Measures of Regime Type. 2013. http://hdl.handle.net/1902.1/PMM.

Examples

democracy_data_long_no_download <- generate_democracy_scores_dataset(exclude_downloadable = TRUE,
         keep_only_last_in_year = FALSE)
#> Adding fh_pmm data
#> Adding old polityIV data
#> Adding polity_pmm data
#> Adding Anckar-Fredriksson data
#> Adding ANRR data
#> Adding Arat data
#> Adding BLM data
#> Adding blm_pmm data
#> Adding BMR data
#> Adding BNR extended data
#> Adding BNR data
#> Adding BTI data
#> Adding Bollen data
#> Adding Doorenspleet data
#> Adding EIU data
#> Adding GWF data
#> Adding GWF data
#> Adding Hadenius data
#> Adding Kailitz data
#> Adding LIED data
#> Adding Magaloni extended data
#> Adding Magaloni data
#> Adding Mainwaring data
#> Adding mainwaring_pmm data
#> Adding Munck data
#> Adding PACL data
#> Adding pacl_pmm data
#> Adding Update of PACL data by Bjornskov and Rode
#> Adding PEPS data
#> Adding PITF data
#> Adding Polyarchy original data
#> Adding polyarchy_pmm data
#> Adding Polyarchy Dimensions data
#> Adding PRC/Gasiorowski data
#> Warning: There was 1 warning in `mutate()`.
#>  In argument: `prc = (structure(function (..., .x = ..1, .y = ..2, . = ..1)
#>   ...`.
#> Caused by warning:
#> ! NAs introduced by coercion
#> Adding prc_pmm data
#> Adding PIPE data
#> Adding REIGN data
#> Adding SVMDI data
#> Adding SVMDI 2016 data
#> Adding Svolik data
#> Adding UDS 2014 data
#> Adding UDS 2011 data
#> Adding UDS 2010 data
#> Adding extended Ulfelder data
#> Adding Ulfelder data
#> Adding UTIP data
#> Adding Vanhanen data
#> Adding Vanhanen_pmm data
#> Adding vdem data
#> Adding Wahman, Teorell, and Hadenius data
#> Finalizing

# You can select only some datasets

democracy_data_gwf <- generate_democracy_scores_dataset(datasets = c("gwf_all"),
         output_format = "wide")
#> Adding GWF data
#> Adding GWF data
#> Finalizing

# all PMM datasets
democracy_data_pmm <- generate_democracy_scores_dataset(selection = "pmm")
#> Adding fh_pmm data
#> Adding polity_pmm data
#> Adding Arat data
#> Adding blm_pmm data
#> Adding Bollen data
#> Adding Hadenius data
#> Adding mainwaring_pmm data
#> Adding Munck data
#> Adding pacl_pmm data
#> Adding polyarchy_pmm data
#> Adding prc_pmm data
#> Adding Vanhanen_pmm data
#> Finalizing

if (FALSE) {
# This produces scaled scores
generate_democracy_scores_dataset(exclude_downloadable = TRUE,
         keep_only_last_in_year = FALSE,
         scale_scores = TRUE)

# These require downloads:

democracy_data_long <- generate_democracy_scores_dataset()
democracy_data_wide <- generate_democracy_scores_dataset(output_format = "wide")
}