This tutorial is designed to guide users through the basic functionality of the Election Forensics Toolkit website (http://electionforensicstoolkit.org/). The tutorial will focus on election forensics analysis of polling station vote count data from the 2013 election in Albania.
Note that the Election Forensics Toolkit website is currently a proof-of-concept site that uses PHP and R to run analyses. Analyses may require substantial processing power and may take a while to complete. However, you may come back to a report after closing your browser. Using this website means you agree to our terms of service and understand that results are not guaranteed for any purpose. If you have problems using this website, please send an email to email@example.com.
The Election Forensics Toolkit website currently makes data available from select countries and elections. These data were organized and used in analyses done for the IIE/USAID grant that funded the original work on this website. The data are available preloaded under the Forensics Analysis tab (under 1a. Choose Election Country and Year), or they can be downloaded from the Toolkit website and then uploaded from your computer. Information about the key variables in the preloaded vote data is available in Preloaded Vote Data Information, and in many cases these key variables have been set up as the default selections in the Toolkit interface. Information about all of the variables in the preloaded data can be found in the Legend.
For motivation, explanations, and examples of election forensics, see the Guide to Election Forensics. For a discussion of the Albania 2013 election and elections in seven other countries, see the Working Paper on the Election Forensics Toolkit.
STEP 1: At the top of the screen, click on the Forensics Analysis tab and choose Albania 2013 from the drop-down menu in 1a. If you don't see the Forensics Analysis tab, click on the hamburger menu.
After a moment, the default selections for Albania will populate the menus for the remaining selections. The first ten rows of data will appear in the Display Data tab. The Data Summary tab provides summary information for the variables in the data. The Reports tab shows a log of tests that have previously been run on the data and lets you view their results.
STEP 2: Choose V35 as your selection for Select Candidates/Parties. V35 is the Socialist Movement for Integration party. (Alternatively, you could select V50, the Socialist Party of Albania.)
STEP 3: In menus 3-5, you can use the default selections: Qarku for Select Level, V1 for Select Total Registered, and V6 for Select Total Votes.
The Select Level menu sets the level at which observations are grouped for analysis. In Albania, Qarku are counties. The 'Registered' variable (Select Total Registered) measures the total number of eligible voters for each observation, and the 'Ballots' variable (Select Total Votes) measures the total number of ballots cast for each observation. If no variable is specified for Select Total Registered or Select Total Votes, some methods cannot be computed.
STEP 4: Check the methods for which you would like to get estimates. In this tutorial we are selecting Benford's 2nd Digit, Last Digit (Counts), Counts (05s), and Multimodality Test. These methods are informally referred to as 2BL, LastC, C05s, and DipT, respectively.
The last two methods listed, the Klimek et al. simulation method and especially the finite mixture likelihood method, are computationally intensive and can take a very long time (potentially days), so this tutorial does not use them.
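The Guide to Election Forensics explains these tests in detail. As a rough illustration only, the sketch below computes the three digit-based statistics under commonly used definitions that are assumed here rather than taken from the Toolkit's code: 2BL as the mean second significant digit of counts of 10 or more (about 4.187 under Benford's Law), LastC as the mean last digit (4.5 if last digits are uniform), and C05s as the share of counts ending in 0 or 5 (0.2 if uniform). The function names are hypothetical, the Toolkit may apply filters (for example, dropping small counts) that this sketch omits, and DipT is not sketched.

```python
import math


def benford_second_digit_mean() -> float:
    """Expected mean of the second significant digit under Benford's Law (~4.187)."""
    mean = 0.0
    for d in range(10):
        # P(second digit = d) = sum over first digits k of log10(1 + 1/(10k + d))
        p = sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        mean += d * p
    return mean


def digit_stats(counts: list[int]) -> tuple[float, float, float]:
    """Return (2BL, LastC, C05s) for non-negative integer vote counts (assumed definitions)."""
    multi_digit = [c for c in counts if c >= 10]  # a second digit needs at least 2 digits
    last_digits = [c % 10 for c in counts]
    two_bl = (sum(int(str(c)[1]) for c in multi_digit) / len(multi_digit)
              if multi_digit else math.nan)
    last_c = sum(last_digits) / len(last_digits) if last_digits else math.nan
    # Share of counts whose last digit is 0 or 5 (booleans sum as 0/1).
    c05s = (sum(d in (0, 5) for d in last_digits) / len(last_digits)
            if last_digits else math.nan)
    return two_bl, last_c, c05s


print(digit_stats([10, 25, 37, 105]))  # (3.0, 4.25, 0.75)
```

Comparing each statistic with its expected value (4.187, 4.5, or 0.2) is what makes a deviation worth a closer look; the Toolkit's report handles that comparison and its confidence intervals for you.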
Computing the results takes some time. To calculate confidence intervals, the Toolkit performs nonparametric bootstraps using 1000 resamples, which could take several minutes for all of the counties (Qarku) in the Albania data. The website has several built-in features that help results display faster. In particular, results are saved in a database, so if a user requests the same methods and selections on the same data, previously stored results can be displayed instead of being recomputed.
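The tutorial does not say which kind of bootstrap interval the Toolkit reports. Assuming a simple percentile interval, a nonparametric bootstrap with 1000 resamples could look like the sketch below; `bootstrap_ci` is a hypothetical helper, and `stat` can be any statistic computed from the observations, such as one of the digit means.

```python
import random
import statistics
from typing import Callable, Optional, Sequence


def bootstrap_ci(values: Sequence[float],
                 stat: Callable[[Sequence[float]], float] = statistics.mean,
                 n_resamples: int = 1000,
                 level: float = 0.95,
                 seed: Optional[int] = None):
    """Point estimate plus a percentile bootstrap confidence interval (assumed method)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample the observations with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(values, k=n)) for _ in range(n_resamples))
    alpha = (1 - level) / 2
    # With 1000 resamples and level 0.95 these pick the 26th and 975th sorted values.
    lower = estimates[round(alpha * n_resamples)]
    upper = estimates[round((1 - alpha) * n_resamples) - 1]
    return stat(values), (lower, upper)


point, (lo, hi) = bootstrap_ci(list(range(1, 11)), seed=1)
print(point, lo, hi)
```

Each resample requires recomputing the statistic, which is why running every method for every county multiplies the work and why stored results save so much time.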
STEP 5: After computations finish, a report will automatically appear, which will show both point estimates and 95 percent confidence intervals (appearing below the point estimate, in parentheses). For the DipT test, the value shown is the p-value. All anomalous values are marked in red.
Results are shown for 'Turnout' if variables were specified for Select Total Registered and Select Total Votes. The 'Turnout' proportion is Total Votes divided by Total Registered.
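Assuming turnout is computed for each observation, the arithmetic is a single division. The sketch below applies it to a small, made-up CSV using the Albania default columns V1 (Select Total Registered) and V6 (Select Total Votes); the function name and the choice to yield None when no voters are registered are assumptions, not documented Toolkit behavior.

```python
import csv
import io
from typing import Iterator, Optional


def turnout_by_row(csv_text: str, registered_col: str = "V1",
                   votes_col: str = "V6") -> Iterator[Optional[float]]:
    """Yield Total Votes / Total Registered for each row of a CSV."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        registered = float(row[registered_col])
        votes = float(row[votes_col])
        # Turnout is undefined when no voters are registered; yield None in that case.
        yield votes / registered if registered > 0 else None


sample = "Qarku,V1,V6\nA,200,150\nB,0,0\n"  # made-up rows for illustration
print(list(turnout_by_row(sample)))  # [0.75, None]
```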
To save the results on your computer you can click on the Download Results button (for a .csv file) or on the Download HTML button. Downloaded HTML results will retain the color information for anomalies and can be converted to an xlsx file. To do so, open the file in Excel, delete any extra objects appearing with the table, and then copy the table to a new sheet.
FINISHED/RESULTS: For interpretations of Toolkit results on Albania, see the section on Albania in the Election Forensics Toolkit DRG Center Working Paper.
In addition to using the preloaded files, you can upload data to the Election Forensics Toolkit website. The Albania data are available from the following link: Download Albania's data. This part of the tutorial uses the externally downloaded data, so at this stage you need to click on the link and download the Albania data.
STEP 1: At the top of the screen, click on the Forensics Analysis tab. On the left-hand side of the screen, click on the Choose File button under 1b and upload the 'Albania2013QV.csv' file.
After a moment, the Using data from: box will show a random number that was generated when the file was uploaded. Keep track of this number in case you need to access previously run reports after navigating away from the Toolkit. Default selections for uploaded data will also populate the menus, and the first ten rows of data will appear in the Display Data tab. The Data Summary tab provides summary information for the variables in the data. The Reports tab will display the message 'No reports available' until tests have been run; after that, it shows a log of tests that have previously been run on the data and lets you view their results.
If you plan to upload your own files, make sure that your variable names do not contain spaces. If you must use spaces, put quotes around the variable names.
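Before uploading your own file, you can check and clean the header row yourself. The sketch below replaces whitespace in column names with underscores using Python's standard csv module; the underscore convention and the `sanitize_header` name are assumptions of this example, since the Toolkit only asks that names contain no spaces.

```python
import csv
import io


def sanitize_header(csv_text: str) -> str:
    """Return the CSV with whitespace in column names replaced by underscores."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if rows:
        # split() drops leading/trailing whitespace and collapses inner runs,
        # so " Total  Votes " becomes "Total_Votes".
        rows[0] = ["_".join(name.split()) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


print(sanitize_header("Total Votes,Qarku\n10,A\n"), end="")
```

Only the header row is changed; data rows are written back unchanged. Check afterwards that no two columns ended up with the same name.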
At this point, you should be able to go back up to the previous section and follow the same steps (starting with STEP 2) because the preloaded file and uploaded file are the same.
Results from preloaded data and uploaded data are both automatically saved. Once a report has started to run, you may navigate away from the Toolkit or close your browser and return to find your results at a later time.
For preloaded data, saved reports can be found in the Reports tab. To find a report after closing your browser, first select the Forensics Analysis tab, and then select the preloaded data for the report you would like to find. The Reports tab will appear, and you can find your report by its date and time stamp (or by its report ID).
For uploaded data, you will need to make a note of the random number that appears in the Using data from: box when the data are uploaded, or save/bookmark a copy of the URL. To come back to a report, enter the saved URL in your browser's address bar. You will be able to see results even if you closed your browser while the report was in progress.
The image below shows the random number circled in red in the Using data from: box. That same random number appears in the URL. In the image below, the random number is 5be313df11bb9. You will need to save the URL circled in red if you would like to navigate away from a completed report or report in progress and find it later.
'All Leaders' is a selection available under Select Candidates/Parties that analyzes the parties with the most votes in a given area, as specified by the Level chosen in the next step.
All Leaders is only activated if candidate/party variable names start with 'C' or 'P' followed by a number. This naming convention helps the All Leaders algorithm identify the subset of variables corresponding to candidate/party vote counts; all such variables are considered as possibly containing the leaders' votes. To use this selection, users will need to name the variables in their datasets accordingly before uploading. The algorithm cannot be applied to most of the preloaded data because those datasets do not follow this naming convention. However, to use All Leaders with preloaded data, one could download a preloaded data file and rename the variables. For example, in the Albania data, V7-V72 are the candidates; if those are relabeled as C007-C072, All Leaders can be applied.
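One way to do this renaming locally before uploading is sketched below. It assumes the downloaded Albania file is a CSV whose header contains columns named V1, V2, and so on, and it rewrites V7 through V72 as C007 through C072 while leaving other columns (such as V1 and V6) unchanged; `rename_candidates` is a hypothetical helper.

```python
import csv
import io
import re


def rename_candidates(csv_text: str, first: int = 7, last: int = 72) -> str:
    """Rename columns V<first>..V<last> to C### so All Leaders can recognize them."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    pattern = re.compile(r"V(\d+)")
    header = []
    for name in rows[0]:
        m = pattern.fullmatch(name)
        if m and first <= int(m.group(1)) <= last:
            header.append(f"C{int(m.group(1)):03d}")  # e.g. V7 -> C007
        else:
            header.append(name)  # keep non-candidate columns as they are
    rows[0] = header
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


print(rename_candidates("Qarku,V1,V6,V7,V35,V72\nA,1,2,3,4,5\n"), end="")
```

To apply it to a file, read the file's text, pass it through the function, and write the result to a new CSV, which you then upload under 1b.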
'All' is an option under Select Level that pools every row (or entry) to compute a national-level result. In other words, 'All' is the option to use for data without a grouping variable indicating the Level. For example, if no district information were available (e.g., if the Qarku variable were missing from the Albania data), the data could still be analyzed at the national level. 'All' can also be used when only a limited number of units are available to analyze (e.g., data are available for only part of a country) and subdividing them further (e.g., into districts) would not make sense.
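The difference between grouping by a Level variable such as Qarku and choosing 'All' can be sketched as follows: with a level column, vote counts are split into one group per unit, and without one every row falls into a single national group. The row format (dictionaries, as produced by `csv.DictReader`), the `group_counts` helper, and the sample county names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Optional


def group_counts(rows: list[dict], count_col: str,
                 level_col: Optional[str] = None) -> dict[str, list[int]]:
    """Group vote counts by level; level_col=None pools every row, like 'All'."""
    groups: dict[str, list[int]] = defaultdict(list)
    for row in rows:
        key = "All" if level_col is None else row[level_col]
        groups[key].append(int(row[count_col]))
    return dict(groups)


rows = [{"Qarku": "Tirane", "V35": "10"},
        {"Qarku": "Durres", "V35": "20"},
        {"Qarku": "Tirane", "V35": "30"}]
print(group_counts(rows, "V35", "Qarku"))  # {'Tirane': [10, 30], 'Durres': [20]}
print(group_counts(rows, "V35"))           # {'All': [10, 20, 30]}
```

Each group would then be analyzed separately, so a single 'All' group yields one national result instead of one result per county.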
In the table below, the files in the Electoral Data column should be identical to the preloaded data files. Links in the Variable Legends column lead to information about the names of all the parties or candidates associated with each variable in the preloaded (and shapefile) data.
Free files for downloading:
| Country/Election | Variable Legends | Electoral Data | Geodata |
|---|---|---|---|
| Afghanistan Initial 2014 | Download | Download | Download |
| Afghanistan Runoff 2014 | Download | Download | Download |
| Libya 2014 Fem | Download | Download | Download |
| South Africa 2014 | Download | Download | Download |