21st Century US Presidential Election Analysis Group Project
TABLE OF CONTENTS
Tools used
Tableau | Excel | Google Slides
Skills Used
Mapping | Data Cleaning | Remote Collaboration | Data Visualization | Team work | Presentations
Presentation Link - NOTE: Please view presentation in “Presentation Mode” so that visuals and animations can be viewed properly.
The Scenrio
In this project, I was given the scenario of working as a data journalist for a political magazine. My task was to create a graphic feature for the online version of the magazine that would tell a story about the U.S presidential elections so far in the 21st century. The time given for this project was a day and a half and to be presented at an editorial meeting prior to publishing.
Cleaning the Data
The dataset used for this project came from the MIT Election Lab and had 72,618 rows of county-level data for every state in the US for the 2000, 2004, 2008, 2012, 2016, and 2020 U.S presidential elections. The 12 columns in this dataset included a county’s state and state code, county FIPS code, candidate names and their party, the number of votes for cast for a candidate, and the total votes cast in the county. I cleaned the data in Excel, starting with checking for blank values in the table. Using the COUNTBLANK command, I found that the only missing information in the table came from some missing FIPS codes. I filtered the table so that I only looked at these missing codes, and I took note of the county names in this list. After cross referencing these with other entries from the same county that had the FIPS code filled in, I replaced the missing information in these columns, leaving only state write-in’s from Connecticut, a federal precinct in Rhode Island, and Uniformed Service and Overseas Voters (UOCAVA) from Maine. Being that these were out of the ordinary values to deal with, these rows were ultimately ignored in the analysis.
With this column taken care of, I uploaded the dataset to Tableau to begin visualization and mapping. At this point in the project, I joined four other coursemates to work together on our images to create a more detailed story and divide up the work better within the allotted time.
Visualizing, Mapping, and Results
Tackling this project as a team allowed myself and my teammates the time to look at how election results changed from year to year across states and counties, how many votes different candidates received in different elections, party loyalty in historically red or blue states, and the voting patterns in swing states.
My contribution to the story visualizations was through state-by-state and county-by-county views of the elections data. I started by looking at state-by-state results, and after creating a map of the U.S using generated longitude and latitude values, I realized that representing this information would require several calculated fields. In order to understand a state’s result, I needed to count how many counties voted for which party. The dataset available did not have this information available, so I had to start by seeing which candidate’s votes were the maximum votes for that county in a particular election. I started by creating a calculated field that utilized a fixed LOD function to find the maximum candidate votes within one county in one year. Once I had the quantity of the maximum votes, I made two more calculated fields to return the winning candidate and the winning party of a vote. These both took a row’s candidate votes (one row was one candidate’s votes in a specific county in a specific year) and, if the value was equal to the county’s determined maximum votes for that election, the candidate’s name and party would be returned respectively. Having a county’s winning party available made it possible to create another calculated field to take the count of a state’s democratic or republican county results by taking the sum of how many county’s returned either as their winning party. Similarly, I was able to create a calculated field to return a state’s popular vote number for either party by adding the candidate votes for every county who voted for the county’s winning party. Lastly, I could return a state’s winning party by county votes or by popular vote by comparing whether the total democratic or republican county or popular vote was biggest.
Having the state-by-state results for popular vote now available, I returned to my map of the U.S and colored states based on their party winner. To add more details, I made it so that if one highlighted a state, the tooltip would show the state’s name, the election year, the total number of votes cast, and the state’s winning party. I added election years in the figure’s “pages” so that the changes these state-wide results could be animated and observed from year to year.
I compared this map with a similar one that displayed county-level results. Here, the color-coding was done by using county winners instead of state popular winners, and the tool tip included a county’s state, FIPS code, winning party, election year, and the number of votes cast in the county. Year after year, the map makes it seem as though the country was voting almost entirely republican. However, closer looks into county numbers showed that this wasn’t the case at all.
To highlight this discrepancy accurately, I zoomed into the state of Oregon on both maps for the 2004 election. Looking at the county-based view, only 8 counties in the state voted democrat, and the rest for republican. This would have implied that the state’s final vote would have been for that year’s Republican candidate, but in actuality, the popular vote resulted in a Democrat vote. Looking at the tool tips for democrat and republican voting counties though, it was clear that the red counties had significantly lower population numbers and thus votes cast, while the blue counties had up to nearly ten times the number of voters. From these maps, I was able to display the impact of the popular vote in election results, supporting the adage that “land doesn’t vote, people do”.