Wind Power Investment Data Sprint

The Scenario

For this project, I was paired up with four other course-mates to complete a 4-day data sprint. The scenario we were presented with was to work as analysts for a clean energy private equity firm and quickly assess the U.S wind turbine market. We were to work together to present our findings to the internal investment committee ahead of their quarterly investment decisions.

Data Sprint Overview

Presented with this real-world scenario, my teammates and I quickly got to work deciding what work we needed to do and how best to divide it up in the limited time frame. Our project was managed using the Agile methodology; more specifically, we used Trello to work through project items using the Kanban Agile method. After determining which workflow items had what priority, number of story points, and scope, we set to work and divided up our tasks between market and field research, data cleaning, visualization, and analysis. As this project was part of a remote certification course, we needed to collaborate with one another despite distance and timezone differences, and thus chose to use Google Drive to unify our materials. All team members were also strong independent workers, and we balanced this with the nature of our project by assigning tasks that could be done individually to one person so that we could work quietly over Zoom or Slack, while tasks that required group effort had all team members actively participating in the remote call while looking at the same Google Drive document.

My contribution: Querying and background research

My main input into this data sprint was in the project’s initial data wrangling and background research. We had two datasets available to us on PosgreSQL, one that described all wind turbines in the U.S and another from the U.S. Energy Information Administration that had data on power operators. I started my wrangling by looking at the individual tables, and I found that it would be easiest to understand the data as one table. The dataset about wind turbines had a good amount of information, but it lacked the supplemental detail on plant operators that could provide information on power projects in the country as a whole, including how expansive they were, who was managing them, and how many projects an operator was managing. As such, I decided to use a left Join to merge the operator data that was relevant to wind turbines to the wind turbine data set.

The resulting dataset, while useful, needed to be manipulated with more sophistication than PosgreSQL would have allowed. I decided to export the joined table to a CSV file and then upload it to Jupyter Notebook. Using the Python Pandas library, I was able to look at our 63,000 rows of EIA data with more clarity. To clean the data, me and my teammates filtered our dataset to check for any null values and where they might be. I took the lead in determining what to do with these values and found that there were over 15,000 rows of entirely null data - these were all dropped. Many others represented turbines missing data on their plant’s physical unit label, but seeing as we were focused on wind power exclusively I decided that these null values should be replaced with the dataset’s unit for gaseous energy sources: mcf. Other null values had to do with a turbine’s geography, which we filled in by checking the state of a turbine and matching it to its geographical region.

The dataset being otherwise clean, we were able to export this updated table to a CSV file to begin visualizations. Meanwhile, I turned my attention to the background research necessary to contextualize our findings. I focused on investigating the science behind wind power and the current wind turbine market.

Understanding how wind power is generated required that I focus my search to more technical and scientific sources. Much of the information gathered here came from either the energy.gov website or research-based pages. Through these I was able to provide detail to our team’s work on the types of wind-turbine configurations used to generate electricity, as well as the physics and constraints behind them. The latter was especially important, as it lent a deeper understanding to the importance of turbine height, wind speed, and blade length and how all three reflected varying power outputs by different turbine manufacturers and power plants.

Then, to understand the renewable energy market as a whole, I turned mostly to the eia.gov website to provide useful insight and data files on changes in U.S energy markets over time. By finding and investigating this data, I was able to present a market incentive for investment by demonstrating the growth of renewable energy sources and the multiple times higher generation potential of wind power.

Final recommendations and presentation

With our data, research, and visualizations complete, my teammates and I collaborated on our presentation in the last leg of the data sprint. Reflecting on my findings from my background research, as well as those found by analyzing and visualizing the data, we were able to make investment recommendations based on geographical location, turbine manufacturer, and power plant operator. For each category we presented three options to provide versatility for an investor depending on whether they wanted to invest in something small and starting out, something that already has a proven but saturated market, or something efficient with great earning potential. The slide deck was collectively designed using Google Slides, and we delivered our presentation for a mixed, mostly non-technical audience