Analyzing Power Outages: Watts the Problem?

A project by:

Introduction

Power outages can have significant economic and social impacts, affecting households, businesses, and essential services. Understanding the underlying causes of outages is crucial for improving power grid reliability and enabling quicker response times.

This project analyzes historical outage data (1540 rows × 57 columns) to develop a model that predicts the cause of an outage based on factors like anomaly level, outage duration, climate region, and customer impact. By identifying patterns in the data, we can provide insights that help power companies take preventative measures and optimize resource allocation during disruptions.

The focus of this project is the following question: “Can we develop a machine learning model that predicts outage causes from real-time factors?”

This predictive capability could help power companies anticipate issues, allocate resources more effectively, and minimize disruption. The relevant variables are described below:

| Variable Name | Description |
| --- | --- |
| year | The year when the outage event occurred |
| state | The state where the outage event occurred |
| nerc | The North American Electric Reliability Corporation (NERC) regions involved in the outage event |
| climate_region | U.S. climate regions as specified by the National Centers for Environmental Information |
| anomaly_level | The Oceanic Niño Index (ONI), which tracks cold (La Niña) and warm (El Niño) episodes by season |
| climate_category | The climate episode (cold, normal, or warm) corresponding to the year |
| outage_start_timestamp | The date and time when the outage event started |
| restoration_timestamp | The date and time when power was restored to all customers |
| cause | Category of the event causing the major power outage |
| cause_detail | Detailed description of the event category causing the major power outage |
| duration | Duration of the outage event (in minutes) |
| demand_loss | Amount of peak demand lost during the outage event (in megawatts) |
| customers_affected | Number of customers affected by the power outage event |

Data Cleaning

The raw dataset required multiple preprocessing steps to ensure it was structured correctly for analysis. Below is a breakdown of the key data cleaning steps performed:

1. Adjusting Column Names and Removing Extra Rows

2. Selecting Relevant Features

3. Renaming Columns

4. Handling Missing and Invalid Values

5. Combining Date and Time Columns
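Steps 2 to 5 above can be sketched in pandas roughly as follows. This is a minimal illustration, not the exact code used in this project: the raw column names (`U.S._STATE`, `OUTAGE.START.DATE`, and so on), the tiny stand-in DataFrame, and the choice to treat a value of 0 in `customers_affected` as missing are all assumptions. Step 1 is omitted because it depends on the layout of the raw file.

```python
import pandas as pd

# Tiny stand-in for the raw data; the real file and its column names are assumptions.
raw = pd.DataFrame({
    "YEAR": [2011, 2014],
    "U.S._STATE": ["Minnesota", "Minnesota"],
    "POSTAL.CODE": ["MN", "MN"],  # example of a column that is not kept
    "CAUSE.CATEGORY": ["severe weather", "intentional attack"],
    "OUTAGE.START.DATE": ["2011-07-01", "2014-05-11"],
    "OUTAGE.START.TIME": ["17:00:00", "18:38:00"],
    "OUTAGE.DURATION": [3060, 1],
    "CUSTOMERS.AFFECTED": [70000, 0],
})

# Hypothetical mapping from raw names to the tidy names used in this report.
RENAMES = {
    "YEAR": "year",
    "U.S._STATE": "state",
    "CAUSE.CATEGORY": "cause",
    "OUTAGE.START.DATE": "start_date",
    "OUTAGE.START.TIME": "start_time",
    "OUTAGE.DURATION": "duration",
    "CUSTOMERS.AFFECTED": "customers_affected",
}


def clean(raw: pd.DataFrame) -> pd.DataFrame:
    # Steps 2 and 3: keep only the relevant columns and rename them.
    df = raw[list(RENAMES)].rename(columns=RENAMES)
    # Step 4 (assumption): a count of 0 customers is treated as missing.
    df["customers_affected"] = df["customers_affected"].mask(
        df["customers_affected"] == 0
    )
    # Step 5: combine the separate date and time columns into one timestamp.
    df["outage_start_timestamp"] = pd.to_datetime(
        df["start_date"] + " " + df["start_time"]
    )
    return df.drop(columns=["start_date", "start_time"])


outage = clean(raw)
print(outage)
```

The restoration timestamp would be built the same way from its own date and time columns.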

The first few rows of our DataFrame outage look like this:

| year | state | nerc | climate_region | climate_category | anomaly_level | cause | cause_detail | duration | demand_loss | customers_affected | outage_start_timestamp | restoration_timestamp |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2011 | Minnesota | MRO | East North Central | normal | -0.3 | severe weather | NaN | 3060 | NaN | 70000 | 2011-07-01 17:00:00 | 2011-07-03 20:00:00 |
| 2014 | Minnesota | MRO | East North Central | normal | -0.1 | intentional attack | vandalism | 1 | NaN | NaN | 2014-05-11 18:38:00 | 2014-05-11 18:39:00 |
| 2010 | Minnesota | MRO | East North Central | cold | -1.5 | severe weather | heavy wind | 3000 | NaN | 70000 | 2010-10-26 20:00:00 | 2010-10-28 22:00:00 |
| 2012 | Minnesota | MRO | East North Central | normal | -0.1 | severe weather | thunderstorm | 2550 | NaN | 68200 | 2012-06-19 04:30:00 | 2012-06-20 23:00:00 |
| 2015 | Minnesota | MRO | East North Central | warm | 1.2 | severe weather | NaN | 1740 | 250.0 | 250000 | 2015-07-18 02:00:00 | 2015-07-19 07:00:00 |

Exploratory Data Analysis

Univariate Analysis

1. Frequency of Power Outages Across Climate Regions

The bar chart illustrates the frequency of power outages across different climate regions, providing insight into how often outages occur in each area. The height of each bar represents the number of recorded outage events, making it easy to compare the distribution across regions. Regions with higher frequencies may experience more frequent extreme weather conditions, such as hurricanes, heavy snowfall, or heatwaves, which could contribute to power disruptions. Conversely, regions with lower frequencies may have more stable climates or resilient infrastructure.

The Northeast region has the most recorded outages, whereas the West North Central region has the fewest.

2. Distribution of Power Outage Causes

The pie chart illustrates the distribution of power outages based on different cause categories, showing the proportion of outages attributed to each factor. Each slice represents a specific cause, with its size corresponding to the frequency of outages in that category. This visualization provides a clear overview of the primary reasons for power disruptions, such as extreme weather events, equipment failures, human errors, or other environmental factors.

The distribution shows that severe weather is the leading cause of power outages, accounting for 49.7% of them, followed by intentional attacks at 27.2%.
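Shares like the ones quoted for the pie chart can be computed with `value_counts`. The snippet below is a small sketch on made-up data, so the percentages it prints are not the project's figures; only the column name `cause` comes from the dataset.

```python
import pandas as pd

# Made-up cause labels, used only to illustrate the computation.
causes = pd.Series(
    ["severe weather"] * 5 + ["intentional attack"] * 3 + ["equipment failure"] * 2,
    name="cause",
)

# Share of outages per cause, as a percentage, sorted from most to least common.
shares = causes.value_counts(normalize=True).mul(100).round(1)
print(shares)
```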

Bivariate Analysis

1. Impact of Climate Region on Outage Duration

This scatter plot visualizes the relationship between climate region and outage duration. Each point represents an outage, with the x-axis showing the climate region and the y-axis representing the duration of the outage. Different colors distinguish the climate regions, making it easier to identify patterns.

2. Impact of Climate Category on Anomaly Level

This box plot visualizes the relationship between climate category and anomaly level. The x-axis shows the climate categories, while the y-axis shows the distribution of anomaly levels within each category. Here are some key observations:

Interesting Aggregates

We have created a pivot table to analyze the relationship between climate conditions and power outage duration, where the average outage duration is calculated for each combination of climate_category (such as cold, warm, or normal) and climate_region (geographical climate zones). This allows for a structured comparison of outage duration across different climate conditions.

| climate_category | Central | East North Central | Northeast | Northwest | South | Southeast | Southwest | West | West North Central |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| cold | 2676.338235 | 6568.789474 | 3568.766129 | 874.680851 | 1977.403509 | 1707.066667 | 499.208333 | 1735.171875 | 200.000000 |
| normal | 2682.147059 | 5207.710843 | 2261.327778 | 733.612245 | 3685.441441 | 2392.267606 | 283.260870 | 1142.322581 | 28.428571 |
| warm | 2080.896552 | 3022.117647 | 3990.311111 | 2212.555556 | 1672.101695 | 2528.941176 | 5127.681818 | 1942.016667 | 2486.500000 |

East North Central has the longest average outages in cold and normal conditions (about 6,569 and 5,208 minutes), suggesting grid vulnerabilities in harsh weather. The Southwest has the longest average outages in warm conditions (about 5,128 minutes), possibly due to extreme heat and increased energy demand. West North Central has the shortest averages in cold and normal conditions (200 and about 28 minutes), but its warm-condition average of about 2,487 minutes is mid-range, so it is not uniformly resilient.
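A pivot table of this shape can be produced with `DataFrame.pivot_table`. The sketch below uses a few made-up rows in place of the real `outage` DataFrame, so its numbers are illustrative only.

```python
import pandas as pd

# Made-up rows standing in for the cleaned outage data.
outage = pd.DataFrame({
    "climate_category": ["cold", "cold", "normal", "warm"],
    "climate_region": ["Central", "Central", "Central", "West"],
    "duration": [100.0, 300.0, 50.0, 400.0],
})

# Mean outage duration for each (climate_category, climate_region) pair.
pivot = outage.pivot_table(
    index="climate_category",
    columns="climate_region",
    values="duration",
    aggfunc="mean",
)
print(pivot)
```

Combinations with no outages show up as NaN, which is worth keeping in mind when comparing regions with few recorded events.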

Assessment of Missingness

NMAR Analysis

In our dataset, the customers_affected column could potentially be Not Missing at Random (NMAR). This is because the missing values in this column might depend on the severity of the outage or reporting practices rather than being randomly missing. For example, smaller outages affecting fewer customers might be less likely to be reported accurately, whereas larger outages may have more complete data due to higher scrutiny and impact.

To determine if customers_affected is truly NMAR, we would need additional data, such as utility company reporting policies, regional reporting standards, or manual record-keeping biases. If we can obtain external data explaining why some values are missing—such as a pattern in which certain companies or states systematically fail to report small-scale outages—then the missingness could instead be classified as Missing at Random (MAR) rather than NMAR.

Missingness Dependency

To test missingness dependency, we will focus on the distribution of customers_affected. We will test this against the columns cause and climate_category.

1. customers_affected vs cause

Null Hypothesis: The distribution of outage cause is independent of whether the customers_affected value is missing or not.

Alternative Hypothesis: The distribution of outage cause differs based on whether the customers_affected value is missing or not.

We used the total variation distance (TVD) as our test statistic; the observed value was 0.557. A permutation test with 500 repetitions gave a p-value of 0.001. Since the p-value is well below the standard significance level of 0.05, we reject the null hypothesis in favour of the alternative: the distribution of outage causes differs depending on whether customers_affected is missing, so the missingness of customers_affected depends on cause.

2. customers_affected vs climate_category

Null Hypothesis: The distribution of climate_category is independent of whether the customers_affected value is missing or not.

Alternative Hypothesis: The distribution of climate_category differs based on whether the customers_affected value is missing or not.

We again used the total variation distance (TVD) as our test statistic; the observed value was 0.036. A permutation test with 500 repetitions gave a p-value of 0.366. Since the p-value is well above the standard significance level of 0.05, we fail to reject the null hypothesis. We therefore have no evidence that the distribution of climate_category differs depending on whether customers_affected is missing, which is consistent with the missingness of customers_affected not depending on climate_category.
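The two missingness tests follow the same recipe, which can be sketched as below. This is an illustrative version rather than the project's code: the helper names `tvd` and `missingness_permutation_test` are hypothetical, and the demo data is constructed so that missingness depends entirely on `cause`.

```python
import numpy as np
import pandas as pd


def tvd(categories: pd.Series, is_missing: np.ndarray) -> float:
    """Total variation distance between the category distributions of the
    rows where the target is missing and the rows where it is present."""
    p_missing = categories[is_missing].value_counts(normalize=True)
    p_present = categories[~is_missing].value_counts(normalize=True)
    return p_missing.sub(p_present, fill_value=0).abs().sum() / 2


def missingness_permutation_test(df, col, target, n_repetitions=500, seed=0):
    rng = np.random.default_rng(seed)
    categories = df[col].reset_index(drop=True)
    is_missing = df[target].isna().to_numpy()
    observed = tvd(categories, is_missing)
    simulated = np.empty(n_repetitions)
    for i in range(n_repetitions):
        # Shuffling the missingness flags breaks any link to `col`.
        simulated[i] = tvd(categories, rng.permutation(is_missing))
    p_value = (simulated >= observed).mean()
    return observed, p_value


# Demo data: customers_affected is missing for every "a" row and no "b" row.
demo = pd.DataFrame({
    "cause": ["a"] * 20 + ["b"] * 20,
    "customers_affected": [np.nan] * 20 + [1000.0] * 20,
})
observed, p_value = missingness_permutation_test(
    demo, "cause", "customers_affected", n_repetitions=200
)
print(observed, p_value)
```

With real data, the same function would be called once with `col="cause"` and once with `col="climate_category"`.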

Hypothesis Testing

Analysis of the Difference in Mean Customers Affected by Power Outages in Texas and Washington

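We aim to determine whether the number of customers affected by power outages is greater on average in Texas than in Washington. Under the null hypothesis, the mean number of customers affected is the same in both states; under the alternative hypothesis, it is greater in Texas. The relevant columns for this test are customers_affected and state, and we only use data from outages that occurred in Texas or Washington.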

Test Statistic:

The difference in means, specifically:

Mean(Customers Affected in Texas) - Mean(Customers Affected in Washington)

Methodology:

We performed a permutation test with 10,000 simulations to generate an empirical distribution of the test statistic under the null hypothesis. In each permutation, the customers_affected values were shuffled while the state labels were kept fixed. This allowed us to assess how likely a difference at least as large as the observed one would be by random chance.
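The procedure described above can be sketched as follows. The function name `texas_vs_washington_test` is hypothetical, rows with a missing `customers_affected` value are dropped (an assumption about how missing values were handled), and the demo values are made up.

```python
import numpy as np
import pandas as pd


def texas_vs_washington_test(df, n_simulations=10_000, seed=0):
    rng = np.random.default_rng(seed)
    # Keep only the two states and drop rows without a customer count (assumption).
    data = df[df["state"].isin(["Texas", "Washington"])]
    data = data.dropna(subset=["customers_affected"])
    values = data["customers_affected"].to_numpy()
    is_texas = (data["state"] == "Texas").to_numpy()

    observed = values[is_texas].mean() - values[~is_texas].mean()
    simulated = np.empty(n_simulations)
    for i in range(n_simulations):
        # Shuffle the values while the state labels stay fixed.
        shuffled = rng.permutation(values)
        simulated[i] = shuffled[is_texas].mean() - shuffled[~is_texas].mean()

    # One-sided p-value: how often chance alone gives a difference this large.
    p_value = (simulated >= observed).mean()
    return observed, p_value


# Made-up data in which Texas outages clearly affect more customers.
demo = pd.DataFrame({
    "state": ["Texas"] * 6 + ["Washington"] * 6 + ["Ohio"],
    "customers_affected": [500, 600, 700, 800, 900, 1000,
                           10, 20, 30, 40, 50, 60, 5],
})
observed, p_value = texas_vs_washington_test(demo, n_simulations=2_000)
print(observed, p_value)
```

The p-value is one-sided because the alternative hypothesis is directional (greater in Texas).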

Results:

The p-value falls below the standard significance level of 0.05, so we reject the null hypothesis. This suggests that outages in Texas affect significantly more customers on average than outages in Washington.

Visualization:

The plot below shows the distribution of permuted differences, with the observed difference marked as a dashed red line.

Framing a Prediction Problem

Our model will try to predict the climate category of a power outage. This will be a binary classification problem because we are only focusing on outages occurring in either "normal" or "cold" climate conditions.

The metric we use to evaluate the model is accuracy, which is a reasonable measure of overall performance when the classes are roughly balanced. We also report precision, recall, and F1-score to check that the model is not biased toward one class, in case some class imbalance exists.

At the time of prediction, we would know the state, climate region, cause of outage, number of customers affected, and outage duration. These features are available at the time of an outage and can be used to classify whether the climate conditions at the time were "normal" or "cold".

Baseline Model

Our model is a binary classifier that predicts the climate category of a power outage as either “normal” or “cold”. Understanding the climate category of an outage could help energy companies anticipate seasonal disruptions and optimize resource allocation for infrastructure resilience. The model is built using a Random Forest Classifier, with numerical and categorical features properly preprocessed.
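A baseline of this kind might be assembled with a scikit-learn `Pipeline`, as sketched below. The feature split follows the columns named in the prediction-problem section, but the specific preprocessing (median imputation for numeric columns, one-hot encoding for categorical ones) is an assumption, and the data is randomly generated, so the printed accuracy says nothing about the real model.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

NUMERIC = ["duration", "customers_affected"]
CATEGORICAL = ["state", "climate_region", "cause"]


def build_baseline(random_state=42):
    preprocess = ColumnTransformer([
        # Assumed preprocessing: fill missing numeric values with the median.
        ("num", SimpleImputer(strategy="median"), NUMERIC),
        # One-hot encode nominal features; ignore categories unseen in training.
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("clf", RandomForestClassifier(random_state=random_state)),
    ])


# Randomly generated stand-in data with the same column names.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "duration": rng.exponential(2000, n),
    "customers_affected": rng.integers(0, 200_000, n).astype(float),
    "state": rng.choice(["Texas", "Washington", "Minnesota"], n),
    "climate_region": rng.choice(["South", "Northwest", "East North Central"], n),
    "cause": rng.choice(["severe weather", "intentional attack"], n),
    "climate_category": rng.choice(["normal", "cold"], n),
})
demo.loc[::10, "customers_affected"] = np.nan  # simulate missing values

X = demo[NUMERIC + CATEGORICAL]
y = demo["climate_category"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

baseline = build_baseline().fit(X_train, y_train)
predictions = baseline.predict(X_test)
test_accuracy = baseline.score(X_test, y_test)
print(f"test accuracy on synthetic data: {test_accuracy:.3f}")
```

Keeping the preprocessing inside the pipeline means the imputer and encoder are fitted on the training split only, which avoids leaking test information.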

Features Used in the Model:

Quantitative Features (Numerical)

Preprocessing:

Nominal Features (Categorical)

Preprocessing:

Target Variable:

Model Performance and Analysis

The training accuracy is high while the test accuracy is only 0.6066, which suggests that the model is overfitting and generalizes poorly to new data. The low recall for the “cold” category indicates that the model is not effectively identifying outages that occurred during cold climate episodes. Class imbalance could also be contributing to the weaker performance, as the model appears to favor predicting “normal”.

Possible Improvements:

  1. Hyperparameter tuning will be employed using GridSearchCV to optimize the model’s complexity.

  2. Feature engineering will be applied to include more time-based features and refine the data input.

  3. Class imbalance will be addressed to improve performance for both classes, especially “cold” outages.

Final Model

Modeling Algorithm

For the final model, we kept the RandomForestClassifier, as it is an ensemble learning method that helps in reducing overfitting and improving predictive accuracy by aggregating the results from multiple decision trees. Random forests are particularly well-suited for complex datasets with both categorical and numerical features, as they can effectively capture non-linear relationships and interactions between features.

Feature Engineering and Selection

The final model included several key features that we believe significantly contribute to the predictive power of the model:

These features, particularly the time-based and regional ones, help the model better understand the environmental and seasonal factors that drive power outages, improving its ability to predict outages under various conditions.

Hyperparameter Tuning

To optimize the performance of the RandomForestClassifier, we used GridSearchCV to tune the hyperparameters:

Since the dataset is fairly balanced for our prediction problem, we used accuracy along with other classification report metrics to evaluate the model. The final model achieved a test accuracy of 75.41% and balanced classification performance, with an F1 score of 0.75, improving on the baseline. Compared to the baseline, which had lower accuracy and lower recall for the “cold” class, the final model performs more robustly across both classes, with particularly high recall for the “normal” class (85%). The cross-validation accuracy of 74.28% is close to the test accuracy, which indicates better generalization and stability. This improvement suggests that the adjustments made, including feature selection and hyperparameter tuning, contributed to a more reliable model.
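For reference, a grid search over a random forest typically looks like the sketch below. The parameter grid, the 5-fold setting, and the synthetic data from `make_classification` are illustrative assumptions, not the values or data used for the final model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary classification data standing in for the outage features.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Illustrative grid; the values actually searched in this project may differ.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
    "min_samples_split": [2, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="accuracy",
    cv=5,
)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("cross-validation accuracy:", round(search.best_score_, 4))
print(classification_report(y_test, search.predict(X_test)))
```

Limiting `max_depth` or raising `min_samples_split` constrains individual trees, which is one way to reduce the overfitting seen in the baseline.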

Fairness Analysis

Our groups for the fairness analysis are high-affected vs low-affected customer groups, defined as outages that impact more than 60,000 customers (high impact) versus those that affect fewer than 60,000 customers (low impact).

We chose these groups because the severity of the outage (in terms of customer impact) could influence the accuracy of the model’s predictions for the climate category. It’s important to ensure that the model performs equally well across both groups to avoid potential bias.

The evaluation metric is weighted precision: precision is computed for each class as the proportion of true positives among all predictions of that class, and the per-class values are averaged with weights proportional to class size. We chose this metric because it penalizes false positives, which matters when the model's positive predictions need to be reliable across different groups; unfairness here could lead to misinterpretations or costly decisions. Our null hypothesis is that the model is fair, meaning its precision is the same for both groups and any observed difference is due to chance; the alternative is that the precision differs between the groups. We use a permutation test in which the high-affected and low-affected group labels are randomly shuffled and the precision difference is recomputed each time, with a standard significance level of 0.05 (5%).

Test Statistic: Absolute difference in weighted precision between the two groups

The p-value is computed by comparing the observed absolute precision difference (calculated from the actual data) to the distribution of absolute precision differences from the 500 permutations. Since our p-value is 0.108, which is above the standard significance level, we fail to reject the null hypothesis. We therefore find no statistically significant difference in precision between the high-affected and low-affected customer groups, and conclude that the model appears to be fair with respect to the precision metric.
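The fairness test can be sketched as below. The helper names `precision_gap` and `fairness_permutation_test` are hypothetical, and the labels, predictions, and group flags are randomly generated stand-ins, so the printed p-value is unrelated to the 0.108 reported here.

```python
import numpy as np
from sklearn.metrics import precision_score


def precision_gap(y_true, y_pred, is_high):
    """Absolute difference in weighted precision between the two groups."""
    high = precision_score(
        y_true[is_high], y_pred[is_high], average="weighted", zero_division=0
    )
    low = precision_score(
        y_true[~is_high], y_pred[~is_high], average="weighted", zero_division=0
    )
    return abs(high - low)


def fairness_permutation_test(y_true, y_pred, is_high, n_repetitions=500, seed=0):
    rng = np.random.default_rng(seed)
    observed = precision_gap(y_true, y_pred, is_high)
    simulated = np.array([
        # Shuffle group membership; labels and predictions stay paired.
        precision_gap(y_true, y_pred, rng.permutation(is_high))
        for _ in range(n_repetitions)
    ])
    p_value = (simulated >= observed).mean()
    return observed, p_value


# Randomly generated stand-ins for true labels, predictions, and group flags.
rng = np.random.default_rng(1)
y_true = rng.choice(np.array(["cold", "normal"]), size=200)
flip = rng.random(200) < 0.25  # the fake model is wrong about 25% of the time
y_pred = np.where(flip, np.where(y_true == "cold", "normal", "cold"), y_true)
is_high = rng.random(200) < 0.4  # stand-in for customers_affected > 60,000

observed, p_value = fairness_permutation_test(y_true, y_pred, is_high)
print(observed, p_value)
```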