Friday, October 30, 2020

Visualization of Maharashtra’s Elections

A Guide to Making an Election’s Choropleth Map

Maharashtra is the second most populous and the third largest state in India. It contributes about 15% of the country’s GDP, making it the single largest contributor to the economy, and its capital, Mumbai, is regarded as the financial capital of the country. Pune, in turn, is known as the ‘Oxford of the East’ for its many educational and research institutes, such as IUCAA, NCA and IISER. Hence, what happens in Maharashtra has economic ramifications at the national level.

This article covers how to make a choropleth map of Maharashtra’s 2019 assembly elections using publicly available data. Shape files for assembly constituencies are, in general, hard to find on the internet.

Data

  1. The shape files for the assembly constituencies can be found here https://github.com/datameet/maps
  2. The data for the successful parties in respective constituencies can be downloaded from the election commission webpage.

Required Packages

  1. Pandas (installation guide: https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html)
  2. Geopandas (installation: https://geopandas.org/install.html)
  3. Seaborn (installation: https://seaborn.pydata.org/installing.html)
  4. Matplotlib and numpy

Map Data

Importing Packages

First, the required packages are imported as shown below:

import pandas as pd  
import geopandas as gpd  
import matplotlib.pyplot as plt  
import numpy as np  
import seaborn as sns  
sns.set_style('whitegrid')

The ‘set_style’ is an optional command that sets a white grid for the upcoming plots.

# Importing the shape file of assembly-constituencies  
fp = "assembly-constituencies"  
map_df = gpd.read_file(fp)  
map_df = map_df[map_df['ST_NAME']=='MAHARASHTRA']  
map_df = map_df[['AC_NAME', 'geometry']]  
map_df.reset_index(drop=True,inplace=True)  
map_df.head(3)

The read_file function imports the shape file of India’s assembly constituencies as a GeoDataFrame, and the condition map_df['ST_NAME']=='MAHARASHTRA' keeps only the constituencies belonging to Maharashtra. The subsequent lines retain only the required columns, namely the constituency name (AC_NAME) and the shape (geometry), and reset the index.

Basic Exploratory Data Analysis

map_df.info()

The output above shows 288 non-null constituency names (AC_NAME) but 302 shapes (geometry), whereas the Election Commission of India lists 288 assembly constituencies. Let us explore further to resolve this discrepancy.

# Plot of the assembly-constituencies  
map_df.plot()

The plot of the shape file seems to be normal. Let us check if there are any null type values.

# checking for null values  
Null_Values = map_df.isnull().sum()  
print(Null_Values)

In Python, a null object is represented by the None keyword. Here, isnull() turns the DataFrame map_df into a table of True and False values, where True marks a missing value (None) and False marks a present one. The sum() command then adds up each column, counting True as one and False as zero. Hence, there are fourteen missing values in the constituency names and none in the geometry column.
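To make the counting concrete, here is a minimal sketch on a toy DataFrame. The three rows and their values are made up for illustration and are not taken from the shape file.

```python
import pandas as pd

# Toy stand-in for map_df; names and geometry strings are placeholders.
toy = pd.DataFrame({
    "AC_NAME": ["Name A", None, "Name C"],
    "geometry": ["polygon_a", "polygon_b", "polygon_c"],
})

mask = toy.isnull()        # True wherever a value is missing
null_counts = mask.sum()   # column-wise sum: True counts as 1, False as 0

print(mask)
print(null_counts)
```

The same two calls on map_df are what produce the per-column counts discussed above.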

This can be visualized as a heat map from the Seaborn library:

# It seems some of the names of the assembly constituencies are missing!  
sns.heatmap(map_df.isnull())

The odd straight lines in the AC_NAME column represent the missing values. These rows can be removed with pandas’ dropna() as shown below.

map_df.dropna(inplace=True) # The 'None' type rows are removed  
map_df.reset_index(drop=True,inplace=True) # index is reset  
map_df.head()

After removing the null values, the index is reset so that it runs consecutively again. The drop=True option discards the old index instead of adding it as a new column, while inplace=True applies these changes directly to the map_df DataFrame.
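The effect of the drop option is easiest to see side by side. The following is a small sketch on a made-up three-row DataFrame, not on the real map data.

```python
import pandas as pd

# Toy stand-in for map_df with one missing name.
toy = pd.DataFrame({"AC_NAME": ["a", None, "c"]})

cleaned = toy.dropna()                     # remaining index labels: 0 and 2
kept = cleaned.reset_index()               # old index survives as an 'index' column
dropped = cleaned.reset_index(drop=True)   # old index is discarded

print(kept)
print(dropped)
```

With drop=True the result has a fresh 0, 1 index and no extra column, which is the behaviour used in this article.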

Data Cleaning

The data has to be cleaned to match the names of constituencies from the election commission. One can see that there are (ST) and (SC) appended to the names to show the constituencies reserved for ST and SC.

#Cleaning the names of assembly constituencies  
def text_process(names):  
    semi_cleaned_names = [word for word in names.strip().lower().split() if word not in ['(st)','(sc)']]   
    joined_cleaned_names = " ".join(semi_cleaned_names)  
    removed_st_names = joined_cleaned_names.replace("(st)","")  
    fully_cleaned_names = removed_st_names.replace("(sc)","")  
    return fully_cleaned_names

# The cleaned names  
map_df['cleaned_names']=map_df['AC_NAME'].apply(text_process)  
map_df = map_df[['cleaned_names','geometry']]

The apply command sends one constituency name at a time to the text_process() function. There, each name is stripped of leading and trailing white space, converted to lower case, and split into words on white space via names.strip().lower().split(); standalone (st) and (sc) tokens are dropped at this step. For example, the name Arjuni Morgaon(SC) becomes [‘arjuni’, ‘morgaon(sc)’] after the first line of code. The (sc) suffix is still present because it is not separated from morgaon by white space. The join() command then gives arjuni morgaon(sc). Finally, replace(“(sc)”, “”) gives the fully cleaned name arjuni morgaon.
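The steps can be traced by hand in plain Python. This sketch repeats the logic of text_process() step by step; the second input, with a space before the suffix, is a hypothetical name added only to show why the list filter is needed as well.

```python
def trace(name):
    """Return the intermediate results of the cleaning steps for one name."""
    words = name.strip().lower().split()
    kept = [w for w in words if w not in ["(st)", "(sc)"]]
    joined = " ".join(kept)
    cleaned = joined.replace("(st)", "").replace("(sc)", "")
    return words, joined, cleaned

# Suffix attached to the last word: handled by replace()
words_a, joined_a, cleaned_a = trace("Arjuni Morgaon(SC)")

# Suffix separated by a space (hypothetical input): handled by the list filter
words_b, joined_b, cleaned_b = trace("  Some Name (ST) ")

print(words_a, joined_a, cleaned_a, sep="\n")
print(words_b, joined_b, cleaned_b, sep="\n")
```

Without the list filter, the second name would end with a stray trailing space after replace(), and it would then fail to match the election data.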

The DataFrame now looks like:

map_df.head()

Election Data

The data for the successful parties in respective constituencies can be downloaded from the election commission page in xlsx format.

# Importing Election Data  
df_election = pd.read_excel('2-List of Successful Candidates.xlsx',names=['State','Constituency','Winner','Sex','Party','Symbol'])  
df_election = df_election.reset_index(drop=True)  
df_election.head()

The read_excel command from pandas imports the Excel sheet directly as a DataFrame, and the names option sets new names for the columns.

Basic Exploratory Data Analysis

df_election.info()

The above result shows there are no null values in any of the six columns.

plt.figure(figsize=(10,5))  
sns.countplot(x='Party',data=df_election)

The countplot command plots the number of constituencies won by each party, whereas value_counts() gives the exact numbers:

df_election['Party'].value_counts()

Clearly, the top 4 parties have won in far more constituencies than the rest. Now, let’s segregate only the parties that won in more than 20 constituencies. This is to focus on the parties that performed well.

df_successful_party = df_election['Party'].value_counts() > 20  
df_successful_party=df_successful_party[df_successful_party==True].index  
Succ_parties = df_successful_party.values  
df = df_election['Party'].apply(lambda x: x in Succ_parties)  
sns.countplot(x = 'Party',data = df_election[df],hue='Sex')
Party-wise breakdown of successful candidates with gender.

The first line gives a pandas Series with the party names as the index and True or False as the values, where True marks a party with more than 20 wins. The second line keeps only the party names with True values via the index attribute. The third line converts these names into a numpy array. This array is used in the following apply command, which builds a boolean mask selecting the rows of parties with more than 20 wins. The final line plots the number of constituencies won by each of these parties, split by the sex of the candidate. In absolute numbers, the BJP seems to have more women among its winning candidates than the other parties. Let’s take a deeper look into this.
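The same filter can also be written more compactly with isin(). This is an alternative sketch, not the code used in the notebook; the toy data and the lower threshold are made up so that the example runs on its own.

```python
import pandas as pd

# Toy stand-in for df_election; parties and counts are illustrative.
toy = pd.DataFrame({"Party": ["A"] * 5 + ["B"] * 3 + ["C"] * 1})
threshold = 2  # the article uses 20 on the real data

counts = toy["Party"].value_counts()
big_parties = counts[counts > threshold].index     # parties above the threshold
filtered = toy[toy["Party"].isin(big_parties)]     # rows belonging to those parties

print(filtered["Party"].value_counts())
```

On the real data, filtered would play the role of df_election[df] in the countplot call.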

df_election['Sex'].value_counts()  
Percentage_of_women = 24/288*100  
Percentage_of_women

The above lines show that women make up only about 8% of the successful candidates across all parties. Breaking it down party-wise:

df_election[df_election['Party']=='BJP']['Sex'].value_counts()  
Percentage_of_women_BJP = (12/105)*100  
Percentage_of_women_BJP

This shows about 11% representation by women among the BJP’s successful candidates. Calculating it similarly for the other parties, one can plot the following:

# Percentage_of_women_NCP, Percentage_of_women_SHS and Percentage_of_women_INC
# are assumed to have been calculated in the same way as for the BJP above
arr = np.array([Percentage_of_women,Percentage_of_women_BJP,Percentage_of_women_NCP,Percentage_of_women_SHS,Percentage_of_women_INC])
Women_representation = pd.DataFrame(data=arr,columns=['Women Representation (%)'],index=None)
Women_representation['Party'] = ['Overall','BJP','NCP','SHS','INC']
plt.figure(figsize=(8,5))
sns.barplot(y='Women Representation (%)',x='Party',data=Women_representation)
Women representation party-wise.

The above plot shows the percentage of women among each party’s successful candidates. The BJP and the Congress (INC) have almost equal women representation, and both exceed the overall figure of about 8%. The NCP and SHS occupy the third and fourth places, with below-average representation.
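The percentages above were typed in by hand from the value_counts() output. A possible way to compute them directly is sketched below on toy data. The 'F'/'M' coding of the Sex column is an assumption; the actual labels should be checked with df_election['Sex'].unique().

```python
import pandas as pd

# Toy stand-in for df_election. The 'F'/'M' labels are assumed,
# and the parties and counts are made up.
toy = pd.DataFrame({
    "Party": ["A", "A", "A", "A", "B", "B"],
    "Sex":   ["F", "M", "M", "M", "F", "M"],
})

is_woman = toy["Sex"].eq("F")                            # boolean Series
overall = is_woman.mean() * 100                          # share of women overall
by_party = is_woman.groupby(toy["Party"]).mean() * 100   # share of women per party

print(overall)
print(by_party)
```

The mean of a boolean Series is the fraction of True values, so no counts need to be copied by hand.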

Data Cleaning

# Cleaning the names of assembly constituencies  
df_election['Constituency']=df_election['Constituency'].str.lower()  
df_election['Constituency']=df_election['Constituency'].str.strip()

# Joining both DataFrames with the 'cleaned_names' column as index
merged = map_df.set_index('cleaned_names').join(df_election.set_index('Constituency'))
merged.head()

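This is a simpler form of data cleaning than for the map data. Here, all the names are converted to lower case, and leading and trailing white spaces are removed. The join command then attaches the election data to the map DataFrame wherever the constituency names match. Since join performs a left join by default, every row of the map DataFrame is kept, and rows without a matching name get missing values in the election columns.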
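Because join() is a left join by default, spelling differences between the two sources do not raise an error; they only show up as missing values. A quick check for such mismatches might look like the sketch below, where the names are made up and the geometry column holds placeholder strings.

```python
import pandas as pd

# Toy stand-ins for map_df and df_election.
toy_map = pd.DataFrame({"cleaned_names": ["alpha", "beta", "gamma"],
                        "geometry": ["poly_a", "poly_b", "poly_c"]})
toy_election = pd.DataFrame({"Constituency": ["alpha", "gamma", "delta"],
                             "Party": ["X", "Y", "Z"]})

toy_merged = toy_map.set_index("cleaned_names").join(
    toy_election.set_index("Constituency"))

# Map rows without an election match keep their geometry but get NaN for Party.
unmatched_map = toy_merged.index[toy_merged["Party"].isna()].tolist()

# Election rows whose name never appears in the map are silently left out.
unmatched_election = sorted(set(toy_election["Constituency"]) - set(toy_merged.index))

print(unmatched_map)
print(unmatched_election)
```

Running the same two checks on merged and df_election would list the constituencies that need further name cleaning.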

Choropleth Map of Maharashtra’s Election

# Plotting the election results  
fig, ax = plt.subplots(1,figsize=(10, 6))  
ax.axis('off')  
ax.set_title('Maharashtra Election 2019', fontdict={'fontsize': '25', 'fontweight' : '3'})  
merged.plot(column='Party', cmap='plasma', linewidth=0.7, ax=ax, edgecolor='0.8', legend=True)  
leg = ax.get_legend()  
leg.set_bbox_to_anchor((1, 0.7, 0.2, 0.2))

The subplots() function creates a figure and an axes instance, which can be used to set the attributes of the plot. This is matplotlib’s object-oriented interface; in contrast, plt.plot() belongs to the state-based pyplot interface. Here the axes instance, ax, is used to set the title of the plot and to remove the plot axes. The merged.plot() call plots the choropleth map, coloured by party. This follows the pandas style of plotting, DataFrame_Name.plot(), which GeoPandas extends to draw geometries. It uses matplotlib as a backend and hence works well in conjunction with matplotlib-specific commands such as subplots(). The cmap option sets the colour scheme of the plot. Furthermore, get_legend() returns the legend instance, which is used along with set_bbox_to_anchor() for more accurate placement of the legend.
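The legend handling does not depend on the map itself. The sketch below uses two ordinary line plots with made-up labels in place of the choropleth, so that it runs without the shape file.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, figsize=(10, 6))
ax.set_title("Toy plot")

# Two labelled lines stand in for the party categories of the real map.
ax.plot([0, 1], [0, 1], label="series A")
ax.plot([0, 1], [1, 0], label="series B")
ax.legend()

leg = ax.get_legend()                        # the Legend object attached to ax
leg.set_bbox_to_anchor((1, 0.7, 0.2, 0.2))   # (x, y, width, height) relative to the axes
```

An x value of 1 places the anchor box at the right edge of the axes, which moves the legend out of the plotting area.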

The white patches in the choropleth map represent missing constituencies. These are the assembly constituencies whose names were None and which were removed from the map_df DataFrame. One can experiment with a few more cmap colour options as shown here. However, there are too many parties to identify all of them by colour alone. In the future, we shall explore how to improve both the comprehension and the aesthetics of this plot!
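One possible way to reduce the number of colours, offered here only as a sketch and not as part of the original notebook, is to lump the smaller parties into a single category before plotting. The toy data, the threshold and the column name Party_grouped are all hypothetical.

```python
import pandas as pd

# Toy stand-in for the Party column of the merged DataFrame.
toy = pd.DataFrame({"Party": ["A"] * 5 + ["B"] * 3 + ["C", "D"]})
threshold = 2  # hypothetical cut-off; the article uses 20 elsewhere

counts = toy["Party"].value_counts()
small = counts[counts <= threshold].index   # parties at or below the cut-off

# Keep the party name for large parties, replace the rest with 'Others'.
toy["Party_grouped"] = toy["Party"].where(~toy["Party"].isin(small), "Others")

print(toy["Party_grouped"].value_counts())
```

The grouped column could then be passed as the column argument of merged.plot() in place of 'Party'.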

The full notebook can be referred to here.
