Plotting

Overview

Teaching: 15 min
Exercises: 15 min
Questions
  • How can I plot my data?

Objectives
  • Create a time series plot showing a single data set.

  • Create a scatter plot showing the relationship between two data sets.

matplotlib is the most widely used scientific plotting library in Python.

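import matplotlib.pyplot as plt
# IPython/Jupyter magic: show figures in a separate Tk window.
# Leave this line out when running a plain Python script.
%matplotlib tk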
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y)
plt.xlabel('Numbers')
plt.ylabel('Doubles')
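Outside a notebook the `%matplotlib` magic is not available, and a figure only appears once you call `plt.show()` or save it to a file. A minimal sketch of the same plot using matplotlib's explicit figure/axes interface; the non-interactive `Agg` backend and the filename `doubles.png` are illustrative assumptions, not part of the lesson.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Create a figure with a single set of axes explicitly.
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel('Numbers')
ax.set_ylabel('Doubles')

# Write the figure to disk instead of displaying it (hypothetical filename).
fig.savefig('doubles.png')
```

In an interactive session, `plt.show()` can replace `fig.savefig(...)`.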

Plot data directly from a Pandas dataframe.

import pandas

data = pandas.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')
# Select the 'Australia' row by its index label and plot it.
data.loc['Australia'].plot()
plt.xticks(rotation=90)
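`data.loc[...]` selects a row by its index label and returns a Series whose index is the original column names. `Series.plot` uses that index for the x-axis, which is why the ticks show the long column names and benefit from rotation. A self-contained sketch: the CSV text mimics the column layout of the Gapminder files, but its numbers are made-up placeholders, not real data.

```python
import io
import pandas

# Made-up placeholder values in the same layout as the Gapminder files.
csv_text = """country,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962
Australia,100.0,110.0,120.0
New Zealand,90.0,95.0,105.0
"""
data = pandas.read_csv(io.StringIO(csv_text), index_col='country')

# Selecting one row by label gives a Series indexed by the column names.
australia = data.loc['Australia']
print(australia.index.tolist())
# ['gdpPercap_1952', 'gdpPercap_1957', 'gdpPercap_1962']
```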

Select and transform data, then plot it.

# Transpose so that each country becomes a column (one line per country).
data.T.plot()
plt.ylabel('GDP per capita')
plt.xticks(rotation=90)
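`DataFrame.plot` draws one line per column and uses the row index for the x-axis. The Gapminder files have one row per country, so transposing with `.T` turns the countries into columns and the years into the x-axis. A sketch with made-up placeholder values; the `Agg` backend is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import pandas

# Made-up placeholder values: one row per country, one column per year.
data = pandas.DataFrame(
    {'gdpPercap_1952': [100.0, 90.0], 'gdpPercap_1957': [110.0, 95.0]},
    index=pandas.Index(['Australia', 'New Zealand'], name='country'),
)

transposed = data.T      # rows are now years, columns are countries
ax = transposed.plot()   # one line per column, i.e. one per country
print(transposed.columns.tolist())  # ['Australia', 'New Zealand']
print(len(ax.get_lines()))          # 2
```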

Many styles of plot are available.

plt.style.use('ggplot')
data.T.plot(kind='bar')
plt.xticks(rotation=90)
plt.ylabel('GDP per capita')
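`plt.style.use` changes the style of every plot drawn afterwards. To see which style names your installation provides, or to apply a style only temporarily, matplotlib offers `plt.style.available` and the `plt.style.context` context manager. A minimal sketch; the bar heights are made-up placeholders and the `Agg` backend is an assumption.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import matplotlib.pyplot as plt

# Names of the style sheets bundled with this matplotlib installation.
print(sorted(plt.style.available))

# Apply a style only inside the with-block; the global style is left untouched.
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()
    ax.bar(['1952', '1957'], [100.0, 110.0])  # made-up placeholder heights
```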

Data can also be plotted by calling the matplotlib plot function directly.

# Accumulator pattern to collect years (as character strings).
years = []
for col in data.columns:
    year = col[-4:]
    years.append(year)

# Australia data as list.
gdp_australia = data.loc['Australia'].tolist()

# Plot: 'b-' draws a solid ('-') blue ('b') line.
plt.plot(years, gdp_australia, 'b-')
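The accumulator loop above can also be written as a single vectorized step using pandas string methods on the column index. Converting the result to integers is an optional extra, not part of the original loop: matplotlib treats strings as categorical labels, while integers give a numeric x-axis. A sketch using column labels in the Gapminder format:

```python
import pandas

# Column labels in the same format as the Gapminder files.
columns = pandas.Index(['gdpPercap_1952', 'gdpPercap_1957', 'gdpPercap_1962'])

# Same result as the accumulator loop: the last four characters of each label.
years = columns.str[-4:].tolist()
print(years)  # ['1952', '1957', '1962']

# Optional: integers give a numeric x-axis instead of categorical text labels.
year_numbers = [int(year) for year in years]
print(year_numbers)  # [1952, 1957, 1962]
```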

Can plot many sets of data together.

# Accumulator pattern to collect years (as character strings).
years = []
for col in data.columns:
    year = col[-4:]
    years.append(year)

# Select two countries' worth of data.
gdp_australia = data.loc['Australia']
gdp_nz = data.loc['New Zealand']

# Plot each country as a differently-colored line.
plt.plot(years, gdp_australia, 'b-', label='Australia')
plt.plot(years, gdp_nz, 'g-', label='New Zealand')

# Create legend.
plt.legend(loc='upper left')
plt.xlabel('Year')
plt.ylabel('GDP per capita ($)')
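Rather than writing one `plot` call per country, a loop can draw any list of rows, with `label=` supplying the legend text. A sketch with made-up placeholder values; the `Agg` backend is an assumption so it runs standalone.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import pandas

# Made-up placeholder values in the Gapminder layout.
data = pandas.DataFrame(
    {'gdpPercap_1952': [100.0, 90.0], 'gdpPercap_1957': [110.0, 95.0]},
    index=pandas.Index(['Australia', 'New Zealand'], name='country'),
)
years = [col[-4:] for col in data.columns]

fig, ax = plt.subplots()
for country in ['Australia', 'New Zealand']:
    # One line per country; the label is picked up by the legend.
    ax.plot(years, data.loc[country], label=country)

legend = ax.legend(loc='upper left')
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita ($)')
```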
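
A scatter plot shows how two data sets relate, here the GDP per capita of Australia against that of New Zealand. Either matplotlib or pandas can draw it:

# Option 1: matplotlib directly (plt.figure() starts a new, empty figure).
plt.figure()
plt.scatter(gdp_australia, gdp_nz)

# Option 2: let pandas draw it from the transposed dataframe.
data.T.plot.scatter(x='Australia', y='New Zealand')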
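A scatter plot shows a relationship visually; `Series.corr` (Pearson correlation by default) summarizes it as one number between -1 and 1. A sketch on made-up placeholder values in which New Zealand is exactly a linear function of Australia, so the coefficient comes out as 1:

```python
import pandas

# Made-up placeholder values: one row per year, one column per country,
# i.e. the same shape as data.T in the lesson.
gdp = pandas.DataFrame({
    'Australia': [100.0, 110.0, 120.0, 135.0],
    'New Zealand': [300.0, 320.0, 340.0, 370.0],  # = 2 * Australia + 100
})

r = gdp['Australia'].corr(gdp['New Zealand'])
print(round(r, 6))  # 1.0
```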

Minima and Maxima

Fill in the blanks below to plot the minimum GDP per capita over time for all the countries in Europe. Then modify it to plot the maximum GDP per capita over time for Europe.

data_europe = pandas.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
data_europe.____.plot(label='min')
data_europe.____
plt.legend(loc='best')

Correlations

Modify the example in the notes to create a scatter plot showing the relationship between the minimum and maximum GDP per capita among the countries in Asia for each year in the data set. What relationship do you see (if any)?

data_asia = pandas.read_csv('data/gapminder_gdp_asia.csv', index_col='country')
data_asia.describe().T.plot(kind='scatter', x='min', y='max')
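`describe()` computes summary statistics for every column. With one column per year, transposing its result gives one row per year with columns such as `min` and `max`, which is what the scatter call plots against each other. A sketch with hypothetical country labels and made-up placeholder values:

```python
import pandas

# Made-up placeholder values: one row per country, one column per year.
data_asia = pandas.DataFrame(
    {'gdpPercap_1952': [400.0, 700.0, 1000.0],
     'gdpPercap_1957': [500.0, 900.0, 1500.0]},
    index=pandas.Index(['A', 'B', 'C'], name='country'),
)

summary = data_asia.describe().T  # one row per year, one column per statistic
print(summary[['min', 'max']])
```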

You might notice that the maximum varies much more from year to year than the minimum. Take a look at the maximum values, and at which countries hold the maximum and minimum GDP per capita in each year:

data_asia = pandas.read_csv('data/gapminder_gdp_asia.csv', index_col='country')
data_asia.max().plot()
print(data_asia.idxmax())
print(data_asia.idxmin())
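`idxmax()` and `idxmin()` return, for each column, the index label of the largest or smallest value. Because the file is read with `index_col='country'`, that label is a country name, which shows which countries drive the extremes. A sketch with hypothetical country labels and made-up values:

```python
import pandas

# Made-up placeholder values with hypothetical country labels.
data = pandas.DataFrame(
    {'gdpPercap_1952': [400.0, 700.0, 1000.0],
     'gdpPercap_1957': [500.0, 1600.0, 1500.0]},
    index=pandas.Index(['Country A', 'Country B', 'Country C'], name='country'),
)

print(data.idxmax().tolist())  # ['Country C', 'Country B']
print(data.idxmin().tolist())  # ['Country A', 'Country A']
```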

More Correlations

This short program creates a plot showing the correlation between GDP per capita and life expectancy for 2007, scaling each marker's size by the country's population (in millions):

data_all = pandas.read_csv('data/gapminder_all.csv', index_col='country')
data_all.plot(kind='scatter', x='gdpPercap_2007', y='lifeExp_2007',
              s=data_all['pop_2007']/1e6)
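The `s` argument sets each marker's area in points squared; dividing the population by 1e6 expresses it in millions so that the markers stay a drawable size. A sketch that checks the sizes actually handed to matplotlib; the column names mirror `gapminder_all.csv`, the values are made-up placeholders, and the `Agg` backend is an assumption.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import pandas

# Made-up placeholder values with the same column names as gapminder_all.csv.
data_all = pandas.DataFrame({
    'gdpPercap_2007': [1000.0, 20000.0, 40000.0],
    'lifeExp_2007': [50.0, 70.0, 80.0],
    'pop_2007': [5e6, 20e6, 60e6],
})

sizes = data_all['pop_2007'] / 1e6  # population in millions -> marker area
ax = data_all.plot(kind='scatter', x='gdpPercap_2007', y='lifeExp_2007', s=sizes)
print(ax.collections[0].get_sizes().tolist())  # [5.0, 20.0, 60.0]
```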

Using online help and other resources, explain what each argument to plot does.

Key Points

  • matplotlib is the most widely used scientific plotting library in Python.

  • Plot data directly from a Pandas dataframe.

  • Select and transform data, then plot it.

  • Many styles of plot are available.

  • Can plot many sets of data together.