Looking at historic climate patterns in Cedar Rapids, IA¶

I chose to look at Cedar Rapids, Iowa in the United States for this portfolio post. Located in east central Iowa, Cedar Rapids is the second largest city in the state, population-wise behind the capital, Des Moines. I grew up here (4th grade through high school) and went to college just down the road in Iowa City. I moved to Cedar Rapids in 1990 and lived in eastern Iowa until 2020.

For this analysis I'll be using data from the NOAA NCEI Climate Data Online network (network ID: GHCND:USC00131319). This station has been in service since 1892 and has good coverage (99%). It has been relocated since originally coming online and is now currently located in what is technically Marion, IA a suburb of Cedar Rapids.

NCEI station information at Cedar Rapids, IA

In [21]:
# Import python libraries
import pandas as pd
import holoviews as hv
import hvplot.pandas
import numpy as np

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.dates import DateFormatter

# Work with vector data
import geopandas as gpd

# Save maps and plots to files
import holoviews as hv
# Create interactive maps and plots
import hvplot.pandas

# Search for locations by name - this might take a moment
import osmnx as ox
from osmnx import features as osm
In [4]:
# Generate API endpoint request for Cedar Rapids, IA (station id: USC00131319)
cr_url = ('https://www.ncei.noaa.gov/access/services/data/v1?'
            'dataset=daily-summaries'
            '&dataTypes=TOBS,PRCP'
            '&stations=USC00131319'
            '&startDate=1892-01-01'
            '&endDate=2024-04-16'
            '&includeStationName=true'
            '&includeStationLocation=1'
            '&units=standard')
cr_url
Out[4]:
'https://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=TOBS,PRCP&stations=USC00131319&startDate=1892-01-01&endDate=2024-04-16&includeStationName=true&includeStationLocation=1&units=standard'
In [5]:
# Open data using pandas
cr_df = pd.read_csv(cr_url,
                    index_col='DATE',
                    na_values=['NaN'],
                    parse_dates=True)
cr_df
Out[5]:
STATION NAME LATITUDE LONGITUDE ELEVATION PRCP TOBS
DATE
1892-01-01 USC00131319 CEDAR RAPIDS NUMBER 1, IA US 42.0496 -91.5881 247.8 1.68 NaN
1892-01-02 USC00131319 CEDAR RAPIDS NUMBER 1, IA US 42.0496 -91.5881 247.8 0.67 NaN
1892-01-03 USC00131319 CEDAR RAPIDS NUMBER 1, IA US 42.0496 -91.5881 247.8 0.00 NaN
1892-01-04 USC00131319 CEDAR RAPIDS NUMBER 1, IA US 42.0496 -91.5881 247.8 0.00 NaN
1892-01-05 USC00131319 CEDAR RAPIDS NUMBER 1, IA US 42.0496 -91.5881 247.8 0.00 NaN
... ... ... ... ... ... ... ...
2024-04-12 USC00131319 CEDAR RAPIDS NUMBER 1, IA US 42.0496 -91.5881 247.8 0.00 63.0
2024-04-13 USC00131319 CEDAR RAPIDS NUMBER 1, IA US 42.0496 -91.5881 247.8 0.00 78.0
2024-04-14 USC00131319 CEDAR RAPIDS NUMBER 1, IA US 42.0496 -91.5881 247.8 0.00 84.0
2024-04-15 USC00131319 CEDAR RAPIDS NUMBER 1, IA US 42.0496 -91.5881 247.8 0.00 76.0
2024-04-16 USC00131319 CEDAR RAPIDS NUMBER 1, IA US 42.0496 -91.5881 247.8 0.45 59.0

47929 rows × 7 columns

In [6]:
# Clean up the data (drop columns that don't vary with time)
cr_df = cr_df[['TOBS','PRCP']]
cr_df
Out[6]:
TOBS PRCP
DATE
1892-01-01 NaN 1.68
1892-01-02 NaN 0.67
1892-01-03 NaN 0.00
1892-01-04 NaN 0.00
1892-01-05 NaN 0.00
... ... ...
2024-04-12 63.0 0.00
2024-04-13 78.0 0.00
2024-04-14 84.0 0.00
2024-04-15 76.0 0.00
2024-04-16 59.0 0.45

47929 rows × 2 columns

In [7]:
# Plot the raw temperature data (TOBS)
cr_df.plot(y='TOBS')
Out[7]:
<Axes: xlabel='DATE'>
No description has been provided for this image
In [8]:
# Subset the data (1920-2020)
cr_1920_2020 = cr_df["1920":"2020"]
cr_1920_2020
Out[8]:
TOBS PRCP
DATE
1920-01-01 2.0 0.00
1920-01-02 -1.0 0.00
1920-01-03 9.0 0.00
1920-01-04 4.0 0.00
1920-01-05 25.0 0.00
... ... ...
2020-12-27 27.0 0.08
2020-12-28 22.0 0.00
2020-12-29 25.0 0.55
2020-12-30 24.0 0.20
2020-12-31 20.0 0.00

36507 rows × 2 columns

In [9]:
# Resample the data to calculate annual mean
cr_annual_mean = cr_1920_2020.resample('YE').mean()
cr_annual_mean
Out[9]:
TOBS PRCP
DATE
1920-12-31 52.401639 0.057240
1921-12-31 56.302198 0.074466
1922-12-31 54.463014 0.074027
1923-12-31 50.188679 0.091575
1924-12-31 52.850365 0.113382
... ... ...
2016-12-31 57.945355 0.117760
2017-12-31 57.249315 0.096055
2018-12-31 54.768802 0.129860
2019-12-31 53.376731 0.122402
2020-12-31 55.397759 0.109890

101 rows × 2 columns

In [10]:
# Calculate temperature in Celsius and add new column
cr_annual_mean['TCel'] = (cr_annual_mean['TOBS'] - 32) * 5/9
cr_annual_mean
Out[10]:
TOBS PRCP TCel
DATE
1920-12-31 52.401639 0.057240 11.334244
1921-12-31 56.302198 0.074466 13.501221
1922-12-31 54.463014 0.074027 12.479452
1923-12-31 50.188679 0.091575 10.104822
1924-12-31 52.850365 0.113382 11.583536
... ... ... ...
2016-12-31 57.945355 0.117760 14.414086
2017-12-31 57.249315 0.096055 14.027397
2018-12-31 54.768802 0.129860 12.649335
2019-12-31 53.376731 0.122402 11.875962
2020-12-31 55.397759 0.109890 12.998755

101 rows × 3 columns

In [11]:
# Load bokeh extension for interactive plots
hv.extension('bokeh')
cr_annual_mean.hvplot(y='TOBS')
No description has been provided for this image No description has been provided for this image
Out[11]:
In [12]:
cr_df_reset = cr_annual_mean.reset_index()
cr_df_reset
Out[12]:
DATE TOBS PRCP TCel
0 1920-12-31 52.401639 0.057240 11.334244
1 1921-12-31 56.302198 0.074466 13.501221
2 1922-12-31 54.463014 0.074027 12.479452
3 1923-12-31 50.188679 0.091575 10.104822
4 1924-12-31 52.850365 0.113382 11.583536
... ... ... ... ...
96 2016-12-31 57.945355 0.117760 14.414086
97 2017-12-31 57.249315 0.096055 14.027397
98 2018-12-31 54.768802 0.129860 12.649335
99 2019-12-31 53.376731 0.122402 11.875962
100 2020-12-31 55.397759 0.109890 12.998755

101 rows × 4 columns

In [13]:
cr_df_reset['year'] = cr_df_reset['DATE'].dt.year
cr_df_reset
Out[13]:
DATE TOBS PRCP TCel year
0 1920-12-31 52.401639 0.057240 11.334244 1920
1 1921-12-31 56.302198 0.074466 13.501221 1921
2 1922-12-31 54.463014 0.074027 12.479452 1922
3 1923-12-31 50.188679 0.091575 10.104822 1923
4 1924-12-31 52.850365 0.113382 11.583536 1924
... ... ... ... ... ...
96 2016-12-31 57.945355 0.117760 14.414086 2016
97 2017-12-31 57.249315 0.096055 14.027397 2017
98 2018-12-31 54.768802 0.129860 12.649335 2018
99 2019-12-31 53.376731 0.122402 11.875962 2019
100 2020-12-31 55.397759 0.109890 12.998755 2020

101 rows × 5 columns

In [14]:
cr_df_reset.set_index('year', inplace=True)
cr_df_reset
Out[14]:
DATE TOBS PRCP TCel
year
1920 1920-12-31 52.401639 0.057240 11.334244
1921 1921-12-31 56.302198 0.074466 13.501221
1922 1922-12-31 54.463014 0.074027 12.479452
1923 1923-12-31 50.188679 0.091575 10.104822
1924 1924-12-31 52.850365 0.113382 11.583536
... ... ... ... ...
2016 2016-12-31 57.945355 0.117760 14.414086
2017 2017-12-31 57.249315 0.096055 14.027397
2018 2018-12-31 54.768802 0.129860 12.649335
2019 2019-12-31 53.376731 0.122402 11.875962
2020 2020-12-31 55.397759 0.109890 12.998755

101 rows × 4 columns

In [15]:
# Plot TOBS using matplotlib and include trendline (w help from ChatGPT)

# Define figure and axis objects, specify figure size
fig, ax = plt.subplots(figsize=(5,3))

# Define x_values using matplotlib mdates
x_values = cr_df_reset.index

# Plot the TCel data as a scatter plot
ax.scatter(x_values,
           cr_df_reset["TOBS"],
           label="Temperature Data",
           color='white',
           edgecolor='black')

# Fit a linear regression model (from ChatGPT)
slope, intercept = np.polyfit(x_values, cr_df_reset["TOBS"], 1)
trendline = slope * x_values + intercept

# Plot trendline and TCel data as line (connecting points in scatter plot)
ax.plot(x_values, trendline, color='red', label="Trendline")
ax.plot(x_values, cr_annual_mean["TOBS"], color='grey', markersize=0.3)

# Add text box with slope value (From ChatGPT)
slope_text = f"Slope: {slope:.3f} °F/yr"
ax.text(0.43, 0.9, slope_text, transform=ax.transAxes, fontsize=10,
        verticalalignment='top', bbox=dict(boxstyle='round', facecolor='whitesmoke', alpha=0.9))

ax.set(title="Mean Annual Temperature\nCedar Rapids, IA (1920-2020)",
       ylabel="Temperature (°F)")

ax.legend()  # Add legend to display labels

plt.show()
No description has been provided for this image
In [16]:
# Plot TCel using matplotlib and include trendline (w help from ChatGPT)

# Define figure and axis objects, specify figure size
fig, ax = plt.subplots(figsize=(5,3))

# Define x_values using matplotlib mdates
x_values = cr_df_reset.index

# Plot the TCel data as a scatter plot
ax.scatter(x_values,
           cr_df_reset["TCel"],
           label="Temperature Data",
           color='white',
           edgecolor='black')

# Fit a linear regression model (from ChatGPT)
slope, intercept = np.polyfit(x_values, cr_df_reset["TCel"], 1)
trendline = slope * x_values + intercept

# Plot trendline and TCel data as line (connecting points in scatter plot)
ax.plot(x_values, trendline, color='red', label="Trendline")
ax.plot(x_values, cr_annual_mean["TCel"], color='grey', markersize=0.3)

# Add text box with slope value (From ChatGPT)
slope_text = f"Slope: {slope:.3f} °C/yr"
ax.text(0.43, 0.9, slope_text, transform=ax.transAxes, fontsize=10,
        verticalalignment='top', bbox=dict(boxstyle='round', facecolor='whitesmoke', alpha=0.9))

ax.set(title="Mean Annual Temperature\nCedar Rapids, IA (1920-2020)",
       ylabel="Temperature (°C)")

ax.legend()  # Add legend to display labels

plt.show()
No description has been provided for this image

Making a map of Cedar Rapids¶

In [18]:
import osmnx as ox
In [19]:
# Define the city name and state
city = 'Cedar Rapids'
state = 'Iowa'

# Fetch the boundary polygon of the city
place_name = f'{city}, {state}, USA'
tags = {'place': 'city', 'name': city}
cr_gdf = ox.geometries_from_place(place_name, tags)
cr_gdf
/tmp/ipykernel_1169/3458676958.py:8: FutureWarning: The `geometries` module and `geometries_from_X` functions have been renamed the `features` module and `features_from_X` functions. Use these instead. The `geometries` module and function names are deprecated and will be removed in the v2.0.0 release.
  cr_gdf = ox.geometries_from_place(place_name, tags)
Out[19]:
ele gnis:feature_id name place population population:date wikidata wikipedia geometry name:en ... tiger:LSAD tiger:MTFCC tiger:NAME tiger:NAMELSAD tiger:PCICBSA tiger:PCINECTA tiger:PLACEFP tiger:PLCIDFP tiger:STATEFP type
element_type osmid
node 151626808 247 465941 Cedar Rapids city 137710 2022 Q486439 en:Cedar Rapids, Iowa POINT (-91.67041 41.97589) Cedar Rapids ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
relation 128355 NaN 2394368 Hiawatha city NaN NaN Q1922650 en:Hiawatha, Iowa POLYGON ((-91.67082 42.06598, -91.67083 42.065... NaN ... 25 G4110 Hiawatha Hiawatha city N N 35940 1935940 19 boundary
129078 NaN 2396389 Robins city NaN NaN Q1926058 en:Robins, Iowa MULTIPOLYGON (((-91.70149 42.10099, -91.69697 ... NaN ... 25 G4110 Robins Robins city N N 67800 1967800 19 boundary
129088 NaN 467567 Cedar Rapids city NaN NaN Q486439 en:Cedar Rapids, Iowa MULTIPOLYGON (((-91.72643 41.98905, -91.72643 ... NaN ... 25 G4110 Cedar Rapids Cedar Rapids city Y N 12000 1912000 19 boundary

4 rows × 33 columns

In [20]:
# Select out just the CR entries from this data
cr = cr_gdf[cr_gdf['name']=='Cedar Rapids']
cr

# Select out just the CR polygon
cr_poly = cr[cr['type']=='boundary']
cr_poly
Out[20]:
ele gnis:feature_id name place population population:date wikidata wikipedia geometry name:en ... tiger:LSAD tiger:MTFCC tiger:NAME tiger:NAMELSAD tiger:PCICBSA tiger:PCINECTA tiger:PLACEFP tiger:PLCIDFP tiger:STATEFP type
element_type osmid
relation 129088 NaN 467567 Cedar Rapids city NaN NaN Q486439 en:Cedar Rapids, Iowa MULTIPOLYGON (((-91.72643 41.98905, -91.72643 ... NaN ... 25 G4110 Cedar Rapids Cedar Rapids city Y N 12000 1912000 19 boundary

1 rows × 33 columns

In [22]:
# Plot gdf using .explore()
cr_gdf.explore()
Out[22]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [26]:
cr_map = cr_poly.hvplot(
    # Givethe map a descriptive title
    title="Cedar Rapids, IA",
    # Add a basemap
    geo=True, tiles='EsriImagery',
    # Change the colors
    fill_color='white', fill_alpha=0.2,
    line_color='skyblue', line_width=5,
    # Change the image size
    frame_width=800, frame_height=800)

# Display the map
cr_map
Out[26]:

Cedar Rapids is slowly getting warmer¶

Between 1920 and 2020, the averge annual temperature in Cedar Rapids, IA increased at a rate of 0.032 °C/yr; 3.2 °C/century [0.058 °F/yr; 5.8 °F/century]. During this time period 1957 and 2012 had the highest average observed temperatures at nearly 16 °C (~60 °F).

In [25]:
%%capture
%%bash
jupyter nbconvert *.ipynb --to html