Looking at historic climate patterns in Cedar Rapids, IA¶
I chose to look at Cedar Rapids, Iowa in the United States for this portfolio post. Located in east central Iowa, Cedar Rapids is the second largest city in the state, population-wise behind the capital, Des Moines. I grew up here (4th grade through high school) and went to college just down the road in Iowa City. I moved to Cedar Rapids in 1990 and lived in eastern Iowa until 2020.
For this analysis I'll be using data from the NOAA NCEI Climate Data Online network (network ID: GHCND:USC00131319). This station has been in service since 1892 and has good coverage (99%). It has been relocated since originally coming online and is now currently located in what is technically Marion, IA a suburb of Cedar Rapids.
# Import python libraries
import pandas as pd
import holoviews as hv
import hvplot.pandas
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.dates import DateFormatter
# Work with vector data
import geopandas as gpd
# Save maps and plots to files
import holoviews as hv
# Create interactive maps and plots
import hvplot.pandas
# Search for locations by name - this might take a moment
import osmnx as ox
from osmnx import features as osm
# Generate API endpoint request for Cedar Rapids, IA (station id: USC00131319)
cr_url = ('https://www.ncei.noaa.gov/access/services/data/v1?'
'dataset=daily-summaries'
'&dataTypes=TOBS,PRCP'
'&stations=USC00131319'
'&startDate=1892-01-01'
'&endDate=2024-04-16'
'&includeStationName=true'
'&includeStationLocation=1'
'&units=standard')
cr_url
'https://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=TOBS,PRCP&stations=USC00131319&startDate=1892-01-01&endDate=2024-04-16&includeStationName=true&includeStationLocation=1&units=standard'
# Open data using pandas
cr_df = pd.read_csv(cr_url,
index_col='DATE',
na_values=['NaN'],
parse_dates=True)
cr_df
STATION | NAME | LATITUDE | LONGITUDE | ELEVATION | PRCP | TOBS | |
---|---|---|---|---|---|---|---|
DATE | |||||||
1892-01-01 | USC00131319 | CEDAR RAPIDS NUMBER 1, IA US | 42.0496 | -91.5881 | 247.8 | 1.68 | NaN |
1892-01-02 | USC00131319 | CEDAR RAPIDS NUMBER 1, IA US | 42.0496 | -91.5881 | 247.8 | 0.67 | NaN |
1892-01-03 | USC00131319 | CEDAR RAPIDS NUMBER 1, IA US | 42.0496 | -91.5881 | 247.8 | 0.00 | NaN |
1892-01-04 | USC00131319 | CEDAR RAPIDS NUMBER 1, IA US | 42.0496 | -91.5881 | 247.8 | 0.00 | NaN |
1892-01-05 | USC00131319 | CEDAR RAPIDS NUMBER 1, IA US | 42.0496 | -91.5881 | 247.8 | 0.00 | NaN |
... | ... | ... | ... | ... | ... | ... | ... |
2024-04-12 | USC00131319 | CEDAR RAPIDS NUMBER 1, IA US | 42.0496 | -91.5881 | 247.8 | 0.00 | 63.0 |
2024-04-13 | USC00131319 | CEDAR RAPIDS NUMBER 1, IA US | 42.0496 | -91.5881 | 247.8 | 0.00 | 78.0 |
2024-04-14 | USC00131319 | CEDAR RAPIDS NUMBER 1, IA US | 42.0496 | -91.5881 | 247.8 | 0.00 | 84.0 |
2024-04-15 | USC00131319 | CEDAR RAPIDS NUMBER 1, IA US | 42.0496 | -91.5881 | 247.8 | 0.00 | 76.0 |
2024-04-16 | USC00131319 | CEDAR RAPIDS NUMBER 1, IA US | 42.0496 | -91.5881 | 247.8 | 0.45 | 59.0 |
47929 rows × 7 columns
# Clean up the data (drop columns that don't vary with time)
cr_df = cr_df[['TOBS','PRCP']]
cr_df
TOBS | PRCP | |
---|---|---|
DATE | ||
1892-01-01 | NaN | 1.68 |
1892-01-02 | NaN | 0.67 |
1892-01-03 | NaN | 0.00 |
1892-01-04 | NaN | 0.00 |
1892-01-05 | NaN | 0.00 |
... | ... | ... |
2024-04-12 | 63.0 | 0.00 |
2024-04-13 | 78.0 | 0.00 |
2024-04-14 | 84.0 | 0.00 |
2024-04-15 | 76.0 | 0.00 |
2024-04-16 | 59.0 | 0.45 |
47929 rows × 2 columns
# Plot the raw temperature data (TOBS)
cr_df.plot(y='TOBS')
<Axes: xlabel='DATE'>
# Subset the data (1920-2020)
cr_1920_2020 = cr_df["1920":"2020"]
cr_1920_2020
TOBS | PRCP | |
---|---|---|
DATE | ||
1920-01-01 | 2.0 | 0.00 |
1920-01-02 | -1.0 | 0.00 |
1920-01-03 | 9.0 | 0.00 |
1920-01-04 | 4.0 | 0.00 |
1920-01-05 | 25.0 | 0.00 |
... | ... | ... |
2020-12-27 | 27.0 | 0.08 |
2020-12-28 | 22.0 | 0.00 |
2020-12-29 | 25.0 | 0.55 |
2020-12-30 | 24.0 | 0.20 |
2020-12-31 | 20.0 | 0.00 |
36507 rows × 2 columns
# Resample the data to calculate annual mean
cr_annual_mean = cr_1920_2020.resample('YE').mean()
cr_annual_mean
TOBS | PRCP | |
---|---|---|
DATE | ||
1920-12-31 | 52.401639 | 0.057240 |
1921-12-31 | 56.302198 | 0.074466 |
1922-12-31 | 54.463014 | 0.074027 |
1923-12-31 | 50.188679 | 0.091575 |
1924-12-31 | 52.850365 | 0.113382 |
... | ... | ... |
2016-12-31 | 57.945355 | 0.117760 |
2017-12-31 | 57.249315 | 0.096055 |
2018-12-31 | 54.768802 | 0.129860 |
2019-12-31 | 53.376731 | 0.122402 |
2020-12-31 | 55.397759 | 0.109890 |
101 rows × 2 columns
# Calculate temperature in Celsius and add new column
cr_annual_mean['TCel'] = (cr_annual_mean['TOBS'] - 32) * 5/9
cr_annual_mean
TOBS | PRCP | TCel | |
---|---|---|---|
DATE | |||
1920-12-31 | 52.401639 | 0.057240 | 11.334244 |
1921-12-31 | 56.302198 | 0.074466 | 13.501221 |
1922-12-31 | 54.463014 | 0.074027 | 12.479452 |
1923-12-31 | 50.188679 | 0.091575 | 10.104822 |
1924-12-31 | 52.850365 | 0.113382 | 11.583536 |
... | ... | ... | ... |
2016-12-31 | 57.945355 | 0.117760 | 14.414086 |
2017-12-31 | 57.249315 | 0.096055 | 14.027397 |
2018-12-31 | 54.768802 | 0.129860 | 12.649335 |
2019-12-31 | 53.376731 | 0.122402 | 11.875962 |
2020-12-31 | 55.397759 | 0.109890 | 12.998755 |
101 rows × 3 columns
# Load bokeh extension for interactive plots
hv.extension('bokeh')
cr_annual_mean.hvplot(y='TOBS')
cr_df_reset = cr_annual_mean.reset_index()
cr_df_reset
DATE | TOBS | PRCP | TCel | |
---|---|---|---|---|
0 | 1920-12-31 | 52.401639 | 0.057240 | 11.334244 |
1 | 1921-12-31 | 56.302198 | 0.074466 | 13.501221 |
2 | 1922-12-31 | 54.463014 | 0.074027 | 12.479452 |
3 | 1923-12-31 | 50.188679 | 0.091575 | 10.104822 |
4 | 1924-12-31 | 52.850365 | 0.113382 | 11.583536 |
... | ... | ... | ... | ... |
96 | 2016-12-31 | 57.945355 | 0.117760 | 14.414086 |
97 | 2017-12-31 | 57.249315 | 0.096055 | 14.027397 |
98 | 2018-12-31 | 54.768802 | 0.129860 | 12.649335 |
99 | 2019-12-31 | 53.376731 | 0.122402 | 11.875962 |
100 | 2020-12-31 | 55.397759 | 0.109890 | 12.998755 |
101 rows × 4 columns
cr_df_reset['year'] = cr_df_reset['DATE'].dt.year
cr_df_reset
DATE | TOBS | PRCP | TCel | year | |
---|---|---|---|---|---|
0 | 1920-12-31 | 52.401639 | 0.057240 | 11.334244 | 1920 |
1 | 1921-12-31 | 56.302198 | 0.074466 | 13.501221 | 1921 |
2 | 1922-12-31 | 54.463014 | 0.074027 | 12.479452 | 1922 |
3 | 1923-12-31 | 50.188679 | 0.091575 | 10.104822 | 1923 |
4 | 1924-12-31 | 52.850365 | 0.113382 | 11.583536 | 1924 |
... | ... | ... | ... | ... | ... |
96 | 2016-12-31 | 57.945355 | 0.117760 | 14.414086 | 2016 |
97 | 2017-12-31 | 57.249315 | 0.096055 | 14.027397 | 2017 |
98 | 2018-12-31 | 54.768802 | 0.129860 | 12.649335 | 2018 |
99 | 2019-12-31 | 53.376731 | 0.122402 | 11.875962 | 2019 |
100 | 2020-12-31 | 55.397759 | 0.109890 | 12.998755 | 2020 |
101 rows × 5 columns
cr_df_reset.set_index('year', inplace=True)
cr_df_reset
DATE | TOBS | PRCP | TCel | |
---|---|---|---|---|
year | ||||
1920 | 1920-12-31 | 52.401639 | 0.057240 | 11.334244 |
1921 | 1921-12-31 | 56.302198 | 0.074466 | 13.501221 |
1922 | 1922-12-31 | 54.463014 | 0.074027 | 12.479452 |
1923 | 1923-12-31 | 50.188679 | 0.091575 | 10.104822 |
1924 | 1924-12-31 | 52.850365 | 0.113382 | 11.583536 |
... | ... | ... | ... | ... |
2016 | 2016-12-31 | 57.945355 | 0.117760 | 14.414086 |
2017 | 2017-12-31 | 57.249315 | 0.096055 | 14.027397 |
2018 | 2018-12-31 | 54.768802 | 0.129860 | 12.649335 |
2019 | 2019-12-31 | 53.376731 | 0.122402 | 11.875962 |
2020 | 2020-12-31 | 55.397759 | 0.109890 | 12.998755 |
101 rows × 4 columns
# Plot TOBS using matplotlib and include trendline (w help from ChatGPT)
# Define figure and axis objects, specify figure size
fig, ax = plt.subplots(figsize=(5,3))
# Define x_values using matplotlib mdates
x_values = cr_df_reset.index
# Plot the TCel data as a scatter plot
ax.scatter(x_values,
cr_df_reset["TOBS"],
label="Temperature Data",
color='white',
edgecolor='black')
# Fit a linear regression model (from ChatGPT)
slope, intercept = np.polyfit(x_values, cr_df_reset["TOBS"], 1)
trendline = slope * x_values + intercept
# Plot trendline and TCel data as line (connecting points in scatter plot)
ax.plot(x_values, trendline, color='red', label="Trendline")
ax.plot(x_values, cr_annual_mean["TOBS"], color='grey', markersize=0.3)
# Add text box with slope value (From ChatGPT)
slope_text = f"Slope: {slope:.3f} °F/yr"
ax.text(0.43, 0.9, slope_text, transform=ax.transAxes, fontsize=10,
verticalalignment='top', bbox=dict(boxstyle='round', facecolor='whitesmoke', alpha=0.9))
ax.set(title="Mean Annual Temperature\nCedar Rapids, IA (1920-2020)",
ylabel="Temperature (°F)")
ax.legend() # Add legend to display labels
plt.show()
# Plot TCel using matplotlib and include trendline (w help from ChatGPT)
# Define figure and axis objects, specify figure size
fig, ax = plt.subplots(figsize=(5,3))
# Define x_values using matplotlib mdates
x_values = cr_df_reset.index
# Plot the TCel data as a scatter plot
ax.scatter(x_values,
cr_df_reset["TCel"],
label="Temperature Data",
color='white',
edgecolor='black')
# Fit a linear regression model (from ChatGPT)
slope, intercept = np.polyfit(x_values, cr_df_reset["TCel"], 1)
trendline = slope * x_values + intercept
# Plot trendline and TCel data as line (connecting points in scatter plot)
ax.plot(x_values, trendline, color='red', label="Trendline")
ax.plot(x_values, cr_annual_mean["TCel"], color='grey', markersize=0.3)
# Add text box with slope value (From ChatGPT)
slope_text = f"Slope: {slope:.3f} °C/yr"
ax.text(0.43, 0.9, slope_text, transform=ax.transAxes, fontsize=10,
verticalalignment='top', bbox=dict(boxstyle='round', facecolor='whitesmoke', alpha=0.9))
ax.set(title="Mean Annual Temperature\nCedar Rapids, IA (1920-2020)",
ylabel="Temperature (°C)")
ax.legend() # Add legend to display labels
plt.show()
Making a map of Cedar Rapids¶
import osmnx as ox
# Define the city name and state
city = 'Cedar Rapids'
state = 'Iowa'
# Fetch the boundary polygon of the city
place_name = f'{city}, {state}, USA'
tags = {'place': 'city', 'name': city}
cr_gdf = ox.geometries_from_place(place_name, tags)
cr_gdf
/tmp/ipykernel_1169/3458676958.py:8: FutureWarning: The `geometries` module and `geometries_from_X` functions have been renamed the `features` module and `features_from_X` functions. Use these instead. The `geometries` module and function names are deprecated and will be removed in the v2.0.0 release. cr_gdf = ox.geometries_from_place(place_name, tags)
ele | gnis:feature_id | name | place | population | population:date | wikidata | wikipedia | geometry | name:en | ... | tiger:LSAD | tiger:MTFCC | tiger:NAME | tiger:NAMELSAD | tiger:PCICBSA | tiger:PCINECTA | tiger:PLACEFP | tiger:PLCIDFP | tiger:STATEFP | type | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
element_type | osmid | |||||||||||||||||||||
node | 151626808 | 247 | 465941 | Cedar Rapids | city | 137710 | 2022 | Q486439 | en:Cedar Rapids, Iowa | POINT (-91.67041 41.97589) | Cedar Rapids | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
relation | 128355 | NaN | 2394368 | Hiawatha | city | NaN | NaN | Q1922650 | en:Hiawatha, Iowa | POLYGON ((-91.67082 42.06598, -91.67083 42.065... | NaN | ... | 25 | G4110 | Hiawatha | Hiawatha city | N | N | 35940 | 1935940 | 19 | boundary |
129078 | NaN | 2396389 | Robins | city | NaN | NaN | Q1926058 | en:Robins, Iowa | MULTIPOLYGON (((-91.70149 42.10099, -91.69697 ... | NaN | ... | 25 | G4110 | Robins | Robins city | N | N | 67800 | 1967800 | 19 | boundary | |
129088 | NaN | 467567 | Cedar Rapids | city | NaN | NaN | Q486439 | en:Cedar Rapids, Iowa | MULTIPOLYGON (((-91.72643 41.98905, -91.72643 ... | NaN | ... | 25 | G4110 | Cedar Rapids | Cedar Rapids city | Y | N | 12000 | 1912000 | 19 | boundary |
4 rows × 33 columns
# Select out just the CR entries from this data
cr = cr_gdf[cr_gdf['name']=='Cedar Rapids']
cr
# Select out just the CR polygon
cr_poly = cr[cr['type']=='boundary']
cr_poly
ele | gnis:feature_id | name | place | population | population:date | wikidata | wikipedia | geometry | name:en | ... | tiger:LSAD | tiger:MTFCC | tiger:NAME | tiger:NAMELSAD | tiger:PCICBSA | tiger:PCINECTA | tiger:PLACEFP | tiger:PLCIDFP | tiger:STATEFP | type | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
element_type | osmid | |||||||||||||||||||||
relation | 129088 | NaN | 467567 | Cedar Rapids | city | NaN | NaN | Q486439 | en:Cedar Rapids, Iowa | MULTIPOLYGON (((-91.72643 41.98905, -91.72643 ... | NaN | ... | 25 | G4110 | Cedar Rapids | Cedar Rapids city | Y | N | 12000 | 1912000 | 19 | boundary |
1 rows × 33 columns
# Plot gdf using .explore()
cr_gdf.explore()
cr_map = cr_poly.hvplot(
# Givethe map a descriptive title
title="Cedar Rapids, IA",
# Add a basemap
geo=True, tiles='EsriImagery',
# Change the colors
fill_color='white', fill_alpha=0.2,
line_color='skyblue', line_width=5,
# Change the image size
frame_width=800, frame_height=800)
# Display the map
cr_map
Cedar Rapids is slowly getting warmer¶
Between 1920 and 2020, the averge annual temperature in Cedar Rapids, IA increased at a rate of 0.032 °C/yr; 3.2 °C/century [0.058 °F/yr; 5.8 °F/century]. During this time period 1957 and 2012 had the highest average observed temperatures at nearly 16 °C (~60 °F).
%%capture
%%bash
jupyter nbconvert *.ipynb --to html