Project Set 3
Functions, modules, NumPy, Matplotlib.
Remember that you need to add the import numpy
(or commonly used import numpy as np
) statement in your script before you can use the numpy package. Similarly, we must import Matplotlib.
import numpy as np
import matplotlib.pyplot as plt
Project 10
Write a Python script that performs the following operations:
a) Create a numpy array, x, of values from -1.0 to 1.0 inclusive, with step sizes of 0.01. Use numpy.pi (or np.pi) for pi.
b) Create a numpy array, y, where y = sin(pix) + cos(3pix/2)
c) Determine and print the mean y value.
d) Determine and print the minimum value of y and the maximum value of y over the range of x. Also print the corresponding x coordinates where y reaches a minimum and where y reaches a maximum. Hint: Look up the argmin and argmax functions. Pay attention to the difference between index and value.
e) Go back to the first chapter and review how to make a plot of y versus x using Matplotlib. Add code to plot y as a function of x.
f) Add a line to accept user input to specify the x start, x end, and stride values. Your finished code should get these values from the user, print the values and x-coordinate of the max and min for y, and display a plot of y versus x. Upload the plot for input values of starting x=-2., ending x=2., stride=.01.
Example solution
import numpy as np
x = np.arange(0.0,50.0,0.01)
y = np.sin(x) + np.cos(1.4*x) + 0.1*x
mean_y = y.mean()
min_index = np.argmin(y) #np.where(y==y.min())
max_index = np.argmax(y) #np.where(y==y.max())
print (f"mean y:{mean_y}")
print (f"max y:{y[min_index]} at x={x[min_index]}")
print (f"max y:{y[max_index]} at x={x[max_index]}")
Project 11
You may use your choice of lists or NumPy arrays for this project.
Download the file cpi.csv This file is the value of $100 from a baseline in January 1913 (the oldest consistent data available) to January 2020, as computed by the US Bureau of Labor Statistics; this gives us the consumer price index. The ratio of any two years is an estimate of the change in cost of living from the one to the other.
a) Write a function that takes two years and returns this ratio. Note that the order will matter.
The derivative of the CPI is a crude estimate of the inflation rate. Write a function that takes two arrays and returns another array using the formula infl_rate=(cpi[yr+1]-cpi[yr])/12
Note that the returned array will have one fewer element than the original.
b) Write a program that uses your two functions to
read cpi.csv by any means you choose.
Request two years and a price from the user. Compute the corresponding price for the second year provided by the user.
Plot the CPI and your approximation to the inflation rate.
Plot inflation rate versus the midpoint of the years (so would start in
June 1913, i.e. 1913.5, and end June 2019, i.e. 2019.5).
c) In 1954 a color TV cost approximately $1295 in that year’s dollars. How much would that be in 2020 (as of January) dollars?
d) Look at the Matplotlib documentation and find how to use subplots to plot CPI and inflation rate one above the other.
e) (More advanced) Read about exceptions in the Files chapter and add them to your code to test for the existence of the file before attempting to open it. Add with/as to the reading of the file (you can surround a numpy call with this also).
f) (More advanced) Convert your file with the functions into a module. Isolate the calls to input/output, including plotting, by using a main() function. Add the if __name__=="__main__"
so that it will not be invoked if you import your module.
Example solution
"""inflation.py
This program reads Consumer Price Index data and computes an approximation
to the inflation rate
@author: K. Holcomb
@change: Modifications for Introduction to Python 20220202
@change: Initial code 20130210
"""
import numpy as np
import matplotlib.pyplot as plt
import csv
import sys
import math
def prices(years,cpi,year1,year2):
y1=math.nan
y2=math.nan
for y in range(len(years)):
if years[y]==year1: y1=y
if years[y]==year2: y2=y
if math.isnan(y1) or math.isnan(y2):
return None
else:
return cpi[y2]/cpi[y1]
def inflate(years,cpi):
inflation=np.zeros(nyears-1)
for n in range(nyears-1):
inflation[n]=(cpi[n+1]-cpi[n])/12.
return inflation
if len(sys.argv) < 2:
sys.exit("No data file specified")
compare=int(float(input("Do you want to compare prices, 1 yes, 0 no:")))
if ( compare ):
year1=int(float(input("Please enter the reference year:")))
year2=int(float(input("Please enter the second year:")))
amount=int(float(input("Please enter the amount to compare:")))
try:
infile=sys.argv[1]
except (ValueError,TypeError):
sys.exit("Invalid file name")
years_list=[]
cpi_list=[]
with open(infile,"r") as fin:
#Skip header.
fin.readline()
for line in fin:
(yr,factor)=line.rstrip("\r\n").split(",")
years_list.append(float(yr))
cpi_list.append(float(factor))
years =np.array(years_list)
cpi =np.array(cpi_list)
nyears=np.size(years)
if ( compare ):
ratio=prices(years,cpi,year1,year2)
if ratio is None:
print("One or more year specified is outside the data range")
sys.exit()
else:
print("An item costing {:.2f} in {:} is about {:.2f} in {:}".format(amount,year1,ratio*amount,year2))
inflation=inflate(years,cpi)
fig, axs = plt.subplots(2, 1)
axs[0].plot(years, cpi)
axs[0].set_ylabel('CPI')
axs[1].plot(years[:-1], inflation)
axs[1].set_xlabel('year')
axs[1].set_ylabel('Inflation')
plt.show()
Project 12
Write a program that reads the file bodyfat.csv.
-
Extract the body fat percentage, weight, and height data from each row (the first, third, and fourth columns). We do not need the age data for the current project.
-
Use your BMI function as a ufunc to compute all the BMI values at once and return a new array.
-
In your bmistats.py file, add a
main
function that runs a test by computing the BMI for a set of heights and weights and returning the category corresponding to that BMI. Compute the correct results and add code to compare them with your code’s results.
Plot the BMI versus bodyfat as a scatterplot.
- Add the if name=="main" code so that
main
will be executed only if the file is the main module. Run your tests.
The bodyfat.csv file contains an outlier, probably due to a typo. Add a function to your bmistats file to find outliers. Find the outlier in your list and remove it (don’t forget to remove the corresponding bodyfat element).
You may use the np.percentile(a,m)
function to compute the upper and lower quartiles using the IQR. See this
example.
Plot the corrected data.
Example solution
"""This program reads body-fat data, computes BMI (with appropriate units
conversions), cleans the data, and plots the data before and after cleaning.
Author: K. Holcomb
Changelog: Initial version 20130225
"""
import sys
import numpy as np
import matplotlib.pylab as plt
from scipy import stats
#Global constants
in2m =0.0254
oz2kg=0.0283495
def imp_ht_to_metric(ft,inch):
"""Converts imperial length units to meters"""
return (ft*12.0+inch)*in2m
def imp_wt_to_metric(lb,oz):
"""Converts pounds and ounces to kg"""
return (lb*16.0+oz)*oz2kg
def BMI(ht,wt):
"""Returns BMI from ht in meters and wt in kg"""
return wt/ht**2
def reject_outliers(array):
"""Returns a mask of values of array below the outlier value."""
"""Kind of a dumb method. Should use Chauvenet criterion at least."""
cutoff=np.percentile(array,99)
return array<cutoff
def main():
#Get name of file from the command line
if len(sys.argv) < 2:
print("No file specified")
sys.exit()
infile=sys.argv[1]
try:
(bodyfat,weight,height)=np.loadtxt(infile,usecols=(0,2,3), \
unpack=True,skiprows=1,delimiter=',')
except IOError:
print("Cannot read file")
sys.exit()
bmi=BMI(imp_ht_to_metric(0.,height),(imp_wt_to_metric(weight,0.)))
plt.figure()
plt.scatter(bodyfat,bmi)
valid=reject_outliers(bmi)
bodyfat=bodyfat[valid]
bmi=bmi[valid]
slope, intercept, r_value, p_value, std_err = stats.linregress(bodyfat,bmi)
linear_fit=bodyfat*slope+intercept
print("Mean BMI is %.2f, median is %.2f, and standard deviation is %.2f" % \
(bmi.mean(),np.median(bmi),bmi.std()))
print("The least-squares regression gives a r-value of %.2f" % r_value)
plt.figure()
plt.scatter(bodyfat,bmi)
plt.plot(bodyfat,linear_fit)
plt.figure()
#Histogram using WHO BMI categories with somewhat arbitrary upper
#and lower bounds.
bin_bounds=[5.,16.,18.5,25.,30.,35.,40.,60.]
plt.hist(bmi,bins=bin_bounds)
plt.show()
if __name__=='__main__':
main()
Project 13
Download cville_2017_april.csv, which contains April 2017 weather data for Charlottesville, VA.
- Read the file into appropriate NumPy arrays
- Make a line plot of average wind speed for each day
- Add main titles, and label axes to be more descriptive
- Play with the bottom axis (x-axis) to make sure all dates are visible
- Make a bar and line plot showing average wind speed (in bars) and max wind gust (as a line). Add legend to distinguish between the two.
- Make stacked bar chart of minimum and maximum temperatures for each day
- Make grouped bar chart of minimum and maximum temperatures for each day
- Plot the number of each weather ‘condition’. Plot sunny days, partly cloudy days, and rain days. There are several ways to do this.
Example solution
import numpy as np
import matplotlib.pyplot as plt
date=[]
hi_temp_list=[]
lo_temp_list=[]
avg_wind_list=[]
wind_gust_list=[]
precip_list=[]
condition=[]
with open('cville_2017_april.csv', 'r') as file:
file.readline() #skip header
for line in file:
row=line.strip("\r\n").split(',')
date.append(row[0])
hi_temp_list.append(float(row[1]))
lo_temp_list.append(float(row[2]))
avg_wind_list.append(float(row[3]))
wind_gust_list.append(float(row[4]))
precip_list.append(float(row[5]))
condition.append(row[6])
hi_temp=np.array(hi_temp_list)
lo_temp=np.array(lo_temp_list)
avg_wind=np.array(avg_wind_list)
wind_gust=np.array(wind_gust_list)
precip=np.array(precip_list)
xlabels=date
fig,axs1=plt.subplots()
axs1.set_title("Average Wind Speed")
axs1.set_xticklabels(rotation=45,labels=xlabels)
plt.xlabel("Date")
plt.ylabel("Speed in MPH")
plt.plot(avg_wind)
plt.show()
fig,axs2=plt.subplots()
axs2.set_title("Average Wind Speed and Gusts")
plt.plot(wind_gust,color='red')
plt.bar(xlabels,avg_wind)
axs2.xaxis.set_major_locator(plt.MaxNLocator(8))
axs2.set_xticklabels(rotation=45,labels=xlabels)
plt.xlabel("Date")
plt.ylabel("Speed in MPH")
axs2.legend()
plt.show()
#How to make a stacked chart if you want the values added
fig,axs3=plt.subplots()
axs3.set_title("Stacked High and Low Temperatures")
plt.bar(xlabels,lo_temp)
plt.bar(xlabels,hi_temp,bottom=lo_temp)
axs3.xaxis.set_major_locator(plt.MaxNLocator(8))
axs3.set_xticklabels(rotation=45,labels=xlabels)
plt.xlabel("Date")
plt.ylabel("Temperature (F)")
plt.show()
#What we really want
fig,axs4=plt.subplots()
x = np.arange(len(xlabels))
#Calculate a reasonable width (found on StackOverflow)
width = np.min(np.diff(x))/3.
plt.bar(x-width/2,hi_temp,width=width,label="High",align="center")
plt.bar(x+width/2,lo_temp,width=width,label="Low",align="center")
axs4.xaxis.set_major_locator(plt.MaxNLocator(8))
axs4.set_xticklabels(rotation=45,labels=xlabels)
plt.xlabel("Date")
plt.ylabel("Temperature (F)")
plt.title("Daily High and Low Temperatures")
plt.legend(labels=["High","Low"])
plt.show()
days={}
for i in range(len(condition)):
if condition[i] not in days:
days[condition[i]]=1
else:
days[condition[i]]+=1
print(days)
fig,axs5=plt.subplots()
plt.title("Days Per Cloud Condition")
plt.ylabel("Count")
sky_condition=list(days.keys())
sky_count=days.values()
x=np.arange(len(days))
plt.bar(sky_condition,sky_count)
plt.show()