# Project Set 2

Input/output, strings, dictionaries, functions, modules.

A. Write a program that obtains the sum of the numbers from 1 to some specified positive (>0) integer N. Request the value of N as console input from the user. Your program should catch user inputs that cannot be converted to integers greater than 0. Do not use the Gauss formula to compute the sum; do this via “brute force.” Print the number, its sum as obtained from your work, and the correct answer from the Gauss formula, sum(N) = N(N+1)/2. Test your program with N=1, N=25, and N=1000.

Example solution

n_str = input("Please enter an integer N > 0: ")
try:
    N = int(n_str)
    if N > 0:
        b_sum = 0
        for number in range(1, N+1):
            b_sum = b_sum + number
        print(f"N = {N}, sum (brute force): {b_sum}, sum (Gauss formula): {N*(N+1)//2}.")
    else:
        print("Please enter an integer greater than 0.")
except ValueError:
    # int() raises ValueError when the text is not a valid integer.
    print(f"The entered value {n_str} cannot be converted to an integer.")
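One way to run the three requested test cases without retyping them is to wrap both methods in functions and loop over the values. This is a minimal sketch; the function names `brute_force_sum` and `gauss_sum` are hypothetical and not part of the example solution.

```python
def brute_force_sum(n):
    """Add the integers 1..n one at a time (hypothetical helper name)."""
    total = 0
    for number in range(1, n + 1):
        total += number
    return total


def gauss_sum(n):
    """Closed-form sum N(N+1)/2; integer division keeps the result an int."""
    return n * (n + 1) // 2


# The three test values named in the exercise.
for n in (1, 25, 1000):
    print(n, brute_force_sum(n), gauss_sum(n))
```

The two columns of sums should agree on every line; a mismatch would point to an off-by-one error in the loop bounds.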



B. Convert your code to sum from 1 to N into a function. Write a main program that requests a number M from the user, then prints a table of the integers from 1 to M and their corresponding sums.

Example solution

"""
This program computes and prints a table of the sums of the integers from 1 to M
"""

def sumit(N):
    #Negative input has no meaningful sum here.
    if N<0:
        return None
    sum_it=0
    if N==0:
        return N
    for i in range(1,N+1):
        sum_it+=i
    return sum_it

#Main program: request M from the user.
M=int(input("Please enter an integer M > 0: "))

#Header.  ^ center-justifies. See documentation for more formatting tricks.
print("{:^10s}    {:^10s}".format("Integer","Sum"))
for i in range(1,M+1):
    print("{:^10d}    {:^10d}".format(i, sumit(i)))
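As an illustration of the alignment characters mentioned in the header comment, the sketch below formats the same integer three ways. The helper name `aligned` is hypothetical; it only wraps the built-in `format` function.

```python
def aligned(value, align, width=10):
    """Format an integer in a field of the given width (hypothetical helper).

    align is one of "<" (left), "^" (center), or ">" (right).
    """
    return format(value, f"{align}{width}d")


# Brackets make the padding visible.
for align in ("<", "^", ">"):
    print("[" + aligned(42, align) + "]")
```

The same alignment characters work inside `str.format` fields and f-strings, so the table header and rows can be switched to left or right justification by changing a single character.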



## Project 7

Download the file us-state-capitals.csv. Write a program that will read this file and create a dictionary with the state name as the key and the capital name as the value. Using your dictionary, print the capitals of Arkansas, Virginia, and Wyoming.

Again using your dictionary, generate a list of all state capitals that begin with the letter ‘A’. Use the list to create a string consisting of these city names separated by semicolons (;). Open a new file capitals-with-a.txt and write this string to it.

Example solution

"""
Dictionary practice
Author:  K. Holcomb
"""

fin=open("us-state-capitals.csv")

states={}
for line in fin:
    state_cap=line.strip("\r\n").split(",")
    #Assumes the state name is in the first column and the capital in the second.
    if len(state_cap)>=2 and state_cap[0] not in states:
        states[state_cap[0]]=state_cap[1]

fin.close()

for statelist in ["Arkansas","Virginia","Wyoming"]:
    if statelist in states:
        print("The capital of {:} is {:}".format(statelist,states[statelist]))

caps_in_A=[]
for state in states:
    if states[state].startswith("A"):
        caps_in_A.append(states[state])

#Just for fun, let's alphabetize them.
caps_in_A.sort()

caps_string=";".join(caps_in_A)

#Checking
print(caps_string)

fout=open("capitals-with-a.txt","w")
fout.write(caps_string+"\n")
fout.close()
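The same steps can also be sketched with the standard `csv` module, which handles the line splitting. The example below reads from an in-memory string so that it runs without the data file; the sample rows and the assumed layout (state, then capital, no header row) are illustrative assumptions, not a copy of us-state-capitals.csv.

```python
import csv
import io

# A few rows standing in for the real file. Assumed layout: "State,Capital"
# on each line with no header row.
sample = io.StringIO(
    "Arkansas,Little Rock\n"
    "Georgia,Atlanta\n"
    "New York,Albany\n"
    "Texas,Austin\n"
    "Virginia,Richmond\n"
)

states = {}
for row in csv.reader(sample):
    if len(row) >= 2:               # skip blank or malformed lines
        states[row[0].strip()] = row[1].strip()

# Collect and alphabetize the capitals that start with "A".
caps_in_A = sorted(cap for cap in states.values() if cap.startswith("A"))
caps_string = ";".join(caps_in_A)
print(caps_string)  # Albany;Atlanta;Austin
```

To use the real file, replacing the `io.StringIO` object with `open("us-state-capitals.csv", newline="")` inside a `with` statement should be enough, provided the file matches the assumed layout.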



## Project 8

Write a program to analyze a DNA sequence file. The program should contain at minimum a function countBases which takes a sequence consisting of letters ATCG, with each letter the symbol for a base, and returns a dictionary where the key is the letter and the value is the number of times it appears in the sequence. The program should contain another function printBaseComposition which takes the dictionary and prints a table of the proportions of each base, e.g.

• A : 0.25
• T : 0.25
• C : 0.25
• G : 0.25

Use the file HIV.txt to test your program. You can look at the file in a text editor. It consists of a label followed by a space followed by a sequence, for each sequence in the file.

Hints: read each sequence as a line. Split the line on whitespace (rstrip first) and throw out the 0th element. Copy the next element of the list into a string, and use substrings to extract each letter. Build your dictionary as you step through the string. Repeat for the next line until you have read all the lines.

Example solution

"""
This program reads a gene file and creates a
dictionary of bases with the count as the value.

Author:    A. Programmer
"""

bases='ATCG'

def countBases(DNA):
    DNAcounts={'A':0,'T':0,'C':0,'G':0}
    for base in DNA:
        if base in bases:
            DNAcounts[base]+=1
    return DNAcounts

def printBaseComposition(DNAcounts):
    total=float(DNAcounts['A']+DNAcounts['T']+DNAcounts['C']+DNAcounts['G'])

    for base in bases:
        print("%s:%.4f" % (base,DNAcounts[base]/total))

#In a real code you should read the name of the file from the command
#line (using sys.argv) or ask the user for the name.
infile="Homo_sapiens-APC.txt"

#Each line is assumed to hold a label, whitespace, then the sequence,
#as described in the problem statement. Drop the label and accumulate
#the sequences into one string.
DNA=""
with open(infile,'r') as fin:
    for line in fin:
        fields=line.rstrip().split()
        if len(fields)>1:
            DNA+=fields[1]

DNAcounts=countBases(DNA)
printBaseComposition(DNAcounts)
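The parsing steps from the hints can be tried on a single line before touching the data file. The sketch below uses a made-up label and sequence, and counts with `collections.Counter` as an alternative to building the dictionary by hand; the function name `count_bases_counter` is hypothetical.

```python
from collections import Counter


def count_bases_counter(sequence, bases="ATCG"):
    """Count bases with collections.Counter (hypothetical alternative to countBases)."""
    counts = Counter(base for base in sequence if base in bases)
    # Return a plain dict that always has all four keys, even for zero counts.
    return {base: counts[base] for base in bases}


line = "seq1 ATCGATCG\n"            # made-up label and sequence
fields = line.rstrip().split()      # strip the newline, then split on whitespace
sequence = fields[1]                # throw out the 0th element (the label)

counts = count_bases_counter(sequence)
total = sum(counts.values())
for base in "ATCG":
    print(f"{base} : {counts[base] / total:.2f}")
```

Characters outside ATCG, such as an ambiguity code or stray whitespace, are simply ignored, which matches the behavior of the `if base in bases` test in the example solution.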



## Project 9

In the early 2000s an “urban legend” circulated that one could read text in which all letters of each word except the first and last were scrambled. For example:

Aoccdrnig to rscheearch at an Elingsh uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer is at the rghit pclae.

Write a program to scramble a passage input from a file. Print the result to a file with the same name as the original but a suffix _scrambled added (so if the original was Example.txt it will be Example_scrambled.txt). Look at the scrambled file first—can you read it?

• First write a scramble_word function, then a scramble_line function, and finally a function that writes the lines to the new file.
• Your main() routine will request from the user the name of the input file, then call the printing function, which will call the scramble_line function that will in turn call the scramble_word function. This is an example of how we divide up work into separate “chunks” or “concerns.” Internal punctuation (apostrophes) can be scrambled, but leave any punctuation such as periods, question marks, etc., at the end of a word in place.

Look up the documentation for the random module to find some useful functions. You can also import the string module to obtain a list of punctuation marks.
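As a starting point, the sketch below shows two standard-library pieces that are likely to be useful here: `random.shuffle`, which reorders a list in place, and `string.punctuation`, a string containing the ASCII punctuation characters. The sample word is arbitrary, and this is only an illustration of the calls, not a full solution.

```python
import random
import string

word = "university"
letters = list(word)            # strings are immutable, so work on a list
middle = letters[1:-1]          # everything except the first and last letter
random.shuffle(middle)          # shuffles the list in place, returns None
scrambled = letters[0] + "".join(middle) + letters[-1]
print(scrambled)

# string.punctuation is a plain string, so "in" tests work on it directly.
print(string.punctuation)
print("." in string.punctuation)

# rstrip accepts a set of characters to remove from the right-hand end.
print("place.".rstrip(string.punctuation))
```

Because the shuffle is random, the scrambled word differs from run to run and may occasionally come out unchanged.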

• Hint: since you cannot overwrite strings you will need to convert them to a list and back again. Use Example.txt as your sample input.
• FYI this is an “urban legend” because: firstly, no such research was ever conducted at any university, and secondly it is true only for very practiced readers of English and even then only for familiar words that are easy to recognize in context.

Example solution

import string
import random

def scramble_word(word):
    #Keep leading punctuation plus the first letter in place.
    if any(word.startswith(s) for s in string.punctuation):
        start=2
    else:
        start=1
    #Keep the last letter plus trailing punctuation in place.
    if any(word.endswith(s) for s in string.punctuation):
        end=-2
    else:
        end=-1
    letters=list(word)
    middle=letters[start:end]
    #Fewer than two middle characters: nothing to scramble.
    if len(middle)<2:
        return word
    random.shuffle(middle)
    mid=''.join(middle)
    scramble=word[:start]+mid+word[end:]
    return scramble

def scramble_line(line):
    newword_list=[]
    wordlist=line.split()
    for word in wordlist:
        newword_list.append(scramble_word(word))
    newline=' '.join(newword_list)
    return newline

def scramble_text(text):
    lines=text.splitlines()
    newtext_list=[]
    for line in lines:
        newtext_list.append(scramble_line(line))
    newtext='\n'.join(newtext_list)
    return newtext

def main():

    #Keep asking until the user enters a non-empty file name.
    while True:
        filename=input("Enter the file name:")
        if filename:
            break
        else:
            print("No file specified, please try again.")

    #Python 3 text mode already handles universal newlines.
    with open(filename,'r') as fin: