# Working with Data - Python practice book

1/3/2018, 9:36:00 PM

1. What will be the output of the following program?
```python
x = [0, 1, [2]]
x[2][0] = 3
print(x)          # [0, 1, [3]]
x[2].append(4)
print(x)          # [0, 1, [3, 4]]
x[2] = 2
print(x)          # [0, 1, 2]
```
1. Python has a built-in function `sum` to find the sum of all elements of a list. Provide an implementation for `sum`. Can you make your `sum` function work for a list of strings as well?
```python
def sum(x):
    if not x:
        return 0
    # Start from '' for a list of strings and from 0 for numbers.
    total = '' if isinstance(x[0], str) else 0
    for v in x:
        total += v
    return total
```
1. Implement a function `product`, to compute product of a list of numbers.
```python
def product(x):
    r = 1
    for v in x:
        r *= v
    return r
```
```python
# Alternative: Python's built-in functools module
import functools
functools.reduce(lambda a, b: a * b, [1, 2, 3])
```
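The `functools.reduce` call above has no starting value, so it raises `TypeError` on an empty list. A minimal sketch, assuming the empty product should be 1 (`product_reduce` is a name made up for this note):

```python
import functools
import operator

def product_reduce(numbers):
    # The third argument (1) is the initial value, so an empty list yields 1.
    return functools.reduce(operator.mul, numbers, 1)

print(product_reduce([1, 2, 3]))  # 6
print(product_reduce([]))         # 1
```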
1. Write a function `factorial` to compute factorial of a number. Can you use the `product` function defined in the previous example to compute factorial?
```python
def factorial(n):
    return product(range(1, n + 1))
# functools.reduce(lambda a, b: a * b, [1, 2, 3])
```
1. Write a function `reverse` to reverse a list. Can you do this without using list slicing?
```python
def reverse(x):
    r = []
    for i in range(len(x) - 1, -1, -1):
        r.append(x[i])
    return r
```
```python
x[::-1]      # slicing: returns a new reversed list
x.reverse()  # reverses the list in place and returns None
```
1. Python has built-in functions `min` and `max` to compute minimum and maximum of a given list. Provide an implementation for these functions. What happens when you call your `min` and `max` functions with a list of strings?

(omitted)
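No answer is given for this one. One possible sketch, assuming a non-empty list (`my_min` and `my_max` are made-up names chosen to avoid shadowing the built-ins). With strings, `<` and `>` compare lexicographically, so the same code works:

```python
def my_min(items):
    # Assumes a non-empty list; keep the smallest value seen so far.
    smallest = items[0]
    for item in items[1:]:
        if item < smallest:
            smallest = item
    return smallest

def my_max(items):
    # Same idea, keeping the largest value seen so far.
    largest = items[0]
    for item in items[1:]:
        if item > largest:
            largest = item
    return largest

print(my_min([3, 1, 2]), my_max([3, 1, 2]))              # 1 3
print(my_min(['b', 'a', 'c']), my_max(['b', 'a', 'c']))  # a c
```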

1. Cumulative sum of a list `[a, b, c, ...]` is defined as `[a, a+b, a+b+c, ...]`. Write a function `cumulative_sum` to compute cumulative sum of a list. Does your implementation work for a list of strings?
```python
def cumulative_sum(x):
    r = []
    base = 0
    for item in x:
        base += item
        r.append(base)
    return r
```
```python
# Python provides itertools.accumulate
import itertools
list(itertools.accumulate([4, 3, 2, 1]))  # [4, 7, 9, 10]
```
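On the strings question: a running total that starts at `0` cannot be added to a string. A small sketch that starts from the first element instead, so `+` works for both numbers and strings (`cumulative_sum_any` is a made-up name):

```python
def cumulative_sum_any(x):
    # Start from the first element instead of 0, so + works for numbers and strings.
    r = []
    for item in x:
        r.append(item if not r else r[-1] + item)
    return r

print(cumulative_sum_any([4, 3, 2, 1]))     # [4, 7, 9, 10]
print(cumulative_sum_any(['a', 'b', 'c']))  # ['a', 'ab', 'abc']
```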
1. Write a function `cumulative_product` to compute cumulative product of a list of numbers.
```python
def cumulative_product(x):
    r = []
    base = 1
    for item in x:
        base *= item
        r.append(base)
    return r
```
```python
# The built-in itertools.accumulate accepts a custom function; the default is +
import itertools
list(itertools.accumulate([4, 3, 2, 1], lambda x, y: x * y))  # [4, 12, 24, 24]
```
1. Write a function `unique` to find all the unique elements of a list.
```python
def unique(items):
    r = []
    for item in items:
        if item not in r:
            r.append(item)
    return r
```
``set([1, 2, 1, 3, 2, 5])``
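`set(...)` removes duplicates but does not keep the original order. A minimal sketch that keeps first-seen order, relying on dicts preserving insertion order (Python 3.7+); `unique_ordered` is a name made up here and the items must be hashable:

```python
def unique_ordered(items):
    # dict keys are unique and remember insertion order.
    return list(dict.fromkeys(items))

print(unique_ordered([1, 2, 1, 3, 2, 5]))  # [1, 2, 3, 5]
```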
1. Write a function `dups` to find all duplicates in the list.
```python
def dups(x):
    r = []
    seen = {}  # item -> number of times seen so far
    for item in x:
        seen[item] = seen.get(item, 0) + 1
        if seen[item] == 2:  # report each duplicate only once
            r.append(item)
    return r
```
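The same idea can be sketched with `collections.Counter`, assuming hashable items (`dups_counter` is a made-up name). Duplicates come out in order of first appearance:

```python
from collections import Counter

def dups_counter(x):
    counts = Counter(x)
    # Keep items that occur more than once, in order of first appearance.
    return [item for item, count in counts.items() if count > 1]

print(dups_counter([1, 2, 1, 3, 2, 5]))  # [1, 2]
```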
1. Write a function `group(list, size)` that takes a list and splits it into smaller lists of the given size.
```python
def group(x, n):
    r = []
    temp = []
    for item in x:
        temp.append(item)
        if len(temp) == n:
            r.append(temp)
            temp = []  # start a new list; clear() would also empty the one just stored
    if temp:  # keep the final, shorter group
        r.append(temp)
    return r
```
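An alternative sketch using slicing, which avoids the temporary list entirely (`group_slices` is a made-up name):

```python
def group_slices(x, n):
    # Take consecutive slices of length n; the last one may be shorter.
    return [x[i:i + n] for i in range(0, len(x), n)]

print(group_slices([1, 2, 3, 4, 5, 6, 7, 8, 9], 4))  # [[1, 2, 3, 4], [5, 6, 7, 8], [9]]
```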
1. Additional notes
```python
x.sort()   # sorts the original list in place
sorted(x)  # returns a new sorted list
```
1. Write a function `lensort` to sort a list of strings based on length.
```python
def lensort(x):
    x.sort(key=len)
    return x
```
1. Improve the `unique` function written in previous problems to take an optional `key` function as argument and use the return value of the key function to check for uniqueness.
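```python
def uniquesort(x, key=lambda item: item):
    # Uniqueness is checked on key(item); note that this returns the keys, not the original items.
    return unique([key(item) for item in x])

# uniquesort(["python", "java", "Python", "Java"], lambda s: s.lower())
```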
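The version above returns the key values rather than the original items. A sketch of a variant that keeps the first original item for each key, assuming the keys are hashable (`unique_by` is a made-up name):

```python
def unique_by(items, key=None):
    seen = set()
    r = []
    for item in items:
        k = item if key is None else key(item)
        if k not in seen:   # keys must be hashable
            seen.add(k)
            r.append(item)  # keep the first item seen for each key
    return r

print(unique_by(["Python", "java", "python", "Java"], lambda s: s.lower()))  # ['Python', 'java']
```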
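1. Additional notes

• list `[]`
• tuple `()`, `a = 1,`, `a = 1, 2`; tuples are immutable
• set `{x, y}`
• use `dir` to see which methods a set provides
• intersection (`intersection`) / union (`union`) / difference (`difference`) / symmetric difference (`symmetric_difference`)
• containment tests (`issubset` / `<=` / `issuperset` / `>=`)
• `in` / `not in`
• the same operations are available as operators (`a | b`, `a & b`, `a - b`, `a ^ b`)
• copy (`copy`)
• length (`len`)
• dict `{'x': y}`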
2. Reimplement the `unique` function implemented in the earlier examples using sets.

```python
def uniqueuseset(x):
    return list(set(x))
```
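The set operations listed in the notes can be tried out quickly. A small sketch with two made-up sets `a` and `b`:

```python
a = {1, 2, 3}
b = {3, 4}

print(sorted(a | b))  # union: [1, 2, 3, 4]
print(sorted(a & b))  # intersection: [3]
print(sorted(a - b))  # difference: [1, 2]
print(sorted(a ^ b))  # symmetric difference: [1, 2, 4]
print({1, 2} <= a)    # subset test: True
print(2 in a, 5 not in a)  # True True
```

`sorted` is used only to get a stable printing order, since sets are unordered.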
1. Additional notes

• `str.split()`
• `str.strip([str])`
• `str.join()`
• `len(string)`
2. Write a function `extsort` to sort a list of files based on extension.

```python
def extsort(x):
    # Use the part after the last dot, so names like 'a.tar.gz' sort by 'gz'.
    x.sort(key=lambda v: v.split('.')[-1])
    return x
```
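File names without a dot have no extension to sort by. A sketch using the standard `os.path.splitext`, which returns an empty extension in that case (`extsort_splitext` is a made-up name, and it returns a new list instead of sorting in place):

```python
import os.path

def extsort_splitext(files):
    # os.path.splitext returns (root, ext); ext is '' when there is no extension.
    return sorted(files, key=lambda name: os.path.splitext(name)[1])

print(extsort_splitext(['a.py', 'b.c', 'README', 'x.tar.gz']))
# ['README', 'b.c', 'x.tar.gz', 'a.py']
```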
1. Additional notes

• `f.read()`
• `f.readline()`
• `f.readlines()`
• `f.write()`
• `f.writelines()`
• `f.close()`
2. Write a program `reverse.py` to print lines of a file in reverse order.

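```python
# reverse.py
import sys

def reverselines(file):
    # Print the lines of the file from last to first.
    with open(file, 'r', encoding='utf-8') as f:
        for line in reversed(f.readlines()):
            print(line, end='')

reverselines(sys.argv[1])
```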
1. Write a program to print each line of a file in reverse order.
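```python
def reverseeachline(file):
    with open(file, 'r', encoding='utf-8') as f:
        for line in f:
            # Reverse the characters of the line, leaving the newline out.
            print(line.rstrip('\n')[::-1])
```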
1. Implement unix commands `head` and `tail`. The `head` and `tail` commands take a file as an argument and print its first and last 10 lines, respectively.
```python
def head(file, n=10):
    with open(file, 'r', encoding='utf-8') as f:
        for i, line in enumerate(f):
            if i >= n:
                break
            print(line, end='')  # the line already ends with a newline
```
```python
from collections import deque

def tail(file, n=10):
    with open(file, 'r', encoding='utf-8') as f:
        # A deque with maxlen=n keeps only the last n lines it is fed.
        for line in deque(f, n):
            print(line, end='')
```
1. Implement unix command `grep`. The `grep` command takes a string and a file as arguments and prints all lines in the file which contain the specified string.
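```python
import sys

def eachline(file, handler):
    # Call handler(line, number) for every line; stop early if it returns a true value.
    with open(file, 'r', encoding='utf-8') as f:
        for number, line in enumerate(f):
            if handler(line, number):
                break

def grep(substr):
    def handler(line, number):
        # Print matching lines; return nothing so that eachline keeps going.
        if substr in line:
            print(line, end='')
    return handler

# usage: python grep.py <string> <file>
eachline(sys.argv[2], grep(sys.argv[1]))
```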
1. Write a program `wrap.py` that takes a filename and a width as arguments and wraps the lines longer than `width`.
```python
def wrap(file, width):
    r = []
    def handler(line, n):
        line = line.rstrip('\n')
        if len(line) <= width:
            r.append(line)
        else:
            # Cut the line into consecutive chunks of at most `width` characters.
            for start in range(0, len(line), width):
                r.append(line[start:start + width].strip())  # drop whitespace left at the cut
    eachline(file, handler)
    for line in r:
        print(line)
```
1. The above wrap program is not so nice because it breaks lines in the middle of words. Can you write a new program `wordwrap.py` that works like `wrap.py`, but breaks lines only at word boundaries?
```python
def wordwrap(file, width):
    r = []
    def handler(line, n):
        line = line.rstrip('\n')
        if len(line) <= width:
            r.append(line)
        else:
            # Break only between words; a single word longer than width is not split.
            text = []
            usedwidth = 0
            for word in line.split():
                # + 1 accounts for the space before the word when text is not empty
                needed = len(word) + (1 if text else 0)
                if text and usedwidth + needed > width:
                    r.append(' '.join(text))
                    text = []
                    usedwidth = 0
                    needed = len(word)
                text.append(word)
                usedwidth += needed
            if text:
                r.append(' '.join(text))
    eachline(file, handler)
    for line in r:
        print(line)
```
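For comparison, the standard library's `textwrap` module already wraps at word boundaries. A short sketch on a made-up sample string:

```python
import textwrap

text = "The quick brown fox jumps over the lazy dog"
wrapped = textwrap.wrap(text, width=15)
for line in wrapped:
    print(line)
# The quick brown
# fox jumps over
# the lazy dog
```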
1. Write a program `center_align.py` to center align all lines in the given file.
```python
def center_align(file):
    with open(file, 'r', encoding='utf-8') as f:
        # First pass: find the longest line (ignoring the trailing newline).
        maxwidth = 0
        for line in f:
            width = len(line.rstrip('\n'))
            if width > maxwidth:
                maxwidth = width
        f.seek(0, 0)  # rewind for the second pass
        for line in f:
            line = line.rstrip('\n')
            padding = (maxwidth - len(line)) // 2
            print(' ' * padding + line)
```
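The padding step can also be sketched with the built-in `str.center`, here on a made-up list of lines rather than a file:

```python
lines = ['a', 'bbb', 'ccccc']
maxwidth = max(len(line) for line in lines)
centered = [line.center(maxwidth).rstrip() for line in lines]  # rstrip drops trailing padding
for line in centered:
    print(line)
```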
1. Provide an implementation for `zip` function using list comprehensions.
```python
def zip(a, b):
    # Stop at the shorter list, like the built-in zip.
    return [(a[i], b[i]) for i in range(min(len(a), len(b)))]
```
1. Python provides a built-in function `map` that applies a function to each element of a list. Provide an implementation for `map` using list comprehensions.
```python
def map(f, items):
    return [f(x) for x in items]
```
1. Python provides a built-in function `filter(f, a)` that returns items of the list a for which `f(item)` returns true. Provide an implementation for `filter` using list comprehensions.
```python
def filter(f, items):
    return [item for item in items if f(item)]
```
1. Write a function `triplets` that takes a number `n` as argument and returns a list of triplets such that sum of first two elements of the triplet equals the third element using numbers below n. Please note that `(a, b, c)` and `(b, a, c)` represent same triplet.
```python
def triplets(n):
    # y starts at x so that (a, b, c) and (b, a, c) are not both produced
    return [(x, y, z) for x in range(1, n) for y in range(x, n)
            for z in range(y, n) if x + y == z]
```
1. Write a function `enumerate` that takes a list and returns a list of tuples containing `(index,item)` for each item in the list.
```python
def enumerate(items):
    return [(index, items[index]) for index in range(len(items))]
```
1. Write a function `array` to create a 2-dimensional array. The function should take both dimensions as arguments. The value of each element can be initialized to None.
```python
def array(rows, cols):
    # Build a fresh inner list per row; reusing one list would make all rows share it.
    return [[None for _ in range(cols)] for _ in range(rows)]
```
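When building a 2-dimensional list, reusing one inner list for every row makes all rows refer to the same object, so a write to one row shows up in all of them. A small demonstration with made-up names:

```python
shared = [[None] * 2] * 3                  # three references to the same inner list
separate = [[None] * 2 for _ in range(3)]  # three independent inner lists

shared[0][0] = 'x'
separate[0][0] = 'x'

print(shared)    # [['x', None], ['x', None], ['x', None]]
print(separate)  # [['x', None], [None, None], [None, None]]
```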
1. Write a Python function `parse_csv` to parse csv (comma separated values) files. Generalize the implementation to support any delimiter and comments.
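```python
def parse_csv(file, delimiter, c):
    # Skip blank lines and lines starting with the comment character c.
    with open(file, 'r', encoding='utf-8') as f:
        return [s.split(delimiter) for s in map(str.strip, f) if s and not s.startswith(c)]
```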
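The standard `csv` module can do the splitting, and it accepts a `delimiter` argument. It has no notion of comments, so the sketch below filters them out first. `parse_with_csv`, its default arguments, and the sample data are made up for illustration, and it takes an open file (or any iterable of lines) rather than a file name:

```python
import csv
import io

def parse_with_csv(f, delimiter=',', comment='#'):
    # f is any iterable of lines (an open file works); drop blank and comment lines first.
    rows = (line for line in f
            if line.strip() and not line.lstrip().startswith(comment))
    return list(csv.reader(rows, delimiter=delimiter))

sample = io.StringIO("# header\na!b!c\n\n1!2!3\n")
print(parse_with_csv(sample, delimiter='!'))  # [['a', 'b', 'c'], ['1', '2', '3']]
```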
1. Write a function `mutate` to compute all words generated by a single mutation on a given word. A mutation is defined as inserting a character, deleting a character, replacing a character, or swapping 2 consecutive characters in a string. For simplicity consider only letters from `a` to `z`.
```python
def mutate(word):
    return {
        value
        for i in range(len(word) + 1)  # + 1 so a letter can also be inserted at the end
        for letter in 'abcdefghijklmnopqrstuvwxyz'
        for value in [
            word[:i] + word[i + 1:],           # remove
            word[:i] + letter + word[i:],      # insert
            word[:i] + letter + word[i + 1:],  # replace
            word[:i] + word[i + 1:i + 2] + word[i:i + 1] + word[i + 2:]  # swap
        ]
    }
```
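```python
# Minimum number of operations needed to convert a into b
# 1. delete a character
# 2. insert a character
# 3. replace a character
def editdistance(a, b, m, n):
    # m = len(a), n = len(b); cache holds one row of the DP table
    cache = [i for i in range(n + 1)]
    for i in range(1, m + 1):
        # initialize the diagonal (upper-left) value and the first column
        old = cache[0]
        cache[0] = i
        for j in range(1, n + 1):
            prev = cache[j]
            if a[i - 1] == b[j - 1]:
                cache[j] = old
            else:
                # up / left / diagonal
                cache[j] = min(
                    cache[j] + 1,
                    cache[j - 1] + 1,
                    old + 1
                )
            # update the diagonal value
            old = prev
    return cache[n]
```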
1. Write a function `nearly_equal` to test whether two strings are nearly equal. Two strings `a` and `b` are nearly equal when `a` can be generated by a single mutation on `b`.
```python
def nearly_equal(a, b):
    # Lengths differing by more than 1 cannot be a single mutation apart.
    return abs(len(a) - len(b)) <= 1 and b in mutate(a)
```
1. Improve the above program to print the words in the descending order of the number of occurrences.
```python
def sort(frequency):
    # frequency maps each word to its number of occurrences
    items = list(frequency.items())
    items.sort(key=lambda x: x[1], reverse=True)  # most frequent first
    for key, value in items:
        print(key, value)
```
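The counting and the descending sort can both be sketched with `collections.Counter`, whose `most_common` method returns pairs ordered from most to least frequent. The word list here is made up:

```python
from collections import Counter

words = "the quick fox and the lazy dog and the cat".split()
top = Counter(words).most_common(2)
for word, count in top:
    print(word, count)
# the 3
# and 2
```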
1. Write a program to count frequency of characters in a given file. Can you use character frequency to tell whether the given file is a Python program file, C program file or a text file?

(omitted)

1. Write a program to find anagrams in a given list of words. Two words are called anagrams if one word can be formed by rearranging letters of another. For example ‘eat’, ‘ate’ and ‘tea’ are anagrams.
```python
def anagrams(words):
    helper = {}
    for item in words:
        # Words that are anagrams share the same sorted-letter key.
        word = ''.join(sorted(item))
        if word in helper:
            helper[word].append(item)
        else:
            helper[word] = [item]
    return list(helper.values())
```
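The "append or create" branch can be avoided with `collections.defaultdict`. A sketch of the same grouping (`anagrams_dd` is a made-up name):

```python
from collections import defaultdict

def anagrams_dd(words):
    groups = defaultdict(list)  # missing keys start as an empty list
    for word in words:
        groups[''.join(sorted(word))].append(word)
    return list(groups.values())

print(anagrams_dd(['eat', 'ate', 'done', 'tea', 'soup', 'node']))
# [['eat', 'ate', 'tea'], ['done', 'node'], ['soup']]
```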
1. Write a function `valuesort` to sort values of a dictionary based on the key.
```python
def valuesort(d):
    # list.sort() returns None, so use sorted() to iterate over the keys in order.
    return [d[key] for key in sorted(d)]
```
1. Write a function `invertdict` to interchange keys and values in a dictionary. For simplicity, assume that all values are unique.
```python
def invertdict(x):
    r = {}
    for key, value in x.items():
        r[value] = key
    return r
```
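The same inversion fits in a dict comprehension. A sketch under the same assumption that all values are unique, and also hashable (`invertdict_comp` is a made-up name):

```python
def invertdict_comp(x):
    # Assumes all values are unique and hashable.
    return {value: key for key, value in x.items()}

print(invertdict_comp({'x': 1, 'y': 2, 'z': 3}))  # {1: 'x', 2: 'y', 3: 'z'}
```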

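Redky lives in Beijing (as a "Beijing drifter"), is a programmer and a homebody, and likes anime (One Piece). "A young knight rides out of the city on horseback; having never seen the piles of bones beneath the Castle of Despair, he assumes he can slay the dragon and rescue the princess at will."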