Working with Data - Python practice book

1/3/2018, 1:36:00 PM 10 min read

What will be the output of the following program?

x = [0, 1, [2]]
x[2][0] = 3
print(x)          # [0, 1, [3]]
x[2].append(4)
print(x)          # [0, 1, [3, 4]]
x[2] = 2
print(x)          # [0, 1, 2]

Python has a built-in function sum to find the sum of all elements of a list. Provide an implementation for sum. Can you make your sum function work for a list of strings as well?
```
def sum(x):
  # Start from '' for a list of strings and from 0 otherwise (an empty list sums to 0)
  r = '' if x and isinstance(x[0], str) else 0
  for v in x:
    r += v
  return r
```

Implement a function product, to compute product of a list of numbers.

def product(x):
  r = 1
  for v in x:
    r *= v
  return r

# Python's built-in functools module
import functools
functools.reduce(lambda a, b: a * b, [1, 2, 3])  # 6
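
As a possible alternative, and assuming Python 3.8 or later, the standard library's math.prod computes the same product, and operator.mul can stand in for the lambda passed to functools.reduce. The name product_reduce is only an illustrative choice, not part of the original solution:

```python
import functools
import math
import operator

def product_reduce(numbers):
    # The initial value 1 makes the product of an empty list equal to 1
    return functools.reduce(operator.mul, numbers, 1)

print(product_reduce([1, 2, 3, 4]))  # 24
print(math.prod([1, 2, 3, 4]))       # 24 (math.prod requires Python 3.8+)
```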

Write a function factorial to compute factorial of a number. Can you use the product function defined in the previous example to compute factorial?
```
def factorial(n):
  return product(range(1, n + 1))
# functools.reduce(lambda a, b: a * b, [1, 2, 3])
```

Write a function reverse to reverse a list. Can you do this without using list slicing?

def reverse(x):
  r = []
  for i in range(len(x) - 1, -1, -1):
    r.append(x[i])
  return r

x[::-1]      # returns a new, reversed list
x.reverse()  # reverses the list in place

Python has built-in functions min and max to compute the minimum and maximum of a given list. Provide an implementation for these functions. What happens when you call your min and max functions with a list of strings? (Omitted.)
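
This exercise has no solution in the notes. A minimal sketch is shown below; the names my_min and my_max are hypothetical, chosen to avoid shadowing the built-ins. Because the code relies only on the < and > operators, a list of strings is compared lexicographically by code point, so uppercase letters sort before lowercase ones.

```python
def my_min(values):
    # Assumes a non-empty list, like the built-in min without a default
    smallest = values[0]
    for v in values[1:]:
        if v < smallest:
            smallest = v
    return smallest

def my_max(values):
    # Same scan as my_min, keeping the largest value seen so far
    largest = values[0]
    for v in values[1:]:
        if v > largest:
            largest = v
    return largest

print(my_min([3, 1, 2]), my_max([3, 1, 2]))  # 1 3
print(my_min(["pear", "Apple", "fig"]))       # Apple
print(my_max(["pear", "Apple", "fig"]))       # pear
```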

Cumulative sum of a list [a, b, c, ...] is defined as [a, a+b, a+b+c, ...]. Write a function cumulative_sum to compute cumulative sum of a list. Does your implementation work for a list of strings?

def cumulative_sum(x):
  r = []
  base = 0
  for item in x:
    base += item
    r.append(base)
  return r

# Python provides itertools.accumulate
import itertools
list(itertools.accumulate([4, 3, 2, 1]))  # [4, 7, 9, 10]
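
The cumulative_sum function above starts from 0, so it raises a TypeError for a list of strings. One possible variant, sketched here under the hypothetical name cumulative_sum_any, seeds the running total with the first element instead, which is also how itertools.accumulate behaves:

```python
import itertools

def cumulative_sum_any(items):
    result = []
    for item in items:
        # Seed with the first element so + works for numbers and strings alike
        total = item if not result else result[-1] + item
        result.append(total)
    return result

print(cumulative_sum_any([4, 3, 2, 1]))             # [4, 7, 9, 10]
print(cumulative_sum_any(["a", "b", "c"]))          # ['a', 'ab', 'abc']
print(list(itertools.accumulate(["a", "b", "c"])))  # ['a', 'ab', 'abc']
```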

Write a function cumulative_product to compute cumulative product of a list of numbers.

def cumulative_product(x):
  r = []
  base =  1
  for item in x:
    base *= item
    r.append(base)
  return r

# itertools.accumulate accepts a custom function; the default is addition
list(itertools.accumulate([4, 3, 2, 1], lambda x, y: x * y))  # [4, 12, 24, 24]

Write a function unique to find all the unique elements of a list.

def unique(x):
  r = []
  for item in x:
    if item not in r:
      r.append(item)
  return r

set([1, 2, 1, 3, 2, 5])
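
Note that a set does not preserve the order of the input. If order matters and the elements are hashable, one option (assuming Python 3.7+, where dicts keep insertion order) is dict.fromkeys. The name unique_ordered is hypothetical:

```python
def unique_ordered(items):
    # dict keys are unique and keep first-seen order (Python 3.7+)
    return list(dict.fromkeys(items))

print(unique_ordered([1, 2, 1, 3, 2, 5]))  # [1, 2, 3, 5]
```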

Write a function dups to find all duplicates in the list.

def dups(x):
  r = []
  helper = dict()
  for item in x:
    helper[item] = helper.get(item, 0) + 1
    # record an item only on its second occurrence, so it is reported once
    if helper[item] == 2:
      r.append(item)
  return r
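
The same idea can be sketched with collections.Counter from the standard library, which does the counting in one step. The name dups_counter is hypothetical, and the elements are assumed to be hashable:

```python
from collections import Counter

def dups_counter(items):
    counts = Counter(items)
    # Keep each value that occurs more than once, in first-seen order
    return [item for item, count in counts.items() if count > 1]

print(dups_counter([1, 2, 1, 3, 2, 5, 1]))  # [1, 2]
```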

Write a function group(list, size) that takes a list and splits it into smaller lists of the given size.

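def group(x, n):
  r = []
  temp = []
  for item in x:
    temp.append(item)
    if len(temp) == n:
      r.append(temp)
      # start a new list; temp.clear() would also empty the list just stored in r
      temp = []
  if temp:
    # keep the final, shorter group
    r.append(temp)
  return r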
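
A shorter way to express the same grouping, sketched here with list slicing under the hypothetical name group_slices, steps through the list in strides of the group size:

```python
def group_slices(items, size):
    # Each slice starts at a multiple of size; the last group may be shorter
    return [items[i:i + size] for i in range(0, len(items), size)]

print(group_slices([1, 2, 3, 4, 5, 6, 7, 8, 9], 4))  # [[1, 2, 3, 4], [5, 6, 7, 8], [9]]
```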

Notes

x.sort()   # sorts the list in place
sorted(x)  # returns a new list

Write a function lensort to sort a list of strings based on length.

def lensort(x):
  x.sort(key = len)
  return x

Improve the unique function written in previous problems to take an optional key function as argument and use the return value of the key function to check for uniqueness.
```
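def unique(x, key = lambda item: item):
  seen = {}
  for item in x:
    # the first item seen for each key wins (keys must be hashable)
    seen.setdefault(key(item), item)
  return list(seen.values())
# unique(["python", "java", "Python", "Java"], lambda s: s.lower())  gives ["python", "java"]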
```
Notes
- list []
- tuple (), e.g. a = 1, or a = 1, 2; tuples are immutable
- set {x, y}
  - use dir to see which methods a set has
  - intersection / union / difference / symmetric_difference
  - adding (add/update) / removing (remove/pop) / emptying (clear)
  - subset and superset tests (issubset/<=/issuperset/>=)
  - in/not in
  - the same operations are available as operators (a | b, a & b, a - b, a ^ b)
  - copying (copy)
  - length (len)
- dict {'x': y}
Reimplement the unique function implemented in the earlier examples using sets.
```
def uniqueuseset(x):
  return list(set(x))
```
Notes
- str.split()
- str.strip([chars])
- str.join()
- len(string)

Write a function extsort to sort a list of files based on extension.

import os
def extsort(x):
  # splitext copes with names that contain several dots or no extension at all
  x.sort(key = lambda v: os.path.splitext(v)[1])
  return x

Notes
- f.read()
- f.readline()
- f.readlines()
- f.write()
- f.writelines()
- f.close()

Write a program reverse.py to print lines of a file in reverse order.

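  # reverse.py
  import sys
  def reverselines(file):
    with open(file, 'r', encoding='utf-8') as f:
      return f.readlines()[::-1]

  # sys.argv (not sys.args) holds the command-line arguments
  for line in reverselines(sys.argv[1]):
    print(line, end='')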

Write a program to print each line of a file in reverse order.

def reverseeachline(file):
  with open(file, 'r', encoding='utf-8') as f:
    for line in f:
      # reverse the characters of each line, not the order of the lines
      print(line.rstrip('\n')[::-1])

Implement the unix commands head and tail. The head and tail commands take a file as argument and print its first and last 10 lines respectively.

def head(file, n = 10):
  with open(file, 'r', encoding='utf-8') as f:
    for i, line in enumerate(f):
      if i >= n:
        break
      # each line already ends with a newline
      print(line, end = '')

from collections import deque
def tail(file, n = 10):
  with open(file, 'r', encoding='utf-8') as f:
    # deque(f, n) keeps only the last n lines while reading
    for line in deque(f, n):
      print(line, end = '')

Implement unix command grep. The grep command takes a string and a file as arguments and prints all lines in the file which contain the specified string.

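  import sys
  def eachline(file, handler):
    # call handler(line, number) for every line; a true return value stops early
    with open(file, 'r', encoding='utf-8') as f:
      for number, line in enumerate(f):
        if handler(line, number):
          break

  def grep(substr):
    def filter(line, number):
      # print matching lines and return None so that eachline keeps going
      if substr in line:
        print(line, end='')
    return filter

  # usage: python grep.py <string> <file>
  eachline(sys.argv[2], grep(sys.argv[1]))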

Write a program wrap.py that takes a filename and a width as arguments and wraps the lines longer than width.

def wrap(file, width):
  r = []
  def handler(line, n):
    line = line.rstrip('\n')
    if len(line) <= width:
      r.append(line)
    else:
      # cut the line into consecutive pieces of at most width characters
      for start in range(0, len(line), width):
        # strip the whitespace left over at the cut
        r.append(line[start:start + width].strip())
  eachline(file, handler)
  for line in r:
    print(line)

The above wrap program is not so nice because it can break a line in the middle of a word. Can you write a new program wordwrap.py that works like wrap.py, but breaks the line only at word boundaries?

def wordwrap(file, width):
  r = []
  def handler(line, n):
    line = line.rstrip('\n')
    if len(line) <= width:
      r.append(line)
    else:
      # Split into words; a word is never broken, and a single word
      # longer than width is left on a line of its own
      words = line.split()
      text = []
      usedwidth = 0
      for word in words:
        # the + 1 accounts for the space joining this word to the previous one
        if text and usedwidth + 1 + len(word) > width:
          r.append(' '.join(text))
          text = []
          usedwidth = 0
        usedwidth += len(word) + (1 if text else 0)
        text.append(word)
      if text:
        r.append(' '.join(text))
  eachline(file, handler)
  for line in r:
    print(line)

Write a program center_align.py to center align all lines in the given file.

def center_align(file):
  with open(file, 'r', encoding='utf-8') as f:
    # drop the trailing newline so it is neither measured nor printed twice
    lines = [line.rstrip('\n') for line in f]
  maxwidth = max((len(line) for line in lines), default = 0)
  for line in lines:
    padding = (maxwidth - len(line)) // 2
    print(' ' * padding + line)

Provide an implementation for zip function using list comprehensions.

def zip(a, b):
  # stop at the shorter list, like the built-in zip
  return [(a[i], b[i]) for i in range(min(len(a), len(b)))]

Python provides a built-in function map that applies a function to each element of a list. Provide an implementation for map using list comprehensions.
```
def map(f, list):
  return [f(x) for x in list]
```
Python provides a built-in function filter(f, a) that returns items of the list a for which f(item) returns true. Provide an implementation for filter using list comprehensions.
```
def filter(f, list):
  return [item for item in list if f(item)]
```
Write a function triplets that takes a number n as argument and returns a list of triplets such that sum of first two elements of the triplet equals the third element using numbers below n. Please note that (a, b, c) and (b, a, c) represent same triplet.
```
def triplets(n):
  return [(x, y, z) for x in range(1, n) for y in range(x, n) for z in range(y, n) if x + y == z]
```
Write a function enumerate that takes a list and returns a list of tuples containing (index,item) for each item in the list.
```
def enumerate(list):
  return [(index, list[index]) for index in range(len(list))]
```
Write a function array to create a 2-dimensional array. The function should take both dimensions as arguments. The value of each element can be initialized to None:
```
def array(rows, cols):
  # build a new inner list for every row; reusing one list would make all rows share it
  return [[None for j in range(cols)] for i in range(rows)]
```
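
A common pitfall with 2-dimensional lists is building one row and repeating it, which makes every row refer to the same list object. The sketch below contrasts the two approaches; array_shared and array_independent are hypothetical names used only for illustration:

```python
def array_shared(rows, cols):
    row = [None] * cols
    return [row for _ in range(rows)]            # every entry is the same list object

def array_independent(rows, cols):
    return [[None] * cols for _ in range(rows)]  # a fresh list per row

shared = array_shared(2, 3)
shared[0][0] = 1
print(shared)       # [[1, None, None], [1, None, None]]

independent = array_independent(2, 3)
independent[0][0] = 1
print(independent)  # [[1, None, None], [None, None, None]]
```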

Write a Python function parse_csv to parse csv (comma separated values) files. Generalize the implementation to support any delimiter and comments.

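def parse_csv(file, delimiter = ',', c = '#'):
  with open(file, 'r', encoding='utf-8') as f:
    lines = [line.strip() for line in f]
  # skip blank lines and lines that start with the comment character c
  return [line.split(delimiter) for line in lines if line and not line.startswith(c)]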

Write a function mutate to compute all words generated by a single mutation on a given word. A mutation is defined as inserting a character, deleting a character, replacing a character, or swapping 2 consecutive characters in a string. For simplicity consider only letters from a to z.

  def mutate(word):
    return {
      value
        for i in range(len(word) + 1)   # + 1 so a letter can also be inserted at the end
          for letter in 'abcdefghijklmnopqrstuvwxyz'
            for value in [
              word[:i] + word[i + 1:],          # delete
              word[:i] + letter + word[i:],     # insert
              word[:i] + letter + word[i + 1:], # replace
              word[:i] + word[i + 1: i + 2] + word[i:i + 1] + word[i + 2:] # swap
            ]
    }

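  # Minimum number of operations needed to turn a into b (m = len(a), n = len(b))
  # 1. delete a character
  # 2. insert a character
  # 3. replace a character
  def editdistance(a, b, m, n):
    cache = [i for i in range(n + 1)]
    for i in range(1, m + 1):
      # initialise the top-left (diagonal) value and the first column of this row
      old = cache[0]
      cache[0] = i
      for j in range(1, n + 1):
        prev = cache[j]
        if a[i - 1] == b[j - 1]:
          cache[j] = old
        else:
          # above / left / top-left
          cache[j] = min(
            cache[j] + 1,
            cache[j - 1] + 1,
            old + 1
          )
        # update the top-left value
        old = prev
    return cache[n]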
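
For comparison, the same recurrence can be written with a full table instead of a single rolling row, which is easier to check by hand. This is a minimal sketch under the hypothetical name editdistance_table; like the version above it counts only deletions, insertions, and replacements, so swapping two adjacent characters costs 2 rather than 1:

```python
def editdistance_table(a, b):
    m, n = len(a), len(b)
    # dist[i][j] = number of edits needed to turn a[:i] into b[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i          # delete all i characters
    for j in range(n + 1):
        dist[0][j] = j          # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dist[i][j] = dist[i - 1][j - 1]
            else:
                dist[i][j] = 1 + min(
                    dist[i - 1][j],      # delete
                    dist[i][j - 1],      # insert
                    dist[i - 1][j - 1],  # replace
                )
    return dist[m][n]

print(editdistance_table("kitten", "sitting"))  # 3
print(editdistance_table("", "abc"))            # 3
```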

Write a function nearly_equal to test whether two strings are nearly equal. Two strings a and b are nearly equal when a can be generated by a single mutation on b.
```
def nearly_equal(a, b):
  return False if abs(len(a) - len(b)) > 1 else b in mutate(a)
```

Improve the above program to print the words in the descending order of the number of occurrences.

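def sort(frequency):
  # frequency is assumed to be the dict of word counts built by the earlier program
  items = list(frequency.items())
  # reverse = True gives descending order of occurrences
  items.sort(key = lambda x: x[1], reverse = True)
  for key, value in items:
    print(key, value)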
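
The word-frequency program that this exercise builds on is not included in these notes. A minimal sketch of the whole task, assuming words are separated by whitespace and using collections.Counter from the standard library, could look like this. The name word_frequency is hypothetical, and it takes a string rather than a file name to keep the example self-contained:

```python
from collections import Counter

def word_frequency(text):
    # most_common() returns (word, count) pairs, highest count first
    return Counter(text.split()).most_common()

for word, count in word_frequency("the cat and the hat and the bat"):
    print(word, count)
```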

Write a program to count the frequency of characters in a given file. Can you use character frequency to tell whether the given file is a Python program file, C program file or a text file? (Omitted.)
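
No solution is given for this exercise. As a rough sketch of the counting part only, collections.Counter can tally characters; char_frequency is a hypothetical name, and it takes a string rather than a file name so the example stays self-contained. Whether counts of characters such as braces, semicolons, or colons reliably separate C, Python, and plain text is left open here.

```python
from collections import Counter

def char_frequency(text):
    # Count every character, including whitespace and punctuation
    return Counter(text)

freq = char_frequency("int main() { return 0; }")
print(freq["{"], freq[";"], freq["n"])  # 1 1 3
```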

Write a program to find anagrams in a given list of words. Two words are called anagrams if one word can be formed by rearranging letters of another. For example ‘eat’, ‘ate’ and ‘tea’ are anagrams.

def anagrams(words):
  helper = {}
  for item in words:
    tmp = list(item)
    tmp.sort()
    word = ''.join(tmp)
    if word in helper:
      helper[word].append(item)
    else:
      helper[word] = [item]
  r = []
  for key in helper.keys():
    r.append(helper[key])
  return r

Write a function valuesort to sort values of a dictionary based on the key.

def valuesort(d):
  # sorted() returns a new list of keys; list.sort() returns None
  return [d[key] for key in sorted(d)]

Write a function invertdict to interchange keys and values in a dictionary. For simplicity, assume that all values are unique.
```
def invertdict(x):
  r = {}
  for key, value in x.items():
    r[value] = key
  return r
```
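
The same inversion can also be sketched as a dict comprehension. The name invertdict_comprehension is hypothetical, and the values are assumed to be unique and hashable, as the exercise states:

```python
def invertdict_comprehension(d):
    # Each value becomes a key; with duplicate values the last pair would win
    return {value: key for key, value in d.items()}

print(invertdict_comprehension({"x": 1, "y": 2, "z": 3}))  # {1: 'x', 2: 'y', 3: 'z'}
```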

Tag:: Python