Lecture 9, April 16, 2010

Admin Stuff

Bunch of new stuff up on webpage
Minor change to tab completion page
Minor change to problem #7 in HW #2

More on the standard library

There's all kinds of interesting stuff in the standard library -- we could spend several quarters talking about all the interesting things that are implemented there. However, let's take a moment to just highlight a few things that are (1) particularly useful, and (2) likely to be helpful on your homework:

os.walk
os.path.isdir
shutil.copy

There's also something particularly noteworthy about the functions in os.path and shutil -- they're written to behave the same way no matter what operating system you're on. Whether you're on Windows or Linux, shutil.copy will make copies of files for you. You can use it without knowing anything about what copy commands are available on that platform, or (in the case of os.path) how directories are written, which path separator is used, etc.
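
To make the three functions above concrete, here is a small sketch that walks a directory tree and copies every file it finds into one flat directory. The helper name copy_tree_files and the throwaway directories are assumptions made up for this example; it is written to run on Python 2.6+ and Python 3.

```python
import os
import shutil
import tempfile

def copy_tree_files(src_dir, dst_dir):
    """Copy every file under src_dir into the flat directory dst_dir.

    Hypothetical helper: files with the same name in different
    subdirectories would overwrite each other.
    """
    copied = []
    # os.walk yields one (directory, subdirectories, files) triple per directory.
    for dirpath, dirnames, filenames in os.walk(src_dir):
        for name in filenames:
            # os.path.join uses the right separator for the current platform.
            shutil.copy(os.path.join(dirpath, name), dst_dir)
            copied.append(name)
    return sorted(copied)

# Build a small throwaway tree to try it on.
src = tempfile.mkdtemp()
dst = tempfile.mkdtemp()
os.mkdir(os.path.join(src, "sub"))
for relative in ["a.txt", os.path.join("sub", "b.txt")]:
    handle = open(os.path.join(src, relative), "w")
    handle.write("hello\n")
    handle.close()

copied = copy_tree_files(src, dst)
sub_is_dir = os.path.isdir(os.path.join(src, "sub"))
file_is_dir = os.path.isdir(os.path.join(src, "a.txt"))
dst_listing = sorted(os.listdir(dst))

print(copied)        # ['a.txt', 'b.txt']
print(sub_is_dir)    # True
print(file_is_dir)   # False

# Clean up the temporary directories.
shutil.rmtree(src)
shutil.rmtree(dst)
```

Nothing in this code mentions a path separator or a platform-specific copy command, which is the point made above.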

`type` and `isinstance`

We've talked before about how all Python objects have a type associated with them, but you never have to specify that yourself -- Python is happy to take care of that for you. However, there are times when you really want to know the type of something -- for instance, you could imagine writing a function that has different behavior based on whether its input is, say, a list or an integer. So how do you find this out? We've already seen the type function:

>>> type(range(5))
<type 'list'>
>>> type(7)
<type 'int'>
>>> type(type)
<type 'type'>

But how do we find out if something is a list? Python has a built-in function for this, called isinstance:

>>> ls = range(5)
>>> type(ls)
<type 'list'>
>>> type(ls) == 'list'
False
>>> isinstance(ls, list)
True

It does just what you expect: it returns True if the first argument is an object whose type is the second argument (or a subclass of it). This is definitely the Pythonic way to do this check. Note why type(ls) == 'list' came out False above: type returns a type object, not a string, so the comparison that works is type(ls) == list. That said, there are times when you really want to get your hands on the type object for some specific type; the built-in Python types all have names in the types module.

>>> import types
>>> list == types.ListType
True
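
Going back to the motivating case from the start of this section, here is a minimal sketch of a function that behaves differently for a list and for an integer. The name describe and the messages it returns are made up for illustration; the code runs on Python 2.6+ and Python 3.

```python
def describe(value):
    """Return a short description that depends on the type of value."""
    if isinstance(value, list):
        return "a list with %d items" % len(value)
    elif isinstance(value, int):
        return "the integer %d" % value
    else:
        return "something else"

print(describe([1, 2, 3]))   # a list with 3 items
print(describe(7))           # the integer 7
print(describe("seven"))     # something else

# Comparing against the type object (not a string) also works.
print(type([1, 2, 3]) == list)   # True
```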

tuples

Tuples are another fundamental type in Python. One can see them as a sibling of lists -- they do something fairly similar in many contexts, but are fundamentally different in a few ways. First, the similarities: tuples are heterogeneous "containers", just like lists, and you can iterate through them in a for loop just like you would a list. Tuples are written with parentheses, as in (3, 7), instead of brackets. However, tuples have one particularly fundamental difference from lists: tuples are immutable, meaning that you can't modify them after they're created.

>>> t = (3, 7, 289)
>>> t[1:]
(7, 289)
>>> for x in t:
...     print x
...
3
7
289
>>> t + t
(3, 7, 289, 3, 7, 289)
>>> t[0]
3
>>> t[0] = 12938
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

We'll talk more about one reason why you'd want to use tuples when we talk about dictionaries. However, there's another simple reason to use them: since they're immutable, Python can set things up so that some operations on tuples are faster than the same operations on lists. This means that in certain circumstances, creating and using tuples will actually be faster than creating and using lists. So if you're creating a list that won't change, consider using a tuple instead.
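
One way to check the speed claim on your own machine is the timeit module from the standard library. This is only a rough sketch: it times building a five-element tuple against building a five-element list, and the numbers (and even which one wins) depend on your machine and Python version, so no expected output is shown.

```python
import timeit

# Each statement is run 100000 times; timeit returns the total time in seconds.
tuple_seconds = timeit.timeit("(1, 2, 3, 4, 5)", number=100000)
list_seconds = timeit.timeit("[1, 2, 3, 4, 5]", number=100000)

print("tuple: %.4f seconds" % tuple_seconds)
print("list:  %.4f seconds" % list_seconds)
```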

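Tuples are immutable, but that only means the tuple itself can't change which objects it holds. If one of those objects is mutable (a list, say), you can still change its contents, which indirectly changes what the tuple looks like. Here's an example: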

>>> t = (range(3), range(5))
>>> t
([0, 1, 2], [0, 1, 2, 3, 4])
>>> t[0] = range(10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> t[0][0] = 293047
>>> t
([293047, 1, 2], [0, 1, 2, 3, 4])
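
The same session can be sketched with an explicit name for the inner list, which shows what is going on: the tuple still holds the very same list object, and only that list's contents changed. The name inner is introduced here for illustration.

```python
inner = [0, 1, 2]
t = (inner, [0, 1, 2, 3, 4])

# Modify the list through its own name; the tuple is never assigned to.
inner[0] = 293047

print(t)               # ([293047, 1, 2], [0, 1, 2, 3, 4])
print(t[0] is inner)   # True: the tuple holds the same list object as before
```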

`reduce`

When we talked about map and filter a few days ago, we forgot to mention another basic function that's often grouped with them, called reduce. reduce is also sometimes called fold, which may be more suggestive of what it does. reduce takes two arguments: a function of two variables and a list. It calls the function on the first two values in the list, then repeatedly calls the function on the result so far and the next value in the list, until the list is exhausted. (I always think of "folding" the values together using the function, which is why I prefer the name fold.) So it's easy to use this to create some simple functions to add or multiply all the elements of a list together:

>>> def add(a,b):
...     return a+b
...
>>> reduce(add, range(11))
55
>>> def mult(a,b):
...     return a*b
...
>>> reduce(mult, range(1, 11))
3628800
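
The description of reduce above can be sketched as a plain loop. The function fold below is a hypothetical stand-in written for illustration, not how reduce is actually implemented, and it only covers the two-argument form discussed here. The functools import makes the comparison work on Python 3, where reduce is no longer a built-in.

```python
from functools import reduce

def fold(function, values):
    """Combine values left to right using a two-argument function."""
    iterator = iter(values)
    try:
        # Start with the first value in the list.
        result = next(iterator)
    except StopIteration:
        raise TypeError("fold() of empty sequence")
    # Fold each remaining value into the running result.
    for value in iterator:
        result = function(result, value)
    return result

def add(a, b):
    return a + b

def mult(a, b):
    return a * b

print(fold(add, range(11)))       # 55
print(fold(mult, range(1, 11)))   # 3628800
print(fold(add, range(11)) == reduce(add, range(11)))   # True
```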

You can use reduce to do all kinds of things; for instance, it's easy to do the same thing as the join method on strings. (One should really wrap this all up in a nice function, but we'll just play with an example.)

>>> def f(a,b):
...     return a + ' ' + str(b)
...
>>> reduce(f, ['hi', 3, 'abc'])
'hi 3 abc'
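
Here is one way the "nice function" might look. The name join_with is made up for this sketch, and unlike the join method on strings it converts every item with str first, so the first item doesn't have to be a string. It runs on Python 2.6+ and Python 3.

```python
from functools import reduce

def join_with(separator, items):
    """Join the string forms of items, placing separator between them."""
    def glue(a, b):
        return a + separator + b

    # Convert everything to a string up front, including the first item.
    strings = list(map(str, items))
    if not strings:
        # reduce raises an error on an empty list, so handle that case here.
        return ''
    return reduce(glue, strings)

print(join_with(' ', ['hi', 3, 'abc']))   # hi 3 abc
print(join_with(', ', [3, 7, 289]))       # 3, 7, 289
```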

Line endings

Ok, so finally, let's put together a handful of the things we've done so far, and actually create a useful tool. We want to write two Python programs: one which takes an arbitrary input file and outputs a file with Unix-style line endings, and the other which does the same for DOS-style line endings. Here's the one to convert to Unix endings:

#!/usr/bin/env python

import os, sys

def to_unix_ending(line):
    if line.endswith('\r\n'):
        return line[:-2] + '\n'
    else:
        return line

def convert_to_unix_endings(input_filename, output_filename):
    if not os.path.exists(input_filename):
        print "Sorry, no input file found!"
        return
    if os.path.exists(output_filename):
        print "Sorry, output file already exists!"
        return
    # Binary mode keeps Python from translating line endings on Windows.
    f = open(input_filename, "rb")
    g = open(output_filename, "wb")
    for line in f:
        g.write(to_unix_ending(line))

    f.close()
    g.close()

if __name__ == '__main__':
    if len(sys.argv) != 3:
        # sys.argv[0] is the script name, so don't count it as an argument.
        print "Sorry, wrong number of arguments! Wanted 2, got %s." % (len(sys.argv) - 1)
        sys.exit(1)

    convert_to_unix_endings(sys.argv[1], sys.argv[2])

This just opens the file, and then walks over every line, converting the line endings as needed. The similar file for converting to DOS endings is identical, except for the to_unix_ending function being replaced by a to_dos_ending function that looks something like this:

def to_dos_ending(line):
    if line.endswith('\r\n'):
        return line
    elif line.endswith('\n'):
        return line[:-1] + '\r\n'
    else:
        # A final line with no newline at all is left alone.
        return line
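
A quick way to convince yourself the two converters behave sensibly is to try them on a few sample lines. The sketch below repeats both functions so it can be run on its own, under one assumption that you should check against your own version: a final line with no newline at all is returned unchanged by to_dos_ending, rather than having its last character cut off.

```python
def to_unix_ending(line):
    if line.endswith('\r\n'):
        return line[:-2] + '\n'
    else:
        return line

def to_dos_ending(line):
    if line.endswith('\r\n'):
        return line
    elif line.endswith('\n'):
        return line[:-1] + '\r\n'
    else:
        # Assumption: no newline at all means there is nothing to convert.
        return line

# repr makes the otherwise invisible line-ending characters show up.
print(repr(to_unix_ending('abc\r\n')))   # 'abc\n'
print(repr(to_unix_ending('abc\n')))     # 'abc\n'
print(repr(to_dos_ending('abc\n')))      # 'abc\r\n'
print(repr(to_dos_ending('abc\r\n')))    # 'abc\r\n'
print(repr(to_dos_ending('abc')))        # 'abc'
```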

Now, we could just leave these as two nearly identical files; a nicer choice might be to combine them, and pass an extra argument saying which direction we're converting. Here's a version that does that (I called mine convert.py):

#!/usr/bin/env python

import os, sys

def to_unix_ending(line):
    if line.endswith('\r\n'):
        return line[:-2] + '\n'
    else:
        return line

def to_dos_ending(line):
    if line.endswith('\r\n'):
        return line
    elif line.endswith('\n'):
        return line[:-1] + '\r\n'
    else:
        # A final line with no newline at all is left alone.
        return line

def convert_line_endings(input_filename, output_filename, converter):
    if not os.path.exists(input_filename):
        print "Sorry, no input file found!"
        return
    if os.path.exists(output_filename):
        print "Sorry, output file already exists!"
        return
    # Binary mode keeps Python from translating line endings on Windows.
    f = open(input_filename, "rb")
    g = open(output_filename, "wb")
    for line in f:
        g.write(converter(line))

    f.close()
    g.close()

if __name__ == '__main__':
    if len(sys.argv) != 4:
        # sys.argv[0] is the script name, so don't count it as an argument.
        print "Sorry, wrong number of arguments! Wanted 3, got %s." % (len(sys.argv) - 1)
        sys.exit(1)

    if sys.argv[1] == 'to_unix':
        converter = to_unix_ending
    elif sys.argv[1] == 'to_dos':
        converter = to_dos_ending
    else:
        print "Sorry, unrecognized conversion:", sys.argv[1]
        # Stop here; otherwise converter would be undefined below.
        sys.exit(1)

    convert_line_endings(sys.argv[2], sys.argv[3], converter)

And you can simply use it as follows:

$ python convert.py to_unix some_file some_file_unix_endings
