Lecture 7, April 12, 2010

Admin Stuff

New assignment on the website

`import`

First, let's talk about import in some detail. We've seen that there are two ways to interact with Python programs: one is directly from the command line, and the other is in the Python interpreter. So, for instance, we first used collatz.py directly from the command line:

$ python collatz.py 10
Sequence reached 1 after 6 steps.
 Entries: [10, 5, 16, 8, 4, 2, 1]

However, we also saw that we could play with the functions in collatz.py directly from the Python interpreter:

>>> import collatz
>>> collatz.collatz(10)
[10, 5, 16, 8, 4, 2, 1]

So we can see what import is used for: it lets us read in a Python file we've created and use it in a running session. There are two related things you might want to do with import. First, say you change the file: you need a way to force Python to notice that, because importing a module that's already loaded doesn't reread the file. The right way is the built-in reload function -- reload(collatz) forces Python to reread that file from disk. The other annoyance is always having to type collatz. in front of every name. What you really want is a way to pull that name in directly -- and, of course, Python has one.

>>> from collatz import collatz
>>> collatz(10)
[10, 5, 16, 8, 4, 2, 1]

Unfortunately, when you do things this way, reloading the module doesn't update the names you pulled in: they stay bound to the old objects. (Here the problem is compounded by the fact that the module and the function we imported have the same name -- after from collatz import collatz, the name collatz refers to a function, not a module, so reload(collatz) fails.) There's also another form of the from ... import ... statement that's much more dangerous in general: from collatz import * imports every public name from that module. Be careful with this: you rarely know exactly what you're going to get, and it's quite easy to accidentally overwrite a name you didn't intend to. As they say, explicit is better than implicit ...
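
The stale-binding problem is easy to reproduce. What follows is only a sketch: it writes a throwaway module to a temporary directory (the module name demo_mod and its function answer are made up for the example), and it calls importlib.reload, which is where reload lives in Python 3. In the Python 2 used in these notes, the built-in reload plays the same role.

```python
import importlib
import os
import sys
import tempfile

# Skip bytecode caching so the quick rewrite below can't be masked by a cache.
sys.dont_write_bytecode = True

# Hypothetical module, created just for this demonstration.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo_mod.py")
with open(path, "w") as f:
    f.write("def answer():\n    return 1\n")

sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

import demo_mod
from demo_mod import answer

# "Edit" the file on disk, then ask Python to reread it.
with open(path, "w") as f:
    f.write("def answer():\n    return 22\n")
importlib.reload(demo_mod)

new_value = demo_mod.answer()  # looked up through the module: sees the new code
old_value = answer()           # bound by "from ... import": still the old function
```

After the reload, new_value is 22 but old_value is still 1. To refresh the bare name you would have to run the from ... import ... line again.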

Lists

We've already run into lists before, but let's talk about them again in a little more detail. Remember that we said lists are simply ordered, heterogeneous collections of Python objects -- they're exactly what you think of when you talk about lists in the non-computer sense. You can type them in directly at the interpreter prompt, and lots of "usual" operations do useful things with lists:

>>> ls1 = [3, 7, 11]
>>> ls2 = range(10,17)
>>> ls1
[3, 7, 11]
>>> ls2
[10, 11, 12, 13, 14, 15, 16]
>>> ls1 + ls2
[3, 7, 11, 10, 11, 12, 13, 14, 15, 16]
>>> ls1.extend(ls2)
>>> ls1
[3, 7, 11, 10, 11, 12, 13, 14, 15, 16]
>>> ls2
[10, 11, 12, 13, 14, 15, 16]
>>> ls2[5]
15
>>> ls2[5] = 3
>>> ls2
[10, 11, 12, 13, 14, 3, 16]
>>> ls1
[3, 7, 11, 10, 11, 12, 13, 14, 15, 16]
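
The session above mixes two different ways of gluing lists together, so here's a short sketch that separates them: + builds a brand new list, while extend changes the existing list in place. The variable names combined and same_object are just for illustration, and range is wrapped in list(...) so the snippet also runs under Python 3, where range no longer returns a list.

```python
ls1 = [3, 7, 11]
ls2 = list(range(10, 13))     # [10, 11, 12]

combined = ls1 + ls2          # a new list; ls1 and ls2 are untouched
same_object = ls1             # a second name for the very same list
returned = ls1.extend(ls2)    # modifies ls1 in place and returns None

# extend copied the elements of ls2 into ls1, so changing ls2
# afterwards does not affect ls1.
ls2[0] = 99
```

At the end, ls1 and same_object are both [3, 7, 11, 10, 11, 12], combined holds the same values in a separate list, and ls2 is [99, 11, 12].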

You index lists by using the [] notation -- remember that things are 0-based, so the first entry of the list is at index 0. There are a handful of other interesting ways to index into a list -- first, you can use negative indices to index from the end:

>>> ls = range(10)
>>> ls[-1]
9
>>> ls[-10]
0
>>> ls[-100]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
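
One way to think about negative indices: for a list of length n, ls[-k] is the same element as ls[n - k], which is why -1 is the last entry and -n the first. The sketch below checks that, and also shows one possible way to cope with an out-of-range index. The helper get_or_default is hypothetical, not something built into Python.

```python
ls = list(range(10))   # list(...) so this also runs under Python 3
n = len(ls)

last = ls[-1]          # same element as ls[n - 1]
first = ls[-n]         # same element as ls[0]
all_match = all(ls[-k] == ls[n - k] for k in range(1, n + 1))


def get_or_default(seq, i, default=None):
    """Hypothetical helper: return seq[i], or default if i is out of range."""
    try:
        return seq[i]
    except IndexError:
        return default


fallback = get_or_default(ls, -100, "missing")
```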

Better yet, if you just want a chunk of a list, you can use what's called slicing: you can specify a start and stop point, and take all the elements between. Keep in mind that it includes the starting point you specify, but excludes the endpoint.

>>> ls = range(20)
>>> ls[10:15]
[10, 11, 12, 13, 14]
>>> ls[18:50]
[18, 19]
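
Notice that ls[18:50] didn't complain even though the list only has 20 entries. Unlike plain indexing, slicing quietly clips out-of-range endpoints. A small sketch of that behaviour (again with list(range(...)) so it runs under Python 3 as well):

```python
ls = list(range(20))

middle = ls[10:15]      # start included, stop excluded
count = len(middle)     # stop - start elements

clipped = ls[18:50]     # the stop is clipped to the end of the list
empty = ls[50:60]       # entirely out of range: an empty list, no error

try:
    ls[50]              # plain indexing is stricter
    index_failed = False
except IndexError:
    index_failed = True
```

Here count is 5, clipped is [18, 19], empty is [], and index_failed ends up True.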

You can leave out either the start or the endpoint, in which case the slice runs from the beginning or to the end of the list, respectively. This means that ls[:] simply makes a (shallow) copy of the list.

>>> ls = range(30)
>>> ls[25:]
[25, 26, 27, 28, 29]
>>> ls[:5]
[0, 1, 2, 3, 4]
>>> ls = range(5)
>>> other_ls = ls[:]
>>> other_ls == ls
True
>>> other_ls[3] = 100
>>> other_ls
[0, 1, 2, 100, 4]
>>> ls
[0, 1, 2, 3, 4]
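
It's worth contrasting the ls[:] copy with plain assignment, which only gives the same list a second name. The sketch below also shows the limit of the slice copy: it's one level deep, so lists nested inside are still shared. All the variable names here are just for illustration.

```python
ls = [0, 1, 2, 3, 4]
alias = ls          # another name for the same list
copy = ls[:]        # a new list with the same entries

alias[0] = "changed"
# ls[0] is now "changed" too, but copy[0] is still 0

nested = [[1, 2], [3, 4]]
shallow = nested[:]

shallow[0].append(99)     # the inner list is shared, so nested sees this
shallow[1] = "replaced"   # rebinding a slot in the copy leaves nested alone
```

After this runs, nested is [[1, 2, 99], [3, 4]] while shallow is [[1, 2, 99], 'replaced'].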

You can also specify a step, and any or all of the three values can be negative. A negative step walks backwards through the list.

>>> ls = range(30)
>>> ls[10:20:2]
[10, 12, 14, 16, 18]
>>> ls[-5:-1]
[25, 26, 27, 28]
>>> ls[-10::3]
[20, 23, 26, 29]
>>> ls[-10::-7]
[20, 13, 6]
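
The last example is the least obvious one, so here's a sketch that spells it out: ls[-10::-7] starts at index 20 and keeps stepping back by 7 until it runs off the front of the list. The by_hand version does the same walk with explicit indices, purely for comparison. (list(range(...)) is used so the snippet runs under Python 3 too.)

```python
ls = list(range(30))

evens = ls[::2]             # every second element
backwards = ls[::-1]        # a reversed copy of the whole list

stepped = ls[-10::-7]       # start at index 20, step back by 7
by_hand = [ls[i] for i in range(20, -1, -7)]   # indices 20, 13, 6

downward = ls[20:10:-3]     # with a negative step, start must be past stop
```

Both stepped and by_hand come out as [20, 13, 6], and downward is [20, 17, 14, 11].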

List Comprehensions

A huge amount of the work you generally find yourself doing involves various kinds of list manipulation, and one of the most common operations of this ilk is doing the same thing to every element of a list. The standard way to do this is via list comprehensions. These take the form [ expression for name in list ] and, like many things, are best explained by example:

>>> [ 2*x for x in range(4) ]
[0, 2, 4, 6]
>>> [ x*x for x in range(3,7) ]
[9, 16, 25, 36]

You can also ask to only include those elements which satisfy some condition.

>>> [ x*x for x in range(3,10) if x%2 == 1 ]
[9, 25, 49, 81]

These may not seem that powerful yet, but they're often an extremely clear and concise way of expressing something that would be a multi-line loop in another language. This will become much more apparent once we have some more exciting functions on hand to use.
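
To make the comparison concrete, here is the filtered comprehension from above next to the explicit loop it stands in for. The names from_comprehension and from_loop are only there for the example.

```python
# One line: square the odd numbers between 3 and 9.
from_comprehension = [x * x for x in range(3, 10) if x % 2 == 1]

# The same thing written out as a loop.
from_loop = []
for x in range(3, 10):
    if x % 2 == 1:
        from_loop.append(x * x)
```

Both lists end up as [9, 25, 49, 81].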

This functionality is available in another form, whose heritage comes from the world of functional programming: map and filter. Here's an example of the same things as above:

>>> def double(x):
...     return 2*x
...
>>> map(double, range(4))
[0, 2, 4, 6]
>>> def square(x):
...     return x*x
...
>>> def is_odd(x):
...     return x%2 == 1
...
>>> filter(is_odd, range(3, 10))
[3, 5, 7, 9]
>>> map(square, filter(is_odd, range(3, 10)))
[9, 25, 49, 81]

Defining a function that you're just going to use once and throw away is cumbersome -- of course, there's a way around that. You can define an anonymous function (i.e. one without a name) using lambda -- the above statements are exactly equivalent to the following:

>>> map(lambda x:2*x, range(4))
[0, 2, 4, 6]
>>> filter(lambda x: x%2 == 1, range(3, 10))
[3, 5, 7, 9]
>>> map(lambda x: x*x, filter(lambda x: x%2 == 1, range(3, 10)))
[9, 25, 49, 81]
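
Here's a sketch putting the lambda version side by side with the list comprehension it corresponds to. One caveat that goes beyond these notes: in Python 3, map and filter return lazy iterators instead of lists, so the snippet wraps them in list(...). Under the Python 2 used here that wrapper is harmless.

```python
numbers = range(3, 10)

# map/filter with anonymous functions ...
via_map = list(map(lambda x: x * x,
                   filter(lambda x: x % 2 == 1, numbers)))

# ... and the equivalent list comprehension.
via_comprehension = [x * x for x in numbers if x % 2 == 1]

# A lambda is an ordinary function object; giving it a name
# makes it behave just like the def version of double.
double = lambda x: 2 * x
doubled = list(map(double, range(4)))
```

Both via_map and via_comprehension are [9, 25, 49, 81], and doubled is [0, 2, 4, 6].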

Strings

Strings are probably the most common datatype you'll use in Python, after integers and lists. Strings are used to represent any sort of text, and they're also the format you use for printing, both to the screen and to files.

You can specify strings in Python several ways -- you just enclose a sequence of characters in matching single or double quotes, and it always means exactly the same thing. That is, there's no difference between 'foo' and "foo". In fact, Python will generally lean towards printing with single quotes when it can:

>>> 'foo'
'foo'
>>> "foo"
'foo'
>>> 'foo' == "foo"
True
>>> print 'foo'
foo
>>> print "foo"
foo

If you want to include a ' in a string, just use double quotes, and vice versa. On the other hand, if you really need to have both in a string, you can use a backslash: the rule of thumb is that any character with a backslash in front of it is a "special" character. If you want a literal backslash, use \\. Python will try to "simplify" the printed representation of a string to use as few backslashes as possible, and it prefers single quotes unless the string contains a single quote and no double quotes.

>>> "\\"
'\\'
>>> print "\\"
\
>>> "\""
'"'
>>> '"'
'"'
>>> print '"'
"
>>> "'\""
'\'"'
>>> print "'\""
'"
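
The thing to keep straight is the difference between what you type and what the string actually contains. A quick sketch, using len to count the real characters and repr to see the representation the interpreter would echo back (the variable names are just for illustration):

```python
backslash = "\\"        # typed with two backslashes, contains one character
both = "'\""            # a single quote followed by a double quote

apostrophe = "it's"     # double quotes outside, so no escaping needed
escaped = 'it\'s'       # the same string, written with a backslash

n_backslash = len(backslash)
n_both = len(both)
same = (apostrophe == escaped)

# With a single quote and no double quotes inside, the
# representation switches to double quotes to avoid a backslash.
shown = repr(apostrophe)
```

Here n_backslash is 1, n_both is 2, same is True, and shown is the six characters "it's" including the surrounding double quotes.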

Okay, so now on to the more interesting stuff: by far the most interesting part of strings in Python is the built-in methods, which make lots of standard string manipulation significantly easier. A few of the most important are find, replace, split, and join.

>>> s = "this is my house"
>>> s.split()
['this', 'is', 'my', 'house']
>>> 'SPACE'.join(s.split())
'thisSPACEisSPACEmySPACEhouse'
>>> s.find('my')
8
>>> s[s.find('my'):]
'my house'
>>> s.find('your')
-1
>>> s = 'mi casa'
>>> s.replace('mi', 'su')
'su casa'
>>> print "%s es %s"%(s, s.replace('mi', 'su'))
mi casa es su casa
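
Two details in that session are worth a closer look. First, strings never change in place: replace hands back a new string and leaves s alone. Second, find signals failure with -1, and since -1 is also a perfectly good index, slicing with it blindly gives a surprising answer. The sketch below shows both; the helper text_from is hypothetical, written just for this example.

```python
s = "this is my house"

words = s.split()
rejoined = " ".join(words)       # split and join undo each other here

casa = "mi casa"
su_casa = casa.replace("mi", "su")   # casa itself is unchanged

# find returns -1 when nothing matches, and s[-1:] is the last character.
surprise = s[s.find("your"):]


def text_from(text, needle):
    """Hypothetical helper: text starting at needle, or None if absent."""
    i = text.find(needle)
    if i == -1:
        return None
    return text[i:]


found = text_from(s, "my")
not_found = text_from(s, "your")
```

Here surprise is 'e', found is 'my house', and not_found is None.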

Finally, I can't leave the subject of strings in good conscience without at least mentioning Unicode. I could try talking about it, but Mark Pilgrim does a much better job -- you should go and read the first two sections of his chapter on strings right now.