Python Workbook

Author: Dave Kuhlman
Address:
dkuhlman@rexx.com
http://www.rexx.com/~dkuhlman
Revision: 1.1c
Date: January 14, 2009
Copyright: Copyright (c) 2008 Dave Kuhlman. All Rights Reserved. This software is subject to the provisions of the MIT License http://www.opensource.org/licenses/mit-license.php.
Abstract: This document is a workbook intended for those who are learning and teaching the Python programming language.

1   Introduction

This document takes a workbook and exercise-with-solutions approach to Python training. It is hoped that those who feel a need for less explanation and more practical exercises will find this useful.

A few notes about the exercises:

The latest version of this document is at my Web site (URL above).

If you have comments or suggestions, please send them my way.

2   Lexical Structures

2.1   Variables and names

A name is any combination of letters, digits, and the underscore, but the first character must be a letter or an underscore. Names may be of any length.

Case is significant.

Exercises:

  1. Which of the following are valid names?
    1. total
    2. total_of_all_vegetables
    3. big-title-1
    4. _inner_func
    5. 1bigtitle
    6. bigtitle1
  2. Which of the following pairs are the same name:
    1. the_last_item and the_last_item
    2. the_last_item and The_Last_Item
    3. itemi and itemj
    4. item1 and iteml

Solutions:

  1. Items 1, 2, 4, and 6 are valid. Item 3 is not a single name, but is three items separated by the minus operator. Item 5 is not valid because it begins with a digit.
  2. Python names are case-sensitive, which means:
    1. the_last_item and the_last_item are the same.
    2. the_last_item and The_Last_Item are different -- The second name has upper-case characters.
    3. itemi and itemj are different.
    4. item1 and iteml are different -- This one may be difficult to see, depending on the font you are viewing. One name ends with the digit one; the other ends with the alpha character "el". And this example provides a good reason to use "1" and "l" judiciously in names.

The following are keywords in Python and should not be used as variable names:

and       del       from      not       while
as        elif      global    or        with
assert    else      if        pass      yield
break     except    import    print
class     exec      in        raise
continue  finally   is        return
def       for       lambda    try

Exercises:

  1. Which of the following are valid names in Python?
    1. _global
    2. global
    3. file

Solutions:

  1. Do not use keywords for variable names:
    1. Valid
    2. Not a valid name. "global" is a keyword.
    3. Valid; however, "file" is the name of a built-in type, as you will learn later, so you are advised not to redefine it. Here are a few of the names of built-in types: "file", "int", "str", "float", "list", "dict", etc. See Built-in Types -- http://docs.python.org/lib/types.html for more built-in types.
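
The naming rule and the keyword list above can also be checked mechanically. The following is a minimal sketch that is not part of the original workbook: the helper name is_usable_name and the sample names are made up for illustration. It combines a regular expression for the "letters, digits, and underscore" rule with the standard keyword module, and it deliberately ignores the non-ASCII letters that newer versions of Python also accept in names.

```python
import keyword
import re

# A letter or underscore first, then any number of letters, digits, underscores.
NAME_PATTERN = re.compile(r'[A-Za-z_][A-Za-z0-9_]*\Z')


def is_usable_name(candidate):
    """Return True if candidate has the shape of a name and is not a keyword."""
    has_name_shape = NAME_PATTERN.match(candidate) is not None
    return has_name_shape and not keyword.iskeyword(candidate)


# Hypothetical sample names, chosen only for this illustration.
samples = ['_count', 'count2', '2count', 'my-count', 'while']
usable = [name for name in samples if is_usable_name(name)]
# usable is ['_count', 'count2']
```

Note that this only checks the shape of a name. A name such as "file" or "list" passes the check even though redefining it is discouraged.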

The following are operators in Python and will separate names:

+       -       *       **      /       //      %
<<      >>      &       |       ^       ~
<       >       <=      >=      ==      !=      <>

and     or      is      not     in

Also:   ()      []      . (dot)

But, note that the Python style guide suggests that you place blanks around binary operators. One exception to this rule is function arguments and parameters for functions: it is suggested that you not put blanks around the equal sign (=) used to specify keyword arguments and default parameters.
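
As a small illustration of that spacing convention, here is a sketch (the function scale and the variable names are hypothetical, invented for this example). Binary operators get a blank on each side, while the equal sign in a default parameter or keyword argument does not.

```python
# No blanks around "=" in a default parameter.
def scale(value, factor=2):
    return value * factor


# Blanks around binary operators.
total = 3 + 4 * 2

# No blanks around "=" in a keyword argument.
scaled = scale(total, factor=3)
```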

Exercises:

  1. Which of the following are single names and which are names separated by operators?
    1. fruit_collection
    2. fruit-collection

Solutions:

  1. Do not use a dash, or other operator, in the middle of a name:
    1. fruit_collection is a single name
    2. fruit-collection is two names separated by a dash.

2.2   Line structure

In Python, normally we write one statement per line. In fact, Python assumes this. Therefore:

  • Statement separators are not normally needed.
  • But, if we want more than one statement on a line, we use a statement separator, specifically a semi-colon.
  • And, if we want to extend a statement to a second or third line and so on, we sometimes need to do a bit extra.

Extending a Python statement to a subsequent line -- Follow these two rules:

  1. If there is an open context, nothing special need be done to extend a statement across multiple lines. An open context is an open parenthesis, an open square bracket, or an open curly bracket.
  2. We can always extend a statement onto a following line by placing a backslash as the last character of the line.

Exercises:

  1. Extend the following statement to a second line using parentheses:

    total_count = tree_count + vegetable_count + fruit_count
    
  2. Extend the following statement to a second line using the backslash line continuation character:

    total_count = tree_count + vegetable_count + fruit_count
    

Solutions:

  1. Parentheses create an open context that tells Python that a statement extends to the next line:

    total_count = (tree_count +
        vegetable_count + fruit_count)
    
  2. A backslash as the last character on a line tells Python that the current statement extends to the next line:

    total_count = tree_count + \
        vegetable_count + fruit_count
    

For extending a statement onto a subsequent line, which is better: parentheses or a backslash? Here is a quote:

"The preferred way of wrapping long lines is by using Python's implied line continuation inside parentheses, brackets and braces. If necessary, you can add an extra pair of parentheses around an expression, but sometimes using a backslash looks better."

-- PEP 8: Style Guide for Python Code -- http://www.python.org/dev/peps/pep-0008/
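
The exercises above use parentheses and the backslash. The sketch below (with made-up data, not from the original workbook) shows the other two open contexts, square brackets and curly brackets, and also the semi-colon statement separator mentioned at the start of this section.

```python
# Square brackets are an open context, so no backslash is needed.
vegetables = [
    'carrot',
    'leek',
    'onion',
]

# Curly brackets work the same way.
prices = {
    'carrot': 2,
    'leek': 3,
    'onion': 1,
}

# Two short statements on one line, separated by a semi-colon.
first = vegetables[0]; last = vegetables[-1]

# Parentheses added only so that the expression can span several lines.
total_price = (prices['carrot'] +
               prices['leek'] +
               prices['onion'])
```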

2.3   Indentation and program structure

Python uses indentation to indicate program structure. That is to say, in order to nest a block of code inside a compound statement, you indent that nested code. This is different from many programming languages which use some sort of begin and end markers, for example curly brackets.

The standard coding practice for Python is to use four spaces per indentation level and to not use hard tabs. (See the Style Guide for Python Code.) Because of this, you will want to use a text editor that you can configure so that it will use four spaces for indentation. See here for a list of Python-friendly text editors: PythonEditors.

Exercises:

  1. Given the following, nest the print statement inside the if statement:

    if x > 0:
    
    print x
    
  2. Nest these two lines:

    z = x + y
    print z
    

    inside the following function definition statement:

    def show_sum(x, y):
    

Solutions:

  1. Indentation indicates that one statement is nested inside another statement:

    if x > 0:
        print x
    
  2. Indentation indicates that a block of statements is nested inside another statement:

    def show_sum(x, y):
        z = x + y
        print z
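
Blocks can be nested more than one level deep; each level adds four more spaces. Here is a short sketch (the function describe is hypothetical) with an if statement nested inside a function definition. It returns a value rather than printing so that it runs the same way under Python 2 and Python 3.

```python
def describe(x):
    # First level: the body of the function.
    if x > 0:
        # Second level: the body of the if clause.
        label = 'positive'
    else:
        # Second level: the body of the else clause.
        label = 'not positive'
    # Back at the first level, after the if statement.
    return label
```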
    

3   Execution Model

Here are a few rules:

  1. Python evaluates Python code from the top of a module down to the bottom of a module.
  2. Binding statements at top level create names (and bind values to those names) as Python evaluates code. Furthermore, a name is not created until it is bound to a value/object.
  3. A nested reference to a name (for example, inside a function definition or in the nested block of an if statement) is not used until that nested code is evaluated.

Exercises:

  1. Will the following code produce an error?

    show_version()
    def show_version():
        print 'Version 1.0a'
    
  2. Will the following code produce an error?

    def test():
        show_version()
    
    def show_version():
        print 'Version 1.0a'
    
    test()
    
  3. Will the following code produce an error? Assume that show_config is not defined:

    x = 3
    if x > 5:
        show_config()
    

Solutions:

  1. Answer: Yes, it generates an error. The name show_version would not be created and bound to a value until the def function definition statement binds a function object to it. That is done after the attempt to use (call) that object.

  2. Answer: No. The function test() does call the function show_version(), but since test() is not called until after show_version() is defined, that is OK.

  3. Answer: No. It's bad code, but in this case will not generate an error. Since x is less than 5, the body of the if statement is not evaluated.

    N.B. This example shows why it is important during testing that every line of code in your Python program be evaluated. Here is good Pythonic advice: "If it's not tested, it's broken."
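
The three rules can be observed in a single script. The following is a sketch under the assumption that it is run as a top-level module; the names call_before_definition and greet_later are invented for this illustration. The same nested call fails before the def statement has been evaluated and succeeds afterward.

```python
def call_before_definition():
    # The name greet_later is looked up only when this body runs.
    try:
        greet_later()
    except NameError:
        return 'NameError'
    return 'no error'


# At this point the def statement for greet_later has not been evaluated.
first_attempt = call_before_definition()


def greet_later():
    return 'hello'


# Now the name is bound, so the very same call succeeds.
second_attempt = call_before_definition()
```

After this runs, first_attempt is 'NameError' and second_attempt is 'no error'.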

4   Built-in Data Types

Each of the subsections in this section on built-in data types will have a similar structure:

  1. A brief description of the data type and its uses.
  2. Representation and construction -- How to represent an instance of the data type. How to code a literal representation that creates and defines an instance. How to create an instance of the built-in type.
  3. Operators that are applicable to the data type.
  4. Methods implemented and supported by the data type.

4.1   Numbers

The numbers you will use most commonly are likely to be integers and floats. Python also has long integers and complex numbers.

A few facts about numbers (in Python):

  • Python will convert to using a long integer automatically when needed. You do not need to worry about exceeding the size of a (standard) integer.

  • The size of the largest integer in your version of Python is in sys.maxint. To learn what it is, do:

    >>> import sys
    >>> print sys.maxint
    9223372036854775807
    

    The above shows the maximum size of an integer on a 64-bit version of Python.

  • You can convert from integer to float by using the float constructor. Example:

    >>> x = 25
    >>> y = float(x)
    >>> print y
    25.0
    
  • Python does "mixed arithmetic". You can add, multiply, and divide integers and floats. When you do, Python "promotes" the result to a float.
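
A brief sketch of two of these facts, using made-up values: mixed arithmetic produces floats, and integers grow beyond the machine word size without any special action.

```python
count = 7        # an integer
ratio = 0.5      # a float

mixed_sum = count + ratio          # 7.5
mixed_product = count * ratio      # 3.5

# Both results have been promoted to float.
result_types = (type(mixed_sum).__name__, type(mixed_product).__name__)

# Far larger than a 64-bit integer; Python switches representation for us.
big = 2 ** 100
```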

4.1.1   Literal representations of numbers

An integer is constructed with a series of digits or the integer constructor (int(x)). Be aware that a sequence of digits beginning with zero represents an octal value. Examples:

>>> x1 = 1234
>>> x2 = int('1234')
>>> x3 = -25
>>> x1
1234
>>> x2
1234
>>> x3
-25

A float is constructed either with digits and a dot (example, 12.345) or with engineering/scientific notation or with the float constructor (float(x)). Examples:

>>> x1 = 1.234
>>> x2 = -1.234
>>> x3 = float('1.234')
>>> x4 = 2.0e3
>>> x5 = 2.0e-3
>>> print x1, x2, x3, x4, x5
1.234 -1.234 1.234 2000.0 0.002

Exercises:

Construct these numeric values:

  1. Integer zero
  2. Floating point zero
  3. Integer one hundred and one
  4. Floating point one thousand
  5. Floating point one thousand using scientific notation
  6. Create a positive integer, a negative integer, and zero. Assign them to variables
  7. Write several arithmetic expressions. Bind the values to variables. Use a variety of operators, e.g. +, -, /, *, etc. Use parentheses to control operator scope.
  8. Create several floats and assign them to variables.
  9. Write several arithmetic expressions containing your float variables.
  10. Write several expressions using mixed arithmetic (integers and floats). Obtain a float as a result of division of one integer by another; do so by explicitly converting one integer to a float.

Solutions:

  1. 0

  2. 0.0, 0., or .0

  3. 101

  4. 1000.0

  5. 1e3 or 1.0e3

  6. Assigning integer values to variables:

    In [7]: value1 = 23
    In [8]: value2 = -14
    In [9]: value3 = 0
    In [10]: value1
    Out[10]: 23
    In [11]: value2
    Out[11]: -14
    In [12]: value3
    Out[12]: 0
    
  7. Assigning expression values to variables:

    value1 = 4 * (3 + 5)
    value2 = (value1 / 3.0) - 2
    
  8. Assigning floats to variables:

    value1 = 0.01
    value2 = -3.0
    value3 = 3e-4
    
  9. Assigning expressions containing variables:

    value4 = value1 * (value2 - value3)
    value4 = value1 + value2 + value3 - value4
    
  10. Mixed arithmetic:

    x = 5
    y = 8
    z = float(x) / y
    

You can also construct integers and floats using the class. Calling a class (using parentheses after a class name, for example) produces an instance of the class.

Exercises:

  1. Construct an integer from the string "123".
  2. Construct a float from the integer 123.
  3. Construct an integer from the float 12.345.

Solutions:

  1. Use the int data type to construct an integer instance from a string:

    int("123")
    
  2. Use the float data type to construct a float instance from an integer:

    float(123)
    
  3. Use the int data type to construct an integer instance from a float:

    int(12.345)    # --> 12
    

    Notice that the result is truncated to the integer part.
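
Truncation means that the fractional part is dropped, which is not the same as rounding. The sketch below (values chosen only for illustration) shows the three conversions from the exercise, truncation of a negative float, and one possible way to round instead by using the built-in round function.

```python
from_string = int("123")        # 123
from_integer = float(123)       # 123.0

truncated = int(12.9)           # 12, not 13
truncated_negative = int(-12.9)  # -12, truncation is toward zero

# If rounding is wanted, round first and then convert.
rounded = int(round(12.9))      # 13
```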

4.1.2   Operators for numbers

You can use most of the familiar operators with numbers, for example:

+       -       *       **      /       //      %
<<      >>      &       |       ^       ~
<       >       <=      >=      ==      !=      <>

Look here for an explanation of these operators when applied to numbers: Numeric Types -- int, float, long, complex -- http://docs.python.org/lib/typesnumeric.html.

Some operators take precedence over others. The table in the Web page just referenced above also shows that order of priority.

Here is a bit of that table:

All numeric types (except complex) support the following operations,
sorted by ascending priority (operations in the same box have the same
priority; all numeric operations have a higher priority than comparison
operations):

Operation      Result
---------      ------
x + y          sum of x and y
x - y          difference of x and y
x * y          product of x and y
x / y          quotient of x and y
x // y         (floored) quotient of x and y
x % y          remainder of x / y
-x             x negated
+x             x unchanged
abs(x)         absolute value or magnitude of x
int(x)         x converted to integer
long(x)        x converted to long integer
float(x)       x converted to floating point
complex(re,im) a complex number with real part re, imaginary part
               im. im defaults to zero.
c.conjugate()  conjugate of the complex number c
divmod(x, y)   the pair (x // y, x % y)
pow(x, y)      x to the power y
x ** y         x to the power y

Notice also that the same operator may perform a different function depending on the data type of the value to which it is applied.
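
Here is a small sketch, with made-up operands, of a few rows from the table and of the last point: the + operator adds numbers but concatenates strings and lists.

```python
quotient, remainder = divmod(17, 5)   # the pair (17 // 5, 17 % 5)
floored = -17 // 5                    # floors toward negative infinity: -4
power_operator = 2 ** 10
power_function = pow(2, 10)
magnitude = abs(-3.5)

# The same operator, three different data types.
number_sum = 3 + 4
text_sum = 'ab' + 'cd'
list_sum = [1, 2] + [3]
```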

Exercises:

  1. Add the numbers 3, 4, and 5.
  2. Add 2 to the result of multiplying 3 by 4.
  3. Add 2 plus 3 and multiply the result by 4.

Solutions:

  1. Arithmetic expressions follow standard infix algebraic syntax:

    3 + 4 + 5
    
  2. Use another infix expression:

    2 + 3 * 4
    

    Or:

    2 + (3 * 4)
    

    But, in this case the parentheses are not necessary because the * operator binds more tightly than the + operator.

  3. Use parentheses to control order of evaluation:

    (2 + 3) * 4
    

    Note that the * operator has precedence over (binds tighter than) the + operator, so the parentheses are needed.

Python does mixed arithmetic. When you apply an operation to an integer and a float, it promotes the result to the "higher" data type, a float.

If you need to perform an operation on several integers, but want to use a floating point operation, first convert one of the integers to a float using float(x), which effectively creates an instance of class float.

Try the following at your Python interactive prompt:

  1. 1.0 + 2
  2. 2 / 3 -- Notice that the result is truncated.
  3. float(2) / 3 -- Notice that the result is not truncated.

Exercises:

  1. Given the following assignments:

    x = 20
    y = 50
    

    Divide x by y giving a float result.

Solutions:

  1. Promote one of the integers to float before performing the division:

    z = float(x) / y
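
A caution when trying this on other versions of Python: the / operator applied to two integers truncates in Python 2, the version this workbook targets, but returns a float in Python 3. The sketch below sticks to the two forms that behave the same in both, floor division with // and explicit conversion with float().

```python
x = 20
y = 50

floored = x // y               # 0 in both Python 2 and Python 3
true_quotient = float(x) / y   # 0.4 in both Python 2 and Python 3
```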
    

4.1.3   Methods on numbers

Most of the methods implemented by the data types (classes) int and float are special methods that are called through the use of operators. Special methods often have names that begin and end with a double underscore. To see a list of the special names and a bit of an indication of when each is called, do any of the following at the Python interactive prompt:

>>> help(int)
>>> help(32)
>>> help(float)
>>> help(1.23)
>>> dir(1)
>>> dir(1.2)
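
To make the connection between operators and special methods concrete, here is a sketch that calls two of those methods directly. This is for illustration only; in ordinary code you would use the operators.

```python
via_operator = 3 + 4
via_method = (3).__add__(4)          # what the + operator calls for integers

float_via_operator = 1.5 * 2.0
float_via_method = (1.5).__mul__(2.0)

# The special names appear in the listing produced by dir().
has_add = '__add__' in dir(1)
```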

4.2   Lists

Lists are a container data type that acts as a dynamic array. That is to say, a list is a sequence that can be indexed into and that can grow and shrink.

A tuple is an index-able container, like a list, except that a tuple is immutable.

A few characteristics of lists and tuples:

  • A list has a (current) length -- Get the length of a list with len(mylist).
  • A list has an order -- The items in a list are ordered, and you can think of that order as going from left to right.
  • A list is heterogeneous -- You can insert different types of objects into the same list.
  • Lists are mutable, but tuples are not. Thus, the following are true of lists, but not of tuples:
    • You can extend or add to a list.
    • You can shrink a list by deleting items from it.
    • You can insert items into the middle of a list or at the beginning of a list. You can add items to the end of a list.
    • You can change which item is at a given position in a list.
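
The following sketch (with made-up data) walks through each of those mutations on a list and then shows that an attempt to change a tuple raises a TypeError.

```python
herbs = ['basil', 'parsley']

herbs.append('coriander')    # add to the end
herbs.insert(0, 'mint')      # insert at the beginning
herbs[1] = 'thyme'           # change the item at a given position
del herbs[2]                 # shrink the list by deleting an item
# herbs is now ['mint', 'thyme', 'coriander']

fixed = ('basil', 'parsley')
try:
    fixed[0] = 'mint'        # tuples do not support item assignment
    tuple_changed = True
except TypeError:
    tuple_changed = False
```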

4.2.1   Literal representation of lists

The literal representation of a list is square brackets containing zero or more items separated by commas.

Examples:

  1. Try these at the Python interactive prompt:

    >>> [11, 22, 33]
    >>> ['aa', 'bb', 'cc', ]
    >>> [100, 'apple', 200, 'banana', ]    # The last comma is optional.
    
  2. A list can contain lists. In fact a list can contain any kind of object:

    >>> [1, [2, 3], 4, [5, 6, 7, ], 8]
    
  3. Lists are heterogeneous, that is, different kinds of objects can be in the same list. Here is a list that contains a number, a string, and another list:

    >>> [123, 'abc', [456, 789]]
    

Exercises:

  1. Create (define) the following tuples and lists using a literal:
    1. A tuple of integers
    2. A tuple of strings
    3. A list of integers
    4. A list of strings
    5. A list of tuples or tuple of lists
    6. A list of integers and strings and tuples
    7. A tuple containing exactly one item
    8. An empty tuple
  2. Do each of the following:
    1. Print the length of a list.
    2. Print each item in the list -- Iterate over the items in one of your lists. Print each item.
    3. Append an item to a list.
    4. Insert an item at the beginning of a list. Insert an item in the middle of a list.
    5. Add two lists together. Do so by using both the extend method and the plus (+) operator. What is the difference between extending a list and adding two lists?
    6. Retrieve the 2nd item from one of your tuples or lists.
    7. Retrieve the 2nd, 3rd, and 4th items (a slice) from one of your tuples or lists.
    8. Retrieve the last (right-most) item in one of your lists.
    9. Replace an item in a list with a new item.
    10. Pop one item off the end of your list.
    11. Delete an item from a list.
    12. Do the following list manipulations:
      1. Write a function that takes two arguments, a list and an item, and that appends the item to the list.
      2. Create an empty list,
      3. Call your function several times to append items to the list.
      4. Then, print out each item in the list.

Solutions:

  1. We can define list literals at the Python or IPython interactive prompt:

    1. Create a tuple using commas, optionally with parentheses:

      In [1]: a1 = (11, 22, 33, )
      In [2]: a1
      Out[2]: (11, 22, 33)
      
    2. Quoted characters separated by commas create a tuple of strings:

      In [3]: a2 = ('aaa', 'bbb', 'ccc')
      In [4]: a2
      Out[4]: ('aaa', 'bbb', 'ccc')
      
    3. Items separated by commas inside square brackets create a list:

      In [26]: a3 = [100, 200, 300, ]
      In [27]: a3
      Out[27]: [100, 200, 300]
      
    4. Strings separated by commas inside square brackets create a list of strings:

      In [5]: a4 = ['basil', 'parsley', 'coriander']
      In [6]: a4
      Out[6]: ['basil', 'parsley', 'coriander']
      
    5. A tuple or a list can contain tuples and lists:

      In [8]: a5 = [(11, 22), (33, 44), (55,)]
      In [9]: a5
      Out[9]: [(11, 22), (33, 44), (55,)]
      
    6. A list or tuple can contain items of different types:

      In [10]: a6 = [101, 102, 'abc', "def", (201, 202), ('ghi', 'jkl')]
      In [11]: a6
      Out[11]: [101, 102, 'abc', 'def', (201, 202), ('ghi', 'jkl')]
      
    7. In order to create a tuple containing exactly one item, we must use a comma:

      In [13]: a7 = (6,)
      In [14]: a7
      Out[14]: (6,)
      
    8. In order to create an empty tuple, use the tuple class/type to create an instance of an empty tuple:

      In [21]: a = tuple()
      In [22]: a
      Out[22]: ()
      In [23]: type(a)
      Out[23]: <type 'tuple'>
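
The solutions above cover only the first exercise. For the second exercise, here is one possible sketch, not the author's solution; the function name append_item and all of the data are invented for illustration. It collects results in variables rather than printing them so that it runs the same way under Python 2 and Python 3.

```python
def append_item(target_list, item):
    """Append item to target_list (modifies the list in place)."""
    target_list.append(item)


items = [11, 22, 33]
items.append(44)               # append to the end
items.insert(0, 5)             # insert at the beginning
items.insert(2, 15)            # insert in the middle
# items is now [5, 11, 15, 22, 33, 44]

length = len(items)            # 6
seen = [item for item in items]  # iterate over the items
second = items[1]              # 2nd item (indexes start at 0)
middle_slice = items[1:4]      # 2nd, 3rd, and 4th items
last = items[-1]               # right-most item

combined = items + [55, 66]    # + creates a new list; items is unchanged
items.extend([55])             # extend modifies items in place

items[0] = 'first'             # replace an item
popped = items.pop()           # remove and return the last item
del items[1]                   # delete the item at index 1

collected = []
for value in ['a', 'b', 'c']:
    append_item(collected, value)
```

The difference asked about in item 5 is visible here: the plus operator leaves both operands alone and builds a new list, while extend changes the existing list.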
      

4.2.2   Operators on lists

There are several operators that are applicable to lists. Here is how to find out about them:

Exercises:

  1. Concatenate (add) two lists together.
  2. Create a single list that contains the items in an initial list repeated 3 times.
  3. Compare two lists.

Solutions:

  1. The plus operator, applied to two lists, produces a new list that is the concatenation of the two lists:

    >>> [11, 22] + ['aa', 'bb']
    
  2. Multiplying a list by an integer n creates a new list that repeats the original list n times:

    >>> [11, 'abc', 4.5] * 3
    
  3. The comparison operators can be used to compare lists:

    >>> [11, 22] == [11, 22]
    >>> [11, 22] < [11, 33]
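
The expressions above are shown without their results. The sketch below binds each result to a name (the names are arbitrary) and records the value in a comment. It also includes the in operator, which tests membership.

```python
joined = [11, 22] + ['aa', 'bb']      # [11, 22, 'aa', 'bb']
repeated = [11, 'abc'] * 3            # [11, 'abc', 11, 'abc', 11, 'abc']

is_equal = [11, 22] == [11, 22]       # True
is_less = [11, 22] < [11, 33]         # True: decided by the first differing item

contains = 22 in [11, 22]             # True
```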
    

4.2.3   Methods on lists

Again, use dir() and help() to learn about the methods supported by lists.

Exercises:

  1. Create two (small) lists. Extend the first list with the items in the second.
  2. Append several individual items to the end of a list.
  3. (a) Insert an item at the beginning of a list. (b) Insert an item somewhere in the middle of a list.
  4. Pop an item off the end of a list.

Solutions:

  1. The extend method adds elements from another list, or other iterable:

    >>> a = [11, 22, 33, 44, ]
    >>> b = [55, 66]
    >>> a.extend(b)
    >>> a
    [11, 22, 33, 44, 55, 66]
    
  2. Use the append method on a list to add/append an item to the end of a list:

    >>> a = ['aa', 11]
    >>> a.append('bb')
    >>> a.append(22)
    >>> a
    ['aa', 11, 'bb', 22]
    
  3. The insert method on a list enables us to insert items at a given position in a list:

    >>> a = [11, 22, 33, 44, ]
    >>> a.insert(0, 'aa')
    >>> a
    ['aa', 11, 22, 33, 44]
    >>> a.insert(2, 'bb')
    >>> a
    ['aa', 11, 'bb', 22, 33, 44]
    

    But, note that we want to use append to add items at the end of a list.

  4. The pop method on a list returns the "right-most" item from a list and removes that item from the list:

    >>> a = [11, 22, 33, 44, ]
    >>>
    >>> b = a.pop()
    >>> a
    [11, 22, 33]
    >>> b
    44
    >>> b = a.pop()
    >>> a
    [11, 22]
    >>> b
    33
    

    Note that the append and pop methods taken together can be used to implement a stack, that is a LIFO (last in first out) data structure.
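
A minimal sketch of that stack idea, with made-up task names: items are pushed with append and come back off with pop in the reverse order.

```python
stack = []

# Push three items.
for task in ['wash', 'dry', 'fold']:
    stack.append(task)

# Pop until the stack is empty; an empty list counts as false.
popped_order = []
while stack:
    popped_order.append(stack.pop())
# popped_order is ['fold', 'dry', 'wash']
```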

4.2.4   List comprehensions

A list comprehension is a convenient way to produce a list from an iterable (a sequence or other object that can be iterated over).

In its simplest form, a list comprehension resembles the header line of a for statement inside square brackets. However, in a list comprehension, the for statement header is prefixed with an expression and surrounded by square brackets. Here is a template:

[expr(x) for x in iterable]

where:

  • expr(x) is an expression, usually, but not always, containing x.
  • iterable is some iterable. An iterable may be a sequence (for example, a list, a string, a tuple) or an unordered collection or an iterator (something over which we can iterate or apply a for statement to).

Here is an example:

>>> a = [11, 22, 33, 44]
>>> b = [x * 2 for x in a]
>>> b
[22, 44, 66, 88]
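
It may help to see the comprehension next to the for statement it abbreviates. In this sketch the variable names are arbitrary; both forms build the same list, and the last line shows that the iterable can be a string as well as a list.

```python
a = [11, 22, 33, 44]

# The long form: an explicit for statement that appends to a list.
doubled_loop = []
for x in a:
    doubled_loop.append(x * 2)

# The list comprehension: the same expression and the same for header.
doubled_comprehension = [x * 2 for x in a]

# A string is also an iterable.
letters = [ch.upper() for ch in 'abc']    # ['A', 'B', 'C']
```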

Exercises:

  1. Given the following list of strings:

    names = ['alice', 'bertrand', 'charlene']
    

    produce the following lists: (1) a list of all upper case names; (2) a list of capitalized names (first letter upper case).

  2. Given the following function which calculates the factorial of a number:

    def t(n):
        if n <= 1:
            return 1    # 0! and 1! are both 1
        else:
            return n * t(n - 1)
    

    and the following list of numbers:

    numbers = [2, 3, 4, 5]
    

    create a list of the factorials of each of the numbers in the list.

Solutions:

  1. For our expression in a list comprehension, use the upper and capitalize methods:

    >>> names = ['alice', 'bertrand', 'charlene']
    >>> [name.upper() for name in names]
    ['ALICE', 'BERTRAND', 'CHARLENE']
    >>> [name.capitalize() for name in names]
    ['Alice', 'Bertrand', 'Charlene']
    
  2. The expression in our list comprehension calls the factorial function:

    def t(n):
        if n <= 1:
            return 1    # 0! and 1! are both 1
        else:
            return n * t(n - 1)
    
    def test():
        numbers = [2, 3, 4, 5]
        factorials = [t(n) for n in numbers]
        print 'factorials:', factorials
    
    if __name__ == '__main__':
        test()
    

A list comprehension can also contain an if clause. Here is a template:

[expr(x) for x in iterable if pred(x)]

where:

  • pred(x) is an expression that evaluates to a true/false value. Values that count as false are numeric zero, False, None, and any empty collection. All other values count as true.

Examples:

>>> a = [11, 22, 33, 44]
>>> b = [x * 3 for x in a if x % 2 == 0]
>>> b
[66, 132]
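
Because the if clause only needs a true/false value, the item itself can serve as the predicate. The following sketch, with made-up values, uses that to filter out everything that counts as false.

```python
values = [0, 1, '', 'text', None, [], [0], False, 2.5]

# Keep only the items that count as true.
truthy = [v for v in values if v]
# truthy is [1, 'text', [0], 2.5]
```

Note that [0] is kept: it is a list containing one item, so it is not an empty collection.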

Exercises:

  1. Given two lists, generate a list of all the strings in the first list that are not in the second list. Here are two sample lists:

    names1 = ['alice', 'bertrand', 'charlene', 'daniel']
    names2 = ['bertrand', 'charlene']
    

Solutions:

  1. The if clause of our list comprehension checks for containment in the list names2:

    def test():
        names1 = ['alice', 'bertrand', 'charlene', 'daniel']
        names2 = ['bertrand', 'charlene']
        names3 = [name for name in names1 if name not in names2]
        print 'names3:', names3
    
    if __name__ == '__main__':
        test()
    

    When run, this script prints out the following:

    names3: ['alice', 'daniel']
    

4.3   Strings

A string is an ordered sequence of characters. Here are a few characteristics of strings:

  • A string has a length. Get the length with the len() built-in function.
  • A string is indexable. Get a single character at a position in a string with the square bracket operator, for example mystring[5].
  • You can retrieve a slice (sub-string) of a string with a slice operation, for example mystring[5:8].
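
A short sketch of those three operations on a sample string (the string is made up for this illustration). Positions are counted from zero, and a slice stops just before its end position.

```python
mystring = 'workbook exercises'

length = len(mystring)    # 18
single = mystring[5]      # 'o', the character at position 5
piece = mystring[5:8]     # 'ook', positions 5, 6, and 7
```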

Create strings with single quotes or double quotes. You can put single quotes inside double quotes and you can put double quotes inside single quotes. You can also escape characters with a backslash.

Exercises:

  1. Create a string containing a single quote.
  2. Create a string containing a double quote.
  3. Create a string containing both a single quote and a double quote.

Solutions:

  1. Create a string with double quotes to include single quotes inside the string:

    >>> str1 = "that is jerry's ball"
    
  2. Create a string with single quotes to include double quotes inside the string:

    >>> str1 = 'say "goodbye", bullwinkle'
    
  3. Take your choice. Escape either the single quotes or the double quotes with a backslash:

    >>> str1 = 'say "hello" to jerry\'s mom'
    >>> str2 = "say \"hello\" to jerry's mom"
    >>> str1
    'say "hello" to jerry\'s mom'
    >>> str2
    'say "hello" to jerry\'s mom'
    

Triple quotes enable you to create a string that spans multiple lines. Use three single quotes or three double quotes to create a triple-quoted string.

Exercises:

  1. Create a triple quoted string that contains single and double quotes.

Solutions:

  1. Use triple single quotes or triple double quotes to create multi-line strings:

    String1 = '''This string extends
    across several lines.  And, so it has
    end-of-line characters in it.
    '''
    
    String2 = """
    This string begins and ends with an end-of-line
    character.  It can have both 'single'
    quotes and "double" quotes in it.
    """
    
    def test():
        print String1
        print String2
    
    if __name__ == '__main__':
        test()
    

4.3.1   Characters

Python does not have a distinct character type. In Python, a character is a string of length 1. You can use the ord() and chr() built-in functions to convert from character to integer and back.

Exercises:

  1. Create a character "a".
  2. Create a character, then obtain its integer representation.

Solutions:

  1. The character "a" is a plain string of length 1:

    >>> x = 'a'
    
  2. The integer equivalent of the letter "A":

    >>> x = "A"
    >>> ord(x)
    65
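
The solution above uses only ord(). Here is a sketch of the reverse direction with chr(), including a round trip and a step to the next letter.

```python
code = ord('a')                    # 97
back = chr(code)                   # 'a'
next_letter = chr(ord('a') + 1)    # 'b'
```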
    

4.3.2   Operators on strings

You can concatenate strings with the "+" operator.

You can create multiple concatenated copies of a string with the "*" operator.

And, augmented assignment (+= and *=) also works.

Examples:

>>> 'cat' + ' and ' + 'dog'
'cat and dog'
>>> '#' * 40
'########################################'
>>>
>>> s1 = 'flower'
>>> s1 += 's'
>>> s1
'flowers'
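
The example above shows += only. The following sketch (with arbitrary strings) shows *= as well. Since strings cannot be changed in place, each augmented assignment binds the name to a new string.

```python
border = '=-'
border *= 3        # '=-=-=-'

label = 'flower'
label += 's'       # 'flowers'
```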

Exercises:

  1. Given these strings:

    >>> s1 = 'abcd'
    >>> s2 = 'efgh'
    

    create a new string composed of the first string followed by (concatenated with) the second.

  2. Create a single string containing 5 copies of the string 'abc'.

  3. Use the multiplication operator to create a "line" of 50 dashes.

  4. Here are the components of a path to a file on the file system: "home", "myusername", "Workdir", "notes.txt". Concatenate these together separating them with the path separator to form a complete path to that file. (Note that if you use the backslash to separate components of the path, you will need to use a double backslash, because the backslash is the escape character in strings.)

Solutions:

  1. The plus (+) operator applied to a string can be used to concatenate strings:

    >>> s3 = s1 + s2
    >>> s3
    'abcdefgh'
    
  2. The multiplication operator (*) applied to a string creates a new string that concatenates a string with itself some number of times:

    >>> s1 = 'abc' * 5
    >>> s1
    'abcabcabcabcabc'
    
  3. The multiplication operator (*) applied to a string can be used to create a "horizontal divider line":

    >>> s1 = '-' * 50
    >>> print s1
    --------------------------------------------------
    
  4. The sep member of the os module gives us a platform independent way to construct paths:

    >>> import os
    >>>
    >>> a = ["home", "myusername", "Workdir", "notes.txt"]
    >>> path = a[0] + os.sep + a[1] + os.sep + a[2] + os.sep + a[3]
    >>> path
    'home/myusername/Workdir/notes.txt'
    

    And, a more concise solution:

    >>> import os
    >>> a = ["home", "myusername", "Workdir", "notes.txt"]
    >>> os.sep.join(a)
    'home/myusername/Workdir/notes.txt'
    

    Notes:

    • Note that importing the os module and then using os.sep from that module gives us a platform independent solution.
    • If you do decide to code the path separator character explicitly and if you are on MS Windows where the path separator is the backslash, then you will need to use a double backslash, because that character is the escape character.
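
As a possible alternative that the solutions above do not use, the standard library function os.path.join builds the same path without mentioning the separator at all. The sketch below compares the two approaches and also shows the double backslash from the second note.

```python
import os

parts = ["home", "myusername", "Workdir", "notes.txt"]

joined_with_sep = os.sep.join(parts)
joined_with_helper = os.path.join(*parts)   # same result for these parts

# A double backslash in the source code yields ONE backslash in the string.
windows_style = 'home\\myusername'
backslash_count = windows_style.count('\\')  # 1
```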

4.3.3   Methods on strings

Strings support a variety of methods. You can obtain a list of these methods by using the dir() built-in function on any string:

>>> dir("")
['__add__', '__class__', '__contains__', '__delattr__', '__doc__',
'__eq__', '__ge__', '__getattribute__', '__getitem__',
'__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__',
'__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__',
'__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__',
'__rmul__', '__setattr__', '__str__', 'capitalize', 'center',
'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find',
'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace',
'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip',
'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition',
'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip',
'swapcase', 'title', 'translate', 'upper', 'zfill']

And, you can get help on any specific method by using the help() built-in function. Here is an example:

>>> help("".strip)
Help on built-in function strip:

strip(...)
    S.strip([chars]) -> string or unicode

    Return a copy of the string S with leading and trailing
    whitespace removed.
    If chars is given and not None, remove characters in chars instead.
    If chars is unicode, S will be converted to unicode before stripping

Exercises:

  1. Strip all the whitespace characters off the right end of a string.
  2. Center a short string within a longer string, that is, pad a short string with blank characters on both right and left to center it.
  3. Convert a string to all upper case.
  4. Split a string into a list of "words".
  5. (a) Join the strings in a list of strings to form a single string. (b) Ditto, but put a newline character between each original string.

Solutions:

  1. The rstrip() method strips whitespace off the right side of a string:

    >>> s1 = 'some text   \n'
    >>> s1
    'some text   \n'
    >>> s2 = s1.rstrip()
    >>> s2
    'some text'
    
  2. The center(n) method centers a string within a padded string of width n:

    >>> s1 = 'Dave'
    >>> s2 = s1.center(20)
    >>> s2
    '        Dave        '
    
  3. The upper() method produces a new string that converts all alpha characters in the original to upper case:

    >>> s1 = 'Banana'
    >>> s1
    'Banana'
    >>> s2 = s1.upper()
    >>> s2
    'BANANA'
    
  4. The split(sep) method produces a list of strings that are separated by sep in the original string. If sep is omitted, whitespace is treated as the separator:

    >>> s1 = """how does it feel
    ... to be on your own
    ... no directions known
    ... like a rolling stone
    ... """
    >>> words = s1.split()
    >>> words
    ['how', 'does', 'it', 'feel', 'to', 'be', 'on', 'your', 'own', 'no',
    'directions', 'known', 'like', 'a', 'rolling', 'stone']
    
  5. The join() method concatenates strings from a list of strings to form a single string:

    >>> lines = []
    >>> lines.append('how does it feel')
    >>> lines.append('to be on your own')
    >>> lines.append('no directions known')
    >>> lines.append('like a rolling stone')
    >>> lines
    ['how does it feel', 'to be on your own', 'no directions known',
     'like a rolling stone']
    >>> s1 = ''.join(lines)
    >>> s2 = ' '.join(lines)
    >>> s3 = '\n'.join(lines)
    >>> s1
    'how does it feelto be on your ownno directions knownlike a rolling stone'
    >>> s2
    'how does it feel to be on your own no directions known like a rolling stone'
    >>> s3
    'how does it feel\nto be on your own\nno directions known\nlike a rolling stone'
    >>> print s3
    how does it feel
    to be on your own
    no directions known
    like a rolling stone
    

4.4   Dictionaries

A dictionary is an un-ordered collection of key-value pairs.

A dictionary has a length, specifically the number of key-value pairs.

The keys must be immutable object types.
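
More precisely, a key must be hashable, which is the case for strings, numbers, and tuples of such values, but not for lists. The following sketch (with made-up keys and values) shows what happens when a list is used as a key:

```python
# Strings, numbers, and tuples of immutable values work as keys.
d = {}
d['width'] = 8.5
d[42] = 'answer'
d[(1, 2)] = 'a tuple key'

# A list is mutable and therefore unhashable; using it as a key
# raises TypeError.
try:
    d[[1, 2]] = 'a list key'
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(len(d))        # the number of key-value pairs stored
print(list_key_ok)
```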

4.4.1   Literal representation of dictionaries

Curly brackets are used to represent a dictionary. Each pair in the dictionary is represented by a key and value separated by a colon. Multiple pairs are separated by commas. For example, here is an empty dictionary and several dictionaries containing key/value pairs:

In [4]: d1 = {}
In [5]: d2 = {'width': 8.5, 'height': 11}
In [6]: d3 = {1: 'RED', 2: 'GREEN', 3: 'BLUE', }
In [7]: d1
Out[7]: {}
In [8]: d2
Out[8]: {'height': 11, 'width': 8.5}
In [9]: d3
Out[9]: {1: 'RED', 2: 'GREEN', 3: 'BLUE'}

Notes:

  • A comma after the last pair is optional. See the RED-GREEN-BLUE example above.
  • Strings and integers work as keys, since they are immutable. You might also want to think about the use of tuples of integers as keys in a dictionary used to represent a sparse array.
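
The sparse array idea mentioned in the notes above might be sketched as follows. The grid size, the stored cells, and the helper get_cell() are all hypothetical; only the non-zero cells are stored, and a missing cell is treated as zero:

```python
# A hypothetical 1000 x 1000 grid in which only a few cells are non-zero.
sparse = {
    (0, 0): 1.5,
    (10, 250): 7.0,
    (999, 999): -2.25,
}

def get_cell(grid, row, col):
    # Cells that were never stored are treated as zero.
    return grid.get((row, col), 0)

print(get_cell(sparse, 10, 250))
print(get_cell(sparse, 5, 5))
```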

Exercises:

  1. Define a dictionary that has the following key-value pairs:

    Key         Value
    Eggplant    Purple
    Tomato      Red
    Parsley     Green
    Lemon       Yellow
    Pepper      Green, Red, Yellow

  2. Define a dictionary to represent the "enum" days of the week: Sunday, Monday, Tuesday, ...

Solutions:

  1. A dictionary whose keys and values are strings can be used to represent this table:

    vegetables = {
        'Eggplant': 'Purple',
        'Tomato': 'Red',
        'Parsley': 'Green',
        'Lemon': 'Yellow',
        'Pepper': 'Green',
        }
    
  2. We might use strings for the names of the days of the week as keys:

    DAYS = {
        'Sunday':    1,
        'Monday':    2,
        'Tuesday':   3,
        'Wednesday': 4,
        'Thursday':  5,
        'Friday':    6,
        'Saturday':  7,
        }
    

4.4.2   Operators on dictionaries

Dictionaries support the following "operators":

  • Length -- len(d) returns the number of pairs in a dictionary.

  • Indexing -- You can both set and get the value associated with a key. Examples:

    In [12]: d3[2]
    Out[12]: 'GREEN'
    In [13]: d3[0] = 'WHITE'
    In [14]: d3[0]
    Out[14]: 'WHITE'
    

4.4.3   Methods on dictionaries

Here is a table that describes the operations and methods applicable to dictionaries:

Operation                  Result
len(a)                     the number of items in a
a[k]                       the item of a with key k
a[k] = v                   set a[k] to v
del a[k]                   remove a[k] from a
a.clear()                  remove all items from a
a.copy()                   a (shallow) copy of a
k in a                     True if a has a key k, else False
k not in a                 equivalent to not k in a
a.has_key(k)               equivalent to k in a; use k in a in new code
a.items()                  a copy of a's list of (key, value) pairs
a.keys()                   a copy of a's list of keys
a.update([b])              updates a with key/value pairs from b, overwriting existing keys; returns None
a.fromkeys(seq[, value])   creates a new dictionary with keys from seq and values set to value
a.values()                 a copy of a's list of values
a.get(k[, x])              a[k] if k in a, else x
a.setdefault(k[, x])       a[k] if k in a, else x (also setting it)
a.pop(k[, x])              a[k] if k in a, else x (and remove k)
a.popitem()                remove and return an arbitrary (key, value) pair
a.iteritems()              return an iterator over (key, value) pairs
a.iterkeys()               return an iterator over the mapping's keys
a.itervalues()             return an iterator over the mapping's values

You can also find this table at the standard documentation Web site in the "Python Library Reference": Mapping Types -- dict http://docs.python.org/lib/typesmapping.html
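
A few of the methods in the table differ only in what they do when a key is missing. The following sketch, using a small made-up dictionary, contrasts get(), setdefault(), and pop():

```python
colors = {'Eggplant': 'Purple', 'Tomato': 'Red'}

# get() returns a fallback instead of raising KeyError for a missing key.
missing = colors.get('Banana', 'unknown')

# setdefault() returns the existing value, or stores and returns the default.
existing = colors.setdefault('Tomato', 'Green')
added = colors.setdefault('Lemon', 'Yellow')

# pop() removes the key and returns its value.
removed = colors.pop('Eggplant')

print(missing)
print(existing)
print(added)
print(removed)
print(sorted(colors.keys()))
```

Note that get() leaves the dictionary unchanged, while setdefault() and pop() may modify it.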

Exercises:

  1. Print the keys and values in the above "vegetable" dictionary.
  2. Print the keys and values in the above "vegetable" dictionary with the keys in alphabetical order.
  3. Test for the occurrence of a key in a dictionary.

Solutions:

  1. We can use the d.items() method to retrieve a list of tuples containing key-value pairs, then use unpacking to capture the key and value:

    Vegetables = {
        'Eggplant': 'Purple',
        'Tomato': 'Red',
        'Parsley': 'Green',
        'Lemon': 'Yellow',
        'Pepper': 'Green',
        }
    
    def test():
        for key, value in Vegetables.items():
            print 'key:', key, ' value:', value
    
    test()
    
  2. We retrieve a list of keys with the keys() method, then sort it with the list sort() method:

    Vegetables = {
        'Eggplant': 'Purple',
        'Tomato': 'Red',
        'Parsley': 'Green',
        'Lemon': 'Yellow',
        'Pepper': 'Green',
        }
    
    def test():
        keys = Vegetables.keys()
        keys.sort()
        for key in keys:
            print 'key:', key, ' value:', Vegetables[key]
    
    test()
    
  3. To test for the existence of a key in a dictionary, we can use either the in operator (preferred) or the d.has_key() method (old style):

    Vegetables = {
        'Eggplant': 'Purple',
        'Tomato': 'Red',
        'Parsley': 'Green',
        'Lemon': 'Yellow',
        'Pepper': 'Green',
        }
    
    def test():
        if 'Eggplant' in Vegetables:
            print 'we have %s eggplants' % Vegetables['Eggplant']
        if 'Banana' not in Vegetables:
            print 'yes we have no bananas'
        if Vegetables.has_key('Parsley'):
            print 'we have leafy, %s parsley' % Vegetables['Parsley']
    
    test()
    

    Which will print out:

    we have Purple eggplants
    yes we have no bananas
    we have leafy, Green parsley
    

4.5   Files

A Python file object represents a file on a file system.

A file object open for reading a text file is iterable. When we iterate over it, it produces the lines in the file.

A file may be opened in these modes:

  • 'r' -- read mode. The file must exist.
  • 'w' -- write mode. The file is created; an existing file is overwritten.
  • 'a' -- append mode. An existing file is opened for writing (at the end of the file). A file is created if it does not exist.

The open() built-in function is used to create a file object. For example, the following code (1) opens a file for writing, then (2) for reading, then (3) for appending, and finally (4) for reading again:

def test(infilename):
    # 1. Open the file in write mode, which creates the file.
    outfile = open(infilename, 'w')
    outfile.write('line 1\n')
    outfile.write('line 2\n')
    outfile.write('line 3\n')
    outfile.close()
    # 2. Open the file for reading.
    infile = open(infilename, 'r')
    for line in infile:
        print 'Line:', line.rstrip()
    infile.close()
    # 3. Open the file in append mode, and add a line to the end of
    #    the file.
    outfile = open(infilename, 'a')
    outfile.write('line 4\n')
    outfile.close()
    print '-' * 40
    # 4. Open the file in read mode once more.
    infile = open(infilename, 'r')
    for line in infile:
        print 'Line:', line.rstrip()
    infile.close()

test('tmp.txt')

Exercises:

  1. Open a text file for reading, then read the entire file as a single string, and then split the content on newline characters.
  2. Open a text file for reading, then read the entire file as a list of strings, where each string is one line in the file.
  3. Open a text file for reading, then iterate over each line in the file and print it out.

Solutions:

  1. Use the open() built-in function to open the file and create a file object. Use the read() method on the file object to read the entire file. Use the split() or splitlines() methods to split the file into lines:

    >>> infile = open('tmp.txt', 'r')
    >>> content = infile.read()
    >>> infile.close()
    >>> lines = content.splitlines()
    >>> print lines
    ['line 1', 'line 2', 'line 3']
    
  2. The f.readlines() method returns a list of lines in a file:

    >>> infile = open('tmp.txt', 'r')
    >>> lines = infile.readlines()
    >>> infile.close()
    >>> print lines
    ['line 1\n', 'line 2\n', 'line 3\n']
    
  3. Since a file object (open for reading) is itself an iterator, we can iterate over it in a for statement:

    """
    Test iteration over a text file.
    Usage:
        python test.py in_file_name
    """
    
    import sys
    
    def test(infilename):
        infile = open(infilename, 'r')
        for line in infile:
            # Strip off the new-line character and any whitespace on
            # the right.
            line = line.rstrip()
            # Print only non-blank lines.
            if line:
                print line
        infile.close()
    
    def main():
        args = sys.argv[1:]
        if len(args) != 1:
            print __doc__
            sys.exit(1)
        infilename = args[0]
        test(infilename)
    
    if __name__ == '__main__':
        main()
    

    Notes:

    • The last two lines of this solution check the __name__ attribute of the module itself so that the module will run as a script but will not run when the module is imported by another module.
    • The __doc__ attribute of the module gives us the module's doc-string, which is the string defined at the top of the module.
    • sys.argv gives us the command line. And, sys.argv[1:] chops off the program name, leaving us with the command line arguments.
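
The solutions above call close() explicitly. In Python 2.6 and later, the with statement offers an alternative that closes the file automatically when the block ends. Here is a minimal sketch; the scratch file is created only for this illustration and is removed afterwards:

```python
import os
import tempfile

# Create a scratch file so the example does not depend on an existing file.
fd, filename = tempfile.mkstemp(suffix='.txt')
os.close(fd)

with open(filename, 'w') as outfile:
    outfile.write('line 1\nline 2\n')

with open(filename, 'r') as infile:
    lines = [line.rstrip() for line in infile]

# The file object is closed as soon as the with block ends.
closed_after_block = infile.closed

os.remove(filename)
print(lines)
```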

4.6   A few miscellaneous data types

4.6.1   None

None is a singleton. There is only one instance of None. Use this value to indicate the absence of any other "real" value.

Test for None with the identity operator is.
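
One reason to prefer is over == for this test is that a class can define its own notion of equality. The following sketch uses a hypothetical class, AlwaysEqual, whose instances claim to be equal to everything:

```python
class AlwaysEqual(object):
    # A hypothetical class whose instances claim to be equal to anything.
    def __eq__(self, other):
        return True

obj = AlwaysEqual()

equal_to_none = (obj == None)   # the class's __eq__ method is consulted
is_none = (obj is None)         # identity test; __eq__ is not involved

print(equal_to_none)
print(is_none)
```

The equality test reports True even though obj is clearly not None, while the identity test gives the expected False.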

Exercises:

  1. Create a list, some of whose elements are None. Then write a for loop that counts the number of occurrences of None in the list.

Solutions:

  1. The identity operators is and is not can be used to test for None:

    >>> a = [11, None, 'abc', None, {}]
    >>> a
    [11, None, 'abc', None, {}]
    >>> count = 0
    >>> for item in a:
    ...     if item is None:
    ...         count += 1
    ...
    >>>
    >>> print count
    2
    

4.6.2   The booleans True and False

Python has the two boolean values True and False. Many comparison operators return True and False.

Examples:

  1. What value is returned by 3 > 2?

    Answer: The boolean value True.

  2. Given these variable definitions:

    x = 3
    y = 4
    z = 5
    

    What does the following print out:

    print y > x and z > y
    

    Answer -- Prints out "True"

5   Statements

5.1   Assignment statement

The assignment statement uses the assignment operator =.

The assignment statement is a binding statement: it binds a value to a name within a namespace.
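
One consequence of binding is that assignment does not copy the value. In this small sketch (the names a, b, and c are arbitrary), two names are bound to the same list, and a third name is bound to a separate copy:

```python
a = [1, 2, 3]
b = a            # b is bound to the same list object, not to a copy
b.append(4)

c = list(a)      # list() builds a new list, so c is a separate object
c.append(5)

print(a)         # the change made through b is visible through a
print(a is b)
print(a is c)
```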

Exercises:

  1. Bind the value "eggplant" to the variable vegetable.

Solutions:

  1. The = operator in an assignment statement binds a value to a variable:

    >>> vegetable = "eggplant"

There is also augmented assignment using the operators +=, -=, *=, /=, etc.

Exercises:

  1. Use augmented assignment to increment the value of an integer.
  2. Use augmented assignment to append characters to the end of a string.
  3. Use augmented assignment to append the items in one list to another.
  4. Use augmented assignment to decrement a variable containing an integer by 1.

Solutions:

  1. The += operator increments the value of an integer:

    >>> count = 0
    >>> count += 1
    >>> count
    1
    >>> count += 1
    >>> count
    2
    
  2. The += operator appends characters to the end of a string:

    >>> buffer = 'abcde'
    >>> buffer += 'fgh'
    >>> buffer
    'abcdefgh'
    
  3. The += operator appends items in one list to another:

    In [20]: a = [11, 22, 33]
    In [21]: b = [44, 55]
    In [22]: a += b
    In [23]: a
    Out[23]: [11, 22, 33, 44, 55]
    
  4. The -= operator decrements the value of an integer:

    >>> count = 5
    >>> count
    5
    >>> count -= 1
    >>> count
    4
    

You can also assign a value to (1) an element of a list, (2) an item in a dictionary, (3) an attribute of an object, etc.

Exercises:

  1. Create a list of three items, then assign a new value to the 2nd element in the list.

  2. Create a dictionary, then assign values to the keys "vegetable" and "fruit" in that dictionary.

  3. Use the following code to create an instance of a class:

    class A(object):
        pass
    a = A()
    

    Then assign a value to an attribute named category in that instance.

Solutions:

  1. Assignment with the indexing operator [] assigns a value to an element in a list:

    >>> trees = ['pine', 'oak', 'elm']
    >>> trees
    ['pine', 'oak', 'elm']
    >>> trees[1] = 'cedar'
    >>> trees
    ['pine', 'cedar', 'elm']
    
  2. Assignment with the indexing operator [] assigns a value to an item (a key-value pair) in a dictionary:

    >>> foods = {}
    >>> foods
    {}
    >>> foods['vegetable'] = 'green beans'
    >>> foods['fruit'] = 'nectarine'
    >>> foods
    {'vegetable': 'green beans', 'fruit': 'nectarine'}
    
  3. Assignment along with the dereferencing operator . (dot) enables us to assign a value to an attribute of an object:

    >>> class A(object):
    ...     pass
    ...
    >>> a = A()
    >>> a.category = 25
    >>> a.__dict__
    {'category': 25}
    >>> a.category
    25
    

5.3   if statement

The if statement is a compound statement that enables us to conditionally execute blocks of code.

The if statement also has optional elif: and else: clauses.

The condition in an if: or elif: clause can be any Python expression, in other words, something that returns a value (even if that value is None).

In the condition of an if: or elif: clause, the following values count as "false":

  • False
  • None
  • Numeric zero
  • An empty collection, for example an empty list or dictionary
  • An empty string (a string of length zero)

All other values count as true.
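
The rules above can be checked with a short loop. This sketch sorts a handful of sample values (chosen only for illustration) into those that count as true and those that count as false:

```python
values = [False, None, 0, 0.0, [], {}, '', 'abc', [0], -1]

truthy = []
falsy = []
for value in values:
    if value:
        truthy.append(value)
    else:
        falsy.append(value)

print(len(falsy))
print(truthy)
```

Note that a list containing a zero is not empty, so it counts as true, and so does any non-zero number, including a negative one.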

Exercises:

  1. Given the following list:

    >>> bananas = ['banana1', 'banana2', 'banana3',]
    

    Print one message if it is an empty list and another message if it is not.

  2. Here is one way of defining a Python equivalent of an "enum":

    NO_COLOR, RED, GREEN, BLUE = range(4)
    

    Write an if: statement which implements the effect of a "switch" statement in Python. Print out a unique message for each color.

Solutions:

  1. We can test for an empty or non-empty list:

    >>> bananas = ['banana1', 'banana2', 'banana3',]
    >>> if not bananas:
    ...     print 'yes, we have no bananas'
    ... else:
    ...     print 'yes, we have bananas'
    ...
    yes, we have bananas
    
  2. We can simulate a "switch" statement using if:elif: ...:

    NO_COLOR, RED, GREEN, BLUE = range(4)
    
    def test(color):
        if color == RED:
            print "It's red."
        elif color == GREEN:
            print "It's green."
        elif color == BLUE:
            print "It's blue."
    
    def main():
        color = BLUE
        test(color)
    
    if __name__ == '__main__':
        main()
    

    Which, when run, prints out the following:

    It's blue.
    

5.4   for statement

The for: statement is the Python way to iterate over and process the elements of a collection or other iterable.

The basic form of the for: statement is the following:

for X in Y:
    statement
    o
    o
    o

where:

  • X is something that can be assigned to. It is something to which Python can bind a value.
  • Y is some collection or other iterable.
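
Because X can be anything that can be assigned to, it may be a single name or a tuple of names. A minimal sketch, using a made-up list of two-tuples:

```python
pairs = [('width', 8.5), ('height', 11)]

# X may be a single name ...
names = []
for pair in pairs:
    names.append(pair[0])

# ... or a tuple of names, in which case each item is unpacked.
total = 0
for name, size in pairs:
    total += size

print(names)
print(total)
```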

When we need a sequential index, we can use the range() built-in function to create a list of integers. And, the xrange() built-in function produces an iterator that produces a sequence of integers without creating the entire list. To iterate over a large sequence of integers, use xrange() instead of range().

Exercises:

  1. Print out the integers from 0 to 5 in sequence.

  2. Compute the sum of all the integers from 0 to 99999.

  3. Given the following generator function:

    import urllib
    
    Urls = [
        'http://yahoo.com',
        'http://python.org',
        'http://gimp.org',    # The GNU image manipulation program
        ]
    
    def walk(url_list):
        for url in url_list:
            f = urllib.urlopen(url)
            stuff = f.read()
            f.close()
            yield stuff
    

    Write a for: statement that uses this generator to print the lengths of the content at each of the Web pages in that list.

Solutions:

  1. The range() built-in function gives us a sequence to iterate over:

    In [5]: for idx in range(6):
       ...:     print 'idx: %d' % idx
       ...:
       ...:
    idx: 0
    idx: 1
    idx: 2
    idx: 3
    idx: 4
    idx: 5
    
  2. Since that sequence is a bit large, we'll use xrange() instead of range():

    In [8]: count = 0
    In [9]: for n in xrange(100000):
       ...:     count += n
       ...:
       ...:
    In [10]: count
    Out[10]: 4999950000
    
  3. The for: statement enables us to iterate over iterables as well as collections:

    import urllib
    
    Urls = [
        'http://yahoo.com',
        'http://python.org',
        'http://gimp.org',    # The GNU image manipulation program
        ]
    
    def walk(url_list):
        for url in url_list:
            f = urllib.urlopen(url)
            stuff = f.read()
            f.close()
            yield stuff
    
    def test():
        for x in walk(Urls):
            print 'length: %d' % (len(x), )
    
    if __name__ == '__main__':
        test()
    

    When I ran this script, it printed the following:

    length: 9562
    length: 16341
    length: 12343
    

If you need an index while iterating over a sequence, consider using the enumerate() built-in function.

Exercises:

  1. Given the following two lists of integers of the same length:

    a = [1, 2, 3, 4, 5]
    b = [100, 200, 300, 400, 500]
    

    Add the values in the first list to the corresponding values in the second list.

Solutions:

  1. The enumerate() built-in function gives us an index and values from a sequence. Since enumerate() gives us an iterator that produces a sequence of two-tuples, we can unpack those tuples into index and value variables in the header line of the for statement:

    In [13]: a = [1, 2, 3, 4, 5]
    In [14]: b = [100, 200, 300, 400, 500]
    In [15]:
    In [16]: for idx, value in enumerate(a):
       ....:     b[idx] += value
       ....:
       ....:
    In [17]: b
    Out[17]: [101, 202, 303, 404, 505]
    

5.5   while statement

A while: statement executes a block of code repeatedly as long as a condition is true.

Here is a template for the while: statement:

while condition:
    statement
    o
    o
    o

Where:

  • condition is an expression. The expression is something that returns a value which can be interpreted as true or false.
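
The condition does not have to be a comparison. In this sketch, the list itself serves as the condition; since an empty list counts as false, the loop ends when all items have been removed. The names stack and popped are only for illustration:

```python
stack = ['a', 'b', 'c']
popped = []

# A non-empty list counts as true, so the loop ends when the list is empty.
while stack:
    popped.append(stack.pop())

print(popped)
print(stack)
```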

Exercises:

  1. Write a while: loop that doubles all the values in a list of integers.

Solutions:

  1. A while: loop with an index variable can be used to modify each element of a list:

    def test_while():
        numbers = [11, 22, 33, 44, ]
        print 'before: %s' % (numbers, )
        idx = 0
        while idx < len(numbers):
            numbers[idx] *= 2
            idx += 1
        print 'after: %s' % (numbers, )
    

    But, notice that this task is easier using the for: statement and the built-in enumerate() function:

    def test_for():
        numbers = [11, 22, 33, 44, ]
        print 'before: %s' % (numbers, )
        for idx, item in enumerate(numbers):
            numbers[idx] *= 2
        print 'after: %s' % (numbers, )
    

5.6   break and continue statements

The continue statement skips the remainder of the statements in the body of a loop and starts at the top of the loop again.

A break statement in the body of a loop terminates the loop. It exits from the immediately containing loop.

break and continue can be used in both for: and while: statements.

Exercises:

  1. Write a for: loop that takes a list of integers and triples each integer that is even. Use the continue statement.
  2. Write a loop that takes a list of integers and computes the sum of all the integers up until a zero is found in the list. Use the break statement.

Solutions:

  1. The continue statement enables us to "skip" items that satisfy a condition or test:

    def test():
        numbers = [11, 22, 33, 44, 55, 66, ]
        print 'before: %s' % (numbers, )
        for idx, item in enumerate(numbers):
            if item % 2 != 0:
                continue
            numbers[idx] *= 3
        print 'after: %s' % (numbers, )
    
    test()
    
  2. The break statement enables us to exit from a loop when we find a zero:

    def test():
        numbers = [11, 22, 33, 0, 44, 55, 66, ]
        print 'numbers: %s' % (numbers, )
        sum = 0
        for item in numbers:
            if item == 0:
                break
            sum += item
        print 'sum: %d' % (sum, )
    
    test()
    

5.7   Exceptions and the try:except: and raise statements

The try:except: statement enables us to catch an exception that is thrown from within a block of code, or from code called from any depth within that block.

The raise statement enables us to throw an exception.

An exception is a class or an instance of an exception class. If an exception is not caught, it results in a traceback and termination of the program.

There is a set of standard exceptions. You can learn about them here: Built-in Exceptions -- http://docs.python.org/lib/module-exceptions.html.

You can define your own exception classes. To do so, create a subclass of the class Exception; an empty subclass is often enough. Defining your own exception will enable you (or others) to throw and then catch that specific exception type while ignoring other exceptions.
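
Here is a minimal sketch of that idea. The class ConfigError and the functions load_setting() and try_setting() are hypothetical names invented for this illustration. It uses the "except ... as ..." form, which is available in Python 2.6 and later:

```python
class ConfigError(Exception):
    # Hypothetical application-specific exception; an empty body is enough.
    pass

def load_setting(settings, name):
    if name not in settings:
        raise ConfigError('missing setting: %s' % name)
    return int(settings[name])

def try_setting(settings, name):
    try:
        return load_setting(settings, name)
    except ConfigError as exp:
        # Only ConfigError is handled here.
        return 'caught: %s' % exp

settings = {'size': '10', 'depth': 'deep'}

first = try_setting(settings, 'size')
second = try_setting(settings, 'color')

# int('deep') raises ValueError, which the except clause above ignores.
try:
    try_setting(settings, 'depth')
    value_error_escaped = False
except ValueError:
    value_error_escaped = True

print(first)
print(second)
print(value_error_escaped)
```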

Exercises:

  1. Write a try:except: statement that attempts to open a file for reading and catches the exception thrown when the file does not exist.

    Question: How do you find out the name of the exception that is thrown for an input/output error such as the failure to open a file?

  2. Define an exception class. Then write a try:except: statement in which you throw and catch that specific exception.

Solutions:

  1. Use the Python interactive interpreter to learn the exception type thrown when an I/O error occurs. Example:

    >>> infile = open('xx_nothing__yy.txt', 'r')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: [Errno 2] No such file or directory: 'xx_nothing__yy.txt'
    >>>
    

    In this case, the exception type is IOError.

    Now, write a try:except: block which catches that exception:

    def test():
        infilename = 'nothing_noplace.txt'
        try:
            infile = open(infilename, 'r')
            for line in infile:
                print line
        except IOError, exp:
            print 'cannot open file "%s"' % infilename
    
    test()
    
  2. We define an exception class as a sub-class of class Exception, then throw it (with the raise statement) and catch it (with a try:except: statement):

    class SizeError(Exception):
        pass
    
    def test_exception(size):
        try:
            if size <= 0:
                raise SizeError, 'size must be greater than zero'
            # Produce a different error to show that it will not be caught.
            x = y
        except SizeError, exp:
            print '%s' % (exp, )
            print 'goodbye'
    
    def test():
        test_exception(-1)
        print '-' * 40
        test_exception(1)
    
    test()
    

    When we run this script, it produces the following output:

    $ python workbook027.py
    size must be greater than zero
    goodbye
    ----------------------------------------
    Traceback (most recent call last):
      File "workbook027.py", line 20, in <module>
        test()
      File "workbook027.py", line 18, in test
        test_exception(1)
      File "workbook027.py", line 10, in test_exception
        x = y
    NameError: global name 'y' is not defined
    

    Notes:

    • Our except: clause caught the SizeError, but let the NameError go uncaught.

6   Functions

A function has these characteristics:

A function is defined with the def: statement. Here is a simple example/template:

def function_name(arg1, arg2):
    local_var1 = arg1 + 1
    local_var2 = arg2 * 2
    return local_var1 + local_var2

And, here is an example of calling this function:

result = function_name(1, 2)

Here are a few notes of explanation:

Exercises:

  1. Write a function that takes a list of integers as an argument, and returns the sum of the integers in that list.

Solutions:

  1. The return statement enables us to return a value from a function:

    def list_sum(values):
        sum = 0
        for value in values:
            sum += value
        return sum
    
    def test():
        a = [11, 22, 33, 44, ]
        print list_sum(a)
    
    if __name__ == '__main__':
        test()
    

6.1   Optional arguments and default values

You can provide a default value for an argument to a function.

If you do, that argument is optional (when the function is called).

Here are a few things to learn about optional arguments:

  • Provide a default value with an equal sign and a value. Example:

    def sample_func(arg1, arg2, arg3='empty', arg4=0):
    
  • All parameters with default values must be after (to the right of) normal parameters.

  • Do not use a mutable object as a default value. Because the def: statement is evaluated only once and not each time the function is called, the object may be shared across multiple calls to the function. Do not do this:

    def sample_func(arg1, arg2=[]):
    

    Instead, do this:

    def sample_func(arg1, arg2=None):
        if arg2 is None:
            arg2 = []
    

    Here is an example that illustrates how this might go wrong:

    def adder(a, b=[]):
        b.append(a)
        return b
    
    def test():
        print adder('aaa')
        print adder('bbb')
        print adder('ccc')
    
    test()
    

    Which, when executed, displays the following:

    ['aaa']
    ['aaa', 'bbb']
    ['aaa', 'bbb', 'ccc']
    

Exercises:

  1. Write a function that writes a string to a file. The function takes two arguments: (1) a file that is open for output and (2) a string. Give the second argument (the string) a default value so that when the second argument is omitted, an empty, blank line is written to the file.
  2. Write a function that takes the following arguments: (1) a name, (2) a value, and (3) an optional dictionary. The function adds the value to the dictionary using the name as a key in the dictionary.

Solutions:

  1. We can pass a file as we would any other object. And, we can use a newline character as a default parameter value:

    import sys
    
    def writer(outfile, msg='\n'):
        outfile.write(msg)
    
    def test():
        writer(sys.stdout, 'aaaaa\n')
        writer(sys.stdout)
        writer(sys.stdout, 'bbbbb\n')
    
    test()
    

    When run from the command line, this prints out the following:

    aaaaa
    
    bbbbb
    
  2. In this solution we are careful not to use a mutable object as a default value:

    def add_to_dict(name, value, dic=None):
        if dic is None:
            dic = {}
        dic[name] = value
        return dic
    
    def test():
        dic1 = {'albert': 'cute', }
        print add_to_dict('barry', 'funny', dic1)
        print add_to_dict('charlene', 'smart', dic1)
        print add_to_dict('darryl', 'outrageous')
        print add_to_dict('eddie', 'friendly')
    
    test()
    

    If we run this script, we see:

    {'barry': 'funny', 'albert': 'cute'}
    {'barry': 'funny', 'albert': 'cute', 'charlene': 'smart'}
    {'darryl': 'outrageous'}
    {'eddie': 'friendly'}
    

    Notes:

    • It's important that the default value for the dictionary is None rather than an empty dictionary, for example ({}). Remember that the def: statement is evaluated only once, which results in a single dictionary, which would be shared by all callers that do not provide a dictionary as an argument.

6.2   Passing functions as arguments

A function, like any other object, can be passed as an argument to a function. This is due to the fact that almost all (maybe all) objects in Python are "first class objects". A first class object is one which we can:

  1. Store in a data structure (e.g. a list, a dictionary, ...).
  2. Pass to a function.
  3. Return from a function.
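
The three capabilities listed above can be sketched with a pair of small functions. All of the names here (double, square, operations, apply_to_all, choose) are made up for this illustration:

```python
def double(x):
    return x * 2

def square(x):
    return x * x

# 1. Store functions in a data structure.
operations = {'double': double, 'square': square}

# 2. Pass a function to another function.
def apply_to_all(func, values):
    result = []
    for value in values:
        result.append(func(value))
    return result

# 3. Return a function from a function.
def choose(name):
    return operations[name]

print(apply_to_all(double, [1, 2, 3]))
print(choose('square')(5))
```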

Exercises:

  1. Write a function that takes three arguments: (1) an input file, (2) an output file, and (3) a filter function:

    • Argument 1 is a file opened for reading.
    • Argument 2 is a file opened for writing.
    • Argument 3 is a function that takes a single argument (a string), performs a transformation on that string, and returns the transformed string.

    The above function should read each line in the input text file, pass that line through the filter function, then write that (possibly) transformed line to the output file.

    Now, write one or more "filter functions" that can be passed to the function described above.

Solutions:

  1. This script adds or removes comment characters to the lines of a file:

    import sys
    
    def filter(infile, outfile, filterfunc):
        for line in infile:
            line = filterfunc(line)
            outfile.write(line)
    
    def add_comment(line):
        line = '## %s' % (line, )
        return line
    
    def remove_comment(line):
        if line.startswith('## '):
            line = line[3:]
        return line
    
    def main():
        filter(sys.stdin, sys.stdout, add_comment)
    
    if __name__ == '__main__':
        main()
    

    Running this might produce something like the following (note for MS Windows users: use type instead of cat):

    $ cat tmp.txt
    line 1
    line 2
    line 3
    $ cat tmp.txt | python workbook005.py
    ## line 1
    ## line 2
    ## line 3
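
Because the filter function is just an argument, the same driver can undo the change when given remove_comment instead. The following sketch assumes in-memory file objects (StringIO) in place of sys.stdin and sys.stdout, and renames the driver to filter_lines (a hypothetical name) so that it does not hide the built-in filter function:

```python
# Runs under Python 2.6+ and Python 3.
try:
    from StringIO import StringIO   # Python 2
except ImportError:
    from io import StringIO         # Python 3

def filter_lines(infile, outfile, filterfunc):
    for line in infile:
        outfile.write(filterfunc(line))

def add_comment(line):
    return '## %s' % (line, )

def remove_comment(line):
    if line.startswith('## '):
        line = line[3:]
    return line

original = 'line 1\nline 2\n'

# First pass: add the comment characters.
commented = StringIO()
filter_lines(StringIO(original), commented, add_comment)

# Second pass: remove them again with a different filter function.
restored = StringIO()
filter_lines(StringIO(commented.getvalue()), restored, remove_comment)

print(commented.getvalue() == '## line 1\n## line 2\n')   # True
print(restored.getvalue() == original)                    # True
```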
    

6.3   Extra args and keyword args

Additional positional arguments passed to a function that are not specified in the function definition (the def statement) are collected in a tuple and bound to a parameter preceded by a single asterisk. Keyword arguments passed to a function that are not specified in the function definition are collected in a dictionary and bound to a parameter preceded by a double asterisk.

Examples:

  1. Write a function that takes one positional argument, one argument with a default value, and also extra args and keyword args.

    Solution:

    def show_args(x, y=-1, *args, **kwargs):
        print '-' * 40
        print 'x:', x
        print 'y:', y
        print 'args:', args
        print 'kwargs:', kwargs
    
    def test():
        show_args(1)
        show_args(x=2, y=3)
        show_args(y=5, x=4)
        show_args(4, 5, 6, 7, 8)
        show_args(11, y=44, a=55, b=66)
    
    test()
    

    Running this script produces the following:

    $ python workbook006.py
    ----------------------------------------
    x: 1
    y: -1
    args: ()
    kwargs: {}
    ----------------------------------------
    x: 2
    y: 3
    args: ()
    kwargs: {}
    ----------------------------------------
    x: 4
    y: 5
    args: ()
    kwargs: {}
    ----------------------------------------
    x: 4
    y: 5
    args: (6, 7, 8)
    kwargs: {}
    ----------------------------------------
    x: 11
    y: 44
    args: ()
    kwargs: {'a': 55, 'b': 66}
    

6.3.1   Order of arguments (positional, extra, and keyword args)

In a function definition, arguments must appear in the following order, from left to right:

  1. Positional (normal, plain) arguments
  2. Arguments with default values, if any
  3. Extra arguments parameter (preceded by single asterisk), if present
  4. Keyword arguments parameter (preceded by double asterisk), if present

In a function call, arguments must appear in the following order, from left to right:

  1. Positional (plain) arguments
  2. Extra arguments, if present
  3. Keyword arguments, if present
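
As a sketch of the call-side ordering, assume that "extra arguments" in a call means a sequence unpacked with a single asterisk and "keyword arguments" means a dictionary unpacked with a double asterisk. The variable names extra and options are made up, and show_args is changed here to return its arguments rather than print them:

```python
# Runs under Python 2.6+ and Python 3.

def show_args(x, y=-1, *args, **kwargs):
    return (x, y, args, kwargs)

extra = (6, 7, 8)
options = {'a': 55, 'b': 66}

# Plain positional arguments first, then the unpacked sequence,
# then the unpacked dictionary.
result = show_args(4, 5, *extra, **options)

print(result[0])    # 4
print(result[1])    # 5
print(result[2])    # (6, 7, 8)
print(result[3] == {'a': 55, 'b': 66})   # True
```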

6.4   Functions and duck-typing and polymorphism

A function can be called with arguments of different types, so long as the arguments make sense for that function.

Exercises:

  1. Implement a function that takes two arguments: a function and an object. It applies the function argument to the object.
  2. Implement a function that takes two arguments: a list of functions and an object. It applies each function in the list to the argument.

Solutions:

  1. We can pass a function as an argument to a function:

    def fancy(obj):
        print 'fancy fancy -- %s -- fancy fancy' % (obj, )
    
    def plain(obj):
        print 'plain -- %s -- plain' % (obj, )
    
    def show(func, obj):
        func(obj)
    
    def main():
        a = {'aa': 11, 'bb': 22, }
        show(fancy, a)
        show(plain, a)
    
    if __name__ == '__main__':
        main()
    
  2. We can also put functions (function objects) in a data structure (for example, a list), and then pass that data structure to a function:

    def fancy(obj):
        print 'fancy fancy -- %s -- fancy fancy' % (obj, )
    
    def plain(obj):
        print 'plain -- %s -- plain' % (obj, )
    
    Func_list = [fancy, plain, ]
    
    def show(funcs, obj):
        for func in funcs:
            func(obj)
    
    def main():
        a = {'aa': 11, 'bb': 22, }
        show(Func_list, a)
    
    if __name__ == '__main__':
        main()
    

Notice that Python supports polymorphism (with or) without inheritance. This type of polymorphism is enabled by what is called duck-typing. For more on this see: Duck typing -- http://en.wikipedia.org/wiki/Duck_typing at Wikipedia.
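
Here is a small sketch of duck-typing with built-in types. The function total_length (a hypothetical name) never checks the types of its arguments; it only requires that each item support len():

```python
# Runs under Python 2.6+ and Python 3.

def total_length(items):
    # Works for any iterable whose elements support len().
    return sum(len(item) for item in items)

print(total_length(['ab', 'cde']))           # strings: 5
print(total_length([[1, 2], (3, ), {}]))     # list, tuple, dict: 3
```

None of these types share a superclass other than object; what matters is only that each one behaves as the function expects.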

6.5   Recursive functions

A recursive function is a function that calls itself.

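A recursive function must have a limiting condition, or else it will recurse without end (in practice, until Python raises an error because the maximum recursion depth has been exceeded).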

Each recursive call consumes space on the function call stack. Therefore, the number of recursions must have some reasonable upper bound.
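
The following sketch contrasts a function that has a limiting condition with one that does not. The names countdown and runaway are made up for illustration. Python reports the exhausted call stack as a RuntimeError (in newer versions, as a subclass of it):

```python
# Runs under Python 2.6+ and Python 3.
import sys

def countdown(n):
    # Limiting condition: stop recursing when n reaches zero.
    if n <= 0:
        return 0
    return 1 + countdown(n - 1)

def runaway(n):
    # No limiting condition: every call makes another call.
    return runaway(n + 1)

print(countdown(50))                    # 50
print(sys.getrecursionlimit() > 50)     # True with the default limit

try:
    runaway(0)
    hit_limit = False
except RuntimeError:
    hit_limit = True
print(hit_limit)                        # True
```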

Exercises:

  1. Write a recursive function that prints information about each node in the following tree-structure data structure:

    Tree = {
        'name': 'animals',
        'left_branch': {
            'name': 'birds',
            'left_branch': {
                'name': 'seed eaters',
                'left_branch': {
                    'name': 'house finch',
                    'left_branch': None,
                    'right_branch': None,
                },
                'right_branch': {
                    'name': 'white crowned sparrow',
                    'left_branch': None,
                    'right_branch': None,
                },
            },
            'right_branch': {
                'name': 'insect eaters',
                'left_branch': {
                    'name': 'hermit thrush',
                    'left_branch': None,
                    'right_branch': None,
                },
                'right_branch': {
                    'name': 'black headed phoebe',
                    'left_branch': None,
                    'right_branch': None,
                },
            },
        },
        'right_branch': None,
    }
    

Solutions:

  1. We write a recursive function to walk the whole tree. The recursive function calls itself to process each child of a node in the tree:

    Tree = {
        'name': 'animals',
        'left_branch': {
            'name': 'birds',
            'left_branch': {
                'name': 'seed eaters',
                'left_branch': {
                    'name': 'house finch',
                    'left_branch': None,
                    'right_branch': None,
                },
                'right_branch': {
                    'name': 'white crowned sparrow',
                    'left_branch': None,
                    'right_branch': None,
                },
            },
            'right_branch': {
                'name': 'insect eaters',
                'left_branch': {
                    'name': 'hermit thrush',
                    'left_branch': None,
                    'right_branch': None,
                },
                'right_branch': {
                    'name': 'black headed phoebe',
                    'left_branch': None,
                    'right_branch': None,
                },
            },
        },
        'right_branch': None,
    }
    
    Indents = ['    ' * idx for idx in range(10)]
    
    def walk_and_show(node, level=0):
        if node is None:
            return
        print '%sname: %s' % (Indents[level], node['name'], )
        level += 1
        walk_and_show(node['left_branch'], level)
        walk_and_show(node['right_branch'], level)
    
    def test():
        walk_and_show(Tree)
    
    if __name__ == '__main__':
        test()
    

    Notes:

    • Later, you will learn how to create equivalent data structures using classes and OOP (object-oriented programming). For more on that see Recursive calls to methods in this document.

6.6   Generators and iterators

The "iterator protocol" defines what an iterator object must do in order to be usable in an "iterator context" such as a for statement. The iterator protocol is described in the standard library reference: Iterator Types -- http://docs.python.org/lib/typeiter.html

An easy way to define an object that obeys the iterator protocol is to write a generator function. A generator function is a function that contains one or more yield statements. If a function contains at least one yield statement, then that function, when called, returns a generator iterator, which is an object that obeys the iterator protocol, i.e. it's an iterator object.
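
A minimal sketch of a generator function and of the iterator it returns. The name count_up_to is made up for illustration:

```python
# Runs under Python 2.6+ and Python 3.

def count_up_to(limit):
    n = 1
    while n <= limit:
        yield n
        n += 1

gen = count_up_to(3)

# Calling the function did not run its body; it returned an iterator.
print(iter(gen) is gen)     # True: an iterator returns itself from iter()
print(next(gen))            # 1
print(list(gen))            # [2, 3] -- the remaining values
```

A for statement does the same thing behind the scenes: it calls iter() on the object, then asks for the next value repeatedly until the iterator is exhausted.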

Note that in recent versions of Python, yield is an expression. This enables the consumer to communicate back with the producer (the generator iterator). For more on this, see PEP: 342 Coroutines via Enhanced Generators - http://www.python.org/dev/peps/pep-0342/.
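
As a small sketch of yield used as an expression, the generator below (running_total, a hypothetical name) receives values from its consumer through the send method described in PEP 342:

```python
# Runs under Python 2.6+ and Python 3.

def running_total():
    total = 0
    while True:
        # The value passed to send() becomes the result of the
        # yield expression; the current total is sent back out.
        amount = yield total
        total += amount

acc = running_total()
next(acc)               # advance to the first yield (which yields 0)
print(acc.send(10))     # 10
print(acc.send(5))      # 15
acc.close()
```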

Exercises:

  1. Implement a generator function -- The generator produced should yield all values from a list/iterable that satisfy a predicate. It should apply the transforms before returning each value. The function takes these arguments:

    1. values -- A list of values. Actually, it could be any iterable.

    2. predicate -- A function that takes a single argument, performs a test on that value, and returns True or False.

    3. transforms -- (optional) A list of functions. Apply each function in this list, in order, and return the resulting value. So, for example, if the function is called like this:

      result = filter_and_transform([11, 22], p, [f, g])
      

      then the resulting generator might return:

      g(f(11))
      
  2. Implement a generator function that takes a list of URLs as its argument and generates the contents of each Web page, one by one (that is, it produces a sequence of strings, the HTML page contents).

Solutions:

  1. Here is the implementation of a function which contains yield, and, therefore, produces a generator:

    #!/usr/bin/env python
    """
    filter_and_transform
    
    filter_and_transform(content, test_func, transforms=None)
    
    Return a generator that yields the items from content that
    satisfy test_func, after applying the functions in transforms.
    
    Arguments:
    
       1. ``content`` -- A list of values.  Actually, any iterable.
    
       2. ``test_func`` -- A function that takes a single argument,
          performs a test on that value, and returns True or False.
    
       3. ``transforms`` -- (optional) A list of functions, or a single
          function.  Apply each function in this list and return the
          resulting value.  So, for example, if the function is called
          like this::
    
           result = filter_and_transform([11, 22], p, [f, g])
    
          then the resulting generator might return::
    
              g(f(11))
    """
    
    def filter_and_transform(content, test_func, transforms=None):
        for x in content:
            if test_func(x):
                if transforms is None:
                    yield x
                elif isiterable(transforms):
                    for func in transforms:
                        x = func(x)
                    yield x
                else:
                    yield transforms(x)
    
    def isiterable(x):
        flag = True
        try:
            x = iter(x)
        except TypeError, exp:
            flag = False
        return flag
    
    def iseven(n):
        return n % 2 == 0
    
    def f(n):
        return n * 2
    
    def g(n):
        return n ** 2
    
    def test():
        data1 = [11, 22, 33, 44, 55, 66, 77, ]
        for val in filter_and_transform(data1, iseven, f):
            print 'val: %d' % (val, )
        print '-' * 40
        for val in filter_and_transform(data1, iseven, [f, g]):
            print 'val: %d' % (val, )
        print '-' * 40
        for val in filter_and_transform(data1, iseven):
            print 'val: %d' % (val, )
    
    if __name__ == '__main__':
        test()
    

    Notes:

    • Because function filter_and_transform contains yield, when called, it returns an iterator object, which we can use in a for statement.
    • The second parameter of function filter_and_transform takes any function which takes a single argument and returns True or False. This is an example of polymorphism and "duck typing" (see Duck Typing -- http://en.wikipedia.org/wiki/Duck_typing). An analogous claim can be made about the third parameter.
  2. The following function uses the urllib module and the yield statement to generate the contents of a sequence of Web pages:

    import urllib
    
    Urls = [
        'http://yahoo.com',
        'http://python.org',
        'http://gimp.org',    # The GNU image manipulation program
        ]
    
    def walk(url_list):
        for url in url_list:
            f = urllib.urlopen(url)
            stuff = f.read()
            f.close()
            yield stuff
    
    def test():
        for x in walk(Urls):
            print 'length: %d' % (len(x), )
    
    if __name__ == '__main__':
        test()
    

    When I run this, I see:

    $ python generator_example.py
    length: 9554
    length: 16748
    length: 11487
    

7   Object-oriented programming and classes

Classes provide Python's way to define new data types and to do OOP (object-oriented programming).

If you have made it this far, you have already used lots of objects. You have been a "consumer" of objects and their services. Now, you will learn how to define and implement new kinds of objects. You will become a "producer" of objects. You will define new classes and you will implement the capabilities (methods) of each new class.

A class is defined with the class statement. The first line of a class statement is the header of a compound statement (it ends with a colon); it specifies the name of the class being defined and an optional superclass. The body of the class statement contains statements, most importantly def statements that define the methods that can be called on instances of the class.

Exercises:

  1. Define a class with one method show. That method should print out "Hello". Then, create an instance of your class, and call the show method.

Solutions:

  1. A simple instance method can have the self parameter and no others:

    class Demo(object):
        def show(self):
            print 'Hello'
    
    def test():
        a = Demo()
        a.show()
    
    test()
    

    Notes:

    • Notice that we use object as a superclass, because we want to define a "new-style" class and because there is no other class that we want as a superclass. See the following for more information on new-style classes: New-style Classes -- http://www.python.org/doc/newstyle/.
    • In Python, we create an instance of a class by calling the class, that is, we apply the function call operator (parentheses) to the class.

7.1   The constructor

A class can define methods with special names. You have seen some of these before. These names begin and end with a double underscore.

One important special name is __init__. It's the constructor for a class. It is called each time an instance of the class is created. Implementing this method in a class gives us a chance to initialize each instance of our class.

Exercises:

  1. Implement a class named Plant that has a constructor which initializes two instance variables: name and size. Also implement a method named show that prints out the values of these instance variables.
  2. Implement a class named Node that has two instance variables: data and children, where data is any arbitrary object and children is a list of child Nodes. Also implement a method named show that recursively displays the nodes in a "tree".

Solutions:

  1. The constructor for a class is a method with the special name __init__:

    class Plant(object):
        def __init__(self, name, size):
            self.name = name
            self.size = size
        def show(self):
            print 'name: "%s"  size: %d' % (self.name, self.size, )
    
    def test():
        p1 = Plant('Eggplant', 25)
        p2 = Plant('Tomato', 36)
        plants = [p1, p2, ]
        for plant in plants:
            plant.show()
    
    test()
    

    Notes:

    • Our constructor takes two arguments: name and size. It saves those two values as instance variables, that is, in attributes of the instance.
    • The show() method prints out the value of those two instance variables.
  2. It is a good idea to initialize all instance variables in the constructor. That enables someone reading our code to learn about all the instance variables of a class by looking in a single location:

    # simple_node.py
    
    Indents = ['    ' * n for n in range(10)]
    
    class Node(object):
        def __init__(self, name=None, children=None):
            self.name = name
            if children is None:
                self.children = []
            else:
                self.children = children
        def show_name(self, indent):
            print '%sname: "%s"' % (Indents[indent], self.name, )
        def show(self, indent=0):
            self.show_name(indent)
            indent += 1
            for child in self.children:
                child.show(indent)
    
    def test():
        n1 = Node('N1')
        n2 = Node('N2')
        n3 = Node('N3')
        n4 = Node('N4')
        n5 = Node('N5', [n1, n2,])
        n6 = Node('N6', [n3, n4,])
        n7 = Node('N7', [n5, n6,])
        n7.show()
    
    if __name__ == '__main__':
        test()
    

    Notes:

    • Notice that we do not use the constructor for a list ([]) as a default value for the children parameter of the constructor. A list is mutable and would be created only once (when the class statement is executed) and would be shared.

7.2   Inheritance -- Implementing a subclass

A subclass extends or specializes a superclass by adding methods that the superclass does not have and by overriding methods (with the same name) that already exist in the superclass.

Exercises:

  1. Extend your Node exercise above by adding two additional subclasses of the Node class, one named Plant and the other named Animal. The Plant class also has a height instance variable and the Animal class also has a color instance variable.

Solutions:

  1. We can import our previous Node script, then implement classes that have the Node class as a superclass:

    from simple_node import Node, Indents
    
    class Plant(Node):
        def __init__(self, name, height=-1, children=None):
            Node.__init__(self, name, children)
            self.height = height
        def show(self, indent=0):
            self.show_name(indent)
            print '%sheight: %s' % (Indents[indent], self.height, )
            indent += 1
            for child in self.children:
                child.show(indent)
    
    class Animal(Node):
        def __init__(self, name, color='no color', children=None):
            Node.__init__(self, name, children)
            self.color = color
        def show(self, indent=0):
            self.show_name(indent)
            print '%scolor: "%s"' % (Indents[indent], self.color, )
            indent += 1
            for child in self.children:
                child.show(indent)
    
    def test():
        n1 = Animal('scrubjay', 'gray blue')
        n2 = Animal('raven', 'black')
        n3 = Animal('american kestrel', 'brown')
        n4 = Animal('red-shouldered hawk', 'brown and gray')
        n5 = Animal('corvid', 'none', [n1, n2,])
        n6 = Animal('raptor', children=[n3, n4,])
        n7a = Animal('bird', children=[n5, n6,])
        n1 = Plant('valley oak', 50)
        n2 = Plant('canyon live oak', 40)
        n3 = Plant('jeffery pine', 120)
        n4 = Plant('ponderosa pine', 140)
        n5 = Plant('oak', children=[n1, n2,])
        n6 = Plant('conifer', children=[n3, n4,])
        n7b = Plant('tree', children=[n5, n6,])
        n8 = Node('birds and trees', [n7a, n7b,])
        n8.show()
    
    if __name__ == '__main__':
        test()
    

    Notes:

    • The show method in class Plant calls the show_name method in its superclass using self.show_name(...). Python searches up the inheritance tree to find the show_name method in class Node.
    • The constructors (__init__) in classes Plant and Animal each call the constructor in the superclass by using the name of the superclass. Why the difference? Because if the Plant class, for example, used self.__init__(...), it would be calling the __init__ in the Plant class itself. So, it bypasses itself by referencing the constructor in the superclass directly.
    • This exercise also demonstrates "polymorphism" -- The show method is called a number of times, but which implementation executes depends on which instance it is called on. Calling the show method on an instance of class Plant results in a call to Plant.show. Calling the show method on an instance of class Animal results in a call to Animal.show. And so on. It is important that each show method takes the correct number of arguments.
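
For new-style classes, the built-in super function offers another way to reach the superclass constructor without calling self.__init__. The solution above does not use it; the sketch below is only an alternative spelling, with a cut-down Node class standing in for the one in simple_node.py:

```python
# Runs under Python 2.6+ and Python 3.

class Node(object):
    def __init__(self, name=None, children=None):
        self.name = name
        if children is None:
            self.children = []
        else:
            self.children = children

class Plant(Node):
    def __init__(self, name, height=-1, children=None):
        # For this class, equivalent to:
        #     Node.__init__(self, name, children)
        super(Plant, self).__init__(name, children)
        self.height = height

oak = Plant('valley oak', 50)
print('%s %d %r' % (oak.name, oak.height, oak.children))   # valley oak 50 []
```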

7.3   Classes and polymorphism

Python also supports class-based polymorphism, which was, by the way, demonstrated in the previous example.

Exercises:

  1. Write three classes, each of which implements a show() method that takes one argument, a string. The show method should print out the name of the class and the message. Then create a list of instances and call the show() method on each object in the list.

Solution:

  1. We implement three simple classes and then create a list of instances of these classes:

    class A(object):
        def show(self, msg):
            print 'class A -- msg: "%s"' % (msg, )
    
    class B(object):
        def show(self, msg):
            print 'class B -- msg: "%s"' % (msg, )
    
    class C(object):
        def show(self, msg):
            print 'class C -- msg: "%s"' % (msg, )
    
    def test():
        objs = [A(), B(), C(), A(), ]
        for idx, obj in enumerate(objs):
            msg = 'message # %d' % (idx + 1, )
            obj.show(msg)
    
    if __name__ == '__main__':
        test()
    

    Notes:

    • We can call the show() method in any object in the list objs as long as we pass in a single parameter, that is, as long as we obey the requirements of duck-typing. We can do this because all objects in that list implement a show() method.
    • In a statically typed language, that is, a language where the type is (also) attached to the variable, all the instances in this example would typically have to descend from a common superclass (or implement a common interface) that declares a show() method. Python does not impose this restriction. And, because variables are not typed in Python, perhaps such a restriction would not even be possible.
    • Notice that this example of polymorphism works even though these three classes (A, B, and C) are not related (for example, in a class hierarchy). All that is required for polymorphism to work in Python is for the method names to be the same and the arguments to be compatible.

7.4   Recursive calls to methods

A method in a class can recursively call itself. This is very similar to the way in which we implemented recursive functions -- see: Recursive functions.

Exercises:

  1. Re-implement the binary tree of animals and birds described in Recursive functions, but this time, use a class to represent each node in the tree.
  2. Solve the same problem, but this time implement a tree in which each node can have any number of children (rather than exactly 2 children).

Solutions:

  1. We implement a class with three instance variables: (1) name, (2) left branch, and (3) right branch. Then, we implement a show() method that displays the name and calls itself to show the children in each sub-tree:

    Indents = ['    ' * idx for idx in range(10)]
    
    class AnimalNode(object):
    
        def __init__(self, name, left_branch=None, right_branch=None):
            self.name = name
            self.left_branch = left_branch
            self.right_branch = right_branch
    
        def show(self, level=0):
            print '%sname: %s' % (Indents[level], self.name, )
            level += 1
            if self.left_branch is not None:
                self.left_branch.show(level)
            if self.right_branch is not None:
                self.right_branch.show(level)
    
    Tree = AnimalNode('animals',
        AnimalNode('birds',
            AnimalNode('seed eaters',
                AnimalNode('house finch'),
                AnimalNode('white crowned sparrow'),
            ),
            AnimalNode('insect eaters',
                AnimalNode('hermit thrush'),
                AnimalNode('black headed phoebe'),
            ),
        ),
        None,
    )
    
    def test():
        Tree.show()
    
    if __name__ == '__main__':
        test()
    
  2. Instead of using a left branch and a right branch, in this solution we use a list to represent the children of a node:

    class AnimalNode(object):
        def __init__(self, data, children=None):
            self.data = data
            if children is None:
                self.children = []
            else:
                self.children = children
    
        def show(self, level=''):
            print '%sdata: %s' % (level, self.data, )
            level += '    '
            for child in self.children:
                child.show(level)
    
    Tree = AnimalNode('animals', [
        AnimalNode('birds', [
            AnimalNode('seed eaters', [
                AnimalNode('house finch'),
                AnimalNode('white crowned sparrow'),
                AnimalNode('lesser gold finch'),
            ]),
            AnimalNode('insect eaters', [
                AnimalNode('hermit thrush'),
                AnimalNode('black headed phoebe'),
            ]),
        ])
    ])
    
    def test():
        Tree.show()
    
    if __name__ == '__main__':
        test()
    

    Notes:

    • We represent the children of a node as a list. Each node "has-a" list of children.
    • Notice that because a list is mutable, we do not use an empty list ([]) as the default value in the method header. Instead, we use None, then construct an empty list in the body of the method if necessary. See section Optional arguments and default values for more on this.
    • We (recursively) call the show method for each node in the children list. Since a node which has no children (a leaf node) will have an empty children list, this provides a limit condition for our recursion.

7.5   Class variables, class methods, and static methods

A class variable is one whose single value is shared by all instances of the class and, in fact, is shared by all who have access to the class (object).
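
A short sketch of that sharing, using a made-up class named Counter. It also shows one thing to watch for: assigning to the name through an instance creates a separate instance variable rather than changing the shared value:

```python
# Runs under Python 2.6+ and Python 3.

class Counter(object):
    count = 0   # class variable: a single value stored on the class

a = Counter()
b = Counter()

Counter.count += 1
print(a.count)          # 1 -- both instances see the new value
print(b.count)          # 1

# Assigning through an instance creates an instance variable that
# hides the class variable for that instance only.
a.count = 99
print(Counter.count)    # 1
print(b.count)          # 1
```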

"Normal" methods are instance methods. An instance method receives the instance as its first argument. An instance method is defined by using the def statement in the body of a class statement.

A class method receives the class as its first argument. A class method is defined by defining a normal/instance method, then using the classmethod built-in function. For example:

class ASimpleClass(object):
    description = 'a simple class'
    def show_class(cls, msg):
        print '%s: %s' % (cls.description , msg, )
    show_class = classmethod(show_class)

A static method does not receive anything special as its first argument. A static method is defined by defining a normal/instance method, then using the staticmethod built-in function. For example:

class ASimpleClass(object):
    description = 'a simple class'
    def show_class(msg):
        print '%s: %s' % (ASimpleClass.description , msg, )
    show_class = staticmethod(show_class)

In effect, both class methods and static methods are defined by creating a normal (instance) method, then creating a wrapper object (a class method or static method) using the classmethod or staticmethod built-in function.
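
The difference in what each kind of method receives shows up most clearly with a subclass. In this sketch the classes Base and Derived and their methods are made up for illustration, and the methods return strings instead of printing them:

```python
# Runs under Python 2.6+ and Python 3.

class Base(object):
    description = 'base'

    def describe(cls):
        return cls.description
    describe = classmethod(describe)

    def helper(msg):
        return 'helper: %s' % (msg, )
    helper = staticmethod(helper)

class Derived(Base):
    description = 'derived'

print(Base.describe())          # base
print(Derived.describe())       # derived -- cls is the class used in the call
print(Base().describe())        # base -- also callable through an instance
print(Derived.helper('hi'))     # helper: hi -- no implicit first argument
```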

Exercises:

  1. Implement a class that keeps a running total of the number of instances created.
  2. Implement another solution to the same problem (a class that keeps a running total of the number of instances), but this time use a static method instead of a class method.

Solutions:

  1. We use a class variable named instance_count, rather than an instance variable, to keep a running total of instances. Then, we increment that variable each time an instance is created:

    class CountInstances(object):
    
        instance_count = 0
    
        def __init__(self, name='-no name-'):
            self.name = name
            CountInstances.instance_count += 1
    
        def show(self):
            print 'name: "%s"' % (self.name, )
    
        def show_instance_count(cls):
            print 'instance count: %d' % (cls.instance_count, )
        show_instance_count = classmethod(show_instance_count)
    
    
    def test():
        instances = []
        instances.append(CountInstances('apple'))
        instances.append(CountInstances('banana'))
        instances.append(CountInstances('cherry'))
        instances.append(CountInstances())
        for instance in instances:
            instance.show()
        CountInstances.show_instance_count()
    
    
    if __name__ == '__main__':
        test()
    

    Notes:

    • When we run this script, it prints out the following:

      name: "apple"
      name: "banana"
      name: "cherry"
      name: "-no name-"
      instance count: 4
      
    • The call to the classmethod built-in function effectively wraps the show_instance_count method in a class method, that is, in a method that takes a class object as its first argument rather than an instance object. To read more about classmethod, go to Built-in Functions -- http://docs.python.org/lib/built-in-funcs.html and search for "classmethod".

  2. A static method takes neither an instance (self) nor a class as its first parameter. And, a static method is created with the staticmethod() built-in function (rather than with the classmethod() built-in):

    class CountInstances(object):
    
        instance_count = 0
    
        def __init__(self, name='-no name-'):
            self.name = name
            CountInstances.instance_count += 1
    
        def show(self):
            print 'name: "%s"' % (self.name, )
    
        def show_instance_count():
            print 'instance count: %d' % (
                CountInstances.instance_count, )
        show_instance_count = staticmethod(show_instance_count)
    
    def test():
        instances = []
        instances.append(CountInstances('apple'))
        instances.append(CountInstances('banana'))
        instances.append(CountInstances('cherry'))
        instances.append(CountInstances())
        for instance in instances:
            instance.show()
        CountInstances.show_instance_count()
    
    if __name__ == '__main__':
        test()
    

7.5.1   Decorators for classmethod and staticmethod

A decorator enables us to do what we did in the previous example with a somewhat simpler syntax.

For simple cases, the decorator syntax enables us to do this:

@functionwrapper
def method1(self):
    o
    o
    o

instead of this:

def method1(self):
    o
    o
    o
method1 = functionwrapper(method1)

So, we can write this:

@classmethod
def method1(cls):
    o
    o
    o

instead of this:

def method1(cls):
    o
    o
    o
method1 = classmethod(method1)
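
The same syntax works for staticmethod. Here is a minimal sketch, a cut-down variant of the earlier CountInstances class in which the method (given the hypothetical name get_instance_count) returns the count instead of printing it:

```python
# Runs under Python 2.6+ and Python 3.

class CountInstances(object):

    instance_count = 0

    def __init__(self):
        CountInstances.instance_count += 1

    @staticmethod
    def get_instance_count():
        # No self and no cls: the class is named explicitly.
        return CountInstances.instance_count

CountInstances()
CountInstances()
print(CountInstances.get_instance_count())    # 2
```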

Exercises:

  1. Implement the CountInstances example above, but use a decorator rather than the explicit call to classmethod.

Solutions:

  1. A decorator is an easier and cleaner way to define a class method (or a static method):

    class CountInstances(object):
    
        instance_count = 0
    
        def __init__(self, name='-no name-'):
            self.name = name
            CountInstances.instance_count += 1
    
        def show(self):
            print 'name: "%s"' % (self.name, )
    
        @classmethod
        def show_instance_count(cls):
            print 'instance count: %d' % (cls.instance_count, )
        # Note that the following line has been replaced by
        #   the classmethod decorator, above.
        # show_instance_count = classmethod(show_instance_count)
    
    def test():
        instances = []
        instances.append(CountInstances('apple'))
        instances.append(CountInstances('banana'))
        instances.append(CountInstances('cherry'))
        instances.append(CountInstances())
        for instance in instances:
            instance.show()
        CountInstances.show_instance_count()
    
    if __name__ == '__main__':
        test()
    

8   Additional and Advanced Topics

8.1   Decorators and how to implement them

Decorators can be used to "wrap" a function with another function.

When implementing a decorator, it is helpful to remember that the following decorator application:

@dec
def func(arg1, arg2):
    pass

is equivalent to:

def func(arg1, arg2):
    pass
func = dec(func)

Therefore, to implement a decorator, we write a function that returns a function object, since we replace the value originally bound to the function with this new function object. It may be helpful to take the view that we are creating a function that is a wrapper for the original function.
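
The equivalence can be checked directly. In this minimal sketch, the shout decorator and the two greet functions are hypothetical names chosen for illustration. One function is decorated with the @ syntax and the other is rebound by hand, and both end up behaving the same way.

```python
def shout(func):
    # Return a new function object that wraps the original one.
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper


@shout
def greet_a(name):
    return 'hello, %s' % (name, )


def greet_b(name):
    return 'hello, %s' % (name, )
greet_b = shout(greet_b)  # same effect as the @shout line above


print(greet_a('ann'))
print(greet_b('ann'))
```

In both cases the name is rebound to the wrapper function returned by shout, which is why the decorator must return a callable.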

Exercises:

  1. Write a decorator that writes a message before and after executing a function.

Solutions:

  1. A function that contains and returns an inner function can be used to wrap a function:

    def trace(func):
        def inner(*args, **kwargs):
            print '>>'
            retval = func(*args, **kwargs)
            print '<<'
            # Pass the wrapped function's return value back to the caller.
            return retval
        return inner
    
    @trace
    def func1(x, y):
        print 'x:', x, 'y:', y
        func2((x, y))
    
    @trace
    def func2(content):
        print 'content:', content
    
    def test():
        func1('aa', 'bb')
    
    test()
    

    Notes:

    • Your inner function can use *args and **kwargs to enable it to call functions with any number of arguments.
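
One side effect of wrapping is that the decorated name now refers to the inner function, so its __name__ and docstring are those of the inner function rather than the original. A hedged sketch of one common remedy follows. It uses functools.wraps from the standard library, which the solution above does not use; the traced and add names are hypothetical.

```python
import functools


def traced(func):
    @functools.wraps(func)  # copy __name__, __doc__, etc. from func
    def inner(*args, **kwargs):
        print('>> %s' % (func.__name__, ))
        retval = func(*args, **kwargs)
        print('<< %s' % (func.__name__, ))
        return retval  # pass the wrapped function's result through
    return inner


@traced
def add(x, y=0):
    """Return the sum of x and y."""
    return x + y


print(add(2, y=3))
print(add.__name__)
```

Because inner accepts *args and **kwargs, the same decorator works for positional and keyword calls alike.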

8.1.1   Decorators with arguments

Decorators can also take arguments.

The following decorator with arguments:

@dec(argA, argB)
def func(arg1, arg2):
    pass

is equivalent to:

def func(arg1, arg2):
    pass
func = dec(argA, argB)(func)

Because dec is first called with the decorator's arguments, and the result of that call is then called with the decorated function, you may find it useful to implement a decorator with arguments using a function inside a function inside a function.
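
The three levels can be labeled by when each one runs. This is a minimal sketch under assumed names (repeat, decorate, wrapper, and the greet functions are all hypothetical), showing the @ form next to the explicit two-step call.

```python
def repeat(count):
    # Level 1: called with the decorator's own arguments.
    def decorate(func):
        # Level 2: called with the function being decorated.
        def wrapper(*args, **kwargs):
            # Level 3: called in place of the original function.
            return [func(*args, **kwargs) for _ in range(count)]
        return wrapper
    return decorate


@repeat(3)
def greet(name):
    return 'hi %s' % (name, )


def greet_plain(name):
    return 'hi %s' % (name, )
greet_explicit = repeat(3)(greet_plain)  # the explicit two-step form

print(greet('bo'))
print(greet_explicit('bo'))
```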

Exercises:

  1. Write and test a decorator that takes one argument. The decorator prints a message along with the value of the argument before and after entering the decorated function.

Solutions:

  1. Implement this decorator with arguments with a function containing a nested function which in turn contains a nested function:

    def trace(msg):
        def inner1(func):
            def inner2(*args, **kwargs):
                print '>> [%s]' % (msg, )
                retval = func(*args, **kwargs)
                print '<< [%s]' % (msg, )
                return retval
            return inner2
        return inner1
    
    @trace('tracing func1')
    def func1(x, y):
        print 'x:', x, 'y:', y
        result = func2((x, y))
        return result
    
    @trace('tracing func2')
    def func2(content):
        print 'content:', content
        return content * 3
    
    def test():
        result = func1('aa', 'bb')
        print 'result:', result
    
    test()
    

8.1.2   Stacked decorators

Decorators can be "stacked".

The following stacked decorators:

@dec2
@dec1
def func(arg1, arg2, ...):
    pass

are equivalent to:

def func(arg1, arg2, ...):
    pass
func = dec2(dec1(func))
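
The order in which stacked decorators take effect can be observed by recording events in a list. In this sketch the tag helper and the calls list are hypothetical. The decorator nearest the function (dec1) is applied first, so its wrapper is innermost, and the outer decorator (dec2) is entered first at call time.

```python
calls = []


def tag(label):
    # Hypothetical helper: builds a decorator that records its label
    # before and after calling the wrapped function.
    def decorate(func):
        def wrapper(*args, **kwargs):
            calls.append('enter ' + label)
            retval = func(*args, **kwargs)
            calls.append('leave ' + label)
            return retval
        return wrapper
    return decorate


@tag('dec2')
@tag('dec1')
def func(x):
    calls.append('func')
    return x * 2


result = func(21)
print(result)
print(calls)
```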

Exercises:

  1. Implement a decorator (as above) that traces calls to a decorated function. Then "stack" that with another decorator that prints a horizontal line of dashes before and after calling the function.
  2. Modify your solution to the above exercise so that the decorator that prints the horizontal line takes one argument: a character (or characters) that can be repeated to produce a horizontal line/separator.

Solutions:

  1. Reuse your tracing function from the previous exercise, then write a simple decorator that prints a row of dashes:

    def trace(msg):
        def inner1(func):
            def inner2(*args, **kwargs):
                print '>> [%s]' % (msg, )
                retval = func(*args, **kwargs)
                print '<< [%s]' % (msg, )
                return retval
            return inner2
        return inner1
    
    def horizontal_line(func):
        def inner(*args, **kwargs):
            print '-' * 50
            retval = func(*args, **kwargs)
            print '-' * 50
            return retval
        return inner
    
    
    @trace('tracing func1')
    def func1(x, y):
        print 'x:', x, 'y:', y
        result = func2((x, y))
        return result
    
    @horizontal_line
    @trace('tracing func2')
    def func2(content):
        print 'content:', content
        return content * 3
    
    def test():
        result = func1('aa', 'bb')
        print 'result:', result
    
    test()
    
  2. Once again, a decorator with arguments can be implemented with a function nested inside a function which is nested inside a function. This remains the same whether the decorator is used as a stacked decorator or not. Here is a solution:

    def trace(msg):
        def inner1(func):
            def inner2(*args, **kwargs):
                print '>> [%s]' % (msg, )
                retval = func(*args, **kwargs)
                print '<< [%s]' % (msg, )
                return retval
            return inner2
        return inner1
    
    def horizontal_line(line_chr):
        def inner1(func):
            def inner2(*args, **kwargs):
                print line_chr * 15
                retval = func(*args, **kwargs)
                print line_chr * 15
                return retval
            return inner2
        return inner1
    
    @trace('tracing func1')
    def func1(x, y):
        print 'x:', x, 'y:', y
        result = func2((x, y))
        return result
    
    @horizontal_line('<**>')
    @trace('tracing func2')
    def func2(content):
        print 'content:', content
        return content * 3
    
    def test():
        result = func1('aa', 'bb')
        print 'result:', result
    
    test()
    

8.1.3   More help with decorators

There is more about decorators here:

8.2   Iterables

8.2.1   A few preliminaries on Iterables

Definition: iterable (adjective) -- that which can be iterated over.

A good test of whether something is iterable is whether it can be used in a for: statement. For example, if we can write for item in X:, then X is iterable. Here is another simple test:

def isiterable(x):
    try:
        iter(x)
    except TypeError:
        return False
    return True

Some kinds of iterables:

  • Containers -- We can iterate over lists, tuples, dictionaries, sets, strings, and other containers.
  • Some built-in (non-container) types -- Examples:
  • Instances of classes that obey the iterator protocol. For a description of the iterator protocol, see Iterator Types -- http://docs.python.org/lib/typeiter.html. Hint: Type dir(obj) and look for "__iter__" and "next".
  • Generators -- An object returned by any function or method that contains yield.
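
The test function can be tried against several of the kinds listed above. This is a small sketch; the countdown generator function is a hypothetical example, and isiterable is repeated here only so that the block is self-contained.

```python
def isiterable(x):
    try:
        iter(x)
    except TypeError:
        return False
    return True


def countdown(n):
    # Contains yield, so calling it returns a generator object.
    while n > 0:
        yield n
        n -= 1


print(isiterable([1, 2, 3]))
print(isiterable('abc'))
print(isiterable({'a': 1}))
print(isiterable(countdown(3)))
print(isiterable(42))
print(list(countdown(3)))
```

Containers and the generator pass the test; the integer does not, because iter() raises TypeError for it.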

Exercises:

  1. Implement a class whose instances are iterable. The constructor takes a list of URLs as its argument. An instance of this class, when iterated over, generates the content of the Web page at each address in turn.

Solutions:

  1. We implement a class that has __iter__() and next() methods:

    import urllib
    
    class WebPages(object):
        def __init__(self, urls):
            self.urls = urls
            self.current_index = 0
        def __iter__(self):
            self.current_index = 0
            return self
        def next(self):
            if self.current_index >= len(self.urls):
                raise StopIteration
            url = self.urls[self.current_index]
            self.current_index += 1
            f = urllib.urlopen(url)
            content = f.read()
            f.close()
            return content
    
    def test():
        urls = [
            'http://www.python.org',
            'http://en.wikipedia.org/',
            'http://en.wikipedia.org/wiki/Python_(programming_language)',
            ]
        pages = WebPages(urls)
        for page in pages:
            print 'length: %d' % (len(page), )
        pages = WebPages(urls)
        print '-' * 50
        page = pages.next()
        print 'length: %d' % (len(page), )
        page = pages.next()
        print 'length: %d' % (len(page), )
        page = pages.next()
        print 'length: %d' % (len(page), )
        # There are only three URLs, so this fourth call raises StopIteration.
        page = pages.next()
        print 'length: %d' % (len(page), )
    
    test()
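
The same protocol can be tried without network access. The following sketch is hypothetical: the Squares class is not part of the exercise. It also defines __next__ as an alias, on the assumption that you may want the class to work under Python 3, where the method is named __next__ rather than next.

```python
class Squares(object):
    """Iterate over the squares of the given numbers."""

    def __init__(self, numbers):
        self.numbers = numbers
        self.current_index = 0

    def __iter__(self):
        self.current_index = 0
        return self

    def next(self):
        if self.current_index >= len(self.numbers):
            raise StopIteration
        value = self.numbers[self.current_index]
        self.current_index += 1
        return value * value

    __next__ = next  # Python 3 spelling of the same method


squares = Squares([1, 2, 3])
print(list(squares))
print(list(squares))  # __iter__ resets the index, so this repeats
```

Since __iter__ resets the index and returns self, one instance can be iterated over more than once, but two loops over the same instance at the same time would interfere with each other.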
    

9   Applications and Recipes

9.2   XML

Exercises:

  1. SAX -- Parse an XML document with SAX, then show some information (tag, attributes, character data) for each element.

  2. Minidom -- Parse an XML document with minidom, then walk the DOM tree and show some information (tag, attributes, character data) for each element.

    Here is a sample XML document that you can use for input:

    <?xml version="1.0"?>
    <people>
        <person id="1" value="abcd" ratio="3.2">
            <name>Alberta</name>
            <interest>gardening</interest>
            <interest>reading</interest>
            <category>5</category>
        </person>
        <person id="2">
            <name>Bernardo</name>
            <interest>programming</interest>
            <category></category>
            <agent>
                <firstname>Darren</firstname>
                <lastname>Diddly</lastname>
            </agent>
        </person>
        <person id="3" value="efgh">
            <name>Charlie</name>
            <interest>people</interest>
            <interest>cats</interest>
            <interest>dogs</interest>
            <category>8</category>
            <promoter>
                <firstname>David</firstname>
                <lastname>Donaldson</lastname>
                <client>
                    <fullname>Arnold Applebee</fullname>
                    <refid>10001</refid>
                </client>
            </promoter>
            <promoter>
                <firstname>Edward</firstname>
                <lastname>Eddleberry</lastname>
                <client>
                    <fullname>Arnold Applebee</fullname>
                    <refid>10001</refid>
                </client>
            </promoter>
        </person>
    </people>
    
  3. ElementTree -- Parse an XML document with ElementTree, then walk the DOM tree and show some information (tag, attributes, character data) for each element.

  4. Lxml -- Parse an XML document with lxml, then walk the DOM tree and show some information (tag, attributes, character data) for each element.

  5. Modify document with ElementTree -- Use ElementTree to read a document, then modify the tree. Show the contents of the tree, and then write out the modified document.

Solutions:

  1. We can use the SAX support in the Python standard library:

    #!/usr/bin/env python
    
    """
    Parse an XML document with SAX.  Display info about each element.
    
    Usage:
        python test_sax.py infilename
    Examples:
        python test_sax.py people.xml
    """
    
    import sys
    from xml.sax import make_parser, handler
    
    class TestHandler(handler.ContentHandler):
        def __init__(self):
            handler.ContentHandler.__init__(self)
            self.level = 0
    
        def show_with_level(self, value):
            print '%s%s' % ('    ' * self.level, value, )
    
        def startDocument(self):
            self.show_with_level('Document start')
            self.level += 1
    
        def endDocument(self):
            self.level -= 1
            self.show_with_level('Document end')
    
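        def startElement(self, name, attrs):
            self.show_with_level('start element -- name: "%s"' % (name, ))
            self.level += 1
            # Show each attribute (name and value) of the element.
            for attrname, attrvalue in attrs.items():
                self.show_with_level('attribute -- name: "%s"  value: "%s"' % (
                    attrname, attrvalue, ))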
    
        def endElement(self, name):
            self.level -= 1
            self.show_with_level('end element -- name: "%s"' % (name, ))
    
        def characters(self, content):
            content = content.strip()
            if content:
                self.show_with_level('characters: "%s"' % (content, ))
    
    def test(infilename):
        parser = make_parser()
        handler = TestHandler()
        parser.setContentHandler(handler)
        parser.parse(infilename)
    
    def usage():
        print __doc__
        sys.exit(1)
    
    def main():
        args = sys.argv[1:]
        if len(args) != 1:
            usage()
        infilename = args[0]
        test(infilename)
    
    if __name__ == '__main__':
        main()
    
  2. The minidom module contains a parse() function that enables us to read an XML document and create a DOM tree:

    #!/usr/bin/env python
    
    """Process an XML document with minidom.
    
    Show the document tree.
    
    Usage:
        python minidom_walk.py [options] infilename
    """
    
    import sys
    from xml.dom import minidom
    
    def show_tree(doc):
        root = doc.documentElement
        show_node(root, 0)
    
    def show_node(node, level):
        count = 0
        if node.nodeType == minidom.Node.ELEMENT_NODE:
            show_level(level)
            print 'tag: %s' % (node.nodeName, )
            for key in node.attributes.keys():
                attr = node.attributes.get(key)
                show_level(level + 1)
                print '- attribute name: %s  value: "%s"' % (attr.name,
                    attr.value, )
            if (len(node.childNodes) == 1 and
                node.childNodes[0].nodeType == minidom.Node.TEXT_NODE):
                show_level(level + 1)
                print '- data: "%s"' % (node.childNodes[0].data, )
            for child in node.childNodes:
                count += 1
                show_node(child, level + 1)
        return count
    
    def show_level(level):
        for x in range(level):
            print '   ',
    
    def test():
        args = sys.argv[1:]
        if len(args) != 1:
            print __doc__
            sys.exit(1)
        docname = args[0]
        doc = minidom.parse(docname)
        show_tree(doc)
    
    if __name__ == '__main__':
        #import pdb; pdb.set_trace()
        test()
    
  3. elementtree enables us to parse an XML document and create a DOM tree:

    #!/usr/bin/env python
    
    """Process an XML document with elementtree.
    
    Show the document tree.
    
    Usage:
        python elementtree_walk.py [options] infilename
    """
    
    import sys
    from xml.etree import ElementTree as etree
    
    def show_tree(doc):
        root = doc.getroot()
        show_node(root, 0)
    
    def show_node(node, level):
        show_level(level)
        print 'tag: %s' % (node.tag, )
        for key, value in node.attrib.iteritems():
            show_level(level + 1)
            print '- attribute -- name: %s  value: "%s"' % (key, value, )
        if node.text:
            text = node.text.strip()
            # Show the stripped text; skip whitespace-only content.
            if text:
                show_level(level + 1)
                print '- text: "%s"' % (text, )
        if node.tail:
            tail = node.tail.strip()
            if tail:
                show_level(level + 1)
                print '- tail: "%s"' % (tail, )
        for child in node.getchildren():
            show_node(child, level + 1)
    
    def show_level(level):
        for x in range(level):
            print '   ',
    
    def test():
        args = sys.argv[1:]
        if len(args) != 1:
            print __doc__
            sys.exit(1)
        docname = args[0]
        doc = etree.parse(docname)
        show_tree(doc)
    
    if __name__ == '__main__':
        #import pdb; pdb.set_trace()
        test()
    
  4. lxml enables us to parse an XML document and create a DOM tree. In fact, since lxml attempts to mimic the elementtree API, our code is very similar to that in the solution to the elementtree exercise:

    #!/usr/bin/env python
    
    """Process an XML document with lxml.
    
    Show the document tree.
    
    Usage:
        python lxml_walk.py [options] infilename
    """
    
    #
    # Imports:
    import sys
    from lxml import etree
    
    def show_tree(doc):
        root = doc.getroot()
        show_node(root, 0)
    
    def show_node(node, level):
        show_level(level)
        print 'tag: %s' % (node.tag, )
        for key, value in node.attrib.iteritems():
            show_level(level + 1)
            print '- attribute -- name: %s  value: "%s"' % (key, value, )
        if node.text:
            text = node.text.strip()
            # Show the stripped text; skip whitespace-only content.
            if text:
                show_level(level + 1)
                print '- text: "%s"' % (text, )
        if node.tail:
            tail = node.tail.strip()
            if tail:
                show_level(level + 1)
                print '- tail: "%s"' % (tail, )
        for child in node.getchildren():
            show_node(child, level + 1)
    
    def show_level(level):
        for x in range(level):
            print '   ',
    
    def test():
        args = sys.argv[1:]
        if len(args) != 1:
            print __doc__
            sys.exit(1)
        docname = args[0]
        doc = etree.parse(docname)
        show_tree(doc)
    
    if __name__ == '__main__':
        #import pdb; pdb.set_trace()
        test()
    
  5. We can modify the DOM tree and write it out to a new file:

    #!/usr/bin/env python
    
    """Process an XML document with elementtree.
    
    Show the document tree.
    Modify the document tree and then show it again.
    Write the modified XML tree to a new file.
    
    Usage:
        python elementtree_walk.py [options] infilename outfilename
    Options:
        -h, --help      Display this help message.
    Example:
        python elementtree_walk.py myxmldoc.xml myotherxmldoc.xml
    """
    
    import sys
    import os
    import getopt
    import time
    
    # Use ElementTree.
    from xml.etree import ElementTree as etree
    # Or uncomment to use Lxml.
    #from lxml import etree
    
    def show_tree(doc):
        root = doc.getroot()
        show_node(root, 0)
    
    def show_node(node, level):
        show_level(level)
        print 'tag: %s' % (node.tag, )
        for key, value in node.attrib.iteritems():
            show_level(level + 1)
            print '- attribute -- name: %s  value: "%s"' % (key, value, )
        if node.text:
            text = node.text.strip()
            # Show the stripped text; skip whitespace-only content.
            if text:
                show_level(level + 1)
                print '- text: "%s"' % (text, )
        if node.tail:
            tail = node.tail.strip()
            if tail:
                show_level(level + 1)
                print '- tail: "%s"' % (tail, )
        for child in node.getchildren():
            show_node(child, level + 1)
    
    def show_level(level):
        for x in range(level):
            print '   ',
    
    def modify_tree(doc, tag, attrname, attrvalue):
        root = doc.getroot()
        modify_node(root, tag, attrname, attrvalue)
    
    def modify_node(node, tag, attrname, attrvalue):
        if node.tag == tag:
            node.attrib[attrname] = attrvalue
        for child in node.getchildren():
            modify_node(child, tag, attrname, attrvalue)
    
    def test(indocname, outdocname):
        doc = etree.parse(indocname)
        show_tree(doc)
        print '-' * 50
        date = time.ctime()
        modify_tree(doc, 'person', 'date', date)
        show_tree(doc)
        write_output = False
        if os.path.exists(outdocname):
            response = raw_input('Output file (%s) exists.  Over-write? (y/n): ' %
                outdocname)
            if response == 'y':
                write_output = True
        else:
            write_output = True
        if write_output:
            doc.write(outdocname)
            print 'Wrote modified XML tree to %s' % outdocname
        else:
            print 'Did not write output file.'
    
    def usage():
        print __doc__
        sys.exit(1)
    
    def main():
        args = sys.argv[1:]
        try:
            opts, args = getopt.getopt(args, 'h', ['help',
                ])
        except getopt.GetoptError:
            usage()
        for opt, val in opts:
            if opt in ('-h', '--help'):
                usage()
        if len(args) != 2:
            usage()
        indocname = args[0]
        outdocname = args[1]
        test(indocname, outdocname)
    
    if __name__ == '__main__':
        #import pdb; pdb.set_trace()
        main()
    

    Notes:

    • The above solution contains an import statement for ElementTree and another for Lxml. The one for Lxml is commented out, but you could change that if you wish to use Lxml instead of ElementTree. This solution will work the same way with either ElementTree or Lxml.
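
For quick experiments it can be convenient to parse XML from a string rather than from a file. The following is a minimal sketch, assuming Python 2.7 or later (for Element.iter()); the summarize function and the shortened SAMPLE document are hypothetical and are not taken from the solutions above.

```python
from xml.etree import ElementTree as etree

SAMPLE = """<people>
    <person id="1" value="abcd">
        <name>Alberta</name>
        <interest>gardening</interest>
    </person>
    <person id="2">
        <name>Bernardo</name>
    </person>
</people>"""


def summarize(xml_text):
    # Return a list of (tag, attributes, stripped text) tuples,
    # one per element, in document order.
    root = etree.fromstring(xml_text)
    summary = []
    for node in root.iter():
        text = (node.text or '').strip()
        summary.append((node.tag, dict(node.attrib), text))
    return summary


for item in summarize(SAMPLE):
    print(item)
```

Here iter() walks the whole tree, so no explicit recursion is needed; the recursive show_node approach is still the better fit when the nesting level matters for the output.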

9.3   Relational database access

You can find information about database programming in Python here: Database Programming -- http://wiki.python.org/moin/DatabaseProgramming/.

For database access we use the Python Database API. You can find information about it here: Python Database API Specification v2.0 -- http://www.python.org/dev/peps/pep-0249/.

To use the database API we do the following:

  1. Use the database interface module to create a connection object.
  2. Use the connection object to create a cursor object.
  3. Use the cursor object to execute an SQL query.
  4. Retrieve rows from the cursor object, if needed.
  5. Optionally, commit results to the database.
  6. Close the connection object.
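
The steps above can be sketched with sqlite3 from the standard library. This is an illustration under stated assumptions rather than one of the workbook's solutions: it uses a throwaway in-memory database, and the two-column table is a hypothetical, cut-down version of the plants table.

```python
import sqlite3

# Step 1: create a connection (":memory:" gives a throwaway database).
connection = sqlite3.connect(':memory:')
# Step 2: create a cursor.
cursor = connection.cursor()
# Step 3: execute SQL statements.
cursor.execute('create table plantsdb (p_name varchar, p_rating integer)')
cursor.execute('insert into plantsdb values (?, ?)', ('lemon', 7))
cursor.execute('insert into plantsdb values (?, ?)', ('peach', 9))
# Step 5: commit the changes.
connection.commit()
# Steps 3 and 4: execute a query, then retrieve the rows.
cursor.execute('select p_name, p_rating from plantsdb order by p_name')
rows = cursor.fetchall()
for row in rows:
    print('%s: %s' % row)
# Step 6: close the connection.
connection.close()
```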

Our examples use the gadfly database, which is written in Python. If you want to use gadfly, you can find it here: http://gadfly.sourceforge.net/. gadfly is a reasonable choice if you want an easy-to-use database on your local machine.

Another reasonable choice for a local database is sqlite3, which is in the Python standard library. Here is a descriptive quote from the SQLite Web site:

"SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. SQLite is the most widely deployed SQL database engine in the world. The source code for SQLite is in the public domain."

You can learn about it here:

If you want or need to use another, enterprise-class database, for example PostgreSQL, MySQL, Oracle, etc., you will need an interface module for your specific database. You can find information about database interface modules here: Database interfaces -- http://wiki.python.org/moin/DatabaseInterfaces

Exercises:

  1. Write a script that retrieves all the rows in a table and prints each row.
  2. Write a script that retrieves all the rows in a table, then uses the cursor as an iterator to print each row.
  3. Write a script that uses the cursor's description attribute to print out the name and value of each field in each row.
  4. Write a script that performs several of the above tasks, but uses sqlite3 instead of gadfly.

Solutions:

  1. We can execute a SQL query and then retrieve all the rows with fetchall():

    import gadfly
    
    def test():
        connection = gadfly.connect("dbtest1", "plantsdbdir")
        cur = connection.cursor()
        cur.execute('select * from plantsdb order by p_name')
        rows = cur.fetchall()
        for row in rows:
            print 'row:', row
        connection.close()
    
    test()
    
  2. The cursor itself is an iterator. It iterates over the rows returned by a query. So, we execute a SQL query and then we use the cursor in a for: statement:

    import gadfly
    
    def test():
        connection = gadfly.connect("dbtest1", "plantsdbdir")
        cur = connection.cursor()
        cur.execute('select * from plantsdb order by p_name')
        for row in cur:
            print row
        connection.close()
    
    test()
    
  3. The description attribute in the cursor is a container that has an item describing each field:

    import gadfly
    
    def test():
        connection = gadfly.connect("dbtest1", "plantsdbdir")
        cur = connection.cursor()
        cur.execute('select * from plantsdb order by p_name')
        for field in cur.description:
            print 'field:', field
        rows = cur.fetchall()
        for row in rows:
            for idx, field in enumerate(row):
                content = '%s: "%s"' % (cur.description[idx][0], field, )
                print content,
            print
        connection.close()
    
    test()
    

    Notes:

    • The comma at the end of the print statement tells Python not to print a new-line.
    • The cur.description is a sequence containing an item for each field. After the query, we can extract a description of each field.
  4. The solutions using sqlite3 are very similar to those using gadfly:

    #!/usr/bin/env python
    
    """
    Perform operations on sqlite3 (plants) database.
    
    Usage:
        python py_db_api.py command [arg1, ... ]
    Commands:
        create -- create new database.
        show -- show contents of database.
        add -- add row to database.  Requires 3 args (name, descrip, rating).
        delete -- remove row from database.  Requires 1 arg (name).
    Examples:
        python py_db_api.py create
        python py_db_api.py show
        python py_db_api.py add crenshaw "The most succulent melon" 10
        python py_db_api.py delete lemon
    """
    
    
    import sys
    import sqlite3
    
    Values = [
        ('lemon', 'bright and yellow', '7'),
        ('peach', 'succulent', '9'),
        ('banana', 'smooth and creamy', '8'),
        ('nectarine', 'tangy and tasty', '9'),
        ('orange', 'sweet and tangy', '8'),
        ]
    
    Field_defs = [
        'p_name varchar',
        'p_descrip varchar',
        #'p_rating integer',
        'p_rating varchar',
        ]
    
    
    def createdb():
        connection = sqlite3.connect('sqlite3plantsdb')
        cursor = connection.cursor()
        q1 = "create table plantsdb (%s)" % (', '.join(Field_defs))
        print 'create q1: %s' % q1
        cursor.execute(q1)
        q1 = "create index index1 on plantsdb(p_name)"
        cursor.execute(q1)
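        # Use "?" placeholders so that quotes in the values cannot
        # break (or inject into) the SQL statement.
        q1 = "insert into plantsdb (p_name, p_descrip, p_rating) values (?, ?, ?)"
        for spec in Values:
            print 'q1: "%s"  values: %s' % (q1, spec, )
            cursor.execute(q1, spec)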
        connection.commit()
        showdb1(cursor)
        connection.close()
    
    
    def showdb():
        connection, cursor = opendb()
        showdb1(cursor)
        connection.close()
    
    
    def showdb1(cursor):
        cursor.execute("select * from plantsdb order by p_name")
        hr()
        description = cursor.description
        print description
        print 'description:'
        for rowdescription in description:
            print '    %s' % (rowdescription, )
        hr()
        rows = cursor.fetchall()
        print rows
        print 'rows:'
        for row in rows:
            print '    %s' % (row, )
        hr()
        print 'content:'
        for row in rows:
            descrip = row[1]
            name = row[0]
            rating = '%s' % row[2]
            print '    %s%s%s' % (
                name.ljust(12), descrip.ljust(30), rating.rjust(4), )
    
    
    def addtodb(name, descrip, rating):
        try:
            rating = int(rating)
        except ValueError, exp:
            print 'Error: rating must be integer.'
            return
        connection, cursor = opendb()
        # Use "?" placeholders so that quotes in the values cannot
        # break (or inject into) the SQL statements.
        cursor.execute("select * from plantsdb where p_name = ?", (name, ))
        rows = cursor.fetchall()
        if len(rows) > 0:
            ql = "update plantsdb set p_descrip=?, p_rating=? where p_name=?"
            print 'ql:', ql
            cursor.execute(ql, (descrip, rating, name, ))
            connection.commit()
            print 'Updated'
        else:
            cursor.execute("insert into plantsdb values (?, ?, ?)", (
                name, descrip, rating))
            connection.commit()
            print 'Added'
        showdb1(cursor)
        connection.close()
    
    
    def deletefromdb(name):
        connection, cursor = opendb()
        # Use "?" placeholders rather than string formatting to build SQL.
        cursor.execute("select * from plantsdb where p_name = ?", (name, ))
        rows = cursor.fetchall()
        if len(rows) > 0:
            cursor.execute("delete from plantsdb where p_name = ?", (name, ))
            connection.commit()
            print 'Plant (%s) deleted.' % name
        else:
            print 'Plant (%s) does not exist.' % name
        showdb1(cursor)
        connection.close()
    
    
    def opendb():
        connection = sqlite3.connect("sqlite3plantsdb")
        cursor = connection.cursor()
        return connection, cursor
    
    
    def hr():
        print '-' * 60
    
    
    def usage():
        print __doc__
        sys.exit(1)
    
    
    def main():
        args = sys.argv[1:]
        if len(args) < 1:
            usage()
        cmd = args[0]
        if cmd == 'create':
            if len(args) != 1:
                usage()
            createdb()
        elif cmd == 'show':
            if len(args) != 1:
                usage()
            showdb()
        elif cmd == 'add':
            if len(args) < 4:
                usage()
            name = args[1]
            descrip = args[2]
            rating = args[3]
            addtodb(name, descrip, rating)
        elif cmd == 'delete':
            if len(args) < 2:
                usage()
            name = args[1]
            deletefromdb(name)
        else:
            usage()
    
    if __name__ == '__main__':
        main()
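
The cursor's description attribute can also be used to turn each row into a dictionary keyed by field name. The sketch below makes several assumptions: the rows_as_dicts helper is hypothetical, the database is an in-memory one, and the table is a cut-down version of the plants table. It also passes a value containing an apostrophe through a "?" placeholder, which is one reason to prefer placeholders over building SQL with string formatting.

```python
import sqlite3


def rows_as_dicts(cursor):
    # The first item of each description entry is the column name.
    names = [field[0] for field in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
cursor.execute('create table plantsdb (p_name varchar, p_descrip varchar)')
# The value contains an apostrophe; the "?" placeholder handles it.
cursor.execute('insert into plantsdb values (?, ?)',
               ('lemon', "a cook's favorite"))
connection.commit()
cursor.execute('select * from plantsdb order by p_name')
records = rows_as_dicts(cursor)
connection.close()
print(records)
```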
    

9.4   CSV -- comma separated value files

Exercises:

  1. Read a CSV file and print the fields in columns. Here is a sample file to use as input:

    # name  description  rating
    Lemon,Bright yellow and tart,5
    Eggplant,Purple and shiny,6
    Tangerine,Succulent,8
    

Solutions:

  1. Use the csv module in the Python standard library to read a CSV file:

    """
    Read a CSV file and print the contents in columns.
    """
    
    import csv
    
    def test(infilename):
        # In Python 2, the csv module expects files opened in binary mode.
        infile = open(infilename, 'rb')
        reader = csv.reader(infile)
        print '====                 ===========                              ======'
        print 'Name                 Description                              Rating'
        print '====                 ===========                              ======'
        for fields in reader:
            if len(fields) == 3:
                line = '%s %s %s' % (fields[0].ljust(20),
                    fields[1].ljust(40), fields[2].ljust(4))
                print line
        infile.close()
    
    def main():
        infilename = 'csv_report.csv'
        test(infilename)
    
    if __name__ == '__main__':
        main()
    

    And, when run, here is what it displays:

    ====                 ===========                              ======
    Name                 Description                              Rating
    ====                 ===========                              ======
    Lemon                Bright yellow and tart                   5
    Eggplant             Purple and shiny                         6
    Tangerine            Succulent                                8
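
The csv reader accepts any iterable that yields lines, so the parsing can be tried without a file. In this sketch the format_rows function and the LINES data are hypothetical. It skips the comment line explicitly instead of relying on the field count, and it includes a quoted field containing a comma to show why the csv module is preferable to splitting on commas.

```python
import csv

LINES = [
    '# name  description  rating',
    'Lemon,Bright yellow and tart,5',
    'Eggplant,"Purple, and shiny",6',
]


def format_rows(lines):
    # Skip comment lines, then let csv.reader split the rest.
    # csv.reader accepts any iterable that yields lines of text.
    data_lines = [line for line in lines if not line.startswith('#')]
    formatted = []
    for fields in csv.reader(data_lines):
        if len(fields) == 3:
            formatted.append('%s %s %s' % (
                fields[0].ljust(10), fields[1].ljust(24), fields[2]))
    return formatted


for line in format_rows(LINES):
    print(line)
```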
    

9.5   YAML and PyYAML

YAML is a structured text data representation format. It uses indentation to indicate nesting. Here is a description from the YAML Web site:

"YAML: YAML Ain't Markup Language

"What It Is: YAML is a human friendly data serialization standard for all programming languages."

You can learn more about YAML and PyYAML here:

Exercises:

  1. Read the following sample YAML document. Print out the information in it:

    american:
      - Boston Red Sox
      - Detroit Tigers
      - New York Yankees
    national:
      - New York Mets
      - Chicago Cubs
      - Atlanta Braves
    
  2. Load the YAML data used in the previous exercise, then make a modification (for example, add "San Francisco Giants" to the National League), then dump the modified data to a new file.

Solutions:

  1. Printing out information from YAML is as "simple" as printing out a Python data structure. In this solution, we use the pretty printer from the Python standard library:

    import yaml
    import pprint
    
    def test():
        infile = open('test1.yaml')
        # safe_load builds only plain Python objects (dicts, lists, strings).
        data = yaml.safe_load(infile)
        infile.close()
        pprint.pprint(data)
    
    test()
    

    We could, alternatively, read in and then "load" from a string:

    import yaml
    import pprint
    
    def test():
        infile = open('test1.yaml')
        data_str = infile.read()
        infile.close()
        # safe_load builds only plain Python objects (dicts, lists, strings).
        data = yaml.safe_load(data_str)
        pprint.pprint(data)
    
    test()
    
  2. The YAML dump() function enables us to dump data to a file:

    import yaml
    import pprint
    
    def test():
        infile = open('test1.yaml', 'r')
        # safe_load builds only plain Python objects (dicts, lists, strings).
        data = yaml.safe_load(infile)
        infile.close()
        data['national'].append('San Francisco Giants')
        outfile = open('test1_new.yaml', 'w')
        yaml.dump(data, outfile)
        outfile.close()
    
    test()
    

    Notes:

    • If we want to produce the standard YAML "block" style rather than the "flow" format, then we could use:

      yaml.dump(data, outfile, default_flow_style=False)