Useful Python time formats for dealing with HTTP headers

Time-formats

  1. HTTP header format. HTTP headers use a particular format, involving abbreviated English names of the days of the week and the months, a comma (after the first of these), day before month before year, 24-hour:minute:second, and GMT. Examples:
    Thu, 01 Dec 1994 16:00:00 GMT
    Thu, 26 Jan 2006 15:01:16 GMT
    Tue, 12 Jan 2010 13:48:00 GMT

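    Note that this is different from what Python’s time.asctime() function outputs. For the first example above, asctime() gives Thu Dec  1 16:00:00 1994: no comma, a space-padded day, the month before the day, the year last, and no time-zone label.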

  2. The struct_time object. Python represents time conveniently as a struct_time object, defined at http://docs.python.org/library/time.html#time.struct_time as “an object with a named tuple interface: values can be accessed by index and by attribute name” (accessed 20120523). Here is the first of the three date-time strings above, represented as a struct_time object:
    time.struct_time(tm_year=1994, tm_mon=12, tm_mday=1, tm_hour=16, 
            tm_min=0, tm_sec=0, tm_wday=3, tm_yday=335, tm_isdst=-1)
  3. ISO 8601 standard format. The International Organization for Standardization (ISO) prescribes a standard format for date and time: YYYY-MM-DDThh:mm:ss. See the ISO’s informal description of the standard. (The 2004 revision of the actual specification, which costs money to download, is currently available at this site.)
  4. Compact sortable format. For appending to the names of output files when I need to generate multiple versions, and for use as a sortable but human-readable date-time string, I use the format YYYYMMDD_hhmmss. The three dates initially listed above are shown in this format here:
    19941201_160000
    20060126_150116
    20100112_134800

    I haven’t seen this given a standard name in the Python docs, so I call it “compact sortable” format; it is more compact than the ISO 8601 standard format but still easy to read. In cases where it is impossible for there to be more than one output file generated in the same clock minute, I omit the two digits representing seconds.
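The following minimal sketch renders the first example date in the formats listed above, with time.asctime() included for comparison. It assumes Python 3 and an English (or C) locale, because %a and %b produce locale-dependent day and month names.

```python
import time

# The first example date from the list above, parsed into a struct_time.
t = time.strptime('Thu, 01 Dec 1994 16:00:00 GMT', '%a, %d %b %Y %H:%M:%S GMT')

http_style = time.strftime('%a, %d %b %Y %H:%M:%S GMT', t)  # 1. HTTP header format
asctime_style = time.asctime(t)                              # not the HTTP format
iso_style = time.strftime('%Y-%m-%dT%H:%M:%S', t)            # 3. ISO 8601
compact_style = time.strftime('%Y%m%d_%H%M%S', t)            # 4. compact sortable

for s in (http_style, asctime_style, iso_style, compact_style):
    print(s)
# Thu, 01 Dec 1994 16:00:00 GMT
# Thu Dec  1 16:00:00 1994
# 1994-12-01T16:00:00
# 19941201_160000
```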

Time-format conversions

  1. We can use time.strptime() to parse a string into a struct_time object, and time.strftime() to format a struct_time object as a string. Both functions take a format string of %-directives (such as %Y, %b, and %H). This syntax is reminiscent of the older %-style string formatting and comes from the C library’s strftime() function. Details are provided in the Python docs cited above.
  2. Examples:
    1. HTTP header format and a struct_time object
      1. generate HTTP header format from a struct_time object:
        time.strftime('%a, %d %b %Y %H:%M:%S GMT', time_struct)
      2. generate a struct_time object from HTTP header time format:
        time.strptime(http_header_time, '%a, %d %b %Y %H:%M:%S GMT')
      3. as an identity, deconstruct and regenerate the original HTTP header time format:
        time.strftime('%a, %d %b %Y %H:%M:%S GMT', 
                time.strptime(http_header_time, 
                '%a, %d %b %Y %H:%M:%S GMT'))
    2. ISO 8601 format and a struct_time object
      1. generate ISO 8601 format from a struct_time object:
        time.strftime('%Y-%m-%dT%H:%M:%S', time_struct)
      2. generate a struct_time object from ISO 8601 format:
        time.strptime(iso8601, '%Y-%m-%dT%H:%M:%S')
      3. as an identity, deconstruct and regenerate the original ISO 8601 string:
        time.strftime('%Y-%m-%dT%H:%M:%S', time.strptime(iso8601, 
                '%Y-%m-%dT%H:%M:%S'))
    3. produce compact sortable format from HTTP header time format
      time.strftime("%Y%m%d_%H%M%S", time.strptime(http_header_time, 
              '%a, %d %b %Y %H:%M:%S GMT'))
  3. The following brief functions wrap these conversions:
    import time

    def make_struct_time(http_header_time):
        '''Input HTTP header-type time string and output struct_time'''
        return time.strptime(http_header_time, 
                '%a, %d %b %Y %H:%M:%S GMT')
    
    def make_http_time_string(time_struct):
        '''Input struct_time and output HTTP header-type time string'''
        return time.strftime('%a, %d %b %Y %H:%M:%S GMT', 
                time_struct)
    
    def make_iso8601_time_string(time_struct):
        '''Input struct_time and output an ISO 8601 time string'''
        return time.strftime('%Y-%m-%dT%H:%M:%S', 
                time_struct)
        
    def make_sortable_time_string(time_struct):
        '''Input struct_time and output a "compact sortable" time string'''
        return time.strftime('%Y%m%d_%H%M%S', 
                time_struct)
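Two cautions apply to the functions above. First, the %a and %b directives follow the current LC_TIME locale, so they produce English names only under an English or C locale. Second, the struct_time passed to make_http_time_string() should hold UTC fields (from time.gmtime()), because the string is labelled GMT.

The sketch below converts between HTTP dates and Unix timestamps under those assumptions. It uses calendar.timegm() to read the parsed fields as UTC. The helper names are hypothetical.

```python
import calendar
import time

HTTP_FORMAT = '%a, %d %b %Y %H:%M:%S GMT'   # same pattern as the functions above

def http_date_from_epoch(seconds):
    """Format a Unix timestamp as an HTTP date (hypothetical helper)."""
    # gmtime(), not localtime(): the string is labelled GMT, so the fields must be UTC.
    return time.strftime(HTTP_FORMAT, time.gmtime(seconds))

def epoch_from_http_date(http_header_time):
    """Parse an HTTP date into a Unix timestamp (hypothetical helper)."""
    # 'GMT' is matched as literal text, so strptime() leaves the fields as written;
    # calendar.timegm() then interprets them as UTC.
    return calendar.timegm(time.strptime(http_header_time, HTTP_FORMAT))

stamp = epoch_from_http_date('Thu, 01 Dec 1994 16:00:00 GMT')
print(stamp)                        # 786297600
print(http_date_from_epoch(stamp))  # Thu, 01 Dec 1994 16:00:00 GMT
```

time.mktime() would be the wrong inverse here, because it interprets the struct_time fields as local time.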

Current time

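  1. The following generates our current, local time as a struct_time object:
    time.localtime()
  2. For HTTP headers, whose times are stated in GMT, use the current UTC time instead:
    time.gmtime()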
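The short sketch below formats the current time two ways: for an HTTP header, and as a compact sortable suffix for file names. It assumes an English or C locale for %a and %b, and the variable names are illustrative.

```python
import time

now_utc = time.gmtime()      # current time as UTC fields, suitable for a "GMT" header
now_local = time.localtime() # current time as local fields

http_now = time.strftime('%a, %d %b %Y %H:%M:%S GMT', now_utc)
stamp = time.strftime('%Y%m%d_%H%M%S', now_local)   # e.g. for naming output files
print(http_now)
print(stamp)
```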


Tricked again by Python’s mutable objects

In order to keep a running count of a group of things, I created a Python dictionary d, with its keys populated from unique members of a list list_of_counters and its values set uniformly to 0:

d = {}.fromkeys(set(list_of_counters), 0)

I used set() to ensure that only one unique copy of each element of list_of_counters was used, and all went well. Each item in the dictionary functioned as an independent counter.

Later, I realized that I needed to keep track of two things for each counter, rather than one. No problem; I replaced the 0 above with a list [0, 0], one index for each thing to be kept track of.

d = {}.fromkeys(set(list_of_counters), [0, 0])

Disaster! Every time I updated one index for any particular item, the same index was changed for all the items.

My error was to populate all of the dictionary’s values with a single list. fromkeys() assigns that one object as the value of every key; it does not make a copy for each. Since lists are mutable, changing the list in place through any one key changes it for all of them.

The solution is to populate the initial values using a loop, rather than all at once. For instance:

d = {}.fromkeys(set(list_of_counters))
for i in d:
    d[i] = [0, 0]

The value argument of fromkeys() can be left out, as above; the values then start as None until the loop replaces them. A dictionary comprehension (Python 2.7 and later) can also be used.
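The following sketch shows the trap and the comprehension fix side by side. It runs under Python 2.7 or 3, and the sample list is hypothetical.

```python
list_of_counters = ['a', 'b', 'a', 'c']   # hypothetical sample data

# The trap: fromkeys() stores the SAME list object under every key.
shared = dict.fromkeys(set(list_of_counters), [0, 0])
shared['a'][0] += 1
print(shared['b'])                   # [1, 0]: 'b' sees the change made through 'a'
print(shared['a'] is shared['b'])    # True

# A dict comprehension evaluates [0, 0] once per key, giving independent lists.
independent = {key: [0, 0] for key in set(list_of_counters)}
independent['a'][0] += 1
print(independent['b'])                      # [0, 0]
print(independent['a'] is independent['b'])  # False
```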

Sorting a list of Unicode strings in Python, case-insensitively and ignoring diacritics

This is tricky, because in Python 2 the sort key key=str.lower works only for byte strings (type str), not for unicode objects. str.lower is a method of the str type and rejects any other type. I get the error

TypeError: descriptor 'lower' requires a 'str' object but received a 'unicode'

Instead, try setting your locale and letting your LANG environment variable supply it:

import locale
locale.setlocale(locale.LC_ALL, '')
listname = sorted(listname, cmp=locale.strcoll)  # Python 2 only; sorted() has no cmp= in Python 3

which seems to work. My locale is set to en_US.UTF-8, which treats A, a, and ā as the same letter; that’s what I want for sorting in this particular case.

I could also name the locale explicitly. However, according to the Python docs at http://docs.python.org/release/2.6.6/library/locale.html?highlight=locale#locale.setlocale, if the locale argument is an empty string, the user’s default setting (typically from the LANG environment variable) is used.

That is what the empty single quotes are for. setlocale() takes two arguments, a category and a locale. If the locale is omitted, the function changes nothing and only reports the current setting. With the empty quotes, my system returns 'en_US.UTF-8'. If I omit them altogether, the function returns 'C', the locale stays unchanged, and the sort is case sensitive.
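In Python 3 the cmp= argument to sorted() no longer exists. The minimal equivalent below passes locale.strxfrm as the key. It assumes the environment names an installed locale. The function name and sample words are hypothetical, and the resulting order depends on the locale actually in effect.

```python
import locale

def sort_for_humans(strings):
    """Sort strings by the collation rules of the user's locale (Python 3 sketch)."""
    try:
        # '' means "use the user's default locale", typically taken from LANG.
        locale.setlocale(locale.LC_ALL, '')
    except locale.Error:
        # The environment names a locale that is not installed; keep the current one.
        pass
    # strxfrm() turns each string into a key whose ordinary ordering matches strcoll().
    return sorted(strings, key=locale.strxfrm)

words = ['banana', 'Apple', 'ābaca', 'apple']   # hypothetical sample data
print(sort_for_humans(words))
```

functools.cmp_to_key(locale.strcoll) would also work as the key. strxfrm() has the advantage of transforming each string once, rather than calling strcoll() for every comparison.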

Avoid deleting the contents of a file in Python through sloppy use of “write” mode

The safest way to open files in Python, whether for reading or writing, is by using the with statement:

with open('file.txt', 'r') as f:
    f.read() # and do some other things...

The with statement ensures that the file is closed when the block ends, even if an exception interrupts it. That prevents files from being left open (and, on some systems, locked) unexpectedly. Some people suggest wrapping the whole with block in a try-except block to catch exceptions.

However, there is one disaster that may befall you even when using with. If you open an existing file with a mode of 'w' (write) instead of 'r' (read), the file is truncated the moment it is opened, even if you do nothing else to it. It will have a size of 0 and no content at all.

Since you have probably typed a mode of 'w' inadvertently, there is no point recommending the use of other modes as safeties. The safest way to avoid this problem is to ensure that the file you are reading from is backed up somewhere. That way, if you type 'w' instead of 'r', you will still have an easy way to recover.
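One way to make the backup automatic is a small wrapper that copies the file before opening it. This is a hypothetical helper, not a standard-library function. It keeps only one .bak copy and overwrites it on each call. The demonstration runs in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path

def backup_then_open(path, mode='r'):
    """Copy path to path + '.bak', then open path (hypothetical helper).

    If 'w' is typed by mistake, the .bak copy still holds the old contents.
    Only one backup generation is kept: each call overwrites the .bak file.
    """
    path = Path(path)
    shutil.copy2(path, path.with_name(path.name + '.bak'))  # copy2 also keeps timestamps
    return open(path, mode)

# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'file.txt'
    target.write_text('precious data\n')
    with backup_then_open(target, 'w') as f:   # 'w' typed instead of 'r'
        pass                                   # merely opening it empties the file
    size_after = target.stat().st_size
    recovered = (Path(tmp) / 'file.txt.bak').read_text()

print(size_after)         # 0
print(recovered, end='')  # precious data
```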

Reloading a Python module after modifying it

If a module is being tested in the midst of development, it has to be reloaded into a running interpreter for its most current version to be available there. In Python 2, use the built-in reload() function for this. The initial import is a statement, so no parentheses surround what follows it. reload(), by contrast, is a function, so its argument must be enclosed in parentheses.

IPython (v. 0.12) and the normal Python interpreter (v. 2.6.5) handle this in slightly different ways. Suppose you have a program my_module.py whose functions you want to import, and you do so (as I almost always do) using a shorter, alternate name, like “M”:

>>> import my_module as M

If my_module undergoes changes that you want to make use of, in the regular Python interactive mode you have two choices: (1) you can simply use

>>> reload(my_module)
<module 'my_module' from 'my_module.pyc'>

or (2) you can use the alternate name you imported the module as:

>>> reload(M)
<module 'my_module' from 'my_module.pyc'>

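Python accepts either name as the argument of reload(), as shown by its response. Note, though, that import my_module as M on its own binds only the name M. reload(my_module) works only if the name my_module is also defined in the session, for instance by an earlier plain import my_module.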

In IPython, however, only the second of these (2) works:

In [1]: import my_module as M
# (now my_module undergoes changes that you want to make use of)
In [2]: reload(M)
Out[2]: <module 'my_module' from 'my_module.pyc'>

The first (1) will generate an error:

In [3]: reload(my_module)
NameError: name 'my_module' is not defined

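In Python 3, reload() is no longer a built-in. It was moved to the imp module, which holds functionality related to the import statement, so it had to be called as imp.reload(). Since Python 3.4 the preferred form is importlib.reload(); imp was deprecated in that release and removed in Python 3.12.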
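Below is a self-contained sketch of the Python 3 form, assuming Python 3.4 or later. It writes a throwaway my_module.py to a temporary directory, imports it as M, edits the source, and reloads it with importlib.reload(). The module contents and the demo function are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

def demo_reload():
    """Import a throwaway module, edit its source, and reload it."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / 'my_module.py'   # hypothetical module, named as in the text
        source.write_text('VALUE = 1\n')
        sys.path.insert(0, tmp)
        sys.modules.pop('my_module', None)    # start from a fresh import
        importlib.invalidate_caches()         # let the import system see the new file
        try:
            import my_module as M
            before = M.VALUE
            # Edit the module. The new text has a different length, so the cached
            # .pyc (validated by source mtime and size) is not reused.
            source.write_text('VALUE = 22\n')
            M = importlib.reload(M)           # any name bound to the module object works
            after = M.VALUE
        finally:
            sys.path.remove(tmp)
            sys.modules.pop('my_module', None)
    return before, after

print(demo_reload())   # (1, 22)
```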

Graphing flowcharts and automata in LaTeX

The language Dot describes graphs in plain text. It is used with the Graphviz graphic application; both were originally developed at Bell Labs.
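To show what a plain-text Dot description looks like, the sketch below builds one for a tiny two-state automaton that accepts strings with an even number of 1s. The helper and state names are hypothetical. The resulting text could be saved as a .dot file and rendered with Graphviz (for example with dot -Tsvg) or handed to dot2tex.

```python
def automaton_to_dot(name, transitions, accepting):
    """Return Dot text for a small finite automaton (hypothetical helper).

    transitions: list of (source_state, symbol, target_state) tuples
    accepting:   set of state names drawn with a double circle
    """
    lines = ['digraph %s {' % name, '    rankdir=LR;']
    for state in sorted(accepting):
        lines.append('    %s [shape=doublecircle];' % state)
    lines.append('    node [shape=circle];')   # default for nodes declared after this
    for src, symbol, dst in transitions:
        lines.append('    %s -> %s [label="%s"];' % (src, dst, symbol))
    lines.append('}')
    return '\n'.join(lines)

dot_text = automaton_to_dot('even_ones',
                            [('q0', '1', 'q1'), ('q1', '1', 'q0'),
                             ('q0', '0', 'q0'), ('q1', '0', 'q1')],
                            accepting={'q0'})
print(dot_text)
```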

Below are a few notes on surprises I had when working with .dot and .svg (“Scalable Vector Graphics”, a standard XML-based vector image format) files in Python:

  1. For viewing .svg files, desktop installations of Ubuntu use the Gnome viewer “Eye of GNOME” (eog) by default.
  2. Ubuntu’s (Lucid) server installation of graphviz does not include a viewer by default. You can display .svg files using Firefox. Open about:config in the browser, confirm that the setting svg.smil.enabled is true, and place an entry in your ~/.mailcap file:

    image/svg+xml; firefox

    Of course, you can also install eog on your server.

  3. The current Mac version of Graphviz (v. 2.28) has no trouble opening a .dot file, but apparently it cannot open .svg files.

For use within LaTeX documents, it is possible to do everything with native packages or (more interestingly) to incorporate Graphviz output by converting it to a native format:

  1. The native LaTeX tools for producing flowcharts and automata are the tikz and pstricks packages. TikZ, which has more comprehensive support, supplies a library called automata (see the TikZ manual for detailed instructions). There is also a third package, VauCanSon-G, but it appears to have less functionality.
  2. There is a Python module, dot2tex by Kjell Magne Fauske, that converts .dot and other Graphviz formats to TikZ or pstricks.
  3. Fauske has also written a LaTeX package, dot2texi, that allows .dot (etc.) graphical output to be embedded directly in a LaTeX document.

One man’s calm reflection on Java-think in Python

P. J. Eby writes (2004):

Getters and setters are evil. Evil, evil, I say! Python objects are not Java beans. Do not write getters and setters. This is what the ‘property’ built-in is for. And do not take that to mean that you should write getters and setters, and then wrap them in ‘property’. That means that until you prove that you need anything more than a simple attribute access, don’t write getters and setters. They are a waste of CPU time, but more important, they are a waste of programmer time. Not just for the people writing the code and tests, but for the people who have to read and understand them as well.

“Python Is Not Java” (http://dirtsimple.org/2004/12/python-is-not-java.html, accessed 20111026)
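The following minimal sketch illustrates Eby’s point, with hypothetical class names. Start with a plain attribute. Only if validation is needed later, turn the same name into a property. Code that reads or assigns t.celsius does not change.

```python
class Temperature(object):
    """Version 1: a plain attribute; no getters or setters."""
    def __init__(self, celsius):
        self.celsius = celsius

class CheckedTemperature(object):
    """Version 2: same attribute name, now validated through a property."""
    def __init__(self, celsius):
        self.celsius = celsius        # goes through the setter below

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        if value < -273.15:
            raise ValueError('below absolute zero')
        self._celsius = value

# Calling code is identical for both versions.
for cls in (Temperature, CheckedTemperature):
    t = cls(20)
    t.celsius = 25
    print(cls.__name__, t.celsius)
```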

Simulating private variables in Python

Python does not have the private member variables that many programming languages do. I observe three strategies in use to simulate them, none effective. [Edit:] I don’t advocate this; it’s not Pythonic. But it’s useful to be able to recognize in code one is reading or contributing to.

One convention is that a leading single underscore can be used as an “advisory” or “soft” privacy notation. So a method named method() is meant to be public, while _method() is supposed to be treated as private. The leading underscore helps coders remember not to use it, although there is no way to keep them from using it if they want to.

A second strategy is the use of “private” class members, such as private class variables and methods. These exist mainly not for the sake of privacy but to avoid collisions between names that originate in different classes (for instance, a base class and its subclass). They are marked as private by two distinguishing characteristics:

  • they are named with a leading “dunder” (double underscore, i.e., __) and at most one trailing underscore.

  • “mangling” (alteration) of their names: in order to access them outside of their own class, you prefix the dunder with an additional single underscore and the name of the class itself.

For instance, if you have a class called TheClass containing a method called __method(), the method can be called on some object thing not as plain

thing.__method()

but as

thing._TheClass__method()

I have posted a little piece of code called degrees_of_privacy.py to my BitBucket repository to illustrate the syntax and effects of this and the “advisory” usages. But even these supposedly private variables are fully accessible outside of their own classes, as long as they are correctly named, unlike true private member variables in C++ and other languages.
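The sketch below shows the mangling described above. It is not the degrees_of_privacy.py script itself, and it reuses the TheClass and __method names from the example.

```python
class TheClass(object):
    def __method(self):            # two leading underscores: the name gets mangled
        return 'reached __method'

    def call_from_inside(self):
        return self.__method()     # inside the class, the short name still works

thing = TheClass()

try:
    thing.__method()               # outside the class, the unmangled name is not found
except AttributeError as err:
    print('AttributeError:', err)

print(thing._TheClass__method())   # the mangled name works from anywhere
print(thing.call_from_inside())
print('_TheClass__method' in dir(TheClass))   # True
```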


A third strategy I have observed is that some coders use a single underscore before the public form of a variable’s name and no underscore before the nominally private form of the same variable. In 2009, John Reid posted code for a red black tree with a class definition beginning this way:

class rbnode(object):
    def __init__(self, key):
        self._key = key

The point is that (based on a reading of Reid’s code) key is apparently meant to be a private variable and _key a public one. Similarly, Bruno Preiss, in his interesting 2003 online book of data structures and algorithms in Python, uses syntax of the form:

class Element(object):
    def __init__(self, list, datum, next):
        self._list = list
        self._datum = datum
        self._next = next

Again, list seems meant to be private and _list public. There is a similar prescription in the Wikipedia article on mutator methods (accessed 20111017):

class Student():
    def __init__(self, name):
        self._name = name

I wonder where this syntax comes from. The official Python documentation and the reference books I have consulted do not mention it, and after all, these variables cannot be private in the true sense. Reid and Preiss make their private variables read-only by using the property() function with only an fget function:

key = property(fget=lambda self: self._key, doc="The node's key")

(from Reid). Here _key can be changed outside of the class itself, but key cannot be assigned or deleted, because property() is given no fset or fdel argument. That is just the opposite of the “advisory” leading underscore described near the beginning of this posting.
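The behavior just described can be checked with a cut-down version of the rbnode fragment. This sketch is based only on the lines quoted above, not on Reid’s full class.

```python
class rbnode(object):
    """Sketch built from the quoted fragment only (not Reid's full code)."""
    def __init__(self, key):
        self._key = key

    # fget only: no fset or fdel, so 'key' cannot be assigned or deleted.
    key = property(fget=lambda self: self._key, doc="The node's key")

node = rbnode(5)
print(node.key)        # 5

node._key = 7          # nothing stops a direct change to _key ...
print(node.key)        # 7

try:
    node.key = 9       # ... but assigning to key fails
except AttributeError as err:
    print('AttributeError:', err)
```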

I can’t help thinking that I am looking at indirect influence from C++. Reid and Preiss are both modeling nodes — Reid for a binary tree and Preiss for a linked list, both structures for which a language with pointers would be natural. Reid (following the pseudocode in the algorithms book by Cormen et al.) has some objects of the form

z.p.p.right

meaning the right attribute of z.p.p, which is the p attribute of z.p, which is in turn the p attribute of z. (In Cormen’s notation p is a node’s parent, so this is the right child of z’s grandparent.) I can’t think of a simpler way than this to model the nodes of graphs and trees in Python, though it does call to mind a similar pointer construction

((z->p)->p)->right

in C++. Perhaps this is an earlier version (or a recollection) of forms like

_TheClass__method()

cited above.
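For comparison, the sketch below shows how plain attribute references stand in for pointers in Python. The Node class and the keys are hypothetical, with p as the parent attribute in the style of Cormen et al.

```python
class Node(object):
    """Minimal tree node; p is the parent, as in Cormen et al. (hypothetical sketch)."""
    def __init__(self, key, p=None):
        self.key = key
        self.p = p
        self.left = None
        self.right = None

grandparent = Node(10)
uncle = Node(15, p=grandparent)
parent = Node(5, p=grandparent)
grandparent.left, grandparent.right = parent, uncle
z = Node(3, p=parent)
parent.left = z

# Attribute references play the role of pointers: z.p.p.right is the
# right child of z's grandparent (z's "uncle" in red-black tree terms).
print(z.p.p.right.key)         # 15
print(z.p.p.right is uncle)    # True
```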

Inconsistent results of the same seed in random.seed() on different Python installations

In Python 2, non-integer seeds passed to random.seed() are replaced by the integer hash of the seed supplied. The value of hash(x) seems to be different on 32-bit architectures from its value on 64-bit architectures, provided the seed x is a string. The values are the same if the seed is an integer.

This problem came up among several people trying to run the same code on different platforms and different versions of Python 2.5 and 2.6. Overall, there appear to be two different sets of results for hash(x) with x a string rather than an integer — corresponding to 32- vs. 64-bit architectures.


Of the random.seed(x) function, Beazley (Python Essential Reference, 4th ed., p. 254) writes: “If x is not an integer, it must be a hashable object and the value of hash(x) is used as a seed.”

The Python documentation website says, “All of Python’s immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal, and their hash value is their id().”

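Strings are immutable in Python, and so hashable. However, in Python 2 a string’s hash is computed in the platform’s C long, which is 32 bits wide on 32-bit builds and typically 64 bits on 64-bit builds, so the same string can hash differently. Integers hash to their own values (apart from a few special cases such as -1), so they’re naturally the same on all installations.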
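Python 3 changed the situation in two ways. Since Python 3.2, random.seed() converts str seeds with its own version-2 scheme instead of calling hash(). Since Python 3.3, hash() of a string is randomized per process unless PYTHONHASHSEED is set. The sketch below checks, within one process, that equal string seeds give equal sequences; an integer seed sidesteps the question.

```python
import random

# Python 3 sketch: str seeds are converted by random.seed() itself
# (the default version=2 scheme), not through hash().
a = random.Random('same seed')
b = random.Random('same seed')
first_run = [a.random() for _ in range(3)]
second_run = [b.random() for _ in range(3)]
print(first_run == second_run)   # True

# An integer seed avoids the question entirely.
print(random.Random(12345).random() == random.Random(12345).random())   # True
```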


Edit: I’ve added a little piece of code I used to explore the Python 2.6 hashing function to my public repository on BitBucket: https://bitbucket.org/dpb/show_hash/overview.