July 13, 2008

Using the newest zipped pytz on GAE

I posted an entry about how to use zipped pytz on GAE[1].

1. http://takashi-matsuo.blogspot.com/2008/07/using-zipped-pytz-on-gae.html

It works well with older versions of pytz. Stefano pointed out that my method doesn't work with the newest pytz distribution. This article describes how to use the newest zipped pytz on GAE.

First, download the newest pytz from PyPI[2] and extract the archive.

2. http://pypi.python.org/pypi/pytz/

$ tar xjf pytz-2008c.tar.bz2
$ cd pytz-2008c/pytz
$ zip -q zoneinfo.zip `find zoneinfo -type f ! -name '*.pyc' -print`
$ rm -rf zoneinfo
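
If the zip command is not available (on Windows, for example), the same archive could probably be built from Python instead. This is only a minimal sketch and not part of pytz: the function name zip_zoneinfo and its parameters are made up for illustration. It assumes you pass it the path of the extracted pytz-2008c/pytz directory. It does not delete the zoneinfo directory, so you still need to remove that yourself afterwards.

```python
import os
import zipfile


def zip_zoneinfo(pytz_dir, zip_name="zoneinfo.zip"):
    """Zip pytz_dir/zoneinfo into pytz_dir/zip_name, skipping *.pyc files.

    Hypothetical helper for illustration; returns the path of the archive.
    """
    src = os.path.join(pytz_dir, "zoneinfo")
    zip_path = os.path.join(pytz_dir, zip_name)
    archive = zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED)
    try:
        for root, _dirs, files in os.walk(src):
            for filename in files:
                if filename.endswith(".pyc"):
                    continue
                full = os.path.join(root, filename)
                # Store names relative to pytz_dir, always with '/' separators,
                # e.g. "zoneinfo/Africa/Abidjan".
                rel = os.path.relpath(full, pytz_dir)
                archive.write(full, rel.replace(os.sep, "/"))
    finally:
        archive.close()
    return zip_path
```

Calling zip_zoneinfo("pytz-2008c/pytz") should leave a zoneinfo.zip next to __init__.py, with member names of the form zoneinfo/Area/City.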

After that, edit pytz/__init__.py and modify the open_resource function to read from the zipped zoneinfo database, as follows:
(The pkg_resources stuff has been removed from the function. Thank you again, Stefano!)

def open_resource(name):
    """Open a resource from the zoneinfo subdir for reading.

    Reads from zoneinfo.zip; pkg_resources is not used.
    """
    import zipfile
    from cStringIO import StringIO

    name_parts = name.lstrip('/').split('/')
    for part in name_parts:
        if part == os.path.pardir or os.path.sep in part:
            raise ValueError('Bad path segment: %r' % part)
    zoneinfo = zipfile.ZipFile(os.path.join(os.path.dirname(__file__),
                                            'zoneinfo.zip'))
    # Zip member names always use '/', so join with '/' rather than
    # os.path.join, which would produce '\' on Windows.
    return StringIO(zoneinfo.read('/'.join(['zoneinfo'] + name_parts)))
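
One detail worth knowing here: zip archives store member names with '/' separators on every platform, whereas os.path.join uses the separator of the local OS. The small sketch below shows the difference. It uses the standard ntpath and posixpath modules to simulate what os.path.join returns on Windows and on Unix-like systems, so it behaves the same wherever you run it. The variable names are just for illustration.

```python
import ntpath
import posixpath

parts = ["zoneinfo", "Africa", "Abidjan"]

# What os.path.join(*parts) returns on Windows and on Unix-like systems.
windows_name = ntpath.join(*parts)
unix_name = posixpath.join(*parts)

# Joining with '/' gives the same name on every platform.
member_name = "/".join(parts)

print(repr(windows_name))
print(repr(unix_name))
print(repr(member_name))
```

Only the '/' form matches the names stored in zoneinfo.zip, so a lookup built with backslashes fails with a KeyError like the one reported in the comments.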

Then the only thing left to do is to copy your modified pytz directory into your application directory.

$ cd ../
$ cp -r pytz /your/application/directory


If you'd like to avoid spending CPU time on unzipping the zoneinfo data every time, perhaps you could use the following function:

from google.appengine.api import memcache
import logging
import pytz
from pytz import timezone

def getTimezone(tzname):
    # Try memcache first; treat any memcache failure as a cache miss.
    try:
        tz = memcache.get("tz:%s" % tzname)
    except Exception:
        tz = None
        logging.debug("timezone get failed: %s" % tzname)
    if tz is None:
        tz = timezone(tzname)
        # Cache the timezone for one day (86400 seconds).
        memcache.add("tz:%s" % tzname, tz, 86400)
        logging.debug("timezone memcache added: %s" % tzname)
    else:
        logging.debug("timezone memcache hit: %s" % tzname)

    return tz
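
As glandium points out in the comments below, pytz already keeps a per-process cache (_tzinfo_cache), and a memcache round trip ends up calling pytz.timezone anyway. The general idea of such a process-local cache can be sketched as follows. This is not pytz's actual code: make_cached_loader and slow_load are hypothetical names, and slow_load is a stand-in for whatever really loads a zone (on GAE that would be pytz.timezone).

```python
def make_cached_loader(load):
    """Wrap load(name) so each name is loaded at most once per process."""
    cache = {}

    def cached(name):
        if name not in cache:
            cache[name] = load(name)
        return cache[name]

    return cached


# Stand-in loader that records every real load.
calls = []


def slow_load(name):
    calls.append(name)
    return {"zone": name}


get_timezone = make_cached_loader(slow_load)
first = get_timezone("Asia/Tokyo")
second = get_timezone("Asia/Tokyo")  # served from the dict, no second load
```

A cache like this lives only inside one server process, so each GAE instance would fill its own copy.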


Happy coding :-)

11 comments:

Stefano said...

This comment has been removed by the author.

Stefano said...

Watch out for resource_stream. If it's available (comes with setuptools), your code breaks.

That's why I removed it. It was breaking on my dev environment (pretty standard Mac OS X).

Nice idea to use caching, by the way!

tmatsuo said...

Thank you again, Stefano! I struck out the resource_stream stuff.

glandium said...

Using memcache won't actually save anything, because the way pytz does pickling prevents any useful use of memcache: basically, the only thing stored in memcache ends up being the zone name, and the zone is reread from the zip file when you call memcache.get().

I figured that out when trying to use memcache in pytz itself...

glandium said...

Actually, it doesn't reread from the zipfile every time, because it has a _tzinfo_cache that apparently GAE keeps around (though it would not cross server boundaries, so each server in the GAE farm would have its own cache).

But the thing is still that when using memcache.get it ends up calling pytz.timezone(zone), so it'd just be better to call that directly.

Александр Васильев said...

I think I'll cross-post this comment into GAE issue#498

Due to the cross-dependencies on pytz (as glandium mentioned), or even dependencies on the absence of pytz (see google.appengine.cron.groctimespecification.py), it is better to:
a) rename pytz to gaepytz
b) rename the pytz references inside the gaepytz dir like this:
perl -p -i -e 's/pytz/gaepytz/g' $(egrep -rl pytz $(ls **/*(.) | egrep -v svn))

Dan Olsen said...

type 'exceptions.KeyError': 'zoneinfo\\Africa\\Abidjan'
args = (r'zoneinfo\Africa\Abidjan',)
message = r'zoneinfo\Africa\Abidjan'

What would I be missing in order for it not to be able to find the file?

Александр Васильев said...

You're missing the Right OS (unix ;)
The path inside the archive is assembled with
zoneinfostr = get_cached_zoneinfo(os.path.join('zoneinfo', *name_parts))
and os.path.join uses '/' on Linux and '\' on Windows.

Try changing the code to join with '/', or hardcode the path like:
zoneinfostr = get_cached_zoneinfo('zoneinfo/Africa/Abidjan')

Александр Васильев said...

[code class='prettyprint']
def open_resource(name):
    """Open a resource from the zoneinfo subdir for reading.

    Uses the pkg_resources module if available.
    """
    name_parts = name.lstrip('/').split('/')
    for part in name_parts:
        if part == os.path.pardir or os.path.sep in part:
            raise ValueError('Bad path segment: %r' % part)
    path = 'zoneinfo/%s' % '/'.join(name_parts)
    zoneinfostr = get_cached_zoneinfo(path)
    return StringIO(zoneinfostr)
[/code]

Junghwan Park said...

For simpler uses (datetime conversion or a timezone correctness check),

try this simple GET API:

http://timezonetimezone.appspot.com

You can figure out how to use it in 5 mins.

Unknown said...

If my app is accessed after being left unused for a few minutes, the response will use about 12000ms of cpu time, but immediate subsequent accesses from that point on use around 50-70ms of cpu until the app is left to rest again. I know that cold starting is causing a major performance decrease, but can I do anything more to counter this? Is this perfectly normal?