Abstract
CherryPy lets developers use Python to develop web applications, just as they would use Python for any other type of application. Building a web application with CherryPy is straightforward and does not require the developer to change habits, or learn many features, before being able to produce a working application. This section reviews the basic components you will use to build a CherryPy application.
CherryPy has lots of fancy features to help you manage HTTP messages. But the most fundamental thing it does is allow you to map URIs to handler functions. It does this in a very straightforward way: the path portion of a URI is hierarchical, so CherryPy uses a parallel hierarchy of objects, starting with cherrypy.root. If your application receives a request for "/admin/user?name=idunno", then CherryPy will try to find the handler cherrypy.root.admin.user. If it exists, is callable, and has an "exposed = True" attribute, then CherryPy will hand off control to that function. Any URI parameters (like "name=idunno", above) are passed to the handler as keyword arguments.
There are some special cases, however. To what handler should we map a path like "/admin/search/"? Note the trailing slash after "search": it indicates that our path has three components: "admin", "search", and "". Static webservers interpret this to mean that the search object is a directory, and, since the third component is blank, they use an index.html file if it exists. CherryPy is a dynamic webserver, so it allows you to specify an index method to handle this. In our example, CherryPy will look for a handler at cherrypy.root.admin.search.index. Let's pause and show our example application so far:
Example 3.1. Sample application (handler mapping example)
import cherrypy

class Root:
    def index(self):
        return "Hello, world!"
    index.exposed = True

class Admin:
    def user(self, name=""):
        return "You asked for user '%s'" % name
    user.exposed = True

class Search:
    def index(self):
        # search_page() is assumed to be defined elsewhere in the application
        return search_page()
    index.exposed = True

cherrypy.root = Root()
cherrypy.root.admin = Admin()
cherrypy.root.admin.search = Search()
So far, we have three exposed handlers:
root.index. This will be called for the URIs "/" and "/index".
root.admin.user. This will be called for the URI "/admin/user".
root.admin.search.index. This will be called for the URIs "/admin/search/" and "/admin/search".
Yes, you read that third line correctly: root.admin.search.index will be called whether or not the URI has a trailing slash. Actually, that isn't quite true; CherryPy will answer a request for "/admin/search" (without the slash) with an HTTP Redirect response. Most browsers will then request "/admin/search/" as the redirection suggests, and then our root.admin.search.index handler will be called. But the final outcome is the same.
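As a rough illustration of the path handling described above (this is not CherryPy's own code), the sketch below splits a request URI into its path components and keyword arguments using only the standard library. The helper name split_request_uri is hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl


def split_request_uri(uri):
    """Split a request URI into (path components, keyword arguments).

    Hypothetical helper for illustration only; not part of CherryPy.
    """
    parts = urlsplit(uri)
    # "/admin/search/" -> ["", "admin", "search", ""]; drop the empty
    # string that precedes the leading slash, keep a trailing empty one.
    components = parts.path.split("/")[1:]
    # "name=idunno" -> {"name": "idunno"}
    kwargs = dict(parse_qsl(parts.query))
    return components, kwargs


print(split_request_uri("/admin/user?name=idunno"))
print(split_request_uri("/admin/search/"))
```

The second call returns three components, the last of them empty, which is the case an index method is meant to handle.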
Now, let's consider another special case. What if, instead of passing a user name as a parameter, we wish to use a user id as part of the path? What to do with a URI like "/admin/user/8173/schedule"? This is intended to reference the schedule belonging to "user #8173", but we certainly don't want to have a separate function for each user id!
CherryPy allows you to map a single handler to multiple URIs with the simple approach of not writing handlers you don't need. If a node in the cherrypy.root tree doesn't have any children, that node will be called for all of its child paths, and CherryPy will pass the leftover path info as positional arguments. In our example, CherryPy will call cherrypy.root.admin.user("8173", "schedule"). Let's rewrite our user method to handle such requests:
Example 3.2. A user method which handles positional parameters
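class Admin:
    def user(self, *args):
        if not args:
            raise cherrypy.HTTPError(400, "A user id was expected but not supplied.")
        # *args arrives as a tuple, which has no pop(); copy it to a list first
        args = list(args)
        id = args.pop(0)
        if args and args[0] == 'schedule':
            # self.schedule() is assumed to be defined elsewhere on this class
            return self.schedule(id)
        return "You asked for user '%s'" % id
    user.exposed = True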
Note that this is different behavior than CherryPy 2.1, which only allowed positional params for methods named "default".
Are you ready for another special case? What handler is called in our example if you request the URI "/not/a/valid/path"? Given the behavior we have described up to this point, you might deduce that the root.index method will end up handling any path that can't be mapped elsewhere. This would mean, in effect, that CherryPy applications with a root.index could never return a "404 Not Found" response!
To prevent this, CherryPy doesn't try to call index methods unless they are attached to the last node in the path; in our example, the only index method that might be called would be a root.not.a.valid.path.index method. If you truly want an intermediate index method to receive positional parameters, well, you can't do that. But what you can do is define a default method to do that for you, instead of an index method. If we wanted our cherrypy.root to handle any child path, and receive positional parameters, we could rewrite it like this:
Example 3.3. A default method example
class Root:
    def index(self):
        return "Hello, world!"
    index.exposed = True

    def default(self, *args):
        return "Extra path info: %s" % repr(args)
    default.exposed = True
This new Root class would handle the URIs "/" and "/index" via the index method, and would handle URIs like "/not/a/valid/path" and "/admin/unknown" via the default method.
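The lookup rules described so far can be modelled in a few lines of plain Python. The sketch below is an illustrative model, not CherryPy's actual implementation: the names find_handler and is_exposed are hypothetical, the redirect for a missing trailing slash is ignored, and it assumes the lookup starts from the longest matching object path and works backwards.

```python
def is_exposed(obj):
    """A handler must be callable and carry a truthy 'exposed' attribute."""
    return callable(obj) and bool(getattr(obj, "exposed", False))


def find_handler(root, path):
    """Return (handler, positional_args), or (None, ()) for a 404."""
    names = [name for name in path.split("/") if name]

    # Walk down the object tree as far as attributes exist.
    nodes = [root]
    for name in names:
        child = getattr(nodes[-1], name, None)
        if child is None:
            break
        nodes.append(child)
    depth = len(nodes) - 1  # number of path components that matched

    # An index method is only tried on the last node of the full path.
    if depth == len(names):
        index = getattr(nodes[-1], "index", None)
        if is_exposed(index):
            return index, ()

    # Otherwise work backwards; leftover components become positional args.
    for i in range(depth, -1, -1):
        leftover = tuple(names[i:])
        default = getattr(nodes[i], "default", None)
        if is_exposed(default):
            return default, leftover
        if is_exposed(nodes[i]):
            return nodes[i], leftover
    return None, ()


class Root:
    def index(self):
        return "Hello, world!"
    index.exposed = True

    def default(self, *args):
        return "Extra path info: %s" % repr(args)
    default.exposed = True


class Admin:
    def user(self, *args):
        return "You asked for user %s" % repr(args)
    user.exposed = True


root = Root()
root.admin = Admin()

for path in ("/", "/admin/user/8173/schedule", "/not/a/valid/path"):
    handler, args = find_handler(root, path)
    print(path, "->", handler(*args))
```

Without the default method on Root, the last lookup would return (None, ()), which corresponds to a "404 Not Found" response.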
For those of you who need to see in exactly what order CherryPy will try various handlers, here are some examples, using the application above. We always start by trying to find the longest object path first, and then working backwards until an exposed, callable handler is found:
Example 3.4. Traversal examples
"/admin/user/8192/schedule"
Trying to reach cherrypy.root.admin.user.8192.schedule.index...
    cherrypy.root exists? Yes.
    .root.admin exists? Yes.
    .admin.user exists? Yes.
    .user.8192 exists? No.
    .user.default is callable and exposed? No.
    .admin.user is callable and exposed? Yes. Call it.

"/admin/search/"
Trying to reach cherrypy.root.admin.search.index...
    cherrypy.root exists? Yes.
    .root.admin exists? Yes.
    .admin.search exists? Yes.
    .search.index exists? Yes.
    Path exhausted.
    .search.index is callable and exposed? Yes. Call it.

"/admin/unknown"
Trying to reach cherrypy.root.admin.unknown.index...
    cherrypy.root exists? Yes.
    .root.admin exists? Yes.
    .admin.unknown exists? No.
    .admin.default is callable and exposed? No.
    .root.admin is callable and exposed? No.
    .root.default is callable and exposed? Yes. Call it.
Filters are one of the most important features of CherryPy. The CherryPy core can call user-defined functions at specific points during request processing; a filter is a class which defines those functions. Filters are designed to be called at a low level—the HTTP request/response level—and therefore should only be used in that context.
CherryPy comes with a set of built-in filters, but they're turned off by default. To enable them, you must use the configuration system as follows:
filterName.on = True
Example 3.5. Turning on a default filter
[/entries/view]
tidy_filter.on = True
tidy_filter.tmp_dir = "/tmp"
tidy_filter.strict_xml = True
On the first line we specify that the tidy filter will be used by the core whenever the path /entries/view (or one of its sub-paths) is called. On the last two lines we also define some parameters used by the filter.
CherryPy lets you write your own filters, as we will see in the developer reference chapter. However, the way to use them is different from the default filters. You do not declare custom filters within the configuration file; instead, use the _cp_filters attribute in your source code:
Example 3.6. Using a non default filter
import cherrypy
from myfiltermodule import MyFilterClass

class Entry:
    _cp_filters = [MyFilterClass()]

    def view(self, id):
        # do stuff...
        pass
    view.exposed = True

class Root:
    pass

cherrypy.root = Root()
cherrypy.root.entries = Entry()
cherrypy.server.start()
As all objects below cherrypy.root.entries will inherit the filter, there is no need to re-specify it in each _cp_filters underneath.
Keep in mind that the user-defined filters are called in the order you add them to the list.
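To picture how filters declared on one node apply to everything beneath it, here is a small model in plain Python. It is only a sketch: collect_filters, LogFilter and AuthFilter are hypothetical names, the filter classes are empty placeholders, and the ordering across levels (parents before children) is an assumption of this example rather than documented CherryPy behavior.

```python
class LogFilter:
    """Hypothetical placeholder; a real filter would define hook methods."""


class AuthFilter:
    """Hypothetical placeholder; a real filter would define hook methods."""


class Comments:
    # No _cp_filters here: the ones declared on Entry still apply.
    def index(self):
        return "comments"
    index.exposed = True


class Entry:
    # Filters run in the order they appear in this list.
    _cp_filters = [LogFilter(), AuthFilter()]

    def view(self, id):
        return "entry %s" % id
    view.exposed = True


class Root:
    pass


root = Root()
root.entries = Entry()
root.entries.comments = Comments()


def collect_filters(root, names):
    """Gather _cp_filters from the root down to the deepest existing node."""
    filters = list(getattr(root, "_cp_filters", []))
    node = root
    for name in names:
        node = getattr(node, name, None)
        if node is None:
            break
        filters.extend(getattr(node, "_cp_filters", []))
    return filters


active = collect_filters(root, ["entries", "comments", "index"])
print([type(f).__name__ for f in active])
```

The handler under entries.comments picks up both filters even though Comments declares none of its own.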
The CherryPy configuration system provides fine-grained control over how each part of the application should react. You will use it for two reasons:
Web server settings
Enabling filters per path
You will be able to declare the configuration settings either from a file or from a Python dictionary.
First of all, let's see how a typical configuration file is defined.
Example 3.7. Configuration file
# The configuration file called myconfigfile.conf
[global]
server.socket_port=8080
server.socket_host=""
server.socket_file=""
server.socket_queue_size=5
server.protocol_version="HTTP/1.0"
server.log_to_screen=True
server.log_file=""
server.reverse_dns=False
server.thread_pool=10
server.environment="development"

[/service/xmlrpc]
xmlrpc_filter.on = True

[/admin]
session_authenticate_filter.on=True

[/css/default.css]
static_filter.on = True
static_filter.file = "data/css/default.css"

# From your script...
cherrypy.config.update(file="myconfigfile.conf")
The settings can also be defined using a Python dictionary instead of a file, as follows:
Example 3.8. Configuration dictionary
settings = {
    'global': {
        'server.socket_port': 8080,
        'server.socket_host': "",
        'server.socket_file': "",
        'server.socket_queue_size': 5,
        'server.protocol_version': "HTTP/1.0",
        'server.log_to_screen': True,
        'server.log_file': "",
        'server.reverse_dns': False,
        'server.thread_pool': 10,
        'server.environment': "development"
    },
    '/service/xmlrpc': {
        'xmlrpc_filter.on': True
    },
    '/admin': {
        'session_authenticate_filter.on': True
    },
    '/css/default.css': {
        'static_filter.on': True,
        'static_filter.file': "data/css/default.css"
    }
}
cherrypy.config.update(settings)
Each section of the configuration refers to an object path; the object path is used to look up the correct handler for each Request-URI. Therefore when the server receives a Request-URI of /css/default.css, the static filter will handle the request, and the server will actually return the physical file at data/css/default.css. Since the path /service/xmlrpc has the XML-RPC filter enabled, all the exposed methods of the object cherrypy.root.service.xmlrpc will be treated as XML-RPC methods.
The global entry represents settings which apply outside the request process, including server settings such as the port, the protocol version to use by default, the number of threads to start with the server, etc. This is not the same as the root entry [/], which maps to cherrypy.root.
By default, URIs and object paths are equivalent; however, filters may rewrite the objectPath to produce a different mapping between URIs and handlers. This is necessary, for example, when mounting applications at virtual roots (e.g. serving the object path /welcome at the URI "/users/~rdelon/welcome").
All values in the configuration file must be valid Python values. Strings must be quoted, booleans must be True or False, etc.
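One way to see why every value must be a valid Python literal is to parse such a file by hand. The sketch below is an illustration under stated assumptions, not CherryPy's own parser: it reads the sections with the standard configparser module and evaluates each value with ast.literal_eval. The name parse_config is hypothetical.

```python
import ast
from configparser import RawConfigParser

CONFIG_TEXT = """
[global]
server.socket_port = 8080
server.environment = "development"

[/css/default.css]
static_filter.on = True
static_filter.file = "data/css/default.css"
"""


def parse_config(text):
    """Return {section: {key: python_value}} for a config file's text.

    Hypothetical helper: each raw value is evaluated as a Python literal,
    so an unquoted string raises ValueError.
    """
    parser = RawConfigParser()
    parser.read_string(text)
    settings = {}
    for section in parser.sections():
        settings[section] = {
            key: ast.literal_eval(raw) for key, raw in parser.items(section)
        }
    return settings


settings = parse_config(CONFIG_TEXT)
print(settings["global"]["server.socket_port"] + 1)      # an int, not a string
print(settings["/css/default.css"]["static_filter.on"])  # a real boolean
```

With this approach, a line such as server.environment = development (no quotes) fails to parse, because development is a bare name rather than a literal.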
The server.environment entry controls how CherryPy should run. Three values are built in:
development
    log_debug_info_filter is enabled
    HTTPErrors (and therefore the default _cp_on_error) display tracebacks in the browser if errors occur
    autoreload is enabled
    NotFound errors (404) are listed in the error.log
production
    log_debug_info_filter is disabled
    tracebacks are logged, but are not displayed in the browser
    autoreload is disabled
    NotFound errors (404) aren't listed in the error log
staging (same as production for the moment)
Beginning in CherryPy 2.2, the behavior of each environment is defined in cherrypy.config.environments, a dict whose keys are "development", "production", etc., and whose values are dicts of config keys and values. Application developers are free to modify existing environments, or define new environments for use by their deployers, by modifying this container. For example, if you develop an application which absolutely cannot handle autoreload, your app can set cherrypy.config.environments['development']['autoreload.on'] = False. Deployers who selected the "development" environment would then be free from the danger of autoreload interacting with your application. Another example of using config.environments directly might be an application which needs a "development" and "production" environment, but also separate "beta", "rc", "live data" and/or "testing" environments.
Abstract
CherryPy 2.1 includes a powerful sessions system provided via a new session_filter.
First you need to enable the session filter through the configuration system, by setting session_filter.on to True. This gives you a variable called cherrypy.session, which is a dictionary-like object where you can read/store your session data. This dictionary always has a special key called _id which contains the session id.
Here is sample code showing how to implement a simple counter using sessions:
Example 3.9. Basic example of session usage
import cherrypy

class Root:
    def index(self):
        count = cherrypy.session.get('count', 0) + 1
        cherrypy.session['count'] = count
        return 'Counter: %s' % count
    index.exposed = True

cherrypy.config.update({'session_filter.on': True})
cherrypy.root = Root()
cherrypy.server.start()
The following configuration options are available for "session_filter":
session_filter.on: True or False (default). Enables or disables sessions.
session_filter.storage_type: Specifies which storage type should be used for storing session data on the server. Built-in types are Ram (default), File and PostgreSQL (see Section 1.4.3, “Choosing the backend” for more info).
session_filter.storage_path: Specifies the directory in which CherryPy puts the session files when session_filter.storage_type is set to File.
session_filter.timeout: The number of minutes of inactivity before an individual session can be removed. It can be a float (e.g. 0.5 for 30 seconds). Defaults to 60.
session_filter.clean_up_delay: Once in a while the server cleans up old/expired sessions. This config option specifies how often this clean-up process should happen. The delay is in minutes. Defaults to 5.
session_filter.cookie_name: The name of the cookie that CherryPy will use to store the session ID. Defaults to sessionID.
session_filter.get_db: See the PostgreSQL backend from Section 1.4.3, “Choosing the backend”.
session_filter.deadlock_timeout: See Section 1.4.5, “Handling concurrent requests for the same session data”.
session_filter.on_create_session: See Section 1.4.6, “Being notified when sessions are created/deleted”.
session_filter.on_renew_session: See Section 1.4.6, “Being notified when sessions are created/deleted”.
session_filter.on_delete_session: See Section 1.4.6, “Being notified when sessions are created/deleted”.
session_filter.storage_class: See Section 1.4.4, “Writing your own custom backend”.
CherryPy comes with multiple built-in backends for storing session data on the server side. They are:
Ram: All data is stored in RAM; this is the fastest storage, but it means that the data will be lost if you restart the server, and it also means that it won't scale to multiple processes/machines.
File: All data is stored on disk; this is a bit slower than Ram storage, but the data will persist if you restart the server. It also means that data can be shared amongst multiple CherryPy processes, either on the same machine, or on multiple machines if all machines have access to the same disk (for example, via NFS).
PostgreSQL: This backend is included with CherryPy to show how easy it is to implement your own custom backend for the session system. All data is stored in a PostgreSQL database; storing your data in a database is the recommended setup for production if you have a very high traffic website and you need to scale your site across multiple machines. To use this backend, you'll need to create the following table in your PostgreSQL database:
create table session (
    id varchar(40),
    data text,
    expiration_time timestamp
)
You also need to programmatically set the session_filter.get_db config option to a function that returns a DB connection. Note that you should use the psycopg2 module.
Note that with the Ram backend, the session data is saved as soon as you stick it in cherrypy.session. So even if an error occurs later on in the page handler the data is still saved; this is not the case for the other backends.
By default, CherryPy comes with 3 built-in backends, but if you have specific needs, it is very easy to implement your own custom backend (for instance, another database, or an XML-RPC server, ...). To do so, all you have to do is write a class that implements the following methods:
class MyCustomBackend:

    def save(self, id, data, expirationTime):
        """ Save the session data and expirationTime for that session id """

    def load(self, id):
        """ Load the session data and expirationTime for 'id' and return
            a tuple (data, expirationTime) (even if the session is
            expired). Return None if id doesn't exist. """

    def clean_up(self):
        """ Delete expired session data from storage and call
            'on_delete_session' for each deleted session id """
Note that if you want to use explicit locking (see Section 1.4.5, “Handling concurrent requests for the same session data”), you also have to implement two extra methods: acquire_lock and release_lock.
Once you have written this class, you have to programmatically set the session_filter.storage_class config option to this class.
If you need help in writing your own custom backend it is a good idea to look at how the current ones (ram, file and postgresql) are implemented. They are implemented in the file cherrypy/lib/filter/sessionfilter.py
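As a minimal sketch of the interface above, the class below keeps sessions in a plain dict. Several details are assumptions made for illustration: the class name DictBackend is hypothetical, expirationTime is assumed to be a datetime.datetime, and the delete callback is passed to the constructor and called with the session data, since the way CherryPy hands on_delete_session to a backend is not shown here.

```python
import datetime


class DictBackend:
    """In-memory backend following the save/load/clean_up interface."""

    def __init__(self, on_delete_session=None):
        self._store = {}
        # Hypothetical hook: called with the data of each deleted session.
        self._on_delete_session = on_delete_session

    def save(self, id, data, expirationTime):
        """Save the session data and expirationTime for that session id."""
        self._store[id] = (data, expirationTime)

    def load(self, id):
        """Return (data, expirationTime), or None if id doesn't exist."""
        return self._store.get(id)

    def clean_up(self):
        """Delete expired sessions and notify the callback for each one."""
        now = datetime.datetime.now()
        expired = [sid for sid, (_, expires) in self._store.items()
                   if expires < now]
        for sid in expired:
            data, _ = self._store.pop(sid)
            if self._on_delete_session is not None:
                self._on_delete_session(data)


deleted = []
backend = DictBackend(on_delete_session=deleted.append)
now = datetime.datetime.now()
backend.save("old", {"count": 1}, now - datetime.timedelta(minutes=1))
backend.save("new", {"count": 2}, now + datetime.timedelta(minutes=60))
backend.clean_up()
print(backend.load("old"))  # None
print(deleted)              # [{'count': 1}]
```

Like the built-in Ram backend, this loses all data on restart; it is meant only to show the shape of the three required methods.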
It is normally quite rare to have two simultaneous requests with the same session ID. It means that the same browser is making two requests to your server at the same time (to dynamic pages; static data like images don't have sessions). However, this case can happen (if you're using frames, for instance), and it will happen more and more often as more and more people start using Ajax.
In that case, we need to make sure that access to the session data is serialized. This way, threads can't both modify the data at the same time and leave it in an inconsistent state.
What you need to do is call "cherrypy.session.acquire_lock()" in methods that update the session data. (Methods that only read it don't need that call.) The lock will be automatically released when the request is over. Here is some sample code that does it:
class Root:
    def increment_counter(self):
        # We call acquire_lock at the beginning
        # of the method
        cherrypy.session.acquire_lock()
        c = cherrypy.session.get('counter', 0) + 1
        cherrypy.session['counter'] = c
        return str(c)
    increment_counter.exposed = True

    def read_counter(self):
        # No need to call acquire_lock
        # because we're only reading
        # the session data
        c = cherrypy.session.get('counter', 0)
        return str(c)
    read_counter.exposed = True
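The need for serialization can be demonstrated without a web server. In the sketch below a plain dict stands in for cherrypy.session and a threading.Lock stands in for the lock behind acquire_lock(); both stand-ins, and the function names, are assumptions for illustration. Unlike the real session lock, which is released when the request ends, this one is released at the end of the with block.

```python
import threading

session = {}                     # stand-in for cherrypy.session
session_lock = threading.Lock()  # stand-in for the per-session lock


def increment_counter():
    # Hold the lock for the whole read-modify-write sequence so that
    # no other thread can interleave between the read and the write.
    with session_lock:
        c = session.get("counter", 0) + 1
        session["counter"] = c
    return c


def worker(times):
    for _ in range(times):
        increment_counter()


def simulate(n_threads=8, times=1000):
    """Run n_threads concurrent 'requests' against the same session."""
    session.clear()
    threads = [threading.Thread(target=worker, args=(times,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return session["counter"]


print(simulate())  # 8000
```

Because every update happens under the lock, no increment is lost and the final count always equals the number of calls.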
It is possible to configure the session_filter so that it calls some special callback functions from your code when sessions are being created/renewed/deleted. To do so you have to set the session_filter.on_create_session, session_filter.on_renew_session, and session_filter.on_delete_session config options. When a session is created/deleted, CherryPy will call these functions and pass them the session data.
CherryPy is a low-level framework for building web applications, and thus does not offer high-level features such as an integrated templating system. This is quite a different point of view from many other web frameworks. CherryPy does not force you to use a specific templating language; instead, it allows you to plug in your favourite one as you see fit.
CherryPy works with all the main templating systems. You will find recipes on how to use them on the CherryPy website.
Static content is now handled by a filter called "static_filter" that can easily be enabled and configured in your config file. For instance, if you wanted to serve /style.css from /home/site/style.css and /static/* from /home/site/static/*, you can use the following configuration:
Example 3.10. Static filter configuration
[global]
static_filter.root = "/home/site"

[/style.css]
static_filter.on = True
static_filter.file = "style.css"

[/static]
static_filter.on = True
static_filter.dir = "static"
The static_filter.root entry can be either absolute or relative. If absolute, static content is sought within that absolute path. Since CherryPy cannot guess where your application root is located, relative paths are assumed to be relative to the directory where your cherrypy.root class is defined (if you do not provide a root, it defaults to "", and therefore to the directory of your cherrypy.root class).
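The resolution rule for static_filter.root can be sketched as a small path calculation. This is a model of the rule as described, not the filter's actual code: resolve_static is a hypothetical name, "/srv/app" is a made-up application directory, and posixpath is used so the results look the same on every platform (real code would more likely use os.path).

```python
import posixpath


def resolve_static(root, entry, app_dir):
    """Return the filesystem path for a static_filter file or dir entry.

    root    -- the static_filter.root setting ("" if not provided)
    entry   -- the static_filter.file or static_filter.dir setting
    app_dir -- directory where the cherrypy.root class is defined
    """
    # A relative (or empty) root is taken relative to the app directory.
    if posixpath.isabs(root):
        base = root
    else:
        base = posixpath.join(app_dir, root)
    return posixpath.normpath(posixpath.join(base, entry))


# Absolute root, as in the configuration above.
print(resolve_static("/home/site", "style.css", "/srv/app"))  # /home/site/style.css
print(resolve_static("/home/site", "static", "/srv/app"))     # /home/site/static

# No root provided: paths fall back to the application directory.
print(resolve_static("", "static", "/srv/app"))               # /srv/app/static
```

A relative root such as "data" would resolve under the application directory in the same way, which is what makes relative paths convenient for multiple deployments.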
As an application developer, the design of your application affects whether you choose to use absolute or relative paths. If you are creating a one-off application that will only be deployed once, you might as well use absolute paths. But you can make multiple deployments easier by using relative paths, letting CherryPy calculate the absolute path each time for you. Absolute paths, however, give deployers the ability to place static content on read-only filesystems, or on faster disks.
Before version 2.1, CherryPy handled file uploads by reading the entire file into memory, storing it in a string, and passing it to the page handler method. This worked well for small files, but not so well for large files.
CherryPy 2.1 uses the Python cgi module to parse the POST data. When a file is being uploaded, the cgi module stores it in a temp file and returns a FieldStorage instance which contains information about this file. CherryPy then passes this FieldStorage instance to the method. The FieldStorage instance has the following attributes:
file: the file(-like) object from which you can read the data
filename: the client-side filename
type: the content-type of the file
As you read this section, refer to the following diagram to understand the flow of execution:
When an unhandled exception is raised inside CherryPy, four actions occur (in order):
before_error_response filter methods are called
a _cp_on_error method is called
response.finalize is called
after_error_response filter methods are called
The error response filter methods are defined by each filter; they cannot prevent the call to _cp_on_error (unless before_error_response raises an exception, including HTTPRedirect).
The _cp_on_error function is a CherryPy "special attribute"; that is, you can define your own _cp_on_error method for any branch in your cherrypy.root object tree, and it will be invoked for all child handlers. For example:
Example 3.11. A custom _cp_on_error method
import cherrypy

class Root:
    def _cp_on_error(self):
        cherrypy.response.body = ("We apologise for the fault in the website. "
                                  "Those responsible have been sacked.")

    def index(self):
        # Deliberate bug: adding an int to a str raises TypeError,
        # which triggers _cp_on_error
        return "A m" + 00 + "se once bit my sister..."
    index.exposed = True
The default _cp_on_error function simply responds as if an HTTPError 500 had been raised (see the next section).
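The "special attribute" rule, where a definition on any branch applies to all child handlers, can be modelled as a walk along the object path that remembers the deepest definition seen. The sketch below is only an illustration: find_special_attribute is a hypothetical name, and the handlers return strings instead of setting cherrypy.response.body so that the example runs on its own.

```python
def find_special_attribute(root, names, attr_name):
    """Return the deepest definition of attr_name along the path, or None."""
    found = getattr(root, attr_name, None)
    node = root
    for name in names:
        node = getattr(node, name, None)
        if node is None:
            break
        # A definition on a deeper node overrides one inherited from above.
        found = getattr(node, attr_name, found)
    return found


class Root:
    def _cp_on_error(self):
        return "root error page"


class Admin:
    def _cp_on_error(self):
        return "admin error page"


class Blog:
    pass


root = Root()
root.admin = Admin()
root.blog = Blog()

print(find_special_attribute(root, ["admin", "user"], "_cp_on_error")())  # admin error page
print(find_special_attribute(root, ["blog", "post"], "_cp_on_error")())   # root error page
```

Handlers under admin get the Admin version, while everything else falls back to the one defined on the root.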
If an HTTPRedirect is raised during the error-handling process, it will be handled appropriately. If any other kind of error occurs during the handling of an initial error, then CherryPy punts, returning a bare-bones, text/plain error response (containing both tracebacks if server.show_tracebacks is True).
HTTPError exceptions do not result in calls to _cp_on_error. Instead, they have their own _cp_on_http_error function. Like _cp_on_error, this is a "special attribute" and can be overridden by cherrypy.root objects. The default _cp_on_http_error handler sets the HTTP response to a pretty HTML error page.