Thursday, November 23, 2006

IronPython and trace style debugging

When developing with IronPython under Windows, there is excellent debugger support via Visual Studio. But sometimes you just want to do simple trace style debugging, in other words, put some print statements in your code. This works well when developing an application you can run under the console but is problematic if the application is a service or ASP.NET handler. When developing similar applications with CPython under Windows, I used the win32trace.pyd/win32traceutil.py to achieve this.

From the win32 python docs:

These modules allow one Python process to generate output via a "print" statement, and another unrelated Python process to display the data. This works even if the Python process generating the output is running under the context of a service, where more traditional means are not available. The module works on Windows 95 and Windows NT.

To enable this debugging, simply "import win32traceutil" somewhere in your program - thats it! This automatically re-directs the standard Python output streams to the remote collector (If that behaviour is undesirable, you will need to use win32trace directly.) To actually see the output as it is produced, start a DOS prompt, and run the file "win32traceutil.py" (eg, double-click on it from Explorer, or "python.exe win32traceutil.py") This will print all output from all other Python processes (that have imported win32traceutil) until you kill it with Ctrl+Break!)



So I decided to create something similar for IronPython. win32trace uses a memory-mapped file for communication between the application and the trace collector. Since the only way I could find to use memory-mapped files under the CLI relied on calls to the unmanaged Win32 API, I decided to use UDP datagrams instead. Rather than write something from scratch, I remembered a Python script that I had used many years ago, creosote.py by Jeff Bauer. The original link to the script is broken, but I eventually tracked down a copy in the Zope CVS and wrapped it in a module, traceutil.py.

To use it, put this script on your IronPython or CPython path and import it; any output to stdout or stderr will then be sent to the collector.
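The idea of redirecting output over UDP can be sketched roughly as follows. This is not the creosote.py or traceutil.py code, just a minimal illustration written for current CPython 3. The names UdpTraceWriter and collect_once are made up for this example, and port 7739 is simply the port the collector reports in its output.

```python
import socket


class UdpTraceWriter(object):
    """File-like object that sends each write() as one UDP datagram.

    Hypothetical helper for illustration; not part of creosote.py.
    """

    def __init__(self, host="127.0.0.1", port=7739):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def write(self, text):
        if not text:
            return
        try:
            # UDP is fire-and-forget: nothing waits for a collector to answer
            self.sock.sendto(text.encode("utf-8"), self.addr)
        except socket.error:
            # Tracing must never break the application being traced
            pass

    def flush(self):
        # Nothing is buffered, so there is nothing to flush
        pass


def collect_once(sock, bufsize=8192):
    """Receive a single datagram on an already bound UDP socket."""
    data, _sender = sock.recvfrom(bufsize)
    return data.decode("utf-8")
```

Assigning an instance to sys.stdout and sys.stderr would send print output to whatever is listening on the port, while a collector would bind a UDP socket to the same port and call collect_once in a loop.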

To start the collector:

ipy.exe traceutil.py


If the following code is run as a script by IronPython or CPython:

import traceutil

print "start"
a = "A"
# Create an error
import monty


the following would appear in the collector console running under .NET:

creosote bucket waiting on port: 7739
start
\n
Traceback (most recent call last):\r\n File E:\\IronPython\\IPCE-r4\\test_traceutil.py, line 6, in Initialize\r\n File , line 0, in __import__##4r\nImportError: No module named monty\r\n


and the following would appear in the collector console running under Mono:

creosote bucket waiting on port: 7739
start
\n
Traceback (most recent call last):\n File test, line unknown, in Initialize\nImportError: No module named monty\n

It is important to note that the creosote client code does not block if the collector isn't running, which is the behaviour you want: the application carries on whether or not anything is listening.

Thursday, November 02, 2006

Microsoft Release FastCGI Technical Preview

To allow PHP to perform better with IIS, Microsoft has released a FastCGI component for IIS. Why is this good for Python web apps? Many of them support FastCGI, and if your web app or framework is WSGI compliant you can use flup. So we finally have a Microsoft-"blessed" way to get our Python apps running behind IIS.
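As a rough illustration of what "WSGI compliant" means here, the sketch below defines about the smallest possible WSGI application. The serve_fastcgi function is an assumption on my part: it relies on the third-party flup package being installed and is never called by the example itself, and nothing here has been tried against the IIS FastCGI preview.

```python
def application(environ, start_response):
    """Smallest useful WSGI app: returns a plain-text greeting."""
    body = b"Hello from WSGI\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def serve_fastcgi():
    """Hand the app to flup's FastCGI server (assumes flup is installed)."""
    from flup.server.fcgi import WSGIServer
    WSGIServer(application).run()
```

Any framework that exposes a callable with this signature can be passed to a FastCGI gateway in the same way.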

Tuesday, October 31, 2006

Roundup and WSGI

In a previous post I mentioned that I had an adaptor to run the Roundup Issue Tracker as a WSGI application, but I never actually published the location of the code. So if you want to have a play, it can be found here. There are also two example ini files for using the adaptor with Paste.Deploy.

Currently there is one unresolved issue: when using Roundup's internal authentication, after login the browser is not always re-directed correctly. This is not an issue for me as I use WSGI based authentication middleware.

10 Nov 2006
As of Roundup 1.3, Richard Jones has added a wsgi handler so I suggest you use that one instead.

Saturday, October 28, 2006

IronPython and ADO.NET Part 2

This is the second in a series of posts about database access with IronPython and ADO.NET. This post will discuss connecting to the database and executing basic DDL and SQL statements via the Python DB-API instead of accessing ADO.NET directly. So that the examples can run on Windows and non-Windows systems, they will support either SQLite3 via the Mono.Data.SqliteClient ADO provider or Microsoft Access via the System.Data.Odbc provider.

The Python DB-API is a specification created by the Python Database SIG for a consistent interface to relational databases. For CPython there is at least one DB-API compliant library for most of the relational database engines in use today. As part of his fepy project, Seo Sanghyeon has created a set of wrappers that provide DB-API support for the MySQL, PostgreSQL, SQLite, Microsoft SQL Server and ODBC ADO.NET database drivers. Since one of the features of ADO.NET is also to provide a consistent interface to relational databases, you may ask why you need another layer for database access with IronPython. The answer is simple: the DB-API lets you focus on the actual access and manipulation of the data and hides the low-level ADO.NET setup and management code. To show how the DB-API can simplify your IronPython code, the examples from the first post of this series have been modified to use the DB-API.

If you want to try the examples or use the DB-API with IronPython, you will need to install it. You can either download the modules from here and copy them to the IronPython Lib directory, or build and/or install the IronPython Community Edition, which includes the DB-API.

Creating a table example

The first section of code is only required so that the examples will work on either *nix or Windows platforms. In normal usage there is no need to import dbapi.py directly; just import the DB-API module for the database you want to use.
import dbapi
try:
    import sqlite3 as db
    dbapi._load_type(db.assembly,db.typename)
    connectstr = 'ip2country.db'
    ip2country_create_table_ddl = '''
    CREATE TABLE ip2country (
        ipfrom INTEGER,
        ipto INTEGER,
        countrycode2 CHAR(2),
        countrycode3 CHAR(3),
        countryname VARCHAR(50),
        PRIMARY KEY (ipfrom,ipto)
    )
    '''
except:
    import odbc as db
    dbapi._load_type(db.assembly,db.typename)
    connectstr = 'DSN=ip2country'
    ip2country_create_table_ddl = '''
    CREATE TABLE ip2country (
        ipfrom DOUBLE,
        ipto DOUBLE,
        countrycode2 CHAR(2),
        countrycode3 CHAR(3),
        countryname VARCHAR(50),
        CONSTRAINT ip2country_pk PRIMARY KEY (ipfrom,ipto)
    )
    '''

Comparing this code with the ADO.Net example, you can see that opening the database connection and creating a command instance are done automatically by the DB-API module. The Python DB-API PEP specifies implicit transactions that are started automatically and committed or rolled back on demand, so unlike the ADO.Net example, an explicit commit is required.
dbcon = db.connect(connectstr)

cursor = dbcon.cursor()

cursor.execute(ip2country_create_table_ddl)

dbcon.commit()

dbcon.close()
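For readers without IronPython to hand, the same connect, cursor, execute, commit sequence can be tried under CPython. The sketch below assumes CPython 3's standard sqlite3 module and an in-memory database rather than the fepy wrappers; the helper names create_table and table_names are made up for the example.

```python
import sqlite3

# Same table layout as the SQLite branch of the example above
DDL = """
CREATE TABLE ip2country (
    ipfrom INTEGER,
    ipto INTEGER,
    countrycode2 CHAR(2),
    countrycode3 CHAR(3),
    countryname VARCHAR(50),
    PRIMARY KEY (ipfrom, ipto)
)
"""


def create_table(dbcon):
    """Run the DDL through a DB-API cursor and commit explicitly."""
    cursor = dbcon.cursor()
    cursor.execute(DDL)
    dbcon.commit()


def table_names(dbcon):
    """List the tables SQLite knows about (SQLite-specific catalogue query)."""
    cursor = dbcon.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [row[0] for row in cursor.fetchall()]
```

Calling create_table(sqlite3.connect(":memory:")) leaves a throwaway database with a single empty ip2country table.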

Load the data example.

Compared to the ADO.Net example, the DB-API allows the IronPython code to be simpler as it handles the creation of Parameters. (Note to self: MS Access, bulk inserts and transactions == very slow)
dbcon = db.connect(connectstr)

import re
re_csv = re.compile(',(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))')

cursor = dbcon.cursor()
insert_statement= '''
INSERT INTO ip2country (
    ipfrom, ipto, countrycode2, countrycode3, countryname
) VALUES ( ?,?,?,?,? )
'''

f = open("ip-to-country.csv")
print "Loading..."
for line in f.readlines():
    if line.endswith("\r\n"):
        line = line[:-2] # remove \r\n
    else:
        line = line[:-1] # just remove \n
    print line
    ipf, ipt, cc2, cc3, cn = re_csv.split(line)
    # Strip the surrounding double quotes from each field before inserting
    cursor.execute(insert_statement,(ipf[1:-1],ipt[1:-1],cc2[1:-1],cc3[1:-1],cn[1:-1]))
# Commit once, after all rows have been inserted
dbcon.commit()

f.close()
dbcon.close()
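A variation on the load step, sketched for CPython 3 with the standard csv and sqlite3 modules instead of the regular expression and the fepy wrappers. The sample lines are the three rows of ip-to-country data quoted in the first post of this series; load_rows is a name invented for the example, and executemany is used on the assumption that batching the inserts is acceptable.

```python
import csv
import sqlite3

DDL = """
CREATE TABLE ip2country (
    ipfrom INTEGER, ipto INTEGER,
    countrycode2 CHAR(2), countrycode3 CHAR(3), countryname VARCHAR(50),
    PRIMARY KEY (ipfrom, ipto)
)
"""

INSERT = """
INSERT INTO ip2country (
    ipfrom, ipto, countrycode2, countrycode3, countryname
) VALUES (?, ?, ?, ?, ?)
"""

SAMPLE_LINES = [
    '"33996344","33996351","GB","GBR","UNITED KINGDOM"',
    '"50331648","69956103","US","USA","UNITED STATES"',
    '"69956104","69956111","BM","BMU","BERMUDA"',
]


def load_rows(dbcon, lines):
    """Parse quoted CSV lines and insert them in a single transaction."""
    rows = []
    for ipf, ipt, cc2, cc3, cn in csv.reader(lines):
        rows.append((int(ipf), int(ipt), cc2, cc3, cn))
    cursor = dbcon.cursor()
    cursor.executemany(INSERT, rows)
    dbcon.commit()
    return len(rows)
```

The csv module removes the quotes itself, so the [1:-1] slicing is not needed in this version.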

Select some data example.

Instead of using ExecuteReader as in the ADO.Net example, the fetchone method of the cursor instance is used to get the query result.
import sys

def ip2number(ipaddress):
    '''
    Convert dotted IP address to number
    '''
    A,B,C,D = ipaddress.split(".")
    return (int(A) * 16777216) + (int(B) * 65536) + (int(C) * 256) + int(D)

dbcon = db.connect(connectstr)

cursor = dbcon.cursor()

try:
    ipaddress = sys.argv[1]
    # Convert dotted ip address to number
    ipnumber = ip2number(ipaddress)
except (IndexError, ValueError):
    # No argument given, or the argument is not a dotted IP address
    print "Error - An IP Address is required"
    sys.exit(1)

select_statement = '''SELECT * FROM ip2country
WHERE ipfrom <= %s AND ipto >= %s
''' % (ipnumber, ipnumber)

cursor.execute(select_statement)

row = cursor.fetchone()
print "The location of IP address %s is %s." % (ipaddress, row[4])

dbcon.close()
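The lookup can also bind the IP number as a query parameter rather than formatting it into the SQL string. The sketch below assumes CPython 3's sqlite3 module and its "?" placeholder style (the same style the insert statement uses); find_country is a hypothetical helper, and it returns None when no range matches instead of failing on row[4].

```python
import sqlite3


def ip2number(ipaddress):
    """Convert a dotted IPv4 address to the number used in ip2country."""
    a, b, c, d = [int(part) for part in ipaddress.split(".")]
    return a * 16777216 + b * 65536 + c * 256 + d


def find_country(dbcon, ipaddress):
    """Return the country name for an address, or None if no range matches."""
    ipnumber = ip2number(ipaddress)
    cursor = dbcon.cursor()
    cursor.execute(
        "SELECT countryname FROM ip2country WHERE ipfrom <= ? AND ipto >= ?",
        (ipnumber, ipnumber),
    )
    row = cursor.fetchone()
    if row is None:
        return None
    return row[0]
```

For instance, 2.6.190.56 converts to 33996344, which is the start of the UNITED KINGDOM range in the sample data.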

Friday, October 27, 2006

IronPython Community Edition R3 Released.

Yesterday Seo Sanghyeon announced that the third release of IPCE was available. This release uses the current IronPython stable version 1.0.1 and the modules of the CPython 2.4.4 standard library that are known to work under IronPython. Also Sanghyeon has been busy creating additional CPython-compatible wrappers which are included with this release:
  • zlib, using System.IO.Compression.
  • hashlib, using System.Security.Cryptography
  • sqlite3, using generic DB-API module
I think another important change is the move to an MIT license. There had been some negative discussion about IPCE's previous WTF license, so hopefully this change means more people will consider using IPCE. So if you want a version of IronPython that has "batteries included", give IPCE a go.

You can download it from here or here.

Wednesday, October 18, 2006

Saturday, October 14, 2006

Adding .pth file support to IronPython

In a blog post I read, there was the following statement:

'Damnit. Apparently IronPython doesn’t support .pth files. I’m not sure if I should expect this or not from a “1.0” product, but it’s sure annoying since it seems most libraries use them.'

For me, it is an issue because many of the libraries I want to use come as Python Eggs, and the easy-install.pth file is required if you want them to work. So I decided to see what was needed to add .pth support to IronPython. In the end it was not too hard, as the logic that adds the contents of the .pth files to sys.path is included in CPython's site.py. It needed a little modification, but if you add this code to IronPython's site.py, .pth files work.
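To give a feel for what that logic does, here is a simplified sketch of .pth handling written for CPython 3. It is not the code from site.py or the modified version mentioned above: add_pth_file is a made-up name, and unlike CPython this sketch skips lines starting with "import" rather than executing them.

```python
import os
import sys


def add_pth_file(sitedir, name, path_list=None):
    """Append the directories listed in one .pth file to a path list.

    Returns the entries that were added. Simplified illustration only.
    """
    if path_list is None:
        path_list = sys.path
    added = []
    with open(os.path.join(sitedir, name)) as f:
        for line in f:
            line = line.rstrip()
            if not line or line.startswith("#"):
                continue  # blank lines and comments are ignored
            if line.startswith(("import ", "import\t")):
                continue  # CPython's site.py executes these; this sketch does not
            candidate = os.path.abspath(os.path.join(sitedir, line))
            # Only existing directories that are not already listed are added
            if os.path.isdir(candidate) and candidate not in path_list:
                path_list.append(candidate)
                added.append(candidate)
    return added
```

A site.py replacement would call something like this for every file ending in .pth found in the site directory.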

Sunday, October 08, 2006

Now running on Blogger Beta

Yesterday I moved this blog to run under the Blogger Beta. I had been running a test blog under it for the last month and had suffered no show-stopping issues, so I made the decision to move yesterday and am now regretting it. The one thing I didn't test was the Atom feed with the various feed readers that subscribe to my blog. It would appear that pyblagg (the site most of my readers come via) doesn't see that the feed has been updated, so any posts created since the move are not being displayed. I do not have time to investigate at the moment, so I have created a feed using Feedburner in the hope that pyblagg can use this.

Using IronPython 1.0.1's new community-written built-in module support

In a previous post I mentioned the support for community-written built-in modules provided with IronPython 1.0.1. I remembered that Kevin Chu had posted a C# replacement for the CPython md5 library on the mailing list for inclusion in the IronPython core. His code is a perfect candidate for using the new external module loading support. So if you compile the code below as a library assembly

using System;
using System.Collections.Generic;
using System.Text;
using IronPython.Runtime;
using System.Security.Cryptography;
using System.Runtime.InteropServices;

[assembly: PythonModule("md5", typeof(IronPythonCommunity.Modules.PythonMD5))]
namespace IronPythonCommunity.Modules
{
    [PythonType("md5")]
    public class PythonMD5
    {
        private MD5 _provider;
        private readonly Encoding raw = Encoding.GetEncoding("iso-8859-1");
        private byte[] empty;

        public PythonMD5()
        {
            _provider = MD5.Create();
            empty = raw.GetBytes("");
        }

        public PythonMD5(string arg)
            : this()
        {
            this.Update(arg);
        }

        internal PythonMD5(MD5 provider)
            : this()
        {
            _provider = provider;
        }

        [PythonName("new")]
        public static PythonMD5
            PythonNew([DefaultParameterValue(null)] string arg)
        {
            PythonMD5 obj;
            if (arg == null)
                obj = new PythonMD5();
            else
                obj = new PythonMD5(arg);
            return obj;
        }

        [PythonName("md5")]
        public static PythonMD5
            PythonNew2([DefaultParameterValue(null)] string arg)
        {
            return PythonMD5.PythonNew(arg);
        }

        [PythonName("update")]
        public void Update(string arg)
        {
            byte[] bytes = raw.GetBytes(arg);
            _provider.TransformBlock(bytes, 0, bytes.Length, bytes, 0);
        }

        [PythonName("digest")]
        public string Digest()
        {
            _provider.TransformFinalBlock(empty, 0, 0);
            return raw.GetString(_provider.Hash);
        }

        [PythonName("hexdigest")]
        public string HexDigest()
        {
            _provider.TransformFinalBlock(empty, 0, 0);
            string hexString = "";
            // Iterate over the computed hash, not the empty input buffer
            foreach (byte b in _provider.Hash)
            {
                // Lowercase hex, matching CPython's hexdigest()
                hexString += b.ToString("x2");
            }
            return hexString;
        }

        [PythonName("copy")]
        public PythonMD5 Clone()
        {
            PythonMD5 obj = new PythonMD5(this._provider);
            return obj;
        }
    }
}
and create a DLLs directory in your IronPython 1.0.1 installation and copy the compiled assembly to it, then when you start a new IronPython console you will be able to import md5 and use it like the CPython version.
>>> import md5
>>> md5
<module 'md5' (built-in)>
>>> m = md5.new()
>>> m.update("Nobody inspects")
>>> m.update(" the spammish repetition")
>>> m.digest()
u'\xbbd\x9c\x83\xdd\x1e\xa5\xc9\xd9\xde\xc9\xa1\x8d\xf0\xff\xe9'
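For comparison, the same digest can be computed under CPython 3 with the standard hashlib module. This is only a cross-check sketch: md5_of_parts is a made-up helper, and it encodes the strings as iso-8859-1 to mirror the encoding used in the C# code.

```python
import hashlib


def md5_of_parts(*parts):
    """Feed several text fragments into one MD5 object and return it."""
    m = hashlib.md5()
    for part in parts:
        # hashlib works on bytes, so encode the same way the C# module does
        m.update(part.encode("iso-8859-1"))
    return m
```

md5_of_parts("Nobody inspects", " the spammish repetition").digest() gives the same sixteen bytes as the console session above, and hexdigest() gives them as bb649c83dd1ea5c9d9dec9a18df0ffe9.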


Mind you, if you compare the code for the C# implementation with Seo Sanghyeon's md5.py, you can see why using IronPython with .NET or Mono makes a programmer's life easier.

Friday, October 06, 2006

Latest IronPython Releases

IronPython 1.0.1 released

IronPython 1.0.1 was released today. Apart from some minor bug fixes, it includes a new feature described in the release email as follows:

"The new support for community written built-in modules enables loading the .NET DLLs on startup and adding them to the built-in module list. This feature was implemented by updating site.py to check for a "DLLs" directory and looking for the PythonModuleAttribute point to an assembly. Now users can create built-in modules by simply adding this attribute to their assembly and re-distributing only the new assembly which the user can add to their DLLs directory."

The IronPython team say it is a small feature, but I believe it has greater implications. There has been much discussion on the mailing list about Microsoft not accepting community contributed code and when will missing CPython standard library module X that relies on a C extension be in the distribution. Now the community has another way to provide them.

IronPython Community Edition (IPCE)

A couple of weeks ago Seo Sanghyeon, the busiest member of the IronPython community, released a distribution of IronPython compiled under Mono. He has called this the IronPython Community Edition. As well as having a number of patches applied to the IronPython source that make it work better under Mono, it comes with a substantial portion of the CPython standard library known to work under IronPython, CPython-compatible wrappers for .NET libraries (md5, pyexpat, select, sha, socket, ssl and unicodedata), and the third-party libraries BeautifulSoup and ElementTree. He has also created a SourceForge project for his IronPython modules and IPCE: http://fepy.sourceforge.net/

Tuesday, September 19, 2006

Deploying the GDATA Reader as an executable revisited

In a previous post, I described how to create an executable with IronPython and provided a simple script to do it. I see that the IronPython team have released their own script Pyc to do the job. It can be found on the IronPython samples page.

Wednesday, September 06, 2006

IronPython 1.0 Final Released Today

Jim Hugunin announced today that the Microsoft Dynamic Languages team have released IronPython 1.0. The release includes some new samples as well.

Thanks to Jim and his team for a great product. I never thought I would say this, thanks Microsoft ;-)

Tuesday, September 05, 2006

IronPython and ADO.NET Part 1

This is the first in a series of posts about database access with IronPython and ADO.NET. This post will discuss connecting to the database and executing basic DDL and SQL statements. So that the examples can run on Windows and non-Windows systems, they will support either SQLite3 via the Mono.Data.SqliteClient ADO provider or Microsoft Access via the System.Data.Odbc provider.

Firstly we need some data to play with. A number of Python web frameworks have been adding support for displaying the flag of the country against weblog comments. The country is identified from the remote IP address of the user's browser. So we will use IronPython to create a table and load it with the data from the ip-to-country CSV file, which can be downloaded from here. The full source of the IronPython script is here.

Creating a table


The first section of code references and imports the assemblies required for connecting to and accessing the database. So that the script can support either SQLite or Access, it tries to reference and import the SQLite ADO provider first; if this fails, it then attempts to import the ODBC provider. The script uses an import alias so that we can refer to the database connection class by the same name, independent of which ADO provider is being used. The database-specific connection strings and table creation statements are also defined here.

import clr
import System
clr.AddReference("System.Data")
import System.Data
try:
    clr.AddReference("Mono.Data.SqliteClient")
    from Mono.Data.SqliteClient import SqliteConnection as dbconnection
    connectstr = 'URI=file:ip2country.db,version=3'
    ip2country_create_table_ddl = '''
    CREATE TABLE ip2country (
        ipfrom INTEGER,
        ipto INTEGER,
        countrycode2 CHAR(2),
        countrycode3 CHAR(3),
        countryname VARCHAR(50),
        PRIMARY KEY (ipfrom,ipto)
    )
    '''
except:
    from System.Data.Odbc import OdbcConnection as dbconnection
    connectstr = 'DSN=ip2country'
    ip2country_create_table_ddl = '''
    CREATE TABLE ip2country (
        ipfrom DOUBLE,
        ipto DOUBLE,
        countrycode2 CHAR(2),
        countrycode3 CHAR(3),
        countryname VARCHAR(50),
        CONSTRAINT ip2country_pk PRIMARY KEY (ipfrom,ipto)
    )
    '''
The last section of the script creates the connection object, opens the connection and creates the table ip2country using the database-specific DDL.

dbcon = dbconnection(connectstr)

dbcon.Open()

dbcmd = dbcon.CreateCommand()
dbcmd.CommandText = ip2country_create_table_ddl

dbcmd.ExecuteNonQuery()

dbcon.Close()
So once you run the script, you will either have an SQLite database ip2country.db or Access database ip2country.mdb with a single empty table called ip2country.

Load the data


The full source of the IronPython script for loading the CSV data can be found here. We use the same code from the previous script to setup access to the data. The ip-to-country.csv file contains data of the following format:

"33996344","33996351","GB","GBR","UNITED KINGDOM"
"50331648","69956103","US","USA","UNITED STATES"
"69956104","69956111","BM","BMU","BERMUDA"
To parse each line of the csv file, a regular expression is used.
import re
re_csv = re.compile(',(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))')
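To see what the regular expression does, the sketch below (written for CPython 3) wraps it in a small helper and strips the surrounding quotes from each field. The name split_csv_line is made up for the example; the pattern is the same one as above, written as a raw string. It matches only commas that are followed by an even number of double quotes, which means commas inside a quoted field are left alone.

```python
import re

# Split on commas that are followed by an even number of double quotes,
# i.e. commas that sit outside any quoted field
RE_CSV = re.compile(r',(?=(?:[^"]*"[^"]*")*(?![^"]*"))')


def split_csv_line(line):
    """Split one quoted CSV line and drop the quotes around each field."""
    fields = RE_CSV.split(line.rstrip("\r\n"))
    return [field[1:-1] for field in fields]
```

A country name that itself contains a comma, such as "KOREA, REPUBLIC OF", stays in one piece because the comma inside the quotes is followed by an odd number of quote characters.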
Rather than dynamically creating the SQL insert statement using string concatenation, the script uses variable placeholders in the insert statement and data parameters. In theory this should be more efficient, with the database only needing to prepare the insert statement once and then binding the data parameters on each insert, but I am not sure whether SQLite or Access does this type of optimisation. At least it means there will be no problems with country names like COTE D'IVOIRE that contain quotes.

dbcmd = dbcon.CreateCommand()
insert_statement= '''
INSERT INTO ip2country (
ipfrom, ipto, countrycode2, countrycode3, countryname
) VALUES ( ?,?,?,?,? )
'''
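# Create empty parameters for insert and attach to db command.
# dbparam is assumed to be an import alias for the provider's parameter
# class, set up in the same way as dbconnection in the first script.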
p1 = dbparam()
dbcmd.Parameters.Add(p1)
p2 = dbparam()
dbcmd.Parameters.Add(p2)
p3 = dbparam()
dbcmd.Parameters.Add(p3)
p4 = dbparam()
dbcmd.Parameters.Add(p4)
p5 = dbparam()
dbcmd.Parameters.Add(p5)

dbcmd.CommandText = insert_statement

Next the script opens the csv file and reads it line by line. After removing the line separator(s), the line is split into individual fields using the compiled regular expression. The value of each field is then assigned to the insert parameter with the delimiting double quotes removed. And the data is inserted into the ip2country table by calling the ExecuteNonQuery Method.
f = open("ip-to-country.csv")
print "Loading..."
for line in f.readlines():
    if line.endswith("\r\n"):
        line = line[:-2] # running on a posix platform so remove \r\n
    else:
        line = line[:-1] # must be windows, just remove \n
    print line
    ipf, ipt, cc2, cc3, cn = re_csv.split(line)
    p1.Value = ipf[1:-1]
    p2.Value = ipt[1:-1]
    p3.Value = cc2[1:-1]
    p4.Value = cc3[1:-1]
    p5.Value = cn[1:-1]
    dbcmd.ExecuteNonQuery()

f.close()
dbcon.Close()


To run the script the ip-to-country.csv file must be in the current directory, and since it contains 65,000+ lines of data, it will take a while to run.
ipy.exe load_op2country.py

Select some data


Now the ip2country table should contain some data we can query. Let's create a simple script that, when passed an IP address, prints the country where that address is located. The source of the script can be found here.
To find the location, the script first converts the IP address to the numeric equivalent used in the ip2country data using the function ip2number. A SQL select statement is defined using the numeric IP address as the bounds for the where clause. Then a reader is created by calling ExecuteReader and the results are processed in a while loop.
dbcon.Open()

dbcmd = dbcon.CreateCommand()

try:
    ipaddress = sys.argv[1]
    # Convert dotted ip address to number
    ipnumber = ip2number(ipaddress)
except:
    print "Error - An IP Address is required"
    sys.exit(1)


dbcmd.CommandText = '''
SELECT * FROM ip2country
WHERE ipfrom <= %s
AND ipto >= %s
''' % (ipnumber, ipnumber)


reader = dbcmd.ExecuteReader()

while reader.Read():
    print "The location of IP address %s is %s." % (ipaddress, reader[4])

reader.Close()
dbcon.Close()

The country name is accessed from the result row by column number. I would prefer to get the value of the column via its name, e.g.
print "The location of IP address %s is %s." % (ipaddress, row['countryname'])
and this problem has been addressed by Greg Stein's dtuple Python module. You will find a version of the find location script that uses dtuple here.
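The general idea of access by column name can be sketched without dtuple by pairing each row with the column names that a DB-API cursor exposes through its description attribute. The example below is not dtuple: it assumes CPython 3 and the standard sqlite3 module, and fetch_named is a name invented for the illustration.

```python
import sqlite3


def fetch_named(cursor):
    """Fetch one row as a dict keyed by column name, or None when exhausted."""
    row = cursor.fetchone()
    if row is None:
        return None
    # The first item of each description entry is the column name
    names = [column[0] for column in cursor.description]
    return dict(zip(names, row))
```

With this helper the print statement can use row['countryname'] instead of a positional index.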
Hopefully this post has given you some insight into how to use IronPython with ADO.Net.

Saturday, September 02, 2006

Serving a Pylons App with ISAPI-WSGI

David Primmer has put together a great how-to on running a Pylons app with ISAPI-WSGI

http://pylonshq.com/project/pylonshq/wiki/ServePylonsWithIIS

And since Pylons uses Paste, it is a good how-to for running any Paste app under IIS.

And just out of interest, is anyone other than David and myself using isapi-wsgi?