Saturday, October 28, 2006

IronPython and ADO.NET Part 2

This is the second in a series of posts about database access with IronPython and ADO.NET. This post will discuss connecting to the database and executing basic DDL and SQL statements via the Python DB-API instead of directly accessing ADO.Net. So that the examples can run on Windows and non Windows systems, they will support either SQLite3 via the Mono.Data.SQLiteCilent ADO provider or Microsoft Access via the System.Data.Odbc provider.

The Python DB-API is a specification created by the Python Database SIG for a consistent interface to relational databases. For CPython there is at least one DB-API compliant library for most of the relational database engines that are used today. As part of his fepy project, Seo Sanghyeon has created a set of wrappers that provide DB-API support for MySQL, PostgreSql, SQLite, Microsoft SQL Server and ODBC ADO.NET database drivers. Since one of the features of ADO.Net is to also provide a consistent interface to relational databases, you may question why do you need to use another layer for database access with IronPython. The answer is simple, the DB-API allows you to focus on the actual access and manipulation of the data and hides the low-level ADO.Net setup and management code. To show how the DB-API can simpify your IronPython code, the examples from the first post of this series have been modified to use the DB-API.

If you want try the example or use the DB-API with IronPython you will need to install it. You can either download the modules from here and copy to the IronPython Lib directory or build and/or install the IronPython Community Edition which includes the DB-API.

Creating a table example

The first section of code is only required so the examples will work on either *ix or windows platforms. In normal usage, there is no need to import dbapi.py directly, just import the DB-API module for the database you want to use.
import dbapi
try:
import sqlite3 as db
dbapi._load_type(db.assembly,db.typename)
connectstr = 'ip2country.db'
ip2country_create_table_ddl = '''
CREATE TABLE ip2country (
ipfrom INTEGER,
ipto INTEGER,
countrycode2 CHAR(2),
countrycode3 CHAR(3),
countryname VARCHAR(50),
PRIMARY KEY (ipfrom,ipto)
)
'''
except:
import odbc as db
dbapi._load_type(db.assembly,db.typename)
connectstr = 'DSN=ip2country'
ip2country_create_table_ddl = '''
CREATE TABLE ip2country (
ipfrom DOUBLE,
ipto DOUBLE,
countrycode2 CHAR(2),
countrycode3 CHAR(3),
countryname VARCHAR(50),
CONSTRAINT ip2country_pk PRIMARY KEY (ipfrom,ipto)
)
'''

Comparing this code with the ADO.Net example you see that the opening of the database connection, and create a command instance is automatically done by the DB-API module. The Python DB-API PEP specifies implicit transactions that are started automatically and committed or rolled back on demand, so compared to ADO.Net example , a commit is required.
dbcon = db.connect(connectstr)

cursor = dbcon.cursor()

cursor.execute(ip2country_create_table_ddl)

dbcon.commit()

dbcon.close()

Load the data example.

Compared to the ADO.Net example, the DB-API allows the IronPython code to be simpler as it handles the creation of Parameters. (Note to self: MS Access, bulk inserts and transactions == very slow)
dbcon = db.connect(connectstr)

import re
re_csv = re.compile(',(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))')

cursor = dbcon.cursor()
insert_statement= '''
INSERT INTO ip2country (
ipfrom, ipto, countrycode2, countrycode3, countryname
) VALUES ( ?,?,?,?,? )
'''

f = open("ip-to-country.csv")
print "Loading..."
for line in f.readlines():
if line.endswith("\r\n"):
line = line[:-2] # remove \r\n
else:
line = line[:-1] # just remove \n
print line
ipf, ipt, cc2, cc3, cn = re_csv.split(line)
cursor.execute(insert_statement,(ipf[1:-1],ipt[1:-1],cc2[1:-1],cc3[1:-1],cn[1:-1]))
dbcon.commit()

f.close()
dbcon.close()

Select some data example.

Instead of using the ExecuteReader as in the ADO.Net example, the fetch method of the cursor instance is used to get the query result.
def ip2number(ipaddress):
'''
Convert dotted IP address to number
'''
A,B,C,D = ipaddress.split(".")
return (int(A) * 16777216) + (int(B) * 65536) + (int(C) * 256) + int(D)

dbcon = db.connect(connectstr)

cursor = dbcon.cursor()

try:
ipaddress = sys.argv[1]
# Convert dotted ip address to number
ipnumber = ip2number(ipaddress)
except:
print "Error - An IP Address is required"
sys.exit(1)

select_statement = '''SELECT * FROM ip2country
WHERE ipfrom <= %s AND ipto >= %s
''' % (ipnumber, ipnumber)

cursor.execute(select_statement)

row = cursor.fetchone()
print "The location of IP address %s is %s." % (ipaddress, row[4])

dbcon.close()