Tuesday, September 05, 2006

IronPython and ADO.NET Part 1

This is the first in a series of posts about database access with IronPython and ADO.NET. This post will discuss connecting to the database and executing basic DDL and SQL statements. So that the examples can run on Windows and non Windows systems, they will support either SQLite3 via the Mono.Data.SQLiteCilent ADO provider or Microsoft Access via the System.Data.Odbc provider.

Firstly we need some data to play with. A number of Python Web frameworks have been adding support for displaying the flag of the country against weblog comments. The country is identified from the remote IP address of the users browser. So we use IronPython to create a table and load it with the data from the ip-to-country cvs file which can be download from here. The full source of the IronPython script is here.

Creating a table

The first section of code references and imports the assemblies required for connecting and accessing the database. So the script can support either SQLite or Access, it tries to reference and import the SQLite ADO provider first, if this fails it then attempts to import the ODBC provider. The scripts uses an import alias so that we can refer to the database connection method by the same name independant of what ADO provider is being used. Also the database specific connection strings and table creation statements are defined here.

import clr
import System
import System.Data
from Mono.Data.SqliteClient import SqliteConnection as dbconnection
connectstr = 'URI=file:ip2country.db,version=3'
ip2country_create_table_ddl = '''
CREATE TABLE ip2country (
ipfrom INTEGER,
countrycode2 CHAR(2),
countrycode3 CHAR(3),
countryname VARCHAR(50),
PRIMARY KEY (ipfrom,ipto)
from System.Data.Odbc import OdbcConnection as dbconnection
connectstr = 'DSN=ip2country'
ip2country_create_table_ddl = '''
CREATE TABLE ip2country (
ipfrom DOUBLE,
ipto DOUBLE,
countrycode2 CHAR(2),
countrycode3 CHAR(3),
countryname VARCHAR(50),
CONSTRAINT ip2country_pk PRIMARY KEY (ipfrom,ipto)
The last section of the script, connects to the database, opens the connection and creates the table ip2country using the database specific DDL.

dbcon = dbconnection(connectstr)


dbcmd = dbcon.CreateCommand()
dbcmd.CommandText = ip2country_create_table_ddl


So once you run the script, you will either have an SQLite database ip2country.db or Access database ip2country.mdb with a single empty table called ip2country.

Load the data

The full source of the IronPython script for loading the CSV data can be found here. We use the same code from the previous script to setup access to the data. The ip-to-country.csv file contains data of the following format:

"33996344","33996351","GB","GBR","UNITED KINGDOM"
"50331648","69956103","US","USA","UNITED STATES"
To parse each line of the csv file, a regular expression is used.
import re
re_csv = re.compile(',(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))')
Rather than dynamically creating the SQL insert statement using string concatenation, the script uses variable placeholders in the insert statement and data parameters. In theory, this should be more efficent with the database only needing to prepare the insert statement once, and then binding the data parameters on each insert. But not sure if SQLite or Access does this type of optimisation. At least it means there will be no problems with country names like COTE D'IVOIRE that contain quotes.

dbcmd = dbcon.CreateCommand()
insert_statement= '''
INSERT INTO ip2country (
ipfrom, ipto, countrycode2, countrycode3, countryname
) VALUES ( ?,?,?,?,? )
# Create empty parameters for insert and attach to db command
p1 = dbparam()
p2 = dbparam()
p3 = dbparam()
p4 = dbparam()
p5 = dbparam()

dbcmd.CommandText = insert_statement

Next the script opens the csv file and reads it line by line. After removing the line separator(s), the line is split into individual fields using the compiled regular expression. The value of each field is then assigned to the insert parameter with the delimiting double quotes removed. And the data is inserted into the ip2country table by calling the ExecuteNonQuery Method.
f = open("ip-to-country.csv")
print "Loading..."
for line in f.readlines():
if line.endswith("\r\n"):
line = line[:-2] # running on a posix platform so remove \r\n
line = line[:-1] # must be windows, just remove \n
print line
ipf, ipt, cc2, cc3, cn = re_csv.split(line)
p1.Value = ipf[1:-1]
p2.Value = ipt[1:-1]
p3.Value = cc2[1:-1]
p4.Value = cc3[1:-1]
p5.Value = cn[1:-1]


To run the script the ip-to-country.csv file must be in the current directory, and since it contains 65,000+ lines of data, it will take a while to run.
ipy.exe load_op2country.py

Select some data

Now the ip2country table should contain some data we can query. Let's create a simple script that when passed an IP address, it prints the country location of the IP address. The source of the script can be found here.
To find the location, the script first converts the IP address to the numeric equivalent used in the ip2country data using the function ip2number. A SQL select statement is defined using the numeric ip address as the bounds for the where clause. Then an ExecuteReader instance is created and the results processed in a while loop.

dbcmd = dbcon.CreateCommand()

ipaddress = sys.argv[1]
# Convert dotted ip address to number
ipnumber = ip2number(ipaddress)
print "Error - An IP Address is required"

dbcmd.CommandText = '''
SELECT * FROM ip2country
WHERE ipfrom <= %s
AND ipto >= %s
''' % (ipnumber, ipnumber)

reader = dbcmd.ExecuteReader()

while reader.Read():
print "The location of IP address %s is %s." % (ipaddress, reader[4])


The country name is accessed from row result by column number. I would prefer to get the value of the column via it's name e.g.
print "The location of IP address %s is %s." % (ipaddress, row['countryname'])
and this problem has been addressed by a Greg Stein's dtuple Python module. You will find a version of the find location script that uses dtuple here.
Hopefully this post has given you some insight in how to use IronPython with ADO.Net.

1 comment:

Anonymous said...