Archive

Programming

Recently I needed to do some data processing on a pile of CSV files exported from an Access database. Since the relational structure was fairly complex, I decided not to try to stitch it together with UNIX commands but rather to import the whole lot into a sqlite3 database on which to run queries.

Here’s the Python code, which is fairly straightforward. The only slightly exciting thing is that it detects columns whose names end in “_id” and adds an index for them, so that when you start doing join queries they’re not painfully slow.

The script traverses a directory full of CSV files. It takes two arguments; run it like so:

python csv2sqlite.py nameofdb directory

 

from __future__ import print_function
import sqlite3
import csv
import os
import glob
import sys

db = sys.argv[1]

conn = sqlite3.connect(db)
conn.text_factory = str  # allows utf-8 data to be stored

c = conn.cursor()

# traverse the directory and process each .csv file
for csvfile in glob.glob(os.path.join(sys.argv[2], "*.csv")):
    # remove the path and extension and use what's left as a table name
    tablename = os.path.splitext(os.path.basename(csvfile))[0]

    with open(csvfile, "rb") as f:
        reader = csv.reader(f)

        header = True
        for row in reader:
            if header:
                # gather column names from the first row of the csv
                header = False

                sql = "DROP TABLE IF EXISTS %s" % tablename
                c.execute(sql)
                sql = "CREATE TABLE %s (%s)" % (tablename,
                          ", ".join([ "%s text" % column for column in row ]))
                c.execute(sql)

                for column in row:
                    if column.lower().endswith("_id"):
                        index = "%s__%s" % ( tablename, column )
                        sql = "CREATE INDEX %s on %s (%s)" % ( index, tablename, column )
                        c.execute(sql)

                insertsql = "INSERT INTO %s VALUES (%s)" % (tablename,
                            ", ".join([ "?" for column in row ]))

                rowlen = len(row)
            else:
                # skip lines that don't have the right number of columns
                if len(row) == rowlen:
                    c.execute(insertsql, row)

        conn.commit()

c.close()
conn.close()
Advertisements

This is one of those ones where you only realise the answer is one line in the documentation after an hour or two of fruitless googling…

I was using ctypes to access a C function which returns a pointer to a buffer and the size of the buffer, along the lines of:

char *get_data (size_t *len);

Mapping this into Python wasn’t hard:

get_data = lib["get_data"]
get_data.restype = POINTER(c_char)
get_data.argtypes = [ POINTER(c_size_t) ]

...

len = c_size_t()
data = get_data(byref(len))

(Note that you don’t use c_char_p as the return type, because that’s used for NULL-terminated strings, whereas our function returns binary data which may include NULLs.)

I then wanted to write the data to a file. However, a Python File object’s .write() method takes a string (or bytes object), so I can’t pass it ‘data’ as is (because ctypes doesn’t know how to cast it to a string, which isn’t unreasonable given that it also doesn’t know how long it is).

So the question is: how do you cast a pointer to data to a Python string?

The answer is to use the ctypes string_at() function:

len = c_size_t()
data = get_data(byref(len))

outfile.write(string_at(data, len))

I’m not 100% sure whether this copies the buffer or simply shares the pointer, but my understanding from various forum posts out there is that it is the latter, and so much more efficient than converting the data to a string by iterating over the bytes and constructing the string.

10 or so years ago I worked on a CD-ROM product which continues to sell from time to time. At the time, we used a product called PC-Install to make the installer; since then we’ve moved on to Inno Setup.

Unfortunately, although our software was 32-bit, the installer produced by PC-Install was 16-bit, and some flavours of recent versions of Windows no longer run 16-bit software. So although the application could run, there was no way for users to install it.

So I set about using Inno Setup to create an installer which would work against the existing stock of CD-ROMs (binning them and issuing a new version isn’t financially viable).

The basic idea was for the installer to detect the CD-ROM, copy the files across, and set up the icons. Obviously, I couldn’t rely on the D: drive always being the optical drive, so I’d need to detect it dynamically. This stackoverflow post links to the code I used and adapted (it doesn’t work out of the box). It implements a function to discover the optical drive letter, and then uses a scripted constant to invoke it for use as the file source.

So, I got the function working, and then used it in my Source line:

Source: "{code:GetCdromDrive}\*.*"; DestDir: "{app}"; Flags: recursesubdirs;

However, when compiling/running this, I got the error:

Unknown filename prefix "{code:"

suggesting one couldn’t use scripted constants in Source lines. After a bit of fruitless experimentation and googling though I returned to the original example code and spotted I’d missed one of the Flags: ‘external’. This turns out to be the key:

Source: "{code:GetCdromDrive}\*.*"; DestDir: "{app}"; Flags: recursesubdirs external;

Apparently the source directory is ordinarily determined at compile time, and the ‘external’ keyword defers it to run time. (This is hardly obvious from the documentation though.)

I’ve been banging my head for a couple of hours trying to work out why W3C geolocation was working in Firefox but not IE9. Using the F12 developer tools I was able to step through the javascript and see that it was calling getCurrentPosition(), but nothing would happen at that point – neither the success nor error handlers were called.

Various internet suggestions I tried before finally arriving at the answer:

  • Using window.navigator.geolocation rather than just navigator.geolocation. You should do this (or find a toolkit that abstracts it for you – jquery doesn’t seem to).
  • Temporarily changing my DOCTYPE to the HTML5 declaration <!doctype html>. The Microsoft geolocation pages say that geolocation is only supported in standards mode, and that using the HTML5 doctype is the easiest way to do this. However, I could see from F12 developer tools that I was already in IE9 standards mode.
  • Changing the IE9 privacy settings. There’s a checkbox to prevent sites from requesting your location; it wasn’t checked, but there’s a “Clear sites” button next to it so I gave that a go. No dice.
  • Combing through the security settings to see if there was a magic radio button. Nope.

Finally, I created a minimal page on my desktop with just a hyperlink to invoke getCurrentPosition(). That worked, so I assumed there was something subtly broken in my real page. So, I saved the source to the desktop so I could start butchering it to find the problem. However, when I loaded the unmodified page in IE9, it worked.

So I figured it was something to do with the page’s URL. It was being served on an intranet server, so I used the old trick of using a qualified domain (http://foo.internal rather than http://foo) to trick IE into thinking it was an internet site (it seems that dots in the domain are how IE decides). Lo and behold, that did it, and geolocation started working.

However, that wasn’t the end of the story. It turns out that when IE9 prompts you to ask whether you want to share your location, if you click the ‘X’ close button rather than making a selection, it will no longer prompt you thereafter. Restarting the browser clears it. I then went back to my unqualified domain, and that fixed that too.

TL;DR: try restarting your browser.

 

 

If you’ve used jquery, you’ll be used to attaching click events etc to items:

// for <a class="delete" href="{url}">, get user confirmation
// before navigating off to {url}

$('a.delete').click(function () {
    return confirm('Really?');
});

Sometimes though, the items you want to attach behaviours to don’t exist yet (e.g. because they’re created and destroyed dynamically in response to UI interaction).

You could attach the behaviour at the point you create the new element, but if you’re doing that in several places, you’ll have the same code in various places. Plus, your implementation of the behaviour for one kind of element will be buried inside the implementation of the behaviour of some other kind of element (or refactored into a function in some more public scope than it ought to be).

It turns out that jquery has a method .on() which allows you to attach event handlers to elements that may or may not yet exist:

$('body').on('click', 'a.delete', function () {    return confirm('Really?');});

The documentation explains how this works (attaching the handler to some parent element — the body element in the above example — and using browser event bubbling to catch events originating on the selected children).

It’s quite nice, you end up with quite a clean pattern whereby you create elements with a CSS class for which you have previously defined the behaviour using .on().

I was processing a UTF-8 dataset where the performance copyright symbol (Unicode code point 0x2117) had been entered as ASCII “(P)”. I wanted to replace this sequence with the actual symbol, but I didn’t know how to create an arbitrary UTF-8 character. After a bit of googling (more than 5 mins, otherwise I wouldn’t be posting!) here’s the answer:

txt = re.sub(r'[(]P[)]', unichr(0x2117).encode('utf-8'), txt)

Python’s ctypes library lets you load a dynamically-loaded C library (.so, .dll, .dylib depending on platform) and access the symbols within (call functions, etc).

The documentation is quite good, but here’s something that I didn’t immediately find an answer to.

Suppose you have a C structure, and a function that returns a pointer to such a structure:

struct foo {
    int bar;
};

struct foo *newfoo () {...}

In Python, you want to access the members of the struct foo returned by a function. How do you do it?

First, you need to define a Python class that represents struct foo:

class Foo (ctypes.Structure):
    _fields_ = [ ("bar", c_int) ]

Next, you need to tell ctypes that newfoo() returns a pointer to such a thing:

mylib.newfoo.restype = POINTER(Foo)

Finally, after invoking newfoo(), you need to use the pointer’s contents member to dereference it:

p = mylib.newfoo()
p.contents.bar = 17
%d bloggers like this: