I have recently yet again come across a need to get country information from IP addresses out of SQL server. I use the GeoIP Country data that is freely available from MaxMind.
The GeoIP data is made available in CSV format with the following fields:
Column | Type | Description |
---|---|---|
1 (FirstIP) | varchar(15) | First IP address (dotted) |
2 (LastIP) | varchar(15) | Last IP address (dotted) |
3 (FirstNum) | unsigned int | First IP address (long) |
4 (LastNum) | unsigned int | Last IP address (long) |
5 (CountryCode) | char(2) | Country code |
6 (CountryName) | varchar(50) | Country name |
As you can see, the IP address ranges are available in dotted or integral (long) formats. Unfortunately, SQL server doesn’t allow us to create unsigned columns, so we need to convert either from unsigned to signed, or from dotted to long. Since we will presumably need to convert dotted to long in the future, we may as well go that route. For now, just create tables like so:
Countries | ||
---|---|---|
Column | Type | Comments |
CountryCode | char(2) not null | Primary Key |
CountryName | varchar(50) not null |
GeoIP | ||
---|---|---|
Column | Type | Comments |
FirstNum | int not null | |
LastNum | int not null | |
CountryCode | char(2) not null | Foreign key: Countries.CountryCode |
Import the CSV data into a temporary table, which I assume you’ll call GeoIP_temp. You can then fill your Countries table like so:
INSERT INTO Countries (CountryCode, CountryName) SELECT DISTINCT CountryCode, CountryName FROM GeoIP_temp
Now let’s look at getting the IP addresses into the system in a format that we can use, by converting the dotted addresses to longs. To do this, we’ll make a User-Defined Function (UDF) in SQL server. The original version of this code doesn’t actually work, so I’ve modified it a bit.
CREATE FUNCTION dbo.NumFromIP ( @ip VARCHAR(15) ) RETURNS INT AS BEGIN DECLARE @rv BIGINT, @o1 BIGINT, @o2 BIGINT, @o3 BIGINT, @o4 BIGINT SELECT @o1 = CONVERT(BIGINT, PARSENAME(@ip, 4)), @o2 = CONVERT(BIGINT, PARSENAME(@ip, 3)), @o3 = CONVERT(BIGINT, PARSENAME(@ip, 2)), @o4 = CONVERT(BIGINT, PARSENAME(@ip, 1)) IF (@o1 BETWEEN 0 AND 255) AND (@o2 BETWEEN 0 AND 255) AND (@o3 BETWEEN 0 AND 255) AND (@o4 BETWEEN 0 AND 255) BEGIN SET @rv = (@o1 * 16777216) + (@o2 * 65536) + (@o3 * 256) + (@o4) IF (@rv > 2147483647) BEGIN SET @rv = @rv - 4294967296 END END ELSE SET @rv = -1 RETURN CONVERT(INT,@rv) END
That last IF statement is what does the conversion from unsigned to signed and allows us to use a regular int instead of a bigint.
The inverse of this function, to convert back from long to dotted, is here:
CREATE FUNCTION dbo.IPFromNum ( @ip INT ) RETURNS VARCHAR(15) AS BEGIN DECLARE @o1 INT, @o2 INT, @o3 INT, @o4 INT, @bip BIGINT SET @bip = @ip IF @bip < 0 SET @bip = @bip + 4294967296 IF @bip > 4294967295 RETURN '255.255.255.255' IF @bip < 0 RETURN '0.0.0.0' SET @o1 = @bip / 16777216 SET @bip = @bip % 16777216 SET @o2 = @bip / 65536 SET @bip = @bip % 65536 SET @o3 = @bip / 256 SET @bip = @bip % 256 SET @o4 = @bip RETURN CONVERT(VARCHAR(4), @o1) + '.' + CONVERT(VARCHAR(4), @o2) + '.' + CONVERT(VARCHAR(4), @o3) + '.' + CONVERT(VARCHAR(4), @o4) END
You will then need to transfer the data into your GeoIP table, like so:
INSERT INTO GeoIP (FirstNum, LastNum, CountryCode) SELECT dbo.NumFromIP(FirstIP), dbo.NumFromIP(LastIP), CountryCode FROM GeoIP_temp
You now have everything you need to do basic GeoIP lookup, like so:
SELECT c.CountryName, COUNT(*) AS HowMany FROM Countries AS c INNER JOIN GeoIP AS g ON (c.CountryCode = g.CountryCode) INNER JOIN Logs AS l ON (dbo.NumFromIP(ip) BETWEEN g.FirstNum AND g.LastNum) GROUP BY c.CountryName ORDER BY 2 DESC
However, this query can be a rather lengthy process, as BETWEEN operations are basically unindexable table scans and take quite some time to run. Instead, you may want to modify your table to automagically lookup and store the country information as the IP address is inserted or updated. This little bit of denormalization is handled perfectly with a small trigger:
CREATE TRIGGER [ip_fix] ON dbo.Logs AFTER INSERT, UPDATE AS SET NOCOUNT ON IF UPDATE(ip) BEGIN UPDATE Logs SET CountryCode = ( SELECT CountryCode FROM GeoIP WHERE dbo.NumFromIP(Logs.ip) BETWEEN FirstNum AND LastNum ) WHERE (id IN (SELECT id FROM inserted)) END SET NOCOUNT OFF
This presumes, of course, that your Logs table has a primary key called id and that you’ve added in the CountryCode field. This trigger will slow down inserts and updates slightly, so you may instead choose to update the country codes asynchronously (nightly, hourly, whatever) if you care more about update speed than query speed. And don’t forget to index your CountryCode field in any tables you add it to!