@@ -0,0 +1,94 @@
+
+mod_asn looks up the AS and network prefix of an IP address.
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+mod_asn is an Apache module that looks up the autonomous system (AS) and the
+network prefix that an IP address is contained in.
+
+It is written with scalability in mind. For high-speed lookups, it stores the
+network prefixes in the PostgreSQL ip4r data type, which can be indexed with a
+Patricia Trie algorithm.
+
+It comes with a script to create such a database and update it with snapshots
+of a router's "view of the world".
+
+The module sets the looked-up data as variables in Apache's env table, where
+other Apache modules can use it or where it can be logged. It can also send
+the data to the client as response headers.
+
+
+Example HTTP response headers:
+
+HTTP/1.1 200 OK
+Date: Thu, 12 Feb 2009 23:24:33 GMT
+Server: Apache/2.2.11 (Linux/SUSE)
+X-Prefix: 83.133.0.0/16
+X-AS: 13237
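+
+A client could pick these headers out of a response as in the following
+sketch. It parses the example response above from a string rather than
+talking to a live server; the helper name asn_info is made up for this
+example.
+
+```python
+from email.parser import Parser
+
+# The example response from above, used as static test input.
+RAW_RESPONSE = """\
+HTTP/1.1 200 OK
+Date: Thu, 12 Feb 2009 23:24:33 GMT
+Server: Apache/2.2.11 (Linux/SUSE)
+X-Prefix: 83.133.0.0/16
+X-AS: 13237
+"""
+
+
+def asn_info(raw):
+    """Return (prefix, asn) from a raw HTTP response head, or None."""
+    # Drop the status line; the rest is a block of RFC 822 style headers.
+    _status, _, header_text = raw.partition("\n")
+    headers = Parser().parsestr(header_text, headersonly=True)
+    prefix = headers.get("X-Prefix")
+    asn = headers.get("X-AS")
+    if prefix is None or asn is None:
+        return None
+    return prefix, int(asn)
+
+
+print(asn_info(RAW_RESPONSE))
+```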
+
+
+
+Performance
+~~~~~~~~~~~
+
+The database with all ~250,000 prefixes is about 20-30 MB in size in the form
+of a PostgreSQL database. Without any tuning, it handles >3000 lookups per
+second on a MacBook Pro (tested with random IPs, a single connection, and a
+client written in Python running on the same machine).
+
+The Apache module is extremely lightweight.
+
+
+
+Design notes
+~~~~~~~~~~~~
+
+Performed with a Patricia Trie algorithm, the lookup is very efficient. The
+Patricia Trie is a radix tree that works its way from bit to bit, starting at
+the most significant bit. At each bit, there are two alternative "paths". Put
+another way, the space of prefixes is roughly divided into two halves at each
+point. The ip4r data type achieves this by implementing an index that works
+this way. Without the index, a full table scan would be required, plus a
+bitmask prefix match for each of the ~250,000 candidate rows.
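+
+The bit-by-bit descent described above can be sketched in a few lines of
+Python. This is only an illustration of longest-prefix matching with an
+uncompressed binary trie (IPv4 only); it is not how ip4r or mod_asn are
+implemented, and the names PrefixTrie, insert and lookup are made up for this
+example. The sample prefixes and AS numbers are the ones shown elsewhere in
+this document.
+
+```python
+import ipaddress
+
+
+class PrefixTrie:
+    """Uncompressed binary trie over IPv4 prefixes (illustrative only)."""
+
+    def __init__(self):
+        # Each node is a list: [child_for_bit_0, child_for_bit_1, payload]
+        self.root = [None, None, None]
+
+    def insert(self, prefix, asn):
+        net = ipaddress.ip_network(prefix)
+        bits = int(net.network_address)
+        node = self.root
+        # Walk from the most significant bit down to the prefix length.
+        for i in range(net.prefixlen):
+            bit = (bits >> (31 - i)) & 1
+            if node[bit] is None:
+                node[bit] = [None, None, None]
+            node = node[bit]
+        node[2] = (prefix, asn)
+
+    def lookup(self, ip):
+        bits = int(ipaddress.ip_address(ip))
+        node = self.root
+        best = node[2]
+        for i in range(32):
+            node = node[(bits >> (31 - i)) & 1]
+            if node is None:
+                break
+            if node[2] is not None:
+                best = node[2]  # a longer (narrower) prefix matched
+        return best
+
+
+trie = PrefixTrie()
+trie.insert("130.57.0.0/16", 3680)
+trie.insert("130.57.19.0/24", 3680)
+trie.insert("83.133.0.0/16", 13237)
+
+print(trie.lookup("130.57.19.1"))   # covered by the /24 and the /16
+print(trie.lookup("130.57.200.1"))  # covered by the /16 only
+print(trie.lookup("10.0.0.1"))      # no covering prefix
+```
+
+Each lookup touches at most 32 nodes, regardless of how many prefixes are
+stored, which is what makes this kind of structure attractive compared to
+scanning all rows.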
+
+"Conventional" storage in databases is possible with a workaround, e.g. with
+two long integers denoting the start and end of each prefix in a MySQL
+database. But this would require an SQL "between" query. An additional column
+would be needed to store the prefix length, in order to find the closest match
+(the narrowest prefix). The built-in inet/cidr data type in PostgreSQL doesn't
+help either because it can't be indexed. With conventional methods, only about
+30 lookups per second can be achieved with a database.
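+
+For comparison, the integer-range workaround could look roughly like the
+following sketch. It keeps the rows in a Python list instead of a real table,
+and the row layout (start, end, prefix length, prefix, AS) is an assumption
+made for this example; the linear scan stands in for the "between" query.
+
+```python
+import ipaddress
+
+
+def to_row(prefix, asn):
+    """Turn a prefix into (start, end, prefixlen, prefix, asn)."""
+    net = ipaddress.ip_network(prefix)
+    return (int(net.network_address), int(net.broadcast_address),
+            net.prefixlen, prefix, asn)
+
+
+rows = [to_row("130.57.0.0/16", 3680),
+        to_row("130.57.19.0/24", 3680),
+        to_row("83.133.0.0/16", 13237)]
+
+
+def lookup(ip):
+    addr = int(ipaddress.ip_address(ip))
+    # Equivalent of: WHERE addr BETWEEN start AND end
+    matches = [r for r in rows if r[0] <= addr <= r[1]]
+    if not matches:
+        return None
+    # Equivalent of: ORDER BY prefixlen DESC LIMIT 1 (narrowest prefix wins)
+    best = max(matches, key=lambda r: r[2])
+    return best[3], best[4]
+
+
+print(lookup("130.57.19.1"))
+print(lookup("83.133.7.7"))
+```
+
+Both the /16 and the /24 cover the first address, which is why the prefix
+length is needed to pick the narrowest match.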
+
+Having the data in a real database makes it accessible for other purposes as
+well; it is easy to query the list of prefixes that an AS announces, for
+instance. In addition, storing the data in a database makes it simple to
+change and update it (or even completely replace it): this can be done in a
+transaction, without blocking running queries.
+
+For usage outside of Apache, a small libpq-based standalone daemon could be
+written that queries the database. Alternatively, a small handler could be
+written for mod_asn that does nothing but read an IP address from a request
+body (or URL) and return the result.
+
+One argument for the ip4r data type in PostgreSQL is that it is IPv6-ready.
+Some IPv6 autonomous systems already exist (about 800 as of the beginning of
+2009).
+
+
+Usage with MirrorBrain
+~~~~~~~~~~~~~~~~~~~~~~
+
+mod_asn can support mod_mirrorbrain (see http://mirrorbrain.org).
+mod_mirrorbrain can use the data (set in the subprocess environment) for its
+mirror selection algorithm.
+
+In addition, the database can be queried with the MirrorBrain tool set:
+
+ # mb iplookup mirror.susestudio.com
+130.57.19.0/24 (AS3680)
+ # mb iplookup mirror.susestudio.com --all-prefixes
+130.57.19.0/24 (AS3680)
+130.57.0.0/16, 130.57.0.0/20, 130.57.19.0/24, 130.57.32.0/21, 137.65.0.0/16,
+147.2.0.0/17, 151.155.0.0/16, 164.99.0.0/16, 192.31.114.0/24, 192.94.118.0/24,
+192.108.102.0/24, 192.149.26.0/24, 195.109.215.0/24, 212.153.69.0/24
+
+