When building any location-aware application, one of the first problems is how to build its locations database, and what information is actually needed: names, ISO codes, latitude/longitude, boundaries, bounding boxes, administrative levels, etc. There are numerous sources of geospatial data available online, each with different licenses and features; to name a few: Geonames, GADM, Natural Earth, TIGER (US only), OSM and Quattroshapes.

In particular, Geonames is a very detailed worldwide database, though it lacks administrative boundaries: each entry is stored as a single latitude/longitude point. Nonetheless, its metadata can be used to enhance shapefiles from other sources that do have boundaries, such as GADM or OSM. These are the steps to achieve this on a Mac:

  1. Install PostgreSQL with the PostGIS extension. We will be using the engine to index the Geonames data. I am currently using PostgreSQL 9.4.0 with PostGIS 2.1.5.

    brew install postgresql postgis

    Alternatively, download Postgres.app, which bundles PostGIS.

  2. Download the Geonames dump, either allCountries.zip or just the per-country files you are interested in (for example AR.zip for Argentina), and unzip it.

  3. Create the Geonames DB in PostgreSQL and enable spatial extensions:

    createdb geonames
    psql -d geonames -c "CREATE EXTENSION postgis;"
    psql -d geonames -c "CREATE EXTENSION postgis_topology;"

    If you run into an issue creating the spatial extensions and you installed PostgreSQL with brew, try symlinking the PostGIS scripts into the PostgreSQL extensions folder:

    ln -s $(brew --prefix postgis)/share/postgis/* $(brew --prefix postgres)/share/postgresql/extension/

  4. Import the Geonames data into PostgreSQL. This Ruby script provides handy commands for setting up the DB (./gazetteer.rb setup -d geonames), though I ran into some CSV issues when importing the data with it. Luckily, once the geoname table exists, the import can also be done directly from PostgreSQL, as instructed here:

    -- \copy reads AR.txt client-side; a plain COPY needs an absolute path readable by the server
    \copy geoname (geonameid,name,asciiname,alternatenames,latitude,longitude,fclass,fcode,country,cc2,admin1,admin2,admin3,admin4,population,elevation,gtopo30,timezone,moddate) from 'AR.txt' null as ''

    Then add the PostGIS geometry column, populate it from the longitude/latitude columns, and optionally create a spatial index:

    SELECT AddGeometryColumn ('public','geoname','the_geom',4326,'POINT',2);
    UPDATE geoname SET the_geom = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326);
    CREATE INDEX idx_geoname_the_geom ON public.geoname USING gist(the_geom);
  5. Download the shapefile you want to enhance with Geonames data. In this case I am using the GADM shapefile for Argentina at the first administrative level (provinces), found in ARG_adm1.shp.

  6. Download the script shape-gn-matcher.py, which does all the magic. The original version downloads Geonames data from the API; I changed it so that all the metadata fields are retrieved from the PostgreSQL database instead.

  7. Run the script with the following command:

    ./shape-gn-matcher.py --shp_name_keys=NAME_1 --dbname=geonames --dbuser=username --dbpass=password --shp_cc_key=ISO --allowed_gn_classes="" --allowed_gn_codes="ADM1" ARG_adm1.shp ARG_adm1_annotated.json

    Note that you can give the output file a .json extension to generate GeoJSON instead of a shapefile. Also, make sure to correctly specify the field where the shape names are stored in the shapefile (NAME_1 here); you can list the available fields by running ogrinfo -so -al ARG_adm1.shp.

    Finally, it might take some experimenting with the allowed Geonames classes and codes. In this case all provinces were tagged with code ADM1, but depending on what you are working with, you might need to look for other classes or codes. Inspect the fclass and fcode columns in your geonames DB to find the right values.
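
The matching itself happens inside shape-gn-matcher.py, and its exact logic is not described here. To give a rough idea of what the allowed classes and codes filter, here is a minimal Python sketch built on assumptions: it parses rows of a Geonames dump (the 19 tab-separated columns listed in step 4), keeps the rows of one country whose fclass/fcode are allowed, and compares names after stripping accents and case. The helpers parse_row, normalize, candidates, match and to_wkt, and the normalization rule itself, are hypothetical and not the script's API.

```python
import unicodedata

# Column order of a Geonames dump file, the same order used by the copy command in step 4.
GN_COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude", "longitude",
    "fclass", "fcode", "country", "cc2", "admin1", "admin2", "admin3", "admin4",
    "population", "elevation", "gtopo30", "timezone", "moddate",
]


def parse_row(line):
    """Split one tab-separated Geonames line into a dict keyed by column name."""
    return dict(zip(GN_COLUMNS, line.rstrip("\r\n").split("\t")))


def normalize(name):
    """Hypothetical matching key: strip accents and case, e.g. 'Córdoba' -> 'cordoba'."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold().strip()


def candidates(rows, country, allowed_classes=(), allowed_codes=()):
    """Keep rows of one country whose fclass/fcode pass the filters.

    In this sketch an empty filter allows everything (an assumption about
    how --allowed_gn_classes="" behaves in the real script).
    """
    for row in rows:
        if row["country"] != country:
            continue
        if allowed_classes and row["fclass"] not in allowed_classes:
            continue
        if allowed_codes and row["fcode"] not in allowed_codes:
            continue
        yield row


def match(shape_name, rows):
    """Return the first row whose name, ASCII name or any alternate name matches."""
    target = normalize(shape_name)
    for row in rows:
        names = [row["name"], row["asciiname"], *row["alternatenames"].split(",")]
        if any(n and normalize(n) == target for n in names):
            return row
    return None


def to_wkt(row):
    """WKT point with the same axis order as the_geom in step 4: x is longitude, y is latitude."""
    return f"POINT({row['longitude']} {row['latitude']})"


if __name__ == "__main__":
    # Catamarca values taken from the ogrinfo output below; other columns left empty.
    fields = ["3862286", "Catamarca", "Catamarca", "", "-27", "-67", "A", "ADM1",
              "AR", "", "02"] + [""] * 8
    rows = [parse_row("\t".join(fields))]
    found = match("Catamarca", candidates(rows, "AR", allowed_codes=("ADM1",)))
    print(found["geonameid"], to_wkt(found))  # 3862286 POINT(-67 -27)
```

Exact comparison of normalized names is deliberately simple; official and colloquial names often differ, which is why the sketch also checks the Geonames alternate names.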

The output file will contain the Geonames ID, latitude, longitude, feature class and code, country code and admin level codes, as extracted from Geonames. For example, this is the Catamarca feature as reported by ogrinfo:

    OGRFeature(ARG_adm1_annotated):1
      fclass (String) = A
      NAME_0 (String) = Argentina
      NAME_1 (String) = Catamarca
      countryCod (String) = AR
      VARNAME_1 (String) = (null)
      geonameid (String) = 3862286
      NL_NAME_1 (String) = (null)
      TYPE_1 (String) = Provincia
      fcode (String) = ADM1
      adminCode4 (String) = (null)
      ID_0 (Integer) = 11
      ID_1 (Integer) = 2
      ISO (String) = ARG
      ENGTYPE_1 (String) = Province
      lat (Real) = -27.000000000000000
      lng (Real) = -67.000000000000000
      adminCode1 (String) = 02
      adminCode2 (String) = (null)
      adminCode3 (String) = (null)
      POLYGON ((...))
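
Once generated, the annotated GeoJSON can be consumed with nothing more than Python's json module. A minimal sketch, assuming a standard FeatureCollection whose properties use the attribute names shown above (NAME_1, geonameid, lat, lng, fcode, adminCode1). load_locations is a hypothetical helper, and skipping features without a geonameid assumes that is how unmatched shapes appear in the output.

```python
import io
import json


def load_locations(fp):
    """Collect the Geonames attributes added by the matcher from a GeoJSON stream.

    Assumptions: the input is a FeatureCollection, and shapes that could not be
    matched have no (or an empty) geonameid property, so they are skipped.
    """
    collection = json.load(fp)
    locations = []
    for feature in collection["features"]:
        props = feature.get("properties") or {}
        if not props.get("geonameid"):
            continue
        locations.append({
            "geonameid": int(props["geonameid"]),  # stored as a string in the output
            "name": props.get("NAME_1"),
            "lat": float(props["lat"]),
            "lng": float(props["lng"]),
            "fcode": props.get("fcode"),
            "admin1": props.get("adminCode1"),
        })
    return locations


if __name__ == "__main__":
    # Tiny stand-in for ARG_adm1_annotated.json, reusing the values shown above.
    sample = io.StringIO(json.dumps({
        "type": "FeatureCollection",
        "features": [{
            "type": "Feature",
            "geometry": None,
            "properties": {"NAME_1": "Catamarca", "geonameid": "3862286",
                           "lat": -27.0, "lng": -67.0, "fcode": "ADM1",
                           "adminCode1": "02"},
        }],
    }))
    print(load_locations(sample))
```

Against the real output you would open ARG_adm1_annotated.json with encoding="utf-8" and pass the file object to load_locations.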