GeoNames2Sql
GeoNames2Sql is a tool built by the Gaois research group in Fiontar & Scoil na Gaeilge, DCU to import Gazetteer data from the GeoNames geographical database and store it in a SQL Server instance. It is essentially an abstraction over the NGeoNames library, adding a Command-line Interface and the tools to generate a SQL database. GeoNames2Sql is implemented in C# as a .NET Core 2.1 Console Application, meaning it can run cross-platform. This implementation presumes a SQL Server data store but it should be relatively simple to implement providers for PostgreSQL or other relational database types.
Status
This application addresses the requirements of the Gaois research group. It is not, as yet, comprehensive for all other use cases. The output data set includes tables for GeoNames, alternate names and country info. There is not, at this time, provision for feature codes, admin codes, time zones and the other reference lists included in the full GeoNames data set as these items fall outside the scope of our requirements. These items may be included in the future, if the need arises, and we also welcome pull requests from other users.
Installation and setup
Database
- Give the application permissions to a database.
- Run the SQL script to create the database tables.
- Optionally, create indexes for your tables. Some sample indexes are provided here.
The database schema mirrors the GeoNames Gazetteer data structure described here.
Application
Clone the repository to your machine:
git clone https://github.com/gaois/GeoNames2Sql
Then, publish the .NET Core project in src/GeoNames2Sql
, specifying your target runtime environment, e.g.:
dotnet publish -r win10-x64
Tip: See a list of target runtime identifiers here.
This will output a collection of dynamic linked libraries (.dll files) and an appsettings.json file. Copy these files to the directory in which you wish to run the program, if necessary. Alternatively, you can just run the application from within Visual Studio.
Usage
GeoNames2Sql allows you to compose the contents of your output database. For example, you may not need in-depth coverage of all countries in the world or you may only need a subset of alternate language data. The exact makeup of the target database can be specified within the configuration file.
Configuration
Use the appsettings.json file to configure your target database, e.g.:
{
"ConnectionString": "Server=localhost;Database=geonames;Trusted_Connection=True;",
"DataDirectory": "C:/PATH/TO/DIRECTORY",
"GeoNames": {
"AllCountries": false,
"AlternateNamesLanguages": [ "en", "ga" ],
"CitiesMinimumPopulation": 15000,
"Countries": ["AT", "BE", "CA", "CH", "CZ", "DE", "DK", "EE", "ES", "FI", "FR", "GB", "GR", "IE", "IM", "IT", "LU", "MT", "NL", "NO", "PL", "PT", "US", "SE", "no-country" ],
"CountryInfo": true
}
}
The table below outlines the configuration properties.
Key | Description |
---|---|
ConnectionString | SQL Server database connection string. |
DataDirectory | Where you intend to store the raw data dump files which will be downloaded from the GeoNames data store prior to database insert. |
GeoNames:AllCountries | If true, GeoNames data will be retrieved for all countries in the GeoNames Gazetteer. If true, neither the CitiesMinimumPopulation nor Countries values need be specified. |
GeoNames:AlternateNamesLanguages | Specify the languages for which you wish to have alternate toponymic names. Provide a comma-separated array of ISO two-letter language codes. If empty, no alternate languages data will be downloaded. |
GeoNames:CitiesMinimumPopulation | If set, GeoNames data will be retrieved for all cities with at least the minimum population specified. The possible values are 500, 1,000, 5,000 or 15,000. |
GeoNames:Countries | If set, GeoNames data pertaining to the specified countries will be retrieved. Provide a comma-separated array of ISO Alpha-2 country codes. |
GeoNames:CountryInfo | If true, the CountryInfo table will be populated. |
Note that the configuration describes an additive model. For example, the sample configuration above will output a database with the following characteristics:
- It will contain all GeoNames data for a specified list of countries.
- It will also contain GeoNames data for all cities globally with a population of over 15,000.
- It will populate the AlternateNames table with data in two languages.
- It will populate the CountryInfo table with all relevant data.
Some other points to note:
- You will not need to retrieve the GeoNames raw data dumps yourself. The application will do this for you. This is why you specify the
DataDirectory
where these files will be stored. - Some GeoNames raw text files range from 0.5-1.5GB unzipped, depending on the configuration you choose. Ensure you have adequate disk space on the drive where your
DataDirectory
is located. - The time it takes to populate the database varies according to the configuration, e.g. whether all countries or a subset of countries are specified, how many alternate languages are specified, etc.
Populating the database
Having specified the configuration you can now run the program. Change directory to the location of the published GeoName2Sql program and run the following command:
dotnet ./GeoNames2Sql.dll
The program will start, validate your configuration and describe the makeup of the output database. Follow the instructions on screen to proceed.
Updating the database
The GeoNames Gazetteer is updated on an ongoing basis and you may want to periodically bring the data in your SQL database up to speed with the latest changes. To do this, simply re-run the program. GeoNames2Sql will merge any updated data into the current database in a performant manner: no valid pre-existing data are thrown away and the process should not place any undue load on the target database, even in production environments. If you wish to alter the makeup of your target database, for example, to include data from additional countries or extra alternate languages, you only need to update the configuration file accordingly.