Based on a request on the German mailing list back in July, I thought about what the perfect localization of the German Mapnik style would look like and finally implemented something which comes close. Unfortunately, I have not documented it until now.
However, after reading about a map in Manx today, I came to the conclusion that I really need to do this.
First of all, I came up with the following assumptions (valid for all languages using Latin script, IMO):
- always prefer mapped names over automated transliteration
- prefer name:<yourlang> over any other name tags (name:de in my case)
- prefer int_name over non-Latin script
- prefer name:en over non-Latin script if int_name has not been specified
- transliterate non-Latin script as a last resort
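The preference order above can be sketched in a few lines of Python. This is only an illustration of the rules in the list, not the actual PL/pgSQL function described further down. The names pick_localized_name and is_latin are made up for this sketch, the transliteration is passed in as a callback, and the script check based on Unicode character names is a simplifying assumption. The sketch also assumes that a generic name which is already in Latin script is used unchanged.

```python
import unicodedata


def is_latin(text):
    """Return True if every alphabetic character in text is a Latin letter.

    Assumption: a letter counts as Latin when its Unicode name starts with
    "LATIN". Digits, spaces and punctuation are ignored.
    """
    return all(
        unicodedata.name(ch, "").startswith("LATIN")
        for ch in text
        if ch.isalpha()
    )


def pick_localized_name(name, local_name, int_name, name_en, transliterate):
    """Pick a label following the preference order from the list above.

    transliterate is a hypothetical callback (str -> str) standing in for
    whatever transliteration routine is available.
    """
    if local_name:                  # name:<yourlang> always wins
        return local_name
    if name and is_latin(name):     # assumption: keep Latin-script names as mapped
        return name
    if int_name:                    # mapped international name
        return int_name
    if name_en:                     # mapped English name
        return name_en
    # Last resort: automated transliteration of the non-Latin name
    return transliterate(name) if name else None
```

Called with a Cyrillic name and no name:de, this returns int_name if present, then name:en, and only calls the callback when neither has been mapped.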
So how has this been implemented?
I decided to do it inside the SQL query. This way it is independent of the rendering software. It will certainly work at least with Mapnik, MapServer and GeoServer. Even the proprietary ESRI rendering stuff should actually work 🙂
Basically any rendering system using a PostgreSQL backend can be easily adapted. Of course your database must provide all the required name columns.
So how would one enable rendering a Latin name instead of just the generic name tag?
Assume your style uses something like this for rendering a street name:
SELECT name
FROM planet_osm_line;
Now just replace this with the following:
SELECT get_localized_name(name,"name:de",int_name,"name:en") as name
FROM planet_osm_line;
Quite easy, isn’t it?
Well, here comes the (slightly) more complicated stuff…
Of course, PostgreSQL does not provide a get_localized_name function out of the box, so we have to install it first. Here is how to do this in two steps:
The get_localized_name function has been implemented in PL/pgSQL and is available at http://svn.openstreetmap.org/applications/rendering/mapnik-german/views/get_localized_name.sql.
So first add this function to your database using the following command:
psql -f get_localized_name.sql <your_database>
Second, add the transliterate function available at http://svn.openstreetmap.org/applications/rendering/mapnik-german/utf8translit/.
To compile and install it on GNU/Linux (sorry, I don’t care about Windows) do the following:
- svn co http://svn.openstreetmap.org/applications/rendering/mapnik-german/utf8translit
- Install the PostgreSQL server dev package (on Debian/Ubuntu this is called postgresql-server-dev-x.y, postgresql-server-dev-9.2 in my case)
- Install the libicu-dev package
- Compile and install by calling make; make install
- On Debian/Ubuntu you would be better off using dpkg-buildpackage and installing the resulting package instead of using the make install procedure.
Now enable the function from the shared object using the following SQL command (from a PostgreSQL admin account):
CREATE FUNCTION transliterate(text) RETURNS text
AS '$libdir/utf8translit', 'transliterate' LANGUAGE C STRICT;
Here is how to check if this works:
mydb=> select transliterate('Москва́');
transliterate
---------------
Moskvá
(1 row)
Well, that’s it. I hope that this will be useful for some people.
Unfortunately this stuff has currently (at least) two problems:
- Transliteration of Thai uses ISO 11940 instead of RTGS
- Transliteration of Japanese kanji ends up as a Chinese transliteration (e.g. dōng jīng instead of Tōkyō for 東京)
If anybody has suggestions on how to solve these, please post them here!
Hi
We do transliteration in our iPhone app “MapOut” as well, and had the same problem with Japanese transliteration. We solved it by using a Japanese transliteration utility from Lucene for Java (Apache License):
http://www.java2s.com/Open-Source/Java-Open-Source-Library/Search/solr/org/apache/lucene/analysis/ja/util/ToStringUtil.java.htm
It is basically a katakana-to-romaji conversion (romaji being the romanization typically used for Japanese text). We do it in two steps:
First, we transform the Japanese text to katakana (using Kakasi; ICU could work as well, but Kakasi seems slightly better), and then we transform this katakana text to romaji using the Lucene helper class mentioned above.
One problem is that the characters themselves do not reveal (at least I have not found a way yet) whether the text is Chinese or Japanese, so I first do a region check: if the tag is inside our Japanese polygon, we do a romaji conversion; if it is outside, we do the standard Chinese transliteration.
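A region check of this kind could look roughly like the following Python sketch. It is not the MapOut code. The function names, the two callbacks and the outline are assumptions for illustration, and JAPAN_OUTLINE is a deliberately crude placeholder box that also covers parts of neighbouring countries, so a real boundary polygon would be needed in practice.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical placeholder: a coarse box, NOT a real boundary of Japan.
JAPAN_OUTLINE = [(122.0, 24.0), (154.0, 24.0), (154.0, 46.0), (122.0, 46.0)]


def choose_transliterator(lon, lat, to_romaji, to_pinyin):
    """Return the romaji converter inside the outline, else the Chinese one.

    to_romaji and to_pinyin are hypothetical callbacks (str -> str).
    """
    if point_in_polygon(lon, lat, JAPAN_OUTLINE):
        return to_romaji
    return to_pinyin
```

With this placeholder outline, a point in Tokyo (lon 139.69, lat 35.68) selects the romaji converter, while a point in Beijing (lon 116.40, lat 39.90) selects the Chinese one.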
I’m still interested in Arabic transliteration (it seems quite challenging, because vowels are omitted in writing), so if you have any pointers regarding this, any help would be greatly appreciated.
I did not think about Arabic that much. I just use what libicu produces. See here for an example: http://openstreetmap.de/karte.html?lat=35.69623&lon=51.40953&zoom=15&layers=B000TF
Yes, we do the same, but for names without any English translation it produces names that are very hard to guess, e.g.
Kabul:
کابل -> Kabl
Abu Dhabi:
أبو ظبي -> Abw Zby
Sanaa:
صنعاء -> Snʿaʾ
(Those names are actually no problem in OSM, because those megacities all have proper English name tags as well. But if ICU fails on those, I think transliterated street names would be of no real benefit.)