Opened 11 months ago

Closed 11 months ago

Last modified 11 months ago

#22991 closed enhancement (fixed)

[PATCH] Improve precision for boundaries + add subdivisions of Indonesia + add autonomous regions of various countries

Reported by: westnordost
Owned by: team
Priority: normal
Milestone: 23.06
Component: Core
Version:
Keywords: boundaries
Cc:

Description (last modified by westnordost)

Changes overview:

  1. Revised all borders to fulfill the "Villages and major roads should be on the right side of the border" guideline in terms of precision.
  2. Added regions (islands) of Indonesia
  3. Added autonomous / self-administered subdivisions

For each change, the increase in the size of boundaries.osm is noted.

(I would have liked to show a visual (geospatial) diff but I have not found a way to properly do that. The geojson-diff ruby gem used by github when looking at a geojson diff may work, but at least on github it tells me that the diff is too large to be displayed. I tried something with QGIS, but I also failed with that - haven't used QGIS before, though.)

1. Revised all borders to fulfill the guideline in terms of precision (+691 KB)

"Villages and major roads should be on the right side of the border". This is the guideline I followed with earlier contributions. Yes, this was an extreme amount of work 😰

Most changes were necessary in Africa, India and China because these have become somewhat less blank in the years since this file was first created; hence, more villages are visible when tracing the boundaries from the OSM map.

E.g. IIRC in the Congo, an area the size of Sicily (+towns) was on the wrong side of the border. In India and China, thousands of villages were also on the wrong side of the province border.

Despite the greatly increased size (=increased precision), the precision in China and India may still not be up to the cited guideline 100% even in places where the villages are already mapped because the borders are so bonkers sometimes and just go straight through populated areas, e.g. see the border between Madhya Pradesh and Uttar Pradesh: https://www.openstreetmap.org/relation/1950071#map=11/25.2847/78.9220
(For some borders, especially in India, I also doubt that what is mapped in OSM is actually correct/precise because sometimes they have almost the same shape as a river that is mapped half a kilometer in another direction and things like that. So, no need for this data to be so precise either, then.)

I expect the file to grow somewhat further as Africa, India and China get even less blank over time. In retrospect, the cited guideline is a high goal to aim for. At the same time, having every road on the right side of the border would be even better. After all, close to the border it also matters on which side you have to drive, and in which language the POIs and street names are, for example.
But at least the density of (named) roads is somewhat higher in populated areas (villages etc.), so I think this is a good compromise.
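
A rough way to spot-check the guideline, sketched in Python: test whether a village's coordinates fall inside the traced polygon it is supposed to belong to. The function names, the square polygon and the village coordinates below are made up for illustration. This is plain ray casting on (lon, lat) pairs, and it is not how JOSM or the countryboundaries library evaluate boundaries.osm.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (lon, lat)


def point_in_polygon(point: Point, ring: List[Point]) -> bool:
    """Ray casting: count how often a ray going east from the point crosses the ring."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def misplaced(villages: Dict[str, Point], ring: List[Point]) -> List[str]:
    """Names of villages that should be inside the ring but are not."""
    return [name for name, pos in villages.items() if not point_in_polygon(pos, ring)]


# Hypothetical data: a square "province" and two villages that belong to it.
province = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
villages = {"Village A": (5.0, 5.0), "Village B": (12.0, 5.0)}
print(misplaced(villages, province))
```

Real boundaries are multipolygons with holes and many more vertices, so this only illustrates the kind of check that "on the right side of the border" implies.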

2. Added regions (islands) of Indonesia (+7 KB)

Indonesia is a big and populous country with significant cultural and language differences. (Indonesia is one of the most linguistically diverse countries in the world, with 500+ languages spoken, IIRC.) Also, it is very cheap to add these, because they are islands (=little added geometry).

Added Indonesian subdivisions :

3. Added autonomous / self-administered subdivisions (+178 KB)

Autonomous regions have a great deal of autonomy and hence can have their own legislation on matters that concern OpenStreetMap, such as traffic regulations (e.g. Scotland), official languages (e.g. Basque Country) or open data in general.
In general, I think this is a useful and consistent rule for which country subdivisions to include in this file: besides the subdivisions of large (federated) countries (US, CA, AU, IN, CN, ID, ...), include those subdivisions that have (a certain degree of) autonomy, such as the republics within Russia.

Source: https://en.wikipedia.org/wiki/List_of_autonomous_areas_by_country

The following were omitted because there are no ISO 3166 codes for the self-administered zones / autonomous districts / counties mentioned in the source as autonomous: Bolivia, Myanmar, India, China, Somalia.

Antigua and Barbuda (+1 KB):

  • Barbuda

Bosnia and Herzegovina (+24 KB):

(all regions are autonomous)

  • Federation of Bosnia and Herzegovina
  • Republika Srpska
  • Brčko District

Comoros (+1 KB):

(all regions are autonomous)

  • Anjouan
  • Grande Comore
  • Mohéli

Fiji (+1 KB):

Georgia (+5 KB):

  • Abkhazia (1), (2)
  • Adjara
  • Shida Kartli (1), (2) (Northern part only, as South Ossetia)

Greece (+1 KB):

  • Monastic community of Mount Athos

Indonesia (+7 KB):

  • Yogyakarta (a sultanate within the Republic of Indonesia)
  • Aceh (Sharia law is official law)
  • Papua (1)
  • West Papua (1)
  • Highland Papua (1)
  • Central Papua (1)
  • South Papua (1)

Iraq (+12 KB):

  • Kurdistan Region (1)

Italy (+10 KB):

  • Aosta Valley (1)
  • Friuli-Venezia Giulia (1)
  • Sardinia
  • Sicily
  • Trentino-Alto Adige (1)

Mauritius (+1 KB):

  • Rodrigues

Moldova (+7 KB):

  • Administrative-Territorial Units of the Left Bank of the Dniester (1), (2) (as Transnistria)
  • Bender (1), (2) (as Transnistria)
  • Gagauzia (1)

Nicaragua (+3 KB):

  • North Caribbean Coast Autonomous Region
  • South Caribbean Coast Autonomous Region

Pakistan (+5 KB):

  • Azad Kashmir

Papua New Guinea (+1 KB):

  • Bougainville (intends to become independent in a few years)

Philippines (+6 KB):

  • Bangsamoro (Muslim-majority region)

Portugal (+1 KB):

  • Azores
  • Madeira

Saint Kitts and Nevis (+1 KB):

  • Nevis
  • Saint Kitts

São Tomé and Príncipe (+1 KB):

  • Príncipe

Serbia (+4 KB):

South Korea (+1 KB):

  • Jeju Province

Spain (+68 KB):

(all regions are autonomous)

  • Andalusia
  • Aragon (1)
  • Asturias (1)
  • Balearic Islands (1)
  • Basque Country (1)
  • Canary Islands
  • Cantabria
  • Castile and León
  • Castilla-La Mancha
  • Catalonia (1)
  • Extremadura
  • Galicia (1)
  • La Rioja
  • Madrid
  • Murcia
  • Navarre (1)
  • Valencia (1)

Tajikistan (+5 KB):

  • Badakhshan Mountainous Autonomous Region

Tanzania (+2 KB):

(actually, Zanzibar+Pemba are autonomous as one but there is no ISO code for all of the subdivisions combined)

  • North Pemba (1)
  • South Pemba (1)
  • North Zanzibar (1)
  • West Zanzibar (1)
  • South Zanzibar (1)

Trinidad and Tobago (+1 KB):

  • Tobago

United Kingdom (+7 KB):

  • Wales (1)
  • England (not an autonomous region, but the only subdivision that was still missing to have them all)

Uzbekistan (+2 KB):

  • Karakalpakstan (1)

(1) different official languages
(2) currently independent state with limited international recognition

Attachments (1)

boundaries_new.osm (2.8 MB) - added by westnordost 11 months ago.
new boundaries.osm file

Change History (12)

by westnordost, 11 months ago

Attachment: boundaries_new.osm added

new boundaries.osm file

comment:1 by westnordost, 11 months ago

Description: modified (diff)

comment:2 by westnordost, 11 months ago

Continuing from the discussion in #22835, for your information:

The generated file sizes from this data for the countryboundaries library are:

File                   Size     Zipped
boundaries60x30.ser    298 KB   180 KB
boundaries180x90.ser   518 KB   226 KB

comment:3 by taylor.smock, 11 months ago

Milestone: 23.06

This brings up the in-memory representation from 3.52 MiB to 4.68 MiB. I think this will be survivable for most people.

comment:4 by taylor.smock, 11 months ago

FML. I'm going to have to go through it and renumber everything. Please use the dataset from JOSM Preferences -> Advanced Preferences -> More... -> Edit boundaries in the future! (You do need to start JOSM with --debug).

comment:5 by westnordost, 11 months ago

Sorry, I did, but only for the first session. I worked on it for more than a week or so (on and off). For the subsequent sessions, I loaded the saved file.

How do you renumber all the nodes?

comment:6 by westnordost, 11 months ago

According to my back-of-the-napkin calculation, renumbering all the ids to start at -1 will lead to a file size decrease of about 120 KB.

comment:7 by taylor.smock, 11 months ago (in reply to: 5)

Replying to westnordost:

> Sorry, I did, but only for the first session. I worked on it for more than a week or so (on and off). For the subsequent sessions, I loaded the saved file.

That makes it a bit harder. We cannot use the id from the file for various reasons (which isn't good). I don't know what you could have done, since it was multiple edit sessions.

> How do you renumber all the nodes?

This time I'm just running:

    osmium renumber --start-id=-1 ~/Downloads/boundaries_new.osm -o resources/data/boundaries.osm --overwrite

I don't want to do that in the future, as it is going to effectively reset the ids. This also decreased the raw size of the file from 2930071 bytes to 2642628 bytes (-287443 bytes).

EDIT: It looks like part of that size difference might be due to trimming a space right before the />.
Anyway, max ids:

  • Node: -25351 (down from -140719)
  • Way: -578 (down from -112911)
  • Relation: -55 (down from -99820)
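
For illustration only, the effect of such a renumbering can be sketched in Python. This is not how osmium implements it; the function name `renumber` and the handling shown here are assumptions. The sketch assigns new ids -1, -2, ... per object type in file order and rewrites the `nd` and `member` references to match.

```python
import xml.etree.ElementTree as ET


def renumber(osm_xml: str) -> str:
    """Renumber node, way and relation ids to -1, -2, ... (one counter per type)."""
    root = ET.fromstring(osm_xml)
    mapping = {"node": {}, "way": {}, "relation": {}}

    # First pass: assign new ids in file order, separately for each object type.
    for elem in root:
        if elem.tag in mapping:
            ids = mapping[elem.tag]
            ids[elem.get("id")] = str(-(len(ids) + 1))

    # Second pass: rewrite the ids and every reference to them.
    for elem in root:
        if elem.tag not in mapping:
            continue
        elem.set("id", mapping[elem.tag][elem.get("id")])
        for nd in elem.findall("nd"):
            ref = nd.get("ref")
            nd.set("ref", mapping["node"].get(ref, ref))
        for member in elem.findall("member"):
            ref = member.get("ref")
            # References to objects missing from the file are left untouched.
            member.set("ref", mapping.get(member.get("type"), {}).get(ref, ref))

    return ET.tostring(root, encoding="unicode")
```

Shorter ids save bytes both on the objects themselves and on every `ref` that points at them, which is where most of the raw size reduction comes from.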

comment:8 by westnordost, 11 months ago

Hm, nice. Double of what I estimated. (Now, remove the indentation and newlines ;-) )

comment:9 by taylor.smock, 11 months ago

> Now, remove the indentation and newlines

Yeah, no (at least not the source). Maybe in a processing step prior to putting it in the jar, but I don't think the decrease is significant enough to justify the time.
It is a pretty big decrease from a raw file standpoint (down to 2277203 bytes, -652868 bytes or ~637 KiB), but the compressed difference isn't that much (469510 down to 455420, -14090 bytes, or ~14 KB).
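
A comparison like this can be reproduced with a few lines of Python. This is a sketch under assumptions: zlib's deflate stands in for the jar compression, the helper names are hypothetical, and the regex only removes whitespace between tags, which is safe for OSM XML because it has no text nodes.

```python
import re
import zlib
from typing import Tuple


def strip_layout(xml_text: str) -> str:
    """Remove newlines and indentation between tags."""
    return re.sub(r">\s+<", "><", xml_text.strip())


def sizes(xml_text: str) -> Tuple[int, int]:
    """Return (raw size, deflate-compressed size) in bytes."""
    raw = xml_text.encode("utf-8")
    return len(raw), len(zlib.compress(raw, 9))


def compare(xml_text: str) -> None:
    raw_before, zip_before = sizes(xml_text)
    raw_after, zip_after = sizes(strip_layout(xml_text))
    print(f"raw:        {raw_before} -> {raw_after} ({raw_after - raw_before:+d} bytes)")
    print(f"compressed: {zip_before} -> {zip_after} ({zip_after - zip_before:+d} bytes)")
```

Calling `compare(open("boundaries.osm", encoding="utf-8").read())` prints both deltas. Indentation and newlines are highly repetitive, so the compressor already encodes them cheaply, which is why the compressed saving is so much smaller than the raw one.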

comment:10 by taylor.smock, 11 months ago

Resolution: fixed
Status: new → closed

In 18749/josm:

Fix #22991: Improve precision for boundaries + add subdivisions of Indonesia + add autonomous regions of various countries (patch by westnordost, modified)

Changelog:

  • Revised all borders to ensure that almost all villages and major roads are on the right side of the border
  • Add autonomous/self-administered subdivisions
    • Some were omitted, specifically, those that did not have any ISO 3166 code

Modifications to the patch are as follows:

  • ids have been renumbered
    • Node: Max is now -25351 down from -140719
    • Way: Max is now -578 down from -112911
    • Relation: Max is now -55 down from -99820

comment:11 by westnordost, 11 months ago

Thank you, that was fast!
