#24579 closed defect (fixed)
boundaries.osm does not contain all exclaves
| Reported by: | Vectorial1024 | Owned by: | team |
|---|---|---|---|
| Priority: | normal | Milestone: | |
| Component: | Core | Version: | |
| Keywords: | Cc: |
Description
Basically, I have been told the JOSM team maintains the boundaries.osm file, and I have an issue with that file.
TL;DR: the boundaries.osm file does not contain all exclaves, which causes issues downstream. This may have something to do with multipolygon boundaries not being correctly extracted from OSM data.
This issue was discovered downstream while reviewing the boundaries of both Hong Kong and Macao. For more background (downstream) information, (e.g. illustrative diagrams, explanations, ...), see e.g. https://github.com/westnordost/countryboundaries/issues/25 and also https://github.com/graphhopper/graphhopper/issues/3253
Attachments (2)
Change History (36)
comment:1 by , 3 weeks ago
comment:2 by , 3 weeks ago
P.S. Boundaries file was never meant to be an exact representation of the political situation of the world. That's simply too fast changing and has too many disputed situations. It's simply a means to apply certain country specific rules.
comment:3 by , 3 weeks ago
These few days I have used JOSM to draw the new polygons onto the existing boundaries.osm file, but I have noticed that JOSM would scramble the node IDs when I save the modified file. The produced diff would include the ID scrambling, and is too noisy.
How were past patches made? It seems past patches could preserve the negative IDs and not scramble them. Or, is ID scrambling somehow expected, and did happen in the past?
comment:4 by , 3 weeks ago
I think there is code to conserve the ids for this special purpose, but I don't remember how exactly one has to activate it.
follow-up: 7 comment:5 by , 3 weeks ago
Upon closer look, it turns out it was not an "ID scrambling", but an "ID shift"; all (negative) IDs got shifted by a constant, which I presume to be the "next available negative ID".
From what I see, JOSM doesn't have any direct way to preserve IDs. But, seeing all those IDs are nicely sorted down from -1, I can guess the last guy used "osmium renumber" to sort the IDs. (Sidetrack: does JOSM have any method to "chain" with osmium?)
imo this pipeline (JOSM -> "osmium renumber" -> boundaries.osm) is worth being documented somewhere prominent, perhaps as an FAQ.
Anyhow, let's say I have a good diff file. Then, how should I submit it to here? I followed online tutorials and got TortoiseSVN to produce diff files, but I am unfamiliar with the usual SVN workflow.
comment:7 by , 3 weeks ago
Replying to Vectorial1024:
I followed online tutorials and got TortoiseSVN to produce diff files, but I am unfamiliar with the usual SVN workflow.
I usually use Git diffs, not much of a difference.
comment:8 by , 3 weeks ago
The diff file has been attached. This patch has the following:
- Added exclave of Hong Kong: Shenzhen Bay Control Point
- Added exclave of Macau: Universidade de Macau
- Modified existing HK-CN border to reduce intersection with CN roads
Hengqin Port Area (Macau) is omitted from this patch because according to local information, Macau only has limited jurisdiction in that area (specific building levels only), which does not constitute a proper exclave.
Please review this patch, thanks!
follow-up: 10 comment:9 by , 3 weeks ago
Is it intended that the exclave is not a member of the relation with tag ISO3166-2=CN-GD?
comment:10 by , 3 weeks ago
Replying to GerdP:
Is it intended that the exclave is not a member of the relation with tag
ISO3166-2=CN-GD?
Correct.
It turns out I have missed that CN-GD is actually a multipolygon with CN-HK and CN-MO as inner. I will provide a new patch.
by , 3 weeks ago
| Attachment: | boundaries_diff added |
|---|
comment:11 by , 3 weeks ago
The patch has been updated to exclude the exclaves from CN-GD. Some of the HK-CN borders have also been adjusted upon review.
comment:13 by , 2 weeks ago
Thanks for the patch, the build was successfull. I don't know what else needs to be done to solve your problem with the 3rd party tools that use the data in boundaries.osm
comment:14 by , 2 weeks ago
That is good to hear. I believe the work of the JOSM team should be complete, I can take it from here.
I don't know the details either, but from what I understand, I need to perform a "stack pop" just like how calling functions work in most programming languages, and bring this information downstream.
Thank you again for the guidance.
comment:15 by , 4 days ago
| Resolution: | fixed |
|---|---|
| Status: | closed → reopened |
JOSM-Inetgration complains about this:
HK ==> expected: <1> but was: <2> Stacktrace org.opentest4j.AssertionFailedError: HK ==> expected: <1> but was: <2> at org.openstreetmap.josm.data.BoundariesTestIT.testBoundariesFile(BoundariesTestIT.java:64)
comment:16 by , 4 days ago
I don't understand that. I think the test passed when I tried it on my machine. I also don't understand why it complains.
comment:18 by , 4 days ago
The previous version of the file contained exclaves. The change simply added two more. I've compared the tags and find no difference regarding the iso codes.
comment:19 by , 3 days ago
Let's see if we have some insights.
I tried to add an exclave to the file. A well-known exclave irl is Kaliningrad, which is currently an exclave of Russia.
If we are thinking about Kaliningrad, then I can see that Kaliningrad has its own ISO3166-2 code. Interestingly, the Kaliningrad polygon in the boundaries.osm file does not have ISO3166-1:alpha2 code, but belongs to the multipolygon of Russia instead.
For a long time because Hong Kong only has a single polygon, it exists directly as a polygon. If I were to add an exclave to it, then does this mean I should create a new multipolygon to represent Hong Kong?
This should similarly affect Macau, just that the test probably stopped early because HK is sorted before MO.
comment:20 by , 3 days ago
If we are thinking about Kaliningrad, then I can see that Kaliningrad has its own ISO3166-2 code. Interestingly, the Kaliningrad polygon in the boundaries.osm file does not have ISO3166-1:alpha2 code, but belongs to the multipolygon of Russia instead.
Yes. I think that's the problem. ISO3166-1:alpha2 should be in the relation instead of duplicate on the ways.
comment:21 by , 3 days ago
If you found a fix please commit it. I tried different patches and none worked.
comment:22 by , 3 days ago
I think another patch is needed to change the existing HK and MO geometry into a proper multipolygon.
by , 38 hours ago
| Attachment: | multipolygon_fix_diff added |
|---|
comment:23 by , 38 hours ago
I have attached multipolygon_fix_diff file to fix the duplicated geometry error for both Hong Kong and Macau. Please review.
comment:25 by , 36 hours ago
Let's see what the tests say :-) I also though the fix should look similar to that, but didn't go into detail yet.
follow-up: 29 comment:27 by , 22 hours ago
A while ago, I wondered why the dataset is manually crafted. If there is a live dataset, using a pipeline to perform processing tasks such as removing unnecessary tags, adding extra tags if needed and simplifying geometries probably would be easier than maintaining the dataset separately. What is the history of this file?
comment:29 by , 17 hours ago
Replying to gaben:
A while ago, I wondered why the dataset is manually crafted. If there is a live dataset, using a pipeline to perform processing tasks such as removing unnecessary tags, adding extra tags if needed and simplifying geometries probably would be easier than maintaining the dataset separately. What is the history of this file?
Well. Somewhen it was necessary to have such information for something (I think it replaced something which was there before for left/right traffic) and then it got used for more and more purposes. Then test came to check validity. And so on.
As always it wasn't designed like this from the beginning. And as it's changed seldom, inbetween everybody forgot how to do it :-)
comment:30 by , 15 hours ago
And as it's changed seldom, inbetween everybody forgot how to do it :-)
Well, great. I'll put it on my never ending todo list to make it more maintainable, because it was a headache before this ticket too.
comment:33 by , 13 hours ago
I see there is another changeset that modifies the boundaries.osm file; do I need to make another diff patch to fix the broken tests?
That aside, I would think this boundaries.osm file was created because someone wanted a quick solution to check which country a (land) point is located in. Maybe many years ago the ecosystem just wasn't there, so no one could build a pipeline to generate the boundary file. Maybe they knew that checking polygon-contains with a many-edge polygon is expensive and they didn't need the 100% accuracy, they only needed like 99.99% accuracy and handcrafting the boundaries was good enough for the overwhelming majority of cases.



If you provide a patch we'll apply it.