Modify

Opened 4 weeks ago

Closed 36 hours ago

Last modified 13 hours ago

#24579 closed defect (fixed)

boundaries.osm does not contain all exclaves

Reported by: Vectorial1024 Owned by: team
Priority: normal Milestone:
Component: Core Version:
Keywords: Cc:

Description

Basically, I have been told the JOSM team maintains the boundaries.osm file, and I have an issue with that file.

TL;DR: the boundaries.osm file does not contain all exclaves, which causes issues downstream. This may have something to do with multipolygon boundaries not being correctly extracted from OSM data.

This issue was discovered downstream while reviewing the boundaries of both Hong Kong and Macao. For more background (downstream) information, (e.g. illustrative diagrams, explanations, ...), see e.g. https://github.com/westnordost/countryboundaries/issues/25 and also https://github.com/graphhopper/graphhopper/issues/3253

Attachments (2)

boundaries_diff (10.8 KB ) - added by Vectorial1024 3 weeks ago.
multipolygon_fix_diff (2.2 KB ) - added by Vectorial1024 38 hours ago.

Download all attachments as: .zip

Change History (36)

comment:1 by stoecker, 3 weeks ago

If you provide a patch we'll apply it.

comment:2 by stoecker, 3 weeks ago

P.S. Boundaries file was never meant to be an exact representation of the political situation of the world. That's simply too fast changing and has too many disputed situations. It's simply a means to apply certain country specific rules.

comment:3 by Vectorial1024, 3 weeks ago

These few days I have used JOSM to draw the new polygons onto the existing boundaries.osm file, but I have noticed that JOSM would scramble the node IDs when I save the modified file. The produced diff would include the ID scrambling, and is too noisy.

How were past patches made? It seems past patches could preserve the negative IDs and not scramble them. Or, is ID scrambling somehow expected, and did happen in the past?

comment:4 by GerdP, 3 weeks ago

I think there is code to conserve the ids for this special purpose, but I don't remember how exactly one has to activate it.

comment:5 by Vectorial1024, 3 weeks ago

Upon closer look, it turns out it was not an "ID scrambling", but an "ID shift"; all (negative) IDs got shifted by a constant, which I presume to be the "next available negative ID".

From what I see, JOSM doesn't have any direct way to preserve IDs. But, seeing all those IDs are nicely sorted down from -1, I can guess the last guy used "osmium renumber" to sort the IDs. (Sidetrack: does JOSM have any method to "chain" with osmium?)

imo this pipeline (JOSM -> "osmium renumber" -> boundaries.osm) is worth being documented somewhere prominent, perhaps as an FAQ.

Anyhow, let's say I have a good diff file. Then, how should I submit it to here? I followed online tutorials and got TortoiseSVN to produce diff files, but I am unfamiliar with the usual SVN workflow.

comment:6 by GerdP, 3 weeks ago

Just attach the diff file here and I'll have a look soon.

in reply to:  5 comment:7 by gaben, 3 weeks ago

Replying to Vectorial1024:

I followed online tutorials and got TortoiseSVN to produce diff files, but I am unfamiliar with the usual SVN workflow.

I usually use Git diffs, not much of a difference.

comment:8 by Vectorial1024, 3 weeks ago

The diff file has been attached. This patch has the following:

  • Added exclave of Hong Kong: Shenzhen Bay Control Point
  • Added exclave of Macau: Universidade de Macau
  • Modified existing HK-CN border to reduce intersection with CN roads

Hengqin Port Area (Macau) is omitted from this patch because according to local information, Macau only has limited jurisdiction in that area (specific building levels only), which does not constitute a proper exclave.

Please review this patch, thanks!

comment:9 by GerdP, 3 weeks ago

Is it intended that the exclave is not a member of the relation with tag ISO3166-2=CN-GD?

in reply to:  9 comment:10 by Vectorial1024, 3 weeks ago

Replying to GerdP:

Is it intended that the exclave is not a member of the relation with tag ISO3166-2=CN-GD?

Correct.

It turns out I have missed that CN-GD is actually a multipolygon with CN-HK and CN-MO as inner. I will provide a new patch.

by Vectorial1024, 3 weeks ago

Attachment: boundaries_diff added

comment:11 by Vectorial1024, 3 weeks ago

The patch has been updated to exclude the exclaves from CN-GD. Some of the HK-CN borders have also been adjusted upon review.

comment:12 by GerdP, 2 weeks ago

Resolution: fixed
Status: newclosed

In 19451/josm:

fix #24579: boundaries.osm does not contain all exclaves
Patch boundaries_diff by Vectorial1024.
This patch has the following:

  • Added exclave of Hong Kong: Shenzhen Bay Control Point
  • Added exclave of Macau: Universidade de Macau
  • Modified existing HK-CN border to reduce intersection with CN roads

comment:13 by GerdP, 2 weeks ago

Thanks for the patch, the build was successfull. I don't know what else needs to be done to solve your problem with the 3rd party tools that use the data in boundaries.osm

comment:14 by Vectorial1024, 2 weeks ago

That is good to hear. I believe the work of the JOSM team should be complete, I can take it from here.

I don't know the details either, but from what I understand, I need to perform a "stack pop" just like how calling functions work in most programming languages, and bring this information downstream.

Thank you again for the guidance.

comment:15 by stoecker, 4 days ago

Resolution: fixed
Status: closedreopened

JOSM-Inetgration complains about this:

HK ==> expected: <1> but was: <2>

Stacktrace

org.opentest4j.AssertionFailedError: HK ==> expected: <1> but was: <2>
	at org.openstreetmap.josm.data.BoundariesTestIT.testBoundariesFile(BoundariesTestIT.java:64)

comment:16 by GerdP, 4 days ago

I don't understand that. I think the test passed when I tried it on my machine. I also don't understand why it complains.

comment:17 by stoecker, 4 days ago

Maybe the test is not fit yet for exclaves?

comment:18 by GerdP, 4 days ago

The previous version of the file contained exclaves. The change simply added two more. I've compared the tags and find no difference regarding the iso codes.

comment:19 by Vectorial1024, 3 days ago

Let's see if we have some insights.

I tried to add an exclave to the file. A well-known exclave irl is Kaliningrad, which is currently an exclave of Russia.

If we are thinking about Kaliningrad, then I can see that Kaliningrad has its own ISO3166-2 code. Interestingly, the Kaliningrad polygon in the boundaries.osm file does not have ISO3166-1:alpha2 code, but belongs to the multipolygon of Russia instead.

For a long time because Hong Kong only has a single polygon, it exists directly as a polygon. If I were to add an exclave to it, then does this mean I should create a new multipolygon to represent Hong Kong?

This should similarly affect Macau, just that the test probably stopped early because HK is sorted before MO.

comment:20 by stoecker, 3 days ago

If we are thinking about Kaliningrad, then I can see that Kaliningrad has its own ISO3166-2 code. Interestingly, the Kaliningrad polygon in the boundaries.osm file does not have ISO3166-1:alpha2 code, but belongs to the multipolygon of Russia instead.

Yes. I think that's the problem. ISO3166-1:alpha2 should be in the relation instead of duplicate on the ways.

comment:21 by GerdP, 3 days ago

If you found a fix please commit it. I tried different patches and none worked.

comment:22 by Vectorial1024, 3 days ago

I think another patch is needed to change the existing HK and MO geometry into a proper multipolygon.

by Vectorial1024, 38 hours ago

Attachment: multipolygon_fix_diff added

comment:23 by Vectorial1024, 38 hours ago

I have attached multipolygon_fix_diff file to fix the duplicated geometry error for both Hong Kong and Macau. Please review.

comment:24 by stoecker, 36 hours ago

Resolution: fixed
Status: reopenedclosed

In 19473/josm:

fix #24579 - fix integration test for boundaries

comment:25 by stoecker, 36 hours ago

Let's see what the tests say :-) I also though the fix should look similar to that, but didn't go into detail yet.

comment:26 by stoecker, 22 hours ago

Right way, but not there yet :-)

CN-HK ==> expected: <1> but was: <3>

comment:27 by gaben, 22 hours ago

A while ago, I wondered why the dataset is manually crafted. If there is a live dataset, using a pipeline to perform processing tasks such as removing unnecessary tags, adding extra tags if needed and simplifying geometries probably would be easier than maintaining the dataset separately. What is the history of this file?

comment:28 by stoecker, 17 hours ago

In 19474/josm:

see #24579 - fix integration test for boundaries

in reply to:  27 comment:29 by stoecker, 17 hours ago

Replying to gaben:

A while ago, I wondered why the dataset is manually crafted. If there is a live dataset, using a pipeline to perform processing tasks such as removing unnecessary tags, adding extra tags if needed and simplifying geometries probably would be easier than maintaining the dataset separately. What is the history of this file?

Well. Somewhen it was necessary to have such information for something (I think it replaced something which was there before for left/right traffic) and then it got used for more and more purposes. Then test came to check validity. And so on.

As always it wasn't designed like this from the beginning. And as it's changed seldom, inbetween everybody forgot how to do it :-)

comment:30 by gaben, 15 hours ago

And as it's changed seldom, inbetween everybody forgot how to do it :-)

Well, great. I'll put it on my never ending todo list to make it more maintainable, because it was a headache before this ticket too.

comment:31 by stoecker, 15 hours ago

There is a list here: ToDo - 2737 entries ATM :-)

comment:32 by stoecker, 14 hours ago

In 19476/josm:

fix left hand traffic detection for new areas, add Shenzhen Bay to tests, see #24579

comment:33 by Vectorial1024, 13 hours ago

I see there is another changeset that modifies the boundaries.osm file; do I need to make another diff patch to fix the broken tests?

That aside, I would think this boundaries.osm file was created because someone wanted a quick solution to check which country a (land) point is located in. Maybe many years ago the ecosystem just wasn't there, so no one could build a pipeline to generate the boundary file. Maybe they knew that checking polygon-contains with a many-edge polygon is expensive and they didn't need the 100% accuracy, they only needed like 99.99% accuracy and handcrafting the boundaries was good enough for the overwhelming majority of cases.

comment:34 by stoecker, 13 hours ago

I think I got it right now. Though I also was sure the last time :-)

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain team.
as The resolution will be set.
The resolution will be deleted. Next status will be 'reopened'.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.