Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#14833 closed enhancement (fixed)

[PATCH] Selectively added details to data/boundaries.xml

Reported by: westnordost Owned by: Don-vip
Priority: minor Milestone: 17.06
Component: Core Version:
Keywords: boundaries, geocoding Cc: Klumbumbus

Description

I selectively added detail to the boundaries.xml globally for populated areas. In detail, I

  • added detail to the border so that villages near the border are on the correct side of the border
  • added much detail to towns that run alongside the border

The above only for international borders.

I edited the file with JOSM, saved it, and then reduced the precision of the lat/lon values back to 5 decimal places (with a text editor), as in the current repository version.
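The precision reduction described above could also be scripted instead of done by hand in a text editor. The following is a minimal sketch, assuming the coordinates appear as plain decimal `lat`/`lon` XML attributes; the function name `round_coords` is hypothetical, and this is not the procedure that was actually used for the patch.

```python
import re

# Matches lat="..." or lon="..." attributes (single or double quoted).
# Assumption: coordinates are plain decimals, not in exponent notation.
COORD_ATTR = re.compile(r"\b(lat|lon)=(['\"])(-?\d+(?:\.\d+)?)\2")


def round_coords(xml_text: str, places: int = 5) -> str:
    """Round every lat/lon attribute value to `places` decimal places."""

    def repl(match):
        name, quote, value = match.groups()
        rounded = f"{float(value):.{places}f}"
        return f"{name}={quote}{rounded}{quote}"

    return COORD_ATTR.sub(repl, xml_text)


sample = "<node id='-1' lat='47.123456789' lon='8.5' />"
print(round_coords(sample))
# <node id='-1' lat='47.12346' lon='8.50000' />
```

Note that fixed-width formatting pads short values with trailing zeros; stripping those would be a separate step.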

Posting a patch does not really make sense here, as every line differs because JOSM generates new ids. I am posting it anyway, plus the new version of the file.

Attachments (7)

boundaries.patch (3.1 MB ) - added by westnordost 7 years ago.
boundaries.osm (1.5 MB ) - added by westnordost 7 years ago.
full new file
old-file-reference.png (184.6 KB ) - added by michael2402 7 years ago.
new-file-output.png (184.0 KB ) - added by michael2402 7 years ago.
new-file-differences.png (19.5 KB ) - added by michael2402 7 years ago.
changes.png (230.4 KB ) - added by michael2402 7 years ago.
josm_keep_ids.patch (9.0 KB ) - added by Don-vip 7 years ago.

Change History (28)

by westnordost, 7 years ago

Attachment: boundaries.patch added

by westnordost, 7 years ago

Attachment: boundaries.osm added

full new file

comment:1 by Klumbumbus, 7 years ago

Cc: Klumbumbus added

in reply to:  description comment:2 by Don-vip, 7 years ago

Replying to osm@…:

Posting a patch does not really make sense here, as every line differs because JOSM generates new ids.

I started a patch to change that, in order to be able to review changes on this file.

comment:3 by westnordost, 7 years ago

That sounds reasonable.
However, it does not affect this patch, as the work has been done and the "original" ids cannot be recovered retroactively, even if future JOSM versions no longer generate new ids.

comment:4 by Don-vip, 7 years ago

The problem is that without the patch, it's almost impossible to review changes... I didn't think someone would provide a patch so soon :( I'll let you know how I will handle your contribution.

comment:5 by westnordost, 7 years ago

Two things come to my mind:

  • Create a visual diff by subtracting the repository's geometry from the patch geometry. Not sure if this is possible without too much effort in JOSM; otherwise perhaps in QGIS?
  • A JUnit test that checks whether the geometry is valid (closed ways etc.), using the same checks that are run before upload
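The "closed ways" part of the second suggestion can be illustrated outside JOSM. This is a rough Python sketch of the check itself, not JOSM's validator code and not a JUnit test; it assumes the usual OSM XML layout (`way` elements containing `nd ref=…` children), and the function name `unclosed_ways` is hypothetical. Ways that only form a ring together with other members of a multipolygon relation would be reported too, so a real test would need to assemble rings first.

```python
import xml.etree.ElementTree as ET


def unclosed_ways(osm_xml: str) -> list:
    """Return the ids of ways whose first and last node references differ."""
    root = ET.fromstring(osm_xml)
    bad = []
    for way in root.iter("way"):
        refs = [nd.get("ref") for nd in way.findall("nd")]
        # A closed area needs at least 3 distinct nodes plus the repeated first one.
        if len(refs) < 4 or refs[0] != refs[-1]:
            bad.append(way.get("id"))
    return bad


sample = """<osm version='0.6'>
  <way id='-10'><nd ref='-1'/><nd ref='-2'/><nd ref='-3'/><nd ref='-1'/></way>
  <way id='-11'><nd ref='-1'/><nd ref='-2'/><nd ref='-3'/></way>
</osm>"""
print(unclosed_ways(sample))  # ['-11']
```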

by michael2402, 7 years ago

Attachment: old-file-reference.png added

by michael2402, 7 years ago

Attachment: new-file-output.png added

by michael2402, 7 years ago

Attachment: new-file-differences.png added

by michael2402, 7 years ago

Attachment: changes.png added

comment:6 by michael2402, 7 years ago

I just abused the MapCSS test:


comment:7 by anonymous, 7 years ago

Assuming this is the right place to ask:

Some time after this patch has been merged, I plan to further split some countries into their provinces, the same way it has been done for the US, AU and CA.
The reason is to be able to more precisely capture intra-country differences.

For example in India, many of the provinces each have a different language (and script!) as official language, same with China (Cantonese etc.). Then, there is Belgium with the North speaking Dutch, the South speaking French.
And even more so in many countries in Africa.

So my question is whether I can expect that this kind of change would be merged?

comment:8 by westnordost, 7 years ago

Whoops, that was me who was asking.

comment:9 by Don-vip, 7 years ago

I have not yet merged the patch. The reason I split US/CA/AU into smaller units is that the subentities (states, provinces) have a high degree of autonomy and sometimes different laws or regulations (e.g. speed limits). This situation appears to be quite rare in the world, where the law generally does not differ between administrative units, so that split was lightweight.

Languages, however, are another subject. I'm afraid the file would become simply too big if we go into this level of detail, see https://en.wikipedia.org/wiki/List_of_multilingual_countries_and_regions

Before you start working on this, you should estimate about how much the data would increase. If it's two or three times bigger, I'd say no.

comment:10 by Don-vip, 7 years ago

Milestone: 17.06

comment:11 by Don-vip, 7 years ago

Owner: changed from team to Don-vip
Status: changed from new to assigned

comment:12 by bastiK, 7 years ago

The boundaries data currently accounts for 236 kB of the distributed .jar file (about 2%). This is okay, but I say we should try to keep it under 300 kB, including future refinements.

If you have a specific use case, e.g. a custom map style or validation rule: We can introduce a system where you can ship a boundaries file along with your .mapcss file (just like icons).

comment:13 by michael2402, 7 years ago

I don't think that the distribution size is our biggest problem there.

The main problem I see with a more detailed file is a performance problem: Many rules (like left/right hand traffic) require us to query it. And we need to load it on every JOSM startup.

in reply to:  13 comment:14 by bastiK, 7 years ago

Replying to michael2402:

The main problem I see with a more detailed file is a performance problem: Many rules (like left/right hand traffic) require us to query it.

This query is highly optimized. On average, it shouldn't be significantly slower than something like :in-downloaded-area (please prove me wrong).

And we need to load it on every JOSM startup.

True, but it can be loaded in parallel to other tasks, which is a plus. (This seems to be broken at the moment.) Before making such a judgment, I'd prefer to do some tests, e.g. replace the file with one 3 times the size and compare startup time.

comment:15 by westnordost, 7 years ago

If you have a specific use case, e.g. a custom map style or validation rule

My specific use case is actually that I use this file in my own OSM project, StreetComplete, to add location based intelligence. So, basically the same reason as why the file exists for JOSM. With this patch, I just wanted to contribute the enhancements upstream.

Currently, I use the following data (perhaps relevant for JOSM) per country/region and use the boundaries file to determine which region it applies to:

  • which speed unit is used (for preselecting whether to use mph or km/h)
  • which sports are popular (for order of selectable sports for pitches)
  • first day of workweek and number of regular shopping days (to preselect the right days when adding opening hours to a place)
  • regex for determining if a housenumber seems valid (e.g. "1 bis" is valid in France)
  • list of languages sorted by officiality and importance (for detecting and automatically expanding street name abbreviations, for offering the right keyboard layout when inputting names, etc.)

Regarding size:
I think the size could be reduced quite a lot if each country were a relation of its boundaries, so that each boundary line is used by two countries. But doing this manually is not worth the effort. Perhaps export to TopoJSON and simplify, then import from TopoJSON(?)
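To get a feeling for how much such a shared-boundary representation might save, one could count how many boundary segments are used by more than one country. A small sketch under a strong assumption: neighbouring countries reference the same node ids along their common border, which may not be the case in the actual file. The data below is made up and the name `edge_usage` is hypothetical.

```python
from collections import Counter


def edge_usage(rings):
    """Count how many rings use each undirected edge (pair of node ids)."""
    counts = Counter()
    for ring in rings.values():
        for a, b in zip(ring, ring[1:]):
            counts[frozenset((a, b))] += 1
    return counts


# Hypothetical data: two square "countries" that share the edge between nodes 2 and 3.
rings = {"A": [1, 2, 3, 4, 1], "B": [2, 5, 6, 3, 2]}
usage = edge_usage(rings)
shared = [sorted(edge) for edge, n in usage.items() if n == 2]
print(shared)  # [[2, 3]]
```

Every edge in `shared` is currently stored twice and would be stored once if the countries were relations over common boundary ways.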

in reply to:  15 comment:16 by bastiK, 7 years ago

Replying to osm@…:

If you have a specific use case, e.g. a custom map style or validation rule

My specific use case is actually that I use this file in my own OSM project, StreetComplete, to add location based intelligence. So, basically the same reason as why the file exists for JOSM. With this patch, I just wanted to contribute the enhancements upstream.

Currently, I use the following data (perhaps relevant for JOSM) per country/region and use the boundaries file to determine which region it applies to:

  • which speed unit is used (for preselecting whether to use mph or km/h)
  • which sports are popular (for order of selectable sports for pitches)
  • first day of workweek and number of regular shopping days (to preselect the right days when adding opening hours to a place)
  • regex for determining if a housenumber seems valid (e.g. "1 bis" is valid in France)
  • list of languages sorted by officiality and importance (for detecting and automatically expanding street name abbreviations, for offering the right keyboard layout when inputting names, etc.)

Sounds quite interesting, the additional data could be of some value. If not as part of the main distributed jar file, then possibly for a plugin or similar.

Regarding size:
I think the size could be reduced quite a lot if each country were a relation of its boundaries, so that each boundary line is used by two countries.

Removing this kind of verbatim duplication is what a zip algorithm is good at. So in the end, this may not improve the (zipped) file size at all.
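The effect on the zipped size can be measured rather than guessed. A minimal sketch using Python's `zlib` (DEFLATE, the compression method normally used inside zip/jar archives); the border segment is made-up data and `deflated_size` is a hypothetical helper name.

```python
import zlib


def deflated_size(data: bytes) -> int:
    """Size in bytes after DEFLATE compression at the highest level."""
    return len(zlib.compress(data, 9))


# Hypothetical border segment: 200 made-up nodes, not real boundary data.
segment = "".join(
    f"<node lat='{47 + i * 0.00137:.5f}' lon='{8 + i * 0.00291:.5f}'/>\n"
    for i in range(200)
).encode("ascii")

once = deflated_size(segment)
twice = deflated_size(segment + segment)  # same segment stored for both neighbours
print(once, twice)
```

In this toy case the second copy adds little to the compressed size. Two caveats apply to a real file: DEFLATE can only refer back about 32 KiB, so duplicates that lie far apart are not removed, and a shared border stored in the opposite direction, or with different node ids, is not a verbatim duplicate.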

comment:17 by Don-vip, 7 years ago

Resolution: fixed
Status: changed from assigned to closed

In 12430/josm:

fix #14833 - Selectively added details to data/boundaries.xml (patch by westnordost, modified to remove 3 duplicate nodes, keep upload=never, strip last 0 in coordinates to reduce 7 Kb)

in reply to:  17 comment:18 by Don-vip, 7 years ago

The patch is merged with the minor modifications described above. Thanks for this submission :) Can you please tell me how you want to be credited? I didn't include your e-mail address as it would have become public information.

In the next few days I'll see how to integrate my patch that allows JOSM to keep primitive IDs. This way, future patches to this file will be easier to review.

by Don-vip, 7 years ago

Attachment: josm_keep_ids.patch added

comment:19 by westnordost, 7 years ago

You can use my real name (Tobias Zwick) or alias (westnordost), but credit is not required.

comment:20 by Don-vip, 7 years ago

In 12479/josm:

see #14833 - fix DataSet copy constructor, add unit test

comment:21 by Don-vip, 7 years ago

In 12536/josm:

see #14833 - new API to manipulate primitives id counter
