Modify

Opened 7 years ago

Last modified 3 years ago

#5421 new enhancement

Please make SimplifyArea plugin to be part of the JOSM core

Reported by: *Martin* Owned by: team
Priority: normal Milestone:
Component: Core Version:
Keywords: simplify way area Cc:

Description (last modified by Don-vip)

Please make SimplifyArea plugin to be the part of the JOSM core. Simplify Area action has become one of my essential operations performed very often on various areas (buildings, landuses, ...). I've created the plugin because the built-in Simplify Way is absolutely not suitable for this purpose and mostly makes more damage than profit. IMHO SimplifyArea is more suitable even for simplifying ways than Simplify Way itself.

Please try this plugin on your own and you will experience what I am talking about :-). The sources are commited to your SVN.

Attachments (5)

sample.osm (4.2 KB) - added by *Martin* 7 years ago.
sample
sample2.osm (747 bytes) - added by *Martin* 7 years ago.
even better sample
rotated.osm (4.8 KB) - added by bastiK 7 years ago.
sample3.osm (1.2 KB) - added by *Martin* 7 years ago.
Simplify Way - Fail! Simplify Area - Win!
simplify-area-leads-to-straight-line.osm (779 bytes) - added by bastiK 7 years ago.
Example where simplify area leads to nodes on a straight line (use with default parameters, 21873)

Download all attachments as: .zip

Change History (35)

comment:1 Changed 7 years ago by bastiK

Hi, can you give some examples (best would be *.osm file) to see how it performs in comparison to simplify way? Not only your cadastre use case, but also gpx file that was converted to osm data, etc.

In my opinion we shouldn't have 2 tools in JOSM that do the same, so it would mean to remove the old feature. Therefore, the new tool must be an improvement for all use cases.

It would also be interesting to compare the same file with Merkaartor simplify tool. It uses Ramer–Douglas–Peucker (see wikipedia), which seems like a sensible algorithm to me at first sight.

Changed 7 years ago by *Martin*

Attachment: sample.osm added

sample

comment:2 Changed 7 years ago by anonymous

I've attached the sample. My simplify-way.max-error is set to 1. Problems of simplify way:

  • Building A - simplify Way will make the building bigger with wrong-angled wall
  • Building B - incorrectly simplified detail
  • Building C - the obviously redundant node is not removed. This may be caused because the note is beginning /end of the path. My Simplify Area can handle this and make another node to be the beginning/end one. I think this can be fixed in Simplify Way too.
  • Building D - we can't be sure which node should be removed and so Simplify Area will average them.

Even better example I lately managed to create, after attaching sample.osm: see the sample2.osm. Simplify Way removes the node on the building's corner!!. Simplify Area correctly removes the redundant node.

My goals when simplifying buildings were to:

  • average nearby nodes
  • remove nodes that are on the almost-straight line - they are lying on a angle near 180 degree
  • removed node should not make the building area to be changed more than by threshold area

Maybe Simplify Way is better on standard highways, but from my experience (or experience of Slovak OSM community) is Simplify Area much more appropriate on buildings, or landuses as forest, etc. I tried to tune the simplify-way.max-error with different values but still without any great success. BTW you can change (three) parameters of Simplify Area too.

I think that both ways of simplifying should be in JOSM. I know - it can be difficult for users to understand which one to use. Maybe you could combine them somehow...

Changed 7 years ago by *Martin*

Attachment: sample2.osm added

even better sample

comment:3 Changed 7 years ago by *Martin*

Today I also got an idea to make Simplify Area algorithm even a little bit better, but causing the time complexity to go from O(n) to O(n)2 (or little bit less ;-)):

while (true) {
  Node node = findTheBestCandidateToRemove(nodes);
  if (node != null) {
    nodes.remove(node);
  } else {
    break;
  }
}

This would bring a little bit better result in the following scenario:
We have 4 nodes: A B C and D. Let A and B are fixed (tagged, or lying on sharp angles). Node B lies on the angle 175 and node C on angle 176. Let the angle removal threshold is max 6 degrees. Currently the algorithm removes the first candidate (node B). Node C will not be removed, because now the angle would be 5+4=9 and that is more than 6.

New algorithm would first pick the best candidate (node C, because its removal would cause less damage). The next iteration would leave node B as is because of the same constraint. Should I try to improve it this way?

comment:4 Changed 7 years ago by bastiK

Not sure, after converting gpx to osm, a node count of 10000 and more should be expected. But it's worth a try.

Last edited 7 years ago by bastiK (previous) (diff)

Changed 7 years ago by bastiK

Attachment: rotated.osm added

comment:5 Changed 7 years ago by bastiK

I added an example with 2 ways. The number of nodes after simplification differs, although one is just a rotated copy of the other.

You can try to use LatLon.heading and LatLon.greatCircleDistance.

comment:6 Changed 7 years ago by anonymous

Not sure, after converting gpx to osm, a node count of 10000 and more should be expected. But it's worth a try.

You can tune its parameters. Now they are set for the building simplification = not very agressive.

  • simplify-area.angle=10.0
  • simplify-area.area=5.0
  • simplify-area.distance=0.2

I added an example with 2 ways. The number of nodes after simplification differs, although one is just a rotated copy of the other

Strange. Thanks for the warning, I'll try to check it in the evening.

comment:7 Changed 7 years ago by bastiK

In the simplify way part there are at least 3 ways to put a limit. (Nodes A, B, C - node B is supposed to be removed.)

(1) angle of bearing change from AB to BC

(2) distance of B to the line AC. (This is done in current implementation and I think also in Merkaartor.)

(3) area of triangle ABC

To me, (2) is the obvious choice to retain positional accuracy, because it limits the overall error to the original track.

Comparing (1) and (2): When you have a small and a large triangle with the same angles, (2) gives a penalty to the large one. This is reasonable as you get further "off the road".

Comparing (2) and (3): When you have two triangles, one where AC is short and one where AC is long, but for both the point B has the same distance to the line AC, then the area threshold (3) penalizes the larger triangle. This seems counter intuitive to me: why is it worse to be 3 m off a 2 km long road then being 3 m off a 10 m long road?

Basically I'm not convinced your angle / area approach is the best choice for simplifying gps data.

comment:8 Changed 7 years ago by *Martin*

bastiK, I am no expert in this area. I just saw the damage being caused by Simplify Way on the buildings (the sample2.osm shows it the best) and so I created my own rules how should they be simplified. Maybe I was wrong with proclaiming Simplify Area to be better on ways than Simplify Way, but on buildings I see it definitively better. Buildings are defined by their shape and the area they occupy. Therefore the shape/area threshold constraints are reasonable in this case. I really think that JOSM should have both - Simplify Way and Simplify Area. One is better for ways, other for areas. At least buildings, but I think that also on eg. landuses... What do you think?

comment:9 in reply to:  8 Changed 7 years ago by bastiK

Replying to m.zdila@…:

bastiK, I am no expert in this area.

Me neither, I just looked at the changes to see if it can be merged with "simplify way".

I just saw the damage being caused by Simplify Way on the buildings (the sample2.osm shows it the best) and so I created my own rules how should they be simplified. Maybe I was wrong with proclaiming Simplify Area to be better on ways than Simplify Way, but on buildings I see it definitively better. Buildings are defined by their shape and the area they occupy. Therefore the shape/area threshold constraints are reasonable in this case.

I still think you are on the wrong track with your area test: It only looks at the area of the individual triangles. The overall area error is the number of simplifications times the threshold value. So you do not limit the area error at all: If the way is subdivided into very small segments, the sum of the area differences can be arbitrarily large.

For the test (2) however, the overall area error is limited to the length of the way times the distance threshold. So the distance threshold could even be tuned such that the resulting area is never xx % larger or smaller than the original area.

I really think that JOSM should have both - Simplify Way and Simplify Area. One is better for ways, other for areas. At least buildings, but I think that also on eg. landuses... What do you think?

Could you merge your other improvements into SimplifyWayAction.java? (Handle first and last node of closed way; Average nodes with only a small distance to each other.)

Then we can see if a second action for simplifying building is really needed.

comment:10 Changed 7 years ago by *Martin*

I've checked Ramer–Douglas–Peucker algorithm on Wikipedia. This algorithm keeps the first and last - this is its first step. One way to workaround this is to run the algorithm, then set different node to be first/last and simplify the (closed) way again. And then maybe repeat it for every other node. Is it worth to implement something like this? It is a bit crazy idea for me :-)

BTW I think that the Ramer–Douglas–Peucker algorithm on the closed way can give different result if some other node is in the role of first/last.

Next, I've played with the rotated.osm sample. I've updated the code to use LatLon.heading and LatLon.greatCircleDistance. Unfortunately I got the same results. Isn't that actually allright :-)? If you would like, you could check the code I've already commited to SVN. You can uncomment the 3 System.out logging lines to see the differences in areas but mostly in angles.

About your comment of area error - I know I am not checking the area error globally, but from my experience it is not necessary. It just works very good with area-per-triangle check. I can't give you any math proofs, I just see it works. And I see that Simplify Way causes the damage. Or it sometimes doesn't remove the obviously redundant node.

I created by accident another example (sample3.osm) where Simplify Way fails. Set simplify-way.max-error=1. There are 3 obviously redundant nodes to be removed, but none is. If you add a new node or move some a slightly, then another node may be removed. It behaves kind of unstably. I can't tell you why. I think that the problem is in the Ramer–Douglas–Peucker algorithm itself :-O.

You can also feel the difference yourself - download http://93.184.70.94:21880/data/budovy/budovy_mur.tar.bz2 (big file), unpack it and try to batch-simplify some buildings. Select all, then simplify area/way and compare.

Changed 7 years ago by *Martin*

Attachment: sample3.osm added

Simplify Way - Fail! Simplify Area - Win!

comment:11 Changed 7 years ago by bastiK

added the distance threshold from josm core. (See [o23480])

Now please test:

(a) set area to 5000 and dist to default (3)
(b) set area to default (5) and dist to 5000

comment:12 Changed 7 years ago by bastiK

About Douglas–Peucker: This algorithm does not allow your angle threshold, which is good to prevent aggressive simplifications at small scales, imho. Maybe better use your idea from comment 3. If you cache the evaluation of each candidate, keep them sorted and update only the ones that change, you should get n*log(n). The update part may be tricky, however.

comment:13 Changed 7 years ago by anonymous

Some notes/questions until I check it (in the evening):

  • AFAIK computeConvectAngle was not necessary to be rewritten because angle computed from Math.abs(coord2.heading(coord3) - coord1.heading(coord2)) should be always in the range <0, 2*PI). Therefore return Math.toDegrees(angle < Math.PI ? angle : 2 * Math.PI - angle) should be sufficient.
  • is coord1.heading(coord2) wrong? In the changeset 23478 i was using custom method too, but AFAIK only because I had't known about LatLon.heading. I am asking because you are using custom heading method. Will it pass the rotated.osm test then?

comment:14 in reply to:  13 Changed 7 years ago by bastiK

Replying to anonymous:

Some notes/questions until I check it (in the evening):

  • AFAIK computeConvectAngle was not necessary to be rewritten because angle computed from Math.abs(coord2.heading(coord3) - coord1.heading(coord2)) should be always in the range <0, 2*PI). Therefore return Math.toDegrees(angle < Math.PI ? angle : 2 * Math.PI - angle) should be sufficient.

At some point I got an offset of PI, so the fastest fix was to rewrite computeConvectAngle. Cannot reconstruct the reasons, sorry. (I guess your original code was fine.)

  • is coord1.heading(coord2) wrong? In the changeset 23478 i was using custom method too, but AFAIK only because I had't known about LatLon.heading. I am asking because you are using custom heading method. Will it pass the rotated.osm test then?

LatLon.heading uses some approximation that is not sufficient for crossTrackError. I think it passes rotated.osm test.

comment:15 in reply to:  12 Changed 7 years ago by bastiK

Replying to bastiK:

About Douglas–Peucker: This algorithm does not allow your angle threshold, which is good to prevent aggressive simplifications at small scales, imho. Maybe better use your idea from comment 3. If you cache the evaluation of each candidate, keep them sorted and update only the ones that change, you should get n*log(n). The update part may be tricky, however.

An alternative would be to split the way in parts of 1000 nodes each and simplify these parts one by one. We never get perfect results, so there is no reason to be over ambitious...

comment:16 Changed 7 years ago by *Martin*

which is good to prevent aggressive simplifications at small scales

Sometimes you know that the small details are desirable (eg. for buildings from our cadastre) and sometimes you know they are undesirable (eg. for data from our vectorized "forest" roads, see http://93.184.70.94:21880/data/NLC_red_shapefile/data.html). Currently I choose Simplify Area for the first case and Simplify Way for the second one.

comment:17 Changed 7 years ago by *Martin*

I tested Simplify Area with your settings but I don't know what it is good for. If area is 5000 it almost mean "don't constraint area". If dist is 5000, then it means almost like "don't use dist constraint".

Also, in my System.out logs the XTE is often negative. Is it ok?

Generally XTE threshold brings another one constraint for preventing the node to be removed.

So, what is our result? I still think here should be both in the JOSM - Simplify Way and Simplify Area. Every for a different putpose. See my last comment why :-)

comment:18 in reply to:  17 ; Changed 7 years ago by bastiK

Replying to Martin Ždila <m.zdila@…>:

I tested Simplify Area with your settings but I don't know what it is good for. If area is 5000 it almost mean "don't constraint area". If dist is 5000, then it means almost like "don't use dist constraint".

Right, 5000 is supposed to turn one of the tests off, so you can compare both methods.

Also, in my System.out logs the XTE is often negative. Is it ok?

Thats a bug, corrected it.

Generally XTE threshold brings another one constraint for preventing the node to be removed.

Right.

So, what is our result? I still think here should be both in the JOSM - Simplify Way and Simplify Area. Every for a different putpose. See my last comment why :-)

In the end we may have 2 commands (how to explain this to the user is another question), but internally there should should be only one implementation using different parameter sets. At the moment we have 2 flawed implementations (the second much less flawed, though :-) :

  • Simplify Way does not handle start/end node for closed way and does not support angle threshold.
  • Simplify Area traverses the nodes one by one with the result that after simplify action there can still be nodes perfectly in line. (I can give some example if you need it.)

My suggestion would be to improve Simplify Area using your ideas from comment 3 and then use Simplify Area as the only implementation.

Then there is a second issue, the choice of constraints (angle, distance and area) and parameter values. Can you give me examples, where area constraint has better results than distance? For gpx data, distance is better imho, so I'd like to see why it is needed at all.

Sebastian

comment:19 in reply to:  18 ; Changed 7 years ago by *Martin*

Replying to bastiK:

  • Simplify Area traverses the nodes one by one with the result that after simplify action there can still be nodes perfectly in line. (I can give some example if you need it.)

Please give me that example.

My suggestion would be to improve Simplify Area using your ideas from comment 3 and then use Simplify Area as the only implementation.

Implementing findTheBestCandidateToRemove is not so straightforward. How to select best candidate node if there are currently three measurements that influence the selection? Some node may be best candidate because of smallest angle, other for smallest area and another for smallest XTE. And various combinations, eg. node with biggest angle possible for removal but smallest XTE. Do you have any idea how to solve it? My humble idea:

final double angleWeight = computeConvectAngle(coord1, coord2, coord3) / angleThreshold;
final double areaWeight = computeArea(coord1, coord2, coord3) / areaThreshold;
final double distanceWeight = Math.abs(crossTrackError(coord1, coord2, coord3)) / distanceThreshold;

final double weight = isRequiredNode(w, prevNode) ||
	!closed && i == len - 1 || // don't remove last node of the not closed way
	angleWeight > 1.0 || areaWeight > 1.0 || distanceWeight > 1.0 ? Double.MAX_VALUE :
	angleWeight * angleFactor + areaWeight * areaFactor + distanceWeight * distanceFactor;

Node with smallest weight would be removed. This also requires three new configuration parameters to be configured and fine-tuned (angleFactor, areaFactor, distanceFactor). Also the linear relationship can be questionable.

Then there is a second issue, the choice of constraints (angle, distance and area) and parameter values. Can you give me examples, where area constraint has better results than distance? For gpx data, distance is better imho, so I'd like to see why it is needed at all.

Currently I have no example, but we'll need to have it and to fine tune parameters for various use-cases. For example, when simplifying GPX trace we should not consider angles. If I walk and have one node per second, there will be many "big" angles.

*Martin*

PS: today I commited fix of considering already removed node in computation.

Changed 7 years ago by bastiK

Example where simplify area leads to nodes on a straight line (use with default parameters, 21873)

comment:20 in reply to:  19 Changed 7 years ago by bastiK

Replying to m.zdila@…:

Replying to bastiK:

  • Simplify Area traverses the nodes one by one with the result that after simplify action there can still be nodes perfectly in line. (I can give some example if you need it.)

Please give me that example.

Done.

My suggestion would be to improve Simplify Area using your ideas from comment 3 and then use Simplify Area as the only implementation.

Implementing findTheBestCandidateToRemove is not so straightforward. How to select best candidate node if there are currently three measurements that influence the selection? Some node may be best candidate because of smallest angle, other for smallest area and another for smallest XTE. And various combinations, eg. node with biggest angle possible for removal but smallest XTE. Do you have any idea how to solve it?

Imho it doesn't matter that much. I reckon all three measures will lead to similar results in this case.

My humble idea:

final double angleWeight = computeConvectAngle(coord1, coord2, coord3) / angleThreshold;
final double areaWeight = computeArea(coord1, coord2, coord3) / areaThreshold;
final double distanceWeight = Math.abs(crossTrackError(coord1, coord2, coord3)) / distanceThreshold;

final double weight = isRequiredNode(w, prevNode) ||
	!closed && i == len - 1 || // don't remove last node of the not closed way
	angleWeight > 1.0 || areaWeight > 1.0 || distanceWeight > 1.0 ? Double.MAX_VALUE :
	angleWeight * angleFactor + areaWeight * areaFactor + distanceWeight * distanceFactor;

Node with smallest weight would be removed. This also requires three new configuration parameters to be configured and fine-tuned (angleFactor, areaFactor, distanceFactor). Also the linear relationship can be questionable.

Why not? I'd say, start with (1,1,1) or (0,0,1) and see how it goes. The critical parameters are the thresholds.

Then there is a second issue, the choice of constraints (angle, distance and area) and parameter values. Can you give me examples, where area constraint has better results than distance? For gpx data, distance is better imho, so I'd like to see why it is needed at all.

Currently I have no example, but we'll need to have it and to fine tune parameters for various use-cases. For example, when simplifying GPX trace we should not consider angles. If I walk and have one node per second, there will be many "big" angles.

Agreed.

comment:21 Changed 7 years ago by *Martin*

Commmited. Now only one node is being removed from your example.

My changes also include doing averaging very close nodes in the second step, after core simplifications. This averaging uses best match approach too.

Next we should fine-tune all configurable parameters for different usecases. And we must define the usecases (simplify GPS track, building, forest landuse, water stream... ???)

comment:22 Changed 7 years ago by bastiK

(link: [o25007])

comment:23 Changed 7 years ago by *Martin*

I've just committed new version that handles also nodes on the shared ways if all are selected.

comment:24 Changed 7 years ago by *Martin*

Commited new version with preferences dialog. TODO: multiple profiles.

comment:25 Changed 6 years ago by *Martin*

Are there any other requirements for this feature to replace current Simplify Way action? Simplification parameters can already be configured in user friendly GUI. To the future I propose to implement support for multiple profiles. It would be great if every profile could be bound to different custom keyboard shortcut, but this is currently more advanced topic for me :-).

comment:26 in reply to:  25 Changed 6 years ago by skyper

Replying to Martin Ždila <m.zdila@…>:

Are there any other requirements for this feature to replace current Simplify Way action? Simplification parameters can already be configured in user friendly GUI. To the future I propose to implement support for multiple profiles. It would be great if every profile could be bound to different custom keyboard shortcut, but this is currently more advanced topic for me :-).

Maybe JOSM could be smart and take the right profile automatically (e.g look for major tags like highway,landuse,building...)

comment:27 Changed 3 years ago by Don-vip

Description: modified (diff)
Keywords: simplify way area added

comment:28 Changed 3 years ago by Don-vip

I like this plugin very much. Nice work :) I think we could effectively integrate it into core. I'm just not sure about #8217: I see this as a major defect and don't understand why the plugin has this behaviour: shouldn't we keep nodes when they have at least 2 referring ways? It can cause some damage if we remove them in this case.

comment:29 Changed 3 years ago by *Martin*

Yes, it is a major defect. I can check it only after two weeks.

comment:30 Changed 3 years ago by *Martin*

#8217 has been just fixed.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as new The owner will remain team.
as The resolution will be set.
to The owner will be changed from team to the specified user.
The owner will change to *Martin*
as duplicate The resolution will be set to duplicate.The specified ticket will be cross-referenced with this ticket
The owner will be changed from team to anonymous.

Add Comment


E-mail address and name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.