Opened 2 years ago

Closed 8 months ago

#8211 closed enhancement (fixed)

Automatic data corrector functionality

Reported by: stoecker Owned by: team
Priority: normal Milestone:
Component: Core Version:
Keywords: Cc: bastiK, Don-vip

Description

A suggestion prompted by the color → colour change.

We already have a "drop useless tags" option in JOSM. Other problems in OSM are the growing number of divergent tagging methods and spelling mistakes.

I would suggest an option to auto-correct these when uploading, similar to the option that drops keys. With each upload we would then unify the database a little more: fix spelling mistakes, remove leading and trailing spaces, fix capitalization issues, and also remove our own errors like "color".
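A minimal Java sketch of two of the fixes listed above (trimming whitespace and renaming a deprecated key), assuming an object's tags are available as a plain `Map<String, String>`. The class name, method name and rename table are hypothetical illustrations, not JOSM's actual upload hook API; capitalization fixes are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of an upload-time tag fixer.
 * Not JOSM's real API: all names here are illustrative assumptions.
 */
public class TagFixSketch {

    /** Deprecated key -> preferred key (example mapping: color -> colour). */
    static final Map<String, String> KEY_RENAMES = Map.of("color", "colour");

    /** Returns a fixed copy of the given tags; the input map is not modified. */
    public static Map<String, String> fixTags(Map<String, String> tags) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : tags.entrySet()) {
            // Remove leading and trailing whitespace from keys and values.
            result.put(e.getKey().trim(), e.getValue().trim());
        }
        for (Map.Entry<String, String> r : KEY_RENAMES.entrySet()) {
            String oldKey = r.getKey();
            String newKey = r.getValue();
            // Rename only if the preferred key is not set yet,
            // so an existing value is never silently overwritten.
            if (result.containsKey(oldKey) && !result.containsKey(newKey)) {
                result.put(newKey, result.remove(oldKey));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put(" name", "Main Street ");
        tags.put("color", "red");
        System.out.println(fixTags(tags)); // {name=Main Street, colour=red}
    }
}
```

Refusing to rename when both keys are present is one cautious design choice: conflicting values are left for a human to resolve rather than guessed at.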

It is clear that adding new entries to such a list must be done carefully, but it should also be clear that we would follow the normal JOSM way, which means we accept suggestions from outside and from the wiki, but the final decision is our own.

Changes should be grouped into "always" (beginners, default) and "advanced" (only in expert mode, check changes with user).
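The proposed grouping could be modelled roughly as below. This is a hypothetical Java sketch (Java 16+ for `record`), not JOSM code: "always" rules are applied silently, while "advanced" rules are only considered in expert mode and are returned as proposals for the user to confirm. The `example_old` → `example_new` rule is a made-up placeholder for a more disputed change.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of "always" vs "advanced" fix rules; not JOSM code. */
public class FixLevelsSketch {

    public enum Level { ALWAYS, ADVANCED }

    /** A key rename, tagged with the group it belongs to. */
    public record Rule(String oldKey, String newKey, Level level) {}

    /**
     * Applies ALWAYS rules silently (modifying tags in place). ADVANCED rules
     * are only considered in expert mode and are returned as proposals for
     * the user to confirm instead of being applied directly.
     */
    public static List<Rule> applyRules(Map<String, String> tags, List<Rule> rules, boolean expertMode) {
        List<Rule> proposals = new ArrayList<>();
        for (Rule rule : rules) {
            if (!tags.containsKey(rule.oldKey()) || tags.containsKey(rule.newKey())) {
                continue; // nothing to fix, or the rename would overwrite a value
            }
            if (rule.level() == Level.ALWAYS) {
                tags.put(rule.newKey(), tags.remove(rule.oldKey()));
            } else if (expertMode) {
                proposals.add(rule);
            }
        }
        return proposals;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule("color", "colour", Level.ALWAYS),
                // Hypothetical placeholder for a more disputed change.
                new Rule("example_old", "example_new", Level.ADVANCED));
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("color", "red");
        tags.put("example_old", "x");
        List<Rule> proposals = applyRules(tags, rules, true);
        System.out.println(tags);             // {example_old=x, colour=red}
        System.out.println(proposals.size()); // 1
    }
}
```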

We won't stop crap from accumulating in the database, but at least we can reduce it a bit.

Attachments (1)

deprecated.diff (1.7 KB) - added by bastiK 21 months ago.
Ok, you had your fun... ;)

Change History (21)

comment:1 Changed 2 years ago by simon04

The DeprecatedTags validation test is somewhat related.

comment:2 Changed 2 years ago by stoecker

  • Resolution set to fixed
  • Status changed from new to closed

In 5621/josm:

fix #8211 - data fix on upload

comment:3 follow-up: Changed 2 years ago by bastiK

  • Resolution fixed deleted
  • Status changed from closed to reopened

We should only do these silent fixes for very undisputed cases. Maybe I've missed some discussions, but type=multipolygon && boundary=administrative used to be the common way to map boundaries in some countries.

If users choose to tag in some way (and are aware of the options), we shouldn't force them to use whatever JOSM developers happen to prefer. Of course we can make suggestions. As Simon mentioned, this is basically what the DeprecatedTags test does.

comment:4 Changed 2 years ago by akks

In 5623/josm:

fix NPE in FixDataHook, see #8211

comment:5 Changed 2 years ago by akks

I have fixed an NPE introduced in r5621 (could not upload natural=wood, type=multipolygon).

Not sure about type=multipolygon && boundary=administrative autofixing. For my country it is good, but what about the others?

comment:6 in reply to: ↑ 3 Changed 2 years ago by stoecker

Replying to bastiK:

We should only do these silent fixes for very undisputed cases. Maybe I've missed some discussions, but type=multipolygon && boundary=administrative used to be the common way to map boundaries in some countries.

This type has mainly come from automatic imports. And after 3 years, the stats show it was never really accepted. There were no negative comments when I finally marked it as deprecated in the wiki, and Frederik does not really have a counter-argument either (though he is still hoping for an area primitive).

While I agree with Frederik that mass retagging is not a good idea, I find such silent changes acceptable. I also wanted to include a slightly controversial tag, so that we see whether

  • people actually notice, and
  • they care about it.

It's been so quiet around JOSM lately...

Last edited 2 years ago by stoecker

comment:7 Changed 2 years ago by OverQuantum

JOSM asks the user about the changes only in expert mode.
It would be better if, in expert mode, JOSM allowed the user to upload unchanged data, after an additional confirmation or similar.

comment:8 follow-up: Changed 2 years ago by Ivan Komarov

I think that silent automatic correction of manually entered tags is unacceptable. A user should be asked whether he agrees with these changes and should have an option to suppress them.

comment:9 in reply to: ↑ 8 ; follow-up: Changed 2 years ago by bastiK

Replying to Ivan Komarov:

I think that silent automatic correction of manually entered tags is unacceptable. A user should be asked whether he agrees with these changes and should have an option to suppress them.

Well, the idea is that the user always wants these changes. Or at least should want them. :)

@stoecker: I had in mind the initial version with roles enclave and exclave which was not as powerful/consistent as multipolygon. Now the standard seems to be: Use multipolygon syntax (roles inner and outer) but simply with another name (type=boundary). So there are no strong reasons to prefer the old/alternative tagging type=multipolygon && boundary=administrative because it is basically the same. Still I think this autofix is quite bold, but you seem to be aware of that...

comment:10 Changed 2 years ago by skyper

As I am in favour of type=boundary, I can now blame JOSM. Maybe this will cause some noise. Personally, I think this is a big change and needs to be well documented.

Dirk changed the English wiki page, but all the other Western European language versions contradict it, as they still recommend using type=multipolygon for boundaries.

Why is only boundary=administrative changed? E.g. postal_code and LEZ should be changed, too.

Last edited 2 years ago by skyper

comment:11 in reply to: ↑ 9 Changed 2 years ago by stoecker

@stoecker: I had in mind the initial version with roles enclave and exclave

Some years ago I helped a bit to unify the two styles. The enclave and exclave roles aren't really used anymore, and I also thought about changing the few remaining ones to inner/outer, but the code was not yet ready for changing roles (tags are easier). Feel free to add it.

Still I think this autofix is quite bold, but you seem to be aware of that...

;-)

Either nobody cares or we get a discussion. In both cases I learn what people think about it. If I start a discussion by asking in a forum, the result will be that lots of people who prefer discussing to real work give their opinion while the real users stay silent. That is of no use.

Dirk changed the English wiki page, but all the other Western European language versions contradict it, as they still recommend using type=multipolygon for boundaries.

Feel free to update them. These recommendations were not in the English version previously (I kept an eye on that, as I had been against a multipolygon recommendation from the beginning).

Why is only boundary=administrative changed? E.g. postal_code and LEZ should be changed, too.

These aren't documented at all, so I wanted to be on the safe side and not change them (yet).

comment:12 Changed 2 years ago by Sergey Astakhov

The presence of boundary=administrative should not be the only condition for changing a relation's type from multipolygon.

For example, in Russia we actively use two types of boundary: boundary=administrative and place=city/town/etc. (description in Russian). In some cases they are identical, so boundary=administrative and place=* are on the same relation. type=boundary may be fine for boundary=administrative, but not for place=*.

If these tags should not appear on the same object, then first of all there should be a validation error, not a silent change from one relation type to another.
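The stricter condition suggested here can be sketched in Java as below, assuming a relation's tags are available as a plain `Map<String, String>`. The class and method names are hypothetical, and this illustrates the proposal in this comment, not the rule JOSM actually applies. Note that boundary=administrative itself is kept; only the type tag changes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a stricter multipolygon -> boundary rule; not JOSM code. */
public class BoundaryFixSketch {

    /**
     * True only for type=multipolygon + boundary=administrative relations
     * that do not also carry place=*, since a settlement area is not
     * necessarily a boundary relation.
     */
    public static boolean shouldRetagAsBoundary(Map<String, String> tags) {
        return "multipolygon".equals(tags.get("type"))
                && "administrative".equals(tags.get("boundary"))
                && !tags.containsKey("place");
    }

    /** Returns a copy of the tags with only the type tag changed, if the rule applies. */
    public static Map<String, String> apply(Map<String, String> tags) {
        Map<String, String> result = new HashMap<>(tags);
        if (shouldRetagAsBoundary(result)) {
            result.put("type", "boundary");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> admin = Map.of("type", "multipolygon", "boundary", "administrative");
        Map<String, String> city = Map.of("type", "multipolygon", "boundary", "administrative", "place", "city");
        System.out.println(apply(admin).get("type")); // boundary
        System.out.println(apply(city).get("type"));  // multipolygon
    }
}
```

Relations that fail the check are simply left unchanged here; reporting them through the validator, as suggested above, would be a separate step.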

comment:13 Changed 2 years ago by simon04

I would like the software to do what I want it to do. This is why I use Linux. I also expect JOSM not to make any hidden changes (except perhaps for completely undisputed cases such as created_by and odbl). I consider the DeprecatedTags test perfectly suited for this use case. It worked for a long time without these automatic changes.

comment:14 Changed 23 months ago by skyper

I thought there was an extra message about these changes in expert mode, but it does not show up and the changes are made anyway.

comment:15 follow-up: Changed 21 months ago by anonymous

I stumbled across this issue when I tried to change type=boundary to type=multipolygon for admin_level=10 boundaries in the Netherlands.
We just finished completing the boundaries for all places. 2173 of them are tagged type=multipolygon, which is the Dutch standard. The other 327 'still' have type=boundary. I was very surprised and annoyed when JOSM kept changing them back.
In my opinion, it's a bit bold to just add this kind of functionality without even a note in the release notes. The boundary=administrative/type=multipolygon convention is mainly used in Germany and the Netherlands, so it shouldn't have been hard to consult at least those two communities.

Gertjan Idema

By the way, an automatic silent change from type=boundary to type=multipolygon for administrative boundaries would be a nice feature ;-)

comment:16 in reply to: ↑ 15 Changed 21 months ago by g.idema@…

The previous comment was anonymous by mistake.

Changed 21 months ago by bastiK

Ok, you had your fun... ;)

comment:17 Changed 21 months ago by stoecker

Well, I still think the current solution is correct, but one comment is right: we forgot to document it in the changelog.

comment:18 Changed 21 months ago by stoecker

P.S. The suggested patch is really wrong. It removes "boundary=administrative"!

comment:19 Changed 21 months ago by stoecker

Current stats: November last year, 120,000 type=boundary vs. 30,000 type=multipolygon; today, 160,000 vs. 36,000. Give the code some time and it will be 200,000 to 0, and we will finally have one unified style. :-)

comment:20 Changed 8 months ago by stoecker

  • Resolution set to fixed
  • Status changed from reopened to closed

One year and no complaints, so I consider this closable.
