Modify

Opened 5 years ago

Closed 5 years ago

Last modified 4 years ago

#17055 closed enhancement (fixed)

[Patch] Validator should complain about simple typos in tag values

Reported by: GerdP Owned by: team
Priority: normal Milestone: 19.03
Component: Core validator Version:
Keywords: template_report Cc:

Description

What steps will reproduce the problem?

  1. Create way with typo like highway=residental (missing 2nd i)
  2. Run validator

What is the expected result?

Warning Misspelled property value - Value 'residental' for key 'highway' looks like 'residential'. (1)

What happens instead?

Only an informational message is produced:
Presets do not contain property value - Value 'residental' for key 'highway' not in presets. (1)

Please provide any additional information below. Attach a screenshot if possible.

See also https://josm.openstreetmap.de/ticket/15203#comment:31
The attached patch uses Utils.getLevenshteinDistance() to detect typical typos and propose a fix.
I found one possible problem so far:
When you type highway=servics it suggests highway=services

Build-Date:2018-12-01 17:14:34
Revision:14478
Is-Local-Build:true

Identification: JOSM/1.5 (14478 SVN en) Windows 10 64-Bit
OS Build number: Windows 10 Home 1803 (17134)
Memory Usage: 1437 MB / 3641 MB (388 MB allocated, but free)
Java version: 1.8.0_191-b12, Oracle Corporation, Java HotSpot(TM) 64-Bit Server VM
Screen: \Display0 1920x1080
Maximum Screen Size: 1920x1080
VM arguments: [-agentlib:jdwp=transport=dt_socket,suspend=y,address=localhost:51821, -ea, -Dfile.encoding=UTF-8]
Dataset consistency test: No problems found

Plugins:
+ OpeningHoursEditor (34535)
+ apache-commons (34506)
+ buildings_tools (34724)
+ download_along (34503)
+ ejml (34389)
+ geotools (34513)
+ jaxb (34506)
+ jts (34524)
+ merge-overlap (34664)
+ o5m (34405)
+ opendata (34698)
+ pbf (34576)
+ poly (34546)
+ reverter (34552)
+ undelete (34568)
+ utilsplugin2 (34506)

Last errors/warnings:
- W: Update plugins - org.openstreetmap.josm.plugins.PluginHandler$UpdatePluginsMessagePanel[,0,0,0x0,invalid,layout=java.awt.GridBagLayout,alignmentX=0.0,alignmentY=0.0,border=,flags=9,maximumSize=,minimumSize=,preferredSize=]
- W: No configuration settings found.  Using hardcoded default values for all pools.
- W: Failed to locate resource '/README'.
- W: Failed to locate resource '/CONTRIBUTION'.
- W: Failed to locate resource '/LICENSE'.

Attachments (11)

17055-v1.patch (1.5 KB ) - added by GerdP 5 years ago.
filter_words.patch (2.0 KB ) - added by GerdP 5 years ago.
small hack to create a reduced version of words.cfg
shrink-words.cfg.patch (13.2 KB ) - added by GerdP 5 years ago.
patch to reduce and update words.cfg (some values added, many ignored values removed)
17055-v2.patch (20.4 KB ) - added by GerdP 5 years ago.
17055-enh-v1.patch (12.9 KB ) - added by GerdP 5 years ago.
17055-enh-v2.patch (14.2 KB ) - added by GerdP 5 years ago.
previous version contained a bug
17055-use-ignore.patch (14.7 KB ) - added by GerdP 5 years ago.
use values with prefix K: for checks with Levenshtein distance, too
frequent-tags.zip (168.2 KB ) - added by GerdP 5 years ago.
extract_1000.sql (131 bytes ) - added by GerdP 5 years ago.
extract.script (92 bytes ) - added by GerdP 5 years ago.
TaginfoTestIT.patch (3.4 KB ) - added by GerdP 4 years ago.

Download all attachments as: .zip

Change History (74)

by GerdP, 5 years ago

Attachment: 17055-v1.patch added

comment:1 by Don-vip, 5 years ago

If we go this way we should clean the words.cfg file (TagChecker.SPELL_FILE) and make sure there is no regression.

Last edited 5 years ago by Don-vip (previous) (diff)

comment:2 by GerdP, 5 years ago

Sounds good. I'll have a closer look tomorrow.

in reply to:  1 comment:3 by stoecker, 5 years ago

Replying to Don-vip:

If we go this way we should clean the words.cfg file (TagChecker.SPELL_FILE) and make sure there is no regression.

Remove it? That has not be maintained for ages. I think this is from the time before we had real presets. The approach of this tickets sounds more like a modern approach comparing the input to valid data.

comment:4 by Don-vip, 5 years ago

Removing it is the ultimate cleaning :) We just need to check we don't loose functionality (by sampling values).

comment:5 by GerdP, 5 years ago

My understanding is that words.cfg is only used for tags with misspelled keys, not for wrong tag values with known keys.
Try with e.g. highway=residentail

Last edited 5 years ago by GerdP (previous) (diff)

by GerdP, 5 years ago

Attachment: filter_words.patch added

small hack to create a reduced version of words.cfg

by GerdP, 5 years ago

Attachment: shrink-words.cfg.patch added

patch to reduce and update words.cfg (some values added, many ignored values removed)

by GerdP, 5 years ago

Attachment: 17055-v2.patch added

comment:6 by GerdP, 5 years ago

17055-v2.patch improves:

  • reduced words.cfg
  • reduced memory footprint because harmonizedKeys no longer contains pairs where key and value are equal
  • if more than one alternative for a tag value is found, all are listed and auto fix is disabled
  • updated unit tests

My my gut feeling is that Utils.getLevenshteinDistance() will not work for the keys, but I'll try to code it later.

comment:7 by GerdP, 5 years ago

As expected we will get many false positives when we simply use all tag keys in the presets in combination with getLevenshteinDistance():

Key 'draft' looks like 'craft'. (3)
Key 'link' looks like 'line'. (1)
Key 'ref_name' looks like 'reg_name'. (59)
Key 'sat' looks like 'salt'. (49)
Key 'type2' looks like 'type'. (1)

Also performance is poor compared to the current code because we have to calculate the Levenshtein distance ~600 times for each key of each primitive when we use all keys from the presets.
I see possibilities:
1) Don't use getLevenshteinDistance() with keys
2) Specify a list of keys for which the getLevenshteinDistance() can be used. That could be a simple list or maybe an attribute in the presets xml.

I think I still prefer the solution implemented with 17055-v2.patch

comment:8 by stoecker, 5 years ago

Let's try it...

If bad we'll get reports soon 🤗

Version 0, edited 5 years ago by stoecker (next)

comment:9 by GerdP, 5 years ago

Resolution: fixed
Status: newclosed

In 14490/josm:

fix #17055 17055-v2.patch with one empty line removed

comment:10 by Klumbumbus, 5 years ago

How to disable clear false positives like Key 'ref_name' looks like 'reg_name'. (59)? Simply add ref_name as correctly spelled key to words.cfg?

comment:11 by GerdP, 5 years ago

These wrong positives were reported when I tried to ignore words.cfg (did not post a patch for that). You should not see them with r14490.
On the other hand I just noticed that typo highway=residentail is still not flagged :-(

by GerdP, 5 years ago

Attachment: 17055-enh-v1.patch added

by GerdP, 5 years ago

Attachment: 17055-enh-v2.patch added

previous version contained a bug

comment:12 by GerdP, 5 years ago

Resolution: fixed
Status: closedreopened

With 17055-enh-v2.patch I tried to improve TagChecker so that it also suggests a fix when the harmonized tag value looks like a truncated value listed in the presets.
Examples:
highway=u -> highway=unclassified

Problem: Many values are ambiguous, e.g.
highway=res could be residential or rest_area
highway=ser could be serivce or services
highway=tr could be track, trunk, or trunk_link
In those cases a warning with "Ambiguous truncated property value" is produced.
My goal here is not to find the most likely value but to produce a warning instead of information.
What do you think about that?

There is still a problem with the getLevenshteinDistance(). I see possibly wrong warnings like
Value '125' for key 'fire_hydrant:diameter' looks like one of [100, 150, 250].
Value 'none' for key 'sidewalk' looks like 'no'.

It is probably not a good idea to use that for short values.

in reply to:  12 comment:13 by Klumbumbus, 5 years ago

Replying to GerdP:

There is still a problem with the getLevenshteinDistance(). I see possibly wrong warnings like
Value '125' for key 'fire_hydrant:diameter' looks like one of [100, 150, 250].

fire_hydrant:diameter=125 is in the presets. The getLevenshteinDistance() should only be tested on tags which are not in the presets.

Value 'none' for key 'sidewalk' looks like 'no'.

none is not "officially deprecated", but no is the preferred value, so this warning would be ok in my opinion.

comment:14 by GerdP, 5 years ago

125 is not in the defaultpresets.xml. See

            <combo key="fire_hydrant:diameter" text="Diameter (in mm)" values="50,80,100,150,200,250,300,400" />

Taginfo says that sidewalk=no and sidewalk=none are nearly equally used. Maybe we can have both in the presets?

in reply to:  14 comment:15 by Klumbumbus, 5 years ago

Replying to GerdP:

125 is not in the defaultpresets.xml.

Right, sorry, I mixed it up with 150.

Taginfo says that sidewalk=no and sidewalk=none are nearly equally used. Maybe we can have both in the presets?

Looking at the history of the sidewalk wiki page the preferred tag changed several times over the years. We should have only one in our preset and stay with the current no.

comment:16 by GerdP, 5 years ago

In 14508/josm:

see #17055 improve TagChecker

  • allow Levenshtein distance of 2 so that residentail is found as typo
  • disable Fix for typos where the correct value is not known, e.g. services can be service or services
  • don't suggest values if all preset values are short, e.g.

layer=6 should not say "6 looks like one of [1,2,3,4,5]"

comment:17 by GerdP, 5 years ago

I didn't commit the changes reg. truncated values, they produced more noise than helpful warnings.
What I'd really like to have would be a flag that says if the list of preset values is quite complete and therefore other values are probably wrong.
This would allow to produce a warning when the value is not in the presets.
I simply don't like that we only produce an information for well known tag keys like highway, e.g. highway=dirt, highway=footpath or highway=u when we do all kinds of complex checks for highways like "Way end node near other highway".

An alternative woulld be to write a MapCSS check which lists all the values that appear in the presets, but I think that would mean a duplication of the lists.
I think a MapCSS rule is needed for something like highway=driveway, where the validator should suggest to replace it with
highway=service + service=driveway?

comment:18 by naoliv, 5 years ago

I am unsure if I am the only person who sometimes press Enter too quickly (before auto-completing a value with tab), but I usually have wrong surface=a or surface=u values in my data set.

For surface=u it's suggesting mud, while it most probably should be unpaved
Is it possible to have surface=asurface=asphalt too?

comment:19 by GerdP, 5 years ago

I coded something like that but was not satisfied with the results.
Can you explain why you have to press tab for autocompletion? I don't need to do that.

comment:20 by naoliv, 5 years ago

Next time I promise I will test exactly what I do before writing :-)
Indeed, I don't use tab but somehow I end up with surface=a and surface=u objects.
(and I will create some local rules to catch this an error)

Rectifying my previous message then, probably surface=u shouldn't be understood as surface=mud (nor offered an auto-fix for this specific case)

comment:21 by GerdP, 5 years ago

OK, I think JOSM should not suggest to fix a value with only one character with one that has three.
Reg. surface=a : You are right, it is probably always asphalt, but presets also contain artificial_turf, so JOSM would only show a list of these possible candidates, but it would not enable the Fix button. That's why I wasn't satisfied with my approach.
See also comment:12.

comment:22 by GerdP, 5 years ago

In 14517/josm:

see #17055 improve TagChecker

  • don't suggest fix value when given value is short and Levenshtein distance is 2

surface=u should not be fixed with surface=mud but highway=tra can be fixed with highway=track

comment:23 by naoliv, 5 years ago

A way with sidewalk=none is giving a warning saying that none seems to be no, but https://wiki.openstreetmap.org/wiki/Sidewalks#Sidewalk_as_refinement_to_a_highway says that "none is also used synonymously to no"

Probably there shouldn't be a warning for this case too.

in reply to:  23 comment:24 by GerdP, 5 years ago

Replying to naoliv:

A way with sidewalk=none is giving a warning saying that none seems to be no, but https://wiki.openstreetmap.org/wiki/Sidewalks#Sidewalk_as_refinement_to_a_highway says that "none is also used synonymously to no"

Probably there shouldn't be a warning for this case too.

I also noticed this, see comment:13 to comment:15. I also don't like the warning in this case, but I am unsure how to handle it.
I want to see the warning when one types e.g. oneway=none so one possible solution would be to add a line

K:sidewalk:no

to the end of file ignoretags.cfg.

comment:25 by GerdP, 5 years ago

In 14531/josm:

see #17055 improve TagChecker: Don't say "looks like" when all characters of given value are different, e.g. "Value '10' for key 'fee' looks like 'no'." makes no sense. The Levenshtein distance from 10 to no is only 2, but the length is also only 2.

comment:26 by naoliv, 5 years ago

shop=gas is warning with Value 'gas' for key 'shop' looks like one of [bag, car, yes].

Gerd, will this warning magically disappear if a preset is created for shop=gas?

comment:27 by GerdP, 5 years ago

Yes, I think so. Still, this should also be handled in the code. Looking at it...

comment:28 by GerdP, 5 years ago

In 14571/josm:

see #17055 When harmonized value is only 3 characters reduce allowed Levenshtein distance from 2 to 1.

When shop=gas is not in presets, we don't want a message like "Value 'gas' for key 'shop' looks like one of [bag, car, yes]."
just because one character of those values matches.

comment:29 by GerdP, 5 years ago

I wonder if we should flag all tag values with just one character (not a digit), so that we produce a warning instead of information for
e.g. highway=u or surface=a, but only for known preset keys.
We have a similar test that says "uncommon short key" for short unknown keys, e.g. for s=asphalt

comment:30 by naoliv, 5 years ago

shop=pasta is giving a warning saying that it looks like pastry, but they are different:

https://wiki.openstreetmap.org/wiki/Tag:shop%3Dpasta
https://wiki.openstreetmap.org/wiki/Tag:shop%3Dpastry

shop=party is giving a warning saying that it looks like [art, pastry] while it also shouldn't

https://wiki.openstreetmap.org/wiki/Tag:shop%3Dparty

comment:31 by GerdP, 5 years ago

I fear you will always find more here because the presets will probably never cover all cases.
Maybe we could create a list of "good" values for which we don't have a preset using taginfo? I think we already have a script TagInfoExtract.groovy to find possibly missing presets, but it seems the result is not stored anywhere.

comment:32 by naoliv, 5 years ago

The only thing that I fear is people pressing Fix without actually taking a look at the message and at the value (in this example, changing pasta to pastry)
Maybe it's safer to disable the autofix and only issue a warning?
Or, I don't know if it's possible, to have a fix option only for cases where there is no doubt at all.

comment:33 by GerdP, 5 years ago

OK, I'll remove the autofix option for now.

comment:34 by GerdP, 5 years ago

In 14585/josm:

see #17055 Disable autofix for possibly misspelled tag values

Some keys like shop have a long list of documented values and not all are in the presets. It can happen that a good value is flagged and a fix and an improper fix value is suggested.
This should be reverted once we have a better source for "correct" values. The presets are too incomplete.

comment:35 by GerdP, 5 years ago

Maybe I should also change the message text to something like
"Value '...' for key '...' is not in presets, did you mean '...'?"

in reply to:  31 ; comment:36 by Klumbumbus, 5 years ago

Replying to GerdP:

I fear you will always find more here because the presets will probably never cover all cases.
Maybe we could create a list of "good" values for which we don't have a preset using taginfo? I think we already have a script TagInfoExtract.groovy to find possibly missing presets, but it seems the result is not stored anywhere.

"Good" values which don't have a preset yet (e.g. due to low usage numbers like shop=pasta) can be added to source:/josm/trunk/data/validator/ignoretags.cfg Does your misspelled values test consider this list?

I think TagInfoExtract.groovy is used only for the integration test. (Currently tags without wiki page are ignored too, but this constraint may be removed after adding some tags to the presets and feeding the ignore list a bit, see #16900.)

in reply to:  36 ; comment:37 by GerdP, 5 years ago

Replying to Klumbumbus:

"Good" values which don't have a preset yet (e.g. due to low usage numbers like shop=pasta) can be added to source:/josm/trunk/data/validator/ignoretags.cfg Does your misspelled values test consider this list?

Well, it is not executed for the values in this list. Might not be the best way to consider the list.

I think TagInfoExtract.groovy is used only for the integration test. (Currently tags without wiki page are ignored too, but this constraint may be removed after adding some tags to the presets and feeding the ignore list a bit, see #16900.)

The list doesn't contain the examples shop=party or shop=pastry, so we'd probably need a longer list.

in reply to:  37 comment:38 by Klumbumbus, 5 years ago

Replying to GerdP:

Replying to Klumbumbus:

"Good" values which don't have a preset yet (e.g. due to low usage numbers like shop=pasta) can be added to source:/josm/trunk/data/validator/ignoretags.cfg Does your misspelled values test consider this list?

Well, it is not executed for the values in this list. Might not be the best way to consider the list.

Yes, with "consider" I meant that the test is not executed for the values in this list.

I think TagInfoExtract.groovy is used only for the integration test. (Currently tags without wiki page are ignored too, but this constraint may be removed after adding some tags to the presets and feeding the ignore list a bit, see #16900.)

The list doesn't contain the examples shop=party or shop=pastry, so we'd probably need a longer list.

Yes these two (and probably some more we don't know atm) should be added to the list to remove the false positive warnings.

in reply to:  35 comment:39 by Klumbumbus, 5 years ago

Replying to GerdP:

Maybe I should also change the message text to something like
"Value '...' for key '...' is not in presets, did you mean '...'?"

That sounds better. (Maybe ...in the presets...)

comment:40 by GerdP, 5 years ago

OK, will change it. Reg. the list I think about a different approach:
We could generate a list of all frequently used tags and run them through TagChecker. All false positives should be added to
a special list which would than be used to suggest correct spelling. The test is not so much about correct tagging, so I would not care if e.g. sidewalk=none or sidewalk=no is in the presets as long as both values are often used.
How does that sound?

comment:41 by Klumbumbus, 5 years ago

Sounds good, however it might be hard to find an proper automatic rule. How to define "often used"?

comment:42 by GerdP, 5 years ago

Probably there is no simple answer. As a first approach I'd say all tags used > 1000 times are often used. Let me code something to process such a list, I probably cannot help filling it.

Thought about the warning text again. The "you" is not so good when someone else created the tag.
Next try:

single value in {2} :   "Value ''{0}'' for key ''{1}'' is not in the presets, maybe ''{2}'' is meant?"
multiple values in {2}: "Value ''{0}'' for key ''{1}'' is not in the presets, maybe one of {2} is meant?"
Last edited 5 years ago by GerdP (previous) (diff)

by GerdP, 5 years ago

Attachment: 17055-use-ignore.patch added

use values with prefix K: for checks with Levenshtein distance, too

comment:43 by GerdP, 5 years ago

With the 17055-use-ignore.patch the lines from ignoretags.cfg starting with K: are treated as "often used".
So, if you add K:shop=party to it
and run the validator with shop=party it will be quiet, for shop=panty it will show

Unknown property value - Value 'panty' for key 'shop' is not in the presets, maybe 'party' is meant? (1)

and for shop=pasty you get

Unknown property value - Value 'pasty' for key 'shop' is not in the presets, maybe one of [party, pastry] is meant? (1)

open questions:
1) Should we use another file for the often used values? If yes, should we use the "K:" prefix?
2) Should we always raise a warning when a tag key is found in that list but not the value?
Depending on this the message texts would change again, I think "not in the presets" would then be "unknown".

comment:44 by Don-vip, 5 years ago

Milestone: 19.01

in reply to:  42 comment:45 by simon04, 5 years ago

comment:46 by GerdP, 5 years ago

Yes, it is close to what we need, but it stops for value count < 10000. I think that's too early.

comment:47 by simon04, 5 years ago

Alternatively, you could download the entire SQLite database from https://taginfo.openstreetmap.org/download and perform custom queries.

by GerdP, 5 years ago

Attachment: frequent-tags.zip added

comment:48 by GerdP, 5 years ago

Thanks for the hint and sorry for the late answer. I've downloaded the complete DB (1.1GB) and played with a few SQL to extract frequently used tags.
My current thinking is this:
Execute a rather simple sql like

'select * from tags where count_all > 1000';

or maybe a bit more complex:

select * from tags where count_all > 1000 and key not like '%name%' and key not like '%source%' and key not like 'tiger%' and key not like 'yh:%' and key not like 'addr:%' and key not like 'is_in%' and key not like 'contact:%' and key not like 'turn:lanes%' and key not in ('circumference','color','colour','created_by','description','ele','fixme','height','width','level','opening_hours','website');

to extract a list of frequently used tags, lets name it frequent-tags.txt. Attached is the result for the latter.
The table tags looks like this:

CREATE TABLE tags (
  key              VARCHAR,
  value            VARCHAR,
  count_all        INTEGER DEFAULT 0,
  count_nodes      INTEGER DEFAULT 0,
  count_ways       INTEGER DEFAULT 0,
  count_relations  INTEGER DEFAULT 0,
  in_wiki          INTEGER DEFAULT 0,
  in_wiki_en       INTEGER DEFAULT 0
);

The result contains the columns separated by |.
My current thinking is this: I could either try to improve the sql so that it filters more tags which are not usefull for the Levenshtein text ("val looks like ...") so that we can add that result directly to the data\validator directory and parse it on each start of JOSM like we do with ignoretags.cfg. The result could be some HashSets, maybe separated for nodes, ways, and relations, which would be used to suggest different spelling.
I'll try to code that now. Question is how the input file is updated.
I think it would be best if that file would be created on the taginfo server once we know exactly how the format should look like?

comment:49 by GerdP, 5 years ago

I've played with this for a while now. I have a list of frequently used tags where the values are strings. Many of them are also in the presets, BUT also many are not.

I don't think that I can simply use the list from taginfo without review, as it sometimes contains bad data from imports.
Example: leaf_type=broadl-leaved appears > 27000 times but is simply wrong (should be broadleaved)

I don't want to produce a hint like
Unknown property value - Value 'broadlleaved' for key 'leaf_type' is unknown, maybe one of [broadl-leaved, broadleaved] is meant? (1)
So the list has to be checked carefully.
I fear this will take some more days to produce a good start.
I think about involving the community here.

comment:50 by GerdP, 5 years ago

In 14739/josm:

see #17055 add K:natural=shrub to suppress false positive in validator

comment:51 by GerdP, 5 years ago

In 14750/josm:

see #17055:

  • update description for lines starting with K:
  • add K:craft=tailor and K:shop=pasta to suppress false positives:

Value 'tailor' for key 'craft' is unknown, maybe 'tiler' is meant?
Value 'pasta' for key 'shop' is unknown, maybe 'pastry' is meant?

comment:52 by GerdP, 5 years ago

I think about adding these keys to ignoredtags.cfg (with prefix K:)

building=shop
building=chalet
building=clinic
crop=rape
crossing=marked
crossing=controlled
highway=trail
highway=ford
indoor=door
natural=land
natural=hill
natural=arete
natural=slope
piste:grooming=classic+skating
power=sub_station
product=wine
railway=derail
railway=radio
resource=sand
shop=trade
sidewalk=none
sport=skating
surface=hard

They all produce false warnings:

Value 'classic+skating' for key 'piste:grooming' is unknown, maybe 'classic;skating' is meant? (1)
Value 'clinic' for key 'building' is unknown, maybe 'civic' is meant? (1)
Value 'controlled' for key 'crossing' is unknown, maybe 'uncontrolled' is meant? (1)
Value 'door' for key 'indoor' is unknown, maybe 'room' is meant? (1)
Value 'ford' for key 'highway' is unknown, maybe 'road' is meant? (1)
Value 'hard' for key 'surface' is unknown, maybe 'sand' is meant? (1)
Value 'none' for key 'sidewalk' is unknown, maybe 'no' is meant? (1)
Value 'radio' for key 'railway' is unknown, maybe 'rail' is meant? (1)
Value 'rape' for key 'crop' is unknown, maybe 'grape' is meant? (1)
Value 'sand' for key 'resource' is unknown, maybe 'salt' is meant? (1)
Value 'skating' for key 'sport' is unknown, maybe 'karting' is meant? (1)
Value 'slope' for key 'natural' is unknown, maybe 'stone' is meant? (1)
Value 'sub_station' for key 'power' is unknown, maybe 'substation' is meant? (1)
Value 'trade' for key 'shop' is unknown, maybe 'frame' is meant? (1)
Value 'wine' for key 'product' is unknown, maybe 'lime' is meant? (1)

Any problems with that?

in reply to:  52 comment:53 by Klumbumbus, 5 years ago

Replying to GerdP:

Any problems with that?

No.

comment:54 by GerdP, 5 years ago

In 14751/josm:

See #17055: Add more often used tags to ignoretags.cfg to suppress false positives

by GerdP, 5 years ago

Attachment: extract_1000.sql added

by GerdP, 5 years ago

Attachment: extract.script added

comment:55 by GerdP, 5 years ago

To find this list I did this:
1) install Sqlite3: (unzip sqlite-tools-win32-x86-3260000.zip and sqlite-dll-win64-x64-3260000.zip)
2) download and uncopress taginfo db.db.bz2 from https://taginfo.openstreetmap.org/download/taginfo-db.db.bz2(uncompressed size is ~6.5GB)
3) create attached simple sql and script. I also created a much more complex sql to filter all kinds of tags like names etc but decided that it would be difficult to maintain.
4) run Sqlite3 with sqlite3 -readonly -init extract.script taginfo-db.db
5) type ".quit" followed by ENTER to get out of Sqlite3 (I did not yet dind a way to automate that)
6) copy the result file frequent-tags.cfg (~ 2.6 MB) to \josm\nodist\data
7) execute unit test TaginfoTestIT with attached patch TaginfoTestIT.patch
This produces a list of possible false positives which has to be checked by someone who really understands the meaning of the tags,
I see no way to automate that last step. One has to know the tags and maybe also the results produced by other JOSM tests to decide what to do with this list. Maybe we could feed that list to a web page for review by the community?

The rest might be automated, esp. with a little help from taginfo to avoid the download of the full db.

Last edited 4 years ago by GerdP (previous) (diff)

comment:56 by Don-vip, 5 years ago

Milestone: 19.0119.02

comment:57 by Don-vip, 5 years ago

Milestone: 19.0219.03

comment:58 by GerdP, 5 years ago

In 14881/josm:

see #17055: Suppress warning
Unknown property value - Value '12' for key 'capacity:disabled' is unknown, maybe one of [1, 2] is meant?

comment:59 by Don-vip, 5 years ago

What's the status of this ticket?

comment:60 by mkoniecz, 5 years ago

It seems to be working, for example shop=tyres vs shop=tires.

comment:61 by GerdP, 5 years ago

I left it open for possible false positives. We can also close this ticket and maybe create a new one which collects the false positives.

comment:62 by Klumbumbus, 5 years ago

Resolution: fixed
Status: reopenedclosed

Yes. The main implementation was done. If new false positives pop up we can use new tickets or simply reference this one.

comment:63 by GerdP, 5 years ago

In 15119/josm:

See #17055: Don't produce warning "Unknown property value - Value 'bank' for key 'building' is unknown, maybe
'barn' is meant?"
See also http://gis.19327.n8.nabble.com/building-bank-tp5940445.html

by GerdP, 4 years ago

Attachment: TaginfoTestIT.patch added

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain team.
as The resolution will be set.
The resolution will be deleted. Next status will be 'reopened'.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.