Modify

Opened 4 months ago

Last modified 12 days ago

#17055 reopened enhancement

[Patch] Validator should complain about simple typos in tag values

Reported by: GerdP Owned by: team
Priority: normal Milestone: 19.03
Component: Core validator Version:
Keywords: template_report Cc:

Description

What steps will reproduce the problem?

  1. Create way with typo like highway=residental (missing 2nd i)
  2. Run validator

What is the expected result?

Warning Misspelled property value - Value 'residental' for key 'highway' looks like 'residential'. (1)

What happens instead?

Only an informational message is produced:
Presets do not contain property value - Value 'residental' for key 'highway' not in presets. (1)

Please provide any additional information below. Attach a screenshot if possible.

See also https://josm.openstreetmap.de/ticket/15203#comment:31
The attached patch uses Utils.getLevenshteinDistance() to detect typical typos and propose a fix.
I found one possible problem so far:
When you type highway=servics it suggests highway=services

Build-Date:2018-12-01 17:14:34
Revision:14478
Is-Local-Build:true

Identification: JOSM/1.5 (14478 SVN en) Windows 10 64-Bit
OS Build number: Windows 10 Home 1803 (17134)
Memory Usage: 1437 MB / 3641 MB (388 MB allocated, but free)
Java version: 1.8.0_191-b12, Oracle Corporation, Java HotSpot(TM) 64-Bit Server VM
Screen: \Display0 1920x1080
Maximum Screen Size: 1920x1080
VM arguments: [-agentlib:jdwp=transport=dt_socket,suspend=y,address=localhost:51821, -ea, -Dfile.encoding=UTF-8]
Dataset consistency test: No problems found

Plugins:
+ OpeningHoursEditor (34535)
+ apache-commons (34506)
+ buildings_tools (34724)
+ download_along (34503)
+ ejml (34389)
+ geotools (34513)
+ jaxb (34506)
+ jts (34524)
+ merge-overlap (34664)
+ o5m (34405)
+ opendata (34698)
+ pbf (34576)
+ poly (34546)
+ reverter (34552)
+ undelete (34568)
+ utilsplugin2 (34506)

Last errors/warnings:
- W: Update plugins - org.openstreetmap.josm.plugins.PluginHandler$UpdatePluginsMessagePanel[,0,0,0x0,invalid,layout=java.awt.GridBagLayout,alignmentX=0.0,alignmentY=0.0,border=,flags=9,maximumSize=,minimumSize=,preferredSize=]
- W: No configuration settings found.  Using hardcoded default values for all pools.
- W: Failed to locate resource '/README'.
- W: Failed to locate resource '/CONTRIBUTION'.
- W: Failed to locate resource '/LICENSE'.

Attachments (11)

17055-v1.patch (1.5 KB) - added by GerdP 4 months ago.
filter_words.patch (2.0 KB) - added by GerdP 4 months ago.
small hack to create a reduced version of words.cfg
shrink-words.cfg.patch (13.2 KB) - added by GerdP 4 months ago.
patch to reduce and update words.cfg (some values added, many ignored values removed)
17055-v2.patch (20.4 KB) - added by GerdP 4 months ago.
17055-enh-v1.patch (12.9 KB) - added by GerdP 4 months ago.
17055-enh-v2.patch (14.2 KB) - added by GerdP 4 months ago.
previous version contained a bug
17055-use-ignore.patch (14.7 KB) - added by GerdP 3 months ago.
use values with prefix K: for checks with Levenshtein distance, too
frequent-tags.zip (168.2 KB) - added by GerdP 2 months ago.
TaginfoTestIT.patch (3.4 KB) - added by GerdP 7 weeks ago.
extract_1000.sql (131 bytes) - added by GerdP 7 weeks ago.
extract.script (92 bytes) - added by GerdP 7 weeks ago.

Download all attachments as: .zip

Change History (69)

Changed 4 months ago by GerdP

Attachment: 17055-v1.patch added

comment:1 Changed 4 months ago by Don-vip

If we go this way we should clean the words.cfg file (TagChecker.SPELL_FILE) and make sure there is no regression.

Last edited 4 months ago by Don-vip (previous) (diff)

comment:2 Changed 4 months ago by GerdP

Sounds good. I'll have a closer look tomorrow.

comment:3 in reply to:  1 Changed 4 months ago by stoecker

Replying to Don-vip:

If we go this way we should clean the words.cfg file (TagChecker.SPELL_FILE) and make sure there is no regression.

Remove it? That has not be maintained for ages. I think this is from the time before we had real presets. The approach of this tickets sounds more like a modern approach comparing the input to valid data.

comment:4 Changed 4 months ago by Don-vip

Removing it is the ultimate cleaning :) We just need to check we don't loose functionality (by sampling values).

comment:5 Changed 4 months ago by GerdP

My understanding is that words.cfg is only used for tags with misspelled keys, not for wrong tag values with known keys.
Try with e.g. highway=residentail

Last edited 4 months ago by GerdP (previous) (diff)

Changed 4 months ago by GerdP

Attachment: filter_words.patch added

small hack to create a reduced version of words.cfg

Changed 4 months ago by GerdP

Attachment: shrink-words.cfg.patch added

patch to reduce and update words.cfg (some values added, many ignored values removed)

Changed 4 months ago by GerdP

Attachment: 17055-v2.patch added

comment:6 Changed 4 months ago by GerdP

17055-v2.patch improves:

  • reduced words.cfg
  • reduced memory footprint because harmonizedKeys no longer contains pairs where key and value are equal
  • if more than one alternative for a tag value is found, all are listed and auto fix is disabled
  • updated unit tests

My my gut feeling is that Utils.getLevenshteinDistance() will not work for the keys, but I'll try to code it later.

comment:7 Changed 4 months ago by GerdP

As expected we will get many false positives when we simply use all tag keys in the presets in combination with getLevenshteinDistance():

Key 'draft' looks like 'craft'. (3)
Key 'link' looks like 'line'. (1)
Key 'ref_name' looks like 'reg_name'. (59)
Key 'sat' looks like 'salt'. (49)
Key 'type2' looks like 'type'. (1)

Also performance is poor compared to the current code because we have to calculate the Levenshtein distance ~600 times for each key of each primitive when we use all keys from the presets.
I see possibilities:
1) Don't use getLevenshteinDistance() with keys
2) Specify a list of keys for which the getLevenshteinDistance() can be used. That could be a simple list or maybe an attribute in the presets xml.

I think I still prefer the solution implemented with 17055-v2.patch

comment:8 Changed 4 months ago by stoecker

Let's try it (your suggestion) ...

If bad we'll get reports soon 🤗

Last edited 4 months ago by stoecker (previous) (diff)

comment:9 Changed 4 months ago by GerdP

Resolution: fixed
Status: newclosed

In 14490/josm:

fix #17055 17055-v2.patch with one empty line removed

comment:10 Changed 4 months ago by Klumbumbus

How to disable clear false positives like Key 'ref_name' looks like 'reg_name'. (59)? Simply add ref_name as correctly spelled key to words.cfg?

comment:11 Changed 4 months ago by GerdP

These wrong positives were reported when I tried to ignore words.cfg (did not post a patch for that). You should not see them with r14490.
On the other hand I just noticed that typo highway=residentail is still not flagged :-(

Changed 4 months ago by GerdP

Attachment: 17055-enh-v1.patch added

Changed 4 months ago by GerdP

Attachment: 17055-enh-v2.patch added

previous version contained a bug

comment:12 Changed 4 months ago by GerdP

Resolution: fixed
Status: closedreopened

With 17055-enh-v2.patch I tried to improve TagChecker so that it also suggests a fix when the harmonized tag value looks like a truncated value listed in the presets.
Examples:
highway=u -> highway=unclassified

Problem: Many values are ambiguous, e.g.
highway=res could be residential or rest_area
highway=ser could be serivce or services
highway=tr could be track, trunk, or trunk_link
In those cases a warning with "Ambiguous truncated property value" is produced.
My goal here is not to find the most likely value but to produce a warning instead of information.
What do you think about that?

There is still a problem with the getLevenshteinDistance(). I see possibly wrong warnings like
Value '125' for key 'fire_hydrant:diameter' looks like one of [100, 150, 250].
Value 'none' for key 'sidewalk' looks like 'no'.

It is probably not a good idea to use that for short values.

comment:13 in reply to:  12 Changed 4 months ago by Klumbumbus

Replying to GerdP:

There is still a problem with the getLevenshteinDistance(). I see possibly wrong warnings like
Value '125' for key 'fire_hydrant:diameter' looks like one of [100, 150, 250].

fire_hydrant:diameter=125 is in the presets. The getLevenshteinDistance() should only be tested on tags which are not in the presets.

Value 'none' for key 'sidewalk' looks like 'no'.

none is not "officially deprecated", but no is the preferred value, so this warning would be ok in my opinion.

comment:14 Changed 4 months ago by GerdP

125 is not in the defaultpresets.xml. See

            <combo key="fire_hydrant:diameter" text="Diameter (in mm)" values="50,80,100,150,200,250,300,400" />

Taginfo says that sidewalk=no and sidewalk=none are nearly equally used. Maybe we can have both in the presets?

comment:15 in reply to:  14 Changed 4 months ago by Klumbumbus

Replying to GerdP:

125 is not in the defaultpresets.xml.

Right, sorry, I mixed it up with 150.

Taginfo says that sidewalk=no and sidewalk=none are nearly equally used. Maybe we can have both in the presets?

Looking at the history of the sidewalk wiki page the preferred tag changed several times over the years. We should have only one in our preset and stay with the current no.

comment:16 Changed 4 months ago by GerdP

In 14508/josm:

see #17055 improve TagChecker

  • allow Levenshtein distance of 2 so that residentail is found as typo
  • disable Fix for typos where the correct value is not known, e.g. services can be service or services
  • don't suggest values if all preset values are short, e.g.

layer=6 should not say "6 looks like one of [1,2,3,4,5]"

comment:17 Changed 4 months ago by GerdP

I didn't commit the changes reg. truncated values, they produced more noise than helpful warnings.
What I'd really like to have would be a flag that says if the list of preset values is quite complete and therefore other values are probably wrong.
This would allow to produce a warning when the value is not in the presets.
I simply don't like that we only produce an information for well known tag keys like highway, e.g. highway=dirt, highway=footpath or highway=u when we do all kinds of complex checks for highways like "Way end node near other highway".

An alternative woulld be to write a MapCSS check which lists all the values that appear in the presets, but I think that would mean a duplication of the lists.
I think a MapCSS rule is needed for something like highway=driveway, where the validator should suggest to replace it with
highway=service + service=driveway?

comment:18 Changed 4 months ago by naoliv

I am unsure if I am the only person who sometimes press Enter too quickly (before auto-completing a value with tab), but I usually have wrong surface=a or surface=u values in my data set.

For surface=u it's suggesting mud, while it most probably should be unpaved
Is it possible to have surface=asurface=asphalt too?

comment:19 Changed 4 months ago by GerdP

I coded something like that but was not satisfied with the results.
Can you explain why you have to press tab for autocompletion? I don't need to do that.

comment:20 Changed 4 months ago by naoliv

Next time I promise I will test exactly what I do before writing :-)
Indeed, I don't use tab but somehow I end up with surface=a and surface=u objects.
(and I will create some local rules to catch this an error)

Rectifying my previous message then, probably surface=u shouldn't be understood as surface=mud (nor offered an auto-fix for this specific case)

comment:21 Changed 4 months ago by GerdP

OK, I think JOSM should not suggest to fix a value with only one character with one that has three.
Reg. surface=a : You are right, it is probably always asphalt, but presets also contain artificial_turf, so JOSM would only show a list of these possible candidates, but it would not enable the Fix button. That's why I wasn't satisfied with my approach.
See also comment:12.

comment:22 Changed 4 months ago by GerdP

In 14517/josm:

see #17055 improve TagChecker

  • don't suggest fix value when given value is short and Levenshtein distance is 2

surface=u should not be fixed with surface=mud but highway=tra can be fixed with highway=track

comment:23 Changed 4 months ago by naoliv

A way with sidewalk=none is giving a warning saying that none seems to be no, but https://wiki.openstreetmap.org/wiki/Sidewalks#Sidewalk_as_refinement_to_a_highway says that "none is also used synonymously to no"

Probably there shouldn't be a warning for this case too.

comment:24 in reply to:  23 Changed 3 months ago by GerdP

Replying to naoliv:

A way with sidewalk=none is giving a warning saying that none seems to be no, but https://wiki.openstreetmap.org/wiki/Sidewalks#Sidewalk_as_refinement_to_a_highway says that "none is also used synonymously to no"

Probably there shouldn't be a warning for this case too.

I also noticed this, see comment:13 to comment:15. I also don't like the warning in this case, but I am unsure how to handle it.
I want to see the warning when one types e.g. oneway=none so one possible solution would be to add a line

K:sidewalk:no

to the end of file ignoretags.cfg.

comment:25 Changed 3 months ago by GerdP

In 14531/josm:

see #17055 improve TagChecker: Don't say "looks like" when all characters of given value are different, e.g. "Value '10' for key 'fee' looks like 'no'." makes no sense. The Levenshtein distance from 10 to no is only 2, but the length is also only 2.

comment:26 Changed 3 months ago by naoliv

shop=gas is warning with Value 'gas' for key 'shop' looks like one of [bag, car, yes].

Gerd, will this warning magically disappear if a preset is created for shop=gas?

comment:27 Changed 3 months ago by GerdP

Yes, I think so. Still, this should also be handled in the code. Looking at it...

comment:28 Changed 3 months ago by GerdP

In 14571/josm:

see #17055 When harmonized value is only 3 characters reduce allowed Levenshtein distance from 2 to 1.

When shop=gas is not in presets, we don't want a message like "Value 'gas' for key 'shop' looks like one of [bag, car, yes]."
just because one character of those values matches.

comment:29 Changed 3 months ago by GerdP

I wonder if we should flag all tag values with just one character (not a digit), so that we produce a warning instead of information for
e.g. highway=u or surface=a, but only for known preset keys.
We have a similar test that says "uncommon short key" for short unknown keys, e.g. for s=asphalt

comment:30 Changed 3 months ago by naoliv

shop=pasta is giving a warning saying that it looks like pastry, but they are different:

https://wiki.openstreetmap.org/wiki/Tag:shop%3Dpasta
https://wiki.openstreetmap.org/wiki/Tag:shop%3Dpastry

shop=party is giving a warning saying that it looks like [art, pastry] while it also shouldn't

https://wiki.openstreetmap.org/wiki/Tag:shop%3Dparty

comment:31 Changed 3 months ago by GerdP

I fear you will always find more here because the presets will probably never cover all cases.
Maybe we could create a list of "good" values for which we don't have a preset using taginfo? I think we already have a script TagInfoExtract.groovy to find possibly missing presets, but it seems the result is not stored anywhere.

comment:32 Changed 3 months ago by naoliv

The only thing that I fear is people pressing Fix without actually taking a look at the message and at the value (in this example, changing pasta to pastry)
Maybe it's safer to disable the autofix and only issue a warning?
Or, I don't know if it's possible, to have a fix option only for cases where there is no doubt at all.

comment:33 Changed 3 months ago by GerdP

OK, I'll remove the autofix option for now.

comment:34 Changed 3 months ago by GerdP

In 14585/josm:

see #17055 Disable autofix for possibly misspelled tag values

Some keys like shop have a long list of documented values and not all are in the presets. It can happen that a good value is flagged and a fix and an improper fix value is suggested.
This should be reverted once we have a better source for "correct" values. The presets are too incomplete.

comment:35 Changed 3 months ago by GerdP

Maybe I should also change the message text to something like
"Value '...' for key '...' is not in presets, did you mean '...'?"

comment:36 in reply to:  31 ; Changed 3 months ago by Klumbumbus

Replying to GerdP:

I fear you will always find more here because the presets will probably never cover all cases.
Maybe we could create a list of "good" values for which we don't have a preset using taginfo? I think we already have a script TagInfoExtract.groovy to find possibly missing presets, but it seems the result is not stored anywhere.

"Good" values which don't have a preset yet (e.g. due to low usage numbers like shop=pasta) can be added to source:/josm/trunk/data/validator/ignoretags.cfg Does your misspelled values test consider this list?

I think TagInfoExtract.groovy is used only for the integration test. (Currently tags without wiki page are ignored too, but this constraint may be removed after adding some tags to the presets and feeding the ignore list a bit, see #16900.)

comment:37 in reply to:  36 ; Changed 3 months ago by GerdP

Replying to Klumbumbus:

"Good" values which don't have a preset yet (e.g. due to low usage numbers like shop=pasta) can be added to source:/josm/trunk/data/validator/ignoretags.cfg Does your misspelled values test consider this list?

Well, it is not executed for the values in this list. Might not be the best way to consider the list.

I think TagInfoExtract.groovy is used only for the integration test. (Currently tags without wiki page are ignored too, but this constraint may be removed after adding some tags to the presets and feeding the ignore list a bit, see #16900.)

The list doesn't contain the examples shop=party or shop=pastry, so we'd probably need a longer list.

comment:38 in reply to:  37 Changed 3 months ago by Klumbumbus

Replying to GerdP:

Replying to Klumbumbus:

"Good" values which don't have a preset yet (e.g. due to low usage numbers like shop=pasta) can be added to source:/josm/trunk/data/validator/ignoretags.cfg Does your misspelled values test consider this list?

Well, it is not executed for the values in this list. Might not be the best way to consider the list.

Yes, with "consider" I meant that the test is not executed for the values in this list.

I think TagInfoExtract.groovy is used only for the integration test. (Currently tags without wiki page are ignored too, but this constraint may be removed after adding some tags to the presets and feeding the ignore list a bit, see #16900.)

The list doesn't contain the examples shop=party or shop=pastry, so we'd probably need a longer list.

Yes these two (and probably some more we don't know atm) should be added to the list to remove the false positive warnings.

comment:39 in reply to:  35 Changed 3 months ago by Klumbumbus

Replying to GerdP:

Maybe I should also change the message text to something like
"Value '...' for key '...' is not in presets, did you mean '...'?"

That sounds better. (Maybe ...in the presets...)

comment:40 Changed 3 months ago by GerdP

OK, will change it. Reg. the list I think about a different approach:
We could generate a list of all frequently used tags and run them through TagChecker. All false positives should be added to
a special list which would than be used to suggest correct spelling. The test is not so much about correct tagging, so I would not care if e.g. sidewalk=none or sidewalk=no is in the presets as long as both values are often used.
How does that sound?

comment:41 Changed 3 months ago by Klumbumbus

Sounds good, however it might be hard to find an proper automatic rule. How to define "often used"?

comment:42 Changed 3 months ago by GerdP

Probably there is no simple answer. As a first approach I'd say all tags used > 1000 times are often used. Let me code something to process such a list, I probably cannot help filling it.

Thought about the warning text again. The "you" is not so good when someone else created the tag.
Next try:

single value in {2} :   "Value ''{0}'' for key ''{1}'' is not in the presets, maybe ''{2}'' is meant?"
multiple values in {2}: "Value ''{0}'' for key ''{1}'' is not in the presets, maybe one of {2} is meant?"
Last edited 3 months ago by GerdP (previous) (diff)

Changed 3 months ago by GerdP

Attachment: 17055-use-ignore.patch added

use values with prefix K: for checks with Levenshtein distance, too

comment:43 Changed 3 months ago by GerdP

With the 17055-use-ignore.patch the lines from ignoretags.cfg starting with K: are treated as "often used".
So, if you add K:shop=party to it
and run the validator with shop=party it will be quiet, for shop=panty it will show

Unknown property value - Value 'panty' for key 'shop' is not in the presets, maybe 'party' is meant? (1)

and for shop=pasty you get

Unknown property value - Value 'pasty' for key 'shop' is not in the presets, maybe one of [party, pastry] is meant? (1)

open questions:
1) Should we use another file for the often used values? If yes, should we use the "K:" prefix?
2) Should we always raise a warning when a tag key is found in that list but not the value?
Depending on this the message texts would change again, I think "not in the presets" would then be "unknown".

comment:44 Changed 3 months ago by Don-vip

Milestone: 19.01

comment:45 in reply to:  42 Changed 3 months ago by simon04

comment:46 Changed 3 months ago by GerdP

Yes, it is close to what we need, but it stops for value count < 10000. I think that's too early.

comment:47 Changed 3 months ago by simon04

Alternatively, you could download the entire SQLite database from https://taginfo.openstreetmap.org/download and perform custom queries.

Changed 2 months ago by GerdP

Attachment: frequent-tags.zip added

comment:48 Changed 2 months ago by GerdP

Thanks for the hint and sorry for the late answer. I've downloaded the complete DB (1.1GB) and played with a few SQL to extract frequently used tags.
My current thinking is this:
Execute a rather simple sql like

'select * from tags where count_all > 1000';

or maybe a bit more complex:

select * from tags where count_all > 1000 and key not like '%name%' and key not like '%source%' and key not like 'tiger%' and key not like 'yh:%' and key not like 'addr:%' and key not like 'is_in%' and key not like 'contact:%' and key not like 'turn:lanes%' and key not in ('circumference','color','colour','created_by','description','ele','fixme','height','width','level','opening_hours','website');

to extract a list of frequently used tags, lets name it frequent-tags.txt. Attached is the result for the latter.
The table tags looks like this:

CREATE TABLE tags (
  key              VARCHAR,
  value            VARCHAR,
  count_all        INTEGER DEFAULT 0,
  count_nodes      INTEGER DEFAULT 0,
  count_ways       INTEGER DEFAULT 0,
  count_relations  INTEGER DEFAULT 0,
  in_wiki          INTEGER DEFAULT 0,
  in_wiki_en       INTEGER DEFAULT 0
);

The result contains the columns separated by |.
My current thinking is this: I could either try to improve the sql so that it filters more tags which are not usefull for the Levenshtein text ("val looks like ...") so that we can add that result directly to the data\validator directory and parse it on each start of JOSM like we do with ignoretags.cfg. The result could be some HashSets, maybe separated for nodes, ways, and relations, which would be used to suggest different spelling.
I'll try to code that now. Question is how the input file is updated.
I think it would be best if that file would be created on the taginfo server once we know exactly how the format should look like?

comment:49 Changed 2 months ago by GerdP

I've played with this for a while now. I have a list of frequently used tags where the values are strings. Many of them are also in the presets, BUT also many are not.

I don't think that I can simply use the list from taginfo without review, as it sometimes contains bad data from imports.
Example: leaf_type=broadl-leaved appears > 27000 times but is simply wrong (should be broadleaved)

I don't want to produce a hint like
Unknown property value - Value 'broadlleaved' for key 'leaf_type' is unknown, maybe one of [broadl-leaved, broadleaved] is meant? (1)
So the list has to be checked carefully.
I fear this will take some more days to produce a good start.
I think about involving the community here.

comment:50 Changed 8 weeks ago by GerdP

In 14739/josm:

see #17055 add K:natural=shrub to suppress false positive in validator

comment:51 Changed 8 weeks ago by GerdP

In 14750/josm:

see #17055:

  • update description for lines starting with K:
  • add K:craft=tailor and K:shop=pasta to suppress false positives:

Value 'tailor' for key 'craft' is unknown, maybe 'tiler' is meant?
Value 'pasta' for key 'shop' is unknown, maybe 'pastry' is meant?

comment:52 Changed 7 weeks ago by GerdP

I think about adding these keys to ignoredtags.cfg (with prefix K:)

building=shop
building=chalet
building=clinic
crop=rape
crossing=marked
crossing=controlled
highway=trail
highway=ford
indoor=door
natural=land
natural=hill
natural=arete
natural=slope
piste:grooming=classic+skating
power=sub_station
product=wine
railway=derail
railway=radio
resource=sand
shop=trade
sidewalk=none
sport=skating
surface=hard

They all produce false warnings:

Value 'classic+skating' for key 'piste:grooming' is unknown, maybe 'classic;skating' is meant? (1)
Value 'clinic' for key 'building' is unknown, maybe 'civic' is meant? (1)
Value 'controlled' for key 'crossing' is unknown, maybe 'uncontrolled' is meant? (1)
Value 'door' for key 'indoor' is unknown, maybe 'room' is meant? (1)
Value 'ford' for key 'highway' is unknown, maybe 'road' is meant? (1)
Value 'hard' for key 'surface' is unknown, maybe 'sand' is meant? (1)
Value 'none' for key 'sidewalk' is unknown, maybe 'no' is meant? (1)
Value 'radio' for key 'railway' is unknown, maybe 'rail' is meant? (1)
Value 'rape' for key 'crop' is unknown, maybe 'grape' is meant? (1)
Value 'sand' for key 'resource' is unknown, maybe 'salt' is meant? (1)
Value 'skating' for key 'sport' is unknown, maybe 'karting' is meant? (1)
Value 'slope' for key 'natural' is unknown, maybe 'stone' is meant? (1)
Value 'sub_station' for key 'power' is unknown, maybe 'substation' is meant? (1)
Value 'trade' for key 'shop' is unknown, maybe 'frame' is meant? (1)
Value 'wine' for key 'product' is unknown, maybe 'lime' is meant? (1)

Any problems with that?

comment:53 in reply to:  52 Changed 7 weeks ago by Klumbumbus

Replying to GerdP:

Any problems with that?

No.

comment:54 Changed 7 weeks ago by GerdP

In 14751/josm:

See #17055: Add more often used tags to ignoretags.cfg to suppress false positives

Changed 7 weeks ago by GerdP

Attachment: TaginfoTestIT.patch added

Changed 7 weeks ago by GerdP

Attachment: extract_1000.sql added

Changed 7 weeks ago by GerdP

Attachment: extract.script added

comment:55 Changed 7 weeks ago by GerdP

To find this list I did this:
1) install Sqlite3: (unzip sqlite-tools-win32-x86-3260000.zip and sqlite-dll-win64-x64-3260000.zip)
2) download and uncopress taginfo db.db.bz2 from https://taginfo.openstreetmap.org/download/taginfo-db.db.bz2(uncompressed size is ~6.5GB)
3) create attached simple sql and script. I also created a much more complex sql to filter all kinds of tags like names etc but decided that it would be difficult to maintain.
4) run Sqlite3 with sqlite3 -readonly -init extract.script taginfo-db.db
5) type ".quit" followed by ENTER to get out of Sqlite3 (I did not yet dind a way to automate that)
6) copy the result file frequent-tags.cfg (~ 2.6 MB) to \josm\data_nodist
7) execute unit test TaginfoTestIT with attached patch TaginfoTestIT.patch
This produces a list of possible false positives which has to be checked by someone who really understands the meaning of the tags,
I see no way to automate that last step. One has to know the tags and maybe also the results produced by other JOSM tests to decide what to do with this list. Maybe we could feed that list to a web page for review by the community?

The rest might be automated, esp. with a little help from taginfo to avoid the download of the full db.

comment:56 Changed 7 weeks ago by Don-vip

Milestone: 19.0119.02

comment:57 Changed 4 weeks ago by Don-vip

Milestone: 19.0219.03

comment:58 Changed 12 days ago by GerdP

In 14881/josm:

see #17055: Suppress warning
Unknown property value - Value '12' for key 'capacity:disabled' is unknown, maybe one of [1, 2] is meant?

Modify Ticket

Change Properties
Set your email in Preferences
Action
as reopened The owner will remain team.
as The resolution will be set.
to The owner will be changed from team to the specified user.
The owner will change to GerdP
as duplicate The resolution will be set to duplicate.The specified ticket will be cross-referenced with this ticket

Add Comment


E-mail address and name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.