Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#11774 closed enhancement (fixed)

[Patch] Warn about obvious misspelled tag keys

Reported by: mdk
Owned by: team
Priority: normal
Milestone: 15.09
Component: Core validator
Version: latest
Keywords:
Cc: Klumbumbus

Description

Inspired by http://www.openstreetmap.org/user/marczoutendijk/diary/35512, which is mentioned in the current Wochennotiz Nr. 263, I have now extended the validator to check tag keys, as I already did for tag values (see #11498):

A lot of the (faulty) keys I found are of the uppercase/lowercase type:
Name when name was meant, for instance. Almost any regular key (amenity, shop, tourism, highway, landuse, etc.) appears in a misspelled version in the database (tourims, land-use, etc.). Added punctuation (name; or name, or name-) also accounts for quite a number of those one-time-only keys.

My patch doesn't cover all typos, but it covers all uppercase/lowercase cases and some of the cases with additional or misplaced punctuation, such as land-use, name;, name, or name-.

More generally, I first normalize all keys found in the presets:

  • convert to lower case
  • replace -, : and SPACE with _
  • remove all leading and trailing -, _, ;, : and ,
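The three steps above can be sketched in a few lines of Java. This is only an illustration of the rules as listed, not the code from the patch; the class name KeyHarmonizer and the use of regular expressions are assumptions.

```java
import java.util.Locale;

// Hypothetical helper, not taken from the attached patches.
final class KeyHarmonizer {
    private KeyHarmonizer() {
    }

    /** Normalizes a tag key following the three steps listed above. */
    static String harmonizeKey(String key) {
        String k = key.toLowerCase(Locale.ENGLISH) // 1. convert to lower case
                .replaceAll("[-: ]", "_");         // 2. '-', ':' and space become '_'
        // 3. remove leading and trailing '-', '_', ';', ':' and ','
        return k.replaceAll("^[-_;:,]+|[-_;:,]+$", "");
    }
}
```

With these rules, Building, name; and name- reduce to building, name and name, while land-use becomes land_use.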

During validation, when a key would trigger the "Presets do not contain property key" warning, I now check whether this key, once normalized in the same way, matches one of the normalized preset keys. If I find a match, I produce a warning like Key 'Building' looks like 'building'. with an auto fix that replaces the key with the original (non-normalized) key found in the presets.
See patch validateKeys1.diff.
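The lookup described above might look roughly like the following. This is a minimal sketch under stated assumptions: the class MisspelledKeyCheck, its constructor and warningFor are hypothetical names, the preset keys are passed in as a plain collection, and the real patch works inside TagChecker with JOSM's own preset and warning types.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for the check; not the code from validateKeys1.diff.
final class MisspelledKeyCheck {
    private final Set<String> presetKeys;
    // normalized key -> original preset key
    private final Map<String, String> harmonizedKeys = new HashMap<>();

    MisspelledKeyCheck(Collection<String> presetKeys) {
        this.presetKeys = new HashSet<>(presetKeys);
        for (String key : presetKeys) {
            harmonizedKeys.putIfAbsent(harmonize(key), key);
        }
    }

    static String harmonize(String key) {
        return key.toLowerCase(Locale.ENGLISH)
                .replaceAll("[-: ]", "_")
                .replaceAll("^[-_;:,]+|[-_;:,]+$", "");
    }

    /** Returns a warning text if the unknown key matches a preset key after normalization. */
    Optional<String> warningFor(String key) {
        if (presetKeys.contains(key)) {
            return Optional.empty(); // known key, nothing to report
        }
        String fix = harmonizedKeys.get(harmonize(key));
        if (fix == null) {
            return Optional.empty(); // no match; the generic "unknown key" warning applies
        }
        return Optional.of("Key '" + key + "' looks like '" + fix + "'.");
    }
}
```

In this sketch, land-use yields no suggestion when the presets only contain landuse, because land_use and landuse are different normalized forms.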

I have also added an alternative patch in which I merge this check with the existing spell checking feature, which uses data/validator/words.cfg and the other optional dictionaries found by Main.pref.getCollection(PREF_SOURCES, DEFAULT_SOURCES).
See patch validateKeys2.diff.

BTW, only the second patch warns about Key 'land-use' looks like 'landuse'., because words.cfg contains

+landuse
-land_use

and land-use is normalized to land_use :)
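To illustrate why the dictionary entry makes the difference, here is a small sketch of reading such entries. It assumes that a line starting with + names a correct word, that the following lines starting with - are misspellings of it, and that lines starting with # are comments; the class SpellingDictionary and its parse method are hypothetical and not part of the patch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader for "+correct" / "-misspelled" entries.
final class SpellingDictionary {
    private SpellingDictionary() {
    }

    /** Maps each misspelled word to the correct word that precedes it. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> corrections = new HashMap<>();
        String okWord = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // assumption: blank lines and '#' comments are skipped
            }
            String word = line.substring(1);
            if (line.charAt(0) == '+') {
                okWord = word;
            } else if (line.charAt(0) == '-' && okWord != null) {
                corrections.put(word, okWord);
            }
        }
        return corrections;
    }
}
```

Looking up the normalized key land_use in the resulting map gives landuse, which is the suggestion the second patch reports.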

With the second patch, we could also reduce the size of words.cfg by eliminating all misspelled keys that are covered by the generic approach. We could also cover the tourims case by adding it to words.cfg, but that is a different story...

Attachments (3)

validateKeys1.diff (4.9 KB ) - added by mdk 9 years ago.
simple patch
validateKeys2.diff (6.8 KB ) - added by mdk 9 years ago.
extended patch
TagCheckerTest.java (1.9 KB ) - added by simon04 9 years ago.

Change History (11)

by mdk, 9 years ago

Attachment: validateKeys1.diff added

simple patch

by mdk, 9 years ago

Attachment: validateKeys2.diff added

extended patch

comment:1 by Klumbumbus, 9 years ago

Cc: Klumbumbus added

comment:2 by simon04, 9 years ago

Hi, thank you for your contributions. Some remarks:

  • some tests are very welcome, see attachment:TagCheckerTest.java (to be placed in test/unit/org/openstreetmap/josm/data/validation/tests/TagCheckerTest.java) for a starting example
  • org.openstreetmap.josm.data.validation.tests.TagChecker#prettifyKey is rather a harmonizeKey?
  • org.openstreetmap.josm.data.validation.tests.TagChecker#addKey did not output a single warning on the default presets. Are all those tests needed?

by simon04, 9 years ago

Attachment: TagCheckerTest.java added

in reply to:  2 ; comment:3 by mdk, 9 years ago

Replying to simon04:

Hi, thank you for your contributions. Some remarks:

  • some tests are very welcome, see attachment:TagCheckerTest.java (to be placed in test/unit/org/openstreetmap/josm/data/validation/tests/TagCheckerTest.java) for a starting example

I wasn't able to execute the tests. I always get the error:

ERROR: java.io.IOException: Failed to open input stream for resource 'resource://data/preferences.xsd'

and the preferences.xml is replaced by an "empty" version. What is the correct configuration for Eclipse to run these tests?

  • org.openstreetmap.josm.data.validation.tests.TagChecker#prettifyKey is rather a harmonizeKey?

Yes. But then we should also rename prettifyValue.

  • org.openstreetmap.josm.data.validation.tests.TagChecker#addKey did not output a single warning on the default presets. Are all those tests needed?

This is a paranoid test to detect errors in presets and/or spelling files. Let's assume one preset uses the wrong key Landuse and another one the correct landuse; this test will detect such errors.
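A check of this kind could be sketched as follows. The patch's actual implementation is not shown in this ticket, so the class HarmonizedKeyRegistry, the conflicts list and the message text are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry that records keys colliding after normalization.
final class HarmonizedKeyRegistry {
    private final Map<String, String> harmonizedKeys = new HashMap<>();
    private final List<String> conflicts = new ArrayList<>();

    /** Registers a key under its normalized form and notes any conflicting spelling. */
    void addKey(String harmonizedKey, String key) {
        String otherKey = harmonizedKeys.get(harmonizedKey);
        if (otherKey == null) {
            harmonizedKeys.put(harmonizedKey, key);
        } else if (!otherKey.equals(key)) {
            // Two different spellings map to the same normalized key,
            // e.g. "Landuse" in one preset and "landuse" in another.
            conflicts.add("'" + key + "' conflicts with '" + otherKey + "'");
        }
    }

    List<String> conflicts() {
        return conflicts;
    }
}
```

Registering the same key twice is not reported here; only two distinct spellings with the same normalized form count as a conflict.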

in reply to:  3 ; comment:4 by simon04, 9 years ago

Replying to mdk:

I wasn't able to execute the tests. I always get the error:

ERROR: java.io.IOException: Failed to open input stream for resource 'resource://data/preferences.xsd'

and the preferences.xml is replaced by an "empty" version. What is the correct configuration for Eclipse to run these tests?

How did you run the tests? Running ant clean dist && ant test seems to work for me. See also InstallNotes#Compiling.

Yes. But then we should also rename prettifyValue.

Yes.

This is a paranoid test to detect errors in presets and/or spelling files. Let's assume one preset uses the wrong key Landuse and another one the correct landuse; this test will detect such errors.

I'm not aware of any typo of this kind in the presets. I personally would be reluctant to test for conditions that will hardly ever occur …

in reply to:  4 comment:5 by mdk, 9 years ago

Replying to simon04:

Replying to mdk:

I wasn't able to execute the tests. I always get the error:

ERROR: java.io.IOException: Failed to open input stream for resource 'resource://data/preferences.xsd'

and the preferences.xml is replaced by an "empty" version. What is the correct configuration for Eclipse to run these tests?

How did you run the tests? Running ant clean dist && ant test seems to work for me. See also InstallNotes#Compiling.

From the context menu "Run As" -> "JUnit Test"

This is a paranoid test to detect errors in presets and/or spelling files. Let's assume one preset uses the wrong key Landuse and another one the correct landuse; this test will detect such errors.

I'm not aware of any typo of this kind in the presets. I personally would be reluctant to test for conditions that will hardly ever occur …

Yes. That's what my tests show too :-)

We could remove this test easily:

    // Without the paranoid check: the first key seen for a harmonized form wins.
    private static void addKey(String harmonizedKey, String key) {
        String otherKey = harmonizedKeys.get(harmonizedKey);
        if (otherKey == null) {
            harmonizedKeys.put(harmonizedKey, key);
            // Main.debug(harmonizedKey + " -> " + key);
        }
    }

But keep in mind: I check the presets and the spelling files together, and the harmonizedKeys map is only created on the first validator run.

in reply to:  4 comment:6 by Klumbumbus, 9 years ago

Replying to simon04:

This is a paranoid test to detect errors in presets and/or spelling files. Let's assume one preset uses the wrong key Landuse and another one the correct landuse; this test will detect such errors.

I'm not aware of any typo of this kind in the presets. I personally would be reluctant to test for conditions that will hardly ever occur …

We had typos in the internal preset for color/colour and protected_class (https://josm.openstreetmap.de/ticket/10691#comment:14)

comment:7 by simon04, 9 years ago

Resolution: fixed
Status: new → closed

In 8753/josm:

fix #11774 - Warn about obvious misspelled tag keys (patch by mdk, modified)

comment:8 by simon04, 9 years ago

Milestone: 15.09
