Opened 15 months ago
Closed 14 months ago
#23235 closed defect (fixed)
Correct url is reported with invalid path
Reported by: | GerdP | Owned by: | team |
---|---|---|---|
Priority: | normal | Milestone: | 23.11 |
Component: | Core validator | Version: | |
Keywords: | template_report | Cc: |
Description
What steps will reproduce the problem?
- Run validator on https://www.openstreetmap.org/way/771715139 which has
url=https://www.nabu-weyhe.de/projekte-und-themen/eulen-falken-und-deren-nistkästen/trafotürme
What is the expected result?
No warning, the link seems to work fine when I open it in JOSM
What happens instead?
URL validator - 'url': URL contains an invalid path: /projekte-und-themen/eulen-falken-und-deren-nistkästen/trafotürme (1)
Please provide any additional information below. Attach a screenshot if possible.
I assume the German umlauts are the reason?
Relative:URL: ^/trunk
Repository:UUID: 0c6e7542-c601-0410-84e7-c038aed88b3b
Last:Changed Date: 2023-08-29 13:38:40 +0200 (Tue, 29 Aug 2023)
Revision:18822
Build-Date:2023-08-30 01:30:57
URL:https://josm.openstreetmap.de/svn/trunk

Identification: JOSM/1.5 (18822 en) Windows 10 64-Bit
OS Build number: Windows 10 Home 2009 (19045)
Memory Usage: 1334 MB / 2016 MB (744 MB allocated, but free)
Java version: 17.0.5+8-LTS, Azul Systems, Inc., OpenJDK 64-Bit Server VM
Look and Feel: com.sun.java.swing.plaf.windows.WindowsLookAndFeel
Screen: \Display0 1920×1080 (scaling 1.50×1.50)
Maximum Screen Size: 1920×1080
Best cursor sizes: 16×16→48×48, 32×32→48×48
System property file.encoding: Cp1252
System property sun.jnu.encoding: Cp1252
Locale info: en_DE
Numbers with default locale: 1234567890 -> 1234567890
VM arguments: [-Djpackage.app-version=1.5.18622, --add-modules=java.scripting,java.sql,javafx.controls,javafx.media,javafx.swing,javafx.web, --add-exports=java.base/sun.security.action=ALL-UNNAMED, --add-exports=java.desktop/com.sun.imageio.plugins.jpeg=ALL-UNNAMED, --add-exports=java.desktop/com.sun.imageio.spi=ALL-UNNAMED, --add-opens=java.base/java.lang=ALL-UNNAMED, --add-opens=java.base/java.nio=ALL-UNNAMED, --add-opens=java.base/jdk.internal.loader=ALL-UNNAMED, --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED, --add-opens=java.desktop/javax.imageio.spi=ALL-UNNAMED, --add-opens=java.desktop/javax.swing.text.html=ALL-UNNAMED, --add-opens=java.prefs/java.util.prefs=ALL-UNNAMED, -Djpackage.app-path=%UserProfile%\AppData\Local\JOSM\HWConsole.exe]
Dataset consistency test: No problems found

Plugins:
+ OpeningHoursEditor (36126)
+ buildings_tools (36134)
+ measurement (36126)
+ o5m (36126)
+ poly (36126)
+ reverter (36126)
+ undelete (36126)
+ utilsplugin2 (36134)

Tagging presets:
+ d:\josm\core\resources\data\defaultpresets.xml

Last errors/warnings:
- 00001.233 W: extended font config - overriding 'filename.Myanmar_Text=mmrtext.ttf' with 'MMRTEXT.TTF'
- 00001.237 W: extended font config - overriding 'filename.Mongolian_Baiti=monbaiti.ttf' with 'MONBAITI.TTF'
- 00003.467 E: java.security.KeyStoreException: Windows-ROOT not found. Cause: java.security.NoSuchAlgorithmException: Windows-ROOT KeyStore not available
- 01623.706 W: java.net.SocketTimeoutException: Read timed out. Cause: java.net.SocketTimeoutException: Read timed out
Attachments (0)
Change History (11)
comment:1 by , 15 months ago
Description: | modified (diff) |
---|
comment:2 by , 15 months ago
comment:4 by , 15 months ago
Milestone: | → 23.10 |
---|
comment:5 by , 15 months ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
I'm not sure this is really a good idea. UTF-8 support is a bit strange, but the REAL URL is
"https://www.nabu-weyhe.de/projekte-und-themen/eulen-falken-und-deren-nistk%C3%A4sten/trafot%C3%BCrme"
You get this when you paste the URL into e.g. Firefox, submit it, and then copy the contents of the address bar.
Many systems convert Unicode to such a representation, but it's not certain that all of them do, and not every server really uses UTF-8.
I'm not sure we want to encourage using the UTF-8 form of the URL instead of the technically correct one.
I'd rather suggest using the URL-encoded form and only displaying the Unicode form (as e.g. Firefox does).
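For illustration, the conversion a browser performs can be approximated in Java with java.net.URI. This is only a sketch, not code from JOSM: the class and method names are made up for the example, and it assumes the server expects UTF-8 percent-encoding, which is exactly the point in doubt above.

```java
import java.net.URI;

final class PercentEncodeDemo {
    // Hypothetical helper: returns the US-ASCII form of a URL.
    // URI.toASCIIString() percent-encodes non-ASCII characters as UTF-8 octets.
    static String toAsciiForm(String url) {
        return URI.create(url).toASCIIString();
    }

    public static void main(String[] args) {
        // \u00e4 = a-umlaut, \u00fc = u-umlaut (escaped so the source file stays ASCII)
        String raw = "https://www.nabu-weyhe.de/projekte-und-themen/"
                + "eulen-falken-und-deren-nistk\u00e4sten/trafot\u00fcrme";
        // Prints the percent-encoded URL quoted in this comment.
        System.out.println(toAsciiForm(raw));
    }
}
```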
comment:6 by , 15 months ago
I took a quick look at the RFCs before I made the change -- I didn't see anything which restricted the path to ASCII. But it does look like browsers automatically convert it to ASCII.
comment:7 by , 15 months ago
https://datatracker.ietf.org/doc/html/rfc3986#section-1.2.1
"A URI is a sequence of characters from a very limited set: the letters of the basic Latin alphabet, digits, and a few special characters."
"Percent-encoded octets (Section 2.1) may be used within a URI to represent characters outside the range of the US-ASCII coded character set if this representation is allowed by the scheme or by the protocol element in which the URI is referenced."
So while UTF-8 usually works nowadays and most recent software can handle it, it is non-standard, because the result is no longer a URI. Depending on the software, you'll also get different requests to the server. Some software will send it as is, some will correctly percent-encode it, and some may convert the charset and send it in e.g. ISO-8859-1, which will then be wrong.
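A small sketch of the charset problem: the same character ends up as different octets depending on the charset the client picks. The class name is made up for the example, and URLEncoder is used only because it is a convenient standard-library way to show the bytes (it implements form encoding, so it is not a general-purpose path encoder). The Charset overload needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetMismatchDemo {
    // Percent-encodes a string using the given charset.
    static String encode(String s, Charset charset) {
        return URLEncoder.encode(s, charset);
    }

    public static void main(String[] args) {
        String umlaut = "\u00e4"; // a-umlaut
        // UTF-8 uses two octets for this character.
        System.out.println(encode(umlaut, StandardCharsets.UTF_8));      // %C3%A4
        // ISO-8859-1 uses one octet, so a server expecting UTF-8 sees a different path.
        System.out.println(encode(umlaut, StandardCharsets.ISO_8859_1)); // %E4
    }
}
```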
comment:9 by , 14 months ago
Actually I still think that should be reverted. Maybe the warning could be adapted to say that "proper URL encoding must be used", perhaps with a short hint on how to get it, i.e. paste the URL into the browser's address bar and copy the result.
Looks like it. source:trunk/src/org/openstreetmap/josm/data/validation/routines/UrlValidator.java#L332.
The regex we are using is ^(/[-\w:@&?=+,.!/~*'%$_;\(\)]*)?$ . We can probably fix it by adding Pattern.UNICODE_CHARACTER_CLASS to the pattern compilation.
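A quick standalone check of that idea, using the regex quoted above. This is a sketch and not the actual UrlValidator code; the class, constant, and method names are made up for the example.

```java
import java.util.regex.Pattern;

final class UnicodePathRegexDemo {
    // Path regex as quoted in this ticket.
    static final String PATH_REGEX = "^(/[-\\w:@&?=+,.!/~*'%$_;\\(\\)]*)?$";

    // Default flags: \w matches only [a-zA-Z_0-9].
    static final Pattern ASCII_ONLY = Pattern.compile(PATH_REGEX);
    // With UNICODE_CHARACTER_CLASS, \w also matches non-ASCII letters such as umlauts.
    static final Pattern UNICODE_AWARE =
            Pattern.compile(PATH_REGEX, Pattern.UNICODE_CHARACTER_CLASS);

    static boolean matchesAscii(String path) {
        return ASCII_ONLY.matcher(path).matches();
    }

    static boolean matchesUnicode(String path) {
        return UNICODE_AWARE.matcher(path).matches();
    }

    public static void main(String[] args) {
        // \u00e4 = a-umlaut, \u00fc = u-umlaut (escaped so the source file stays ASCII)
        String path = "/projekte-und-themen/eulen-falken-und-deren-nistk\u00e4sten/trafot\u00fcrme";
        System.out.println(matchesAscii(path));   // false
        System.out.println(matchesUnicode(path)); // true
    }
}
```

The percent-encoded form of the same path matches either way, since % and ASCII letters and digits are already in the character class.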