Opened 15 months ago
Last modified 15 months ago
#23448 new defect
JOSM uploaded thousands of duplicate ways and relations
Reported by: | nkamapper | Owned by: | team |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Core | Version: | |
Keywords: | template_report | Cc: |
Description (last modified by )
Not sure if this is a problem in JOSM or in the OSM api, but the handling in JOSM was at least unusual.
What steps will reproduce the problem?
Please have a look at topo in the municipality of Alta in Norway (Overpass provided below). I did a large topo update this morning. OSM now contains multiple copies of the uploaded data (thousands of ways and relations).
There was no such problem with the data in JOSM before or after the upload (I still have a copy of the saved file without duplicates, and I have considerable experience in working with large datasets in OSM). JOSM was latest stable version, 18940.
During uploading from JOSM, a couple of unusual things happened:
- I hit the new rate limit.
- When continuing after one hour, JOSM after a while retried several times. It kept repeating Retry 1 for the same changeset number as displayed in the popup window (never getting to retry 2, 3 etc.) until I halted it (cancelling during a waiting period). This happened around the stage where ways and relations are uploaded from JOSM, after nodes and before deletions. JOSM continued after I reduced the changeset size.
It seems that one (or a few) changesets got uploaded several times, each time producing a new duplicate of ways and relations.
The duplicates in OSM have been reported to DWG.
What is the expected result?
Uploads without duplicates.
What happens instead?
See above.
Please provide any additional information below. Attach a screenshot if possible.
Here is the Overpass api used (shows the affected data):
[out:xml][timeout:200];
(area[name="Alta"][place=municipality];)->.searchArea;
(
  nwr["natural"](area.searchArea);
  nwr["landuse"](area.searchArea);
  nwr["waterway"](area.searchArea);
  nwr["leisure"](area.searchArea);
  nwr["aerodrome"](area.searchArea);
  nwr["man_made"](area.searchArea);
);
(._;>;<;);
out meta;
Revision:18940
Build-Date:2024-01-17 12:45:24
Identification: JOSM/1.5 (18940 en_GB) Mac OS X 12.7.2
OS Build number: macOS 12.7.2 (21G1974)
Memory Usage: 2828 MB / 8192 MB (1542 MB allocated, but free)
Java version: 17.0.10+7-LTS, Azul Systems, Inc., OpenJDK 64-Bit Server VM
Look and Feel: com.apple.laf.AquaLookAndFeel
Screen: Display 69733568 1440×900 (scaling 2.00×2.00)
Maximum Screen Size: 1440×900
Best cursor sizes: 16×16→16×16, 32×32→32×32
System property file.encoding: UTF-8
System property sun.jnu.encoding: UTF-8
Locale info: en_GB
Numbers with default locale: 1234567890 -> 1234567890
VM arguments: [-Djpackage.app-version=18940, --add-modules=java.scripting,java.sql,javafx.controls,javafx.media,javafx.swing,javafx.web, --add-exports=java.base/sun.security.action=ALL-UNNAMED, --add-exports=java.desktop/com.apple.eawt=ALL-UNNAMED, --add-exports=java.desktop/com.sun.imageio.plugins.jpeg=ALL-UNNAMED, --add-exports=java.desktop/com.sun.imageio.spi=ALL-UNNAMED, --add-opens=java.base/java.lang=ALL-UNNAMED, --add-opens=java.base/java.nio=ALL-UNNAMED, --add-opens=java.base/jdk.internal.loader=ALL-UNNAMED, --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED, --add-opens=java.desktop/javax.imageio.spi=ALL-UNNAMED, --add-opens=java.desktop/javax.swing.text.html=ALL-UNNAMED, --add-opens=java.prefs/java.util.prefs=ALL-UNNAMED, -Djpackage.app-path=/Applications/JOSM.app/Contents/MacOS/JOSM]
Dataset consistency test: No problems found
Plugins:
+ PicLayer (1.0.3)
+ RelationDissolve (0.2.0)
+ apache-commons (36176)
+ apache-http (36176)
+ changeset-viewer (0.0.7)
+ conflation (0.6.11)
+ ejml (36176)
+ ext_tools (36126)
+ geotools (36176)
+ imagery-xml-bounds (36196)
+ jackson (36176)
+ jaxb (36118)
+ jna (36176)
+ jts (36004)
+ log4j (36176)
+ opendata (36200)
+ pdfimport (36200)
+ reverter (36196)
+ scripting (v0.3.1)
+ todo (137)
+ utilsplugin2 (36200)
Tagging presets:
+ https://josm.openstreetmap.de/josmfile?page=Presets/LaneAttributes&zip=1
+ https://raw.githubusercontent.com/OpenNauticalChart/josm/master/INT-1-preset.xml
Map paint styles:
+ https://josm.openstreetmap.de/josmfile?page=Styles/Lane_and_Road_Attributes&zip=1
- https://raw.githubusercontent.com/OpenSeaMap/josm/master/CEVNI_MapCSS.mapcss
- https://raw.githubusercontent.com/OpenSeaMap/josm/master/INT1_Seamark.mapcss
- https://josm.openstreetmap.de/josmfile?page=Styles/PublicTransport&zip=1
Attachments (1)
Change History (18)
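follow-up: 4 comment:1 by , 15 months ago
I'm going to have to check and see what happens when I hit the rate limit. Do you have a sample file I can upload to the test API server?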
comment:2 by , 15 months ago
There are a few large files here: https://www.jottacloud.com/s/059f4e21889c60d4e4aaa64cc857322b134
Look in the N50 folder.
comment:3 by , 15 months ago
Replying to nkamapper:
- When continuing after one hour, JOSM after a while retried several times. It kept repeating Retry 1 for the same changeset number as displayed in the popup window (never getting to retry 2, 3 etc.) until I halted it (cancel during a waiting period).
Once you manually interrupt or cancel an upload, you have to be very careful. JOSM does not know which objects have been successfully uploaded. All new objects are still new (id:0) in the local data layer, even though they may have been uploaded successfully and assigned a positive id by the server. At this stage you need to manually download your last changeset, merge it with your data layer and manually delete all duplicates (keeping the objects with a positive id in favour of the duplicates with id:0). Manually running the validator on the whole data layer should show you warnings about the duplicates.
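The manual cleanup described above can be illustrated with a minimal Python sketch. It is not JOSM's validator logic: the `Node` class, the `content_key` helper and the rule "same rounded position and same tags means duplicate" are assumptions made for illustration, and only nodes are handled (ways and relations would also need their member lists compared).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical, simplified node: id <= 0 means "still new in the local layer".
    id: int
    lat: float
    lon: float
    tags: tuple = ()  # sorted (key, value) pairs


def content_key(node):
    """Identity of a node by content only (position and tags), ignoring its id."""
    return (round(node.lat, 7), round(node.lon, 7), node.tags)


def find_local_duplicates(nodes):
    """Return local new nodes whose content matches a node with a positive id.

    After merging the downloaded changeset into the data layer, these are the
    candidates to delete, keeping the server copy with the positive id.
    """
    uploaded = {content_key(n) for n in nodes if n.id > 0}
    return [n for n in nodes if n.id <= 0 and content_key(n) in uploaded]
```

Anything returned by `find_local_duplicates` should still be reviewed by hand before deletion, since two distinct objects can legitimately share position and tags.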
comment:4 by , 15 months ago
Replying to taylor.smock:
I hit the new rate limit.
I'm going to have to check and see what happens when I hit the rate limit. Do you have a sample file I can upload to the test API server?
I do not think that the dev server has the rate limit enabled.
comment:5 by , 15 months ago
OK. Just to be clear: I cancelled during the x seconds waiting period, so I was assuming the last upload attempt was completed.
And JOSM seemed to be stuck in a loop, continuously retrying to upload the same changeset, but only with 1 retry.
comment:6 by , 15 months ago
@skyper: I'll poke Firefishy to see if it is enabled.
@nkamapper: I'm not understanding something. Can you give us clear steps, e.g.
- Open file
- Upload to OSM
- Cancel after some period of time
- Retry upload to OSM
With that said, I'm expecting the rate limit to give us some useful information. I'll have to poke it to see what happens.
comment:7 by , 15 months ago
Another thing: I have seen (so far) up to 18 duplicates of the same way, but I only interrupted uploads a couple of times.
comment:8 by , 15 months ago
Clear steps:
1. Open file.
2. Upload to OSM.
3. After some time, rate limit message in JOSM.
4. Save file.
5. After one hour, load file and continue upload.
6. After some time, repeated messages in JOSM with Retry 1.
7. After a number of retries by JOSM, I cancel the upload sequence during the x second wait for the next retry.
8. Continue upload with smaller changeset size.
9. Items 6-8 repeated 3-4 times until completed.
10. Rate limit message in JOSM again.
11. Remaining upload done using a different account (deletions only). No problems here.
comment:9 by , 15 months ago
I believe this changeset, in my history/sequence of changesets, was the last one before the rate limit hit (step 3 above): https://www.openstreetmap.org/changeset/146847646
comment:10 by , 15 months ago
I have created a script which will remove the duplicate elements in OSM at Alta. The method is to keep all elements in my saved file (from JOSM after uploading) and to remove all other elements which were uploaded that day in that area from my account.
My concern is that other users might start editing in the area, making it difficult to fix the problem in OSM.
I have saved the current version of OSM in Alta, including all duplicates, in this file (topo related elements only): https://www.jottacloud.com/s/05946f3663c9bd2427bbc9614bb7d1bac4d
Hope this is ok for debugging.
There are 7566 duplicate relations, 149952 ways and 13651 nodes.
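The selection rule described in this comment (keep everything present in the saved file, remove everything else uploaded that day by the same account) can be sketched roughly as below. This is not the actual script. It assumes the elements were already limited to the affected area by the query, and that each element is a dict with `type`, `id`, `user` and `timestamp` fields as in Overpass JSON output with `out meta`. The function name, user name and date in the usage note are hypothetical.

```python
def select_for_deletion(elements, keep, user, day):
    """Pick elements to delete from OSM.

    elements: iterable of dicts with 'type', 'id', 'user', 'timestamp' (ISO 8601)
    keep:     set of (type, id) pairs found in the file saved after the upload
    user:     account name that made the upload
    day:      upload date as 'YYYY-MM-DD'
    """
    # Relations first, then ways, then nodes, so nothing is deleted while
    # another element in the list may still reference it.
    order = {"relation": 0, "way": 1, "node": 2}
    doomed = [
        e for e in elements
        if e["user"] == user
        and e["timestamp"].startswith(day)
        and (e["type"], e["id"]) not in keep
    ]
    return sorted(doomed, key=lambda e: (order[e["type"]], e["id"]))
```

A call such as `select_for_deletion(elements, keep, "example_user", "2024-01-30")` returns only that account's elements from that day which are missing from the saved file. Edits by other users are never selected.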
follow-up: 12 comment:11 by , 15 months ago
I still have a copy of the saved file without duplicates
If that file contains the data before you started to upload it would certainly help more to reproduce than the file that is the result of the problem.
comment:12 by , 15 months ago
Replying to GerdP:
I still have a copy of the saved file without duplicates
If that file contains the data before you started to upload it would certainly help more to reproduce than the file that is the result of the problem.
Here is a folder with 3 files - https://www.jottacloud.com/s/05945942147e55847e9b5d6bcb64372debd/list/
- Saved from JOSM before upload
- Saved from JOSM after upload
- Resulting dataset in OSM after upload, with duplicates (same as previous link), before duplicates were removed today in OSM
comment:13 by , 15 months ago
The OSM server should be returning a 429 (Too Many Requests) response. This is not the same as HttpURLConnection.HTTP_CONFLICT (409), which we catch in OsmApi#sendRequest (L823 currently).
Apparently the data from the failing upload should not be committed, and I'm trying to check that. But it does appear that the dev api does not have the limits. Or at least I've been unable to hit them, despite trying. And I'm not going to try on the production API.
I suspect this is due to cancelling and restarting when JOSM was in the middle of a diff upload.
comment:14 by , 15 months ago
The rate limiting on the dev API has been fixed, and the response from the server looks like this:
Response code: 429
Response body: Upload has been blocked due to rate limiting. Please try again later.
Interesting response headers:
Error: Upload has been blocked due to rate limiting. Please try again later.
Date: Wed, 31 Jan 2024 15:51:15 GMT
-- this doesn't appear to be when I can upload again, unfortunately.
I think the current code is "good enough", assuming that the server does reject the entire upload (per TomH on #osm-dev).
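To make the distinction between the status codes concrete, here is a small Python sketch of how a client could decide what to do after an upload response. It is not JOSM's code (JOSM is written in Java), and the function name, the action labels and the retry limit are assumptions. It relies on the statement in this thread that a 429 rejects the entire upload, so pausing without an automatic retry should not create duplicates.

```python
from http import HTTPStatus


def next_action(status, attempt, max_retries=5):
    """Decide how to proceed after an upload response.

    status:  HTTP status code returned by the server
    attempt: number of attempts made so far for this upload (1-based)
    """
    if 200 <= status < 300:
        return "done"
    if status == HTTPStatus.TOO_MANY_REQUESTS:  # 429: rate limited
        # The whole upload was rejected; wait and let the user resume later.
        return "pause"
    if status == HTTPStatus.CONFLICT:  # 409: e.g. changeset already closed
        return "abort"
    if 500 <= status < 600 and attempt < max_retries:
        # Transient server error: retry; the caller must increment `attempt`.
        return "retry"
    return "abort"
```

The caller is expected to increase `attempt` on every retry, so the sequence reads Retry 1, Retry 2, and so on, and stops at `max_retries` instead of looping.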
by , 15 months ago
Attachment: | 23448.patch added |
---|
Sample special casing of rate limiting (do not apply)
comment:15 by , 15 months ago
Description: | modified (diff) |
---|
comment:16 by , 15 months ago
I ran into the same problems with a new upload yesterday for Lebesby municipality (Norway). Roughly the same size as Alta municipality above: 167 changesets x 10k each.
Step by step:
1. Saved file from JOSM and started uploading to OSM with 10k changesets.
2. After 100 changesets: Message about rate limit.
   - Last changeset remained open but was empty.
3. Waited 1 hour, then continued uploading.
4. After 50 changesets: Several strange "Retry 1" messages (see video).
   - 10 identical duplicate changesets created (with new ids), one for each retry.
   - No updates in JOSM from the duplicated changesets (not even from the first of them).
   - Popup window did not indicate that new changesets had been uploaded (no countdown of remaining changesets).
   - Cancelled uploads while in countdown/sleep sequence.
   - Observation: The previous Alta case also stopped at around 100 + 50 changesets.
   - Observation: The problems after 50 changesets happened exactly at the point where new ways started to be uploaded (before that only nodes). This was also the case for Alta.
   - Video + photo of duplicated changesets available in link below.
5. Waited 1 hour (although no rate limit message this time, unlike the previous Alta case), then started uploading again:
   - Got "Retry 1" message, then JOSM itself stopped further uploading.
   - One more duplicate changeset created with same content as in step 4; no update of file in JOSM.
   - Perhaps waiting 1 hour was not necessary. Not sure what the rate limit is after the first "rate limit" message, by the way. Is it 50% of the original limit?
6. Tried various methods:
   - a) Uploading using another account
   - b) Saving file and restarting JOSM, then uploading
   - c) Combinations of a and b
   - Same problem and results as in step 5.
7. Reduced changeset size:
   - First tried size 5000, which worked for 5 uploads (compiled by JOSM in 3 changesets).
   - Then reduced to 2500, which worked, and the upload was completed successfully.
8. Saved JOSM file. The resulting file in JOSM was ok with respect to integrity and did not contain any duplicates.
9. Used script to remove all 16500 duplicate ways in OSM.
If I had to try again I would be inclined to just use a small changeset size.
Link to JOSM files + video: https://www.jottacloud.com/s/0592672e82b03e1470c9f96b5c72e97c5e4/list/
comment:17 by , 15 months ago
I uploaded yet another large dataset for Tana municipality (Norway) and encountered a new problem:
- First 100 changesets (each 10k) were ok. Then paused 1 hour after rate limit message. Saved file.
- After 1 hour, selected all remaining new nodes and tried to upload. JOSM was stuck at the first changeset:
- Stuck at "Postprocessing uploaded data" message.
- JOSM unresponsive for 10 minutes. Forced exit.
- This is the changeset: https://www.openstreetmap.org/changeset/147376840 (later reverted).
- Restarted JOSM and uploaded everything with changeset size 2500 (in total approx. 950k elements). All ok.
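As a rough aid for planning uploads with a smaller changeset size, the arithmetic behind the numbers in this thread can be sketched in Python. The function name is hypothetical. The sketch assumes a limit of 10,000 elements per changeset and that a chunk is never split across two changesets, which matches the observation above that 5 uploads of 5000 ended up in 3 changesets.

```python
import math

MAX_CHANGESET_ELEMENTS = 10_000  # assumed server-side limit per changeset


def upload_plan(total_elements, chunk_size):
    """Return (number of upload requests, number of changesets)."""
    if not 0 < chunk_size <= MAX_CHANGESET_ELEMENTS:
        raise ValueError("chunk_size must be between 1 and the changeset limit")
    chunks = math.ceil(total_elements / chunk_size)
    # Whole chunks only: a chunk that does not fit opens a new changeset.
    chunks_per_changeset = MAX_CHANGESET_ELEMENTS // chunk_size
    changesets = math.ceil(chunks / chunks_per_changeset)
    return chunks, changesets
```

Under these assumptions, about 950k elements at size 2500 means 380 upload requests spread over 95 changesets, compared with 95 requests at size 10k.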