Modify

Opened 4 weeks ago

Closed 3 weeks ago

#16743 closed defect (fixed)

HTTP timeout errors in tile cache

Reported by: anonymous Owned by: anonymous
Priority: normal Milestone: 18.09
Component: Core imagery Version:
Keywords: network Cc: wiktorn

Description

My network connection is currently poor and I'm getting a large number of timeout errors (stacktrace below). The tile cache behaves erratically - some tiles don't load others, already loaded, randomly disappear. Reloading a tile or all tiles helps but after a while the tiles start disappearing again.

Version 14180.

Stacktrace:

22018-09-16 17:52:13.377 WARNING: Already here java.net.SocketTimeoutException: Read timed out
2018-09-16 17:52:43.488 INFO: GET https://clarity.maptiles.arcgis.com/arcgis/rest/services/World_Imagery/MapServer/tile/18/86335/130862 -> !!!
2018-09-16 17:52:43.489 WARNING: java.net.SocketTimeoutException: Read timed out
java.net.SocketTimeoutException: Read timed out
	at java.base/java.net.SocketInputStream.socketRead0(Native Method)
	at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:141)
	at java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:425)
	at java.base/sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:154)
	at java.base/sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1031)
	at java.base/sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
	at java.base/sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1402)
	at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1429)
	at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1413)
	at java.base/sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:567)
	at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
	at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:163)
	at org.openstreetmap.josm.tools.HttpClient.connect(HttpClient.java:145)
	at org.openstreetmap.josm.tools.HttpClient.connect(HttpClient.java:91)
	at org.openstreetmap.josm.data.cache.JCSCachedTileLoaderJob.loadObject(JCSCachedTileLoaderJob.java:322)
	at org.openstreetmap.josm.data.cache.JCSCachedTileLoaderJob.run(JCSCachedTileLoaderJob.java:231)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:844)


Attachments (0)

Change History (12)

comment:1 Changed 4 weeks ago by Don-vip

Cc: wiktorn added
Keywords: network added
Type: defectenhancement

I don't know if we can do something to improve this kind of situation.

comment:2 Changed 4 weeks ago by anonymous

I mean, network timeouts are to be expected. But the tiles that are already in the cache should not be removed/invalidated. Currently when I pan/zoom the view a bit JOSM will try and fail to reload some or all tiles. Not sure why it happens, or even if it is related to the above exceptions.

comment:3 Changed 4 weeks ago by anonymous

Type: enhancementdefect

comment:4 Changed 4 weeks ago by wiktorn

Can you check by right clicking on the tile and choosing "Show tile info" how long JOSM will keep the tiles in the cache (metadata expirationTime)? It might be the case, that server sends really short validity period that would result in such behaviour. If you add an imagery entry in advanced mode, you have the ability to set minimum cache validity.

It would be good to know what size of maximum size of disk cache have you set in imagery settings as well as if initialized properly (no additional warnings upon opening first imagery layer).

Other things that you may try:

  • reduce number of concurrent downloads in imagery settings
  • increase the socket.timeout.read property in advanced settings

comment:5 Changed 4 weeks ago by anonymous

Normally the tiles have an expiration time set to 24hrs but some tiles had no info on expiration time at all (only name, url, size and display size in tile info). I'm using TMS tiles ("Esri World Imagery (Clarity) Beta").

Other messages when initializing imagery:

2018-09-16 19:29:40.967 INFO: Defaults for imagery.generic.default_autozoom differ: true != 
2018-09-16 19:29:40.967 INFO: Defaults for imagery.generic.default_autozoom differ:  != true
2018-09-16 19:29:40.968 INFO: Defaults for imagery.generic.default_autoload differ: true != 
2018-09-16 19:29:40.968 INFO: Defaults for imagery.generic.default_autoload differ:  != true
2018-09-16 19:29:40.968 INFO: Defaults for imagery.generic.default_showerrors differ: true != 
2018-09-16 19:29:40.968 INFO: Defaults for imagery.generic.default_showerrors differ:  != true
2018-09-16 19:29:41.345 INFO: AbstractTileSourceLayer: estimated visible tiles: 28, estimated cache size: 112
2018-09-16 19:29:41.605 INFO: AbstractTileSourceLayer: estimated visible tiles: 28, estimated cache size: 112
2018-09-16 19:29:41.833 INFO: AbstractTileSourceLayer: estimated visible tiles: 28, estimated cache size: 112
2018-09-16 19:29:41.834 INFO: Allocate for tile source layer: 28 MB of memory. Available: 1 998 MB.
2018-09-16 19:30:12.412 INFO: GET https://clarity.maptiles.arcgis.com/arcgis/rest/services/World_Imagery/MapServer/tile/17/41450/64992 -> !!!
2018-09-16 19:30:12.416 WARNING: java.net.SocketTimeoutException: Read timed out
java.net.SocketTimeoutException: Read timed out
	at java.base/java.net.SocketInputStream.socketRead0(Native Method)
	at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:141)
	at java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:425)
	at java.base/sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:154)
	at java.base/sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1031)
	at java.base/sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
	at java.base/sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1402)
	at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1429)
	at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1413)
	at java.base/sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:567)
	at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
	at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:163)
	at org.openstreetmap.josm.tools.HttpClient.connect(HttpClient.java:145)
	at org.openstreetmap.josm.tools.HttpClient.connect(HttpClient.java:91)
	at org.openstreetmap.josm.data.cache.JCSCachedTileLoaderJob.loadObject(JCSCachedTileLoaderJob.java:322)
	at org.openstreetmap.josm.data.cache.JCSCachedTileLoaderJob.run(JCSCachedTileLoaderJob.java:231)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:844)

I use 1 thread/host. Previously I thought this may cure the problem but it hasn't. I've switched to 120s read timeout but that didn't help. It just takes more time before I get the timeout.

I have now found a workaround which seems solves the problem for me: I've switched the tile source URL from https to http. Looks like someone here (UTC+8) doesn't like encrypted network connections. No more timeouts. I can still notice tiles reloading occasionally (why?) when panning/zooming but that is no longer a problem if reload succeeds.

comment:6 Changed 4 weeks ago by wiktorn

In 14268/josm:

Retry tile download on transient errors.

Till now there was no strategy to retry tile download. Tiles for which download
finished with error where by design not downloaded again. Because we were
issuing surplus download requests it was not easy to notice, that some tiles
where downloaded at different zoom level. When surplus downloads where removed -
problem became aparent.

Redefine semantics in AbstractTileSourceLayer, that all Tiles that are not
loaded are tried again. Move all logic deciding whether tile has loaded or not
to TMSCachedTileLoaderJob.

Implment in TMSCachedTileLoaderJob following rules:

  • when HTTP status code is between 400 and 500 - treat errors as permament
  • when HTTP status code equals or is greater than 500 - then treat error as

temporary

  • when there was exception in JCSCachedTileLoaderJob, then HTTP status code is

set to 599. Treat this as permament error. This is suboptimal, as we could treat
Socket read timeout and similar as temporary, and UnkownHostException as
permament

The other aproach is to implement retry count in Tile and retry for finite
number of times, preferably with some delay between tries.

Closes: #16747
See: #16743

comment:7 Changed 4 weeks ago by wiktorn

Good way to simulate such situation is to run:

$ sudo tc qdisc change dev enp0s3 root netem rate 4kbit delay 2s 5s

When concurrent downloads are ~4 per host. I've also set -Dsun.net.inetaddr.ttl=-1 to prevent UnkownHostException that may happen with such aggressive traffic throttling.

The difficulty I had with retesting was, that if throttling was too aggressive, I've got NoRouteToHostException or javax.net.ssl.SSLHandshakeException - Remote host closed connection during handshake.

comment:8 Changed 4 weeks ago by wiktorn

In 14270/josm:

Treat SocketTimeoutException as transient error.

Extend CacheEntryAttributes and allow storage of exception class name. Use this
information to treat SocketTimeoutException in TMS downloads as transient.

See: #16743

comment:9 Changed 4 weeks ago by wiktorn

Owner: changed from team to anonymous

This should help for SocketTimeoutException and JOSM will try to refetch such tiles. For other exceptions we need more robust retry policy.

As for the other bug mentioned here (reloading the tiles that are already visible). It might be related to #16747 that this were tiles downloaded from different zoom level and that's way they where downloaded.

Please check if this addresses your issue and close the ticket if JOSM works as expected.

comment:10 Changed 4 weeks ago by wiktorn

Milestone: 18.09

comment:11 Changed 3 weeks ago by anonymous

Thanks! This looks like a lot of relevant changes. Unfortunately (or fortunately) I am no longer subjected to SSL connection throttling/filtering. I won't be able to confirm that for at least several months. Perhaps someone currently in China could test it.

As for re-downloading the tiles (and sometimes failing at that) I can confirm that it indeed seemed to be related to switching between zoom levels.

comment:12 Changed 3 weeks ago by wiktorn

Resolution: fixed
Status: newclosed

Ok, so I close this ticket for now. If you happen to visit China again and notice the same problem - please open new ticket.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain anonymous.
as The resolution will be set.
The resolution will be deleted.

Add Comment


E-mail address and name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.