Opened 5 years ago

Closed 5 years ago

#9813 closed enhancement (fixed)

Improve TMS cache directory layout

Reported by: wavexx
Owned by: The111
Priority: normal
Milestone: 15.01
Component: JMapViewer
Version:
Keywords: cache
Cc:

Description

I'm trying to make several programs on Linux share the same tile cache directory layout, mostly to limit bandwidth usage.

With the attached patch I modify the TMS cache directory to use the "standard" OSM on-disk tile layout, which is a pretty straightforward Z/X/Y[.extension] hierarchy. It has the added advantage of not slowing down on Windows when the number of tiles grows after a long session, and it's self-pruning.
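As a rough illustration of that layout (a sketch only, not taken from the attached patch), a tile path could be built as follows. The function name `tile_path` and the cache root `tiles` are hypothetical.

```python
from pathlib import PurePosixPath


def tile_path(cache_root: str, zoom: int, x: int, y: int, extension: str = "png") -> PurePosixPath:
    """Return the Z/X/Y[.extension] location of a tile below cache_root."""
    # The extension is optional in the layout, so allow an empty one.
    name = f"{y}.{extension}" if extension else str(y)
    return PurePosixPath(cache_root) / str(zoom) / str(x) / name


print(tile_path("tiles", 18, 140917, 86038))
```

With this layout a directory only ever holds the tiles of one zoom level and one x column.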

Of course, setting up some symlinks to use a shared cache is still up to the user, but at least it's now possible. It doesn't make any difference from the user's perspective.

Attachments (1)

osm-ondisktile.diff (4.2 KB) - added by wavexx 5 years ago.

Change History (43)

Changed 5 years ago by wavexx

Attachment: osm-ondisktile.diff added

comment:1 Changed 5 years ago by stoecker

Sounds like a good idea to me.

comment:2 Changed 5 years ago by Don-vip

This is indeed a good idea; we have a couple of tickets on the same subject. But this is only part of the problem, which must be addressed all at once:

  • we must change directory structure
  • but we must remove or migrate existing cache (it can be very large)
  • we must check why the cache is never cleaned
  • to differentiate the two structures, changing the cache location would be a good idea. We have another couple of tickets for Windows and Linux
  • as the cleanup/migration will be extremely long it must be done in background, in several runs and the user should be informed as it will slow down the machine

I think that's all :)

comment:3 Changed 5 years ago by Don-vip

Milestone: 14.05

See #4904, #5309, #6248.
I would prefer to fix all of them at the same time after we switch to Java 7 => targeted for 14.05

comment:4 Changed 5 years ago by anonymous

Just FYI: on Linux, JOSM defaults to putting the cache into /tmp/. /tmp has been on tmpfs for at least 4 years on Debian/Ubuntu, which means that the cache is virtually per-session.

About #4904: is this a request to use a standard location? I would arguably use the XDG spec, and use ~/.cache/tiles/[id]/z/x/y.png

The problem is that [id] is hardly unique among programs (and too vaguely defined). Using ~/.cache/tiles as the base directory, though, would be as good as any other. I think here we just need a couple of projects joining the bandwagon. At the same time, yesterday I submitted a patch to viking (viking.sf.net) to use the same tile format for the cache. Viking uses ~/.viking-maps/[id], but I could propose something in common with JOSM.

As for #5309, the issue (slowdown) is caused by the huge flat directory on Windows. On Linux the issue is less visible, but still present. This is solved with this format.

#6248: No idea about WMS cache. Is this just a webkit component with a browser cache? (because in this case I wouldn't conflate the two).

Last edited 5 years ago by Don-vip

comment:5 Changed 5 years ago by wavexx

(the previous poster was me, just forgot to log-in).

I have a few questions related to the cache:

  • I see no size management. While I fixed "Flush tile cache" (in the patch, which will also flush correctly the existing cache), I see no way to keep the size contained. Am I missing something, or it's just un-implemented?
  • JOSM likes to save etags, though the other programs can just get away with timestamps and age-based caching. While saving additional files in the tree is not a problem for the tile format, why exactly we save etags and timestamps? This shouldn't really be needed, with ctime being more than enough to cache the files.

comment:6 in reply to:  5 Changed 5 years ago by stoecker

Replying to wavexx:

(the previous poster was me, just forgot to log-in).

I have a few questions related to the cache:

  • I see no size management. While I fixed "Flush tile cache" (in the patch, which will also flush correctly the existing cache), I see no way to keep the size contained. Am I missing something, or it's just un-implemented?

Probably unimplemented, probably simply broken? :-)

  • JOSM likes to save etags, though the other programs can just get away with timestamps and age-based caching. While saving additional files in the tree is not a problem for the tile format, why exactly we save etags and timestamps? This shouldn't really be needed, with ctime being more than enough to cache the files.

Hmm, Slippymap plugin displayed tile information. Don't know if this still is supported or a leftover.

comment:7 Changed 5 years ago by wavexx

I saw a plugin to display the tile information, though the point is more along the line: is it really worth it to create a file to store useless information? For the scope of the tile cache, ctime is all we need to perform caching and expiry.
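A minimal sketch of what such ctime-only expiry could look like, assuming a hypothetical helper `is_expired` and a caller-chosen maximum age. Note that Python's `st_ctime` is the last metadata change time on Unix but the creation time on Windows, so this is an illustration rather than a portable recipe.

```python
import os
import time


def is_expired(path, max_age_seconds, now=None):
    """Return True if the file's ctime is older than max_age_seconds."""
    # 'now' can be injected to make the check easy to test.
    now = time.time() if now is None else now
    return now - os.stat(path).st_ctime > max_age_seconds
```

No side-car metadata file is needed for this check; the tile file itself carries the timestamp.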

If that's not strictly needed for something else that I don't see, I would actually remove the 'tags' files from the cache. Also for the sake of a shared tile cache, these files would be left without the actual tile counterpart if another application purges a tile to conserve the space.

As for the migration, I could help also doing this (though, speaking for linux, I don't believe anyone needs it). The issue there is:

  • Should we prompt for a tile cache migration to another path? (~/.cache/tile and windows counterpart [guidance needed here]?)
  • Should we prompt whether to migrate or simply purge the old cache?

Being a one-time-off, doing a shell/perl script would actually be easier for the interested parties, but I know this would be a problem in windows-land.

comment:8 Changed 5 years ago by skyper

Keywords: cache added

comment:9 Changed 5 years ago by bastiK

Instead of a real migration, it might be easier to keep the old cache. Just read from both the old and the new cache and only write to the new cache. After some time, the old stuff could be removed.

comment:10 Changed 5 years ago by stoecker

I'd simply drop the old cache!

comment:11 in reply to:  7 Changed 5 years ago by bastiK

Replying to anonymous:

#6248: No idea about WMS cache. Is this just a webkit component with a browser cache? (because in this case I wouldn't conflate the two).

No, WMS is widely used and it is supported well in JOSM. The problems and missing features are the same for both the WMS and the TMS cache. Anyway, if only the TMS cache is improved for now, this would still be a significant step forward.

Replying to wavexx:

(the previous poster was me, just forgot to log-in).

I have a few questions related to the cache:

  • I see no size management. While I fixed "Flush tile cache" (in the patch, which will also flush correctly the existing cache), I see no way to keep the size contained. Am I missing something, or it's just un-implemented?

Not implemented as far as I know.

  • JOSM likes to save etags, though the other programs can just get away with timestamps and age-based caching. While saving additional files in the tree is not a problem for the tile format, why exactly we save etags and timestamps? This shouldn't really be needed, with ctime being more than enough to cache the files.

It depends on server support.

Apparently there are two ways to find out if a file has changed on the server: ETag and Last-Modified. (see TileSource.java)

There are two steps: first, the server needs to send the ETag / Last-Modified header along with the image; second, it has to support the If-None-Match / If-Modified-Since request header. If only the first part is supported, you need to send an extra HEAD request to find out whether the ETag / mtime has changed. There may be servers that support the If-None-Match but not the If-Modified-Since header, so we should not remove this feature.
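The second step can be sketched as follows. This is an illustration only and not the code in TileSource.java; the helper names `conditional_headers` and `cached_copy_is_current` are hypothetical, while the header names and the 304 status code are standard HTTP.

```python
def conditional_headers(etag=None, last_modified=None):
    """Build the request headers for revalidating a cached tile."""
    headers = {}
    if etag:
        # Validator taken from the ETag response header of the cached tile.
        headers["If-None-Match"] = etag
    if last_modified:
        # Validator taken from the Last-Modified response header.
        headers["If-Modified-Since"] = last_modified
    return headers


def cached_copy_is_current(status_code):
    """304 Not Modified means the server confirmed the cached tile."""
    return status_code == 304


print(conditional_headers(etag='"abc123"'))
```

Sending both validators when both are stored lets the server use whichever one it supports.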

Actually this data is missing on the Maps page and we should check, whether the current implementation works as expected.

Replying to wavexx:

I saw a plugin to display the tile information, though the point is more along the line: is it really worth it to create a file to store useless information? For the scope of the tile cache, ctime is all we need to perform caching and expiry.

If that's not strictly needed for something else that I don't see, I would actually remove the 'tags' files from the cache. Also for the sake of a shared tile cache, these files would be left without the actual tile counterpart if another application purges a tile to conserve the space.

As for the migration, I could help also doing this (though, speaking for linux, I don't believe anyone needs it). The issue there is:

  • Should we prompt for a tile cache migration to another path? (~/.cache/tile and windows counterpart [guidance needed here]?)
  • Should we prompt whether to migrate or simply purge the old cache?

Ideally these changes shouldn't be noticed by the normal user, so no prompts if possible.

comment:12 Changed 5 years ago by robbieonsea

Just a quick note, instead of just ~/.cache ; I think it should be (if the env var exists) $XDG_CACHE_HOME and if it doesn't then fall back to ~/.cache

See: http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html
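A minimal sketch of that fallback, following the basedir spec's rule that an unset, empty or relative value is ignored. The function name `xdg_cache_home` and the injectable `environ` argument are assumptions made for illustration.

```python
import os
from pathlib import Path


def xdg_cache_home(environ=os.environ) -> Path:
    """Return $XDG_CACHE_HOME, or ~/.cache if it is unset, empty or relative."""
    value = environ.get("XDG_CACHE_HOME", "")
    # The spec requires absolute paths; anything else is treated as invalid.
    if value and os.path.isabs(value):
        return Path(value)
    return Path.home() / ".cache"


print(xdg_cache_home() / "tiles")
```

The `tiles` subdirectory in the usage line is the name suggested earlier in this thread, not something the spec defines.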

comment:13 Changed 5 years ago by wavexx

I'm trying to draft something that could be used as guideline for a shared tile cache.

It's here: https://josm.openstreetmap.de/wiki/SharedTileCache

I decided to use JOSM's wiki as I believe JOSM is the biggest player in this area. I'd like suggestions or comments on the draft, which I believe is simple and liberal enough to be used by most applications. I'll ping the "Viking" developers (as it's another application I'm using) and ask about their opinion.

Nothing written on this page is definitive, but I'd like to base my patch on something that other people can follow.

comment:14 Changed 5 years ago by stoecker

If you create a new file format, please use XML. For JOSM we too often started with a text format and later converted to XML, now the rule is always to use XML :-)

comment:15 Changed 5 years ago by wavexx

I'd rather keep the description file format to an absolute minimum which is easily processable even with standard unix tools, so that even simple shell scripts could operate on the cache.

While it's rather easy to plug an XML parser in Java, it's an additional dependency for almost any other project. Not to mention that there's nothing else that I've come across that needs to be stored for both the configuration and the tile metadata.

I'm really pushing for minimum friction.

comment:16 Changed 5 years ago by wavexx

Meanwhile, I also contacted GPSPrune, Marble and Viking developers asking to comment on the spec.

comment:17 Changed 5 years ago by bastiK

A shared tile cache for different applications is certainly ambitious, but your current specification is already quite nice! Some comments:

  • How do you choose the base URL for the cases with multiple subdomains, e.g.
    https://{switch:a,b,c}.tile.openstreetmap.org/{zoom}/{x}/{y}.png
    and
    https://gps-{switch:a,b,c}.tile.openstreetmap.org/lines/{zoom}/{x}/{y}.png?
  • How do you handle OSGeo-style y-coordinate (see Maps), convert it to the conventional tile coordinates?
  • The reason for maximum cache size is limited space on the hard drive, so it could be more useful to limit the overall size of the TMS cache (including OSM tiles, Bing, ...)
  • I would expect that different applications have different default values for the cache time and the cache size. How do you negotiate these values? It is certainly not a good idea if the setting in the cache.ini is switched back and forth whenever you change the software.
    For the cache time, I see no reason to fix this value system-wide. Each program can decide independently if it displays the cached tile or considers it too old and does an update.
    For the cache size, it is more complicated. If a user builds up a large cache, e.g. for later offline use, it would certainly not be nice to delete everything without notice when he/she runs a tool which happens to have a very low setting for the cache size.

comment:18 Changed 5 years ago by xeen

Ticket #7770 has been marked as a duplicate of this ticket.

comment:19 in reply to:  17 Changed 5 years ago by wavexx

Replying to bastiK:

A shared tile cache for different applications is certainly ambitious, but your current specification is already quite nice! Some comments:

  • How do you choose the base URL for the cases with multiple subdomains, e.g.
    https://{switch:a,b,c}.tile.openstreetmap.org/{zoom}/{x}/{y}.png
    and
    https://gps-{switch:a,b,c}.tile.openstreetmap.org/lines/{zoom}/{x}/{y}.png?

Very good question. I would like something that would avoid multiple keys.

Is it a too strong assumption to ask the application (which should know the a/b/c subdomains) to perform the match by themselves?

Do you think it would be best to have a new key, like alias=[list] with domain aliases for dumb applications?
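To make the alias idea concrete, here is a sketch of how an application might expand the `{switch:...}` placeholder into the list of equivalent URL templates. The function name `expand_switch` is hypothetical, and nothing here is part of the draft spec.

```python
import re

# Matches a placeholder such as {switch:a,b,c}
SWITCH = re.compile(r"\{switch:([^}]*)\}")


def expand_switch(template: str) -> list[str]:
    """Return one URL template per subdomain listed in {switch:...}."""
    match = SWITCH.search(template)
    if match is None:
        return [template]
    head, tail = template[:match.start()], template[match.end():]
    return [head + option + tail for option in match.group(1).split(",")]


for url in expand_switch("https://{switch:a,b,c}.tile.openstreetmap.org/{zoom}/{x}/{y}.png"):
    print(url)
```

An application that knows the template can match any of the expanded URLs against a single cache key, which is the behaviour asked about above.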

  • How do you handle OSGeo-style y-coordinate (see Maps), convert it to the conventional tile coordinates?

Yes, I would, unless there's actual loss of information I would keep the storage coordinates uniform.

We could foresee a different layout= or format= key, to provision for something that cannot be stored exactly in the same way. I do not have experience beyond TMS, so please, give advice here.
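For reference, converting between the OSGeo-style row numbering (rows counted from the bottom) and the conventional one (rows counted from the top) is a lossless flip within the zoom level. A sketch, with the hypothetical name `flip_y`:

```python
def flip_y(zoom: int, y: int) -> int:
    """Convert a tile row between OSGeo/TMS and conventional numbering."""
    # A zoom level has 2**zoom rows, numbered 0 .. 2**zoom - 1.
    return (2 ** zoom) - 1 - y


print(flip_y(18, 86038))
```

The same function converts in both directions, since applying it twice returns the original row.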

  • The reason for maximum cache size is limited space on the hard drive, so it could be more useful to limit the overall size of the TMS cache (including OSM tiles, Bing, ...)

Pruning the cache is a difficult topic. Currently applications are free to choose their pruning strategy, because different applications may have better ideas on how to perform it. In this case, if you exceed the size of one cache, how do you prune? You kill somebody else's cache that you are currently not even using?

I think it's a very good concern for a unified cache, but I would consider this spec as a first step, and keep things very simple and eventually re-iterate.

  • I would expect that different applications have different default values for the cache time and the cache size. How do you negotiate these values? It is certainly not a good idea if the setting in the cache.ini is switched back and forth whenever you change the software.
    For the cache time, I see no reason to fix this value system-wide. Each program can decide independently if it displays the cached tile or considers it too old and does an update.
    For the cache size, it is more complicated. If a user builds up a large cache, e.g. for later offline use, it would certainly not be nice to delete everything without notice when he/she runs a tool which happens to have a very low setting for the cache size.

This depends on the use, indeed. For the way I was using different applications, I always wanted the same cache to have the same size and expiration time independently of the program. For the way I envision the cache management, one would load the expiration settings at start and use them throughout unless the user changes them.

I see your point perfectly though. I think a good solution in this case could come from the applications themselves: offer a "use global settings (for the cache)" option and a switch to override them using the application settings (without actually modifying the parameters in cache.ini).

comment:20 Changed 5 years ago by wavexx

To clarify the complexity a bit: my main argument comes from putting myself in the shoes of developers who may glance at the spec. I want them to see that this is just a convenience schema, and that there's very little to do to comply, for the user's benefit.

As an application writer, we'll ensure you have enough room to do advanced expiration and management (your application may as well manage all the caches the user is seeing at _once_), but the dumb application that was written 5 years ago just needs a small tweak and it will work.

I don't want developers to think "oh, I *need* to perform pruning for all caches" to adhere to the spec.

comment:21 Changed 5 years ago by AM909

From ticket #7770, talking about the last part of the path, I'd like to see a structure like:
112987.543112.png (xxxxxx.yyyyyy.png) stored as 11/29/87/54/31/12.png (xx/xx/xx/yy/yy/yy.png) to limit the size of each directory to 100 files (or directories), or as 112/987/543/112.png (xxx/xxx/yyy/yyy.png) to limit the size of each directory to 1000 files (or directories), for the benefit of performance of things like Windows Explorer and possibly other tools and APIs.
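The two variants can be sketched with one helper that cuts the coordinates into fixed-width digit groups. The names `split_digits` and `grouped_path` are hypothetical, and zero-padding both coordinates to six digits is an assumption, not something stated in #7770.

```python
def split_digits(value: str, width: int) -> list[str]:
    """Cut a digit string into consecutive groups of 'width' characters."""
    return [value[i:i + width] for i in range(0, len(value), width)]


def grouped_path(x: int, y: int, width: int, digits: int = 6, extension: str = "png") -> str:
    """Build a path from the x groups followed by the y groups."""
    # Assumption: pad to a fixed number of digits so every path has equal depth.
    parts = split_digits(str(x).zfill(digits), width) + split_digits(str(y).zfill(digits), width)
    return "/".join(parts) + "." + extension


print(grouped_path(112987, 543112, width=2))
print(grouped_path(112987, 543112, width=3))
```

A width of 2 caps each directory at 100 entries per file type, a width of 3 at 1000.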

comment:22 Changed 5 years ago by wavexx

Splitting the path by zoom/x would already lead to a huge cut in terms of files per directory. Do you have any indication that this wouldn't be enough?

A deep level hierarchy would actually slow down direct tile access and cache pruning (due to the deeper tree traversal).

comment:23 Changed 5 years ago by stoecker

More than 1000 files per directory really slows down access. I think the 1000 files structure is a good idea. Directory traversal is not really that complicated compared to the directory access times.

Last edited 5 years ago by stoecker

comment:24 Changed 5 years ago by wavexx

So I guess the coordinate would have to be padded, so that "100" would be:

00/01/00?

The largest x coordinate at the highest zoom level can be 524288, so 6 digits are fine.

Considering the metadata files, the last level will have at most 200 files in it.

My 4gb cache has the following number of files per zoom-level:

z10 23
z11 63
z12 193
z13 662
z14 2459
z15 9465
z16 36874

Maximum count per leaf directory at zoom 16 (largest I currently have):

$ for dir in maps/16/*; do find "$dir" | wc -l; done | sort -rn | head -1
230

This is not much given the total size, but granted this is not the maximum zoom level.

It's a bit sad that this would be a departure from the slippy-map format though, but I will not oppose this change.

Is there anybody against it?

comment:25 in reply to:  24 Changed 5 years ago by stoecker

Replying to wavexx:

So I guess the coordinate would have to be padded, so that "100" would be:

00/01/00?

Works also without padding.
So 100.png gets 100.png and 1000.png gets 1/000.png. Depends on how it is defined. :-)
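One way to read this unpadded variant is to group the digits in threes starting from the right. This is only a sketch of that reading; the name `split_from_right` is hypothetical and, as noted, the result depends on how the format is defined.

```python
def split_from_right(value: int, width: int = 3) -> str:
    """Group the decimal digits of value from the right, without padding."""
    digits = str(value)
    groups = []
    while digits:
        # Take the last 'width' digits and put them in front of the groups so far.
        groups.insert(0, digits[-width:])
        digits = digits[:-width]
    return "/".join(groups)


print(split_from_right(100) + ".png")
print(split_from_right(1000) + ".png")
```

Unlike the padded form, the directory depth here varies with the size of the coordinate.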

It's a bit sad that this would be a departure from the slippy-map format though, but I will not oppose this change.

Is there anybody against it?

It is similar to mod_tile format:
http://svn.openstreetmap.org/!svn/bc/20000/applications/utils/mod_tile/readme.txt

There are multiple possibilities to define such a format. The only important thing is to choose one. It is probably worth looking at other applications that already use something similar. The mod_tile format probably does not fully fit, as it uses meta-tiles, which are groups of tiles in the x and y directions.

comment:26 Changed 5 years ago by Don-vip

Milestone: 14.05 → 14.06

Move imagery cache-related tickets to next milestone (too risky now for this release)

comment:27 Changed 5 years ago by Don-vip

Milestone: 14.06 → 14.07

Move all tickets for which no work has been done yet to next milestone

comment:28 Changed 5 years ago by Don-vip

Milestone: 14.07 → 14.08

Move some tickets to next milestone

comment:29 Changed 5 years ago by Don-vip

Milestone: 14.08 → 14.09

move imagery cache tickets to next milestone

comment:30 Changed 5 years ago by Don-vip

Milestone: 14.09 → 14.10

Move complicated/risky tickets to next milestone.

comment:31 Changed 5 years ago by Don-vip

Milestone: 14.10 → 14.11

Not enough time/resources for these tickets this month.

comment:32 Changed 5 years ago by Don-vip

I think we finally have enough time and no other urgent matters, so we can tackle this subject this month. I haven't looked at the patch yet; have all the remarks from the thread been addressed in both the patch and the specification?

comment:33 Changed 5 years ago by wavexx

Not yet. The directory layout is final, but the spec is a bit more general for the tile metadata.
Very minor modifications required actually, but I won't have time to work on this in the next month.

I don't know if the provided patch still applies.
I could provide an updated one if it doesn't, since I'm following the SVN trunk.

comment:34 Changed 5 years ago by Don-vip

Milestone: 14.11 → 14.12

comment:35 Changed 5 years ago by bastiK

In [o30854] - see #josm5309 - rework cache directory structure so you don't have too many files in one directory.
The new cache is implemented in TMSFileCacheTileLoader. The old cache in OsmFileCacheTileLoader should work as before.

comment:36 Changed 5 years ago by bastiK

In 7823/josm:
see #5309 - too many files in tile cache (see [o30854])
Set the preference tms.newcache=true to test the new cache; it is not yet activated by default.

comment:37 Changed 5 years ago by bastiK

The change isn't live, because I'm not sure I'll have time to fix any issues during the holidays. Otherwise it should be ready.

The location of the cache has changed from the temporary directory to ~/.josm/cache/tms. (WMS tiles are already stored at ~/.josm/cache/wms.) It now uses the id of the imagery source for the top-level directory, if available.

I took the liberty to change the directory structure yet again: Each directory contains one digit from the x and one digit from the y coordinate. For example the tile

http://b.tile.openstreetmap.org/18/140917/86038.png

is stored at

~/.josm/cache/tms/mapnik_osm/z18/x1y0/x4y8/x0y6/x9y0/x1y3/x7y8.png

That means the deepest directory will contain all tiles in a 10x10 square. One level above will store all tiles in a 100x100 square and so on. So tiles that are close together, will usually be stored in the same folder.
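The mapping in the example above can be reproduced with the sketch below. It is not the code from TMSFileCacheTileLoader: the function name `interleaved_tile_path` is hypothetical, and padding both coordinates to the number of digits of the largest tile index at that zoom level is an assumption that happens to match the example.

```python
def interleaved_tile_path(source_id: str, zoom: int, x: int, y: int, extension: str = "png") -> str:
    """Pair up the decimal digits of x and y, one pair per directory level."""
    # Assumption: pad to the digit count of the largest index, 2**zoom - 1.
    width = len(str(2 ** zoom - 1))
    x_digits = str(x).zfill(width)
    y_digits = str(y).zfill(width)
    parts = [f"x{a}y{b}" for a, b in zip(x_digits, y_digits)]
    return f"{source_id}/z{zoom}/" + "/".join(parts) + f".{extension}"


print(interleaved_tile_path("mapnik_osm", 18, 140917, 86038))
```

Because the last path component encodes the final digit of x and of y, each leaf directory covers a 10x10 block of neighbouring tiles.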

The disadvantage is a more complicated scheme, but I think it is still fairly straightforward.

Since the cache was so far basically limited to one day, I don't think it is worth the effort to do a migration.

Regarding the proposed specification and the shared cache: I'm not opposed to this, but I will not invest much time or energy in it either. The priority for me is to improve the TMS cache in JOSM. If wavexx, or someone else, would like to work on the unification, they are welcome to do so. The cache layout can be changed again later, if needed.

comment:38 Changed 5 years ago by Don-vip

Summary: [patch] Improve TMS cache directory layout → Improve TMS cache directory layout

comment:39 Changed 5 years ago by Don-vip

Milestone: 14.12 → 15.01

finish all cache-related tickets next month

comment:40 Changed 5 years ago by bastiK

In [o30890]: see #josm9813 - create directory when writing tags file

comment:41 Changed 5 years ago by bastiK

In 7911/josm:

see #9813 - activate new TMS cache

comment:42 Changed 5 years ago by bastiK

Resolution: fixed
Status: new → closed

The cache directory layout has been improved. I'm closing this ticket because of inactivity regarding the topic of cache unification. To get further, we need code contributions (patches) or communication with other projects.
