
Opened 10 months ago

Closed 9 months ago

Last modified 8 months ago

#18542 closed enhancement (fixed)

Obtain tag2link rules from Wikidata and OSM Sophox

Reported by: simon04
Owned by: Don-vip
Priority: normal
Milestone: 20.01
Component: Core tag2link
Version:
Keywords: wikidata privacy
Cc: nyurik, stoecker

Description (last modified by simon04)

Wikidata links items and properties to OSM keys and tags via https://www.wikidata.org/wiki/Property:P1282
Wikidata maintains a URL formatter for various properties via https://www.wikidata.org/wiki/Property:P1630

We can combine those two pieces of information (https://w.wiki/FD6) and augment JOSM's tag2link capabilities.
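
As an illustration of what such a combined rule amounts to, here is a minimal sketch of applying a formatter URL to a tag value. It assumes the formatter uses the "$1" placeholder convention of P1630; the RULES table, the example.org URL, and the choice to percent-encode the value are hypothetical and not taken from JOSM's implementation.

```python
from urllib.parse import quote

# Hypothetical rule table: OSM key -> formatter URL with a "$1" placeholder.
RULES = {"example_id": "https://example.org/item/$1"}


def apply_formatter(formatter_url, value):
    """Substitute a tag value for the "$1" placeholder of a formatter URL."""
    # Percent-encoding is an assumption made here so that spaces or "?" in a
    # tag value cannot break the resulting URL.
    return formatter_url.replace("$1", quote(value, safe=""))


def link_for_tag(key, value):
    """Return the link for a tag, or None if no rule exists for the key."""
    formatter = RULES.get(key)
    return apply_formatter(formatter, value) if formatter else None
```

With this table, link_for_tag("example_id", "A 1") yields a link, while a key without a rule yields None.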

Relates to #17842.

Attachments (0)

Change History (19)

comment:1 Changed 10 months ago by simon04

Description: modified (diff)
Keywords: wikidata added

comment:2 Changed 10 months ago by simon04

Resolution: fixed
Status: new → closed

In 15677/josm:

fix #18542, see #13901 - Obtain tag2link rules from Wikidata

comment:3 Changed 10 months ago by simon04

Milestone: 20.01
Last edited 10 months ago by simon04

comment:4 Changed 10 months ago by Don-vip

Great! :)

comment:5 Changed 10 months ago by simon04

In 15679/josm:

see #13901 see #18542 - Obtain tag2link rules from OSM Sophox

comment:6 Changed 10 months ago by simon04

The following rules are used:

comment:7 Changed 10 months ago by simon04

Summary: Obtain tag2link rules from Wikidata → Obtain tag2link rules from Wikidata and OSM Sophox

comment:8 Changed 10 months ago by simon04

Cc: nyurik added

comment:9 Changed 10 months ago by simon04

Cc: stoecker added

Replying to 13901#comment:13 stoecker:

I don't much like that this change causes permanent web access to non-JOSM servers for elements which are not user-selected. This way we're giving telemetry of user actions to providers we don't have under control.

I assume you are referring to this change? The user cannot be tracked by her individual actions. On JOSM startup, one query each is made to https://query.wikidata.org/sparql and https://sophox.org/sparql, and the results are cached.

comment:10 in reply to:  9 Changed 10 months ago by stoecker

Replying to simon04:

Replying to 13901#comment:13 stoecker:

I don't much like that this change causes permanent web access to non-JOSM servers for elements which are not user-selected. This way we're giving telemetry of user actions to providers we don't have under control.

I assume you are referring to this change? The user cannot be tracked by her individual actions. On JOSM startup, one query each is made to https://query.wikidata.org/sparql and https://sophox.org/sparql, and the results are cached.

I know that this is not a major issue, but I also know that in recent years small issues have turned out to have a much larger impact. I'd prefer that we cache these files via the JOSM server. This way we reduce the impact. I'll set up a proxy for this purpose.

E.g., it's much easier to restrict JOSM when there is only one server you need to block in a firewall.

comment:11 Changed 9 months ago by stoecker

Resolution: fixed
Status: closed → reopened

I set up a caching proxy on the JOSM server:

https://josm.openstreetmap.de/remote/wikidata-sparql
https://josm.openstreetmap.de/remote/sophox-sparql

That works fine for the first link, but I can't get Sophox to work, either through the cache or directly.

comment:12 Changed 9 months ago by nyurik

Replying to stoecker:

@stoecker sorry just saw this ticket. What issues are you having with Sophox? Could you paste the specific SPARQL query you are running against it? Also, please join https://osmus-slack.herokuapp.com/ (OSM Slack) -- we can discuss it there in #sophox or #josm channels. (Ping nyurik). Thanks!

Last edited 9 months ago by nyurik

comment:13 Changed 9 months ago by nyurik

I have been trying to wrap my head around the goal and approach of this ticket, and am still highly confused.

If the goal is to get the right value->URL formatter in tag2link, the easiest way is to follow the same scheme that iD uses:

  1. convert key to a "sitelink": ("Key:" + key).replace('_', ' ') (replace underscores with spaces). See iD code.
  2. call https://wiki.openstreetmap.org/w/api.php?action=wbgetentities&... to get the data items for each of the sitelinks. You should pass all sitelinks to it in a single call, rather than calling it multiple times. See iD code.
  3. check whether a P8 property claim is defined on each of the resulting data items, and if so, use it.
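
Steps 1 and 2 above can be sketched as follows. This is only an illustration: the function names are hypothetical, the query parameters are the ones visible in the wbgetentities URL quoted elsewhere in this ticket, and no request is actually sent.

```python
from urllib.parse import urlencode

API = "https://wiki.openstreetmap.org/w/api.php"


def key_to_sitelink(key):
    """Step 1: turn an OSM key into an OSM wiki page title ("sitelink")."""
    return ("Key:" + key).replace("_", " ")


def build_wbgetentities_url(keys):
    """Step 2: build one wbgetentities request covering all given keys."""
    params = {
        "action": "wbgetentities",
        "format": "json",
        "sites": "wiki",
        # All sitelinks go into a single call, separated by "|".
        "titles": "|".join(key_to_sitelink(k) for k in keys),
    }
    return API + "?" + urlencode(params)
```

For example, build_wbgetentities_url(["natural", "maxspeed"]) produces a single URL whose titles parameter holds both page titles.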

We already have formatters defined on some of the keys (list), and it will be very easy to add more.

Lastly, I believe the first two steps should be placed into JOSM core (at least eventually), because these same steps will be useful for any other kind of data item access, e.g. getting key/tag documentation.

Last edited 9 months ago by nyurik

comment:14 in reply to:  13 ; Changed 9 months ago by simon04

Replying to nyurik:

I have been trying to wrap my head around the goal and approach of this ticket, and am still highly confused.

Instead of running steps 1, 2 and 3 on every possible tag, on JOSM startup all URL formatters related to OSM keys are obtained from both Wikidata and the OSM Wiki Wikibase using the queries mentioned in comment:6.

comment:15 in reply to:  14 ; Changed 9 months ago by nyurik

Replying to simon04:

Instead of running steps 1, 2 and 3 on every possible tag, on JOSM startup all URL formatters related to OSM keys are obtained from both Wikidata and the OSM Wiki Wikibase using the queries mentioned in comment:6.

You don't need to run them individually. Instead, you can take all keys you see (either all keys on a single object, or all keys in all objects), and do them at once. Step (2) allows you to get up to 50 or 100, I think, in one call.

Benefits of this approach:

  • same code will allow you to get other key/tag metadata, such as key/tag documentation, or even simple validation rules like regex - see Key:population example (at the bottom).
  • you only rely on a single source of data -- OSM wiki, without any additional querying mechanism (i.e. wikidata.org, query.wikidata.org, or sophox.org).
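
The batching idea can be sketched like this. The batch size of 50 is an assumption based on the "50 or 100" estimate above, not a verified API limit, and both helper names are hypothetical.

```python
def collect_keys(objects):
    """Gather the distinct keys used by a collection of tag dictionaries."""
    return sorted({key for tags in objects for key in tags})


def batches(items, size=50):
    """Split items into consecutive chunks of at most `size` elements.

    The default of 50 is an assumed per-call limit, not a confirmed one.
    """
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each chunk returned by batches(collect_keys(objects)) would then be sent as one request instead of one request per key.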

On the other hand, you could use a single Sophox query to get the same data too - I can write a simple query for you that will return all available formatters. The only downside is that sophox.org is a bit less stable than OSM wiki itself, but it allows richer querying for the same data.

Last edited 9 months ago by nyurik

comment:16 in reply to:  11 Changed 9 months ago by simon04

Resolution: fixed
Status: reopened → closed

Replying to stoecker:

I set up a caching proxy on the JOSM server:

Let's track this change/improvement in #18599.

comment:17 in reply to:  15 Changed 9 months ago by simon04

Replying to nyurik:

You don't need to run them individually. Instead, you can take all keys you see (either all keys on a single object, or all keys in all objects), and do them at once. Step (2) allows you to get up to 50 or 100, I think, in one call.

I'm not convinced, nor do I want to rewrite the working code and get swept up in the Wikibase internals. This query https://wiki.openstreetmap.org/w/api.php?action=wbgetentities&format=json&languagefallback=1&languages=en&origin=*&sites=wiki&titles=Key%3Anatural%7CTag%3Anatural%3Dpeak (for one key and one tag only) returns 21 KB of data exposing all the Wikibase internals (mainsnak, snaktype, datavalue) that have to be parsed again. If anyone else is up for looking into wbgetentities, open a separate ticket and attach a patch.
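
For anyone picking this up, the parsing in question might look roughly like the sketch below. It assumes the usual Wikibase JSON layout (entities, claims, mainsnak, snaktype, datavalue) and that the formatter is stored as a plain string under P8, as described in comment:13. The function name is hypothetical and the sketch has not been run against the live API.

```python
def extract_formatters(response, prop="P8"):
    """Map entity IDs to the first string value of the given property."""
    result = {}
    for entity_id, entity in response.get("entities", {}).items():
        for claim in entity.get("claims", {}).get(prop, []):
            snak = claim.get("mainsnak", {})
            # Skip "novalue"/"somevalue" snaks, which carry no datavalue.
            if snak.get("snaktype") != "value":
                continue
            value = snak.get("datavalue", {}).get("value")
            if isinstance(value, str):
                result[entity_id] = value
                break
    return result
```

Entities without a P8 claim, or entries for missing pages, are simply left out of the result.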

comment:18 Changed 8 months ago by Don-vip

Keywords: privacy added

comment:19 in reply to:  12 Changed 8 months ago by stoecker

Replying to nyurik:

@stoecker sorry just saw this ticket. What issues are you having with Sophox? Could you paste the specific SPARQL query you are running against it? Also, please join https://osmus-slack.herokuapp.com/ (OSM Slack) -- we can discuss it there in #sophox or #josm channels. (Ping nyurik). Thanks!

After setting up caching in #18599, I see the following result for Sophox: "Query string present but no explicit expiration time". Do you have any influence over the server sending the data? If so, it would be good if the server sent an Expires header with the SPARQL responses, so that caching works. That would also reduce load on the server.

I.e., like Wikipedia, add something like Cache-Control: public, max-age=300 or any other valid time. The Last-Modified line is probably more important.
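
To illustrate why the explicit lifetime matters, here is a minimal sketch of how a cache could decide freshness from such a header. It is deliberately simplified, is not the logic of the proxy on the JOSM server, and uses hypothetical function names.

```python
import re


def max_age_seconds(cache_control):
    """Return the max-age value of a Cache-Control header, or None if absent."""
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def is_fresh(age_seconds, cache_control):
    """A cached response is fresh only if an explicit lifetime covers its age."""
    max_age = max_age_seconds(cache_control)
    return max_age is not None and age_seconds < max_age
```

With "Cache-Control: public, max-age=300" a response cached 120 seconds ago counts as fresh; without any max-age this sketch never treats the response as fresh.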

Last edited 8 months ago by stoecker
