Opened 16 months ago

Last modified 14 months ago

#20433 assigned task

Imagery Integration tests

Reported by: GerdP Owned by: Don-vip
Priority: normal Milestone: Longterm
Component: Unit tests Version:
Keywords: imagery jenkins Cc: Don-vip, stoecker

Description

IIGTR the job is submitted every 6 hours but its runtime is > 7 hours.

Attachments (1)

20433.patch (1.9 KB) - added by taylor.smock 16 months ago.
Enable parallel test execution and annotate ImageryPreferenceTestIT#testImageryEntryValidity with @Execution(ExecutionMode.CONCURRENT)

Change History (47)

comment:1 Changed 16 months ago by stoecker

Probably should be reduced to once every 24h?

comment:2 Changed 16 months ago by Don-vip

Owner: changed from team to Don-vip
Status: new → assigned

It's a combination of:

  • regression from JUnit 5 migration which disabled the parallel execution
  • the cumulative increasing number of timeout errors

I changed the cron to once per 24h, and still have the parallel execution on my radar.

Any help fixing imagery sources that time out will reduce the duration of this test and provide working entries to our users.

@Gerd what does "IIGTR" mean?

comment:3 Changed 16 months ago by Don-vip

Component: Core → Unit tests

comment:4 Changed 16 months ago by GerdP

IIGTR: If I got that right

comment:5 Changed 16 months ago by GerdP

reg. help: I have no clue about all the parameters used in the WMS/TMS definitions or how the results of the test help to find what has to be changed :(
Isn't that something that has to be done by those who created the wiki entries?

comment:6 in reply to:  5 Changed 16 months ago by Don-vip

Replying to GerdP:

Isn't that something that has to be done by those who created the wiki entries?

You can't expect people who submitted a source several years ago to monitor it daily for production issues, change of url, broken certificates and so on.

comment:7 Changed 16 months ago by Don-vip

As for clues... Well for each error you have to dig: has the layer name changed, has the server url changed, has the server been decommissioned, are they blocking German IP addresses, and so on.

comment:8 Changed 16 months ago by GerdP

You can't expect people who submitted a source several years ago to monitor it daily for production issues, change of url, broken certificates and so on.

My idea is something like an automatic filtering of the wrong entries so that JOSM can still show the entry but with a flag that it is probably not working because of errors in the wiki definition.

Well for each error you have to dig.

Does that mean you contact someone and ask or do you have tools to find that out?

comment:9 in reply to:  2 ; Changed 16 months ago by taylor.smock

Replying to Don-vip:

  • regression from JUnit 5 migration which disabled the parallel execution

Oops. I'll see if I can fix that.

comment:10 in reply to:  9 Changed 16 months ago by Don-vip

Replying to taylor.smock:

Oops. I'll see if I can fix that.

It's OK, I should just have to configure the job to add required properties. It's pure Jenkins configuration, I can do it without modifying anything in the source tree I think.

comment:11 Changed 16 months ago by taylor.smock

There is a JUnit annotation for running parallel tests.

Unfortunately, it is currently considered experimental.

Anyway, I'll post a patch for it (I finished up the work for the patch about the time you posted).

It's fairly tiny. The hard part (for me) is extracting it from the rest of the stuff I've been modifying for #16567, and then ensuring it applies cleanly (for you).

Changed 16 months ago by taylor.smock

Attachment: 20433.patch added

Enable parallel test execution and annotate ImageryPreferenceTestIT#testImageryEntryValidity with @Execution(ExecutionMode.CONCURRENT)

comment:12 Changed 16 months ago by Don-vip

Doesn't setting the property at this location enable parallel mode for ALL tests?

comment:13 in reply to:  12 Changed 16 months ago by taylor.smock

Replying to Don-vip:

Doesn't setting the property at this location enable parallel mode for ALL tests?

-Djunit.jupiter.execution.parallel.enabled=true just allows the @Execution annotations to be used.

If I added -Djunit.jupiter.execution.parallel.mode.default=concurrent, then yes.

Link to docs: https://junit.org/junit5/docs/5.7.0/user-guide/#writing-tests-parallel-execution (if you want to look at all the various config options).
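
The difference between the two properties can be sketched as a configuration fragment. The property names are the ones from the JUnit 5 user guide linked above; putting them in a `junit-platform.properties` file instead of passing them as `-D` flags is only an assumption for illustration and is not necessarily what the attached patch does.

```properties
# Allows @Execution annotations to take effect. On its own this does not
# make any test run in parallel.
junit.jupiter.execution.parallel.enabled = true

# Mode for tests WITHOUT an explicit @Execution annotation.
# same_thread is the documented default; setting this to "concurrent"
# is what would turn on parallel mode for ALL tests.
junit.jupiter.execution.parallel.mode.default = same_thread
```

With these values, only tests annotated with @Execution(ExecutionMode.CONCURRENT) would be eligible to run concurrently.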

comment:14 Changed 16 months ago by Don-vip

And of course I totally forgot to mention that I committed r17474 which is the main reason for the recent duration increase.

comment:15 Changed 16 months ago by Don-vip

@Taylor ok thank you, I didn't know about the annotation and thought the properties were the only way to go.

comment:16 Changed 16 months ago by Don-vip

In 17478/josm:

see #16567 - see #20433 - restore parallel execution of imagery integration test

comment:17 in reply to:  15 Changed 16 months ago by taylor.smock

Replying to Don-vip:

@Taylor ok thank you, I didn't know about the annotation and thought the properties were the only way to go.

No problem. There are a lot of new things in JUnit 5, and I am pretty certain I know less than half of the new features. In this case, I assumed that JUnit wouldn't force tests to be all parallel or all sequential. And to be fair, a good portion is marked as experimental (specifically, in this case, the concurrent execution).

Let me know how it worked on the server -- it took ~260 minutes on my machine with and without the patch.

comment:18 Changed 15 months ago by Don-vip

Test still takes 7 hours as follows:

  • ImageryPreferenceTestIT.AR-ign-wms => 3h39
  • ImageryPreferenceTestIT.AR-Mapa-Educativo-wms => 1h00
  • 20 tests take more than 1 min
  • 20 tests take between 15s and 1 min

comment:19 Changed 15 months ago by Don-vip

In 17517/josm:

see #20433 - don't loop over all tile sources if we face maximum allowed server time

comment:20 Changed 15 months ago by Don-vip

Zalitoar made a great PR to fix Argentina sources: https://github.com/osmlab/editor-layer-index/pull/1058/files

comment:21 Changed 15 months ago by Don-vip

Milestone: Longterm

comment:22 Changed 15 months ago by Don-vip

Ticket #20546 has been marked as a duplicate of this ticket.

comment:23 Changed 15 months ago by mdk

Several tests are failing because of invalid bounding boxes (see #20354).
Are the BBOX values hard coded or calculated?

comment:24 Changed 15 months ago by GerdP

The job seems to block anything else on Jenkins?

comment:25 Changed 15 months ago by Don-vip

Yes the Argentina entries make everything hang. You can help me by fixing them. See the ELI PR above.

comment:26 Changed 15 months ago by Don-vip

Keywords: imagery jenkins added
Summary: Is Jenkins job "JOSM-Imagery-Integration" OK? → Imagery Integration tests

comment:27 Changed 15 months ago by GerdP

How can I help with this PR?

comment:28 Changed 15 months ago by Don-vip

Review if the changes are OK and update JOSM wiki accordingly.

comment:29 in reply to:  28 ; Changed 15 months ago by GerdP

Replying to Don-vip:

Review if the changes are OK and update JOSM wiki accordingly.

I fear you are ten steps ahead. I have nil knowledge about this stuff, I just use background images in JOSM and I understand that this is about the entries that I see in the corresponding JOSM menu.
So far I've cloned https://github.com/osmlab/editor-layer-index.git and I found https://josm.openstreetmap.de/wiki/Maps#Documentation and started to read, but got lost in all the details.

I failed to download the patch in the "PR" as text, GitHub seems to hide that somehow? I don't know how to use the PR in my local clone cause I don't know how to use git / GitHub etc.
Do I 1st have to learn git to help with this or can I skip this somehow?

comment:30 Changed 15 months ago by stoecker

I failed to download the patch in the "PR" as text, GitHub seems to hide that somehow?

If you have a pull URL like https://github.com/osmlab/editor-layer-index/pull/1058/files add a .patch or .diff behind the number like https://github.com/osmlab/editor-layer-index/pull/1058.patch :-)
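
The URL rewrite described here can be sketched in a few lines of Java. The class and method names are hypothetical, and the pattern only covers the `https://github.com/<owner>/<repo>/pull/<number>` shape shown in this comment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PullRequestPatchUrl {
    // Captures everything up to and including the PR number and ignores
    // an optional trailing path such as "/files".
    private static final Pattern PR_URL =
            Pattern.compile("^(https://github\\.com/[^/]+/[^/]+/pull/\\d+)(?:/.*)?$");

    /** Appends ".patch" or ".diff" directly behind the pull request number. */
    static String toRawUrl(String prUrl, String extension) {
        if (!extension.equals("patch") && !extension.equals("diff")) {
            throw new IllegalArgumentException("extension must be patch or diff");
        }
        Matcher m = PR_URL.matcher(prUrl.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a pull request URL: " + prUrl);
        }
        return m.group(1) + "." + extension;
    }

    public static void main(String[] args) {
        System.out.println(toRawUrl(
                "https://github.com/osmlab/editor-layer-index/pull/1058/files", "patch"));
    }
}
```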

comment:31 in reply to:  30 ; Changed 15 months ago by GerdP

Replying to stoecker:

I failed to download the patch in the "PR" as text, GitHub seems to hide that somehow?

If you have a pull URL like https://github.com/osmlab/editor-layer-index/pull/1058/files add a .patch or .diff behind the number like https://github.com/osmlab/editor-layer-index/pull/1058.patch :-)

Oh, so very obvious ;)

comment:32 in reply to:  31 Changed 15 months ago by stoecker

Replying to GerdP:

Replying to stoecker:

I failed to download the patch in the "PR" as text, GitHub seems to hide that somehow?

If you have a pull URL like https://github.com/osmlab/editor-layer-index/pull/1058/files add a .patch or .diff behind the number like https://github.com/osmlab/editor-layer-index/pull/1058.patch :-)

Oh, so very obvious ;)

There probably is a hidden text or button in the UI somewhere, which links to that as well. I have no idea where...

comment:33 in reply to:  29 Changed 15 months ago by Don-vip

Replying to GerdP:

Replying to Don-vip:

Review if the changes are OK and update JOSM wiki accordingly.

I fear you are ten steps ahead. I have nil knowledge about this stuff
Do I 1st have to learn git to help with this or can I skip this somehow?

This has nothing to do with Git or GitHub. You just have to look at the changes, review them, report them to JOSM wiki when they're OK. I've disabled the job until we fix at least Argentina sources as it makes Jenkins hang every day.

comment:34 Changed 15 months ago by GerdP

OK, I guess I am able to transform the changes in the PR to the syntax used in the JOSM wiki. I still have no clue what actions the review includes.

comment:35 in reply to:  34 ; Changed 15 months ago by stoecker

Replying to GerdP:

OK, I guess I am able to transform the changes in the PR to the syntax used in the JOSM wiki.

Not necessary. Call "ant imageryindexdownload". You'll get the recent ELI file in our XML format.

I still have no clue what actions the review includes.

Essentially check if the changes improve the situation or are incorrect. Sadly we can't rely on any changes from ELI to be correct.

Any changes which improve the situation should be copied. Everything else added to ignore list.

comment:36 in reply to:  35 Changed 15 months ago by GerdP

Replying to stoecker:

Replying to GerdP:

OK, I guess I am able to transform the changes in the PR to the syntax used in the JOSM wiki.

Not necessary. Call "ant imageryindexdownload". You'll get the recent ELI file in our XML format.

My understanding is that the PR was not yet applied. I may be wrong with that.

I still have no clue what actions the review includes.

Essentially check if the changes improve the situation or are incorrect. Sadly we can't rely on any changes from ELI to be correct.

Any changes which improve the situation should be copied. Everything else added to ignore list.

Well, that's my problem. I don't know what changes I should look for and how to decide if new is better than old. E.g. I see that the PR removes a line containing "EPSG:4326",
How do I know if this is a good idea or not? It also adds new entries. What has to be done to verify those?

comment:37 Changed 15 months ago by Don-vip

Put the projection changes aside unless they are absolutely needed (by testing the entry in JOSM). They made a lot of changes in this area and I need to update the integration tests to test the projections better. Focus on URLs. The timeouts we observe on Jenkins likely come from a bad URL.

comment:38 Changed 15 months ago by GerdP

OK, but don't wait for me. This morning I tried a few image layers in AR and got all kinds of messages that I don't yet understand. No progress so far.

comment:39 Changed 15 months ago by mdk

It looks like there are several problems in the test code:

  • no parallel execution
  • more than 10'000 retries on the same server, even when the server returns 403 or, even worse, runs into a timeout

Here is a short analysis of the last run.

10'045 requests for http://mapa.educacion.gob.ar/geoserver/wms always returning 403 (taking one hour):

[junitlauncher] 2021-03-08 00:14:09.839 INFO: GET http://mapa.educacion.gob.ar/geoserver/wms?FORMAT=image/png&TRANSPARENT=TRUE&VERSION=1.1.1&SERVICE=WMS&REQUEST=GetMap&LAYERS=publico:analfabetismo_depto_2010&STYLES=&SRS=EPSG:2393&WIDTH=512&HEIGHT=512&BBOX=3486281.9505065,-30183771.1349102,23523790.2932958,-10146262.7921210 -> HTTP/1.1 403 (531 ms; 1018 B)
...
[junitlauncher] 2021-03-08 01:15:32.266 INFO: GET http://mapa.educacion.gob.ar/geoserver/wms?FORMAT=image/png&TRANSPARENT=TRUE&VERSION=1.1.1&SERVICE=WMS&REQUEST=GetMap&LAYERS=publico:analfabetismo_depto_2010&STYLES=&SRS=EPSG:30791&WIDTH=512&HEIGHT=512&BBOX=-12080989.0463769,-6480670.3377719,-12061421.1671358,-6461102.4585309 -> HTTP/1.1 403 (253 ms; 1018 B)

203 requests for http://geoadmin.agroindustria.gob.ar, ending with a "Read timed out" after 5 minutes - for each request(!)

[junitlauncher] 2021-03-08 01:17:00.271 INFO: Skipping unsupported image format utfgrid
[junitlauncher] 2021-03-08 01:22:05.026 WARNING: java.net.SocketTimeoutException: Read timed out. Cause: java.net.SocketTimeoutException: Read timed out
[junitlauncher] 2021-03-08 01:22:05.024 INFO: GET http://geoadmin.agroindustria.gob.ar:443/geoserver/wms?FORMAT=image/png&TRANSPARENT=TRUE&VERSION=1.1.1&SERVICE=WMS&REQUEST=GetMap&LAYERS=spearfish&STYLES=&SRS=EPSG:4131&WIDTH=512&HEIGHT=512&BBOX=-102.6664550,-270.0080852,257.3335450,89.9919148 -> !!! (5 min 0 s)
[junitlauncher] java.net.SocketTimeoutException: Read timed out
removed call stack
[junitlauncher] 2021-03-08 01:27:05.374 INFO: GET http://geoadmin.agroindustria.gob.ar:443/geoserver/wms?FORMAT=image/png&TRANSPARENT=TRUE&VERSION=1.1.1&SERVICE=WMS&REQUEST=GetMap&LAYERS=spearfish&STYLES=&SRS=EPSG:5463&WIDTH=512&HEIGHT=512&BBOX=486459.2657183,-30187847.7236765,40561475.9512968,9887168.9619020 -> !!! (5 min 0 s)
[junitlauncher] 2021-03-08 01:27:05.375 WARNING: java.net.SocketTimeoutException: Read timed out. Cause: java.net.SocketTimeoutException: Read timed out
[junitlauncher] java.net.SocketTimeoutException: Read timed out
removed call stack
[junitlauncher] 2021-03-08 01:32:05.727 INFO: GET http://geoadmin.agroindustria.gob.ar:443/geoserver/wms?FORMAT=image/png&TRANSPARENT=TRUE&VERSION=1.1.1&SERVICE=WMS&REQUEST=GetMap&LAYERS=spearfish&STYLES=&SRS=EPSG:6794&WIDTH=512&HEIGHT=512&BBOX=-9412103.8675231,-33695717.9308253,30662912.8180554,6379298.7547532 -> !!! (5 min 0 s)
[junitlauncher] 2021-03-08 01:32:05.727 WARNING: java.net.SocketTimeoutException: Read timed out. Cause: java.net.SocketTimeoutException: Read timed out
[junitlauncher] java.net.SocketTimeoutException: Read timed out
removed call stack
[junitlauncher] 2021-03-08 01:37:06.087 WARNING: java.net.SocketTimeoutException: Read timed out. Cause: java.net.SocketTimeoutException: Read timed out
[junitlauncher] java.net.SocketTimeoutException: Read timed out
removed call stack
...
[junitlauncher] 2021-03-08 18:13:16.875 WARNING: java.net.SocketTimeoutException: Read timed out. Cause: java.net.SocketTimeoutException: Read timed out
[junitlauncher] java.net.SocketTimeoutException: Read timed out
removed call stack
[junitlauncher] 2021-03-08 18:13:16.874 INFO: GET http://geoadmin.agroindustria.gob.ar:443/geoserver/wms?FORMAT=image/png&TRANSPARENT=TRUE&VERSION=1.1.1&SERVICE=WMS&REQUEST=GetMap&LAYERS=spearfish&STYLES=&SRS=EPSG:3673&WIDTH=512&HEIGHT=512&BBOX=-11782348.4487659,-25634747.7019831,28292668.2368126,14440268.9835954 -> !!! (5 min 0 s)

If there are also 10'000 calls to be expected, the test will run another 34 days!
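
The estimate above follows from simple arithmetic: 10'000 requests at 5 minutes each is 50'000 minutes, or roughly 34.7 days. A minimal sketch of the same calculation, with a hypothetical helper name and assuming every request runs into the full 5-minute read timeout seen in the log:

```java
import java.time.Duration;

final class TimeoutEstimate {
    /** Worst-case sequential runtime if every request hits the read timeout. */
    static Duration worstCase(long requests, Duration perRequestTimeout) {
        return perRequestTimeout.multipliedBy(requests);
    }

    public static void main(String[] args) {
        // 10'000 expected requests, 5 minutes each, as in the log above.
        Duration total = worstCase(10_000, Duration.ofMinutes(5));
        System.out.println(total.toMinutes() + " minutes, "
                + total.toDays() + " full days");
    }
}
```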

comment:40 Changed 15 months ago by GerdP

@mdk: What input did you use for your analyses?
I tried to find the wiki entries which might produce the 403 messages. I updated the "default entries" in JOSM to refresh the cached file mirror_https___josm.openstreetmap.de_maps. It contains only one url that starts with mapa.educacion.gob.ar/geoserver but neither the bounds nor the projections from the log in comment:39 appear in the file, so I wonder what data the unit test is testing?

<url><![CDATA[http://mapa.educacion.gob.ar/geoserver/ows?service=wms&version=1.1.1&request=GetCapabilities]]></url>
<entry>
<name>Educational map (WMS)</name>
<name lang="es">Mapa Educativo (WMS)</name>
<id>Mapa-Educativo-wms</id>
<category>map</category>
<type>wms_endpoint</type>
<url><![CDATA[http://mapa.educacion.gob.ar/geoserver/ows?service=wms&version=1.1.1&request=GetCapabilities]]></url>
<permission-ref>https://datos.gob.ar/acerca/seccion/Marco%20legal</permission-ref>
<projections>
<code>CRS:84</code>
<code>EPSG:4326</code>
<code>EPSG:3857</code>
</projections>

The files in editor-layer-index don't contain the url. I really get frustrated here because I have no clue where to start.

What I am missing is something like the "steps to reproduce" in the TRAC tickets. I don't know what the problem is, I don't know how to reproduce it.

comment:41 Changed 15 months ago by mdk

I just took a look at the console output of the failed (aborted) JOSM-Imagery-integration Jenkins job: https://josm.openstreetmap.de/jenkins/job/JOSM-Imagery-Integration/2642/jdk=JDK8/consoleFull
I don't know where the test gets the URLs from.

comment:42 Changed 15 months ago by mdk

The URL in https://josm.openstreetmap.de/wiki/Maps/Argentina is http://mapa.educacion.gob.ar/geoserver/ows?service=wms&version=1.1.1&request=GetCapabilities

As far as I understood request=GetCapabilities returns an XML with all possible layers and projections. In this case there are about 6673 <SRS> elements with different projections. Maybe the test code has a generic method to generate all possible URLs from this XML.
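
To illustrate the idea, here is a minimal sketch that lists the <SRS> elements of a WMS 1.1.1 capabilities document. It is not the JOSM test code: the class name and the tiny inline sample document are made up for the example, and a real capabilities response is far larger and usually carries a DOCTYPE that this sketch does not handle.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class CapabilitiesSrs {
    // Hypothetical, heavily shortened stand-in for a GetCapabilities response.
    static final String SAMPLE =
            "<WMT_MS_Capabilities version=\"1.1.1\"><Capability><Layer>"
            + "<SRS>EPSG:4326</SRS><SRS>EPSG:3857</SRS>"
            + "<Layer><Name>demo</Name></Layer>"
            + "</Layer></Capability></WMT_MS_Capabilities>";

    /** Returns the text of every SRS element in document order. */
    static List<String> listSrs(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("SRS");
            List<String> codes = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                codes.add(nodes.item(i).getTextContent().trim());
            }
            return codes;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse capabilities XML", e);
        }
    }

    public static void main(String[] args) {
        List<String> codes = listSrs(SAMPLE);
        // One GetMap request per layer and projection would multiply quickly.
        System.out.println(codes.size() + " projections: " + codes);
    }
}
```

If a test issues one GetMap request per advertised projection, a server advertising thousands of <SRS> entries would explain request counts in the range seen in the log.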

comment:43 Changed 15 months ago by GerdP

Ah, thanks, that helps to understand a bit more. Now, what should happen after the first "server return 403"?

comment:44 Changed 15 months ago by GerdP

My simple approach would be to remove the wiki entry "Educational map (WMS)" from https://josm.openstreetmap.de/wiki/Maps/Argentina#EducationalmapWMS

comment:45 Changed 15 months ago by mdk

Here some information about the ows in the URL:

OWS is not a protocol. It's a stand-in term for (I believe) OGC Web Service - basically it means it's an endpoint that could be hosting any of the OGC services. It's commonly seen on GeoServer endpoints.
So for example:
http://www.example.com/geoserver/wms - would in theory be an endpoint for just WMS.
Whereas
http://www.example.com/geoserver/ows - would be an endpoint that could serve any of WMS, WFS, WCS, WMTS.

The problem is that any other server could fail at any time. In case of a timeout the test could easily run for a month this way. So I think the assumed generic method should be changed. If a server fails with 403 or a timeout, the code should stop generating further tests.
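
The proposed behaviour can be sketched as a loop that stops at the first fatal answer. This is only an illustration under assumptions, not the JOSM test code: the class, the `Outcome` enum and the `probe` callback are hypothetical stand-ins for whatever actually performs the HTTP request.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class FailFastProbe {
    enum Outcome { OK, FORBIDDEN, TIMEOUT }

    /**
     * Probes the URLs in order and stops after the first 403 or timeout.
     * Returns the number of requests actually sent.
     */
    static int probeUntilFatal(List<String> urls, Function<String, Outcome> probe) {
        int sent = 0;
        for (String url : urls) {
            sent++;
            Outcome outcome = probe.apply(url);
            if (outcome == Outcome.FORBIDDEN || outcome == Outcome.TIMEOUT) {
                break; // the server is unusable, further variants would fail too
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("variant-1", "variant-2", "variant-3");
        // Simulated server that always answers 403.
        System.out.println(probeUntilFatal(urls, u -> Outcome.FORBIDDEN));
    }
}
```

With a 5-minute read timeout, stopping after the first failure would cap the cost of a dead server at one timeout instead of one per generated URL.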

As a workaround these two servers should be commented out:

But maybe there are more tests which may fail this way afterwards.

Perhaps we should think about using a "normal" URL instead of the generic approach. Also we should check whether EducationalmapWMS returns 403 "forbidden" because we hit the server with 10k+ requests each time the test runs. An admin of this site could block our test server because of abuse.

comment:46 Changed 14 months ago by simon04

JOSM-Integration tests are hanging on Jenkins (again)...
