| *** haleyb is now known as haleyb\|out | 00:31 | |
|---|---|---|
| seongsoocho[m] | Hi Infra Team,... (full message at <https://matrix.org/oftc/media/v1/media/download/AVMHsz-QNRZyYlms6tOmkUCI8lGxiVn99cdec6h431_DbXWNM3cCPSXgnwrIcqDCnsHWApiA9cpZ3mZ51VuuhB1CeZpKULsgAG1hdHJpeC5vcmcva0tWTHh3aVBzalNRTGR0UFd3WkZUZ2pX>) | 11:18 |
| frickler | seongsoocho[m]: it would help if you send messages only one line at a time, that makes them readable for people outside the matrix, too. (no need to repeat this one, just a note for next time) | 11:25 |
| frickler | we can review and hopefully merge https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/921878 without impact, then likely you can amend the project-config change to create a new job first, which could then be tested against specific projects | 11:26 |
| seongsoocho[m] | frickler: got it. I’ll send messages in one line from next time. | 11:27 |
| *** ykarel_ is now known as ykarel | 11:43 | |
| opendevreview | Seongsoo Cho proposed openstack/openstack-zuul-jobs master: Add ansible play for weblate client configuration https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/921878 | 12:57 |
| *** croeland1 is now known as croelandt | 13:21 | |
| opendevreview | Chaemin Lim proposed openstack/openstack-zuul-jobs master: Add ansible play for weblate client configuration https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/921878 | 15:00 |
| sfernand | hey guys! I'm working on some zuul jobs for Cinder and noticed something weird when some tempest tests fail to execute. The Tempest run completes as expected even if some tests fail to execute, but the job outputs POST_FAILURE instead of FAILURE. I see no controller logs so I suspect it might be some timeout pushing log information or something related | 18:26 |
| sfernand | https://zuul.opendev.org/t/openstack/build/702ef66f6d8e4c80b9a68b4dccb08046/log/zuul-info/zuul-info.controller.txt | 18:26 |
| clarkb | let me see | 18:27 |
| sfernand | oops sorry, wrong link | 18:27 |
| sfernand | https://zuul.opendev.org/t/openstack/build/702ef66f6d8e4c80b9a68b4dccb08046/log/job-output.txt | 18:27 |
| clarkb | sfernand: https://zuul.opendev.org/t/openstack/build/702ef66f6d8e4c80b9a68b4dccb08046/log/job-output.txt#34961 this shows at least one test does fail | 18:28 |
| clarkb | oh I see you expect a FAILURE result not POST_FAILURE | 18:29 |
| sfernand | yep! I expect it to output FAILURE so I could check the logs | 18:29 |
| clarkb | POST_FAILURE occurs when at least one post-run playbook fails | 18:29 |
| clarkb | and this value overrides the SUCCESS/FAILURE state of the run playbook | 18:29 |
| clarkb | I don't see any obvious failures in post-run at https://zuul.opendev.org/t/openstack/build/702ef66f6d8e4c80b9a68b4dccb08046/console so maybe the failure occurs after we upload logs. Let me see if I can find this in the executor logs | 18:30 |
| sfernand | oh I see | 18:32 |
| clarkb | sfernand: https://zuul.opendev.org/t/openstack/build/702ef66f6d8e4c80b9a68b4dccb08046/log/job-output.txt#54711-54798 this is the problem. The post run playbook to capture system logs timed out | 18:33 |
| clarkb | something about `TASK [capture-system-logs : Stage various logs and reports]` is taking about half an hour | 18:34 |
| clarkb | a naive guess is that significant amounts of logs have been written so they need more processing than can be performed in that period of time and the task times out | 18:34 |
| sfernand | wow | 18:34 |
| sfernand | yeah all tests are failing due to volume or server creation timeouts so it writes lots of log messages like "waiting for resource" | 18:36 |
| clarkb | unfortunately it looks like the df and du tasks didn't produce useable output | 18:37 |
| clarkb | that may have given us some insight into the scope of the problem | 18:37 |
| clarkb | because that playbook timed out and wasn't ended properly we don't see it in the console page | 18:38 |
| clarkb | ya I think it is scrubbed out of the json entirely too :/ | 18:38 |
| clarkb | so the best clue we have is the job-output.txt file indicating which task was started when the playbook timed out | 18:39 |
| clarkb | from that you might be able to infer where the specific issues are or potentially add more debugging | 18:39 |
| clarkb | sfernand: https://opendev.org/openstack/devstack/src/branch/master/roles/capture-system-logs/tasks/main.yaml this is what that task is doing | 18:41 |
| clarkb | I suspect either https://opendev.org/openstack/devstack/src/branch/master/roles/capture-system-logs/tasks/main.yaml#L36-L40 or https://opendev.org/openstack/devstack/src/branch/master/roles/capture-system-logs/tasks/main.yaml#L44-L53 | 18:42 |
| sfernand | sorry for the dumb question, is there a way to change capture-system-logs for testing just for a specific job? | 18:42 |
| clarkb | sfernand: the same script will be used in every job that calls it. So you either need to modify devstack to switch behavior based on parameters or change the script to work for everything. I would probably start by pushing updates to that script to try and identify where the specific problem is first before settling on any solution | 18:43 |
| clarkb | however if your job is creating many core dumps or so many deprecation warnings that you cannot process them in half an hour I would consider each of those to be bugs that should be fixed in your job and not in the script | 18:44 |
| sfernand | yeah for sure | 18:44 |
| clarkb | and if pushing debug updates to that script (use Depends-On from your change to pull in those updates and see what happens) doesn't identify a source of the problem we can probably also hold a node and inspect it directly | 18:45 |
| clarkb | I would probably modify it to do something like `if [ -d /var/core ] ; then ls -lh /var/core && du -hs /var/core ; fi` | 18:48 |
| clarkb | then similar with the deprecation warnings drop all the seds and just do something like `\| wc -l` to see how many are in there | 18:48 |
| clarkb | maybe also `du -hs {{ stage_dir }}/logs/* {{ stage_dir }}/apache/*` | 18:49 |
| clarkb | something along those lines to try and narrow down where the slowdown might be occurring | 18:49 |
| sfernand | yeah that is helpful thanks a lot clarkb! | 18:51 |
| sfernand | I will talk to the devstack folks and propose a change to the script with the debugs | 18:53 |
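
The debugging steps clarkb suggests at 18:48 and 18:49 could be sketched as a small shell script along these lines. This is a rough sketch under assumptions, not the devstack role itself: `STAGE_DIR` stands in for the role's `{{ stage_dir }}` Ansible variable, and `report_dir_size` and `count_deprecations` are hypothetical helper names. The log does not show how the role collects deprecation warnings, so `count_deprecations` only counts matching lines on whatever input it is given.

```shell
#!/bin/sh
# Sketch only. STAGE_DIR is an assumed stand-in for the role's
# {{ stage_dir }} variable; both helpers below are hypothetical.
STAGE_DIR="${STAGE_DIR:-/tmp/stage}"

# List a directory and print its total size, but only if it exists.
report_dir_size() {
    dir="$1"
    if [ -d "$dir" ]; then
        ls -lh "$dir" && du -hs "$dir"
    else
        echo "skip: $dir does not exist"
    fi
}

# Count lines on stdin that mention a deprecation, instead of
# post-processing them. grep -c exits non-zero when the count is 0,
# so "|| true" keeps the helper from reporting a failure.
count_deprecations() {
    grep -ci 'deprecat' || true
}

report_dir_size /var/core
report_dir_size "$STAGE_DIR/logs"
report_dir_size "$STAGE_DIR/apache"
```

Comparing these sizes and counts between a passing run and a timed-out run may help show whether core dumps, staged logs, or deprecation warnings account for the half hour spent in the task.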