*** dviroel|out is now known as dviroel | 00:33 | |
*** dviroel is now known as dviroel|out | 01:00 | |
*** rlandy|ruck is now known as rlandy|out | 01:00 | |
*** ysandeep|out is now known as ysandeep | 01:52 | |
*** ysandeep is now known as ysandeep|afk | 03:23 | |
*** ysandeep|afk is now known as ysandeep | 10:14 | |
*** rlandy|out is now known as rlandy | 10:33 | |
*** anbanerj is now known as frenzy_friday | 11:33 | |
*** dviroel|out is now known as dviroel | 11:34 | |
lajoskatona | Hi, on Ussuri and Victoria we have failing py3x jobs, see: https://zuul.openstack.org/builds?job_name=openstack-tox-py38&job_name=openstack-tox-py36&project=openstack%2Fneutron&branch=stable%2Fvictoria&branch=stable%2Fussuri&pipeline=periodic-stable&skip=0 | 12:03 |
lajoskatona | Locally I can't reproduce the timeout, and at a first check nothing special was merged to these branches of Neutron since 11 July, last week | 12:04 |
lajoskatona | so if you know whether something has changed in the image or in some mirror or things like that, that would be helpful | 12:04 |
fungi | mmm, lajoskatona seems to have disappeared while i was trying to search through the logs of one of those builds, but it looks like there are eventlet timeout tracebacks, is that what can't be reproduced? | 12:30 |
fungi | elodilles: have you seen similar issues in u/v branches of other projects in the past 1.5 weeks? | 12:30 |
lajoskatona | fungi: Hi, seems I have some net issues, sorry | 12:45 |
fungi | no worries, did you catch my earlier comments from the channel log? | 12:47 |
fungi | specifically, it's the 40-second eventlet timeouts you're unable to reproduce? | 12:48 |
fungi | i thought i saw a thread on openstack-discuss about that, i wonder if there were fixes for whatever it was which only got backported as far as wallaby | 12:49 |
lajoskatona | fungi: yes, I have the same eventlet locally and no timeout locally, so that's why I thought there's something which I don't have locally | 12:49 |
lajoskatona | fungi: I'll check, perhaps I missed that thread | 12:49 |
fungi | i'll see if i can find the ml thread i'm thinking of, or whether i dreamed it | 12:49 |
lajoskatona | fungi: thanks | 12:50 |
fungi | i'm not immediately able to spot it in the archive | 12:51 |
fungi | i may be remembering the os-vif/linuxbridge problem, though that doesn't seem at all similar | 12:52 |
fungi | nothing jumps out at me from june either | 12:54 |
lajoskatona | fungi: I found an old thread which pointed to this req bump: https://review.opendev.org/c/openstack/requirements/+/811555 | 12:55 |
lajoskatona | fungi: but not sure if we see the same here, this is the mail: https://lists.openstack.org/pipermail/openstack-discuss/2021-October/025179.html | 12:56 |
elodilles | fungi: so far i only found this at neutron's unit test jobs | 12:57 |
fungi | odd that it would have just started ~1.5 weeks ago. i don't see any recent constraints changes on those branches at all | 12:58 |
elodilles | i've checked the pip freeze outputs of the failing job vs the previous passing job (from July 11th) and there is no difference at all | 12:59 |
fungi | do we capture dpkg -l output? | 12:59 |
elodilles | requirements' stable/victoria hasn't been touched since april either | 12:59 |
elodilles | fungi: i don't think i saw 'dpkg -l' in the logs | 13:00 |
fungi | yeah, we capture it for devstack jobs but not unit tests | 13:00 |
elodilles | also interesting that victoria is focal-based but ussuri is bionic | 13:01 |
fungi | maybe we could infer it by grabbing the dpkg -l from devstack jobs on the 10th and 12th or something | 13:01 |
elodilles | fungi: ok, i'll try to do that | 13:02 |
fungi | to see what might have updated in focal and in bionic around those dates | 13:02 |
fungi | could be there was a security fix ubuntu rolled out on the 11th | 13:02 |
lajoskatona | fungi, elodilles: yeah pip seems to be the same in the green and red runs | 13:21 |
dpawlik | clarkb: hey, wanna check https://review.opendev.org/c/openstack/ci-log-processing/+/848218 please? | 13:25 |
elodilles | i've taken a sample (from bionic, stable/ussuri) dpkg-l.txt diff between Jul 05 and Jul 19: https://paste.opendev.org/show/b2EuM9b16RC6il4G6kHx/ | 13:26 |
elodilles | (these were the closest runs i've found in neutron repo) | 13:27 |
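A minimal sketch of the comparison being done here, assuming the dpkg-l.txt files from the passing (Jul 05) and failing (Jul 19) runs have already been downloaded; the local file names are placeholders:

```python
# Diff the package lists captured by a passing and a failing CI run.
# The file names are placeholders for the downloaded dpkg-l.txt files.
def parse_dpkg_list(path):
    packages = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            # installed packages are listed with status "ii": status, name, version, ...
            if len(parts) >= 3 and parts[0] == "ii":
                packages[parts[1]] = parts[2]
    return packages

old = parse_dpkg_list("dpkg-l-passing.txt")
new = parse_dpkg_list("dpkg-l-failing.txt")

for name in sorted(set(old) | set(new)):
    if old.get(name) != new.get(name):
        print(f"{name}: {old.get(name, 'absent')} -> {new.get(name, 'absent')}")
```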
fungi | we don't do periodic stable devstack jobs for neutron daily any more? | 13:34 |
elodilles | as far as i know stable-periodics are all 'lightweight' unit test jobs | 13:36 |
*** ysandeep is now known as ysandeep|afk | 13:37 | |
elodilles | hmmm, but neutron has extra 'periodic' jobs | 13:41 |
fungi | even generic periodic stable jobs for devstack might be sufficient to spot what's changed in distro packages, if most of the same packages are getting installed in those jobs | 13:44 |
lajoskatona | elodilles: we have, like here: https://zuul.openstack.org/buildsets?project=openstack%2Fneutron&branch=stable%2Fvictoria&pipeline=periodic&skip=0 | 13:45 |
lajoskatona | though it's new for me that there are separate periodic and periodic-stable pipelines..... | 13:45 |
lajoskatona | in my mind it was the same | 13:46 |
fungi | periodic is usually for master branch testing, and periodic-stable is for stable branch testing. we trigger them at slightly different times to offset the load | 13:46 |
*** dasm|off is now known as dasm|ruck | 13:48 | |
fungi | oh, i guess not really that far apart. https://opendev.org/openstack/project-config/src/branch/master/zuul.d/pipelines.yaml indicates periodic should trigger at 02:00 and periodic-stable at 02:01, just far enough apart to make sure the periodic jobs get some priority for their node requests in case we can't run them all before load on the system picks back up | 13:49 |
fungi | i was thinking of the periodic-weekly pipeline, which starts at 08:00 on saturdays, hopefully after the daily periodics have wrapped up | 13:50 |
elodilles | so the difference seems to be: https://paste.opendev.org/show/bwXLDiDuz9mCUEk3OnZn/ | 13:57 |
elodilles | ignore me, i've diffed stable/wallaby :/ | 13:58 |
fungi | though it may be the same | 14:02 |
fungi | at least the same as for victoria, since they run on the same platform | 14:02 |
elodilles | yes, they are the same (both are focal) | 14:03 |
elodilles | so at least the result is the same | 14:03 |
fungi | so that suggests this situation could be brought on by a kernel or libc update | 14:05 |
fungi | though i wonder why it doesn't affect wallaby jobs | 14:05 |
elodilles | yes. (for ussuri / bionic: https://paste.opendev.org/show/b8JwAKL4wMqdxCGOZpp8/ ) | 14:09 |
*** ysandeep|afk is now known as ysandeep | 14:13 | |
lajoskatona | elodilles: the upper lines of packages are from a passed run? | 14:15 |
elodilles | lajoskatona: yes. in both cases the versions were bumped by one between July 11th and July 12th | 14:23 |
elodilles | from 5.4.0-121 to 5.4.0-122 ; from 4.15.0-188 to 4.15.0-189 | 14:24 |
lajoskatona | elodilles: thanks | 14:25 |
lajoskatona | elodilles: I've added this to the bug (https://bugs.launchpad.net/neutron/+bug/1982206 ) | 14:26 |
elodilles | lajoskatona: ++ | 14:45 |
fungi | you should be able to look at the ubuntu package changelogs to find out what "fixes" were included in -122 and -189 and if there's overlap that could be a clue, or maybe this was related to the libc update (the distro package updates could also just be a red herring) | 14:52 |
fungi | oh, or maybe the kernel package changelogs are effectively useless :/ | 14:54 |
fungi | https://changelogs.ubuntu.com/changelogs/pool/main/l/linux-signed/linux-signed_4.15.0-189.200/changelog | 14:54 |
fungi | i guess we'd need to figure out what patches were imported into the linux-signed source package between 4.15.0-188.199 and 4.15.0-189.200 | 14:58 |
fungi | there's probably a git repo on lp for that | 14:58 |
fungi | https://code.launchpad.net/ubuntu/+source/linux-signed seems to be the place | 15:00 |
clarkb | dpawlik: I've mentioned it before but I wonder why you don't just send the json as is? I don't understand why you have to read the json then reformat it and send it out again | 15:01 |
fungi | https://git.launchpad.net/ubuntu/+source/linux-signed?h=ubuntu%2Fbionic-security | 15:02 |
clarkb | dpawlik: but I also don't really have the bandwidth to review that stuff. This is why the opendev team stopped running those services | 15:03 |
dpawlik | clarkb: there is also a json sent to a separate index | 15:05 |
dpawlik | clarkb: and for me it was obvious that if someone continues working on making some graphs based on the values that I have prepared in logsender, it would be good to review | 15:05 |
dpawlik | but if not, ok, we can go as it is. | 15:05 |
fungi | i can't seem to find the kernel patches, even on the applied version of that branch | 15:06 |
clarkb | dpawlik: right sending to a separate index is good (I think that allows you to manage data rotations independently for the different types of information and have longer/shorter retention as necessary). But what confuses me is why you need to deserialize and reserialize the document in a different format. Can you just take what the job is emitting and send it to opensearch? I | 15:08 |
clarkb | also agree it is good to have reviews. The problem is if I was someone who was able to do those reviews we wouldn't have needed to evict these services from opendev. I think you should be looking for help from the openstack project which aimed to preserve this functionality | 15:08 |
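A minimal sketch of the approach clarkb is suggesting, i.e. indexing the job-emitted JSON document as-is into its own OpenSearch index; the host, credentials, index name and file name are assumptions, not the logsender's actual configuration:

```python
import json

from opensearchpy import OpenSearch  # opensearch-py client

# Placeholder connection details.
client = OpenSearch(
    hosts=[{"host": "opensearch.example.org", "port": 9200}],
    http_auth=("logsender", "secret"),
    use_ssl=True,
)

# Read the document exactly as the job emitted it and send it unchanged.
# Keeping it in a separate index lets retention be managed independently.
with open("performance.json") as f:
    document = json.load(f)

client.index(index="ci-performance", body=document)
```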
clarkb | fungi: iirc ubuntu does log them somewhere but it is somewhere weird | 15:08 |
clarkb | or weird to me because I don't understand all the different branches and packages for the ubuntu kernels | 15:08 |
clarkb | fungi: http://changelogs.ubuntu.com/changelogs/pool/main/l/linux/linux_5.4.0-122.138/changelog | 15:10 |
fungi | https://ubuntu.com/security/notices/USN-5515-1 | 15:11 |
fungi | i came at it from another angle | 15:11 |
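A small sketch of pulling the newest changelog entry for one of the suspect kernel versions, using only the standard library and the changelogs.ubuntu.com URL pattern linked above:

```python
import urllib.request

# Changelog for the focal kernel that showed up between the passing and
# failing runs (URL pattern as linked above).
url = ("http://changelogs.ubuntu.com/changelogs/pool/main/l/"
       "linux/linux_5.4.0-122.138/changelog")

with urllib.request.urlopen(url) as resp:
    changelog = resp.read().decode("utf-8", errors="replace")

# Keep only the most recent entry: everything up to the first maintainer
# signature line (" -- ").
print(changelog.split("\n -- ", 1)[0])
```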
clarkb | fungi: lajoskatona also I'm on a bit of a campaign to remind everyone that asks about failures without linking to a specific failure log to please do so :) | 15:22 |
lajoskatona | clarkb: ack, I'll keep it in mind | 15:23 |
clarkb | https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_bb1/periodic-stable/opendev.org/openstack/neutron/stable/victoria/openstack-tox-py38/bb13920/tmpkify9opp the truncated subunit log might be helpful too; it will show you which tests ran and in what order | 15:23 |
clarkb | that might single out a specific test that is problematic or class of tests | 15:24 |
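A rough sketch of pulling the test order out of that truncated subunit attachment with python-subunit and testtools; the local file name is a placeholder for the downloaded tmpkify9opp file:

```python
import subunit
import testtools

class TestOrder(testtools.StreamResult):
    """Record test ids in the order their first status event appears."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def status(self, test_id=None, test_status=None, **kwargs):
        if test_id and test_id not in self.seen:
            self.seen.append(test_id)

# Placeholder path for the downloaded (truncated) subunit v2 stream.
with open("tmpkify9opp", "rb") as stream:
    case = subunit.ByteStreamToStreamResult(stream, non_subunit_name="stdout")
    result = TestOrder()
    result.startTestRun()
    case.run(result)
    result.stopTestRun()

print("\n".join(result.seen))
```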
clarkb | looking at the console log the last thing logged to the console was about 16 minutes prior to the timeout. Are your unittests still running with internal test timeouts? If so this implies whatever it is breaks that | 15:26 |
clarkb | if not, then maybe you should reenable those timeouts to see if they can help catch what is breaking | 15:26 |
clarkb | etc | 15:26 |
clarkb | looks like neutron may have actually removed the test timeout by default ... | 15:30 |
clarkb | on master it is only applied to the db migration tests? | 15:30 |
clarkb | this is why those timeouts exist. So that the code that creates the problem can be more readily identified | 15:30 |
lajoskatona | clarkb: you mean OS_TEST_TIMEOUT ? | 15:34 |
clarkb | lajoskatona: yes | 15:35 |
clarkb | but it seems like that is only applied to the db migration tests? | 15:35 |
clarkb | the original intent way back when was that it be applied globally to catch test cases that locked up and hopefully provide some sort of feedback into where the lock up was | 15:36 |
clarkb | I don't know that it would help here, but the idea behind the global test timeouts is that it would | 15:37 |
lajoskatona | clarkb: I see it in tox.ini on master, and for functional/test_migrations, so yes | 15:38 |
clarkb | lajoskatona: one option to try may be setting a global test timeout to like 5 minutes (~1/3 the delta between logging and job timeout) and see if that produces any errors that are debuggable | 15:46 |
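For reference, a global per-test timeout of the kind being suggested is typically wired into the test base class roughly like this (a sketch using the fixtures library, not neutron's actual code):

```python
import os

import fixtures
import testtools

class BaseTestCase(testtools.TestCase):
    def setUp(self):
        super().setUp()
        # OS_TEST_TIMEOUT comes from the environment (e.g. set in tox.ini);
        # 300 seconds would match the ~5 minute value suggested above.
        try:
            timeout = int(os.environ.get("OS_TEST_TIMEOUT", 0))
        except ValueError:
            timeout = 0
        if timeout > 0:
            # gentle=True raises a TimeoutException inside the test, so the
            # traceback shows where the test was stuck when it hit the limit.
            self.useFixture(fixtures.Timeout(timeout, gentle=True))
```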
*** dviroel is now known as dviroel|lunch | 16:06 | |
*** ysandeep is now known as ysandeep|out | 16:11 | |
*** akekane_ is now known as abhishekk | 17:09 | |
*** dviroel|lunch is now known as dviroel | 17:23 | |
*** dviroel is now known as dviroel|afk | 19:34 | |
*** rlandy is now known as rlandy|biab | 21:10 | |
*** rlandy|biab is now known as rlandy | 21:33 | |
*** rlandy is now known as rlandy|bbl | 22:15 | |
*** rlandy|bbl is now known as rlandy | 23:30 | |
*** dasm|ruck is now known as dasm|off | 23:41 |