Wednesday, 2022-07-20

00:33 *** dviroel|out is now known as dviroel
01:00 *** dviroel is now known as dviroel|out
01:00 *** rlandy|ruck is now known as rlandy|out
01:52 *** ysandeep|out is now known as ysandeep
03:23 *** ysandeep is now known as ysandeep|afk
10:14 *** ysandeep|afk is now known as ysandeep
10:33 *** rlandy|out is now known as rlandy
11:33 *** anbanerj is now known as frenzy_friday
11:34 *** dviroel|out is now known as dviroel
11:58 *** anbanerj is now known as frenzy_friday
12:03 <lajoskatona> Hi, on Ussuri and Victoria we have failing py3x jobs, see: https://zuul.openstack.org/builds?job_name=openstack-tox-py38&job_name=openstack-tox-py36&project=openstack%2Fneutron&branch=stable%2Fvictoria&branch=stable%2Fussuri&pipeline=periodic-stable&skip=0
12:04 <lajoskatona> Locally I can't reproduce the timeout, and at first check nothing special has been merged to these branches of Neutron since 11 July, last week
12:04 <lajoskatona> so if you know whether something has changed in the image or in some mirror or anything like that, that would be helpful
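[editor's note: a minimal sketch, not part of the discussion, of pulling the same build list through Zuul's REST API instead of the web UI, assuming the /api/builds endpoint accepts the same query parameters as the /builds page linked above.]

```python
# Hypothetical sketch: list non-successful builds matching the filters above.
import requests

ZUUL_API = "https://zuul.openstack.org/api/builds"

params = [
    ("job_name", "openstack-tox-py38"),
    ("job_name", "openstack-tox-py36"),
    ("project", "openstack/neutron"),
    ("branch", "stable/victoria"),
    ("branch", "stable/ussuri"),
    ("pipeline", "periodic-stable"),
    ("limit", "50"),
]

builds = requests.get(ZUUL_API, params=params, timeout=30).json()
for build in builds:
    # Filter client-side so we only rely on fields every build record carries.
    if build.get("result") != "SUCCESS":
        print(build["end_time"], build["job_name"], build["result"], build["log_url"])
```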
12:30 <fungi> mmm, lajoskatona seems to have disappeared while i was trying to search through the logs of one of those builds, but it looks like there are eventlet timeout tracebacks. is that what can't be reproduced?
12:30 <fungi> elodilles: have you seen similar issues in u/v branches of other projects in the past 1.5 weeks?
12:45 <lajoskatona> fungi: Hi, it seems I have some net issues, sorry
12:47 <fungi> no worries, did you catch my earlier comments from the channel log?
12:48 <fungi> specifically, it's the 40-second eventlet timeouts you're unable to reproduce?
12:49 <fungi> i thought i saw a thread on openstack-discuss about that. i wonder if there were fixes for whatever it was which only got backported as far as wallaby
12:49 <lajoskatona> fungi: yes, I have the same eventlet locally and no timeout, so that's why I thought there's something which I don't have locally
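[editor's note: an illustrative sketch of the failure mode under discussion — eventlet's Timeout firing after a fixed deadline while a green thread is blocked. The 40-second value mirrors the tracebacks mentioned above; this is not neutron's actual test code.]

```python
# Illustrative only: how an eventlet timeout like the ones in those unit test
# logs is produced when a green thread blocks for too long.
import eventlet

def slow_operation():
    # Stand-in for whatever the test is actually waiting on (DB call, RPC, ...)
    eventlet.sleep(300)

try:
    with eventlet.Timeout(40):
        slow_operation()
except eventlet.Timeout:
    print("eventlet.Timeout fired after 40s - the kind of traceback seen in the failing jobs")
```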
12:49 <lajoskatona> fungi: I'll check, perhaps I missed that thread
12:49 <fungi> i'll see if i can find the ml thread i'm thinking of, or whether i dreamed it
12:50 <lajoskatona> fungi: thanks
12:51 <fungi> i'm not immediately able to spot it in the archive
12:52 <fungi> i may be remembering the os-vif/linuxbridge problem, though that doesn't seem at all similar
12:54 <fungi> nothing jumps out at me from june either
12:55 <lajoskatona> fungi: I found an old thread which pointed to this req bump: https://review.opendev.org/c/openstack/requirements/+/811555
12:56 <lajoskatona> fungi: but I'm not sure we're seeing the same thing here. this is the mail: https://lists.openstack.org/pipermail/openstack-discuss/2021-October/025179.html
12:57 <elodilles> fungi: so far i've only found this in neutron's unit test jobs
12:58 <fungi> odd that it would have just started ~1.5 weeks ago. i don't see any recent constraints changes on those branches at all
12:59 <elodilles> i've checked the pip freeze outputs of the failing job vs the previous passing job (from July 11th) and there is no difference at all
12:59 <fungi> do we capture dpkg -l output?
12:59 <elodilles> neither has requirements' stable/victoria been touched since april
13:00 <elodilles> fungi: i don't think i saw 'dpkg -l' in the logs
13:00 <fungi> yeah, we capture it for devstack jobs but not unit tests
13:01 <elodilles> also interesting that victoria is focal based but ussuri is bionic
13:01 <fungi> maybe we could infer it by grabbing the dpkg -l from devstack jobs on the 10th and 12th or something
13:02 <elodilles> fungi: ok, i'll try to do that
13:02 <fungi> to see what might have updated in focal and in bionic around those dates
13:02 <fungi> could be there was a security fix ubuntu rolled out on the 11th
13:21 <lajoskatona> fungi, elodilles: yeah, pip seems to be the same in the green and red runs
13:25 <dpawlik> clarkb: hey, wanna check https://review.opendev.org/c/openstack/ci-log-processing/+/848218 please?
13:26 <elodilles> i've taken a sample (from bionic, stable/ussuri) dpkg-l.txt diff between Jul 05 and Jul 19: https://paste.opendev.org/show/b2EuM9b16RC6il4G6kHx/
13:27 <elodilles> (these were the closest runs i've found in the neutron repo)
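[editor's note: a rough sketch of the comparison being done here, assuming the passing and failing devstack-style runs each archive a dpkg-l.txt file; the two URLs below are placeholders, not the runs elodilles actually diffed.]

```python
# Hypothetical sketch of the dpkg -l comparison described above.
import requests

GOOD_URL = "https://example.opendev.org/logs/good-run/controller/logs/dpkg-l.txt"  # placeholder
BAD_URL = "https://example.opendev.org/logs/bad-run/controller/logs/dpkg-l.txt"    # placeholder

def package_versions(url):
    """Return {package: version} parsed from a 'dpkg -l' dump."""
    versions = {}
    for line in requests.get(url, timeout=30).text.splitlines():
        # dpkg -l data rows start with a status flag such as 'ii'.
        if line.startswith("ii"):
            fields = line.split()
            versions[fields[1]] = fields[2]
    return versions

good, bad = package_versions(GOOD_URL), package_versions(BAD_URL)
for pkg in sorted(set(good) | set(bad)):
    if good.get(pkg) != bad.get(pkg):
        print(f"{pkg}: {good.get(pkg, '-')} -> {bad.get(pkg, '-')}")
```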
13:34 <fungi> we don't do periodic stable devstack jobs for neutron daily any more?
13:36 <elodilles> as far as i know the stable-periodics are all 'lightweight' unit test jobs
13:37 *** ysandeep is now known as ysandeep|afk
13:41 <elodilles> hmmm, but neutron has extra 'periodic' jobs
13:44 <fungi> even generic periodic stable jobs for devstack might be sufficient to spot what's changed in distro packages, if most of the same packages are getting installed in those jobs
13:45 <lajoskatona> elodilles: we have some, like here: https://zuul.openstack.org/buildsets?project=openstack%2Fneutron&branch=stable%2Fvictoria&pipeline=periodic&skip=0
13:45 <lajoskatona> though it's new to me that there are separate periodic and periodic-stable pipelines...
13:46 <lajoskatona> in my mind they were the same
13:46 <fungi> periodic is usually for master branch testing, and periodic-stable is for stable branch testing. we trigger them at slightly different times to offset the load
13:48 *** dasm|off is now known as dasm|ruck
13:49 <fungi> oh, i guess not really that far apart. https://opendev.org/openstack/project-config/src/branch/master/zuul.d/pipelines.yaml indicates periodic should trigger at 02:00 and periodic-stable at 02:01, just far enough apart to make sure the periodic jobs get some priority for their node requests in case we can't run them all before load on the system picks back up
13:50 <fungi> i was thinking of the periodic-weekly pipeline, which starts at 08:00 on saturdays, hopefully after the daily periodics have wrapped up
13:57 <elodilles> so the difference seems to be: https://paste.opendev.org/show/bwXLDiDuz9mCUEk3OnZn/
13:58 <elodilles> ignore me, i've diffed stable/wallaby :/
14:02 <fungi> though it may be the same
14:02 <fungi> at least the same as for victoria, since they run on the same platform
14:03 <elodilles> yes, they are the same (both are focal)
14:03 <elodilles> so at least the result is the same
14:05 <fungi> so that suggests this situation could be brought on by a kernel or libc update
14:05 <fungi> though i wonder why it doesn't affect wallaby jobs
14:09 <elodilles> yes. (for ussuri / bionic: https://paste.opendev.org/show/b8JwAKL4wMqdxCGOZpp8/ )
14:13 *** ysandeep|afk is now known as ysandeep
14:15 <lajoskatona> elodilles: the upper lines of packages are from a passing run?
14:23 <elodilles> lajoskatona: yes. in both cases the version was bumped by one between July 11th and July 12th
14:24 <elodilles> from 5.4.0-121 to 5.4.0-122 ; from 4.15.0-188 to 4.15.0-189
14:25 <lajoskatona> elodilles: thanks
14:26 <lajoskatona> elodilles: I added this to the bug (https://bugs.launchpad.net/neutron/+bug/1982206 )
14:45 <elodilles> lajoskatona: ++
14:52 <fungi> you should be able to look at the ubuntu package changelogs to find out what "fixes" were included in -122 and -189, and if there's overlap that could be a clue. or maybe this was related to the libc update (the distro package updates could also just be a red herring)
14:54 <fungi> oh, or maybe the kernel package changelogs are effectively useless :/
14:54 <fungi> https://changelogs.ubuntu.com/changelogs/pool/main/l/linux-signed/linux-signed_4.15.0-189.200/changelog
14:58 <fungi> i guess we'd need to figure out what patches were imported into the linux-signed source package between 4.15.0-188.199 and 4.15.0-189.200
14:58 <fungi> there's probably a git repo on lp for that
15:00 <fungi> https://code.launchpad.net/ubuntu/+source/linux-signed seems to be the place
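[editor's note: a small sketch of the changelog comparison fungi describes, reusing the changelogs.ubuntu.com URL pattern pasted above and the two bionic kernel versions named earlier. It assumes the older version's changelog is still published at the same path.]

```python
# Rough sketch: fetch the published changelogs for the two linux-signed
# versions and print their entry headings for a side-by-side look.
import requests

URL = "https://changelogs.ubuntu.com/changelogs/pool/main/l/linux-signed/linux-signed_{version}/changelog"

def changelog_headings(version):
    """Return the entry heading lines (package/version) of a changelog."""
    text = requests.get(URL.format(version=version), timeout=30).text
    return [line for line in text.splitlines() if line.startswith("linux-signed ")]

for version in ("4.15.0-188.199", "4.15.0-189.200"):
    print(version, changelog_headings(version)[:3])
```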
15:01 <clarkb> dpawlik: I've mentioned it before but I wonder why you don't just send the json as is? I don't understand why you have to read the json then reformat it and send it out again
15:02 <fungi> https://git.launchpad.net/ubuntu/+source/linux-signed?h=ubuntu%2Fbionic-security
15:03 <clarkb> dpawlik: but I also don't really have the bandwidth to review that stuff. This is why the opendev team stopped running those services
15:05 <dpawlik> clarkb: there is also a json sent to a separate index
15:05 <dpawlik> clarkb: and for me it seemed obvious that if someone is going to continue working on making graphs based on the values I have prepared in logsender, it would be good to review
15:05 <dpawlik> but if not, ok, we can go as it is.
15:06 <fungi> i can't seem to find the kernel patches, even on the applied version of that branch
15:08 <clarkb> dpawlik: right, sending to a separate index is good (I think that allows you to manage data rotations independently for the different types of information and have longer/shorter retention as necessary). But what confuses me is why you need to deserialize and reserialize the document in a different format. Can you just take what the job is emitting and send it to opensearch? I also agree it is good to have reviews. The problem is, if I were someone who was able to do those reviews we wouldn't have needed to evict these services from opendev. I think you should be looking for help from the openstack project, which aimed to preserve this functionality
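[editor's note: for illustration only — not the ci-log-processing code — the "send the json as is" idea clarkb describes could look roughly like this with the opensearch-py client. The host, credentials, index name, and file path are made-up placeholders.]

```python
# Illustrative sketch: push a job's JSON artifact unchanged into its own
# OpenSearch index. All connection details below are placeholders.
import json
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "opensearch.example.org", "port": 9200}],
                    http_auth=("logsender", "secret"), use_ssl=True)

with open("performance.json") as f:   # hypothetical per-job JSON artifact
    document = json.load(f)

# Indexing the document as-is; a dedicated index lets its retention policy
# be managed independently from the regular log index.
client.index(index="performance-2022.07.20", body=document)
```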
15:08 <clarkb> fungi: iirc ubuntu does log them somewhere, but it is somewhere weird
15:08 <clarkb> or weird to me because I don't understand all the different branches and packages for the ubuntu kernels
15:10 <clarkb> fungi: http://changelogs.ubuntu.com/changelogs/pool/main/l/linux/linux_5.4.0-122.138/changelog
15:11 <fungi> https://ubuntu.com/security/notices/USN-5515-1
15:11 <fungi> i came at it from another angle
15:22 <clarkb> fungi: lajoskatona: also, I'm on a bit of a campaign to remind everyone who asks about failures without linking to a specific failure log to please do so :)
15:23 <lajoskatona> clarkb: ack, I'll keep it in mind
15:23 <clarkb> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_bb1/periodic-stable/opendev.org/openstack/neutron/stable/victoria/openstack-tox-py38/bb13920/tmpkify9opp the truncated subunit log might be helpful too; it will show you which tests ran and in what order
15:24 <clarkb> that might single out a specific problematic test, or class of tests
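[editor's note: a sketch, under stated assumptions, of reading that (possibly truncated) subunit v2 stream with python-subunit and testtools to list tests in the order they ran. The file name is copied from the URL above; with a truncated stream, the last tests printed are the ones in flight when the job timed out.]

```python
# Sketch of listing the tests recorded in a subunit v2 stream.
import subunit
import testtools

tests = []

def record(test):
    # StreamToDict hands us one dict per test with 'id' and 'status' keys.
    tests.append((test["id"], test["status"]))

with open("tmpkify9opp", "rb") as stream:
    case = subunit.ByteStreamToStreamResult(stream, non_subunit_name="stdout")
    result = testtools.StreamToDict(record)
    result.startTestRun()
    case.run(result)
    result.stopTestRun()

for test_id, status in tests:
    print(status, test_id)
```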
15:26 <clarkb> looking at the console log, the last thing logged to the console was about 16 minutes prior to the timeout. Are your unit tests still running with internal test timeouts? If so, this implies whatever it is breaks that
15:26 <clarkb> if not, then maybe you should reenable those timeouts to see if they can help catch what is breaking
15:26 <clarkb> etc
15:30 <clarkb> looks like neutron may have actually removed the test timeout by default ...
15:30 <clarkb> on master it is only applied to the db migration tests?
15:30 <clarkb> this is why those timeouts exist: so that the code that creates the problem can be more readily identified
15:34 <lajoskatona> clarkb: you mean OS_TEST_TIMEOUT ?
15:35 <clarkb> lajoskatona: yes
15:35 <clarkb> but it seems like that is only applied to the db migration tests?
15:36 <clarkb> the original intent way back when was that it be applied globally, to catch test cases that locked up and hopefully provide some sort of feedback into where the lockup was
15:37 <clarkb> I don't know that it would help here, but the idea behind the global test timeouts is that it would
15:38 <lajoskatona> clarkb: I see it in tox.ini on master, and for functional/test_migrations, so yes
15:46 <clarkb> lajoskatona: one option to try may be setting a global test timeout of around 5 minutes (~1/3 of the delta between the last logging and the job timeout) and seeing if that produces any errors that are debuggable
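[editor's note: a minimal sketch of the global per-test timeout being suggested, mirroring the common oslotest pattern of arming fixtures.Timeout from OS_TEST_TIMEOUT in a shared base class. This is not neutron's actual base class.]

```python
# Minimal sketch: read OS_TEST_TIMEOUT from the environment and arm a
# per-test timeout fixture in a common test base class.
import os

import fixtures
import testtools


class BaseTestCase(testtools.TestCase):
    def setUp(self):
        super().setUp()
        try:
            timeout = int(os.environ.get("OS_TEST_TIMEOUT", 0))
        except ValueError:
            timeout = 0
        if timeout > 0:
            # gentle=True raises TimeoutException inside the test, so the
            # traceback points at whatever the test was stuck on.
            self.useFixture(fixtures.Timeout(timeout, gentle=True))
```

With something like that in place, setting OS_TEST_TIMEOUT=300 in the tox testenv would give roughly the 5-minute guard clarkb suggests.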
16:06 *** dviroel is now known as dviroel|lunch
16:11 *** ysandeep is now known as ysandeep|out
17:09 *** akekane_ is now known as abhishekk
17:23 *** dviroel|lunch is now known as dviroel
19:34 *** dviroel is now known as dviroel|afk
21:10 *** rlandy is now known as rlandy|biab
21:33 *** rlandy|biab is now known as rlandy
22:15 *** rlandy is now known as rlandy|bbl
23:30 *** rlandy|bbl is now known as rlandy
23:41 *** dasm|ruck is now known as dasm|off
