opendevreview | gnuoy proposed openstack/project-config master: Create mono repo for sunbeam charms https://review.opendev.org/c/openstack/project-config/+/900167 | 11:29 |
*** elodilles_pto is now known as elodilles | 11:39 | |
opendevreview | gnuoy proposed openstack/project-config master: Create mono repo for sunbeam charms https://review.opendev.org/c/openstack/project-config/+/900167 | 14:04 |
opendevreview | Merged openstack/project-config master: Remove OSA rsyslog noop jobs once repo content is removed https://review.opendev.org/c/openstack/project-config/+/863087 | 14:45 |
opendevreview | gnuoy proposed openstack/project-config master: Create mono repo for sunbeam charms https://review.opendev.org/c/openstack/project-config/+/900167 | 14:45 |
tonyb | fungi, clarkb: IIUC the next step in migrating mirror nodes to jammy is for someone with infra-root to boot a new node in each region. Then I/we can add those to the proper inventory and frobnicate DNS | 15:01 |
fungi | tonyb: sounds right | 15:02 |
tonyb | .... followed by cleanup etc | 15:02 |
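As an illustration of the boot step described above, a minimal sketch using plain openstackclient; the cloud, image, flavor, key, network, and host names are all placeholders, and this is not necessarily the actual launch tooling OpenDev uses:

```shell
# Hypothetical example: boot a replacement jammy mirror node in one region.
# Every name here (cloud, image, flavor, key, network, hostname) is a placeholder.
openstack --os-cloud example-provider server create \
  --image ubuntu-jammy \
  --flavor standard-8g \
  --key-name infra-root-keys \
  --network public \
  mirror02.example-region.opendev.org

# Once the node is reachable, it would be added to the Ansible inventory and
# DNS updated to point at the new address, followed by cleanup of the old host.
```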
tonyb | I don't see any docs in system-config/docs/source about meetpad/jvb. In the bionic-servers etherpad I see "no real local state." I'm trying to determine what if any local state there is. Otherwise it looks really easy to test new meetpad/jvb nodes on jammy | 16:03 |
clarkb | tonyb: because we don't do recordings etc I think the only "state" as such is the CA stuff. However, because it is Java it isn't actually checking for proper trust amongst the CAs; instead it just wants to see a signed cert and then do SSL | 16:06 |
clarkb | it's weird, but ya I think it simplifies things | 16:06 |
*** gouthamr_ is now known as gouthamr | 16:09 | |
clarkb | fungi: for the meeting agenda do we want to keep mm3 on the list to do a recap or should we pull it off (I believe all tasks are completed now) | 16:41 |
clarkb | when I scheduled the gerrit upgrade for 15:30 UTC I failed to remember that we'd be dropping DST this last weekend. That's fine, I'll just get up early | 16:51 |
clarkb | more of a "DST strikes again!" situation | 16:52 |
clarkb | fungi: I guess one outstanding mm3 item is to figure out the error we had post-upgrade here: https://paste.opendev.org/show/bc7jfeZCt97fZm0dCPKw/ Compressing... Error parsing template socialaccount/signup.html: socialaccount/base.html | 16:57 |
fungi | clarkb: oh, right. i'll look into that. i guess keep it on the agenda for now | 17:06 |
tonyb | fungi: Possibly https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/thread/MGY6JA6O7BWGR2KNKD3PQTMW7ZY7NHS3/ ... pip install django_allauth\<0.58 | 17:10 |
fungi | tonyb: supposed to be fixed in latest django-allauth i thought | 17:11 |
tonyb | Okay | 17:11 |
clarkb | looks like the newer version fixed it. But maybe our pip resolution didn't pull it in? | 17:12 |
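For reference, a quick way to check which django-allauth version is installed versus what the resolver can see, plus the pin-below-0.58 workaround from the thread linked above (a sketch for a plain pip environment; versions shown are illustrative):

```shell
# Show what is currently installed, if anything.
pip show django-allauth

# List the versions the resolver can see (requires pip >= 21.2).
pip index versions django-allauth

# Workaround from the mailman-users thread: pin below 0.58.
pip install 'django_allauth<0.58'
```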
corvus | clarkb fungi tonyb hiya; i wrote a change to the nodepool-openstack functional test (the test that runs devstack and then runs nodepool against it). part of that test boots a node (in potentially nested vm) and keyscans it. i noticed that it was taking an unusually long time to boot and start ssh, sometimes causing the test to fail, so my change increases the time we wait for the node to boot. the extra time could be natural evolution (maybe | 17:15 |
corvus | some new thing on the image is just a little slower, and it's a straw-that-broke-the-camel's-back situation). or it could be a string of bad luck in noisy-neighbor scheduling on our test nodes. or it could be something more substantial either in the devstack cloud or the test image used. i bring it up here in order to surface the problem so that more than just zuul maintainers would see it. if you have any thoughts about who else should be | 17:15 |
corvus | made aware of this, feel free to ping them and let them know. https://review.opendev.org/900048 is the change to bump the timeout | 17:15 |
clarkb | corvus: TheJulia has been debugging some unnecessary slowness with cloud-init when using config drive (it will dhcp first, then look at config drive and learn it must statically configure things). But I'm 98% sure that those jobs use glean on the images not cloud-init so I don't think that issue is related | 17:16 |
clarkb | other than that I can't think of anything that might be slowing down the test node booting process | 17:17 |
TheJulia | speaking of, you guys can nuke that held node now | 17:17 |
opendevreview | Tony Breeds proposed opendev/system-config master: Add Tony Breeds to base_users for all hosts https://review.opendev.org/c/opendev/system-config/+/900220 | 17:17 |
TheJulia | since I think the outstanding questions got answered on Friday | 17:17 |
clarkb | TheJulia: will do | 17:17 |
clarkb | done. | 17:18 |
clarkb | fungi: should I also delete the etherpad 1.9.4 hold since we have upgraded? | 17:19 |
corvus | clarkb: yeah, i think they do use glean. they are centos stream 9 images. | 17:23 |
corvus | here is the log output from one of the runs with an embedded console log: https://zuul.opendev.org/t/zuul/build/78382647905044ff8287dd03da8c154f/log/podman/nodepool_nodepool-launcher_1.txt#8019 | 17:24 |
fungi | clarkb: oh, yes don't need the held nodes for etherpad any longer | 17:27 |
clarkb | fungi: all done | 17:28 |
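For reference, a sketch of how held nodes can be listed and released via the autohold interface; the tenant and request id are placeholders, an admin auth token may be required, and this may not be exactly how infra-root performs the cleanup:

```shell
# List outstanding autohold requests for a tenant, then delete the one
# that is no longer needed so the held node gets cleaned up.
zuul-client --zuul-url https://zuul.opendev.org autohold-list --tenant openstack
zuul-client --zuul-url https://zuul.opendev.org autohold-delete --tenant openstack <request-id>
```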
fungi | clarkb: we held the mailman upgrade change until django-allauth got fixed and then i rechecked it to confirm the new images had the fixed version | 17:28 |
clarkb | fungi: huh, maybe it has to do with the upgrade then? | 17:28 |
tonyb | corvus: I haven't looked at a centos boot for a while but almost a minute here seems excessive: https://zuul.opendev.org/t/zuul/build/78382647905044ff8287dd03da8c154f/log/podman/nodepool_nodepool-launcher_1.txt#8597 | 17:30 |
fungi | clarkb: tonyb running `sudo docker-compose -f /etc/mailman-compose/docker-compose.yaml exec mailman-web pip list|grep allauth` on lists01 reports "django-allauth 0.58.1" by the way | 17:31 |
tonyb | fungi: Okay. That verifies that. | 17:32 |
clarkb | tonyb: corvus: these are doing qemu and not nested virt. So slowness is to be expected. I think generous timeouts are a good idea. Also I agree that looks like glean and no cloud-init to me | 17:32 |
clarkb | as for why there is a big time gap where tonyb points, I'm not sure. That looks like systemd working through rc scripts and /etc settings for unit files | 17:33 |
clarkb | maybe it is waiting for a non RO filesystem ? | 17:33 |
tonyb | Could be. | 17:35 |
* tonyb is grabbing a CentOS image to poke at | 17:37 | |
clarkb | tonyb: note the images in these tests are built from scratch using the -minimal image | 17:37 |
clarkb | er -minimal element so they won't have cloud-init in them for example. Maybe not necessary to see why early systemd is slow, but want to call it out | 17:38 |
fungi | if you have the bandwidth, you can just download an already built one from https://nb01.opendev.org/images/ (or nb01) | 17:38 |
fungi | (er, or nb02) | 17:38 |
tonyb | Thanks. I was prepared for differences. Just wanted to poke around locally | 17:38 |
tonyb | ... or indeed I could do that :) | 17:39 |
fungi | https://nb02.opendev.org/images/centos-9-stream-60d70c487274458ab3d42567cce05714.qcow2 is from 2023-11-05 15:46 and 7.6G in size | 17:40 |
fungi | so pretty sure that's the newest one | 17:40 |
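If poking at the downloaded image locally, the usual systemd timing tools can show where that early-boot gap goes (a sketch, assuming the guest boots far enough to give you a shell):

```shell
# Which units took the longest during boot.
systemd-analyze blame

# The dependency chain that gated reaching the default target.
systemd-analyze critical-chain

# Journal entries for the root-filesystem remount, in case the delay really
# is waiting for a writable filesystem as speculated above.
journalctl -b -u systemd-remount-fs.service
```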
corvus | unfortunately, i don't think we get a useful console log unless the boot fails, which makes it hard to compare with earlier successful builds | 17:42 |
fungi | could collect syslog from the successfully booted one? | 17:43 |
fungi | oh, i get it. you're saying we don't have earlier records, right | 17:43 |
corvus | yeah would be a good idea. the job attempts to collect a console log outside of nodepool, but all it gets is grub. see for example this earlier successful build: https://zuul.opendev.org/t/zuul/build/2b10d60c2d804a3a9afdce06f50d7653/log/instances/57a231ba-c4f7-40bc-bed3-b5790d906a22/console.log | 17:44 |
corvus | (yeah, so 2 probs: 1 we don't have the collection in place to record successful console logs (or its broken); 2 since that's true, we'll never have old data to compare it to; we can only fix that going forward) | 17:46 |
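For future comparisons, the serial console of a successfully booted instance can also be fetched from the compute API (a sketch with openstackclient, not the job's actual collection role; the server name is a placeholder):

```shell
# Save the console output of a booted instance so successful runs leave a
# full boot log behind, not just grub.
openstack console log show <server-uuid-or-name> > console.log
```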
tonyb | For a one-off reference we could potentially use an older 9-stream compose and generate an image from there. I don't think that'd be too much work | 18:06 |
tonyb | Looks like we could go back in time by about a month: https://composes.stream.centos.org/production/ | 18:09 |
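A rough sketch of that idea with diskimage-builder, assuming the centos-minimal element can be pointed at an older compose via DIB_DISTRIBUTION_MIRROR; the compose id below is a placeholder and the exact repo layout the element expects would need checking:

```shell
# Hypothetical: build a reference image from an older CentOS Stream 9 compose.
export DIB_RELEASE=9-stream
export DIB_DISTRIBUTION_MIRROR='https://composes.stream.centos.org/production/<older-compose-id>/compose'
disk-image-create -o centos-9-stream-reference centos-minimal vm
```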
clarkb | https://element.io/blog/element-to-adopt-agplv3/ I don't think this will affect us directly as we aren't making changes to matrix services (and they would be publicly hosted even if we did) | 18:14 |
tonyb | "Please see here for The Matrix.org Foundation's position. Contributors to these new repositories will now need to agree to a standard Apache CLA with Element before their PRs can be merged - this replaces the previous DCO sign-off mechanism." Interesting they're switching away from DCO | 18:21 |
clarkb | tonyb: it is because they need a CLA to allow them to sell the code under other license terms | 18:22 |
clarkb | they mention that explicitly later in the post | 18:22 |
clarkb | (which is good that they aren't trying to be sneaky about it) | 18:22 |
tonyb | Okay. | 18:40 |
clarkb | fungi: any idea what the story is with https://review.opendev.org/c/openstack/project-config/+/800442 ? Is that something we just missed? I worry that merging it now would create more problems... | 18:49 |
*** dhill is now known as Guest6059 | 19:42 | |
*** JayF is now known as Guest6061 | 19:55 | |
*** JasonF is now known as JayF | 19:55 | |
*** zigo_ is now known as zigo | 19:57 | |
fungi | i'm not sure. i guess it might be worth checking with the author to see if they still want it created (but also creating new repos in the x/ namespace seems questionable, ideally they would pick a new namespace) | 19:59 |
johnsom | FYI, lots of job failures at the moment. orange post failures | 20:09 |
clarkb | yes it was just brought up in #openstack-infra | 20:13 |
clarkb | appears to be a problem with at least one swift endpoint (rax-iad) | 20:13 |
clarkb | need to spot check others to see if the rest of rax is ok or now | 20:13 |
clarkb | *or not | 20:13 |
clarkb | second one is in rax-dfw so probably a rax problem and not region specific | 20:15 |
clarkb | 2 on rax iad and 1 on rax dfw so far. I think that's enough to push a change to disable rax for now | 20:17 |
opendevreview | Clark Boylan proposed opendev/base-jobs master: Temporarily disable log uploads to rax swift https://review.opendev.org/c/opendev/base-jobs/+/900243 | 20:18 |
clarkb | nothing on the rax status page yet. Could be on our end I suppose, but hard to say until we reproduce independently | 20:19 |
clarkb | and I need to eat lunch while it is hot | 20:19 |
johnsom | I am also having lunch, cheers | 20:20 |
clarkb | https://zuul.opendev.org/t/openstack/stream/d3a19b50ca2f405aa9e5020cecf054d9?logfile=console.log is currently uploading to rax iad. I think it is going to fail, but if it doesn't that could be another useful data point | 20:28 |
clarkb | fungi: ^ fyi a change to disable rax log uploads. And a job we can watch to see if it eventually fails | 20:29 |
clarkb | https://zuul.opendev.org/t/openstack/stream/4d91518405774bc9acd1a62a24f7d40d?logfile=console.log is another to watch | 20:30 |
clarkb | fungi: if it were an api/sdk compatibility thing I would expect these jobs to fail quicker | 20:32 |
clarkb | but instead they seem to be doing uploads or attempting them and then failing | 20:32 |
clarkb | ok I just ran openstack container list against IAD and got back a 500 | 20:32 |
clarkb | going to repeat for the other two regions, but I think this is not on our end | 20:33 |
fungi | done with an old known-working cli/sdk install? | 20:33 |
clarkb | it works against ORD but not IAD and DFW | 20:33 |
fungi | aha, yep that sounds like it's on the service end | 20:33 |
clarkb | fungi: just using what we've got installed on bridge, but it works against ORD. But also servers shouldn't report 500 internal errors on api compatibility issues | 20:33 |
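A spot check like the one described can be scripted per region with openstackclient (a sketch; the cloud name "rax" is an assumption and depends on the local clouds.yaml):

```shell
# A 500 from container list in some regions but not others points at the
# service side rather than a client/SDK compatibility problem.
for region in IAD DFW ORD; do
  echo "== $region =="
  openstack --os-cloud rax --os-region-name "$region" container list >/dev/null \
    && echo "ok" || echo "FAILED"
done
```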
fungi | i'm going to bypass zuul to merge that change | 20:33 |
clarkb | fungi: do we want to modify it to keep ord for now? | 20:34 |
clarkb | fungi: that could be good data that rax generally works with our sdk versions | 20:34 |
fungi | clarkb: yeah, push a revision and then i'll bypass testing to merge it | 20:34 |
clarkb | on it | 20:34 |
opendevreview | Clark Boylan proposed opendev/base-jobs master: Temporarily disable log uploads to rax dfw and iad swift https://review.opendev.org/c/opendev/base-jobs/+/900243 | 20:35 |
clarkb | done | 20:35 |
opendevreview | Merged opendev/base-jobs master: Temporarily disable log uploads to rax dfw and iad swift https://review.opendev.org/c/opendev/base-jobs/+/900243 | 20:36 |
fungi | #status log Bypassed testing to merge change 900243 as a temporary workaround for an outage in one of our log storage providers | 20:37 |
opendevstatus | fungi: finished logging | 20:37 |
clarkb | fwiw those two builds I linked did eventually fail which gives more evidence towards a problem on the other end | 20:38 |
clarkb | the jobs for 900179 should have all started after the fix landed | 20:49 |
clarkb | s/fix/workaround/ | 20:49 |
clarkb | I can list containers in iad and dfw again | 21:52 |
clarkb | I suspect that we can revert that change. I'll push up a test only change first though to confirm | 21:53 |
opendevreview | Clark Boylan proposed opendev/base-jobs master: Force base-test to upload only to rax-iad and rax-dfw https://review.opendev.org/c/opendev/base-jobs/+/900248 | 21:56 |
opendevreview | Clark Boylan proposed opendev/base-jobs master: Reset log upload targets https://review.opendev.org/c/opendev/base-jobs/+/900249 | 21:56 |
clarkb | I'm going to self approve that first one so that I can do a round of testing with a test change | 21:57 |
clarkb | I'll recheck https://review.opendev.org/c/zuul/zuul-jobs/+/680178 as my test change once the base-test update lands | 21:59 |
opendevreview | Merged opendev/base-jobs master: Force base-test to upload only to rax-iad and rax-dfw https://review.opendev.org/c/opendev/base-jobs/+/900248 | 22:02 |
clarkb | 680178 jobs lgtm I think we can land 900249 but I won't self approve that one since it affects production jobs | 22:35 |
JayF | fungi: any way to get mailman to send me a clean copy of an email to the list once it's released from moderation? | 22:46 |
JayF | oh, there it goes, it's just very late, weird | 22:47 |
JayF | my response to that in the web ui landed in my inbox before the email I released did | 22:47 |
JayF | spooky | 22:47 |
clarkb | I'm not sure I parsed that. You sent two responses one via email and the other by web ui? | 22:51 |
JayF | I'm a moderator on the MM list. I get the email saying "moderate this, you!" | 22:51 |
JayF | I go moderate it. Wait some time for the email to hit because it'll need a reply. | 22:52 |
JayF | Never hits, so I assume (wrongly at this point) that I don't get my own copy, so I reply in web UI, that reply shows up in my inbox a minute or two later. | 22:52 |
JayF | Literally three minutes later; the original message I approved shows up. | 22:52 |
fungi | JayF: it's possible your mail provider is greylisting messages from the listserver. you should be able to look at the received header chain to determine where the delays were | 22:53 |
clarkb | I think mailman operates internally via cron as well | 22:53 |
JayF | I'm just over here trying to solve the architectural question of how it's possible, except via a delivery failure to my email address, ... yes exactly | 22:53 |
clarkb | may have needed the release email job to execute | 22:53 |
JayF | fungi: the extra-irony is: the email that arrived faster was marked by google as phishing, because it was from:@gr-oss.io and not sent by google | 22:53 |
JayF | that makes sense, actually, from an architectural standpoint why I woulda seen the behavior I expected | 22:54 |
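For completeness, the delivery delay fungi mentions can usually be localized by comparing the timestamps in the Received headers from the bottom (origin) to the top (final delivery); a sketch using formail from the procmail package, with the file name standing in for a saved raw copy of the message:

```shell
# Each Received: header records when that hop accepted the message; a large
# jump between two consecutive hops shows where the delay (e.g. greylisting)
# happened. -c joins folded header lines, -x extracts the named field.
formail -c -x Received: < saved-message.eml
```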
fungi | we can work around that in the latest mailman version by setting specific domains to always get their from addresses rewritten | 22:54 |
JayF | I'm sure it's fine, it got delivered to my inbox so it can't have made google that angry lol | 22:55 |
fungi | google is especially tricky in that regard, because the dmarc policy they publish says to do one thing, but then they disregard it and do something different | 22:55 |
JayF | You know my first tech job was for a spamhou^W email marketing company, right? | 22:55 |
JayF | This stuff was a pain back then, and it's only moved to being more and more hidden and obscure. At least back in the late 2000s they'd still send you feedback when things were blocked or someone marked as spam so you could take action. | 22:56 |
fungi | JayF: yes, my employer at the time was their hosting provider, i think we established | 23:03 |
JayF | HS or P10? | 23:03 |
fungi | so i was the employee tasked with receiving, triaging and forwarding along all the abuse complaints | 23:03 |
fungi | hs | 23:03 |
fungi | looks like whatever the swift blip in dfw/iad was, https://rackspace.service-now.com/system_status never recorded it | 23:12 |
opendevreview | Merged opendev/base-jobs master: Reset log upload targets https://review.opendev.org/c/opendev/base-jobs/+/900249 | 23:36 |