Monday, 2022-09-12

*** ysandeep\|out is now known as ysandeep		04:36
*** ysandeep is now known as ysandeep\|afk		05:54
*** ysandeep\|afk is now known as ysandeep		06:30
frickler	ianw: seems the translate01 mysql backup is failing since the recent update: mysqldump: Error: 'Access denied; you need (at least one of) the PROCESS privilege(s) for this operation' when trying to dump tablespaces	06:35
ianw	frickler: thanks, will look into it	06:39
frickler	Clark[m]: seems "docker-compose ps" also lists exited containers, seems to be a bug afaict, because it does have the -a option that should do that, like for "docker ps"	06:43
*** gibi_off is now known as gibi		07:02
*** jpena\|off is now known as jpena		07:10
*** ysandeep is now known as ysandeep\|lunch		10:34
opendevreview	Ghanshyam proposed openstack/project-config master: Make python version template unversioned https://review.opendev.org/c/openstack/project-config/+/856904	11:04
*** dviroel_ is now known as dviroel		11:45
frickler	gmann: I think I understand the idea now, but I'm not sure if it can actually word as planned. hoping others with more knowledge will jump in	12:00
frickler	s/word/work/	12:00
fungi	frickler: gmann: if you're going to set branch matchers, you have to do it in the jobs which the template includes, not in the template itself	12:20
fungi	"it is not possible to explicity set a branch matcher on a Project Template" https://zuul-ci.org/docs/zuul/latest/config/project.html#project-template	12:20
fungi	i get the impression from the commit message that was the expectation	12:20
fungi	ahh, yes i see the latest review comment points that out as well	12:22
gmann	fungi: frickler I know template cannot have branches variant but we can do this way https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/856903/1/zuul.d/project-templates.yaml#1466	12:24
gmann	same way we did for stable periodic job template https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/856903/1/zuul.d/project-templates.yaml#2398	12:26
fungi	gmann: yes, if you do it in the jobs that will work. it may make sense to go ahead and add the master branch matcher to the jobs in the new template so that it's clearer	12:28
gmann	fungi: sure, we can do that and once new stable branch is cut then add those explicitly on required jobs. I can update that in https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/856903	12:30
gmann	fungi: plan is if we can merge this then bot patch will propose the right template name otherwise more deliverables releases with stable branch will have different template https://review.opendev.org/c/openstack/project-config/+/856904	12:31
fungi	well, we can also (carefully) backport the template swap, but yes avoiding more unnecessary backports would be good	12:32
gmann	yeah, let's leave those with versioned-named template and now onwards we can take care of branch and master on generic one	12:34
fungi	gmann: you need 856903 merged first though, right? otherwise the release scripts are going to propose patches which can't merge	12:34
fungi	oh, i see, the template name is being reused, so i guess they'll still run tests even before 856903	12:35
gmann	fungi: we have same name template 'openstack-python3-jobs' exist, so release script bot patches will be ok. but yes we will merge 856903 soon so that it run the latest set of jobs as per 2023.1 testing runtime	12:35
gmann	https://github.com/openstack/openstack-zuul-jobs/blob/b61f7acddba26e5ac7c4ea7dbe1fe8cdc29fec7e/zuul.d/project-templates.yaml#L1597	12:36
gmann	this is helping me to transition from already merged bot patches to this new generic one	12:36
fungi	right, makes sense	12:36
*** ysandeep\|lunch is now known as ysandeep		12:44
*** dasm\|off is now known as dasm		13:33
gmann	fungi: can you review it, frickler is +2 now. https://review.opendev.org/c/openstack/project-config/+/856904	14:20
opendevreview	Merged openstack/project-config master: Make python version template unversioned https://review.opendev.org/c/openstack/project-config/+/856904	14:29
fungi	gmann: ^	14:38
gmann	fungi: thanks	14:41
frickler	amorin: infra-root: I'm seeing issues with nested-kvm on ovh-gra1, I'm not sure we even intended to run these jobs there. strangely the issue only manifests with a new cirros version using the latest 5.15 kernel. Ubuntu 22.04 using the same kernel seems unaffected, as well as older versions of both ubuntu and cirros	14:46
frickler	the same thing on vexxhost is working fine. https://zuul.opendev.org/t/openstack/build/b53d8543592f42da8b704a93ccab2e50/artifacts vexxhost working, https://zuul.opendev.org/t/openstack/build/9e75cbb7c1904c2cb39c4c001caed751 ovh broken	14:47
frickler	the VM crashes shortly after boot with: unchecked MSR access error: WRMSR to 0x48 (tried to write 0x0000000000000004) at rIP: 0xffffffff9f296104 (native_write_msr+0x4/0x30)	14:48
frickler	switching to qemu instead of kvm also solves the issue. seems pretty reproducible, but I've no idea what to do about it	14:50
fungi	frickler: i see a lot of reports online of benign "unchecked MSR access error: WRMSR to ..." when the kernel is configured to apply microcode updates to the processor at boot or resume from suspend	14:51
fungi	possible the crash is unrelated to that	14:52
frickler	hmm, the kernel should be the same, I can try to check whether the ubuntu image uses some additional cmdline args	14:59
fungi	well, the kernel in our job node and the guest created on it are, but you don't know what kernel is used for the underlying provider hypervisor	15:02
fungi	for these nested virt related crashes, it often involves some interaction between the kernel versions at all three layers	15:03
*** ysandeep is now known as ysandeep\|out		15:16
clarkb	new debian libc6 package is now available that should fix the ansible thing without backporting. I'll work on updating my change to update our python images to incorporate that (as I think upstream may not have updated the base image yet). I'll also look at the zuul restart failure this morning. But first local package updates and reboots and breakfast	15:16
*** ysandeep\|out is now known as ysandeep		15:16
amorin	frickler, ack, first time I heard about such thing	15:17
amorin	perhaps our CPU behavior is different than the one on vexxhost	15:18
amorin	which flavor are you using?	15:18
amorin	oh, its flavor from open infra I think	15:18
fungi	amorin: seems to be called "ssd-osFoundation-3"	15:19
amorin	ack, would you mind trying with a c2-7 or b2-7 for a one shot test?	15:20
*** ysandeep is now known as ysandeep\|out		15:20
fungi	frickler: ^ you'll probably need a refined reproducer you can run on a manually booted instance	15:21
fungi	maybe set a job hold, then shutdown and snapshot the instance for the held node, then boot that snapshot on the other flavors, ssh in and try booting the failed guest in devstack again?	15:23
fungi	(after restacking, i guess)	15:23
opendevreview	Clark Boylan proposed opendev/system-config master: Update python builder and base image https://review.opendev.org/c/opendev/system-config/+/856537	15:31
clarkb	I think if we land ^ we can remove the zuul libc workarounds	15:31
clarkb	the docker issue on zm05 has the docker OOM'd return code	15:40
clarkb	and it seems it was the exec to stop the container that OOM'd not the container itself	15:40
fungi	ouch	15:40
clarkb	however dmesg records no OOM either	15:40
clarkb	in syslog docker records its shim disconnected so it cleaned things up	15:43
frickler	I think I can manually boot a node and deploy the test there, but not today	15:43
*** marios is now known as marios\|out		15:47
clarkb	ok digging more I don't think OOMkiller was invoked. cacti and dmesg and syslog and so on all indicate no oomkiller would have happened. Docker inspect also shows "OOMKilled": false. Instead I think this is a race between the merger exiting when asked to gracefully shutdown and the exec running in that container completing. The container itself reports it exited 0 and it is the	15:53
clarkb	docker exec that requests graceful shutdown with a 137 return code	15:53
clarkb	I think we can safely ignore rc 137 on the graceful shutdown command	15:54
clarkb	I'll work on a patch	15:54
*** dviroel is now known as dviroel\|lunch		15:55
fungi	rc 137 == "fatal error: requested action already complete"?	15:56
clarkb	fungi: its apparently what you get when you get kill -9'd	15:56
clarkb	in this case the container stopping and being cleaned up as a runtime is causing the equivalent of a kill -9 against the exec'd command I think	15:56
clarkb	since there is no longer a runtime for that	15:56
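The 137 return code discussed above fits the common shell convention that a process terminated by signal N is reported as 128 + N. A minimal Python sketch of that decoding, assuming Linux signal numbering (where SIGKILL is 9); the helper name `describe_exit_code` is hypothetical and is not part of the system-config change:

```python
import signal


def describe_exit_code(rc: int) -> str:
    """Decode a shell-style exit status into a readable description."""
    # By convention, "killed by signal N" is reported as 128 + N.
    if rc > 128:
        signum = rc - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # Not a signal number known on this platform.
            return f"killed by unknown signal {signum}"
    return f"exited with status {rc}"


print(describe_exit_code(137))
print(describe_exit_code(0))
```

Under this reading, a graceful-stop command that returns 137 was killed from outside (here, presumably by the container runtime going away) rather than failing on its own.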
fungi	ahh, okay	15:57
clarkb	The executor theoretically has the same race but is far less likely to hit it simply due to the executor having a lot more stuff to do to shutdown. I won't update the executor as I'd like to observe if that ever happens	15:59
opendevreview	Clark Boylan proposed opendev/system-config master: Handle zuul merger shutdown race in graceful stop https://review.opendev.org/c/opendev/system-config/+/857209	16:03
clarkb	something like that	16:03
*** dhill is now known as Guest117		16:04
clarkb	I expect https://review.opendev.org/c/zuul/zuul/+/855801/ to land real soon now and we can coordinate ^ to run after that	16:05
clarkb	oh wait but there is another bug which is that docker-compose ps -q will list exited containers	16:07
* clarkb works on a fix for that too		16:07
clarkb	I'm beginning to wonder if I should just do what corvus suggests and set failed_when false on that task and stop trying to check things ahead of time	16:08
clarkb	that would address both issues	16:08
*** jpena\|off is now known as jpena		16:09
fungi	clarkb: in theory we could hit that same race with an idle executor, right?	16:10
*** jpena is now known as jpena\|off		16:10
clarkb	fungi: yes, but far less likely since even an idle executor has a bit more to shutdown	16:11
clarkb	fwiw I think frickler is correct that docker-compose ps is buggy. `docker ps -q --filter status=running` does not report exited containers but docker-compose command does	16:12
clarkb	if I flip it to status=exited then I see the containers indicating the filter is working on the docker side	16:12
clarkb	The issue with corvus' suggestion that I need to think about is whether or not we can safely failed_when false on the docker wait command. Since having no running containers would cause that to error	16:13
clarkb	maybe we need both things. Skip when no running containers, otherwise don't worry about errors too much	16:13
opendevreview	Clark Boylan proposed opendev/system-config master: Fixup zuul merger and executor graceful shutdowns https://review.opendev.org/c/opendev/system-config/+/857209	16:26
clarkb	That tries to check things rather than just doing a failed_when false	16:26
clarkb	allows us to do a wait when there are containers listed (and waiting on an exited container is a noop it just returns)	16:26
*** dviroel\|lunch is now known as dviroel		16:52
clarkb	fungi: in the mm3 change were you also going to update the prod inventory file for opendev and then comment out the other bits https://review.opendev.org/c/opendev/system-config/+/851248/74/inventory/service/host_vars/lists01.opendev.org.yaml ? Then we can stack changes on top of that when we reach that point to uncomment each site as we want to deploy it	17:08
fungi	oh, sure! i got tunnel vision on the test config	17:15
fungi	thanks for catching that	17:15
clarkb	the python image update change passes now and shows the new libc6 installation in the logs https://zuul.opendev.org/t/openstack/build/5a4589af96b04dc181a20f8a884de568/log/job-output.txt#1194	18:16
clarkb	I think if we land that we can drop the backport workaround in the zuul images	18:16
clarkb	and if I can get reviews for https://review.opendev.org/c/opendev/system-config/+/857209 I can manually rerun that playbook when we're ready to update the zuul nodesets support	18:25
clarkb	fungi: I guess at this point I can delete the older of the two mm3 autoholds.	18:27
clarkb	fungi: and we still need to test the pipermail redirect? Have you had a chance to look at that yet?	18:27
fungi	oh, no i haven't checked out the redirect yet	18:29
clarkb	ok we can do that on the newer of the two holds. Any objection to deleting the older of the two now?	18:29
fungi	no objection	18:31
fungi	we can set another autohold once i revise the change to add the missing bits added to the production config	18:32
clarkb	ok deleting now	18:32
clarkb	and ya thats probably a good idea to retest and ensure the migration is happier with those lists pre existing	18:33
fungi	yeah, now that i have it more scripted, it's easier to test that in bbulk	18:33
fungi	bulk	18:33
clarkb	I've just updated the meeting agenda wiki page. I've tried to add changes that are in flight and similar bits to the various topics. Please feel free to add more topics or info to existing topics	18:38
clarkb	hrm the system-config-run-zuul job has timed out twice in a row	19:39
fungi	odd	19:39
clarkb	They both ran on ovh gra1	19:40
clarkb	and it seems like we may be seeing the high cost per ansible task here at about 3 seconds	19:41
fungi	okay, maybe not so odd then if performance was similar for both	19:41
clarkb	https://zuul.opendev.org/t/openstack/build/5dc476ff4e614e2fadefc7a9ff2255de/log/job-output.txt#1841-6175	19:43
clarkb	thats a good chunk of time just adding known hosts	19:44
clarkb	about 13 minutes	19:44
clarkb	another minute here https://zuul.opendev.org/t/openstack/build/5dc476ff4e614e2fadefc7a9ff2255de/log/job-output.txt#6295-6373	19:45
clarkb	oh my first example is for two ansible loops not one. Each individually takes about 6.5 minutes	19:48
clarkb	wow ok so we do this twice and the second pass makes the first one redundant. According to the change log we do it redundantly because some test checks the first one	19:50
clarkb	This is an easy improvement by removing the first and fixing the test if I can track it down	19:51
opendevreview	Clark Boylan proposed zuul/zuul-jobs master: Remove redundant ssh known hosts prep https://review.opendev.org/c/zuul/zuul-jobs/+/857228	19:58
clarkb	That would save 6.5 ish minutes	19:58
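As a back-of-the-envelope check on the figures quoted above (about 3 seconds per Ansible task, about 6.5 minutes per known-hosts loop), the following Python sketch estimates how many loop iterations that implies. The iteration count is inferred from those two numbers and is not stated anywhere in the log; the function name is hypothetical.

```python
def implied_iterations(loop_minutes: float, seconds_per_task: float) -> int:
    """Estimate how many loop iterations fit in the observed loop runtime."""
    return round(loop_minutes * 60 / seconds_per_task)


# Figures taken from the discussion: ~6.5 minutes per loop, ~3 seconds per task.
print(implied_iterations(6.5, 3.0))
```

If the per-task cost is roughly fixed, dropping one of the two loops removes all of its iterations at once, which is where the estimated saving comes from.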
clarkb	another thing I'm noticing from those logs is that it almost looks like things run sequentially when they can run in parallel	20:03
clarkb	https://zuul.opendev.org/t/openstack/build/5dc476ff4e614e2fadefc7a9ff2255de/log/job-output.txt#7038-7044 is output by https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/run-base.yaml#L46-L52	20:04
clarkb	it doesn't look like we set serial: 1 on that playbook	20:06
clarkb	oh! could it be that we run with -f 5? so the first 5 happen quickly and then we have a straggler?	20:06
clarkb	yes I think that may be what is happening	20:07
clarkb	jobs with more nodes than ~5 will incur wall clock penalties for things that would otherwise happen in parallel	20:08
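The fork-limit effect described above can be illustrated with a simple model: if at most `forks` hosts are processed at once and each host takes about the same time, a task needs ceil(hosts / forks) waves. This is a rough sketch under those assumptions, not a description of Ansible's scheduler; the 3-second per-host figure is borrowed from the earlier discussion and the host counts are made up for illustration.

```python
import math


def task_wall_seconds(hosts: int, forks: int, seconds_per_host: float) -> float:
    """Estimate wall-clock time for one task run across `hosts` hosts."""
    # With at most `forks` hosts in flight, the task needs this many waves.
    waves = math.ceil(hosts / forks)
    return waves * seconds_per_host


for hosts in (5, 6, 10, 11):
    print(hosts, task_wall_seconds(hosts, forks=5, seconds_per_host=3.0))
```

In this model a single straggler beyond a multiple of `forks` costs a full extra wave, which matches the "first 5 happen quickly and then we have a straggler" observation.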
clarkb	https://zuul.opendev.org/t/openstack/build/5dc476ff4e614e2fadefc7a9ff2255de/log/job-output.txt#7122-7162 writing out our zuul job vars files is also not quick. I think the templating is particularly slow? I may work on a change that moves non template files to copies instead of templates to see if that is any better and keep templating only when needed	20:15
corvus	clarkb: is the zuul rolling restart stuck again?	20:16
corvus	i see dev10 and dev18 components	20:16
clarkb	corvus: it crashed, there is discussion about it in scrollback but the tldr is in https://review.opendev.org/c/opendev/system-config/+/857209	20:17
corvus	clarkb: is fix to merge that and re-run?	20:17
clarkb	and now I'm sorting out why the job in that change is consistently timing out which led to https://review.opendev.org/c/zuul/zuul-jobs/+/857228	20:17
clarkb	corvus: yes	20:17
corvus	clarkb: i thought we ran our service playbooks with more than 5 forks, so if we're doing 5 in the test job, seems like we should increase that	20:23
clarkb	corvus: is that controllable from zuul?	20:25
corvus	i thought we were talking about the nested ansible	20:26
clarkb	I think it is affecting both	20:26
clarkb	https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/run-base.yaml#L46-L52 that is run by zuul aiui.	20:27
clarkb	Ah and later in that playbook it sets -f 50 on a few playbooks. The others should only run against a single host	20:27
*** dviroel is now known as dviroel\|afk		20:28
corvus	so it should only be the in-zuul part, like the vars templating you were looking at. i don't think that's controllable in zuul.	20:30
corvus	s/should only be/should only be limited by/	20:31
clarkb	I think halving the runtime cost of setting up ssh known hosts via 857228 will make a quick big impact. And ya working on the shift of templating to copies if templating isn't needed to see if that helps too	20:32
corvus	clarkb: i'd be really surprised if the templating takes longer than copying	20:34
corvus	i think we're just looking at the iterating cost	20:34
clarkb	corvus: I thought that initially too, but the normal iteration cost seems to be about 2.5-3 seconds but the templates all take about 6 seconds	20:34
clarkb	however, that could just be chance. I figure putting the change together and checking isn't too bad	20:35
clarkb	if it is about 3 seconds quicker we'll save noticeable time with the number of files being written	20:36
corvus	clarkb: i can see no discernable difference in a local test	20:37
clarkb	corvus: https://zuul.opendev.org/t/openstack/build/5dc476ff4e614e2fadefc7a9ff2255de/log/job-output.txt#7122-7162 shows the 6 second ish cost per template and if you look at the tasks above that those that don't template are closer to 3	20:39
clarkb	but that is only one data point	20:39
opendevreview	Jeremy Stanley proposed opendev/system-config master: Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248	20:40
corvus	sure but there are other differences at play here? restarting the host loop... different tasks....	20:40
corvus	i guess i'm saying two things: 1) it seems unlikely and probably not the lowest hanging fruit; 2) having those be templated is really valuable and if we're going to change that, i think we should be really sure it's making a difference. i don't want to start propagating a "copy is faster than template" meme without really compelling evidence.	20:41
corvus	i mean, i just proved locally that template is faster than copy... so.. :)	20:42
clarkb	yes, and the way to collect that evidence is to write a change and have zuul run it in the CI system?	20:42
clarkb	I'm not saying we should merge any such change right now. Just that I think it is worth checking	20:43
corvus	clarkb: testing it that way is going to be super tricky -- you're introducing the load of the executors and the remote cloud chosen to run the job and the noisy neighbors in that cloud all as variables	20:43
corvus	i think a reliable test of that needs to be two runs in the exact same conditions	20:44
opendevreview	Clark Boylan proposed opendev/system-config master: WIP Check if file copies are quicker than templating https://review.opendev.org/c/opendev/system-config/+/857232	20:44
corvus	so right now, i'm giving my local test of a playbook far more weight than i would give a change that just s/template/copy/ in our job	20:44
clarkb	corvus: your local run excludes any networking and probably has the advantage of nvme storage and so on though. I can imagine a situation where your local run isn't representative of what happens in the CI system.	20:45
corvus	i ran it over the network	20:45
corvus	and of course it's not representative	20:46
clarkb	that said I agree it is a long shot, but I mean ansible tasks should never take 3 seconds anyway. So I won't be surprised at all if the templating system has some weirdness that makes it take longer too	20:46
clarkb	the change to remove unneeded tasks from the multi node known hosts role is definitely the one we should focus on right now I think	20:46
corvus	the point is to determine if template is faster than copy, and i have a high degree of confidence it's not	20:46
corvus	er, strike that, reverse it.	20:46
corvus	clarkb: yeah, that one is already approved	20:47
clarkb	oh cool	20:47
corvus	i was really just trying to save you the trouble of writing 857232	20:47
corvus	which is why i went to the trouble of doing the local tests	20:47
corvus	but :(	20:47
clarkb	I don't see it as trouble right now. Ansible is slow and exhibiting some interesting behaviors and I think examining them is useful and interesting	20:48
corvus	sure, but that test isn't going to tell us anything	20:48
corvus	i agree improvements and understanding would be great	20:49
corvus	https://etherpad.opendev.org/p/r0rl8o1XZ2SI1p1bugF0	20:49
corvus	clarkb: ^ that's the playbook i ran with the resulting times	20:50
corvus	quite likely they're within the margin of error	20:50
clarkb	in particular I think if we can understand why an inifile task at https://zuul.opendev.org/t/openstack/build/5dc476ff4e614e2fadefc7a9ff2255de/log/job-output.txt#7113 takes half the time as a template task at https://zuul.opendev.org/t/openstack/build/5dc476ff4e614e2fadefc7a9ff2255de/log/job-output.txt#7122 we may be able to improve our use of ansible in an impactful manner for many	20:50
clarkb	jobs	20:50
clarkb	Both are writing to the same filesystem on the same host (the fake bridge). It is possible we got lucky and noisy neighbors just happened to show up in between those two tasks. The inifile module may have a particularly efficient implementation. etc	20:52
opendevreview	Merged zuul/zuul-jobs master: Remove redundant ssh known hosts prep https://review.opendev.org/c/zuul/zuul-jobs/+/857228	20:55
clarkb	The ansible async option is something we might want to look at on these loops too. But I seem to recall people having difficulty using that option	20:55
clarkb	corvus: the run against 857232 does seem to show inifile is still half the runtime of template and copy is about the same as template. The consistent speedup between inifile and the rest makes me think we aren't getting lucky with noisy neighbors. Maybe we need to put everything in an inifile	21:00
corvus	clarkb: haha, i'm having a "cant tell if serious" moment. :)	21:01
clarkb	not really serious, but it does show that ansible can be faster at similar tasks than it is using the typical approach. I'm wondering if inifile avoids copying any files over the wire and does all of its changes on the remote node which might explain it	21:02
corvus	yeah, it did occur to me it might have some radically different approach like that; i haven't opened up the modules to see	21:03
corvus	(my first thought was "haha ini files for everything!" then "wait could that work?" then "well, yes, but it would be silly, right?" :)	21:03
corvus	i do wish there were a way to do this without loops	21:04
clarkb	++ I was thinking that with the known hosts thing too. Like giving the known hosts module a list of keys to set	21:06
clarkb	A lot of the ansible modules seem to be written to accept a singleton input with the assumption you'll use loops to do multiples. Problem is loops are super expensive	21:07
corvus	i wonder if there's something about our callbacks or otherwise related to our env that slows it down. since my local test is like 0.5s for a task	21:09
clarkb	https://github.com/ansible-collections/community.general/blob/main/plugins/modules/files/ini_file.py I do think it is operating on the remote exclusively without doing file copies	21:09
corvus	these are really small files though, so while it is a difference, i wouldn't expect transferring that data to take an extra 3 seconds... how does it transfer it anyway? in the json blob, or does it open an ssh channel?	21:10
clarkb	parsing the copy module is a lot more involved :) trying to sort out where it gets the data	21:14
clarkb	it looks like the copy module receives source paths and not content unless you use the content parameter. But I'm not immediately seeing how it converts a src path on the ansible control side to the remote side file copy	21:18
clarkb	ah I think the action module portion of copy does the transfer and the regular ansible module code can ignore it	21:23
clarkb	yup the action module creates a new tmp_src value that it feeds into the ansible module when it calls it	21:25
clarkb	corvus: there is what appears to be an undocumented "raw" argument to the copy module that seems to write the file straight to its destination rather than doing a copy to the remote node in a tmp location and then a copy from that tmp location to the final location	21:29
opendevreview	Clark Boylan proposed opendev/system-config master: WIP Check if file copies are quicker than templating https://review.opendev.org/c/opendev/system-config/+/857232	21:32
clarkb	I really hope using raw ^ isn't faster	21:32
clarkb	corvus: looking at it from the callback angle the command stuff doesn't apply and the console stream runs in another process. That would mean zuul_json would have to be at fault?	21:40
clarkb	(assuming it is something to do with callbacks)	21:40
clarkb	the task start callback handler is pretty straightforward. It accesses a few task attributes and gets a time stamp	21:43
clarkb	I have a hard time seeing how that would create an impact unless processing the callback at all was problematic	21:43
corvus	yeah, nothing is immediately occurring to me either. also, these are small files.	21:44
clarkb	my attempt at using raw failed with permissions errors. I suspect not raw figures those out automatically	21:45
clarkb	but I think grepping the error message shows that it is using ssh to transfer the file	21:46
corvus	so maybe we're looking at the overhead for establishing a new ssh channel?	21:50
clarkb	ya, it seems to be hooking up ssh to dd (and the dd is what ultimately failed)	21:51
ianw	frickler: that error sounds exactly like -> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1973839	21:52
ianw	https://review.opendev.org/c/openstack/project-config/+/842654 was the workaround for that	21:54
ianw	i am guessing that booting cirros with nested virt hits the same problem in the cirros kernel?	21:55
opendevreview	Clark Boylan proposed opendev/system-config master: WIP Check if file copies are quicker than templating https://review.opendev.org/c/opendev/system-config/+/857232	21:57
clarkb	corvus: ^ that does appear to save about a minute of wall time compared to the original in the currently running test for it	22:07
clarkb	corvus: the impact on the slower ovh gra1 nodes is bigger too. Basically switching to synchronize means each synchronize task is about the same as a single template or copy	22:08
clarkb	in the worst case I think we're saving about 3x the case above. So about 3 minutes total saved? I'm not sure that is worthwhile but if we can chip away a minute here and a minute there by writing more efficient ansible the aggregate should be pretty decent	22:14
clarkb	ianw: ^ related I think we should look into opportunities to change how the log files are encrypted	22:15
corvus	clarkb: tbh, the extra minute is worth it to me in order to make all of that more comprehensible. i hate that it's slow, but i don't like the idea that we would be adding one more level to the flow chart of "how did this variable get into the final job". i like that your change is conservative in that it syncs all the files over meaning that it should be impossible to end up with a file missing. but it does mean that if we want to template,	22:15
corvus	we have to remember to add it to the template list (otherwise, we'll end up with {string} values). all said, i think i can deal with it, so if i'm in the minority, i won't object.	22:15
clarkb	ianw: thats another place where we loop and maybe we don't need to. Perhaps we can have a shell task that loops over them all and does the encryption instead?	22:15
corvus	clarkb: (fwiw, i think that is the best possible change given the position ansible has put us in, so i mean, on that basis: nicely done :)	22:17
clarkb	ya I'll work on cleaning the change up once testing shows its generally happy. Then we can decide if we want to merge it or not	22:18
corvus	clarkb: (i'm mostly just coming from "this is the most complicated and round-about part of the whole nested ansible process and i barely understand the original and i wrote it")	22:18
corvus	i'm always forgetting to add the fake template var files, so i know i'm going to forget to add them to the template list	22:19
clarkb	ya I forget often myself. Related the mm3 change adds host_vars just for the list server vars because I couldn't figure out how to do bits that needed to be raw as raw when we needed nested raw tags https://review.opendev.org/c/opendev/system-config/+/851248/75/inventory/service/host_vars/lists01.opendev.org.yaml	22:21
corvus	but hey, maybe they will fail in a way that suggests the obvious fix is to add them to the template list	22:21
clarkb	in particular it seems the parser ansible uses doesn't do proper push down and its doing a naive matching so the first end matches the outer rather than inner raw handling	22:21
clarkb	I gave up and just decided to copy the file as is	22:21
clarkb	lines 50-54 are the problematic ones there	22:22
opendevreview	Merged opendev/system-config master: Fixup zuul merger and executor graceful shutdowns https://review.opendev.org/c/opendev/system-config/+/857209	22:24
clarkb	I'll work on running ^ in screen on bridge next	22:24
clarkb	and it is running now	22:30
clarkb	I guess I should've checked if all the changes on the zuul side we wanted in are in first. But its late enough in the day that it should be done sometime tomorrow and we can run it again :)	22:30
corvus	clarkb: like what changes?	22:32
corvus	you mean the speedup, or any changes to get in for the restart?	22:33
corvus	(if the former, i think your zuul-jobs change landed; if the latter -- i think we had zuul where we wanted it for the restart for 6.4.0 on friday, so getting it unstuck as close to that as possible would be great)	22:34
opendevreview	Clark Boylan proposed opendev/system-config master: Use synchronize to copy test host/group vars https://review.opendev.org/c/opendev/system-config/+/857232	22:35
clarkb	corvus: the latter. Ok I wasn't sure if the nodeset changes had landed yet	22:36
clarkb	but those are also less important for 6.4.0 I think	22:36
clarkb	corvus: 857232 is cleaned up and in a reviewable state if you want to add your thoughts to it.	22:36
corvus	yeah, my preference would be to defer those (and anything else) until we complete the weekend restart so that we can release current master as 6.4.0.	22:37
clarkb	works for me. That restart is in progress now	22:37
clarkb	it is logging to the normal file on bridge if you want to follow along in more detail than the components page	22:37
* clarkb adds ansible optimization thoughts to the meeting agenda		22:38
corvus	clarkb: done, thanks!	22:41
*** dasm is now known as dasm\|off		22:59
ianw	clarkb: yeah, it's quite loop-y. i guess it's nice to have the tasks split up in console/ara views, but equally a shell script gets the same thing done	23:08
clarkb	ianw: ya the problem is 6 seconds for each task adds up quickly when you do more than 10 things in a loop	23:13
clarkb	I agree ansible should be better here, but dealing with what we've got seems pragmatic if we continue to hit job timeouts	23:13
opendevreview	Clark Boylan proposed opendev/system-config master: WIP Try ansible pipelining in our system-config-run jobs https://review.opendev.org/c/opendev/system-config/+/857239	23:21
clarkb	after pushing that I've realized we already set that in the ansible config so that's a noop	23:23
ianw	it looks like the translate mysql dump backup is failing due to https://bugs.mysql.com/bug.php?id=100219 which is a breaking security change. one option is to add --no-tablespaces to the dump, or i guess adjust the privileges of the dump user	23:30
ianw	the db's don't have a root user by default, you have to add one and that then warns the db will become unsupported. so i can see a way to GRANT PROCESS to the zanata user. i'll just work around it in puppet	23:35
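Of the two options mentioned above, the `--no-tablespaces` route amounts to adding one flag to the dump command. A minimal Python sketch of assembling such a command line; the function name, the database name, and the user name are hypothetical placeholders, and this is not the puppet change proposed in 857241:

```python
import shlex


def mysqldump_command(database: str, user: str, skip_tablespaces: bool = True) -> list[str]:
    """Build a mysqldump argv list (illustrative only, never executed here)."""
    argv = ["mysqldump", f"--user={user}"]
    if skip_tablespaces:
        # Skips the tablespace dump that triggers the PROCESS privilege error.
        argv.append("--no-tablespaces")
    argv.append(database)
    return argv


# "zanata" is used for both placeholders purely for illustration.
print(shlex.join(mysqldump_command("zanata", "zanata")))
```

The alternative is to leave the command alone and grant the dump user the PROCESS privilege instead.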
opendevreview	Ian Wienand proposed opendev/system-config master: translate: fix dump with MySQL 5.7 https://review.opendev.org/c/opendev/system-config/+/857241	23:39
clarkb	the zuul reboot playbook has moved on to ze02. Thats a good indication I haven't broken anything horribly with that change	23:41
