fungi | that's what it looked like a month ago | 00:02 |
---|---|---|
ianw | oohhh, this must have run before https://opendev.org/openstack/devstack/commit/07be5574726ac71cae7707677258a5d711411725 | 00:02 |
ianw | merged 7 hours ago? | 00:02 |
ianw | that has to be it | 00:03 |
fungi | https://review.opendev.org/712609 is what changed it to not that | 00:03 |
ianw | yes it merged at 16:16 and the job i'm looking at ran at 13 something | 00:04 |
fungi | that makes sense, in that case | 00:04 |
ianw | so what have we reverted the suse images to? was the old one still around? | 00:04 |
fungi | ianw: it was only sort of around | 00:04 |
fungi | basically vexxhost was failing to delete a copy so mordred managed to download it back from vexxhost and he and corvus reintegrated it into nodepool | 00:05 |
ianw | well, i feel like it would work now anyway | 00:05 |
fungi | seems likely, yes | 00:07 |
fungi | but in case it doesn't, it would be good to have a rollback solution which doesn't take hours to enact | 00:08 |
fungi | granted a bunch of that time was spent in confusion because nodepool doesn't correctly reflect deleting state for the remote copy (a fix for that has since been proposed) | 00:11 |
fungi | but we basically lucked out that there was still a copy stuck deleting in vexxhost | 00:12 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: nodepool: use job inheritance https://review.opendev.org/713158 | 00:46 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Add ubuntu-bionic-plain to more regions https://review.opendev.org/720316 | 00:48 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Add ubuntu-bionic-plain to all regions https://review.opendev.org/720316 | 00:52 |
*** prometheanfire has quit IRC | 01:00 | |
openstackgerrit | Ian Wienand proposed openstack/project-config master: nodepool: Add more plain images https://review.opendev.org/720318 | 01:07 |
*** prometheanfire has joined #opendev | 01:07 | |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Add ubuntu-bionic-plain to all regions https://review.opendev.org/720316 | 02:31 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: nodepool: Add more plain images https://review.opendev.org/720318 | 02:31 |
prometheanfire | ianw: mind taking a look at my glean review? https://review.opendev.org/717339 | 03:18 |
prometheanfire | ianw: it's probably needed for the dib change | 03:18 |
prometheanfire | which means a release :( | 03:18 |
ianw | thanks, i think that seems sane | 03:23 |
prometheanfire | glad it does to someone :D | 03:25 |
ianw | couple of nits inline | 03:26 |
prometheanfire | cool | 03:37 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Remove ansible_user_dir https://review.opendev.org/720336 | 03:38 |
mordred | ianw: if you have a sec ... ^^ - been fighting the long tail on the new trigger-things-with-in-tree | 03:39 |
ianw | mordred: heh, i am familiar with long tail on changes!!!! | 03:39 |
mordred | it's the best tail isn't it :) | 03:39 |
mordred | ianw: most of the stuff is well tested - but the bits where tests and prod are different are where we keep failing. go figure right? | 03:40 |
prometheanfire | ianw: for some reason, even though the exit code is 0, if I use == 0 instead of true it doesn't work, I'll try again (tox runs should catch it) | 03:40 |
prometheanfire | ianw: this fails tox :| https://gist.github.com/77295626417800dedb6971e3188ae7a5 | 03:49 |
prometheanfire | I do think that it should check against numbers though | 03:49 |
ianw | what's the failure? | 03:54 |
prometheanfire | DEBUG [glean] resolved in use, writing to /etc/systemd/resolved.conf | 04:03 |
prometheanfire | ianw: it's confusing to me, it seems like the networkd codepath in the test_glean.py file isn't being hit | 04:04 |
prometheanfire | specifically if distro.lower() is 'networkd': | 04:05 |
*** osmanlicilegi has joined #opendev | 04:14 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: Add centos aarch64 tests https://review.opendev.org/720339 | 04:16 |
prometheanfire | ianw: you mind applying the patch and running tox? I feel like I'm doing something wrong mock wise | 04:21 |
ianw | prometheanfire: tox -e py3 works for me? | 04:24 |
prometheanfire | ianw: with the patch (gist) I linked you? | 04:25 |
prometheanfire | ianw: those should be returning 0/3 and checking for it, but that doesn't seem to be working, for some reason (for instance) the opensuse test has resolved_enabled == 0 | 04:27 |
ianw | hrm, just a sec | 04:28 |
*** ykarel|away is now known as ykarel | 04:28 | |
openstackgerrit | Merged openstack/diskimage-builder master: Do not try to use MBR on AArch64 https://review.opendev.org/719805 | 04:28 |
prometheanfire | changing the is not to 0 to be is not '0' (and is '0') seems to help | 04:29 |
prometheanfire | now I have the opposite fail, but closer, I think | 04:29 |
ianw | why don't you have just one function as the side-effect of os.system and switch in there? | 04:33 |
prometheanfire | I'd have to pass the distro to the side_effect as well, at least I think | 04:33 |
prometheanfire | like I said, my python/mock isn't the best | 04:33 |
ianw | you can do that, just pass it with the functools partial | 04:37 |
prometheanfire | do you think it'll help here? or is that just a style fix? | 04:38 |
prometheanfire | because I think it's just style | 04:38 |
* prometheanfire has spent hours on this true/false/string/int stuff at this point | 04:38 | |
openstackgerrit | Merged opendev/system-config master: Remove ansible_user_dir https://review.opendev.org/720336 | 04:44 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: nb03: use linaro-us mirror https://review.opendev.org/720342 | 04:51 |
ianw | sorry i've just got about 3 other things i'm monitoring right now | 04:51 |
prometheanfire | understood, still banging my head against it | 04:56 |
prometheanfire | something is going to give and it's not my head | 04:56 |
prometheanfire | ianw: basically, if you can figure out a better way to mock that system call (which returns ints not bool it seems) then I'm all for it, but for some reason the mocking is not picking up what distro the test is running | 04:58 |
prometheanfire | added a print statement that only gets printed if we are networkd, then, if not returns 3 for that os.system call. in cmd.py I log the output of that os.system call | 05:00 |
prometheanfire | it's always 0 | 05:00 |
prometheanfire | wtf | 05:00 |
prometheanfire | freaking is vs == | 05:01 |
openstackgerrit | Matthew Thode proposed opendev/glean master: write one resolv config https://review.opendev.org/717339 | 05:04 |
prometheanfire | ianw: computers suck, they do exactly what we tell them instead of figuring out the right thing | 05:04 |
ianw | if distro.lower() is 'networkd' won't do what you'd think ... that wants to be == | 05:12 |
prometheanfire | is one of those still around? | 05:16 |
prometheanfire | oh, I should remove my tox change | 05:17 |
openstackgerrit | Ian Wienand proposed opendev/glean master: [dnm] update of I644e0b50cfb7bb00a108160b99c0c1359d6a9dd4 https://review.opendev.org/720348 | 05:17 |
ianw | prometheanfire: ^ something like that i think | 05:18 |
openstackgerrit | Matthew Thode proposed opendev/glean master: write one resolv config https://review.opendev.org/717339 | 05:18 |
prometheanfire | ianw: I'm not sure what you changed? my review doesn't use 'is' anymore | 05:18 |
prometheanfire | ah, we both solved it separately :D | 05:19 |
ianw | just make one os_system_side_effect function | 05:19 |
prometheanfire | I did | 05:20 |
prometheanfire | ianw: see my last two changes | 05:20 |
prometheanfire | :D | 05:20 |
ianw | ok | 05:22 |
prometheanfire | my function is slightly different, but overall does the same thing | 05:23 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Use TOX_CONSTRAINTS_FILE in release script https://review.opendev.org/720265 | 05:36 |
AJaeger | ianw: updated as suggested ^ | 05:37 |
ianw | AJaeger: perhaps a __ typo there? | 05:37 |
*** ysandeep is now known as ysandeep|brb | 05:38 | |
AJaeger | indeed ;( | 05:41 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Use TOX_CONSTRAINTS_FILE in release script https://review.opendev.org/720265 | 05:41 |
AJaeger | thx | 05:46 |
AJaeger | ianw: is https://review.opendev.org/713158 safe to merge? Then I'll review later... | 05:46 |
ianw | umm, how about i restart the builders so it is, we have to do it sometime. nb04 is ok | 05:48 |
openstackgerrit | Merged openstack/project-config master: nb03: use linaro-us mirror https://review.opendev.org/720342 | 05:49 |
AJaeger | infra-root, in https://review.opendev.org/720342 the promote job failed - infra-prod-service-nodepool | 05:52 |
AJaeger | guess mordred needs to fix the failure above first ^ | 05:53 |
ianw | Host key verification failed. ... weird | 05:53 |
*** ralonsoh has joined #opendev | 05:56 | |
openstackgerrit | Merged openstack/project-config master: Use TOX_CONSTRAINTS_FILE in release script https://review.opendev.org/720265 | 05:58 |
*** Romik has joined #opendev | 05:58 | |
ianw | #status log restarted all nodepool builders to pickup https://review.opendev.org/#/c/713157/ | 05:59 |
openstackstatus | ianw: finished logging | 05:59 |
ianw | (well, not nb04 because that's already got it) | 05:59 |
prometheanfire | good, passed tests | 06:10 |
*** Romik has quit IRC | 06:24 | |
*** ysandeep|brb is now known as ysandeep | 06:34 | |
*** DSpider has joined #opendev | 06:35 | |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: nodepool: use job inheritance https://review.opendev.org/713158 | 06:37 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Add ubuntu-bionic-plain to all regions https://review.opendev.org/720316 | 06:37 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: nodepool: Add more plain images https://review.opendev.org/720318 | 06:37 |
frickler | mordred: ianw: AJaeger: the job adds the local known host key for bridge.o.o, but then the role connects to zuul@localhost. not sure whether amending the role or just adding the key for localhost would be the right solution, though | 06:38 |
AJaeger | thanks, frickler | 06:39 |
*** drifterza has joined #opendev | 06:39 | |
AJaeger | frickler: care to review https://review.opendev.org/713158 , please? | 06:39 |
*** Romik has joined #opendev | 06:44 | |
frickler | AJaeger: uh, that's a big one, will have to put it on my list for later today | 06:44 |
AJaeger | frickler: yeah, took me a mug of tea ;) | 06:45 |
AJaeger | frickler: it's rather mechanical on the other hand | 06:45 |
*** dpawlik has joined #opendev | 06:49 | |
*** rpittau|afk is now known as rpittau | 07:34 | |
*** tosky has joined #opendev | 07:38 | |
*** ykarel is now known as ykarel|lunch | 07:43 | |
*** moppiner is now known as moppy | 07:45 | |
openstackgerrit | Roman Gorshunov proposed openstack/project-config master: Retire airship-in-a-bottle https://review.opendev.org/720160 | 08:03 |
*** Romik has quit IRC | 08:10 | |
*** ysandeep is now known as ysandeep|lunch | 08:53 | |
*** hrw has joined #opendev | 09:04 | |
hrw | morning | 09:04 |
hrw | ianw: thanks | 09:04 |
*** ykarel|lunch is now known as ykarel | 09:21 | |
openstackgerrit | Marcin Juszkiewicz proposed openstack/project-config master: Add CentOS 8 AArch64 nodes https://review.opendev.org/720167 | 09:22 |
ianw | thanks, lgtm, my preference would be to remove pip-and-virtualenv to not grow further dependencies we need to remove. happy for someone to merge, you should be able to view build logs on nb03.openstack.org | 09:26 |
openstackgerrit | Marcin Juszkiewicz proposed openstack/project-config master: Add CentOS 8 AArch64 nodes https://review.opendev.org/720167 | 09:30 |
hrw | thanks | 09:37 |
*** smcginnis has quit IRC | 09:52 | |
*** lpetrut has joined #opendev | 09:56 | |
frickler | mordred: the post-merge failure here also seems related to your bridge updates https://review.opendev.org/720245 | 10:09 |
*** elod_ has joined #opendev | 10:09 | |
*** elod_ has quit IRC | 10:09 | |
*** rpittau is now known as rpittau|bbl | 10:19 | |
*** ysandeep|lunch is now known as ysandeep | 10:22 | |
openstackgerrit | Merged openstack/project-config master: Add CentOS 8 AArch64 nodes https://review.opendev.org/720167 | 10:37 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: nodepool: use job inheritance https://review.opendev.org/713158 | 10:55 |
*** avass has quit IRC | 11:06 | |
*** ysandeep is now known as ysandeep|afk | 11:15 | |
*** drifterza has quit IRC | 11:22 | |
*** ysandeep|afk is now known as ysandeep | 11:47 | |
*** dpawlik has quit IRC | 11:50 | |
*** dpawlik has joined #opendev | 11:50 | |
*** rpittau|bbl is now known as rpittau | 12:07 | |
*** hashar has joined #opendev | 12:09 | |
*** dpawlik has quit IRC | 12:09 | |
*** smcginnis has joined #opendev | 12:09 | |
*** dpawlik has joined #opendev | 12:10 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: dhall-diff: add new job https://review.opendev.org/718694 | 12:44 |
mordred | frickler: I'm very confused why it's trying to push to localhost :( | 13:02 |
mordred | it should be trying to push to bridge ... which is what I'd expect ansible_host to be set to there | 13:06 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Set ansible_host explicitly https://review.opendev.org/720469 | 13:07 |
mordred | frickler, fungi : ^^ that's a bit of a stab in the dark | 13:08 |
*** ysandeep is now known as ysandeep|mtg | 13:14 | |
corvus | mordred: see comment on https://review.opendev.org/720469 | 13:25 |
mordred | corvus: damn. yup | 13:27 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Set ansible_host explicitly https://review.opendev.org/720469 | 13:27 |
mordred | corvus: that said - does that make _any_ sense to you? | 13:28 |
corvus | mordred: nope, was about to start looking for other ideas | 13:28 |
mordred | the role does delegate_to: localhost and uses ansible_host ... only thing I could think of is maybe add_host isn't setting ansible_host - but that is just bong | 13:29 |
mordred | corvus: I have confirmed the behavior | 13:31 |
corvus | mordred: well, it did the same thing with _port | 13:31 |
corvus | mordred: can we add it to add_host, like port? | 13:32 |
mordred | maybe - lemme check | 13:32 |
*** ykarel is now known as ykarel|afk | 13:32 | |
mordred | yes, that works | 13:32 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Set ansible_host explicitly https://review.opendev.org/720469 | 13:34 |
mordred | corvus: that works in my local testing | 13:34 |
corvus | mordred: +3 | 13:35 |
mordred | corvus: ok - so - I get the same behavior with the file in the ... oh! | 13:36 |
mordred | corvus: we explicitly set ansible_host in the zuul prepared inventory | 13:36 |
mordred | to the ip address | 13:36 |
mordred | if I make an inventory with just a host in it (no ansible_host set) - I get the same behavior as with add_host in the playbook | 13:36 |
corvus | ah, so we accidentally relied on that being set by zuul (which isn't crazy, it's pretty much a zuul role :) | 13:37 |
mordred | so - in general there seems like some weirdness related to add_host there - but in the zuul execution context it should work | 13:37 |
mordred | yeah | 13:37 |
corvus | i think this is probably the best solution | 13:37 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Meetpad: proxy through meetpad to etherpad.opendev.org https://review.opendev.org/720095 | 13:39 |
corvus | mordred: i think my next task should either be working on containerizing zk or nodepool-launcher -- are you doing either of those right now? | 13:41 |
mordred | corvus: I started looking at nodepool-launcher right before eod yesterday - it's my planned next task | 13:42 |
corvus | cool, i'll start on zk | 13:42 |
mordred | corvus: I'm excited about our new containerized zuul future | 13:42 |
corvus | ya | 13:42 |
*** kevinz has quit IRC | 13:51 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Remove puppet and cron mentions from docs https://review.opendev.org/718791 | 14:00 |
mordred | corvus, fungi: ^^ updated that with mentions of the DISABLE-ANSIBLE flag file | 14:00 |
openstackgerrit | Merged opendev/system-config master: Set ansible_host explicitly https://review.opendev.org/720469 | 14:01 |
corvus | oof, and rebased :( | 14:01 |
corvus | mordred: whenever possible, can you try to avoid rebasing changes like that? :) | 14:02 |
corvus | i know there's a lot of stuff in flight, but i'd really love to just review the delta there | 14:02 |
*** roman_g has quit IRC | 14:02 | |
corvus | mordred: i rebased it locally, mind if i push that up? | 14:04 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Remove puppet and cron mentions from docs https://review.opendev.org/718791 | 14:05 |
mordred | corvus: thanks | 14:08 |
mordred | corvus: woot! install-ansible worked | 14:09 |
corvus | mordred: install-ansible? er, do you mean the workspace sync stuff? | 14:09 |
mordred | yeah | 14:09 |
corvus | \ol | 14:09 |
* mordred is going to re-enqueue the project-config patch that failed overnight | 14:10 | |
mordred | corvus: multiple project stanzas are ok and get merged right? | 14:13 |
corvus | yep | 14:13 |
corvus | (we use that heavily in zuul-jobs/zuul-test.d) | 14:14 |
mordred | corvus: I'd like to split the system-config zuul.yaml into a .zuul.d dir organized by purpose | 14:14 |
mordred | so I was thinking putting the project defs for the set of jobs in the same file would be nice | 14:14 |
corvus | ++ yep that's the pattern in zuul-jobs | 14:14 |
mordred | cool | 14:14 |
mordred | infra-root: could we land https://review.opendev.org/#/c/711057/ and https://review.opendev.org/#/c/718788/ | 14:17 |
* mordred looking through things that are maybe falling through the cracs | 14:17 | |
mordred | cracks | 14:17 |
*** lpetrut has quit IRC | 14:34 | |
*** mlavalle has joined #opendev | 14:38 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Install kubectl via openshift client tools https://review.opendev.org/707412 | 14:42 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Remove snap cleanup tasks https://review.opendev.org/709293 | 14:42 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Install kubectl via openshift client tools https://review.opendev.org/707412 | 14:42 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Remove snap cleanup tasks https://review.opendev.org/709293 | 14:42 |
corvus | apparently we have an "install-zookeeper" role which we use to test the nodepool deployment; i think once this is finished, we should remove that in favor of having the gate stand up a 1-node zk cluster with this setup. | 14:42 |
mordred | corvus: ++ | 14:44 |
mordred | corvus: also - amusingly enough - the service-zuul patch is failing ... because it's trying to install a zuul user and there is already a zuul user, because it's what we use in zuul | 14:45 |
mordred | it is ... an unfortunate edge case | 14:45 |
*** ykarel|afk is now known as ykarel | 14:45 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers https://review.opendev.org/717620 | 14:47 |
mordred | corvus: for now I'm going to just add a failed_when: false to the user creation... I don't have a better idea of how to deal with it but maybe we'll think of something | 14:47 |
openstackgerrit | James E. Blair proposed opendev/system-config master: WIP: run ZK from containers https://review.opendev.org/720498 | 14:50 |
AJaeger | config-core, another victoria change for review, please: https://review.opendev.org/720257 | 14:51 |
corvus | actually, we're probably going to *have* to replace install-zookeeper with this, otherwise we're not going to be testing the tls stuff | 14:51 |
corvus | AJaeger: is that more of an #openstack-infra thing? | 14:52 |
AJaeger | corvus: I rather use #opendev nowadays - should we really split that? | 14:54 |
corvus | i think the idea was to make this more approachable to folks who are here for non-openstack projects | 14:55 |
AJaeger | but the goal was to abandon #openstack-infra, so my understanding is we wanted to use this channel primarily. #opendev is for everybody - not excluding openstack. That's at least how I understood it so far... | 14:57 |
mordred | I didn't think it was the goal to abandon #openstack-infra - but instead to split so that opendev is about the general service, and openstack-infra is about things in service of the openstack project more specifically | 14:58 |
corvus | AJaeger: we have different understandings then; mine was a venn-diagram of communities, and they overlap here, but other channels may still be useful for openstack/airship/zuul/etc specific topics. we should get more folks together to talk about this and come to consensus :) | 14:58 |
*** ysandeep|mtg is now known as ysandeep | 14:59 | |
mordred | yeah - it's a new concept and I think we're still figuring it out for sure :) | 14:59 |
AJaeger | ;) | 14:59 |
AJaeger | I'm fine to change, would be great to have a common understanding. | 15:00 |
fungi | we did talk about folding the openstack-infra ml into openstack-discuss, but i don't recall talking about doing something like that for the irc channel | 15:01 |
fungi | if we abandon the #openstack-infra irc channel, it might be more consistent to move openstack-oriented discussions to #openstack-dev or #openstack-qa | 15:02 |
mordred | yeah - although project-config like discussions might be weird still | 15:03 |
fungi | infrastructure-related discussions can certainly happen in here for any project hosted in opendev's infrastructure i think | 15:05 |
*** roman_g has joined #opendev | 15:05 | |
fungi | because we shouldn't need to be expected to hang out in everybody's irc channels | 15:05 |
fungi | but discussing release and job configuration which is specific to a particular project, even if it's hosted in one of our trusted config repos, may still be better in a project-specific channel | 15:06 |
AJaeger | so, airship/starlingx job configuration should happen in some airship/starlingx/... channel? Or #opendev - but openstack ones in #openstack-infra? | 15:07 |
fungi | it's a good question. the current comingling of project-specific configs in a central repository means picking a venue for discussion depends on finding somewhere everyone who needs to discuss that can be present | 15:09 |
AJaeger | we can play it by ear for now - and review in a few weeks. | 15:10 |
AJaeger | What I take away is that really openstack specific discussion can stay on #openstack-infra - for now ;) | 15:10 |
fungi | that's true, at a minimum | 15:10 |
fungi | i intend to continue sticking around in that channel anyway | 15:11 |
*** ysandeep is now known as ysandeep|away | 15:11 | |
AJaeger | reading http://lists.openstack.org/pipermail/openstack-discuss/2020-March/013380.html again - I agree with your comments, sorry, I somehow had that internalized differently. | 15:12 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Install kubectl via openshift client tools https://review.opendev.org/707412 | 15:12 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Remove snap cleanup tasks https://review.opendev.org/709293 | 15:12 |
frickler | mordred: corvus: comment/question on https://review.opendev.org/711057 about non-root-useability | 15:21 |
*** dpawlik has quit IRC | 15:26 | |
*** dpawlik has joined #opendev | 15:26 | |
AJaeger | frickler: is https://review.opendev.org/713158 now good? I made the suggested changes quickly | 15:28 |
frickler | AJaeger: I was hoping ianw could answer the question regarding nb03, but we can also do that in a followup, if you think we should merge now before new conflicts appear | 15:30 |
AJaeger | frickler: Ah, I see - ok, let ianw self-approve and followup | 15:32 |
*** ykarel is now known as ykarel|away | 15:35 | |
*** ttx has quit IRC | 15:38 | |
*** ttx has joined #opendev | 15:38 | |
openstackgerrit | James E. Blair proposed opendev/system-config master: WIP: run ZK from containers https://review.opendev.org/720498 | 15:40 |
corvus | mordred, fungi: can you review the commit message there ^ -- figure out what kind of a migration we want to do | 15:40 |
fungi | lookin' | 15:41 |
openstackgerrit | Merged opendev/system-config master: Get rid of all-clouds.yaml https://review.opendev.org/718788 | 15:41 |
mordred | corvus: I think that proposal seems fine | 15:42 |
corvus | k. part of me is itchin to do the 'roll out new servers under a new domain' thing, but this'll be faster, and that's not user-facing | 15:42 |
fungi | corvus: will moving data directories break the running daemons? | 15:43 |
corvus | fungi: yes, sorry i meant to suggest we do that during an outage | 15:43 |
corvus | like a 5-min outage | 15:43 |
fungi | ahh, okay, that wasn't indicated in the commit message. given that, sounds fine. we need to take down zuul and nodepool during that time too i guess? | 15:44 |
corvus | yep | 15:44 |
corvus | oh, you know, there's probably a way to do this as a rolling restart | 15:44 |
corvus | it *is* an ha cluster :) | 15:45 |
mordred | corvus: in nodepool-launcher we're currently installing an ssh private key - but it feels like a leftover - are we still using that for something? | 15:45 |
corvus | mordred: i don't think so | 15:45 |
mordred | k. I'm going to leave it out - and we can always add it back if needed | 15:45 |
corvus | sounds like a plan | 15:45 |
*** moppy has quit IRC | 15:49 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers https://review.opendev.org/717620 | 15:52 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run nodepool launchers with ansible and containers https://review.opendev.org/720527 | 15:52 |
mordred | corvus: somehow I think that's actually it for nodepool | 15:53 |
mordred | corvus: and looking at it - I kind of think we could rebase it to not be on top of the zuul one - and just shut down the launchers and run it and I think we'll be transitioned :) | 15:53 |
mordred | (there's not a lot going on with the launchers) | 15:53 |
corvus | mordred, fungi: a quick look at zk upgrade instructions makes me think we should be able to do a live rolling upgrade from our current to the new container-based system | 15:54 |
mordred | corvus: cool! | 15:55 |
mordred | corvus: that said - won't we need an outage to update to ssl? | 15:55 |
mordred | or is that online too? | 15:55 |
corvus | i'm thinking maybe what we should do is make a copy of the data files on all 3 servers (as a DR backup), then do the rolling upgrade. if it borks, shut everything down and try to restart the new servers on the old data files. and if that borks, well, we'll get nice new images. :) | 15:56 |
corvus | maybe we can do a data dump too.... | 15:56 |
mordred | corvus: seems good to me | 15:56 |
corvus | mordred: i'm not sure, i think there might be a way to rolling upgrade to tls | 15:56 |
corvus | either way, i'd like to do that as a second phase anyway | 15:57 |
corvus | looks like we can do zk-shell mirror | 15:58 |
corvus | so that's a good second-level data backup | 15:58 |
corvus | cool, i just made a data backup on nl01 | 16:01 |
openstackgerrit | James Page proposed openstack/project-config master: Add TrilioVault charms https://review.opendev.org/720534 | 16:01 |
corvus | it took ~40 seconds | 16:01 |
corvus | it's a json file | 16:01 |
fungi | looking into rolling maintenance across the cluster seems like a useful exercise anyway, agreed | 16:03 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Run ZK from containers https://review.opendev.org/720498 | 16:06 |
mordred | corvus: cool! | 16:09 |
AJaeger | do we want to run pypy on bindep - or time to drop pypy testing? | 16:14 |
* mordred does not care about pypy at all | 16:15 | |
*** knikolla has joined #opendev | 16:16 | |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Remove pypy job from bindep https://review.opendev.org/720543 | 16:17 |
AJaeger | If anybody cares, please -1 ;) ^ | 16:17 |
*** sshnaidm has joined #opendev | 16:17 | |
*** rpittau is now known as rpittau|afk | 16:18 | |
mordred | AJaeger: I'm excited about removing pypy jobs :) | 16:21 |
AJaeger | ;) | 16:22 |
AJaeger | the templates are still used in a few stable branches but master is now rid of it - with exception of jjb | 16:23 |
corvus | aww, i like pypy :) | 16:23 |
fungi | if they renamed it puppy, nobody would want to get rid of something so cute | 16:24 |
corvus | i mean, i agree that we're not really targeting it or putting effort into supporting it, and we shouldn't run it. but that doesn't make me happy | 16:24 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Install kubectl via openshift client tools https://review.opendev.org/707412 | 16:25 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Remove snap cleanup tasks https://review.opendev.org/709293 | 16:25 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers https://review.opendev.org/717620 | 16:25 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run nodepool launchers with ansible and containers https://review.opendev.org/720527 | 16:25 |
mordred | corvus: I have no problem with it - my excitement about removing them is that nobody put any energy in to supporting it, so the jobs have been wasted energy - I agree, I think it would have been cool if people had actually cared | 16:26 |
corvus | ya | 16:29 |
corvus | apparently we have a 'linter' that runs tests that check "yaml groups" | 16:36 |
corvus | it emitted the error "The group <puppet> does not contain host <zk01.openstack.org>" | 16:36 |
corvus | i'm like "yeah, that's right... why do you think it should?" | 16:36 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Run ZK from containers https://review.opendev.org/720498 | 16:38 |
corvus | fungi, mordred: ^ okay that is ready for review and expected to pass all jobs now | 16:39 |
corvus | i think we might be able to execute that today | 16:39 |
fungi | thanks! adding to the top of my pile | 16:39 |
corvus | you can look at the output of the run from the previous patchset -- the newest ps only fixes that linter error | 16:39 |
mordred | corvus: yeah - I think that's ultimately a test of the yamlgroup plugin - but it sure is annoying when we shift things :) | 16:40 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run nodepool launchers with ansible and containers https://review.opendev.org/720527 | 16:42 |
mordred | corvus: ^^ I rebased that off of the other stack - because it really doesn't depend on it | 16:43 |
mordred | corvus: I think we should be able to roll it out today too | 16:43 |
corvus | mordred: ya. we'll still need the zuul stuff for tls though | 16:43 |
mordred | yah | 16:43 |
mordred | but in the spirit of rolling out smaller changes as we can :) | 16:44 |
corvus | ++ | 16:44 |
clarkb | mordred: corvus so should we land https://review.opendev.org/#/c/719589/ to avoid needing to coordinate that change across more services? | 16:44 |
clarkb | fungi: also any progress on etherpad. I was going to use it to help record that nodepool debugging but it seems to now be unhappy with my browser :/ | 16:45 |
mordred | clarkb: yeah - I think it would be good to land the compose change before we roll out the new services | 16:45 |
clarkb | also do we need to treat the etherpad issues as more of a fire? | 16:45 |
clarkb | (they seem to be persisting) | 16:45 |
corvus | clarkb: what browser? | 16:46 |
mordred | clarkb: have we applied the new remediations? | 16:46 |
clarkb | corvus: FF | 16:46 |
corvus | what remediations? | 16:46 |
clarkb | mordred: I wasn't aware of any, do you have links? | 16:46 |
fungi | clarkb: we were going to try to troubleshoot from an affected client next | 16:46 |
mordred | I think we might need to run the etherpad playbook - if we landed any changes while the jobs were broken | 16:46 |
fungi | mordred: the only outstanding possible config adjustment indicated in that one seemingly related bug was to set a timeout on the proxy | 16:46 |
mordred | ah - but we haven't done that yet | 16:47 |
clarkb | corvus: fungi: I clicked new pad which didn't actually render the text in the button box, then it seemed to sit on loading until I closed the tab and switched to paste | 16:47 |
corvus | can we downgrade? | 16:47 |
fungi | clarkb: what ip address were you coming from? | 16:47 |
corvus | clarkb: i also see it hanging after clicking 'new pad' | 16:48 |
mordred | corvus: I don't know - I'm not sure what the db implications would be | 16:48 |
corvus | clarkb: eventually loaded for me after about a minute | 16:48 |
mordred | (since there isn't a schema or schema upgrades, I have no idea if 1.8 would write data that earlier can't read) | 16:48 |
fungi | eww, docker-compose logs include ansi escapes even if stdout isn't a tty | 16:49 |
clarkb | fungi: probably because those originate in the service | 16:49 |
clarkb | (so they aren't rewriting the logs for us) | 16:50 |
mordred | there is a --no-ansi option to docker-compose logs | 16:50 |
fungi | mordred: thanks | 16:50 |
corvus | or you can use 'docker logs' | 16:50 |
fungi | clarkb: looks like it attempted to create k4tmEkOycMixbBdNxbHc for you | 16:51 |
fungi | at 16:42:07 | 16:51 |
clarkb | mordred: looking at etherpad server we don't have the fixed apache logs yet so rerunning config management for that may be a good idea if nothing else | 16:51 |
mordred | clarkb: k. want me to run the playbook real quick? | 16:51 |
clarkb | fungi: that timestamp looks correct | 16:51 |
clarkb | mordred: I'll defer to others as I'm not sure what all else is happening, just noting we did fix that in config so should apply it at some point | 16:52 |
mordred | corvus, fungi : thoughts? | 16:52 |
clarkb | idea: etherpad-dev is upgraded we can compare to it maybe? | 16:53 |
corvus | mordred: running the pb sounds good | 16:53 |
corvus | clarkb: can you elaborate? | 16:53 |
fungi | clarkb: so interestingly, there was no error or traceback related to pad k4tmEkOycMixbBdNxbHc in the logs | 16:53 |
* mordred runs playbook | 16:54 | |
clarkb | corvus: we've got etherpad-dev running the newer etherpad code too, but it is not using upstream docker images, may be using a different apache, nodejs, mysql, etc | 16:54 |
fungi | just two info lines, one for creating the pad, one for the author leaving the pad | 16:54 |
clarkb | corvus: if we find that etherpad-dev is operating more happily that may help us narrow down where the problems are | 16:54 |
corvus | clarkb: right -- though we should make sure we consider "load" as a variable | 16:54 |
clarkb | corvus: thats fair. fwiw etherpad-dev new pad button loads properly and gives me a pad relatively quickly (~3 seconds or so?) | 16:55 |
fungi | i'm fetching an updated server-status since we know it's misbehaving currently | 16:55 |
fungi | that's also taking a while to return | 16:55 |
*** moppy has joined #opendev | 16:56 | |
fungi | i think apache itself may be having trouble | 16:56 |
fungi | `wget -qO- https://etherpad.opendev.org/server-status` locally on the server just hangs for me | 16:57 |
corvus | mordred: "mysql -u root -p" inside the mariadb container with the root password specified in the environment in the docker-compose file isn't working -- anything obvious you see i'm missing? | 16:57 |
fungi | [Thu Apr 16 16:57:45.176108 2020] [mpm_event:error] [pid 31892:tid 139699770100672] AH03490: scoreboard is full, not at MaxRequestWorkers.Increase ServerLimit. | 16:57 |
fungi | that's likely related | 16:58 |
fungi | started up around 16:54:54 after a graceful restart was triggered | 16:58 |
fungi | anybody object if i want to restart apache? i don't think we can get status info out of it in this state anyway | 16:58 |
corvus | fungi: coordinate with mordred | 16:59 |
corvus | he's running a playbook which is probably gracefully restarting apache | 16:59 |
fungi | oh | 16:59 |
mordred | go for it | 16:59 |
mordred | playbook is done | 16:59 |
mordred | corvus: looking | 16:59 |
fungi | well, possible your playbook is why i couldn't get server-status in that case | 16:59 |
clarkb | AH03490: scoreboard is full, not at MaxRequestWorkers.Increase ServerLimit. <- that implies to me that maybe we do need apache tuing | 16:59 |
clarkb | *tuning | 16:59 |
fungi | sorry, i was still trying to investigate and gather data, didn't realize folks were already changing things | 16:59 |
mordred | corvus: mysql -p$MYSQL_ROOT_PASSWORD works for me | 17:00 |
corvus | clarkb: we had apache tuning; did we lose it, or are you saying we need to re-tune? | 17:00 |
fungi | clarkb: yes, that's what i pasted a few minutes ago, but it only began at 16:54:54 | 17:00 |
fungi | which probably coincides with ansible | 17:00 |
corvus | mordred: weird that works for me too. i wonder why my copy/paste didn't work | 17:00 |
mordred | yeah. that's about when I ran ansible | 17:00 |
clarkb | corvus: maybe its a side effect of the graceful restart and wouldn't otherwise be an issue | 17:00 |
corvus | mordred: hahaha | 17:01 |
fungi | seems to have started immediately following ansible requesting a service reload | 17:01 |
corvus | mordred: a close examination of the password will explain the problem | 17:01 |
corvus | mordred: there is a character that should not be in a password | 17:01 |
corvus | (we should probably use a consistent "pwgen -s 16" or similar to make our passwords) | 17:01 |
fungi | anyway, i still can't fetch server-status, while i was able to do it this way yesterday with no problem | 17:01 |
mordred | corvus: "neast" | 17:01 |
corvus | fungi: are you going to restart apache? | 17:02 |
fungi | waiting for conversation to coalesce. we're okay with trying an apache restart next? | 17:02 |
corvus | fungi: mordred said go for it | 17:02 |
fungi | okay, doing that next | 17:02 |
fungi | apache has started back up now | 17:03 |
fungi | and /server-status returns now for me | 17:03 |
corvus | i'm able to load a pad but it's agonizingly slow | 17:04 |
corvus | the db is super responsive; it's almost completely idle, i don't see any contention | 17:05 |
mordred | yeah - db seems fine to me too | 17:07 |
fungi | the /server-status scorecard is much smaller today than yesterday | 17:08 |
fungi | er, scoreboard | 17:08 |
corvus | clarkb, mordred, fungi: i think we did lose our apache tuning | 17:09 |
corvus | https://opendev.org/opendev/puppet-etherpad_lite/src/branch/master/files/apache-connection-tuning | 17:10 |
corvus | i don't see anything like that on the new server | 17:10 |
mordred | agree. totally lost that. | 17:10 |
fungi | yeah, even yesterday, we only had 11 workers when i was checking | 17:11 |
mordred | corvus: you re-adding or want me to? | 17:11 |
corvus | mordred: you do it | 17:11 |
mordred | on it | 17:11 |
corvus | i'll look for anything else in the old module we might have missed | 17:11 |
corvus | mordred, clarkb, fungi: i also don't see this, but i don't know what it does: https://opendev.org/opendev/puppet-etherpad_lite/src/branch/master/files/pad.js | 17:12 |
clarkb | corvus: that should default open the chat window | 17:13 |
clarkb | we can probably work that in later once general performance things are happier | 17:13 |
corvus | ah, yep, that does appear to be a behavior change | 17:13 |
corvus | agreed | 17:13 |
fungi | doesn't sound too critical | 17:13 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add apache connection tuning back to apache https://review.opendev.org/720562 | 17:14 |
corvus | we do have a robots.txt, but it's returning 403 | 17:15 |
mordred | corvus: we're not bind mounting it | 17:15 |
fungi | RewriteRule ^/robots.txt$ /var/etherpad/robots.txt [L] | 17:16 |
fungi | yeah, i guess need a bindmount for /var/etherpad/robots.txt then? | 17:16 |
corvus | AH01630: client denied by server configuration: /var/etherpad/robots.txt | 17:16 |
corvus | no we want apache doing that | 17:16 |
corvus | we just don't have a correct apache config | 17:16 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Bind mount robots.txt https://review.opendev.org/720564 | 17:16 |
fungi | interesting, yeah i wonder why we're missing a directory allow | 17:17 |
mordred | oh - wait - that's dumb from me - sorry. | 17:18 |
mordred | we don't need to bind mount it - apache is running on the host | 17:18 |
fungi | on the old server we used to serve it from /srv/etherpad-lite/robots.txt | 17:18 |
mordred | corvus: we're missing an allow on that path aren't we? | 17:19 |
fungi | even on the old server i don't see any directory block allowing access to that | 17:19 |
mordred | well - on the old server that was set as docroot | 17:20 |
mordred | in the puppet module | 17:20 |
fungi | not in the vhost config though | 17:20 |
corvus | mordred: yeah, we need a Directory + Require all granted | 17:20 |
mordred | did puppet set something? | 17:20 |
fungi | there is no docroot in the old vhost either | 17:20 |
mordred | yeah. I agree | 17:20 |
fungi | maybe we broke it a while back | 17:21 |
corvus | mordred: if it was under /var/www i think it'd be ok | 17:21 |
fungi | anyway, yeah, i think we need to explicitly include an allow directive for at least that one file path | 17:21 |
corvus | can you do a file? | 17:21 |
corvus | otherwise, how about we move that to /var/etherpad/www/robots.txt then add a <Directory> for /var/etherpad/www | 17:22 |
corvus | so that we don't accidentally allow /var/etherpad/db/ | 17:22 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Grant access to robots.txt https://review.opendev.org/720564 | 17:22 |
mordred | corvus: good point - changing | 17:22 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Grant access to robots.txt https://review.opendev.org/720564 | 17:23 |
fungi | though if we just want that one file, https://httpd.apache.org/docs/current/mod/core.html#files | 17:23 |
corvus | ah cool, that'd work too | 17:24 |
fungi | i'm not seeing absolute path references for the files directive, but it can be nested in a directory | 17:25 |
fungi | so could make a directory block of /var/etherpad with a files block for robots.txt inside that and then grant access just to that | 17:25 |
corvus | well, mordred has the directory version done... except i left a comment on https://review.opendev.org/720564 | 17:25 |
fungi | but yeah, i think the directory solution is fine | 17:26 |
corvus | i think we only need "Require" now? | 17:26 |
fungi | Require all granted | 17:26 |
fungi | yep | 17:26 |
corvus | yeah, but we don't need order or allow | 17:26 |
fungi | unless we want to allow overrides or anything | 17:26 |
corvus | i think order+allow is 2.2 backwards compat | 17:26 |
fungi | right | 17:26 |
corvus | yeah https://cwiki.apache.org/confluence/display/HTTPD/ClientDeniedByServerConfiguration confirms | 17:27 |
fungi | just "Require all granted" is sufficient in 2.4+ | 17:27 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Grant access to robots.txt https://review.opendev.org/720564 | 17:28 |
corvus | cool +2 on both | 17:28 |
corvus | actually +3 on the first | 17:28 |
fungi | +3 on the second now | 17:29 |
corvus | okay, i think there's a good possibility this explains the issues, so we should re-evaluate after they land i think | 17:29 |
fungi | it almost definitely explains the issues. we were low on slots when i started looking at /server-status yesterday *after* problems had calmed down | 17:30 |
corvus | ah, even better then | 17:30 |
fungi | and we didn't really get complaints until tuesdayish when folks started to do stuff en masse on the server | 17:30 |
mordred | agree | 17:30 |
fungi | our testing over the weekend showed it was nice and snappy when we were the only folks using it | 17:31 |
fungi | and etherpad-dev is still nice and snappy | 17:31 |
corvus | those pesky users | 17:31 |
mordred | we should pay attention post ansible ... | 17:33 |
mordred | the last ansible driven apache restart seemed maybe unhappy - but maybe also it was fine and was just the same symptom | 17:33 |
fungi | i think it was just that graceful restarting when the server was already overloaded isn't going to go well | 17:34 |
mordred | nod | 17:35 |
fungi | we may need to do a hard restart after this one for the same reason | 17:35 |
fungi | graceful restart tries to keep serving established connections while no longer accepting new connections on each worker, until they can be expired from rotation and a new worker spawned with the updated config | 17:35 |
corvus | mordred: zuul -1 on https://review.opendev.org/720527 | 17:36 |
corvus | i'm going to afk for 30m | 17:36 |
fungi | so couple that with long-lived websocket connections for etherpad and a lack of available worker slots... | 17:36 |
clarkb | ya I'm in and out right now with kids school stuff. I have about 10 minutes until the next thing, anything I should review urgently? | 17:36 |
*** sshnaidm has quit IRC | 17:37 | |
corvus | i think we got the urgent stuff +3d | 17:37 |
fungi | review a cup of tea | 17:37 |
corvus | clarkb: you can probably review the commit message of https://review.opendev.org/720498 | 17:37 |
corvus | not urgent, but a good use of a minute i think | 17:38 |
mordred | clarkb, corvus: https://review.opendev.org/#/c/707412/ ... what am I doing wrong? | 17:38 |
clarkb | corvus: looks good. I only wonder what split data and log files means, but that seems to be implementation detail | 17:39 |
mordred | oh. blerg | 17:39 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Install kubectl via openshift client tools https://review.opendev.org/707412 | 17:40 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Remove snap cleanup tasks https://review.opendev.org/709293 | 17:40 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers https://review.opendev.org/717620 | 17:40 |
clarkb | mordred: path issues I think | 17:40 |
corvus | clarkb: zk has an option to store the data files and transaction logs in different locations; we don't use it, but the default zk image sets up volume mounts for it that way. i figured now would be a good time to do that, even though we still will have them on the same disk. we could move them later more easily if we want to add an ssd or something. | 17:40 |
clarkb | corvus: got it | 17:41 |
corvus | essentially something like "mv *.log ....." | 17:41 |
clarkb | mordred: seems like you untar to /opt but check /usr for the binaries | 17:41 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run nodepool launchers with ansible and containers https://review.opendev.org/720527 | 17:42 |
mordred | clarkb: no - it was a symlink issue - I didn't tell it to link :) | 17:42 |
mordred | clarkb: or - I could copy them in place. let me do that actually | 17:44 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Install kubectl via openshift client tools https://review.opendev.org/707412 | 17:45 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Remove snap cleanup tasks https://review.opendev.org/709293 | 17:45 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers https://review.opendev.org/717620 | 17:45 |
mordred | clarkb: that should be better | 17:45 |
mordred | (that way we don't have to mount /opt/oc into containers or bubblewrap) | 17:46 |
*** diablo_rojo has joined #opendev | 17:50 | |
openstackgerrit | Jeremy Stanley proposed opendev/irc-meetings master: Not all meetings are OpenStack https://review.opendev.org/720063 | 17:52 |
*** ildikov has joined #opendev | 17:58 | |
*** hashar is now known as hasharAway | 18:03 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers https://review.opendev.org/717620 | 18:10 |
mordred | infra-root: nb02 is unhappy - I can't shell in to it | 18:11 |
clarkb | I agree "connection reset by peer" | 18:12 |
mordred | I think at this point we need to cloud reboot it right? | 18:12 |
clarkb | mordred: I think that is the normal resolution. Maybe check the console first for obvious signs of distress | 18:12 |
mordred | yeah | 18:13 |
mordred | looking now | 18:13 |
mordred | nope | 18:13 |
mordred | rebooting | 18:14 |
corvus | back | 18:15 |
mordred | back in | 18:15 |
corvus | it apparently died around 18:10:31 | 18:16 |
*** ralonsoh has quit IRC | 18:16 | |
*** diablo_rojo has quit IRC | 18:18 | |
mordred | corvus: re-review on https://review.opendev.org/#/c/707412 ? | 18:19 |
mordred | clarkb: and review from you please on https://review.opendev.org/#/c/707412 and https://review.opendev.org/#/c/709293/ | 18:20 |
*** icarusfactor has joined #opendev | 18:24 | |
clarkb | mordred: corvus I've removed the WIP from https://review.opendev.org/#/c/719589/ maybe thats a thing to try and land once we're happy with where etherpad has ended up? | 18:26 |
*** factor has quit IRC | 18:27 | |
mordred | clarkb: I think it's ok to land that one whenever there's adequate human attention | 18:28 |
clarkb | ok, I'm not really in that space at the moment. virtual aquarium tour is over but now I need to find food and get a bike ride in then should have the bulk of the afternoon for attention | 18:28 |
mordred | yeah. I'm ok with rolling forward with it once you're around - however, I'm doing a bike ride a little later this afternoon, so I don't know if our attention buckets will overlap (although I'm also fine if you want to go ahead with it) | 18:29 |
clarkb | k | 18:29 |
mordred | probably just mostly need 2 of us to actually pay attention | 18:30 |
openstackgerrit | Merged opendev/system-config master: Add apache connection tuning back to apache https://review.opendev.org/720562 | 18:33 |
clarkb | mordred: the weather has been great here. My fake commute keeps getting longer :) | 18:33 |
mordred | clarkb: same! | 18:34 |
mordred | clarkb: our range for what a "normal short walk" is has gotten quite long too | 18:34 |
corvus | clarkb: +2 | 18:34 |
corvus | clarkb, fungi: if either of you want to add a +2 to https://review.opendev.org/720498 then i can start doing that after lunch | 18:35 |
clarkb | corvus: left a couple of notes | 18:39 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers https://review.opendev.org/717620 | 18:39 |
corvus | clarkb: thanks! | 18:40 |
clarkb | infra-root I think we may have to restart apache to pick up the connection tuning change in production | 18:40 |
clarkb | the file seems to be there but apache wasn't restarted | 18:40 |
fungi | checking | 18:40 |
mordred | clarkb: oh - you know - we need a notify on that task | 18:40 |
fungi | we may not trigger a reload on those | 18:40 |
fungi | yeah | 18:41 |
mordred | it's about to get a restart for something else | 18:41 |
mordred | the robots patch is in the gate | 18:41 |
mordred | so that'll take care of it - but I'll do a followup real quick | 18:41 |
clarkb | ah ok so if we wait it will get taken care of but still a minor thing to fix | 18:41 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Run ZK from containers https://review.opendev.org/720498 | 18:41 |
mordred | oh - there IS a notify on that | 18:41 |
mordred | so I don't know why it didn't restart | 18:42 |
mordred | clarkb: hahaha. my nodepool launcher patch failed because I forgot to install docker-compose | 18:43 |
fungi | [Thu Apr 16 18:35:30.647491 2020] [mpm_event:notice] [pid 30025:tid 139892810472384] AH00493: SIGUSR1 received. Doing graceful restart | 18:43 |
openstackgerrit | Merged opendev/system-config master: Grant access to robots.txt https://review.opendev.org/720564 | 18:44 |
fungi | [Thu Apr 16 18:35:30.658496 2020] [mpm_event:warn] [pid 30025:tid 139892810472384] AH00501: changing ServerLimit to 128 from original value of 16 not allowed during restart | 18:44 |
clarkb | mordred: just parent to my change and it will fix that for you :P | 18:44 |
mordred | clarkb: yeah | 18:44 |
fungi | [Thu Apr 16 18:35:30.658529 2020] [mpm_event:warn] [pid 30025:tid 139892810472384] AH00516: MaxRequestWorkers of 4096 would require 128 servers and exceed ServerLimit of 16, decreasing to 512 | 18:44 |
corvus | fungi: cool, i vote we just manually restart as a one off | 18:44 |
clarkb | corvus: ++ | 18:44 |
fungi | yep, that's where i was headed | 18:44 |
mordred | well - hang on | 18:45 |
mordred | the robots patch is about to run in deploy | 18:45 |
mordred | let's see if that does it? | 18:45 |
mordred | (otherwise we might be fighting ansible here) | 18:45 |
fungi | yeah, but it's going to do a graceful too right? | 18:45 |
mordred | oh - yeah - probably | 18:45 |
mordred | fungi: so I now agree just restart it :) | 18:45 |
fungi | apparently the tuning changes need a hard restart, not just graceful | 18:46 |
mordred | ++ | 18:46 |
fungi | amd restarting | 18:46 |
fungi | er, and | 18:46 |
fungi | now /server-status has a huuuuge scoreboard compared to before | 18:47 |
mordred | corvus: just left a comment on the zk patch ... it's going to conflict silently with clarkb's patch | 18:47 |
fungi | infra-root: keep an ear to the ground for more reports of etherpad issues, but this has hopefully resolved them | 18:47 |
mordred | corvus: so we either need to rebase yours on his and remove the install of docker-compose from packages, or we need to rebase his on yours and then have his include a removal of the docker-compose from the zk role | 18:48 |
* mordred doesn't have a strong opinion on which - just want to make sure we don't miss the overlap | 18:48 | |
corvus | mordred: yep, i think we'll just need to see what our schedules are :) | 18:49 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run nodepool launchers with ansible and containers https://review.opendev.org/720527 | 18:49 |
mordred | corvus: ++ | 18:49 |
clarkb | mordred: corvus another option is to have zk change install from pypi and then we can drop that once my change merges | 18:49 |
clarkb | rebase lite | 18:49 |
* mordred has rebased the nodepool patch on the clarkb patch - because the nodepool patch is failing due to lack of docker-compose install :) | 18:49 | |
mordred | clarkb: good point | 18:50 |
clarkb | that might be a good way to decouple things for now | 18:50 |
clarkb | its a little extra on the todo list but its an easy todo | 18:50 |
mordred | yeah - will also keep us at one zk restart | 18:50 |
mordred | left that suggestion as a comment | 18:51 |
corvus | clarkb: how did you determine that distro docker-compose did not support stop_grace_period? | 18:56 |
mordred | corvus: we added it to the compose file and the tests failed | 18:56 |
mordred | corvus: with an "unsupported option" error - and then clarkb went through and found out that option was added in a later version of compose than what's in xenial | 18:57 |
mordred | corvus: so - you know - not so much with that versioned compose file format :( | 18:57 |
corvus | is there a link to where it was added? | 18:57 |
mordred | not sure - lemme find a link to the error though | 18:58 |
corvus | i'm just not seeing the version info when i look it up | 18:58 |
corvus | i believe you :) | 18:58 |
corvus | i'm just trying to learn (a) what version is required and (b) how to learn what version is required | 18:58 |
mordred | https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_436/719051/1/check/system-config-run-review/4366e0a/bridge.openstack.org/ara-report/result/f97758dc-12ba-44de-9275-d2e7195d2f38/ | 18:59 |
mordred | corvus: :) | 18:59 |
mordred | corvus: yeah- good point | 18:59 |
corvus | maybe the docs are just wrong? | 19:00 |
corvus | ie, the config versioning is correct, it just shouldn't have been listed in the v2 docs? | 19:00 |
mordred | https://docs.docker.com/compose/release-notes/#1100 | 19:01 |
mordred | in the compose file version 2.0 and up section | 19:01 |
corvus | well, that knocks that theory | 19:01 |
mordred | and xenial has 1.8.0 | 19:01 |
corvus | mordred: but i think that doc is the answer to my q, thanks! | 19:01 |
mordred | corvus: \o/ | 19:01 |
* mordred has provided helpful | 19:01 | |
corvus | mordred: bionic has 1.17 | 19:02 |
corvus | i realize the pip install is a working solution | 19:02 |
corvus | i just have some slight hesitation, because, well, the whole point of dockering was to stop global pip installs | 19:03 |
clarkb | docker compose release notes had it iirc | 19:03 |
clarkb | oh good you found it | 19:03 |
clarkb | ya v2 doesnt mean v2 I guess? | 19:03 |
corvus | but we can totally play the "it'll be fine this time" card, and fix it later :) | 19:03 |
clarkb | it limits the blast radius at least | 19:04 |
corvus | this is actually a case where i'd much rather just install a statically compiled binary :) | 19:05 |
mordred | yeah :) | 19:05 |
*** hasharAway is now known as hashar | 19:05 | |
openstackgerrit | James E. Blair proposed opendev/system-config master: Install docker-compose from pypi https://review.opendev.org/719589 | 19:05 |
corvus | okay, that's rebase-lite | 19:05 |
mordred | corvus: it looks like you did rebase-lite in the docker-compose patch | 19:06 |
mordred | instead of in the zk patch | 19:06 |
corvus | derp, sorry | 19:07 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Install docker-compose from pypi https://review.opendev.org/719589 | 19:08 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Run ZK from containers https://review.opendev.org/720498 | 19:08 |
corvus | should be fixed (reverted to previous ps on d-c patch) | 19:08 |
fungi | we seem to still be getting frequent oom conditions on lists.o.o... today around 12:45-12:50 we had a burst of 10 oom events killing different "python" processes over that 5 minute span... i can go through and restart all the queue runners for the sites on that server, but am curious if there's a good way to diagnose what's running away with memory (or if we've simply added too many sites to one machine). | 19:09 |
fungi | cacti graphs also show a hole in data around that time, suggesting the server mostly fell over, though on the tail end of that you can see load average coming down from a massive spike, most likely swap thrash? | 19:09 |
fungi | cacti data (and lack thereof) is making me suspect that adding more swap won't help, and that it's probably not a function of the number of sites we're hosting | 19:10 |
mordred | corvus: +2 | 19:10 |
mordred | fungi: ugh | 19:11 |
mordred | fungi: I have no useful suggestions | 19:11 |
fungi | #status log restarted all mailman sites on lists.openstack.org following oom events around 12:45-12:50z | 19:18 |
openstackstatus | fungi: finished logging | 19:18 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers https://review.opendev.org/717620 | 19:24 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run nodepool launchers with ansible and containers https://review.opendev.org/720527 | 19:24 |
mordred | clarkb: if you get a sec, https://review.opendev.org/#/c/718791/ | 19:25 |
mordred | corvus: neat! the zuul change has failed because zookeeper hosts is undefined :) | 19:51 |
mordred | corvus: so - I think it'll likely make sense to base it on your zk patch so it can, you know, create a zk | 19:52 |
corvus | mordred: cool, yeah let's do that (rather than the nodepool thing) | 19:52 |
corvus | i'm just about to start the manual zk work | 19:53 |
mordred | cool. I'm about to step out for a bit - but I think you've got that under control | 19:53 |
corvus | added zk* to emergency | 20:07 |
corvus | i prepared a checkout of change 720498 with the parts about running docker-compose commented out | 20:12 |
corvus | i copied the data files to a backup location on all 3 servers | 20:13 |
corvus | i made a secondary json backup on nl01 | 20:15 |
corvus | i'll start running the playbook now | 20:16 |
corvus | now i'm going to run this in the locally modified checkout: ansible-playbook --limit="zk01.openstack.org:localhost" playbooks/service-zookeeper.yaml | 20:17 |
corvus | now i'm going to be confused why "no hosts matched" | 20:20 |
fungi | zk01.openstack.org is definitely an actual hostname, and localhost should match regardless right? | 20:21 |
corvus | yeah... | 20:21 |
corvus | and it's using some of the inventory files from /etc, but that should be okay -- zk01 is currently in the inventory file and in the 'zookeeper' group | 20:21 |
corvus | (the only inventory related modifications from my change are to remove it from the puppet group; should have no impact here) | 20:22 |
corvus | oh | 20:24 |
corvus | it's also reading the emergency file | 20:24 |
corvus | i'll modify the playbook to omit !disabled | 20:24 |
corvus | off we go | 20:24 |
fungi | makes sense | 20:25 |
corvus | done; i'll check out the state on zk01 now | 20:26 |
corvus | oh, i just now had the thought that maybe we should try running zk as a non-root user | 20:28 |
corvus | (that's how it is currently run in packages) | 20:28 |
fungi | and not how it's running in the container images i guess? | 20:28 |
corvus | right | 20:29 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Run ZK from containers https://review.opendev.org/720498 | 20:37 |
corvus | fungi: how's the patchset delta on that look to you? | 20:38 |
corvus | i think creation will no-op since we currently have a zookeeper user and group on the host | 20:38 |
corvus | so if it looks good, i can rm -rf /var/zookeeper (the new data/conf location) and re-run the playbook with that applied | 20:39 |
fungi | checking | 20:39 |
fungi | corvus: lgtm. marked a misspelling inline but just a nit as it's only in the task name not anything syntactic | 20:41 |
*** Romik has joined #opendev | 20:41 | |
corvus | goup? :) | 20:41 |
corvus | cool i'll fix that and remove some whitespace | 20:41 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Run ZK from containers https://review.opendev.org/720498 | 20:42 |
corvus | and i'll rm and run the playbook again now | 20:42 |
fungi | cool | 20:42 |
fungi | i'm here but also operating a hot skillet at the same time so may go quiet for a few minutes at a time | 20:43 |
corvus | ah drat, that's a change to the user's home dir, which can't run while the user is in use | 20:46 |
corvus | i think what i should do is modify the home directory in my local copy so it's a no-op, then do a further manual usermod while the service is stopped to change to the new location, then we'll merge the change as written with the new location | 20:48 |
corvus | i'm going to stop zk on zk01 now | 20:49 |
fungi | that sounds fine, yep | 20:53 |
clarkb | reviewing mordred's doc change now. Is etherpad stuff done? | 20:53 |
*** Romik has quit IRC | 20:55 | |
corvus | here's what i've done so far for the migration: http://paste.openstack.org/show/792255/ | 20:56 |
corvus | followed by some apt-get purging and autoremoving | 20:57 |
corvus | i think i'm ready to bring zk01 up now | 20:58 |
clarkb | corvus: are you going to manually run ansible on bridge to do that? | 20:58 |
corvus | clarkb: i manually ran a modified playbook to do the prep steps; to bring it up i'll manually run docker-compose up -d | 20:58 |
clarkb | rgr | 20:59 |
corvus | doing that now | 20:59 |
corvus | oh, ha -- removing the packages removed the zk user | 21:00 |
corvus | so i'll run the playbook again, and it should (re-)create the user this time | 21:01 |
corvus | perhaps with different ids, so i'll check that | 21:01 |
fungi | yeah, if the user was in an autocreate uid range it will wind up getting whatever the next unassigned uid is in that range, so will likely change | 21:04 |
clarkb | looks like apache on etherpad was restarted | 21:04 |
clarkb | creating a new pad seems to be happy | 21:04 |
clarkb | I guess we watch it and see if that was the fix we needed? | 21:05 |
fungi | i restarted it manually, yes, and confirmed /server-status showed many more available slots | 21:05 |
fungi | apparently some tuning changes can't be applied via graceful restart, and require a hard restart instead | 21:06 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Use HUP to stop gerrit in docker-compose https://review.opendev.org/719051 | 21:08 |
clarkb | ok ^ is rebased onto latest ps of its parent now | 21:08 |
clarkb | mordred: corvus fungi (ianw if around) should we go ahead and approve https://review.opendev.org/#/c/719589/ now? | 21:08 |
corvus | oh... hrm, the zk container image has a zk user, but it's uid 1000 | 21:09 |
corvus | that's the 'ubuntu' user on our host... | 21:09 |
clarkb | ya 1000 is a bad choice | 21:09 |
clarkb | since a lot of distros start non system there | 21:09 |
corvus | (ugh, the whole non-root user under docker thing is a mess) | 21:09 |
clarkb | corvus: is there a way to map uids in and out of containers | 21:10 |
corvus | i think we can just give it a numeric uid and probably nothing inside the container will care | 21:10 |
corvus | like, we could tell it to run as 999:998 (which is what just got created), or we could pick new numbers, like 10002 | 21:11 |
fungi | almost everything is always happy with numeric uid/gid anyway, yes | 21:11 |
fungi | user and group names are mainly cosmetic | 21:11 |
fungi | unless referred to in conffiles and the like | 21:11 |
corvus | (i believe in the bleeding edge or possibly later crio systems, they actually modify /etc/passwd inside containers on startup, so this might actually get better in the future) | 21:11 |
corvus | so maybe let's just specify 10001:10001 in our user/group creation, and map that in numerically | 21:12 |
fungi | i'm good with giving that a try | 21:13 |
clarkb | https://docs.docker.com/engine/security/userns-remap/#about-remapping-and-subordinate-user-and-group-ids | 21:13 |
fungi | it's probably only a concern if the uid/gid also happen to be used by another unprivileged user in the parent system | 21:13 |
fungi | in which case they get access to files in the container's file tree | 21:13 |
clarkb | I guess that is so you can pretend to be root? but maybe it would work for this too? though it seems fairly involved to set up | 21:14 |
clarkb | options to dockerd, and config files need to be set | 21:14 |
clarkb | oh and we couldn't use host networking if we do that | 21:15 |
corvus | i think that's useful for root, but unnecessary here | 21:15 |
*** hashar has quit IRC | 21:17 | |
openstackgerrit | Merged opendev/system-config master: Remove puppet and cron mentions from docs https://review.opendev.org/718791 | 21:18 |
corvus | hrm. it's running, but i don't think it's able to join | 21:24 |
corvus | an exception in the log i don't understand | 21:24 |
clarkb | corvus: let me know if you'd like more eyeballs on it | 21:26 |
corvus | ack, i'll try to eliminate some simple things first | 21:27 |
corvus | https://stackoverflow.com/a/61215487 | 21:33 |
corvus | that might be us | 21:33 |
clarkb | ah so maybe we pin the image after all? | 21:34 |
corvus | yeah, let me see what i was testing with locally | 21:34 |
corvus | looks like 3.5.6 | 21:38 |
corvus | there's a 3.5.7 now | 21:38 |
corvus | maybe we should use :3.5 ? | 21:38 |
corvus | or should we pin to 3.5.7? | 21:38 |
clarkb | I think we are probably safe to stick to 3.5.x | 21:39 |
corvus | okay, now i think i should shut it down, and copy all the old data files over again | 21:39 |
corvus | because 3.6.0 may have munged them | 21:39 |
fungi | sounds prudent given the circumstances | 21:39 |
corvus | zk_1 | 2020-04-16 21:42:08,022 [myid:1] - INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):Follower@69] - FOLLOWING - LEADER ELECTION TOOK - 19 MS | 21:42 |
corvus | zk_1 | 2020-04-16 21:42:08,189 [myid:1] - INFO [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):Learner@529] - Learner received UPTODATE message | 21:42 |
corvus | this looks good. | 21:42 |
corvus | i'm going to stop it and back out some of my silly hostname/ip changes which shouldn't be necessary | 21:43 |
corvus | okay, one of them is necessary | 21:45 |
corvus | we have to specify 0.0.0.0 for a server's own binding in the config file | 21:45 |
corvus | i'll work on jinjaing that up real quick | 21:46 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Run ZK from containers https://review.opendev.org/720498 | 21:51 |
corvus | okay, i think that reflects reality on zk1 | 21:51 |
corvus | cool -- that last patchset with the bogus uids -- it failed testing :) | 21:53 |
corvus | all right, i think i'm ready to move on to zk02 with this modified procedure: http://paste.openstack.org/show/792256/ | 21:55 |
fungi | looks good, i guess ansible creating the user and group is still fine then? | 21:59 |
corvus | yep. it'll get removed, then created with the id we specify | 22:00 |
corvus | moving on to zk02 now | 22:03 |
clarkb | corvus: plan lgtm | 22:03 |
corvus | stopped | 22:04 |
corvus | running playbook | 22:06 |
clarkb | not to completely change the subject but I'm thinking if we have interest from frickler and ajaeger for virtual PTG slots we can do an early morning (relative to me) PTG chunk of time one day and if ianw is interested a late day chunk | 22:09 |
clarkb | and maybe do ~2 4 hour chunks for a total of 8 hours or something | 22:09 |
clarkb | (I worry any more than that will just be painful) | 22:09 |
corvus | okay, second set of manual steps complete; config files look good | 22:10 |
corvus | i'm going to start it | 22:10 |
corvus | i was unconvinced it rejoined correctly the first time, so i stopped it and started it again | 22:15 |
corvus | the second time i see the warm fuzzy UPTODATE message | 22:15 |
ianw | clarkb: i'm happy to be around for such a thing, maybe some others in ~ tz's like tonyb or ricolin and kevinz might find it good too | 22:15 |
corvus | i will move on to zk03 now | 22:17 |
clarkb | ianw: good point | 22:17 |
clarkb | Looks like the europe chunk is 1300-1700 UTC and apac is 0400-0800 | 22:18 |
clarkb | now to do math to see how those map onto my timezone | 22:18 |
clarkb | 6am to 10am and 9pm to 1am | 22:19 |
corvus | incidentally, we did the rolling restart the 'hard' way, killing the leader each time | 22:20 |
clarkb | the third chunk is 2pm to 6pm. I think I can actually do all three of those if necessary | 22:20 |
fungi | for me that's 9am to 1pm and midnight to 4am | 22:20 |
clarkb | fungi: ya I'm thinking that maybe 1300-1700 and 2100-0100 might be better given your eastern timezone | 22:20 |
clarkb | but I'm not sure how early 2100 is for ianw yet | 22:21 |
fungi | midnight to 4am might be tough for me, but if i don't have anything scheduled the day before/after that i can manage it | 22:21 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Run ZK from containers https://review.opendev.org/720498 | 22:21 |
clarkb | 7am to 11am for ianw I think | 22:21 |
ianw | 2100 would be ... 6 or 7 i think | 22:21 |
clarkb | ianw: and china would be even earlier I think | 22:21 |
clarkb | now I'm thinking maybe we do 2 hours in each of the chunks maybe and try and make things nicer then do ~4 sessions? I'll keep noodling on this and probably set up a chart of timezones :) | 22:22 |
ianw | i think taiwan is 2 hours before that | 22:23 |
ianw | not sure what tz kevinz falls into | 22:23 |
clarkb | like maybe we do 1300-1500, 2300-0100, and 0400-0600 | 22:23 |
corvus | moving to #opendev-meeting | 22:24 |
clarkb | corvus: sorry! | 22:24 |
corvus | don't be, that's what it's there for | 22:24 |
ianw | frickler / AJaeger: thanks for reviews and fixups on the nodepool config. i'll apply it now and watch through. frickler i responded that i very much hope the -plain images disappear ASAP | 22:25 |
ianw | if we need fixups or other odd things (which it's looking like we should, hopefully, not) we should be able to handle that in base jobs, rather than images | 22:26 |
ianw | clarkb: the only change to https://review.opendev.org/#/c/718224/ was putting it on top of the "check for pip" bits, yeah? | 22:28 |
clarkb | ianw: yup | 22:29 |
ianw | corvus: when you have a second, if you could loop back on https://review.opendev.org/#/c/717663/26 and i believe with the changes to check if pip is installed, your -1 should be satisfied | 22:29 |
corvus | ianw: yep +0 (looks like it just carried over cause it was a rebase) feel free to proceed with existing votes :) | 22:30 |
ianw | thanks | 22:32 |
openstackgerrit | Merged openstack/project-config master: nodepool: use job inheritance https://review.opendev.org/713158 | 22:34 |
ianw | my scrollback ran out ... did we figure out the project-config post job localhost key thing? | 22:34 |
ianw | i guess so, it looks like it's running for ^^ | 22:36 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Add ubuntu-bionic-plain to all regions https://review.opendev.org/720316 | 22:42 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: nodepool: Add more plain images https://review.opendev.org/720318 | 22:42 |
ianw | infra-root: ^ if we could consider these, it will be good for testing both "plain" hosts, and also testing the container builder on other image types | 22:42 |
clarkb | ok I used pencil on paper and drew some things. I think if we do Monday 1300-1500, 2300-0100 then Wednesday 0400-0600, that gives us 6 hours to talk about things but also gives each timezone chunk 4 hours of non super painful time. And we should be able to find sleep too | 22:43 |
clarkb | we can shift the days forward if necessary too, but this also nicely doesn't conflict with our normal team meeting | 22:44 |
clarkb | ianw: for those changes we don't add the -plain images to the launcher providers looks like? is that intentional? we just want to upload for now? | 22:47 |
ianw | yeah i thought maybe build them first? i can add if you want an all-in-one | 22:52 |
clarkb | nah if that is intentional it's fine | 22:55 |
clarkb | I was worried you were expecting nodes in the providers too | 22:55 |
ianw | it moves a lot of builds into the container builder, so i'll be interested if anything happens | 22:56 |
ianw | wrt to building the other image types there. we don't (yet) have functional tests covering them all; on my todo list to convert dib | 22:56 |
openstackgerrit | Merged zuul/zuul-jobs master: ensure-pip: export ensure_pip_virtualenv_command https://review.opendev.org/718224 | 23:01 |
ianw | note that devstack is working on the bionic-plain images https://review.opendev.org/#/c/712211/ | 23:04 |
ianw | opensuse should have actually been working, but the required change got -2'd by devstack initially and didn't make it in, leading to a lot of unfortunate confusion | 23:04 |
*** tosky has quit IRC | 23:09 | |
mnaser | hmm | 23:16 |
mnaser | did something happen with zuul not long ago? | 23:16 |
mnaser | oh, docker-ifying things | 23:16 |
mnaser | that can explain why a job that was wrapping up is now in 2. attempt ? | 23:17 |
clarkb | mnaser: yes, its zookeeper db had a sad. We think we've got it back to 2 happy nodes and now trying to get the third to achieve quorum | 23:17 |
clarkb | mnaser: yes | 23:17 |
mnaser | ok no worries, sorry, i quickly glanced scrollback and didn't see anything obvious | 23:17 |
* mnaser sends hugops | 23:17 | |
clarkb | mnaser: when zuul loses connectivity to zk nodepool cleans up all the nodes | 23:17 |
clarkb | zuul sees that as a network error and will retry the jobs if they haven't already retried 3 times for some reason | 23:17 |
mnaser | ive only seen the 2 attempt happen when pre fails, but that's a new scenario i learned i guess | 23:18 |
openstackgerrit | Merged zuul/zuul-jobs master: fetch-zuul-cloner: use ensure-pip https://review.opendev.org/717882 | 23:25 |
mnaser | i just saw 2 jobs fail to upload to buildset registry with a timeout | 23:31 |
mnaser | Get https://zuul-jobs.buildset-registry:5000/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) | 23:31 |
mnaser | i'll retry them, because they had to be restarted, but i'll let you know if i see it happen again.. | 23:32 |
clarkb | mnaser: that could be fallout from the other issues | 23:45 |
clarkb | because buildset registry runs in a paused job | 23:46 |
clarkb | and that won't restart properly maybe? | 23:46 |
clarkb | I expect rechecks to be fine now that zk is stable again | 23:46 |
*** mlavalle has quit IRC | 23:50 |
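
Editor's sketch, not part of the log: corvus notes at 21:45 that each ZooKeeper node has to list 0.0.0.0 for its own entry in the config file, and then templates that up. The Python below is only a rough illustration of that idea. The hostnames, the ports, and the helper name `render_server_lines` are all assumptions made for the example, and the `server.N=host:quorum:election;client` line shape is assumed to match what a ZooKeeper 3.5 config expects; the actual change is the Jinja template in https://review.opendev.org/720498.

```python
# Hypothetical ensemble: the ids and hostnames are placeholders,
# not the real opendev inventory.
ENSEMBLE = {
    1: "zk01.example.org",
    2: "zk02.example.org",
    3: "zk03.example.org",
}


def render_server_lines(ensemble, my_id, quorum_port=2888,
                        election_port=3888, client_port=2181):
    """Return one server.N line per ensemble member.

    The entry for my_id uses 0.0.0.0 so the local node binds on all
    interfaces; every other entry keeps the peer's hostname.
    """
    lines = []
    for server_id in sorted(ensemble):
        host = "0.0.0.0" if server_id == my_id else ensemble[server_id]
        lines.append(
            f"server.{server_id}={host}:{quorum_port}:{election_port};{client_port}"
        )
    return lines


if __name__ == "__main__":
    # Render the lines as they would look on the node with id 1.
    print("\n".join(render_server_lines(ENSEMBLE, my_id=1)))
```

Rendering the same mapping once per node, with only `my_id` changing, gives each host a config that differs from its peers in exactly one line.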