clarkb | https://docs.python.org/3/library/ssl.html#id9 is the related documentation | 00:03 |
clarkb | I bet that this is related to TLS 1.3 somehow | 00:03 |
*** sboyron has quit IRC | 00:05 | |
fungi | clarkb: my change and the dnm change stacked on it indicate that your revision works with 3.6-3.8 on bionic and 3.9 on focal, so i expect it's safe | 00:18 |
fungi | is there more we want to test? | 00:18 |
*** cloudnull has quit IRC | 00:27 | |
*** cloudnull has joined #opendev | 00:27 | |
TheJulia | Anything special about limehost? | 00:29 |
TheJulia | I ask because it seems to be the cloud our multinode job in ironic loves to fail on | 00:30 |
fungi | limestone? it's ipv6-only with ipv4 access to the internet via many-to-one nat | 00:33 |
TheJulia | yeah | 00:33 |
fungi | what kind of failures? talking to things on the internet? ipv4-only things? | 00:33 |
TheJulia | it *looks* like the vxlan tunnel is just not passing traffic | 00:33 |
TheJulia | between the two nodes | 00:33 |
fungi | oh, neat. i think we do our multinode setup specifically p2p so that vxlan won't try to use multicast... are you using ours or did you roll your own? | 00:34 |
fungi | is it failing to pass traffic over vxlan between nodes all the time there, or only sometimes? | 00:35 |
fungi | possible the lan there has gotten partitioned or something, i suppose | 00:36 |
openstackgerrit | Merged opendev/system-config master: refstack: use CNAME for production server https://review.opendev.org/c/opendev/system-config/+/780125 | 00:38 |
fungi | TheJulia: another possibility is that ipv4 connectivity for some nodes is breaking partway into the build? we'd still be able to reach them via ipv6 so zuul wouldn't realize anything had gone wrong network-wise | 00:40 |
fungi | might make sense to look at syslog on one of the failure examples, see if dhcpd logs any lease updates, arp overwrites, et cetera | 00:41 |
TheJulia | https://0b3775447bad164395a7-ce9ebe3ea1326bbb58a211f00836955d.ssl.cf2.rackcdn.com/778145/2/gate/ironic-tempest-ipa-wholedisk-direct-tinyipa-multinode/4c83c2d | 00:41 |
TheJulia | When we power up VMs attached to brbm, basically the packets never get through it appears | 00:41 |
TheJulia | so they never boot | 00:41 |
TheJulia | at least off of compute1 | 00:41 |
fungi | basically we're communicating with those nodes exclusively over ipv6, while vxlan is communicating between the nodes over ipv4, so if the latter is dying that could explain it | 00:42 |
fungi | we do at least test initially that each node can reach something on the internet over ipv4, but it could be breaking after that i suppose | 00:43 |
fungi | syslog shows iptables blocking a bunch of multicast traffic | 00:44 |
fungi | is that typical? | 00:44 |
fungi | vxlan will try to tunnel layer-2 broadcast traffic over multicast ip | 00:45 |
fungi | possible that's just benign noise | 00:49 |
*** tosky has quit IRC | 00:51 | |
TheJulia | I think it is noise | 00:51 |
TheJulia | cross node traffic seems to work just fine otherwise | 00:55 |
TheJulia | I'm not an ovs expert but it almost looks like ovs kind of works, datapath gets established, and then ovs seems to become unhappy and boom | 00:59 |
* TheJulia wonders about MTUs | 00:59 | |
guillaumec | clarkb, indeed, "context.options |= ssl.OP_NO_TLSv1_3" solves the zuul ssl test issue | 01:01 |
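(For reference, the workaround under discussion looks roughly like the sketch below; the server-side context setup and the certificate paths are illustrative, not gear's actual code.)

```python
import ssl

# Start from the negotiate-everything protocol, then mask out TLS 1.3
# so the handshake falls back to 1.2 and the test hang goes away.
context = ssl.SSLContext(ssl.PROTOCOL_TLS)
context.options |= ssl.OP_NO_TLSv1_3
# Placeholder paths; a real server would load its actual cert/key here.
context.load_cert_chain(certfile="server.crt", keyfile="server.key")
```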
fungi | TheJulia: ooh, good line of inquiry. that could vary by provider too | 01:01 |
fungi | TheJulia: https://zuul.opendev.org/t/openstack/build/4c83c2d9c1774ce09f0d447bbdbed4d1/log/zuul-info/zuul-info.compute1.txt#26 | 01:02 |
fungi | 1500 | 01:02 |
fungi | i think devstack tries to set the virtual interfaces lower to accommodate that | 01:02 |
TheJulia | 1500 feels like classic physical interface. v6 has pmtu discovery, I wonder if we're in some weird cross-hypervisor packet dropping | 01:03 |
* TheJulia prepares to mark the job non-voting :( | 01:03 | |
fungi | yeah, also any particular snapshot of the pmtu for those peers won't necessarily be consistent | 01:04 |
fungi | logan-: ^ if you're around, maybe you could have some theories since you know what the underlying network looks like | 01:05 |
TheJulia | For the VMs themselves, we're dropping the MTU to 1330. Neutron runs at 1430 (there is a reason for the 100 bytes, I just don't remember it without tasty beverages.) | 01:07 |
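(For the overhead math being alluded to: a point-to-point vxlan tunnel like the one the multinode jobs build adds roughly 50 bytes of encapsulation, so each layer steps its MTU down from the 1500-byte physical interface. A rough sketch of such a tunnel; the interface name, VNI and peer addresses below are made up for illustration.)

```sh
# Outer interface is 1500; the vxlan headers (outer IP + UDP + vxlan)
# eat roughly 50 bytes, so the tunnel itself should run at ~1450.
ip link add vxlan-peer type vxlan id 88 \
    remote 10.4.70.11 local 10.4.70.10 dstport 4789
ip link set vxlan-peer mtu 1450 up
# Networks stacked on top step down again: neutron at 1430 and the
# nested VMs at 1330, leaving headroom for their own encapsulation.
```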
fungi | yeah, that memory is best not relived without some chemical safety net | 01:08 |
TheJulia | Yeah, I'd think the only way to really figure this out is to be able to catch it in the act with a pcap or something | 01:09 |
TheJulia | but that would be huge | 01:09 |
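(A capture limited to the vxlan UDP port and rotated through fixed-size files would keep the size manageable; a sketch, with the interface name assumed and the default vxlan port.)

```sh
# Capture only vxlan traffic (default UDP port 4789), rotating through
# five 100 MB files so the capture never grows without bound.
sudo tcpdump -ni ens3 udp port 4789 -C 100 -W 5 -w /tmp/vxlan.pcap
```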
fungi | is this the most common failure for that job? if so an autohold could at least keep the nodes around after the job fails | 01:10 |
fungi | doesn't mean whatever was breaking them would still be broken by the time we logged in, but worth a shot | 01:10 |
TheJulia | fungi: I looked at ?3? randomly on that job and it was all the same | 01:13 |
TheJulia | all on limestone | 01:13 |
TheJulia | I dunno, I'm kind of okay with just deferring it at the moment, too much work to do. | 01:13 |
TheJulia | that is unless magical ideas appear | 01:14 |
openstackgerrit | Jeremy Stanley proposed opendev/gear master: DNM: see if intermediate Python versions work too https://review.opendev.org/c/opendev/gear/+/780131 | 01:18 |
fungi | TheJulia: once you (or anyone really) is ready to dig into it, we can set up an autohold for that job and wait for it to catch a failure | 01:23 |
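(The hold itself is placed with zuul's autohold command on the scheduler; a sketch of the invocation, with the project path and reason as placeholders.)

```sh
# Keep the nodes from the next failing run of this job for debugging.
zuul autohold --tenant openstack \
    --project openstack/ironic \
    --job ironic-tempest-ipa-wholedisk-direct-tinyipa-multinode \
    --reason "TheJulia debugging limestone vxlan failures" \
    --count 1
```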
*** mlavalle has quit IRC | 01:24 | |
TheJulia | fungi: much appreciated | 01:34 |
*** mgagne has joined #opendev | 02:29 | |
ianw | kopecmartin / clarkb : i gave the containers a cycle after the config change applied and i can see results on https://refstack.openstack.org now. so i think it's working and won't roll back | 02:38 |
johnsom | I'm trying to push that tag for wsme, but ssh with gerrit is rejecting me. Even if I try to checkout a patch using ssh I get permission denied. Any tips/ideas? | 02:41 |
johnsom | The key in gerrit (web) is correct | 02:41 |
*** artom has quit IRC | 02:46 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: refstack: cleanup old puppet https://review.opendev.org/c/opendev/system-config/+/780138 | 02:49 |
johnsom | Ok, it's something broken on this fedora workstation. Everything works fine from other VMs. | 02:54 |
*** whoami-rajat_ has joined #opendev | 02:55 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: certcheck: cleanup letsencrypt domains https://review.opendev.org/c/opendev/system-config/+/780140 | 03:01 |
ianw | johnsom: fedora 33? | 03:01 |
johnsom | yeah | 03:02 |
ianw | yep, that's a known issue | 03:02 |
johnsom | lol | 03:02 |
fungi | openssl security defaults | 03:02 |
johnsom | Can I get an hour refund? | 03:02 |
ianw | https://issues.apache.org/jira/browse/SSHD-1118 if you'd like to read too much inconclusive detail on it :) | 03:02 |
johnsom | ha, thanks, I will take a look | 03:02 |
ianw | speaking of, RAX got the wrong end of the stick with my report that fedora 33 doesn't work with their console host | 03:03 |
ianw | i think they thought it meant fedora 33 hosts don't show a console, not that you can't connect to their console host via fedora 33 with default configuration | 03:04 |
ianw | that's even more screwed up and i'm owed a bigger refund than johnsom there :) | 03:05 |
johnsom | Yep, that was the exact problem. Thanks ianw | 03:08 |
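(Assuming this is the RSA/SHA-1 signature negotiation problem described in SSHD-1118, a common client-side workaround is to re-allow ssh-rsa for the gerrit host only, e.g. in ~/.ssh/config.)

```
# Re-enable ssh-rsa signatures just for gerrit, since mina-sshd
# (before the fix requested in SSHD-1141) cannot complete rsa-sha2
# authentication under Fedora 33's tightened crypto policy.
Host review.opendev.org
    PubkeyAcceptedKeyTypes +ssh-rsa
```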
*** whoami-rajat_ is now known as whoami-rajat | 03:16 | |
ianw | i filed https://issues.apache.org/jira/browse/SSHD-1141 as requested in sshd-1118 | 03:32 |
ianw | i think i distilled it correctly, fungi ^ could maybe check :) | 03:32 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: kerberos-kdc: role to manage Kerberos KDC servers https://review.opendev.org/c/opendev/system-config/+/778840 | 04:06 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: kerberos: switch servers to Ansible control https://review.opendev.org/c/opendev/system-config/+/779890 | 04:06 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: kerberos-kdc: add database backups https://review.opendev.org/c/opendev/system-config/+/779891 | 04:06 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: refstack: add backup https://review.opendev.org/c/opendev/system-config/+/775061 | 04:18 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: borg-backup hosts: use exact names https://review.opendev.org/c/opendev/system-config/+/780144 | 04:28 |
*** ysandeep|holiday is now known as ysandeep | 04:33 | |
*** ykarel has joined #opendev | 04:50 | |
ykarel | ianw, hi, u around? | 05:11 |
ykarel | we are facing mirror issues for centos 8-stream, and on checking I see the centos mirror that the infra mirrors follow has not synced for 12 hours | 05:12 |
ykarel | http://mirror.dal10.us.leaseweb.net/centos/8-stream/AppStream/x86_64/os/repodata/ one | 05:13 |
ykarel | in https://mirror-status.centos.org/ i see some mirror which are good | 05:14 |
openstackgerrit | Merged opendev/system-config master: refstack: add backup https://review.opendev.org/c/opendev/system-config/+/775061 | 05:15 |
ykarel | the mirror you added in https://review.opendev.org/c/opendev/system-config/+/684437 is good currently; it was later changed to the current one ^ as that one was not up to date at the time and was not listed in mirror-status.centos.org | 05:17 |
ykarel | in https://review.opendev.org/c/opendev/system-config/+/716602 | 05:17 |
*** stevebaker has quit IRC | 05:18 | |
*** stevebaker has joined #opendev | 05:23 | |
ykarel | ok http://mirror.dal10.us.leaseweb.net/centos/8-stream/AppStream/x86_64/os/repodata/ is updated now, so next rsync should fix infra mirrors | 05:25 |
*** whoami-rajat has quit IRC | 05:28 | |
ykarel | the last run missed that, and the next run is in approx 1.25 hours | 05:37 |
ykarel | if it can be manually triggered before that it would be good, otherwise we'll have to wait | 05:37 |
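(The infra mirrors refresh from that upstream with a periodic rsync; a manual run would look roughly like the sketch below, though the exact rsync module and AFS destination path are guesses rather than the real mirror-update script.)

```sh
# Roughly what the periodic centos 8-stream mirror update does;
# source module and destination volume are illustrative only.
rsync -rltDvz --delete \
    rsync://mirror.dal10.us.leaseweb.net/centos/8-stream/ \
    /afs/.openstack.org/mirror/centos/8-stream/
```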
*** ykarel_ has joined #opendev | 06:08 | |
*** ykarel has quit IRC | 06:08 | |
*** ralonsoh has joined #opendev | 06:18 | |
*** marios has joined #opendev | 06:20 | |
*** ykarel_ has quit IRC | 06:31 | |
*** ykarel has joined #opendev | 06:32 | |
*** whoami-rajat_ has joined #opendev | 06:56 | |
ykarel | mirrors got updated now | 07:01 |
*** slaweq has joined #opendev | 07:11 | |
*** eolivare has joined #opendev | 07:25 | |
ianw | ykarel: sorry, missed this, things in sync now? | 07:34 |
ykarel | ianw, yes now it's synched | 07:34 |
ykarel | ianw, now seeing issue with epel repos not synched | 07:47 |
*** hashar has joined #opendev | 07:48 | |
ykarel | http://pubmirror1.math.uh.edu/fedora-buffet/epel/8/Everything/x86_64/repodata/?C=M;O=D vs mirror.ord.rax.opendev.org/epel/8/Everything/x86_64/repodata/?C=M;O=D | 07:50 |
ykarel | and other epel mirror https://dl.fedoraproject.org/pub/epel/8/Everything/x86_64/repodata/?C=M;O=D | 07:51 |
*** sboyron has joined #opendev | 08:05 | |
*** andrewbonney has joined #opendev | 08:33 | |
*** amoralej has joined #opendev | 08:44 | |
kopecmartin | ianw: clarkb the return_to address is fixed, thanks for that, but I still can't sign in. I suspect it might be something with the realm; in openstackid.org I can see that I'm signing in from the "Site" realm instead of 'refstack.openstack.org' | 08:50 |
*** tosky has joined #opendev | 08:51 | |
*** jpena|off is now known as jpena | 08:58 | |
ttx | kopecmartin: yes I confirm I see the same. It's weird as the URL has openid.realm=https%3A%2F%2Frefstack.openstack.org | 09:08 |
kopecmartin | ttx: hmm, is there something else which has to be set on the server side in order to have the correct realm? | 09:10 |
ttx | kopecmartin: I have no idea.. I'll ask openstackID folks to have a look. Was anything changed in the parameters, or was just everything copied over from the old one? | 09:10 |
ttx | Or could it be some DNS propagation issue ? Like the IP we have for refstack.o.o is not the same as the one the openstackid server sees? | 09:12 |
kopecmartin | ttx: there were lots of changes in the server config (like puppet -> containers move, py2->py3, OS version ...) but no significant changes in the configs | 09:14 |
kopecmartin | well , a little workaround with redirection https://review.opendev.org/c/opendev/system-config/+/776292/18/playbooks/roles/refstack/templates/refstack.vhost.j2 | 09:15 |
kopecmartin | maybe that ^^? | 09:15 |
kopecmartin | might be, unfortunately the dns config is outside my scope | 09:17 |
*** stevebaker has quit IRC | 09:17 | |
ttx | hmm, probably not. Here the issue is that clicking "Log In" should give us the login form, not the openstackid.org front page | 09:17 |
ttx | I'll ask the ID provider guys, they should be able to tell us what's missing. i'll let you know here if they reply anything useful, And thanks again for working on this! | 09:18 |
kopecmartin | ttx: sure, thanks .. I'm gonna quickly check the refstack project to see how the signin url is formed - i remember there were some changes too | 09:18 |
*** ysandeep is now known as ysandeep|lunch | 09:41 | |
openstackgerrit | Aurelien Lourot proposed openstack/project-config master: Add Magnum charm to OpenStack charms https://review.opendev.org/c/openstack/project-config/+/780211 | 09:50 |
*** dtantsur|afk is now known as dtantsur | 09:51 | |
openstackgerrit | Aurelien Lourot proposed openstack/project-config master: Add Magnum charm to OpenStack charms https://review.opendev.org/c/openstack/project-config/+/780211 | 09:54 |
*** smcginnis has joined #opendev | 10:38 | |
*** bodgix has quit IRC | 10:59 | |
*** bodgix_ has joined #opendev | 10:59 | |
*** slaweq has quit IRC | 11:00 | |
*** slaweq has joined #opendev | 11:02 | |
*** brinzhang0 has quit IRC | 11:11 | |
openstackgerrit | Rotan proposed openstack/diskimage-builder master: replace the link which is in the 06-hpdsa file https://review.opendev.org/c/openstack/diskimage-builder/+/730286 | 11:20 |
*** ysandeep|lunch is now known as ysandeep | 11:36 | |
openstackgerrit | Merged zuul/zuul-jobs master: bindep.txt: skip python-devel for el8 platform https://review.opendev.org/c/zuul/zuul-jobs/+/780050 | 11:47 |
*** hashar is now known as hasharLunch | 12:10 | |
*** smcginnis has quit IRC | 12:28 | |
*** jpena is now known as jpena|lunch | 12:32 | |
*** artom has joined #opendev | 12:44 | |
*** tkajinam has quit IRC | 12:54 | |
*** hasharLunch is now known as hashar | 13:00 | |
*** smcginnis has joined #opendev | 13:05 | |
*** ykarel has quit IRC | 13:08 | |
*** ykarel has joined #opendev | 13:09 | |
*** amoralej is now known as amoralej|lunch | 13:23 | |
*** jpena|lunch is now known as jpena | 13:49 | |
*** smcginnis has quit IRC | 13:52 | |
*** mlavalle has joined #opendev | 13:59 | |
dtantsur | hi folks! is it only me, or is there some issue with published logs? https://zuul.opendev.org/t/openstack/build/6f9c830b828e4ff382ed05bfdc608a80/log/job-output.txt | 14:03 |
fungi | ianw: that mina-ssh feature request looks good to me, also they've already replied suggesting you could implement it for them ;) | 14:04 |
fungi | dtantsur: "This logfile could not be found" usually means either we failed trying to upload it, or it disappeared off the swift server after upload. i'll take a look in the executor debug logs in a bit to rule out the former (that usually ends in a post_failure result though) | 14:06 |
dtantsur | thanks! note that it's a very recent run, so it shouldn't have timed out. | 14:06 |
fungi | i'll need to look at it after i run some errands this morning, but will dig in as soon as i'm back | 14:10 |
*** amoralej|lunch is now known as amoralej | 14:12 | |
openstackgerrit | Rich Bowen proposed opendev/yaml2ical master: Adds second- and fourth- week recurring meetings https://review.opendev.org/c/opendev/yaml2ical/+/780266 | 14:14 |
*** hashar is now known as hasharAway | 14:20 | |
*** mfixtex has joined #opendev | 14:24 | |
*** smcginnis has joined #opendev | 14:28 | |
TheJulia | out of curiosity, is the new gerrit webui making huge calls for lists of everything as it could relate to the user interaction? | 14:32 |
*** lpetrut has joined #opendev | 14:35 | |
*** mfixtex has quit IRC | 14:37 | |
*** whoami-rajat_ is now known as whoami-rajat | 14:40 | |
fungi | kopecmartin: clarkb: ttx: apparently the problem is the auth url should be https://openstackid.org/accounts/openid2 not just the base site url | 14:47 |
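(In refstack.conf terms, that means pointing the endpoint option at the openid2 handler rather than the bare site URL; a sketch, with the section name assumed.)

```ini
# Section name is an assumption; the option itself is what matters.
[osid]
openstack_openid_endpoint = https://openstackid.org/accounts/openid2
```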
fungi | TheJulia: not entirely sure, it's implemented with polymer... but the reason we suspect it's slow is for backend reasons (the relational database has been replaced with objects in git repositories) | 14:48 |
fungi | and it seems like memory pressure might be making filesystem caches inefficient | 14:48 |
*** hasharAway is now known as hashar | 14:49 | |
TheJulia | OH! | 14:52 |
TheJulia | Yeah, that explains a lot | 14:52 |
TheJulia | since it *looks* like the client asks for things like all my changes, all my blah, at least what I can grok on the screen, and if that doesn't load quite fast enough then the page load breaks it seems | 14:54 |
TheJulia | This is why database indexes are a thing too | 14:54 |
TheJulia | "Hi, give me the index" vs "hi, pls tablescan this for me" | 14:54 |
fungi | yeah, and gerrit maintains very large in-memory and on-disk caches of stuff, but the indexing even in caches becomes quite important | 14:57 |
*** Green_Bird has joined #opendev | 14:59 | |
*** ysandeep is now known as ysandeep|dinner | 15:00 | |
*** eolivare has quit IRC | 15:01 | |
*** Green_Bird has quit IRC | 15:01 | |
*** eolivare has joined #opendev | 15:02 | |
*** Green_Bird has joined #opendev | 15:02 | |
fungi | dtantsur: it looks like uploads for that build worked fine, but that one file is not available in swift for some reason (other logs uploaded for that same build can be accessed no problem). have you seen more examples of this? maybe we can find a commonality | 15:03 |
*** Green_Bird has quit IRC | 15:03 | |
fungi | specifically, https://14f46b65f6b8edf7deec-a7117e65d5d46fb2ebde9a8b3aa13b86.ssl.cf2.rackcdn.com/780251/1/check/releases-tox-list-changes/6f9c830/job-output.txt reports a "Content Encoding Error" from the rackspace swift cdn | 15:03 |
*** artom has quit IRC | 15:04 | |
*** Green_Bird has joined #opendev | 15:04 | |
dtantsur | I haven't seen other cases, no | 15:05 |
fungi | so this may be something broken in rackspace's cdn layer, or data corruption at rest (though swift i think prevents that, i don't know how "swift" rackspace's deployment is), or it could be we did something weird when uploading the file (but which did not produce any error) | 15:05 |
*** Green_Bird has quit IRC | 15:05 | |
*** Green_Bird has joined #opendev | 15:06 | |
openstackgerrit | Martin Kopec proposed opendev/system-config master: refstack: Fix openid endpoint https://review.opendev.org/c/opendev/system-config/+/780272 | 15:09 |
kopecmartin | fungi: clarkb ianw ^^ | 15:09 |
kopecmartin | fungi: thanks .. i didn't notice it was overridden in the config, I checked just the default value in refstack ..ah | 15:10 |
fungi | kopecmartin: awesome, reviewing now | 15:11 |
fungi | kopecmartin: i've approved, once it deploys please double-check whether things are working as desired | 15:12 |
*** artom has joined #opendev | 15:16 | |
*** lpetrut has quit IRC | 15:16 | |
fungi | i need to pop out to run some errands (i'm a bit behind) but should be back in an hour | 15:19 |
kopecmartin | fungi: thank you, sure | 15:20 |
openstackgerrit | Merged opendev/system-config master: refstack: Fix openid endpoint https://review.opendev.org/c/opendev/system-config/+/780272 | 15:42 |
clarkb | fungi: re the gear change safety I think guillaumec is saying that those changes will break zuul testing on focal with python 3.8 | 16:03 |
clarkb | fungi: and it appears related to the enablement of tls 1.3 via PROTOCOL_TLS | 16:03 |
clarkb | we could maybe update the bottom change to disable 1.3 for now? | 16:03 |
clarkb | ( I worry that is the sort of change that becomes permanent) | 16:04 |
*** hashar is now known as hasharAway | 16:06 | |
clarkb | guillaumec: maybe we can try to do a minimal reproducer forcing tls 1.3 between client and server and then asking both of them to stop? | 16:08 |
clarkb | guillaumec: since the test is timing out I suspect that it may just be a teardown/cleanup problem? | 16:08 |
*** hasharAway is now known as hashar | 16:11 | |
*** dhellmann has quit IRC | 16:12 | |
clarkb | fungi: TheJulia: ovs did not support vxlan over ipv6 until relatively recently (and even that may be spec defying?). One option may be to update the multi node bridge stuff to run it over ipv6 if present as that will get us on the preferred IP stack for providers like limestone | 16:13 |
*** dhellmann has joined #opendev | 16:13 | |
clarkb | though using codesearch I'm not sure that the multi node bridge stuff is involved? seems like this may all happen in devstack | 16:14 |
TheJulia | yeah, there is some magic there someplace in the entire multinode setup | 16:14 |
TheJulia | I have to hunt it down every single time I need to look at it :\ | 16:14 |
TheJulia | That might be an option. Interestingly enough, cross-node v4 seems to be fine in general, but we may just not be seeing everything that could be happening from the job logs, which makes it seem like everything is fine | 16:16 |
*** ysandeep|dinner is now known as ysandeep | 16:18 | |
clarkb | ya and vxlan is udp and could be more sensitive to those problems? | 16:18 |
*** klonn has joined #opendev | 16:19 | |
clarkb | another issue it could be is conflicting ip addrs | 16:20 |
clarkb | we saw that way back when osic was around because they assigned test node ips out of 10/8 and occasionally the overlays would overlap ip ranges and routing would break | 16:20 |
clarkb | ok it is using the multinode network setup via zuul. It does so with a patch interface between brbm and br-infra called phy-brbm-infra | 16:22 |
clarkb | and phy-infra-brbm | 16:22 |
clarkb | they are opposite ends of the same virtual cable | 16:22 |
clarkb | does not appear to be an ip conflict. br-infra uses 172.24.4.0/24 and the limestone nodes are 10.4.70.0/24 | 16:25 |
clarkb | that probably rules out the easy things, holding a couple of nodes and inspecting the result is likely the easiest way to debug | 16:29 |
clarkb | ianw: the mina sshd feature request lgtm. Also chris might be my hero | 16:38 |
kopecmartin | clarkb: fungi will this https://review.opendev.org/c/opendev/system-config/+/780272 be applied on the server automatically or is there a manual action required? | 16:39 |
fungi | kopecmartin: looks like we don't have a separate deploy job for it yet, so it should get applied in our hourly deployment i think? i'll check in a sec | 16:40 |
fungi | or it could be the deploy jobs haven't finished yet | 16:40 |
kopecmartin | great, thanks .. just wanted to be sure | 16:41 |
clarkb | there should be an infra-prod job, we may have to intervene and restart the service to pick up the config change though | 16:41 |
fungi | kopecmartin: oh, it just hasn't run yet, see the deploy pipeline at https://zuul.opendev.org/t/openstack/status | 16:41 |
clarkb | I notice ianw did that earlier for the fqdn switch | 16:41 |
fungi | there is an infra-prod-service-refstack build for it in waiting state, but it's a ways down the list | 16:42 |
fungi | and yeah, maybe the playbook needs a restart handler for config changes | 16:42 |
*** amoralej is now known as amoralej|off | 16:42 | |
*** marios is now known as marios|out | 16:45 | |
ttx | one of the jobs seems to have failed | 16:47 |
ttx | infra-prod-base on the deploy of 780272 | 16:47 |
fungi | yeah, the infra-prod-base job probably had trouble deploying to a down server somewhere, i'm about to go hunting in the logs on the bastion, it shouldn't affect refstack deployment unless it was the refstack server which was the problem | 16:48 |
ttx | ack | 16:48 |
*** marios|out has quit IRC | 16:48 | |
fungi | that's the job which does things like add our sysadmin accounts, set up mta configs, et cetera | 16:48 |
fungi | but it's running against every machine in our inventory, so if one is down/hung somewhere, that'll report a build failure | 16:49 |
fungi | d'oh! | 16:50 |
fungi | refstack.openstack.org : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0 | 16:50 |
fungi | so bridge can't reach refstack.openstack.org | 16:50 |
fungi | aha, expected | 16:51 |
fungi | refstack01.openstack.org : ok=60 changed=2 unreachable=0 failed=0 skipped=7 rescued=0 ignored=0 | 16:51 |
fungi | refstack01.openstack.org is working, but the refstack.openstack.org server in our inventory (i'm guessing the old one) is unreachable, probably offline in preparation for being deprovisioned but we haven't deleted it from the inventory yet | 16:51 |
clarkb | yup, ianw says the old server would be shutdown but not removed for now | 16:52 |
fungi | ttx: so that build failure is expected in this case | 16:52 |
fungi | i suppose we could have added that server to our disable list to avoid the deploy build trying to reach it and reporting failure | 16:55 |
fungi | something we could consider for future deprovisioning work | 16:55 |
TheJulia | clarkb: oh yeah, definitely way more sensitive | 16:56 |
clarkb | fungi: ya and maybe we should go ahead and add it now to prevent confusion until it is removed? | 16:57 |
TheJulia | fungi: ^^^ that is why we lowered the mtu a long time ago.... I remembered :( | 16:57 |
*** hashar has quit IRC | 16:59 | |
*** eolivare has quit IRC | 17:06 | |
*** jpena is now known as jpena|brb | 17:12 | |
fungi | TheJulia: i'm sorry, we should have waited to trigger those memories until beer time | 17:25 |
fungi | clarkb: good call, added it just now | 17:25 |
ttx | fungi: ok let me know when I should be testing again :) | 17:28 |
fungi | will do, looks like there are still three deploy jobs ahead of it | 17:36 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Enable srvr, stat and dump commands in the zk cluster https://review.opendev.org/c/opendev/system-config/+/780303 | 17:36 |
fungi | the semaphore those jobs use tends to slow this down quite a bit | 17:36 |
clarkb | corvus: ^ enabling those commands | 17:37 |
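(srvr, stat and dump are ZooKeeper "four letter word" admin commands; since ZooKeeper 3.5 they must be whitelisted in zoo.cfg before the server will answer them. A sketch of what the change enables and how the commands are typically used.)

```sh
# zoo.cfg needs something like: 4lw.commands.whitelist=srvr,stat,dump
# after which they can be queried over the client port:
echo srvr | nc localhost 2181   # server role, latency, znode counts
echo stat | nc localhost 2181   # srvr info plus connected clients
echo dump | nc localhost 2181   # outstanding sessions and ephemerals
```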
*** jpena|brb is now known as jpena | 17:54 | |
fungi | ttx: kopecmartin: refstack deployment finished at 17:49:57 utc, i'll check whether the service got restarted | 17:55 |
*** ralonsoh has quit IRC | 17:55 | |
fungi | looks like the container was last upped at 02:34 utc, according to ps | 17:56 |
fungi | also /var/refstack/refstack.conf was last modified on february 10, not sure if that's old. checking the bindmounts now | 17:57 |
clarkb | fungi: I need breakfast now that the nodepool launcher debugging is done, but I can help with refstack once I've eaten something | 17:58 |
fungi | aha, yeah that's cruft, it looks at /var/lib/refstack/etc/refstack.conf now and that was modified 17:49 | 17:58 |
fungi | openstack_openid_endpoint = https://openstackid.org/accounts/openid2 | 17:59 |
fungi | ttx: kopecmartin: so the config looks correct. will the service need a restart to see the updated refstack.conf file or does it reload it autonomously? sounds like ianw did an explicit restart to pick up an earlier config change | 17:59 |
kopecmartin | fungi: a restart will be needed | 18:01 |
kopecmartin | so that the config gets copied to the container and is applied | 18:02 |
*** ykarel has quit IRC | 18:04 | |
fungi | kopecmartin: okay, doing that now | 18:05 |
fungi | we should consider adding a handler to do that on config updates if that's safe, or abstract the configuration loading into something which can be triggered by a signal (or watch for file updates directly) | 18:05 |
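(The first option could look something like the sketch below in the refstack role; the task names, file paths and compose directory are assumptions, not the actual system-config playbook.)

```yaml
# Hypothetical sketch: restart the container only when the rendered
# config actually changes.
- name: Write refstack config
  template:
    src: refstack.conf.j2
    dest: /var/lib/refstack/etc/refstack.conf
  notify: Restart refstack

# handlers:
- name: Restart refstack
  shell: docker-compose down && docker-compose up -d
  args:
    chdir: /etc/refstack-docker
```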
fungi | it's on its way back up now | 18:06 |
*** artom has quit IRC | 18:06 | |
fungi | ttx: kopecmartin: i guess go ahead and test it now | 18:06 |
kopecmartin | fungi: \o/ it works!! thank you!! | 18:07 |
*** artom has joined #opendev | 18:08 | |
fungi | kopecmartin: no thanks needed, i just pushed a few buttons... but glad it's sorted now | 18:08 |
*** dtantsur is now known as dtantsur|afk | 18:10 | |
*** hamalq has joined #opendev | 18:14 | |
fungi | #status log Restarted the containers on refstack01 to pick up configuration change from https://review.opendev.org/780272 | 18:15 |
openstackstatus | fungi: finished logging | 18:15 |
*** smcginnis has quit IRC | 18:25 | |
clarkb | corvus: still planning to do a zuul restart today? queues aren't tiny but also not huge. Node demand is very low. | 18:26 |
clarkb | also openstack release team said they could extend to monday if necessary (they seemed ok with the friday plan) | 18:27 |
*** smcginnis has joined #opendev | 18:30 | |
corvus | clarkb: extend what? | 18:31 |
clarkb | corvus: feature freeze | 18:33 |
clarkb | (I kinda got the impression a few things were going to slip even if we did nothing so they were already considering it) | 18:33 |
corvus | clarkb: yeah, i agree nodes look good now, but maybe after lunch? | 18:34 |
clarkb | corvus: wfm. Though I'll be trying to enjoy this good weather on the bike this afternoon, but will be around before and after that | 18:35 |
fungi | i'll be around | 18:35 |
*** klonn has quit IRC | 18:38 | |
clarkb | fungi: now that we've had a day to think about it, any reasons to not move forward with simply retiring those accounts with the no external id for preferred email address problem? We theorize these are the result of fallout from other sql db based account mangling as we don't expect this is doable as a normal user. Also none of the accounts have been used in a year according to the audit script | 18:43 |
*** jpena is now known as jpena|off | 18:44 | |
fungi | no, i still think it seems like it should be entirely safe to retire those | 18:49 |
clarkb | ok I'll proceed with that now then | 18:49 |
clarkb | I went over the data again a bit more today and you can see for some of the accounts they clearly transitioned from one account to another (just builds more confidence this is the right move) | 18:50 |
*** LowKey has joined #opendev | 18:52 | |
fungi | yep, i expect they're all like that, it's just harder to connect the dots for a few since it happened years before | 18:52 |
*** andrewbonney has quit IRC | 18:54 | |
clarkb | alright that is done and logs have been uploaded to review | 19:00 |
clarkb | I'm going to do a consistency check next | 19:00 |
clarkb | #status log Corrected all Gerrit preferred email lacks external id account consistency problems. | 19:14 |
openstackstatus | clarkb: finished logging | 19:14 |
clarkb | still have quite a number of external id conflicts but this is progress | 19:14 |
clarkb | the consistency results are in my homedir on review | 19:14 |
* fungi will take a look shortly | 19:16 | |
*** hashar has joined #opendev | 20:07 | |
*** klonn has joined #opendev | 20:08 | |
corvus | clarkb, fungi: i'm going to start that restart now | 20:55 |
corvus | clarkb: did your nodepool change land? should we restart nodepool too? | 20:56 |
clarkb | corvus: it did land and I worked through them yesterday already | 20:56 |
corvus | clarkb: ok, so we'll just leave nodepool alone? | 20:56 |
clarkb | ya should be fine to leave nodepool alone | 20:56 |
fungi | cool, i'm here. need help? | 20:57 |
corvus | fungi: i don't think so; i'm just going to save queues then run the zuul_restart playbook | 20:57 |
fungi | i'm around to dig in if it goes pear shaped | 20:58 |
corvus | stopping now | 21:00 |
corvus | things are starting | 21:02 |
corvus | cat jobs are catting | 21:04 |
fungi | so catty | 21:06 |
fungi | our zuul is practically jellicle | 21:07 |
corvus | re-enqueing | 21:07 |
corvus | 2021-03-12 21:08:02,726 DEBUG zuul.RPCListener: Formatting tenant openstack status took 0.005 seconds for 93502 bytes | 21:08 |
corvus | that's a new log line btw | 21:08 |
*** sboyron has quit IRC | 21:08 | |
fungi | nice! i like the (albeit miniscule) measurement there | 21:08 |
corvus | see where that is when all the changes are re-enqueued :) | 21:08 |
fungi | i suppose it gets bigger when there's queue data | 21:08 |
fungi | heh, right that | 21:09 |
*** artom has quit IRC | 21:09 | |
clarkb | and we cache that for ~1second still right? | 21:09 |
corvus | yep | 21:09 |
fungi | last i looked at the apache config | 21:09 |
corvus | we cache internally too | 21:10 |
corvus | apache protects zuul-web, and zuul-web protects zuul-scheduler | 21:10 |
fungi | oh, right, the cache duration is expressed in the headers | 21:11 |
fungi | not hard-coded in the apache vhost config | 21:11 |
corvus | we're at about .03s for 500k so far | 21:11 |
corvus | (still enqueueing) | 21:11 |
*** whoami-rajat has quit IRC | 21:13 | |
corvus | #status log restarted all of zuul at commit 13923aa7372fa3d181bbb1708263fb7d0ae1b449 | 21:19 |
openstackstatus | corvus: finished logging | 21:19 |
corvus | re-enqueue is done. | 21:19 |
corvus | 2021-03-12 21:19:30,233 DEBUG zuul.RPCListener: Formatting tenant openstack status took 0.059 seconds for 877325 bytes | 21:20 |
corvus | that's looking typical | 21:20 |
corvus | sometimes it's higher, but it's not in the main thread, so can suffer from contention | 21:20 |
corvus | 0.1 looks to be the max | 21:20 |
clarkb | still well below the cache time which is why I was curious | 21:21 |
fungi | still fairly small, good sign | 21:21 |
fungi | we had almost no node request backlog prior to the restart, and the reenqueue really only shot it up to 500 briefly | 21:23 |
fungi | it's already burning down quickly | 21:23 |
fungi | we weren't even using max quota at the time of the restart, so seems like it was good timing | 21:24 |
clarkb | ya I expected even with feature freeze that friday would be much calmer | 21:24 |
fungi | everyone's already drinking | 21:24 |
fungi | why am i not drinking yet? | 21:24 |
clarkb | I'm not drinking because it is almost time to get some exercise | 21:25 |
fungi | time to exercise my liver | 21:25 |
corvus | then drinking | 21:25 |
clarkb | corvus: it is almost as warm here as there. I'm really excited | 21:25 |
fungi | it's 22.5c here | 21:26 |
fungi | crazy given this is technically still winter for more than a week | 21:26 |
clarkb | will get to 16 here in about an hour. I'm timing my outside time around that temp peak :) | 21:26 |
fungi | breezy but sunny. we should have this temperature all the time | 21:27 |
clarkb | if I go out in half an hour then my 1-1.5 hours outside should involve max warmth | 21:27 |
fungi | i should walk to the beach, but it's almost dinner | 21:27 |
*** hashar has quit IRC | 21:37 | |
*** smcginnis has quit IRC | 21:38 | |
*** smcginnis has joined #opendev | 21:44 | |
clarkb | looks like grafana says backlog is back to basically nil | 21:49 |
fungi | yes, we're back down under quota again | 21:50 |
fungi | i think that means the weekend is here | 21:50 |
clarkb | fungi: ruamel will serialize human readable yaml right? | 21:50 |
clarkb | I think my next step on the gerrit account work is to have the audit script spit out serialized data so I can write queries against it more easily | 21:50 |
fungi | clarkb: i don't know how to interpret some of those words, but it preserves ordering and comments | 21:50 |
fungi | it also comes at the cost of a spaghetti pile of ruamel libraries as dependencies | 21:51 |
clarkb | fungi: heh maybe "more human readable than pyyaml" is more accurate | 21:51 |
clarkb | I guess I can try pyyaml first | 21:51 |
clarkb | in particular what I want to start looking at is whether there are any more accounts that have broken openids regardless of previous activity, and I realized for that I should just try to serialize as much info as possible then write separate queries against it | 21:51 |
clarkb | also do you think we can land the tooling as proposed? | 21:52 |
fungi | workaround is to actually make comments in yaml (like have a "description" field, et cetera) | 21:52 |
clarkb | its been used a fair bit now and would make it easier for me when I switch between system-config branches to not have to always checkout that one branch to have the tools present | 21:52 |
fungi | er, yeah i'm not entirely understanding the "human readable" bit then | 21:53 |
fungi | if it's not about comments, then... | 21:53 |
*** smcginnis has quit IRC | 21:53 | |
fungi | you can make pyyaml emit more human-friendly yaml formats, you just need to configure it | 21:53 |
clarkb | fungi: maybe the pain has been in configuring it then | 21:54 |
fungi | https://mudpy.org/gitweb?p=mudpy.git;a=blob;f=mudpy/data.py;h=b73959a1b63d857657dbdd4f5afce32c3746e593;hb=HEAD#l161 | 21:55 |
fungi | i've overridden the dumper there specifically to force it to indent lists, but you can probably ignore that | 21:55 |
fungi | the end result though is to make pyyaml write files that yamllint can stomach | 21:56 |
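(The relevant trick, as a minimal sketch: subclass the dumper so block sequences are indented under their parent key, and disable flow style; that is usually enough to make pyyaml output that yamllint accepts.)

```python
import yaml


class IndentedDumper(yaml.SafeDumper):
    """Indent block sequences under their parent key, the way yamllint
    expects, instead of pyyaml's flush-left default."""

    def increase_indent(self, flow=False, indentless=False):
        return super().increase_indent(flow, False)


# Illustrative data only, e.g. serialized gerrit account audit results.
accounts = {"accounts": [{"id": 1234, "emails": ["user@example.org"]}]}
print(yaml.dump(accounts, Dumper=IndentedDumper,
                default_flow_style=False, sort_keys=False))
```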
fungi | i fond it mildly incoherent that yamllint (written in python) objects to the default output of the most commonly-used python yaml implementation, but i've come to terms with that | 21:58 |
fungi | s/fond/find/ | 21:58 |
clarkb | is pyyaml optimized for wire transfers by default? I seem to recall there may be reasons like that | 21:58 |
fungi | yeah, could be | 21:59 |
fungi | anyway, feel free to steal that, it's all isc licensed. maybe you want the indented lists too, the _IBSEmitter class is not that complicated to add | 22:00 |
clarkb | thanks | 22:00 |
*** klonn has quit IRC | 22:05 | |
fungi | i keep meaning to push a pr to pyyaml to make that configurable, but... enotime | 22:32 |
*** gothicserpent has quit IRC | 23:14 | |
*** gothicserpent has joined #opendev | 23:20 |