ianw | i'm not sure if we've been updating or deleting dns | 00:00 |
---|---|---|
clarkb | ianw: the nameserver playbook was one of those that ran yesterday during the great big enqueuing and it was successful | 00:03 |
clarkb | I didn't look at the logs though | 00:03 |
ianw | oh sorry i meant for the old servers, it looks like some we haven't cleaned up | 00:03 |
ianw | i'll add it to the todos | 00:03 |
clarkb | oh in rax managed dns, got it | 00:04 |
ianw | i'm also going to merge the ord.rax.openstack removal, as that is also shutdown and currently in emergency | 00:05 |
ianw | that will finalise ord+dfw | 00:06 |
ianw | iad to be switched in when we feel like watching it | 00:06 |
ianw | this leaves limestone and linaro-london | 00:06 |
ianw | clarkb: you have two shutoff test instances in limestone ... do you want them? | 00:08 |
clarkb | ianw: uhm I think they can go I was debugging something there iirc. what are they called? | 00:09 |
ianw | clarkb-test1 and clarkb-test2 :) | 00:09 |
clarkb | ya those can go | 00:09 |
clarkb | I think they were debugging a weird neutron thing that never resulted in much | 00:09 |
openstackgerrit | Merged opendev/system-config master: Remove mirror02.dfw.rax.openstack.org https://review.opendev.org/727894 | 00:13 |
ianw | we also had an old unattached "afs" volume in there | 00:14 |
openstackgerrit | Merged opendev/system-config master: Remove mirror01.ord.rax.openstack.org https://review.opendev.org/727897 | 00:20 |
ianw | kevinz: do you have thoughts on the linaro london zone? will it ever come back, or should we remove it? | 00:22 |
openstackgerrit | Ian Wienand proposed opendev/zone-opendev.org master: Add limestone replacement mirror https://review.opendev.org/728314 | 00:25 |
openstackgerrit | Merged opendev/system-config master: Add infra-root-keys-2020-05-13 to rotate older ssh keys https://review.opendev.org/727865 | 00:29 |
openstackgerrit | melanie witt proposed zuul/zuul-jobs master: Run sphinx-build in parallel for releasenotes https://review.opendev.org/727473 | 00:35 |
mordred | ianw: sorry. wasn't really here, still not really here. it's entirely probable that the -arm64 image in /opt/images on bridge is garbage and should be rebuilt | 00:38 |
mordred | ianw: I built the others using /opt/nodepool_dib/make-focal.sh on nb04.opendev.org and then scp'd them over to bridge | 00:39 |
ianw | mordred: ok, i've been just using an upstream amd64 image for the latest couple of mirror servers, just to get things going. we may have to debug glean more on focal, although i don't recall seeing any gate failures | 00:44 |
openstackgerrit | Monty Taylor proposed opendev/base-jobs master: Add jobs for publishing javascript content https://review.opendev.org/728097 | 00:47 |
mordred | ianw: nod. we should probably try a re-build now that you added focal support properly too | 00:48 |
smcginnis | Anyone know if this new docs job step was just recently added: "tox -e pdf-docs -vv > [file]" | 00:48 |
ianw | smcginnis: not much help, i remember some pdf things months ago but nothing recently | 00:52 |
clarkb | smcginnis: yes | 00:52 |
clarkb | change was landed today because jobs were producing hundreds of megabytes of logs in latex warnings | 00:53 |
smcginnis | Looks like the normal step ignores errors, but then tries to run again and fails. | 00:53 |
smcginnis | https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_31c/725767/9/check/openstack-tox-docs/31c3adb/job-output.txt | 00:53 |
smcginnis | I've seen multiple failures in different repos. | 00:53 |
openstackgerrit | Merged opendev/zone-opendev.org master: Add vexxhost opendev.org mirrors https://review.opendev.org/728308 | 00:53 |
openstackgerrit | Merged opendev/zone-opendev.org master: Add limestone replacement mirror https://review.opendev.org/728314 | 00:53 |
smcginnis | Not every repo does the pdf builds, so it needs to ignore errors. | 00:53 |
clarkb | smcginnis: ya https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_31c/725767/9/check/openstack-tox-docs/31c3adb/sphinx-build-pdf.log I think we need the same check that the include role before it has | 00:54 |
clarkb | I can push a fix soon | 00:55 |
smcginnis | clarkb: Yeah, looks like it needs the "when" part? https://opendev.org/openstack/openstack-zuul-jobs/commit/edc845f0a6da5e7e278eff7d9cc54a261ee531aa | 00:55 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Add limestone opendev.org server https://review.opendev.org/728316 | 00:56 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Switch limestone server to opendev.org https://review.opendev.org/728318 | 01:08 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Remove limestone openstack.org mirror https://review.opendev.org/728319 | 01:15 |
*** ysandeep|sleep is now known as ysandeep | 01:34 | |
kevinz | Hi ianw: yes we can remove it. We will only use linaro US in the future | 02:00 |
ianw | kevinz: ok, thanks, that will simplify things | 02:02 |
ianw | clarkb: if around, not sure what you mean by cleanup from nb04? that doesn't have a separate config file now? | 02:07 |
ianw | i agree we can remove the server if we want | 02:08 |
clarkb | ianw oh I think frickler and I assumed there was still a separate file | 02:09 |
clarkb | I checked inventory for it (its there) but not project-config | 02:09 |
ianw | ok, i'll stack linaro-london removal on top of it too | 02:11 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Remove linaro-london cloud https://review.opendev.org/728332 | 02:17 |
*** icarusfactor has joined #opendev | 02:25 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Remove citycloud https://review.opendev.org/727905 | 02:26 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Remove linaro-london cloud https://review.opendev.org/728334 | 02:26 |
*** factor has quit IRC | 02:26 | |
ianw | i believe that is everything. i'll try and stack it all | 02:49 |
*** icarusfactor has quit IRC | 02:51 | |
*** icarusfactor has joined #opendev | 02:52 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Remove citycloud https://review.opendev.org/727905 | 03:06 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Remove linaro-london cloud https://review.opendev.org/728334 | 03:06 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Add vexxhost opendev.org mirrors https://review.opendev.org/728309 | 03:06 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Add limestone opendev.org server https://review.opendev.org/728316 | 03:06 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Remove vexxhost openstack.org mirrors https://review.opendev.org/728311 | 03:06 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Remove limestone openstack.org mirror https://review.opendev.org/728319 | 03:06 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Remove citycloud https://review.opendev.org/727905 | 03:12 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Remove linaro-london cloud https://review.opendev.org/728334 | 03:12 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Add vexxhost opendev.org mirrors https://review.opendev.org/728309 | 03:12 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Add limestone opendev.org server https://review.opendev.org/728316 | 03:12 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Remove vexxhost openstack.org mirrors https://review.opendev.org/728311 | 03:12 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Remove limestone openstack.org mirror https://review.opendev.org/728319 | 03:12 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Remove iad.rax openstack.org mirror https://review.opendev.org/728339 | 03:12 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Switch RAX IAD mirror to opendev.org version https://review.opendev.org/727917 | 03:21 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Remove citycloud kna1/lon1/sto2 clouds https://review.opendev.org/727902 | 03:21 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Remove Citycloud from grafana https://review.opendev.org/727956 | 03:21 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Remove linaro-london cloud https://review.opendev.org/728332 | 03:21 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Switch vexxhost mirrors to opendev.org https://review.opendev.org/728310 | 03:21 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Switch limestone server to opendev.org https://review.opendev.org/728318 | 03:21 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: site-variables: remove opendev.org mirror switch https://review.opendev.org/728345 | 03:21 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Remove puppet mirror support https://review.opendev.org/728350 | 03:32 |
*** kevinz has quit IRC | 03:37 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Remove puppet mirror support https://review.opendev.org/728350 | 03:40 |
ianw | my new work entitled "how to remove openstack.org mirrors in 15 easy steps" | 03:42 |
*** factor has joined #opendev | 03:45 | |
*** kevinz has joined #opendev | 03:46 | |
*** icarusfactor has quit IRC | 03:47 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Remove linaro-london cloud https://review.opendev.org/728334 | 03:49 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Add vexxhost opendev.org mirrors https://review.opendev.org/728309 | 03:49 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Add limestone opendev.org server https://review.opendev.org/728316 | 03:49 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Remove vexxhost openstack.org mirrors https://review.opendev.org/728311 | 03:49 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Remove limestone openstack.org mirror https://review.opendev.org/728319 | 03:49 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Remove puppet mirror support https://review.opendev.org/728350 | 03:49 |
*** factor has quit IRC | 03:51 | |
*** factor has joined #opendev | 03:52 | |
*** ykarel|away is now known as ykarel | 03:54 | |
*** ysandeep is now known as ysandeep|afk | 04:51 | |
ianw | clarkb: so, the tl;dr for the mirror work is start @ https://review.opendev.org/#/c/728339/ and follow the depends-on and changes on top of that | 05:07 |
ianw | it alternates between removing things from project-config and then system-config ... but it should all unravel. probably need to wait between deployments for some of it | 05:07 |
ianw | the "did read" version is at https://etherpad.opendev.org/p/openstack.org-mirror-be-gone | 05:08 |
*** DSpider has joined #opendev | 05:16 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: ubuntu-minimal : only install 16.04 HWE kernel on xenial https://review.opendev.org/726996 | 05:39 |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: ubuntu-minimal: Add Ubuntu Focal test build https://review.opendev.org/725752 | 05:39 |
*** diablo_rojo has quit IRC | 05:54 | |
*** dpawlik has joined #opendev | 05:58 | |
*** ysandeep|afk is now known as ysandeep | 06:03 | |
frickler | infra-root: the ubuntu mirror still seems to be 7d old, it is also quite close to its quota, not sure if that might be related | 06:29 |
openstackgerrit | Merged openstack/project-config master: Switch RAX IAD mirror to opendev.org version https://review.opendev.org/727917 | 06:32 |
frickler | re-running the vos release in the screen on afs01.dfw | 06:35 |
*** ykarel is now known as ykarel|afk | 06:49 | |
frickler | o.k., that finished pretty fast and now /afs/openstack.org/mirror/ubuntu/timestamp.txt has the same May 13 date as the .openstack.org version | 06:53 |
frickler | still waiting for someone to confirm and check the quota situation before dropping the lock on mirror-update | 06:53 |
ianw | frickler: it looks like 40-50gb free; i don't oppose upping it if you like but probably not related to any issues i'd say? | 07:03 |
*** rpittau|afk is now known as rpittau | 07:12 | |
*** DSpider has quit IRC | 07:22 | |
*** DSpider has joined #opendev | 07:27 | |
*** tosky has joined #opendev | 07:35 | |
*** ykarel|afk is now known as ykarel | 07:57 | |
*** moppy has quit IRC | 08:01 | |
*** moppy has joined #opendev | 08:01 | |
*** slaweq has joined #opendev | 08:04 | |
*** tkajinam has quit IRC | 08:14 | |
*** dtantsur|afk is now known as dtantsur | 08:17 | |
*** ykarel is now known as ykarel|lunch | 08:27 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: DNM: Debug sibling install https://review.opendev.org/728384 | 08:44 |
*** ysandeep is now known as ysandeep|lunch | 09:01 | |
*** ykarel|lunch is now known as ykarel | 09:07 | |
frickler | #status log vos release for mirror.ubuntu completed successfully, dropped the lock on mirror-update to resume normal operations | 09:14 |
openstackstatus | frickler: finished logging | 09:14 |
frickler | fungi: ^^ I left your screens in place in case you want to crosscheck, feel free to clean up | 09:15 |
*** ysandeep|lunch is now known as ysandeep | 09:39 | |
*** rpittau is now known as rpittau|bbl | 10:09 | |
*** dpawlik has quit IRC | 10:27 | |
*** dpawlik has joined #opendev | 10:28 | |
openstackgerrit | Sorin Sbarnea (zbr) proposed zuul/zuul-jobs master: bindep: Add missing virtualenv and fixed repo install https://review.opendev.org/693637 | 10:31 |
openstackgerrit | Merged openstack/project-config master: Remove citycloud kna1/lon1/sto2 clouds https://review.opendev.org/727902 | 12:05 |
*** rpittau|bbl is now known as rpittau | 12:06 | |
openstackgerrit | Merged openstack/project-config master: Remove Citycloud from grafana https://review.opendev.org/727956 | 12:07 |
openstackgerrit | Merged openstack/project-config master: Remove linaro-london cloud https://review.opendev.org/728332 | 12:07 |
*** sshnaidm|afk is now known as sshnaidm|off | 12:07 | |
*** ysandeep is now known as ysandeep|afk | 12:16 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: Add remove-zuul-sshkey https://review.opendev.org/680712 | 12:24 |
*** ysandeep|afk is now known as ysandeep | 12:29 | |
*** panda|out is now known as panda | 12:30 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Fix broken tox-siblings.yaml test https://review.opendev.org/728438 | 12:55 |
*** ykarel is now known as ykarel|afk | 12:56 | |
openstackgerrit | Monty Taylor proposed opendev/base-jobs master: Add jobs for publishing javascript content https://review.opendev.org/728097 | 13:00 |
mordred | AJaeger: how does that look now? ^^ | 13:06 |
mordred | AJaeger: I updated https://review.opendev.org/#/c/726554/ to use it | 13:06 |
mordred | (of course config project, so that patch doesn't work yet) | 13:06 |
openstackgerrit | Sorin Sbarnea (zbr) proposed zuul/zuul-jobs master: tox: allow tox to be upgraded https://review.opendev.org/690057 | 13:13 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Fix broken tox-siblings.yaml test https://review.opendev.org/728438 | 13:32 |
openstackgerrit | Merged zuul/zuul-jobs master: Add remove-zuul-sshkey https://review.opendev.org/680712 | 13:33 |
fungi | frickler: awesome, thanks for following up! | 13:46 |
fungi | it was at the top of my to do list for when i woke up, so that's a nice start to my morning ;) | 13:47 |
*** kevinz has quit IRC | 13:50 | |
*** mlavalle has joined #opendev | 13:59 | |
*** icarusfactor has joined #opendev | 14:02 | |
*** factor has quit IRC | 14:05 | |
*** ykarel|afk is now known as ykarel | 14:10 | |
*** avass has quit IRC | 14:16 | |
openstackgerrit | Merged zuul/zuul-jobs master: Fix broken tox-siblings.yaml test https://review.opendev.org/728438 | 14:19 |
*** kevinz has joined #opendev | 14:28 | |
donnyd | is there any way to run dib-lint without checking the lib dir | 14:44 |
donnyd | I am trying to write standalone elements that are tracked in separate repos, and I want to just lint element | 14:44 |
donnyd | I don't see a way to do that in here https://opendev.org/openstack/diskimage-builder/src/tag/2.36.0/bin/dib-lint | 14:45 |
fungi | yeah, not sure, you might ask in #openstack-dib | 14:52 |
donnyd | oh, didn't know that was a thing | 14:52 |
donnyd | thanks fungi | 14:52 |
fungi | yw | 14:52 |
AJaeger | infra-rot, https://zuul.opendev.org/ is timing out for me | 14:57 |
AJaeger | infra-root, I mean ^ | 14:58 |
mordred | wfm | 14:59 |
AJaeger | now as well for me | 15:00 |
AJaeger | might have been a longer network problem | 15:00 |
fungi | yeah, a cursory check of the server yields nothing surprising | 15:06 |
AJaeger | I now have other network problems as well, so might be on my end - thanks for checking | 15:11 |
fungi | no problem, i'd rather catch problems early if we have any | 15:16 |
*** ykarel is now known as ykarel|away | 15:16 | |
*** avass has joined #opendev | 15:18 | |
clarkb | ++ | 15:32 |
AJaeger | :) | 15:34 |
clarkb | 728286 merged. That would be a good one to restart the zuul scheduler on. I think our zuul-web memory canary is still happy too | 15:34 |
*** ysandeep is now known as ysandeep|away | 15:36 | |
*** dpawlik has quit IRC | 15:41 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Don't require tox_envlist https://review.opendev.org/726829 | 15:44 |
clarkb | mordred: we might need to remove the +A from https://review.opendev.org/#/c/728310/ and https://review.opendev.org/#/c/728318/2 because I'm not sure the depends on is sufficient to ensure we have a running new server prior to updating site-variables? | 15:47 |
clarkb | mordred: and maybe you can double check my comment on https://review.opendev.org/#/c/728334/4 | 15:51 |
openstackgerrit | Jeremy Stanley proposed opendev/jeepyb master: Update OpenDev Manual URL in new contributor intro https://review.opendev.org/728479 | 15:53 |
openstackgerrit | Merged opendev/system-config master: Remove iad.rax openstack.org mirror https://review.opendev.org/728339 | 16:10 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Don't require tox_envlist https://review.opendev.org/726829 | 16:12 |
openstackgerrit | Merged opendev/system-config master: Remove citycloud https://review.opendev.org/727905 | 16:18 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Remove unecessary environment from tox_siblings test https://review.opendev.org/728494 | 16:21 |
openstackgerrit | Jeremy Stanley proposed opendev/jeepyb master: Update OpenDev Manual URL in new contributor intro https://review.opendev.org/728479 | 16:29 |
*** cmurphy is now known as cmorpheus | 16:33 | |
openstackgerrit | Merged openstack/project-config master: Finish retiring x/pbrx https://review.opendev.org/726463 | 16:35 |
clarkb | infra-root I think I'm in a good spot to restart the zuul scheduler and mergers (and executors and web I guess if we do the whole thing). Looking at running jobs it looks like tripleo may have a whole stack of things merging in the next half hour so maybe plan for ~17:15 ish restart? | 16:36 |
clarkb | I think my change to add a start.yaml for executors landed so we can use the big restart playbook | 16:37 |
clarkb | this will get us the "don't reload configs for tags" and "fix config errors when projects are used across tenants in exciting ways" bugfixes | 16:37 |
clarkb | as well as python3 usage for scheduler and mergers | 16:37 |
clarkb | *python3.7 | 16:40 |
mordred | clarkb: yes - I agree with your comment on 728334 | 16:43 |
corvus | clarkb: sounds good; i'm back from running an errand | 16:46 |
fungi | i've also got my lunchtime chores out of the way and will be on hand for a while | 16:48 |
openstackgerrit | Oleksandr Kozachenko proposed zuul/zuul-jobs master: Add DaemonSet check for wait-for-pods role https://review.opendev.org/728503 | 16:52 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Run Zuul, Nodepool, and Zookeeper as the "container" user https://review.opendev.org/726958 | 17:04 |
*** mlavalle has quit IRC | 17:05 | |
clarkb | zuul says it's ~5 minutes until the end of that stack. I'll start a root screen on bridge | 17:05 |
*** mlavalle has joined #opendev | 17:06 | |
clarkb | I've got the command queued up there ready to go if it looks good (currently prefixed by # so it won't run early) | 17:07 |
clarkb | maybe someone else can capture and restore queues? | 17:07 |
corvus | clarkb: sure | 17:07 |
clarkb | corvus: thanks. We are just waiting on tripleo changes to merge (or fail) now | 17:08 |
*** dtantsur is now known as dtantsur|afk | 17:08 | |
openstackgerrit | James E. Blair proposed opendev/system-config master: Add iptables_extra_allowed_groups https://review.opendev.org/726475 | 17:10 |
clarkb | infra-root we've also got a few deploy things queued up. I'm not sure how important it is to wait for those | 17:11 |
clarkb | tripleo gate just cleared out | 17:11 |
corvus | clarkb: they will get re-enqueued | 17:11 |
*** rpittau is now known as rpittau|afk | 17:11 | |
clarkb | corvus: let me know when you are ready for me to run the ansible-playbook command in the bridge screen | 17:11 |
corvus | clarkb: and most things should be idempotent, so i think the main risk would be if we killed it while it was in the middle of something important. i say we yolo it's friday. | 17:11 |
openstackgerrit | Oleksandr Kozachenko proposed zuul/zuul-jobs master: Add DaemonSet check for wait-for-pods role https://review.opendev.org/728503 | 17:11 |
clarkb | corvus: rgr | 17:12 |
corvus | clarkb: i have saved queues; clear to go | 17:12 |
clarkb | corvus: ok proceeding in bridge screen now | 17:12 |
clarkb | we are in the wait for executors to stop phase | 17:13 |
clarkb | everything else seems to have stopped | 17:13 |
clarkb | zuul process count on ze01 is slowly falling | 17:15 |
clarkb | (still waiting but numbers look to be dropping, just need patience) | 17:19 |
clarkb | 5 executors have stopped | 17:22 |
clarkb | now waiting on only ze10 | 17:23 |
clarkb | playbook is done; it claims to have started things | 17:24 |
clarkb | I see the expected processes running | 17:24 |
Open10K8S | Hi team | 17:24 |
Open10K8S | Zuul is not responding | 17:24 |
Open10K8S | 503 service unavailable | 17:24 |
corvus | cat jobs are away | 17:25 |
clarkb | scheduler has submitted merge jobs as expected | 17:25 |
clarkb | Open10K8S: we are doing a global restart to pick up a number of changes that got queued up behind openstack release | 17:25 |
clarkb | Open10K8S: it should be up in a few more minutes | 17:25 |
Open10K8S | clarkb: ok, got it. | 17:26 |
Open10K8S | Scheduled downtime is not a bad thing :) | 17:26 |
clarkb | well semi scheduled. Its friday and we had a good window :) | 17:26 |
Open10K8S | clarkb: Just want to check if it is an unscheduled one :) | 17:26 |
Open10K8S | clarkb: Just notifying you to check if it is an unscheduled one :) | 17:26 |
clarkb | ++ and thanks | 17:26 |
Open10K8S | clarkb: pleasure | 17:26 |
clarkb | corvus: looks like scheduler is up now | 17:27 |
corvus | re-enqueuing | 17:27 |
clarkb | thats good, now we've tested the new version of the zuul restart playbook | 17:28 |
clarkb | when you're happy with it I'll status log what this means for the running zuul | 17:28 |
corvus | clarkb: re-enqueue is still going, but i think we see enough jobs actually running we can call it good | 17:29 |
clarkb | great | 17:29 |
fungi | yeah, lgtm | 17:29 |
mordred | Open10K8S: people sometimes ask us "what sort of monitoring system do you use" and the answer is "our users" ;) | 17:29 |
corvus | it's made out of people | 17:30 |
Open10K8S | mordred: loll, yeah | 17:30 |
clarkb | #status log Restarted All of Zuul on version: 3.18.1.dev166 319bbacf. This has scheduler, web, and mergers running under python3.7. We have also incorporated bug fixes for config loading (handle errors across tenants and don't load from tags) as well as improvements to the merger around resetting repos and setting HEAD. | 17:31 |
openstackstatus | clarkb: finished logging | 17:31 |
AJaeger | mordred: what shall we do with openstackci in https://review.opendev.org/720892 ? That one is still used. | 17:31 |
AJaeger | mordred: apparently for nb03 | 17:31 |
clarkb | AJaeger: mordred I think we can keep openstackci testing for now and do cleanup for other bits, then when nb03 is dockered we can clean it and openstackci up? | 17:34 |
mordred | yeah | 17:34 |
fungi | as a side note, our zuul scheduler vm has been up 2 years 4 months since its last reboot | 17:35 |
mordred | fungi: wow | 17:35 |
fungi | who says clouds aren't stable? | 17:35 |
mordred | fungi: "they" | 17:35 |
clarkb | infra-root any reason to keep the bridge screen running? the zuul_restart.yaml playbook ran with no errors | 17:35 |
fungi | clarkb: i know of none | 17:36 |
mordred | clarkb, fungi: if you've got a sec, https://review.opendev.org/#/c/728097/ - needed for https://review.opendev.org/#/c/717371/ | 17:36 |
mordred | clarkb: nope | 17:36 |
corvus | clarkb: nope. and reenqueue is done. | 17:36 |
clarkb | cool I'll exit out of it now | 17:36 |
clarkb | Open10K8S: now is a good time to recheck your change if it isn't queued already | 17:37 |
clarkb | (zuul may have missed the latest patchset events while it was off) | 17:37 |
openstackgerrit | James E. Blair proposed openstack/project-config master: Use serial manager for deploy pipeline https://review.opendev.org/728532 | 17:38 |
corvus | clarkb, mordred, fungi: ^ that's available to us now | 17:39 |
corvus | i think we can drop the mutex after that | 17:39 |
clarkb | corvus: we also run hourly and daily jobs via periodic pipelines. Maybe we need to serialize those too? (I think we may need our own daily pipeline for that too) | 17:39 |
mordred | clarkb: \o/ | 17:40 |
clarkb | mordred: looking at the js thing now | 17:40 |
mordred | clarkb: thanks | 17:40 |
Open10K8S | clarkb: ok | 17:40 |
corvus | clarkb: i don't think they need serial; if anything, i'd say supercedent | 17:40 |
corvus | but not sure it'd work there | 17:40 |
fungi | also we still have a toctou sort of bug with delayed execution in periodic jobs, right? | 17:41 |
fungi | racing with the deploy pipeline too, i'd wager | 17:42 |
clarkb | oh and we'd need the mutex to handle cross pipeline jobs | 17:42 |
clarkb | ya | 17:42 |
clarkb | so we may still need the mutex? | 17:42 |
fungi | or did we fix that by not using the zuul ref in periodic? | 17:42 |
corvus | i think the decision was made to not use the zuul ref; unsure if it has been implemented | 17:43 |
clarkb | oh right | 17:44 |
clarkb | and ya not sure if we implemented it yet. I seem to recall reviewing a change mordred wrote around it though | 17:44 |
Open10K8S | clarkb: seems like the queued/waiting status is persisting anyway | 17:45 |
mordred | god, did I write that? | 17:46 |
mordred | yes! | 17:48 |
mordred | infra_prod_run_from_master | 17:48 |
mordred | is the key | 17:48 |
clarkb | mordred: looking at the js change I feel like we've already got what we need for that but we call it "docs" or similar? | 17:51 |
clarkb | mordred: we aren't doing anything js specific there, we're just saying pull tarballs from these builds and extract them to $location | 17:52 |
Open10K8S | https://zuul.opendev.org/t/zuul/status | 17:52 |
Open10K8S | queuing for 18mins :( | 17:52 |
clarkb | Open10K8S: all the jobs had to restart so we'll probably be at capacity for a bit while we catch back up again | 17:52 |
clarkb | mordred: AJaeger: that makes me wonder if we shouldn't just have a "extract-tarball-to-afs" job | 17:52 |
AJaeger | or role ;) | 17:53 |
AJaeger | clarkb: good spotting | 17:53 |
mordred | clarkb: yeah - I agree | 17:54 |
clarkb | anyway I +2'd the change but didn't approve in case that dedup was something others thought worthwhile | 17:54 |
clarkb | (left what I said above on the change too) | 17:54 |
mordred | clarkb: I think it can be done as a followup - extract an extract-tarball-to-afs job and then make the docs and javascript jobs child jobs with some vars set | 17:55 |
clarkb | mordred: k. in that case feel free to approve | 17:55 |
mordred | clarkb: kk. I'm about to be afk for a bit, so I'll hold off just so I can poke at things | 17:56 |
clarkb | ya I'm gonna keep an eye on zuul for a bit longer then try and get a bike ride in before the ussuri celebration thing later today | 17:58 |
clarkb | I even ordered beer that should be arriving before that call :) | 17:58 |
corvus | clarkb: when is that? | 17:58 |
clarkb | corvus: 2000UTC (1pm pacific) | 17:59 |
corvus | thx | 17:59 |
clarkb | corvus: I think meetpad is the planned venue but I don't know the specific url | 17:59 |
*** slaweq has quit IRC | 18:12 | |
openstackgerrit | Merged zuul/zuul-jobs master: bindep: use virtualenv_command from ensure-pip https://review.opendev.org/727561 | 18:16 |
*** iurygregory has quit IRC | 18:35 | |
fungi | clarkb: corvus: according to http://lists.openstack.org/pipermail/openstack-discuss/2020-May/014781.html the room name is virtual-ussuri-celebration | 18:37 |
*** iurygregory has joined #opendev | 18:38 | |
fungi | i also have a beer i'm preparing to crack open in ~1.5 hours | 18:38 |
mnaser | infra-root: did nodepool get a restart too? | 18:39 |
clarkb | mnaser: no | 18:39 |
clarkb | Im gonna pop out in that bike ride now | 18:39 |
mnaser | hmm, ok, i am seeing a lot of odd queries on sjc1 gathering lists of all vms in 86bbbcfa8ad043109d2d7af530225c72 (aka openstackjenkins) | 18:40 |
mnaser | no nodepool changes seem to stick out | 18:40 |
fungi | the launchers do periodically query the full list of instances to look for leaked nodes with their metadata set but which they've otherwise lost track of | 18:44 |
*** iurygregory has quit IRC | 18:45 | |
openstackgerrit | Merged openstack/project-config master: Use serial manager for deploy pipeline https://review.opendev.org/728532 | 18:46 |
*** icarusfactor has quit IRC | 18:47 | |
mnaser | i do wonder how often they're listing all instances though | 19:00 |
mnaser | cause "pretty damn often" seems to be from how it's hitting the api (and i just kinda ran into a slight nova overoptimization that's hurting it) | 19:00 |
fungi | what frequency of those queries are you seeing? curious if it matches our cleanup frequency | 19:03 |
fungi | i think that's tunable on our side but will need to dig into it | 19:03 |
mnaser | fungi: i haven't looked but the db seems to be getting hit a lot | 19:04 |
fungi | i thought we only queried it every 10 minutes | 19:07 |
fungi | looking | 19:07 |
mnaser | fungi: i think they may be like accumulating and then retrying? | 19:08 |
mnaser | like even when i kill it i instantly see like over 20+ queries/attempts | 19:09 |
mnaser | unless thats nova retrying | 19:09 |
fungi | looks like the nodepool.launcher.NodePool class hard-codes its cleanup_interval to 60 (seconds) and i can't seem to find any configuration option to adjust that (even globally, much less per provider): https://opendev.org/zuul/nodepool/src/branch/master/nodepool/launcher.py#L874 | 19:16 |
mnaser | yeah but i think another reason here is the way that quota is being calculated | 19:17 |
mnaser | quota checks did those queries and the big burst of nodepool provisions hit the cloud hard i think | 19:20 |
fungi | mnaser: i do see some tracebacks in the launcher logs on nl03 about cleanupworker getting "Server Error for url: https://compute-sjc1.vexxhost.us/v2.1/os-volumes_boot, No server is available to handle this request.: 503 Service Unavailable" | 19:31 |
fungi | the cleanup_interval seems to be the wait time between completion of the previous pulse and the start of the next, so taking pulse runtime into account it looks like it's waking up roughly every two minutes | 19:33 |
fungi | on nl03 anyway | 19:33 |
fungi | it's relatively infrequent, but happened at 17:49, 18:28, 18:35, 18:45, 18:54 | 19:35 |
fungi | those are the only occurrences in the past 6 hours though | 19:35 |
fungi | of the http 503 errors i mean | 19:36 |
corvus | mnaser: i can turn on more debugging if it would help to know when we're sending what api queries | 19:37 |
mnaser | corvus: i think we're able to troubleshoot, i think it might be something in the nova quota code, we're debugging in #openstack-nova :) thank you tho! | 19:37 |
corvus | mnaser: ack. this anomaly is interesting: http://grafana.openstack.org/d/nuvIH5Imk/nodepool-vexxhost?orgId=1 | 19:38 |
mnaser | corvus: we found out that a certain query wasn't optimized that could result in >16s per server boot, i assume post-zuul reboot, we got a big surge of new vm requests, 16s each really started straining the db | 19:38 |
corvus | mnaser: yeah, i think that lines up with the graph | 19:39 |
openstackgerrit | Sorin Sbarnea (zbr) proposed zuul/zuul-jobs master: Bump ansible-lint to 4.3.0 https://review.opendev.org/702679 | 19:39 |
*** noonedeadpunk has quit IRC | 19:42 | |
*** noonedeadpunk has joined #opendev | 19:42 | |
corvus | yep this is not a valid iptables rule: https://zuul.opendev.org/t/openstack/build/ec6dea2a44a04fd0aa03c287be6e59da/log/zk01.opendev.org/rules.v4.txt#18 | 19:44 |
corvus | maybe that's running into mordred's "sometimes hostvars don't seem to be set" thing....? | 19:45 |
corvus | oh! | 19:46 |
corvus | our fake inventory in gate jobs doesn't have "public_ipv4" and "public_ipv6" set | 19:46 |
corvus | only "ansible_host" | 19:47 |
corvus | hrm, we need to map nodepool.public_ipv4 to public_ipv4 when we rewrite the inventory | 19:51 |
corvus | actually, nodepool.public_ipv4 -> public_v4 | 19:51 |
corvus | party time? | 20:01 |
fungi | party time! | 20:01 |
clarkb | yup I've just gotten back to my desk too | 20:02 |
fungi | we're up to 13 in the room | 20:02 |
fungi | up to 15 now, though audio is getting choppy for me | 20:04 |
fungi | 17 people on now | 20:05 |
fungi | is audio inaudibly choppy for anyone else, or is it just my connection? | 20:06 |
clarkb | it's been mostly ok for me | 20:08 |
openstackgerrit | Oleksandr Kozachenko proposed zuul/zuul-jobs master: Add DaemonSet check for wait-for-pods role https://review.opendev.org/728503 | 20:08 |
clarkb | depends on the person. diablo_rojo_phon and bnemec and tim are coming through ok | 20:08 |
clarkb | ttx was harder to understand | 20:08 |
fungi | i think it's me, my load average is up to 10 now, tons of chromium worker processes | 20:08 |
fungi | this is a 4-core 1.6ghz intel atom x7 so maybe i just need to use a beefier machine | 20:10 |
clarkb | ya my 4 core 7 year old intel cpu is spinning a lot of cycles too | 20:11 |
clarkb | has my cpu temp up to 76C | 20:12 |
clarkb | actually it may be firefox that is slow | 20:16 |
clarkb | I started in FF then switched to chrome after I remembered it is better | 20:16 |
openstackgerrit | Oleksandr Kozachenko proposed zuul/zuul-jobs master: Add DaemonSet check for wait-for-pods role https://review.opendev.org/728503 | 20:20 |
fungi | we peaked at 22 participants i think? | 20:23 |
clarkb | my last count was 21 so that is probably correct | 20:23 |
fungi | yeah, i saw 22 on for a bit | 20:23 |
clarkb | mnaser: fungi corvus is there a nodepool thing we still need to track down? | 20:23 |
fungi | clarkb: sounds like we maybe thundering herded an inefficient nova operation in vex | 20:23 |
clarkb | ah ok so related to the service restart | 20:24 |
fungi | yeah, so it seemed given the timing | 20:25 |
fungi | or at least the bulk reenqueue immediately following that restart | 20:25 |
clarkb | taking ttx's mobile app suggestion we might send that out to the jitsi thread on openstack-discuss too | 20:26 |
clarkb | "if your cpu melts down, try your phone or tablet" | 20:26 |
fungi | i wonder if my purism5 will support the mobile app, when it eventually ships that is | 20:27 |
fungi | er, librem5 | 20:27 |
corvus | there is a technical reason chromium is more efficient, btw | 20:27 |
corvus | that's probably worth mentioning too (it sends less video data in some circumstances) | 20:28 |
clarkb | corvus: ah ya I already suggested it over firefox in my initial response but details on that would be great | 20:28 |
fungi | direct webrtc implementation in the browser engine? | 20:28 |
clarkb | corvus: my security keys just got dropped off | 20:29 |
clarkb | corvus: I'll have to fiddle with them over the weekend to see how they do | 20:29 |
corvus | fungi: they both do, but they use a different method for something something sending video of different rates something | 20:29 |
corvus | fungi: the way it got lodged in my head is something like "firefox sends multiple streams at different bitrates simultaneously while chromium only sends one" | 20:30 |
fungi | got it, technical stuff ;) | 20:30 |
corvus | fungi: yeah, i think they just need to reverse the polarity, or decouple the heisenberg compensators | 20:31 |
fungi | moar antichronotons | 20:31 |
fungi | or antitachyons, i never can remember which is better | 20:31 |
corvus | just don't cross the streams | 20:32 |
clarkb | zuul memory use continues to look good so I expect python3.7 isn't drastically different than 3.8 with respect to memory consumption | 20:34 |
clarkb | we should add meetpad to cacti | 20:37 |
clarkb | memory use looks good, it's definitely using its cpus though | 20:37 |
corvus | load avg is currently 4-5 | 20:38 |
clarkb | I expect if we need to scale it up cpu (and maybe bw) will be the thing to focus on | 20:38 |
clarkb | corvus: ya it's an 8cpu host so that should be well within its limits | 20:38 |
corvus | looks like that's the jvb process | 20:38 |
fungi | that's better than the load average for chromium on my machine with only the meetpad page open ;) | 20:38 |
corvus | that's using most of the cpu | 20:38 |
fungi | (still hovering at a load of 10) | 20:38 |
clarkb | I think there are also ways to run maybe jitsi's behind an haproxy | 20:39 |
clarkb | but I haven't looked into that too closely yet (but that might be an answer to scaling up if necessary) | 20:39 |
corvus | frickler pointed to some info about scaling | 20:40 |
clarkb | s/maybe/many/ | 20:40 |
clarkb | fwiw it seems to be doing well with a single call this size. | 20:40 |
corvus | and based on this, it seems that the jvb component is the one to focus on | 20:40 |
clarkb | ++ | 20:40 |
corvus | (i think that's the one doing video processing, and the rest is just passing xmpp messages around) | 20:41 |
clarkb | ya jvb == jitsi video bridge | 20:41 |
*** iurygregory has joined #opendev | 20:42 | |
fungi | clarkb: and yeah, sensors(1) says my internal processor core thermocouples are at 79c | 20:47 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Add meetpad to cacti https://review.opendev.org/728574 | 20:48 |
clarkb | fungi: corvus ^ there is meetpad in cacti change | 20:48 |
fungi | thanks! | 20:48 |
clarkb | fungi: mine has fallen to 73C | 20:48 |
fungi | clarkb: related monitoring request in a comment on that review | 20:50 |
fungi | if you don't mind | 20:50 |
fungi | or i can submit a followup change | 20:50 |
clarkb | looking | 20:50 |
clarkb | oh ya let me do that | 20:50 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Add meetpad to cacti and ssl certcheck https://review.opendev.org/728574 | 20:51 |
clarkb | fungi: ^ | 20:52 |
fungi | thanks! we run it from the same server, so that reminded me to check | 20:52 |
fungi | clarkb: on a related note, how about 727389 | 20:56 |
clarkb | +2 I can't recall if ianw's set of changes updates that file or ssl domains either | 20:57 |
clarkb | but we should do an accounting of the mirrors as part of that | 20:57 |
fungi | meetpad held up really well | 21:01 |
clarkb | corvus: another takeaway from jvb == load is we might be able to get away with bigger calls/more calls/etc via turning off webcams | 21:02 |
fungi | granted that was probably half the number of people who would normally be in nova's ptg room, but still as a moderate load test of a single room it worked well | 21:02 |
*** diablo_rojo has joined #opendev | 21:03 | |
diablo_rojo | So that seemed to go pretty well. | 21:03 |
diablo_rojo | I did see an email from arkady that he couldn't get in though? | 21:03 |
fungi | "Meetpad did not worked for me" | 21:04 |
fungi | that's... not very actionable | 21:04 |
diablo_rojo | Right. | 21:04 |
diablo_rojo | I asked him if he had errors or the browser.. really any extra detail would help.. | 21:04 |
clarkb | ya anyone want to suggest the mobile app now? | 21:05 |
clarkb | I think that might be a cheat mode for people | 21:05 |
openstackgerrit | Sorin Sbarnea (zbr) proposed zuul/zuul-jobs master: Bump ansible-lint to 4.3.0 https://review.opendev.org/702679 | 21:06 |
clarkb | I don't actually know how to connect it to our server though /me looks | 21:06 |
clarkb | it's an option under settings. That just makes so much sense :) | 21:07 |
* clarkb composes response to existing thread on openstack-discuss | 21:07 | |
fungi | i will eventually have a smart phone which might be capable of using the mobile app, i really have no idea | 21:07 |
fungi | since the phone runs a modified debian distro and all mainline kernel, i suspect it won't | 21:08 |
clarkb | fungi: I think you'd need an android emulator (which does exist) | 21:08 |
clarkb | unsure of how well the existing emulators would handle jitsi though | 21:08 |
fungi | ahh, okay | 21:08 |
fungi | i suspected it might be android-specific in some way | 21:09 |
clarkb | ok response sent | 21:12 |
clarkb | (not to arkady, to the other thread) | 21:13 |
fungi | looks like purism's recent shipping update is projecting mid-august for my preorder fulfilment | 21:13 |
clarkb | fungi: https://anbox.io/ that's one way of doing it | 21:14 |
fungi | neat, thanks! | 21:14 |
clarkb | in fact you/I/we could try that on our desktops | 21:15 |
clarkb | that might be a good workaround for cpu issues if it works | 21:16 |
fungi | true! | 21:16 |
*** paladox has quit IRC | 21:28 | |
*** paladox has joined #opendev | 21:30 | |
openstackgerrit | Merged opendev/system-config master: Add meetpad to cacti and ssl certcheck https://review.opendev.org/728574 | 21:36 |
*** slittle1 has joined #opendev | 22:13 | |
openstackgerrit | Merged opendev/base-jobs master: Add jobs for publishing javascript content https://review.opendev.org/728097 | 22:41 |
*** DSpider has quit IRC | 22:53 | |
*** yuri has joined #opendev | 22:56 | |
*** ysandeep|away is now known as ysandeep | 23:04 | |
*** ysandeep is now known as ysandeep|weekend | 23:09 | |
*** mlavalle has quit IRC | 23:31 | |
*** diablo_rojo has quit IRC | 23:34 | |
openstackgerrit | Matthew Thode proposed openstack/project-config master: drop python2.7 from generate-constraints https://review.opendev.org/728591 | 23:37 |
*** tosky has quit IRC | 23:39 |
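
Note on the cleanup timing discussed at 19:16-19:33: fungi observed that cleanup_interval (hard-coded to 60 seconds) appears to be the wait between the end of one cleanup pulse and the start of the next, so the effective period is the interval plus the pulse runtime. The sketch below only illustrates that arithmetic and is not nodepool code. The function name `pulse_start_times` is hypothetical, and the 60-second pulse runtime is an assumption chosen to match the "roughly every two minutes" seen on nl03.

```python
def pulse_start_times(interval, runtimes, start=0.0):
    """Return the start time of each pulse for a loop that sleeps
    `interval` seconds after every pulse completes.

    `runtimes` holds how long each pulse takes, in seconds.
    """
    starts = []
    now = start
    for runtime in runtimes:
        starts.append(now)
        now += runtime   # the pulse itself runs
        now += interval  # fixed sleep after the pulse finishes
    return starts


# Assumed values: 60 s interval (from the log), 60 s per pulse (illustrative).
starts = pulse_start_times(60, [60, 60, 60])
periods = [later - earlier for earlier, later in zip(starts, starts[1:])]
print(periods)  # [120.0, 120.0]
```

Under these assumptions each launcher would list instances about once every two minutes, which would make the steady-state cleanup loop an unlikely source of the query burst on its own. That is consistent with the later conclusion in the log that the post-restart surge of boot requests hit an unoptimized nova query.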