clarkb | ah its going through the ansible log not the job console log | 00:00 |
---|---|---|
corvus | clarkb: so 1deb5f1e391aa7eea4d84b2032bb1c970e005500 would be dev15? | 00:00 |
clarkb | 1deb5f1e391aa7eea4d84b2032bb1c970e005500 is what I find for dev15 | 00:03 |
clarkb | yup | 00:03 |
clarkb | (the thing that makes this weird is that pbr will do 3.3.1.dev$commits since 3.3.0 but git describe will do 3.3.0-$commits since 3.3.0) | 00:03 |
*** jamesmcarthur has joined #openstack-infra | 00:08 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool master: Include host_id for openstack provider https://review.openstack.org/623107 | 00:08 |
corvus | clarkb, pabelanger: none of those changes look suspicious. we don't have objgraph installed on the executors, so we can't get a histogram of memory usage. | 00:11 |
corvus | meanwhile, the current executor behavior is highly anomalous for the last 90 days. this is the first time we've ever had more jobs queued than running. | 00:11 |
corvus | we're running a mere 331 jobs right now | 00:12 |
*** jamesmcarthur has quit IRC | 00:12 | |
clarkb | huh | 00:12 |
pabelanger | agree, have all the executors been rebooted recently? I haven't checked to see if maybe we have a new hwe kernel | 00:12 |
corvus | infra-root: so i'd say that whatever this problem is, it's critical at this point. | 00:12 |
clarkb | pabelanger: I don't think they have; uptime is ~111 days | 00:12 |
*** _alastor_ has joined #openstack-infra | 00:13 | |
clarkb | corvus: do we want to rotate executors out and see if a reboot resets swap and returns them to happiness? | 00:13 |
clarkb | (then monitor it for any signs of swap returning) | 00:13 |
clarkb | mostly thinking that without instrumentation we likely aren't going to debug the swappage now, but restarting may give us breathing room | 00:14 |
corvus | clarkb: yeah, i think it's likely to buy us a few days at least, assuming this behavior is consistent with the way it's been for the past week | 00:14 |
corvus | yeah | 00:14 |
pabelanger | +1 | 00:14 |
corvus | so let's pip3 install objgraph on all of them and do that | 00:14 |
pabelanger | ++ | 00:14 |
clarkb | let me know how I can help | 00:14 |
corvus | though i'm just inclined to go ahead and hard-restart them all, rather than rotate out | 00:14 |
corvus | mostly because it's eod | 00:15 |
clarkb | also only using 30% capacity anyway (so the shock is low) | 00:15 |
corvus | okay, objgraph is installed everywhere | 00:15 |
corvus | i'll stop all executors now | 00:16 |
clarkb | status page reflects that executors are stopping | 00:17 |
*** gyee has quit IRC | 00:17 | |
*** _alastor_ has quit IRC | 00:17 | |
clarkb | ze01 appears to have stopped its executor | 00:24 |
corvus | curiously 7 and 9 seem to be the slowest to stop | 00:24 |
corvus | all stopped now; i will reboot them all | 00:25 |
pabelanger | ack | 00:25 |
corvus | they're starting | 00:26 |
corvus | all running except 8,9,10 | 00:27 |
*** wolverineav has quit IRC | 00:27 | |
corvus | all running now; i guess those were just slow to boot | 00:27 |
corvus | gah, i should have deleted the old builds | 00:28 |
clarkb | was that not fixed? | 00:28 |
corvus | clarkb: not merged: https://review.openstack.org/620697 | 00:28 |
clarkb | fwiw ze01 looks sane so far. Memory usage seems to be roughly proportional to the number of processes running | 00:29 |
clarkb | ah | 00:29 |
johnsom | I am guessing you all are talking about the jobs that have been sitting for over an hour, started, but no progress/stream? | 00:29 |
*** wolverineav has joined #openstack-infra | 00:30 | |
clarkb | johnsom: they aren't quite started yet. They go to the empty box on the status page as soon as they have a node assigned aiui, then you have to wait for an executor to pick up that node and start the job. But yes | 00:30 |
johnsom | Yep, cool. Just checking that it's a known issue. | 00:31 |
corvus | i'll manually delete some build dirs (lots of old stuff sitting around will cause du to waste cycles) | 00:31 |
clarkb | snmp hasn't quite caught up on all the nodes according to cacti but spot checking by hand it looks like things haven't immediately returned to the former state | 00:32 |
clarkb | corvus: another thing I notice is that ansible 2.5 had a release at the end of october that we may have picked up? there have been a couple since then too (which we have been using on more recent restarts) | 00:34 |
corvus | clarkb: yeah, i wonder if something about that could affect it. we don't import it or anything, but it could be using more memory and driving us into swap in general. or outputting more data that we capture or something. | 00:35 |
clarkb | ya. The change log https://github.com/ansible/ansible/blob/stable-2.5/changelogs/CHANGELOG-v2.5.rst looks sane though | 00:35 |
clarkb | we are now running more jobs than are queued | 00:37 |
*** jaosorior has quit IRC | 00:37 | |
corvus | #status log rebooted all zuul executors (ze01-ze11) due to suspected performance degradation due to swap. underlying cause is unclear, but may be due to a regression in zuul introduced since 3.3.0, or in dependencies (including ansible). objgraph installed on all executors to support future memory profiling. | 00:40 |
openstackstatus | corvus: finished logging | 00:40 |
corvus | clarkb, pabelanger, tobiash: i'm not 100% sure i want to put ze12 into production at this point. we may have been wrong about our supposition for the increased queued jobs. | 00:42 |
pabelanger | sure, makes sense | 00:42 |
clarkb | ya if slowness was caused by memory issues we may not need it | 00:43 |
clarkb | corvus: fwiw I did approve the change to puppet ze12 | 00:43 |
clarkb | do we need to -W it? | 00:43 |
corvus | especially based on the sort of exponential regression we were seeing | 00:43 |
pabelanger | yah, we should in next 5mins, about to merge | 00:43 |
corvus | i'll -w it | 00:43 |
pabelanger | I'll look at sf.io tomorrow to see if we are also seeing an increase of swap | 00:44 |
clarkb | corvus: on ze01 and ze03 there are a few megabytes of swap being used, none of it from the two zuul-executor processes | 00:44 |
*** yamamoto has joined #openstack-infra | 00:45 | |
corvus | clarkb: yeah, looking at the list it seems pretty reasonable -- kernel just moving idle stuff out of the way | 00:46 |
corvus | also, we don't need to run apache on those servers | 00:46 |
clarkb | ++ | 00:46 |
corvus | i've deleted old build dirs from the 3 largest offenders, so the servers should be generally in-line now. there may be a few stragglers, but no big deal | 00:47 |
corvus | clarkb, pabelanger: it's possible that ansible is using more memory and the only thing to do about it is just to add more executors after all | 00:50 |
corvus | i kinda don't want to jump to that conclusion though | 00:50 |
pabelanger | Yah, I can also look at open issues in ansible/ansible tomorrow, see if anybody has reported anything | 00:50 |
clarkb | corvus: looks like there are ~200 playbooks running on ze01 but only ~60 jobs? | 00:51 |
pabelanger | like clarkb said, there have been a few releases of 2.5 recently | 00:51 |
clarkb | I guess that could be ansible forking | 00:51 |
clarkb | ah yup it appears there are multiple ansible playbook processes running concurrently per build | 00:51 |
*** Swami has quit IRC | 00:52 | |
corvus | queued jobs: 0 | 00:52 |
pabelanger | yay | 00:52 |
clarkb | corvus: if that is the case we'd still want to have the governors throttle such that they don't swap | 00:53 |
clarkb | though the swap was from the zuul-executor process so I dunno | 00:54 |
corvus | clarkb: the zuul-executor process on ze01 is the same virt size it was before the reboot | 00:54 |
corvus | 1882 zuul 20 0 5534272 175336 10224 S 45.5 2.1 9:10.47 zuul-executor | 00:54 |
corvus | less resident | 00:54 |
*** Belgar81 has joined #openstack-infra | 00:55 | |
clarkb | and about 10MB into swap now | 00:55 |
clarkb | (far less than before) | 00:55 |
corvus | but given that we're running all out now, and we've achieved the same virtual size as before, and pretty close to the same resident size (what was it, like 300000 or 400000 before?) i'm not sure the executor is going to turn out to be the smoking gun | 00:56 |
corvus | i'm going to eod now | 00:58 |
clarkb | ya I need to pop out myself. | 00:59 |
clarkb | ianw and/or fungi: if amorin wanders by later today/tomorrow morning (relative to me) maybe you can point out https://etherpad.openstack.org/p/bhs1-test-node-slowness I've tried to capture what we/I know there | 00:59 |
clarkb | corvus: thinking out loud here, it might be good to instrument things like ansible as used by zuul so that we'll know if/when there are regressions in performance or resource usage | 01:00 |
clarkb | that feedback might also be useful to ansible itself | 01:01 |
*** sthussey has quit IRC | 01:17 | |
*** yamamoto has quit IRC | 01:18 | |
*** yamamoto has joined #openstack-infra | 01:18 | |
*** tosky has quit IRC | 01:19 | |
*** rkukura has quit IRC | 01:20 | |
*** harlowja has quit IRC | 01:27 | |
*** markvoelker has quit IRC | 01:33 | |
rm_work | hey, how complex is the process of getting a cloud added to nodepool? including technical and legal/political/whatever -- I assume there's all sorts of things that need to be signed? | 01:35 |
*** betherly has joined #openstack-infra | 01:40 | |
*** betherly has quit IRC | 01:44 | |
*** david-lyle has joined #openstack-infra | 01:48 | |
*** manjeets_ has joined #openstack-infra | 01:49 | |
clarkb | rm_work: https://docs.openstack.org/infra/system-config/contribute-cloud.html is the doc we have for it. It tends to be more informal and we try to do our best to accommodate the needs of the provider | 01:49 |
rm_work | cool cool | 01:50 |
clarkb | corvus: I have a really derpy script at ze01:~clarkb/swap.sh that looks for playbooks using more than 60MB ish of swap. It seems that "larger" jobs tend to be in that club, things like grenade and tripleo and lbaas jobs | 01:50 |
*** dklyle has quit IRC | 01:51 | |
*** manjeets has quit IRC | 01:51 | |
clarkb | also it seems that swap usage may have stabilized a bit. And now really calling it a day | 01:51 |
*** _alastor_ has joined #openstack-infra | 02:13 | |
*** mrsoul has quit IRC | 02:15 | |
*** _alastor_ has quit IRC | 02:18 | |
*** hongbin has joined #openstack-infra | 02:35 | |
*** wolverineav has quit IRC | 02:40 | |
*** bhavikdbavishi has joined #openstack-infra | 02:41 | |
*** wolverineav has joined #openstack-infra | 02:41 | |
*** ykarel has joined #openstack-infra | 02:41 | |
*** bhavikdbavishi1 has joined #openstack-infra | 02:44 | |
*** yamamoto has quit IRC | 02:45 | |
*** bhavikdbavishi has quit IRC | 02:45 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 02:45 | |
*** wolverineav has quit IRC | 02:46 | |
*** betherly has joined #openstack-infra | 02:51 | |
*** imacdonn has quit IRC | 02:53 | |
*** imacdonn has joined #openstack-infra | 02:53 | |
*** betherly has quit IRC | 02:55 | |
*** rlandy|bbl is now known as rlandy | 03:09 | |
*** rlandy has quit IRC | 03:10 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool master: Include host_id for openstack provider https://review.openstack.org/623107 | 03:12 |
*** wolverineav has joined #openstack-infra | 03:21 | |
*** yamamoto has joined #openstack-infra | 03:24 | |
*** wolverineav has quit IRC | 03:26 | |
*** wolverineav has joined #openstack-infra | 03:30 | |
*** psachin has joined #openstack-infra | 03:32 | |
*** wolverineav has quit IRC | 03:34 | |
*** yamamoto has quit IRC | 03:34 | |
*** ramishra has quit IRC | 03:36 | |
*** wolverineav has joined #openstack-infra | 03:46 | |
*** hwoarang has quit IRC | 03:47 | |
*** hwoarang has joined #openstack-infra | 03:50 | |
*** wolverineav has quit IRC | 04:02 | |
*** wolverineav has joined #openstack-infra | 04:03 | |
*** diablo_rojo has quit IRC | 04:06 | |
*** wolverineav has quit IRC | 04:07 | |
*** yamamoto has joined #openstack-infra | 04:29 | |
*** betherly has joined #openstack-infra | 04:32 | |
*** hongbin has quit IRC | 04:33 | |
*** janki has joined #openstack-infra | 04:34 | |
*** ramishra has joined #openstack-infra | 04:35 | |
*** betherly has quit IRC | 04:37 | |
*** rh-jelabarre has quit IRC | 04:41 | |
*** ykarel is now known as ykarel|afk | 04:50 | |
*** lpetrut has joined #openstack-infra | 04:52 | |
*** ykarel|afk has quit IRC | 04:54 | |
*** yboaron has joined #openstack-infra | 05:02 | |
*** apetrich has quit IRC | 05:07 | |
*** ykarel|afk has joined #openstack-infra | 05:10 | |
*** ykarel|afk is now known as ykarel | 05:10 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: [wip] rhel8 beta support https://review.openstack.org/623137 | 05:13 |
*** ahosam has joined #openstack-infra | 05:32 | |
*** lpetrut has quit IRC | 05:36 | |
*** wolverineav has joined #openstack-infra | 05:43 | |
*** apetrich has joined #openstack-infra | 05:48 | |
tonyb | tobias-urdin: I sent a list of repos to openstack-discuss can you verify them and then I'll get them taken care of | 05:49 |
*** wolverineav has quit IRC | 05:50 | |
*** _alastor_ has joined #openstack-infra | 06:15 | |
*** _alastor_ has quit IRC | 06:19 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: update status page layout based on screen size https://review.openstack.org/622010 | 06:43 |
*** ahosam has quit IRC | 07:00 | |
*** wolverineav has joined #openstack-infra | 07:05 | |
*** wolverineav has quit IRC | 07:10 | |
*** bhavikdbavishi has quit IRC | 07:13 | |
*** jtomasek has joined #openstack-infra | 07:22 | |
*** ahosam has joined #openstack-infra | 07:24 | |
*** yboaron has quit IRC | 07:26 | |
*** yboaron has joined #openstack-infra | 07:26 | |
*** dpawlik has joined #openstack-infra | 07:28 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Report tenant and project specific resource usage stats https://review.openstack.org/616306 | 07:33 |
*** gfidente has joined #openstack-infra | 07:35 | |
*** e0ne has joined #openstack-infra | 07:38 | |
*** ginopc has joined #openstack-infra | 07:50 | |
*** irdr has quit IRC | 07:55 | |
*** rcernin has quit IRC | 07:56 | |
amorin | hey all | 07:57 |
*** pcaruana has joined #openstack-infra | 07:58 | |
*** pcaruana is now known as muttley | 07:58 | |
*** shardy has joined #openstack-infra | 08:02 | |
*** rcernin has joined #openstack-infra | 08:03 | |
*** shardy has quit IRC | 08:05 | |
amorin | so as far as I can read, the results are a little bit better since I moved the disk sched to deadline, but this is still not perfect. | 08:05 |
*** lpetrut has joined #openstack-infra | 08:06 | |
amorin | on my side, I am investigating two things: re-enabling the VMX flag on cpu (for nested virt) | 08:06 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor jobs page to use a reducer https://review.openstack.org/621396 | 08:06 |
*** slaweq has joined #openstack-infra | 08:06 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor job page to use a reducer https://review.openstack.org/623156 | 08:06 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor tenants page to use a reducer https://review.openstack.org/623157 | 08:06 |
amorin | and also completely removing iotune on osf flavors | 08:06 |
amorin | cc fungi clarkb mordred dmsimard | 08:07 |
*** slaweq has quit IRC | 08:15 | |
*** florianf|afk is now known as florianf | 08:17 | |
*** ykarel is now known as ykarel|lunch | 08:18 | |
frickler | amorin: the comment regarding kvm caching iiuc would amount to setting "libvirt:disk_cachemodes=writeback" in nova.conf | 08:18 |
frickler | amorin: but I'd defer that to a second step | 08:18 |
frickler | amorin: looking at the last 6h, I still see about 50% of the job timeouts on bhs1, which sadly doesn't look like much progress | 08:20 |
*** ralonsoh has joined #openstack-infra | 08:21 | |
amorin | frickler: ok | 08:26 |
*** shardy has joined #openstack-infra | 08:28 | |
tobias-urdin | tonyb: will check it out asap, sorry I missed that; yesterday I was out on adventures | 08:32 |
*** rcernin has quit IRC | 08:33 | |
tonyb | tobias-urdin: All good. I hope they were good adventures :) | 08:34 |
tobias-urdin | tonyb: answered on ML, but that list is correct and complete, thanks tonyb! | 08:42 |
*** AJaeger has quit IRC | 08:49 | |
*** AJaeger has joined #openstack-infra | 08:51 | |
*** ahosam has quit IRC | 08:54 | |
*** bhavikdbavishi has joined #openstack-infra | 08:55 | |
*** bkero has quit IRC | 08:55 | |
*** jpich has joined #openstack-infra | 08:56 | |
*** lpetrut has quit IRC | 08:57 | |
*** tosky has joined #openstack-infra | 09:00 | |
*** markvoelker has joined #openstack-infra | 09:01 | |
*** ahosam has joined #openstack-infra | 09:01 | |
*** kjackal has joined #openstack-infra | 09:10 | |
*** gfidente has quit IRC | 09:14 | |
*** ramishra has quit IRC | 09:19 | |
*** ramishra has joined #openstack-infra | 09:21 | |
*** ahosam has quit IRC | 09:26 | |
*** ahosam has joined #openstack-infra | 09:26 | |
*** ykarel|lunch is now known as ykarel | 09:27 | |
*** witek has joined #openstack-infra | 09:28 | |
*** dtantsur|afk is now known as dtantsur | 09:29 | |
*** markvoelker has quit IRC | 09:34 | |
*** derekh has joined #openstack-infra | 09:44 | |
*** sshnaidm|afk has quit IRC | 09:45 | |
*** sshnaidm|afk has joined #openstack-infra | 09:46 | |
*** bhavikdbavishi has quit IRC | 09:49 | |
*** electrofelix has joined #openstack-infra | 10:04 | |
dulek | Hey, is it possible to sync job lists between two repos in Zuul v3? | 10:05 |
dulek | It's a bit hard to keep tempest plugin repo job list in sync with the main repo manually. | 10:06 |
*** sshnaidm|afk is now known as sshnaidm | 10:12 | |
*** ahosam has quit IRC | 10:15 | |
*** e0ne has quit IRC | 10:24 | |
*** gfidente has joined #openstack-infra | 10:26 | |
*** yboaron_ has joined #openstack-infra | 10:26 | |
*** e0ne has joined #openstack-infra | 10:27 | |
*** yboaron has quit IRC | 10:29 | |
*** markvoelker has joined #openstack-infra | 10:31 | |
frickler | dulek: iiuc the usual solution would be to use project-templates, see https://zuul-ci.org/docs/zuul/user/config.html#project-template | 10:32 |
*** sshnaidm has quit IRC | 10:33 | |
*** sshnaidm has joined #openstack-infra | 10:34 | |
dulek | frickler: Oh my, how early is Christmas this year, this is awesome! | 10:34 |
dulek | Thanks! | 10:34 |
frickler | for an example see http://git.openstack.org/cgit/openstack/designate/tree/.zuul.yaml#n155 vs. http://git.openstack.org/cgit/openstack/designate-tempest-plugin/tree/.zuul.yaml | 10:35 |
*** Belgar81 has quit IRC | 10:54 | |
*** markvoelker has quit IRC | 11:05 | |
*** yboaron_ has quit IRC | 11:05 | |
*** yboaron_ has joined #openstack-infra | 11:08 | |
*** jesusaur has quit IRC | 11:27 | |
*** jesusaur has joined #openstack-infra | 11:31 | |
*** yboaron_ has quit IRC | 11:33 | |
*** bhavikdbavishi has joined #openstack-infra | 11:48 | |
*** agopi has quit IRC | 11:52 | |
*** gfidente has quit IRC | 11:54 | |
stephenfin | fungi, clarkb: When you're about, fancy taking a look at these three doc changes for git-review? https://review.openstack.org/#/q/project:openstack-infra/git-review+status:open+owner:%22Stephen+Finucane+%253Cstephenfin%2540redhat.com%253E%22 | 11:58 |
ssbarnea|rover | does anyone know what the bashate-friendly way is of doing something like local foo=$(cmd) ? -- see https://github.com/openstack-dev/bashate/blob/master/bashate/messages.py#L166 | 12:01 |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/git-review master: Stash and unstash changes during download https://review.openstack.org/340024 | 12:07 |
*** psachin has quit IRC | 12:08 | |
*** sshnaidm is now known as sshnaidm|bbl | 12:08 | |
*** irdr has joined #openstack-infra | 12:13 | |
*** _alastor_ has joined #openstack-infra | 12:18 | |
*** yamamoto has quit IRC | 12:22 | |
*** _alastor_ has quit IRC | 12:23 | |
*** ahosam has joined #openstack-infra | 12:27 | |
*** mriedem has joined #openstack-infra | 12:27 | |
*** gfidente has joined #openstack-infra | 12:28 | |
*** yboaron_ has joined #openstack-infra | 12:30 | |
*** yamamoto has joined #openstack-infra | 12:31 | |
*** pbourke has quit IRC | 12:40 | |
*** pbourke has joined #openstack-infra | 12:40 | |
*** udesale has joined #openstack-infra | 12:41 | |
openstackgerrit | Dirk Mueller proposed openstack-infra/openstack-zuul-jobs master: use opensuse15 as generic name instead of opensuse150 https://review.openstack.org/619628 | 12:45 |
*** tpsilva has joined #openstack-infra | 12:48 | |
frickler | ssbarnea|rover: split it into two commands: local foo; foo=$(cmd) | 12:49 |
*** hjensas has quit IRC | 12:49 | |
ssbarnea|rover | frickler: thanks. I was considering it but I was not sure if that had the desired behavior. | 12:49 |
*** betherly has joined #openstack-infra | 12:52 | |
*** eharney has quit IRC | 12:54 | |
openstackgerrit | Merged openstack/os-performance-tools master: Change openstack-dev to openstack-discuss https://review.openstack.org/622173 | 12:56 |
*** rh-jelabarre has joined #openstack-infra | 12:57 | |
*** rlandy has joined #openstack-infra | 12:58 | |
*** bobh has quit IRC | 13:00 | |
*** bobh has joined #openstack-infra | 13:00 | |
*** betherly has quit IRC | 13:01 | |
openstackgerrit | Ghanshyam Mann proposed openstack-dev/hacking master: Fix 'ref' format errors in README file https://review.openstack.org/623203 | 13:05 |
openstackgerrit | Ghanshyam Mann proposed openstack-dev/hacking master: Fix 'ref' format errors in README file https://review.openstack.org/623203 | 13:06 |
*** muttley has quit IRC | 13:08 | |
*** boden has joined #openstack-infra | 13:15 | |
*** dave-mccowan has joined #openstack-infra | 13:16 | |
*** rcernin has joined #openstack-infra | 13:17 | |
fungi | #status log deleted stale /var/log/exim4/paniclog on ns2.opendev.org to silence nightly cron alert e-mails about it | 13:17 |
openstackstatus | fungi: finished logging | 13:17 |
*** dave-mccowan has quit IRC | 13:21 | |
*** muttley has joined #openstack-infra | 13:21 | |
*** muttley has quit IRC | 13:25 | |
*** muttley has joined #openstack-infra | 13:26 | |
Linkid | hi | 13:29 |
*** rcernin has quit IRC | 13:29 | |
Linkid | fungi: about peertube, can I add a page here : https://docs.openstack.org/infra/system-config/systems.html ? | 13:29 |
*** muttley has quit IRC | 13:29 | |
Linkid | (as WIP) | 13:29 |
*** yboaron_ has quit IRC | 13:29 | |
AJaeger | Linkid: a spec is the better next step | 13:30 |
AJaeger | Linkid: against openstack-infra/infra-specs repo | 13:30 |
AJaeger | Linkid: once the spec is approved, adding a page is one step | 13:30 |
Linkid | and corvus mentioned using ansible to deploy services, but I only saw puppet classes for services in the system-config repo | 13:30 |
fungi | Linkid: yes, in whatever change implements the configuration management for the service you would also add a document there explaining its management, but AJaeger is right: after the mailing list discussion the next step is likely an infra-spec describing how we will get it bootstrapped | 13:31 |
Linkid | ok, I'll make a spec this Friday or this week-end, then | 13:31 |
Linkid | (I don't have enough time today) | 13:32 |
fungi | Linkid: there is a template file in the openstack-infra/infra-specs repo you can fill in with the proposal, see the readme in that repo for instructions | 13:32 |
Linkid | oh, great :) | 13:33 |
fungi | and there's no hurry, we operate on the assumption that people are working on these sorts of things in their spare/volunteer time anyway | 13:33 |
fungi | and feel free to look at other approved specs in that repo for examples, or ask questions in here or on the ml if you need help | 13:34 |
*** pcaruana has joined #openstack-infra | 13:34 | |
Linkid | ok, I will read some other specs today, I think | 13:35 |
*** kota_ has quit IRC | 13:36 | |
Linkid | thanks for your help :) | 13:36 |
fungi | my pleasure! | 13:37 |
*** ahosam has quit IRC | 13:37 | |
*** ahosam has joined #openstack-infra | 13:37 | |
fungi | rereading the readme in the specs repo, i can see that it could use some clarifications too. i'll improve it a bit here shortly | 13:38 |
*** kota_ has joined #openstack-infra | 13:38 | |
*** pcaruana has quit IRC | 13:39 | |
*** rfolco has quit IRC | 13:41 | |
*** rfolco has joined #openstack-infra | 13:41 | |
*** yamamoto has quit IRC | 13:41 | |
*** pcaruana has joined #openstack-infra | 13:43 | |
*** pcaruana has quit IRC | 13:47 | |
*** ahosam has quit IRC | 13:47 | |
*** kgiusti has joined #openstack-infra | 13:47 | |
*** ykarel is now known as ykarel|away | 13:48 | |
*** dpawlik has quit IRC | 13:50 | |
openstackgerrit | Jeremy Stanley proposed openstack-infra/infra-specs master: Overhaul instructions in README.rst for clarity https://review.openstack.org/623211 | 13:51 |
fungi | Linkid: ^ those updates to the readme might be helpful to you | 13:52 |
*** bhavikdbavishi has quit IRC | 13:53 | |
Linkid | ok, I'll take a look | 13:55 |
openstackgerrit | Jeremy Stanley proposed openstack-infra/infra-specs master: Overhaul instructions in README.rst for clarity https://review.openstack.org/623211 | 13:56 |
*** agopi has joined #openstack-infra | 13:57 | |
openstackgerrit | Jeremy Stanley proposed openstack-infra/infra-specs master: Overhaul instructions in README.rst for clarity https://review.openstack.org/623211 | 13:58 |
openstackgerrit | Jeremy Stanley proposed openstack-infra/infra-specs master: Overhaul instructions in README.rst for clarity https://review.openstack.org/623211 | 14:00 |
fungi | okay, i think i'm happy with it now ;) | 14:00 |
amorin | fungi: AJaeger clarkb I just enabled the cpu VMX flag on BHS1, so now, you should be able to spawn icccccdlucfbribncbuvefvbjlbeeckvvikkcvuhtdgn | 14:00 |
amorin | instances | 14:01 |
*** eharney has joined #openstack-infra | 14:01 | |
fungi | those are some fun looking instances | 14:01 |
amorin | with nested virt | 14:01 |
amorin | yes they are :p | 14:01 |
amorin | sorry | 14:01 |
fungi | no worries. i fall asleep on my keyboard all the time ;) | 14:01 |
fungi | and thanks! | 14:01 |
*** pgaxatte has joined #openstack-infra | 14:03 | |
*** pgaxatte has left #openstack-infra | 14:03 | |
*** pgaxatte has joined #openstack-infra | 14:04 | |
*** eernst has joined #openstack-infra | 14:07 | |
*** jcoufal has joined #openstack-infra | 14:10 | |
openstackgerrit | Jeremy Stanley proposed openstack-infra/infra-specs master: Overhaul instructions in README.rst for clarity https://review.openstack.org/623211 | 14:11 |
fungi | okay, now i'm really done with it... i think | 14:12 |
*** sthussey has joined #openstack-infra | 14:13 | |
frickler | fungi: oh, since when do we only need the Task: header? that feature has managed to avoid making itself known to me so far | 14:17 |
fungi | frickler: ever since https://review.openstack.org/607699 merged on halloween | 14:20 |
fungi | hrm, though i still need to restart the gerrit service on review.o.o to pick that up. it's been running undisturbed since august 3 | 14:21 |
fungi | that's probably why it is unknown to you, i haven't announced it working because gerrit hasn't been restarted | 14:22 |
fungi | perhaps i can do a quick gerrit restart late today when things (hopefully) quiet down | 14:22 |
*** jaosorior has joined #openstack-infra | 14:23 | |
mordred | fungi: I really need to start working on the gerrit upgrade again | 14:23 |
mordred | fungi: too many plates spinnin | 14:23 |
mordred | fungi: it looks like the project rename plugin is actually real now though https://gerrit.googlesource.com/plugins/rename-project/+/master/src/main/resources/Documentation/about.md | 14:24 |
*** yamamoto has joined #openstack-infra | 14:24 | |
fungi | yay! that'll be swell. once we have a new enough gerrit to use it | 14:25 |
mordred | yah. I'll be glad when that's no longer a downtime event | 14:25 |
frickler | fungi: hmm, IIUC that patch only creates the hyperlink to the task, will updates to the story still get posted on storyboard when a patch contains only the task reference? | 14:26 |
fungi | frickler: yes, in fact only the task footer causes story updates to happen | 14:27 |
fungi | after digging into the current implementation of the its-storyboard plugin for gerrit, it does nothing at all with story ids, and only acts on task ids | 14:27 |
fungi | we had wanted it to leave story comments when the story footer was included in a change even without a task footer, but it seems that was never actually implemented | 14:28 |
*** eernst has quit IRC | 14:28 | |
*** ykarel|away has quit IRC | 14:29 | |
fungi | so if someone with a good grasp of java wants to work on the its-storyboard plugin for gerrit, that would be a good next feature | 14:29 |
frickler | fungi: o.k., going via the task seems to work just as well, so fine for me | 14:29 |
*** psachin has joined #openstack-infra | 14:31 | |
*** janki has quit IRC | 14:33 | |
*** yamamoto has quit IRC | 14:37 | |
mordred | infra-root: keystoneauth1 3.11.2 has been released, which has the fix for rackspace discovery in it | 14:38 |
mordred | it should be safe to unpin nodepool and to use latest sdk for launch-node now | 14:39 |
mordred | but I'm on a plane, so I'm not going to do any of those things right now | 14:39 |
fungi | also i notice we still have some significant gaps in executor availability so we may want to proceed with the ze12 addition | 14:40 |
*** rkukura has joined #openstack-infra | 14:44 | |
*** Swami has joined #openstack-infra | 14:44 | |
*** sshnaidm|bbl is now known as sshnaidm | 14:51 | |
*** _alastor_ has joined #openstack-infra | 14:53 | |
*** janki has joined #openstack-infra | 14:54 | |
*** Swami has quit IRC | 14:55 | |
*** Swami has joined #openstack-infra | 14:56 | |
*** _alastor_ has quit IRC | 14:57 | |
fungi | oh joy, now spammers seem to be mistyping their spoofed domain and i'm getting messages into the openstack-discuss moderation queue for random addresses @q.com instead of @qq.com | 14:59 |
fungi | on the order of one every few seconds | 14:59 |
*** ramishra has quit IRC | 15:00 | |
fungi | updated the nonmember discard filter to ^[0-9]+@q+\.com$ now | 15:00 |
fungi | and my renovation contractors are making me high on spray-foam insulation fumes | 15:03 |
fungi | i should open a window but it's windy and close to freezing outside right now | 15:03 |
amorin | where are you from? | 15:04 |
fungi | a barrier island in the atlantic, off the coast of north carolina (usa) | 15:05 |
fungi | we're ~16km from shore | 15:06 |
amorin | nice, windy situation I guess | 15:06 |
openstackgerrit | Merged openstack-dev/hacking master: Fix 'ref' format errors in README file https://review.openstack.org/623203 | 15:07 |
*** alexchadin has quit IRC | 15:07 | |
*** ykarel|away has joined #openstack-infra | 15:08 | |
fungi | yeah, the water is no more than 30 meters from my house, at the end of my yard, so very windy | 15:08 |
fungi | nothing to slow it down | 15:08 |
*** ykarel|away is now known as ykarel | 15:08 | |
fungi | i ended up opening a window anyway because i figure i'm far less likely to pass out from hypothermia than hypoxia (and i can always at least put on a jacket) | 15:09 |
*** dpawlik has joined #openstack-infra | 15:10 | |
*** bobh has quit IRC | 15:12 | |
*** jcoufal_ has joined #openstack-infra | 15:14 | |
*** jcoufal_ has quit IRC | 15:15 | |
*** dpawlik has quit IRC | 15:15 | |
*** bobh has joined #openstack-infra | 15:16 | |
*** jcoufal_ has joined #openstack-infra | 15:16 | |
*** dpawlik has joined #openstack-infra | 15:17 | |
openstackgerrit | Chris Dent proposed openstack-infra/openstack-zuul-jobs master: Make lower-constraints job use python 3.6 https://review.openstack.org/623229 | 15:17 |
*** dpawlik has quit IRC | 15:17 | |
*** jcoufal has quit IRC | 15:17 | |
fungi | we nearly caught up on node requests around 1300z but i guess then north america woke up | 15:17 |
*** dpawlik has joined #openstack-infra | 15:18 | |
*** slaweq has joined #openstack-infra | 15:23 | |
*** slaweq has quit IRC | 15:29 | |
*** bobh has quit IRC | 15:31 | |
mordred | stupid north america | 15:32 |
*** adam_zhang has joined #openstack-infra | 15:33 | |
mordred | fungi: you should also ventilate for a while once they're done with that spray foam - it offgasses for a while, is my understanding | 15:33 |
*** jhesketh has quit IRC | 15:34 | |
*** adam_zhang has quit IRC | 15:35 | |
*** jhesketh has joined #openstack-infra | 15:35 | |
fungi | yeah | 15:35 |
fungi | luckily this house is fairly leaky already (part of why we're renovating the downstairs entry instead of just repairing it) | 15:36 |
fungi | just an unfortunate time of year to need to leave windows open | 15:36 |
*** bobh has joined #openstack-infra | 15:37 | |
mordred | ++ | 15:40 |
*** bobh has quit IRC | 15:42 | |
*** jamesmcarthur has joined #openstack-infra | 15:43 | |
jrosser | could i get some more eyes on this when folks have a moment? https://review.openstack.org/#/c/622169/ | 15:45 |
corvus | fungi: good morning; i'll start looking at data in a bit | 15:46 |
frickler | jrosser: done | 15:47 |
jrosser | frickler: thanks :) | 15:47 |
mriedem | clarkb: also for you https://bugs.launchpad.net/nova/+bug/1807219 | 15:51 |
openstack | Launchpad bug 1807219 in OpenStack Compute (nova) "SchedulerReporClient init slows down nova-api startup" [Medium,Triaged] | 15:51 |
mriedem | working a patch now | 15:51 |
*** zul has joined #openstack-infra | 15:52 | |
*** dpawlik has quit IRC | 15:55 | |
*** bobh has joined #openstack-infra | 15:55 | |
*** ramishra has joined #openstack-infra | 15:59 | |
*** bobh has quit IRC | 16:00 | |
corvus | fungi, clarkb: i'm going to sigusr2 ze01 to get an objgraph list | 16:00 |
*** _alastor_ has joined #openstack-infra | 16:00 | |
fungi | oh! right, it was so late for me i didn't commit to memory that we'd added it to all the executors | 16:01 |
*** jcoufal_ has quit IRC | 16:01 | |
*** janki is now known as janki|dinner | 16:02 | |
corvus | here are the object counts: http://paste.openstack.org/show/736764/ | 16:04 |
openstackgerrit | Merged openstack-infra/project-config master: Add centos/suse to OSA grafana dashboard https://review.openstack.org/622169 | 16:04 |
corvus | our first class on that list is Repo. and 1700 repos sounds about right. | 16:05 |
fungi | yep | 16:08 |
corvus | i agree with clarkb; i wish we had historical values for "how much memory ansible is using". also, for that matter, "how much memory the executor process is using" | 16:09 |
corvus | cause at this point, all we suspect is something changed and we don't even know which piece of software. | 16:09 |
fungi | it does at least look like we're not swapping as hard since the restart | 16:10 |
*** takamatsu has quit IRC | 16:10 | |
corvus | fungi: yeah, we seem to be around 2g, which is a value we've encountered before in our history without too much issue. | 16:11 |
*** bobh has joined #openstack-infra | 16:17 | |
*** jcoufal has joined #openstack-infra | 16:20 | |
*** bobh has quit IRC | 16:22 | |
clarkb | corvus what I found interesting is ansible memory/swap use seems to correlate to the job playbooks | 16:25 |
clarkb | grenade and tripleo were showing up a bunch | 16:25 |
mordred | you know ... | 16:26 |
mordred | in the callback plugins, we actually have the entire log data in memory for the entire job in RAM | 16:26 |
mordred | at least in the json one | 16:26 |
openstackgerrit | Frank Kloeker proposed openstack-infra/irc-meetings master: Change meeting time and format for Docs & I18n team https://review.openstack.org/623242 | 16:27 |
mordred | which we can improve by switching that to be yaml and use multiple documents which we can just append without reading the old data like we discussed in berlin | 16:27 |
mordred | so - grenade and tripleo being long/complex and potentially verbose could be causing the json callback plugin to eat ram | 16:27 |
mordred | that said - we could even improve the json plugin by only reading the old data in to memory right before doing the append and write out - so that it's not holding the RAM for the whole playbook invocation | 16:28 |
*** e0ne has quit IRC | 16:30 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Read old json data right before writing new data https://review.openstack.org/623245 | 16:30 |
mordred | clarkb, corvus: ^^ like that | 16:30 |
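The idea mordred describes above (load the earlier JSON data only at the moment of appending and writing, rather than holding it for the whole playbook) can be sketched roughly as follows. This is not zuul's callback plugin; the class `JsonLogSketch`, its methods, and the file name are hypothetical and exist only to illustrate the read-late pattern.

```python
import json
import os
import tempfile


class JsonLogSketch:
    """Hypothetical stand-in for a JSON log callback; not zuul's actual code."""

    def __init__(self, path):
        self.path = path
        # Only the current playbook's results are held during the run.
        self.playbook_results = []

    def record(self, result):
        self.playbook_results.append(result)

    def finish(self):
        # Read the earlier playbooks' data only now, into a local variable,
        # so it is not resident in memory for the whole playbook invocation.
        previous = []
        if os.path.exists(self.path):
            with open(self.path) as f:
                previous = json.load(f)
        previous.append(self.playbook_results)
        with open(self.path, "w") as f:
            json.dump(previous, f)


def demo():
    """Run two pretend playbooks against one log file and return its content."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "job-output.json")  # illustrative file name
        for name in ("pre", "run"):
            log = JsonLogSketch(path)
            log.record({"playbook": name, "task": "example"})
            log.finish()
        with open(path) as f:
            return json.load(f)
```

The file is still rewritten in full at the end of each playbook; the sketch only shortens the window in which the old data occupies memory, which is the operational point being made in the discussion.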
*** ykarel is now known as ykarel|away | 16:33 | |
clarkb | ah ya if that's held for say 2 hours by grenade or tripleo I could see how those would be good swap candidates | 16:37 |
mriedem | clarkb: btw, this takes 26 seconds on nova-api startup: | 16:37 |
mriedem | Dec 05 20:13:23.060958 ubuntu-xenial-ovh-bhs1-0000959981 devstack@n-api.service[23459]: running "unix_signal:15 gracefully_kill_them_all" (master-start)... | 16:37 |
mriedem | http://logs.openstack.org/01/619701/5/gate/tempest-slow/2bb461b/controller/logs/screen-n-api.txt.gz#_Dec_05_20_13_23_060958 | 16:37 |
mriedem | then we spend about 27 seconds loading up API extensions | 16:38 |
*** bobh has joined #openstack-infra | 16:38 | |
clarkb | mriedem: ya is it waiting on child pids to go away? that looked like uwsgi not nova? | 16:38 |
mriedem | we're looking into the latter but i don't know if we can control the former | 16:38 |
*** rossella_s has quit IRC | 16:38 | |
mriedem | yeah i'm not sure what's doing that | 16:40 |
*** psachin has quit IRC | 16:41 | |
mriedem | http://git.openstack.org/cgit/openstack-dev/devstack/tree/lib/apache#n272 | 16:41 |
mriedem | devstack sets the hook | 16:41 |
*** bobh has quit IRC | 16:42 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Add appending yaml log plugin https://review.openstack.org/623256 | 16:47 |
mordred | and there's the yaml version | 16:47 |
*** pgaxatte has quit IRC | 16:49 | |
clarkb | mordred: left a small comment on the json change but +2 | 16:49 |
mordred | clarkb: yeah - totally. that would be better for sure | 16:53 |
*** bobh has joined #openstack-infra | 16:56 | |
clarkb | mriedem: reading uwsgi that sets up a hook that on SIGTERM (signal 15) it calls the kill them all gracefully function | 16:59 |
clarkb | mriedem: I wonder if that is actually slow or if uwsgi is just not logging what it is doing in the interim well | 16:59 |
*** bobh has quit IRC | 17:01 | |
mriedem | clarkb: same, i suspect it's doing other things but not logging it | 17:01 |
mriedem | i'll enable debug logging and see if that shows anything | 17:07 |
mriedem | 26 https://review.openstack.org/623265 | 17:12 |
clarkb | mriedem: like a season of 24 but two episodes longer | 17:13 |
*** Swami has quit IRC | 17:14 | |
*** janki|dinner has quit IRC | 17:16 | |
*** bobh has joined #openstack-infra | 17:16 | |
*** jamesmcarthur has quit IRC | 17:17 | |
mriedem | heh the 26 was typing in the wrong window | 17:17 |
mriedem | i can't watch 24, the ads alone with kiefer constantly yelling is just too much | 17:18 |
mriedem | "the pop tarts are done omfg!!!" | 17:18 |
clarkb | mnaser: followup on centos images. All regions but inap-mtl1 should have centos 7.6 now | 17:18 |
clarkb | mnaser: waiting on inap upload to complete | 17:18 |
clarkb | mriedem: I couldn't watch it when broadcast but managed to get through the first season on netflix relatively recently | 17:19 |
openstackgerrit | Frank Kloeker proposed openstack-infra/irc-meetings master: Change meeting time and format for Docs & I18n team https://review.openstack.org/623242 | 17:19 |
*** bobh has quit IRC | 17:21 | |
*** jamesmcarthur has joined #openstack-infra | 17:22 | |
clarkb | fungi: heh ns2 now emails about packages on hold? | 17:25 |
clarkb | https://packages.ubuntu.com/bionic/netplan.io | 17:25 |
fungi | clarkb: that's yet another package which can't be upgraded because it will bring in a new dependency | 17:31 |
*** kjackal has quit IRC | 17:32 | |
*** jtomasek has quit IRC | 17:32 | |
corvus | fungi, clarkb: i think we should throw ze12 at the problem. | 17:32 |
clarkb | corvus: ok, just a matter of merging the change to puppet it right? | 17:32 |
corvus | clarkb: yeah, i'll re-approve | 17:33 |
*** kjackal has joined #openstack-infra | 17:33 | |
fungi | wfm | 17:34 |
*** bobh has joined #openstack-infra | 17:34 | |
openstackgerrit | Jeremy Stanley proposed openstack-infra/infra-specs master: Overhaul instructions in README.rst for clarity https://review.openstack.org/623211 | 17:36 |
clarkb | notmyname: http://logs.openstack.org/31/592231/6/gate/swift-probetests-centos-7/7bde795/job-output.txt.gz#_2018-12-06_17_32_48_836444 just failed in the gate. I took a quick look to see if it was for any of the known problems associated with the centos 7.6 release and unless that is a new race caused by new python or libs I don't think it is. (just an fyi that it appears to be an actual bug and not | 17:36 |
clarkb | centos 7.6 causing problems) | 17:36 |
*** rascasoft has quit IRC | 17:38 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Fix race in test_handler_poll_session_expired https://review.openstack.org/623269 | 17:39 |
openstackgerrit | Chris Dent proposed openstack-infra/openstack-zuul-jobs master: Make lower-constraints job use python 3.6 https://review.openstack.org/623229 | 17:39 |
notmyname | clarkb: thanks | 17:40 |
clarkb | ssbarnea|rover: new theory on the file:// lookup issue with delorean. Is it possible that delorean is looking for that file within its chroot but it exists on the surrounding fs? | 17:42 |
fungi | clarkb: followup on the stackalytics-bot-2 ip6tables block rule. it looks like the bot eventually switched to ipv4 anyway, so probably safe to say it's not what was causing the gerrit slowdowns a couple weeks back and there's no point in continuing to leave that block rule in place | 17:48 |
*** jpich has quit IRC | 17:48 | |
*** florianf has quit IRC | 17:49 | |
*** eharney has quit IRC | 17:52 | |
*** gyee has joined #openstack-infra | 17:55 | |
fungi | so based on graphs it looks like we're topping out around 70 concurrent builds per executor? i guess if with the addition of ze12 we see we get up around 850 concurrent builds for extended periods that suggests we need another | 17:55 |
fungi | the hysteresis kicking in on the executor queue graph around the time we stop accepting more builds is interesting and fairly pronounced | 17:57 |
fungi | or perhaps each of the queued jobs spikes there reflects a major gate queue reset | 17:58 |
openstackgerrit | Merged openstack-infra/system-config master: Add ze12.openstack.org https://review.openstack.org/623067 | 17:59 |
*** derekh has quit IRC | 17:59 | |
pabelanger | fungi: yah, gate resets are amplifying the backlog for sure | 17:59 |
fungi | yeah, i guess there are corresponding spikes on the node requests graph so that seems likely | 17:59 |
*** bdodd has quit IRC | 17:59 | |
fungi | though we do seem to be managing ~0.2kjph higher than yesterday already | 18:00 |
*** dtantsur is now known as dtantsur|afk | 18:01 | |
*** udesale has quit IRC | 18:04 | |
*** Swami has joined #openstack-infra | 18:05 | |
*** gfidente is now known as gfidente|afk | 18:06 | |
*** ykarel|away has quit IRC | 18:07 | |
*** e0ne has joined #openstack-infra | 18:08 | |
*** jamesmcarthur has quit IRC | 18:08 | |
*** e0ne has quit IRC | 18:09 | |
*** bobh has quit IRC | 18:09 | |
*** bobh has joined #openstack-infra | 18:17 | |
openstackgerrit | Merged openstack-infra/zuul master: web: break the reducers module into logical units https://review.openstack.org/621385 | 18:20 |
clarkb | ssbarnea|rover: I'll move the conversation back here since I'm no longer thinking this is likely a zuul bug. http://logs.openstack.org/25/620625/2/gate/tripleo-ci-centos-7-standalone/70949b6/logs/ara_oooq/reports/730db7e5-5c8a-4aec-a2a4-836c4367225a.html That ansible run crashes, this seems to crash the console log streaming which is then noticed when pre is started | 18:20 |
clarkb | ssbarnea|rover: it seems to crash when executing tempest and even the tempest log seems truncated: http://logs.openstack.org/25/620625/2/gate/tripleo-ci-centos-7-standalone/70949b6/logs/undercloud/home/zuul/tempest.log.txt.gz notice that it has concurrency = 4 but only workers 0 and 2 record tests (we should at least have {1} as well) | 18:21 |
*** bobh has quit IRC | 18:22 | |
*** electrofelix has quit IRC | 18:26 | |
*** wolverineav has joined #openstack-infra | 18:29 | |
*** wolverineav has quit IRC | 18:29 | |
*** wolverineav has joined #openstack-infra | 18:29 | |
*** dave-mccowan has joined #openstack-infra | 18:29 | |
clarkb | ssbarnea|rover: ok I figured it out http://logs.openstack.org/25/620625/2/gate/tripleo-ci-centos-7-standalone/70949b6/logs/undercloud/var/log/journal.txt.gz#_Dec_06_16_08_49 -- Reboot -- | 18:29 |
clarkb | ssbarnea|rover: ^ so that's the issue after all | 18:29 |
clarkb | mordred: ^ fyi | 18:30 |
*** slaweq has joined #openstack-infra | 18:30 | |
fungi | hah, rebooting a node mid-job. we knew that would prematurely terminate the console stream at least, right? | 18:31 |
clarkb | fungi: yes | 18:32 |
mordred | yeah. I really do need to raise the priority of reworking streaming | 18:32 |
mordred | in the mean time - they can add a zuul_console line to their playbook after the reboot | 18:32 |
mordred | it's just an ansible module - nothing stopping it from being restarted | 18:32 |
clarkb | mordred: while I agree, I also don't think that running tempest should induce a reboot | 18:34 |
*** yamamoto has joined #openstack-infra | 18:34 | |
clarkb | so I think there is a bigger tripleo bug here | 18:34 |
clarkb | (or maybe I am misunderstanding the logs around what is going on at that time) | 18:35 |
openstackgerrit | Merged openstack-infra/irc-meetings master: Change meeting time and format for Docs & I18n team https://review.openstack.org/623242 | 18:35 |
*** diablo_rojo has joined #openstack-infra | 18:35 | |
*** slaweq has quit IRC | 18:36 | |
openstackgerrit | Monty Taylor proposed openstack-infra/system-config master: Import install-docker role https://review.openstack.org/605585 | 18:37 |
*** wolverineav has quit IRC | 18:37 | |
mordred | clarkb: I saw something about enabling something - perhaps a new kernel is happening? | 18:37 |
*** yamamoto has quit IRC | 18:38 | |
openstackgerrit | Merged openstack-infra/irc-meetings master: Change Senlin meeting to different biweekly times https://review.openstack.org/623031 | 18:39 |
*** jcoufal has quit IRC | 18:40 | |
*** jcoufal has joined #openstack-infra | 18:40 | |
clarkb | mordred: is zuul_console a task that you run or a role? | 18:41 |
clarkb | updating the bug for this now and hoping to give that ^ info | 18:41 |
mordred | clarkb: task. one sec ... | 18:41 |
*** eharney has joined #openstack-infra | 18:42 | |
*** bdodd has joined #openstack-infra | 18:42 | |
mordred | clarkb: sorry, role: http://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/start-zuul-console | 18:42 |
clarkb | mordred: thanks | 18:42 |
logan- | clarkb: i began seeing similar behavior about 2-3 days ago in jobs I have that launch nested vms with nested virt enabled. i have temporarily changed the affected jobs to use software virt and they no longer reboot. this is on ubuntu xenial test nodes launching bionic nested vms. | 18:44 |
clarkb | logan-: ah good to know. Nested virt hits again. I'll leave that note too | 18:45 |
logan- | pretty concerning to see it is happening on centos guests too | 18:45 |
mordred | clarkb: that said - it's a one-task-role - so if it's more convenient to run it as a task, that's fine too | 18:45 |
*** wolverineav has joined #openstack-infra | 18:45 | |
logan- | nothing has changed on the hosts, but maybe I should look to see if there is a newer kernel we can try. | 18:45 |
clarkb | logan-: well centos just updated its kernels with 7.6 I'm sure | 18:46 |
clarkb | logan-: could be in the guest side of things | 18:46 |
fungi | #status log unblocked stackalytics-bot-2 access to review.o.o since the performance problems observed leading up to addition of the rule on 2018-11-23 seem to be unrelated (it eventually fell back to connecting via ipv4 and no recurrence was reported) | 18:46 |
openstackstatus | fungi: finished logging | 18:47 |
*** ramishra has quit IRC | 18:47 | |
logan- | yeah it seems like these breakages are usually guest induced by updated nodepool images, and then we usually get it back on track by updating the hosts. when I looked the other day there were no kernel updates available for the hosts :/ | 18:47 |
logan- | there is a kernel update available now. I'm taking a host out of the aggregate to update and test. will let you know how it goes. | 18:52 |
*** slaweq has joined #openstack-infra | 18:52 | |
clarkb | logan-: sounds good. It will be really interesting to see in a year or so (when the current 4.19 kernel shows up in places) if the intel nested virt enabled by default there actually pans out as being much more reliable | 18:53 |
logan- | no kidding. I'm running xenial hwe on these hosts and it is pretty sad that it still breaks a few times per quarter while still being more reliable than the regular xenial kernel. :( | 18:54 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Fix race in test_handler_poll_session_expired https://review.openstack.org/623269 | 18:56 |
*** slaweq has quit IRC | 18:57 | |
*** shardy has quit IRC | 18:59 | |
clarkb | mwhahaha: ssbarnea|rover EmilienM to tl;dr it I think the three issues I'm aware of affecting tripleo jobs that are not "slowness" are the ntp errors, delorean not finding the /home/zuul/.*/repomd.xml file, delorean using pypi.python.org directly and having errors, and the nested virt possibly crashing VM and rebooting it (which may be the cause of the repomd.xml thing as a side effect?) | 19:01 |
clarkb | I guess thats 3.5 issues | 19:01 |
clarkb | I'm pretty sure all of these have bugs and I've updated the one I have new info on (reboots) with the data I collected | 19:02 |
clarkb | Then there is the ovh slowness related stuff. I do still think reducing memory pressure would be worthwhile as an exercise to see if that helps. Especially if there are any easy wins like kernel same page merging | 19:03 |
clarkb | and we'll keep working with ovh on the infra side to characterize and hopefully address underlying issues as well | 19:04 |
*** udesale has joined #openstack-infra | 19:11 | |
clarkb | hrm my neighbor is getting a new roof, will need to find the airplane headphones | 19:14 |
clarkb | fungi: ^ must be worse at your place :) | 19:14 |
mwhahaha | clarkb: ok we had a fix for the ntp thing but it failed in the gate, maybe we can get that promoted next gate reset | 19:14 |
mwhahaha | which seems to have just occurred | 19:15 |
* mwhahaha sighs | 19:15 | |
mwhahaha | clarkb: https://review.openstack.org/#/c/621930/ if you can promote that to the top of the gate so we stop getting that one | 19:15 |
mwhahaha | clarkb: i'm not aware of the repomd.xml one or the crashing vm. Is the crashing VM the standalone job? | 19:16 |
clarkb | mwhahaha: ya an example is http://logs.openstack.org/25/620625/2/gate/tripleo-ci-centos-7-standalone/70949b6/logs/ara_oooq/ | 19:17 |
mwhahaha | ok so originally we used just qemu hard coded and then we moved to try and do the auto direction | 19:18 |
clarkb | mwhahaha: notice that the multinode-standalone.yaml playbook is incomplete/interrupted. If you then go look at the journal log you'll see that there was a reboot around 16:07 ish | 19:18 |
mwhahaha | yea so it crashes in tempest | 19:18 |
clarkb | then later in the job delorean fails because the repomd.xml isn't present (possibly because ansible crashed in a way that really confused things?) | 19:18 |
clarkb | mwhahaha: ya I think tempest is the trigger there | 19:18 |
mwhahaha | we can force that to qemu but that is less than ideal | 19:18 |
openstackgerrit | Jeremy Stanley proposed openstack-infra/system-config master: Run a local MySQL service on StoryBoard servers https://review.openstack.org/623290 | 19:19 |
openstackgerrit | Jeremy Stanley proposed openstack-infra/system-config master: Switch StoryBoard database backups to local https://review.openstack.org/623291 | 19:19 |
clarkb | mwhahaha: unfortunately nested virt has never been stable | 19:19 |
clarkb | it will work then stop then work and its really hard to debug unless you've got logan- or mnaser investigating the hypervisor side too | 19:19 |
fungi | clarkb: roof rebuild hasn't started yet, we're still getting quotes and arguing about whether we want shingle or steel | 19:19 |
clarkb | mwhahaha: ntp fix is being promoted now | 19:19 |
mwhahaha | thanks | 19:19 |
mwhahaha | i'll propose a patch for the qemu thing | 19:20 |
clarkb | fungi: I'm unsure of the comparative advantages in your part of the world but steel is very loud when it rains | 19:20 |
clarkb | fungi: growing up it would rain hard enough that it would be louder than the speakers hooked to the tv | 19:21 |
clarkb | (granted we lived in the tropics with minimal insulation to dampen things too) | 19:21 |
*** dpawlik has joined #openstack-infra | 19:22 | |
fungi | yeah, lots of insulation here. i know metal roofs are loud, though in theory require less maintenance and last a lot longer for not a lot higher cost | 19:22 |
*** dpawlik has quit IRC | 19:24 | |
mwhahaha | clarkb: for the nested virt thing: https://review.openstack.org/#/c/623293/ | 19:26 |
mwhahaha | ssbarnea|rover, EmilienM fyi -^ | 19:26 |
clarkb | mwhahaha: thanks! | 19:26 |
EmilienM | mwhahaha: ack | 19:27 |
*** ndahiwade has joined #openstack-infra | 19:27 | |
openstackgerrit | Ronelle Landy proposed openstack-infra/zuul-jobs master: WIP: Default private_ipv4 to use public_ipv4 address when null https://review.openstack.org/623294 | 19:28 |
*** udesale has quit IRC | 19:31 | |
openstackgerrit | Merged openstack-infra/zuul master: web: refactor info and tenant reducers action https://review.openstack.org/621386 | 19:35 |
fungi | this is awesome: https://github.com/systemd/systemd/issues/11026 | 19:39 |
*** dpawlik has joined #openstack-infra | 19:39 | |
*** boden has quit IRC | 19:42 | |
fungi | though now https://gitlab.freedesktop.org/polkit/polkit/issues/74 is arguing it's a systemd bug after all | 19:42 |
clarkb | you get root I get root we all get root | 19:43 |
fungi | finger pointing ftw! | 19:43 |
*** sshnaidm is now known as sshnaidm|afk | 19:43 | |
*** dpawlik has quit IRC | 19:44 | |
*** bobh has joined #openstack-infra | 19:47 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Fix race in test_handler_poll_session_expired https://review.openstack.org/623269 | 19:50 |
*** bobh has quit IRC | 19:54 | |
*** jamesmcarthur has joined #openstack-infra | 19:57 | |
*** ndahiwade has quit IRC | 19:59 | |
*** wolverineav has quit IRC | 19:59 | |
corvus | #status log added ze12 to zuul executor pool to reduce memory pressure | 20:00 |
openstackstatus | corvus: finished logging | 20:00 |
corvus | infra-root: ze12 is in production | 20:00 |
*** wolverineav has joined #openstack-infra | 20:00 | |
*** jamesmcarthur has quit IRC | 20:01 | |
*** jamesmcarthur has joined #openstack-infra | 20:01 | |
fungi | ahh, yep, just noticed the green line on the executors graph bump up a notch | 20:02 |
corvus | i really like that it's immediately reflected in monitoring :) | 20:02 |
fungi | that is super nice | 20:02 |
corvus | all the governor graphs have an extra line now too. even cacti is updated. | 20:02 |
fungi | and there's a gate resent underway in the integrated queue. curious to see if we fall into exhaustion again | 20:04 |
fungi | er, gate reset | 20:04 |
corvus | oh good, that will help things equalize across all the executors faster :) | 20:04 |
clarkb | ya its been bumpy there too. I've been trying to context switch into debugging some of those failures next, but running out of steam | 20:04 |
clarkb | glance python3 unittests were the last failure | 20:05 |
*** wolverineav has quit IRC | 20:05 | |
clarkb | seemed to be a legit issue with bytes and unicode or something | 20:05 |
clarkb | http://logs.openstack.org/61/610661/7/gate/openstack-tox-py35/f70430e/job-output.txt.gz#_2018-12-06_18_48_43_311604 | 20:05 |
*** bobh has joined #openstack-infra | 20:07 | |
clarkb | the most recent reset was grenade job failing on bhs1 because the nova node tempest was testing didn't reach an active state before the timeout | 20:07 |
*** bobh has quit IRC | 20:11 | |
*** sthussey has quit IRC | 20:12 | |
*** rcernin has joined #openstack-infra | 20:12 | |
fungi | any puppet gurus know how to work around http://logs.openstack.org/90/623290/1/check/infra-puppet-apply-4-ubuntu-xenial/62c7ca7/applytest/puppetapplytest32.final.out.FAILED ? | 20:17 |
fungi | seems we can't use our mysql::backup_remote class with the puppet mysql module because both want to install mysql-client | 20:18 |
fungi | unfortunately one of them isn't a module under our control (i think?) | 20:18 |
clarkb | fungi: ya the mysql module is an upstream module. | 20:19 |
fungi | maybe it provides a way to not declare the mysql-client package | 20:19 |
clarkb | fungi: puppet 4 is ordered (and maybe 3 is now too?) in any case in the backup module you can do the if !defined() check for mysql-client and install it that way. Then ensure that you include backup class after the regular mysql stuff | 20:20 |
clarkb | then the ordering should be such that it works | 20:20 |
fungi | ohh | 20:20 |
fungi | i can actually order them? | 20:20 |
fungi | will try that, thanks! | 20:20 |
clarkb | fungi: its implied top to bottom order in puppet 4 | 20:20 |
clarkb | but I think you can also have mysql backup require mysql then it will order them explicitly too | 20:20 |
fungi | we already do the if ! defined(Package['mysql-client']) dance in puppet-mysql_backup so if i can order them that should do the trick | 20:21 |
*** david-lyle is now known as dklyle | 20:24 | |
*** udesale has joined #openstack-infra | 20:29 | |
openstackgerrit | Jeremy Stanley proposed openstack-infra/system-config master: Run a local MySQL service on StoryBoard servers https://review.openstack.org/623290 | 20:31 |
openstackgerrit | Jeremy Stanley proposed openstack-infra/system-config master: Switch StoryBoard database backups to local https://review.openstack.org/623291 | 20:31 |
fungi | hopefully that ^ will solve it for puppet 3 and 4 then | 20:31 |
pabelanger | corvus: clarkb: I think we forgot to move /var/lib/zuul to /dev/xvde2 partition for ze12.o.o, which means we only have 40GB there for builds | 20:32 |
*** bobh has joined #openstack-infra | 20:33 | |
*** wolverineav has joined #openstack-infra | 20:33 | |
corvus | pabelanger: hrm. we should automate that. | 20:34 |
pabelanger | agree! | 20:35 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: WIP: Add spec for scale out scheduler https://review.openstack.org/621479 | 20:35 |
corvus | pabelanger: i think it's okay for now; we can shut it down later when it's quieter | 20:35 |
*** bobh has quit IRC | 20:37 | |
*** wolverineav has quit IRC | 20:38 | |
fungi | ooh, just remembered, since i want to restart gerrit soon for the task footer hyperlinking config addition, might be nice to get https://review.openstack.org/471078 merged before as well | 20:39 |
fungi | adding lp bug trackingids | 20:39 |
fungi | (to make them searchable) | 20:40 |
openstackgerrit | Merged openstack-infra/zuul master: web: add error reducer and info toast notification https://review.openstack.org/621387 | 20:42 |
fungi | corvus: looks like we've entered a period of no executors accepting new builds again | 20:43 |
*** mriedem has quit IRC | 20:45 | |
clarkb | fungi: we had a few periods of that overnight according to grafana, they all seem to have recovered on their own (not sure if this one will) | 20:45 |
fungi | yeah, it's more of a question of whether we'll have any more 2-hour-long ones | 20:46 |
*** udesale has quit IRC | 20:46 | |
fungi | this at least has only persisted for ~15 minutes so far | 20:46 |
*** mriedem has joined #openstack-infra | 20:47 | |
clarkb | on the bhs1 front I hopped into a test node and manually checked it had reasonable disk throughput, then found the job it is running https://zuul.openstack.org/stream/b81a8b3afe0f48819fcd3ed0fa201fba?logfile=console.log in the hopes of looking at dstat for that job to see if it exhibits similar behavior to the jobs that timeout | 20:47 |
clarkb | thats a heat functional job that uses devstack, I'm not actually sure if it runs dstat :/ | 20:48 |
fungi | fingers crossed | 20:48 |
clarkb | back to swapping. Last night I found that each of the swapping ansible jobs would use up to 75MB swap each | 20:49 |
clarkb | I think we should consider getting mordreds patch in around the json handling | 20:50 |
clarkb | that should reduce the window where memory is needed in the jobs | 20:50 |
fungi | theory is that it's paging out the console stream? | 20:50 |
clarkb | additionally we probably want to consider testing a downgrade of ansible to 2.5.older to see if it changes behavior | 20:50 |
clarkb | fungi: its the ansible json log data not the console stream itself, but ya we open it and keep it open the whole time when we really only need to write a new copy with new data at the end aiui | 20:51 |
clarkb | I think it scales in the size of the role and tasks in a playbook as its capturing all of that data? | 20:51 |
mordred | there's a patch up to fix that | 20:51 |
clarkb | mordred: ya I +2'd I'm saying we should try to get that in | 20:51 |
mordred | yah. I totally agree | 20:52 |
mordred | I think we should try rolling that out before we try downgrading anything | 20:52 |
clarkb | ++ | 20:52 |
corvus | i'll take a look now | 20:52 |
fungi | 623245? | 20:53 |
*** wolverineav has joined #openstack-infra | 20:53 | |
clarkb | fungi: yes | 20:53 |
corvus | mordred: i like the local var idea; is there any reason not to do that? | 20:54 |
mordred | there's also a followup patch that will do the same thing but with yaml and appending to a file instead of reading and re-writing | 20:54 |
clarkb | on_stats is the thing that runs at the end of a playbook run to display stats around what took time and all that | 20:54 |
mordred | corvus: there isn't - although that function is the last thing called before the process exits, so I didn't just to avoid the respin | 20:54 |
mordred | but I can totally do that real quick | 20:55 |
clarkb | mordred: ya I did double check on that in ansible docs (that the hook fires at the end) | 20:55 |
clarkb | so I didn't -1 | 20:55 |
fungi | i suppose a local would be more future-proofing in case more function calls get tacked on after that down the road? | 20:55 |
mordred | yeah. I'll push up a followup | 20:55 |
corvus | yeah, it's mostly just confusing from a dev/maint pov. | 20:55 |
corvus | mordred: you may as well ammend | 20:56 |
mordred | kk | 20:56 |
corvus | or however you spell that :) | 20:56 |
fungi | ammm...mmmend | 20:56 |
fungi | we have enough people around to approve the revised version anyway | 20:56 |
corvus | i'd want to restart with that anyway, so we'd be waiting for the second, and we can all re+2 real quick | 20:56 |
fungi | do we want to get the yaml equivalent in too? | 20:57 |
clarkb | the one upside to json is browsers nicely render it | 20:57 |
clarkb | yaml is more readable on its own though | 20:57 |
corvus | personally, i'd like to take that one slower if we can, since it's basically a new feature. | 20:57 |
fungi | oh, i see the yaml one is more involved | 20:57 |
corvus | the json one seems more like an operational fix | 20:58 |
mordred | yeah. the yaml one is like a change in approach | 20:58 |
fungi | only just started looking at it and, yeah, i agree | 20:58 |
corvus | i like it, i just think we should talk through it fully (eg clarkb's point) | 20:58 |
fungi | entirely possible there are users of zuul who prefer the json version, so it's probably a bigger community question | 20:59 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Read old json data right before writing new data https://review.openstack.org/623245 | 20:59 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Add appending yaml log plugin https://review.openstack.org/623256 | 20:59 |
mordred | ok. there's the updated | 20:59 |
*** bobh has joined #openstack-infra | 21:02 | |
clarkb | looks like dstat is enabled in that test, should have that data once logs post | 21:04 |
fungi | looks like that busy cycle for the executors lasted ~17 minutes | 21:05 |
openstackgerrit | Nate Johnston proposed openstack-infra/project-config master: Neutron grafana update for co-gating section https://review.openstack.org/622418 | 21:05 |
*** bobh has quit IRC | 21:07 | |
*** agopi has quit IRC | 21:12 | |
*** agopi has joined #openstack-infra | 21:12 | |
*** gfidente|afk has quit IRC | 21:15 | |
SpamapS | Hey I just realized my company affiliation changed recently. Is there still a place to update somewhere? | 21:17 |
* SpamapS always forgets where it is | 21:17 | |
*** jcoufal has quit IRC | 21:19 | |
corvus | SpamapS: the openstack foundation site has a thing for that | 21:21 |
corvus | SpamapS: foundation individual membership | 21:21 |
corvus | fungi, clarkb, mordred: i think our next steps should be to restart executors with mordred's patch, observe behavior, then create ze13 if needed. | 21:22 |
clarkb | ok. I'm stepping out for a bit for lunch and need a break from staring at the monitor | 21:22 |
clarkb | can help when I return | 21:22 |
mriedem | clarkb: interesting, not seeing that 26 sec mystery time gap in this run on n-api startup http://logs.openstack.org/65/623265/1/check/tempest-full/0e80f2a/controller/logs/screen-n-api.txt.gz#_Dec_06_17_53_58_039303 | 21:22 |
mriedem | uwsgi debug logging seems to not do anything | 21:23 |
*** jaosorior has quit IRC | 21:23 | |
clarkb | mriedem: it could be ovh specific :( | 21:23 |
fungi | SpamapS: and if you care about stackalytics at all (or your handlers do?) then there's a config file in the stackalytics repo you could push up a review for | 21:23 |
fungi | someday stackalytics might start consuming the affiliation info in osf profiles since there's an api to query that now, but it doesn't today | 21:24 |
fungi | corvus: this sounds like a fine plan. i need to disappear in about 55 minutes to meet some friends for dinner, but can help with restarts prior to that | 21:26 |
fungi | or once i get back (probably around 23:30-00:00z) | 21:26 |
SpamapS | corvus: ty | 21:26 |
SpamapS | fungi: ty too | 21:26 |
*** kjackal has quit IRC | 21:27 | |
fungi | corvus: oh, and taking ze12 offline briefly to add a cinder volume i guess slots in there somewhere too | 21:29 |
*** bobh has joined #openstack-infra | 21:30 | |
corvus | fungi: i don't believe we use cinder volumes | 21:30 |
corvus | fungi: we just mount the ephemeral volume at /var/lib/zuul | 21:31 |
corvus | (we should probably instead have our automation symlink that to /opt or something) | 21:31 |
corvus | but i don't know how to untangle our deployment from openstackci at this point, so i don't want to touch it until we can move to the new stuff. | 21:31 |
*** bobh has quit IRC | 21:34 | |
fungi | oh, got it, i always forget we have ephemeral disk in that provider | 21:34 |
fungi | looks like 623245 has all its node requests fulfilled in the gate as of just now, eta 18 minutes | 21:37 |
*** boden has joined #openstack-infra | 21:40 | |
*** jamesmcarthur has quit IRC | 21:49 | |
clarkb | I think the way it has been done in the past is passing the option to launch node to mount the ephemeral disk elsewhere? | 21:50 |
clarkb | Putting http://logs.openstack.org/38/589238/13/check/heat-functional-convg-mysql-lbaasv2-amqp1/b81a8b3/logs/dstat-csv_log.txt.gz into https://lamada.eu/dstat-graph/ shows a much happier test run than the runs that fail. And there is quite a bit of IO happening too | 21:54 |
clarkb | fungi: ianw ^ I think that points to ovh bhs1 (and maybe gra1) having unhappy and happy hypervisors as the source of the problem (assuming we aren't seeing noisy neighbor issues) | 21:54 |
*** bobh has joined #openstack-infra | 21:55 | |
openstackgerrit | Merged openstack-infra/zuul master: Read old json data right before writing new data https://review.openstack.org/623245 | 21:56 |
fungi | i have a feeling puppet isn't going to update the executors before i have to disappear in 20 minutes, but happy to assist with restarts when i return from dinner if there's still any to be done at that point | 21:59 |
*** bobh has quit IRC | 22:01 | |
clarkb | I've updated https://etherpad.openstack.org/p/bhs1-test-node-slowness | 22:02 |
clarkb | I think we should consider halving the max-servers again and watch those e-r bugs I identified as being corrected by not running in bhs1 | 22:02 |
clarkb | amorin pointed out that none of those slow jobs ran on the same hypervisor so less likely it is one or two unhappy hypervisors. Instead we are maybe our own noisy neighbor | 22:02 |
clarkb | and halving the number of nodes should reduce noisy neighbor impacts | 22:03 |
corvus | clarkb: ack | 22:03 |
fungi | makes sense to me | 22:04 |
openstackgerrit | Ronelle Landy proposed openstack-infra/zuul-jobs master: WIP: Default private_ipv4 to use public_ipv4 address when null https://review.openstack.org/623294 | 22:04 |
openstackgerrit | Clark Boylan proposed openstack-infra/project-config master: Halve bhs1 max-servers value https://review.openstack.org/623338 | 22:06 |
clarkb | fungi: corvus ^ quick review to implement that | 22:06 |
corvus | clarkb: unfortunately that's a variable into our zuul executor swap problem. we should do it because resets are bad, we're just going to need to keep that in mind as we evaluate further changes. | 22:07 |
clarkb | corvus: ++ | 22:07 |
fungi | if we drop it back to 79 instead of 75 we can more directly compare behavior differences between bhs1 and gra1 | 22:08 |
logan- | clarkb: updating xenial hwe from 4.15.0.34.56 to 4.15.0.42.63 makes my nested kvm jobs work again. ¯\_(ツ)_/¯ | 22:08 |
logan- | i will cycle thru the hvs and update them all over the next day or so | 22:08 |
clarkb | logan-: weird | 22:08 |
clarkb | fungi: I think gra1 has less physical hardware too though | 22:09 |
clarkb | so that comparison won't be super accurate? | 22:09 |
fungi | yeah, not a big deal either way | 22:09 |
fungi | certainly if we still see more failures in bhs1 than gra1 even with a lower max-servers, that's telling too | 22:10 |
fungi | anyway, i approved it | 22:10 |
*** kjackal has joined #openstack-infra | 22:11 | |
fungi | and i'm being dragged away 10 minutes early. back as soon as i can be | 22:11 |
clarkb | thanks! | 22:11 |
*** calebb has quit IRC | 22:12 | |
clarkb | logan-: that does sort of imply to me that canonical/ubuntu must have testing for this stuff, but it's likely a losing battle for them trying to keep up | 22:17 |
clarkb | corvus: for the executor stuff we are waiting on mordred's change to merge now? | 22:18 |
corvus | clarkb: it's merged; waiting to deploy | 22:19 |
clarkb | rgr | 22:19 |
corvus | clarkb: but i think we should wait a bit after your quota change merges before we restart | 22:19 |
corvus | so we get a new baseline | 22:19 |
*** eernst has joined #openstack-infra | 22:20 | |
openstackgerrit | Ronelle Landy proposed openstack-infra/zuul-jobs master: WIP: Default private_ipv4 to use public_ipv4 address when null https://review.openstack.org/623294 | 22:20 |
*** eernst has quit IRC | 22:25 | |
*** bobh has joined #openstack-infra | 22:25 | |
*** kgiusti has left #openstack-infra | 22:27 | |
*** manjeets_ is now known as manjeets | 22:28 | |
openstackgerrit | Jonathan Rosser proposed openstack-infra/project-config master: Separate out success/failure/timeout charts in grafana for OSA https://review.openstack.org/623341 | 22:29 |
*** bobh has quit IRC | 22:29 | |
openstackgerrit | Merged openstack-infra/project-config master: Halve bhs1 max-servers value https://review.openstack.org/623338 | 22:30 |
tonyb | Can I please be added to the bootstrappers gerrit group so I can EOL the puppet repos as per: http://lists.openstack.org/pipermail/openstack-discuss/2018-December/000663.html | 22:32 |
* tonyb will self remove when done | 22:32 | |
clarkb | tonyb: yes, one moment | 22:32 |
tonyb | clarkb: Thanks | 22:33 |
clarkb | tonyb: done | 22:33 |
tonyb | clarkb: \o/ as always I'll be careful :) | 22:33 |
*** boden has quit IRC | 22:45 | |
*** bobh has joined #openstack-infra | 22:45 | |
tonyb | clarkb, tobias-urdin: Done and removed | 22:48 |
*** bobh has quit IRC | 22:49 | |
openstackgerrit | Ronelle Landy proposed openstack-infra/zuul-jobs master: WIP: Default private_ipv4 to use public_ipv4 address when null https://review.openstack.org/623294 | 22:53 |
*** bobh has joined #openstack-infra | 23:04 | |
*** lbragstad has quit IRC | 23:08 | |
*** bobh has quit IRC | 23:09 | |
*** lbragstad has joined #openstack-infra | 23:09 | |
clarkb | melwitt: mriedem: email sent | 23:16 |
mriedem | clarkb: thanks | 23:19 |
clarkb | corvus: max-servers change was applied at ~2300UTC and dropped to ~75 in use at about 23:15UTC | 23:19 |
clarkb | baseline numbers probably want to start at 23:15UTC | 23:20 |
corvus | clarkb: yeh, i was just looking. so maybe we wait until at least 24:00 before we restart any executors. | 23:20 |
clarkb | wfm | 23:20 |
melwitt | clarkb: danke | 23:21 |
*** bobh has joined #openstack-infra | 23:30 | |
*** bobh has quit IRC | 23:35 | |
clarkb | corvus: we seem to be stabilizing just under 2GB swap per executor? | 23:42 |
*** rkukura_ has joined #openstack-infra | 23:48 | |
corvus | clarkb: looks like it | 23:48 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool master: Include host_id for openstack provider https://review.openstack.org/623107 | 23:49 |
*** bobh has joined #openstack-infra | 23:49 | |
*** rkukura has quit IRC | 23:51 | |
*** rkukura_ is now known as rkukura | 23:51 | |
*** bobh has quit IRC | 23:54 | |
*** yamamoto has joined #openstack-infra | 23:56 | |
corvus | clarkb: other than 'atomic images prune' and 'atomic pull --storage ostree docker.io/openstackmagnum/kubernetes-kubelet:v1.11.5-1' did you have to do anything else to upgrade the cluster? | 23:58 |
clarkb | corvus: yes, I "vacuumed" the journald contents to free up more disk space | 23:59 |
clarkb | corvus: oh and I upgraded proxy, scheduler, kubelet, api, and controller-manager | 23:59 |
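
For readers following the 20:54-20:59 discussion of change 623245 ("Read old json data right before writing new data"), here is a minimal sketch of the general idea: load the previously written JSON into a local variable only at write time, so it is not held in memory for the whole playbook run. This is not the code from the Zuul change; the function name `append_playbook_results`, the file layout (a JSON list of result dicts), and the path are assumptions made purely for illustration.

```python
import json
import os


def append_playbook_results(path, new_results):
    """Merge new_results into the JSON list stored at path.

    Hypothetical helper for illustration only; not Zuul's actual callback.
    Returns the number of entries written.
    """
    # Read the earlier contents into a local variable right before writing,
    # rather than loading them at startup and keeping them on a long-lived
    # object for the duration of the playbook run.
    previous = []
    if os.path.exists(path):
        with open(path) as f:
            previous = json.load(f)

    combined = previous + list(new_results)

    # Re-write the whole file with the old and new entries together.
    with open(path, "w") as f:
        json.dump(combined, f, indent=2)
    return len(combined)
```

Because `previous` is local, it becomes eligible for garbage collection as soon as the function returns, which is the maintainability point raised about using a local variable. The appending YAML plugin proposed in 623256 takes a different approach (append instead of read and re-write) and is not sketched here.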