*** wolverineav has quit IRC | 00:04 | |
*** eharney has quit IRC | 00:07 | |
*** ahosam has quit IRC | 00:14 | |
*** jamesmcarthur has quit IRC | 00:25 | |
*** jamesmcarthur has joined #openstack-infra | 00:29 | |
*** wolverineav has joined #openstack-infra | 00:34 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add install and deploy openshift roles. https://review.openstack.org/608610 | 00:40 |
*** jamesmcarthur has quit IRC | 00:44 | |
*** jamesmcarthur has joined #openstack-infra | 00:45 | |
*** jamesmcarthur has quit IRC | 00:56 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add install and deploy openshift roles. https://review.openstack.org/608610 | 01:16 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Add support for enabling the ARA callback plugin in install-ansible https://review.openstack.org/611228 | 01:19 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results https://review.openstack.org/617216 | 01:19 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Prefix install_openstacksdk variable https://review.openstack.org/621462 | 01:19 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role https://review.openstack.org/621463 | 01:19 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role https://review.openstack.org/621463 | 01:21 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results https://review.openstack.org/617216 | 01:21 |
*** jamesmcarthur has joined #openstack-infra | 01:27 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role https://review.openstack.org/621463 | 01:32 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results https://review.openstack.org/617216 | 01:32 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results https://review.openstack.org/617216 | 01:45 |
ianw | http://logs.openstack.org/28/611228/9/check/system-config-run-base-ansible-devel/a5abdca/job-output.txt.gz#_2018-12-03_01_33_26_430653 | 01:47 |
ianw | this is an interesting traceback in our ansible devel branch job ... that's an exception from inside python's multiprocessing module | 01:48 |
ianw | it looks like ansible is a pretty sane user of that, so it seems like a fun bug somewhere | 01:48 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add install and deploy openshift roles. https://review.openstack.org/608610 | 01:50 |
*** hwoarang has quit IRC | 02:03 | |
*** hwoarang has joined #openstack-infra | 02:04 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role https://review.openstack.org/621463 | 02:11 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results https://review.openstack.org/617216 | 02:11 |
*** hongbin has joined #openstack-infra | 02:13 | |
*** mrsoul has joined #openstack-infra | 02:16 | |
*** jamesmcarthur has quit IRC | 02:32 | |
*** psachin has joined #openstack-infra | 02:42 | |
*** hongbin has quit IRC | 02:46 | |
*** wolverineav has quit IRC | 03:04 | |
*** wolverineav has joined #openstack-infra | 03:04 | |
*** jamesmcarthur has joined #openstack-infra | 03:07 | |
*** bhavikdbavishi has joined #openstack-infra | 03:14 | |
*** hongbin has joined #openstack-infra | 03:21 | |
*** wolverineav has quit IRC | 03:28 | |
*** armax has quit IRC | 03:29 | |
*** hongbin has quit IRC | 03:30 | |
*** ramishra has joined #openstack-infra | 03:31 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role https://review.openstack.org/621463 | 03:31 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results https://review.openstack.org/617216 | 03:31 |
*** jamesmcarthur has quit IRC | 03:35 | |
*** jamesmcarthur has joined #openstack-infra | 03:35 | |
*** hamzy__ is now known as hamzy | 03:36 | |
*** jamesmcarthur has quit IRC | 03:40 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role https://review.openstack.org/621463 | 03:45 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results https://review.openstack.org/617216 | 03:45 |
*** wolverineav has joined #openstack-infra | 03:55 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role https://review.openstack.org/621463 | 03:59 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results https://review.openstack.org/617216 | 03:59 |
*** jamesmcarthur has joined #openstack-infra | 04:06 | |
ianw | 2018-11-29 03:43:13.751247 | bridge.openstack.org | ansible 2.8.0.dev0 | 04:08 |
ianw | oh that's quite annoying, ansible doesn't give you the git HEAD in its version output when installed from source | 04:09 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role https://review.openstack.org/621463 | 04:27 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results https://review.openstack.org/617216 | 04:27 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [wip] updates for install_ansible role https://review.openstack.org/621463 | 04:48 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results https://review.openstack.org/617216 | 04:48 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [wip] install ansible as editable during devel jobs https://review.openstack.org/621471 | 04:48 |
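The idea behind the editable-install change is that a `pip install -e` of the devel checkout keeps the installed package pointing at the git tree, so the version output can include the commit actually under test. A minimal sketch of such a task, assuming a hypothetical checkout path (the real change may express this differently):

```yaml
- name: Install ansible editable from the devel checkout   # sketch only
  become: true
  pip:
    name: /opt/ansible-devel    # hypothetical path to the git checkout
    editable: true              # pip install -e, so the checkout stays authoritative
    state: present
```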
*** yamamoto has joined #openstack-infra | 04:55 | |
*** agopi has joined #openstack-infra | 05:05 | |
*** hwoarang has quit IRC | 05:10 | |
*** hwoarang has joined #openstack-infra | 05:11 | |
*** jamesmcarthur has quit IRC | 05:22 | |
*** wolverineav has quit IRC | 05:24 | |
*** wolverineav has joined #openstack-infra | 05:43 | |
*** wolverineav has quit IRC | 05:48 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [DNM] testing ansible task handler import https://review.openstack.org/621475 | 05:56 |
*** yamamoto has quit IRC | 05:59 | |
*** yamamoto has joined #openstack-infra | 06:00 | |
*** hwoarang has quit IRC | 06:01 | |
*** hwoarang has joined #openstack-infra | 06:03 | |
*** elbragstad has quit IRC | 06:03 | |
*** zul has quit IRC | 06:04 | |
*** ykarel has joined #openstack-infra | 06:08 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: WIP: Add spec for scale out scheduler https://review.openstack.org/621479 | 06:23 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: WIP: Add spec for scale out scheduler https://review.openstack.org/621479 | 06:24 |
*** apetrich has quit IRC | 06:25 | |
*** wolverineav has joined #openstack-infra | 06:34 | |
openstackgerrit | Surya Prakash (spsurya) proposed openstack-infra/zuul master: dict_object.keys() is not required for *in* operator https://review.openstack.org/621482 | 06:35 |
*** ralonsoh has joined #openstack-infra | 06:37 | |
*** yamamoto has quit IRC | 06:37 | |
*** yamamoto has joined #openstack-infra | 06:38 | |
*** apetrich has joined #openstack-infra | 06:40 | |
*** kjackal has joined #openstack-infra | 06:47 | |
*** wolverineav has quit IRC | 06:55 | |
*** wolverineav has joined #openstack-infra | 06:56 | |
*** rcernin has quit IRC | 06:57 | |
*** yamamoto has quit IRC | 06:58 | |
*** yamamoto has joined #openstack-infra | 06:59 | |
*** wolverineav has quit IRC | 07:01 | |
*** quiquell|off is now known as quiquell | 07:10 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [DNM] testing ansible task handler import https://review.openstack.org/621475 | 07:13 |
*** rkukura has quit IRC | 07:14 | |
*** dpawlik has joined #openstack-infra | 07:15 | |
*** dpawlik has quit IRC | 07:20 | |
*** dpawlik_ has joined #openstack-infra | 07:20 | |
*** aojea has joined #openstack-infra | 07:23 | |
*** pcaruana has joined #openstack-infra | 07:25 | |
*** wolverineav has joined #openstack-infra | 07:27 | |
*** wolverineav has quit IRC | 07:28 | |
*** wolverineav has joined #openstack-infra | 07:28 | |
*** gema has joined #openstack-infra | 07:37 | |
*** quiquell is now known as quiquell|brb | 07:40 | |
*** wolverineav has quit IRC | 07:43 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [DNM] testing ansible task handler import https://review.openstack.org/621475 | 07:46 |
*** ahosam has joined #openstack-infra | 07:59 | |
*** e0ne has joined #openstack-infra | 08:04 | |
*** ginopc has joined #openstack-infra | 08:05 | |
*** slaweq has joined #openstack-infra | 08:06 | |
*** ahosam has quit IRC | 08:08 | |
*** shardy has joined #openstack-infra | 08:11 | |
*** yboaron_ has quit IRC | 08:12 | |
ianw | mordred / corvus / clarkb : it seems the iptables role has triggered a real issue somewhere in our ansible devel branch testing job; I've filed https://github.com/ansible/ansible/issues/49430 with details | 08:12 |
ianw | certainly it seems related to the importing of tasks into the reload handler | 08:13 |
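The pattern that trips the devel branch is a handler whose body is pulled in with import_tasks. Roughly (a minimal sketch; the imported file name and conditional are illustrative, not copied from the system-config iptables role):

```yaml
# handlers/main.yaml
- name: Reload iptables Debian
  import_tasks: reload-debian.yaml     # hypothetical tasks file that restarts
                                       # netfilter-persistent on Debian hosts
  when: ansible_os_family == 'Debian'
```

With ansible 2.7 the notify resolves and the imported tasks run; on the devel branch the same layout fails, which is what issue 49430 tracks.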
*** jtomasek has joined #openstack-infra | 08:15 | |
*** jtomasek has quit IRC | 08:15 | |
*** jtomasek has joined #openstack-infra | 08:16 | |
*** quiquell|brb is now known as quiquell | 08:18 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [to squash] Modifications to ARA installation https://review.openstack.org/621463 | 08:23 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results https://review.openstack.org/617216 | 08:23 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [wip] install ansible as editable during devel jobs https://review.openstack.org/621471 | 08:23 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [DNM] testing ansible task handler import https://review.openstack.org/621475 | 08:23 |
ianw | dmsimard: ^ could you review 621463 for me, and if you're happy, we can squash that into the base install-ara change? personally i think we can get this in to collect the inner ara results in the gate quickly, as that is very useful | 08:24 |
*** rossella_s has quit IRC | 08:26 | |
*** ykarel is now known as ykarel|lunch | 08:26 | |
*** jpena|off is now known as jpena | 08:28 | |
*** rossella_s has joined #openstack-infra | 08:28 | |
*** kjackal has quit IRC | 08:38 | |
*** kjackal has joined #openstack-infra | 08:39 | |
*** ccamacho has joined #openstack-infra | 08:45 | |
*** rkukura has joined #openstack-infra | 08:46 | |
*** tosky has joined #openstack-infra | 08:48 | |
*** yboaron_ has joined #openstack-infra | 08:51 | |
*** yboaron_ has quit IRC | 08:56 | |
*** yboaron_ has joined #openstack-infra | 08:57 | |
*** xek has joined #openstack-infra | 08:59 | |
*** jpich has joined #openstack-infra | 09:03 | |
*** aojea has quit IRC | 09:13 | |
*** aojea has joined #openstack-infra | 09:14 | |
openstackgerrit | Merged openstack-infra/infra-manual master: Fix a reST block syntax https://review.openstack.org/621455 | 09:37 |
ssbarnea|rover | ianw: mordred corvus clarkb : would it be a problem to upload some periodic rdo job logs to logstash? I found some errors there for which logstash would be very useful. | 09:37 |
*** gfidente has joined #openstack-infra | 09:38 | |
*** ykarel|lunch is now known as ykarel | 09:41 | |
ianw | ssbarnea|rover: you should have a chat with tristanC about his log analysis stuff, it could probably import them | 09:42 |
ianw | to your question, i'm not sure, clarkb is probably best to talk to. | 09:42 |
ssbarnea|rover | ianw: thanks. i will ask them. | 09:43 |
*** sshnaidm|off is now known as sshnaidm | 09:43 | |
*** derekh has joined #openstack-infra | 09:57 | |
*** yamamoto has quit IRC | 10:00 | |
*** yamamoto has joined #openstack-infra | 10:07 | |
*** yamamoto has quit IRC | 10:10 | |
*** fresta has quit IRC | 10:13 | |
*** fresta has joined #openstack-infra | 10:14 | |
*** kjackal has quit IRC | 10:17 | |
*** kjackal has joined #openstack-infra | 10:18 | |
*** electrofelix has joined #openstack-infra | 10:18 | |
*** fresta has quit IRC | 10:22 | |
*** electrofelix has quit IRC | 10:22 | |
*** fresta has joined #openstack-infra | 10:22 | |
*** bhavikdbavishi has quit IRC | 10:23 | |
*** electrofelix has joined #openstack-infra | 10:31 | |
*** shardy has quit IRC | 10:35 | |
*** shardy has joined #openstack-infra | 10:43 | |
*** adriancz has joined #openstack-infra | 10:45 | |
*** panda|pto is now known as panda | 10:47 | |
*** shardy has quit IRC | 10:55 | |
*** ahosam has joined #openstack-infra | 10:55 | |
*** priteau has joined #openstack-infra | 10:56 | |
*** yamamoto has joined #openstack-infra | 11:04 | |
*** jamesmcarthur has joined #openstack-infra | 11:11 | |
*** yamamoto has quit IRC | 11:11 | |
*** yamamoto has joined #openstack-infra | 11:15 | |
*** jamesmcarthur has quit IRC | 11:15 | |
*** sshnaidm has quit IRC | 11:16 | |
*** sshnaidm has joined #openstack-infra | 11:16 | |
*** sshnaidm has quit IRC | 11:18 | |
*** rfolco has joined #openstack-infra | 11:18 | |
*** sshnaidm has joined #openstack-infra | 11:19 | |
*** quiquell is now known as quiquell|brb | 11:21 | |
*** owalsh_ has quit IRC | 11:24 | |
*** owalsh has joined #openstack-infra | 11:24 | |
*** jpich has quit IRC | 11:25 | |
*** jpich has joined #openstack-infra | 11:26 | |
*** hamzy_ has joined #openstack-infra | 11:42 | |
*** ahosam has quit IRC | 11:42 | |
*** hamzy has quit IRC | 11:42 | |
*** dtroyer has quit IRC | 11:43 | |
*** dtroyer has joined #openstack-infra | 11:43 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add install and deploy openshift roles. https://review.openstack.org/608610 | 11:44 |
*** quiquell|brb is now known as quiquell | 11:45 | |
*** yamamoto has quit IRC | 11:48 | |
*** yamamoto has joined #openstack-infra | 11:49 | |
*** yamamoto has quit IRC | 11:49 | |
*** yamamoto has joined #openstack-infra | 11:50 | |
*** yamamoto has quit IRC | 11:56 | |
tobias-urdin | tonyb: we got consensus to remove the stable/newton branches across projects but i think the thread is kind of lost in the openstack-dev list | 11:57 |
tobias-urdin | who can i talk to to queue up that work? | 11:57 |
*** electrofelix has quit IRC | 11:58 | |
*** electrofelix has joined #openstack-infra | 12:03 | |
*** ykarel is now known as ykarel|afk | 12:03 | |
*** ahosam has joined #openstack-infra | 12:03 | |
*** owalsh has quit IRC | 12:03 | |
*** shardy has joined #openstack-infra | 12:09 | |
*** ykarel|afk is now known as ykarel | 12:11 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Implement an OpenShift resource provider https://review.openstack.org/570667 | 12:13 |
*** ykarel is now known as ykarel|afk | 12:19 | |
*** owalsh has joined #openstack-infra | 12:21 | |
*** ykarel|afk has quit IRC | 12:30 | |
*** jpena is now known as jpena|lunch | 12:34 | |
*** dhill_ has joined #openstack-infra | 12:40 | |
*** ramishra has quit IRC | 12:48 | |
*** lpetrut has joined #openstack-infra | 12:52 | |
*** rlandy has joined #openstack-infra | 12:54 | |
*** lpetrut has quit IRC | 12:55 | |
*** lpetrut has joined #openstack-infra | 12:55 | |
*** dave-mccowan has joined #openstack-infra | 12:58 | |
*** Douhet has quit IRC | 12:58 | |
*** ramishra has joined #openstack-infra | 13:04 | |
*** rh-jelabarre has joined #openstack-infra | 13:06 | |
*** boden has joined #openstack-infra | 13:08 | |
*** kjackal has quit IRC | 13:12 | |
*** kjackal has joined #openstack-infra | 13:12 | |
*** tpsilva has joined #openstack-infra | 13:15 | |
*** jamesmcarthur has joined #openstack-infra | 13:17 | |
*** ykarel|afk has joined #openstack-infra | 13:18 | |
*** ykarel|afk is now known as ykarel | 13:19 | |
*** agopi has quit IRC | 13:24 | |
*** jpena|lunch is now known as jpena | 13:26 | |
*** agopi has joined #openstack-infra | 13:30 | |
*** udesale has joined #openstack-infra | 13:30 | |
*** dave-mccowan has quit IRC | 13:32 | |
*** jamesmcarthur has quit IRC | 13:33 | |
ssbarnea|rover | clarkb: let me know when you are here, i want to ask you about logstash. | 13:34 |
*** ahosam has quit IRC | 13:35 | |
*** zul has joined #openstack-infra | 13:36 | |
*** jroll has quit IRC | 13:38 | |
fungi | ssbarnea|rover: are these logs from jobs which run in our ci system, or elsewhere? injecting third-party logs into our elasticsearch backend is something we've said in the past we won't support, and instead recommend those third parties operate their own log analysis systems (they're welcome to reuse the same mechanisms we do for running them if they like) | 13:38 |
*** jroll has joined #openstack-infra | 13:38 | |
fungi | ianw: catching up on scrollback, did you come to a conclusion on how to unblock system-config changes (the failing "Install IPv4 rules files" task)? | 13:40 |
ssbarnea|rover | fungi: so short answer: no (there is no unified interface to query logs across different CI systems). | 13:40 |
ssbarnea|rover | i guess there is no need to explain why this would be useful (also related to elastic-recheck), as the same error could easily spread across different CIs | 13:41 |
fungi | ssbarnea|rover: right, we already struggle for a reasonable amount of retention with just the logs from our ci systems. we've also had other projects ask to reuse our elasticsearch cluster to house performance metrics from jobs in their jenkins simply so they can avoid having to maintain an elasticsearch cluster themselves... not sure where we can sanely draw the line, but previously we've said "only | 13:43 |
fungi | jobs which run in our ci system" | 13:43 |
fungi | you can also run your own elastic-recheck service. it's published under an open license too | 13:44 |
frickler | fungi: iiuc we'd have to make system-config-run-base-ansible-devel non-voting if we need to merge something before we find a fix or workaround for that ansible issue | 13:45 |
fungi | frickler: thanks, i need to go run some errands here shortly, but when i get back i can try to take a look so i can merge the mailing list changes which were scheduled to go in today | 13:46 |
frickler | fungi: I can prepare a patch for that | 13:47 |
*** jaosorior has joined #openstack-infra | 13:47 | |
fungi | is there a theory as to why ansible isn't finding the "Reload iptables Debian" handler? | 13:48 |
fungi | i saw ianw say something about exposing a bug in ansible | 13:48 |
frickler | fungi: https://github.com/ansible/ansible/issues/49430 has some details, but no root cause yet if I read it correctly | 13:50 |
ssbarnea|rover | fungi: :) i know, i was trying to lower the number of systems I need to check, not increase it. i do understand the reasons behind it. still, kibana supports doing queries on multiple clusters, which means it could be possible to configure it as a single frontend for both clusters. | 13:50 |
openstackgerrit | Jens Harbott (frickler) proposed openstack-infra/system-config master: Make system-config-run-base-ansible-devel non-voting https://review.openstack.org/621577 | 13:51 |
*** Douhet has joined #openstack-infra | 13:52 | |
*** jamesmcarthur has joined #openstack-infra | 13:52 | |
mordred | frickler: fascinating | 13:57 |
frickler | ianw: fungi: I think I found the commit that broke ansible for us, added reference to the issue. still not sure whether that implies that our usage is broken | 13:57 |
frickler | mordred: ^^ | 13:57 |
mordred | frickler: yah. I was just reading your comment there | 13:57 |
fungi | neat-o | 14:00 |
*** fried_rice is now known as efried | 14:00 | |
*** quiquell is now known as quiquell|lunch | 14:01 | |
*** efried is now known as fried_rice | 14:01 | |
*** fried_rice is now known as efried | 14:02 | |
*** jcoufal has joined #openstack-infra | 14:02 | |
*** kgiusti has joined #openstack-infra | 14:03 | |
*** yboaron_ has quit IRC | 14:05 | |
*** yboaron_ has joined #openstack-infra | 14:05 | |
*** mriedem has joined #openstack-infra | 14:07 | |
*** jcoufal has quit IRC | 14:07 | |
openstackgerrit | Jens Harbott (frickler) proposed openstack-infra/system-config master: Fix iptables handlers https://review.openstack.org/621580 | 14:09 |
frickler | ianw: fungi: mordred: ^^ I think that this should be the fix, waiting to see job results | 14:10 |
*** jcoufal has joined #openstack-infra | 14:11 | |
fungi | thanks frickler! | 14:15 |
mordred | neat! | 14:16 |
*** jcoufal has quit IRC | 14:17 | |
*** jcoufal has joined #openstack-infra | 14:19 | |
*** psachin has quit IRC | 14:24 | |
*** SteelyDan is now known as dansmith | 14:28 | |
*** quiquell|lunch is now known as quiquell | 14:40 | |
fungi | okay, heading out for errands, back in a sec | 14:42 |
*** nhicher has joined #openstack-infra | 14:42 | |
*** jpich has quit IRC | 14:42 | |
*** lbragstad has joined #openstack-infra | 14:49 | |
*** gema has left #openstack-infra | 14:49 | |
*** jpich has joined #openstack-infra | 14:50 | |
*** dave-mccowan has joined #openstack-infra | 14:54 | |
*** bobh has joined #openstack-infra | 14:54 | |
*** lbragstad has quit IRC | 14:58 | |
*** lbragstad has joined #openstack-infra | 15:00 | |
*** beekneemech is now known as bnemec | 15:01 | |
*** jamesmcarthur has quit IRC | 15:06 | |
*** sthussey has joined #openstack-infra | 15:16 | |
hughsaunders | Hey, I've been looking into nodepool again, and it seems there isn't an attempt to route requests to workers that have ready capacity. Also ready capacity isn't evenly distributed, so once you have more than a few regions that can provide a label, the chances of hitting ready capacity are quite low. Eg if I have 5 regions, and min-ready:3, there will probably only be ready capacity in 2 regions, which gives a request a 2/5 | 15:30 |
hughsaunders | chance of hitting a ready node. | 15:30 |
*** dpawlik_ has quit IRC | 15:31 | |
hughsaunders | I started digging into the code because I couldn't work out why my requests were waiting for new instance builds when there were ready nodes waiting. | 15:31 |
hughsaunders | So am I doing something wrong? Or have I come to an accurate summary of the current situation? If so would you accept some kind of patch to attempt to prioritise regions with ready capacity? | 15:32 |
fungi | hughsaunders: nodepool/zuul development discussions likely have a better audience in the #zuul channel, as nodepool technically isn't an openstack-infra project any longer | 15:33 |
hughsaunders | probably should have remembered that, apologies and thanks. | 15:33 |
corvus | hughsaunders: if you want to hop over to that channel, i can answer your question there :) | 15:33 |
dmsimard | I was looking at the state of the gate because it seemed like there was a little bit of a backlog. Is it okay for certain projects to have >25 jobs on a single change ? | 15:37 |
mordred | dmsimard: yeah. there is a set of patches that started to be rolled out friday that are intended to make the backlog a bit fairer | 15:40 |
mordred | dmsimard: but as of now there haven't been any limits placed on to numbers of jobs per projects | 15:40 |
dmsimard | I like to think that if they have as many jobs it's because they need it, was just genuinely curious -- I think I saw a set of changes by tobiash to get metrics too. | 15:41 |
mordred | dmsimard: yah. like - I have a ton of jobs on sdk ... but they're all actually useful (I keep trying to remove some) | 15:42 |
*** jamesmcarthur has joined #openstack-infra | 15:42 | |
tobiash | dmsimard: you mean https://review.openstack.org/616306 ? | 15:43 |
dmsimard | yeah | 15:43 |
*** aojeagarcia has joined #openstack-infra | 15:51 | |
*** ykarel has quit IRC | 15:53 | |
*** ykarel has joined #openstack-infra | 15:53 | |
*** yboaron_ has quit IRC | 15:54 | |
*** aojea has quit IRC | 15:54 | |
*** rtjure has quit IRC | 15:55 | |
*** lennyb_ has quit IRC | 15:55 | |
*** jhesketh has quit IRC | 15:55 | |
*** dayou_ has quit IRC | 15:56 | |
AJaeger_ | tripleo is running again - or still - non-voting jobs in gate ;( . EmilienM , jaosorior, please see https://review.openstack.org/616872 which is right now top of zuul gate for tripleo and has 4 non-voting jobs | 15:56 |
EmilienM | AJaeger_: ok | 15:56 |
*** jhesketh has joined #openstack-infra | 15:57 | |
*** lennyb has joined #openstack-infra | 15:57 | |
*** quiquell is now known as quiquell|off | 15:58 | |
*** janki has joined #openstack-infra | 15:58 | |
EmilienM | mwhahaha: ^ didn't we fix that? | 15:59 |
mwhahaha | there's a patch | 16:00 |
mwhahaha | https://review.openstack.org/#/c/620705/ | 16:00 |
*** dayou_ has joined #openstack-infra | 16:00 | |
mwhahaha | pending approval :/ | 16:00 |
EmilienM | approved | 16:00 |
*** Douhet has quit IRC | 16:01 | |
AJaeger_ | thanks, EmilienM and mwhahaha ! | 16:01 |
*** woojay has joined #openstack-infra | 16:02 | |
*** Douhet has joined #openstack-infra | 16:02 | |
fungi | need us to promote that change to the front so it will take effect sooner? | 16:03 |
EmilienM | fungi: yes please | 16:04 |
AJaeger_ | fungi: it's only for tripleo-ci and there's only 616872 at top of gate, let it finish | 16:04 |
*** gyee has joined #openstack-infra | 16:04 | |
fungi | k | 16:05 |
fungi | wasn't sure if it was in one of the longer shared queues | 16:05 |
AJaeger_ | it is in the longer shared queue - but only relevant for tripleo-ci and as there are no other changes for that repo, promoting would harm us IMHO | 16:06 |
fungi | ahh, figured if it was removing a lot of non-voting jobs then we would stop running them that much sooner | 16:07 |
*** rtjure has joined #openstack-infra | 16:10 | |
*** dpawlik has joined #openstack-infra | 16:11 | |
*** adriancz has quit IRC | 16:14 | |
*** dklyle has joined #openstack-infra | 16:15 | |
*** dpawlik has quit IRC | 16:16 | |
clarkb | amorin: fungi: any word if we should reenable bhs1 at this point? | 16:17 |
clarkb | ssbarnea|rover: I am here if you want to talk logstash, or did fungi answer your questions? | 16:17 |
*** dtantsur is now known as dtantsur|afk | 16:17 | |
clarkb | I agree with fungi. Our elasticsearch and logstash tooling is built for our CI system. It's unfortunately not a great set of tooling to offer to third parties (due to AAA being nonexistent and the size of the cluster already being quite large for the few days of logs we get out of it) | 16:18 |
ssbarnea|rover | clarkb: fungi answered most questions, mainly the only remaining one is if we can configure kibana to query both elastic-search clusters. | 16:18 |
clarkb | ssbarnea|rover: both meaning Infra's and RDOs? | 16:18 |
clarkb | no I don't think we should do that either | 16:18 |
ssbarnea|rover | clarkb: yep. rdo cluster could be optional. | 16:19 |
ssbarnea|rover | clarkb: i will try to see if I can configure the rdo kibana to query both (own and upstream). | 16:19 |
*** dpawlik has joined #openstack-infra | 16:20 | |
ssbarnea|rover | the idea is to have one unified query interface | 16:20 |
clarkb | the issue is that we aren't one unified system though | 16:20 |
clarkb | the infra team has zero ability to fix bugs in rdo | 16:20 |
clarkb | but presenting that data as coming from our CI system would imply otherwise | 16:20 |
clarkb | and I don't want to create that confusion | 16:20 |
fungi | we already get enough questions every time systems people incorrectly assume we manage are offline | 16:21 |
dmsimard | was anyone looking at the issues we had in ovh hs ? | 16:21 |
dmsimard | bhs* | 16:21 |
ssbarnea|rover | clarkb: never mind, i will try to configure rdo to query both. | 16:21 |
fungi | dmsimard: amorin said he was going to look into it, yes | 16:21 |
*** bobh has quit IRC | 16:25 | |
*** yamamoto has joined #openstack-infra | 16:25 | |
*** janki has quit IRC | 16:26 | |
amorin | fungi: yes, I did try to take a look, but I was trapped in another topic | 16:27 |
dmsimard | amorin: let us know if we can help :) | 16:29 |
fungi | frickler: your proposed fix seems to be raising an "Unexpected Exception" from the iptables : Reload iptables (Debian) handler now | 16:29 |
fungi | not quite sure what to make of that | 16:29 |
fungi | http://logs.openstack.org/80/621580/1/check/system-config-run-base/caf2d3e/job-output.txt.gz#_2018-12-03_15_56_39_095892 | 16:30 |
*** yamamoto has quit IRC | 16:30 | |
fungi | i think it's saying that `netfilter-persistent start` exited nonzero? | 16:31 |
fungi | hrm, though the json mentions an rc of 0 | 16:32 |
fungi | so maybe it's not talking about that task | 16:33 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Don't calculate priority of non-live items https://review.openstack.org/621626 | 16:35 |
frickler | fungi: oh, that error is in -base now, not in -devel | 16:35 |
frickler | maybe the change isn't backwards compatible? | 16:35 |
fungi | ouch | 16:35 |
fungi | right, i missed that | 16:35 |
fungi | can we specify both import_tasks and include_tasks? | 16:36 |
*** lpetrut has quit IRC | 16:36 | |
fungi | or are they mutually-exclusive? | 16:36 |
frickler | I have not idea, I'll leave this to ansible experts now. mordred ianw ^^ | 16:36 |
frickler | s/not/no/ | 16:37 |
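For reference, import_tasks and include_tasks are separate directives rather than two options on one task, so they cannot be combined: import_tasks is static and resolved when the play is parsed, while include_tasks is dynamic and resolved when the task actually executes. A hedged sketch of the two forms side by side (the file name is illustrative):

```yaml
# static: the imported tasks are inlined at parse time
- name: Reload iptables Debian
  import_tasks: reload-debian.yaml

# dynamic: the file is loaded only when the handler fires
- name: Reload iptables Debian
  include_tasks: reload-debian.yaml
```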
frickler | we can merge the nv patch in the meantime I'd say | 16:37 |
clarkb | frickler: tldr is ansible 2.8.0 has broken things in a non-backward-compatible way? | 16:39 |
clarkb | I guess ianw filed a bug maybe I should start by reading that | 16:39 |
frickler | clarkb: the issue and the links in it should have some information. I'm not sure whether it is really backwards incompatible or my fix just needs more knowledge | 16:40 |
frickler | clarkb: for sure the cited merge broke the way we use ansible currently | 16:41 |
*** trown is now known as trown|lunch | 16:41 | |
*** dpawlik has quit IRC | 16:42 | |
*** e0ne has quit IRC | 16:48 | |
clarkb | looks like other users have reported similar issues | 16:51 |
clarkb | so maybe switching to -nv job for now and waiting to see if ansible fixes it for all of us is the way forward? | 16:51 |
*** dpawlik has joined #openstack-infra | 16:52 | |
pabelanger | clarkb: corvus: I'm around again today if we wanted to try the nodepool / zuul upgrades again. I admit, I am not sure if there are any issues preventing us from trying again this morning | 16:57 |
corvus | pabelanger, clarkb: yeah, we could try now, or we could wait for 621626 to land. either should work. | 16:58 |
pabelanger | looking | 16:59 |
clarkb | If we wait then the end result is releasable assuming it works right? | 17:00 |
pabelanger | looks like a few hours of waiting, assuming we don't enqueue | 17:00 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Don't import in iptables handlers https://review.openstack.org/621633 | 17:00 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Don't import tasks in iptables reload and use listen https://review.openstack.org/621634 | 17:00 |
corvus | frickler, clarkb, fungi, ianw, mordred: ^ two more alternatives to consider | 17:01 |
corvus | clarkb, pabelanger: why don't i direct-enqueue it | 17:01 |
pabelanger | +1 | 17:01 |
clarkb | corvus: ++ | 17:02 |
*** udesale has quit IRC | 17:02 | |
*** aojeagarcia has quit IRC | 17:07 | |
mordred | corvus: I think I like 621633 the best in this particular case, just because it's simpler | 17:07 |
corvus | mordred: yeah. i kind of like listen, and would lean toward that, if it weren't for the 'when' issue | 17:08 |
mordred | yeah | 17:09 |
*** graphene has joined #openstack-infra | 17:10 | |
openstackgerrit | James E. Blair proposed openstack-infra/project-config master: Add #openstack-designate to accessbot https://review.openstack.org/621639 | 17:15 |
corvus | frickler: ^ | 17:15 |
*** armax has joined #openstack-infra | 17:16 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Extract out common config parsing for ConfigPool https://review.openstack.org/621642 | 17:18 |
*** manjeets has joined #openstack-infra | 17:19 | |
*** dpawlik has quit IRC | 17:19 | |
*** bobh has joined #openstack-infra | 17:20 | |
*** bobh has quit IRC | 17:21 | |
clarkb | fungi: gerrit slowness hasn't happened again and we are still blocking the stackalytics user? | 17:25 |
*** jpich has quit IRC | 17:25 | |
clarkb | corvus: frickler re -designate channel, I can't seem to list access for that channel with chanserv? | 17:27 |
corvus | clarkb: yep. you will when the accessbot change lands | 17:31 |
clarkb | corvus: was it set up intentionally that way before? seems odd | 17:32 |
corvus | clarkb: not sure; might be a side effect of some of the modes set on it? | 17:32 |
corvus | i'm afk for 30m; should be ready to restart zuul when i get back | 17:34 |
corvus | apparently i jinxed it; py36 failed | 17:34 |
clarkb | fwiw I +2'd https://review.openstack.org/621633 as I agree with mordred that I prefer it because it is simpler | 17:34 |
clarkb | frickler: fungi ^ if others want to maybe review a fix for the ansible thing | 17:35 |
*** dpawlik has joined #openstack-infra | 17:35 | |
corvus | the sql failures again; i'm going to re-enqueue | 17:35 |
corvus | (also, that's the second time i've see the sql failures on limestone) | 17:36 |
mordred | clarkb: I have also +2d, but have not +Ad so that we can get folks to weigh in | 17:36 |
mordred | clarkb: I wish there was a condorcet plugin for gerrit that would allow people to rank vote on a collection of patches. I have no interest in writing such a plugin though | 17:37 |
corvus | mordred: ++ | 17:37 |
clarkb | mordred: you could probably implement that entirely in prolog | 17:37 |
mordred | clarkb: yah. DEFINITELY don't want to implement a condorcet voting system in prolog | 17:37 |
clarkb | :) | 17:38 |
corvus | ok. re-enqueued. back in ~30. | 17:38 |
mordred | but maybe zaro will get bored one day and write it :) | 17:38 |
*** dpawlik has quit IRC | 17:39 | |
fungi | clarkb: correct, i haven't seen any coordinated reports of slowness (just the occasional ones which only seemed to affect one person and couldn't be reproduced globally) | 17:41 |
fungi | and i haven't removed the ip6tables rule blocking the address stackalytics-bot-2 was seen coming from | 17:42 |
*** shardy has quit IRC | 17:46 | |
*** bobh has joined #openstack-infra | 17:55 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Extract out common config parsing for ConfigPool https://review.openstack.org/621642 | 17:59 |
*** derekh has quit IRC | 18:02 | |
*** e0ne has joined #openstack-infra | 18:03 | |
jonher | Gate never ran on https://review.openstack.org/619216/ - is a normal recheck required or is there another command to only have it recheck gate? | 18:05 |
openstackgerrit | Merged openstack-infra/zuul master: Don't calculate priority of non-live items https://review.openstack.org/621626 | 18:07 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul master: Handle github delete events https://review.openstack.org/621665 | 18:10 |
*** ralonsoh has quit IRC | 18:11 | |
openstackgerrit | Ed Leafe proposed openstack-infra/project-config master: Add the os-resource-classes project https://review.openstack.org/621666 | 18:11 |
fungi | jonher: that's really strange. i don't see any indication of maintenance activity around that time | 18:11 |
clarkb | fungi: jonher: zuul was restarted a couple times on that day | 18:12 |
clarkb | tryign to get the relative priority work deployed | 18:12 |
fungi | indeed it was according to http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-11-30.log.html | 18:12 |
fungi | just never made it into https://wiki.openstack.org/wiki/Infrastructure_Status | 18:12 |
jonher | OK, so a simple "recheck" should get things going again? | 18:13 |
clarkb | jonher: yes | 18:13 |
clarkb | or better yet reapproval | 18:13 |
clarkb | which I've done | 18:13 |
clarkb | (then we can skip the check queue) | 18:13 |
fungi | approve event in gerrit was at 22:49 and looks like there was indeed a zuul scheduler restart in progress according to the channel log | 18:13 |
fungi | so no mystery, just poor timing on my part with the approve button | 18:14 |
openstackgerrit | Ed Leafe proposed openstack-infra/project-config master: Add the os-resource-classes project https://review.openstack.org/621666 | 18:14 |
jonher | gr8, thanks clarkb | 18:14 |
*** jpena is now known as jpena|off | 18:17 | |
*** apetrich has quit IRC | 18:18 | |
openstackgerrit | Merged openstack-infra/infra-manual master: Replace mailing list https://review.openstack.org/619216 | 18:23 |
clarkb | fungi: did old mailing lists get disabled yet? that is on tap for today right? | 18:24 |
jonher | ^ now it merged, thanks again :) | 18:24 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: WIP: Fix broken setRefs whith missing objects https://review.openstack.org/621667 | 18:24 |
fungi | clarkb: that is on tap for today, but need to be able to merge system-config patches to do that, ideally | 18:25 |
clarkb | fungi: did you see https://review.openstack.org/#/c/621633/ as a fix for that? | 18:25 |
fungi | yep, and earlier attempts | 18:25 |
fungi | was waiting to see check results | 18:25 |
*** udesale has joined #openstack-infra | 18:26 | |
*** apetrich has joined #openstack-infra | 18:30 | |
*** electrofelix has quit IRC | 18:31 | |
*** ykarel is now known as ykarel|away | 18:36 | |
corvus | clarkb, pabelanger: zuul change is in place; shall we start some restarts now? | 18:38 |
*** eernst has joined #openstack-infra | 18:39 | |
fungi | or restart some starts | 18:39 |
corvus | perhaps most accurately: restart some restarts | 18:39 |
clarkb | I'm around and ready | 18:41 |
fungi | also around and not mired in anything especially sticky | 18:42 |
*** vabada has quit IRC | 18:43 | |
corvus | would someone like to go ahead and restart the nodepool launchers? | 18:44 |
corvus | and i can restart the zuul scheduler afterwards | 18:44 |
fungi | i can do that | 18:45 |
fungi | any special care to take, or just service restart them? | 18:45 |
corvus | fungi: maybe start with nl04 | 18:45 |
corvus | we did merge at least one change since the last time we restarted them | 18:45 |
pabelanger | corvus: clarkb: I am around | 18:46 |
pabelanger | on standby if needed | 18:46 |
fungi | pbr freeze says we have nodepool==3.3.2.dev67 # git sha f116826 installed on nl04 | 18:46 |
corvus | looks right | 18:47 |
fungi | that's what we're expecting, seems to match origin/master | 18:47 |
clarkb | ya nl04 is good choice while bhs1 is disabled | 18:47 |
*** ykarel|away has quit IRC | 18:47 | |
fungi | nodepool-launcher restarted on nl04 now | 18:47 |
corvus | it's going to be very very chatty for a bit | 18:48 |
fungi | with `service nodepool-launcher restart` which seems to have worked. new pid, current time | 18:48 |
fungi | and yeah, tailing the debug log it is indeed chatty | 18:48 |
fungi | seems to have reached steady state now? | 18:48 |
fungi | it's handling requests anyway | 18:49 |
corvus | yeah, still very chatty. i'm on the fence about whether we can handle that level long-term. but it's going to be useful for the next little bit to be able to examine the new behavior. | 18:49 |
corvus | i think we can proceed to restart the rest | 18:49 |
fungi | shall i work my way down the list with nl03 next? | 18:49 |
corvus | ++ | 18:50 |
fungi | f116826 is installed there too | 18:50 |
fungi | it's restarted on nl03 now | 18:50 |
fungi | while that's going, i've checked `pbr freeze` on nl02 and 01 and they both look right as well | 18:52 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool master: Make launcher debug slightly less chatty https://review.openstack.org/621675 | 18:53 |
corvus | that's for later ^ | 18:53 |
fungi | i think nl03 is handling requests, the debug log is just so firehose it never pauses | 18:54 |
fungi | shall i move on to nl02? | 18:54 |
corvus | fungi: yep | 18:54 |
fungi | okay, it's restarted as well | 18:55 |
*** diablo_rojo has joined #openstack-infra | 18:55 | |
mordred | corvus, fungi, clarkb: I'm around-ish .. but a dude is coming over to the house in a few minutes to give us some quotes on some work, so I'm not around-around | 18:56 |
corvus | hrm. we're missing a debug line at the start of request processing; it's hard to tell (with grep) when the loop starts again | 18:56 |
fungi | i do see nl02 seeming to satisfy some requests though according to the log | 18:56 |
fungi | if i'm reading correctly | 18:56 |
*** wolverineav has joined #openstack-infra | 18:57 | |
clarkb | yes it appears to be declining requests for citycloud | 18:58 |
fungi | okay, moving on to nl01 i guess | 18:58 |
fungi | and it's restarted | 18:59 |
*** trown|lunch is now known as trown|outtypewww | 18:59 | |
fungi | this one's not so active compared to 02 and 03 | 18:59 |
fungi | openstack.exceptions.HttpException: HttpException: 403: Client Error for url: https://ord.servers.api.rackspacecloud.com/v2/637776/servers, Quota exceeded for ram: Requested 8192, but already used 1638400 of 1641728 ram | 19:00 |
fungi | whee! | 19:00 |
clarkb | fungi: the launchers with disabled providers are more active since they just decline things | 19:01 |
clarkb | "more active" | 19:01 |
*** wolverineav has quit IRC | 19:01 | |
fungi | oh, right, that's what it is | 19:01 |
*** wolverineav has joined #openstack-infra | 19:01 | |
fungi | i hadn't made that connection | 19:01 |
mordred | that's working-as-designed :) | 19:04 |
*** e0ne has quit IRC | 19:04 | |
clarkb | ya I think this looks happy | 19:05 |
clarkb | corvus: are we ready to restart zuul? | 19:05 |
fungi | seems sane on the launcher end now at any rate | 19:05 |
corvus | let's hold the zuul restart for a few minutes; there's a release making its way through right now | 19:05 |
corvus | it has 1min left in gate; then of course the actual post-merge release activity | 19:06 |
corvus | https://review.openstack.org/#/c/620919/ | 19:06 |
corvus | after that we should be good (see #openstack-release) | 19:06 |
*** jamesmcarthur has quit IRC | 19:06 | |
fungi | looks like the system-config fix is really, really close to getting node assignments | 19:06 |
corvus | fungi: it should still be after the restart. | 19:08 |
fungi | indeed | 19:08 |
corvus | (possibly closer) | 19:08 |
*** shardy has joined #openstack-infra | 19:17 | |
Shrews | hrm, did we remove a provider pool from nl04? | 19:17 |
Shrews | WARNING nodepool.driver.openstack.OpenStackProvider: Cannot find provider pool for node | 19:17 |
*** e0ne has joined #openstack-infra | 19:17 | |
clarkb | we disabled bhs1 via max servers | 19:19 |
fungi | yeah, didn't remove one afaik | 19:19 |
clarkb | I don't think we removed any providers though. Does it not log the one it thinks is missing? | 19:19 |
Shrews | this is for ovh-gra1, which still exists in nodepool.yaml | 19:19 |
Shrews | something is weird there | 19:19 |
Shrews | pool and launcher node attributes are empty. maybe this is due to corvus' recent change... | 19:20 |
clarkb | we seem to have launched new nodes there since the restart | 19:20 |
*** priteau has quit IRC | 19:20 | |
*** amotoki has quit IRC | 19:21 | |
Shrews | hrm, not the change i was thinking of... | 19:22 |
corvus | Shrews: the pool is named "pool"? | 19:23 |
*** amotoki has joined #openstack-infra | 19:23 | |
corvus | oh, no you said it's None. sorry. | 19:24 |
fungi | cruft for something hanging around in zk? | 19:24 |
corvus | Shrews: could it be that when we create a fake node for deleting a failure, it has no pool entry? | 19:25 |
Shrews | corvus: seems that way (just a warning that i hadn't noticed). ovh doesn't seem to be able to delete that instance, so it's hanging around | 19:26 |
Shrews | so a problem with the provider | 19:26 |
corvus | Shrews: ok, so we're still trying to delete those nodes (ie, it's a non-fatal error)? | 19:26 |
Shrews | corvus: right | 19:26 |
fungi | since their upgrade (to newton i think?) ovh has been struggling to satisfy delete requests in a timely fashion | 19:26 |
tobiash | yes, we only set the provider | 19:26 |
*** wolverineav has quit IRC | 19:27 | |
Shrews | tobiash: is that warning useful? | 19:27 |
corvus | Shrews: ok. we're probably seeing it more because of the recent fix to create those stub nodes more often (on launch failures which return an external id) | 19:27 |
clarkb | unrelated but https://github.com/kubernetes/kubernetes/issues/71411 probably means we want to redeploy the nodepool k8s cluster when oen of those patched versiosn is available | 19:27 |
tobiash | Shrews: from where does it come? | 19:27 |
corvus | Shrews: we could probably copy in the pool from the original request | 19:27 |
Shrews | tobiash: during quota calculation | 19:27 |
clarkb | corvus: ++ | 19:27 |
*** wolverineav has joined #openstack-infra | 19:28 | |
tobiash | actually then that node isn't taken into account during quota calculation | 19:28 |
tobiash | so I think the warning was useful | 19:28 |
tobiash | I think we should add the pool to these nodes too | 19:28 |
Shrews | tobiash: ++ | 19:28 |
tobiash | but that's something that is already there for a long time so nothing fatal | 19:29 |
corvus | heh -- the fix was to make sure the node was taken into account for quota. so.. yep. :) | 19:29 |
fungi | corvus: unrelated, but the 621633 fix for system-config is failing puppet-beaker-rspec-puppet-4-infra-system-config and system-config-run-base | 19:30 |
fungi | digging into logs for those now | 19:30 |
clarkb | is createServer what sets node.pool? | 19:30 |
fungi | the former is raising "ERROR! The requested handler 'Reload iptables Debian' was not found in either the main handlers list nor in the listening handlers list" | 19:31 |
fungi | and so is the latter | 19:31 |
tobiash | Shrews, corvus: as the comment states, the node is in a funny state: http://paste.openstack.org/show/736590/ :) | 19:31 |
fungi | so i guess that's still being referenced | 19:31 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Set pool for error'ed instances https://review.openstack.org/621681 | 19:32 |
*** wolverineav has quit IRC | 19:32 | |
Shrews | i think ^^ fixes it | 19:32 |
*** wolverineav has joined #openstack-infra | 19:32 | |
clarkb | oh right it's because we make a copy of the node data structure in the bubbled up exception handler | 19:33 |
clarkb | we don't use the actual node, instead that is reused | 19:33 |
clarkb | Shrews: ya I think that should fix it | 19:33 |
*** bobh has quit IRC | 19:33 | |
corvus | fungi: hrm. i guess that doesn't work. i don't immediately know why, but i wonder if it's due to the arcane rules for referencing handler tasks by name (referred to vaguely in one of the ansible bug reports) | 19:34 |
corvus | in other news, the release is done, so we can restart zuul now | 19:34 |
clarkb | do we need Shrews' fix to make quota crunching work properly? | 19:35 |
tobiash | clarkb: not immediate, this thing is already there for a long time | 19:35 |
clarkb | we merged two related changes around that before. The first attempted to track the nodes properly and the second to track untracked nodes. I think these nodes are currently "tracked" but then fail to be deleted | 19:36 |
clarkb | tobiash: yes. Mostly wondering if the second related change will change the behavior in a more negative way than what we had before | 19:36 |
tobiash | which one? | 19:36 |
clarkb | tobiash: 56164c886a81c5d5c67eaac789a6288dd555189b | 19:37 |
clarkb | I guess it's the same as it was before since ^ will see them as untracked and not account for them and afbf9108d893ede0d147da2afe16c9e6d4bc76d4 will basically treat them as untracked too | 19:37 |
clarkb | so not a worse regression, just not fixed yet | 19:37 |
tobiash | clarkb: that dows | 19:37 |
tobiash | clarkb: that doesn't make use of the pool, so shouldn't matter | 19:38 |
AJaeger_ | corvus, clarkb , frickler , #openstack-designate redirects to #openstack-dns, I'll WIP https://review.openstack.org/#/c/621639 since I think it's wrong | 19:38 |
clarkb | AJaeger_: oh that explains it | 19:38 |
clarkb | corvus: I'm ready for scheduler restart whenever you are | 19:39 |
fungi | corvus: your 621634 alternative is actually passing all its jobs | 19:39 |
clarkb | looks like tripleo gate just reset too | 19:39 |
clarkb | so not a bad time for it | 19:39 |
fungi | so that one might win for being the only one proposed so far which actually works ;) | 19:40 |
tobiash | clarkb: for the record, this is the change that introduced the 'without pool nodes': https://review.openstack.org/589854 | 19:40 |
corvus | AJaeger_: can you elaborate on why you think the change is wrong? | 19:40 |
tobiash | so that merged 3 months ago | 19:40 |
clarkb | tobiash: ya and afbf9108d893ede0d147da2afe16c9e6d4bc76d4 attempted to rely on it but was incomplete | 19:41 |
tobiash | ah that makes sense | 19:42 |
clarkb | fungi: that is weird since 621633 uses the existing handler names in the main handler file. Basically that didn't change. So odd we'd run into import_tasks errors in that file if it can't even find the handlers | 19:42 |
AJaeger_ | corvus: see my comment - joining #openstack-designate, the topic is "This channel is unused, use #openstack-dns" | 19:43 |
*** AJaeger_ is now known as AJaeger | 19:43 | |
AJaeger | corvus: so, why are you adding it? What triggered that change? | 19:43 |
corvus | AJaeger_: yes... i'm not suggesting anyone use it. i'm just trying to establish basic access. | 19:43 |
fungi | clarkb: i dunno what to tell you, but aside from infra-puppet-apply-3-centos-7 which only just got a node assignment, every other job has reported success on 621634 | 19:43 |
*** gfidente is now known as gfidente|afk | 19:43 | |
corvus | AJaeger: frickler needs access to that channel to be able to set +i to make the forward effective. accessbot will grant him that access. | 19:44 |
corvus | AJaeger: see http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-11-28.log.html#t2018-11-28T18:57:50 and http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-11-28.log.html#t2018-11-28T19:04:52 | 19:44 |
AJaeger | corvus: Ah! That explains it - thanks, then all is fine! | 19:44 |
corvus | AJaeger: but, moreover, i can't see as how any change that adds an "#openstack-*" channel to accessbot would be wrong. | 19:44 |
corvus | all openstack channels should be managed by accessbot | 19:44 |
clarkb | fungi: ya mostly just pointing out its weird that a change whcih doesn't change teh addressing of the handlers would go from failing to run said handlers due to import tasks to failing to find the handlers at all. | 19:45 |
clarkb | seems like this should be alarm bell worthy for ansible 2.8 release process if its goign to create havoc in handlers for people | 19:45 |
AJaeger | corvus: even if unused? | 19:45 |
corvus | AJaeger: yeah, i don't see why not | 19:46 |
corvus | AJaeger: otherwise, we won't maintain op access for new global irc ops, etc. | 19:46 |
AJaeger | Ok, I see... | 19:46 |
AJaeger | thanks for explanation, corvus | 19:46 |
corvus | AJaeger: np :) | 19:47 |
fungi | AJaeger: consider a future state where we want to start using the channel again and we left it in some old state owned exclusively by accounts we replaced in intervening years | 19:47 |
fungi | keeping the abandoned channels in our accessbot config preserves our access to them | 19:48 |
corvus | i'll restart the zuul scheduler now | 19:48 |
*** shardy has quit IRC | 19:48 | |
AJaeger | fungi: understood now - thanks | 19:48 |
clarkb | mordred: Shrews pabelanger is that something that ansible might find useful as prerelease feedback? do we need to do anything other than just watch the existing bugs for the issue? | 19:49 |
pabelanger | clarkb: feedback of the iptables issue? | 19:51 |
*** e0ne has quit IRC | 19:51 | |
clarkb | pabelanger: yes. Basically we can't use import_tasks anymore in the handlers. But then if we switch to using normal tasks in the handler (https://review.openstack.org/#/c/621633/1/playbooks/roles/iptables/handlers/main.yaml) then ansible 2.8 says it can't find the handler for Reload iptables Debian | 19:51 |
clarkb | the fix that does work is 621633 which adds explicit listens to the handlers | 19:52 |
*** e0ne has joined #openstack-infra | 19:52 | |
pabelanger | clarkb: Yah, we could ask in #ansible if it will be useful info | 19:53 |
fungi | clarkb: the fix which works (or seems to) is 621634 not 621633 | 19:55 |
fungi | though it uses listen as you describe | 19:55 |
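The listen approach that 621634 takes looks roughly like this (a hedged sketch: the handler and task names come from the job logs, but the file contents, notify topic, and conditional placement are illustrative):

```yaml
# tasks: notify a topic name instead of a specific handler
- name: Install IPv4 rules files
  copy:
    src: rules.v4                  # illustrative source
    dest: /etc/iptables/rules.v4   # illustrative destination
  notify: Reload iptables

# handlers/main.yaml: handlers subscribe to the topic with listen,
# so nothing has to be resolved through an imported task's own name
- name: Reload iptables Debian
  service:
    name: netfilter-persistent
    state: restarted
  listen: Reload iptables
  when: ansible_os_family == 'Debian'
```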
corvus | zuul is restarted | 19:56 |
clarkb | oh sorry I copy pasta'd wrong | 19:57 |
*** wolverineav has quit IRC | 19:57 | |
clarkb | fungi: yup 621634 is the one I meant | 19:57 |
*** wolverineav has joined #openstack-infra | 19:58 | |
clarkb | corvus: is not being able to get a status related to zuul loading its config on first start? | 20:00 |
clarkb | oh there it goes | 20:00 |
*** tpsilva has quit IRC | 20:01 | |
corvus | clarkb: related to re-enqueuing (they're both gearman jobs) | 20:01 |
corvus | i've examined the extra debug logs from nodepool and verified that it's processing priority 0 requests before higher numbers | 20:02 |
*** e0ne has quit IRC | 20:02 | |
corvus | also, the priority column is visible in the 'nodepool request-list' output. | 20:03 |
*** wolverineav has quit IRC | 20:03 | |
*** e0ne has joined #openstack-infra | 20:03 | |
pabelanger | Yay | 20:04 |
clarkb | fungi: do you think we should enqueue 621634 to the gate since it's been shown to work but didn't finish check testing? | 20:05 |
fungi | clarkb: yes, i think so as long as everyone prefers that to making the job nonvoting | 20:05 |
fungi | it seemed to be the least preferred of the various attempts at fixing, so i wasn't sure | 20:05 |
corvus | so i think things are functioning correctly; probably the next step is to see if things behave how we expect with the changes. that will probably be easier to evaluate after we get past the restart. | 20:06 |
clarkb | fungi: I think this type of error shows there is value in having the test and I worry that if you set it non voting we'll just ignore new failures | 20:06 |
fungi | me too | 20:06 |
fungi | corvus: i concur | 20:06 |
clarkb | corvus: ya last time it seemed that the restart made it hard to see what was normal behavior | 20:06 |
clarkb | I've +2'd 621634 and think we can move forward with that while ansible figures out if it's broken things sufficiently for fixing | 20:07 |
corvus | it's priority 1, btw. | 20:07 |
corvus | 621634 is | 20:07 |
fungi | i guess 33 was pri0 | 20:08 |
corvus | so, aside from the fact that the whole system is busy satisfying nodes for the changes which arrived first, it's pretty high on the list for check nodes | 20:08 |
clarkb | fungi: yup | 20:08 |
corvus | ooh | 20:09 |
corvus | i want to dequeue 33 and see if 34 gets bumped | 20:09 |
fungi | an excellent test! | 20:09 |
fungi | i say go for it | 20:09 |
corvus | done! | 20:09 |
*** e0ne has quit IRC | 20:10 | |
fungi | it fell out of the check pipeline at least | 20:10 |
corvus | 2018-12-03 20:09:39,668 DEBUG zuul.nodepool: Revised relative priority of node request <NodeRequest 300-0000624391 <NodeSet [<Node None ('bridge.openstack.org',):ubuntu-bionic>, <Node None ('trusty',):ubuntu-trusty>, <Node None ('xenial',):ubuntu-xenial>, <Node None ('bionic',):ubuntu-bionic>, <Node None ('centos7',):centos-7>]>> from 1 to 0 | 20:10 |
clarkb | jobs just started | 20:10 |
clarkb | seems to work as expected | 20:10 |
corvus | yep -- that log line plus i checked nodepool request-list and saw it go from 1 to 0 | 20:11 |
fungi | and it's getting nodes already | 20:12 |
fungi | yeah | 20:12 |
fungi | slick! | 20:12 |
pabelanger | ++ | 20:12 |
corvus | \o/ | 20:12 |
fungi | i love it when a plan comes together | 20:12 |
*** jamesmcarthur has joined #openstack-infra | 20:12 | |
* corvus lights cigar | 20:12 | |
corvus | i've rechecked 633 (for posterity) | 20:15 |
corvus | granted, posterity is, what, a few weeks around here, but hey. | 20:15 |
*** david-lyle has joined #openstack-infra | 20:16 | |
corvus | so i think i'll eat some food now, and then come back and make sure that we're actually reporting on changes and don't have any crazy new exceptions, then i'll send that email we drafted friday | 20:16 |
fungi | thanks! i'll get back to drafting e-mails about mailing list shutdowns | 20:17 |
*** manjeets_ has joined #openstack-infra | 20:17 | |
*** e0ne has joined #openstack-infra | 20:17 | |
*** eernst has quit IRC | 20:18 | |
*** manjeets has quit IRC | 20:18 | |
*** dklyle has quit IRC | 20:18 | |
*** munimeha1 has joined #openstack-infra | 20:19 | |
*** jamesmcarthur has quit IRC | 20:20 | |
*** e0ne has quit IRC | 20:21 | |
*** jamesmcarthur has joined #openstack-infra | 20:22 | |
* mordred is back - looks like the new stuff is working good! | 20:26 | |
clarkb | ssbarnea|rover: fyi http://logs.openstack.org/38/618638/1/gate/tripleo-ci-centos-7-containers-multinode/45126b1/ara-report/file/eb257cab-ab3a-45e8-8d69-f33d118f5916/#line-10 is failing because it needs root | 20:27 |
ssbarnea|rover | clarkb: ouch... kinda is almost 9pm here... ] | 20:28 |
clarkb | I'm guessing https://review.openstack.org/#/c/616872/ is the cause. No worries thought I'd point it out to someone in tripleo | 20:30 |
clarkb | EmilienM: mwhahaha ^ you may care too and be in more awake timezones right now ;) | 20:30 |
*** udesale has quit IRC | 20:30 | |
mwhahaha | waa? | 20:30 |
mwhahaha | oh thanks | 20:30 |
clarkb | actually that change may be unrelated. Now thinking that maybe if package has no updates and is already installed that works because you can yum info/list without root | 20:31 |
clarkb | but if that package has updated in rdo or centos or somewhere then we'll try to upgrade it and then it breaks | 20:31 |
clarkb | in any case become: true there likely necessary | 20:32 |
mwhahaha | i shall fix | 20:32 |
*** wolverineav has joined #openstack-infra | 20:32 | |
ssbarnea|rover | clarkb: thanks for reporting, i am creating a bug for it now. become: true is a must there, it's... obvious. | 20:32 |
mwhahaha | ssbarnea|rover: you want to fix it since you're creating a bug | 20:33 |
ssbarnea|rover | mwhahaha: ok, i will do both. pinging you to review. | 20:33 |
ssbarnea|rover | in fact creating the bug is harder than creating the CR :D | 20:34 |
mwhahaha | pretty much | 20:34 |
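The shape of the fix being discussed, as a rough sketch rather than the actual tripleo change (package and task names are illustrative): the install task simply needs privilege escalation, since yum can report an already up-to-date package without root but cannot apply an update once the repo publishes a newer version.

```yaml
# Illustrative only: package installs need root once an update is available.
- name: Ensure util-linux is up to date
  package:
    name: util-linux
    state: latest
  become: true
```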
ianw | at least it seems the ansible-devel job is working to find issues well before we update and everything explodes :) | 20:36 |
clarkb | ianw: ya and the fix for the first issue seems to have found a second issue | 20:37 |
*** graphene has quit IRC | 20:38 | |
ianw | clarkb: so that's 621633 ... where using the block: the handlers also don't seem to be found/triggered? | 20:39 |
clarkb | ianw: ya the handler isn't found | 20:40 |
clarkb | could be the same issue manifesting differently or two different issues, unsure | 20:40 |
corvus | how did those tripleo changes end up in gate with that error? | 20:41 |
ianw | ok, my github bug wasn't the best i know, i didn't have a test-case and only noticed it was the imports late in the day. can work on getting something useful for the bug now that we have some smoking guns | 20:41 |
ssbarnea|rover | clarkb: https://review.openstack.org/#/c/621696/ -- going out now. | 20:41 |
*** e0ne has joined #openstack-infra | 20:42 | |
*** e0ne has quit IRC | 20:42 | |
corvus | we've merged changes since the restart | 20:44 |
clarkb | corvus: I think it may have to do with local image install state and remote package availability | 20:44 |
clarkb | corvus: ansible can check if you have the latest installed without root. And if you do have the latest already installed it's fine | 20:44 |
clarkb | corvus: but if the upstream package repo updates then now you need root to reconcile the delta | 20:44 |
*** hjensas has joined #openstack-infra | 20:45 | |
clarkb | I expect all the changes to fail with those errors until 621696 merges or the upstream package repo reverts the update | 20:45 |
corvus | i think i see a new exception in the scheduler logs; i'm digging | 20:48 |
clarkb | and ya there is a relatively recent package for util-linux in the centos 7 package repo. Timestamp is just over 2 weeks old. Unsure if that timestamp maps to build time or publish or what | 20:48 |
clarkb | seems like octavia is also having rpm/yum/centos related issues | 20:50 |
mwhahaha | ugh so that ceph-loop-device thing is going to completely hose up the gate, any way to get that promoted to the top of the tripleo gate? | 20:50 |
clarkb | http://logs.openstack.org/38/617838/5/gate/octavia-v2-dsvm-scenario-centos-7/9e669f8/job-output.txt.gz#_2018-12-03_20_09_45_935256 | 20:50 |
clarkb | mwhahaha: ya if it gets approved I can enqueue and promote it | 20:51 |
mwhahaha | clarkb: approved | 20:51 |
mwhahaha | all approved | 20:51 |
mwhahaha | er also | 20:51 |
* mwhahaha gives up | 20:51 | |
clarkb | I wonder if that would be a useful ansible lint rule | 20:52 |
mwhahaha | yes | 20:52 |
clarkb | use become for package installs | 20:52 |
clarkb | promotion is running now | 20:53 |
clarkb | and done | 20:54 |
mwhahaha | thanks | 20:54 |
clarkb | johnsom: rm_work hey not sure why yet, but it seems centos7 updates have broken octavia gates, you'll probably want to look into it | 20:54 |
clarkb | I'm guessing today is the release day for the next point release | 20:55 |
johnsom | clarkb I saw a failure this morning with a missing package at RAX, just assumed it was a mirror sync issue | 20:55 |
clarkb | johnsom: I think its likely due to 7.6 or whatever the number is happening | 20:56 |
clarkb | johnsom: and packages being broken there? I'm not sure. Yum called it a non fatal rpm install thing | 20:56 |
johnsom | The one I saw was radvd couldn't be downloaded from the mirror | 20:57 |
clarkb | http://logs.openstack.org/38/617838/5/gate/octavia-v2-dsvm-scenario-centos-7/9e669f8/job-output.txt.gz#_2018-12-03_20_09_10_181752 at least I'm not seeing anything else that could be the problem | 20:57 |
*** apetrich has quit IRC | 20:58 | |
fungi | #status log removed static.openstack.org from the emergency disable list now that ara configuration for logs.o.o site has merged | 20:58 |
openstackstatus | fungi: finished logging | 20:58 |
*** udesale has joined #openstack-infra | 20:58 | |
clarkb | https://lwn.net/Articles/773680/ yup its likely 7.6 | 20:58 |
clarkb | #status Log CentOS 7.6 appears to have been released. Our mirrors seem to have synced this release. This is creating a variety of fallout in projects such as tripleo and octavia. Considering that 7.5 is now no longer supported we should address this by rolling forward and fixing problems. | 20:59 |
openstackstatus | clarkb: finished logging | 20:59 |
clarkb | johnsom: reading the devstack function for detecting failures any one of those lines that says failure: something will cause the failure to bubble up in devstack | 21:04 |
clarkb | though maybe the no package golang error is the actual issue? | 21:05 |
clarkb | sure enough there is no golang package | 21:06 |
clarkb | ianw: ^ this is something you probably have the history around to know how to debug | 21:06 |
clarkb | well thats curious 7.5 had golang | 21:07 |
clarkb | 7.6 does not | 21:07 |
johnsom | That seems like an issue bigger than a dot release... | 21:08 |
clarkb | well its the dot release not being backward compatible by removing packages | 21:08 |
clarkb | so yes, but also not much we can do about it? may need to enable epel and use their golang? | 21:08 |
*** gema has joined #openstack-infra | 21:09 | |
ianw | hrm, that doesn't look like intended behaviour | 21:09 |
ianw | it does say non-fatal error ... we do have some extra stuff in there because yum doesn't exit with !0 on missing packages | 21:10 |
clarkb | ianw: ya devstack has a check of itself to look for Failure: and no package lines | 21:10 |
clarkb | in this case Failure: is not going to match failure: I don't think since awk should be case sensitive. I now believe the lack of a golang package is the issue | 21:10 |
clarkb | which is devstack checking correctly that all packages installed (and they did not) | 21:11 |
ianw | No package golang available. | 21:11 |
ianw | yeah, i think we've come to the same conclusion this is a correct detection of the golang package not being found :) | 21:12 |
ianw | why this just started happening ... | 21:12 |
ianw | is another question | 21:12 |
mordred | ianw: because. raisins | 21:12 |
clarkb | ianw: because 7.6 just released | 21:13 |
clarkb | likely our mirrors just recently finished releasing | 21:13 |
clarkb | mwhahaha: you'll likely want to keep an eye out for any other fallout now that the become: true fix is in place | 21:13 |
clarkb | mwhahaha: since there is a non zero chance there are other breaking issues | 21:13 |
openstackgerrit | Ed Leafe proposed openstack-infra/project-config master: Add the os-resource-classes project https://review.openstack.org/621666 | 21:14 |
ianw | yeah, i mean why golang would disappear between releases | 21:14 |
tosky | or maybe golang is just somewhere else | 21:14 |
mwhahaha | clarkb: yea | 21:14 |
clarkb | tosky: regardless it's still a backward-incompatible change for a stable distro | 21:15 |
fungi | perhaps they renamed the package? | 21:15 |
clarkb | I don't think putting the package in a different location changes how I feel about that | 21:15 |
tosky | clarkb: it depends on the place of the repository | 21:16 |
tosky | on which repository | 21:16 |
fungi | or yeah maybe they moved it to a different rhn channel | 21:16 |
tosky | I don't know how it works internally with golang, but I see this: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.5_release_notes/chap-red_hat_enterprise_linux-7.5_release_notes-deprecated_functionality | 21:16 |
fungi | or whatever they renamed those in the days since rhn | 21:16 |
clarkb | tosky: http://mirror.centos.org/centos/7/os/x86_64/Packages/ is where it was and is now missing | 21:16 |
ianw | "The golang package, available in the Optional channel, will be removed from a future minor release of Red Hat Enterprise Linux 7. Developers are encouraged to use the Go Toolset instead, which is currently available as a Technology Preview through the Red Hat Developer program. " | 21:17 |
clarkb | and ya that explains it | 21:17 |
ianw | that sounds likely | 21:17 |
fungi | whee! | 21:17 |
ianw | i think centos has go toolset? | 21:17 |
*** diablo_rojo has quit IRC | 21:18 | |
*** apetrich has joined #openstack-infra | 21:18 | |
clarkb | http://mirror.centos.org/centos/7/sclo/x86_64/rh/go-toolset-7/ is that it? | 21:18 |
clarkb | those versions are older than what were in 7.5 so may not fix everything if the go version matters | 21:18 |
clarkb | wait one is older one is newer | 21:18 |
*** diablo_rojo has joined #openstack-infra | 21:19 | |
*** priteau has joined #openstack-infra | 21:19 | |
tosky | unless you also need to enable the repository with containers-related | 21:19 |
tosky | stuff | 21:19 |
tosky | https://wiki.centos.org/Container/Tools -> it seems to contain golang | 21:20 |
*** manjeets_ is now known as manjeets | 21:20 | |
ianw | clarkb: the sclo i think is enabled via software collections, then you put it in your path | 21:21 |
clarkb | https://git.openstack.org/cgit/openstack/octavia/tree/devstack/files/rpms/octavia is where it comes from so seems octavia specific | 21:21 |
clarkb | devstack runs aren't all trying to install it | 21:21 |
*** kgiusti has left #openstack-infra | 21:22 | |
clarkb | probably up to octavia to decide what is the most appropriate method for installing golang in this case | 21:22 |
tosky | it looks like that at least one of the featuresets in tripleo-quickstart enables the virt7-container-common-candidate repository, which provides golang too | 21:22 |
*** udesale has quit IRC | 21:23 | |
EmilienM | we use virt7-container-common-candidate to pull podman mainly and its deps | 21:24 |
corvus | okay, after much digging, i see that the "new" exception from the scheduler is not new at all; apparently for some time the scheduler has gotten sufficiently busy that there's a significant lag between when a job starts and the scheduler registers it. if a job is canceled during that window, we can't notify the executor, and so we return the nodes out from under it. when the job eventually | 21:25 |
corvus | fails, we try to return the nodes again, but note that we don't have the lock. in the end, everything works as it should (or, at least, as best it can). i don't see an immediate fix to correct the underlying race which causes the errors. | 21:25 |
corvus | so i think i'm happy with the current system state and plan to give the release folks the all-clear and send out that email | 21:26 |
corvus | clarkb, fungi, pabelanger, mordred: ^ sound good | 21:26 |
clarkb | corvus: ++ | 21:26 |
clarkb | johnsom: hopefully that gives you enough breadcrumbs to go about fixing it. I'm not sure how octavia is using golang so unsure how to best suggest to fix it. However, I think if it were me maybe install from upstream go? | 21:27 |
pabelanger | corvus: ++ | 21:28 |
fungi | corvus: sounds good! | 21:29 |
cmurphy | clarkb: https://review.openstack.org/602380 was approved but had a gate failure, I'm now holding it until someone can babysit it, when is a good time for me to release it? | 21:29 |
mordred | corvus: ++ | 21:29 |
cmurphy | or mordred ^ | 21:30 |
clarkb | cmurphy: fungi might be willing to help watch it? he has been digging into all the mailing list stuff recently | 21:30 |
clarkb | I can help too, I just don't have the same level of mailman skills | 21:30 |
cmurphy | the main thing is just watching the puppet log to see if anything changed, if anything changed we revert | 21:31 |
fungi | clarkb: cmurphy: sure, happy to take a look, go ahead and un-wip | 21:31 |
openstackgerrit | Merged openstack-infra/system-config master: Don't import tasks in iptables reload and use listen https://review.openstack.org/621634 | 21:31 |
cmurphy | thanks fungi | 21:31 |
clarkb | fungi: ^ and with that in hopefully we can unblock the list disabling | 21:34 |
cmurphy | hmm should i recheck or will it make its way into the gate queue on its own? | 21:36 |
fungi | clarkb: yep, i already rechecked my ml alias changes | 21:36 |
fungi | cmurphy: i've approved it just now | 21:37 |
clarkb | cmurphy: if all you did is remove the -W then you probably need to recheck (or have someone approve it as fungi did) | 21:37 |
cmurphy | got it thanks fungi | 21:37 |
fungi | my pleasure! | 21:37 |
*** jcoufal has quit IRC | 21:37 | |
* fungi goes back to writing a bunch of very redundant-looking e-mail messages | 21:37 | |
clarkb | corvus: mordred: not sure if you saw https://github.com/kubernetes/kubernetes/issues/71411 during the relevant priority stuff. But any chance we can check if our cluster needs a rebuild and if that is possible? (does magnum give you the version of k8s it deploys or do you select one?) | 21:40 |
*** priteau has quit IRC | 21:41 | |
corvus | clarkb: i don't recall seeing a choice or information | 21:41 |
mordred | I'm not super sure that would affect us anyway | 21:42 |
mordred | it seems like a violation of network isolation | 21:42 |
clarkb | mordred: it says in default configs the discovery api exposes it for all requests | 21:42 |
mordred | right - but "Remove pod exec/attach/portforward permissions from users that should not have full access to the kubelet API" | 21:43 |
clarkb | mordred: I read that to mean anyone on the internet (because our k8s api is internet facing right?) could exploit this to run pods | 21:43 |
mordred | is one of the mitigations - and I don't believe we have any such users | 21:43 |
mordred | clarkb: hrm. maybe? | 21:44 |
clarkb | I think they listed the two ways you could exploit it and your thing is the second but not only way | 21:44 |
clarkb | the first way through the discovery api is what I am worried about | 21:44 |
clarkb | ya the articles on it say that the one you point out can give you admin on the cluster; the one I point out will let you run pods | 21:45 |
mordred | clarkb: "aggregated API server endpoint" seems to be key | 21:46 |
mordred | I mean - regardless, we should likely upgrade - or use it as an exercise to figure out how to upgrade even if we don't need to | 21:46 |
clarkb | ya I'm not sure we have the insight necessary to know how magnum is deploying things so erring on the side of caution here is probably a good idea | 21:47 |
mordred | agree | 21:47 |
clarkb | reading magnum user docs I don't see a managed upgrade command | 21:49 |
clarkb | I'm thinking it may need to be a delete, create | 21:49 |
*** jmorgan1 has joined #openstack-infra | 21:53 | |
clarkb | or figure out how to do an upgrade in place on the cluster. Not sure if the commands to expand the cluster will work though (as it may end up with mismatched services?) | 21:55 |
clarkb | hogepodge: ^ you probably know | 21:55 |
*** wolverineav has quit IRC | 21:56 | |
clarkb | mordred: thinking about it more I think you can use unauth'd discovery to get a pod, then use that to get admin. Considering that and our not having really used this at all yet, delete, create may be desirable | 21:59 |
mordred | clarkb: ++ | 22:04 |
corvus | clarkb: yeah, but would be nice to know if/when that would be effective. | 22:05 |
corvus | also, i wonder if we can/should use the same keys. | 22:06 |
clarkb | ok reading more | 22:12 |
clarkb | it seems that you have to have one of the non default aggregate server endpoints running | 22:12 |
fungi | some sort of race or other nondeterminism in our snmp service test? http://logs.openstack.org/56/619056/2/check/system-config-run-base/30ad771/job-output.txt.gz#_2018-12-03_21_59_38_247682 | 22:12 |
clarkb | thats what the blurb about metrics is about | 22:13 |
clarkb | mordred: ^ so I think we were both half right | 22:13 |
clarkb | mordred: basically our api server is likely "vulnerable" but if there isn't the backend service endpoint behind it it can't be exploited | 22:13 |
*** jaosorior has quit IRC | 22:13 | |
*** jaosorior has joined #openstack-infra | 22:16 | |
*** rcernin has joined #openstack-infra | 22:18 | |
*** pcaruana has quit IRC | 22:18 | |
openstackgerrit | Merged openstack-infra/system-config master: Turn on future parser for lists.katacontainers.io https://review.openstack.org/602380 | 22:19 |
corvus | fungi: this looks ok. i'm not sure if it should be that short (compared to preceding/following lines): http://logs.openstack.org/56/619056/2/check/system-config-run-base/30ad771/job-output.txt.gz#_2018-12-03_21_56_52_322498 | 22:21 |
corvus | fungi: same: http://logs.openstack.org/56/619056/2/check/system-config-run-base/30ad771/job-output.txt.gz#_2018-12-03_21_56_56_261602 | 22:22 |
corvus | fungi: if it happens again, it might be good to hold the node and capture the syslog | 22:22 |
corvus | or, well, actually, we should just do that in th post playbook regardless | 22:23 |
clarkb | fwiw we do appear to have the default access to some bits of the api as an unauthenticated k8s user | 22:23 |
*** udesale has joined #openstack-infra | 22:23 | |
clarkb | but hard to know if there are aggregated api servers running behind that | 22:23 |
corvus | (of course, "capture the syslog" across all the systems we use is an impossibly complex task compared to 2 years ago) | 22:24 |
clarkb | corvus: for our control plane at least everything should still use rsyslog (journald will forward there) | 22:24 |
corvus | oh good | 22:24 |
clarkb | I'm not sure what the context of that is, but ya the way ubuntu and centos have set things up journald is actually a ring buffer that forwards to rsyslog. And pre systemd is just syslog | 22:25 |
*** udesale has quit IRC | 22:25 | |
clarkb | so they should all have a consistent interface to permanent logs (which is wherever rsyslog has written them, which differs across distros) | 22:25 |
*** wolverineav has joined #openstack-infra | 22:25 | |
*** udesale has joined #openstack-infra | 22:26 | |
fungi | clarkb: context was getting snmpd's syslogged errors from a test node in our ansible base-test integration jobs | 22:27 |
*** priteau has joined #openstack-infra | 22:27 | |
*** ramishra has quit IRC | 22:28 | |
clarkb | ah I would expect that to be in /var/log/messages or /var/log/syslog depending on the platform then | 22:28 |
*** priteau has quit IRC | 22:29 | |
fungi | yeah, hopefully as this is an attempt at replicating bits of our control plane for an integration test, behavior should be similar | 22:29 |
*** boden has quit IRC | 22:32 | |
dmsimard | btw heads up, CentOS 7.6 is rolling out | 22:34 |
dmsimard | ah, just caught up with backlog :p | 22:35 |
clarkb | dmsimard: oh we've already discovered it :) broke tripleo and octavia | 22:35 |
* dmsimard sighs | 22:35 | |
mwhahaha | well if that's the only issue with tripleo it'll be one of the smoothest transitions | 22:36 |
* mwhahaha knocks on wood | 22:36 | |
clarkb | mwhahaha: http://logs.openstack.org/90/614290/2/gate/tripleo-ci-centos-7-standalone/5c77eaf/job-output.txt.gz#_2018-12-03_21_18_15_207420 paunch just ran into that | 22:36 |
clarkb | I think the error occurred because we don't support nested virt in inap | 22:37 |
mwhahaha | it's ignored | 22:37 |
mwhahaha | it failed cause of another reason | 22:37 |
mwhahaha | http://logs.openstack.org/90/614290/2/gate/tripleo-ci-centos-7-standalone/5c77eaf/job-output.txt.gz#_2018-12-03_21_55_50_136608 | 22:37 |
mwhahaha | tempest has been hanging for some weird reason | 22:37 |
clarkb | mwhahaha: maybe only load kvm_intel if vmx is present? (will clean up the logs) | 22:38 |
mwhahaha | yea we can clean up that role. it's our role to check if we should be using qemu or not for nova | 22:39 |
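A rough sketch of what "only load kvm_intel if vmx is present" could look like in such a role; the task and variable names are invented here, not taken from the tripleo role.

```yaml
# Sketch: skip the modprobe on clouds without nested virt so it doesn't
# log a failure (e.g. where vmx is not exposed to the guest).
- name: Check whether the CPU advertises vmx
  command: grep -c vmx /proc/cpuinfo
  register: vmx_count
  failed_when: false
  changed_when: false

- name: Load kvm_intel when nested virt is available
  modprobe:
    name: kvm_intel
    state: present
  become: true
  when: vmx_count.stdout | int > 0
```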
clarkb | also it fails later trying to connect to tempest-sendmail.tripleo.org:8080 ? | 22:39 |
*** mriedem is now known as mriedem_away | 22:40 | |
mwhahaha | yea i don't know the deal with that code, will need to raise a bug (and maybe disable it) | 22:40 |
clarkb | zuul can be configured to report via email if you'd like to set that up. | 22:40 |
mgagne_ | clarkb: vmx flag exists on our processor. is the issue that it isn't exposed to the VM? | 22:40 |
mwhahaha | no this is the tempest failures being sent out | 22:40 |
clarkb | mgagne_: ya you have to expose it to the middle VM for the nested virt to work | 22:41 |
clarkb | mwhahaha: the reports can point to job logs which include the tempest failures? | 22:41 |
dmsimard | mwhahaha: fwiw the base centos image is 7.5, nodepool hasn't built the 7.6 yet apparently | 22:41 |
mgagne_ | clarkb: right but what's the current status? I don't remember what we did | 22:41 |
clarkb | mwhahaha: another thing we should look at cleaning up is https://review.openstack.org/#/c/567224/, periodic jobs can be used for that | 22:41 |
mwhahaha | clarkb: those are basically periodic but < 8 hours (which was previously the periodic limit) | 22:42 |
clarkb | mgagne_: I think it is enabled on some systems but not others? I've not followed it super closely. johnsom tends to have a good overview of it | 22:42 |
mwhahaha | i think they are every 4, but yes it might make sense to look into a different way of running those | 22:42 |
clarkb | mwhahaha: ok I'm not sure how circumventing the limit is any better? | 22:42 |
mgagne_ | they all have the same CPU and configs. | 22:42 |
clarkb | basically that's a bug and it's wrong so please can we fix it with the correct tool (periodic jobs) | 22:42 |
mwhahaha | periodic is just one job right? | 22:42 |
mgagne_ | hopefully they have the same BIOS settings, that I'm not sure | 22:42 |
mwhahaha | not *all* jobs for a repo? | 22:42 |
clarkb | mwhahaha: periodic is a pipeline; you configure which jobs to trigger on the period | 22:43 |
mwhahaha | i'll raise the issue with the appropriate folks, i don't really like those anyway | 22:43 |
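For reference, roughly what the periodic-pipeline approach looks like in Zuul v3 configuration. The pipeline name, cron schedule, recipient address, and job name below are placeholders, not existing openstack-infra config: a timer trigger runs the attached jobs on a schedule, and an smtp reporter can mail the results to the people who care, instead of rechecking changes every few hours.

```yaml
# Placeholder names and schedule; a timer-driven pipeline plus a project
# stanza replaces the "recheck every 4 hours" pattern.
- pipeline:
    name: periodic-4h
    manager: independent
    trigger:
      timer:
        - time: '0 */4 * * *'
    success:
      smtp:
        to: tripleo-ci-alerts@example.org

- project:
    periodic-4h:
      jobs:
        - tripleo-ci-centos-7-standalone
```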
clarkb | mgagne_: it's a hypervisor kvm option, not a bios flag, to pass it through | 22:43 |
clarkb | mgagne_: let me hop on an instance and double check | 22:43 |
johnsom | mgagne_ Hi, what is your nested virtualization question? | 22:43 |
mgagne_ | clarkb: could be that VT is disabled in the bios | 22:43 |
mgagne_ | johnsom: someone suspects that vmx flag isn't exposed in inap-mtl01. I'm saying our CPU have vmx flag. so I'm wondering what's the actual issue. | 22:44 |
clarkb | mgagne_: johnsom I've just hopped on an instance and don't see vmx in the VM | 22:45 |
johnsom | mgagne_ Ah, ok. Yeah, so if your hypervisor level sees VMX in the cpuinfo, your hardware virtualization is enabled. | 22:45 |
fungi | nova has to be configured to pass that through to the instances, correct? | 22:45 |
clarkb | systemd-detect-virt says kvm so the hypervisor is running with virt enabled (it would say qemu otherwise) | 22:45 |
mgagne_ | ok, let me see which CPU model is exposed then | 22:45 |
clarkb | fungi: I think its kvm actually | 22:45 |
fungi | ahh | 22:45 |
johnsom | mgagne_ However, you then need to enable your hypervisor to expose VMX inside the guests as well. | 22:45 |
dmsimard | mgagne_: http://paste.openstack.org/show/736600/ | 22:46 |
*** udesale has quit IRC | 22:46 | |
clarkb | mgagne_: it's not urgent, just pointing out that tripleo seemed to assume nested virt in the testing which added noise to the logs | 22:46 |
mgagne_ | so I think it has to do with the CPU model used by libvirt which does not include vmx. | 22:46 |
clarkb | mwhahaha: the other tool to keep in mind there is openstack health | 22:47 |
johnsom | mgagne_ What hypervisor are you using? | 22:47 |
clarkb | mwhahaha: it uses subunit to track things at a test level and you can rss/atom subscribe to feeds for things like that | 22:47 |
mgagne_ | johnsom: libvirt+kvm | 22:47 |
clarkb | mwhahaha: but it gives you nice graphing over time and so on | 22:47 |
mwhahaha | yea we use that too | 22:47 |
*** irclogbot_1 has quit IRC | 22:47 | |
clarkb | mgagne_: ah interesting | 22:47 |
johnsom | mgagne_ These are the steps for a KVM hypervisor: https://docs.openstack.org/devstack/latest/guides/devstack-with-nested-kvm.html | 22:47 |
mwhahaha | this is specifically to notify the correct people who care about specific test failures | 22:47 |
mgagne_ | I'll see what I can do | 22:47 |
clarkb | mwhahaha: ya they should be able to subscribe to those failures in openstack health I think | 22:48 |
* mwhahaha shrugs | 22:48 | |
mwhahaha | this stuff predates alot of that | 22:48 |
mgagne_ | johnsom: I think that's not the issue atm, the issue is with the CPU model used by libvirt which doesn't include those flags. | 22:48 |
mwhahaha | i thought mail was turned off anyway | 22:48 |
clarkb | mwhahaha: ya looks like that server isn't responding which leads to the later failure in that job | 22:48 |
mwhahaha | i'm filing bugs | 22:49 |
*** lbragstad has quit IRC | 22:51 | |
*** lbragstad has joined #openstack-infra | 22:52 | |
clarkb | mordred: there was email to the -discuss list recently about how to upgrade existing magnum clusters. Looks like you need access to the host VMs and run atomic container update commands | 22:53 |
clarkb | mordred: so ya not exposed by the api as far as I can tell | 22:53 |
*** jaosorior has quit IRC | 22:53 | |
*** rh-jelabarre has quit IRC | 22:54 | |
*** jamesmcarthur has quit IRC | 22:55 | |
clarkb | I'm guessing we can't ssh into our magnum instances? | 22:55 |
*** rh-jelabarre has joined #openstack-infra | 22:57 | |
clarkb | what do you know I can ssh into them | 22:58 |
clarkb | There were 75084 failed login attempts since the last successful login. | 22:58 |
clarkb | seems like ssh is keeping the badness out? | 22:58 |
fungi | argh, can anyone interpret http://logs.openstack.org/58/621258/1/check/system-config-run-base-ansible-devel/3bb59c6/job-output.txt.gz#_2018-12-03_22_31_44_305164 ? | 22:59 |
fungi | looks like it hit that on trusty, xenial and centos7 | 22:59 |
clarkb | fungi: ansible inventory nodes use connections, ssh, windowswhateverpowershell?, etc | 23:00 |
fungi | same error for all 3 so i don't think it's a coincidence | 23:00 |
clarkb | seems that ssh is no longer valid? | 23:00 |
clarkb | we might not want to keep up with the ansible devel at this rate :P | 23:00 |
fungi | or i may just put lists.o.o in the emergency disable list temporarily and hand-apply 621258 so i can get on with things | 23:01 |
clarkb | fungi: http://logs.openstack.org/58/621258/1/check/system-config-run-base-ansible-devel/3bb59c6/ansible/hosts/inventory.yaml is where we tell it to use the ansible_connection ssh | 23:02 |
*** rh-jelabarre has quit IRC | 23:02 | |
corvus | let's merge the non-voting change | 23:03 |
corvus | https://review.openstack.org/621577 | 23:04 |
corvus | someone will need to remove frickler's WIP | 23:05 |
clarkb | corvus: fungi I removed the WIP and approved the change | 23:06 |
fungi | thanks! | 23:06 |
clarkb | corvus: mordred and other infra-root. We can ssh into the k8s nodes via the root user | 23:06 |
clarkb | seems that the hosts use our aggregate ssh key | 23:06 |
clarkb | corvus: mordred: infra-root any reason not to attempt to upgrade the cluster under magnum as described on the -discuss list? | 23:06 |
corvus | clarkb: ah yes, i knew that (i selected the keypair when creating it). i didn't make that connection though. | 23:07 |
clarkb | there is a non zero chance that this will break the cluster but we aren't using it yet right? and maybe we'll learn things | 23:07 |
corvus | clarkb: i say go for it yolo | 23:08 |
fungi | i must admit i'm not entirely clear on what or where said magnum cluster is | 23:08 |
*** irclogbot_1 has joined #openstack-infra | 23:08 | |
clarkb | fungi: corvus created a magnum k8s cluster in vexxhost sjc1 to point nodepool at | 23:08 |
corvus | fungi: i made a magnum in vexxhost for nodepool | 23:08 |
fungi | was it used to test nodepool kubernetes driver? | 23:08 |
fungi | ahh, okay, good guess ;) | 23:08 |
clarkb | its not been used yet as there was a bug in the config file | 23:08 |
clarkb | not sure if that was fixed | 23:08 |
corvus | fungi: https://review.openstack.org/620756 | 23:08 |
clarkb | `sudo atomic pull --storage ostree docker.io/openstackmagnum/kubernetes-apiserver:v1.11.5-1` and `sudo atomic containers update --rebase docker.io/openstackmagnum/kubernetes-apiserver:v1.11.5-1 kube-apiserver` are the sorts of commands to run according to the mailing list | 23:09 |
clarkb | I'll start on the master node and update all of the services to 1.11.5-1 there. Then update the minion services after | 23:09 |
clarkb | and if it breaks we can always rebuild it. But ya I figure it's a good learning opportunity to do this as an in-place upgrade | 23:09 |
clarkb | current version is 1.11.1 | 23:09 |
*** jtomasek has quit IRC | 23:10 | |
fungi | so i guess magnum doesn't manage the version of kubernetes in the way that, say, trove manages the version of mysql? | 23:11 |
clarkb | correct | 23:11 |
clarkb | there is apparently ongoing work to support this? but the mailing list confirmed my reading of docs that we have to do it under magnum | 23:11 |
fungi | k | 23:11 |
clarkb | the other concern I have getting set up to do this is the magnum instances are built on fedora 27 which is no longer supported aiui | 23:12 |
clarkb | probably smaller concern since all services run out of containers, but ... | 23:12 |
fungi | you can in-place upgrade fedora though, right? | 23:13 |
*** yamamoto has joined #openstack-infra | 23:13 | |
clarkb | I think you "can" but its often recommended to do reinstall? | 23:13 |
fungi | or does kubernetes eat its own cloud-native dogfood and recommend that you redeploy your kubernetes control plane daily? | 23:13 |
jonher | Is there a good reason to why lists.openstack.org does not do https? | 23:15 |
fungi | jonher: no point | 23:15 |
jonher | alright, fair enough | 23:15 |
fungi | jonher: it sends out account passwords (the only thing https there would possibly protect) via unencrypted smtp on request | 23:15 |
fungi | and those passwords are only for managing subscription preferences | 23:16 |
clarkb | heh and now I've run out of disk space as we only have 5GB of disk on this node? | 23:17 |
jonher | I just found some links to lists.openstack.org that had https, hence the question, I'll submit a MR in that project | 23:17 |
clarkb | I'm going to see if it just didn't resize the rootfs on boot | 23:17 |
clarkb | once I figure out how to figure that out | 23:17 |
clarkb | (yay learning things) | 23:17 |
fungi | jonher: my poc for upgrading to mailman3 suggests we'll probably switch to https when we do that, but it's a much different system too | 23:17 |
*** gema has quit IRC | 23:18 | |
clarkb | ok lvm is set up and has ~32GB mounted under /var/lib/docker | 23:20 |
clarkb | 5GB mounted on sysroot | 23:20 |
clarkb | problem is we don't seem to use /var/lib/docker with atomic? | 23:20 |
corvus | clarkb: i wonder if we can do a rolling replace of master/minions? | 23:22 |
clarkb | /vda1 is /boot, /vda2 is sysroot mapped through lvm, /vdb is an ~80GB device of which ~32GB is exposed to docker-pool via lvm | 23:23 |
clarkb | docker-pool isn't actually mounted on anything from what I see | 23:23 |
clarkb | maybe the intent was to set docker-pool | 23:24 |
clarkb | er | 23:24 |
clarkb | set docker-pool in /etc/docker/docker-lvm-plugin? but that wasn't done | 23:25 |
clarkb | hrm though there is an lv on the docker vg so maybe that is automagic | 23:25 |
mgagne_ | looks like the only way to be able to add the vmx flag in Nova is to run Rocky. Or to use host-passthrough cpu_mode. Versions prior to Rocky allow you to provide extra CPU flags but there is a whitelist which does not include vmx, only pcid and others related to meltdown/spectre. | 23:26 |
fungi | mgagne_: that option was added to allow passing through the cpu flags for meltdown/spectre | 23:29 |
mgagne_ | yes | 23:29 |
mgagne_ | but won't help for vmx =) | 23:29 |
mgagne_ | unless I patch our version of nova to allow it | 23:30 |
fungi | i have to assume nested-virt support was accomplished some other way as i thought providers had been doing that for a while | 23:30 |
mgagne_ | and in fact, add the feature. still running mitaka. | 23:30 |
mgagne_ | fungi: maybe they are using host-passthrough? or host-model? | 23:30 |
fungi | i don't know enough about nova to know, other than having been privy to the meltdown/spectre discussions and seeing other providers exposing nested-virt acceleration support who weren't running rocky either and who i assumed weren't patching nova to do it | 23:32 |
fungi | but... maybe they were/ | 23:32 |
clarkb | I freed up disk space with atomic images prune | 23:32 |
clarkb | it deleted some ociimages data | 23:32 |
clarkb | I think the docker lv must be used by k8s workload? | 23:33 |
clarkb | but atomic isn't running things with docker? or otherwise keeping its images and runtimes off of that lv? | 23:33 |
clarkb | hrm that wasn't enough to pull the other images | 23:34 |
ianw | ok, so i'm all caught up on the devel branch issues. the original bug exactly matches the change pointed out by frickler. the additional issue of using a block: in the handler (621633) is a known problem as i mentioned in a comment there | 23:36 |
ianw | so while i probably wouldn't agree ansible should break this without deprecation, it's all explained in my head at least now :) | 23:37 |
openstackgerrit | Merged openstack-infra/system-config master: Tighten permissions on zone keys https://review.openstack.org/617939 | 23:38 |
openstackgerrit | Merged openstack-infra/system-config master: Make system-config-run-base-ansible-devel non-voting https://review.openstack.org/621577 | 23:38 |
clarkb | fedora-atomic itself uses 4.4GB of disk for its ostree | 23:40 |
clarkb | so I can't really go deleting anything else | 23:40 |
clarkb | mnaser: ^ as a heads up you may be interested in this as it feels like the vexxhost magnum deployment is not deployed on partitions large enough to do an in place k8s upgrade | 23:41 |
clarkb | mnaser: you might want to double the size of vda to 10GB from 5GB? | 23:41 |
* mnaser reads backlog | 23:42 | |
mnaser | clarkb: i think for that when you create a magnum cluster you pick the docker volume size | 23:44 |
mnaser | magnum cluster-show <foo> .. what does that show for docker_volume_size ? | 23:44 |
clarkb | mnaser: no this is the sysroot that is the issue | 23:44 |
clarkb | mnaser: I see the docker volume and it is ~80GB which is fine. The problem is that the host os itself uses atomic/ostree to run the system containers and I can't update those as sysroot is only 5GB large and fedora itself is 4.4GB | 23:44 |
clarkb | but let me show the cluster | 23:45 |
clarkb | coe cluster show Nodepool doesn't show volume sizes. Is that only available with magnumclient? | 23:47 |
mnaser | i think it might be clarkb | 23:49 |
mnaser | clarkb: i think this is a case of magnum creating a vm without volumes | 23:49 |
mnaser | but in sjc1 we do bfv only | 23:49 |
mnaser | that should probably be something we should fix | 23:50 |
clarkb | | docker_volume_size | 80 | | 23:50 |
clarkb | which is what I see on the pv/vg/lv side | 23:50 |
clarkb | so I think that is fine. My understanding of the issue is that atomic runs these system level containers outside of docker. And those containers run k8s | 23:50 |
clarkb | atomic itself is a 4.4GB "container" according to ostree which uses up almost the entire 5GB sysroot | 23:51 |
clarkb | but then I can't update the k8s container images as I run out of disk | 23:51 |
clarkb | mnaser: are we able to specify the sysroot size somehow when creating the cluster? | 23:51 |
mnaser | clarkb: unfortunately, i think the very fact that we are able to boot this cluster at all is a result of this bug: https://review.openstack.org/#/c/603910/ | 23:52 |
*** pbourke has quit IRC | 23:53 | |
mnaser | when root_gb=0, it creates a 'disk' that is equal to the size of the image | 23:53 |
mnaser | which really is a security issue to start with | 23:53 |
clarkb | that would explain it | 23:53 |
mnaser | but anyways, i think thats what is happening | 23:53 |
mnaser | i wonder if magnum has bfv support, grr | 23:53 |
mnaser | if not that's a fun exercise for me :) | 23:54 |
clarkb | on the one hand atomic is supposed to be fairly atomic and maybe the answer here is wait for vexxhost to push new images and then redeploy, but that doesn't help people that have an existing cluster they want to keep using | 23:54 |
*** pbourke has joined #openstack-infra | 23:55 | |
mnaser | clarkb: yeah, what sort of issues did you run into? i haven't had issues doing something like atomic host upgrade in the past | 23:55 |
mnaser | but it was on new clusters so maybe they didnt have a lot of space occupied by logs etc | 23:55 |
clarkb | mnaser: `atomic pull --storage ostree docker.io/openstackmagnum/kubernetes-kubelet:v1.11.5-1` fails with `FATA[0033] Error committing the finished image: /builddir/build/BUILD/skopeo-7add6fc80b0f33406217e7c3361cb711c814f028/vendor/src/github.com/ostreedev/ostree-go/pkg/otbuiltin/commit.go:407 - Writing content object: fallocate: No space left on device` | 23:57 |
mnaser | any reason why you were pulling that? | 23:57 |
clarkb | mnaser: yes major k8s security vulnerability I'd like to patch :) | 23:57 |
mnaser | oh that's nice to know. | 23:58 |
clarkb | and took this as a learning opportunity. I think for infra its no big deal to make a new cluster | 23:58 |
mnaser | that's kinda necessary | 23:58 |
mnaser | yeah but it's a good exercise | 23:58 |
clarkb | but anyone that has a running cluster is likely going to want to ugprade in place rather than redeploy | 23:58 |
clarkb | so figuring this out is also useful | 23:58 |
mnaser | look at that, working with a cloud provider pays for both infra and provider | 23:58 |
mnaser | who knew | 23:58 |
mnaser | :P | 23:58 |
mordred | mnaser: ikr? | 23:58 |
clarkb | mnaser: ya I mean we'll likely just reinstall it at this point, but figuring out the disk situation so that in the future we could just upgrade would be nice | 23:59 |
mnaser | https://github.com/openstack/magnum/blob/c8019ea77f33609452dd1a973e0f421b118c2079/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml#L745-L761 | 23:59 |
clarkb | but as you said that may depend on whether or not magnum understands bfv | 23:59 |
mnaser | so it looks like it doesnt support boot from volume grrr | 23:59 |