*** rkukura has joined #openstack-infra | 00:01 | |
*** longkb has joined #openstack-infra | 00:10 | |
*** slaweq has joined #openstack-infra | 00:13 | |
*** slaweq has quit IRC | 00:45 | |
*** slaweq has joined #openstack-infra | 01:11 | |
*** armax has joined #openstack-infra | 01:21 | |
*** ssbarnea has quit IRC | 01:33 | |
*** yamamoto has joined #openstack-infra | 01:40 | |
*** slaweq has quit IRC | 01:44 | |
hogepodge | If a node goes down in a gate job, is there a way to recover any diagnostics on it if you know things like the hostname and job it was associated with? | 01:52 |
clarkb | hogepodge: we can determine the uuid and follow up with the cloud provider, but it's largely on them I think, and depending on the cloud they may be more or less willing to do that | 01:56 |
clarkb | hogepodge: other things to consider: if using nested virt, that can crash the VM in some cases, and if modifying the network, like interfaces or firewalls, that can make the node go away too | 01:58 |
hogepodge | clarkb: I'm looking at one of the five node jobs. The workflow sets up a kubernetes cluster, one control plane node and four worker nodes. All of the nodes come up fine, and some pods are even deployed to it. Then the node disappears and I can't find any errors in the logs suggesting what happened. | 02:03 |
hogepodge | (when I say "the node" I mean "a node") | 02:03 |
clarkb | hogepodge: what cloud was it in? | 02:03 |
clarkb | and region | 02:03 |
hogepodge | rax-ord | 02:04 |
*** hongbin has joined #openstack-infra | 02:05 | |
hogepodge | It took me a while to figure out how to read ARA reports. | 02:05 |
clarkb | hogepodge: if we collect info to go back into logs we can see if cloudnull is able to help us debug rax side | 02:05 |
clarkb | hostname, ip addresses, timestamps | 02:06 |
hogepodge | playbooks are shown in reverse chronological order, but tasks in chronological | 02:06 |
hogepodge | hostname ubuntu-xenial-rax-ord-0000321302 | 02:06 |
clarkb | a link to the logs would probably help too, just to find any other info later if necessary | 02:07 |
hogepodge | actually, here's all the hostinfo http://logs.openstack.org/57/608057/7/check/openstack-helm-infra-five-ubuntu/daa7967/zuul-info/host-info.node-4.yaml | 02:07 |
hogepodge | have to run, I'll check back in tomorrow or later | 02:09 |
clarkb | ya I can't debug tonight and chances are we need someone cloud side anyway | 02:09 |
clarkb | but we should also try to rule out the test breaking networking on the node too | 02:10 |
clarkb | firewalls/sshd/interfaces | 02:10 |
*** slaweq has joined #openstack-infra | 02:11 | |
*** rkukura has quit IRC | 02:15 | |
hogepodge | I don’t think it does. Recheck passed. | 02:38 |
*** slaweq has quit IRC | 02:45 | |
*** yamamoto has quit IRC | 02:49 | |
*** mrsoul has quit IRC | 02:50 | |
fungi | ianw: there's an alternate armci.yaml in my homedir on the bridge with working initial temporary passwords for the new cloud. i haven't reset and recorded them in the usual places yet but if you want to play around with that feel free | 02:54 |
ianw | fungi: oh, cool, was just replying that the credentials don't work :) | 03:03 |
*** slaweq has joined #openstack-infra | 03:13 | |
*** bhavikdbavishi has joined #openstack-infra | 03:21 | |
*** annp has joined #openstack-infra | 03:40 | |
*** bhavikdbavishi has quit IRC | 03:42 | |
*** slaweq has quit IRC | 03:44 | |
*** bhavikdbavishi has joined #openstack-infra | 03:57 | |
*** mino_ has joined #openstack-infra | 03:58 | |
*** dave-mccowan has quit IRC | 04:00 | |
*** ramishra has joined #openstack-infra | 04:09 | |
*** slaweq has joined #openstack-infra | 04:16 | |
*** yamamoto has joined #openstack-infra | 04:24 | |
*** truongnh has joined #openstack-infra | 04:27 | |
*** udesale has joined #openstack-infra | 04:31 | |
*** ykarel has joined #openstack-infra | 04:40 | |
*** janki has joined #openstack-infra | 04:47 | |
*** slaweq has quit IRC | 04:48 | |
*** hongbin has quit IRC | 04:50 | |
*** slaweq has joined #openstack-infra | 05:16 | |
*** pcaruana has joined #openstack-infra | 05:23 | |
*** bhavikdbavishi has quit IRC | 05:27 | |
*** pcaruana has quit IRC | 05:32 | |
*** slaweq has quit IRC | 05:44 | |
*** zul has quit IRC | 05:49 | |
*** truongnh has quit IRC | 05:55 | |
*** truongnh has joined #openstack-infra | 06:04 | |
*** slaweq has joined #openstack-infra | 06:11 | |
*** e0ne has joined #openstack-infra | 06:17 | |
*** rkukura has joined #openstack-infra | 06:32 | |
*** quiquell has joined #openstack-infra | 06:35 | |
*** gfidente has joined #openstack-infra | 06:36 | |
*** e0ne has quit IRC | 06:36 | |
*** Dobroslaw has joined #openstack-infra | 06:37 | |
*** slaweq has quit IRC | 06:39 | |
*** chkumar|off is now known as chandankumar | 06:41 | |
*** rkukura has quit IRC | 06:42 | |
*** felipemonteiro has joined #openstack-infra | 06:49 | |
*** sshnaidm|off is now known as sshnaidm|rover | 07:07 | |
*** jbadiapa has joined #openstack-infra | 07:11 | |
*** slaweq has joined #openstack-infra | 07:11 | |
*** dpawlik has joined #openstack-infra | 07:12 | |
*** kjackal has joined #openstack-infra | 07:13 | |
*** jaosorior has joined #openstack-infra | 07:13 | |
*** slaweq has quit IRC | 07:16 | |
*** dpawlik has quit IRC | 07:16 | |
*** jaosorior has quit IRC | 07:24 | |
*** jaosorior has joined #openstack-infra | 07:27 | |
*** dpawlik has joined #openstack-infra | 07:42 | |
*** ccamacho has joined #openstack-infra | 07:44 | |
*** kopecmartin|off is now known as kopecmartin | 07:45 | |
*** dpawlik has quit IRC | 07:47 | |
*** truongnh has quit IRC | 07:52 | |
*** felipemonteiro has quit IRC | 08:00 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: WIP: Try to reproduce hanging paused job https://review.openstack.org/615493 | 08:05 |
*** pcaruana has joined #openstack-infra | 08:06 | |
*** jtomasek has joined #openstack-infra | 08:12 | |
*** ralonsoh has joined #openstack-infra | 08:14 | |
*** dpawlik has joined #openstack-infra | 08:17 | |
amorin | hey all | 08:26 |
*** florianf|afk is now known as florianf | 08:30 | |
*** jtomasek has quit IRC | 08:36 | |
*** jtomasek has joined #openstack-infra | 08:44 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Update static.o.o publishing https://review.openstack.org/615501 | 08:46 |
*** bauwser is now known as bauzas | 08:56 | |
*** jpena|off is now known as jpena | 08:56 | |
*** jpich has joined #openstack-infra | 08:57 | |
*** lucasagomes has joined #openstack-infra | 08:57 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: WIP: Try to reproduce hanging paused job https://review.openstack.org/615493 | 08:58 |
*** ccamacho has quit IRC | 09:04 | |
*** e0ne has joined #openstack-infra | 09:10 | |
*** rossella_s has joined #openstack-infra | 09:12 | |
*** ccamacho has joined #openstack-infra | 09:12 | |
*** pcaruana has quit IRC | 09:31 | |
*** pcaruana has joined #openstack-infra | 09:32 | |
*** adriancz has joined #openstack-infra | 09:39 | |
*** noama has joined #openstack-infra | 09:40 | |
*** ykarel is now known as ykarel|lunch | 09:41 | |
*** slaweq has joined #openstack-infra | 09:41 | |
AJaeger | fungi, could you review this ossa change, please? https://review.openstack.org/615498 | 09:43 |
*** ramishra_ has joined #openstack-infra | 09:49 | |
*** ramishra has quit IRC | 09:49 | |
*** derekh has joined #openstack-infra | 09:49 | |
*** shrasool has joined #openstack-infra | 09:52 | |
*** trown has quit IRC | 09:55 | |
*** trown has joined #openstack-infra | 09:58 | |
openstackgerrit | Noam Angel proposed openstack/diskimage-builder master: move selinux-permissive configure to pre-install phase https://review.openstack.org/615519 | 10:00 |
openstackgerrit | Noam Angel proposed openstack/diskimage-builder master: move selinux-permissive configure to pre-install phase https://review.openstack.org/615519 | 10:00 |
*** longkb has quit IRC | 10:01 | |
openstackgerrit | Noam Angel proposed openstack/diskimage-builder master: move selinux-permissive configure to pre-install phase https://review.openstack.org/615519 | 10:03 |
*** ykarel|lunch is now known as ykarel | 10:05 | |
*** panda has joined #openstack-infra | 10:05 | |
*** electrofelix has joined #openstack-infra | 10:06 | |
*** ccamacho has quit IRC | 10:15 | |
icey | the openstack bot doesn't seem to be handling the meeting bits this morning :-/ | 10:16 |
*** ccamacho has joined #openstack-infra | 10:17 | |
*** shardy has joined #openstack-infra | 10:21 | |
*** dtantsur|afk is now known as dtantsur\ | 10:35 | |
*** dtantsur\ is now known as dtantsur | 10:35 | |
*** mino_ has quit IRC | 10:35 | |
*** d0ugal has quit IRC | 10:36 | |
*** maciejjozefczyk has quit IRC | 10:50 | |
frickler | icey: in which channel did you see issues? | 10:54 |
*** maciejjozefczyk has joined #openstack-infra | 10:54 | |
*** d0ugal has joined #openstack-infra | 10:55 | |
*** ssbarnea has joined #openstack-infra | 11:04 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Resume paused job with skipped children https://review.openstack.org/615493 | 11:06 |
icey | frickler: #openstack-meeting-4 | 11:15 |
*** rfolco|off has joined #openstack-infra | 11:22 | |
frickler | icey: hmm, it did seem to work fine when I just tested it, sadly I wasn't online in that channel earlier, so not sure what went wrong there. maybe some other infra-root can take a look later | 11:22 |
*** rfolco|off is now known as rfolco|ruck | 11:24 | |
icey | No worries, meeting finished anyways :-) | 11:24 |
*** tpsilva has joined #openstack-infra | 11:31 | |
*** dpawlik has quit IRC | 11:38 | |
frickler | icey: oh, I just checked the logs, you had a space in front of your #startmeeting command, which is why the bot ignored it. you don't see it in the html log, but if you check the text version you can see it http://eavesdrop.openstack.org/irclogs/%23openstack-meeting-4/%23openstack-meeting-4.2018-11-05.log | 11:46 |
*** beekneemech has quit IRC | 11:53 | |
icey | Heh that explains it, thanks! | 11:57 |
*** bnemec has joined #openstack-infra | 11:57 | |
*** dave-mccowan has joined #openstack-infra | 12:04 | |
*** janki has quit IRC | 12:05 | |
*** rh-jelabarre has joined #openstack-infra | 12:07 | |
*** pbourke has quit IRC | 12:10 | |
*** pbourke has joined #openstack-infra | 12:11 | |
*** udesale has quit IRC | 12:14 | |
*** roman_g has joined #openstack-infra | 12:15 | |
*** dpawlik has joined #openstack-infra | 12:16 | |
*** dpawlik has quit IRC | 12:18 | |
*** dpawlik has joined #openstack-infra | 12:18 | |
*** jpena is now known as jpena|lunch | 12:20 | |
*** dpawlik has quit IRC | 12:23 | |
*** dpawlik has joined #openstack-infra | 12:26 | |
*** dpawlik has quit IRC | 12:27 | |
*** dpawlik has joined #openstack-infra | 12:28 | |
*** dpawlik_ has joined #openstack-infra | 12:30 | |
*** dpawlik has quit IRC | 12:31 | |
*** dpawlik_ has quit IRC | 12:32 | |
*** dpawlik has joined #openstack-infra | 12:32 | |
*** jroll has quit IRC | 12:32 | |
*** jroll has joined #openstack-infra | 12:34 | |
openstackgerrit | Merged openstack-infra/zuul master: Fix unreachable nodes detection https://review.openstack.org/602829 | 12:34 |
*** dpawlik_ has joined #openstack-infra | 12:35 | |
openstackgerrit | Merged openstack-infra/zuul master: Also retry the job if a post job failed with unreachable https://review.openstack.org/602830 | 12:36 |
*** dpawlik has quit IRC | 12:37 | |
*** ansmith has quit IRC | 12:37 | |
*** e0ne_ has joined #openstack-infra | 12:37 | |
*** e0ne has quit IRC | 12:40 | |
*** dpawlik_ has quit IRC | 12:42 | |
*** dpawlik has joined #openstack-infra | 12:42 | |
*** boden has joined #openstack-infra | 12:43 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add role to install kubernetes https://review.openstack.org/605823 | 12:48 |
*** rlandy has joined #openstack-infra | 12:49 | |
*** dtantsur is now known as dtantsur|brb | 12:51 | |
*** zul has joined #openstack-infra | 13:15 | |
*** AJaeger has quit IRC | 13:22 | |
*** jpena|lunch is now known as jpena | 13:25 | |
*** jtomasek has quit IRC | 13:27 | |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool master: Implement an OpenShift resource provider https://review.openstack.org/570667 | 13:28 |
*** jtomasek has joined #openstack-infra | 13:28 | |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool master: Implement an OpenShift Pod provider https://review.openstack.org/590335 | 13:28 |
smcginnis | Etherpad appears to be having issues again. | 13:29 |
*** haleyb has joined #openstack-infra | 13:30 | |
*** jistr is now known as jistr|call | 13:32 | |
*** panda is now known as panda|bbl | 13:36 | |
*** AJaeger has joined #openstack-infra | 13:37 | |
*** jcoufal has joined #openstack-infra | 13:38 | |
*** yamamoto has quit IRC | 13:38 | |
*** yamamoto has joined #openstack-infra | 13:38 | |
arxcruz | gmann: around? | 13:44 |
gmann | arxcruz: kind of but i need to go away soon if nothing urgent. | 13:46 |
arxcruz | gmann: just wondering, I'm noticing on tripleo that tempest scenarios are not running in parallel | 13:46 |
arxcruz | api tests are running in parallel, but scenarios start to use only one worker | 13:47 |
gmann | arxcruz: which job (tox env) you use? | 13:47 |
arxcruz | gmann: wondering if you know if that's a known issue, or if i need to change any configuration under stestr, or in tempest itself | 13:47 |
arxcruz | gmann: we use tempest run directly, not from tox | 13:47 |
*** jamesmcarthur has quit IRC | 13:47 | |
arxcruz | gmann: i'm aware of the serial scenario run in tox | 13:48 |
*** jamesmcarthur has joined #openstack-infra | 13:48 | |
gmann | arxcruz: in the integrated gate (tempest-full job), scenario tests are run in serial due to an ssh timeout issue we faced | 13:48 |
gmann | arxcruz: ok | 13:48 |
gmann | arxcruz: in that case, it should run in parallel as long as you do not explicitly mention to run them in serial | 13:49 |
arxcruz | gmann: https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-master/480e81e/logs/stackviz/#/testrepository.subunit/timeline | 13:49 |
arxcruz | an example | 13:49 |
openstackgerrit | Merged openstack-infra/zuul master: web: uses queues uid to preserve state on change https://review.openstack.org/614933 | 13:50 |
arxcruz | gmann: https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-master/480e81e/logs/undercloud/home/zuul/tempest_container.sh.txt.gz | 13:51 |
dhellmann | AJaeger : I realized over the weekend that we have a small problem with the release job for heat. I know generally how to fix it, but could use some advice on details. | 13:53 |
dhellmann | The issue is that we're changing the dist name on master in order to be able to publish to pypi | 13:53 |
dhellmann | That's not something we're going to want to backport, though | 13:53 |
dhellmann | So we need to continue to use the old tarball-only release job on the other branches | 13:54 |
*** felipemonteiro has joined #openstack-infra | 13:54 | |
dhellmann | I can set up the job to use a regex based on the tags to know when to run or not run | 13:54 |
gmann | arxcruz: it is not marked serial and concurrency is 3, so it should be parallel | 13:54 |
arxcruz | gmann: exactly, but you see, api is running in parallel, but when start to run scenario, it runs in serial | 13:55 |
dhellmann | but the project-template has dependencies defined between some of the other jobs, so that the announce job only runs if we actually upload a release, and I don't know the best way to reproduce that (use the project-template and then separately set the matching regexes, or bring the whole set of jobs in as custom for heat) | 13:55 |
*** ansmith has joined #openstack-infra | 13:57 | |
gmann | arxcruz: not sure how you observe them as serial ? | 13:57 |
AJaeger | dhellmann: use the individual jobs for heat, not the template. That's the only way out ... | 13:58 |
arxcruz | gmann: stackviz shows the scenario running only on one worker | 13:58 |
arxcruz | gmann: https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-master/480e81e/logs/stackviz/#/testrepository.subunit/timeline | 13:58 |
dhellmann | AJaeger : ok. I'll work on a patch for that later today | 13:59 |
*** janki has joined #openstack-infra | 13:59 | |
AJaeger | dhellmann: and then let's check what to do with the dependencies - might be tricky ... | 14:00 |
arxcruz | gmann: is tempest smart enough to, let's say, if our quota is set to 1 vm, run these scenarios in serial ? | 14:00 |
dhellmann | AJaeger : I thought maybe if I added both job templates, I could then list the 2 release jobs with the matching regexes and zuul would figure out what I meant. :-) | 14:00 |
arxcruz | like, if we don't have quota, wait for one scenario to tear down everything before running the next | 14:00 |
*** jamesmcarthur has quit IRC | 14:01 | |
AJaeger | dhellmann: we need to ask corvus for that - I suggest you push a change out and ask him. | 14:01 |
*** ginopc has joined #openstack-infra | 14:02 | |
dhellmann | AJaeger ++ | 14:02 |
gmann | arxcruz: i can see them in parallel. for example - tempest.scenario.test_volume_boot_pattern and tempest.scenario.test_network_advanced_server_ops running at the same time @4.36 | 14:02 |
arxcruz | gmann: but why the last tests are running only on one single worker ? | 14:03 |
arxcruz | while the other 3 workers are doing nothing | 14:03 |
gmann | arxcruz: at the end it is only scenarios, as the api tests are done, and those remaining scenarios get a worker after that | 14:03 |
arxcruz | gmann: so, at the end, we have 3 workers doing nothing, and one worker running all scenarios | 14:03 |
arxcruz | all the remaining scenarios | 14:03 |
*** kgiusti has joined #openstack-infra | 14:04 | |
arxcruz | gmann: here is more visible that http://logs.openstack.org/33/615133/2/check/tripleo-ci-centos-7-standalone/9ac46cc/logs/stackviz/#/testrepository.subunit/timeline | 14:04 |
*** bobh has joined #openstack-infra | 14:05 | |
gmann | arxcruz: RE: quota things. no, Tempest does not wait on quota. if there is not enough quota, it will error when any test creates more than the allowed quota | 14:05 |
gmann | arxcruz: one thing is that class-level tests are all serial. so you can see that all tests in the tempest.scenario.test_network_basic_ops class will always run in serial. | 14:07 |
gmann | arxcruz: but i can see TestVolumeBootPattern tests did not get started on worker 1 or 3 | 14:07 |
*** jistr|call is now known as jistr | 14:08 | |
gmann | arxcruz: and i think you will see different behaviour depending on CPU usage on the machine. in the last link you pasted, only these 2 tests are running serially and the rest all in parallel, and that depends on CPU allocation to workers etc. Tempest does not have any control over those things | 14:10 |
*** mriedem has joined #openstack-infra | 14:11 | |
gmann | arxcruz: from the Tempest side, tests are queued for a parallel run at the class level, and depending on CPU availability they get executed. | 14:12 |
arxcruz | gmann: is there a way to change that ? | 14:13 |
arxcruz | i mean, make class-level tests run in parallel ? | 14:13 |
arxcruz | gmann: do you know where the scheduler for this cpu availability is ? | 14:13 |
gmann | arxcruz: the only thing we can do to run faster is to increase the workers. | 14:13 |
mordred | EmilienM: I've started looking at podman for infra things - and one of the first things that jumps out at me is that the config file for configuring registries (and mirrors) is different from docker's... in the tripleo jobs, when you're doing podman stuff - are you guys writing the per-region mirror info into /etc/containers/registries.conf ? | 14:13 |
gmann | arxcruz: depends on your machine you run tests | 14:14 |
gmann | arxcruz: need to leave office otherwise i am going to miss my last train. will try to catch you from home or tomorrow. | 14:14 |
EmilienM | mordred: so I have bad news for you | 14:14 |
arxcruz | gmann: ok, thanks! | 14:14 |
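For reference, the worker count under discussion is the one passed to `tempest run`; a minimal sketch of that kind of invocation (the regex is illustrative, not the exact one from the tripleo job):

```shell
# Run API and scenario tests together with an explicit worker count.
# Tempest schedules tests onto workers at the class level, so a long
# scenario class can end up pinning a single worker near the end of a run.
tempest run --regex '(tempest\.api|tempest\.scenario)' --concurrency 4
```

Raising `--concurrency` is the only knob on the Tempest side; which worker a class lands on depends on the scheduler and the machine's CPU availability.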
mordred | EmilienM: oh no! not bad news! | 14:15 |
*** panda|bbl is now known as panda | 14:15 | |
EmilienM | mordred: there is no support for mirrors / proxies in podman. In fact the code that pulls images is in https://github.com/containers/image | 14:15 |
*** ginopc has quit IRC | 14:15 | |
EmilienM | mordred: but AFAIK there is nothing in podman that consumes it | 14:15 |
EmilienM | mordred: the only thing that we were able to do is to use insecure registries from the undercloud when deploying | 14:16 |
EmilienM | mordred: with [registries.insecure] | 14:16 |
mordred | EmilienM: but you have to explicitly configure registries - so couldn't you just put in the mirror url instead of docker.io in the registries list? | 14:17 |
EmilienM | mordred: https://github.com/containers/image/issues/529 | 14:17 |
AJaeger | EmilienM: are you sure? Let me double check with a colleague... | 14:17 |
EmilienM | mordred: I believe you can do that yes | 14:18 |
*** quiquell is now known as quiquell|lunch | 14:18 | |
EmilienM | there is no equivalent of registry-mirrors in podman | 14:18 |
mordred | EmilienM: sweet. I'm going to poke at that for a little bit - on test nodes, I would NEVER want them to fall back to docker.io; they should always only ever talk to the mirror ... I'll let you know if I get it working | 14:18 |
*** ginopc has joined #openstack-infra | 14:19 | |
EmilienM | mordred: question: on which OS/version are you running tests? | 14:19 |
*** dtantsur|brb is now known as dtantsur | 14:19 | |
EmilienM | mordred: if centos7, you want to pull podman from https://buildlogs.centos.org/centos/7/virt/x86_64/container/ | 14:19 |
EmilienM | to get the latest version | 14:19 |
AJaeger | mordred, EmilienM, before you try alternatives, let's wait for an answer to my question - I know a colleague implemented something for cri-o and expect this works on podman - but let's see whether my colleague is around... | 14:19 |
EmilienM | AJaeger: no problem | 14:20 |
EmilienM | in fact we might be able to use [registries.search] if it's a full mirror | 14:20 |
AJaeger | EmilienM: quick answer: you're right ;( | 14:20 |
mordred | EmilienM: yes - that's what I was thinking- it is a full mirror | 14:21 |
mordred | EmilienM: I'm actually working with the libpod team on testing out their new shiny ubuntu bionic ppa | 14:21 |
EmilienM | mordred: I'll give it a try today in tripleo-ci | 14:21 |
*** felipemonteiro has quit IRC | 14:21 | |
EmilienM | I don't know why I didn't think about it before | 14:21 |
EmilienM | mordred: nice, so join us on #podman :D we have a nice collaboration between tripleo and them | 14:22 |
mordred | EmilienM: cool. I'm also going to make an install-podman role in zuul-jobs, similar to the install-docker role - that will install podman and set it up to talk to mirrors | 14:22 |
EmilienM | oh nice | 14:22 |
ykarel | mordred, can u check https://review.openstack.org/#/c/615543/, py27 was running with python3 | 14:23 |
mordred | ykarel: yup - I believe I've +2'd it already - Shrews - you wanna +A it? | 14:24 |
Shrews | mordred: no? b/c test failure? | 14:24 |
ykarel | mordred, ok there is some patch already | 14:24 |
*** fried_rice is now known as efried | 14:25 | |
ykarel | py27 test actually failing after this | 14:25 |
mordred | Shrews: oh. well, bah on test failures | 14:25 |
mordred | ykarel: awesome! do you want to fix those failures in that patch? if not, I can look in to it in a little while | 14:25 |
mordred | and looks like getting it updated would in fact be important :) | 14:25 |
ykarel | mordred, u can take it over | 14:26 |
ykarel | mordred, actually detected in package build: https://logs.rdoproject.org/23/17223/1/check/legacy-rdoinfo-DLRN-check/ed2bc9a/buildset/centos-rpm-master/repos/99/47/9947219c4ee8da7b5d8478f0904fbb6f48f9a457_dev/mock.log | 14:27 |
ykarel | there we have multiple failures apart from CI, thread.error: can't start new thread | 14:28 |
*** jamesmcarthur has joined #openstack-infra | 14:28 | |
mordred | ykarel: the thread error is going to be an issue in tests related to the task manager work - and _should_ be better now- but also there are a few more patches coming that should make that much better | 14:30 |
mordred | ykarel: and awesome- I will do - thanks for pointing it out! | 14:30 |
ykarel | mordred, Thanks will keep an eye there | 14:30 |
ykarel | it's issue with 2.19 release, may be next release would be better | 14:31 |
*** shrasool has quit IRC | 14:33 | |
mordred | ykarel: I hope so - we've got a patch in flight to keystoneauth which will actually make us stop creating the additional threads in the first place - should all be sorted reasonably soon | 14:36 |
ykarel | mordred, ack Thanks | 14:36 |
* mordred afks for just a bit | 14:37 | |
*** SteelyDan is now known as dansmith | 14:37 | |
openstackgerrit | Fabien Boucher proposed openstack-infra/zuul master: WIP - Pagure driver https://review.openstack.org/604404 | 14:39 |
openstackgerrit | Jean-Philippe Evrard proposed openstack-infra/project-config master: Add notifications to openstack-helm https://review.openstack.org/615572 | 14:40 |
*** felipemonteiro has joined #openstack-infra | 14:42 | |
*** felipemonteiro has quit IRC | 14:45 | |
*** ccamacho has quit IRC | 14:50 | |
*** ccamacho has joined #openstack-infra | 14:52 | |
*** rh-jelabarre has quit IRC | 14:52 | |
*** sthussey has joined #openstack-infra | 14:53 | |
openstackgerrit | Merged openstack-infra/nodepool master: Implement a Kubernetes driver https://review.openstack.org/535557 | 14:54 |
openstackgerrit | Merged openstack-infra/nodepool master: Add tox functional testing for drivers https://review.openstack.org/609515 | 14:55 |
*** quiquell|lunch is now known as quiquell | 14:59 | |
*** janki has quit IRC | 15:01 | |
*** munimeha1 has joined #openstack-infra | 15:02 | |
odyssey4me | Hi folks - it seems like the github mirrors aren't quite up to date. Can someone take a peek? | 15:03 |
odyssey4me | As an example, https://git.openstack.org/cgit/openstack/openstack-ansible-os_cinder/commit/?id=02fa53d9de7c984e84710520f966be56e12e988c is not present in the github mirror. | 15:03 |
*** jistr is now known as jistr|call | 15:04 | |
*** jistr|call is now known as jistr | 15:10 | |
fungi | agreed, https://github.com/openstack/cinder/commit/02fa53d9de7c984e84710520f966be56e12e988c seems to return a 404 | 15:21 |
fungi | and gh claims the last merge cinder commit in master is 3 days old | 15:22 |
*** d34dh0r53 has quit IRC | 15:22 | |
*** cloudnull has quit IRC | 15:22 | |
*** eglute has quit IRC | 15:22 | |
*** cloudnull has joined #openstack-infra | 15:23 | |
*** d34dh0r53 has joined #openstack-infra | 15:23 | |
*** eglute has joined #openstack-infra | 15:23 | |
fungi | it does indeed look like there are a bunch of pushes to git@github.com:$someproject waiting as far back as Nov-03 20:59 (utc) | 15:26 |
corvus | fungi: cf1a92a7 Nov-03 20:58 [6f2926c3] push git@github.com:openstack/keystone.git | 15:27 |
corvus | yeah | 15:27 |
corvus | looks like that one is "running" but stuck, and the others are behind it | 15:27 |
fungi | oh, yep | 15:27 |
corvus | i bet if we kill it, things will resume | 15:27 |
corvus | i'll go ahead and kill it and the other stuck tasks | 15:29 |
fungi | 1bc9f3356d3076334a4b9b36283f463e0a65fa8b seems to be the last commit to replicate to gh for openstack/keystone | 15:30 |
corvus | there's another keystone push right after, so it'll probably get updated anyway | 15:30 |
fungi | so i _think_ 733b37f24d874193b965528bacf1fd56ccffbc79 is what was hung replicating? | 15:31 |
fungi | (maybe) | 15:31 |
fungi | i don't see anything weird about that change ( https://review.openstack.org/615400 ) so it was probably just something going sideways with the connection | 15:31 |
corvus | i killed the task; keystone looks a bit more up to date now | 15:32 |
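The inspection and kill described here happen over Gerrit's SSH admin interface; a sketch of the sort of commands involved, assuming the standard Gerrit CLI (the task id is the one quoted above):

```shell
# List Gerrit's background task queue, including replication pushes,
# then kill the stuck task by id so the queued pushes behind it resume.
ssh -p 29418 review.openstack.org gerrit show-queue --wide
ssh -p 29418 review.openstack.org kill cf1a92a7
```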
EmilienM | mordred, AJaeger : so if you configure registries.search with the mirror (without http://) and then add it to registries.insecure as well (without http://) and then run "podman pull --tls-verify=false myimage", it works | 15:32 |
EmilienM | mordred, AJaeger : it would be nice to provide https (secured) mirrors to avoid this workaround though | 15:32 |
EmilienM | but it's good to know we can use mirrors in /etc/containers/registries.conf | 15:33 |
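A minimal sketch of the /etc/containers/registries.conf arrangement EmilienM describes (the mirror hostname is a hypothetical stand-in for a per-region proxy):

```toml
# /etc/containers/registries.conf -- 2018-era containers-image format.
# Search the regional caching proxy instead of docker.io; because the
# proxy speaks plain http, it must also be listed as insecure.
[registries.search]
registries = ['mirror.example-region.openstack.org:8082']

[registries.insecure]
registries = ['mirror.example-region.openstack.org:8082']
```

With that in place, `podman pull --tls-verify=false myimage` resolves the short image name against the mirror.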
fungi | task count is slowly falling | 15:33 |
corvus | EmilienM: there's a spec for using letsencrypt in infra; i imagine if we can find someone to work on that, we could probably have tls mirrors | 15:33 |
EmilienM | corvus: nice | 15:33 |
EmilienM | corvus: I might know someone | 15:34 |
corvus | EmilienM: https://review.openstack.org/587283 | 15:34 |
EmilienM | yeah | 15:34 |
* Tengu hides | 15:36 | |
EmilienM | I'll put links here but spredzy (Yanis) wrote https://github.com/Spredzy/ansible-role-lecm and https://github.com/Spredzy/lecm | 15:37 |
EmilienM | which I'm sure can help | 15:37 |
Tengu | I started some fancy work for LE integration on my own last year - it was focused on the public endpoints though, and took care of detecting where the VIP is and syncing the certificate between HA controllers. | 15:38 |
Tengu | never had time to finish, unfortunately. | 15:39 |
*** sdoran has left #openstack-infra | 15:39 | |
Tengu | https://github.com/cjeanneret/openstack-certodia | 15:41 |
clarkb | EmilienM: mordred: to be clear, our proxies are not a mirror, they are caching proxies | 15:42 |
clarkb | so yes it is a full "mirror" of dockerhub | 15:42 |
Tengu | clarkb: TLS MitM then? | 15:43 |
clarkb | Tengu: ? | 15:43 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for eavesdrop.o.o https://review.openstack.org/590048 | 15:43 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on future parser for lists.katacontainers.io https://review.openstack.org/602380 | 15:43 |
clarkb | Tengu: I was following up to the earlier question about whether or not our "mirrors" were "full mirrors" | 15:44 |
clarkb | our mirrors are actually just caches for the actual thing so: yes | 15:44 |
*** shrasool has joined #openstack-infra | 15:44 | |
Tengu | clarkb: dockerhub serves all its content via TLS (https) - in order to have a caching proxy it must decrypt the content, so man-in-the-middle (just a side question, not really that important) | 15:44 |
clarkb | Tengu: ah right, we actually only do http to the proxy then it does https to the backend | 15:45 |
clarkb | so not really mitm'ing | 15:45 |
clarkb | (at least we aren't pretending to have secured a connection to the backend) | 15:45 |
AJaeger | fungi, could you review this ossa change, please? https://review.openstack.org/615498 | 15:46 |
Tengu | clarkb: :) ok! thanks for the precision. | 15:46 |
clarkb | re tls mirrors in general. Keep in mind that for apt you have to install extra packages to speak tls | 15:47 |
clarkb | so we shouldn't blindly put that in place across the board | 15:48 |
clarkb | (though this is no longer true on bionic iirc) | 15:48 |
*** quiquell is now known as quiquell|off | 15:49 | |
Tengu | hmmm, latest version of apt doesn't need any additional package. | 15:49 |
clarkb | Tengu: ya I think for bionic and debian testing/unstable its fine. But xenial for sure needs the apt tls package and unsure about debian stable | 15:50 |
Tengu | latest debian stable already has the new apt version, and there's no "apt-transport-https" anymore (or whatever the name was back then) | 15:50 |
clarkb | ah ok, so it's just xenial then | 15:50 |
Tengu | yeah, I think so | 15:50 |
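For the xenial case, the missing piece is a single package; a sketch, assuming stock Ubuntu 16.04:

```shell
# Ubuntu xenial's apt cannot fetch from https:// sources out of the
# box; bionic and current Debian ship the transport in apt itself.
sudo apt-get install apt-transport-https
```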
fungi | yeah, but also apt repositories are rarely served via https anyway because it brings basically nothing useful to do so and it's hard to coordinate certs across a distributed volunteer-run mirror network | 15:50 |
Tengu | anyway, I didn't touch any (stable) debian for a year now ^^' | 15:50 |
clarkb | fungi: right, I mostly don't want someone to add tls to the mirrors and break apt | 15:50 |
fungi | we could just avoid redirecting http to https | 15:51 |
Tengu | +1 | 15:51 |
clarkb | Tengu: re mitm too, since we use our own hostnames we should be able to mitm without any extra work as well. It's not like a squid pretending to be dockerhub, it's apache reverse proxying to dockerhub under its own name | 15:56 |
clarkb | *any extra work on top of setting up tls on the proxy | 15:56 |
Tengu | ok :) | 15:56 |
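A rough sketch of the reverse-proxy arrangement clarkb describes, assuming a plain Apache mod_proxy + mod_cache setup (hostname and port are illustrative, not the actual infra config):

```apache
# The mirror listens on plain http under its own name, fetches from the
# registry over https, and caches responses on disk; clients are never
# told they are talking to docker.io directly.
<VirtualHost *:8082>
    ServerName mirror.example-region.openstack.org
    SSLProxyEngine on
    CacheEnable disk /
    ProxyPass        / https://registry-1.docker.io/
    ProxyPassReverse / https://registry-1.docker.io/
</VirtualHost>
```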
jungleboyj | I am afraid that I have another corrupted etherpad: https://etherpad.openstack.org/p/cinder-stein-meeting-agendas | 16:02 |
*** dklyle has joined #openstack-infra | 16:02 | |
jungleboyj | Could someone run the magic command to try to recover it. | 16:02 |
clarkb | hogepodge: a6cb65b1-5897-45b4-86c8-d79ff74dc327 is the uuid of http://logs.openstack.org/57/608057/7/check/openstack-helm-infra-five-ubuntu/daa7967/zuul-info/host-info.node-4.yaml that host you identified in rax-ord as having trouble | 16:03 |
clarkb | cloudnull: ^ any chance you have a few minutes today to take a look at that or suggest further debugging? tldr is rax-ord instance lost networking | 16:04 |
*** dpawlik has quit IRC | 16:05 | |
clarkb | jungleboyj: that does make me wonder if someone doing cinder things has a client that causes that | 16:05 |
*** dpawlik_ has joined #openstack-infra | 16:05 | |
clarkb | jungleboyj: I'll take a look shortly | 16:05 |
*** luizbag has joined #openstack-infra | 16:05 | |
jungleboyj | clarkb: Thank you so much. We were using it when things were unhappy on the server last week. | 16:06 |
clarkb | ah that could be related too | 16:06 |
*** pcaruana has quit IRC | 16:06 | |
AJaeger | config-core, could you review https://review.openstack.org/615174 and https://review.openstack.org/615572 , please? | 16:07 |
*** ccamacho has quit IRC | 16:08 | |
*** dpawlik_ has quit IRC | 16:12 | |
*** dpawlik has joined #openstack-infra | 16:12 | |
clarkb | jungleboyj: http://paste.openstack.org/show/734152/ | 16:13 |
jungleboyj | clarkb You rock! Thank you! | 16:14 |
jungleboyj | clarkb: Take it the original page is lost? | 16:15 |
clarkb | jungleboyj: yes, I don't think we know of a way to edit the database to recover those. The fix would likely be editing the json in the tables to be valid | 16:16 |
*** maciejjozefczyk has quit IRC | 16:16 | |
*** ramishra_ has quit IRC | 16:16 | |
clarkb | dtroyer: ildikov for https://review.openstack.org/#/c/615174/2 do we need to coordinate that to update groups quickly so that starlingx work doesn't grind to a halt? | 16:16 |
*** imacdonn has quit IRC | 16:16 | |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid master: Migration to PHP 7.x https://review.openstack.org/611936 | 16:17 |
*** imacdonn has joined #openstack-infra | 16:17 | |
clarkb | dtroyer: ildikov: a volunteer to be the seed user on those groups would probably work? or we can add the old group to the new groups and you all can sort it out from there? | 16:17 |
*** dpawlik has quit IRC | 16:17 | |
fungi | clarkb: one workaround is that you can move the corrupt pad (there's an api call like move or rename) and then the original content can be pasted into a new pad at the old name after that so urls remain valid | 16:17 |
ildikov | clarkb: I have an initial list for people, I can setup the groups if you add me | 16:17 |
clarkb | ildikov: ok I'll approve it now then | 16:18 |
ildikov | clarkb: thanks! | 16:18 |
openstackgerrit | Merged openstack-infra/project-config master: Add notifications to openstack-helm https://review.openstack.org/615572 | 16:19 |
AJaeger | mrhillsman: could you review governance-sigs open changes, please? https://review.openstack.org/#/q/project:openstack/governance-sigs+is:open | 16:25 |
clarkb | #status log Added stephenfin and ssbarnea to git-review-core in Gerrit. Both have agreed to focus on bug fixes, stability, and improved testing. Or as corvus put it "to be really clear about that, i think any change which requires us to alter our contributor docs should have a nearly impossible hill to climb for acceptance". | 16:28 |
openstackstatus | clarkb: finished logging | 16:28 |
*** dpawlik has joined #openstack-infra | 16:28 | |
clarkb | stephenfin: ssbarnea ^ fyi and thank you! | 16:28 |
fungi | yes, huge thanks!!! | 16:28 |
openstackgerrit | Merged openstack-infra/project-config master: Add StarlingX core groups https://review.openstack.org/615174 | 16:30 |
stephenfin | clarkb: Spot on. Cheers :) | 16:31 |
stephenfin | can't promise I'll do anything until December (summit and PTO) but after that, fix ALL the bugs | 16:31 |
corvus | i think doing nothing and fixing bugs are both great things to do | 16:32 |
*** dpawlik has quit IRC | 16:32 | |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: Proposed spec: tenant-scoped admin web API https://review.openstack.org/562321 | 16:34 |
*** e0ne_ has quit IRC | 16:36 | |
*** kopecmartin is now known as kopecmartin|off | 16:36 | |
ildikov | clarkb: how long till the new groups appear in Gerrit? | 16:39 |
clarkb | ildikov: puppet is running every ~45 minutes right now so roughly about that long | 16:39 |
ildikov | clarkb: cool, tnx | 16:40 |
odyssey4me | would it be possible for me to gain access to a held node which is running a test? I'm not able to replicate an issue outside of CI which is causing https://review.openstack.org/#/c/615258/ to fail for the one set of tests on suse/centos | 16:40 |
clarkb | odyssey4me: have a preference for which job we hold the instance for? | 16:42 |
odyssey4me | clarkb I guess openstack-ansible-deploy-aio_metal-opensuse-423 given that's what I tried to replicate locally. | 16:42 |
clarkb | odyssey4me: against openstack-ansible project? | 16:43 |
odyssey4me | clarkb yeah, ideally a test running against that patch | 16:43 |
odyssey4me | (it resolves some issues I picked up in local testing) | 16:43 |
fungi | odyssey4me: thanks for the heads up on the replication backlog. gerrit caught up a little while ago and https://github.com/openstack/openstack-ansible-os_cinder/commit/02fa53d9de7c984e84710520f966be56e12e988c seems to have replicated now | 16:44 |
clarkb | odyssey4me: next failure of that job on change 615258 should be held | 16:44 |
odyssey4me | fungi ah, great thanks - while we all know it's a mirror and not necessarily in sync, sometimes people still use it | 16:44 |
odyssey4me | clarkb so I just run a recheck and it'll hold that node? | 16:45 |
clarkb | odyssey4me: yup it should | 16:45 |
odyssey4me | how do I then actually access it? | 16:45 |
clarkb | odyssey4me: an infra-root will need to add your ssh key to the held instance | 16:45 |
odyssey4me | clarkb ok, thanks - I'll wait for the job to run again, then ping for the key to get added - is there a time limit on the hold? I ask because it's pretty much the end of my day and I'd prefer to do the investigation in the morning | 16:47 |
clarkb | odyssey4me: I did not set a time limit | 16:47 |
odyssey4me | ok thanks, so I just need to ping when I'm done again | 16:47 |
fungi | we also usually record enough breadcrumbs to remember who to ask to make sure they're done before we delete it | 16:48 |
*** openstackgerrit has quit IRC | 16:48 | |
clarkb | yes "odyssey4me debugging failures" says the comment | 16:48 |
odyssey4me | great, tyvm! | 16:49 |
*** sshnaidm|rover is now known as sshnaidm|afk | 16:53 | |
*** ginopc has quit IRC | 16:55 | |
*** ginopc has joined #openstack-infra | 16:58 | |
*** trown is now known as trown|lunch | 17:00 | |
*** gyee has joined #openstack-infra | 17:02 | |
*** openstackgerrit has joined #openstack-infra | 17:03 | |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: adns: Set zone directory permissions https://review.openstack.org/615607 | 17:03 |
openstackgerrit | Fabien Boucher proposed openstack-infra/zuul master: WIP - Pagure driver https://review.openstack.org/604404 | 17:05 |
*** shrasool has quit IRC | 17:07 | |
clarkb | if anyone else has thoughts on using the infra onboarding session in berlin for user onboarding please put them on https://etherpad.openstack.org/p/openstack-infra-berlin-onboarding I'm going to circulate that etherpad on the various dev mailing lists shortly | 17:10 |
ildikov | clarkb: the groups for StarlingX appeared on Gerrit now. Who has rights to add people to it by default? | 17:15 |
clarkb | ildikov: right now only gerrit admins. I'll add you then you can add everyone else | 17:15 |
ildikov | clarkb: cool, thank you | 17:15 |
clarkb | ildikov: done, I think I got all the groups too, but let me or infra-root know if any were missed | 17:17 |
ildikov | clarkb: roger; thanks! | 17:18 |
corvus | clarkb, fungi, mordred: https://review.openstack.org/615607 was the only oopsie from the opendev bootstrapping. i fixed that manually, and manually opened the firewall, and the new servers are serving data now. i sent jimmy an email asking him to set up glue records and dnssec (you are cc'd). once that's set up, we should be gtg. | 17:18 |
*** rkukura has joined #openstack-infra | 17:19 | |
* clarkb reviews the fix | 17:19 | |
corvus | (er, obviously, once the glue records are in place, i'll update the firewall config in ansible to do what i did manually) | 17:19 |
corvus | also, i manually ran the snippet in 615607, so that ansible is tested | 17:19 |
*** ginopc has quit IRC | 17:20 | |
*** ykarel has quit IRC | 17:23 | |
*** ginopc has joined #openstack-infra | 17:26 | |
*** jpich has quit IRC | 17:27 | |
ildikov | clarkb: this one is missing: https://review.openstack.org/#/admin/groups/1966,members | 17:28 |
clarkb | ildikov: fixed | 17:29 |
ildikov | tnx! | 17:29 |
*** calebb has joined #openstack-infra | 17:32 | |
*** ginopc has quit IRC | 17:41 | |
*** yamamoto has quit IRC | 17:47 | |
*** jpena is now known as jpena|off | 17:48 | |
*** rkukura has quit IRC | 17:49 | |
*** dave-mccowan has quit IRC | 17:49 | |
fungi | thanks corvus! | 17:50 |
*** ginopc has joined #openstack-infra | 17:54 | |
*** e0ne has joined #openstack-infra | 17:57 | |
AJaeger | prometheanfire, tristanC , fungi , tonyb, could you review this ossa docs change, please? https://review.openstack.org/#/c/615498/ | 17:58 |
AJaeger | thanks, mrhillsman ! | 18:00 |
*** derekh has quit IRC | 18:00 | |
prometheanfire | k | 18:01 |
ssbarnea | out of curiosity, do we have an irc bot that can post rss feeds? or do you know one that can easily be set up? | 18:01 |
fungi | AJaeger: thanks!!! | 18:01 |
ssbarnea | or even better, a free SaaS solution that does this. | 18:01 |
prometheanfire | didn't you hear? rss is dead :( | 18:01 |
mrhillsman | AJaeger: you’re welcome, not sure why i did not get notifications, maybe i just missed them | 18:02 |
ssbarnea | i guess rss is like irc, waiting for the singularity.... in order to die. | 18:02 |
AJaeger | thanks, fungi and prometheanfire ! | 18:02 |
prometheanfire | yarp | 18:02 |
fungi | ssbarnea: statusbot posts to wiki.openstack.org and twitter... updating an rss xml blob might not be hard to do with that codebase as a starting point. the main trick is in where/how you go about publishing it | 18:03 |
AJaeger | fungi, prometheanfire, keep in mind that ossa won't work with python3 - at least not without additional changes... | 18:03 |
fungi | AJaeger: seems like something we need to fix. thanks for the heads up! | 18:03 |
AJaeger | fungi: yes, that needs eventual fixing... | 18:04 |
fungi | looks like patchset #3 failures should point us in a starting direction at least | 18:05 |
melwitt | does anyone know if there's a way to search for bug keywords limited to a particular project in storyboard? I don't see how in the docs | 18:05 |
melwitt | I'm trying to see if there's already a bug open for the thing I'm considering opening a bug about | 18:06 |
AJaeger | fungi: just add python3 as basepython and see it fail - and then we need to update requirements as well (sphinx is < 1.3) | 18:06 |
*** luizbag has left #openstack-infra | 18:07 | |
fungi | melwitt: have you tried adding the project and keyword to the search field? | 18:08 |
fungi | when i search for bindep (and pick the openstack-infra/bindep project from the drop-down) and then add alpine as a keyword i get the one bindep story (and associated task) for alpine linux support in bindep. if i remove the openstack-infra/bindep project from the search i get several stories for different projects | 18:10 |
melwitt | it doesn't seem to allow both to be specified, either project or text keyword? | 18:10 |
melwitt | ok, I'll try it again | 18:11 |
fungi | you can add both. is it not letting you? | 18:11 |
fungi | basically it gave me https://storyboard.openstack.org/#!/search?q=alpine&project_id=811&title=alpine as the query when i entered what i described above | 18:11 |
melwitt | maybe a UI challenge but when I wrote "floating" next to the python-openstackclient box, it erased the python-openstackclient box and searched everything | 18:12 |
melwitt | gotcha. I'll play around with it more | 18:12 |
fungi | huh, that's definitely not intentional :/ | 18:12 |
fungi | i wonder if it gets ornery if you don't wait for the little "processing" spinny on the right side to stop and go back to a magnifying glass | 18:14 |
fungi | https://storyboard.openstack.org/#!/search?q=floating&project_id=975&title=floating is what i get for what it sounds like you're looking for | 18:14 |
melwitt | ah, yes, that is what I wanted | 18:15 |
fungi | the api is a little slow to get back to the webclient during typeahead searching, could probably stand to profile those queries and see where it's spending time | 18:16 |
melwitt | oh, I see what I did wrong. when I selected python-openstackclient from the drop down, I selected the "text:python-openstackclient" instead of the project version of it | 18:16 |
*** Swami has joined #openstack-infra | 18:17 | |
fungi | i find the icons a little opaque, so can see where that would be easy to do. the hover tooltip helps but i wonder if we shouldn't spell out what's in the tooltip into the selection menu | 18:17 |
melwitt | so when I added "floating" it took away the text:python-openstackclient. if I select the project:python-openstackclient it works | 18:17 |
AJaeger | fungi: I have ossa converted to python3! Now cleaning up... | 18:17 |
*** yamamoto has joined #openstack-infra | 18:18 | |
melwitt | yeah, I've got it now. thank you | 18:18 |
fungi | melwitt: yeah, right now search terms are exclusively anded, so having more than one for a specific category makes little sense. there's been discussion about how to switch out the search query parser for a more full-featured language | 18:18 |
melwitt | fungi: yeah, makes sense. I just didn't notice there were "different" python-openstackclient query types in the box (and defaults to text if you don't wait for the box selector) | 18:19 |
melwitt | now I know what to do :) | 18:19 |
*** ralonsoh has quit IRC | 18:20 | |
fungi | melwitt: well, if we spell it out with project: and text: in front of terms in the search bar that might make it more obvious than just the icons | 18:25 |
*** rkukura has joined #openstack-infra | 18:27 | |
melwitt | yeah, that would make it similar to how to search in gerrit or logstash.o.o | 18:28 |
*** yamamoto has quit IRC | 18:29 | |
*** trown|lunch is now known as trown | 18:36 | |
AJaeger | prometheanfire, fungi, https://review.openstack.org/615626 ports ossa to python3 | 18:43 |
*** electrofelix has quit IRC | 18:43 | |
prometheanfire | AJaeger: very nice | 18:43 |
fungi | AJaeger: wow, thanks!!! | 18:44 |
AJaeger | config-core, https://review.openstack.org/615501 changes remaining publish jobs to "tox -e docs" - please review. | 18:44 |
prometheanfire | AJaeger: watching the review (just in case) | 18:44 |
AJaeger | prometheanfire, fungi, I included the openstackdocstheme into that as well. If you want to use the "Report a bug" link of the theme, tell me where to report bugs - launchpad project or storyboard project. Those are bugs against OSSA itself... | 18:45 |
fungi | prometheanfire: yeah, we can check the draft rendering once zuul links it | 18:45 |
AJaeger | It looked good locally - but yes, let's wait until the build is done. Will tell you... | 18:45 |
fungi | AJaeger: it's sort of up in the air since we (in theory) use both lp and sb though in practice we haven't had a vulnerability to oversee for any post-sb-migration project yet | 18:46 |
AJaeger | fungi, I did the initial ossa change for 615501... | 18:46 |
*** diablo_rojo has joined #openstack-infra | 18:46 | |
AJaeger | fungi: it's bugs against ossa repo itself, so typo in the descriptions etc. | 18:47 |
AJaeger | fungi: so, if you click on the "bug" on the page, where should it open a report? I disabled the bug icon for ossa now since the docs did not give any place. | 18:47 |
AJaeger | fungi: so, it's not a bug against nova etc - but a bug against ossa documents | 18:48 |
AJaeger | but if you have nothing for that - we can leave it disabled | 18:48 |
fungi | right, we've never really had bug reports for that repo itself as we've used bug/task tracking to identify when the vmt needs to take some action on a reported vulnerability | 18:48 |
fungi | but i suppose we could stick with the lp link for now since it's more in use by the vmt still | 18:48 |
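For reference, the theme's "Report a bug" link is driven by a few conf.py variables; a sketch of what sticking with the lp link could look like (option names are from the 2018-era openstackdocstheme; the values are illustrative, not a recorded decision):

```python
# doc/source/conf.py -- openstackdocstheme settings for the bug link.
repository_name = 'openstack/ossa'  # repo name shown in the page footer
bug_project = 'ossa'                # Launchpad project new reports go to
bug_tag = ''                        # optional tag applied to new bugs
```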
openstackgerrit | Merged openstack-infra/system-config master: adns: Set zone directory permissions https://review.openstack.org/615607 | 18:49 |
*** noama has quit IRC | 18:50 | |
clarkb | infra-root config-core https://review.openstack.org/615628 is my first draft at the project update for berlin | 18:57 |
clarkb | please review it for accuracy and also that I didn't miss anything super important | 18:57 |
AJaeger | clarkb: couple of minor suggestions | 19:05 |
clarkb | AJaeger: thanks! | 19:07 |
*** xek_ has joined #openstack-infra | 19:09 | |
*** xek has quit IRC | 19:12 | |
clarkb | AJaeger: fixes pushed | 19:13 |
*** rockyg has joined #openstack-infra | 19:14 | |
AJaeger | clarkb: I just had one more suggestion - did you include that one? | 19:14 |
clarkb | AJaeger: I did not, will do | 19:14 |
AJaeger | clarkb: your push and my addition crossed ;) | 19:15 |
AJaeger | clarkb: otherwise LGTM | 19:15 |
openstackgerrit | Merged openstack-infra/zuul master: Small script to scrape Zuul job node usage https://review.openstack.org/613674 | 19:16 |
AJaeger | fungi, prometheanfire, http://logs.openstack.org/26/615626/1/check/openstack-tox-docs/5dae997/html/ | 19:17 |
AJaeger | fungi, prometheanfire, I pushed a small update for ossa - now all ready to review | 19:19 |
*** tpsilva has quit IRC | 19:22 | |
fungi | thanks again, AJaeger! | 19:23 |
*** dtantsur is now known as dtantsur|afk | 19:24 | |
*** zul has quit IRC | 19:24 | |
AJaeger | you're welcome, fungi. Note that I also pushed https://review.openstack.org/615629 to remove anchor - it's retired. | 19:25 |
AJaeger | fungi, if you have some review time, I would appreciate review of https://review.openstack.org/615501 to move docs publishing for more sites to "tox -e docs", please | 19:26 |
fungi | AJaeger: https://review.openstack.org/615501 is complaining about a configuration error which seems to have crept in over the past 10 hours | 19:30 |
*** timothyb89 has quit IRC | 19:31 | |
*** roman_g has quit IRC | 19:36 | |
AJaeger | fungi: it is an error in the change - I wonder why Zuul did not report it initially ;( | 19:36 |
AJaeger | corvus: any ideas? ^ | 19:37 |
* AJaeger will update | 19:37 | |
openstackgerrit | Colleen Murphy proposed openstack-infra/puppet-pip master: [debug] Fix openstack_pip provider for pip 18 https://review.openstack.org/606021 | 19:39 |
AJaeger | fungi, corvus, those two errors Zuul reports on 615501 were already wrong in the initial submission. Why did Zuul not complain initially in the check pipeline - but complains now during gate? | 19:40 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Update static.o.o publishing https://review.openstack.org/615501 | 19:41 |
AJaeger | fungi, updated to fix failure - thanks ^ | 19:41 |
AJaeger | needs another change... | 19:42 |
fungi | AJaeger: yeah, i find that strange if it wasn't a regression between when check tests ran and when i approved | 19:45 |
AJaeger | fungi, corvus, I guess I know why it did not test before - because of a change to a trusted project with depends-on | 19:45 |
fungi | ohhh | 19:45 |
fungi | and the dependency hadn't merged yet? | 19:46 |
fungi | and now it has | 19:46 |
AJaeger | it has merged now - but wasn't merged earlier | 19:46 |
fungi | yeah, if we'd rechecked it likely should have reported once that merged | 19:46 |
AJaeger | so, now I got directly a -1 | 19:46 |
AJaeger | fungi: yep | 19:46 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Update static.o.o publishing https://review.openstack.org/615501 | 19:49 |
*** shardy has quit IRC | 19:52 | |
*** shardy has joined #openstack-infra | 19:52 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Remove publish-static https://review.openstack.org/615637 | 19:55 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Remove publish-static https://review.openstack.org/615637 | 19:59 |
fungi | AJaeger: do you have any suggestions for how we can deal with the sidebar on http://logs.openstack.org/26/615626/2/check/openstack-tox-docs/bfba776/html/ under openstackdocstheme? seems it ends up with all the security advisories linked there before the toc | 20:03 |
AJaeger | fungi, I think I can disable it - let me check... | 20:04 |
fungi | the previous theme didn't include that section in the sidebar | 20:05 |
fungi | but i'll be honest, i'm not sure what causes that "openstack-security-advisories" section to appear there either | 20:07 |
AJaeger | fungi, it's the title of the page... | 20:07 |
*** boden has quit IRC | 20:08 | |
*** kjackal has quit IRC | 20:08 | |
*** kjackal has joined #openstack-infra | 20:09 | |
AJaeger | fungi: updated - the toc is gone | 20:12 |
AJaeger | clarkb, fungi, could you review https://review.openstack.org/615501 again, please? This now passes | 20:13 |
AJaeger | I also needed to push https://review.openstack.org/615637 as followup | 20:14 |
*** maciejjozefczyk has joined #openstack-infra | 20:14 | |
fungi | AJaeger: the toc wasn't itself a problem, it's just that under the previous theme the toc didn't include all the advisory documents: https://security.openstack.org/ | 20:15 |
fungi | so i was wondering if it was possible to get openstackdocstheme to behave the same way | 20:17 |
AJaeger | fungi, that toc is there as well - we had both a global and a local toc, and the local one got confused by the generated stuff. The just-pushed version should be what you expect | 20:18 |
fungi | oh! thanks, so it's the global toc which was the issue? | 20:18 |
fungi | but the local toc will still be included? | 20:18 |
AJaeger | fungi, check http://logs.openstack.org/26/615626/3/check/openstack-tox-docs/0981cf0/html/ | 20:18 |
dhellmann | is it possible to configure a release or tag pipeline job on one repo so that it only runs if the tag matches a pattern? I see options for setting branches on jobs but not tags. | 20:19 |
fungi | AJaeger: oh, yep perfect--thanks again! | 20:19 |
AJaeger | yes, exactly | 20:19 |
dhellmann | it looks like the tag pattern applies to the pipeline? | 20:19 |
fungi | dhellmann: we have a pattern which determines which pipeline a given tag ref is to be enqueued into, but that's at the changeish level not the job level. i wonder whether branch matcher expressions will work on a tag, but don't know the answer | 20:20 |
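The pipeline-level pattern fungi refers to lives in the pipeline's trigger configuration; a sketch of its conventional shape (the regex is illustrative, not copied from project-config):

```yaml
# Tag refs are routed into the release pipeline by a ref regex on the
# ref-updated event; per-job matching on the tag itself is the missing part.
- pipeline:
    name: release
    trigger:
      gerrit:
        - event: ref-updated
          ref: ^refs/tags/[0-9]+(\.[0-9]+)*$
```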
*** maciejjozefczyk has quit IRC | 20:21 | |
*** rlandy is now known as rlandy|brb | 20:21 | |
dhellmann | fungi : ok. I realized this weekend that we can't use the same job to release stable versions of heat that we will use for master, because the dist name doesn't match what we own on pypi | 20:21 |
dhellmann | so I need a way to run different release jobs on older stable branches of heat | 20:21 |
dhellmann | unless we decide to backport the dist name change, but that seems like it would be against our stable policy | 20:22 |
fungi | dhellmann: if i (or someone) gets a chance to write up documentation and release notes for 578557 it would also solve that challenge i think? | 20:22 |
dhellmann | yeah, that would also help | 20:22 |
clarkb | infra-root I'm going to reboot the mirror in packethost. John thought that they had resolved many of the problems in packethost. I think we check if the mirror is still running tomorrow (give it ~24 hours), then if it is running set max-servers to say 10 and go from there | 20:23 |
*** jtomasek has quit IRC | 20:23 | |
fungi | when i asked in #zuul last week, tobiash said he's already been running with that feature on bmw's zuul | 20:23 |
dhellmann | fungi : is that patch just missing a release note? | 20:23 |
fungi | dhellmann: release note and documentation of the behavior, yes | 20:23 |
dhellmann | ok | 20:23 |
clarkb | actually no because the mirror doesn't seem to exist at all anymore | 20:24 |
clarkb | interesting | 20:24 |
clarkb | I guess we'll have to build a new mirror there | 20:24 |
fungi | clarkb: oof, accidentally deleted i guess? | 20:24 |
fungi | (or the whole deployment was wiped maybe) | 20:24 |
clarkb | fungi: well they had talked about database improvements. I wonder if that included starting with a fresh db | 20:25 |
dmsimard | clarkb: I'm not sure if it's related but fwiw packet.com is down | 20:25 |
openstackgerrit | Merged openstack-infra/project-config master: Update static.o.o publishing https://review.openstack.org/615501 | 20:25 |
clarkb | dmsimard: shouldn't be, the openstack control plane is independent of the packet host dashboard. We just run on their baremetal instances (and I can talk to the api, it's just listing no hosts) | 20:26 |
dmsimard | ack | 20:26 |
clarkb | http://logs.openstack.org/33/615633/1/check/nova-live-migration/c3ac608/job-output.txt#_2018-11-05_19_54_44_188300 adds confusion to the networking issue in rax-ord | 20:30 |
clarkb | looks like we get the wrong host key for one git push, then it's fine on a subsequent one | 20:31 |
clarkb | which probably does lend some weight to the idea that it's two hosts fighting over the same IP | 20:31 |
clarkb | anyone know if we can have ansible log additional ssh remote data? | 20:36 |
clarkb | things like sshd version, the anticipated host key vs the one received, ip address (just to double check), etc? | 20:37 |
clarkb | oh neat, fact gathering has some of the expected data | 20:37 |
*** e0ne has quit IRC | 20:45 | |
clarkb | ok I've confirmed that the host key as reported by facts is different after sha256 fingerprinting it with ssh-keygen. I've also checked the ip address matches up with the one in our inventory | 20:47 |
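A hedged sketch of that comparison: ansible_ssh_host_key_ed25519_public is a standard setup-module fact, while the inventory host name and key text here are placeholders.

```shell
# pull the host key Ansible's fact gathering recorded for the node
ansible ubuntu-xenial-rax-ord-0000321302 -m setup \
  -a 'filter=ansible_ssh_host_key_ed25519_public'

# sha256-fingerprint it for comparison with the key ssh actually saw
echo "ssh-ed25519 AAAA...placeholder..." | ssh-keygen -lf -
```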
fungi | was this the same ip address as the previous problem report, by any chance? | 20:48 |
clarkb | this particular IP address shows up in ~12 failures due to this | 20:48 |
clarkb | fungi: ya I think so. 104.130.222.138 | 20:48 |
clarkb | so ya I wonder if rax lost track of that particular IP in ord | 20:49 |
*** rlandy|brb is now known as rlandy | 20:49 | |
* clarkb double checks against hogepodge's instance | 20:49 |
clarkb | hogepodge's ip addr is different | 20:49 |
fungi | ahh | 20:49 |
clarkb | 104.130.216.85 | 20:49 |
clarkb | could be a small number of leaked/lost IPs? | 20:49 |
fungi | trying to communicate this to fanatical support without the assistance of cloudnull is likely to be challenging | 20:50 |
clarkb | ya... | 20:50 |
cloudnull | ^ present | 20:50 |
mordred | yay it's cloudnull! | 20:50 |
cloudnull | o/ | 20:50 |
fungi | and he appears in a puff of awesome | 20:50 |
clarkb | cloudnull: oh hey. So we think that maybe there are duplicate IPs in rax (ord in particular but haven't checked other regions) | 20:50 |
* cloudnull first day back from much required holiday | 20:50 | |
cloudnull | oh thats all bad | 20:51 |
mordred | cloudnull: welcome back! | 20:51 |
cloudnull | thanks! | 20:51 |
clarkb | cloudnull: http://logs.openstack.org/33/615633/1/check/nova-live-migration/c3ac608/job-output.txt#_2018-11-05_19_54_44_188300 is the symptom (notice that we push to secondary before and after that error successfully) | 20:51 |
fungi | yeah, we've seen other cloud providers "lose track" of virtual machines from time to time and exhibit this exact behavior | 20:51 |
clarkb | cloudnull: and if I look up the error message with that IP address I find ~12 cases where this particular IP address has exhibited this in the last week | 20:51 |
fungi | and then you end up with arp overwrites in the routers giving you a toss-up as to whether you end up communicating with your vm or the ghost | 20:52 |
cloudnull | do we have a rax ticket to go wave around ? | 20:52 |
clarkb | cloudnull: not yet, I've only really just sat down to dig into what data we do have | 20:52 |
clarkb | I can open one if that will help | 20:52 |
cloudnull | ok. | 20:52 |
*** gfidente is now known as gfidente|afk | 20:52 | |
cloudnull | if you have a moment. i will ping some internal folks while i'm here | 20:53 |
clarkb | ya I'll be around. Just tell me what I should do next :) | 20:53 |
clarkb | or do you mean file the ticket if I have a moment? | 20:53 |
* clarkb digs up uuid for this instance | 20:53 | |
clarkb | de6e6777-f4bf-4fb6-a6ee-ffc1cc1ee2cb is the instance uuid for that particular case | 20:54 |
fungi | it's one of those sorts of issues where if we just go through normal ticket reporting it's going to take first tier support forever to determine that we're not crazy and escalate it to someone who can check whether there are lost instances squatting those ip addresses | 20:54 |
cloudnull | clarkb yea, if you could file a ticket it'd be great just so I can go wave it around at people to make them fix it faster. | 20:54 |
clarkb | cloudnull: ok I'll work on that now | 20:54 |
* cloudnull is already causing a ruckus in their chat channels | 20:55 |
fungi | at least in the previous cases we've seen, the nova api is just going to confirm to the operator that there's nothing there | 20:55 |
fungi | end up needing to track it through the network gear to a particular host and then use virsh or something to find the running vm | 20:56 |
cloudnull | So "104.130.222.138" is the troublesome IP | 20:58 |
clarkb | cloudnull: ya, one of them at least. Sorry, working to file the ticket, but logstash may have more data for us | 20:58 |
clarkb | cloudnull: 104.130.216.85 is one that hogepodge identified yesterday | 20:58 |
hogepodge | Thank you cloudnull and clarkb ! | 20:59 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Merger: automatically add new hosts to the known_hosts file https://review.openstack.org/608453 | 21:03 |
clarkb | cloudnull: #181105-ord-0000889 | 21:04 |
ianw | fungi: did you see my comments on the insecure: true stuff? | 21:07 |
*** shrasool has joined #openstack-infra | 21:08 | |
ianw | fungi: btw that had me so confused i had to git log & code read openstacksdk :) https://review.openstack.org/#/c/615512/ | 21:08 |
clarkb | cloudnull: `message:"ED25519 host key for 104.130.222.138 has changed and you have requested strict checking." AND (tags:"console.html" OR tags:"job-output.txt")` is a logstash query (http://logstash.openstack.org) that will show you instances for that particular IP address | 21:08 |
clarkb | cloudnull: you can search back 10 days currently | 21:09 |
openstackgerrit | Colleen Murphy proposed openstack-infra/puppet-pip master: [debug] Fix openstack_pip provider for pip 18 https://review.openstack.org/606021 | 21:12 |
fungi | ianw: yep, did you see my reply comment on the change? | 21:12 |
*** jamesmcarthur has quit IRC | 21:13 | |
*** maciejjozefczyk has joined #openstack-infra | 21:16 | |
clarkb | cloudnull: just let us know if there is any other info we should dig up. I'll be around all day (though I need lunch right this moment) | 21:16 |
*** eernst has joined #openstack-infra | 21:16 | |
cloudnull | from the folks in pub cloud "so there is totally a rogue VM for that ip" -- "i'm cleaning it up now" | 21:17 |
fungi | heh | 21:17 |
fungi | we suspect there are other affected addresses as well | 21:17 |
ianw | fungi: ahhh, sorry yes now reloaded :) great, i thought it would be something like that but ran out of time to look, thanks | 21:18 |
clarkb | ya we'd need to go digging through logstash to find them though | 21:18 |
*** eernst_ has joined #openstack-infra | 21:18 | |
clarkb | fungi: any chance you are in a spot to do that now using some variant of my query above? | 21:18 |
fungi | i have a few minutes to try, yes. i'll see if i come up with anything | 21:18 |
fungi | i suppose i should limit it to the last 24 hours in case some have already been cleaned up | 21:20 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for eavesdrop.o.o https://review.openstack.org/590048 | 21:21 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on future parser for lists.katacontainers.io https://review.openstack.org/602380 | 21:21 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for lists.openstack.org https://review.openstack.org/615656 | 21:21 |
*** maciejjozefczyk has quit IRC | 21:21 | |
*** eernst has quit IRC | 21:21 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Don't do live streaming in loops https://review.openstack.org/615657 | 21:21 |
fungi | NOT message:"104.130.222.138" AND message:"has changed and you have requested strict checking." AND (tags:"console.html" OR tags:"job-output.txt") | 21:22 |
fungi | that gets me a few hits in the past day | 21:22 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for wiki-dev.openstack.org https://review.openstack.org/615658 | 21:22 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for wiki.openstack.org https://review.openstack.org/615659 | 21:22 |
cloudnull | the rogue VM has been dealt with | 21:22 |
*** eernst_ has quit IRC | 21:23 | |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for logstash.openstack.org https://review.openstack.org/615660 | 21:24 |
fungi | clarkb: cloudnull: looks like that query turns up similar collisions for 162.242.218.218 and 104.130.217.169 in rackspace | 21:25 |
fungi | also 158.69.64.67 which is in ovh, not rackspace | 21:25 |
fungi | hogepodge: i'm guessing the fact that a 5-node job is 5x as likely to hit a contended ip address makes this show up a lot more for openstack-helm testing | 21:27 |
coreycb | AJaeger: hi, can you comment on this where I've posed a question to @ajaeger? Thanks in advance. https://review.openstack.org/#/c/610708/5/goals/stein/python3-first.rst@45 | 21:27 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for subunit workers https://review.openstack.org/615661 | 21:29 |
cloudnull | fungi folks are dealing with those IPs now too | 21:30 |
openstackgerrit | Douglas Mendizábal proposed openstack-infra/irc-meetings master: Change Barbican meeting time for DST ending https://review.openstack.org/615662 | 21:30 |
fungi | cloudnull: awesome, thanks!!! | 21:30 |
fungi | cloudnull: i only queried back a day in case older addresses in that situation were already dealt with by the operators, but we should likely keep an eye out for more hits of a similar nature in our logs | 21:31 |
fungi | clarkb: ^ | 21:31 |
*** jamesmcarthur has joined #openstack-infra | 21:31 | |
*** kaiokmo has quit IRC | 21:35 | |
hogepodge | fungi: that's why I always buy five lottery tickets. ;-) | 21:36 |
clarkb | cloudnull: thank you! | 21:39 |
*** shrasool has quit IRC | 21:39 | |
*** markmcclain has quit IRC | 21:42 | |
*** adam_g has quit IRC | 21:43 | |
*** dhellmann has quit IRC | 21:44 | |
hogepodge | clarkb: cloudnull: so if it's something we're seeing in multiple public clouds, it sounds like it might be an upstream bug | 21:45 |
clarkb | hogepodge: it could be | 21:45 |
*** dhellman_ has joined #openstack-infra | 21:49 | |
*** dhellman_ is now known as dhellmann | 21:49 | |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for elasticsearch.openstack.org https://review.openstack.org/615665 | 21:53 |
*** ansmith has quit IRC | 21:54 | |
*** agopi|off has joined #openstack-infra | 21:58 | |
*** e0ne has joined #openstack-infra | 22:02 | |
openstackgerrit | Merged openstack-infra/irc-meetings master: Change Barbican meeting time for DST ending https://review.openstack.org/615662 | 22:04 |
clarkb | http://status.openstack.org/elastic-recheck/#1384373 is the e-r bug to follow on the ip reuse thing | 22:05 |
clarkb | we should see that number fall in theory | 22:05 |
*** AJaeger_ has joined #openstack-infra | 22:05 | |
clarkb | cloudnull: ^ fyi if you want to follow along using our data tracking | 22:05 |
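elastic-recheck tracks a bug by matching a logstash query kept in a per-bug YAML file; a hedged sketch of what the definition for 1384373 presumably looks like, reusing the query text from earlier in this log (the file path is an assumption).

```yaml
# queries/1384373.yaml
query: >
  message:"has changed and you have requested strict checking."
  AND (tags:"console.html" OR tags:"job-output.txt")
```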
*** e0ne has quit IRC | 22:05 | |
*** AJaeger has quit IRC | 22:07 | |
*** gfidente|afk has quit IRC | 22:08 | |
*** trown is now known as trown|outtypewww | 22:08 | |
fungi | excluding those other three addresses and extending the query to 2 days gets no additional hits | 22:08 |
fungi | 7 days gets me some others | 22:09 |
fungi | 104.130.207.161 and 104.130.216.201 in rackspace | 22:10 |
fungi | 213.32.77.33 in ovh | 22:10 |
fungi | none of those were seen in the past 48 hours though | 22:11 |
fungi | also 213.32.73.193 in ovh | 22:12 |
*** bobh has quit IRC | 22:12 | |
fungi | 172.99.69.23 in rackspace | 22:13 |
fungi | 149.202.161.227 in ovh | 22:13 |
fungi | that's all of the others for the past week | 22:14 |
fungi | cloudnull: so you might see if they also know about (or have already cleaned up) 104.130.207.161, 104.130.216.201 and 172.99.69.23 | 22:14 |
fungi | ianw: can you maybe give amorin a heads up later in your day about 158.69.64.67, 213.32.77.33, 213.32.73.193 and 149.202.161.227? | 22:15 |
*** adam_g has joined #openstack-infra | 22:17 | |
fungi | and with that, i need to disappear for dinner. back in a while | 22:18 |
*** dhellmann_ has joined #openstack-infra | 22:18 | |
*** dhellmann has quit IRC | 22:18 | |
*** bobh has joined #openstack-infra | 22:19 | |
clarkb | fungi: thank you for digging those up | 22:20 |
*** dhellmann_ is now known as dhellmann | 22:20 | |
*** rockyg has quit IRC | 22:23 | |
*** bobh has quit IRC | 22:24 | |
*** bobh has joined #openstack-infra | 22:25 | |
ianw | fungi: will do | 22:26 |
*** felipemonteiro has joined #openstack-infra | 22:28 | |
*** bobh has quit IRC | 22:29 | |
*** munimeha1 has quit IRC | 22:42 | |
clarkb | ianw: http://logs.openstack.org/07/612307/6/gate/tripleo-ci-centos-7-undercloud-containers/edecfaf/job-output.txt.gz#_2018-11-02_12_57_33_229120 name resolution errors on centos-7 now too? | 22:42 |
clarkb | oh actually hrm. That failed in bhs1, where the openstack apis think we have ipv6 but the instances don't know about it (not in metadata or config drive). I wonder if we misconfigure unbound there? | 22:43 |
clarkb | nope, the ara report shows we don't assume the host has ipv6 there and only set ipv4 resolvers | 22:44 |
ianw | hmmm | 22:46 |
*** jcoufal has quit IRC | 22:46 | |
clarkb | it is interesting that this seems to affect red hat distros more significantly than suse or debian or ubuntu | 22:47 |
* clarkb asks logstash if this is still true | 22:47 | |
ianw | a fix for that issue (preferring ipv6 over ipv4 even when there's no ipv6) got pushed into all fedora unbound packages | 22:47 |
ianw | i was thinking of reverting our fix | 22:47 |
clarkb | in the case of ovh there is ipv6 but it must be statically configured from data retrieved through the nova/neutron apis. You can't see it from the instance metadata directly | 22:48 |
clarkb | so those are ipv4 only clouds currently with glean | 22:48 |
clarkb | and the failure happens after we should've written an ipv4 only config for unbound | 22:49 |
clarkb | ok, it's not centos-only, says logstash | 22:49 |
clarkb | xenial and bionic show up a bit too | 22:49 |
clarkb | it happens in gra1 the majority of the time, with bhs1 following with a significant portion too (then a long tail) | 22:50 |
clarkb | maybe we are having problems getting to opendns from ovh? | 22:50 |
clarkb | infra-root ^ what do we think about replacing opendns with cloudflare dns (1.1.1.1) | 22:51 |
ianw | yeah, i mean it could also be transient ... maybe we should put a pause and loop in there? | 22:51 |
clarkb | ianw: ya we could also try a few times to see | 22:51 |
ianw | all things being equal, maybe we should start with that and keep the resolvers fixed for now, and if we still see timeouts after even a couple of loops, well, there are bigger issues? | 22:52 |
clarkb | sounds good. I mention cloudflare because they have massive distribution and scale (so, like google, they shouldn't have many outages) | 22:53 |
clarkb | whereas opendns has been acquired and who knows anymore | 22:53 |
ianw | ok, i'll look at adding a loop as a first thing | 22:53 |
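A hedged sketch of the retry loop ianw is proposing, not the eventual patch; the probe target and retry counts are illustrative.

```yaml
- name: Check that DNS resolution works before continuing
  command: host -W 5 git.openstack.org
  register: dns_result
  until: dns_result.rc == 0
  retries: 3
  delay: 5
```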
ianw | it was one year ago we were watching the horse race at the sydney summit today ... time flies | 22:55 |
*** agopi|off is now known as agopi | 22:55 | |
clarkb | I think my horse came in dead last | 22:55 |
clarkb | I know how to pick them | 22:56 |
clarkb | ianw: iirc unbound won't retry a failed lookup against a different forwarder, but it will round-robin the next request | 22:58 |
clarkb | ianw: we should be careful that retries don't suddenly work on the second attempt because the other dns server was used, then fail later in the jobs. (that would be a good indication we should change providers though) | 22:58 |
clarkb | https://system.opendns.com/ indicates opendns should've been fine though | 23:00 |
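A hedged sketch of the unbound forwarder configuration under discussion, using the public opendns and cloudflare resolver addresses. As clarkb notes, unbound spreads new queries across forward-addr entries but won't retry a single failed lookup against a different one.

```
# /etc/unbound/unbound.conf (excerpt)
forward-zone:
    name: "."
    forward-addr: 208.67.222.222   # opendns
    forward-addr: 1.1.1.1          # cloudflare (proposed swap)
```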
ianw | hrm, true; if we could grab some of the unbound log file it would be good too | 23:00 |
ianw | dmsimard: this feels like the sort of attachments or artifacts thing which i think ara can display? | 23:02 |
*** florianf has quit IRC | 23:05 | |
*** felipemonteiro has quit IRC | 23:07 | |
*** kjackal has quit IRC | 23:08 | |
*** lbragstad has quit IRC | 23:09 | |
*** lbragstad has joined #openstack-infra | 23:10 | |
*** sthussey has quit IRC | 23:11 | |
clarkb | ianw: the upside is that those jobs are failing in pre so will be retried | 23:20 |
clarkb | prior to zuulv3 I expect many of those nodes would've been recycled by nodepool ready script checks | 23:20 |
*** mriedem has quit IRC | 23:20 | |
*** xek__ has joined #openstack-infra | 23:21 | |
*** xek_ has quit IRC | 23:24 | |
clarkb | ianw: also idea: on top of collecting unbound logs maybe we can track the failed names and backend servers? we might learn that github.com with a 30 second ttl fails a lot more than git.o.o with an hour-long ttl (or the opposite) | 23:28 |
*** adriancz has quit IRC | 23:28 | |
clarkb | there are potentially things we can do with our dns records to alleviate some of the pain there | 23:28 |
ianw | hrm you mean retry with a different server? | 23:32 |
ianw | i mean target DNS name | 23:33 |
clarkb | more, that we can change ttls and potentially dns hosting | 23:33 |
clarkb | if we find our dns is particularly unhappy for some reason | 23:33 |
clarkb | infra-root re packethost there appear to be a bunch of floating IPs used (and possibly leaked). I think john's test nodepool may have done that? I am unsure, but that makes me think we don't want to boot a new mirror just yet | 23:36 |
clarkb | looks like most of the /25 is allocated | 23:36 |
clarkb | but not attached | 23:36 |
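A hedged sketch of how one might spot the allocated-but-unattached floating IPs clarkb describes, with plain openstackclient; the cloud name is illustrative.

```shell
# list floating IPs whose Port column is None, i.e. allocated but
# not attached to anything (candidates for leaked addresses)
openstack --os-cloud packethost floating ip list \
  -f value -c "Floating IP Address" -c "Port" | awk '$2 == "None"'
```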
*** jamesmcarthur has quit IRC | 23:40 | |
*** jamesmcarthur has joined #openstack-infra | 23:41 | |
*** jamesmcarthur has quit IRC | 23:43 | |
*** kgiusti has left #openstack-infra | 23:43 |