*** hamzy has joined #oooq | 00:14 | |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-scenario012-standalone, tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-scenario008-multinode-oooq-container @ https://review.openstack.org/604298, stable/pike: tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039 @ https://review.openstack.org/602248, master: tripleo-ci-centos-7-ovb- (2 more messages) | 00:36 |
---|---|---|
*** apetrich has quit IRC | 01:57 | |
*** aakarsh has joined #oooq | 02:09 | |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-scenario012-standalone, tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-scenario008-multinode-oooq-container @ https://review.openstack.org/604298, stable/pike: tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039 @ https://review.openstack.org/602248, master: tripleo-ci-centos-7-ovb- (2 more messages) | 02:36 |
*** rlandy|ruck|bbl is now known as rlandy|ruck | 02:42 | |
*** rlandy|ruck has quit IRC | 02:49 | |
*** irclogbot_3 has quit IRC | 03:01 | |
*** irclogbot_3 has joined #oooq | 03:01 | |
*** ykarel has joined #oooq | 03:24 | |
*** raukadah is now known as chandankumar | 03:25 | |
ykarel | weshay, did the issue related to buildah get cleared? | 03:43 |
ykarel | let me know if i can check something | 03:43 |
weshay | ykarel I don't think it's triggered, haven't really looked yet.. was planning on doing it in the morning | 03:44 |
ykarel | weshay, i see u fired test project with those repos | 03:45 |
ykarel | i can see jobs here:- https://trunk-primary.rdoproject.org/api-centos-stein/api/civotes_detail.html?commit_hash=31a3eed8143f9ab8ecc8bc6123df4c1f5e45f826&distro_hash=5db0bc146d484e09fcc55872c92861109113529d | 03:45 |
ykarel | both with and without that patch ^^ | 03:45 |
ykarel | and seems same result | 03:45 |
weshay | yup 2019-04-12 00:02:35.871 14873 ERROR tripleoclient.v1.tripleo_deploy.Deploy [ ] Exception: Not found image: docker://trunk.registry.rdoproject.org/tripleomaster/centos-binary-tempest:586361584d545dd76d055874284fdf70c730a476_4bfa3685 | 03:46 |
weshay | ykarel aye | 03:48 |
weshay | and the rpms are on the undercloud | 03:48 |
weshay | ykarel I'll email folks w/ you on it | 03:49 |
*** ykarel_ has joined #oooq | 03:51 | |
*** ykarel has quit IRC | 03:53 | |
*** ykarel has joined #oooq | 03:54 | |
ykarel | weshay, /me got disconnected, network issue | 03:54 |
weshay | ykarel emailed u | 03:55 |
ykarel | okk just saw, merging the revert and will see the result in some time | 03:55 |
ykarel | next periodic run in around 1 hour | 03:56 |
weshay | ykarel it didn't merge | 03:56 |
*** ykarel_ has quit IRC | 03:56 | |
weshay | I don't have merge rights | 03:56 |
ykarel | weshay, i just +W | 03:56 |
weshay | ah there we go | 03:56 |
weshay | ) | 03:56 |
weshay | :) | 03:56 |
ykarel | but still it's confusing jobs failing at different places, strange is standalones are passing, need to look more what's the issue | 04:02 |
weshay | ykarel the tenant was maxed out | 04:02 |
weshay | ykarel fixed that | 04:02 |
weshay | should be clean | 04:03 |
weshay | ykarel https://review.openstack.org/#/c/651964/1 | 04:03 |
weshay | I merged that | 04:03 |
weshay | but will take a few hours to land | 04:03 |
ykarel | okk | 04:03 |
weshay | workflowed rather | 04:03 |
ykarel | that should not hurt the jobs i think | 04:03 |
weshay | ykarel it could.. every other podman/buildah update has :) | 04:04 |
ykarel | weshay, hmm possible but for last some time the updates were not that bad that used to be before | 04:04 |
weshay | ya.. but fool me once, twice .. you know the drill | 04:05 |
weshay | ykarel /me going to bed | 04:05 |
ykarel | but will get more clear picture with next run | 04:05 |
weshay | see you later man.. always thanks for the help! | 04:05 |
ykarel | weshay, ack good night | 04:05 |
*** udesale has joined #oooq | 04:13 | |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001, tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035, tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-scenario012-standalone, tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-scenario008-multinode-oooq-container @ https://review.openstack.org/604298, stable/pike: tripleo-ci-centos-7-ovb- (2 more messages) | 04:36 |
*** marios has joined #oooq | 05:15 | |
*** ykarel has quit IRC | 05:16 | |
*** ykarel has joined #oooq | 05:29 | |
*** ykarel_ has joined #oooq | 05:34 | |
*** ykarel has quit IRC | 05:36 | |
*** jtomasek has joined #oooq | 05:39 | |
*** ykarel__ has joined #oooq | 05:46 | |
*** ykarel_ has quit IRC | 05:48 | |
*** quiquell has joined #oooq | 05:50 | |
*** quiquell is now known as quiquell|rover | 05:50 | |
*** panda has joined #oooq | 05:54 | |
*** panda has quit IRC | 05:55 | |
*** marios has quit IRC | 05:57 | |
*** marios has joined #oooq | 05:58 | |
*** jfrancoa has joined #oooq | 06:06 | |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001, tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035, tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-scenario012-standalone, tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-scenario008-multinode-oooq-container @ https://review.openstack.org/604298, stable/pike: tripleo-ci-centos-7-ovb- (3 more messages) | 06:36 |
*** quiquell|rover is now known as quique|rover|brb | 06:44 | |
*** ykarel_ has joined #oooq | 06:49 | |
*** ykarel__ has quit IRC | 06:51 | |
*** holser_ has joined #oooq | 06:59 | |
*** skramaja has joined #oooq | 06:59 | |
*** hamzy has quit IRC | 07:05 | |
*** hamzy has joined #oooq | 07:05 | |
*** aakarsh has quit IRC | 07:05 | |
*** quique|rover|brb is now known as quiquell|rover | 07:08 | |
*** panda has joined #oooq | 07:12 | |
*** ykarel_ is now known as ykarel | 07:16 | |
*** panda has quit IRC | 07:16 | |
*** amoralej|off is now known as amoralej | 07:21 | |
*** panda has joined #oooq | 07:29 | |
zbr | who can join me on bj for a 15min presentation on using pytest to validate release config files? marios, quiquell|rover arxcruz ? | 07:39 |
*** jfrancoa has quit IRC | 07:40 | |
marios | zbr: o/ when? | 07:40 |
zbr | marios: 9am BST (in 20min) sounds ok? | 07:41 |
marios | zbr: k | 07:41 |
*** apetrich has joined #oooq | 07:42 | |
*** tosky has joined #oooq | 07:43 | |
quiquell|rover | zbr: CI promotion pipeline is not happy | 07:44 |
*** jpena|off is now known as jpena | 07:47 | |
zbr | quiquell|rover: anything i can help with? i see that gate-check failed due to resources. probably the stack cleanup did not work? | 07:49 |
quiquell|rover | I am into it | 07:49 |
*** ykarel is now known as ykarel|lunch | 07:52 | |
*** ccamacho has joined #oooq | 07:53 | |
*** jfrancoa has joined #oooq | 07:58 | |
zbr | https://bluejeans.com/2655417928 | 08:01 |
*** dtantsur|afk is now known as dtantsur | 08:01 | |
quiquell|rover | chandankumar: do you have some brain cycles for me ? | 08:02 |
chandankumar | quiquell|rover: yes | 08:02 |
quiquell|rover | I am super bad at discovering what tempest is trying to do | 08:02 |
chandankumar | quiquell|rover: where? | 08:02 |
quiquell|rover | chandankumar: http://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-standalone-master/73a08de/logs/tempest.html.gz | 08:02 |
quiquell|rover | Failed to establish authenticated ssh connection to cirros@192.168.24.113 (Error reading SSH protocol banner). Number attempts: 10. Retry after 11 seconds. | 08:02 |
chandankumar | quiquell|rover: checking | 08:03 |
zbr | https://review.openstack.org/#/c/649965/ | 08:03 |
quiquell|rover | chandankumar: this is a cirros instance that tempest is trying to startup ? | 08:04 |
quiquell|rover | chandankumar: and then try to access it ? | 08:04 |
chandankumar | quiquell|rover: https://github.com/openstack/tempest/blob/master/tempest/scenario/test_minimum_basic.py#L145 | 08:06 |
chandankumar | quiquell|rover: before rebooting it is trying to ssh into instance | 08:06 |
quiquell|rover | so it adds the security group to open port 22 there | 08:07 |
quiquell|rover | and try to access | 08:07 |
quiquell|rover | this is an instance it has startup over the oc installed by the standalone | 08:07 |
quiquell|rover | this is it ? | 08:07 |
quiquell|rover | and also the log there is the console log of the instance | 08:09 |
quiquell|rover | ok | 08:09 |
quiquell|rover | thanks ! | 08:09 |
chandankumar | quiquell|rover: yes it add the security group | 08:10 |
chandankumar | in this test: create glance image, create keypair, create server, create a volume, attach it then detach it, then add a floating ip to it | 08:11 |
chandankumar | and retrieve it, then it tries to ssh into it | 08:11 |
*** amoralej is now known as amoralej|mtg | 08:13 | |
quiquell|rover | test_minimum_basic_scenario | 08:15 |
quiquell|rover | I don't see this test in a passing one ? | 08:15 |
quiquell|rover | Have we activated it now ? | 08:15 |
quiquell|rover | Ahh no, nothing | 08:16 |
quiquell|rover | Is there | 08:16 |
chandankumar | quiquell|rover: https://github.com/openstack/tripleo-quickstart/blob/master/config/general_config/featureset052.yml#L36 it is running on standalone it should be running | 08:20 |
chandankumar | I am trying to grep in log why ssh is timing out | 08:20 |
quiquell|rover | chandankumar: looks like Error reading SSH protocol banner happens when you can do a ssh connection | 08:20 |
quiquell|rover | chandankumar: but ssh server is not responding | 08:20 |
quiquell|rover | chandankumar: do we have logs of those instances ? | 08:21 |
quiquell|rover | chandankumar: at the end if the ssh logs of the instance created in the overcloud | 08:21 |
chandankumar | failed to get http://169.254.169.254/2009-04-04/user-data | 08:22 |
chandankumar | warning: no ec2 metadata for user-data | 08:22 |
chandankumar | found datasource (ec2, net) | 08:22 |
chandankumar | Top of dropbear init script | 08:22 |
chandankumar | Starting dropbear sshd: WARN: generating key of type ecdsa failed! | 08:22 |
chandankumar | I am not sure it is related | 08:22 |
quiquell|rover | chandankumar: nope, it is the same on a successful one | 08:22 |
quiquell|rover | chandankumar: there is nothing especial at the instance console :-/ | 08:22 |
quiquell|rover | chandankumar: na, don't worry I am going to recheck the job | 08:24 |
*** dsneddon has quit IRC | 08:26 | |
chandankumar | quiquell|rover: one more thing, yesterday we ported jobs to os_tempest https://github.com/rdo-infra/rdo-jobs/blob/master/zuul.d/standalone-jobs.yaml#L6 | 08:26 |
chandankumar | it is still running validate-tempest role | 08:26 |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001, tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035, tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-scenario012-standalone, tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-scenario008-multinode-oooq-container @ https://review.openstack.org/604298, stable/pike: tripleo-ci-centos-7-ovb- (3 more messages) | 08:36 |
marios | thanks zbr | 08:37 |
marios | we | 08:37 |
marios | r | 08:37 |
marios | soren | 08:37 |
zbr | thanks, for the patience | 08:38 |
*** derekh has joined #oooq | 08:39 | |
zbr | marios: panda : re just enabling the html report, that one is already ready for tripleo-ci : https://review.openstack.org/#/c/651910/1 -- you can click the openstack-tox-py* links to check it. | 08:41 |
quiquell|rover | marios, panda, zbr, chandankumar: to make f28 standalone voting https://review.openstack.org/#/c/651230/ | 08:43 |
quiquell|rover | It's passing now | 08:43 |
quiquell|rover | http://zuul.openstack.org/builds?job_name=tripleo-ci-fedora-28-standalone | 08:43 |
quiquell|rover | also uc upgrade https://review.openstack.org/#/c/651218 | 08:44 |
chandankumar | sshnaidm|off: https://review.openstack.org/#/c/639324/ this might run tls tests | 08:46 |
marios | quiquell|rover: ack needs to go gate too? | 08:47 |
marios | quiquell|rover: (commented) | 08:47 |
quiquell|rover | everything that is voting have to be in the gates I think | 08:48 |
quiquell|rover | Since the time window to go to gate can break the jobs too | 08:48 |
marios | quiquell|rover: right thats what i'm saying :) | 08:49 |
quiquell|rover | fixing | 08:50 |
quiquell|rover | marios: https://review.openstack.org/651230 | 08:54 |
quiquell|rover | zbr: can you revote ? https://review.openstack.org/651230 | 08:55 |
marios | quiquell|rover: voted. btw are you using a config file to add reviewers? maybe consider removing matt young ... poor matt is still being spammed by you | 08:56 |
quiquell|rover | I have matt there ? | 08:58 |
quiquell|rover | I was also the gmail :-/ | 08:58 |
quiquell|rover | poor man | 08:58 |
*** dtantsur is now known as dtantsur|brb | 08:59 | |
*** dsneddon has joined #oooq | 09:06 | |
*** ykarel|lunch is now known as ykarel | 09:21 | |
zbr | quiquell|rover: is bringing back the voting job to the people! hurrah! ;) | 09:27 |
quiquell|rover | zbr: well we are trying it here https://bugs.launchpad.net/tripleo/+bug/1823901 | 09:29 |
openstack | Launchpad bug 1823901 in tripleo "move non-voting jobs back to voting where possible" [High,Triaged] | 09:29 |
ykarel | quiquell|rover, hi | 09:32 |
ykarel | quiquell|rover, good to rerun failed ovb master jobs in testproject | 09:32 |
ykarel | https://trunk-primary.rdoproject.org/api-centos-master-uc/api/civotes_detail.html?commit_hash=38f3f4c50df0f4c4f45e956744e4fc2605092306&distro_hash=6fe222ac6797ca0469c45831310cafa8c06f4a06 | 09:33 |
ykarel | ovb jobs failed with introspection, 1 standalone failed at tempest SSH | 09:33 |
quiquell|rover | ykarel: i am doing it already | 09:34 |
quiquell|rover | ykarel: the ones that are not affected by the metadata issue https://review.rdoproject.org/r/#/c/20166/ | 09:34 |
ykarel | quiquell|rover, but you missed some jobs in ^^ | 09:34 |
ykarel | there are multiple failings | 09:35 |
chandankumar | arxcruz: except plain standalone all standalone periodic scenario jobs are using os_tempest | 09:35 |
quiquell|rover | ykarel: they are related to metadata | 09:35 |
chandankumar | I am updating all in us story itself | 09:35 |
quiquell|rover | the not known ones are those | 09:35 |
ykarel | quiquell|rover, not known? | 09:35 |
ykarel | u means socket address bound | 09:35 |
quiquell|rover | bmc socket issue and tempest ssh issue are new | 09:36 |
quiquell|rover | other failures are known | 09:36 |
quiquell|rover | just those two to verify that is not transitory | 09:36 |
*** dsneddon has quit IRC | 09:37 | |
ykarel | quiquell|rover, those ^^ are also seen earlier | 09:38 |
ykarel | i think there must be bug for socket one atleast | 09:38 |
ykarel | iirc that happens when there are some stacks undeleted | 09:38 |
quiquell|rover | ykarel: the one at centos-7-standalone is known ? | 09:45 |
ykarel | quiquell|rover, seems transient and happening from time to time | 09:46 |
ykarel | not sure if there is bug already | 09:46 |
ykarel | but chandankumar may know if any | 09:46 |
chandankumar | ykarel: seeing first time | 09:47 |
chandankumar | ykarel: socket error | 09:47 |
ykarel | chandankumar, nope | 09:47 |
ykarel | tempest one | 09:47 |
ykarel | chandankumar, https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-standalone-master/73a08de/logs/tempest.html.gz | 09:47 |
chandankumar | ykarel: yes SSHException: Error reading SSH protocol banner seen first time | 09:48 |
chandankumar | waiting for next run | 09:48 |
quiquell|rover | could be that images are so small that load is too much ? | 09:48 |
quiquell|rover | and sshd does not respond ? | 09:48 |
ykarel | ack | 09:49 |
chandankumar | quiquell|rover: nope | 09:50 |
chandankumar | quiquell|rover: instance is already spun up, failed at ssh | 09:51 |
zbr | quiquell|rover: not sure if it applies here but I remember very well that about one year ago we had problems when new rhel came out, it took longer to boot and ir jobs were failing to connect because sshd took more time to start, even worse there was a small period of time where the ssh port was open but the sshd daemon was not working. retries were needed (and not around an is-port-open check). Also cloud load influenced it. | 09:51 |
quiquell|rover | zbr: it has retries there but yes the banner issue is usually this | 09:51 |
quiquell|rover | zbr: do you remember the cloud load stuff ? | 09:52 |
chandankumar | quiquell|rover: is this issue seen multiple times? | 09:54 |
zbr | quiquell|rover: in my case it was not this "Error reading SSH protocol banner" error, also I see that we retry for ~4 minutes. Not sure what to say if 4min is too much or too little, but it may be worth increasing it a little bit. | 09:54 |
chandankumar | not sure cirros-3.5 image is the issue | 09:54 |
chandankumar | we are using cirros-3.6 in os_tempest side | 09:55 |
quiquell|rover | chandankumar: I think is first time | 09:55 |
zbr | chandankumar: i think that may be the cause: Starting dropbear sshd: WARN: generating key of type ecdsa failed! | 09:55 |
quiquell|rover | chandankumar: did we update the image ? | 09:55 |
chandankumar | quiquell|rover: in devstack, we switched to 3.6 long time ago | 09:55 |
quiquell|rover | zbr: that appears at a passing job too | 09:55 |
zbr | ecdsa is key in recent ssh, this should not fail. also more interesting is the next line after that. | 09:56 |
quiquell|rover | zbr: let me find the passing job | 09:56 |
chandankumar | quiquell|rover: zbr https://github.com/openstack/openstack-ansible-os_tempest/commit/81938c8e73a240e0773c812725de4e802f213b53 | 09:57 |
chandankumar | in next run I think it will use os_tempest | 09:57 |
quiquell|rover | zbr: is here too http://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-standalone-master/1cc6eea/logs/undercloud/home/zuul/tempest/tempest.log.txt.gz | 09:58 |
quiquell|rover | chandankumar: testproject has worked | 10:00 |
quiquell|rover | chandankumar, ykarel: http://logs.rdoproject.org/66/20166/1/check/periodic-tripleo-ci-centos-7-standalone-master/ddd9bad/ | 10:00 |
chandankumar | quiquell|rover: ok | 10:01 |
quiquell|rover | chandankumar: === cirros: current=0.3.5 uptime=16.84 === | 10:01 |
quiquell|rover | ____ ____ ____ | 10:01 |
quiquell|rover | / __/ __ ____ ____ / __ \/ __/ | 10:01 |
quiquell|rover | / /__ / // __// __// /_/ /\ \ | 10:01 |
quiquell|rover | \___//_//_/ /_/ \____/___/ | 10:01 |
quiquell|rover | http://cirros-cloud.net | 10:01 |
quiquell|rover | old cirros | 10:02 |
quiquell|rover | isn't this supposed to use os_tempest ? | 10:02 |
chandankumar | quiquell|rover: /me is confused | 10:02 |
chandankumar | quiquell|rover: all scenario standalone job is running ostempest | 10:02 |
quiquell|rover | chandankumar: maybe there is something missing at periodic jobs | 10:02 |
chandankumar | quiquell|rover: but why this change https://review.rdoproject.org/r/#/c/20061/ | 10:02 |
quiquell|rover | at zuul config | 10:02 |
chandankumar | is not yet picked | 10:02 |
chandankumar | it is merged 18 hours ago | 10:03 |
quiquell|rover | weird | 10:03 |
quiquell|rover | let me find | 10:03 |
quiquell|rover | use_os_tempest: true | 10:04 |
quiquell|rover | is there | 10:04 |
quiquell|rover | http://logs.rdoproject.org/66/20166/1/check/periodic-tripleo-ci-centos-7-standalone-master/ddd9bad/zuul-info/inventory.yaml | 10:04 |
chandankumar | something wrong is there | 10:04 |
quiquell|rover | Let check the output of playbooks | 10:04 |
quiquell|rover | chandankumar: it's using validate-tempest | 10:05 |
chandankumar | something wrong in job definition | 10:05 |
quiquell|rover | ok let's find | 10:05 |
quiquell|rover | maybe it's not overriding it | 10:05 |
quiquell|rover | can be | 10:06 |
quiquell|rover | that we override in featureset or the like | 10:06 |
chandankumar | quiquell|rover: if we make changes in fs it will break upstream jobs | 10:07 |
chandankumar | quiquell|rover: https://review.rdoproject.org/r/#/c/20061/4/zuul.d/standalone-jobs.yaml@18 does it cause something? | 10:08 |
quiquell|rover | 2019-04-12 09:45:09.970895 | primary | PLAY [Validate the deployment] ************************************************* | 10:08 |
quiquell|rover | 2019-04-12 09:45:09.994961 | primary | | 10:08 |
quiquell|rover | 2019-04-12 09:45:09.995150 | primary | TASK [include_tasks] *********************************************************** | 10:08 |
quiquell|rover | 2019-04-12 09:45:10.006830 | primary | Friday 12 April 2019 09:45:10 +0000 (0:00:00.081) 0:58:03.969 ********** | 10:08 |
quiquell|rover | 2019-04-12 09:45:10.034036 | primary | skipping: [undercloud] | 10:08 |
quiquell|rover | 2019-04-12 09:45:10.049693 | primary | | 10:08 |
quiquell|rover | 2019-04-12 09:45:10.049923 | primary | PLAY RECAP ********************************************************************* | 10:08 |
quiquell|rover | 2019-04-12 09:45:10.050092 | primary | undercloud : ok=106 changed=59 u | 10:08 |
quiquell|rover | it's skipping os_tempest ? | 10:08 |
ykarel | quiquell|rover, ack | 10:08 |
ykarel | panda, /me converting build-containers to role https://review.openstack.org/652027 | 10:09 |
ykarel | we need to reuse it in rdoinfo jobs | 10:09 |
*** dsneddon has joined #oooq | 10:09 | |
quiquell|rover | chandankumar: I am opening a bug for it | 10:10 |
chandankumar | quiquell|rover: sure | 10:10 |
panda | ykarel: you beat me to it by a couple of days. I probably would have needed this for the rdo on rhel job | 10:12 |
ykarel | panda, okk let's get this before so you have easy transition :) | 10:13 |
*** dsneddon has quit IRC | 10:14 | |
chandankumar | quiquell|rover: I am a big idiot, let me show how | 10:16 |
quiquell|rover | found it ? | 10:16 |
chandankumar | quiquell|rover: https://review.rdoproject.org/r/#/c/20168/ | 10:16 |
panda | ykarel: ok, waiting for the results. Looks good, but for example with the similar change for the other playbook, some variables needed to be passed in a weird way. Maybe this doesn't need it. | 10:16 |
quiquell|rover | chandankumar: https://bugs.launchpad.net/tripleo/+bug/1824512 | 10:16 |
openstack | Launchpad bug 1824512 in tripleo "periodic standalone jobs not using os_tempest" [High,Triaged] - Assigned to Quique Llorente (quiquell) | 10:16 |
quiquell|rover | Add the bug | 10:16 |
ykarel | panda, ack let's see how it goes then i iterate to fix it | 10:17 |
quiquell|rover | chandankumar: ahh yep smelt like that | 10:17 |
quiquell|rover | chandankumar: I am going to test it at my test review | 10:18 |
chandankumar | quiquell|rover: sure | 10:20 |
chandankumar | quiquell|rover: it also provided me a suggestion to add a new var in os_tempest role to just run smoke_Tests | 10:22 |
quiquell|rover | cool | 10:22 |
chandankumar | instead of putting crazy regex | 10:22 |
quiquell|rover | sshnaidm|off: Do I see a pass for centos/libvirt here ? https://review.rdoproject.org/r/#/c/20131/ | 10:25 |
zbr | panda: look at last test (expand) from http://logs.openstack.org/72/651772/6/check/openstack-tox-py27/510de77/tox/reports.html -- does it look ok to you? | 10:29 |
*** bogdando has joined #oooq | 10:35 | |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001, tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035, tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-scenario012-standalone, tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-scenario008-multinode-oooq-container @ https://review.openstack.org/604298, stable/pike: tripleo-ci-centos-7-ovb- (3 more messages) | 10:36 |
*** panda is now known as panda|lunch | 10:42 | |
*** dsneddon has joined #oooq | 10:42 | |
*** dsneddon has quit IRC | 10:47 | |
*** amoralej|mtg is now known as amoralej | 10:49 | |
*** holser_ is now known as holser|lunch | 11:01 | |
*** udesale has quit IRC | 11:05 | |
weshay | quiquell|rover when you have a minute, let's 1-1 | 11:17 |
*** dsneddon has joined #oooq | 11:17 | |
quiquell|rover | now is good | 11:17 |
quiquell|rover | going to your room | 11:17 |
weshay | zbr were you doing anything w/ polling zuul yesterday.. the infra guys were freaked out | 11:17 |
weshay | zbr I may need to chat w/ you too for a min | 11:18 |
zbr | weshay: nothing special, but I am one of the few using https://github.com/openstack/coats/blob/master/coats/openstack_gerrit_zuul_status.user.js | 11:18 |
weshay | I'll let you know | 11:18 |
zbr | sure. but as a note I didn't do anything special about zuul yesterday. | 11:19 |
weshay | quiquell|rover http://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master/6a18a34/logs/bmc-console.log | 11:21 |
*** dsneddon has quit IRC | 11:21 | |
*** jpena is now known as jpena|lunch | 11:27 | |
*** jaosorior has quit IRC | 11:33 | |
arxcruz | chandankumar: hey, so on https://tree.taiga.io/project/tripleo-ci-board/task/898 i need to do for oswin-tempest-plugin and vmware-nsx-tempest-plugin right ? | 11:34 |
arxcruz | sorry the delay on this | 11:34 |
chandankumar | arxcruz: yes | 11:35 |
chandankumar | arxcruz: first check in openstack/releases repo | 11:35 |
chandankumar | if tag is there or not | 11:35 |
chandankumar | arxcruz: thanks for taking care of that | 11:35 |
chandankumar | arxcruz: I have emailed the today's agenda feel free to add or remove from that | 11:36 |
arxcruz | chandankumar: vmware-nsx-tempest-plugin isn't in releases | 11:50 |
weshay | zbr hey you around? | 11:51 |
zbr | weshay: yep, joining now. | 11:51 |
weshay | arxcruz did you update the lp spec for tempest? | 11:51 |
weshay | tripleo lp spec | 11:52 |
*** dsneddon has joined #oooq | 11:53 | |
arxcruz | weshay: no, doing now | 11:56 |
arxcruz | weshay: do | 11:57 |
arxcruz | weshay: done | 11:57 |
weshay | arxcruz ok.. paste the review . quiquell|rover needs the same | 11:57 |
*** dsneddon has quit IRC | 11:58 | |
weshay | zbr go let clark know :) | 12:07 |
quiquell|rover | weshay, arxcruz: what's that spec ? | 12:11 |
quiquell|rover | chandankumar: merging https://review.rdoproject.org/r/#/c/20168/ it's using os_tempest alright | 12:13 |
*** skramaja has quit IRC | 12:13 | |
zbr | weshay: the API issue from yesterday seems to be unrelated to our gate-checks, likely they are related to cockpit, see http://eavesdrop.openstack.org/irclogs/%23tripleo/%23tripleo.2019-04-11.log.html#t2019-04-11T19:45:31 | 12:15 |
arxcruz | quiquell|rover: add tempest as tag in lp | 12:15 |
zbr | see https://softwarefactory-project.io/grafana/d/000000001/zuul-status?orgId=1&from=1554466523170&to=1555071323170&refresh=5s --- does anyone know who could create these spikes in API? | 12:15 |
quiquell|rover | arxcruz: so how do I do stuff like that ? | 12:16 |
quiquell|rover | arxcruz: to other tags | 12:16 |
*** rlandy has joined #oooq | 12:16 | |
chandankumar | arxcruz: checking | 12:16 |
*** rlandy is now known as rlandy|ruck | 12:16 | |
rlandy|ruck | quiquell|rover: hello :) | 12:17 |
weshay | quiquell|rover https://review.openstack.org/#/c/650283/ | 12:17 |
quiquell|rover | rlandy|ruck: o/ | 12:17 |
quiquell|rover | rlandy|ruck: containers are good now after buildah rollback | 12:17 |
chandankumar | arxcruz: vmware-nsx is also not in releases just like novajoin | 12:17 |
quiquell|rover | rlandy|ruck: there were some issues non related | 12:18 |
rlandy|ruck | weshay: re: the nova jobs bugs ... paul claims it's fixed upstream | 12:18 |
rlandy|ruck | quiquell|rover: yep - interesting day yesterday | 12:18 |
chandankumar | arxcruz: let's pin this to this commit https://github.com/openstack/vmware-nsx-tempest-plugin/commit/586361584d545dd76d055874284fdf70c730a476 | 12:18 |
* rlandy|ruck checks ruck/rover etherpad | 12:18 | |
weshay | rlandy|ruck right.. they may have added the patch to sf last night | 12:19 |
quiquell|rover | rlandy|ruck: so the issue was the new version of buildah or just buildah vs docker ? | 12:19 |
rlandy|ruck | what brought zbr into the mix? | 12:19 |
weshay | quiquell|rover the latter... buildah vs docker | 12:20 |
quiquell|rover | weshay: and what's this stuff with repodata ? | 12:20 |
rlandy|ruck | quiquell|rover: could not get containers at all in OVB jobs | 12:20 |
rlandy|ruck | see the testproject run | 12:20 |
rlandy|ruck | quiquell|rover: EmilienM had a new version of buildah and podman | 12:20 |
rlandy|ruck | we wanted to test those rpms | 12:21 |
quiquell|rover | rlandy|ruck: ack now I get it | 12:21 |
rlandy|ruck | so we could do this again if we needed to | 12:21 |
rlandy|ruck | quiquell|rover: ^^ possibly with a new patch as the old one didn't work | 12:21 |
quiquell|rover | rlandy|ruck: you mean reactivating buildah pipeline or doing a testproject ? | 12:22 |
quiquell|rover | or reproducer ? | 12:22 |
rlandy|ruck | quiquell|rover: testproject first - not another try directly in the pipeline | 12:23 |
quiquell|rover | rlandy|ruck: we will have to exercise push though | 12:23 |
rlandy|ruck | not sure what to do with this bug: https://bugs.launchpad.net/tripleo/+bug/1824388? | 12:23 |
openstack | Launchpad bug 1824388 in tripleo "periodic jobs are failing undercloud install - Not found image" [Critical,Fix released] - Assigned to Ronelle Landy (rlandy) | 12:23 |
rlandy|ruck | close it out due to revert? | 12:23 |
quiquell|rover | rlandy|ruck: I closed it | 12:23 |
weshay | rlandy|ruck no.. don't close it | 12:23 |
quiquell|rover | rlandy|ruck: is working now the pipeline | 12:24 |
quiquell|rover | rlandy|ruck: the other stuff is just continue with the transition with buildah | 12:24 |
rlandy|ruck | still 'In progress' | 12:24 |
weshay | rlandy|ruck so .. what we need to explain in the CIX card and meeting is that we should be using "buildah" but it's busted atm.. and needs to be fixed | 12:24 |
*** dsneddon has joined #oooq | 12:24 | |
rlandy|ruck | oh wrong bug | 12:24 |
quiquell|rover | I see it as Fix Released | 12:24 |
rlandy|ruck | quiquell|rover: ^^ yep - was looking at the wrong bug | 12:24 |
rlandy|ruck | hmm ... don't see the bug in production chain | 12:26 |
quiquell|rover | rlandy|ruck: me neither | 12:26 |
quiquell|rover | I am going to re-tag it | 12:26 |
rlandy|ruck | https://trello.com/c/yNNAaSqC/948-cixlp1824317tripleociproa-periodic-containers-build-fail-at-push-unauthorized-authentication-required-n | 12:26 |
quiquell|rover | to see if it appears | 12:26 |
rlandy|ruck | quiquell|rover: weshay: ^^ we still have this one in prod chain list | 12:27 |
quiquell|rover | rlandy|ruck: well thats buildah too | 12:27 |
rlandy|ruck | worst case, I'll attach the second bug to this card | 12:27 |
quiquell|rover | rlandy|ruck: it was fixed though | 12:27 |
rlandy|ruck | it will give us a space to talk about buildah | 12:27 |
quiquell|rover | rlandy|ruck: let's see if bug appears after re labeling | 12:27 |
quiquell|rover | yep | 12:27 |
rlandy|ruck | yeah if the new one shows up, I'll move the old one | 12:28 |
rlandy|ruck | not a big deal | 12:28 |
*** jpena|lunch is now known as jpena | 12:28 | |
rlandy|ruck | weshay: requesting time this afternoon to bring hw back up | 12:28 |
quiquell|rover | rlandy|ruck: about this https://bugs.launchpad.net/tripleo/+bug/1824315 | 12:29 |
openstack | Launchpad bug 1824315 in tripleo "periodic fedora28 standalone job failing at test_volume_boot_pattern" [Critical,Fix released] - Assigned to Quique Llorente (quiquell) | 12:29 |
quiquell|rover | rlandy|ruck: I have closed it, it does not appear anymore | 12:29 |
*** dsneddon has quit IRC | 12:29 | |
chandankumar | quiquell|rover: is it possible to track specific tests in sova? | 12:29 |
rlandy|ruck | quiquell|rover: it's fine - don't worry - | 12:29 |
weshay | rlandy|ruck when you have a moment to breathe, let's sync on openstack-infra heat stacks, and bm hardware | 12:29 |
rlandy|ruck | just let's leave the old card there and I'll add the new bug in that card to talk about it | 12:29 |
quiquell|rover | chandankumar: it uses regex more or less so maybe yes | 12:30 |
* chandankumar needs to check it | 12:30 | |
quiquell|rover | chandankumar: what do you really want to do/have ? | 12:31 |
rlandy|ruck | weshay: ack - will ping in a bit after checked on pidone work | 12:31 |
chandankumar | quiquell|rover: just to track how many times specific critical tempest tests failed | 12:31 |
quiquell|rover | weshay, Tengu: fix for issue with pki at reproducer https://review.rdoproject.org/r/#/c/20175/ | 12:31 |
quiquell|rover | mergy mergy | 12:31 |
quiquell|rover | chandankumar: that's interesting | 12:32 |
chandankumar | quiquell|rover: I do not want to go through elastic recheck to grep for specific tests | 12:32 |
quiquell|rover | would be nice to search by tempest test instead of job name | 12:32 |
quiquell|rover | chandankumar: is a small subset of tests ? | 12:33 |
chandankumar | it will give an idea of how many times they failed across our jobs | 12:33 |
chandankumar | quiquell|rover: yes two tests critical | 12:33 |
quiquell|rover | if they are only two we can add them to the telegraf python script | 12:33 |
quiquell|rover | so we can search by them at cockpit | 12:33 |
quiquell|rover | and write a graph | 12:33 |
chandankumar | quiquell|rover: basically these 3 tests | 12:34 |
chandankumar | quiquell|rover: https://github.com/openstack/tripleo-quickstart/blob/master/config/general_config/featureset052.yml#L35 | 12:34 |
Tengu | quiquell|rover: «o/ | 12:34 |
Tengu | lemme check that. | 12:34 |
chandankumar | need to look into stestr_html report | 12:34 |
quiquell|rover | rlandy|ruck, weshay: to make f28 standalone voting and gating https://review.openstack.org/#/c/651230/ | 12:35 |
quiquell|rover | panda|lunch: ^ | 12:35 |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053, tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-scenario008-multinode-oooq-container, tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-scenario012-standalone @ https://review.openstack.org/604298, stable/pike: tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-ovb- (2 more messages) | 12:36 |
rlandy|ruck | quiquell|rover: ack - about time | 12:36 |
rlandy|ruck | https://review.openstack.org/#/c/651010/ merged - thank you | 12:37 |
quiquell|rover | weshay: workflow f28 standalone voting/gating https://review.openstack.org/#/c/651230/ | 12:44 |
quiquell|rover | Tengu: btw since you have a big lab there | 12:45 |
Tengu | quiquell|rover: yeah? | 12:45 |
quiquell|rover | Tengu: there is a host mode for reproducer | 12:45 |
quiquell|rover | Tengu: to attack directly your big machines | 12:45 |
quiquell|rover | Tengu: only one node though | 12:45 |
Tengu | oh? | 12:46 |
Tengu | that would be nice for standalone | 12:46 |
quiquell|rover | nodepool_provider: host | 12:46 |
quiquell|rover | it will launch the job on the same machine the reproducer is on | 12:46 |
Tengu | quiquell|rover: is there some "real" doc for that reproducer thingy? The readme is a bit light imho. | 12:46 |
quiquell|rover | Tengu: we have only the README and the doc in the script | 12:47 |
quiquell|rover | Tengu: not more info | 12:47 |
Tengu | hmm | 12:48 |
Tengu | would love to get some more though :) | 12:48 |
quiquell|rover | We are just eating our own dog food now, but you are an exception | 12:48 |
Tengu | hehe | 12:48 |
Tengu | guess it would be great if I wasn't the only one though ;) | 12:48 |
quiquell|rover | Yep step by step | 12:48 |
rlandy|ruck | card commented for prod chain | 12:49 |
quiquell|rover | rlandy|ruck: I see one old stack there, is this important ? | 12:49 |
rfolco | rlandy|ruck, scen000-updates is failing on stein - http://logs.openstack.org/09/651809/1/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates/61bcb01/logs/undercloud/home/zuul/overcloud_update_prepare_containers.log.txt.gz | 12:49 |
rlandy|ruck | quiquell|rover: sorry - an old stack where? | 12:50 |
panda|lunch | quiquell|rover: learn from the mistakes of the past https://review.openstack.org/#/c/633087/ , look at my comments. | 12:50 |
rlandy|ruck | rfolco: looking | 12:50 |
quiquell|rover | rfolco: https://bugs.launchpad.net/tripleo/+bug/1824500 | 12:50 |
openstack | Launchpad bug 1824500 in tripleo "periodic stein fs037 updates tripleo-ansible-inventory: error: unrecognized arguments: --undercloud-connection" [Critical,Fix committed] - Assigned to Quique Llorente (quiquell) | 12:50 |
quiquell|rover | rlandy|ruck: ^ | 12:50 |
rlandy|ruck | yep - thought that one was familiar - thanks | 12:50 |
quiquell|rover | rfolco: TL;DR tripleo-validations RPM was built with an old commit after creating stable/stein branch | 12:51 |
*** dtantsur|brb is now known as dtantsur | 12:51 | |
quiquell|rover | np | 12:51 |
chandankumar | arxcruz: thanks for the comment on the tempest docs! | 12:51 |
rfolco | quiquell|rover, thanks | 12:51 |
quiquell|rover | panda|lunch: ack, so what do we do here ? | 12:52 |
quiquell|rover | panda|lunch: also how important is this job now ? | 12:52 |
quiquell|rover | rfolco: yw | 12:52 |
*** holser|lunch is now known as holser_ | 12:53 | |
panda|lunch | quiquell|rover: I'm looking at recent builds. I see seven failures in the last 2 days. At least two are legit. One seems to have failed for the missing containers bug. We may have a chance this time, but be ready to justify the choice with data of a stable history in your hands. | 12:56 |
quiquell|rover | panda|lunch: thanks will do, yep let's be conservative here | 12:57 |
*** dsneddon has joined #oooq | 12:58 | |
chandankumar | arxcruz: weshay os_tempest meeting time https://redhat.bluejeans.com/1571313919/6145/ | 13:00 |
*** altlogbot_0 has joined #oooq | 13:03 | |
*** dsneddon has quit IRC | 13:04 | |
rlandy|ruck | quiquell|rover: on ruck/rover etherpad ... "https://bugs.launchpad.net/tripleo/+bug/1824256 - introspection issues, is promotion blocker, and looks like ykarel found something." | 13:06 |
openstack | Launchpad bug 1824256 in tripleo "Possible network issues in rdo-cloud causing introspection failures" [Critical,In progress] - Assigned to Ronelle Landy (rlandy) | 13:06 |
rlandy|ruck | what did ykarel find? | 13:06 |
rlandy|ruck | I see your note with the link to a closed bug | 13:06 |
rlandy|ruck | there was an issue with the tenant a while back | 13:06 |
rlandy|ruck | just checking if there is something new | 13:06 |
*** aakarsh has joined #oooq | 13:07 | |
quiquell|rover | rlandy|ruck: nah red herring it was not that | 13:07 |
rlandy|ruck | k - np | 13:07 |
quiquell|rover | rlandy|ruck: didn't update the line | 13:07 |
quiquell|rover | A lot of stuff should be fixed next run | 13:07 |
rlandy|ruck | great | 13:08 |
quiquell|rover | rlandy|ruck: run reproducer ci at tripleo-ci change https://review.rdoproject.org/r/#/c/20176/ | 13:15 |
quiquell|rover | rlandy|ruck: with toci dry-run | 13:15 |
*** aakarsh|2 has joined #oooq | 13:17 | |
*** aakarsh has quit IRC | 13:20 | |
*** aakarsh|2 has quit IRC | 13:22 | |
zbr | panda|lunch: rlandy|ruck : a simple patch that enables html reporting for py* jobs: https://review.openstack.org/#/c/651910/2 | 13:23 |
quiquell|rover | zbr: we use pytest at emit releases python script too and in the reproducer unit tests | 13:25 |
quiquell|rover | zbr: maybe is nice to have html reports there too | 13:25 |
zbr | quiquell|rover: this is enabling it for emitreleases. i will do it for the other too. | 13:25 |
quiquell|rover | zbr: take a look at rdo-infra/ci-config there some there too | 13:26 |
quiquell|rover | ohhh html is super nice | 13:26 |
chandankumar | arxcruz: you are talking about this https://review.openstack.org/#/c/648121/ ? | 13:28 |
arxcruz | chandankumar: yes | 13:28 |
chandankumar | arxcruz: https://tree.taiga.io/project/tripleo-ci-board/task/917?kanban-status=1447276 | 13:29 |
chandankumar | arxcruz: it is from sprint bug backlog | 13:29 |
chandankumar | arxcruz: https://tree.taiga.io/project/tripleo-ci-board/us/960 | 13:30 |
arxcruz | chandankumar: no taiga link on the patch... | 13:30 |
chandankumar | arxcruz: we can ask him to add it | 13:30 |
*** dsneddon has joined #oooq | 13:32 | |
*** altlogbot_0 has quit IRC | 13:32 | |
*** altlogbot_0 has joined #oooq | 13:33 | |
*** dsneddon has quit IRC | 13:36 | |
*** quiquell|rover is now known as quiquell|off | 13:37 | |
*** altlogbot_0 has quit IRC | 13:38 | |
*** altlogbot_1 has joined #oooq | 13:38 | |
*** aakarsh has joined #oooq | 13:40 | |
*** aakarsh has quit IRC | 13:40 | |
*** aakarsh has joined #oooq | 13:41 | |
zbr | chandankumar arxcruz rlandy|ruck : when can I spare you 15mins to present you the pytest-html reporting? | 13:56 |
rlandy|ruck | zbr: yes | 14:00 |
rlandy|ruck | promised to make time for that | 14:00 |
*** altlogbot_1 has quit IRC | 14:00 | |
rlandy|ruck | zbr: pls ping me or book some time | 14:00 |
zbr | rlandy|ruck: let's see if the other two can, it could be easier to bundle it. | 14:01 |
rlandy|ruck | zbr: k - let me know | 14:01 |
rfolco | rlandy|ruck, can you please paste buildah bug so I can refer in my upstream patch please ? | 14:05 |
*** dsneddon has joined #oooq | 14:06 | |
rlandy|ruck | weshay: quiquell|off: panda|lunch: marios: review pls ... https://review.openstack.org/#/c/652078/ | 14:11 |
*** dsneddon has quit IRC | 14:11 | |
rlandy|ruck | rfolco, https://bugs.launchpad.net/tripleo/+bug/1824388 | 14:11 |
openstack | Launchpad bug 1824388 in tripleo "periodic jobs are failing undercloud install - Not found image" [Critical,Fix released] - Assigned to Ronelle Landy (rlandy) | 14:11 |
rlandy|ruck | weshay: k- ready to chat when you are | 14:11 |
weshay | rlandy|ruck k.. let me chat w/ you a bit later | 14:12 |
weshay | arxcruz I'm avail if you still have time before your next | 14:12 |
rlandy|ruck | k | 14:12 |
rfolco | rlandy|ruck, thx | 14:13 |
weshay | zbr do you have a local instance of the cockpit running? | 14:15 |
zbr | nope. but btw, false alarm about the zuul api, it was not the cockpit. it was me, well my firefox "upgrade". | 14:16 |
marios | rlandy|ruck: ack minor comment | 14:16 |
marios | rlandy|ruck: can revote if urgent to merge | 14:17 |
rlandy|ruck | not urgent | 14:20 |
rlandy|ruck | marios: ^^ got a workaround for pidone but this is the real fix | 14:20 |
*** altlogbot_3 has joined #oooq | 14:25 | |
*** Vorrtex has joined #oooq | 14:27 | |
*** altlogbot_3 has quit IRC | 14:29 | |
*** altlogbot_0 has joined #oooq | 14:30 | |
weshay | zbr that is so weird | 14:32 |
zbr | weshay: what? bj? | 14:33 |
weshay | zbr ur firefox | 14:33 |
*** altlogbot_0 has quit IRC | 14:33 | |
*** panda|lunch is now known as panda | 14:33 | |
zbr | yeah, i know. but that time coincided with my system reboot which included a firefox update, something quite rare. | 14:34 |
zbr | probably some combination of ff plugins made it enter an endless loop that repeated the http call. that's the only thing I can think about. | 14:35 |
*** ykarel is now known as ykarel|afk | 14:35 | |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053, tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-scenario008-multinode-oooq-container, tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-scenario012-standalone @ https://review.openstack.org/604298, stable/pike: tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-ovb- (2 more messages) | 14:36 |
rlandy|ruck | and we're back with node failures :( | 14:40 |
weshay | rlandy|ruck really? | 14:43 |
weshay | dang | 14:43 |
* weshay looks | 14:43 | |
*** dsneddon has joined #oooq | 14:45 | |
rlandy|ruck | weshay: investigating | 14:45 |
rlandy|ruck | we van talk about it later | 14:45 |
rlandy|ruck | can | 14:45 |
*** altlogbot_0 has joined #oooq | 14:48 | |
*** dsneddon has quit IRC | 14:49 | |
*** altlogbot_0 has quit IRC | 14:51 | |
*** altlogbot_2 has joined #oooq | 14:52 | |
rfolco | rlandy|ruck, I see 3 jobs failing for stable/stein now, scen8, scen9, and stdln-upgr | 14:53 |
rfolco | from these, apparently only scen9 is green for rocky and master, and failing stein only | 14:53 |
*** altlogbot_2 has quit IRC | 14:55 | |
*** altlogbot_2 has joined #oooq | 14:56 | |
*** altlogbot_2 has quit IRC | 14:56 | |
*** Goneri has joined #oooq | 14:58 | |
*** altlogbot_1 has joined #oooq | 14:58 | |
rlandy|ruck | will look in a sec | 14:58 |
*** ykarel|afk is now known as ykarel | 15:00 | |
arxcruz | weshay: i do have time now | 15:01 |
arxcruz | do you ? | 15:01 |
arxcruz | so, i'm going to the supermarket buy some stuff for apetrich tomorrow, then i'll be back :D | 15:04 |
*** dsneddon has joined #oooq | 15:04 | |
apetrich | now I'm to blame for :) | 15:04 |
arxcruz | apetrich: of course :P | 15:10 |
arxcruz | always have someone to blame :P | 15:10 |
apetrich | arxcruz, just do like everyone else and create a bug in bugzilla "mistral bug: Missed a meeting because I went to the market" | 15:11 |
*** tosky has quit IRC | 15:11 | |
*** tosky has joined #oooq | 15:12 | |
arxcruz | apetrich: cool, i'll do it :D | 15:13 |
*** jfrancoa has quit IRC | 15:19 | |
*** ccamacho has quit IRC | 15:20 | |
rlandy|ruck | weshay: hmmm ...wondering if we shouldn't stagger the periodic pipeline more so stein/master kick off one hour apart - see if that helps node failures | 15:21 |
ykarel | rlandy|ruck, so current periodic run was started manually? | 15:34 |
rlandy|ruck | ykarel: no - kicked with four hourly trigger | 15:34 |
ykarel | okk should be by cron only, ignore | 15:35 |
ykarel | yes just saw | 15:35 |
rlandy|ruck | but we see node failures often at the start | 15:35 |
rlandy|ruck | and I was wondering if we are not loading the system | 15:35 |
ykarel | it seems overloaded | 15:35 |
ykarel | rlandy|ruck, yes | 15:35 |
rlandy|ruck | so ... if we started stein/master apart, maybe we could get by | 15:35 |
ykarel | rlandy|ruck, see https://zabbix.infra.prod.eng.rdu2.redhat.com/zabbix/screens.php?elementid=224 | 15:36 |
ykarel | i see vms possible: tripleoci | ci.m1.large | 38 | 15:36 |
ykarel | few minutes back it was 20 | 15:36 |
ykarel | so ^^ could be reason for NODE_FAILURE | 15:36 |
chandankumar | See ya guys, Have a nice weekend :-) | 15:36 |
*** chandankumar is now known as raukadah | 15:36 | |
rlandy|ruck | ykarel - all a guess really but if we could reduce possible causes, we may get somewhere | 15:38 |
ykarel | rlandy|ruck, possible causes? | 15:39 |
ykarel | are there some VMS in Error state | 15:39 |
zbr | rlandy|ruck: ok for bj session in ~20min, at 1600 UTC? | 15:40 |
rlandy|ruck | yeah - we are getting more and more of those VMs in error state | 15:40 |
rlandy|ruck | cleaning up again | 15:40 |
rlandy|ruck | zbr: ack | 15:40 |
ykarel | rlandy|ruck, ack | 15:40 |
ykarel | rlandy|ruck, good to check why they get into ERROR state | 15:40 |
rlandy|ruck | sure | 15:41 |
zbr | arxcruz: you are welcomed too. | 15:41 |
*** bogdando has quit IRC | 15:44 | |
rlandy|ruck | holy cow | 15:49 |
rlandy|ruck | we are in error | 15:49 |
rlandy|ruck | stacks are ok though | 15:49 |
rlandy|ruck | {u'message': u'No valid host was found. There are not enough hosts available.', u'code': 500, u'created': u'2019-04-12T15:00:38Z'} | 15:50 |
rlandy|ruck | likely means we ran out of resources | 15:50 |
ykarel | yes zabbix indicates the same, how many vms are there in ERROR state? | 15:52 |
rlandy|ruck | no stack though | 15:56 |
rlandy|ruck | ykarel: 27 - I've actually seen worse | 15:56 |
rlandy|ruck | getting rid of those | 15:56 |
ykarel | ack | 15:56 |
*** derekh has quit IRC | 15:58 | |
zbr | https://bluejeans.com/2655417928 | 16:00 |
*** jfrancoa has joined #oooq | 16:00 | |
*** jfrancoa has quit IRC | 16:08 | |
*** dsneddon has quit IRC | 16:10 | |
ykarel | rlandy|ruck, is fs039 failure known? | 16:11 |
*** dsneddon has joined #oooq | 16:11 | |
rlandy|ruck | ykarel: ack - spoke with sshnaidm | 16:11 |
*** rlandy|ruck is now known as rlandy|ruck|mtg | 16:12 | |
*** dtantsur is now known as dtantsur|afk | 16:12 | |
ykarel | rlandy|ruck|mtg, current master promotion has just missing fs039 | 16:12 |
ykarel | missing successful jobs: [u'periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master'] | 16:12 |
rlandy|ruck|mtg | maybe we should remove from criteria and promote | 16:12 |
rlandy|ruck|mtg | weshay: ^^ ok? | 16:12 |
ykarel | if the issue is already known and worked upon good to promote | 16:12 |
weshay | rlandy|ruck|mtg is 39 master voting? | 16:13 |
rlandy|ruck|mtg | ykarel: just in meeting atm - will modify criteria to promote in a bit | 16:13 |
weshay | I don't think it is.. but confirm | 16:13 |
ykarel | it's third party ovb | 16:13 |
weshay | oh wait | 16:13 |
rlandy|ruck|mtg | should not be | 16:13 |
weshay | it's ovb | 16:13 |
weshay | even if it's ovb should be non-voting if we're going to promote w/ a known failure | 16:13 |
* weshay looks at master promote jobs | 16:14 | |
rlandy|ruck|mtg | weshay; on zbr's presentation | 16:15 |
weshay | rlandy|ruck|mtg 39 failed on 2019-04-12 06:55:12.036609 | primary | TASK [overcloud-prep-images : Prepare the overcloud images for deploy] ********* | 16:16 |
*** ykarel is now known as ykarel|away | 16:16 | |
weshay | which afaik is infra | 16:16 |
rlandy|ruck|mtg | correct | 16:16 |
weshay | rlandy|ruck|mtg ack to promote | 16:16 |
rlandy|ruck|mtg | https://bugs.launchpad.net/tripleo/+bug/1824256 | 16:16 |
openstack | Launchpad bug 1824256 in tripleo "Possible network issues in rdo-cloud causing introspection failures" [Critical,In progress] - Assigned to Ronelle Landy (rlandy) | 16:16 |
ykarel|away | weshay, rlandy|ruck|mtg there is some real bug in fs039 | 16:16 |
*** dsneddon has quit IRC | 16:16 | |
ykarel|away | upstream jobs are also failing since 10th | 16:16 |
ykarel|away | with same error in overcloud deploy | 16:16 |
*** amoralej is now known as amoralej|off | 16:17 | |
ykarel|away | https://review.rdoproject.org/zuul/builds?job_name=tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039&branch=master&result=FAILURE | 16:17 |
ykarel|away | see the jobs which are duration around 7000 seconds | 16:17 |
ykarel|away | failed at overcloud deploy | 16:17 |
rlandy|ruck|mtg | yep - will confirm afterwards | 16:18 |
*** dsneddon has joined #oooq | 16:18 | |
ykarel|away | okk, pasting here for reference:- [overcloud.ComputeServiceChain.ServiceServerMetadataHook]: CREATE_FAILED resources.ServiceServerMetadataHook: u'type' | 16:18 |
*** dsneddon has quit IRC | 16:23 | |
*** ykarel|away has quit IRC | 16:24 | |
weshay | rlandy|ruck|mtg periodic-tripleo-ci-centos-7-standalone-master failed in the latest run | 16:29 |
weshay | rlandy|ruck|mtg /me is finally ready to chat | 16:29 |
weshay | when ever u r | 16:29 |
rlandy|ruck|mtg | weshay:in zbr's meeting | 16:29 |
rlandy|ruck|mtg | will ping afterwards | 16:29 |
*** jpena has quit IRC | 16:29 | |
*** dsneddon has joined #oooq | 16:32 | |
*** rlandy|ruck|mtg is now known as rlandy|ruck | 16:33 | |
*** marios has quit IRC | 16:35 | |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053, tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-scenario008-multinode-oooq-container, tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-scenario012-standalone @ https://review.openstack.org/604298, stable/pike: tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-ovb- (2 more messages) | 16:36 |
rlandy|ruck | weshay: k - let's chat | 16:37 |
rlandy|ruck | will get to failing tests after that | 16:38 |
*** tosky has quit IRC | 16:38 | |
weshay | rlandy|ruck k /me goes | 16:39 |
*** holser_ has quit IRC | 16:52 | |
zbr | weshay: rlandy|ruck https://review.openstack.org/#/c/649965/ (release config testing) is ready for final review. | 16:53 |
*** ykarel has joined #oooq | 16:57 | |
zbr | https://review.openstack.org/#/c/651910/ is enabling html report on tripleo-ci repo. | 16:57 |
weshay | rlandy|ruck https://etherpad.openstack.org/p/tripleo-train-topics | 17:10 |
*** vinaykns has joined #oooq | 17:14 | |
rlandy|ruck | zbr: will take a look in a bit | 17:45 |
*** irclogbot_3 has quit IRC | 18:08 | |
*** irclogbot_3 has joined #oooq | 18:10 | |
rlandy|ruck | step one prodchain board is sorted | 18:13 |
*** tosky has joined #oooq | 18:18 | |
raukadah | weshay: arxcruz https://etherpad.openstack.org/p/osa-train-ptg | 18:23 |
rlandy|ruck | ah - not bad - hit ipmi failure - drac ip update | 18:30 |
rlandy|ruck | weshay: ^^ | 18:30 |
rlandy|ruck | can fix that | 18:30 |
*** Vorrtex has quit IRC | 18:36 | |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053, tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-scenario008-multinode-oooq-container, tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-scenario012-standalone @ https://review.openstack.org/604298, stable/pike: tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-ovb- (2 more messages) | 18:36 |
raukadah | hubbot1: source | 18:37 |
hubbot1 | raukadah: My source is at https://github.com/ProgVal/Limnoria | 18:37 |
*** Vorrtex has joined #oooq | 18:40 | |
raukadah | zbr: is the pytest demo recorded? | 18:48 |
raukadah | sorry i missed it! | 18:48 |
zbr | raukadah: no but I can do another one quickly, is only 15mins. no charge for private events, yet. | 18:48 |
raukadah | zbr: bandwidth low, May be I will catch some time next week | 18:49 |
zbr | raukadah: sure. | 18:49 |
raukadah | zbr: great, thanks! | 18:49 |
zbr | were you talking about irc bots? | 18:50 |
raukadah | zbr: nope | 18:50 |
raukadah | hubbot1: I wanted to check hubbot1 source code | 18:51 |
hubbot1 | raukadah: Error: "I" is not a valid command. | 18:51 |
raukadah | 2 more messages looks annoying, may be something hidden there | 18:52 |
raukadah | so checking the source | 18:52 |
zbr | raukadah: ok. me just wanted to say that about a month ago I found a very nice bot named notifico (fully opensource but also hosted freely), https://n.tkte.ch/ -- already using it in multiple channels as it has hooks with multiple systems. adding one for gerrit should be very easy. | 18:54 |
raukadah | zbr: great, adding to my list to check! | 18:55 |
*** apetrich has quit IRC | 19:00 | |
rlandy|ruck | weird just lost ... hostname rdoci-hp-01.v100.rdoci.lab.eng.rdu2.redhat.com: Name or service not known | 19:10 |
rlandy|ruck | weshay: hmm ... it is not master but stein that would promote w/o fs039 | 19:17 |
weshay | ah.. /me looks | 19:17 |
weshay | saw that was running well | 19:18 |
weshay | rlandy|ruck you put up the push change | 19:18 |
rlandy|ruck | weshay: I put up the push change? | 19:19 |
weshay | rlandy|ruck nothing is triggering on stein except now for python-tripleoclient | 19:19 |
weshay | should be fine to promote | 19:19 |
weshay | rlandy|ruck there should be very little diff between master and stein atm | 19:19 |
rlandy|ruck | weshay: sorry - I am confused ... | 19:20 |
rlandy|ruck | https://review.rdoproject.org/zuul/status | 19:20 |
rlandy|ruck | only failing stein job is fs039 | 19:20 |
rlandy|ruck | but it's a deployment failure | 19:20 |
weshay | ya.. I see | 19:20 |
rlandy|ruck | so I didn't change the promotion criteria | 19:21 |
rlandy|ruck | logging a bug on fs039 failing - that is all | 19:21 |
weshay | that's an odd heat error | 19:21 |
rlandy|ruck | I know | 19:21 |
rlandy|ruck | it's not the error ykarel mentioned | 19:22 |
rlandy|ruck | not the same one in check failures | 19:22 |
weshay | rlandy|ruck that is probably not specific to 39 | 19:22 |
rlandy|ruck | possibly | 19:22 |
rlandy|ruck | I am combing through the check errors to compare | 19:22 |
rlandy|ruck | almost looks like heat got interrupted | 19:23 |
weshay | rlandy|ruck I think it's a real bug | 19:25 |
weshay | rlandy|ruck http://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-stein/415c446/logs/undercloud/var/log/containers/heat/heat-engine.log.txt.gz | 19:26 |
weshay | 2019-04-12 17:31:32.052 8 INFO heat.engine.resource [req-a799f5c3-5a74-4b9e-9c2e-e36f5cc7dea4 - admin - default default] creating Value "CompactServices" Stack "overcloud-ComputeServiceChain-atwhtineazrs-ServiceServerMetadataHook-4z35ptevylo2" [dd0d8e8b-d0ab-4a08-84bb-d68cff588f4d] | 19:26 |
weshay | 2019-04-12 17:31:32.066 8 ERROR heat.engine.check_resource [req-a799f5c3-5a74-4b9e-9c2e-e36f5cc7dea4 - admin - default default] Unexpected exception in resource check.: KeyError: u'type' | 19:26 |
rlandy|ruck | "No module named blazarclient". Not using blazar. | 19:26 |
rlandy|ruck | that looks familiar - ok bug in progress | 19:26 |
weshay | rlandy|ruck those are just warnings though | 19:27 |
rlandy|ruck | weshay: no the error you pointed out | 19:27 |
weshay | ah k | 19:27 |
rlandy|ruck | <ykarel|away> okk, pasting here for reference:- [overcloud.ComputeServiceChain.ServiceServerMetadataHook]: CREATE_FAILED resources.ServiceServerMetadataHook: u'type' | 19:27 |
rlandy|ruck | ^^ real problem | 19:28 |
weshay | rlandy|ruck might be the change I proposed a revert on https://review.openstack.org/#/c/652137/ | 19:30 |
rlandy|ruck | idk - let's see what runs and passes/fails | 19:31 |
weshay | rlandy|ruck or https://review.openstack.org/#/c/639119/ | 19:32 |
weshay | you have the bug submitted? | 19:32 |
rlandy|ruck | weshay: in progress - one minute - then we can log questions/comments | 19:32 |
weshay | this is what happens when rdo-cloud is unstable | 19:39 |
rlandy|ruck | weshay: https://bugs.launchpad.net/tripleo/+bug/1824579 | 19:39 |
openstack | Launchpad bug 1824579 in tripleo "tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039 jobs are failing overcloud deployment with 'KeyError: u'type''" [Undecided,New] | 19:39 |
rlandy|ruck | ^^ can comment more there | 19:39 |
ykarel | <weshay> rlandy|ruck or https://review.openstack.org/#/c/639119/ | 19:45 |
ykarel | ^^ looks the culprit | 19:45 |
*** panda has quit IRC | 19:46 | |
weshay | ykarel aye.. one of the two in the topic | 19:46 |
ykarel | hmm | 19:46 |
weshay | will try both | 19:47 |
ykarel | weshay, fs039 failed on ^^ patch on 5th, so ^^ patch affected it | 19:47 |
rlandy|ruck | only fails 039 with that big a change? | 19:49 |
ykarel | that patch should have missed some tls config | 19:49 |
rlandy|ruck | reasonable | 19:49 |
rlandy|ruck | weshay: you are entering a revert on https://review.openstack.org/#/c/639119/? | 19:51 |
rlandy|ruck | pls paste relevant reverts in bug so we track results | 19:52 |
weshay | rlandy|ruck https://review.openstack.org/#/q/topic:containers-common+(status:open+OR+status:merged) | 19:52 |
weshay | already in there | 19:52 |
ykarel | rlandy|ruck, i doubt at line https://review.openstack.org/#/c/639119/16/deployment/ovn/ovn-metadata-container-puppet.yaml@284 | 19:53 |
ykarel | metadata_settings: {} | 19:53 |
rlandy|ruck | got it | 19:54 |
*** ykarel is now known as ykarel|away | 19:54 | |
* ykarel|away leaving | 19:54 | |
weshay | ykarel|away have a good weekend | 19:54 |
ykarel|away | u too | 19:55 |
weshay | rlandy|ruck if we get bm back we can test this introspection error we see in rdo-cloud w/ it | 19:58 |
weshay | I see you have two jobs running... | 19:58 |
weshay | hopefully we can get them non-voting in the pipeline while we also figure out the baseos workflow | 19:58 |
rlandy|ruck | weshay: git two bm jobs running atm | 20:00 |
rlandy|ruck | got | 20:00 |
rlandy|ruck | updating the instackenv.json on the others | 20:00 |
weshay | introspection is running on one now :) | 20:01 |
rlandy|ruck | yep - here's hoping | 20:06 |
weshay | introspection passed | 20:12 |
rlandy|ruck | yep - and we're deploying | 20:14 |
*** ykarel|away has quit IRC | 20:17 | |
weshay | rlandy|ruck so if introspection passed and the deployment is running.. the ips must be mostly sane | 20:21 |
rlandy|ruck | weshay: I think so, so far - it's just instackenv.json | 20:21 |
rlandy|ruck | I emailed matt to confirm | 20:21 |
rlandy|ruck | that I wasn't hitting old networks | 20:21 |
rlandy|ruck | the two hp changes are in - working on dell now | 20:22 |
rlandy|ruck | dell is not in order like hp - so it's more work - need to check each machine | 20:23 |
weshay | k | 20:27 |
weshay | let me know if I can help | 20:27 |
*** dsneddon has quit IRC | 20:29 | |
rlandy|ruck | almost done - two more envs to go | 20:33 |
*** Vorrtex has quit IRC | 20:34 | |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-scenario012-standalone, tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053, tripleo-ci-centos-7-scenario008-multinode-oooq-container @ https://review.openstack.org/604298, stable/pike: tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-ovb- (2 more messages) | 20:36 |
*** zbr has quit IRC | 20:37 | |
weshay | rlandy|ruck woot.. tempest is working on libvirt repro :) | 20:55 |
weshay | https://review.openstack.org/#/q/topic:bug/1824243+(status:open+OR+status:merged) | 20:55 |
rlandy|ruck | nice | 20:56 |
weshay | arxcruz ^ let's chat about cirros 3.5/6 next week | 20:56 |
rlandy|ruck | well we have a failed deployment | 21:09 |
rlandy|ruck | all instackenv.json updated | 21:09 |
rlandy|ruck | Ping to 10.12.150.126 failed | 21:11 |
rlandy|ruck | external address looks wrong | 21:11 |
*** jtomasek has quit IRC | 21:14 | |
rlandy|ruck | weshay: negative - we have changed provisioning and external networks | 21:22 |
rlandy|ruck | ok - I see the vlan sheet - will update | 21:46 |
*** rlandy|ruck has quit IRC | 21:49 | |
*** aakarsh has quit IRC | 22:28 | |
hubbot1 | FAILING CHECK JOBS on master: tripleo-ci-centos-7-standalone-upgrade, tripleo-ci-centos-7-scenario012-standalone, tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053, tripleo-ci-centos-7-scenario008-multinode-oooq-container @ https://review.openstack.org/604298, stable/pike: tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039, tripleo-ci-centos-7-ovb- (2 more messages) | 22:36 |
*** aakarsh has joined #oooq | 22:52 | |
*** tosky has quit IRC | 23:11 | |
*** aakarsh has quit IRC | 23:40 | |
*** aakarsh has joined #oooq | 23:40 |
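
The promotion discussion around 16:12 (a hash promotes only once every job in the criteria has succeeded, and ykarel pasted a "missing successful jobs" list) comes down to a set difference. The snippet below is a minimal sketch of that idea, not the actual promoter code: the function names and the sample results dictionary are assumptions made up for illustration, and only the two job names come from the log.

```python
def missing_successful_jobs(criteria, results):
    """Return the criteria jobs that have not reported SUCCESS, in criteria order.

    criteria: list of job names required for promotion.
    results:  dict mapping job name -> result string (hypothetical format).
    """
    passed = {job for job, result in results.items() if result == "SUCCESS"}
    return [job for job in criteria if job not in passed]


def can_promote(criteria, results):
    """A hash can promote only when no criteria job is missing a success."""
    return not missing_successful_jobs(criteria, results)


# Job names taken from the log; the results below are invented sample data.
STANDALONE = "periodic-tripleo-ci-centos-7-standalone-master"
FS039 = "periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master"

criteria = [STANDALONE, FS039]
results = {STANDALONE: "SUCCESS", FS039: "FAILURE"}

print("missing successful jobs:", missing_successful_jobs(criteria, results))
print("can promote:", can_promote(criteria, results))

# Dropping fs039 from the criteria, as proposed in the channel, unblocks promotion.
relaxed_criteria = [job for job in criteria if job != FS039]
print("can promote without fs039:", can_promote(relaxed_criteria, results))
```

Removing a job from the criteria list is the only knob in this sketch, which is why the channel debated whether the fs039 failure was a known infra issue before touching it.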