hubbot | FAILING CHECK JOBS on stable/ocata: tripleo-ci-centos-7-undercloud-oooq @ https://review.openstack.org/564291 | 00:24 |
---|---|---|
*** tcw has joined #oooq | 01:58 | |
*** tcw1 has quit IRC | 02:01 | |
hubbot | FAILING CHECK JOBS on stable/ocata: tripleo-ci-centos-7-undercloud-oooq @ https://review.openstack.org/564291 | 02:24 |
*** rfolco_ has quit IRC | 02:50 | |
*** sanjayu_ has quit IRC | 02:59 | |
*** ykarel|away has joined #oooq | 03:47 | |
*** udesale has joined #oooq | 04:06 | |
*** rlandy|rover|bbl is now known as rlandy|rover | 04:20 | |
*** rlandy|rover has quit IRC | 04:20 | |
hubbot | All check jobs are working fine on stable/ocata, stable/ocata, master, stable/queens. | 04:24 |
*** links has joined #oooq | 04:38 | |
*** gvrangan has joined #oooq | 05:24 | |
*** ratailor has joined #oooq | 05:27 | |
*** sshnaidm_pto has quit IRC | 05:28 | |
*** gvrangan has quit IRC | 05:32 | |
*** quiquell|off is now known as quiquell | 05:33 | |
*** gvrangan has joined #oooq | 05:33 | |
*** pgadiya has joined #oooq | 05:38 | |
*** pgadiya has quit IRC | 05:38 | |
*** marios has joined #oooq | 05:44 | |
*** jaosorior has quit IRC | 05:58 | |
*** quiquell is now known as quiquell|afk | 06:13 | |
*** jaosorior has joined #oooq | 06:14 | |
*** saneax has joined #oooq | 06:14 | |
*** gvrangan has quit IRC | 06:23 | |
hubbot | All check jobs are working fine on stable/ocata, stable/ocata, master, stable/queens. | 06:24 |
*** gvrangan has joined #oooq | 06:26 | |
*** florianf has joined #oooq | 06:56 | |
*** zoli is now known as zoli|wfh | 07:08 | |
*** zoli|wfh is now known as zoli | 07:08 | |
*** quiquell|afk is now known as quiquell | 07:09 | |
*** ykarel_ has joined #oooq | 07:10 | |
*** ykarel|away has quit IRC | 07:12 | |
*** tesseract has joined #oooq | 07:22 | |
*** skramaja has joined #oooq | 07:23 | |
*** skramaja_ has joined #oooq | 07:28 | |
*** skramaja has quit IRC | 07:28 | |
*** ccamacho has joined #oooq | 07:44 | |
*** amoralej|off is now known as amoralej | 07:55 | |
*** holser__ has joined #oooq | 08:02 | |
*** ykarel_ is now known as ykarel | 08:03 | |
*** ratailor has quit IRC | 08:21 | |
hubbot | All check jobs are working fine on stable/ocata, stable/ocata, master, stable/queens. | 08:24 |
*** holser__ has quit IRC | 08:24 | |
*** sshnaidm_pto has joined #oooq | 08:25 | |
*** ratailor has joined #oooq | 08:29 | |
*** gvrangan has quit IRC | 08:35 | |
*** ykarel is now known as ykarel|lunch | 08:42 | |
*** gvrangan has joined #oooq | 08:52 | |
*** panda|off is now known as panda | 09:04 | |
quiquell | panda: Good morning | 09:11 |
panda | quiquell: helloooo | 09:12 |
quiquell | panda: Do you have a minute to check logs, and help me understand the n -> n + 1 | 09:13 |
quiquell | For sure it's not working, but I want to know what is and what is not | 09:13 |
panda | quiquell: sure, how do you want to proceed ? | 09:15 |
quiquell | panda: bj and share screens or shared tmux ? | 09:17 |
quiquell | https://bluejeans.com/7891065232 | 09:19 |
*** ykarel|lunch is now known as ykarel | 09:31 | |
*** ratailor has quit IRC | 09:31 | |
*** udesale has quit IRC | 09:32 | |
*** ratailor has joined #oooq | 09:33 | |
*** apetrich has quit IRC | 09:34 | |
arxcruz|ruck | dalvarez: around? | 09:34 |
arxcruz|ruck | dalvarez: have you seen these errors before? http://paste.openstack.org/show/722402/ | 09:35 |
*** udesale has joined #oooq | 09:38 | |
*** ratailor has quit IRC | 09:46 | |
*** ratailor has joined #oooq | 09:49 | |
dalvarez | arxcruz|ruck: no idea | 09:50 |
dalvarez | i haven't seen that before | 09:50 |
arxcruz|ruck | this is happening in one of our jobs in ocata all the time :/ | 09:50 |
dalvarez | arxcruz|ruck: for some reason it looks like something is adding a second manager to OVS | 09:51 |
*** ratailor has quit IRC | 09:51 | |
dalvarez | maybe slaweq can take a look | 09:51 |
dalvarez | looks like a tripleo thing though | 09:51 |
*** ratailor has joined #oooq | 09:52 | |
dalvarez | maybe not, neutron adds the manager from its code too | 09:52 |
dalvarez | arxcruz|ruck: ^ | 09:52 |
arxcruz|ruck | hmmmm, okay | 09:52 |
*** gvrangan has quit IRC | 09:57 | |
*** ykarel_ has joined #oooq | 10:18 | |
*** ykarel has quit IRC | 10:21 | |
hubbot | All check jobs are working fine on stable/queens, stable/ocata, stable/ocata, master. | 10:24 |
quiquell | panda: Adding fs050 to the gates https://review.openstack.org/#/c/570882/ | 10:24 |
quiquell | It's working already | 10:24 |
*** gvrangan has joined #oooq | 10:39 | |
panda | quiquell: I have the name for the upgrade direction | 10:45 |
quiquell | panda: yes !!! | 10:45 |
panda | quiquell: for the differentiation | 10:45 |
quiquell | panda: what do you have in mind | 10:46 |
panda | quiquell: I usually borrow a lot from chemistry lately | 10:46 |
panda | quiquell: do you know what an atomic orbital is ? | 10:46 |
quiquell | panda: Yep I remember the orbits of energy | 10:47 |
quiquell | or similar | 10:47 |
panda | quiquell: ok, electrons have a particular value to tell them apart when they are in the same orbital, and it's their spin | 10:48 |
panda | quiquell: I suggest we call this spin up and spin down | 10:48 |
quiquell | panda: holy shit | 10:49 |
*** ykarel_ is now known as ykarel | 10:49 | |
quiquell | panda: But it's always a spin up, what changes is the initial orbit | 10:50 |
quiquell | but the direction is always an upgrade | 10:50 |
quiquell | increment/decrement is wrong too | 10:50 |
quiquell | what we change is the install release | 10:50 |
quiquell | if delta is 1 | 10:51 |
quiquell | and install release is the increment or the decrement | 10:52 |
quiquell | But it's always going to be a spin up | 10:52 |
quiquell | Don't know | 10:52 |
quiquell | panda: I like spin up or spin down | 10:52 |
quiquell | Ok, so it means where do we want to put the stable release | 10:53 |
quiquell | at install or at target | 10:54 |
quiquell | so | 10:54 |
quiquell | --stable-release-place=install --stable-release-place=target | 10:54 |
*** skramaja_ has quit IRC | 10:55 | |
quiquell | --stable-release-place=initial --stable-release-place=target | 10:55 |
quiquell | panda: What do you think ? | 10:56 |
panda | for what are these cli arguments ? | 10:56 |
quiquell | emit_releases_file.py | 10:57 |
quiquell | But similar can be for the toci_type or job env | 10:57 |
panda | --spin up --spin down ? :) | 10:57 |
quiquell | It's always a spin up | 10:57 |
quiquell | it's an upgrade | 10:57 |
quiquell | what changes is the initial point | 10:58 |
quiquell | it always goes from a lower orbit to the upper one | 10:58 |
quiquell | but what changes is the initial orbit | 10:58 |
panda | --start-from-current --start-from-previous ? The difficulty is always the same, there are not many words to describe a change in the starting point, our languages are always focused on what comes next :) | 10:59 |
*** skramaja has joined #oooq | 10:59 | |
quiquell | not previous | 11:00 |
quiquell | we have the ffu | 11:00 |
panda | I give up | 11:00 |
panda | we have ffu | 11:00 |
quiquell | haha | 11:00 |
panda | FFFFFUUUUUUUUUUUUU | 11:00 |
*** ratailor_ has joined #oooq | 11:00 | |
quiquell | --stable-at-install --stable-at-target | 11:01 |
panda | --start-release-is-really-the-one-that-im-passing --start-release-is-not-that-one-its-the-other-one | 11:02 |
quiquell | --install-with-release --upgrade-with-release | 11:02 |
quiquell | panda: Look the las too | 11:02 |
quiquell | s/too/two/ | 11:02 |
*** ratailor has quit IRC | 11:03 | |
*** tesseract has quit IRC | 11:07 | |
*** tesseract has joined #oooq | 11:08 | |
*** panda is now known as panda|lunch | 11:13 | |
*** udesale has quit IRC | 11:33 | |
*** udesale has joined #oooq | 11:37 | |
*** gvrangan has quit IRC | 11:42 | |
*** udesale has quit IRC | 11:47 | |
*** udesale has joined #oooq | 11:51 | |
quiquell | panda|lunch: Have it upgrade-from-release upgrade-to-release | 11:59 |
*** atoth has joined #oooq | 11:59 | |
panda|lunch | quiquell: that might work, now they just have to be mutually exclusive :) | 12:00 |
quiquell | panda|lunch: That's just implementation but need to be done | 12:00 |
quiquell | panda|lunch: btw, undercloud upgrade finished | 12:01 |
quiquell | We can take a look later on | 12:01 |
panda|lunch | quiquell: ok, give me some time to look at the cards for today | 12:02 |
*** sshnaidm_pto has quit IRC | 12:02 | |
*** udesale has quit IRC | 12:03 | |
*** udesale has joined #oooq | 12:03 | |
*** trown|outtypewww is now known as trown | 12:04 | |
quiquell | ok | 12:04 |
*** udesale_ has joined #oooq | 12:07 | |
trown | hmm... not to throw a wrench in the naming party... but I think the script itself should be figuring out "spin-up" vs "spin-down", and we need to just add our keyword to the job type | 12:07 |
trown | otherwise we are adding logic somewhere that is not python | 12:07 |
quiquell | trown: Figuring it out means reading an environment variable | 12:08 |
quiquell | trown: The only bash logic is passing what the job wants to do | 12:08 |
trown | quiquell: how does the environment variable get set? | 12:09 |
panda|lunch | trown: agreed | 12:09 |
quiquell | trown: toci_type or var in the job | 12:09 |
*** udesale has quit IRC | 12:09 | |
quiquell | trown: At least these are the only options I know of, you guys sure know other ones | 12:09 |
trown | quiquell: if it is toci_type, then you have to have logic to parse toci_type | 12:10 |
quiquell | trown: Of course not doing it at fs | 12:10 |
quiquell | trown: Reading toci_type in two places is not a bad thing ? | 12:11 |
*** panda|lunch is now known as panda | 12:11 | |
trown | but... I guess we already pull out featureset... maybe it is fine to only do the parsing in bash, and none of the logic of what to do based on that parsing | 12:11 |
quiquell | trown: Changing the toci_type parsing in the only place we do it, I think is better | 12:13 |
quiquell | trown: And isolate the script from toci_type too; there's stuff we don't care about there | 12:13 |
*** gvrangan has joined #oooq | 12:13 | |
panda | quiquell: ok, ready | 12:16 |
quiquell | panda: give me the key again | 12:19 |
panda | quiquell: lol | 12:19 |
panda | quiquell: take it from the previous place :) | 12:19 |
quiquell | Wait, I am going to eat something first | 12:19 |
quiquell | It was wrong | 12:19 |
quiquell | It was good ? | 12:19 |
panda | quiquell: it was ok | 12:19 |
quiquell | panda: zuul@38.145.32.100 | 12:21 |
panda | quiquell: ok, take your time for the lunch | 12:21 |
quiquell | panda: Going to eat something now | 12:21 |
panda | quiquell: no rush | 12:21 |
quiquell | panda: You can explore already | 12:21 |
hubbot | All check jobs are working fine on stable/queens, stable/ocata, stable/ocata, master. | 12:24 |
*** rlandy has joined #oooq | 12:28 | |
quiquell | rlandy: Good morning, https://review.openstack.org/#/c/568946/ is merged | 12:29 |
*** apetrich has joined #oooq | 12:29 | |
*** udesale_ has quit IRC | 12:29 | |
*** udesale_ has joined #oooq | 12:30 | |
rlandy | quiquell: yes - I checked with panda - he ok'ed it. Emilien wants that review in | 12:30 |
rlandy | arxcruz|ruck: hello! | 12:31 |
*** rlandy is now known as rlandy|rover | 12:32 | |
arxcruz|ruck | rlandy|rover: hey | 12:32 |
rlandy|rover | our zuul queues are so long these days | 12:32 |
arxcruz|ruck | rlandy|rover: so, quick question, where are the rdo phase 1 jobs? | 12:32 |
rlandy|rover | wonder if there is anything we can do about that | 12:32 |
rlandy|rover | ci.centos | 12:32 |
arxcruz|ruck | rlandy|rover: nevermind, the dashboard just takes longer to update | 12:32 |
rlandy|rover | and I am going to kill that stupid ocata job today | 12:32 |
arxcruz|ruck | so pike was promoted :) | 12:33 |
arxcruz|ruck | i was looking this morning, the jobs pass but the dashboard wasn't updating, which made me wonder if i was looking at the wrong jobs | 12:33 |
rlandy|rover | arxcruz|ruck: dashboard is usually accurate | 12:33 |
rlandy|rover | depends on the hash in use | 12:33 |
arxcruz|ruck | rlandy|rover: regarding ocata, dalvarez told that something is trying to insert two managers | 12:34 |
arxcruz|ruck | rlandy|rover: i meant this http://rhos-release.virt.bos.redhat.com:3030/rhosp | 12:34 |
arxcruz|ruck | until a few minutes ago it was showing pike 4d old | 12:34 |
arxcruz|ruck | but now phase 1 is showing as green :) | 12:34 |
rlandy|rover | phase2 needs to promote now - I'll check on that | 12:35 |
rlandy|rover | arxcruz|ruck: what do you mean about two managers? | 12:35 |
rlandy|rover | the job gets killed at diff places each time | 12:35 |
arxcruz|ruck | rlandy|rover: http://paste.openstack.org/show/722402/ | 12:35 |
arxcruz|ruck | Transaction causes multiple rows in \"Manager\" table to have identical values | 12:36 |
rlandy|rover | yeah - I have seen that constraint violation error before | 12:37 |
rlandy|rover | the problem is that the error is not consistent | 12:37 |
rlandy|rover | did you get any more info? | 12:38 |
arxcruz|ruck | no, just a bunch of errors in neutron, which i believe that's the reason for no valid hosts | 12:38 |
arxcruz|ruck | but i'm not good in neutron stuff | 12:38 |
rlandy|rover | note that sometimes we get the conductor error | 12:38 |
rlandy|rover | I have ping'ed nova and ironic guys on this | 12:39 |
arxcruz|ruck | you should ask dtantsur|afk | 12:39 |
rlandy|rover | I have | 12:39 |
rlandy|rover | and owalsh | 12:39 |
arxcruz|ruck | dtantsur|afk: I can pay you a beer at brewdog if you help us :D | 12:39 |
rlandy|rover | if it's a neutron issue, we need to bug someone else | 12:40 |
*** gvrangan has quit IRC | 12:40 | |
rlandy|rover | arxcruz|ruck: but this has to get assigned today | 12:41 |
rlandy|rover | can't carry on | 12:41 |
rlandy|rover | I agree there | 12:41 |
arxcruz|ruck | rlandy|rover: https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-ocata-rdo_trunk-minimal-356/undercloud/var/log/extra/errors.txt.gz | 12:42 |
arxcruz|ruck | there are errors in ironic, neutron and nova | 12:43 |
rlandy|rover | https://bugzilla.redhat.com/show_bug.cgi?id=1506035 | 12:43 |
openstack | bugzilla.redhat.com bug 1506035 in openstack-neutron "neutron openvswitch agent creates multiple rows in Manager table" [Low,Closed: notabug] - Assigned to amuller | 12:43 |
*** gvrangan has joined #oooq | 12:43 | |
rlandy|rover | error could be harmless | 12:43 |
ykarel | rlandy|rover, arxcruz|ruck i think the reason we are seeing the inconsistent issue in ocata is that ironic is being restarted multiple times, maybe other services also; in pike i can't see multiple restarts | 12:43 |
ykarel | can you look in this area | 12:43 |
ykarel | is this expected | 12:43 |
rlandy|rover | ykarel: the reason is why | 12:44 |
rlandy|rover | the hardware is ok for pike, queens, master | 12:44 |
rlandy|rover | ocata works on rdocloud jobs | 12:44 |
panda | quiquell: looks like it's doing as expected for the build-test-packages, building them at install time, skipping them at upgrade | 12:44 |
quiquell | panda: But install-repo is reinstalling at upgrades | 12:44 |
quiquell | I think | 12:44 |
rlandy|rover | ykarel: we also have no access to the hardware | 12:44 |
panda | quiquell: yes | 12:44 |
quiquell | panda: That's what we have to fix | 12:45 |
panda | quiquell: let's see how the precedence is | 12:45 |
*** udesale has joined #oooq | 12:45 | |
quiquell | but build-test-packages is already idempotent | 12:45 |
rlandy|rover | so while I'd love to dump this on some team - we need to pick the right one | 12:45 |
*** udesale_ has quit IRC | 12:45 | |
arxcruz|ruck | rlandy|rover: check comment 5 | 12:45 |
ykarel | rlandy|rover, i have a reproducer in a beaker machine, in case u want to look | 12:45 |
ykarel | i loan from a colleague | 12:46 |
rlandy|rover | ykarel: did it actually reproduce it? | 12:46 |
panda | quiquell: so, the gating repo has priority 1 as I remembered, I'm not sure if this means that newer versions will not be installed | 12:46 |
rlandy|rover | the errors | 12:46 |
ykarel | rlandy|rover, no conductor issue reproduced | 12:46 |
ykarel | and i think the reason is ironic services are being restarted | 12:46 |
ykarel | we need to find is that expected | 12:46 |
panda | quiquell: that's the next thing to check: if a package in master will be installed over a package in queens even if it comes from a repo with priority=1 | 12:46 |
quiquell | panda: Have you checked if the upgrade is installing queens in master already ? | 12:47 |
quiquell | rlandy|rover, arxcruz|ruck: going to put for a few days the ruck rover alarm tool we are doing | 12:47 |
quiquell | rlandy|rover, arxcruz|ruck: let me know if it's of any use | 12:48 |
rlandy|rover | ykarel: do you work with any neutron expert? | 12:48 |
ykarel | rlandy|rover, nope | 12:48 |
panda | quiquell: tht version is 0.20180531112327.b7d84da.el7 | 12:48 |
*** ruck-rover-alert has joined #oooq | 12:49 | |
panda | quiquell: looks like the one from the change to me | 12:49 |
rlandy|rover | ykarel: the reproducer I ran on my minidell never got very far | 12:49 |
rlandy|rover | do you see the services restarting? | 12:49 |
rlandy|rover | can you hold that beaker machine? | 12:49 |
ruck-rover-alert | [Alerting] Promotions for ocata alert: Ocata promotions problem in the last 24h: http://38.145.34.131:3000/d/pgdr_WVmk/ruck-rover?fullscreen=true&edit=true&tab=alert&panelId=111&orgId=1 | 12:49 |
panda | quiquell: so even if install-repo is not idempotent, yum does the math for us, we are just wasting time | 12:49 |
panda | installing a repo that is not useful anymore | 12:49 |
ykarel | rlandy|rover, yes services are being started around every 10 minutes | 12:49 |
ruck-rover-alert | [Alerting] Gate jobs upstream alert: Upstream gate is failing in the last 24h: http://38.145.34.131:3000/d/pgdr_WVmk/ruck-rover?fullscreen=true&edit=true&tab=alert&panelId=55&orgId=1 | 12:50 |
ykarel | rlandy|rover, a puppet apply is continuously running | 12:50 |
ruck-rover-alert | [Alerting] RDO master noop change results alert: RDO jobs failing at master noop change https://review.openstack.org/#/c/560445.: http://38.145.34.131:3000/d/pgdr_WVmk/ruck-rover?fullscreen=true&edit=true&tab=alert&panelId=103&orgId=1 | 12:50 |
ruck-rover-alert | [Alerting] RDO stable/ocata noop change results alert: RDO jobs failing at stable/ocata noop change https://review.openstack.org/#/c/564291.: http://38.145.34.131:3000/d/pgdr_WVmk/ruck-rover?fullscreen=true&edit=true&tab=alert&panelId=106&orgId=1 | 12:50 |
quiquell | panda: but then it's wrong, it has to be the one from master not the one from the change | 12:50 |
ruck-rover-alert | [Alerting] Upstream zuul max enqueued time alert: A lot of jobs at upstream tripleo gate queue, check https://zuul.openstack.org: http://38.145.34.131:3000/d/pgdr_WVmk/ruck-rover?fullscreen=true&edit=true&tab=alert&panelId=71&orgId=1 | 12:50 |
quiquell | I mean after the upgrade the tht have to be the master version ? | 12:50 |
rlandy|rover | quiquell: can we turn this off | 12:50 |
panda | quiquell: lol, true, let's check better | 12:50 |
quiquell | rlandy|rover: Sure sorry | 12:50 |
rlandy|rover | we're in the middle of a conversation here | 12:50 |
*** ruck-rover-alert has quit IRC | 12:50 | |
ykarel | rlandy|rover, /usr/bin/ruby /usr/bin/puppet apply --summarize --detailed-exitcodes /etc/puppet/manifests/puppet-stack-config.pp | 12:50 |
ykarel | this restarts services again and again | 12:51 |
rlandy|rover | can you hold that machine for a bit | 12:51 |
rlandy|rover | EmilienM: ping if you are around | 12:51 |
rlandy|rover | remember you offered to help yesterday?? | 12:52 |
rlandy|rover | any thoughts on <ykarel> rlandy|rover, a puppet apply is continuously running? | 12:52 |
rlandy|rover | we have an ocata issue here that is doing us in | 12:52 |
rlandy|rover | ykarel: adding your comments to the bug | 12:53 |
*** ruck-rover-alert has joined #oooq | 12:53 | |
myoung | o/ good morning | 12:54 |
rlandy|rover | ykarel: can we get access to your beaker machine? | 12:55 |
panda | quiquell: so tht package version is 8.0.3 so it's queens | 12:55 |
ykarel | rlandy|rover, ok, yes you can get access | 12:55 |
panda | quiquell: but since this is after update, I expect this to be master | 12:55 |
*** quiquell is now known as quique|lunch | 12:55 | |
quique|lunch | panda: Give me a minute | 12:56 |
ykarel | rlandy|rover, i can see service being restarting in ovb jobs also both in pike/ocata but there we might be just lucky | 12:56 |
panda | quique|lunch: oh I thought you were lunching 30 minutes ago | 12:56 |
panda | quique|lunch: take your time | 12:56 |
rlandy|rover | I was also looking at this .. https://bugs.launchpad.net/tripleo/+bug/1673030 | 12:57 |
openstack | Launchpad bug 1673030 in tripleo "ocata/stable jobs are broken (httpd restart failures)" [Critical,Fix released] - Assigned to Michele Baldessari (michele) | 12:57 |
rlandy|rover | looks like service restarts are nothing new | 12:57 |
rlandy|rover | they probably mess up a lot | 12:57 |
ykarel | hmm, that's bad, i think services should not restart until really required | 12:59 |
rlandy|rover | ykarel: so I think we may have been getting by with this problem for some time | 12:59 |
rlandy|rover | when EmilienM is available, I'd like to run it by him | 12:59 |
ykarel | rlandy|rover, okk | 12:59 |
ykarel | rlandy|rover, ssh -X root@dell-per320-2.gsslab.pnq.redhat.com | 13:00 |
rlandy|rover | ykarel: awesome - I'm in - thank you | 13:00 |
ykarel | rlandy|rover, ok and undercloud: ssh -F ~/.quickstart/ssh.config.ansible undercloud | 13:01 |
*** florianf has quit IRC | 13:05 | |
*** florianf has joined #oooq | 13:05 | |
rlandy|rover | arxcruz|ruck: once this bug is assigned - I am considering changing the promotion criteria | 13:06 |
rlandy|rover | I'll look at why pike phase 2 is behind | 13:07 |
rlandy|rover | also - our upstream zuul queues are so long | 13:07 |
*** florianf has quit IRC | 13:08 | |
*** florianf has joined #oooq | 13:08 | |
*** quique|lunch is now known as quiquell | 13:09 | |
quiquell | panda: ready | 13:09 |
arxcruz|ruck | rlandy|rover: pike is behind because phase 1 was 4 days old, i believe we will have a pike phase 2 today | 13:09 |
rlandy|rover | I hope so | 13:09 |
quiquell | panda: Let's look at it before the meeting | 13:09 |
rlandy|rover | arxcruz|ruck: just retried https://rhos-dev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/periodic-pike-rdo_trunk-featureset020-1ctlr_1comp_64gb/ | 13:11 |
rlandy|rover | mirror issue | 13:11 |
*** trown is now known as trown|brb | 13:11 | |
*** trown|brb is now known as trown | 13:15 | |
panda | quiquell: you there ? | 13:15 |
panda | quiquell: the tht version is from queens, so the priority is probably blocking the upgrade for packages | 13:16 |
quiquell | panda: Yep | 13:16 |
panda | quiquell: so we need to make install-repo smarter | 13:16 |
quiquell | panda: renaming the gating repo tarball is not enough _ | 13:17 |
quiquell | I mean adding the release on it | 13:17 |
quiquell | so install repo just use the repo from the current release it is working on | 13:17 |
panda | quiquell: we need to remove what's on the previous version too | 13:17 |
quiquell | panda: Maybe we can move it more than remove it | 13:18 |
quiquell | For debugging purposes at the end of install repo | 13:18 |
panda | quiquell: doesn't matter, it needs to be disabled | 13:18 |
panda | in some way | 13:18 |
quiquell | panda: Let's hack it and see what happens, I will try | 13:18 |
panda | move it, remove it, stab it, strangle it, whatever :) | 13:18 |
quiquell | panda: the gating repo is not needed after install ? | 13:19 |
panda | quiquell: not as far as I know | 13:19 |
quiquell | ok | 13:19 |
*** marios has quit IRC | 13:19 | |
*** marios has joined #oooq | 13:20 | |
EmilienM | rlandy|rover: hey, what's up? in meeting the next 1h30, then available | 13:21 |
ruck-rover-alert | [Alerting] RDO stable/queens noop change results alert: RDO jobs failing at stable/queens noop change https://review.openstack.org/#/c/567224.: http://38.145.34.131:3000/d/pgdr_WVmk/ruck-rover?fullscreen=true&edit=true&tab=alert&panelId=104&orgId=1 | 13:22 |
ruck-rover-alert | [Alerting] Promotions for pike alert: Master promotions problem in the last 24h: http://38.145.34.131:3000/d/pgdr_WVmk/ruck-rover?fullscreen=true&edit=true&tab=alert&panelId=110&orgId=1 | 13:22 |
ruck-rover-alert | [Alerting] RDO stable/pike noop change results alert: RDO jobs failing at stable/pike noop change https://review.openstack.org/#/c/564285.: http://38.145.34.131:3000/d/pgdr_WVmk/ruck-rover?fullscreen=true&edit=true&tab=alert&panelId=105&orgId=1 | 13:22 |
*** ruck-rover-alert has quit IRC | 13:22 | |
rlandy|rover | EmilienM: ok - pls ping me when you have some time - we need your puppet expertise | 13:23 |
*** ruck-rover-alert has joined #oooq | 13:23 | |
ruck-rover-alert | [OK] Promotions for pike alert: Master promotions problem in the last 24h | 13:23 |
myoung | CI squad: standup/scrum in 6m | 13:24 |
rlandy|rover | EmilienM: https://bugs.launchpad.net/tripleo/+bug/1774079 - this is killing the ocata promotion for weeks | 13:24 |
openstack | Launchpad bug 1774079 in tripleo "[ocata promotion] phase1 (ci.centos) job tripleo-quickstart-promote-ocata-rdo_trunk-minimal fails introspection/deploy "No valid host found"" [Critical,Triaged] | 13:24 |
ruck-rover-alert | [Alerting] Promotions for queens alert: Queens promotions problem in the last 24h | 13:24 |
ruck-rover-alert | [Alerting] RDO stable/queens noop change results alert: RDO jobs failing at stable/queens noop change https://review.openstack.org/#/c/567224. | 13:24 |
rlandy|rover | ykarel kindly has a reproducer env set up | 13:24 |
*** ratailor_ has quit IRC | 13:24 | |
ruck-rover-alert | [OK] Promotions for queens alert: Queens promotions problem in the last 24h | 13:25 |
ruck-rover-alert | [OK] RDO stable/queens noop change results alert: RDO jobs failing at stable/queens noop change https://review.openstack.org/#/c/567224. | 13:25 |
rlandy|rover | quiquell: pls, pls can we get these ruck-rover alerts turned off - too many to follow | 13:25 |
arxcruz|ruck | lol | 13:25 |
panda | ruck-rover-alert: ? | 13:25 |
arxcruz|ruck | quiquell: perhaps messaging only the ruck and rover | 13:25 |
quiquell | rlandy|rover: Sorry again, It's not supposed to do it here... my fault | 13:26 |
*** ruck-rover-alert has quit IRC | 13:26 | |
arxcruz|ruck | quiquell-- | 13:26 |
hubbot | arxcruz|ruck: quiquell's karma is now 0 | 13:26 |
rlandy|rover | quiquell: if it alerts on my personal feed, np | 13:26 |
quiquell | arxcruz|ruck++ | 13:26 |
hubbot | quiquell: arxcruz|ruck's karma is now 2 | 13:26 |
rlandy|rover | it's just a lot of text *** in the middle *** of a chat conversation | 13:26 |
quiquell | rlandy|rover, arxcruz|ruck: alerts are at #tripleo-ci now | 13:27 |
rlandy|rover | also - if things are ok - we don't need to know about them | 13:27 |
arxcruz|ruck | rlandy|rover: i believe we should assign to ironic, but not sure if they will care too much since it's ocata | 13:27 |
panda | [Alerting] scrum meeting in 3 min | 13:27 |
panda | [OK] scrum meeting in 3 min | 13:27 |
rlandy|rover | panda: lol | 13:27 |
quiquell | :-) | 13:27 |
quiquell | ok means "back to normal" they don't get repeated | 13:28 |
*** links has quit IRC | 13:29 | |
*** zoli is now known as zoli|lunch | 13:30 | |
*** sshnaidm_pto has joined #oooq | 13:31 | |
myoung | rlandy|rover, scrum | 13:32 |
*** udesale_ has joined #oooq | 13:34 | |
rlandy|rover | quiquell: alerts on tripleo-ci are great - thanks | 13:36 |
*** udesale has quit IRC | 13:37 | |
quiquell | rlandy|rover: We have to adjust the time range they use to check for problems | 13:37 |
panda | quiquell: and add some dampening, to me a OK 3 minutes after an ALERT is too much noise | 13:43 |
quiquell | panda: the local influxdb is eating all the memory, that's why it was funny | 13:44 |
quiquell | panda: normally it's not so verbose | 13:44 |
*** gvrangan has quit IRC | 13:47 | |
*** jaganathan has quit IRC | 14:02 | |
ykarel | rlandy|rover, seeing local reproducer, cause for restart should be:- | 14:08 |
ykarel | May 31 13:03:17 undercloud os-collect-config[714]: /usr/libexec/os-refresh-config/post-configure.d/98-undercloud-setup: line 91: HOME: unbound variable | 14:08 |
ykarel | May 31 13:03:17 undercloud os-collect-config[714]: [2018-05-31 13:03:17,766] (os-refresh-config) [ERROR] during post-configure phase. [Command '['dib-run-parts', '/usr/libexec/os-refresh-config/post-configure.d']' returned non-zero exit status 1] | 14:08 |
ykarel | May 31 13:03:17 undercloud os-collect-config[714]: [2018-05-31 13:03:17,767] (os-refresh-config) [ERROR] Aborting... | 14:08 |
ykarel | May 31 13:03:17 undercloud os-collect-config[714]: Command failed, will not cache new data. Command 'os-refresh-config' returned non-zero exit status 1 | 14:09 |
*** apetrich has quit IRC | 14:09 | |
ykarel | May 31 13:03:17 undercloud os-collect-config[714]: Sleeping 1.00 seconds before re-exec. | 14:09 |
ykarel | EmilienM, ^^ any idea? | 14:09 |
rlandy|rover | ykarel: my best guess is that this restart problem has been going on for some time - but we are seeing the failures somewhat more consistently now. if we get nowhere with the debug today, I think we need to promote | 14:16 |
panda | upcrap the undercloud ! | 14:16 |
ykarel | rlandy|rover, sure, | 14:17 |
rlandy|rover | maybe EmilienM has some more info here - we are going EOL in august | 14:17 |
rlandy|rover | I want to give it one more day | 14:17 |
ykarel | yup waiting for him | 14:17 |
quiquell | panda: It was uncrap ? | 14:17 |
rlandy|rover | in truth, we had no promotions from 05/09 to 05/22 | 14:18 |
rlandy|rover | when things went south | 14:18 |
rlandy|rover | a million things could have gone in in that time | 14:18 |
rlandy|rover | ykarel: but I agree - we don't give up without a proper last fight/investigation | 14:18 |
panda | quiquell: I clearly heard "upcrap" | 14:19 |
ykarel | rlandy|rover, can see this restart even before 05/09, but in pike not | 14:19 |
hubbot | All check jobs are working fine on stable/queens, stable/ocata, stable/ocata, master. | 14:24 |
*** zoli|lunch is now known as zoli|wfh | 14:42 | |
*** zoli|wfh is now known as zoli | 14:42 | |
quiquell | no user needed for ruck/rover alarms now http://38.145.34.131:3000/d/pgdr_WVmk/ruck-rover?orgId=1 | 14:43 |
rlandy|rover | arxcruz|ruck: ugh - https://rhos-dev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/periodic-pike-rdo_trunk-featureset020-1ctlr_1comp_64gb/44/console | 14:51 |
rlandy|rover | pike rdo phase 2 | 14:51 |
arxcruz|ruck | rlandy|rover: maybe ykarel can help ? | 14:52 |
arxcruz|ruck | it's packaging issue | 14:52 |
rlandy|rover | we need to upgrade 7.5 | 14:52 |
rlandy|rover | checking host | 14:52 |
arxcruz|ruck | rlandy|rover: you mean the image directly to 7.5 ? | 14:53 |
rlandy|rover | creating bug to start with | 14:54 |
arxcruz|ruck | rlandy|rover: i'll create | 14:54 |
ykarel | rlandy|rover, also need to remove the workaround:- install known good kernel version when moving to rhel 7.5 | 14:55 |
panda | myoung: trown quiquell when do we want to finish the discussion ? I'm available | 14:55 |
rlandy|rover | ykarel: yep - knew that one would bite us some day | 14:55 |
ykarel | yup | 14:56 |
quiquell | panda: want to ask myoung stuff about the promoter first, just a few minutes | 14:56 |
EmilienM | ykarel: I have one more 1h meeting and I'm done :D | 14:56 |
arxcruz|ruck | rlandy|rover: https://bugs.launchpad.net/tripleo/+bug/1774435 | 14:57 |
openstack | Launchpad bug 1774435 in tripleo "Libvirt package dependences broken on RDO Phase 2" [Undecided,Triaged] - Assigned to Ronelle Landy (rlandy) | 14:57 |
ykarel | EmilienM, no prob, take your time :) | 14:57 |
rlandy|rover | Last login: Thu May 31 09:16:49 2018 from slave-rdo-ci-fx2-01-s6.v101.rdoci.lab.eng.rdu.redhat.com | 14:57 |
rlandy|rover | [root@rdo-ci-fx2-01-s6 ~]# cat /etc/redhat-release | 14:57 |
rlandy|rover | Red Hat Enterprise Linux Server release 7.4 (Maipo) | 14:57 |
quiquell | panda: btw fs050 it's ready for the gates https://review.openstack.org/#/c/570882/ | 14:57 |
rlandy|rover | hmmm - this machine must support other jobs though | 14:58 |
*** skramaja has quit IRC | 15:00 | |
myoung | quiquell, panda, have tempest squad scrum, avail after | 15:00 |
myoung | folks are on PTO, will just be a few mins | 15:02 |
trown | im available whenever | 15:05 |
quiquell | rlandy|rover, trown, myoung: +2/+1w fs050 gating https://review.openstack.org/#/c/570882/ | 15:06 |
rlandy|rover | myoung: what are our internal slaves usually running? this one is rhel 7.4 | 15:06 |
myoung | panda: do you have a hot sec to join my BJ with chandankumar? | 15:07 |
rlandy|rover | have you upgraded any to rhel 7.5? | 15:07 |
myoung | panda: need a quick bit of advice | 15:07 |
chandankumar | panda: https://review.openstack.org/#/q/topic:refstack-support+(status:open+OR+status:merged) we need your eyes on this one | 15:07 |
myoung | bluejeans.com/matyoung | 15:07 |
myoung | rlandy|rover: sec | 15:08 |
rlandy|rover | sure | 15:08 |
quiquell | panda: have some time for the meeting | 15:08 |
myoung | rlandy|rover: they should all be at 7.4, now that 7.5 has released we could/should update the virthosts | 15:13 |
myoung | rlandy|rover: the slaves (small VM's) are running fedora | 15:13 |
rlandy|rover | myoung: stupid question - how do we update - rhos-release or an entire reboot? | 15:14 |
myoung | rlandy|rover: afaik the rhel 7.4 version on virthosts, while needing to be upgraded, shouldn't be an issue, as for those jobs UC/OC are all libvirt domains anyway | 15:14 |
rlandy|rover | myoung: see error .. https://rhos-dev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/periodic-pike-rdo_trunk-featureset020-1ctlr_1comp_64gb/44/console | 15:14 |
myoung | rlandy|rover: aye it's generally a "rhos-release $theArgs && yum update && reboot" | 15:15 |
*** ccamacho has quit IRC | 15:15 | |
rlandy|rover | admin.so.0(LIBVIRT_ADMIN_PRIVATE_3.2.0)(64bit)\n Installed: libvirt-libs-3.9.0-14.el7_5.5.x86_64 (@rhelosp-rhel-7.4-server)\n ~libvirt- | 15:15 |
* myoung looks | 15:15 | |
rlandy|rover | so if we update to 7.5 | 15:15 |
rlandy|rover | we should get by this | 15:15 |
*** ccamacho has joined #oooq | 15:15 | |
rlandy|rover | also we are locking the kernel | 15:15 |
rlandy|rover | http://git.app.eng.bos.redhat.com/git/tripleo-environments.git/tree/roles/prep-internal-host/tasks/main.yml#n36 | 15:15 |
rlandy|rover | "rhos-release rhel-7.5 && yum update && reboot"? | 15:16 |
rlandy|rover | Installed: libvirt-libs-3.9.0-14.el7_5.5.x86_64 | 15:17 |
*** holser__ has joined #oooq | 15:17 | |
panda | myoung: chandankumar sorry about that. myoung where did you go ? | 15:17 |
myoung | panda: we're done, most of tempest team is out so standup was just chandankumar's question | 15:18 |
myoung | panda, quiquell now have time :) | 15:18 |
myoung | rlandy|rover: ack, let's just upgrade them | 15:18 |
quiquell | panda: Let's do this I don't have much time left | 15:18 |
rlandy|rover | myoung: using "rhos-release rhel-7.5 && yum update && reboot" ? | 15:18 |
*** sanjay__u has joined #oooq | 15:18 | |
myoung | rlandy|rover: in the past we did hit some issues around having multiple versions of rpm's thru upgrades and needed a little kung-fu, mostly because there were some broken package dependencies in the 7.3 --> 7.4 upgrade path (686 binaries were being pulled in). TLDR we should try on a single virthost first :) | 15:19 |
rlandy|rover | myoung: I am on rdo-ci-fx2-01-s6.v101.rdoci.lab.eng.rdu.redhat.com | 15:19 |
myoung | rlandy|rover: heh hence http://git.app.eng.bos.redhat.com/git/tripleo-environments.git/tree/ci-scripts/virthost-rpm-cleanup.sh | 15:19 |
rlandy|rover | could try that one | 15:19 |
panda | myoung: quiquell in myoung room | 15:19 |
panda | ready and waiting | 15:20 |
myoung | panda, quiquell, inc | 15:20 |
panda | lock and loaded | 15:20 |
myoung | rlandy|rover: can help / jump in after | 15:20 |
rlandy|rover | myoung: ping me when you are done with meeting and we'll do this | 15:20 |
quiquell | connecting | 15:20 |
myoung | rlandy|rover: ack | 15:20 |
*** holser__ has quit IRC | 15:28 | |
*** saneax has quit IRC | 15:30 | |
*** marios has quit IRC | 15:39 | |
*** udesale_ has quit IRC | 15:40 | |
*** jaganathan has joined #oooq | 15:40 | |
*** jaganathan has quit IRC | 15:42 | |
*** jaganathan has joined #oooq | 15:42 | |
EmilienM | rlandy|rover: what is the error, what is the context? i can help now | 15:55 |
rlandy|rover | EmilienM: hi there - nice presentation earlier today! ykarel and I have been looking at the ocata failures in phase 1 | 15:56 |
rlandy|rover | ci.centos | 15:56 |
rlandy|rover | getting bug | 15:57 |
rlandy|rover | https://bugs.launchpad.net/tripleo/+bug/1774079 | 15:57 |
openstack | Launchpad bug 1774079 in tripleo "[ocata promotion] phase1 (ci.centos) job tripleo-quickstart-promote-ocata-rdo_trunk-minimal fails introspection/deploy "No valid host found"" [Critical,Triaged] | 15:57 |
rlandy|rover | the problem being that the jobs fails consistently since 05/22 but always in a different place | 15:58 |
rlandy|rover | ykarel set up a reproducer env (which we can give you access to) | 15:58 |
rlandy|rover | noticing that there are a lot of restarts | 15:58 |
ykarel | rlandy|rover, EmilienM i think i got something, proposed a patch | 15:59 |
ykarel | a cherry-pick, EmilienM can you check | 15:59 |
rlandy|rover | ykarel: interesting - date seems to be earlier than expected - but it's not in ocata | 16:02 |
rlandy|rover | newton? | 16:02 |
trown | haha | 16:03 |
trown | ok, bye matt :P | 16:03 |
quiquell | Ups I have to drop btw | 16:03 |
myoung | gah. bluejeans | 16:03 |
quiquell | See you tomorrow will ask panda about stuff you talk about | 16:03 |
trown | no worries, I think we have a good idea what are the next steps | 16:03 |
*** quiquell is now known as quiquell|off | 16:03 | |
panda | trown: you going to rejoin ? | 16:04 |
trown | panda: do we have more to discuss? | 16:05 |
ykarel | rlandy|rover, not checked for newton | 16:05 |
rlandy|rover | ykarel: np - I'll check | 16:05 |
rlandy|rover | ykarel: do you see an improvement in restarts> | 16:05 |
rlandy|rover | ? | 16:05 |
rlandy|rover | we can deal with newton if it fixes ocata | 16:06 |
panda | trown: if you are ok with the notes on the card then no, I will remove the maturity labels and it's good to go | 16:06 |
ykarel | rlandy|rover, improvement where, i haven't applied it locally yet | 16:06 |
trown | panda: ya, I think I could work on any of the cards | 16:06 |
panda | trown: ok | 16:06 |
rlandy|rover | ykarel: ok then, we'll see what the gates say | 16:06 |
ykarel | rlandy|rover, but line 91: HOME: unbound variable is still an issue, what i noticed is the .novaclient file is created in /home/stack and that script is running from root | 16:08 |
ykarel | so even if $HOME is available, it would look at /root/.novaclient | 16:09 |
rlandy|rover | ykarel: same as with the duplicate entry - there are a lot of errors we follow up on, and sometimes they are unrelated | 16:09 |
ykarel | i meant: https://github.com/openstack/instack-undercloud/blob/stable/ocata/elements/undercloud-install/os-refresh-config/post-configure.d/98-undercloud-setup#L91 | 16:09 |
rlandy|rover | ykarel: considering the errors and stack traces, your patch seems like a good shot | 16:09 |
ykarel | ok let's get EmilienM mwhahaha opinion on that | 16:10 |
rlandy|rover | sure | 16:10 |
EmilienM | ykarel: ok, will check | 16:11 |
ykarel | EmilienM, Thanks | 16:11 |
* ykarel leaving | 16:11 | |
rlandy|rover | ykarel: thanks for your work on this! | 16:11 |
*** amoralej is now known as amoralej|off | 16:15 | |
*** ykarel_ has joined #oooq | 16:16 | |
*** ykarel has quit IRC | 16:18 | |
*** apetrich has joined #oooq | 16:18 | |
*** ccamacho has quit IRC | 16:21 | |
*** zoli is now known as zoli|gone | 16:24 | |
*** zoli|gone is now known as zoli | 16:24 | |
hubbot | All check jobs are working fine on stable/ocata, stable/ocata, master, stable/queens. | 16:24 |
*** panda is now known as panda|off | 16:25 | |
rlandy|rover | myoung: pls ping when you are ready to work on the virt host upgrade | 16:27 |
myoung | rlandy|rover: ack, ready | 16:27 |
rlandy|rover | myoung: ok - I am logged into rdo-ci-fx2-01-s6.v101.rdoci.lab.eng.rdu.redhat.com as root | 16:28 |
rlandy|rover | it's currently installed with rhel 7.4 | 16:28 |
rlandy|rover | am I good to go with "rhos-release rhel-7.5 && yum update && reboot"? | 16:28 |
myoung | trown: panda|off suggests we tag-team on https://trello.com/c/ZlaiOAZ2 by doing test / impl in parallel | 16:28 |
rlandy|rover | if not, what additional? | 16:28 |
myoung | rlandy|rover: do you have a tmate already? | 16:28 |
rlandy|rover | will create - sec | 16:29 |
myoung | can jump into bj if helps or just in tmate either | 16:29 |
*** trown is now known as trown|lunch | 16:29 | |
myoung | trown|lunch: : happy to take either half | 16:29 |
*** ykarel_ has quit IRC | 16:30 | |
myoung | rlandy|rover: (shocker) I kept fairly detailed notes when I did the 7.3 --> 7.4 upgrade (https://trello.com/c/yKpd7REc) - reviewing them now | 16:30 |
rlandy|rover | myoung: ok | 16:31 |
rlandy|rover | let's try on this one box | 16:31 |
rlandy|rover | then we can try the virt packages install | 16:31 |
rlandy|rover | imho - we just rhos-release and hope for the best | 16:31 |
* rlandy|rover is full of hope | 16:31 | |
*** ykarel has joined #oooq | 16:50 | |
rlandy|rover | https://code.engineering.redhat.com/gerrit/140417 | 16:52 |
rlandy|rover | myoung: ^^ | 16:52 |
EmilienM | could I get someone kicking off a job with https://review.openstack.org/#/c/566916 on featureset035 so I can debug, I don't have the hardware | 17:06 |
EmilienM | I really want to reproduce the failure | 17:07 |
*** sshnaidm_pto has quit IRC | 17:08 | |
EmilienM | send the bill to weq | 17:09 |
EmilienM | wes* | 17:09 |
rlandy|rover | EmilienM: reproducer not working on your personal rdocloud tenant? | 17:11 |
rlandy|rover | I can kick it on my tenant | 17:11 |
myoung | rlandy|rover: https://code.engineering.redhat.com/gerrit/140418 Update the rpm update script to support 7.5 | 17:11 |
rlandy|rover | comment on line 5 and then let's push this | 17:12 |
EmilienM | oh wait | 17:14 |
EmilienM | rlandy|rover: let me try it before :-P | 17:14 |
EmilienM | I always forgot I can deploy OVB in RDO now | 17:14 |
* EmilienM facepalm | 17:14 | |
rlandy|rover | EmilienM: depends on resources - let me know | 17:15 |
ykarel | rlandy|rover, newton phase1 also red, don't we care about it now? | 17:15 |
EmilienM | rlandy|rover: ack | 17:15 |
ykarel | rlandy|rover, currently failing at missing pem file: | /home/jenkins/.ssh/rdo-ci-public.pem: No such file or directory | 17:16 |
rlandy|rover | ykarel: it's been down for ages | 17:16 |
rlandy|rover | whole newton pipeline is disabled | 17:17 |
ykarel | hmm if nobody cares about these, shouldn't they be removed? | 17:17 |
myoung | rlandy|rover: looking at https://rhos-dev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/a%20multijob%20perspective/job/rdo-promote-newton-rdo_trunk/ | 17:17 |
ykarel | rlandy|rover, i can see those jobs are running, last run 2 hours ago | 17:18 |
rlandy|rover | ykarel: very few people care about anything after upstream and rdocloud | 17:18 |
rlandy|rover | I'll look into it more after when we have ocata and pike going | 17:18 |
rlandy|rover | have a bunch of slaves to upgrade | 17:19 |
rlandy|rover | ykarel: your patch got two + 2's | 17:19 |
ykarel | rlandy|rover, Okk Thanks | 17:19 |
ykarel | rlandy|rover, yeah saw that, if we get that merged today, we can see the result in tomorrow's run | 17:19 |
rlandy|rover | ykarel++ | 17:20 |
hubbot | rlandy|rover: ykarel's karma is now 1 | 17:20 |
EmilienM | rlandy|rover: have you seen that before ? http://paste.openstack.org/show/cF3C5lSLqsxORBBPgjSf/ | 17:20 |
myoung | ykarel: I think folks "care" - but lastly needs > resources | 17:20 |
ykarel | myoung, agree | 17:20 |
ykarel | i also saw that job by chance | 17:20 |
EmilienM | I wonder if my openrc is missing something | 17:20 |
rlandy|rover | EmilienM: can you tell why the stack failed? | 17:20 |
rlandy|rover | do you have enough resources | 17:20 |
EmilienM | I removed all resources in my project | 17:21 |
rlandy|rover | let me try my tenant | 17:21 |
rlandy|rover | one moment | 17:21 |
*** sshnaidm_pto has joined #oooq | 17:21 | |
EmilienM | rlandy|rover: can you show your openrc plz? (without password!) | 17:22 |
ykarel | EmilienM, are u trying v2 or v3 openrc? | 17:22 |
rlandy|rover | use v2 | 17:22 |
*** atoth has quit IRC | 17:22 | |
EmilienM | v3 | 17:22 |
ykarel | hmm try v2 | 17:22 |
*** atoth has joined #oooq | 17:22 | |
EmilienM | oh ok | 17:22 |
ykarel | but should be fixed for v3 as well if that's the issue | 17:22 |
rlandy|rover | it was a stack create limitation for a long time | 17:23 |
myoung | EmilienM, rlandy|rover, looking at http://paste.openstack.org/show/722452, because my eyes are bleeding from prev paste :) | 17:23 |
rlandy|rover | v3 compatibility | 17:23 |
EmilienM | interesting it works with deployed-server jobs | 17:25 |
EmilienM | but not for ovb, probably some resource in heat | 17:25 |
rlandy|rover | yep - stack issue | 17:25 |
EmilienM | ok stack deploying now :D | 17:25 |
rlandy|rover | awesome | 17:25 |
EmilienM | ykarel++ | 17:25 |
hubbot | EmilienM: ykarel's karma is now 2 | 17:25 |
EmilienM | rlandy|rover++ | 17:25 |
hubbot | EmilienM: rlandy|rover's karma is now 1 | 17:25 |
EmilienM | rlandy++ | 17:25 |
EmilienM | awesome work folks I'll keep repeating :D | 17:25 |
*** ykarel is now known as ykarel|away | 17:27 | |
EmilienM | Create_Failed: Resource CREATE failed: resources.undercloud_env: Property error: resources.undercloud_server.properties.key_name: Error validating value 'key-985': The Key (key-985) could not be found. | 17:28 |
* EmilienM sad face | 17:28 | |
EmilienM | if someone can try to reproduce https://logs.rdoproject.org/16/566916/7/openstack-check/gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035-master/Zdf5fa249cc654c9d961afde1d13457cf/reproducer-quickstart.sh | 17:32 |
myoung | rlandy|rover: https://code.engineering.redhat.com/gerrit/140420 Remove old / uneeded slave label (rdo-manager-64-74proto) | 17:33 |
rlandy|rover | myoung: awesome | 17:34 |
*** ykarel|away has quit IRC | 17:37 | |
*** d0ugal_ has joined #oooq | 17:40 | |
*** florianf has quit IRC | 17:42 | |
*** d0ugal has quit IRC | 17:42 | |
rlandy|rover | EmilienM: will try that in a bit - just updating slaves atm | 17:43 |
*** trown|lunch is now known as trown | 17:48 | |
trown | myoung: I dont really understand how we would parallelize that card | 17:48 |
*** d0ugal__ has joined #oooq | 17:51 | |
*** d0ugal_ has quit IRC | 17:53 | |
myoung | trown: i have a similar concern, on my way out for food, back in a bit. | 17:57 |
*** myoung is now known as myoung|lunch | 17:57 | |
*** d0ugal__ has quit IRC | 17:59 | |
*** gvrangan has joined #oooq | 18:02 | |
rlandy|rover | EmilienM: I have the fs035 reproducer in progress on my tenant - it's still doing the nodepool-setup piece. pls send me your key and I'll add it to the undercloud so you can log in | 18:07 |
EmilienM | oh wow | 18:09 |
EmilienM | why didn't it work for me | 18:09 |
EmilienM | rlandy|rover: the second one on https://launchpad.net/~emilienm/+sshkeys | 18:09 |
*** gvrangan has quit IRC | 18:09 | |
rlandy|rover | idk - would have to check your run command and env. - adding keys | 18:09 |
EmilienM | thanks | 18:10 |
rlandy|rover | EmilienM: pls try ssh zuul@38.145.33.182 | 18:11 |
rlandy|rover | it's still running TASK [nodepool-setup : Install packages] | 18:11 |
EmilienM | rlandy|rover: I'm in! thanks :) | 18:11 |
rlandy|rover | EmilienM: ok - I'll let you know when this part is done - you should see stuff happening in the /home/zuul dir afterwards | 18:12 |
EmilienM | right | 18:13 |
EmilienM | rlandy|rover: thanks a ton | 18:13 |
rlandy|rover | sure | 18:13 |
*** d0ugal__ has joined #oooq | 18:23 | |
*** gvrangan has joined #oooq | 18:23 | |
hubbot | FAILING CHECK JOBS on master: tripleo-quickstart-extras-gate-newton-delorean-full-minimal @ https://review.openstack.org/560445 | 18:25 |
*** d0ugal__ has quit IRC | 18:34 | |
*** tesseract has quit IRC | 18:34 | |
*** ccamacho has joined #oooq | 18:45 | |
rlandy|rover | ha - had to reseat that whole server :( | 18:47 |
rlandy|rover | back now | 18:47 |
rlandy|rover | EmilienM: ok - finally - you should see things running now on the reproducer undercloud | 18:59 |
*** d0ugal__ has joined #oooq | 18:59 | |
rlandy|rover | myoung|lunch: on to the third slave | 19:00 |
rlandy|rover | seems to be working though | 19:00 |
EmilienM | rlandy|rover: indeed, I tailf console.log now | 19:00 |
EmilienM | thanks again | 19:00 |
rlandy|rover | we could pike promote | 19:00 |
*** gvrangan has quit IRC | 19:00 | |
EmilienM | still curious how you could make it work and not me | 19:01 |
rlandy|rover | kolla failures :( | 19:29 |
*** myoung|lunch is now known as myoung | 19:34 | |
myoung | rlandy|rover: cool... | 19:35 |
*** holser__ has joined #oooq | 19:36 | |
rlandy|rover | ReadTimeoutError(self._pool, None, 'Read timed out.')", "ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.", | 19:38 |
rlandy|rover | oh dear | 19:38 |
rlandy|rover | myoung: welcome back | 19:38 |
rlandy|rover | more fun on rdocloud | 19:38 |
rlandy|rover | maybe you want to host image builds as well | 19:39 |
myoung | oh no | 19:39 |
myoung | is this from periodics again? | 19:39 |
myoung | pushing built containers --> rdo registry? | 19:39 |
rlandy|rover | myoung: yep - master | 19:39 |
rlandy|rover | you seen it? | 19:39 |
rlandy|rover | https://review.rdoproject.org/zuul/ | 19:39 |
myoung | rlandy|rover: https://bugs.launchpad.net/tripleo/+bug/1771634 | 19:41 |
openstack | Launchpad bug 1771634 in tripleo "periodic: container build jobs are failing when pushing to rdo registry (500, 504, read timeout)" [High,Confirmed] | 19:41 |
myoung | yes this one was permafailing last sprint | 19:42 |
myoung | rlandy|rover: related (but different issue) https://bugs.launchpad.net/tripleo/+bug/1771469 | 19:42 |
openstack | Launchpad bug 1771469 in tripleo "RFE: (dlrnapi-promoter) Better handle Error removing image docker.io/tripleoqueens/centos-binary-nova-placement-api:current-tripleo - UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)" [High,Triaged] | 19:42 |
myoung | rlandy|rover: CIX card from last sprint, https://trello.com/c/6t0etyUO/593-cixlp1771634tripleociproa-periodic-container-build-jobs-are-failing-when-pushing-to-rdo-registry-500-504-read-timeout, since wasn't "failing" any more was moved to a "done" state. I think we should move it back to critical failing jobs, and update https://bugs.launchpad.net/tripleo/+bug/1771634 with the latest failures | 19:43 |
openstack | Launchpad bug 1771634 in tripleo "periodic: container build jobs are failing when pushing to rdo registry (500, 504, read timeout)" [High,Confirmed] | 19:43 |
myoung | arxcruz|ruck: ^^ | 19:43 |
*** tosky has joined #oooq | 19:44 | |
*** holser__ has quit IRC | 20:03 | |
rlandy|rover | oh yeah - I remember that bug | 20:07 |
rlandy|rover | myoung:ok - updating bug | 20:08 |
myoung | rlandy|rover: i moved the CIX card back to failing jobs, and updated bug with link to latest failure kolla log | 20:18 |
*** apetrich has quit IRC | 20:23 | |
*** apetrich has joined #oooq | 20:23 | |
hubbot | FAILING CHECK JOBS on master: tripleo-quickstart-extras-gate-newton-delorean-full-minimal @ https://review.openstack.org/560445 | 20:25 |
rlandy|rover | on the last slave update :) | 20:58 |
*** trown is now known as trown|outtypewww | 21:00 | |
*** sshnaidm_pto has quit IRC | 21:01 | |
EmilienM | rlandy|rover: in the environment you gave to me, the undercloud isn't containerized | 21:39 |
EmilienM | https://review.openstack.org/#/c/566916 | 21:39 |
rlandy|rover | I'll check the reproducer I used | 21:39 |
rlandy|rover | : ${ZUUL_CHANGES:="openstack/python-tripleoclient:master:refs/changes/64/570864/2^openstack/tripleo-quickstart:master:refs/changes/16/566916/7"} | 21:40 |
rlandy|rover | : ${TOCI_JOBTYPE:="ovb-3ctlr_1comp-featureset035"} | 21:40 |
rlandy|rover | did I miss something? | 21:40 |
EmilienM | mhh | 21:40 |
rlandy|rover | export EXTRA_VARS="\$EXTRA_VARS --extra-vars dlrn_hash_tag=07f09e500b0f28ecddabf5d8ac808a0a9b399ae3_28943010 " | 21:41 |
EmilienM | it sounds like quickstart was cloned from master | 21:41 |
rlandy|rover | oh that | 21:42 |
EmilienM | :D | 21:42 |
*** links has joined #oooq | 21:42 | |
rlandy|rover | I have a review somewhere to address that problem | 21:42 |
EmilienM | :-O | 21:42 |
rlandy|rover | quickstart itself on the machine is not changed | 21:43 |
rlandy|rover | it should be on the undercloud though | 21:43 |
rlandy|rover | I'll check | 21:43 |
EmilienM | rlandy|rover: this? https://review.openstack.org/#/c/564589/ | 21:44 |
rlandy|rover | yep | 21:44 |
rlandy|rover | but that comes into effect if you need to change the running script | 21:44 |
EmilienM | I just want to test a change in the featureset | 21:45 |
rlandy|rover | right ok - you changed the configset itself | 21:46 |
rlandy|rover | EmilienM: sorry - I'll rerun it with manual edits | 21:46 |
EmilienM | can we do that? nice | 21:46 |
EmilienM | if quickstart doesn't complain, I'm fine with it now | 21:47 |
rlandy|rover | nothing fancy | 21:47 |
rlandy|rover | edit reproducer to git review -d | 21:47 |
rlandy|rover | after it clones quickstart | 21:47 |
rlandy|rover | ok - I'm taking down the env | 21:48 |
EmilienM | ok | 21:48 |
rlandy|rover | EmilienM: ok - let's try this again: ssh root@38.145.34.36 | 22:00 |
rlandy|rover | your key is on the root user | 22:00 |
rlandy|rover | as zuul is in setup atm | 22:00 |
rlandy|rover | ok - your key is on the zuul user as well now | 22:01 |
rlandy|rover | arxcruz|ruck: how long should tempest take to run on pike? https://rhos-dev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/periodic-pike-rdo_trunk-featureset020-1ctlr_1comp_64gb/45/console - been running for hours | 22:02 |
*** tosky has quit IRC | 22:03 | |
rlandy|rover | myoung: the slaves are back and updated on phase 2 | 22:04 |
myoung | rlandy|rover: the virt tempest fs020 job usually takes ~ 5 hours total | 22:06 |
myoung | rlandy|rover: re virthosts --> rhel 7.5 \o/ | 22:06 |
rlandy|rover | really | 22:07 |
rlandy|rover | that's way long | 22:07 |
rlandy|rover | hoping it passes and we can promote pike | 22:07 |
rlandy|rover | one less yellow box to worry about | 22:08 |
myoung | aye | 22:13 |
myoung | rlandy|rover: looks like #45 finished tempest with a pass, collecting logs now | 22:14 |
myoung | @4:40 it's on track for a sub 5 hr runtime :) | 22:14 |
*** links has quit IRC | 22:16 | |
rlandy|rover | that's insane | 22:19 |
rlandy|rover | but hopeful we will promote | 22:20 |
hubbot | FAILING CHECK JOBS on master: tripleo-quickstart-extras-gate-newton-delorean-full-minimal @ https://review.openstack.org/560445 | 22:25 |
myoung | rlandy|rover: we've passed for the job status so we should. I'll watch the actual promotion into this evening, should take an hourish | 22:26 |
rlandy|rover | cool - thanks | 22:27 |
myoung | rlandy|rover: it's already in flight 2018-05-31 18:36:13,294 32625 INFO promoter Promoting the container images for dlrn hash d5ff1f4b7aaeacd78e4ce1254c9428103893c137 on pike to current-tripleo-rdo-internal | 22:43 |
*** d0ugal__ has quit IRC | 22:48 | |
*** d0ugal__ has joined #oooq | 23:03 | |
*** rlandy|rover is now known as rlandy|rover|bbl | 23:09 | |
rlandy|rover|bbl | cool | 23:09 |