*** skramaja has joined #oooq | 02:51 | |
*** ykarel has joined #oooq | 02:57 | |
*** udesale has joined #oooq | 03:27 | |
*** gkadam has joined #oooq | 03:41 | |
*** chem has quit IRC | 04:05 | |
*** ykarel has quit IRC | 04:09 | |
*** dtrainor has quit IRC | 04:22 | |
*** ykarel has joined #oooq | 04:49 | |
*** ykarel has quit IRC | 04:50 | |
*** ykarel has joined #oooq | 04:50 | |
*** ykarel has quit IRC | 04:54 | |
*** ykarel has joined #oooq | 04:54 | |
*** honza has joined #oooq | 05:39 | |
*** chem has joined #oooq | 05:47 | |
*** jfrancoa has joined #oooq | 05:50 | |
*** jtomasek has joined #oooq | 06:16 | |
*** chandankumar is now known as chkumar|pto | 06:35 | |
*** apetrich has joined #oooq | 06:38 | |
*** kopecmartin has joined #oooq | 06:48 | |
*** ccamacho has joined #oooq | 06:50 | |
*** dmellado has quit IRC | 07:00 | |
*** saneax has joined #oooq | 07:01 | |
*** dmellado has joined #oooq | 07:02 | |
*** skramaja has quit IRC | 07:18 | |
*** dtantsur|afk is now known as dtantsur | 07:26 | |
*** tosky has joined #oooq | 07:36 | |
tosky | it looks like a few jobs have been constantly failing with a timeout in the last few days | 07:54 |
tosky | should I keep rechecking, or is there some other issue? | 07:54 |
tosky | (I think that the channel topic should be updated) | 07:54 |
*** d0ugal has joined #oooq | 07:54 | |
*** ykarel is now known as ykarel|lunch | 07:59 | |
jfrancoa | tosky: yes, I was about to ask the same. And from what I see it's mainly impacting the queens overcloud deployment (the upgrades job fails in master for the same reason too, because it deploys a queens overcloud) | 08:03 |
*** skramaja has joined #oooq | 08:08 | |
*** dbecker has joined #oooq | 08:10 | |
*** jtomasek has quit IRC | 08:15 | |
*** jtomasek has joined #oooq | 08:16 | |
rascasoft | marios|rover, hey man are you still rover? | 08:21 |
rascasoft | tosky, jfrancoa, same here, btw | 08:21 |
jfrancoa | I wanted to know if there is any LP opened for it. I'm trying to reproduce it with reproducer-quickstart.sh, and if I find anything I'd like to add it to the LP | 08:22 |
rascasoft | ssbarnea, you around? | 08:27 |
rascasoft | sshnaidm, maybe you? I'm trying to get some help to understand what is going on with the ironic-python-agent.initramfs in oooq | 08:28 |
rascasoft | for some reason my deployments are failing while preparing the images, because of the absence of this file | 08:29 |
rascasoft | (and essentially because the image upload action fails) | 08:29 |
sshnaidm | rascasoft, which release? | 08:29 |
rascasoft | but before everything was fine. Was something removed? | 08:29 |
rascasoft | sshnaidm, any release | 08:29 |
rascasoft | sshnaidm, to give you a failing job link: https://rhos-dev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/oooq-master-rdo_trunk-bmu-ha-lab-cygnus-float_nic_with_vlans/20/console | 08:30 |
rascasoft | sshnaidm, as you can see the overcloud-prep-image is failing. Failing env is accessible | 08:30 |
rascasoft | sshnaidm, the problem is the one I described: even if I declared download_overcloud_image: true there's no trace of initramfs image | 08:31 |
sshnaidm | rascasoft, I'd like to see /home/stack/overcloud_prep_images.log | 08:33 |
rascasoft | sshnaidm, I was taking for granted that download_overcloud_image: true also included ipa images, but this seems not to be the case | 08:40 |
*** ykarel|lunch is now known as ykarel | 08:41 | |
sshnaidm | rascasoft, I don't think it's a problem | 08:42 |
sshnaidm | rascasoft, did it start in last days? | 08:42 |
rascasoft | sshnaidm, yes | 08:42 |
rascasoft | but last days could be last 3 weeks | 08:42 |
rascasoft | before my PTO it was working | 08:43 |
sshnaidm | rascasoft, but you can see when jobs started to fail? | 08:44 |
rascasoft | sshnaidm, do you think it could be reasonable if I prepare a review that includes also ipa images in here: https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/overcloud-prep-images/templates/overcloud-prep-images.sh.j2#L29 | 08:44 |
rascasoft | ? | 08:45 |
sshnaidm | rascasoft, no, because it's not a root cause for problem | 08:45 |
rascasoft | sshnaidm, well it's hard to say because we fast forwarded the internal repo last week | 08:45 |
sshnaidm | rascasoft, I'm afraid it's because of https://github.com/openstack/tripleo-quickstart-extras/commit/36fec92e39c8727ef6ffdd2512ccbd5a77613c16 | 08:45 |
rascasoft | and everything went kaboom | 08:45 |
rascasoft | sshnaidm, why? This thing is perfectly fine | 08:46 |
sshnaidm | rascasoft, because it's last change in recent days that can affect your job.. | 08:46 |
rascasoft | sshnaidm, and it's just a split of the original playbook, without touching any code | 08:46 |
rascasoft | sshnaidm, LOL this was supposed to FIX everything :D | 08:47 |
sshnaidm | yeah, theoretically we write without bugs :D | 08:47 |
rascasoft | sshnaidm, you're not considering the fast forward | 08:47 |
rascasoft | sshnaidm, it could be anything | 08:47 |
rascasoft | sshnaidm, and note, we needed to do the fast forward to include this code | 08:48 |
sshnaidm | rascasoft, what is time range? | 08:48 |
rascasoft | sshnaidm, we need to ask to who made the fast forward last time | 08:48 |
rascasoft | sshnaidm, but I'd say not more than a month | 08:48 |
rascasoft | sshnaidm, which in terms of oooq is an entire AGE | 08:49 |
sshnaidm | I see.. | 08:49 |
sshnaidm | well, need to check | 08:49 |
sshnaidm | rascasoft, do you have such output without debug and download_overcloud_image:true ? | 08:50 |
rascasoft | sshnaidm, oh you can check one of the previous jobs https://rhos-dev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/oooq-master-rdo_trunk-bmu-ha-lab-cygnus-float_nic_with_vlans/ | 08:51 |
sshnaidm | rascasoft, ok, looking | 08:51 |
rascasoft | sshnaidm, but there are lot of them | 08:51 |
rascasoft | let me help you | 08:51 |
sshnaidm | rascasoft, and all with debug? | 08:52 |
rascasoft | sshnaidm, with debug what do you mean exactly? At what level? | 08:53 |
ykarel | tosky, jfrancoa which jobs are impacted, any link? | 08:53 |
ykarel | i was also seeing some timeouts, but those were non-voting | 08:53 |
sshnaidm | rascasoft, I mean without ansible debug, but seems like it's only with it.. nevermind | 08:53 |
rascasoft | sshnaidm, yes | 08:54 |
tosky | ykarel: all jobs on stable/queens | 08:54 |
rascasoft | sshnaidm, but I don't think you'll find any useful additional info there | 08:54 |
tosky | ykarel: for tripleo-heat-templates | 08:54 |
ykarel | tosky, okk checking | 08:54 |
*** skramaja has quit IRC | 08:54 | |
jfrancoa | ykarel: here in stable/queens many of them https://review.openstack.org/#/c/567224/ | 08:54 |
sshnaidm | rascasoft, you need to enable ara on your jobs.. | 08:54 |
tosky | (at least) | 08:54 |
rascasoft | sshnaidm, we had an enormous amount of problems with these deploys while trying to make them work on Firdat | 08:54 |
rascasoft | *Friday | 08:54 |
rascasoft | sshnaidm, can you point me somewhere where it's done so I can replicate? | 08:55 |
sshnaidm | rascasoft, yeah, I'll show later, it's from upstream CI | 08:57 |
sshnaidm | rascasoft, you don't build images in job, right? | 08:57 |
ykarel | tosky, jfrancoa hmm those are consistently timing out since the 23rd, there should at least be a bug for those | 08:58 |
*** skramaja has joined #oooq | 08:58 | |
rascasoft | sshnaidm, nope | 08:58 |
rascasoft | sshnaidm, I just use images: with overcloud-full and ipa_images | 08:58 |
sshnaidm | rascasoft, how were images downloaded before? or not downloaded? | 08:59 |
rascasoft | sshnaidm, that's a good question. I just relied on those parameters (which were populated by the different config files, depending on the release) | 09:00 |
sshnaidm | rascasoft, ok, will look.. because there is no downloading images part in your playbook | 09:01 |
sshnaidm | rascasoft, and that's the main problem.. | 09:01 |
rascasoft | sshnaidm, for example, in the case of downstream, I used repo_cmd_after: to get them from the local repo | 09:01 |
sshnaidm | rascasoft, in which file is it? | 09:01 |
rascasoft | sshnaidm, for example http://git.app.eng.bos.redhat.com/git/tripleo-environments.git/tree/config/release/rhos-13.yml#n33 | 09:02 |
rascasoft | sshnaidm, for downstream I use this way | 09:02 |
sshnaidm | rascasoft, well, that explains | 09:04 |
rascasoft | sshnaidm, uhm, help me to understand | 09:04 |
*** holser_ has joined #oooq | 09:07 | |
sshnaidm | rascasoft, looking.. | 09:07 |
*** skramaja has quit IRC | 09:09 | |
sshnaidm | rascasoft, I understand why it doesn't work, but can't understand how it worked before.. | 09:12 |
rascasoft | sshnaidm, ok let me try to help, it is not working, why? | 09:12 |
sshnaidm | rascasoft, overcloud_prep_images script expects images to be in its directory | 09:13 |
sshnaidm | rascasoft, afaik it didn't change ever | 09:13 |
sshnaidm | rascasoft, your images are in /usr/share/rhosp-director-images | 09:14 |
sshnaidm | rascasoft, of course the script doesn't know about it and fails | 09:14 |
rascasoft | sshnaidm, ok, also git blame confirms it, now let's make a step backwards | 09:14 |
rascasoft | sshnaidm, look at one job that worked: | 09:14 |
rascasoft | sshnaidm, note this was BEFORE the fast forward: https://rhos-dev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/oooq-rhos-13-puddle-bmu-had00-lab-float_nic_with_vlans/24/ | 09:15 |
rascasoft | sshnaidm, for this we have full logs, but if you look at the console output you'll see that overcloud-prep-images worked fine | 09:16 |
rascasoft | sshnaidm, (damn logs aren't available anymore for this job) | 09:17 |
sshnaidm | rascasoft, well, without logs it doesn't help | 09:18 |
rascasoft | sshnaidm, at least is a proof that before it worked | 09:18 |
sshnaidm | rascasoft, maybe something changed in tripleo-environments repo..? | 09:20 |
rascasoft | sshnaidm, it changed but just in terms of names, config files have the same content | 09:21 |
sshnaidm | rascasoft, ok | 09:22 |
sshnaidm | rascasoft, as a quick hack I'd set "tar -xvf images " in repo_cmd_after, so that they will be extracted to "{{ working_dir }}" | 09:31 |
rascasoft | sshnaidm, and that is fine, but I can also give you an env related to master in which things are not working the same | 09:32 |
sshnaidm | rascasoft, for a long range I'd wait for rlandy to ask how images appear in downstream jobs usually, because I'm not so familiar with that | 09:32 |
rascasoft | sshnaidm, ack | 09:33 |
sshnaidm | rascasoft, yeah, because there is no part in your playbook that fetches them, and I don't know if it's a bug or something intentional | 09:33 |
sshnaidm | rascasoft, so before we start to rewrite all there, I'd ask rlandy :) | 09:33 |
rascasoft | sshnaidm, sure | 09:33 |
rascasoft | makes sense | 09:33 |
rascasoft | I'll continue investigations | 09:33 |
rascasoft | sshnaidm, thanks for now | 09:36 |
sshnaidm | rascasoft, np | 09:36 |
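A minimal sketch of the quick hack discussed above (unpacking the image archives into the working directory so the prep-images script can find them), written in Python rather than as the actual `repo_cmd_after` shell snippet. The function names, the `*.tar` file pattern, and the default required file list are assumptions made for illustration; only the `/usr/share/rhosp-director-images` location and the `ironic-python-agent.initramfs` file name come from the conversation.

```python
import tarfile
from pathlib import Path


def extract_image_archives(source_dir, working_dir):
    """Unpack every *.tar archive found in source_dir into working_dir.

    Returns the names of the archive members that were extracted.
    The *.tar pattern is an assumption about how the images are packaged.
    """
    source = Path(source_dir)
    target = Path(working_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(source.glob("*.tar")):
        with tarfile.open(archive) as tar:
            for member in tar.getmembers():
                # Skip members that would land outside the working directory.
                if member.name.startswith("/") or ".." in Path(member.name).parts:
                    continue
                tar.extract(member, path=target)
                extracted.append(member.name)
    return extracted


def missing_images(working_dir, required=("ironic-python-agent.initramfs",)):
    """Return the required image files that are absent from working_dir."""
    base = Path(working_dir)
    return [name for name in required if not (base / name).exists()]
```

Something like `extract_image_archives("/usr/share/rhosp-director-images", working_dir)` followed by `missing_images(working_dir)` would show up front whether the initramfs is in place before the image upload step runs. Whether this belongs in the job at all is the open question left for rlandy above.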
honza | the chan announce says rdo cloud is down but the date is two weeks ago, is it still accurate? still down? | 09:42 |
honza | is there a dashboard somewhere that's kept up to date? | 09:43 |
*** dtantsur is now known as dtantsur|brb | 09:44 | |
honza | i'm getting this error using reproducer, any ideas? Could not find or access '/tmp/.quickstart/playbooks/overcloud-validate-ha.yml' | 09:50 |
honza | this is in /tmp/logs/quickstart_install.log | 09:50 |
sshnaidm | honza, is reproducer of latest job? | 09:58 |
sshnaidm | honza, what is in quickstart-extras-requirements.txt? | 09:59 |
honza | sshnaidm: i'm using legacy-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master from https://review.openstack.org/#/c/596618/1 which seems to have passed in the last 24h | 10:06 |
sshnaidm | honza, better to paste the whole log | 10:06 |
sshnaidm | honza, but afaik ovb reproducer is broken, although not because of this error.. | 10:07 |
honza | sshnaidm: https://paste.fedoraproject.org/paste/TyjjJeBw2Lurh88IEUhrQQ | 10:07 |
honza | sshnaidm: do you have a bug handy? | 10:07 |
sshnaidm | honza, I think this one https://bugs.launchpad.net/tripleo/+bug/1787910 but not sure, didn't try it recently | 10:09 |
openstack | Launchpad bug 1787910 in tripleo "OVB overcloud deploy fails on nova placement errors" [Critical,Triaged] - Assigned to Marios Andreou (marios-b) | 10:09 |
honza | sshnaidm: thanks --- should i open a bug for the above? or am i doing something wrong? it doesn't work on master either, and it did work recently | 10:10 |
ykarel | honza, sshnaidm that bug is only from promotion job(latest nova), shouldn't be related to ovb reproducer | 10:12 |
sshnaidm | honza, yes, please | 10:12 |
sshnaidm | ykarel, ok, just saw weshay|rover mails about it.. | 10:13 |
sshnaidm | honza, seems like last changes affected it.. I'll look at it | 10:13 |
honza | sshnaidm: i'll open one now, thanks for looking | 10:14 |
ykarel | honza, u just ran that reproducer https://logs.rdoproject.org/18/596618/1/openstack-check/legacy-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035-master/7aaed09/logs/reproducer-quickstart.sh? have u used something custom? | 10:14 |
honza | ykarel: no changes | 10:15 |
honza | ykarel: actually, it was featureset001 | 10:15 |
ykarel | honza, ack, that should not matter, at least for the error u shared | 10:16 |
ykarel | honza, so https://review.openstack.org/#/c/537669/ must have caused it | 10:18 |
honza | ykarel: +1 | 10:19 |
ykarel | and this needs to be fixed in ovb recreate, afair this file was removed at that time | 10:19 |
ykarel | so it's worth filing a bug for it | 10:20 |
honza | bug here https://bugs.launchpad.net/tripleo/+bug/1789192 | 10:20 |
openstack | Launchpad bug 1789192 in tripleo "OVB reproducer fails during install because of missing file" [High,Triaged] | 10:20 |
honza | sshnaidm: ykarel thanks folks! | 10:21 |
ykarel | sshnaidm, can u please review https://review.openstack.org/#/c/596618/ | 10:22 |
*** jaosorior has joined #oooq | 10:25 | |
sshnaidm | ykarel, done. Does somebody use these files? | 10:28 |
sshnaidm | ykarel, I mean consistent | 10:28 |
ykarel | sshnaidm, don't think the consistent ones are used by anyone | 10:29 |
*** dtantsur|brb is now known as dtantsur | 10:41 | |
sshnaidm | honza, found the problem, will submit a patch.. | 11:03 |
*** apetrich has quit IRC | 11:04 | |
*** apetrich has joined #oooq | 11:13 | |
*** gkadam has quit IRC | 11:23 | |
*** gkadam has joined #oooq | 11:23 | |
*** udesale has quit IRC | 11:38 | |
*** ykarel is now known as ykarel|away | 11:42 | |
*** ykarel|away has quit IRC | 11:46 | |
*** dtrainor has joined #oooq | 12:05 | |
*** rfolco has joined #oooq | 12:28 | |
*** rlandy has joined #oooq | 12:30 | |
*** ykarel|away has joined #oooq | 12:31 | |
*** ykarel|away is now known as ykarel | 12:32 | |
*** ykarel_ has joined #oooq | 12:36 | |
*** ykarel has quit IRC | 12:39 | |
weshay|rover | sshnaidm, are you able to get to create_complete in ovb in your tenant? | 12:40 |
weshay|rover | rlandy, ^ | 12:40 |
sshnaidm | weshay|rover, running now | 12:41 |
rlandy | weshay|rover: sshnaidm: I think https://review.openstack.org/#/c/596689/ is missing a piece - pls see https://review.openstack.org/#/c/581488/32/roles/build-test-packages/defaults/main.yml | 12:42 |
rlandy | full fixes for reproducer ^^ | 12:42 |
rlandy | I've been trying to get that through gates for two weeks :( | 12:42 |
weshay|rover | lolz | 12:43 |
weshay|rover | k | 12:43 |
weshay|rover | rlandy, nice job on osp-13 | 12:43 |
*** trown|outtypewww is now known as trown | 12:44 | |
rlandy | weshay|rover: rhos-13 - eventually, now I need to put the --extra-vars settings into a ci-rhos settings file and make sure that overrides fs | 12:45 |
rlandy | ie: node cleaning note working | 12:45 |
rlandy | not | 12:45 |
rlandy | rascasoft: hi - took a shot at fixing your job ... https://rhos-dev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/oooq-master-rdo_trunk-bmu-ha-lab-cygnus-float_nic_with_vlans | 12:50 |
rlandy | add stuff to the job itself as extra-vars | 12:50 |
rlandy | also enabled log collection | 12:50 |
rlandy | weshay|rover: ^^ | 12:50 |
weshay|rover | ah cool.. | 12:51 |
weshay|rover | thank you | 12:51 |
*** gkadam has quit IRC | 12:51 | |
weshay|rover | |\n| stack_status | CREATE_FAILED |\n| stack_status_reason | Resource CREATE failed: ResourceInError: resources.undercloud_env.resources.undercloud_server: Went to status ERROR due to \"Message: No valid host was found. There are | 12:52 |
weshay|rover | not enough hosts available., Code: 500\" |\n| parameters | OS::project_id: 4baea7454bf4451aa56da82fc5baf6f6 | 12:52 |
weshay|rover | ovb recreate :( | 12:52 |
ykarel_ | :( dtantsur also asking ^^ | 12:53 |
*** ykarel_ is now known as ykarel | 12:53 | |
dtantsur | ykarel: is it OVB itself? if yes, it's not ironic :) but today I could not create a m1.large2 server on OVB | 12:54 |
dtantsur | also failed with no valid hosts found. I had to create m1.large (without 2) | 12:54 |
ykarel | dtantsur, yes rdo cloud out of hosts, so it will fail i guess even for m1.large | 12:54 |
dtantsur | worked for me, but maybe I took one of last slots :) | 12:55 |
ykarel | :) | 12:55 |
ykarel | good to send a reminder again to folks who are not using their vms, should cleanup for others | 12:55 |
weshay|rover | ya.. the cloud could be in fact out of resources which makes this very hard to debug | 12:56 |
sshnaidm | weshay|rover, deploy failed, but in ansible step on one of controllers: ERROR] stdout: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Error, some other host (FE:16:3E:39:58:1A) already uses address 172.18.0.28." | 12:56 |
sshnaidm | weshay|rover, heat stack is create_complete | 12:56 |
dtantsur | this helps us not to forget that cloud is just someone else's computers :) | 12:57 |
weshay|rover | sshnaidm, ah. I hit that yesterday | 12:57 |
weshay|rover | dtantsur, :) tru | 12:57 |
*** myoung|training is now known as myou7ng | 12:57 | |
*** myou7ng is now known as myoung | 12:57 | |
sshnaidm | rlandy, what was a problem with rascasoft jobs? | 12:58 |
sshnaidm | rlandy, we looked at it together today, I didn't get how it worked before without fetching images | 12:58 |
rascasoft | rlandy oh thanks, I'm checking what you passed to the new job | 13:00 |
myoung | rlandy, rascasoft: catching up from last week, do we need rocky jobs in rdo2 still? | 13:00 |
rascasoft | rlandy, so I'm seeing --extra-vars to_build=true and --extra-vars cacheable=true | 13:01 |
rlandy | sshnaidm: no to_build passed | 13:01 |
rlandy | looked at diff with ovb | 13:01 |
weshay|rover | sshnaidm, rfolco mtg | 13:02 |
*** ykarel has quit IRC | 13:03 | |
*** ykarel has joined #oooq | 13:03 | |
weshay|rover | sshnaidm, rfolco ????? | 13:04 |
sshnaidm | weshay|rover, trying very hard :(\ | 13:04 |
weshay|rover | lol :) | 13:04 |
weshay|rover | k | 13:04 |
rascasoft | rlandy, I need an explanation :) | 13:05 |
rlandy | rascasoft: I just checked what role was not running and why | 13:06 |
rlandy | and did a compare | 13:06 |
rlandy | we can see if it worked | 13:06 |
rlandy | we also didn't have logs before which made debug harder | 13:06 |
rascasoft | rlandy, but what did you compare? And how did you add logs? | 13:07 |
sshnaidm | rlandy, please include bug number in https://review.openstack.org/#/c/581488/ | 13:09 |
myoung | weshay|rover, rlandy: updated https://code.engineering.redhat.com/gerrit/#/c/147921 (rdo2: rocky) - I think it should be good to go for a merge / first test. | 13:39 |
myoung | rascasoft: ^^ will cause cygnus to start running rocky jobs as part of new promoted rdo1 hashes (for rocky when we get there) | 13:42 |
myoung | rascasoft: ^^ oooq-rocky-rdo_trunk-bmu-ha-lab-cygnus-float_nic_with_vlans | 13:43 |
rascasoft | myoung, /me checks | 13:43 |
rascasoft | rlandy, it failed while creating the image | 13:51 |
rascasoft | rlandy, do you think we can have a quick chat on this today? I'm losing too many pieces... | 13:52 |
*** saneax has quit IRC | 14:01 | |
rlandy | panda: https://review.openstack.org/#/c/596422/ | 14:03 |
rlandy | updated to put back the removal of legacy playbook | 14:03 |
rlandy | I did not delete the playbook from git yet | 14:03 |
rlandy | you can work in that review | 14:03 |
rlandy | rascasoft: was in meeting - looking at your error | 14:04 |
panda | rlandy: ok | 14:04 |
weshay|rover | rascasoft, what questions do you have re: your job? | 14:11 |
weshay|rover | https://thirdparty.logs.rdoproject.org/jenkins-oooq-master-rdo_trunk-bmu-ha-lab-cygnus-float_nic_with_vlans-24/ | 14:11 |
*** vkapalav has joined #oooq | 14:12 | |
rascasoft | weshay|rover, I just want to understand why everything exploded | 14:15 |
rascasoft | weshay|rover, it's not clear to me why the last fast forward on the internal repo broke everything | 14:15 |
rlandy | rascasoft: trying same approach we use with https://rhos-dev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/tripleo-quickstart-master-rdo_trunk-baremetal-dell_fc430_envB-single_nic_vlans/configure | 14:16 |
rascasoft | weshay|rover, it can't be the split of the playbook, because things are the same there, but all of sudden overcloud-prep-images | 14:16 |
rascasoft | started failing | 14:16 |
weshay|rover | rascasoft, ok.. let's look | 14:16 |
weshay|rover | rascasoft, isolation-image? | 14:19 |
rascasoft | weshay|rover, what is isolation-image? | 14:19 |
weshay|rover | rascasoft, virt-cat -a isolation-image.qcow2 /tmp/builder.log > builder.log 2>&1 || true\n virt-cat -a isolation-image.qcow2 /ironic-python-agent.log > ironic-python-agent.log 2>&1 || true\n virt-cat -a isolation-image.qcow2 /overcloud-full.log | 14:20 |
weshay|rover | rascasoft, that is what is failing | 14:20 |
weshay|rover | rascasoft, from the undercloud | 14:20 |
rascasoft | weshay|rover, ok, but my question here is: why are we building image again? We *never* did it in the past, at least for baremetal deployments | 14:20 |
weshay|rover | that's not building the image | 14:21 |
weshay|rover | rascasoft, in the sense of using DIB | 14:21 |
rascasoft | weshay|rover, I'm officially lost | 14:21 |
weshay|rover | rascasoft, this is bm undercloud right | 14:21 |
weshay|rover | rascasoft, this error I suspect is still a result of not following upstream config.. | 14:22 |
weshay|rover | rascasoft, I'd have to trace through your config.. ensure that a few settings are turned off | 14:22 |
rascasoft | weshay|rover, yes, plain and simple | 14:22 |
weshay|rover | rascasoft, but srsly. what is the isolation image? | 14:22 |
weshay|rover | rascasoft, where is the config for this job | 14:22 |
rascasoft | weshay|rover, I'm having a hard time following this discussion | 14:23 |
rascasoft | am I supposed to answer? | 14:23 |
weshay|rover | rascasoft, -config $WORKSPACE/tripleo-environments/hardware_environments/ha-lab-cygnus/network_configs/float_nic_with_vlans/config_files/config.yml \ | 14:24 |
* weshay|rover looks at this | 14:24 | |
rascasoft | weshay|rover, which is http://git.app.eng.bos.redhat.com/git/tripleo-environments.git/tree/hardware_environments/ha-lab-cygnus/network_configs/float_nic_with_vlans/config_files/config.yml | 14:25 |
rlandy | rascasoft: weshay|rover: I only attempted the image build as a test | 14:25 |
rlandy | I removed that and defined ipa_image and reran | 14:26 |
rlandy | watching this now ... https://rhos-dev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/oooq-master-rdo_trunk-bmu-ha-lab-cygnus-float_nic_with_vlans/25/console | 14:26 |
rascasoft | rlandy, shouldn't ipa_image be defined in the general release config file? | 14:26 |
rlandy | rascasoft; looks like you are picking up a build | 14:27 |
rlandy | I can just 'build now' on jenkins | 14:27 |
rascasoft | rlandy, I can do it for you | 14:27 |
rascasoft | but if you do "rebuild last" it should do the trick | 14:27 |
rlandy | rascasoft: yeah - I did that | 14:28 |
rascasoft | rlandy, and it didn't ask you to confirm the three parameters? | 14:28 |
rlandy | I think I did retry | 14:28 |
weshay|rover | rascasoft, we need to be as focused on destroying this file | 14:28 |
weshay|rover | http://git.app.eng.bos.redhat.com/git/tripleo-environments.git/tree/hardware_environments/ha-lab-cygnus/network_configs/float_nic_with_vlans/config_files/config.yml | 14:28 |
weshay|rover | as possible | 14:28 |
rascasoft | rlandy, in any case, since we're picking master, ipa_image_url is defined here https://github.com/openstack/tripleo-quickstart/blob/master/config/release/trunk/master.yml#L5 isn't it? | 14:28 |
weshay|rover | rascasoft, I think rlandy will have you there.. once we get through rdo sf hell | 14:29 |
weshay|rover | rascasoft, that is your goal though.. just enough config required to run your extra validation | 14:29 |
rascasoft | weshay|rover, I'm all open on destroying everything we need, but first I want to have success again | 14:29 |
weshay|rover | rascasoft, aye | 14:29 |
rlandy | rascasoft: so you don't care about a particular build? or you do? | 14:30 |
rascasoft | rlandy, no I don't care | 14:30 |
rlandy | rascasoft: I'm comparing where we get the images on vm undercloud vs. bm undercloud | 14:36 |
panda | rlandy: is this the job you're using to test zuulv3 in rdo ? https://review.rdoproject.org/r/gitweb?p=rdo-jobs.git;a=blob;f=zuul.d/zuul-v3-jobs.yaml;h=9af8988e1a6449eac3c5e17231376f4c3882fe99;hb=refs/heads/master | 14:36 |
rlandy | panda: yes - ... I had a trigger job I reverted | 14:37 |
rlandy | but you can readd it to test | 14:37 |
* rlandy gets | 14:37 | |
rlandy | panda: https://review.rdoproject.org/r/#/c/15097/4/zuul.d/tripleo.yaml | 14:38 |
rlandy | ^^ if you use that you will need a change to browbeat minimal to test ... | 14:38 |
panda | rlandy: do you think we can create an intermediate parent for rdo job ? a tripleo-ci-rdo-base that will inherit from tripleo-ci-base, specifying some basic variables we'll always need in rdo ? | 14:38 |
rlandy | https://review.openstack.org/#/c/596370/ | 14:38 |
rlandy | panda: I had a review out there to separate ovb and multinode ... | 14:39 |
rlandy | https://review.openstack.org/#/c/593063/ | 14:39 |
rlandy | we could add rdo vars there | 14:39 |
rlandy | but what would be rdo specific? | 14:39 |
panda | rlandy: the te-broker ip | 14:40 |
panda | rlandy: and I guess the timeouts | 14:41 |
panda | rlandy: also, nodesets | 14:41 |
ssbarnea | weshay|rover: why this failed to merge? zuul reported that dependent change failed to merge but both CRs listed as depends-on are already merged. | 14:41 |
rlandy | panda: sure - create an intermediate job | 14:42 |
rlandy | whatever it takes at this point | 14:42 |
panda | rlandy: ok | 14:42 |
rlandy | let's just try keep the reviews together so we actually test on commit | 14:42 |
rascasoft | rlandy, I'm wondering why I can add ipa_image_url in here https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/overcloud-prep-images/templates/overcloud-prep-images.sh.j2#L29 ? | 14:50 |
rlandy | rascasoft: why not? | 14:51 |
rascasoft | rlandy, I'll prepare a review and add it to the job as a manual workaround | 14:51 |
rascasoft | let's see how it performs | 14:51 |
rlandy | rascasoft: I am comparing https://rhos-dev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/tripleo-quickstart-queens-rdo_trunk-baremetal-hp_dl360_envD-single_nic_vlans/ | 14:51 |
rlandy | that seems to be passing | 14:52 |
rlandy | or better yet master of that | 14:52 |
rlandy | rascasoft: if you look at the config here ... we are using fs001 | 14:52 |
rlandy | for example ... https://thirdparty.logs.rdoproject.org/jenkins-tripleo-quickstart-master-rdo_trunk-baremetal-hp_dl360_envD-single_nic_vlans-147/undercloud/home/stack/overcloud-prep-images.sh | 14:53 |
rlandy | openstack overcloud image upload is blank | 14:55 |
rascasoft | rlandy, what do you mean? | 14:58 |
rlandy | rascasoft: ignore me - I am just debugging out loud | 14:59 |
rascasoft | rlandy, in the meantime I'm going to test this https://review.openstack.org/596789 | 15:01 |
rascasoft | rlandy, it failed again on prep image, I'll take the control over the jenkins job to try the above patch, ok? | 15:05 |
rlandy | rascasoft: yes | 15:05 |
rlandy | panda: you can see the failures from removing legacy showing up already... https://review.openstack.org/#/c/596422/ | 15:07 |
panda | rlandy: yeah, working on them | 15:07 |
rlandy | weshay|rover: "Create a tracking bug in launchpad tripleo w/ promotion-blocker to summarize this issue at this point." - which exact issue? there are a wealth of them | 15:16 |
panda | rlandy: surprise ! https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/overcloud-deploy/defaults/main.yml#L96 | 15:23 |
panda | rlandy: things instantly got more complicated. we are using /etc/nodepool in our tqe roles too. | 15:24 |
rlandy | oh dear | 15:26 |
rlandy | panda: can we define it elsewhere? | 15:26 |
rlandy | panda: I think that is used in that complex 3 node test | 15:27 |
panda | rlandy: also, some of the env variables set here are used in https://github.com/openstack/tripleo-heat-templates/blob/master/deployed-server/scripts/enable-ssh-admin.sh#L9 | 15:30 |
rlandy | panda: it was paul's suggestion we could just retrieve that info from the inventory - your thought? | 15:32 |
panda | rlandy: there is no key information in the inventory, I have to understand what key we are looking for and set that. For the /etc/nodepool/subnodes_private, we have that information in the inventory. But if I change that part of the code in overcloud-deploy, I'm not sure how to move. What data structure do we have that will be common across all workflows, like /etc/nodepool/ was ? | 15:37 |
rlandy | panda: we also have a reference here ... https://github.com/openstack/tripleo-heat-templates/blob/master/ci/common/net-config-multinode-os-net-config.yaml#L136 | 15:41 |
rlandy | for the overcloud deploy ref ... | 15:42 |
*** ykarel has quit IRC | 15:42 | |
rlandy | I think we can work around that | 15:42 |
* rlandy tries | 15:42 | |
*** ykarel has joined #oooq | 15:43 | |
rlandy | https://github.com/openstack/tripleo-quickstart/blob/master/config/nodes/2ctlr.yml#L14 | 15:50 |
rlandy | ^^ as well | 15:50 |
weshay|rover | rlandy, just the fact we have $something blocking the update of rdo jobs to zuulv3 | 15:59 |
weshay|rover | rlandy, things outside our control that need coordination | 15:59 |
weshay|rover | panda, rlandy where are you guys reconstructing /etc/nodepool/$config | 16:01 |
weshay|rover | ? | 16:01 |
rlandy | not sure yet ... | 16:02 |
rlandy | I think we need a new playbook in that same spot | 16:02 |
rlandy | to pull it out of inventory | 16:02 |
rlandy | unless panda has a better idea | 16:03 |
rlandy | http://logs.openstack.org/63/593063/9/check/tripleo-ci-centos-7-containers-multinode/3370f9e/zuul-info/inventory.yaml - example | 16:04 |
rlandy | subnodes defined there | 16:04 |
rlandy | http://logs.openstack.org/63/593063/9/check/tripleo-ci-centos-7-containers-multinode/3370f9e/logs/undercloud/etc/nodepool/ - claims these are empty archives :( | 16:05 |
weshay|rover | rlandy, wouldn't a jinja template fix this? | 16:09 |
weshay|rover | we have all the info in the inventory | 16:09 |
*** jfrancoa has quit IRC | 16:09 | |
*** ccamacho is now known as ccamacho|brb | 16:10 | |
weshay|rover | rlandy, panda do we have an example of what it should look like? | 16:11 |
panda | weshay|rover: /etc/nodepool/provider is in inventory.hosts.primary.nodepool.provider, /etc/nodepool/sub_node_private is again in inventory.hosts.$hosts.nodepool.private_ipv4. There is no replacement for /etc/nodepool/id_rsa in the inventory, because it's assumed that zuul will take care of contacting the subnodes, and probably the undercloud already has that key somewhere, but I still have to understand how we are | 16:21 |
panda | using it | 16:21 |
weshay|rover | ya.. panda I'm looking at jobs now w/ it | 16:22 |
weshay|rover | maybe I need to look at rdo jobs | 16:22 |
panda | ok /etc/nodepool/id_rsa is just /home/zuul/.ssh/id_rsa for user zuul | 16:24 |
panda | so we are ok if we are using the default value in the THT template | 16:25 |
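The mapping panda describes above can be sketched roughly as follows, assuming the inventory has already been parsed into a plain dict shaped the way the message spells it out (`hosts.<name>.nodepool.provider` and `hosts.<name>.nodepool.private_ipv4`). The function name, the one-value-per-line file contents, and treating every non-`primary` host as a subnode are all assumptions for illustration, not the actual tripleo-ci implementation. The file names are used as written in the chat.

```python
def nodepool_files_from_inventory(inventory):
    """Rebuild legacy /etc/nodepool-style file contents from an inventory dict.

    Returns a mapping of file name to text content. The layout of each file
    (one value per line) is an assumption made for this sketch.
    """
    hosts = inventory["hosts"]
    provider = hosts["primary"]["nodepool"]["provider"]
    # Every host other than "primary" is treated as a subnode here.
    subnode_ips = [
        data["nodepool"]["private_ipv4"]
        for name, data in sorted(hosts.items())
        if name != "primary"
    ]
    return {
        "provider": provider + "\n",
        "sub_node_private": "".join(ip + "\n" for ip in subnode_ips),
    }
```

There is no `id_rsa` entry because, as noted above, the key is not in the inventory; per the discussion it corresponds to `/home/zuul/.ssh/id_rsa` for the zuul user.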
*** ykarel is now known as ykarel|dinner | 16:26 | |
panda | updated https://review.openstack.org/596422 let's see how far we can get. | 16:27 |
*** trown is now known as trown|lunch | 16:29 | |
rlandy | panda: thanks | 16:31 |
*** ykarel|dinner has quit IRC | 16:33 | |
weshay|rover | sshnaidm, you still around? | 16:35 |
sshnaidm | weshay|rover, yep | 16:35 |
weshay|rover | sshnaidm, have a second? | 16:36 |
sshnaidm | weshay|rover, yep | 16:36 |
rlandy | http://logs.openstack.org/22/596422/4/check/tripleo-ci-centos-7-scenario009-multinode-oooq/dc4dab8/job-output.txt.gz#_2018-08-27_16_33_54_014438 | 16:36 |
rlandy | panda: ^^ | 16:36 |
rlandy | https://github.com/openstack-infra/tripleo-ci/blob/master/playbooks/tripleo-ci/run-v3.yaml#L40 | 16:39 |
rascasoft | rlandy, I was really close to the solution, then I hit the pip/cache problem :( relaunched right now, let's see | 16:41 |
weshay|rover | sshnaidm, http://logs.openstack.org/32/591632/2/check/tripleo-ci-centos-7-scenario001-multinode-oooq-container/caff28b/ara-report/ | 16:41 |
*** kopecmartin has quit IRC | 16:41 | |
weshay|rover | sshnaidm, http://logs.openstack.org/32/591632/2/check/tripleo-ci-centos-7-scenario001-multinode-oooq-container/caff28b/ara-report/result/8cc8545a-8b07-445e-9cb0-59d745a9ea97/ | 16:42 |
rlandy | zuul_info_dir: "{{ zuul.executor.log_root }}/zuul-info" | 16:46 |
rlandy | rascasoft: k - let us know | 16:46 |
weshay|rover | rascasoft, f.. pip cache | 16:49 |
rlandy | panda: you around? | 16:53 |
panda | rlandy: yes, uploaded new PS | 16:56 |
rlandy | panda: k - thanks | 16:57 |
weshay|rover | rascasoft, http://jinja.quantprogramming.com/ | 17:03 |
weshay|rover | rascasoft, that must be a bug in to_nice_yaml | 17:03 |
weshay|rover | rascasoft, /me compares {{ overcloud_roles | default('') | to_nice_yaml }} | 17:04 |
weshay|rover | w/ {{ overcloud_roles | default('') }} | 17:04 |
weshay|rover | rlandy, panda so both upstream and rdo have empty files in /etc/nodepool | 17:11 |
weshay|rover | on the undercloud :( | 17:11 |
*** vkapalav has quit IRC | 17:14 | |
panda | weshay|rover: we are getting rid of any dependency of that directory with (at least) https://review.openstack.org/596422, other reviews may follow. | 17:17 |
panda | will follow | 17:17 |
weshay|rover | ah nice | 17:18 |
weshay|rover | so going the other way.. and getting rid of the requirement | 17:18 |
weshay|rover | ++ | 17:18 |
weshay|rover | that's even better | 17:18 |
weshay|rover | panda, rlandy ya.. we'll need follow ups http://codesearch.openstack.org/?q=%2Fetc%2Fnodepool&i=nope&files=&repos=tripleo-ci,tripleo-quickstart,tripleo-quickstart-extras | 17:19 |
rlandy | yep - right now - main review is failing | 17:20 |
*** ykarel has joined #oooq | 17:20 | |
rlandy | typo | 17:20 |
rlandy | http://logs.openstack.org/22/596422/5/check/tripleo-ci-centos-7-scenario004-multinode-oooq-container/6a19943/job-output.txt.gz#_2018-08-27_17_01_58_386777 | 17:20 |
weshay|rover | panda, we create the nodepool file for the recreate work | 17:20 |
weshay|rover | would it not be easier just to adapt that? | 17:20 |
* rlandy fixes | 17:21 | |
weshay|rover | I guess it would be better to nuke | 17:21 |
weshay|rover | I'm a flip flopper | 17:21 |
weshay|rover | not a small bit of work :( | 17:22 |
rlandy | patch 6 at work | 17:23 |
rlandy | weshay|rover: panda: wow - the never ending story of work rework here | 17:23 |
*** dtantsur is now known as dtantsur|afk | 17:24 | |
weshay|rover | rlandy, ya.. | 17:24 |
weshay|rover | agree | 17:24 |
rlandy | watching patch 6 - let's see how we go | 17:24 |
*** trown|lunch is now known as trown | 17:26 | |
*** ykarel_ has joined #oooq | 17:29 | |
rlandy | te_broker_ip is undefined | 17:30 |
rlandy | ok .. | 17:30 |
panda | rlandy: ok wait | 17:30 |
panda | rlandy: we'll use a default value | 17:31 |
*** ykarel has quit IRC | 17:31 | |
rlandy | use the default we had before | 17:31 |
rlandy | export GEARDSERVER=${TEBROKERIP-192.168.1.1} | 17:32 |
panda | rlandy: that default doesn't make any sense to anyone, it wasn't making sense even for rh1 and rh2, I'll put 192.168.103.254 | 17:33 |
panda | so this will work in rdocloud while we set up the new parents | 17:33 |
rlandy | ok | 17:34 |
rlandy | the real value should be in rdo ovb file though | 17:34 |
rlandy | default could be anything really | 17:35 |
panda | rlandy: we need the real value to be in the inventory, because it's used by toci_* scripts. That's why I wanted to reparent. Or maybe we can set some group vars in the nodeset, and not touch the parents at all. | 17:37 |
rlandy | right now makes no diff as there is only one te-broker | 17:38 |
panda | panda: repeat with me: "the case statement in bash does not work like in C" | 17:46 |
rlandy | lol | 17:48 |
rlandy | ;; | 17:48 |
rlandy | panda: at least it fails fast | 17:48 |
rascasoft | weshay|rover, uhm the to_nice_yaml is a must... or not? | 17:56 |
weshay|rover | rascasoft, can you show me what you are getting for it? | 17:56 |
rascasoft | weshay|rover, I mean, it is what converts the variable into a yaml usable by the deployment... | 17:56 |
weshay|rover | rascasoft, part of the issue is that you are reporting issues via irc | 17:56 |
rascasoft | weshay|rover, oh it was there also before | 17:56 |
weshay|rover | ya | 17:56 |
weshay|rover | it was there before | 17:56 |
*** holser_ has quit IRC | 17:56 | |
rascasoft | weshay|rover, oh are you saying we need a bug? | 17:57 |
rlandy | bug added to https://review.openstack.org/#/c/581488 | 17:57 |
weshay|rover | rascasoft, I'm saying I need more data | 17:57 |
rascasoft | weshay|rover, about what precisely? | 17:57 |
weshay|rover | rascasoft, I plugged that code into a jinja linter and it works w/o to_nice_yaml | 17:58 |
rlandy | cp: cannot stat ‘/etc/nodepool/id_rsa*’: No such file or directory | 17:58 |
weshay|rover | which is an ansible library I think | 17:58 |
weshay|rover | rascasoft, where do you see that in the log | 17:58 |
weshay|rover | 0 | 17:58 |
rascasoft | weshay|rover, ok, but, what if you try to use it inside a deployment? | 17:58 |
rascasoft | weshay|rover, because that's where I found the problem: I was trying to do a composable deployment without declaring the overcloud_roles var | 17:59 |
rascasoft | weshay|rover, expecting the roles to be generated by generate_overcloud_roles | 17:59 |
rascasoft | weshay|rover, but in this way you get an error | 18:00 |
rascasoft | weshay|rover, I've tried to be as precise as possible in the review description | 18:00 |
weshay|rover | rascasoft, bluejeans | 18:00 |
panda | once more unto the patch, my friends. | 18:00 |
rlandy | panda: ack | 18:00 |
weshay|rover | rascasoft, https://bluejeans.com/u/whayutin/ | 18:01 |
rascasoft | weshay|rover, yes master, coming | 18:01 |
rfolco | rlandy, are you working on sub_nodes replacement for removing legacy pre ? | 18:03 |
rfolco | I see https://review.openstack.org/#/c/596422, but some patches are needed on tripleo-ci and tqe to stop using sub_nodes | 18:05 |
rfolco | I can work on that if not duplicating work... otherwise let me know how I can help rlandy panda | 18:06 |
panda | rfolco: I'm working on it | 18:11 |
rfolco | panda, ack | 18:11 |
panda | rfolco: https://review.openstack.org/596422 | 18:11 |
panda | rfolco: I'll probably have to drop at some point, I hope to get at least past playbooks calling | 18:11 |
rfolco | panda, how about tqe ? https://github.com/openstack/tripleo-quickstart-extras/blob/8c09ea39144e1fdc257f5db150173f5597d76ae3/roles/overcloud-deploy/defaults/main.yml#L96 | 18:13 |
*** dtrainor has quit IRC | 18:14 | |
panda | quickstart: you are fired. | 18:23 |
panda | rfolco: yeah so, I think we are recreating that file somewhere? is it possible? otherwise we have to get it from nodepool and pass it directly as extra var | 18:24 |
panda | to quickstart | 18:24 |
*** ykarel_ is now known as ykarel | 18:25 | |
rfolco | panda, I believe we should replace that with ansible inventory var, I can take a look at this part if you want me to | 18:28 |
rfolco | I mean, can help with tqe references to legacy sub_nodes_private | 18:29 |
panda | oh dear | 18:29 |
panda | tripleo-inventory uses /etc/nodepool | 18:29 |
panda | I'm going to cry in a corner | 18:29 |
rlandy | this is not fun | 18:30 |
rfolco | yeah we need to make sure we label correctly nodes for controllers and compute and use the right label.... for *all-the-code* | 18:31 |
rfolco | thats why we decided to keep with sub_nodes when we started the migration to v3 | 18:31 |
rlandy | panda: rfolco: it's referenced in config as well | 18:32 |
rlandy | https://github.com/openstack/tripleo-quickstart/blob/master/config/nodes/2ctlr.yml#L16 | 18:32 |
rlandy | https://github.com/openstack/tripleo-quickstart/blob/master/config/nodes/1ctlr.yml | 18:32 |
rfolco | yep | 18:33 |
rfolco | yep yep yep | 18:33 |
rfolco | oh man | 18:33 |
rlandy | ^^ need to replace all of that | 18:33 |
rfolco | I guess these changes need to be depends-on in https://review.openstack.org/596422 | 18:36 |
rfolco | tq and tqe | 18:36 |
rlandy | well some tests will hit those | 18:36 |
*** ykarel is now known as ykarel|away | 18:38 | |
rlandy | rfolco: ^^ we will only see the nodes in some tests but inventory in all | 18:39 |
rlandy | and yes 0 depends-on | 18:39 |
rlandy | rfolco: you are putting in those changes?? | 18:40 |
rfolco | rlandy, if you don't mind, I want to help somewhere :) | 18:40 |
rlandy | go ahead | 18:40 |
rfolco | thx | 18:40 |
rlandy | sorry | 18:41 |
rlandy | rfolco: you good with what changes to make? | 18:41 |
rfolco | rlandy, want to quick bj to make sure we are on the same page ? | 18:42 |
rlandy | rfolco: yep - the changes in tq and tqe - include panda as well | 18:43 |
rfolco | rlandy, replace all references to sub_nodes* on tq and tqe. Then https://review.openstack.org/596422 can depends-on these changes | 18:43 |
rlandy | maybe he is done crying in the corner | 18:43 |
rfolco | poor panda | 18:43 |
rlandy | correct | 18:43 |
rlandy | some are reproducer only but we may as well fix them all | 18:44 |
*** dtrainor has joined #oooq | 18:44 | |
panda | rfolco: how are you going to replace that ip list ? | 18:45 |
panda | rfolco: the only way I see is to get it in zuul playbooks with a subelements loop | 18:45 |
panda | rfolco: and then pass it to quickstart | 18:45 |
rfolco | for ansible_host in hosts ? --> http://logs.openstack.org/48/589448/9/check/tripleo-ci-centos-7-3nodes-multinode/4398d2d/zuul-info/inventory.yaml | 18:47 |
panda | rfolco: where ? | 18:48 |
*** vinaykns has joined #oooq | 18:48 | |
rlandy | have to reference the inventory | 18:48 |
panda | rfolco: where do you plan to do this? I'm not sure we have the inventory available | 18:48 |
rlandy | how though | 18:48 |
rlandy | there are roles on zuul-jobs | 18:49 |
rlandy | also we have a zuul inventory and a quickstart one | 18:49 |
rlandy | zuul.executor | 18:52 |
rfolco | well, the idea is to replace points where we read /etc/nodepool/sub_nodes_private to what we have in zuul inventory with the zuul var zuul.hosts.xxx | 18:54 |
rfolco | cat sub_nodes_private | 18:54 |
rfolco | 10.209.4.81 | 18:54 |
rfolco | 10.209.3.247 | 18:54 |
rlandy | rfolco: what does zuul.hosts return? | 18:57 |
rlandy | rfolco: ?? | 19:05 |
rfolco | rlandy, sorry, I am doing some research on zuul docs | 19:05 |
rlandy | ok | 19:05 |
panda | rfolco: rfolco https://github.com/openstack-infra/tripleo-ci/blob/master/playbooks/openstack-zuul-jobs/legacy/pre.yaml#L27 | 19:05 |
panda | rfolco: use this | 19:05 |
panda | rfolco: as a starting point | 19:05 |
panda | rfolco: this is how the legacy playbook recreates the file | 19:06 |
rlandy | zuul.executor.inventory_file | 19:06 |
rfolco | wow I am surprised we filled the legacy with the new vars but not nuked the legacy | 19:06 |
rlandy | can't we just include_vars: file: zuul.executor.inventory_file | 19:11 |
rfolco | the vars are available, just need to use them | 19:14 |
rascasoft | rlandy, weshay|rover, it worked! We're at the overcloud-deploy task, so we passed the critical point https://rhos-dev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/oooq-master-rdo_trunk-bmu-ha-lab-cygnus-float_nic_with_vlans/29/console | 19:19 |
rlandy | amazing! | 19:19 |
weshay|rover | rascasoft, good.. nice catch on that | 19:20 |
*** ykarel|away has quit IRC | 19:20 | |
rascasoft | rlandy, weshay|rover, I'm leaving now, but these are the two reviews that we need to merge to apply the change: https://review.openstack.org/#/c/596789/ and https://code.engineering.redhat.com/gerrit/#/c/148025/ | 19:23 |
rascasoft | (in this order) | 19:23 |
sshnaidm | rascasoft, rlandy, weshay|rover I'm afraid it's a hack, not long term solution. This overcloud download was a hack for upgrades team, but rascasoft case is completely different. We shouldn't use this hack for that | 19:26 |
sshnaidm | we have role that fetches images and should use it | 19:26 |
weshay|rover | sshnaidm, no no | 19:26 |
weshay|rover | sshnaidm, point me at that | 19:27 |
weshay|rover | sshnaidm, you referring to fetch-images? | 19:28 |
weshay|rover | we could call that as an option after the undercloud | 19:28 |
sshnaidm | weshay|rover, yea, it isn't present in these playbooks, commented in the patch | 19:28 |
weshay|rover | rascasoft, the advantage there would be it has nice retries etc | 19:28 |
weshay|rover | sshnaidm, ya.. so that's fairly confusing.. let's put in a patch on top of rasca's | 19:29 |
sshnaidm | weshay|rover, we shouldn't have this code at all, need to check if upgrades still needs it (hope that no) | 19:29 |
rascasoft | weshay|rover, sshnaidm, that's fine by me, but then, why keeping this overcloud image limited thing? | 19:29 |
weshay|rover | sshnaidm, k.. well rascasoft's use case is required | 19:29 |
weshay|rover | but I hear ya w/ fetch.yml | 19:29 |
weshay|rover | sshnaidm, rascasoft /me puts patch in on top of rascasoft | 19:30 |
sshnaidm | rascasoft, it's just a bug that playbook doesn't contain fetching images | 19:30 |
rascasoft | sshnaidm, I can fix that yes | 19:30 |
sshnaidm | rascasoft, should be something like that: https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/ovb-setup.yml#L51 | 19:30 |
sshnaidm | rascasoft, but just with "- { role: fetch-images, when: not to_build|bool }" | 19:31 |
rascasoft | sshnaidm, yes, and this is exactly what we were looking for this morning--- | 19:31 |
sshnaidm | rascasoft, yeah | 19:31 |
rascasoft | sshnaidm, this can be done right after the undercloud install, let me prepare it | 19:32 |
sshnaidm | I just didn't get how it worked before, but as I understood from rlandy it was building images, right? | 19:32 |
rlandy | no - it wasn't | 19:32 |
rlandy | I was experimenting with that | 19:32 |
rascasoft | sshnaidm, no, at this point I think we los fetch-images somewhere | 19:32 |
rascasoft | *lose | 19:32 |
sshnaidm | rascasoft, yeah, seems like that | 19:32 |
weshay|rover | sshnaidm, fetch images is not a perfect fit | 19:33 |
sshnaidm | weshay|rover, it's the fit :) | 19:33 |
weshay|rover | # Same as the above just copy the base os image to the fetch_dir as undercloud | 19:33 |
weshay|rover | - name: Get base OS qcow2 image from cache | 19:33 |
weshay|rover | command: > | 19:33 |
weshay|rover | cp {{ image_cache_path }} {{ image_fetch_dir }}/undercloud.{{ image.type }} | 19:33 |
weshay|rover | when: image.type == "qcow2" and image.md5sum is defined | 19:33 |
sshnaidm | rascasoft, your jobs are periodic, right? you don't need to build patches there | 19:34 |
rascasoft | sshnaidm, correct | 19:34 |
sshnaidm | rascasoft, cool, so you don't need all these parts that modify image | 19:34 |
rascasoft | not at all | 19:34 |
rascasoft | sshnaidm, rlandy, weshay|rover, fetch images was present before inside the full baremetal-undercloud.yml playbook | 19:35 |
rascasoft | right after prep-containers | 19:36 |
weshay|rover | sshnaidm, ya... it's not a good enough fit to -1 his patch imho | 19:36 |
rascasoft | sshnaidm, I agree in any case with weshay|rover we can keep both | 19:36 |
sshnaidm | rascasoft, you can use it even before undercloud, it doesn't require any OS services afaik | 19:36 |
weshay|rover | sshnaidm, we need to fix this a bit afaict | 19:36 |
rascasoft | sshnaidm, true | 19:37 |
sshnaidm | weshay|rover, what do you mean? | 19:37 |
weshay|rover | sshnaidm, a use case should be that it downloads images directly to the undercloud, not the host | 19:37 |
sshnaidm | weshay|rover, it does | 19:37 |
weshay|rover | ok.. let me try it again | 19:38 |
rlandy | rfolco: any thoughts on include_vars with zuul.executor.inventory_file? | 19:38 |
rfolco | rlandy, what do you want to accomplish? | 19:39 |
rlandy | so we can access all.hosts.primary.* | 19:40 |
rfolco | we can access zuul inventory already | 19:40 |
rascasoft | weshay|rover, sshnaidm, with this in place https://review.openstack.org/596877 I don't even need the internal patch | 19:41 |
rascasoft | weshay|rover, sshnaidm, this should fix *everything* (trying it right now) | 19:41 |
weshay|rover | ah ok.. ya.. that should do it | 19:41 |
rfolco | {{ hostvars['primary']['ansible_host'] }} --> http://logs.openstack.org/48/589448/9/check/tripleo-ci-centos-7-3nodes-multinode/4398d2d/zuul-info/inventory.yaml | 19:42 |
rfolco | ansible_host: 104.130.138.206 | 19:42 |
weshay|rover | I need that for baseos too | 19:42 |
weshay|rover | rascasoft, and you have the overcloud full and ipa defined in your images list? | 19:42 |
rlandy | rfolco: o - so what's the hold up in replacing /etc/nodepool/primary_node_private? | 19:42 |
rascasoft | weshay|rover, sshnaidm yes, both upstream and downstream since those are part of the release config files | 19:43 |
rascasoft | (so no need to override) | 19:43 |
sshnaidm | rascasoft, do you need become:true ? | 19:43 |
sshnaidm | rascasoft, images could be inaccessible this way | 19:44 |
rascasoft | sshnaidm, I don't think so, I wasn't becoming previously | 19:44 |
rascasoft | let me double check | 19:44 |
rascasoft | sshnaidm, I confirm, no need to become. fetch-images takes care of it (see tripleo-quickstart/roles/fetch-images/tasks/main.yml) | 19:45 |
sshnaidm | rascasoft, ok, commented there | 19:45 |
rascasoft | sshnaidm, sorry sagi, I double checked, I WAS doing the same before | 19:46 |
rascasoft | sshnaidm, let me try the patch and I can come tomorrow with the results to see if it worked | 19:47 |
rascasoft | sshnaidm, but it makes more sense not having become: true, let me repatch | 19:47 |
rascasoft | sshnaidm, weshay|rover, done, I'm testing it | 19:48 |
sshnaidm | rascasoft, yeah, worth to check without become, always possible to change it back.. | 19:49 |
*** holser_ has joined #oooq | 19:50 | |
rfolco | rlandy, lack of knowledge of what is zuul inventory vs quickstart inventory from my side | 19:51 |
rfolco | rlandy, I am replacing case by case and will submit a patch soon | 19:51 |
rascasoft | sshnaidm, weshay|rover, testing it, I'm leaving now, my eyes are bleeding. I'll let you know how the test works. | 19:52 |
weshay|rover | rascasoft, cool man.. thanks | 19:54 |
rlandy | cool | 19:58 |
rlandy | weshay|rover: rascasoft: sshnaidm: should we just update to remove the become? | 20:04 |
rlandy | https://review.openstack.org/#/c/596877/ | 20:05 |
*** dtrainor_ has joined #oooq | 20:05 | |
weshay|rover | rlandy, ya.. become is gone | 20:05 |
rlandy | then we can core vote on it and move things along | 20:05 |
rlandy | ah - sorry see updated | 20:05 |
*** dtrainor has quit IRC | 20:06 | |
rlandy | under test now? | 20:06 |
weshay|rover | ssbarnea, rfolco https://review.openstack.org/#/c/596799/ | 20:07 |
rlandy | weshay|rover: pls check out https://bugs.launchpad.net/tripleo/+bug/1789294 and let me know what you want to do with assigning this bug and status | 20:22 |
openstack | Launchpad bug 1789294 in tripleo "RDO Cloud jobs move to zuulv3 native is blocked by legacy dependencies" [Undecided,Triaged] | 20:22 |
rlandy | I can take ownership if need be | 20:23 |
rlandy | rfolco: panda: ^^ | 20:30 |
rlandy | need to run out for a few minutes - biab | 20:31 |
*** rlandy is now known as rlandy|brb | 20:31 | |
*** holser_ has quit IRC | 20:36 | |
*** trown is now known as trown|outtypewww | 20:43 | |
*** rlandy|brb is now known as rlandy | 20:53 | |
rlandy | https://rhos-dev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/oooq-master-rdo_trunk-bmu-ha-lab-cygnus-float_nic_with_vlans/30/console | 20:55 |
rlandy | failed :( | 20:55 |
rlandy | pip cache? didn't bring in change | 21:03 |
rlandy | best way to clear pip-cache? | 21:15 |
rlandy | --clean errors | 21:15 |
rlandy | weshay|rover: sshnaidm: ping - ever used workdir/apply_patch.sh? | 21:24 |
rlandy | don't think this is working for rascasoft's job | 21:25 |
rlandy | would like to use review -d | 21:25 |
rlandy | rfolco: how goes it? | 21:27 |
rfolco | rlandy, painful | 21:27 |
rfolco | rlandy, let me just finish the second patch so we can iterate/test | 21:27 |
rlandy | rfolco: sure - I am around | 21:28 |
rfolco | rlandy, my last blocker: https://github.com/openstack/tripleo-quickstart-extras/blob/8c09ea39144e1fdc257f5db150173f5597d76ae3/roles/overcloud-deploy/defaults/main.yml#L96 | 21:29 |
rfolco | rlandy, sub_nodes_private would have: | 21:29 |
rfolco | secondary-2 ip | 21:29 |
rfolco | secondary-1 ip | 21:30 |
rlandy | hmmm ... 21:17:41 fatal: Couldn't find remote ref refs/changes/77/596877/2 | 21:30 |
rlandy | rascasoft: ^^ | 21:30 |
rlandy | rfolco: ok - looking positive | 21:30 |
rfolco | rlandy, my question is - https://github.com/openstack/tripleo-quickstart-extras/blob/8c09ea39144e1fdc257f5db150173f5597d76ae3/roles/overcloud-deploy/defaults/main.yml#L96 | 21:30 |
rlandy | should just return ip I think | 21:31 |
rfolco | how do I get controller for 1 subnode or 2 subnodes cases | 21:31 |
rlandy | depends on nodes file | 21:31 |
rfolco | hosts: "{{ hostvars['secondary-1']['ansible_host'] }}" | 21:31 |
rfolco | for 1 ctrl | 21:31 |
rlandy | yes | 21:31 |
rlandy | ok | 21:31 |
rlandy | and | 21:31 |
rfolco | and hosts: "{{ hostvars['secondary-2']['ansible_host'] }}" | 21:31 |
rfolco | for 2ctrl | 21:32 |
rfolco | I just don't find a good condition | 21:32 |
rlandy | that is ok | 21:32 |
rlandy | it was never deterministic | 21:32 |
rlandy | hence the failure on 3 node | 21:33 |
rfolco | the problem is with sed it got always the 1st line of the file | 21:33 |
rfolco | which was always the controller node, secondary-2 or secondary-1 | 21:33 |
rfolco | but here I need to specify who is the controller | 21:33 |
rfolco | maybe it requires a new nodeset group/label | 21:34 |
rlandy | rfolco: https://github.com/openstack/tripleo-quickstart/blob/master/config/nodes/2ctlr.yml | 21:35 |
rlandy | that never worked consistently | 21:36 |
rlandy | sometimes it was in the right order and sometimes not | 21:36 |
rfolco | hmm it overrides | 21:36 |
rlandy | it didn't always get it right | 21:37 |
rlandy | myoung: you still around? | 21:37 |
rlandy | know anything about workdir/apply_patch.sh? | 21:38 |
rfolco | rlandy, panda https://review.openstack.org/#/q/topic:replace_legacy_pre+(status:open+OR+status:merged) | 21:52 |
rfolco | I am sure this requires more work | 21:52 |
rlandy | rfolco: k - will check it out | 21:52 |
rfolco | feel free to jump in | 21:52 |
rlandy | just giving rascasoft's job another try | 21:55 |
rlandy | 21:54:56 + /home/rhos-ci/workdir/apply_patch.sh 596877 tripleo-quickstart-extras tripleo-quickstart-extras openstack | 21:55 |
rlandy | 21:54:57 Revision for 596877/ -> refs/changes/77/596877/2 | 21:55 |
rlandy | better | 21:55 |
rlandy | finally!!!! | 22:04 |
rlandy | rascasoft's job has the change | 22:04 |
rlandy | rfolco: pls make https://review.openstack.org/#/c/596422/ dependent on your jobs so we can test | 22:06 |
*** apetrich has quit IRC | 22:09 | |
*** apetrich has joined #oooq | 22:14 | |
*** dsneddon has joined #oooq | 22:19 | |
*** vinaykns has quit IRC | 22:35 | |
weshay|rover | rlandy, easy one https://review.openstack.org/#/c/596919/ | 22:48 |
* weshay|rover looks at the bug ] | 22:49 | |
weshay|rover | rlandy, /me reads https://bugs.launchpad.net/tripleo/+bug/1789294 | 22:49 |
openstack | Launchpad bug 1789294 in tripleo "RDO Cloud jobs move to zuulv3 native is blocked by legacy dependencies" [Undecided,Triaged] - Assigned to Ronelle Landy (rlandy) | 22:49 |
rlandy | looks fine | 22:49 |
rlandy | thanks | 22:49 |
rlandy | weshay|rover: also modified rascasoft's job a bit | 22:50 |
weshay|rover | rlandy, k.. thanks | 22:50 |
rlandy | looks like it gets the change now | 22:50 |
weshay|rover | rlandy, what do you want to do w/ osp-13 run it on changes or periodically? | 22:50 |
rlandy | weshay|rover: periodically | 22:50 |
rlandy | but I need to put in a review change | 22:50 |
rlandy | will do that tonight | 22:51 |
rlandy | I hacked the job | 22:51 |
weshay|rover | k | 22:51 |
weshay|rover | sshnaidm, is working late :) | 22:52 |
weshay|rover | or :( | 22:52 |
weshay|rover | rlandy, ssbarnea sshnaidm don't work tooooo much and fry thyself | 22:52 |
sshnaidm | I'm not working, just keeping eye ) | 22:52 |
weshay|rover | k.. I'm off to soccer practice.. bbl | 22:53 |
rlandy | not so late for me | 22:54 |
rlandy | 6:45 | 22:54 |
rfolco | weshay|rover, [siri]: did you mean "football" ? | 22:55 |
rfolco | :) | 22:55 |
rlandy | lol | 22:56 |
weshay|rover | futbol | 23:04 |
weshay|rover | l8r's | 23:04 |
*** tosky has quit IRC | 23:08 |
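
The thread above converges on reading node details from the Zuul inventory instead of the files under /etc/nodepool. A minimal Python sketch of that mapping follows. It assumes the inventory has already been parsed into a dict shaped like zuul-info/inventory.yaml (all.hosts.NAME.nodepool.*), as panda and rlandy describe it. The function name, the provider name, the primary's private IP and the assignment of IPs to secondary-1 and secondary-2 are hypothetical; only the two sub-node IP values come from the log. This is an illustration, not the code in review 596422.

```python
# Hypothetical sample shaped like zuul-info/inventory.yaml (all.hosts.NAME.nodepool.*).
# Only the two sub-node IP values appear in the log; everything else is made up.
INVENTORY = {
    "all": {
        "hosts": {
            "secondary-2": {"nodepool": {"private_ipv4": "10.209.3.247"}},
            "primary": {
                "nodepool": {
                    "provider": "example-provider",
                    "private_ipv4": "10.209.4.80",
                }
            },
            "secondary-1": {"nodepool": {"private_ipv4": "10.209.4.81"}},
        }
    }
}


def nodepool_files(inventory, primary="primary"):
    """Return the values the legacy /etc/nodepool files used to hold.

    Keys mirror the legacy file names: provider, primary_node_private
    and sub_nodes_private (one private IP per sub-node).
    """
    hosts = inventory["all"]["hosts"]
    # Sort sub-node names so the order does not depend on how the
    # inventory happened to be written. This is a plain string sort,
    # so "secondary-10" would come before "secondary-2".
    subnodes = sorted(name for name in hosts if name != primary)
    return {
        "provider": hosts[primary]["nodepool"]["provider"],
        "primary_node_private": hosts[primary]["nodepool"]["private_ipv4"],
        "sub_nodes_private": [
            hosts[name]["nodepool"]["private_ipv4"] for name in subnodes
        ],
    }


if __name__ == "__main__":
    files = nodepool_files(INVENTORY)
    # Same layout as `cat sub_nodes_private`: one IP per line.
    print("\n".join(files["sub_nodes_private"]))
```

Sorting by host name is one way to avoid the "first line of the file" ordering that rlandy notes was never deterministic. Whether the first sub-node should be treated as the controller is still a decision for the nodes config or a nodeset group, as rfolco suggests.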