Friday, 2020-12-04

*** sshnaidm|afk has joined #tripleo00:20
*** sshnaidm|afk is now known as sshnaidm|off00:20
*** gouthamr has quit IRC00:57
*** gouthamr has joined #tripleo00:59
*** tosky has quit IRC01:01
*** rfolco has joined #tripleo01:02
*** rfolco has quit IRC01:18
*** openstackgerrit has quit IRC01:37
*** zzzeek has quit IRC02:14
*** zzzeek has joined #tripleo02:15
*** Goneri has quit IRC02:19
*** dhill has quit IRC03:37
*** skramaja has joined #tripleo03:40
*** dhill has joined #tripleo03:46
*** mvalsecc has quit IRC04:32
*** hakhande has joined #tripleo04:44
*** ykarel|away has joined #tripleo04:53
*** mvalsecc has joined #tripleo04:53
*** ykarel|away is now known as ykarel04:53
*** udesale has joined #tripleo04:56
*** evrardjp has quit IRC05:33
*** evrardjp has joined #tripleo05:33
*** lbragstad has quit IRC06:02
*** lbragstad has joined #tripleo06:03
*** hakhande has quit IRC06:17
*** ratailor has joined #tripleo06:20
*** hakhande has joined #tripleo06:29
*** bandini has joined #tripleo06:38
*** lmiccini has joined #tripleo06:51
*** ysandeep|away is now known as ysandeep06:55
*** rcernin has quit IRC07:04
*** marios has joined #tripleo07:17
*** rcernin has joined #tripleo07:17
*** rcernin has quit IRC07:18
*** saneax has joined #tripleo07:31
*** hakhande has quit IRC07:41
*** amoralej|off is now known as amoralej07:44
*** jcapitao has joined #tripleo07:57
*** belmoreira has joined #tripleo08:05
*** cylopez has joined #tripleo08:30
*** tkajinam has quit IRC08:31
*** tkajinam has joined #tripleo08:32
*** jaosorior has joined #tripleo08:35
*** pcaruana has joined #tripleo08:35
*** jpena|off is now known as jpena08:57
*** xek_ has joined #tripleo08:58
*** jpich has joined #tripleo08:59
*** tosky has joined #tripleo09:05
*** frenzy_friday has joined #tripleo09:06
*** derekh has joined #tripleo09:09
*** ysandeep is now known as ysandeep|lunch09:12
*** ramishra has quit IRC09:19
*** ramishra has joined #tripleo09:40
*** gfidente|afk is now known as gfidente09:46
*** mvalsecc has quit IRC09:49
*** karthiks has joined #tripleo10:22
*** ysandeep|lunch is now known as ysandeep11:13
*** cmorey has joined #tripleo11:28
cmoreywhere's the best place to ask about troubleshooting ceph storage node deployment with 'openstack overcloud deploy'? I'm getting "stderr: [errno 110] error connecting to the cluster", "-->  RuntimeError: Unable to create a new OSD id" when it's in TASK [ceph-osd : use ceph-volume lvm batch to create bluestore osds]11:31
*** rfolco has joined #tripleo11:36
Tengugfidente, fultonj, fmount -^^  that's for you I think :)11:47
fmountcmorey: o/ that should be ceph-ansible (triggered during the overcloud deploy) trying to build the ceph components: I think you should take a look into your <config-download>/ceph-ansible dir (where a ceph_ansible_command.log can be found)11:50
fmountTengu: thanks for the ping11:50
Tengufmount: bouncing things to other folks is my Friday Pleasure ;)11:51
fmountcmorey: <config-download> could be /var/lib/mistral/<stack_name>11:51
fmountTengu: ahahah ++11:51
fmountcmorey: that error you see there reminds me of some issues related to mons <-> osds nodes connectivity: check they can reach each other on the storage network11:53
fmount(mons can be found collocated within the openstack services in controller nodes, osds in the ceph-storage nodes)11:53
fmountfultonj: gfidente  fyi ^11:53
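A minimal sketch of the connectivity check fmount suggests, run from the ceph-storage node against the controllers' storage-network addresses. It assumes the mons listen on the default Ceph monitor ports (3300 and 6789); the helper names and any address passed in are hypothetical, and a plain ping on each VLAN is an equally valid first test.

```python
import socket

# Default Ceph monitor ports: 3300 (msgr2) and 6789 (legacy msgr1).
MON_PORTS = (3300, 6789)


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (errno 110), refused connections and routing errors.
        return False


def probe_mons(mon_hosts, ports=MON_PORTS, timeout=3.0):
    """Map each (host, port) pair to whether it is reachable."""
    return {(h, p): can_reach(h, p, timeout) for h in mon_hosts for p in ports}

# Usage (the address is a placeholder for a controller's storage-network IP):
#   probe_mons(["<controller storage IP>"])
```

A result of False on every port for a controller points at the network (VLAN, route, firewall) rather than at ceph-volume itself.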
*** jcapitao is now known as jcapitao_luncj11:57
*** jcapitao_luncj is now known as jcapitao_lunch11:57
cmoreyfmount, ah... that's a nugget i was missing11:57
cmoreyfmount (the location of the mons),11:57
cmoreylet me doublecheck the network config then, I should have prov, storage and storagemgmt vlans configured on both the controller and ceph node11:58
fmountack11:59
Tenguand check container status11:59
cmoreydangnabbit12:00
Tengunice password12:01
* Tengu runs away12:01
*** ratailor has quit IRC12:01
cmoreynot a password, just an expression of (oops, i may have found the problem), although the vlans appear to be configured, i can't ping from ceph to controller on at least one of the 3 vlans12:02
Tenguroute issue?12:02
hjensasreviews please - https://review.opendev.org/c/openstack/tripleo-ansible/+/763377 *thanks*12:03
Tenguhjensas: uho....12:03
TenguI'll need to update tripleo-lab I think, with that change :/12:03
Tenguhjensas: does it have any impact on the "name_lower" thing in the network description?12:04
Tengu*network-data12:05
hjensasTengu: I don't think we remove the role_networks_lower map set in group_vars. So if you have working templates using that they shouldn't be broken.12:05
hjensasTengu: but this will https://review.opendev.org/c/openstack/tripleo-heat-templates/+/76337912:05
Tenguerf12:05
Tenguguess I'll start an env with those 2 patches.12:06
hjensasTengu: I'm open to discuss if we shouldn't merge that thing.12:06
hjensasTengu: i.e. keep the role_networks_lower map in group_vars to not break people. I just think it's not so long since we introduced it, maybe we can be a bit naughty and just clean it up before we have to keep both forever because lots of people are using it.12:07
Tenguhjensas: well... if the migration path is known, I'm not against. for instance, here's what I generate here: https://github.com/cjeanner/tripleo-lab/blob/master/roles/overcloud/templates/oc-network-data.yaml.j212:07
Tengubut it's "name_lower" here. sooooo maybe I'm just mixing things up ?12:08
Tengujust want to avoid surprises on the network part next time I deploy ^^'. call me selfish ;)12:08
cmoreyTengu, not sure, vlan configuration looks o.k. on the switch, i think there may be an issue with the network profile12:13
hjensasTengu: yes, you are mixing things up. It's not the name_lower in network data. It's a group_var role_networks_lower used by the ansible j2 nic config templates that change.12:25
Tenguhjensas: fine then :). Better asking than being lost later.12:26
*** jpich has quit IRC12:26
hjensasTengu: indeed.12:26
*** jpich has joined #tripleo12:27
hjensasTengu: btw, we should catch up on the need to disable validations when using nova-less (pre-deployed) server some time.12:27
Tenguhjensas: ah, I saw a ping some time ago indeed!12:27
hjensasTengu: since nova-less is the default now, it's not nice that we have to use12:27
hjensasTengu: '--disable-validations' flag.12:27
Tenguyep. would be interesting to understand why we have to deactivate them, and maybe do some cleanup in order to remove the problematic things. Or, better, correct them12:28
hjensasTengu: I think that disables many validations we do want to run ...12:28
Tenguyup12:28
* hjensas needs to figure out why it's needed12:28
Tenguhjensas: we can have a discussion next week?12:28
hjensasTengu: yes, if I find time to investigate before. Let's revisit next week.12:29
Tenguhjensas: same for me - need to do some checks on my own12:29
Tenguhjensas: you know where you can find me ;).12:29
Tenguhjensas: I think it's related to utils.check_stack_network_matches_env_files (in tripleoclient/v1/overcloud_deploy.py)12:31
*** jpena is now known as jpena|lunch12:32
Tenguthough the easiest way is to disable the check on the parameter inputs and just run things.12:32
Tenguand see how it burns :]12:33
*** zzzeek has quit IRC12:36
*** zzzeek has joined #tripleo12:37
*** cgoncalves has quit IRC12:45
*** cgoncalves has joined #tripleo12:46
*** cgoncalves has quit IRC12:47
*** udesale_ has joined #tripleo12:47
*** udesale has quit IRC12:49
*** weshay|pto is now known as weshay|ruck12:49
cmoreyTengu, thanks for that nugget, for some reason i assumed the mon would be running on the (only) CephStorage node, and it turns out my vlan assignments were on the wrong port (obscured by the fact that i was seeing vlan 1 traffic on the port i was expecting to be up)12:50
Tengucmorey: heh - yeah, ceph mon are on the controllers, only the OSD are on the ceph nodes12:51
*** jcapitao_lunch is now known as jcapitao12:58
*** rlandy has joined #tripleo13:00
*** bandini has quit IRC13:03
*** jpich has quit IRC13:03
*** jpich has joined #tripleo13:04
fultonjfmount: gfidente: i think we can land the non-wip parts of https://review.opendev.org/q/topic:%22tripleo_ceph_client%22 soon.13:07
fultonjso we're trying to get RDO green (though it's red for unrelated reasons)13:08
fultonjand tht patch needs green zuul but others are green with zuul13:08
fultonjfmount: i'll be trying your CephExternalMultiConfig idea today13:09
*** hberaud has quit IRC13:14
*** bandini has joined #tripleo13:15
*** hberaud has joined #tripleo13:16
gfidentefultonj yeah I am looking at the comments in the puppet change right now13:17
*** skramaja has quit IRC13:19
gfidenteso I might have to make a small change to a parameter description13:21
gfidentefmount fultonj ^^ do we have any job in progress waiting to finish or can I update puppet change now?13:21
fultonjgfidente: not me13:22
Tenguhjensas: sooo.... I'm running an overcloud deploy with pre-deployed nodes (using metalsmith) AND validations... let's see.13:22
Tengusince I happen to have an env.13:22
*** jpena|lunch is now known as jpena13:29
*** ysandeep is now known as ysandeep|mtg13:33
*** cgoncalves has joined #tripleo13:40
cmoreyTengu, i'm not sure if the fact i have multipath on this node (so the drive is showing up twice) is causing it, but I've only told it to use one path, but i'm getting an error that looks like it's unhappy about a non-zero exit code due to "WARNING: The same type, major and minor should not be used for multiple devices."  which is generating a " KeyError: 'ceph.cluster_name'"13:41
cmoreyTengu, short of physically pulling one of the SAS links, is there a way around this?13:41
Tengucmorey: you probably want to talk again to fmount and his team :)13:41
*** fmount has quit IRC13:46
cmoreyas if on queue :)13:47
cmoreycue even13:47
cmoreythanks for your help so far Tengu13:48
*** pleimer_ has joined #tripleo13:48
slaglehjensas: what do you think about https://review.opendev.org/c/openstack/tripleo-ansible/+/765431? i was running into the issue this fixes yesterday13:50
*** amoralej is now known as amoralej|lunch13:52
cmoreyTengu, looks a lot like https://access.redhat.com/solutions/5398181 .13:52
Tengumeh....13:53
cmoreyexcept i'm not running in HCI (and it's not RHOSP, but train/centos7)13:53
hjensasslagle: that makes sense. Thanks! +213:53
cmoreyso i guess fmount/gfidente/fultonj are my go-tos for it?13:54
Tengucmorey: osp-16 (basically stable/train) is aiming rhel8 (centos-8). I think you might want to move away from centos-713:54
Tengucmorey: and yeah - they are the best ppl for ceph issues :)13:54
cmoreyTengu, centos8 dropped support for the sas cards in the boxes i'm using as a testbench,... so..13:55
Tenguyay...13:55
Tenguhow convenient13:55
*** fmount has joined #tripleo13:55
*** saneax has quit IRC13:57
cmoreyah ha, speak of fmount and ...14:02
*** tkajinam has quit IRC14:07
cmoreyfmount: o/ i'm getting the same error as https://access.redhat.com/solutions/5398181 but i'm not running HCI14:13
fmountcmorey: hey, mmm let me see, didn't see this kind of error before but we can investigate14:18
*** ysandeep|mtg is now known as ysandeep14:19
fmountso it's basically ceph-volume failing on running that command to build the osd14:19
*** cmorey has quit IRC14:21
*** cmorey has joined #tripleo14:25
fmountcmorey: hey I found an open tracker for this kind of bug in ceph-volume14:28
fmountcmorey: https://tracker.ceph.com/issues/4435614:28
*** bogdando has joined #tripleo14:29
fmountcmorey: can you show me the ceph-ansible DiskConfig?14:31
cmoreywhere would i find it?14:32
*** mcornea has joined #tripleo14:32
fmountyou should have a storage/ceph/ceph-ansible environment file (where the disks are specified) included in your deploy command, let me see the content of that file14:34
*** apetrich has quit IRC14:35
cmoreysorry, brain slow, one sec14:35
cmoreyhttp://paste.openstack.org/show/800735/14:35
*** apetrich has joined #tripleo14:38
slaglehjensas: thanks :)14:39
*** amoralej|lunch is now known as amoralej14:40
*** bogdando has quit IRC14:40
*** bnemec has joined #tripleo14:44
gfidentefmount fultonj tht just passed14:45
gfidenteI am going to refresh puppet just to fix the parameters comment14:45
fmountcmorey: osd_scanario: lvm14:45
fmounts/scanario/scenario14:45
cmoreyo.k.14:46
fultonjgfidente: glad it passed, sounds good14:46
fmountgfidente: /me checking the logs14:46
cmoreyfmount, whoops,.. thanks14:46
*** TrevorV has joined #tripleo14:48
*** bnemec is now known as beekneemech14:50
*** tmazur has joined #tripleo14:51
fmountfultonj: gfidente logs look good, just rechecked with scenario00{1,4} => 76091514:52
*** openstackgerrit has joined #tripleo14:52
openstackgerritGiulio Fidente proposed openstack/puppet-tripleo master: Remove /etc/ceph dependency on puppet services  https://review.opendev.org/c/openstack/puppet-tripleo/+/76354514:52
*** Goneri has joined #tripleo14:54
openstackgerritHarald Jensås proposed openstack/tripleo-ansible master: Introduce role/instance 'networks' key  https://review.opendev.org/c/openstack/tripleo-ansible/+/76216014:56
openstackgerritHarald Jensås proposed openstack/tripleo-ansible master: Populate network ports env module  https://review.opendev.org/c/openstack/tripleo-ansible/+/76463814:56
openstackgerritHarald Jensås proposed openstack/tripleo-ansible master: Add role_net_map to expand roles output  https://review.opendev.org/c/openstack/tripleo-ansible/+/76463914:56
openstackgerritHarald Jensås proposed openstack/tripleo-ansible master: Provison/Unprovision instance network ports  https://review.opendev.org/c/openstack/tripleo-ansible/+/76464014:56
openstackgerritHarald Jensås proposed openstack/tripleo-ansible master: Provision workflow managed/unmanaged node support  https://review.opendev.org/c/openstack/tripleo-ansible/+/76216214:56
openstackgerritHarald Jensås proposed openstack/tripleo-ansible master: Support for unmanaged servers in provision playbook  https://review.opendev.org/c/openstack/tripleo-ansible/+/76554514:56
gfidentefmount ack I fixed the params in the puppet change14:57
gfidente(descriptions)14:57
fmount++ thanks14:57
*** jbadiapa has joined #tripleo15:00
openstackgerritFrancesco Pantano proposed openstack/tripleo-heat-templates master: WIP - Disable ceph client role execution  https://review.opendev.org/c/openstack/tripleo-heat-templates/+/76091515:02
openstackgerritSandeep Yadav proposed openstack/tripleo-quickstart-extras master: Modify overcloud-deploy to support multiple stacks  https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/76378615:11
*** ysandeep is now known as ysandeep|away15:23
cmoreyfmount, died with the same error15:24
*** pojadhav|ruck is now known as pojadhav|afk15:25
cmoreyshould i re-enable multipathd and switch to the multi-path device?15:25
*** mcornea has quit IRC15:27
fmountnot sure, do you have any ceph-volume.log <= in the storage node which is failing15:29
cmoreyyeah, it seems obsessed with /dev/sda115:29
fultonjcmorey: did you clean your disks?15:30
fultonjhttps://bugzilla.redhat.com/show_bug.cgi?id=161391815:30
cmoreyfultonj, earlier today i removed the vg and pv.. for the lvm volume /dev/sda is the OS disk15:30
openstackbugzilla.redhat.com bug 1613918 in Documentation-RHHI4C "[Docs] The Ceph Guide for OpenStack should have have a disk cleaning recomendation" [Medium,Verified] - Assigned to agunn15:30
fultonjthe OS disk cannot be used as an osd15:31
fmountyeah cmorey if you can rerun after cleaning your disks it's better15:31
fmountcmorey: is /dev/sdb used for osds, right?15:31
cmoreyfultonj, i don't want it to use the OS disk15:31
fmountfultonj: http://paste.openstack.org/show/800735/15:31
fultonjgood15:32
cmoreyfultonj, i only want it to use /dev/sdb (or ideally /dev/mapper/mpatha)15:32
cmoreyI did a full node clean before i tried to start deploying it as a cephNode15:32
cmoreyi haven't cleaned it since though15:32
fultonjcmorey: please see https://bugzilla.redhat.com/show_bug.cgi?id=1613918#c115:33
openstackbugzilla.redhat.com bug 1613918 in Documentation-RHHI4C "[Docs] The Ceph Guide for OpenStack should have have a disk cleaning recomendation" [Medium,Verified] - Assigned to agunn15:33
fultonjyou could run 'lsblk' on the node15:34
cmoreyfultonj, /dev/sdb is a new array added to the node after deployment, the only thing that's used it is the ceph-node15:34
openstackgerritHarald Jensås proposed openstack/tripleo-ansible master: Support for unmanaged servers in provision playbook  https://review.opendev.org/c/openstack/tripleo-ansible/+/76554515:35
fultonjdid you read comment #1 ?15:35
fultonjit needs to be cleaned in between every deployment attempt15:35
cmorey:(15:36
*** ekultails has joined #tripleo15:37
cmoreyfultonj, the node is active, so i can't clean it...15:37
fultonjis it in production?15:38
*** udesale_ has quit IRC15:38
cmoreythe rest of the overcloud kind of is15:38
cmoreyi guess i can set cephnode count to 0 and see if that will de-provision15:39
mwhahahano15:39
fultonjdeprovisioning is documented15:39
mwhahahayou have to delete the node (if deployed)15:39
fultonjproduction server with only 1 OSD?15:39
cmoreyopenstack baremetal node delete15:39
fultonjno15:39
cmoreyfultonj, PoC test cluster15:39
fultonjyou can ssh into it and directly clean it15:40
hjensasslagle: since you are testing the network extract/provision things I added you on a couple of reviews that still need to merge.15:40
fultonjsgdisk -Z15:40
fultonjand there's a dmsetup command too15:40
cmoreyfultonj, one question, do you want me to clean the whole node, or just sdb?15:41
fultonjsdb15:41
cmoreyoh right15:41
cmoreythat's much easier...15:41
fultonjsince that's what ceph-volume isn't making an OSD on15:41
fultonjbut because people often don't clean the disk correctly when they try to do it manually15:41
cmoreyi'm happy to blow that away, i thought you wanted me to blow the whole node15:41
fultonji think it's better to have ironic clean it correctly15:42
cmoreyfultonj, i'll happily trash the volume from the storage device and re-create it if needs be15:42
fultonjironic will do the whole node and clean it right15:42
fultonjyou can try, it's certainly possible15:42
fultonjsgdisk -Z may not be sufficient15:42
cmoreyis there a command that would show that it's clean enough?15:43
fultonjlsblk15:43
fultonjdmsetup15:43
cmoreyo.k.15:43
fultonjsdb                                                                                                    8:16   0   50G  0 disk ceph--a881a17b--eee2--4c65--8709--538fb07af16c-osd--data--207c4446--851f--44de--b25c--9699552c9243 253:1    0   50G  0 lvm15:44
fultonjis for a disk that's NOT clean ^15:44
fultonjit has ceph data on it15:44
fultonjso ceph "helps" you by not deleting your data for you even if it's left over from an older deployment15:44
openstackgerritMerged openstack/tripleo-ansible stable/victoria: Fix networks_skip_config condition in nic templates  https://review.opendev.org/c/openstack/tripleo-ansible/+/76496515:45
fultonjit really is helping you in the long run though, since deleting data is something the admin should take responsibility for15:45
fultonjbut for now it's getting in your way ;)15:45
fultonjdmsetup ls15:46
*** ysandeep|away is now known as ysandeep15:46
fultonjso if you see something like that then15:46
fultonjdmsetup remove ceph--98dfc177--3fe1--4248--9333--c2110f854c76-osd--data--4284de94--753a--4bfc--a27e--8fa79f1ab42b15:46
fultonjcmorey: ^15:46
cmoreyhttp://paste.openstack.org/show/800740/15:46
fultonjand then sgdisk -Z /dev/sdX15:47
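The manual cleanup fultonj describes (spot leftover ceph logical volumes in `lsblk`, `dmsetup remove` each one, then `sgdisk -Z` the disk) can be sketched as below. This is only an illustration: the function names are hypothetical, the pattern simply matches the `ceph--...` device-mapper names pasted above, and the commands are returned as lists rather than executed so they can be reviewed first.

```python
import re

# Device-mapper names of ceph LVs as they appear in the pasted lsblk output.
CEPH_DM_NAME = re.compile(r"ceph--[0-9A-Za-z\-]+")


def leftover_ceph_volumes(lsblk_output):
    """Return the unique ceph device-mapper names found in lsblk output."""
    names = CEPH_DM_NAME.findall(lsblk_output)
    return list(dict.fromkeys(names))  # de-duplicate, keep first-seen order


def cleanup_commands(lsblk_output, disk):
    """Build (but do not run) the cleanup commands for one disk."""
    cmds = [["dmsetup", "remove", name]
            for name in leftover_ceph_volumes(lsblk_output)]
    cmds.append(["sgdisk", "-Z", disk])
    return cmds
```

An empty result from `leftover_ceph_volumes` is a reasonable sign that the disk no longer carries an old OSD, though as noted above `sgdisk -Z` alone may not be sufficient and Ironic cleaning is the more thorough route.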
rlandymwhahaha: https://review.opendev.org/q/dbfa2399b47aa3b9fef84709ff786b9bf69bf2d3 - the branch cherry-picks here are not merged - any problem with merging these now?15:47
cmoreyfultonj, done15:47
fultonjyou did the sgdisk too?15:47
cmoreyyep15:47
fultonjis there a ceph-volume log on the machine?15:47
cmoreyhttp://paste.openstack.org/show/800740/15:48
cmoreyyes15:48
cmorey(to the ceph-volume log)15:48
cmoreywant me to move that out of the way? or just nuke /etc/ceph, /var/lib/ceph, /var/log/ceph15:48
fultonjno15:48
mwhahaharlandy: no. i think they were stuck on ci15:49
fultonjthe ceph-volume log should tell you why ceph-volume wasn't happy15:49
fultonjusually it's because the disk isn't clean15:49
fultonjcmorey: did the output of those commands look diff before you cleaned it?15:49
rlandymwhahaha: k - thanks - will recheck15:49
fultonjit basically runs https://docs.ceph.com/en/latest/ceph-volume/lvm/batch/15:50
fultonjbut within the ceph container15:50
cmoreyfultonj, lsblk showed something like you posted,15:50
fultonjcmorey: ah good15:50
fultonjcmorey: that probably means it was culprit15:50
fultonjso try re-running the same 'openstack overcloud deploy ...'15:51
cmoreyfultonj, do you want the ceph-volume.log or shall i re-try the deploy15:51
fultonjyou can tail -f config-download/<stack>/ceph-ansible/ceph-ansible.log when you redeploy15:51
openstackgerritmbu proposed openstack/tripleo-validations master: Generate inventory without any overcloud  https://review.opendev.org/c/openstack/tripleo-validations/+/76495515:51
fultonjyou'll need to confirm the exact path15:52
fultonjrelative to your version and env15:52
cmoreyis that on the undercloud node?15:52
fultonjyes15:52
cmoreyok. i'll see how that goes,15:52
*** ysandeep has quit IRC15:52
*** odyssey4me has quit IRC15:52
*** ysandeep has joined #tripleo15:53
*** odyssey4me has joined #tripleo15:53
openstackgerritMerged openstack/tripleo-heat-templates stable/victoria: Fix barbican settings missing from glance Edge nodes  https://review.opendev.org/c/openstack/tripleo-heat-templates/+/76513916:04
openstackgerritMerged openstack/tripleo-validations stable/train: Add a validation to check the local.  https://review.opendev.org/c/openstack/tripleo-validations/+/76433416:04
openstackgerritMerged openstack/tripleo-heat-templates master: [PowerFlex] Fix resource name typo in template  https://review.opendev.org/c/openstack/tripleo-heat-templates/+/76503116:05
openstackgerritMerged openstack/tripleo-quickstart master: Revert "ensure dlrn current is actually pulling current"  https://review.opendev.org/c/openstack/tripleo-quickstart/+/76508516:05
*** ykarel has quit IRC16:17
openstackgerritDavid Peacock proposed openstack/tripleo-operator-ansible master: WIP - add role to show tripleo validation  https://review.opendev.org/c/openstack/tripleo-operator-ansible/+/75537516:19
openstackgerritDavid Peacock proposed openstack/tripleo-operator-ansible master: WIP - add role to list available tripleo validations  https://review.opendev.org/c/openstack/tripleo-operator-ansible/+/75536516:19
*** marios is now known as marios|out16:22
openstackgerritAlan Bishop proposed openstack/tripleo-heat-templates stable/ussuri: Fix barbican settings missing from glance Edge nodes  https://review.opendev.org/c/openstack/tripleo-heat-templates/+/76556616:23
*** ysandeep is now known as ysandeep|away16:28
openstackgerritMerged openstack/tripleo-ansible stable/train: port c7 molecule job to c8 - part 3  https://review.opendev.org/c/openstack/tripleo-ansible/+/75238216:31
*** amoralej is now known as amoralej|off16:38
cmoreyfultonj, cleaning the disk has allowed it to move on, it now seems to be complaining about unrecognised pools (specifically vms, volumes and images)16:40
cmoreyah, it created vms o.k. but volumes and images complain with "[\"Error ERANGE:  pg_num 128 size 3 would mean 768 total pgs, which exceeds max 750 (mon_max_pg_per_osd 250 * num_in_osds 3)"16:41
fultonjyep16:42
fultonjanother "feature"16:42
cmoreyfixable?16:42
fultonjoh yeah definitely16:42
fultonjone sec16:42
fultonjcmorey: https://bugs.launchpad.net/tripleo/+bug/174954416:43
openstackLaunchpad bug 1749544 in tripleo "Overcloud deployment during ControllerDeployment_Step4 with ceph fails "ObjectNotFound: error opening pool 'metrics'\"," [High,Fix released] - Assigned to John Fulton (jfulton-org)16:43
fultonj    mon_max_pg_per_osd: 307216:44
fultonjis a BAD ^ idea16:44
fultonjif you care about your data16:44
fultonjit's all explained in the bug report above16:44
fultonjsee also https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/ceph_config.html#ceph-placement-group-validation16:45
fultonjthat will at least let you simulate the pool creation to check that your proposed configuration won't fail going in16:46
fultonjcmorey: ^16:46
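The arithmetic behind the ERANGE message can be worked through with a small sketch. It assumes a simplified model of the check: the sum of pg_num * size over all pools (existing plus the new one) must not exceed mon_max_pg_per_osd * num_in_osds. It also assumes the already-created vms pool used the defaults of 128 PGs and 3 replicas, which is consistent with the 768 in the error. The function names are illustrative, not a Ceph or TripleO API.

```python
def total_pg_replicas(pools):
    """Sum pg_num * size over an iterable of (pg_num, size) pools."""
    return sum(pg_num * size for pg_num, size in pools)


def pg_limit(mon_max_pg_per_osd, num_in_osds):
    """Cluster-wide ceiling on PG replicas."""
    return mon_max_pg_per_osd * num_in_osds


def pool_fits(existing, new_pool, mon_max_pg_per_osd=250, num_in_osds=3):
    """True if creating new_pool keeps the cluster at or under the limit."""
    projected = total_pg_replicas(list(existing) + [new_pool])
    return projected <= pg_limit(mon_max_pg_per_osd, num_in_osds)


# Values from the error: vms already exists, volumes is being created.
existing = [(128, 3)]              # vms: 128 PGs * 3 replicas = 384
print(total_pg_replicas(existing + [(128, 3)]))  # 768
print(pg_limit(250, 3))                          # 750
print(pool_fits(existing, (128, 3)))             # False
```

Under the same model, three pools of 32 PGs with a single replica total 96, well under the ceiling even with one OSD counted, which is why lowering the pool size and PG count lets the deployment proceed.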
*** chandankumar is now known as raukadah16:47
cmoreyI really need to get my head around ceph16:47
fultonjhow many storage nodes do you have, just 1?16:48
fultonjwith one OSD?16:48
cmoreyat the moment16:49
fultonjso ceph makes copies of your data to keep it safe16:49
fultonjit can't make multiple copies if there's only one disk (OSD)16:50
cmoreyit's user data.... they know it's almost ephemeral16:50
*** bandini has quit IRC16:50
fultonjceph's default is 3 copies16:50
cmorey(but that's why it's on a storage array)16:50
fultonjtell it you only need 1 and it will stop comaining16:50
fultonjcomplaining16:50
cmoreyat the moment, i don't have any other nodes with extra disks16:51
cmoreyo.k. so if I tell it to keep 1 copy (presumably via an override), the rest of the config (CephPoolDefaultSize, CephPoolDefaultPgNum and CephPools) will be fine?16:53
fultonjnot exactly16:54
fultonjthe pools were already created with the CephPoolDefaultSize which defaults to 316:55
fultonjyou can update the existing pools16:55
*** bandini has joined #tripleo16:55
fultonjfor P in vms volumes images; do ceph osd pool set $P size 1; done16:56
fultonjbut the ceph binary is in the container16:56
fultonjso16:56
fultonjon a controller16:56
fultonjpodman exec ceph-mon-$HOSTNAME16:56
fultonjpodman exec -ti ceph-mon-$HOSTNAME /bin/bash16:57
cmoreyif i'm going to be re-running openstack overcloud deploy, do i need to nuke /dev/sdb, or are we past that stage?16:57
fultonjfirst so you can use the ceph binary16:57
fultonjno need to nuke16:57
fultonjit's idempotent16:57
fultonjit will just see you have that osd and not try to create it16:57
fultonjso add 'CephPoolDefaultSize: 1' to your templates16:57
cmoreyfwiw, i'm trying to also have ceph_rgw running, as well, if that makes a difference16:58
fultonjCephPoolDefaultPgNum: 3216:58
fultonjwhich is very low16:58
fultonjthen set all pools which have already been created from 3 to 1 replicas16:58
fultonji'd say you're just trying ceph out on not the right stuff which is fine16:59
fultonjit was designed for lots of disks and will perform better that way16:59
fultonjbut you can deploy with just 1, you'll just need to reduce the defaults16:59
cmorey*nods* if i had the ability to go and plug extra disks in, i would16:59
gfidentefultonj fmount so I was thinking, if we end up using cephadm for ganesha17:00
cmoreyso, edit template, re-deploy, then run ceph osd pool set <volume> size 117:00
gfidenteand write some role in tripleo-ceph to create its config file17:00
gfidenteis this really worth a spec or would it rather go under the existing tripleo-cephadm spec17:00
gfidente?17:00
*** jaosorior has quit IRC17:01
gfidenteprobably yes just to mention it'll use a slightly different "workflow" ?17:01
*** marios|out has quit IRC17:01
fultonjcmorey:  ceph osd pool set <volume> size 117:02
fultonjthen edit template and redeploy17:02
fultonjelse redeploy will still fail with same message since those pools were already created17:03
fultonjdo it for all the pools you have17:03
fultonjyou're basically disabling ceph's data protection since you're short on disk17:03
cmoreyfultonj, one one pool, 'set pool 1 size to 1'17:03
fmountgfidente: yeah it's probably ok describe this workflow because it's a standalone service deployed by cephadm (to close the gap w/ ceph-ansible)17:04
fultonjpool number vs name17:04
* fultonj thought both worked17:04
cmoreythat was the output of .. set vms size 117:04
cmoreyi only have one pool17:04
cmoreymy default poolsize is already set to 117:05
cmoreyhttp://paste.openstack.org/show/800746/ <-- new env file17:06
fultonjgfidente: you could make the spec shorter then17:06
fultonjthe tripleo-ceph spec merged17:06
gfidentefultonj yeah I am avoiding work17:06
fultonjso the ganesha spec could become like an addendum to it17:07
fmount++17:07
fultonjperhaps a blueprint17:07
fmountthat's a good idea ^17:07
gfidentewe have the blueprints in launchpad for all three I think17:07
gfidenteI'll shorten it a little17:07
fultonjsince the spec exists perhaps just finish it but make it short as it needs to be17:07
fultonjso people don't think we gave up on the project :)17:07
gfidenteyeah17:08
gfidentethanks17:08
fultonjgfidente: so you would deploy standalone ganesha on the overcloud before running heat17:08
fultonjgfidente: and you'd do it with a correctly crafted cephadm spec file17:08
*** lmiccini has quit IRC17:09
fultonjcmorey: you've had 'CephPoolDefaultPgNum: 32' + 'CephDefaultPoolSize: 1' since day 1?17:09
cmoreyfultonj, i'm getting a warning from "openstack overcloud deploy" that the CephDefaultPOolSize is defined but not currently used in the deployment plan17:09
cmoreyfultonj, no, just CephDefaultPoolSize 117:09
fultonjah17:09
gfidentePOol17:09
*** dprince has joined #tripleo17:10
cmoreymy typo, sorry17:10
fultonjyeah so 128 is the default and it's too high17:10
cmorey"WARNING: Following parameter(s) are defined but not currently used in the deployment plan. These parameters may be valid but not in use due to the service or deployment configuration. BondInterfaceOvsOptions, SwiftRingPutTempurl, CephDefaultPoolSize, SwiftRingGetTempurl"17:10
fultonjceph octopus has pg auto scale so we won't have to do this dance in the future17:10
fultonjmaybe the templates aren't in the right order of -e's17:11
cmoreyhopefully next week i can add some more drives to the other servers and them17:11
cmorey"-e $THT/environments/ceph-ansible/ceph-ansible.yaml -e $THT/environments/ceph-ansible/ceph-rgw.yaml -e templates/ceph-settings.yaml -e templates/local-config.yaml"17:11
cmoreyyou can ignore local-config, that's just SELinux17:12
cmoreybut the others are the ceph ones17:12
fultonjsadly my day off is starting17:12
cmoreyhave a good break.17:12
fultonjbut https://bugs.launchpad.net/tripleo/+bug/1749544 has the info on what you're running into17:13
openstackLaunchpad bug 1749544 in tripleo "Overcloud deployment during ControllerDeployment_Step4 with ceph fails "ObjectNotFound: error opening pool 'metrics'\"," [High,Fix released] - Assigned to John Fulton (jfulton-org)17:13
cmoreyshould i add the low-memory.yaml?17:13
cmorey(i've not looked at that yaml yet, )17:13
fultonjcmorey: no point17:14
fultonjhttps://review.opendev.org/c/openstack/tripleo-heat-templates/+/544588/4/environments/low-memory-usage.yaml17:14
cmoreyfultonj, ok.17:14
fultonjyou've already done the equivalent17:14
cmoreytrue17:14
cmoreythe nodes aren't short of memory (all overcloud nodes currently have 64G or more)17:15
* fultonj has to rip himself away from his computer and go play guitar17:16
cmoreythanks for your help fultonj17:16
* fultonj scheduled pto17:16
fultonjhappy hacking everyone17:16
*** fultonj has quit IRC17:16
openstackgerritMerged openstack/python-tripleoclient master: Dumping task key instead of tasks from validation_output  https://review.opendev.org/c/openstack/python-tripleoclient/+/76493717:18
weshay|ruckmwhahaha, for a happier gate this should help https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/76240017:19
*** rlandy is now known as rlandy|brb17:20
openstackgerritFrancesco Pantano proposed openstack/tripleo-heat-templates master: Remove /etc/ceph dependency and add tripleo_ceph_client role  https://review.opendev.org/c/openstack/tripleo-heat-templates/+/76354217:20
slaglehjensas: check my comment on https://review.opendev.org/c/openstack/tripleo-ansible/+/761908, i figured out where the Exception was coming from17:20
*** jpich has quit IRC17:21
slaglecode assumes "subnet" key is on the instance networks, but the examples in the spec do not show it. i think it might be needed though, so you know what subnet to use to provision the port17:22
hjensasslagle: ah, right. It should only be required if there is more than one subnet. I will update the patch.17:25
hjensasslagle: or, wdyt should we just require the key to be set? For spine-leaf/edge we need it to know where to create the port. But for L2 we can assume only one subnet and get it from the network.17:27
slaglei think i'd prefer to not require it if it's not necessary17:29
hjensasslagle: ok, we have reached agreement. The feedback is much appreciated! Thanks!17:30
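
The rule agreed above (use the "subnet" key when it is set, otherwise fall back to the network's only subnet, and fail when the choice is ambiguous) can be sketched as follows. This is a minimal illustration and not the code from the tripleo-ansible patch under review: the function name `resolve_subnet`, its arguments, and the shape of the input data are all assumptions made for the example.

```python
def resolve_subnet(instance_network, network_subnets):
    """Pick the subnet on which to create a port for one instance network.

    instance_network: hypothetical dict such as
        {"network": "ctlplane", "subnet": "leaf1"}; "subnet" may be absent.
    network_subnets: list of subnet names that exist on that network.
    """
    if not network_subnets:
        raise ValueError("network has no subnets")

    subnet = instance_network.get("subnet")
    if subnet is not None:
        # Explicit choice, needed for spine-leaf/edge style deployments.
        if subnet not in network_subnets:
            raise ValueError("subnet %r is not on this network" % subnet)
        return subnet

    if len(network_subnets) == 1:
        # Plain L2 case: only one subnet, so the key is not required.
        return network_subnets[0]

    raise ValueError("'subnet' is required when a network has several subnets")


print(resolve_subnet({"network": "ctlplane"}, ["ctlplane-subnet"]))
print(resolve_subnet({"network": "ctlplane", "subnet": "leaf1"}, ["leaf0", "leaf1"]))
```

Keeping the key optional matches slagle's preference not to require it when it is not necessary, while still raising an explicit error in the multi-subnet case instead of guessing.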
openstackgerritGiulio Fidente proposed openstack/tripleo-specs master: Introduce tripleo-ceph-ganesha spec  https://review.opendev.org/c/openstack/tripleo-specs/+/75939517:39
*** gfidente is now known as gfidente|afk17:41
slaglehjensas: i added the subnet key to my yaml for now, and I got all my ports provisioned. :)17:44
slagleabout to try the heat stack now17:44
*** rlandy|brb is now known as rlandy17:47
*** beekneemech has quit IRC17:49
*** jpena is now known as jpena|off17:59
*** derekh has quit IRC18:04
cmorey:( ceph still is unhappy, i think it deployed, but won't pass.. http://paste.openstack.org/show/800749/18:05
mwhahahayou can specify the max count if you are deploying only a single node18:06
mwhahahawe do it in ci i think18:06
cmoreymax count of?18:06
mwhahahaCephPoolDefaultSize i think18:06
cmoreymwhahaha, got that included (i hope)18:07
cmorey PGs were not reported as active+clean\", \"It is possible that the cluster has less OSDs than the replica configuration\", \"Will refuse to continue\" is the error18:07
* mwhahaha shrugs18:07
cmoreyhttps://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/? might be missing things18:08
openstackgerritHarald Jensås proposed openstack/tripleo-ansible master: Network ports module  https://review.opendev.org/c/openstack/tripleo-ansible/+/76190818:12
hjensasslagle: ^^ should take care of the subnet key error?18:13
openstackgerritHarald Jensås proposed openstack/tripleo-ansible master: Introduce role/instance 'networks' key  https://review.opendev.org/c/openstack/tripleo-ansible/+/76216018:19
openstackgerritHarald Jensås proposed openstack/tripleo-ansible master: Populate network ports env module  https://review.opendev.org/c/openstack/tripleo-ansible/+/76463818:19
openstackgerritHarald Jensås proposed openstack/tripleo-ansible master: Add role_net_map to expand roles output  https://review.opendev.org/c/openstack/tripleo-ansible/+/76463918:19
openstackgerritHarald Jensås proposed openstack/tripleo-ansible master: Provison/Unprovision instance network ports  https://review.opendev.org/c/openstack/tripleo-ansible/+/76464018:19
openstackgerritHarald Jensås proposed openstack/tripleo-ansible master: Provision workflow managed/unmanaged node support  https://review.opendev.org/c/openstack/tripleo-ansible/+/76216218:19
openstackgerritHarald Jensås proposed openstack/tripleo-ansible master: Support for unmanaged servers in provision playbook  https://review.opendev.org/c/openstack/tripleo-ansible/+/76554518:19
*** raildo has quit IRC18:21
cmoreymwhahaha, is there a way to render the output of all of the -e options?18:22
mwhahahaNo18:23
cmorey:(18:23
mwhahahaIt might be in the stack itself18:23
mwhahahaAfter you tried a deploy18:24
mwhahahaBut I don't have info off the top of my head18:24
cmoreyit looks like it ignored 'CephDefaultPoolSize: 1'18:24
*** bandini has quit IRC18:27
cmoreydoh18:28
cmoreybecause the tht has CephPoolDefaultSize18:29
cmoreynot CephDefaultPoolSize18:29
* cmorey bangs his head on the desk18:30
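
The failure above came from a transposed parameter name (CephDefaultPoolSize instead of CephPoolDefaultSize) that was silently ignored. A small pre-deploy check along the following lines might catch that kind of typo. It is only a sketch: the helper `find_unknown_parameters` is hypothetical, and `KNOWN_PARAMETERS` holds just two names that appear in this log, whereas a real check would need the full parameter set of the templates in use.

```python
import difflib

# Assumption: stand-in for the full set of parameters the templates define.
# Only names that appear in this log are listed here.
KNOWN_PARAMETERS = {"CephPoolDefaultSize", "NovaLibvirtCPUMode"}


def find_unknown_parameters(parameter_defaults, known=KNOWN_PARAMETERS):
    """Map each unrecognised key to its closest known name (or None)."""
    suggestions = {}
    for key in parameter_defaults:
        if key in known:
            continue
        # get_close_matches returns the best matches first; take at most one.
        close = difflib.get_close_matches(key, sorted(known), n=1, cutoff=0.6)
        suggestions[key] = close[0] if close else None
    return suggestions


# The mapping mirrors the parameter_defaults section of an environment file.
for bad, hint in find_unknown_parameters({"CephDefaultPoolSize": 1}).items():
    print("unknown parameter %s, did you mean %s?" % (bad, hint))
```

The parameter_defaults mapping is passed in as a plain dict so the sketch stays independent of how the environment files are loaded.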
*** karthiks has quit IRC18:35
*** raildo has joined #tripleo18:43
*** raildo has quit IRC18:45
*** raildo has joined #tripleo18:48
*** jbadiapa has quit IRC18:51
openstackgerritHarald Jensås proposed openstack/tripleo-ansible master: Network ports module  https://review.opendev.org/c/openstack/tripleo-ansible/+/76190819:04
openstackgerritHarald Jensås proposed openstack/tripleo-ansible master: Introduce role/instance 'networks' key  https://review.opendev.org/c/openstack/tripleo-ansible/+/76216019:04
openstackgerritHarald Jensås proposed openstack/tripleo-ansible master: Populate network ports env module  https://review.opendev.org/c/openstack/tripleo-ansible/+/76463819:04
openstackgerritHarald Jensås proposed openstack/tripleo-ansible master: Add role_net_map to expand roles output  https://review.opendev.org/c/openstack/tripleo-ansible/+/76463919:04
openstackgerritHarald Jensås proposed openstack/tripleo-ansible master: Provison/Unprovision instance network ports  https://review.opendev.org/c/openstack/tripleo-ansible/+/76464019:04
openstackgerritHarald Jensås proposed openstack/tripleo-ansible master: Provision workflow managed/unmanaged node support  https://review.opendev.org/c/openstack/tripleo-ansible/+/76216219:04
openstackgerritHarald Jensås proposed openstack/tripleo-ansible master: Support for unmanaged servers in provision playbook  https://review.opendev.org/c/openstack/tripleo-ansible/+/76554519:04
hjensasETOOMANYPATCHESTOAPPLYINMYLAB : can we merge https://review.opendev.org/c/openstack/tripleo-ansible/+/763377 please?19:09
cmoreymwhahaha, works now...19:12
mwhahahak19:12
hjensasmwhahaha: tusen takk!19:13
*** jcapitao has quit IRC19:14
slaglehjensas: do I still define the fixed ip for the vips with the Heat parameters?19:14
slaglehjensas: i'm thinking no, b/c then there's no port created for them19:14
cmoreymwhahaha, thanks for your help, (and if you see fultonj, please pass it on)19:14
hjensasslagle: yes, the VIP ports are still created by heat. I didn't get to those yet. It'll be easy for the network VIP, but we have service VIPs for ovn, redis etc which I'm not sure how we should manage.19:16
slagleok, that's fine. i figured i was missing something :)19:17
*** pcaruana has quit IRC19:22
openstackgerritAdriano Petrich proposed openstack/tripleo-common master: Fix localization for horizon container  https://review.opendev.org/c/openstack/tripleo-common/+/76375319:22
*** cmorey has quit IRC19:23
openstackgerritHarald Jensås proposed openstack/python-tripleoclient master: Add '--network-ports' option to node (un)provision  https://review.opendev.org/c/openstack/python-tripleoclient/+/76481019:36
*** jfrancoa has quit IRC20:03
openstackgerritmbu proposed openstack/tripleo-validations master: Generate inventory without any overcloud  https://review.opendev.org/c/openstack/tripleo-validations/+/76495520:04
openstackgerritFrancesco Pantano proposed openstack/tripleo-puppet-elements master: [WIP] - Include cephadm in the overcloud image  https://review.opendev.org/c/openstack/tripleo-puppet-elements/+/76551220:26
openstackgerritMerged openstack/tripleo-heat-templates stable/train: Set correct default NovaLibvirtCPUMode  https://review.opendev.org/c/openstack/tripleo-heat-templates/+/76431420:31
openstackgerritMerged openstack/tripleo-heat-templates master: Ensure cloud-init has finished before puppet run  https://review.opendev.org/c/openstack/tripleo-heat-templates/+/76494320:31
*** belmoreira has quit IRC20:46
*** dciabrin_ has joined #tripleo20:50
*** jamesdenton has quit IRC20:51
*** jamesdenton has joined #tripleo20:51
*** dciabrin has quit IRC20:52
openstackgerritmbu proposed openstack/python-tripleoclient master: Move extra vars cli option to narg type  https://review.opendev.org/c/openstack/python-tripleoclient/+/76513720:59
*** rlandy has quit IRC21:04
*** rfolco has quit IRC21:07
*** dprince has quit IRC21:28
*** TrevorV has quit IRC21:36
*** ccamacho has quit IRC21:54
*** raildo has quit IRC21:59
*** tkajinam has joined #tripleo22:00
*** gfidente|afk has quit IRC22:11
*** tmazur has quit IRC22:16
*** ekultails has left #tripleo22:29
*** jamesdenton has quit IRC22:40
*** jamesdenton has joined #tripleo22:40
*** pleimer_ has quit IRC22:46
*** rfolco has joined #tripleo23:01
*** xek_ has quit IRC23:02
*** supamatt has joined #tripleo23:05
*** rfolco has quit IRC23:06
*** cylopez has quit IRC23:18
*** ccamacho has joined #tripleo23:22
*** Goneri has quit IRC23:50
*** tosky has quit IRC23:54
