Wednesday, 2024-03-27

opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_skyline master: Do not define a random password for each run
jrosseri think we might have circular dependencies for skyline08:49
jrosserlinters fail for this
jrosserwhich appears to be fixed by
jrosserbut that depends-on the first one08:49
opendevreviewJonathan Rosser proposed openstack/openstack-ansible master: Use container setup role from plugins repo
noonedeadpunkjrosser: yeah, you're right I guess.... though I hoped they were fixed by 91233309:59
jrosserit might be the difference between 9 and "9" in the meta/main.yml10:00
jrosserbut it is not good how lint passes on 912333 but fails on 85944610:02
noonedeadpunkyeah, though I hoped that meta is not verified in integrated repo10:03
noonedeadpunkwe can actually make another patch I guess to cover this meta thing10:04
jrosseroh you mean just merge that first into the os_skyline?10:04
jrosseri can make a patch for that10:06
noonedeadpunkjrosser: do you remember how you worked around the unsafe condition previously for python_venv_build?10:11
noonedeadpunkas seems we have common thing here
jrosseryeah i added `use: "{{ ansible_facts['pkg_mgr'] }}"` to the `package` module10:12
jrosserwhich makes it then not try to template that ^^ inside the action plugin10:13
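A minimal sketch of the workaround described above. Only the `use:` line comes from the conversation; the task name and the `example_packages` variable are hypothetical and chosen for illustration.

```yaml
# Hypothetical task; `example_packages` is an illustrative variable name.
- name: Install distro packages
  ansible.builtin.package:
    name: "{{ example_packages }}"
    state: present
    # Naming the backend explicitly, so the action plugin does not have to
    # template the package manager fact itself.
    use: "{{ ansible_facts['pkg_mgr'] }}"
```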
noonedeadpunkok, it's different then I assume10:15
jrosserits related but yes different10:15
jrosserlooks like that totally deserves a new bug10:15
jrosseras it's a different side-effect of the same change to the types they made10:16
jrosseroh well perhaps its not, as it says `[WARNING]: conditional statements should not include jinja2 templating`10:17
jrosserright before it fails10:17
noonedeadpunkyeah, true, though I would assume that it could be fine for assert... anyway will check 10:19
jrosserhow did this not fail before10:20
jrosseroh that job is maybe conditional10:20
noonedeadpunkwill check how to transform10:21
jrosseri think i found it pretty hard to escape from having an AnsibleUnsafeText10:25
jrosserand the | type_debug thing was pretty handy when trying to work out what was happening10:26
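A short sketch of the two points above, assuming a hypothetical variable `example_flag`: the `type_debug` filter prints the type Ansible sees for a value, and a condition written as a bare expression avoids the `conditional statements should not include jinja2 templating` warning. This is an illustration, not the content of the actual patch.

```yaml
# `example_flag` is a hypothetical variable used only for illustration.
- name: Show the type Ansible sees for a value
  ansible.builtin.debug:
    msg: "{{ example_flag | type_debug }}"

# Writing the condition with Jinja delimiters triggers the warning:
#   when: "{{ example_flag }}"
# A bare expression does not:
- name: Conditional written as a bare expression
  ansible.builtin.debug:
    msg: "flag is set"
  when: example_flag | bool
```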
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-os_skyline master: Add quotes for EL version in meta/main.yml
opendevreviewJonathan Rosser proposed openstack/openstack-ansible master: [Feature] Add skyline deployment capability
jrosserurgh `[MIRROR] systemd-devel-252-32.el9.x86_64.rpm: Status code: 404 for (IP:`10:32
jrosserand more `fatal: [localhost]: FAILED! => {"attempts": 5, "changed": false, "msg": "Failed to download packages: pcp-6.2.0-2.el9.x86_64: Cannot download, all mirrors were already tried without success", "results": []}`10:34
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove Jinja from conditions
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove Jinja from conditions
opendevreviewJonathan Rosser proposed openstack/openstack-ansible master: [Feature] Add skyline deployment capability
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove Jinja from conditions
opendevreviewJonathan Rosser proposed openstack/openstack-ansible master: [Feature] Add skyline deployment capability
noonedeadpunkok, but now that should pass, I assume10:44
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_skyline master: Add EL distro support
jrosseri hope so10:44
opendevreviewJames Denton proposed openstack/openstack-ansible-os_skyline master: Support large uploads via Skyline
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_skyline master: Install skyline-console through yarn
gokhanhello folks, after upgrade to antelope, live migration is not working. I can migrate but live migrate is not 10:44
gokhanwhen I check with nova user on compute hosts, I can not ssh between compute hosts. It waits when trying to ssh  10:45
noonedeadpunkiirc, what changed in terms of live migration, is that libvirt should be service tls since antelope10:45
noonedeadpunkand also ports were changed iirc10:46
noonedeadpunkregarding SSH - it's done with SSH certs right now, but SSH is used only for offline migration10:46
gokhanmy env is without ssl :( 10:46
noonedeadpunkIIRC tunnel migrations were deprecated10:47
jrosserssl between haproxy and the api is not the same as live migration over tls between libvirt<>libvirt10:47
noonedeadpunkso, we have that by default:
noonedeadpunkbut we kept old logic there actually:
noonedeadpunkthough I guess it's broken?10:49
noonedeadpunkas keyfile is probably wrong?10:49
noonedeadpunkhuh, let me check quickly10:49
gokhanok thanks jrosser fyi10:49
gokhannoonedeadpunk, my libvirt config is in nova
noonedeadpunkyeah, ok, so you should be running live migrations through TLS10:51
noonedeadpunkif you're using firewall, just ensure that required ports are open, as they're different10:53
noonedeadpunkso it should be 16514 and 49152:49215 for management network10:54
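One way to check the firewall for the libvirt TLS port mentioned above is a `wait_for` task run from one compute host against another. This is a sketch: `example_destination_host` is a hypothetical variable, and only 16514 is probed because the 49152:49215 range is typically only listened on while a migration is in flight.

```yaml
# `example_destination_host` is a hypothetical variable holding the
# management-network address of the destination compute host.
- name: Check that the libvirt TLS port answers on the destination
  ansible.builtin.wait_for:
    host: "{{ example_destination_host }}"
    port: 16514
    timeout: 5
    state: started
```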
noonedeadpunkI again forgot how ssh certs are working /o\10:58
noonedeadpunk ssh 10.X.X.X -l nova -i /var/lib/nova/.ssh/id_rsa throws Permission denied (publickey)10:58
gokhannova conductor throws no valid host found error
noonedeadpunkok, well, that is different10:59
gokhanall compute service is up 11:00
noonedeadpunkanything in scheduler logs?11:02
noonedeadpunklike some "info" of why11:02
gokhannoonedeadpunk, Mar 27 10:35:37 dev-infra3-nova-api-container-d4035159 nova-scheduler[65766]: 2024-03-27 10:35:37.376 65766 INFO nova.scheduler.host_manager [req-15e24c17-4b77-4911-a4a5-f99315dcc173 req-e06bd0de-9490-44ec-8c9a-ca75321d8744 4841276dcdbe4ab096ef60b1744c4fa9 f2e52a5c5d1c4ca1b51274619b517e0e - - default default] Host filter only checking host dev-compute1 and node dev-compute111:04
gokhanMar 27 10:35:37 dev-infra3-nova-api-container-d4035159 nova-scheduler[65766]: 2024-03-27 10:35:37.377 65766 INFO nova.scheduler.host_manager [req-15e24c17-4b77-4911-a4a5-f99315dcc173 req-e06bd0de-9490-44ec-8c9a-ca75321d8744 4841276dcdbe4ab096ef60b1744c4fa9 f2e52a5c5d1c4ca1b51274619b517e0e - - default default] Host filter ignoring hosts: dev-compute111:04
noonedeadpunkWell, I guess it's worth checking what is special about the instance or host - maybe some aggregates, AZs, server groups...11:07
opendevreviewMerged openstack/openstack-ansible-os_nova stable/2023.2: Ensure nova_device_spec is templated as JSON string
noonedeadpunkok, it was my internal issue not being able to login via ssh :)11:18
noonedeadpunk(as nova user)11:18
gokhannoonedeadpunk, I can't ssh to compute nodes with command which you sent 11:19
noonedeadpunkactually this reminds me of some bug11:19
noonedeadpunkgokhan: SSH is used only for offline migration with the config you have11:21
noonedeadpunkfor online migration it is not needed at all11:21
noonedeadpunkand you obviously have some scheduling issue rather than anything else11:21
gokhannoonedeadpunk, ok so this is different.11:22
gokhannoonedeadpunk, I am using default scheduler filter in osa 11:23
noonedeadpunkI guess the question here is, why `Host filter ignoring hosts: dev-compute1`11:24
noonedeadpunkas that's the only candidate11:24
noonedeadpunkand it could be quite some reasons actually11:25
gokhanin nova config resizeonsamehost is true 11:25
jrossernoonedeadpunk: i just also checked the nova<>nova ssh and it works for me here11:25
noonedeadpunkyeah, just our internal mess dropped Include /etc/ssh/sshd_config.d/*.conf out of sshd_config11:26
gokhanlive migration is not working in all my antelope envs 11:28
gokhanfilters are there
jrossergokhan: i think you need to do debugging to find the root cause in your specific environment11:31
jrosserit's not possible to understand just from the config what is happening11:31
gokhanjrosser, I think it is about scheduling. I debugged nova scheduler but I didn't find root cause. In scheduler logs it lastly says There are 0 hosts available but 1 instances requested to build. these are scheduler logs
noonedeadpunkyeah, so `Host filter ignoring hosts: dev-compute1`11:46
noonedeadpunkso it's a specific filter that filtered it out11:46
opendevreviewMerged openstack/openstack-ansible-os_horizon stable/2023.2: Do not change mode of files recursively
noonedeadpunkquestion - do you provide destination for the vm explicitly?11:47
gokhannoonedeadpunk, yes I am giving destination explicitly 11:47
noonedeadpunkand what if you don't?:)11:48
noonedeadpunksame - no hosts available?11:48
jrosseryou can look in the nova compute log on the dev-compute1 to see if the migration was rejected there11:49
jrosserand you can also look in placement logs to see if the resource claim is rejected there11:50
gokhannoonedeadpunk, yes similar error :( Host filter ignoring hosts: dev-compute3, dev-compute1, dev-compute211:50
jrossergokhan: are you migrating between antelope hosts? or is this migration part of your upgrade to clean out / upgrade compute nodes?11:51
noonedeadpunkwell, from what I do see in nova code - a host has to be ignored explicitly somehow to get filtered the way it does
noonedeadpunkI do recall one possible bug in nova, but it was sorted quite some time ago11:51
jrosserthere is this
noonedeadpunknah, this one is kernel issue iirc11:52
noonedeadpunkit's like specific set of cpu and kernel - at least that's what we had11:53
noonedeadpunkwill try to find what I'm talking about11:53
gokhanjrosser, yes now I checked nova-compute and it throws this bug. it is about cpu doesn't have compatibility 11:54
gokhanjrosser, yes  this is migration part of upgrade to clean out / upgrade compute nodes11:55
noonedeadpunkjrosser: so, there's regression between kernel 3.14 and 3.17 where with Intel Gold kernel announces an extra cpu flag regardless of what's requested or present in cpu_map11:56
noonedeadpunkso migrating vms from some old E5 to Gold is one way ticket, unless you have hwe kernel11:56
jrosserahhhh ok we would not have come across that11:57
jrosserand on focal i think we were running HWE11:57
jrosserfor antelope we needed to patch nova for migrations to work
gokhanwe are running also hwe on focal 11:58
jrossergokhan: it is pretty much normal for us to test this completely in a lab and the result of that is patched versions of a few services11:59
jrosseri think it's been a while since a completely stock install worked for us12:00
noonedeadpunkhuh, it does for us mostly...12:01
gokhanjrosser, so I need this commit
jrossernova/neutron/magnum/keystone we need to fork currently12:03
jrossergokhan: perhaps - this is not anything i guarantee for you12:03
jrosserthats based on our tests in our lab12:03
gokhanjrosser, I will test it now in test env12:04
jrosserwe discussed this with the nova team some time ago in their weekly meeting, and their suggestion for a quick-fix was to make that revert12:05
jrosserwhere "we" is my team here12:05
jrossernot OSA12:06
gokhanjrosser, it worked and now I can migrate thanks.12:09
jrosserit is probably worth replying to the bug to say that you are also affected12:11
gokhanjrosser, yes I am replying now. 12:12
gokhanthanks jrosser noonedeadpunk for your help :)12:19
gokhanjrosser, rebuild from volume based image is also not working. It seems you also fixed this problem.
jrossergokhan: ah well yes but i think you really have noonedeadpunk to thank there
jrosserand that looks to be fixed in nova 2023.1 anyway now, so would be good idea to see why you don't have that already12:30
gokhanjrosser, I am getting again Image 4ef6efed-3ee1-4360-b366-3c6d69eedc09 is unacceptable: Unable to rebuild with a different image for a volume-backed server.12:34
gokhanI think this feature is added in zed but now it is not working 12:34
opendevreviewJonathan Rosser proposed openstack/openstack-ansible master: Use container setup role from plugins repo
nixbuilderAnyone run across this after installing with 28.0.1???
nixbuilderIt's rabbitmq errors :-(13:04
jrosserideally you'd be using 28.1.013:06
nixbuilderjrosser: OK... will try that!13:07
jrosserthough i don't specifically expect a magic fix there, just that it's the latest tagged release13:07
noonedeadpunkit feels that it tries to reach rabbit on tls port without tls or smth like that13:09
noonedeadpunkSo yeah. I wouldn't put too much on 28.1.0 in this case13:09
jrosserso in my rabbitmq.conf i have management.ssl.versions.1 = tlsv1.213:10
jrosserand then in my nova.conf the rabbitmq connection string ends `@`13:11
jrossernixbuilder: ^ so you can see there that ssl is enabled and the tls version is matching at both ends13:12
jrosserthats probably a good place to start debugging whats happening13:12
noonedeadpunkjrosser: finally found bug I was talking about /o\
nixbuilderjrosser: Thanks for the tip.. I am re-installing with 28.1.0 but I imagine I will have to debug my issue after my latest system comes back up!13:14
noonedeadpunkit was time when I spent like 2 days trying to build a kernel patch to live-apply through kpatch, but failed in building even kernel itself lol13:15
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_skyline master: Define amount of gunicorn workers through config
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: [Feature] Add skyline deployment capability
noonedeadpunkok, this one passed ^ but I had to update it a bit as yarn jobs were in retry_limit13:37
noonedeadpunkand this one also looks like good to go
jrosserrelation chain on some of the skyline patches is pretty messed up13:40
jrosseri dare not try to rebase that13:40
jrosserthis in particular
jrosseri'm always wary when the relation chain on the patch is pretty different to how it looks on other patches13:42
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_skyline master: Do not define a random password for each run
noonedeadpunkthere're just couple of changes on top of as the one which should run at least playbooks towards skyline role13:44
noonedeadpunkstill not good enough testing, but at least role gets executed13:44
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-openstack_hosts master: Place sysctl options to it's own file
opendevreviewMerged openstack/openstack-ansible-openstack_hosts unmaintained/wallaby: Update .gitreview for unmaintained/wallaby
noonedeadpunkI'm looking through apt_key patches and see how ppl just rush to irc/launchpad filing bugs about failed deployments...14:00
noonedeadpunkwhile indeed that might make most sense right now14:01
jrosserbecause the data structure for configuring custom repos pretty much changes?14:02
opendevreviewMerged openstack/openstack-ansible-openstack_hosts unmaintained/xena: Update .gitreview for unmaintained/xena
noonedeadpunkand not always obvious14:03
jrosserno, though the new module is pretty neat and there is an opportunity to do a bunch of tidy up14:03
jrosserit is good to get feedback / review though, and i also think that people will trip over this14:03
noonedeadpunkI was thinking if we can get some pre-upgrade check or smth to fail early14:04
noonedeadpunkor dunno14:04
jrosserwe could do that if we changed the name of the var perhaps14:04
jrosserthen assert old var is not defined14:04
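The "rename the variable, then assert the old one is not defined" idea could look roughly like the task below. Both variable names are hypothetical placeholders, since the conversation does not name the variables involved.

```yaml
# `example_old_repo_var` and `example_new_repo_var` are hypothetical names.
- name: Fail early if the deprecated variable is still set
  ansible.builtin.assert:
    that:
      - example_old_repo_var is not defined
    fail_msg: >-
      example_old_repo_var is no longer supported; move its contents to
      example_new_repo_var before upgrading.
```

Run as a pre-upgrade check, this would stop a deployment before any repository configuration is touched.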
noonedeadpunkwell, I tried to do some wiring, but then realized it's also so different for 3 repos in topic14:05
noonedeadpunkand vars are same for centos/ubuntu which is another complication14:06
noonedeadpunkwe could probably check for some required keys in the list14:07
jrosserwell perhaps this is then "all or nothing"14:07
noonedeadpunkbut I'm not sure what is required except name14:07
jrosserthat we need a big transition to the new module, everywhere14:07
jrosseri realised pretty early that there was no way to retrofit the old data structure to the new module14:08
jrosserbut the new module is so flexible that pretty much any possibility is covered if we just expose all the module params and put in the right places for overrides14:09
jrosserit would make it possible to no longer need to pre-stage apt keys for internal repos for example, they could be in data14:10
noonedeadpunkyeah, I know and kinda agree on that... but also I'm quite sure that at least 2 times I will be pinged about that internally after upgrade failures14:28
jrosseri also didnt make a releasenote for this as it felt like one overall one was needed14:30
jrosser^ in the individual patches14:30
noonedeadpunkI think you did?14:30
jrosseroh did i?! long ago :_14:30
noonedeadpunkor well, not everywhere, but somewhere14:30
noonedeadpunkok, so only openstack_hosts is not covered :D14:31
noonedeadpunkand debian bullseye seems not liking that at all15:03
noonedeadpunksame for mariadb15:04
noonedeadpunksounds like bullseye expects smth different for signed-by than it should be15:08
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-galera_server master: Implement installation method selection for MariaDB role
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-galera_server master: Implement installation method selection for MariaDB role
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_skyline master: Install skyline-console through yarn
noonedeadpunkso this looks quite good now from what I can tell
noonedeadpunkthe biggest question if 914405 pass :D17:12
jrosserthis is really cool stuff17:20
jrosserdamiandabrowski: mgariepy NeilHanlon is anyone around to look at ?17:22
* NeilHanlon is around17:23
jrossercool, thanks!17:24
NeilHanlonof course :) 17:25
NeilHanlonbackport candidate? 17:25
noonedeadpunkor well... dunno17:26
noonedeadpunkthat's good question actually17:26
noonedeadpunkas we probably should bump ansible version on stable branches to cover cve17:26
NeilHanloni mean, probably not incredibly urgent17:28
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-os_neutron master: Fix multiline yaml formatting in neutron systemd services
* jrosser throws rocks at ansible lint :/17:32
noonedeadpunkwell, we can at least on 2023.217:33
jrosser914544 can be checked with something like this 91454417:33
jrossernoonedeadpunk: out of interest, where does the actual skyline-console wheel get built?18:19
jrosseroh i see it gives skyline_console_yarn_build_path as the source location for skyline-console to python_venv_build role18:23
opendevreviewMerged openstack/openstack-ansible master: Remove Jinja from conditions
