*** | ykarel__ is now known as ykarel | 04:10 |
opendevreview | Stephen Finucane proposed openstack/nova master: api: Return 404 on bad project ID for 'os-quota-sets' https://review.opendev.org/c/openstack/nova/+/937125 | 09:58 |
stephenfin | gibi: sean-k-mooney: Can we get https://review.opendev.org/c/openstack/nova/+/952266 in? It's rather annoying when running unit tests 😅 | 10:00 |
gibi | +2 | 10:16 |
stephenfin | ty | 10:17 |
opendevreview | Stephen Finucane proposed openstack/nova master: api: Deprecate v2 API https://review.opendev.org/c/openstack/nova/+/954102 | 10:51 |
opendevreview | Stephen Finucane proposed openstack/nova master: WIP: api: Remove version controller split https://review.opendev.org/c/openstack/nova/+/954103 | 10:51 |
sean-k-mooney | stephenfin: sure | 11:05 |
opendevreview | Biser Milanov proposed openstack/nova stable/2024.1: Hardcode the use of iothreads for KVM. https://review.opendev.org/c/openstack/nova/+/954113 | 11:25 |
opendevreview | Biser Milanov proposed openstack/nova stable/2024.1: Hardcode the use of iothreads for KVM. https://review.opendev.org/c/openstack/nova/+/954113 | 11:28 |
opendevreview | Biser Milanov proposed openstack/nova stable/2024.1: StorPool: Pass the instance UUID and device_name to os-brick https://review.opendev.org/c/openstack/nova/+/954115 | 11:29 |
sp-bmilanov | sean-k-mooney: Hi, you can ignore the backport chain: https://review.opendev.org/c/openstack/nova/+/954115, it is not meant to actually be merged | 11:30 |
sean-k-mooney | ack, well, the master version is not applicable either | 11:30 |
sp-bmilanov | yep, I am aware there's an effort to go about adding iothreads another way | 11:31 |
sean-k-mooney | we are not going to make this a per-host config option | 11:31 |
sean-k-mooney | masahito: by the way, even though we are past the spec freeze, I would encourage you to work on a PoC of the iothread and queue patches based on the current version of the spec. While we may not have time to do an extensive review of it, if you have a working implementation with tempest tests etc. at the start of the new cycle in October, I think it would be reasonable to advocate for | 12:50 |
sean-k-mooney | reviewing and completing the overall functionality early in the cycle. | 12:50 |
sean-k-mooney | depending on how early that is done, you may even have time to look at the next steps, although that is less likely to happen in time for 2026.1 | 12:51 |
gibi | fyi I see a significant failure rate in our multi-node gate because rabbit is not accessible from compute1 https://bugs.launchpad.net/nova/+bug/2115980 It started 3 days ago | 12:51 |
sean-k-mooney | gibi: ya, so that's a zuul issue | 12:52 |
gibi | is there a tracker for it from the infra perspective? | 12:52 |
sean-k-mooney | gibi: tl;dr in the old implementation, nodesets were not allowed to be provisioned from different providers | 12:52 |
sean-k-mooney | in the new one they can, but that breaks our jobs because they use the local IPs, since they expect all the VMs to be on the same neutron network | 12:52 |
sean-k-mooney | gibi: in terms of a tracker I am not sure, but dansmith and clarkb were talking about it last night | 12:53 |
gibi | OK, reading back then... | 12:53 |
gibi | thanks | 12:53 |
sean-k-mooney | they know that this is happening and I think they are looking to disable that feature | 12:53 |
sean-k-mooney | but I don't know if that change has been done yet; the openstack-infra channel is probably the best place to follow up | 12:54 |
gibi | ack, I've pinged infra about it | 12:57 |
sean-k-mooney | I believe this is basically just a boolean we need to flip for our tenant. Long term we could refactor the multi-node role that we use in the devstack job to use the floating IPs and stretch the deployment over the WAN, but that is likely to cause other issues, so I don't think that will pan out | 12:59 |
gibi | 14:59 < fungi> gibi: there's https://review.opendev.org/954064 which we should be auto-upgrading to within the next 24 hours | 13:01 |
gibi | so fix is in the pipe | 13:01 |
sean-k-mooney | cool | 13:01 |
fungi | yeah, we automatically upgrade our zuul servers through rolling restarts over the weekend | 13:02 |
sean-k-mooney | fungi: I would still suggest that it should be possible to disable the ability to use multiple providers at the nodeset or tenant level | 13:03 |
fungi | booting a node in another provider for a multi-node job should only ever be a fallback in cases where the job would have otherwise been reported as a node failure | 13:03 |
fungi | the fact that it was happening in recent cases was a logic bug, which that change addresses | 13:04 |
sean-k-mooney | fungi: right, but we know our multi-node devstack jobs will never work in that case | 13:04 |
sean-k-mooney | so would it not be better to allow declaring whether split providers can be tolerated at the job/buildset level | 13:04 |
fungi | disabling the fallback would just mean always reporting node_failure in those cases rather than trying anyway and possibly failing for other reasons | 13:04 |
sean-k-mooney | right, which may actually be a better developer experience | 13:05 |
sean-k-mooney | it means I don't have to debug why it failed just to see it was multi-provider | 13:05 |
fungi | so the job would still fail, but yes maybe the benefit is that it returns sooner, doesn't waste as many resources, and gives a clearer failure reason for those | 13:06 |
sean-k-mooney | if we modified our devstack jobs to always use publicly routable IPs for all our networking, that would be one thing | 13:06 |
sean-k-mooney | fungi: for what it's worth, I am happy the multi-provider capability was finally added; I just wish it was more controllable | 13:07 |
fungi | yeah, but also the fact that two nodes boot in the same provider region doesn't guarantee low latency between them | 13:08 |
fungi | anyway, once we're running on the referenced change, the incidence of this particular failure should be on par with prior frequency of node_failure results in those jobs, which i hope was exceedingly rare to begin with | 13:09 |
sean-k-mooney | fungi: latency is not really the concern; we do expect routability between the nodes, but using their local IPs | 13:10 |
fungi | ah, yeah also booting in the same provider region doesn't guarantee that their local addresses can reach each other, though it's a relatively safe assumption | 13:10 |
sean-k-mooney | fungi: well, that depends on how you configured the provider | 13:13 |
sean-k-mooney | specifically, you can configure the subnet, and I think the network, in the provider section | 13:14 |
fungi | right. in many providers we, as the users, aren't configuring that | 13:14 |
sean-k-mooney | to ensure that all nodes provided by it guarantee that | 13:14 |
sean-k-mooney | in which case we are assuming there is only one network and that it is being used by default | 13:15 |
sean-k-mooney | which is OK but not ideal | 13:15 |
fungi | a lot of openstack public clouds do shared provider networks for the server ports, but yes those can generally all still reach each other even in cases where they allocate out of disjoint networks (they just bounce off a gateway address) | 13:15 |
opendevreview | Merged openstack/nova master: db: Resolve alembic deprecation warning https://review.opendev.org/c/openstack/nova/+/952266 | 13:16 |
sean-k-mooney | yep, so our current multi-node jobs all depend on that today | 13:16 |
sean-k-mooney | ideally we would be able to express that on the nodeset definition, "node_must_be_routeable" or something like that; I assume we can't assume all nodes have IPv6 yet | 13:18 |
sean-k-mooney | if we could, we could have the existing multi-node role that creates the tunnel mesh do that over IPv6 and then use the IPv4 addresses it provides for our jobs | 13:18 |
sean-k-mooney | i.e. instead of using the IP from the default route, use the one for the multi-node bridge | 13:19 |
sean-k-mooney | I'm referring to https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/multi-node-bridge just to be clear | 13:19 |
sean-k-mooney | if we could rely on that mesh working across providers, we would not have to care for the most part, as that vxlan tunnel should resolve any firewall issues we have | 13:20 |
sean-k-mooney | although running ceph rbd over that likely won't be a fun time | 13:21 |
sean-k-mooney | let's just see how things go once that patch lands | 13:21 |
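To make the idea in the preceding messages concrete, here is a minimal sketch (not the actual zuul-jobs code) of selecting the multi-node-bridge address instead of the default-route IP. It assumes a Linux node with iproute2 and that the bridge is named `br-infra`, the default used by the multi-node-bridge role; the fallback helper is purely illustrative.

```python
# Sketch only: prefer the multi-node-bridge address over the default-route IP.
# Assumes a Linux node with iproute2; "br-infra" is the bridge name the
# zuul-jobs multi-node-bridge role uses by default.
import json
import subprocess
from typing import Optional


def bridge_ipv4(bridge: str = "br-infra") -> Optional[str]:
    """Return the first IPv4 address bound to the given bridge, if any."""
    out = subprocess.run(
        ["ip", "-json", "addr", "show", "dev", bridge],
        capture_output=True, text=True,
    )
    if out.returncode != 0:
        return None  # the bridge does not exist on this node
    for iface in json.loads(out.stdout):
        for addr in iface.get("addr_info", []):
            if addr.get("family") == "inet":
                return addr["local"]
    return None


def default_route_ipv4() -> str:
    """Fallback: the source IP the kernel would use for the default route."""
    out = subprocess.run(
        ["ip", "-json", "route", "get", "8.8.8.8"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)[0]["prefsrc"]


# Jobs would then use the mesh address when present and only fall back to
# the provider-local IP otherwise.
host_ip = bridge_ipv4() or default_route_ipv4()
```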
fungi | another option is an early acceptance check in the job to confirm the nodes are able to reach one another satisfactorily over the interfaces it wants them to use, and then the nodeset can be discarded and the build automatically retried if not. we do something similar in a common role to make sure nodes are able to reach internet resources, for example | 13:24 |
sean-k-mooney | ya, that's not a bad idea; we could do it as a pre-playbook, which would allow the job to potentially reschedule to a different nodeset | 13:28 |
sean-k-mooney | that's honestly the best of both worlds, as we may be able to avoid rechecking | 13:28 |
sean-k-mooney | and if we do it as a role, we could selectively add it to the jobs that actually need it | 13:28 |
sean-k-mooney | that may end up being all the multi-node jobs, but perhaps not | 13:29 |
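A rough sketch of what such an early acceptance check could look like, written here as a standalone Python script rather than an Ansible role; the peer IP list and the exit-code convention are assumptions for illustration, not an existing zuul-jobs interface.

```python
# Sketch of an early node-reachability gate for a multi-node job. Exiting
# non-zero from a pre-run phase is what would let Zuul retry the build on a
# fresh nodeset. The PEERS list is a stand-in for the job's real inventory.
import subprocess
import sys

PEERS = ["192.0.2.10", "192.0.2.11"]  # hypothetical private IPs of the other nodes


def reachable(ip: str, attempts: int = 3) -> bool:
    """Ping a peer a few times; any success counts as reachable."""
    for _ in range(attempts):
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "2", ip],
            stdout=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            return True
    return False


unreachable = [ip for ip in PEERS if not reachable(ip)]
if unreachable:
    print(f"peers not reachable over local IPs: {unreachable}", file=sys.stderr)
    sys.exit(1)  # fail the pre-phase so the build is rescheduled
```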
opendevreview | Takashi Natsume proposed openstack/nova master: Update contributor guide for 2025.2 Flamingo https://review.opendev.org/c/openstack/nova/+/944603 | 14:17 |
gboutry | Hello nova, is there anything else required for this change (apart from reviews): https://review.opendev.org/c/openstack/nova/+/953737? I'd be quite interested in the backport to 2025.1 | 14:19 |
gibi | gboutry: no, I think we are just waiting for reviews there | 14:27 |
gibi | but the US is out today and our resident stable branch reviewer elodilles is on PTO | 14:28 |
gibi | bauzas: sean-k-mooney: ^^ could you hit https://review.opendev.org/c/openstack/nova/+/953737 | 14:28 |
sean-k-mooney | oh, the backport, sure | 14:28 |
sean-k-mooney | done | 14:30 |
sean-k-mooney | stephenfin: ^ if you're still here, could you also look at that | 14:30 |
gibi | thanks | 14:30 |
gboutry | Thank you very much! | 14:49 |
opendevreview | Balazs Gibizer proposed openstack/nova master: [pci]Keep used dev in Placement regardless of dev_spec https://review.opendev.org/c/openstack/nova/+/954149 | 15:11 |
sean-k-mooney | gibi: so I did something similar to that for the PCI device table a long time ago | 15:14 |
sean-k-mooney | gibi: is that only for the placement side, or was there a regression on the device table as well | 15:14 |
gibi | it is only the PCI in Placement side | 15:14 |
sean-k-mooney | ack | 15:14 |
sean-k-mooney | I guess we missed that edge case in the original implementation | 15:15 |
gibi | btw the PCI tracker side is also a bit buggy: after the inconsistency is handled by deleting the VM, the device that is no longer in the dev spec is not marked deleted until the compute is restarted again | 15:15 |
gibi | we had a bug in the edge case handling and we have an edge case of an edge case during VM deletion that was not really handled well | 15:16 |
sean-k-mooney | that's because of the caching we have, I think | 15:16 |
gibi | yepp, we don't re-read the hypervisor view and remove the dev | 15:16 |
sean-k-mooney | right | 15:16 |
sean-k-mooney | so the actual correct behavior is to refuse to start the compute at all | 15:16 |
sean-k-mooney | i.e. if a device is in use by a VM and you remove it from the dev spec | 15:17 |
sean-k-mooney | that should be a hard error | 15:17 |
gibi | we can change to that, but that is a hard block if one PCI device has died | 15:17 |
sean-k-mooney | but we didn't want to do that when we were fixing the original bug | 15:17 |
gibi | I have to drop for the weekend, but I need to get back to the bugfix anyway next week to do the proper tests | 15:18 |
gibi | o/ | 15:18 |
sean-k-mooney | gibi: yep, although if we change this now we probably should add a workaround flag | 15:18 |
sean-k-mooney | gibi: o/ | 15:18 |
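For illustration, a sketch of how the hard startup error and the workaround flag discussed above could fit together. The option name, the group registration, and the tracker data structures are hypothetical; nova's actual PCI tracker code is organised differently.

```python
# Illustrative sketch only: a hard startup error with an escape-hatch
# workaround flag. The option name and the device attributes used here are
# hypothetical, not nova's real configuration or PCI tracker internals.
from oslo_config import cfg

CONF = cfg.CONF
CONF.register_opts(
    [cfg.BoolOpt(
        "allow_used_pci_device_removal",  # hypothetical option name
        default=False,
        help="If True, log a warning instead of refusing to start "
             "nova-compute when a PCI device still assigned to an instance "
             "has been removed from [pci]device_spec.",
    )],
    group="workarounds",
)


def check_dev_spec_consistency(used_devices, dev_spec_addresses):
    """Fail hard if an in-use device is no longer matched by the dev spec."""
    orphaned = [d for d in used_devices if d.address not in dev_spec_addresses]
    if orphaned and not CONF.workarounds.allow_used_pci_device_removal:
        raise RuntimeError(
            "PCI devices %s are assigned to instances but no longer match "
            "[pci]device_spec; refusing to start"
            % [d.address for d in orphaned]
        )
```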
gmaan | sean-k-mooney: you asked me to remind you about the manager-role series review, whenever you have time https://review.opendev.org/q/topic:%22bp/policy-manager-role-default%22+status:open | 15:19 |
gmaan | I migrated all existing migration (live and cold) tempest tests to use the manager role; the changes are linked under the same topic | 15:20 |
gmaan | one tempest change adding abort/force-complete tests is in progress, which I am working on in parallel; those things are also covered in nova-side tests, but I want to write tempest tests as well if we can | 15:21 |
sean-k-mooney | cool, I'm trying to start wrapping up for the week, so I likely won't get to it till Monday | 15:22 |
gmaan | no issue, thanks | 15:22 |
sean-k-mooney | https://review.opendev.org/c/openstack/tempest/+/953847/6/tempest/lib/api_schema/response/compute/v2_1/servers.py so we just didn't have a tempest schema for that before? | 15:23 |
sean-k-mooney | I guess tempest never tested those APIs in the past then? | 15:24 |
sean-k-mooney | I feel like any test that tries to use abort/force complete would be racy | 15:24 |
sean-k-mooney | unless you're planning to add something to slow down the migration by creating memory load | 15:25 |
sean-k-mooney | but even then, that seems hard to test properly | 15:25 |
gmaan | yeah, we did not have tests or a schema for the in-progress live migration list; tempest has a schema for the migration list (list all migrations) though | 15:26 |
gmaan | yeah, I am trying it in a try block: if the migration is still going on, then perform the force complete/abort | 15:27 |
gmaan | I know that is the best we can test, and we log if the test is not able to perform the operation | 15:27 |
gmaan | I am not 100% convinced that will make a good tempest test, but in some cases it can perform the things it wants to | 15:28 |
gmaan | for abort, it is like: if the test is able to perform the abort migration it passes, otherwise the test is skipped saying 'abort was not performed because the migration completed before that' | 15:31 |
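A condensed sketch of the race-tolerant pattern gmaan describes: abort only if the migration is still running, otherwise skip. The client and helper method names approximate tempest's compute clients and are assumptions, not the code in the change under review.

```python
# Sketch of a race-tolerant abort test, assuming a tempest admin test class
# context (self.create_test_server, admin clients, etc.). Client method names
# are approximations of tempest's compute client interface.
def test_abort_live_migration(self):
    server = self.create_test_server(wait_until="ACTIVE")
    self.admin_servers_client.live_migrate_server(
        server["id"], host=None, block_migration="auto")

    # The migration may already be finished by the time we look for it.
    migrations = self.admin_migrations_client.list_migrations(
        instance_uuid=server["id"])["migrations"]
    running = [m for m in migrations if m["status"] == "running"]
    if not running:
        raise self.skipException(
            "abort was not performed because the migration completed first")

    # DELETE /servers/{id}/migrations/{migration_id} aborts the migration;
    # even this can race with completion, which the real test must tolerate.
    self.admin_servers_client.delete_migration(
        server["id"], running[0]["id"])
```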
opendevreview | Merged openstack/nova master: Replace utils.spawn_n with spawn https://review.opendev.org/c/openstack/nova/+/948076 | 18:00 |