opendevreview | Amit Uniyal proposed openstack/nova master: Allow swap resize from non-zero to zero https://review.opendev.org/c/openstack/nova/+/857339 | 07:52 |
---|---|---|
dvo-plv_ | Hello, all. Could you please help me with file naming? | 08:09 |
dvo-plv_ | /opt/stack/glance/releasenotes/notes/zed-milestone-3-3e38697ae4677a81.yaml: where should I get the hash (zed-milestone-....yaml) for this file? | 08:09 |
sean-k-mooney | it's auto-generated using a tool | 08:31 |
sean-k-mooney | tox -e venv reno new zed-milestone-3 | 08:31 |
sean-k-mooney | reno is the tool we use to create release notes | 08:32 |
sean-k-mooney | dvo-plv_:^ | 08:32 |
dvo-plv_ | thank you | 08:37 |
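(For context, a minimal sketch of the reno workflow sean-k-mooney describes, assuming a standard devstack checkout under /opt/stack/glance; the hex suffix in the filename is generated by reno itself, so it is never chosen by hand.)

```bash
# From the project root, ask reno to create a new release note; it appends a
# random hex suffix to the slug you pass, e.g. zed-milestone-3-<hash>.yaml.
cd /opt/stack/glance
tox -e venv -- reno new zed-milestone-3
# The new file lands under releasenotes/notes/ and is then edited by hand.
```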
opendevreview | Danylo Vodopianov proposed openstack/nova master: Packed virtqueue support was added. https://review.opendev.org/c/openstack/nova/+/876075 | 11:33 |
opendevreview | Amit Uniyal proposed openstack/nova stable/wallaby: fup: Print message logging uncaught nova-manage exceptions https://review.opendev.org/c/openstack/nova/+/877334 | 11:53 |
opendevreview | Danylo Vodopianov proposed openstack/nova-specs master: VirtIO PackedRing Configuration support https://review.opendev.org/c/openstack/nova-specs/+/868377 | 11:54 |
dvo-plv_ | gibi: I have resolved your comment on this bp https://review.opendev.org/c/openstack/nova-specs/+/868377 | 12:15 |
gibi | dvo-plv_: looks good to me | 12:18 |
opendevreview | Danylo Vodopianov proposed openstack/nova-specs master: VirtIO PackedRing Configuration support https://review.opendev.org/c/openstack/nova-specs/+/868377 | 12:23 |
dvo-plv_ | I fixed pep8 validation. Should I do something more for further bp activities? | 12:38 |
sean-k-mooney | I'm just waiting for CI to report back. I'm still happy with it, so I'll re-add my +2 once it's green | 12:38 |
sean-k-mooney | oh, it's already reported back | 12:39 |
sean-k-mooney | gibi: care to send it on its way https://review.opendev.org/c/openstack/nova-specs/+/868377 | 12:40 |
gibi | sean-k-mooney, dvo-plv_ : done | 12:43 |
dvo-plv_ | Great. So the spec is approved. Then we have to wait for code review? https://review.opendev.org/q/topic:bp%252Fvirtio-packedring-configuration-support | 12:47 |
opendevreview | Merged openstack/nova-specs master: VirtIO PackedRing Configuration support https://review.opendev.org/c/openstack/nova-specs/+/868377 | 12:58 |
sahid | quick question regarding host aggregate and AZ | 13:05 |
sahid | we have noticed that when we add a compute host, it is always considered to be in the AZ nova (the one configured by default in default_availability_zone) | 13:06 |
sahid | it seems that if we don't want this behavior, we have to first add the host to the correct aggregate and then start the nova-compute service | 13:07 |
sahid | the problem is that it seems it's not possible to add to an aggregate a host that has not been "registered" by a service like nova-compute, right? | 13:08 |
sean-k-mooney | sahid: that's not how it works | 13:09 |
sean-k-mooney | if you don't add a host to an aggregate with AZ metadata | 13:09 |
sahid | so during deployment of a new compute host we always have that window where the host is considered to be in AZ nova (this AZ does not exist in our deployment) until we add the host to the correct aggregate | 13:09 |
sean-k-mooney | if you don't add a host to an aggregate with AZ metadata then it is considered part of default_availability_zone | 13:10 |
sahid | (an aggregate with AZ metadata, yes) | 13:10 |
sahid | sean-k-mooney: yes, we are on the same page | 13:10 |
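(A hedged sketch of the manual step under discussion: creating an aggregate that carries AZ metadata and attaching an already-registered host to it with standard openstackclient commands; the aggregate, AZ, and host names are illustrative.)

```bash
# Create an aggregate and tag it with an AZ; hosts added to it are then
# reported under AZ1 instead of the default_availability_zone ("nova").
openstack aggregate create --zone AZ1 agg-az1

# This second step only succeeds once the compute service has registered
# itself, which is exactly the window sahid is describing.
openstack aggregate add host agg-az1 compute-01
```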
sean-k-mooney | sahid: yes, there is no way to atomically have a compute node come up and register in an AZ | 13:11 |
sean-k-mooney | sahid: if you want to prevent it being selected you can make the service register in a disabled state | 13:11 |
sahid | I think it may be possible to achieve that if we allowed adding a host that does not yet exist to an aggregate (one that has AZ metadata), right? | 13:12 |
sahid | sean-k-mooney: ahh interesting point | 13:12 |
sahid | what do you mean by "selected"? | 13:12 |
sahid | I don't want the nova AZ to appear | 13:12 |
sean-k-mooney | considered a valid host for scheduling | 13:12 |
sahid | because it's confusing for users | 13:13 |
sahid | ah... | 13:13 |
sean-k-mooney | that is not something you can do currently | 13:13 |
sahid | no no, from the users' perspective it's not really good to see this "nova" AZ appearing and disappearing | 13:13 |
sean-k-mooney | https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.enable_new_services | 13:13 |
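(A minimal sketch of the option sean-k-mooney links above, assuming a typical controller-side nova.conf; with it set, newly registered compute services come up disabled, so the scheduler cannot pick them before an operator has placed them in the right aggregate and enabled them.)

```bash
# /etc/nova/nova.conf (illustrative):
#   [DEFAULT]
#   enable_new_services = false

# A freshly registered nova-compute then shows up disabled; enable it once it
# has been added to the desired aggregate/AZ:
openstack compute service set --enable compute-01 nova-compute
```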
sean-k-mooney | sahid: well, many deployments just use the nova AZ | 13:14 |
sean-k-mooney | a new feature could be added to hide the AZ, perhaps | 13:14 |
sean-k-mooney | but it's working as intended | 13:14 |
sahid | and what about giving the ability to add a host that does not yet exist to an aggregate? | 13:14 |
sean-k-mooney | i.e. we could add a config option to hide the default AZ, or perhaps add a metadata parameter | 13:15 |
sean-k-mooney | sahid: I don't really like that option | 13:15 |
dvo-plv_ | gibi, sean-k-mooney: Great. So the spec is approved. Then we have to wait for code review? https://review.opendev.org/q/topic:bp%252Fvirtio-packedring-configuration-support | 13:15 |
sean-k-mooney | I could maybe see adding a config option to let the compute node know what AZ to register in by default | 13:15 |
sean-k-mooney | dvo-plv_: yeah, but at this point, provided all the tests etc. are in place and we are OK with the code, all the admin is done | 13:16 |
sahid | this already exists, right? https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.default_availability_zone | 13:16 |
sahid | but this works for deployments with one AZ | 13:16 |
sahid | I had a feeling that adding the host to the aggregate, then starting nova-compute on it, would not be such a bad solution | 13:17 |
sahid | nova-compute at startup could then be registered the right way | 13:18 |
dvo-plv_ | sean-k-mooney: Do you mean additional tests for the packed option from our side, or Zuul verification? | 13:18 |
sahid | we need to find a solution so that when we start nova-compute for the first time it is able to determine its AZ properly | 13:19 |
sean-k-mooney | sahid: well, compute agents are not really meant to know what AZ they are in | 13:21 |
sean-k-mooney | same as they are not really meant to know what cell they are in | 13:21 |
sean-k-mooney | sahid: default_availability_zone is what is making it show as nova | 13:22 |
sean-k-mooney | but that is not used by the compute agent | 13:22 |
sean-k-mooney | the nova AZ does not actually exist as a host aggregate either | 13:23 |
sean-k-mooney | or rather, the default_availability_zone does not actually exist in the database | 13:23 |
sean-k-mooney | that parameter is just read by the API to use as a default when the compute service is not associated with an AZ. bauzas can correct me if that is wrong | 13:24 |
bauzas | on a meeting now | 13:24 |
bauzas | can you explain the issue to me quickly, please? | 13:24 |
sean-k-mooney | bauzas: sahid would like a way for compute agents to auto register to an az | 13:24 |
sean-k-mooney | so that they don't end up in the default_availability_zone, i.e. nova | 13:25 |
sean-k-mooney | because they don't want that AZ to come and go as they are provisioning new servers | 13:25 |
sean-k-mooney | we could address that in multiple ways, but it would be a new feature in any case | 13:26 |
bauzas | so, automatically adding a compute service to an aggregate ? well, some script can do it | 13:26 |
sean-k-mooney | it can yes which is what we expect to be done now | 13:26 |
bauzas | I see nothing that requires Nova to do it automatically | 13:26 |
sean-k-mooney | there is a small window between the compute node starting and being able to add it at the API | 13:26 |
bauzas | they can disable it by default, right? | 13:27 |
sean-k-mooney | the AZ will still show up while the compute is disabled | 13:27 |
sean-k-mooney | it's the default AZ they want to hide, not necessarily the compute node | 13:27 |
dansmith | but aggregates are in the api database | 13:27 |
bauzas | sure, but that's an operator point | 13:27 |
dansmith | auto-registering to an az would require compute->conductor->apidb right? | 13:27 |
sean-k-mooney | dansmith: yep, which is why I said this is not really something the compute agent should know about via config | 13:27 |
sean-k-mooney | dansmith: yeah, it would need an upcall | 13:28 |
dansmith | okay good, I thought you were arguing _for_ doing that | 13:28 |
sean-k-mooney | no not really | 13:28 |
bauzas | again, any script can do it, I don't see *why* nova should support it | 13:28 |
dansmith | it also breaks several things: 1. computes don't know their az and 2. no upcalls | 13:28 |
sean-k-mooney | yep | 13:28 |
sean-k-mooney | so either we add config-driven API behavior (allow hiding the default AZ via an API config option) | 13:29 |
bauzas | plus the fact that we check race conditions in the API, not by the compute service | 13:29 |
sahid | that's why I was asking to give operators the ability to add to an aggregate a host that does not yet exist | 13:29 |
sean-k-mooney | or we make the default AZ a real AZ/host aggregate in the db | 13:29 |
sean-k-mooney | or we just don't change things | 13:29 |
sahid | bauzas: it's about user experience | 13:29 |
bauzas | everytime you're adding a host to an aggregate, we synchronously check that AZs are correct | 13:30 |
sean-k-mooney | sahid: we technically don't have a foreign key that would prevent that, so we could | 13:30 |
sahid | they see nova appearing | 13:30 |
sean-k-mooney | if we combine that with the stable UUID feature | 13:30 |
sean-k-mooney | it might be workable | 13:30 |
sean-k-mooney | i.e. preallocate the UUID that would be used for the compute somehow, but that is still kind of messy | 13:30 |
bauzas | sahid: I think we explained that by default nova supports one AZ | 13:30 |
bauzas | that doesn't mean that nova will *check* hosts | 13:31 |
sean-k-mooney | I think we check that there is a host_mapping today before allowing you to add a host to a host aggregate | 13:32 |
bauzas | only the operators know whether the AZ that the user uses is actually a right AZ or just a fake one | 13:32 |
dansmith | sahid: I assume the goal is so that an operator adding a new compute can tell the compute where it wants to plug in and have that happen when it first registers itself, instead of adding, registering, and then having to go back and add it to an aggregate right? | 13:32 |
bauzas | as we also loudly say that operators shall NEVER create 'nova' AZs, I don't see the problem | 13:32 |
sahid | sean-k-mooney: yes, today if you try to do that with a host that does not "exist" you receive an error | 13:34 |
sahid | dansmith: yes, it's mostly that; today we start the service and then add the host to the correct aggregate, and during this window nova appears in the list of AZs | 13:37 |
sahid | we want to do the opposite (if possible), by configuring the host aggregate and then starting the service | 13:38 |
dansmith | sahid: so is it just a "day 0" thing, or do you want compute to be able to change its own az if you change its config and restart? | 13:38 |
sahid | in that way we avoid that "nova" | 13:38 |
sahid | just a day 0 thing | 13:39 |
dansmith | so, what about something like allowing metadata on a service, which you can configure in nova.conf for the compute, and then have the discover_hosts thing look for a "desired_az" or "desired_aggs" key and do the needful during discovery (which has to happen anyway)? | 13:40 |
dansmith | that way no upcall, and it's just "desired" and "metadata" and not "this is my az" | 13:40 |
dansmith | doesn't work after day zero | 13:40 |
dansmith | we could probably use metadata on a service for other things | 13:41 |
dansmith | I'd have to think some more.. I'm trying to listen to a meeting right now too | 13:41 |
sean-k-mooney | dansmith: we could also avoid the upcall by having nova-compute just call the nova API directly | 13:42 |
sahid | this should resolve our case for sure and is a bit like what sean-k-mooney mentioned | 13:42 |
dansmith | sean-k-mooney: it needs admin credentials though which would be unfortunate I think | 13:42 |
sean-k-mooney | well, at least the service role | 13:42 |
sean-k-mooney | but ya | 13:42 |
dansmith | yeah, not full admin hopefully, but still | 13:43 |
sean-k-mooney | where were you thinking of stashing the desired AZ? | 13:43 |
sean-k-mooney | I don't think we have a db table we could use for that in the cell db currently | 13:43 |
dansmith | we'd need to add a generic metadata blob to service | 13:43 |
dansmith | nope | 13:43 |
sean-k-mooney | that's also not particularly nice, but OK | 13:43 |
dansmith | not nice because it requires db changes? or not nice for other reasons? | 13:44 |
sean-k-mooney | the db changes | 13:44 |
sean-k-mooney | if its just for this | 13:44 |
sean-k-mooney | if it was like instance_system_metadata | 13:44 |
sean-k-mooney | and we had other use cases, I would care less | 13:44 |
dansmith | yeah, well, I mean.. something will require changes.. but I figure we might be able to use it for other things too | 13:45 |
sean-k-mooney | so you prefer doing it via discover_hosts to limit the creds | 13:45 |
dansmith | well, we already have a discovery process to do very similar things | 13:45 |
dansmith | lemme finish listening to this meeting for the next 15 minutes and then I can think a little more | 13:46 |
sahid | this should resolve our case for sure and is a bit like what sean-k-mooney mentioned | 13:51 |
sahid | oops sorry | 13:51 |
sahid | I've created a bug report here, https://bugs.launchpad.net/nova/+bug/2018398 , in case we consider it valid and we want to do something to improve the user experience :-) | 13:56 |
dansmith | sahid: s/users/operators/ so we're clear which "users" we're talking about | 13:57 |
dansmith | but yeah, I think this would be a nice behavior for operators, as long as we can do it within the other constraints and design points we have | 13:57 |
bauzas | sahid: so thanks for the bug report, now I better understand your concern | 13:57 |
bauzas | lemme check one thing tho | 13:58 |
bauzas | sahid: so, say your operator defined two AZs : AZ1 and AZ2 | 13:59 |
bauzas | hostA is in AZ1, hostB in AZ2 | 13:59 |
sahid | dansmith: wait, perhaps there is something that we are not doing correctly, but it's really our users who see those AZs in our interfaces | 13:59 |
bauzas | now he's gonna add hostC | 13:59 |
bauzas | once he registers hostC, a third AZ appears in the AZ list, which is 'default_availability_zone', i.e. 'nova' | 14:00 |
bauzas | I get the problem | 14:00 |
sahid | when they create an instance they can select the AZ | 14:00 |
bauzas | now, my question | 14:00 |
sahid | bauzas: yes, right | 14:00 |
bauzas | why isn't the default availability zone either AZ1 or AZ2? | 14:00 |
bauzas | we just use that value to fake AZs | 14:01 |
dansmith | sahid: I know users see AZs, but users will not see this compute node feature | 14:01 |
bauzas | but this does not mean that instances are automatically scheduled to that ZA | 14:01 |
bauzas | AZ* | 14:01 |
sahid | dansmith: ack | 14:01 |
bauzas | the default value for sending an instance to an AZ is default_schedule_zone | 14:01 |
bauzas | which is None by default | 14:02 |
dansmith | bauzas: oh right, I always forget about this difference | 14:02 |
dansmith | there are two conf knobs right? | 14:02 |
bauzas | yes | 14:03 |
bauzas | https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.default_schedule_zone is for pushing new instances to a specific AZ automatically | 14:04 |
bauzas | https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.default_availability_zone is just for showing an AZ if none exists | 14:05 |
bauzas | in a context where some AZs already exist, the solution is just to set the latter option to an existing AZ | 14:05 |
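(A hedged nova.conf sketch contrasting the two knobs bauzas links above; the AZ name is illustrative. default_schedule_zone steers where new instances go when the user requests no AZ, while default_availability_zone only changes what the API reports for computes that are in no AZ-tagged aggregate.)

```bash
# nova.conf on the controllers (illustrative):
#   [DEFAULT]
#   # Display-only default: what the AZ API shows for computes not in any
#   # aggregate with AZ metadata; it does not place anything anywhere.
#   default_availability_zone = AZ1
#
#   # Scheduling default: the AZ requested for new instances when the user
#   # does not pick one (None by default, i.e. no AZ constraint).
#   default_schedule_zone = AZ1

# What end users then see:
openstack availability zone list
```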
dansmith | bauzas: but it doesn't actually put the compute there right? | 14:05 |
bauzas | no | 14:05 |
dansmith | it's like faked in the api? | 14:05 |
sean-k-mooney | yep | 14:05 |
bauzas | " This option determines the default availability zone for ‘nova-compute’ services, which will be used if the service(s) do not belong to aggregates with availability zone metadata." | 14:05 |
sean-k-mooney | its not used by the compute service at all | 14:06 |
sean-k-mooney | you could have discover_hosts use it if it's set to a real AZ | 14:06 |
dansmith | yeah, so now does that help? I mean, it might help the "don't expose some other AZ for a window after adding the compute" but it doesn't help the mechanical operations of just dealing with the compute itself while adding | 14:06 |
bauzas | dansmith: there are two different concepts that are hard to understand | 14:07 |
dansmith | sean-k-mooney: but then discover would add all computes to one az, not the one the compute should be in | 14:07 |
bauzas | there is a the AZ API | 14:07 |
sean-k-mooney | dansmith: yeah, it's not great to use that directly | 14:07 |
bauzas | which is just showing the list of aggregates with AZ metadata + the default AZ value | 14:07 |
bauzas | and then, the second concept is the scheduling thing | 14:07 |
dansmith | yeah, I understand the default_schedule_zone part | 14:08 |
bauzas | dansmith: if you read https://bugs.launchpad.net/nova/+bug/2018398 sahid is concerned by the fact 'nova' is presented to endusers when they do nova availability-zone-list | 14:08 |
sean-k-mooney | what happens today if you set default_availability_zone=internal | 14:08 |
dansmith | bauzas: right | 14:08 |
sean-k-mooney | internal is not shown to end users unless they are admin | 14:08 |
bauzas | dansmith: I'm just saying to modify this value to an existing AZ value, eg. AZ1 | 14:09 |
dansmith | bauzas: although I'm not sure how that happens even, because the compute isn't actually in that az | 14:09 |
bauzas | dansmith: that option is purely API-based | 14:09 |
sean-k-mooney | if default_availability_zone=internal is accepted today | 14:09 |
dansmith | bauzas: right, that helps not expose 'nova' for a minute while you fix it | 14:09 |
sean-k-mooney | that would hide the computes until the admin added them to a real AZ | 14:09 |
bauzas | correct ^ | 14:09 |
dansmith | bauzas: but that's only one aspect of the problem (IMHO) | 14:09 |
bauzas | dansmith: well, lemme clarify then | 14:09 |
dansmith | bauzas: the second aspect is that you add a compute, you wait for it to register, you wait for it to be discovered, then you go and add it to the right az | 14:09 |
dansmith | that's stuff that a deployment tool has to do a bunch of polling and looping over, but it really shouldn't | 14:10 |
sean-k-mooney | dansmith: if we removed the exists check in the host aggregate add | 14:10 |
dansmith | if it could say "this should be in AZ foo" then all that stuff can happen automatically and without a bunch of slow steps in the process | 14:10 |
bauzas | dansmith: technically you don't add a host to an az, but to an aggregate | 14:10 |
sean-k-mooney | and we leveraged the fact that we can write a UUID for the compute to use | 14:10 |
dansmith | sean-k-mooney: yeah but then people can typo things when they add them, so I don't love that | 14:11 |
dansmith | bauzas: I know :) | 14:11 |
bauzas | dansmith: and an operator can add a host to an aggregate as soon as it's registered in the services table | 14:11 |
sean-k-mooney | I could see the workflow being: the installer generates a UUID for the host, then adds it to the right host aggregate, and then deploys the host | 14:11 |
sean-k-mooney | dansmith: yeah, that's one of the reasons we don't allow it today | 14:11 |
dansmith | sean-k-mooney: but you don't add compute nodes to aggregates, you add services right? | 14:11 |
sean-k-mooney | ah you are right its the service | 14:11 |
bauzas | dansmith: typos can't append, we query the record based on the service name | 14:11 |
sean-k-mooney | which is why this does not really work with ironic today | 14:12 |
bauzas | happen* | 14:12 |
dansmith | bauzas: sean-k-mooney is suggesting we drop that requirement | 14:12 |
sean-k-mooney | dansmith: well sahid was previously | 14:12 |
bauzas | again, why ? | 14:12 |
sean-k-mooney | bauzas: to allow you to preregister the service to a host aggregate | 14:12 |
sean-k-mooney | before it's created | 14:12 |
bauzas | can't a script loop over a service list and do the aggregate add after ? | 14:13 |
dansmith | I'm just thinking of how customers are sometimes afraid to run stack update on a running cloud, just like I'm afraid to re-run my own shoddy ansible sometimes :) | 14:13 |
dansmith | and how making the act of adding a new compute more of a compute-focused thing than a deployment-focused thing would likely be a welcome pattern for people | 14:13 |
bauzas | sean-k-mooney: we would soften the requirements for the very little benefit of pre-adding a compute to an aggregate | 14:13 |
sean-k-mooney | bauzas: you can, but if you do that in cron and a VM that requests nova is booted in that interval, then it will be broken | 14:14 |
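(A rough sketch of the kind of post-registration script bauzas suggests; as sean-k-mooney notes, anything run periodically leaves a window, so a deployment tool would typically run it immediately after starting the new nova-compute. Host and aggregate names are illustrative.)

```bash
#!/bin/bash
# Wait for the new compute service to register, then put it in its aggregate.
HOST=compute-01
AGG=agg-az1   # aggregate already created with --zone AZ1

until openstack compute service list --service nova-compute --host "$HOST" \
        -f value -c Host | grep -q "$HOST"; do
    sleep 5
done
openstack aggregate add host "$AGG" "$HOST"
```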
dansmith | bauzas: I agree, I don't like that idea | 14:14 |
sean-k-mooney | once an instance gets scheduled to the host it can't move to a real AZ | 14:14 |
bauzas | I like the idea to only add something schedulable if that thing can actually really schedule my stuff | 14:14 |
sean-k-mooney | could we hide the nova AZ if all compute services in it were disabled? | 14:15 |
bauzas | the second it got added to an aggregate, it can be scheduled | 14:15 |
bauzas | again, why ? | 14:15 |
sean-k-mooney | then you can set the config option to only register them as disabled | 14:15 |
bauzas | we would put a lot of preconditions and added complexity for a very little gain | 14:15 |
bauzas | and adding a compute isn't really a daily operation | 14:15 |
sean-k-mooney | bauzas: sure, it's a small gain, but it's apparently a legitimate pain point for their end customers | 14:15 |
sean-k-mooney | well, it depends on your scale | 14:16 |
bauzas | sean-k-mooney: the bug report wasn't complaining about it | 14:16 |
dansmith | we already have a task that has to run for a new compute (discover) which can either be automated in scheduler, called via cron, or manually during a deployment.. its purpose is to do things in the api database, so it feels like a very natural thing to have that step also do this, which is placing the compute in the right grouping | 14:16 |
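(For reference, the existing discovery step dansmith refers to; a hedged sketch, since how it runs depends on the deployment: periodically from the scheduler via [scheduler]discover_hosts_in_cells_interval, from cron, or by hand during deployment.)

```bash
# Manual/cron form of the discovery task that maps new computes into the API
# database; the idea above is to have this same step also honour a
# "desired AZ/aggregate" hint supplied by the new compute.
nova-manage cell_v2 discover_hosts --verbose
```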
bauzas | sean-k-mooney: the bug report was about preventing 'nova' from showing up | 14:16 |
sean-k-mooney | bauzas: sahid said it caused confusion for their customers. | 14:16 |
bauzas | dansmith: I'm confused, everything is API-driven | 14:17 |
dansmith | sahid: is the confusion of showing the wrong AZ the only thing you care about? | 14:17 |
dansmith | bauzas: adding a new compute is not api-driven | 14:17 |
dansmith | bauzas: the exception to that is needing to put it into the right az | 14:17 |
sean-k-mooney | bauzas: expanding on ^ you can delete compute services from the API, but you can only create them via RPC through the conductor | 14:18 |
sean-k-mooney | bauzas: we do not have a create endpoint https://docs.openstack.org/api-ref/compute/#compute-services-os-services | 14:19 |
bauzas | OK I admit the feature gap | 14:20 |
sean-k-mooney | dansmith: adding a create endpoint would also work, I guess. when the compute starts it would find the existing record | 14:20 |
sean-k-mooney | and once the record is created you could perhaps add the service to an aggregate | 14:20 |
dansmith | sean-k-mooney: to do that we'd have to expose the concept of a cell to the API which I'm -3 on :) | 14:20 |
bauzas | but I'm very reluctant to make add_host_to_agg() less strict | 14:20 |
sean-k-mooney | oh ya we would | 14:20 |
dansmith | making this more manual is also not something I'm in favor of | 14:21 |
sean-k-mooney | 'cause this is in the cell db | 14:21 |
sean-k-mooney | so yeah, that's not an option really | 14:21 |
dansmith | and pre-creating hosts makes that more manual, and much more likely to be wrong, given what we know about hostnames :) | 14:21 |
sean-k-mooney | very true | 14:21 |
dansmith | I'd like to make this more automatic | 14:21 |
dansmith | which may or may not be what sahid really wants, but making it more automatic will *also* solve sahid's concern | 14:21 |
bauzas | without any upcalls for sure | 14:21 |
dansmith | yep | 14:21 |
bauzas | which is quite a chicken-and-egg problem | 14:22 |
sean-k-mooney | the thing is, we would like this to be more automatic while also not having upcalls or having the compute specifically know what AZ it's in | 14:22 |
sean-k-mooney | because that should still be changeable via the API | 14:22 |
dansmith | well, I don't see "desired_az" as breaking the "knowing what az I'm in" rule | 14:22 |
dansmith | yeah of course | 14:22 |
* sahid sorry i had to join a meeting | 14:23 | |
sean-k-mooney | desired_az to me would be like the initial CPU allocation ratio | 14:23 |
dansmith | just like our default allocation ratios - they're used if we have to set a default, otherwise it's what is in placement that matters | 14:23 |
dansmith | yep | 14:23 |
sean-k-mooney | right, something that is used once and only once | 14:23 |
dansmith | exactly | 14:23 |
bauzas | fwiw, AZs are already a complicated matter to understand | 14:23 |
sahid | dansmith: yes, our biggest concern was the AZ being shown | 14:23 |
bauzas | do we really want to introduce another concept ? | 14:23 |
dansmith | bauzas: what other concept? | 14:23 |
bauzas | 'I, as compute, will be part of an AZ eventually' | 14:24 |
dansmith | well, that's true regardless of what we do here :) | 14:24 |
sean-k-mooney | that's a concept for the installer/deployer | 14:24 |
bauzas | reconciliation, if you prefer | 14:24 |
sean-k-mooney | not really for the end users and only partly for the admin | 14:24 |
bauzas | dansmith: see, sahid's problem is the user-facing AZ list problem | 14:25 |
bauzas | not the day-2 operational problem of adding a host to an AZ | 14:25 |
dansmith | bauzas: I've said several times, I get that.. I think there are two benefits to a solution here, his being one | 14:25 |
sean-k-mooney | I'm still not sure if that can be addressed by setting default_availability_zone=internal | 14:25 |
bauzas | sean-k-mooney: internal has a very different meaning today | 14:26 |
sean-k-mooney | https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.internal_service_availability_zone | 14:26 |
sean-k-mooney | bauzas: it does, but it's not shown to non-admins | 14:26 |
sean-k-mooney | and sahid does not want this to be shown | 14:26 |
bauzas | internal skips all computes | 14:26 |
sean-k-mooney | i know | 14:26 |
bauzas | again, this display problem is resolvable via default_availability_zone | 14:27 |
sean-k-mooney | but do we have something to prevent you from setting https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.default_availability_zone to internal? | 14:27 |
bauzas | ah, no | 14:27 |
bauzas | but I can quickly check | 14:27 |
sean-k-mooney | I would be OK with extending the concept of the internal AZ to have computes in it if you set default_availability_zone=internal, and documenting that this means they cannot be used for workloads | 14:28 |
sean-k-mooney | i.e. make it a parking ground for newly added compute services that the operator has yet to assign to a real AZ | 14:28 |
sean-k-mooney | I don't think that would break any existing use cases | 14:28 |
bauzas | this could break the filter | 14:29 |
bauzas | https://github.com/openstack/nova/blob/033af941792a9ae510c8d6b2cc318f062f0e1c66/nova/scheduler/filters/availability_zone_filter.py#L68 | 14:29 |
sahid | bauzas: if users can see that, they can try to deploy instances with this AZ, which is not what we want either | 14:29 |
sean-k-mooney | not if we reject all API requests with az=Config.internal_service_availability_zone | 14:30 |
sean-k-mooney | sahid: I believe they cannot see it unless they are an admin | 14:30 |
sahid | (i mean in an ideal case) | 14:30 |
bauzas | sahid: again, speaking of examples | 14:31 |
bauzas | sahid: if you already have AZ1, AZ2 and AZ3 that are meaningful | 14:31 |
bauzas | all those three are shown by nova az-list | 14:31 |
bauzas | and now you do care of not showing 'nova' in that list | 14:31 |
bauzas | what I'm saying is: then change 'default_az' to any of the three, and the job is done | 14:32 |
sahid | ok, let's use this way | 14:35 |
opendevreview | Sylvain Bauza proposed openstack/nova master: Fix get_segments_id with subnets without segment_id https://review.opendev.org/c/openstack/nova/+/882160 | 15:06 |
dansmith | eharney: if you're good with this, it could use a +W as everything it depends on is in the gate now: https://review.opendev.org/c/openstack/cinder-tempest-plugin/+/881764 | 15:12 |
opendevreview | Dan Smith proposed openstack/nova master: DNM: Test new ceph job configuration with nova https://review.opendev.org/c/openstack/nova/+/881585 | 15:18 |
dansmith | man, the nova jobs sure have gotten big again | 15:21 |
dansmith | we're running a lot of jobs | 15:21 |
auniyal | dansmith, yes, it takes around 2 hours for a patch to get verified from tox | 15:28 |
auniyal | zuul | 15:28 |
dansmith | auniyal: that's generally a function of the slowest job, not the number of jobs | 15:32 |
dansmith | gouthamr: the tempest stuff is all landed.. if you can +W the cinder-tempest-change now, it can go in and then the devstack plugin change is all that's left | 16:50 |
dansmith | eharney: good catch I guess... I don't know if it's okay to just bump the requirement or not | 16:56 |
dansmith | gmann: https://review.opendev.org/c/openstack/cinder-tempest-plugin/+/881764?tab=comments | 16:56 |
dansmith | 27 was wallaby and 30 was yoga | 16:57 |
dansmith | so upstream I would think we're good bumping the version but I dunno if anything installs this from master in older versions... | 16:57 |
gmann | dansmith: give me 2 min, will check | 17:25 |
dansmith | gmann: ack thanks | 17:25 |
* gouthamr is catching up now.. | 17:33 | |
gouthamr | stable jobs install "master" version of the plugin (and tempest) unless explicitly pinned | 17:34 |
dansmith | gouthamr: but how far back? | 17:35 |
dansmith | we pin tempest at some point I think when the stable jobs get too old to support master tempest | 17:36 |
dansmith | all the release jobs back to xena are passing | 17:37 |
gouthamr | ah good; the first pin i see is in stable/wallaby: https://github.com/openstack/cinder/blob/stable/wallaby/.zuul.yaml#L291-L292 | 17:37 |
dansmith | and I know they're using master tempest because they were honoring the depends-on when I was working on those patches (and breaking things) | 17:37 |
dansmith | gouthamr: ack, and since the xena job on c-t-p works, we should be good then right? | 17:38 |
gouthamr | i think so dansmith; tosky and gmann would be experts in this area | 17:38 |
dansmith | ack, well, we'll see what gmann has to say | 17:38 |
gouthamr | for correctness, we need the latest version of tempest - and afawct, that's working as expected.. | 17:39 |
dansmith | yeah, so I'm not sure what the tempest requirement would be, if we're actually using master tempest | 17:40 |
gouthamr | we just hope no-one outside of our gates is pinning tempest for whatever reason but using a newer version of ctp.. (a weirdness i've seen albeit temporarily in some rdo jobs) | 17:40 |
gmann | dansmith: gouthamr tempest is pinned till wallaby as stable/xena still not in EM https://review.opendev.org/c/openstack/releases/+/881254 | 17:40 |
dansmith | gmann: right, but xena is good according to this: https://review.opendev.org/c/openstack/cinder-tempest-plugin/+/881764?tab=change-view-tab-header-zuul-results-summary | 17:41 |
gmann | dansmith: ok, which one is failing | 17:42 |
dansmith | gmann: nothing is failing | 17:42 |
dansmith | gmann: eharney was concerned about the tempest version in ctp requirements | 17:42 |
gmann | dansmith: ohk, checking that only. will reply | 17:43 |
dansmith | btw "nothing is failing" feels *amazing* to say :) | 17:43 |
gmann | :) | 17:45 |
gouthamr | ++ | 18:13 |
gmann | dansmith: replied, agree with Eric to bump tempest version there https://review.opendev.org/c/openstack/cinder-tempest-plugin/+/881764/comments/0be5d072_64ceae34 | 18:16 |
dansmith | gmann: okay I'll update it, but I don't understand why it would time out | 18:17 |
dansmith | oh is it because with ACTIVE we wait for the instance state to become the wait_until value? | 18:17 |
gmann | yeah https://github.com/openstack/tempest/blob/27.0.0/tempest/common/compute.py#L236 | 18:17 |
gmann | it will pass SSHABLE as server status to wait for | 18:18 |
dansmith | ack, okay I was focused on the SSHABLE special cases up earlier, I see | 18:18 |
dansmith | just updated, gouthamr ^ | 18:18 |
gmann | ack | 18:20 |
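(Illustrative only: the kind of requirements floor bump being agreed on here, assuming the plugin's requirements.txt currently allows the Wallaby-era tempest; per the discussion above, 27.0.0 was Wallaby-era and 30.0.0 Yoga-era, and anything predating wait_until='SSHABLE' support would treat SSHABLE as a literal server status and time out.)

```bash
# cinder-tempest-plugin requirements.txt (sketch; the exact minimum is
# whichever tempest release first shipped the SSHABLE wait support):
#   tempest>=30.0.0
```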
opendevreview | Merged openstack/nova stable/yoga: Remove mentions of removed scheduler filters https://review.opendev.org/c/openstack/nova/+/858025 | 18:38 |
dansmith | gouthamr: I think I better go ahead and do the cephadm->py3 job name change in the devstack plugin, otherwise we'll break nova as soon as that lands before we either rename it there or change nova (et al) to point to the new job | 18:41 |
gouthamr | dansmith: +1; to clear this in my head and plan further cleanup - in your revert patch, you set cephadm options in "devstack-plugin-ceph-tempest-py3-base", correct? | 18:45 |
dansmith | gouthamr: no I set the cephadm thing only in the cephadm job (which would be renamed to -py3).. probably no reason not to set it on the base one yet | 18:46 |
dansmith | i figured I would rename the jobs and then you can remove the old one when you remove the non cephadm support from it | 18:46 |
dansmith | just leave the distro-based one named -ubuntu, nonvoting | 18:46 |
dansmith | I can do the removal and refactoring there, I just don't want to change so much, especially in a patch that was supposed to be a revert :) | 18:47 |
dansmith | I just want to get this landed so we can start getting more runs on the new stuff and not keep refactoring this, blocking that process | 18:48 |
dansmith | and since I have it working in this form I just hesitate to move too much at once | 18:48 |
gouthamr | ack; this makes sense dansmith.. | 18:56 |
dansmith | gouthamr: okay I just pushed that up let me know if that looks okay | 18:57 |
* gouthamr looks | 18:57 | |
opendevreview | Dan Smith proposed openstack/nova master: DNM: Test new ceph job configuration with nova https://review.opendev.org/c/openstack/nova/+/881585 | 18:59 |
gouthamr | dansmith: a couple of comments | 19:03 |
dansmith | gouthamr: oh gate, right thanks :) | 19:04 |
dansmith | gouthamr: "gate" has seemed so far off for a long time :D | 19:04 |
dansmith | gouthamr: I don't know what is needed for rpm | 19:04 |
dansmith | gouthamr: but since these jobs don't run on there, I think we can defer that | 19:05 |
gouthamr | dansmith: "devstack-plugin-ceph-cephfs-nfs" is passing; although it doesn't do cephadm, it runs on centos-stream-9 | 19:06 |
dansmith | right, and those packages are only needed for cephadm | 19:07 |
gouthamr | dansmith: if we bump that to use cephadm soon, and we hit any issues, we'll get that covered :) | 19:07 |
dansmith | ack, cool | 19:07 |
opendevreview | sean mooney proposed openstack/os-vif master: [WIP] set default qos policy https://review.opendev.org/c/openstack/os-vif/+/881751 | 20:46 |
opendevreview | Dan Smith proposed openstack/nova master: Have host look for CPU controller of cgroupsv2 location. https://review.opendev.org/c/openstack/nova/+/873127 | 22:04 |
dansmith | I have seen a lot of port binding related failures today, many of them on the grenade multinode job: https://c68ef44fab0ad498c042-f24a7834eba09db97966c05c7e428413.ssl.cf1.rackcdn.com/881585/8/check/nova-grenade-multinode/cbd5c81/testr_results.html | 22:37 |
melwitt | I saw one yesterday | 23:25 |
dansmith | maybe sean-k-mooney could take a look in the morning | 23:58 |