opendevreview | sean mooney proposed openstack/nova master: [codespell] fix typos in api-ref https://review.opendev.org/c/openstack/nova/+/897094 | 00:33 |
---|---|---|
opendevreview | sean mooney proposed openstack/nova master: [codespell] apply codespell to the releasenotes https://review.opendev.org/c/openstack/nova/+/897095 | 00:33 |
opendevreview | sean mooney proposed openstack/nova master: [codespell] doc,devstack and gate typos https://review.opendev.org/c/openstack/nova/+/897096 | 00:33 |
*** kopecmartin|off is now known as kopecmartin | 07:04 | |
bauzas | cores, could you please quickly merge https://review.opendev.org/c/openstack/nova-specs/+/895815 before our Bobcat GA ? | 07:36 |
gibi | bauzas: Is it better to mark the ironic shard implemented with a note that it is reverted? Do we expect that shard spec will be reproposed in C? | 08:12 |
bauzas | gibi: have you seen the next change in the series ? | 08:13 |
bauzas | https://review.opendev.org/c/openstack/nova-specs/+/896679/1/specs/2023.2/implemented/ironic-shards.rst | 08:13 |
gibi | bauzas: I saw the note yes | 08:14 |
gibi | my question is will there be a new spec in C to merge the shard implementation again? | 08:14 |
bauzas | gibi: I hope so | 08:14 |
gibi | then we should not put the current shard spec to implemented | 08:14 |
bauzas | johnthetubaguy: do you know if you would be able to create a new spec in Caracal for ironic-shards ? | 08:15 |
gibi | we should only put the shard spec to implemented that was implemented and *released* | 08:15 |
bauzas | okay, then let me change the Launchpad blueprint | 08:16 |
opendevreview | Sylvain Bauza proposed openstack/nova-specs master: Move Bobcat implemented specs https://review.opendev.org/c/openstack/nova-specs/+/895815 | 08:23 |
opendevreview | Sylvain Bauza proposed openstack/nova-specs master: explain why ironic-shards was reverted in the spec https://review.opendev.org/c/openstack/nova-specs/+/896679 | 08:23 |
gibi | thanks | 09:32 |
sean-k-mooney | sure i'll review it now | 09:37 |
sean-k-mooney | done | 09:38 |
sean-k-mooney | back to codespell... | 09:40 |
bauzas | thanks | 09:50 |
opendevreview | Merged openstack/nova-specs master: Move Bobcat implemented specs https://review.opendev.org/c/openstack/nova-specs/+/895815 | 09:52 |
opendevreview | Merged openstack/nova-specs master: explain why ironic-shards was reverted in the spec https://review.opendev.org/c/openstack/nova-specs/+/896679 | 09:52 |
dvo-plv | sean-k-mooney, Hello, I have resolved your comment. Could you please review it again when you have free time ? https://review.opendev.org/c/openstack/nova-specs/+/895924 | 10:18 |
sean-k-mooney | yep looks fine to me | 10:26 |
sean-k-mooney | easy reproposal ^ if anyone else wants to review that | 10:26 |
dvo-plv | Thank you | 10:27 |
dvo-plv | yes, bauzas told me yesterday that it's review week, so I would like to wait for it as long as it is on his agenda | 10:28 |
sean-k-mooney | well the bobcat release is mostly done now and we are getting ready for the ptg so it's not really formal | 10:30 |
sean-k-mooney | but this week is when most of us are starting to focus more on the next cycle | 10:31 |
sean-k-mooney | that's why i'm getting all the linting/spellchecking/style enhancement stuff i have wanted to do for the last year or so ready this week | 10:32 |
sean-k-mooney | i.e. before next cycle dev really picks up, get the tooling to help us review and land code faster in place | 10:32 |
opendevreview | sean mooney proposed openstack/nova master: [codespell] fix typos in tests https://review.opendev.org/c/openstack/nova/+/897213 | 11:02 |
opendevreview | sean mooney proposed openstack/nova master: [codespell] fix final typos and enable ci https://review.opendev.org/c/openstack/nova/+/897214 | 11:02 |
opendevreview | sean mooney proposed openstack/nova master: [codespell] ignore codespell in git blame https://review.opendev.org/c/openstack/nova/+/897215 | 11:02 |
opendevreview | sean mooney proposed openstack/nova master: fix sphinx-lint errors in docs and add ci https://review.opendev.org/c/openstack/nova/+/897089 | 11:08 |
opendevreview | sean mooney proposed openstack/nova master: ignore sphinx-lint series in git blame https://review.opendev.org/c/openstack/nova/+/897218 | 11:11 |
opendevreview | sean mooney proposed openstack/nova master: remove deprecated vmware virt driver https://review.opendev.org/c/openstack/nova/+/897017 | 11:15 |
opendevreview | sean mooney proposed openstack/nova master: remove deprecated vmware virt driver https://review.opendev.org/c/openstack/nova/+/897017 | 11:17 |
opendevreview | sean mooney proposed openstack/nova-specs master: resubmit per-process-healthchecks spec for 2024.1 https://review.opendev.org/c/openstack/nova-specs/+/897225 | 12:24 |
opendevreview | sean mooney proposed openstack/nova master: [WIP] add initial healthcheck support https://review.opendev.org/c/openstack/nova/+/825015 | 12:26 |
opendevreview | sean mooney proposed openstack/nova master: [WIP] add healthcheck manager to manager base https://review.opendev.org/c/openstack/nova/+/827844 | 12:26 |
opendevreview | sean mooney proposed openstack/nova master: [WIP] add healthcheck tracker to nova context https://review.opendev.org/c/openstack/nova/+/829468 | 12:26 |
opendevreview | sean mooney proposed openstack/nova master: [WIP] add healthcheck utils and constants https://review.opendev.org/c/openstack/nova/+/829469 | 12:26 |
opendevreview | sean mooney proposed openstack/nova master: add healthcheck endpoint to proxy commands https://review.opendev.org/c/openstack/nova/+/830703 | 12:26 |
opendevreview | Amit Uniyal proposed openstack/nova master: Adds device tagging functional tests https://review.opendev.org/c/openstack/nova/+/895162 | 12:35 |
opendevreview | Amit Uniyal proposed openstack/nova master: Device tags: don't pass pf_interface=True to get_mac_by_pci_address https://review.opendev.org/c/openstack/nova/+/670593 | 12:35 |
opendevreview | Amit Uniyal proposed openstack/nova master: WIP: Device tagging: expose target PCI address, not source https://review.opendev.org/c/openstack/nova/+/672127 | 12:35 |
opendevreview | ribaudr proposed openstack/nova master: Allow config to support virtiofs (driver) https://review.opendev.org/c/openstack/nova/+/886522 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (db) https://review.opendev.org/c/openstack/nova/+/831193 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (objects) https://review.opendev.org/c/openstack/nova/+/839401 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (manila abstraction) https://review.opendev.org/c/openstack/nova/+/831194 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (drivers and compute manager part) https://review.opendev.org/c/openstack/nova/+/833090 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Mounting the shares as part of the initialization process https://review.opendev.org/c/openstack/nova/+/880075 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Deletion of associated share mappings on instance deletion https://review.opendev.org/c/openstack/nova/+/881472 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Add metadata for shares https://review.opendev.org/c/openstack/nova/+/850500 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Add share_info parameter to reboot method for each driver (driver part) https://review.opendev.org/c/openstack/nova/+/854823 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Support rebooting an instance with shares (compute manager part) https://review.opendev.org/c/openstack/nova/+/854824 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Add share_info parameter to resume method for each driver (driver part) https://review.opendev.org/c/openstack/nova/+/860284 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Support resuming an instance with shares (compute manager part) https://review.opendev.org/c/openstack/nova/+/860285 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Add helper methods to rescue/unrescue shares https://review.opendev.org/c/openstack/nova/+/860286 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Support rescuing an instance with shares (driver part) https://review.opendev.org/c/openstack/nova/+/860287 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Support rescuing an instance with shares (compute manager part) https://review.opendev.org/c/openstack/nova/+/860288 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Allow to mount manila share using Cephfs protocol https://review.opendev.org/c/openstack/nova/+/883862 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Check shares support (compute manager) https://review.opendev.org/c/openstack/nova/+/885751 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Add share lock/unlock and restrict visibility https://review.opendev.org/c/openstack/nova/+/890340 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Check shares support (only API exception) https://review.opendev.org/c/openstack/nova/+/885752 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (API) https://review.opendev.org/c/openstack/nova/+/836830 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Check shares support (API) https://review.opendev.org/c/openstack/nova/+/850499 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Add helper methods to attach/detach shares https://review.opendev.org/c/openstack/nova/+/885753 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_attach notification https://review.opendev.org/c/openstack/nova/+/850501 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_detach notification https://review.opendev.org/c/openstack/nova/+/851028 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Add shares to InstancePayload https://review.opendev.org/c/openstack/nova/+/851029 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_attach_error notification https://review.opendev.org/c/openstack/nova/+/860282 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_detach_error notification https://review.opendev.org/c/openstack/nova/+/860283 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Add libvirt test to ensure metadata are working. https://review.opendev.org/c/openstack/nova/+/852086 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Add virt/libvirt error test cases https://review.opendev.org/c/openstack/nova/+/852087 | 13:15 |
opendevreview | ribaudr proposed openstack/nova master: Docs about Manila shares API usage https://review.opendev.org/c/openstack/nova/+/871642 | 13:15 |
bauzas | sean-k-mooney: please try to avoid telling our contributors that we would have new linting enhancements until we have a consensus, please | 13:31 |
sean-k-mooney | bauzas: i didn't say we will, i said i wanted to get them ready before the ptg | 13:32 |
bauzas | okay | 13:32 |
sean-k-mooney | i would very strongly like to do this however | 13:33 |
bauzas | tbc, I'm okay with a new project, my only concerns are around Nova | 13:34 |
sean-k-mooney | i'm mainly concerned about nova and our deliverables | 13:35 |
atmark | Hello! Yesterday, I upgraded Victoria to Wallaby using kolla-ansible. Out of 48 computes, I found 2 computes with duplicated entries in `openstack compute service list`. It won't let me delete the rogue entries | 14:56 |
atmark | I tried deleting the resource allocations and tried to do heal allocations but it throws an error | 14:57 |
dansmith | atmark: won't let you delete service records because why? it thinks they have instances running on them? | 14:58 |
atmark | dansmith: yes | 14:59 |
atmark | these are duplicate entries https://paste.openstack.org/show/bBM1U8uQkSgpsNh6gLFU/ | 14:59 |
bauzas | reminder : nova meeting in 1 hour by now here | 15:00 |
dansmith | atmark: okay you'll want to find the instances that are assigned to the orphaned computes | 15:01 |
dansmith | I'm surprised to see that they have identical names | 15:02 |
atmark | compute01's nova-compute.log https://paste.openstack.org/show/bGYG9FcesMGyFRgYH08x/ | 15:02 |
opendevreview | Merged openstack/nova-specs master: Amend spec to add more details around encryption secrets https://review.opendev.org/c/openstack/nova-specs/+/887905 | 15:04 |
dansmith | huh okay I guess we can have duplicate hostnames in service records, even for the same binary | 15:06 |
atmark | dansmith: so I found the and tried to heal allocations but it throws a non-descriptive `Error:` https://paste.openstack.org/show/bFSAcoAlJvZdpCrxzRyh/ | 15:06 |
atmark | found the instances* | 15:06 |
dansmith | atmark: yeah that's not going to help because they have different nodes in the DB as well | 15:06 |
dansmith | atmark: so the best thing you can do, if able, is to migrate the affected instances away from the duplicate services and then delete the dupes after they're empty | 15:08 |
atmark | Cold migrate? Getting `No server with a name or ID of '' exists.` when doing live migrate | 15:11 |
dansmith | any migration | 15:12 |
dansmith | is that the whole error message? | 15:12 |
atmark | yes, that's from the openstack client | 15:13 |
opendevreview | Merged openstack/nova-specs master: Re-propose using extend volume completion action for 2024.1 https://review.opendev.org/c/openstack/nova-specs/+/895648 | 15:15 |
dansmith | like it's complaining about the instance id being unknown? | 15:15 |
dansmith | are you able to do a server show on that instance? | 15:16 |
atmark | dansmith: disregard the previous error, my for loop syntax to live migrate was wrong | 15:24 |
atmark | getting `Compute service of compute25.cnco1 is unavailable at this time.` | 15:24 |
dansmith | heh okay | 15:24 |
dansmith | atmark: is that just live or did you try cold as well? | 15:24 |
atmark | live | 15:24 |
atmark | let me try cold | 15:24 |
dansmith | so I think this might be because you've got these duplicated service records, so it's probably picking the wrong one.. usually what happens is people get duplicated *node* records, but the service record shows as up, so this works | 15:25 |
dansmith | picking the wrong one meaning it thinks the compute node is down even though it's not (obviously).. the migrating is what we need to correct the node mismatch, which you *also* have | 15:26 |
atmark | `Service is unavailable at this time.` for cold migrate | 15:26 |
dansmith | can you select out the host and node fields for one of the affected instances direct from the database? | 15:29 |
atmark | dansmith: https://paste.openstack.org/show/b0oiRoqWDHHgataIK7jO/ | 15:35 |
dansmith | atmark: and can you query out the relevant records from services and compute_nodes? | 15:37 |
dansmith | it's weird that the compute is having trouble finding the right compute node record (placement errors in the compute logs) and that the instance is being associated with the downed duplicate service record | 15:38 |
atmark | these are instances affected https://paste.openstack.org/show/bM6wA9HPHbuNzGmNEqPB/ | 15:38 |
dansmith | yeah so just a little background, we associate instances, nodes and services very loosely with these hostnames, which is terrible and fragile, as you're noticing.. work has been done recently to try to make this better (if that's any consolation) | 15:39 |
atmark | services and compute_nodes table https://paste.openstack.org/show/bujkfzse6gko5nV197ue/ | 15:43 |
dansmith | hmm, okay, so there is only one non-deleted service of each name right? | 15:44 |
dansmith | a record is deleted when there is a non-zero value in the deleted column | 15:44 |
dansmith | I'm not sure why you're seeing them in the services list through the openstack client in that case | 15:47 |
atmark | for compute01, I ran `openstack resource provider allocation delete $consumerid`, deleted the resource provider for the compute, restarted the nova compute to let it recreate resource provider entry a | 15:47 |
atmark | upon healing allocation, it wont just throws an error | 15:47 |
atmark | it just throws a non-descriptive error* | 15:48 |
dansmith | yeah, well (a) you shouldn't have done that but (b) I think it's failing to create a new provider with the same name | 15:48 |
dansmith | also that has nothing to do with these records, FWIW | 15:49 |
dansmith | atmark: your compute nodes list doesn't show the deleted field, can you update with that field? | 15:49 |
atmark | https://paste.openstack.org/show/b6Ng2S3XwYQoL8G0mJvr/ with deleted field | 15:50 |
dansmith | so the uuid in your compute01 logs shows a node uuid of bf7a7e8e-c486-42b2-8dc4-00b2d5f70f86 | 15:52 |
dansmith | but I don't see that in your nodes list | 15:53 |
dansmith | can you see if any compute node has that uuid in the DB? | 15:53 |
dansmith | atmark: I guess I should also ask - do you have multiple compute cells? | 15:53 |
atmark | (a) - I followed what's in RH KB https://access.redhat.com/solutions/5308941. After the upgrade, nova-compute on those was throwing `Conflicting resource provider name` | 15:53 |
dansmith | are you running OSP? | 15:54 |
atmark | dansmith: Yes, two cells: cnco1 and cnco2 but I only use cnco1 cell | 15:55 |
atmark | dansmith: No. I have kolla-ansible deployment | 15:55 |
dansmith | meaning there are no compute nodes in the other cell? but it's mapped? | 15:55 |
atmark | show databases https://paste.openstack.org/show/b26YFJxzw33hNLa9pVLw/ | 15:56 |
opendevreview | Elod Illes proposed openstack/nova stable/victoria: libvirt: Abort live-migration job when monitoring fails https://review.opendev.org/c/openstack/nova/+/837321 | 15:56 |
dansmith | re: cells I'm just trying to suss out if there are any duplicates happening because of that.. can you check the services table in the other cell to see if any of them have the same hostname of either of these affected ones just to be sure? | 15:56 |
atmark | nova_cnco1 is where all computes are mapped | 15:56 |
dansmith | atmark: we're going to have to pause here for the compute meeting in a few minutes but if you're still around after we can resume | 15:57 |
atmark | got it | 15:57 |
atmark | appreciate the help | 15:57 |
atmark | re: can you check the services table in the other cell - nova_cell0? | 15:58 |
dansmith | no you said you have a cnco2 cell | 15:59 |
atmark | nova_cell0 https://paste.openstack.org/show/by9WvQgKARdZRppniRoY/ | 16:00 |
dansmith | cell0 is a different thing | 16:00 |
bauzas | #startmeeting nova | 16:00 |
opendevmeet | Meeting started Tue Oct 3 16:00:09 2023 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:00 |
opendevmeet | The meeting name has been set to 'nova' | 16:00 |
bauzas | sorry guys, meeting by now | 16:00 |
elodilles | o/ | 16:00 |
bauzas | atmark: I can also help you once we're done with the meeting, shall be quick | 16:00 |
dansmith | o/ | 16:00 |
bauzas | heya folks | 16:00 |
opendevreview | Merged openstack/nova master: Add job to test with SQLAlchemy master (2.x) https://review.opendev.org/c/openstack/nova/+/886230 | 16:00 |
bauzas | #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting | 16:01 |
gibi | o/ | 16:01 |
bauzas | ok, I guess we can start | 16:01 |
bauzas | #topic Bugs (stuck/critical) | 16:01 |
Uggla | o/ | 16:01 |
bauzas | #info No Critical bug | 16:02 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 44 new untriaged bugs (-1 since the last meeting) | 16:02 |
bauzas | #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster | 16:02 |
bauzas | gibi: happy with triaging some bugs ? | 16:02 |
bauzas | or do you want to skip it given your other prios ? | 16:02 |
gibi | bauzas: I can try | 16:03 |
bauzas | gibi: no worries, as I always say, it's a stretch goal, best-effort | 16:03 |
bauzas | and fwiw, in general when it's me, the number of bugs is rising :blushes: :facepalm: | 16:04 |
bauzas | but thanks, appreciated given the known situation | 16:04 |
bauzas | #info bug baton is gibi | 16:04 |
bauzas | moving on | 16:04 |
bauzas | #topic Gate status | 16:04 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs | 16:04 |
bauzas | #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement periodic jobs status | 16:04 |
bauzas | #info Please look at the gate failures and file a bug report with the gate-failure tag. | 16:05 |
bauzas | all greens | 16:05 |
bauzas | nothing I spotted | 16:05 |
bauzas | that's a silent week and I appreciate it :) | 16:05 |
bauzas | any CI failures you guys spotted N? | 16:05 |
bauzas | (doh about the N?, thanks french keyboard layout) | 16:06 |
bauzas | looks like nothing, ditto | 16:06 |
bauzas | that's cool \o/ | 16:06 |
bauzas | #topic Release Planning | 16:06 |
bauzas | #link https://releases.openstack.org/caracal/schedule.html | 16:06 |
bauzas | #info Nova deadlines aren't set yet until we agree on them at the PTG | 16:07 |
bauzas | #info Bobcat GA planned tomorrow | 16:07 |
bauzas | mostly a FYI | 16:07 |
bauzas | #info Specs can be reproposed for 2024.1 Caracal timeframe | 16:07 |
bauzas | I've been a round of reviews but I still need to do my homework on some | 16:07 |
bauzas | I've been doing* | 16:08 |
bauzas | (with the verb, it's better) | 16:08 |
bauzas | #info Caracal-1 milestone in 6 weeks | 16:08 |
bauzas | I don't expect any deadline around caracal-1 so hold my beer | 16:08 |
sean-k-mooney | well | 16:09 |
sean-k-mooney | the only "deadline" we really had around m1 in the past was a gentle | 16:09 |
sean-k-mooney | encouragement to please have your specs ready for review | 16:09 |
bauzas | yeah, for sure, good point | 16:10 |
sean-k-mooney | because many are on PTO for a lot of the time between m1 and m2 | 16:10 |
sean-k-mooney | so perhaps not a deadline | 16:10 |
sean-k-mooney | but i do want to do a spec review day | 16:10 |
bauzas | once we agree on the deadlines, be sure I'll probably tell this | 16:10 |
bauzas | yeah, sure too | 16:10 |
sean-k-mooney | somewhere between m1 and december 13th ish | 16:10 |
bauzas | fwiw, I started looking at some specs | 16:10 |
bauzas | but we can align ourselves on a review day, this is helpful | 16:11 |
bauzas | and yeah about the calendar | 16:11 |
sean-k-mooney | when is m1 this release | 16:11 |
bauzas | for some unexpected reason, people leave around Christmas period, doh. | 16:11 |
* sean-k-mooney has not looked at the schedule yet | 16:11 | |
bauzas | C-1 is mid-Nov | 16:11 |
sean-k-mooney | ack | 16:11 |
bauzas | C-2 is the second week of January | 16:12 |
sean-k-mooney | so like m-1 + 4 weeks is when we should try to get most specs reviewed and landed by | 16:12 |
bauzas | so yeah, basically, it would be appreciated if specs could be written in advance of the XMas period | 16:12 |
sean-k-mooney | m-2 is very early in jan ya | 16:12 |
bauzas | sure, and that's one of the topics we already have in the PTG etherpad :) | 16:13 |
sean-k-mooney | cool | 16:13 |
bauzas | anyway, yeah, people can propose anytime they want, the sooner the better | 16:13 |
bauzas | but we'll clarify that after PTG | 16:13 |
bauzas | this actually goes well as a good transition | 16:14 |
bauzas | #topic Caracal vPTG planning | 16:14 |
bauzas | #info Sessions will be held virtually October 23-27 | 16:14 |
bauzas | #info Register yourselves on https://ptg2023.openinfra.dev/ even if the event is free | 16:14 |
bauzas | (I stupidly did it twice, heh) | 16:14 |
bauzas | #link https://etherpad.opendev.org/p/nova-caracal-ptg PTG etherpad | 16:14 |
bauzas | #info add your own topics into the above etherpad if you want them to be discussed at the PTG | 16:15 |
bauzas | time for collecting your inputs ! | 16:15 |
bauzas | again, the sooner the better | 16:15 |
bauzas | the sooner we know the topics, the easier it will be for us to organize our sessions and plan the schedule in advance | 16:15 |
bauzas | tl;dr: don't hold your thoughts, provided you care about something for Caracal | 16:16 |
bauzas | moving on | 16:16 |
bauzas | #topic Review priorities | 16:16 |
bauzas | #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+(label:Review-Priority%252B1+OR+label:Review-Priority%252B2) | 16:16 |
bauzas | #info As a reminder, people eager to review changes can +1 to indicate their interest, +2 for asking cores to also review | 16:17 |
bauzas | I shall actually regenerate a new Gerrit dashboard with review prios | 16:17 |
bauzas | that's easy | 16:17 |
bauzas | #action bauzas to propose a Gerrit dashboard for review priorities | 16:18 |
bauzas | #topic Stable Branches | 16:18 |
bauzas | elodilles: WAAAZUUUP ? | 16:18 |
elodilles | :) | 16:18 |
elodilles | as you said, we are in a calm period, even for stable branches | 16:18 |
elodilles | i'm not aware of any broken stable gates | 16:19 |
bauzas | \o/ | 16:19 |
elodilles | #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci | 16:19 |
elodilles | please add here issues if you find one ^^^ | 16:19 |
elodilles | that's all from me | 16:19 |
auniyal | #info request to review https://review.opendev.org/q/topic:bug%252F1996732 | 16:19 |
bauzas | elodilles: bobcat will become our stable branch tomorrow | 16:19 |
bauzas | elodilles: I assume with the new TC resolution that nothing happens with the other branches in terms of maintenance ? | 16:20 |
elodilles | well, 2023.2 is already open for all our repositories | 16:20 |
elodilles | and yoga was next to transition to EM in early november, | 16:20 |
bauzas | oh right, we did cut the last rc | 16:20 |
elodilles | i don't know yet what will be the new process around yoga | 16:21 |
bauzas | I need to read something at bedtime | 16:21 |
bauzas | which is the new TC resolution about EM | 16:21 |
elodilles | anyway, i suggest to release any needed yoga patches, if you have one :) | 16:21 |
bauzas | (should hopefully find sleep after that :D) | 16:21 |
bauzas | (naaah, kidding) | 16:21 |
elodilles | because in a month yoga will transition to somewhere :) | 16:22 |
gibi | :) | 16:22 |
bauzas | fwiw, I have no personal interest in yoga, in whatever form it can be :) | 16:22 |
bauzas | including the real stretching thing | 16:23 |
bauzas | except it will be our first SLURP-ish release to disappear | 16:23 |
dansmith | SLURPy? | 16:23 |
elodilles | :) | 16:23 |
bauzas | hah | 16:24 |
bauzas | anyway, that makes me wonder about the future of non-SLURP releases if the preceding SLURP release goes EOL | 16:24 |
auniyal | bauzas few from my end ... \o | 16:24 |
bauzas | surely people can non-skip releases, so it can exist | 16:25 |
bauzas | but its interest reduces | 16:25 |
sean-k-mooney | well each release is supported for 18 months | 16:25 |
bauzas | auniyal: you're next in the pipe | 16:25 |
sean-k-mooney | and non slurps are dropped after that | 16:25 |
auniyal | ack thanks | 16:25 |
auniyal | sean-k-mooney there is a backport patch for stable/zed https://review.opendev.org/c/openstack/nova/+/885344 created by gibi | 16:25 |
sean-k-mooney | only slurps can move from stable/x to unmaintained/x | 16:25 |
auniyal | its for anti-affinity check count, can you please review this . | 16:25 |
bauzas | anyway, not a question we shall discuss here, mostly a TC-wide question | 16:26 |
sean-k-mooney | so the current stables got grandfathered in | 16:26 |
auniyal | I am asking you because you had reviewed it for master and it's merged in master. | 16:26 |
sean-k-mooney | there is a section in the tc doc for that | 16:26 |
auniyal | it's created till ussuri, if its alright can you please approve them as well. thanks :) | 16:26 |
sean-k-mooney | so zed is still ok to have patches merged | 16:26 |
bauzas | ok, auniyal, I guess it's your turn now | 16:26 |
JayF | sean-k-mooney: ++ that matches my understanding | 16:26 |
auniyal | :( | 16:26 |
sean-k-mooney | we should however do a review of the old stable branches and determine if they are healthy | 16:26 |
sean-k-mooney | like we are meant to do for unstable | 16:27 |
bauzas | sean-k-mooney: yeah, sounds an idea | 16:27 |
auniyal | bauzas added already thats all from me | 16:27 |
sean-k-mooney | that specific patch seems to have passed ci so i hope that means zed is healthy | 16:27 |
bauzas | because we EOLd Train with pain | 16:27 |
bauzas | and I would want us to be a little more aggressive in terms of cleanup :) | 16:28 |
sean-k-mooney | i would have to check the resolution but i think everything pre Wallaby should be removed under the proposal | 16:28 |
sean-k-mooney | but i don't recall off the top of my head | 16:28 |
sean-k-mooney | https://github.com/openstack/governance/blob/master/resolutions/20230724-unmaintained-branches.rst#transition | 16:29 |
sean-k-mooney | The last 3 active Extended Maintenance branches are automatically transitioned to Unmaintained branches. | 16:29 |
elodilles | ++ | 16:29 |
sean-k-mooney | so that's W,X,Y | 16:29 |
sean-k-mooney | unless i'm off by one which i could be | 16:30 |
elodilles | (this is the merged resolution: https://governance.openstack.org/tc/resolutions/20230724-unmaintained-branches.html ) | 16:30 |
sean-k-mooney | so we don't need to fully figure this out now | 16:30 |
bauzas | as I said, I need to do bedtime readings | 16:31 |
bauzas | nothing urges | 16:31 |
sean-k-mooney | but we can prepare to EOL Ussuri and Victoria and assess Wallaby | 16:31 |
elodilles | sean-k-mooney: i remember, X,W,V, hence, i'm not sure about Y :) | 16:31 |
bauzas | but yeah, EOLing up to Victoria is honestly a goal for me :) | 16:31 |
sean-k-mooney | eoling Victoria allows us to drop 18.04 | 16:32 |
sean-k-mooney | from all our ci | 16:32 |
elodilles | yeah, ussuri is the last 18.04 based series | 16:32 |
sean-k-mooney | so that i think has a lot of value | 16:33 |
bauzas | yeah | 16:33 |
dansmith | why? | 16:33 |
bauzas | in terms of CI | 16:33 |
dansmith | I mean I'm fine with EOLing V but is 18.04 adding a lot of pain I'm unaware of? | 16:33 |
bauzas | if nova stops supporting ussuri, that would pull the other projects to do the same | 16:33 |
sean-k-mooney | dansmith: 18.04 is EOL from a Canonical point of view | 16:34 |
sean-k-mooney | so it's not entirely security patched | 16:34 |
sean-k-mooney | it went out of support in march this year | 16:34 |
bauzas | worth adding it into a PTG topic then | 16:35 |
dansmith | okay just for that reason, fine, I just am not aware of any actual problems | 16:35 |
bauzas | I take the action | 16:35 |
sean-k-mooney | we don't get any similar benefits again until zed | 16:35 |
dansmith | bauzas: you said this meeting would be quick | 16:36 |
bauzas | ok, moving on | 16:36 |
bauzas | and yeah | 16:36 |
sean-k-mooney | zed dropped centos 8 and python 3.6 | 16:36 |
bauzas | my bad, I opened a can | 16:36 |
elodilles | :D | 16:36 |
bauzas | #topic Open discussion | 16:36 |
bauzas | (nothing) | 16:36 |
bauzas | so, anything else or can I return you 23 mins of your time ? | 16:37 |
sean-k-mooney | not from me | 16:37 |
bauzas | thanks all | 16:37 |
bauzas | #endmeeting | 16:37 |
opendevmeet | Meeting ended Tue Oct 3 16:37:29 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:37 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/nova/2023/nova.2023-10-03-16.00.html | 16:37 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/nova/2023/nova.2023-10-03-16.00.txt | 16:37 |
opendevmeet | Log: https://meetings.opendev.org/meetings/nova/2023/nova.2023-10-03-16.00.log.html | 16:37 |
elodilles | thanks o/ | 16:37 |
* sean-k-mooney has one more meeting at the top of the hour and I'm done for today | 16:38 | |
dansmith | atmark: okay so, there should never be compute services in cell0.. that's why you're seeing the duplicates from the api I think, which we should resolve to start | 16:38 |
dansmith | atmark: I think you probably also have that node uuid in the compute_nodes table there | 16:38 |
dansmith | and that if you look, these two compute nodes are pointing at cell0 and not the proper cell, which explains your other issues | 16:39 |
sean-k-mooney | if it actually got that far it probably created the placement RP, which could cause name collisions if you repoint it to cell1 | 16:39 |
dansmith | so can you check compute_nodes in cell0, and also check to see if there are any instances in cell0 (other than ones in the build state) | 16:39 |
dansmith | sean-k-mooney: yes and atmark has already deleted some of that, so we have more work to clean up there | 16:40 |
sean-k-mooney | ack | 16:40 |
atmark | dansmith: i do have those two computes in nova_cell0 | 16:43 |
dansmith | atmark: okay, now things are starting to make sense :) | 16:43 |
auniyal | sean-k-mooney hopefully you noticed my request in meeting | 16:45 |
atmark | compute_tables in nova_cell0 https://paste.openstack.org/show/bYtEaxBqcGpSkDhThQY2/ | 16:45 |
atmark | compute_nodes* | 16:45 |
auniyal | if not please have a look when you have time | 16:45 |
dansmith | atmark: aha, there's the missing bf7* uuid | 16:45 |
sean-k-mooney | auniyal: you asked about https://review.opendev.org/c/openstack/nova/+/885344? but it already has a +2w from dansmith | 16:45 |
sean-k-mooney | was there something else | 16:46 |
auniyal | yeah just noticed thanks dansmith | 16:46 |
auniyal | can you please look for others in topic | 16:46 |
sean-k-mooney | the others are in other branches | 16:46 |
sean-k-mooney | so after it has merged i can perhaps take another look | 16:47 |
atmark | dansmith: bf7* is new resource provider id for compute01. it used to be c017* | 16:47 |
auniyal | yes same patch backport for other stable branches | 16:47 |
sean-k-mooney | there is nothing to do until the zed patch merges | 16:47 |
auniyal | till ussuri | 16:47 |
dansmith | atmark: it's the one I was looking for from the compute logs.. but it needs to be deleted and rebuilt entirely at this point | 16:47 |
auniyal | if you add your +2 I'll look for a second later | 16:47 |
sean-k-mooney | ya so we can look at https://review.opendev.org/c/openstack/nova/+/885345 next but it will fail in the gate until the zed one is merged | 16:48 |
dansmith | atmark: so before we proceed, we need to check the instances table in there | 16:48 |
sean-k-mooney | maybe tomorrow can you ping me in the morning | 16:49 |
atmark | ok, one sec | 16:49 |
auniyal | ack, | 16:49 |
auniyal | thanks | 16:49 |
sean-k-mooney | i don't have time to load context and review before i finish for today but i can look at things in the morning | 16:49 |
atmark | dansmith: cell0's instances table https://paste.openstack.org/show/bZuA2J0x09iTDI5JDmKH/ | 16:50 |
dansmith | atmark: okay excellent | 16:50 |
dansmith | that is both good and expected | 16:50 |
dansmith | atmark: so first thing to do is figure out what's wrong with those two computes. my guess is that during the upgrade, they got their transport_url pointed at the cell0/superconductor instead of the cell conductor | 16:51 |
atmark | I can find that in nova.conf, right? | 16:53 |
dansmith | yep | 16:53 |
atmark | the transport_url in nova.conf points to rabbitmq | 16:54 |
opendevreview | Dan Smith proposed openstack/nova master: WIP: Warn if we find compute services in cell0 https://review.opendev.org/c/openstack/nova/+/897246 | 16:55 |
dansmith | atmark: right it always points at rabbit.. I'm not familiar with kolla but are you running two sets of conductors, one for the top-level and one for the cells? | 16:56 |
dansmith | easy way to tell is compare your transport_url to a compute that is not borked :) | 16:56 |
atmark | in the controller i can see transport_url points to nova_cnco1 vhost of rabbitmq | 16:56 |
dansmith | start with the transport_url on the computes, and compare one of the broken ones to a non-broken one | 16:56 |
atmark | dansmith: ok, the transport_url on broken computes doesn't have nova_cnco1 vhost | 16:57 |
dansmith | okay, that's the problem, so fix that and restart nova-compute there | 16:58 |
atmark | just a sec | 17:01 |
atmark | dansmith: done | 17:14 |
atmark | took a while, had to push the change through kolla-ansible | 17:15 |
dansmith | atmark: okay is it happier on startup now? there might be more work to clean up, but it might also have just worked it out | 17:15 |
dansmith | by it I mean the compute service in its log | 17:15 |
dansmith | now that they're not pointing at the wrong transport (you fixed both right?) you can clear the service and node records from cell0's database which should resolve the service list duplication | 17:16 |
dansmith | if the computes are happy in the log, you probably need to do the placement heal part now | 17:16 |
atmark | yes, both are fixed. Still seeing `Failed to retrieve allocations for resource provider c017*` on compute01's nova-compute.log https://paste.openstack.org/show/b8j8iC0eV6TOK1QLVBfN/ | 17:18 |
atmark | is that expected? | 17:19 |
dansmith | atmark: yeah, so what happened is you deleted some stuff, the confused computes re-created their providers with the wrong uuid (the new one in the cell0) and now that it's back, it wants to use/recreate the old one | 17:19 |
dansmith | atmark: so probably best to nuke the allocations *and* the RP for each, then run the heal and then get them started back up | 17:20 |
dansmith | it's why I said "you shouldn't have deleted the allocations" earlier, despite the doc saying it's appropriate in some circumstances | 17:20 |
dansmith | but it's cool, this is confusing and messed up obviously :) | 17:20 |
atmark | clear the service and node records from cell0's database - I'm safe to remove these records https://paste.openstack.org/show/bxfAMF8OBcUCz2GsHiin/ ? | 17:24 |
atmark | should I remove entry or just update the deleted field ? | 17:26 |
dansmith | atmark: you can just set the deleted field to id if you want (that's what service delete does) but really they should go away.. but fine to just mark as deleted to test if you want | 17:26 |
opendevreview | Dan Smith proposed openstack/nova master: WIP: Warn if we find compute services in cell0 https://review.opendev.org/c/openstack/nova/+/897246 | 17:40 |
atmark | dansmith: For compute25 - I didn't have to do anything. compute01 - I deleted the RP, restarted nova-compute and then ran heal so issue is now resolved. I learned something new. Thanks a lot. | 17:47 |
dansmith | atmark: awesome, glad to hear it.. so you cleaned up the service *and* node records from cell0 right? | 17:47 |
atmark | If I unset allocations instead of allocations, will I face the same issue? https://docs.openstack.org/nova/latest/admin/troubleshooting/orphaned-allocations.html#solution | 17:47 |
atmark | dansmith: yes, I decided to remove the records instead of updating deleted field. | 17:48 |
dansmith | cool | 17:48 |
dansmith | atmark: not sure what you're asking about. unset .. "allocations instead of allocations" | 17:48 |
atmark | "allocation unset instead of allocation remove" | 17:49 |
dansmith | atmark: no I think it would have ended up the same way | 17:50 |
atmark | is it safe to remove these records in instances table in nova_cell0? https://paste.openstack.org/show/bP2VO4c7Gh6Tb4SIm5Wi/ | 17:56 |
dansmith | yes on the compute nodes.. the instances were all deleted in the last dump, so they belong there and will be cleaned after an archive | 17:59 |
dansmith | instances *do* belong in cell0, if they failed the schedule | 17:59 |
atmark | got it. thanks again | 18:10 |
opendevreview | Merged openstack/nova stable/zed: Fix failed count for anti-affinity check https://review.opendev.org/c/openstack/nova/+/885344 | 18:19 |
opendevreview | Dan Smith proposed openstack/nova master: WIP: Warn if we find compute services in cell0 https://review.opendev.org/c/openstack/nova/+/897246 | 19:30 |
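
The fix discussed above came down to comparing `transport_url` on a broken compute against a healthy one and spotting the missing `nova_cnco1` vhost. A minimal sketch of that comparison follows. It is not a nova or kolla-ansible tool; the function names, the credentials, the `rabbit1` hostname, the `compute01`/`compute02` pairing and the exact shape of the broken URL are all assumptions made for illustration. Only the `nova_cnco1` vhost name comes from the log.

```python
from urllib.parse import urlsplit


def vhost_of(transport_url: str) -> str:
    """Return the virtual host, i.e. the URL path without its leading slash."""
    return urlsplit(transport_url).path.lstrip("/")


def find_mismatched(reference_url: str, urls_by_host: dict[str, str]) -> list[str]:
    """List hosts whose vhost differs from the known-good compute's vhost."""
    expected = vhost_of(reference_url)
    return sorted(
        host for host, url in urls_by_host.items() if vhost_of(url) != expected
    )


if __name__ == "__main__":
    # Hypothetical values: copy the real transport_url from each nova.conf.
    healthy = "rabbit://nova:secret@rabbit1:5672/nova_cnco1"
    computes = {
        "compute01": "rabbit://nova:secret@rabbit1:5672/",  # assumed broken form
        "compute02": "rabbit://nova:secret@rabbit1:5672/nova_cnco1",
    }
    print(find_mismatched(healthy, computes))
```

Any host this reports would need its `transport_url` corrected and `nova-compute` restarted, as was done in the conversation, before cleaning the stray service and node records out of cell0.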