*** dasm|afk is now known as dasm|off | 00:14 | |
*** hemna0 is now known as hemna | 02:28 | |
*** hemna9 is now known as hemna | 02:45 | |
*** clarkb is now known as Guest2790 | 03:17 | |
*** bhagyashris is now known as bhagyashris|PTO | 05:43 | |
opendevreview | Stephen Finucane proposed openstack/os-resource-classes master: setup: Update Python testing classifiers https://review.opendev.org/c/openstack/os-resource-classes/+/834643 | 10:17 |
opendevreview | Stephen Finucane proposed openstack/os-resource-classes master: setup: Replace dashes with underscores, add links https://review.opendev.org/c/openstack/os-resource-classes/+/834644 | 10:17 |
*** sfinucan is now known as stephenfin | 10:18 | |
zigo | Is there a way to evacuate a host that has 3 VMs that have affinity? Can I somehow tell nova "migrate them together" ? | 10:51 |
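[editor's note: the question goes unanswered in this log; nova has no single "migrate the whole group at once" call, so what follows is only a hedged sketch of the usual per-instance approach. Host and group names are placeholders, and flag spellings vary with client version (older openstackclient uses "server migrate --live <host>").]
    # stop the scheduler from placing new instances on the source host
    $ openstack compute service set --disable compute-01 nova-compute
    # see which members of the affinity group have to move together
    $ openstack server group show my-affinity-group
    # live-migrate each member, forcing a common destination so the
    # affinity policy can still be satisfied
    $ openstack server migrate --live-migration --host compute-02 <server-uuid>
    # or let novaclient iterate over every instance on the host
    $ nova host-evacuate-live compute-01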
*** prometheanfire is now known as Guest0 | 11:48 | |
*** ChanServ changes topic to "This channel is for Nova development. For support of Nova deployments, please use #openstack" | 11:55 | |
*** osmanlicilegi is now known as Guest2 | 11:59 | |
opendevreview | anguoming proposed openstack/nova master: fix the bug of the log line has no request_id info at source host when live migration https://review.opendev.org/c/openstack/nova/+/834677 | 12:41 |
opendevreview | anguoming proposed openstack/nova master: fix the bug of the log line has no request_id info at source host when live migration https://review.opendev.org/c/openstack/nova/+/834677 | 12:47 |
stephenfin | sean-k-mooney: This isn't hugely important, but could you look at https://review.opendev.org/c/openstack/nova/+/723572/ and https://review.opendev.org/c/openstack/nova/+/723573/ today? | 12:50 |
opendevreview | anguoming proposed openstack/nova master: fix the bug of the log line has no request_id info at source host when live migration https://review.opendev.org/c/openstack/nova/+/834677 | 12:54 |
opendevreview | Stephen Finucane proposed openstack/nova master: objects: Don't use generic 'Field' container https://review.opendev.org/c/openstack/nova/+/738239 | 12:58 |
opendevreview | Stephen Finucane proposed openstack/nova master: objects: Remove unnecessary type aliases, exceptions https://review.opendev.org/c/openstack/nova/+/738240 | 12:58 |
opendevreview | Stephen Finucane proposed openstack/nova master: objects: Use imports instead of type aliases https://review.opendev.org/c/openstack/nova/+/738018 | 12:58 |
opendevreview | Stephen Finucane proposed openstack/nova master: objects: Remove wrappers around ovo mixins https://review.opendev.org/c/openstack/nova/+/738019 | 12:58 |
opendevreview | Stephen Finucane proposed openstack/nova master: WIP: add ovo-mypy-plugin to type hinting o.vos https://review.opendev.org/c/openstack/nova/+/758851 | 12:58 |
sean-k-mooney | stephenfin: sure ill take a look at them now while i have context on this they look reasonably short and i see gmann has already reviewed them | 13:00 |
sean-k-mooney | getting rid of the dict compat layer has been long overdue | 13:00 |
sean-k-mooney | it would be nice not to have to review for new usages of them as a dict | 13:00 |
opendevreview | Stephen Finucane proposed openstack/nova master: doc: Remove useless contributor/api-2 doc https://review.opendev.org/c/openstack/nova/+/828599 | 13:02 |
EugenMayer | When deploying via terraform and changing a flavor (thus replacing it), it seems the old flavor was removed but not yet removed from the instance it had been assigned to, and then it all failed. Now I'm stuck with Unable to retrieve instance size information. Details Flavor 384bc436-a0cb-4e4a-80d1-26dd03743061 could not be found. (HTTP 404) | 13:54 |
EugenMayer | (Request-ID: req-7c68445d-a8b5-4ef6-a11d-6f037402d92a) - so basically one of my instances references a flavor that no longer exists. Is there a way to somehow fix this? | 13:54 |
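[editor's note: since the flavor is embedded in the instance record (instance_extra.flavor) from Newton onwards, the 404 above typically comes from a client such as Horizon looking the flavor up by id, not from nova itself. A hedged way to confirm what the instance still carries, assuming admin credentials and the default cell database name "nova"; the UUID would be the one from the log.]
    # with microversion 2.47+ the embedded flavor details are returned inline
    $ openstack --os-compute-api-version 2.47 server show <instance-uuid> -c flavor
    # read-only inspection of the copy cached in the cell database
    $ mysql nova -e "SELECT flavor FROM instance_extra WHERE instance_uuid='<instance-uuid>'\G"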
*** dasm|off is now known as dasm | 14:20 | |
artom | Anyone able to run functional tests on ussuri? | 14:48 |
artom | Trying to figure out if it's something local to me, or more widespread | 14:48 |
artom | Seems to be hanging/timing out on: | 14:48 |
artom | functional installdeps: -chttps://releases.openstack.org/constraints/upper/ussuri, -r/home/artom/src/nova/requirements.txt, -r/home/artom/src/nova/test-requirements.txt, openstack-placement>=1.0.0 | 14:48 |
* artom strace's | 14:49 | |
sean-k-mooney | i can try it one sec | 14:50 |
artom | Seems to be doing... something? | 14:50 |
artom | Looping on https://paste.opendev.org/show/b45jbgPA429f5iKFJSEq/ | 14:50 |
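[editor's note: for readers following along, this is roughly how the probing above is done - find the pip process that tox spawned for installdeps and attach strace to it to see whether it is making progress. The env name "functional" is the one used in this log.]
    $ pgrep -af pip                 # locate the pip run started by tox installdeps
    $ strace -f -p <pid>            # attach and watch which syscalls it is looping on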
sean-k-mooney | looks like we are missing a fixture | 14:52 |
sean-k-mooney | from that trace | 14:52 |
sean-k-mooney | we should not be doing ioctl calls in general | 14:52 |
sean-k-mooney | like that implies we are doing file io or network configuration | 14:53 |
sean-k-mooney | its running fine for me | 14:56 |
sean-k-mooney | were you having a failing test? | 14:56 |
sean-k-mooney | or just would not install | 14:56 |
sean-k-mooney | i did locally change psycopg2 to psycopg2-binary in my test-requirements.txt but that is just because i don't have or want postgres installed on my laptop | 14:57 |
sean-k-mooney | so i don't have the headers to build psycopg2 from source | 14:58 |
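[editor's note: a minimal sketch of the local-only tweak described above, assuming an ussuri nova checkout; it just avoids needing the postgres headers and should not be committed.]
    # swap the source build for the pre-built wheel, locally only
    $ sed -i 's/^psycopg2\b/psycopg2-binary/' test-requirements.txt
    # recreate the tox env so the changed requirement is picked up
    $ tox -r -e functional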
artom | So there is a backport in progress | 15:01 |
* artom tries on a pristine ussuri rpeo | 15:01 | |
artom | *repo | 15:01 |
*** Guest2790 is now known as clarkb | 15:01 | |
artom | But... it's not running any tests (yet), it's on installdeps... | 15:01 |
sean-k-mooney | got the gerrit link? i can try that explicitly if you want | 15:02 |
artom | sean-k-mooney, only local for now | 15:03 |
artom | Backporting https://review.opendev.org/c/openstack/nova/+/796907/2/nova/tests/functional/libvirt/test_pci_sriov_servers.py#73 to ussuri | 15:04 |
sean-k-mooney | i had one failure | 15:08 |
sean-k-mooney | FileNotFoundError: [Errno 2] No such file or directory: 'openssl' | 15:08 |
sean-k-mooney | which is likely just down to the fact i'm running this on nixos | 15:08 |
artom | Seems to be the same problem with a pristine ussuri... | 15:09 |
artom | I should try on Ubuntu I guess? | 15:10 |
artom | Although func tests should be platform-independent | 15:10 |
sean-k-mooney | having run them on macos last night, not as much as you would think | 15:15 |
sean-k-mooney | we have a bunch that fail because they detect it's not linux | 15:15 |
sean-k-mooney | maybe pass -r | 15:15 |
sean-k-mooney | or delete the .tox dir | 15:16 |
sean-k-mooney | in case you have some leftover issue from a previous run | 15:16 |
artom | Yep, tried with -r, same | 15:17 |
sean-k-mooney | odd, what distro are you currently using? | 15:17 |
sean-k-mooney | i can try on ubuntu if you like, i also have a centos 9 vm | 15:17 |
artom | F35 | 15:18 |
bauzas | reminder : nova meeting in 41 mins here at #openstack-nova | 15:19 |
bauzas | fwiw, DST is not impacting our meeting, as we use UTC | 15:19 |
clarkb | artom: sean-k-mooney: pip installs taking forever likely indicates a dependency resolver problem | 15:20 |
clarkb | we've seen that happen when the solver can't find a valid answer. However constraints tends to fix that and you supply constraints so maybe not that | 15:20 |
sean-k-mooney | clarkb: i dont think it was the resolver | 15:20 |
sean-k-mooney | clarkb: i think artom is getting stack traces | 15:20 |
artom | sean-k-mooney, no, just spinning in the void | 15:21 |
sean-k-mooney | oh have you added -v | 15:21 |
artom | The paste was a `strace -p` output | 15:21 |
sean-k-mooney | so you can see what's actually happening | 15:21 |
artom | sean-k-mooney, *facepalm* lemme try that | 15:23 |
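[editor's note: the verbosity being suggested, as a hedged example - exact flags and output depend on the tox and pip versions in use, and the env directory name ".tox/functional" is an assumption.]
    # -vv makes tox print the underlying pip output during installdeps
    $ tox -vv -r -e functional
    # or run pip verbosely inside the already-created env
    $ .tox/functional/bin/pip install -v \
        -c https://releases.openstack.org/constraints/upper/ussuri \
        -r requirements.txt -r test-requirements.txt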
sean-k-mooney | artom: by the way, f35 has a much newer gcc, libffi and kernel than ussuri was developed with, so that strace was probably referring to ffi as part of compiling some of the C python modules, so there might be issues with trying to install ussuri on f35 to run the func tests | 15:35 |
zigo | I just noticed that if a host is over its CPU ratio (because it has been reduced), then live-migrations are silently failing (only the scheduler gives a clue). Is this known? Is this considered a bug? Should I file the bug? | 15:42 |
zigo | The workaround is obviously to temporarily raise the CPU overcommit ratio, but that's still kind of annoying to do. | 15:43 |
sean-k-mooney | zigo: yes it's a known issue | 15:43 |
sean-k-mooney | it has to do with how placement currently validates allocation candidates | 15:43 |
sean-k-mooney | if its the issue i think it is | 15:43 |
zigo | Thanks. | 15:45 |
sean-k-mooney | if i remember correctly it also affects evacuate | 15:48 |
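[editor's note: a hedged sketch of the temporary workaround mentioned above - raising the VCPU allocation ratio on the over-committed source host's resource provider with osc-placement, then lowering it again after the evacuation/live-migration. The provider name, totals and ratio are placeholders, and flag names can differ between osc-placement releases; note also that nova-compute may overwrite the inventory on its next periodic update unless the matching cpu_allocation_ratio option is set in nova.conf.]
    # find the resource provider backing the source host
    $ openstack resource provider list --name compute-01
    # check the current VCPU inventory (total, reserved, allocation_ratio)
    $ openstack resource provider inventory list <rp-uuid>
    # temporarily raise the ratio so the existing allocations fit again
    $ openstack resource provider inventory class set <rp-uuid> VCPU --total 64 --allocation_ratio 8.0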
bauzas | last reminder : nova meeting in 9 mins | 15:51 |
bauzas | #startmeeting nova | 16:00 |
opendevmeet | Meeting started Tue Mar 22 16:00:16 2022 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:00 |
opendevmeet | The meeting name has been set to 'nova' | 16:00 |
bauzas | hey ho | 16:00 |
elodilles | o/ | 16:00 |
chateaulav | \o | 16:00 |
bauzas | #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting | 16:00 |
gmann | o/ | 16:00 |
dansmith | o/ | 16:00 |
artom | ~o~ | 16:01 |
bauzas | ok, let's start | 16:01 |
bauzas | #topic Bugs (stuck/critical) | 16:01 |
bauzas | #info No Critical bug | 16:01 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 28 new untriaged bugs (+0 since the last meeting) | 16:01 |
bauzas | #help Nova bug triage help is appreciated https://wiki.openstack.org/wiki/Nova/BugTriage | 16:01 |
bauzas | #link https://storyboard.openstack.org/#!/project/openstack/placement 26 open stories (0 since the last meeting) in Storyboard for Placement | 16:01 |
bauzas | any bug in particular to discuss ? | 16:02 |
bauzas | I triaged a few of them but I need to create some env for verifying some others | 16:02 |
bauzas | ok, looks not | 16:03 |
bauzas | next, | 16:03 |
bauzas | #topic Gate status | 16:03 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs | 16:03 |
bauzas | #link https://zuul.openstack.org/builds?project=openstack%2Fplacement&pipeline=periodic-weekly Placement periodic job status | 16:03 |
bauzas | #info Please look at the gate failures and file a bug report with the gate-failure tag. | 16:03 |
bauzas | I haven't seen any new problem | 16:03 |
gmann | one update for centos9 stream volume detach failure | 16:04 |
gmann | it is fixed now as SSH-able series is merged #link https://review.opendev.org/q/(topic:bug/1960346+OR+topic:wait_until_sshable_pingable)+status:merged | 16:04 |
gmann | I have made centos9-stream as voting job in tempest gate | 16:04 |
bauzas | \o/ | 16:04 |
dansmith | gmann: really, that makes it all pass reliably? | 16:05 |
gmann | and proposed to be voting in devstack side too #link https://review.opendev.org/c/openstack/devstack/+/834546 | 16:05 |
gmann | dansmith: for now yes:) | 16:05 |
dansmith | cool | 16:05 |
dansmith | fips job in glance was still failing this morning I think, but I will look and see if it ran against that or not | 16:05 |
gmann | and we will monitor it carefully now as we made it voting. n-v jobs always gets ignored somehow | 16:05 |
dansmith | yeah cool | 16:05 |
artom | So I wonder, is there anything else at the guest:host interaction level that would explain why Ubuntu doesn't need to wait for SSHABLE? | 16:06 |
dansmith | artom: I'm super curious as well, as this seems like an odd thing to have changed with just newer libvirt/qemu, although certainly possible | 16:06 |
dansmith | we'll see if more weirdness comes out of running it in the full firehose | 16:06 |
gmann | dansmith: yeah, you can try with recheck. this patch fixed the last test #link https://review.opendev.org/c/openstack/tempest/+/831608 | 16:07 |
bauzas | agreed, it's weird but ok | 16:07 |
dansmith | as I was seeing other problems (on stream 8 mind you) when we were running it voting | 16:07 |
bauzas | thanks gmann btw. for having worked on it :) | 16:07 |
gmann | np!, just carried lyarwood work in this. | 16:07 |
bauzas | can we move ? | 16:07 |
gmann | yeah | 16:08 |
bauzas | kk | 16:08 |
bauzas | #topic Release Planning | 16:08 |
bauzas | shit | 16:08 |
bauzas | #topic Release Planning | 16:08 |
bauzas | #link https://releases.openstack.org/yoga/schedule.html#y-rc1 RC1 is past now | 16:08 |
bauzas | #link https://etherpad.opendev.org/p/nova-yoga-rc-potential Etherpad for RC tracking | 16:09 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?field.tag=yoga-rc-potential RC potential tags | 16:09 |
bauzas | this is Regression chasing time ! | 16:09 |
bauzas | we only have 2 days to provide a RC2 if we find a regression | 16:09 |
bauzas | for the moment, we haven't seen any of them | 16:09 |
bauzas | #info RC2 deadline is in 2 days, so we can only fix regressions before | 16:10 |
bauzas | actually, this is RC-deadline | 16:10 |
bauzas | not really a specific RC2 | 16:10 |
bauzas | we could have a RC2 release tomorrow and then a RC2 on Thursday | 16:10 |
bauzas | shit, RC3 on Thurs | 16:10 |
* dansmith watches where he steps in here | 16:11 | |
bauzas | this is just, either we find regressions before Thursday and merge the fixes before then, or we would have a Yoga GA release with some known issue and could only fix the regression in a later stable release | 16:11 |
bauzas | but, as you can see https://bugs.launchpad.net/nova/+bugs?field.tag=yoga-rc-potential is empty | 16:12 |
bauzas | anyway | 16:12 |
bauzas | that's it for me | 16:13 |
bauzas | any question or discussion for Yoga before we go to the next topic ? | 16:13 |
bauzas | looks not | 16:14 |
bauzas | #topic PTG preparation | 16:14 |
bauzas | #link https://etherpad.opendev.org/p/nova-zed-ptg Nova Zed PTG etherpad | 16:14 |
bauzas | nothing to say, please provide your topics you would like to discuss | 16:15 |
bauzas | the PTG will be in 2 weeks, so I'd prefer to see all the topics before end of the next week | 16:16 |
bauzas | for the moment, we only have a few of them | 16:16 |
bauzas | anything to discuss about the PTG ? | 16:16 |
bauzas | reminder, PTG will be April 4 - 8, 2022 | 16:17 |
Uggla | bauzas, sorry for the noob question, will we review bp/specs for zed ? | 16:17 |
bauzas | Uggla: no worries, it's your first PTG | 16:18 |
Uggla | should we put the bp/specs in the agenda ? | 16:18 |
bauzas | Uggla: in general, we discuss specs when people have something they'd like the community to reach a consensus on | 16:18 |
bauzas | Uggla: we don't generally look at all the open specs | 16:18 |
bauzas | people can also come and discuss something they'd like to see or work on, without having a spec yes | 16:19 |
bauzas | yet* | 16:19 |
bauzas | Uggla: look at the Xena PTG we had so you'll see what we discussed https://etherpad.opendev.org/p/nova-xena-ptg | 16:19 |
Uggla | bauzas, I will have a look, thanks. | 16:20 |
bauzas | ok, moving on, then | 16:21 |
bauzas | #topic Review priorities | 16:21 |
bauzas | #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+label:Review-Priority%252B1 | 16:21 |
artom | (No osc/sdk in there?) | 16:22 |
bauzas | I have seen new changes | 16:22 |
artom | (What with moving towards deprecation of the novaclient CLI) | 16:22 |
bauzas | artom: nope | 16:22 |
bauzas | artom: osc is another community but I understand your point | 16:23 |
bauzas | artom: this is just, this label is only supported for our repos | 16:23 |
artom | Ah, right | 16:24 |
bauzas | (AFAIK) | 16:24 |
sean-k-mooney | artom: we deprecated the novaclient cli already | 16:24 |
artom | Yeah, I wasn't sure | 16:24 |
bauzas | artom: but if you want us to look at OSC changes, we can do this by some etherpad | 16:24 |
sean-k-mooney | the python bindings are still allowed to be extended | 16:24 |
bauzas | artom: but you know what ? let's discuss this at the PTG to see how the nova community can review those OSC changes :) | 16:25 |
bauzas | artom: hopefully you'll provide a topic, right? | 16:25 |
bauzas | :) | 16:25 |
artom | Should've kept my fat mouth shut :P | 16:26 |
* artom will | 16:26 | |
bauzas | artom: :p | 16:26 |
bauzas | moving on | 16:26 |
bauzas | #topic Stable Branches | 16:26 |
bauzas | elodilles: your point | 16:26 |
elodilles | #info xena branch seems to be blocked by nova-tox-functional-centos8-py36 job - https://zuul.opendev.org/t/openstack/builds?job_name=nova-tox-functional-centos8-py36 | 16:26 |
elodilles | #info pike branch is blocked - fix: https://review.opendev.org/c/openstack/nova/+/833666 | 16:26 |
elodilles | and finally a reminder: | 16:27 |
elodilles | Victoria Extended Maintenance transition is due ~ in a month (2022-04-27) | 16:27 |
bauzas | wow, time flies | 16:27 |
elodilles | yes yes | 16:27 |
elodilles | that's it i think | 16:28 |
bauzas | elodilles: can we make the centos8 job non-voting ? | 16:28 |
elodilles | bauzas: that's an option | 16:28 |
bauzas | does someone already look at the issue ? | 16:28 |
elodilles | i had a quick look only | 16:28 |
artom | Seems to be spurious... | 16:29 |
bauzas | elodilles: ping me tomorrow morning and we'll jump onto it | 16:29 |
artom | The last few runs passed | 16:29 |
elodilles | it seems to be related to some mirror issue, but not sure | 16:29 |
bauzas | artom: not the stable/xena branch | 16:29 |
gmann | yeah seems mirror issue otherwise we can see same version conflict in other places also | 16:29 |
elodilles | bauzas: sure, thanks | 16:29 |
artom | ... then which? stephenfin has a fix up for the pike one, looks like... | 16:30 |
artom | So 'INFO: pip is looking at multiple versions of openstack-placement' is new, no? | 16:30 |
bauzas | for the pike branch, agreed on reviewing the fix | 16:30 |
artom | On my laptop, for stable/ussuri, it's taking forever | 16:30 |
gmann | elodilles: let's wait for a few more runs. | 16:30 |
bauzas | I don't want us to dig into the job resolution for now | 16:31 |
bauzas | but people can start looking at it after the meeting if they want | 16:31 |
elodilles | gmann: ack | 16:31 |
bauzas | this is just, I don't want this branch holding because of one single job | 16:31 |
bauzas | gmann: elodilles: I'd appreciate some DNM patches to make sure we don't hit this every change | 16:32 |
bauzas | looks we discuss all the thingies by now | 16:33 |
bauzas | discussed* | 16:33 |
*** Guest0 is now known as prometheanfire | 16:33 | |
bauzas | can we move ? | 16:33 |
gmann | did recheck on 828413, let's see | 16:33 |
bauzas | gmann: ++ | 16:33 |
elodilles | yes, thanks, let's move on | 16:34 |
bauzas | last topic then | 16:35 |
bauzas | #topic Open discussion | 16:35 |
bauzas | I have one | 16:35 |
bauzas | (bauzas) Upgrade our minimum service check https://review.opendev.org/c/openstack/nova/+/833440 | 16:35 |
bauzas | takashi kindly provided a change for bumping our min version support | 16:35 |
bauzas | before merging it, I'd like to make sure all people here agree on it | 16:36 |
dansmith | so one thing we might want to consider, | 16:36 |
bauzas | (that said, there is a grenade issue on its change, so even with +Wing it...) | 16:36 |
dansmith | is a PTG topic about the check (and the problems with it that we didn't foresee) to see if there's any better way we could or should be doing that whole thing | 16:36 |
dansmith | and just punt on the patch until we have that discussion | 16:36 |
bauzas | I already opened a PTG topic | 16:37 |
bauzas | I'll add the service check in it | 16:37 |
dansmith | okay | 16:37 |
bauzas | just done | 16:39 |
bauzas | people agree with this plan ? | 16:39 |
bauzas | either way, as said the change itself has grenade issues that need to be fixed | 16:39 |
bauzas | and I don't see any reason for rushing on it being merged | 16:39 |
bauzas | we have the whole zed timeframe for this | 16:39 |
elodilles | (grenade issue might be because devstack does not have yet stable/yoga) | 16:40 |
elodilles | (so that should be OK in 1 or 2 days) | 16:40 |
bauzas | we haven't released stable/yoga | 16:40 |
bauzas | this will be done on next Wed | 16:40 |
bauzas | elodilles: but yeah, sounds like it | 16:41 |
elodilles | ++ | 16:41 |
gmann | yeah, we should do that soon, neutron faced the same issue. | 16:41 |
gmann | elodilles: I will discuss in release channel | 16:41 |
elodilles | gmann: ack | 16:41 |
bauzas | ok, I guess we're done then | 16:43 |
artom | Oh, can we chat about https://review.opendev.org/c/openstack/nova/+/833453? | 16:43 |
bauzas | #agreed let's hold https://review.opendev.org/c/openstack/nova/+/833440 until we correctly discuss this at the PTG | 16:43 |
* bauzas clicks on artom's patch | 16:44 | |
artom | Really only bringing it up here because, as a periodic, we'd have to check up on the status, presumably here | 16:44 |
artom | Here == the meeting | 16:44 |
bauzas | artom: yeah, that's my point | 16:45 |
bauzas | we already do a few checks during the gate topic | 16:45 |
bauzas | but I wonder whether that wouldn't be better if we could agree on this at the PTG | 16:45 |
EugenMayer | is it possible to set the flavor of an instance manually using the api? | 16:46 |
EugenMayer | Oh - sorry. Still meeting time. Ignore me. | 16:46 |
artom | bauzas, doesn't seem controversial, but OK :) | 16:46 |
bauzas | artom: yup, I don't disagree | 16:47 |
bauzas | do people have concerns with adding a periodic check on whitebox ? | 16:47 |
artom | I guess the downside is CI resource usage, but... one nightly job seems OK? | 16:47 |
bauzas | I heard news of some CI resource shortage, but I'm not in the TC | 16:47 |
artom | Yet ;) | 16:48 |
bauzas | dansmith: gmann: can we just add a periodic job without being concerned ? | 16:48 |
artom | dansmith said someone is pulling out | 16:48 |
artom | (phrasing </archer>) | 16:48 |
dansmith | periodic is probably not a big deal I would imagine | 16:48 |
dansmith | I think we're going to need to trim down nova's per-patch jobs too, as it's getting pretty heavy | 16:48 |
bauzas | yeah, I don't think this is a big thing if we add a periodic | 16:49 |
bauzas | dansmith: adding a PTG topic about it fwiw | 16:49 |
gmann | yeah, and periodic also we can see if daily or weekly? | 16:49 |
bauzas | tbh, the only question is how often we'll check its status, and that will be weekly (during the team meeting) | 16:50 |
gmann | bauzas: artom: along with periodic, add it to the experimental pipeline too for manual trigger. that helps avoid adding it to the check/gate pipelines if anyone wants to run it manually | 16:51 |
artom | bauzas, yep, no point in making it daily if we're only checking the status weekly | 16:51 |
artom | gmann, ack, can do | 16:51 |
gmann | +1 | 16:51 |
dansmith | yeah daily seems excessive | 16:51 |
bauzas | artom: update this change with the weekly period time and mention in the commit msg we'll need to verify it during weekly meetings | 16:53 |
* artom will have to find example of periodic weekly to figure out the correct Zuul words magic | 16:53 | |
bauzas | look at the placement ones | 16:53 |
artom | Oh yeah! | 16:53 |
gmann | artom: https://github.com/openstack/placement/blob/master/.zuul.yaml#L64 | 16:53 |
gmann | yeah | 16:54 |
artom | Hah, that was easy | 16:54 |
bauzas | this is another pipeline IIRC | 16:54 |
sean-k-mooney | by the way i think weekly jobs in general suit us better as we can review them in the weekly meeting | 16:54 |
sean-k-mooney | if we have a nightly we probably won't look at it every day | 16:54 |
bauzas | oh yeah | 16:54 |
bauzas | I just hope this meeting won't transform into some CI meeting | 16:54 |
chateaulav | artom: nova zuul has an example of weekly periodic now | 16:54 |
bauzas | if we start adding more periodics | 16:55 |
artom | I mean, feel free to nack the idea entirely :) | 16:55 |
sean-k-mooney | bauzas: well it should just be (are they green? no? we should look at X after the meeting) | 16:55 |
artom | I'll obviously try to debate/convince you | 16:55 |
bauzas | artom: nah, I like the idea, I just want us to buy it | 16:55 |
artom | But if we think whitebox doesn't bring value to Nova CI, let's just not do it :) | 16:55 |
bauzas | we're approaching meeting's end time | 16:56 |
artom | End times are nigh | 16:56 |
bauzas | any other item to mention before we close ? | 16:56 |
sean-k-mooney | :) | 16:56 |
* artom gets raptured | 16:56 | |
sean-k-mooney | ah i actually had two blueprints i wanted to raise | 16:56 |
sean-k-mooney | we deferred updating the defaults for allocation ratios | 16:56 |
bauzas | sean-k-mooney: oh I forgot to mention I changed Launchpad to reflect zed as the active series | 16:57 |
sean-k-mooney | shall we proceed with that or discuss at ptg | 16:57 |
sean-k-mooney | also kashyap's blueprint for using the new libvirt apis | 16:57 |
bauzas | we're a bit short in time for reapproving specless bps by now | 16:57 |
sean-k-mooney | can we retarget both to zed | 16:57 |
sean-k-mooney | ack | 16:57 |
bauzas | but we can look at them during next meeting | 16:57 |
sean-k-mooney | we can discuss it next week or at the ptg | 16:57 |
bauzas | well, Zed is open | 16:58 |
bauzas | I'm OK with approving things by now | 16:58 |
bauzas | and the specs repo is ready | 16:58 |
bauzas | sean-k-mooney: just propose your two blueprints for the next meeting so we'll reapprove them (unless concerns of course) | 16:58 |
sean-k-mooney | ack | 16:59 |
bauzas | fwiw, I leave the non-implemented blueprints in Deferred state | 16:59 |
bauzas | once we start reapproving some, I'd change back their state | 17:00 |
bauzas | but anyway, we're on time | 17:00 |
bauzas | thanks all | 17:00 |
bauzas | #endmeeting | 17:00 |
opendevmeet | Meeting ended Tue Mar 22 17:00:16 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 17:00 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/nova/2022/nova.2022-03-22-16.00.html | 17:00 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/nova/2022/nova.2022-03-22-16.00.txt | 17:00 |
opendevmeet | Log: https://meetings.opendev.org/meetings/nova/2022/nova.2022-03-22-16.00.log.html | 17:00 |
elodilles | thanks bauzas o/ | 17:00 |
bauzas | was a productive meeting, after all | 17:00 |
EugenMayer | Is there any 'good way' to set the task-state of an instance that has been stuck in 'image backup' due to an issue in glance? so the field OS-EXT-STS:task_state is on "image_backup" | 17:03 |
*** tosky is now known as Guest38 | 17:04 | |
*** tosky_ is now known as tosky | 17:04 | |
EugenMayer | i see there is 'nova set --state' or 'nova reset-state' but both seem to operate on the instance power state (OS-EXT-STS:power_state) or OS-EXT-STS:vm_state - but not the task-state | 17:05 |
zigo | sean-k-mooney: Yeah, this was an evacuate operation. | 17:07 |
dansmith | zigo: I thought you said live migrate? | 17:07 |
sean-k-mooney | zigo: ok, the reason this breaks is that for evacuation we only have 1 allocation in placement against both hosts | 17:07 |
sean-k-mooney | and since the source host is over capacity because you reduced the allocation ratio, the entire allocation is considered invalid | 17:08 |
sean-k-mooney | we discussed this at the ptg 1 or 2 ptgs ago | 17:08 |
sean-k-mooney | i can't recall if we said we should fix this after consumer types, but i don't think we had a workaround other than temporarily increasing the allocation ratio so it's no longer overcommitted | 17:09 |
dansmith | sean-k-mooney: we could also solve it the way we do for cold migration, which is hold the allocation on the source with the migration uuid right? | 17:09 |
sean-k-mooney | dansmith: yes we could, that was one of the options | 17:10 |
sean-k-mooney | im trying to find the launchpad bug | 17:10 |
bauzas | dansmith: sean-k-mooney: yeah, the Migration uuid for evacuate seems the better and cleaner approach | 17:11 |
sean-k-mooney | bauzas: that is what we were proposing doing | 17:12 |
sean-k-mooney | but i dont think anyone has worked on it since | 17:12 |
bauzas | :-) | 17:13 |
sean-k-mooney | https://bugs.launchpad.net/nova/+bug/1943191 | 17:13 |
sean-k-mooney | that might be it | 17:13 |
EugenMayer | I'm looking at https://wiki.openstack.org/wiki/CrashUp/Recover_From_Nova_Uncontrolled_Operations to understand how to recover from the crashed task state 'image_backup' but I'm not sure how to actually act upon that. Should i use the nova api? | 17:13 |
sean-k-mooney | and https://bugs.launchpad.net/nova/+bug/1924123 | 17:14 |
bauzas | sean-k-mooney: some people expect bugs to be fixed automatically :) | 17:14 |
bauzas | we don't have yet AI bots smart enough to close the gaps | 17:14 |
sean-k-mooney | EugenMayer: the wiki is basically unmaintained | 17:14 |
EugenMayer | i see. Thank you | 17:15 |
sean-k-mooney | in the early days of openstack we used the wiki for specs and project-created docs (docs not written by the docs team) | 17:16 |
EugenMayer | I'm really not sure how to recover from the failed task the proper way. The only way I know so far, which is heavy-handed, is: reset the state, then restart the compute the vm is hosted on so the state is somewhat recovered | 17:16 |
sean-k-mooney | there is no way to recover from it really beyond that | 17:17 |
sean-k-mooney | we don't provide an api to allow tasks to be restarted | 17:17 |
dansmith | reset state and reboot the vm is what I'd try first, | 17:17 |
sean-k-mooney | yep same | 17:18 |
dansmith | not restarting the compute I'd hope | 17:18 |
sean-k-mooney | ya that normally should not be required | 17:18 |
sean-k-mooney | i guess it would depend on why it failed | 17:18 |
dansmith | definitely not expected for anything like a glance thing | 17:18 |
EugenMayer | trying that. AFAIR i had to restart the entire compute last time. Anyway, trying that | 17:18 |
sean-k-mooney | do you recall way? | 17:19 |
sean-k-mooney | *why | 17:19 |
EugenMayer | dansmith well this happens the 4th time. A stuck glance image backup task leaves the task_state of the instance in a broken state | 17:19 |
dansmith | honestly restarting the compute shouldn't even do anything, AFAIK | 17:19 |
sean-k-mooney | i wonder if the main thread of the compute agent was blocked on an io operation | 17:19 |
sean-k-mooney | that is the only thing i can think of that would be fixed by an agent restart | 17:20 |
sean-k-mooney | we were not using a thread pool for those on some of the older releases | 17:20 |
dansmith | sean-k-mooney: compute is the thing that "consumes" the task_state and turns it into a vm_state, so to speak, so maybe we clear task_state in init_host in some cases? | 17:20 |
EugenMayer | well i'm on xena, so not really old | 17:20 |
dansmith | but either way, reset_state to error is supposed to let you clear everything by enabling force reboot I think | 17:21 |
dansmith | or that's the intent | 17:21 |
sean-k-mooney | dansmith: i think we do yes but not sure about this case | 17:21 |
EugenMayer | dansmith it is clear, software-wise, that there is more than one misconception in the microservice and task call stack. I'm not sure if glance is required to call a webhook on success or error (not sure how the result is propagated) but this is simply not the right design. | 17:22 |
EugenMayer | should the task crash on glance, neither success nor error is ever called, and there seems to be nothing to recover from that | 17:22 |
dansmith | EugenMayer: none of that :) | 17:22 |
dansmith | everything is nova->glance | 17:22 |
sean-k-mooney | i believe this is a blocking call to do the upload to glance | 17:23 |
sean-k-mooney | if it were async then either nova would poll | 17:23 |
sean-k-mooney | or we would get an external event from glance | 17:23 |
dansmith | so depending on the failure, nova should clean up whatever it can.. an upload to glance for sure should be recoverable on our end, so that's likely it's own bug if we're missing something | 17:23 |
sean-k-mooney | but i think image upload is blocking | 17:23 |
dansmith | sean-k-mooney: none of that with glance | 17:23 |
sean-k-mooney | right we dont do polling or external event right | 17:24 |
sean-k-mooney | we just do two blocking calls | 17:24 |
EugenMayer | if it is a blocking task, well the blocking call should clean up - which it seems not to do | 17:24 |
sean-k-mooney | one for creating the image and the second for the data upload | 17:24 |
dansmith | EugenMayer: if you can repro the problem that's definitely a bug candidate | 17:24 |
sean-k-mooney | EugenMayer: yes it should clean up if we get an error from glance | 17:24 |
dansmith | there are some situations where it might not make sense to clean up, but I would think a glance thing would always be something we can handle | 17:24 |
EugenMayer | dansmith i can reproduce this the 4th time. If you tell me what to gather, i will grab the logs you need the 5th time - which will happen | 17:25 |
dansmith | EugenMayer: logs | 17:25 |
EugenMayer | which logs to get? | 17:25 |
dansmith | all of them? :) | 17:25 |
sean-k-mooney | dansmith: i would expect the vm to go back to active or error if we dont clean up right | 17:25 |
dansmith | nova-compute, nova-api at least | 17:25 |
dansmith | sean-k-mooney: error, yeah | 17:25 |
EugenMayer | vm is in active state, power is on, task_state is image_backup | 17:25 |
dansmith | that said, reset_state resets task_state so that should be the way to get out here | 17:26 |
sean-k-mooney | you can reset state to active | 17:26 |
EugenMayer | reset-state --active + reboot seems to recover just right. Also viewing the console works (which is one of the problems with a partial state recovery) | 17:26 |
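[editor's note: the recovery described above, in command form - reset-state is admin-only and only fixes nova's bookkeeping (vm_state/task_state), it does not touch the guest itself.]
    # clear the stuck task_state and set vm_state back to ACTIVE
    $ nova reset-state --active <instance-uuid>
    # openstackclient equivalent
    $ openstack server set --state active <instance-uuid>
    # then reboot the instance if it still needs a kick
    $ openstack server reboot --hard <instance-uuid>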
dansmith | EugenMayer: we're saying that what we would expect is vm_state=ERROR,task_state=None | 17:26 |
sean-k-mooney | rather than error, and potentially just trigger the backup/snapshot again | 17:26 |
EugenMayer | dansmith that never happened yet | 17:26 |
dansmith | EugenMayer: I know, I'm saying that's what we expect nova should be doing | 17:27 |
sean-k-mooney | EugenMayer: do you know why the glance operation is failing. | 17:27 |
sean-k-mooney | dansmith: i could see an argument to be made that we would have vm_state=Active task_state=None but the snapshot action marked as error in the server event log | 17:28 |
sean-k-mooney | if the vm was indeed still running properly, depending on how it failed | 17:28 |
EugenMayer | there is so much one can break right now. e.g. another topic is using terraform and rescaling a flavor. In 2 of 5 cases the following happens (i cannot tell you exactly): the old flavor is deleted (too early), the new one is created, then the instance is fetched; this fails since the flavor_id of the old flavor is still set and cannot be found. TF | 17:28 |
EugenMayer | cancels and that's it | 17:28 |
dansmith | sean-k-mooney: the problem is one of signaling, which is why we (originally as designed) went to error,None for everything and then you do a start (which does nothing) to reset back to active as sort of "ack" | 17:28 |
EugenMayer | stuck again - such that one now needs to shelve the instance and restore it from glance using the 'new flavor' | 17:29 |
EugenMayer | i did not yet check the tf openstack provider implementation to see what they have implemented and how that is a timing issue in the first place (since it does not happen every time) .. but if i look at the openstack rest api / nova api .. swapping flavors is not designed at all. | 17:30 |
sean-k-mooney | EugenMayer: well flavors are intended to be immutable so ideally you would not delete them until all instances using them are resized | 17:30 |
sean-k-mooney | we do cache the flavor | 17:30 |
sean-k-mooney | in the instance | 17:30 |
dansmith | EugenMayer: are you describing two issues or one? if the former, then let's not complicate diagnosing this one | 17:30 |
sean-k-mooney | but really you should try to avoid removing flavors or images that are in use | 17:30 |
EugenMayer | well i cannot tell why the tf openstack provider deletes the flavor too early or whatever happens in detail (i did not check the sequence in the code yet) | 17:31 |
sean-k-mooney | EugenMayer: it should not delete it at all | 17:31 |
EugenMayer | dansmith sorry, my bad. second issue (the latter one with the flav) | 17:31 |
sean-k-mooney | it sounds like they are implementing the hacky workflow that horizon used to have | 17:31 |
dansmith | EugenMayer: yeah, not helping :) | 17:31 |
EugenMayer | dansmith sorry. my bad. | 17:32 |
sean-k-mooney | where they allowed you to update a flavor by deleting and recreating it, but ya let's not talk about that issue now | 17:32 |
EugenMayer | well if you ask me about the error state - one should not mark the instance as 'error' if an image_backup task failed - there is no reason for that. Creating a glance image does not require the instance to shut down or similar; this said, i assume both the instance running and the creation of the image can work in parallel and are independent | 17:35 |
dansmith | EugenMayer: going to error state is just the nova convention (in most places) | 17:35 |
EugenMayer | so this said, if the image_backup task has failed, or the task no longer exists or whatever, nova should not block 'restarting the instance' | 17:36 |
dansmith | and if the issue wasn't critical, then a start operation will clear the error state without requiring a reboot of the actual instance | 17:36 |
EugenMayer | dansmith well it is the 'better safe than sorry' convention i guess | 17:36 |
dansmith | EugenMayer: we're agreeing with you that we do not expect that this is something that should be so jammed up and that there's probably some missing error handling in this case | 17:37 |
dansmith | I'm describing what the usual nova error procedure is, regarding going to error state to signal to the user that their thing didn't happen | 17:37 |
dansmith | it's not great, it's just the convention | 17:37 |
sean-k-mooney | EugenMayer: creating the glance image might require the instance to be shut down, by the way | 17:37 |
dansmith | because if you do a backup, and the instance goes to active, you assume it worked, but it didn't | 17:37 |
sean-k-mooney | snapshots are not guaranteed to be live | 17:37 |
dansmith | right | 17:38 |
EugenMayer | if nova is the task owner, which i understood is the case, it should implement a proper state machine for the task (which i understood is blocking via REST, so very fragile). A task could complete failed or succeeded. A task could also never complete, or even be deleted (on the glance side) | 17:38 |
sean-k-mooney | EugenMayer: there was an effort to do that at one point but this is also a distributed systems problem | 17:38 |
dansmith | EugenMayer: there's no task | 17:38 |
EugenMayer | understood, but i assume the sequence is: shutdown/sleep instance, create snapshot, start/resume instance, upload snapshot to glance .. (do task tracking) | 17:38 |
sean-k-mooney | EugenMayer: right, yes, but there may be cleanup to be done on the compute node or storage backend if the upload fails | 17:39 |
EugenMayer | no task means: it's blocking only. Understood, there is no task_id or anything, just a blocking http call. So as you both suggested, this blocking call needs to clean up in all cases: 200, 500 and also 408 and others. | 17:40 |
sean-k-mooney | such as deleting the file we created that was not uploaded | 17:40 |
dansmith | EugenMayer: we're saying exactly that.. we should, assuming we can | 17:41 |
sean-k-mooney | EugenMayer: yep and nova should check the response code and start cleaning up if it failed | 17:41 |
EugenMayer | then i have seen the glance image task under images, which i was able to delete, but since the blocking request disconnected long ago, no cleanup happened on the nova side | 17:41 |
dansmith | EugenMayer: there are cases that are more complicated, such as with ceph where we might not be able to recover at all, depending on what happened, but in general we agree | 17:41 |
EugenMayer | agreed | 17:41 |
sean-k-mooney | well recovery in ceph might be to squash/merge the ceph snapshot back into the previous volume for example | 17:42 |
dansmith | depends on the failure of course | 17:42 |
sean-k-mooney | whereas for qcow we would mirror the file on disk, then upload, and if it failed delete the copy | 17:42 |
sean-k-mooney | EugenMayer: if you have logs and/or a reproducer please file a bug and we can see if we can figure out why nova is not cleaning up as expected | 17:43 |
opendevreview | Stephen Finucane proposed openstack/nova master: mypy: Add nova.cmd, nova.conf, nova.console https://review.opendev.org/c/openstack/nova/+/705657 | 17:52 |
opendevreview | Stephen Finucane proposed openstack/nova master: mypy: Add type annotations to top-level modules https://review.opendev.org/c/openstack/nova/+/705658 | 17:52 |
opendevreview | Stephen Finucane proposed openstack/nova master: trivial: Clean manager.Manager, service.Service signatures https://review.opendev.org/c/openstack/nova/+/764806 | 17:52 |
EugenMayer | sean-k-mooney dansmith will do, thank you both for your time | 17:58 |
admin1 | hi all .. i am hitting this bug, https://bugs.launchpad.net/glance/+bug/1916482 , but don't have an idea on how to solve it .. i am using openstack-ansible and the latest tag 24.0.1 | 18:22 |
admin1 | nova is local disk, glance is rbd | 18:22 |
opendevreview | Stephen Finucane proposed openstack/nova master: objects: Remove unnecessary type aliases, exceptions https://review.opendev.org/c/openstack/nova/+/738240 | 18:22 |
opendevreview | Stephen Finucane proposed openstack/nova master: objects: Use imports instead of type aliases https://review.opendev.org/c/openstack/nova/+/738018 | 18:22 |
opendevreview | Stephen Finucane proposed openstack/nova master: objects: Remove wrappers around ovo mixins https://review.opendev.org/c/openstack/nova/+/738019 | 18:22 |
opendevreview | Stephen Finucane proposed openstack/nova master: WIP: add ovo-mypy-plugin to type hinting o.vos https://review.opendev.org/c/openstack/nova/+/758851 | 18:22 |
opendevreview | Ghanshyam proposed openstack/nova stable/xena: DNM: testing centos8 py36 job https://review.opendev.org/c/openstack/nova/+/834765 | 18:36 |
opendevreview | Ghanshyam proposed openstack/nova stable/wallaby: DNM: testing centos8 py36 job https://review.opendev.org/c/openstack/nova/+/834721 | 18:38 |
*** dasm is now known as dasm|off | 22:18 |