opendevreview | Merged openstack/nova stable/ussuri: [compute] always set instance.host in post_livemigration https://review.opendev.org/c/openstack/nova/+/864007 | 07:39 |
---|---|---|
opendevreview | Amit Uniyal proposed openstack/nova stable/train: Adds a repoducer for post live migration fail https://review.opendev.org/c/openstack/nova/+/863806 | 07:42 |
opendevreview | Amit Uniyal proposed openstack/nova stable/train: [compute] always set instance.host in post_livemigration https://review.opendev.org/c/openstack/nova/+/864055 | 07:42 |
gibi | dansmith: I saw such interpreter crashes before. It is really a segfault of the python interpreter based on dmesg. Unfortunately it happens randomly afais. | 08:48 |
gibi | dansmith: hm, but this time it was OOM | 08:53 |
viks__ | hi, is there a way to set say `/var/lib/nova1` instead of `/var/lib/nova` ? i could not find any configuration to do that in `nova.conf` | 09:16 |
gibi | viks__: https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.instances_path and https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.state_path are the way I think | 09:43 |
viks__ | gibi: thanks... got it.. actually i was searching for `/var/lib/nova` in sample conf so i did not find it before... anyway can i set multiple values for it say for eg: `/var/lib/nova,/var/lib/nova1` where they are 2 different mount points ? | 09:48 |
gibi | viks__: you can only set a single path | 09:50 |
viks__ | gibi: ok.. thanks.. one more thing.. when to use `instances_path` if `state_path` itself will do the job? any suggestions? | 09:52 |
gibi | viks__: if you want to store the instance local disks in a different place than the nova lock files then instances_path will let you separate the instance disks from the lock files | 09:55 |
viks__ | gibi: ok.. thanks | 10:04 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/nova master: compute: enhance compute evacuate instance to support target state https://review.opendev.org/c/openstack/nova/+/858383 | 10:14 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/nova master: api: extend evacuate instance to support target state https://review.opendev.org/c/openstack/nova/+/858384 | 10:14 |
*** elodilles_pto is now known as elodilles | 10:19 | |
gibi | dansmith: opened the bug for the OOM https://bugs.launchpad.net/nova/+bug/2002951 | 10:32 |
sean-k-mooney | viks__: if you can't use a single mount path for some reason you could fake it using lvm volumes to combine multiple disks into one or do it at the file system level instead of block level using mergerfs https://manpages.ubuntu.com/manpages/impish/man1/mergerfs.1.html | 10:54 |
sean-k-mooney | nova needs everything to be in a single directory on the file system but we don't really care where that folder comes from or how you created it | 10:55 |
bauzas | auniyal: so, about what we discussed for https://bugs.launchpad.net/nova/+bug/1996732 | 10:56 |
bauzas | auniyal: what you need to do first is to check how to look at the host_state.failed_builds value | 10:57 |
sean-k-mooney | all you need to do is add a new exception that inherits from the existing one and then where we increment the value skip it if it's the new exception | 10:58 |
sean-k-mooney | the late affinity failure will use the new exception and not be counted | 10:58 |
sean-k-mooney | but because it inherits from the original any cleanup that was previously done will still be done | 10:58 |
sean-k-mooney | so ya one of the first steps is to find out where we modify the build failure value | 10:59 |
sean-k-mooney | and also where we raise the current exception for the affinity check | 10:59 |
sean-k-mooney | after that you can add the new exception and modify both to use it | 11:00 |
bauzas | sean-k-mooney: the concern of auniyal was how to test it | 11:00 |
sean-k-mooney | outside of functional tests it would be tricky to do end to end, but unit tests for the exception raising and a functional test for the end to end interaction. there is no point doing tempest type testing since fault injection will be needed and that is not something tempest is good for | 11:04 |
bauzas | sean-k-mooney: I think it's possible to have a functest for it | 11:06 |
bauzas | but we need to be able to verify the stats | 11:06 |
viks__ | sean-k-mooney: thanks for the suggestions | 11:06 |
bauzas | sean-k-mooney: if we have a functest that creates a group with anti-affinity policy without having the filter, then we can create two instances asking for the same host | 11:09 |
auniyal | sean-k-mooney, bauzas, suppose we have only one host, hostA, and an instance in a server group with an anti-affinity policy is present on hostA. if the user creates another instance in the same group, it will go for a reschedule, and hence increase the build failed counter | 11:09 |
auniyal | is this a valid scenario | 11:09 |
auniyal | to test | 11:09 |
bauzas | sean-k-mooney: and then we could verify the stats value | 11:09 |
bauzas | auniyal: yeah | 11:10 |
bauzas | auniyal: but then you need to verify the counter | 11:10 |
bauzas | https://github.com/openstack/nova/blob/2eb358cdcec36fcfe5388ce6982d2961ca949d0a/nova/compute/resource_tracker.py#L1978-L1980 | 11:10 |
bauzas | which is a defaultdict set here https://github.com/openstack/nova/blob/2eb358cdcec36fcfe5388ce6982d2961ca949d0a/nova/compute/resource_tracker.py#L99 | 11:11 |
bauzas | so the stats is updated here https://github.com/openstack/nova/blob/2eb358cdcec36fcfe5388ce6982d2961ca949d0a/nova/compute/stats.py#L143 | 11:12 |
bauzas | so, as you'll see, this is a keyed dict | 11:12 |
bauzas | with a functional test, you can introspect that compute.stats dict | 11:13 |
bauzas | and see that the value of that dict for the key 'failed_builds' is incremented by 1 | 11:13 |
bauzas | after creating inst2 | 11:13 |
bauzas | and that's what we want to change | 11:14 |
bauzas | auniyal: ^ | 11:14 |
auniyal | ack, this will tell us the instance failed again and again | 11:16 |
gibi | auniyal: no, if you have one host with an instance in an anti-affinity group and you try to schedule the second instance to the same group then it will not trigger a reschedule | 11:19 |
gibi | it will simply fail the scheduling with NoValidHost | 11:19 |
auniyal | yes | 11:19 |
bauzas | yes you need a test with 2 nodes, don't disagree | 11:20 |
gibi | bauzas: two nodes will not help either as the scheduler will pick the other host | 11:20 |
bauzas | gibi: not if you trick it :) | 11:20 |
gibi | to trigger a late affinity check failure (and hence the build failure counter increase) you need to have two parallel scheduling requests | 11:20 |
bauzas | gibi: my proposal is simpler with a functest, we have code snippets for tricking the scheduler | 11:21 |
bauzas | or you could use the az hack | 11:22 |
bauzas | it will skip the scheduler | 11:22 |
gibi | (alternatively we could try to remove the anti-affinity filter from the config then the scheduler will allow both VMs to the same host, and the second will fail the late affinity check there) | 11:23 |
bauzas | gibi: ah, you missed then my point | 11:26 |
bauzas | (12:09:29) bauzas: sean-k-mooney: if we have a functest that creates a group with anti-affinity policy without having the filter, then we can create two instances asking for the same host | 11:27 |
bauzas | we indeed need a functest that *doesn't* use the AntiAffinityFilter | 11:27 |
bauzas | faking the scheduler is just for making sure we land instances on the same host | 11:27 |
gibi | bauzas: ack, then we thought about the same thing. cool :) | 11:28 |
auniyal | can we reproduce it manually | 11:29 |
auniyal | I understand the reschedule is correct but it should be counted | 11:31 |
auniyal | so we just need to verify this before saying build failed at https://github.com/openstack/nova/blob/2eb358cdcec36fcfe5388ce6982d2961ca949d0a/nova/compute/manager.py#L2265 | 11:32 |
auniyal | *it should NOT be counted | 11:32 |
bauzas | auniyal: you can reproduce it with devstack | 11:37 |
bauzas | auniyal: make sure the filter is disabled | 11:37 |
bauzas | and force to create two instances with the same group on the same host | 11:37 |
bauzas | that should work | 11:37 |
sean-k-mooney | bauzas: you can control what filters are used in the functest so that should not be a problem | 11:37 |
bauzas | introspecting the stats field would be a bit trickier but I think we log the stats with the DEBUG level | 11:38 |
sean-k-mooney | ya we likely do but if you really need to you can just go direct to the db | 11:38 |
bauzas | sean-k-mooney: yeah, and I even think we have a functest fake filter for ensuring all instances go to the same host | 11:38 |
sean-k-mooney | often we will use a spy function to intercept and record such things | 11:38 |
sean-k-mooney | bauzas: we do yes | 11:39 |
bauzas | anyway, /me goes off for lunch | 11:39 |
kashyap | Hmm, on this bz: https://bugzilla.redhat.com/show_bug.cgi?id=2138381 (on CPU compatibility). A Red Hat customer-facing person says using the new CPU API still throws the same error. Actually removing the check is what works correctly: | 11:42 |
kashyap | https://review.opendev.org/c/openstack/nova/+/869587 -- libvirt: Remove compareCPU() check in _check_cpu_compatibility() | 11:42 |
kashyap | About debuggability concerns (if you remove the compareCPU() check): the same guy confirms you'd get the same error from libvirt. And it is still debuggable (which is what I said before) | 11:43 |
sean-k-mooney | kashyap: it's much much much less debuggable as you now need to boot a vm to trigger it | 11:44 |
sean-k-mooney | since you have https://review.opendev.org/c/openstack/nova/+/869950 i think i would prefer if you abandoned https://review.opendev.org/c/openstack/nova/+/869587 and we proceeded with the replacement instead | 11:46 |
kashyap | sean-k-mooney: Sure, I would also prefer the replacement | 11:47 |
kashyap | We can agree to disagree on "much much much" | 11:47 |
kashyap | I don't have the energy to argue much anyway; /me is still recovering from a bike accident | 11:48 |
sean-k-mooney | when they tested this did they use the cpu flags to remove the flag | 11:48 |
sean-k-mooney | as in did they set cpu_extra_flags=-whatever | 11:49 |
kashyap | I don't think they have specified it; I'll ask 'em on the bz | 11:49 |
sean-k-mooney | ok it also kind of sucks that the error message does not tell you what is incompatible | 11:50 |
sean-k-mooney | it would be nice if it said this set of features is unavailable | 11:50 |
kashyap | Yeah | 11:51 |
sean-k-mooney | looks like they found a beaker node internally with icelake to test on | 11:52 |
sean-k-mooney | maybe we can do the same or get access to test it there | 11:52 |
sean-k-mooney | it might save some back and forth if you can get direct ssh access to see whats happening or get a dev env yourself that you can deploy devstack on | 11:53 |
kashyap | sean-k-mooney: Also, we should absolutely still keep the option of removing the check open. Please don't insist on keeping it w/o good reasons. | 11:58 |
kashyap | The tests pass, and DanPB also once said that code is wrong and should be even removed. | 11:58 |
kashyap | ("tests pass" is not the full reason; but it is not causing problems/troubles. And it was also properly tested by the same RHT person) | 11:59 |
sean-k-mooney | kashyap: i have given you a good reason: it will regress nova's functionality to remove it and i have explained why | 11:59 |
sean-k-mooney | if i was insisting i would be using my -2 rights on the patch. i am not | 12:00 |
kashyap | sean-k-mooney: Sigh; the definition of "regression" is not serious here. We're going in circles. I also want other people's take here. | 12:01 |
kashyap | (You have to see the _effect_ of the patch: it is changing _where_ it is failing. Yes, it's a kind of a "regression"; but functionally users are better off) | 12:02 |
kashyap | Anyway. Let's explore the replacement patch more fully too. | 12:02 |
kashyap | sean-k-mooney: I'm in agreement with you on definitely using the newer API, as that's a net-benefit. (I was not debating that one.) | 12:07 |
sahid | o/ guys, do we have a process to convert an option from bool to int? | 12:11 |
sean-k-mooney | sahid: we do, it's called not doing it. basically if you're changing the type you have to rename the option and deprecate the old one. in this case you're going from bool, which in our config is based on string, to int | 12:17 |
sean-k-mooney | you can't do that in place because we accept true/yes|false/no not just 1|0 | 12:18 |
sean-k-mooney | sahid: what config option do you want to modify | 12:19 |
sean-k-mooney | you will basically have to deprecate the old one and add a new one in the new format and support both in the A cycle. | 12:19 |
sean-k-mooney | supporting both formats is required because you are not allowed to require a config change to upgrade | 12:20 |
sahid | yes that's the point, I don't want to break things. | 12:25 |
sahid | sean-k-mooney: I'm not sure I understand; you mean we can update from bool to int transparently as this is using a string to int? | 12:25 |
sean-k-mooney | we cant do it transparently because its string to int | 12:27 |
sahid | oh.. but the problem in our case will be that a True will not be converted to an int | 12:27 |
sean-k-mooney | in c it would be int to int | 12:27 |
sean-k-mooney | we still need to accept yes/y/True etc. in the config | 12:28 |
sean-k-mooney | and that would have to be converted to 1 i guess | 12:28 |
sean-k-mooney | but you would also have to accept 1 and any other values you are supporting | 12:28 |
sean-k-mooney | so ya after your change you still need to be able to handle a config with True in it as valid if it was to be transparent | 12:29 |
sean-k-mooney | so thats the problem in this case | 12:29 |
sahid | sean-k-mooney: it's related to this one, if you have a moment to take a look https://review.opendev.org/c/openstack/nova/+/867324 | 12:33 |
sahid | basically it's to add the ability to set the number of retries | 12:33 |
sahid | originally the option is Bool, and used to activate or deactivate announces | 12:34 |
sean-k-mooney | ah that patch, i saw that briefly fly by | 12:34 |
sean-k-mooney | honestly i would just add a second config option for the retry | 12:34 |
sahid | it's now needed to specify a number of retries and i wanted to avoid introducing a new option | 12:35 |
sean-k-mooney | yep but if we do this i think it's just cleaner to add a new option and default to 1 or 3 | 12:35 |
sahid | yes... as it turns out now it's basically what we will have to do | 12:35 |
sean-k-mooney | ya so workaround options are still treated like normal config options so the same rules apply | 12:36 |
sahid | you mean we could hardcode the number of retries instead? | 12:36 |
sean-k-mooney | in this case i would just keep the enable as a bool and add a retry option | 12:37 |
sean-k-mooney | well we could but i'm ok with a config option for the retries | 12:37 |
sean-k-mooney | ill just comment on the patch one sec. | 12:37 |
sahid | so one option to enable, one option to set the number of retries, and one option to specify the interval | 12:37 |
sahid | cool thank you | 12:37 |
sean-k-mooney | yep exactly | 12:38 |
sean-k-mooney | and we can set the retries and interval to whatever we think is a good default | 12:38 |
sahid | ok fair enough :) | 12:39 |
sean-k-mooney | ok done, i was suggesting 1 or 3 because 1 is the current behavior and 3 is what qemu defaults to when it sends them | 12:43 |
kashyap | sean-k-mooney: BTW a small data point on that "mpx" saga: if Nova doesn't break at the first CPU compare in check_cpu_compatibility(), then using "cpu_model_extra_flags=-mpx" works | 13:37 |
kashyap | (That gives a hint too that the first compare is wrong) | 13:38 |
sean-k-mooney | the way it should be working is we should be removing the mpx flag from all the models listed in cpu_models and if any of them pass then we proceed as normal | 13:39 |
sean-k-mooney | so as long as any of the listed models work with the cpu_model_extra_flags option applied then we should boot | 13:40 |
sean-k-mooney | /boot/start the agent/ | 13:40 |
sean-k-mooney | although really if any of them are invalid with that combination we should reject it as an error | 13:41 |
*** dasm|off is now known as dasm | 14:03 | |
opendevreview | Aaron S proposed openstack/nova master: Add further workaround features for qemu_monitor_announce_self https://review.opendev.org/c/openstack/nova/+/867324 | 16:33 |
opendevreview | Artom Lifshitz proposed openstack/nova master: Microversion 2.94: FQDN in hostname https://review.opendev.org/c/openstack/nova/+/869812 | 16:56 |
artom | bauzas, ^^ | 16:56 |
bauzas | artom: ack, will look | 16:57 |
bauzas | damn, who knows the Launchpad nick of Kirill ? /me needs to paperwork the right ownership of https://blueprints.launchpad.net/nova/+spec/ironic-vnc-console | 17:39 |
bauzas | anyway, I can live with that | 17:40 |
bauzas | wow, the numbers of accepted blueprints for Antelope are identical to Yoga | 17:44 |
bauzas | disclaimer: this is gonna be a productive 5 weeks | 17:44 |
sean-k-mooney | ya we have more than we will likely land but we shall see how it goes | 17:53 |
sean-k-mooney | pci in placement is technically complete, we just have some cleanups and a bugfix still waiting to merge | 17:53 |
sean-k-mooney | but the feature is fully merged | 17:53 |
sean-k-mooney | i'm hoping artom's fqdn change, sahid's evacuate change and dansmith's uuid change will merge in the next week | 17:54 |
sean-k-mooney | we will see i guess based on review bandwidth | 17:55 |
opendevreview | Merged openstack/nova master: Follow up for the PCI in placement series https://review.opendev.org/c/openstack/nova/+/855654 | 18:40 |
opendevreview | Merged openstack/nova master: Rename _to_device_spec_conf to _to_list_of_json_str https://review.opendev.org/c/openstack/nova/+/855648 | 19:44 |
*** dasm is now known as dasm|off | 22:37 | |
opendevreview | Merged openstack/nova master: Enable new defaults and scope checks by default https://review.opendev.org/c/openstack/nova/+/866218 | 23:49 |
opendevreview | Merged openstack/nova master: Remove use of removeprefix https://review.opendev.org/c/openstack/nova/+/867788 | 23:49 |
opendevreview | Merged openstack/nova master: Unit test exceptions raised duing live migration monitoring https://review.opendev.org/c/openstack/nova/+/859358 | 23:56 |
opendevreview | Merged openstack/nova master: Reproduce PCI pool filtering bug https://review.opendev.org/c/openstack/nova/+/855649 | 23:56 |
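
The exception-subclass idea discussed around 10:58 for bug 1996732 could look roughly like the sketch below. It is only an illustration under stated assumptions: `BuildError`, `LateAffinityError` and `FakeCompute` are hypothetical stand-ins invented here, not nova's real classes, and the real counter lives in the resource tracker stats linked in the log.

```python
from collections import defaultdict


class BuildError(Exception):
    """Hypothetical stand-in for the existing build-failure exception."""


class LateAffinityError(BuildError):
    """Hypothetical subclass raised when the late affinity check fails."""


class FakeCompute:
    """Toy compute host keeping a keyed 'failed_builds' stat (an assumption)."""

    def __init__(self):
        self.stats = defaultdict(int)
        self.cleaned_up = []

    def build(self, name, fail_with=None):
        try:
            if fail_with is not None:
                raise fail_with(name)
            return True
        except BuildError as exc:
            # Cleanup is keyed on the parent type, so the subclass gets it too.
            self.cleaned_up.append(name)
            # Skip the counter only for the late affinity failure.
            if not isinstance(exc, LateAffinityError):
                self.stats["failed_builds"] += 1
            return False


compute = FakeCompute()
compute.build("inst1")                               # succeeds
compute.build("inst2", fail_with=LateAffinityError)  # cleaned up, not counted
compute.build("inst3", fail_with=BuildError)         # cleaned up and counted
```

A functional test along the lines bauzas describes would then introspect the stats dict after creating the second instance and check that `failed_builds` did not move.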
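
The bool-to-int discussion from 12:11 onward can be illustrated with a small sketch. It uses Python's stdlib `configparser` as a stand-in for nova's real config machinery, and the option names `announce_enabled`, `announce_retries` and `announce_interval` are hypothetical, chosen only for this example. It shows why an existing `True` value cannot simply be re-read as an integer, and how keeping the boolean while adding separate retry and interval options leaves old config files valid.

```python
import configparser

# An existing deployment's config, written when the option was a boolean.
LEGACY_CONF = """
[workarounds]
announce_enabled = True
"""

# A config using the additional (hypothetical) options.
NEW_CONF = """
[workarounds]
announce_enabled = yes
announce_retries = 3
announce_interval = 1
"""


def load(text):
    """Read the enable flag as a bool and the new knobs as ints with defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["workarounds"]
    return {
        "enabled": section.getboolean("announce_enabled", fallback=False),
        # Default of 1 mirrors "1 is the current behavior" from the log.
        "retries": section.getint("announce_retries", fallback=1),
        # The interval default is an arbitrary assumption for this sketch.
        "interval": section.getint("announce_interval", fallback=1),
    }


def reinterpret_as_int(text):
    """Try to read the old boolean option as an int; None if it cannot parse."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    try:
        return parser["workarounds"].getint("announce_enabled")
    except ValueError:
        return None


legacy = load(LEGACY_CONF)
new = load(NEW_CONF)
broken = reinterpret_as_int(LEGACY_CONF)
```

Here the legacy file still loads with the defaults applied, while changing the type of the existing option in place would fail on `True`, which is the upgrade problem raised in the conversation.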