Monday, 2022-07-11

bauzasgood morning Nova08:13
bauzasgibi: I just realized I won't be able to lead our weekly meeting tomorrow and I'll be off until Friday 08:13
bauzasmdbooth is visiting me this week to see two stages of the Tour de France in the Alps08:14
kashyapNice08:14
bauzasnot in Nice :p08:14
kashyapHeh08:15
Ugglabauzas, o/08:18
bauzas\o (left-handed)08:18
Ugglabauzas, you are left handed, me too in fact. :)08:18
bauzasyou'll never see me raising the right hand08:19
gibibauzas: this tuesday I have to log off around ~18:30 CEST so I can start but I need somebody else also on the hook if we run longer than 30 mins08:20
bauzasgibi: we'll see if sean-k-mooney can do it or we'll punt08:21
gibiUggla: would you like to try to run a nova meeting? I can help in the first 30 mins :)08:21
gibibauzas: yeah, or sean-k-mooney 08:21
bauzasheh, that's a good experience indeed08:21
bauzasI can prepare the agenda, I'll be working tomorrow08:22
gibiOK08:22
bauzasit's just that I have to leave early so I don't reach the Lautaret pass too late08:22
gibibauzas: btw, I left +2 on the first patch of unshelve to host, and I have only minor nits in the func test in the second patch, so we could land that feature this week 08:23
bauzasgibi: that's cool, I was about to do it08:23
bauzasbut I'll need to be schizophrenic :p08:23
bauzasI just uploaded my own version of 2.91 :)08:23
bauzasso either I shoot myself in the foot when reviewing Uggla's patch or I leave him angry :D08:24
gibinah, your 2.91 is still WIP, I vote for Uggla's 2.91 :)08:25
bauzasyeah I know08:25
bauzastoday it should no longer be WIP but Uggla has the precedence :)08:25
bauzasthis is just, touching the UTs and functtests of keypairs is really... fuky08:26
bauzasfunky08:26
Ugglabauzas, gibi, I'd like to help with the meeting but I have never done it. Could I be missing the rights to run it?08:28
gibiI guess they were not touched for a long time and gathered some dust08:28
gibiUggla: no need for rights other than assigning you to be chair at the beginning08:28
gibithen the bot will listen to you :)08:28
sean-k-mooneyUggla: bauzas or gibi just add you as a co chair for the meeting once they start it and that will give you the rights08:28
gibi#startmeeting foo08:29
opendevmeetMeeting started Mon Jul 11 08:29:58 2022 UTC and is due to finish in 60 minutes.  The chair is gibi. Information about MeetBot at http://wiki.debian.org/MeetBot.08:29
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.08:29
opendevmeetThe meeting name has been set to 'foo'08:29
gibi#endmeeting08:30
opendevmeetMeeting ended Mon Jul 11 08:30:02 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)08:30
opendevmeetMinutes:        https://meetings.opendev.org/meetings/foo/2022/foo.2022-07-11-08.29.html08:30
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/foo/2022/foo.2022-07-11-08.29.txt08:30
opendevmeetLog:            https://meetings.opendev.org/meetings/foo/2022/foo.2022-07-11-08.29.log.html08:30
gibiheh08:30
gibinot even chair right needed08:30
sean-k-mooneyoh hum i guess i was under the impression that there was a list somewhere of who had permissions to do that08:30
Ugglaany specific commands to know ?08:32
bauzassean-k-mooney: no, anyone can start any meeting with the bot08:32
sean-k-mooney#link #agreed #undo #topic?08:33
bauzassomeone can literally hijack our nova meeting without having problems08:33
sean-k-mooneyi think those are the main ones08:33
bauzasUggla: sec08:33
bauzashttps://docs.releng.linuxfoundation.org/en/latest/meetbot.html08:33
bauzaswe use this bot ^08:33
sean-k-mooneyits in the bot output above too08:33
bauzassean-k-mooney: yeah but the wikipage lacks some actions08:34
bauzasthe one I provided has better documentation08:34
sean-k-mooneybauzas: good to know i had assumed you needed to be an op in the irc channel to use the bot08:34
bauzasno08:34
bauzasI'm not even op in this chan08:35
sean-k-mooneyack08:35
sean-k-mooneydid that change since we moved to oftc or was that always the case08:35
bauzasUggla: if you wanna run the end of the meeting, you'll just have to copy/paste the agenda, that's it08:35
bauzassean-k-mooney: was always the case08:36
sean-k-mooneyi guess it makes sense since it would have been annoying to maintain in the shared meeting rooms08:36
bauzashttps://meetings.opendev.org/meetings/climate/2013/climate.2013-10-28-10.00.log.html08:36
sean-k-mooneyya ok just never had a reason to try so never looked into it08:36
bauzasthat's the first meeting I ran IIRC :)08:37
bauzasand I wasn't op in the chan at that time08:38
Ugglabauzas, if it can help I'm ok.08:38
Ugglabauzas, I'll try to do my best.08:38
bauzasUggla: you'll need high-level skills08:39
bauzas1/ open a webpage08:39
bauzas2/ copy a line08:39
bauzas3/ paste it to the chan08:39
Ugglaok08:41
Ugglabauzas, just to let you know that I have updated my laptop system fw to 0.1.23, and it looks better regarding thermal management and cpu throttling.08:45
bauzasUggla: oh, just one thing, you'll need to remove the starting space when pasting08:45
bauzasthe # char needs to be the first in the line08:45
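Editor's note: the pasting rule above (the `#` must be the first character on the line) can be sketched as a small helper that cleans agenda lines before they are pasted. This is only an illustration; the function name `normalize_agenda_line` and the sample agenda lines are made up, and nothing here is part of MeetBot itself.

```python
def normalize_agenda_line(line: str) -> str:
    """Strip leading whitespace so a meeting command starts with '#'.

    Hypothetical helper: lines that are not commands are returned unchanged.
    """
    stripped = line.lstrip()
    # Only rewrite lines that are bot commands; leave ordinary text alone.
    return stripped if stripped.startswith("#") else line


# Made-up agenda lines, as they might look after copying from a web page.
agenda = [
    "  #topic Bugs",
    " #link https://example.org/agenda",
    "plain chat line",
]

for line in agenda:
    print(normalize_agenda_line(line))
```

Running the lines through such a helper before pasting avoids a command being treated as ordinary chat because of a stray leading space.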
bauzasUggla: yeah, I upgraded too08:46
bauzasnow I'm in performance unconditionally08:46
bauzasbut I'd like to see how I could make it change the mode depending on whether I'm plugged in08:47
Ugglabauzas, I agree it seems we can stick it to performance mode.08:47
sean-k-mooneybauzas: gibi  by the way i just want to highlight this to ye to get some high-level input on the direction https://review.opendev.org/c/openstack/nova/+/845660 i dont often -2 things but i feel like this needs a spec or at least a blueprint and i dont really agree directionally with having config driven api behavior like this when there is nothing the enduser can do about it08:48
sean-k-mooneyi was going to bring it up in the open discussion section tomorrow and either maintain or drop the -2 depending on the outcome but if ye wont be there then can ye leave some feedback on the direction on gerrit08:49
gibisean-k-mooney: do we have an alternative? passing this via flavor extra_spec?08:53
sean-k-mooneywe have several08:55
sean-k-mooneyfirst if we had that config option that would cause all volume attachments to fail on that host08:55
sean-k-mooneyit could instead prevent the agent from starting08:55
bauzasagreed with sean08:56
bauzason both concerns08:56
sean-k-mooneybut we could also model this in the connection info from cinder08:56
bauzas1/ we need a procedural stamp08:56
sean-k-mooneyand they could tell us if its required and we could schedule on it using traits08:56
bauzas2/ we don't want to have endusers wondering why this cloud fails while this other not08:56
gibiI'm +1 on preventing the agent to start08:56
bauzassee ? design solved.08:57
sean-k-mooneynot really08:57
sean-k-mooneythe agent start may or may not work depending on the backends08:57
gibibauzas: in this particular case this cloud fails because it is deployed incorrectly :)08:58
sean-k-mooneyi.e. if you have a mix of ceph and iscsi 08:58
sean-k-mooneyi assume that is why they did not go that route but to me failing a tenant operation due to a misconfiguration of a system by a cloud admin is wrong08:58
sean-k-mooneyi would really like us to treat this like neutron qos and guaranteed bandwidth08:59
sean-k-mooneyi.e. if we intend to enforce or guarantee multipath08:59
sean-k-mooneythen we should make it discoverable to the scheduler via placement08:59
sean-k-mooneyand ideally requestable by the enduser via a property on the volume09:00
sean-k-mooneyotherwise we should keep this best effort and not enforce multipath09:00
gibiI'm not sure that the enduser needs to ask for this explicitly. For me multipath information is an implementation detail of the cloud. So maybe the operator asks for it09:01
sean-k-mooneyif we went with the enforce config option i basically would expect it to check in init_host or one of the subfunctions it calls that multipathd is running09:01
gibithat sounds like a simple solution^^09:02
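Editor's note: the idea discussed above (refuse to start the agent when multipath is enforced but multipathd is not running) could look roughly like the sketch below. It is not nova code: the names `check_multipath`, `MultipathNotRunning`, and the injected `list_processes` callable are all hypothetical, and how a real service would detect the daemon is left open.

```python
from typing import Callable, Iterable


class MultipathNotRunning(RuntimeError):
    """Hypothetical error: multipath is enforced but the daemon is absent."""


def check_multipath(
    enforce: bool,
    list_processes: Callable[[], Iterable[str]],
) -> None:
    """Fail fast at startup instead of failing later volume attachments.

    `list_processes` is an assumed hook returning running process names,
    injected so the check can be exercised without touching the host.
    """
    if not enforce:
        # Best-effort mode: nothing to verify.
        return
    if "multipathd" not in set(list_processes()):
        raise MultipathNotRunning(
            "multipath enforcement is enabled but multipathd is not running"
        )


# Usage with made-up process lists.
check_multipath(True, lambda: ["sshd", "multipathd"])  # passes silently
try:
    check_multipath(True, lambda: ["sshd"])
except MultipathNotRunning as exc:
    print(f"startup aborted: {exc}")
```

The point of checking at startup is the one made in the discussion: a deployment mistake surfaces to the operator immediately rather than as a failed tenant operation.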
sean-k-mooneygibi: well the operator really can only ask for it in two ways09:02
sean-k-mooneythe flavor or some attribute set on the volume_type09:02
gibiyepp, I would say volume_type in this case09:02
sean-k-mooneyright volume_type would be my preference too but we would need to do the same translation we do for neutron ports on our side to make the enforcement work09:03
gibiyepp, so if killing the agent is enough to cover the use case then I would do that 09:03
sean-k-mooneyok i had proposed that more or less in my last top level comment on the review09:04
sean-k-mooneyhttps://review.opendev.org/c/openstack/nova/+/845660/8#message-718400c522ed5bf53306f28a382300335d592f30=09:05
gibiOK, noted my preference there too now09:06
sean-k-mooneythanks09:07
bauzasfolks, I have a question09:47
bauzashttps://6f28c6786b3c06159e6f-ac3ef42d4a9c79eb41cb204944df5803.ssl.cf2.rackcdn.com/849133/1/check/nova-tox-functional-py38/9cf47c1/testr_results.html09:47
bauzastest_rebuild_with_keypair fails because the regression test uses the latest microversion09:47
bauzashttps://github.com/openstack/nova/blob/master/nova/tests/functional/regressions/test_bug_1843708.py#L2609:48
bauzasso, what would get your preference ?09:48
bauzaschanging the microversion used to cap to 2.90, or providing a pubkey ?09:48
bauzasIMHO, the latter09:49
gibiI would provide a pub key09:49
bauzasyeah09:49
bauzasI was thinking this09:49
bauzasthanks09:49
sean-k-mooneyya probably the same i would probably generate a public key or just hardcode a pair in a fixture somewhere that we can reuse09:52
sean-k-mooneythe regressions should be freestanding for the most part so i would do it in the test personally09:53
*** xek__ is now known as xek10:20
opendevreviewGorka Eguileor proposed openstack/nova master: Support os-brick specific lock_path  https://review.opendev.org/c/openstack/nova/+/84932811:21
opendevreviewStephen Finucane proposed openstack/placement master: Fix typo in schema  https://review.opendev.org/c/openstack/placement/+/84934812:09
stephenfingibi: bauzas: sean-k-mooney: I've already sort of made my mind up on this but can you take a look at https://review.opendev.org/c/openstack/placement/+/849348 (context is a change from ratailor at https://review.opendev.org/c/openstack/placement/+/848634)12:12
sean-k-mooney[m]ya we would need a new microversion i believe12:15
sean-k-mooney[m]unless we were failing on a 50012:15
sean-k-mooney[m]because we checked in the code somewhere else12:15
bauzasstephenfin: I can try to take a look once I'm done with my own change12:17
stephenfin sean-k-mooney[m] If I revert the code change, the new test I added doesn't fail. That would suggest we were happily accepting an empty object at the code level (presumably because we do falsey checks or similar)12:18
sean-k-mooney[m]ack12:18
gibi"This field may be sent when writing allocations back to the server but will be ignored; this preserves symmetry between read and write representations."12:18
gibiso this value is never used by placement12:19
gibiI think we don't need a microversion in this case12:19
sean-k-mooney[m]oh its the provider tree mappings12:19
sean-k-mooney[m]to lookup the provider tree from the allocation12:20
gibi(context, the mappings are generated by placement during GET allocation_candidates for the client to know which groups are fulfilled from which RP, so writing this back to placement only makes sense for keeping the a_c result directly usable for POST allocations12:20
gibi)12:20
sean-k-mooney[m]ack12:20
sean-k-mooney[m]so  we likely can fix it but we should have the release note to call it out12:21
sean-k-mooney[m]which stephenfin has in their patch12:21
gibiyepp, so I'm +212:26
*** damiandabrowski[m] is now known as damiandabrowski12:45
*** dasm|off is now known as dasm13:51
* bauzas is about to become crazy15:22
bauzashttps://paste.opendev.org/show/bR65i5QPAN7ieYOEJHF4/ fails because the pubkey doesn't re.match()15:22
bauzasbut... the pubkey is exactly good15:22
bauzaswtf, /me is wrapping his head15:24
sean-k-mooneymy guess is maybe there are some spaces or other whitespace15:27
sean-k-mooneyi would try removing the ^ and $15:28
bauzasI can't remove those15:29
bauzassean-k-mooney: https://github.com/openstack/nova/blob/master/nova/tests/functional/api_samples_test_base.py#L248-L25115:30
bauzasbefore adding the ^ and $ for regexp, I verified the strings15:30
bauzashttps://paste.opendev.org/show/bixG1pb6FdkuMcZNYBqK/ shows the strings matching before we add the trailing chars15:32
bauzasoh, I think I spotted it15:46
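Editor's note: the log does not say what was spotted. One common way a string can fail to `re.match()` a pattern built from itself is unescaped regex metacharacters, which base64-encoded public keys often contain (`+`, for instance). The sketch below only illustrates that general pitfall with a made-up key; it is an assumption, not a diagnosis of the failure discussed here.

```python
import re

# Made-up key material, chosen to contain '+' which is a regex quantifier.
pubkey = "ssh-ed25519 AAAA+b/c== user@host"

# Anchoring the raw string turns '+' into "one or more of the previous char".
pattern_raw = "^" + pubkey + "$"
# re.escape() makes every metacharacter match literally.
pattern_escaped = "^" + re.escape(pubkey) + "$"

raw_matches = re.match(pattern_raw, pubkey) is not None
escaped_matches = re.match(pattern_escaped, pubkey) is not None

print(f"raw pattern matches:     {raw_matches}")
print(f"escaped pattern matches: {escaped_matches}")
```

Two strings can compare equal with `==` and still not match each other as pattern and subject, which fits the "the pubkey is exactly good" symptom in general terms.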
gibi-/1615:54
ygk_12345Hi all. we have a wallaby setup. the compute services are not starting, saying "too old compute version 30". But we have deleted that version from the nova.services db table. But still it is picking up version 30 which we cant find. How to resolve this?15:54
ygk_12345bauzas: any idea about this issue ?15:55
ygk_12345i tried pdb inside nova/cmd/compute.py   file and could see that current_version is getting value 30 15:56
opendevreviewSylvain Bauza proposed openstack/nova master: WIP: api: Drop generating a keypair and add special chars to naming  https://review.opendev.org/c/openstack/nova/+/84913316:12
bauzasstill a WIP due to UTs missing ^16:12
sean-k-mooneyygk_12345: in addition to deleting the old compute services you will have to restart the conductors and possibly the other controller services16:17
sean-k-mooneyygk_12345: they cache the min compute version on start up of the service16:17
opendevreviewSylvain Bauza proposed openstack/nova master: WIP: api: Drop generating a keypair and add special chars to naming  https://review.opendev.org/c/openstack/nova/+/84913316:18
ygk_12345sean-k-mooney: is this in addition to the db entries deletion ?16:18
sean-k-mooneyyes so you will need to delete the old compute service records assuming those hosts no longer exist16:18
sean-k-mooneyand if you have already tried to start the conductor etc. then you will have to restart it16:18
sean-k-mooneyygk_12345: you should only delete the old entries16:19
sean-k-mooneyif those hosts are gone16:19
sean-k-mooneyand will not be coming back16:19
sean-k-mooneyygk_12345: there is a workaround for this if you are fast forward upgrading16:19
sean-k-mooneybut if you use that you need to be aware that we dont support configurations where the min compute service version is not met16:20
sean-k-mooneyhttps://docs.openstack.org/nova/latest/configuration/config.html#workarounds.disable_compute_service_check_for_ffu16:20
ygk_12345sean-k-mooney: when  restarting the nova-compute service, do they cache any db entries as well ?16:20
sean-k-mooneyrestarting the nova-compute service would clear any caching it had since it only caches info in memory16:21
sean-k-mooneywe dont cache stuff on disk so there is nothing that would need to be cleaned up manually16:21
ygk_12345sean-k-mooney: but from where is it picking up the 30 version , even though its not there in the db ? maybe from the memory ?16:23
sean-k-mooneyeither the binary you are starting is not at the min version or its coming from the db via an rpc to the conductor to get the min service version in the cell16:24
ygk_12345sean-k-mooney: but the nova.services table doesn't have any trace of version 3016:25
ygk_12345sean-k-mooney: also this workaround , is it in control plane nova.conf or in compute nova.conf ?16:25
sean-k-mooneyygk_12345: have you recently done an upgrade to wallaby or are you in the process of an upgrade?16:25
sean-k-mooneyygk_12345: the workaround needs to be set on all hosts16:25
ygk_12345sean-k-mooney: we have upgraded only the control plane16:25
sean-k-mooneyfrom what release16:26
ygk_12345sean-k-mooney: and we are upgrading computes step by step16:26
ygk_12345sean-k-mooney: OSA 23.2.016:26
sean-k-mooneywhat openstack release does that map to16:26
ygk_12345wallaby16:26
sean-k-mooneyso 23 is wallaby16:27
ygk_12345yes16:27
sean-k-mooneyand you are coming from victoria?16:27
ygk_12345yes ussuri->victoria->Wallaby16:27
sean-k-mooneyok so you upgraded the controllers to victoria then upgraded all the computes16:27
sean-k-mooneythen upgraded controller to wallaby16:27
sean-k-mooneythen upgraded the computes16:28
sean-k-mooneyyou cant upgrade the controllers directly from ussuri to wallaby in one go16:28
ygk_12345no no. upgraded first the control plane from stein->train->ussuri>vic>wallaby. Then upgraded the computes16:28
sean-k-mooneyok so you are doing a fast forward upgrade16:29
sean-k-mooneyor skip-level depending on the branding16:29
sean-k-mooneythat is not supported by nova directly16:29
sean-k-mooneyso you will need to disable our validation with the workaround16:29
sean-k-mooneythen upgrade all the computes to a supported version16:29
sean-k-mooneythen re-enable the check16:29
ygk_12345so the version 30 is being picked up by cache somewhere ?16:30
sean-k-mooneynova and most services only support an n to n+1 version delta16:30
sean-k-mooneyunlikely16:30
sean-k-mooneyits more likely that you are trying to start a stein compute with a wallaby controller16:31
ygk_12345yes 16:31
sean-k-mooneyright that is not supported16:31
ygk_12345but stein is not 30 version16:31
sean-k-mooneyvictoria should be version 3016:31
ygk_12345i deleted older versions containers from nova.services tables completely16:32
sean-k-mooneythat is not a good thing16:32
sean-k-mooneyif you have instances on the cloud that would break placement16:32
ygk_12345we have successfully upgraded in other platforms16:33
ygk_12345sean-k-mooney: i will try restarting all the nova services16:35
ygk_12345sean-k-mooney:  thanks bro for your time. appreciate it.....16:36
sean-k-mooneythe compute service version in stein was 37 by the way https://github.com/openstack/nova/blob/stable/stein/nova/objects/service.py#L3416:36
ygk_12345yes16:36
ygk_12345not 3016:36
sean-k-mooney30 was queens16:36
ygk_12345yes16:36
sean-k-mooneyand the minimum version supported by a wallaby controller is 52/victoria16:37
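Editor's note: the startup check being described compares the lowest compute service version known to the deployment against the oldest version the running release supports. A minimal sketch follows, using only the numbers quoted in this conversation (30 for queens, 37 for stein, 52 as the minimum for a wallaby controller). The function and exception names are hypothetical and this is not nova's actual implementation.

```python
from typing import Iterable, Optional

# Value quoted in the chat as the minimum for a wallaby controller.
OLDEST_SUPPORTED_VERSION = 52


class TooOldComputeService(Exception):
    """Hypothetical error mirroring the 'too old compute version' failure."""


def check_min_compute_version(
    service_versions: Iterable[int],
    oldest_supported: int = OLDEST_SUPPORTED_VERSION,
) -> Optional[int]:
    """Return the minimum version found, or raise if it is unsupported.

    `service_versions` stands in for the versions recorded for compute
    services; an empty collection means there is nothing to check.
    """
    versions = list(service_versions)
    if not versions:
        return None
    current_min = min(versions)
    if current_min < oldest_supported:
        raise TooOldComputeService(
            f"too old compute version {current_min}, "
            f"oldest supported is {oldest_supported}"
        )
    return current_min


# A single stale record at version 30 is enough to trip the check.
try:
    check_min_compute_version([52, 30, 52])
except TooOldComputeService as exc:
    print(exc)
```

Because the check uses the minimum across all records, one leftover entry anywhere the service looks is sufficient to block startup, which is why the discussion keeps returning to where the version 30 record might live.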
ygk_12345we are totally putting a new os image with wallaby nova-compute version, retaining the vms from the older compute while rebooting16:38
ygk_12345the /var/lib/nova/instances is a separate partition16:38
sean-k-mooneyso the conductor and other controller services should not be able to start without bringing the computes to victoria first if you are not using the workaround currently16:39
sean-k-mooneyi dont know where the queens service is coming from if you have deleted the services in the db16:40
ygk_12345that's what's strange16:40
sean-k-mooneybut the only place in code it could come from on the compute node is the nova package16:40
sean-k-mooneyimplying you have queens code. otherwise it has to be coming from the db16:40
ygk_12345i think its db cache in the memory . i will try a control plane nova services restart and check16:42
sean-k-mooneyygk_12345: i would expect this code to prevent the conductor from starting https://github.com/openstack/nova/blob/b320f16b851fd1e5238c0b49c780f6a9c6851e48/nova/utils.py#L1053-L1100=16:43
ygk_12345sean-k-mooney: yes exactly. that part of the  code is giving 'current_service_version' to 30 during the pdb16:44
ygk_12345i cant understand from where it is picking it up16:44
sean-k-mooneyhave you checked both the api and cell db16:44
ygk_12345which tables in api and cell ?16:45
sean-k-mooneythe service table in the cell db16:46
ygk_12345it is empty16:46
sean-k-mooneyyou dont have any entries in cell0 or cell116:47
ygk_12345yes there are.. but only nova.services table has entries16:47
sean-k-mooneyi assume nova.service is cell0?16:48
ygk_12345no16:48
sean-k-mooneyok its cell116:48
sean-k-mooneythe naming depends on the deployment too16:48
ygk_12345just the plain nova db . inside it 'services' table16:48
sean-k-mooneyright so that db name "nova" depends on your deployment too16:48
ygk_12345i see nova.cell0 and placement dbs 16:48
ygk_12345and also nova and nova_api16:49
sean-k-mooneyok then its probably the cell 1 database16:49
ygk_12345is there a services table inside it as well ?16:49
sean-k-mooneyya so there should be a service table in every cell database16:49
sean-k-mooneybut you should not delete the services 16:49
ygk_12345ok16:50
sean-k-mooneyif you do it will break your deployment16:50
ygk_12345yes got it16:50
sean-k-mooneyif you delete the services then the compute agent will create a new service entry when it starts16:50
ygk_12345so i will just retry with  nova control plane restart16:50
sean-k-mooneythat will result in a different service uuid and break placement16:50
sean-k-mooneysince the hostname uuid pair will fail the unique constraint16:50
ygk_12345ok16:50
sean-k-mooneytry restarting the control plane containers but i expect them to fail16:51
sean-k-mooneywithout the workaround option set16:51
sean-k-mooneysince your compute service versions should be <5616:51
ygk_12345its working for our other setups perfectly16:51
sean-k-mooneysince you have not started them16:51
ygk_12345anyway let me try and will bother you later16:52
sean-k-mooneyygk_12345: what you are currently trying is not something we expect to work upstream so if you are able to do it elsewhere you must have the workaround enabled or a downstream patch16:52
sean-k-mooneyok16:52
ygk_12345sean-k-mooney: thanks16:53
mnaserdoes nova like to take cpu flag related decisions or is that not a direction nova wants to take anymore19:53
mnaseri've got 100% identical cpus in terms of make/model that fail to live migrate, and it seems like this is because some of them have `tsx-ctrl` and `taa-no` and others dont.19:53
mnaserthose two are apparently found in cpu MSR and not in cpuid so they're not visible (see here https://www.qemu.org/docs/master/system/qemu-cpu-models.html?highlight=taa#important-cpu-features-for-intel-x86-hosts )19:54
mnasershould we warn on it?  should we just disable it?  it seems like "Same cpu model" isn't really even valid anymore lol19:55
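Editor's note: the "should we warn on it?" question can be illustrated with a plain set comparison of CPU feature flags between two hosts. The sketch below is hypothetical: the function name and the flag sets are made up (only `tsx-ctrl` and `taa-no` come from the message above), and it does not reflect how nova or libvirt actually compare CPUs.

```python
from typing import Iterable, List


def missing_on_destination(
    source_flags: Iterable[str],
    dest_flags: Iterable[str],
) -> List[str]:
    """Return flags exposed on the source host but absent on the destination.

    A guest started with these flags could not keep them after migrating,
    so a non-empty result is a reason to warn.
    """
    return sorted(set(source_flags) - set(dest_flags))


# Made-up flag sets for two hosts with the same CPU make and model.
source = {"avx2", "tsx-ctrl", "taa-no"}
destination = {"avx2"}

missing = missing_on_destination(source, destination)
if missing:
    print("warning: destination lacks CPU flags:", ", ".join(missing))
```

The comparison is deliberately one-directional: extra flags on the destination are harmless for this purpose, while flags missing there are what would break a guest that already sees them.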
*** dasm is now known as dasm|off22:18
TheJuliaIt hasn't really felt valid for a long time, to me... but I had a 6 week order lag and ran into something super similar ~12-13 years ago.22:32
