*** takashin has joined #openstack-nova | 00:07 | |
*** tetsuro has joined #openstack-nova | 00:15 | |
*** med_ has joined #openstack-nova | 00:16 | |
*** markvoelker has quit IRC | 00:17 | |
*** markvoelker has joined #openstack-nova | 00:17 | |
*** hshiina has joined #openstack-nova | 00:18 | |
*** markvoelker has quit IRC | 00:21 | |
*** gyee has quit IRC | 00:29 | |
*** erlon has quit IRC | 00:35 | |
openstackgerrit | Merged openstack/nova stable/rocky: Delete instance_id_mappings record in instance_destroy https://review.openstack.org/604373 | 00:56 |
*** macza has joined #openstack-nova | 00:59 | |
*** macza has quit IRC | 01:03 | |
*** mhen has quit IRC | 01:05 | |
*** mhen has joined #openstack-nova | 01:07 | |
openstackgerrit | Merged openstack/nova stable/rocky: Fix stacktraces with redis caching backend https://review.openstack.org/606895 | 01:09 |
openstackgerrit | Merged openstack/nova stable/rocky: Null out instance.availability_zone on shelve offload https://review.openstack.org/606086 | 01:09 |
*** hamzy has joined #openstack-nova | 01:12 | |
openstackgerrit | Merged openstack/nova stable/rocky: XenAPI/Stops the migration of volume backed VHDS https://review.openstack.org/604203 | 01:13 |
openstackgerrit | Merged openstack/nova stable/pike: Follow devstack-plugin-ceph job rename https://review.openstack.org/602022 | 01:14 |
openstackgerrit | Merged openstack/nova stable/pike: Fix unit test modifying global state https://review.openstack.org/584592 | 01:14 |
*** mdbooth has quit IRC | 01:18 | |
*** mdbooth has joined #openstack-nova | 01:18 | |
*** mrsoul has joined #openstack-nova | 01:19 | |
*** bhagyashris has joined #openstack-nova | 01:21 | |
*** mdbooth has quit IRC | 01:21 | |
*** mdbooth has joined #openstack-nova | 01:22 | |
*** mdbooth has quit IRC | 01:25 | |
*** mdbooth has joined #openstack-nova | 01:25 | |
*** moshele has joined #openstack-nova | 01:26 | |
*** tetsuro has quit IRC | 01:29 | |
*** med_ has quit IRC | 01:30 | |
*** mdbooth has quit IRC | 01:30 | |
*** mdbooth has joined #openstack-nova | 01:31 | |
*** Dinesh_Bhor has joined #openstack-nova | 01:31 | |
*** Dinesh_Bhor has quit IRC | 01:38 | |
*** Dinesh_Bhor has joined #openstack-nova | 01:47 | |
*** tetsuro has joined #openstack-nova | 01:49 | |
*** dave-mccowan has joined #openstack-nova | 01:56 | |
*** tinwood has quit IRC | 02:10 | |
*** tinwood has joined #openstack-nova | 02:11 | |
*** markvoelker has joined #openstack-nova | 02:18 | |
*** cfriesen has quit IRC | 02:28 | |
*** moshele has quit IRC | 02:30 | |
openstackgerrit | Tetsuro Nakamura proposed openstack/nova stable/rocky: Fix aggregate members in nested alloc candidates https://review.openstack.org/607454 | 02:36 |
*** med_ has joined #openstack-nova | 02:50 | |
*** markvoelker has quit IRC | 02:51 | |
*** Dinesh_Bhor has quit IRC | 03:00 | |
*** Dinesh_Bhor has joined #openstack-nova | 03:01 | |
*** dave-mccowan has quit IRC | 03:08 | |
*** med_ has quit IRC | 03:37 | |
*** med_ has joined #openstack-nova | 03:40 | |
*** hongbin has joined #openstack-nova | 03:41 | |
*** markvoelker has joined #openstack-nova | 03:48 | |
*** penick has quit IRC | 03:52 | |
*** hongbin has quit IRC | 03:59 | |
*** Dinesh_Bhor has quit IRC | 04:01 | |
*** Dinesh_Bhor has joined #openstack-nova | 04:05 | |
*** Dinesh_Bhor has quit IRC | 04:20 | |
*** med_ has quit IRC | 04:21 | |
mnaser | has anyone seen behaviour (on queens) where, doing pci passthrough for gpus, the qemu-kvm process comes up but seems to be spinning at 100% cpu and the process is stuck? | 04:21 |
*** markvoelker has quit IRC | 04:21 | |
mnaser | strace just shows a bunch of ioctl's and poll's | 04:21 |
mnaser | nothing weird in dmesg or journals | 04:22 |
mnaser | nova side looks ok.. "Final resource view: name=<snip> phys_ram=524194MB used_ram=66560MB phys_disk=1863GB used_disk=225GB total_vcpus=48 used_vcpus=6 pci_stats=[PciDevicePool(count=7,numa_node=0,product_id='102d',tags={dev_type='type-PCI'},vendor_id='10de')]" | 04:23 |
*** macza has joined #openstack-nova | 04:26 | |
*** macza has quit IRC | 04:30 | |
mnaser | Hmm, I have some leads. I might push some doc changes for the pci passthrough | 04:45 |
mnaser | Dunno if it’s okay to have docs that are more hardware specific | 04:45 |
mnaser | Something along the lines of https://github.com/dholt/kvm-gpu/blob/master/README.md | 04:45 |
*** Dinesh_Bhor has joined #openstack-nova | 04:58 | |
*** Bhujay has joined #openstack-nova | 05:01 | |
*** Bhujay has quit IRC | 05:01 | |
*** Bhujay has joined #openstack-nova | 05:02 | |
*** markvoelker has joined #openstack-nova | 05:18 | |
takashin | cd /tmp | 05:22 |
*** udesale has joined #openstack-nova | 05:41 | |
*** ratailor has joined #openstack-nova | 05:43 | |
*** markvoelker has quit IRC | 05:52 | |
*** moshele has joined #openstack-nova | 06:00 | |
gmann | nova API office hour time | 06:00 |
gmann | #startmeeting nova api | 06:01 |
openstack | Meeting started Wed Oct 3 06:01:34 2018 UTC and is due to finish in 60 minutes. The chair is gmann. Information about MeetBot at http://wiki.debian.org/MeetBot. | 06:01 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 06:01 |
*** openstack changes topic to " (Meeting topic: nova api)" | 06:01 | |
openstack | The meeting name has been set to 'nova_api' | 06:01 |
gmann | PING List: gmann, alex_xu | 06:01 |
gmann | hanging around for some time in case anyone has a query related to the API | 06:08 |
*** vivsoni has joined #openstack-nova | 06:10 | |
*** Dinesh_Bhor has quit IRC | 06:11 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/nova stable/rocky: Imported Translations from Zanata https://review.openstack.org/604260 | 06:14 |
*** dims has quit IRC | 06:24 | |
*** dims has joined #openstack-nova | 06:26 | |
gmann | let's close office hour. | 06:32 |
gmann | #endmeeting | 06:32 |
*** openstack changes topic to "Current runways: use-nested-allocation-candidates -- This channel is for Nova development. For support of Nova deployments, please use #openstack." | 06:32 | |
openstack | Meeting ended Wed Oct 3 06:32:39 2018 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 06:32 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/nova_api/2018/nova_api.2018-10-03-06.01.html | 06:32 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/nova_api/2018/nova_api.2018-10-03-06.01.txt | 06:32 |
openstack | Log: http://eavesdrop.openstack.org/meetings/nova_api/2018/nova_api.2018-10-03-06.01.log.html | 06:32 |
*** dims has quit IRC | 06:34 | |
*** Dinesh_Bhor has joined #openstack-nova | 06:35 | |
*** vivsoni has quit IRC | 06:35 | |
*** dims has joined #openstack-nova | 06:35 | |
openstackgerrit | Merged openstack/nova stable/ocata: Update RequestSpec.flavor on resize_revert https://review.openstack.org/605880 | 06:37 |
*** dpawlik has joined #openstack-nova | 06:39 | |
openstackgerrit | Merged openstack/python-novaclient master: Fix up userdata argument to rebuild. https://review.openstack.org/605341 | 06:47 |
*** pcaruana has joined #openstack-nova | 06:51 | |
*** vivsoni has joined #openstack-nova | 06:53 | |
*** sahid has joined #openstack-nova | 06:54 | |
bauzas | good morning nova | 07:10 |
*** jding1_ has joined #openstack-nova | 07:10 | |
*** aarents has joined #openstack-nova | 07:17 | |
*** Dinesh_Bhor has quit IRC | 07:19 | |
*** jroll has quit IRC | 07:19 | |
*** logan- has quit IRC | 07:19 | |
*** jackding has quit IRC | 07:19 | |
*** ajo has quit IRC | 07:19 | |
*** Gorian has quit IRC | 07:19 | |
*** odyssey4me has quit IRC | 07:19 | |
*** jcosmao has quit IRC | 07:19 | |
*** vdrok has quit IRC | 07:19 | |
*** amotoki has quit IRC | 07:23 | |
gibi | morning | 07:28 |
*** helenafm has joined #openstack-nova | 07:29 | |
*** jroll has joined #openstack-nova | 07:32 | |
*** logan- has joined #openstack-nova | 07:32 | |
*** ajo has joined #openstack-nova | 07:32 | |
*** vdrok has joined #openstack-nova | 07:32 | |
*** jcosmao has joined #openstack-nova | 07:32 | |
*** odyssey4me has joined #openstack-nova | 07:32 | |
*** skatsaounis has quit IRC | 07:36 | |
*** tssurya has joined #openstack-nova | 07:38 | |
*** Sigyn has quit IRC | 07:40 | |
*** alexchadin has joined #openstack-nova | 07:40 | |
*** Sigyn has joined #openstack-nova | 07:40 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Use provider tree in virt FakeDriver https://review.openstack.org/604083 | 07:41 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Refactor allocation checking in functional tests https://review.openstack.org/607287 | 07:41 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Run ServerMovingTests with nested resources https://review.openstack.org/604084 | 07:41 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Ignore forcing of live migration for nested instance https://review.openstack.org/605785 | 07:41 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Consider nested allocations during allocation cleanup https://review.openstack.org/606050 | 07:41 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Ignore forcing of evacuation for nested instance https://review.openstack.org/606111 | 07:41 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Run negative server moving tests with nested RPs https://review.openstack.org/604125 | 07:41 |
*** jpena|off is now known as jpena | 07:51 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: consumer gen: support claim_resources https://review.openstack.org/583667 | 07:52 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Enable nested allocation candidates in scheduler https://review.openstack.org/585672 | 07:52 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Use provider tree in virt FakeDriver https://review.openstack.org/604083 | 07:52 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Refactor allocation checking in functional tests https://review.openstack.org/607287 | 07:52 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Run ServerMovingTests with nested resources https://review.openstack.org/604084 | 07:52 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Ignore forcing of live migration for nested instance https://review.openstack.org/605785 | 07:52 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Consider nested allocations during allocation cleanup https://review.openstack.org/606050 | 07:52 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Ignore forcing of evacuation for nested instance https://review.openstack.org/606111 | 07:52 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Run negative server moving tests with nested RPs https://review.openstack.org/604125 | 07:52 |
*** tetsuro has quit IRC | 07:54 | |
*** ralonsoh has joined #openstack-nova | 07:55 | |
*** rcernin has quit IRC | 07:55 | |
*** takashin has left #openstack-nova | 08:02 | |
*** vivsoni has quit IRC | 08:07 | |
*** jroll has quit IRC | 08:13 | |
*** logan- has quit IRC | 08:13 | |
*** ajo has quit IRC | 08:13 | |
*** odyssey4me has quit IRC | 08:13 | |
*** jcosmao has quit IRC | 08:13 | |
*** vdrok has quit IRC | 08:13 | |
*** markvoelker has joined #openstack-nova | 08:18 | |
*** vivsoni has joined #openstack-nova | 08:18 | |
*** alexchadin has quit IRC | 08:21 | |
*** jroll has joined #openstack-nova | 08:27 | |
*** logan- has joined #openstack-nova | 08:27 | |
*** ajo has joined #openstack-nova | 08:27 | |
*** vdrok has joined #openstack-nova | 08:27 | |
*** jcosmao has joined #openstack-nova | 08:27 | |
*** odyssey4me has joined #openstack-nova | 08:27 | |
*** amotoki_ has joined #openstack-nova | 08:27 | |
*** skatsaounis has joined #openstack-nova | 08:28 | |
*** cdent has joined #openstack-nova | 08:31 | |
*** derekh has joined #openstack-nova | 08:33 | |
ralonsoh | stephenfin: https://review.openstack.org/#/c/476612/36/vif_plug_ovs/ovsdb/ovsdb_lib.py@83. I don't understand this | 08:47 |
ralonsoh | stephenfin: do you mean I need to move this function... where? | 08:47 |
*** ttsiouts has joined #openstack-nova | 08:51 | |
*** jroll has quit IRC | 08:51 | |
*** logan- has quit IRC | 08:51 | |
*** ajo has quit IRC | 08:51 | |
*** odyssey4me has quit IRC | 08:51 | |
*** jcosmao has quit IRC | 08:51 | |
*** vdrok has quit IRC | 08:51 | |
*** markvoelker has quit IRC | 08:51 | |
*** ttsiouts has quit IRC | 08:53 | |
*** ttsiouts has joined #openstack-nova | 08:53 | |
openstackgerrit | Rodolfo Alonso Hernandez proposed openstack/os-vif master: Add abstract OVSDB API https://review.openstack.org/476612 | 08:54 |
*** derekh has quit IRC | 08:58 | |
*** ttsiouts has quit IRC | 08:58 | |
* stephenfin looks | 08:58 | |
*** ttsiouts has joined #openstack-nova | 09:00 | |
stephenfin | ralonsoh: Sorry, yeah, I mean move that above 'create_ovs_vif_port' or below 'delete_ovs_vif_port', so that 'create_', 'update_' and 'delete_' are grouped together | 09:02 |
stephenfin | ralonsoh: It's a nit though. Don't worry about it | 09:02 |
ralonsoh | stephenfin: done! | 09:02 |
stephenfin | Oh, perfect :) | 09:03 |
stephenfin | I'll review that now | 09:03 |
*** jroll has joined #openstack-nova | 09:05 | |
*** logan- has joined #openstack-nova | 09:05 | |
*** ajo has joined #openstack-nova | 09:05 | |
*** vdrok has joined #openstack-nova | 09:05 | |
*** jcosmao has joined #openstack-nova | 09:05 | |
*** odyssey4me has joined #openstack-nova | 09:05 | |
*** ttsiouts has quit IRC | 09:07 | |
*** ttsiouts has joined #openstack-nova | 09:07 | |
*** ttsiouts_ has joined #openstack-nova | 09:10 | |
*** ttsiouts has quit IRC | 09:12 | |
*** alexchadin has joined #openstack-nova | 09:13 | |
*** priteau has joined #openstack-nova | 09:13 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: doc: Rewrite the console doc https://review.openstack.org/606148 | 09:38 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: doc: Add minimal documentation for RDP consoles https://review.openstack.org/606992 | 09:38 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: doc: Add minimal documentation for MKS consoles https://review.openstack.org/606993 | 09:38 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: conf: Allow 'nova-xvpvncproxy' to be called with CLI args https://review.openstack.org/606929 | 09:38 |
*** alexchadin has quit IRC | 09:46 | |
*** mdbooth has joined #openstack-nova | 09:46 | |
bauzas | does anyone know how to unstack/stack devstack with OFFLINE=True for a single project ? | 09:47 |
bauzas | I mean, I have all my stack but I just want to reinstall nova from master | 09:47 |
sean-k-mooney | bauzas: unstack, then manually git pull/checkout master on the nova repo, set OFFLINE=True, and stack | 09:49 |
sean-k-mooney | devstack won't update any of the projects and will just use what is available | 09:50 |
sean-k-mooney | if you are unlucky you might need to pip install requirements from master if they have changed, but that is rarely required | 09:50 |
bauzas | sean-k-mooney: that was my plan but I had the assumption that it would redeploy *all* projects from ENABLED_SERVICES | 09:51 |
sean-k-mooney | bauzas: it will | 09:51 |
bauzas | sean-k-mooney: my point is that I just want to init_nova() honestly | 09:51 |
sean-k-mooney | is that an issue | 09:51 |
sean-k-mooney | oh well dont unstack then | 09:51 |
bauzas | sean-k-mooney: my need is just that now that I reshaped some inventories, I just want to reshape back | 09:51 |
sean-k-mooney | just check out the branch you want and run sudo systemctl restart devstack@n-* | 09:52 |
sean-k-mooney | if the reshape is done by nova-manage, change restart to stop, then run nova-manage and then start | 09:53 |
sean-k-mooney | as long as there are no schema migrations between your current branch and the one you're going to, and no requirements changes, you don't need to restack to change commits. just git checkout and restart service X | 09:55 |
cdent | yeah, that's what I was going to suggest | 09:58 |
*** bhagyashris has quit IRC | 10:04 | |
*** ttsiouts_ has quit IRC | 10:07 | |
stephenfin | bauzas: Could you look at pushing https://review.openstack.org/#/c/456572/ through? | 10:08 |
*** kukacz has quit IRC | 10:12 | |
*** kukacz has joined #openstack-nova | 10:13 | |
bauzas | sean-k-mooney: the problem is that the inventories are reshaped | 10:17 |
bauzas | sean-k-mooney: so a DB sync won't work, right? | 10:17 |
bauzas | because all the tables are there | 10:17 |
sean-k-mooney | bauzas: you can always drop the tables and then sync i guess | 10:18 |
bauzas | if I'm dropping the tables, I miss eg. https://github.com/openstack-dev/devstack/blob/master/lib/nova#L722-L723 | 10:19 |
bauzas | sean-k-mooney: ^ | 10:19 |
sean-k-mooney | i don't think there is any magical way to reshape back without deleting the reshaped RPs and restarting nova-compute | 10:20 |
sean-k-mooney | that is what i meant by dropping the tables | 10:20 |
bauzas | I think I'll just unstack/stack for this time, and snapshot the DB | 10:21 |
*** derekh has joined #openstack-nova | 10:22 | |
bauzas | so, when I want to go backwards, I'll just use the dumpfile | 10:22 |
sean-k-mooney | ya, that would work, but other than for local testing, is there a reason you are trying to downgrade? | 10:22 |
bauzas | sean-k-mooney: no, just testing indeed | 10:22 |
sean-k-mooney | if you have never used it https://www.heidisql.com/ is a great little tool for working with dbs | 10:23 |
bauzas | well | 10:23 |
sean-k-mooney | you need to run it under wine however | 10:23 |
bauzas | just for what I want, a mysqldump is enough | 10:24 |
*** udesale has quit IRC | 10:26 | |
sean-k-mooney | the one thing that annoys me more than the fact that we use an 80-character line length is that we configure pep8 on specs for 79 characters | 10:36 |
*** med_ has joined #openstack-nova | 10:41 | |
openstackgerrit | sean mooney proposed openstack/nova-specs master: Add spec for sriov live migration https://review.openstack.org/605116 | 10:43 |
*** mdbooth has joined #openstack-nova | 10:47 | |
*** mvkr has quit IRC | 10:48 | |
*** markvoelker has joined #openstack-nova | 10:49 | |
*** hshiina has quit IRC | 10:55 | |
stephenfin | That moment of panic where you submit a review and spot a group of comments on older patchsets from the corner of your eye | 10:57 |
stephenfin | What *did* Stephen of June 2018 have to say about this... | 10:57 |
*** ttsiouts has joined #openstack-nova | 11:02 | |
*** mvkr has joined #openstack-nova | 11:02 | |
*** erlon has joined #openstack-nova | 11:03 | |
*** moshele has quit IRC | 11:03 | |
*** amotoki_ is now known as amotoki | 11:07 | |
mdbooth | stephenfin: Fancy a bash at this one: https://review.openstack.org/#/c/605436/ I bitch about python3 in it ;) | 11:08 |
openstackgerrit | sean mooney proposed openstack/nova-specs master: Add spec for sriov live migration https://review.openstack.org/605116 | 11:09 |
*** s10 has joined #openstack-nova | 11:11 | |
*** markvoelker has quit IRC | 11:22 | |
*** ratailor has quit IRC | 11:37 | |
*** macza has joined #openstack-nova | 11:37 | |
*** jpena is now known as jpena|lunch | 11:38 | |
sean-k-mooney | mdbooth: one basic question regarding https://review.openstack.org/#/c/605436/5. the compute manager runs in the compute agent, which is single-threaded but uses eventlet, so there is no parallelism but there is concurrency. so the lock you are acquiring is the monkey-patched greenthread lock. is the reason we need the lock in the first place the fact that we are doing db io, and eventlet causes us to yield and | 11:39 |
sean-k-mooney | allow another invocation of the function to start concurrently, which races? | 11:39 |
mdbooth | sean-k-mooney: Without going into details of locking, I find it's safest to ignore eventlets entirely when considering locking. | 11:40 |
mdbooth | When you tie yourself in knots trying to second guess a scheduler, you make lots of mistakes. | 11:41 |
sean-k-mooney | mdbooth: if we did not have eventlets in this case we would not need to lock | 11:41 |
mdbooth | It has multiple threads of execution. | 11:41 |
mdbooth | I don't care how we achieve that. | 11:41 |
sean-k-mooney | the compute manager is executed from the compute agent, right? which does not have workers, so only one thread | 11:42 |
mdbooth | Eventlets or python's 'threading' all have the same issues. | 11:42 |
*** macza has quit IRC | 11:42 | |
sean-k-mooney | mdbooth: yes, but my point is eventlet introduced the concurrency, and therefore we need a lock to be correct | 11:43 |
mdbooth | sean-k-mooney: We have concurrency. | 11:43 |
sean-k-mooney | if we did not have eventlets the previous code would have been correct, because it would have been single-threaded | 11:43 |
sean-k-mooney | mdbooth: yep i know | 11:44 |
mdbooth | sean-k-mooney: Sure. We do have concurrency, though. It uses eventlets. | 11:44 |
sean-k-mooney | just making sure i understand the subtleties of the patch. this is an example of why i hate eventlets: it hides concurrency | 11:44 |
mdbooth | sean-k-mooney: It doesn't really by the time you get into the compute manager. | 11:44 |
mdbooth | You just ignore eventlets entirely and assume you have multiple concurrent threads. How they're implemented isn't all that important in that code. | 11:45 |
mdbooth | If we later switched to a multi-threaded model, it would still be fine. | 11:45 |
sean-k-mooney | mdbooth: well, if you were new to nova or did not think about it at the time, then you can write races because of eventlets more easily than if it was explicitly threaded | 11:46 |
mdbooth | Honestly, I never consider eventlets. I assume it's explicitly threaded. | 11:46 |
mdbooth | It's not, but that doesn't have any bearing on writing safe code, except when there's bugs in eventlet. | 11:47 |
sean-k-mooney | mdbooth: most new people i have talked to that work on openstack don't think about threads at all, because they say "oh, it's python and that has a gil, so i don't need to care" | 11:47 |
* mdbooth would tell anybody new to Nova to ignore eventlets and assume it's multi-threaded. | 11:47 | |
mdbooth | sean-k-mooney: That is just one of many issues with python in the real world :( | 11:48 |
mdbooth | Also, the gil doesn't prevent overlapping threads of execution, it just stops them running at the same time. The problems are the same. | 11:48 |
sean-k-mooney | mdbooth: yes and no. their perception is normally correct. it's eventlets that violates the paradigm | 11:49 |
mdbooth | As an old curmudgeon, I think python has been extremely detrimental to software engineering, particular in education | 11:49 |
mdbooth | No, it would not be correct. If you have a multi-threaded python program, even though the gil prevents multiple threads running concurrently, you still need locks. | 11:50 |
sean-k-mooney | mdbooth: i taught myself c++ as my first language, then java, so ya, i like to understand exactly what is going on | 11:50 |
mdbooth | Basically: eventlets or python-multithreading: I don't care. It shouldn't change how your write code. | 11:51 |
sean-k-mooney | learning c++ first made me a better engineer than learning python would have. | 11:51 |
mdbooth | They both use fake threading. | 11:52 |
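The locking point being argued above can be sketched stand-alone. The following is a toy illustration only (plain `threading` and a dict counter, not eventlet or nova's compute manager, so every name in it is invented for the example) of why a read-modify-write that hits a scheduling point mid-way needs a lock regardless of the threading model:

```python
import threading
import time

counter = {"value": 0}
lock = threading.Lock()

def safe_increment():
    # Hold the lock across the whole read-modify-write. The sleep(0)
    # stands in for a blocking DB/RPC call that would let another
    # thread (or greenthread) interleave; with the lock held, that
    # interleaving is harmless whether the concurrency comes from OS
    # threads or from eventlet's cooperative scheduler.
    with lock:
        v = counter["value"]
        time.sleep(0)  # forced scheduling point inside the critical section
        counter["value"] = v + 1

threads = [threading.Thread(target=safe_increment) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All 50 increments land; no lost updates.
print(counter["value"])
```

Removing the `with lock:` line gives the racy shape under discussion: the GIL does not help, because it only serializes individual bytecodes, not the whole read-modify-write.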
openstackgerrit | Matthew Booth proposed openstack/nova master: Run evacuate tests with local/lvm and shared/rbd storage https://review.openstack.org/604400 | 11:53 |
mdbooth | I think ^^^ might fix that weird issue I was hitting | 11:53 |
mdbooth | There's a shortcut in service_is_up if the service is forced down, and we forced it down | 11:53 |
sean-k-mooney | mdbooth: yes, i know: if it was explicitly multi-threaded it would be incorrect without the lock. anyway, cool, i'll take a look at that too | 11:54 |
mdbooth | sean-k-mooney: Don't worry about ^^^ btw. Will just wait until zuul has voted. | 11:55 |
mdbooth | sean-k-mooney: I knew I was missing something simple there. | 11:55 |
sean-k-mooney | wait, why was it forced down? | 11:55 |
sean-k-mooney | oh so you could evacuate | 11:56 |
mdbooth | The test_evacuate.sh script which ran before the second round... | 11:56 |
mdbooth | yeah | 11:56 |
sean-k-mooney | ha, yet another race, this time between tests :) | 11:57 |
nicolasbock | Morning | 12:00 |
nicolasbock | I have a run-away server, i.e. a VM that's running on a hypervisor different than what Nova thinks. So far I haven't quite been able to get the placement service to help me update Nova's view of reality... | 12:01 |
nicolasbock | I can see the server with 'openstack server show 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd' | 12:02 |
nicolasbock | And it lists the wrong hypervisor | 12:02 |
mdbooth | sean-k-mooney: $ git grep test\.nested | wc -l | 12:02 |
mdbooth | 348 | 12:02 |
mdbooth | I'm not moving that ;) | 12:02 |
mdbooth | sean-k-mooney: Feel free to write a follow-up patch | 12:03 |
nicolasbock | I can also check that with 'openstack resource provider allocation show 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd' | 12:03 |
*** udesale has joined #openstack-nova | 12:03 | |
sean-k-mooney | mdbooth: ok i would have just done nested=common.nested in test.py | 12:03 |
nicolasbock | The VM is really running on 6cbb84b0-02f4-4ee3-9df2-151475b1effe | 12:04 |
mdbooth | sean-k-mooney: Sure. I don't want to mess with test.nested here, though. | 12:04 |
nicolasbock | But `openstack resource provider allocation set --allocation rp=6cbb84b0-02f4-4ee3-9df2-151475b1effe 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd` is not working | 12:04 |
sean-k-mooney | mdbooth: ok, i'll submit a follow-up patch. eventually... i have added it to my whiteboard | 12:05 |
nicolasbock | I suppose I am missing a `resource-class-name` | 12:05 |
*** ttsiouts has quit IRC | 12:05 | |
nicolasbock | But what do I put there? | 12:05 |
*** ttsiouts has joined #openstack-nova | 12:07 | |
openstackgerrit | Elod Illes proposed openstack/nova stable/ocata: Fix the help for the disk_weight_multiplier option https://review.openstack.org/607537 | 12:08 |
*** ttsiouts has quit IRC | 12:11 | |
sean-k-mooney | nicolasbock: was the vm migrated | 12:11 |
nicolasbock | Yes sean-k-mooney | 12:12 |
sean-k-mooney | mdbooth: was looking at an edge case recently where, if cleanup on the source fails, we don't update the vm host | 12:12 |
sean-k-mooney | nicolasbock: e.g. when you finish migrating the instance, if the post-migrate job on the source node fails to, say, unplug a vif, we fail before we update the instance db record to reflect that the vm is running on the new node | 12:14 |
sean-k-mooney | mdbooth: did you ever propose a patch for ^ | 12:14 |
mdbooth | sean-k-mooney: Probably. | 12:15 |
*** moshele has joined #openstack-nova | 12:15 | |
sean-k-mooney | nicolasbock: are the placement allocations currently associated with the vm correct for the host it is actually on? | 12:17 |
sean-k-mooney | e.g. if you ignore the db host value and actually check the vm's location | 12:17 |
mdbooth | sean-k-mooney: Trying to parse your comment here: https://review.openstack.org/#/c/605436/5/nova/compute/manager.py@1447 | 12:18 |
mdbooth | Are you saying we can release the lock there? | 12:18 |
mdbooth | Or yield the context manager there? | 12:18 |
mdbooth | Or something else, because neither of ^^^ would be correct. | 12:18 |
nicolasbock | The VM is running on `6cbb84b0-02f4-4ee3-9df2-151475b1effe` | 12:18 |
nicolasbock | But placement says that it's on `57b0e4d5-3a3e-4cf3-ba8c-b88c8ce4679b` | 12:19 |
sean-k-mooney | mdbooth: im saying that if we don't have a lock when we invoke that line, the db request could cause us to yield, causing a race | 12:19 |
mdbooth | Ah, *eventlet* yield | 12:19 |
mdbooth | Ok. | 12:19 |
sean-k-mooney | e.g. this is the thing that definitely needs to be in the critical section of the lock | 12:19 |
nicolasbock | I ran `openstack server show 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd` | 12:20 |
*** dave-mccowan has joined #openstack-nova | 12:20 | |
nicolasbock | and `openstack resource provider allocation show 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd` | 12:20 |
sean-k-mooney | mdbooth: ya re reading that was not clear | 12:21 |
sean-k-mooney | nicolasbock: is 57b0e4d5-3a3e-4cf3-ba8c-b88c8ce4679b the source or destination of the migration | 12:21 |
sean-k-mooney | nicolasbock: im assume the destination correct? | 12:22 |
sean-k-mooney | sorry source | 12:22 |
nicolasbock | I don't know what happened, but I would guess that it is the source | 12:23 |
nicolasbock | Sorry, I am not sure I completely grasp the terminology of source and destination | 12:23 |
sean-k-mooney | ya, so the resources being used on 6cbb84b0-02f4-4ee3-9df2-151475b1effe are likely still owned by the migration object in placement | 12:23 |
nicolasbock | What's the migration object? | 12:24 |
*** med_ has quit IRC | 12:25 | |
sean-k-mooney | when you do a migration we create a migration record that we use to claim resources on the destination host. then, when the vm is moved, we use a special atomic operation in the placement api to change the allocation consumer from the migration record's uuid to the vm's uuid | 12:26 |
nicolasbock | So you are saying that that atomic operation wasn't executed? | 12:27 |
sean-k-mooney | yes | 12:27 |
nicolasbock | Ok | 12:27 |
nicolasbock | Can I get it to execute? | 12:27 |
sean-k-mooney | so if you do nova server-migration-list 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd does it have a migration object listed | 12:28 |
nicolasbock | No | 12:28 |
sean-k-mooney | oh hum strange. perhaps the migration has already been confirmed. | 12:30 |
sean-k-mooney | nicolasbock: efried might be able to help better than i if he is around | 12:31 |
nicolasbock | So I thought that `openstack resource provider allocation set` would allow me to update the DB | 12:31 |
*** ttsiouts has joined #openstack-nova | 12:31 | |
nicolasbock | Thanks sean-k-mooney ! | 12:31 |
nicolasbock | But I am not using that command correctly since it complains about an incorrect 'allocation string format' | 12:32 |
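For what it's worth, the osc-placement `--allocation` value needs resource classes as well as the provider, in the form `rp=<rp_uuid>,<RESOURCE_CLASS>=<amount>` (e.g. `VCPU=2,MEMORY_MB=4096`). Under the hood this becomes a PUT to placement's `/allocations/{consumer_uuid}`. A sketch of building that request body follows; the resource amounts and the project/user ids are illustrative placeholders, not values from this deployment:

```python
# Sketch: body for PUT /allocations/{consumer_uuid} in the placement
# API (microversion >= 1.12 dict form). All inputs below are
# illustrative assumptions, not taken from the log.

def build_allocation_body(rp_uuid, resources, project_id, user_id):
    """Allocations body putting all of the consumer's usage on rp_uuid."""
    return {
        "allocations": {
            rp_uuid: {"resources": dict(resources)},
        },
        # project_id and user_id are required in this form of the call.
        "project_id": project_id,
        "user_id": user_id,
    }

body = build_allocation_body(
    "6cbb84b0-02f4-4ee3-9df2-151475b1effe",         # host the VM really runs on
    {"VCPU": 2, "MEMORY_MB": 4096, "DISK_GB": 20},  # should match the flavor
    project_id="example-project-id",                # placeholder
    user_id="example-user-id",                      # placeholder
)
print(body["allocations"])
```

Replacing an allocation this way only moves the consumer's usage in placement; it does not touch the instance's host field in the nova DB, which is a separate record.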
*** ttsiouts has quit IRC | 12:33 | |
*** ttsiouts has joined #openstack-nova | 12:34 | |
openstackgerrit | Vlad Gusev proposed openstack/nova stable/pike: libvirt: Use os.stat and os.path.getsize for RAW disk inspection https://review.openstack.org/607544 | 12:36 |
openstackgerrit | Matthew Booth proposed openstack/nova master: DNM: Run against mriedem's evacuate test https://review.openstack.org/604423 | 12:37 |
*** jpena|lunch is now known as jpena | 12:41 | |
mdbooth | This is an interesting query: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22AssertionError%3A%20u'host3'%20%3D%3D%20u'host3'%5C%22 | 12:44 |
mdbooth | I wonder why that started spiking only a few days ago: code change, or infra change? | 12:44 |
*** moshele has quit IRC | 12:53 | |
*** artom has quit IRC | 12:56 | |
stephenfin | lyarwood: RE: https://review.openstack.org/588570 I'd been putting it off but will do that now. Will keep you posted | 13:03 |
lyarwood | stephenfin: cheers | 13:04 |
*** udesale has quit IRC | 13:05 | |
s10 | How can I reopen bug https://bugs.launchpad.net/nova/+bug/1209101 ? It still exists. | 13:06 |
openstack | Launchpad bug 1209101 in OpenStack Compute (nova) "Non-public flavor cannot be used in created tenant" [High,Fix released] - Assigned to Sumanth Nagadavalli (sumanth-nagadavalli) | 13:06 |
sean-k-mooney | s10: just change status and leave a comment with details | 13:07 |
s10 | sean-k-mooney: I've left comment, but I can't change status, all of them are grey. | 13:08 |
sean-k-mooney | s10: that said, i don't think it's necessarily the same bug; that was fixed in 2013 | 13:08 |
sean-k-mooney | its more likely a regression | 13:08 |
sean-k-mooney | are you seeing this on master? | 13:08 |
s10 | sean-k-mooney: Yes, this regression has never been fixed. I will write a comment about how to reproduce it. | 13:09 |
sean-k-mooney | well, it was fixed and then reverted, so this likely needs to be treated not just as a bug fix but rather as a blueprint/spec | 13:10 |
sean-k-mooney | the original bug fix predates microversions, so i think a mini spec + microversion bump would be required to alter the api behavior | 13:13 |
sean-k-mooney | bauzas: is ^ correct | 13:13 |
bauzas | context ? | 13:14 |
sean-k-mooney | private flavors are automatically exposed to new tenants | 13:14 |
sean-k-mooney | because https://bugs.launchpad.net/nova/+bug/1209101 was reverted | 13:15 |
openstack | Launchpad bug 1209101 in OpenStack Compute (nova) "Non-public flavor cannot be used in created tenant" [High,Fix released] - Assigned to Sumanth Nagadavalli (sumanth-nagadavalli) | 13:15 |
sean-k-mooney | sorry are not automatically exposed | 13:15 |
sean-k-mooney | bauzas: so context is: if we want to change the behavior of the api to auto-grant access to the private flavor, that would require a spec rather than being just a bug fix, right, as it's an api change? | 13:18 |
*** liuyulong has joined #openstack-nova | 13:20 | |
bauzas | sean-k-mooney: IIUC, I'd tend to say yes, as it's a behavioural change | 13:20 |
bauzas | we don't really consider the fact that we don't show private flavors a "bug" | 13:20 |
bauzas | some people would also like to keep this behaviour I guess | 13:20 |
bauzas | and last but not least, two OpenStack clouds could behave differently for the same request and list of flavors, which is bad for interop | 13:21 |
bauzas | HTH | 13:21 |
*** mvkr has quit IRC | 13:23 | |
*** eharney has joined #openstack-nova | 13:23 | |
*** artom has joined #openstack-nova | 13:30 | |
sean-k-mooney | s10: i'm just having lunch, but based on bauzas' confirmation, rather than reopen the bug i would suggest you file a nova spec. if you don't have time to do that i can see if i can do it later today, but you have more context than i do as to what you wanted to achieve | 13:34 |
*** mriedem has joined #openstack-nova | 13:37 | |
*** cfriesen has joined #openstack-nova | 13:39 | |
efried | nicolasbock: Still around? | 13:39 |
efried | nicolasbock: mriedem would be a better source of CLI syntax help, but I can tell you what the API call itself would need to look like. | 13:40 |
mnaser | does anyone know if CERN does pci passthrough on centos or ubuntu? | 13:41 |
dansmith | I thought they were all centos | 13:42 |
mnaser | i'm trying to figure out why pci passthrough isn't working | 13:43 |
mnaser | everything nova side is working ok, but the newly spawned qemu-kvm process is stuck spinning at 100% cpu, no console logs, libvirt qemu logs show nothing.. | 13:43 |
mnaser | so i'm a bit at a loss, not really sure where to go next | 13:44 |
s10 | sean-k-mooney: Basically, what I want to achieve is to be able to provide tenants the ability to manage private flavors which they created. | 13:44 |
s10 | sean-k-mooney: But this will be more complicated than automatic flavor access add after private flavor creation... | 13:46 |
sean-k-mooney | s10: ok, if that is the usecase then it should be relatively simple to capture in a spec. i'll write up a spec. | 13:46 |
sean-k-mooney | s10: oh, why should we not just allow the tenant that created the flavor to have access to it automatically? | 13:47 |
sean-k-mooney | for interop reasons we will need a microversion bump, but that is just a mechanical side effect, not germane to the feature | 13:48 |
*** mvkr has joined #openstack-nova | 13:50 | |
*** awaugama has joined #openstack-nova | 13:52 | |
s10 | sean-k-mooney: I want them to have access to created flavor automatically because there is no RBAC in nova for private flavors. Flavors don't have an owner. I can't only allow tenants to run flavor-access-add for "their" flavors :( | 13:53 |
*** tbachman has joined #openstack-nova | 13:56 | |
mnaser | ou | 13:58 |
mnaser | i wonder if this has to do with using lvm local storage | 13:58 |
mnaser | and the qemu user not being able to access it | 13:58 |
mnaser | hmm | 13:58 |
mnaser | yup, it can't access it | 13:59 |
mnaser | would it be a responsibility of nova to setup permissions of volumes under lvm so that the qemu user can read it? | 14:00 |
*** maciejjozefczyk has quit IRC | 14:01 | |
*** rpittau_ has joined #openstack-nova | 14:02 | |
mdbooth | I just ran a git bisect to try to find out why the incidence of test_parallel_evacuate_with_server_group failure went from occasional to almost 50% in the last few days, and the culprit is my patch from the other day: https://review.openstack.org/#/c/604859/ | 14:02 |
mdbooth | I still consider this a feature :) | 14:03 |
*** rpittau has quit IRC | 14:03 | |
*** munimeha1 has joined #openstack-nova | 14:05 | |
*** Bhujay has quit IRC | 14:06 | |
*** hoangcx has quit IRC | 14:06 | |
*** Bhujay has joined #openstack-nova | 14:07 | |
mriedem | dansmith: did you know there was a cinder_img_volume_type metadata key which can be used to create a bootable volume with a specific volume type? https://docs.openstack.org/cinder/latest/cli/cli-manage-volumes.html#volume-types | 14:14 |
mriedem | so technically people today could boot from volume from an image with that metadata and get what they wanted instead of passing a volume type to nova - clunky i know | 14:15 |
dansmith | on the image | 14:15 |
mriedem | yeah, | 14:15 |
dansmith | I did not know that, no | 14:15 |
mriedem | likely also means that we need to make a decision in the compute API if the user specifies a volume type and the source image has that metadata key/value, which do we pick? or do we 400? | 14:15 |
mriedem | probably need to know what cinder does in that same case | 14:16 |
mriedem | smcginnis: do you know off the top of your head? ^ | 14:16 |
*** Bhujay has quit IRC | 14:16 | |
dansmith | seems to me like if they ask for something on the boot request, that always wins | 14:17 |
dansmith | like, we have a default type, and there can be a default for an image, | 14:17 |
smcginnis | If you explicitly provide a type, I believe we will give that priority over the image property. | 14:17 |
dansmith | but if they ask for something specific at the time, I would expect they want the one they asked for | 14:17 |
mriedem | yeah that's what i'd expect too | 14:17 |
smcginnis | So fallback can be to have the volume type stuffed in the image properties, but that should not change the primary usage of someone saying specifically what they want. | 14:17 |
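The precedence the discussion converges on can be sketched as a tiny helper. This is an illustration of the intent only, not actual nova or cinder code; the function name and signature are hypothetical:

```python
# Sketch (not real nova/cinder code) of the volume type precedence
# discussed above: an explicitly requested type always wins, then the
# image's cinder_img_volume_type property, then the backend default.

def pick_volume_type(requested, image_properties, default_type):
    if requested:
        return requested
    return image_properties.get("cinder_img_volume_type", default_type)

print(pick_volume_type("fast-ssd", {"cinder_img_volume_type": "ceph"}, "lvm"))
print(pick_volume_type(None, {"cinder_img_volume_type": "ceph"}, "lvm"))
print(pick_volume_type(None, {}, "lvm"))
```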
*** mlavalle has joined #openstack-nova | 14:19 | |
* bauzas facepalms when he looks at https://stackoverflow.com/questions/11941817/how-to-avoid-runtimeerror-dictionary-changed-size-during-iteration-error | 14:20 |
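For context, that RuntimeError comes from mutating a dict while iterating over it; a minimal reproduction and the usual fix (iterating over a snapshot of the keys):

```python
# Python 3 raises RuntimeError if a dict changes size mid-iteration.
d = {"a": 1, "b": 0, "c": 2}
try:
    for key in d:
        if d[key] == 0:
            del d[key]  # mutates the dict while iterating it
except RuntimeError as exc:
    print(exc)  # dictionary changed size during iteration

# The fix: iterate over a snapshot of the keys instead.
d = {"a": 1, "b": 0, "c": 2}
for key in list(d):
    if d[key] == 0:
        del d[key]
print(d)  # {'a': 1, 'c': 2}
```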
mnaser | yup. that was the issue | 14:20 |
mnaser | if you use lvm on centos with nova, the volumes are created under user 'root' | 14:20 |
mnaser | so qemu process cant touch them and it cant boot | 14:21 |
mnaser | is this technically a nova bug? | 14:21 |
mnaser | (aka the devices on the system /dev/vg_foo/vmuuid_disk are root:root, qemu-kvm runs on qemu:qemu) | 14:21 |
*** hoangcx has joined #openstack-nova | 14:22 | |
nicolasbock | Hi efried I am still here | 14:23 |
efried | nicolasbock: So looking at the manual (https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-allocation-set) it appears as though you're going to want multiple --allocation params... | 14:25 |
s10 | mriedem: Can we remove line https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L3669 ? It causes bug https://bugs.launchpad.net/nova/+bug/1746972 . I don't believe an error in set-admin-password on a running instance should put the instance in an error state and require cloud admin intervention to reset the instance's state, or that the user should abandon and remove the vm. | 14:25 |
openstack | Launchpad bug 1746972 in OpenStack Compute (nova) "After setting the password failed, the VM state is set to error" [Undecided,Confirmed] | 14:25 |
nicolasbock | Yes, that's my reading too efried | 14:25 |
nicolasbock | But I don't understand what the other parameters should look like | 14:26 |
efried | nicolasbock: I *think* each should look like --allocation rp=$rp_uuid,$rc=$amount | 14:26 |
dansmith | mnaser: what do you want nova to do? chown them? afaik, it doesn't know what user qemu will run as | 14:26 |
openstackgerrit | Merged openstack/nova stable/pike: nova-status - don't count deleted compute_nodes https://review.openstack.org/604788 | 14:26 |
nicolasbock | Ok. What I don't get is why I need a resource-class in there as well. I don't want anything to update in terms of resource classes | 14:27 |
efried | nicolasbock: Oh, but you do :) | 14:27 |
nicolasbock | I do? | 14:27 |
*** dpawlik has quit IRC | 14:27 | |
efried | nicolasbock: I guess it's obvious to me because I know what the REST payload looks like, but come to think of it, it makes sense how you're thinking about it. | 14:27 |
nicolasbock | Maybe I am not looking at resource classes correctly | 14:28 |
efried | nicolasbock: See, the allocations in the API are a hierarchical structure like resource provider => resource class => amount | 14:28 |
nicolasbock | But the way I am thinking about them is that they specify things like memory and CPU cores | 14:28 |
nicolasbock | Ok | 14:28 |
efried | And also the CLI (and the API it's using) is designed to fully *replace* allocations, not like edit pieces of them. | 14:28 |
efried | So given that... | 14:29 |
efried | nicolasbock: You should do `openstack resource provider allocation show $instance_uuid` | 14:29 |
efried | Which should give you allocations in three-ish resource classes | 14:29 |
efried | nicolasbock: can you pastebin me that output? | 14:29 |
dansmith | you know what would be awesome | 14:29 |
dansmith | openstack resource provider allocation edit <uuid> | 14:30 |
nicolasbock | https://pastebin.com/sFdzQuPE | 14:30 |
dansmith | like virsh edit | 14:30 |
*** ttsiouts has quit IRC | 14:30 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: doc: fix and clarify --block-device usage in user docs https://review.openstack.org/607589 | 14:30 |
sean-k-mooney | dansmith: that could be done as a client only feature but yes that would be nice | 14:31 |
dansmith | sean-k-mooney: obviously client-only | 14:31 |
sean-k-mooney | well i was debating if you would want to put it in the openstack sdk or just the osc plugin | 14:31 |
sean-k-mooney | but ya i think just in the plugin | 14:32 |
dansmith | oh, I meant just in the plugin | 14:32 |
dansmith | yeah | 14:32 |
dansmith | and you could translate to/from yaml for the actual editing maybe | 14:32 |
dansmith | so people aren't having to hand-edit json | 14:32 |
dansmith | since you need to validate the schema before you send it back anyway | 14:32 |
mnaser | dansmith: yeah, that's why i don't think it's a nova problem but maybe something that we should document.. or libvirt should | 14:33 |
sean-k-mooney | sounds like a nice low hanging fruit bug | 14:33 |
efried | nicolasbock: Okay, so I think you're going to want to build your command with: | 14:33 |
efried | --allocation rp=6cbb84b0-02f4-4ee3-9df2-151475b1effe,MEMORY_MB=8192 \ | 14:33 |
efried | --allocation rp=6cbb84b0-02f4-4ee3-9df2-151475b1effe,VCPU=4 \ | 14:33 |
efried | --allocation rp=6cbb84b0-02f4-4ee3-9df2-151475b1effe,DISK_GB=80 | 14:33 |
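For reference, those three --allocation params map onto one hierarchical body for placement's PUT /allocations/{consumer_uuid}, which is why they must all go in a single command. A minimal sketch of that structure, assuming the dict-keyed-by-provider form used from microversion 1.12 on (real requests also carry project_id/user_id, and consumer_generation in later microversions):

```python
# Sketch of the allocation structure efried describes:
# resource provider -> resource class -> amount.
# The PUT body replaces the consumer's allocations wholesale.

def build_allocations(per_provider):
    """per_provider: {rp_uuid: {resource_class: amount}}"""
    return {
        "allocations": {
            rp_uuid: {"resources": dict(resources)}
            for rp_uuid, resources in per_provider.items()
        }
    }

rp = "6cbb84b0-02f4-4ee3-9df2-151475b1effe"
payload = build_allocations(
    {rp: {"MEMORY_MB": 8192, "VCPU": 4, "DISK_GB": 80}})
print(payload["allocations"][rp]["resources"]["VCPU"])  # 4
```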
mriedem | ha | 14:33 |
mriedem | "InstancePasswordSetFailed: Failed to set admin password on | 14:33 |
mriedem | 9f9330c2-4ab4-45f1-a9f9-2770dd34cf30 because error setting admin password" | 14:33 |
nicolasbock | Ah ok | 14:34 |
mriedem | "we failed because we failed" | 14:34 |
efried | mriedem: duh | 14:34 |
nicolasbock | Let me try that | 14:34 |
mriedem | s10: i don't know why the instance is put into ERROR state there, i want to say i've seen a patch to remove that | 14:34 |
efried | nicolasbock: Note that's gotta be all in one command. Otherwise you'll end up with an instance with just disk :) | 14:34 |
s10 | mriedem: yes, I see, there is https://review.openstack.org/#/c/555160/ | 14:35 |
nicolasbock | Good point efried :) | 14:35 |
mriedem | efried: nicolasbock: might be sensible to have an "openstack resource provider allocation class set" similar to the inventory class set CLI https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-inventory-class-set | 14:36 |
nicolasbock | So the command worked | 14:36 |
mriedem | ^ allows you to set inventory on a provider for a specific class, not replace the entire set of inventory for the provider | 14:36 |
nicolasbock | But now I have https://pastebin.com/KrcWAXbF | 14:36 |
mriedem | that uses https://developer.openstack.org/api-ref/placement/#update-resource-provider-inventory | 14:36 |
sean-k-mooney | efried: that is probably another reason to have an edit command, since this all needs to be done atomically | 14:36 |
mriedem | we don't have an api like that for allocations, which is why there isn't a CLI for it | 14:37 |
mriedem | we just have https://developer.openstack.org/api-ref/placement/#update-allocations | 14:37 |
openstackgerrit | Merged openstack/nova stable/rocky: Ignore VirtDriverNotReady in _sync_power_states periodic task https://review.openstack.org/605533 | 14:37 |
mriedem | but we could easily write a command that just updates one of the resource classes within the existing allocations | 14:37 |
nicolasbock | Yes that sounds sensible mriedem | 14:37 |
efried | nicolasbock: Oh, interesting. That's... probably a bug. | 14:37 |
nicolasbock | :) | 14:38 |
*** ivve has joined #openstack-nova | 14:38 | |
mriedem | i very much doubt osc-placement handles consumer generations yet, so it could be racy for the CLI to orchestrate this | 14:38 |
nicolasbock | I should remove the old allocation, right? | 14:38 |
mriedem | but that's probably a low risk | 14:38 |
efried | nicolasbock: Yeah, except the only way to do that is openstack resource provider allocation delete $instance_uuid which (I sincerely hope) removes all of them. | 14:38 |
efried | nicolasbock: actually what may have happened is that the source host still thinks it has the instance, and it "healed" the allocations. | 14:39 |
nicolasbock | All of them? | 14:39 |
efried | That would be something to look in the logs for. | 14:39 |
nicolasbock | Ok | 14:39 |
mriedem | do you have ocata computes? | 14:39 |
nicolasbock | But if it removes all of them, wouldn't that be bad? | 14:39 |
efried | nicolasbock: Well, if you remove all of them, then you can run your 'set' command to restore the proper ones. | 14:40 |
efried | But | 14:40 |
mriedem | if you have ocata computes, the resource tracker is reporting the allocations it thinks exist to placement | 14:40 |
efried | if my suspicion is correct, once you delete all the allocations and wait a minute, the original (source) allocations will magically reappear. | 14:40 |
*** ttsiouts has joined #openstack-nova | 14:40 | |
efried | okay, so mriedem that would explain the source allocs magically reappearing? | 14:41 |
nicolasbock | mriedem: This is using Newton | 14:42 |
nicolasbock | I'll try to delete the allocation | 14:43 |
nicolasbock | And wait to see what happens :) | 14:43 |
mriedem | newton/ocata computes will recreate allocations yes | 14:43 |
mriedem | until you get everything upgraded to >= pike, the resource tracker periodic task in the compute service will try to manage allocations | 14:44 |
nicolasbock | The new allocation was deleted while we were chatting | 14:44 |
nicolasbock | Interesting mriedem | 14:44 |
nicolasbock | But where is the periodic task getting its information from? | 14:45 |
mriedem | the instances it thinks are running on that host, | 14:46 |
mriedem | and those instances flavors | 14:46 |
nicolasbock | Is there a way to update that? | 14:46 |
mriedem | so if compute host A thinks instance B is running on it with a flavor that uses x,y,z vcpu/ram/disk, it's going to report that | 14:46 |
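A toy model (not nova's actual resource tracker code; names here are illustrative) of the pre-Pike behaviour mriedem describes, which is why deleted allocations keep reappearing while instances.host is stale:

```python
# Toy sketch: each period, a compute host reports allocations for every
# instance the DB says lives on it, derived from the instance's flavor.
# A stale instances.host therefore recreates allocations you deleted.

def report_allocations(instances, this_host):
    reported = {}
    for inst in instances:
        if inst["host"] != this_host:
            continue
        flavor = inst["flavor"]
        reported[inst["uuid"]] = {
            "VCPU": flavor["vcpus"],
            "MEMORY_MB": flavor["ram"],
            "DISK_GB": flavor["disk"],
        }
    return reported

instances = [
    # guest actually runs on host-b, but the DB still says host-a
    {"uuid": "b-1", "host": "host-a",
     "flavor": {"vcpus": 4, "ram": 8192, "disk": 80}},
]
print(report_allocations(instances, "host-a"))  # host-a still claims b-1
```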
mriedem | update what? | 14:47 |
nicolasbock | So I would have to convince the compute host that it's not running the instance? | 14:47 |
efried | nicolasbock: I kind of missed how we got into this situation. What makes you think the instance was successfully removed from the source host? | 14:47 |
mriedem | nicolasbock: is the instance.host in the db pointing at that host? | 14:47 |
nicolasbock | I am going by what `openstack server show` is telling me :) | 14:47 |
mriedem | server show should also tell you yeah | 14:48 |
openstackgerrit | Matthew Booth proposed openstack/nova master: Run evacuate tests with local/lvm and shared/rbd storage https://review.openstack.org/604400 | 14:48 |
*** slaweq has quit IRC | 14:48 | |
nicolasbock | So `server show` is reporting an incorrect hypervisor | 14:49 |
mriedem | this is where the RT gets the instances it thinks are running on it https://github.com/openstack/nova/blob/newton-eol/nova/compute/resource_tracker.py#L556 | 14:49 |
openstackgerrit | Vlad Gusev proposed openstack/nova master: Not instance to ERROR if set_admin_password failed https://review.openstack.org/555160 | 14:49 |
sean-k-mooney | mriedem: there is a live migration edge case that mdbooth was looking at a few weeks ago where post-migrate on the source failed and we would not update the host the vm was running on | 14:50 |
sean-k-mooney | but the vm had actually been moved correctly | 14:50 |
mriedem | nicolasbock: so did you live migrate this vm or something? why is nova reporting its on the wrong host? | 14:52 |
mdbooth | Ah, yes. I do recall a bug with that. If we get an error in cleanup on the source host, called *post* successful migration, we then rollback the migration and put the instance in an error state, but it's still running fine on the destination. | 14:54 |
mdbooth | So, e.g. if you get an error in terminate_connection or whatever, you get in this state | 14:54 |
mdbooth | And you can't clean it up, because instance.host is pointing to the source, but it's actually running on the dest | 14:54 |
mriedem | terminate_connection as in cleaning up source node volume attachments and such right? | 14:55 |
mriedem | same with ports i'm sure | 14:55 |
mriedem | post live migration cleaning up the source | 14:55 |
mdbooth | mriedem: Right. Any cleanup on the source | 14:55 |
mriedem | we should just catch and log cleanup failures | 14:55 |
openstackgerrit | Vlad Gusev proposed openstack/nova master: Not set instance to ERROR if set_admin_password failed https://review.openstack.org/555160 | 14:55 |
*** slaweq has joined #openstack-nova | 14:56 | |
mdbooth | mriedem: Right. The error in my view was that we put the instance in an error state, when the instance was fine. We should put the migration in an error state, but leave the instance alone. | 14:57 |
mdbooth | And also do as much cleanup as possible in the presence of errors. | 14:57 |
nicolasbock | mriedem: Yes I think that's what happened | 14:57 |
openstackgerrit | Jan Gutter proposed openstack/nova-specs master: Spec to implement vRouter HW offloads https://review.openstack.org/567148 | 15:07 |
openstackgerrit | Jan Gutter proposed openstack/nova-specs master: Spec to implement generic HW offloads for os-vif https://review.openstack.org/607610 | 15:07 |
*** jangutter has joined #openstack-nova | 15:07 | |
openstackgerrit | Claudiu Belu proposed openstack/nova master: tests: autospecs all the mock.patch usages https://review.openstack.org/470775 | 15:09 |
*** eharney has quit IRC | 15:09 | |
melwitt | . | 15:10 |
mriedem | s10: commented in that patch | 15:12 |
mriedem | nicolasbock: so live migration was successful but something failed in post like mdbooth is mentioning | 15:12 |
nicolasbock | Ok | 15:12 |
mriedem | nicolasbock: you'll likely need to manually update the instances.host value in the db then for that instance | 15:12 |
mriedem | otherwise nova-compute on the source host is going to continue thinking it owns the instance | 15:13 |
s10 | mriedem: thank you | 15:14 |
nicolasbock | Ok, other than that this sounds mildly scary, could you give me a pointer where I find that value mriedem ? | 15:14 |
mriedem | do you know where the guest is actively running now? | 15:14 |
openstackgerrit | Merged openstack/nova stable/ocata: [Stable Only] Add amd-ssbd and amd-no-ssb CPU flags https://review.openstack.org/607296 | 15:14 |
nicolasbock | Yes | 15:15 |
mriedem | it should be in the last live-migration migration record for the instance | 15:15 |
nicolasbock | Ok | 15:15 |
mriedem | well then you just update the table record in the nova db | 15:15 |
mriedem | update instances set host=<host> where uuid=<instance uuid>; | 15:16 |
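A runnable stand-in for that repair, using sqlite3 so it is self-contained. Nova's real database is MySQL/MariaDB and the instances table has many more columns; this only demonstrates the single-row update, and in practice you would lock the instance first:

```python
import sqlite3

# Simplified stand-in for the fix mriedem describes: point
# instances.host at the node the guest is actually running on.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (uuid TEXT PRIMARY KEY, host TEXT)")
conn.execute("INSERT INTO instances VALUES (?, ?)",
             ("9f9330c2-4ab4-45f1-a9f9-2770dd34cf30", "source-host"))

conn.execute("UPDATE instances SET host = ? WHERE uuid = ?",
             ("dest-host", "9f9330c2-4ab4-45f1-a9f9-2770dd34cf30"))
conn.commit()

host, = conn.execute("SELECT host FROM instances WHERE uuid = ?",
                     ("9f9330c2-4ab4-45f1-a9f9-2770dd34cf30",)).fetchone()
print(host)  # dest-host
```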
nicolasbock | Ok, sounds so straightforward when you put it like that ;) | 15:16 |
nicolasbock | I'll give it a try | 15:16 |
nicolasbock | Thanks! | 15:16 |
sean-k-mooney | mriedem: out of interest what would happen if you tried to do a hard reboot or other lifecycle action on a vm in this state with the wrong host set | 15:19 |
openstackgerrit | Vlad Gusev proposed openstack/nova master: Not set instance to ERROR if set_admin_password failed https://review.openstack.org/555160 | 15:19 |
*** dpawlik has joined #openstack-nova | 15:19 | |
sean-k-mooney | would it repair the instance or try to start it on the wrong host? | 15:19 |
mriedem | it would try to start it on the wrong host | 15:21 |
mriedem | i don't know what would then happen - would you get the same instance running on two hosts? or a domain not found from the wrong host when trying to reboot it? | 15:21 |
mriedem | i'd hope the latter | 15:21 |
sean-k-mooney | right, which if it was using shared storage could lead to data corruption, correct? | 15:21 |
mriedem | well i'd hope reboot would fail if the guest isn't actually on the hypervisor | 15:22 |
sean-k-mooney | i would hope the latter too | 15:22 |
openstackgerrit | Elod Illes proposed openstack/nova stable/ocata: Don't delete neutron port when attach failed https://review.openstack.org/607614 | 15:22 |
mriedem | doesn't look like it would fail though https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L849 | 15:23 |
mriedem | we just handle the not found and assume the guest is already gone | 15:23 |
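The pattern at that line, heavily simplified. The exception name and helper below are illustrative only, not libvirt's or nova's real API:

```python
# Simplified version of the pattern mriedem points at: during destroy,
# "domain not found" is swallowed on the assumption the guest is already
# gone -- so a hard reboot against the wrong host doesn't fail loudly.

class DomainNotFound(Exception):
    """Illustrative stand-in for the hypervisor's not-found error."""

def destroy_guest(lookup_domain, instance_uuid, log):
    try:
        domain = lookup_domain(instance_uuid)
        domain.destroy()
    except DomainNotFound:
        # no re-raise: the reboot path proceeds as if cleanup succeeded
        log.append("Domain not found; assuming already gone: %s"
                   % instance_uuid)

def lookup_missing(uuid):
    raise DomainNotFound(uuid)

log = []
destroy_guest(lookup_missing, "abc-123", log)
print(log[0])
```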
*** dpawlik has quit IRC | 15:24 | |
*** eharney has joined #openstack-nova | 15:24 | |
mriedem | so maybe there is something to be said for putting the instance into ERROR state on failed post live migration | 15:24 |
mriedem | to force the admin to fix it | 15:24 |
sean-k-mooney | mriedem: right, we need that in the case where we are using hard reboot to "fix" things | 15:25 |
mriedem | so the user doesn't try to reboot the thing and screw it up | 15:25 |
sean-k-mooney | mriedem: perhaps, when i discussed this with mdbooth previously, i was suggesting always updating the host to the correct location of the vm | 15:26 |
sean-k-mooney | then you could decide if it should stay in error or active state separately, without having more bugs if you left it in active | 15:26 |
openstackgerrit | Claudiu Belu proposed openstack/nova master: hyper-v: autospec classes before they are instantiated https://review.openstack.org/342211 | 15:27 |
sean-k-mooney | mriedem: or to put that another way, i think there are two issues in the other case: 1) the host is not updated when the vm is moved in some cases, and 2) what to do when cleanup fails post-migration | 15:28 |
mriedem | if post live migration set the instance.host to the correct host on which it's running then yeah my concern about the user rebooting it and now having the same guest on different hosts is less of an issue | 15:28 |
*** ttsiouts has quit IRC | 15:31 | |
sean-k-mooney | nicolasbock: if you are still around, the cliff-notes version of that conversation is you might want to consider locking the instance until you have repaired the db, to prevent any lifecycle events | 15:32 |
mdbooth | mriedem: IIRC my thought at the time was that we should completely update the instance record for the destination immediately after we switch it, then run source cleanup. | 15:33 |
mdbooth | So if we fail for whatever reason we've recorded that we're running on the dest. | 15:33 |
*** pcaruana has quit IRC | 15:33 | |
*** med_ has joined #openstack-nova | 15:37 | |
mriedem | s10: i think the unit test is going to fail in that patch, see comments for why | 15:37 |
mdbooth | Incidentally, to reiterate something I said earlier, when this patch landed late last week it made the test_parallel_evacuate_with_server_group about 20 times more likely to occur: https://review.openstack.org/#/c/604859/ | 15:37 |
mdbooth | That test is now failing around 50% for me. | 15:38 |
mriedem | we can skip the test for now | 15:38 |
mriedem | while the fix is being reviewed | 15:38 |
mdbooth | mriedem: ack. Given ^^^ it seems that the test has never been good. | 15:38 |
*** macza has joined #openstack-nova | 15:39 | |
*** hamzy has quit IRC | 15:40 | |
openstackgerrit | Rodolfo Alonso Hernandez proposed openstack/os-vif master: Add native implementation OVSDB API https://review.openstack.org/482226 | 15:40 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Skip test_parallel_evacuate_with_server_group until fixed https://review.openstack.org/607620 | 15:43 |
*** hamzy has joined #openstack-nova | 15:43 | |
mriedem | dansmith: efried: mdbooth: ^ gives time to be comfortable with the fix | 15:43 |
mdbooth | mriedem: ack. | 15:44 |
efried | mriedem: What will the criteria for comfort be? | 15:44 |
efried | mriedem: https://review.openstack.org/#/c/605436/ has three +1s and a +2 | 15:45 |
mriedem | i guess that's up to whoever +Ws it | 15:45 |
efried | mriedem: Well, I'm comfortable that it fixes the problem. But don't feel confident enough in the actual code change to +W. I would think someone like.... mriedem would be able to have that confidence. | 15:46 |
*** Luzi has joined #openstack-nova | 15:47 | |
mriedem | i'm not very confident in anything atm | 15:47 |
dansmith | I think that change needs a lot of inspection | 15:48 |
dansmith | which I can't do right this moment | 15:48 |
*** penick has joined #openstack-nova | 15:57 | |
mriedem | who enjoys a good UnboundLocalError? https://github.com/openstack/nova/blob/237ced4737a28728408eb30c3d20c6b2c13b4a8d/nova/network/neutronv2/api.py#L1429 | 15:59 |
mriedem | oh i guess it's not unbound, it's a module import... | 16:02 |
*** sahid has quit IRC | 16:06 | |
*** gyee has joined #openstack-nova | 16:06 | |
*** slaweq has quit IRC | 16:09 | |
mriedem | so uh, if we hit ^ shouldn't we fail the build? | 16:09 |
*** slaweq has joined #openstack-nova | 16:10 | |
mriedem | clearly the user isn't going to get the sriov port attached to the guest that they requested | 16:10 |
openstackgerrit | Jay Pipes proposed openstack/nova stable/ocata: Re-use existing ComputeNode on ironic rebalance https://review.openstack.org/607626 | 16:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Fix logging parameter in _populate_pci_mac_address https://review.openstack.org/607628 | 16:12 |
*** ttsiouts has joined #openstack-nova | 16:12 | |
*** hoangcx has quit IRC | 16:12 | |
mriedem | sean-k-mooney: you might be interested in https://bugs.launchpad.net/nova/+bug/1795064 | 16:12 |
openstack | Launchpad bug 1795064 in OpenStack Compute (nova) "SR-IOV error IndexError: pop from empty list" [Undecided,New] | 16:12 |
mriedem | something something sriov and kernel versions | 16:13 |
*** hoangcx has joined #openstack-nova | 16:13 | |
sean-k-mooney | looking | 16:13 |
jaypipes | mriedem: https://review.openstack.org/#/c/607626/ is that stable/ocata backport for the duplicate hypervisor_hostname thingie | 16:13 |
jaypipes | mriedem: thx for your help earlier. | 16:13 |
*** s10 has quit IRC | 16:13 | |
mriedem | jaypipes: so you cherry-picked that from master? | 16:15 |
mriedem | https://review.openstack.org/#/c/508555/ | 16:15 |
*** ttsiouts has quit IRC | 16:15 | |
*** spatel_ has joined #openstack-nova | 16:15 | |
mriedem | or did you cherry pick from the pike backport but forget the -x option on the cherry-pick command? | 16:15 |
spatel_ | Hi folks | 16:15 |
jaypipes | mriedem: no, I cherry-picked the SHA1 from the stable/pike patch | 16:15 |
*** ttsiouts has joined #openstack-nova | 16:16 | |
jaypipes | mriedem: oh, sorry, I don't know about -x :( | 16:16 |
spatel_ | I am having issue with SR-IOV with shared PCI device between numa and reading this blueprint : https://blueprints.launchpad.net/nova/+spec/share-pci-between-numa-nodes | 16:16 |
sean-k-mooney | spatel_: this bug https://bugs.launchpad.net/nova/+bug/1795064? or another? | 16:17 |
openstack | Launchpad bug 1795064 in OpenStack Compute (nova) "SR-IOV error IndexError: pop from empty list" [Undecided,New] | 16:17 |
spatel_ | I have set hw:pci_numa_affinity_policy='preferred' in the flavor but it's still not allowing me to run an instance on NUMA-1 | 16:17 |
spatel_ | sean-k-mooney: that problem got resolved by downgrading kernel to 3.x | 16:17 |
sean-k-mooney | spatel_: i think your issue with 4.18 was that you did not have a netdev associated with the vf | 16:18 |
spatel_ | Is that configuration issue or BUG? | 16:19 |
sean-k-mooney | spatel_: i would say config issue. i would guess the default options for the kernel module changed and/or you are using a different driver by default | 16:19 |
jaypipes | mriedem: apologies. how can I fix appropriately? do I need to re-do the git cherry-pick with -x? or can/should I just edit the commit message with seomthing? | 16:19 |
sean-k-mooney | spatel_: for example, if the device was bound to vfio_pci instead of the broadcom driver, then it would exist in lspci but not have a netdev | 16:20 |
*** ttsiouts has quit IRC | 16:20 | |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: WIP: Libvirt live migration: update NUMA XML for dest https://review.openstack.org/575179 | 16:20 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Service version check for NUMA live migration https://review.openstack.org/566723 | 16:20 |
spatel_ | hmmm! | 16:21 |
sean-k-mooney | spatel_: do you whitelist devices using the devname option? | 16:21 |
mriedem | jaypipes: see my other comment in the ocata backport about documenting the conflicts? | 16:21 |
spatel_ | sean-k-mooney: yes i am using devname option to specify my interface | 16:21 |
spatel_ | pci_passthrough_whitelist = "{ "physical_network":"vlan", "devname":"eno2" }" | 16:21 |
sean-k-mooney | spatel_: in general i advise against that for this exact reason | 16:21 |
sean-k-mooney | spatel_: if you use the pci address instead, 4.18 would likely be fine | 16:21 |
spatel_ | pci address ? | 16:22 |
*** macza has quit IRC | 16:22 | |
sean-k-mooney | spatel_: the whitelist can have 3 modes of whitelisting | 16:22 |
spatel_ | you mean vendor_id or product_id ? | 16:22 |
*** macza has joined #openstack-nova | 16:22 | |
sean-k-mooney | spatel_: you can use devname, (vendor_id and product_id), or you can pass a pci address | 16:23 |
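For reference, the three whitelist forms would look roughly like this in nova.conf, following the same format spatel_ quotes above (the vendor/product IDs and PCI address are placeholders, and the option name/section varies by release):

```ini
# 1) by netdev name -- fragile if the driver exposes no netdev
pci_passthrough_whitelist = { "physical_network": "vlan", "devname": "eno2" }

# 2) by vendor and product ID
pci_passthrough_whitelist = { "physical_network": "vlan", "vendor_id": "14e4", "product_id": "16af" }

# 3) by PCI address (wildcards allowed)
pci_passthrough_whitelist = { "physical_network": "vlan", "address": "0000:05:00.*" }
```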
spatel_ | sean-k-mooney: i will give it a try and report back to BUG | 16:23 |
spatel_ | sean-k-mooney: currently i am dealing with this issue :( https://bugs.launchpad.net/nova/+bug/1795920 | 16:23 |
openstack | Launchpad bug 1795920 in OpenStack Compute (nova) "SR-IOV shared PCI numa not working " [Undecided,New] | 16:23 |
spatel_ | Do you know what wrong i am doing here | 16:23 |
*** panda is now known as panda|off | 16:24 | |
spatel_ | I have 2 NUMA node and running SR-IOV with shared PCI | 16:24 |
*** macza has quit IRC | 16:24 | |
sean-k-mooney | https://docs.openstack.org/mitaka/networking-guide/config-sriov.html has the doc | 16:24 |
sean-k-mooney | am let me look | 16:24 |
spatel_ | I am only able to use one side of NUMA | 16:24 |
sean-k-mooney | by default unless you set a pci numa affinity policy in the flavor or image we require strict numa afinity | 16:25 |
spatel_ | Its not allowing me to launch SR-IOV instance on NUMA-2 ( because PCI is attach to NUMA-1 ) | 16:25 |
spatel_ | All i did is hw:pci_numa_affinity_policy=preferred in flavor | 16:25 |
spatel_ | what else i need to do? | 16:25 |
sean-k-mooney | spatel_: let me check. i thought that was enough, but you might also need to set the policy in the whitelist | 16:26 |
spatel_ | I do have aggregate_instance_extra_specs:pinned='true', hw:cpu_policy='dedicated' in flavor | 16:26 |
*** helenafm has quit IRC | 16:26 | |
spatel_ | I think the document isn't clear in the blueprint so i am totally confused :( | 16:27 |
spatel_ | If i remove "aggregate_instance_extra_specs:pinned='true', hw:cpu_policy='dedicated'" from flavor then i am able to launch instance anywhere in NUMA with SR-IOV support | 16:28 |
*** dims has quit IRC | 16:28 | |
sean-k-mooney | aggregate_instance_extra_specs:pinned='true' is not a standard thing | 16:29 |
*** med_ has quit IRC | 16:29 | |
spatel_ | hmmm! i didn't get it | 16:30 |
sean-k-mooney | spatel_: you should not need to set anything in the aggregate to use the pci policies | 16:30 |
openstackgerrit | Jay Pipes proposed openstack/nova stable/ocata: Re-use existing ComputeNode on ironic rebalance https://review.openstack.org/607626 | 16:30 |
spatel_ | oh! so you are saying i should remove aggregate_instance_extra_specs:pinned | 16:30 |
jaypipes | mriedem: k, hopefully correct now. | 16:30 |
jaypipes | thx for the help again. | 16:31 |
sean-k-mooney | spatel_: yes | 16:31 |
spatel_ | lets say if i remove the "aggregate" then does my vCPU get pinned ? | 16:31 |
spatel_ | removing aggregate and going to launch instance | 16:33 |
*** med_ has joined #openstack-nova | 16:34 | |
*** dims_ has joined #openstack-nova | 16:35 | |
spatel_ | sean-k-mooney: didn't work error 'No valid host was found. There are not enough hosts available' | 16:36 |
spatel_ | look like something is missing.. | 16:36 |
sean-k-mooney | spatel_: so just to confirm, you dont have any aggregate metadata set and have hw:pci_numa_affinity_policy=preferred set | 16:38 |
sean-k-mooney | in the flavor | 16:38 |
spatel_ | This is what i have currently in flavor -> properties | hw:cpu_policy='dedicated', hw:numa_nodes='2', hw:pci_numa_affinity_policy='preferred' | 16:39 |
spatel_ | If i remove all 3 option then i am successfully able to launch instance | 16:39 |
stephenfin | spatel_, sean-k-mooney: We didn't implement it with a flavor extra spec in the end | 16:39 |
sean-k-mooney | so this has to be set in the pci whitelist then | 16:40 |
stephenfin | spatel_: Yes | 16:40 |
stephenfin | Oops, sean-k-mooney ^ | 16:40 |
spatel_ | oh!! wait wait.. so what i need to do in pci whitelist ? | 16:40 |
sean-k-mooney | spatel_: https://github.com/openstack/nova/blob/master/nova/pci/request.py#L16-L25 | 16:40 |
sean-k-mooney | sorry thats not what you want but yes you do | 16:41 |
spatel_ | so i need to add that snippet in compute nova.conf in [PCI] section ? | 16:41 |
stephenfin | spatel_: Have you seen this? https://docs.openstack.org/nova/latest/admin/networking.html#numa-affinity | 16:41 |
stephenfin | spatel_: Ignore that - wrong feature :) | 16:41 |
spatel_ | ok | 16:41 |
spatel_ | I am running queens | 16:41 |
stephenfin | spatel_: https://docs.openstack.org/nova/latest/configuration/config.html#pci | 16:42 |
stephenfin | See the alias configuration key | 16:42 |
stephenfin | spatel_: But, to be clear, is this for a PCI device or an SR-IOV device? | 16:42 |
spatel_ | SR-IOV device | 16:42 |
*** artom has quit IRC | 16:42 | |
spatel_ | We are running high performance network application and need high speed network or high PPS rate | 16:43 |
sean-k-mooney | stephenfin: looking at the whitelist code i dont think we support it in the whitelist | 16:43 |
stephenfin | sean-k-mooney: Doesn't seem like it. I'm trying to think why it was done that way | 16:44 |
sean-k-mooney | which would mean the policies only work for devices requested via flavor alias, which would be dumb | 16:44 |
sean-k-mooney | are you sure we did not support this in the flavor extraspecs / image metadata | 16:44 |
spatel_ | I am going to add alias and get back to you.. | 16:44 |
stephenfin | definitely not | 16:44 |
sean-k-mooney | stephenfin: was the whole point of this feature to fix neutron sriov | 16:45 |
spatel_ | are product_id and vendor_id mandatory? because i am using devname here "pci_passthrough_whitelist = "{ "physical_network":"vlan", "devname":"eno2" }"" | 16:45 |
*** med_ has quit IRC | 16:45 | |
*** s10 has joined #openstack-nova | 16:46 | |
sean-k-mooney | spatel_: no | 16:46 |
*** s10 has quit IRC | 16:46 | |
sean-k-mooney | a white list can be in any of these forms https://github.com/openstack/nova/blob/master/nova/pci/devspec.py#L182-L192 | 16:47 |
sean-k-mooney | actually the alias, yes, that needs to use vendor_id and product_id | 16:48 |
stephenfin | sean-k-mooney: If it was, it seems something may have slipped through the cracks here | 16:48 |
sean-k-mooney | alias are not for networking they are for passthrough devices | 16:48 |
stephenfin | Yup, I get that | 16:48 |
spatel_ | type-PCI, type-PF and type-VF what i should pick ? | 16:49 |
spatel_ | VF ? | 16:49 |
stephenfin | From the quick glance here, it should really be configured via the whitelist. I'm not sure why I went with the alias | 16:49 |
stephenfin | spatel_: yup | 16:49 |
spatel_ | doing it.. | 16:49 |
sean-k-mooney | stephenfin: ya i am going to confirm https://bugs.launchpad.net/nova/+bug/1795920 | 16:49 |
openstack | Launchpad bug 1795920 in OpenStack Compute (nova) "SR-IOV shared PCI numa not working " [Undecided,Confirmed] | 16:49 |
*** med_ has joined #openstack-nova | 16:49 | |
sean-k-mooney | stephenfin: interested in working on this? if not ill add it to my list, but this needs to be fixed | 16:50 |
sean-k-mooney | and backported | 16:50 |
stephenfin | I won't tackle it tonight but I can do so, yeah | 16:50 |
stephenfin | not sure if we can backport though. It'll be a config file change | 16:50 |
sean-k-mooney | cool, we likely need to repropose the old spec | 16:50 |
sean-k-mooney | stephenfin: i was suggesting we need to add the flavor and image extraspecs so no config file change | 16:51 |
stephenfin | sean-k-mooney: Possibly, but before doing so I'd suggest going back and reading the spec reviews | 16:51 |
stephenfin | There was a reason we didn't do that, though I don't recall it now :/ | 16:51 |
*** spatel has joined #openstack-nova | 16:52 | |
*** spatel_ has quit IRC | 16:52 | |
spatel | sean-k-mooney: this is what i change in nova.conf http://paste.openstack.org/show/731417/ | 16:52 |
stephenfin | If it's image metadata changes, we can't backport those due to object changes | 16:52 |
spatel | can you verify | 16:52 |
sean-k-mooney | yes i remember i was very against using the alias but i dont recall dropping the extra specs | 16:52 |
spatel | going to launch instance now, fingers crossed | 16:53 |
sean-k-mooney | spatel: its going to fail. | 16:53 |
spatel | ?? | 16:53 |
spatel | why? | 16:53 |
sean-k-mooney | looking at the code the feature was not finished | 16:53 |
spatel | Damn it :( | 16:54 |
spatel | so what is the deal here ? | 16:54 |
sean-k-mooney | the numa policies are only respected for flavor-based pci device passthrough, e.g. for things like gpu or accelerator cards | 16:54 |
sean-k-mooney | spatel: the spec was approved but the full feature was not merged | 16:54 |
spatel | in my case its VF | 16:55 |
*** med_ has quit IRC | 16:55 | |
spatel | currently i can't utilize both numa :( | 16:55 |
sean-k-mooney | spatel: in your case its a neutron port with vnic_type direct correct | 16:55 |
spatel | yes | 16:55 |
sean-k-mooney | spatel: the workaround is to make the guest have 2 numa nodes | 16:56 |
melwitt | sean-k-mooney: which spec was not properly finished? | 16:56 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Handle IndexError in _populate_neutron_binding_profile https://review.openstack.org/607650 | 16:56 |
mriedem | spatel: fyi ^ | 16:56 |
sean-k-mooney | melwitt: the numa pci policies | 16:56 |
sean-k-mooney | ill get the link, one sec | 16:57 |
melwitt | thanks. I didn't see a link in the backscroll | 16:57 |
spatel | sean-k-mooney: "guest have 2 numa nodes" can you explain this? | 16:57 |
*** macza has joined #openstack-nova | 16:57 | |
sean-k-mooney | melwitt: https://review.openstack.org/#/c/361140/ | 16:57 |
openstackgerrit | Vlad Gusev proposed openstack/nova master: Not set instance to ERROR if set_admin_password failed https://review.openstack.org/555160 | 16:57 |
sean-k-mooney | spatel: in the flavor set hw:numa_nodes=2 | 16:57 |
spatel | let me try hold on... | 16:58 |
sean-k-mooney | this will create a guest with 2 numa nodes with half the cpus and ram on each | 16:58 |
spatel | sean-k-mooney: FYI, i have tried this and it failed "hw:cpu_policy='dedicated', hw:numa_nodes='2', hw:pci_numa_affinity_policy='preferred'" | 16:58 |
spatel | now i am going to remove "hw:cpu_policy='dedicated" and "hw:pci_numa_affinity_policy='preferred'" to see if that work | 16:59 |
sean-k-mooney | melwitt: the flavor and image extraspecs are apparently not implemented, meaning that this does not work for neutron sriov ports as it should | 16:59 |
melwitt | sean-k-mooney: ok, according to the notes on the blueprint, people thought it was completed "The last functional patch for this was merged on Dec 30, 2017" https://blueprints.launchpad.net/openstack/nova/+spec/share-pci-between-numa-nodes | 16:59 |
spatel | melwitt: that is why i am chasing that blueprint because it says completed 2017 | 16:59 |
*** derekh has quit IRC | 17:00 | |
sean-k-mooney | melwitt: i thought it was completed; im checking the code to confirm but apparently its not working | 17:00 |
spatel | Do we have any ETA because i have 100 compute node in racks and here i am stuck with this issue :( | 17:00 |
melwitt | I feel like someone has asked me about this bp before, asking if it applies to SRIOV too, and I thought since it never mentions SRIOV, that it doesn't | 17:00 |
sean-k-mooney | i can probably test this locally too, i have just set up a sriov host | 17:01 |
melwitt | or wasn't meant to. and that adding SRIOV support would be additional work outside the scope of this particular blueprint | 17:01 |
sean-k-mooney | melwitt: its primary usecase was sriov, specifically for telcos where they had two numa node servers but all nics were connected to one numa node due to space constraints in their racks | 17:01 |
melwitt | i.e. a new blueprint would be, for example, "add SRIOV support for sharing PCI devices between NUMA nodes" | 17:01 |
*** Luzi has quit IRC | 17:02 | |
spatel | melwitt: its much clear now.. :) | 17:03 |
openstackgerrit | Merged openstack/nova stable/rocky: Explicitly fail if trying to attach SR-IOV port https://review.openstack.org/605118 | 17:03 |
mriedem | known issue yeah? https://bugs.launchpad.net/nova/+bug/1794717 | 17:04 |
openstack | Launchpad bug 1794717 in OpenStack Compute (nova) "rocky: ephemeral disk can not be resized" [Undecided,New] | 17:04 |
sean-k-mooney | mriedem: not being able to resize ephemeral disk ya | 17:05 |
sean-k-mooney | mriedem: i mean i think in some very specific edge cases it can work today but there is no generic way to enable it | 17:05 |
sean-k-mooney | e.g. how do you resize from 1 500G disk to 2 400G disks? so we just decided not to support it at all | 17:06 |
mriedem | yup, bug 1558880 | 17:06 |
openstack | bug 1558880 in OpenStack Compute (nova) "instance can not resize ephemeral in mitaka" [Medium,Confirmed] https://launchpad.net/bugs/1558880 | 17:06 |
melwitt | sean-k-mooney: ok, I don't know anything about that. if that spec scope is actually incomplete, then we need to decide how we deal with it. open another spec for this cycle to finish it or treat them as bugs | 17:07 |
sean-k-mooney | melwitt: probably reproposing the spec is the best way, and just add the flavor extra specs and image metadata values that were originally proposed | 17:08 |
sean-k-mooney | melwitt: unless you think we can backport them in which case it could be a bug | 17:08 |
sean-k-mooney | backporting would be the only reason to make it a bug in my mind, but its also adding new functionality, e.g. turning off numa affinity for pci devices | 17:09 |
*** jpena is now known as jpena|off | 17:11 | |
*** ralonsoh has quit IRC | 17:13 | |
*** med_ has joined #openstack-nova | 17:13 | |
sean-k-mooney | melwitt: i or stephen will repropose the spec | 17:15 |
sean-k-mooney | melwitt: stephenfin is heading home so i will likely do it later today | 17:15 |
melwitt | sean-k-mooney: I'd run the idea by mriedem too, in case he has another opinion on how to handle this | 17:17 |
openstackgerrit | Sylvain Bauza proposed openstack/nova master: libvirt: implement reshaper for vgpu https://review.openstack.org/599208 | 17:17 |
sean-k-mooney | melwitt: sure, it just felt a little cheeky to sneak it in as a bug fix :) | 17:17 |
bauzas | dansmith: mriedem: I tested the vgpu reshape for allocations too and good news it works ! I just fixed a few things that I discovered when testing ^ | 17:18 |
melwitt | sean-k-mooney: yeah, you're probably right. I didn't think much about it | 17:19 |
bauzas | now, call it a day | 17:19 |
*** med_ has quit IRC | 17:20 | |
sean-k-mooney | spatel: were you able to use a multi numa node guest to spawn the instance? that functionality definitely works | 17:23 |
spatel | Testing it now..should i add "hw:cpu_policy='dedicated'" too for pinning ? | 17:24 |
dansmith | bauzas: ack | 17:25 |
sean-k-mooney | spatel: yes if you want cpu pinning then add hw:cpu_policy='dedicated' | 17:25 |
spatel | doing it.. hold on.. soon report back | 17:25 |
sean-k-mooney | no rush | 17:27 |
*** tbachman has quit IRC | 17:29 | |
*** tbachman has joined #openstack-nova | 17:33 | |
spatel | sean-k-mooney: i am able to launch two VMs with 10 vCPU cores each (i have a 32 core compute node with 16+16 numa) but it looks like it didn't pin down the CPUs | 17:35 |
spatel | check this out http://paste.openstack.org/show/731420/ | 17:35 |
spatel | I can see it pinned CPUs across numa | 17:35 |
sean-k-mooney | spatel: can you run virsh dumpxml <instance> | 17:37 |
*** tbachman_ has joined #openstack-nova | 17:37 | |
sean-k-mooney | spatel: i think it pinned everything correctly | 17:37 |
spatel | http://paste.openstack.org/show/731423/ | 17:37 |
sean-k-mooney | spatel: it looks like each | 17:37 |
spatel | I thought it should pin all vCPU cores to the same NUMA node, right? | 17:38 |
sean-k-mooney | no | 17:38 |
spatel | hmmmm? | 17:38 |
*** tbachman has quit IRC | 17:39 | |
*** mvkr has quit IRC | 17:39 | |
*** tbachman_ is now known as tbachman | 17:39 | |
sean-k-mooney | by setting hw:numa_nodes=2 you will have half the cpus on one numa node and half on the other | 17:39 |
sean-k-mooney | memory will also be equally split | 17:39 |
sean-k-mooney | provided there is a pci device free on at least one of the 2 numa nodes associated with the vm vcpus then we will allow the vm to boot | 17:40 |
spatel | if i remove numa_node=2 then it will pin down all vCPU on same node right? | 17:40 |
sean-k-mooney | correct, adding hw:cpu_policy=dedicated implicitly adds hw:numa_nodes=1 | 17:40 |
spatel | hmm! interesting.. | 17:41 |
spatel | using numa_nodes=2 will have some performance issues right? | 17:41 |
sean-k-mooney | by explicitly setting hw:numa_nodes=2 it will allow both numa nodes on the host to be used, but it will also limit the vm to hosts with 2+ numa nodes | 17:41 |
sean-k-mooney | spatel: it can if the application in the guest itself does not understand numa affinity | 17:42 |
sean-k-mooney | it can also improve the performance as it doubles your memory bandwidth: the vm will now use memory from 2 host numa nodes/memory controllers | 17:43 |
spatel | I think time to run some test... | 17:43 |
spatel | We are a media company and using a VoIP-based application | 17:43 |
sean-k-mooney | testing is always a good idea :) | 17:43 |
spatel | First i built openstack without SR-IOV and found performance was horrible (PPS rate was only 50k, after that it started dropping packets) | 17:44 |
sean-k-mooney | as a community we have done a lot of work to improve numa affinity over the years | 17:44 |
spatel | I have just started learning numa stuff so i am new but it looks interesting.. | 17:45 |
sean-k-mooney | spatel: the strict pci numa affinity was added for telco usecases where they could not tolerate cross numa pci/sriov | 17:45 |
sean-k-mooney | spatel: it certainly is .... interesting. its also a pain in the ass but gives better performance when you get it right | 17:45 |
spatel | I have some legacy hardware and i have to stick to them | 17:46 |
spatel | other side i am planning to test DPDK if its better | 17:46 |
sean-k-mooney | numa is not going away, in fact its becoming more common | 17:46 |
sean-k-mooney | dpdk is much better than kernel ovs | 17:46 |
sean-k-mooney | but its more complicated too | 17:46 |
spatel | but it doesn't need a hardware dependency at least | 17:47 |
spatel | I spent thousands of $$$$ to get SR-IOV supported cards | 17:47 |
sean-k-mooney | spatel: not in the same way; it requires that the guests use hugepages and that there is a dpdk driver for your nic | 17:47 |
spatel | Does it perform like SR-IOV ? | 17:48 |
sean-k-mooney | spatel: ya dpdk will be cheaper in that sense but you will have to dedicate 1-2 cores to handle traffic for ovs-dpdk | 17:48 |
sean-k-mooney | spatel: in some cases yes. in general not quite | 17:48 |
sean-k-mooney | what data rates / traffic profiles are you targeting? | 17:49 |
spatel | currently i am deploying VoIP application on 1U server with 32 core / 32G memory. and i have 1000 servers... | 17:49 |
sean-k-mooney | 10G small packets 40G jumbo frames? a mix | 17:49 |
spatel | my peak in production 200 to 230kpps UDP packet rate | 17:50 |
sean-k-mooney | oh well dpdk can handle that easily | 17:50 |
spatel | really??? | 17:50 |
spatel | if that is the case then it will be win win solution | 17:50 |
*** ivve has quit IRC | 17:51 | |
sean-k-mooney | ya dpdk was designed to hit 10G line rate with 64 byte packets which is 14mpps | 17:51 |
spatel | we have lots of servers in AWS (with sr-iov support) | 17:51 |
spatel | that is really cool! | 17:51 |
sean-k-mooney | with the right hardware it can hit 32mpps on a single core but in general you will see more like 6mpps | 17:51 |
spatel | we are using LinuxBridge + VLAN so i need to upgrade to OVS | 17:52 |
sean-k-mooney | its more or less like this | 17:52 |
sean-k-mooney | lb<ovs<sriov+macvtap<ovs-dpdk<sriov direct | 17:52 |
spatel | I have tried macvtap but that didn't work too | 17:53 |
sean-k-mooney | spatel: checkout https://dpdksummit.com/Archive/pdf/2016USA/Day02-Session04-ThomasHerbert-DPDKUSASummit2016.pdf slides 16-19 | 17:54 |
spatel | reading.. | 17:55 |
sean-k-mooney | spatel: im a little biased as im one of the people that added ovs-dpdk support to openstack, but for your data rate i think it would work quite well | 17:56 |
spatel | I need to find out how to migrate LinuxBridge to OVS | 17:56 |
sean-k-mooney | spatel: today cold migrate works. im working on fixing livemigrate | 17:57 |
spatel | cool! | 17:57 |
spatel | in SR-IOV i am not able to get that function too | 17:57 |
spatel | even bonding isn't supported | 17:57 |
sean-k-mooney | live migrate almost works we just dont update the bridge name correctly im hoping to backport that | 17:57 |
spatel | nice! if that work | 17:58 |
openstackgerrit | Surya Seetharaman proposed openstack/nova master: Add scatter-gather-single-cell utility https://review.openstack.org/594947 | 17:58 |
openstackgerrit | Surya Seetharaman proposed openstack/nova master: Return a minimal construct for nova list when a cell is down https://review.openstack.org/567785 | 17:58 |
openstackgerrit | Surya Seetharaman proposed openstack/nova master: Modify get_by_cell_and_project() to get_not_qfd_by_cell_and_project() https://review.openstack.org/607663 | 17:58 |
sean-k-mooney | spatel: haha i think im working on all your missing features :) https://review.openstack.org/#/c/605116/ | 17:58 |
openstackgerrit | Merged openstack/nova stable/pike: nova-manage - fix online_data_migrations counts https://review.openstack.org/605840 | 17:59 |
spatel | :) | 17:59 |
spatel | i have lots of requirement :) these are just starting | 17:59 |
spatel | sean-k-mooney: thanks for help!!! | 18:00 |
sean-k-mooney | my main focus this release, at least initially, is livemigration hardening, e.g. fixing edge cases like lb->ovs or sriov | 18:00 |
nicolasbock | Thanks for the tip! | 18:00 |
spatel | i didn't know freenode would be this helpful.. last 2 days i have been chasing google | 18:00 |
spatel | what do you use to deploy your openstack? I am using openstack-ansible | 18:01 |
sean-k-mooney | spatel: no worries. im usually here so feel free to ping me if you have issues | 18:01 |
spatel | I am going to spend next 6 month here :) until my cloud is ready!! | 18:01 |
sean-k-mooney | spatel: for development, devstack. i used to use kolla-ansible but recently joined redhat so i probably should suggest OSP | 18:01 |
spatel | we spent a million dollars last year in AWS so my boss wants to build our own AWS :) | 18:02 |
sean-k-mooney | spatel: that is how a lot of companies end up running openstack clouds, yes | 18:02 |
spatel | indeed | 18:02 |
spatel | OSP uses tripleO right? | 18:03 |
spatel | i tried it and found it very complicated | 18:03 |
sean-k-mooney | yes that is the officially supported installer from redhat | 18:03 |
sean-k-mooney | spatel: and yes it can be | 18:03 |
mnaser | does anyone know if daniel berrange hangs out on irc much? | 18:03 |
mnaser | i'm looking at this old abandoned review and i'm wondering if this is still an issue -- https://review.openstack.org/#/c/241401/ | 18:03 |
sean-k-mooney | mnaser: not on this irc but he is usually on the libvirt one | 18:04 |
mnaser | i'll try to ping him there | 18:04 |
spatel | sean-k-mooney: going to eat something, will catch you again, if any issue :) thanks again | 18:04 |
sean-k-mooney | mnaser: i think that is a bug that has been forgotten about but not necessarily fixed | 18:06 |
mnaser | sean-k-mooney: yeah, it's not fixed, but it's been a while so i'm wondering if the whole "it doesn't work with backing files" argument is no longer valid | 18:07 |
mnaser | we're setting up some really fast hardware (pci-e nvme drives) and want to squeeze the best performance out of it.. short of going to something like lvm | 18:07 |
sean-k-mooney | mdbooth: and lyarwood would likely be able to comment better than i on https://bugs.launchpad.net/nova/+bug/1510328 | 18:07 |
openstack | Launchpad bug 1510328 in OpenStack Compute (nova) "Nova pre-allocation of qcow2 is flawed" [Low,Confirmed] | 18:07 |
openstackgerrit | Jack Ding proposed openstack/nova master: Add HPET timer support for x86 guests https://review.openstack.org/605902 | 18:08 |
sean-k-mooney | mnaser: right, in that case would you be better off with a raw image instead of qcow if you're always preallocating? | 18:08 |
mnaser | sean-k-mooney: right, i'm thinking that might be the next path, raw files on disk | 18:09 |
sean-k-mooney | mnaser: if you are also supporting ceph or boot from volume, raw can often be better too even if you are using more space for glance / image cache | 18:10 |
mnaser | but then we lose a lot of qcow2 features | 18:10 |
sean-k-mooney | mnaser: like live snapshot | 18:10 |
mnaser | yeah, a lot.. unfortunately | 18:10 |
*** spatel has quit IRC | 18:10 | |
sean-k-mooney | mnaser: i dont think anyone would object if you had a way to fix the bug that did not cause others | 18:11 |
mnaser | sean-k-mooney: yep.. its just that unfortunately there was no documentation as to why that was an issue with backing images | 18:11 |
mnaser | so thats what im trying to research | 18:11 |
sean-k-mooney | mnaser: its got to have something to do with the overlays that we create | 18:13 |
mnaser | yeah it looks like it's not really a possibility | 18:13 |
mnaser | :< | 18:13 |
sean-k-mooney | mnaser: mdbooth and kashyap should be able to confirm tomorrow when they are back online | 18:14 |
mnaser | i'll wait to hear | 18:14 |
mnaser | now to find ways to benchmark this server | 18:14 |
mnaser | server/vm that is | 18:14 |
mnaser | http://paste.openstack.org/show/731425/ | 18:15 |
sean-k-mooney | is that a vm with 468G of ram | 18:16 |
sean-k-mooney | sorry 472 | 18:16 |
sean-k-mooney | i also like the insane amount of gpus | 18:17 |
sean-k-mooney | may i suggest you use it to play minecraft, totally how you should benchmark | 18:17 |
sean-k-mooney | mnaser: also i hear bitcoin is a thing :) | 18:17 |
mnaser | sean-k-mooney: aha, we're rolling out gpus and we have instances with 472G of ram, 48 (dedicated) threads, 1.8T of PCI-e NVMe storage.. | 18:18 |
sean-k-mooney | mnaser: actully on a serious note you are an operator of a cloud with vgpus correct | 18:18 |
mnaser | sean-k-mooney: no vgpu support, only dedicated gpus (as far as we've planned) | 18:18 |
*** spatel has joined #openstack-nova | 18:19 | |
mnaser | part of this is MAYBE seeing if we can get some vGPU CI.. if possible, but i hear there are some more complicated reasons why its not possible | 18:19 |
sean-k-mooney | ah well does the lack of vgpu numa affinity affect your decision to use vgpus or deploy gpus in the cloud in general | 18:19 |
sean-k-mooney | mnaser: actually it might be possible using complicated tricks | 18:19 |
sean-k-mooney | e.g. nested virt + q35 chipset + viommu + pci passthrough of the physical gpu PF to the host vm | 18:20 |
mnaser | i think we're starting to roll things out by having dedicated gpus to see market demand for it (we've had some). unfortunately the other thing that's coming to mind is i'm thinking that users who need gpu levels of performance probably would want 100% of it | 18:20 |
mnaser | we can make nested virt available for gpu instances so maybe thats possible | 18:21 |
sean-k-mooney | mnaser: have you talked to bauzas about possible vgpu ci? | 18:21 |
*** imacdonn has quit IRC | 18:21 | |
*** imacdonn has joined #openstack-nova | 18:21 | |
mnaser | sean-k-mooney: we briefly talked about it.. dansmith mentioned concerns about iommu and stuff that's beyond my level of comprehension :) | 18:21 |
mnaser | but we plan to provide at least 1 or 2 instances to openstack CI *if* there's a use case that makes sense | 18:22 |
dansmith | mnaser: he said viommu, so if that's a thing now then maybe it's doable | 18:22 |
sean-k-mooney | dansmith: yes it is but we have not enabled it in nova yet | 18:22 |
sean-k-mooney | but its trivial so we could | 18:23 |
sean-k-mooney | well its a flavor extraspec + xml generation and other crap, but its not technically very hard to do, we just have not done it yet | 18:23 |
*** pcaruana has joined #openstack-nova | 18:24 | |
mnaser | i'd be more than happy to provide 1 or 2 instances with a gpu | 18:24 |
*** itlinux has joined #openstack-nova | 18:24 | |
sean-k-mooney | dansmith: it was added in libvirt 2.1 and qemu 3.4 https://libvirt.org/formatdomain.html#elementsIommu | 18:25 |
* dansmith nods | 18:25 | |
nicolasbock | I had to also update `instances.node` but then the allocation was updated correctly | 18:26 |
sean-k-mooney | mnaser: thats very generous. it would certainly help if we could actually test vgpu in the upstream ci, even if it was an experimental job that did not run on all patches | 18:26 |
mnaser | while i wrap things up here i can push up a patch to add 1 or 2. we'll probably do it with min-servers: 0 and max-servers: 2 to start with | 18:27 |
*** spatel has quit IRC | 18:28 | |
mriedem | efried: i've replied in https://review.openstack.org/#/c/606122/ | 18:28 |
*** spatel has joined #openstack-nova | 18:28 | |
efried | ack | 18:29 |
spatel | sean-k-mooney: currently i have "intel_iommu=on" in grub.conf, should i add "iommu=pt" too? | 18:29 |
sean-k-mooney | spatel: "iommu=pt" is not requried but advised | 18:30 |
efried | mriedem: +2 | 18:30 |
spatel | will add that :) | 18:30 |
sean-k-mooney | spatel: this is my cmdline on my sriov systems BOOT_IMAGE=/vmlinuz-3.10.0-862.11.6.el7.x86_64 root=UUID=2cca5edf-cbcc-4f0d-91df-df438bbd56c5 ro crashkernel=auto rhgb quiet intel_iommu=on iommu=pt pci=assign-busses,realloc | 18:30 |
spatel | are you using SR-IOV? | 18:31 |
spatel | or DPDK? | 18:31 |
mriedem | efried: thanks | 18:31 |
mriedem | lazy-load can be a cruel mistress | 18:31 |
sean-k-mooney | spatel: pci=assign-busses,realloc is to work around some hardware bugs where my bios does not allocate enough iommu space | 18:31 |
efried | srsly | 18:31 |
spatel | nice! | 18:32 |
sean-k-mooney | spatel: iommu=pt is need for dpdk but not sriov | 18:32 |
spatel | oh! make sense | 18:32 |
sean-k-mooney | i enable it always so i can deploy both and swap between them | 18:32 |
spatel | sean-k-mooney: i have created new flavor (15 vCPU / 14G memory ) and i got this error | 18:32 |
spatel | ERROR (BadRequest): Instance CPUs and/or memory cannot be evenly distributed across instance NUMA nodes. Explicit assignment of CPUs and memory to nodes is required (HTTP 400) (Request-ID: req-400663e1-75d1-4bbc-a06b-07dcfd845be6) | 18:32 |
*** cdent has quit IRC | 18:33 | |
spatel | This is what i have in flavor hw:cpu_policy='dedicated', hw:numa_nodes='2' | 18:33 |
sean-k-mooney | spatel: yes the error could be improved. the vcpus need to be divisible by the number of numa nodes, otherwise you have to tell us how many cpus to put on each numa node | 18:33 |
*** mvkr has joined #openstack-nova | 18:34 | |
sean-k-mooney | spatel: so i would just set it to 14 vcpus and 14G memory | 18:34 |
spatel | cool!! | 18:34 |
spatel | doing it | 18:35 |
*** artom has joined #openstack-nova | 18:35 | |
sean-k-mooney | spatel: since you are optimising your flavors and given your usecase, i would also recommend enabling hugepage memory for the vm | 18:35 |
sean-k-mooney | it will give you a 30-40% performance boost in many workloads but requires you to allocate hugepages on the host first, ideally via the kernel command line | 18:36 |
spatel | I have this setting in grub "hugepagesz=2M hugepages=2048 transparent_hugepage=never" | 18:36 |
sean-k-mooney | ah cool, that will only allocate 4G of hugepages from the 32G you have total. | 18:37 |
spatel | one more question i have 32G memory so what number should be good for number of pages? | 18:37 |
spatel | yes i have 32G memory | 18:37 |
spatel | i heard 1G is better for hugepage | 18:38 |
sean-k-mooney | haha i was getting to that next. :) i would recommend between 24-28G of hugepages, leaving 6-8G for the host | 18:38 |
sean-k-mooney | spatel: it depends; for some workloads yes, for most it does not matter | 18:38 |
sean-k-mooney | hugepages cannot be subdivided, so if you use 1G hugepages the ram in your flavor must be a multiple of 1G | 18:39 |
spatel | my application doesn't need lots of memory because its RTP traffic voip | 18:39 |
spatel | hmm! make sense | 18:39 |
sean-k-mooney | spatel: in your case i doubt you will see a difference and 2MB hugepages will give you more granularity | 18:40 |
spatel | lets stick to 2M then :) | 18:40 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add post-test hook for testing evacuate https://review.openstack.org/602174 | 18:40 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add volume-backed evacuate test https://review.openstack.org/604397 | 18:40 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add functional regression test for bug 1794996 https://review.openstack.org/606106 | 18:41 |
openstack | bug 1794996 in OpenStack Compute (nova) "_destroy_evacuated_instances fails and kills n-cpu startup if lazy-loading flavor on a deleted instance" [High,In progress] https://launchpad.net/bugs/1794996 - Assigned to Matt Riedemann (mriedem) | 18:41 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Fix InstanceNotFound during _destroy_evacuated_instances https://review.openstack.org/606122 | 18:41 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Run evacuate tests with local/lvm and shared/rbd storage https://review.openstack.org/604400 | 18:41 |
spatel | sean-k-mooney: should i use this? hugepagesz=2M hugepages=15360 | 18:41 |
spatel | it will give 30G | 18:41 |
spatel | let me try to make it 28G | 18:41 |
spatel | keep 4G for OS | 18:42 |
sean-k-mooney | the hugepage memory will not be available to normal os processes, so 2G is likely too tight for a compute node | 18:42 |
*** tbachman has quit IRC | 18:43 | |
sean-k-mooney | 4G should be ok but i used to give 6G as my safety margin; that said i did not need that much of a margin | 18:43 |
*** artom has quit IRC | 18:43 | |
spatel | In that case let me give 8G to OS (keep 24G for VM) | 18:44 |
sean-k-mooney | spatel: i would set it to 12288 | 18:44 |
sean-k-mooney | which is 24G | 18:44 |
spatel | hugepagesz=2M hugepages=12288 - DONE! going to reboot compute node | 18:45 |
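The sizing worked out above can be sketched as shell arithmetic: 32G total minus 8G for the host, divided by the 2M page size, gives the kernel cmdline value. This only echoes the parameters rather than editing GRUB config, and the variable names are illustrative.

```shell
# Hedged sketch of the hugepage sizing discussed above (32G host, 8G reserved).
TOTAL_G=32
HOST_RESERVE_G=8
PAGE_SIZE_M=2
# (32 - 8) * 1024 MB / 2 MB per page = 12288 pages
PAGES=$(( (TOTAL_G - HOST_RESERVE_G) * 1024 / PAGE_SIZE_M ))
CMDLINE="hugepagesz=${PAGE_SIZE_M}M hugepages=${PAGES}"
echo "$CMDLINE"
# After reboot, verify with: grep HugePages_Total /proc/meminfo
```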
spatel | Do you use isolcpus= CPUAffinity ? | 18:45 |
spatel | I was reading about that not sure i need to worry about that or not | 18:45 |
sean-k-mooney | i would then also reduce the max size vms to 10 or 12 GB ram for your largest flavor so you can always boot at least 2 of them | 18:46 |
sean-k-mooney | isolcpus is not the same as CPUAffinity | 18:46 |
sean-k-mooney | i generally avoid isolcpus= it is a rather large hammer to reach for | 18:47 |
mriedem | bauzas: i've -2ed https://review.openstack.org/#/c/599208/ as we discussed yesterday | 18:47 |
sean-k-mooney | it should only be used for realtime instances even then its tricky to use correctly | 18:47 |
*** artom has joined #openstack-nova | 18:47 | |
spatel | ok! got it | 18:47 |
*** liuyulong has quit IRC | 18:47 | |
*** artom has quit IRC | 18:47 | |
sean-k-mooney | spatel: generally i would only suggest using it to isolate cores allocated to ovs-dpdk if you choose to deploy it | 18:47 |
*** artom has joined #openstack-nova | 18:48 | |
sean-k-mooney | spatel: dont get me wrong, isolcpus= has a place but it's only something i reach for when i have no other options left and i really really need it | 18:48 |
spatel | I will soon deploy dpdk (believe me) | 18:49 |
mnaser | sean-k-mooney: https://review.openstack.org/#/c/607686/ .. ill push up a patch to test things out when possible (or at least something to confirm its working) | 18:49 |
spatel | in flavor i should set hw:mem_page_size='2048' right ? | 18:50 |
mnaser | so maybe if you want to start figuring out nova dependencies | 18:50 |
*** gyee has quit IRC | 18:50 | |
*** pcaruana has quit IRC | 18:50 | |
*** s10 has joined #openstack-nova | 18:52 | |
dansmith | mriedem: melwitt tssurya: cells meeting today? I have an appointment the hour before, but I will probably be back in time | 18:52 |
sean-k-mooney | spatel: you can but i prefer setting hw:mem_page_size=large | 18:52 |
sean-k-mooney | spatel: that will work with both 1G and 2MB hugepages | 18:53 |
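On the flavor side, the setting sean-k-mooney recommends would look something like this; the flavor name "voip.small" is an assumption for illustration, not from the log.

```shell
# Illustrative only: apply the discussed extra spec to a made-up flavor.
# hw:mem_page_size=large matches whatever hugepage size the host exposes,
# unlike a hard-coded hw:mem_page_size=2048.
openstack flavor set voip.small --property hw:mem_page_size=large
```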
spatel | done! let me do that | 18:53 |
dansmith | side note, mriedem melwitt: This is easy early utility stuff we can merge in front of the down cell stuff: https://review.openstack.org/#/c/594947/ | 18:53 |
tssurya | dansmith: the most important question I had was the best way to get the "type" of exception from the utility ^ | 18:54 |
dansmith | type? | 18:54 |
tssurya | we could also do it during the meeting if others also have topics | 18:54 |
mriedem | dansmith: i was holding off on that one until i knew what was going on further in the series | 18:54 |
tssurya | yea for instance a Timeout/DBConnectionError exception versus InstanceNotFound exception | 18:54 |
tssurya | as of now we always return the "raised_exception_sentinel" which is not that useful | 18:55 |
tssurya | because based on the type of exception we have to handle it differently | 18:55 |
nicolasbock | Fixing the migration is more difficult it seems: I successfully updated the DB with the correct hypervisor and `server show` was now showing the correct hypervisor information | 18:55 |
mriedem | please hold, i have to sell something to a craigslist weirdo real quick | 18:55 |
dansmith | tssurya: by timeout you mean an rpc timeout, not the did_not_respond_sentinel I assume? | 18:55 |
nicolasbock | I ran `server migrate` which failed with `[Errno 2] No such file or directory: '/var/lib/nova/instances/2aa3a324-bf22-4e0c-912a-d7c52f59f1fd/disk` | 18:55 |
nicolasbock | So the disk didn't make it in the first migration | 18:55 |
nicolasbock | I verified that the disk is still on the old host | 18:56 |
* melwitt listens | 18:56 | |
sean-k-mooney | mnaser: that spec would allow testing quite a lot of features, especially if it supported nested virt | 18:56 |
nicolasbock | Since I am in the middle of open heart surgery anyway I figured I just rsync the disk to the current hypervisor | 18:56 |
tssurya | dansmith: TimeOut was just an example, my main problem is to filter the "InstanceNotFound" from others for nova show | 18:56 |
nicolasbock | So that worked | 18:56 |
mnaser | sean-k-mooney: these vms have nested virt | 18:56 |
nicolasbock | However, migrate is now refusing to migrate since the VM is in an ERROR state | 18:56 |
dansmith | tssurya: yeah | 18:57 |
mnaser | nicolasbock: nova reset-state --active | 18:57 |
nicolasbock | I can't `nova reset-state` either, it says `Reset state for server 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd succeeded; new state is error` | 18:57 |
tssurya | as of now, when we get the InstanceNotFound, the utility hides this and returns the sentinel, so I try to go and make a minimal construct when I shouldn't be | 18:57 |
nicolasbock | which isn't all that helpful :( | 18:57 |
dansmith | tssurya: probably have to get away from the sentinel object I guess | 18:57 |
mnaser | `--active` | 18:57 |
dansmith | tssurya: which is going to be a mess | 18:57 |
tssurya | melwitt and I had a brief discussion | 18:57 |
melwitt | dansmith, tssurya: sean-k-mooney proposed this class as a way to be able to return exception objects https://review.openstack.org/605251 | 18:57 |
tssurya | the other day | 18:57 |
nicolasbock | Yeah mnaser !!! | 18:57 |
*** spatel has quit IRC | 18:58 | |
* tssurya looking | 18:58 | |
nicolasbock | I hadn't considered that since `--active Request the server be reset to "active" state instead of "error" state (the default).` | 18:58 |
*** efried has quit IRC | 18:58 | |
dansmith | um | 18:58 |
*** spatel has joined #openstack-nova | 18:58 | |
nicolasbock | I guess `--active` isn't the default after all | 18:58 |
dansmith | seems a lot overkill :) | 18:59 |
sean-k-mooney | mnaser: do you provide any other custom nodes? i dont know if you care about ovs-dpdk or cpu pinning but would you be ok if we used that or a slightly different flavor to maybe test those features in the gate? | 18:59 |
dansmith | melwitt: tssurya: it would be trivial to just use the exception as the sentinel in the response, and we just check to see if the result isinstance(thing, Exception) | 18:59 |
mriedem | nicolasbock: that disk not found with cold migration sounds like a bug i've seen before that is fixed, but had to do with shared storage and volume-backed instances | 18:59 |
melwitt | dansmith: comment on the review :) it came about because I said something like, can we return the exception object in addition to the sentinel, in a tuple or something | 18:59 |
dansmith | and then you have the exception itself | 18:59 |
mnaser | we are slowly rolling out nested virt across our entire fleet but that is something to discuss more with the infra team i think | 18:59 |
mriedem | nicolasbock: but likely not fixed on newton | 18:59 |
melwitt | dansmith: yeah, that was my other suggestion. I had two ideas: drop the sentinel and check isinstance or keep the sentinel and have tuples | 19:00 |
tssurya | dansmith: right, that would be simple, is it okay to change the utility's interface now? | 19:00 |
nicolasbock | ok, do you happen to remember the review this was fixed in mriedem ? Maybe I can backport? | 19:00 |
mriedem | looking | 19:00 |
dansmith | melwitt: no reason for the sentinel I don't think | 19:00 |
*** efried has joined #openstack-nova | 19:00 | |
nicolasbock | Thanks mriedem | 19:01 |
dansmith | anything that isinstance(Exception) is... an error, so... | 19:01 |
nicolasbock | mnaser: it worked! The VM has migrated to a new host | 19:01 |
melwitt | dansmith: yeah, that's what I was thinking | 19:01 |
sean-k-mooney | mnaser: for ovs-dpdk and cpu pinning/hugepages we dont need nvme or gpus but we do need nested virt and a vm with multiple numa nodes. it is something that i agree i would love to discuss with infra. | 19:01 |
mnaser | yeah, we'd have to talk it out with infra | 19:01 |
melwitt | dansmith: but sean-k-mooney was thinking checking isinstance was an anti-pattern of some kind | 19:01 |
sean-k-mooney | melwitt: sorry i should read the scroll-back | 19:02 |
melwitt | sean-k-mooney: we're just talking about the "return exceptions from scatter-gather" thing | 19:02 |
dansmith | melwitt: overengineering is an anti-pattern :) | 19:02 |
tssurya | sean-k-mooney: its about this: https://review.openstack.org/#/c/605251/ | 19:02 |
mriedem | nicolasbock: https://review.openstack.org/#/q/Ib10081150e125961cba19cfa821bddfac4614408 is what i'm thinking of | 19:03 |
nicolasbock | Is it ok that the disk is still on the old host after migration? | 19:03 |
melwitt | sean-k-mooney: dansmith suggested the same thing I suggested when we first talked about it, just return exception objects instead of the sentinel and check isinstance(thing, Exception) to know whether an error was returned or not | 19:03 |
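The approach melwitt and dansmith describe can be sketched as follows: the per-cell worker returns the caught exception object itself instead of a sentinel, and the caller uses `isinstance()` to spot failed cells. The cell names, the fake query function, and the thread-pool wrapper are stand-ins for illustration, not nova's real `scatter_gather_cells()`.

```python
# Hedged sketch: return exceptions from a scatter-gather instead of a sentinel.
from concurrent.futures import ThreadPoolExecutor

def query_cell(cell):
    # Stand-in for a per-cell DB call; one cell fails.
    if cell == "cell-down":
        raise TimeoutError("cell %s did not respond" % cell)
    return {"cell": cell, "instance_count": 3}

def scatter_gather(cells):
    results = {}
    with ThreadPoolExecutor(max_workers=len(cells)) as pool:
        futures = {cell: pool.submit(query_cell, cell) for cell in cells}
        for cell, future in futures.items():
            try:
                results[cell] = future.result()
            except Exception as exp:
                # Return the exception itself as that cell's result; anything
                # that is an Exception instance is treated as an error.
                results[cell] = exp
    return results

results = scatter_gather(["cell1", "cell-down"])
errors = {c: r for c, r in results.items() if isinstance(r, Exception)}
```

Callers then branch on the exception type (e.g. `InstanceNotFound` vs a timeout), which is exactly what the sentinel object could not express.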
nicolasbock | Thanks mriedem | 19:03 |
sean-k-mooney | dansmith: well i was porting a standard class from C++ to python. returning an exception has some weird side effects in python 2 | 19:03 |
dansmith | sean-k-mooney: I have no idea what weird side effect you mean, other than that re-raising it doesn't keep the exception context properly, but we won't be doing that here | 19:04 |
dansmith | sys.exc_info I mean | 19:04 |
nicolasbock | mriedem: gerrit's cherry-pick doesn't seem to know Newton. Is that because Newton is EOL'ed? | 19:04 |
sean-k-mooney | melwitt: so the exception object in python 2 has a reference to the stack frame from which it was first thrown, so the garbage collector can't deallocate it or any locks. sys.exc_info and returning it is fine | 19:04 |
mriedem | nicolasbock: correct, newton is eol upstream | 19:05 |
mriedem | nicolasbock: note that that change is also building on top of two other fixes | 19:06 |
mriedem | called out in the commit message | 19:06 |
sean-k-mooney | dansmith: https://www.python.org/dev/peps/pep-0344/#open-issue-garbage-collection | 19:06 |
sean-k-mooney | dansmith: if we call sys.exc_info() and return the tuple as the sentinel that is fine however | 19:06 |
nicolasbock | Thanks mriedem , I will apply the fix in our vendor packages only then | 19:06 |
nicolasbock | Thanks all for the help with the "lost" VM! | 19:07 |
dansmith | sean-k-mooney: how is returning it any different than encapsulating it in your object here? | 19:07 |
dansmith | from a GC perspective | 19:07 |
sean-k-mooney | dansmith: if you dont raise the exception and catch it, it does not have the reference to the stack frame, so return ValueError("invalid data") is fine | 19:08 |
mriedem | nicolasbock: well, you probably should verify that the reason you got into this mess in the first place was due to one of those bugs | 19:08 |
mriedem | but whatever you want to do downstream is fine with me :) | 19:08 |
sean-k-mooney | "except Exception as e: return e" is not | 19:09 |
nicolasbock | Yes of course :) | 19:09 |
sean-k-mooney | except Exception: return sys.exc_info() is also fine | 19:09 |
spatel | sean-k-mooney: "pci=assign-busses,realloc" was causing issue my server got hung at boot, as soon as i remove that it works.. I have HP DL360p G8 | 19:09 |
dansmith | sean-k-mooney: but this result object of yours is just going to swallow the exception that we get from our handler right? | 19:09 |
dansmith | sean-k-mooney: so it's still pinning the reference to the stack | 19:10 |
sean-k-mooney | dansmith: ya i was planning to extend it to do the right thing on each python version. | 19:11 |
sean-k-mooney | this is only an issue on python2 | 19:11 |
sean-k-mooney | they fix it in python 3 so you can just return the exception | 19:11 |
dansmith | we could trivially re-construct the exception since we know it inherits from NovaException and has very specific characteristics | 19:11 |
dansmith | since it's only py2, and since py3 is the future, and since this is a suuuper corner case that only "may" have issues with some GC being delayed... I would tend to punt on caring about this entirely | 19:12 |
dansmith | but | 19:12 |
sean-k-mooney | dansmith: yep but if we are going to do that i thought it would be nice to hide that in a result class that does the right thing | 19:12 |
dansmith | just re-creating the exception and returning it would be fine for our purposes | 19:12 |
melwitt | tssurya: sorry, I'm not really getting the usefulness of the single cell scatter-gather from the commit message. how does it help for down cell? | 19:13 |
sean-k-mooney | spatel: ya dont add that. it was specifically to work around a hardware bug in my old servers | 19:13 |
dansmith | IMHO we can punt and not worry about this | 19:13 |
spatel | roger! | 19:13 |
spatel | i took it out | 19:13 |
tssurya | melwitt: https://review.openstack.org/#/c/591658/3/nova/compute/api.py@2323 | 19:14 |
tssurya | we could just directly use the scatter_gather_cells(), just that it looked bad | 19:14 |
mriedem | jaypipes: i have crushed your soul https://review.openstack.org/#/c/607626/ | 19:14 |
melwitt | tssurya: I see, to get the automatic "wait for this amount of time before timing out" part. thanks | 19:15 |
tssurya | melwitt: we will be using it for Instance.get_by_uuid for the nova show part | 19:15 |
sean-k-mooney | dansmith: so ya i'll likely work on https://review.openstack.org/#/c/605251/2 a little more in my spare time later this week, as i want that class as a tool in my toolbox for the future, but we may not need it for this use case. | 19:17 |
tssurya | dansmith, sean-k-mooney: so.. which way are we agreeing on? | 19:17 |
dansmith | tssurya: I vote for return the exception and boil the ocean later :) | 19:17 |
melwitt | so what's all this reconstruction talk? in scatter-gather, when we catch the exception from the cell, are you saying we have to do more than just save the "as exp" part? | 19:18 |
tssurya | dansmith: ack, | 19:18 |
dansmith | melwitt: we do not | 19:18 |
dansmith | melwitt: we could trivially if we want | 19:18 |
tssurya | melwitt: as far as I understood just returning it should be enough | 19:18 |
melwitt | mmkay | 19:18 |
dansmith | just return exp.__class__(exp.args) would be enough | 19:19 |
dansmith | but it'd need a comment about why | 19:19 |
dansmith | and if we don't, then.. not | 19:19 |
melwitt | ok, that's what I meant, we can't just return exp, we have to do that step | 19:19 |
dansmith | we've just spawned a thread at this point, and most of these things are single DB calls, which means the stack being pinned by the exception is tiny | 19:19 |
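dansmith's reconstruction one-liner, spelled out (with the args tuple unpacked when re-calling the constructor): building a fresh exception of the same class carries the message across without pinning the original stack frame.

```python
# Hedged sketch of the optional reconstruction step discussed above.
try:
    raise ValueError("bad cell data")
except ValueError as exp:
    # Fresh instance of the same class: same args, no traceback attached.
    fresh = exp.__class__(*exp.args)
```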
jaypipes | mriedem: awesome. | 19:20 |
melwitt | ok, yeah. comment if we do that because otherwise I'm not going to remember why | 19:20 |
sean-k-mooney | dansmith: oh so just return a new exception object not the one we caught | 19:20 |
jaypipes | mriedem: it's hard enough already for me to give a rat's ass about a stable branch. :) | 19:20 |
dansmith | melwitt: we don't have to do that step.. we can just return exp. If we want, we can do the reconstruction step (and document it) | 19:21 |
dansmith | melwitt: I would vote for not reconstructing because I think this is super tiny | 19:21 |
tssurya | dansmith: ah got it | 19:21 |
melwitt | ack, thank you | 19:21 |
tssurya | melwitt, sean-k-mooney, dansmith: thanks | 19:21 |
dansmith | soooo, back to the meeting, | 19:22 |
dansmith | I will probably be back, if ya'll want to meet | 19:22 |
tssurya | dansmith: yea are we having one ? | 19:22 |
melwitt | I'm neutral about meeting. I don't have anything special to talk about | 19:22 |
melwitt | mriedem might want to talk about cross-cell stuff? I dunno | 19:22 |
dansmith | mriedem may want to talk about crossing the streams | 19:22 |
dansmith | yeah, t hat | 19:22 |
tssurya | I don't have anything special except some silly bugs | 19:22 |
melwitt | silly bugs? now I'm curious | 19:23 |
sean-k-mooney | just one other comment: we are not holding any locks or file handles, correct, where we raise the exception in the scatter-gather case? | 19:23 |
tssurya | melwitt: https://bugs.launchpad.net/nova/+bug/1794994 | 19:23 |
openstack | Launchpad bug 1794994 in OpenStack Compute (nova) "Update the --max-rows parameter description for nova-manage db archive_deleted_rows" [Low,In progress] - Assigned to Surya Seetharaman (tssurya) | 19:23 |
tssurya | for now I changed it to a doc fix, but I am skeptical about it | 19:23 |
dansmith | sean-k-mooney: we would have just gotten a result from a threadpool of db workers, and they would almost definitely have re-raised outside of any locks | 19:24 |
tssurya | it would be just good to have the API table record removal also in the max-rows | 19:24 |
tssurya | not sure if people care though | 19:24 |
*** spartakos has joined #openstack-nova | 19:25 | |
tssurya | but yea its not super urgent | 19:25 |
tssurya | okay then I will head home now and will be lurking around during the meeting time in case we decide to have one | 19:26 |
melwitt | ok, will read through it. the issue is the command output can be confusing given the treatment of the API records | 19:26 |
sean-k-mooney | dansmith: ok, the stack frame reference keeps stack locals alive, including any file handles or locks, so can we add a comment about the pep issue if we just return the exception, just in case we have issues in the future | 19:26 |
tssurya | melwitt: exactly | 19:26 |
dansmith | sean-k-mooney: yep | 19:26 |
sean-k-mooney | dansmith: i think we will be fine but future me would regret not adding it if we ever have to debug it :) | 19:27 |
*** tbachman has joined #openstack-nova | 19:27 | |
melwitt | tssurya: thanks. this is hard for me to imagine because I can't remember what the archive_deleted_rows output looks like :P will look in the code | 19:28 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/queens: stable-only: fix typo in IVS related privsep method https://review.openstack.org/604817 | 19:28 |
*** spatel has quit IRC | 19:29 | |
*** tssurya has quit IRC | 19:30 | |
mriedem | melwitt: dansmith: i don't really want to talk about cross-cell resize today probably; i suggested to dansmith that i skim my poc with him over a hangout early next week (i'm out tomorrow and friday) | 19:30 |
melwitt | k | 19:31 |
mriedem | tl;dr functional testing shows it working, | 19:31 |
mriedem | but there are a shit load of todos | 19:31 |
*** spatel has joined #openstack-nova | 19:31 | |
mriedem | and the patch is over 2K LOC now | 19:31 |
mriedem | it is definitely not enterprise ready | 19:31 |
melwitt | lol | 19:31 |
mriedem | i have also resorted to taking sleep aids to not wake up at 1am thinking about it... | 19:32 |
*** tbachman has quit IRC | 19:33 | |
mriedem | i've found this helps https://www.youtube.com/watch?v=Lrle0x_DHBM | 19:34 |
melwitt | heh | 19:36 |
*** jding1_ has quit IRC | 19:37 | |
*** spatel has quit IRC | 19:38 | |
*** jackding has joined #openstack-nova | 19:38 | |
sean-k-mooney | mriedem: oh youtube looks kind of weird to me when it's rendering 4:3 aspect ratio videos | 19:43 |
*** spatel has joined #openstack-nova | 19:47 | |
*** s10 has quit IRC | 19:48 | |
*** spatel has quit IRC | 19:56 | |
*** spatel has joined #openstack-nova | 19:58 | |
mriedem | who here knows what actually happens to the guest when you stop/start a vmware/hyperv/xenapi/ironic/powervm VM? | 19:59 |
mriedem | specifically, the root disk of said VMs if it's volume-backed? | 20:00 |
mriedem | efried: does powervm in-tree support boot from volume yet? | 20:00 |
efried | edmondsw: ^ | 20:00 |
efried | looking... | 20:00 |
*** spatel has quit IRC | 20:03 | |
*** spatel has joined #openstack-nova | 20:04 | |
efried | mriedem: Does compute pass destroy_disks=False to the destroy() method if booted from volume? | 20:05 |
efried | mriedem: I assume you're trying to find out whether the disk gets destroyed or not. | 20:06 |
*** artom has quit IRC | 20:07 | |
efried | I can tell you this: In tree, we don't destroy volumes. | 20:07 |
efried | But I don't know whether we support bfv | 20:07 |
mriedem | efried: no not related to that | 20:07 |
mriedem | related to https://review.openstack.org/#/c/600628/ | 20:07 |
mriedem | which i haven't -1ed yet but it's coming | 20:07 |
mriedem | the virt driver doesn't destroy volumes, the compute manager orchestrates the detach and delete if bdm.terminate_on_deletion is True | 20:08 |
mriedem | i'm mostly wondering if the virt driver will disconnect and reconnect volumes on simple stop/start operations | 20:08 |
mriedem | for libvirt, we do - starting around queens or rocky | 20:08 |
efried | We don't disconnect anything on power-off | 20:10 |
*** slaweq has quit IRC | 20:10 | |
*** mlavalle has quit IRC | 20:10 | |
efried | That said, I'm not 100% sure the *platform* retains ownership of that resource in such a way that you couldn't attach it to something else while the instance is powered off. | 20:10 |
efried | Gerald would be better equipped to answer this stuff. But he ain't here. | 20:11 |
*** slaweq has joined #openstack-nova | 20:11 | |
*** mlavalle has joined #openstack-nova | 20:11 | |
*** macza has quit IRC | 20:15 | |
*** macza has joined #openstack-nova | 20:16 | |
edmondsw | mriedem re: bfv for powervm in-tree... I believe the code is close enough that it might work, but it's untested and there is at least one improvement we should make | 20:16 |
*** med_ has joined #openstack-nova | 20:16 | |
*** spartakos has quit IRC | 20:17 | |
edmondsw | mriedem why would you disconnect and reconnect volumes on stop/start? | 20:18 |
sean-k-mooney | edmondsw: the libvirt driver destroys the domain and recreates it on stop/start | 20:20 |
edmondsw | right... why? | 20:21 |
sean-k-mooney | edmondsw: so it's probably done as a side effect of that | 20:21 |
mriedem | comments inline in that spec | 20:21 |
sean-k-mooney | edmondsw: legacy reasons, but we treat stop like delete as far as libvirt is concerned; we dont delete the disk obviously | 20:22 |
sean-k-mooney | we do detach all ports, gpus etc when we shut down the vm but we still retain ownership of them in placement/the resource tracker | 20:23 |
*** gyee has joined #openstack-nova | 20:24 | |
edmondsw | ok. I'll assume "legacy reasons" means there's no reason for other drivers to consider doing that | 20:24 |
*** tbachman has joined #openstack-nova | 20:26 | |
sean-k-mooney | well there is one but its not a good one. if you are using iscsi volumes, detaching the volume on stop reduces memory usage on the iscsi server | 20:26 |
sean-k-mooney | which if its hardware based also means we can potentially free up other hardware resources, but that also means the vm can fail to start back up if something else grabs the last slot | 20:27 |
sean-k-mooney | that said you would have maxed out your cloud storage at that point, so you have bigger issues than one vm not starting | 20:27 |
sean-k-mooney | edmondsw: i dont know if there is an actual use case where you would want to disconnect today but maybe there is | 20:28 |
edmondsw | sean-k-mooney tx for the explanation | 20:28 |
sean-k-mooney | i know some people want to be able to do things with bfv root volumes when the instance is offline too, but i kind of zoned out at the ptg for that conversation. | 20:30 |
mriedem | sean-k-mooney: that is exactly the spec i'm referring to above | 20:30 |
mriedem | and why i'm asking about this | 20:30 |
*** spatel has quit IRC | 20:31 | |
sean-k-mooney | mriedem: ah ok that make more sense. | 20:31 |
mriedem | because i'm pretty sure swapping the root volume while the instance is stopped was not part of the originally approved spec | 20:31 |
mriedem | and s10 got Kevin_Zheng to change it | 20:31 |
mriedem | b/c of how the libvirt driver works | 20:31 |
*** tbachman_ has joined #openstack-nova | 20:31 | |
*** tbachman has quit IRC | 20:31 | |
*** tbachman_ is now known as tbachman | 20:31 | |
mriedem | and i'm asserting that's not a good enough reason... | 20:31 |
*** spatel has joined #openstack-nova | 20:31 | |
sean-k-mooney | right i think having an explicit api to say detach volume for a stopped instance would be better | 20:32 |
sean-k-mooney | e.g. not assuming it's implicitly detached when you stop | 20:32 |
mriedem | well, the virt driver could just refuse to detach the root volume while the instance is stopped | 20:32 |
mriedem | if it doesn't support it, and raise an exception which gets recorded as a fault | 20:32 |
sean-k-mooney | mriedem: it could but did they not want to allow that? | 20:32 |
sean-k-mooney | e.g. detaching the root volume when it's stopped. that would be almost a no-op for libvirt | 20:33 |
mriedem | the spec is proposing that you can swap the root volume while the instance is offloaded or stopped | 20:33 |
*** med_ has quit IRC | 20:33 | |
sean-k-mooney | right but you could do that by creating a new volume, stopping the instance, detaching the root volume and attaching the volume created in step 1, then start | 20:34 |
sean-k-mooney | do you need an explicit api to do the swap as an atomic operation | 20:34 |
mriedem | no, and that is what the spec is proposing | 20:34 |
mriedem | "creating a new volume, stopping the instance, detaching the root volume and attaching the volume created in step 1, then start" | 20:35 |
mriedem | my point is, i don't know that all virt drivers could handle that today for the root volume | 20:35 |
mriedem | while the instance is stopped | 20:35 |
sean-k-mooney | ha ok am perhaps | 20:35 |
sean-k-mooney | i cant think why they could not if they support rebuild | 20:36 |
*** tssurya has joined #openstack-nova | 20:36 | |
*** spatel has quit IRC | 20:36 | |
sean-k-mooney | it's basically the same thing except we are not changing host | 20:36 |
*** spatel has joined #openstack-nova | 20:37 | |
mriedem | rebuild does a driver.destroy and then a driver.spawn | 20:37 |
mriedem | it assumes destruction | 20:37 |
mriedem | i can't say that stop/start assume that same thing | 20:38 |
mriedem | same with shelve/unshelve | 20:38 |
mriedem | shelve does a driver.destroy and unshelve does a driver.spawn | 20:38 |
mriedem | sean-k-mooney: haven't you been working for like 20 straight hours at this point or something? | 20:39 |
mriedem | at what point do you become drunk by exhaustion? | 20:39 |
sean-k-mooney | mriedem: all of this is true. i would be surprised if this could not be supported on multiple hypervisors but its good to check. | 20:39 |
sean-k-mooney | haha not quite but if i get tired enough i do find it harder to concentrate. | 20:40 |
sean-k-mooney | i have been working since 10:30 + i took an hour for lunch but ya im just finishing up for the day | 20:41 |
mriedem | pretty sure you said you were finishing for the day about 4 hours ago | 20:41 |
sean-k-mooney | i did finish rather late last night. ya got distracted with a few things | 20:41 |
sean-k-mooney | i really need to file my ptg expenses tomorrow... i started twice today but then got pulled into email threads + code. | 20:44 |
sean-k-mooney | that's what i was trying to figure out for the last 30 mins but it can wait | 20:44 |
sean-k-mooney | talk to you tomorrow | 20:45 |
melwitt | mriedem, tssurya: I told dansmith we're skipping the meeting based on the earlier convo | 20:46 |
tssurya | melwitt: ack, thanks for the info :) | 20:47 |
melwitt | he's not going to be back in time anyway | 20:47 |
mriedem | he said he would be back in time | 20:47 |
tssurya | wfm, it's too late here anyways, you guys have a good day | 20:47 |
mriedem | <3 broken | 20:47 |
melwitt | heh | 20:48 |
tssurya | :) | 20:48 |
mriedem | https://bugs.launchpad.net/nova/+bug/1795966 | 20:49 |
openstack | Launchpad bug 1795966 in OpenStack Compute (nova) "<class 'oslo_db.exception.DBNonExistentTable'> (HTTP 500)" [Undecided,Invalid] | 20:49 |
melwitt | meanwhile, gd consoleauth. we have an API where you can 'show' your console auth token. and that is making the deprecation nightmare worse. have to figure out if/how to adjust this for the database backend | 20:50 |
*** spatel has quit IRC | 20:50 | |
melwitt | mriedem: that must be the shortest bug report ever | 20:50 |
mriedem | https://developer.openstack.org/api-ref/compute/#create-remote-console ? | 20:50 |
melwitt | https://developer.openstack.org/api-ref/compute/#show-console-connection-information | 20:51 |
mriedem | ah heh | 20:51 |
melwitt | FML | 20:51 |
mriedem | well, | 20:51 |
mriedem | oh heh you can't know which cell to route it to right | 20:52 |
mriedem | b/c the token isn't mapped in the api | 20:52 |
mriedem | you'll have to iterate the cell dbs looking for that token id | 20:52 |
melwitt | no... which I'm trying to remember, what did I find last time I looked at this. arrrrgghh | 20:53 |
*** spartakos has joined #openstack-nova | 20:54 | |
mriedem | hmm, we only store the hashed token in the db right | 20:55 |
mriedem | and that's not what the API would have in it? | 20:55 |
melwitt | yeah only the hashed token. and I think the API takes the unhashed token from the user | 20:55 |
mriedem | ha, cool | 20:56 |
mriedem | for cell in all_cells(): for console_auth_token in all_console_auth_tokens_in_this_cell(): if console_auth_token == req_id: do that thing() | 20:56 |
*** eharney has quit IRC | 20:57 | |
mriedem | i bet we log that unhashed token in the nova-api logs too... | 20:58 |
mriedem | since it's on the path | 20:58 |
mriedem | https://github.com/openstack/nova/blob/master/nova/api/openstack/requestlog.py#L41 | 20:59 |
melwitt | indeed, I can see it in the func test output | 20:59 |
mriedem | ha, cool | 20:59 |
melwitt | 2018-10-03 20:37:40,870 INFO [nova.api.openstack.requestlog] 127.0.0.1 "GET /v2.1/os-console-auth-tokens/714a26ff-d7e6-4698-bc30-9934ebf38807" | 20:59 |
mriedem | well luckily logging credentials isn't a CVE | 21:00 |
*** priteau has quit IRC | 21:01 | |
melwitt | ... | 21:01 |
melwitt | I guess people aren't paying too much attention to this API, myself included | 21:02 |
mriedem | it's admin-only by default | 21:02 |
melwitt | I see, ok | 21:02 |
melwitt | Note "This is only used in Xenserver VNC Proxy." | 21:03 |
melwitt | really? I wonder how | 21:03 |
mriedem | that's for the other 4 | 21:03 |
mriedem | i saw that as well | 21:03 |
mriedem | os-console-auth-tokens was added specifically for rdp consoles for hyperv | 21:03 |
melwitt | O.o | 21:04 |
*** erlon has quit IRC | 21:06 | |
*** spartakos has quit IRC | 21:08 | |
melwitt | yeah, so we could scatter-gather a ConsoleAuthToken.validate(context, token) call and only one will return token object, the others will raise exceptions. that method takes an unhashed token and will hash it before looking for it in the db | 21:09 |
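The lookup melwitt and mriedem describe can be sketched as: hash the user-supplied token, then scan each cell's records for the hash, since only one cell can hold it. The in-memory cell dicts and the choice of SHA-256 are stand-ins for illustration, not nova's real ConsoleAuthToken model or per-cell database query.

```python
# Hedged sketch: find which cell holds a console auth token by its hash.
import hashlib

def _hash(token):
    # Illustrative hash; the real hashing scheme lives in nova's model code.
    return hashlib.sha256(token.encode()).hexdigest()

CELL_DBS = {
    "cell1": {_hash("tok-abc"): {"instance_uuid": "vm-1"}},
    "cell2": {_hash("tok-xyz"): {"instance_uuid": "vm-2"}},
}

def find_console_token(token):
    """Scan every cell for the hashed token; return (cell, record) or None."""
    hashed = _hash(token)
    for cell, records in CELL_DBS.items():
        if hashed in records:
            return cell, records[hashed]
    return None

hit = find_console_token("tok-xyz")
miss = find_console_token("tok-nope")
```

Storing only the hash is also why the API-level request can't be routed directly: the plaintext token in the URL never appears in any cell database.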
mriedem | sure | 21:10 |
mriedem | shitty performance but whatareyougonnado | 21:11 |
mriedem | plus it's admin-only and no one knew it existed | 21:11 |
melwitt | yeah, exactly | 21:11 |
melwitt | haha, right. we have that going for us | 21:11 |
mriedem | is there a bug for this? | 21:11 |
melwitt | I'll add it to the pile o poopatches | 21:11 |
melwitt | no | 21:11 |
melwitt | I'll open one | 21:11 |
mriedem | cool. not sure if we should report the token logging thing or just pretend i never said it. | 21:12 |
melwitt | I was just making the changes to only access consoleauth if [workarounds] and ran into this in the func tests | 21:12 |
melwitt | so I'm doing really good here | 21:12 |
melwitt | yeah, I'm not sure either. I expect it wouldn't cause a CVE because it's been this way for years | 21:14 |
mriedem | heh, well, we've had things "this way for years" that are CVEs, | 21:19 |
mriedem | but logging credentials and tokens and such isn't considered one of them | 21:19 |
mriedem | it's a "hardening opportunity" | 21:20 |
melwitt | haha, ok | 21:20 |
openstackgerrit | Jay Pipes proposed openstack/nova stable/ocata: Re-use existing ComputeNode on ironic rebalance https://review.openstack.org/607626 | 21:20 |
melwitt | https://bugs.launchpad.net/nova/+bug/1795982 | 21:21 |
openstack | Launchpad bug 1795982 in OpenStack Compute (nova) "/os-console-auth-tokens/{console_token} API doesn't handle the database backend" [High,Triaged] - Assigned to melanie witt (melwitt) | 21:21 |
*** spatel has joined #openstack-nova | 21:24 | |
*** spartakos has joined #openstack-nova | 21:26 | |
*** spatel has quit IRC | 21:29 | |
melwitt | so, these other console create/delete/get APIs are connected to cell database models, with nothing at the API level to target cells for the consoles | 21:30 |
*** slaweq has quit IRC | 21:30 | |
melwitt | "nova-console, which is a XenAPI-specific service that most recent VNC proxy architectures do not use." | 21:31 |
melwitt | it sounds like that should be deprecated. we didn't do anything to handle it in a cells v2 world | 21:32 |
melwitt | maybe I should send something to the ML to ask about it | 21:35 |
*** awaugama has quit IRC | 21:43 | |
*** slagle has quit IRC | 21:44 | |
mriedem | i thought the xvp stuff was xen-only | 21:45 |
*** takashin has joined #openstack-nova | 21:45 | |
melwitt | yeah, the nova-console service is xen-only. but if someone ran multi-cell with xen, the nova-console part wouldn't work right | 21:46 |
mriedem | but yeah this is clearly busted in a cells v2 world | 21:46 |
*** tbachman has quit IRC | 21:47 | |
melwitt | so the question will be, do we cells-v2-ify it or do we deprecate it. tbc, this is for the other APIs, not the consoleauth one I'm fixing | 21:47 |
melwitt | *the other 4 | 21:47 |
mriedem | yeah i know | 21:48 |
*** tbachman has joined #openstack-nova | 21:49 | |
mriedem | idk, i've asked about killing xvp in the past | 21:50 |
mriedem | no one seems to know | 21:50 |
melwitt | ah, ok | 21:50 |
mriedem | i'd say if there are alternatives available for xenapi users, then we should deprecate it | 21:51 |
mriedem | so probably a question for naichuans and BobBall | 21:51 |
melwitt | yeah, that's what I wasn't sure about, because IIUC, xenapi has to use some ancient version of stuff, so they might actually need it because they can't use newer VNC | 21:51 |
mriedem | and yeah send something to the dev and ops MLs | 21:51 |
mriedem | oh b/c of python 2.4? | 21:52 |
mriedem | i might be thinking of something else | 21:52 |
mriedem | i guess start with the ML | 21:52 |
melwitt | maybe. when stephenfin worked on the encrypted console stuff, he had to exclude xenapi from the version requirement IIRC | 21:52 |
mriedem | b/c twould be nice to drop all this crap | 21:52 |
melwitt | yeah | 21:52 |
melwitt | I was thinking of this https://github.com/openstack/nova/blob/master/nova/cmd/novncproxy.py#L40 | 21:54 |
melwitt | so maybe unrelated since that implies xenapi users can use the regular novnc proxy | 21:55 |
*** munimeha1 has quit IRC | 21:56 | |
melwitt | um, this is ancient. and not for exactly the same log you pointed out https://bugs.launchpad.net/nova/+bug/1492140 | 22:01 |
openstack | Launchpad bug 1492140 in OpenStack Compute (nova) "consoleauth token displayed in log file" [Low,In progress] - Assigned to Tristan Cacqueray (tristan-cacqueray) | 22:01 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/queens: Explicitly fail if trying to attach SR-IOV port https://review.openstack.org/607729 | 22:05 |
mriedem | yeah that's for consoleauth | 22:05 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/queens: Ignore VirtDriverNotReady in _sync_power_states periodic task https://review.openstack.org/607730 | 22:07 |
*** s10 has joined #openstack-nova | 22:08 | |
*** itlinux has quit IRC | 22:09 | |
*** mlavalle has quit IRC | 22:11 | |
mriedem | gibi: efried: so i polished off this old dnm devstack test experiment patch to hammer the scheduler to create 1000 instances in a single request which used to give us a ConcurrentUpdate failure during scheduling and creating allocations, and now i can make it fail with consumer generation conflicts http://logs.openstack.org/18/507918/8/check/tempest-full/a9f3849/controller/logs/screen-n-sch.txt.gz?level=TRACE#_Oct_02_23_29_12 | 22:16 |
mriedem | 481 | 22:16 |
mriedem | Oct 02 23:29:12.475481 ubuntu-xenial-limestone-regionone-0002536892 nova-scheduler[22653]: ERROR oslo_messaging.rpc.server [None req-f4fe43ea-d117-4b7d-a3a4-23dcb59f3058 admin admin] Exception during message handling: AllocationDeleteFailed: Failed to delete allocations for consumer 6962f92b-7dca-4912-aeb2-dcae03c4b52e. Error: {"errors": [{"status": 409, "request_id": "req-13df41fe-cb55-49f1-a998-09b34e48f05b", "code": "placement.concurrent_update", "detail": "There was a conflict when trying to complete your request.\n\n consumer generation conflict - expected null but got 3 ", "title": "Conflict"}]} | 22:16 |
efried | mriedem: Are you using any in-flight patches under that, or just master? | 22:17 |
*** spartakos has quit IRC | 22:17 | |
mriedem | i think that is newish right? | 22:17 |
mriedem | master | 22:17 |
efried | Yes, it's new, since the bottom few patches of gibi's consumer gen patches merged. | 22:17 |
efried | mriedem: You may want to try running it on top of https://review.openstack.org/#/c/583667/ and see if that fixes it. | 22:19 |
efried | mriedem: fyi, the ConcurrentUpdate and generation conflict are the same thing, we just switched the error code recently. | 22:20 |
efried | so it's not really "new", it's just wearing a different dress. | 22:20 |
mriedem | that change doesn't look like it would help here, | 22:20 |
efried | orly ynot? | 22:21 |
mriedem | it doesn't use the latest consumer generation when deleting allocations right? | 22:21 |
mriedem | we're failing to submit allocations because the host is full | 22:21 |
mriedem | Oct 02 23:29:12.377430 ubuntu-xenial-limestone-regionone-0002536892 nova-scheduler[22653]: WARNING nova.scheduler.client.report [None req-f4fe43ea-d117-4b7d-a3a4-23dcb59f3058 admin admin] Unable to submit allocation for instance 63ae7544-7693-4749-886b-024dc93f09f9 (409 {"errors": [{"status": 409, "request_id": "req-0e85117c-871c-46c2-9e01-53c84e811b44", "code": "placement.undefined_code", "detail": "There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for 'MEMORY_MB' on resource provider 'b7709a93-f14c-42ed-addf-9736fb721728'. The requested amount would exceed the capacity. ", "title": "Conflict"}]}) | 22:21 |
mriedem | and then the scheduler is trying to cleanup allocations created for previously processed instances in the same request | 22:21 |
mriedem | and fails to do that cleanup b/c the consumer generation changed | 22:22 |
efried | hm, yeah, I didn't notice that the first one was a failure on deletion. | 22:22 |
efried | That window I bitched about us not really closing | 22:22 |
efried | apparently it's big enough for us to actually hit it. | 22:22 |
mriedem | note this is also an extreme case, | 22:23 |
mriedem | i'm creating 1000 instances in a single request | 22:23 |
mriedem | expecting to melt the scheduler | 22:23 |
mriedem | and i do | 22:23 |
efried | I'm trying to figure out where that message is really coming from. "expected null but got 3" <== does this mean we sent null or 3 to the API? | 22:25 |
mriedem | yeah so between the GET and PUT of the allocations, the consumer generation changed and we blew up | 22:25 |
melwitt | I wonder if a normaler number like 100 would do it? because I have heard of people doing that (oath) | 22:25 |
*** rcernin has joined #openstack-nova | 22:25 | |
efried | right, I'm trying to figure out how that happens. What else is mucking with the allocations? | 22:25 |
mriedem | https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L2083 | 22:25 |
mriedem | "If between the GET and the PUT the consumer # generation changes then we raise AllocationDeleteFailed." | 22:25 |
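The GET-then-PUT race being quoted above is a classic optimistic-concurrency scheme. A minimal sketch of the idea (FakeConsumer and put_allocations are invented names, not placement's real implementation; placement stores the generation on a real consumer DB row, as efried notes below):

```python
# Minimal sketch of generation-checked writes: read the current generation,
# then write only if nobody bumped it in between, else raise a conflict.
class ConcurrentUpdate(Exception):
    pass

class FakeConsumer:
    def __init__(self):
        self.generation = 0
        self.allocations = {}

def put_allocations(consumer, expected_generation, allocations):
    # Placement's PUT /allocations/{consumer_uuid} returns 409 when the
    # consumer generation moved between the caller's GET and this PUT.
    if consumer.generation != expected_generation:
        raise ConcurrentUpdate("expected %s but got %s"
                               % (expected_generation, consumer.generation))
    consumer.allocations = allocations
    consumer.generation += 1

c = FakeConsumer()
gen = c.generation                              # the "GET"
put_allocations(c, gen, {"MEMORY_MB": 512})     # ok, generation bumps to 1
try:
    put_allocations(c, gen, {})                 # stale generation: conflicts
except ConcurrentUpdate as e:
    conflict = str(e)
```

This is the same shape as the "expected null but got 3" error in the traceback above: the caller's view of the generation was stale by the time it wrote.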
mriedem | the consumer is just the project/user right? | 22:25 |
melwitt | (my comment was based on the use of the word "extreme") | 22:26 |
mriedem | i mean, i guess we have the instance_uuid for the consumer here. | 22:26 |
efried | That's the aforementioned window I bitched about, the result of which was the comment a couple of lines below that. | 22:26 |
mriedem | is the consumer in placement the unique constraint of the uuid/project_id/user_id? | 22:26 |
efried | There's a real consumer object now | 22:26 |
mriedem | so the other thing here, | 22:26 |
efried | the consumer object (a db table row) has a generation we have to update atomically or die. | 22:27 |
sean-k-mooney | melwitt: i have spawned 350 instance in one request before on newton and it worked fine | 22:27 |
mriedem | is that because we have a retry decorator on the select_destinations rpc call, if we get MessagingTimeout from the scheduler b/c it takes too long to schedule 1000 instances in a single request, it re-sends the request to the scheduler with the same list of instances | 22:27 |
melwitt | sean-k-mooney: ack, that's a data point | 22:27 |
mriedem | so at this point we would have 2 workers trying to create allocations for the same set of instances (consumers) against the same set of providers | 22:27 |
mriedem | which will stomp all over themselves | 22:27 |
efried | oh, okay. That'd do it. | 22:27 |
mriedem | what i'm wondering is if delete_allocation_for_instance should detect the consumer generation conflict and retry | 22:28 |
mriedem | like we do on claim_resources https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L1744 | 22:28 |
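The retry-on-conflict behavior mriedem points at in claim_resources can be sketched as a wrapper like the one below. This is illustrative only (retry_on_conflict and ConsumerGenerationConflict are invented names); the real report-client code re-GETs the allocations and generation before each retry.

```python
# Hedged sketch of retrying an operation that can hit a consumer
# generation conflict, re-attempting a bounded number of times.
import functools

class ConsumerGenerationConflict(Exception):
    pass

def retry_on_conflict(max_attempts=4):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except ConsumerGenerationConflict:
                    if attempt == max_attempts - 1:
                        raise
                    # In the real flow we would re-GET the allocations and
                    # the fresh consumer generation here before retrying.
        return wrapper
    return decorator

attempts = []

@retry_on_conflict()
def flaky_delete():
    # Simulate two stale-generation conflicts before success.
    attempts.append(1)
    if len(attempts) < 3:
        raise ConsumerGenerationConflict()
    return "deleted"
```

As efried notes just below, the team deliberately decided *not* to retry on delete, on the theory that only one actor should be touching an instance's allocations at a time.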
*** s10 has quit IRC | 22:28 | |
efried | mriedem: We decided it should not | 22:31 |
efried | there was a whole long ML thread about it. | 22:31 |
efried | Because, we said, there really shouldn't be more than one thing acting on instance allocations at once, we said. | 22:31 |
efried | IMO the bug is that ^ | 22:32 |
mriedem | yeah i'm writing up the bug now | 22:33 |
efried | mriedem: Do you need the ML thread? | 22:33 |
mriedem | no | 22:33 |
mriedem | https://bugs.launchpad.net/nova/+bug/1795992 | 22:33 |
openstack | Launchpad bug 1795992 in OpenStack Compute (nova) "retry_select_destinations decorator can make a mess with allocations in placement in a large multi-create request" [Medium,Triaged] | 22:33 |
efried | Heh. "make a mess". | 22:34 |
* efried has fond memories of "diaper failure" | 22:34 | |
efried | ...fond because they're *memories*. | 22:34 |
mriedem | total blowout | 22:34 |
efried | One time in IKEA | 22:34 |
mriedem | coincidentally, lbragstad is dealing with that right now | 22:34 |
efried | Oh, did he pop? Good deal. | 22:35 |
mriedem | black split pea soup coming out of everything | 22:35 |
mriedem | let me mind meld with him quick | 22:35 |
melwitt | congrats lbragstad | 22:35 |
sean-k-mooney | oh before i forget i popped back to say i just found out that kernel 4.16 added a new netdevsim driver that supports among other coolthings sriov. would people be ok with me creating an experimental gate job to test sriov using fedora28? | 22:36 |
efried | mriedem: Actually, that patch I mentioned before might possibly make the failure happen earlier in the sequence... | 22:36 |
efried | because surely the allocation is being overwritten | 22:36 |
efried | though it might be subject to the same window-teeniness | 22:37 |
*** tbachman has quit IRC | 22:38 | |
*** tssurya has quit IRC | 22:41 | |
*** macza has quit IRC | 22:54 | |
*** macza_ has joined #openstack-nova | 22:54 | |
*** spartakos has joined #openstack-nova | 23:02 | |
*** tbachman has joined #openstack-nova | 23:07 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Use long_rpc_timeout in select_destinations RPC call https://review.openstack.org/607735 | 23:07 |
mriedem | dansmith: ^ | 23:07 |
* dansmith nods in approval | 23:08 | |
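The idea behind the long_rpc_timeout patch above is to give known-slow RPC calls their own longer timeout instead of raising the global RPC timeout, so a single huge select_destinations call (e.g. the 1000-instance multi-create) doesn't trip MessagingTimeout and trigger the duplicate-request retry mess. A sketch of the selection logic; the timeout values and helper name here are illustrative, not nova's actual defaults:

```python
# Sketch: route only the calls known to be slow through a longer timeout,
# so everything else still fails fast when a service hangs.
DEFAULT_RPC_TIMEOUT = 60     # ordinary RPC calls (illustrative value)
LONG_RPC_TIMEOUT = 1800      # slow calls like scheduling a huge request

def call_timeout(method):
    # Opt-in set of slow methods; select_destinations is the one the
    # patch above targets.
    slow_methods = {"select_destinations"}
    return LONG_RPC_TIMEOUT if method in slow_methods else DEFAULT_RPC_TIMEOUT
```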
*** macza_ has quit IRC | 23:12 | |
*** tbachman has quit IRC | 23:12 | |
mriedem | melwitt: ocata backport here should be ready to go https://review.openstack.org/#/c/605842/ | 23:12 |
melwitt | ok | 23:13 |
*** tbachman has joined #openstack-nova | 23:16 | |
mriedem | lyarwood: if you want to get these live migration ipv6 changes into the final ocata release before we put it into EM mode you'll need to get the pike and ocata backports fixed up https://review.openstack.org/#/q/I1201db996ea6ceaebd49479b298d74585a78b006 | 23:24 |
*** artom has joined #openstack-nova | 23:30 | |
melwitt | TIL unified object string fields are six.text_type i.e. unicode | 23:38 |
melwitt | do we have any things where we compare strings agnostic to bytes vs unicode in unit tests? | 23:46 |
melwitt | this test is asserting the api response as a dict | 23:47 |
melwitt | and if consoleauth served the request, it's bytes strings and if the unified object served the request, it's unicode strings | 23:47 |
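One way to make such a test agnostic to bytes vs. unicode is to normalize both sides before comparing. This is a hypothetical helper sketched for illustration, not an existing nova test utility:

```python
# Sketch: recursively decode byte strings so a dict built from bytes
# (consoleauth path) compares equal to one built from unicode (the
# unified-object path).
def normalize(value):
    if isinstance(value, bytes):
        return value.decode("utf-8")
    if isinstance(value, dict):
        return {normalize(k): normalize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    return value

def assert_equal_agnostic(expected, actual):
    assert normalize(expected) == normalize(actual)
```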
*** itlinux has joined #openstack-nova | 23:48 | |
*** mchlumsky has quit IRC | 23:49 | |
*** slagle has joined #openstack-nova | 23:51 | |
*** mriedem has quit IRC | 23:51 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!