Friday, 2020-07-24

*** jmlowe has joined #openstack-nova		00:01
*** brinzhang0 has joined #openstack-nova		00:06
*** brinzhang_ has quit IRC		00:09
*** aj_mailing has joined #openstack-nova		00:10
*** xek_ has quit IRC		00:14
openstackgerrit	Ghanshyam Mann proposed openstack/nova master: Add new default roles in tenant networks policies https://review.opendev.org/742771	00:15
*** brinzhang has joined #openstack-nova		00:17
openstackgerrit	Ghanshyam Mann proposed openstack/nova master: Add test coverage of tenant networks policies https://review.opendev.org/742765	00:18
openstackgerrit	Ghanshyam Mann proposed openstack/nova master: Introduce scope_types in tenant networks policy https://review.opendev.org/742766	00:19
openstackgerrit	Ghanshyam Mann proposed openstack/nova master: Add new default roles in tenant networks policies https://review.opendev.org/742771	00:19
*** brinzhang0 has quit IRC		00:20
openstackgerrit	Ghanshyam Mann proposed openstack/nova master: Pass the actual target in tenant networks policy https://review.opendev.org/742772	00:23
*** aj_mailing has quit IRC		00:33
*** aj_mailing has joined #openstack-nova		00:34
*** songwenping__ has joined #openstack-nova		00:45
*** xiaolin has joined #openstack-nova		00:53
*** jmlowe has quit IRC		00:58
*** yaawang has quit IRC		01:00
*** jmlowe has joined #openstack-nova		01:00
*** yaawang has joined #openstack-nova		01:00
*** songwenping_ has joined #openstack-nova		01:07
*** songwenping__ has quit IRC		01:10
openstackgerrit	Ghanshyam Mann proposed openstack/nova master: Add test coverage of volumes policies https://review.opendev.org/742773	01:15
*** masterpe has quit IRC		01:16
*** masterpe has joined #openstack-nova		01:20
openstackgerrit	Ghanshyam Mann proposed openstack/nova master: Introduce scope_types in volumes policy https://review.opendev.org/742774	01:22
openstackgerrit	Ghanshyam Mann proposed openstack/nova master: Add new default roles in security_groups policies https://review.opendev.org/742763	01:23
openstackgerrit	Ghanshyam Mann proposed openstack/nova master: Pass the actual target in security_groups policy https://review.opendev.org/742764	01:23
openstackgerrit	Yingji Sun proposed openstack/nova master: Set different VirtualDevice.key https://review.opendev.org/713565	01:38
*** aj_mailing has quit IRC		01:47
openstackgerrit	Ghanshyam Mann proposed openstack/nova master: Add new default roles in volumes policies https://review.opendev.org/742777	01:57
*** yaawang has quit IRC		02:04
openstackgerrit	Ghanshyam Mann proposed openstack/nova master: Pass the actual target in volumes policy https://review.opendev.org/742779	02:07
*** songwenping__ has joined #openstack-nova		02:09
*** songwenping_ has quit IRC		02:12
*** mkrai has joined #openstack-nova		02:20
*** yaawang has joined #openstack-nova		02:24
*** aj_mailing has joined #openstack-nova		02:25
*** dave-mccowan has quit IRC		02:26
alex_xu	stephenfin: gibi, I saw you mentioned the upgrade issue for provider config yaml. I didn't follow the spec discussion in the beginning, could you remind me what it is about? Then I think I can help tony_su go through the problem.	02:28
*** lbragstad_ has joined #openstack-nova		02:30
openstackgerrit	Merged openstack/nova stable/stein: compute: Allow snapshots to be created from PAUSED volume backed instances https://review.opendev.org/729176	02:30
*** aj_mailing has quit IRC		02:31
*** lbragstad has quit IRC		02:32
*** gyee has quit IRC		02:33
*** lbragstad_ has quit IRC		02:35
*** gyee has joined #openstack-nova		02:40
*** Yumeng has joined #openstack-nova		02:43
openstackgerrit	Merged openstack/nova stable/ussuri: objects: Update keypairs when saving an instance https://review.opendev.org/742631	02:50
*** yaawang has quit IRC		03:00
*** yaawang has joined #openstack-nova		03:01
*** huaqiang has joined #openstack-nova		03:09
openstackgerrit	Xinran WANG proposed openstack/nova-specs master: SRIOV SmartNic Support Specification https://review.opendev.org/742785	03:15
*** songwenping__ has quit IRC		03:19
*** songwenping__ has joined #openstack-nova		03:19
*** mriedem has left #openstack-nova		03:23
tony_su	gibi: stephenfin: A status update for provider-config-file patches. I am handling your comments, which are all valuable. Most of them are easy and can be addressed by simply updating the patches. But a few, like refactoring the schema into code or adding new test coverage, require more consideration and more days ...	03:26
*** aj_mailing has joined #openstack-nova		03:27
*** tony_su has left #openstack-nova		03:27
*** tony_su has joined #openstack-nova		03:28
*** yaawang has quit IRC		03:31
openstackgerrit	Yingji Sun proposed openstack/nova master: Set different VirtualDevice.key https://review.opendev.org/713565	03:32
*** yaawang has joined #openstack-nova		03:32
*** brinzhang_ has joined #openstack-nova		03:33
*** brinzhang has quit IRC		03:36
*** psachin has joined #openstack-nova		03:36
*** huaqiang has quit IRC		03:40
*** yaawang has quit IRC		04:09
*** yaawang has joined #openstack-nova		04:09
*** gyee has quit IRC		04:14
openstackgerrit	Xinran WANG proposed openstack/nova-specs master: SRIOV SmartNic Support Specification https://review.opendev.org/742785	04:16
*** aj_mailing has quit IRC		04:28
*** udesale has joined #openstack-nova		04:33
*** mkrai has quit IRC		04:34
*** mkrai has joined #openstack-nova		04:44
*** songwenping_ has joined #openstack-nova		04:54
*** eharney has quit IRC		04:55
*** amodi has quit IRC		04:55
*** songwenping__ has quit IRC		04:57
*** aj_mailing has joined #openstack-nova		05:02
*** eharney has joined #openstack-nova		05:08
*** yaawang has quit IRC		05:11
*** yaawang has joined #openstack-nova		05:12
*** ratailor has joined #openstack-nova		05:14
*** aj_mailing has quit IRC		05:17
*** aj_mailing has joined #openstack-nova		05:25
*** links has joined #openstack-nova		05:37
*** songwenping__ has joined #openstack-nova		05:47
*** songwenping_ has quit IRC		05:51
*** jsuchome has joined #openstack-nova		06:31
*** tinwood is now known as tinwood-afk		06:33
*** yaawang has quit IRC		06:59
*** yaawang has joined #openstack-nova		06:59
*** aj_mailing has quit IRC		07:05
*** aj_mailing has joined #openstack-nova		07:06
*** aj_mailing has quit IRC		07:09
*** tesseract has joined #openstack-nova		07:13
*** ralonsoh has joined #openstack-nova		07:28
gibi	tony_su: don't worry. I appreciate your work on that series and I will look at it when you are ready	07:29
gibi	alex_xu: I'm not sure I can recall an upgrade issue in the provider config series (but it is Friday so my brain is already slow) do you have a reference?	07:31
bauzas	gibi: do you know the answer of https://review.opendev.org/#/c/739211/5/nova/tests/unit/test_crypto.py@21	07:32
bauzas	?	07:32
bauzas	that's a horrible import	07:32
gibi	bauzas: looking...	07:32
bauzas	hmmm, can't find a castellanclient kind of thing	07:34
gibi	bauzas: is castellan just an interface, so that by having castellan we don't have to pull in a whole key manager backend like barbican	07:34
gibi	?	07:34
bauzas	I'm not a specialist of any OpenStack key manager	07:35
bauzas	but if you're right, that explains my readings	07:35
* bauzas goes looking at the castellan docs		07:35
bauzas	mmmm https://docs.openstack.org/castellan/latest/user/index.html#basic-usage	07:36
bauzas	looks like you're right indeed	07:36
*** tosky has joined #openstack-nova		07:37
*** yaawang has quit IRC		07:40
*** yaawang has joined #openstack-nova		07:40
*** mkrai has quit IRC		07:44
*** maciejjozefczyk has joined #openstack-nova		07:45
*** xinranwang__ has joined #openstack-nova		08:05
*** markvoelker has joined #openstack-nova		08:11
*** markvoelker has quit IRC		08:15
*** nightmare_unreal has joined #openstack-nova		08:27
*** mkrai has joined #openstack-nova		08:32
*** xek_ has joined #openstack-nova		08:35
stephenfin	bauzas, gibi: The fix for that o.vo version issue is here, btw https://review.opendev.org/#/c/742650/1	08:41
gibi	stephenfin: thanks	08:41
stephenfin	alex_xu: As gibi said, I don't think anyone noted any upgrade issues with provider.yaml. Perhaps you're confusing it with the investigation of upgrade issues bauzas was doing for the vTPM series?	08:42
gibi	stephenfin: ahh, that was the upgrade discussion yesterday ^^	08:43
gibi	I knew there was something	08:43
gibi	I just did not remember what	08:43
openstackgerrit	Stephen Finucane proposed openstack/nova master: Use compression by default for 'SshDriver' https://review.opendev.org/684393	08:45
*** tinwood-afk is now known as tinwood		08:45
alex_xu	stephenfin: ah, thanks :)	08:46
*** derekh has joined #openstack-nova		08:51
*** dtantsur\|afk is now known as dtantsur		08:53
*** janno has quit IRC		08:53
*** janno has joined #openstack-nova		08:54
*** janno has quit IRC		08:55
*** janno has joined #openstack-nova		08:55
*** ociuhandu has joined #openstack-nova		09:06
*** ratailor_ has joined #openstack-nova		09:06
*** ratailor has quit IRC		09:08
*** ociuhandu has quit IRC		09:09
*** xek_ has quit IRC		09:12
*** jraju__ has joined #openstack-nova		09:23
*** links has quit IRC		09:23
openstackgerrit	Merged openstack/nova master: scheduler: Request vTPM trait based on flavor or image https://review.opendev.org/739210	09:23
openstackgerrit	Merged openstack/nova master: crypto: Add support for creating, destroying vTPM secrets https://review.opendev.org/739211	09:24
openstackgerrit	Merged openstack/nova master: manager: Prevent compute startup on invalid vTPM config https://review.opendev.org/739212	09:24
openstackgerrit	Merged openstack/nova master: tests: Rename tests for '_create_guest_with_network' https://review.opendev.org/740464	09:24
openstackgerrit	Merged openstack/nova master: tests: Move single use constants to their callers https://review.opendev.org/741280	09:24
openstackgerrit	Merged openstack/nova master: tests: Define constants in '_IntegratedTestBase' https://review.opendev.org/741281	09:24
openstackgerrit	Merged openstack/nova master: tests: Remove 'test_servers.ServersTestBase' https://review.opendev.org/741282	09:24
openstackgerrit	Merged openstack/nova master: tests: Add 'PlacementHelperMixin', 'PlacementInstanceHelperMixin' https://review.opendev.org/741283	09:25
openstackgerrit	Merged openstack/nova master: tests: Make '_IntegratedTestBase' subclass 'PlacementInstanceHelperMixin' https://review.opendev.org/741284	09:25
*** mkrai has quit IRC		09:27
*** mkrai_ has joined #openstack-nova		09:27
*** yaawang has quit IRC		09:30
*** yaawang has joined #openstack-nova		09:30
stephenfin	Holy s***, they all merged in one go. No CI failures :O	09:31
openstackgerrit	Stephen Finucane proposed openstack/nova master: Use compression by default for 'SshDriver' https://review.opendev.org/684393	09:31
gibi	stephenfin: that was a nice set	09:31
stephenfin	bauzas, gibi: Can you look at ^ again real quick? Turns out 'scp' cares about the order of arguments. CI caught it for us and will catch it again if it's wrong	09:31
gibi	stephenfin: looking	09:32
stephenfin	(from https://zuul.opendev.org/t/openstack/build/7e8c6c6ddaba44e09a90a847dfe6ee46/log/logs/screen-n-cpu.txt)	09:32
stephenfin	Thanks	09:32
bauzas	ack	09:32
* bauzas wonders then why CI didn't catch it		09:32
stephenfin	It did	09:32
bauzas	hah	09:33
bauzas	fwiw https://linux.die.net/man/1/scp	09:33
stephenfin	I just thought it was intermittent failures and wasn't looking at it often enough to spot the trend :)	09:33
stephenfin	zuul++	09:33
bauzas	stephenfin: i don't see any required ordering in the scp manpage	09:34
stephenfin	bauzas: neither did I, but the CI failure is fairly unambiguous	09:34
stephenfin	probably the implementation of getopt they're using is borked	09:34
bauzas	in theory, you could also scp -rC	09:35
* bauzas prefers tar over nc		09:35
gibi	I can reproduce the ordering requirement of scp locally	09:35
gibi	so the manpage is incomplete :)	09:35
bauzas	gibi: I honestly never used the -C flag	09:35
bauzas	like I said, I tend to use tar over nc when I wanted to transfer large files	09:36
stephenfin	tbf, parsing command line arguments is hard work	09:36
bauzas	waaaaay more efficient	09:36
* stephenfin suggests looking at the bug list for argparse /o\		09:36
gibi	scp is secure, tar + nc is fast; it is a tradeoff :)	09:36
stephenfin	so broken :-(	09:36
stephenfin	to the point that click (which is actually awesome) uses the deprecated optparse. Less magical and more reliable, apparently	09:37
* stephenfin goes back to breaking stuff		09:38
* gibi hugs zuul both for being stable and for catching bugs		09:39
gibi	stephenfin: btw https://that.guru/blog/the-numa-scheduling-story-in-nova/ is a great article that made me think about where and when nova selects the resources to consume	09:42
bauzas	stephenfin: gibi: that's an argparse bug http://paste.openstack.org/show/796277/	09:42
bauzas	definitely not scp-related	09:42
stephenfin	gibi: You can thank sean-k-mooney for most of that. I just spell checked and reorganized :)	09:43
stephenfin	bauzas: Put '-C' at the end	09:43
bauzas	oh that	09:43
stephenfin	the issue isn't with the order of the positionals	09:43
bauzas	of course, it won't work then	09:43
stephenfin	options	09:43
stephenfin	it's with options coming after positionals	09:43
bauzas	you shock me if you thought it would work :p	09:44
stephenfin	but it does in many applications!	09:44
bauzas	but I honestly haven't paid attention to the argparse result :)	09:44
openstackgerrit	Merged openstack/nova master: trivial: Test object backporting against correct version https://review.opendev.org/742650	09:44
bauzas	it NEVER worked with scp then :)	09:44
bauzas	and many BSD commands	09:44
bauzas	(many many)	09:44
stephenfin	bauzas: http://paste.openstack.org/show/796278/	09:46
stephenfin	run that with e.g. 'python test.py 123 MB -b test'	09:46
stephenfin	it'll work just fine	09:46
stephenfin	so optparse (or whatever scp is using) is just plain broken	09:47
stephenfin	but hey, I'm not going to fix it :)	09:47
gibi	yeah 'grep foo ./ -R' works too	09:48
gibi	sean-k-mooney: good article https://that.guru/blog/the-numa-scheduling-story-in-nova/ :)	09:49
bauzas	stephenfin: fyk https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html	09:50
bauzas	tl;dr: options != operands	09:50
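The options-versus-operands point above can be sketched in Python. This is only an illustration under assumptions: the argument list is a hypothetical scp-like command line, and the log does not establish how scp itself parses its arguments. `getopt.getopt` follows the POSIX-style rule (scanning stops at the first operand), while `getopt.gnu_getopt` and `argparse` accept options after positionals.

```python
import argparse
import getopt

# Hypothetical scp-like command line: an option, two operands, a trailing option.
argv = ["-r", "src", "dst", "-C"]

# POSIX-style parsing: scanning stops at the first operand ("src"),
# so the trailing "-C" is returned as an operand, not as an option.
posix_opts, posix_args = getopt.getopt(argv, "rC")

# GNU-style parsing: options and operands may be intermixed.
gnu_opts, gnu_args = getopt.gnu_getopt(argv, "rC")

# argparse also accepts options that come after positionals.
parser = argparse.ArgumentParser(prog="copy-sketch")
parser.add_argument("-r", action="store_true")
parser.add_argument("-C", action="store_true")
parser.add_argument("src")
parser.add_argument("dst")
ns = parser.parse_args(argv)

print(posix_opts, posix_args)
print(gnu_opts, gnu_args)
print(ns)
```

With the POSIX-style parser, a flag placed after the operands silently becomes a third operand, which is one plausible way a trailing '-C' could break a command.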
bauzas	argparse was probably written by Linux geeks who didn't know about UNIX :p	09:51
bauzas	time for a quote	09:51
bauzas	BSD is what you get when a bunch of UNIX hackers sit down to try to port a UNIX system to the PC. Linux is what you get when a bunch of PC hackers sit down and try to write a UNIX system for the PC	09:52
tosky	nice as a quote, even though iirc historically incorrect: when BSD started, there were no PCs	09:53
bauzas	that's not coming from me :)	09:54
bauzas	but I used to play with some BSD OSes in the past, and this pun was very well known	09:54
bauzas	do people know that 'ps' has a very specific POSIX syntax that people can use regardless of the OS?	09:55
*** mkrai_ has quit IRC		10:01
*** markvoelker has joined #openstack-nova		10:03
*** markvoelker has quit IRC		10:08
*** k_mouza has joined #openstack-nova		10:08
stephenfin	bauzas: I was taught to use e.g. 'ps aux' which I think is BSD compatible too	10:08
bauzas	that's correct, and that's the old syntax	10:09
bauzas	we made it forward compatible	10:09
bauzas	whoops	10:09
bauzas	they, not we	10:10
bauzas	I'm not THAT modest	10:10
*** zhanglong has quit IRC		10:10
bauzas	tl;dr: options without the dash come from BSD	10:10
bauzas	and Linux ported them	10:11
bauzas	but in theory, you should always follow the POSIX syntax to be 100% compliant across all platforms	10:11
bauzas	https://askubuntu.com/questions/484982/what-is-the-difference-between-standard-syntax-and-bsd-syntax	10:12
bauzas	or slightly better https://man7.org/linux/man-pages/man1/ps.1.html	10:15
*** spatel has joined #openstack-nova		10:18
*** k_mouza has quit IRC		10:19
*** brinzhang_ has quit IRC		10:20
*** mkrai_ has joined #openstack-nova		10:22
*** spatel has quit IRC		10:22
*** martinkennelly has joined #openstack-nova		10:26
*** k_mouza has joined #openstack-nova		10:29
*** psachin has quit IRC		10:34
*** links has joined #openstack-nova		10:40
*** jraju__ has quit IRC		10:41
*** mkrai_ has quit IRC		10:43
*** mkrai__ has joined #openstack-nova		10:43
*** k_mouza has quit IRC		10:45
*** yaawang has quit IRC		10:50
*** yaawang has joined #openstack-nova		10:51
*** k_mouza has joined #openstack-nova		10:51
*** k_mouza has quit IRC		10:53
*** k_mouza has joined #openstack-nova		10:53
*** ociuhandu has joined #openstack-nova		10:54
*** ociuhandu has quit IRC		10:59
*** k_mouza has quit IRC		11:01
*** Yumeng has quit IRC		11:08
*** udesale_ has joined #openstack-nova		11:11
*** k_mouza has joined #openstack-nova		11:12
*** udesale has quit IRC		11:13
*** k_mouza has quit IRC		11:17
*** mkrai__ has quit IRC		11:42
openstackgerrit	Stephen Finucane proposed openstack/nova master: scheduler: Default request group to None https://review.opendev.org/742651	11:52
openstackgerrit	Stephen Finucane proposed openstack/nova master: tests: Add helpers for suspend, resume and reboot of server https://review.opendev.org/741285	11:52
openstackgerrit	Stephen Finucane proposed openstack/nova master: libvirt: Pass context, instance to '_create_domain' https://review.opendev.org/741286	11:52
openstackgerrit	Stephen Finucane proposed openstack/nova master: api: Reject non-spawn operations for vTPM https://review.opendev.org/741500	11:52
openstackgerrit	Stephen Finucane proposed openstack/nova master: libvirt: Add emulated TPM support to Nova https://review.opendev.org/631363	11:52
openstackgerrit	Stephen Finucane proposed openstack/nova master: docs: Add docs for vTPM support https://review.opendev.org/739213	11:52
openstackgerrit	Stephen Finucane proposed openstack/nova master: Don't unset Instance.old_flavor, new_flavor until necessary https://review.opendev.org/741995	11:52
openstackgerrit	Stephen Finucane proposed openstack/nova master: Add support for resize and cold migration of emulated TPM files https://review.opendev.org/639934	11:52
openstackgerrit	Stephen Finucane proposed openstack/nova master: Add type hints to 'nova.compute.manager' https://review.opendev.org/742863	11:53
openstackgerrit	Stephen Finucane proposed openstack/nova master: privsep: Add support for recursive chown, move_tree operations https://review.opendev.org/742864	11:53
openstackgerrit	Stephen Finucane proposed openstack/nova master: Add type hints to 'nova.virt.libvirt.utils' https://review.opendev.org/742865	11:53
*** markvoelker has joined #openstack-nova		12:04
*** markvoelker has quit IRC		12:05
*** markvoelker has joined #openstack-nova		12:06
*** k_mouza has joined #openstack-nova		12:07
*** k_mouza has quit IRC		12:12
*** k_mouza has joined #openstack-nova		12:18
*** psachin has joined #openstack-nova		12:18
*** k_mouza has quit IRC		12:23
*** derekh has quit IRC		12:24
*** k_mouza has joined #openstack-nova		12:28
*** k_mouza has quit IRC		12:31
*** k_mouza has joined #openstack-nova		12:31
*** ratailor_ has quit IRC		12:34
*** ociuhandu has joined #openstack-nova		12:43
*** ociuhandu has quit IRC		12:47
*** derekh has joined #openstack-nova		12:51
*** lbragstad has joined #openstack-nova		13:05
*** mriedem has joined #openstack-nova		13:07
*** artom has joined #openstack-nova		13:09
*** zigo has quit IRC		13:19
*** gokhani has joined #openstack-nova		13:25
*** ociuhandu has joined #openstack-nova		13:30
*** zigo has joined #openstack-nova		13:31
openstackgerrit	Elod Illes proposed openstack/nova stable/rocky: compute: Allow snapshots to be created from PAUSED volume backed instances https://review.opendev.org/729177	13:41
*** sean-k-mooney has joined #openstack-nova		13:46
openstackgerrit	Artom Lifshitz proposed openstack/nova master: Add regression test for bug 1879787 https://review.opendev.org/741230	13:48
openstack	bug 1879787 in OpenStack Compute (nova) "post_live_migration does not handle Neutron errors" [Medium,In progress] https://launchpad.net/bugs/1879787 - Assigned to Artom Lifshitz (notartom)	13:48
openstackgerrit	Artom Lifshitz proposed openstack/nova master: Handle Neutron errors in _post_live_migration() https://review.opendev.org/729763	13:48
*** gokhani has quit IRC		13:48
openstackgerrit	Merged openstack/nova stable/ussuri: libvirt: Handle VIR_ERR_DEVICE_MISSING when detaching devices https://review.opendev.org/742414	13:49
sean-k-mooney	gibi: hi o/ i was away for a funeral this week so just seeing your comment on the attach/detach patch now. i can take a look at it more closely next week but still pretty burnt out today. hopefully ill be less mentally exhausted after the weekend	13:54
sean-k-mooney	so ya i think you're right, there is a bug related to macvtap detach that is preexisting in the libvirt driver	13:56
sean-k-mooney	well 2 bugs: 1, its not updating the domain correctly because its not finding the device properly, and 2, its not releasing the vf claim because we just dont do that today for sriov detach	13:57
sean-k-mooney	problem 1 is what prevents the vf mac from being reset and the macvtap being removed on detach	13:58
gibi	sean-k-mooney: no worries. take your time to recover	13:58
*** mlavalle has joined #openstack-nova		13:58
gibi	sean-k-mooney: I think I've just found the reason of 2	13:58
gibi	and I think I can fix it	13:58
gibi	I will be away next week	13:59
gibi	so feel free to touch my code or add patches to the series while I'm away and I will continue the week after	13:59
sean-k-mooney	what probably makes sense is to have 3 patches. 1 that blocks detach in the api, then your current one and a final patch for macvtap	13:59
gibi	yes, makes sense to have separate patches for the separate issues	14:00
*** k_mouza has quit IRC		14:00
sean-k-mooney	we also need a patch for direct-physical but it is basically the same issue: the device lookup fails, although it fails for a different reason	14:00
*** psachin has quit IRC		14:01
sean-k-mooney	it fails because the mac is not present rather than the target_dev	14:01
sean-k-mooney	but its still failing in the same if, i believe https://github.com/openstack/nova/blob/4a925cf01ac6ca313ff10c3075a86d65095de299/nova/virt/libvirt/guest.py#L252-L257	14:01
gibi	sean-k-mooney: cool. I haven't had time to try direct-physical yet,	14:02
sean-k-mooney	what i think makes sense is to just have 2 code paths. if its an sriov interface find it by the pci address and remove it	14:03
sean-k-mooney	if not find it by its mac and remove it	14:03
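The two code paths proposed above could look roughly like the following. This is a self-contained sketch, not nova code: `GuestInterface`, `find_interface`, and the `is_sriov` flag are hypothetical names invented for illustration, and how the real driver would decide that a vif is SR-IOV is left open in the discussion.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class GuestInterface:
    """Hypothetical stand-in for a guest interface config object."""
    mac_addr: Optional[str] = None
    pci_address: Optional[str] = None


def find_interface(
    interfaces: Iterable[GuestInterface],
    *,
    is_sriov: bool,
    mac_addr: Optional[str] = None,
    pci_address: Optional[str] = None,
) -> Optional[GuestInterface]:
    """Match SR-IOV interfaces by PCI address, all others by MAC address."""
    for iface in interfaces:
        if is_sriov:
            if pci_address is not None and iface.pci_address == pci_address:
                return iface
        elif mac_addr is not None and iface.mac_addr == mac_addr:
            return iface
    return None


# Example data: one ordinary interface and one SR-IOV device without a MAC.
guest = [
    GuestInterface(mac_addr="fa:16:3e:00:00:01"),
    GuestInterface(pci_address="0000:81:10.6"),
]
by_mac = find_interface(guest, is_sriov=False, mac_addr="fa:16:3e:00:00:01")
by_pci = find_interface(guest, is_sriov=True, pci_address="0000:81:10.6")
missing = find_interface(guest, is_sriov=True, pci_address="0000:81:10.7")
```

Keeping the two lookups separate means a device without a MAC address is never compared by MAC at all.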
*** xek_ has joined #openstack-nova		14:04
sean-k-mooney	ill have to look at the code and see if that makes sense in practice however, as im not sure if we have the vnic_type or vif type available	14:05
gibi	the code that searches for the interface has access to the vif 124: enp129s16f6: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000	14:06
gibi	bah	14:06
gibi	the code that searches for the interface has access to the vif 124: enp129s16f6: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000	14:06
gibi	my copy paste buffer is broken :/	14:07
*** ociuhandu has quit IRC		14:07
sean-k-mooney	ok cool	14:07
gibi	https://github.com/openstack/nova/blob/4a925cf01ac6ca313ff10c3075a86d65095de299/nova/virt/libvirt/guest.py#L252-L257	14:07
*** ociuhandu has joined #openstack-nova		14:07
gibi	nah, this is the place where the matching between the current domain and the vif being detached happens https://github.com/openstack/nova/blob/4a925cf01ac6ca313ff10c3075a86d65095de299/nova/virt/libvirt/guest.py#L252-L257	14:07
sean-k-mooney	same link :) i think you meant https://github.com/openstack/nova/blob/4a925cf01ac6ca313ff10c3075a86d65095de299/nova/virt/libvirt/driver.py#L2199	14:09
sean-k-mooney	and yes it has the vif	14:09
stephenfin	lyarwood: Are you the person I need to shout at for nova-ceph-multistore failing? :P	14:10
sean-k-mooney	so we could add a get_interface_by_pci_address and call that instead for sriov devices.	14:10
gibi	sean-k-mooney: yes both yours and mine point to the code that causes the failure	14:10
stephenfin	jk, but heads up I'm seeing a lot of failures on that today. Haven't investigated yet though	14:10
*** dpawlik2 has quit IRC		14:11
*** k_mouza has joined #openstack-nova		14:11
lyarwood	stephenfin: dansmith introduced it while I was out so no ;)	14:12
lyarwood	stephenfin: what's up?	14:12
openstackgerrit	Balazs Gibizer proposed openstack/nova master: [WIP] Support SRIOV interface attach and detach https://review.opendev.org/740995	14:12
dansmith	stephenfin: link?	14:12
sean-k-mooney	stephenfin: its a modified version of the previous ceph job	14:12
stephenfin	dansmith: https://review.opendev.org/#/c/741286/	14:12
sean-k-mooney	so the tests are the same but the config is slightly different to enable multistore and the image import form copy feature	14:13
dansmith	stephenfin: thanks will look through it in a sec	14:13
gibi	sean-k-mooney: https://review.opendev.org/#/c/740995/5/nova/virt/libvirt/guest.py@240 this change fixes the macvtap detach issue in my env, but I agree that the condition might need a refactoring to have two conditions, one for pci and another for mac	14:14
* lyarwood wonders if this is a space issue again		14:14
sean-k-mooney	gibi: ya so that will work for macvtap but we will still fail for direct-physical	14:15
gibi	sean-k-mooney: yes, probably, haven't tried	14:15
sean-k-mooney	interfaces = self.get_all_devices(	14:15
sean-k-mooney	vconfig.LibvirtConfigGuestInterface)	14:16
sean-k-mooney	that wont return the direct-physical interfaces	14:16
sean-k-mooney	since they are not <interface ...> elements and use <hostdev ...>	14:16
gibi	ohh	14:16
gibi	interesting	14:16
sean-k-mooney	also they dont have a mac in the hostdev element	14:16
sean-k-mooney	libvirt cant pass through a pf with <interface type=hostdev>, only VFs	14:17
sean-k-mooney	and the hostdev element does not have a mac either so interface.mac_addr == cfg.mac_addr would fail	14:18
sean-k-mooney	probably with an attribute error if we got that far	14:18
dansmith	stephenfin: did you look into those fails at all? looks to me like just novalidhost on at least one of the three failed tests, and it's a conflict from placement during scheduling:	14:20
dansmith	https://zuul.opendev.org/t/openstack/build/13d8a055ff1b4be0b627205f4d51d50f/log/controller/logs/screen-n-sch.txt#3493	14:20
dansmith	meaning, are you sure it's just that job failing more? because that fails way before the point where we get to any of the new (i.e. ceph or multistore) stuff	14:21
sean-k-mooney	there are traces in the n-cpu log https://zuul.opendev.org/t/openstack/build/13d8a055ff1b4be0b627205f4d51d50f/log/controller/logs/screen-n-cpu.txt#14267-14323	14:21
sean-k-mooney	nova.exception.ImageNotFound: Image a549f544-e4e3-4f66-962e-03c1514ee21f could not be found	14:21
stephenfin	dansmith: Barely. I'm seeing image retrieval failures in n-cpu	14:21
stephenfin	yeah, those ^	14:21
*** links has quit IRC		14:21
dansmith	hmm, maybe the first test I picked was a rando failure then	14:22
stephenfin	but there are a couple of patches in that series failing and I don't think they're related to the code	14:22
sean-k-mooney	if those tests are uploading new images maybe they are not ready when the boot is started because the import/conversion takes longer or something	14:23
dansmith	ah yeah, I see now	14:23
dansmith	sean-k-mooney: yeah that could be	14:24
dansmith	I think we should still be able to GET the image though	14:24
sean-k-mooney	looks like that is not the case for rescue at least https://github.com/openstack/tempest/blob/257f3b009f7978723a8748f9f5b413aa8eb38e3a/tempest/api/compute/servers/test_server_rescue.py#L55-L67	14:25
dansmith	sean-k-mooney: what is not the case for rescue?	14:26
sean-k-mooney	ya it just does rescue without specifying an image so it will use the image the vm was booted with or the image specified in the config. i wonder if it failed before that	14:26
sean-k-mooney	dansmith: the rescue test is not uploading any images	14:26
dansmith	are you looking at a different fail?	14:26
sean-k-mooney	tempest.api.compute.servers.test_server_rescue.ServerRescueTestJSON.test_rescue_unrescue_instance	14:27
sean-k-mooney	its the second failure in the test report	14:27
dansmith	ack, the first thing you linked is to an ImagesTest not rescue right?	14:27
dansmith	it's definitely doing a snapshot	14:27
sean-k-mooney	actually looking at the server uuid its not in the n-cpu log so the novalidhost looks like it really could not fit	14:29
dansmith	sean-k-mooney: right, that's what I was saying, I just picked poorly on the first test to look at :)	14:29
dansmith	sean-k-mooney: hmm, I see a DELETE of the image just before the failed GET in the glance logs, for that snapshot one, which is odd	14:29
sean-k-mooney	yep Got no allocation candidates from the Placement API.	14:30
sean-k-mooney	oh downstream call	14:31
dansmith	ah	14:31
dansmith	so,	14:31
dansmith	I think that stack trace from sean-k-mooney is a red herring	14:31
dansmith	I think that's an images test that tries to delete the image whilst snapshotting or something	14:31
dansmith	it's not even one of the tests that failed in the testr report :)	14:31
dansmith	all three of those tests are novalidhost	14:32
dansmith	so maybe we're actually reporting something different to placement and running out of disk or something?	14:32
sean-k-mooney	ya maybe	14:32
sean-k-mooney	we have 80G of disk in the ci vms but it may not all be available in /opt	14:33
sean-k-mooney	so i dont know, we might have run out of space	14:33
dansmith	well,	14:34
dansmith	it might be a reporting thing or something and not actually out of space,	14:34
dansmith	because we're not seeing problems, just placement is refusing to find space	14:34
dansmith	Jul 24 12:44:22.575632 ubuntu-bionic-ovh-bhs1-0018770257 devstack@placement-api.service[50512]: DEBUG placement.wsgi_wrapper [req-eeb6d563-2483-4e4f-91e8-2dc3a694ade4 req-c57d5bd6-fc4e-469d-9784-cdfe1652d653 service placement] Placement API returning an error response: Unable to allocate inventory: Unable to create allocation for 'DISK_GB' on resource provider 'e786426a-5ae2-4732-8cf6-16325fd2bf2a'. The requested amount would exceed	14:36
dansmith	the capacity. {{(pid=50513) call_func /opt/stack/placement/placement/wsgi_wrapper.py:31}}	14:37
dansmith	Over capacity for DISK_GB on resource provider e786426a-5ae2-4732-8cf6-16325fd2bf2a. Needed: 1, Used: 10, Capacity: 10.0	14:37
dansmith	10G doesn't sound right	14:37
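The numbers in the placement error above can be checked with a small sketch of the capacity rule placement applies to an inventory, capacity = (total - reserved) * allocation_ratio. The function names are hypothetical, and total=10, reserved=0 and allocation_ratio=1.0 are assumptions chosen to reproduce the reported "Capacity: 10.0"; only Needed, Used and Capacity come from the log.

```python
def capacity(total: int, reserved: int, allocation_ratio: float) -> float:
    """Usable capacity of one inventory record."""
    return (total - reserved) * allocation_ratio


def exceeds_capacity(total: int, reserved: int, allocation_ratio: float,
                     used: int, needed: int) -> bool:
    """True when the new allocation would push usage past capacity."""
    return used + needed > capacity(total, reserved, allocation_ratio)


# Assumed inventory values; Used=10 and Needed=1 are taken from the log line.
cap = capacity(total=10, reserved=0, allocation_ratio=1.0)
over = exceeds_capacity(total=10, reserved=0, allocation_ratio=1.0,
                        used=10, needed=1)
print(cap, over)
```

Under these assumptions the request for 1 more GB is rejected because the provider is already fully allocated, so the open question in the log is why the reported total is only 10G.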
*** mlavalle has quit IRC		14:40
mriedem	random drive by comment but https://review.opendev.org/#/c/586363/	14:40
*** eharney has quit IRC		14:40
mriedem	anyway related to ceph ci jobs?	14:41
dansmith	I'm trying to figure out, but we are running a ceph df right before we report inventory	14:41
* mriedem ducks back into hole		14:41
openstackgerrit	Alex Deiter proposed openstack/nova master: Detach is broken for multi-attached fs-based volumes https://review.opendev.org/741712	14:41
*** mlavalle has joined #openstack-nova		14:43
*** k_mouza has quit IRC		14:43
*** k_mouza has joined #openstack-nova		14:53
*** eharney has joined #openstack-nova		14:53
sean-k-mooney	dansmith: by the way if the traceback is unrelated then we likely have another silent bug as we are not catching the exception in the missing image case	14:54
dansmith	sean-k-mooney: yep	14:54
dansmith	so we're calling ceph df to get the total size of the pool and reporting that	14:55
sean-k-mooney	dansmith: i think you are right that its unrelated	14:55
dansmith	as best I can tell, the ceph is backed by a 24G partition	14:55
*** udesale_ has quit IRC		14:55
dansmith	so I dunno where the 10G is coming from	14:55
sean-k-mooney	this is using the ceph image backend in nova so the local_GB should be the ceph pool size right	14:56
dansmith	well, it should be yes	14:56
dansmith	ceph has 24G, so I'm trying to find where our images pool would be limited to 10G but not seeing it	14:56
dansmith	one thing that might explain this,	14:56
dansmith	is that our normal ceph job was using qcow on rbd, which is not what you're supposed to do,	14:57
sean-k-mooney	oh ya because we have to flatten it	14:57
sean-k-mooney	it should be raw	14:57
sean-k-mooney	to get the cow optimization	14:57
dansmith	and so we convert the image to raw, which is 44M per image instead of 12 or something.. although we shouldn't really be using that much space, so... hmm	14:57
dansmith	and this is just placement saying we're out of space, not ceph	14:57
dansmith	I wonder if glance is incorrectly determining the size of the new image after it flattens or something	14:58
sean-k-mooney	well after the first image import it's all cow clones in ceph right	14:58
dansmith	and telling us we need a lot more than we do or something	14:58
dansmith	right	14:58
dansmith	check this out: Jul 24 12:44:22.270293 ubuntu-bionic-ovh-bhs1-0018770257 nova-scheduler[55176]: WARNING nova.scheduler.host_manager [None req-eeb6d563-2483-4e4f-91e8-2dc3a694ade4 tempest-MultipleCreateTestJSON-968818181 tempest-MultipleCreateTestJSON-968818181] Host ubuntu-bionic-ovh-bhs1-0018770257 has more disk space than database expected (8 GB > 1 GB)	15:01
sean-k-mooney	reserved_host_disk_mb is 0 too	15:01
sean-k-mooney	that is strange do we have the hoststate update enabled	15:02
*** k_mouza has quit IRC		15:03
sean-k-mooney	im pretty sure we do	15:03
*** eharney has quit IRC		15:03
sean-k-mooney	ya we do	15:03
*** derekh has quit IRC		15:03
sean-k-mooney	disk_allocation_ratio=1.0,disk_available_least=8,free_disk_gb=10,f	15:04
dansmith	we're only asking placement for DISK_GB=1 allocation so I don't think we're getting a bad number from glance or anything	15:04
sean-k-mooney	what do our flavor look like	15:06
sean-k-mooney	actually no never mind	15:06
sean-k-mooney	this is not bfv	15:06
sean-k-mooney	the flavor should be either 1 or 2GB per instance i think	15:06
*** jsuchome has quit IRC		15:07
dansmith	and it seems like 1 since we're asking for that size allocation	15:07
sean-k-mooney	ya its based on the image size https://github.com/openstack/devstack/blob/2ecd1823850ae0e00ad0ecebbbceb312be60ccf4/lib/tempest#L204-L206	15:09
sean-k-mooney	so for cirros image it will be 1g	15:09
dansmith	sudo ceph -c /etc/ceph/ceph.conf osd pool create vms 8 8	15:10
dansmith	that's 8G for the vms pool	15:10
dansmith	I dunno where we're getting 10G	15:10
sean-k-mooney	i dont think that is the size	15:10
sean-k-mooney	i think that is the buckets to share it in	15:10
sean-k-mooney	let me check	15:10
dansmith	hmm, okay it seems like size	15:11
sean-k-mooney	i think its the placement groups but its been a while	15:11
dansmith	okay yeah, maybe you're right	15:11
sean-k-mooney	ceph osd pool create <pool-name> <pg-num> <pgp-num> [replicated] \	15:12
sean-k-mooney	[crush-ruleset-name] [expected-num-objects]	15:12
sean-k-mooney	so ya its not the size	15:12
dansmith	yeah	15:13
dansmith	I still dunno where we're getting 10G,	15:13
sean-k-mooney	same	15:13
*** ociuhandu has quit IRC		15:14
dansmith	because CEPH_LOOPBACK_DISK_SIZE=24G	15:14
sean-k-mooney	so it should be 8 https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/settings#L17	15:14
sean-k-mooney	by default	15:15
dansmith	it's overridden in our job somewhere	15:15
dansmith	you can see in the devstacklog	15:15
sean-k-mooney	CEPH_LOOPBACK_DISK_SIZE is	15:15
sean-k-mooney	is VOLUME_BACKING_FILE_SIZE	15:15
dansmith	yes	15:15
dansmith	both are	15:15
sean-k-mooney	ok cool	15:15
dansmith	VOLUME_BACKING_FILE_SIZE=24G	15:16
dansmith	and the df shows 24G on /var/lib/ceph	15:16
sean-k-mooney	ah yes it does	15:17
*** eharney has joined #openstack-nova		15:17
dansmith	we run "ceph df" to get the DISK_GB we report,	15:17
dansmith	and don't really do much to it,	15:17
dansmith	so it really seems like we're being told 10G	15:18
*** k_mouza has joined #openstack-nova		15:19
dansmith	lyarwood: do you know anything about what ceph df may be telling us about total pool size that differs from the backing store's size?	15:19
lyarwood	dansmith: nope, AFAIK it just reports the size of the images_rbd_pool	15:20
* lyarwood looks		15:20
dansmith	seems straightforward :)	15:21
lyarwood	https://github.com/openstack/nova/blob/master/nova/virt/libvirt/storage/rbd_utils.py#L374-L382 - ah well melwitt has a handy comment here that might help	15:21
lyarwood	gah highlights are a few lines off but you get the point	15:21
dansmith	oh I read that,	15:21
dansmith	but didn't grok until now	15:22
dansmith	so replication makes the thing look smaller I guess?	15:22
dansmith	seems weird to go from 24G to 10G, as that's not an even factor	15:22
-openstackstatus- NOTICE: We are renaming projects in Gerrit and review.opendev.org will experience a short outage. Thank you for your patience.		15:22
dansmith	er, no wait	15:23
dansmith	that's for max_avail, which is "free" not total right?	15:23
*** k_mouza has quit IRC		15:23
lyarwood	right sorry and you're seeing 10 reported as the total capacity right?	15:24
dansmith	correct	15:24
lyarwood	kk sorry then that isn't it	15:24
*** dklyle has joined #openstack-nova		15:25
*** maciejjozefczyk has quit IRC		15:26
dansmith	I guess one thing we could do is increase the ceph backing size to 36G and see if DISK_GB goes up	15:26
bauzas	can someone tell me what the fuck is ? http://paste.openstack.org/show/796292/	15:26
bauzas	tl;dr: ssh: connect to host review.openstack.org port 29418: Network is unreachable	15:26
bauzas	have I missed a memo ?	15:27
melwitt	there's a openstackstatus above ^ said there will be a short outage	15:27
openstackgerrit	Sylvain Bauza proposed openstack/nova-specs master: WIP: Offline Reshape tool spec https://review.opendev.org/742908	15:29
bauzas	yay, it worked	15:29
bauzas	melwitt: thanks	15:29
bauzas	calling it a day	15:29
*** k_mouza has joined #openstack-nova		15:32
melwitt	dansmith: MAX_AVAIL should be total actually, just taking number of replicas into account. if you only have 1 replica (default NUM_REPLICAS=1) then MAX_AVAIL should match whatever total says in 'ceph df'	15:32
dansmith	melwitt: you're reporting free as max_avail though in that thing aren't you?	15:32
dansmith	or does MAX_AVAIL != max_avail ?	15:32
melwitt	but if you've set NUM_REPLICAS=2 when you deployed a devstack, then since the devstack ceph plugin creates 2 OSDs on the same HDD in that case, it would be 2x the real disk	15:32
melwitt	no MAX_AVAIL is a ceph thing	15:33
melwitt	(if you're referring to what is written about ceph df in rbd_utils.py)	15:33
dansmith	you mean half the disk I assume	15:33
dansmith	yeah	15:33
melwitt	no like the old behavior used to report 20G if you had a 10G disk, if you had created 2 OSDs that point at the same HDD	15:33
dansmith	so maybe (24 - overhead) / 2 == 10 or something	15:34
melwitt	you're using NUM_REPLICAS=1 right? you didn't set it in the job	15:34
melwitt	if so, there shouldn't be a difference	15:34
dansmith	I'm not setting it, but let me look if it's getting set	15:34
melwitt	I doubt it, I've never seen it set in CI before. I had to set it locally to do the testing for that MAX_AVAIL change	15:35
dansmith	yeah I don't even see that variable anywhere	15:35
dansmith	is that a devstack-plugin-ceph thing?	15:35
melwitt	yeah sec	15:35
lyarwood	dansmith: https://docs.ceph.com/docs/jewel/rados/operations/pools/#create-a-pool ; sudo ceph -c /etc/ceph/ceph.conf osd pool create vms 8 8 ; that doesn't mean create a 8GB pool	15:36
melwitt	bah sorry it's CEPH_REPLICAS https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L109	15:36
*** k_mouza has quit IRC		15:36
dansmith	lyarwood: yeah we established that :)	15:36
lyarwood	ah sorry wasn't watching irc	15:36
dansmith	lyarwood: somewhere in the plugin I saw a comment that made it sound like that was size	15:36
melwitt	10G honestly I would have thought is just the cloud image's disk size, no?	15:37
dansmith	melwitt: yeah 1	15:37
melwitt	or do we probably use something larger in CI	15:37
dansmith	melwitt: no, said above, it's 24G	15:37
dansmith	melwitt: https://zuul.opendev.org/t/openstack/build/13d8a055ff1b4be0b627205f4d51d50f/log/controller/logs/df.txt	15:37
dansmith	and it's overridden to 24G in the devstack log	15:37
lyarwood	that's total for the three different pools	15:37
*** k_mouza has joined #openstack-nova		15:37
lyarwood	vms images and volumes?	15:37
melwitt	oh I see	15:38
dansmith	lyarwood: and are the pools set to something specific for size? that's what we're trying to find and can't :)	15:38
dansmith	lyarwood: the way it looks now I'd assume it just reports that they're all 24G in size, with various amounts free like zfs does for filesystems on a pool,	15:38
dansmith	but I'm just guessing	15:38
dansmith	I'm stacking a ceph devstack so I can poke but right now all I have is logs	15:38
dansmith	if total decreases as we use space, then we're not really reporting the right thing to placement	15:39
dansmith	which could be part of the problem of course	15:39
lyarwood	dansmith: yup true and that's also going to bounce around a lot during a tempest run	15:40
dansmith	yep	15:40
dansmith	I'm pretty sure this is not a consequence of my job, by the way, I think mine is just a little slower because we have some glance features turned on, so we probably have a little more of a logjam than normal	15:40
*** xek_ has quit IRC		15:41
dansmith	oh jeez, you know what I just realized?	15:41
dansmith	we might be snapshotting to the file store and not the ceph store in some cases, actually	15:41
dansmith	hmm	15:41
dansmith	nova does the snapshots itself so maybe not, but if we ever do a raw image upload.. the default store is the file store	15:41
*** gyee has joined #openstack-nova		15:42
dansmith	not that that would cause this, but it might be changing the timing characteristics	15:42
*** k_mouza has quit IRC		15:42
dansmith	I'll have to think on that a bit	15:42
melwitt	well, this doesn't look promising for MAX_AVAIL, it sounds like it would decrease with use and is not a total https://access.redhat.com/solutions/3537961	15:42
dansmith	ah yeah	15:43
dansmith	melwitt: did you read this? https://access.redhat.com/solutions/2273951	15:43
dansmith	we're not replicated I guess so maybe that doesn't affect us in CI, but probably has some impact for real users of this	15:44
melwitt	no	15:44
melwitt	so there are multiple reasons MAX_AVAIL shouldn't be used :(	15:45
*** bnemec is now known as beekneemech		15:46
dansmith	not it!	15:46
*** k_mouza has joined #openstack-nova		15:47
dansmith	the other problem I'm guessing,	15:47
melwitt	yeah... I'm thinking whether to revert that or tweak it to take total and divide by pool size, the latter would do what was actually desired and report total with replication considered	15:47
dansmith	is that if we report the real actual total (even minus replication overhead), but other pools can consume space from the same store,	15:48
dansmith	we will tell placement we have more room than it can allocate	15:48
dansmith	so really we need to sum up all the pools on the same store, and then set reserved= for any space they use I guess, but then we race with those other uses in our reporting	15:48
dansmith	and could go negative	15:48
melwitt	yeah, I'm trying to remember, I could have sworn this get_pool_info was only used to report free space, not total space, but I could be totally making that up	15:49
melwitt	or that that's what it's used for ultimately in higher layers	15:49
melwitt	let me look up what "total" used to be, maybe it meant "total available"	15:50
melwitt	no, looks like it was total. had total, total used, and total available	15:51
*** k_mouza has quit IRC		15:51
sean-k-mooney	dansmith: one thing that i just thought of	15:52
*** dtantsur is now known as dtantsur\|afk		15:53
sean-k-mooney	by default replicated pools have a replication factor of 3	15:53
sean-k-mooney	so if we have 24G of space we would only have 8 usable	15:53
melwitt	but looking at the clip again https://github.com/openstack/nova/blob/master/nova/virt/libvirt/storage/rbd_utils.py#L374-L382 I did parse out total_bytes to go with 'total', max_avail to go with 'free', and bytes_used to go with 'used'. so this should be fine....	15:53
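A rough Python sketch of the mapping described above, assuming `ceph df --format=json` output shaped like the hand-written sample below. This is not the actual rbd_utils.py code: `pool_info`, the sample numbers, and the choice to read `bytes_used` from the per-pool stats are all assumptions made for illustration.

```python
import json

GIB = 1024 ** 3

# Hand-written sample, NOT real output from the CI job.
SAMPLE_CEPH_DF = json.dumps({
    "stats": {"total_bytes": 10 * GIB},
    "pools": [
        {"name": "images", "stats": {"bytes_used": 0, "max_avail": 9 * GIB}},
        {"name": "vms", "stats": {"bytes_used": 1 * GIB, "max_avail": 9 * GIB}},
    ],
})


def pool_info(ceph_df_json, pool_name):
    """Map ceph df fields onto total/free/used for one pool."""
    stats = json.loads(ceph_df_json)
    # Find the stats entry for the pool we are configured to use.
    pool_stats = next(
        p["stats"] for p in stats["pools"] if p["name"] == pool_name
    )
    return {
        "total": stats["stats"]["total_bytes"],  # cluster-wide total
        "free": pool_stats["max_avail"],         # replication-aware free space
        "used": pool_stats["bytes_used"],        # bytes used by this pool
    }


print(pool_info(SAMPLE_CEPH_DF, "vms"))
```

Note that in this shape 'total' is cluster-wide while 'free' and 'used' are per pool, so pools sharing one store all report the same total.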
dansmith	I'm confused about whether we're replicating or not	15:53
dansmith	sean-k-mooney: ^	15:53
sean-k-mooney	that is the default unless we create a erasure encoded pool	15:53
dansmith	and even still, 24/3==10 only for very small values of 3 :P	15:53
dansmith	hmm, okay what is CEPH_REPLICAS then?	15:54
melwitt	that's the number of replicas for when it creates the pools	15:54
sean-k-mooney	well we have 24G for ceph but we have multiple pools right?	15:54
*** k_mouza has joined #openstack-nova		15:54
sean-k-mooney	the images pool will also be using that	15:54
dansmith	right, vms and images	15:55
sean-k-mooney	https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L109	15:56
sean-k-mooney	its 1	15:56
sean-k-mooney	CEPH_REPLICAS	15:57
sean-k-mooney	which for ci makes sense	15:57
dansmith	right, I think we established that earlier :)	15:57
melwitt	yeah, I was saying earlier I've never seen CI use anything other than the default of 1	15:57
sean-k-mooney	well we dont need to test anything else in our ci since we are not really testing ceph	15:58
dansmith	https://pastebin.com/6gcGhTHQ	15:58
sean-k-mooney	just ceph integration with other things	15:58
dansmith	this is what my ceph df shows on a clean devstack	15:58
*** gibi is now known as gibi_pto		15:58
dansmith	interestingly I didn't update my backing size from 8 to 24, but still got 24	15:58
gibi_pto	so I'm going away for a week. I will be back on 3rd of Aug	15:58
dansmith	gibi_pto: p/	15:59
*** k_mouza has quit IRC		15:59
gibi_pto	o/	15:59
lyarwood	\o	15:59
sean-k-mooney	dansmith: i think VOLUME_BACKING_FILE_SIZE is a devstack setting	15:59
dansmith	oh, I see, and ceph plugin uses that, gotcha	16:00
sean-k-mooney	yes https://github.com/openstack/devstack/blob/e0d06adffcf4c8da1aefebc66f2de9a440badbf6/stackrc#L766	16:00
sean-k-mooney	and devstack defaults it to 24	16:00
sean-k-mooney	so that is where that is coming from	16:00
sean-k-mooney	that was originally for cinder	16:01
sean-k-mooney	oh	16:03
sean-k-mooney	https://zuul.opendev.org/t/openstack/build/13d8a055ff1b4be0b627205f4d51d50f/log/controller/logs/ceph/ceph_log.txt	16:03
sean-k-mooney	pgmap v5: 0 pgs: ; 0 B data, 704 KiB used, 9.0 GiB / 10 GiB avail	16:03
sean-k-mooney	so ceph does think it has only 10G	16:03
dansmith	oh nice, but from where?	16:03
*** k_mouza has joined #openstack-nova		16:04
sean-k-mooney	there is a ceph folder at the root of the controller logs	16:04
dansmith	from my devstack: 2020-07-24 08:25:52.400945 mgr.x client.14099 192.168.201.41:0/3299763660 2 : cluster [DBG] pgmap v5: 0 pgs: ; 0B data, 188MiB used, 23.8GiB / 24.0GiB avail	16:04
dansmith	no I mean where is it getting the 10G	16:04
sean-k-mooney	im wondering if we are using the filestore backend and didnt resize the filesystem or something?	16:04
sean-k-mooney	although df on the host shows 24G right	16:05
dansmith	it does,	16:05
sean-k-mooney	is that the block device size or filesystem	16:05
dansmith	and my local devstack shows the 24G	16:05
dansmith	filesystem	16:05
dansmith	doesn't look like we grab the ceph configs	16:06
sean-k-mooney	https://zuul.opendev.org/t/openstack/build/13d8a055ff1b4be0b627205f4d51d50f/log/controller/logs/ceph/ceph-osd.0_log.txt#10	16:06
sean-k-mooney	so its using filestore	16:06
sean-k-mooney	but why is that 10G	16:06
*** songwenping_ has joined #openstack-nova		16:06
sean-k-mooney	oh its using bluestore not file store	16:06
sean-k-mooney	but same question	16:06
dansmith	you mean files in /var/lib/ceph right?	16:07
sean-k-mooney	bluestore(/var/lib/ceph/osd/ceph-0) _setup_block_symlink_or_file resized block file to 10 GiB	16:07
sean-k-mooney	its not using the mount	16:07
dansmith	not using the mount for what?	16:07
sean-k-mooney	its creating a ceph-0 file inside it i think	16:07
dansmith	sure, that's inside the mount	16:07
dansmith	it's creating a 10G flat file right?	16:08
sean-k-mooney	i think so	16:08
*** k_mouza has quit IRC		16:08
sean-k-mooney	that is then being used for the osd	16:08
dansmith	right	16:08
sean-k-mooney	so we are creating a 24G flat file and attaching it as a loopback device then mounting it on /mnt/ceph	16:09
sean-k-mooney	sorry	16:09
sean-k-mooney	/var/lib/ceph	16:09
dansmith	right	16:09
sean-k-mooney	then inside that they are creating another flatfile	16:09
dansmith	and then it's creating a file called block inside there as the actual thing the osd uses	16:09
sean-k-mooney	and using that for the osd	16:09
sean-k-mooney	yep	16:10
sean-k-mooney	so this is wrong	16:10
dansmith	and that thing is 10G	16:10
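One way to check this on a node is to look at what the OSD's `block` entry actually is. The following is a minimal Python sketch, not part of the plugin: `describe_block` is a hypothetical helper, and the path in the usage note is the one discussed above.

```python
import os

GIB = 1024 ** 3


def describe_block(path):
    """Say whether `path` is a symlink (e.g. to a device) or a plain file.

    For a plain file the size in GiB is reported; for a symlink only the
    target is reported, since the size of the target is not inspected here.
    """
    if os.path.islink(path):
        return {"kind": "symlink", "target": os.readlink(path),
                "size_gib": None}
    return {"kind": "file", "target": None,
            "size_gib": os.path.getsize(path) / GIB}
```

Calling `describe_block("/var/lib/ceph/osd/ceph-0/block")` on the CI node would be expected to report a plain file of about 10 GiB if the reading of the OSD log above is right.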
*** songwenping__ has quit IRC		16:10
sean-k-mooney	i think we are expecting them to use the /var/lib/ceph mount directly for the osd	16:10
sean-k-mooney	i suspect this behavior changed when we changed from the filestore to bluestore backend	16:10
sean-k-mooney	we should mount the loopback device at /var/lib/ceph/osd/ceph-0/block	16:11
sean-k-mooney	instead that way it would have the full 24G	16:11
dansmith	I dunno what "directly" means.. they still have to store their data in their special format right?	16:11
dansmith	and it's normally a raw disk they want, so if we give them a filesystem they need to create a flat file to emulate the block device on no?	16:11
dansmith	fwiw, I don't have a block file (yet) and mine is reporting 24G	16:12
dansmith	so i dunno why it's different	16:12
dansmith	ah, my osd0.log:	16:13
dansmith	xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is disabled by conf	16:13
dansmith	so that's different than bluestore I guess you're saying?	16:13
sean-k-mooney	i think if we just left /var/lib/ceph mounted under / as part of the root filesystem and moved where we mount the 24G loopback device file to /var/lib/ceph/osd/ceph-0/block ceph would have all 24G	16:13
sean-k-mooney	dansmith: yes that is the filestore backend	16:13
sean-k-mooney	that needs a folder to use	16:13
dansmith	ah, CI is using the nautilus version of ceph, I'm on luminous	16:13
sean-k-mooney	luminous is the default in the devstack plugin ya	16:14
dansmith	but CI is using nautilus	16:14
sean-k-mooney	but ci i guess is overriding it	16:14
dansmith	ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable), process ceph-osd, pid 8952	16:14
sean-k-mooney	ya	16:14
dansmith	can we set the backing store driver?	16:14
dansmith	back to xfs?	16:14
sean-k-mooney	we could but i think the better solution is to change how we do the mounting	16:15
sean-k-mooney	bluestore is not the default	16:15
sean-k-mooney	upstream	16:15
sean-k-mooney	in ceph and downstream as of osp 16	16:15
sean-k-mooney	so its nice to test with bluestore	16:15
*** k_mouza has joined #openstack-nova		16:15
dansmith	unless you see where we're setting it to bluestore, it would seem maybe the default changed in nautilus?	16:15
sean-k-mooney	well after luminous in any case but yes i dont think we currently set it directly	16:16
dansmith	you said "bluestore is not the default" above	16:16
dansmith	so I'm confused about what you're proposing	16:17
sean-k-mooney	oh i meant is	16:17
sean-k-mooney	it is now the default in ceph	16:17
*** k_mouza has quit IRC		16:17
sean-k-mooney	filestore used to be the default before	16:17
dansmith	okay that's what I was saying	16:18
*** k_mouza has joined #openstack-nova		16:18
dansmith	I still don't get where the 10G comes from, other than that something is clearly different with blue vs xfs stores	16:18
sean-k-mooney	bluestore has been the default for a few releases now. filestore is deprecated upstream and downstream in osp	16:18
sean-k-mooney	dansmith: i think that is the default size that the ceph tool uses	16:19
sean-k-mooney	when its creating a backing file	16:19
dansmith	okay I don't see that anywhere	16:19
sean-k-mooney	its being created by the ceph osd itself here https://zuul.opendev.org/t/openstack/build/13d8a055ff1b4be0b627205f4d51d50f/log/controller/logs/ceph/ceph-osd.0_log.txt#4	16:20
dansmith	I imagine that keeping the loopback mount for var lib ceph is ideal for the plugin as long as we have stable branches that use that	16:20
dansmith	sean-k-mooney: yeah I get that :)	16:20
dansmith	sean-k-mooney: I'm saying I don't know where 10G is set or assumed or whatever ;)	16:20
sean-k-mooney	yes we can probably change that in the job?	16:21
melwitt	dansmith: is it not here? https://zuul.opendev.org/t/openstack/build/13d8a055ff1b4be0b627205f4d51d50f/log/controller/logs/ceph/ceph-osd.0_log.txt#18	16:21
dansmith	lol	16:21
dansmith	yes, I understand 10G is being used	16:21
sean-k-mooney	or if the devstack plugin is branched we can change it only on the branches that use nautilus	16:21
dansmith	I'm saying I don't see a config for that	16:21
dansmith	https://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/	16:21
sean-k-mooney	if there is a config it would be the ceph config file	16:21
melwitt	oh, sorry, just saying that's a command that is setting 10G deliberately	16:22
dansmith	melwitt: yep I think that's understood now	16:22
melwitt	from what sean-k-mooney was saying, I thought no one saw a deliberate setting of it yet	16:22
melwitt	that it was happening "automatically"	16:22
melwitt	I am caught up now	16:22
dansmith	all I'm saying is, I imagine that bluestore can have more than 10G of backing store, and if it's not basing that on actual disk free space, it's probably a config somewhere or something :)	16:23
melwitt	yeah, I understand now	16:23
melwitt	I misunderstood what sean was saying earlier	16:24
dansmith	the bluestore config actually seems to be mostly focused on using physical devices	16:25
*** songwenping__ has joined #openstack-nova		16:25
sean-k-mooney	melwitt: it happens frequently that people misunderstand me. well not that often if i type what i meant to type but i am bad at not doing that	16:27
*** songwenping_ has quit IRC		16:27
*** nightmare_unreal has quit IRC		16:27
melwitt	sean-k-mooney: eh, I often have trouble understanding people so by our powers combined... !	16:28
dansmith	I HAVE NO IDEA WHAT YOU PEOPLE ARE SAYING	16:30
melwitt	I AM GOOD AT DEALING WITH PEOPLE	16:31
* stephenfin decides this is too much weirdness and bails		16:31
sean-k-mooney	stephenfin: talking about storage does this to people	16:31
sean-k-mooney	just taking a step back	16:32
sean-k-mooney	we are happy we know where the 10G size is coming from now	16:32
melwitt	StOrAGE	16:32
*** eharney has quit IRC		16:32
sean-k-mooney	and that we are probably just hitting a real no valid host error because we are actually providing 10G to ceph instead of 24	16:33
sean-k-mooney	so the ceph job failures are not related to dansmith's recent changes to the job	16:33
sean-k-mooney	yes?	16:33
*** ociuhandu has joined #openstack-nova		16:34
dansmith	sean-k-mooney: yeah I thought we were assuming that	16:34
dansmith	because it's clearly just placement	16:34
dansmith	my job might be slower (or faster) causing us to hit it more than we were or something	16:34
sean-k-mooney	so we just either a.) swap back to file store to get the old behavior or b.) mount our loopback file in such a way that the bluestore block device uses our 24G loopback device instead of creating its own	16:35
dansmith	yeah so I figured going back to xfs would be ideal for compatibility with everything	16:35
dansmith	my system clearly gets xfs	16:35
dansmith	I assume the workers are getting blue because they're newer ubuntu or something	16:35
dansmith	I'm still on bionic	16:36
sean-k-mooney	dansmith: sure but eventually we will have to move since i think filestore is deprecated in ceph	16:36
dansmith	sure	16:36
dansmith	pain now or pain later	16:36
dansmith	pain later might be someone else's pain :P	16:36
sean-k-mooney	so i guess what we are looking for is a ceph config option to select filestore for the osd backend	16:36
sean-k-mooney	that or we set it on the osd create command	16:37
dansmith	yeah	16:38
sean-k-mooney	so this code https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L475-L484	16:38
*** ociuhandu has quit IRC		16:39
sean-k-mooney	that initial sudo ceph -c ${CEPH_CONF_FILE} osd create	16:39
dansmith	well, the other option is to figure out how to make blue use 20ish G instead of 10,	16:39
dansmith	which would be less impactful than retooling the mount stuff in the ceph plugin	16:39
sean-k-mooney	well we are mounting it on /var/lib/ceph	16:40
sean-k-mooney	so i guess this is already plugin specific	16:40
sean-k-mooney	we are likely reusing the same function that is used for cinder and just passing the mount path	16:40
sean-k-mooney	ya we are just calling create_disk https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L387	16:40
dansmith	cinder just wants a loop not a mounted fs though right?	16:41
sean-k-mooney	maybe this is the function in devstack https://github.com/openstack/devstack/blob/eee60c76719c02c08dba7b7fb703798a056b22b9/functions#L758-L789	16:41
sean-k-mooney	that kind of looks like a hack	16:42
sean-k-mooney	e.g. that does not look like it was created originally for ceph	16:42
melwitt	hm, I found this https://forum.proxmox.com/threads/proxmox-ceph-osd-partition-created-with-only-10gb.55291/	16:43
sean-k-mooney	oh its for swift originally	16:43
*** maciejjozefczyk has joined #openstack-nova		16:44
sean-k-mooney	i think the "sudo ceph-osd -c ${CEPH_CONF_FILE} -i ${OSD_ID} --mkfs" is the one we would need to modify	16:46
sean-k-mooney	melwitt: that does seem like the same issue more or less	16:49
melwitt	yeah, I'm having trouble understanding it	16:49
melwitt	the last comment links to another post https://forum.proxmox.com/threads/where-can-i-tune-journal-size-of-ceph-bluestore.44000/ where they're talking about tuning journal size and bluestore_block_db_size and bluestore_block_wal_size	16:50
melwitt	and I don't know what any of that is or means	16:51
melwitt	(in ceph.conf)	16:51
*** markvoelker has quit IRC		16:52
*** maciejjozefczyk has quit IRC		16:59
*** k_mouza has quit IRC		16:59
sean-k-mooney	those are not related to the data storage size of the osd	17:01
sean-k-mooney	bluestore has an embedded database that tracks where the logical blocks are located on disk	17:01
sean-k-mooney	wal i think is the write ahead log or something like that	17:02
sean-k-mooney	its part of how it does write journalling	17:02
sean-k-mooney	in both cases they are tuning parameters for how bluestore can save its metadata	17:03
sean-k-mooney	unlike filestore it can save it inline in the block device it is managing or it can save it on external devices and they support tuning of the sizing of them independently	17:03
openstackgerrit	Artom Lifshitz proposed openstack/nova master: Handle Neutron errors in _post_live_migration() https://review.opendev.org/729763	17:04
melwitt	sean-k-mooney: found a new thing https://bugzilla.redhat.com/show_bug.cgi?id=1597048	17:09
openstack	bugzilla.redhat.com bug 1597048 in RADOS "ceph osd df not showing correct disk size and causing cluster to go to full state" [High,Closed: notabug] - Assigned to bhubbard	17:09
dansmith	imagine that :)	17:10
melwitt	what	17:11
sean-k-mooney	it should be 3.7TB but is 10G	17:11
dansmith	melwitt: "not showing correct disk size"	17:11
melwitt	yeah?	17:11
melwitt	I'm still googling for why bluestore is maxed out at 10G	17:12
dansmith	melwitt: just saying, I think we've stumbled into a realization that our df reporting on ceph in libvirt es no bueno right?	17:12
melwitt	no	17:12
dansmith	oh did I miss something? I thought those RHN articles were indicating that we're reporting the wrong thing still	17:13
sean-k-mooney	https://bugzilla.redhat.com/show_bug.cgi?id=1597048#c8	17:13
openstack	bugzilla.redhat.com bug 1597048 in RADOS "ceph osd df not showing correct disk size and causing cluster to go to full state" [High,Closed: notabug] - Assigned to bhubbard	17:13
dansmith	like, if we're reporting the total size of the osd, but that's shared by vms and images, we'll be telling placement it can allocate all that space for instances but it can't	17:14
sean-k-mooney	so it looks like they never actually got to the root cause of why the bluestore file was a 10G file	17:14
sean-k-mooney	they just redeployed with filestore and ignored it	17:14
*** songwenping_ has joined #openstack-nova		17:14
melwitt	yeah.. but then what does this mean? "The BlueStore block device was a file named with a block, not a symlink to block device partition of this disk and that file size was 10G hence it was showing the size of the OSD as 10G."	17:15
sean-k-mooney	i think they meant	17:15
sean-k-mooney	that instead of it being a symlink to /dev/sdX	17:15
sean-k-mooney	it was a file named block	17:15
sean-k-mooney	that was 10G	17:15
sean-k-mooney	hence -rw-r--r--. 1 ceph ceph 10737418240 Jul 2 16:51 block	17:16
melwitt	right.. so you think it's correct that it's pointing at a file named block? and that the problem is that the file is not larger than 10G?	17:16
sean-k-mooney	they were expecting it to be a symlink to the actual hdd	17:16
sean-k-mooney	well in that case yes	17:16
sean-k-mooney	and also likely in our case	17:16
melwitt	ok. from reading that I thought maybe it was pointing wrongly at a file	17:17
*** songwenping__ has quit IRC		17:17
*** k_mouza has joined #openstack-nova		17:17
sean-k-mooney	/var/lib/ceph/osd/ceph-0/block is likely a 10G file	17:17
dansmith	I think that bug is that they deployed on file instead of having the bluestore osd use the disk they wanted	17:17
sean-k-mooney	dansmith: yes	17:17
dansmith	we want file, they wanted disk, right?	17:17
sean-k-mooney	yes	17:18
melwitt	oh	17:18
melwitt	ok so why is /var/lib/ceph/osd/ceph-0/block only 10G ... who creates it ...	17:18
dansmith	right, that I think we still don't know.. where the 10G comes from and how we change it	17:19
dansmith	because the osd itself (the driver) seems to create that as a flat 10G file if it's not there	17:19
melwitt	yeah, at least before now I did not know that the 10G comes from the size of the file named "block" so now I'm gonna see if I can find where that file is created	17:20
*** k_mouza has quit IRC		17:21
*** k_mouza has joined #openstack-nova		17:22
*** k_mouza has quit IRC		17:27
melwitt	hm https://github.com/ceph/ceph/blob/master/src/common/legacy_config_opts.h#L940	17:28
*** hamalq has joined #openstack-nova		17:29
dansmith	lol	17:29
melwitt	https://github.com/ceph/ceph/blob/8c1a077e560248760ac441f315b84304aa693e72/src/common/options.cc#L4122	17:29
dansmith	so maybe we're supposed to create that block file to be what we want it to be	17:30
melwitt	looks like they changed the default to 100G at some point	17:30
dansmith	definitely obscure though	17:30
sean-k-mooney	dansmith: yes we are meant to create the file/partition first normally when deploying ceph	17:30
melwitt	well I think you can set bluestore_block_size in ceph.conf no?	17:30
melwitt	oh	17:30
sean-k-mooney	am likely	17:31
sean-k-mooney	but ceph does not expect to have to create this normally	17:31
melwitt	https://github.com/ceph/ceph/commit/57890fce7064811780823e298b31e7fced2fa0e3	17:31
sean-k-mooney	if you use the tooling they provide they create the partitions ahead of time	17:31
melwitt	that's more recent, change from 1 TB -> 100G default. but in older versions the default was 10G, trying to see when that was so we can compare with what version we're running	17:32
sean-k-mooney	this is the function that actually creates the file https://github.com/ceph/ceph/blob/nautilus/src/os/bluestore/BlueStore.cc#L5934	17:33
melwitt	v15.1.0 is Octopus	17:33
sean-k-mooney	if the block file is not present it creates it https://github.com/ceph/ceph/blob/nautilus/src/os/bluestore/BlueStore.cc#L5931-L5934	17:34
melwitt	and somehow 'size' is passed in from the config option I assume	17:34
sean-k-mooney	that is what im currently trying to find yes	17:34
sean-k-mooney	this maybe https://github.com/ceph/ceph/blob/nautilus/src/os/bluestore/BlueStore.cc#L5943-L5944	17:36
sean-k-mooney	ah no its here	17:38
sean-k-mooney	https://github.com/ceph/ceph/blob/nautilus/src/os/bluestore/BlueStore.cc#L6050-L6052	17:38
sean-k-mooney	so in the mkfs call	17:38
sean-k-mooney	so when we do this	17:39
sean-k-mooney	https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L482-L483	17:39
melwitt	ah yup, and it's pulling the conf option	17:39
sean-k-mooney	it causes the mkfs function to be invoked on the backend store	17:39
sean-k-mooney	which for bluestore uses that config option to create the 10G file	17:39
melwitt	the interesting thing is, I wonder why it uses the legacy option and not the new one. I don't understand how that works in their code. cause in nautilus they have both the 10G and 100G default in the legacy conf vs the non	17:40
sean-k-mooney	if /var/lib/ceph/osd/ceph-0/block is not a symlink to a device	17:40
sean-k-mooney	melwitt: its legacy on master	17:41
sean-k-mooney	it might not be on nautilus	17:41
melwitt	oh, I thought you mentioned earlier that CI is using nautilus	17:41
sean-k-mooney	actually it's also here on master https://github.com/ceph/ceph/blob/master/src/common/options.cc#L4127-L4131	17:41
sean-k-mooney	melwitt: yes it is	17:42
sean-k-mooney	actually just above that	17:42
sean-k-mooney	https://github.com/ceph/ceph/blob/master/src/common/options.cc#L4122-L4125	17:42
melwitt	yeah I'm saying it's weird that it's not defaulting to 100G like that is showing	17:42
melwitt	the old default was 10G	17:42
sean-k-mooney	yep	17:43
sean-k-mooney	we are pulling 14.2.2 https://github.com/ceph/ceph/blob/v14.2.2/src/common/options.cc#L4339	17:43
sean-k-mooney	which is 10	17:44
sean-k-mooney	they backported the 100G change to nautilus	17:44
sean-k-mooney	but its not in the tag we are pulling	17:44
sean-k-mooney	i think legacy_config_opts.h is just an old way to define config options	17:45
melwitt	ohhh	17:45
melwitt	good find. ok at least everything makes sense now	17:45
sean-k-mooney	rather than deprecated, by the way	17:45
sean-k-mooney	ya so i guess we just set that config option to say 20G?	17:45
sean-k-mooney	in ceph.conf	17:46
melwitt	yeah, seems like it	17:46
sean-k-mooney	which we can do here https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L415-L428	17:46
melwitt	yarp. just have to double check whether it's a "global" or what. are those config groups or?	17:47
sean-k-mooney	i like how this is basically undocumented other than in the source code	17:47
sean-k-mooney	i think in global yes	17:47
melwitt	yeah, I know. they have a bluestore config doc but zero mention of this https://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref	17:48
sean-k-mooney	iniset -sudo ${CEPH_CONF_FILE} global "bluestore_block_size" "20"	17:48
sean-k-mooney	is that right?	17:48
sean-k-mooney	i was searching for 10_G but i think _G is a user-defined suffix	17:49
sean-k-mooney	so now i need to find that	17:49
*** aj_mailing has joined #openstack-nova		17:50
sean-k-mooney	yep https://github.com/ceph/ceph/blob/8c1a077e560248760ac441f315b84304aa693e72/src/common/options.cc#L343-L345	17:50
melwitt	oh, is the unit GB or something else?	17:51
sean-k-mooney	its in bytes i think	17:52
sean-k-mooney	10_G is doing 10 << 32	17:52
sean-k-mooney	it's a c++11 user-defined literal https://en.cppreference.com/w/cpp/language/user_literal	17:52
sean-k-mooney	actually it's << 30 not 32	17:53
sean-k-mooney	but ya still bytes	17:53
sean-k-mooney	unsigned long long .... im glad they also defined a better way to name integers in c++11 so you dont have to use that c way of naming types	17:54
melwitt	ok so you can't just put "20" in the conf	17:54
sean-k-mooney	i think we have to do 20<<30	17:54
sean-k-mooney	so 21474836480	17:55
melwitt	right	17:55
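The unit arithmetic worked out above can be checked with a minimal shell sketch. It assumes, as the discussion concludes, that the `_G` suffix means a left shift by 30 (GiB expressed in bytes); `gib_to_bytes` is a hypothetical helper name, not something from ceph or devstack.

```shell
# Hypothetical helper mirroring the _G literal: N_G is read as N << 30 bytes.
gib_to_bytes() {
    echo $(( $1 << 30 ))
}

# The old 10G default and the 20G value proposed in the discussion.
old_default_bytes=$(gib_to_bytes 10)
wanted_bytes=$(gib_to_bytes 20)

echo "10G = ${old_default_bytes} bytes"
echo "20G = ${wanted_bytes} bytes"
```

The second value is the 21474836480 quoted above, which is why a bare "20" in ceph.conf would be read as 20 bytes rather than 20G.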
*** tesseract has quit IRC		17:59
sean-k-mooney	ill pretend they are not putting an unsigned long long into a size_t variant without asserting it fits	17:59
melwitt	:)	18:00
*** k_mouza has joined #openstack-nova		18:00
*** aj_mailing has quit IRC		18:01
*** aj_mailing has joined #openstack-nova		18:02
*** k_mouza has quit IRC		18:05
*** gmann is now known as gmann_lunch		18:13
dansmith	have ya'll fixed it yet?	18:16
sean-k-mooney	im looking at a linux bridge issue from the neutron channel currently but it looks like we just need one more line here to set the config option https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L429	18:18
sean-k-mooney	dansmith: can you test it with your local setup	18:18
sean-k-mooney	just add iniset -sudo ${CEPH_CONF_FILE} global "bluestore_block_size" "21474836480"	18:19
dansmith	yup	18:19
dansmith	oh wait, I can't	18:19
dansmith	because mine doesn't use blue	18:19
dansmith	but I can float a patch and get jobs going	18:19
sean-k-mooney	ya that works	18:19
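A rough sketch of the ceph.conf result the proposed iniset line is aiming for, assuming the option belongs in the `[global]` section (which the discussion only guesses at) and that the value is a plain byte count. The temp file is a hypothetical stand-in for `${CEPH_CONF_FILE}`, and the `key = value` layout is an assumption about what iniset writes.

```shell
# Hypothetical stand-in for ${CEPH_CONF_FILE}; a temp file keeps the sketch harmless.
conf_file="$(mktemp)"

# 20G expressed in bytes, since the option is read as a raw byte count.
block_size_bytes=$(( 20 << 30 ))

# Assumed shape of the section once the option has been set.
printf '[global]\nbluestore_block_size = %s\n' "$block_size_bytes" > "$conf_file"

cat "$conf_file"
```

Whether the OSD actually picks the value up at mkfs time is exactly what the CI run of the patch is meant to confirm.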
sean-k-mooney	i dont have a ceph env currently i could set one up but its almost half past 7 on a friday so dont want to wait for it to stack :)	18:20
*** ociuhandu has joined #openstack-nova		18:22
dansmith	sean-k-mooney: dude, you need to cut yourself off :)	18:24
dansmith	https://review.opendev.org/#/c/742961/	18:25
*** ociuhandu has quit IRC		18:27
dansmith	I think the nova team needs to have the keys to sean-k-mooney's irc bouncer so we can turn it off when it's time for him to sleep	18:27
sean-k-mooney	hehe i dont use one i just dont turn my laptop off :P	18:28
dansmith	like giving car keys to the bartender	18:28
melwitt	sean-k-mooney laptop and dev box permanently ON	18:28
dansmith	sean-k-mooney: well, then an ssh account to your laptop I guess	18:29
sean-k-mooney	melwitt: yes they more or less are.	18:29
dansmith	melwitt: more like sean-k-mooney permanently ON	18:30
melwitt	man, what was I doing earlier	18:30
melwitt	true	18:30
dansmith	laptop sleep timer be like "jesus when is he going to go to bed, I'm exhausted"	18:30
melwitt	haha yeah	18:30
artom	dansmith, I think we'll need remote access to his fuse box...	18:31
artom	First we'll need to invent an SSHable fuse box...	18:31
* artom checks		18:31
dansmith	artom: he's a property owner now, so we can't go to the landlord	18:31
dansmith	artom: no need to cut the power just reset his luks key	18:32
* artom was half expecting remotable fuse boxes to exist, because IoT		18:32
artom	But then they'd have a crappy web UI with 'password' hardcoded as the admin password	18:33
artom	So maybe not	18:33
melwitt	yeah, I would not be surprised	18:33
sean-k-mooney	dansmith: oh you mean resetting the luks key on my laptop? that would be a pain to fix	18:34
dansmith	sean-k-mooney: no, we can reset it back to the one you know when you should be online	18:34
sean-k-mooney	ah ok	18:34
dansmith	hah	18:35
sean-k-mooney	speaking of which o/	18:35
dansmith	good :)	18:35
melwitt	wait, weren't you on pto today too? what the heck	18:35
*** slaweq has joined #openstack-nova		18:35
sean-k-mooney	yesterday	18:35
sean-k-mooney	well until today	18:36
dansmith	maybe /melwitt/ needs the sleep	18:36
sean-k-mooney	i was at a funeral	18:36
sean-k-mooney	so ya back for a "shortish" day of not very stressful things	18:36
sean-k-mooney	i planned to leave after the bug call but got distracted	18:36
sean-k-mooney	anyway food	18:36
sean-k-mooney	o/	18:37
melwitt	well we swapped running the bug call today so I was like why is sean here	18:37
melwitt	have a nice weekend o/	18:37
artom	melwitt, he was on PTO wednesday, so unable to send out the email	18:37
artom	And usual email + run the call go hand in hand	18:37
melwitt	no, I swapped with him and I was supposed to send the email	18:37
melwitt	I just forgot to	18:37
artom	Right, so swap means you get to run the call as well	18:37
melwitt	I know, and I did	18:38
artom	But... he's allowed to be there for the call	18:38
melwitt	I know he's allowed to be there lol	18:38
artom	WELL WHAT THE HELL ARE WE ARGUING ABOUT	18:38
melwitt	I just thought if we swapped cause he was out, I was surprised when he was there	18:38
melwitt	I DONT KNOW	18:38
artom	WHY ARE WE YELLING	18:38
melwitt	WE ARE HAVING TROUBLE CONTROLLING THE VOLUME OF OUR VOICE	18:39
artom	OH RIGHT I STARTED IT IM SO SORRY	18:39
melwitt	s/TROUBLE/DIFFICULTY/	18:40
artom	Umm, how about we do real work for a bit? Is there a way to see stats for a particular job? nova-ceph-multistore just failed twice on me	18:40
dansmith	lol	18:40
dansmith	dude	18:40
artom	mriedem would have hacked up a logstash query in seconds	18:40
dansmith	have you like paid attention to the last three hours in here at all?	18:40
melwitt	omg	18:40
dansmith	and also, logstash is still fubar I think	18:40
melwitt	no u di'nt	18:40
artom	dansmith, me? Pay attention? lol u cray cray	18:41
dansmith	apparently ;)	18:41
*** xinranwang__ has quit IRC		19:04
*** huaqiang has joined #openstack-nova		19:18
*** gmann_lunch is now known as gmann		19:19
mriedem	i only do logdna queries these days now anyway	19:22
*** dklyle has quit IRC		19:26
*** dklyle has joined #openstack-nova		19:30
*** dklyle has quit IRC		19:40
melwitt	dansmith: I opened https://bugs.launchpad.net/nova/+bug/1888895 for the gate failure. going to mail the ML now	19:43
openstack	Launchpad bug 1888895 in devstack-plugin-ceph "nova-ceph-multistore job fails often with 'No valid host was found. There are not enough hosts available.'" [Undecided,In progress]	19:43
dansmith	cool	19:43
melwitt	great, the WIP patch just failed	19:44
melwitt	[errno 110] error connecting to the cluster wtf	19:44
dansmith	maybe that config made it fail to start?	19:46
dansmith	no ceph logs	19:47
dansmith	hrm, don't even see it set the ini,	19:47
dansmith	so maybe it didn't even get that far	19:47
melwitt	yeah must be unless it's a fluke coincidence that ceph totally bombed this time	19:50
dansmith	I think it must be a fluke bombing because it didn't run that line	19:51
*** dklyle has joined #openstack-nova		19:51
dansmith	ah here we go: 2020-07-24 18:32:36.045 | /opt/stack/devstack-plugin-ceph/devstack/lib/ceph: line 429: (24G: value too great for base (error token is "24G")	19:53
melwitt	ah	19:55
dansmith	hacky fix	19:55
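The "value too great for base" message is what bash arithmetic expansion reports when it is handed a token like "24G". A minimal sketch of one way around it follows, assuming the size arrives as a string with a trailing "G"; the variable names are hypothetical and this is not the content of the actual patch, which the log does not show.

```shell
# Hypothetical input in the same form as the failing token from the log.
size="24G"

# Feeding "24G" straight into $(( ... )) fails in bash, so strip the suffix first.
gib="${size%G}"

# Convert GiB to bytes with the same << 30 shift the _G literal uses.
bytes=$(( ${gib} << 30 ))

echo "$bytes"
```

Stripping the suffix before the shift keeps the arithmetic on a plain integer, and the result is the raw byte count the config option expects.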
*** ralonsoh has quit IRC		19:57
*** dklyle has quit IRC		20:04
dansmith	melwitt: oops, should have put the bug on that, sorry	20:05
dansmith	but it'll need cleanup	20:05
melwitt	ah yeah	20:05
melwitt	I was just thinking, I think all ceph jobs in openstack are broken over this, not just ours	20:05
dansmith	potentially, but like I said, my job may be running slower or have different behaviors	20:06
dansmith	actually	20:07
melwitt	yeah. I was thinking of replying to mention other ceph jobs may be affected too. I see the older version 14.2.2 being pulled in openstack/tempest, for example	20:07
dansmith	heh, I just saw a glance failure on the plain ceph job which is the same novalidhost	20:07
dansmith	so I think yeah.	20:07
dansmith	yeah	20:07
* melwitt nods		20:07
dansmith	this is from one of my glance patches: https://e81e6b81331830d4903c-5acdef5dc10478cee5291df1596ec66a.ssl.cf1.rackcdn.com/742065/9/check/devstack-plugin-ceph-tempest-py3/292fd21/testr_results.html	20:08
melwitt	ah yeah. and this is the tempest job failure I was looking at https://1ca2ee7583d21788b1d8-42b9b3ca9891e58d539431fcfb5b799d.ssl.cf2.rackcdn.com/742836/2/check/devstack-plugin-ceph-tempest-py3/547fab7/testr_results.html	20:09
melwitt	(I picked a recent patch proposed to the tempest repo)	20:09
dansmith	cool	20:09
*** dklyle has joined #openstack-nova		20:09
dansmith	that's awesome because of two things:	20:09
dansmith	1. I didn't break stuff	20:10
dansmith	2. I get to steal the credit from sean-k-mooney for fixing more people!	20:10
melwitt	lol awww	20:10
*** ociuhandu has joined #openstack-nova		20:10
dansmith	dang, now I better give him honorable mention in the commit message ;P	20:10
melwitt	hey, I found the default config value too	20:11
melwitt	after that he blew past me finding how/where it was used	20:11
dansmith	heh	20:12
dansmith	he gave me a line to copy/paste, so...	20:12
melwitt	yeah, that's what it's all about	20:13
*** ociuhandu has quit IRC		20:15
*** dave-mccowan has joined #openstack-nova		20:25
*** mriedem has left #openstack-nova		20:58
*** ociuhandu has joined #openstack-nova		22:02
*** ociuhandu has quit IRC		22:07
*** _erlon_ has quit IRC		22:23
*** raildo has quit IRC		22:28
*** dave-mccowan has quit IRC		22:44
*** mlavalle has quit IRC		23:02
*** martinkennelly has quit IRC		23:09
*** hamalq has quit IRC		23:10
*** tonyb[m] has left #openstack-nova		23:21
*** bbowen has quit IRC		23:34
*** bbowen has joined #openstack-nova		23:35
*** tosky has quit IRC		23:55
