Friday, 2016-01-22

openstackgerritMerged openstack/networking-ovn: Expose ovs-vswitchd log to file
openstackgerritMerged openstack/networking-ovn: Fix in create_router and update_router
openstackgerritMerged openstack/networking-ovn: HOST_IP is missing in computenode-local.conf.sample
openstackgerritNuman Siddique proposed openstack/networking-ovn: Store all the fixed ips of a port in one entry in Logical_Port.addresses
*** numans has joined #openstack-neutron-ovn12:49
Sam-I-Amrussellb: morningses14:07
russellbmorning, in meetings and such until afternoon14:09
Sam-I-Amwell, when you get some time14:10
Sam-I-Amotherwise, have a fun morning :)14:10
regXboirussellb: ping?14:47
russellbSam-I-Am: do you think you could write up something about your mtu ideas?  sounds like something maybe we should discuss on the ovs dev list if there's changes needed in ovn?14:53
regXboiI'm reading OVS tea leaves this morning and want to make sure I'm reading them close to correct14:53
russellbi could use some tea before this next meeting14:53
russellbbut i think i'll get coffee instead (brb)14:53
regXboiso (1) the insert idl operation from networking-ovn comes to ovn-controller via an update2 jsonRPC method14:55
Sam-I-Amrussellb: i wrote it up on the -dev list as it pertains to linuxbridge, but the same concepts apply to any virtual networking with tunnels14:55
regXboi(2) the code that processes that method does a loop over all of the possible ops (which allows a single jsonRPC call to do multiple things)14:55
regXboiand then each found op is processed separately by ovsdb_idl_process_update214:56
regXboidoes that sound about right?14:56
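[Editor's sketch] The flow regXboi describes (one update2 notification carrying several row operations, each handled on its own) can be illustrated roughly as follows. This is a Python illustration only, not the C IDL code under discussion. The function name `iter_row_updates` and the sample message are hypothetical, and the message layout (params = [monitor id, {table: {row uuid: {op: row}}}]) is assumed from the OVSDB update2 notification format.

```python
def iter_row_updates(notification):
    """Yield (table, row_uuid, op, row) for each row operation in an update2 message.

    Assumed layout: {"method": "update2", "params": [monitor_id, table_updates2]},
    where table_updates2 maps table name -> row uuid -> {op: row}, and op is one
    of "initial", "insert", "modify" or "delete".
    """
    if notification.get("method") != "update2":
        return
    _monitor_id, table_updates = notification["params"]
    for table, rows in table_updates.items():
        for row_uuid, row_update in rows.items():
            # Each row update normally carries exactly one operation.
            for op, row in row_update.items():
                yield table, row_uuid, op, row


# Hypothetical message: two operations delivered in a single JSON-RPC call.
example = {
    "method": "update2",
    "params": [
        "monitor-1",
        {
            "Logical_Port": {
                "uuid-a": {"insert": {"name": "port1"}},
                "uuid-b": {"delete": None},
            }
        },
    ],
    "id": None,
}

for table, row_uuid, op, row in iter_row_updates(example):
    print(table, row_uuid, op, row)
```

Instrumenting a loop like this one (for example, logging each yielded tuple) is one way to check which operations a single call actually carried.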
russellbSam-I-Am: ok i'll try to catch up14:59
russellbregXboi: i'd have to read the code15:00
Sam-I-Amrussellb: its pretty involved, but there's a tl;dr at the beginning that might help.15:02
Sam-I-Amsome of the background might be useful too if you dont mess with mtu much15:03
regXboirussellb: well, don't do that - you've got more important things to chase down - I'll make the assumption that I'm correct and add some decoration to see about verifying it15:05
regXboiok, yes I've verified that I read the code correctly... now to get enough instrumentation in that I can put everything together15:22
russellbSam-I-Am: i dont ... i live in nicely theoretical dev land15:26
Sam-I-Amrussellb: lol. i was a network engineer for a while.15:27
Sam-I-Amdoing things that should not be done, like stuffing ethernet into T1s15:27
*** azbiswas has joined #openstack-neutron-ovn16:07
*** salv-orlando has joined #openstack-neutron-ovn16:08
*** salv-orlando has quit IRC16:12
*** armax has joined #openstack-neutron-ovn16:33
*** salv-orlando has joined #openstack-neutron-ovn16:47
mamulsowrussellb: I ran a test where I tried putting 4k routers connected to one shared private network, things got really bad around 1200 routers connected to that single network/subnet17:32
mamulsowby really bad I mean pretty much all neutron calls started hanging, and ovn-northd died17:33
mamulsowregXboi tells me that it's something of a known issue that too many ports on one subnet is a performance problem17:34
mamulsowdo you want a bug for this or is it just a known issue?17:34
* regXboi notes use of handle and wanders in17:35
russellbmamulsow: so you had N networks with N corresponding routers all connected to a single shared network17:42
russellband it went boom17:42
regXboirussellb: pretty much, yes17:42
russellbno, i wouldn't say that's a known issue exactly17:42
russellbovn-northd should certainly recover if you stopped creating new stuff17:43
russellbsounds like it got to a point where it couldn't keep up though17:43
mamulsowwell, I think the bad stuff wasn't from the N routers connected to N networks, I think it was the N networks connected to 1 network17:43
mamulsowsorry, N routers connected to 1 network17:44
russellbsounds like an important thing to profile and work on for tenant network support17:44
mamulsoweach of the 1200 routers had two interfaces, 1 to a private network that was 1:1 with the routers and the other interface was to a single shared network17:44
russellbmaybe lower priority if your interest is in provider networks to start17:45
russellbright, so this would be like if you had 1200 tenants17:45
russellball connected to the same shared public net or whatever17:45
russellbmakes sense17:45
Sam-I-Amwhich is a common deployment method17:45
russellbthe most common, even17:45
russellbthough not with OVN yet since we don't have NAT..17:46
Sam-I-Amits less wasteful of ips17:46
mamulsowwell, normally that shared network would be an external network, not a regular private network17:46
russellbmamulsow: right17:46
russellbi'd probably want to profile ovn-northd on this one17:47
mamulsowso I'm not sure what the impact is of this being another router interface instead of a router gateway to an external network17:47
russellbit's probably just not keeping up with the number of logical flows needed to describe this17:47
mamulsowyeah, there wasn't anything useful in the ovn-northd log after this happened17:48
mamulsowI think I can easily reproduce though17:48
Sam-I-Ammamulsow: so the networks on both ends of the routers are private?17:48
Sam-I-Amhmm... its probably a similar set of operations17:49
mamulsowwouldn't be a real world situation, but figured it would be close to simulating what we would do with NAT17:49
Sam-I-Amif you're not testing ip traffic, couldn't you test the conventional way of connecting each router to a provider net?17:49
Sam-I-Amjust testing creation times17:49
regXboiSam-I-Am: I'm already pretty much doing that17:50
-openstackstatus- NOTICE: Restarting zuul due to a memory leak17:51
russellbmamulsow: yes, it's a good simulation17:51
russellbthough there's kind of 2 paths right now for private networks and NAT17:52
russellbone of them looks like how ML2+OVS uses provider networks today17:52
russellband the other is using OVN gateways17:52
russellbwhich wouldn't necessarily involve logical routers for every tenant network17:52
regXboirussellb: is there a helper method to dump an ovsdb_idl_row structure as a string?17:52
russellbregXboi: you looking at the C code?17:53
regXboirussellb: ack - I'm looking at the C code17:53
russellbI don't know17:53
Sam-I-AmregXboi mamulsow are you working on the same thing?17:53
regXboiSam-I-Am: not precisely17:53
Sam-I-Amhas there been any testing of neutron-ns-metadata-proxy, being that it still resides as a conventional neutron agent?17:55
Sam-I-Amand has a history of being greedy on resources17:55
regXboiSam-I-Am: I've not looked at that to date17:56
Sam-I-Ami'm not as worried about the dhcp agent17:59
Sam-I-Ambut it also consumes resources17:59
mamulsowhmm, so I updated my test so now it only puts 200 routers per shared network and ran again18:16
mamulsowunfortunately it looks very similar to the first time I ran it18:17
mamulsowonce the total number of routers got up into the 800+ range things started getting very slow, and around 1000 neutron calls started returning with errors or just hanging18:18
mamulsowI wonder if I'm killing things with DHCP agents18:22
mamulsowlet me try to clean this mess up and try again without dhcp agents18:23
Sam-I-Ammamulsow: metadata agents too?18:25
russellbmamulsow: i'm guessing it's related to the size of the logical flow table18:30
russellbsame with ovn-controller performance18:30
russellbthat we worked on the other day18:30
russellbif you're brave, you could check with $ ovn-sbctl lflow-list | wc -l18:31
russellbor if you want the exact # of logical flows ...18:31
russellbovn-sbctl lflow-list | grep -v Datapath | wc -l18:32
russellbsomething like that18:32
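[Editor's sketch] The second pipeline above (`grep -v Datapath | wc -l`) drops the per-datapath header lines and counts what is left. A rough Python equivalent, assuming the output of `ovn-sbctl lflow-list` has already been captured as text, might look like this. The function name `count_logical_flows` and the `skip_blank` option are hypothetical, and the sample text is made up for illustration rather than real tool output.

```python
def count_logical_flows(lflow_list_output, skip_blank=False):
    """Count lines that do not mention 'Datapath', like `grep -v Datapath | wc -l`.

    With skip_blank=True, empty separator lines are ignored as well, which the
    shell pipeline itself would not do.
    """
    count = 0
    for line in lflow_list_output.splitlines():
        if "Datapath" in line:
            continue  # per-datapath header line, not a flow
        if skip_blank and not line.strip():
            continue
        count += 1
    return count


# Made-up sample text in roughly the same shape as the command output.
sample = (
    "Datapath: dp1  Pipeline: ingress\n"
    "  table=0, priority=100, match=(a), action=(drop;)\n"
    "  table=1, priority=0, match=(1), action=(next;)\n"
    "\n"
    "Datapath: dp1  Pipeline: egress\n"
    "  table=0, priority=0, match=(1), action=(output;)\n"
)

print(count_logical_flows(sample))
print(count_logical_flows(sample, skip_blank=True))
```

As russellb says, this is only "something like" an exact count: any blank lines in the output are counted by the shell pipeline, which is what `skip_blank` is meant to account for.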
openstackgerritRussell Bryant proposed openstack/networking-ovn: Revert "Deployment: Update with OVN DB requirements"
openstackgerritKyle Mestery proposed openstack/networking-ovn: Revert "Deployment: Update with OVN DB requirements"
mesteryrussellb: You beat me to it :)18:41
mesteryBy 30 seconds!18:41
mamulsowSam-I-Am: yes metadata agent is running there, but so far I haven't booted any VMs in it18:45
mamulsowI mean I've booted plenty of VMs in this environment, but just not part of that test18:46
mamulsowrussellb: I've started ovn-sbctl lflow-list | wc -l, but it hasn't returned yet18:47
Sam-I-Ammamulsow: is it spawning a process for each subnet?18:47
russellbmamulsow: heh, yeahhhhh18:48
russellbit may take a bit.18:48
* mamulsow leaves it running18:48
Sam-I-Amrussellb: re your comment on 271091, i was curious myself and asked someone from infra. apparently only non-standard projects (whatever that means) should use build_sphinx... and the central gate jobs.18:51
mamulsowSam-I-Am: yes, metadata agent per subnet, that and the dhcp agent per subnet are very likely the reason the rabbit nodes are crying18:51
Sam-I-Amif you look at other projects, you'll see one or the other18:51
russellbSam-I-Am: ok just curious18:52
mamulsowI see between 10-60 metadata agent processes per compute node18:52
Sam-I-Ammamulsow: you can probably turn both of those off to get bare performance18:52
mamulsowyeah, I'll stop metadata agent, and create the subnets with dhcp disabled18:52
Sam-I-Ammamulsow: meltdowns from the md agent processes are a common complaint in #openstack18:53
russellbSam-I-Am: weird thing to melt down18:53
Sam-I-Amwe might need regXboi to look at the code18:53
Sam-I-Amyeah, it is weird. i think its just not been a focus for performance improvements.18:53
* russellb nods18:53
Sam-I-Ama lot of folks give up and use config drive18:53
Sam-I-AmregXboi: you've been volunteered for making the metadata agent suck less :)18:54
regXboiis this like being volunteered to run for PTL?18:54
Sam-I-Amthe metadata agent is mostly just a proxy for nova metadata, so it might be easy to emulate it elsewhere18:55
russellbconfig drive ++18:55
Sam-I-AmregXboi: not as painful... maybe?18:55
regXboiwell - I'm not signing up :)18:56
Sam-I-Amrussellb: i vote for config drive too, but there are going to be people who want conventional metadata18:56
Sam-I-Amhowever, the network guide scenario can include whatever we recommend, and other stuff is ymmv18:57
Sam-I-Amusually with some sort of note about "this'll work, but expect issues for scaling"18:58
Sam-I-Amsame with using the conventional l3 agent18:58
mamulsowso dumb question… neutron-ns-metadata-proxy != neutron-metadata-agent?18:59
mamulsowI’ve stopped neutron-metadata-agent on all of the compute nodes, but I still see neutron-ns-metadata-proxy processes on all the compute nodes18:59
mamulsowso either 1) my understanding of the world is wrong or 2) those processes are hanging around when they shouldn't be19:01
mamulsowI'm thinking 1 is more likely :)19:01
russellbyes, we should make metadata work19:02
russellbi have no idea if it works today19:02
Sam-I-Ammamulsow: each namespace gets a neutron-ns-metadata-proxy process19:05
Sam-I-Amif the md agent was running when you created those subnets, they'll stick around19:05
mamulsowah, thanks, so I can just go kill those now?19:06
Sam-I-Ami think you can providing you're not booting vms expecting to use it19:06
mamulsowI'll cross that bridge when I get to it, but for this testing I'll start with just getting routers and see if there's an upper limit there19:08
mamulsowsounds like I need to go separately work on scaling rabbitmq to handle the dhcp/metadata agent load19:09
Sam-I-Ameliminating the other stuff is a good idea19:09
Sam-I-Amget a good baseline19:09
Sam-I-Amrussellb: outside of your comment about calling sphinx, i think the patch is ok. it was more or less something to test whether or not the gate logic patch merged right... which it did. now docs patches go through quickly.19:11
russellbok, will +2 if you want19:11
Sam-I-Ami could probably make the gate logic better, but those other jobs are minimal19:12
Sam-I-Amrussellb: un-wipped19:12
Sam-I-Ami might add some stuff for doc8 later19:12
Sam-I-Amstepping out for a bit19:16
mamulsowfyi, "ovn-sbctl lflow-list | wc -l" is still running19:22
*** azbiswas has quit IRC19:26
shettygmamulsow: you likely don't have ovsdb-server connection to ovn-sbctl.19:30
shettygit is a bug in both ovn-nbctl and ovn-sbctl that they "hang"19:30
mamulsowah, ok19:31
mamulsowdoh, I was running that on a compute node, not the node with sb db19:32
openstackgerritKyle Mestery proposed openstack/networking-ovn: Vagrant: Fix issue with boxes
openstackgerritKyle Mestery proposed openstack/networking-ovn: Vagrant: Adjust HOST_IP for compute nodes
openstackgerritMerged openstack/networking-ovn: Revert "Deployment: Update with OVN DB requirements"
*** salv-orlando has joined #openstack-neutron-ovn21:18
openstackgerritMerged openstack/networking-ovn: Modify docs build environment
Sam-I-Amrussellb: moo.22:51
