15:00:56 <slaweq> #startmeeting neutron_ci
15:01:01 <openstack> Meeting started Tue Dec  8 15:00:56 2020 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:02 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:03 <slaweq> welcome again :)
15:01:05 <openstack> The meeting name has been set to 'neutron_ci'
15:01:24 <bcafarel> hi again
15:02:47 <slaweq> let's wait a few more minutes for other folks
15:03:46 <lajoskatona> Hi
15:03:55 <bcafarel> it gives me more time to watch progress/failure on https://review.opendev.org/c/openstack/neutron/+/766000
15:04:40 <slaweq> ok, let's start and do that quickly
15:04:49 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:04:56 <slaweq> #topic Actions from previous meetings
15:05:01 <slaweq> bcafarel to check and update doc https://docs.openstack.org/neutron/latest/contributor/policies/release-checklist.html
15:05:09 * bcafarel looks for link
15:05:49 <bcafarel> https://review.opendev.org/c/openstack/neutron/+/765959 - I also added a note on the neutron-tempest-plugin template and the *-master jobs
15:06:57 <slaweq> thx
15:07:03 <slaweq> I will check that later
15:08:49 <slaweq> next one
15:08:51 <slaweq> ralonsoh to report and check issue with TestSimpleMonitorInterface in functional tests
15:08:56 <slaweq> ralonsoh is not here today
15:09:18 <slaweq> but as this hits us pretty often recently, I spent some time yesterday to check that
15:09:23 <slaweq> LP https://bugs.launchpad.net/neutron/+bug/1907068
15:09:25 <openstack> Launchpad bug 1907068 in neutron "Functional test neutron.tests.functional.agent.linux.test_ovsdb_monitor.TestSimpleInterfaceMonitor.test_get_events" [Critical,In progress] - Assigned to Slawek Kaplonski (slaweq)
15:09:25 <slaweq> Patch https://review.opendev.org/c/openstack/neutron/+/765792
15:09:34 <slaweq> I hope that this will help with that issue
15:09:38 <slaweq> so please review it :)
15:10:21 <bcafarel> universal solution (aka add some sleep() ) detected :)
15:10:28 <slaweq> bcafarel: yes :/
15:10:37 <slaweq> but I don't really see a better way to solve it
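(Editor's note: a minimal sketch of the wait-with-retry pattern being discussed, for readers unfamiliar with this kind of functional-test fix. The monitor API, names, and timeout values below are illustrative assumptions, not the contents of the patch under review.)

```python
# Sketch only: polling with a short sleep instead of asserting on the
# first (possibly still empty) read of the monitor's events.
import time


def wait_for_events(monitor, expected_count, timeout=60, interval=0.5):
    """Poll the monitor until enough events arrived, or time out.

    Retrying with a short sleep gives the ovsdb-monitor process time
    to emit its events before the test asserts on them.
    """
    deadline = time.time() + timeout
    events = []
    while time.time() < deadline:
        events.extend(monitor.get_events())
        if len(events) >= expected_count:
            return events
        time.sleep(interval)
    raise AssertionError(
        "Got %d of %d expected events before timeout"
        % (len(events), expected_count))
```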
15:11:38 <slaweq> next one
15:11:40 <slaweq> slaweq to check if test_dhcp_port_status_active would still be failing after https://review.opendev.org/c/openstack/neutron/+/755313 is merged
15:11:47 <slaweq> it didn't help for sure
15:11:56 <slaweq> so bug reported https://bugs.launchpad.net/neutron/+bug/1906654
15:11:57 <openstack> Launchpad bug 1906654 in neutron "neutron_tempest_plugin.api.admin.test_dhcp_agent_scheduler.DHCPAgentSchedulersTestJSON.test_dhcp_port_status_active is failing often" [Critical,Confirmed]
15:12:05 <slaweq> and Skip proposed https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/765327
15:12:31 <slaweq> it's already merged, so it should give us some breathing room in CI
15:12:40 <slaweq> but we still need to investigate why this really happens
15:12:53 <slaweq> next one
15:12:56 <slaweq> slaweq to report LP about SSH failures in the neutron-ovn-tempest-ovs-release-ipv6-only
15:12:59 <slaweq> Bug https://bugs.launchpad.net/neutron/+bug/1906490
15:13:00 <openstack> Launchpad bug 1906490 in neutron "SSH failures in the neutron-ovn-tempest-ovs-release-ipv6-only job" [Critical,Confirmed]
15:13:01 <slaweq> Patch to skip failing test https://review.opendev.org/c/openstack/neutron/+/765070
15:13:34 <slaweq> as it's this one test that is failing in most (if not all) cases, I proposed skipping it for now
15:13:41 <slaweq> so please review that patch too :)
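(Editor's note: both skip patches above follow the usual tempest "skip until the bug is fixed" pattern. A hedged sketch, using the dhcp test named earlier; the base class and exact decorator are assumptions based on neutron-tempest-plugin's layout, not copied from the real patches.)

```python
# Sketch of the tempest skip-until-fixed pattern; the linked patches
# may use different wording or helpers.
from tempest.lib import decorators

from neutron_tempest_plugin.api import base


class DHCPAgentSchedulersTestJSON(base.BaseAdminNetworkTest):

    # Links the skip to the launchpad bug so it can be found and
    # re-enabled once the bug is fixed.
    @decorators.skip_because(bug='1906654')
    def test_dhcp_port_status_active(self):
        ...
```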
15:15:56 <slaweq> and that's all regarding last week
15:16:00 <slaweq> lets move on
15:16:12 <slaweq> anything regarding stadium projects or stable branches?
15:17:28 <bcafarel> not much from my side; https://review.opendev.org/c/openstack/requirements/+/764021 is still not merged for ussuri
15:17:54 <bcafarel> and I have not checked the stable branches much yet, though nobody has complained that they are broken, so it should be OK
15:18:15 <slaweq> :)
15:18:22 <lajoskatona> for stadium: the patch to mark some bgpvpn tests as unstable was just merged
15:18:38 <slaweq> lajoskatona: yes, I saw
15:18:46 <slaweq> and it makes the job voting again, right?
15:19:06 <lajoskatona> so the job is voting again
15:19:14 <lajoskatona> yes exactly
15:19:21 <slaweq> ++
15:21:49 <slaweq> so I think we can move on
15:21:55 <slaweq> #topic Grafana
15:22:40 <slaweq> overall, the check queue seems to be in a very bad state due to that issue with pip dependencies
15:23:14 <slaweq> also, the functional and fullstack jobs are failing 100% of the time
15:25:29 <bcafarel> sigh
15:26:08 <bcafarel> for pip deps I am playing whack-a-mole ("fix one dep, get a new error") in https://review.opendev.org/c/openstack/neutron/+/766000
15:26:17 <slaweq> for fullstack it seems that the error is similar: https://zuul.opendev.org/t/openstack/build/d9c4e4ac0b794b7f8c667249a9eb3a35
15:26:22 <slaweq> but not exactly the same
15:26:51 <slaweq> it's the same for functional and fullstack
15:26:53 <lajoskatona> bcafarel: LOL
15:26:53 <bcafarel> yes, that sounds like a similar root cause (the new resolver in pip, or whatever)
15:27:30 <slaweq> lajoskatona: bcafarel: as You are already playing with it, can You also check the fullstack and functional jobs?
15:28:14 <bcafarel> at least to pass CI we will need all of them fixed indeed
15:28:44 <lajoskatona> slaweq: yeah, I think I'll ask around to see if some common wisdom comes from the infra team or a similar place
15:28:56 <slaweq> lajoskatona: thx
15:29:26 <slaweq> please use the LP I reported to track all those issues; I don't think we need another one for the fullstack/functional jobs
15:29:47 <bcafarel> +1
15:30:10 <slaweq> thx
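(Editor's note: the dependency trouble discussed above coincided with pip 20.3 enabling its new dependency resolver, which hard-fails on conflicting requirements that the old resolver silently tolerated. One stopgap available at the time, not necessarily what the patch above does, was to opt back into the legacy behaviour:)

```
# pip >= 20.3 only: fall back to the pre-20.3 resolver while the
# conflicting pins are being untangled (a stopgap, not a fix)
pip install --use-deprecated=legacy-resolver -r requirements.txt
```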
15:31:34 <slaweq> other than that I don't have much else to discuss today
15:32:32 <slaweq> in most cases I think we are hitting known issues: the TestSimpleInterfaceMonitor failure in the functional job (which should be fixed with the sleep()), and the lack of resources in fullstack, which should be fixed by lajoskatona's patch
15:32:46 <slaweq> and the ssh issues in the neutron-ovn-tempest-ovs-release-ipv6-only job
15:34:13 <slaweq> I have just one more topic to discuss
15:34:19 <slaweq> a few days ago I sent an email: http://lists.openstack.org/pipermail/openstack-discuss/2020-December/019240.html
15:34:27 <slaweq> about kernel panics in guest vms
15:35:38 <slaweq> according to the logs there, we should try to use the "noapic" option when booting the vm
15:36:00 <slaweq> do You know if there is any way we can do that from our jobs? (I'm not an expert there and I don't really know)
15:36:12 <slaweq> or would we need to change the cirros image to achieve that?
15:37:14 <bcafarel> hmm
15:37:46 <bcafarel> I think this can be done in qemu itself (passing this kind of option); I guess nova folks will know better
15:39:13 <slaweq> according to sean-k-mooney's reply, nova would have to provide such an option, so IIUC it isn't really possible currently
15:39:41 <sean-k-mooney> currently we can't disable it, no
15:39:56 <sean-k-mooney> we would have to add a new flag to the glance image properties
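(Editor's note: to illustrate the mechanism sean-k-mooney is describing, nova already reads guest-configuration hints from hw_* glance image properties, set as below. The property shown is a real, existing one; an apic-disabling flag did not exist at this point, which is exactly the gap being discussed. The image name is just an example.)

```
# Existing hw_* image properties are set like this; a flag to disable
# the (IO)APIC would presumably follow the same pattern, but no such
# property existed at the time of this meeting.
openstack image set --property hw_machine_type=q35 cirros-0.5.1-x86_64-disk
```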
15:40:05 <sean-k-mooney> this looks like it's a cirros kernel bug
15:40:20 <sean-k-mooney> i don't think the 5.3 kernel has been patched
15:41:20 <sean-k-mooney> cirros is, i believe, more or less unmaintained, so we will eventually have to move to something like alpine to fill the same role as a long-term solution
15:41:41 <slaweq> sean-k-mooney: are You talking about https://alpinelinux.org/ ?
15:41:45 <slaweq> or something else?
15:41:48 <sean-k-mooney> yes that
15:42:12 <slaweq> but I saw there that those images are much bigger than cirros
15:42:13 <sean-k-mooney> it's the smallest actively developed os that would be a reasonable replacement
15:42:24 <sean-k-mooney> not that much
15:42:34 <sean-k-mooney> they are still ~40mb
15:42:45 <sean-k-mooney> certainly less than 100mb
15:43:00 <sean-k-mooney> we have a min vm size of 1GB
15:43:14 <slaweq> http://dl-cdn.alpinelinux.org/alpine/v3.12/releases/x86_64/alpine-virt-3.12.1-x86_64.iso
15:43:18 <slaweq> this one, right?
15:43:30 <bcafarel> and with a larger community around it too; when you use cirros, people often reply "oh, you are from openstack"
15:43:48 <sean-k-mooney> slaweq: ya one sec
15:44:06 <sean-k-mooney> https://review.opendev.org/q/topic:%22alpine%22+(status:open%20OR%20status:merged)
15:44:40 <sean-k-mooney> i started adding support for alpine but i hit an initramfs issue using the minimal filesystem image
15:44:49 <sean-k-mooney> which is for containers and chroots
15:45:03 <sean-k-mooney> i had planned to eventually swap to that iso as a base
15:45:09 <sean-k-mooney> but have not had time to work on it
15:45:32 <sean-k-mooney> short term, if this is blocking the gate
15:45:33 <slaweq> ok, I will try to check this iso maybe locally
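(Editor's note: one way to check an image like that in a devstack-based environment. IMAGE_URLS is a real devstack setting that controls which guest images get downloaded and registered in glance; whether the alpine ISO actually boots cleanly as a test guest is exactly what still needed checking.)

```
# local.conf snippet: every URL listed here is downloaded and uploaded
# to glance by devstack, so the alpine ISO linked above can be added
# next to (or instead of) the default cirros image.
IMAGE_URLS="http://dl-cdn.alpinelinux.org/alpine/v3.12/releases/x86_64/alpine-virt-3.12.1-x86_64.iso"
```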
15:45:54 <sean-k-mooney> i can add a flag to nova quickly for this
15:46:01 <sean-k-mooney> and then ping the core team to review
15:46:11 <sean-k-mooney> e.g. to disable the ioapic
15:46:12 <slaweq> it's not blocking the gate, but I see some jobs failing due to kernel panics in the guest vm at least a few times a week
15:46:55 <slaweq> for sure we have more urgent and critical issues, but somehow fixing this one would also help, I think, not only Neutron but other projects too :)
15:47:16 <sean-k-mooney> i'll bring it up in the nova channel shortly and get buy-in; if there is no objection i'll quickly write a patch as a workaround
15:47:31 <slaweq> sean-k-mooney: would be great, thx a lot
15:47:31 <sean-k-mooney> ya it showed up in the nova jobs too
15:48:46 <slaweq> ok, that was the last thing I had for this week
15:49:47 <slaweq> if You don't have anything else, I will give You a few minutes back
15:50:32 <slaweq> thx for attending the meeting
15:50:35 <bcafarel> sounds good
15:50:38 <slaweq> have a great week
15:50:40 <bcafarel> o/
15:50:40 <lajoskatona> Bye
15:50:41 <slaweq> o/
15:50:44 <slaweq> #endmeeting