bauzas | morning folks | 08:20 |
---|---|---|
opendevreview | Amit Uniyal proposed openstack/nova master: For evacuation, ignore if task_state is not None https://review.opendev.org/c/openstack/nova/+/848886 | 08:43 |
opendevreview | Amit Uniyal proposed openstack/nova master: add regression test case for bug 1978983 https://review.opendev.org/c/openstack/nova/+/849104 | 09:06 |
opendevreview | Amit Uniyal proposed openstack/nova master: For evacuation, ignore if task_state is not None https://review.opendev.org/c/openstack/nova/+/848886 | 09:06 |
auniyal_ | how should we write zuul recheck cmd | 11:31 |
auniyal_ | so I want to run zuul, recheck for 4 jobs | 11:32 |
sean-k-mooney | auniyal_: you can't, and that's by design | 11:47 |
sean-k-mooney | auniyal_: we do not allow individual jobs to be rechecked separately | 11:47 |
auniyal_ | okay, so have to run all jobs, by giving recheck only ? | 11:48 |
sean-k-mooney | correct. you can trigger third party ci separately | 11:50 |
auniyal_ | thanks sean-k-mooney | 11:50 |
sean-k-mooney | but first party ci will run all jobs together | 11:50 |
auniyal_ | earlier I saw somewhere, that we should not run all jobs if only 1 or 2 job fails and can only run by single jobs using recheck <something> <job-name>, but tried to look in https://zuul-ci.org/docs/zuul/latest/ couldn't find it | 11:54 |
sean-k-mooney | the configuration is per pipeline and we explicitly do not allow that in openstack | 12:08 |
sean-k-mooney | zuul may support that but we do not allow that in openstack under the green check policy | 12:09 |
sean-k-mooney | all jobs on the check run must use the same revision of the code | 12:09 |
sean-k-mooney | if you recheck individual jobs that would not be the case | 12:09 |
bauzas | folks, we have a problem with tempest.api.compute.admin.test_live_migration.LiveMigrationTest.test_live_migration_with_trunk | 12:31 |
* bauzas tries to look how many checks we have a problem with it | 12:31 | |
bauzas | https://opensearch.logs.openstack.org/_dashboards/app/discover?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-30h,to:now))&_a=(columns:!(_source),filters:!(),index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:kuery,query:test_live_migration_with_trunk),sort:!()) | 12:35 |
bauzas | looks like it's https://bugs.launchpad.net/nova/+bug/1940425 | 12:35 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Poison /sys access via various calls in test https://review.opendev.org/c/openstack/nova/+/844627 | 12:35 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Add compute restart capability for libvirt func tests https://review.opendev.org/c/openstack/nova/+/850510 | 12:35 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Rename [pci]passthrough_whitelist to device_spec https://review.opendev.org/c/openstack/nova/+/843834 | 12:36 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Rename exception.PciConfigInvalidWhitelist to PciConfigInvalidSpec https://review.opendev.org/c/openstack/nova/+/843861 | 12:36 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Rename whitelist in tests https://review.opendev.org/c/openstack/nova/+/843862 | 12:36 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Basics for PCI Placement reporting https://review.opendev.org/c/openstack/nova/+/846187 | 12:36 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Extend device_spec with resource_class and traits https://review.opendev.org/c/openstack/nova/+/846218 | 12:36 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Reject PCI dependent device config https://review.opendev.org/c/openstack/nova/+/846435 | 12:36 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Reject mixed VF rc and trait config https://review.opendev.org/c/openstack/nova/+/846436 | 12:36 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Ignore PCI devs with physical_network tag https://review.opendev.org/c/openstack/nova/+/846219 | 12:36 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Reject devname based device_spec config https://review.opendev.org/c/openstack/nova/+/846466 | 12:36 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Support [pci]device_spec reconfiguration https://review.opendev.org/c/openstack/nova/+/846470 | 12:36 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Stop if tracking is disable after it was enabled before https://review.opendev.org/c/openstack/nova/+/847009 | 12:36 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Move provider_tree RP creation to PciResourceProvider https://review.opendev.org/c/openstack/nova/+/850546 | 12:36 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Allow enabling PCI tracking in Placement https://review.opendev.org/c/openstack/nova/+/850468 | 12:36 |
gibi | bauzas: I see 10 hits in the last 7 days: https://paste.opendev.org/show/b8BF5WsTcwJojALnC5J0/ so it has become a bit more frequent than when I reported that bug | 12:38 |
bauzas | I found 226 hits from the last 7 days | 12:38 |
gibi | I don't really know how to parse the opensearch query. did you just query for 'test_live_migration_with_trunk' ? that is all the runs of the test case including when the test passed, isn't it? | 12:42 |
gibi | also this 'from:now-30h,to:now' does not seem to be 7 days | 12:42 |
gibi | I filtered for the nova-next job runs, so my number can be smaller than the global number for sure | 12:43 |
gibi | bauzas: that is probably closer to the 7 days query of all jobs https://opensearch.logs.openstack.org/_dashboards/app/discover?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-7d,to:now))&_a=(columns:!(_source),filters:!(),index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:kuery,query:%22test_live_migration.py%22),sort:!()) | 12:47 |
gibi | I queried for "test_live_migration.py" so that filters out passing test cases (the test case name is printed there but not the file) | 12:48 |
bauzas | gibi: yeah, I just checked for the testname | 12:48 |
bauzas | as the testname is only provided with a FAILURE | 12:48 |
gibi | nope | 12:48 |
gibi | this is a passing test with testname {1} tempest.api.compute.admin.test_live_migration.LiveAutoBlockMigrationV225Test.test_live_migration_with_trunk [30.171614s] ... ok\ | 12:48 |
bauzas | see the buildrate | 12:49 |
bauzas | it's 100% | 12:49 |
bauzas | but agreed, I could query it better | 12:49 |
gibi | does opensearch filter out SUCCESS runs automatically? | 12:51 |
gibi | or why do the passing runs not appear? | 12:51 |
gibi | the string "test_live_migration_with_trunk" is in the job-output.txt for passing runs too | 12:51 |
gibi | so something magic happens in opensearch to filter those | 12:51 |
bauzas | dunno, just testing this new tool | 12:53 |
gibi | I don't like magic :D | 12:53 |
bauzas | well, at least the failure rate seems high and not related to one specific job | 12:54 |
gibi | yeah I don't think it is related to nova-next at all, I just needed a way to limit my query | 12:56 |
gibi | my tool doesn't do a full search on all job results as that would require downloading all the job results and logs locally | 12:57 |
gibi | and that is not feasible | 12:57 |
gibi | I've just run a widened search on all the nova devstack based jobs, it is 16 hits in 7 days for me and it is hitting nova-next and nova-grenade-multinode in nova | 12:58 |
*** mfo is now known as Guest5969 | 13:03 | |
*** mfo_ is now known as mfo | 13:03 | |
bauzas | ovs-hybrid-plug job too | 13:05 |
gibi | bauzas: good point. I missed that job in my config | 13:17 |
gibi | this way I get 34 hits in the last 7 days filtered for nova jobs | 13:18 |
*** dasm|off is now known as dasm | 13:51 | |
bauzas | I'm getting mad at 1940425 | 14:58 |
bauzas | bug 1940425 I mean | 14:58 |
bauzas | gibi: wdyt we could do for https://bugs.launchpad.net/nova/+bug/1940425 | 15:07 |
bauzas | we already wait for 60secs | 15:07 |
bauzas | ? | 15:08 |
gibi | bauzas: would make sense pinging neutron folks | 15:08 |
bauzas | failure rate is so high that we can't merge things by now | 15:10 |
bauzas | mmm, so melwitt had a thought on https://github.com/openstack/tempest/blob/97be23ea6402649652991983f3f2b85873eba4d8/tempest/api/compute/admin/test_live_migration.py#L285 maybe not required for all neutron backends | 15:33 |
bauzas | slaweq: around ? | 15:36 |
melwitt | ^ it was sean-k-mooney's thought that I repeated :) sean's the one who knows tons of stuff about nova <--> neutron | 15:44 |
bauzas | if I'm not getting it wrong, nova isn't only impacted | 15:45 |
bauzas | this is also hitting cinder | 15:46 |
sean-k-mooney | melwitt: so the basis of this is that for ovn we only plug one port into ovs. and the trunking is implemented as openflow rules on that port | 15:59 |
sean-k-mooney | for ml2/ovs and ml2/linux bridge | 16:00 |
sean-k-mooney | we create an ovs or linux bridge per trunk port | 16:00 |
sean-k-mooney | in the ovs case each sub port is created as a patch port pair between the br-int and br-trunk### | 16:00 |
sean-k-mooney | and the br-int side of the patch port is tagged with the neutron port uuid | 16:01 |
sean-k-mooney | so ml2/ovs reports the status of the port as up/active | 16:01 |
sean-k-mooney | once it finishes wiring them up | 16:01 |
sean-k-mooney | for ml2/ovn i have no idea if it bothers to report the subports as up since they don't exist on the dataplane level | 16:02 |
bauzas | sean-k-mooney: except the ovn-hybrid-plug job, this is also failing on nova-next | 16:02 |
melwitt | I think my brain just exploded from reading that | 16:02 |
bauzas | ovs-hybrid-plug, my bad | 16:02 |
bauzas | we also have the grenade-multinode which hits such bug | 16:03 |
sean-k-mooney | bauzas: ack so it's failing regardless of how it's implemented | 16:03 |
gibi | the interesting part that it is not 100% failure in nova-next, so in some case neutron does bring up the subport | 16:03 |
bauzas | yup | 16:03 |
sean-k-mooney | it could be a timing thing | 16:03 |
bauzas | this looks a transient issue | 16:03 |
bauzas | we already wait for 60s | 16:03 |
gibi | but 60sec is a lot | 16:03 |
sean-k-mooney | neutron is not meant to send network-vif-plugged for the parent | 16:03 |
sean-k-mooney | until all the subports are set up | 16:03 |
sean-k-mooney | maybe it does not take that into account | 16:03 |
sean-k-mooney | and sends it only once the parent is set up | 16:04 |
sean-k-mooney | meaning we might be racing | 16:04 |
gibi | yeah that makes sense | 16:04 |
sean-k-mooney | nova only has one port attached to the vm so we only care about the parent | 16:04 |
gibi | that will depend on how was tempest querying the state | 16:04 |
sean-k-mooney | i don't think the parent should really be active if the subports are not active | 16:05 |
melwitt | this is the tempest test https://github.com/openstack/tempest/blob/97be23ea6402649652991983f3f2b85873eba4d8/tempest/api/compute/admin/test_live_migration.py#L285 | 16:05 |
sean-k-mooney | but honestly that is probably an implementation detail that no one should depend on | 16:05 |
sean-k-mooney | i don't think this was defined in the specs | 16:05 |
sean-k-mooney | so the test might just be asserting stuff that is not required/guaranteed by the api | 16:05 |
sean-k-mooney | i'm pretty sure this was the relevant spec https://specs.openstack.org/openstack/neutron-specs/specs/newton/vlan-aware-vms.html | 16:07 |
sean-k-mooney | gibi: melwitt so ya reading that quickly the status of the subport is not defined in relation to port binding | 16:10 |
dansmith | sean-k-mooney: on the event, it should send the event once the thing the port represents can pass traffic, right? so subport or not, we shouldn't get the alert until the traffic will flow | 16:10 |
dansmith | else we're wiring up to something we can't expect to get dhcp or other critical traffic through | 16:10 |
sean-k-mooney | correct | 16:10 |
dansmith | calling the trunk up because one side is active isn't good enough | 16:10 |
sean-k-mooney | we should not get the network-vif-plugged for the trunk parent until everything is configured to allow all traffic to flow | 16:11 |
dansmith | right, so depending on the backend implementation that may come depending on what wiring needs to happen | 16:11 |
sean-k-mooney | but they may not have implemented that dependent status check | 16:11 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Add more test coverage for devname base dev spec https://review.opendev.org/c/openstack/nova/+/844625 | 16:11 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Extra tests for remote managed dev spec https://review.opendev.org/c/openstack/nova/+/844626 | 16:11 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Unparent PciDeviceSpec from PciAddressSpec https://review.opendev.org/c/openstack/nova/+/844491 | 16:11 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Fix PciAddressSpec descendants to call super.__init__ https://review.opendev.org/c/openstack/nova/+/844565 | 16:11 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Remove dead code from PhysicalPciAddress https://review.opendev.org/c/openstack/nova/+/844628 | 16:11 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Clean up mapping input to address spec types https://review.opendev.org/c/openstack/nova/+/845765 | 16:11 |
dansmith | right but that would be a neutron issue | 16:11 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Remove unused PF checking from get_function_by_ifname https://review.opendev.org/c/openstack/nova/+/845775 | 16:11 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Fix type annotation of pci.Whitelist class https://review.opendev.org/c/openstack/nova/+/845780 | 16:11 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Move __str__ to the PciAddressSpec base class https://review.opendev.org/c/openstack/nova/+/845781 | 16:11 |
sean-k-mooney | dansmith: yes it would | 16:11 |
dansmith | ack, just confirming ;) | 16:12 |
sean-k-mooney | so you're suggesting that marking it confirmed for nova on the bug is wrong and we should likely change it | 16:12 |
dansmith | I dunno about that I just want to be clear that nova shouldn't be trying to interpret vif-plugged differently for trunks | 16:13 |
sean-k-mooney | bauzas: gibi by the way with the escalation and everything that happened in the last few days i have not been doing upstream bug triage this week sorry | 16:13 |
gibi | sean-k-mooney: no worries | 16:13 |
sean-k-mooney | dansmith: agreed | 16:13 |
bauzas | sean-k-mooney: no worries at all, again, bug triage is just down any prio | 16:13 |
sean-k-mooney | dansmith: nova should just care about the one port that is attached to the vm (the trunk) | 16:13 |
sean-k-mooney | the rest is up to neutron to care about | 16:14 |
dansmith | yes | 16:14 |
sean-k-mooney | dansmith: whether tempest should check this at all is probably TBD | 16:14 |
gibi | maybe the tempest test verifies the system from the neutron perspective, hence the assert on the subport too | 16:14 |
dansmith | sean-k-mooney: also true | 16:14 |
sean-k-mooney | gibi: yes but in that case it is indicating that neutron is not correctly setting up the trunk | 16:15 |
gibi | yes | 16:15 |
gibi | I agree | 16:15 |
sean-k-mooney | i would suggest setting the nova part to incomplete for now | 16:15 |
sean-k-mooney | as it's not clear that nova should be doing anything it is not already doing | 16:15 |
gibi | works for me | 16:20 |
gibi | later we can set it to invalid if it turns out only neutron needs a fix | 16:20 |
opendevreview | Billy Olsen proposed openstack/nova master: Handle mdev devices in libvirt 7.7+ https://review.opendev.org/c/openstack/nova/+/838976 | 16:24 |
opendevreview | Amit Uniyal proposed openstack/nova master: For evacuation, ignore if task_state is not None https://review.opendev.org/c/openstack/nova/+/848886 | 19:30 |
*** melwitt_ is now known as melwitt | 21:10 | |
*** dasm is now known as dasm|off | 21:25 |
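
The readiness condition discussed in the log (the trunk parent and every subport reporting ACTIVE within a 60-second wait) can be sketched as a small polling loop. This is only an illustration under stated assumptions, not the tempest or neutron implementation: `wait_for_trunk_active` and the `get_status` callable are hypothetical names, and the caller is assumed to supply a function that returns a port's status string.

```python
import time
from typing import Callable, Iterable


def wait_for_trunk_active(
    get_status: Callable[[str], str],   # hypothetical: returns a port's status string
    parent_id: str,
    subport_ids: Iterable[str],
    timeout: float = 60.0,              # the log mentions an existing 60 s wait
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the parent port and all subports report ACTIVE.

    Returns True when every port is ACTIVE, False if the timeout expires first.
    """
    port_ids = [parent_id, *subport_ids]
    deadline = clock() + timeout
    while True:
        # Collect the ports that are not yet ACTIVE on this pass.
        pending = [p for p in port_ids if get_status(p) != "ACTIVE"]
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A check written this way treats the trunk as ready only when the subports are ready too, which is the behaviour the participants expected from neutron; whether the API actually guarantees that ordering is exactly what the discussion leaves open.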