Tuesday, 2023-04-04

02:50 <wxy-xiyuan> openEuler mirror (timeout problem) should be fixed by https://review.opendev.org/c/openstack/diskimage-builder/+/878807 Let's wait more.
03:55 <opendevreview> melanie witt proposed openstack/devstack master: DNM Create flavors with disk encryption when enabled  https://review.opendev.org/c/openstack/devstack/+/864160
08:04 *** elodilles_pto is now known as elodilles
15:00 <kopecmartin> #startmeeting qa
15:00 <opendevmeet> Meeting started Tue Apr  4 15:00:11 2023 UTC and is due to finish in 60 minutes.  The chair is kopecmartin. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00 <opendevmeet> The meeting name has been set to 'qa'
15:00 <kopecmartin> #link https://wiki.openstack.org/wiki/Meetings/QATeamMeeting#Agenda_for_next_Office_hours
15:00 <kopecmartin> agenda ^^
15:00 <lpiwowar> o/
15:00 <frickler> o/
15:01 <kopecmartin> \o
15:03 <kopecmartin> #topic Announcement and Action Item (Optional)
15:03 <kopecmartin> nothing from my side, other than we've had PTG last week
15:04 <kopecmartin> which leads me to
15:04 <kopecmartin> #topic OpenStack Events Updates and Planning
15:04 <kopecmartin> this was the PTG etherpad
15:04 <kopecmartin> #link https://etherpad.opendev.org/p/qa-bobcat-ptg
15:04 <kopecmartin> if you couldn't attend the session, feel free to go through that
15:05 <kopecmartin> I'll send an email summarizing the discussions some time this week
15:06 <lpiwowar> ack +1
15:06 <kopecmartin> #topic Bobcat Priority Items progress
15:06 <kopecmartin> this is the previous etherpad with the priority items
15:06 <kopecmartin> #link https://etherpad.opendev.org/p/qa-antelope-priority
15:07 <kopecmartin> I've moved the unfinished items to the new one
15:07 <kopecmartin> #link https://etherpad.opendev.org/p/qa-bobcat-priority
15:07 <kopecmartin> gmann: do we still need to/want to track SRBAC?
15:07 <kopecmartin> however, i haven't translated the PTG discussions into new priority items yet
15:08 <kopecmartin> I'll try to work on it this week too
15:08 <kopecmartin> if you see something in the PTG discussion which you know is worth tracking as a priority item, feel free to draft something in the new priority etherpad
15:09 <kopecmartin> now business as usual
15:09 <kopecmartin> #topic Gate Status Checks
15:09 <kopecmartin> #link https://review.opendev.org/q/label:Review-Priority%253D%252B2+status:open+(project:openstack/tempest+OR+project:openstack/patrole+OR+project:openstack/devstack+OR+project:openstack/grenade)
15:09 <kopecmartin> nothing there
15:10 <kopecmartin> any other urgent reviews?
15:10 <lpiwowar> Maybe this when you have time please: https://review.opendev.org/c/openstack/tempest/+/878074 ?
15:10 <kopecmartin> sure
15:10 <lpiwowar> thanks!
15:12 <kopecmartin> #topic Bare rechecks
15:12 <kopecmartin> #link https://etherpad.opendev.org/p/recheck-weekly-summary
15:12 <kopecmartin> no new data there
15:12 <kopecmartin> #topic Periodic jobs Status Checks
15:12 <kopecmartin> stable
15:12 <kopecmartin> #link https://zuul.openstack.org/builds?job_name=tempest-full-yoga&job_name=tempest-full-xena&job_name=tempest-full-wallaby-py3&job_name=tempest-full-victoria-py3&job_name=tempest-full-ussuri-py3&job_name=tempest-full-zed&pipeline=periodic-stable
15:12 <kopecmartin> master
15:12 <kopecmartin> #link https://zuul.openstack.org/builds?project=openstack%2Ftempest&project=openstack%2Fdevstack&pipeline=periodic
15:13 <kopecmartin> very green, nice! \o/
15:13 <kopecmartin> no failures we didn't already know about
15:13 <kopecmartin> #topic Distros check
15:14 <kopecmartin> cs-9
15:14 <kopecmartin> #link https://zuul.openstack.org/builds?job_name=tempest-full-centos-9-stream&job_name=devstack-platform-centos-9-stream&skip=0
15:14 <kopecmartin> fedora
15:14 <kopecmartin> #link https://zuul.openstack.org/builds?job_name=devstack-platform-fedora-latest&skip=0
15:14 <kopecmartin> debian
15:14 <kopecmartin> #link https://zuul.openstack.org/builds?job_name=devstack-platform-debian-bullseye&skip=0
15:14 <kopecmartin> focal
15:14 <kopecmartin> #link https://zuul.opendev.org/t/openstack/builds?job_name=devstack-platform-ubuntu-focal&skip=0
15:14 <kopecmartin> rocky
15:14 <kopecmartin> #link https://zuul.openstack.org/builds?job_name=devstack-platform-rocky-blue-onyx
15:14 <kopecmartin> openEuler
15:14 <kopecmartin> #link https://zuul.openstack.org/builds?job_name=devstack-platform-openEuler-22.03-ovn-source&job_name=devstack-platform-openEuler-22.03-ovs&skip=0
15:14 <frickler> seems we need a periodic 2023.1 job now?
15:15 <kopecmartin> yes
15:15 <frickler> for openeuler there was a dib fix mentioned earlier that might help with the timeouts
15:15 <frickler> #link https://review.opendev.org/c/openstack/diskimage-builder/+/878807
15:15 <frickler> ianw: ^^ maybe you can have a look
15:17 <kopecmartin> frickler: that's great, thanks
15:17 <frickler> also seems we don't run wallaby and older anymore, so you could clean up the above periodic link
15:18 <kopecmartin> yeah, i'll update the agenda, i did some modifications before the meeting, but haven't updated everything :/
15:19 <kopecmartin> #topic Sub Teams highlights
15:19 <kopecmartin> Changes with Review-Priority == +1
15:19 <kopecmartin> #link https://review.opendev.org/q/label:Review-Priority%253D%252B1+status:open+(project:openstack/tempest+OR+project:openstack/patrole+OR+project:openstack/devstack+OR+project:openstack/grenade)
15:19 <kopecmartin> nothing there
15:19 <kopecmartin> #topic Open Discussion
15:19 <kopecmartin> anything for the open discussion?
15:19 <frickler> I started another round of abandoning old devstack patches
15:20 <frickler> reviving some that don't seem completely outdated
15:21 <kopecmartin> perfect, thanks
15:21 <frickler> I also started testing devstack on bookworm, now that it is mostly frozen
15:21 <frickler> sadly it seems that global pip installs are no longer wanted
15:22 <frickler> maybe a good opportunity to finally set up some global venv instead
15:23 <frickler> not sure how much time I'll have to invest in that, so anyone joining in the effort would be most welcome
15:24 <frickler> https://paste.opendev.org/show/bCElNqBBVUCMOB957ZEj/ is what happens in a test run
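The "global pip installs are no longer wanted" behavior on bookworm is Debian's adoption of PEP 668: the system interpreter ships an EXTERNALLY-MANAGED marker that makes plain `pip install` refuse to touch the system site-packages. A minimal sketch of checking for it (the helper name here is ours, not devstack's):

```python
import sysconfig
from pathlib import Path


def pip_is_externally_managed() -> bool:
    """True if PEP 668 marks this interpreter as externally managed,
    i.e. a plain 'pip install' into system site-packages is refused
    (as frickler hit on Debian bookworm)."""
    marker = Path(sysconfig.get_path("stdlib")) / "EXTERNALLY-MANAGED"
    return marker.exists()


print(pip_is_externally_managed())
```

Installing into a venv (the "global venv" idea above) sidesteps the marker entirely, which is why it is the suggested direction.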
15:25 <frickler> I'll also try to look into dib support so we can actually run things in opendev CI
15:27 <clarkb> worth noting the next ubuntu lts will also break global pip installs
15:28 <kopecmartin> hm, perfect, something to look forward to :D
15:29 <clarkb> also you can test global venv without bookworm images
15:29 <frickler> yes, I think if/when we do this, we would switch everything, not make distro specific things
15:29 <clarkb> I have/had changes up for it from years ago that didn't get a lot of momentum. I'm happy for others to pick that up and push updates or abandon my changes and push new ones
15:29 <clarkb> but I am unlikely to be able to devote dedicated time to it now
15:31 <frickler> yes, I should revisit those patches
15:36 <frickler> guess that's it for now?
15:36 <frickler> maybe that would also help with the ceph jobs
15:36 <kopecmartin> i guess so
15:36 * kopecmartin updating the agenda
15:36 <kopecmartin> #topic Bug Triage
15:36 <kopecmartin> #link https://etherpad.openstack.org/p/qa-bug-triage-bobcat
15:37 <kopecmartin> the new bug number tracker ^
15:37 <kopecmartin> that's all from my side
15:37 <kopecmartin> if there isn't anything else, let's end the office hour
15:38 <frickler> ack, thx kopecmartin
15:38 <kopecmartin> thanks everyone
15:38 <kopecmartin> #endmeeting
15:38 <opendevmeet> Meeting ended Tue Apr  4 15:38:23 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
15:38 <opendevmeet> Minutes:        https://meetings.opendev.org/meetings/qa/2023/qa.2023-04-04-15.00.html
15:38 <opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/qa/2023/qa.2023-04-04-15.00.txt
15:38 <opendevmeet> Log:            https://meetings.opendev.org/meetings/qa/2023/qa.2023-04-04-15.00.log.html
15:45 <lpiwowar> thanks kopecmartin
16:24 *** iurygregory_ is now known as iurygregory
17:17 <opendevreview> Merged openstack/tempest master: Add project reader to account-generator  https://review.opendev.org/c/openstack/tempest/+/878074
19:44 <gmann> dansmith: if you are around, can you please check this grenade change https://review.opendev.org/c/openstack/grenade/+/879113
19:47 <dansmith> gmann: yep
19:54 <JayF> Heads up, just letting you all know we are testing an Ironic upgrade issue with the grenade job, that setting MYSQL_REDUCE_MEMORY:false seems to clear up
19:54 <JayF> we've gotten 2 clean passes on an intermittent issue, will recheck the grenade-only change a few times to get more information
19:54 <dansmith> did we switch that to default on?
19:54 <JayF> https://review.opendev.org/c/openstack/ironic/+/879494 and https://review.opendev.org/c/openstack/ironic/+/879495/
19:54 <gmann> i do not think so, we have not defaulted that yet
19:54 <dansmith> I thought not
19:55 <JayF> I thought it was in https://github.com/openstack/devstack/commit/7567359755a105e7278bbf97541332f28228b87d#diff-108978819c05ae183d88ec87959c2341a94cfc3f9465e3aeee82d554217b4f58R701
19:55 <JayF> if that's not true this is a red herring and we just got "lucky" (which is really unlucky)
19:55 <dansmith> only for one job
19:55 <dansmith> the devstack-multinode job
19:55 <JayF> ah
19:55 <dansmith> are you inheriting from that job?
19:55 <JayF> yeah I see below, it's defaulted to false
19:56 <JayF> well, i'm glad I said something because we were well down the creek of bad assumptions based on a misreading the first time
19:56 <JayF> iurygregory: ^
19:56 <gmann> yeah, default is false and only the multinode job enables it + a few in nova
19:56 <JayF> iurygregory: tl;dr it shouldn't be enabled unless either we explicitly enabled it or we're using the multinode job
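For context, MYSQL_REDUCE_MEMORY is a devstack localrc flag, so a job opts in through its zuul vars; a hypothetical variant (job name invented, structure assumed from standard devstack-based job definitions) would look roughly like:

```yaml
# Sketch: explicitly enabling the reduced-memory mysql tuning in a
# devstack-based zuul job. Default is false; devstack-multinode turns
# it on in its own definition.
- job:
    name: my-grenade-variant   # hypothetical name
    parent: grenade
    vars:
      devstack_localrc:
        MYSQL_REDUCE_MEMORY: true
```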
19:56 <JayF> iurygregory: so the passes are probably just coincidence :(
19:58 <dansmith> it's certainly possible that flag can cause unintended things resulting in failures, but .. probably only if enabled :)
19:58 <iurygregory> hummm
19:58 <dansmith> what are the failures you thought were related?
19:58 <JayF> A vip is unable to be pinged in neutron after upgrade, I believe is the symptom
19:58 <JayF> iurygregory has been looking more in depth and might have a log that could be linked
19:58 <JayF> I don't have one at hand
19:59 <dansmith> and there was some pointer to database issues?
19:59 <opendevreview> Martin Kopec proposed openstack/tempest master: Enable file injection tests  https://review.opendev.org/c/openstack/tempest/+/879510
19:59 <JayF> dansmith: more of, it spurred us to go spelunking in failure statistics vs recent changes
19:59 <JayF> dansmith: it lined up, and anytime something is intermittent I assume performance could be a component
20:00 <dansmith> okay
20:08 <dansmith> good at least that we didn't find the first data point of that flag breaking things I guess ;)
20:08 <JayF> good that I mentioned it in here and got the misreading exposed before we ended up WAY down the wrong path
20:09 <JayF> thanks for that :D
20:10 <dansmith> my half of the deal seems better, but..sure :)
20:25 <iurygregory> so, TheJulia was the one who found out that we were getting a 404 from the DB
20:25 <iurygregory> I could only track it to the point where we see the flavor wasn't created in nova
20:26 <iurygregory> https://zuul.opendev.org/t/openstack/build/21790f36aef540d9a7af420356baeafb/log/controller/logs/grenade.sh_log.txt#1903
20:26 <iurygregory> https://zuul.opendev.org/t/openstack/build/21790f36aef540d9a7af420356baeafb/log/controller/logs/screen-n-api.txt#4224
20:27 <JayF> that's different than the failures I saw
20:28 <iurygregory> fun
20:28 <iurygregory> .-.
20:28 <frickler> as I mentioned in #-nova, that is normal OSC behavior
20:28 <frickler> it checks whether the value works as ID, then tries to look up the name after that
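A rough sketch of the lookup order frickler describes (names here are hypothetical; the real logic lives in openstackclient's resource-finding helpers): the value is tried as an ID first, and only on a 404 does the client list and match by name. That is why a flavor created with only a name, like "baremetal", leaves a "not found" GET in the nova-api log even when everything succeeds.

```python
class NotFound(Exception):
    pass


def find_flavor(client, name_or_id):
    """Resolve a flavor the way OSC does: try the value as an ID
    first; on 404, fall back to listing and matching by name."""
    try:
        # step 1: GET /flavors/<value> -- 404s for a name like "baremetal"
        return client.get(name_or_id)
    except NotFound:
        pass
    # step 2: list all flavors and match on the name field
    matches = [f for f in client.list() if f["name"] == name_or_id]
    if len(matches) != 1:
        raise NotFound(name_or_id)
    # step 3: fetch by the real (auto-assigned) ID
    return client.get(matches[0]["id"])
```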
20:28 <iurygregory> I also remember Dmitry was looking and he saw a problem that was related to a floating IP not being pingable
20:29 <JayF> yeah, that's the one I've dug into at all
20:29 <iurygregory> so maybe we have different issues based on cloud providers...
20:29 <TheJulia> But then how is it the instance creation fails on the flavor not being found?
20:31 <frickler> where do you see that?
20:32 <TheJulia> The post response on the job log I linked to a few hours ago
20:33 <frickler> in the above run, the failure in grenade is: 2023-04-03 11:09:29.388 | HttpException: 501: Server Error for url: http://10.209.0.254/compute/v2.1/servers/e635780d-a265-4ce8-b350-328f6ee7fc7f/action, The requested functionality is not supported.
20:33 <TheJulia> Literally the instance creation post resulted in a failure where the object cannot be found in the db
20:33 <TheJulia> Bottom of the n-api log
20:34 <frickler> can you post the link again to make sure we are looking at the same job?
20:35 <TheJulia> Yeah, give me a few, I stepped away due to a migraine
20:35 <TheJulia> And a cat is seeking attention
20:36 <frickler> o.k., it's late here, I can take another look in my morning then
20:40 <TheJulia> frickler: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b1d/879215/3/check/ironic-grenade/b1d6cf4/controller/logs/screen-n-api.txt
20:42 <TheJulia> frickler: Apr 03 22:47:03.961414 appears to be the start of the error, which appears to be tied to the last logged post of the instance creation, which the apache logs record as resulting in a 500 at 22:47:04. At least, that is the last actual apache log entry that seemed to be related. The n-api logs seem like there is something talking to it directly outside of apache
20:43 <TheJulia> so there is a possibility that the logs could still be a red herring, but nothing related to the post requesting the instance creation seems logged and the timing seems to match up, so just crazy weird in my book
20:44 <TheJulia> hmm, the logged request says "request: GET /compute/v2.1/flavors/baremetal"
20:45 <TheJulia> the actual /compute/v2.1/servers post is not in the nova-api log at all :\
20:45 <TheJulia> even though it is in the apache logs
20:49 <dansmith> some sort of split-brain going on?
20:49 <dansmith> like there's two novas and you create the flavor in one but try to boot from the other?
20:49 <dansmith> either at the api level or there's two (api) databases?
20:50 <dansmith> that would be similar to the rbac "can't see the flavor because of permissions" issue someone else recommended
20:50 <dansmith> and to be clear, this is only on the ironic jobs, not the regular grenade job?
20:55 <TheJulia> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b1d/879215/3/check/ironic-grenade/b1d6cf4/controller/logs/apache/access_log.txt the apache log
20:55 <TheJulia> so, rbac wise, unless the old policy was removed, it is not enforced on the job
20:55 <TheJulia> the ironic job running grenade specifically
20:56 <TheJulia> so... there shouldn't be a split brain sort of case, but that *is* what it kind of feels like since nova-api logs don't really seem to jive with the apache logs
20:56 <JayF> it'd be interesting to inspect a held node in this broken state, unless we think any potential split-brain is ephemeral
20:57 <TheJulia> https://zuul.opendev.org/t/openstack/build/b1d6cf4027924c01a61fe07281ad3dd4
20:57 <TheJulia> https://github.com/openstack/ironic/blob/master/zuul.d/ironic-jobs.yaml#L845-L933
20:57 <TheJulia> JayF: Yeah, I think that is the path forward, tbh
20:58 <dansmith> or just throw a couple debug commands into the job, dumping all the processes and the keystone catalog
20:58 <dansmith> I feel like holding a node is cheating, but obviously easier for debugging
20:59 <TheJulia> :)
21:06 <TheJulia> JayF: should we just ask opendev to hold nodes from ironic-grenade failing?
21:06 <opendevreview> Merged openstack/grenade master: Fix tempest venv constraints for target branch tempest run  https://review.opendev.org/c/openstack/grenade/+/879113
21:06 <JayF> I wouldn't say "nodes". A node, sure. I don't want to ask as I don't have time to dedicate to this right now, I can help but don't wanna lead.
21:07 <TheJulia> I can, but they are not going to drop it until tomorrow most likely if it catches one today
21:07 <TheJulia> just speaking from end of day experience
21:07 <TheJulia> :)
21:11 <dansmith> I just want to clarify - you don't see this in any grenade jobs, only the ironic grenade job right?
21:11 <dansmith> assuming so, I guess I'd be looking at what you could possibly be doing differently here
21:13 <TheJulia> well, we create a baremetal flavor, and grenade is informed to use it
21:13 <dansmith> I guess your special flavor is one thing
21:13 <TheJulia> yeah, that *should* basically be it off the top of my head
21:13 <dansmith> is that done on the old side?
21:13 <TheJulia> flavor is created on the old side
21:13 <TheJulia> but it immediately fails it looks like
21:13 <dansmith> and the failure happens post-upgrade?
21:13 <TheJulia> but returns a 200
21:14 <dansmith> oh, still fails on the old side?
21:14 <TheJulia> that is what it looks like based upon the nova api logs
21:14 <TheJulia> which I think is why this seems so confusing
21:15 <dansmith> yeah, old I see
21:16 <dansmith> Apr 03 22:41:05.982907 np0033643434 devstack@n-api.service[85463]: DEBUG nova.api.openstack.wsgi [None req-1e785296-d62a-442e-b51a-24c4719a6953 admin admin] Action: 'create', calling method: <bound method FlavorManageController._create of <nova.api.openstack.compute.flavor_manage.FlavorManageController object at 0x7f3eddbf5fc0>>, body: {"flavor": {"name": "baremetal", "disk": 10, "ram": 1024, "rxtx_factor": 1.0, "vcpus": 1, "id": null,
21:16 <dansmith> "swap": 0, "os-flavor-access:is_public": true, "OS-FLV-EXT-DATA:ephemeral": 0}} {{(pid=85463) _process_stack /opt/stack/old/nova/nova/api/openstack/wsgi.py:511}}
21:16 <dansmith> that's the create
21:16 <TheJulia> yup
21:17 <dansmith> and yeah, literally the next operation is a show which fails
21:18 <TheJulia> ooooh
21:18 <TheJulia> different PIDs
21:19 <TheJulia> the last not found is 85464
21:19 <dansmith> the first one is the same pid as the post to create it though
21:19 <TheJulia> yup
21:20 <TheJulia> and 63 is the one the failed get is on as well :\
21:21 <dansmith> that's what I meant.. created on 63, failed immediately on 63
21:21 <dansmith> maybe you mentioned earlier,
21:22 <dansmith> but the devstack log shows that it creates the flavor and then sets a bunch of properties on it, which all succeed
21:22 <TheJulia> I just noticed the different PIDs in play, but yeah :\
21:22 <dansmith> I believe you recognized it as a distinct lack of necessary jiveness
21:22 <TheJulia> I think that was a single post
21:23 <TheJulia> meh, a lot goes over my head when I've got a migraine
21:23 <dansmith> no,
21:23 <dansmith> it's like six different openstack commands
21:24 <TheJulia> hmm
21:25 <TheJulia> oh, yup, I see it
21:25 <dansmith> I wonder if those are failing and osc isn't telling you
21:26 <TheJulia> this is a good question
21:26 * TheJulia looks for the apache log
21:27 <TheJulia> apache reports 200 response codes
21:27 <dansmith> so the several flavor shows right after that seem to line up, time-wise
21:29 <dansmith> ah, okay
21:29 <TheJulia> ig ues the client is doing the name matching?
21:29 <TheJulia> err, I guess
21:29 <dansmith> so the first show that fails,
21:29 <dansmith> is it trying to get it by id baremetal, which is wrong,
21:29 <TheJulia> at least based upon the query pattern
21:29 <dansmith> so then the client does a list, 200
21:29 <dansmith> and then get by uuid, 200
21:29 <dansmith> ah yep, what you said.. the client is resolving the name
21:29 <dansmith> tries the name as the id first
21:30 <dansmith> note how the base devstack flavors create with an id:
21:30 <dansmith> 2023-04-03 22:36:37.827408 | controller | + lib/nova:create_flavors:1258             :   openstack --os-region-name=RegionOne flavor create --id 5 --ram 16384 --disk 160 --vcpus 8 --property hw_rng:allowed=True m1.xlarge
21:30 <dansmith> ya'll don't, so you only have the name
21:31 <dansmith> and the id is auto-assigned as a uuid
21:34 <TheJulia> does the type of id impact the ability to reference the name?
21:34 <dansmith> and tempest is configured with ids directly not names
21:34 <dansmith> tbh, I didn't think we allowed flavors by name in the API, and relied on the client to resolve, but don't quote me on that
21:35 <dansmith> but at the very least that might account for the difference
21:35 <dansmith> maybe we do, but we suck at it or something
21:35 <dansmith> certainly if there are two with the same name somehow, that would be a problem
21:35 <TheJulia> ... okay, regardless of it seeming weird, why now I guess is another question
21:36 <TheJulia> so, first() is by result in the logic path on orm query
21:36 <dansmith> well, it's not even now, because it's on the grenade old side, but yeah I dunno
21:36 <TheJulia> but... if we purely rely upon client
21:36 <dansmith> is this on jammy?
21:36 <TheJulia> dunno, wouldn't think that would impact it
21:36 <dansmith> perhaps something mysql server-side changed some would-have-been-stable ordering?
21:36 <TheJulia> possibly
21:36 <TheJulia> I guess we'll find out soon? :)
21:37 <dansmith> I mean, try switching it to id and see if it goes away and if so, then ask "why" :)
21:37 <TheJulia> I do think we changed the job to jammy at some point last cycle
21:37 <JayF> It might be interesting science, depending on how hard... ^ yes to try that
21:37 <dansmith> JayF: it should be trivial since the rest of devstack/tempest uses ids
21:38 <dansmith> just change the command to add `--id baremetal` should do it
21:38 <dansmith> and maybe mangle the name to something else just for clarity in further debug, since the id and the name don't match on the others
21:38 <dansmith> this name and id business is legacy ec2 stuff, AFAIK
21:38 <TheJulia> that is a good data point
21:39 <dansmith> could also grep out the generated id and use that instead, but it'd be easier to read if you force the id to something, as grepping the logs for 'baremetal' is much easier
21:40 <TheJulia> ack, well, I guess let's get some more data and figure out the next step from there
21:43 <dansmith> our api ref says id or URL to a flavor when booting
21:44 <dansmith> so it's possible that tempest is resolving the flavor for you, inconsistently perhaps
21:44 <dansmith> I haven't looked at our code to see if we try harder than that
21:44 <dansmith> but, prove it fixes the problem and we can chase it down further
21:45 <TheJulia> in this case I believe it is an openstack server create call
21:45 <TheJulia> not tempest yet afaik
21:46 <dansmith> um, what's doing that then? the devstack run on the old side finishes right?
21:46 <TheJulia> that is a good question
21:46 <dansmith> oh, resource create I see
21:46 <TheJulia> ok
21:46 <dansmith> where grenade creates some things to see if they survive to the other side
21:46 <TheJulia> yup
21:47 <dansmith> so,
21:47 <dansmith> that's actually failing with a 500: 2023-04-03 22:52:04.438 | Unknown Error (HTTP 500)
21:47 <TheJulia> yup, the post returns that per the apache log
21:48 <dansmith> this:
21:48 <dansmith> 10.208.224.11 - - [03/Apr/2023:22:47:04 +0000] "POST /compute/v2.1/servers HTTP/1.1" 500 803 "-" "python-novaclient"
21:48 * TheJulia cracks open the nova api code
21:48 <TheJulia> yup
21:51 <dansmith> okay it's got the flavor id in the server create
21:51 <dansmith> and I see the flavor resolution attempt by osc before that
21:51 <dansmith> so nova-api or apache seems to die right as the create comes in
21:54 <TheJulia> or it freezes, or was frozen
21:54 <dansmith> and nothing happens for 5.5 minutes after that
21:54 <TheJulia> it could have come in on either process
21:54 <TheJulia> we see some possibly unrelated stuff on... I think it was process 64 in the log
21:54 <dansmith> nova starts talking to neutron before it freezes
21:55 <dansmith> I wonder if we're out of apache threads with us calling back to ourselves so much here
21:55 * TheJulia raises an eyebrow
21:55 <clarkb> I think the default is somewhere around 200 on ubuntu. Not sure if devstack changes that
21:55 <TheJulia> hmm, no nova conductor log entries
21:56 <clarkb> for apache threads I mean. It uses the combo worker type where it forks X processes and each process has Y threads too by default
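Back-of-the-envelope for that concern (illustrative numbers, not read from this job's config): with apache's worker/event MPMs, concurrency is child processes times threads per child, capped by MaxRequestWorkers, so nova calling back into neutron through the same apache can eat its own request slots.

```python
def apache_request_slots(processes: int, threads_per_child: int,
                         max_request_workers: int) -> int:
    """Concurrent requests a worker/event MPM can service: each child
    process serves threads_per_child requests, and MaxRequestWorkers
    caps the total."""
    return min(processes * threads_per_child, max_request_workers)


# e.g. 8 children * 25 threads, capped at 150 workers -> 150 slots
print(apache_request_slots(8, 25, 150))
```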
21:57 <dansmith> neutron is also doing an arse-ton of DB traffic at the time
21:58 <dansmith> so we could try turning dbcounter off
21:58 <TheJulia> hmmm... I wonder if you're on to something with apache
21:59 <dansmith> like well over 200 SELECT calls over the course of a few seconds right when we're hanging here
21:59 <TheJulia> oh... we could be deadlocking on the db then
21:59 <TheJulia> somehow
21:59 <dansmith> probably not on select calls
22:00 <TheJulia> unlikely unless there is also a create in flight on the same table someplace
22:00 <dansmith> it is doing a few insert/delete/updates, but at least it's "a lot of activity"
22:00 <dansmith> apache is returning 500 though and if it were timing out waiting for nova or neutron, that'd be a 503
22:00 <TheJulia> I'd expect a timeout eventually, maybe
22:01 <dansmith> the thing is all that db activity on the neutron side is with no more api requests coming in
22:01 <dansmith> so it's like super busy doing I don't know what
22:02 <dansmith> actually,
22:02 <dansmith> I don't think it *ever* fields another api query after that
22:03 <TheJulia> that does seem to be the case
22:04 <dansmith> yeah, so we see the POST /compute in the apache log,
22:04 <dansmith> we see that hit nova-api,
22:05 <dansmith> we see nova say "I'mma go hit neutron" but we never see a request against /network in apache and right around that time, neutron stops logging queries but is doing a ton of db stuff
22:05 <TheJulia> just a few seconds later, devstack starts shutting down, fwiw... first command logged at 22:41:24
22:06 <dansmith> okay, so wait,
22:06 <dansmith> apache logs that 500 way out of order
22:06 <TheJulia> likely, when it is *done*
22:06 <TheJulia> or at least finishes the write out
22:06 <dansmith> it logs it five minutes late, right
22:06 <dansmith> so there is apache activity in between there,
22:06 <dansmith> talking to placement and ironic
22:06 <dansmith> but not neutron
22:06 <dansmith> so I wonder if that's it waiting on nova which is waiting on neutron, and then timeout
22:07 <dansmith> this, btw, is the last thing nova-api really says:
22:07 <dansmith> Apr 03 22:47:04.809417 np0033643434 devstack@n-api.service[85463]: DEBUG nova.network.neutron [None req-0528b41f-c502-4239-bfa8-9c2f40bc440c nova_grenade nova_grenade] validate_networks() for [('0ffa5ccc-6329-4664-ac88-4e778b324d8b', None, None, None, None, None)] {{(pid=85463) validate_networks /opt/stack/old/nova/nova/network/neutron.py:2623}}
22:07 <TheJulia> do you have the time when nova connects out to neutron handy?
22:08 <dansmith> ^
22:08 <TheJulia> ack thanks!
22:08 <dansmith> that's just the last thing we see, so.. probably sometime after that,
22:08 <dansmith> but that lines up with the last neutron api call I think
22:08 <dansmith> which is:
22:08 <dansmith> Apr 03 22:47:05.157665 np0033643434 neutron-server[86238]: INFO neutron.wsgi [req-0528b41f-c502-4239-bfa8-9c2f40bc440c req-36ff5c21-ff3d-4a41-b3b6-f2b2a8867435 nova_grenade nova_grenade] 10.208.224.11 "GET /v2.0/ports?tenant_id=0049ae1d7885406099a43056d3e73ecc&fields=id HTTP/1.1" status: 200  len: 210 time: 0.0663364
22:08 <dansmith> that's the last time neutron.wsgi logs anything
22:08 <dansmith> so likely whatever we do after that
22:09 <dansmith> https://github.com/openstack/nova/blob/3886f078dea50baa062c732a0bd9f653e35e09cc/nova/network/neutron.py#L2630
22:12 <dansmith> yeah, that code requires special skills to understand, which I don't have
22:12 <dansmith> but I think the first call to _ports_needed() is, at least, doing a show on each port
22:12 <TheJulia> so, I *think* the bump in selects is the neutron object model at play
22:12 <dansmith> which we see the first and only
22:13 <TheJulia> I'm semi-curious if we're getting to https://github.com/openstack/nova/blob/3886f078dea50baa062c732a0bd9f653e35e09cc/nova/network/neutron.py#L2654 based upon the neutron log
22:13 <dansmith> I mean.. it's a dang lot of selects
22:13 <dansmith> I don't think we are
22:14 <dansmith> I really need to go to something I'm late for, and now that we've wandered into "neutron's fault" territory I'm losing interest :P
22:14 <TheJulia> okay, we have two places which can pull a list of ports it looks like
22:14 <dansmith> but maybe tomorrow morning we can rope in a neutron person
22:14 <TheJulia> one buried down after line 2630 you linked
22:15 <dansmith> maybe they'll recognize what "never answers http again, followed by too much DB traffic" means
22:15 <TheJulia> maybe, hopefully we'll have more data in the morning
22:15 <dansmith> data from what?
22:15 <dansmith> I think the flavor thing is likely not related
22:15 <TheJulia> we might have a held node to dig into :)
22:15 <dansmith> ah okay
22:16 <dansmith> I guess it's not clear to me that it would still be hung when you get it, depending I guess
22:16 <TheJulia> it might not be, but we might have a lot more logging to dig through
22:16 <dansmith> whether or not it is will tell you ... something
22:16 <TheJulia> exactly
22:16 <dansmith> I guess seeing if neutron ever comes back will be interesting
22:16 <TheJulia> ++
22:17 <dansmith> or if mysql is 100% cpu and neutron is still selecting the world
22:17 * dansmith &
22:17 <TheJulia> g'night

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!