Friday, 2023-11-17

@nexn:matrix.orgNilesX: May I know which release of the StarlingX ISO you used?07:10
@nexn:matrix.orgI am using release 807:11
@maloute:matrix.orgSame m.07:29
@nexn:matrix.orgNilesX: If possible, could you kindly send the direct download link for that?07:31
@nexn:matrix.orgCould I know whether there is any RAID configuration in your environment?07:53
@maloute:matrix.orgThe link to download is on the release page08:10
@maloute:matrix.orgAnd no raid for me08:10
@nexn:matrix.orgokay08:15
@nexn:matrix.org> <@nexn:matrix.org> sent an image.09:18
I hope you can see the alarm raised in fault management; could that maybe be the reason we aren't able to unlock the controller host?
@maloute:matrix.orgif you look at the dashboard when unlocking you should see the different steps it's doing09:46
@maloute:matrix.orgas for your alarm, this seems to be normal since you ran this command just 1 min after unlocking09:46
@nexn:matrix.orgThen what could be the reason we aren't able to unlock the host? We tried with your guidance and the case is still the same.11:10
@nexn:matrix.orgThank you NilesX, we are able to unlock the controller host!12:57
@maloute:matrix.orgNice one! What was different?13:04
@bruce.jones:matrix.orgLooks like review.opendev.org is down15:36
@jreed:matrix.orgCan confirm15:36
@bruce.jones:matrix.orgfungi: is there an ETA?15:36
@fungicide:matrix.orgshould take no more than an hour. this upgrade maintenance was announced some time ago: https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/XT26HFG2FOZL3UHZVLXCCANDZ3TJZM7Q/15:39
@bruce.jones:matrix.orgtyty15:40
@fungicide:matrix.orgif you're not subscribed to service-announce, you might consider doing so. it's very low-traffic15:40
@fungicide:matrix.orgalso for more in-the-moment things, you may want to follow https://fosstodon.org/@opendevinfra/15:41
@fungicide:matrix.orgit was offline for about 10 minutes upgrading, but we're still poking around to make sure everything looks okay so don't be surprised if we have to take it down again for a rollback if we find something we overlooked in staging tests15:44
@fungicide:matrix.orgalso we have zuul offline for a few more minutes, so anything that gets pushed or approved at the moment may need a recheck once it comes back online15:49
@jreed:matrix.orgZuul is still down :/ - Will it automatically pick up jobs when it comes back?16:59
@jreed:matrix.orgZuul is back up!!!17:08
@jreed:matrix.orgBut this is the biggest backlog of jobs I've ever seen... WOW - 17:09
@jreed:matrix.orghttps://zuul.openstack.org/status17:09
@fungicide:matrix.orgjreed: note that most of the builds for those were already completed, only builds which happened to be running when we brought the service down are being rerun now, so the backlog looks worse than it is. the test nodes and node request graphs here give a slightly more accurate picture: https://grafana.opendev.org/d/21a6e53ea4/zuul-status17:15
@fungicide:matrix.orgalso on the running builds graph you can see that it had an average of about 40 builds in progress from each executor before the maintenance, and caught back up to roughly the same within a few minutes of coming back online17:19
@fungicide:matrix.orgit will have missed gerrit events which occurred between 15:30 and 17:05 utc though, so you might have to add recheck comments to any changes you uploaded during that span of time, or remove and reapply approval votes for anything that got approved within that window17:20
@jreed:matrix.orgI think I have one stuck. 17:22
@jreed:matrix.orgI'll see if I can get a core to redo their vote real quick and see if that works. 17:22
@jreed:matrix.orgI don't think simply replying with a comment does it.17:22
@fungicide:matrix.orgstuck as in it got approved while zuul was offline? yeah in that case you'll need a new approval vote on it (which can be done by the same person too, they just have to remove their workflow +1 and then add it again)17:24
@jreed:matrix.orgYes. I'm trying that now.17:24
@jreed:matrix.orgAnd it's running.... that's the way to go. Thanks fungi 17:25
@fungicide:matrix.orgyw17:27
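As a concrete illustration of the two re-trigger approaches fungi describes above (leaving a "recheck" comment, or removing and re-adding the Workflow +1 vote), here is a minimal Python sketch against the Gerrit REST API. This is not necessarily the exact mechanism used in the conversation (the vote was toggled in Gerrit directly); the credentials and change number below are placeholders.

```python
# Minimal sketch: re-triggering Zuul on a Gerrit change via the REST API.
# The credentials and change number are hypothetical placeholders.
import requests

GERRIT = "https://review.opendev.org"
AUTH = ("my-username", "my-http-password")  # Gerrit HTTP credentials (placeholder)
CHANGE = "901234"                           # change number (placeholder)

def post_review(payload):
    """POST a ReviewInput to the change's current revision."""
    url = f"{GERRIT}/a/changes/{CHANGE}/revisions/current/review"
    resp = requests.post(url, json=payload, auth=AUTH, timeout=30)
    resp.raise_for_status()

# Option 1: a "recheck" comment makes Zuul re-run the check pipeline.
post_review({"message": "recheck"})

# Option 2: for a change approved while Zuul was offline, clear and re-add
# Workflow +1 so Zuul sees a fresh approval event and re-enqueues the gate.
post_review({"labels": {"Workflow": 0}})
post_review({"labels": {"Workflow": 1}})
```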
@jreed:matrix.orgWell... zuul ran and one of the jobs failed but I don't think it was a test failure17:49
@jreed:matrix.orghttps://zuul.opendev.org/t/openstack/build/7b5008b924e247c7a1f3eb76fe96151f17:49
@jreed:matrix.orgSaid it couldn't install dependencies??17:49
@jreed:matrix.orgIt passed the normal check just fine... I think zuul is having an issue with it because of the maintenance17:51
@jreed:matrix.orgI guess we should just re-trigger it again?17:52
@jreed:matrix.orgfungi:  Any idea what's up? 17:54
@jreed:matrix.orgI've requeued the job.  Not sure what the heck is happening.17:57
@jreed:matrix.orgFailed trying to reach - https://mirror-int.ord.rax.opendev.org/wheel/debian-11.8-x86_64/pip/  - Is that mirror undergoing maintenance as well?18:00
@fungicide:matrix.orgjreed: that looks like ansible is incorrectly using `debian-11.8` instead of just `debian-11` in the url, likely something to do with how it expands the distro version string. i recall something related to that recently in newer ansible, digging to see if i can find details19:12
@fungicide:matrix.orglooking deeper, that url path is wrong but it's also wrong in the subsequent builds that passed, so normally it's non-fatal (but should still be fixed). i think whatever happened to that build was something else, but i'm still looking19:43
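A small illustration of the version-string expansion issue fungi mentions: the wheel mirror path should use the distro's major version ("debian-11"), but the job's templating expanded the full version ("debian-11.8"). The values below are hard-coded stand-ins for what Ansible exposes as ansible_distribution_version / ansible_distribution_major_version; this is not the actual job code.

```python
# Hypothetical stand-ins for the Ansible facts involved.
full_version = "11.8"                        # like ansible_distribution_version
major_version = full_version.split(".")[0]   # "11", like ansible_distribution_major_version
arch = "x86_64"
mirror = "https://mirror-int.ord.rax.opendev.org"

wrong = f"{mirror}/wheel/debian-{full_version}-{arch}/pip/"   # .../debian-11.8-x86_64/pip/
right = f"{mirror}/wheel/debian-{major_version}-{arch}/pip/"  # .../debian-11-x86_64/pip/

# The wrong path 404s, but the wheel mirror is only an extra index for pip,
# which falls back to the main index; that is why the bad URL is normally
# non-fatal, as fungi notes above.
print(wrong)
print(right)
```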
@fungicide:matrix.orgtox log collection in that job definition seems to be broken, and the console log is massive (i may have to switch to plaintext because it's shredding my browser), so investigation is slow-going but it does at least seem that whatever happened to cause that build to fail is not occurring consistently19:53
@fungicide:matrix.orgalmost 62k lines in that console log19:54
@fungicide:matrix.org`Could not fetch URL https://mirror-int.ord.rax.opendev.org/pypi/simple/pygments/: connection error: HTTPSConnectionPool(host='mirror-int.ord.rax.opendev.org', port=443): Read timed out. - skipping`20:11
@fungicide:matrix.orghttps://zuul.opendev.org/t/openstack/build/7b5008b924e247c7a1f3eb76fe96151f/log/job-output.txt#6123020:11
@fungicide:matrix.orgjreed: that was the fetch that succeeded in later builds, so it does look like some temporary network issue impacted the job node (or possibly the mirror server), such that the https connection timed out reading from the socket at some point when trying to retrieve that index20:12
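The failure, then, was a transient read timeout fetching the pip index from the in-region mirror, not a problem with the change itself. Below is a minimal sketch of probing for that failure mode; note that mirror-int.* hostnames resolve only inside the provider's network, so this would have to run from a job node in that region, and the retry/timeout values are arbitrary.

```python
# Probe the mirror index a few times to distinguish a transient network blip
# (a read timeout, as in the failed build) from a persistent mirror outage.
import time
import requests

INDEX = "https://mirror-int.ord.rax.opendev.org/pypi/simple/pygments/"  # from the build log

def probe(url, attempts=3, timeout=15):
    for i in range(1, attempts + 1):
        try:
            resp = requests.get(url, timeout=timeout)
            print(f"attempt {i}: HTTP {resp.status_code}, {len(resp.content)} bytes")
            return True
        except requests.exceptions.Timeout:
            print(f"attempt {i}: timed out after {timeout}s")
        except requests.exceptions.ConnectionError as exc:
            print(f"attempt {i}: connection error: {exc}")
        time.sleep(2)
    return False

if __name__ == "__main__":
    probe(INDEX)
```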
