*** dviroel|rover|dinner is now known as dviroel|rover | 00:33 | |
*** dviroel|rover is now known as dviroel|out | 00:36 | |
jieniu | Hi, I'm from the Computing Force Network working group of OIF, a new working group set up in July. We are planning to use opendev for code hosting and StoryBoard. I have submitted a change request https://review.opendev.org/c/openstack/project-config/+/863025 to apply for resources; the job failed because there is no acl-config, so acl-config is a must-have, right? | 03:16 |
---|---|---|
opendevreview | Jie Niu proposed openstack/project-config master: Apply cfn repository for code and storyboard https://review.opendev.org/c/openstack/project-config/+/863025 | 03:38 |
opendevreview | Jie Niu proposed openstack/project-config master: add acl-config file https://review.opendev.org/c/openstack/project-config/+/863112 | 03:38 |
*** yadnesh|away is now known as yadnesh | 04:31 | |
*** ysandeep is now known as ysandeep|afk | 06:42 | |
tkajinam | seems like gerrit is down ? | 06:57 |
yadnesh | tkajinam, seems like it | 07:00 |
marios | o/ yeah same here | 07:08 |
frickler | infra-root: ^^ review02 is unreachable from the outside, other servers in the same region are fine. console log doesn't show anything obvious, vnc console is working. instead of searching for a login, I've tried a soft reboot, but that didn't help | 07:17 |
frickler | from the console log after the reboot it looks like the server is failing to set up networking, trying to find a login now | 07:17 |
*** yadnesh is now known as yadnesh|afk | 07:30 | |
frickler | #status notice review.opendev.org (Gerrit) is currently down, we are working to restore service as soon as possible | 07:30 |
opendevstatus | frickler: sending notice | 07:30 |
-opendevstatus- NOTICE: review.opendev.org (Gerrit) is currently down, we are working to restore service as soon as possible | 07:30 | |
opendevstatus | frickler: finished sending notice | 07:33 |
frickler | looking at the initial console log I captured, it seems the instance got rebooted or crashed just before the issue was initially reported in opendev | 07:47 |
frickler | [ 0.884595] PM: RTC time: 06:59:15, date: 2022-11-01 | 07:47 |
frickler | likely crashed because it did some cleaning during rootfs check | 07:48 |
*** ysandeep|afk is now known as ysandeep | 07:51 | |
frickler | hmm, seems the kernel is pretty recent 5.4.0-131.147 | 08:04 |
frickler | also it is -generic instead of the -kvm that I see e.g. on mirror01 | 08:05 |
frickler | tried another reboot to see if I could catch a grub prompt, but that went by much too fast | 08:05 |
frickler | checking the unattended-upgrade mails, the kernel was updated on Oct 19, but the older versions were -generic, too. so the issue could likely only be the new kernel version, not a different kernel type | 08:11 |
frickler | anyway, with what we are at now, the only option I can see for how to progress is to rescue the instance via the corresponding openstack command. but I've never done that before so I'd rather wait until help is around | 08:15 |
Tengu | frickler: maybe an update/upgrade was running for the big OpenSSL thing happening today? | 08:54 |
frickler | Tengu: I don't think that's likely, that should only be out after 13:00 | 08:56 |
marios | thanks for looking at that and for the updates frickler o/ | 08:56 |
Tengu | ah, wasn't aware of the actual hour. right. | 08:57 |
Tengu | 713 | 09:02 |
*** yadnesh|afk is now known as yadnesh | 09:03 | |
*** rlandy|out is now known as rlandy | 10:38 | |
*** dviroel|out is now known as dviroel|rover | 11:28 | |
*** ysandeep is now known as ysandeep|afk | 11:36 | |
*** soniya29 is now known as soniya29|afk | 11:57 | |
*** ysandeep|afk is now known as ysandeep | 12:15 | |
Tengu | frickler: heya! any news? guess you have to wait for NA ppl to show up with the All Saints holiday hitting most of Catholic EMEA? | 12:32 |
*** yadnesh is now known as yadnesh|away | 12:42 | |
frickler | Tengu: well even without a holiday there isn't any EMEA based infra-root other than me. so yes, we'll likely have to wait for another hour or two at least | 12:43 |
Tengu | frickler: uho. ok ^^' | 12:43 |
sean-k-mooney | frickler: i think the main difference between the kvm kernel and the generic one is that the kvm kernel has some hardware-only modules disabled to make it smaller | 12:59 |
Tengu | frickler: dropped a mail on openstack-discuss | 12:59 |
Tengu | just to make ppl aware - the IRC notification might be missed. | 13:00 |
sean-k-mooney | more likely today since yesterday was a public holiday for many so i for one actually turned my laptop off for the first time in about 4 months | 13:01 |
Tengu | hmmm today is (also?) public holiday in some locations | 13:03 |
sean-k-mooney | in some yes | 13:04 |
sean-k-mooney | by the old irish calendar today is the 1st day of the new year | 13:04 |
sean-k-mooney | the historical celebration halloween evolved out of was the autumn harvest festival that marked the end of the old year and start of the new. so november 1st would be the first day of winter and 1st day of the new year. the roman catholic church leveraged the fact that many cultures had a festival of the dead and other semi-religious events at this time and created all saints day | 13:11 |
sean-k-mooney | on november 1st | 13:11 |
sean-k-mooney | so all saints day is what is celebrated in some european countries today | 13:11 |
Tengu | yup. and in Switzerland, it's even funnier, since it depends on the actual canton. my current has the holiday, while my former one didn't. "yay". | 13:17 |
Tengu | how to not be complicated at all. | 13:17 |
frickler | same for Germany, some yesterday, some today, some neither | 13:26 |
*** dasm|off is now known as dasm | 13:29 | |
sean-k-mooney | add DST to the mix and i'm sure meetings will totally happen today | 13:31 |
frickler | plus something spooky going on https://github.com/NCSC-NL/OpenSSL-2022/blob/main/spooky.png | 13:35 |
Tengu | yeah, that one is nide. | 13:39 |
Tengu | *nice | 13:39 |
corvus | frickler: what's the current status? | 14:01 |
corvus | it looks like the server is up and i can log in with ssh and get a shell. | 14:01 |
slittle1 | hmm, is gerrit down ? | 14:03 |
slittle1 | ah, I see the notice when I scroll up. | 14:04 |
slittle1 | Any ETA for a fix? | 14:04 |
yadnesh|away | it's working now | 14:10 |
corvus | i started the docker containers; i do not know why they didn't start at boot | 14:10 |
Tengu | \o/ | 14:11 |
Tengu | thank you folks! | 14:11 |
corvus | #status log restarted docker containers on review02 which were not running after a crash/reboot | 14:13 |
opendevstatus | corvus: finished logging | 14:13 |
Tengu | let's just hope the amount of ppl hitting f5 won't crash the whole thing :] | 14:14 |
Clark[m] | frickler: sean-k-mooney: yes the only difference between those kernel packages is meant to be how many drivers you get along with them. | 14:15 |
corvus | i see new patchsets being uploaded and zuul is running jobs | 14:15 |
Clark[m] | I guess whatever issue was preventing network from being configured resolved external to the host? DHCP server maybe | 14:15 |
corvus | infra-root: we do not have auto restart configured on the gerrit docker containers | 14:16 |
Clark[m] | Ipv6 is configured statically on that server but iirc frickler's ipv6 routing to vexxhost has been problematic | 14:17 |
Clark[m] | I guess the next thing is to check syslog/dmesg/kernel log and see if the problem was on the host side and if not ask mnaser for input from the cloud side? | 14:17 |
* sean-k-mooney is using the fact i can't review to actually work on my own code for once | 14:18 | |
corvus | Clark: can you verify ipv6? | 14:18 |
corvus | Nov 01 13:02:31 review02 systemd-networkd[949]: ens3: DHCPv4 address 199.204.45.33/24 via 199.204.45.1 | 14:19 |
corvus | i'm guessing that's when the ipv4 issues resolved externally | 14:19 |
corvus | Nov 01 13:02:30 review02 systemd-timesyncd[868]: Initial synchronization to time server [2620:2d:4000:1::40]:123 (ntp.ubuntu.com). | 14:19 |
corvus | around the same time, so probably both v4 and v6 transitioned from broken to working then | 14:20 |
Clark[m] | I can reach review.o.o https on my phone via my mobile ISP which is my easiest ipv6 check. Can do a more in depth check in a bit | 14:20 |
sean-k-mooney | https://downforeveryoneorjustme.com/review.opendev.org | 14:20 |
sean-k-mooney | that says its up now | 14:20 |
sean-k-mooney | and i can reach it too | 14:20 |
*** ysandeep is now known as ysandeep|dinner | 14:22 | |
fungi | i'm not really around this week, but checking in it looks like i missed all the excitement | 14:22 |
frickler | o.k., so that sounds like it may have been an issue on vexxhost side. maybe mnaser__ can tell more at some point in time | 14:22 |
corvus | thanks. i think that's probably sufficient confirmation that v6 is working.. plus i saw another infra-root log in via ssh over ipv6 | 14:22 |
fungi | and yeah, i can reach it over v6 | 14:23 |
corvus | i think we can send a status update now? | 14:23 |
fungi | it looks like the server was last rebooted almost 6 hours ago according to uptime | 14:23 |
fungi | so maybe neutron went on an extended lunch break on that host | 14:23 |
corvus | status notice review.opendev.org (Gerrit) is back online | 14:24 |
corvus | ^ yes/no ? | 14:24 |
frickler | yes | 14:24 |
corvus | #status notice review.opendev.org (Gerrit) is back online | 14:25 |
opendevstatus | corvus: sending notice | 14:25 |
-opendevstatus- NOTICE: review.opendev.org (Gerrit) is back online | 14:25 | |
sean-k-mooney | https://paste.opendev.org/show/b9wxuH5Ub3MV72oa1rgD/ | 14:25 |
sean-k-mooney | i can curl it fine on ipv6 so it should be fine | 14:25 |
fungi | cacti shows a gap between ~06:15 and 13:05 utc | 14:25 |
fungi | so something definitely "fixed" networking for that instance around 13:00 utc | 14:26 |
fungi | many hours after the reboot | 14:26 |
frickler | so my theory is that the hypervisor rebooted unplanned, causing the instance to reboot, too. then neutron was broken until somebody from vexxhost got up and fixed it | 14:26 |
opendevstatus | corvus: finished sending notice | 14:27 |
frickler | https://status.vexxhost.com/ doesn't say anything, but maybe our tenant isn't covered by that | 14:29 |
clarkb | ya that seems likely given the evidence we have so far | 14:41 |
sean-k-mooney | that's roughly around when the openssl release should have happened but i doubt they rolled it out that fast | 14:41 |
clarkb | they also haven't disclosed it yet last I checked. They had a 4 hour window (and I haven't checked in about 20 minutes) | 14:42 |
sean-k-mooney | ya probably just a transient issue | 14:42 |
clarkb | corvus: re auto starting gerrit iirc the old system service setup did not auto start it and I'm guessing the docker compose config simply ported that behavior over. It might be worth discussing if that behavior is what we still want to retain | 14:47 |
corvus | yeah, it may still be the right thing to do. may help avoid corruption, etc. | 14:52 |
corvus | mostly when i joined the incident, server was up and gerrit was not, so that was the main mystery. | 14:52 |
frickler | yes, I didn't recheck things after the initial debugging round, cause I didn't expect things to magically repair themselves | 14:53 |
frickler | but I do hope that such incidents don't happen too often, so I would vote in favor of keeping the manual start as a sanity check | 14:55 |
*** dviroel|rover is now known as dviroel|rover|lunch | 15:09 | |
jieniu | Hi all, when I log in to gerrit, it redirects me to | 15:24 |
jieniu | "https://review.opendev.org/SignInFailure,SIGN_IN,Contact+site+administrator" , | 15:24 |
jieniu | i have logged in successfully using this account before, what could be the problem? | 15:24 |
clarkb | jieniu: usually it means that you are trying to log in to an account whose email address conflicts with another account. | 15:25 |
clarkb | jieniu: when was the last time you logged in successfully? | 15:25 |
clarkb | I can take a closer look after my meeting | 15:25 |
clarkb | usually this happens because you've created a new ubuntu one account with an email address that matches an older account | 15:26 |
jieniu | the last time I logged in could be 2 months ago? | 15:27 |
jieniu | I have been able to use this account to push changes in recent days. | 15:27 |
*** knikolla[m] is now known as knikolla | 15:27 | |
clarkb | pushing changes via ssh is likely using the old account. Did you change anything on the ubuntu one side? | 15:27 |
*** ysandeep|dinner is now known as ysandeep | 15:40 | |
clarkb | jieniu: lets keep the conversation here as much as possible (don't need to share email info etc in the public channel, but it helps keep everyone up to date if someone else needs to debug this later) | 15:48 |
clarkb | jieniu: There is an existing account with an ubuntu one openid that does not appear to be valid any longer. You are now attempting to log in with a new openid but the email address for this new openid conflicts with the email address for the old openid and Gerrit doesn't allow this conflict | 15:49 |
clarkb | jieniu: something must have changed on your ubuntu one account side to do this. Maybe you deleted an old ubuntu one account and made a new one or updated it somehow? | 15:49 |
jieniu | sorry, I remember there was some issue (I don't remember exactly what it was) when I used this account (the old account), I may have deleted the account and re-registered | 15:51 |
jieniu | so you may be right that the push via ssh is using the old account. | 15:51 |
jieniu | Any suggestion how to fix the issue? | 15:51 |
clarkb | I think we have two options here. I can retire and disable the old account, removing the old openid from it. This should allow you to log in with your new openid. Or you can try to restore the old openid (some people have reported success with this approach when they know what they did and it was reversible, but not knowing what changed on ubuntu one I can't say this will work here) | 15:51 |
jieniu | I deleted the old OpenID, so it may not be reversible? | 15:53 |
clarkb | if it was deleted then ya I don't think it is reversible | 15:53 |
jieniu | so could you help me to disable the old OpenID? thank you very much ? | 15:54 |
jieniu | * ! (typo ..) | 15:54 |
clarkb | yes, I can do that. It will take me a few minutes to page that process back in. | 15:54 |
clarkb | Note this will effectively orphan the old account in Gerrit. Its changes and reviews and so on will continue to exist, but you'll need to use the new account (with a new ssh username) going forward | 15:55 |
clarkb | unfortunately there isn't a much better option available to us without taking a gerrit downtime currently | 15:56 |
clarkb | anyway I'll start working on that now | 15:56 |
clarkb | jieniu: ok I'm done retiring the old account. I believe you'll be able to login and create the new account now | 16:11 |
*** dviroel|rover|lunch is now known as dviroel|rover | 16:12 | |
jieniu | clarkb: yes, I was just about to tell you that the account looks good now, thank you! | 16:12 |
clarkb | jieniu: you're welcome | 16:16 |
clarkb | infra-root I put logs for ^ in the usual gerrit cleanup location of my homedir on the gerrit server | 16:16 |
clarkb | corvus: I'm deescalating my privs after ^ and notice your admin account is in bootstrappers. Should I remove it? | 16:17 |
corvus | clarkb: yes, thx, sorry | 16:27 |
clarkb | done | 16:27 |
noonedeadpunk | sounds like gerritbot died | 16:35 |
noonedeadpunk | I actually wanted to hear opinion on that https://review.opendev.org/c/openstack/project-config/+/863158 including repo naming before pushing patch to the governance | 16:36 |
clarkb | it probably didn't like the prolonged gerrit outage. I'll go look at its logs | 16:36 |
clarkb | ya its last logs are from 06:44 ish I'll restart it | 16:37 |
clarkb | noonedeadpunk: windmill is effectively abandoned at this point I think. Your creation of a new role should be fine. I'm not aware of anything that will collide with the same repo name under an org dir (in fact we have a number of colliding project-config repos at this point if so) | 16:40 |
clarkb | noonedeadpunk: I'll leave a note about that but +2 it | 16:40 |
noonedeadpunk | nah, it intersects only with windmill | 16:41 |
noonedeadpunk | but I still feel quite bad about that fact :( | 16:41 |
noonedeadpunk | Ok, then let me push patch to governance before that, so I can create depends-on without spending your time on re-voting :) | 16:42 |
clarkb | I wish I had a reason for you to push a new patchset then we could test gerritbot after I restarted it :) | 16:42 |
opendevreview | Dmitriy Rabotyagov proposed openstack/project-config master: Add another role for Zookeeper installation https://review.opendev.org/c/openstack/project-config/+/863158 | 16:47 |
noonedeadpunk | looks like it's working :) | 16:47 |
opendevreview | Dmitriy Rabotyagov proposed openstack/project-config master: Add zookeeper repo to zuul and gerritbot https://review.opendev.org/c/openstack/project-config/+/863162 | 16:52 |
*** ysandeep is now known as ysandeep|out | 17:07 | |
opendevreview | Dmitriy Rabotyagov proposed openstack/project-config master: Add repository for Skyline installation by OpenStack-Ansible https://review.opendev.org/c/openstack/project-config/+/863165 | 17:12 |
opendevreview | Dmitriy Rabotyagov proposed openstack/project-config master: Add repository for Skyline installation by OpenStack-Ansible https://review.opendev.org/c/openstack/project-config/+/863165 | 17:16 |
opendevreview | Dmitriy Rabotyagov proposed openstack/project-config master: Add os_skyline repo to CI https://review.opendev.org/c/openstack/project-config/+/863167 | 17:17 |
*** marios is now known as marios|out | 17:22 | |
opendevreview | Jie Niu proposed openstack/project-config master: Apply cfn repository for code and storyboard https://review.opendev.org/c/openstack/project-config/+/863168 | 17:28 |
ianw | snapd | 19:16 |
ianw | Reverse Depends: | 19:16 |
ianw | apparmor | 19:16 |
ianw | apparmor | 19:16 |
ianw | command-not-found | 19:16 |
ianw | that's on the jammy bridge :/ i wonder if that will cause a fight of it getting uninstalled and re-installed by other things | 19:17 |
ianw | e.g. rsyslog -> apparmor -> snapd | 19:17 |
clarkb | https://packages.ubuntu.com/jammy/apparmor doesn't list it | 19:17 |
clarkb | but sorting that out is a good idea | 19:17 |
*** diablo_rojo_phone is now known as Guest182 | 19:19 | |
ianw | hrm, wonder if that shows recommendations ... we might be installing recs too | 19:19 |
clarkb | https://packages.ubuntu.com/jammy/rsyslog does show the apparmor suggests | 19:20 |
opendevreview | Clark Boylan proposed opendev/system-config master: Rebuild gitea images under new golang release https://review.opendev.org/c/opendev/system-config/+/863176 | 19:47 |
ianw | ... pulls up some things on gitea after being offline for 4 days ... sees DAO ... closes tab quickly :) | 19:51 |
clarkb | I'm going to scrounge up lunch. I've also got to pick up kids from school later today and should probably do a short indoor bike ride (the outside is cold and wet). But I can be around for the gitea deployment assuming CI is happy with it later in my day. | 19:52 |
ianw | right now there's no hourly or deploy jobs running, i think a good time to cycle bridge01 | 19:53 |
ianw | ... started | 19:53 |
ianw | and it's back | 19:55 |
clarkb | thanks | 19:55 |
*** Guest182 is now known as diablo_rojo_phone | 20:34 | |
clarkb | ianw: looks like snaps rely on apparmor to do a lot of their isolation stuff | 20:40 |
clarkb | https://packages.ubuntu.com/jammy/snapd | 20:40 |
clarkb | I wonder why apt lists it in reverse depends when it is a regular dependency? | 20:40 |
mnaser__ | clarkb, frickler: just to provide context, looks like the compute node crashed and the networking didn’t properly come back till we noticed it. Also you can feel free to send an email to support@vexxhost.com — most of us here are aware about opendev | 21:06 |
mnaser__ | open10k8s: ricolin and Guilherme (not in this channel it seems) are all aware and can check things out | 21:06 |
fungi | mnaser__: thanks for following up! that was the most likely scenario we were able to guess based on observed behaviors | 21:09 |
mnaser__ | sorry, haven’t been as active up here as I’d like to.. :) | 21:10 |
clarkb | mnaser__: thanks for confirming and good to know | 21:11 |
clarkb | I need to do that school run momentarily but I think the next step for gitea is for someone to pull docker://insecure-ci-registry.opendev.org:5000/opendevorg/gitea:60a872511484487aae940b78fb81d6fe_latest and check if go is 1.18.8 there. I tried to figure that out from the job log directly but I'm not sure we record that info (the shas that docker image builds write don't seem to | 21:12 |
clarkb | map to the sha identifiers for the image as a whole? I find all of that very confusing with docker) | 21:12 |
clarkb | the sha at https://zuul.opendev.org/t/openstack/build/60a872511484487aae940b78fb81d6fe/log/job-output.txt#785 does match the sha of what I get when I docker run golang:1.18-bullseye and then go version in the resulting container reports 1.18 which I think is sufficient for me to be confident that we are using the new version | 21:14 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/863176 is ready when we are | 21:14 |
ianw | clarkb: hrm, all i can see in apparmor is "Breaks: apparmor-profiles-extra (<< 1.21), fcitx-data (<< 1:4.2.9.1-1ubuntu2), snapd (<< 2.44.3+20.04~)" | 21:25 |
ianw | i wonder if that's confusing rdepends | 21:26 |
ianw | apt-rdepends ... the separate tool, seems to give an answer more like i'd think it would be -> https://paste.opendev.org/show/bJfVeEkdINw21rdUM4FO/ | 21:29 |
ianw | basically a lot of gnome-y stuff | 21:29 |
*** dviroel|rover is now known as dviroel|rover|bbl | 21:32 | |
clarkb | I'm back now if we want to approve https://review.opendev.org/c/opendev/system-config/+/863176 I can be around to watch it this afternoon/evening | 22:07 |
*** rlandy is now known as rlandy|out | 22:54 | |
ianw | ^ i'm fine with it, i can help watch too | 23:04 |
*** dasm is now known as dasm|off | 23:09 | |
clarkb | thanks | 23:16 |
*** dasm|off is now known as Guest202 | 23:37 | |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: helm: Add job for linting helm charts https://review.opendev.org/c/zuul/zuul-jobs/+/861799 | 23:54 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: helm: Add job for linting helm charts https://review.opendev.org/c/zuul/zuul-jobs/+/861799 | 23:59 |
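
A rough way to pin down how long review02 was without networking is to take the timestamps from the two journal lines corvus quoted at 14:19 and compare them with the RTC time frickler quoted from the first console log (06:59:15). The Python sketch below is an illustration only, not something run in the channel. The helper name `parse_journal_time` is hypothetical, the year 2022 is taken from the console log date because the journal's short timestamps omit it, and it assumes both clocks are in UTC. The server was soft-rebooted again later in the morning, so the result measures time since the first observed boot, not since the last one.

```python
from datetime import datetime, timedelta

# Journal lines copied from the channel log (corvus, 14:19).
JOURNAL_LINES = [
    "Nov 01 13:02:31 review02 systemd-networkd[949]: ens3: DHCPv4 address 199.204.45.33/24 via 199.204.45.1",
    "Nov 01 13:02:30 review02 systemd-timesyncd[868]: Initial synchronization to time server [2620:2d:4000:1::40]:123 (ntp.ubuntu.com).",
]


def parse_journal_time(line: str, year: int = 2022) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a short-format journal line.

    Hypothetical helper; the year is an assumption because the stamp omits it.
    Month-name parsing (%b) depends on an English/C locale.
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


# RTC time from the first console log frickler captured (07:47 message).
first_boot = datetime(2022, 11, 1, 6, 59, 15)

# Earliest sign of working networking in the quoted journal lines.
first_network = min(parse_journal_time(line) for line in JOURNAL_LINES)

gap = first_network - first_boot
print(f"networking returned {gap} after the first observed boot")
```

The result lines up with the gap fungi read off cacti (roughly 06:15 to 13:05 UTC) and with the conclusion in the channel that networking was fixed from outside the instance around 13:00 UTC.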