Tuesday, 2022-11-01

jieniuHi, I'm from the Computing Force Network working group of OIF, a new working group set up in July. We are planning to use opendev for code hosting and StoryBoard. I have submitted a change request https://review.opendev.org/c/openstack/project-config/+/863025 to apply for resources; the job failed because there is no acl-config, so an acl-config is a must-have, right? 03:16
opendevreviewJie Niu proposed openstack/project-config master: Apply cfn repository for code and storyboard  https://review.opendev.org/c/openstack/project-config/+/863025 03:38
opendevreviewJie Niu proposed openstack/project-config master: add acl-config file  https://review.opendev.org/c/openstack/project-config/+/863112 03:38
tkajinamseems like gerrit is down ?06:57
yadneshtkajinam, seems like it07:00
marioso/ yeah same here 07:08
fricklerinfra-root: ^^ review02 is unreachable from the outside, other servers in the same region are fine. console log doesn't show anything obvious, vnc console is working. instead of searching for a login, I've tried a soft reboot, but that didn't help07:17
fricklerfrom the console log after the reboot it looks like the server is failing to set up networking, trying to find a login now07:17
frickler#status notice review.opendev.org (Gerrit) is currently down, we are working to restore service as soon as possible07:30
opendevstatusfrickler: sending notice07:30
-opendevstatus- NOTICE: review.opendev.org (Gerrit) is currently down, we are working to restore service as soon as possible07:30
opendevstatusfrickler: finished sending notice07:33
fricklerlooking at the initial console log I captured, it seems the instance got rebooted or crashed just before the issue was initially reported in opendev07:47
frickler[    0.884595] PM: RTC time: 06:59:15, date: 2022-11-01 07:47
fricklerlikely crashed because it did some cleaning during rootfs check07:48
fricklerhmm, seems the kernel is pretty recent 5.4.0-131.147 08:04
frickleralso it is -generic instead of -kvm, which is what I see e.g. on mirror01 08:05
fricklertried another reboot to see if I could catch a grub prompt, but that went by much too fast08:05
fricklerchecking the unattended-upgrade mails, the kernel was updated on Oct 19, but the older versions were -generic, too. so the only plausible culprit is the new kernel version, not a different kernel type08:11
frickleranyway, from where we are now, the only way forward I can see is to rescue the instance via the corresponding openstack command. but I've never done that before, so I'd rather wait until help is around08:15
Tengufrickler: maybe an update/upgrade was running for the big OpenSSL thing happening today?08:54
fricklerTengu: I don't think that's likely, that should only be out after 13:0008:56
mariosthanks for looking at that and for the updates frickler o/ 08:56
Tenguah, wasn't aware of the actual hour. right.08:57
*** soniya29 is now known as soniya29|afk11:57
*** ysandeep|afk is now known as ysandeep12:15
Tengufrickler: heya! any news? guess you have to wait for NA ppl to show up with the All Saints holiday hitting most of Catholic EMEA?12:32
fricklerTengu: well even without a holiday there isn't any EMEA based infra-root other than me. so yes, we'll likely have to wait for another hour or two at least12:43
Tengufrickler: uho. ok ^^'12:43
sean-k-mooneyfrickler: i think the main difference between the kvm kernel and the generic one is that the kvm kernel has some hardware-only modules disabled to make it smaller12:59
Tengufrickler: dropped a mail on openstack-discuss12:59
Tengujust to make ppl aware - the IRC notification might be missed.13:00
sean-k-mooneymore likely today since yesterday was a public holiday for many, so i for one actually turned my laptop off for the first time in about 4 months13:01
Tenguhmmm today is (also?) public holiday in some locations13:03
sean-k-mooneyin some yes13:04
sean-k-mooneyby the old irish calendar today is the 1st day of the new year13:04
sean-k-mooneythe historical celebration halloween evolved out of was the autumn harvest festival that marked the end of the old year and the start of the new. so november 1st would be the first day of winter and the 1st day of the new year. the roman catholic church leveraged the fact that many cultures had a festival of the dead and other semi-religious events at this time and created all saints day13:11
sean-k-mooneyon november 1st13:11
sean-k-mooneyso all saints day is what is celebrated in some european countries today13:11
Tenguyup. and in Switzerland, it's even funnier, since it depends on the actual canton. my current one has the holiday, while my former one didn't. "yay".13:17
Tenguhow to not be complicated at all.13:17
fricklersame for Germany, some yesterday, some today, some neither13:26
sean-k-mooneyadd DST to the mix and i'm sure meetings will totally happen today13:31
fricklerplus something spooky going on https://github.com/NCSC-NL/OpenSSL-2022/blob/main/spooky.png13:35
Tenguyeah, that one is nice.13:39
corvusfrickler: what's the current status?14:01
corvusit looks like the server is up and i can log in with ssh and get a shell.14:01
slittle1hmm, is gerrit down ?14:03
slittle1ah, I see the notice when I scroll up.   14:04
slittle1Any ETA for a fix?14:04
yadnesh|awayit's working now14:10
corvusi started the docker containers; i do not know why they didn't start at boot14:10
Tenguthank you folks!14:11
corvus#status log restarted docker containers on review02 which were not running after a crash/reboot14:13
opendevstatuscorvus: finished logging14:13
Tengulet's just hope the amount of ppl hitting f5 won't crash the whole thing :]14:14
Clark[m]frickler: sean-k-mooney: yes the only difference between those kernel packages is meant to be how many drivers you get along with them.14:15
corvusi see new patchsets being uploaded and zuul is running jobs14:15
Clark[m]I guess whatever issue was preventing network from being configured resolved external to the host? DHCP server maybe14:15
corvusinfra-root: we do not have auto restart configured on the gerrit docker containers14:16
Clark[m]Ipv6 is configured statically on that server but iirc frickler's ipv6 routing to vexxhost has been problematic14:17
Clark[m]I guess the next thing is to check syslog/dmesg/kernel log and see if the problem was on the host side and if not ask mnaser for input from the cloud side?14:17
* sean-k-mooney is using the fact i can't review to actually work on my own code for once14:18
corvusClark: can you verify ipv6?14:18
corvusNov 01 13:02:31 review02 systemd-networkd[949]: ens3: DHCPv4 address via
corvusi'm guessing that's when the ipv4 issues resolved externally14:19
corvusNov 01 13:02:30 review02 systemd-timesyncd[868]: Initial synchronization to time server [2620:2d:4000:1::40]:123 (ntp.ubuntu.com).14:19
corvusaround the same time, so probably both v4 and v6 transitioned from broken to working then14:20
Clark[m]I can reach review.o.o https on my phone via my mobile ISP which is my easiest ipv6 check. Can do a more in depth check in a bit14:20
sean-k-mooneythat says its up now14:20
sean-k-mooneyand i can reach it too14:20
fungii'm not really around this week, but checking in it looks like i missed all the excitement14:22
fricklero.k., so that sounds like it may have been an issue on vexxhost side. maybe mnaser__ can tell more at some point in time14:22
corvusthanks.  i think that's probably sufficient confirmation that v6 is working.. plus i saw another infra-root log in via ssh over ipv614:22
fungiand yeah, i can reach it over v614:23
corvusi think we can send a status update now?14:23
fungiit looks like the server was last rebooted almost 6 hours ago according to uptime14:23
fungiso maybe neutron went on an extended lunch break on that host14:23
corvusstatus notice review.opendev.org (Gerrit) is back online14:24
corvus^ yes/no ?14:24
corvus#status notice review.opendev.org (Gerrit) is back online14:25
opendevstatuscorvus: sending notice14:25
-opendevstatus- NOTICE: review.opendev.org (Gerrit) is back online14:25
sean-k-mooneyi can curl it fine on ipv6 so it should be fine14:25
fungicacti shows a gap between ~06:15 and 13:05 utc14:25
fungiso something definitely "fixed" networking for that instance around 13:00 utc14:26
fungimany hours after the reboot14:26
fricklerso my theory is that the hypervisor rebooted unplanned, causing the instance to reboot, too. then neutron was broken until somebody from vexxhost got up and fixed it14:26
opendevstatuscorvus: finished sending notice14:27
fricklerhttps://status.vexxhost.com/ doesn't say anything, but maybe our tenant isn't covered by that14:29
clarkbya that seems likely given the evidence we have so far14:41
sean-k-mooneythat's roughly around when the openssl release should have happened, but i doubt they rolled it out that fast14:41
clarkbthey also haven't disclosed it yet last I checked. They had a 4 hour window (and I haven't checked in about 20 minutes)14:42
sean-k-mooneyya probably just a transient issue14:42
clarkbcorvus: re auto starting gerrit iirc the old system service setup did not auto start it and I'm guessing the docker compose config simply ported that behavior over. It might be worth discussing if that behavior is what we still want to retain14:47
corvusyeah, it may still be the right thing to do.  may help avoid corruption, etc.14:52
corvusmostly when i joined the incident, server was up and gerrit was not, so that was the main mystery.14:52
frickleryes, I didn't recheck things after the initial debugging round, cause I didn't expect things to magically repair themselves14:53
fricklerbut I do hope that such incidents don't happen too often, so I would vote in favor of keeping the manual start as a sanity check14:55
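The discussion leans toward keeping Gerrit's manual start after a crash. If so, it may still help to make the question "which containers will not come back on their own?" quick to answer during an incident. A minimal Python sketch, assuming the `docker` CLI is installed on the host and that `docker inspect` reports the policy under `HostConfig.RestartPolicy.Name` (as current Docker releases do). The helper names are hypothetical and not part of any opendev tooling; container names are whatever you pass on the command line.

```python
import json
import subprocess
import sys

# Restart policies this sketch counts as "dockerd starts it again at boot".
# "on-failure" is deliberately excluded: treating it as unreliable after an
# unclean host crash is a conservative assumption of this sketch.
BOOT_RESTART_POLICIES = {"always", "unless-stopped"}


def containers_without_boot_restart(inspect_output):
    """Return names of containers whose restart policy won't start them at boot.

    `inspect_output` is the parsed JSON list printed by `docker inspect`.
    A missing or empty policy name is treated like "no".
    """
    flagged = []
    for container in inspect_output:
        policy = (container.get("HostConfig", {})
                  .get("RestartPolicy", {})
                  .get("Name", "")) or "no"
        if policy not in BOOT_RESTART_POLICIES:
            # docker inspect reports names with a leading slash, e.g. "/gerrit".
            flagged.append(container.get("Name", "?").lstrip("/"))
    return flagged


def main(names):
    # `docker inspect` accepts several container names and prints one JSON list.
    raw = subprocess.run(["docker", "inspect", *names],
                         check=True, capture_output=True, text=True).stdout
    for name in containers_without_boot_restart(json.loads(raw)):
        print(f"{name}: no automatic restart at boot")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Usage would be something like `python3 check_restart.py <container> [<container> ...]` (the script name is hypothetical). It only reports; it does not change any policy, so it fits with keeping the manual start as a sanity check.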
jieniuHi all, when I log in to gerrit, it redirects me to 15:24
jieniu "https://review.opendev.org/SignInFailure,SIGN_IN,Contact+site+administrator" , 15:24
jieniui have logged in successfully using this account before, what could be the problem?15:24
clarkbjieniu: usually it means that you are trying to log in to an account whose email address conflicts with another account.15:25
clarkbjieniu: when was the last time you logged in successfully?15:25
clarkbI can take a closer look after my meeting15:25
clarkbusually this happens because you've created a new ubuntu one account with an email address that matches an older account15:26
jieniuthe last time I logged in was maybe 2 months ago? 15:27
jieniuI have been able to use this account to push changes in recent days.15:27
clarkbpushing changes via ssh is likely using the old account. Did you change anything on the ubuntu one side?15:27
clarkbjieniu: lets keep the conversation here as much as possible (don't need to share email info etc in the public channel, but it helps keep everyone up to date if someone else needs to debug this later)15:48
clarkbjieniu: There is an existing account with an ubuntu one openid that does not appear to be valid any longer. You are now attempting to log in with a new openid but the email address for this new openid conflicts with the email address for the old openid and Gerrit doesn't allow this conflict15:49
clarkbjieniu: something must have changed on your ubuntu one account side to do this. Maybe you deleted an old ubuntu one account and made a new one or updated it somehow?15:49
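The conflict clarkb describes can be pictured with a toy model: each account owns a set of login identities (OpenIDs) and email addresses, a known identity maps back to its account, and an unknown identity may only create a new account if its email is not already claimed by another one. This is a hypothetical Python sketch of that rule as stated in the discussion, not Gerrit's actual implementation; the class and method names are made up.

```python
from dataclasses import dataclass, field


class EmailConflict(Exception):
    """Raised when a new identity's email already belongs to another account."""


@dataclass
class Account:
    account_id: int
    external_ids: set = field(default_factory=set)  # e.g. OpenID URLs
    emails: set = field(default_factory=set)
    active: bool = True


class ToyAccountStore:
    """Toy model of the email-uniqueness rule; not Gerrit's real code."""

    def __init__(self):
        self.accounts = {}
        self._next_id = 1  # arbitrary starting id for the toy model

    def _owner_of_email(self, email):
        for acct in self.accounts.values():
            if email in acct.emails:
                return acct
        return None

    def login(self, external_id, email):
        # A known identity simply maps back to its existing account.
        for acct in self.accounts.values():
            if external_id in acct.external_ids:
                return acct
        # Unknown identity: a new account must not reuse an email that
        # another account already claims.
        owner = self._owner_of_email(email)
        if owner is not None:
            raise EmailConflict(f"{email} already used by account {owner.account_id}")
        acct = Account(self._next_id, {external_id}, {email})
        self.accounts[acct.account_id] = acct
        self._next_id += 1
        return acct

    def retire(self, account_id):
        # Model of "retire and disable": drop the identities and emails so
        # they no longer conflict, but keep the (now inactive) account around.
        acct = self.accounts[account_id]
        acct.external_ids.clear()
        acct.emails.clear()
        acct.active = False
```

In this model, retiring the old account is what frees the email for the new OpenID, which matches the option clarkb ends up applying here.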
jieniusorry, I remember there was some issue (I don't remember exactly what) when I used this account (the old account); I may have deleted the account and re-registered15:51
jieniuso you may be right that the push via ssh is using the old account.15:51
jieniuAny suggestions on how to fix the issue?15:51
clarkbI think we have two options here. I can retire and disable the old account, removing the old openid from it. This should allow you to log in with your new openid. Or you can try to restore the old openid (some people have reported success with this approach when they knew what they did and it was reversible, but not knowing what changed on the ubuntu one side I can't say this will work here)15:51
jieniuI deleted the old OpenID, so it may not be reversible?15:53
clarkbif it was deleted then ya I don't think it is reversible15:53
jieniuso could you help me to disable the old OpenID? thank you very much ?15:54
jieniu * ! (typo ..)15:54
clarkbyes, I can do that. It will take me a few minutes to page that process back in.15:54
clarkbNote this will effectively orphan the old account in Gerrit. Its changes and reviews and so on will continue to exist, but you'll need to use the new account (with a new ssh username) going forward15:55
clarkbunfortunately there isn't a much better option available to us currently without taking a gerrit downtime15:56
clarkbanyway I'll start working on that now15:56
clarkbjieniu: ok I'm done retiring the old account. I believe you'll be able to login and create the new account now16:11
*** dviroel|rover|lunch is now known as dviroel|rover16:12
jieniuclarkb: yes, I was just about to tell you that the account looks good now, thank you!16:12
clarkbjieniu: you're welcome16:16
clarkbinfra-root I put logs for ^ in the usual gerrit cleanup location of my homedir on the gerrit server16:16
clarkbcorvus: I'm deescalating my privs after ^ and notice your admin account is in bootstrappers. Should I remove it?16:17
corvusclarkb: yes, thx, sorry16:27
noonedeadpunksounds like gerritbot died16:35
noonedeadpunkI actually wanted to hear opinion on that https://review.opendev.org/c/openstack/project-config/+/863158 including repo naming before pushing patch to the governance16:36
clarkbit probably didn't like the prolonged gerrit outage. I'll go look at its logs16:36
clarkbya its last logs are from 06:44 ish I'll restart it16:37
clarkbnoonedeadpunk: windmill is effectively abandoned at this point I think. Your creation of a new role should be fine. I'm not aware of anything that will collide with the same repo name under an org dir (in fact we have a number of colliding project-config repos at this point if so)16:40
clarkbnoonedeadpunk: I'll leave a note about that but +2 it16:40
noonedeadpunknah, it only intersects with windmill16:41
noonedeadpunkbut I still feel quite bad about that fact :(16:41
noonedeadpunkOk, then let me push patch to governance before that, so I can create depends-on without spending your time on re-voting :)16:42
clarkbI wish I had a reason for you to push a new patchset then we could test gerritbot after I restarted it :)16:42
opendevreviewDmitriy Rabotyagov proposed openstack/project-config master: Add another role for Zookeeper installation  https://review.opendev.org/c/openstack/project-config/+/863158 16:47
noonedeadpunklooks like it's working :)16:47
opendevreviewDmitriy Rabotyagov proposed openstack/project-config master: Add zookeeper repo to zuul and gerritbot  https://review.opendev.org/c/openstack/project-config/+/863162 16:52
opendevreviewDmitriy Rabotyagov proposed openstack/project-config master: Add repository for Skyline installation by OpenStack-Ansible  https://review.opendev.org/c/openstack/project-config/+/863165 17:12
opendevreviewDmitriy Rabotyagov proposed openstack/project-config master: Add repository for Skyline installation by OpenStack-Ansible  https://review.opendev.org/c/openstack/project-config/+/863165 17:16
opendevreviewDmitriy Rabotyagov proposed openstack/project-config master: Add os_skyline repo to CI  https://review.opendev.org/c/openstack/project-config/+/863167 17:17
opendevreviewJie Niu proposed openstack/project-config master: Apply cfn repository for code and storyboard  https://review.opendev.org/c/openstack/project-config/+/863168 17:28
ianwReverse Depends:19:16
ianw  apparmor19:16
ianw  apparmor19:16
ianw  command-not-found19:16
ianwthat's on the jammy bridge :/  i wonder if that will cause a fight of it getting uninstalled and re-installed by other things19:17
ianwe.g. rsyslog -> apparmor -> snapd19:17
clarkb https://packages.ubuntu.com/jammy/apparmor doesn't list it19:17
clarkbbut sorting that out is a good idea19:17
ianwhrm, wonder if that shows recommendations ... we might be installing recs too19:19
clarkbhttps://packages.ubuntu.com/jammy/rsyslog does show the apparmor suggests19:20
opendevreviewClark Boylan proposed opendev/system-config master: Rebuild gitea images under new golang release  https://review.opendev.org/c/opendev/system-config/+/863176 19:47
ianw... pulls up some things on gitea after being offline for 4 days ... sees DAO ... closes tab quickly :)19:51
clarkbI'm going to scrounge up lunch. I've also got to pick up kids from school later today and should probably do a short indoor bike ride (the outside is cold and wet). But I can be around for the gitea deployment assuming CI is happy with it later in my day.19:52
ianwright now there's no hourly or deploy jobs running, i think a good time to cycle bridge0119:53
ianw... started 19:53
ianwand it's back19:55
clarkbianw: looks like snaps rely on apparmor to do a lot of their isolation stuff20:40
clarkbI wonder why apt lists it in reverse depends when it is a regular dependency?20:40
mnaser__clarkb, frickler: just to provide context, looks like the compute node crashed and the networking didn’t properly come back till we noticed it.  Also you can feel free to send an email to support@vexxhost.com — most of us here are aware about opendev21:06
mnaser__open10k8s: ricolin and Guilherme (not in this channel it seems) are all aware and can check things out21:06
fungimnaser__: thanks for following up! that was the most likely scenario we were able to guess based on observed behaviors21:09
mnaser__sorry, haven’t been as active up here as I’d like to.. :)21:10
clarkbmnaser__: thanks for confirming and good to know21:11
clarkbI need to do that school run momentarily, but I think the next step for gitea is for someone to pull docker://insecure-ci-registry.opendev.org:5000/opendevorg/gitea:60a872511484487aae940b78fb81d6fe_latest and check if go is 1.18.8 there. I tried to figure that out from the job log directly, but I'm not sure we record that info (the shas that docker image builds write don't seem to
map to the sha identifiers for the image as a whole? I find all of that very confusing with docker)21:12
clarkbthe sha at https://zuul.opendev.org/t/openstack/build/60a872511484487aae940b78fb81d6fe/log/job-output.txt#785 does match the sha of what I get when I docker run golang:1.18-bullseye, and go version in the resulting container reports 1.18, which I think is sufficient for me to be confident that we are using the new version21:14
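clarkb's verification boils down to running `go version` in the builder image and comparing the result against 1.18.8. A minimal Python sketch of that comparison, assuming the `docker` CLI is available and the image ships a `go` binary on its PATH (true for the official `golang:*` images, probably not for the final gitea image). The function names are hypothetical.

```python
import re
import subprocess
import sys

# Go release the rebuilt image is expected to contain, per the discussion.
TARGET = (1, 18, 8)

# Matches e.g. "go1.18.8" or "go1.18" anywhere in the text.
_GO_RE = re.compile(r"\bgo(\d+)\.(\d+)(?:\.(\d+))?")


def parse_go_version(text):
    """Extract (major, minor, patch) from output such as
    'go version go1.18.8 linux/amd64'. Returns None if nothing matches."""
    m = _GO_RE.search(text)
    if m is None:
        return None
    major, minor, patch = m.groups()
    return (int(major), int(minor), int(patch or 0))


def is_patched(text, target=TARGET):
    """Plain minimum-version check against `target`."""
    version = parse_go_version(text)
    return version is not None and version >= target


def go_version_in_image(image):
    # Assumes `go` is on PATH inside the image (true for golang:* images).
    out = subprocess.run(["docker", "run", "--rm", image, "go", "version"],
                         check=True, capture_output=True, text=True).stdout
    return out.strip()


if __name__ == "__main__" and len(sys.argv) == 2:
    out = go_version_in_image(sys.argv[1])
    print(out, "->", "ok" if is_patched(out) else "older than %d.%d.%d" % TARGET)
```

Because it is only a minimum-version comparison, a 1.19.x toolchain would also pass even though it belongs to a different release branch.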
clarkbhttps://review.opendev.org/c/opendev/system-config/+/863176 is ready when we are21:14
ianwclarkb: hrm, all i can see in apparmor is "Breaks: apparmor-profiles-extra (<< 1.21), fcitx-data (<< 1:, snapd (<< 2.44.3+20.04~)"21:25
ianwi wonder if that's confusing rdepends21:26
ianwapt-rdepends ... the separate tool, seems to give an answer more like i'd think it would be -> https://paste.opendev.org/show/bJfVeEkdINw21rdUM4FO/21:29
ianwbasically a lot of gnome-y stuff21:29
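A likely explanation for the confusing `Reverse Depends` output is that `apt-cache rdepends` reports every relationship type by default (Depends, Recommends, Suggests, Conflicts, Breaks and so on). If the query was `apt-cache rdepends snapd`, apparmor's `Breaks: snapd (...)` alone would be enough to list apparmor. `apt-rdepends`, by contrast, follows only Depends and PreDepends by default, which fits its more intuitive answer. The Python sketch below models that filtering; it is hypothetical and not apt's code, and apart from the apparmor `Breaks` entry quoted above, the package data in the example is made up.

```python
import re
from collections import defaultdict

# Relationship fields that `apt-cache rdepends` includes unless told otherwise.
RELATION_FIELDS = ("Pre-Depends", "Depends", "Recommends", "Suggests",
                   "Conflicts", "Breaks", "Replaces", "Enhances")


def package_names(field_value):
    """'snapd (<< 2.44.3+20.04~), foo | bar:any' -> ['snapd', 'foo', 'bar']

    Drops version constraints, arch qualifiers, and splits alternatives.
    """
    names = []
    for clause in field_value.split(","):
        for alt in clause.split("|"):
            name = re.split(r"[\s(\[:]", alt.strip(), maxsplit=1)[0]
            if name:
                names.append(name)
    return names


def reverse_index(packages):
    """packages: {name: {field: value}} -> {target: [(source, field), ...]}"""
    index = defaultdict(list)
    for source, fields in packages.items():
        for fld in RELATION_FIELDS:
            for target in package_names(fields.get(fld, "")):
                index[target].append((source, fld))
    return index


def rdepends(packages, target, fields=RELATION_FIELDS):
    """Packages that mention `target` in any of the given relationship fields."""
    return sorted({src for src, fld in reverse_index(packages)[target]
                   if fld in fields})
```

On a real system, the equivalent narrowing is done with apt-cache's own flags, e.g. `apt-cache rdepends --no-breaks --no-conflicts --no-replaces snapd`. The duplicated apparmor line in the output is not explained by this model.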
clarkbI'm back now if we want to approve https://review.opendev.org/c/opendev/system-config/+/863176 I can be around to watch it this afternoon/evening22:07
ianw^ i'm fine with it, i can help watch too23:04
*** dasm is now known as dasm|off23:09
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: helm: Add job for linting helm charts  https://review.opendev.org/c/zuul/zuul-jobs/+/861799 23:54
opendevreviewMichael Kelly proposed zuul/zuul-jobs master: helm: Add job for linting helm charts  https://review.opendev.org/c/zuul/zuul-jobs/+/861799 23:59

