Tuesday, 2021-07-13

00:07 <opendevreview> Merged opendev/system-config master: Add paste service  https://review.opendev.org/c/opendev/system-config/+/798400
00:26 <opendevreview> Merged opendev/system-config master: Add opendev paste server  https://review.opendev.org/c/opendev/system-config/+/799736
01:27 <opendevreview> Ian Wienand proposed opendev/system-config master: Add infra-service-deploy-paste to deploy pipeline  https://review.opendev.org/c/opendev/system-config/+/800579
03:21 <opendevreview> Ian Wienand proposed opendev/zone-opendev.org master: paste: add missing trailing dot for acme  https://review.opendev.org/c/opendev/zone-opendev.org/+/800586
03:33 <opendevreview> Merged opendev/zone-opendev.org master: paste: add missing trailing dot for acme  https://review.opendev.org/c/opendev/zone-opendev.org/+/800586
04:16 <opendevreview> Ian Wienand proposed opendev/system-config master: lodgeit: add robots.txt  https://review.opendev.org/c/opendev/system-config/+/800587
04:17 <opendevreview> Ian Wienand proposed opendev/system-config master: lodgeit: add robots.txt  https://review.opendev.org/c/opendev/system-config/+/800587
04:27 *** ykarel|away is now known as ykarel
05:25 *** bhagyashris_ is now known as bhagyashris|ruck
05:27 *** ysandeep|away is now known as ysandeep
06:31 <opendevreview> Ian Wienand proposed opendev/system-config master: lodgeit: add robots.txt  https://review.opendev.org/c/opendev/system-config/+/800587
06:51 *** amoralej|off is now known as amoralej
07:11 <opendevreview> Ian Wienand proposed opendev/system-config master: lodgeit: add robots.txt  https://review.opendev.org/c/opendev/system-config/+/800587
07:34 *** ysandeep is now known as ysandeep|lunch
07:39 *** rpittau|afk is now known as rpittau
08:47 *** ykarel is now known as ykarel|lunch
08:54 *** ysandeep|lunch is now known as ysandeep
09:25 *** dmellado_ is now known as dmellado
10:11 *** ykarel|lunch is now known as ykarel
11:20 *** dviroel|out is now known as dviroel
12:19 *** amoralej is now known as amoralej|lunch
12:39 <opendevreview> Ronelle Landy proposed zuul/zuul-jobs master: DNM - Testing rpm update from compose - iptables  https://review.opendev.org/c/zuul/zuul-jobs/+/800629
12:48 *** ricolin_ is now known as ricolin
13:13 *** amoralej|lunch is now known as amoralej
14:23 *** ysandeep is now known as ysandeep|away
14:24 *** ykarel is now known as ykarel|away
15:16 <opendevreview> Ronelle Landy proposed zuul/zuul-jobs master: DNM - Testing rpm update from compose - iptables  https://review.opendev.org/c/zuul/zuul-jobs/+/800629
15:27 *** mgoddard- is now known as mgoddard
15:29 <clarkb> efoley: not sure if you saw https://bugs.chromium.org/p/gerrit/issues/detail?id=14776 but I think I narrowed down the root cause of being able to move your openid from an old account to a new account in the gerrit code
15:30 <clarkb> efoley: doesn't help us fix it, but good to understand it for the future I guess. Also ianw seemed happy to try the fixup in the downtime. I'm hoping to test that on the server we've staged up in the next day or two. Then hopefully you can test a login on that server and make sure it all looks good (I'll check in tomorrow with an update)
15:35 <efoley> clarkb: ack, thanks for the update
16:02 *** marios is now known as marios|out
16:23 *** rpittau is now known as rpittau|afk
16:30 *** amoralej is now known as amoralej|off
17:15 <clarkb> corvus: hi
17:15 <corvus> clarkb: o/
17:15 <clarkb> corvus: https://etherpad.opendev.org/p/gerrit-upgrade-2021 is the planning doc
17:17 <corvus> clarkb: is the idea to make #13 work?
17:17 <clarkb> corvus: yes
17:17 <clarkb> I think we can change that to be while zuul is off, force merge the dns update change in review02, then manually pull and run the playbook on bridge
17:18 <corvus> clarkb: yeah, there's so much that's happening while zuul is off anyway, that doesn't seem a big deal to me
17:18 <corvus> that also means you can discard steps 10 and 12
17:19 <clarkb> this is definitely a source of awkwardness. we ran into similar chicken and egg issues when upgrading the zuul scheduler itself
17:19 <corvus> clarkb: can you be more specific?
17:20 <corvus> i'm not sure how the gerrit connection configuration relates to upgrading the zuul scheduler
17:20 <clarkb> corvus: specifically needing a working zuul system to update zuul and dns. Working zuul system may not be the case without first updating zuul and dns
17:20 <clarkb> not the gerrit connection config. But coordinating changes in the system when the system depends on itself
17:21 <clarkb> ianw was trying to use an escape hatch for that in this case. Unfortunately, that doesn't do what we expected for depends-on
17:21 <corvus> clarkb: i believe you could perform a zero-downtime move of gerrit if gerrit were an HA system
17:22 <corvus> and i believe gerrit can be run in that manner, but we haven't invested in doing so
17:22 <corvus> in such a circumstance, it's understandable that there will be some things we can't do without downtime
17:23 <corvus> as for upgrading the zuul scheduler, that's specifically what the 5.0 work is about.  when we reach 5.0, we should be able to do similar work on zuul without any downtime or awkwardness.
17:23 <clarkb> I don't think downtime is the issue as much as needing to make manual changes
17:23 <corvus> clarkb: they are equivalent in this case
17:23 <corvus> feel free to s/downtime/manual changes/ in my text and i think it still applies
17:24 <clarkb> in any case updating zuul to do what we'd like here and reverting both require we restart zuul. In that case I think reverting wins since we should be able to work around it manually as mentioned previously
17:24 <tosky> out of curiosity: if in the meantime I change the patch to Depends-On on the gerrit ID, like back in the days, shouldn't that work?
17:24 <clarkb> Unless you feel really strongly about wanting to update zuul then we could do that
17:24 <corvus> clarkb: i don't think it's worth updating zuul
17:25 <corvus> it appears to require a very minor change in the plan to run the dns change manually (along with all the other stuff that's done manually)
17:27 <corvus> clarkb: you doing a revert commit?
17:29 <opendevreview> Clark Boylan proposed opendev/system-config master: Talk to review.o.o instead of review01.o.o  https://review.opendev.org/c/opendev/system-config/+/800694
17:29 <clarkb> corvus: yup sorry
17:29 <clarkb> I think we don't want a full revert because the way the ssh known_hosts are set up now is good? Just need to revert the config connection setting I think
17:31 <corvus> clarkb: yep, that's what i was thinking too.  +3
17:34 <opendevreview> Clark Boylan proposed opendev/system-config master: Add warning to inventory about zuul gerrit server config  https://review.opendev.org/c/opendev/system-config/+/800699
17:34 <clarkb> That is a followup to add a warning in the code to hopefully avoid this in the future.
17:34 <clarkb> "good" news is that we can restart with the fix for that elif conditional in the zuul that produces a bunch of unwanted logging :)
17:39 <clarkb> should we do something like #status notice Depends-On using https://review.opendev.org are currently not working. This was due to a config change in Zuul that we are reverting and will be restarting Zuul to pick up.
17:39 <corvus> clarkb: i think that's a good idea
17:40 <clarkb> #status notice Depends-On using https://review.opendev.org URLs are currently not working. This was due to a config change in Zuul that we are reverting and will be restarting Zuul to pick up.
17:40 <opendevstatus> clarkb: sending notice
17:40 -opendevstatus- NOTICE: Depends-On using https://review.opendev.org URLs are currently not working. This was due to a config change in Zuul that we are reverting and will be restarting Zuul to pick up.
17:43 <clarkb> tosky: I expect that will work around this for you, but I hope we'll get zuul restarted more quickly than a round trip of your tempest jobs :)
17:46 <clarkb> I'm going to take a break while we wait for that revert to merge. Back in a bit and can do zuul restart then prep for the meeting
17:50 *** zbr is now known as Guest681
17:55 <opendevreview> Vishal Manchanda proposed opendev/puppet-translation_checksite master: Remove 'django_openstack_auth' reference from ``local.conf.erb``  https://review.opendev.org/c/opendev/puppet-translation_checksite/+/800702
18:09 <opendevreview> Clark Boylan proposed opendev/zone-opendev.org master: Reduce review.o.o TTL to 300 seconds  https://review.opendev.org/c/opendev/zone-opendev.org/+/800705
18:09 <clarkb> previously ^ wasn't important but I think it is now
18:44 <tosky> clarkb: eheh, right, thanks, it's better to wait
18:44 <tosky> and thanks clarkb and corvus for the prompt investigation and (soon) resolution
18:46 <clarkb> the change should be merging soon (it is in the gate) then we wait for the deployment playbook to run then we can do a restart
18:46 <clarkb> hopefully by the time the infra meeting is over we'll be in a good spot to restart
18:58 <ianw> ... interesting.  that's a side-effect i hadn't considered :/
18:58 <clarkb> ya I didn't think of it either
19:10 <opendevreview> Merged opendev/system-config master: Talk to review.o.o instead of review01.o.o  https://review.opendev.org/c/opendev/system-config/+/800694
19:20 <opendevreview> Merged opendev/zone-opendev.org master: Reduce review.o.o TTL to 300 seconds  https://review.opendev.org/c/opendev/zone-opendev.org/+/800705
19:21 <opendevreview> Merged opendev/system-config master: Add warning to inventory about zuul gerrit server config  https://review.opendev.org/c/opendev/system-config/+/800699
19:36 <opendevreview> Merged opendev/system-config master: lodgeit: add robots.txt  https://review.opendev.org/c/opendev/system-config/+/800587
19:36 <opendevreview> Merged opendev/system-config master: Add infra-service-deploy-paste to deploy pipeline  https://review.opendev.org/c/opendev/system-config/+/800579
19:42 <clarkb> corvus: ianw: the zuul deploy job for https://review.opendev.org/c/opendev/system-config/+/800694 is yet to even start but it looks like the hourly job just finished for zuul and I think that actually pulled in current master and updated the config for us
19:44 <clarkb> it also looks like we have up to date images on zuul02.o.o. Does anyone else want to verify the config and the images and then I can save queues, stop zuul, start zuul, restore queues?
19:44 <corvus> clarkb: confirmed zuul.conf looks good
19:45 <corvus> clarkb: i find it easier to just run the pull playbook; it takes seconds
19:45 <corvus> (and can be run before starting the restart playbook)
19:45 <clarkb> corvus: ok I'll do that
19:45 <ianw> lgtm too
19:46 <clarkb> pull completed successfully
19:47 <clarkb> just getting the commands together for queue saving and the restart playbook
19:48 <clarkb> ianw: corvus: I'm ready, any reason to not restart now?
19:49 <corvus> clarkb: release team?
19:49 <clarkb> I have notified them (about an hour ago and just now that we're likely to proceed) I don't see any release jobs either
19:49 <corvus> cool, sounds good then
19:49 <clarkb> I think we are good on that front
19:50 <clarkb> alright I'm proceeding
19:50 <clarkb> queues have been saved and the restart playbook is running
19:51 <clarkb> playbook is done, just waiting for zuul to reload its configs then I can restore queues
19:54 <clarkb> it complained about some of the ansible collections repos that are required projects on system-config-run-base-ansible-devel not sure if that is expected
19:54 <clarkb> Unknown projects: github.com/ansible-collections/ansible.posix, github.com/ansible-collections/ansible.netcommon, github.com/ansible-collections/community.crypto, github.com/ansible-collections/community.general, github.com/ansible-community/ara
19:55 <clarkb> oh I bet this is loading system-config in a tenant other than openstack
19:55 <clarkb> so ya probably expected
19:57 <clarkb> ianw: before I looked at zuul things I made that git clone and checkout in /tmp on review02 if you want to see what it looks like. It is different...
19:59 <clarkb> tenants are loaded I'm restoring queues now
20:01 <ianw> clarkb: cool, just poking now.  i've chowned /home/gerrit2 correctly.  i'm 99.99% certain that was because i mounted the LVM there before ansible ever ran, as i wanted to make sure it created everything on the lvm
20:02 <clarkb> makes sense
20:03 <clarkb> ianw: I've found the easiest way to navigate the externalids is via git grep
20:04 <clarkb> the hashes are deterministic but not the level of nesting from what I can tell so far
20:04 <clarkb> you basically check no nesting, one level, two levels, and so on until you find your hashed path
20:04 <clarkb> git grep cheats and just tells you what the path is
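
The nesting search clarkb describes can be sketched roughly as follows. This is only an illustration, assuming the note name is the SHA-1 hex digest of the external ID string (as Gerrit's NoteDb documentation describes) and that each nesting level peels off a two-character directory; the function name `candidate_note_paths` and the example ID are hypothetical and do not come from the log.

```python
import hashlib
from typing import List


def candidate_note_paths(external_id: str, max_levels: int = 3) -> List[str]:
    """Return possible note paths for an external ID, least nested first.

    Assumption: the note is named after the SHA-1 of the external ID,
    and fanout splits the hex digest into two-character directories.
    """
    digest = hashlib.sha1(external_id.encode("utf-8")).hexdigest()
    paths = []
    for levels in range(max_levels + 1):
        # Take `levels` two-character directory names, then the remainder.
        parts = [digest[2 * i:2 * i + 2] for i in range(levels)]
        parts.append(digest[2 * levels:])
        paths.append("/".join(parts))
    return paths


if __name__ == "__main__":
    # Hypothetical external ID, for illustration only.
    for path in candidate_note_paths("username:example"):
        print(path)
```

Each printed path is one place to look in a checkout of refs/meta/external-ids; searching the tree with git grep sidesteps the guessing, which is why it is the easier route when the exact external ID string is not known.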
20:05 <clarkb> but I think we basically sync up review02, fetch refs/meta/external-ids, edit them, push back with gerrit off, reindex accounts and groups, then can start review and have efoley test login there if we like
20:05 <clarkb> ianw: if you want to work through that with me today I'm game :)
20:05 <clarkb> reenqueue complete
20:06 <ianw> ok, sure.  let me first pull the latest gerrit images and run an rsync on review02
20:06 <ianw> i have a root screen open
20:06 <clarkb> ianw: sure no rush either I need to eat some lunch and I expect you need some breakfast
20:06 <clarkb> we can sync back up in an hour or two?
20:07 <ianw> ++ yep
20:07 <clarkb> fyi I have put https://review.opendev.org/c/zuul/zuul/+/800683 and https://review.opendev.org/c/zuul/zuul/+/800684 back in zuul's check queue to double check depends-on behavior
20:07 <clarkb> They should both fail the pep8/linters jobs
20:07 <clarkb> if 684 passes then we may still have depends on problems
20:08 <corvus> #status log restarted all of zuul on commit f9bfac09dd47e7065cd588287706b6965baaae37 to fix depends-on error and pick up result event handler fix
20:08 <opendevstatus> corvus: finished logging
20:09 <clarkb> oh thanks
20:11 <clarkb> Successful failure "2021-07-13 20:11:20.541703 | ubuntu-focal | ERROR:   linters: commands failed" now I will eat lunch
21:10 <clarkb> ianw: added a note to the etherpad about openid logins since you had a question about that
21:12 <clarkb> I guess let me know when you want to do review02 things and I'll context switch. Just going to work on reviews now
21:28 <ianw> clarkb: yeah, i think we should pre-figure out the ssh version to merge the change, seems safest
21:29 <ianw> happy to poke at the user updates
21:30 <clarkb> wfm. I think you escalate your privs to bootstrappers, then apply the +2 Verified, +2 Code-Review and +1 Workflow. Then push the submit button. All of that should be doable via ssh and I agree sorting out the commands first is a good idea
21:30 <clarkb> ianw: ok, I guess we want to start with a sync and an image update?
21:30 <clarkb> or have those been done?
21:30 <ianw> i have done that in a root screen running on review02
21:30 <ianw> sent 6,036,696 bytes  received 13,159,906,090 bytes  17,636,895.90 bytes/sec
21:30 <ianw> total size is 15,174,572,983  speedup is 1.15
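
As a quick sanity check on the rsync summary ianw pasted: rsync reports speedup as the total file size divided by the bytes actually transferred (sent plus received). The figures below are copied from the two lines above; the variable names are illustrative, and the elapsed-time estimate assumes the reported rate covers traffic in both directions.

```python
# Figures copied from the rsync summary in the log.
sent = 6_036_696
received = 13_159_906_090
total_size = 15_174_572_983
rate = 17_636_895.90  # bytes/sec as reported by rsync

transferred = sent + received

# Speedup: how much larger the synced tree is than what crossed the wire.
speedup = total_size / transferred

# Rough wall-clock estimate, assuming rate = transferred / elapsed.
elapsed_seconds = transferred / rate

print(f"transferred: {transferred:,} bytes")
print(f"speedup: {speedup:.2f}")
print(f"estimated elapsed: {elapsed_seconds / 60:.1f} minutes")
```

A speedup this close to 1 means nearly the whole tree was copied on this pass rather than skipped as already up to date.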
21:31 <clarkb> ok cool. let me attach to that and we can look at All-Users.
21:31 <clarkb> ianw: you ok with me driving in the screen a bit?
21:32 <ianw> yep, go for it
21:32 <clarkb> I've started a new buffer (not sure if it switches to new buffers for all attached or not)
21:34 <ianw> so is 64/ or 74/ the incorrect one?  or does it not matter
21:37 <clarkb> ianw: one is the old legacy openid that launchpad used and the other is the new current ubuntu one url
21:38 <clarkb> 64/ is gone now and can be ignored. 74/ is the one we need to update. This was actually the clue I used to sort out how this happened
21:38 <ianw> ah right, launchpad.net v ubuntu.com
21:38 <clarkb> if gerrit updated the external ids in place then I would've expected 64/ to remain but it was gone. That implied to me that it had actually been deleted and that seems to have been the case
21:38 <clarkb> oops it seems you were in scrollback mode? I think I broke that? let me know when I can proceed
21:39 <ianw> ahh go ahead was just checking out the original commands
21:41 *** dviroel is now known as dviroel|out
21:44 <clarkb> ianw: that git push command look right to you?
21:45 <ianw> yep, so we've checked out the refs/meta/external-ids branch, updated the id in question and now push it back
21:47 <clarkb> ianw: I think we can reindex now as that should be all that is necessary?
21:48 <clarkb> then if we start gerrit efoley should be able to attempt to login ? may need to edit /etc/hosts on efoley's side but ya
21:48 <ianw> agree, there should be a command to paste in the etherpad
21:49 <clarkb> I'll do that in another buffer as it needs root and will do lots of scrolling
21:49 <clarkb> ianw: that looks good to you?
21:50 <ianw> yep.  i think i played with the threads arg and it didn't make any difference
21:50 <ianw> to overall time
21:51 <clarkb> it is running now
21:52 <ianw> so in summary we found <user> via grep in All-Users, giving us XX/<sha> for their settings.  then checked out external-refs and edited the note for XX/<sha> to reflect the new address/old userid.  push that back and re-index
21:53 <clarkb> ianw: close, we fetched refs/meta/external-ids then checked out FETCH_HEAD. Then found them via grep
21:54 <clarkb> but then yes, find the external ids for the user. Grepping by accountId is the easiest way to do that. But you can hash the external id directly too. Let me try that in the git repo buffer
21:54 <clarkb> as an example
21:55 <clarkb> er -e isn't the flag is it
21:56 <clarkb> ianw: does what I put in the screen buffer help?
21:56 <ianw> yeah i think so
21:57 <ianw> so gerrit gets back the login.ubuntu.com URL, hashes it, looks it up in external-ids, then essentially logs you in as the uid listed in there
21:57 <clarkb> ianw: what I found difficult about the hash method is often you don't actually know what the external id value is. but if you start with accountId you can grep that directly. The hash method is more important if you are adding a new entry rather than modifying
21:57 <clarkb> ianw: yup
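
The lookup ianw summarizes, and the fix applied earlier, can be modelled as a toy in-memory sketch. This is not Gerrit's code: the dict stands in for the notes on refs/meta/external-ids, keying by SHA-1 of the external ID is an assumption taken from Gerrit's NoteDb documentation, and the URL and account numbers are made up for illustration.

```python
import hashlib
from typing import Dict, Optional


def note_key(external_id: str) -> str:
    """Hash an external ID the way the notes are assumed to be named."""
    return hashlib.sha1(external_id.encode("utf-8")).hexdigest()


def resolve_account(notes: Dict[str, int], claimed_id: str) -> Optional[int]:
    """Return the account an OpenID identity maps to, or None if unknown."""
    return notes.get(note_key(claimed_id))


def repoint(notes: Dict[str, int], external_id: str, account_id: int) -> None:
    """Edit the existing note so the identity maps to a different account."""
    notes[note_key(external_id)] = account_id


if __name__ == "__main__":
    openid = "https://login.ubuntu.com/+id/EXAMPLE"  # hypothetical identity
    old_account, new_account = 1001, 2002            # hypothetical account ids

    # State after the bug: the identity points at the unwanted new account.
    notes = {note_key(openid): new_account}
    print(resolve_account(notes, openid))

    # The manual fix: point the identity back at the original account.
    repoint(notes, openid, old_account)
    print(resolve_account(notes, openid))
```

Starting from the account id and scanning the values, as git grep does against the real notes, avoids needing the exact identity string; hashing is only required when a new entry has to be created.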
21:59 <clarkb> ianw: it's really simple in implementation but then the details make it super clunky/annoying as we've found :(
22:02 <clarkb> ianw: efoley is in a European timezone aiui. Maybe you can coordinate at the end of your day or I can coordinate at the beginning of mine tomorrow to test it
22:02 <clarkb> I think we want efoley to login and verify the account id shown on the settings page is the one we want (the one we switched the external id to) then also maybe double check the external id contents to make sure gerrit didn't edit them in an unexpected way
22:03 <ianw> ok, i can send a mail
22:03 <clarkb> and that can't happen until we've finished reindexing and are happy with gerrit starting up. But I expect that will all go fine
22:04 <clarkb> then sunday we just repeat the steps we took (because the sync will overwrite us) and we should be good
22:11 <ianw> does the URL returned https://login.ubuntu.com/+id/XXX have a name?  is it your handle?
22:12 <clarkb> ianw: it is your openid id
22:12 <clarkb> I think in the protocol fields it is called openid.identity
22:14 <clarkb> sorry I think I broke your scroll buffer in screen again
22:14 <clarkb> I was double checking the new account didn't have any more external ids. It doesn't appear to
22:19 <clarkb> ianw: reindex is done. Have you been starting it up after this point to check on it?
22:20 <ianw> clarkb: yep
22:20 <clarkb> ianw: did you want to do that now?
22:20 <ianw> what's with the mailto: external-ids?  is that from authenticating email addresses?
22:20 <ianw> sure, i can
22:20 <clarkb> ianw: those are what you get when you add additional email addresses to your accounts in gerrit
22:20 <clarkb> you can go into the gerrit settings and add a bunch of those. And ya gerrit is supposed to authenticate those as well
22:21 <ianw> real    28m2.880s
22:21 <ianw> that's a lot faster than i saw before, i think i was recording 50+ minutes
22:21 <clarkb> ianw: it may go faster if you have an existing index it can build on
22:21 <ianw> ahh, indeed, it was probably totally fresh
22:22 <ianw> i just docker-compose up -d'd and it should be coming up
22:23 <clarkb> yup I see it now. Have you tried logging in to see what the behavior is? I suspect you are right that we need to set a local /etc/hosts entry to point at review02 for this server name (because our config says the canonical name is review.opendev.org)
22:23 <ianw> you have to make sure to hit httpS://review02.opendev.org or you get redirected, but it's up for me
22:24 <clarkb> excellent. We should be able to test things with efoley then. Thanks for the help on this
22:24 <ianw> clarkb: yeah, i have tried but it was a while ago and iirc i overrode hosts
22:24 <ianw> i'm just finishing up the notes on account update in the etherpad
22:29 <clarkb> ianw: you can even rebase my existing change if you want I think
22:29 <clarkb> and just push it again
22:29 <clarkb> but redoing it is also easy
22:30 <clarkb> I should be around for that part of the move too so I can help with bits like this
22:31 <ianw> i just got to school run, but then i'll sort out the dns steps
22:31 <ianw> then i'll tidy up the checklist and we can do a final review of the steps
22:31 <clarkb> sounds good. I'm going to context switch back to zuul reviews after a short break. Let me know if I can help with anything else
22:32 <ianw> did we lose the gerrit bot?
22:35 <clarkb> ianw: the irc server I'm talking to says it is still here
22:35 <ianw> yeah, it seems very quiet though
22:41 <clarkb> I've detached from the screen on review02 in case you decide you don't want it anymore
23:27 <clarkb> infra-root how does https://etherpad.opendev.org/p/BTdvOeY2lEiWtR2kQUHg for notifying people not to delete emails from their accounts?
23:27 <clarkb> *how does that look
23:28 <clarkb> I know that won't get in front of everyone but it seems better than nothing
23:35 <ianw> clarkb: the bit about the reason we can't fix it easily is maybe a little in-depth for a general announcement
23:37 <clarkb> ianw: fair. I wanted to avoid a bunch of why can't you just fix it questions but we can do that in followups
23:39 <clarkb> ianw: rewritten to be a bit less detail heavy
23:42 <clarkb> I can send that out tomorrow morning when it is more likely to get eyeballs
23:44 <ianw> looks good.  one other thing to call out might be openid == ubuntu just so people are clear
23:44 <clarkb> good idea
23:46 <opendevreview> Ian Wienand proposed opendev/zone-opendev.org master: Update review.opendev.org to review02.opendev.org  https://review.opendev.org/c/opendev/zone-opendev.org/+/798244
23:47 <corvus> clarkb: are we giving up on service-announce as a thing?  i ask because your etherpad is "to: all the lists"
23:48 <clarkb> corvus: In this situation I think it is warranted because almost no one is subscribed to the announce list as far as I can tell. If people ignore the announce list for things like downtimes I don't really care that much
23:49 <clarkb> but in this case we are actually trying to get as many people to read this as possible and service-announce doesn't seem to do that?
23:49 <corvus> clarkb: if it's not useful for sending out announcements related to the service, we should probably shut it down then.  that's unfortunate.
23:49 <clarkb> ianw: https://review.opendev.org/c/opendev/zone-opendev.org/+/798244 doesn't seem to actually update the cname fwiw
23:49 <corvus> clarkb: tbh, i would prefer not to have stuff like that go to zuul-discuss
23:50 <corvus> so it's unfortunate that it isn't working
23:50 <clarkb> corvus: well I think there is a difference in what we are trying to achieve here. In the case of say planned maintenance it's not my problem you aren't subscribed to the announce list and that is what you should read. But if you want as broad a reach as possible to impact user behavior then sending it to as many lists as possible makes sense to me
23:51 <opendevreview> Ghanshyam proposed openstack/project-config master: Use publish-to-pypi-stable-only template for deprecated repo  https://review.opendev.org/c/openstack/project-config/+/800558
23:51 <clarkb> I agree more people should be subscribed to that list
23:51 <corvus> clarkb: i guess i understood anything going to service-announce as being relevant to users of the service.  i think that means downtime notifications and changes (or alerts) to the service.
23:51 <corvus> i get that the fact that no one is subscribed to the list makes it unsuitable for that
23:52 <corvus> no argument there
23:52 <clarkb> I think I'm arguing it is still suitable for that in the case where it is on the user if they don't read it. But in this case gerrit is causing an incredibly broken unexpected behavior with no out for the user.
23:53 <corvus> but i guess i'm saying we should either make it be so, or discontinue it.  because i'm not sure a downtime announcement is any less important than this.  also, it looks like ianw has been sending downtime announcements to "all the lists" as well.
23:53 <clarkb> and it makes sense for us to try and communicate that as broadly as possible
23:54 <clarkb> If I send that email to the announce list it won't help any users avoid the gerrit bug. Our responses can be a) sorry you get a new account because you didn't read email b) wait for $time until we can take downtime to fix it or c) wait until we fix the db and can fix it online. None of those seem great and preventing it in the first place
23:54 <clarkb> vs it's too bad you didn't account for an hour of downtime
23:55 <clarkb> for that reason to me they are different, but I agree it is unfortunate that people aren't subscribing to that list
23:55 <clarkb> corvus: if you don't want it sent to the zuul-discuss list how do you suggest I get the message in front of zuul developers?
23:56 <clarkb> corvus: do we need to do a separate push to get people subscribed to the announce list?
23:56 <corvus> clarkb: for me, i see "this will affect your work" for both things, and both are equally important.  it's okay, i don't think it's important that we convince each other.  :)
23:56 <clarkb> and then take the hard position of "sorry you get a new account" if the message is missed
23:56 <clarkb> corvus: well I think one creates work for me and the other doesn't
23:57 <clarkb> it's not my problem if a developer loses an hour to planned outages they didn't read for. But fixing the convoluted notedb has been my problem for months :/
23:57 <clarkb> and I'm slowly making progress but it isn't fun or interesting or rewarding
23:57 <corvus> clarkb: i'm sorry, i'm not expressing myself well.  i'm not trying to create work for you.  i'm trying to say they are both important.  and every developer needs to see both
23:57 <corvus> clarkb: let's put it another way: you see two classes of severity, low and high.  i only see high.  :)
23:57 <clarkb> got it
23:58 <clarkb> In that case I think it is worth reevaluating if we can convince people to join the announce list
23:58 <clarkb> and if not consider alternatives
