clarkb | counting off time to index 200 changes it does seem to be slowly getting quicker | 00:05 |
---|---|---|
clarkb | but that might not be a wide enough sample to check | 00:05 |
clarkb | ~now is when we expected it to be done. It is not done if anyone is wondering. Still slow but maybe slowly getting quicker. I'll keep an eye on it | 00:34 |
clarkb | fungi: corvus: I'll aim to be back around about 15:00 tomorrow as well | 00:34 |
clarkb | but we'll see how I do | 00:34 |
clarkb | ~10k changes in ~17 minutes | 00:37 |
clarkb | not great | 00:37 |
clarkb | but also watching it like this may not be great for my health. I'm gonna take a break | 00:37 |
clarkb | I've discovered that there may actually have been a flag to tell the migrator to not reindex. That would have allowed us to do the gc'ing first then manually reindex. But at this point sticking to what we've tested is our best bet I think even if it takes all night | 01:29 |
corvus | ++ | 01:29 |
corvus | plan the dive and dive the plan | 01:29 |
clarkb | are you mordred now? | 01:29 |
corvus | i, um, used to have a long daily commute by train and read pulp adventure novels | 01:30 |
clarkb | ha | 01:30 |
clarkb | for anyone following along I don't really expect this to finish before I go to bed so that I can kick off the gc | 01:30 |
clarkb | I'll still check on it, but probably try and return tomorrow at 15:00 UTC. Assuming it exits 0 I think fungi you can probably go ahead and start the gc? but wait on others before doing the next steps. Or if you'd prefer to wait for me to be awake I'm cool with that too | 01:31 |
corvus | clarkb: it's probably going to be fungi that hits the button; but in case i (or someone else) happens to be around first... it's .... | 01:32 |
corvus | sorry what step? | 01:32 |
clarkb | currently 4.3: time find /home/gerrit2/review_site/git/ -type d -name "*.git" -print0 | xargs -t -0 -P 16 -n 1 -IGITDIR sudo -H -u gerrit2 git --git-dir="GITDIR" gc --aggressive | 01:32 |
clarkb | please run echo $? when this current command finishes so we can confirm it exits 0 | 01:33 |
clarkb | during testing we discovered that gerrit commands don't always tell you they have errored when they error :? | 01:33 |
clarkb | so it's echo $? then if 0 step 4.3 from a couple lines above | 01:33 |
corvus | clarkb: so 4.1 (migrate-to-notedb) that's running now; then 4.2 when that finishes, and if it's zero and nothing seems to be on fire, 4.3 (gc). right? | 01:34 |
clarkb | correct | 01:34 |
corvus | clarkb: can i 'strikethrough' the steps done on the etherpad? | 01:34 |
clarkb | corvus: yes I think that is fine | 01:34 |
corvus | done (and i bolded 4.1) | 01:35 |
corvus | clarkb: have a good evening! | 01:35 |
clarkb | I'll try! :) dinner then the mandalorian I hope | 01:36 |
fungi | i just caught up, had two episodes to get through | 01:50 |
fungi | and yeah, this looks like it's taking a while | 01:50 |
fungi | i'm planning to fire off the git gc when i wake up, assuming the reindex is even done by then | 01:51 |
*** hamalq has quit IRC | 02:59 | |
ianw | Reindexing changes: project-slices: 29% (785/2697), 30% (235273/760363) (-) fyi | 03:47 |
clarkb | just crossed 300k | 04:47 |
clarkb | also I've learned that one of the things the wikimedia changes do is shuffle the project "slices". They are supposed to be broken down into smaller chunks to prevent a single repo like nova from dominating the cost | 04:54 |
clarkb | however, that element of randomness may explain why we see times that vary so much ? at least contribute to it | 04:55 |
clarkb | I haven't done as much testing as wikimedia did, but I would be really surprised if it is faster to skip around like that. it seems like you want to keep things warm in the cache | 04:56 |
clarkb | eg do all of nova, then do all of neutron and so on | 04:56 |
clarkb | "It does mean that reindexing after invalidating the DiffSummary cache will be expensive" another tidbit from the source (I wonder if we're in that situation perhaps induced by the notedb migration?) | 05:01 |
clarkb | oh neat they also split up slices based on changeid/number not actual ref count | 05:09 |
clarkb | so if you've got lots of changes with lots of refs (patchsets) in certain projects those won't be balanced well | 05:09 |
clarkb | they also use mod to split them up so change 1 and 2 go in different slices and 101 and 102 go in different slices if modding by 2. When you probably want them to be in the same slice due to git tree state cache warmth? Anyway that's probably enough java for me tonight. There is likely quite a bit of room for improvement in the reindexer to be more deterministic and less reliant on luck | 05:11 |
clarkb | oh and when we tested we would typically start gerrit at 2.16 and maybe that populates the DiffSummary caches? We didn't want to do that this time because to interact with it we'd have to drop our web notice. It would be funny if not starting on 2.16 without notedb was the problem | 05:17 |
mnaser | o/ is there an etherpad with the steps that are occurring and what was done / left to do for those curious people who want to watch from the sidelines ? | 05:19 |
mnaser | (aka me) | 05:19 |
clarkb | mnaser: https://etherpad.opendev.org/p/opendev-gerrit-3.2-upgrade-plan the bolded item is the one we're on | 05:19 |
clarkb | mnaser: we are currently doing the last part of the notedb migration which is a full reindex (which is going slower than expected but we also planned for this long task to happen during the between days period) | 05:20 |
clarkb | when this is done we git gc all the repos to pack up the notedb contents (makes things faster), then upgrade to 3.0, 3.1, 3.2 and reindex again | 05:21 |
mnaser | Cool! So it sounds like the major migration is done | 05:22 |
clarkb | the actual data migration part is ya. Now it's a bunch of housekeeping around that (reindex and gc) | 05:22 |
mnaser | I’d argue that the actual migration into notedb is the trickier bit, indexing is indexing | 05:23 |
mnaser | Awesome | 05:23 |
mnaser | So I assume from now on, Gerrit will no longer use a database server | 05:24 |
mnaser | It will be using purely notedb I guess? | 05:24 |
clarkb | unfortunately that is a bad assumption :P | 05:24 |
clarkb | the accountPatchReviewDb remains in mysql | 05:24 |
clarkb | its the single table database that tracks when you have reviewed a file | 05:24 |
clarkb | but ya one of the changes I have proposed and WIP'd is one to remove the main db configuration from the gerrit config | 05:25 |
clarkb | we'll actually do that cleanup after we're settled on the new version as its ok to have the old db config in place. gerrit 3.2 will just ignore it | 05:25 |
mnaser | Oh I see | 05:26 |
mnaser | So in a way however the database is not that important, you’d just lose track of what patches you reviewed if that db is lost? | 05:26 |
clarkb | what files you have reviewed | 05:26 |
clarkb | the change votes are in notedb | 05:26 |
clarkb | you know when you look at a file diff and it gives you a checkmark on that file? | 05:27 |
mnaser | oh yes | 05:27 |
clarkb | thats all that database is doing is tracking those checkmarks next to files for you | 05:27 |
clarkb | and ya its not super critical | 05:27 |
clarkb | replication to gitea will also take a bit once this is all done as all that notedb state will be replicated for changes | 05:28 |
mnaser | and I guess in terms of scale there’s a few other deployments who have ran at our scale or even bigger :p | 05:28 |
mnaser | oh ouch, that will add a lot of additional data that is replicated across every gitea system | 05:29 |
clarkb | ya I haven't checked recently. I think gerrithub may be similar? But they didn't really exist until notedb was a thing? I may misremember that. I know they were a driving force for it because it meant they could store stuff in github iirc | 05:29 |
clarkb | mnaser: ya the problem is refs/changes/12345/45/meta is where it goes | 05:29 |
clarkb | so you can't replicate the patchsets without the notedb content (since git ref spec doesn't allow you to exclude things like that as far as I can tell) | 05:30 |
clarkb | I don't expect it will cause many issues once we get the initial sync done | 05:30 |
clarkb | that will just take some time (in testing it was like 1.5 days) | 05:30 |
mnaser | Looks like gerrithub is in the 500000s of changes | 05:30 |
mnaser | And I think we’re in the 700k’s | 05:30 |
clarkb | 760363 | 05:31 |
clarkb | we're watching a slow count up to that number on the reindex right now | 05:31 |
mnaser | Doesnt Google have a big installation too? | 05:31 |
clarkb | there is the gerrit gerrit, chrome, and android | 05:32 |
clarkb | however, google doesn't really run gerrit | 05:32 |
clarkb | they use dependency injection to replace a bunch of stuff aiui | 05:32 |
clarkb | so that it ties into their proprietary internal distributed filesystems and databases and indexers etc | 05:32 |
mnaser | The chrome one is at 2.5m wow heh | 05:33 |
mnaser | Oh I see so they’re probably not running notedb | 05:33 |
clarkb | we discovered this the hard way when we did an upgrade once and jgit just didn't work | 05:33 |
clarkb | it turned out that jgit was fine talking to their filesystem/storage/whatever it was but not to a posix fs | 05:33 |
clarkb | and so no one caught it until an open source deployment upgraded | 05:34 |
clarkb | (us) | 05:34 |
mnaser | ouch | 05:34 |
corvus | i think they're using notedb, but the git data store isn't what mere mortals use | 06:25 |
corvus | Reindexing changes: project-slices: 49% (1345/2697), 51% (390766/760363) (/) | 06:25 |
corvus | that's a timestamped progress status before i go to bed | 06:26 |
ianw | Reindexing changes: project-slices: 74% (2021/2697), 77% (587125/760363) (-) | 10:21 |
ianw | 25% in ~ 4 hours | 10:21 |
ianw | that puts it at about 14:00UTC to finish | 10:22 |
fungi | yeah, awake again and it's claiming around 88% complete now | 12:21 |
fungi | Reindexing changes: project-slices: 87% (2373/2697), 89% (679414/760363) | 12:27 |
fungi | 99%! | 13:44 |
fungi | once this wraps up, assuming it looks good, i'll start the git gc and then i need to run out to the hardware store to pick up an order for some tools | 13:46 |
fungi | 1086m41.925s | 14:11 |
fungi | that's 18h6m42s | 14:12 |
fungi | exited 0 | 14:12 |
fungi | i've pulled the gc command back up and will start it momentarily | 14:13 |
fungi | just need to switch computers to double-check our notes | 14:13 |
fungi | okay, looking good and i've updated our notes to indicate which step we're on, gc is running now | 14:18 |
clarkb | thanks. I'm very slowly waking up but maybe I can take it easy for another hour or two now | 14:19 |
fungi | estimated time to completion is 1.25 hours so hopefully done before 16:00 | 14:19 |
clarkb | the previous gc times were fairly accurate if a few minutes fast iirc | 14:20 |
fungi | other than the final offline reindex, all the other steps should go quickly | 14:20 |
fungi | at least up until we start gerrit again, and then there's the replication which will probably take ages | 14:21 |
fungi | and the long tail of fixing things which are broken (some of which we know about, some of which we likely don't yet) | 14:21 |
fungi | anyway, not seeing any obvious errors stream by, so i'll take this opportunity to go pick up my order and be back in plenty of time for the rest of the upgrade | 14:22 |
clarkb | thanks again | 14:22 |
corvus | o/ | 14:25 |
fungi | okay, i'm back. if the gc finishes at 1.25 hours then that'll be ~12 minutes from now | 15:21 |
clarkb | judging by the cinder runtime when I checked about 5 minutes ago I think it will be longer but not significantly so. All the expensive repos seem to be processing at this point | 15:22 |
clarkb | nova, cinder, horizon, manuals | 15:24 |
clarkb | oh and neutron | 15:24 |
clarkb | nova is the only one running now | 15:33 |
fungi | 80m54.544s | 15:39 |
fungi | exited 0 | 15:40 |
clarkb | about 6 minutes longer than estimated, much better. | 15:40 |
fungi | okay, ready for the next pull? | 15:41 |
clarkb | yes that looks good to me | 15:41 |
fungi | opendevorg/gerrit 3.0 fbd02764262c 46 hours ago 679MB | 15:41 |
clarkb | that looks about right | 15:42 |
clarkb | if you're ready to run the init I am | 15:42 |
fungi | running | 15:42 |
fungi | in testing this was near instantaneous | 15:42 |
fungi | 0m12.344s and exited 0 | 15:43 |
fungi | no error messages | 15:43 |
fungi | ready for me to work on 3.1 or want to check anything? | 15:43 |
clarkb | I don't think there is anything other than the exit code to check | 15:44 |
clarkb | lets do 3.1. This init doesn't do any schema updates | 15:44 |
fungi | and pulling | 15:44 |
fungi | opendevorg/gerrit 3.1 eae7770f89d6 46 hours ago 681MB | 15:45 |
clarkb | lgtm | 15:45 |
fungi | ready to init with 3.1? | 15:45 |
clarkb | I think so. Can't think of anything else to check first | 15:45 |
fungi | underway | 15:45 |
clarkb | and done | 15:45 |
fungi | 0m11.280s | 15:46 |
fungi | exited 0 | 15:46 |
fungi | ready to pull 3.2? | 15:46 |
clarkb | yup | 15:46 |
fungi | opendevorg/gerrit 3.2 6fdfe303e8df 46 hours ago 681MB | 15:47 |
clarkb | that image lgtm | 15:47 |
clarkb | I think we can do the reindex | 15:47 |
fungi | running | 15:47 |
clarkb | er no sorry I keep getting ahead of myself | 15:47 |
clarkb | the init | 15:47 |
fungi | yeah, the init is what i'm running, sorry | 15:47 |
clarkb | the command you have queued looks right :) | 15:47 |
fungi | okay, running now | 15:47 |
fungi | 0m13.628s and exited 0 | 15:48 |
fungi | *now* it's time to reindex | 15:48 |
clarkb | yup and the command you have up for that lgtm | 15:48 |
fungi | okay, starting it now | 15:48 |
fungi | eta 41 minutes | 15:49 |
fungi | then we start gerrit and begin unwinding things | 15:49 |
clarkb | or 18 hours :/ | 15:50 |
fungi | yeah, ugh | 15:50 |
fungi | well, we're already at 1% done so hopefully not 18 hours | 15:51 |
clarkb | ya this is going much quicker just counting off progress at 20 second intervals | 15:52 |
clarkb | we were doing about 200 changes per 20 second interval last night. This just did like 4k | 15:52 |
clarkb | I think the gc'ing helps tremendously | 15:52 |
clarkb | for the unwinding it would be good for others to maybe look over what I've written down again and just sanity check it. I think my biggest concern at this point is any interaction between our ci/cd and gitea replication lag | 15:53 |
clarkb | I believe in cd we pull from gerrit and not gitea so that isn't an issue but I've got us explicitly replicating our infra repos first to mitigate that | 15:54 |
clarkb | as another sanity check our disk utilization has gone up about 5GB since the gc which is what we expected based on testing | 15:55 |
clarkb | 93GB -> 98GB on that fs | 15:55 |
clarkb | the unpacked state was about 110GB iirc | 15:55 |
clarkb | already up to 10% much much quicker this time | 15:57 |
clarkb | those exceptions in the screen scrollback are expected (small number of corrupted changes) | 15:59 |
corvus | gerrit needs its coffee | 15:59 |
corvus | i'm estimating ~18:00 for completion of this step | 16:00 |
corvus | oh the rate seems to have just significantly improved | 16:00 |
corvus | and my math was wrong | 16:01 |
clarkb | corvus needs coffee too? | 16:01 |
corvus | maybe ~17:00? | 16:02 |
clarkb | ya about another hour by my math | 16:02 |
clarkb | it took ~14 minutes to get to 20% so another 4 blocks of 14 minutes | 16:03 |
corvus | i have to run an errand; i probably won't be back until after this completes, but i'll check in when i get back and see if there's unexpected issues i can help with | 16:05 |
fungi | thanks! | 16:17 |
clarkb | it is up to 61% now | 16:26 |
clarkb | I guess the trick with the notedb migration would've been to somehow stop that process prior to reindexing, then garbage collect, then reindex manually. Reading the code there is a --reindex flag but it isn't clear to me if you can negate that somehow. Anyway we shouldn't need to do this again so not worth thinking about too much anymore | 16:27 |
clarkb | fungi: not to get ahead of myself, but do you think we should block port 29418 and leave the apache message in place when we first start gerrit? then check that logs indicate it is happy before opening things up? | 16:28 |
clarkb | I did have us starting gerrit before updating apache to check logs but realize that port 29418 would still be accessible | 16:28 |
fungi | yeah, wouldn't hurt to temporarily remove public access to that port initially, but obviously we shouldn't start up anything which would need access either (like zuul) | 16:37 |
fungi | i can edit the firewall rules temporarily now to do that. i'll use a second window in that screen session | 16:38 |
clarkb | ya there are a number of things I think we should do before starting zuul in the etherpad | 16:40 |
fungi | and done | 16:40 |
clarkb | thanks | 16:40 |
clarkb | I'm putting together a list of scripts to update to use the 3.2 image on review.o.o now since it occurred to me that we run manage-project type things periodically iirc | 16:41 |
fungi | iptables -nL and ip6tables -nL now report no allow rule for 29418 | 16:41 |
clarkb | and we don't want them to use the old image (it's actually probably ok for them to use the old image since it's the same version of jeepyb but I don't want to count on that) | 16:41 |
fungi | (i left the overflow reject rules for 29418 in there for now) | 16:41 |
fungi | 90% | 16:42 |
clarkb | docker-compose.yaml, /usr/local/bin/manage-projects, /usr/local/bin/track-upstream seem to be the files using that variable when I grep in system-config | 16:43 |
clarkb | docker-compose is already edited but we should update the other two before starting zuul (I've made a note in the etherpad too) | 16:43 |
clarkb | done in 59 minutes | 16:48 |
fungi | 59m13.719s exited 0 | 16:48 |
fungi | yep | 16:48 |
fungi | okay, and 29418 is currently blocked so in theory we can start gerrit and check its service logs for obvious signs of distress | 16:48 |
clarkb | yup I think that is our next step | 16:49 |
clarkb | docker-compose up -d | 16:49 |
fungi | ready? | 16:49 |
clarkb | I guess so | 16:49 |
clarkb | Gerrit Code Review 3.2.5-1-g49f2331755-dirty ready | 16:50 |
clarkb | that plugin manager exception is expected. I believe it is because we don't enable the plugin manager in our config but have the plugin installed | 16:50 |
fungi | something to add to the to do list to remove or enable i guess | 16:50 |
clarkb | ya | 16:50 |
clarkb | before we open things up I should add my gerrit admin ssh key. But I think you've had more experience with doing those things so maybe you want to do the force submit of the change if it still looks good to you as well as kick off replication for system-config and project-config? | 16:51 |
clarkb | we want to force merge first then replicate I think | 16:51 |
clarkb | also before we go further let me reread the etherpad notes :) | 16:51 |
fungi | are you going to be able to do those things without 29418 open? | 16:52 |
clarkb | no I'm saying lets just be ready for that when we open it | 16:52 |
fungi | oh, sure | 16:52 |
clarkb | before we open things though why don't we fix /usr/local/bin/manage-projects and /usr/local/bin/track-upstream ? | 16:52 |
clarkb | we need to change the image version in those scripts to 3.2 | 16:53 |
fungi | once 29418 is open i can add your openid account to project bootstrappers temporarily so you can add verify +2 and call submit | 16:53 |
clarkb | fungi: do you want to do the script fix in the screen or should I just do them off screen then you can confirm on screen? | 16:53 |
fungi | do we have a change to update /usr/local/bin/manage-projects and /usr/local/bin/track-upstream already? | 16:53 |
fungi | they're not going to get called until we reenable the crontabs | 16:54 |
clarkb | fungi: yes, the change which we force merge sets gerrit_container_image in ansible vars and that is used in docker-compose and the two scripts | 16:54 |
fungi | ahh, okay | 16:54 |
clarkb | fungi: manage-projects is called by zuul periodically iirc | 16:54 |
clarkb | so once zuul is up it may try it | 16:54 |
fungi | well, ansible is still disabled for the server too | 16:54 |
clarkb | oh good point | 16:54 |
clarkb | well I think we should fix it anyway since its a good sanity check | 16:55 |
fungi | sure, i can edit those manually for now | 16:55 |
clarkb | my concern in particular is a race between the config management updates and the manage-project updates | 16:55 |
clarkb | I don't know that they always go in order | 16:55 |
fungi | lgty? | 16:56 |
clarkb | those edits lgtm thanks | 16:56 |
clarkb | ok give me a minute to get situated with auth things then I guess we can turn it on and force merge the config mgmt change then replicate | 16:57 |
clarkb | alright i've got keys loaded and have my totp token | 16:59 |
fungi | cool, so open 29418 first or undo the maintenance page in apache first? | 17:00 |
clarkb | I think lets undo apache first | 17:00 |
fungi | does that look correct? | 17:00 |
clarkb | yes, but we also want to remove the /p blocks too | 17:00 |
fungi | like that? | 17:01 |
clarkb | yup | 17:01 |
fungi | ready for me to reload apache2? | 17:01 |
clarkb | let me just double check zuul isn't running somehow | 17:02 |
fungi | k | 17:02 |
clarkb | ps shows no zuul processes on zuul01 | 17:02 |
clarkb | I guess we continue unless you can think of anything else | 17:02 |
fungi | nope, nothing comes to mind | 17:02 |
fungi | and it's up | 17:03 |
fungi | i get the webui | 17:03 |
fungi | signing in | 17:03 |
clarkb | I'm signed in | 17:04 |
clarkb | as my regular user. Did you want to review https://review.opendev.org/c/opendev/system-config/+/762895/1 and maybe be the one to force merge it? | 17:04 |
clarkb | doesn't look like anyone else has voted on it yet | 17:04 |
fungi | yeah, signed in as my normal user too | 17:04 |
fungi | firing up gertty | 17:05 |
fungi | seems to be syncing okay | 17:05 |
clarkb | I'm removing my WIP on that change now | 17:05 |
fungi | should remember to remind gertty users that now they need to add "auth-type: basic" to their configs | 17:06 |
fungi | worth noting you actually wanted me to review https://review.opendev.org/c/opendev/system-config/+/762895/2 not /1 | 17:09 |
fungi | took a bit to realize i was looking at an old patch there | 17:09 |
clarkb | oh sorry that's what it redirected me to from my link in the etherpad | 17:10 |
clarkb | because etherpad had the /1 too | 17:10 |
fungi | no worries, i've voted +2 on it | 17:12 |
clarkb | fungi: ok do you want to submit it or do you want me to? | 17:12 |
fungi | i can do it, just a sec | 17:12 |
clarkb | you'll need to add the +2 verified too | 17:12 |
fungi | yep | 17:12 |
fungi | and workflow +1 obviously | 17:12 |
clarkb | once that force merges I want to see if replication for system-config replicates everything or just that ref | 17:13 |
clarkb | but generally replicate system-config and project-config next I think | 17:13 |
fungi | fatal: "762895" no such change | 17:16 |
fungi | d'oh | 17:16 |
fungi | i was doing it to review-test ;) | 17:16 |
* fungi curses his command history | 17:17 | |
fungi | need to open 29418 on review.o.o for this | 17:17 |
fungi | are we good with that? | 17:17 |
clarkb | yes I am | 17:17 |
clarkb | also you still need the verified +2 (I assume your admin accounts will do that) | 17:17 |
fungi | it will | 17:18 |
clarkb | fungi: note that rules.v4 is the file now iirc | 17:18 |
clarkb | and if we missed actually blocking 29418 on ipv4 then oh well at this point :) it seems fine | 17:19 |
fungi | yeah, i'm just keeping rules consistent with it until we confirm and clean up the cruft | 17:19 |
clarkb | kk | 17:19 |
fungi | i edited all three | 17:19 |
clarkb | gotcha | 17:19 |
fungi | okay, it's merged and i've removed membership for my admin account from project bootstrappers | 17:20 |
clarkb | now lets see what is being replicated | 17:20 |
clarkb | nothing in the queue so did it only replicate that ref? /me looks at gitea | 17:21 |
clarkb | https://opendev.org/opendev/system-config/commit/2197f11a0f27da3f9bd1c009c84107dc09559f6e yes only that ref | 17:22 |
fungi | neat | 17:22 |
fungi | i suppose we need to manually trigger a full replication | 17:22 |
clarkb | what I think that means is we could not replicate anything and let it catch up over time? | 17:22 |
clarkb | ya or we manually replicate. I still think we manually replicate system-config and project-config first though | 17:22 |
fungi | i can trigger replication for system-config first | 17:23 |
clarkb | probably ripping off this bandaid is the best option to ensure we have plenty of disk on the giteas | 17:23 |
clarkb | fungi: ++ that would be great | 17:23 |
fungi | triggered | 17:23 |
clarkb | there are already 4 new changes too | 17:24 |
clarkb | hrm system-config is done replicating? that took suspiciously little time | 17:25 |
clarkb | I see the refs on gitea01 though | 17:27 |
clarkb | I wonder if part of the reason we were slow replicating in testing was network bandwidth | 17:27 |
fungi | could be... | 17:28 |
fungi | i can trigger project-config next | 17:28 |
clarkb | ++ | 17:28 |
fungi | done | 17:28 |
clarkb | this might be overoptimization: but we may also want to do nova, neutron, cinder, horizon, openstack-manuals so that we can run the gitea gc after they are done | 17:28 |
fungi | i assumed we were talking openstack/project-config not opendev/project-config in this case | 17:28 |
clarkb | since they should be the biggest repos | 17:28 |
clarkb | fungi: correct | 17:28 |
fungi | sure i can do nova next and see what happens | 17:29 |
clarkb | wow it says project-config is done | 17:29 |
fungi | tailing replication_log in the screen session is probably not useful. it lags waaaay behind because of how verbose the log is | 17:30 |
clarkb | spot checking project-config on gitea01 shows that it seems to have worked too | 17:30 |
clarkb | I see refs/changes/xy/abcxy/meta | 17:30 |
clarkb | but ya lets work through that list I posted just above, check disk usage on gitea01 and run gc on all the giteas if it looks like we expanded disk use a lot | 17:31 |
clarkb | then when we're happy with that trigger full replication then start looking at zuul I guess | 17:31 |
fungi | the replication log is really, really busy though, are you sure it's not actively replicating everything? | 17:32 |
clarkb | fungi: gerrit show-queue -w says no | 17:32 |
fungi | strange | 17:32 |
clarkb | if I start a new tail on replication_log its quiet | 17:32 |
clarkb | I think thats just screen and ssh buffering with large amounts of text? | 17:33 |
fungi | previously it was very noisy but now it seems to have quiesced, yeah | 17:33 |
fungi | okay, i'll do nova now | 17:33 |
clarkb | ++ | 17:34 |
fungi | and it should be running | 17:34 |
clarkb | I see it in the show queue | 17:34 |
fungi | yeah | 17:34 |
clarkb | I see disk use slowly increasing on gitea01 so it seems to be doing things | 17:36 |
fungi | status notice The Gerrit service on review.opendev.org is accepting connections but is still in the process of post-upgrade sanity checks and data replication, so Zuul will not see any changes uploaded or rechecked at this time; we will provide additional updates when all services are restored. | 17:40 |
fungi | something like that ^? | 17:40 |
clarkb | sounds good to me | 17:40 |
clarkb | nova replication is done according to show queue and disk use increased by about a gig so ya I think doing some of these big ones first, gc'ing then doing everything is a good idea | 17:41 |
fungi | #status notice The Gerrit service on review.opendev.org is accepting connections but is still in the process of post-upgrade sanity checks and data replication, so Zuul will not see any changes uploaded or rechecked at this time; we will provide additional updates when all services are restored. | 17:41 |
openstackstatus | fungi: sending notice | 17:41 |
-openstackstatus- NOTICE: The Gerrit service on review.opendev.org is accepting connections but is still in the process of post-upgrade sanity checks and data replication, so Zuul will not see any changes uploaded or rechecked at this time; we will provide additional updates when all services are restored. | 17:41 | |
fungi | okay, i'll do openstack-manuals next | 17:42 |
clarkb | ++ | 17:42 |
fungi | and it's running | 17:42 |
clarkb | and honestly at the rate these have gone I think we should start global replication, benchmark it, then see if we can wait a bit before starting zuul since it seems quick. If benchmarks say it will be all day then nevermind | 17:42 |
fungi | sure | 17:43 |
clarkb | since that will rule out any out of sync unexpectedness | 17:43 |
clarkb | manuals is done | 17:44 |
openstackstatus | fungi: finished sending notice | 17:44 |
fungi | neutron next? | 17:44 |
clarkb | I think you can just enqueue the others in the list and let gerrit figure out ordering | 17:44 |
clarkb | I would just tell it to do neutron cinder and horizon now | 17:45 |
fungi | yup, was just finding your original list in scrollback | 17:45 |
clarkb | that list is based on which things were slow to gc which implies more data/more refs | 17:46 |
fungi | triggered all three | 17:46 |
clarkb | horizon is done, neutron and cinder still running | 17:47 |
* mnaser is playing around gerrit right now | 17:48 | |
fungi | just be aware zuul is still offline | 17:49 |
mnaser | fungi: yep! i'm just trying to see if the gerrit functionality itself seems to be okay | 17:49 |
fungi | thanks, appreciated! | 17:49 |
mnaser | i am noticing a few things, none are critical of course, but "oh, interesting" type of things | 17:49 |
clarkb | mnaser: ya I expect a lot of that :) | 17:49 |
fungi | sure, i'm going to hate the new ui for a while i'm sure | 17:50 |
mnaser | i.e. anything except verified/code-review/workflow are under this thing called "Other labels" | 17:50 |
clarkb | polygerrit adds a bunch of new excellent features and some not so great things | 17:50 |
mnaser | so roll call votes in governance are under "Other labels" | 17:50 |
mnaser | backport candidate patches seem to be affected too, not a big deal but maybe good for us to know how it decides whats other and whats not | 17:50 |
clarkb | but where we were was a dead end so we're ripping the bandaid off and going to try and work upstream and with plugins etc to make stuff better | 17:50 |
clarkb | mnaser: have a link to a change so we can see that? | 17:50 |
mnaser | sure -- https://review.opendev.org/c/openstack/governance/+/760917 | 17:51 |
fungi | also i'm noticing that the gitweb links are broken, probably worth working on a proper link to gitea to replace those anyway | 17:51 |
mnaser | you can see rollcall-vote is under other labels, so is code-review in there (but i guess maybe that's cause code-review doesn't mean anything for merging inside openstack/governance) | 17:51 |
fungi | might be a good time to start a post-upgrade notes etherpad where we can collect lists of things which have changed people might ask about, and things we know are broken which will either be fixed or removed | 17:52 |
mnaser | yeah, i can start putting a few things in there too | 17:52 |
clarkb | ++ | 17:53 |
mnaser | some other minor things are the ordering of code review comments | 17:53 |
mnaser | it seems to be verified, code-review then workflow | 17:53 |
clarkb | I think it was that way before? | 17:53 |
clarkb | I've already forgotten | 17:53 |
mnaser | i remember you would see code-review, verified, workflow in the list | 17:53 |
mnaser | zuul always came in the middle, workflow was always at the end | 17:53 |
mnaser | (in the display of votes at least) | 17:54 |
clarkb | fungi: ok those replications are done and we're using 4gb extra disk. I'll trigger the gc cron on all of the giteas now? any other repos you think we should replicate first? | 17:54 |
* corvus checking in | 17:54 | |
clarkb | corvus: tl;dr is gerrit is up and seems ok so far. replication is much quicker than anticipated. We are manually triggering replication for "large" repos so that we can gc on the giteas to pack back down again then start global replication | 17:54 |
clarkb | after that we'll be looking at zuul | 17:55 |
fungi | i've started a pad here https://etherpad.opendev.org/p/gerrit-3.2-post-upgrade-notes | 17:55 |
corvus | ++ | 17:55 |
fungi | mnaser: ^ | 17:55 |
mnaser | fungi: cool i'll fill those out | 17:56 |
fungi | clarkb: i agree, git gc on gitea next | 17:56 |
clarkb | corvus: fungi any other repos we should manually replicate? We have done system-config project-config nova neutron cinder horizon and openstack manuals | 17:56 |
fungi | then we can do a full replication | 17:56 |
corvus | can't think of any others | 17:56 |
clarkb | fungi: k will give corvus a minute to bring up any other repos that may be worth doing that to then I can do the gitea gc'ing | 17:56 |
clarkb | cool I'll work on gitea gc'ing now | 17:56 |
fungi | just to avoid overrunning the fs with all of them at once | 17:57 |
fungi | thanks! | 17:57 |
mnaser | something i remember broke last time we did an update was all the bp topic links from specs | 17:57 |
mnaser | i just tested one and its working just fine | 17:57 |
mnaser | specifically: https://review.opendev.org/#/q/topic:bp/action-event-fault-details from https://blueprints.launchpad.net/nova/+spec/action-event-fault-details as an example | 17:57 |
mnaser | oops | 17:58 |
mnaser | i found our first broken | 17:58 |
mnaser | Directly linked changes are redirecting to an incorrect port, Example: https://review.opendev.org/712697 => Location: https://review.opendev.org:80/c/openstack/nova/+/712697/ | 17:59 |
mnaser | i added that to the etherpad | 17:59 |
mnaser | i remember fixing that inside our gerrit installation actually, let me find | 17:59 |
clarkb | that could be related to the thing fungi linked about after the bug fixing this week | 18:00 |
fungi | mnaser: that may be a known issue, at least wmf and eclipse ran into it and filed bugs | 18:00 |
mnaser | if i remember right, we did this: `listenUrl = proxy-https://*:8080/` | 18:00 |
mnaser | or maybe that was for https redirection stuff | 18:00 |
fungi | apparently we can fiddle the proxy settings in apache if it's the same issue | 18:00 |
* fungi checks notes | 18:00 | |
clarkb | all 8 giteas are gc'ing now | 18:01 |
fungi | mnaser: can you see if it looks like https://bugs.chromium.org/p/gerrit/issues/detail?id=13701 | 18:02 |
clarkb | using /c/number works fwiw | 18:02 |
clarkb | that may be an easy workaround for now if necessary | 18:02 |
fungi | if so the solution is supposedly `RequestHeader set "X-Forwarded-Proto" expr=%{REQUEST_SCHEME}` in our vhost config | 18:02 |
corvus | clarkb: well, the /# links are supposed to be "permalinks" so i don't think "use /c" is an easy solution (the problem is existing links point there) | 18:03 |
mnaser | that makes sense | 18:03 |
clarkb | corvus: yup we should fix it | 18:03 |
corvus | fungi: x-forward-proto makes sense to me | 18:03 |
mnaser | "X-Forwarded-Proto is now required because of underlying upgrade of the Jetty library, when Gerrit is accessed through an HTTP(/S) reverse-proxy." | 18:03 |
clarkb | I think I have figured out why replication timing is so much better. its because we're not replicating all the actual git content now | 18:03 |
mnaser | indeed, so yes, that does all make sense | 18:04 |
corvus | anyone writing an x-forwarded-proto change? | 18:04 |
clarkb | I'm not | 18:04 |
corvus | looks like i am :) | 18:04 |
clarkb | in fact I need to find something to drink. back shortly | 18:05 |
* mnaser keeps looking | 18:05 | |
corvus | i kind of want to ninja the fix in first just to make sure it works | 18:05 |
fungi | corvus: please feel free to hand-patch it into the config first | 18:05 |
corvus | k will do both | 18:05 |
fungi | i agree the change isn't much good if the fix turns out to be incorrect for our deployment for some reason | 18:06 |
mnaser | i'll add the "you'll need a new version of git-review" to "what's changed" | 18:06 |
mnaser | as i guess that might come up | 18:06 |
corvus | mnaser: redirect look good now? | 18:08 |
mnaser | corvus: yes! working in my browser and curl shows the right path too | 18:08 |
clarkb | yay | 18:09 |
mnaser | seems like gerritbot is not posting changes | 18:10 |
fungi | mnaser: i think it's git-review>=1.26 | 18:10 |
mnaser | i am not sure if thats cause its turned off or | 18:10 |
fungi | it probably needs to be restarted now that the event stream is accessible | 18:10 |
fungi | i'll do that now | 18:10 |
clarkb | corvus: fwiw if you're looking at the vhost I think there may be old cruft in there we should clean up. I always get lost when looking though | 18:11 |
mnaser | fungi: i see the 1.27.0 release notes have: "Update default gerrit namespace for newer gerrit. According to Gerrit documentation for 2.15.3, refs/for/’branch’ should be used when pushing changes to Gerrit instead of refs/publish/’branch’." -- is it not that change? | 18:11 |
corvus | remote: https://review.opendev.org/c/opendev/system-config/+/763577 Add X-Forwarded-Proto to gerrit apache config [NEW] | 18:11 |
fungi | gerritbot has been restarted | 18:11 |
clarkb | corvus: fungi ^ should we force merge that one too? | 18:11 |
corvus | clarkb: i see comments related to upgrade i will address them | 18:11 |
clarkb | corvus: well the upgrade things should be handled | 18:12 |
mnaser | well look at that, i can now post emojis in my changes without a 500 | 18:12 |
mnaser | :P | 18:12 |
clarkb | as part of the earlier force merge | 18:12 |
fungi | mnaser: thanks, yeah 1.27 sounds right, i was going from memory | 18:12 |
corvus | clarkb: oh, er, what do you want me to do? | 18:13 |
corvus | clarkb: i agree that the TODO lines have been removed in system-config master | 18:13 |
clarkb | I'm more thinking about what I think is old gitweb config. I don't think it needs doing now. I just mean someone that groks apache better than me should look at that vhost and audit it | 18:13 |
corvus | clarkb: i have manually removed them from the live apache config | 18:13 |
clarkb | as there may be a few cleanups we can do | 18:13 |
clarkb | corvus: thanks | 18:13 |
corvus | but they were already commented out, so that should all be a noop | 18:13 |
mnaser | is the "links" part in the gerrit change display something that is customizable by the deploy (where gitweb currently is listed?). if so, probably would be neat if we added a "zuul builds" link which went to a prefiltered zuul build search using the changeid! | 18:14 |
clarkb | the gitea gc's are still going. The cron only does one repo at a time | 18:14 |
clarkb | mnaser: you can probably write a plugin for that | 18:15 |
mnaser | ok i see, so the gitweb link comes from a plugin | 18:15 |
clarkb | mnaser: gitweb is built in but gitiles is a plugin | 18:15 |
clarkb | aiui | 18:15 |
mnaser | https://review.opendev.org/q/hashtag:%22dnm%22+(status:open%20OR%20status:merged) tags stuff working pretty neatly too | 18:15 |
fungi | but i feel like we should consider replacing that with a link to gitea anyway if we can | 18:15 |
corvus | mnaser: https://review.opendev.org/Documentation/dev-plugins.html#links-to-external-tools may be relevant? | 18:16 |
corvus | looks like we'd need to do a tiny plugin | 18:16 |
mnaser | ou, that's pretty cool and seems like it would be quite straightforward too | 18:16 |
corvus | not sure if that's the right interface to put it in the 'links' section | 18:17 |
corvus | but seems pretty close to that | 18:17 |
corvus | could incorporate that into the zuul plugin | 18:17 |
clarkb | fungi: https://review.opendev.org/c/opendev/system-config/+/763577 lgtm if you want to review that one and force merge it too? | 18:18 |
corvus | speaking of which https://gerrit.googlesource.com/plugins/zuul/ | 18:18 |
mnaser | https://review.opendev.org/c/openstack/project-config/+/763576 seems to work pretty well too for a WIP change that is accessible :) | 18:18 |
corvus | also https://gerrit.googlesource.com/plugins/zuul-status/ | 18:18 |
corvus | btw gertty has half-implemented support for hashtags | 18:18 |
corvus | i will be motivated to finish it now :) | 18:19 |
clarkb | mnaser: ya one followup we can look at doing is removing workflow -1 | 18:19 |
mnaser | it seems like i see some 3pci still reporting to cinder, so they're probably 'just fine' | 18:19 |
fungi | 763577 is merged | 18:20 |
mnaser | it looks like you can mark a change as private, which i guess can be useful | 18:20 |
clarkb | yup and gerritbot reported it | 18:20 |
clarkb | mnaser: hrm I think we should actually disable that | 18:21 |
fungi | indeed it did | 18:21 |
mnaser | yeah i remember it was disabled before | 18:21 |
fungi | well, "drafts" were disabled | 18:21 |
clarkb | mnaser: I don't want people assuming "private" is really "private" until we can check it | 18:21 |
clarkb | ya private is a newer thing iirc | 18:21 |
mnaser | i wonder if you can enable it per project too, or for specific users | 18:21 |
fungi | but gerrit removed drafts and replaced them with two features, private changes and work in progress status | 18:21 |
mnaser | would be really nice for embargo'd security changes | 18:21 |
clarkb | "Do not use private changes for making security fixes (see pitfalls below)" | 18:22 |
clarkb | no it won't be :P | 18:22 |
mnaser | aha | 18:22 |
clarkb | this is why I don't want it enabled if we can disable it | 18:22 |
clarkb | drafts was a honeypot and private will likely be too | 18:22 |
mnaser | i'll add it to "what's changed" for now | 18:22 |
clarkb | https://gerrit-review.googlesource.com/Documentation/intro-user.html#private-changes that quote is from there | 18:23 |
clarkb | we can set change.disablePrivateChanges to true | 18:23 |
fungi | yeah, it's an attractive nuisance | 18:23 |
fungi | i agree we should disable it | 18:23 |
clarkb | in gerrit.config | 18:23 |
fungi | i can write that change now | 18:23 |
clarkb | thanks | 18:23 |
mnaser | i moved my test change back to public in case that causes some issue about disabling it with a private change already there | 18:24 |
clarkb | mnaser: thanks, though I expect it's fine. usually that stuff gets enforced when you push | 18:25 |
clarkb | similar to how we disabled drafts, the old drafts were fine | 18:25 |
clarkb | giteas are about done gc'ing | 18:25 |
fungi | what change topic were we using for upgrade-related changes? | 18:25 |
mnaser | oh that's quite cool -- if you look at a change diff and click on "show blame", it shows the blame and you can click to go to the original change behind it | 18:26 |
fungi | nifty | 18:26 |
clarkb | gitea06 only has 19GB free disk. I'm going to look at that as it's much lower than the others | 18:26 |
clarkb | fungi: I was using gerrit-upgrade-prep | 18:26 |
fungi | thanks | 18:27 |
clarkb | and have a couple of wip changes there that we should land once we're properly settled | 18:27 |
clarkb | fungi: I wonder if we shouldn't manually apply that change and force merge it too | 18:27 |
fungi | it'll need a service restart | 18:28 |
clarkb | ya | 18:28 |
clarkb | probably not is the best time for those? | 18:28 |
clarkb | s/not/now/ | 18:28 |
clarkb | since we're telling people its not ready yet | 18:28 |
fungi | change is 763578 | 18:30 |
fungi | i'll hand edit the config now and restart gerrit | 18:31 |
fungi | i have the line added in the screen session if you want to double-check | 18:32 |
clarkb | screen looks correct | 18:32 |
mnaser | it looks like a change owner can set an assignee for their change | 18:32 |
clarkb | I'm still trying to sort out gitea06 disk | 18:32 |
mnaser | i'm not too sure what an assignee really .. means | 18:33 |
clarkb | the gitea web container is using 20gb of disk in /var/lib/docker/containers | 18:33 |
fungi | that may be to support workflows where reviewers are auto-assigned | 18:33 |
fungi | mnaser: ^ | 18:34 |
clarkb | which should be separate from the bind mounted stuff which is where we expect data to go | 18:34 |
fungi | okay, restarting the service | 18:34 |
corvus | mnaser: i'm also not sure who's supposed to check the "resolved" box for comments. the author or the reviewer? | 18:34 |
clarkb | I expect that if I restart gitea on 06 that will clean up after itself | 18:34 |
corvus | mnaser: we'll have some more cultural things to figure out | 18:34 |
clarkb | but maybe I should exec into it first and figure out where the disk is used | 18:34 |
mnaser | corvus: yep, as the assignee of a change seems to be a 1:1 mapping too | 18:35 |
mnaser | clarkb: i'd probably see why it ran away with so much disk space in the first place out of my curiosity :) | 18:35 |
clarkb | it is the log file | 18:36 |
clarkb | I'll compress a copy into my homedir then down up the container? | 18:37 |
fungi | wfm | 18:37 |
mnaser | hmmmmmmm | 18:37 |
mnaser | you can change your full display name inside gerrit right now | 18:37 |
clarkb | mnaser: you always could | 18:37 |
mnaser | oh, i thought you could change the formatting | 18:38 |
clarkb | some people would stick their irc nicks in there or put away messages | 18:38 |
fungi | nope, always was allowed | 18:38 |
mnaser | ah got it | 18:38 |
fungi | but now away messages are unnecessary, because... | 18:38 |
fungi | you can set your status! | 18:38 |
mnaser | indeed | 18:38 |
fungi | actually what's changed around the name is that it has a separate "display name" and "full name" | 18:39 |
fungi | you can change them both | 18:39 |
fungi | used to just be a full name | 18:39 |
mnaser | unrelated but | 18:40 |
mnaser | the static link url to the CLA is well, ancient | 18:40 |
mnaser | https://review.opendev.org/static/cla.html | 18:40 |
mnaser | "you agree that OpenStack, LLC may assign the LLC Contribution Agreement along with all its rights and obligations under the LLC Contribution License Agreement to the Project Manager." | 18:40 |
fungi | mnaser: technically still accurate | 18:40 |
mnaser | openstack, llc? :p | 18:41 |
fungi | mnaser: yep | 18:41 |
clarkb | hrm using xz because I don't want a 2GB gzip file | 18:41 |
clarkb | but this is slow | 18:41 |
clarkb | fungi: has gerrit restarted? | 18:41 |
mnaser | well IANAL but if it works, it works | 18:42 |
fungi | mnaser: section #9 contains the previous icla | 18:42 |
fungi | because lawyers | 18:42 |
fungi | it's an icla within an icla | 18:42 |
fungi | clarkb: yes | 18:42 |
fungi | in theory private changes should no longer appear as an option | 18:42 |
clarkb | fungi: cool the change for that lgtm. I +2'd it if you want to force merge it too | 18:43 |
clarkb | once I've got gitea06 in a good spot I think we're ready to start replicating more things | 18:43 |
clarkb | I'll give it say 5 minutes on the xz but if that isn't done switch to gz? | 18:44 |
fungi | mnaser: the short answer is that agreeing to the new license agreement carries a clause saying you agree that contributions previously made under the old agreement can be assumed to be under the new agreement, and part of doing that is specifying a copy of the old agreement | 18:44 |
fungi | i'll merge the private disable change now | 18:45 |
fungi | clarkb: care to add a workflow +1? | 18:46 |
clarkb | done | 18:46 |
fungi | thanks | 18:46 |
fungi | and now it's merged and i've removed my admin account from project bootstrappers again | 18:46 |
clarkb | thanks for taking care of that | 18:46 |
fungi | np | 18:46 |
mnaser | do we have an 'opendev' plugin in ue? | 18:47 |
mnaser | i was researching on how to add the opendev logo and replace 'Gerrit' by 'OpenDev', found out it was possible by writing a style plugin | 18:48 |
mnaser | i've found the one used by chromium -- https://chromium.googlesource.com/infra/gerrit-plugins/chromium-style/+/refs/heads/master | 18:48 |
fungi | nope, though that raises the question whether we'd want an aio plugin for all our stuff or separate single-purpose plugins | 18:48 |
clarkb | what is ue? | 18:48 |
fungi | i assumed he meant "use" | 18:48 |
mnaser | ah yes, in use indeed | 18:48 |
clarkb | mnaser: no what we came to realize was that if we tried to get every single thing like that done before we did the notedb transition in particular we'd just be making it harder and harder as more changes land | 18:49 |
mnaser | clarkb: oh yes, of course, i agree :) | 18:50 |
clarkb | instead it felt prudent to upgrade, then figure out what we need to change as we're able to roll ahead with eg the 3.3 release | 18:50 |
clarkb | that comes out next week, maybe we'll upgrade week after | 18:50 |
mnaser | by the way, funny thing | 18:50 |
mnaser | in that plugin `if (window.location.host.includes("chromium-review")) {` | 18:50 |
mnaser | `} else if (window.location.host.includes("chrome-internal-review")) {` | 18:50 |
mnaser | https://chrome-internal-review.googlesource.com/ i wonder where this little guy goes :) | 18:51 |
fungi | behind a firewall/vpn you can't reach, no doubt | 18:51 |
fungi | it's likely full of googlicious goodness | 18:52 |
clarkb | ok its been more than 5 minutes and xz is still going. I'm going to stop it and see how big a gzip is | 18:52 |
fungi | xz takes a lot more memory/cpu to compress than gzip | 18:53 |
fungi | so not surprising | 18:53 |
fungi | gz will probably still make it nearly as small | 18:53 |
clarkb | fungi: I went with xz to start because compressing journald logs is significantly better with it than gzip | 18:54 |
clarkb | that is why the devstack jobs use xz for that purpose | 18:54 |
clarkb | like an order of magnitude | 18:54 |
fungi | woah really? | 18:55 |
clarkb | ya | 18:55 |
fungi | i rarely see xz get that much of an advantage over gz. maybe 25% | 18:55 |
fungi | order of magnitude is impressive indeed | 18:55 |
clarkb | its like 30MB xz and 200MB gzip iirc | 18:55 |
mnaser | rest of the gerrit looks pretty good to me so far in terms of functionality at this point, i'll come try 'break' things again once zuul is back up :) | 18:56 |
fungi | i guess it's on super repetitive stuff | 18:56 |
* mnaser goes for a walk | 18:56 | |
mnaser | gl! | 18:56 |
fungi | thanks again mnaser! | 18:56 |
fungi | i'm going to need to break in an hour to light the grill and start cooking dinner | 18:56 |
clarkb | corvus: ^ if you're still around any thoughts on the zuul startup process I have on the etherpad? | 18:57 |
clarkb | fungi: ya lunch here is in about an hour and I barely ate breakfast so should have something too | 18:57 |
clarkb | ok gzip is done. took 18GB down to 1.2GB so its probably going to give us more than enough space. I'm stopping gitea06 now using the safer process in the playbook | 18:58 |
clarkb | yup 35GB available now which I think is plenty | 19:00 |
clarkb | fungi: corvus I think we are ready to trigger global replication now. Gitea01 has the least free disk at 27GB but our git repo growth was about 15GB so I expect that to be plenty | 19:00 |
clarkb | fungi: ^ do you want to trigger that if you agree we're good? | 19:01 |
fungi | sounds good, i can trigger it as soon as you're ready | 19:01 |
clarkb | I guess I'm as ready as I will be. gitea06 is up now | 19:01 |
fungi | i've done `replication start --all --now` | 19:03 |
clarkb | I see things getting queued up in show-queue | 19:03 |
clarkb | it doesn't seem to load the queue items as quickly as before | 19:05 |
clarkb | the number is still climbing | 19:05 |
clarkb | heh it emits stream events for the replication scheduled events for everything | 19:06 |
clarkb | peaked at just over 17k events in the queue | 19:08 |
clarkb | number is falling now (slowly) | 19:08 |
clarkb | I'm going to remove the digest auth option from all our zuul config files as the default is basic | 19:10 |
clarkb | this is required before we start zuul back up again, but I will wait on zuul startup until we've got eyeballs | 19:10 |
clarkb | looks like it may only be necessary on the scheduler? the others have it but no corresponding secret. I'll do the others for completeness | 19:12 |
fungi | sounds right | 19:15 |
clarkb | just under 16k events now so whatever that comes out for replicating | 19:16 |
fungi | only the scheduler performs privileged actions on gerrit, the other services just pull refs (at least in our deployment) | 19:16 |
corvus | clarkb: looking re zuul | 19:16 |
corvus | clarkb: 6.4.1 and 6.4.2? | 19:17 |
clarkb | corvus: ya | 19:17 |
corvus | clarkb: i think 6.4.2 is done already, right? | 19:17 |
clarkb | yup and 6.4.1 is done as of 30 seconds ago | 19:18 |
clarkb | I guess the question for you is do you think we should start zuul now or wait or do other things first? | 19:18 |
clarkb | zuul can't ssh into bridge to run ansible right now | 19:18 |
clarkb | so we should be able to bring it up, have it run normal ci jobs, be happy with it then work to reenable cd? | 19:18 |
corvus | clarkb: sgtm. i can't think of a reason to delay | 19:19 |
clarkb | looks like zuul_start.yaml starts the scheduler, then web, then mergers, then executors | 19:19 |
clarkb | do we want to hack up a playbook to not exclude disabled or do it more manually? | 19:20 |
corvus | clarkb: i'd just hack out disabled then run that | 19:20 |
clarkb | ok I think it has to be in the same dir as what we run out of because it includes other roles? | 19:21 |
clarkb | I guess that's fine because nothing is updating system-config on bridge right now | 19:21 |
fungi | are we planning on relying on ansible to undo the commented-out cronjobs or should we manually uncomment them (and when)? | 19:22 |
clarkb | fungi: I was going to rely on ansible | 19:22 |
clarkb | track-upstream isn't super critical | 19:22 |
clarkb | actually lets uncomment them because the gc'ing and the log cleanup is good to have | 19:23 |
clarkb | we can probably do that now? | 19:23 |
clarkb | corvus: fungi: I've got an edited zuul start playbook in the root screen on bridge | 19:23 |
clarkb | that is a vim buffer if you want to take a look at that before we run it | 19:23 |
fungi | okay, i'll uncomment the cronjobs now | 19:23 |
fungi | playbook in bridge root screen lgtm | 19:24 |
clarkb | down to 14.7k replication tasks now | 19:24 |
corvus | clarkb: lgtm | 19:24 |
corvus | clarkb: remember -f 20 :) | 19:24 |
clarkb | corvus: ++ | 19:24 |
corvus | or 50 is fine :) | 19:24 |
fungi | heh, 50 it is | 19:25 |
corvus | -f lots | 19:25 |
clarkb | that command was in the scrollback so easy to modify | 19:25 |
* fungi fasts fireball | 19:25 | |
clarkb | does that command look good to yall? | 19:25 |
fungi | er, casts | 19:25 |
fungi | yeah, looks fine | 19:25 |
corvus | ++ | 19:25 |
clarkb | ok running it | 19:25 |
fungi | success! | 19:26 |
clarkb | looks happy | 19:26 |
clarkb | now to see what the running service is like | 19:26 |
corvus | executors are deleting stale dirs | 19:26 |
corvus | 2020-11-21 19:25:55,459 DEBUG zuul.Repo: Updating repository /var/lib/zuul/git/opendev.org/inaugust/src.sh | 19:27 |
fungi | crontabs edited in root screen session on review.o.o if anyone wants to double-check those | 19:27 |
corvus | that is not going as quickly as i would expect | 19:27 |
corvus | i wonder if zuul is going to have to pull a lot of new refs | 19:28 |
corvus | oh okay, things are moving now | 19:28 |
corvus | i think we might have been stuck at branch iteration longer than i expected | 19:28 |
corvus | ie, the delay wasn't git, but rather the rest api querying branches | 19:28 |
corvus | cat jobs are proceeding | 19:29 |
clarkb | this takes about 5-10 minutes typically iirc | 19:29 |
corvus | i'm seeing a number of errors in gertty | 19:31 |
clarkb | I moved my temporary playbook into my homedir to avoid any trouble that may cause system-config syncing when we get there | 19:31 |
corvus | i have no reason to think they are on the gerrit side; more likely minor api tweaks | 19:31 |
corvus | zuul is running jobs in the openstack tenant | 19:32 |
clarkb | https://review.opendev.org/763599 for that change | 19:32 |
clarkb | down to 13.3k replication tasks | 19:32 |
fungi | corvus: gertty isn't logging any errors for me... did you change your auth from digest to basic? | 19:32 |
corvus | fungi: oh, not yet; that's not the error i'm getting but maybe it's a secondary effect | 19:33 |
corvus | 2020-11-21 19:31:30,509 WARNING zuul.ConfigLoader: Zuul encountered an error while accessing the repo x/ansible-role-bindep. The error was: invalid literal for int() with base 16: 'l la' | 19:33 |
corvus | zuul logged that error for a handful of repos ^ | 19:33 |
clarkb | corvus: I think I saw that scroll by in the zuul scheduler debug | 19:33 |
corvus | yeah | 19:34 |
clarkb | should I be digging into that or are you investigating? | 19:34 |
fungi | corvus: yeah, the error i remember gertty throwing when i had the wrong auth type was opaque to say the least | 19:34 |
corvus | i don't recall seeing that before, therefore i don't know if it could be upgrade related. but it doesn't seem like it should be -- that's in-repo content over the git protocol, so i don't think anything should be different. but i dunno. | 19:35 |
fungi | i've put a reminder in the post-upgrade etherpad for gertty users to update their configs | 19:35 |
clarkb | corvus: oh I see this is us talking git not api | 19:35 |
clarkb | three jobs have succeeded, but the other jobs on that change will take a while to run so will be a while before we see zuul comment back | 19:36 |
corvus | fatal: https://review.opendev.org/x/ansible-role-bindep/info/refs not valid: is this a git repository? | 19:37 |
corvus | that would explain the proximate cause of the zuul error | 19:37 |
clarkb | info/refs/ is there and file level permissions look ok | 19:38 |
clarkb | ansible-role-bindep doesn't show up in the error_log | 19:39 |
corvus | i can clone it over ssh | 19:39 |
corvus | is there a problem with "x/" repos and http? | 19:39 |
clarkb | x/ranger reproduces (just a random one I remembered was in x/) | 19:40 |
clarkb | I wonder if this is a permissions issue perhaps related to the bug that got mitigated? | 19:41 |
corvus | just for 'x/' though? | 19:41 |
clarkb | review-test reproduces fwiw | 19:41 |
clarkb | if you search for changes in those repos you can see them | 19:43 |
clarkb | in the web ui I mean | 19:43 |
corvus | if i curl info/refs for x repos, i get the gerrit web app | 19:44 |
corvus | i'm a little worried there's some kind of routing thing in gerrit that assumes any one-letter path component is not a repo | 19:45 |
clarkb | oh fun | 19:45 |
fungi | yikes | 19:45 |
corvus | no basis for that other than observed behavior | 19:45 |
corvus | i'm going to start looking at gerrit source code | 19:46 |
clarkb | ok | 19:46 |
clarkb | down to 11.1k replication tasks and things look good on gitea01 disk wise | 19:48 |
clarkb | its x/ | 19:49 |
clarkb | corvus: java/com/google/gerrit/httpd/raw/StaticModule.java | 19:49 |
clarkb | it serves something related to polygerrit judging by the path names | 19:49 |
clarkb | s/path/variable/ | 19:49 |
corvus | clarkb: thx | 19:49 |
corvus | clarkb: poly gerrit extension plugins? | 19:52 |
clarkb | ya the docs talk about #/x/<plugin-name>/settings | 19:53 |
corvus | and /x/pluginname/*screenname* | 19:53 |
clarkb | do we need to start talking about renaming them? | 19:55 |
clarkb | I did test a rename and if you move the project in gerrit's git dir everything seems to be fine except for project watches config | 19:56 |
clarkb | you can do an online reindex too | 19:56 |
clarkb | or maybe this is something to pull luca in on | 19:56 |
corvus | i think a surprise project rename might be disruptive | 19:56 |
clarkb | agreed | 19:57 |
corvus | grepping logs, i'm not seeing any currently legit access for /x/* | 19:58 |
corvus | (other than attempted clones) | 19:58 |
corvus | there are some requests for fonts: /x/fonts/roboto/Roboto-Bold.ttf | 19:58 |
corvus | but i'm not sure those are actually returning fonts (i think they may just return the app) | 19:58 |
clarkb | thinking out loud here. I wonder if we can convince the gerrit http server to check for x/repo first then fallback to x/else | 19:59 |
corvus | clarkb: i think long term if gerrit wants to own x/ we can't have it | 20:00 |
clarkb | ya agreed, I figure something like that would be so we can schedule a rename not today | 20:00 |
corvus | but short term, i'm wondering if, since it doesn't seem like our gerrit is using x/ right now, we can rebuild it without that exclusion then work on a rename plan | 20:00 |
fungi | i'm around to review a gerrit patch, though getting started grilling | 20:01 |
corvus | (if we're right about x/ being used for plugins, then it'll become an issue as we add polygerrit plugins) | 20:01 |
fungi | i assume we'll want to start a thread on repo-discuss noting that polygerrit has made some repository names impossible. that seems like a bug they would be interested in fixing | 20:01 |
corvus | fungi: i assume they'll fix it with a doc change saying 'don't use these' | 20:02 |
clarkb | corvus: ya maybe we can add a sed to the jobs to comment that out on the 3.2 branch which will rebuild image then pull that and use it? | 20:02 |
corvus | just like /p/ and /c/ are unavailable | 20:02 |
corvus | clarkb: sounds good | 20:02 |
clarkb | corvus: do you want to write that change or should I? | 20:03 |
fungi | if you can't use repositories whose names start with c/ or p/ or x/ but gerrit doesn't prevent you from creating them, that sounds like a bug | 20:03 |
corvus | clarkb: you if you're available | 20:03 |
clarkb | also I think we should trim down the images so it's just 3.2 on that change | 20:03 |
clarkb | ok working on that now | 20:03 |
fungi | for not properly separating api paths from git project paths | 20:03 |
corvus | fungi: perhaps gerrit does prevent creation; we should check that | 20:03 |
corvus | i imagine we should just no longer allow single-char in the initial path component of project names to be safe for the future | 20:04 |
clarkb | ++ | 20:05 |
fungi | or is there a more correct path prefix we should switch to using to access git repositories? | 20:05 |
clarkb | I always have to spend 10 minutes figuring out how we build the gerrit wars in these jobs | 20:06 |
clarkb | fungi: the download urls are rooted at / | 20:06 |
clarkb | I checked that as I wondered too | 20:06 |
fungi | and aren't configurable? | 20:06 |
fungi | because that seems like it would be a relatively minor fix... deprecate the / routing for project names and add a new prefix | 20:07 |
fungi | and instruct users to migrate to the new prefix and then eventually remove the download routing at / in a later release | 20:08 |
*** Alex_Gaynor has joined #opendev-meeting | 20:10 | |
*** Alex_Gaynor has left #opendev-meeting | 20:10 | |
corvus | clarkb: i came to the same conclusion | 20:14 |
corvus | i mean, /p/ *used* to work :/ | 20:14 |
clarkb | remote: https://review.opendev.org/c/opendev/system-config/+/763600 Handle x/ prefix projects on gerrit 3.2 | 20:15 |
clarkb | I figure we can pull that image onto review-test and test out there first, then if that looks ok do it to prod | 20:15 |
clarkb | and I'll update my change so that we can land it | 20:16 |
corvus | clarkb: ++ | 20:16 |
corvus | clarkb: what needs to be updated? | 20:16 |
clarkb | corvus: stuff around which jobs to run I think | 20:16 |
clarkb | corvus: I removed 2.13 - 3.1 since they aren't necessary to get that image | 20:17 |
ianw | o/ ... well done everyone! | 20:17 |
corvus | clarkb: can't we land that? | 20:17 |
fungi | since luca reached out when he saw our upgrade was in progress and suggested we should let him know if we hit any snags, is this something we should give him a heads up about? | 20:17 |
clarkb | if we add them back in I need to make the sed branch specific. If we don't add them back in then I need to squash it into fungi's use regular stable branches change I think | 20:17 |
corvus | fungi: yes | 20:18 |
clarkb | corvus: yes I think I need to update the system-config-run dependency maybe? | 20:18 |
corvus | i think we should send an email saying we found this issue and our proposed solution and see if he thinks it's ok | 20:18 |
clarkb | corvus: I'm sorry these jobs always confuse me | 20:18 |
clarkb | I'm basically just saying that we need to review the job updates carefully if we land this | 20:18 |
fungi | i've got the grill starting so i'm happy to throw a quick e-mail out there pointing to our workaround and asking for suggestions | 20:18 |
clarkb | fungi: go for it | 20:18 |
clarkb | 5.8k on the replication | 20:22 |
clarkb | gitea01 is down to 18GB free. Should have plenty for the remaining replication | 20:24 |
clarkb | I'm going to find some food while I wait for zuul to build that image | 20:25 |
fungi | reply sent to luca, seems like my patio is experiencing unnecessary levels of packet loss so i'm less responsive than i might otherwise be at the moment | 20:31 |
clarkb | my ansible is bad | 20:32 |
clarkb | fixing | 20:32 |
ianw | the new UI is so much faster, very pleasant for us high latency users | 20:32 |
clarkb | new ps has been pushed | 20:34 |
clarkb | infra-root. I added myself to project bootstrappers and admins on review-test. Then went to /plugins/ which returns a json doc of plugins | 20:39 |
clarkb | the index_url for each plugin we have is listed there and they all start with plugins/ not x/ | 20:40 |
clarkb | (just another data point towards the safety of this change) | 20:40 |
clarkb | I think to get that document you have to be in the admins group | 20:41 |
clarkb | could probably get it via the rest api instead too | 20:41 |
corvus | clarkb: yeah, if i'm following correctly x/ might be used by polygerrit plugins to serve certain resources | 20:41 |
clarkb | corvus: hrm are any of the plugins we have polygerrit plugins? I assume that some are like the codemirror-editor and download-commands? | 20:43 |
corvus | clarkb: no idea | 20:45 |
corvus | clarkb: ansible parse error again | 20:46 |
clarkb | k, can someone look at it really quickly? I feel like my brain isn't working | 20:47 |
corvus | clarkb: will do | 20:47 |
clarkb | poking at codemirror editor on review-test with ff dev tools it self hosts its static contents looks like | 20:48 |
clarkb | I think I see it: shell needs to be a list | 20:49 |
corvus | yes i'm on it | 20:50 |
clarkb | k | 20:50 |
corvus | - "/x/*", | 20:50 |
corvus | + //"/x/*", | 20:50 |
corvus | clarkb: that's the intended change, yeah? | 20:50 |
clarkb | corvus: yes | 20:50 |
corvus | i'm validating it makes it all the way through ansible unscathed | 20:50 |
clarkb | it comments out that line with the /x/* in it | 20:50 |
corvus | clarkb: pushed | 20:51 |
corvus | i figured i'd double check the whole thing to save us any more round trips | 20:51 |
clarkb | thanks | 20:51 |
clarkb | ++ | 20:51 |
clarkb | ~600 replication tasks now | 20:56 |
fungi | once this is built, pulled and restarted, do we need to restart the executors and mergers as well? | 20:58 |
clarkb | its running the bazelisk build now | 20:59 |
clarkb | fungi: you want to respond to luca? | 21:00 |
clarkb | and file the bug? | 21:00 |
clarkb | replication is done. I'm going to do another round of gc'ing on the giteas | 21:00 |
fungi | oh, cool, he already replied. yeah i can do that immediately after dinner | 21:01 |
clarkb | fungi: specifically I think the bit that was missing in the email was that it's cloning repos | 21:01 |
fungi | yes | 21:02 |
clarkb | giteas are gc'ing now | 21:03 |
corvus | build finished | 21:13 |
corvus | docker://insecure-ci-registry.opendev.org:5000/opendevorg/gerrit:f76ab6a8900f40718c6cd8a57596e3fc_3.2 | 21:13 |
clarkb | cool I'll get that on review-test momentarily | 21:14 |
corvus | i'm also running it locally for fun | 21:14 |
corvus | or will, when it downloads, in a few minutes | 21:14 |
clarkb | note review-test's LE cert expired a few days ago and we decided to leave it be | 21:15 |
clarkb | cloning x/ranger from review-test works now | 21:16 |
corvus | \o/ | 21:17 |
clarkb | https://review-test.opendev.org/x/fonts/fonts/robotomono/RobotoMono-Regular.ttf is a 404 | 21:18 |
corvus | clarkb: but it's also not a real thing on prod | 21:19 |
clarkb | ya I guess not | 21:19 |
clarkb | I just wanted to see what it does there | 21:19 |
corvus | clarkb: want me to update your patch with the system-config-run change? | 21:20 |
clarkb | corvus: that would be swell | 21:20 |
clarkb | then I think it should be landable? | 21:20 |
corvus | clarkb: actually... maybe we should make this 2 changes | 21:22 |
clarkb | corvus: I'm good with that too | 21:22 |
clarkb | just the sed then a cleanup? | 21:22 |
corvus | yep | 21:22 |
clarkb | wfm | 21:22 |
corvus | i'll take care of that | 21:23 |
clarkb | corvus: remember you need to check the branch if you do that | 21:23 |
corvus | clarkb: meanwhile, we have a built image -- want to go ahead and run it on prod? | 21:23 |
clarkb | or have 3.2 use a different playbook | 21:23 |
corvus | clarkb: how about we invert the order? | 21:23 |
clarkb | corvus: that also works | 21:23 |
corvus | remove old stuff, then the x/ change | 21:23 |
clarkb | ++ | 21:23 |
corvus | will be easy to revert | 21:23 |
clarkb | for prod any concern that this may break something else? or are we willing to find out the hard way :) | 21:23 |
corvus | clarkb: i think we've done the testing we can | 21:24 |
clarkb | ok | 21:24 |
clarkb | I'll do this in the screen fwiw | 21:24 |
corvus | i'm not worried about it breaking anything in a way we can't roll back | 21:24 |
clarkb | gerrit is starting back up again on prod | 21:26 |
clarkb | hrm the change screen isn't loading for me though I thought I tested that on review-test too | 21:28 |
clarkb | oh there it goes | 21:28 |
clarkb | I just need patience | 21:28 |
clarkb | I can clone ranger from prod via https now too | 21:28 |
corvus | remote: https://review.opendev.org/c/opendev/system-config/+/763616 Remove container image builds for old gerrit versions [NEW] | 21:30 |
corvus | remote: https://review.opendev.org/c/opendev/system-config/+/763600 Handle x/ prefix projects on gerrit 3.2 | 21:30 |
corvus | clarkb: i think we should do a full-reconfigure in zuul | 21:30 |
corvus | i'll do that | 21:30 |
clarkb | oh I should go restart gerritbot now that I restarted gerrit | 21:30 |
clarkb | corvus: ++ | 21:30 |
clarkb | gerritbot has been restarted | 21:30 |
corvus | i have more work to do on those image build changes; on it | 21:31 |
clarkb | btw zuul commented a -1 on https://review.opendev.org/c/openstack/os-brick/+/763599/ which was the first change that started running zuul jobs. That aspect of things looks good | 21:34 |
corvus | clarkb, fungi, ianw: remote: https://review.opendev.org/c/openstack/project-config/+/763617 Remove old gerrit image jobs from jeepyb [NEW] | 21:34 |
clarkb | +2 | 21:35 |
corvus | cat jobs are running | 21:36 |
clarkb | corvus: one small thing on https://review.opendev.org/c/opendev/system-config/+/763616 | 21:37 |
clarkb | I'm happy to fix the issue on ^ if you want to roll forward instead | 21:38 |
clarkb | er I mean fix it in a follow on | 21:38 |
corvus | clarkb: i'll respin | 21:38 |
clarkb | ok | 21:38 |
corvus | clarkb: respin done | 21:40 |
corvus | 2020-11-21 21:37:15,977 INFO zuul.Scheduler: Full reconfiguration complete (duration: 379.767 seconds) | 21:40 |
clarkb | and no more of those errors? | 21:40 |
fungi | was review.o.o restarted with the fix? i guess so, my tests to reproduce the error don't fail | 21:41 |
fungi | what was the error message on attempting to clone? | 21:41 |
fungi | sorry, just now catching up since dinner's done | 21:41 |
corvus | fungi: heh, lemme see if i have a terminal open with the error :) | 21:41 |
fungi | back to nominal levels of packet loss again and can test things suitably | 21:41 |
fungi | thanks! | 21:41 |
fungi | working up the reply to luca now | 21:42 |
clarkb | another thing I notice is that gitweb doesn't work but gitiles seems to | 21:42 |
clarkb | I think we should just stop using gitweb maybe and have it gitiles | 21:42 |
clarkb | that isn't super urgent though | 21:42 |
clarkb | then we can add in gitea when we sort that out | 21:42 |
corvus | fungi: i don't, sorry :( | 21:42 |
corvus | 19:37 < corvus> fatal: https://review.opendev.org/x/ansible-role-bindep/info/refs not valid: is this a git repository? | 21:43 |
corvus | fungi: but i pasted that ^ | 21:43 |
corvus | that was about it | 21:43 |
corvus | clarkb: confirmed, no new 'invalid literal' errors from zuul | 21:44 |
clarkb | +2 from me on corvus' image stack | 21:45 |
corvus | +2 from me on clarkb's image stack | 21:46 |
clarkb | zuul still can't ssh into bridge (I think that is a good thing), once we've got these issues settled I figured we would use https://review.opendev.org/c/opendev/system-config/+/757161 this change as the canary for that? | 21:46 |
clarkb | my family has pointed out to me that I am yet to shower today though, so now might be time for me to take a break. | 21:46 |
clarkb | is there anything else you'd like me to do before I pop out for a bit? | 21:46 |
fungi | nope, go become less offensive to your family ;) | 21:48 |
corvus | i think now's a good break time | 21:48 |
clarkb | fungi: maybe you can include a diff for luca as well: http://paste.openstack.org/show/qz6zQ6a3jkRVluxebh8l/ | 21:48 |
corvus | fungi: can you +3 https://review.opendev.org/763617 ? | 21:48 |
fungi | corvus: thanks! i'll try to work with that | 21:48 |
fungi | yeah, will review | 21:48 |
fungi | and approved | 21:49 |
clarkb | giteas are still gc'ing but free disk space is going up so we should be more than good there | 21:49 |
clarkb | and now break time | 21:49 |
clarkb | I've also removed my normal user from privileged groups on review-test | 21:49 |
clarkb | as I am done testing there for now | 21:50 |
fungi | i've re-replied to luca, will start putting the bug report together shortly | 21:55 |
fungi | any other urgent upgrade-related tasks need my attention first? | 21:55 |
corvus | fungi: i don't think so. i'm about to +w the remaining image stack | 21:56 |
corvus | err, there's another error | 21:57 |
corvus | clarkb, fungi: can you +3 https://review.opendev.org/763616 ? | 21:59 |
corvus | missed an update for the infra-prod jobs to trigger on 3.2 builds | 22:00 |
fungi | yup, taking a look now | 22:01 |
corvus | current status: we need to merge https://review.opendev.org/763616 and https://review.opendev.org/763600 then the repos will match the image we're running in production. then we can proceed with enabling cd. aside from that, i think there's no known issues in prod and we're just waiting for replication to finish. | 22:02 |
fungi | i've approved 763616 now | 22:02 |
corvus | cool, then i'm going to afk for another errand | 22:02 |
corvus | infra-root: just a highlight ping for what i think is the current status (a couple lines up ^) as i think we're all on break while waiting for tasks to complete | 22:03 |
fungi | awesome, thanks again! | 22:04 |
fungi | https://bugs.chromium.org/p/gerrit/issues/detail?id=13721 | 22:44 |
fungi | if anyone feels inclined, please clarify mistakes or omissions therein | 22:45 |
clarkb | I'll take a look in a few. | 22:46 |
clarkb | gitea01 has finished gc'ing and has 22gb free which should be plenty for now | 22:47 |
clarkb | the others all have more free disk too | 22:48 |
clarkb | and are done as well | 22:48 |
clarkb | I think that means all the replication related activities are done | 22:48 |
clarkb | fungi: the bug looks good to me | 22:49 |
clarkb | I'm going to start drafting a "its up, this is what we've discovered, this is where we go from here" type email in etherpad | 22:50 |
fungi | thanks! don't forget to incorporate notes from https://etherpad.opendev.org/p/gerrit-3.2-post-upgrade-notes as appropriate | 22:52 |
clarkb | ya was going to link to that I think | 22:52 |
clarkb | https://etherpad.opendev.org/p/rNXB-vJe8IUeFnOKFVs8 is what I'm drafting | 22:55 |
ianw | fungi: no idea if it helps but i think x/ was introduced @ https://gerrit.googlesource.com/gerrit/+/153d46c367965cd7782a3ac86212c07b298eaca8 | 22:57 |
ianw | actually no, more to dig | 22:58 |
clarkb | the file was moved at some point which makes it difficult to go back in time with | 22:59 |
clarkb | I ended up doing a git log -p and grepping for it and giving up | 22:59 |
ianw | https://gerrit.googlesource.com/gerrit/+/7cadbc0c0c64b47204cf0de293b7c68814774652 | 23:00 |
ianw | + serve("/x/*").with(PolyGerritUiIndexServlet.class); | 23:00 |
ianw | that is really the first instance. i wonder if it's not really necessary and just been pulled along since | 23:00 |
clarkb | ianw: the docs hint at it but could still be dead code | 23:01 |
ianw | .. at least it's in a "add x/ this is a really important path never remove" type change i guess :) | 23:02 |
clarkb | not in or in? | 23:04 |
clarkb | https://etherpad.opendev.org/p/rNXB-vJe8IUeFnOKFVs8 ok I think thats largely put together at this point | 23:04 |
ianw | clarkb: minor suggestion on maybe something that explains the x/ thing at a high level but enough for people to understand | 23:13 |
clarkb | ianw: something like that? | 23:14 |
ianw | yeah, i think so; feel like it explains how both want to "own" the /x endpoint | 23:15 |
ianw | namespace, whatever :) | 23:15 |
clarkb | oh shoot, I think there is a minor but not super important issue with https://review.opendev.org/763600 it doesn't update the dockerfile so we won't promote the image | 23:19 |
clarkb | corvus: ^ maybe thats something we can figure out manually or just push up another change that does a noop dockerfile edit? | 23:19 |
clarkb | double check me on all that first though | 23:20 |
clarkb | also I'm starting to feel the exhaustion roll in. If others want to drive things and get cd rolling again I'll do my best to help, otherwise, tomorrow morning might be good | 23:22 |
clarkb | ya I think the promote jobs for the 3.2 docker image tagging didn't run | 23:23 |
clarkb | I'll push up a noop job now to get that rolling | 23:23 |
clarkb | remote: https://review.opendev.org/c/opendev/system-config/+/763618 Noop change to promote docker image build | 23:26 |
ianw | i've got to run out, but i can get to the CD stuff early my tomorrow? i don't think we need it before then? | 23:26 |
clarkb | ya I don't think its super urgent unless others really want their sunday back. I'm just wiped out | 23:27 |
clarkb | fungi: corvus ^ fyi. Also any thoughts on that email? should I send that nowish? | 23:27 |
clarkb | infra-root Note that https://review.opendev.org/c/opendev/system-config/+/763618 or something like it should land before we start doing cd again | 23:28 |
ianw | ok, that's the new image with the x/ fix right? | 23:29 |
clarkb | yes | 23:29 |
ianw | i.e. we don't want to CD deploy the old image | 23:29 |
clarkb | we actually just built it when corvus' changes landed but because we didn't modify files that the promote jobs match we didn't promote it | 23:29 |
clarkb | we could also do an out of band promote via docker directly if we want | 23:29 |
clarkb | 763618 should also take care of it since the dockerfile is modified | 23:30 |
ianw | ok, i have to head out but will check back later | 23:30 |
clarkb | ianw: o/ | 23:30 |
fungi | clarkb: sorry, stepped away for a bit, reading draft e-mail now | 23:37 |
fungi | made a couple of minor edits but lgtm in general | 23:41 |
clarkb | cool I'll wait a bit to see if corvus is able to take a look then send that out | 23:42 |
clarkb | fungi: and maybe a corresponding #status notice | 23:42 |
clarkb | I'm taking a break now though. The tired hit me hard in the last little bit | 23:42 |
fungi | yup, a status notice at the same time that e-mail gets sent would make sense | 23:43 |
clarkb | fungi: did you see 763618 too? | 23:46 |
fungi | likely not if you're asking | 23:47 |
fungi | approvidado | 23:48 |
corvus | reading scrollback | 23:48 |
corvus | clarkb: email lgtm | 23:50 |
clarkb | cool I'll send that out momentarily | 23:50 |
clarkb | how about this for the notice #status notice Gerrit is up and running again on version 3.2. Zuul is talking to it and running jobs. You can push and review changes. However, we are still working through things and there may be additional service restarts during out upgrad window which ends at 01:00 November 23. | 23:55 |
corvus | clarkb: s/out upgrad/our upgrade/ | 23:55 |
clarkb | I can also add "See http://lists.opendev.org/pipermail/service-announce/2020-November/000013.html for more details" | 23:55 |
clarkb | how about this for the notice #status notice Gerrit is up and running again on version 3.2. Zuul is talking to it and running jobs. You can push and review changes. However, we are still working through things and there may be additional service restarts during our upgrade window which ends at 01:00 November 23. See http://lists.opendev.org/pipermail/service-announce/2020-November/000013.html for | 23:55 |
clarkb | more details | 23:55 |
clarkb | is that just short enough if I drop my prefix? | 23:56 |
fungi | maybe squeeze it down a bit so it fits in a single notice | 23:57 |
fungi | i think statusbot will truncate it otherwise | 23:57 |
clarkb | like #status notice Gerrit is up and running again on version 3.2. Zuul is talking to it and running jobs. You can push and review changes. However, we are still working through things and there may be additional service restarts during our upgrade window ending 01:00UTC November 23. See http://lists.opendev.org/pipermail/service-announce/2020-November/000013.html for more details | 23:57 |
fungi | or, rather, statusbot doesn't know to so the irc server ends up discarding the rest | 23:57 |
fungi | looks good. hopefully that's short enough | 23:58 |
clarkb | I can trim it a bit more but I'll just go ahead and send it with that trimming | 23:58 |
clarkb | #status notice Gerrit is up and running again on version 3.2. Zuul is talking to it and running jobs. You can push and review changes. We are still working through things and there may be additional service restarts during our upgrade window ending 01:00UTC November 23. http://lists.opendev.org/pipermail/service-announce/2020-November/000013.html for more details | 23:59 |
openstackstatus | clarkb: sending notice | 23:59 |
-openstackstatus- NOTICE: Gerrit is up and running again on version 3.2. Zuul is talking to it and running jobs. You can push and review changes. We are still working through things and there may be additional service restarts during our upgrade window ending 01:00UTC November 23. http://lists.opendev.org/pipermail/service-announce/2020-November/000013.html for more details | 23:59 |
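
The x/ workaround discussed around 20:50 (commenting out the line containing `"/x/*"` before the Gerrit image is built) can be sketched in Python as below. This is only an illustration of the transformation shown in the pasted diff, not the actual sed or Ansible task from the change; the function name `comment_out_route` and the sample neighbouring routes are hypothetical.

```python
def comment_out_route(source: str, route: str = '"/x/*"') -> str:
    """Prefix every line containing the route literal with '//'.

    Indentation is preserved and lines that are already commented out
    are left untouched, so running the function twice is harmless.
    """
    out = []
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip()
        if route in line and not stripped.startswith("//"):
            indent = line[: len(line) - len(stripped)]
            line = f"{indent}//{stripped}"
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical excerpt; only the "/x/*" entry comes from the log.
    sample = '  "/c/*",\n  "/x/*",\n  "/settings/*",\n'
    print(comment_out_route(sample), end="")
```

Making the edit idempotent matters if the same step can run against a source tree that has already been patched, for example on a rebuilt image.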