clarkb | fungi: ^ fyi | 00:01 |
*** mlavalle has quit IRC | 00:01 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Revert "Set env vars pointing to correct file locations" https://review.opendev.org/719124 | 00:07 |
clarkb | I'm not 100% sure we need to do that yet, but I haven't found anything that would indicate the new paths work with non container gerrit | 00:08 |
clarkb | mordred: fungi ^ I'll leave that up and if someone else thinks it is also necessary we can land it | 00:11 |
fungi | exporting unused environment variables shouldn't break anything | 00:17 |
fungi | the jeepyb side of that hasn't landed yet anyway, right? | 00:17 |
clarkb | only the git dir is new | 00:18 |
clarkb | I believe the other 4 vars already exist | 00:18 |
clarkb | and will point to invalid paths on non container gerrit | 00:18 |
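For context on the pattern being debated: the change exports environment variables that tell the hook scripts where their files live, so a path that is valid inside the container but not on the host shows up as a `FileNotFoundError` deep inside a hook. Below is a minimal sketch of env-var path resolution with a fail-fast check. It assumes a variable named `GERRIT_CONFIG` and uses the default path quoted in the hook traceback later in this log; the variable name and the function `resolve_config_path` are hypothetical, not confirmed jeepyb code.

```python
import os
import tempfile

# Hypothetical variable name; the default is the path from the hook traceback in this log.
DEFAULT_GERRIT_CONFIG = '/home/gerrit2/review_site/etc/gerrit.config'


def resolve_config_path(env_var='GERRIT_CONFIG', default=DEFAULT_GERRIT_CONFIG,
                        environ=None):
    """Return the config path from env_var (or default), failing early if it is missing."""
    environ = os.environ if environ is None else environ
    path = environ.get(env_var, default)
    if not os.path.isfile(path):
        # Name the variable in the error so a wrong export is easy to spot.
        raise FileNotFoundError(
            '{} resolved to {!r}, which does not exist on this host'.format(env_var, path))
    return path


# Demo: point the (hypothetical) variable at a real temporary file.
with tempfile.NamedTemporaryFile(suffix='.config') as tmp:
    print(resolve_config_path(environ={'GERRIT_CONFIG': tmp.name}))
```

As noted above, exporting a variable that nothing reads is harmless. The risk only arises when a script does read it and the value is wrong for that host.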
fungi | is the current running non-container gerrit deployed by that playbook also, or still by puppet? | 00:23 |
clarkb | puppet is gone aiui | 00:24 |
clarkb | that playbook affects the currently running gerrit, I think, as the host isn't in the emergency file | 00:24 |
clarkb | but it's friday and I could be wrong | 00:25 |
fungi | no, i think you're right | 01:03 |
fungi | mordred: ^ | 01:03 |
fungi | i'll go ahead and approve the revert for now | 01:04 |
*** Eighth_Doctor has joined #opendev | 05:35 | |
Eighth_Doctor | 👋 | 05:35 |
*** moppy has quit IRC | 07:16 | |
*** moppy has joined #opendev | 07:31 | |
*** DSpider has joined #opendev | 07:41 | |
*** tosky has joined #opendev | 08:29 | |
zbr | are there any active/in-progress plans to upgrade opendev gerrit? | 09:13 |
*** sgw has quit IRC | 11:52 | |
*** ChanServ has quit IRC | 12:55 | |
*** ChanServ has joined #opendev | 13:03 | |
*** tepper.freenode.net sets mode: +o ChanServ | 13:03 | |
*** ChanServ has quit IRC | 13:08 | |
*** ChanServ has joined #opendev | 13:10 | |
*** tepper.freenode.net sets mode: +o ChanServ | 13:10 | |
mordred | zbr: yup | 13:14 |
fungi | all of the recent gerrit maintenances have been mainly in service of getting us to a point where we can easily upgrade | 13:18 |
mordred | clarkb: good catch | 13:19 |
mordred | so we should land the re-revert and do another restart around the same time | 13:20 |
mordred | fungi, clarkb: the revert didn't land - might be the same amount of waiting/time to just land https://review.opendev.org/#/c/719052/ and do a quick restart | 13:26 |
mordred | zbr: yeah - what fungi said. once we're done with this current maint (finishing switching deployment from puppet to ansible/docker) we'll be working on the upgrade plan | 13:28 |
fungi | mordred: ahh, i'll take a quick look | 13:30 |
fungi | oh, yeah, i'm already +2 on that one | 13:31 |
fungi | if we want to give it another quick try i'm up for that | 13:31 |
fungi | i've gone ahead and approved it | 13:32 |
mordred | fungi: cool. I think on a holiday saturday morning it should be fairly low impact - and shouldn't be _worse_ than waiting on the revert to land | 13:37 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Install ep_headings module https://review.opendev.org/719123 | 13:39 |
mordred | fungi: there's the ansible for the hack from yesterday btw ^^ | 13:40 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run cloud_launcher from zuul https://review.opendev.org/718798 | 14:00 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Stop removing cloud-launcher cron https://review.opendev.org/718799 | 14:03 |
corvus | mordred: +2 on the npm thing, but honestly, i think building the image is going to be the better long-term solution -- mostly because if we're running npm on the host, suddenly we care about what version of node/npm is on the host, which is the main thing we want to avoid with all the container stuff. | 14:16 |
openstackgerrit | Merged opendev/jeepyb master: Fix issues from rolling out containers https://review.opendev.org/719052 | 14:25 |
mordred | corvus: yeah - I think you're probably right | 14:25 |
mordred | I'll work on some patches to do that a little later | 14:25 |
corvus | mordred: you had some earlier right? | 14:26 |
corvus | did we merge those and revert, or did we just revise them before merging? | 14:26 |
mordred | corvus: we revised before merging | 14:27 |
mordred | but I can go cherry-pick the changes out | 14:27 |
mordred | corvus, fungi : the jeepyb promote job for 2.13 has succeeded, so we should have new gerrit images, and I think the new scripts are already in place on gerrit - should we try another restart? | 14:28 |
mordred | I confirm - the new version of the scripts have been applied | 14:29 |
corvus | mordred: wow so fast :) | 14:30 |
mordred | :) | 14:30 |
corvus | mordred: sure, let me blink the sleep out of my eyes and let's go for it | 14:30 |
mordred | kk. I'm in the root screen on review | 14:30 |
corvus | i have joined | 14:31 |
mordred | status notice Restarting gerrit to fix an issue from yesterday's mataintenance | 14:31 |
mordred | yeah? | 14:31 |
mordred | wow. except that's horrible spelling | 14:31 |
mordred | status notice Restarting gerrit to fix an issue from yesterday's maintenance | 14:31 |
corvus | lgtm | 14:31 |
mordred | #status notice Restarting gerrit to fix an issue from yesterday's maintenance | 14:31 |
openstackstatus | mordred: sending notice | 14:31 |
-openstackstatus- NOTICE: Restarting gerrit to fix an issue from yesterday's maintenance | 14:32 | |
mordred | wow, openstackstatus is taking its time | 14:34 |
openstackstatus | mordred: finished sending notice | 14:35 |
mordred | corvus: ok. shall we? | 14:36 |
corvus | mordred: ++ | 14:36 |
corvus | there's like a constant stream of hangups from stackalytics-bot-2 in the error log... | 14:36 |
mordred | corvus: "neat" | 14:36 |
fungi | okay, back | 14:37 |
mordred | I suppose I could have pulled before stopping :) | 14:37 |
mordred | fungi: we're in root screen on gerrit | 14:37 |
corvus | live and learn | 14:37 |
corvus | mordred: the screen has stopped updating for me | 14:38 |
corvus | it's on extracting 208407758d73: | 14:38 |
fungi | yep, joining | 14:38 |
corvus | mordred: but it looks like gerrit is running | 14:38 |
corvus | what's going on? | 14:38 |
mordred | corvus: weird. yeah- it seems fine? | 14:38 |
corvus | mordred: did it finish and did you restart it? | 14:39 |
mordred | yes | 14:39 |
fungi | i saw control return to a shell prompt | 14:39 |
mordred | I'm now tailing logs | 14:39 |
mordred | [2020-04-11 14:38:54,813] [main] INFO com.google.gerrit.pgm.Daemon : Gerrit Code Review 2.13.12-11-g1707fec ready | 14:39 |
corvus | my screen caught up | 14:39 |
mordred | let me push up a patch to trigger some scripts | 14:40 |
corvus | [2020-04-11 14:40:25,990] [HookQueue-1] INFO com.googlesource.gerrit.plugins.hooks.HookTask : hook[patchset-created] output: FileNotFoundError: [Errno 2] No such file or directory: '/home/gerrit2/review_site/etc/gerrit.config' | 14:40 |
mordred | oh - somebody did | 14:40 |
mordred | really? | 14:41 |
mordred | why is patchset-created not updated? | 14:41 |
mordred | I'm going to manually fix that real quick to make sure it fixes the issue | 14:42 |
mordred | it's bind-mounted in so it should fix without restart | 14:42 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: WIP Update install-ansible away from /opt/system-config https://review.opendev.org/719186 | 14:43 |
mordred | did that patchset created trigger the error? | 14:43 |
corvus | [2020-04-11 14:43:17,782] [HookQueue-1] INFO com.googlesource.gerrit.plugins.hooks.HookTask : hook[patchset-created] output: TypeError: cannot use a string pattern on a bytes-like object | 14:43 |
corvus | http://paste.openstack.org/show/791957/ | 14:43 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Actually install patchset-created hook https://review.opendev.org/719187 | 14:44 |
fungi | another missed python3 fix i suppose | 14:44 |
mordred | corvus: STELLAR | 14:44 |
mordred | well - there's the hook fix | 14:44 |
fungi | d'oh | 14:44 |
mordred | ah - it's because subprocess.Popen | 14:46 |
fungi | i guess we need to .decode('utf-8') the fd from it? | 14:47 |
openstackgerrit | Monty Taylor proposed opendev/jeepyb master: Decode utf-8 from subprocess.Popen https://review.opendev.org/719188 | 14:49 |
mordred | corvus, fungi: ^^ | 14:49 |
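The `TypeError` above is the classic Python 3 bytes/str mismatch: `subprocess.Popen` pipes yield `bytes` unless text mode is requested, and a `str` regex cannot search `bytes`. Below is a minimal sketch of both common fixes, with a child Python process standing in for the `git` call a hook would make. The function names are hypothetical, and this is not the actual jeepyb patch.

```python
import re
import subprocess
import sys

CHANGE_ID_RE = re.compile(r'Change-Id: (\S+)')  # a str pattern


def output_decoded(cmd):
    """Run cmd and return stdout as str by decoding the bytes explicitly."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return out.decode('utf-8')


def output_text_mode(cmd):
    """Same idea, but let subprocess decode (text=True is the 3.7+ spelling)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, universal_newlines=True)
    out, _ = proc.communicate()
    return out


# Stand-in for a git command: a child Python process printing one line.
cmd = [sys.executable, '-c', 'print("Change-Id: I0123")']

raw = subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()[0]
try:
    CHANGE_ID_RE.search(raw)  # bytes input + str pattern
except TypeError as exc:
    print('raw bytes:', exc)

print(CHANGE_ID_RE.search(output_decoded(cmd)).group(1))
```

The explicit `.decode('utf-8')` pins the encoding. Text mode uses the locale's preferred encoding unless `encoding=` is passed, so the two only agree when the locale is UTF-8.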
mordred | I could exec into the container and apply that same fix live to double check it (and keep things going until that patch lands) | 14:50 |
corvus | mordred: sgtm to keep the loop going | 14:51 |
* mordred is a little worried that the slow version of whack-a-mole here might take an age | 14:51 | |
mordred | yeah | 14:51 |
fungi | well, it's not overly broken: code review is working, some hooks aren't successfully running, and another restart or several for new images ought to address it, right? | 14:53 |
mordred | k. done | 14:53 |
corvus | fungi: yeah, but we might be able to get that down to just one restart with all the fixes at once | 14:53 |
mordred | yeah - but I _think_ we're close enough that we might be able to get by with only one more restart | 14:53 |
mordred | yeah | 14:53 |
mordred | and then be actually done with this mess | 14:53 |
mordred | and remind ourselves to never write a completely untested program like jeepyb ever again | 14:54 |
fungi | well, yes, hopefully only one restart. granted each time we've restarted so far we thought we had all the fixes in ;) | 14:54 |
* mordred looks forward to reworking these hooks as zuul jobs | 14:54 | |
mordred | fungi: indeed :) | 14:55 |
corvus | i pushed a patchset | 14:56 |
corvus | i watched the gerrit queue, there are no more patchset-created hook entries | 14:57 |
corvus | so i think that means success | 14:57 |
mordred | \o/ | 14:58 |
mordred | I've got an update on the jeepyb patch - pep8 gods are unhappy | 14:58 |
openstackgerrit | Monty Taylor proposed opendev/jeepyb master: Decode utf-8 from subprocess.Popen https://review.opendev.org/719188 | 14:58 |
mordred | corvus, fungi : ^^ | 14:58 |
corvus | i will prepare breakfast while those land | 15:02 |
mordred | corvus: one of the promote jobs failed on the previous jeepyb patch (not important, it was 2.15) | 15:03 |
mordred | corvus: https://zuul.opendev.org/t/openstack/build/800a7224cf0143158e86ede8a9a35bdd/log/job-output.txt#89 | 15:03 |
mordred | corvus: we might want to put in some retries | 15:03 |
mordred | corvus: although... that's a little weird... why does it say tag=change_719052_2.13 - that's the 2.15 job | 15:04 |
mordred | all the vars seem to match in the jobs fwiw | 15:05 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Update install-ansible away from /opt/system-config https://review.opendev.org/719186 | 15:16 |
openstackgerrit | Davlet Panech proposed openstack/project-config master: Add kernel to StarlingX https://review.opendev.org/718772 | 15:16 |
mordred | corvus: ^^ step one in "run ansible from zuul checkout" - I believe that's an ok and self-standing change | 15:16 |
corvus | mordred: that was a 'list tags' call | 15:21 |
corvus | mordred: it looks kind of like a dockerhub internal consistency error | 15:22 |
corvus | mordred: that might explain why an unrelated tag was mentioned | 15:22 |
mordred | corvus: ah - nod | 15:25 |
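The retries suggested above could look roughly like the generic Python sketch below, which implements retry with exponential backoff. `retry` and `flaky_list_tags` are hypothetical names. The real promote job is Ansible-driven, where the task-level `until`/`retries`/`delay` keywords would be the more natural fit. Retrying only helps if the registry error is transient, which corvus's reading of it as a Docker Hub internal consistency error suggests but does not guarantee.

```python
import time


def retry(func, attempts=3, delay=1.0, backoff=2.0, exceptions=(Exception,)):
    """Call func(), retrying on `exceptions` with exponentially growing sleeps."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the last error unchanged
            time.sleep(delay)
            delay *= backoff


# Hypothetical stand-in for a registry "list tags" call that fails twice, then succeeds.
calls = {'n': 0}


def flaky_list_tags():
    calls['n'] += 1
    if calls['n'] < 3:
        raise RuntimeError('transient registry error')
    return ['change_719052_2.13', 'change_719052_2.15']


tags = retry(flaky_list_tags, attempts=5, delay=0.0)
print(tags, 'after', calls['n'], 'calls')
```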
Eighth_Doctor | hey folks! | 15:27 |
Eighth_Doctor | nice to see that this channel isn't dead :D | 15:27 |
corvus | mordred: left an idea on that change | 15:29 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Update install-ansible away from /opt/system-config https://review.opendev.org/719186 | 15:33 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run playbooks out of zuul checkout https://review.opendev.org/719190 | 15:33 |
mordred | corvus: cool - there's the followup to finish it | 15:33 |
mordred | corvus: yes - I think that's a good idea | 15:33 |
mordred | is it possible to pass explicit vars to template: ? | 15:34 |
corvus | mordred: i think you can do that for any task? | 15:34 |
mordred | corvus: so you can just add a vars: block to it? | 15:35 |
fungi | Eighth_Doctor: why would it be dead? ;) | 15:35 |
Eighth_Doctor | well, when I joined last night, I was the only person here :) | 15:36 |
Eighth_Doctor | and I've been in other openstack channels that looked empty before... | 15:36 |
fungi | ahh, i think a lot of us were drained by a long friday after a long week | 15:36 |
Eighth_Doctor | was something particularly bad happening? | 15:43 |
fungi | nah, just getting lots done! | 15:43 |
fungi | also weekends tend to be quieter | 15:48 |
Eighth_Doctor | so there was something I was curious about | 15:52 |
mordred | yeah - we had some maintenances to do and wanted to take advantage of the slow holiday friday as a good time to do that | 15:52 |
mordred | them | 15:53 |
Eighth_Doctor | why did opendev select gitea over other options? | 15:53 |
mordred | it was visually nice - and it allowed us to completely disable the features we don't use (like pull requests) | 15:53 |
mordred | we used cgit before - but our users weren't super thrilled with it as a code browser and so more consistently fell back to mirrors on github | 15:54 |
Eighth_Doctor | was pagure ever considered? | 15:54 |
corvus | Eighth_Doctor: there's some documentation about that decision: https://docs.opendev.org/opendev/infra-specs/latest/specs/opendev-gerrit.html | 15:54 |
corvus | also https://review.opendev.org/#/c/623033/ | 15:56 |
mordred | Eighth_Doctor: I believe I did look at it - iirc one of the issues was the inability to fully disable things like the pull request interface. I feel like there was another reason as well but I sadly don't remember what it was | 15:56 |
Eighth_Doctor | mordred: so we added the ability to disable damn near everything instance-wide last year | 15:57 |
corvus | was it related to search? | 15:57 |
corvus | gitea does have code searching (though we aren't able to use it yet, we still plan to enable it) | 15:57 |
fungi | Eighth_Doctor: however, we were making this decision in 2018 | 15:57 |
Eighth_Doctor | ah | 15:57 |
Eighth_Doctor | pagure 5.0 was released at the end of 2018, and our zuul integration was completed in mid 2019 | 15:58 |
Eighth_Doctor | so that explains it... poor timing | 15:58 |
mordred | yeah - might have just been timing | 15:58 |
mordred | yeah | 15:58 |
fungi | well, we also didn't need zuul integration for our use case | 15:58 |
fungi | we definitely didn't want to replace our choice of code review system | 15:59 |
fungi | just needed a source browser | 15:59 |
Eighth_Doctor | fungi: having zuul status report back into commits is nice though :) | 15:59 |
fungi | why? | 15:59 |
fungi | i mean, if you're proposing changes to that system then yes, but we're not | 15:59 |
Eighth_Doctor | because it makes it very easy for people to reference back and forth between tested commits and such | 15:59 |
Eighth_Doctor | it's something I personally find handy, even if you're not using PRs | 15:59 |
fungi | we're just using it as a read-only code browsing frontend, not to do change review | 16:00 |
fungi | we do pretty much all our testing pre-merge | 16:00 |
Eighth_Doctor | right, that's a zuul feature | 16:00 |
fungi | so still not clear what we'd be reporting from zuul into the code browsing system | 16:01 |
Eighth_Doctor | well, whatever the merged commit was, it would have a status link back to zuul that people can click to see the test results | 16:01 |
fungi | zuul in our case is being triggered by activity in the code review system | 16:01 |
Eighth_Doctor | presumably also would have a link to gerrit, so you can see the reviews | 16:01 |
fungi | so zuul doesn't/wouldn't know about the code browser | 16:01 |
corvus | that's a good point, i wonder if we can link change-id footers in gitea back to gerrit | 16:02 |
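Gerrit's commit-msg hook adds a `Change-Id: I<40 hex digits>` footer to each commit message, so linking back mostly comes down to extracting that footer and building a query URL. Below is a minimal sketch. It assumes the `https://review.opendev.org/#/q/<query>` form used for other links in this log, and that Gerrit resolves a bare Change-Id as a search. `gerrit_links` is a hypothetical helper, and for brevity it matches the footer on any line rather than only in the final paragraph.

```python
import re

# Gerrit Change-Ids are a capital I followed by 40 hex digits.
CHANGE_ID_RE = re.compile(r'^Change-Id:\s*(I[0-9a-f]{40})\s*$', re.MULTILINE)


def gerrit_links(commit_message, base='https://review.opendev.org'):
    """Return one Gerrit query URL per Change-Id footer found in the message."""
    return ['{}/#/q/{}'.format(base, cid)
            for cid in CHANGE_ID_RE.findall(commit_message)]


# Dummy commit message; the Change-Id is made up for illustration.
message = """Example subject line

Change-Id: I0123456789abcdef0123456789abcdef01234567
"""
print(gerrit_links(message))
```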
Eighth_Doctor | it'd also be trivial to customize the template so that instead of showing a PR tab or issues tab, it'd give a link to gerrit for the project | 16:02 |
Eighth_Doctor | or storyboard for issues | 16:02 |
fungi | and yeah, we're working on getting the gerrit links displaying. they're in git notes, we just need to turn on displaying git notes in gitea now that it (i think?) added capability to display arbitrary notes refs | 16:02 |
Eighth_Doctor | Fedora's pagure instance does this to replace issues with a link to rhbz | 16:02 |
corvus | we did do that for gitea -- the "Proposed changes" tab links to gerrit | 16:02 |
Eighth_Doctor | https://src.fedoraproject.org/rpms/pagure | 16:02 |
fungi | we do have gitea configured to link to gerrit and storyboard or launchpad already | 16:02 |
Eighth_Doctor | oh nice, I guess I missed that piece | 16:03 |
Eighth_Doctor | I mainly looked at the zuul projects, since that's my main opendev interest atm :) | 16:03 |
Eighth_Doctor | but yeah, I see you already did that | 16:03 |
mordred | another nice thing about pagure - it's in python :) | 16:03 |
Eighth_Doctor | yeah :D | 16:03 |
Eighth_Doctor | also, another thing about pagure, docs are stored as a git repo :) | 16:04 |
fungi | yep, at https://opendev.org/zuul/zuul issues links off to https://storyboard.openstack.org/#!/project/zuul/zuul and proposed changes links to https://review.opendev.org/#/q/status:open+project:zuul/zuul | 16:04 |
Eighth_Doctor | (technically, same goes for issues and PR metadata, but you don't care about those) | 16:04 |
fungi | what docs are you talking about? | 16:04 |
Eighth_Doctor | project documentation (e.g. gh-pages, readthedocs, etc. stuff) | 16:05 |
fungi | ahh, well we already develop our documentation in git repos through code review anyway | 16:05 |
Eighth_Doctor | ah okay | 16:05 |
fungi | and use zuul to render/publish them | 16:05 |
openstackgerrit | Merged opendev/system-config master: Actually install patchset-created hook https://review.opendev.org/719187 | 16:05 |
Eighth_Doctor | well, then I guess the only thing left I have is pagure scales? | 16:05 |
Eighth_Doctor | it handles ~30K repos with ~10K concurrent users accessing performantly from one server (src.fedoraproject.org) | 16:06 |
Eighth_Doctor | and has means for scaling beyond that | 16:06 |
fungi | we're running 8 gitea servers behind a load balancer right now, but better clustering (especially for the code search functionality) would be nice, yes | 16:06 |
Eighth_Doctor | holy crap, 8?! | 16:07 |
Eighth_Doctor | I knew gitea wasn't great for scaling, but that's awful | 16:07 |
fungi | partly so that we can handle bursts of cloning activity better | 16:07 |
Eighth_Doctor | sure, makes sense | 16:07 |
fungi | they're usually under-utilized | 16:07 |
mordred | since you say that - I'm curious if pagure would be better at browsing the nova repo | 16:07 |
fungi | also gives us the ability to take some of them offline without impacting performance | 16:08 |
fungi | for upgrades et cetera | 16:08 |
Eighth_Doctor | is openstack/nova usually the problem child? | 16:08 |
mordred | well - it's the best example of a problem child | 16:08 |
fungi | yeah, that repo is large, has ~10 years of history, et cetera | 16:09 |
mordred | it is a large repo and gitea has had some issues with doing the right things caching its refs in the past | 16:09 |
Eighth_Doctor | well, let's see if I can even download it! :P | 16:09 |
Eighth_Doctor | we've hosted mirrors of the linux kernel reasonably well on pagure.io (which has less resources than src.fedoraproject.org) and I think I have a copy of mongodb pre sspl there | 16:10 |
mordred | Eighth_Doctor: does pagure handle operating in a cluster decently? like - if we wanted to run 8 pagures in a k8s but treat them as a single server? | 16:10 |
Eighth_Doctor | mordred: this is of comparable size: https://pagure.io/mongodb-agplv3 | 16:10 |
Eighth_Doctor | mordred: I personally do not know because I don't run pagure that way, but I know of users who are running it in OpenShift or Kubernetes and scaling the backend workers accordingly to handle the load well | 16:11 |
Eighth_Doctor | so far, I haven't heard any complaints | 16:11 |
Eighth_Doctor | there's a WIP helm chart PR for pagure, but neither I nor the other developers have experience with k8s enough to be able to do anything meaningful with it | 16:12 |
mordred | nod. I mean - the k8s part isn't as important as the being able to scale it horizontally part | 16:12 |
fungi | master branch of nova is nearly 60k commits at this point, looks like | 16:12 |
fungi | Eighth_Doctor: what's the typical server size for pagure, do you think? part of why we're running 8 backends for gitea is that they're each small virtual machines with like 8gb ram | 16:13 |
fungi | but we're also not nearly the repository count of fedora, only a little over 2k repositories at the moment | 16:14 |
Eighth_Doctor | I don't have the exact details, but I think the existing src.fedoraproject.org server is basically a VM with 4GB of RAM | 16:14 |
fungi | neat | 16:14 |
Eighth_Doctor | it might be 8GB of RAM now, but I know it's not a huge machine | 16:14 |
corvus | here's utilization of the individual gitea backends: http://cacti.openstack.org/cacti/graph_view.php | 16:15 |
corvus | click the 'gitea farm' on the left | 16:15 |
Eighth_Doctor | that's not too bad | 16:16 |
Eighth_Doctor | storage I can ignore, since those are synced | 16:16 |
corvus | looks like a median load average might be about 0.25, peaking at 2 | 16:16 |
Eighth_Doctor | I'm pretty sure the utilization levels are similar on src.fp.o | 16:16 |
mordred | if I'm reading https://docs.pagure.org/pagure/overview.html#pagure-workers right - in general there is expected to be one copy of the git repos on disk and pushing to those would be via a gitolite instance. then the pagure web interface is going to read from that filesystem copy via async worker tasks | 16:17 |
fungi | our typical activity levels would probably be handled with only a couple backends, but with some frequency people point high-volume ci systems at our git refs and start cloning hundreds of copies of repositories at the same time | 16:17 |
Eighth_Doctor | yep | 16:17 |
mordred | so if the filesystem were shared amongst workers, the read traffic looks like it would be pretty scalable | 16:17 |
corvus | i bet we could halve the cluster (to 4 8gb vms) with no significant impact to performance. more than that we'd probably have peak memory usage issues. | 16:17 |
Eighth_Doctor | this is essentially the characteristic for fedora | 16:18 |
Eighth_Doctor | we also have things like koschei, zuul, etc. constantly checking out and interacting with pagure API | 16:18 |
mordred | via scale out - but writes might still have a spof? | 16:18 |
Eighth_Doctor | and it's doing very well with just one server | 16:18 |
Eighth_Doctor | the only bottleneck is if you need to scale storage... but if you're operating in k8s, this is abstracted for you | 16:19 |
mordred | oh - we're not :) | 16:19 |
mordred | but - that's been a thing we've been looking at doing if we could get to a clustered solution for the git browsing | 16:19 |
Eighth_Doctor | ... then I'm confused about k8s? | 16:19 |
mordred | right now we replicate to all 8 machines independently | 16:19 |
Eighth_Doctor | oh... ouch | 16:20 |
mordred | we'd LIKE to have a single clustered system that we replicate to once | 16:20 |
mordred | but so far that's problematic | 16:20 |
Eighth_Doctor | that means you're inducing state sync load | 16:20 |
Eighth_Doctor | I've usually seen this solved with either shared nfs or gluster | 16:20 |
mordred | with cgit it was just impossible. with gitea there are some indexes that made single-machine assumptions that are in process of being fixed | 16:20 |
Eighth_Doctor | that's not to say other solutions aren't valid, but those are the two I usually see | 16:21 |
mordred | yeah- that was/is the gitea design - run the gitea cluster on top of a cephfs | 16:21 |
Eighth_Doctor | there is an option for sharding git storage in pagure | 16:21 |
mordred | but there were 2 things it was doing that were storing index files in the filesystem which needed to be abstracted out into plugin interfaces so they could store in a service | 16:21 |
Eighth_Doctor | but we don't use it in fedora right now and it needs some love | 16:22 |
Eighth_Doctor | https://github.com/repoSpanner/repoSpanner | 16:22 |
Eighth_Doctor | this does work with pagure, but the issue is that the sync penalty is too high in some cases | 16:22 |
mordred | that would be a cost in push right? | 16:23 |
Eighth_Doctor | there was some in-progress work for improve performance, but interest died off on completing it | 16:23 |
Eighth_Doctor | yes | 16:23 |
mordred | I'd LOVE to be able to scale without needing to run a shared filesystem | 16:23 |
Eighth_Doctor | repoSpanner was designed to avoid the shared filesystem requirement | 16:23 |
clarkb | yes memory is the major thing. You need about a gig of memory for each git operation on several of our repos | 16:23 |
Eighth_Doctor | because we don't use one in Fedora | 16:23 |
clarkb | as long as git is used regardless of frontend I don't expect that changes dramatically | 16:24 |
clarkb | then you add N operations and suddenly you need quite a bit of memory | 16:24 |
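The memory argument is simple multiplication, and corvus's earlier estimate about halving the farm to four 8 GB VMs can be sanity-checked the same way. Below is a rough sketch that uses clarkb's figure of about 1 GB per git operation on the large repos. The 2 GB reserved for the OS and the web frontend is an assumption for illustration, not a measured number.

```python
def concurrent_git_ops(backends, ram_gb_per_backend, gb_per_op=1.0, reserved_gb=2.0):
    """Rough ceiling on simultaneous heavy git operations across a cluster.

    gb_per_op comes from the ~1 GB figure in this discussion; reserved_gb is an
    assumed allowance for the OS and the web frontend on each backend.
    """
    per_backend = int((ram_gb_per_backend - reserved_gb) // gb_per_op)
    return backends * max(per_backend, 0)


for backends in (8, 4, 2):
    # With these assumptions: 8 -> 48, 4 -> 24, 2 -> 12
    print(backends, 'x 8 GB backends ->', concurrent_git_ops(backends, 8), 'concurrent ops')
```

This ignores load-balancer skew. With source-IP affinity, one busy client can exhaust a single backend long before the cluster-wide total is reached.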
openstackgerrit | Merged opendev/jeepyb master: Decode utf-8 from subprocess.Popen https://review.opendev.org/719188 | 16:24 |
clarkb | also note the split git repos aren't an issue as long as this is a read only frontend | 16:24 |
clarkb | it's going to be eventually consistent regardless due to how gerrit replication works | 16:25 |
mordred | infra-root: ok the jeepyb change landed - I'm about ready to try another restart | 16:25 |
Eighth_Doctor | so perhaps pagure + repospanner would work in your specific scenario | 16:25 |
clarkb | (so overcomplicating that to sync isn't worth much imo; using a fs that syncs for us is nice and simple) | 16:25 |
mordred | clarkb: yah - but ... it's possible running repoSpanner might be easier than running ceph | 16:26 |
Eighth_Doctor | mordred: _that_ I can say is true :) | 16:26 |
mordred | (if we got a ceph magically from someone already running one, using a ceph would be easier) | 16:26 |
Eighth_Doctor | isn't that how that always works? :) | 16:27 |
corvus | mordred: i have to run; i can check back in in a few hours, but i support you restarting if you're comfortable | 16:27 |
mordred | I'm comfortable | 16:27 |
fungi | thanks corvus | 16:27 |
fungi | and yeah, i'm around again | 16:28 |
mordred | corvus, fungi, clarkb: images have been pushed, ansible changes have applied | 16:28 |
mordred | I'm going to try another restart | 16:28 |
mordred | clarkb: we're in root screen on review if you wanna watch | 16:28 |
clarkb | I'm half around. Drinking tea and eating cornbread | 16:28 |
mordred | although the root screen itself isn't super exciting | 16:28 |
Eighth_Doctor | I don't know if you guys use ansible or something else for config management, but you can see Fedora's ansible role for pagure here: https://infrastructure.fedoraproject.org/cgit/ansible.git/tree/roles/pagure | 16:29 |
Eighth_Doctor | (the ansible repo hasn't yet moved to pagure.io) | 16:30 |
mordred | clarkb, fungi : gerrit looks like it's back UP | 16:31 |
Eighth_Doctor | CentOS also runs an instance and has an Ansible role: https://github.com/CentOS/ansible-role-pagure | 16:31 |
Eighth_Doctor | pagure uses MySQL on CentOS and PostgreSQL on Fedora ;) | 16:32 |
fungi | based on our present direction for evolution of our deployment methodology, we'd presumably consume docker images for the service components and then deploy those images with ansible | 16:32 |
mordred | yeah | 16:32 |
Eighth_Doctor | that's fine too :) | 16:33 |
fungi | either consume upstream-provided docker images, or (re)build our own with our ci system and then consume those | 16:33 |
Eighth_Doctor | we have Dockerfiles for pagure that we use primarily for dev and CI, but we don't currently publish any containers for prod | 16:34 |
Eighth_Doctor | so the latter would probably be the way for you to go | 16:34 |
mordred | yeah - that's what we do for gitea too - their docker images aren't structured for what we'd want in prod - are more focused on the AIO "I want to run it quickly on my laptop" use case | 16:34 |
mordred | which is an important use case | 16:35 |
mordred | but not what we're doing :) | 16:35 |
fungi | yep, that's how we're deploying gerrit in production, as of, well, today i suppose (if we don't have to roll back again) | 16:35 |
mordred | fungi: I'm going to roll this forward today if it kills me | 16:35 |
fungi | how about let's just not stick to deployment models which leave dead sysadmins in their wake | 16:35 |
mordred | ok fair | 16:36 |
Eighth_Doctor | pagure is packaged for Fedora, RHEL/CentOS via EPEL, Mageia, and openSUSE by me | 16:36 |
Eighth_Doctor | so if you want to play with it in a VM or a container, it's pretty easy to do ;) | 16:36 |
clarkb | "Here lies Mordred. A java program eventually got the best of him" | 16:36 |
Eighth_Doctor | RIP | 16:36 |
mordred | HOOKS HAVE RUN WITH NO TRACEBACKS | 16:38 |
mordred | I declare victory | 16:38 |
clarkb | mordred: fungi if there is a list of things to review I can help with that but probably not get into it beyond that | 16:38 |
* Eighth_Doctor sees mordred fall over in a heap | 16:39 | |
fungi | clarkb: i think we've got them all in now? can probably abandon the revert of the hooks update which didn't merge | 16:39 |
clarkb | cool will do that | 16:39 |
fungi | we can resurrect it if we decide we do have to roll back for reasons we can't correct immediately | 16:40 |
mordred | fungi: while I've got you - would you mind reviewing https://review.opendev.org/#/c/719088/ ? | 16:42 |
mordred | fungi: if you're ok with that - I'll delete the old ones and land it | 16:42 |
Eighth_Doctor | fungi, mordred: if you were interested in the k8s based approach: https://pagure.io/pagure/pull-request/4483 | 16:44 |
fungi | mordred: yep, cool will do | 16:45 |
Eighth_Doctor | and since we've been talking about performance, here's the info I gave the FSF for helping them set up a performant system for their forge based on pagure: https://lists.pagure.io/archives/list/pagure-devel@lists.pagure.io/message/SZ7GJ5P65Q76FRZIDNYFP3HI4RD4H6LT/ | 16:47 |
clarkb | oh that's the other performance related issue we do have. We have to use source ip based load balancing due to unshared git repos | 16:50 |
clarkb | because a fetch executing across different copies of the same logical repo can fail | 16:50 |
clarkb | (it depends on how objects are packed iirc) | 16:51 |
clarkb | and that has issues when large companies funnel through a single NAT IP | 16:52 |
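Source-IP affinity keeps every request from one client on one backend, which is what a multi-request git fetch against unshared repo copies needs. It also means everything behind one NAT address lands on the same backend. Below is a minimal sketch of the idea as a stable hash of the client address. The backend names are hypothetical, and real balancers (HAProxy's `balance source`, for example) use their own hashing, so this shows the behaviour rather than any particular implementation.

```python
import hashlib
from collections import Counter

BACKENDS = ['gitea%02d' % i for i in range(1, 9)]  # hypothetical backend names


def pick_backend(client_ip, backends=BACKENDS):
    """Map a client IP to a backend with a stable hash (same IP, same backend)."""
    # hashlib rather than hash(): Python's str hash is randomized per process.
    digest = hashlib.sha256(client_ip.encode('ascii')).digest()
    return backends[int.from_bytes(digest[:8], 'big') % len(backends)]


# 254 distinct client addresses spread across the pool...
distinct = Counter(pick_backend('198.51.100.%d' % i) for i in range(1, 255))
# ...while 500 clients behind one NAT address all hit a single backend.
natted = Counter(pick_backend('203.0.113.7') for _ in range(500))

print('distinct IPs used', len(distinct), 'backends:', dict(distinct))
print('NAT clients used', len(natted), 'backend:', dict(natted))
```

Shared or synced repository storage, as discussed next, removes the need for affinity. At that point a balancer could spread requests per connection instead.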
Eighth_Doctor | yup | 16:52 |
Eighth_Doctor | that might be where repoSpanner helps here | 16:55 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Write out db config for root user https://review.opendev.org/719192 | 16:56 |
Eighth_Doctor | assuming you want to have multiple storage replicas | 16:56 |
Eighth_Doctor | clarkb: I'm not sure, given your usage model, that repoSpanner would be necessary, but it would avoid the load balancing problem | 16:59 |
Eighth_Doctor | you could run one frontend app with some number of workers, and then have a repoSpanner cluster that handles the git storage | 16:59 |
openstackgerrit | Merged opendev/system-config master: Install ep_headings module https://review.opendev.org/719123 | 16:59 |
clarkb | Eighth_Doctor: ya any shared repo content or synced content would fix that I think | 16:59 |
fungi | as long as the shared backend guaranteed all frontends were serving exact same copies of the content at the same times | 17:02 |
mordred | clarkb, fungi: corvus suggested earlier that we should build an etherpad image instead of doing that ep_headings hack above and i agree | 17:05 |
mordred | I'll resurrect the child-image-building code in a bit | 17:05 |
fungi | mordred: i take it cron running track-upstream outside the container is fine? | 17:08 |
mordred | fungi: it actually runs it in a container :) | 17:09 |
fungi | huh... looking closer | 17:10 |
mordred | fungi: https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/gerrit/templates/track-upstream.j2 | 17:10 |
mordred | fungi: that gets installed into /usr/local/bin | 17:10 |
fungi | oh! so it does ;) | 17:10 |
fungi | also... has gerritbot config update gotten solved yet? i want to say we're still not seeing infra-manual changes in here since the namespace move | 17:11 |
mordred | it has not - I've got the first change up | 17:12 |
mordred | https://review.opendev.org/#/c/715635/ | 17:13 |
openstackgerrit | Merged opendev/system-config master: Add review and etherpad to backup group https://review.opendev.org/719036 | 17:13 |
openstackgerrit | Merged opendev/system-config master: Run ansible on the backup server https://review.opendev.org/719076 | 17:13 |
fungi | oh, cool | 17:16 |
fungi | right, we were containerizing it, i forgot | 17:17 |
fungi | +2 | 17:17 |
mordred | fungi: so many containers | 17:18 |
fungi | okay, christine's got a lengthy list of things i need to repair around the house, but i'll be in and out to keep tabs on gerrit in case we run into any more unforeseen problems | 17:18 |
mordred | fungi: cool. I think we're good though - it seems like we've finally finished this phase! | 17:29 |
openstackgerrit | Merged opendev/system-config master: Add root cron jobs to gerrit https://review.opendev.org/719088 | 17:46 |
fungi | here's hoping! | 17:48 |
mordred | fungi, clarkb, corvus: the backup playbook is not working | 18:27 |
mordred | my brain can't quite process it at the moment | 18:27 |
mordred | but we should fix it :) | 18:28 |
fungi | i'll see if i can figure it out in a bit | 18:37 |
fungi | once this leftover curry is gone ;) | 18:37 |
fungi | mordred: when you say "the backup playbook is not working" you mean the periodic pipeline job running the playbooks/service-backup.yaml playbook? | 19:40 |
mordred | fungi: I mean the playbook itself - if you look in /var/log/ansible/service-backup.yaml.log on bridge | 19:44 |
mordred | fungi: looking at the ansible it looks like there's maybe a mismatch in variable name - but I'm not 100% sure and I'm not 100% sure of the intent | 19:44 |
mordred | so the job is running the playbook fine - but the playbook itself is bombing out :) | 19:46 |
openstackgerrit | Sean McGinnis proposed openstack/project-config master: Make job template update best effort https://review.opendev.org/719308 | 19:47 |
mordred | fungi: oh! I think I might know what the issue is | 19:50 |
mordred | fungi: etherpad01.opendev.org is in the disabled list | 19:50 |
mordred | so it's not being run in the backup role - so it's not setting the bup_user variable | 19:51 |
fungi | d'oh | 19:51 |
mordred | BUT - we do with_inventory_hostnames in backup-server | 19:51 |
mordred | on 'backup' | 19:51 |
mordred | which does not subtract hosts in the disabled group | 19:51 |
mordred | ooh - it supports exclusion patterns | 19:52 |
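The fix described here relies on Ansible host patterns supporting exclusion, so the loop can skip hosts in the `disabled` group. The exact pattern used in change 719309 isn't shown in this log. The sketch below assumes something like `backup:!disabled` and models a small subset of pattern semantics in plain Python. It is not Ansible's code, and it ignores commas, wildcards, and regex patterns. Ansible's patterns documentation states that plain terms are applied first, then `&` intersections, then `!` exclusions, whatever their written order, and the model follows that ordering. The `backup` and `disabled` groups and `etherpad01.opendev.org` come from the discussion; `other-host.example` is a hypothetical placeholder.

```python
from typing import Dict, List, Set


def resolve_pattern(pattern: str, groups: Dict[str, Set[str]]) -> List[str]:
    """Resolve a simplified subset of Ansible host patterns.

    Terms are separated by ':'; a leading '&' intersects and a leading '!'
    excludes. Plain terms are applied first, then intersections, then
    exclusions, regardless of the order they are written in.
    """
    terms = [t for t in pattern.split(":") if t]
    plain = [t for t in terms if t[0] not in "&!"]
    intersect = [t[1:] for t in terms if t.startswith("&")]
    exclude = [t[1:] for t in terms if t.startswith("!")]

    hosts: Set[str] = set()
    for name in plain:
        hosts |= groups.get(name, set())
    for name in intersect:
        hosts &= groups.get(name, set())
    for name in exclude:
        hosts -= groups.get(name, set())
    return sorted(hosts)


if __name__ == "__main__":
    groups = {
        "backup": {"etherpad01.opendev.org", "other-host.example"},
        "disabled": {"etherpad01.opendev.org"},
    }
    # Plain 'backup' still includes the disabled etherpad host:
    print(resolve_pattern("backup", groups))
    # -> ['etherpad01.opendev.org', 'other-host.example']
    # Excluding 'disabled' drops it, so no bup_user lookup is needed for it:
    print(resolve_pattern("backup:!disabled", groups))
    # -> ['other-host.example']
```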
openstackgerrit | Monty Taylor proposed opendev/system-config master: Exclude disabled group from backup-server loop https://review.opendev.org/719309 | 19:54 |
mordred | fungi, corvus : ^^ | 19:54 |
mordred | this is an issue that will only arise if we have a server we backup disabled at a time when we have backup-server enabled | 19:54 |
mordred | like now | 19:54 |
mordred | also - I think we can unemergency etherpad | 19:55 |
mordred | but why don't I leave it in emergency so we can check that backup runs correctly in this scenario | 19:55 |
fungi | good call | 20:07 |
fungi | mordred: so... we don't want to backup servers which have config management disabled? | 20:09 |
mordred | fungi: well - we do - but we probably don't want to set up new backup info on them if they're disabled | 20:31 |
mordred | (or we can't, since we won't have run the corresponding stuff on the servers themselves - so there's potentially no user to connect to yet - which would be true in the case of etherpad) | 20:32 |
mordred | backups _themselves_ are via cron - but attempting to set up new backups while disabled == sad panda | 20:32 |
fungi | okay, so the bup_users set would only be used for initial configuration, not to decide which to run the backups for when already set up, got it | 20:33 |
fungi | the job name system-config-run-backup was mildly misleading | 20:34 |
fungi | now realizing it's infra-prod-service-backup i meant to be looking at | 20:35 |
fungi | and yeah, now i see in playbooks/roles/backup/tasks/main.yaml we're still configuring a cronjob, not triggering backups directly | 20:36 |
fungi | makes sense, thanks | 20:36 |
*** tosky has quit IRC | 23:24 | |
*** DSpider has quit IRC | 23:48 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!