clarkb | Meeting time in a minute or two | 18:59 |
---|---|---|
clarkb | It's been a while too :) | 18:59 |
ianw | o/ | 19:00 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Nov 30 19:01:02 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-November/000303.html Our Agenda | 19:01 |
clarkb | We have an agenda. | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | Gerrit User Summit is happening Thursday and Friday this week from 8am-11am pacific time virtually | 19:01 |
clarkb | If you are interested in joining, registration is free. I think they will have recordings too if you prefer to catch up out of band | 19:02 |
fungi | also there was a new git-review release last week | 19:02 |
clarkb | I intend to join as there is a talk on gerrit updates that I think will be useful for us to hear | 19:02 |
clarkb | yup, please update your git-review installation to help ensure it is working properly. I've updated already, as my git version updated locally and forced me to | 19:02 |
clarkb | I haven't had any issues with new git review yet | 19:03 |
fungi | git-review 2.2.0 | 19:03 |
fungi | i sort of rushed it through because an increasing number of people were upgrading to newer git, which the previous release was broken with | 19:03 |
clarkb | the delta to the previous release was small too so probably the right move | 19:03 |
fungi | but yeah, follow up on the service-discuss ml or in #opendev if you run into anything unexpected with it | 19:04 |
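
For anyone upgrading, a typical way to pull in the new release and confirm the installed version (assuming a pip-based install) would be:

```shell
# upgrade to the latest git-review release
pip install --upgrade git-review

# confirm which version is now on PATH
git review --version
```
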
clarkb | #topic Actions from last meeting | 19:05 |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-11-16-19.01.txt minutes from last meeting | 19:05 |
clarkb | I don't see any recorded actions | 19:05 |
clarkb | We'll dive right into the fun stuff then | 19:05 |
clarkb | #topic Topics | 19:05 |
clarkb | #topic Improving CD Throughput | 19:05 |
clarkb | sorry small network hiccup | 19:06 |
clarkb | A number of changes have landed to make this better while keeping our serialized one job after another setup | 19:07 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/807808 Update system-config once per buildset. | 19:07 |
clarkb | #link https://review.opendev.org/c/opendev/base-jobs/+/818297/ Reduce actions needed to be taken in base-jobs. | 19:07 |
ianw | yep, those are the last two | 19:08 |
clarkb | These are the last two updates to keep the status quo but prepare for parallel ops | 19:08 |
clarkb | Once those go in we can start thinking about adding/updating semaphores to allow jobs to run in parallel. Very exciting. Thank you ianw for pushing this along | 19:08 |
ianw | yep i'll get to that change soon and we can discuss | 19:08 |
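
A minimal sketch of the kind of Zuul semaphore configuration this would involve (the semaphore and job names here are illustrative, not the actual change):

```yaml
# define a semaphore limiting how many deploy jobs may run at once
- semaphore:
    name: infra-prod-deploy
    max: 1

# any job carrying this semaphore will wait for the others to finish
- job:
    name: infra-prod-service-example
    semaphore: infra-prod-deploy
```
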
clarkb | #topic Zuul multi scheduler setup | 19:09 |
clarkb | Just a note that a number of bug fixes have landed to zuul since we last restarted | 19:09 |
clarkb | I expect that we'll be doing a restart at some point soon to check everything is happy before zuul cuts a new release | 19:10 |
clarkb | I'm not sure if that will require a full restart and clearing of the zk state; corvus would know. Basically it is possible that this won't be a graceful restart | 19:10 |
fungi | after our next restart, it would probably be helpful to comb the scheduler/web logs for any new exceptions getting raised | 19:10 |
clarkb | s/graceful/no downtime/ | 19:10 |
corvus | yes we do need a full clear/restart | 19:10 |
clarkb | corvus: thank you for confirming | 19:11 |
fungi | i saw you indicated similar in matrix as well | 19:11 |
fungi | (for future 4.11 anyway) | 19:11 |
clarkb | and ya, generally be on the lookout for odd behaviors; our input has been really helpful to the development process here and we should keep providing that feedback | 19:11 |
corvus | i'd like to do that soon, but maybe after a few more changes land | 19:11 |
corvus | we should probably talk about multi web | 19:12 |
corvus | it is, amusingly, now our spof :) | 19:12 |
clarkb | corvus: are we thinking run a zuul-web on zuul01 as well then dns round robin? | 19:13 |
corvus | (amusing since it hasn't ever actually been a spof except that opendev only ever needed to run 1) | 19:13 |
corvus | that's an option, or a LB | 19:13 |
clarkb | if we add an haproxy that might work better for outages and balancing but it would still be a spof for us | 19:13 |
corvus | we might want to think about the LB so we can have more frequent restarts without outages | 19:13 |
clarkb | I guess the idea is haproxy will need to restart less often than zuul-web and in many cases haproxy is able to keep connections open until they complete | 19:14 |
fungi | dns round-robin is only useful for (coarse) load distribution, not failover | 19:14 |
frickler | do we have octavia available? is that in vexxhost? | 19:14 |
corvus | i figure if it's good enough for gitea it's good enough for zuul; we know that we'll want to restart zuul-web frequently, and there's a pretty long window when a zuul-web is not fully initialized, so a lb setup could make a big difference. | 19:14 |
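
A rough sketch of what a gitea-style layer-4 balancer in front of two web servers could look like (hostnames and ports are assumptions, and TLS would still terminate on the backends as it does today):

```
# haproxy sketch: TCP passthrough to two zuul-web hosts, like the gitea-lb setup
frontend zuul-web-https
    bind :::443 v4v6
    mode tcp
    default_backend zuul-web

backend zuul-web
    mode tcp
    option tcp-check
    server zuul01 zuul01.opendev.org:443 check
    server zuul02 zuul02.opendev.org:443 check
```
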
clarkb | frickler: I think it is available in vexxhost, but we don't host these services in vexxhost currently so that would add a large (~40ms?) rtt between the lb frontend and backend | 19:15 |
clarkb | corvus: good point re gitea | 19:15 |
fungi | on the other hand, if we need to take the lb down for an extended period, which is far less often, we can change dns to point directly to a single zuul-web while we work on the lb | 19:15 |
ianw | it's a bit old now, but https://review.opendev.org/c/opendev/system-config/+/677903 does the work to make haproxy a bit more generic for situations such as this | 19:16 |
fungi | or just build a new lb and switch dns to it, then tear down the old one | 19:16 |
ianw | (haproxy roles, not haproxy itself) | 19:16 |
clarkb | ianw: oh ya we'll want something like that if we go the haproxy route and don't aaS it | 19:17 |
corvus | ianw: is that for making a second haproxy server, or for using an existing one for more services? | 19:17 |
corvus | (i think it's option #1 from the commit msg) | 19:17 |
clarkb | corvus: I read the commit message as #1 as well | 19:17 |
ianw | corvus: iirc that was when we were considering a second haproxy server | 19:17 |
fungi | yeah, make it easier for us to reuse the system configuration, not the individual load balancer instances | 19:17 |
corvus | that approach seems good to me (but i don't feel strongly; if there's an aas we'd like to use that should be fine too) | 19:18 |
fungi | so that we don't end up with multiple almost identical copies of the same files in system-config for different load balancers | 19:18 |
clarkb | corvus: I think I have a slight preference for using our existing tooling for consistency | 19:19 |
clarkb | and separately if someone wants to investigate octavia we can do that and switch wholesale later (I'd be most concerned about using it across geographically distributed systems with disparate front and back ends) | 19:19 |
fungi | though for that we'd probably be better off with some form of dns-based global load balancing | 19:20 |
fungi | granted it can be a bit hard on the nameservers | 19:20 |
fungi | (availability checks driving additions and removals to a dedicated dns record/zone) | 19:21 |
fungi | requires very short ttls, which some caching resolvers don't play nicely with | 19:21 |
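
For illustration, such a health-check-driven setup is just round-robin address records with a very short TTL that the monitoring adds and removes (addresses below are placeholders from the documentation range):

```
; hypothetical zone snippet: records are added/removed as backends pass/fail checks
zuul.opendev.org. 60 IN A 203.0.113.10
zuul.opendev.org. 60 IN A 203.0.113.11
```
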
corvus | ok, i +2d ianw's change; seems like we can base a zuul-lb role on that | 19:22 |
clarkb | sounds good, anything else zuul related to go over? | 19:22 |
corvus | i'll put that on my list, but it's #2 on my opendev task list, so if someone wants to grab it first feel free :) | 19:23 |
corvus | (and that's all from me) | 19:24 |
clarkb | #topic User management on our systems | 19:24 |
clarkb | The update to irc gerritbot here went really well. The update to matrix-gerritbot did not. | 19:24 |
clarkb | It turns out that matrix-gerritbot needs a cache dir in $HOME/.cache to store its dhall intermediate artifacts | 19:24 |
clarkb | and that didn't play nicely with the idea of running the container as a different user, as it couldn't write to $HOME/.cache. I had thought I had bind mounted everything it needed and that it was all read only, but that wasn't the case. To make things a bit worse, the dhall error log messages couldn't be written because the image lacked a utf8 locale and the error messages had utf8 characters | 19:25 |
clarkb | tristanC has updated the matrix-gerritbot image to address these things so we can try again this week. I need to catch back up on that. | 19:25 |
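
A sketch of the kind of compose entry the retry would need, with the dhall cache bind mounted writable while the config stays read only (image reference, uid, and paths are assumptions for illustration):

```yaml
services:
  matrix-gerritbot:
    image: quay.io/example/matrix-gerritbot   # placeholder image reference
    user: "11000:11000"                       # run as the dedicated service uid
    environment:
      HOME: /home/gerritbot
    volumes:
      - /etc/matrix-gerritbot:/config:ro                        # config stays read only
      - /var/lib/matrix-gerritbot/cache:/home/gerritbot/.cache  # writable dhall cache
```
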
clarkb | One thing I wanted to ask about is whether or not we'd like to build our own matrix-gerritbot images using docker instead of nix so that we can have a bit more fully featured image as well as understand the process | 19:26 |
clarkb | I found the nix stuff to be quite obtuse myself and basically punted on it as a result | 19:26 |
clarkb | (the image is really interesting: it sets a bash prompt but no bash is installed, there is no /tmp (I tried to override $HOME to /tmp to fix the issue and that didn't work), etc) | 19:27 |
clarkb | I don't need an answer to that in this meeting but wanted to call it out. Let me know if you think that is a good or terrible idea once you have had a chance to ponder it | 19:28 |
fungi | i agree, it's nice to have images which can be minimally troubleshot at least | 19:28 |
ianw | it wouldn't quite fit our usual python-builder base images, though, either? | 19:29 |
clarkb | ianw: correct, it would be doing very similar things but with haskell and cabal instead of python and pip | 19:29 |
clarkb | ianw: we'd do a build in a throwaway image/layer and then copy the resulting binary into a more minimal haskell image | 19:29 |
clarkb | s/haskell/ghc/ I guess | 19:29 |
clarkb | https://hub.docker.com/_/haskell is the image we'd probably use | 19:30 |
clarkb | I don't think we would need to maintain the base images, we could just FROM that image a couple of times and copy the resulting binary over | 19:30 |
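
A rough multi-stage sketch of that approach; the executable name, haskell tag, and build command are assumptions, and a slimmer runtime base could shrink it further:

```dockerfile
# build stage: compile the bot with cabal in the full haskell image
FROM haskell:9.0 AS builder
WORKDIR /src
COPY . .
RUN cabal update && cabal install exe:matrix-gerritbot --installdir=/output

# runtime stage: reuse the haskell image (or a smaller base) and set a utf8
# locale so error messages with utf8 characters can actually be written
FROM haskell:9.0
ENV LANG=C.UTF-8
COPY --from=builder /output/matrix-gerritbot /usr/local/bin/matrix-gerritbot
ENTRYPOINT ["/usr/local/bin/matrix-gerritbot"]
```
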
clarkb | We can move on. I wanted to call this out and get people thinking about it so that we can make a decision later. It isn't urgent to decide now as it isn't an operational issue at the moment | 19:31 |
clarkb | #topic UbuntuOne two factor auth | 19:31 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-November/000298.html Using 2fa with ubuntu one | 19:31 |
fungi | at the beginning of last week i started that ml thread | 19:31 |
fungi | i wanted to bring it up again today since i know a lot of people were afk last week | 19:32 |
fungi | so far there have been no objections to proceeding, and two new volunteers to test | 19:32 |
clarkb | I have no objections, if users are comfortable with the warning in the group description I think we should enroll those who are interested | 19:32 |
fungi | even though we haven't really made a call for volunteers yet | 19:32 |
ianw | (i think i approved one already, sorry, after not reading the email) | 19:32 |
fungi | no harm done ;) | 19:33 |
clarkb | ya, it was hrw; I think hrw was aware of the concerns after working at canonical previously | 19:33 |
clarkb | an excellent volunteer :) | 19:33 |
fungi | i just didn't want to go approving more volunteers or asking for volunteers until we seemed to have some consensus that we're ready | 19:33 |
clarkb | I think so; it's been about a year and I have yet to have a problem in that time | 19:33 |
fungi | i'll give people until this time tomorrow to follow up on the ml as well before i more generally declare that we're seeking volunteers to help try it out | 19:34 |
clarkb | sounds like a plan, thanks | 19:34 |
frickler | I guess I can't be admin for that group without being member? | 19:35 |
fungi | frickler: correct | 19:35 |
clarkb | frickler: I think that is correct due to how lp works | 19:35 |
fungi | i'm also happy to add more admins for the group | 19:36 |
frickler | o.k., not a blocker I'd think, but I'm not going to join at least for now | 19:36 |
clarkb | One thing we might need to clarify with canonical/lp/ubuntu is what happens if someone is removed from the group | 19:36 |
clarkb | and until then don't remove anyone? | 19:36 |
fungi | i'll make sure to mention that in the follow-up | 19:37 |
fungi | maybe hrw knows, even | 19:37 |
ianw | it does seem like from what it says it's a one-way ticket, i was treating it as such | 19:37 |
ianw | but good to confirm | 19:37 |
clarkb | ianw: yup, that is why I asked because if we add more admins they need to be aware of that and not remove people potentially | 19:37 |
clarkb | it may also be the case that the enrollment happens on the backend once and then never changes regardless of group membership | 19:38 |
clarkb | We have a couple more topics so lets continue on | 19:38 |
clarkb | #topic Adding a lists.openinfra.dev mailman site | 19:38 |
clarkb | #link https://review.opendev.org/818826 add lists.openinfra.dev | 19:38 |
clarkb | fungi: I guess you've decided it is safe to add the new site based on current resource usage on lists.o.o? | 19:39 |
clarkb | One thing I'll note is that I don't think we've added a new site since we converted to ansible. Just be on the lookout for anything odd due to that. We do test site creation in the test jobs though | 19:39 |
fungi | yeah, i've been monitoring the memory usage there and it's actually under less pressure after the ubuntu/python/mailman upgrade | 19:39 |
clarkb | you'll also need to update DNS over in the DNS-as-a-service, but that is out of band and it is safe to land this before that happens | 19:40 |
fungi | for some summary background, as part of the renaming of the openstack foundation to the open infrastructure foundation, there's a desire to move the foundation-specific mailing lists off the openstack.org domain | 19:40 |
fungi | i'm planning to duplicate the list configs and subscribers, but leave the old archives in place | 19:40 |
clarkb | fungi: is there any concern for impact on the mm3 upgrade from this? I guess it is just another site to migrate but we'll be doing a bunch of those either way | 19:41 |
fungi | and forward from the old list addresses to the new ones of course | 19:41 |
fungi | yeah, one of the reasons i wanted to knock this out was to reduce the amount of list configuration churn we need to deal with shortly after a move to mm3 when we're still not completely familiar with it | 19:42 |
clarkb | makes sense. I think you've got the reviews you need, so approve when ready I guess :) | 19:42 |
fungi | so the more changes we can make before we migrate, the more breathing room we'll have after to finish coming up to speed | 19:42 |
clarkb | Anything else on this topic? | 19:42 |
fungi | nope, thanks. i mainly wanted to make sure everyone was aware this was going on so there were few surprises | 19:43 |
clarkb | thank you for the heads up | 19:43 |
clarkb | #topic Proxying and caching Ansible Galaxy in our providers | 19:43 |
clarkb | #link https://review.opendev.org/818787 proxy caching ansible galaxy | 19:43 |
clarkb | This came up in the context of tripleo jobs needing to use ansible collections and having less reliable downloads | 19:44 |
fungi | right | 19:44 |
clarkb | I think we set them up with zuul github projects they can require on their jobs | 19:44 |
fungi | yes, we added some of the collections they're using, i think | 19:44 |
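
For reference, consuming a collection that way is just a required-projects entry pointing at the github connection (the job name below is made up; ansible.posix is one example collection):

```yaml
- job:
    name: example-tripleo-collections-job   # illustrative job name
    required-projects:
      # zuul checks the repo out onto the node, so the job can install the
      # collection from the prepared source instead of downloading from galaxy
      - github.com/ansible-collections/ansible.posix
```
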
clarkb | Is the proxy cache something we think we should move those ansible users to? or should we continue adding github projects? | 19:44 |
clarkb | or do we need some combo of both? | 19:44 |
fungi | that's my main question | 19:45 |
fungi | one is good for integration testing, the other good for deployment testing | 19:45 |
fungi | if you're writing software which pulls things from galaxy, you may want to exercise that part of it | 19:45 |
clarkb | corvus: from a zuul perspective I know we've struggled with the github api throttling during zuul restarts. Is that something you think we should try to optimize by reducing the number of github projects in our zuul config? | 19:45 |
clarkb | fungi: I think you still point galaxy at a local file dir url. And I'm not sure you gain much testing galaxy's ability to parse file:/// vs https:/// | 19:46 |
corvus | clarkb: i don't know if that's necessary at this point; i think it's worth forgetting what we knew and starting a fresh analysis (if we think it's worthwhile or is/could-be a problem) | 19:46 |
corvus | much has changed | 19:46 |
clarkb | corvus: got it | 19:46 |
clarkb | At the end of the day adding the proxy cache is pretty low effort on our end. But the zuul required projects should be far more reliable for jobs. And since we are already doing that I sort of lean that direction | 19:47 |
clarkb | But considering the low effort to run the caching proxy I'm good with doing both and letting users decide which tradeoff is best for them | 19:48 |
fungi | yeah, the latter means we need to review every new addition, even if the project doesn't actually need to consume that dependency from arbitrary git states | 19:48 |
fungi | with the caching proxy, if they add a collection or role from galaxy they get the benefit of the proxy right away | 19:49 |
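
Jobs would opt in by pointing the galaxy client at the region-local mirror, along these lines (the URL path is an assumption based on how the other proxy caches are laid out):

```ini
# ansible.cfg on the test node, using the region-local proxy cache
[galaxy]
server = https://mirror.dfw.rax.opendev.org/galaxy/
```
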
clarkb | good point. I'll add this to my review list for after lunch and we can roll forward with both while we sort out github connections in zuul | 19:49 |
clarkb | Anything else on this subject? | 19:49 |
fungi | but i agree that if the role or collection is heavily used then having it in the tenant config is going to be superior for stability | 19:49 |
fungi | i didn't have anything else on that one | 19:50 |
clarkb | #topic Open Discussion | 19:50 |
clarkb | We've got 10 minutes for any other items to discuss. | 19:50 |
fungi | you had account cleanups on the agenda too | 19:50 |
clarkb | ya but there isn't anything to say about them. I've been out and no time to discuss them | 19:50 |
fungi | for anyone reviewing storyboard, i have a couple of webclient fixes up | 19:51 |
clarkb | It's a bit aspirational at this point :/ I need to block off a solid day or three and just dive into it | 19:51 |
fungi | #link https://review.opendev.org/814053 Bindep cleanup and JavaScript updates | 19:51 |
fungi | that solves bitrot in the tests | 19:51 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/819733 upgrade Gerrit to 3.3.8 | 19:51 |
fungi | and makes it deployable again | 19:51 |
clarkb | Gerrit made new releases and ^ updates our image so that we can upgrade | 19:51 |
clarkb | Might want to do that during a zuul restart? | 19:51 |
fungi | yeah, since we need to clear zk anyway that probably makes sense | 19:52 |
fungi | #link https://review.opendev.org/814041 Update default contact in error message template | 19:52 |
fungi | that fixes the sb error message to point users to oftc now instead of freenode | 19:52 |
fungi | can't merge until the tests work again (the previous change i mentioned) | 19:52 |
ianw | oh i still have the 3.4 checklist to work through. hopefully can discuss next week | 19:53 |
clarkb | ianw: 819733 does update the 3.4 image to 3.4.2 as well. We may want to refresh the test system on that once the above change lands | 19:53 |
clarkb | The big updates in these new versions are to reindexing, so that's something that might actually impact the upgrade | 19:53 |
clarkb | sounds like they added a bunch of performance improvements | 19:54 |
ianw | iceweasel ... there's a name i haven't heard in a while | 19:54 |
fungi | especially since it essentially no longer exists | 19:54 |
ianw | clarkb: ++ | 19:54 |
clarkb | Last call, then we can all go eat $meal | 19:56 |
ianw | kids these days wouldn't even remember the trademark wars of ... 2007-ish? | 19:56 |
fungi | i had to trademark uphill both ways in the snow | 19:57 |
clarkb | ianw: every browser is Chrome now too | 19:57 |
clarkb | #endmeeting | 19:57 |
opendevmeet | Meeting ended Tue Nov 30 19:57:59 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:57 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2021/infra.2021-11-30-19.01.html | 19:57 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-11-30-19.01.txt | 19:57 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2021/infra.2021-11-30-19.01.log.html | 19:57 |
clarkb | Thank you everyone | 19:58 |
clarkb | We'll see you here next week | 19:58 |
fungi | thanks clarkb! | 19:58 |
corvus | and app too (thanks electron!) | 19:58 |
fungi | same bat time, same bat channel! | 19:58 |
ianw | https://en.wikipedia.org/wiki/Mozilla_software_rebranded_by_Debian - 2006 - i'll take 2007 as a pretty good guess off the top of my head :) | 19:59 |
ianw | i remember installing it on an Itanium desktop, so that constrained the timeline a bit | 19:59 |
clarkb | wow itanium | 20:00 |
ianw | gosh that was a long time ago! | 20:00 |
clarkb | we had a couple racks of itanium servers when I was at Intel. I think they were largely idle because by that point in time everyone knew the arch wasn't going anywhere | 20:01 |
ianw | oh those were the days. this was pre amd64 (which was pre x86-64!) so just about everything had weird issues related to 64-bit pointers | 20:01 |
ianw | there was ~ nobody running gnome, etc. on 64-bit in those days | 20:02 |
clarkb | ianw: not even on sparc? | 20:02 |
ianw | maybe enthusiasts would play with things on a sparc, or an alpha multia etc. but generally you'd run x11 and something more basic like fvwm | 20:03 |
clarkb | I guess solaris had CDE, so probably not many gnome users. But by the time opensolaris happened gnome was the default iirc | 20:04 |
clarkb | we had a lot of sparc stuff at the university | 20:04 |
ianw | yeah, sparc was fun and coveted hardware if you could find it. i feel like people fiddling with the alternative archs were also more bsd-ish. a lot of netbsd going around for alpha and sparc at the time | 20:07 |
ianw | i don't know why i coveted 200lb boxes full of jet engine fans, but it was a different time :) | 20:09 |
fungi | i did have a 64-bit sparc (sunstation) and 64-bit mips (sgi indy) | 20:14 |
fungi | s/sunstation/sparcstation/ | 20:14 |
fungi | (the sunstations also existed but i didn't have one) | 20:15 |
fungi | er, no, the sgi indy was 32-bit mips, but i did have a dec alpha as well | 20:16 |
fungi | i eventually swapped out the sparcstation for sun t1-105 rackmount servers because they were more compact and drew less power | 20:17 |