19:01:19 <clarkb> #startmeeting infra
19:01:19 <openstack> Meeting started Tue Jul 30 19:01:19 2019 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:20 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:22 <openstack> The meeting name has been set to 'infra'
19:01:24 <ianw> o/
19:01:35 <clarkb> Good morning ianw
19:01:45 <clarkb> #link http://lists.openstack.org/pipermail/openstack-infra/2019-July/006431.html
19:01:52 <clarkb> #topic Announcements
19:02:14 <clarkb> I think everyone is having too much fun debugging web browser behavior. That is ok I don't have any announcements
19:02:49 <fungi> heh, "fun"
19:03:58 <clarkb> #topic Actions from last meeting
19:04:13 <clarkb> mordred: any progress on the github actions?
19:04:26 <clarkb> fwiw hogepodge was asking about zuul mirror on github because someone was asking him
19:04:26 <mordred> oh it's meeting time
19:04:28 <mordred> no
19:04:36 <clarkb> so we may want to get that sorted soon to make it clearer to people
19:04:46 <mordred> yes
19:05:04 <clarkb> #action mordred create opendevadmin github account
19:05:08 <mordred> sorry - I keep bouncing off the github api part of it - I'll try harder to not bounce off
19:05:10 <clarkb> #action mordred clean up openstack-infra github org
19:05:24 <clarkb> mordred: is the scope of the problem small enough that we can click buttons? we had a lot of repos so probably not?
19:05:46 <mordred> probably not - given how many repos we have
19:06:01 <clarkb> ya I guess that would be a lot of clicking
19:06:13 <clarkb> #topic Priority Efforts
19:06:17 <corvus> what do we need to do there?
19:06:22 <clarkb> #undo
19:06:23 <openstack> Removing item from minutes: #topic Priority Efforts
19:06:40 <corvus> we're what, force-pushing "moved" messages to the repos?
19:06:48 <clarkb> corvus: archive all of the repos under https://github.com/openstack-infra
19:06:53 <mordred> oh - you know ... yeah - I think I might have been fixating too much on the archive thing
19:07:01 <corvus> we're archiving them?
19:07:05 <mordred> just force-pushing "we've moved" is a bit easier
19:07:05 <clarkb> well I think archive does what we want?
19:07:22 <clarkb> it gives a banner that says "this repo is RO and archived and doesn't live here anymore" type of message
19:07:31 <corvus> does it say where it does live?
19:07:32 <mordred> how about I start with the force-push - and we can come back to the archive later?
19:07:41 <fungi> i'd be fine with just deleting them, but maybe i'm not thinking of the users
19:07:54 <clarkb> corvus: I'm not sure if the message is configurable by the user
19:08:05 <mordred> force-push (or really even just a regular push) is super easy and I can have that done pretty quickly
19:08:06 <corvus> fungi: anything without a new location is no better than a delete from my pov
19:08:11 <clarkb> mordred: but ya pushing something that says the content is over there works too, and then if we archive it that will just make it a bit more explicit
19:08:11 <fungi> the archive message does not appear to be configurable when i looked
19:08:18 <mordred> yeah
19:08:22 <mordred> so - I'll work on force-push
19:08:26 <fungi> so it would need to be readme/description/something
19:08:35 <mordred> then later, when we feel like it, we can archive or not archive as we feel like
19:08:44 <clarkb> ok
19:08:45 <mordred> fungi: yeah - I'm thinking a normal retire-this-repo type commit
19:08:53 <mordred> leaving only a README with a "this is now elsewhere"
19:09:02 <mordred> pointing to the elsewhere
19:09:04 <corvus> okay, then the 2 options i like are: 1) push readme + archive; 2) delete.  the one thing i don't want to do is archive without readme.
19:09:22 <mordred> I think push readme is the absolute easiest
19:09:37 <corvus> (because archive without readme looks even more like the project terminated than just deleting it)
19:10:18 <clarkb> sounds like 1) is the current plan. mordred any reason that you don't think 1) will work?
19:10:54 <corvus> cool, let's action that explicitly so we don't forget which way we decided to "clean up" the org :)
19:10:58 <mordred> nope. I am very confident in 1 - or at least the 1st part - and I think the second part is a get-to-it-later thing - since that's a new feature that hasn't even existed that long
19:11:21 <mordred> so I don't think we have to say "we must use the new gh feature that lets you mark a repo readonly" - since those are readonly anyway
19:11:36 <clarkb> #action mordred Push commits to repos under github.com/openstack-infra to update READMEs explaining where projects live now. Then followup with github repo archival when that can be scripted.
19:11:48 <corvus> \o/
19:11:50 <mordred> ++
19:11:50 <corvus> thanks mordred
19:11:58 <clarkb> #topic Priority Efforts
19:12:00 <fungi> yeah, the deeper i looked into the "archive" option the less convinced i became that it's super useful
19:12:04 <fungi> (for us)
19:12:06 <mordred> ++
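A rough sketch of what option 1 above could look like against the GitHub API with PyGithub; the token, README filename, and redirect text are assumptions, and this rewrites the README through the contents API rather than force-pushing a single-commit history the way mordred describes:

```python
# Sketch only: point every repo in the old org at its new home, with the
# "archive later" step left as a comment.  Not the actual script mordred ran.
from github import Github

gh = Github("ORG_TOKEN")  # hypothetical token with admin rights on the org
org = gh.get_organization("openstack-infra")

NOTE = ("This project has moved.\n\n"
        "See https://opendev.org/ for the new location of this repository.\n")

for repo in org.get_repos():
    try:
        readme = repo.get_contents("README.rst")
        repo.update_file(readme.path, "Point at opendev.org", NOTE, readme.sha)
    except Exception:
        # No existing README (or a different filename); create one instead.
        repo.create_file("README.rst", "Point at opendev.org", NOTE)
    # The later archive step discussed above would just be:
    # repo.edit(archived=True)
```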
19:12:12 <clarkb> #topic OpenDev
19:12:35 <clarkb> fungi has done a bunch of work over the last week or so to replace our gitea backends. Thank you fungi!
19:12:58 <clarkb> These gitea backends all have 80GB of disk now instead of 40GB. That should give us plenty of room for growth as repos are added and otherwise have commits pushed to them.
19:13:20 <clarkb> This process has also added 8GB swapfiles to address the OOMing we saw and the images we used have properly sized ext4 journals
19:13:26 <clarkb> that should make the servers happier
19:13:57 <fungi> and improve performance as well as stability/integrity
19:14:05 <clarkb> Some things we have learned in the process: the /var/gitea/data/git/.gitconfig.lock files can go stale when the servers have unhappy disks. When that happens all replication to that host fails
19:14:37 <clarkb> If we end up doing bulk replications in the future we want to double check that file isn't stale (check timestamp and dmesg -T for disk errors) before triggering replication
19:14:54 <clarkb> unfortunately from gerrit's perspective replication succeeds and it moves on, but the data doesn't actually update on the gitea server
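A minimal sketch of that pre-replication sanity check, run on a gitea backend; the one-hour staleness threshold and the dmesg error strings are illustrative assumptions:

```python
# Sketch: flag a stale /var/gitea/data/git/.gitconfig.lock before kicking off
# a bulk re-replication, and look for the disk errors that tend to strand it.
import os
import subprocess
import time

LOCK = "/var/gitea/data/git/.gitconfig.lock"
STALE_AFTER = 3600  # seconds; arbitrary threshold for "nothing live holds this"

if os.path.exists(LOCK):
    age = time.time() - os.stat(LOCK).st_mtime
    if age > STALE_AFTER:
        print(f"{LOCK} is {age:.0f}s old -- likely stale, clean up before replicating")

dmesg = subprocess.run(["dmesg", "-T"], capture_output=True, text=True).stdout
if any(marker in dmesg for marker in ("I/O error", "blk_update_request")):
    print("dmesg shows disk errors -- fix the host before triggering replication")
```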
19:15:14 <clarkb> fungi: ^ anything else you think we should call out from that process?
19:15:21 <corvus> we should also try to figure out where the sshd logs are going and fix that
19:15:26 <clarkb> ++
19:15:37 <clarkb> there is a good chance they go to syslog like the haproxy logs
19:15:39 <corvus> cause i think that would have helped debugging (we learned about the problem via strace)
19:15:47 <clarkb> and if we mount /dev/log into the container we'll get the logs on the host syslog
19:15:59 <fungi> yeah, nothing other than we also added health checks in haproxy
19:16:00 <corvus> or maybe we can add an option to send them to stdout/stderr?
19:16:06 <fungi> and got it restarting on config updates
19:16:07 <corvus> because we run sshd in the foreground
19:16:28 <corvus> though i don't know if that is compatible with its child processes
19:16:34 <clarkb> -e      Write debug logs to standard error instead of the system log.
19:16:39 <clarkb> that might work
19:16:46 <fungi> yeah, or i think sshd_config can be set for it
19:16:51 <fungi> either one
19:17:14 <clarkb> changes to the docker file should all be tested
19:18:05 <clarkb> #action infra Update gitea sshd container to collect logs (either via stderr/stdout or syslog)
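For reference, both logging options floated above are small changes to the sshd container definition. A hypothetical compose fragment, not the real opendev gitea deployment config, might look like:

```yaml
# Hypothetical fragment only -- image name and config paths are assumptions.
gitea-ssh:
  image: gitea-openssh
  # Option 1: -e sends sshd's log output to stderr so it shows up in
  # `docker logs` (sshd already runs in the foreground with -D).
  command: /usr/sbin/sshd -D -e -f /etc/ssh/sshd_config
  # Option 2: bind-mount the host's /dev/log so sshd logs land in the host
  # syslog next to the haproxy logs.
  volumes:
    - /dev/log:/dev/log
```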
19:19:01 <clarkb> #topic Update Configuration Management
19:19:17 <clarkb> mordred: has been doing a bunch of work to build docker images for gerrit
19:19:33 <clarkb> I think we should now (or very soon once jobs run) have docker images for gerrit 2.13, 2.15, 2.16, and 3.0
19:19:53 <clarkb> mordred: is the next step in that process to redeploy review-dev using the 2.13 image?
19:20:30 <mordred> clarkb: yes.
19:20:48 <mordred> well - the next step will be writing a little config management to do that
19:20:51 <mordred> but yes
19:20:55 <clarkb> exciting
19:21:53 <clarkb> ianw: on the ansible'd mirror side of things I had to reboot the new mirror in fortnebula that fungi built recently to get the afs mounts in place (it was failing the ansible afs sanity checks prior to that)
19:22:18 <clarkb> ianw: have all the fixes related to that merged? I thought I had approved the one fix you had called out. But maybe we need to add a modprobe to the ansible?
19:22:35 <ianw> yes, afaik
19:23:07 <clarkb> one thing it could be is if dkms only builds the module for the latest kernel on the host but we haven't rebooted into that kernel yet?
19:23:08 <fungi> yeah, afs clients will need a reboot partway through configuration (or maybe just a modprobe)
19:23:29 <fungi> i believe dkms will do it for all installed kernels by default
19:23:31 <ianw> in CI testing we do build and then test straight away
19:24:06 <clarkb> hrm behavior difference between cloud images and nodepool images maybe?
19:24:13 <clarkb> something to look at closer when we build more opendev mirrors I guess
19:24:14 <fungi> maybe when ansible installed the openafs packages it didn't wait long enough for the dkms builds to complete?
19:24:23 <fungi> maybe longer because more kernels?
19:24:33 <clarkb> fungi: it should wait for apt-get to return
19:24:39 <ianw> maybe ... the dpkg doesn't return till it's done
19:24:55 <ianw> hard to say at this point, as you say something to watch with a new server
19:25:15 <ianw> i didn't notice this doing the other rax openafs based opendev.org servers, iirc
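A sketch of the kind of bringup check that could confirm the openafs dkms module actually matches the running kernel before the ansible sanity checks try to mount AFS; the module name and the dkms output parsing are assumptions based on the usual Debian/Ubuntu packaging:

```python
# Sketch: check that the openafs dkms build exists for the running kernel,
# and modprobe it if so; otherwise a rebuild or reboot is needed.
import subprocess

running = subprocess.run(["uname", "-r"], capture_output=True, text=True).stdout.strip()
status = subprocess.run(["dkms", "status", "openafs"], capture_output=True, text=True).stdout

if running in status and "installed" in status:
    # Module is built for this kernel; a modprobe should be enough, no reboot.
    subprocess.run(["modprobe", "openafs"], check=True)
else:
    print(f"openafs dkms module not built for {running}; "
          "rebuild it or reboot into the kernel it was built for")
```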
19:25:48 <clarkb> Any other configuration management updates/bugs to call out?
19:27:04 <clarkb> Sounds like no
19:27:09 <clarkb> #topic Storyboard
19:27:22 <clarkb> fungi: diablo_rojo: anything to call out for storyboard?
19:27:39 <fungi> i don't think we have anything new and exciting this week
19:27:55 <diablo_rojo> Nothing new, except to beg again for sql help
19:28:25 <diablo_rojo> mordred, would you have some time in the next week or so to look at the query logs and suggest some changes?
19:28:42 <diablo_rojo> Better yet, make some changes..
19:29:13 <mordred> diablo_rojo: yes - I will look at them in the next week or so
19:29:31 <diablo_rojo> mordred, thank you thank you thank you
19:29:37 <fungi> i can try to follow the earlier example of generating a fresh analysis of the slow query log and publishing that, at least
19:29:52 <fungi> but my sql-fu doesn't run very deep
19:30:43 <fungi> so beyond a git grep for some of the combinations of field names to see where those queries could be coming from (or guessing based on what they look like they're trying to do) i don't know that i'll be able to run many of them down
19:31:16 <clarkb> does sqlalchemy have a way to annotate queries with that info?
19:31:34 <clarkb> (I'm guessing no because sql)
19:31:59 <clarkb> like comments would just get thrown out before the slow query log ever sees them
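On the annotation question: sqlalchemy can prepend a marker to every emitted statement via an event hook; whether the comment then survives into mysql's slow query log is exactly the open question raised above. A hedged sketch, not something storyboard currently does:

```python
# Sketch: tag every emitted statement with its origin so slow-query-log
# entries (if the server keeps comments) can be traced back to code.
from sqlalchemy import create_engine, event

engine = create_engine("mysql+pymysql://user:pass@localhost/storyboard")  # placeholder DSN

@event.listens_for(engine, "before_cursor_execute", retval=True)
def add_origin_comment(conn, cursor, statement, parameters, context, executemany):
    # conn.info can carry a request-scoped hint; fall back to a fixed tag.
    tag = conn.info.get("query_origin", "storyboard-api")
    return "/* origin:%s */ %s" % (tag, statement), parameters
```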
19:33:23 <clarkb> Sounds like that may be it? let's move on
19:33:26 <clarkb> #topic General Topics
19:33:40 <clarkb> Trusty server update progress
19:33:42 <clarkb> #link https://etherpad.openstack.org/p/201808-infra-server-upgrades-and-cleanup
19:33:53 <fungi> clarkb got wiki-dev puppeting!
19:33:55 <clarkb> I think puppet is running successfully on wiki-dev02 now
19:34:13 <fungi> yeah, it's probably time to delete it and boot another one now
19:34:14 <clarkb> fungi: what is the next step there? testing that the wiki works?
19:34:31 <clarkb> ah ok so start from fresh, make sure that works, then maybe copy data from prod wiki ?
19:34:33 <fungi> just to make sure things still get puppeted smoothly from the get-go
19:34:38 <clarkb> ++
19:34:44 <fungi> and then exercise it and try adding a copy of the db
19:35:07 <clarkb> sounds good
19:35:08 <fungi> i can try to poke at that some now that the gitea replacements are behind us
19:35:31 <clarkb> Separately corvus has been adding features to make browsing logs in zuul's web ui possible
19:35:45 <clarkb> Those changes merged yesterday. And today there has been much debugging
19:36:02 <clarkb> I think this is the last major step in getting us to a point where logs.o.o can be a swift container
19:36:07 <clarkb> corvus: ^ anything to add to that?
19:36:32 <corvus> at some point, this will become the default report url for jobs
19:36:50 <corvus> so instead of linking directly to the log server when we report a job back, we'll link to zuul's build page
19:37:18 <fungi> that will be great
19:37:20 <clarkb> It should give people more of what they expect if they've used a tool like travisci previously
19:37:23 <corvus> ideally, i think we'd want to make that switch before we switch to swift (from a UX pov), but technically they are orthogonal
19:37:54 <clarkb> ya I think this work makes the swift change far less painful for our users because the zuul experience should remain the same
19:38:16 <corvus> (because i think the transition of osla -> zuul-js is better than osla -> swift static -> zuul-js)
19:38:56 <corvus> so yeah, i think debugging this, then flipping that switch, then swift are the next major steps
19:38:57 <fungi> also the lack of autoindex if we point them directly at logs in swift, right?
19:39:10 <fungi> this way zuul acts as the file index
19:39:13 <clarkb> fungi: we actually do get autoindex in swift if you toggle the feature
19:39:18 <fungi> oh, cool
19:39:21 <clarkb> but the way zuul does it should be much nicer
19:39:24 <corvus> fungi: we have static index generation to compensate for that, but i'd rather not use it
19:39:25 <fungi> we just previously used swifts which didn't
19:39:47 <fungi> oh, also good point, i forgot we worked out uploading prebuilt file indices
19:39:48 <corvus> well, we actually put some static generation into the swift roles so we didn't have to rely on that
19:40:33 <corvus> but yeah, neither that, nor the static severity stuff is better than osla, but i think that zuul-js is (or, at the least, no worse).  so if we can do it in the order i propose, that's better
19:40:43 <clarkb> ++
19:40:47 <fungi> i concur
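The autoindex toggle clarkb mentions is swift's staticweb middleware. A hedged sketch of turning it on for a log container with python-swiftclient; the auth details and container name are placeholders:

```python
# Sketch: enable staticweb directory listings on a swift log container.
# Assumes the cloud deploys the staticweb middleware; names are placeholders.
from swiftclient.client import Connection

conn = Connection(
    authurl="https://cloud.example.com/v3", user="zuul-logs", key="secret",
    os_options={"project_name": "zuul", "user_domain_name": "Default",
                "project_domain_name": "Default"},
    auth_version="3")

conn.post_container("zuul_logs", headers={
    # Ask swift to generate index pages for directory-like prefixes.
    "X-Container-Meta-Web-Listings": "true",
})
```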
19:41:08 <clarkb> Next up on the agenda is a quick cloud status update
19:41:12 <fungi> one remaining blocker to doing them in that order is to add https for logs.openstack.org, yeah?
19:41:37 <clarkb> fungi: ya or we serve the zuul links with http://
19:41:44 <corvus> fungi: ah, yes.  or remove https from zuul.  :|
19:42:07 <clarkb> I think we can do https://logs.opendev.org easily
19:42:11 <clarkb> then set the CORS headers appropriately
19:42:21 <corvus> we should double check all the swifts we plan to use
19:42:41 <corvus> oh, someone said that most of those should be under the swift api anyway, right?  so should be https?
19:42:50 <clarkb> corvus: yes I would expect them to be https
19:43:00 <clarkb> we should double check but I'm not super worried about it
19:43:18 <corvus> that would only leave rax, and technically that needs a bit more work anyway to support the cdn stuff
19:43:36 <corvus> (er, i mean that would leave rax as something to investigate)
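One way to do the "double check all the swifts" step: probe each candidate endpoint for https plus a usable CORS response. The URL and Origin value are placeholders, and the exact headers the zuul dashboard needs are an assumption:

```python
# Sketch: confirm candidate log endpoints are https and answer a CORS
# preflight in a way a dashboard served from another origin can use.
import requests

ENDPOINTS = [
    "https://swift.example-provider.example/v1/AUTH_zuul/zuul_logs",  # placeholder
]

for url in ENDPOINTS:
    resp = requests.options(
        url, headers={"Origin": "https://zuul.opendev.org"}, timeout=10)
    allow = resp.headers.get("Access-Control-Allow-Origin")
    print(f"{url}: https={url.startswith('https://')} "
          f"allow-origin={allow!r} status={resp.status_code}")
```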
19:44:52 <clarkb> On the cloud test resources side of things donnyd has rebuilt the fortnebula control plane and we've run into some networking trouble with the mirror after that. I'll look at that after the meeting
19:45:17 <clarkb> Once that is sorted out I think we are hopefully near a longer term setup with that cloud which we can use going forward
19:45:32 <clarkb> mordred: any news on MOC enabling service tokens?
19:45:38 <clarkb> (or whatever that term is I'm searching for)
19:47:08 <clarkb> We must've lost mordred
19:47:15 <clarkb> I'll try to followup on that after the meeting too
19:47:36 <clarkb> Last up is PTG planning. Still quite a bit early but if you have any ideas you can put them up at:
19:47:38 <clarkb> #link https://etherpad.openstack.org/p/OpenDev-Shanghai-PTG-2019
19:49:04 <clarkb> corvus: related to ^ I'm going to start sorting out the gitea stuff too
19:49:53 <corvus> clarkb: thanks
19:50:04 <clarkb> And that was the agenda
19:50:10 <clarkb> #topic Open Discussion
19:51:07 <corvus> did folks see the ml post about rget?
19:51:17 <corvus> i know there were some replies
19:51:27 <corvus> sounds like we have consensus to proceed
19:51:43 <clarkb> did not hear anyone say it was a bad idea
19:51:50 <fungi> yeah, to reiterate here, it sounds like a worthwhile thing to participate in
19:51:54 <corvus> #link http://lists.openstack.org/pipermail/openstack-discuss/2019-July/008107.html
19:52:40 <corvus> i'm a little unclear about the githubiness of that...
19:52:53 <corvus> from what i can see, it doesn't look like it should be an issue
19:53:11 <corvus> but the authors seem to think maybe there's a little more work?  https://github.com/merklecounty/rget/issues/1
19:53:33 <clarkb> I think they've done some magic to handle github urls specially
19:53:38 <clarkb> but my read of it was that this wasn't required
19:53:46 <clarkb> we can feed it a tarballs.o.o url and validate that
19:54:06 <corvus> at any rate, it seems like they don't want to require github, so any issues we run into we should be able to resolve
19:54:11 <clarkb> oh that issue contradicts my read of it
19:54:17 <clarkb> but ya seems they'll be likely to fix that for us
19:54:31 <corvus> yeah, i think i'm going to go stick a static file on a private server and see what breaks :)
19:55:05 <clarkb> the way the certs end up in the certificate transparency log shouldn't prevent any domain from working
19:55:13 <fungi> i have personal projects where i already publish sha256sums i can easily test with as well
19:55:15 <corvus> (maybe the client isn't the big issue?  maybe it's the something the server does?)
19:55:19 <clarkb> they do mangle the path component and the hostname though
19:55:28 <clarkb> and with github it isn't a 1:1 I guess that is what they have to sort out
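For context, rget's model is that publishers post SHA256SUMS next to their artifacts and anchor them in the certificate transparency log, and the client re-verifies the digest on download. A hand-rolled sketch of just the digest-check half, with placeholder URLs (this is not rget itself):

```python
# Sketch: the checksum-verification half of what rget automates for a
# release artifact; the CT-log anchoring rget adds on top is not shown.
import hashlib
import urllib.request

ARTIFACT = "https://tarballs.opendev.org/example/example-1.0.0.tar.gz"    # placeholder
SUMS = "https://tarballs.opendev.org/example/example-1.0.0.sha256sums"    # placeholder

data = urllib.request.urlopen(ARTIFACT).read()
digest = hashlib.sha256(data).hexdigest()

published = urllib.request.urlopen(SUMS).read().decode()
if digest not in published:
    raise SystemExit(f"checksum mismatch for {ARTIFACT}: {digest} not in published sums")
print("checksum ok")
```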
19:58:03 <clarkb> I'll let everyone go get breakfast/lunch/dinner now.
19:58:06 <clarkb> Thank you all!
19:58:09 <clarkb> #endmeeting