17:00:05 #startmeeting opendev-maint
17:00:06 Meeting started Fri Apr 10 17:00:05 2020 UTC and is due to finish in 60 minutes. The chair is corvus. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:07 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:09 The meeting name has been set to 'opendev_maint'
17:00:23 ha, apparently it's opendev_maint :)
17:00:44 #status notice etherpad.openstack.org will be offline for about 30 minutes while it is migrated to a new server with a new hostname; see http://lists.opendev.org/pipermail/service-announce/2020-April/000003.html
17:00:44 corvus: sending notice
17:01:05 * mordred is in a screen on etherpad01.opendev.org
17:01:21 joined
17:01:39 joined as well
17:01:59 k. I'm ready to rock and roll there - somebody else want to stop existing etherpad?
17:02:03 (
17:02:10 * clarkb is joining
17:02:16 i'll stop existing etherpad
17:02:20 I'm going to warn everybody - it's like watching paint dry in the screen once this is running
17:02:40 oh, for the db dump/source pipeline?
17:02:41 I've joined
17:02:42 yup
17:03:29 old etherpad is stopped
17:03:35 ok. I'm going to run the command
17:03:47 it is running
17:04:01 neat, old etherpad is running a puppetlabs mcollectived server
17:04:03 whatever that is
17:04:10 corvus: finished sending notice
17:04:30 WOW
17:04:33 mordred: is etherpad running on the new server?
17:04:38 corvus: it should not be
17:04:44 I only started the mariadb service
17:04:53 cool, i confirm that's the case :)
17:05:14 mcollective was puppet's message bus for doing orchestration-like tasks
17:05:17 should we start the dns change now?
17:05:37 i believe we should change the etherpad.openstack.org cname to point to etherpad.opendev.org ?
17:05:45 yeah - I think that's a good idea
17:06:08 i'll get started on that while clarkb and fungi confirm :)
17:06:29 ++
17:07:43 yes definitely
17:07:53 to give the change time to propagate
17:08:17 presumably the plan is to delete the existing a/aaaa rrs for etherpad.openstack.org and replace it with a cname to etherpad.opendev.org
17:09:02 etherpad.openstack.org is currently a cname for etherpad01
17:09:07 etherpad.openstack.org is currently a cname for etherpad01.openstack.org
17:09:17 i was going to change it to be a cname for etherpad.opendev.org
17:09:33 so the result will be etherpad.openstack.org -> etherpad.opendev.org -> etherpad01.opendev.org
17:09:39 corvus: ++
17:09:41 corvus: I think that's correct
17:10:09 ahh, right, so just update the cname, even easier
17:10:11 there's just one problem; i don't see etherpad.openstack.org in the list of records in the rax web ui
17:10:20 it was there when i changed the ttl a few days ago
17:10:36 scroll all the way to the end and then keyword search?
17:10:37 is there some kind of limit?
17:10:44 the rax records are paged and sorted by type
17:10:50 fungi: that is my usual procedure which i have done
17:10:51 it only pages in some at a time and you have to scroll
17:10:55 weird
17:10:57 ahh, i can try
17:11:02 the length of the db backup is making me think about this. What's the disk situation like on the new server? it has a 50GB volume and is currently using ~3GB of that for the prod db?
17:11:04 also - https://review.opendev.org/#/c/718764 can be landed now
17:11:12 wait i found it
17:11:13 standing down!
17:11:22 also ^F doesn't work properly
17:11:22 ctrl-f was not bringing it up
17:11:23 corvus: once it's loaded it's about 30G of data
17:11:30 gah
17:11:32 clarkb: ^^
17:11:34 mordred: is 50GB big enough?
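The "command" running above is the database dump/source pipeline mordred warned would be slow to watch. The log never records the exact invocation, so the following is only a rough sketch of that pattern; the old database hostname and the reliance on a client config for credentials are assumptions, while the etherpad-lite database name matches the datadir path quoted later in the log.

    # Rough sketch only; not the actual command run during the maintenance.
    # Dump the database from the old server and stream it straight into the
    # new mariadb service. Credentials are assumed to come from ~/.my.cnf.
    mysqldump --single-transaction --skip-lock-tables \
        -h old-etherpad-db.example.org etherpad-lite \
      | mysql etherpad-lite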
17:11:38 but scrolling to it, it shows up (and it's highlighted)
17:11:44 that's what the volume was on the old one
17:11:49 mordred: ah ok
17:12:03 and we can always attach another volume and grow the lv
17:12:21 now that I've said ^ and checked lvs I'm far less worried :)
17:12:28 ++
17:12:44 * fungi checks paint, still sticky
17:12:49 that said - I was totally a shemp when I attached that volume so the lv has a stupid name
17:13:04 #info updated etherpad.openstack.org. CNAME from etherpad01.openstack.org. to etherpad01.opendev.org.
17:13:23 i left the ttl at 300
17:13:45 cool
17:14:00 do we have an ssl cert for etherpad.openstack.org on etherpad01.opendev.org?
17:14:08 yeah, i already tested that bit
17:14:22 cool, i thought so, just running through things again :)
17:14:28 if you want to watch the db size grow:
17:14:30 ls -ltrah /var/etherpad/db/etherpad@002dlite/store.ibd
17:14:36 on etherpad01.opendev.org
17:15:22 ya and the LE verification failed the first time around because dns wasn't set up properly to verify that the first time
17:15:24 X509v3 Subject Alternative Name: DNS:etherpad.opendev.org, DNS:etherpad.openstack.org, DNS:etherpad01.opendev.org
17:15:29 according to openssl
17:15:50 woot
17:16:28 etherpad.openstack.org. 300 IN CNAME etherpad.opendev.org.
17:16:28 etherpad.opendev.org. 299 IN CNAME etherpad01.opendev.org.
17:16:28 etherpad01.opendev.org. 218 IN A 104.130.124.120
17:16:37 that's what i get from dig now
17:17:10 corvus: looks perfect
17:17:45 and cool, the http redirect is working
17:17:58 (because apache is up; it's just the eplite service that's down)
17:19:27 while we're waiting - it occurred to me recently - is having apache on the host rather than in a docker container and in the compose file the right choice? would it make more sense to run it as an apache container as well?
17:20:29 mordred: ya I was thinking about that back when I thought refstack might grow some momentum again. I think if we want to go away from using host networking, having a host-run webproxy is nice, though it could be the one host network container too
17:20:59 right, i tested the redirect yesterday as well, albeit with the etherpad service down and apache serving an error for it
17:21:12 so looks like what i got from my local /etc/hosts edit
17:21:33 clarkb: yeah - I was thinking about it from a "what would be different about these container services if we decided to roll out k8s"
17:21:55 mordred: if we rolled out k8s we'd probably use the nginx ingress controller for a good chunk of that ?
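The SAN list quoted at 17:15:24 was read "according to openssl"; the exact command isn't recorded in the log, but a typical way to do that kind of check against the new server looks like the sketch below (the hostnames are taken from the discussion, everything else is assumed).

    # Connect to the new server while presenting the old name via SNI, then
    # print the certificate's subject alternative names. One common way to do
    # this check; not necessarily the command that was actually used.
    echo | openssl s_client -connect etherpad01.opendev.org:443 \
        -servername etherpad.openstack.org 2>/dev/null \
      | openssl x509 -noout -text \
      | grep -A1 'Subject Alternative Name'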
17:22:00 i'm ambivalent about whether we run apache in a container or not; if we did, we could still use host networking
17:22:04 though services like etherpad need rewriting which I don't know that it can do
17:22:31 clarkb: we would use *some kind* of ingress controller, not necessarily the nginx one, depending on what our load balancer situation was like
17:22:38 fair
17:22:39 and many of them can rewrite
17:22:39 clarkb: yeah - I think we can still run apache behind the ingress controller in those cases - so that we don't have to rewrite all of our rewrites
17:22:48 but also - cloud load balancers are a thing too
17:23:19 when we did the gitea setup, we used a cloud load balancer that attached to the exposed service of each pod running
17:23:49 and that cloud load balancer was running haproxy not nginx :)
17:23:53 that said - in our current clouds we can do the same thing only with nginx ingress if we use VRRP to manage which thing owns the VIP
17:24:19 if we don't want to rely on a cloud load balancer
17:24:46 I know that it's possible to create VRRP-enabled ports in neutron in vexxhost
17:25:27 mordred: ya the basic requirement is being able to control a shared l2 network between the instances with the 3 IPs on that network
17:26:05 though maybe you don't even need the third ip on that network if you can vrrp separately? it's been a while since I had to do vrrp
17:26:08 here's an ingress controller config for gke with a path mapping (to /, but the syntax is there to imagine other roots); so it's doing layer 7 load balancing -- https://gerrit.googlesource.com/zuul/ops/+/refs/heads/master/k8s/zuul.yaml#315
17:27:14 clarkb: yeah, technically you can have vrrp/hsrp/carp use only two addresses (though a third makes it somewhat easier)
17:29:28 corvus: so that ingress setup seems like it's mapping a single external ip to the resources?
17:30:43 mordred: I think it's a name not an ip
17:30:50 (so they could do magic with dns potentially)
17:31:08 kubernetes.io/ingress.global-static-ip-name: "zuul-static-ip"
17:31:13 is what I was keying off of
17:31:52 mordred: yes, it's a single pre-allocated static ip
17:32:10 (i previously ran "gcloud get me a static ip named zuul-static-ip")
17:32:15 ah
17:32:21 so it's referencing cloud resources outside of k8s
17:32:24 nod. so pattern-wise (ignoring mechanics for a sec) - that would potentially map to the sorts of things we'd want to do
17:32:24 yep
17:33:40 so figuring out the equivalent pattern for us inside of a k8s in openstack would be a key piece if we wanted to explore using k8s for services instead of compose
17:37:03 we are at 13GB used
17:41:33 and now 15GB - this paint is sticky
17:44:02 yeah
17:44:06 "wet data, do not touch"
17:44:10 seems to be running slower today
17:48:40 it is a holiday
17:48:46 we're expecting it to be how big?
17:48:55 ~30gb clarkb said?
17:48:56 30g right?
17:49:04 ya that's what mordred said above
17:49:10 oh, got it
17:49:57 so we're 36 minutes away from completion
17:50:54 status notice The etherpad migration is still in progress; revised estimated time of completion 18:30 UTC
17:50:57 should we send that?
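The zuul.yaml linked at 17:26:08 is a GKE ingress doing layer 7 routing behind a pre-allocated static IP referenced by name. As a rough, purely illustrative sketch of that pattern (the service name, port, host, and static IP name below are invented, and the linked config predates the networking.k8s.io/v1 API shown here):

    # Illustrative only: a single named cloud IP fronting layer 7 path routing
    # to a backend service, i.e. the shape discussed above. All names made up.
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: etherpad
      annotations:
        kubernetes.io/ingress.global-static-ip-name: "etherpad-static-ip"
    spec:
      rules:
        - host: etherpad.opendev.org
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: etherpad
                    port:
                      number: 9001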
17:51:17 yeah, warranted
17:51:19 ++
17:51:22 #status notice The etherpad migration is still in progress; revised estimated time of completion 18:30 UTC
17:51:22 corvus: sending notice
17:51:29 i'm going to afk for about 30m
17:52:22 once maintenance is concluded, it may be time to prepare for my annual viewing of "the life of brian"
17:53:33 I'll be making a tunafish sandwich for lunch when this is done
17:53:39 fungi, clarkb: while you're waiting: https://review.opendev.org/#/c/718764/
17:54:41 and actually - I think we can not land that yet
17:54:48 corvus: finished sending notice
17:55:15 and land it once we take etherpad out of the emergency file to ... no, that's too laggy. nevermind me
17:55:16 https://review.opendev.org/#/c/719051/ another good one to review though it had a post failure
17:55:20 I think we can land it whenever
17:57:30 clarkb: and this one remote: https://review.opendev.org/719053 Set env vars pointing to correct file locations
18:01:57 and remote: https://review.opendev.org/719052 Fix issues from rolling out containers
18:02:10 infra-root db migration done
18:02:21 I might have been wrong about db size
18:02:33 or there were a lot of zeroes at the end
18:02:39 or newer mysql is more compact
18:02:56 I think actually 32G of free space on device is what I was looking at :)
18:02:57 so ready to start up the container?
18:03:01 yeah - I think so
18:03:19 any last concerns?
18:04:03 none for me
18:04:04 none from me
18:04:08 k. here we go
18:04:41 k. I reloaded an openstack etherpad, it redirected to opendev and all is good
18:05:04 i reconnected to a pad i already had open and got sent to the right (new) place
18:05:09 we might want to keep our eyes on this as it gets usage - might need to tune the my.cnf settings
18:05:32 didn't even reload, just clicked the reconnect button from when it got disconnected during the shutdown
18:05:56 we did at least incorporate the apache tuning we had on the old deployment, right?
18:06:27 yeah
18:06:40 innodb_buffer_pool_size = 256M is the one I think might be applicable
18:06:51 tested out a few more pads, not seeing any problem yet
18:06:54 mordred: thinking it may need to be bigger?
18:06:55 but honestly, 256M of hot data isn't bad
18:07:26 and ya I think individual etherpads tend to be pretty small. It's the history data that grows (I wonder if we can tune it to prefer the newer pad data)
18:07:57 it'll do that naturally - the buffer pool will only contain the most recently touched pages
18:08:20 so I think it should be fine
18:09:09 in other news, my new dowel-style rolling pin has arrived
18:11:30 have fun! i still just use a boring old marble cylinder roller
18:11:41 but i like the extra weight
18:12:27 are you saying I'm fat?
18:13:12 heh
18:13:35 that post failure was due to an rsync failure fwiw
18:13:42 mordred's approval seems to have rechecked it
18:14:29 do we need to send an all clear now? and maybe end the meeting?
18:14:40 not sure what other work there is to do other than following up on gerrit jeepyb things
18:18:33 I think we should end the meeting - don't know if we need an all clear
18:18:36 I think this one is good
18:19:09 we might need to restart etherpad to pick up the settings.json update - but that should be a thing that can just be done - within the margin of error of an internet-facing service's connectivity
18:19:37 oh - we need to take etherpad01.opendev.org out of emergency - shall I do that?
18:19:41 ++
18:19:58 and then sometime next week clean up the old server and db?
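For reference, innodb_buffer_pool_size mentioned at 18:06:40 is a [mysqld]-section option, so the tuning being discussed is a one-line config change; where that file lives in the new container-based deployment isn't shown in the log, and this fragment simply restates the current value.

    # Hypothetical MariaDB config fragment; only the buffer pool value comes
    # from the discussion above. It can be raised later if the working set of
    # hot pad data outgrows 256M.
    [mysqld]
    innodb_buffer_pool_size = 256M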
probably after we have backups running for the new server?
18:20:17 no - we need to land ...
18:20:36 https://review.opendev.org/#/c/719036/
18:20:39 and then ... one sec
18:20:40 mordred: are we missing an equivalent of https://opendev.org/opendev/system-config/src/branch/master/modules/openstack_project/templates/gerrit_patchset-created.erb ?
18:21:58 mordred: comment on https://review.opendev.org/#/c/719036/1
18:22:22 nevermind, found it at https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/gerrit/templates/patchset-created.j2
18:23:14 clarkb: updated - and pushed up 2 additional
18:24:02 fungi: oh - I missed that one in the patch didn't I?
18:24:10 mordred: yeah, i commented
18:24:20 since it's a template it's not in the same directory
18:24:27 ++
18:24:35 though maybe make it not a template?
18:24:39 o/
18:25:18 it's only templated so we can toggle the welcome message feature on the existence or absence of a welcome_message_gerrit_ssh_private_key value
18:25:33 mordred: and do we expect that to noop for review01.openstack.org? I guess since it's already configured?
18:25:35 fungi: yeah - which doesn't exist on review-dev I think
18:25:42 which i expect was more transitional or for the benefit of people who might reuse our hook scripts
18:25:56 clarkb: I think the backup group is intended to be a normal group for servers we back up?
18:26:00 anyway, yeah, drop the conditional, move to files, add envvar exports
18:26:03 mordred: aha got it
18:26:06 the backup-server is the only one we only run some times
18:26:13 also I accidentally added a +W on that group change. I've removed that
18:26:14 (see the two followup patches)
18:26:47 fungi: no - I think review-dev doesn't have that key
18:27:03 fungi: we'd need to add one for it - and a welcome message user
18:27:51 that said ...
18:30:03 fungi: I updated it - I think you'll like it now
18:32:02 corvus: does the stack at https://review.opendev.org/#/c/719077/ look right to you?
18:33:13 mordred: yeah -- though what was the conclusion about puppet managing backups on review?
18:33:29 (have we confirmed that's gone?)
18:33:42 those would be cron jobs right?
18:33:52 mordred: yes cron jobs
18:33:59 and since puppet isn't running it's not managing it
18:34:12 would mostly just be ensuring ansible applies the same or similar cron jobs and bup config
18:34:23 yeah. let me remove the bup cronjob
18:34:35 there's also 2 other cronjobs we have for root we need to add to ansible
18:34:41 but I'll leave them for now
18:34:51 until we have the patch to replace them
18:35:23 k. bup cronjob on review01.opendev.org has been removed - we should expect ansible to add one now
18:35:33 lemme make a patch to add the others
18:35:58 service-backup should apply it
18:36:07 when you add the server to the backup group
18:36:18 (I don't know what triggers that playbook though)
18:40:24 clarkb: well - we have a patch to trigger all playbooks on inventory changes
18:40:30 that hasn't landed
18:40:39 https://review.opendev.org/719088 <-- gerrit cron jobs
18:41:09 clarkb: I take it back - inventory changes trigger everything now: https://review.opendev.org/71908
18:41:25 clarkb: so adding things to and removing them from groups should cause the backup playbook to run
18:41:38 k
18:41:55 mordred: that link is missing a digit
18:42:57 clarkb: https://review.opendev.org/#/c/717114/ is what I meant
18:43:50 specifically line 1716 of that change covers this case
18:43:56 yeah
18:44:55 hah
18:47:40 looks like it's time to end the meeting
18:47:50 #endmeeting
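As background to the bup cron job removed above (and expected back via the service-backup playbook), such a job is typically a periodic index-and-save run against a remote backup server. The sketch below is only an illustration of that pattern, not the real crontab entry, which isn't shown in the log; the schedule, excludes, backup server name, and save name are all invented, and bup is assumed to already be initialized on both ends.

    # Hypothetical root crontab entry illustrating a bup index + save cycle.
    # Not the actual OpenDev job; all specifics here are made up.
    0 5 * * * bup index -ux --exclude /proc --exclude /sys / && bup save -r backup01.opendev.org: -n review01 /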