Wednesday, 2016-01-20

fungimordred: do you happen to know which provider/region test-mordred-config-drive is in (or an easy way for me to find out)?00:01
funginevermind, it's also ord00:02
mordredfungi: you can always edit /var/cache/ansible-inventory/<tab> and look at the metadata in there associated with test-mordred-config-drive (in the future)00:03
mordredthere's probably a cooler way to do that00:03
mordredbut that's what I do :)00:03
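
[Editor's sketch] The lookup mordred describes could be scripted roughly as follows. This assumes the cache file is JSON in Ansible's dynamic-inventory layout (a top-level `_meta` key containing `hostvars`); the log does not confirm the file's format. The function name, the sample data, and the `region` key are hypothetical and only there for illustration.

```python
import json


def find_host_metadata(inventory_text, hostname):
    """Return the hostvars dict for hostname, or None if it is absent.

    Assumes Ansible dynamic-inventory JSON: {"_meta": {"hostvars": {...}}}.
    """
    inventory = json.loads(inventory_text)
    hostvars = inventory.get("_meta", {}).get("hostvars", {})
    return hostvars.get(hostname)


# Hypothetical sample data; a real cache file would be read from disk.
sample = json.dumps({
    "_meta": {
        "hostvars": {
            "test-mordred-config-drive": {"region": "ORD"},
        }
    }
})

print(find_host_metadata(sample, "test-mordred-config-drive"))
```

In practice the text would come from whichever file tab completion finds under /var/cache/ansible-inventory/.
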
fungilooks like maybe the openstack_project::server class isn't applied to our afs servers, which explains why they didn't get my sources.list fix00:05
fungijeblair: ^00:05
fungiwas that intentional or an oversight or do you even remember?00:05
fungisorry, s/probably/definitely/00:06
jeblairfungi: i don't recall; mordred ^?00:06
mordredI also do not remember00:06
fungithey're using template instead of server00:06
fungilooks like maybe it was just a mistake since server is a very thin wrapper around template?00:07
clarkbshould I go ahead and reboot nodepool to pick up the code changes and patch kernel?00:18
fungiclarkb: yeah that can happen. double-check you see that new package installed first00:20
fungiclarkb: though worth noting, node utilization is dropping like a stone on the graphs00:21
fungiand it's not clear to me that it's due to a drop in workload00:21
fungioh, it's ticked back up now. it may have been one of those "zuul spent an hour handling a reconfigure"00:26
clarkbaptitude show linux-image-3.13.0-76-generic State: installed00:30
clarkbso I think I am going to go ahead and do nodepool00:30
clarkbin progress now00:33
clarkbLinux 3.13.0-76-generic #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux00:34
clarkbdeficit calculations just happened so I think nodepool is running, will keep an eye on it00:35
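
[Editor's sketch] The post-reboot check clarkb does by eye (comparing `uname -a` output against the expected kernel) might be automated along these lines. The expected release string comes from the log above; the helper names and the regular expression are assumptions, not anything the team is known to use.

```python
import re

# Release string taken from the log; everything else here is illustrative.
EXPECTED_RELEASE = "3.13.0-76-generic"

# Matches Ubuntu-style kernel releases such as "3.13.0-76-generic".
RELEASE_RE = re.compile(r"\b\d+\.\d+\.\d+-\d+-[a-z]+\b")


def release_from_uname(uname_line):
    """Extract the kernel release from a line of `uname -a` output."""
    match = RELEASE_RE.search(uname_line)
    return match.group(0) if match else None


def kernel_is_current(uname_line, expected=EXPECTED_RELEASE):
    """True if the running kernel reported by uname is the expected one."""
    return release_from_uname(uname_line) == expected


line = ("Linux 3.13.0-76-generic #120-Ubuntu SMP Mon Jan 18 15:59:10 "
        "UTC 2016 x86_64 x86_64 x86_64 GNU/Linux")
print(release_from_uname(line), kernel_is_current(line))
```

A host that has the package installed but has not yet rebooted would still report its old release, which is why the check is made after the reboot.
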
fungisorry, had to bring a new mattress up a couple flights of stairs, back now00:44
fungiso anyway, 269930 has merged. once i see the updates propagate to the afs servers i'll catch them up00:45
fungii've gotten review-dev updated. checking into pypi.region-b.ord-1 now00:46
fungiit probably just doesn't have enough disk space to apt-get anything00:46
fungier, pypi.region-b-geo-100:47
fungihrm, no, it has disk space00:47
fungioh, it crashed a dpkg run at some point in the past, so it's not updating00:48
fungiand that got it to finish installing linux-image-3.13.0-76-generic00:49
fungiso that's just the afs servers missing now00:50
fungiand looks like the puppet update is hitting them00:50
fungiand now i've got the new kernel on those, so we should be installed everywhere. i'll run one more play to confirm that00:54
fungioh, right, i need to clear the hostlist cache00:56
fungias mordred suggested00:56
fungiokay, the skipped ones show ok=1 changed=0, the non-skipped ones show ok=2 changed=1, none are unreachable or failed01:02
fungiand the count on the non-skipped servers matches what puppetboard reports for trusty servers01:03
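
[Editor's sketch] The recap check fungi describes (skipped hosts show changed=0, updated hosts show changed=1, nothing unreachable or failed) could be tallied with a small parser like the one below. It assumes recap lines of the form `host : ok=N changed=N unreachable=N failed=N`; the hostnames and sample output are hypothetical, not taken from the actual run.

```python
import re

# One recap line: "<host> : key=value key=value ..."
RECAP_RE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>.*)$")


def parse_recap(text):
    """Map each host to its integer recap counters."""
    results = {}
    for line in text.splitlines():
        match = RECAP_RE.match(line.strip())
        if not match:
            continue  # skip banners and blank lines
        pairs = (p.split("=", 1) for p in match.group("counters").split()
                 if "=" in p)
        results[match.group("host")] = {key: int(val) for key, val in pairs}
    return results


def summarize(results):
    """Return (updated, skipped, problem) host lists, each sorted."""
    updated = sorted(h for h, c in results.items() if c.get("changed", 0) > 0)
    skipped = sorted(h for h, c in results.items() if c.get("changed", 0) == 0)
    problems = sorted(h for h, c in results.items()
                      if c.get("unreachable", 0) or c.get("failed", 0))
    return updated, skipped, problems


# Hypothetical recap output with made-up hostnames.
sample_recap = """\
PLAY RECAP ****
precise-host.example.org : ok=1 changed=0 unreachable=0 failed=0
trusty-host.example.org : ok=2 changed=1 unreachable=0 failed=0
"""

updated, skipped, problems = summarize(parse_recap(sample_recap))
print(len(updated), len(skipped), problems)
```

The length of the updated list is the figure to compare against the trusty server count from puppetboard.
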
fungiso i think we're ready to start rebooting things (aside from nodepool, which clarkb already rebooted)01:03
fungithough i may need to take a backseat on this part, as the evening's wearing on here and i still need to unbox this mattress01:04
anteayasounds like pip has just created more fun and I'm too tired to get involved01:25
anteayaanything I can do to help this effort before I sign off for the night?01:25
anteayaokay good night01:32
anteayacongratulations on the new mattress fungi, I hope it gives you restful sleep01:32
* anteaya heads off to get some restful sleep of her own01:33
clarkbfungi: is there a list of things that need rebooting? maybe you have that handy and can make an etherpad?01:47
fungii simply queried puppetboard, but can whip that up easily enough01:47
fungithat's courtesy of
jeblairLinux 3.13.0-76-generic #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux02:10
jeblairis that the version we're looking for?02:10
jeblairfungi, clarkb: ^?02:10
clarkbjeblair: yes02:10
fungijeblair: lgtm02:11
fungii've also noted on the etherpad that clarkb already took care of nodepool.o.o02:11
clarkbthanks I just got hit by a fulisade of house things so distracted. Appraisals are very interesting documents02:12
clarkbI probably spelled that fwrong02:12
clarkbhere is 25 pages to read02:12
clarkbhave fun02:12
jeblairi picked a few off, but am going for food now02:27
jeblairjhesketh: and, ha! if you are bored, you could reboot some servers! :)02:27
jheskethjeblair: will do once I'm done here :-)02:28
jhesketh(here being devstack promotes)02:28
* jhesketh didn't manage to keep up with everything in here... 02:37
jheskethhave all those servers been upgraded and just need a restart?02:37
jhesketh(those servers being the list in the etherpad)02:37
fungithat would be our entire list of trusty servers. i've confirmed the new kernel package is installed on every one of them02:38
jheskethshould I do any kind of alerts that services are going down for emergency restarts?02:38
fungii wouldn't for the majority of them02:38
fungithough gerrit reboot and zuul reboot probably need to happen at the same time, and zuul's queue dumped/readded02:39
jheskethyeah that one won't be fun02:39
fungioh, wait, zuul doesn't need a reboot02:39
fungiso maybe we can just hope review.o.o reboots quickly?02:40
fungipick a time when the gate is nowhere near reporting a change if you can02:40
fungizuul gets a free pass this time because it's on precise, so too old of a kernel to be affected02:40
fungiwe sort of lucked out that we haven't gotten as far as we'd like on the trusty migrations i guess02:41
jheskethmy reading of the CVE is that it's not overly urgent since it requires an attacker to have an account already02:41
jheskethunless of course gerrit accounts count and then it's bad02:41
jheskethotherwise I can wait until later in my evening and things will be quieter for most02:41
fungiyeah, non-urgent02:41
jheskethdo we know if gerrit users can use the vector?02:41
fungithey shouldn't be able to, no02:41
jheskethokay, I'll do the easy ones first and wait for a bit lower traffic for the harder ones02:42
fungibut also you are well-located to be more awake than most of us at the times when activity on our systems is lowest02:42
jheskeththings like ask.o.o are pretty public facing, what's your opinion on that one?02:42
fungia few minutes of unannounced downtime for ask.o.o should be fine. i'm not sure the best way to reach the ask audience anyway since they're basically the people who can't suss mailing lists and don't even know what irc means02:44
jheskethhmm, afsd doesn't start automatically?02:47
fungimaybe that was being saved for later02:59
fungicould be there's still some tidying up to do on that deployment02:59
jheskethyeah I've made a note in the etherpad for somebody to look to03:01
jhesketh*at even03:02
-openstackstatus- NOTICE: is being restarted to apply patches11:42
*** ChanServ changes topic to " is being restarted to apply patches"11:42
*** ChanServ changes topic to "CVE-2016-0728"11:53
-openstackstatus- NOTICE: Restart done, is available11:53
*** AJaeger has joined #openstack-infra-incident12:01
jheskethMost servers are restarted. I've updated the etherpad:
jhesketh5 servers need checking, probably the most important one being stackalytics.openstack.org12:10
jheskethI don't think the others matter but need checking sooner or later12:11
jheskeththere are 5 more that still need restarting.. I suspect they won't be any trouble, but I had queries (see pad)12:12
*** crinkle_ has joined #openstack-infra-incident13:49
*** lifeless_ has joined #openstack-infra-incident13:51
*** ianw_ has joined #openstack-infra-incident13:54
*** ianw has quit IRC13:56
*** anteaya has quit IRC13:56
*** clarkb has quit IRC13:56
*** crinkle has quit IRC13:56
*** lifeless has quit IRC13:56
*** Zara has quit IRC13:56
*** ianw_ is now known as ianw13:56
*** Zara has joined #openstack-infra-incident13:56
*** anteaya has joined #openstack-infra-incident14:03
*** clarkb has joined #openstack-infra-incident14:04
fungithanks jhesketh!15:25
*** crinkle_ is now known as crinkle18:04
*** lifeless_ is now known as lifeless18:12
*** AJaeger has left #openstack-infra-incident21:18
*** ChanServ changes topic to "situation normal"21:54
fungii guess we're still waiting for centos 7 kernel updates, unless we want to try the stap workaround ianw linked21:55
clarkbI think the surface area is small enough on those that we can probably wait21:55
clarkbselinux is supposedly something that makes it harder to exploit too and we run with that enabled21:56
*** mordred has quit IRC23:04
*** mordred has joined #openstack-infra-incident23:06
