*** jeblair has joined #openstack-infra-incident | 01:47 | |
*** anteaya has joined #openstack-infra-incident | 01:47 | |
fungi | quieter here | 01:47 |
*** ChanServ changes topic to "wiki compromise" | 01:48 | |
anteaya | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=355&rra_id=all | 01:48 |
fungi | so anyway, in combining jeblair's and my theories, best guess is that i missed that the trusty upgrade moved our v4 rules file in a way incompatible with how it was previously symlinked, so a reboot ended with no v4 rules loaded, exposing the elasticsearch api to the internet | 01:50 |
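The failure mode fungi describes, a persisted rules file that the upgrade left unreadable so a reboot came up with an empty ruleset, can be caught by comparing what the kernel has loaded against what is on disk. A minimal sketch, assuming the persisted rules live at /etc/iptables/rules.v4 (which may not be the path this deployment actually used):

```python
#!/usr/bin/env python
# Rough sketch (not the infra team's actual tooling): compare the iptables
# rules currently loaded in the kernel against the persisted v4 rules file,
# so a reboot that silently loses the symlinked file gets noticed.
import subprocess

RULES_FILE = '/etc/iptables/rules.v4'  # assumed location of persisted rules


def normalized_rules(text):
    """Keep only policy and rule lines, dropping comments and counters."""
    rules = set()
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(('-A', ':')):
            rules.add(line.split('[')[0].strip())
    return rules


def main():
    loaded = normalized_rules(
        subprocess.check_output(['iptables-save']).decode())
    with open(RULES_FILE) as f:
        persisted = normalized_rules(f.read())
    missing = persisted - loaded
    if missing:
        print('WARNING: %d persisted rules are not loaded:' % len(missing))
        for rule in sorted(missing):
            print('  ' + rule)
    else:
        print('loaded rules match %s' % RULES_FILE)


if __name__ == '__main__':
    main()
```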
jeblair | fungi: bup claims to have daily backups up to and including 8-16. they appear at 05:37 | 01:51 |
jeblair | they are append-only backups and can therefore be used for forensics as well as recovery | 01:52 |
jeblair | having said that, i don't think we've done a restore test on wiki | 01:52 |
jeblair | so i don't believe anyone has verified whether there's anything actually in the backup. | 01:53 |
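The untested-restore worry could be addressed with a quick check on the backup server: list the saves for the wiki branch and spot-check that the most recent one actually contains files. A minimal sketch, assuming bup is on the path, BUP_DIR points at the wiki backup repository, and the save branch is named wiki.openstack.org (all guesses about this particular setup):

```python
#!/usr/bin/env python
# Restore sanity check, run on the backup server. Branch name and BUP_DIR
# are assumptions; substitute whatever `bup ls` actually shows.
import os
import subprocess

BRANCH = 'wiki.openstack.org'  # assumed bup save name
os.environ.setdefault('BUP_DIR', '/opt/backups/bup')  # assumed repo location


def bup(*args):
    return subprocess.check_output(('bup',) + args).decode()


def main():
    saves = bup('ls', BRANCH).split()
    if not saves:
        raise SystemExit('no saves found for %s' % BRANCH)
    print('%d saves found, most recent: %s' % (len(saves), saves[-1]))
    # List the top level of the latest save; an empty listing here would
    # mean the daily cron job is committing nothing useful.
    latest = bup('ls', '%s/latest' % BRANCH).split()
    if not latest:
        raise SystemExit('latest save is empty, backups are not usable')
    print('latest save contains: %s' % ', '.join(latest))


if __name__ == '__main__':
    main()
```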
fungi | with file uploads disabled, we're probably fine rolling back to the upgrade and using the current db from trove | 01:53 |
fungi | i'll see if the snapshot i saved comes up in a sane state | 01:53 |
jeblair | fungi: aren't there db migrations? | 01:53 |
fungi | yes, i mean reupgrade the snapshot and use the migrated db | 01:53 |
jeblair | fungi: okay, so create new server from snapshot, upgrade mediawiki, check firewall, then online? | 01:54 |
fungi | right | 01:54 |
fungi | no idea how long it'll take this to build from the snapshot though | 01:54 |
fungi | though actually, we _do_ have one already booted and configured | 01:56 |
fungi | with the db in trove, wiki-upgrade-test.openstack.org can simply have apache reenabled and started, and dns pointed at it | 01:57 |
anteaya | that sounds faster | 01:57 |
jeblair | fungi: was it similarly exposed? | 01:58 |
jeblair | (i worry about whether we can trust it) | 01:58 |
anteaya | I'm grateful for what you are worried about | 01:59 |
jeblair | i worry a lot so i hope your gratitude is boundless :) | 01:59 |
fungi | jeblair: not so far as i could tell. after the upgrade puppet ended up blocking everything because the server wasn't in system-config at all so it got our default ssh-only rules | 01:59 |
anteaya | jeblair: my gratitude for your worry is boundless :) | 02:00 |
fungi | we didn't turn off puppet for wiki-upgrade-test until after the upgrade, and then i had to manually copy the iptables rules back over from wiki.o.o | 02:00 |
jeblair | (i also worry a little about whether someone could have elevated wiki permissions by modifying the database, though i confess that might be somewhat low risk) | 02:00 |
anteaya | I think that is a fair worry | 02:01 |
anteaya | can we check if anyone was granted admin wiki status since friday? | 02:01 |
anteaya | from what I'm guessing cacti is telling me, whoever was doing what they were doing was doing it between 13:00 and 19:00 today | 02:02 |
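anteaya's question about recently granted admin rights could be answered straight from the database: MediaWiki records group membership changes in its logging table under log_type = 'rights'. A minimal sketch, with placeholder connection details and an assumed cutoff timestamp for "since friday":

```python
#!/usr/bin/env python
# List user-rights changes in the wiki database since a cutoff date.
# Connection details are placeholders; the query relies on MediaWiki's
# standard `logging` table, where log_type = 'rights' records group changes.
import pymysql

conn = pymysql.connect(host='TROVE_DB_HOST', user='wiki', password='...',
                       database='wiki')  # placeholders, not real credentials
try:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT log_timestamp, log_title, log_params"
            "  FROM logging"
            " WHERE log_type = 'rights'"
            "   AND log_timestamp >= %s"
            " ORDER BY log_timestamp",
            ('20160812000000',))  # assumed "friday"; format YYYYMMDDHHMMSS
        for ts, user, params in cur.fetchall():
            # log_params encodes the old and new group lists for the user
            print(ts, user, params)
finally:
    conn.close()
```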
jeblair | fungi: there must have been days when the clone and the real server were both online? | 02:03 |
fungi | yeah, it's possible, though it's also an unlikely target. once most scripted compromises get access to a shell their main interest is in using the server to run other stuff. it's possible we got compromised by someone who knew it was running mediawiki and wanted to leverage it to leave some sort of backdoor in the db, but it's on the unlikely end of things | 02:03 |
fungi | jeblair: by clone you mean wiki-upgrade-test.o.o? | 02:03 |
fungi | only one was pointed at the trove db at a time | 02:03 |
jeblair | fungi: i'm trying to figure out how this affects backups | 02:03 |
fungi | oh... | 02:04 |
anteaya | oh you are talking about server permissions, not wiki permissions, sorry | 02:04 |
jeblair | i imagine if we had two copies of a server online at a time, they would both run backups. i only see one backup per day, so it might be that bup only allowed the first one through, and which one that was was random... | 02:04 |
jeblair | anteaya: well, could be either | 02:05 |
fungi | yeah, both of them have the same backup cronjob | 02:05 |
anteaya | ah so we might not be able to trust the backups :( | 02:05 |
jeblair | anteaya: i'd rather say that they may not contain the data we expect | 02:06 |
fungi | it hadn't dawned on me that the backups are initiated from the servers being backed up, so clones are certainly a danger for servers with backup configuration in this case | 02:06 |
anteaya | jeblair: okay a better way of putting it | 02:06 |
anteaya | so is fungi's snapshot the most recent backup that would contain the data we expect? | 02:06 |
anteaya | that we have confidence in? | 02:07 |
jeblair | anteaya: with a key of when the clones were active, we can figure the last date that was certainly the old server. backups after that could be one or the other, and it may be possible to determine which by clues on their filesystems. | 02:07 |
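One way to act on the "clues on their filesystems" idea is to walk the daily saves and pull a small host-identifying file out of each, to see which machine wrote it. A rough sketch, assuming the same branch name and repository location as above, and that /etc/hostname (or any file known to differ between the two servers) is included in the backups:

```python
#!/usr/bin/env python
# For each daily save, restore a small telltale file and print its contents.
# Branch name, BUP_DIR and the choice of /etc/hostname are all assumptions
# about this setup; pick any path that differs between the two hosts.
import os
import subprocess
import tempfile

BRANCH = 'wiki.openstack.org'  # assumed bup save name
os.environ.setdefault('BUP_DIR', '/opt/backups/bup')  # assumed repo location


def saves():
    out = subprocess.check_output(['bup', 'ls', BRANCH]).decode().split()
    return sorted(s.rstrip('/') for s in out if s.rstrip('/') != 'latest')


def hostname_in(save):
    tmp = tempfile.mkdtemp()
    subprocess.check_call(
        ['bup', 'restore', '-C', tmp, '%s/%s/etc/hostname' % (BRANCH, save)])
    with open(os.path.join(tmp, 'hostname')) as f:
        return f.read().strip()


for save in saves():
    try:
        print(save, hostname_in(save))
    except subprocess.CalledProcessError:
        print(save, '(no /etc/hostname in this save)')
```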
jeblair | also, we need to rotate the backup key for wiki.o.o after this, because *it* has been compromised | 02:08 |
anteaya | ack | 02:08 |
jeblair | (the append-only nature of the backups makes them still reliable though [modulo the clone issue]) | 02:09 |
jeblair | i have moved the authorized_keys file for wiki on the backup server out of the way so that key can no longer be used to access the server | 02:10 |
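Rotating the compromised key would mean generating a fresh keypair on the rebuilt server and installing its public half on the backup server, ideally pinned to `bup server` so the client can only push backups. A sketch under those assumptions; the forced-command restrictions and paths are guesses, not the actual puppet configuration:

```python
#!/usr/bin/env python
# Generate a new, passphrase-less keypair for the backup cron job and emit
# the authorized_keys line to install on the backup server, replacing the
# entry that was moved out of the way.
import subprocess

KEY_PATH = '/root/.ssh/id_rsa_bup'  # assumed location for the backup key

subprocess.check_call(
    ['ssh-keygen', '-t', 'rsa', '-b', '4096', '-N', '', '-f', KEY_PATH,
     '-C', 'bup backups for wiki.openstack.org'])

with open(KEY_PATH + '.pub') as f:
    pubkey = f.read().strip()

# Forced command restricts the key to running the bup server side only,
# so a future compromise of the wiki host cannot use it for anything else.
restrictions = ('command="bup server",no-port-forwarding,'
                'no-agent-forwarding,no-X11-forwarding,no-pty')
print('%s %s' % (restrictions, pubkey))
```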
fungi | yeah, the wiki-upgrade-test clone was booted a few weeks ago, so i wouldn't trust it to necessarily be production data past july 21 (that's when rackspace says i created it) | 02:10 |
jeblair | so we won't get a backup again until we fix that (by creating a new key, ideally) | 02:11 |
jeblair | okay, well if we need the backups, we can poke at that, but i'm not going to restore 26 copies of the wiki server out of curiosity. :) | 02:12 |
anteaya | ha ha ha | 02:12 |
jeblair | that would probably take 26 days. | 02:12 |
anteaya | I have better things planned for the next 26 days | 02:13 |
jeblair | dinner is here, i have to run | 02:13 |
anteaya | enjoy dinner | 02:13 |
anteaya | thank you | 02:13 |
fungi | thanks jeblair! | 02:13 |
anteaya | fungi: happy to help or listen | 02:14 |
anteaya | or help by listening | 02:15 |
fungi | i'm moving forward pointing wiki.o.o dns at wiki-upgrade-test.o.o and taking the compromised wiki.o.o offline | 02:15 |
anteaya | ack | 02:15 |
fungi | the boot from snapshot is still spinning at "10%" so no idea how long that'll take | 02:16 |
anteaya | woooo | 02:16 |
anteaya | the watching paint dry routine | 02:16 |
anteaya | your wife loves this one | 02:16 |
fungi | okay, dns has propagated, configs have been adjusted on wiki-upgrade-test for the correct vhost name, and i've confirmed the firewall rules there are and were still sane | 02:34 |
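A quick external check along the lines of fungi's last message: confirm the name now resolves to the replacement server and that only the expected ports answer from outside. A minimal sketch with a placeholder address and port list:

```python
#!/usr/bin/env python
# Post-cutover sanity check: does wiki.openstack.org resolve to the new
# server, and are only the expected ports reachable? The expected IP and
# port lists are placeholders for whatever the replacement actually uses.
import socket

NAME = 'wiki.openstack.org'
EXPECTED_IP = '203.0.113.10'      # placeholder for wiki-upgrade-test's IP
SHOULD_BE_OPEN = [22, 80, 443]    # ssh + web
SHOULD_BE_CLOSED = [3306, 9200]   # e.g. mysql, elasticsearch


def port_open(ip, port, timeout=3):
    try:
        sock = socket.create_connection((ip, port), timeout=timeout)
        sock.close()
        return True
    except (socket.timeout, socket.error):
        return False


ip = socket.gethostbyname(NAME)
print('%s resolves to %s (expected %s)' % (NAME, ip, EXPECTED_IP))

for port in SHOULD_BE_OPEN:
    print('port %d open: %s (want True)' % (port, port_open(ip, port)))
for port in SHOULD_BE_CLOSED:
    print('port %d open: %s (want False)' % (port, port_open(ip, port)))
```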
*** ChanServ changes topic to "situation normal" | 02:59 | |
*** openstack has joined #openstack-infra-incident | 10:16 | |
*** pabelanger has quit IRC | 16:54 | |
*** pabelanger has joined #openstack-infra-incident | 16:54 | |
-openstackstatus- NOTICE: The volume for logs.openstack.org filled up rather suddenly, causing a number of jobs to fail with a POST_FAILURE result and no logs; we're manually expiring some logs now to buy breathing room, but any changes which hit that in the past few minutes will need to be rechecked and/or approved again | 19:45 |