*** krotscheck_dcm is now known as krotscheck | 00:39 | |
*** hieulq has joined #openstack-sprint | 01:16 | |
*** baoli has quit IRC | 01:57 | |
*** baoli has joined #openstack-sprint | 02:04 | |
*** baoli has quit IRC | 02:26 | |
*** yuikotakada has joined #openstack-sprint | 05:55 | |
*** yuikotakada is now known as yuikotakadamori | 05:55 | |
*** yuikotakadamori has quit IRC | 07:23 | |
*** yuikotakadamori has joined #openstack-sprint | 08:20 | |
*** baoli has joined #openstack-sprint | 09:05 | |
*** baoli has quit IRC | 09:09 | |
*** cdelatte has joined #openstack-sprint | 11:05 | |
*** rfolco has joined #openstack-sprint | 12:00 | |
*** baoli has joined #openstack-sprint | 12:02 | |
*** anteaya has joined #openstack-sprint | 13:14 | |
*** ChanServ changes topic to "Taking Infra servers running Precise and upgrading them to Trusty | https://wiki.openstack.org/wiki/VirtualSprints#Infra_Trusty_Upgrade" | 14:13 | |
fungi | i've claimed the storyboard.o.o upgrade for myself. guess i'll try to kick the week off with that | 14:14 |
fungi | note that, contrary to what you might expect, there is a fair amount of change/review work which people without infra-root access can help with on this sprint | 14:15 |
anteaya | great | 14:15 |
anteaya | I'm fine to help with the change/review work | 14:15 |
anteaya | mostly I need to learn what to do there | 14:16 |
fungi | pick an entry in openstack-infra/system-config manifests/site.pp which only has precise in the comment and add a second line for trusty (like what you see for storyboard.openstack.org already) | 14:16 |
anteaya | so I'll watch/observe/learn from the first one? | 14:16 |
fungi | then see if it passes tests, and if not, try to figure out what needs fixing first | 14:16 |
fungi | once an infra-root admin gets to the point of launching a replacement for that entry, they will quite probably find other things which need tweaking in various puppet modules as well, which will need reviewing | 14:17 |
fungi | also, even if not infra-root, you can try to launch your own versions of some of these on a trusty vm under your control and figure out if anything needs improving to work there | 14:18 |
anteaya | awesome thank you | 14:18 |
anteaya | do we want a separate URL in the topic for the etherpad: https://etherpad.openstack.org/p/newton-infra-distro-upgrade-plans | 14:31 |
anteaya | or is the link to the virtual sprint page sufficient? | 14:31 |
Zara | ooooOoooOOoo (I'm just lurking for now, feel free to ping me for reviews) | 14:36 |
anteaya | Zara: or you can help following the instructions fungi gave above | 14:37 |
Zara | anteaya: yeah, sorry, I meant because I'm behind with other things today, so unlikely to get to it until tomorrow | 14:38 |
anteaya | Zara: ah okay | 14:38 |
anteaya | tomorrow is great too | 14:39 |
fungi | yeah, i got sidetracked by someone asking me a ton of zuul questions in a private /msg | 14:39 |
anteaya | happy to have your help when you have the time | 14:39 |
anteaya | fungi: :( | 14:39 |
fungi | planning to hit up #storyboard to coordinate the server upgrade shortly | 14:39 |
anteaya | fungi: was that this person? http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2016-05-23.log.html#t2016-05-23T08:50:01 | 14:39 |
Zara | yay :) (in fairness I err on the side of under-committing myself so that I'm able to respond to messages.) | 14:40 |
anteaya | Zara: good plan | 14:40 |
anteaya | should we have a common gerrit topic for this work? | 14:45 |
anteaya | trusty-upgrade as a suggestion | 14:45 |
fungi | wfm | 14:46 |
anteaya | thanks, I'll add it to the etherpad | 14:47 |
fungi | appreciated https://review.openstack.org/#/q/topic:trusty-upgrade+status:open | 14:47 |
fungi | or you can add it to our sprint section on the wiki | 14:47 |
fungi | i wasn't sure what was best to link in the /topic here and still keep it short | 14:48 |
anteaya | sure | 14:48 |
anteaya | the sprint wiki section has the etherpad link | 14:48 |
anteaya | what is apps.openstack.org? | 14:49 |
anteaya | am I safe to take that one? | 14:49 |
anteaya | I'll take that one | 14:49 |
fungi | that's the openstack app catalog. i believe it's entirely self-contained at this point (no trove, no cinder) though it may have local data which needs copying to a replacement. docaedo will know | 14:50 |
fungi | however, a patch to start testing it against trusty will be the first step regardless | 14:51 |
anteaya | I will take it | 14:51 |
anteaya | thanks | 14:51 |
fungi | in spite of being completely distracted by all manner of other things, i have now gotten far enough to lower the ttl on our a/aaaa dns entries for storyboard.openstack.org to 5 minutes (previously 60, so i shouldn't change dns until ~15:50 utc) | 14:52 |
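
The timing fungi describes follows from how TTLs work: resolvers that fetched the record just before the change may keep it for the full previous TTL, so the cutover should wait at least that long. A minimal sketch of the arithmetic, assuming the TTL was lowered at about the time of the message above (14:52 UTC on 2016-05-23); the function name is hypothetical and not part of any infra tooling:

```python
from datetime import datetime, timedelta


def earliest_safe_dns_change(ttl_lowered_at, previous_ttl):
    """Time after which no resolver should still hold a record cached under the old TTL."""
    return ttl_lowered_at + previous_ttl


# Assumption: the TTL change landed at the timestamp of the chat message (UTC).
lowered = datetime(2016, 5, 23, 14, 52)
safe = earliest_safe_dns_change(lowered, timedelta(minutes=60))
print(safe.strftime("%H:%M"))  # 15:52, in line with the "~15:50 utc" estimate
```

After that point, caches hold the record for at most the new 5-minute TTL, which bounds how long clients can keep hitting the old server once the address changes.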
anteaya | fungi: yay | 14:57 |
anteaya | fungi: I have added a "how can I help?" section on the etherpad | 15:00 |
anteaya | fungi: can you check my instructions for creating a patch to ensure I am conveying accurate information should we get a newcomer | 15:00 |
fungi | anteaya: looks great--thanks! | 15:02 |
anteaya | fungi: thank you | 15:02 |
clarkb | apache differences likely to be the common issue we hit | 15:12 |
anteaya | morning clarkb | 15:12 |
fungi | speaking of which, my attempt to deploy a storyboard.o.o replacement failed. i reran with --keep and am digging in syslog on it now to see what happened | 15:14 |
fungi | gotta be something which wasn't exercised when i deployed the storyboard-dev instance last week | 15:14 |
Zara | oh no! thank you for looking into it. | 15:15 |
fungi | oh, looks like maybe-racy resource ordering | 15:15 |
fungi | May 23 15:08:34 storyboard puppet-user[19383]: (/Stage[main]/Storyboard::Cert/File[/etc/ssl/certs/storyboard.openstack.org.pem]/ensure) change from absent to file failed: Could not set 'file' on ensure: Could not find group ssl-cert at 41:/etc/puppet/modules/storyboard/manifests/cert.pp | 15:15 |
fungi | that group should be created by the ssl-certs package | 15:16 |
Zara | ssl certssssssssss | 15:16 |
Zara | maybe if I hiss at them, things will work | 15:16 |
fungi | er, ssl-cert package | 15:17 |
clarkb | logstash-worker* can just be replaced in batches. remember to bounce iptables on elasticsearch nodes once dns is updated. I will likely start there when I have a good chunk of time today | 15:17 |
fungi | by the time the puppet manifest is done being applied, that package has been installed and the group it wants exists, just not soon enough to satisfy the dependency | 15:18 |
fungi | i have a feeling we're going to end up with a circular dependency issue here if i try to install that package to break the dependency cycle | 15:19 |
anteaya | :( | 15:19 |
clarkb | is there no dep on the package? | 15:19 |
fungi | i believe the apache puppet module is pulling it in | 15:20 |
fungi | but we need those files in place before we start apache | 15:20 |
fungi | so we make the apache class depend on those files | 15:20 |
fungi | which need permissions set to a group which is added by a dependency of the class we're trying to prepare for | 15:20 |
nibalizer | good morning | 15:27 |
fungi | nibalizer: ^ ideas on that circular logic problem? | 15:27 |
nibalizer | still reading | 15:28 |
fungi | i suppose i could add a resource for the ssl-cert group so that puppet creates it explicitly and break the cycle that way | 15:28 |
anteaya | nibalizer: morning | 15:28 |
nibalizer | fungi: yea that is one way | 15:28 |
fungi | though... for users of the module relying on its snakeoil cert support, that cert/key is also created by the ssl-cert package i think? | 15:29 |
nibalizer | so we can't drop the cert after the package? | 15:29 |
fungi | so that's probably not sufficient | 15:29 |
fungi | nibalizer: ideally we get it to install the ssl-cert package, then create the certs used by our vhost configs, then install apache (which will get automatically started by its packaging) | 15:29 |
fungi | but i think the apache puppet module is also separately depending on that package. still digging to see whether that's the case | 15:30 |
fungi | one of those order-of-operations situations puppet is lousy with | 15:31 |
anteaya | <-- food | 15:32 |
clarkb | just set that file to root:root | 15:32 |
fungi | clarkb: you're genius. there are no members of that group on the production storyboard.o.o | 15:34 |
fungi | sometimes the simple solutions are the hardest to arrive at | 15:34 |
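
The dependency loop fungi anticipated, and why changing the file's group to root sidesteps it, can be sketched as a small graph problem. This is a simplified model and not how Puppet itself resolves ordering: the resource names are illustrative, and treating the ssl-cert package as ordered after the apache class is an assumption based on fungi's guess that the apache module pulls the package in.

```python
from graphlib import TopologicalSorter, CycleError


def apply_order(deps):
    """Return one valid apply order, or None if the dependencies form a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


# Each key maps to the set of resources that must be applied before it.
# Assumed model: the cert file needs the ssl-cert group, which only exists
# once the package is installed, which happens as part of the apache class,
# which in turn must wait for the cert file.
with_group = {
    "apache class": {"cert file"},
    "cert file": {"ssl-cert group"},
    "ssl-cert group": {"ssl-cert package"},
    "ssl-cert package": {"apache class"},
}

# With the file owned by root:root, the cert file no longer needs the group.
with_root = dict(with_group, **{"cert file": set()})

print(apply_order(with_group))  # None: no valid order exists
print(apply_order(with_root))   # cert file first, then apache, package, group
```

The fix works because it removes an edge rather than adding a resource: nothing has to be installed earlier, the file simply stops depending on the group.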
nibalizer | heh | 15:39 |
fungi | added as https://review.openstack.org/320008 | 15:39 |
fungi | though i'm half expecting the next attempt to launch it will point out that we also don't have the parent directories for these files as they're probably also created by the ssl-cert package | 15:40 |
nibalizer | fungi: looking great | 15:40 |
anteaya | so apparently apps.o.o is already on trusty: http://git.openstack.org/cgit/openstack-infra/system-config/tree/manifests/site.pp#n1256 | 15:43 |
anteaya | can anyone confirm? | 15:44 |
nibalizer | anteaya: nope | 15:45 |
nibalizer | nibz@apps:~$ cat /etc/issue | 15:45 |
nibalizer | Ubuntu 12.04.5 LTS \n \l | 15:45 |
nibalizer | http://puppetboard.openstack.org/node/apps.openstack.org is another easy way to check | 15:46 |
anteaya | nibalizer: thanks | 15:46 |
anteaya | hmmm, what would be happening that site.pp has trusty but the node is precise? | 15:48 |
nibalizer | anteaya: someone goofed i'd guess | 15:50 |
anteaya | okay what if anything should I do now? | 15:50 |
fungi | probably need to coordinate with docaedo to find out what data should be copied to a new server and when a good time for a cutover would be | 15:56 |
anteaya | okey dokey | 15:58 |
*** kcalman_ has joined #openstack-sprint | 16:00 | |
pleia2 | o/ | 16:03 |
anteaya | morning pleia2 | 16:03 |
anteaya | the etherpad is here: https://etherpad.openstack.org/p/newton-infra-distro-upgrade-plans | 16:03 |
fungi | it's also linked from the wiki page mentioned in the /topic | 16:04 |
*** jeblair has joined #openstack-sprint | 16:04 | |
jeblair | howdy! | 16:04 |
fungi | we're still sort of spinning up here i think | 16:07 |
jeblair | whew, didn't miss much then | 16:07 |
fungi | i'm poking at trying to upgrade storyboard.o.o, anteaya is trying to coordinate details for the apps.o.o upgrade | 16:08 |
jeblair | why don't i go do cacti | 16:08 |
fungi | that sounds swell | 16:08 |
fungi | thanks! | 16:08 |
clarkb | cacti will require firewall update and iptables bounce everywhere | 16:09 |
jeblair | whee! :) | 16:09 |
fungi | oh, right, because snmp | 16:10 |
clarkb | I have furnace person coming over this morning to tell me what it will cost to make home habitable | 16:10 |
clarkb | afterwards will start on logstash workers | 16:10 |
anteaya | clarkb: is this part of the inspection still | 16:11 |
anteaya | clarkb: or are you a proud homeowner? | 16:11 |
clarkb | I am a proud homeowner | 16:11 |
anteaya | yay! | 16:11 |
anteaya | \o/ | 16:11 |
anteaya | congratulations | 16:11 |
anteaya | when do you move in? | 16:11 |
clarkb | depends on what furnace person says :) probably mid next month | 16:12 |
jeblair | clarkb: congrats! (i also had to replace furnace first thing -- what's with that?) | 16:12 |
anteaya | clarkb: yay | 16:12 |
clarkb | this one is 27 years old and was not working during inspection, so we knew it upfront | 16:12 |
clarkb | the fridge on the other hand was foa | 16:13 |
clarkb | er doa | 16:13 |
anteaya | :( | 16:13 |
jeblair | yeah, i guess "furnace broke -- time to sell the house" is a thing | 16:13 |
anteaya | ha ha ha | 16:13 |
pleia2 | fwiw, everything breaks when you first move in | 16:13 |
pleia2 | it's a thing | 16:13 |
pleia2 | it gets better :) | 16:13 |
anteaya | :) | 16:13 |
jeblair | clarkb, pleia2: my surprise doa appliance was the dishwasher | 16:13 |
pleia2 | garbage disposal | 16:14 |
jeblair | (oh, yeah, replacing this is what i want to be doing right now) | 16:14 |
jeblair | pleia2: we *thought* the disposal was bad, but then we found the thing that was stuck in it. ew. | 16:14 |
clarkb | we spent the weekend cleaning. previous occupant had a dog which means fur everywhere and dirty walls | 16:14 |
clarkb | which after cleaning will hopefully get new paint | 16:14 |
anteaya | yay new paint! | 16:15 |
anteaya | if you get a pet yourself your pet will go anywhere the prior pet went if it went in the house | 16:15 |
anteaya | so cleaning, steaming or replacing carpet is not a bad idea if you are on the fence on flooring | 16:16 |
clarkb | Chloe has already enjoyed licking things I assume the dog licked | 16:16 |
anteaya | ha ha ha | 16:16 |
anteaya | okay then, looks like you have addressed that concern | 16:16 |
jeblair | i lowered cacti's ttl | 16:22 |
jeblair | apparently it does not have an aaaa address | 16:22 |
jeblair | i'm certain we want to move the filesystem to cinder; do we also want to move mysql to trove? | 16:23 |
fungi | that's a good question... i know mordred was looking into trove's replication options to possibly mitigate our need to stop using it | 16:24 |
jeblair | i'm leaning toward trove for now.... it'll be in the same boat as most other services | 16:25 |
fungi | and even if we take trove as a short-term proxy for "remote mysql server we operate" it's a good next step | 16:25 |
jeblair | ya | 16:25 |
fungi | so i'm inclined to agree | 16:26 |
jeblair | cacti uses a whole 711M of disk for rrd | 16:26 |
jeblair | min volume size is 100G? | 16:26 |
jeblair | (for sata) | 16:26 |
jeblair | 75 | 16:27 |
jeblair | i guess i will give it a 75G volume | 16:27 |
fungi | heh | 16:29 |
fungi | that's hilarious | 16:29 |
pleia2 | fungi: might want to use trusty-upgrades (with an 's') for the topic since that's what pabelanger has been using | 16:30 |
fungi | ahh, good idea. i just went with whatever anteaya suggested, but we can adjust it | 16:32 |
anteaya | pleia2: do you want to change that on the etherpad? | 16:32 |
pleia2 | anteaya: I added the other link | 16:32 |
anteaya | I didn't know there was an existing topic | 16:32 |
pleia2 | already | 16:32 |
anteaya | we can change it, just fungi and I have patches up using today's topic | 16:32 |
anteaya | done | 16:33 |
anteaya | trusty-upgrades it is | 16:33 |
fungi | i just adjusted the topic on my patch too | 16:33 |
pleia2 | thanks | 16:33 |
fungi | though it looks like crinkle already approved it. thanks! | 16:33 |
anteaya | thank you | 16:33 |
crinkle | o/ | 16:34 |
anteaya | ah dueling topic upgrades | 16:34 |
*** sivaramakrishna has joined #openstack-sprint | 16:34 | |
anteaya | fungi: who won? | 16:34 |
pleia2 | so I'm gathering we're preferring to replace nodes rather than in-place upgrades | 16:34 |
pleia2 | need to add some details to the etherpad, re: lower dns ttl etc | 16:35 |
anteaya | looks like you did, guess it only accepts one update | 16:35 |
anteaya | pleia2: good idea | 16:35 |
pleia2 | going to drop planet.o.o ttl | 16:40 |
fungi | i'm 99% sure it's entirely stateless (or rather, will rebuild its state from scratch) | 16:41 |
pleia2 | done for a and aaaa | 16:41 |
pleia2 | fungi: 99%, it does store some information just in case a blog disappears during a run so the blog remains in the planet feed | 16:41 |
pleia2 | but I think we don't care about that so much | 16:41 |
pleia2 | it updates often enough, nbd | 16:42 |
pleia2 | anyway, a/aaaa records at 5min now | 16:42 |
anteaya | pleia2: can you do paste.o.o while you are there? | 16:43 |
pleia2 | anteaya: sure | 16:44 |
anteaya | thanks | 16:44 |
pleia2 | anteaya: done | 16:44 |
anteaya | thank you | 16:44 |
jeblair | cacti could use some more puppeting (its database configuration is not in puppet) | 16:51 |
anteaya | jeblair: should that become a story so it doesn't get lost? | 16:56 |
jeblair | anteaya: probably so | 16:56 |
jeblair | on puppet-cacti | 16:56 |
anteaya | jeblair: do you want to do it or shall I? | 16:56 |
jeblair | wait, we don't have a puppet-cacti | 16:56 |
jeblair | maybe we need one :) | 16:56 |
anteaya | if I do it it will just be a copy paste of your words | 16:56 |
anteaya | ah ha | 16:56 |
anteaya | I agree | 16:56 |
jeblair | anteaya: if you could, that would be great; i don't have any more words to add | 16:57 |
anteaya | I will | 16:57 |
jeblair | anteaya: it's LHF | 16:57 |
anteaya | I will tag it appropriately | 16:57 |
anteaya | I think we need to grow a puppet-cacti repo, I can't find one | 16:59 |
anteaya | then our puppet cacti repo should have database configuration | 17:00 |
anteaya | I had thought you had said we just didn't have one in storyboard | 17:00 |
anteaya | we don't have the repo at all | 17:00 |
anteaya | I'll mix up a repo | 17:00 |
anteaya | have we anything to import into it when it is created? | 17:00 |
jeblair | anteaya: there is no cacti module right now, it's managed out of system-config. | 17:01 |
anteaya | I'll create a new empty repo called openstack-infra/puppet-cacti | 17:02 |
anteaya | then we can go from there | 17:02 |
jeblair | anteaya: well, don't do that unless you intend to put something in it :) | 17:02 |
anteaya | oh sorry | 17:02 |
anteaya | I had thought that was what you wanted | 17:02 |
jeblair | anteaya: i would love for someone to do that. i do not plan on doing it. | 17:03 |
anteaya | you want a bug filed against system-config for now? to puppet more of puppet-cacti | 17:03 |
anteaya | 's database? | 17:03 |
jeblair | anteaya: that's probably best | 17:03 |
anteaya | okay I'll create the story, sorry I was confused | 17:03 |
anteaya | https://storyboard.openstack.org/#!/story/2000601 | 17:07 |
*** sivaramakrishna has quit IRC | 17:07 | |
*** sivaramakrishna has joined #openstack-sprint | 17:08 | |
*** sivaramakrishna has quit IRC | 17:09 | |
jeblair | fungi: apache on the new cacti server is continuously segfaulting | 17:12 |
fungi | jeblair: ouch | 17:13 |
fungi | continuously as in as soon as it gets browsed, or with no activity at all? when just trying to start or after it's running? | 17:13 |
jeblair | fungi: as soon as it receives a request. it worked with the ubuntu/debian start page, but once i removed that so that it hit the cacti endpoint (php), it started segfaulting on every request | 17:14 |
fungi | oh | 17:15 |
fungi | so segfault in mod_php | 17:15 |
jeblair | maybe. i will get dumps. | 17:15 |
fungi | likely means there is some php in there which is unhappy under/incompatible with the php version on trusty | 17:16 |
fungi | at least that's the most likely culprit | 17:16 |
fungi | or we have some php configuration there which is no longer valid under newer php releases | 17:17 |
fungi | jeblair: anteaya: yep, whoever picks up the task to puppet db setup for our cacti class could optionally extend the scope of work there to doing it in a new puppet module | 17:20 |
fungi | (and subtree-split or filter-branch the existing class in the process to preserve history on what's being copied in) | 17:21 |
anteaya | should I append anything to the story? https://storyboard.openstack.org/#!/story/2000601 | 17:21 |
fungi | i can | 17:23 |
*** bkero has joined #openstack-sprint | 17:24 | |
anteaya | fungi: thanks | 17:24 |
jeblair | fungi: http://paste.openstack.org/show/498224/ | 17:29 |
jeblair | fungi: that appears to infinitely recurse | 17:29 |
fungi | now that's fancy | 17:30 |
fungi | mod_php, as expected | 17:30 |
fungi | though looks like an actual php bug? | 17:31 |
fungi | at least i don't see it calling out to anything | 17:31 |
jeblair | fungi: browsing bug histories, i've seen those described as 'userland error' by php devs | 17:31 |
fungi | mmm | 17:32 |
jeblair | it *might* be the unconfigured database | 17:32 |
bkero | I'd be scared of any PHP error that happened in kernel land. | 17:33 |
jeblair | i'll copy that over now | 17:33 |
jeblair | bkero: i believe it's in userspace | 17:33 |
jeblair | bkero: (i think php provides dtrace hooks, which is why dtrace shows up in function names in the call stack) | 17:35 |
bkero | Odd that they wouldn't macro that out | 17:36 |
jeblair | uh yep. apparently that *particular* segfault meant "database not found" | 17:39 |
* jeblair cleans up host | 17:40 | |
nibalizer | hah | 17:40 |
pleia2 | yeesh | 17:42 |
jeblair | looks okay | 17:43 |
jeblair | i'm going to replace the hostname in dns now | 17:43 |
fungi | wow, heh | 17:43 |
jeblair | iptables configs are by hostname, so once i do that, i can rehup iptables everywhere | 17:43 |
fungi | missing database? raise exception in php, segfault apache | 17:44 |
fungi | sounds like sane error handling to me | 17:44 |
jeblair | (i mean, we could add the new and old ips to iptables globally, then remove them later; but i think we'll be okay with a small gap in cacti data) | 17:44 |
nibalizer | fungi: exactly | 17:44 |
fungi | jeblair: yeah, seems fine. when we see gaps in the graphs, we'll know the reason | 17:45 |
fungi | maybe #status log it for posterity | 17:45 |
jeblair | #status log changed cacti.openstack.org IP address (for upgrade to trusty); gap in data around this time while iptables updates everywhere to allow snmp | 17:46 |
openstackstatus | jeblair: finished logging | 17:46 |
jeblair | also, i just used "python dns.py 2fb22df0-c176-4885-9a66-5735519c719b" to output the dns commands | 17:47 |
fungi | glad that still works | 17:47 |
jeblair | that is necessary because dns.py <hostname> doesn't work when there are 2 hosts, but supplying the uuid does work | 17:47 |
jeblair | so yay! | 17:48 |
fungi | clarkb: nibalizer: crinkle: hey! the ssl-cert to root group name adjustment worked! i just successfully launched a trusty-based storyboard.o.o to completion from scratch | 17:48 |
pleia2 | \o/ | 17:49 |
anteaya | fungi: yay! | 17:49 |
crinkle | \o/ | 17:49 |
nibalizer | nice! | 17:49 |
nibalizer | in clarkb we trust | 17:49 |
anteaya | ha ha ha | 17:50 |
anteaya | I want a coin with that stamped on it | 17:50 |
bkero | On infrabucks | 17:50 |
jeblair | oh, weird | 17:51 |
jeblair | so *most* of our iptables is by hostname | 17:51 |
jeblair | but cacti snmp is not | 17:51 |
jeblair | and the reason i did not find this out by searching for the ip address in system-config is that it is in puppet-iptables | 17:51 |
nibalizer | yep | 17:51 |
* nibalizer really wants to make that whole system better | 17:51 | |
jeblair | nibalizer, fungi: can you instapprove remote: https://review.openstack.org/320085 Change cacti IP address | 17:53 |
jeblair | (i also removed the previous previous cacti address) | 17:53 |
nibalizer | o7 | 17:54 |
fungi | jeblair: awesome | 17:54 |
fungi | jeblair: what about its ipv6 address? i wonder if it's likely to start using that from the new server or not | 17:54 |
jeblair | fungi: i think it had an ipv6 interface+address on host before, just wasn't in dns. but yeah, we should add that... | 17:55 |
jeblair | oh hey | 17:55 |
jeblair | we had the v6 address too | 17:56 |
jeblair | fungi, nibalizer: remote: https://review.openstack.org/320089 Change cacti IPv6 address | 17:58 |
jeblair | i'm enqueuing those into gate to reduce the window we are without data collection | 18:01 |
fungi | anybody happen to know what causes launch-node.py to do this? http://paste.openstack.org/show/498236/ | 18:03 |
fungi | it seems to be benign, but i get pages and pages of it after the script completes | 18:04 |
nibalizer | fungi: no idea | 18:05 |
nibalizer | do you have openstack related envvars sourced when you run it? | 18:05 |
anteaya | bkero: are you going to be testing paste.o.o on trusty on a local vm? | 18:05 |
fungi | nibalizer: yes, i do (following the README from the same directory to the letter) | 18:06 |
anteaya | bkero: I was just about to do that but won't bother if you are | 18:06 |
bkero | anteaya: I did during the summit. Paul was there. I tested using a trusty host against the a db made with precise | 18:06 |
bkero | works fine without modification :) | 18:06 |
anteaya | bkero: great do you want to update the etherpad for the paste.o.o entry? | 18:06 |
anteaya | so we all can track your progress | 18:07 |
jeblair | fungi: i did not see that with the host i just launched | 18:07 |
bkero | anteaya: sure, although I'm not a rooter and can't provision the boxes or do dns myself | 18:07 |
anteaya | right, but you can claim the server with your work | 18:07 |
anteaya | so I can stop duplicating what you have already done | 18:07 |
* bkero nods | 18:07 | |
fungi | jeblair: are you using an interactive root shell on the puppetmaster to run it? just wondering what i might be doing to cause different behavior | 18:07 |
fungi | bkero: yep, i moved our lodgeit db back into trove recently, so if the manifest is working fine on trusty then it should be an easy one for us to swap out. i'm happy to tackle it next after i get the storyboard swapout done | 18:09 |
bkero | fungi: ok, the review has us creating a new paste01 host, so I think that's going to need to be provisioned, then tested, and if works well we can update DNS. | 18:10 |
bkero | Thanks for the assist! | 18:10 |
anteaya | bkero: why are you creating a new domain name? https://review.openstack.org/#/c/311235/2/manifests/site.pp | 18:10 |
bkero | anteaya: see the -1 comment from paul | 18:11 |
bkero | He advised that's the new naming strategy, and I didn't have the context to refute that. So I shrugged and updated the patch to match. | 18:11 |
anteaya | bkero: I think it was decided, did he say who decided or where? | 18:11 |
nibalizer | i'm working on puppetdb01.openstack.org so that fits | 18:11 |
anteaya | okay well that is news to me | 18:12 |
nibalizer | we had a discussion in one of the workrooms at the summit | 18:12 |
bkero | anteaya: Not that I can recall | 18:12 |
fungi | oh, should i rebuild the storyboard.o.o replacement as storyboard01? | 18:12 |
anteaya | and doesn't match what the top of the etherpad says for how to proceed today | 18:12 |
anteaya | I don't care | 18:12 |
anteaya | but I would like some agreement | 18:12 |
nibalizer | yah i wouldn't go so far as to say we decided, but we discussed it a bunch | 18:12 |
anteaya | nibalizer: who? | 18:12 |
fungi | that's mostly because i forgot we'd decided that | 18:12 |
nibalizer | it helps us prevent us having two hosts with the same name | 18:12 |
nibalizer | which makes our automation more predictable and less buggy (and stretches my brain less) | 18:13 |
anteaya | I obviously missed this discussion and would like to read up on it | 18:13 |
jeblair | fungi: no, sudo from corvus | 18:13 |
anteaya | if anyone has any links to the discussion | 18:13 |
jeblair | fungi: i also literally followed the instructions | 18:13 |
bkero | nibalizer: it does allow us to use node regexes, which should make the manifests smaller | 18:13 |
fungi | nibalizer: what was the reason again? because tracking multiple hosts with the same fqdn is hard to make work correctly in our ansible inventory? | 18:13 |
jeblair | fungi: i have export OS_CLIENT_CONFIG_FILE=/etc/openstack/all-clouds.yaml | 18:13 |
jeblair | fungi: in my .profile | 18:14 |
fungi | jeblair: those envvars get preserved by sudo? | 18:14 |
bkero | sudo -E preserves environment | 18:14 |
jeblair | bkero, nibalizer, fungi: i think if we decided on a new naming strategy, we should document it | 18:14 |
nibalizer | jeblair: yah | 18:15 |
bkero | jeblair: I agree. I think that's what this page is for. https://etherpad.openstack.org/p/infra-operating-system-upgrades | 18:15 |
bkero | Can we get quorum in here right now though? | 18:15 |
fungi | well, etherpads aren't documentation | 18:15 |
jeblair | fungi: oh, wait, i did not use sudo | 18:15 |
*** ig0r_ has joined #openstack-sprint | 18:15 | |
fungi | jeblair: oh, so it's working with our normal accounts again now? i didn't think to try that | 18:16 |
nibalizer | i had decided to try out the foo01.o.o syntax with the puppetdb upgrade | 18:16 |
bkero | Speaking of etherpads aren't documentation, does anyone remember which workmeeting the decision was part of, so we can go through the summit pads and validate? | 18:16 |
jeblair | yeah, the documentation would go in http://docs.openstack.org/infra/system-config/sysadmin.html#launching-new-servers or https://git.openstack.org/cgit/openstack-infra/system-config/tree/launch/README | 18:16 |
nibalizer | still in progress because puppetdb is annoying | 18:16 |
fungi | jeblair: for a while it was broken because of lack of filesystem access to reset the inventory cache | 18:16 |
jeblair | i would particularly like it to be documented because i don't recall whether we did 01 or date | 18:16 |
nibalizer | jeblair: want me to take a crack at a system-config patch? | 18:16 |
nibalizer | can whip that up pretty easily | 18:17 |
jeblair | and i'd really like it to go through a written discussion process because then i would know where to find the answer to "what does paste01 mean compared to jenkins01"? | 18:17 |
nibalizer | oh thats a good point i had not considered | 18:18 |
nibalizer | well I could start an ML thread that we can kick around for a while | 18:18 |
fungi | (git01, logstash-worker01, et cetera) | 18:18 |
nibalizer | and we could just do upgrades in the standard way today? | 18:18 |
bkero | Send it to the committee! | 18:18 |
bkero | (heh, don't do that) | 18:18 |
jeblair | like, my recollection is that we *decided* we should change our naming scheme to something that supports more hosts. i don't remember deciding exactly what that scheme would be. i think we're at the point where we should kick around proposals | 18:18 |
jeblair | nibalizer: ++ | 18:19 |
nibalizer | cool | 18:19 |
fungi | i suppose one answer there is that we just keep burning hostnames (so when you replace git01, do so with git09 and delete git01 later) | 18:19 |
bkero | https://etherpad.openstack.org/p/newton-infra-launch-node-ansible-and-puppet | 18:20 |
nibalizer | or use a - or something eg paste-01.o.o and git01-01.openstack.org | 18:20 |
jeblair | fungi: yeah, that would probably be fine | 18:20 |
bkero | This seems like where the discussion took place | 18:20 |
nibalizer | bkero: thanks for digging that up | 18:20 |
nibalizer | btw the spec i promised in that got written and is up against the specs repo | 18:21 |
fungi | we've already followed this pattern at least once, which is why our 6 elasticsearch nodes are numbered 02-07 | 18:21 |
jeblair | fungi: i assume we'll have an es01 again someday | 18:21 |
fungi | there's certainly nothing to stop us | 18:21 |
jeblair | (i'd expect 'lowest available number' to be the algorithm) | 18:22 |
nibalizer | so most of the time it cycles between 01 and 02 ? | 18:22 |
jeblair | i'd imagine so | 18:22 |
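
The "lowest available number" rule jeblair suggests can be sketched as follows. This is only an illustration of the proposal under discussion, not an agreed scheme or existing tooling: the function name, the two-digit zero padding, and the choice to start counting at 01 are all assumptions.

```python
import re


def next_hostname(base, existing, width=2):
    """Return base plus the lowest positive number not already used in existing."""
    pattern = re.compile(rf"^{re.escape(base)}(\d+)$")
    used = set()
    for name in existing:
        match = pattern.match(name)
        if match:
            used.add(int(match.group(1)))
    number = 1
    while number in used:
        number += 1
    return f"{base}{number:0{width}d}"


# Six elasticsearch nodes numbered 02-07: the next one reuses 01.
print(next_hostname("elasticsearch", [f"elasticsearch{n:02d}" for n in range(2, 8)]))

# Replacing a single numbered host cycles between 01 and 02.
print(next_hostname("git", ["git01"]))  # git02
print(next_hostname("git", ["git02"]))  # git01, once the old host is deleted
```

Under this rule a service with one host alternates between 01 and 02 across replacements, which matches nibalizer's reading of the proposal.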
bkero | In theory a piece of clustering software that was sufficiently braindead could see a replaced 'old' node and if that were the leader it would either split-brain or ruin state. | 18:23 |
fungi | or 00 and 01 (cardinals or ordinals?!?) | 18:23 |
bkero | But I don't think any of the clustering software we use is that dumb | 18:23 |
* fungi finds ways to paint the bikeshed | 18:23 | |
fungi | bkero: if it is, we should fix it | 18:24 |
bkero | Definitely, we just need to know that software matching that description exists first. Ideally without getting burned. | 18:28 |
anteaya | can someone who understands what the decision is update the etherpad please? https://etherpad.openstack.org/p/newton-infra-distro-upgrade-plans | 18:28 |
anteaya | then we at least have a chance of following the same workflow | 18:29 |
*** kcalman_ has quit IRC | 18:36 | |
anteaya | I'm going for a walk I'll be back in about an hour | 18:38 |
*** baoli has quit IRC | 19:01 | |
bkero | So, uh, I don't feel I have the authority to put that up at the top, but maybe someone else in here does? | 19:24 |
bkero | So we can move on with $THINGS. | 19:25 |
fungi | did nibalizer's spec cover it? | 19:32 |
* fungi checks | 19:32 | |
fungi | nibalizer: nice futurama reference there | 19:33 |
fungi | i'll propose a straw man of what i think we arrived at as a documentation patch. just a sec | 19:35 |
nibalizer | i think jeblair was saying he wanted to hash it out on the ML | 19:42 |
fungi | that's fine, the ml discussion can be "here's a proposed patch" | 19:42 |
nibalizer | awesome | 19:44 |
nibalizer | thanks fungi | 19:44 |
fungi | https://review.openstack.org/320115 | 19:46 |
fungi | was someone writing the ml thread starter already, or shall i? | 19:46 |
anteaya | I just returned from my walk | 19:48 |
anteaya | I was not writing anything | 19:48 |
fungi | i was taking a longish break for lunch, so only just picked this back up myself | 19:48 |
anteaya | I do not object to fungi writing a ml thread starter | 19:48 |
anteaya | nice, I hope you had a nice lunch | 19:49 |
anteaya | gorgeous day here | 19:49 |
anteaya | sunny and quiet | 19:49 |
pleia2 | planet looks fine, but I haven't run launch-node.py in months, anything I should know once this system-config patch is in place? | 19:49 |
anteaya | like it used to be all summer long | 19:49 |
fungi | other than the fact that one replacement server may have already gone into production today without a numbered suffix, i didn't hear any particular objection to sticking with that pattern | 19:49 |
anteaya | pleia2: nice work on planet | 19:49 |
fungi | pleia2: which system-config patch? | 19:50 |
pleia2 | fungi: just https://review.openstack.org/#/c/320047/ for planet trusty node | 19:50 |
*** baoli has joined #openstack-sprint | 19:56 | |
*** baoli has quit IRC | 19:57 | |
*** baoli has joined #openstack-sprint | 19:57 | |
*** ig0r_ has quit IRC | 19:59 | |
fungi | pleia2: just cut-n-paste what's in launch/README in system-config | 20:01 |
pleia2 | fungi: alright good, no surprises then :) | 20:09 |
*** baoli_ has joined #openstack-sprint | 20:19 | |
*** baoli_ has quit IRC | 20:20 | |
*** baoli_ has joined #openstack-sprint | 20:20 | |
pleia2 | ooh, my local test was not a great one for planet, there are python errors I missed that I don't see in prod, and it's causing a much shorter list of blogs during processing | 20:21 |
* pleia2 digs | 20:21 | |
*** baoli has quit IRC | 20:21 | |
pleia2 | so it's this https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=747381 which brings me to https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=735837 | 20:25 |
openstack | Debian bug 747381 in planet-venus "planet-venus: please move experimental version to unstable" [Important,Fixed] | 20:25 |
openstack | Debian bug 735837 in python-html5lib "python-html5lib 0.999-2 breaks planet-venus" [Important,Open] | 20:25 |
pleia2 | 14.04 has the buggy 0~bzr116-1 version, not the "new" unstable one | 20:26 |
jeblair | do we have any hosts without public ipv6 addresses? | 20:29 |
jeblair | oh, mirror servers... | 20:29 |
jeblair | i suspect that trusty's cacti+php do not play well with servers with aaaa records | 20:30 |
pleia2 | that's depressing | 20:30 |
fungi | wow | 20:31 |
jeblair | i think php's snmp module is misinterpreting the ":161" (port) that is appended to each hostname in the @snmp2_get calls | 20:31 |
pleia2 | x_x | 20:31 |
pleia2 | php! | 20:31 |
pleia2 | lunchtime | 20:32 |
*** baoli_ has quit IRC | 20:35 | |
jeblair | or... maybe it's actually trying to use ipv6... | 20:38 |
fungi | thread on new naming pattern http://lists.openstack.org/pipermail/openstack-infra/2016-May/004323.html | 20:39 |
jeblair | okay, i think the error is entirely in our snmpd config... patch incoming | 20:48 |
jeblair | (and i guess we should cheer that php/cacti have started trying ipv6 by default which has uncovered this?) | 20:48 |
fungi | yay progress! | 20:50 |
jeblair | fungi, pleia2, nibalizer: remote: https://review.openstack.org/320135 Listen on IPv6 | 20:50 |
jeblair | oh there is no subscribe there... | 20:51 |
jeblair | i wonder if we should set that up first | 20:51 |
*** baoli has joined #openstack-sprint | 20:52 | |
fungi | no subscribe for what? | 20:53 |
jeblair | fungi: to restart snmpd when the file is updated | 20:53 |
fungi | ahh | 20:53 |
fungi | in puppet, got it | 20:53 |
fungi | i was low on context | 20:53 |
jeblair | fungi: and i was stingy with it :) | 20:53 |
jeblair | remote: https://review.openstack.org/320137 Subscribe the snmp service to config file updates | 20:54 |
fungi | also for some reason i didn't realize you needed v6-specific snmp communities | 20:54 |
jeblair | fungi: i didn't either, but beat my head against it for several minutes before google told me about it | 20:54 |
jeblair | i restacked it so the subscribe is first, then the v6 update | 20:54 |
fungi | yeah, i just found similar documentation when i reviewed that. amusing | 20:55 |
jeblair | nibalizer, pleia2: if you can review 320137 and 320135 soon, that would help us get our cacti monitoring back | 21:02 |
pleia2 | I'll +A in a few minutes if nibalizer isn't around to look | 21:10 |
fungi | so back on the topic of server names as pertains to this week's exercise... should we proceed under the assumption that our server replacements need to be whatever01 now, or should we hold off doing that and stick with the old method of naming them the same thing as what they're replacing until we gain broader consensus on the ml/doc review? | 21:16 |
fungi | i'm fine going either direction, but am likely to default to $old_way if we're still trying to build consensus on $new_way | 21:17 |
jeblair | i just did cacti.o.o without a number, and don't really feel like redoing it, but i think if we figure we have consensus, i don't object to remaining hosts being numbered | 21:17 |
jeblair | (and am similarly ambivalent) | 21:17 |
nibalizer | ya dont redo cacti | 22:03 |
* bkero would vote for numbering under the assumption that it or something like it would be used in the future. | 22:21 | |
jeblair | cacti has data again | 22:30 |
jeblair | i'll delete the old host tomorrow | 22:30 |
anteaya | yay cacti | 22:41 |
*** jhesketh_ has joined #openstack-sprint | 22:49 | |
*** craige_ has joined #openstack-sprint | 22:51 | |
*** natorious_ has joined #openstack-sprint | 22:52 | |
*** sbadia_ has joined #openstack-sprint | 22:53 | |
*** rfolco has quit IRC | 22:53 | |
*** bkero has quit IRC | 22:53 | |
*** zhenguo_ has quit IRC | 22:53 | |
*** sbadia has quit IRC | 22:53 | |
*** jhesketh has quit IRC | 22:53 | |
*** mmedvede has quit IRC | 22:53 | |
*** craige has quit IRC | 22:53 | |
*** natorious has quit IRC | 22:53 | |
*** sbadia_ is now known as sbadia | 22:53 | |
*** craige_ is now known as craige | 22:53 | |
*** natorious_ is now known as natorious | 22:53 | |
*** mmedvede has joined #openstack-sprint | 23:03 | |
*** yujunz has joined #openstack-sprint | 23:09 | |
*** yujunz has quit IRC | 23:15 | |
fungi | i mostly don't want to create a weird storyboard01 and put it into production only to find later there's consensus on not doing that | 23:15 |
fungi | but if the majority seems to be leaning there, i may just do it anyway and move on with my life | 23:16 |
fungi | i made some very excellent pizza tonight, so i'm in much more of a go-with-the-flow sort of mood | 23:17 |
nibalizer | mmm | 23:21 |
anteaya | pizza | 23:30 |
anteaya | if you create a storyboard01 server would users navigate to storyboard01.openstack.org? | 23:31 |
anteaya | or would it just show up in cacti and system-config (and puppetboard) as storyboard01? | 23:31 |
*** rfolco has joined #openstack-sprint | 23:38 | |
*** zhenguo_ has joined #openstack-sprint | 23:39 | |
pleia2 | anteaya: as I understand it, the name in nova would be storyboard01, and so would the hostname (for ansible) but otherwise it would show up/be referred to as storyboard.o.o | 23:46 |
pleia2 | so I think we'd continue to use storyboard.o.o in cacti and system-config | 23:47 |
anteaya | ah okay thank you | 23:47 |
anteaya | was wondering how far the enumerated name went | 23:47 |
pleia2 | I'm kind of guessing too :) | 23:48 |