*** sarob_ has joined #openstack-infra | 00:01 | |
jeblair | that didn't quite tell me what i needed; restarted with more logging | 00:01 |
*** nati_uen_ has joined #openstack-infra | 00:02 | |
anteaya | :( | 00:02 |
anteaya | yay restart | 00:02 |
*** sarob has quit IRC | 00:04 | |
*** nati_ueno has quit IRC | 00:05 | |
*** michchap has joined #openstack-infra | 00:05 | |
*** weshay has quit IRC | 00:05 | |
clarkb | jeblair: I feel like trying to debug this git thing is taking too much time. Everything works but https clone from centos clients | 00:06 |
clarkb | jeblair: we can either make http available too, clone from /cgit, or use git:// on centos nodes | 00:07 |
clarkb | cloning from /cgit appears to use the non smart protocol | 00:07 |
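For context, the smart-vs-dumb distinction is visible from the client side; a minimal check might look like the following sketch (the URL is a placeholder, not the real server):

    # With curl tracing on, a smart-HTTP clone asks for
    # /info/refs?service=git-upload-pack up front; the dumb protocol instead
    # walks loose refs and pack files one request at a time.
    GIT_CURL_VERBOSE=1 git clone https://git.example.org/openstack/nova 2>&1 | grep 'GET '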
clarkb | I am going to see if pcrews is still in the office to see if he has any ideas | 00:08 |
pcrews | clarkb: /me is at home today :) | 00:08 |
jeblair | clarkb: ok. let's make http available as a backup but go back to git:// | 00:08 |
jeblair | clarkb: (with enough capacity to handle it this time) | 00:08 |
*** UtahDave has quit IRC | 00:09 | |
clarkb | jeblair: sounds good. I will push one more patchset to enable http then we should be ready to start spinning up nodes | 00:11 |
clarkb | pcrews: git clone https://foo through haproxy takes a really long time then fails. doing the same clone to the backend server takes a really long time but does not fail | 00:12 |
clarkb | pcrews: wondering what might cause that and it is only when using https not http and only on centos | 00:12 |
pcrews | ? not a clue | 00:14 |
*** senk has joined #openstack-infra | 00:16 | |
clarkb | jeblair: do you want to try doing this tonight? | 00:17 |
clarkb | fwiw /cgit isn't that much slower. about 2 minutes to clone over https | 00:18 |
jeblair | clarkb: up to you; i will need to continue to focus on zuul tonight | 00:18 |
jeblair | clarkb: but what is git:// ? | 00:18 |
clarkb | jeblair: 34 seconds or so | 00:18 |
jeblair | clarkb: that's kinda why i was thinking we should use it, but have http as a backup | 00:19 |
clarkb | ++ | 00:19 |
clarkb | jeblair: I think I would prefer at least one extra set of eyes when we make the switch as there are a lot of moving parts | 00:19 |
anteaya | clarkb: who do you have in mind? I'm no help there | 00:20 |
clarkb | anteaya: jeblair :) | 00:20 |
*** wenlock has quit IRC | 00:20 | |
jeblair | clarkb: how many servers do you want, and what size? | 00:20 |
anteaya | clarkb: ah okay, sorry thought you were talking about a 3rd person | 00:20 |
clarkb | jeblair: I think you have a better feel for that than I do | 00:21 |
jeblair | aha, i think i found the zuul problem | 00:21 |
clarkb | \o/ | 00:21 |
clarkb | jeblair: one additional thing to keep in mind with lots of small servers is the extra work gerrit will need to do replicating. Not sure if that is a big deal | 00:22 |
anteaya | yay | 00:22 |
comstud | btw, i appreciate the work all of you are doing... despite all of the cursing that I'm doing. :) | 00:22 |
jeblair | clarkb: good point; maybe we should go with several 8g servers then? | 00:22 |
anteaya | I vote we let jeblair try to patch zuul first | 00:22 |
anteaya | thanks comstud | 00:22 |
anteaya | I think we are doing our share of cursing too | 00:23 |
comstud | i figure so | 00:23 |
clarkb | jeblair: that sounds reasonable | 00:23 |
clarkb | jeblair: start with ~4 then we can add more if needed? | 00:23 |
*** alexpilotti has quit IRC | 00:23 | |
jeblair | clarkb: yeah | 00:24 |
clarkb | lifeless: does the haproxy source balance type completely break if your sources are in the same /24 subnet? | 00:24 |
clarkb | lifeless: I am slightly worried that replication delays will cause problems with git if it hits 5 different servers at once (which by default it can do) | 00:25 |
*** jjmb has joined #openstack-infra | 00:25 | |
clarkb | at least with the http protocol. I think git:// is one connection | 00:25 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 00:28 |
lifeless | clarkb: git is one connection yes | 00:28 |
clarkb | jeblair: ^ that makes http a viable fallback option | 00:28 |
*** dina_belova has joined #openstack-infra | 00:28 | |
lifeless | clarkb: multiple http requests can go to different servers | 00:28 |
anteaya | 10 lovely patches in the gate, currently passing tests ha ha ha ha *flash of lightning* | 00:28 |
clarkb | lifeless: typically, however we will be replicating to 5 different servers and potentially at different rates | 00:28 |
*** changbl has joined #openstack-infra | 00:28 | |
anteaya | the 11th one has a LOST | 00:28 |
lifeless | clarkb: if the first server hits its max_queue, it goes down | 00:29 |
lifeless | clarkb: yes, I get the case ;) | 00:29 |
lifeless | clarkb: have you read the docs for source - when you add servers, you'll shuffle 1/new-num-servers of the http clients onto new servers | 00:29 |
clarkb | lifeless: using the default round robin it dynamically weighs them | 00:30 |
lifeless | clarkb: right | 00:30 |
clarkb | oh looks like that would happen with source as there is a division of the hash | 00:31 |
lifeless | clarkb: there isn't a mode where you can avoid http requests going to different servers; you only get to choose whether it happens all the time or when you have servers going down/up/added. | 00:31 |
clarkb | I think I am less worried about that case and more generally worried about it when everything is going smoothly and gerrit happens to update one server more slowly than the others | 00:31 |
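For reference, the two balance modes being weighed map to a couple of lines of haproxy configuration; a minimal sketch (hostnames and ports are illustrative, not the proposed change):

    backend git_http
        balance source        # hash of the client IP divided across the server
                              # count, so adding a server reshuffles roughly
                              # 1/new-count of the clients
        # balance roundrobin  # alternative: dynamic weighting, but successive
                              # requests from one client can hit different backends
        server git01 git01.example.org:8080 check
        server git02 git02.example.org:8080 check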
lifeless | are you running git-http-backend, or plain-ol-HTTP ? | 00:32 |
harlowja | qq, is there going to be an update to say the mailing list when jenkins is ok again? | 00:32 |
clarkb | git-http-backend | 00:32 |
*** dina_belova has quit IRC | 00:33 | |
anteaya | harlowja: you will hear the party happening when jenkins is okay again | 00:33 |
harlowja | sounds great :) | 00:33 |
anteaya | and yes, we can do an update to the ml too | 00:33 |
anteaya | thanks | 00:33 |
harlowja | thx for your guys hardwork | 00:33 |
harlowja | *and gals | 00:33 |
anteaya | thanks harlowja | 00:34 |
anteaya | :D | 00:34 |
lifeless | clarkb: so, why do you want HTTP ? | 00:34 |
lifeless | clarkb: you should read http://git-scm.com/book/en/Git-Internals-Transfer-Protocols | 00:34 |
lifeless | thats why the http git CDN terrifies me :) | 00:36 |
*** chmouel has quit IRC | 00:36 | |
jeblair | stopped zuul again; it hit the bug and i have more logs | 00:36 |
*** westmaas has quit IRC | 00:36 | |
*** westmaas has joined #openstack-infra | 00:36 | |
*** chmouel has joined #openstack-infra | 00:37 | |
*** GheRivero has quit IRC | 00:37 | |
*** dtroyer has quit IRC | 00:37 | |
*** juice has quit IRC | 00:37 | |
*** GheRivero has joined #openstack-infra | 00:37 | |
anteaya | jeblair: yay | 00:37 |
*** dtroyer has joined #openstack-infra | 00:37 | |
anteaya | let's hope the secret is in the logs | 00:37 |
*** jpeeler has quit IRC | 00:37 | |
*** juice has joined #openstack-infra | 00:37 | |
*** jpeeler has joined #openstack-infra | 00:38 | |
*** jjmb1 has joined #openstack-infra | 00:39 | |
clarkb | lifeless: for a couple reasons. 1. Apache is generally good about helping us not shoot ourselves in the foot unlike git daemon 2. $RANDOM people can usually hit port 443 3. with https you have a reasonable amount of trust in who the remote end is | 00:39 |
*** jjmb has quit IRC | 00:40 | |
clarkb | lifeless: I think we are getting better at item 1 with haproxy but items 2 and 3 aren't really solved with git daemon + haproxy | 00:40 |
*** melwitt has quit IRC | 00:40 | |
clarkb | 2 and 3 aren't really gate issues | 00:42 |
clarkb | jeblair: I have ten nova git clones over git protocol in a while true loop on the 30g host cloning from the 15g host through haproxy | 00:44 |
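Presumably that load test amounts to something like the following sketch (the IP is the 15g test host named later in the log; the exact commands are an assumption):

    # ten parallel clone loops over git:// against the haproxy-fronted host;
    # stop with Ctrl-C
    for i in $(seq 1 10); do
      while true; do
        rm -rf "nova-$i" && git clone "git://162.209.12.127/openstack/nova" "nova-$i"
      done &
    done
    wait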
lifeless | clarkb: http://www.anchor.com.au/blog/2009/10/load-balancing-at-github-why-ldirectord/ might be an interesting read when you have time | 00:45 |
lifeless | clarkb: so I'm reasonably sure smart http will still do multiple requests. | 00:46 |
lifeless | clarkb: smart https will be totally fine. it's only http that will suck. | 00:46 |
lifeless | clarkb: my suggestion, use roundrobin, but https and git ports only | 00:46 |
clarkb | lifeless: how is https different than http in this scenario? | 00:47 |
lifeless | clarkb: [and https in tcp mode so you just get a tunnel] | 00:47 |
mgagne | lifeless: where are their puppet manifests =) | 00:48 |
lifeless | clarkb: I'm fairly sure git will use one tcp connection for https | 00:48 |
lifeless | clarkb: because everyone knows how slow https handshakes are | 00:48 |
clarkb | lifeless: I wonder if that could be why it fails so hard on centos | 00:49 |
lifeless | clarkb: and intermediaries can't mess you up, whereas for http there are intercepting proxies all over the damn place | 00:49 |
lifeless | clarkb: https intercepting proxies are rarer | 00:49 |
clarkb | there is a long delay when starting an https clone. At first I thought it may be related to handshaking but cloning from /cgit over https does not have the same delay and they share the same ssl setup | 00:50 |
*** sarob_ has quit IRC | 00:51 | |
clarkb | and tcpdump showed the client reporting a zero window size frequently | 00:52 |
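The zero-window observation can be reproduced with a capture filter on the TCP window field; a sketch (interface name is illustrative, not necessarily the command that was run):

    # bytes 14-15 of the TCP header carry the advertised window; this shows only
    # segments advertising a zero window on the https port
    tcpdump -nn -i eth0 'tcp port 443 and tcp[14:2] = 0'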
*** jhesketh has quit IRC | 00:53 | |
*** jhesketh has joined #openstack-infra | 00:53 | |
*** nati_uen_ has quit IRC | 00:55 | |
clarkb | jeblair: I have taken the load on fake git.o.o up to 23ish and haven't had any clones fall over yet | 00:55 |
*** nati_ueno has joined #openstack-infra | 00:57 | |
openstackgerrit | benley proposed a change to openstack-infra/jenkins-job-builder: Add display-name job property. https://review.openstack.org/41828 | 01:01 |
openstackgerrit | Angus Salkeld proposed a change to openstack/requirements: Add some more filters to the .gitignore https://review.openstack.org/43216 | 01:01 |
openstackgerrit | Angus Salkeld proposed a change to openstack/requirements: Bump python-ceilometerclient to 1.0.3 https://review.openstack.org/43217 | 01:01 |
jeblair | #status alert Zuul is offline for troubleshooting | 01:02 |
openstackstatus | NOTICE: Zuul is offline for troubleshooting | 01:02 |
*** ChanServ changes topic to "Zuul is offline for troubleshooting" | 01:02 | |
clarkb | jeblair: I am going to grab dinner soon | 01:05 |
jeblair | clarkb: k | 01:05 |
*** reed has quit IRC | 01:06 | |
clarkb | jeblair: we will need to spin up those 4 nodes tomorrow then write puppet changes to replicate to them and add them to haproxy then we can put everything in | 01:06 |
clarkb | oh and changes to update the clone urls | 01:06 |
jeblair | clarkb: are you happy with the haproxy config? | 01:06 |
*** nati_ueno has quit IRC | 01:07 | |
clarkb | jeblair: I think so. I hammered the git:// relatively hard with some for loops | 01:07 |
*** markmcclain has quit IRC | 01:07 | |
*** fifieldt_ has joined #openstack-infra | 01:20 | |
*** UtahDave has joined #openstack-infra | 01:20 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/zuul: Make updateChange actually update the change https://review.openstack.org/43220 | 01:23 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/zuul: Add some log lines https://review.openstack.org/43221 | 01:23 |
jeblair | let's hope that's it. | 01:23 |
jeblair | i will install that and restart zuul | 01:23 |
* clarkb reviews really quickly | 01:25 | |
jeblair | clarkb: i will wait to start until you have reviewed | 01:25 |
jeblair | clarkb: specifically it was the 'needed_by_changes' that was the problem here | 01:25 |
jeblair | and a patch series of like 20 changes | 01:26 |
clarkb | ok interesting thing about the list of files | 01:26 |
clarkb | jeblair: yup lgtm. nice catch | 01:27 |
jeblair | i just did that because it looked like it could be wrong too (just keep appending files) | 01:27 |
clarkb | ya I agree | 01:27 |
clarkb | ok running off to find dinner and fungi | 01:27 |
jeblair | ok starting zuul | 01:27 |
*** dina_belova has joined #openstack-infra | 01:28 | |
*** dina_belova has quit IRC | 01:33 | |
*** gyee has quit IRC | 01:35 | |
openstackgerrit | Mathieu Gagné proposed a change to openstack-infra/config: Add commit-filter for cgit https://review.openstack.org/43222 | 01:36 |
*** huangtianhua has joined #openstack-infra | 01:37 | |
jeblair | #status ok Zuul is running again | 01:38 |
openstackstatus | NOTICE: Zuul is running again | 01:38 |
*** ChanServ changes topic to "Discussion of OpenStack Developer Infrastructure | docs http://ci.openstack.org | bugs https://launchpad.net/openstack-ci/+milestone/grizzly | https://github.com/openstack-infra/config" | 01:38 | |
anteaya | yay | 01:38 |
anteaya | let's see what happens now | 01:38 |
anteaya | way to go jeblair! | 01:39 |
jeblair | may want to wait about an hour before you cheer :) | 01:39 |
anteaya | I'll cheer again in an hour | 01:40 |
*** mriedem has joined #openstack-infra | 01:40 | |
anteaya | hopefully soon I will learn enough to help you | 01:40 |
*** dhellmann is now known as dhellmann_ | 01:48 | |
*** xchu has joined #openstack-infra | 01:52 | |
*** yaguang has joined #openstack-infra | 02:02 | |
*** roaet has joined #openstack-infra | 02:17 | |
roaet | Alright. Read through that scroll back. I'll try to pay attention now. :) | 02:17 |
anteaya | roaet: welcome | 02:19 |
anteaya | 66 in the gate, 25 in the check | 02:20 |
anteaya | any free nodes are going to check now, the gate queue seems to be loaded | 02:20 |
anteaya | it took almost an hour to load those 91 patches | 02:21 |
anteaya | and I expect there are at least 90 check patches to come into the check queue so I would look for your patch in the list in about an hour roaet | 02:22 |
roaet | anteaya: thanks. I will do so. trying to wrap my mind around all the information. | 02:22 |
anteaya | it is a lot | 02:22 |
anteaya | I suggest take a small piece | 02:22 |
anteaya | if you know jenkins plugins already, start there | 02:22 |
anteaya | ask questions, don't worry how dumb | 02:23 |
anteaya | I'll do my best to answer or help you find the answer | 02:23 |
roaet | Thanks. Look forward to working with you all. | 02:23 |
*** DennyZhang has joined #openstack-infra | 02:23 | |
anteaya | thanks looking forward to working with you too | 02:23 |
anteaya | what time zone are you in? | 02:24 |
anteaya | I'm in Eastern | 02:24 |
roaet | Central | 02:24 |
jeblair | anteaya: i don't have a full list of everything that needs to be rechecked/reverified; only the list from when i stopped it the first time | 02:25 |
anteaya | jeblair: great | 02:25 |
jeblair | anteaya: i'm slowly leaving recheck comments on those, to avoid the thundering herd | 02:25 |
anteaya | I have been encouraging those to wait for the queue to populate and then recheck if they don't see theirs | 02:25 |
anteaya | great | 02:25 |
jeblair | anteaya: but anything added since about 4 hours or so ago i won't have | 02:26 |
anteaya | I am encouraging the thundering herd to let us build up slowly | 02:26 |
anteaya | ah okay | 02:26 |
anteaya | I'll use that as a marker | 02:26 |
jeblair | (if i've been leaving recheck comments though, my script will get to those again) | 02:26 |
roaet | jeblair: I'm assuming if you hit my change then it was there (i see your recheck there) | 02:26 |
jeblair | roaet: probably, which number? | 02:26 |
roaet | 42242 | 02:27 |
jeblair | roaet: yeah, it's about 30-something down the list, so probably any time now | 02:27 |
roaet | Thanks a lot. I'll try to help however I can in the future. Don't want to mythical man month you at the moment. But I'll try my best. | 02:28 |
lifeless | ttx: jeblair: so - nova baremetal is broken at the moment | 02:28 |
lifeless | does that impact anything release mgmt wise right now ? | 02:28 |
*** dina_belova has joined #openstack-infra | 02:29 | |
jeblair | lifeless: i don't think so; we're just around feature freeze (we're actually only at feature proposal freeze) | 02:29 |
jeblair | lifeless: https://wiki.openstack.org/wiki/Havana_Release_Schedule | 02:30 |
jeblair | lifeless: h3 milestone release is sep 6 | 02:30 |
jeblair | \o/ 3 changes just merged | 02:33 |
*** dina_belova has quit IRC | 02:33 | |
morganfainberg | jeblair: just looking at https://review.openstack.org/#/c/39899/ i see zuul posted on it about 15 minutes ago, but don't see it in the queue | 02:34 |
morganfainberg | oh wait nvm | 02:34 |
morganfainberg | Misreading time | 02:34 |
morganfainberg | crap 2hrs ago | 02:34 |
anteaya | ah okay | 02:34 |
morganfainberg | i obviously can't think. | 02:34 |
anteaya | no worries | 02:34 |
jeblair | morganfainberg: yeah, you'll want to reverify that then, sorry | 02:35 |
anteaya | we are all tired | 02:35 |
morganfainberg | jeblair: not a worry man, just trying to make sure i get these important ones in the queue | 02:35 |
*** gordc has joined #openstack-infra | 02:35 | |
anteaya | 4 successful patches in the gate | 02:35 |
anteaya | yay | 02:35 |
morganfainberg | jeblair: is it going to take a while to pickup reverifies since it's still slowly reconsituting the queues? | 02:36 |
*** rcleere has joined #openstack-infra | 02:36 | |
jeblair | morganfainberg: it's about 70 gerrit events behind right now, so if you add your reverify, it'll go onto the end of that queue first before it shows up in the gate queue | 02:37 |
morganfainberg | great thats that i wanted to know. | 02:38 |
morganfainberg | thanks! | 02:38 |
*** ftcjeff has joined #openstack-infra | 02:38 | |
jeblair | 6 more changes merged | 02:38 |
openstackgerrit | Steve Baker proposed a change to openstack-infra/config: Generate heat docs on check and gate https://review.openstack.org/43234 | 02:38 |
morganfainberg | woot. | 02:38 |
anteaya | yay | 02:39 |
morganfainberg | merged is good! | 02:39 |
anteaya | yay merged | 02:39 |
jeblair | the git server is still going to be a big problem; a lot of tests are going to fail because of that. | 02:40 |
anteaya | :( | 02:41 |
anteaya | https://tinyurl.com/m9gcyjp | 02:41 |
anteaya | the graph seems to be in UTC | 02:41 |
anteaya | hopefully zuul can stay up for the rest of the night (insert appropriate time of day for yourself, dear reader) | 02:42 |
anteaya | jeblair: so the plan is to address git changes tomorrow? | 02:51 |
anteaya | yay 3 jobs in post | 02:51 |
jeblair | anteaya: yes. just like today. | 02:52 |
anteaya | sorry I thought today was benchmarking and tomorrow is making the changes | 02:52 |
jeblair | anteaya: nope, benchmarking wasn't on the agenda until after we rolled it out. it just took a while for haproxy to get set up. | 02:53 |
anteaya | ah | 02:53 |
morganfainberg | anteaya: ok, my 2 changesets that are needed arrived in the gate queue, thanks again for keeping me posted on what was up over here earlier on. | 02:54 |
anteaya | sorry I missed that point | 02:54 |
anteaya | yay | 02:54 |
anteaya | you are welcome, morganfainberg, thanks for your patience | 02:54 |
gordc | sweet, finally got a jenkins result back. big thanks jeblair and anyone else working on the issues. | 02:55 |
anteaya | yay gordc | 02:58 |
anteaya | congratulations on your jenkins result | 02:58 |
anteaya | jeblair has been working hard on it | 02:58 |
gordc | small victories :) yep, i've seen his name all over the rechecks. | 02:59 |
*** markmcclain has joined #openstack-infra | 03:00 | |
anteaya | :D | 03:00 |
anteaya | gotta celebrate them when they occur | 03:00 |
*** mriedem has quit IRC | 03:01 | |
*** tjones has joined #openstack-infra | 03:02 | |
*** tjones has left #openstack-infra | 03:02 | |
*** blamar has quit IRC | 03:17 | |
anteaya | zuul has been up for an hour and 40 minutes, how is it looking jeblair? | 03:19 |
anteaya | can I cheer again? | 03:19 |
anteaya | everything I can see looks good | 03:20 |
anteaya | jobs are finishing, others are starting | 03:20 |
anteaya | roaet your patch is being tested as we speak | 03:22 |
*** dina_belova has joined #openstack-infra | 03:29 | |
*** jfriedly has quit IRC | 03:32 | |
anteaya | there are two patches, a cinder and a neutron patch that have been in the post queue for a while | 03:34 |
*** dina_belova has quit IRC | 03:34 | |
anteaya | the translation-update job passed for both but the other three jobs: tarball, coverage and docs are queued and have been for a while | 03:34 |
anteaya | I will watch them and see if they move along | 03:35 |
anteaya | gate 72, check 153, post 2 | 03:35 |
Alex_Gaynor | need more workers :) | 03:35 |
anteaya | yeah | 03:36 |
anteaya | that is what jeblair and clarkb talked about creating | 03:36 |
anteaya | I think it is on tomorrow's agenda | 03:36 |
Alex_Gaynor | "need more cloud" :) | 03:36 |
anteaya | moar cloud | 03:37 |
anteaya | yeah, I hear that | 03:37 |
Alex_Gaynor | did we land either the git mirroring or the zuul fix, or are we just flying on luck? | 03:37 |
anteaya | okay those post patches have jobs running | 03:37 |
anteaya | zuul fix landed | 03:37 |
anteaya | been up for two hours with the new zuul fix | 03:38 |
Alex_Gaynor | oh, awesome, so now it should be at least smooth sailing (but slow) | 03:38 |
anteaya | so far, so good from what I can see | 03:38 |
anteaya | smooth but slow would be great | 03:38 |
anteaya | hanging out to check on the smooth part | 03:38 |
Alex_Gaynor | hmm, was the commit in the zuul repo? I don't see any new commits | 03:38 |
anteaya | not yet | 03:38 |
anteaya | let me dig it up | 03:39 |
Alex_Gaynor | I thought I read everything in the backlog, I must have missed it | 03:39 |
anteaya | https://review.openstack.org/#/c/43220/ | 03:39 |
anteaya | https://review.openstack.org/#/c/43221/ | 03:39 |
anteaya | everything I understand has me believing that jeblair made these changes before the last zuul restart | 03:40 |
anteaya | I rely on jeblair to correct me if I am wrong | 03:40 |
anteaya | in this regard | 03:40 |
Alex_Gaynor | thanks | 03:40 |
anteaya | np | 03:41 |
anteaya | thanks for asking | 03:41 |
anteaya | you actually know more about what is going on than I do | 03:41 |
Alex_Gaynor | I seriously doubt it :) | 03:42 |
anteaya | ha ha ha | 03:42 |
*** yaguang has quit IRC | 03:42 | |
anteaya | well you know a lot | 03:42 |
*** nati_ueno has joined #openstack-infra | 03:42 | |
anteaya | grateful for you input | 03:42 |
*** dims has quit IRC | 03:43 | |
anteaya | s/you/your | 03:44 |
*** nati_ueno has quit IRC | 03:44 | |
*** yaguang has joined #openstack-infra | 03:45 | |
anteaya | yay Queue lengths: 0 events, 0 results. | 03:51 |
anteaya | gate 71, post 1, check 152 | 03:51 |
anteaya | it is deleting a bunch of servers and starting a bunch more jobs | 03:52 |
*** HenryG_ has joined #openstack-infra | 03:54 | |
*** dstufft_ has joined #openstack-infra | 03:54 | |
*** DennyZhang has quit IRC | 03:54 | |
*** cyeoh has quit IRC | 03:54 | |
*** soren has quit IRC | 03:54 | |
*** dstufft has quit IRC | 03:55 | |
*** soren has joined #openstack-infra | 03:55 | |
*** DennyZhang has joined #openstack-infra | 03:55 | |
*** cyeoh has joined #openstack-infra | 03:55 | |
*** HenryG has quit IRC | 03:57 | |
*** vogxn has joined #openstack-infra | 03:57 | |
anteaya | slow and smooth seems to characterize what I am seeing right now | 03:58 |
Alex_Gaynor | uh oh, we've got a failure coming up in the gate pipeline :( | 03:58 |
anteaya | :( | 03:58 |
anteaya | yeah that always makes me sad too | 03:59 |
anteaya | the third patch | 03:59 |
anteaya | so two have a chance of getting in, then reset | 03:59 |
* Alex_Gaynor shaves 45 minutes off his life | 03:59 | |
anteaya | when I see 6 or 8 passing in the gate, I do a little happy dance in my chair | 04:00 |
anteaya | yup | 04:00 |
anteaya | http://tinyurl.com/kmotmns | 04:00 |
anteaya | here is a url for the test node graph | 04:00 |
anteaya | refreshing the page updates the graph | 04:00 |
anteaya | I have to turn in | 04:00 |
anteaya | what patch are you waiting on Alex_Gaynor? | 04:01 |
Alex_Gaynor | Nothing in particular, I just like watching the patches flow through the system | 04:01 |
anteaya | cool | 04:01 |
anteaya | I hear that | 04:02 |
anteaya | 3 in post, yay! | 04:02 |
anteaya | okay I'm done | 04:02 |
anteaya | have a good night Alex_Gaynor | 04:03 |
Alex_Gaynor | you too! | 04:03 |
anteaya | thanks | 04:03 |
*** anteaya has quit IRC | 04:03 | |
*** eharney has joined #openstack-infra | 04:03 | |
*** dkliban has joined #openstack-infra | 04:04 | |
Alex_Gaynor | uh oh, the results seem to have started to build up again | 04:07 |
Alex_Gaynor | there's several changesets in check that should have been processed already | 04:08 |
Alex_Gaynor | maybe not, coming back down | 04:12 |
*** huangtianhua has quit IRC | 04:15 | |
*** dklyle has joined #openstack-infra | 04:23 | |
*** dtroyer has quit IRC | 04:23 | |
*** dtroyer has joined #openstack-infra | 04:23 | |
*** retr0h has quit IRC | 04:23 | |
*** david-lyle has quit IRC | 04:23 | |
*** samalba has quit IRC | 04:23 | |
*** comstud has quit IRC | 04:23 | |
*** mberwanger has joined #openstack-infra | 04:24 | |
*** retr0h has joined #openstack-infra | 04:25 | |
*** retr0h has joined #openstack-infra | 04:25 | |
*** comstud has joined #openstack-infra | 04:25 | |
openstackgerrit | A change was merged to openstack-infra/jenkins-job-builder: Add support for parameter filters in copyartifact https://review.openstack.org/41582 | 04:26 |
openstackgerrit | A change was merged to openstack-infra/jenkins-job-builder: Fixed timeout wrapper https://review.openstack.org/42348 | 04:26 |
*** samalba has joined #openstack-infra | 04:27 | |
*** gordc has quit IRC | 04:29 | |
*** dina_belova has joined #openstack-infra | 04:30 | |
*** rcleere has quit IRC | 04:32 | |
*** markmcclain has quit IRC | 04:33 | |
*** markmcclain has joined #openstack-infra | 04:33 | |
*** dina_belova has quit IRC | 04:35 | |
clarkb | Alex_Gaynor: seems ok right now | 04:35 |
clarkb | Alex_Gaynor: have you seen any more oddness? | 04:35 |
Alex_Gaynor | clarkb: yeah, must have just been a temporary blip | 04:35 |
Alex_Gaynor | Also, damn you centos for being the only thing with py26 | 04:36 |
clarkb | Alex_Gaynor: I agree | 04:36 |
clarkb | centos git makes me so sad | 04:36 |
*** boris-42 has joined #openstack-infra | 04:36 | |
Alex_Gaynor | clarkb: I don't think so, the only other thing I've noticed is that sometimes the SCP step takes an abnormally long time, way longer than I remember it taking in previous weeks | 04:37 |
*** DennyZhang has quit IRC | 04:37 | |
clarkb | Alex_Gaynor: there may be contention on the log server | 04:37 |
*** eharney has quit IRC | 04:37 | |
Alex_Gaynor | makes sense | 04:37 |
Alex_Gaynor | most insane CI infrastructure I've ever been a part of | 04:37 |
clarkb | http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=13&page=2 | 04:38 |
clarkb | its possible the finds that cleanup things slow stuff down when they run | 04:38 |
clarkb | Alex_Gaynor: the big CPU blips you see on that page are a result of find running and deleting old logs, compressing new things and so on | 04:39 |
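Those cleanup passes are essentially periodic find runs over the log tree; a hedged sketch of the general shape (paths and retention periods are illustrative, not the actual cron jobs):

    # compress logs older than a week, then expire very old ones
    find /srv/static/logs -type f -name '*.txt' -mtime +7 -exec gzip {} +
    find /srv/static/logs -type f -mtime +180 -delete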
*** Anju has left #openstack-infra | 04:39 | |
clarkb | Alex_Gaynor: I will make a note to look at that once git is sorted | 04:39 |
Alex_Gaynor | clarkb: yeah, definitely a low priority item :) | 04:39 |
clarkb | jeblair: my git plan. 1. spin up new servers 2. replicate from gerrit to new servers. 3. merge change to use git:// in g-g-p 4. merge haproxy change 5. merge change to add haproxy nodes | 04:40 |
clarkb | jeblair: 4 and 5 may end up being squashed together or at least merged together | 04:40 |
clarkb | then at some point we can use git:// in d-g but d-g should continue to happily use https so we can make sure nothing is falling over before doing that | 04:41 |
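Step 2 of that plan is driven by remote sections in gerrit's replication config; a sketch of the shape of one entry (the user, path, and options shown are assumptions, not the actual change):

    [remote "git01"]
        url = cgit@git01.openstack.org:/var/lib/git/${name}.git
        mirror = true
        threads = 3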
*** dstufft_ is now known as dstufft | 04:46 | |
*** ftcjeff has quit IRC | 04:49 | |
* Alex_Gaynor wonders if there's merit in installing a ppa for 2.6 on some of the other nodes | 04:50 | |
clarkb | maybe? there is value in testing on centos | 04:51 |
clarkb | now we know that you don't want to deploy openstack with git on centos :) | 04:52 |
clarkb | at least not with https and git-http-backend | 04:52 |
* Alex_Gaynor doesn't want to deploy much of anything on centos | 04:52 | |
* dstufft concurs with Alex_Gaynor | 04:52 | |
Alex_Gaynor | basically the entire check queue is bottlenecked on 2.6 :{ | 04:53 |
*** SergeyLukjanov has joined #openstack-infra | 04:53 | |
clarkb | Alex_Gaynor: yeah made worse by the gate monopolizing those resources | 04:53 |
Alex_Gaynor | clarkb: probably better this way, as long as we don't get a gate reset | 04:54 |
clarkb | ya, making the gate a higher priority was done with reason | 04:55 |
clarkb | in part to remove barriers to merging security related fixes | 04:55 |
Alex_Gaynor | the right way to address starvation in check is to just add more workers, not mess with the algorithms, IMO | 04:55 |
clarkb | yup | 04:55 |
*** nati_ueno has joined #openstack-infra | 04:56 | |
clarkb | Alex_Gaynor: we definitely want to use nodepool to dynamically add slaves that hang around longer | 04:56 |
clarkb | Alex_Gaynor: mordred has even hacked up kexec machinery that might be useful in having single use slaves that aren't as expensive to use as today's single use slaves | 04:56 |
Alex_Gaynor | neat | 04:57 |
clarkb | Alex_Gaynor: the tricky bit there is we have single use slaves like we do today because tests get root and can really hose stuff | 04:57 |
clarkb | Alex_Gaynor: making sure that kexec can reboot into a good state without having been hosed by a test is a bit of work | 04:57 |
Alex_Gaynor | I wonder if there's any prior art | 04:58 |
clarkb | Alex_Gaynor: jeblair had stuff to do it when the tests ran on hardware | 04:58 |
clarkb | but I am not sure how worried they were of root abuse (intentional or not) at the time | 04:58 |
*** dkliban has quit IRC | 05:00 | |
fungi | clarkb: still reading scrollback but for future reference i think you can pass git ipv6 address literals using standard square-bracket notation (git clone http://[2001:4800:7812:514:3bc | 05:03 |
pleia2 | ah! good to know | 05:04 |
*** Dr01d has joined #openstack-infra | 05:04 | |
clarkb | fungi: thanks. I wonder why it doesn't split on the right side | 05:05 |
clarkb | seems like that should work just fine. I guess if you leave the port off you won't know if it is a port or part of the address | 05:05 |
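An illustrative form of the bracket notation fungi describes (using the documentation prefix rather than the real address):

    # brackets mark where the address ends, so an optional :port is unambiguous
    git clone 'http://[2001:db8::10]:8080/openstack/nova'
    git clone 'http://[2001:db8::10]/openstack/nova'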
clarkb | fungi: if you want to poke at the centos git cloning the 15g server jeblair listed in scroll back 162.209.12.127 iirc (I remembered that somehow) is the haproxy + apache + git server and has openstack/nova on it | 05:07 |
clarkb | fungi: the 30g server 198.something was where I was running the client | 05:07 |
clarkb | fungi: haproxy is listening on 80, 443, and 9418 and apache is on 8080, 4443. git-daemon is on 29418 | 05:09 |
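A minimal haproxy sketch consistent with that port layout (section names are illustrative; this is not the actual proposed configuration):

    listen git_daemon 0.0.0.0:9418
        mode tcp
        balance source
        server local 127.0.0.1:29418 check

    listen git_https 0.0.0.0:443
        mode tcp                     # TLS passes through; apache terminates it on 4443
        balance source
        server local 127.0.0.1:4443 check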
*** Ryan_Lane has quit IRC | 05:10 | |
*** nicedice_ has quit IRC | 05:13 | |
fungi | clarkb: cool. i'll see if i can spot any major differences in window scaling defaults in the kernel tcp/ip settings vs on ubuntu precise | 05:15 |
clarkb | ty | 05:15 |
*** mberwanger has quit IRC | 05:16 | |
*** primeministerp has quit IRC | 05:22 | |
*** primeministerp has joined #openstack-infra | 05:29 | |
*** dina_belova has joined #openstack-infra | 05:30 | |
*** sridevi has joined #openstack-infra | 05:32 | |
*** dina_belova has quit IRC | 05:35 | |
openstackgerrit | A change was merged to openstack/requirements: Bump python-swiftclient requirement to >=1.5 https://review.openstack.org/43092 | 05:37 |
openstackgerrit | A change was merged to openstack-infra/jenkins-job-builder: Fixing override-votes for gerrit trigger https://review.openstack.org/42341 | 05:45 |
*** morganfainberg is now known as morganfainberg|a | 05:56 | |
*** dmakogon_ has joined #openstack-infra | 05:58 | |
*** UtahDave has quit IRC | 05:58 | |
*** xchu has quit IRC | 06:07 | |
*** SlickNik has quit IRC | 06:09 | |
*** SlickNik has joined #openstack-infra | 06:09 | |
*** morganfainberg|a is now known as morganfainberg | 06:14 | |
*** SergeyLukjanov has quit IRC | 06:15 | |
*** markmc has joined #openstack-infra | 06:20 | |
*** dguitarbite has quit IRC | 06:21 | |
markmc | clarkb, fwiw, https://bugs.launchpad.net/openstack-ci/+bug/1215290 | 06:21 |
uvirtbot | Launchpad bug 1215290 in openstack-ci "git https clones failing on centos slaves" [Undecided,New] | 06:21 |
markmc | clarkb, just trying to get the info in one place | 06:21 |
markmc | clarkb, I'm gonna see what the story is with git being rebased in RHEL6 | 06:22 |
markmc | clarkb, happy to help build a newer git RPM, though, if you'd use that | 06:22 |
*** sridevi has quit IRC | 06:23 | |
clarkb | markmc: thanks. I would be open to building newer git rpms but jeblair was understandably hesitant | 06:23 |
markmc | clarkb, ok | 06:24 |
*** xchu has joined #openstack-infra | 06:24 | |
openstackgerrit | @Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None https://review.openstack.org/43248 | 06:26 |
*** HenryG_ has quit IRC | 06:30 | |
*** mikal has quit IRC | 06:30 | |
*** dina_belova has joined #openstack-infra | 06:31 | |
*** p5ntangle has joined #openstack-infra | 06:31 | |
*** mikal has joined #openstack-infra | 06:32 | |
*** Dr01d has quit IRC | 06:34 | |
*** dina_belova has quit IRC | 06:36 | |
*** markmc has quit IRC | 06:38 | |
*** nayward has joined #openstack-infra | 06:42 | |
openstackgerrit | @Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None https://review.openstack.org/43248 | 06:42 |
*** p5ntangle has quit IRC | 06:42 | |
*** AJaeger has joined #openstack-infra | 06:57 | |
*** nati_ueno has quit IRC | 06:58 | |
*** Dr01d has joined #openstack-infra | 07:01 | |
*** AJaeger has quit IRC | 07:05 | |
*** SergeyLukjanov has joined #openstack-infra | 07:05 | |
*** pblaho has joined #openstack-infra | 07:05 | |
*** odyssey4me4 has joined #openstack-infra | 07:14 | |
*** fbo_away is now known as fbo | 07:16 | |
*** markmcclain has quit IRC | 07:18 | |
*** yolanda has joined #openstack-infra | 07:21 | |
*** sridevi has joined #openstack-infra | 07:24 | |
*** Anju has joined #openstack-infra | 07:25 | |
*** sridevi has quit IRC | 07:28 | |
*** vogxn has quit IRC | 07:28 | |
openstackgerrit | @Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None https://review.openstack.org/43248 | 07:29 |
*** dina_belova has joined #openstack-infra | 07:32 | |
openstackgerrit | @Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None https://review.openstack.org/43248 | 07:33 |
*** dina_belova has quit IRC | 07:36 | |
*** jpich has joined #openstack-infra | 07:37 | |
*** mkerrin has joined #openstack-infra | 07:39 | |
*** SergeyLukjanov has quit IRC | 07:40 | |
*** p5ntangle has joined #openstack-infra | 07:43 | |
*** afazekas has joined #openstack-infra | 07:52 | |
*** boris-42 has quit IRC | 07:58 | |
*** p5ntangle has quit IRC | 08:01 | |
*** AJaeger has joined #openstack-infra | 08:01 | |
*** AJaeger has joined #openstack-infra | 08:01 | |
*** p5ntangle has joined #openstack-infra | 08:02 | |
*** xchu has quit IRC | 08:04 | |
*** AJaeger has quit IRC | 08:06 | |
*** AJaeger has joined #openstack-infra | 08:12 | |
*** AJaeger has joined #openstack-infra | 08:12 | |
*** fifieldt_ has quit IRC | 08:15 | |
*** xchu has joined #openstack-infra | 08:16 | |
*** p5ntangl_ has joined #openstack-infra | 08:19 | |
*** vogxn has joined #openstack-infra | 08:20 | |
*** AJaeger has quit IRC | 08:20 | |
*** cthulhup has joined #openstack-infra | 08:21 | |
*** markmc has joined #openstack-infra | 08:22 | |
*** p5ntangle has quit IRC | 08:22 | |
*** michchap_ has joined #openstack-infra | 08:25 | |
*** cthulhup has quit IRC | 08:25 | |
*** dmakogon_ has quit IRC | 08:26 | |
*** michchap has quit IRC | 08:27 | |
*** ruhe has joined #openstack-infra | 08:30 | |
*** dina_belova has joined #openstack-infra | 08:32 | |
*** dina_belova has quit IRC | 08:37 | |
*** koobs` has quit IRC | 08:45 | |
*** koobs` has joined #openstack-infra | 08:45 | |
*** koobs` is now known as koobs | 08:45 | |
*** AJaeger has joined #openstack-infra | 08:50 | |
*** AJaeger has quit IRC | 08:50 | |
*** AJaeger has joined #openstack-infra | 08:50 | |
*** p5ntangl_ has quit IRC | 08:54 | |
*** AJaeger has quit IRC | 08:55 | |
*** p5ntangle has joined #openstack-infra | 08:55 | |
*** sridevi has joined #openstack-infra | 08:57 | |
*** xBsd has joined #openstack-infra | 09:02 | |
*** sridevi has quit IRC | 09:03 | |
Anju | cyeoh : in neutron cli there is an optional argument of json and xml | 09:05 |
markmc | clarkb, jeblair, there are git 1.7.12.4 packages available for centos, signed with the centos testing key: https://bugs.launchpad.net/openstack-ci/+bug/1215290/comments/3 | 09:12 |
uvirtbot | Launchpad bug 1215290 in openstack-ci "git https clones failing on centos slaves" [Undecided,New] | 09:12 |
*** cthulhup has joined #openstack-infra | 09:15 | |
*** cthulhup has quit IRC | 09:20 | |
openstackgerrit | Julien Danjou proposed a change to openstack-infra/statusbot: Handle topic via a configuration file https://review.openstack.org/43263 | 09:30 |
*** michchap_ has quit IRC | 09:30 | |
*** dina_belova has joined #openstack-infra | 09:33 | |
openstackgerrit | Serg Melikyan proposed a change to openstack-infra/config: Fix ACL for Murano projects https://review.openstack.org/41650 | 09:35 |
openstackgerrit | Serg Melikyan proposed a change to openstack-infra/config: Fix ACL for Murano projects https://review.openstack.org/41650 | 09:36 |
openstackgerrit | Serg Melikyan proposed a change to openstack-infra/config: Fix ACL for Murano projects https://review.openstack.org/41650 | 09:37 |
*** dina_belova has quit IRC | 09:37 | |
*** AJaeger has joined #openstack-infra | 09:37 | |
*** AJaeger has joined #openstack-infra | 09:37 | |
*** boris-42 has joined #openstack-infra | 09:37 | |
*** enikanorov_ has joined #openstack-infra | 09:47 | |
*** AJaeger has quit IRC | 09:49 | |
*** BobBallAway is now known as BobBall | 09:57 | |
*** xchu has quit IRC | 09:58 | |
*** afazekas has quit IRC | 10:07 | |
*** thomasbiege has joined #openstack-infra | 10:10 | |
*** ruhe has quit IRC | 10:15 | |
*** morganfainberg is now known as morganfainberg|a | 10:21 | |
*** ruhe has joined #openstack-infra | 10:26 | |
*** p5ntangl_ has joined #openstack-infra | 10:27 | |
*** thomasbiege has quit IRC | 10:28 | |
*** weshay has joined #openstack-infra | 10:29 | |
*** p5ntangle has quit IRC | 10:30 | |
*** ruhe has quit IRC | 10:30 | |
*** dina_belova has joined #openstack-infra | 10:33 | |
*** vogxn has quit IRC | 10:35 | |
*** vogxn has joined #openstack-infra | 10:37 | |
*** thomasbiege has joined #openstack-infra | 10:38 | |
*** vogxn has quit IRC | 10:38 | |
*** dina_belova has quit IRC | 10:38 | |
*** vogxn has joined #openstack-infra | 10:38 | |
*** vogxn has left #openstack-infra | 10:42 | |
*** vogxn has joined #openstack-infra | 10:42 | |
*** openstack has joined #openstack-infra | 15:12 | |
markmc | oh look, everything got requeued | 15:12 |
markmc | dark magic at work | 15:12 |
anteaya | except for 41070 at the bottom | 15:13 |
anteaya | it is still running | 15:13 |
*** reed has joined #openstack-infra | 15:13 | |
markmc | 41070 is the first patch | 15:14 |
markmc | none of the rest can merge without them | 15:14 |
markmc | wth? | 15:14 |
*** p5ntangle has quit IRC | 15:14 | |
jeblair | markmc: it just figured that out | 15:15 |
jeblair | 2013-08-22 15:14:42,294 INFO zuul.DependentPipelineManager: Dequeuing change <Change 0x7faf68327050 42433,7> because it can no longer merge | 15:15 |
markmc | and 4 have disappeared | 15:15 |
markmc | DRAMATIC SCENES UNFOLDING HERE | 15:15 |
*** ruhe has quit IRC | 15:17 | |
jeblair | the reason it's slow is because it's building up proposed states of the git repo _before_ it checks that. in retrospect, that does seem like a sub-optimal ordering. | 15:17 |
jd__ | just for my personal culture, the slowness is a problem with zuul or lack of resource to run the jobs? | 15:17 |
anteaya | yes | 15:17 |
jd__ | 'cause I saw a lot of checks waiting for python26 only | 15:18 |
anteaya | that is a git issue | 15:18 |
markmc | speaking of python26 | 15:18 |
anteaya | proxying problems | 15:18 |
anteaya | we are trying to address git today | 15:18 |
markmc | jeblair, saw my message about newer centos6 git rpms? | 15:18 |
anteaya | jd__: so more than one issue | 15:18 |
jd__ | anteaya: is there a trace of that I can read about? | 15:18 |
anteaya | hopefully the problem with zuul has been addressed | 15:18 |
anteaya | just the log for the last 3-4 days | 15:19 |
anteaya | it has slowly built | 15:19 |
markmc | jd__, https://bugs.launchpad.net/openstack-ci/+bug/1215290 | 15:19 |
uvirtbot | Launchpad bug 1215290 in openstack-ci "git https clones failing on centos slaves" [Undecided,New] | 15:19 |
anteaya | the tl;dr version is that zuul had a bug which was hard to trace but jeblair found it yesterday | 15:19 |
jeblair | markmc: yes, thanks; i'm not sure if we should try to do that now or stick with the current tentative plan and switch to the git protocol, which is faster even with the version in centos6 | 15:19 |
jeblair | (after we scale out the git server) | 15:20 |
markmc | jeblair, cool | 15:20 |
anteaya | the git issue I have a weak grasp of, but it is about not having enough git repos available to clone/download and we are having timeouts | 15:20 |
jeblair | anteaya: the git server is overloaded if we run all of the jobs we have at once | 15:20 |
anteaya | there we go, thanks jeblair | 15:20 |
anteaya | we are working to better load balance the git server | 15:21 |
jeblair | which is why we haven't added more centos slaves, because at this point, adding more slaves will only make that worse | 15:21 |
anteaya | hoping to make progress on that today | 15:21 |
*** rfolco has joined #openstack-infra | 15:21 | |
anteaya | right, it just increases the overload on the git server | 15:21 |
* anteaya feels understanding is starting to fall into place | 15:21 | |
jeblair | zaro: when you're up and have a minute; i have no idea why this happened: http://paste.openstack.org/show/44899/ | 15:22 |
markmc | anteaya, if you're up for it, it might be cool to file bugs about ongoing stuff like this and update the bug as progress is made | 15:22 |
*** thomasbiege has quit IRC | 15:22 | |
anteaya | I can make an attempt on it | 15:22 |
jeblair | zaro: oops, better paste here: http://paste.openstack.org/show/44900/ | 15:22 |
markmc | that'd be awesome | 15:22 |
anteaya | I welcome your direction on it as I go along, markmc, thanks | 15:22 |
jeblair | anteaya: ++ | 15:22 |
BobBall | recheck no bug doesn't seem to be working? | 15:23 |
anteaya | I will be afk for about 10 minutes and then will get started on bug reports | 15:23 |
BobBall | I added a few on my changes and the check queue is still empty? | 15:23 |
jeblair | BobBall: zuul has a backlog of gerrit events right now, it should get to it | 15:23 |
BobBall | I'll be patient then :) | 15:23 |
*** dims has joined #openstack-infra | 15:23 | |
jeblair | BobBall: "Queue lengths: 126 events" is the operative thing | 15:23 |
*** thomasbiege has joined #openstack-infra | 15:24 | |
jeblair | zaro: anyway, it looks like jenkins said it was taking the node offline, but it apparently wasn't offline when the functions were registered, so it ran a job anyway | 15:24 |
BobBall | ahhhh I see | 15:24 |
*** vogxn has quit IRC | 15:25 | |
anteaya | BobBall: we just restarted zuul, and we have the queue in a staggered start | 15:26 |
anteaya | once the queue length is 0 events - will probably take about 90 minutes | 15:27 |
anteaya | if you don't see your patch, then recheck | 15:27 |
BobBall | Makes sense | 15:27 |
BobBall | stop it overloading | 15:28 |
BobBall | Might be worth adding the "wait 90 minutes" in the topic? I'm sure I won't be the only person asking this | 15:28 |
anteaya | right | 15:28 |
*** jjmb has quit IRC | 15:28 | |
anteaya | I am going to work on some bugs as communication tools | 15:29 |
jeblair | zaro: i think it's because when that happens, jenkins disconnects the node asynchronously; so it may not actually be offline for a while | 15:29 |
anteaya | the wait will change as time passes, so the message will get stale quickly | 15:29 |
anteaya | I can answer questions and folks are good about reading logs | 15:29 |
*** ruhe has joined #openstack-infra | 15:30 | |
*** thomasbiege has quit IRC | 15:32 | |
*** pblaho has quit IRC | 15:33 | |
*** dina_belova has joined #openstack-infra | 15:35 | |
reed | hello folks | 15:35 |
jeblair | reed: hello | 15:35 |
*** CaptTofu has quit IRC | 15:36 | |
fungi | jeblair: clarkb: so i tried adjusting some tcp settings on git-test-15 but cloning nova from it via https was still taking ~8 minutes with nothing else going on | 15:40 |
fungi | plus a lot of errors like... | 15:40 |
fungi | error: Unable to get pack index https://162.209.12.127/openstack/nova/objects/pack/pack-38faaee3478b9e659b67b5f59b7ecb1e77552a93.idx | 15:40 |
*** Ryan_Lane has joined #openstack-infra | 15:40 | |
fungi | error: Unable to find 8f47cb63996d34ce3d8fcaf9f449b400ce033c70 under https://162.209.12.127/openstack/nova | 15:40 |
fungi | Cannot obtain needed object 8f47cb63996d34ce3d8fcaf9f449b400ce033c70 | 15:40 |
fungi | et cetera | 15:40 |
jeblair | fungi: well that's what happens when it falls back on the dumb http protocol | 15:41 |
*** vogxn has joined #openstack-infra | 15:41 | |
fungi | as opposed to git and http protocol which averaged a snappy 40 seconds | 15:41 |
fungi | so yeah, i suspect there is something terribly wrong on centos as pertains to the git cgi backend and https but still no clue what | 15:41 |
* fungi has to dash out to meet some people but will check back in later | 15:42 | |
clarkb | fungi I actually think it is a client side issue | 15:42 |
*** gyee has joined #openstack-infra | 15:42 | |
fungi | clarkb: marvellous | 15:42 |
clarkb | other git clients clone from that host just fine. centos 1.7.1 git does not | 15:42 |
clarkb | jeblair I am going to run to the office in a minute then will begin the load balance git process | 15:43 |
jeblair | clarkb: cool, i should be ready to pitch in then | 15:43 |
pleia2 | clarkb: want me to do some time tests with 1.7.1 and the rpmforge 1.7.11 so at least we have a data point? | 15:43 |
clarkb | pleia2 yes testing newer git clients on centos would at least help confirm it is client side | 15:44 |
pleia2 | k, will do | 15:44 |
markmc | pleia2, I provided a link to a repo containing 1.7.12.4 for centos, maintained by a centos maintainer | 15:46 |
* markmc digs it up again | 15:46 | |
pleia2 | markmc: saw the lp link, I can use that | 15:47 |
jeblair | https://bugs.launchpad.net/openstack-ci/+bug/1215290 | 15:47 |
uvirtbot | Launchpad bug 1215290 in openstack-ci "git https clones failing on centos slaves" [Undecided,New] | 15:47 |
markmc | pleia2, ok, great | 15:47 |
markmc | pleia2, just wasn't sure from "rpmforge" | 15:47 |
pleia2 | markmc: the packages on rpmforge seem to be the most common way folks install newer versions on centos | 15:48 |
markmc | pleia2, who maintains those? | 15:48 |
pleia2 | markmc: I don't know | 15:49 |
markmc | pleia2, right :) | 15:49 |
*** wu_wenxiang has joined #openstack-infra | 15:52 | |
wu_wenxiang | https://review.openstack.org/#/c/43138/, I tried "recheck no bug" twice, however didn't start check process | 15:53 |
*** pabelanger_ has joined #openstack-infra | 15:55 | |
*** pabelanger_ has quit IRC | 15:56 | |
*** pabelanger_ has joined #openstack-infra | 15:56 | |
anteaya | here is the LOST test logs bug: https://bugs.launchpad.net/openstack-ci/+bug/1215511 | 15:57 |
uvirtbot | Launchpad bug 1215511 in openstack-ci "LOST test logs" [Undecided,New] | 15:57 |
*** CaptTofu has joined #openstack-infra | 15:57 | |
*** dina_belova has quit IRC | 15:58 | |
*** pabelanger has quit IRC | 16:00 | |
*** pabelanger_ is now known as pabelanger | 16:00 | |
markmc | anteaya, nice | 16:00 |
*** pabelanger_ has joined #openstack-infra | 16:00 | |
anteaya | thanks | 16:01 |
*** CaptTofu_ has joined #openstack-infra | 16:01 | |
*** CaptTofu has quit IRC | 16:02 | |
wu_wenxiang | https://review.openstack.org/#/c/43138/, I tried "recheck no bug" twice, however didn't start check process, Could anyone help? Thanks | 16:03 |
pleia2 | wu_wenxiang: someone should be able to take a look soon, it's been a bit of a crazy week | 16:05 |
jeblair | wu_wenxiang: it's probably in the backlog of gerrit events | 16:05 |
*** markmc has quit IRC | 16:05 | |
jeblair | wu_wenxiang: "Queue lengths: 106 events" on the status page; it should get to it soon | 16:05 |
wu_wenxiang | pleia2: jeblair: Thanks | 16:06 |
wu_wenxiang | pleia2: crazy week means too many commits? | 16:07 |
*** dkranz has quit IRC | 16:07 | |
*** cthulhup has joined #openstack-infra | 16:09 | |
*** datsun180b has quit IRC | 16:09 | |
*** datsun180b has joined #openstack-infra | 16:09 | |
*** alexpilotti_ has joined #openstack-infra | 16:11 | |
*** dklyle is now known as david-lyle | 16:11 | |
*** ruhe has quit IRC | 16:11 | |
*** pabelanger has quit IRC | 16:11 | |
*** zul has quit IRC | 16:12 | |
*** cppcabrera has left #openstack-infra | 16:12 | |
anteaya | here is the "my rechecked/reverified patch isn't in the queue" bug: https://bugs.launchpad.net/openstack-ci/+bug/1215522 | 16:13 |
uvirtbot | Launchpad bug 1215522 in openstack-ci "my recheck/reverify patch isn't showing up in status.openstack.org/zuul" [Undecided,New] | 16:13 |
*** alexpilotti has quit IRC | 16:13 | |
*** alexpilotti_ is now known as alexpilotti | 16:13 | |
anteaya | going to grab a bite to eat | 16:14 |
*** wu_wenxiang has quit IRC | 16:14 | |
*** SergeyLukjanov has quit IRC | 16:14 | |
jd__ | anteaya: ah, your bug is exactly the question I was going to ask! | 16:14 |
anteaya | yay | 16:15 |
anteaya | my first customer | 16:15 |
anteaya | jd__: add comments if I left anything out | 16:15 |
*** jfriedly has joined #openstack-infra | 16:17 | |
*** ^d has joined #openstack-infra | 16:19 | |
*** ^d has joined #openstack-infra | 16:19 | |
*** ruhe has joined #openstack-infra | 16:19 | |
*** ruhe has quit IRC | 16:19 | |
clarkb | jeblair: I am in front of the big monitor now | 16:20 |
clarkb | jeblair: I am going to spin up git01 through git04 on the ci account as 8GB centos nodes | 16:21 |
clarkb | jeblair: and will point them at the puppet development env so that they get all of the cgit stuff | 16:21 |
*** arezadr has quit IRC | 16:21 | |
zaro | morning | 16:22 |
clarkb | Then I will propose a change to replicate gerrit to them and update the existing change to balance across them. Once gerrit replication has caught up merge the haproxy and g-g-p changes | 16:22 |
zaro | jeblair: do we need a double check to make sure slave is offline in StartJobWorker? | 16:22 |
jeblair | zaro: that's the thing, there's a check in registerfunctions; and since it registered 46 functions, it must have been online | 16:23 |
jeblair | zaro: oh, i see | 16:23 |
jeblair | zaro: a check right before we accept a job | 16:23 |
*** arezadr has joined #openstack-infra | 16:24 | |
jeblair | zaro: yeah i think if we wanted to do that, maybe put it in the gearmanworkerimpl right before we do a grab_job? | 16:24 |
zaro | jeblair: that would probably work. i was thinking another one after setting slave offline? | 16:25 |
jeblair | zaro: so we don't get it from gearman (once we get the job from gearman, it doesn't matter, we have to run it) | 16:25 |
*** dkranz has joined #openstack-infra | 16:25 | |
jeblair | zaro: what do you mean about setting the slave offline? | 16:25 |
jeblair | zaro: and here's an idea we should have thought about earlier -- why don't we have the gearman plugin always return work_complete if the jenkins job finishes (regardless of the outcome); but have it return work_fail if it grabs a job and finds that it can't run it... | 16:27 |
jeblair | zaro: it already returns work_exception if there is a problem running it; i should have zuul catch that case and re-run the job | 16:27 |
jeblair | (that would help with some of the strange exceptions we've been seeing) | 16:28 |
jeblair | zaro: and then later, if we do the thing with work_fail, we could have zuul do the same thing (re-run the job) | 16:28 |
*** SergeyLukjanov has joined #openstack-infra | 16:28 | |
jeblair | clarkb: great, i'll be with you in just a min. | 16:28 |
*** dina_belova has joined #openstack-infra | 16:29 | |
*** datsun180b has quit IRC | 16:29 | |
zaro | jeblair: i don't see a problem with that off the top of my head. | 16:30 |
*** mrodden has quit IRC | 16:35 | |
HenryG | How do I search for gerrit reviews containing a specific string in the commit message? | 16:36 |
*** saper has quit IRC | 16:37 | |
*** saper has joined #openstack-infra | 16:37 | |
clarkb | HenryG: there may be a way to do it with grep and the ssh query interface, but the gerrit ui does not offer that functionality | 16:38 |
clarkb | HenryG: upstream gerrit has played with using lucene to index that stuff but it gets expensive quickly | 16:38 |
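A hedged example of the grep-over-ssh-query approach clarkb mentions (project and search string are illustrative; --commit-message is assumed to be available in this Gerrit version):

    # dump matching changes as JSON, then grep the commit messages client-side
    ssh -p 29418 review.openstack.org gerrit query --format=JSON \
        --commit-message status:open project:openstack/nova | grep -i 'some string'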
clarkb | jeblair: rax gave me a build error on the first host (I was going to build one before the others to smooth out any additional stuff). Have you seen BUILD 0 then ERROR before? | 16:39 |
HenryG | clarkb: thanks. :( | 16:40 |
*** datsun180b has joined #openstack-infra | 16:40 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Add option to test jenkins node before use https://review.openstack.org/43313 | 16:40 |
clarkb | jeblair: I am going to try a second host and see if this is transient | 16:41 |
*** zul has joined #openstack-infra | 16:41 | |
jeblair | clarkb: all the time, yeah, just try again | 16:41 |
*** cthulhup has quit IRC | 16:41 | |
jeblair | clarkb: ^ that patch is untested; no rush -- but something to think about in the back of your head for after the git server. | 16:41 |
*** pcrews has quit IRC | 16:41 | |
clarkb | ok | 16:42 |
jeblair | clarkb: the idea is we can have nodepool run a very simple test job before actually putting each node into service | 16:42 |
jeblair | clarkb: it might be useful for some of the weird errors we've been seeing from jenkins | 16:42 |
jeblair | clarkb: (though it would mean quite a bit more work for jenkins) | 16:43 |
clarkb | I like the idea. Possibly try to find better performing nodes if we can test that quickly and have a decent understanding of what to look at | 16:43 |
jeblair | clarkb: yeah, could put anything in the test. though i was thinking "echo ok" for now. | 16:43 |
*** nicedice_ has joined #openstack-infra | 16:44 | |
*** morganfainberg|a is now known as morganfainberg | 16:44 | |
clarkb | ya performance testing probably won't happen any time soon | 16:44 |
jeblair | clarkb: anyway, how may i help? | 16:44 |
clarkb | jeblair: want to get a change ready to switch g-g-p to using git:// again? | 16:45 |
pleia2 | clarkb: time output is in the etherpad (1.7.12 is faster) | 16:46 |
*** Dr01d has quit IRC | 16:46 | |
HenryG | clarkb: Googling "<text> site:review.openstack.org" turned up some useable results for carefully chosen <text>. YMMV. | 16:46 |
jeblair | clarkb: ack | 16:46 |
clarkb | pleia2: cool. want to try cloning from https://162.209.12.127/openstack/nova using both client versions of git? | 16:47 |
clarkb | pleia2: I expect 1.7.1 to fail | 16:47 |
pleia2 | on it | 16:48 |
jeblair | 04:41 < clarkb> jeblair: my git plan. 1. spin up new servers 2. replicate from gerrit to new servers. 3. merge change to use git:// in g-g-p 4. merge haproxy change 5. merge change to add haproxy nodes | 16:48 |
*** vogxn has left #openstack-infra | 16:48 | |
*** mrodden has joined #openstack-infra | 16:49 | |
clarkb | jeblair: that is still the plan, though at this point I expect 4 and 5 to be one change | 16:49 |
jeblair | clarkb: why not do 3 last? | 16:49 |
clarkb | jeblair: I was thinking of propogation delay of the JJB update | 16:49 |
clarkb | jeblair: it can be done last | 16:49 |
pleia2 | clarkb: ssl errors, how were you getting around this? | 16:49 |
clarkb | s/propogation delay/time to run/ | 16:49 |
*** jpich has quit IRC | 16:50 | |
*** krtaylor has quit IRC | 16:50 | |
clarkb | pleia2: you have to tell git to ignore ssl errors /me looks in hsitory for the flag | 16:50 |
clarkb | pleia2: GIT_SSL_NO_VERIFY=true | 16:50 |
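i.e. something along the lines of (the IP is the test server given above; skipping verification is only reasonable against a self-signed test host):

    GIT_SSL_NO_VERIFY=true git clone https://162.209.12.127/openstack/nova
    # roughly equivalent per-repo setting: git config http.sslVerify false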
jeblair | clarkb: ok, original plan wfm | 16:50 |
pleia2 | clarkb: thanks | 16:51 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Switch ggp to use git:// https://review.openstack.org/43315 | 16:51 |
jeblair | clarkb: I'm updating the etherpad with the plan and changes | 16:52 |
*** bingbu has quit IRC | 16:53 | |
*** svarnau has joined #openstack-infra | 16:54 | |
jeblair | clarkb: is there a replication change? | 16:54 |
clarkb | jeblair: there isn't a replication change yet. I suppose that doesn't need IP addresses. jeblair can you write that one too and put it on the bottom of the current stack of 2 changes? | 16:55 |
clarkb | (there will be a conflict because I put a todo in my change to do it) | 16:55 |
*** fbo is now known as fbo_away | 16:56 | |
jeblair | clarkb: oh, right, you said 4+5 are one change... hang on | 16:56 |
*** dina_belova has quit IRC | 16:57 | |
clarkb | jeblair: ya the haproxy stuff needs IP addresses so will happen after the nodes are all spun up and replicated to. But gerrit replication doesn't need IP addresses so you can get that change ready and merge it as soon as those hosts have DNS records | 16:57 |
clarkb | jeblair: what was the pyyaml workaround? | 16:57 |
pleia2 | clarkb: yeah, after ~6 minutes it fails on 1.7.1, but 1.7.12 works (added to pad) | 16:57 |
jeblair | clarkb: oh, that's what you meant by bottom. ok, i think i'm caught up now | 16:57 |
jeblair | clarkb: 'pip uninstall pyyaml'; re run puppet | 16:58 |
clarkb | pleia2: awesome. I think that confirms it is client side and version related | 16:58 |
clarkb | jeblair: thanks | 16:58 |
jeblair | clarkb: https://review.openstack.org/#/c/43012/3 | 16:58 |
jeblair | https://review.openstack.org/#/c/42784/ | 16:58 |
jeblair | clarkb: those are the 2 changes you're talking about, right? | 16:59 |
jeblair | (haproxy and xinetd) | 16:59 |
anteaya | etherpad link, for those viewers at home: https://etherpad.openstack.org/git-lb | 16:59 |
clarkb | jeblair: yes | 16:59 |
*** SergeyLukjanov has quit IRC | 17:00 | |
clarkb | git02 is happy now. I will add its DNS record then do the other three in one batch | 17:00 |
*** dkranz has quit IRC | 17:01 | |
*** dkranz has joined #openstack-infra | 17:01 | |
*** dims has quit IRC | 17:01 | |
*** nati_ueno has joined #openstack-infra | 17:02 | |
*** dims has joined #openstack-infra | 17:02 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Replicate to git01-git04 https://review.openstack.org/43316 | 17:03 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 17:03 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Swap git daemon in xinetd for service https://review.openstack.org/43012 | 17:03 |
*** morganfainberg is now known as morganfainberg|a | 17:05 | |
*** BobBall is now known as BobBallAway | 17:05 | |
*** SergeyLukjanov has joined #openstack-infra | 17:05 | |
*** nayward has quit IRC | 17:06 | |
clarkb | jeblair: that looks right | 17:07 |
clarkb | jeblair: pleia2's test indicates upgrading git would help in the https case should we need to go down that route | 17:07 |
jeblair | clarkb: excellent. i love plan b's. and c's. and d's. | 17:08 |
*** thomasbiege has joined #openstack-infra | 17:08 | |
clarkb | jeblair: eventually we will have the whole alphabet | 17:09 |
jeblair | sometimes i put j at the end, that could be confusing. | 17:09 |
anteaya | right now this is on the zuul status page: Queue lengths: 50 events, 84 results. What results are being referenced here? | 17:10 |
*** svarnau has quit IRC | 17:10 | |
jeblair | anteaya: results from jenkins | 17:10 |
anteaya | like logs? | 17:10 |
jeblair | anteaya: just information as to whether the job succeeded | 17:10 |
anteaya | ah okay | 17:10 |
anteaya | success, failure, lost | 17:11 |
openstackgerrit | A change was merged to openstack-infra/jenkins-job-builder: Adding support for the Warnings plugin https://review.openstack.org/40621 | 17:11 |
clarkb | oh shiny I have two git01's because of the error | 17:11 |
jeblair | anteaya: when they pile up like that, it's usually because zuul either started or stopped a bunch of jobs. | 17:11 |
clarkb | jeblair: do I need to explicitly delete the one in ERROR state? | 17:11 |
jeblair | clarkb: yes | 17:11 |
*** lcestari has joined #openstack-infra | 17:11 | |
anteaya | ah okay, I didn't know the Jenkins results were queued as well | 17:11 |
*** svarnau has joined #openstack-infra | 17:12 | |
jeblair | anteaya: zuul is almost to the point where we can get rid of that. | 17:12 |
anteaya | yay | 17:12 |
jeblair | anteaya: it looks like it was a gate reset, so those were probably abort results | 17:12 |
anteaya | ah okay | 17:12 |
*** svarnau has quit IRC | 17:12 | |
*** svarnau has joined #openstack-infra | 17:13 | |
*** wenlock has joined #openstack-infra | 17:13 | |
anteaya | I think I can see the gate reset in this graph: https://tinyurl.com/kmotmns | 17:13 |
anteaya | looks like it happened 15 or 20 minutes ago | 17:14 |
jeblair | possibly | 17:14 |
* anteaya nods | 17:15 | |
anteaya | 282 results that are queued right now, I am going with no action is required from us | 17:16 |
jeblair | clarkb: $::ipaddress is a puppet fact? | 17:17 |
clarkb | jeblair: yes | 17:18 |
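For anyone following along, a quick way to see what that fact evaluates to on a node is to ask facter directly (facter ships with puppet); the manifest-side name is $::ipaddress:

```bash
# Print the fact that puppet exposes as $::ipaddress in manifests
facter ipaddress
```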
clarkb | git01 has dns records and is puppet happy | 17:18 |
clarkb | still waiting for the error state node to go away | 17:18 |
clarkb | git04 errord as well and git03 will be ready as soon as the reboot completes | 17:18 |
*** alexpilotti has quit IRC | 17:18 | |
jeblair | clarkb: don't hold your breath | 17:18 |
jeblair | anteaya: zuul is done launching all the jobs from the gate reset and is back processing the event and result queues again | 17:19 |
clarkb | launching a new git04. errored git04 went away faster than git01 | 17:20 |
*** thomasbiege has quit IRC | 17:20 | |
anteaya | jeblair: grand thank you | 17:20 |
clarkb | 1 through 3 should have DNS records and are puppet happy now. Just waiting on git04 | 17:21 |
*** SergeyLukjanov has quit IRC | 17:22 | |
jeblair | btw, the new image in az2 looks good (no java segfault), but i haven't deleted the old nodes in jenkins which are preventing its use | 17:23 |
clarkb | jeblair: ok | 17:24 |
jeblair | (as a mechanism to slow nodepool) | 17:24 |
clarkb | jeblair: note that I am running all puppet on these new nodes out of the development env so that when we do merge the proposed changes the diff puppet has to deal with should be minimal or nil | 17:24 |
*** boris-42 has joined #openstack-infra | 17:25 | |
jeblair | clarkb: ack | 17:25 |
clarkb | the exciting puppet run will be on git.o.o though :) | 17:25 |
jeblair | #status ok | 17:25 |
*** ChanServ changes topic to "Discussion of OpenStack Developer Infrastructure | docs http://ci.openstack.org | bugs https://launchpad.net/openstack-ci/+milestone/grizzly | https://github.com/openstack-infra/config" | 17:25 | |
*** pcm_ has quit IRC | 17:28 | |
anteaya | yay back to status ok | 17:28 |
openstackgerrit | @Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None https://review.openstack.org/43248 | 17:28 |
*** pcrews has joined #openstack-infra | 17:28 | |
*** pabelanger has joined #openstack-infra | 17:29 | |
*** morganfainberg|a is now known as morganfainberg | 17:29 | |
*** svarnau has quit IRC | 17:29 | |
*** svarnau has joined #openstack-infra | 17:30 | |
jswarren | Any thoughts on why the python26 jobs appear to be significantly slower than the python27 jobs? | 17:31 |
clarkb | jswarren: there are a couple related things but the biggest factor is we have fewer slaves capable of running python26 jobs | 17:32 |
clarkb | jeblair: git04 is almost ready | 17:32 |
openstackgerrit | @Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None https://review.openstack.org/43248 | 17:32 |
jswarren | OK. | 17:32 |
jswarren | Thanks. | 17:32 |
*** pcm_ has joined #openstack-infra | 17:32 | |
jeblair | Alex_Gaynor: mind if i quote you in my slides next time i give a presentation? :) | 17:32 |
Alex_Gaynor | jeblair: sure, what'd I say? | 17:33 |
*** markmcclain has quit IRC | 17:33 | |
*** thomasbiege has joined #openstack-infra | 17:33 | |
clarkb | jswarren: it also doesn't help that the python26 jobs do tend to take a little longer as they run on hosts with older slow git and I think running many of our tests on python26 just takes longer | 17:33 |
*** xBsd has quit IRC | 17:34 | |
jeblair | 04:38 < Alex_Gaynor> most insane CI infrastructure I've ever been a part of | 17:36 |
Alex_Gaynor | jeblair: oh, absolutely :D | 17:36 |
*** morganfainberg has left #openstack-infra | 17:36 | |
clarkb | git04 is happy with puppet now | 17:37 |
*** morganfainberg has joined #openstack-infra | 17:37 | |
clarkb | waiting for DNS records to resolve then I think we can prepare to replicate | 17:37 |
clarkb | jeblair: ^ does approving the replication change automatically restart gerrit? if not I think we should go ahead and merge | 17:37 |
jeblair | clarkb: i don't _think_ anything restarts gerrit except an upgrade | 17:38 |
*** fbo_away is now known as fbo | 17:38 | |
*** jbjohnso has quit IRC | 17:38 | |
jeblair | clarkb: yeah, looking at the puppet, i think we're fine. | 17:39 |
clarkb | jeblair: the haproxy change failed puppet lint but I can fix that when I add the balancermembers | 17:39 |
*** svarnau has quit IRC | 17:39 | |
* anteaya gets ready to applaud | 17:40 | |
clarkb | anteaya: we are still a little ways out | 17:40 |
*** svarnau has joined #openstack-infra | 17:40 | |
anteaya | I'll applaud all I can | 17:40 |
clarkb | going to wait for replication to happen completely before moving to the next step | 17:40 |
clarkb | jeblair: is it not possible to SIGHUP gerrit and have it pick up those changes? | 17:40 |
clarkb | iirc gerrit can pick up some config and project changes on the fly but I never remember which ones | 17:41 |
clarkb | once replicated we can do a quick set of tests to make sure 8080, 4443, and 29418 all answer to git operations | 17:42 |
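A minimal sketch of those checks, assuming 8080, 4443, and 29418 map to http, https, and git:// on the backends as listed above (the hostname is just one of the new servers; any of git01-git04 would do):

```bash
H=git01.openstack.org
git ls-remote "http://$H:8080/openstack/nova" HEAD
GIT_SSL_NO_VERIFY=true git ls-remote "https://$H:4443/openstack/nova" HEAD   # cert name won't match the backend
git ls-remote "git://$H:29418/openstack/nova" HEAD
```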
jeblair | clarkb: i think i read in a stackoverflow question yesterday it needed a restart | 17:42 |
jeblair | clarkb: gerrit restarts are fairly fast, i don't think it's a big deal | 17:42 |
clarkb | jeblair: ok | 17:42 |
*** svarnau has quit IRC | 17:42 | |
openstackgerrit | A change was merged to openstack-infra/config: Replicate to git01-git04 https://review.openstack.org/43316 | 17:45 |
*** dina_belova has joined #openstack-infra | 17:45 | |
*** svarnau has joined #openstack-infra | 17:45 | |
jeblair | yay gate priority | 17:45 |
anteaya | :D | 17:46 |
*** SergeyLukjanov has joined #openstack-infra | 17:46 | |
clarkb | jeblair: do you want to kick gerrit when you think it is safe? I am going to fix the haproxy change and add the balancermembers | 17:47 |
*** thomasbiege2 has joined #openstack-infra | 17:47 | |
*** ruhe has joined #openstack-infra | 17:48 | |
*** svarnau has quit IRC | 17:48 | |
anteaya | do we need a channel status update for the gerrit reset? | 17:48 |
clarkb | anteaya: maybe not. as jeblair mentioned it goes really fast though occasionally people do notice | 17:49 |
anteaya | okay | 17:49 |
anteaya | I will stand by to field inquiries | 17:49 |
anteaya | though folks have been really patient and supportive | 17:49 |
anteaya | thanks everyone | 17:50 |
jeblair | clarkb: i will handle the gerrit restart | 17:50 |
*** changbl has joined #openstack-infra | 17:51 | |
*** thomasbiege has quit IRC | 17:51 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 17:52 |
clarkb | that should pass lint and it adds the balancer members | 17:52 |
pleia2 | great | 17:53 |
*** ^demon has joined #openstack-infra | 17:54 | |
jeblair | #status notice restarting gerrit to pick up a configuration change | 17:55 |
openstackstatus | NOTICE: restarting gerrit to pick up a configuration change | 17:55 |
^demon | jeblair: I wasn't paying attention to what channel I was in and I freaked out for a moment. | 17:56 |
^demon | I was like "who's making config changes and I don't know?" :) | 17:56 |
*** ^d has quit IRC | 17:56 | |
jeblair | ^demon: haha! | 17:57 |
uvirtbot | jeblair: Error: "demon:" is not a valid command. | 17:57 |
jeblair | wow, uvirtbot makes it really fun to talk to you ^demon :) | 17:57 |
pleia2 | hehe | 17:58 |
jeblair | need to get gerrit to accept the new hostkeys | 17:58 |
*** thomasbiege2 is now known as thomasbiege | 17:58 | |
clarkb | jeblair: pleia2: Is that puppetted or do we just do it by hand? | 17:59 |
jeblair | clarkb: i don't think it's puppeted | 17:59 |
*** AJaeger has joined #openstack-infra | 18:00 | |
*** AJaeger has joined #openstack-infra | 18:00 | |
clarkb | jeblair: ya I don't see it in the site.pp node for review.o.o | 18:00 |
pleia2 | there is an open bug for sorting out gerrit's keys | 18:00 |
pleia2 | (I opened it recently) | 18:00 |
anteaya | so zuul and jenkins are still working on what they had, but since gerrit is down nothing new is being queued | 18:00 |
anteaya | now I see | 18:00 |
jeblair | i think i may need to restart gerrit again? | 18:01 |
jeblair | anteaya: gerrit is not down | 18:01 |
anteaya | oh sorry | 18:01 |
anteaya | restarted | 18:01 |
clarkb | jeblair: maybe? java likes to cache a lot of stuff including perhaps the known hosts file | 18:01 |
jeblair | i'm going to restart gerrit again and see if it picks up the known hosts changes | 18:01 |
pleia2 | https://bugs.launchpad.net/openstack-ci/+bug/1209464 for when someone is bored ;) | 18:02 |
uvirtbot | Launchpad bug 1209464 in openstack-ci "Start managing ~gerrit2/.ssh/ contents in puppet" [Undecided,New] | 18:02 |
jeblair | pleia2: ++ | 18:02 |
jeblair | [2013-08-22 18:03:29,807] ERROR com.google.gerrit.server.git.PushReplication : Cannot replicate to file:///var/lib/git/stackforge/python-ipmi.git; repository not found | 18:03 |
jeblair | that's slightly disturbing | 18:03 |
clarkb | jeblair: that was one of the projects that got renamed | 18:04 |
jeblair | both python-ipmi and pyghmi exist in gerrit's git repo dir | 18:04 |
jeblair | <sigh> | 18:04 |
clarkb | :/ | 18:04 |
*** p5ntangle has joined #openstack-infra | 18:06 | |
jeblair | ok, the db has no python-ipmi entries | 18:08 |
clarkb | so monty must've done a cp instead of a mv | 18:09 |
jeblair | there doesn't seem to be anything new in python-ipmi.... | 18:09 |
jeblair | wait, i wonder if manage_projects put it back | 18:10 |
jeblair | because it's actually quite old | 18:10 |
mtreinish | jeblair: quick question: do I need to do another reverify on: https://review.openstack.org/#/c/43175/ | 18:10 |
*** svarnau has joined #openstack-infra | 18:11 | |
mtreinish | because I don't see it in the gate pipeline | 18:11 |
clarkb | mtreinish: yes I think so | 18:11 |
jeblair | clarkb: nah, projects.yaml looks right; probably a cp then. so i'll stop gerrit and mv it out of the way | 18:11 |
clarkb | jeblair: ok | 18:11 |
mtreinish | clarkb: ok thanks | 18:11 |
jeblair | #status notice stopping gerrit to correct a stackforge project rename error | 18:12 |
openstackstatus | NOTICE: stopping gerrit to correct a stackforge project rename error | 18:12 |
*** dmakogon_ has joined #openstack-infra | 18:13 | |
jeblair | this may make zuul unhappy, it's in the middle of a gate reset | 18:13 |
clarkb | jeblair: up to you if you want to wait | 18:13 |
clarkb | replication does appear to be happening for everything else | 18:13 |
jeblair | it is done | 18:14 |
*** mrodden has quit IRC | 18:14 | |
clarkb | http and git:// seem to be working on git01 but not https. looking into that now | 18:16 |
clarkb | pleia2: jeblair did you guys want to try cloning from the other hosts? | 18:16 |
jeblair | clarkb: can do | 18:16 |
jeblair | clarkb, pleia2: gerrit replication is still running | 18:17 |
ttx | lifeless: as long as it gets fixed sometimes in the next two months (and stay fixed), we should be good | 18:17 |
jeblair | we might want to wait until that finishes | 18:17 |
clarkb | jeblair: ok | 18:17 |
mtreinish | clarkb: do it again now, because I did it right before the gerrit restart? | 18:18 |
clarkb | mtreinish: gerrit restart shouldn't affect you (zuul should get that event quick enough) | 18:18 |
mtreinish | clarkb: ok | 18:18 |
anteaya | 33 events in the zuul queue | 18:18 |
ttx | jeblair: the gate looks calmer today. Anything special you've done? Just arrived | 18:19 |
anteaya | mtreinish: when zuul has 0 events, your patch should show up on the status page | 18:19 |
anteaya | ttx kept zuul running overnight | 18:19 |
mtreinish | anteaya: ok | 18:19 |
anteaya | zuul had a bug which jeblair fixed last night | 18:20 |
jeblair | ttx: i fixed a zuul bug last night (which was causing us to restart zuul a lot with nothing merging) | 18:20 |
* ttx is still trying to understand the patterns that govern gate load | 18:20 | |
*** melwitt has joined #openstack-infra | 18:20 | |
* anteaya too | 18:20 | |
jeblair | ttx: we're working on load balancing git.o.o so that we can serve git repos to all the jobs we need to run | 18:20 |
clarkb | error: gnutls_handshake() failed: A TLS warning alert has been received. while accessing https://git01.openstack.org:4443/openstack/nova/info/refs is what I got speaking https to git01 | 18:20 |
anteaya | ttx can't hurt to read this: https://bugs.launchpad.net/openstack-ci/+bug/1215522 | 18:21 |
ttx | jeblair: is that the new bottleneck ? | 18:21 |
uvirtbot | Launchpad bug 1215522 in openstack-ci "my recheck/reverify patch isn't showing up in status.openstack.org/zuul" [Undecided,New] | 18:21 |
ttx | anteaya: thx for the pointer | 18:21 |
anteaya | np | 18:21 |
*** zul has quit IRC | 18:21 | |
jeblair | ttx: yes; we're actually keeping our slave count artificially low to try to stress it less (but we still get occasional errors) | 18:21 |
jeblair | ttx: once that's scaled out, we should be able to run a lot more tests at once, which should help with backlogs | 18:22 |
jeblair | ttx: (there are a few jenkins errors we've encountered as well that we need to work around; that's next up) | 18:22 |
*** mrodden has joined #openstack-infra | 18:22 | |
ttx | jeblair: thx for the executive summary :) | 18:22 |
jeblair | ttx: np | 18:22 |
clarkb | jeblair: I think it is related to the hostname and the cert. GIT_SSL_NO_VERIFY isn't letting it through though as it happens in the handshake. Speaking directly to the ip works | 18:24 |
clarkb | I am going to test with a hacked up /etc/hosts | 18:24 |
*** erfanian has quit IRC | 18:24 | |
clarkb | hacked up /etc/hosts makes it better | 18:28 |
*** sarob has joined #openstack-infra | 18:29 | |
*** sarob has quit IRC | 18:31 | |
*** alexpilotti has joined #openstack-infra | 18:31 | |
*** sarob has joined #openstack-infra | 18:31 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Add git01-git04 to cacti https://review.openstack.org/43325 | 18:32 |
clarkb | jeblair: and git.o.o too | 18:32 |
jeblair | clarkb: fungi did that yesterday (or the day before) | 18:32 |
clarkb | cool I missed that | 18:33 |
jeblair | clarkb: why don't you push that one through real quick? | 18:33 |
clarkb | sure | 18:33 |
jeblair | clarkb: yeah, it's telling | 18:33 |
clarkb | done | 18:33 |
jeblair | clarkb: it's how we knew we were cpu bound and not io or network | 18:33 |
clarkb | git01 is serving files over all three protocols. I just have to give it an IP address for https otherwise tls handshaking complains | 18:34 |
clarkb | jeblair: gotcha | 18:34 |
clarkb | I am going to test git02 now | 18:34 |
jeblair | ok, i'll test 3 and 4 with the ip | 18:34 |
*** mrmartin has joined #openstack-infra | 18:34 | |
jeblair | clarkb: you mean with etc hosts, right? | 18:35 |
jeblair | oh, or use the ip but set the no verify var? | 18:35 |
jeblair | yeah, that seems to work | 18:35 |
*** sarob has quit IRC | 18:35 | |
clarkb | jeblair: IP and no verify or you can verify and put the ip address in /etc/hosts for git.o.o | 18:37 |
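A sketch of the second option (keep verification, but pin git.o.o to one backend); the IP below is illustrative, not one of the real servers, and 4443 is the backend https port seen earlier:

```bash
# Map git.openstack.org to a backend address so the certificate name matches
echo "203.0.113.11  git.openstack.org" | sudo tee -a /etc/hosts
git ls-remote https://git.openstack.org:4443/openstack/nova HEAD
```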
*** krtaylor has joined #openstack-infra | 18:37 | |
jeblair | clarkb: a nova clone took real 2m33.517s over https | 18:38 |
anteaya | 7 jobs in post, yay! | 18:38 |
clarkb | I got one Timeout waiting for output from CGI script /usr/libexec/git-core/git-http-backend on git02 | 18:39 |
clarkb | is that timeout something we can extend? | 18:39 |
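The question doesn't get answered in-channel; if that timeout comes from Apache's core Timeout directive (a guess, since the smart-http vhost config isn't shown here), extending it would look roughly like this on the CentOS backends:

```bash
# Assumption: git-http-backend runs as a CGI under Apache and the 300s default
# Timeout is what fires; the fix would be something like
#   Timeout 600
# in the vhost/httpd.conf, then a graceful reload:
sudo service httpd reload
```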
jeblair | clarkb: why did you get a timeout from a server under no load? | 18:39 |
clarkb | jeblair: I am cloning http https and git:// concurrently so it has some load | 18:39 |
anteaya | look at all the merges in the last hour: https://tinyurl.com/m2skvhg | 18:40 |
anteaya | yay | 18:40 |
openstackgerrit | A change was merged to openstack-infra/config: Add git01-git04 to cacti https://review.openstack.org/43325 | 18:40 |
*** pabelanger has quit IRC | 18:40 | |
jeblair | oh, i was cloning from 02, sorry; i guess that could explain the time | 18:40 |
*** pblaho has joined #openstack-infra | 18:40 | |
jeblair | but no, 03 and 04 are taking forever too | 18:42 |
clarkb | jeblair: there is some delay as git has to do pack files and things | 18:43 |
clarkb | load doesn't look terrible on 03 | 18:43 |
jeblair | 03 took 2m15.276s | 18:43 |
jeblair | clarkb: i just started another clone | 18:43 |
jeblair | clarkb: i believe we were shooting for <1 min, yeah? | 18:44 |
clarkb | jeblair: yeah, but really only for git:// | 18:44 |
jeblair | clarkb: i did all of my tests with https; and this is on a precise node | 18:44 |
clarkb | oh I see | 18:45 |
jeblair | i think the refs aren't packed at all | 18:45 |
clarkb | jeblair: ya | 18:45 |
jeblair | so somehow the git.o.o repos ended up with packed refs, but not these. i'm testing if that's the diff. | 18:45 |
* anteaya tries to pick the best time for her 1 hour afternoon walk | 18:46 | |
clarkb | 02 is 1:42 git clone nova over git protocol | 18:47 |
clarkb | nova repo on 02 has one pack file and a bunch of loose files | 18:47 |
clarkb | I think you are onto something with that theory | 18:48 |
jeblair | clarkb: i'm looking at refs, not objects | 18:48 |
clarkb | ah | 18:48 |
*** thomasbiege has quit IRC | 18:48 | |
jeblair | clarkb: after a 'git gc' (which did both objects and refs), it's real 0m52.021s | 18:49 |
clarkb | jeblair: should we add a daily/weekly cronjob to git gc? | 18:51 |
jeblair | clarkb, pleia2: does cgit do repo maintenance, or do we have a cron defined? | 18:51 |
*** nayward has joined #openstack-infra | 18:51 | |
pleia2 | jeblair: it does not | 18:51 |
Alex_Gaynor | so in prepare_devstack.sh is there a reason we don't use --depth 1 | 18:51 |
jeblair | how did we end up with a packed repo state? | 18:51 |
pleia2 | jeblair: it's really just a web interface that accesses the repo, doesn't do much else | 18:51 |
pleia2 | jeblair: maybe that's how it's replicated? | 18:51 |
clarkb | Alex_Gaynor: there is a reason and I always forget what it is | 18:52 |
jeblair | Alex_Gaynor: that's used to build an image, then the full repo is available at basically no cost (which is useful because tests can run on any branch) | 18:52 |
Alex_Gaynor | jeblair: ah ok, so it's in an image, that was the missing bit in my mind | 18:52 |
jeblair | Alex_Gaynor: yep. mordred was pointing out that in devstack-gate itself (in the wrap script) we could possibly be doing something smarter than 'git remote update' | 18:53 |
jeblair | Alex_Gaynor: but we need to be careful that whatever we change there doesn't transfer load to the zuul server (where the actual test refs are served) | 18:53 |
Alex_Gaynor | right | 18:54 |
jeblair | pleia2: the repos that were just replicated look just like the gerrit repos | 18:54 |
pleia2 | ah, hrm | 18:54 |
clarkb | maybe the cgit package comes with a cron to do it? | 18:55 |
jeblair | btw, https clone from review.o.o is real 1m0.056s | 18:56 |
jeblair | (using the local mirror, not gerrit) | 18:56 |
clarkb | so we are on par with that | 18:57 |
jeblair | clarkb: _if_ we pack refs | 18:57 |
jeblair | on the mirror | 18:57 |
clarkb | ya | 18:57 |
jeblair | packed refs only (not a gc): real 0m46.005s | 18:58 |
*** hartsocks has joined #openstack-infra | 18:58 | |
jeblair | that's actually faster than the gc | 18:58 |
bnemec | I'm seeing a couple of changes that have no Jenkins score and aren't showing up on the status page. | 18:59 |
anteaya | the git-fetch-pack command allows you to specify <refs>: http://git-scm.com/docs/git-fetch-pack/1.8.3 | 18:59 |
bnemec | Should I go ahead and recheck them? | 18:59 |
anteaya | ah shoot - that is 1.8.3 | 18:59 |
anteaya | :( | 18:59 |
jeblair | bnemec: yep | 19:00 |
bnemec | jeblair: Okay, thanks. I didn't want to drive any extra load unnecessarily. | 19:00 |
anteaya | bnemec: more explanation here: https://bugs.launchpad.net/openstack-ci/+bug/1215522 | 19:00 |
uvirtbot | Launchpad bug 1215522 in openstack-ci "my recheck/reverify patch isn't showing up in status.openstack.org/zuul" [Undecided,New] | 19:00 |
clarkb | jeblair: speaking of zuul. Does the zuul process that is currently running catch SIGUSR2 properly? | 19:00 |
jeblair | clarkb: so i think we should 'git pack-refs --all' nightly on the mirrors | 19:00 |
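The actual puppet change shows up below as review 43331; roughly, the nightly job has to do something like the following on each mirror (repo layout assumed from the /var/lib/git paths in the replication errors above):

```bash
# Pack loose refs in every mirrored repo; objects are left alone, so this is
# much cheaper than a full git gc
for repo in /var/lib/git/*/*.git; do
    git --git-dir="$repo" pack-refs --all
done
```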
dansmith | jeblair: I haven't been rechecking things much since a lot of things seem to be failing all tests due to package fetch timeouts or something like that | 19:00 |
jeblair | clarkb: yes, i restarted with both of those changes | 19:00 |
dansmith | jeblair: is that just my imagination? | 19:01 |
bnemec | jeblair: Oh, that's embarrassing. I even saw that link earlier. | 19:01 |
clarkb | jeblair: I agree | 19:01 |
jeblair | dansmith: nope, we're working on that now | 19:01 |
anteaya | dansmith: no, that is the git issue we are working on | 19:01 |
anteaya | dansmith: not your imagination | 19:01 |
anteaya | bnemec: no worries | 19:01 |
jeblair | clarkb: i'll write that change real quick? | 19:01 |
dansmith | okay, I figured, but also figured more rechecks weren't likely to help :) | 19:02 |
clarkb | jeblair: go for it | 19:02 |
*** sarob has joined #openstack-infra | 19:02 | |
clarkb | jeblair: base it atop my haproxy change | 19:02 |
*** lbragstad has left #openstack-infra | 19:02 | |
anteaya | dansmith: not right now, but you are free to spin the wheel and take your chances like everyone else | 19:02 |
clarkb | jeblair: so that we can continue using the development env until we actually turn haproxy on | 19:02 |
dansmith | anteaya: hah, okay :P | 19:02 |
anteaya | :D | 19:02 |
jeblair | clarkb: i think later we may want to swing back around and look into using a newer git on these servers | 19:02 |
clarkb | jeblair: ++ | 19:02 |
jeblair | clarkb: because perhaps the newer git can deal with unpacked refs better | 19:02 |
jeblair | clarkb: but i think i'm fine with packed refs in a mirror | 19:03 |
clarkb | jeblair: any idea how packed refs like that will affect fetches of a few refs? | 19:03 |
clarkb | does git unpack them and give you just what you want? | 19:03 |
jeblair | clarkb: it's just the list of refs | 19:03 |
clarkb | oh right you are packing refs. I keeping thinking objects | 19:04 |
jeblair | clarkb: for use when git advertises what refs it has | 19:04 |
clarkb | objects != refs and I need to beat that into my brain | 19:04 |
clarkb | I am going to find some food really quick. I smell it so I won't be gone long | 19:04 |
anteaya | happy food clarkb | 19:04 |
pleia2 | yes, lunch | 19:05 |
reed | get good food | 19:05 |
anteaya | happy lunch pleia2 | 19:05 |
mrmartin | jeblair: if you have some free minutes, please review https://review.openstack.org/#/c/42608/ it is blocking task in the groups portal. thnx! | 19:05 |
reed | what's the current estimate for this patch to land somewhere? https://bugs.launchpad.net/horizon/+bug/1179526 | 19:05 |
uvirtbot | Launchpad bug 1179526 in horizon "source_lang in Horizon repo is overwritten by Transifex" [High,Confirmed] | 19:05 |
reed | no, not that | 19:05 |
reed | this one https://review.openstack.org/#/c/42608/ | 19:06 |
anteaya | is it in the queue, reed? | 19:06 |
reed | anteaya, waiting for review | 19:06 |
anteaya | sorry, no it isn't - I'm focused on queue questions, sorry | 19:06 |
jeblair | mrmartin: this week is very unusual -- we're having a lot of load problems because we have a feature freeze this week, and we only have 2 infra team members working | 19:07 |
jeblair | mrmartin: as soon as we have things working reliably again, i will review that patch and reed's as well | 19:07 |
mrmartin | jeblair: ok, maybe on monday? | 19:07 |
anteaya | both linked to the same patch | 19:07 |
jeblair | mrmartin: certainly by monday | 19:08 |
jeblair | mrmartin: did you see the instructions for running that locally on a test server? | 19:08 |
mrmartin | jeblair, I tested it in a local vm | 19:08 |
jeblair | mrmartin: even if we haven't merged that and launched the real server yet, i wanted to make sure you can work on it locally | 19:08 |
jeblair | mrmartin: okay, great | 19:08 |
mrmartin | was working in the test env, but you know, it doesn't mean that everything will be perfect on prod :D | 19:09 |
anteaya | true, but it is a very good start | 19:09 |
jeblair | yep :) | 19:09 |
*** CaptTofu_ has quit IRC | 19:09 | |
*** CaptTofu has joined #openstack-infra | 19:10 | |
anteaya | zuul reports 0 events, yay | 19:10 |
*** ^demon is now known as ^demon|lunch | 19:10 | |
*** sarob has quit IRC | 19:10 | |
anteaya | 24 gate, 1 post, 53 check | 19:10 |
anteaya | almost manageable again | 19:11 |
*** wenlock has quit IRC | 19:11 | |
*** CaptTofu has quit IRC | 19:14 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Add a mirror repack cron to git servers https://review.openstack.org/43331 | 19:15 |
jeblair | I'm going to run the repack on all of the git servers, then eat. | 19:15 |
hartsocks | Hi. I think my account on review.openstack.org is screwed up. Is this the correct channel for that? | 19:16 |
clarkb | hartsocks: yes, what problem are you seeing | 19:17 |
clarkb | the food is a lie. I need to wait a little longer on it | 19:17 |
anteaya | jeblair: sounds good | 19:17 |
anteaya | clarkb: k | 19:17 |
hartsocks | clarkb: Example: https://review.openstack.org/#/c/43266/ | 19:18 |
hartsocks | The review system has decided I don't own about half my patches. Sometimes I show up as "hartsock" and other times as "hartsocks" I don't know why. Who do I ask about this? I've tried to get help on the mailing list in the past. | 19:18 |
anteaya | clarkb: does the db have both a hartsock and a hartsocks? | 19:18 |
clarkb | anteaya: checking | 19:18 |
hartsocks | my preference would be to fold everything into 'hartsocks' | 19:19 |
anteaya | hartsocks: great | 19:19 |
clarkb | hartsocks: yeah you have two accounts | 19:20 |
hartsocks | can you just fold everything to hartsocks? | 19:20 |
anteaya | hartsocks do you have any repos with connections to gerrit? you might need to delete the remote branch and create a new remote branch to gerrit with `git review -s` | 19:21 |
anteaya | to ensure you don't have any headed for hartsock | 19:21 |
hartsocks | okay. | 19:21 |
reed | stupid mailman | 19:21 |
*** thomasbiege has joined #openstack-infra | 19:21 | |
clarkb | hartsocks: first a little background on why this appears to have happened | 19:21 |
clarkb | hartsocks: you have logged into gerrit with two different launchpad accounts | 19:22 |
anteaya | reed an app called mailman or the human being holding your mail? | 19:22 |
clarkb | hartsocks: and if you push code with two different usernames changes will be attached to two different accounts | 19:22 |
hartsocks | clarkb: whoops :-/ | 19:22 |
clarkb | hartsocks: if you want to be hartsocks you should login with your vmware launchpad account | 19:22 |
reed | anteaya, the python code that delivers email | 19:22 |
hartsocks | clarkb: The only actions have been git actions that seem to screw up. | 19:23 |
clarkb | actually I take that back, both accounts see acm and vmware email | 19:23 |
anteaya | reed: ah okay, stupid python code that delivers email | 19:23 |
*** SergeyLukjanov has quit IRC | 19:23 | |
hartsocks | clarkb: I must have a git repo that was set up 'hartsock' | 19:23 |
clarkb | hartsocks: that will do it | 19:23 |
hartsocks | clarkb: I will go through them all and make sure they are 'hartsocks' | 19:23 |
clarkb | hartsocks: you can set gitreview.username in your global git config to set it globally | 19:23 |
clarkb | hartsocks: then make sure you don't have any local overrides | 19:24 |
hartsocks | clarkb: I'm guessing that's in .git/config locally | 19:24 |
anteaya | hartsocks: yes | 19:24 |
clarkb | hartsocks: ~/.gitconfig but setting it with the git config command is preferred. `git config --global gitreview.username hartsocks` | 19:24 |
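Putting those two steps together, roughly:

```bash
# Set the gerrit username once, globally...
git config --global gitreview.username hartsocks
# ...then, inside each existing clone, check for a stale per-repo override
git config --local --get gitreview.username   # empty output means no local override
```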
reed | pleia2, mordred, jeblair: when you approve my message to infra mlist please whitelist also stefano+infra@openstack as allowed email | 19:25 |
hartsocks | clarkb: thanks | 19:25 |
pleia2 | reed: will do, sec | 19:25 |
clarkb | hartsocks: rolling stuff under the other name into hartsocks is probably possible, but this is a busy week and if you can live with those being wrong until they get merged or die that would probably be easiest | 19:25 |
clarkb | I am also not sure if we have updated changes and the like in the past | 19:26 |
clarkb | may not be possible | 19:26 |
hartsocks | clarkb: now that I know what's going on I can live with that for a while. | 19:26 |
hartsocks | clarkb: just want my karma points that's all :-) | 19:26 |
anteaya | my local .git changes are in .git/config and I put them there with the git config command | 19:26 |
*** thomasbiege has quit IRC | 19:27 | |
hartsocks | clarkb: (I know the points don't matter.) | 19:27 |
anteaya | just like on Whose Line | 19:27 |
*** ruhe has quit IRC | 19:27 | |
*** xBsd has joined #openstack-infra | 19:28 | |
bnemec | Dibs on being OpenStack's Ryan Stiles. | 19:29 |
bnemec | I've even got the requisite height. ;-) | 19:30 |
anteaya | perfect | 19:30 |
anteaya | as a Canadian I'd like to try for Colin Mochrie | 19:30 |
anteaya | but my gender might be a hindrance | 19:30 |
anteaya | and I'm not bald | 19:30 |
bnemec | It's all good - half the time they had him playing female characters anyway. ;-) | 19:31 |
bnemec | Although the inability to make bald jokes would definitely be a problem. :-D | 19:31 |
anteaya | I can wear a swim cap | 19:32 |
*** vipul is now known as vipul-away | 19:32 | |
anteaya | I'm out sick for the richard simmons episode though | 19:32 |
bnemec | Bah, what fun is that? :-P | 19:33 |
anteaya | it's all you Ryan | 19:33 |
anteaya | pleia2: are you still lunching? | 19:34 |
anteaya | I'm trying to find a space for some exercise | 19:34 |
pleia2 | anteaya: I'm back-ish :) | 19:35 |
anteaya | I can wait | 19:35 |
anteaya | let me know when you are back | 19:35 |
*** xBsd has quit IRC | 19:35 | |
*** HenryG has quit IRC | 19:36 | |
pleia2 | anteaya: I'm back | 19:36 |
anteaya | okay great | 19:36 |
*** sarob has joined #openstack-infra | 19:37 | |
anteaya | thanks, off for a walk I expect to be back in an hour | 19:37 |
pleia2 | enjoy | 19:37 |
*** yolanda has quit IRC | 19:38 | |
*** sdake_ has quit IRC | 19:39 | |
jeblair | okay, pack-refs has completed on all the git servers | 19:40 |
*** sarob has quit IRC | 19:41 | |
jeblair | real 0m40.868s | 19:42 |
jeblair | clone time for nova on 03 | 19:42 |
*** emagana has joined #openstack-infra | 19:43 | |
openstackgerrit | @Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None https://review.openstack.org/43248 | 19:43 |
clarkb | jeblair: nice | 19:43 |
* clarkb reviews the change to put that in place everywhere | 19:43 | |
jeblair | clarkb: are you back and ready to proceed, or killing time around lunch? | 19:44 |
clarkb | jeblair: I should technically kill more time around lunch because the food I smell hasn't made it to the scavenging grounds yet | 19:45 |
clarkb | jeblair: but I am also impatient. I think we should continue if you don't need more time for food | 19:46 |
*** wenlock has joined #openstack-infra | 19:47 | |
clarkb | jeblair: I am fetching your cron change into the puppet development env | 19:47 |
jeblair | clarkb: we can wait, i think we're getting to the point where we don't want to be interrupted | 19:47 |
clarkb | jeblair: ok | 19:47 |
clarkb | I will pull the change into that repo and possibly just find a sandwich | 19:48 |
clarkb | to speed things along | 19:48 |
*** p5ntangle has quit IRC | 19:48 | |
clarkb | jeblair: two things to note before I afk for a few minutes. The haproxy git:// queue and conn numbers may need changing and we may need to change the default balance type to source to accommodate lag in replication across the different servers | 19:49 |
*** p5ntangle has joined #openstack-infra | 19:49 | |
clarkb | jeblair: the current balance method is round robin and git http by default can open up to five connections. | 19:49 |
*** SergeyLukjanov has joined #openstack-infra | 19:51 | |
jeblair | k | 19:52 |
jeblair | clarkb: we have cacti graphs for 01-04 | 19:53 |
*** gyee has quit IRC | 19:57 | |
*** CaptTofu has joined #openstack-infra | 19:58 | |
*** sarob has joined #openstack-infra | 20:00 | |
*** pcm_ has quit IRC | 20:00 | |
*** nayward has quit IRC | 20:01 | |
*** dina_belova has quit IRC | 20:03 | |
*** dina_belova has joined #openstack-infra | 20:04 | |
*** sdake_ has joined #openstack-infra | 20:06 | |
*** sdake_ has quit IRC | 20:06 | |
*** sdake_ has joined #openstack-infra | 20:06 | |
*** nati_uen_ has joined #openstack-infra | 20:09 | |
clarkb | jeblair: woot. | 20:09 |
clarkb | jeblair: sandwich was good. ready whenever you are | 20:09 |
jeblair | 1 sec | 20:09 |
jeblair | k | 20:10 |
jeblair | so shall we merge the git:// change now? | 20:11 |
*** p5ntangle has quit IRC | 20:11 | |
jeblair | clarkb: i'll let you do that since you haven't reviewed it | 20:11 |
jeblair | https://review.openstack.org/#/c/43315/ | 20:11 |
clarkb | jeblair: the gate is undergoing a reset. should we wait a little bit for that? | 20:11 |
clarkb | or just power through? | 20:11 |
*** nati_ueno has quit IRC | 20:12 | |
jeblair | clarkb: power through | 20:13 |
jeblair | it'll be done by the time that gets merged | 20:13 |
clarkb | ok merging 43315 now | 20:13 |
clarkb | s/merging/approving/ | 20:13 |
*** ^demon|lunch is now known as ^d | 20:13 | |
clarkb | the zuul results queue is large again | 20:15 |
clarkb | but that may just be a side effect of cancelling a bunch of stuff | 20:15 |
jeblair | clarkb: yep | 20:15 |
*** lcestari has quit IRC | 20:22 | |
*** dina_belova has quit IRC | 20:25 | |
*** SergeyLukjanov has quit IRC | 20:25 | |
*** vipul-away is now known as vipul | 20:25 | |
*** jbjohnso has joined #openstack-infra | 20:26 | |
*** nati_uen_ has quit IRC | 20:26 | |
*** danger_fo_away is now known as danger_fo | 20:27 | |
clarkb | jeblair: still waiting to get queued. Should I go ahead and force submit the change? | 20:27 |
jeblair | clarkb: yeah, it's about 2/3 through reconfiguring the reset. let's not wait. | 20:28 |
anteaya | back | 20:28 |
openstackgerrit | A change was merged to openstack-infra/config: Switch ggp to use git:// https://review.openstack.org/43315 | 20:29 |
clarkb | jeblair: I am going to run a puppet agent --noop | 20:29 |
*** sarob_ has joined #openstack-infra | 20:29 | |
clarkb | as a quick sanity check but then we should be ready to apply the haproxy stuff to git.o.o | 20:30 |
clarkb | jeblair: it looks clean to me. should I go ahead and run puppet for real? are you ready? | 20:32 |
*** sarob_ has quit IRC | 20:32 | |
*** sarob_ has joined #openstack-infra | 20:32 | |
*** sarob has quit IRC | 20:33 | |
jeblair | clarkb: yep | 20:33 |
* clarkb pushes the go button | 20:34 | |
clarkb | jeblair: can I have you check the ip6tables rules after puppet is done on git.o.o? I noticed some weirdness there yesterday and think our iptables module may not be completely happy on centos | 20:34 |
jeblair | k | 20:34 |
clarkb | puppet is still running. I will let you know when to check | 20:35 |
jeblair | clarkb: what was weird? | 20:37 |
clarkb | jeblair: it didn't pick up the new 4443 29418 and 8080 rules. but I kicked it by hand and that seemed to work | 20:37 |
jeblair | that seems to be the case again. probably a puppet bug | 20:37 |
jeblair | clarkb: how's that puppet run? | 20:38 |
jeblair | we're starting to fail jobs | 20:38 |
clarkb | puppet is done running. haproxy is up | 20:39 |
* clarkb checks a local clone really fast | 20:39 | |
clarkb | local clone of nova via git:// works | 20:39 |
anteaya | yay | 20:39 |
jeblair | i just did an https clone from home | 20:39 |
jeblair | worked | 20:39 |
* clarkb looks in the haproxy log for anything crazy looking | 20:40 | |
jeblair | (i'm cloning zuul, not nova though so i don't impact the server) | 20:40 |
jeblair | git and http work too | 20:40 |
jeblair | clarkb: how do we examine haproxy state? | 20:41 |
clarkb | jeblair: /var/log/haproxy.log | 20:41 |
clarkb | jeblair: is the log | 20:41 |
pleia2 | git is still running from xinetd, right? | 20:41 |
clarkb | I think it opens a socket somewhere that you can talk directly to asw well /me finds that | 20:41 |
jeblair | any way to see the current connection count, distributions | 20:41 |
clarkb | pleia2: no this includes your daemon change | 20:41 |
pleia2 | clarkb: oh ok, great | 20:41 |
clarkb | jeblair: good question. I am looking for that socket now | 20:42 |
*** pblaho has quit IRC | 20:42 | |
clarkb | jeblair: http://code.google.com/p/haproxy-docs/wiki/UnixSocketCommands on /var/lib/haproxy/stats | 20:43 |
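For reference, that socket speaks a small command set (the socat package gets installed a few lines below); for example:

```bash
echo "show info" | sudo socat stdio /var/lib/haproxy/stats   # process-wide info, including current connections
echo "show stat" | sudo socat stdio /var/lib/haproxy/stats   # per-frontend/backend/server stats as CSV
```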
jeblair | pleia2: would you mind writing a change to add the 'socat' package to the git servers? | 20:44 |
*** woodspa has quit IRC | 20:44 | |
jeblair | i installed it manually on git.o.o | 20:44 |
pleia2 | jeblair: sure, on it | 20:44 |
clarkb | jeblair: out of curiosity what command(s) are you running against that socket? | 20:44 |
*** danger_fo is now known as danger_fo_away | 20:45 | |
jeblair | clarkb: https://etherpad.openstack.org/SIzEkjfC1R | 20:45 |
jeblair | clarkb: this looks useful http://joeandmotorboat.com/2009/08/20/haproxy-stats-socket-and-fun-with-socat/ | 20:46 |
jeblair | (i've pasted in more output into the etherpad) | 20:47 |
clarkb | thanks | 20:47 |
psedlak | hi, how is the issue with python-*client dependency collisions solved for stable/grizzly branch? i've tried to get similar env at my machine and nova failed to start at all due to wrong versions of keystoneclient ... :/ | 20:49 |
psedlak | *similar env as gate-devstack-tempest-vm-full for stable/grizzly | 20:49 |
*** jjmb has joined #openstack-infra | 20:49 | |
openstackgerrit | Elizabeth Krumbach Joseph proposed a change to openstack-infra/config: Add socat package to cgit servers https://review.openstack.org/43354 | 20:50 |
anteaya | psedlak: we are doing a little internal work now, and the ones able to answer your question need to focus on their fix right now | 20:51 |
*** dkranz has quit IRC | 20:51 | |
anteaya | psedlak: if you have a link to a bug report or patch I can look at it, if you want | 20:51 |
clarkb | jeblair: are you seeing any rampant failure? | 20:51 |
clarkb | jeblair: best as I can tell we are mostly up | 20:51 |
psedlak | anteaya: you mean it's not the best time for it now ... should i ask later/tomorrow? | 20:52 |
anteaya | you can try | 20:52 |
anteaya | have you tried in -dev or -nova yet? | 20:52 |
anteaya | the keystone folks hang out in -dev | 20:52 |
jeblair | clarkb: nope, afaict, we seem to be distributing across all servers | 20:52 |
jeblair | echo "show errors" |socat stdio /var/lib/haproxy/stats | 20:52 |
jeblair | is empty | 20:52 |
*** jjmb1 has joined #openstack-infra | 20:53 | |
psedlak | anteaya: no, not yet as on gate it reinstalls (at least keystoneclient, but maybe also others) multiple times during setup (devstack) ... and there are clearly incompatible reqs http://bpaste.net/show/vnYioO66WaD27IC7C1dh/ | 20:53 |
jeblair | clarkb: since we broke the gate queue during the hup, that actually stopped the gate reset | 20:54 |
jeblair | psedlak: in master, we are now forcing the requirements specified in openstack/requirements to be installed | 20:54 |
clarkb | jeblair: there are a handful of "could not get file" errors due to the lack of a no-.git to .git translation but far fewer than we seemed to have in the past | 20:54 |
anteaya | psedlak: yeah, let's move to -dev and see if some -qa folks are around | 20:54 |
*** jjmb has quit IRC | 20:54 | |
jeblair | psedlak: i believe devstack has code to do that; i'm not sure if all of that has been backported to grizzly yet, but it's under consideration at least (if it hasn't been done) | 20:54 |
jeblair | psedlak: you might ask dtroyer | 20:55 |
jeblair | clarkb: it's possible those errors are due to a smart http request failing during the hup | 20:55 |
psedlak | jeblair: ok, thanks | 20:55 |
clarkb | jeblair: that is possible | 20:56 |
jeblair | clarkb: maybe give it a few mins and if they continue, start to worry? :) | 20:56 |
* jeblair adds a new tree to cacti | 20:56 | |
clarkb | jeblair: can you add one for logstash + elasticsearch if you are collapsing things together? | 20:57 |
*** thomasbiege has joined #openstack-infra | 20:57 | |
jeblair | clarkb: let me do that later; i'm just going to add a quick collection of graphs for git now; later i'll add a single graph that graphs multiple hosts; i'll do logstash then too | 20:57 |
clarkb | ok wfm | 20:57 |
*** thomasbiege has quit IRC | 20:57 | |
*** mrmartin has quit IRC | 20:58 | |
*** apcruz has quit IRC | 20:59 | |
jeblair | http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=2 | 21:00 |
anteaya | wow | 21:01 |
anteaya | a few cpus finally taking a smoke break | 21:01 |
*** gyee has joined #openstack-infra | 21:01 | |
jeblair | anteaya: heh, ie, smoking less? | 21:01 |
anteaya | yeah, taking a break from smoking | 21:02 |
*** hartsocks has left #openstack-infra | 21:02 | |
anteaya | what a difference | 21:02 |
anteaya | the greens are so similar, they aren't even in nice, they went right to idle | 21:03 |
jeblair | anteaya: you should be able to tell the difference on the graph; if you look at zuul, youll see nice time. | 21:03 |
*** CaptTofu has quit IRC | 21:04 | |
jeblair | but yeah, nothing is nice here | 21:04 |
clarkb | I think the gate reset is part of it | 21:04 |
*** jjmb1 has quit IRC | 21:04 | |
anteaya | yeah look at those idle numbers | 21:04 |
jeblair | clarkb: yes, git is basically idle now | 21:04 |
anteaya | clarkb: okay, I'll see if I can see what happens on a gate reset | 21:04 |
anteaya | w00t | 21:04 |
*** CaptTofu has joined #openstack-infra | 21:04 | |
jeblair | we're back at one job at a time until the next reset | 21:05 |
anteaya | one job? one what kind of job - one git clone job? | 21:06 |
anteaya | looks like a gate reset coming up | 21:07 |
jeblair | anteaya: well, we don't usually clone things, but yes, since all nodes are now occupied, they will each just pick up a new jenkins job (which will perform some git action) one at a time as they finish | 21:07 |
anteaya | ah okay, I think I understand | 21:07 |
jeblair | anteaya: without a new error, we're 17 minutes away from a gate reset | 21:07 |
*** sarob_ has quit IRC | 21:08 | |
jeblair | openstack/nova 42435,7 is the first change with a failed job in the gate (at the moment) | 21:08 |
anteaya | okay, can I see that on status.openstack.org/zuul? | 21:08 |
anteaya | ah okay | 21:08 |
*** sarob has joined #openstack-infra | 21:08 | |
*** CaptTofu has quit IRC | 21:08 | |
anteaya | right, a failed voting job | 21:09 |
anteaya | I see it | 21:09 |
*** changbl has quit IRC | 21:10 | |
anteaya | what is expected to happen at the next gate reset? | 21:10 |
jeblair | anteaya: zuul will cancel any running jobs in the gate queue which will free many jenkins slaves at once to immediately start running new gate jobs which will stress the git server | 21:11 |
anteaya | ah ha | 21:11 |
anteaya | then we will see what happens | 21:11 |
jeblair | then we'll see how the load-balanced server performs under our current load | 21:11 |
jeblair | if it performs well, we can add more nodes; if it does not, we can add more git servers | 21:11 |
anteaya | okay | 21:11 |
anteaya | so 13 minutes of downtime for you | 21:12 |
anteaya | or maybe a smoke break? | 21:12 |
anteaya | or maybe not quite the stress load | 21:12 |
*** sarob_ has joined #openstack-infra | 21:12 | |
*** sarob has quit IRC | 21:13 | |
anteaya | 8 minutes | 21:13 |
*** sarob_ has quit IRC | 21:13 | |
jeblair | well, perhaps a few minutes to switch to the other desktop and check in on the nodepool change i'm working on | 21:13 |
*** sarob has joined #openstack-infra | 21:13 | |
clarkb | we might also need to tune the maxconn settings for git:// | 21:13 |
anteaya | jeblair: :D | 21:13 |
jeblair | clarkb: this is one of those times i miss gerritbot reading merges in dev | 21:13 |
clarkb | jeblair: ya | 21:13 |
anteaya | like a mini vacation | 21:14 |
clarkb | jeblair: I have been watching the post queue for that info now | 21:14 |
jeblair | clarkb: we just merged 13 changes in the past 8 minutes | 21:14 |
clarkb | nice | 21:14 |
anteaya | here is a graph of merged changes: http://graphite.openstack.org/graphlot/?width=586&height=308&_salt=1377178709.576&target=stats.gerrit.event.change-merged | 21:14 |
clarkb | fatal: git upload-pack: not our ref 39f1e9314ee28eed74cdaf3c447fc32a64e76f45 multi_ack_detailed side-band-64k thin-pack no-progress include-tag ofs-delta | 21:15 |
clarkb | I think ^ may be related to non atomic mirror replication | 21:15 |
* clarkb looks in the error log of the other servers | 21:15 | |
jeblair | clarkb: ya, where'd you see it? | 21:15 |
clarkb | jeblair: that is on git.o.o and git02 has a couple as well | 21:16 |
clarkb | git01 is clean | 21:16 |
clarkb | 03 is clean | 21:16 |
clarkb | 04 as well | 21:16 |
clarkb | so not common at least not under heavy load | 21:17 |
anteaya | 3 minutes to gate reset | 21:17 |
clarkb | jeblair: we can try switching to source balancing which may suck with the d-g slaves as they are all in similar network space, or add retries to our git stuff | 21:17 |
jeblair | clarkb: what's the mask on source balancing? | 21:18 |
jeblair | clarkb: why not go with the full 32? | 21:18 |
clarkb | jeblair: I don't know that you can provide the mask | 21:19 |
clarkb | I will look into it more closely | 21:19 |
anteaya | gate is resetting | 21:19 |
*** AJaeger has quit IRC | 21:19 | |
clarkb | jeblair: also note that reload in the haproxy init script should be mostly invisible to the clients | 21:19 |
jeblair | clarkb: excellent | 21:20 |
*** vipul is now known as vipul-away | 21:22 | |
*** nati_ueno has joined #openstack-infra | 21:22 | |
*** boris-42 has quit IRC | 21:23 | |
openstackgerrit | A change was merged to openstack/requirements: Allow use of oslo.messaging 1.2.0a10 https://review.openstack.org/43060 | 21:23 |
openstackgerrit | @Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None https://review.openstack.org/43248 | 21:23 |
jeblair | clarkb: http://code.google.com/p/haproxy-docs/wiki/balance makes it look like it considers the whole ip | 21:23 |
clarkb | jeblair: yeah I am beginning to think that too. Looking in the source they use a hash over 32bit space with good distribution | 21:24 |
clarkb | (according to the comments anyways) | 21:25 |
clarkb | let me see if we can make the change with puppet (depends on whether or not it uses reload vs restart) | 21:25 |
*** dina_belova has joined #openstack-infra | 21:25 | |
jeblair | gate just reset | 21:26 |
clarkb | might be a little while before we reenable puppet though so I am open to doing it by hand if you want to get it in | 21:26 |
jeblair | clarkb: may as well see how this reset goes, no rush | 21:26 |
clarkb | ok | 21:26 |
anteaya | when the patches in the gate pipeline change to unknown - that is the indicator that the gate is reset? | 21:27 |
jeblair | oh, this is still markmc's chain, so it has to kick a bunch of changes out first before it actually starts jobs | 21:27 |
jeblair | anteaya: there are no running jobs currently, it has canceled everything and is recomputing the new proposed series to merge | 21:27 |
anteaya | okay, how do I see that using the status page, cacti and graphite? | 21:28 |
anteaya | or can I? | 21:28 |
*** dina_belova has quit IRC | 21:28 | |
jeblair | anteaya: the status page; if you look at the gate queue, you should see that nothing has started running yet | 21:28 |
anteaya | right, but all the old jobs with any logs are no longer in the queue | 21:29 |
anteaya | so that can be my indicator | 21:29 |
*** dprince has quit IRC | 21:30 | |
*** krtaylor has quit IRC | 21:31 | |
*** krtaylor has joined #openstack-infra | 21:32 | |
jeblair | ok it's starting jobs now | 21:35 |
anteaya | yes I see that | 21:35 |
*** dina_belova has joined #openstack-infra | 21:35 | |
anteaya | and cpu usage for user on the git server is 1.7 | 21:36 |
anteaya | I don't see a spike | 21:36 |
jeblair | anteaya: there's a 5 minute polling interval on graphite | 21:36 |
anteaya | ah ha | 21:36 |
jeblair | s/graphite/cacti/ | 21:36 |
anteaya | I'll check back in 5+ minutes | 21:36 |
*** mriedem has quit IRC | 21:36 | |
anteaya | time for toast | 21:37 |
*** dina_belova has quit IRC | 21:40 | |
anteaya | so the jobs that stress the git server are any job with devstack in it, correct? | 21:41 |
clarkb | anteaya: they are the worst offenders | 21:41 |
anteaya | ah okay | 21:41 |
*** vipul-away is now known as vipul | 21:42 | |
anteaya | so far on my cacti graph user is up to 4 | 21:42 |
anteaya | with idle at 92.8 | 21:43 |
anteaya | nice ratio | 21:43 |
clarkb | I am not seeing terrible load average on the individual servers | 21:43 |
anteaya | yay | 21:43 |
anteaya | any numbers for the etherpad? | 21:43 |
clarkb | not yet. I am not sure that the full wave has hit us yet | 21:44 |
*** pblaho has joined #openstack-infra | 21:44 | |
anteaya | okay | 21:44 |
anteaya | but good early results | 21:44 |
clarkb | load average: 0.39, 0.45, 0.43 on git.o.o these numbers are on cacti too | 21:44 |
* anteaya scrolls down | 21:45 | |
*** ftcjeff has quit IRC | 21:45 | |
anteaya | toast | 21:45 |
*** Ryan_Lane has quit IRC | 21:47 | |
*** Ryan_Lane has joined #openstack-infra | 21:47 | |
jeblair | clarkb: i have a disturbing thought; what if nova was the only repo on git.o.o that had packed refs? | 21:48 |
clarkb | oh | 21:48 |
clarkb | jeblair: hahahahaha | 21:48 |
clarkb | well it seems happy now in any case :) | 21:49 |
jeblair | clarkb: if that graph holds, then the inflection point of load dropping on git.o.o is much closer to the point where i ran pack-refs than when we started the other servers | 21:49 |
clarkb | jeblair: makes sense | 21:49 |
clarkb | if that is the case we can always scale back the additonal nodes | 21:50 |
jeblair | well, if so, maybe we can just throw more load at it sooner. :) | 21:50 |
clarkb | or that | 21:50 |
jeblair | i see a lot of graphs on the status page that should have passed the point of git errors by now; and there are basically no git connections | 21:51 |
jeblair | so i think we've seen as much 'rush' from this reset as we're going to | 21:51 |
anteaya | yay | 21:51 |
clarkb | jeblair: I agree. I do however think we should switch to source balance method | 21:51 |
jeblair | clarkb: yep, let's do it now before the next rush? | 21:52 |
jeblair | clarkb: and then perhaps unstick az2 and give nodepool its reins again? | 21:52 |
clarkb | ya I am checking puppet now and if puppet is sane will do it with puppet and if it isn't sane will do it by hand and update puppet | 21:52 |
clarkb | jeblair: ++ | 21:52 |
clarkb | looks like it will use restart | 21:53 |
clarkb | https://github.com/puppetlabs/puppetlabs-haproxy/blob/0.3.0/manifests/init.pp#L132 | 21:53 |
jeblair | grumble | 21:53 |
clarkb | I will edit the file by hand, reload haproxy then do puppet so puppet doesn't see the change | 21:54 |
jeblair | is it haproxy.cfg? | 21:54 |
clarkb | jeblair: yes | 21:54 |
clarkb | in /etc/haproxy/ | 21:54 |
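The by-hand edit being described, as a sketch (the matching puppet change, 43359, is proposed just below; the listener name here is illustrative):

```bash
# In /etc/haproxy/haproxy.cfg, change each git listener's balance line, e.g.
#   listen balance_git_daemon            # illustrative stanza name
#       balance  source                  # was: balance  roundrobin
# then reload, which keeps existing client connections alive (unlike restart):
sudo /etc/init.d/haproxy reload
```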
*** dmakogon_ has quit IRC | 21:54 | |
jeblair | mgagne: want to write a puppet patch? | 21:55 |
mgagne | jeblair: go on | 21:55 |
*** ^d has quit IRC | 21:55 | |
jeblair | mgagne: it would be cool if changes to haproxy.cfg could run '/etc/init.d/haproxy reload' instead of 'restart' in the puppetlabs haproxy module https://github.com/puppetlabs/puppetlabs-haproxy/blob/0.3.0/manifests/init.pp#L132 | 21:55 |
jeblair | clarkb: do you think that makes sense? or are the times when you'd want to reload vs restart significant enough that there isn't a clear winner? | 21:56 |
*** jjmb has joined #openstack-infra | 21:56 | |
anteaya | currently no failures in the gate queue/pipeline | 21:57 |
jeblair | mgagne: restart is disruptive to clients, reload is not, and you do things like 'add new backend servers' by editing that file | 21:57 |
mgagne | jeblair: are you suggesting setting $manage_service to false and handling the definition of the haproxy service within the node manifest? | 21:57 |
*** mrodden has quit IRC | 21:58 | |
clarkb | I think that makes sense. But I don't know enough about haproxy to know if one is preferred over the other in some instances | 21:58 |
jeblair | mgagne: well, either that, or make the puppetlabs module better; clarkb what do you think? | 21:59 |
*** sdake_ has quit IRC | 21:59 | |
mgagne | jeblair: according to my coworker, reload is preferred. If restart is used and config contains an error, you are screwed, haproxy won't restart. restart will kill all the connections, reload won't. | 22:00 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Use the haproxy source balance method. https://review.openstack.org/43359 | 22:00 |
clarkb | mgagne: yeah that is why we want reload. it should be much more invisible to end users | 22:01 |
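The reason reload is the safer default can be made concrete: haproxy can validate a candidate config before the running process is asked to pick it up, so a syntax error never takes the proxy down, and existing client connections survive. A minimal sketch of that check-then-reload step, assuming the stock init script and config path discussed above, might look like:

    #!/usr/bin/env python
    # Sketch: only reload haproxy if the new config parses cleanly.
    # Paths and commands assume a standard package install; adjust as needed.
    import subprocess
    import sys

    CONFIG = '/etc/haproxy/haproxy.cfg'

    def safe_reload():
        # 'haproxy -c -f <file>' parses the config and exits non-zero on errors.
        if subprocess.call(['haproxy', '-c', '-f', CONFIG]) != 0:
            sys.exit('haproxy.cfg failed validation; not reloading')
        # reload keeps existing connections; restart would drop them.
        subprocess.check_call(['/etc/init.d/haproxy', 'reload'])

    if __name__ == '__main__':
        safe_reload()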
clarkb | I wrote 43359 so I can see what the puppet concat diff looks like before modifying the file | 22:01 |
clarkb | jeblair: once I have reloaded haproxy we should merge these puppet changes | 22:01 |
mgagne | jeblair: depends on your urgency: designing and proposing a patch, having it accepted, releasing on forge won't happen in one day | 22:01 |
*** prad_ has quit IRC | 22:02 | |
clarkb | mgagne: understood. we will work around it now. But it is something that will probably end up being desirable to us and others | 22:02 |
clarkb | at the very least I suppose I should open a bug with puppetlabs | 22:02 |
mgagne | clarkb: yes, bodepd could use his contact to fast-forward the patch ;) | 22:02 |
jeblair | clarkb: also, hunner looks like he's involved in that | 22:03 |
mgagne | clarkb: it will be useful to us too as we are dealing with haproxy tuning atm | 22:03 |
clarkb | oh I could just bug hunner | 22:03 |
anteaya | the entire gate queue/pipeline has some test jobs running, so far no failures | 22:03 |
mgagne | clarkb: yes, hunner is the man | 22:03 |
clarkb | mgagne: are you puppetconfing? | 22:03 |
anteaya | first failure is on the last (27th) patch | 22:04 |
mgagne | clarkb: could you make the scope of your question smaller? :D | 22:04 |
jeblair | anteaya: and it's a real test question | 22:04 |
jeblair | anteaya: and it's a real test failure | 22:04 |
* jeblair just writes what he reads | 22:04 | |
anteaya | yes, a voting job | 22:04 |
mtreinish | anteaya: does that include the testr-full jobs too? | 22:04 |
clarkb | mgagne: are you at the conference? | 22:04 |
clarkb | a bunch of folks are there | 22:05 |
mgagne | clarkb: not me =) | 22:05 |
clarkb | just curious if you were part of the bunch | 22:05 |
anteaya | mtreinish: testr jobs are running | 22:05 |
pleia2 | I have a dr appt to run off to, bbiab | 22:05 |
mtreinish | anteaya: yeah, but they're not voting. I was curious if you've seen random failures there. (since they wouldn't trigger a gate reset) | 22:06 |
clarkb | the way puppet concat works is weird. I am not entirely sure that merging that puppet change won't cause an haproxy restart | 22:06 |
mgagne | clarkb: I don't use puppet for client products, only internal stuff, mainly openstack. So they sent the ones designing products with puppet =) | 22:06 |
jeblair | grenade test failed: https://jenkins02.openstack.org/job/gate-grenade-devstack-vm/3624/console | 22:06 |
clarkb | jeblair: but I figure I should write the change locally, reload haproxy then worry about the restart later | 22:06 |
anteaya | at 33 minutes gate-grenade-devstack-vm failed | 22:06 |
anteaya | see ya pleia2 | 22:06 |
jeblair | but that's also a real test failure, not an infra failure | 22:06 |
anteaya | mtreinish: yes, so far testr jobs are running, not results back yet in the grouping | 22:06 |
mtreinish | anteaya: ok thanks | 22:07 |
anteaya | np | 22:07 |
jeblair | i really need to mask aborted test results in zuul | 22:07 |
clarkb | jeblair: reloading haproxy now | 22:07 |
mgagne | clarkb: haproxy will be "notified" if haproxy.cfg is regenerated: https://github.com/puppetlabs/puppetlabs-haproxy/blob/0.3.0/manifests/init.pp#L79-L87 | 22:07 |
anteaya | jeblair: yay | 22:07 |
clarkb | mgagne: yeah and I think concat ends up building it from scratch | 22:07 |
clarkb | mgagne: but I think it checks a diff maybe | 22:08 |
clarkb | haproxy reloaded | 22:08 |
anteaya | jeblair: now I understand your prior question, I don't know how to open test logs reporting failure when the patch is still in the queue | 22:08 |
anteaya | it takes me to jenkins and then I can't get to the log itself | 22:09 |
mgagne | clarkb: it concats a bunch of fragments using a bash script: https://github.com/puppetlabs/puppetlabs-concat/blob/master/files/concatfragments.sh#L22-24 | 22:09 |
clarkb | anteaya: click on "console log" on the left hand side in jenkins | 22:09 |
anteaya | clarkb: thanks | 22:09 |
clarkb | jeblair: If you are happy with that stack of changes I think you can approve them now | 22:10 |
clarkb | then we can reenable puppet on the servers | 22:10 |
anteaya | here is a python26 error and it looks like a real error, not a git timeout: https://jenkins02.openstack.org/job/gate-nova-python26/1261/console | 22:10 |
jeblair | clarkb: including source? | 22:10 |
*** weshay has quit IRC | 22:10 | |
clarkb | jeblair: yes including source | 22:10 |
clarkb | jeblair: I will just be careful when I start puppet again... I am not sure there is much we can do there | 22:11 |
clarkb | I could move the init script aside :) | 22:11 |
anteaya | mtreinish: here is a testr failure for a swift patch: https://jenkins01.openstack.org/job/gate-tempest-devstack-vm-testr-full/3500/console | 22:11 |
mtreinish | anteaya: thanks I was just looking at it. It looks like one I've seen before where all the server creates in nova go to an error state | 22:12 |
anteaya | okay | 22:12 |
anteaya | hmmmm | 22:12 |
clarkb | you can definitely see it is no longer round-robinning requests if you tail the log | 22:14 |
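The stickiness clarkb sees in the log is the defining property of the source algorithm: haproxy hashes the client address onto the set of healthy backends, so a given client keeps landing on the same server between reloads instead of rotating. A rough Python illustration of that mapping follows; it is not haproxy's actual hash, and the backend names are invented for the example.

    # Illustration of source-style balancing: hash the client IP onto a
    # fixed backend list to show why one client stops rotating across backends.
    import hashlib

    BACKENDS = ['git01', 'git02', 'git03', 'git04']  # assumed backend names

    def pick_backend(client_ip, backends=BACKENDS):
        digest = hashlib.md5(client_ip.encode()).hexdigest()
        return backends[int(digest, 16) % len(backends)]

    for ip in ('10.0.0.5', '10.0.0.5', '10.0.0.6'):
        print(ip, '->', pick_backend(ip))
    # Both requests from 10.0.0.5 land on the same backend.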
anteaya | have a nova patch failing both 26 and 27, look like real failures - 23 minutes until it finishes | 22:14 |
clarkb | anteaya: link to py26 | 22:14 |
anteaya | py26: https://jenkins02.openstack.org/job/gate-nova-python26/1261/console | 22:15 |
anteaya | py27: https://jenkins02.openstack.org/job/gate-nova-python27/1560/console | 22:15 |
anteaya | the patch passed both in the check queue | 22:15 |
clarkb | yup real failure | 22:16 |
anteaya | I see those as being actual python failures, not git timeouts | 22:16 |
anteaya | yay, my log parsing skills are getting better | 22:16 |
anteaya | funny they passed in check | 22:16 |
*** burt has quit IRC | 22:16 | |
jeblair | clarkb: did you confirm whether smart http client is one connection? if so, do you want to round-robin it? or shelve this topic until we have more graphs for 'source'? | 22:18 |
clarkb | pleia2 mind checking cgit? | 22:18 |
jeblair | clarkb: i think she's afk | 22:18 |
clarkb | jeblair I think shelve | 22:18 |
jeblair | clarkb: wfm | 22:18 |
clarkb | jeblair thanks | 22:18 |
anteaya | at 14 minutes we have a postgress failure: https://jenkins01.openstack.org/job/gate-tempest-devstack-vm-postgres-full/3919/console | 22:20 |
anteaya | she is at a dr appointment | 22:20 |
anteaya | s/postgress/postgres | 22:21 |
mgagne | clarkb: trying to see if puppet service resource supports reload. But I'm always finding puppet bugs that have been opened for years without patch or conclusion... | 22:21 |
clarkb | mgagne: I think you have to give it a restart command or something like that | 22:21 |
clarkb | where puppet intends to `restart` but you have told it to do something else | 22:22 |
mgagne | clarkb: yes, which (IMO) is suboptimal | 22:22 |
clarkb | mgagne: I agree | 22:22 |
*** pblaho has quit IRC | 22:22 | |
anteaya | the postgres error is from nova patch 28819,3 | 22:22 |
jeblair | clarkb: shall i unstick az2 nodepool now? | 22:23 |
jeblair | anteaya: it's not an infra error | 22:23 |
anteaya | jeblair: yay | 22:23 |
anteaya | so far, no infra errors in the gate | 22:23 |
mgagne | clarkb: is haproxy actually restarted when the config is updated? | 22:23 |
clarkb | jeblair: yes I think we can open the flood gates | 22:23 |
clarkb | mgagne: --noop says the service will be restarted | 22:23 |
clarkb | mgagne: let me get the exact log line | 22:23 |
mgagne | clarkb: service resource has the "refreshable" feature | 22:23 |
anteaya | patch which will spark a gate reset to be finished in 4 minutes | 22:24 |
clarkb | mgagne: notice: /Stage[main]/Haproxy/Service[haproxy]: Would have triggered 'refresh' from 1 events | 22:24 |
jeblair | clarkb: done; az2 nodes should start showing up in a few mins | 22:24 |
anteaya | 8 in post, hopefully 6 more to join them | 22:24 |
mgagne | clarkb: we can only hope the service provider detects that the haproxy service can actually be reloaded. | 22:25 |
mgagne | clarkb: I don't see any trace of reload in that file: https://github.com/puppet/puppet/blob/master/lib/puppet/provider/service/init.rb | 22:26 |
anteaya | this patch has the nova py26 and py27 errors: https://review.openstack.org/#/c/40565/ it is going to remain in the queue after reset, I guess there is nothing we can do about that | 22:27 |
anteaya | it needs the logs from the failure attached to the patch and it won't get them otherwise | 22:27 |
clarkb | anteaya: yeah that is normal | 22:28 |
anteaya | okay | 22:28 |
jeblair | clarkb: the first new az2 node is in use, it appears to be running a job | 22:28 |
clarkb | mgagne: ok, I think I will just try starting puppet again on that server when jenkins is quiet | 22:28 |
clarkb | mgagne: that way if it restarts it doesn't hurt a lot of stuff and we know about it. Otherwise \o/ | 22:28 |
jeblair | clarkb: you mean in november? :) | 22:28 |
clarkb | jeblair: Friday afternoons are usually sanish | 22:28 |
clarkb | of course this is no normal week | 22:28 |
anteaya | heh | 22:29 |
mgagne | clarkb: thanks for asking about reload, now I have to fix haproxy to reload with my setup =) | 22:29 |
anteaya | that git.o.o cacti graph just looks beautiful | 22:30 |
openstackgerrit | A change was merged to openstack/requirements: Allow pyflakes 0.7.3 https://review.openstack.org/35804 | 22:31 |
openstackgerrit | A change was merged to openstack-infra/config: Swap git daemon in xinetd for service https://review.openstack.org/43012 | 22:31 |
anteaya | 10, 10 pretty patches in post ah ha ha ha *lightning flash* | 22:32 |
clarkb | It feels like we are moving again | 22:33 |
anteaya | w00t | 22:33 |
anteaya | look at that graph of test nodes climb | 22:34 |
anteaya | https://tinyurl.com/kmotmns | 22:34 |
openstackgerrit | A change was merged to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 22:35 |
*** dina_belova has joined #openstack-infra | 22:36 | |
openstackgerrit | A change was merged to openstack-infra/config: Add a mirror repack cron to git servers https://review.openstack.org/43331 | 22:37 |
clarkb | jeblair: the time remaining numbers when you hover over the progress bars on the status page don't add hours properly | 22:39 |
clarkb | jeblair: you can see that now if you look at the gate tempest jobs. I intend on taking a look at that when things are not so busy if no one else beats me to it | 22:39 |
openstackgerrit | A change was merged to openstack-infra/config: Use the haproxy source balance method. https://review.openstack.org/43359 | 22:39 |
*** dina_belova has quit IRC | 22:41 | |
jeblair | clarkb: thx; yeah, i _think_ the bug is in status.js | 22:41 |
anteaya | clarkb: just seems to be the ones in the gate, check and post seem reasonable | 22:41 |
jeblair | clarkb: also, it needs to round better; anything < 60 seconds is 0min | 22:42 |
clarkb | anteaya: yeah it has to do with jobs that roll over an hour in length | 22:42 |
clarkb | anteaya: we keep the hour set to 00 | 22:42 |
anteaya | ah | 22:42 |
anteaya | okay | 22:42 |
anteaya | what happens if you just go with minutes and get rid of hours | 22:42 |
anteaya | 90 minutes rather than 1 hour 30 minutes | 22:43 |
clarkb | anteaya: humans don't like reading timestamps like that | 22:43 |
anteaya | I can live with it | 22:43 |
anteaya | but other humans, okay | 22:43 |
anteaya | movie running times are all like that | 22:43 |
anteaya | 120 minutes | 22:43 |
anteaya | 200 minutes | 22:43 |
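The status-page bug described above is just an hour carry: once the remaining time crosses 60 minutes, the hours field has to come from integer division instead of staying at 0, and anything under a minute rounds to 0 min. A sketch of the intended formatting, in Python purely to show the arithmetic (the real fix lives in zuul's status.js):

    # Sketch of the hours/minutes carry the status page needs.
    def format_remaining(ms):
        total_minutes = ms // 60000
        hours, minutes = divmod(total_minutes, 60)
        if hours:
            return '%d hr %d min' % (hours, minutes)
        return '%d min' % minutes  # anything under a minute shows as 0 min

    print(format_remaining(90 * 60000))   # 1 hr 30 min
    print(format_remaining(45 * 60000))   # 45 min
    print(format_remaining(30 * 1000))    # 0 min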
anteaya | gate reset | 22:44 |
anteaya | 12 in post! | 22:44 |
anteaya | look at the test node numbers climb | 22:44 |
mgagne | feels like a sport commentator =) | 22:44 |
anteaya | I have to do something | 22:45 |
anteaya | don't know enough to write any scripts to do any helpful changes | 22:45 |
anteaya | I would have to ask questions, slows them down | 22:45 |
mgagne | =) | 22:46 |
anteaya | :D | 22:46 |
anteaya | I'll learn more when it is quieter | 22:46 |
wenlock | hey guys, grats on getting your current challenge fixed... was wondering if i could ask a few questions ... ive been working on trying to understand puppet and using wiki | 22:47 |
jeblair | clarkb: a noticable bump in the git cpu graphs | 22:49 |
jeblair | wenlock: what's your question? | 22:49 |
clarkb | jeblair: we still seem to be under control though | 22:51 |
jeblair | clarkb: yep, seems well within capability atm | 22:51 |
anteaya | clarkb: here is a bug report for you: https://bugs.launchpad.net/openstack-ci/+bug/1215659 | 22:52 |
clarkb | jeblair: I am going to enable puppet on 01-04 since all of the outstanding changes that affect them have merged | 22:52 |
uvirtbot | Launchpad bug 1215659 in openstack-ci "zuul status bars hover box "time remaining" fails after 61 minutes" [Undecided,New] | 22:52 |
clarkb | jeblair: I will hold off on git.o.o until I can do it semi safely | 22:52 |
jeblair | clarkb: ok | 22:53 |
*** mrodden has joined #openstack-infra | 22:54 | |
clarkb | in other news I think the kicking out of changes that may not merge is the greatest thing ever | 22:54 |
jeblair | This change was unable to be automatically merged with the current state of the repository and the following changes which were enqueued ahead of it: 31061, 41723, 42430, 42431, 43088, 42751, 42746, 42744, 42743, 41070, 42745, 42765, 42747, 42432, 42433, 42434, 42435, 42436, 42437, 42748, 42749, 42750, 42752, 40845, 37465, 38601. Please rebase your change and upload a new patchset. | 22:55 |
jeblair | clarkb: ^ you mean like that? :) | 22:55 |
jeblair | there's a merge conflict in there! somewhere! | 22:55 |
clarkb | jeblair: ya :) | 22:55 |
clarkb | I think the choice to sacrifice the few for the many was the correct one | 22:56 |
wenlock | i setup wiki on a private server, using the wiki.pp module it installed ok, but seems only mysql is started | 22:56 |
wenlock | is there some additional modules that control started state? | 22:57 |
clarkb | gate throughput is much higher now in the best case scenario | 22:57 |
wenlock | or should i have expected to see a running server on port 80? | 22:57 |
jeblair | clarkb: the needs of the many outweigh the needs of the few (or the one). | 22:57 |
anteaya | look at all those recent merges: http://graphite.openstack.org/graphlot/?width=586&height=308&_salt=1377178709.576&target=stats.gerrit.event.change-merged | 22:57 |
jeblair | wenlock: unfortunately, some parts of the wiki servers aren't in puppet :( | 22:57 |
jeblair | wenlock: i believe Ryan_Lane is planning on working on that when he gets a chance | 22:58 |
jeblair | wenlock: but i think at least some of the config is just on-host | 22:58 |
Ryan_Lane | very little of it is just on-host | 22:58 |
jeblair | wenlock: however, we do have some documentation about how upgrades are manually performed | 22:58 |
jeblair | wenlock: http://ci.openstack.org/wiki.html | 22:58 |
Ryan_Lane | just the mediawiki software and its config | 22:58 |
Ryan_Lane | everything else is in the module | 22:58 |
jeblair | Ryan_Lane: ah ok | 22:59 |
jeblair | wenlock: that upgrade documentation might be able to serve as install configuration too | 22:59 |
jeblair | act | 22:59 |
jeblair | wenlock: that upgrade documentation might be able to serve as install documentation too | 22:59 |
wenlock | ok, cool... thats making a little more sense now :D | 23:00 |
*** datsun180b has quit IRC | 23:00 | |
clarkb | puppet is running on 01-04 | 23:01 |
clarkb | now I will check cgit | 23:01 |
clarkb | cgit seems happy | 23:02 |
anteaya | i don't see any failures in the gate queue/pipeline yet | 23:02 |
clarkb | jeblair: http://git.openstack.org/cgit/openstack-infra/config/stats/ you write a lot of commits apparently | 23:03 |
*** rnirmal has quit IRC | 23:03 | |
mgagne | at least I'm on the list ^^' | 23:03 |
*** notmyname has quit IRC | 23:04 | |
clarkb | I wonder if that is counting patchsets | 23:04 |
*** notmyname has joined #openstack-infra | 23:04 | |
anteaya | mgagne: I'm just hoping I am in other somewhere | 23:04 |
anteaya | :D | 23:04 |
* clarkb looks in status.js to focus on something different for a bit | 23:05 | |
mgagne | anteaya: http://git.openstack.org/cgit/openstack-infra/config/stats/?period=q&ofs=25 | 23:06 |
*** pcrews has quit IRC | 23:06 | |
mgagne | clarkb: how about upgrading apache puppet module to latest version :D /jk | 23:06 |
clarkb | mgagne: you are funny | 23:06 |
*** wenlock has quit IRC | 23:07 | |
anteaya | mgagne: yay I'm on the list, thanks | 23:07 |
anteaya | clarkb: did you see this? https://bugs.launchpad.net/openstack-ci/+bug/1215659 | 23:07 |
uvirtbot | Launchpad bug 1215659 in openstack-ci "zuul status bars hover box "time remaining" fails after 61 minutes" [Undecided,New] | 23:07 |
anteaya | or did it get lost in the blur? | 23:07 |
*** pabelanger_ has quit IRC | 23:07 | |
clarkb | anteaya: I did thanks | 23:07 |
anteaya | np | 23:07 |
clarkb | it popped up in my email which is what prompted me to look a tit | 23:08 |
clarkb | that is an unfortunate typo | 23:08 |
anteaya | cool | 23:08 |
anteaya | let it pass | 23:08 |
*** pabelanger has joined #openstack-infra | 23:08 | |
anteaya | did you ever get real food, clarkb? | 23:08 |
*** notmyname has quit IRC | 23:08 | |
anteaya | or are you still running on sandwich? | 23:09 |
clarkb | anteaya: sandwiches are real food | 23:09 |
*** notmyname has joined #openstack-infra | 23:09 | |
anteaya | that they are yes, I was referring to the aromatic food that was cooking earlier | 23:09 |
clarkb | jeblair: I think I see the bug in status.js | 23:09 |
* anteaya lives on sandwiches herself | 23:09 | |
*** jhesketh has quit IRC | 23:11 | |
*** sdake_ has joined #openstack-infra | 23:12 | |
*** jhesketh has joined #openstack-infra | 23:14 | |
*** _TheDodd_ has quit IRC | 23:14 | |
pleia2 | clarkb: back, lmk if you still need tests | 23:15 |
clarkb | pleia2: I think we are good | 23:16 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Fix zuul status hours display. https://review.openstack.org/43375 | 23:16 |
clarkb | jeblair: anteaya ^ | 23:16 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Fix error with stats for de-configured resources https://review.openstack.org/43376 | 23:17 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Make jenkins username and private key path configurable https://review.openstack.org/43377 | 23:17 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Move setup scripts destination https://review.openstack.org/43033 | 23:17 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Change credentials-id parameter in config file https://review.openstack.org/43016 | 23:17 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Reduce timeout when waiting for server deletion https://review.openstack.org/43017 | 23:17 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Add option to test jenkins node before use https://review.openstack.org/43313 | 23:17 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Add JenkinsManager https://review.openstack.org/43014 | 23:17 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Add an ssh check periodic task https://review.openstack.org/43015 | 23:17 |
jeblair | clarkb: something about a lot of patches? | 23:17 |
clarkb | jeblair: ya you write them :) | 23:17 |
pleia2 | whee :) | 23:17 |
*** mriedem has joined #openstack-infra | 23:18 | |
jeblair | clarkb: so that adds the node test feature; it's completely optional, and i'm not sure i want to use it, but i figured it'd be good to get that lever in place in case we want to pull it | 23:18 |
clarkb | ++ | 23:18 |
jeblair | clarkb: i'm actually more leaning toward thinking that getting zuul to re-run jobs that come back with jenkins exceptions is the way to go, and i think we can do that without a change to the gearman plugin | 23:18 |
clarkb | ooh | 23:19 |
jeblair | clarkb: but i'll go ahead and write up the jjb change to populate the node test job so it'll be there if we want it | 23:19 |
*** mrodden1 has joined #openstack-infra | 23:19 | |
*** mrodden has quit IRC | 23:20 | |
clarkb | sounds good. I may take a break shortly to do something other than type in a terminal. But plan to do some code review after that | 23:23 |
anteaya | if we are re-running jobs that return with exceptions do we have some form of counter so it doesn't loop endlessly? | 23:23 |
clarkb | I have found that code review at night is nice because there are few distractions | 23:23 |
clarkb | anteaya: In this case it may be ok to loop endlessly as the failure is on the jenkins side | 23:23 |
anteaya | okay | 23:24 |
jeblair | anteaya, clarkb: i think i would use a counter; if jenkins goes crazy i don't want everything stuck in zuul | 23:24 |
anteaya | makes sense | 23:24 |
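jeblair's counter amounts to bounding retries per build: relaunch when the failure is on the Jenkins side, but give up after a fixed number of attempts so a sick Jenkins cannot wedge everything in zuul. A hedged Python sketch of that policy, with names invented for illustration rather than taken from zuul's actual API:

    # Sketch of bounded retry for builds that fail on the Jenkins side.
    # launch_build and JenkinsException are placeholders, not zuul's real API.
    MAX_ATTEMPTS = 3

    class JenkinsException(Exception):
        """Stand-in for a Jenkins-side error, as opposed to a test failure."""

    def run_with_retries(launch_build, max_attempts=MAX_ATTEMPTS):
        for attempt in range(1, max_attempts + 1):
            try:
                return launch_build()
            except JenkinsException:
                if attempt == max_attempts:
                    raise  # report the job LOST rather than looping forever
                # otherwise fall through and relaunch the job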
jeblair | clarkb: do you think we're in a good place to add, say, 8 more centos nodes? | 23:24 |
jeblair | oh,... | 23:25 |
jeblair | actually, we should re-evaluate now that they should be using the git protocol | 23:25 |
jeblair | they may not be as far behind now | 23:25 |
clarkb | jeblair: they are using git protocol and a spot check showed that it sped up ggp tremendously for them | 23:25 |
*** Adri2000 has quit IRC | 23:25 | |
anteaya | we have a LOST in the gate, 40833, 10: https://jenkins02.openstack.org/job/gate-grenade-devstack-vm/3669/console | 23:26 |
jeblair | anteaya: yeah, that's the situation that either the nodepool change or (hopefully, still looking into it) a zuul change would fix | 23:26 |
anteaya | okay | 23:26 |
anteaya | what more does zuul need fixed? | 23:27 |
anteaya | trying to keep up | 23:27 |
clarkb | I think the LOST jobs are the last major outstanding item | 23:27 |
jeblair | anteaya: the change we were just talking about with exceptions | 23:27 |
anteaya | yay, we finally got there | 23:27 |
anteaya | sorry, I will re-read | 23:27 |
clarkb | which means I need to get into code review mode soon | 23:27 |
anteaya | oh yeah, coming back from jenkins with an exception | 23:27 |
clarkb | if it makes everyone feel better about this week NASDAQ halted trading today due to a technical issue | 23:28 |
anteaya | you are kidding | 23:29 |
clarkb | nope. for 3 hours today they shut it down | 23:29 |
jeblair | clarkb: you know, the sun's magnetic polarity is reversing. just sayin. | 23:29 |
anteaya | can't imagine what it would be like on the NASDAQ tech team | 23:29 |
anteaya | ha ha ha | 23:29 |
anteaya | it happens every 11 years | 23:29 |
anteaya | but yeah, 11 years ago we didn't have the reliance on tech we have today | 23:30 |
anteaya | that is for sure | 23:30 |
clarkb | jeblair: I joked in a different channel that their ops team must be at puppetconf | 23:31 |
clarkb | anteaya: ^ | 23:32 |
anteaya | ha ha ha | 23:32 |
pleia2 | hah | 23:32 |
jeblair | clarkb: hrm, it looks like that error came back as a regular work_fail, just without a result | 23:32 |
jeblair | clarkb: so not quite as nice as a work_exception, but that might still be actionable | 23:32 |
clarkb | jeblair: hmm. I think jenkins is catching that and bottling it up before gearman plugin sees it | 23:33 |
clarkb | jeblair: so it becomes a failed test with no result | 23:33 |
*** Adri2000 has joined #openstack-infra | 23:33 | |
clarkb | there is just not enough data in the return from the job future | 23:33 |
jeblair | clarkb: possibly; but i'm also double checking that either gearman-plugin or java-gearman isn't turning that into work_fail | 23:34 |
*** jhesketh has quit IRC | 23:35 | |
clarkb | jeblair: does gearman plugin break the timeout plugin? there are a few jobs that seem to have run much longer than is allowed | 23:36 |
clarkb | back when git was slow | 23:36 |
*** dina_belova has joined #openstack-infra | 23:36 | |
jeblair | clarkb: yeah, i think you're right; if gearman-plugin gets an exception, it should return work_exception | 23:37 |
clarkb | jeblair: there may be info returned by the future that can be examined | 23:38 |
clarkb | jeblair: you may have to grep through the console log which seems dirty | 23:38 |
clarkb | or treat a failure with no result as a jenkins exception | 23:39 |
jeblair | it seems weird that the result would be null | 23:39 |
clarkb | ya | 23:39 |
*** jhesketh has joined #openstack-infra | 23:39 | |
jeblair | it seems accurate enough; i'm willing to do it, but it also seems tenuous | 23:39 |
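Treating "failed with no result" as a Jenkins exception can be expressed as a small classification step over whatever data the gearman job returns; a sketch, with the result field name assumed for illustration rather than taken from the gearman plugin:

    # Sketch: distinguish a real test failure from a Jenkins-side loss by
    # whether the returned build data carries a result at all.
    def classify(build_data):
        result = build_data.get('result')
        if result is None:
            return 'RETRY'      # no result at all: likely a Jenkins exception
        if result == 'SUCCESS':
            return 'SUCCESS'
        return 'FAILURE'        # a real test failure; do not retry

    assert classify({}) == 'RETRY'
    assert classify({'result': 'FAILURE'}) == 'FAILURE'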
clarkb | to slightly change the subject, I think we should release a new zuul version if last night's bug fix holds up | 23:40 |
clarkb | though that bug was only in unreleased zuul so it may not be very urgent | 23:40 |
*** dina_belova has quit IRC | 23:41 | |
jeblair | i think it's probably time for me to write a mailing list update | 23:41 |
clarkb | ++ | 23:42 |
jeblair | along the lines of 'mostly better' still working on a few things. | 23:42 |
*** shardy is now known as shardy_afk | 23:42 | |
jeblair | and i guess an announcement of git.o.o (not the way i expected it to be announced) | 23:43 |
clarkb | ya | 23:43 |
clarkb | these things happen | 23:43 |
pleia2 | jeblair: including git.o.o in the same post? (I don't mind writing a separate one, I was thinking about blogging about it too) | 23:44 |
*** sdake_ has quit IRC | 23:44 | |
jeblair | i think it actually deserves its own post, so i think i'll mention it, but i think pleia2 should also write an email about it | 23:44 |
anteaya | jeblair: I think there would be many happy people if there was a ml update | 23:44 |
jeblair | i think i should mention it as i describe what we're doing to handle the load | 23:44 |
jeblair | but i also want people to really learn about git.o.o and how cool it is | 23:44 |
clarkb | jeblair: ++ | 23:45 |
jeblair | and that should be its own topic/post | 23:45 |
jeblair | pleia2: how does that sound? | 23:45 |
anteaya | yes, I agree | 23:45 |
pleia2 | jeblair: wfm | 23:45 |
clarkb | I think if you mention it in passing to explain the mitigation of test failures that leaves the door open to give it a proper writeup | 23:45 |
pleia2 | I'll update the ci.o.o/git docs real quick first | 23:45 |
pleia2 | (I'll need clarkb to review) | 23:45 |
clarkb | oh ya I completely neglected to write docs on the haproxy stuff >_> | 23:46 |
*** fbo is now known as fbo_away | 23:46 | |
jeblair | pleia2: cool, so you'll handle the git.o.o post then, at your leisure, and i'll mention it in passing and that you'll be sending a real announcement | 23:46 |
jeblair | clarkb: i haven't written nodepool docs yet either | 23:46 |
pleia2 | clarkb: no worries, I'm on it | 23:46 |
jeblair | speaking of which... | 23:46 |
jeblair | fungi hasn't disappeared yet, has he? | 23:46 |
* clarkb waits for gerritbot to announce new change adding nodepool docs :) | 23:47 | |
jeblair | clarkb: ha | 23:47 |
clarkb | jeblair: it sounded like today was going to be busy for him | 23:47 |
clarkb | and that he would try to be on this evening | 23:47 |
jeblair | ok, but he's not on a boat yet, so he might catch this... | 23:47 |
clarkb | jeblair: correct. boat is tomorrow morning | 23:47 |
jeblair | fungi: for the 'run your own devstack-gate node' thing -- i need to delete all the node launching stuff from d-g.... | 23:47 |
jeblair | fungi: the shell scripts to actually do all the work are fairly well split out now... | 23:48 |
jeblair | fungi: so there are two approaches for migrating that | 23:48 |
clarkb | I am going to run home really quick so that I can do code review on the couch | 23:48 |
clarkb | s/code/docs/ as appropriate | 23:48 |
jeblair | fungi: 1) instruct people on how to run those scripts on a node (sort of a one-off "make this a devstack-gate node" process) | 23:49 |
openstackgerrit | A change was merged to openstack-infra/zuul: Make updateChange actually update the change https://review.openstack.org/43220 | 23:49 |
jeblair | fungi: or 2) how to set up a local nodepool (more complicated, but you can spin up replacement nodes easily) | 23:49 |
jeblair | fungi: (#2 is more or less palatable depending on whether nodepool still works with sqlite in low-volume; that's unknown at this point) | 23:50 |
*** michchap has joined #openstack-infra | 23:52 | |
*** Adri2000 has quit IRC | 23:53 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Add node-test job https://review.openstack.org/43381 | 23:53 |
*** rcleere has quit IRC | 23:55 |