*** mrodden has quit IRC | 00:00 | |
jog0 | clarkb: moving over to infra | 00:01 |
---|---|---|
jog0 | got a cool graph about the swift bug | 00:01 |
openstackgerrit | Khai Do proposed a change to openstack-infra/jenkins-job-builder: update doc and add new JJB unit tests https://review.openstack.org/56715 | 00:01 |
jog0 | http://logstash.openstack.org/#eyJzZWFyY2giOiJcIkdvdCBlcnJvciBmcm9tIFN3aWZ0OiBwdXRfb2JqZWN0XCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjEzODQ4MTkyMDI2ODd9 | 00:01 |
*** MarkAtwood has joined #openstack-infra | 00:01 | |
*** dcramer_ has quit IRC | 00:02 | |
clarkb | so it happened a bunch last week and is more sparse this week | 00:02 |
jog0 | clarkb: yup after the fix | 00:02 |
jog0 | but it is still an issue | 00:02 |
clarkb | fungi: you can start rax on jenkins01 now | 00:03 |
fungi | ahh, yup, see that now | 00:03 |
fungi | thx | 00:03 |
openstackgerrit | Khai Do proposed a change to openstack-infra/jenkins-job-builder: fix jjb job template documentation https://review.openstack.org/57062 | 00:03 |
*** rnirmal has quit IRC | 00:05 | |
*** sarob has joined #openstack-infra | 00:05 | |
*** loq_mac has quit IRC | 00:07 | |
*** mriedem has joined #openstack-infra | 00:07 | |
*** mriedem has quit IRC | 00:08 | |
*** loq_mac has joined #openstack-infra | 00:08 | |
*** herndon_ has quit IRC | 00:09 | |
*** mriedem has joined #openstack-infra | 00:09 | |
*** julim has quit IRC | 00:10 | |
mikal | Is there a known issue with the gate at the moment? | 00:11 |
mikal | 49305,17 seems to have been at the top of the gate for a long time | 00:11 |
anteaya | yeah, well in as much as jenkins has had a hard day | 00:11 |
mikal | Well, that review has had py26 running for over 12 hours | 00:12 |
*** oubiwann has quit IRC | 00:12 | |
anteaya | ah that is a long time | 00:12 |
mikal | Quite | 00:12 |
anteaya | clarkb fungi | 00:12 |
anteaya | the gate might need a shove | 00:12 |
*** jamesmcarthur has joined #openstack-infra | 00:13 | |
clarkb | ya those are just hosed | 00:14 |
clarkb | we are waiting for everything in the gate to get unhosed so that we can restart zuul | 00:14 |
clarkb | or you can push a new patchset to gerrit, that may or may not unstick it | 00:14 |
anteaya | how about reverify? | 00:14 |
mikal | clarkb: if I -2 that review, won't that cancel its zuul run? | 00:14 |
anteaya | would reverify work? | 00:15 |
*** jamesmcarthur has quit IRC | 00:15 | |
*** pmathews has quit IRC | 00:15 | |
*** matsuhashi has joined #openstack-infra | 00:15 | |
fungi | a new patchset might jostle it out, but more likely it will have to wait for the zuul restart here in a little while | 00:16 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Add query for bug 1252514 https://review.openstack.org/57070 | 00:16 |
uvirtbot | Launchpad bug 1252514 in swift "glance doesn't recover if Swift returns an error" [Undecided,New] https://launchpad.net/bugs/1252514 | 00:16 |
*** jcooley_ has joined #openstack-infra | 00:17 | |
fungi | oh, and clarkb has already stated those things | 00:17 |
clarkb | mikal: no zuul doesn't react on -2 yet | 00:17 |
fungi | the -2 simply prevents it from merging | 00:17 |
clarkb | fungi: there appear to be 3 jenkins01 hpcloud stragglers | 00:17 |
fungi | clarkb: yup, they were probably in the process of being created when the original list ran, or else they failed to delete on the first try. as soon as the rax nodes finish clearing in the next minute or two i'll do a cleanup pass | 00:18 |
clarkb | k | 00:18 |
*** danger_fo_away is now known as dangers | 00:24 | |
yjiang5 | Hi, all, noticed from http://ci.openstack.org/devstack-gate.html that devstack starts from a virtual machine. Where can I find more detailed information on how the VM is created, and is it an HVM VM? Asking this because I'm considering whether it's possible to add a PCI device to that VM to test PCI device assignment in openstack. | 00:26 |
fungi | yjiang5: https://git.openstack.org/cgit/openstack-infra/devstack-gate/tree/README.rst#n100 | 00:28 |
*** markwash has quit IRC | 00:29 | |
fungi | yjiang5: in short, those virtual machines are currently whatever rackspace and hpcloud provide to us | 00:29 |
yjiang5 | fungi: thanks, I was just focused on checking the three scripts in the repo. Sorry. | 00:29 |
yjiang5 | fungi: I will have a look at it in detail. | 00:29 |
jog0 | http://paste.openstack.org/show/53572/ | 00:31 |
clarkb | fungi: down to 5 nodes on jenkins01 | 00:33 |
*** loq_mac has quit IRC | 00:34 | |
*** jcooley_ has quit IRC | 00:35 | |
*** adalbas has quit IRC | 00:35 | |
fungi | yeah, manually deleting the last three now which escaped the first apsses | 00:35 |
fungi | passes | 00:35 |
zaro | mgagne: what do you think? https://review.openstack.org/#/c/57068 | 00:36 |
*** rnirmal has joined #openstack-infra | 00:38 | |
fungi | okay, all jenkins01 nodes are gone from the nodepool list | 00:39 |
NobodyCam | humm is this job stuck? | 00:39 |
NobodyCam | ironic/conductor/manager.py | 00:39 |
fungi | starting jenkins01 back up now | 00:39 |
NobodyCam | https://jenkins01.openstack.org/job/gate-ironic-docs/566/ | 00:39 |
NobodyCam | :) | 00:39 |
NobodyCam | ty fungi | 00:39 |
clarkb | fungi: woot | 00:39 |
fungi | let's see where this gets us | 00:39 |
mgagne | zaro: the idea is nice. Are tests bundled when shipping JJB? | 00:40 |
*** nati_uen_ has quit IRC | 00:40 | |
NobodyCam | still think this job is stuck | 00:40 |
NobodyCam | https://jenkins01.openstack.org/job/gate-ironic-docs/566/ | 00:40 |
*** ryanpetrello has joined #openstack-infra | 00:40 | |
mgagne | zaro: generating docs would require tests to be there. Is it a reasonable assumption? | 00:40 |
clarkb | NobodyCam: it is | 00:41 |
NobodyCam | :) | 00:41 |
clarkb | you can attempt pushing a new patchset which zuul should treat as an indication to kick the existing change out of the queues | 00:41 |
zaro | mgagne: tests do not need to be bundled. | 00:41 |
clarkb | or you can wait for things to settle down allowing us to restart zuul | 00:41 |
NobodyCam | I'll wait | 00:41 |
NobodyCam | almost 5 | 00:41 |
zaro | mgagne: docs are generated by jenkins which has source and tests so once doc is generated i believe it has everything. | 00:41 |
*** loq_mac has joined #openstack-infra | 00:42 | |
clarkb | fungi: nodepool appears to slowly be deleting hpcloud nodes attached to jenkins02 | 00:42 |
clarkb | so I think this was an event stream derp | 00:42 |
NobodyCam | you guys are awesome | 00:42 |
NobodyCam | just fyi | 00:43 |
clarkb | NobodyCam: our software isn't :/ it really went sideways today | 00:43 |
NobodyCam | :) | 00:44 |
*** emagana has quit IRC | 00:44 | |
fungi | clarkb: agreed, that's the best fit for what we saw | 00:44 |
fungi | clarkb: aside from the original original issue, puppet agent eating itself. we still need to discuss options for safely rolling that out | 00:45 |
*** Ryan_Lane has quit IRC | 00:45 | |
fungi | probably need to rubify the certname in the config | 00:46 |
clarkb | fungi: or have a killswitch where the config is only managed on the first run | 00:46 |
clarkb | (not sure how that would work though) | 00:46 |
*** Ryan_Lane has joined #openstack-infra | 00:47 | |
clarkb | jenkins02 has masses of slaves backed up now too | 00:47 |
fungi | but yeah, right now that's a time bomb waiting to strike any of our systems | 00:47 |
clarkb | I guess I/we should just be patient and see if it picks those off slowly | 00:47 |
fungi | it did last time around | 00:47 |
clarkb | the graphs look a lot better now than they did before (lots of hosts in deleting instead of ready) | 00:47 |
*** senk has joined #openstack-infra | 00:48 | |
clarkb | fungi: right the difference I think is that nodepool actually knows to delete them rather than treating them as ready | 00:48 |
*** gyee has quit IRC | 00:48 | |
fungi | yup | 00:48 |
clarkb | you can see that in the zuul graph as the green vs purple | 00:48 |
*** MarkAtwood has quit IRC | 00:48 | |
fungi | well, i meant it was successfully whittling away at the delete list on jenkins02 earlier when we swamped it | 00:48 |
*** yamahata_ has joined #openstack-infra | 00:48 | |
*** dcramer_ has joined #openstack-infra | 00:49 | |
clarkb | ah | 00:50 |
*** jergerber has quit IRC | 00:51 | |
*** ryanpetrello has quit IRC | 00:56 | |
openstackgerrit | Khai Do proposed a change to openstack-infra/jenkins-job-builder: use jjb tests as the examples https://review.openstack.org/57068 | 00:57 |
mgagne | zaro: example now includes "# vim: sw=4 ts=4 et" ^^' | 00:58 |
clarkb | mgagne: zaro: we should probably just remove the modelines unless people really find them useful | 00:59 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Add query for bug 1252514 https://review.openstack.org/57070 | 00:59 |
clarkb | (I don't and I use vim :) ) | 00:59 |
uvirtbot | Launchpad bug 1252514 in swift "glance doesn't recover if Swift returns an error" [Undecided,New] https://launchpad.net/bugs/1252514 | 00:59 |
*** wenlock has quit IRC | 00:59 | |
*** herndon has joined #openstack-infra | 00:59 | |
*** MarkAtwood has joined #openstack-infra | 01:00 | |
jog0 | why is a file from a week ago not on logs.o | 01:00 |
jog0 | http://logs.openstack.org/69/51169/3/gate/gate-tempest-devstack-vm-postgres-full/d8a7b6f/console.html | 01:00 |
jog0 | ohh .gz | 01:00 |
*** matsuhashi has quit IRC | 01:01 | |
*** matsuhashi has joined #openstack-infra | 01:02 | |
*** matsuhashi has quit IRC | 01:02 | |
*** matsuhashi has joined #openstack-infra | 01:02 | |
fungi | i do use vim, and i still am no fan of modelines | 01:02 |
fungi | best suggestion i saw in the modelines thread was that people who care strongly about editor magic ought to write openstack-specific editor style plugins | 01:03 |
fungi | and maintain them as projects for use by the rest of the development community | 01:03 |
*** ryanpetrello has joined #openstack-infra | 01:04 | |
fungi | i could imagine them fitting in openstack-dev/.* if we determined that we wanted to continue to put things in that container | 01:04 |
*** senk has quit IRC | 01:05 | |
*** ^demon|busy has quit IRC | 01:05 | |
fungi | or even in a contrib subdirectory of hacking, maybe | 01:08 |
fungi | though jog0 might not agree ;) | 01:09 |
jhesketh | fungi: If you have a moment to approve this change that'd be great please: https://review.openstack.org/#/c/56857/ | 01:09 |
fungi | jhesketh: i'm only one cider into the evening, so still safe i think | 01:10 |
clarkb | :) | 01:10 |
jhesketh | excellent :-) | 01:10 |
jog0 | fungi: keep your modelines out of my hacking | 01:11 |
jog0 | ;) | 01:11 |
reed | the TOPIC police! | 01:11 |
fungi | reed: wave that baton fiercely | 01:12 |
mikal | Hey... How hard is it to pin a ubuntu package on your workers? | 01:13 |
mikal | jog0 has noted that the console log failure coincides with going from libvirt0_0.9.8-2ubuntu17.13_amd64.deb to libvirt0_0.9.8-2ubuntu17.15_amd64.deb | 01:14 |
jog0 | mikal: you don't want to use that word around here | 01:14 |
jog0 | pin | 01:14 |
mikal | pins are bad? | 01:14 |
mikal | We have voodoo dolls or something? | 01:14 |
fungi | our test slaves are voodoo dolls | 01:14 |
fungi | so, pinning a deb on those assumes that it hasn't been upgraded yet, that we can safely downgrade it, and that the deb in question is still in a package repository somewhere | 01:15 |
*** oubiwann has joined #openstack-infra | 01:15 | |
fungi | better would be to pester zul about libvirt0_0.9.8-2ubuntu17.15_amd64.deb introducing some sort of issue not present in libvirt0_0.9.8-2ubuntu17.13_amd64.deb | 01:16 |
mikal | This is the only thing in the change log for .15: | 01:16 |
mikal | update fix-for-parallel-port-passthrough-for-qemu: the xml for the new | 01:16 |
mikal | testcase was too modern and caused the test to fail by including new | 01:16 |
mikal | keywords. | 01:16 |
anteaya | that is just funny | 01:18 |
anteaya | who decides the definition of "too modern" | 01:19 |
mikal | .14 has two very boring changes as well | 01:19 |
mikal | * qemu-delete-usb-devices-on-stop and | 01:19 |
mikal | qemu-build-activeusbhostdevs-on-reconnect: ensure that we can re-use | 01:19 |
mikal | a usb device after another domain using the device has shut | 01:19 |
mikal | down. (LP: #1190387) Backported from upstream git. | 01:19 |
mikal | * cherrypick fix-for-parallel-port-passthrough-for-qemu from upstream | 01:19 |
mikal | (LP: #1203620) | 01:19 |
*** dcramer_ has quit IRC | 01:19 | |
*** jhesketh has quit IRC | 01:21 | |
*** nosnos has joined #openstack-infra | 01:21 | |
*** jhesketh has joined #openstack-infra | 01:22 | |
*** sjing has joined #openstack-infra | 01:25 | |
*** nati_ueno has joined #openstack-infra | 01:26 | |
*** bingbu has joined #openstack-infra | 01:27 | |
fungi | jenkins01 vs jenkins02 devstack node numbers are approaching equilibrium, the graph looks more like what i would expect (lots of used, a bunch building, very little deleting or available) and the gate seems to be moving | 01:27 |
clarkb | fungi: woot that is my reading as well | 01:28 |
fungi | gate and check pipeline totals have started to drop as well | 01:28 |
*** sjing has quit IRC | 01:29 | |
*** sjing has joined #openstack-infra | 01:30 | |
*** rnirmal has quit IRC | 01:30 | |
*** dcramer_ has joined #openstack-infra | 01:33 | |
*** vipul has quit IRC | 01:38 | |
*** michchap has quit IRC | 01:38 | |
*** vipul has joined #openstack-infra | 01:38 | |
*** michchap has joined #openstack-infra | 01:38 | |
*** zjdriver has quit IRC | 01:45 | |
*** rnirmal has joined #openstack-infra | 01:45 | |
*** dstanek has quit IRC | 01:47 | |
*** dstanek has joined #openstack-infra | 01:47 | |
clarkb | fungi: I updated that review with some distilled questions | 01:51 |
clarkb | fungi: I think I covered the important bits, let us see if mordred has answers (or anyone else) | 01:51 |
openstackgerrit | A change was merged to openstack-infra/config: Install graphviz on jenkins slaves for docs https://review.openstack.org/56857 | 01:51 |
fungi | okay, great | 01:51 |
clarkb | tl;dr I believe the patch will work as is but only because the availability_zone kwarg passed to that __init__ is ignored | 01:53 |
clarkb | and nova will just give us nodes on whichever az | 01:53 |
clarkb | mikal: fungi: since we create new d-g images once a day we can pin it then new images will use the old version | 01:54 |
*** jcooley_ has joined #openstack-infra | 01:54 | |
clarkb | mikal: fungi: I don't think we should do that if we can figure out what the actual problem is, but it may give us temporary sanity | 01:54 |
anteaya | is there any other kind of sanity> | 01:55 |
anteaya | ? | 01:55 |
clarkb | anteaya: I like to think of my insanity as temporary :) | 01:55 |
anteaya | nice | 01:55 |
anteaya | I will support that fantasy you have | 01:55 |
* fungi has plenty of false sanity | 01:55 | |
anteaya | ha ha ha | 01:55 |
*** jcooley_ has quit IRC | 01:55 | |
clarkb | anteaya: thank you, it really helps to have everyone else playing along | 01:56 |
mikal | clarkb: I'd hold off. We can't find a change in the package which justifies it as the problem. | 01:56 |
anteaya | I try to do my bit | 01:56 |
clarkb | :) | 01:56 |
clarkb | mikal: ok | 01:56 |
clarkb | mikal: is it possible that there was an undocumented change? | 01:56 |
*** dcramer_ has quit IRC | 01:56 | |
mikal | clarkb: that is possible | 01:57 |
mikal | We're trying to push in some better debugging now to see if that points the finger at something | 01:57 |
clarkb | sounds good | 01:57 |
*** jcooley_ has joined #openstack-infra | 01:57 | |
clarkb | fungi: I have just been summoned for dinner, I will be back on IRC in a bit but jenkins seems mostly happy for the moment | 01:58 |
clarkb | fungi: I will restart zuul later tonight if the queues go away | 01:59 |
fungi | awesome. pleasant eats | 02:00 |
*** hashar has quit IRC | 02:03 | |
clarkb | fungi is it too late to look at havana jobs? | 02:09 |
anteaya | check is down to 25 | 02:09 |
*** dcramer_ has joined #openstack-infra | 02:10 | |
fungi | i can look... there are changes, yes? | 02:10 |
clarkb | yes I have one and dprince wrote one | 02:11 |
*** changbl has quit IRC | 02:12 | |
fungi | i'll find them and take a peek | 02:12 |
*** ryanpetrello has quit IRC | 02:16 | |
*** Ryan_Lane has quit IRC | 02:17 | |
*** senk has joined #openstack-infra | 02:18 | |
*** senk has quit IRC | 02:22 | |
openstackgerrit | Matt Riedemann proposed a change to openstack-infra/elastic-recheck: Add query for bug 1252170 https://review.openstack.org/57081 | 02:23 |
uvirtbot | Launchpad bug 1252170 in tempest "tempest.scenario.test_server_advanced_ops.TestServerAdvancedOps.test_resize_server_confirm[compute] failed" [Undecided,New] https://launchpad.net/bugs/1252170 | 02:23 |
*** b3nt_pin has quit IRC | 02:23 | |
*** amotoki has joined #openstack-infra | 02:27 | |
*** che-arne has quit IRC | 02:27 | |
*** sarob has quit IRC | 02:27 | |
*** ljjjustin has joined #openstack-infra | 02:33 | |
*** reed has quit IRC | 02:39 | |
*** MarkAtwood has quit IRC | 02:43 | |
*** yamahata_ has quit IRC | 02:44 | |
*** senk has joined #openstack-infra | 02:46 | |
*** nati_ueno has quit IRC | 02:47 | |
fungi | clarkb: 56720 xml comparison job says two periodic grizzly jobs went missing. those got renamed, right | 02:50 |
fungi | ? | 02:50 |
*** changbl has joined #openstack-infra | 02:51 | |
*** jcooley_ has quit IRC | 02:52 | |
*** jcooley_ has joined #openstack-infra | 02:52 | |
clarkb | yup | 02:54 |
clarkb | to be consistent | 02:54 |
*** jcooley_ has quit IRC | 02:54 | |
*** jcooley_ has joined #openstack-infra | 02:56 | |
fungi | k, lgtm then. okay to merge? | 02:58 |
*** wenlock has joined #openstack-infra | 02:59 | |
*** masayukig has joined #openstack-infra | 03:00 | |
fungi | clarkb: ^ | 03:00 |
clarkb | fungi: I think so | 03:00 |
clarkb | feel like the gate is going nowhere fast (too many resets) | 03:00 |
*** xeyed4good has quit IRC | 03:01 | |
*** yaguang has joined #openstack-infra | 03:03 | |
*** sjing has quit IRC | 03:05 | |
fungi | clarkb: 57066 can't pass grenade, do you agree it's going to require forcing in? | 03:06 |
*** sjing has joined #openstack-infra | 03:06 | |
clarkb | is that dprinces? | 03:07 |
fungi | yah | 03:08 |
clarkb | fungi: I don't think that one should be forced in | 03:09 |
clarkb | all of those d-g changes that sdague wrote were part of the retooling. I think that failure may be one of those 10%ers | 03:09 |
fungi | i suspect it may instabreak the gate until havana upgrades are pounded back into shape | 03:10 |
*** dcramer_ has quit IRC | 03:10 | |
clarkb | that may be the case as well | 03:10 |
fungi | looking at grenade logs on my current device is not a quick task | 03:11 |
clarkb | fungi: several tempest failures at the end of the run | 03:12 |
clarkb | which means the first bits were ok | 03:12 |
clarkb | I think this may be "flaky" tests | 03:12 |
fungi | grrr | 03:12 |
*** salv-orlando has quit IRC | 03:13 | |
fungi | heatclient maybe | 03:13 |
*** pcrews has quit IRC | 03:13 | |
clarkb | FAIL: tempest.api.compute.v3.servers.test_server_actions.ServerActionsV3TestXML.test_resize_server_confirm[gate,smoke] things like that | 03:14 |
clarkb | which may or may not be related to the change | 03:14 |
clarkb | fungi: I think we should get sdague to look at it since he orchestrated the whole thing | 03:14 |
clarkb | I am not very hopeful that the gate queue will be empty before I go to bed | 03:15 |
*** guohliu has joined #openstack-infra | 03:16 | |
*** wenlock has quit IRC | 03:17 | |
openstackgerrit | A change was merged to openstack-infra/config: Add stable/havana jobs. https://review.openstack.org/56720 | 03:21 |
*** Hunner_irssi has quit IRC | 03:27 | |
*** Hunner_irssi has joined #openstack-infra | 03:27 | |
*** Hunner_irssi has joined #openstack-infra | 03:27 | |
*** herndon has quit IRC | 03:27 | |
*** matsuhashi has quit IRC | 03:29 | |
fungi | well, we can restart zuul prior to that if we want and it should mostly resume where it left off, yeah? | 03:29 |
*** matsuhashi has joined #openstack-infra | 03:30 | |
fungi | as for sdague, i think he didn't expect to be around again until next week, right? | 03:30 |
*** changbl has quit IRC | 03:31 | |
clarkb | I am not sure when sdague planned on being back | 03:31 |
clarkb | fungi: I don't think we can start a graceful restart because those jobs that got "stuck" will prevent zuul from stopping | 03:32 |
fungi | oh, very good point. we'd have to wait until those are all that's left during a graceful quiescence and then dump the state and feed that list back into rechecks/reverifies | 03:34 |
fungi | nontrivial | 03:34 |
*** matsuhashi has quit IRC | 03:35 | |
clarkb | ya :/ | 03:35 |
*** krast has joined #openstack-infra | 03:40 | |
*** krast has quit IRC | 03:41 | |
*** nati_ueno has joined #openstack-infra | 03:41 | |
openstackgerrit | Khai Do proposed a change to openstack-infra/jenkins-job-builder: use jjb tests as the examples https://review.openstack.org/57068 | 03:42 |
fungi | interesting oscillation emerging in the nodepool graph | 03:43 |
fungi | dprince is rechecking his change, but not hanging out in irc. poo | 03:47 |
*** dkliban has joined #openstack-infra | 03:47 | |
*** rnirmal has quit IRC | 03:50 | |
clarkb | ya we basically use all the nodes then reset | 03:52 |
fungi | lather, rinse, reset | 03:53 |
*** melwitt has quit IRC | 03:58 | |
mikal | Is there some way I can get access to devstack-precise-check-rax-ord-676412 ? | 04:00 |
clarkb | did the test finish already | 04:01 |
mikal | Yes | 04:01 |
clarkb | if so no | 04:01 |
mikal | Bugger | 04:01 |
fungi | mikal: unlikely. by the time tests finished running it probably survived no more than a few minutes | 04:01 |
mikal | Because we don't have a very big window between tempest failing and the test ending | 04:02 |
mikal | Given tempest is the long thing | 04:02 |
fungi | we run very tight on quotas there | 04:02 |
mikal | Is there a simple way to tell in log stash which cluster a test ran in? | 04:02 |
clarkb | mikal no unfortunately, you can look at the first few lines of the console log but that is the easiest way | 04:03 |
mikal | So I have http://logstash.openstack.org/#eyJzZWFyY2giOiJmaWxlbmFtZTpjb25zb2xlLmh0bWwgQU5EIG1lc3NhZ2U6XCJhc3NlcnRpb25lcnJvcjogY29uc29sZSBvdXRwdXQgd2FzIGVtcHR5XCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjEzODQ2NDEwNzIxODl9 | 04:03 |
mikal | It would be interesting to turn that into a list of clusters | 04:03 |
clarkb | you can query the string that says running on node foo | 04:03 |
mikal | Perhaps just one cluster hates us | 04:03 |
mikal | clarkb: I don't think I can because I am already matching on a line from the log? | 04:03 |
mikal | Unless I can assert I want files with two different lines in them? | 04:03 |
clarkb | ya you can use OR and () to group | 04:04 |
clarkb | so blah and blah (message:foo OR message:bar) | 04:04 |
*** Ryan_Lane has joined #openstack-infra | 04:04 | |
mikal | I would want (message:foo and message:bar) | 04:05 |
mikal | Would that work? | 04:05 |
mikal | There's no machine name attribute on the log files? | 04:05 |
clarkb | no because jenkins doesnt tell us that info | 04:05 |
mikal | Ok | 04:05 |
fungi | EJENKINS | 04:05 |
clarkb | hmm yeah you want and. I think you need two queries | 04:05 |
* mikal feels a script coming on | 04:05 | |
*** nati_ueno has quit IRC | 04:06 | |
*** nati_ueno has joined #openstack-infra | 04:06 | |
mikal | Yeah, I think I can do something horrible with the RSS feed | 04:06 |
fungi | mikal: they make a pill for that | 04:06 |
mikal | Please hold while I flail around | 04:06 |
mikal | Is there some way to get more results from the rss feed version of the results? | 04:08 |
clarkb | I dont know | 04:10 |
mikal | Ok | 04:11 |
mikal | Well, let's see if these 20 results show something interesting first | 04:11 |
*** wenlock has joined #openstack-infra | 04:14 | |
mikal | So, there are failures on non-rax nodes at least | 04:14 |
*** ArxCruz has quit IRC | 04:22 | |
*** ArxCruz has joined #openstack-infra | 04:23 | |
*** DennyZhang has joined #openstack-infra | 04:24 | |
*** jcooley_ has quit IRC | 04:29 | |
openstackgerrit | Edward Raigosa proposed a change to openstack-infra/config: Make pip install from upstream better https://review.openstack.org/51425 | 04:30 |
* fungi is nodding off... until we meet on the morrow | 04:31 | |
*** markwash has joined #openstack-infra | 04:33 | |
clarkb | gnight | 04:33 |
clarkb | I am going to keep one eye on zuul just in case it starts moving | 04:33 |
clarkb | fungi: if you get up early and it is quiet I say go for the restart (will note here if I have restarted it) | 04:34 |
openstackgerrit | Kei YAMAZAKI proposed a change to openstack-infra/jenkins-job-builder: Added support for Emotional Jenkins https://review.openstack.org/56779 | 04:35 |
*** UtahDave has joined #openstack-infra | 04:35 | |
*** mriedem has quit IRC | 04:38 | |
*** senk has quit IRC | 04:42 | |
*** matsuhashi has joined #openstack-infra | 04:46 | |
*** balar has quit IRC | 04:55 | |
*** pcrews has joined #openstack-infra | 04:58 | |
*** SergeyLukjanov has joined #openstack-infra | 05:00 | |
*** nati_uen_ has joined #openstack-infra | 05:00 | |
*** nati_ueno has quit IRC | 05:03 | |
*** nati_uen_ is now known as nati_ueno | 05:04 | |
*** rakhmerov has joined #openstack-infra | 05:08 | |
*** UtahDave has quit IRC | 05:12 | |
*** jcooley_ has joined #openstack-infra | 05:13 | |
*** sarob has joined #openstack-infra | 05:15 | |
*** boris-42 has joined #openstack-infra | 05:18 | |
*** nosnos has quit IRC | 05:23 | |
*** nosnos has joined #openstack-infra | 05:23 | |
*** senk has joined #openstack-infra | 05:23 | |
*** jcooley_ has quit IRC | 05:27 | |
*** jcooley_ has joined #openstack-infra | 05:28 | |
*** jcooley_ has quit IRC | 05:32 | |
*** sdake_ has joined #openstack-infra | 05:35 | |
*** changbl has joined #openstack-infra | 05:38 | |
*** dangers is now known as danger_fo_away | 05:40 | |
*** senk has quit IRC | 05:41 | |
*** DennyZhang has quit IRC | 05:43 | |
*** sdake_ has quit IRC | 05:43 | |
*** sdake_ has joined #openstack-infra | 05:43 | |
*** rcleere has quit IRC | 05:51 | |
*** loq_mac has joined #openstack-infra | 05:51 | |
*** jcooley_ has joined #openstack-infra | 05:54 | |
*** masayukig has quit IRC | 05:55 | |
*** guohliu has quit IRC | 05:57 | |
*** guohliu has joined #openstack-infra | 06:00 | |
*** xeyed4good has joined #openstack-infra | 06:01 | |
*** SergeyLukjanov has quit IRC | 06:02 | |
*** michchap has quit IRC | 06:02 | |
*** oubiwann has quit IRC | 06:02 | |
*** michchap has joined #openstack-infra | 06:04 | |
*** pcrews has quit IRC | 06:05 | |
openstackgerrit | Russell Bryant proposed a change to openstack-infra/config: Add gate-solum-devstack job https://review.openstack.org/57098 | 06:05 |
*** xeyed4good has quit IRC | 06:05 | |
*** sarob has quit IRC | 06:07 | |
*** dstanek has quit IRC | 06:21 | |
*** wenlock has quit IRC | 06:25 | |
openstackgerrit | Fei Long Wang proposed a change to openstack/requirements: Get better format for long lines with PrettyTable https://review.openstack.org/57104 | 06:39 |
*** DennyZhang has joined #openstack-infra | 06:39 | |
*** matsuhashi has quit IRC | 06:41 | |
*** matsuhashi has joined #openstack-infra | 06:42 | |
*** DennyZhang has quit IRC | 06:44 | |
*** matsuhashi has quit IRC | 06:46 | |
*** matsuhashi has joined #openstack-infra | 06:48 | |
*** fifieldt has quit IRC | 06:50 | |
*** dizquierdo has joined #openstack-infra | 06:51 | |
*** jhesketh has quit IRC | 06:55 | |
*** denis_makogon has joined #openstack-infra | 07:02 | |
*** boris-42 has quit IRC | 07:02 | |
*** michchap has quit IRC | 07:03 | |
*** michchap has joined #openstack-infra | 07:04 | |
*** nosnos_ has joined #openstack-infra | 07:05 | |
*** nosnos has quit IRC | 07:05 | |
*** sarob has joined #openstack-infra | 07:08 | |
*** matsuhashi has quit IRC | 07:09 | |
*** matsuhashi has joined #openstack-infra | 07:10 | |
*** yolanda has joined #openstack-infra | 07:12 | |
*** matsuhas_ has joined #openstack-infra | 07:13 | |
*** sarob has quit IRC | 07:13 | |
*** matsuhashi has quit IRC | 07:14 | |
*** arata has joined #openstack-infra | 07:17 | |
*** davidhadas has quit IRC | 07:23 | |
*** SergeyLukjanov has joined #openstack-infra | 07:24 | |
*** SergeyLukjanov has quit IRC | 07:30 | |
*** dizquierdo has quit IRC | 07:36 | |
*** sjing has quit IRC | 07:36 | |
*** DinaBelova has joined #openstack-infra | 07:37 | |
*** SergeyLukjanov has joined #openstack-infra | 07:38 | |
*** sjing has joined #openstack-infra | 07:38 | |
*** nsaje has joined #openstack-infra | 07:38 | |
*** DennyZhang has joined #openstack-infra | 07:40 | |
*** DennyZhang has quit IRC | 07:44 | |
*** fbo is now known as fbo_away | 07:57 | |
*** nsaje has quit IRC | 08:03 | |
*** nsaje has joined #openstack-infra | 08:03 | |
*** jcoufal has joined #openstack-infra | 08:04 | |
*** marun has quit IRC | 08:06 | |
*** dizquierdo has joined #openstack-infra | 08:06 | |
*** nsaje has quit IRC | 08:08 | |
*** sarob has joined #openstack-infra | 08:09 | |
*** osanchez has joined #openstack-infra | 08:11 | |
*** sarob has quit IRC | 08:13 | |
*** nsaje has joined #openstack-infra | 08:14 | |
*** sarob has joined #openstack-infra | 08:15 | |
*** mestery_ has joined #openstack-infra | 08:15 | |
*** jcooley_ has quit IRC | 08:18 | |
*** mestery has quit IRC | 08:19 | |
*** sarob has quit IRC | 08:19 | |
*** katyafervent has quit IRC | 08:20 | |
ttx | markwash, clarkb, fungi: I thought the whole plan around client releases is that you have a single release channel, which means no stable branches and no backward-incompatible changes. When mordred pushed it I remember that we said the model would break in several places if we suddenly introduced branches in it | 08:23 |
ttx | markwash, clarkb, fungi: that would make a good topic for release meeting today if mordred can make it | 08:23 |
*** marun has joined #openstack-infra | 08:24 | |
*** masayukig has joined #openstack-infra | 08:25 | |
*** denis_makogon has quit IRC | 08:28 | |
*** jcooley_ has joined #openstack-infra | 08:30 | |
*** davidhadas has joined #openstack-infra | 08:32 | |
*** afazekas_ has joined #openstack-infra | 08:34 | |
*** hashar has joined #openstack-infra | 08:34 | |
*** flaper87|afk is now known as flaper87 | 08:37 | |
*** arata has left #openstack-infra | 08:39 | |
*** boris-42 has joined #openstack-infra | 08:41 | |
*** jhesketh_ has joined #openstack-infra | 08:41 | |
*** jhesketh__ has joined #openstack-infra | 08:41 | |
*** rakhmerov has quit IRC | 08:42 | |
*** boris-42_ has joined #openstack-infra | 08:45 | |
*** boris-42 has quit IRC | 08:45 | |
*** jcooley_ has quit IRC | 08:49 | |
openstackgerrit | Sascha Peilicke proposed a change to openstack-dev/pbr: Support building wheels (PEP-427) https://review.openstack.org/57117 | 08:52 |
*** dachary has quit IRC | 08:53 | |
*** ilyashakhat has quit IRC | 08:53 | |
*** DinaBelova has quit IRC | 08:54 | |
*** ilyashakhat has joined #openstack-infra | 08:54 | |
*** DinaBelova has joined #openstack-infra | 08:54 | |
*** dachary has joined #openstack-infra | 08:55 | |
*** yassine has joined #openstack-infra | 08:56 | |
*** mkerrin has quit IRC | 08:56 | |
*** nati_ueno has quit IRC | 08:59 | |
*** davidhadas_ has joined #openstack-infra | 09:00 | |
*** yamahata_ has joined #openstack-infra | 09:01 | |
*** jhesketh__ has quit IRC | 09:02 | |
*** jhesketh_ has quit IRC | 09:02 | |
*** davidhadas has quit IRC | 09:03 | |
*** salv-orlando has joined #openstack-infra | 09:05 | |
*** osanchez has quit IRC | 09:08 | |
*** sarob has joined #openstack-infra | 09:08 | |
*** jpich has joined #openstack-infra | 09:09 | |
*** nsaje has quit IRC | 09:11 | |
*** derekh has joined #openstack-infra | 09:11 | |
*** nsaje has joined #openstack-infra | 09:12 | |
*** sarob has quit IRC | 09:12 | |
*** jhesketh__ has joined #openstack-infra | 09:14 | |
*** jhesketh_ has joined #openstack-infra | 09:15 | |
*** nsaje has quit IRC | 09:15 | |
*** nsaje has joined #openstack-infra | 09:16 | |
*** jcooley_ has joined #openstack-infra | 09:20 | |
*** locke105 has quit IRC | 09:21 | |
*** Ryan_Lane has quit IRC | 09:21 | |
*** fbo_away is now known as fbo | 09:23 | |
*** michchap has quit IRC | 09:24 | |
*** matsuhas_ has quit IRC | 09:25 | |
openstackgerrit | Peter Liljenberg proposed a change to openstack-infra/jenkins-job-builder: Added support for Jenkins plugin Blame upstream committers https://review.openstack.org/54085 | 09:25 |
*** matsuhashi has joined #openstack-infra | 09:25 | |
*** jcooley_ has quit IRC | 09:25 | |
*** SergeyLukjanov has quit IRC | 09:26 | |
*** michchap has joined #openstack-infra | 09:26 | |
*** bingbu has quit IRC | 09:29 | |
*** rakhmerov has joined #openstack-infra | 09:29 | |
*** yamahata_ has quit IRC | 09:30 | |
*** matsuhashi has quit IRC | 09:30 | |
*** matsuhashi has joined #openstack-infra | 09:30 | |
*** nosnos_ has quit IRC | 09:33 | |
*** nosnos has joined #openstack-infra | 09:34 | |
*** BobBallAway is now known as BobBall | 09:44 | |
*** alexpilotti has joined #openstack-infra | 09:51 | |
*** odyssey4me has joined #openstack-infra | 10:02 | |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Add story dialog improvement https://review.openstack.org/57180 | 10:03 |
*** guohliu has quit IRC | 10:03 | |
*** sjing has quit IRC | 10:04 | |
*** nsaje has quit IRC | 10:04 | |
*** davidhadas has joined #openstack-infra | 10:05 | |
*** ilyashakhat has quit IRC | 10:06 | |
*** ilyashakhat has joined #openstack-infra | 10:07 | |
*** arata has joined #openstack-infra | 10:07 | |
*** davidhadas_ has quit IRC | 10:08 | |
*** sarob has joined #openstack-infra | 10:08 | |
*** nati_ueno has joined #openstack-infra | 10:09 | |
*** nati_ueno has quit IRC | 10:11 | |
*** sgran has joined #openstack-infra | 10:11 | |
sgran | morning | 10:11 |
sgran | is something broken with check-tempest-devstack-vm-postgres-full ? | 10:12 |
sgran | apologies if others have brought this up this morning | 10:12 |
*** sarob has quit IRC | 10:13 | |
*** SergeyLukjanov has joined #openstack-infra | 10:13 | |
*** SergeyLukjanov has quit IRC | 10:15 | |
*** SergeyLukjanov has joined #openstack-infra | 10:16 | |
ogelbukh | I recall some bot in infra which reported new bugs created in launchpad, is it for real or just my imagination? | 10:19 |
*** jcooley_ has joined #openstack-infra | 10:22 | |
*** adalbas has joined #openstack-infra | 10:23 | |
*** nsaje has joined #openstack-infra | 10:23 | |
openstackgerrit | Kei YAMAZAKI proposed a change to openstack-infra/jenkins-job-builder: Added support for Emotional Jenkins https://review.openstack.org/56779 | 10:26 |
*** jcooley_ has quit IRC | 10:28 | |
*** lcestari has joined #openstack-infra | 10:32 | |
*** nsaje has quit IRC | 10:36 | |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Added task ordering https://review.openstack.org/56026 | 10:36 |
*** nsaje has joined #openstack-infra | 10:37 | |
*** nsaje has quit IRC | 10:37 | |
*** nsaje has joined #openstack-infra | 10:38 | |
*** yamahata_ has joined #openstack-infra | 10:38 | |
*** lcestari has quit IRC | 10:39 | |
*** nsaje has quit IRC | 10:42 | |
*** nsaje has joined #openstack-infra | 10:46 | |
*** lcestari has joined #openstack-infra | 10:46 | |
*** marun has joined #openstack-infra | 10:49 | |
*** hashar has quit IRC | 10:51 | |
*** davidhadas has quit IRC | 10:52 | |
*** davidhadas has joined #openstack-infra | 10:53 | |
*** vladan_ has joined #openstack-infra | 10:56 | |
*** boris-42_ is now known as boris-42 | 10:56 | |
*** matsuhashi has quit IRC | 10:58 | |
*** matsuhashi has joined #openstack-infra | 10:59 | |
*** vladan has quit IRC | 10:59 | |
*** vladan_ is now known as vladan | 10:59 | |
*** matsuhashi has quit IRC | 11:04 | |
*** rfolco has joined #openstack-infra | 11:08 | |
*** sarob has joined #openstack-infra | 11:08 | |
*** dizquierdo has quit IRC | 11:12 | |
*** sarob has quit IRC | 11:12 | |
*** plomakin has quit IRC | 11:19 | |
*** yaguang has quit IRC | 11:20 | |
*** CaptTofu has quit IRC | 11:21 | |
*** CaptTofu has joined #openstack-infra | 11:22 | |
*** jcooley_ has joined #openstack-infra | 11:25 | |
*** CaptTofu has quit IRC | 11:27 | |
*** SergeyLukjanov has quit IRC | 11:30 | |
*** marun has joined #openstack-infra | 11:31 | |
*** SergeyLukjanov has joined #openstack-infra | 11:32 | |
*** pcm_ has joined #openstack-infra | 11:34 | |
*** pcm_ has quit IRC | 11:34 | |
*** pcm_ has joined #openstack-infra | 11:35 | |
*** mugsie has quit IRC | 11:40 | |
*** ogelbukh1 has quit IRC | 11:41 | |
*** plomakin has joined #openstack-infra | 11:46 | |
*** ilyashakhat has quit IRC | 11:47 | |
*** ilyashakhat has joined #openstack-infra | 11:48 | |
*** ben_duyujie has joined #openstack-infra | 11:48 | |
*** SergeyLukjanov has quit IRC | 11:51 | |
*** arata has left #openstack-infra | 11:57 | |
*** jcooley_ has quit IRC | 11:59 | |
*** davidhadas_ has joined #openstack-infra | 12:02 | |
*** amotoki has quit IRC | 12:03 | |
*** davidhadas has quit IRC | 12:05 | |
*** ruhe has joined #openstack-infra | 12:08 | |
*** sarob has joined #openstack-infra | 12:08 | |
*** davidhadas has joined #openstack-infra | 12:09 | |
*** mugsie has joined #openstack-infra | 12:09 | |
*** davidhadas_ has quit IRC | 12:12 | |
*** sarob has quit IRC | 12:14 | |
*** osanchez has joined #openstack-infra | 12:20 | |
*** markmc has joined #openstack-infra | 12:23 | |
*** nati_ueno has joined #openstack-infra | 12:24 | |
*** odyssey4me has quit IRC | 12:26 | |
*** ruhe has quit IRC | 12:32 | |
*** salv-orlando has quit IRC | 12:34 | |
openstackgerrit | Zane Bitter proposed a change to openstack-infra/reviewstats: Add randall-burt to heat-core https://review.openstack.org/57211 | 12:34 |
*** davidhadas_ has joined #openstack-infra | 12:35 | |
*** davidhadas has quit IRC | 12:38 | |
*** salv-orlando has joined #openstack-infra | 12:43 | |
soren | I'm looking at why this failed: https://review.openstack.org/#/c/55519/ Does the name in zuul's layout.yaml's jobs section need to start with ^ and end with $ for it to be recognised as a regex or did I just screw up the regex? | 12:44 |
*** jhesketh_ has quit IRC | 12:45 | |
*** masayukig has quit IRC | 12:48 | |
soren | Yup, it seems it needs to start with a ^. | 12:48 |
* soren fixes | 12:48 | |
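Editor's sketch of soren's finding above: it assumes, as the exchange suggests, that a name in the jobs section of layout.yaml is treated as a regular expression only when it starts with ^, and as a literal job name otherwise. The function name and the sample job names are hypothetical, and this is not Zuul's actual code.

```python
import re


def job_settings_apply(configured_name, job_name):
    """Return True if a layout.yaml 'jobs' entry applies to job_name.

    Assumption (from the chat, not from Zuul's source): an entry whose
    name starts with '^' is a regex; any other entry is a literal name.
    """
    if configured_name.startswith("^"):
        # Regex entry: match against the full job name.
        return re.match(configured_name, job_name) is not None
    # Literal entry: only an exact name match counts.
    return configured_name == job_name
```

Under that assumption, an entry such as gate-basicdb-.* without the leading ^ would be compared literally and would never match a real job name.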
*** markmc has quit IRC | 12:49 | |
openstackgerrit | Soren Hansen proposed a change to openstack-infra/config: Add BasicDB to stackforge https://review.openstack.org/55519 | 12:50 |
*** jcooley_ has joined #openstack-infra | 12:55 | |
*** davidhadas has joined #openstack-infra | 12:57 | |
*** davidhadas_ has quit IRC | 13:00 | |
*** dkranz has joined #openstack-infra | 13:00 | |
*** jcooley_ has quit IRC | 13:00 | |
*** dizquierdo has joined #openstack-infra | 13:01 | |
*** afazekas_ has quit IRC | 13:02 | |
*** sarob has joined #openstack-infra | 13:08 | |
*** sarob has quit IRC | 13:13 | |
*** afazekas_ has joined #openstack-infra | 13:14 | |
*** salv-orlando has quit IRC | 13:17 | |
*** xeyed4good has joined #openstack-infra | 13:17 | |
*** dprince has joined #openstack-infra | 13:18 | |
*** marun has quit IRC | 13:20 | |
*** thomasem has joined #openstack-infra | 13:34 | |
*** nosnos has quit IRC | 13:35 | |
*** matel has joined #openstack-infra | 13:36 | |
yassine | Hi all, | 13:42 |
*** nsaje has quit IRC | 13:42 | |
yassine | could someone please validate https://review.openstack.org/#/c/56927 | 13:42 |
*** nsaje has joined #openstack-infra | 13:43 | |
matel | Do we have any stats about the number of the nova check jobs? | 13:43 |
anteaya | the gate appears to be starved for nodes | 13:44 |
anteaya | check jobs are running though | 13:45 |
anteaya | matel do you mean the number of nova check jobs running right now? | 13:45 |
*** weshay has joined #openstack-infra | 13:45 | |
anteaya | or waht time frame | 13:45 |
anteaya | hi yassine when a core wanders by, they can have a peek | 13:46 |
anteaya | *what time frame | 13:46 |
*** sandywalsh_ has joined #openstack-infra | 13:46 | |
*** nsaje has quit IRC | 13:47 | |
yassine | anteaya, thank you Anita ;) | 13:47 |
*** yamahata_ has quit IRC | 13:48 | |
*** dstanek has joined #openstack-infra | 13:48 | |
anteaya | np :D | 13:48 |
matel | anteaya: I would like to know how many jobs do I need to run if I do the third party testing on nova. | 13:48 |
*** yamahata_ has joined #openstack-infra | 13:49 | |
matel | to know how much compute capacity is needed. | 13:49 |
*** marun has joined #openstack-infra | 13:49 | |
*** wenlock has joined #openstack-infra | 13:50 | |
*** wenlock has quit IRC | 13:50 | |
anteaya | matel ummm, are you asking what compute capacity is needed by infrastructure or on your own local system? | 13:50 |
matel | yep. | 13:51 |
anteaya | I haven't caught up to the context of your question | 13:51 |
anteaya | compute capacity for infrastructure would be a question for fungi when he awakes | 13:51 |
matel | I am looking at: http://ci.openstack.org/third_party.html | 13:51 |
anteaya | I'd give him about another 40 minutes to arrive | 13:51 |
anteaya | ah okay | 13:52 |
anteaya | know that right now our infrastructure is in a very unhappy place | 13:52 |
matel | So if I were doing the 3rd party testing in my own datacenter, roughly how many machines would I need. | 13:52 |
anteaya | and fungi will probably be attending to that for at least the first 30 minutes upon his arrival | 13:52 |
anteaya | matel: okay, catching up with you now | 13:52 |
anteaya | so you are just testing nova? | 13:53 |
matel | Yes, I am only interested in nova atm. | 13:53 |
anteaya | okay ArxCruz might have some suggestions, as he has set up his own infra that listens to our gerrit stream | 13:53 |
dprince | sdague: ping | 13:54 |
anteaya | so that is an option, but he isn't online right now | 13:54 |
matel | Do you have any stats on the check jobs? How many jobs did you run on each day, or something like this? | 13:54 |
anteaya | morning dprince | 13:54 |
dprince | anteaya: morning! | 13:54 |
anteaya | matel: have you seen our status page? http://status.openstack.org/zuul/ | 13:54 |
anteaya | dprince: sdague is day 2 on $new_job and is in San Jose this week | 13:54 |
*** arata has joined #openstack-infra | 13:55 | |
*** SergeyLukjanov has joined #openstack-infra | 13:55 | |
anteaya | dprince: didn't see him online at all yesterday | 13:55 |
*** SergeyLukjanov is now known as _SergeyLukjanov | 13:55 | |
anteaya | not sure what to expect today | 13:55 |
dprince | anteaya: maybe he's in training on the new job | 13:55 |
anteaya | dprince: that is my guess as well | 13:55 |
*** _SergeyLukjanov is now known as SergeyLukjanov | 13:56 | |
*** SergeyLukjanov is now known as _SergeyLukjanov | 13:56 | |
anteaya | dprince: hang out with us today, jenkins is sick and at least having you know that helps to curb the recheck | 13:56 |
*** jcooley_ has joined #openstack-infra | 13:57 | |
*** wenlock has joined #openstack-infra | 13:57 | |
anteaya | and the gate has some nodes now | 13:57 |
matel | anteaya: I am looking for a number of checks per day for project nova, and I cannot find it at zuul's page, could you help me? | 13:57 |
anteaya | matel: I am giving you what I have | 13:57 |
* dprince hangs out | 13:57 | |
anteaya | it is possible we don't keep that stat | 13:57 |
anteaya | dprince: thanks | 13:57 |
*** hashar has joined #openstack-infra | 13:58 | |
matel | dprince: do you have any stats regarding to the number of nova check jobs per day? | 13:58 |
*** sandywalsh__ has joined #openstack-infra | 13:58 | |
anteaya | also there may be other -infra folks who happen by who might be better help than I | 13:58 |
anteaya | so hang around | 13:58 |
matel | yep, trying dprince atm. | 13:58 |
*** davidhadas_ has joined #openstack-infra | 13:58 | |
matel | maybe ss has some stats. | 13:58 |
anteaya | you don't have to ask people, they will read the backscroll | 13:59 |
anteaya | and respond to you if they have an answer | 13:59 |
anteaya | just hang out and welcome to the channel | 13:59 |
dprince | matel: between 200-400 a day I think is reasonable for nova | 13:59 |
matel | dprince: thanks. | 13:59 |
dprince | matel: actually I think that is total (not just nova) | 13:59 |
anteaya | while you are waiting you are welcome to read some logs: http://eavesdrop.openstack.org/irclogs/%23openstack-infra/ | 13:59 |
*** herndon_ has joined #openstack-infra | 14:00 | |
dprince | matel: but nova runs the bulk of them for sure | 14:00 |
*** sandywalsh_ has quit IRC | 14:00 | |
*** _SergeyLukjanov has quit IRC | 14:00 | |
*** sandywalsh__ has quit IRC | 14:01 | |
*** davidhadas has quit IRC | 14:01 | |
*** SergeyLukjanov has joined #openstack-infra | 14:01 | |
*** johnthetubaguy has joined #openstack-infra | 14:01 | |
fungi | what was the capacity question? | 14:01 |
*** dkranz has quit IRC | 14:02 | |
*** afazekas_ has quit IRC | 14:02 | |
fungi | oh, asking "why does the gate go so slowly?" | 14:02 |
fungi | it's that thread posted to the -dev ml last week (i think by jog0?) about "gate math" wherein it was explained that the current deciding factor in gating throughput is the percentage of nondeterministic failures in integration tests | 14:03 |
*** yaguang has joined #openstack-infra | 14:04 | |
anteaya | morning fungi | 14:04 |
*** arata has left #openstack-infra | 14:04 | |
anteaya | matel needs to do third party testing on nova and wants to know how large his/her data centre should be | 14:04 |
*** ruhe has joined #openstack-infra | 14:04 | |
fungi | since every change which encounters a test failure restarts tests for all the changes behind it. when a change averages half a dozen devstack-gate nodes and the queue is 75 changes deep, a reset near the head of the queue throws away 450 (75*6) virtual machines. our combined quota across all providers is right around 400 at the moment | 14:05 |
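Editor's sketch of the gate arithmetic fungi describes above, using only the figures quoted in the chat (75 queued changes, about 6 devstack-gate nodes per change, a combined quota of roughly 400). The function name is hypothetical, and the model simplifies by assuming the failing change sits at the very head of the queue.

```python
def nodes_discarded_by_reset(queue_depth, nodes_per_change):
    """Nodes whose test runs are thrown away when the head of the queue fails.

    Simplification: every change in the queue restarts its tests.
    """
    return queue_depth * nodes_per_change


QUEUE_DEPTH = 75        # changes in the gate queue (from the chat)
NODES_PER_CHANGE = 6    # "half a dozen" devstack-gate nodes per change
COMBINED_QUOTA = 400    # approximate quota across all providers

discarded = nodes_discarded_by_reset(QUEUE_DEPTH, NODES_PER_CHANGE)
print(discarded)                   # 450
print(discarded > COMBINED_QUOTA)  # True: one reset costs more than the whole quota
```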
fungi | ahh, so for third-party testing, chances are it's one job per change being checked | 14:06 |
anteaya | fungi: yes, so the question was do we have stats on how many nova check jobs run in a 24 hour period | 14:06 |
*** sandywalsh_ has joined #openstack-infra | 14:07 | |
*** nsaje has joined #openstack-infra | 14:07 | |
anteaya | if we have this stat, I don't know where it would be | 14:07 |
*** dkliban has quit IRC | 14:07 | |
fungi | i don't have exact numbers for nova (might be able to dig them out of graphite.openstack.org if we have per-project stats there?), but if you look at the patchset created graph at the bottom of status.openstack.org/zuul (the green line on the gerrit events chart) we seem to be floating around 50 new patchsets an hour project-wide | 14:08 |
*** sarob has joined #openstack-infra | 14:08 | |
fungi | i expect nova to be less than half that | 14:08 |
anteaya | awesome thanks fungi | 14:09 |
anteaya | matel: did you catch that? | 14:09 |
anteaya | rough guess is about 20-25 new patchsets per hour | 14:10 |
fungi | i would call that a safe overestimate | 14:11 |
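Editor's sketch of how the estimate above might be turned into a capacity figure for a third-party test system. The 25 patchsets per hour and the one-job-per-change figure come from the chat; the job duration is a hypothetical placeholder that would need to be measured, and the function names are illustrative only. The concurrency figure is simply arrival rate multiplied by job duration.

```python
import math


def jobs_per_day(patchsets_per_hour, jobs_per_patchset=1):
    """Upper-bound number of jobs triggered in 24 hours."""
    return patchsets_per_hour * jobs_per_patchset * 24


def concurrent_nodes(patchsets_per_hour, job_hours, jobs_per_patchset=1):
    """Average number of test nodes busy at once (rate x duration)."""
    return math.ceil(patchsets_per_hour * jobs_per_patchset * job_hours)


NOVA_PATCHSETS_PER_HOUR = 25   # the "safe overestimate" from the chat
ASSUMED_JOB_HOURS = 1.0        # hypothetical: measure your own job duration

print(jobs_per_day(NOVA_PATCHSETS_PER_HOUR))                         # 600
print(concurrent_nodes(NOVA_PATCHSETS_PER_HOUR, ASSUMED_JOB_HOURS))  # 25
```

This is an average; bursts of patchsets would need headroom beyond it.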
*** julim has joined #openstack-infra | 14:12 | |
*** sarob has quit IRC | 14:13 | |
*** mriedem has joined #openstack-infra | 14:13 | |
*** dkranz has joined #openstack-infra | 14:14 | |
*** ben_duyujie has quit IRC | 14:14 | |
*** jergerber has joined #openstack-infra | 14:16 | |
*** ryanpetrello has joined #openstack-infra | 14:16 | |
*** DinaBelova has quit IRC | 14:17 | |
*** yamahata_ has quit IRC | 14:18 | |
anteaya | I'd like to have a conversation about resource usage for devstack/tests | 14:20 |
anteaya | right now there is no policy | 14:20 |
anteaya | so tests that use more resources just consume more | 14:21 |
*** dolphm has joined #openstack-infra | 14:21 | |
*** alcabrera has joined #openstack-infra | 14:21 | |
anteaya | I'd like us to have a discussion about having some form of policy about it so that folks using local development as well as scaling for more tests (infra) at least have some structure regarding resource usage expectations | 14:22 |
*** sgran has left #openstack-infra | 14:23 | |
*** dolphm has quit IRC | 14:23 | |
*** yamahata_ has joined #openstack-infra | 14:23 | |
fungi | a start would be to push back on people wanting to increase timeouts on some jobs, and consider any job which comes within 25% of its allotted run time to need "fixing" (more parallelization? split types of tests between an increased number of jobs?) | 14:24 |
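Editor's sketch of the rule of thumb fungi suggests above: treat any job whose run time comes within 25% of its timeout as needing attention. The function name and the sample numbers are hypothetical; only the 25% margin comes from the chat.

```python
def needs_fixing(run_minutes, timeout_minutes, margin=0.25):
    """True if the job ran within `margin` of its allotted time.

    With margin=0.25 a job is flagged once it uses 75% or more
    of its timeout.
    """
    return run_minutes >= (1.0 - margin) * timeout_minutes


print(needs_fixing(50, 60))  # True: 50 minutes of a 60-minute timeout
print(needs_fixing(40, 60))  # False: still under the 45-minute threshold
```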
*** mestery_ is now known as mestery | 14:24 | |
*** changbl has quit IRC | 14:25 | |
*** DinaBelova has joined #openstack-infra | 14:25 | |
*** dolphm has joined #openstack-infra | 14:28 | |
anteaya | would creating more jobs reduce resource consumption overall? | 14:28 |
anteaya | thinking of the splitting-tests suggestion | 14:29 |
*** jcooley_ has quit IRC | 14:30 | |
*** mfer has joined #openstack-infra | 14:30 | |
fungi | it could at least reduce the time changes take to get through the gate since it would trivially increase parallelization | 14:31 |
fungi | the tradeoff is that it would utilize more jenkins slaves (so we'd want more compute resources donated, maybe from additional providers) | 14:32 |
anteaya | fair enough | 14:32 |
anteaya | did we get any indication at the summit there might be any additional providers in the future? | 14:33 |
*** mrodden has joined #openstack-infra | 14:34 | |
fungi | ttx: while you seem to be around, thoughts on tagging folsom-eol to the tip of stable/folsom on each of the projects listed in http://lists.openstack.org/pipermail/openstack-stable-maint/2013-November/001707.html | 14:34 |
fungi | ttx: is that something you want to do (looks like you did diablo and essex), or a task for adam_g/apevec? | 14:34 |
fungi | (or should i just do it so we can get on to deleting the branches?) | 14:35 |
*** xeyed4good has quit IRC | 14:35 | |
*** afazekas_ has joined #openstack-infra | 14:35 | |
anteaya | you know the model that has individuals running scientific computational processes running on their laptop during downtime to help large science projects? | 14:36 |
anteaya | I wonder if we could look at that as a model for our testing somehow | 14:36 |
*** oubiwann has joined #openstack-infra | 14:36 | |
anteaya | since even getting new cloud providers will again reach a limit | 14:36 |
anteaya | simple tests like the pep8 tests | 14:37 |
*** locke105 has joined #openstack-infra | 14:38 | |
fungi | something distributed-grid-ish like boinc? it's well designed for basic computations, but emulating an entire operating system under that model would quickly cease to be viable i think | 14:39 |
fungi | i used to manage high-performance distributed compute clusters, and even with existing homogeneity (achieved by having all cluster nodes consist of identical hardware with identical software and configuration), there's a distinct lower bound on how finely you can separate tasks | 14:40 |
*** markmc has joined #openstack-infra | 14:41 | |
fungi | a distributed scheduler like mosix is about the closest i ever saw, and even that depended on the underlying operating systems and hardware to be identical | 14:41 |
fungi | as far as running os-level tasks distributed throughout a cluster i mean | 14:42 |
fungi | pvm/mpi got better performance, but there you were writing specifically cluster-aware applications to take advantage of your message passing interfaces | 14:42 |
anteaya | okay | 14:43 |
anteaya | didn't think it was viable | 14:43 |
anteaya | but the idea arrived so I thought I would give it some air time | 14:43 |
fungi | basically, with cloud providers giving us mostly-identical virtual machines, we're sort of already doing something like that | 14:44 |
anteaya | yes | 14:44 |
anteaya | am just trying to think of options for scaling out | 14:45 |
anteaya | with the rate of growth, I can see us spending a lot of time this cycle hitting limits | 14:45 |
*** jlk has quit IRC | 14:45 | |
*** jamesmcarthur has joined #openstack-infra | 14:45 | |
anteaya | so trying to float ideas for new models, rather than just making the current model bigger | 14:45 |
*** jlk has joined #openstack-infra | 14:46 | |
anteaya | I have a lot of ideas that don't work | 14:46 |
fungi | but even if we conquered the virtual-machine-on-a-loose-grid challenge, massive distributed projects like seti, folding and so on rely on multiple systems redundantly solving the same tasks so that they can deal with nodes misreporting or disappearing. worse, there's no real guarantee on how long a task will take to complete under that model, so i think our throughput would likely suffer | 14:46 |
*** thedodd has joined #openstack-infra | 14:46 | |
*** zul has quit IRC | 14:47 | |
*** zul has joined #openstack-infra | 14:47 | |
anteaya | yes it would | 14:47 |
anteaya | was thinking the same | 14:47 |
dstanek | is there a way to open a closed review? there are a couple of reviews that i'm interested in that have been abandoned because the author never came back after a bad review | 14:47 |
anteaya | nodes going down due to power outages, or folks shutting down | 14:47 |
anteaya | dstanek: do you have urls? | 14:48 |
*** davidhadas_ has quit IRC | 14:48 | |
anteaya | infra core can open as well as the original author | 14:48 |
anteaya | those are the only 2 choices I know of | 14:48 |
fungi | dstanek: if you're not the author, you need a gerrit admin to restore it. give me the review numbers and i can take care of it | 14:48 |
dstanek | anteaya, fungi: is there a permission to allow core reviewers to reopen - so i don't have to bug you guys all the time? | 14:49 |
anteaya | fungi: is gerrit admin different from infra-core? | 14:49 |
dstanek | a simple example is review.openstack.org/54632 - that came across in my email today | 14:49 |
fungi | dstanek: nope. if there were, we wouldn't be having this conversation ;) | 14:49 |
dstanek | anteaya: i mean a core reviewer of the project | 14:50 |
dstanek | fungi: :) | 14:50 |
fungi | i agree it's silly that it isn't a gerrit acl permission | 14:50 |
fungi | anteaya: at the moment there is a 1:1:1 correspondence between gerrit admins and infra core reviewers and infra root sysadmins | 14:52 |
fungi | i was simply being more specific | 14:52 |
anteaya | okay, thanks | 14:53 |
anteaya | I'm losing the distinction in my mind since it is all the same group | 14:53 |
anteaya | but gerrit admins it is | 14:53 |
anteaya | I count on your specificity | 14:53 |
*** dolphm has quit IRC | 14:54 | |
mriedem | i'm not subscribed to the openstack-infra mailing list, is it pretty active, or does openstack-dev with the [infra] tag get just as much traffic? | 14:57 |
*** sarob has joined #openstack-infra | 14:59 | |
anteaya | mriedem: fairly quiet | 15:00 |
anteaya | a thread or two a week and pleia2 meeting reminder and the link to logs | 15:00 |
anteaya | mriedem: http://lists.openstack.org/pipermail/openstack-infra/ | 15:01 |
*** dolphm has joined #openstack-infra | 15:02 | |
*** salv-orlando has joined #openstack-infra | 15:02 | |
*** sarob has quit IRC | 15:03 | |
mriedem | anteaya: ah, ok, was looking for this: http://lists.openstack.org/pipermail/openstack-infra/2013-November/000432.html | 15:03 |
*** sarob has joined #openstack-infra | 15:04 | |
mriedem | we have a team in china getting CI setup for DB2 and they are relatively new to community | 15:04 |
*** dkliban has joined #openstack-infra | 15:04 | |
mriedem | was trying to monitor what they are doing but wasn't on the infra ML | 15:04 |
anteaya | mriedem: ah okay, great | 15:05 |
anteaya | well feel free to join or track archives, your call | 15:05 |
*** rnirmal has joined #openstack-infra | 15:05 | |
anteaya | http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra | 15:05 |
anteaya | if you want to sign up | 15:06 |
matel | anteaya: Thanks for the info (I re-read the logs) | 15:07 |
anteaya | matel: great | 15:08 |
anteaya | feel free to idle, read logs, ask questions and help if you can | 15:08 |
anteaya | as you set up your 3rd party tests | 15:08 |
anteaya | and afterward | 15:09 |
mriedem | anteaya: do you know if there are some numbers for how many patches go through the check queue in a month for a given project? not seeing anything obvious on zuul, but i have to believe that infra tracks that | 15:12 |
mriedem | getting an idea for load | 15:12 |
anteaya | mriedem: seems to be a popular question today | 15:12 |
anteaya | fungi did give an explanation about how one might calculate that | 15:13 |
anteaya | and no we don't track check load based on project | 15:13 |
fungi | well, what i said was we might have per-project stats on graphite.openstack.org but i don't remember | 15:13 |
anteaya | ah yes | 15:13 |
* mriedem looks | 15:14 | |
*** xeyed4good has joined #openstack-infra | 15:14 | |
fungi | we do have per-job stats, so could probably base it off a graph for the nova python 2.7 unit tests | 15:15 |
dolphm | anyone- is it possible to grant Restore privileges to *-core groups? | 15:15 |
anteaya | dolphm: do we have a use case? | 15:15 |
mriedem | fungi: bingo, looking at gate-sqlalchemy-migrate-python27 in graphite | 15:16 |
dolphm | anteaya: during release time and summit time, a lot of reviews get abandoned because *-core is busy | 15:17 |
dolphm | anteaya: it's really *-core's fault, and we should have the power to fix the problem | 15:17 |
fungi | dolphm: we wish, but no it's a change-owner-only feature with no corresponding acl permission | 15:18 |
dolphm | anteaya: rather than depending on the author to Restore (they're generally just as busy anyway) | 15:18 |
fungi | dolphm: only the change owner and gerrit administrators can restore an abandoned change | 15:18 |
fungi | dolphm: it's one of the oversights i hope to be corrected in newer gerrit releases | 15:19 |
dolphm | fungi: boo, thanks | 15:20 |
dolphm | fungi: if you know of anything tracking that upstream, i'd appreciate it :) | 15:20 |
fungi | dolphm: i'll try to remember to ask zaro if he's tried it on gerrit 2.7, since he's been running the effort to eventually be able to upgrade us | 15:21 |
anteaya | here is the patchset created graph for the last 24 hours, matel, mriedem: http://graphite.openstack.org/graphlot/?width=1228&height=630&_salt=1384874559.636&target=stats_counts.gerrit.event.patchset-created | 15:23 |
*** nati_uen_ has joined #openstack-infra | 15:25 | |
*** dolphm has quit IRC | 15:26 | |
*** jcooley_ has joined #openstack-infra | 15:27 | |
*** nati_ueno has quit IRC | 15:28 | |
matel | anteaya: Thanks, that's great! | 15:28 |
anteaya | matel: glad to help | 15:28 |
*** dcramer_ has joined #openstack-infra | 15:28 | |
*** dolphm_ has joined #openstack-infra | 15:29 | |
fungi | note that that's patchsets created for all projects | 15:29 |
mordred | GAH scrollback | 15:31 |
mordred | fungi: morning fungi - have I missed anything interesting? | 15:32 |
*** datsun180b has joined #openstack-infra | 15:32 | |
*** jcooley_ has quit IRC | 15:33 | |
mordred | ttx: yes. that is exactly correct. it basically comes down to an end user downloading *client | 15:33 |
mordred | ttx: if they want to talk to a nova, they do not necessarily know, nor should they need to, what version of nova is running | 15:33 |
mordred | so the decision was that we'd make sure that client libs were always backwards compatible | 15:34 |
mordred | now we just need to land the jobs that test that :) | 15:34 |
*** senk has joined #openstack-infra | 15:34 | |
ttx | mordred: see backlog, someone suggested multiple client branches | 15:35 |
mordred | ttx: I scanned and couldn't find the mention | 15:35 |
mordred | ttx: do you know what they were trying to achieve? | 15:35 |
*** atiwari has joined #openstack-infra | 15:36 | |
ttx | mordred: <markwash> 18:09:48> can I use gerrit to track two major-release series of python-glanceclient? Like, suppose I want to create a 1.0 branch and start making backwards incompatible changes. . . | 15:37 |
mordred | ah. ok. that's slightly different | 15:37 |
mordred | and that can theoretically be done -- the intent there is not "I want a stable/grizzly glanceclient" but instead "I want to rev the major version" | 15:38 |
mordred | there are a BUNCH of things we'd need to think through, but mechanically it's not impossible | 15:38 |
mordred | that's what jog0 does with hacking | 15:38 |
dolphm_ | is this "official"? https://twitter.com/osjenkins | 15:40 |
*** dolphm_ is now known as dolphm | 15:40 | |
anteaya | dolphm: never seen it before, don't know who put it up | 15:44 |
fungi | dolphm: i do not believe so, but not sure whose it is | 15:45 |
*** Loquacity has quit IRC | 15:45 | |
anteaya | since it is using OpenStack as part of its title, would the foundation be interested in this account, I wonder? | 15:45 |
anteaya | what do you think ttx? | 15:45 |
fungi | mordred: but i think determining when to perform integration tests with which client branch (would we duplicate every test involving that client?) might get complicated | 15:46 |
* ttx looks | 15:47 | |
dolphm | i assume one of the people that retweeted the only tweet created the account | 15:47 |
*** Loquacity has joined #openstack-infra | 15:47 | |
ttx | anteaya: maybe? | 15:48 |
*** SpamapS_ has joined #openstack-infra | 15:48 | |
*** Vivek_ has joined #openstack-infra | 15:48 | |
* ttx enjoys a 10-min break | 15:48 | |
anteaya | fair enough | 15:48 |
anteaya | :D | 15:48 |
zaro | fungi: huh, what about gerrit? | 15:48 |
*** pcrews has joined #openstack-infra | 15:48 | |
ttx | These 10 15-min status syncs over the day are a bit crazier idea than I thought | 15:49 |
anteaya | dolphm: the three retweets are all eNovance folks | 15:49 |
anteaya | sounds like an eNovance thing | 15:49 |
fungi | zaro: any idea if acls in 2.7 allow us to set groups who are allowed to restore (unabandon) changes? | 15:49 |
zaro | yes, i believe that's available. | 15:49 |
*** salv-orlando has quit IRC | 15:50 | |
*** afazekas_ has quit IRC | 15:50 | |
dolphm | anteaya: well there you go | 15:50 |
fungi | dolphm: ^ sounds like maybe it'll be fixed once we upgrade then | 15:50 |
dolphm | fungi: zaro: ooh, thanks! | 15:50 |
* dolphm waits patiently for 2.7 | 15:51 | |
*** klrmn1 has joined #openstack-infra | 15:51 | |
anteaya | dolphm: yup, thanks for the link | 15:51 |
*** ArxCruz has quit IRC | 15:51 | |
*** ArxCruz has joined #openstack-infra | 15:51 | |
*** samalba_ has joined #openstack-infra | 15:51 | |
*** koohead17_ has joined #openstack-infra | 15:52 | |
*** juice_ has joined #openstack-infra | 15:52 | |
*** Vivek has quit IRC | 15:52 | |
*** ianw has quit IRC | 15:52 | |
*** samalba has quit IRC | 15:52 | |
*** SpamapS has quit IRC | 15:52 | |
*** mordred has quit IRC | 15:52 | |
*** juice has quit IRC | 15:52 | |
*** klrmn has quit IRC | 15:52 | |
*** koolhead17 has quit IRC | 15:52 | |
*** klrmn1 is now known as klrmn | 15:52 | |
*** juice_ is now known as juice | 15:52 | |
fungi | ttx: thoughts on tagging folsom-eol to the tip of stable/folsom on each of the projects listed in http://lists.openstack.org/pipermail/openstack-stable-maint/2013-November/001707.html (you did the previous ones, but do you want adam_g/apevec to do it this time, or should i just do it so i can go ahead deleting the branches)? | 15:53 |
*** atiwari has quit IRC | 15:53 | |
*** jcooley_ has joined #openstack-infra | 15:54 | |
*** samalba_ is now known as samalba | 15:54 | |
*** atiwari_ has joined #openstack-infra | 15:54 | |
*** atiwari_ has quit IRC | 15:54 | |
*** atiwari has joined #openstack-infra | 15:54 | |
*** adam_g has left #openstack-infra | 15:54 | |
*** adam_g has joined #openstack-infra | 15:54 | |
adam_g | fungi, hey! sorry, im on holiday. this is my first hour online since HK. im fine with you tagging it. | 15:55 |
*** jcooley_ has quit IRC | 15:55 | |
*** jcooley_ has joined #openstack-infra | 15:56 | |
adam_g | fungi, about to be offline again but back the 25th | 15:56 |
fungi | adam_g: no worries. have a good holiday | 15:57 |
*** salv-orlando has joined #openstack-infra | 15:58 | |
*** ianw has joined #openstack-infra | 15:59 | |
ttx | fungi: you can go ahead | 15:59 |
*** mordred has joined #openstack-infra | 15:59 | |
fungi | ttx: okay, i'll sign them and emulate whatever you did with tag message for diablo/essex | 15:59 |
*** primemin1sterp is now known as primeministerp | 16:01 | |
*** acabrera has joined #openstack-infra | 16:02 | |
*** sandywalsh_ has quit IRC | 16:03 | |
*** reed has joined #openstack-infra | 16:03 | |
*** adalbas has quit IRC | 16:04 | |
*** nsaje has quit IRC | 16:05 | |
*** adam_g is now known as adam_g_afk | 16:05 | |
*** alcabrera has quit IRC | 16:05 | |
*** nsaje has joined #openstack-infra | 16:05 | |
openstackgerrit | Alex Gaynor proposed a change to openstack-dev/pbr: Bump the development status classifier. https://review.openstack.org/57272 | 16:06 |
*** sarob has quit IRC | 16:07 | |
*** sarob has joined #openstack-infra | 16:07 | |
*** adalbas has joined #openstack-infra | 16:07 | |
*** xeyed4good has quit IRC | 16:07 | |
*** koohead17_ is now known as koolhead17 | 16:08 | |
*** rcleere has joined #openstack-infra | 16:09 | |
*** nsaje has quit IRC | 16:10 | |
*** danger_fo_away is now known as dangers | 16:11 | |
*** sarob has quit IRC | 16:11 | |
*** katyafervent has joined #openstack-infra | 16:13 | |
*** rakhmerov has quit IRC | 16:14 | |
*** rakhmerov has joined #openstack-infra | 16:14 | |
*** rakhmerov has quit IRC | 16:14 | |
*** mrodden has quit IRC | 16:15 | |
*** ruhe has quit IRC | 16:15 | |
*** dkranz has quit IRC | 16:19 | |
mordred | netsplit ftw | 16:20 |
mordred | fungi: Alex_Gaynor says we have some issues in the gate | 16:20 |
Alex_Gaynor | mordred: technically in the check I suppose :) | 16:21 |
*** ruhe has joined #openstack-infra | 16:22 | |
*** senk has quit IRC | 16:22 | |
openstackgerrit | Monty Taylor proposed a change to openstack-infra/config: Create and upload wheels https://review.openstack.org/56760 | 16:22 |
*** senk has joined #openstack-infra | 16:22 | |
Alex_Gaynor | mordred: Any idea how hard it would be to make the bot use different messages for "new patch" vs. "updated a patch"? | 16:23 |
*** nati_uen_ has quit IRC | 16:24 | |
mordred | Alex_Gaynor: hrm. I have no idea. the event is always patchset-created ... I'm not sure if a ,2 is in the payload anywhere | 16:24 |
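
A minimal sketch of how a bot could tell the two cases apart, assuming the `patchset-created` event from Gerrit's stream-events carries a `patchSet.number` field (the sample payload below is trimmed and hypothetical, not a real event):

```python
import json


def describe_patchset_event(raw_event):
    """Classify one stream-events JSON line as 'new patch', 'updated patch', or 'other'."""
    event = json.loads(raw_event)
    if event.get("type") != "patchset-created":
        return "other"
    # Some Gerrit releases serialize the number as a string, so normalize with int().
    number = int(event.get("patchSet", {}).get("number", 1))
    return "new patch" if number == 1 else "updated patch"


# Hypothetical, heavily trimmed event payload used only for illustration.
sample = json.dumps({"type": "patchset-created", "patchSet": {"number": "2"}})
print(describe_patchset_event(sample))  # prints "updated patch"
```

If the field turns out to be missing in the deployed Gerrit version, the sketch falls back to treating the event as a new patch.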
*** nsaje has joined #openstack-infra | 16:24 | |
EmilienM | mordred: i've been trying to get more reviews on https://review.openstack.org/#/c/48042/ for a long time, without success; could you help with this? | 16:24 |
mordred | EmilienM: I'm not core on devstack | 16:25 |
EmilienM | mordred: oops, ok | 16:25 |
mriedem | EmilienM: https://review.openstack.org/#/admin/groups/50,members | 16:25 |
EmilienM | mriedem: thx | 16:26 |
mordred | EmilienM: sorry. I'd totally help you otherwise | 16:26 |
*** senk has quit IRC | 16:26 | |
EmilienM | mordred: appreciate | 16:26 |
*** noorul has joined #openstack-infra | 16:27 | |
noorul | Hello | 16:28 |
*** cody-somerville has quit IRC | 16:28 | |
noorul | https://review.openstack.org/#/c/57095/ | 16:28 |
jeblair | hello everybody | 16:28 |
noorul | In the review commit log "blueprint api" is mentioned | 16:28 |
*** senk has joined #openstack-infra | 16:28 | |
noorul | It gets translated to https://blueprints.launchpad.net/openstack/?searchtext=api | 16:28 |
noorul | Instead of https://blueprints.launchpad.net/solum/+spec/api | 16:29 |
fungi | mordred: Alex_Gaynor: what issues exactly? we have a few hung changes waiting for a zuul restart, but the gate was too deep overnight and this morning to consider it yet | 16:29 |
noorul | Is this expected behavior? | 16:29 |
jeblair | mordred, fungi, ttx, clarkb (when you get online): i'm making a back-from-vacation todo list; anything you need me to put on it? | 16:29 |
mordred | jeblair: yay! | 16:29 |
mordred | jeblair: you're back! | 16:29 |
Alex_Gaynor | fungi: yeah it's the hung ones I think. They are 404s in Jenkins at this point | 16:29 |
mordred | jeblair: I would really like to get everything in the wheels-in-mirror topic landed | 16:29 |
fungi | Alex_Gaynor: yeah, those are left over from when puppet agent ate jenkins01 yesterday | 16:30 |
mordred | jeblair: AND - when you have a minute, I'd like to talk to you about a logic add to nodepool | 16:30 |
mordred | jeblair: related to adding the new hp cloud region | 16:30 |
Alex_Gaynor | fungi: heh. Okey doke | 16:30 |
mordred | jeblair: I wanted to talk to you before I did anything in that direction | 16:30 |
mordred | although landing the new hp cloud region into nodepool now should be fine as a trial balloon | 16:31 |
*** senk has quit IRC | 16:31 | |
mordred | jeblair: other than that - welcome back! | 16:31 |
*** ruhe has quit IRC | 16:31 | |
jeblair | mordred: ack, thx. | 16:32 |
fungi | mordred: i hadn't looked back from after clarkb and i discussed the hpcloud beta region addition, but does that nodepool change deal okay with providers with no region-name specified (getting set to none in novaclient)? | 16:32 |
mordred | jeblair: (some of the wheels-in-mirror patches have -1 vrfy because of gate races, some of them are waiting on https://review.openstack.org/#/c/56920/ to merge) | 16:33 |
*** senk has joined #openstack-infra | 16:33 | |
mordred | fungi: you mean availability-zone? I believe it should - I think it should only add availability-zone if it's in the file | 16:34 |
fungi | mordred: er, right. az | 16:34 |
*** hashar has quit IRC | 16:34 | |
*** dkranz has joined #openstack-infra | 16:35 | |
fungi | it was getting late and we thought the code was setting it to none in the connection if not specified, but i'll review with that in mind | 16:35 |
mordred | jeblair: tl;dr - new hp cloud has multi-az in single region, which means we still have multiple azs to balance across, but they share a quota | 16:35 |
mordred | jeblair: so we need to tell the nova api which az to put a thing in | 16:36 |
fungi | noorul: gerrit can't differentiate between projects when hyperlinking blueprints to lp | 16:36 |
mordred | jeblair: alternately, jog0 says that if we leave az out, nova will just pick one - so it's possible that just ignoring the az altogether might be fine | 16:36 |
fungi | noorul: if you use globally unique bp names and make solum "part of" the openstack project group (do other stackforge projects do that?) the bp search link may work as expected | 16:37 |
*** hashar has joined #openstack-infra | 16:38 | |
*** ruhe has joined #openstack-infra | 16:38 | |
*** changbl has joined #openstack-infra | 16:38 | |
noorul | fungi: I see | 16:38 |
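
The behavior fungi describes can be illustrated with a small sketch. The regex here is hypothetical (the real commentlink pattern in the Gerrit configuration may differ); the point is only that the captured blueprint name is dropped into a single project-group search URL, so the link cannot depend on which project the change belongs to:

```python
import re

# Hypothetical pattern for illustration; not copied from the real Gerrit config.
BLUEPRINT_RE = re.compile(r"\b(?:blueprint|bp)\s+([A-Za-z0-9._-]+)", re.IGNORECASE)
# Search URL as observed in the log above: it targets the "openstack" project group.
SEARCH_URL = "https://blueprints.launchpad.net/openstack/?searchtext={name}"


def blueprint_links(commit_message):
    """Return one search link per blueprint name mentioned in a commit message."""
    return [SEARCH_URL.format(name=name) for name in BLUEPRINT_RE.findall(commit_message)]


print(blueprint_links("Implements blueprint api"))
# prints ['https://blueprints.launchpad.net/openstack/?searchtext=api']
```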
jeblair | mordred: that last sounds ideal, maybe lets try it first? | 16:39 |
*** yaguang has quit IRC | 16:40 | |
*** cody-somerville has joined #openstack-infra | 16:40 | |
*** dkranz has quit IRC | 16:42 | |
fungi | jeblair: for the o'reilly press conference call scheduling, see e-mails from holly bauer in thread with subject: openstack workflow meeting part 2 | 16:46 |
*** sdake_ has quit IRC | 16:46 | |
*** jcooley_ has quit IRC | 16:46 | |
*** sdake_ has joined #openstack-infra | 16:47 | |
*** sdake_ has quit IRC | 16:47 | |
*** sdake_ has joined #openstack-infra | 16:47 | |
*** dprince has quit IRC | 16:50 | |
openstackgerrit | A change was merged to openstack-infra/reviewstats: Add randall-burt to heat-core https://review.openstack.org/57211 | 16:50 |
openstackgerrit | A change was merged to openstack-infra/storyboard: Added task ordering https://review.openstack.org/56026 | 16:50 |
*** markmc has quit IRC | 16:52 | |
*** markmc has joined #openstack-infra | 16:52 | |
openstackgerrit | A change was merged to openstack-infra/config: Update PyPI mirror for sqlalchemy-migrate releases https://review.openstack.org/56693 | 16:54 |
*** DennyZhang has joined #openstack-infra | 16:55 | |
*** dkranz has joined #openstack-infra | 16:55 | |
yolanda | hi, i'm just taking a look at bug https://bugs.launchpad.net/openstack-ci/+bug/950407 | 16:57 |
uvirtbot | Launchpad bug 950407 in openstack-ci "jenkins should run licensecheck on all projects" [Low,Triaged] | 16:57 |
*** dangers is now known as danger_fo_away | 16:57 | |
yolanda | and wanted to have a bit more info about it | 16:57 |
yolanda | should that involve creating new jobs, like for example the *-pep8 ones? | 16:57 |
jeblair | yolanda: yay! :) i love it when people pick up low-hanging-fruit bugs! | 16:57 |
jeblair | yolanda: we could create new ones... or maybe we should just have the pep8 job run it... (it's really the "style check job" at this point, it's already more than pep8) | 16:58 |
jeblair | mordred, jog0: ^ | 16:58 |
*** sarob has joined #openstack-infra | 16:58 | |
*** chandankumar has quit IRC | 16:59 | |
mordred | jeblair: I'm fine with the pep8 job running it | 16:59 |
yolanda | so the bugfix would be done by updating the job configuration in jjb for pep8? | 17:00 |
jeblair | yolanda: that sounds like a good end-state goal. though there's one other consideration -- all our projects may not "pass" licensecheck right now... | 17:02 |
*** hashar is now known as hasharCall | 17:02 | |
yolanda | jeblair, so that licensecheck should be run for some specific ones? | 17:03 |
jeblair | yolanda: so maybe we should start with a new job to run licensecheck, put it in the experimental queue (and then silent, and later check), and when we're happy it's working, add it to pep8 | 17:03 |
mgagne | jeblair: better add a non-voting job until it gets fixed? otherwise gates will be blocked until it's fixed. | 17:03 |
*** nibalizer has quit IRC | 17:03 | |
mgagne | forgot about experimental queue ^^' | 17:03 |
jeblair | mgagne: yeah, same idea, basically | 17:04 |
yolanda | jeblair, so you have a jenkins that runs only the experimental jobs, right? what is it called? | 17:04 |
*** markwash has quit IRC | 17:04 | |
*** gyee has joined #openstack-infra | 17:05 | |
*** changbl has quit IRC | 17:05 | |
jeblair | yolanda: zuul has an experimental pipeline, so if you add the new job just to that pipeline, you can run it on request | 17:06 |
jeblair | yolanda: see https://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/files/zuul/layout.yaml for some examples (gate-devstack-vm-cells is one) | 17:06 |
yolanda | ok | 17:06 |
jeblair | yolanda: you leave a comment with "check experimental" and it will run those jobs on that change | 17:06 |
*** nibalizer has joined #openstack-infra | 17:07 | |
yolanda | ok, i see it | 17:07 |
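
As a rough illustration of the comment trigger jeblair mentions, a pipeline could match review comments against a pattern like the one below. The regex is an assumption made for this sketch, not the filter actually used in layout.yaml:

```python
import re

# Hypothetical comment filter: a line consisting only of "check experimental".
EXPERIMENTAL_RE = re.compile(r"^\s*check experimental\s*$", re.IGNORECASE | re.MULTILINE)


def requests_experimental(comment):
    """Return True if a review comment asks for the experimental jobs to run."""
    return bool(EXPERIMENTAL_RE.search(comment))


print(requests_experimental("Patch Set 3:\n\ncheck experimental"))  # prints True
```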
mordred | neat! | 17:08 |
mordred | mordred@camelot:~/src/openstack/nova$ licensecheck nova . | grep -v Apache | 17:08 |
mordred | ./run_tests.sh: *No copyright* UNKNOWN | 17:08 |
mordred | that's not bad | 17:08 |
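
A job wrapping this check might do the same filtering in a script. The sketch below assumes the `path: license` line format shown in mordred's paste; the Apache sample line is hypothetical. Unlike `grep -v Apache`, it looks only at the license text, so a path containing "Apache" would not hide a file:

```python
def non_apache_files(licensecheck_output):
    """Return (path, license) pairs whose reported license does not mention Apache."""
    flagged = []
    for line in licensecheck_output.splitlines():
        path, sep, license_text = line.partition(": ")
        # Skip blank or malformed lines that lack the "path: license" separator.
        if sep and "Apache" not in license_text:
            flagged.append((path, license_text.strip()))
    return flagged


# First line is a made-up example; second line is taken from the paste above.
sample = "./nova/api.py: Apache (v2.0)\n./run_tests.sh: *No copyright* UNKNOWN\n"
print(non_apache_files(sample))
# prints [('./run_tests.sh', '*No copyright* UNKNOWN')]
```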
*** afazekas_ has joined #openstack-infra | 17:08 | |
yolanda | i'll do some tries first on my test jenkins and then try to push some job | 17:09 |
jeblair | yolanda: awesome, thanks! | 17:09 |
*** sdake_ has quit IRC | 17:09 | |
*** alexpilotti has quit IRC | 17:10 | |
*** jcooley_ has joined #openstack-infra | 17:10 | |
pleia2 | welcome back, jeblair :) | 17:12 |
*** pycabrera has joined #openstack-infra | 17:13 | |
*** pycabrera is now known as alcabrera | 17:13 | |
jeblair | pleia2: thanks :) | 17:14 |
hub_cap | jeblair: how do i get u the wads of unmarked bills i owe u | 17:14 |
*** salv-orlando has quit IRC | 17:14 | |
hub_cap | they have successfully been laundered as per your wishes | 17:14 |
*** locke1051 has joined #openstack-infra | 17:16 | |
*** acabrera has quit IRC | 17:16 | |
*** locke105 has quit IRC | 17:17 | |
jeblair | hub_cap: i'll email you shortly (accounts receivable is #3 on my todo list [which i think is shockingly low]) | 17:19 |
Alex_Gaynor | fungi: fwiw, in addition to the stopped jobs, zuul's queue is also growing (643 results right now) | 17:19 |
fungi | Alex_Gaynor: yikes, that is high. it wasn't like that earlier i don't think | 17:22 |
*** salv-orlando has joined #openstack-infra | 17:22 | |
SpamapS_ | Anyone know why this automated "update from global requirements" patch isn't being.. well.. updated? https://review.openstack.org/#/c/56161 | 17:23 |
fungi | Alex_Gaynor: though gate resets with the pipeline as deep as it is probably would account for spikes that high | 17:23 |
hub_cap | jeblair: roger | 17:23 |
Alex_Gaynor | fungi: hmm, good point | 17:23 |
jeblair | Alex_Gaynor, fungi: yeah, that's likely a gate reset in progress | 17:23 |
fungi | SpamapS_: we disabled the job temporarily because it wasn't properly branch aware. there is a change in review to fix it, but i think it still needs work | 17:23 |
*** fbo is now known as fbo_away | 17:24 | |
jeblair | Alex_Gaynor, fungi: which just completed | 17:24 |
*** mrodden has joined #openstack-infra | 17:24 | |
*** changbl has joined #openstack-infra | 17:26 | |
*** flaper87 is now known as flaper87|afk | 17:27 | |
*** alcabrera is now known as alcabrera|afk | 17:27 | |
Alex_Gaynor | indeed, looks like it's starting to come down now | 17:27 |
ttx | looks like we could issue some openstackstatus notice to account for current gate slowness | 17:28 |
ttx | (or could have issued) | 17:28 |
* ttx goes for dinner | 17:28 | |
openstackgerrit | Roman Prykhodchenko proposed a change to openstack-infra/devstack-gate: Support Ironic in devstack gate https://review.openstack.org/53899 | 17:28 |
*** pcrews has quit IRC | 17:29 | |
jeblair | ttx: you can do that, you know. :) | 17:29 |
*** SergeyLukjanov has quit IRC | 17:30 | |
anteaya | fungi Alex_Gaynor correct at one point the zuul queue was 0 | 17:30 |
anteaya | about an hour or so ago | 17:30 |
*** reed has quit IRC | 17:30 | |
anteaya | welcome back jeblair | 17:31 |
SpamapS_ | fungi: ahh thanks! | 17:31 |
*** SpamapS_ is now known as SpamapS | 17:31 | |
*** DinaBelova has quit IRC | 17:32 | |
annegentle | jeblair: welcome back! Do you have any availability at all today for a call with O'Reilly on workflow? Otherwise has to be put off til next week. | 17:32 |
*** datsun180b has quit IRC | 17:34 | |
clarkb | jeblair the zuul nnfi bug | 17:35 |
fungi | annegentle: oh, they weren't going to be available at all the rest of the week after today? i misread and thought it was next week they were out | 17:35 |
jeblair | annegentle: today? hrm, fungi told me they were looking at this week... | 17:35 |
clarkb | the one mtreinish found. i submitted it against zuul and is a "high" bug iirc | 17:36 |
annegentle | jeblair: Holly's last email said "If not, we may need to try for a week from today due to travel schedules. " | 17:36 |
fungi | ahh, yes, i just reread it now. i guess so | 17:36 |
clarkb | fungi: I need to kill old etherpad this morning but then I wanted to add new hpcloud if possible | 17:37 |
annegentle | jeblair: it'd be great to get it done before t-giving is all, punt to next week on the email thread as needed. My afternoon is already sunk anyway | 17:37 |
clarkb | mordred did you see my comment on the az nodepool change? | 17:37 |
fungi | clarkb: it sounds like we may want to rework the hpcloud addition | 17:37 |
fungi | clarkb: in particular, there was a suggestion (i think per jog0?) that hpcloud's nova will just pick an az for us if we don't specify one | 17:38 |
clarkb | fungi: yes that is the case | 17:38 |
*** salv-orlando has quit IRC | 17:39 | |
clarkb | and image snapshots should cross AZs | 17:39 |
clarkb | so we should be able to treat that region as a single "AZ" | 17:39 |
fungi | so we could in theory try just adding that region and forego any concerns of cross-az quota sharing support in nodepool for now | 17:39 |
clarkb | ++ simplifies things | 17:39 |
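
The "only pass an az when one is configured" idea can be sketched as below. The provider dict and its `availability-zone` key are assumptions for illustration, not nodepool's actual config schema; `availability_zone` is the keyword novaclient's server-create call accepts:

```python
def build_server_kwargs(name, image_id, flavor_id, provider):
    """Build keyword arguments for a server-create call from a provider config dict."""
    kwargs = {"name": name, "image": image_id, "flavor": flavor_id}
    az = provider.get("availability-zone")  # hypothetical config key
    if az:
        # Only pin the AZ when configured; otherwise nova is left to pick one.
        kwargs["availability_zone"] = az
    return kwargs


print(build_server_kwargs("node-1", "img", "flv", {}))
# prints {'name': 'node-1', 'image': 'img', 'flavor': 'flv'}
```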
clarkb | did floating ip quota come up? | 17:39 |
fungi | oh, right, forgot that was also a thing | 17:40 |
*** moted has joined #openstack-infra | 17:40 | |
clarkb | jeblair: https://bugs.launchpad.net/zuul/+bug/1246838 | 17:40 |
uvirtbot | Launchpad bug 1246838 in zuul "NNFI doesn't always reattach child of failing change to nearest non failing item." [High,Triaged] | 17:40 |
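
For readers unfamiliar with the term, the "nearest non-failing item" idea in the bug title can be sketched as follows. This is a simplified illustration of the concept, not Zuul's implementation: each change in the gate queue is tested on top of the closest change ahead of it that is not failing.

```python
def nearest_non_failing_ahead(queue, failing):
    """Map each change to the nearest non-failing change ahead of it.

    None means the change sits directly on the branch tip.
    """
    result = {}
    nnfi = None
    for change in queue:  # queue is ordered from head to tail
        result[change] = nnfi
        if change not in failing:
            nnfi = change
    return result


print(nearest_non_failing_ahead(["A", "B", "C", "D"], failing={"B"}))
# prints {'A': None, 'B': 'A', 'C': 'A', 'D': 'C'}
```

In this example C skips the failing B and reattaches to A, which is the behavior the bug report says does not always happen.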
fungi | mordred: we worry that a 15 floating ip address limit may become an issue, so are you able to request a large enough fip quota to match our server quota in that region? | 17:40 |
*** dprince has joined #openstack-infra | 17:41 | |
fungi | but anyway, at least having nodepool not care which az it's using in the new region would also make it so that we don't immediately need to worry about implementing sub-region az support in nodepool either | 17:42 |
Alex_Gaynor | When did the gate get so slow again :/ It used to be like 25-30 minutes | 17:43 |
clarkb | Alex_Gaynor: between havana release and now | 17:43 |
clarkb | Alex_Gaynor: it is kind of funny because changes to havana fly through testing with few problems :) | 17:43 |
clarkb | we have regressed quickly | 17:44 |
Alex_Gaynor | Yeah, was going to say, that didn't take long | 17:44 |
clarkb | mordred: east nodes don't get a public IP you must floating ip them, so we either have proxy nodes or a giant floating ip quota | 17:44 |
*** boris-42 has quit IRC | 17:44 | |
esmute | Hey guys, can you guys take a look at https://review.openstack.org/#/c/52137/? It is currently failing random tempest tests. This is currently blocking our trove integration in horizon and heat | 17:45 |
Alex_Gaynor | esmute: Small thing, not everyone here is a guy. | 17:45 |
clarkb | esmute: there isn't much to say other than everyone is in the same boat and it sucks. we had a long discussion over the weekend about how we could tackle the problem but we didn't come away from that with anything concrete | 17:45 |
esmute | Alex_Gaynor: The term 'guys' refers to both male and female. | 17:47 |
*** salv-orlando has joined #openstack-infra | 17:47 | |
esmute | Alex_Gaynor: I apologize if that is not the case. | 17:47 |
clarkb | we need folks to be jumping on grenades and sorting out these bugs (and some people are yay jog0 and mikal) but so far it has been a losing battle (bugs come in quicker than we can fix them :/) | 17:47 |
jeblair | annegentle: my morning is mostly infra and tc meeting prep, could probably call this afternoon (>2100 utc; after tc meeting), but if your afternoon is sunk, maybe we should punt. | 17:48 |
*** markwash has joined #openstack-infra | 17:48 | |
*** jerryz has joined #openstack-infra | 17:48 | |
clarkb | oh meeting is earlier now (relative to me but not fungi) /me runs to office so that old etherpad can be killed | 17:48 |
*** hasharCall has quit IRC | 17:49 | |
esmute | clarkb: Thanks. The tests fail in a different test every time. So I can't get jenkins to +1. | 17:49 |
clarkb | esmute: right, these are real bugs in openstack that are causing the failures and there is more than one of them | 17:50 |
*** hogepodge has joined #openstack-infra | 17:50 | |
clarkb | esmute: http://status.openstack.org/elastic-recheck/ is tracking some of the ones we know about | 17:50 |
clarkb | 1251920 is particularly bad | 17:51 |
*** derekh has quit IRC | 17:51 | |
*** dizquierdo has quit IRC | 17:51 | |
jeblair | clarkb: are they neutron related? would running duplicate jobs on neutron help? | 17:52 |
clarkb | jeblair: some are but 1251920 isn't | 17:52 |
clarkb | jeblair: and we already have extra neutron jobs which seems to have helped | 17:52 |
clarkb | https://bugs.launchpad.net/nova/+bug/1251920 mikal thought it may be a libvirt issue but hasn't been able to track that down yet | 17:53 |
uvirtbot | Launchpad bug 1251920 in nova "Tempest failures due to failure to return console logs from an instance" [High,Confirmed] | 17:53 |
fungi | well, except that extra neutron jobs mean neutron changes are even less likely to make it through the gate, so the roughly 1/3 changes in the gate right now which belong to neutron are a big reason our gate throughput has ground slower and slower | 17:53 |
clarkb | also checked the cirros issues | 17:53 |
annegentle | jeblair: it would be fine for you to meet without me, it's just that I already have to miss the first half of the TC meeting. I could meet after the TC meeting though | 17:53 |
annegentle | jeblair: fungi should we try for 4:00 CST/ 5:00 EST? | 17:54 |
jeblair | clarkb: we do? i don't see any duplicates (i thought the proposal was to run vm-neutron like 4 times or something) | 17:54 |
jeblair | clarkb: i do see there are some extra variants though | 17:54 |
clarkb | jeblair: well we have a new postgres job | 17:54 |
fungi | jeblair: there are additional job variants which run only on neutron | 17:55 |
fungi | right | 17:55 |
clarkb | that | 17:55 |
annegentle | jeblair: saw the email, thanks! | 17:55 |
clarkb | we could do more with those special jobs | 17:55 |
clarkb | before I forget and I want to talk about this in the meeting, I would like all new d-g/tempest jobs to come with a periodic variant | 17:56 |
clarkb | keep getting distracted by things I need to do before leaving | 17:56 |
fungi | neutron also gets an isolation job which we don't run elsewhere yet | 17:56 |
fungi | at least last i checked | 17:56 |
clarkb | fungi: I think there are two isolation and two non isolation one for each DB | 17:56 |
clarkb | then there is the large ops job which isn't voting yet but could be | 17:57 |
fungi | yeah, also soon the parallel isolation job, which is only experimental for the moment | 17:57 |
jeblair | annegentle, fungi: 2200 utc wfm | 17:57 |
clarkb | that works for me too if having extra folks listen in isn't too horrible | 17:58 |
*** SergeyLukjanov has joined #openstack-infra | 17:58 | |
fungi | jeblair: annegentle: clarkb: i'm fine with 2200z today too | 17:59 |
*** sdake_ has joined #openstack-infra | 18:02 | |
NobodyCam | clarkb: that stuck ironic job is still showing in zuul? any thing I can do to clear it out? | 18:03 |
*** melwitt has joined #openstack-infra | 18:04 | |
*** DinaBelova has joined #openstack-infra | 18:05 | |
clarkb | NobodyCam push a new patchset or continue waiting. zuul is mega busy though :( | 18:06 |
NobodyCam | lol ack :) | 18:06 |
*** jcooley_ has quit IRC | 18:08 | |
*** DennyZhang has quit IRC | 18:09 | |
*** shardy is now known as shardy_afk | 18:10 | |
*** afazekas_ has quit IRC | 18:12 | |
*** osanchez has quit IRC | 18:12 | |
fungi | clarkb: yeah, when i woke up this morning the gate was around 75 changes deep, so i didn't think adding a zuul restart into the mix was such a great idea after all | 18:14 |
*** ruhe has quit IRC | 18:17 | |
openstackgerrit | Roman Prykhodchenko proposed a change to openstack-infra/config: Adds devstack-gate tests for Ironic https://review.openstack.org/53917 | 18:17 |
*** hogepodge has quit IRC | 18:18 | |
zaro | fungi: yeh. about the gerrit unabandon thing. | 18:18 |
*** flaper87|afk is now known as flaper87 | 18:18 | |
zaro | fungi: i think any user that has abandon privileges can restore. | 18:18 |
*** shardy_afk is now known as shardy | 18:18 | |
*** moted has quit IRC | 18:19 | |
zaro | fungi: so i think we can do that now. right? | 18:19 |
fungi | zaro: in 2.4? i didn't think that was an available option | 18:19 |
openstackgerrit | A change was merged to openstack-infra/jenkins-job-builder: Added support for Emotional Jenkins https://review.openstack.org/56779 | 18:19 |
* fungi checks | 18:19 | |
*** moted has joined #openstack-infra | 18:20 | |
*** UtahDave has joined #openstack-infra | 18:20 | |
zaro | fungi: http://gerrit-documentation.googlecode.com/svn/Documentation/2.4.2/access-control.html | 18:21 |
zaro | fungi: yup. you're right, i don't see abandon control. | 18:21 |
fungi | i was going to check, but the zuul status page is eating my browser alive | 18:22 |
*** pcrews has joined #openstack-infra | 18:22 | |
fungi | swap thrashfest | 18:22 |
*** nsaje has quit IRC | 18:23 | |
*** nsaje has joined #openstack-infra | 18:23 | |
Shrews | nom nom nom... browser cookies... nom nom nom | 18:24 |
fungi | wow... firefox consuming all 16g of ram on my desktop | 18:26 |
clarkb | fungi: nice | 18:26 |
fungi | i should probably not leave a window displaying that page 24x7 for weeks | 18:27 |
*** johnthetubaguy has quit IRC | 18:28 | |
*** johnthetubaguy has joined #openstack-infra | 18:28 | |
*** nsaje has quit IRC | 18:29 | |
openstackgerrit | lifeless proposed a change to openstack-infra/reviewstats: Add jenkins-job-builder as a broken out project https://review.openstack.org/56603 | 18:29 |
fungi | very leaky. after a restart, ~700mb | 18:29 |
clarkb | fungi: does iceweasel keep up with upstream updates? | 18:30 |
anteaya | ttx 3.5 hours remaining in the horizon election? | 18:30 |
clarkb | you might try firefox proper I don't have that problem keeping zuul status open for long periods of time | 18:30 |
clarkb | I am going to kill old etherpad-dev now | 18:30 |
clarkb | then old etherpad.o.o | 18:31 |
*** reed has joined #openstack-infra | 18:31 | |
*** david-lyle is now known as david-lyle_lunch | 18:33 | |
*** sarob has quit IRC | 18:35 | |
*** sarob has joined #openstack-infra | 18:35 | |
*** UtahDave has quit IRC | 18:35 | |
*** hogepodge has joined #openstack-infra | 18:36 | |
lifeless | clarkb: care to abandon your 'test change do not merge' from d-g ? It's up in reviewers faces ;) | 18:37 |
openstackgerrit | Roman Prykhodchenko proposed a change to openstack-infra/config: Adds devstack-gate tests for Ironic https://review.openstack.org/53917 | 18:37 |
clarkb | lifeless: sure | 18:37 |
clarkb | done | 18:38 |
lifeless | thanks! | 18:40 |
*** sarob has quit IRC | 18:40 | |
lifeless | zuul still looks kindof unhappy? | 18:40 |
*** johnthetubaguy has quit IRC | 18:41 | |
clarkb | yup, gate fails have it near worst case so gate pipeline has backed up | 18:41 |
clarkb | last night I was seeing the head of the pipeline fail most of the time (we did merge three changes once ) | 18:41 |
clarkb | so not quite near worst case but not very happy either | 18:41 |
*** Ryan_Lane has joined #openstack-infra | 18:42 | |
lifeless | wheeee | 18:42 |
zul | where are the instructions for deploying openstack-ci for 3rd party testing? | 18:45 |
*** blamar has quit IRC | 18:45 | |
clarkb | zul: http://ci.openstack.org/third-party-testing.html iirc | 18:46 |
lifeless | zul: | 18:46 |
lifeless | http://ci.openstack.org/third_party.html | 18:46 |
lifeless | bah | 18:46 |
lifeless | :) | 18:46 |
*** nsaje has joined #openstack-infra | 18:46 | |
fungi | zul: though keep in mind that's not "installing openstack ci" (we assume people have their own ci of some variety they want to hook up to ours) | 18:46 |
fungi | building a mock openstack infra deployment to use for third-party testing is probably overkill, and also a much more involved topic | 18:47 |
zul | fungi: cool thanks | 18:47 |
*** mrmartin has joined #openstack-infra | 18:47 | |
*** blamar has joined #openstack-infra | 18:48 | |
*** datsun180b has joined #openstack-infra | 18:49 | |
clarkb | etherpad-dev is dead | 18:50 |
clarkb | probably not going to get etherpad.o.o before the meeting as I want to double check a few things | 18:50 |
anteaya | clarkb: well done | 18:50 |
*** senk has quit IRC | 18:54 | |
*** rnirmal has quit IRC | 18:58 | |
jog0 | jeblair: hacking has a apache 2 check thanks to sdague | 19:00 |
jog0 | and -1 to overloading the pep8 job any more than we have | 19:00 |
*** alcabrera|afk is now known as alcabrera | 19:01 | |
openstackgerrit | Khai Do proposed a change to openstack-infra/jenkins-job-builder: update doc and add new JJB unit tests https://review.openstack.org/56715 | 19:01 |
*** sarob has joined #openstack-infra | 19:01 | |
fungi | meetin time! | 19:01 |
jeblair | jog0: does licensecheck == apache 2 check? not so sure about that. | 19:01 |
jeblair | jog0: what would running that outside of the pep8 ("style check") job get us? | 19:02 |
jog0 | jeblair: someone can run 'tox -epep8' on their own box and get the same as the check | 19:02 |
clarkb | I think that is always the goal so tox would capture it at the very least | 19:03 |
jog0 | clarkb: the bug listed this as an apt-get install | 19:03 |
*** ruhe has joined #openstack-infra | 19:03 | |
*** jcoufal has quit IRC | 19:04 | |
jog0 | so I assume its a no go for tox | 19:05 |
jeblair | jog0: the bug is _very_ old | 19:05 |
jeblair | jog0: if it won't work in the pep8 job, then yes, it would need to be a different one | 19:05 |
jeblair | yolanda: ^ | 19:06 |
*** blamar has quit IRC | 19:06 | |
*** jcoufal has joined #openstack-infra | 19:06 | |
jog0 | jeblair: IMHO the bug is done, we have an apache2 test | 19:06 |
jog0 | jeblair: ohh and welcome back | 19:07 |
anteaya | jog0: joining us for -infra meeting? | 19:07 |
jeblair | jog0: apparently it fails on nova because run_tests.sh does not have a licence check | 19:07 |
jeblair | jog0: thanks! :) | 19:07 |
jog0 | jeblair: run_tests.sh isn't deliverable code, I am less worried about missing apache2 headers there | 19:07 |
clarkb | I don't think license check == apache 2 | 19:08 |
jog0 | anteaya: in the tripleo meeting, but ping me if anything comes up | 19:08 |
clarkb | I always interpreted that bug as look for potential license conflicts | 19:08 |
anteaya | jog0: very good | 19:08 |
*** david-lyle_lunch is now known as david-lyle | 19:08 | |
jog0 | clarkb: hmm | 19:08 |
yolanda | jeblair, so no need for that bugfix? or it still makes sense to run it as an extra check? | 19:09 |
pleia2 | yolanda: I think you mean jog0 :) | 19:09 |
jeblair | yolanda: let's see if we can get some more agreement from jog0 and mordred about that. | 19:10 |
yolanda | both :) | 19:10 |
yolanda | ok | 19:10 |
*** jamesmcarthur has quit IRC | 19:10 | |
jog0 | - licensecheck: given a list of source files, attempt to determine which license (or combination of licenses) each file is placed under. | 19:10 |
jog0 | if thats what it is, then we don't need it AFAIK | 19:11 |
*** jamesmcarthur has joined #openstack-infra | 19:11 | |
jog0 | its easy for us | 19:11 |
jog0 | apache2 FTW | 19:11 |
jog0 | yolanda: | 19:11 |
*** blamar has joined #openstack-infra | 19:11 | |
yolanda | so yes, licensecheck reports every file without a valid license, the reason for that licensecheck job is only for Apache? | 19:12 |
jog0 | yolanda: we only have apache2 | 19:12 |
clarkb | jog0: that isn't 100% true :) pbr is an offender | 19:13 |
jog0 | clarkb: WAT | 19:13 |
clarkb | pbr bundles d2to1 which isn't apache2 iirc | 19:14 |
clarkb | it is bsd or something | 19:14 |
jog0 | ahh anyway thats a special case | 19:14 |
jog0 | this bug doesn't seem valid anymore IMHO | 19:14 |
*** sandywalsh_ has joined #openstack-infra | 19:15 | |
*** mrmartin has quit IRC | 19:16 | |
openstackgerrit | Khai Do proposed a change to openstack-infra/jenkins-job-builder: fix jjb job template documentation https://review.openstack.org/57062 | 19:19 |
*** shardy is now known as shardy_afk | 19:20 | |
*** adalbas has quit IRC | 19:21 | |
* sdague finally getting back to having a functional laptop | 19:24 | |
jog0 | jeblair: want to see a fun graph http://paste.openstack.org/show/53572/ | 19:24 |
clarkb | sdague: \o/ | 19:24 |
*** whoops has joined #openstack-infra | 19:24 | |
clarkb | sdague: if you look at the bottom of the elastic-recheck page you will see mikal's current nemesis | 19:25 |
*** WarrenUsui has joined #openstack-infra | 19:25 | |
sdague | ok, well, I'm just in the barely getting back stage :) so will need to continue on setup much of the day | 19:25 |
sdague | I was however racing to get irc working again by TC meeting | 19:25 |
clarkb | sdague: as much as it hurts to say this no rush | 19:26 |
clarkb | we have been dealing with this for the last few days | 19:26 |
sdague | heh | 19:26 |
sdague | yeh, I noticed the spike gate pattern this morning | 19:26 |
openstackgerrit | A change was merged to openstack-infra/reviewstats: Add jenkins-job-builder as a broken out project https://review.openstack.org/56603 | 19:27 |
ttx | anteaya: sounds about right | 19:28 |
*** aardvark has quit IRC | 19:28 | |
anteaya | k | 19:30 |
*** sarob_ has joined #openstack-infra | 19:30 | |
openstackgerrit | Khai Do proposed a change to openstack-infra/jenkins-job-builder: fix jjb configuration documentation https://review.openstack.org/57062 | 19:30 |
*** sarob has quit IRC | 19:33 | |
*** sarob_ has quit IRC | 19:34 | |
*** sarob has joined #openstack-infra | 19:34 | |
*** rnirmal has joined #openstack-infra | 19:35 | |
jeblair | mordred: can you weigh in on the necessity of licensecheck? | 19:35 |
jeblair | mordred: jog0 thinks it's redundant (see above ^) | 19:36 |
jeblair | mordred: and i'd hate for yolanda to waste effort | 19:36 |
*** alexpilotti has joined #openstack-infra | 19:37 | |
yolanda | jeblair, ATM only wasted like 10 minutes looking at the bug, no worries | 19:37 |
portante | folks, regarding the 1251920 related failures, should we be rechecking/reverifying them? | 19:37 |
Hunner | clarkb: I have a puppetboard demo set up. Want to G+ screenshare some time today to see if it's something you'd be interested in to replace your dashboard use cases? | 19:38 |
*** jerryz has quit IRC | 19:39 | |
*** marun has quit IRC | 19:39 | |
*** sarob has quit IRC | 19:39 | |
clarkb | Hunner: maybe? I am still kind of swamped after the summit. anteaya and pleia2 might be interested too | 19:39 |
*** sarob_ has joined #openstack-infra | 19:39 | |
*** jerryz has joined #openstack-infra | 19:40 | |
pleia2 | Hunner: oh cool, if you and anteaya are free in an hour I'd love to have a look | 19:40 |
anteaya | let's try it | 19:41 |
anteaya | I can never get sound to work on this laptop but I can look | 19:41 |
*** markmc has quit IRC | 19:41 | |
Hunner | I have a meeting in 49 minutes, but will be free in 19 | 19:41 |
anteaya | or use my personal laptop | 19:42 |
anteaya | 19 minutes works for me | 19:42 |
*** markmc has joined #openstack-infra | 19:42 | |
pleia2 | haha, I have a meeting in 19 | 19:42 |
*** sarob_ has quit IRC | 19:42 | |
Hunner | pleia2: anteaya: 22:00UTC? | 19:42 |
pleia2 | sure | 19:43 |
anteaya | k | 19:43 |
Hunner | :D | 19:43 |
*** sarob has joined #openstack-infra | 19:44 | |
Hunner | anteaya: pleia2: Actually sorry, I have a 22:00-22:30utc. So how about 22:30-23:00? | 19:45 |
anteaya | sure | 19:45 |
pleia2 | ok | 19:46 |
*** sarob has quit IRC | 19:47 | |
*** sarob has joined #openstack-infra | 19:47 | |
*** chuck__ has joined #openstack-infra | 19:48 | |
*** sarob has quit IRC | 19:48 | |
*** sarob has joined #openstack-infra | 19:49 | |
*** rockyg has joined #openstack-infra | 19:49 | |
*** davidhadas has joined #openstack-infra | 19:51 | |
openstackgerrit | Peter Liljenberg proposed a change to openstack-infra/jenkins-job-builder: Added support for Jenkins plugin Blame upstream committers https://review.openstack.org/54085 | 19:53 |
*** alcabrera is now known as alcabrera|afk | 19:55 | |
*** dolphm has quit IRC | 19:55 | |
*** sarob has quit IRC | 19:57 | |
*** sarob_ has joined #openstack-infra | 19:57 | |
*** sarob_ has quit IRC | 19:59 | |
*** yassine has quit IRC | 19:59 | |
*** sarob has joined #openstack-infra | 19:59 | |
*** mihgen has joined #openstack-infra | 19:59 | |
jeblair | clarkb: would you like me to tag jjb 0.6.0 ? | 20:01 |
clarkb | jeblair: I am not sure. zaro mgagne were there any outstanding changes you feel really need to get in a release? | 20:01 |
jog0 | jeblair: so after lunch want to talk about the future of rechecks for a few minutes? | 20:02 |
*** sarob has quit IRC | 20:02 | |
clarkb | I did a quick review of what was there and we merged a few things. Not sure if there are others that are really important | 20:02 |
*** sarob has joined #openstack-infra | 20:02 | |
jeblair | jog0: sure | 20:02 |
zaro | clarkb: let me take a look. | 20:02 |
*** zul has quit IRC | 20:03 | |
zaro | clarkb: this would be the only one i feel strongly about https://review.openstack.org/#/c/56715 | 20:04 |
*** lcestari has quit IRC | 20:04 | |
*** ryanpetrello_ has joined #openstack-infra | 20:05 | |
*** sarob has quit IRC | 20:08 | |
*** sparkycollier has joined #openstack-infra | 20:08 | |
*** ryanpetrello has quit IRC | 20:09 | |
*** ryanpetrello_ is now known as ryanpetrello | 20:09 | |
*** yolanda has quit IRC | 20:09 | |
*** sarob has joined #openstack-infra | 20:09 | |
*** yolanda has joined #openstack-infra | 20:10 | |
clarkb | reviewed | 20:11 |
openstackgerrit | Monty Taylor proposed a change to openstack-infra/config: Add new HP Cloud region https://review.openstack.org/56260 | 20:12 |
mordred | clarkb: ^^ | 20:12 |
openstackgerrit | Sergey Lukjanov proposed a change to openstack-infra/config: Setup devstack-gate tests for Savanna https://review.openstack.org/57317 | 20:12 |
clarkb | mordred: reviewed, couple things noted | 20:13 |
clarkb | also ty | 20:13 |
*** alcabrera|afk is now known as alcabrera | 20:14 | |
jeblair | mordred: did you see the floating ip thing earlier? | 20:14 |
*** ace05_ has quit IRC | 20:15 | |
*** markmc has quit IRC | 20:16 | |
mordred | jeblair: just learning about it now | 20:16 |
*** japplewhite has joined #openstack-infra | 20:17 | |
openstackgerrit | Khai Do proposed a change to openstack-infra/jenkins-job-builder: update doc and add new JJB unit tests https://review.openstack.org/56715 | 20:19 |
*** ace05__ has joined #openstack-infra | 20:20 | |
*** vipul is now known as vipul-away | 20:21 | |
*** vipul-away is now known as vipul | 20:21 | |
*** mrmartin has joined #openstack-infra | 20:22 | |
sparkycollier | mordred: does the new hp trunk zone include Heat? | 20:23 |
mordred | sparkycollier: no | 20:23 |
*** mrmartin has quit IRC | 20:26 | |
pabelanger | Asked before but didn't see a reply, but why don't we include tox -e venv -- python setup.py build_sphinx by default as a tox.ini env? | 20:27 |
openstackgerrit | Khai Do proposed a change to openstack-infra/jenkins-job-builder: update doc and add new JJB unit tests https://review.openstack.org/56715 | 20:28 |
clarkb | pabelanger: I don't know | 20:29 |
clarkb | pabelanger: though it may be reasonable to do that especially since some projects are tracking different requirements for doc builds. I think the biggest hurdle is we currently use tox -evenv -- python setup.py build_sphinx in the doc build jobs, so we would need to update all projects then switch the doc job script | 20:30 |
dprince | sdague: you mentioned a grenade fix related to https://review.openstack.org/#/c/57066/? | 20:30 |
pabelanger | clarkb, ya, I suspect that might be the issue | 20:31 |
sdague | dprince: yeh, maurosr said he'd get the patch posted today | 20:31 |
sdague | later in that meeting | 20:31 |
*** yolanda has quit IRC | 20:32 | |
dprince | sdague: ah, cool. I missed that. (too many concurrent meetings) | 20:32 |
japplewhite | question all: is this bug https://bugs.launchpad.net/devstack/+bug/1248923 causing any problems in the CI environment? If not can someone explain why not? (I would like for our test env to not be affected too!) | 20:33 |
uvirtbot | Launchpad bug 1248923 in devstack "Devstack install is failing:" [Undecided,Confirmed] | 20:33 |
sdague | dprince: I hear you | 20:35 |
*** vipul is now known as vipul-away | 20:36 | |
*** Hefeweizen has quit IRC | 20:37 | |
openstackgerrit | Sergey Lukjanov proposed a change to openstack-infra/config: Setup devstack-gate tests for Savanna https://review.openstack.org/57317 | 20:43 |
clarkb | we don't enable novnc so we don't hit that problem | 20:46 |
japplewhite | clarkb: thanks - is that considered deprecated then? | 20:47 |
clarkb | I don't think so. more of an "always problematic so we ignore it" issue | 20:48 |
*** dolphm has joined #openstack-infra | 20:48 | |
clarkb | there is history there that I don't have enough familiarity with | 20:49 |
*** DinaBelova has quit IRC | 20:49 | |
fungi | also i have doubts we'd be able to exercise novnc properly in an unattended/headless test | 20:49 |
*** Hefeweizen has joined #openstack-infra | 20:49 | |
fungi | at least not without a lot of work to emulate client interaction with the interface | 20:50 |
*** pcrews has quit IRC | 20:50 | |
fungi | anyone want to guess what happened here? https://jenkins01.openstack.org/job/gate-python-glanceclient-pep8/191/console | 20:55 |
clarkb | fungi: looking | 20:55 |
fungi | i'm suspecting a broken git repo in that workspace. confirming now | 20:55 |
japplewhite | clarkb - devstack is working on Ubuntu 13.04 - just tested it…seems limited to 12.04 | 20:56 |
anteaya | fungi yeah, the git disappeared | 20:56 |
anteaya | japplewhite: can you add that to the bug report? | 20:57 |
fungi | clarkb: yeah, http://paste.openstack.org/show/53619/ | 20:58 |
fungi | clearing | 20:58 |
*** datsun180b_ has joined #openstack-infra | 20:59 | |
fungi | unfortunately that issue caused at least one gate reset | 21:00 |
*** Ryan_Lane1 has joined #openstack-infra | 21:01 | |
*** Ryan_Lane has quit IRC | 21:01 | |
openstackgerrit | Sergey Lukjanov proposed a change to openstack-infra/devstack-gate: Add Savanna testing support https://review.openstack.org/57325 | 21:02 |
*** datsun180b has quit IRC | 21:02 | |
*** datsun180b_ is now known as datsun180b | 21:02 | |
*** pcrews has joined #openstack-infra | 21:03 | |
sdague | mordred: I actually know a guy that knows some prolog | 21:05 |
sdague | if you had interest in solving it that way | 21:05 |
anteaya | sdague: zaro is always looking for folks with more prolog | 21:06 |
*** japplewhite has quit IRC | 21:08 | |
anteaya | sdague: how goes the new job? | 21:08 |
anteaya | finding the coffee maker okay? | 21:09 |
*** Ryan_Lane1 is now known as Ryan_Lane | 21:09 | |
*** Ryan_Lane has joined #openstack-infra | 21:09 | |
sdague | um... still setting up new laptop, will be slow for a couple days still | 21:09 |
anteaya | sdague: happy? | 21:10 |
sdague | so far so good :) | 21:10 |
anteaya | yay | 21:11 |
anteaya | glad for you | 21:11 |
*** rahmu has quit IRC | 21:12 | |
*** yamahata_ has quit IRC | 21:12 | |
*** nsaje has quit IRC | 21:13 | |
*** ruhe has quit IRC | 21:13 | |
*** nsaje has joined #openstack-infra | 21:14 | |
*** rahmu has joined #openstack-infra | 21:14 | |
*** ericw has joined #openstack-infra | 21:16 | |
*** alcabrera has quit IRC | 21:17 | |
mriedem | mikal: hey, so regarding the direct emails from shaomei ji, sorry about that, i gave those guys a lecture on a call this morning | 21:18 |
mikal | Heh, thanks | 21:22 |
mikal | I'm sure he means well | 21:22 |
mikal | I think he's just very confused about who I am | 21:22 |
mikal | And how big I am in infra | 21:22 |
mriedem | yeah, they are new | 21:22 |
mriedem | that's why i had to start getting on calls with them b/c they were not making progress | 21:23 |
mriedem | so when they were like 'well we emailed this guy and blah blah blah' the sirens went off | 21:23 |
mikal | I have learned never to reply to requests and say "root is a bad account name" | 21:23 |
mikal | I am being punished for not making clarkb do that | 21:24 |
mriedem | ha | 21:24 |
clarkb | mikal: but we really appreciated your helpfulness :) would request mikal help again | 21:24 |
mikal | LOL | 21:24 |
*** davidhadas_ has joined #openstack-infra | 21:24 | |
* mikal creates bogus LP bugs to test changes because clarkb is a hater | 21:25 | |
*** vipul-away is now known as vipul | 21:26 | |
*** davidhadas has quit IRC | 21:26 | |
*** MarkAtwood has joined #openstack-infra | 21:26 | |
clarkb | mikal: 1251920 is the biggest pain ever | 21:27 |
clarkb | I am going to kill old etherpad.o.o now but when that is over I am half tempted to become your apprentice until 1251920 is fixed | 21:27 |
clarkb | s/apprentice/sidekick/ because super heros are cooler than magicians | 21:28 |
mikal | clarkb: yeah, my problem with that bug is I am running out of ideas | 21:28 |
mikal | clarkb: the only one I have left is that perhaps we should just increase the timeout and see if that fixes it | 21:28 |
mikal | Although I can't see it helping for the "no console logs at all" case | 21:29 |
mikal | It would be super nice if we could poke around a failed instance | 21:29 |
clarkb | mikal: ok, let me see what I can do about holding nodes. we can try holding a handful then hope one of them runs into the failure | 21:29 |
clarkb | but first must kill etherpad.o.o | 21:29 |
mriedem | mikal: i don't think increasing the timeouts would help | 21:29 |
mriedem | in some cases the mismatch is 9 != 10, and you'll see a lot of responses with the 9 split output | 21:29 |
clarkb | fungi: jeblair: do you have any better ideas than grabbing a few tests slaves and crossing fingers? | 21:30 |
*** jamesmcarthur has quit IRC | 21:30 | |
mriedem | my feeling is that's not going to change regardless of the timeout | 21:30 |
mikal | clarkb: the failure rate is pretty high | 21:30 |
mikal | I don't think you'd need to cross your fingers all that much... | 21:30 |
*** jhesketh__ has quit IRC | 21:30 | |
*** rockyg has quit IRC | 21:30 | |
clarkb | ya | 21:30 |
*** jhesketh_ has joined #openstack-infra | 21:30 | |
clarkb | fungi: jeblair: the oldest local DB backup on new and old servers is from october 21st so they overlap. I think I am good to delete the old server | 21:31 |
clarkb | fungi: jeblair: is there anything you would like me to check in addition to DB backups? | 21:31 |
openstackgerrit | Michael Still proposed a change to openstack-infra/jeepyb: Allow automatic subscription to DocImpact bugs https://review.openstack.org/56158 | 21:31 |
*** esker has joined #openstack-infra | 21:31 | |
*** che-arne has joined #openstack-infra | 21:34 | |
*** yamahata_ has joined #openstack-infra | 21:34 | |
*** jamesmcarthur has joined #openstack-infra | 21:36 | |
*** atiwari has quit IRC | 21:36 | |
*** jamesmcarthur has quit IRC | 21:36 | |
mikal | clarkb: so... this config file I am pushing with puppet | 21:37 |
clarkb | ya | 21:37 |
mikal | clarkb: where are the other config files for gerrit? Where should I put it on the machines? | 21:37 |
mikal | /home/gerrit2>? | 21:38 |
clarkb | mikal: I think it can live where the shell wrapper scripts for jeepyb scripts live | 21:38 |
mikal | clarkb: ok | 21:38 |
clarkb | mikal: eg the patchset-created or change-merged scripts | 21:38 |
clarkb | that call out to jeepyb | 21:38 |
*** japplewhite has joined #openstack-infra | 21:39 | |
jeblair | clarkb: hold and cross fingers is best idea | 21:40 |
jeblair | clarkb: can't think of anything else to check | 21:40 |
mriedem | was wondering if maybe danpb could help with the console problems, i saw he committed a change to libvirt within the last week regarding consoles | 21:41 |
mriedem | http://libvirt.org/git/?p=libvirt.git;a=commit;h=5087a5a0092853702eb5e0c0297937a7859bcab3 | 21:42 |
notmyname | jeblair: russellb: idea for gerrit to get more people to do reviews: add a column that shows how many reviews the author of the submitted patch has done in the past 30 days | 21:42 |
mriedem | that wouldn't be related to what we're seeing b/c danpb's patch was for pty but we're seeing file consoles in the logs | 21:42 |
*** japplewhite has quit IRC | 21:43 | |
mriedem | notmyname: but is the new contributor implicitly punished then? | 21:43 |
mikal | mriedem: including danpb is a good idea | 21:43 |
mikal | mriedem: he's in the EU though, so might be hard to catch now | 21:43 |
mriedem | mikayeah | 21:43 |
mriedem | mikal: yeah | 21:43 |
russellb | notmyname: i've started tracking that in my stats | 21:44 |
notmyname | russellb: ya, I saw that today. it's similar | 21:44 |
notmyname | russellb: I've got the same issue in swift that you have in nova: an ever-growing patch queue and a static number of hours in the day | 21:45 |
notmyname | russellb: and I'd love to have an answer on how to keep it manageable | 21:46 |
clarkb | jeblair: fungi: if there is nothing else to check I am deleting etherpad.o.o old now | 21:46 |
mriedem | notmyname: this might be a good example of why i think it might not work https://review.openstack.org/#/c/56381/ | 21:46 |
russellb | we have some great reviewers, and then a long tail of infrequent contributors that don't really review (and may not be good candidates to anyway). it's tough | 21:46 |
jeblair | clarkb: +1 | 21:46 |
openstackgerrit | Michael Still proposed a change to openstack-infra/config: Add an initial subscriber map for notify_impact https://review.openstack.org/57332 | 21:46 |
russellb | just need to continue grooming regulars to help with the load | 21:46 |
notmyname | russellb: yup | 21:46 |
mriedem | i triaged that bug this weekend, the guy pushed a patch but when i assigned the bug to him in launchpad, it was his first one | 21:46 |
mikal | clarkb: wanna re-review https://review.openstack.org/#/c/56158/ and then https://review.openstack.org/57332 | 21:46 |
mikal | ? | 21:46 |
russellb | notmyname: but at least having numbers helps my sanity ... it was just constant feeling of drowning before | 21:47 |
mriedem | notmyname: so it's good that he's reporting the bug and pushing the patch, but if people ignore his patch b/c it's his first contribution, that's probably bad | 21:47 |
russellb | i can see 8 days, vs feeling like it must be 92340234 days based on some attitudes | 21:47 |
russellb | :) | 21:47 |
russellb | good times. | 21:47 |
notmyname | mriedem: ya, it may be a horrible idea (if it is, I'll take credit. if not, I'll tell you who came up with it) | 21:47 |
mriedem | notmyname: i'm not actually sure it's an original idea - seems like that came up in the ML thread about the same topic back when hyperv was threatening to go out of tree for lack of review love | 21:48 |
notmyname | mriedem: ah. could be | 21:48 |
openstackgerrit | Khai Do proposed a change to openstack-infra/config: add nodepool-dev to jenkins-dev server https://review.openstack.org/57333 | 21:49 |
clarkb | jeblair: fungi: nova delete executed now we wait | 21:49 |
mriedem | does gerrit have a limit on showing the number of patches i'm currently reviewing? | 21:49 |
clarkb | and it is gone | 21:49 |
mriedem | seems i can't always find the most recent ones | 21:49 |
clarkb | mikal: sure | 21:50 |
*** denis_makogon_ has joined #openstack-infra | 21:51 | |
mikal | clarkb: ta | 21:51 |
openstackgerrit | Khai Do proposed a change to openstack-infra/config: add nodepool to jenkins-dev server https://review.openstack.org/57333 | 21:51 |
*** sparkycollier has quit IRC | 21:53 | |
jog0 | jeblair: ping | 21:53 |
*** nati_ueno has joined #openstack-infra | 21:53 | |
jeblair | jog0: pong | 21:54 |
clarkb | jeblair: https://jenkins01.openstack.org/job/tempest-docs/245/console zuul triggered a bunch of jobs with null zuul refs, any idea of why that may happen? | 21:54 |
jeblair | clarkb: branch deletion | 21:54 |
openstackgerrit | Khai Do proposed a change to openstack-infra/config: add nodepool to jenkins-dev server https://review.openstack.org/57333 | 21:54 |
clarkb | jeblair: gotcha thanks | 21:55 |
fungi | clarkb: sorry, was away cooking dinner (still am sort of, back shortly) but +++ on etherpad.o.o deletion | 21:56 |
clarkb | fungi: its gone | 21:57 |
clarkb | so \o/ | 21:57 |
jog0 | so according to my most likely slightly wrong graph in http://paste.openstack.org/show/53572/ | 21:58 |
jog0 | jeblair: our recheck policy is resulting in the gate not working | 21:58 |
*** gyee has quit IRC | 21:58 | |
*** denis_makogon_ is now known as denis_makogon | 21:59 | |
clarkb | ooh we get to continue this weekend's discussion | 22:00 |
fungi | yes, that seems to be a common conclusion | 22:00 |
jeblair | jog0: studying graph | 22:01 |
notmyname | jog0: to help me understand, that graphs the percentage of jobs that result in FAILURE? | 22:01 |
jog0 | notmyname: yes | 22:02 |
jog0 | at least that is the intention of it | 22:02 |
jog0 | failure / (failure+pass) | 22:02 |
notmyname | jog0: which means that gate-tempest-devstack-vm-neutron is always passing, right? | 22:03 |
jog0 | notmyname: it appears to be the case | 22:03 |
notmyname | ok, thanks | 22:03 |
jog0 | but check is not | 22:03 |
notmyname | ya, I was just trying to learn to graphite ;-) | 22:03 |
*** hogepodge has quit IRC | 22:03 | |
jog0 | notmyname: I'm new to it too | 22:04 |
fungi | i personally think that nondeterministic failures breed more nondeterministic failures, because people are so used to having to reverify their patches to get them to merge that they are doing so even when it's their patch which is introducing a nondeterministic bug | 22:04 |
jog0 | and have a strong hunch I am doing something wrong | 22:04 |
jog0 | fungi: I agree with you nondeterministically :) | 22:04 |
jog0 | jeblair: I think fungi summed it up pretty well | 22:04 |
clarkb | fungi: ++ | 22:04 |
*** hogepodge has joined #openstack-infra | 22:04 | |
fungi | and i avoided saying "flaky" so as to not raise sdague's ire | 22:05 |
clarkb | mikal: ok changes reviewed, I am going to hold a few nodes across rackspace and hp now | 22:05 |
* fungi puts a dollar in the "flaky" jar | 22:05 | |
clarkb | fungi: we need a bot to track flaky jar and quantum jar use then at the next summit we have it buy dev lounge goodies or something | 22:05 |
jeblair | fungi: i don't think "flaky" is the objection but "$FLAKY test" | 22:05 |
*** japplewhite has joined #openstack-infra | 22:06 | |
*** dcramer_ has quit IRC | 22:06 | |
fungi | flaky pastry | 22:06 |
jeblair | fungi: as in, it's not usually the _test_ that's flaky, these days. | 22:06 |
jeblair | anyway | 22:06 |
* fungi nods | 22:06 | |
jog0 | so the check queue is failing just under around 50% of the time | 22:06 |
jeblair | i wound tend to agree with this sentiment, jog0, fungi | 22:06 |
* fungi feels very wounded now | 22:07 | |
jeblair | fungi: ? | 22:07 |
clarkb | mikal: does 1251920 affect all devstack jobs? neutron, postgres and normal full? or just postgres and full? (making sure I hold nodes running the correct tests) | 22:08 |
fungi | sorry, making fun of your typo. i should just assume jetlag fingers | 22:08 |
*** dkliban has quit IRC | 22:08 | |
openstackgerrit | Clay Gerrard proposed a change to openstack-dev/hacking: Add noqa support for H201 (bare except) https://review.openstack.org/57334 | 22:08 |
jeblair | fungi: aha. :) | 22:08 |
openstackgerrit | Michael Still proposed a change to openstack-infra/jeepyb: Allow automatic subscription to DocImpact bugs https://review.openstack.org/56158 | 22:09 |
jeblair | jog0: so we've talked about removing the recheck commands... | 22:09 |
jog0 | getting the real failure rate for check: | 22:09 |
jeblair | i think this is something we should probably only do when things are working well, as otherwise it would just grind the gate to a halt and make more work for everyone | 22:09 |
*** nati_ueno has quit IRC | 22:10 | |
mikal | clarkb: I've just uploaded a new version of notify_impact with fixed whitespace | 22:10 |
jog0 | with postgres | 22:10 |
clarkb | jeblair: jog0: I was actually thinking keeping recheck but removing reverify would help | 22:10 |
clarkb | that allows people to sort out problems pre merge but not during gating | 22:10 |
jog0 | http://paste.openstack.org/show/53626/ | 22:10 |
clarkb | then we would rely on cores to do assessments and triage before reapproving | 22:10 |
jog0 | clarkb: so removing recheck isn't enough IMHO | 22:11 |
jog0 | there is a deeper issue | 22:11 |
jeblair | clarkb: reasonable | 22:11 |
clarkb | mordred pointed out we have far too many cores to make that practical though | 22:11 |
clarkb | but I think we can try it | 22:11 |
clarkb | jog0: do go on | 22:11 |
jog0 | ok so current check failure rate: | 22:11 |
*** japplewhite has left #openstack-infra | 22:11 | |
*** pcm_ has quit IRC | 22:12 | |
*** yamahata_ has quit IRC | 22:12 | |
jog0 | 1-(0.7)*(0.7)*(0.88)=56% | 22:12 |
jog0 | so check fails 56% of time | 22:12 |
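
The arithmetic jog0 quotes can be reproduced with a minimal sketch. It assumes the three factors (0.7, 0.7, 0.88) are pass rates of three check jobs that fail independently of each other; the log does not say which job each figure belongs to, and the function name below is hypothetical.

```python
from math import prod


def combined_failure_rate(pass_rates):
    """Chance that at least one job fails, assuming the jobs are independent."""
    return 1.0 - prod(pass_rates)


# Pass rates quoted in the discussion; the job behind each figure is not stated.
pass_rates = [0.7, 0.7, 0.88]
rate = combined_failure_rate(pass_rates)
print(f"check fails roughly {rate:.1%} of the time")
```

Under that independence assumption the result is a little under 57%, in line with the roughly 56% quoted above.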
jog0 | taking a step back the goal of gate is to keep trunk working | 22:14 |
clarkb | mikal: in addition to tests were you seeing a higher incidence on hpcloud or rax or is it a crapshoot? | 22:14 |
jog0 | I don't think we are doing that today | 22:14 |
clarkb | jog0: I agree | 22:14 |
*** davidhadas_ has quit IRC | 22:14 | |
mikal | clarkb: I had a theory briefly yesterday that it only happened on rax, but I couldn't find data to support that | 22:14 |
jog0 | so this is what I have: | 22:14 |
jog0 | top goal: keep gate stable and green | 22:14 |
jeblair | jog0: i agree, though not because check fails 56% of the time, but rather because gate fails so often. | 22:14 |
fungi | right, i think the technical measures we've put in place presently guarantee that a change will be merged so long as it works "some of the time" | 22:14 |
clarkb | mikal: ok, and which jobs trigger it? is it tempest*full and tempest*postgres-full? | 22:15 |
mikal | clarkb: yes | 22:15 |
clarkb | ok going to hold a few nodes now | 22:15 |
jog0 | subgoals: | 22:15 |
jog0 | * when bugs get past the gate, squash quickly | 22:15 |
jog0 | * make it harder for bugs to get past the gate | 22:15 |
fungi | clarkb: i'm curious to know how your node holding turns out. i can't remember whether that's expected to work now | 22:15 |
jeblair | fungi: why wouldn't it? | 22:16 |
jog0 | I think we need solutions to both of those issues | 22:16 |
*** anteaya_ has joined #openstack-infra | 22:16 | |
fungi | jeblair: at one point nodepool hold worked but the nodepool status change code didn't check for that status or did so incorrectly. i think that got fixed though--i just haven't had a chance to try it since | 22:16 |
jog0 | and addressing the nature of 'recheck' is only a subset of those | 22:16 |
clarkb | fungi: pretty sure it works now I did it semi recently for a different bug | 22:17 |
fungi | oh, awesome | 22:17 |
*** ftcjeff has joined #openstack-infra | 22:17 | |
anteaya_ | Hunner: you wanted to do something on G+ is that correct? | 22:17 |
*** sandywalsh_ has quit IRC | 22:18 | |
mikal | clarkb: of 38 fails, 28 are rax, 10 are not | 22:18 |
jog0 | jeblair: so as fungi pointed out we have gotten accustomed to using recheck which is bad | 22:18 |
jeblair | jog0: removing/reducing recheck/reverify makes it harder for a change to land if it fails tests sometimes | 22:18 |
clarkb | mikal: ok I held 6 rax 2 hp | 22:18 |
jeblair | jog0: that addresses point 2 | 22:18 |
jog0 | jeblair: partially yes, I think we need to run the tests multiple times too | 22:19 |
*** mriedem has quit IRC | 22:19 | |
fungi | jog0: so anyway, one solution i was noodling around was whether we could integrate the e-r matching so that changes could only be rechecked/reverified if e-r was already successfully classifying it. and then have strict rules in place not to add matches in e-r unless more than one change encountered that issue already | 22:19 |
fungi | not sure whether reasonable, but it was a jumping off point anyway | 22:19 |
jog0 | fungi: I think that's a good idea | 22:19 |
*** ljjjusti1 has joined #openstack-infra | 22:19 | |
clarkb | mikal: the nodes I held are running these jobs https://jenkins01.openstack.org/job/check-tempest-devstack-vm-postgres-full/6677/ https://jenkins01.openstack.org/job/check-tempest-devstack-vm-full/7009/ https://jenkins01.openstack.org/job/check-tempest-devstack-vm-postgres-full/6678/ https://jenkins01.openstack.org/job/check-tempest-devstack-vm-full/7011/ | 22:19 |
jeblair | jog0: it might address point 1, if the lack of ability to easily merge a change that doesn't fix an outstanding bug is seen as a motivator for people to squash bugs. i don't think it will be, so i don't think it addresses point 1. | 22:19 |
clarkb | https://jenkins02.openstack.org/job/check-tempest-devstack-vm-full/6138/ https://jenkins02.openstack.org/job/check-tempest-devstack-vm-full/6139/ https://jenkins02.openstack.org/job/check-tempest-devstack-vm-postgres-full/6370/ https://jenkins02.openstack.org/job/check-tempest-devstack-vm-postgres-full/6373/ | 22:19 |
jog0 | just to do a full brain dump I had a few other ideas (mostly orthogonal) | 22:19 |
lifeless | fungi: so I like the way you're thinking, but stats wise | 22:20 |
*** nati_ueno has joined #openstack-infra | 22:20 | |
lifeless | I'm not sure it makes sense | 22:20 |
fungi | yeah, it very well may be idiotic | 22:20 |
jeblair | jog0: we've had the ability to run jobs multiple times now, but no one seems to have taken advantage of that... | 22:20 |
clarkb | fungi: I like the idea simply because it forces us to track the problems | 22:20 |
fungi | mainly just trying to stab at how do we keep people from rechecking a new nondeterministic failure into trunk | 22:20 |
clarkb | but it may add significant overhead for folks like sdague and jog0 | 22:20 |
jeblair | jog0: sdague had a change up to run neutron 4 times, which prompted me to write the code to make that change work, but he abandoned it, and i haven't seen anything since. | 22:20 |
jog0 | on a high level we need to put more pressure on devs when gate isn't stable | 22:21 |
jog0 | what fungi proposed helps with that | 22:21 |
jeblair | jog0: to be fair, i haven't actually looked at reviews in 3 weeks. | 22:21 |
jog0 | we are getting better at identifying the issues at hand | 22:21 |
clarkb | mikal: I will try keeping an eye on those, but please let me know if you have ruled any of them out allowing them to be released | 22:21 |
jog0 | but we can do better at finding bugs | 22:21 |
jeblair | jog0: but i'm definitely okay and the system is ready to run jobs multiple times | 22:21 |
jog0 | I think part of the issue is no one reads the log files | 22:21 |
jog0 | jeblair: cool | 22:22 |
sdague | jeblair: so I fixed the neutron needs to run more jobs thing by blasting out a different feature matrix | 22:22 |
jog0 | jeblair: can we just start running everything 2x ? | 22:22 |
*** dprince has quit IRC | 22:22 | |
jog0 | clarkb: I am fine with the overhead of only rechecking if in e-r | 22:22 |
jog0 | sounds reasonable | 22:22 |
jog0 | and if we make it harder for bugs to get in the burden will slow down over time | 22:22 |
*** thomasem has quit IRC | 22:23 | |
sdague | jog0: I think until we get at least the start of the dashboard, we're going to drive more breaks, but not more clarity. I won't really be able to get focus time on that until next week. | 22:23 |
*** ljjjustin has quit IRC | 22:23 | |
jeblair | jog0: it is technically possible to run everything 2x. atm that means adding a lot of lines to zuul's layout.yaml (at least, until one of us gets around to fixing the templating support to make that easier) | 22:23 |
*** gyee has joined #openstack-infra | 22:23 | |
*** xeyed4good has joined #openstack-infra | 22:24 | |
jeblair | jog0, fungi: i'm not certain about the practicality of directly integrating zuul and e-r, however, we can update the recheck regexes to match the list of bugs in e-r | 22:24 |
jeblair | it's ugly and annoying, but will work immediately | 22:24 |
jog0 | sdague: I think the dashboard will help a lot, but more for when gate is more stable and we have more subtle bugs | 22:24 |
jog0 | (not that it won't help now too) | 22:24 |
lifeless | fungi: the problem is that if something occurs (say) 10 % of the time | 22:25 |
jog0 | so there is another aspect to this | 22:25 |
jog0 | fixing the bugs | 22:25 |
lifeless | fungi: it will still get into trunk | 22:25 |
lifeless | fungi: so your thing would help identify what's broken more rigorously, but wouldn't stop things breaking | 22:25 |
clarkb | I just noticed my sampling is iad and hpaz1 not very diverse... hopefully we don't need to do more than a couple passes at holding nodes :) | 22:25 |
lifeless | fungi: perhaps that is something we need | 22:25 |
lifeless | fungi: but it should be looked at that way, not as a prevention tool | 22:26 |
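
lifeless's point about a bug that shows up 10% of the time, and jog0's earlier suggestion of running jobs more than once, can be illustrated with a small sketch. It assumes every run fails independently with the same probability and that a failed attempt can simply be retried via recheck/reverify; the function and parameter names are hypothetical.

```python
def p_merge(fail_rate, runs_per_attempt=1, attempts=1):
    """Chance a change lands when every run in an attempt must pass
    and a failed attempt can be retried up to `attempts` times."""
    p_attempt_passes = (1.0 - fail_rate) ** runs_per_attempt
    return 1.0 - (1.0 - p_attempt_passes) ** attempts


single = p_merge(0.10)                        # one run, no retry
retried = p_merge(0.10, attempts=3)           # up to three tries
doubled = p_merge(0.10, runs_per_attempt=2)   # each job run twice, no retry
print(single, retried, doubled)
```

With these assumptions a 10% failure still lands about 90% of the time on the first try and almost always within three tries, while running each job twice without retries lowers that to about 81%. That matches the observation above that such measures identify breakage more rigorously rather than prevent it.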
*** svarnau has joined #openstack-infra | 22:26 | |
jog0 | with regard to actually getting bugs fixed, I don't think we have the right culture around fixing these gate hitting bugs | 22:27 |
*** hashar has joined #openstack-infra | 22:27 | |
jog0 | IMHO they should be marked as critical and all hands on deck for the related teams | 22:27 |
jog0 | especially for the big ones | 22:27 |
clarkb | mikal: I think I got one ! | 22:27 |
*** hashar has quit IRC | 22:28 | |
mikal | clarkb: http://logs.openstack.org/33/54833/8/check/check-tempest-devstack-vm-full/fb77190/console.html is a winner on one of your held machines | 22:28 |
*** ArxCruz has quit IRC | 22:28 | |
clarkb | mikal: yup thats the one, have a public key I can throw on the host? | 22:29 |
mikal | Sure, one sec | 22:29 |
jog0 | jeblair: so I wanted to talk to you, (and sdague and other TC people) because I think we need a few drastic steps ASAP | 22:29 |
*** hashar has joined #openstack-infra | 22:30 | |
jeblair | jog0: how should getting all-hands on deck work? | 22:30 |
jog0 | jeblair: not sure | 22:30 |
jeblair | jog0: should the qa team have the ability to triage those bugs and set them as critical? | 22:30 |
jog0 | I think so | 22:30 |
jog0 | across all projects | 22:30 |
jog0 | and they shouldn't have to fix em | 22:30 |
jog0 | I think part of that is blocking gate | 22:30 |
jog0 | not exactly block but something along those lines | 22:31 |
jog0 | russellb: ^ | 22:31 |
jog0 | you may want to lurl | 22:31 |
jog0 | lurk | 22:31 |
*** sdake_ has quit IRC | 22:31 | |
jeblair | jog0: and expect that within 24 hours after being triaged, projects should have someone assigned to them? | 22:31 |
jog0 | jeblair: I don't think rules like that are enough | 22:31 |
jog0 | it depends on how bad the bug actually is | 22:31 |
jeblair | jog0: it's less about _rules_ and more about process | 22:32 |
jog0 | this is where sdague's dashboard is nice | 22:32 |
jog0 | we want to ask, what % of failures are because of bug x | 22:32 |
Hunner | anteaya: anteaya_: pleia2: Does https://plus.google.com/hangouts/_/7acpiif9bk4ip30vud3e224rdg?hl=en work for you? | 22:32 |
*** sdake_ has joined #openstack-infra | 22:32 | |
jog0 | if its bad then gather the troops and fix ASAP | 22:32 |
jeblair | jog0: it sounds like your goal is to get people working on them, so that's the strawman i'm putting out -- a process for ensuring people are working on them. | 22:32 |
pleia2 | Hunner: seems to! | 22:32 |
jog0 | jeblair: so thats part of it | 22:32 |
jog0 | and a good start | 22:32 |
pleia2 | Hunner: haha, doh, when I tried to connect "You're not allowed to join this video call." | 22:32 |
jog0 | but there is the second part | 22:33 |
*** SergeyLukjanov has quit IRC | 22:33 | |
Hunner | pleia2: Okay, just a sec... | 22:33 |
jog0 | do we tweak how recheck is used? do we run all tests twice to make the gate a little more rigorous? | 22:33 |
jog0 | also we need to get buy in from teams | 22:33 |
fungi | sorry, stepped away | 22:33 |
jog0 | so I was thinking a ML post? | 22:33 |
jeblair | jog0: yes, of course. i thought you wanted brainstorming. | 22:33 |
*** ArxCruz has joined #openstack-infra | 22:33 | |
jog0 | I do | 22:33 |
jog0 | I was hoping we could get a few ideas and then take the ones that make the most sense to the ML | 22:34 |
jog0 | I like clarkb's idea about recheck | 22:34 |
fungi | lifeless: agreed, i wasn't expecting any solution to drive out 100% of nondeterministic failures, just discouraging the current behavior of reverifying because of a nondeterministic bug you're introducing in your change just to get it to land | 22:34 |
*** whoops has quit IRC | 22:34 | |
jog0 | so if we kill 'recheck no bug' | 22:35 |
Hunner | anteaya: anteaya_: pleia2: https://plus.google.com/hangouts/_/7ecpj1pbmc3jnmao9coklmbp0k?hl=en | 22:35 |
jog0 | that will make sure we do a better job of classifying bugs and hopefully that will make fixing them a higher priority | 22:35 |
* jog0 opens up an etherpad | 22:35 | |
jog0 | https://etherpad.openstack.org/p/future-of-recheck | 22:36 |
jog0 | jeblair: lets brainstorm in there | 22:36 |
anteaya_ | Hunner: I never use G+ so I have to install the plugins | 22:37 |
Hunner | anteaya_: Okay | 22:37 |
fungi | infra does have a need to be able to readd changes into gate and check pipelines, but maybe we can address that through something other than recheck/reverify comments | 22:37 |
*** nati_ueno has quit IRC | 22:37 | |
jeblair | fungi: that's been a goal for a while | 22:37 |
*** dkranz has quit IRC | 22:38 | |
*** nati_ueno has joined #openstack-infra | 22:38 | |
fungi | jog0: what was clarkb's idea you're referring to? i seem to have missed it in scrollback | 22:38 |
clarkb | fungi: removing the ability to reverify? | 22:38 |
fungi | oh, that. okay | 22:39 |
hashar | Hunner: that was antoine, sorry :D | 22:39 |
mikal | So, I now have access to a jenkins node which had the problem | 22:39 |
mikal | At 22:19:08 we started an instance | 22:39 |
fungi | combining some ideas, how about running multiple instances of all jobs if it's due to a recheck/reverify? might conserve resources but make it harder to let those introduce new failures | 22:39 |
jog0 | fungi lifeless clarkb: https://etherpad.openstack.org/p/future-of-recheck | 22:40 |
clarkb | fungi: that allows changes that passed but should've failed through | 22:40 |
clarkb | fungi: we have a higher chance of catching those if we just run more jobs | 22:40 |
fungi | well, true. i meant if we can't spare enough resources to double or triple all job counts, then only do it as a deterrent to reverify-induced merging of new issues | 22:41 |
*** alchen99 has joined #openstack-infra | 22:41 | |
fungi | agreed that being able to shore up even initial attempts against new nondeterministic bugs is better | 22:42 |
clarkb | firefox kept disconnecting me from the etherpad... chromium seems to be fine with it (if anyone else is having trouble) | 22:43 |
*** rnirmal has quit IRC | 22:43 | |
jeblair | clarkb: i'm fine in ff | 22:43 |
jeblair | clarkb: did you get a phone call yesterday? | 22:44 |
clarkb | jeblair: yes | 22:44 |
jeblair | clarkb: did you speak to mordred about it? | 22:44 |
clarkb | I did | 22:44 |
jeblair | clarkb: because mordred has been VERY supportive of our using quite as many resources as we want in order to test all the things | 22:44 |
jeblair | clarkb: i had taken that as being somewhat representative of hp's willingness to support this effort | 22:44 |
clarkb | jeblair: yup, I think there is tension between running the cloud and the resources we want, I punted to mordred and haven't heard from them since. But it is something to consider | 22:45 |
jeblair | if it is not, then actually, we need to seriously reconsider some things. | 22:45 |
fungi | oh, heh, i missed the etherpad until just now. derp | 22:45 |
jeblair | clarkb: when you say you punted to mordred... what is your current understanding of the situation? | 22:45 |
clarkb | jeblair: that this was a temporary situation and one of the reasons we need to start using the new region | 22:46 |
jeblair | mordred: perhaps you could advise us on (a) whether we can continue to use the resources we have been given, and (b) whether we can expect more? | 22:46 |
Alex_Gaynor | jog0: when I said it got slower, I mean the time to run the tests themselves has increased, not time-to-land | 22:46 |
jeblair | clarkb: er, so you're saying we should not consider this as a change, and it is not really relevant to our current discussion? | 22:46 |
clarkb | I think we have to consider it in the short term until we get the new region going | 22:47 |
jog0 | Alex_Gaynor: oh good point | 22:47 |
jeblair | clarkb: i thought we were already supposed to be using the new region, and at any rate, should certainly be using it within a week or two. | 22:47 |
jeblair | clarkb: that sounds like an operational issue and does not need to be considered for a conversation about high-level direction. | 22:48 |
clarkb | right, I just want to make sure we keep that in mind. let me rephrase what I have on the etherpad | 22:48 |
*** mgagne has quit IRC | 22:51 | |
*** jhesketh__ has joined #openstack-infra | 22:52 | |
*** mfer has quit IRC | 22:52 | |
mordred | jeblair, clarkb: ola | 22:53 |
clarkb | mordred: hi there see https://etherpad.openstack.org/p/future-of-recheck and questions of quota | 22:54 |
mordred | clarkb: blerg. http proxy. can you tl;dr the quota question? | 22:54 |
*** denis_makogon has quit IRC | 22:54 | |
mordred | also - the new region is intended to be the only region that exists in the cloud in the future | 22:54 |
clarkb | mordred: hp politely asked me yesterday to use less of our quota, is that going to be a long term problem for us? | 22:55 |
mordred | so it's not that they're going to remove our quota in the current regions, as much as they are going to delete the regions themselves | 22:55 |
mordred | clarkb: they can shove it | 22:55 |
mordred | clarkb: and I'll escalate to people above their paygrade if they don't like it | 22:55 |
jeblair | mordred: that is perfectly understandable, and i expect us to move to the new region asap. | 22:55 |
mordred | jeblair: I do think we may need to think about the floating-ip thing that clarkb was mentioning | 22:56 |
mordred | jeblair: in terms of perhaps having a pre-allocated pool of floating-ips that we re-use, rather than creating/destroying each time? | 22:56 |
jeblair | mordred: do you believe that we can get a quota increase, after we move to the new region? | 22:56 |
mordred | jeblair: how about I ask around some - but I'm sure we can | 22:56 |
jeblair | mordred: oh, i thought the problem was just that our floating ip quota was more limited than our machine quota? | 22:56 |
*** mihgen has quit IRC | 22:57 | |
mordred | ah. I may be misunderstanding then | 22:57 |
jeblair | mordred: if it's something other than that, where can i find the problem statement? | 22:57 |
clarkb | floating ip problem statement is essentially that, by default we get much fewer floating ips than host quota. How do we fix this? get more quota? use a proxy? and so on | 22:57 |
mordred | can I get a set of numbers that describe what we want quota-wise to move from old to new? (like, what quota size do we need in the new region for it to be sufficient) | 22:57 |
mordred | and then I'll run that up the flagpole immediately | 22:57 |
mordred | let's start with get more quota | 22:58 |
clarkb | mordred: currently we have 8GB*96*3 RAM quota which is ~2.3TB of total RAM quota (did I do that math correctly?) | 22:58 |
clarkb | mordred: and 288 hosts which would each need floating IPs | 22:58 |
jog0 | the etherpad is looking pretty good | 22:59 |
jog0 | we have a bunch of good ideas | 22:59 |
clarkb | that is the quota needed to maintain current levels of use | 22:59 |
jeblair | clarkb: correct | 22:59 |
clarkb | if we want to increase that we would be bumping that 96 value to something greater (maybe 128 for a start?) which would bump RAM to ~3TB and floating ips to 384 | 23:00 |
*** flaper87 is now known as flaper87|afk | 23:00 | |
clarkb | 8GB and 3 are the constants. 96 is the value we want to increase | 23:00 |
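The quota arithmetic clarkb gives can be checked with a short sketch. The function and constant names are illustrative; the figures (8 GB per host, a factor of 3, 96 hosts today, 128 proposed) come from the discussion, and decimal terabytes (1 TB = 1000 GB) are assumed because that matches the "~2.3TB" figure.

```python
GB_PER_HOST = 8   # RAM per test node (constant in the discussion)
REGIONS = 3       # the constant factor of 3 in clarkb's formula


def quota(hosts_per_region):
    """Return the total quota implied by a per-region host count."""
    hosts = hosts_per_region * REGIONS
    ram_gb = hosts * GB_PER_HOST
    return {
        "hosts": hosts,
        "floating_ips": hosts,      # one floating IP per host
        "ram_gb": ram_gb,
        "ram_tb": ram_gb / 1000.0,  # decimal TB, an assumption
    }


print(quota(96))   # current usage: 288 hosts, 2304 GB (~2.3 TB)
print(quota(128))  # proposed bump: 384 hosts, 3072 GB (~3 TB)
```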
fungi | math++ | 23:01 |
*** ryanpetrello_ has joined #openstack-infra | 23:02 | |
*** ryanpetrello has quit IRC | 23:02 | |
*** ryanpetrello_ is now known as ryanpetrello | 23:02 | |
*** mindjive1 has joined #openstack-infra | 23:02 | |
jeblair | fungi: so, maths? :) | 23:02 |
clarkb | 3 comes from 3 AZs (I wasn't clear about that) | 23:02 |
*** thedodd has quit IRC | 23:03 | |
*** mindjiver has quit IRC | 23:03 | |
*** julim has quit IRC | 23:03 | |
fungi | jeblair: yes. maths. i learnt the britism from watching "look around you" which had an entire episode entitled "maths" | 23:03 |
*** anteaya_ has quit IRC | 23:04 | |
fungi | such a marvellous show | 23:04 |
fungi | s/show/program/ | 23:04 |
*** jpich has quit IRC | 23:04 | |
*** hashar has quit IRC | 23:04 | |
pleia2 | Hunner: https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting is our meeting schedule fyi, added puppetboard to it for next week (excited, so want to do it sooner, but we're all swamped I think, next week is best) cc: anteaya | 23:04 |
anteaya | pleia2: ack | 23:04 |
anteaya | looks good Hunner, thanks for setting this up | 23:05 |
Hunner | pleia2: Yep on the swamped part. Queues work better than interrupts for this sort of thing | 23:05 |
pleia2 | yes, thanks \o/ | 23:05 |
*** jcoufal has quit IRC | 23:06 | |
Hunner | np :) | 23:06 |
openstackgerrit | Khai Do proposed a change to openstack-infra/pypi-mirror: add an export option https://review.openstack.org/57345 | 23:07 |
*** salv-orlando has quit IRC | 23:07 | |
*** salv-orlando has joined #openstack-infra | 23:07 | |
*** jcooley_ has joined #openstack-infra | 23:08 | |
*** dolphm has quit IRC | 23:09 | |
*** jcooley_ has quit IRC | 23:10 | |
*** atiwari has joined #openstack-infra | 23:11 | |
*** atiwari has quit IRC | 23:11 | |
jog0 | so now that we have an awesome list of the ideas | 23:11 |
jog0 | lets look at them | 23:12 |
jog0 | so running tests 2x? | 23:12 |
jog0 | is that feasible with current resources? | 23:12 |
jog0 | clarkb: ^ | 23:14 |
*** jergerber has quit IRC | 23:14 | |
clarkb | jog0: it is possible, it will just reduce our throughput by 1/2 in theory | 23:14 |
fungi | depends on how big the gate pipeline gets | 23:14 |
*** hogepodge has quit IRC | 23:15 | |
jog0 | if it means its harder to get breaking patches in, then sounds worth it | 23:16 |
*** jcooley_ has joined #openstack-infra | 23:16 | |
jog0 | jeblair mordred lifeless: ^ | 23:17 |
*** nati_ueno has quit IRC | 23:17 | |
fungi | i think once we get things near squeaky-clean, it will reduce our throughput to slightly less than half | 23:17 |
fungi | until then, it will probably be more of a geometric reduction in throughput | 23:17 |
lifeless | 2x will make ~- difference | 23:18 |
openstackgerrit | Khai Do proposed a change to openstack-infra/config: add nodepool to jenkins-dev server https://review.openstack.org/57333 | 23:18 |
jog0 | lifeless: what does '~-' | 23:19 |
jog0 | mean | 23:19 |
lifeless | about none | 23:19 |
lifeless | N/3 probability | 23:20 |
jog0 | lifeless: hmm I'm afraid you are right | 23:20 |
jog0 | in that case what about the future of recheck proposals | 23:20 |
jeblair | i don't think it's none... | 23:20 |
jeblair | for some code paths 2x means 2x, for others it means 16x | 23:21 |
jeblair | (that is to say for code paths that are already exercised 8x, or whatever our full complement of gate jobs is) | 23:21 |
jeblair | (i haven't counted since i got back) | 23:21 |
jog0 | jeblair: at the very least all we lose is some resources | 23:22 |
jog0 | which sounds like a good tradeoff | 23:22 |
*** Ryan_Lane has quit IRC | 23:22 | |
*** Ryan_Lane1 has joined #openstack-infra | 23:22 | |
*** dkranz has joined #openstack-infra | 23:23 | |
*** weshay has quit IRC | 23:23 | |
jog0 | who is wenlock? | 23:24 |
clarkb | wenlock: jog0 | 23:24 |
clarkb | jog0: wenlock | 23:24 |
clarkb | :) | 23:24 |
wenlock | jog0 hi hi | 23:25 |
*** datsun180b has quit IRC | 23:25 | |
jog0 | wenlock: o/ | 23:25 |
jog0 | wenlock: I didn't want to scare you away, was just wondering | 23:25 |
wenlock | no worries, was following along.... checking it out | 23:26 |
*** hogepodge has joined #openstack-infra | 23:26 | |
fungi | wenlock: roll up your sleeves and jump in. more help==better | 23:26 |
jog0 | jeblair: so you say its easy to make things run 2x? | 23:27 |
jog0 | if so lets propose a patch | 23:27 |
jog0 | and lets talk about rechecks | 23:27 |
*** dkranz has quit IRC | 23:27 | |
jog0 | clarkb: btw https://review.openstack.org/#/c/57070/ | 23:28 |
jeblair | jog0: mechanically easy: double the line count in the gate: section of project definitions in zuul's layout.yaml. it's likely to be a huge patch, at least until someone makes zuul's templating a bit more flexible | 23:28 |
jog0 | jeblair: https://review.openstack.org/#/c/56118/4/elastic_recheck/elasticRecheck.py | 23:28 |
clarkb | jog0: approved | 23:28 |
fungi | right. running additional copies of a job under some circumstances is a trivial patch. running two of every single job is probably better addressed through some mechanism other than roughly doubling the number of lines in layout.yaml | 23:29 |
lifeless | jeblair: so if we take a codepath from 8x to 16x | 23:30 |
openstackgerrit | Khai Do proposed a change to openstack-dev/pbr: show transitive dependencies https://review.openstack.org/54639 | 23:30 |
lifeless | probability of a broken path that we don't detect goes from 3/8 to 3/16 or .375 to .1875 | 23:31 |
lifeless | jeblair: but AIUI we're looking to avoid failures that occur more than a few percent | 23:31 |
jeblair | jog0: cool. however, to keep my sanity, i need to work through some more back-from-vacation todos and then will probably review changes in order | 23:31 |
lifeless | jeblair: 60x should get us 5% sensitivity, for instance. | 23:32 |
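lifeless's figures (3/8, 3/16, and 5% at 60x) are consistent with the statistical "rule of three": if n independent runs all pass, the failure rate is below roughly 3/n with about 95% confidence. That reading is an interpretation, not something stated in the log. The sketch below, with illustrative function names, computes that bound and, for comparison, the chance that n runs catch a bug with a given failure rate.

```python
def rule_of_three(runs):
    """Approximate 95% upper bound on the failure rate after `runs`
    independent passing runs."""
    return 3.0 / runs


def detection_probability(failure_rate, runs):
    """Chance that at least one of `runs` independent runs fails."""
    return 1.0 - (1.0 - failure_rate) ** runs


for runs in (8, 16, 60):
    print(
        "%2d runs: sensitivity ~%.4f, P(catch a 10%% bug) = %.3f"
        % (runs, rule_of_three(runs), detection_probability(0.10, runs))
    )
```

Under these assumptions, going from 8 to 16 runs only moves the sensitivity from about 37.5% to about 19%, which is why bugs that fail a few percent of the time would still slip through.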
openstackgerrit | Joe Gordon proposed a change to openstack-infra/config: Always run check-tempest-devstack-vm-full twice https://review.openstack.org/57347 | 23:33 |
jog0 | jeblair: np | 23:33 |
jog0 | that is a POC patch | 23:33 |
jog0 | feel free to -2 it or w/e | 23:33 |
lifeless | jeblair: when I say approximately no effect, I mean that the individual failure rates we're seeing are already much lower than the sensitivity we'll get from 8x -> 16x | 23:33 |
jog0 | so rechecks themselves | 23:34 |
*** ftcjeff has quit IRC | 23:34 | |
jog0 | we have some good ideas here: https://etherpad.openstack.org/p/future-of-recheck | 23:34 |
jog0 | do we want to propose any to the ML or just do them etc? | 23:34 |
*** oubiwann has quit IRC | 23:35 | |
mikal | jog0: Ok, so I am pretty much out of ideas on this console log thing | 23:36 |
jog0 | mikal: did you try cursing? | 23:36 |
jeblair | lifeless: not necessarily; part of the hypothesis is that many bugs may be entering even after failing already, so part of the solution involves making that harder to happen; separately making the screen that catches those failures finer may be part of a layered solution | 23:36 |
mikal | jog0: yes, yes I did | 23:36 |
mikal | jog0: I can see entries in the libvirt log which imply the monitor has crashed | 23:36 |
mikal | jog0: but the version of qemu-kvm we're running is a couple of weeks older than us having this problem | 23:37 |
*** Ryan_Lane1 is now known as Ryan_Lane | 23:37 | |
*** Ryan_Lane has joined #openstack-infra | 23:37 | |
jeblair | jog0: i think 'run more jobs' and 'be more strict' are things we can just do... | 23:37 |
jog0 | jeblair: I agree with that | 23:37 |
jog0 | but for recheck | 23:37 |
jog0 | I am not saying we need to go to the ML | 23:37 |
jog0 | I just want to get something changed sooner than later | 23:38 |
jog0 | because status quo = mikal crying | 23:38 |
jog0 | mikal: lets chat in nova | 23:38 |
mikal | jog0: well, I was rather thinking of getting out of my pjs | 23:38 |
jeblair | jog0: i think we should decide within infra/qa what would be the best approach to take wrt to recheck/reverify and then notify the list about that. | 23:39 |
jog0 | jeblair: ++ | 23:39 |
fungi | i think improvements (expected or even experimental) which are unlikely to be contentious should just be implemented and see how they fare | 23:39 |
jeblair | jog0: it's not something i think we can do right away anyway; i think we have to wait until the gate gets more normal | 23:39 |
jog0 | so what do we think the best approach is? | 23:39 |
jeblair | jog0: but we should be prepared to implement it when that happens | 23:39 |
jog0 | jeblair: why wait till its more normal? | 23:39 |
jog0 | and can we assume that will happen? | 23:39 |
fungi | it has been normal before. the sun also may not rise tomorrow | 23:40 |
jeblair | jog0: we could have done this last week easily, for instance. | 23:40 |
*** rcleere has quit IRC | 23:41 | |
jeblair | fungi: " This may make it harder for people to re-run jobs for purposes of testing the jobs themselves... allow it but don't let zuul update its -1 vote?" | 23:41 |
*** vipul is now known as vipul-away | 23:41 | |
jog0 | jeblair: I'm not sure we need to wait but thats a moot point | 23:41 |
jeblair | fungi: i don't follow that | 23:41 |
fungi | jeblair: there have been times where people want to rerun self-gating changes to tests to see if they're deterministic. adding a recheck comment has been how that was accomplished in the past | 23:42 |
lifeless | jeblair: in that people are retrying? | 23:42 |
clarkb | I think retrying in the check queue is fine and should be allowed | 23:42 |
lifeless | jeblair: so the bug is being detected but folk are shoving it through ? | 23:42 |
jeblair | lifeless: yes, that is jog0's hypothesis | 23:42 |
clarkb | the problem is that you can force things through the gate queue with a little luck and persistence | 23:42 |
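clarkb's "luck and persistence" point can be made concrete with a small probability sketch. It assumes every run is independent and that a change passes an attempt only if all of its jobs pass; the function name and example numbers are illustrative, not measurements from the gate.

```python
def eventual_pass(failure_rate, jobs, attempts):
    """Chance that a change whose bug fails each job with probability
    `failure_rate` passes all `jobs` in at least one of `attempts` tries."""
    pass_one_attempt = (1.0 - failure_rate) ** jobs
    return 1.0 - (1.0 - pass_one_attempt) ** attempts


# A bug that fails half the time, exercised by a single job:
print(eventual_pass(0.5, jobs=1, attempts=1))
print(eventual_pass(0.5, jobs=1, attempts=5))
# The same bug when the job is run twice per attempt:
print(eventual_pass(0.5, jobs=2, attempts=5))
```

Running each job more often lowers the per-attempt pass rate, but unlimited reverifies still push the eventual pass probability toward 1, which is the argument for restricting retries in the gate queue.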
jog0 | the bug isn't always their fault either | 23:43 |
fungi | jeblair: merely suggesting that if the goal in that section is to completely neuter "recheck no bug" then possibly do that by having zuul not update its vote when leaving further result comments on the same patchset | 23:43 |
jog0 | if a bug is already in gate, you may hit it | 23:43 |
*** sdake_ has quit IRC | 23:43 | |
jog0 | and just push through | 23:43 |
clarkb | jog0: right which is why I think falling back on the core reviewers may be a good idea | 23:43 |
*** nsaje has quit IRC | 23:43 | |
clarkb | as they can make large project wide judgements (I hope) | 23:43 |
jog0 | clarkb: agreed | 23:43 |
jog0 | they can run a 'no bug' test | 23:44 |
jog0 | and everyone else can do recheck bug x | 23:44 |
mordred | you're assuming they're not part of the problem | 23:44 |
*** nsaje has joined #openstack-infra | 23:44 | |
mordred | the reason we don't let people push code directly is because of people who work on it all the time being more likely to override protections | 23:45 |
jog0 | mordred: yes, that is correct. Not sure how to tell who is the problem yet | 23:45 |
jeblair | jog0: perhaps we should do a more rigorous analysis... | 23:45 |
mordred | if the system depends on one set of people behaving better than another set, I believe we're screwed | 23:45 |
jog0 | mordred: ohh you mean core vs non-core | 23:45 |
mordred | because it's essentially re-introducing the idea of a committer | 23:46 |
mordred | just doing it weirdly | 23:46 |
jog0 | misunderstood, yeah as a core I am happy to say I am part of the problem | 23:46 |
jeblair | for recent nondeterministic bugs that merged, do we have links to the patches that introduced them? | 23:46 |
jog0 | jeblair: its not always clear how they got introduced, but we do have some records | 23:47 |
clarkb | one example of that would be the swift related bugs that tempest was hitting | 23:47 |
jog0 | I think some of the neutron bugs of late have records like that | 23:47 |
jeblair | jog0: can we test your hypothesis that way (whether people really are reverifying to get in flaky changes), and also identify the culprits (to know whether they are core, in certain projects, or possibly even just personally tell them to stop) | 23:47 |
*** ryanpetrello has quit IRC | 23:47 | |
clarkb | we were running out of disk space, but what change made that a problem in tempest we don't know | 23:47 |
jog0 | jeblair: with a big bug failing 10% of the time they may never see the bug themselves | 23:47 |
clarkb | and we still have no idea why 1251920 is a thing (still hitting it with a hammer) | 23:48 |
jeblair | jog0: right, in which case it was not a problem where someone overrode the test results by reverifying | 23:48 |
jog0 | but jeblair let me dig up an example | 23:48 |
*** changbl has quit IRC | 23:48 | |
*** loq_mac has joined #openstack-infra | 23:48 | |
*** nsaje has quit IRC | 23:49 | |
*** loq_mac has quit IRC | 23:49 | |
jog0 | jeblair: so my concern is this - gate is so flaky now that even if you see your patch fail you don't think it was you | 23:49 |
jog0 | but that is secondary to the bigger issue IMHO, | 23:49 |
jog0 | I push a docs change. Gate fails I know it couldn't have been my code free patch that broke it, but e-r doesn't know why it failed | 23:49 |
openstackgerrit | Edward Raigosa proposed a change to openstack-infra/config: Make pip install from upstream better https://review.openstack.org/51425 | 23:50 |
jog0 | I get impatient and say 'recheck no bug' | 23:50 |
jog0 | so the bug exists and is unclassified (we know nothing about it) and the more we do this the more bugs we get | 23:50 |
jog0 | until we notice, hey why is gate so bad? turns out its many bugs that crept in not one | 23:50 |
jog0 | so the issue to me isn't someone trying to force a bad patch through the gate | 23:51 |
jog0 | its apathy | 23:51 |
jeblair | jog0: ok, you want to improve the data collection for unknown bugs | 23:51 |
fungi | and bugs are likely to invite their friends until it becomes a party | 23:51 |
jog0 | jeblair: I do | 23:51 |
jog0 | fungi: ++ | 23:51 |
jog0 | jeblair: I was thinking of making a webapp that lists all unclassified failures | 23:51 |
jog0 | and as you add queries to e-r that list is filtered | 23:51 |
jog0 | and you get points for how many you remove | 23:52 |
fungi | i rather like your "spot that bug" game idea | 23:52 |
jog0 | fungi: good, you want to write it? | 23:52 |
jeblair | jog0: ok, so i think removing 'no bug' and optionally further restricting reverify bug citations to bugs in e-r helps with that | 23:52 |
*** ryanpetrello has joined #openstack-infra | 23:52 | |
jog0 | jeblair: agreed | 23:52 |
jeblair | jog0: but i think we should fully explore the process a dev goes through there | 23:52 |
lifeless | what about infrastructure failures | 23:53 |
jeblair | if a docs change fails, what does a dev do? | 23:53 |
lifeless | like when jenkins slaves are killed | 23:53 |
jeblair | lifeless: infra isn't exempt from this, even now the instructions say if it's an infra bug, file it in openstack-ci and cite it in recheck | 23:54 |
lifeless | jeblair: ok; I had had the impression that when a slave goes sideways it was considered a fact of life and not something to do root cause analysis on | 23:54 |
lifeless | jeblair: I'm glad to be wrong! | 23:54 |
fungi | the current annoying loophole there being that people file bugs which say, "infra failed my patch. have a bug" | 23:55 |
fungi | but i suspect the same befalls all projects, not just infra | 23:55 |
clarkb | fungi: yup, I think the tempest guys deal with it more than we do | 23:55 |
jeblair | lifeless: depends on how it goes sideways. we've done quite a bit of root-cause analysis which has led us to the conclusion "we should stop using jenkins" | 23:56 |
*** rfolco has quit IRC | 23:56 | |
jeblair | lifeless: if we see repetitions or similar failures, we are likely to say something that sounds more like "it's a fact of life" than "let's do root cause analysis!"; but that's not because we accept failure, it's because we're in the middle of a _very_ long process of fixing it. :) | 23:57 |
clarkb | jeblair: though maybe this new jenkins that fixed their thread insanity fixes these problems | 23:57 |
clarkb | they went from 3 threads per executor to 1 or some such (which is a massive simplification) | 23:57 |
fungi | but also, random transient network breakage can impact test results and render false negatives. having a way to classify those would be nice (i guess even those could be viewed as fixable problems, given enough resources to throw at it) | 23:58 |
jeblair | clarkb: number of threads is not the only problem we have with jenkins. very far from it, in fact. | 23:58 |
clarkb | jeblair: definitely (just there is the particular bug of jenkins losing connectivity with a slave without knowing) | 23:58 |
lifeless | fungi: right, thats indeed another case | 23:58 |
jeblair | these instances are VERY rare, partly because we treat them so seriously and have gone to great lengths to eliminate them. | 23:59 |