sdague | fungi / jeblair: next time there is a reset event - https://review.openstack.org/#/c/45766/ could use to go to the top | 00:00 |
jeblair | fungi: yeah, rackspace is fairly bad at deleting nodes, so i think nodepool may need to be more aggressive about deleting them | 00:00 |
fungi | also, immediately following the restart, that band of green "ready lag" which the graph was showing constantly has disappeared | 00:00 |
jeblair | fungi: they hit nodepool's 10 minute timeout waiting for deletion often, which makes deleting change from a parallel to serial operation (via the cleanup thread) | 00:01 |
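This generic sketch illustrates jeblair's point about deletions turning from a parallel operation into a serial one. It is not nodepool's code. `delete_node` is a hypothetical stand-in that only sleeps. When each deletion is slow, serial cleanup time grows with the number of nodes. Concurrent deletion is bounded roughly by the slowest single deletion.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def delete_node(node_id: str, delay: float) -> str:
    """Hypothetical stand-in for a slow provider-side node deletion."""
    time.sleep(delay)
    return node_id


def serial_cleanup(node_ids, delay):
    """Delete nodes one after another, as a single cleanup thread would."""
    start = time.monotonic()
    deleted = [delete_node(n, delay) for n in node_ids]
    return deleted, time.monotonic() - start


def parallel_cleanup(node_ids, delay):
    """Issue all deletions concurrently, one worker per node."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=max(1, len(node_ids))) as pool:
        deleted = list(pool.map(lambda n: delete_node(n, delay), node_ids))
    return deleted, time.monotonic() - start


if __name__ == "__main__":
    nodes = [f"node-{i}" for i in range(5)]
    _, t_serial = serial_cleanup(nodes, 0.1)
    _, t_parallel = parallel_cleanup(nodes, 0.1)
    print(f"serial: {t_serial:.2f}s, parallel: {t_parallel:.2f}s")
```

Apply the same arithmetic to the 10-minute deletion timeout. A backlog of, say, six stuck nodes could occupy a serial cleanup thread for up to an hour.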
fungi | i'll review 45766 and bump it on the next reset | 00:01 |
fungi | sdague: ^ | 00:01 |
sdague | fungi: thanks | 00:02 |
fungi | sdague: flashgordon: i read "multi threaded conductor" as "go more faster" | 00:02 |
sdague | fungi: yeh, basically | 00:03 |
fungi | okay, hopefully that's a | 00:03 |
fungi | "good" faster and not a "bad" faster | 00:03 |
*** ^demon|away has quit IRC | 00:03 | |
sdague | rustlebee and damnsmith both believe so | 00:03 |
sdague | look at rustlebee's comment in there | 00:04 |
fungi | okay, awesomesauce | 00:05 |
*** gokrokve has quit IRC | 00:05 | |
*** gokrokve has joined #openstack-infra | 00:05 | |
sdague | ok, I think I'm done for the night / weekend. Have a good one folks | 00:06 |
jeblair | sdague: good night/weekend! | 00:06 |
sdague | I will look forward to a time when the top of queue isn't 16hrs - http://f.slukjanov.name/w/review.o.o/66056/1/status.html | 00:06 |
fungi | have a good weekend, sdague | 00:06 |
fungi | jeblair: if you're in the mood for other excitement, we're at about 85% full on the 300+gb root filesystem of the graphite server. basically all whisper files. is there pruning we need to be doing? also, it's got one cpu pegged solid on iowait, which seems to be driven by carbon-cache.py | 00:07 |
fungi | and it's frequently having trouble serving up images | 00:07 |
flashgordon | fungi: without the patches | 00:07 |
fungi | flashgordon: too fun | 00:09 |
jeblair | fungi: no pruning; they are fixed size | 00:09 |
fungi | flashgordon: did you try tempest yet? | 00:09 |
*** mrodden has joined #openstack-infra | 00:09 | |
fungi | jeblair: okay, so it's a whisper file count driving the utilization then? | 00:09 |
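Whisper files are fixed size because of the format itself. Whisper preallocates every archive when it creates a file, so a file's size depends only on its retention schema, not on how much data has been written. The calculator below assumes whisper's on-disk layout: a 16-byte header, 12 bytes of metadata per archive and 12 bytes per data point. The retention schema in the example is hypothetical, not the one configured on the graphite server.

```python
import struct

# Sizes derived from whisper's struct formats (network byte order, no padding).
METADATA_SIZE = struct.calcsize("!2LfL")    # aggregation, max retention, xff, archive count
ARCHIVE_INFO_SIZE = struct.calcsize("!3L")  # offset, seconds per point, points
POINT_SIZE = struct.calcsize("!Ld")         # timestamp, value


def whisper_file_size(archives):
    """Return the preallocated size in bytes of a whisper file.

    `archives` is a list of (seconds_per_point, points) tuples.
    """
    total_points = sum(points for _, points in archives)
    return METADATA_SIZE + ARCHIVE_INFO_SIZE * len(archives) + POINT_SIZE * total_points


# Hypothetical schema: 1-minute points for a day, 5-minute points for a year.
schema = [(60, 24 * 60), (300, 365 * 24 * 12)]
print(whisper_file_size(schema))  # 1278760
```

That schema gives roughly 1.2 MiB per metric. Every renamed job that leaves its old metric files behind keeps using that space, so utilization tracks file count rather than data volume.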
flashgordon | fungi: that failed but not sure if failure was related :/ | 00:09 |
fungi | flashgordon: these days, it's a coin toss on tempest fails | 00:09 |
*** gokrokve has quit IRC | 00:09 | |
jeblair | fungi: yes. oh... when we do things like change job names, we could probably delete old metrics files if we don't care about them. | 00:10 |
jeblair | fungi: so perhaps find to delete files older than 4 weeks or something. | 00:10 |
flashgordon | I didn't see any libvirt issues so thinking it was something else | 00:10 |
fungi | that makes sense. basically if the metric name/bucket changed more than a month ago, we don't care to keep it around | 00:11 |
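Below is a minimal sketch of the "delete files older than 4 weeks" idea, roughly equivalent to `find <whisper root> -name '*.wsp' -mtime +28`. It relies on whisper rewriting a file each time it stores a data point. An old modification time therefore means the metric has stopped receiving updates, for example after a job rename. The whisper root path is an assumption about the server's layout. The function only lists files unless `dry_run=False` is passed.

```python
import os
import time


def stale_whisper_files(root, max_age_days=28, now=None):
    """Yield paths of .wsp files under `root` not modified in `max_age_days` days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".wsp"):
                path = os.path.join(dirpath, name)
                if os.path.getmtime(path) < cutoff:
                    yield path


def prune(root, max_age_days=28, dry_run=True):
    """Return stale whisper files; delete them only when dry_run is False."""
    removed = []
    for path in stale_whisper_files(root, max_age_days):
        if not dry_run:
            os.remove(path)
        removed.append(path)
    return removed
```

Run it in dry-run mode first and review the list before deleting anything.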
*** banix has quit IRC | 00:11 | |
flashgordon | one of the patches was needed to pass unit tests https://review.openstack.org/#/c/65360/ | 00:11 |
fungi | flashgordon: i can rerun my baseline this weekend and see if i still get the same consistent fails i was seeing before with it | 00:11 |
flashgordon | fungi: that would be great | 00:11 |
jeblair | fungi: aggregate graphs with lots of metrics can cause a lot of graphite load, so people need to be careful creating those | 00:11 |
flashgordon | I'm hoping new libvirt will make gate more stable and fix | 00:12 |
flashgordon | https://bugs.launchpad.net/bugs/1254872 | 00:12 |
fungi | jeblair: i was wondering whether we might have an uptick in people pulling/reloading those. maybe we need some caching in front of the graphs? squid? | 00:12 |
flashgordon | which would be a big win | 00:12 |
jeblair | fungi: yeah, we should look into that. note that graphs on the zuul status page have cache-busting parameters added to them. we might need to think about caching holistically. | 00:14 |
*** vipuls-away is now known as vipuls | 00:14 | |
*** thuc has joined #openstack-infra | 00:15 | |
*** thuc has quit IRC | 00:15 | |
*** thuc has joined #openstack-infra | 00:16 | |
fungi | jeblair: the iowait seems to be mostly the kjournald thread though, so these are presumably write operations. i remounted the root filesystem with relatime just on a whim, in case reads were driving atime updates, but that ended up not helping | 00:16 |
*** mrodden has quit IRC | 00:16 | |
*** thuc has quit IRC | 00:17 | |
*** thuc has joined #openstack-infra | 00:17 | |
jeblair | fungi: (i think that's because if you update a page with js, the browser won't even check whether it should reload an image at the same url) | 00:17 |
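This sketch shows what a cache-busting parameter does and why it clashes with a URL-keyed proxy such as squid. The parameter name `_`, the bucket size and the graphite host are all hypothetical, not what the zuul status page actually uses. Rounding the timestamp down to a bucket is offered as one option. Repeated requests within the window produce identical URLs a proxy could cache, while each new window still forces the browser to refetch.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_cache_buster(url, now=None, bucket=60, param="_"):
    """Return `url` with a timestamp query parameter, rounded down to `bucket` seconds.

    A changed URL makes the browser fetch the image again instead of reusing
    its cached copy; an unchanged URL (same bucket) can be served from a cache.
    """
    now = time.time() if now is None else now
    stamp = int(now // bucket * bucket)
    parts = urlparse(url)
    # Drop any existing buster so re-applying the function does not stack params.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
    query.append((param, str(stamp)))
    return urlunparse(parts._replace(query=urlencode(query)))


u = "http://graphite.example.org/render/?target=foo&from=-24hours"
print(with_cache_buster(u, now=1389398400))
# http://graphite.example.org/render/?target=foo&from=-24hours&_=1389398400
```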
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: Fix the e-r query for bug 1258682 https://review.openstack.org/65303 | 00:18 |
jeblair | fungi: wow, that's exciting. maybe we need to put them on an ssd cinder volume? | 00:18 |
fungi | maybe | 00:19 |
fungi | i wouldn't think we'd really be generating *that* many statsd updates, though, in the grand scheme of things | 00:19 |
fungi | which is why it strikes me as a little odd | 00:20 |
*** denis_makogon has quit IRC | 00:20 | |
jeblair | fungi: i agree | 00:21 |
*** sarob has joined #openstack-infra | 00:23 | |
fungi | i mean, tens a second maybe, tops. it's also possible that a process there has just worked itself into a wonky state due to some bug, and is going wild with flush calls in a loop | 00:24 |
fungi | david-lyle: https://pypi.python.org/pypi/django_openstack_auth | 00:25 |
fungi | david-lyle: zuul was just busy, but it got done | 00:25 |
*** rnirmal has joined #openstack-infra | 00:26 | |
*** sarob has quit IRC | 00:28 | |
*** sarob has joined #openstack-infra | 00:29 | |
*** ryanpetrello has joined #openstack-infra | 00:29 | |
openstackgerrit | Aaron Greengrass proposed a change to openstack-infra/config: Remove hardcoded config assumptions, cleanup variables https://review.openstack.org/66072 | 00:30 |
*** sarob_ has joined #openstack-infra | 00:30 | |
*** CaptTofu has joined #openstack-infra | 00:30 | |
fungi | i'm promoting 45766,2 because the change at the head of the gate is getting ready to fail anyway according to the console log on its last remaining test | 00:30 |
fungi | so we're bound for a 100% gate reset either way. maybe that fix will save more changes from a reverify | 00:31 |
*** sarob__ has joined #openstack-infra | 00:32 | |
*** CaptTofu has quit IRC | 00:32 | |
flashgordon | fungi: sweet | 00:32 |
flashgordon | what about the postgres patch? | 00:33 |
flashgordon | is that already in | 00:33 |
fungi | flashgordon: yep, all jobs getting restarted in the gate now will get a longer timeout on the postgres-full job | 00:33 |
flashgordon | excellent | 00:33 |
fungi | which should help too | 00:33 |
*** dstanek has quit IRC | 00:34 | |
*** sarob has quit IRC | 00:34 | |
flashgordon | /openstack/openstack$ git log --since=1.day --oneline | wc -l | 00:35 |
flashgordon | 70 | 00:35 |
*** sarob_ has quit IRC | 00:35 | |
flashgordon | not bad for a slow gate day | 00:35 |
fungi | flashgordon: so even with all this, we merged 70 changes through the integrated gate? not bad indeed | 00:35 |
david-lyle | fungi: I see it now, thanks for your help! | 00:35 |
david-lyle | and the education | 00:36 |
fungi | david-lyle: no problem--sorry about the wait! | 00:36 |
*** hogepodge has quit IRC | 00:36 | |
fungi | david-lyle: on a "good" day, it takes less than a minute from tag push to pypi availability | 00:36 |
*** dstanek has joined #openstack-infra | 00:36 | |
fungi | (though there's still a 30-60 minute wait after that before it makes it into our testing mirror for jobs to make use of) | 00:37 |
david-lyle | plenty fast enough | 00:38 |
fungi | i'd like it to be faster. maybe eventually | 00:38 |
*** ivar-lazzaro has quit IRC | 00:42 | |
*** mrodden has joined #openstack-infra | 00:44 | |
*** mrodden has quit IRC | 00:44 | |
*** rwsu has quit IRC | 00:46 | |
*** rnirmal_ has joined #openstack-infra | 00:54 | |
*** rnirmal has quit IRC | 00:55 | |
*** rnirmal_ is now known as rnirmal | 00:55 | |
*** prad has quit IRC | 00:58 | |
sdague | guess I'm not quite gone for the night. So enqueue time gets completely reset when you promote, huh? | 00:58 |
sdague | http://f.slukjanov.name/w/review.o.o/66056/1/status.html | 00:59 |
sdague | the entire queue is now 20m | 00:59 |
*** ryanpetrello has quit IRC | 01:02 | |
*** gyee_ has joined #openstack-infra | 01:04 | |
*** pcrews has joined #openstack-infra | 01:05 | |
*** _ruhe is now known as ruhe | 01:09 | |
*** thuc has quit IRC | 01:11 | |
*** thuc has joined #openstack-infra | 01:11 | |
fungi | sdague: yeah, i guess since the queue gets reordered, thus not a strict fifo change, add time restarts | 01:13 |
*** thuc has quit IRC | 01:15 | |
sdague | it would be nice to preserve the enqueue time if possible in zuul | 01:17 |
*** sarob__ has quit IRC | 01:24 | |
*** AaronGr is now known as AaronGr_Zzz | 01:24 | |
*** sarob has joined #openstack-infra | 01:25 | |
*** yamahata has quit IRC | 01:25 | |
jeblair | sdague: yeah, it's implemented as a complete dequeue and re-queue, basically because the queue ordering logic is complicated, and that's the easiest way to re-use it. | 01:26 |
jeblair | sdague: it may be possible to save those values to the side and restore them though. | 01:27 |
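This toy sketch shows the "save those values to the side and restore them" idea. A promote dequeues and re-queues every item, but each change's original enqueue time is recorded first and passed back in. `QueueItem`, `Pipeline` and `add_change` are hypothetical stand-ins, not zuul's real model or its `_doPromoteEvent` implementation.

```python
import time
from dataclasses import dataclass, field


@dataclass
class QueueItem:
    """Hypothetical stand-in for a pipeline queue entry."""
    change: str
    enqueue_time: float = field(default_factory=time.time)


class Pipeline:
    """Toy queue, not zuul's real data model."""

    def __init__(self):
        self.items = []

    def add_change(self, change, enqueue_time=None):
        item = QueueItem(change)
        if enqueue_time is not None:
            item.enqueue_time = enqueue_time  # restore a saved time instead of "now"
        self.items.append(item)
        return item

    def promote(self, changes):
        """Move `changes` to the head by dequeuing and re-queuing everything,
        preserving each existing change's original enqueue_time."""
        saved = {item.change: item.enqueue_time for item in self.items}
        rest = [item.change for item in self.items if item.change not in changes]
        self.items = []
        for change in list(changes) + rest:
            self.add_change(change, enqueue_time=saved.get(change))
```

Keeping the queue-ordering logic untouched and only threading the saved time through `add_change` reuses the existing dequeue/re-queue path, which is the reason jeblair gives for implementing promote that way.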
sdague | yeh, that would be cool | 01:27 |
sdague | because knowing those numbers is extremely useful in understanding where we stand, I think more so than the gate queue length | 01:28 |
jeblair | sdague: can you file a zuul bug? that's a pretty good lhf item | 01:28 |
sdague | jeblair: which tracker do you use for that? | 01:28 |
*** larrycai has joined #openstack-infra | 01:28 | |
jeblair | sdague: launchpad/zuul; you can also target openstack-ci | 01:28 |
sdague | jeblair: whereabouts in the code would you implement that? | 01:29 |
notmyname | sdague: what are those timings on that link? | 01:29 |
*** sarob has quit IRC | 01:29 | |
sdague | notmyname: time in queue, though they get reset by priority bumping | 01:30 |
notmyname | sdague: good info, especially if you can get the total time (but you're already talking to jeblair about that ;-) ) | 01:30 |
jeblair | sdague: probably in def _doPromoteEvent(self, event): | 01:30 |
*** banix has joined #openstack-infra | 01:31 | |
notmyname | sdague: "priority bumping" == gate flush? | 01:31 |
jeblair | sdague: do you know if sergey has that change up for review yet, or is he experimenting locally? | 01:31 |
jeblair | it's really cool. :) | 01:32 |
* jeblair -> sprint | 01:34 | |
sdague | jeblair: I have it up for review | 01:35 |
sdague | jeblair: https://review.openstack.org/#/c/65993/ | 01:36 |
sdague | he just did a test and deploy somewhere public | 01:36 |
fungi | gah, 45766,2 hit "Availability zone 'test_az_-tempest-2024199811' is invalid (HTTP 400)" | 01:40 |
fungi | reenqueue it? | 01:40 |
flashgordon | fungi: are we having fun yet? | 01:40 |
fungi | flashgordon: apparently | 01:41 |
*** dstanek has quit IRC | 01:41 | |
flashgordon | and yes to reenqueue | 01:41 |
fungi | doing | 01:41 |
* fungi sighs | 01:41 | |
*** ruhe is now known as _ruhe | 01:41 | |
fungi | the failure log for that job is https://jenkins04.openstack.org/job/gate-tempest-dsvm-full/1671/consoleText in case someone wants to satisfy themselves it isn't caused by that nova fix | 01:42 |
*** larrycai has quit IRC | 01:45 | |
*** flashgordon is now known as jog0 | 01:46 | |
*** CaptTofu has joined #openstack-infra | 01:50 | |
*** sarob has joined #openstack-infra | 01:52 | |
*** reed has quit IRC | 01:52 | |
sdague | jog0: you have an er bug for the az issue yet? | 01:55 |
sdague | that's actually probably a tempest bug | 01:55 |
*** CaptTofu has quit IRC | 01:55 | |
sdague | I bet someone forgot a lock on az manip | 01:56 |
*** banix has quit IRC | 01:57 | |
*** nati_uen_ has quit IRC | 01:58 | |
openstackgerrit | Sean Dague proposed a change to openstack-infra/zuul: make enqueue_time durable by caching in change https://review.openstack.org/66095 | 02:03 |
sdague | ok, dinner on the table | 02:03 |
*** mriedem has quit IRC | 02:05 | |
*** starmer_ has joined #openstack-infra | 02:07 | |
*** ryanpetrello has joined #openstack-infra | 02:10 | |
*** gokrokve has joined #openstack-infra | 02:13 | |
*** oubiwann has joined #openstack-infra | 02:15 | |
*** pcrews has quit IRC | 02:16 | |
*** weshay has quit IRC | 02:22 | |
jog0 | sdague: there is I think | 02:26 |
jog0 | sdague: 1265672 | 02:27 |
*** sarob has quit IRC | 02:28 | |
*** sarob has joined #openstack-infra | 02:28 | |
*** sarob has quit IRC | 02:32 | |
openstackgerrit | Sean Dague proposed a change to openstack-infra/zuul: make enqueue_time passable to addChange https://review.openstack.org/66095 | 02:37 |
sdague | jeblair: so how about that approach instead? | 02:37 |
*** pcrews has joined #openstack-infra | 02:42 | |
*** gokrokve has quit IRC | 02:43 | |
*** gokrokve has joined #openstack-infra | 02:44 | |
*** banix has joined #openstack-infra | 02:45 | |
*** ryanpetrello has quit IRC | 02:47 | |
*** kraman1 has quit IRC | 02:47 | |
*** gokrokve has quit IRC | 02:48 | |
jeblair | sdague: lovely; should it have a regression test? | 02:51 |
jeblair | sdague: i think promote has a test; you could probably just check the times in that existing test | 02:52 |
sdague | jeblair: possibly, though you might need to walk me through that bit | 02:52 |
sdague | ok, sure | 02:52 |
sdague | self.builds is the queue? | 02:53 |
sdague | that will have to wait until morning | 02:55 |
*** rakhmerov has joined #openstack-infra | 02:58 | |
*** jerryz has quit IRC | 03:00 | |
jeblair | sdague: self.builds is the set of running (jenkins) builds; you'll probably either want to inspect the zuul pipeline directly, or get the status json; there are tests that do both | 03:01 |
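This helper reads time-in-queue out of the status json for a promote regression test. The document shape it expects is an assumption: `pipelines`, then `change_queues`, then `heads` (a list of lists of items), with each item carrying `id` and `enqueue_time` in epoch milliseconds. Check it against the real status output before relying on it. `time_in_queue` is a hypothetical name.

```python
def time_in_queue(status, pipeline_name, now_ms):
    """Map change id -> milliseconds spent in `pipeline_name`.

    Assumes a status document shaped like
    {"pipelines": [{"name": ..., "change_queues": [{"heads": [[item, ...]]}]}]}
    where each item has "id" and "enqueue_time" (epoch milliseconds).
    """
    result = {}
    for pipeline in status.get("pipelines", []):
        if pipeline.get("name") != pipeline_name:
            continue
        for queue in pipeline.get("change_queues", []):
            for head in queue.get("heads", []):
                for item in head:
                    enqueued = item.get("enqueue_time")
                    if enqueued is not None:
                        result[item["id"]] = now_ms - enqueued
    return result
```

A test could capture this mapping before and after a promote and assert that the values for changes already in the queue did not shrink.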
*** senk has quit IRC | 03:03 | |
*** wenlock has joined #openstack-infra | 03:03 | |
*** loq_mac has joined #openstack-infra | 03:07 | |
openstackgerrit | Devananda van der Veen proposed a change to openstack-infra/config: Enable tempest/ironic gate tests https://review.openstack.org/65845 | 03:08 |
*** coolsvap has quit IRC | 03:13 | |
*** gokrokve has joined #openstack-infra | 03:14 | |
*** banix has quit IRC | 03:22 | |
*** banix has joined #openstack-infra | 03:24 | |
openstackgerrit | Michael Still proposed a change to openstack-infra/zuul: Implement a simple mysql reporter. https://review.openstack.org/65885 | 03:30 |
*** praneshp has quit IRC | 03:31 | |
openstackgerrit | Michael Still proposed a change to openstack-infra/zuul: Implement a simple mysql reporter. https://review.openstack.org/65885 | 03:33 |
*** MarkAtwood has joined #openstack-infra | 03:43 | |
*** pcrews has quit IRC | 03:45 | |
*** michchap_ has quit IRC | 04:01 | |
*** michchap has joined #openstack-infra | 04:01 | |
*** banix has quit IRC | 04:04 | |
*** praneshp has joined #openstack-infra | 04:07 | |
*** praneshp_ has joined #openstack-infra | 04:10 | |
*** praneshp has quit IRC | 04:11 | |
*** praneshp_ is now known as praneshp | 04:11 | |
*** julim has quit IRC | 04:28 | |
*** FallenPegasus has joined #openstack-infra | 04:35 | |
*** FallenPegasus has quit IRC | 04:37 | |
*** MarkAtwood has quit IRC | 04:38 | |
*** rnirmal has quit IRC | 04:42 | |
*** yamahata has joined #openstack-infra | 04:46 | |
*** loq_mac has quit IRC | 04:47 | |
*** loq_mac has joined #openstack-infra | 04:47 | |
*** rakhmerov has quit IRC | 04:52 | |
*** oubiwann has quit IRC | 04:52 | |
*** coolsvap has joined #openstack-infra | 04:57 | |
*** rakhmerov has joined #openstack-infra | 05:00 | |
*** oubiwann has joined #openstack-infra | 05:01 | |
*** loq_mac has quit IRC | 05:03 | |
*** rakhmerov has quit IRC | 05:05 | |
*** gokrokve has quit IRC | 05:20 | |
*** rakhmerov has joined #openstack-infra | 05:22 | |
*** rakhmerov has quit IRC | 05:27 | |
*** loq_mac has joined #openstack-infra | 05:28 | |
*** yamahata has quit IRC | 05:41 | |
*** yamahata has joined #openstack-infra | 05:42 | |
*** oubiwann has quit IRC | 05:45 | |
*** rcarrillocruz1 has quit IRC | 05:54 | |
*** dpyzhov has joined #openstack-infra | 05:59 | |
*** gyee_ has quit IRC | 06:02 | |
*** mattoliverau has joined #openstack-infra | 06:06 | |
*** rakhmerov has joined #openstack-infra | 06:08 | |
*** mattoliverau has quit IRC | 06:09 | |
*** rakhmerov has quit IRC | 06:12 | |
*** dcramer_ has joined #openstack-infra | 06:13 | |
*** mattoliverau has joined #openstack-infra | 06:15 | |
*** mattoliverau has quit IRC | 06:15 | |
*** mattoliverau has joined #openstack-infra | 06:20 | |
*** loq_mac has quit IRC | 06:22 | |
*** loq_mac has joined #openstack-infra | 06:23 | |
*** SergeyLukjanov has joined #openstack-infra | 06:28 | |
*** mattoliverau has quit IRC | 06:29 | |
*** mattoliverau has joined #openstack-infra | 06:30 | |
*** starmer_ has quit IRC | 06:31 | |
*** praneshp has quit IRC | 06:38 | |
*** mattoliverau has quit IRC | 06:44 | |
openstackgerrit | Michael Still proposed a change to openstack-infra/zuul: Implement a simple mysql reporter. https://review.openstack.org/65885 | 06:46 |
*** mattoliverau has joined #openstack-infra | 06:48 | |
*** mattoliverau has quit IRC | 06:49 | |
*** mattoliverau has joined #openstack-infra | 06:53 | |
*** mattoliv1rau has joined #openstack-infra | 06:58 | |
*** mattoliverau has quit IRC | 06:58 | |
*** mattoliv1rau has quit IRC | 07:07 | |
*** mattoliverau has joined #openstack-infra | 07:08 | |
*** SergeyLukjanov is now known as _SergeyLukjanov | 07:08 | |
*** rakhmerov has joined #openstack-infra | 07:08 | |
*** _SergeyLukjanov has quit IRC | 07:09 | |
*** harlowja is now known as harlowja_away | 07:10 | |
*** mattoliverau has quit IRC | 07:12 | |
*** rakhmerov has quit IRC | 07:13 | |
*** SergeyLukjanov has joined #openstack-infra | 07:13 | |
*** starmer has joined #openstack-infra | 07:16 | |
*** loq_mac has quit IRC | 07:25 | |
*** loq_mac has joined #openstack-infra | 07:29 | |
*** a7ndrew has joined #openstack-infra | 07:32 | |
*** bogdando has quit IRC | 07:34 | |
*** loq_mac has quit IRC | 07:34 | |
*** loq_mac has joined #openstack-infra | 07:35 | |
openstackgerrit | Ruslan Kamaldinov proposed a change to openstack-infra/storyboard: [Do not review] Added tests for DB migrations https://review.openstack.org/66116 | 07:37 |
*** denis_makogon has joined #openstack-infra | 07:47 | |
*** cbkyeoh has joined #openstack-infra | 07:48 | |
*** larrycai has joined #openstack-infra | 07:50 | |
*** larrycai has quit IRC | 07:54 | |
*** dpyzhov has quit IRC | 07:55 | |
*** coolsvap has quit IRC | 07:55 | |
*** SergeyLukjanov has quit IRC | 08:03 | |
*** _ruhe is now known as ruhe | 08:08 | |
*** loq_mac has quit IRC | 08:08 | |
*** rakhmerov has joined #openstack-infra | 08:09 | |
*** rakhmerov has quit IRC | 08:14 | |
*** dpyzhov has joined #openstack-infra | 08:14 | |
*** ruhe is now known as _ruhe | 08:19 | |
*** nicedice has quit IRC | 08:31 | |
*** nicedice has joined #openstack-infra | 08:33 | |
*** hashar has joined #openstack-infra | 08:38 | |
*** yolanda has joined #openstack-infra | 08:58 | |
*** rakhmerov has joined #openstack-infra | 09:10 | |
*** rakhmerov1 has joined #openstack-infra | 09:12 | |
*** rakhmerov has quit IRC | 09:12 | |
*** SergeyLukjanov has joined #openstack-infra | 09:12 | |
*** nicedice has quit IRC | 09:15 | |
*** rakhmerov1 has quit IRC | 09:16 | |
*** SergeyLukjanov is now known as _SergeyLukjanov | 09:27 | |
*** _SergeyLukjanov has quit IRC | 09:28 | |
*** denis_makogon has quit IRC | 09:29 | |
*** starmer has quit IRC | 09:30 | |
*** SergeyLukjanov has joined #openstack-infra | 09:48 | |
*** hashar has quit IRC | 09:55 | |
*** rakhmerov has joined #openstack-infra | 10:12 | |
*** yolanda has quit IRC | 10:13 | |
*** yolanda has joined #openstack-infra | 10:13 | |
*** rakhmerov has quit IRC | 10:17 | |
*** dpyzhov has quit IRC | 10:27 | |
*** michchap has quit IRC | 10:37 | |
*** michchap has joined #openstack-infra | 10:37 | |
*** wenlock has quit IRC | 11:12 | |
*** rakhmerov has joined #openstack-infra | 11:13 | |
*** enikanorov_ has joined #openstack-infra | 11:14 | |
*** enikanorov has quit IRC | 11:16 | |
*** rakhmerov has quit IRC | 11:17 | |
*** tma996 has joined #openstack-infra | 11:26 | |
*** boris-42 has quit IRC | 11:28 | |
*** boris-42 has joined #openstack-infra | 11:28 | |
*** uvirtbot has joined #openstack-infra | 11:45 | |
*** hashar has joined #openstack-infra | 12:09 | |
*** rakhmerov has joined #openstack-infra | 12:13 | |
*** rakhmerov has quit IRC | 12:18 | |
*** SergeyLukjanov is now known as _SergeyLukjanov | 12:21 | |
*** _SergeyLukjanov is now known as SergeyLukjanov | 12:21 | |
*** hashar has quit IRC | 12:48 | |
*** denis_makogon has joined #openstack-infra | 12:51 | |
*** bauzas has joined #openstack-infra | 12:55 | |
*** yolanda has quit IRC | 13:08 | |
*** rakhmerov has joined #openstack-infra | 13:14 | |
*** cbkyeoh has quit IRC | 13:14 | |
*** cbkyeoh has joined #openstack-infra | 13:15 | |
*** rakhmerov has quit IRC | 13:19 | |
*** cbkyeoh has quit IRC | 13:32 | |
*** cbkyeoh has joined #openstack-infra | 13:34 | |
*** cyeoh has quit IRC | 13:39 | |
*** cbkyeoh is now known as cyeoh | 13:39 | |
*** mozawa has joined #openstack-infra | 14:04 | |
*** rakhmerov has joined #openstack-infra | 14:15 | |
*** rakhmerov has quit IRC | 14:19 | |
*** dcramer_ has quit IRC | 14:33 | |
*** dstanek has joined #openstack-infra | 14:38 | |
*** sdague has quit IRC | 14:44 | |
*** dstanek has quit IRC | 14:48 | |
*** sdague has joined #openstack-infra | 15:02 | |
*** sdague has quit IRC | 15:07 | |
*** sdague has joined #openstack-infra | 15:08 | |
*** dmsimard has joined #openstack-infra | 15:14 | |
*** coolsvap has joined #openstack-infra | 15:14 | |
dmsimard | Are there problems with Jenkins this morning? Getting weird checks again. | 15:15 |
*** dcramer_ has joined #openstack-infra | 15:15 | |
*** rakhmerov has joined #openstack-infra | 15:15 | |
*** ryanpetrello has joined #openstack-infra | 15:17 | |
*** tma996 has quit IRC | 15:18 | |
*** dstanek has joined #openstack-infra | 15:18 | |
*** ryanpetrello has quit IRC | 15:19 | |
*** rakhmerov has quit IRC | 15:20 | |
*** sdake has quit IRC | 15:21 | |
fungi | dmsimard: "weird" how? | 15:22 |
fungi | have an example link? | 15:23 |
dmsimard | fungi: https://review.openstack.org/#/c/64388/1 | 15:23 |
*** tma996 has joined #openstack-infra | 15:24 | |
dmsimard | Only have logs for 2 of the 5 checks and then we get errors that shouldn't be happening on gate-puppet-ceph-puppet-unit-3.0 | 15:25 |
*** CaptTofu has joined #openstack-infra | 15:27 | |
fungi | yeah, i suspect another jenkins unit test slave has gone rogue. hunting it now | 15:29 |
dmsimard | Appreciate it fungi, thanks | 15:29 |
dmsimard | Tough week for you guys ? | 15:30 |
fungi | yes :/ | 15:30 |
fungi | jenkins02 seems to be mostly unresponsive | 15:30 |
fungi | oh, it came up for me on a reload | 15:30 |
*** ryanpetrello has joined #openstack-infra | 15:31 | |
*** gokrokve has joined #openstack-infra | 15:32 | |
*** gokrokve has quit IRC | 15:37 | |
fungi | yep, precise20 was impacted by bug 1267364 now... https://jenkins02.openstack.org/job/gate-puppet-ceph-puppet-syntax/51/console | 15:38 |
uvirtbot | Launchpad bug 1267364 in openstack-ci "Recurrent jenkins slave agent failures" [Critical,In progress] https://launchpad.net/bugs/1267364 | 15:38 |
fungi | i've taken it offline | 15:38 |
fungi | oh, actually maybe not. that's a different backtrace | 15:38 |
fungi | yeah, this one failed the same way on precise20 too... https://jenkins02.openstack.org/job/gate-puppet-ceph-puppet-lint/44/console | 15:41 |
fungi | and i'm getting tons of timeouts/proxy errors from the jenkins02 webui, so i'm going to put it in shutdown, delete the currently disabled slaves so that they don't get reenabled on startup, then give jenkins02 a service restart | 15:42 |
fungi | jenkins01/03/04 are all snappy by comparison | 15:44 |
fungi | at least as snappy as the horrible jenkins webui ever gets | 15:44 |
*** ryanpetrello has quit IRC | 15:47 | |
fungi | oh yeah, cpu graph for jenkins02 is fun... http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=823&rra_id=all | 15:47 |
dmsimard | ouch | 15:47 |
dmsimard | be right back | 15:48 |
fungi | and load average http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=822&rra_id=all | 15:48 |
*** dmsimard has quit IRC | 15:48 | |
*** SergeyLukjanov is now known as _SergeyLukjanov | 15:48 | |
*** dmsimard has joined #openstack-infra | 15:49 | |
fungi | however, matching the cpu and load average graphs to memory consumption makes it look like maybe it was doing something to reclaim cache memory starting around the same time or just before http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=826&rra_id=all | 15:49 |
dmsimard | I'm not familiar with the CI infra. The jenkins nodes are just dispatching to the slaves, right ? | 15:49 |
dmsimard | Why would jenkins02 be crawling then :( | 15:49 |
fungi | EJAVA | 15:50 |
dmsimard | Oh, java, of course | 15:50 |
fungi | jenkins is one giant java virtual machine, and it is doing more than just dispatching jobs. it also gets dynamic slaves registered/deregistered, collects job artifacts from them, keeps track of job status and estimated run time, et cetera | 15:51 |
fungi | and has tons of global locks, so doesn't scale well to our work volume. that's why zuul grew the ability to manage multiple jenkins masters (we have five now) | 15:52 |
fungi | jenkins itself always assumes one and only one master server, so we had to add a layer above it | 15:53 |
*** rakhmerov has joined #openstack-infra | 15:53 | |
fungi | oh, wow, first time i've seen this message on the jenkins management page... "This Jenkins appears to have crashed. Please check the logs." | 15:55 |
*** rakhmerov has quit IRC | 15:55 | |
*** rakhmerov1 has joined #openstack-infra | 15:55 | |
fungi | oh... "Jan 6, 2014 9:07:26 PM (4 days 18 hr ago)" | 15:56 |
fungi | that's probably when we were restarting it | 15:56 |
fungi | and i had to forcibly kill it | 15:56 |
fungi | okay, i've placed jenkins02 in prepare for shutdown state | 15:58 |
dmsimard | Think I'll have to issue another recheck comment ? | 15:58 |
fungi | dmsimard: no, hopefully not for this issue anyway | 15:59 |
*** dstanek has quit IRC | 15:59 | |
dmsimard | ok if it happens again i'll link that bug | 16:00 |
fungi | dmsimard: oh, well, you'll need to issue *a* recheck comment (you entered "reverify" instead) | 16:00 |
dmsimard | So it's recheck, not reverify ? I could've sworn I saw a reverify work | 16:00 |
fungi | reverify is for changes which get approved and fail to merge due to test failures | 16:01 |
dmsimard | Ah | 16:01 |
fungi | recheck is for patchsets which are still undergoing review | 16:01 |
*** gokrokve has joined #openstack-infra | 16:02 | |
fungi | dmsimard: they're discussed briefly in the article section jenkins links in its failure comments... https://wiki.openstack.org/wiki/GerritJenkinsGit#Test_Failures | 16:02 |
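This sketch of fungi's rule is deliberately simplified and hypothetical. "recheck" re-runs checks on a patchset under review. "reverify" applies only to approved changes that failed to merge, so on an unapproved change like the one above it triggers nothing. The patterns and the `pipeline_for_comment` helper are illustrative assumptions, not zuul's actual trigger configuration.

```python
import re

# Hypothetical patterns; the real trigger regexes live in zuul's layout
# configuration and are not reproduced here.
RECHECK = re.compile(r"^\s*recheck\b", re.IGNORECASE)
REVERIFY = re.compile(r"^\s*reverify\b", re.IGNORECASE)


def pipeline_for_comment(comment, approved):
    """Return the pipeline a retry comment would re-trigger, or None."""
    if RECHECK.match(comment):
        return "check"
    if REVERIFY.match(comment) and approved:
        return "gate"
    return None  # e.g. "reverify" on a change that is not approved
```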
dmsimard | K, sent a recheck, let's wait and see | 16:04 |
*** oubiwann has joined #openstack-infra | 16:04 | |
*** dmsimard has quit IRC | 16:07 | |
*** ryanpetrello has joined #openstack-infra | 16:10 | |
fungi | jenkins02 is estimating over an hour for some currently running jobs, so i'm going to step away for a few while it winds down | 16:12 |
*** wenlock has joined #openstack-infra | 16:12 | |
*** _SergeyLukjanov is now known as SergeyLukjanov | 16:12 | |
*** gokrokve has quit IRC | 16:16 | |
*** wenlock has quit IRC | 16:16 | |
*** ryanpetrello has quit IRC | 16:35 | |
*** tma996 has quit IRC | 16:37 | |
*** oubiwann has quit IRC | 16:47 | |
*** gokrokve has joined #openstack-infra | 16:55 | |
*** krtaylor has quit IRC | 16:59 | |
fungi | last job on jenkins02 is finishing up now | 17:00 |
*** gokrokve has quit IRC | 17:08 | |
*** agordeev has quit IRC | 17:10 | |
*** agordeev has joined #openstack-infra | 17:15 | |
fungi | okay, i was able to delete all slaves on jenkins02 with hung jobs from several days ago, though it took multiple tries of offlining/disconnecting to be able to delete them... centos6-{2,12,14} and precise{2,22,36} | 17:26 |
*** dcramer_ has quit IRC | 17:27 | |
fungi | there was also a hung gate-swift-docs job pending in the queue on jenkins02 for some reason, again dating back a couple days, which i was able to kill | 17:27 |
fungi | i also deleted the slaves which either we or jenkins02 offlined for connectivity problems... precise{4,10,12,18,20,26,34} | 17:27 |
fungi | and deleted all nodepool nodes for jenkins02 from nodepool except for that one hung devstack-precise-hpcloud-az2-777371 which even novaclient can't seem to delete | 17:28 |
fungi | and also deleted any lingering nodepool-managed nodes from the jenkins02 webui | 17:28 |
fungi | tried but was unable to stop the jenkins service on jenkins02 cleanly from its initscript (like earlier in the week) | 17:30 |
fungi | sigterm and sighup were being ignored by both the daemon process and the java subprocess. had to take the child down with sigsegv and then separately do the same to the parent since it still didn't terminate on its own (even with lots of waiting between various signals) | 17:35 |
fungi | uhh, top claims the 10-minute load average is 2195.30 | 17:36 |
fungi | that doesn't even seem possible | 17:36 |
fungi | dropping rapidly, so must have been much higher | 17:37 |
fungi | jenkins02 is completely idle now though, so starting jenkins service on it again | 17:38 |
*** dcramer_ has joined #openstack-infra | 17:38 | |
fungi | it's back up, running jobs, and at least one has already succeeded | 17:40 |
*** SergeyLukjanov has quit IRC | 17:40 | |
openstackgerrit | Sean Dague proposed a change to openstack-infra/config: Zuul status: don't toggle on link click https://review.openstack.org/64716 | 17:43 |
openstackgerrit | Sean Dague proposed a change to openstack-infra/config: provide time in queue in zuul ui https://review.openstack.org/65993 | 17:43 |
openstackgerrit | Sean Dague proposed a change to openstack-infra/config: clean up possible js incompatibilities https://review.openstack.org/66057 | 17:43 |
openstackgerrit | Sean Dague proposed a change to openstack-infra/config: make merge conflict changes black https://review.openstack.org/66056 | 17:43 |
sdague | fungi: is it possible to inject a periodic event? | 17:47 |
*** boris-42 has quit IRC | 17:48 | |
fungi | sdague: not currently i don't think, but i'll try | 17:48 |
*** dcramer_ has quit IRC | 17:48 | |
sdague | I was trying to fix a layout issue | 17:48 |
sdague | but it only shows up on periodics | 17:49 |
fungi | yep, see that--was in the middle of reviewing the new patchset for it | 17:49 |
fungi | looks like zuul enqueue requires --change which i suspect won't take a git refname | 17:50 |
fungi | trigger-job.py can take refs, but only has parameters for a few of the pipelines (not the periodic one) | 17:51 |
*** boris-42 has joined #openstack-infra | 17:51 | |
fungi | and while i think zuul-dev runs some dummy periodic changes with great frequency (or at least used to), its status.json currently isn't available apparently... http://zuul-dev.openstack.org/ | 17:52 |
fungi | sdague: i believe there are some sample files in the zuul repo though, which might be fed in to test the interface? | 17:53 |
fungi | sample json payloads i mean | 17:53 |
sdague | fungi: yeh, so I actually think my current layout work around will be fine | 17:53 |
sdague | honestly, I'm about to stop doing this for the day, just one more email :) | 17:54 |
fungi | http://git.openstack.org/cgit/openstack-infra/zuul/tree/etc/status/public_html/status-openstack.json-sample | 17:54 |
fungi | oh, looks like the sample is missing the periodic pipeline | 17:55 |
sdague | the fact that we're still 40 deep on Sat is not a good sign | 18:06 |
*** starmer has joined #openstack-infra | 18:06 | |
*** fbo_away is now known as fbo | 18:07 | |
fungi | yeah, i still saw quite a few tempest jobs failing on a variety of issues, most commonly ssh timeouts though | 18:07 |
*** mozawa has quit IRC | 18:09 | |
fungi | yep, the tempest change which caused the last reset died on an ssh timeout in the tempest run at the end of its grenade job | 18:11 |
*** rakhmerov1 has quit IRC | 18:23 | |
*** denis_makogon has quit IRC | 18:26 | |
*** yolanda has joined #openstack-infra | 18:48 | |
*** boris-42 has quit IRC | 18:51 | |
*** boris-42 has joined #openstack-infra | 18:51 | |
*** CaptTofu has quit IRC | 19:19 | |
*** CaptTofu has joined #openstack-infra | 19:20 | |
*** coolsvap has quit IRC | 19:21 | |
*** CaptTofu has quit IRC | 19:22 | |
fungi | the 13 persistent slaves which i deleted out of jenkins02 before the reboot have been halted, hard-rebooted through the rackspace dashboard, readded to jenkins02 and i've watched each of them run and complete at least one job successfully apiece | 19:58 |
fungi | er, before the jenkins02 service restart (wasn't a reboot of the server) | 19:59 |
*** CaptTofu has joined #openstack-infra | 20:04 | |
*** mdenny has quit IRC | 20:09 | |
*** dstanek has joined #openstack-infra | 20:28 | |
*** starmer has quit IRC | 20:30 | |
*** starmer has joined #openstack-infra | 20:33 | |
*** dstanek has quit IRC | 20:51 | |
*** senk has joined #openstack-infra | 20:56 | |
*** oubiwann has joined #openstack-infra | 21:02 | |
*** erfanian has joined #openstack-infra | 21:13 | |
*** erfanian has quit IRC | 21:18 | |
*** rakhmerov has joined #openstack-infra | 21:20 | |
*** senk has quit IRC | 21:47 | |
*** senk has joined #openstack-infra | 21:48 | |
*** fbo is now known as fbo_away | 21:48 | |
*** dcramer_ has joined #openstack-infra | 21:49 | |
*** oubiwann has quit IRC | 21:59 | |
*** dimsum has quit IRC | 21:59 | |
*** oubiwann has joined #openstack-infra | 22:04 | |
sdague | fungi: if you are still around today - https://review.openstack.org/#/c/65804/ and pop it to top of queue? Will be interesting to see if that clears everything out. | 22:16 |
*** olaph has quit IRC | 22:17 | |
*** annegentle_ has quit IRC | 22:20 | |
*** ryanpetrello has joined #openstack-infra | 22:23 | |
fungi | gotta run out to dinner, but i'll do that real fast | 22:24 |
fungi | done | 22:25 |
fungi | back later | 22:25 |
*** senk has quit IRC | 22:29 | |
*** ryanpetrello has quit IRC | 22:36 | |
*** rnirmal has joined #openstack-infra | 22:39 | |
*** salv-orlando has quit IRC | 22:54 | |
*** ryanpetrello has joined #openstack-infra | 22:55 | |
*** yolanda has quit IRC | 23:02 | |
openstackgerrit | Emilien Macchi proposed a change to openstack-infra/devstack-gate: Enable Neutron metering service plugin https://review.openstack.org/66142 | 23:14 |
zaro | fungi: i'm having trouble making a release. | 23:14 |
jeblair | sdague: did you see i pasted the json with a periodic event? | 23:15 |
jeblair | sdague: i was hoping that would help you debug/fix that | 23:16 |
jeblair | sdague, fungi: also, 65804 wasn't designed to change anything; it just adds the knob, it doesn't turn it. | 23:17 |
jeblair | sdague, fungi: https://review.openstack.org/#/c/65805/ turns the knob down, at the expense of increasing the runtime to 1.0-1.5 hours | 23:20 |
*** bauzas has quit IRC | 23:21 | |
*** dstanek has joined #openstack-infra | 23:26 | |
openstackgerrit | A change was merged to openstack-infra/config: Zuul status: don't toggle on link click https://review.openstack.org/64716 | 23:34 |
*** erfanian has joined #openstack-infra | 23:34 | |
*** rnirmal has quit IRC | 23:37 | |
*** oubiwann has quit IRC | 23:39 | |
*** michchap has quit IRC | 23:46 | |
*** michchap has joined #openstack-infra | 23:46 | |
*** dstanek has quit IRC | 23:47 | |
*** erfanian has quit IRC | 23:56 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!