*** xek__ has joined #openstack-infra | 00:00 | |
*** xek_ has quit IRC | 00:02 | |
*** jamesmcarthur has joined #openstack-infra | 00:03 | |
fungi | does seem to be going a bit more than twice slower | 00:03 |
*** mattw4 has quit IRC | 00:03 | |
*** jamesmcarthur has quit IRC | 00:06 | |
*** jamesmcarthur has joined #openstack-infra | 00:06 | |
clarkb | fungi: seems like we are about 14-15 minutes in and it has 2100 ish tasks | 00:08 |
clarkb | so about the same speed? | 00:08 |
fungi | yeah, maybe | 00:08 |
*** slaweq has joined #openstack-infra | 00:11 | |
*** weifan has joined #openstack-infra | 00:13 | |
*** slaweq has quit IRC | 00:15 | |
*** weifan has quit IRC | 00:17 | |
*** rosmaita has quit IRC | 00:19 | |
*** gyee has quit IRC | 00:19 | |
fungi | nearly done | 00:20 |
*** yolanda has quit IRC | 00:20 | |
fungi | so ~2x14min | 00:20 |
clarkb | ya, you'd expect it to be quicker than that | 00:20 |
clarkb | but at least it hasn't gone slower than the single case | 00:20 |
fungi | so far | 00:20 |
clarkb | should I do 05, 07, 08 next? | 00:20 |
fungi | but yeah, replicating two in parallel took roughly as long as replicating one | 00:20 |
fungi | has 01 been done yet? | 00:21 |
clarkb | yes 01 06 03 04 are done | 00:21 |
fungi | oh, right, 01 went first then 06 | 00:21 |
clarkb | if we do 05 07 08 together we can see if it is 3x14 minutes | 00:22 |
fungi | and we're not doing 02 because it's already removed to replace | 00:22 |
fungi | so, yep, sounds good | 00:22 |
clarkb | fungi: ya, though maybe you should double check that haproxy has 02 pulled as we expect? | 00:22 |
clarkb | doing 05 07 08 now | 00:22 |
fungi | checking now, sure | 00:22 |
clarkb | 6.3k tasks now | 00:23 |
clarkb | I'm going to start on that curry now | 00:23 |
*** weifan has joined #openstack-infra | 00:25 | |
fungi | yeah, 02 is no longer in the pools according to "show stat" | 00:25 |
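As a point of reference, a minimal sketch of checking "show stat" programmatically rather than eyeballing it; the admin socket path and the "gitea02" server name below are assumptions, not the actual deployment details:

```python
#!/usr/bin/env python3
# Hedged sketch: confirm a backend (e.g. gitea02) is absent from the haproxy
# pools by asking the admin socket for "show stat". The socket path and the
# backend name are assumptions; adjust for the actual deployment.
import csv
import io
import socket

SOCKET_PATH = "/var/run/haproxy.sock"  # assumed admin socket location
BACKEND_SUBSTRING = "gitea02"          # server name we expect to be gone

def show_stat(path=SOCKET_PATH):
    """Return the raw CSV output of haproxy's "show stat" command."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    sock.sendall(b"show stat\n")
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    sock.close()
    return b"".join(chunks).decode()

def main():
    raw = show_stat()
    # The first line starts with "# " followed by the CSV header row.
    reader = csv.DictReader(io.StringIO(raw.lstrip("# ")))
    matches = [row for row in reader if BACKEND_SUBSTRING in row.get("svname", "")]
    if matches:
        for row in matches:
            print(row["pxname"], row["svname"], row["status"])
    else:
        print("no pool members matching", BACKEND_SUBSTRING)

if __name__ == "__main__":
    main()
```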
fungi | enjoy curry! | 00:25 |
fungi | i'm monitoring the times for the gerrit replication queue with timestamps, so should know within a minute when it's caught up | 00:29 |
fungi | you know, for science | 00:30 |
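A rough sketch of that kind of timestamped queue watching, assuming SSH access to the Gerrit SSH API on port 29418 and that `gerrit show-queue` is available to the calling account; the task-line parsing is a guess at the output format rather than a documented interface:

```python
#!/usr/bin/env python3
# Hedged sketch of watching the Gerrit task queue with timestamps, roughly
# the "for science" monitoring described above. Host and polling interval
# are placeholders; the line filter below is a heuristic, not a contract.
import datetime
import subprocess
import time

GERRIT_HOST = "review.opendev.org"  # placeholder host
INTERVAL = 60                        # seconds between polls

def queue_length():
    """Count task lines reported by `gerrit show-queue`."""
    out = subprocess.run(
        ["ssh", "-p", "29418", GERRIT_HOST, "gerrit", "show-queue"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Skip header, separator and footer lines; count rows that look like tasks.
    return sum(1 for line in out.splitlines()
               if line and not line.startswith(("Task", "---", " ")))

while True:
    now = datetime.datetime.utcnow().isoformat()
    print(now, queue_length(), "tasks", flush=True)
    time.sleep(INTERVAL)
```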
*** weifan has quit IRC | 00:30 | |
*** rosmaita has joined #openstack-infra | 00:32 | |
*** igordc has quit IRC | 00:45 | |
ianw | anyone know how to encode the date range in the url of the kibana query so if you send it to someone they see the same range? | 00:47 |
fungi | no clue, sorry | 00:51 |
fungi | i've not been able to figure out how to direct-link kibana queries for that matter | 00:51 |
*** betherly has joined #openstack-infra | 00:51 | |
*** yamamoto has joined #openstack-infra | 00:52 | |
logan- | ?from=5d | 00:52 |
logan- | there is a "share" button at the top that will link to the query (2nd from the right) | 00:52 |
ianw | logan-: yeah, that share link doesn't seem to add in the date range if you've selected a custom one | 00:53 |
logan- | nope, it doesn't | 00:53 |
logan- | you have to add the from parameter manually :/ | 00:53 |
ianw | what if it's a range? | 00:54 |
logan- | not sure | 00:54 |
logan- | sorry, i only know how to specify the start time | 00:54 |
logan- | half way there at least :P | 00:54 |
ianw | maybe ?from=..?to=... ... ? | 00:55 |
fungi | worth a try | 00:55 |
*** ricolin has joined #openstack-infra | 00:56 | |
*** betherly has quit IRC | 00:56 | |
clarkb | sorry I don't know | 00:58 |
ianw | http://logstash.openstack.org/#/dashboard/file/logstash.json?from=2019-07-25T16:39:00&to=2019-07-25T17:00:00&query=message:%5C%22HTTP%20Error%20404%5C%22%20AND%20node_provider:%20rax-iad | 00:58 |
ianw | does not appear to work | 00:58 |
ianw | it seems like this is done by default in kibana 4, but for 3 might be out of luck :/ | 00:59 |
ianw | (date range as part of shared url i mean) | 00:59 |
clarkb | ya but 4 requires write access to elasticsearch | 01:01 |
clarkb | which is why we never upgraded | 01:02 |
fungi | last batch nearly done | 01:02 |
clarkb | fungi: about on time | 01:02 |
fungi | yeah | 01:02 |
fungi | and done | 01:02 |
clarkb | I wonder where we degrade to an hour per and if it is related to load like you suggested | 01:03 |
fungi | so... 40 minutes? | 01:03 |
fungi | right | 01:03 |
openstackgerrit | Joshua Hesketh proposed opendev/system-config master: Toggle CI should also hide old zuul comments https://review.opendev.org/671436 | 01:09 |
*** betherly has joined #openstack-infra | 01:13 | |
rm_work | err, is there a good channel to ask someone about some Shanghai visa specifics? nothing to do with infra really, but you folks are the most connected I know :D | 01:14 |
*** betherly has quit IRC | 01:17 | |
clarkb | rm_work: there is an invite letter form to fill out on the summit site. Other than that it has been suggested to me to use a handling company | 01:17 |
rm_work | yeah i think we have one internally we use | 01:18 |
rm_work | following the email directions seemed to indicate i had to sign up for the summit first but i think i misread pre-coffee and it's fine | 01:18 |
rm_work | (to get the invite letter) | 01:19 |
ianw | clarkb / auristor: correlating everything (as clarkb had done anyway) it seems very likely that apache can think files aren't on disk during releases even with 5.3-rc1 afs-next branch; see -> http://lists.infradead.org/pipermail/linux-afs/2019-July/003122.html | 01:19 |
rm_work | ahh no, it does, the form says your order ID is required | 01:22 |
rm_work | so ... if we are waiting for speaker codes... we just have to keep waiting before we can get our visa thing? | 01:23 |
clarkb | I guess? I was also told the visa is relatively quick | 01:24 |
auristor | ianw: is there FileAuditLog data for [Thu Jul 25 16:39:18 2019] kAFS: Volume 536870968 'mirror.epel' is offline ? | 01:24 |
ianw | auristor: no unfortunately, it's turned off atm. i can go back and re-enable it like last time now we have these new changes to test | 01:25 |
rm_work | i've been told to allow 2 months for the visa process, lol | 01:26 |
rm_work | but I guess we do have a bit | 01:26 |
*** diablo_rojo has quit IRC | 01:27 | |
ianw | auristor: i can do that in a bit. the tar.gz i provided before was sufficient right? just replicate that? | 01:27 |
*** jamesmcarthur has quit IRC | 01:30 | |
auristor | ianw: yes, the same contents as the last time would be great | 01:32 |
*** betherly has joined #openstack-infra | 01:33 | |
auristor | ianw: openafs, unlike kafs, will upon receiving a VBUSY or VOFFLINE error sleep 15 seconds and then retry the request up to 100 times. at the moment, kafs will immediately failover to the other locations but will then fail the syscall. | 01:35 |
auristor | ianw: I would like to confirm from the FileAuditLog entries whether or not the failover is taking place | 01:36 |
ianw | ok. i'm not sure if there's a way to make apache a bit more verbose too about what it is seeing on disk | 01:38 |
*** betherly has quit IRC | 01:38 | |
auristor | I think the translation of VOFFLINE to ENOENT is wrong | 01:41 |
*** betherly has joined #openstack-infra | 01:54 | |
*** betherly has quit IRC | 01:59 | |
auristor | fs/afs/misc.c afs_abort_to_error() should not convert VOFFLINE to ENOENT but to ENODEV because ENOENT will cause a negative lookup to be cached resulting in a missing file error. | 02:03 |
*** tinwood has quit IRC | 02:10 | |
*** slaweq has joined #openstack-infra | 02:11 | |
*** tinwood has joined #openstack-infra | 02:12 | |
*** betherly has joined #openstack-infra | 02:15 | |
*** slaweq has quit IRC | 02:16 | |
*** betherly has quit IRC | 02:20 | |
*** bobh has joined #openstack-infra | 02:22 | |
*** whoami-rajat has joined #openstack-infra | 02:26 | |
*** bobh has quit IRC | 02:36 | |
*** yamamoto has quit IRC | 02:36 | |
*** factor has joined #openstack-infra | 02:38 | |
*** yamamoto has joined #openstack-infra | 02:50 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: AFS audit logging: helper script https://review.opendev.org/672847 | 02:54 |
ianw | auristor: logging enabled; for infra-root ^ should be helpful so there's less magic involved | 02:54 |
*** betherly has joined #openstack-infra | 02:57 | |
prometheanfire | ianw: do_extra_package_install does not include the hooks mount, is this intentional? | 02:58 |
auristor | ianw; thanks | 02:59 |
*** jamesmcarthur has joined #openstack-infra | 02:59 | |
ianw | prometheanfire: ahhh, i have no idea :) i guess it's just always been like that | 02:59 |
prometheanfire | ok, may need to unify that behavior then :D | 02:59 |
prometheanfire | I guess I could just have it run portageq itself, since it will always be in the chroot | 02:59 |
prometheanfire | ya, will just do that | 02:59 |
*** betherly has quit IRC | 03:01 | |
*** EmilienM|pto is now known as EmilienM | 03:01 | |
*** HenryG has quit IRC | 03:04 | |
*** slaweq has joined #openstack-infra | 03:11 | |
*** bhavikdbavishi has joined #openstack-infra | 03:13 | |
*** rh-jelabarre has quit IRC | 03:14 | |
*** slaweq has quit IRC | 03:15 | |
*** psachin has joined #openstack-infra | 03:22 | |
*** psachin has quit IRC | 03:23 | |
*** psachin has joined #openstack-infra | 03:26 | |
*** betherly has joined #openstack-infra | 03:28 | |
*** michael-beaver has quit IRC | 03:32 | |
*** ykarel|away has joined #openstack-infra | 03:32 | |
*** betherly has quit IRC | 03:33 | |
*** gregoryo has joined #openstack-infra | 03:35 | |
*** diablo_rojo has joined #openstack-infra | 03:43 | |
*** bhavikdbavishi1 has joined #openstack-infra | 03:46 | |
prometheanfire | doesn't do the caching either | 03:47 |
*** bhavikdbavishi has quit IRC | 03:48 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 03:48 | |
*** betherly has joined #openstack-infra | 03:48 | |
*** betherly has quit IRC | 03:53 | |
*** HenryG has joined #openstack-infra | 03:54 | |
*** udesale has joined #openstack-infra | 03:57 | |
*** betherly has joined #openstack-infra | 04:09 | |
*** slaweq has joined #openstack-infra | 04:11 | |
*** ramishra has joined #openstack-infra | 04:12 | |
*** betherly has quit IRC | 04:14 | |
*** slaweq has quit IRC | 04:15 | |
*** apetrich has quit IRC | 04:20 | |
*** betherly has joined #openstack-infra | 04:30 | |
*** jamesmcarthur has quit IRC | 04:32 | |
*** betherly has quit IRC | 04:35 | |
*** ykarel|away has quit IRC | 04:40 | |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: support alternate portage directories https://review.opendev.org/671530 | 04:41 |
*** yamamoto_ has joined #openstack-infra | 04:46 | |
*** yamamoto has quit IRC | 04:50 | |
*** diablo_rojo has quit IRC | 04:56 | |
*** betherly has joined #openstack-infra | 05:01 | |
*** jbadiapa has quit IRC | 05:02 | |
*** ykarel|away has joined #openstack-infra | 05:02 | |
*** betherly has quit IRC | 05:06 | |
*** armax has quit IRC | 05:16 | |
*** jaosorior has joined #openstack-infra | 05:21 | |
*** ramishra has quit IRC | 05:27 | |
*** ramishra has joined #openstack-infra | 05:38 | |
*** rtjure has joined #openstack-infra | 05:42 | |
*** yamamoto_ has quit IRC | 05:43 | |
*** zbr has joined #openstack-infra | 05:44 | |
*** kjackal has quit IRC | 05:44 | |
*** jamesmcarthur has joined #openstack-infra | 05:45 | |
*** yamamoto has joined #openstack-infra | 05:56 | |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: support alternate portage directories https://review.opendev.org/671530 | 05:57 |
*** rcernin has quit IRC | 06:08 | |
*** slaweq has joined #openstack-infra | 06:11 | |
*** odicha has joined #openstack-infra | 06:14 | |
*** apetrich has joined #openstack-infra | 06:15 | |
*** slaweq has quit IRC | 06:16 | |
*** gregoryo has quit IRC | 06:17 | |
*** yamamoto_ has joined #openstack-infra | 06:19 | |
*** yamamoto has quit IRC | 06:21 | |
*** jbadiapa has joined #openstack-infra | 06:24 | |
*** dpawlik has joined #openstack-infra | 06:25 | |
*** Lucas_Gray has quit IRC | 06:26 | |
*** jamesmcarthur has quit IRC | 06:33 | |
*** ramishra has quit IRC | 06:34 | |
*** iurygregory has joined #openstack-infra | 06:34 | |
*** ykarel|away is now known as ykarel | 06:36 | |
*** joeguo has quit IRC | 06:44 | |
*** kjackal has joined #openstack-infra | 06:45 | |
*** rlandy has joined #openstack-infra | 06:47 | |
*** kjackal has quit IRC | 06:48 | |
*** jpena|off is now known as jpena | 06:51 | |
*** jpena is now known as jpena|mtg | 06:51 | |
*** raukadah is now known as chandankumar | 06:51 | |
*** Vadmacs has joined #openstack-infra | 06:59 | |
*** kjackal has joined #openstack-infra | 07:00 | |
*** tesseract has joined #openstack-infra | 07:03 | |
*** slaweq has joined #openstack-infra | 07:07 | |
*** ginopc has joined #openstack-infra | 07:10 | |
*** ykarel is now known as ykarel|lunch | 07:26 | |
*** bhavikdbavishi has quit IRC | 07:26 | |
*** kopecmartin|off is now known as kopecmartin | 07:26 | |
*** cshen has joined #openstack-infra | 07:32 | |
*** tosky has joined #openstack-infra | 07:34 | |
*** dchen has quit IRC | 07:39 | |
*** cshen has quit IRC | 07:43 | |
*** rpittau|afk is now known as rpittau | 07:44 | |
*** pcaruana has joined #openstack-infra | 07:44 | |
*** Goneri has joined #openstack-infra | 07:45 | |
*** ramishra has joined #openstack-infra | 07:53 | |
*** bhavikdbavishi has joined #openstack-infra | 07:54 | |
*** dtantsur|afk is now known as dtantsur | 07:55 | |
*** ralonsoh has joined #openstack-infra | 07:56 | |
*** cshen has joined #openstack-infra | 07:56 | |
*** ramishra has quit IRC | 07:57 | |
*** ramishra has joined #openstack-infra | 07:57 | |
*** lucasagomes has joined #openstack-infra | 08:03 | |
*** yamamoto_ has quit IRC | 08:05 | |
*** yamamoto has joined #openstack-infra | 08:13 | |
*** ricolin has quit IRC | 08:19 | |
*** pkopec has joined #openstack-infra | 08:23 | |
*** siqbal has joined #openstack-infra | 08:29 | |
*** rosmaita has quit IRC | 08:30 | |
*** rosmaita has joined #openstack-infra | 08:34 | |
*** joeguo has joined #openstack-infra | 08:41 | |
*** bhavikdbavishi has quit IRC | 08:47 | |
*** ykarel|lunch is now known as ykarel | 08:49 | |
*** bhavikdbavishi has joined #openstack-infra | 08:56 | |
*** e0ne has joined #openstack-infra | 09:16 | |
*** bhavikdbavishi has quit IRC | 09:17 | |
*** derekh has joined #openstack-infra | 09:23 | |
*** siqbal has quit IRC | 09:31 | |
*** siqbal has joined #openstack-infra | 09:31 | |
*** arxcruz is now known as arxcruz|off | 09:32 | |
*** cshen has left #openstack-infra | 09:33 | |
*** jbadiapa has quit IRC | 09:39 | |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Spec for allowing circular dependencies https://review.opendev.org/643309 | 09:49 |
*** priteau has joined #openstack-infra | 09:53 | |
*** rlandy has quit IRC | 10:03 | |
*** dpawlik has quit IRC | 10:08 | |
*** yamamoto has quit IRC | 10:10 | |
*** yamamoto has joined #openstack-infra | 10:14 | |
*** yamamoto has quit IRC | 10:15 | |
*** dpawlik has joined #openstack-infra | 10:26 | |
*** tdasilva has quit IRC | 10:29 | |
*** psachin has quit IRC | 10:38 | |
*** udesale has quit IRC | 10:44 | |
*** udesale has joined #openstack-infra | 10:45 | |
*** yamamoto has joined #openstack-infra | 10:45 | |
*** siqbal has quit IRC | 10:47 | |
*** kjackal has quit IRC | 10:48 | |
*** dpawlik has quit IRC | 11:00 | |
*** tdasilva has joined #openstack-infra | 11:02 | |
*** psachin has joined #openstack-infra | 11:02 | |
*** dpawlik has joined #openstack-infra | 11:06 | |
*** jaosorior has quit IRC | 11:15 | |
*** ramishra has quit IRC | 11:19 | |
*** ramishra has joined #openstack-infra | 11:20 | |
*** roman_g has quit IRC | 11:25 | |
*** roman_g has joined #openstack-infra | 11:26 | |
*** kjackal has joined #openstack-infra | 11:26 | |
*** EmilienM has quit IRC | 11:27 | |
*** EmilienM has joined #openstack-infra | 11:28 | |
*** jbadiapa has joined #openstack-infra | 11:37 | |
*** larainema has joined #openstack-infra | 11:51 | |
*** yamamoto has quit IRC | 11:51 | |
*** irclogbot_2 has quit IRC | 11:53 | |
*** yamamoto has joined #openstack-infra | 11:54 | |
*** irclogbot_1 has joined #openstack-infra | 11:54 | |
*** rh-jelabarre has joined #openstack-infra | 11:59 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Build gerrit images for 2.16 and 3.0 as well https://review.opendev.org/672273 | 12:05 |
*** joeguo has quit IRC | 12:06 | |
*** bhavikdbavishi has joined #openstack-infra | 12:07 | |
*** hwoarang has quit IRC | 12:08 | |
*** jbadiapa has quit IRC | 12:08 | |
*** jbadiapa has joined #openstack-infra | 12:08 | |
*** bhavikdbavishi1 has joined #openstack-infra | 12:09 | |
*** bhavikdbavishi has quit IRC | 12:11 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 12:11 | |
*** hwoarang has joined #openstack-infra | 12:13 | |
*** derekh has quit IRC | 12:24 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: manager: specify report failure in logs https://review.opendev.org/671760 | 12:31 |
*** rascasoft has quit IRC | 12:32 | |
*** Goneri has quit IRC | 12:34 | |
*** rascasoft has joined #openstack-infra | 12:34 | |
*** yamamoto has quit IRC | 12:34 | |
*** derekh has joined #openstack-infra | 12:36 | |
*** dpawlik has quit IRC | 12:37 | |
*** yamamoto has joined #openstack-infra | 12:38 | |
*** dpawlik has joined #openstack-infra | 12:43 | |
*** jpena|mtg is now known as jpena|off | 12:47 | |
*** mriedem has joined #openstack-infra | 12:53 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Build gerrit images for 2.16 and 3.0 as well https://review.opendev.org/672273 | 12:55 |
*** yamamoto has quit IRC | 12:57 | |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Builds page - Fix bad labels display https://review.opendev.org/672973 | 12:59 |
*** bhavikdbavishi has quit IRC | 12:59 | |
*** xek__ has quit IRC | 13:01 | |
*** xek__ has joined #openstack-infra | 13:02 | |
donnyd | Last night I finally got together something to gather metrics for FN. It looks like the hypervisors are underutilized, so I am going to turn things back up a bit. | 13:06 |
donnyd | CPU utilization for the most part sits around 20%, and I am thinking 40-50% would make more use of my equipment. So with that, I turned it back up to 60 | 13:07 |
donnyd | I will watch over the weekend to see what it does... I am still.. waiting... on... parts.. for the new storage (fans this time) and when I go to put it in place there will be hard downtime. So I will need to roll it back to zero, but there should be plenty of advance notice. | 13:08 |
donnyd | Still getting timeouts, but about 1/3 as many as rax-ord (probably because they have 4x the instances) | 13:10 |
donnyd | Seems to be the same jobs timing out everywhere, so I am pretty sure it's not from the infra | 13:10 |
*** b3nt_pin is now known as beagles | 13:15 | |
*** bhavikdbavishi has joined #openstack-infra | 13:18 | |
*** jpena|off is now known as jpena | 13:19 | |
*** ykarel has quit IRC | 13:19 | |
*** ekultails has joined #openstack-infra | 13:20 | |
*** aaronsheffield has joined #openstack-infra | 13:28 | |
*** smcginnis has joined #openstack-infra | 13:29 | |
*** Goneri has joined #openstack-infra | 13:29 | |
*** yamamoto has joined #openstack-infra | 13:31 | |
*** goldyfruit has joined #openstack-infra | 13:31 | |
*** bhavikdbavishi has quit IRC | 13:35 | |
*** pkopec has quit IRC | 13:36 | |
*** pkopec has joined #openstack-infra | 13:37 | |
*** michael-beaver has joined #openstack-infra | 13:38 | |
*** jpena is now known as jpena|off | 13:38 | |
fungi | donnyd: there are also guaranteed to be at least some jobs whose runtimes have crept up close to their timeout values and so minor variances cause them to overrun the allowed duration | 13:40 |
fungi | usually easy to tell by looking at the success runs, though the durations for them are a click away from http://zuul.opendev.org/t/openstack/builds | 13:42 |
fungi | i wonder if including a duration column there would be useful | 13:42 |
fungi | and then there are jobs which have nondeterministic/race condition issues causing some process to hang, so the success runs are well under their timeouts but then sometimes they timeout inexplicably because the job stops indefinitely halfway through | 13:43 |
donnyd | To me it would be extremely useful. A large part of this project is to figure out what it takes to make a CI system go as fast as possible | 13:44 |
*** yamamoto has quit IRC | 13:45 | |
*** xek__ has quit IRC | 13:46 | |
*** xek__ has joined #openstack-infra | 13:47 | |
*** yamamoto has joined #openstack-infra | 13:48 | |
mordred | fungi: if we look in to adding that duration column, we should keep in mind there are some jobs that will show a long duration because they were paused (docker registry, for instance) - so we should account for that somehow, or maybe have a total duration and a total active duration or something | 13:51 |
donnyd | I would think any metric on runtime would be a great place to start. Is there anything I could do to speed up the container-based builds? they seem to time out more than others | 13:52 |
*** liuyulong has joined #openstack-infra | 13:55 | |
*** FlorianFa has quit IRC | 13:58 | |
*** eharney has joined #openstack-infra | 14:02 | |
AJaeger | config-core, puppet-crane is updated now - want to help retiring it, please? https://review.opendev.org/#/c/671268/ | 14:05 |
AJaeger | thanks, mordred | 14:08 |
AJaeger | config-core, and one review to rename a job in grafana, please - https://review.opendev.org/672290 | 14:09 |
*** ykarel has joined #openstack-infra | 14:12 | |
*** dpawlik has quit IRC | 14:16 | |
*** psachin has quit IRC | 14:22 | |
*** odicha has quit IRC | 14:22 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add react-lazylog package https://review.opendev.org/672988 | 14:23 |
fungi | mordred: great point | 14:24 |
fungi | donnyd: have an example? | 14:24 |
*** bhavikdbavishi has joined #openstack-infra | 14:25 | |
*** bnemec is now known as beekneemech | 14:26 | |
*** Goneri has quit IRC | 14:27 | |
*** xek__ has quit IRC | 14:28 | |
*** xek__ has joined #openstack-infra | 14:29 | |
donnyd | Well from a provider perspective I'm concerned with job start/finish times. I am pretty sure we already gather the data. But I can really only track instance on-to-off on my end | 14:30 |
*** smrcascao has joined #openstack-infra | 14:31 | |
clarkb | we track that for every job in graphite | 14:31 |
fungi | donnyd: oh, i meant an example build for something containery which was slow | 14:31 |
donnyd | Oh, yea | 14:31 |
fungi | but sure, makes sense | 14:31 |
clarkb | every job also has its own timeout value though which isn't in graphite and I think what we really want is proximity of duration to timeout | 14:31 |
fungi | i'm mostly just curious if some of these jobs are failing to use mirrors and whatnot | 14:31 |
clarkb | fungi: ++ | 14:31 |
openstackgerrit | Merged openstack/project-config master: Remove puppet-crane https://review.opendev.org/671268 | 14:32 |
donnyd | Well for the containery thing, maybe a proxy that can cache container images in memory or very fast disk | 14:32 |
clarkb | fungi: for gitea backend replacements do we have a change to pull more of them out of inventory? or do we want to go ahead with 02 for now? | 14:32 |
donnyd | But I have no real ideas what it would take without digging in | 14:32 |
clarkb | donnyd: that is what our mirror node does | 14:32 |
donnyd | Does it do that for containers | 14:32 |
clarkb | donnyd: yup | 14:33 |
clarkb | donnyd: as long as you request the images through mirror.regionone.fortnebula.opendev.org instead of hub.docker.com directly | 14:33 |
fungi | clarkb: i was mostly thinking of doing more in parallel because syncing took so long, but last night's syncs were fast, so one at a time is likely fine | 14:34 |
donnyd | Oh... well then the job timeouts have to be more related to the actual job config than the service provider... Without a massive CPU / Memory upgrade, I cannot make things turn any faster.. and from what I could tell the workload seems to be IO bound anyways | 14:34 |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP: try lazylog https://review.opendev.org/672991 | 14:34 |
donnyd | clarkb: I guess we would have to look at the failing jobs to see where they are getting their bits from | 14:35 |
*** ricolin has joined #openstack-infra | 14:36 | |
fungi | clarkb: anyway, should be safe to delete 02 so i'm doing that now and will then boot the replacement | 14:37 |
openstackgerrit | Graham Hayes proposed zuul/nodepool master: Implement an Azure driver https://review.opendev.org/554432 | 14:39 |
clarkb | donnyd: ya that is why fungi was looking for examples | 14:39 |
clarkb | fungi: gotcha. fwiw you don't have to delete the old one first if you don't want to (though with 8 total backends deleting first should also be totally fine) | 14:40 |
clarkb | fungi: don't forget to look for leaked volume after server delete | 14:40 |
fungi | yup | 14:40 |
fungi | no available volumes in that region/tenant | 14:41 |
fungi | so nova took care of it this time | 14:41 |
fungi | sudo /opt/system-config/launch/launch-node.py gitea02.opendev.org --flavor=v2-highcpu-8 --cloud=openstackci-vexxhost --region=sjc1 --image=infra-ubuntu-bionic-minimal-20190612 --boot-from-volume --volume-size=80 --ignore_ipv6 --network=public --config-drive | 14:43 |
fungi | those are the options we settled on previously | 14:43 |
*** jamesmcarthur has joined #openstack-infra | 14:43 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Build gerrit images for 2.16 and 3.0 as well https://review.opendev.org/672273 | 14:43 |
clarkb | looks correct to me | 14:43 |
fungi | [WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details | 14:44 |
fungi | [WARNING]: Unable to parse /opt/system-config/inventory/emergency.yaml as an inventory source | 14:45 |
fungi | yeah, that file doesn't exist | 14:45 |
clarkb | fungi: that may be a regression in launch-node looking for specific inventory files which have since moved | 14:46 |
fungi | yeah, i'm hunting it down | 14:46 |
clarkb | yup in launch-node.py it lists that file specifically | 14:46 |
clarkb | should be updated to point to /etc/ansible/hosts/emergency.yaml I think | 14:46 |
fungi | it already includes that | 14:47 |
fungi | oh, wait, playbooks/roles/install-ansible/templates/ansible.cfg.j2 includes it | 14:48 |
fungi | launch/launch-node.py only includes the one from the system-config repo | 14:48 |
*** chandankumar is now known as raukadah | 14:48 | |
clarkb | /opt/system-config/inventory/emergency.yaml is in launch-node.py | 14:48 |
fungi | i guess we can also clean out the nonexistent system-config one and make sure they both use the one from /etc | 14:48 |
clarkb | yup | 14:49 |
fungi | i'll also correct some references in doc/source/sysadmin.rst | 14:50 |
*** openstackgerrit has quit IRC | 14:51 | |
*** openstackgerrit has joined #openstack-infra | 14:52 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP: try lazylog https://review.opendev.org/672991 | 14:52 |
*** kopecmartin is now known as kopecmartin|off | 14:56 | |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Correct emergency file reference in launch script https://review.opendev.org/672996 | 14:57 |
fungi | turns out the other entries in playbooks/roles/install-ansible/templates/ansible.cfg.j2 were fine, they were for the actual inventories in the system-config repo | 14:57 |
openstackgerrit | Jeremy Stanley proposed opendev/zone-opendev.org master: Update IP address for gitea02 https://review.opendev.org/672997 | 15:02 |
clarkb | fungi: ^ we need a change to add that host to the inventory as gitea02 too | 15:04 |
*** piotrowskim has quit IRC | 15:04 | |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Add gitea02 replacement to inventory https://review.opendev.org/672999 | 15:05 |
fungi | yeah, i was typing it ;) | 15:05 |
*** ykarel is now known as ykarel|away | 15:05 | |
*** yamamoto has quit IRC | 15:06 | |
clarkb | left a couple notes on the emergency file docs changes. I think followups for that are fine so I +2'd | 15:07 |
clarkb | fungi: I +2'd 672999 but just realized I think it needs the exclusion on the remote_puppet_git.yaml playbook to avoid having ansible update the db | 15:08 |
*** yamamoto has joined #openstack-infra | 15:09 | |
*** dtantsur is now known as moltendmitry | 15:09 | |
fungi | oh, right, thanks | 15:09 |
*** yamamoto has quit IRC | 15:10 | |
*** bhavikdbavishi has quit IRC | 15:10 | |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Add gitea02 replacement to inventory https://review.opendev.org/672999 | 15:11 |
clarkb | +2 mordred corvus ^ have a moment for that change? | 15:13 |
fungi | gonna grab an early lunch while all that percolates and get the repos initialized when i get back | 15:14 |
clarkb | fungi: we'll want to double check that gitea01 noops as expected when ansible runs against it but it did with 06 so should be fine | 15:14 |
donnyd | Ok, I'm picking up what you are putting down now | 15:14 |
mordred | clarkb: lgtm | 15:14 |
clarkb | donnyd: I'm happy to look at some examples too and should have a logstash window with the timeout query up somewhere, let me see | 15:14 |
donnyd | node_provider:"fortnebula-regionone" AND filename:job-output.txt AND message:"RUN END RESULT_TIMED_OUT" | 15:15 |
clarkb | you can keep all your tabs forever but what they don't tell you is that if you do you'll never find the one you want again | 15:15 |
* clarkb has a tab problem | 15:15 | |
*** jamesmcarthur has quit IRC | 15:17 | |
donnyd | clarkb: needs nested tabbing | 15:17 |
openstackgerrit | Merged opendev/zone-opendev.org master: Update IP address for gitea02 https://review.opendev.org/672997 | 15:21 |
*** jamesmcarthur has joined #openstack-infra | 15:21 | |
*** cmurphy is now known as cmorpheus | 15:25 | |
clarkb | http://logs.openstack.org/34/659434/25/gate/tempest-full/9362471/job-output.txt is an example of a timed out default tempest job | 15:29 |
clarkb | devstack took 32 minutes to run there which isn't the fastest but is also within range of other cloud regions | 15:29 |
clarkb | and it doesn't timeout until it gets into the slow tempest tests run which is right at the end of the job so we are very near the timeout | 15:29 |
donnyd | I would like to get the devstack install times down a little further, but I am not sure where the bottleneck in it is | 15:30 |
clarkb | donnyd: I think a lot of it could be improvements to devstack and the projects themselves. For example we do database migrations for a lot of projects from many releases ago as they haven't been rolled up | 15:32 |
donnyd | On the bright side I bumped the cpu ratios back up and it would seem that this is more where I would like density to be | 15:32 |
clarkb | but ya I agree devstack could be quicker, we just don't have anyone investing in that (and when people suggest alternatives to devstack they tend to be even slower :( ) | 15:33 |
donnyd | https://usercontent.irccloud-cdn.com/file/K6D0a0CW/Screenshot%20from%202019-07-26%2011-32-21.png | 15:33 |
donnyd | towards the left side is this morning with cpu ratio at 1.5:1 | 15:34 |
donnyd | and right side is 2.0:1 | 15:34 |
donnyd | in case anyone is curious | 15:34 |
clarkb | the good thing about that tempest job is that it has dstat logs so we can sanity check those | 15:34 |
clarkb | if you download http://logs.openstack.org/34/659434/25/gate/tempest-full/9362471/controller/logs/dstat-csv_log.txt.gz, uncompress it then you can feed it to https://lamada.eu/dstat-graph/ | 15:35 |
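For anyone who would rather skim numbers than render the graph, a small sketch that pulls a few columns out of such a dstat CSV; the column names used ("wai", "used", "free") are assumptions about dstat's header row and may need adjusting for a particular log:

```python
#!/usr/bin/env python3
# Hedged sketch: summarize a dstat CSV log instead of graphing it. The
# column names ("wai", "used", "free") are assumptions about dstat's header
# row and can repeat across groups (memory vs. swap), so this only looks at
# the first occurrence of each name.
import csv
import sys

def summarize(path):
    with open(path, newline="") as fh:
        rows = list(csv.reader(fh))
    # Skip the preamble: the real header is the first row containing "wai".
    header_idx = next(i for i, row in enumerate(rows) if "wai" in row)
    header = rows[header_idx]
    wanted = {name: header.index(name) for name in ("wai", "used", "free")
              if name in header}
    maxima = {name: 0.0 for name in wanted}
    for row in rows[header_idx + 1:]:
        if len(row) != len(header):
            continue  # partial/truncated line
        for name, col in wanted.items():
            try:
                maxima[name] = max(maxima[name], float(row[col]))
            except ValueError:
                pass
    for name, value in maxima.items():
        print(f"max {name}: {value}")

if __name__ == "__main__":
    summarize(sys.argv[1])
```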
donnyd | I am interesting in what can be done from an infra perspective to speed up devstack | 15:35 |
*** yamamoto has joined #openstack-infra | 15:35 | |
donnyd | interested* | 15:35 |
clarkb | looking at that dstat graph I think (lack of) memory and resulting spike in load average and disk usage and paging are a big hit | 15:37 |
clarkb | c-bak is still a running service which was identified as a memory hog that isn't actually tested by the job? mriedem you responded to my emails about this in the past do you have up to date info? | 15:38 |
mriedem | patch in tempest is still sitting | 15:38 |
mriedem | https://review.opendev.org/#/c/651865/ | 15:38 |
mriedem | i think it's waiting on gmann's work to split apart the integrated gate template into separate jobs so that cinder is running something which runs c-bak so those tests still get run on cinder changes | 15:39 |
*** jbadiapa has quit IRC | 15:39 | |
mriedem | having said that, now that gmann is adding all of these new templates, those new templates, except for anything that runs cinder, should probably also disable c-bak | 15:39 |
clarkb | looking at the graph a bit more closely we can see lack of memory results in a spike in wai state | 15:39 |
clarkb | which will definitely slow things down | 15:40 |
clarkb | mriedem: Ideally we'd look at where the memory use in the projects is coming from too. I know heat did this once and dropped memory use by like 800MB | 15:40 |
donnyd | I can't make disk speeds much more than they already are | 15:40 |
*** yamamoto has quit IRC | 15:41 | |
clarkb | donnyd: I think that is fine. Once we hit swap its sort of a "good luck" point | 15:41 |
donnyd | writes are hitting the limits of an individual nic, although reads could be faster | 15:41 |
*** noorul has joined #openstack-infra | 15:41 | |
clarkb | donnyd: we have swap there because sometimes it is fast enough that we don't have to throw away the job, but if it isn't well hard to blame the cloud for that | 15:41 |
mriedem | aha https://review.opendev.org/#/c/669312/6/.zuul.yaml@213 | 15:41 |
donnyd | I am curious to see if the new storage will speed the jobs up any at all... doesn't look like there is much of a write workload looking at an individual job | 15:42 |
*** armax has joined #openstack-infra | 15:43 | |
donnyd | https://usercontent.irccloud-cdn.com/file/YeqEgCTu/image.png | 15:43 |
donnyd | But looking at network traffic overall I surely bang up against the limits of an individual nic | 15:43 |
clarkb | scanning devstack logs we install uwsgi and compile it because our wheel mirror doesn't have it | 15:44 |
clarkb | fixing that will save ~15 seconds | 15:44 |
clarkb | prometheanfire: smcginnis tonyb any idea why uwsgi isn't listed in global-requirements? if it were we would have wheels ready to go for it | 15:45 |
donnyd | On at least my cloud, my cpu to memory ratios could go way up on the memory side. I am using not even 25% of the addressable memory | 15:46 |
clarkb | donnyd: we've intentionally limited memory in the test environments in part to make it possible for someone to say "my test failed" then run it locally on say a laptop or desktop and not require them to have a rack of servers | 15:46 |
clarkb | it also helps reach a balance with clouds and resource utilization where we don't have a ton of underutilized instances | 15:47 |
clarkb | (tests can scale up by requesting multiple nodes if they know they need that) | 15:47 |
donnyd | That makes sense, but I am curious to know if giving devstack more memory would in-fact speed it up | 15:47 |
donnyd | I am not sure what laptops out there have 8 cores, but only 8 GB of memory | 15:48 |
donnyd | The laptops I have that do have better processors (with 8 cores), also usually have a bit more in the memory dept... mine specifically 64G... But i don't think 16 is a typical at this point... | 15:49 |
clarkb | donnyd: with cores we use more of them if we have them (to speed up testing) but you don't need 8 to run tempest | 15:49 |
clarkb | 8GB remains the pretty typical laptop memory setup | 15:49 |
clarkb | going to single dimm machines for thinness seems to have really impacted memory availability | 15:50 |
donnyd | I am not disagreeing, because what you are saying makes sense | 15:50 |
*** moltendmitry is now known as dtantsur|afk | 15:50 | |
donnyd | I am just trying to find out where the optimal amount of memory lies to make devstack go as fast as possible with reasonable DC equipment | 15:51 |
*** xek__ has quit IRC | 15:51 | |
clarkb | But also we have bloated memory use and I don't think anyone has looked at why other than my quick "oh c-bak made dstat sad" | 15:51 |
clarkb | and rather than simply throw more memory at the problem it would be good to understand | 15:51 |
*** xek__ has joined #openstack-infra | 15:51 | |
prometheanfire | clarkb: at the time we didn't want to choose one impl (gunicorn vs uwsgi vs whatever) | 15:52 |
donnyd | I will run some tests on my end to see where the balance between "not achievable on a laptop" and "fast as possible" lies | 15:52 |
prometheanfire | not sure about license off the top of head either | 15:52 |
*** Vadmacs has quit IRC | 15:52 | |
*** jangutter has joined #openstack-infra | 15:54 | |
clarkb | prometheanfire: uwsgi is gpl v2+ with linking exception | 15:54 |
prometheanfire | clarkb: ya, just looked it up | 15:54 |
prometheanfire | not sure that's allowed or not, I know the base gpl is not | 15:55 |
paladox | mordred bazel caused load for me to go up to 233 apparently | 15:55 |
clarkb | we spend almost 7 minutes just creating keystone services and roles and such | 15:57 |
*** eharney has quit IRC | 15:57 | |
prometheanfire | guess it'd be considered Projects run as part of the OpenStack Infrastructure | 15:57 |
prometheanfire | so as it's OSI, it'd be fine | 15:57 |
clarkb | cmorpheus: ^ Any idea if we can trim that down? like maybe we don't need all of those roles by default? | 15:57 |
clarkb | prometheanfire: and the linking exception makes it extra safe | 15:57 |
prometheanfire | yep | 15:57 |
*** rpittau is now known as rpittau|afk | 15:57 | |
clarkb | cmorpheus: starts at about http://logs.openstack.org/34/659434/25/gate/tempest-full/9362471/controller/logs/devstacklog.txt.gz#_2019-07-26_10_36_48_436 there and goes to about http://logs.openstack.org/34/659434/25/gate/tempest-full/9362471/controller/logs/devstacklog.txt.gz#_2019-07-26_10_43_43_153 | 15:58 |
cmorpheus | clarkb: looking | 15:58 |
*** gyee has joined #openstack-infra | 15:59 | |
openstackgerrit | Graham Hayes proposed zuul/nodepool master: Implement an Azure driver https://review.opendev.org/554432 | 16:01 |
cmorpheus | the reader, member and admin roles are created by keystone and can't be trimmed down, the ResellerAdmin role i think is for swift, i don't think anotherrole or invisible_to_admin are really useful right now | 16:01 |
cmorpheus | oh invisible_to_admin is a project i guess | 16:02 |
cmorpheus | i haven't read scrollback, what is the actual problem? | 16:02 |
prometheanfire | clarkb: I'd put it to the list before we settle on it, given that reqs doesn't like duplicate functionality | 16:03 |
clarkb | cmorpheus: looking into general slowness of devstack + tempest jobs | 16:03 |
clarkb | cmorpheus: devstack took 32 minutes in this case and ~7 of that is just that keystone setup | 16:03 |
clarkb | separately it appears that digging into swap during tempest runs may be a cause of slowdown when running tempest | 16:03 |
clarkb | cmorpheus: mostly just trying to see if we can improve runtime by fixing inefficiencies | 16:04 |
*** lucasagomes has quit IRC | 16:04 | |
cmorpheus | okay cool | 16:05 |
clarkb | is osc the only way to create those keystone setup entries? Is keystoneclient still a thing or is there an admin tool? | 16:06 |
*** icarusfactor has joined #openstack-infra | 16:06 | |
clarkb | (might be helpful to do comparison of tool costs if we can) | 16:06 |
cmorpheus | keystoneclient has no cli and is going away some day | 16:07 |
cmorpheus | the keystone-manage admin tool can't be used for most of this, we only use it to bootstrap an admin user | 16:07 |
clarkb | we unfortunately pushed everything into osc and then realized after it was too late that it had a large performance impact | 16:08 |
*** factor has quit IRC | 16:08 | |
clarkb | I guess we could write a script to do that chunk of config | 16:08 |
clarkb | to avoid the cost of python and pkg_resources spin up time | 16:08 |
cmorpheus | one thing we could do is remove all the service users and just have one service user do all the things | 16:09 |
clarkb | mordred: ^ how crazy would such a thing be? and maybe you already have such a thing because sdk testing? | 16:09 |
cmorpheus | crap i have a meeting brb | 16:09 |
*** e0ne has quit IRC | 16:12 | |
clarkb | https://opendev.org/openstack/devstack/src/branch/master/stack.sh#L1146-L1161 that is the 7 minute block | 16:14 |
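The change clarkb links a bit later (673018) experiments with replacing part of this block with openstacksdk calls. Purely as an illustration (not the actual patch), doing that setup through one long-lived sdk connection instead of one osc process per resource might look roughly like the following; the cloud name, role names and service list are placeholders:

```python
#!/usr/bin/env python3
# Hedged illustration (not the linked devstack patch): do keystone
# bootstrap-style setup through a single openstacksdk connection instead of
# invoking the openstack CLI once per resource, so the interpreter startup
# and token negotiation are paid only once. Cloud name, role names and the
# service entries below are placeholders.
import openstack

conn = openstack.connect(cloud="devstack-admin")  # assumed clouds.yaml entry

for role_name in ("anotherrole", "ResellerAdmin"):
    # Create the role only if it does not already exist.
    if not any(r.name == role_name for r in conn.identity.roles(name=role_name)):
        conn.identity.create_role(name=role_name)

services = [
    ("keystone", "identity"),
    ("glance", "image"),
    ("nova", "compute"),
]
existing = {s.name for s in conn.identity.services()}
for name, service_type in services:
    if name not in existing:
        conn.identity.create_service(name=name, type=service_type)
```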
openstackgerrit | Paul Belanger proposed opendev/base-jobs master: Switch base-test ansible version to 2.8 https://review.opendev.org/673012 | 16:14 |
*** jamesmcarthur has quit IRC | 16:15 | |
pabelanger | infra-root: ^if you don't mind reviewing, this allows us to start testing base-test jobs using ansible 2.8. Which should be fine, and humans need to opt into base-test | 16:15 |
*** jamesmcarthur has joined #openstack-infra | 16:16 | |
*** mattw4 has joined #openstack-infra | 16:17 | |
openstackgerrit | Merged opendev/system-config master: Add gitea02 replacement to inventory https://review.opendev.org/672999 | 16:17 |
*** pkopec has quit IRC | 16:18 | |
jangutter | is #openstack-infra the go-to place to talk about devstack? | 16:19 |
jonher | jangutter #openstack-qa is probably best for devstack | 16:20 |
jangutter | thanks jonher! | 16:20 |
jonher | np | 16:20 |
*** diablo_rojo has joined #openstack-infra | 16:21 | |
*** jamesmcarthur has quit IRC | 16:21 | |
fungi | okay, back now... let's see where we're at | 16:23 |
clarkb | fungi: change to run ansible on gitea02 just merged | 16:23 |
clarkb | fungi: waiting on that to apply now | 16:23 |
*** ricolin has quit IRC | 16:23 | |
fungi | accepted the gitea02 ssh host key on bridge.o.o now | 16:23 |
*** noorul has quit IRC | 16:23 | |
fungi | on the next ansible pass it ought to set up docker/gitea and then i can initialize the repos | 16:24 |
*** larainema has quit IRC | 16:25 | |
clarkb | mordred: do methods like conn.identity.role_project_user_assignments in sdk's examples actually exist? and if so why do they not show up in the api docs nor outside of the examples when grepped for? | 16:27 |
clarkb | (in particular it would be great to know if that takes any parameters for narrowing the list of roles) | 16:28 |
*** jamesmcarthur has joined #openstack-infra | 16:31 | |
openstackgerrit | Merged opendev/base-jobs master: Switch base-test ansible version to 2.8 https://review.opendev.org/673012 | 16:32 |
*** pkopec has joined #openstack-infra | 16:33 | |
*** icarusfactor has quit IRC | 16:33 | |
*** icarusfactor has joined #openstack-infra | 16:33 | |
openstackgerrit | Paul Belanger proposed zuul/zuul-jobs master: DNM: Switch unitests to use base-test https://review.opendev.org/673014 | 16:35 |
*** mriedem is now known as mriedem_lunch | 16:35 | |
*** jamesmcarthur has quit IRC | 16:36 | |
*** icarusfactor has quit IRC | 16:36 | |
*** jamesmcarthur has joined #openstack-infra | 16:36 | |
openstackgerrit | Paul Belanger proposed zuul/nodepool master: DNM: testing ansible 2.8 jobs https://review.opendev.org/673015 | 16:36 |
openstackgerrit | Paul Belanger proposed zuul/zuul master: DNM: testing ansible 2.8 jobs https://review.opendev.org/673016 | 16:37 |
*** roman_g has quit IRC | 16:41 | |
*** iurygregory has quit IRC | 16:46 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add severity filtering to logs https://review.opendev.org/672839 | 16:46 |
*** iurygregory has joined #openstack-infra | 16:47 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add severity filtering to logs https://review.opendev.org/672839 | 16:49 |
clarkb | cmorpheus: mordred https://review.opendev.org/673018 is the I have no idea what I am doing change | 16:51 |
openstackgerrit | Jean-Philippe Evrard proposed openstack/project-config master: [WIP] Add tooling to update python jobs on branch creation https://review.opendev.org/673019 | 16:52 |
*** mattw4 has quit IRC | 16:54 | |
clarkb | fungi: base.yaml is running on gitea01 now | 16:57 |
clarkb | er 02 | 16:57 |
fungi | yep, docker not running yet though | 16:58 |
*** mattw4 has joined #openstack-infra | 16:58 | |
fungi | keeping tabs on it and will initialize repos as soon as gitea is up | 16:58 |
*** derekh has quit IRC | 16:58 | |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: use opendev image building system for zuul-operator test https://review.opendev.org/673020 | 17:01 |
*** eharney has joined #openstack-infra | 17:01 | |
fungi | it did install gnupg this time | 17:01 |
fungi | not docker yet though | 17:01 |
*** jangutter has quit IRC | 17:02 | |
mnaser | clarkb: i wonder if replacing all that osc shell code by some sort of python code would speed things up | 17:03 |
mnaser | i'd imagine it would | 17:03 |
clarkb | mnaser: yup see https://review.opendev.org/673018 | 17:04 |
clarkb | mnaser: thats the small scale check it on a thing we run 15 times | 17:04 |
clarkb | if that shows improvements we can rewrite to be more complete | 17:04 |
* fungi misread that as a suggestion to replace openstackclient with something written in python | 17:04 | |
fungi | 1000 4476 1.4 1.1 1702848 90592 ? Ssl 17:04 0:01 /app/gitea/gitea web | 17:06 |
fungi | woo! | 17:06 |
fungi | proceeding | 17:06 |
*** iurygregory has quit IRC | 17:07 | |
fungi | https://docs.openstack.org/infra/system-config/gitea.html#deploy-a-new-backend indicates the next step is to stop gitea and restore a database dump | 17:08 |
clarkb | ++ | 17:08 |
*** jamesmcarthur has quit IRC | 17:08 | |
dtroyer | heh, I proposed once upon a time replacing those keystone bits with a string of commands piped into osc so it only loads once, it does help time-wise but the lack of decent error handling was a concern so we dropped it… | 17:09 |
* dtroyer is still accepting sponsorship proposals to write a proper cli in a single non-interpreted binary | 17:09 |
fungi | `docker ps -a` indicates i should stop 5c59a8a31b9d (opendevorg/gitea:latest) and 8e68bb69a209 (opendevorg/gitea-openssh) but leave 5bdefc623895 (mariadb:10.4) running | 17:09 |
clarkb | dtroyer: fwiw non interpreted binary isn't really the problem as much as "python is silly and scans the entire disk when loading packages then sorts them all by name and version because that is fast" | 17:10 |
fungi | okay, now only the mariadb:10.4 container is "up" | 17:10 |
clarkb | dtroyer: I fully expect my python script there to be much quicker since it doesn't pkg_resources | 17:10 |
clarkb | (at least I hope it doesn't end up doing that via openstacksdk) | 17:11 |
clarkb | this is why I'm testing it small scale first | 17:11 |
dtroyer | clarkb: yes, but it is still a PITA to install | 17:11 |
clarkb | dtroyer: wget'ing a prebuilt binary has a lot of problems with it too :/ | 17:12 |
clarkb | mostly in verifying the contents are as expected | 17:12 |
* mnaser rather wget a prebuilt binary that's always a fast client (or even build once) than wait 3 seconds every single time i run a command :( | 17:12 | |
mnaser | on a brand new openstack cluster (3 controllers in ha, zero load, zero vms): real 0m2.605s for openstack server list | 17:13 |
fungi | see, it's only 3 seconds because you rounded to the nearest second! ;) | 17:14 |
clarkb | mnaser: ya when I first looked at this a couple years ago I think the numbers I had were about 50% scanning packages and sorting them and 50% http rtt | 17:14 |
clarkb | but again you can avoid scanning packages and sorting them with python | 17:15 |
clarkb | unfortunately the thing that does the scanning and sorting is pretty well entrenched so shows up all over the place (meaning if you remove it one place you find it in another and the list goes on and on) | 17:16 |
clarkb | and of that 50% rtt for http I want to say a good chunk of it is getting a token? | 17:18 |
clarkb | I don't think I was caching the tokens | 17:18 |
clarkb | but maybe that is automagic and I didn't notice | 17:18 |
*** bobh has joined #openstack-infra | 17:23 | |
clarkb | fungi: how goes gitea02 db recovery? | 17:25 |
fungi | just about done shuffling db copies around | 17:26 |
*** bobh has quit IRC | 17:26 | |
fungi | wanted to grab a fresh one just in case | 17:26 |
fungi | i can reuse it for subsequent replacements today/tomorrow at least | 17:26 |
clarkb | cmorpheus: can you see my responses to your comments on that devstack change? the sdk api docs say that those parameters are not valid | 17:28 |
*** weifan has joined #openstack-infra | 17:28 | |
*** udesale has quit IRC | 17:31 | |
*** igordc has joined #openstack-infra | 17:34 | |
*** harlowja has quit IRC | 17:35 | |
*** igordc has quit IRC | 17:36 | |
*** igordc has joined #openstack-infra | 17:36 | |
fungi | db import to gitea02 completed and "All missing Git repositories for which records existed have been reinitialized." | 17:37 |
fungi | starting the gerrit replication to it now | 17:37 |
clarkb | cmorpheus: does that mean I should do a filter just of the user then scan the resulting list for RoleAssignments that match the user_domain (and project_domain if assigned)? | 17:38 |
*** electrofelix has quit IRC | 17:38 | |
fungi | 2115 tasks | 17:39 |
clarkb | fungi: I've realized this may take longer than ~14 minutes because there is no data at all on the remote | 17:39 |
fungi | oh, perhaps | 17:39 |
clarkb | but we should see how long it actually takes compared to the mostly noop case of 14 minutes | 17:39 |
fungi | that could explain some of it | 17:39 |
fungi | yep | 17:39 |
*** Vadmacs has joined #openstack-infra | 17:41 | |
cmorpheus | clarkb: did you see my response? | 17:41 |
*** hwoarang has quit IRC | 17:42 | |
*** hwoarang_ has joined #openstack-infra | 17:43 | |
cmorpheus | filtering by role assignment on a domain is not the same as the user domain that the get_or_add_user_project_role function is getting | 17:43 |
clarkb | cmorpheus: ya I'm not quite sure I understand what we want to get back? I guess we specify the user domain for when the user isn't in the default domain set by env vars? in which case asking for all the roles won't work? | 17:43 |
clarkb | cmorpheus: right I get they aren't the same but I don't know what it is we actually want | 17:43 |
clarkb | we want the list of role assignments for the user that is in domain_not_default_in_env_var ? | 17:43 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Swap gitea02 into service and bring down gitea03 https://review.opendev.org/673026 | 17:44 |
fungi | that ^ should be the next round once replication completes | 17:44 |
clarkb | and sdk doesn't (document, at least) have a method to get that data short of creating a different connection with different user domain details maybe | 17:44 |
cmorpheus | clarkb: we want the list of role assignments that the user has on the project, domains only come into play here because both the user and the project are namespaced by a domain | 17:44 |
cmorpheus | if you only have the username then you always need to specify the domain | 17:45 |
cmorpheus | except if it's the default domain then i think osc and maybe sdk do some magic for you there | 17:45 |
clarkb | cmorpheus: right ok. Domain is specified to be default via env vars. So this will work as long as the user isn't in a non default domain | 17:45 |
clarkb | (and I update it to not filter on domain) | 17:45 |
cmorpheus | okay sounds good | 17:45 |
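For what it's worth, a minimal sketch of the lookup settled on above: resolve names to ids first (devstack passes both), then filter role assignments by user and project only, dropping the domain filter. The filter parameter names (user_id, scope_project_id) are assumptions drawn from the sdk docs and should be verified against the installed sdk version; looking users and projects up by bare name assumes the default domain:

```python
#!/usr/bin/env python3
# Hedged sketch of get_or_add_user_project_role-style logic with
# openstacksdk: look up the role, user and project by name-or-id first,
# then filter role assignments by user and project only, without the
# domain filter discussed above. Parameter names for role_assignments()
# are assumptions and should be checked against the installed sdk.
import openstack

def ensure_role(conn, role, user, project):
    # find_* accepts either a name or an id; bare names assume the
    # default domain here.
    role_obj = conn.identity.find_role(role, ignore_missing=False)
    user_obj = conn.identity.find_user(user, ignore_missing=False)
    project_obj = conn.identity.find_project(project, ignore_missing=False)

    assignments = conn.identity.role_assignments(
        user_id=user_obj.id, scope_project_id=project_obj.id)
    if not any(a.role["id"] == role_obj.id for a in assignments):
        conn.identity.assign_project_role_to_user(
            project_obj, user_obj, role_obj)
    return role_obj.id

if __name__ == "__main__":
    conn = openstack.connect(cloud="devstack-admin")  # placeholder cloud
    ensure_role(conn, "member", "demo", "demo")
```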
cmorpheus | also i apologize on behalf of my predecessors who came up with this | 17:46 |
fungi | i'll drink to that | 17:48 |
*** rtjure has quit IRC | 17:50 | |
clarkb | once I get enough logging to hopefully figure out if the domains are supplied as ID's or names I'll update the connect call | 17:50 |
*** mriedem_lunch is now known as mriedem | 17:50 | |
*** ykarel|away has quit IRC | 17:51 | |
*** tesseract has quit IRC | 17:54 | |
clarkb | we appear to primarily operate using IDs. change updated to reflect that now | 17:54 |
*** hwoarang_ has quit IRC | 17:54 | |
fungi | 651 tasks | 17:54 |
*** Vadmacs has quit IRC | 17:55 | |
clarkb | fungi: that is slower but not significantly so | 17:55 |
fungi | i'll self-approve 673026 once it hits ~0 and start on replacing 03 | 17:55 |
*** rtjure has joined #openstack-infra | 17:56 | |
*** hwoarang has joined #openstack-infra | 17:56 | |
*** chason has quit IRC | 17:56 | |
clarkb | fungi: I just noticed a bug in that change, the ip addr for gitea02 in the haproxy config is not up to date | 17:57 |
fungi | thanks!!! | 17:59 |
fungi | fixing now | 17:59 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Swap gitea02 into service and bring down gitea03 https://review.opendev.org/673026 | 18:01 |
fungi | clarkb: ^ | 18:01 |
clarkb | that looks better thanks | 18:02 |
fungi | also looks like replication finished | 18:02 |
*** mattw4 has quit IRC | 18:02 | |
*** mattw4 has joined #openstack-infra | 18:02 | |
fungi | currently there's a slew of git-upload-pack '/openstack/nova' for the zuul user | 18:03 |
*** hwoarang has quit IRC | 18:03 | |
fungi | and a sudden burst of index changes | 18:03 |
*** hwoarang has joined #openstack-infra | 18:04 | |
*** ramishra has quit IRC | 18:12 | |
*** weifan has quit IRC | 18:16 | |
*** weifan has joined #openstack-infra | 18:17 | |
*** weifan has quit IRC | 18:17 | |
donnyd | clarkb: fungi So I am hoping that this will be the weekend I can actually get my controller swapped out. Do you think we should update the quota in zuul, as the control plane will be unreachable for an hour or so? Or just let it ride because weekend loads are low anyways | 18:20 |
fungi | donnyd: dropping the quota to 0 won't help i don't think if the api itself is unreachable, but we can certainly set max-servers to 0 in nodepool. or alternatively just expect that there will be a handful of boot failures logged by nodepool until the api is back up | 18:22 |
donnyd | thats what I mean.. yes. max-servers | 18:22 |
donnyd | thanks fungi | 18:22 |
fungi | i don't see the latter as a problem really | 18:22 |
fungi | i mean, we've designed nodepool to withstand severe provider outages | 18:23 |
donnyd | I am hoping to do a mostly online swap from my edge LB between the two control planes | 18:23 |
fungi | if setting max-servers=0 for it will help you feel less rushed, i'm happy to approve such a change | 18:23 |
donnyd | So the new controllers will be built and populated in parallel and then just swap out the place the edge LB sends requests to | 18:24 |
donnyd | All sounds good... till it doesn't | 18:24 |
fungi | if the account credentials, endpoint and so on will remain the same, then i don't personally see any need to zero out the max-servers for it | 18:25 |
donnyd | Yea, I have all of what was used to provision this automated (mostly) | 18:27 |
donnyd | so nothing should change on your end at all | 18:27 |
pleia2 | happy sysadmin day :) | 18:27 |
clarkb | nodepool should happily deal with those api errors, it may print a lot of log messages about it though (that's fine) | 18:27 |
clarkb | pleia2: and to you too! | 18:28 |
donnyd | I'm hopeful for a 5 minute outage... but I also live in the real sysadmin world | 18:28 |
clarkb | pleia2: are you sysadmining for ibm? | 18:28 |
*** Vadmacs has joined #openstack-infra | 18:29 | |
fungi | thanks pleia2!!! | 18:31 |
*** hwoarang has quit IRC | 18:32 | |
fungi | may your systems be bountiful | 18:32 |
pleia2 | clarkb: only a tiny bit here and there (we run an openstack-driven cloud that launches VMs on mainframes, so I poke my head in when needed) | 18:32 |
pleia2 | mostly I do dev advocacy though | 18:32 |
*** hwoarang has joined #openstack-infra | 18:32 | |
pleia2 | (we use the z/VM connector for nova, but switching to KVM soon, which runs on s390x and will make our lives 100x easier) | 18:33 |
*** tdasilva has quit IRC | 18:33 | |
*** weifan has joined #openstack-infra | 18:34 | |
*** weifan has quit IRC | 18:34 | |
*** weifan has joined #openstack-infra | 18:38 | |
fungi | pleia2: openstack in use at ibm? i thought that was only a myth! | 18:39 |
pleia2 | haha | 18:41 |
clarkb | pleia2: is kvm loaded off of a virtual card deck? | 18:42 |
*** weifan has quit IRC | 18:42 | |
clarkb | it would make me so happy if it is | 18:42 |
pleia2 | clarkb: I don't actually know how it works :) | 18:43 |
pleia2 | it's a supported thing though, right alongside z/VM | 18:44 |
*** mattw4 has quit IRC | 18:46 | |
*** bobh has joined #openstack-infra | 18:48 | |
corvus | pleia2: happy sysadmin day to you too! | 18:48 |
corvus | who brought the cake? | 18:50 |
clarkb | I don't have cake but now I want some | 18:50 |
clarkb | I did just eat some leftover curry | 18:50 |
*** mattw4 has joined #openstack-infra | 18:51 | |
corvus | clarkb: was that breakfast or lunch? | 18:51 |
clarkb | something in the middle but closer to lunch | 18:51 |
clarkb | I think my little python script in devstack didn't break this time | 18:52 |
clarkb | not sure if it is faster yet. Will have to wait for logs and compare to other jobs on that cloud | 18:52 |
fungi | i could go for some currycake | 18:53 |
clarkb | hrm I got the handling of user_domain and project_domain wrong looks like | 18:53 |
clarkb | because they are names not ids | 18:53 |
clarkb | so inconsistent | 18:53 |
fungi | pleia2: i have fond memories of being a racf administrator for s/390 clusters running linux in lpars. i hope it's as enjoyable for you! | 18:57 |
fungi | [edit: i guess we called it a "sysplex" not a "cluster"] | 18:57 |
*** bobh has quit IRC | 19:02 | |
pleia2 | fungi: cool, it sure is :) | 19:03 |
clarkb | cmorpheus: it is amazing how complicated this little script ends up getting, makes me wonder how much faster it would be overall to continue to support names and ids over osc | 19:04 |
*** rh-jelabarre has quit IRC | 19:04 | |
cmorpheus | heh | 19:04 |
clarkb | I've basically run into needing to look up all inputs to get their ids because they might be names | 19:05 |
clarkb | (devstack uses both names and ids) | 19:05 |
fungi | and i guess the api doesn't treat them interchangeably | 19:07 |
clarkb | the wins now may not be seen unless we rewrite that section of shell into python entirely. Then we can get the role, user, and project data once and reuse it over and over and over | 19:07 |
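As a rough illustration of the pattern being replaced (the project, user, and role names here are only examples), each osc invocation below starts a fresh python interpreter, authenticates, and resolves a name to an id all over again -- overhead that a single long-lived python process can do once and cache:

    # every command is a separate process + keystone auth + name-to-id lookup
    admin_id=$(openstack user show admin -f value -c id)
    service_id=$(openstack project show service -f value -c id)
    member_id=$(openstack role show member -f value -c id)
    openstack role add --user "$admin_id" --project "$service_id" "$member_id"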
*** weifan has joined #openstack-infra | 19:07 | |
*** weifan has quit IRC | 19:08 | |
fungi | and so the slow progression of translating devstack into python continues | 19:08 |
*** weifan has joined #openstack-infra | 19:08 | |
clarkb | fungi: apparently not, re: treating them the same | 19:08 |
fungi | that's one thing i've come to appreciate about gerrit's rest api... you can provide a variety of typed inputs for certain parameters and it will decide how to equate them | 19:09 |
fungi | so if you're doing an account lookup you can provide an id number or a username or an e-mail address or... and it will dereference them all to the same values behind the scenes | 19:10 |
fungi | granted, it also returns lists for just about everything, because there's no guarantee that the inputs reduce to a single value | 19:12 |
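For instance (the hostname and account details are made up), these should all dereference to the same account record; the query form at the end is the style that comes back as a list:

    # responses are JSON prefixed with gerrit's usual )]}' anti-XSSI line
    curl -s 'https://review.example.org/accounts/1000096'
    curl -s 'https://review.example.org/accounts/jdoe'
    curl -s 'https://review.example.org/accounts/jdoe@example.com'
    curl -s 'https://review.example.org/accounts/?q=email:jdoe@example.com'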
cmorpheus | clarkb: yeah you're starting to reproduce part of why osc is so slow | 19:12 |
*** weifan has quit IRC | 19:13 | |
fungi | sometimes the only way to truly understand a problem is to try and reproduce the solution? | 19:13 |
openstackgerrit | Paul Belanger proposed zuul/zuul-jobs master: DNM: Switch unitests to use base-test https://review.opendev.org/673014 | 19:15 |
*** rh-jelabarre has joined #openstack-infra | 19:16 | |
openstackgerrit | Paul Belanger proposed zuul/zuul master: DNM: testing ansible 2.8 jobs https://review.opendev.org/673016 | 19:18 |
openstackgerrit | Paul Belanger proposed zuul/nodepool master: DNM: testing ansible 2.8 jobs https://review.opendev.org/673015 | 19:18 |
*** diablo_rojo has quit IRC | 19:26 | |
clarkb | cmorpheus: ya I think the proper way this gets faster is to rewrite the whole configure accounts stuff in python rather than just the function bits | 19:30 |
openstackgerrit | Merged opendev/system-config master: Swap gitea02 into service and bring down gitea03 https://review.opendev.org/673026 | 19:31 |
*** weifan has joined #openstack-infra | 19:32 | |
fungi | as soon as that gets installed onto the lb and the active connections to 03 trail off, i'll rip and replace | 19:35 |
*** weifan has quit IRC | 19:36 | |
*** weifan has joined #openstack-infra | 19:36 | |
*** weifan has quit IRC | 19:37 | |
*** weifan has joined #openstack-infra | 19:37 | |
*** igordc has quit IRC | 19:37 | |
*** weifan has quit IRC | 19:38 | |
*** weifan has joined #openstack-infra | 19:38 | |
*** weifan has quit IRC | 19:39 | |
*** weifan has joined #openstack-infra | 19:39 | |
*** weifan has quit IRC | 19:39 | |
*** weifan has joined #openstack-infra | 19:40 | |
*** weifan has quit IRC | 19:40 | |
*** slaweq has quit IRC | 19:41 | |
*** wpp has quit IRC | 19:48 | |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: use opendev image building system for zuul-operator test https://review.opendev.org/673020 | 19:50 |
*** ralonsoh has quit IRC | 19:55 | |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: Verify Operator Pod Running https://review.opendev.org/670395 | 19:55 |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: use opendev image building system for zuul-operator test https://review.opendev.org/673020 | 19:55 |
mordred | wow. that's a fun change! | 19:58 |
*** goldyfruit has quit IRC | 20:02 | |
paladox | mordred bazel caused load on my mac to go up to 233. | 20:03 |
mordred | paladox: well - I can build 2.15 now - but 2.16 and 3.0 are _really_ unhappy | 20:04 |
*** goldyfruit has joined #openstack-infra | 20:04 | |
paladox | oh | 20:04 |
paladox | mordred do you run out of cpu/ram for 2.16? | 20:04 |
mordred | yeah - if I use the same settings that work for 2.15 | 20:05 |
paladox | I'm surprised 3.0 is a problem as that removed GWTUI (so less to build) | 20:05 |
*** wpp has joined #openstack-infra | 20:05 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Build gerrit images for 2.16 and 3.0 as well https://review.opendev.org/672273 | 20:05 |
mordred | yeah | 20:05 |
mordred | paladox: oh - actually - the last time I ran it, 2.16 failed in a new and different way | 20:06 |
paladox | oh | 20:06 |
mordred | http://logs.openstack.org/73/672273/6/check/system-config-build-image-gerrit-2.16/1c13564/job-output.txt.gz#_2019-07-26_13_32_10_643536 | 20:06 |
*** harlowja has joined #openstack-infra | 20:07 | |
mordred | and I seem to be foot-gunning with 3.0 ... doh. | 20:07 |
paladox | ohhh, was it trying to use the master branch? | 20:07 |
*** Vadmacs has quit IRC | 20:07 | |
paladox | https://github.com/GerritCodeReview/plugins_download-commands/commit/891455076417dd097fdfd63f4afc0d28a3e85aff <-- was the change that caused that | 20:08 |
*** Vadmacs has joined #openstack-infra | 20:08 | |
paladox | https://github.com/GerritCodeReview/plugins_download-commands/branches doesn't appear to have a stable-2.16 branch | 20:08 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Colorize log severity https://review.opendev.org/673103 | 20:09 |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: use opendev image building system for zuul-operator test https://review.opendev.org/673020 | 20:09 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Build gerrit images for 2.16 and 3.0 as well https://review.opendev.org/672273 | 20:09 |
*** igordc has joined #openstack-infra | 20:10 | |
openstackgerrit | Paul Belanger proposed zuul/nodepool master: DNM: testing ansible 2.8 jobs https://review.opendev.org/673015 | 20:11 |
*** slaweq has joined #openstack-infra | 20:11 | |
*** weifan has joined #openstack-infra | 20:11 | |
mordred | paladox: hrm. good catch. | 20:11 |
*** sgw has joined #openstack-infra | 20:12 | |
*** weifan has quit IRC | 20:16 | |
*** slaweq has quit IRC | 20:16 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add raw links to log manifest https://review.opendev.org/673104 | 20:25 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Rename view to logfile https://review.opendev.org/673105 | 20:25 |
*** wolke has quit IRC | 20:26 | |
*** wolke has joined #openstack-infra | 20:27 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Build gerrit images for 2.16 and 3.0 as well https://review.opendev.org/672273 | 20:27 |
mordred | paladox: thanks! I think that version might just work (instead of overlaying the zuul-cloned repo directly, it uses submodule update --init to do it - and since we don't have an origin remote but we DO have things cloned at the right relative path, it should clone from the already-cloned repo and not across the network) | 20:28 |
mordred | corvus: ^^ weird but nice side-effect for gerrit submodule plugin repos and zuul | 20:28 |
paladox | :) | 20:29 |
mordred | corvus: I *think* we might want to update the playbook to do that for all of the plugin repos, not just download-commands | 20:29 |
mordred | so that we get the ref that the gerrit repo is expecting. now - that obviously breaks depends-on - but we can solve that when we have a need to do a depends-on with a plugin ref | 20:29 |
*** wolke has quit IRC | 20:29 | |
*** wolke has joined #openstack-infra | 20:30 | |
corvus | mordred: wait we use required-projects for the plugins | 20:32 |
mordred | yes. that's why the submodule update --init will work | 20:32 |
mordred | since there's no origin remote, it'll actually use the relative path | 20:32 |
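A minimal sketch of that behaviour (paths are illustrative): when the superproject has no origin remote configured, git resolves a relative submodule url from .gitmodules against the superproject's own directory, so a sibling checkout that zuul has already placed there becomes the clone source instead of the network:

    cd ~/src/gerrit.googlesource.com/gerrit          # zuul-prepared superproject
    git config --get remote.origin.url || echo "no origin remote configured"
    # .gitmodules points at a relative url like ../plugins/download-commands,
    # which now resolves to the sibling directory zuul cloned via required-projects
    git submodule update --init plugins/download-commands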
*** wolke has quit IRC | 20:33 | |
*** wolke has joined #openstack-infra | 20:33 | |
corvus | right, yes. i think i misunderstood what you were saying before :) | 20:33 |
mordred | corvus: I probably misunderstood what I'm saying - we're talking about submodules after all :) | 20:33 |
corvus | haha | 20:34 |
corvus | and yeah the shape of that change looks good to me | 20:34 |
mordred | corvus: here's hoping this build works! | 20:34 |
mordred | if it does, I might try making all of the plugin repos use this mechanism instead of the copy | 20:34 |
mordred | in fact - why don't I do that as a followup... | 20:35 |
fungi | gitea02 is in rotation now and gitea03 is removed, working on replacement | 20:35 |
corvus | mordred: oh wait | 20:35 |
*** wolke has quit IRC | 20:35 | |
paladox | that reminded me to pull in 2.15.15 :P, which i also found now fails to build :( | 20:35 |
corvus | mordred: okay, so download-commands is the issue -- why don't we just specify the right branch for that? | 20:35 |
paladox | corvus it doesn't have any 2.16+ branches | 20:35 |
paladox | apparently | 20:35 |
paladox | See https://github.com/GerritCodeReview/plugins_download-commands/branches | 20:36 |
*** wolke has joined #openstack-infra | 20:36 | |
corvus | mordred: ok; the downside of that is that depends-on won't work | 20:37 |
corvus | building at all > supporting depends-on > not supporting depends-on | 20:38 |
mordred | yes | 20:38 |
corvus | so i agree this is the best we can do with download-commands :) | 20:38 |
corvus | but maybe it's not what we want for the others | 20:38 |
mordred | yeah - I agree | 20:38 |
mordred | also - doing it for the others makes the playbook more, not less, complex | 20:39 |
mordred | because doing submodule update --init is only useful for "builtin" plugins - and our "standard" set is a mix of both | 20:39 |
mordred | for non-standard, moving the repo in is always the right choice | 20:40 |
mordred | corvus: can override checkout take a sha? | 20:40 |
corvus | mordred: i think it has to be a ref | 20:42 |
mordred | darn | 20:42 |
corvus | but can be a branch or tag | 20:42 |
*** weifan has joined #openstack-infra | 20:42 | |
mordred | I was thinking that if it could, we could just override-checkout to the sha that 2.16 wants and then depends-on is only borked for 2.16 | 20:43 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Override-checkout download-commands to v2.16.10 https://review.opendev.org/673107 | 20:44 |
mordred | corvus: woot. there's a tag | 20:44 |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: use opendev image building system for zuul-operator test https://review.opendev.org/673020 | 20:47 |
*** wolke has quit IRC | 20:48 | |
*** goldyfruit has quit IRC | 20:48 | |
*** wolke has joined #openstack-infra | 20:49 | |
*** wolke has joined #openstack-infra | 20:51 | |
*** wolke has quit IRC | 20:52 | |
*** priteau has quit IRC | 20:53 | |
jrosser | this has merged https://review.opendev.org/#/c/672952/, but i don't see it here https://opendev.org/openstack/ansible-config_template/commits/branch/master?lang=en-US# | 20:54 |
jrosser | is something broken? | 20:54 |
fungi | the commit itself seems to have replicated: https://opendev.org/openstack/ansible-config_template/commit/b7f38639a21857aead860195d12eccf6eb9f437e | 20:56 |
jrosser | i just rechecked a ton of jobs that need that and they've all failed | 20:57 |
jrosser | suggests that master doesnt point to quite the right place | 20:58 |
corvus | jrosser: have a link to a job that failed? | 20:59 |
fungi | gerrit's on-disk copy of the repository indicates b7f38639a21857aead860195d12eccf6eb9f437e is the tip of master, so in theory ci jobs should be using that | 20:59 |
jrosser | http://logs.openstack.org/73/670473/8/check/openstack-ansible-deploy-aio_metal-ubuntu-bionic/2c66b55/job-output.txt.gz | 21:00 |
fungi | but i do wonder what's happened to replication to gitea | 21:00 |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: use opendev image building system for zuul-operator test https://review.opendev.org/673020 | 21:02 |
*** goldyfruit has joined #openstack-infra | 21:05 | |
clarkb | cmorpheus: https://review.opendev.org/673108 I probably went a little crazy | 21:05 |
fungi | that master branch state doesn't seem to have replicated to any of the active gitea backends | 21:06 |
corvus | evrardjp, jrosser: is anyone working on updating that job to use zuul git repo checkouts? because we really shouldn't be cloning from opendev.org in jobs | 21:06 |
cmorpheus | clarkb: omg | 21:06 |
fungi | i'm going to try to manually trigger full replication for openstack/ansible-config_template and see what happens | 21:06 |
cmorpheus | clarkb: you replaced 80 lines of shell with 300 lines of python | 21:07 |
clarkb | cmorpheus: I know. I just want real data to know if there are gains to be had here. If not I'll give up | 21:07 |
clarkb | cmorpheus: this change should be large enough and cache enough id data to see a delta though | 21:08 |
jrosser | corvus: well in theory it should be using them https://github.com/openstack/openstack-ansible/blob/master/scripts/get-ansible-role-requirements.yml#L35-L61 | 21:08 |
jrosser | but of course that could be broken | 21:08 |
mordred | http://logs.openstack.org/73/670473/8/check/openstack-ansible-deploy-aio_metal-ubuntu-bionic/2c66b55/job-output.txt.gz#_2019-07-26_19_56_33_116422 seems to be running, but then http://logs.openstack.org/73/670473/8/check/openstack-ansible-deploy-aio_metal-ubuntu-bionic/2c66b55/job-output.txt.gz#_2019-07-26_19_56_46_107736 is still doing the clone | 21:10 |
corvus | jrosser: oh that looks promising... /me digs into that | 21:10 |
mordred | so I'm thinking maybe the filtering in https://github.com/openstack/openstack-ansible/blob/master/scripts/get-ansible-role-requirements.yml#L74-L78 is not doing the right thing/ | 21:10 |
mnaser | oh look osa things | 21:10 |
mordred | ? | 21:10 |
mnaser | i wonder if we're missing required-projects | 21:11 |
fungi | forcing full replication doesn't seem to have updated openstack/ansible-config_template master branch state | 21:11 |
jrosser | this looks suspect http://logs.openstack.org/73/670473/8/check/openstack-ansible-deploy-aio_metal-ubuntu-bionic/2c66b55/job-output.txt.gz#_2019-07-26_19_56_43_646148 | 21:11 |
mnaser | https://review.opendev.org/#/c/670473/ is an os_ceilometer change | 21:11 |
mnaser | and well it checked out os_ceilometer | 21:11 |
corvus | mnaser, jrosser: yes, i think that's it | 21:11 |
mnaser | so because we don't have all the required-projects listed | 21:12 |
corvus | so it is working, it's just there are no required projects, so it's only activated for dependencies | 21:12 |
fungi | for some reason the master branch state is correct in the github mirror but not on the gitea backends | 21:12 |
mnaser | we're not hard failing like things failed when we moved to zuulv3 | 21:12 |
fungi | checking gerrit logs next | 21:12 |
mnaser | cause we don't hard fail if the stuff is missing | 21:12 |
mnaser | (we should probably hard fail in ci if a required-project is missing) | 21:12 |
corvus | and if we added more there, then the job would benefit from the cache and be faster (and also be immune to mirror hiccups) | 21:12 |
mnaser | and skip when not in ci | 21:12 |
mnaser | let me hack up something | 21:12 |
*** rfarr has joined #openstack-infra | 21:13 | |
jrosser | mnaser: thanks :) | 21:13 |
corvus | mnaser, jrosser, fungi: cool, so to summarize -- the job is set up to use zuul repos but simply doesn't have enough required projects listed so we fell back on the mirror, and the mirror is slightly out of date for as-yet-unknown reason. | 21:13 |
corvus | evrardjp: ^ you can ignore ping from earlier :) | 21:14 |
fungi | corvus: agreed. still digging into the outdated state of the master branch ref for that repo in gitea | 21:15 |
fungi | no mentions of that repo in the gerrit error log aside from some timeouts while stackalytics-bot-2 was trying to query it | 21:15 |
clarkb | fungi: if you rereplicate that repo does it update? | 21:15 |
clarkb | can limit it to a single gitea to prevent polluting the debuggable state | 21:16 |
fungi | that did nothing as far as i could tell (saw it enqueue the replication events, waited until they were done, still no change) | 21:16 |
fungi | all active gitea backends seem to be in a similar state with that repo too | 21:16 |
clarkb | did we have fs/disk problems again? | 21:19 |
clarkb | or maybe this is a hold over? | 21:19 |
mnaser | jrosser, corvus, fungi: https://review.opendev.org/673109 should be a failing CI job (because not everything is in required projects). if that fails as expected, i'll readd them to required-projects | 21:21 |
*** beekneemech is now known as bnemec-pto | 21:22 | |
*** eharney has quit IRC | 21:23 | |
fungi | clarkb: what's strange is that gitea02 was created at 14:43z today, its database was copied from gitea01's most recent nightly mysqldump, and then content was replicated in. the openstack/ansible-config_template master branch state updated at 19:38z today, long after we had finished replicating all repository states into gitea02 | 21:25 |
clarkb | oh interesting | 21:25 |
fungi | so it's hard to blame this on old state, unless we blame the copied database dump? | 21:25 |
*** raissa has quit IRC | 21:26 | |
*** Lucas_Gray has joined #openstack-infra | 21:29 | |
*** rfolco|rover has quit IRC | 21:30 | |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: use opendev image building system for zuul-operator test https://review.opendev.org/673020 | 21:30 |
fungi | on gitea02, /var/gitea/data/git/repositories/openstack/ansible-config_template.git/refs/heads/master is still 09c76e238026d7ba4134ee2b66a4e9fd2617b843 | 21:33 |
fungi | which coincides with what the webui shows | 21:33 |
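Worth noting that reading the loose file under refs/heads/ can miss a ref that has been packed; asking git directly against the same path avoids that ambiguity:

    git --git-dir=/var/gitea/data/git/repositories/openstack/ansible-config_template.git \
        rev-parse refs/heads/master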
clarkb | does a fsck report anything? | 21:34 |
*** wolke has joined #openstack-infra | 21:34 | |
*** ekultails has quit IRC | 21:34 | |
fungi | i do realize i forgot to do a git gc on gitea02, though that shouldn't affect this | 21:34 |
clarkb | no, should only affect performance (load avg may be higher on that host than the others) | 21:35 |
fungi | git fsck reports no problems for that repo | 21:36 |
*** wolke has left #openstack-infra | 21:37 | |
*** Vadmacs has quit IRC | 21:38 | |
fungi | also, it replicated the head change to github, but not to gitea | 21:38 |
fungi | or rather, the replication to github is successfully reflected while on gitea it is not | 21:39 |
clarkb | maybe fsck of the content in gerrit/github will reveal something that might make gitea unhappy? | 21:40 |
clarkb | we have had that with github before where replication didn't work due to problems in the repo that gerrit was ok with | 21:40 |
fungi | just a few dangling commits: http://paste.openstack.org/show/754915/ | 21:41 |
clarkb | fungi: if you docker ps -a you can check the logs for the gitea ssh container using docker logs --since/--until $containerid type stuff | 21:42 |
clarkb | maybe do that for the time around when you triggered replication of that repo and see if anything shows up as a problem? | 21:42 |
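Something along these lines (the container lookup and time window are only illustrative); docker logs takes --since and --until to bound the output to the suspect period:

    container=$(docker ps -a --format '{{.ID}} {{.Image}}' | awk '/gitea-openssh/ {print $1}')
    docker logs --since 2019-07-26T19:30:00 --until 2019-07-26T19:45:00 "$container"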
fungi | the opendevorg/gitea-openssh container has no docker logs at all | 21:49 |
fungi | the opendevorg/gitea:latest docker logs don't seem to have any sort of failure/error messages related to openstack/ansible-config_template (just entries which look like clients fetching/cloning from it) | 21:50 |
clarkb | in that case you may have to docker exec -it $opensscontainer bash | 21:52 |
clarkb | then poke around and look for logs | 21:52 |
clarkb | that gives you a bash process running in the context of that container | 21:53 |
*** mattw4 has quit IRC | 21:53 | |
*** jamesmcarthur has joined #openstack-infra | 21:55 | |
fungi | it looks like everything in that filesystem root is actually just mapped to files in /var/gitea | 21:56 |
fungi | so easier to browse/view them without docker getting in the way | 21:57 |
clarkb | including sshd logs? | 21:57 |
fungi | (at least so far i've found no files via a docker shell which weren't present there outside the container) | 21:57 |
fungi | oh, the ssh container. i'll try that one | 21:58 |
fungi | looks like they share the same filesystem tree | 21:59 |
paladox | mordred heh gerrit 2.15.15 broke some of our plugins due to a bazel change, so i've had to spend time pulling in the plugin update && also adapting one of our other plugins to the changes. | 22:00 |
clarkb | fungi: it is possible that sshd is only logging to syslog and we have to mount in /dev/log for that container to properly log (we did this with haproxy) | 22:01 |
clarkb | corvus: mordred ^ do you know? | 22:01 |
paladox | but that also means that the gerrit docker image we have will have to stay broken with 2.15.15 until we merge && deploy a new image. | 22:01 |
fungi | clarkb: yeah, i don't see any syslog in /var/log of the ssh container at least | 22:03 |
*** whoami-rajat has quit IRC | 22:06 | |
fungi | kinda tempted to press forward with the gitea03 replacement (ready to write the change to update dns records and add it back to the inventory) to see if it gets that head replicated | 22:06 |
clarkb | fungi: that seems like a reasonable test, worst case 03 will be in the same situation as the others | 22:07 |
clarkb | it would be cool if pip didn't tell you every package it was ignoring because your python version didn't match env markers | 22:10 |
*** kjackal has quit IRC | 22:10 | |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Add gitea03 replacement to inventory https://review.opendev.org/673113 | 22:10 |
clarkb | that IP tells me it is a gitea03 + 2 | 22:11 |
*** slaweq has joined #openstack-infra | 22:11 | |
openstackgerrit | Jeremy Stanley proposed opendev/zone-opendev.org master: Update IP address for gitea03 https://review.opendev.org/673114 | 22:11 |
*** xek__ has quit IRC | 22:12 | |
*** raissa has joined #openstack-infra | 22:14 | |
*** raissa has quit IRC | 22:14 | |
*** raissa has joined #openstack-infra | 22:15 | |
*** jamesmcarthur has quit IRC | 22:15 | |
*** raissa has quit IRC | 22:15 | |
*** slaweq has quit IRC | 22:16 | |
*** raissa has joined #openstack-infra | 22:16 | |
*** raissa has joined #openstack-infra | 22:16 | |
*** raissa has quit IRC | 22:17 | |
fungi | infra-root: i'm at a loss for why /var/gitea/data/git/repositories/openstack/ansible-config_template.git/refs/heads/master is out of date on all our gitea servers (still referring to 09c76e238026d7ba4134ee2b66a4e9fd2617b843 when it should be b7f38639a21857aead860195d12eccf6eb9f437e like /home/review_site/git/openstack/ansible-config_template.git/refs/heads/master has) | 22:17 |
fungi | (...has on our gerrit server) | 22:17 |
clarkb | fungi: you did say the ref itself is present but the master ref isn't updated? | 22:17 |
fungi | yep | 22:17 |
corvus | by ref you mean commit? | 22:18 |
clarkb | er yes | 22:18 |
fungi | ref is already there, would have been replicated when the review patchset was pushed | 22:18 |
fungi | commit, yes | 22:18 |
corvus | and yeah, it's not a merge commit, so it's the same as a refs/changes ref | 22:18 |
clarkb | as a sanity check, there's plenty of disk | 22:18 |
clarkb | at least on gitea01 | 22:18 |
corvus | i don't know where the ssh logs end up | 22:18 |
corvus | wherever ssh puts them by default at LogLevel INFO | 22:19 |
fungi | yeah, i had no luck finding them either | 22:19 |
clarkb | I think sshd probably logs to syslog and we don't have a syslog | 22:19 |
corvus | there's no other special logging | 22:19 |
fungi | i'm in the process of building a new gitea server (since i was doing that anyway) to see what state ends up replicated to it | 22:19 |
clarkb | we could mount /dev/log as with haproxy; that might get a little confusing since there's the host's sshd and the container's sshd, but probably not to the point where we don't want to do it | 22:20 |
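A sketch of what that would look like (the real deployment is driven by docker-compose, so this docker run invocation is only illustrative): bind-mounting the host's /dev/log gives a syslog-only sshd inside the container somewhere to write, and its messages then land in the host's syslog alongside the host sshd's:

    docker run -d --name gitea-ssh -v /dev/log:/dev/log -p 222:222 opendevorg/gitea-openssh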
fungi | from a timeline perspective, that change merged after gitea02 was built, replicated to and brought into service yet it has the same stale state as its sibling gitea servers | 22:21 |
corvus | what was the state of gitea03 at the time | 22:21 |
corvus | [2019-07-26 22:19:55,500] [f900b6cd] Cannot replicate to ssh://git@gitea03.opendev.org:222/openstack/ansible-config_template.git | 22:21 |
fungi | offline | 22:21 |
fungi | i had already deleted the nova server instance for it | 22:21 |
openstackgerrit | Merged opendev/zone-opendev.org master: Update IP address for gitea03 https://review.opendev.org/673114 | 22:22 |
fungi | there are now changes up to add dns and inventory entries for the new 03 | 22:22 |
fungi | dns just now merged, from the looks of it | 22:22 |
corvus | when was the original replication time? | 22:22 |
clarkb | https://review.opendev.org/#/c/672952/ is the change looks like 19:38UTC ish | 22:23 |
fungi | there were two because i manually initiated a replication of that repo in troubleshooting this | 22:23 |
clarkb | for when it merged which should be near the replication time | 22:23 |
fungi | but yeah, that would be the earlier one | 22:23 |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: support alternate portage directories https://review.opendev.org/671530 | 22:24 |
fungi | second was sometime between 21:06 and 21:11 since the various tasks were queued in gerrit for a few minutes | 22:24 |
corvus | in a minute, i'm going to manually trigger replication while running strace on sshd on gitea02 | 22:26 |
corvus | just as soon as this openstack/openstack replication finishes | 22:26 |
*** michael-beaver has quit IRC | 22:27 | |
fungi | sounds like a reasonable test | 22:31 |
corvus | the replication commands are not appearing in gerrit's log | 22:33 |
corvus | if i run this: ssh review replication start zuul/zuul-operator --url gitea02.opendev.org | 22:33 |
corvus | i see this: [2019-07-26 22:32:49,250] [] scheduling replication zuul/zuul-operator:..all.. => ssh://git@gitea02.opendev.org:222/zuul/zuul-operator.git | 22:33 |
corvus | if i run this: ssh review replication start openstack/ansible-config_templates --url gitea02.opendev.org | 22:34 |
corvus | i see nothing in replication.log | 22:34 |
corvus | did i spell the project right? | 22:34 |
corvus | no i did not, it's singular. | 22:34 |
corvus | i'll try again | 22:34 |
clarkb | openstack/ansible-config_template <- straight copy paste from gerrit webui | 22:35 |
clarkb | no s | 22:35 |
clarkb | oh you figured it out before me | 22:35 |
corvus | it's "running" now | 22:35 |
fungi | yep, sorry, had my nose in the gitea database schema seeing if it could somehow be double-tracking head states in there incorrectly | 22:35 |
fungi | (found no obvious place it might track repository heads in the db, fwiw) | 22:36 |
*** mattw4 has joined #openstack-infra | 22:38 | |
*** gyee has quit IRC | 22:39 | |
corvus | i think the logs are going to stderr | 22:41 |
corvus | i don't know why they are not showing up in "docker logs" | 22:41 |
corvus | -rw-r--r-- 1 1000 1000 73 Jul 26 19:38 .gitconfig | 22:43 |
corvus | -rw-r--r-- 1 1000 1000 0 Jul 26 19:43 .gitconfig.lock | 22:43 |
corvus | well that's a coincidence, huh? | 22:43 |
corvus | have we confirmed any updates since then? | 22:44 |
fungi | those times do look suspicious | 22:44 |
fungi | is that inside that particular repo? | 22:44 |
corvus | no, that's the homedir | 22:45 |
corvus | on gitea08 | 22:45 |
corvus | same times on all servers | 22:45 |
*** jamesmcarthur has joined #openstack-infra | 22:45 | |
corvus | /var/gitea/data/git | 22:45 |
clarkb | fwiw that .gitconfig.lock is the same file we had to delete after restarting the giteas yesterday before replication would work | 22:46 |
clarkb | (was assumed to be fallout of the ceph disaster earlier in the day) | 22:46 |
clarkb | maybe an OOM left it behind and the 8GB swapfile isn't big enough? | 22:46 |
fungi | that's well after gitea02 was brought back online, so there was nothing manual going on with gitea servers | 22:46 |
corvus | i don't see any currently running processes started around that time | 22:47 |
clarkb | [Fri Jul 26 19:39:45 2019] INFO: task khugepaged:65 blocked for more than 120 seconds. | 22:47 |
clarkb | from dmesg -T | 22:48 |
clarkb | not an OOM but unhappy disk? | 22:48 |
clarkb | [Fri Jul 26 19:39:19 2019] systemd[1]: systemd-journald.service: State 'stop-sigabrt' timed out. Terminating. | 22:48 |
* clarkb looks on other giteas | 22:48 | |
fungi | it does indeed look like other repos aren't updating either | 22:48 |
fungi | https://opendev.org/opendev/zone-opendev.org/commits/branch/master is missing 673114 which merged at 22:22 | 22:49 |
clarkb | [Fri Jul 26 19:38:44 2019] INFO: task jbd2/vda1-8:248 blocked for more than 120 seconds. <- from gitea01 | 22:49 |
corvus | [pid 26929] write(2, "error: could not lock config fil"..., 68) = 68 | 22:49 |
clarkb | seems like something happened in the cloud around that time and made the hosts unhappy; a git process or gitea may not have cleaned up after itself as a result? | 22:49 |
corvus | [pid 26929] exit_group(-1 <unfinished ...> | 22:49 |
corvus | (that's from that strace) | 22:50 |
clarkb | so maybe we remove those lock files again, rereplicate (again) and bring this up with mnaser? | 22:50 |
fungi | gitea02 is showing signs of write errors as well from 19:39 today | 22:50 |
clarkb | fungi: I expect all of them will exhibit that but 03 | 22:50 |
fungi | gitea02 didn't even get nova booted until 14:43 today, so this is definitely not lingering issues from instances which went through the problems on wednesday | 22:51 |
*** goldyfruit has quit IRC | 22:52 | |
fungi | looks like a wholly fresh incident similar to the early wednesday one | 22:52 |
clarkb | yup | 22:52 |
fungi | or was that yesterday? i've lost track already | 22:52 |
corvus | clarkb: i agree with the mitigation plan | 22:53 |
corvus | i can easily remove the locks if ya'll are ready | 22:53 |
fungi | previous incident was early 2019-07-25 so i guess that was yesterday (thursday) | 22:53 |
*** weifan has quit IRC | 22:53 | |
fungi | corvus: sounds good | 22:53 |
corvus | clarkb: ? | 22:54 |
clarkb | corvus: am ready | 22:54 |
clarkb | and plan sounds good | 22:54 |
corvus | done | 22:54 |
clarkb | fungi: should we trigger replication ~3 at a time like yesterday? | 22:55 |
fungi | i'm on hand to babysit the replication... no exciting friday night plans | 22:55 |
*** gyee has joined #openstack-infra | 22:55 | |
clarkb | I'm around for the next hour at least | 22:55 |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: support alternate portage directories https://review.opendev.org/671530 | 22:55 |
clarkb | fungi: just let me know how I can help | 22:55 |
fungi | we could try x4 and see what the slowdown is | 22:55 |
clarkb | fungi: ++ | 22:55 |
corvus | i'll trigger the single rep on gitea02 now | 22:56 |
fungi | thanks corvus, a good canary | 22:57 |
fungi | i have a feeling we could fire all 8 together and they'd still complete in a reasonable amount of time, if the slowdown was really related to other use of the server | 22:57 |
*** aaronsheffield has quit IRC | 22:57 | |
corvus | 02 is good now on a-c_t | 22:58 |
fungi | though it was rather uncanny that last night the replication time scaled roughly linearly with the number of backends we replicated to at once | 22:58 |
fungi | implying "parallel" replication still has some inherent serialization imposed somewhere | 22:58 |
corvus | i have to run, so i'll leave the repl kick-off to you | 22:59 |
fungi | thanks again corvus!!! | 22:59 |
*** weifan has joined #openstack-infra | 22:59 | |
fungi | i'll fire up full replication for 05-08 first | 22:59 |
clarkb | sounds good | 22:59 |
corvus | (it wouldn't be sysadmin day if we didn't get to do some sysadminning, huh?) | 22:59 |
fungi | all queued | 23:00 |
fungi | indeed, indeed | 23:00 |
fungi | happy sysadmin day to all | 23:00 |
*** jamesmcarthur has quit IRC | 23:00 | |
fungi | i'm polling the queue length with some timestamping again so i'll know what time it reaches ~0 | 23:03 |
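Roughly this sort of loop (the host alias and interval are guesses); gerrit's show-queue output ends with a task-count summary line, so timestamping that shows the trend:

    while sleep 60; do
        printf '%s %s\n' "$(date -u +%H:%M:%S)" \
            "$(ssh review gerrit show-queue -w | tail -n1)"
    done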
*** weifan has quit IRC | 23:03 | |
fungi | guessing it'll be around 23:55 if the trending from last night remains consistent | 23:03 |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: support alternate portage directories https://review.opendev.org/671530 | 23:05 |
*** mriedem has quit IRC | 23:08 | |
*** pkopec has quit IRC | 23:09 | |
*** jamesmcarthur has joined #openstack-infra | 23:10 | |
*** slaweq has joined #openstack-infra | 23:11 | |
*** slaweq has quit IRC | 23:15 | |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: support alternate portage directories https://review.opendev.org/671530 | 23:22 |
clarkb | I think my devstack-into-python change is working now and takes keystone account setup from ~60 seconds to 7 seconds? | 23:27 |
clarkb | there are a bunch of other keystone user and endpoint creation type things that were taking that ~7 minutes before, which we can probably get down to ~30 seconds if rewritten in python | 23:27 |
*** diablo_rojo has joined #openstack-infra | 23:28 | |
*** tosky has quit IRC | 23:31 | |
*** jamesmcarthur has quit IRC | 23:42 | |
fungi | replication seems to be taking far longer than my earlier estimate | 23:42 |
fungi | so either things are slower now than this time last night, or 4x is past the tipping point for contention | 23:43 |
*** mattw4 has quit IRC | 23:45 | |
fungi | taking nearly twice as long as projected | 23:50 |
clarkb | huh | 23:51 |
fungi | yeah, i expected it to wrap up around nowish, but 2788 tasks | 23:54 |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: support alternate portage directories https://review.opendev.org/671530 | 23:55 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!