ianw | ok, so it's failing on "TASK [cloud-launcher : Processing keypair infra-root-keys for openstackzuul-citycloud]" | 00:01 |
---|---|---|
clarkb | ianw: ok for that one I think we have to delete the existing keys | 00:01 |
clarkb | I recall running into that when spinning up packethost? basically we changed the value in yaml but launcher doesn't know how to do a delete/add to update | 00:02 |
ianw | clarkb: oh, it's an authentication issue (at least the first problem is :) | 00:02 |
clarkb | ianw: if you delete it nodepool won't be able to boot instances while it's gone but citycloud is disabled anyway | 00:02 |
clarkb | ah | 00:02 |
ianw | i guess we *also* want the region | 00:03 |
ianw | in the task message | 00:03 |
*** betherly has quit IRC | 00:04 | |
clarkb | http://graphite.openstack.org/render/?from=20181019&until=20181106&target=scale(stats.timers.zuul.tenant.openstack.pipeline.check.resident_time.mean,%200.00001666666) illustrates the queue improvements we've seen | 00:07 |
clarkb | thank you tripleo! | 00:07 |
openstackgerrit | Ian Wienand proposed openstack/ansible-role-cloud-launcher master: Add region to task output https://review.openstack.org/616035 | 00:11 |
ianw | clarkb: ok, that will put info in the launcher, but i'll look at manual auth now | 00:12 |
*** slaweq has quit IRC | 00:12 | |
*** bobh has joined #openstack-infra | 00:12 | |
*** slaweq has joined #openstack-infra | 00:13 | |
ianw | yep, so even "OS_CLIENT_CONFIG_FILE=/etc/openstack/all-clouds.yaml ./env/bin/openstack --os-cloud=openstackzuul-citycloud --os-region Lon1 server list" doesn't work right now | 00:13 |
clarkb | hrm did openstacksdk 0.19.0 install properly? | 00:14 |
clarkb | its there | 00:15 |
ianw | yeah, this is my virtualenv with that | 00:15 |
ianw | mordred: ? ^ | 00:15 |
clarkb | ianw: it works with my venv | 00:16 |
clarkb | /home/clarkb/venv/bin/openstack if you want to cross check that | 00:16 |
*** bobh has quit IRC | 00:17 | |
ianw | hrm | 00:17 |
ianw | ./env-ansible-devel/bin/pip list | grep openstacksdk | 00:17 |
ianw | openstacksdk 0.19.0 | 00:17 |
ianw | $ /home/clarkb/venv/bin/pip list | grep openstacksdk | 00:18 |
ianw | openstacksdk 0.17.2 | 00:18 |
clarkb | hrm | 00:18 |
clarkb | could be a new regression in the sdk? | 00:18 |
ianw | i feel like I've journeyed a long way from trying to stand up a mirror server in the new arm cloud :) | 00:19 |
clarkb | ianw: fwiw, I would not be opposed to you running cloud launcher against the armci cloud alone then setup a mirror there | 00:20 |
clarkb | ianw: and we can work the larger more general problem in parallel (with mordred hopefully as he is able) | 00:20 |
ianw | yeah, that's an option | 00:22 |
clarkb | I'm going to update my venv now and run with debug output from openstackclient | 00:22 |
clarkb | ianw: hrm it actually works with 0.19.0 for me too | 00:23 |
clarkb | bug in another library that didn't get updated maybe? | 00:23 |
ianw | :/ ... ok, i have debugging output too, want to be careful with it that i don't post a password :) | 00:23 |
ianw | it's talking to "Starting new HTTPS connection (1): lon1.citycloud.com:5000" | 00:24 |
clarkb | ianw: feel free to use my venv as needed to debug if you decide to do that | 00:25 |
ianw | clarkb: it doesn't work for me using your venv | 00:26 |
clarkb | I'm getting sucked into election watching. And I can always recreate the venv if I need to | 00:26 |
clarkb | ianw: hrm something external to the venv then?? | 00:26 |
ianw | OS_CLIENT_CONFIG_FILE=/etc/openstack/all-clouds.yaml /home/clarkb/venv/bin/openstack --os-cloud=openstackzuul-citycloud -vvv --os-region Lon1 server list | 00:26 |
clarkb | oh! I'm using openstackci-citycloud, not openstackzuul-citycloud | 00:26 |
clarkb | ok now it doesn't work | 00:26 |
clarkb | so difference in those two accounts | 00:27 |
ianw | hrm, then i wonder if we just have a wrong password | 00:27 |
clarkb | maybe? | 00:27 |
mordred | clarkb: what did I do? | 00:27 |
clarkb | mordred: maybe nothing? citycloud still not working quite right, but we've narrowed it down to openstackzuul-citycloud. openstackci-citycloud works | 00:28 |
ianw | mordred: can you make OS_CLIENT_CONFIG_FILE=/etc/openstack/all-clouds.yaml /home/ianw/env/bin/openstack --os-cloud=openstackzuul-citycloud --os-region Lon1 server list work? :) | 00:28 |
*** ssbarnea has quit IRC | 00:28 | |
ianw | did the password change maybe? | 00:28 |
mordred | oh - ... I think the project id for lon is different | 00:28 |
clarkb | ianw: we've been in hibernation with citycloud so it is possible | 00:28 |
mordred | I think I remember seeing that this was the case when I was logged in to their web interface | 00:29 |
mordred | or maybe the domain id - or something | 00:29 |
clarkb | switch to name maybe? | 00:29 |
openstackgerrit | Merged openstack/ansible-role-cloud-launcher master: Add region to task output https://review.openstack.org/616035 | 00:29 |
mordred | yeah - I think that's what I did for the other one I updated ... because the names are consistent across the regions | 00:30 |
mordred | Additionally, Sto2 and Lon1 each have different domain ids. The domain | 00:30 |
ianw | ok, pulling up webui ... | 00:30 |
mordred | names are the same though - and that's good, because logical names are | 00:30 |
mordred | nicer in config files anyway. | 00:30 |
*** gfidente|afk has quit IRC | 00:31 | |
mordred | yes - that will be it - update project_id project_domain_id and user_domain_id to be the name varieties | 00:31 |
mordred | sorry I didn't catch that in the other patch | 00:32 |
ianw | right yeah they're all CCP_Domain_27611 ? | 00:32 |
mordred | I don't know if the openstackzuul user is in the same domain | 00:33 |
mordred | I'm *certain* that 'Default Project 27611' will not be the project | 00:33 |
clarkb | I think each user/account has a different domain there | 00:33 |
ianw | mordred: from the webui, i think so -- both users seem to be there | 00:33 |
clarkb | ianw: oh you know what it is we have one account with two users in the same domain | 00:34 |
clarkb | maybe | 00:34 |
mordred | yah | 00:34 |
mordred | and probably 2 different projects | 00:34 |
mordred | which have a selector up top | 00:34 |
clarkb | ya | 00:34 |
ianw | all the endpoints https://sto2.citycloud.com:8776/v2/65684... end with the same digits | 00:35 |
ianw | is that the project? | 00:35 |
mordred | yah - that'll be the project_id | 00:35 |
*** mriedem has quit IRC | 00:36 | |
*** longkb has joined #openstack-infra | 00:36 | |
ianw | that's not private is it? | 00:36 |
mordred | it's not | 00:36 |
mordred | the project name will be up in the top - it's a little harder to find | 00:36 |
* mordred has to run - but is very confident this will fix things | 00:38 | |
*** irclogbot_1 has quit IRC | 00:38 | |
ianw | hrm, i see where that projectid comes from ... bed89257500340af8d0fbe7141b1bfd6 ... but that's the one not working | 00:38 |
clarkb | well if domain is wrong the project id wont matter | 00:39 |
ianw | and it's different than the one listed at the end of the endpoint urls | 00:39 |
ianw | hang on they have an rc file ... maybe this will help | 00:41 |
ianw | ok, i think so, testing ... | 00:42 |
ianw | yep, ok, got it | 00:45 |
clarkb | yay | 00:46 |
ianw | oh haha it's already ok in the nodepool config | 00:48 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Update citycloud project details https://review.openstack.org/616039 | 00:50 |
ianw | clarkb / mordred : ^ | 00:50 |
clarkb | +2 | 00:51 |
clarkb | any other infra-root around to do second eyeball on ^ | 00:51 |
jhesketh | clarkb, ianw: yep, taking a look | 00:51 |
*** sthussey has quit IRC | 00:52 | |
jhesketh | lgtm | 00:53 |
ianw | thanks. i'm going to grab some lunch and when i come back will check on cloud launcher again ... | 00:58 |
*** Swami has quit IRC | 01:01 | |
*** fresta has quit IRC | 01:38 | |
*** fresta has joined #openstack-infra | 01:40 | |
*** yamamoto has joined #openstack-infra | 01:46 | |
*** fresta has quit IRC | 01:47 | |
*** fresta has joined #openstack-infra | 01:51 | |
*** rlandy has quit IRC | 01:58 | |
*** markvoelker has quit IRC | 02:00 | |
*** markvoelker has joined #openstack-infra | 02:00 | |
*** markvoelker has quit IRC | 02:02 | |
*** armax has quit IRC | 02:10 | |
*** dave-mccowan has joined #openstack-infra | 02:16 | |
*** dayou has quit IRC | 02:20 | |
*** dayou has joined #openstack-infra | 02:21 | |
*** dayou has quit IRC | 02:25 | |
*** dayou has joined #openstack-infra | 02:26 | |
*** dave-mccowan has quit IRC | 02:27 | |
*** dayou has quit IRC | 02:29 | |
*** dayou has joined #openstack-infra | 02:33 | |
*** annp has joined #openstack-infra | 02:34 | |
*** dayou has quit IRC | 02:34 | |
*** dayou has joined #openstack-infra | 02:36 | |
*** dayou has quit IRC | 02:37 | |
*** dayou has joined #openstack-infra | 02:38 | |
*** dayou has quit IRC | 02:42 | |
*** dayou has joined #openstack-infra | 02:44 | |
*** dayou has quit IRC | 02:45 | |
*** dayou has joined #openstack-infra | 02:46 | |
*** dayou has quit IRC | 02:47 | |
*** mrsoul has quit IRC | 02:48 | |
*** dayou has joined #openstack-infra | 02:49 | |
*** yamamoto has quit IRC | 02:49 | |
*** felipemonteiro has joined #openstack-infra | 02:51 | |
*** dayou has quit IRC | 03:05 | |
*** diablo_rojo has quit IRC | 03:06 | |
*** hongbin has joined #openstack-infra | 03:12 | |
*** felipemonteiro has quit IRC | 03:16 | |
*** dayou has joined #openstack-infra | 03:20 | |
*** dave-mccowan has joined #openstack-infra | 03:24 | |
ianw | hrm, i'm not sure we haven't broken the system-config gate now | 03:33 |
*** dave-mccowan has quit IRC | 03:36 | |
*** felipemonteiro has joined #openstack-infra | 03:37 | |
ianw | http://logs.openstack.org/39/616039/1/check/system-config-run-base/e432132/job-output.txt.gz | 03:37 |
ianw | [WARNING]: * Failed to parse /opt/system-config/inventory/openstack.yaml with | 03:38 |
ianw | 2018-11-07 01:00:14.693975 | bridge.openstack.org | openstack plugin: Auth plugin requires parameters which were not given: | 03:38 |
ianw | 2018-11-07 01:00:14.694008 | bridge.openstack.org | auth_url | 03:38 |
clarkb | hrm we don't want to use the openstack plugin there do we? the inventory is provided by the job itself | 03:40 |
clarkb | weird that it started later though | 03:40 |
clarkb | ianw: thats part of the ansible config that says which plugins to enable iirc | 03:41 |
*** kota_ has quit IRC | 03:41 | |
ianw | yeah, i dunno, it's maybe working now ... | 03:42 |
clarkb | ianw: ok we copy ./playbooks/roles/install-ansible/files/ansible.cfg to /etc/ansible/ansible.cfg. Then update the ini values to remove openstack.yaml? | 03:44 |
clarkb | http://logs.openstack.org/39/616039/1/check/system-config-run-base/e432132/ara-report/file/16abe354-93fb-4feb-a763-838fed1379b6/#line-24 I'm guessing that didn't work for some reason? | 03:45 |
clarkb | unfortunately the logging there is light | 03:45 |
clarkb | ianw: http://logs.openstack.org/39/616039/1/check/system-config-run-base/e432132/ansible/ansible.cfg we seem to log that file and it doesn't look updated | 03:46 |
clarkb | that is really weird | 03:46 |
clarkb | ara indicates it did what we wanted | 03:47 |
clarkb | bug in ansible maybe? | 03:47 |
*** owalsh_ has joined #openstack-infra | 03:47 | |
ianw | i'm not sure, i think that the openstack plugin should have read the file, but didn't, giving that error with auth_url | 03:48 |
ianw | maybe it was an old openstacksdk | 03:49 |
clarkb | ianw: well I don't think we are supposed to use the openstack plugin at all | 03:49 |
clarkb | that is what http://logs.openstack.org/39/616039/1/check/system-config-run-base/e432132/ara-report/file/16abe354-93fb-4feb-a763-838fed1379b6/#line-24 should change (it writes a new inventory option value) | 03:49 |
*** rkukura has quit IRC | 03:50 | |
clarkb | but if you look at the file value we log for that job its still got the original inventory option value | 03:50 |
ianw | oh right, ok | 03:50 |
ianw | well, it all just passed, so it's non-deterministic :/ | 03:50 |
*** owalsh has quit IRC | 03:51 | |
clarkb | ya none of the three options we should update appear to have been changed there | 03:51 |
*** bhavikdbavishi has joined #openstack-infra | 03:52 | |
clarkb | looks like they try really hard to make that an atomic update | 03:57 |
clarkb | I wonder if it is buggy | 03:57 |
clarkb | I take that back. This ran in ansible 2.5.11 that doesn't do the atomic move | 04:01 |
clarkb | ianw: I wonder if it is a sync issue? we read the file, update the contents and write it back then we do that over and over to do each individual option | 04:06 |
clarkb | if we somehow read back something that hasn't been updated on the write side? | 04:06 |
openstackgerrit | Merged openstack-infra/system-config master: Update citycloud project details https://review.openstack.org/616039 | 04:08 |
clarkb | ianw: if it persists maybe we should use sed :/ | 04:12 |
*** felipemonteiro has quit IRC | 04:15 | |
*** yamamoto has joined #openstack-infra | 04:21 | |
*** bhavikdbavishi has quit IRC | 04:30 | |
*** felipemonteiro has joined #openstack-infra | 04:31 | |
*** roman_g has quit IRC | 04:35 | |
*** felipemonteiro has quit IRC | 04:43 | |
*** hongbin has quit IRC | 04:47 | |
*** felipemonteiro has joined #openstack-infra | 04:47 | |
*** felipemonteiro has quit IRC | 04:53 | |
*** noama has joined #openstack-infra | 04:54 | |
*** annp has quit IRC | 05:00 | |
*** longkb has quit IRC | 05:00 | |
*** longkb has joined #openstack-infra | 05:00 | |
*** agopi has joined #openstack-infra | 05:08 | |
ianw | it looks like 2604:e100:1:0:f816:3eff:fe05:7ce0 is hanging the ansible run | 05:09 |
ianw | it's planet.o.o | 05:11 |
ianw | $ ssh ianw@planet.openstack.org | 05:11 |
ianw | ssh_exchange_identification: read: Connection reset by peer | 05:11 |
ianw | or it hangs; but apache is responding | 05:12 |
ianw | planet01 login: [195722.883411] INFO: task jbd2/vda1-8:287 blocked for more than 120 seconds. | 05:16 |
*** hamzy has joined #openstack-infra | 05:18 | |
*** armax has joined #openstack-infra | 05:20 | |
ianw | looks like it's dead on the remote end | 05:20 |
ianw | interesting, i can't seem to get to vexxhost.com ... can anyone else? | 05:21 |
clarkb | it pings but no http(s) | 05:22 |
bkero | I get a HTTP connection and SSL negotiation | 05:24 |
clarkb | ianw: those hosts are boot from volume ceph right? though I don't know if planet is. But maybe ceph had a sad | 05:24 |
ianw | clarkb: i'm not sure; planet.o.o is the only control-plane server we have in vexxhost | 05:27 |
ianw | i tried to reboot it, and it's gone into an error state | 05:27 |
ianw | {'message': 'internal error: process exited while connecting to monitor', 'code': 500, 'created': '2018-11-07T05:20:07Z'} | 05:28 |
*** rascasoft has quit IRC | 05:28 | |
*** pall has quit IRC | 05:28 | |
clarkb | ya the vda message makes me think that maybe the backing store is unhappy | 05:29 |
ianw | i'm seeing a cloudflare error for vexxhost.com now. so i'm guessing something is wrong :/ | 05:29 |
ianw | #status log planet.o.o shutdown and in error state, vexxhost.com currently not responding (planet.o.o is hosted in ca-ymq-2) | 05:30 |
openstackstatus | ianw: finished logging | 05:30 |
ianw | i guess what i'll do is put it in the emergency file. hopefully it doesn't require being rebuilt but i'm not sure what else to do right now | 05:30 |
*** rascasoft has joined #openstack-infra | 05:31 | |
ianw | for reference, here is the status of the server after i rebooted it -> http://paste.openstack.org/show/734320/ | 05:31 |
ianw | and here is the log of it going down -> http://paste.openstack.org/show/734321/ | 05:32 |
clarkb | seems like a reasonable step and we can check in with mnaser when he is around | 05:32 |
ianw | hrm, the emergency.yaml file is empty | 05:33 |
ianw | i'm not sure that's right | 05:33 |
clarkb | ianw: iirc mordred did that because all the emergency file hosts were in the disabled lists in yamlgroups file | 05:34 |
clarkb | so we didn't need anything in emergency file? | 05:34 |
clarkb | it is getting late for me though and election things are wrapping up so I expect to be afk soon | 05:34 |
ianw | oh, ok, so we just add them as an entry under disabled? | 05:34 |
clarkb | ya or you can add an emergency file for more temporary things | 05:35 |
clarkb | (since emergency file doesn't go through code review) | 05:35 |
ianw | why is /etc/ansible/groups.yaml in the old format? | 05:37 |
ianw | hosts/groups.yaml ... | 05:38 |
clarkb | ianw: I think that file is ignored (we should probably clean that up) the ansible.cfg points to other files (those in /opt/system-config/inventory) | 05:38 |
ianw | ohhh, right | 05:39 |
ianw | so we're not deploying it, i must have missed that bit | 05:39 |
ianw | #status log planet01.o.o in the emergency file, pending investigation with vexxhost | 05:42 |
openstackstatus | ianw: finished logging | 05:42 |
*** wznoinsk has quit IRC | 05:46 | |
*** jrist has quit IRC | 05:58 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Pin bridge.o.o to ansible 2.7.0, add devel testing job https://review.openstack.org/614894 | 05:58 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: bridge.o.o: Use latest openstacksdk https://review.openstack.org/615982 | 05:58 |
*** d0ugal has quit IRC | 06:05 | |
*** jrist has joined #openstack-infra | 06:11 | |
*** dayou has quit IRC | 06:14 | |
*** dayou has joined #openstack-infra | 06:15 | |
*** haleyb has quit IRC | 06:20 | |
*** d0ugal has joined #openstack-infra | 06:21 | |
*** apetrich has quit IRC | 06:29 | |
*** apetrich has joined #openstack-infra | 06:44 | |
*** felipemonteiro has joined #openstack-infra | 06:50 | |
*** bhavikdbavishi has joined #openstack-infra | 06:52 | |
*** andreaf has quit IRC | 06:53 | |
*** andreaf has joined #openstack-infra | 06:55 | |
*** diablo_rojo has joined #openstack-infra | 06:56 | |
AJaeger | config-core, could I get a second +2 on https://review.openstack.org/615637 (clarkb already reviewed) - removes now obsolete publish-static jobs from project-config, please? | 06:58 |
AJaeger | ianw: I left a question on https://review.openstack.org/#/c/615698/5/roles/validate-host/library/zuul_network_validate.py for you | 07:07 |
*** bhavikdbavishi has quit IRC | 07:10 | |
*** quiquell|off is now known as quiquell | 07:10 | |
*** dpawlik has joined #openstack-infra | 07:11 | |
*** bhavikdbavishi has joined #openstack-infra | 07:13 | |
*** alexchadin has joined #openstack-infra | 07:15 | |
*** dpawlik has quit IRC | 07:17 | |
*** maciejjozefczyk has joined #openstack-infra | 07:17 | |
*** dpawlik has joined #openstack-infra | 07:17 | |
openstackgerrit | Merged openstack-infra/project-config master: Remove publish-static https://review.openstack.org/615637 | 07:18 |
*** rkukura has joined #openstack-infra | 07:30 | |
*** ccamacho has joined #openstack-infra | 07:31 | |
*** diablo_rojo has quit IRC | 07:32 | |
*** apetrich has quit IRC | 07:33 | |
*** pcaruana has joined #openstack-infra | 07:36 | |
*** bhavikdbavishi has quit IRC | 07:37 | |
*** aojea has joined #openstack-infra | 07:42 | |
*** ifat_afek has joined #openstack-infra | 07:43 | |
*** yamamoto has quit IRC | 07:43 | |
*** apetrich has joined #openstack-infra | 07:47 | |
*** e0ne has joined #openstack-infra | 07:49 | |
*** rkukura has quit IRC | 07:49 | |
*** rkukura has joined #openstack-infra | 07:49 | |
*** quiquell is now known as quiquell|brb | 07:49 | |
*** rkukura has quit IRC | 07:49 | |
*** yamamoto has joined #openstack-infra | 07:57 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: doc: fix typo in secret example https://review.openstack.org/616095 | 08:02 |
*** jtomasek has joined #openstack-infra | 08:02 | |
*** quiquell|brb is now known as quiquell | 08:04 | |
*** felipemonteiro has quit IRC | 08:06 | |
ifat_afek | Hi, IRC bot seems to be down on #openstack-meeting-4. The Vitrage meeting is not being logged | 08:07 |
AJaeger | ifat_afek: you added an extra space before "#startmeeting" - try again without | 08:08 |
AJaeger | ifat_afek: see http://eavesdrop.openstack.org/irclogs/%23openstack-meeting-4/%23openstack-meeting-4.2018-11-07.log for your invocation | 08:09 |
ifat_afek | AJaeger: you are probably right, I’ll try again. Thanks! | 08:09 |
AJaeger | ifat_afek: yes, try again... | 08:10 |
*** lpetrut has joined #openstack-infra | 08:20 | |
*** ginopc has joined #openstack-infra | 08:20 | |
*** bhavikdbavishi has joined #openstack-infra | 08:23 | |
ianw | clarkb: "localhost : ok=601 changed=13 unreachable=0 failed=0 skipped=1535" ... yay good cloud launcher run! but that's using ansible from devel. so we either need to look into the snat thing or think about https://review.openstack.org/#/c/614894/ and pin to non-release version | 08:30 |
ianw | AJaeger: thanks, i'll have to think about your comment :) | 08:30 |
ianw | ... till tomorrow, i'm out! | 08:30 |
AJaeger | ianw: good night! | 08:32 |
*** ralonsoh has joined #openstack-infra | 08:33 | |
*** dayou has quit IRC | 08:35 | |
*** ginopc has quit IRC | 08:36 | |
*** bhavikdbavishi has quit IRC | 08:36 | |
*** roman_g has joined #openstack-infra | 08:47 | |
*** gfidente has joined #openstack-infra | 08:50 | |
*** jpena|off is now known as jpena | 08:50 | |
*** dims has quit IRC | 08:52 | |
*** dims has joined #openstack-infra | 08:53 | |
*** dims has quit IRC | 08:58 | |
*** dims has joined #openstack-infra | 08:59 | |
*** jpich has joined #openstack-infra | 09:01 | |
openstackgerrit | Merged openstack-infra/yaml2ical master: update the versions of python 3 claimed https://review.openstack.org/615266 | 09:04 |
*** dayou has joined #openstack-infra | 09:04 | |
*** kukacz has quit IRC | 09:04 | |
*** kukacz has joined #openstack-infra | 09:05 | |
*** alexchadin has quit IRC | 09:06 | |
openstackgerrit | Merged openstack-infra/yaml2ical master: add base class for recurrence https://review.openstack.org/615267 | 09:11 |
openstackgerrit | Merged openstack-infra/yaml2ical master: add day_specifier to recurrence https://review.openstack.org/615268 | 09:11 |
*** yamamoto has quit IRC | 09:14 | |
*** ssbarnea has joined #openstack-infra | 09:18 | |
*** shardy has joined #openstack-infra | 09:19 | |
*** owalsh_ is now known as owalsh | 09:23 | |
*** jtomasek has quit IRC | 09:26 | |
*** electrofelix has joined #openstack-infra | 09:29 | |
*** jtomasek has joined #openstack-infra | 09:31 | |
*** ttx has quit IRC | 09:37 | |
*** dayou has quit IRC | 09:38 | |
*** dpawlik has quit IRC | 09:38 | |
*** yamamoto has joined #openstack-infra | 09:41 | |
*** dpawlik has joined #openstack-infra | 09:42 | |
*** panda|off is now known as panda | 09:43 | |
*** derekh has joined #openstack-infra | 09:44 | |
*** ttx has joined #openstack-infra | 09:50 | |
*** kopecmartin|off is now known as kopecmartin | 09:59 | |
*** sshnaidm|afk is now known as sshnaidm|rover | 10:01 | |
*** e0ne has quit IRC | 10:02 | |
*** e0ne has joined #openstack-infra | 10:04 | |
*** dayou has joined #openstack-infra | 10:05 | |
*** longkb has quit IRC | 10:13 | |
*** rfolco|ruck has joined #openstack-infra | 10:36 | |
*** yamamoto has quit IRC | 10:39 | |
*** ginopc has joined #openstack-infra | 10:39 | |
*** e0ne has quit IRC | 10:45 | |
*** dtantsur|afk is now known as dtantsur | 10:47 | |
*** admcleod has joined #openstack-infra | 10:47 | |
*** admcleod has quit IRC | 10:48 | |
*** admcleod has joined #openstack-infra | 10:49 | |
*** alexchadin has joined #openstack-infra | 10:50 | |
*** bhavikdbavishi has joined #openstack-infra | 10:51 | |
*** priteau has joined #openstack-infra | 10:53 | |
*** e0ne has joined #openstack-infra | 10:54 | |
*** admcleod has quit IRC | 10:54 | |
*** admcleod has joined #openstack-infra | 10:59 | |
*** shrasool has joined #openstack-infra | 11:02 | |
*** yamamoto has joined #openstack-infra | 11:16 | |
*** bhavikdbavishi has quit IRC | 11:24 | |
*** yamamoto has quit IRC | 11:25 | |
*** florianf has quit IRC | 11:30 | |
*** ssbarnea has quit IRC | 11:34 | |
*** florianf has joined #openstack-infra | 11:48 | |
*** ssbarnea has joined #openstack-infra | 11:53 | |
*** bhavikdbavishi has joined #openstack-infra | 11:59 | |
*** e0ne has quit IRC | 12:04 | |
*** e0ne has joined #openstack-infra | 12:08 | |
*** e0ne has quit IRC | 12:09 | |
*** pbourke has quit IRC | 12:11 | |
*** dpawlik has quit IRC | 12:11 | |
*** pbourke has joined #openstack-infra | 12:11 | |
*** dpawlik has joined #openstack-infra | 12:12 | |
*** yamamoto has joined #openstack-infra | 12:12 | |
*** dpawlik has quit IRC | 12:16 | |
*** florianf has quit IRC | 12:16 | |
openstackgerrit | Stephen Finucane proposed openstack-infra/git-review master: As suggested by pep8 don't compare boolean values or empty sequences https://review.openstack.org/221172 | 12:16 |
*** alexchadin has quit IRC | 12:17 | |
openstackgerrit | Dmitry Tantsur proposed openstack/diskimage-builder master: Add an element to configure iBFT network interfaces https://review.openstack.org/391787 | 12:20 |
*** snapiri has joined #openstack-infra | 12:22 | |
openstackgerrit | Stephen Finucane proposed openstack-infra/git-review master: option.compatible as a config variable This is to avoid the annoying use of '-c' option with old gerrit servers. Usage doc has been updated to include '-c' option https://review.openstack.org/444274 | 12:24 |
openstackgerrit | Stephen Finucane proposed openstack-infra/git-review master: Fix log command used to count refs to be submitted. https://review.openstack.org/337883 | 12:25 |
*** e0ne has joined #openstack-infra | 12:26 | |
openstackgerrit | Stephen Finucane proposed openstack-infra/git-review master: Fix "git-review -d" erases work directory if on the same branch as the change downloaded https://review.openstack.org/399779 | 12:27 |
*** ifat_afek has quit IRC | 12:37 | |
*** ifat_afek has joined #openstack-infra | 12:38 | |
*** dpawlik has joined #openstack-infra | 12:39 | |
*** dpawlik has quit IRC | 12:39 | |
*** dpawlik has joined #openstack-infra | 12:40 | |
*** ifat_afek has quit IRC | 12:43 | |
*** sshnaidm|rover is now known as sshnaidm|afk | 12:50 | |
*** jpena is now known as jpena|lunch | 12:53 | |
*** rlandy has joined #openstack-infra | 12:54 | |
coreycb | clarkb: AJaeger: if you have a moment can you respond to this thread? http://lists.openstack.org/pipermail/openstack-dev/2018-November/136337.html | 12:57 |
*** shrasool has quit IRC | 13:06 | |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on future parser for mirror-update.o.o https://review.openstack.org/615991 | 13:13 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for rax mirror https://review.openstack.org/615992 | 13:13 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for all mirrors https://review.openstack.org/615993 | 13:13 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for files.openstack.org https://review.openstack.org/615994 | 13:13 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on future parser for on zookeeper instance https://review.openstack.org/615995 | 13:13 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on future parser for all zookeeper instances https://review.openstack.org/615996 | 13:13 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for one nameserver https://review.openstack.org/615998 | 13:13 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for all nameservers https://review.openstack.org/615999 | 13:13 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for master nameserver https://review.openstack.org/616000 | 13:13 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for openstackid-dev https://review.openstack.org/616001 | 13:13 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for openstackid https://review.openstack.org/616002 | 13:13 |
*** jpena|lunch is now known as jpena | 13:18 | |
*** trown|outtypewww is now known as trown | 13:21 | |
*** chem has joined #openstack-infra | 13:23 | |
chem | Hi, I don't get why https://review.openstack.org/#/c/611677/ doesn't go through the gate. Should I change the depends-on to point only to the master patch ? | 13:25 |
AJaeger | chem: new syntax is "depends-on: https://review.openstack.org/616666". It only merges if *all* dependencies are in. | 13:27 |
AJaeger | chem: and your change has a dependency on a stable change with the ID. | 13:27 |
AJaeger | chem: so, depends on what you want - wait for all - or only a single change (or multiple ones with multiple depends-on) | 13:27 |
chem | AJaeger: Thanks for the confirmation. Will use the new syntax from now on for this kind of patch. | 13:28 |
*** bhavikdbavishi has quit IRC | 13:29 | |
*** ansmith has joined #openstack-infra | 13:30 | |
*** dpawlik has quit IRC | 13:35 | |
*** dpawlik has joined #openstack-infra | 13:35 | |
*** dpawlik has quit IRC | 13:40 | |
*** e0ne has quit IRC | 13:41 | |
*** bobh has joined #openstack-infra | 13:42 | |
*** shrasool has joined #openstack-infra | 13:45 | |
*** sthussey has joined #openstack-infra | 13:45 | |
*** pcaruana has quit IRC | 13:45 | |
*** e0ne has joined #openstack-infra | 13:45 | |
*** agopi has quit IRC | 13:47 | |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on future parser for one zookeeper instance https://review.openstack.org/615995 | 13:50 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on future parser for all zookeeper instances https://review.openstack.org/615996 | 13:50 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for one nameserver https://review.openstack.org/615998 | 13:50 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for all nameservers https://review.openstack.org/615999 | 13:50 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for master nameserver https://review.openstack.org/616000 | 13:50 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for openstackid-dev https://review.openstack.org/616001 | 13:50 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for openstackid https://review.openstack.org/616002 | 13:50 |
*** kgiusti has joined #openstack-infra | 13:50 | |
openstackgerrit | Luka Peschke proposed openstack-infra/irc-meetings master: Creating an IRC meeting for CloudKitty https://review.openstack.org/616205 | 13:51 |
dhellmann | smcginnis , fungi, AJaeger: we had another rsync error in the releases repo publish job. I thought we had eliminated all of those, so I wonder if this is somehow due to switching to the new job? http://logs.openstack.org/9d/9d2593cee917d4956ffd5b6ea809e0bd872465f6/release-post/publish-tox-docs-static/4b9be16/ | 13:52 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix reporting ansible errors in buildlog https://review.openstack.org/616206 | 13:57 |
*** mriedem has joined #openstack-infra | 13:58 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix reporting ansible errors in buildlog https://review.openstack.org/616206 | 13:59 |
*** pcaruana has joined #openstack-infra | 14:00 | |
openstackgerrit | Slawek Kaplonski proposed openstack-infra/irc-meetings master: Add ralonsoh to chairing neutron-qos meeting https://review.openstack.org/616208 | 14:03 |
*** shrasool has quit IRC | 14:03 | |
smcginnis | dhellmann: I didn't think we actually did something to fix those. There was a thought that the way zuul would run things it would be less likely, but I assumed we hadn't seen it in awhile just because we didn't release two branches of the same deliverable at the same time lately. | 14:08 |
dhellmann | ok, I couldn't remember what changed but I thought it was fixed | 14:09 |
dhellmann | I'm not sure why releasing different branches would matter, since it's always master in that repo | 14:09 |
smcginnis | Something about the local paths ending up the same or something. It is a bit vague from my memory. | 14:10 |
dhellmann | hmm | 14:10 |
dhellmann | ok | 14:10 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix reporting ansible errors in buildlog https://review.openstack.org/616206 | 14:11 |
*** e0ne has quit IRC | 14:13 | |
*** e0ne has joined #openstack-infra | 14:14 | |
*** agopi has joined #openstack-infra | 14:17 | |
openstackgerrit | Merged openstack-infra/nodepool master: Correct heading levels for Kubernetes config docs https://review.openstack.org/616007 | 14:20 |
*** rh-jelabarre has joined #openstack-infra | 14:21 | |
*** ifat_afek has joined #openstack-infra | 14:22 | |
*** yamamoto has quit IRC | 14:29 | |
efried | Anyone else having log server woes? Some (but not all) requests to the log server just spin and eventually time out? E.g. http://logs.openstack.org/24/615724/2/check/openstack-tox-lower-constraints/a135914/ | 14:32 |
smcginnis | Hmm, first I've seen of that, but I get the same thing with that link. | 14:32 |
efried | It happened briefly yesterday, then seemed to clear up. | 14:33 |
*** pcaruana has quit IRC | 14:33 | |
efried | and I thought it was happening on some links and not others, but maybe it's just that it's intermittent. | 14:33 |
*** pcaruana has joined #openstack-infra | 14:34 | |
cmurphy | yeah it's been kind of slowish for me | 14:35 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on future parser for all zookeeper instances https://review.openstack.org/615996 | 14:38 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for one nameserver https://review.openstack.org/615998 | 14:38 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for all nameservers https://review.openstack.org/615999 | 14:38 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for master nameserver https://review.openstack.org/616000 | 14:38 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for openstackid-dev https://review.openstack.org/616001 | 14:38 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for openstackid https://review.openstack.org/616002 | 14:38 |
*** florianf has joined #openstack-infra | 14:40 | |
*** bobh has quit IRC | 14:42 | |
*** jcoufal has joined #openstack-infra | 14:43 | |
*** bobh has joined #openstack-infra | 14:50 | |
*** yamamoto has joined #openstack-infra | 14:52 | |
*** jistr is now known as jistr|call | 15:00 | |
*** priteau has quit IRC | 15:01 | |
*** priteau has joined #openstack-infra | 15:03 | |
*** anteaya has joined #openstack-infra | 15:08 | |
*** rpioso|afk is now known as rpioso | 15:13 | |
*** roman_g has quit IRC | 15:15 | |
*** felipemonteiro has joined #openstack-infra | 15:16 | |
openstackgerrit | Thierry Carrez proposed openstack-infra/irc-meetings master: add day_specifier from recurrence https://review.openstack.org/615270 | 15:19 |
*** dpawlik has joined #openstack-infra | 15:22 | |
*** dpawlik has quit IRC | 15:22 | |
*** dpawlik has joined #openstack-infra | 15:23 | |
fungi | the cacti graphs for static.o.o don't look especially anomalous | 15:25 |
fungi | http://logs.openstack.org/24/615724/2/check/openstack-tox-lower-constraints/a135914/ loaded for me instantly but i can try reloading it a bunch and see what happens | 15:26 |
fungi | not seeing anything so far | 15:26 |
fungi | dmesg on the server isn't reporting trouble reaching any of its cinder volumes, nor any problems with the filesystem | 15:27 |
*** jistr|call is now known as jistr | 15:29 | |
fungi | dhellmann: smcginnis: yes it could be changes triggered by different branches racing i suppose... zuul no longer runs jobs in parallel in (release-)post if they're triggered by ref updates for the same project+branch but could still run them in parallel if they're for different branches of the same project (i think) | 15:30 |
fungi | triggered by ref updates of different branches for the same project, that is | 15:31 |
fungi | though looks like that build was triggered by a ref update for the master branch of openstack/releases? and it only has one branch, right? | 15:32 |
corvus | fungi: i believe that description of behavior is correct | 15:32 |
fungi | rsync: rename failed for "/srv/static/releases/.buildinfo" (from .~tmp~/.buildinfo): No such file or directory (2) | 15:33 |
fungi | http://logs.openstack.org/9d/9d2593cee917d4956ffd5b6ea809e0bd872465f6/release-post/publish-tox-docs-static/4b9be16/ara-report/ | 15:34 |
fungi | er, http://logs.openstack.org/9d/9d2593cee917d4956ffd5b6ea809e0bd872465f6/release-post/publish-tox-docs-static/4b9be16/ara-report/result/afe3b900-624a-4435-bf7f-01db4b4981a0/ | 15:34 |
smcginnis | fungi: I thought that was the same behavior we were seeing before. | 15:35 |
fungi | it certainly looks familiar | 15:35 |
smcginnis | Before as in, when we saw two jobs running simultaneously and modifying local temporary directory paths. | 15:35 |
fungi | well, i don't know that we ever managed to confirm positively that was the cause. i remember speculating about the possibility | 15:36 |
*** felipemonteiro has quit IRC | 15:39 | |
*** ralonsoh has quit IRC | 15:41 | |
*** dtantsur is now known as dtantsur|afk | 15:42 | |
*** david-lyle has quit IRC | 15:42 | |
*** dklyle has joined #openstack-infra | 15:47 | |
*** kopecmartin is now known as kopecmartin|off | 15:51 | |
*** mriedem has quit IRC | 15:53 | |
*** ralonsoh has joined #openstack-infra | 15:54 | |
*** gyee has joined #openstack-infra | 15:58 | |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: Migration to PHP 7.x https://review.openstack.org/616226 | 16:00 |
ssbarnea | fungi: do you know if we can use "tr:" or "bug:" searches on our gerrit? | 16:07 |
clarkb | ssbarnea: should be able to if searching with the message: filed | 16:08 |
clarkb | *field | 16:08 |
ssbarnea | i am trying to find a way to perform a query that would return all CR related to a specific sprint. using topic is not possible as we have multiple ones. | 16:08 |
clarkb | message:"bug: foo" type of query | 16:08 |
fungi | ssbarnea: we likely need 471078 which nobody ever reviewed after i proposed it | 16:09 |
fungi | if it's something you'd like to take advantage of i can try to get people to take a look | 16:10 |
ssbarnea | clarkb: ok, so if we use a keyword like ooo-sprint-123 we can rely on this to get, it would work. | 16:11 |
clarkb | ssbarnea: possibly? you should test that gerrit is indexing the entire commit message | 16:11 |
clarkb | but yes I expect that is how it should work | 16:11 |
ssbarnea | fungi: well, I do also support https://review.openstack.org/#/c/471078/ -- and added a +1 on it. who can merge it? | 16:11 |
*** agopi is now known as agopi|lunch | 16:14 | |
*** lpetrut has quit IRC | 16:14 | |
fungi | ssbarnea: infra-root members are core reviewers on that repository | 16:15 |
clarkb | oh thats an actual gerrit feature TIL | 16:15 |
ssbarnea | btw, we started to use taiga and we are posting URLs on "Story:". They look bit ugly but they work. | 16:15 |
ssbarnea | yep, tr:/bug: is a gerrit thing. in fact I hope to see it upgraded to be able to benefit from other improvements. | 16:16 |
clarkb | ssbarnea: any reason to not use storyboard? | 16:16 |
clarkb | I thought tripleo had migrated | 16:16 |
*** imacdonn has quit IRC | 16:17 | |
ssbarnea | yeah, lots of them and more important was to use same tool as rdo infra team. they started first. | 16:17 |
*** imacdonn has joined #openstack-infra | 16:17 | |
clarkb | ssbarnea: the feedback would likely be appreciated by the storyboard team if anyone has time to write it down | 16:18 |
fungi | if we get that tracking change in (it may need updating, i can check after my current meeting) then it will go into effect along with the task footer hyperlinks at the next gerrit restart | 16:18 |
*** quiquell is now known as quiquell|off | 16:19 | |
ssbarnea | clarkb: i plan to do this but it will be in december: i need to learn more about both of them. | 16:19 |
mordred | I didn't think the openstack policy was to allow official projects to use whatever tracker they felt like, but instead that projects needed to use one of the agreed upon tools that the community as a whole uses - did we change that policy? | 16:20 |
ssbarnea | this was the 2nd sprint and we still learn about it, after 3 sprints I think I will know enough to make a correct comparison. | 16:20 |
clarkb | mordred: I don't know if the TC has said one way or the other | 16:20 |
ssbarnea | slow down: LP is still used for bugs, this is planning tool. | 16:20 |
fungi | mordred: it sounds more like tripleo is interested in splitting off from openstack anyway? so should be fine | 16:21 |
clarkb | ssbarnea: storyboard provides that functionality is the concern I think | 16:21 |
mordred | clarkb: the _original_ tooling decision, that stands until it's changed, is that projects must use tools from a tc-approved selection of tools | 16:21 |
mordred | fungi: ah - I didn't know that | 16:21 |
mordred | ssbarnea: and no worries either way - I'm mostly just clarifying what's up | 16:21 |
fungi | it's what i'm getting from this discussion at least | 16:21 |
clarkb | ssbarnea: there are workboards that can either be automatically managed or managed by hand to track work in a kanban like fashion (which is what taiga does aiui) | 16:21 |
* mordred isn't going to rage around anywhere or anything | 16:21 | |
fungi | but yeah, they were using trello before, which wasn't something openstack had officially approved either | 16:22 |
mordred | ah - gotcha | 16:22 |
* SotK would be interested in hearing about what taiga provides that storyboard doesn't, or what is so painful in storyboard as to make folk go elsewhere | 16:22 | |
ssbarnea | i don't want to upset anyone that spend effort on storyboard, so I will stop here... but i will try to document what is missing. | 16:23 |
fungi | SotK: doesn't sound like it's even that, more that a red hat product team had already decided to use taiga, and as another red hat product the tripleo team is expected to fall in line | 16:23 |
clarkb | coreycb: I did ack the py37 changes from infra perspective | 16:24 |
clarkb | er on the mailing list | 16:24 |
coreycb | clarkb: awesome, thanks | 16:25 |
SotK | ssbarnea: thanks, I'd appreciate that | 16:25 |
*** mriedem has joined #openstack-infra | 16:26 | |
ssbarnea | SotK: it would be useful to hear about a team doing mixed SCRUM and Kanban on storyboard, i am curious how the planning looks like. | 16:27 |
fungi | also curious how anyone ever thought applying agile methodology to a free software project was anything other than completely nuts | 16:28 |
dmsimard | :( | 16:30 |
fungi | i've never been a fan of agile to start with, but seems like it sucks all the fun out of making free software | 16:32 |
clarkb | ssbarnea: fwiw I think the concern here is that an openstack project is demanding that someone else use the openstack tool first rather than being willing to dogfood. particularly when the motivation seems to be that a non openstack project has chosen a different tool | 16:32 |
clarkb | ssbarnea: it would be a lot easier to swallow if we could identify deficiencies that could at least be reported and possibly addressed, but as is we don't have even that feedback | 16:33 |
dmsimard | fungi: let's have alcoholic beverages over that topic one day | 16:37 |
*** pcaruana has quit IRC | 16:38 | |
fungi | happy to. i've turned down a number of jobs over their use of agile, because i find it absolutely painful | 16:38 |
ssbarnea | dmsimard: first good comment in while! this is the kind of subject to discuss over drinks. | 16:39 |
ssbarnea | ssbarnea: experience with storyboard while trying to create a board: 500: POST /api/v1/worklists: _CallbackResult was already set --- and this is not the first time when I see failures or timeouts. | 16:41 |
ssbarnea | even the /#!/board/list page takes like 5-10s to load the two columns, even though they have <100 elements each. | 16:43 |
*** e0ne has quit IRC | 16:44 | |
*** e0ne has joined #openstack-infra | 16:44 | |
*** e0ne has quit IRC | 16:44 | |
*** shardy has quit IRC | 16:46 | |
clarkb | ssbarnea: poking around storyboard it appears that starlingx may be attempting to use the boards in the way you were describing? dtroyer may have more info | 16:49 |
hogepodge | I'm having a tough time accessing logs.openstack.org | 16:49 |
ssbarnea | perfect, I will keep an eye on that. | 16:50 |
ttx | hogepodge: yes me too | 16:50 |
hogepodge | I'm trying to figure out why Loci jobs are failing stochastically, with a failure rate of about 1 in 10 | 16:50 |
clarkb | hogepodge: ttx trouble is opening log files? | 16:50 |
hogepodge | clarkb: I can't access the ARA reports. I was able to read the job log, but now I want to chase down what exactly is timing out to see if it's a problem on our side or the infra side. | 16:51 |
clarkb | seems like indexes open quickly | 16:51 |
clarkb | but actual content does not | 16:51 |
ssbarnea | i can confirm the logs issue, but it also happens often to load the index page fast and when you click to open the log file ... "waiting for ...". Many times I press it multiple time, probably causing extra load to the server. | 16:51 |
fungi | a couple of folks reported a problem accessing logs earlier but the link they posted was to the index which was loading fine for me. i didn't get that it was maybe the log content they were having trouble with | 16:52 |
ssbarnea | this is not an unique problem, happened like this for weeks, but experience varies a lot. | 16:52 |
fungi | cacti wasn't indicating anything out of the ordinary for the server itself | 16:52 |
fungi | apache processes are spiking to ~100% cpu | 16:53 |
fungi | could this be only a problem for ara and prettified logs? | 16:54 |
fungi | or are raw logs having trouble too? | 16:54 |
odyssey4me | I'm getting timeouts when trying to browse logs/ara reports from time to time. Is it just me, or are others seeing this too? | 16:54 |
odyssey4me | ah, fungi same with raw logs | 16:54 |
ssbarnea | probably because they are trying to re-gzip an already gzip files, but it does make no sense that file was 85kb only (gz). | 16:54 |
clarkb | fungi: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=138&rra_id=all that is very periodic in its iowait vs not. But we are in the not iowait period | 16:54 |
ssbarnea | take a look http://logs.openstack.org/13/609413/5/check/tripleo-ci-centos-7-containers-multinode/0099c05/job-output.txt.gz | 16:54 |
odyssey4me | I hit it with http://logs.openstack.org/18/616218/1/check/openstack-ansible-functional-centos-7/94aef6a/job-output.txt.gz and http://logs.openstack.org/18/616218/1/check/openstack-ansible-functional-centos-7/94aef6a/logs/ara-report/ on the first try, then a refresh worked. | 16:55 |
ssbarnea | i tried to load it multiple times and it takes ages to start serving it. | 16:55 |
odyssey4me | ssbarnea yep me too | 16:55 |
clarkb | ssbarnea: -rw-r--r-- 1 jenkins jenkins 83K Nov 6 16:21 job-output.txt.gz | 16:55 |
ssbarnea | i tried even with wget same experience: the web server doesn't even start to reply for about 10-20s. when it starts you get it instantly, but connection is the issue. | 16:55 |
clarkb | ssbarnea: looks like 83K according to the filesystem | 16:56 |
clarkb | ssbarnea: so I don't think file sizes are the issue here | 16:56 |
fungi | there is a fair amount of cache memory in use | 16:56 |
ssbarnea | clarkb: i agree with you, is something with the webserver. | 16:56 |
fungi | fairly large spikes in established tcp connections | 16:56 |
clarkb | fungi: is it possible we are queuing connections? | 16:57 |
fungi | a bit higher than this time previous weeks | 16:57 |
ssbarnea | either lack of http connections or something more sinister regarding how files are served. | 16:57 |
fungi | well, yes that could be a symptom of apache taking longer to return content | 16:57 |
corvus | i've pulled up a server-status page... there are available connection slots | 16:58 |
corvus | odyssey4me: are you going over ipv4 or v6? | 17:00 |
clarkb | corvus: I am ipv4 and saw it | 17:01 |
odyssey4me | corvus hmm, good question - most likely v4 but let me verify | 17:01 |
corvus | i'm not seeing any v4 packet loss from my workstation | 17:02 |
fungi | things are loading for me over v6, though slow-ish | 17:02 |
*** irclogbot_1 has joined #openstack-infra | 17:02 | |
odyssey4me | yep, looks like v4 for me | 17:02 |
fungi | and once they're cached they seem speedier | 17:02 |
fungi | actually they're back to loading very quickly for me now | 17:02 |
clarkb | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=311&rra_id=all doesn't seem to show an appreciable drop in data being served | 17:03 |
fungi | i have a feeling there is contention for the storage leading to long access times reading from disk. could be something like a noisy neighbor starving bandwidth for the provider's cinder network connection | 17:03 |
fungi | or a stripe set is degraded, or... | 17:04 |
clarkb | fungi: or given we are still serving the same amount of data maybe we have specific files/requests being our own dos? | 17:04 |
fungi | also possible | 17:04 |
ssbarnea | out of file handlers? | 17:04 |
ssbarnea | it is true that I also have ipv6 working (even if it proved to be permanent source of problems) | 17:05 |
clarkb | we are using 78% of disk with ~50% of inodes in use which is a huge swing in the opposite direction of the old not enough inodes problem | 17:06 |
clarkb | does make me wonder if there are large files dominating the downloads | 17:06 |
corvus | fwiw, i'm seeing cacti behaving slowly. that's a different server. | 17:06 |
clarkb | (and using available bw/disk iops) | 17:06 |
fungi | where are the queue graphs for our logstash workers lately? | 17:06 |
clarkb | fungi: grafana under zuul-status dashboard | 17:06 |
fungi | ahh, just found | 17:06 |
fungi | sorry for the noise | 17:06 |
clarkb | might be "Zuul Status" | 17:06 |
fungi | some fairly large spikes in logstash queue | 17:07 |
fungi | perhaps those could correspond to starvation? | 17:07 |
fungi | http://grafana.openstack.org/d/T6vSHcSik/zuul-status?panelId=16&fullscreen&orgId=1 | 17:07 |
clarkb | fungi: ya could be the pipeline backs up waiting for logs to process | 17:08 |
*** efried is now known as efried_rollin | 17:08 | |
fungi | your comment about "maybe we're being our own dos" got me thinking about that | 17:09 |
mordred | yah - that could totally be a thing | 17:09 |
mordred | we're very good at DOSing things | 17:09 |
clarkb | apache logs should tell us which requests take the most time right? | 17:11 |
fungi | the apache access logs don't tell you how long the connection took, only the time it started and the size transferred | 17:12 |
clarkb | I do wonder if there are specific files dominating (maybe because wsgi middleware is spinning on them or they are huge) | 17:12 |
fungi | it may be possible to add an elapsed time field? i know there are options for adding fields to the access log | 17:12 |
mordred | clarkb: 31849 www-data 20 0 343168 74488 6044 S 97.6 0.9 51:45.11 apache2 | 17:14 |
mordred | clarkb: there is definitely some high-cpu apaching going on | 17:14 |
fungi | well, i mean apache will use as much cpu as it can to fulfill a request. it's more a question of how long it's sustained for | 17:15 |
*** trown is now known as trown|lunch | 17:15 | |
*** rkukura has joined #openstack-infra | 17:15 | |
mordred | yah | 17:15 |
ssbarnea | high cpu for serving plain text files? this smells like .log.gz recompression to me. | 17:17 |
*** yamamoto has quit IRC | 17:17 | |
clarkb | ssbarnea: remember the wsgi that rewrite the text based on regexes | 17:18 |
*** florianf is now known as florianf|afk | 17:18 | |
ssbarnea | clarkb: nope. | 17:19 |
clarkb | ssbarnea: that's the os-loganalyze stuff you updated the css for | 17:19 |
clarkb | it checks against known lists of files, then rewrite the plain text into html for colorizing and severity filtering | 17:20 |
clarkb | (so it isn't just gzip) | 17:20 |
ssbarnea | clarkb: yep but i was expecting this to happen offline, not on each request.... it would be stupid. | 17:21 |
clarkb | ssbarnea: its wsgi, it runs on request | 17:21 |
clarkb | ssbarnea: and it's not stupid because our constraint is disk space not cpu typically | 17:21 |
clarkb | if we did it online the 12TB of disk we have would be even less useful | 17:21 |
clarkb | er offline | 17:21 |
ssbarnea | i was expecting the logs server to be as plain as possible, easy to replace with a CDN. | 17:22 |
mordred | there is work underway to shift to uploading logs to swift - when that is done, what is now active wsgi is done at upload time | 17:23 |
clarkb | fungi: fwiw on the log processors I see them basically going out to lunch like everyone else. Haven't found an indication they are the cause of the lunch break yet | 17:23 |
openstackgerrit | Thierry Carrez proposed openstack-infra/irc-meetings master: Use python3 to run pep8 job https://review.openstack.org/616256 | 17:24 |
fungi | clarkb: yeah, the spikes there may be symptoms of the underlying problem just like the spikes in the established tcp connections graph for the logserver | 17:24 |
*** roman_g has joined #openstack-infra | 17:24 | |
clarkb | and the log processors are in the same cloud as the fileserver so I think that rules out internet networking trouble | 17:24 |
corvus | ssbarnea: please don't call the good work of other folks stupid. | 17:25 |
openstackgerrit | Fabien Boucher proposed openstack-infra/zuul master: WIP - Pagure driver https://review.openstack.org/604404 | 17:25 |
fungi | ssbarnea: 99.99% of the logs on the logserver are never viewed with a browser, so rendering the browser-friendly view on demand is actually far less computation expended | 17:25 |
ssbarnea | corvus: it wasn't my intent to upset anyone. | 17:26 |
*** shrasool has joined #openstack-infra | 17:26 | |
fungi | pre-rendering them all *would* be stupid, in my opinion | 17:27 |
corvus | fungi: well, that's what the logs-in-swift approach does :) | 17:27 |
fungi | yep, only because we end up with no other choice | 17:27 |
clarkb | Script timed out before returning headers: wsgi.py | 17:27 |
fungi | rather, with no more efficient choice | 17:28 |
ssbarnea | i would go so far to say that processing logs should be the task of the *uploader*. | 17:28 |
mordred | ssbarnea: yup. that's what logs-in-swift does | 17:28 |
clarkb | that log comes from logs.openstack.org error file | 17:28 |
*** jpich has quit IRC | 17:28 | |
fungi | ssbarnea: yeah, that's basically what the logs in swift solution is switching to | 17:28 |
clarkb | so either the disk isn't returning the data or os-loganalyze is getting really confused | 17:28 |
clarkb | now to figure out what files were requested at that time :/ | 17:30 |
ssbarnea | it seems that is time for me to play a little bit with os-loganalyze, unless there are plans to replace it with something else. | 17:31 |
fungi | if it ends up being noisy neighbor bandwidth consumption on the cinder network, we should expect to find no appreciable correlation for the requests being processed | 17:31 |
ssbarnea | also another question that may sound weird: why we are not doing the processing client-side (javascript)? it would mess the browsers? | 17:31 |
clarkb | ok I see hogepodge's requests (they stand out because its ara and loci) | 17:31 |
*** bobh has quit IRC | 17:32 | |
hogepodge | clarkb: ? Are we doing something that needs to be addressed? | 17:32 |
fungi | hogepodge: you merely provided a great example to research | 17:32 |
clarkb | hogepodge: I don't think so. Mostly just I can correlate the errors you saw to specific logs entries | 17:32 |
hogepodge | clarkb: with loading logs? | 17:33 |
hogepodge | clarkb: or with the job timeouts? | 17:33 |
*** mriedem has quit IRC | 17:33 | |
corvus | ssbarnea: if you write a javascript log viewer, i can incorporate it into zuul and the swift work i'm doing. see http://lists.zuul-ci.org/pipermail/zuul-discuss/2018-July/000501.html for what that would look like | 17:33 |
mordred | ++ | 17:33 |
clarkb | hogepodge: loading logs and that being slow | 17:34 |
mordred | ssbarnea: yah - a js log viewer has been on my tdl - just haven't gotten to it. I wouldn't spend much effort on improving the os-loganalyze wsgi app though - it will hopefully be going away in the not too distant future | 17:34 |
ssbarnea | corvus: i am already doing something like this in https://github.com/openstack/coats -- so it may not be such a big deal. | 17:35 |
*** rh-jelabarre has quit IRC | 17:35 | |
ssbarnea | nowadays clients are more powerful than the servers, also a less powerful client may just skip doing any processing and load the plain text. | 17:36 |
clarkb | fwiw both os-loganalyze and ara wsgi timeout | 17:36 |
clarkb | unless we've got a bug in both my hunch is that it is disk related | 17:37 |
*** ginopc has quit IRC | 17:38 | |
clarkb | reading os loganalyze it returns headers with a start_response basically as soon as it has created the python fileinput object | 17:40 |
*** shrasool has quit IRC | 17:41 | |
clarkb | corvus: jobs don't have a log upload size limit yet right? or do they? | 17:41 |
*** rkukura has quit IRC | 17:42 | |
corvus | clarkb: only the executor workspace limit | 17:42 |
corvus | i have a request hanging for a plain old svg file. | 17:43 |
*** derekh has quit IRC | 17:44 | |
*** shrasool has joined #openstack-infra | 17:44 | |
*** shrasool has quit IRC | 17:44 | |
corvus | i don't see my browser connection in netstat | 17:44 |
clarkb | corvus: now we don't have available connections on apache | 17:45 |
clarkb | maybe it is/was that all along | 17:45 |
fungi | which, again, is likely a symptom of taking too long to return content | 17:45 |
fungi | connections pile up waiting | 17:46 |
clarkb | fungi: ya though if we have ~80 logstash log processors we can chew up a good number of apache workers | 17:46 |
clarkb | apache status shows what seems to mostly be log processing | 17:46 |
*** noama has quit IRC | 17:46 | |
corvus | i think the "_" connections are still available for use | 17:46 |
fungi | we could pause them for a bit and see what happens | 17:46 |
*** jpena is now known as jpena|off | 17:46 | |
clarkb | and then another good chunk is red hat doing something | 17:46 |
clarkb | corvus: ah | 17:47 |
clarkb | lots of red hat requests to ci.openstack.org | 17:47 |
clarkb | (which is fine, still fewer of those than log processors) | 17:48 |
clarkb | (I didn't realize that alias was in place too, neat) | 17:49 |
corvus | yeah i still use it. so easy to type :) | 17:49 |
clarkb | given that we have cpu and memory available to expand should we consider allowing for more connections to see if that helps? It could be that the requests using up the slots aren't any slower than before (see bw hasn't changed much) but we have more of them? | 17:51 |
clarkb | that doesn't explain the wsgi errors that correlate to when hogepodge was having trouble though | 17:52 |
corvus | clarkb: i don't think we're running out of slots | 17:53 |
clarkb | corvus: wouldn't your connection have gone through if we had slots? Or do you suspect possibly some other error in front of apache? | 17:53 |
corvus | clarkb: since i've seen latency in http requests to static and cacti as well as shell latency on logs, i'm suspecting that there's a network issue between me and rax. but i can't pinpoint it, or say whether it's closer to me or rax. | 17:55 |
corvus | s/logs/static/ | 17:55 |
*** yamamoto has joined #openstack-infra | 17:56 | |
clarkb | corvus: do you know why dm-2 (logs) isn't in cacti but dm-0 and dm-1 (static and tarballs) are? | 18:00 |
clarkb | I'm thinking adding dm-2 graphs to cacti is what we may need to further debug this | 18:00 |
corvus | clarkb: no; i think those may have had to be added manually | 18:00 |
openstackgerrit | Thierry Carrez proposed openstack-infra/irc-meetings master: add day_specifier from recurrence https://review.openstack.org/615270 | 18:00 |
corvus | clarkb: ++ | 18:00 |
openstackgerrit | Thierry Carrez proposed openstack-infra/irc-meetings master: Add ralonsoh to chairing neutron-qos meeting https://review.openstack.org/616208 | 18:00 |
openstackgerrit | Thierry Carrez proposed openstack-infra/irc-meetings master: Creating an IRC meeting for CloudKitty https://review.openstack.org/616205 | 18:00 |
clarkb | corvus: by manual does that mean edit the database directly? | 18:01 |
*** jcoufal has quit IRC | 18:01 | |
corvus | clarkb: no, through the ui; i can do it real quick if you want | 18:02 |
clarkb | corvus: that would be great (I'm not quite sure where to start myself) | 18:02 |
clarkb | maybe you can show me in berlin | 18:02 |
corvus | oh, hrm. there are 81 http connections from my ip in CLOSE_WAIT | 18:02 |
clarkb | corvus: we have both iops and bytes for dm-0 and dm-1 | 18:02 |
fungi | that sounds like packet loss | 18:03 |
openstackgerrit | Merged openstack-infra/irc-meetings master: Use python3 to run pep8 job https://review.openstack.org/616256 | 18:03 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Add resource metadata to nodes https://review.openstack.org/616262 | 18:04 |
corvus | there are 162 connections in closewait from 3 ips. one is mine, 1 is redhat in boston, one is comcast in washington | 18:04 |
openstackgerrit | Merged openstack-infra/zuul master: Fix reporting ansible errors in buildlog https://review.openstack.org/616206 | 18:04 |
openstackgerrit | Merged openstack-infra/zuul master: doc: fix typo in secret example https://review.openstack.org/616095 | 18:04 |
clarkb | the red hat one is NAT so could push a lot of connections, but if you alone account for half of that then maybe that isn't the case for the red hat ip | 18:05 |
clarkb | I've got to go find breakfast now but ya my suggestion is add dm-2 to cacti and go from there if the network trouble from corvus/redhat/comcast isn't the issue | 18:08 |
corvus | clarkb: dm2 is added now | 18:09 |
*** yamamoto has quit IRC | 18:09 | |
clarkb | I see it (but no graph yet) | 18:09 |
clarkb | thanks | 18:09 |
corvus | yeah, it will only have data starting now (we should see a graph in 5m, and a graph with data in 10m) | 18:10 |
*** panda is now known as panda|off | 18:10 | |
fungi | fwiw, i'm not seeing any (icmp) packet loss to static.o.o from home over either v4 or v6 routes | 18:10 |
corvus | me neither (on v4 only). nor the other direction from static to my home router. | 18:11 |
corvus | i will delete a bunch of old hosts from cacti while i'm here | 18:11 |
*** agopi|lunch is now known as agopi | 18:15 | |
*** trown|lunch is now known as trown | 18:20 | |
*** e0ne has joined #openstack-infra | 18:22 | |
*** e0ne has quit IRC | 18:25 | |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Remove infracloud from cacti https://review.openstack.org/616265 | 18:26 |
*** rkukura has joined #openstack-infra | 18:26 | |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/os-loganalyze master: Adds libmagic on plaforms missing it https://review.openstack.org/616266 | 18:28 |
*** Swami has joined #openstack-infra | 18:30 | |
*** ifat_afek has quit IRC | 18:30 | |
*** ralonsoh has quit IRC | 18:30 | |
ssbarnea | corvus: any reason why pypy is included on os-loganalyze tox targets? | 18:31 |
ssbarnea | is kinda broken on my system, and it was a surprise to see it, especially as py3 is missing. | 18:32 |
fungi | likely inherited from a very old cookiecutter template | 18:32 |
mordred | ssbarnea: probably just historical | 18:32 |
mordred | yeah | 18:32 |
corvus | yep, historical accident of when active work on osla more or less stopped | 18:32 |
corvus | (at that time, pypy was interesting and py3 was not [shrug]) | 18:32 |
fungi | we could install python3.1 from a ppa back then, i think, or maybe compile 3.2 from source | 18:33 |
fungi | py3k porting didn't really gain traction in our community until 3.3 was readily available | 18:33 |
hogepodge | so clarkb and fungi, once the log situation is sorted out, any chance we can take a peek at what's going on with jobs like this? https://review.openstack.org/#/c/616044/2 | 18:34 |
fungi | mainly because the options for writing a nontrivial application which could run unaltered on 2.6, 2.7 and 3.2 was... basically not happening | 18:34 |
hogepodge | About 1 in every 10 loci jobs is timing out, and I can't find where in the logs things are going wrong. | 18:34 |
*** haleyb has joined #openstack-infra | 18:34 | |
clarkb | corvus: on https://etherpad.openstack.org/p/WUNOTv8MuP some of the faq questions I think you had thoughts on. Care to add them? | 18:35 |
hogepodge | If there's a place I can add logging to get that information (or where I should be looking), I'm all ears. From what I can tell, the build jobs for ubuntu/centos/leap are being done in parallel and it's not returning. | 18:35 |
*** e0ne has joined #openstack-infra | 18:36 | |
clarkb | ok its happening again with logs and dm-2 shows device reads dropping | 18:37 |
fungi | yeah, i just tried to load the ara-report for the failing job hogepodge mentioned | 18:37 |
*** e0ne has quit IRC | 18:38 | |
hogepodge | I'm sure this has nothing to do with jobs named after a trickster god | 18:38 |
*** mriedem has joined #openstack-infra | 18:38 | |
corvus | clarkb: wow that is a write-only filesystem if i've ever seen one :) | 18:38 |
clarkb | corvus: indeed | 18:39 |
ssbarnea | mwhahaha: please read my comment on https://review.openstack.org/#/c/616203/ | 18:39 |
ssbarnea | mordred: i was expecting historical reasons. so nobody will be against me removing it. later, if needed, i will add a py3 one which seems more practical. | 18:40 |
*** jtomasek has quit IRC | 18:40 | |
mordred | clarkb: I just added the word "a" on line 22 | 18:41 |
mordred | ssbarnea: ++ | 18:41 |
ssbarnea | i guess i should also move zuul job into the repository. if we ask others to do the same, it would be a good idea to do it for infra repos too :D | 18:42 |
*** boden has joined #openstack-infra | 18:42 | |
corvus | ssbarnea: it's fine to do, but it's low priority for us -- we aren't blockers to our own config changes. | 18:47 |
ssbarnea | corvus: true, i am doing it just because I am starting to touch the project. seems like the kind of change you do the first time you need to reconfigure the jobs. | 18:48 |
corvus | yep | 18:49 |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/os-loganalyze master: Move zuul job definition inside the respository https://review.openstack.org/616273 | 18:50 |
*** diablo_rojo has joined #openstack-infra | 18:51 | |
*** rh-jelabarre has joined #openstack-infra | 18:52 | |
*** kjackal has quit IRC | 18:52 | |
*** rh-jelabarre has quit IRC | 18:52 | |
*** rh-jelabarre has joined #openstack-infra | 18:52 | |
*** eharney has quit IRC | 18:53 | |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/project-config master: Move os-loganalize job definition inside the project https://review.openstack.org/616275 | 18:56 |
*** gtmanfred has quit IRC | 18:57 | |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/os-loganalyze master: Move zuul job definition inside the respository https://review.openstack.org/616273 | 18:57 |
*** roman_g has quit IRC | 18:57 | |
*** gtmanfred has joined #openstack-infra | 18:57 | |
*** gfidente is now known as gfidente|afk | 18:58 | |
ssbarnea | are circular dependencies allowed by zuul? | 18:59 |
mordred | ssbarnea: no, they are not | 18:59 |
ssbarnea | mordred: so which CR must go first? the one adding the new jobs, or the one removing it from project-config? | 19:01 |
clarkb | hogepodge: reading the ara report for that I'm confused why the async stuff is used. Couldn't you just wait for the build to finish? I ask because I think we've found that ansible async is really buggy | 19:02 |
clarkb | mordred: ^ maybe you know the status on the async work but iirc you had pushed a patch or two to try and clean that up | 19:02 |
mordred | clarkb, hogepodge it is _definitely_ buggy. when last we sent in patches, we were basically the only ones touching it - and then we stopped | 19:02 |
mordred | ssbarnea: it's safer to add them first, then remove them - if you do it the other way the repo will be ungated for a little bit | 19:02 |
mordred | ssbarnea: double-defining the jobs isn't an issue, they'll still only be run once | 19:02 |
Shrews | and ansible declared us "experts in async" LOL | 19:02 |
*** kmalloc is now known as needscoffee | 19:05 | |
*** bobh has joined #openstack-infra | 19:06 | |
mordred | Shrews: to be fair - I believe at that moment in time we were :) | 19:06 |
*** bobh has quit IRC | 19:07 | |
*** priteau has quit IRC | 19:10 | |
*** priteau has joined #openstack-infra | 19:12 | |
fungi | a frightening thought | 19:13 |
clarkb | fungi: I recall you had an excellent answer for conference opendev association. Care to add that to https://etherpad.openstack.org/p/WUNOTv8MuP faq? | 19:15 |
clarkb | looking at dm-2 cacti info for the last slow spell I'm not seeing anything that looks suspicious | 19:16 |
clarkb | the next step may be to try stracing a wsgi process when it happens if we can catch it? | 19:16 |
fungi | clarkb: my pleasure | 19:17 |
*** rh-jelabarre has quit IRC | 19:18 | |
clarkb | tyty | 19:21 |
*** jcoufal has joined #openstack-infra | 19:22 | |
*** pfallenop has joined #openstack-infra | 19:23 | |
*** e0ne has joined #openstack-infra | 19:23 | |
fungi | clarkb: that lgty? | 19:24 |
clarkb | fungi: I'll look in a sec, but logs is doing it again so I'm stracing stuff | 19:25 |
clarkb | the pid reported by apache status for my connection is blocking on a read to a pipe | 19:26 |
*** eharney has joined #openstack-infra | 19:26 | |
clarkb | how do you find out what is on the other side of the pipe? | 19:26 |
clarkb | procfs? | 19:26 |
AJaeger | clarkb: yeah, procfs | 19:27 |
AJaeger | /proc/<pid>/fd | 19:27 |
clarkb | lr-x------ 1 root root 64 Nov 7 19:25 7 -> pipe:[1025660877] | 19:27 |
AJaeger | mmh, now I'm lost ;( | 19:28 |
clarkb | lsof seems to think that maybe that is a pipe between apache processes | 19:29 |
*** rh-jelabarre has joined #openstack-infra | 19:29 | |
*** e0ne has quit IRC | 19:31 | |
corvus | clarkb: the faq answers lgtm; i left some comments on the prose in chat | 19:32 |
*** dpawlik has quit IRC | 19:34 | |
*** dpawlik_ has joined #openstack-infra | 19:34 | |
clarkb | thanks | 19:35 |
dmsimard | clarkb: gave it a read, +1 | 19:39 |
clarkb | apparently the wsgi script timed out errors can be related to mod wsgi python interpreters and the GIL? | 19:39 |
clarkb | ok heres a theory | 19:39 |
clarkb | ssbarnea's change to osla while not in python would've caused us to reinstall osla and its been a while since that last happened | 19:40 |
clarkb | maybe we pulled in some new dep causing that sort of python problem with mod_wsgi? | 19:40 |
*** betherly has joined #openstack-infra | 19:42 | |
clarkb | WSGIApplicationGroup %{GLOBAL} could be the workaround for this (but forces all wsgi processes to run in the context of the main python interpreter?) | 19:43 |
fungi | there should be a pip log which mentions what got upgraded/installed on the server | 19:44 |
clarkb | there is no ~root/.pip on logs.o.o | 19:46 |
*** electrofelix has quit IRC | 19:47 | |
*** betherly has quit IRC | 19:47 | |
clarkb | ok nevermind we already set the WSGIApplicationGroup to GLOBAL | 19:47 |
clarkb | could it be related to having two wsgi apps in the same vhost? (ara and osla) | 19:47 |
fungi | but why didn't we notice this until *very* recently if so? | 19:48 |
fungi | we've been doing ara-on-wsgi for a few months at this point, right? | 19:48 |
clarkb | fungi: ya but we haven't updated osla in longer than that I think | 19:49 |
dmsimard | ara senses tingling | 19:49 |
dmsimard | I had to put in a bit of effort for ara and osla to not conflict but it was a matter of rewrite rules, not deps | 19:50 |
dmsimard | i.e, ara was hijacking osla routes (or vice versa) | 19:50 |
clarkb | dmsimard: in this case we are seeing both of them time out in apache, which apparently can indicate conflicts between the wsgi processes when sharing the python interpreter? | 19:51 |
dmsimard | I'll poke at the logs a bit | 19:53 |
*** mriedem is now known as mriedem_afk | 19:53 | |
dmsimard | no recent occurrences of those but that's still odd http://paste.openstack.org/raw/734369/ | 19:54 |
dmsimard | oh wait, I was looking at .1 | 19:55 |
clarkb | the next time someone notices slowness can you see if logs-dev.openstack.org has the same behavior for that file? | 20:03 |
clarkb | they share the global application wsgi group | 20:03 |
clarkb | so I think they should both be slow if the problem is associated with the python interpreter | 20:03 |
*** dpawlik_ has quit IRC | 20:06 | |
clarkb | corvus: want to double check my edits based on your feedback? | 20:07 |
*** dpawlik has joined #openstack-infra | 20:07 | |
*** dpawlik has quit IRC | 20:09 | |
*** dpawlik has joined #openstack-infra | 20:09 | |
dmsimard | clarkb: so there's a pattern emerging from the error logs: http://paste.openstack.org/show/734370/ | 20:11 |
dmsimard | 38.145.34.10 is a tripleo tool, I'm in touch with them to understand what it does and determine if it's misbehaving | 20:12 |
clarkb | dmsimard: what does this tool do? | 20:13 |
dmsimard | not entirely sure yet haha | 20:13 |
openstackgerrit | Merged openstack-infra/storyboard master: Fix up a few requirements https://review.openstack.org/611194 | 20:15 |
clarkb | corvus: fungi dmsimard the other thing I notice digging into wsgi config is we only have 8 single thread wsgi processes? which means that apache may be able to take more connections but have busy wsgi processes behind them? | 20:17 |
clarkb | its possible this is a dos after all and we are just not able to keep up with demand for those processes? | 20:17 |
dmsimard | clarkb: yes, the thread/process config caught my eye as well | 20:17 |
clarkb | this would explain why apache sees my connection that is hanging as connected and otherwise happy, because its waiting for wsgi processes to open up? | 20:18 |
*** efried_rollin is now known as efried | 20:18 | |
dmsimard | clarkb: the logstash workers are (understandably) hammering logs.o.o on a regular basis -- they probably don't need their logs to go through osla ? | 20:18 |
clarkb | dmsimard: they use osla to filter out debug logs | 20:19 |
clarkb | doesn't necessarily have to happen there | 20:19 |
*** needscoffee is now known as kmalloc | 20:19 | |
clarkb | (but that is why it is done this way) | 20:19 |
dmsimard | I had put a reasonable default in the template, perhaps we need to tune it http://git.openstack.org/cgit/openstack-infra/puppet-openstackci/tree/templates/logs.vhost.erb#n16 | 20:22 |
clarkb | dmsimard: I think if we notice it happening again and logs-dev works fine then that would be a good next step | 20:23 |
clarkb | if logs-dev shows the same problem then it likely isn't process/thread contention (those two have different process groups) | 20:23 |
*** kjackal has joined #openstack-infra | 20:23 | |
dmsimard | ++ | 20:24 |
dmsimard | the server doesn't seem particularly busy on the cpu side of things http://cacti.openstack.org/cacti/graph.php?action=zoom&local_graph_id=138&rra_id=0&view_type=tree&graph_start=1541103780&graph_end=1541622180 | 20:24 |
dmsimard | so it wouldn't shock me to increase the amount of threads | 20:24 |
*** eharney has quit IRC | 20:25 | |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on future parser for one nodepool launcher https://review.openstack.org/616288 | 20:29 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on future parser for all nodepool launchers https://review.openstack.org/616289 | 20:29 |
clarkb | ok it just did it to me. logs-dev was fine | 20:29 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on future parser for one nodepool builder https://review.openstack.org/616290 | 20:29 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on future parser for all nodepool builders https://review.openstack.org/616291 | 20:29 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on future parser for one zuul executor https://review.openstack.org/616292 | 20:29 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on future parser for all zuul executors https://review.openstack.org/616293 | 20:29 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for one zuul merger https://review.openstack.org/616294 | 20:29 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for all zuul mergers https://review.openstack.org/616295 | 20:29 |
openstackgerrit | Colleen Murphy proposed openstack-infra/system-config master: Turn on the future parser for zuul.openstack.org https://review.openstack.org/616296 | 20:29 |
clarkb | http://logs-dev.openstack.org/44/616044/2/check/loci-heat/6909fc4/job-output.txt.gz vs http://logs.openstack.org/44/616044/2/check/loci-heat/6909fc4/job-output.txt.gz | 20:29 |
clarkb | infra-root given ^ thoughts on bumping the process/thread count for logs.o.o? | 20:30 |
clarkb | er specifically for wsgi in logs.o.o | 20:30 |
fungi | it sounds like a plausible theory. what's the default and where is it tuned? | 20:31 |
dmsimard | I'll have a patch up in a minute | 20:32 |
clarkb | fungi: http://git.openstack.org/cgit/openstack-infra/puppet-openstackci/tree/templates/logs.vhost.erb#n16 is where we set it, those vars are at http://git.openstack.org/cgit/openstack-infra/puppet-openstackci/tree/manifests/logserver.pp#n34 | 20:32 |
fungi | those do look like small numbers | 20:33 |
dmsimard | hmm, we can either update the default there | 20:33 |
clarkb | infra-root unrelated to the logs my plan is to send out the opendev email to the various -dev and -discuss tomorrow morning local time | 20:33 |
dmsimard | or bump it from openstack_project::static | 20:33 |
clarkb | dmsimard: I think we should bump it in openstack_project | 20:33 |
clarkb | dmsimard: then consider changing the default if this fixes the issue | 20:33 |
fungi | clarkb: i fully endorse your communication plan. thanks!!! | 20:34 |
mordred | clarkb: ++ | 20:34 |
clarkb | I think it looks quite good as far as an email draft goes. Thank you all for the help | 20:34 |
clarkb | following up on ianw's planet investigation from last night it appears to be up and running now. Did any of us fix it? | 20:35 |
fungi | clarkb: before it goes out, can we change "best practices" to something like "recommended practices" or "standard practices" or even just "practices"? | 20:35 |
fungi | "best practices" makes my skin crawl | 20:35 |
clarkb | fungi: ++ feel free to update to the least skin crawly version | 20:35 |
dmsimard | clarkb: do we want to "dry" test at 16 processes before patching it in ? | 20:35 |
clarkb | dmsimard: we can. Let me put static.o.o in the emergency file | 20:36 |
clarkb | dmsimard: thats done. Do you want me to edit to 16 processes or were you planning to do it? | 20:37 |
openstackgerrit | David Moreau Simard proposed openstack-infra/system-config master: Bump amount of mod_wsgi processes for static vhosts to 16 https://review.openstack.org/616297 | 20:37 |
dmsimard | I can | 20:37 |
clarkb | (puppet can still override it depending on where it is in the run... but meh we can set it back again if that happens) | 20:37 |
clarkb | (it should only overwrite the one possible time though, after that its stopped) | 20:37 |
clarkb | I see new processes | 20:38 |
dmsimard | it's restarting now | 20:38 |
dmsimard | #status log logs.o.o was put in the emergency file to test if bumping to 16 wsgi processes addresses timeout issues pending https://review.openstack.org/616297 | 20:39 |
openstackstatus | dmsimard: finished logging | 20:39 |
clarkb | there are 24 processes now instead of 16 (16 + 8 vs 8+8) that lgtm | 20:39 |
*** rfolco|ruck is now known as rfolco|off | 20:40 | |
clarkb | now I guess we ask people to be on the lookout for subsequent slow behavior | 20:42 |
clarkb | hogepodge: ^ fyi | 20:42 |
clarkb | ssbarnea: your line break change seems to be working too fwiw | 20:42 |
*** dpawlik has quit IRC | 20:43 | |
*** dpawlik has joined #openstack-infra | 20:43 | |
clarkb | dmsimard: bah puppet updated the vhost file ~2 minutes after you restarted | 20:44 |
clarkb | dmsimard: can you update it again and restart again? (puppet shouldn't run anymore after that last one) | 20:44 |
clarkb | the race there is in generating the ansible groups and running puppet. If that happens before we add a host to the emergency file then puppet runs one more time on that host | 20:44 |
clarkb | *if generating the ansible groups happen before editing the file and then running puppet happens after | 20:45 |
ianw | clarkb: i didn't do anything to it | 20:46 |
clarkb | dmsimard: let me know if I should do the update instead (I don't want to step on toes) | 20:46 |
ianw | planet.o.o i mean | 20:46 |
clarkb | ianw: must've been cloud side fix then? I'm happy to take that | 20:46 |
ianw | yep :) the cloud giveth, the cloud taketh away | 20:47 |
*** guilhermesp has quit IRC | 20:47 | |
*** jungleboyj has quit IRC | 20:47 | |
*** rajinir has quit IRC | 20:47 | |
*** dpawlik has quit IRC | 20:47 | |
*** mrhillsman has quit IRC | 20:47 | |
*** liusheng__ has quit IRC | 20:47 | |
*** jbryce has quit IRC | 20:47 | |
*** hogepodge has quit IRC | 20:47 | |
*** kopecmartin|off has quit IRC | 20:47 | |
*** sparkycollier has quit IRC | 20:47 | |
*** fdegir has quit IRC | 20:47 | |
*** diablo_rojo_phon has quit IRC | 20:47 | |
*** neith has quit IRC | 20:47 | |
*** jbryce has joined #openstack-infra | 20:48 | |
*** mnaser has quit IRC | 20:48 | |
*** lamt has quit IRC | 20:48 | |
*** mwhahaha has quit IRC | 20:48 | |
*** seongsoocho has quit IRC | 20:48 | |
*** sparkycollier has joined #openstack-infra | 20:48 | |
*** guilhermesp has joined #openstack-infra | 20:48 | |
*** hogepodge has joined #openstack-infra | 20:48 | |
*** fdegir has joined #openstack-infra | 20:48 | |
*** eharney has joined #openstack-infra | 20:48 | |
*** lucasagomes has quit IRC | 20:48 | |
*** seongsoocho has joined #openstack-infra | 20:49 | |
clarkb | hrm its been a few minutes, dmsimard must've stepped away. I will make the process change again | 20:49 |
*** mwhahaha has joined #openstack-infra | 20:49 | |
*** mnaser has joined #openstack-infra | 20:49 | |
*** kopecmartin has joined #openstack-infra | 20:49 | |
hogepodge | clarkb: I didn't write the original gate jobs, and the person who did has disappeared. I expect that they're being run in parallel to save on time, but I was thinking the same thing, send up a patch to make them serial | 20:49 |
clarkb | hogepodge: reading it, it appeared it was still running one image at a time? | 20:50 |
clarkb | but I am bad at understanding ansible looping so could be me misunderstanding | 20:50 |
*** kgiusti has left #openstack-infra | 20:50 | |
hogepodge | clarkb: what do you mean? | 20:50 |
*** Qiming has quit IRC | 20:50 | |
*** rajinir has joined #openstack-infra | 20:50 | |
hogepodge | me too | 20:50 |
*** andreaf has quit IRC | 20:51 | |
clarkb | hogepodge: well I had missed that it has a distros with_items | 20:51 |
fungi | okay, taking a break to go get very late lunch, but i'll be back | 20:51 |
clarkb | hogepodge: so ya its trying to build all the distros in parallel. /me wishes we would get away from the idea that our container images should support all the base distros | 20:51 |
hogepodge | clarkb: so it's doing docker builds with different base images (which triggers different code paths based on introspection) | 20:52 |
*** andreaf has joined #openstack-infra | 20:52 | |
clarkb | hogepodge: one thing you might be able to do if the time cost for doing it serially is high is move the asynchronous polling out of ansible | 20:52 |
clarkb | have a shell script do that or similar | 20:52 |
hogepodge | clarkb: it's the only way to get buy in from vendors. redhat and suse and canonical are all involved in some way, and they want to use their own base container images | 20:52 |
fungi | clarkb: i've done one last pass over the announcement and made some minor edits for improved accuracy | 20:53 |
hogepodge | clarkb: let me send a patch up that does it serial and we can do measurements | 20:53 |
clarkb | hogepodge: ya but it also defeats the purpose of containers imo. The whole point is you shouldn't care what the surrounding env of the application is as long as it is functional | 20:53 |
clarkb | fungi: thanks! | 20:53 |
*** Qiming has joined #openstack-infra | 20:54 | |
dmsimard | clarkb: I had indeed stepped momentarily away | 20:54 |
* fungi disappears for a little while | 20:54 | |
clarkb | dmsimard: no worries. I think its all set now | 20:54 |
hogepodge | If you buy into the idea that containers don't contain, but are convenient isolation barriers that you can mix and match, that becomes a less compelling argument | 20:54 |
clarkb | hogepodge: in particular the tiny images for go things that are basically libc + compiled go file are really neat | 20:54 |
clarkb | iirc there are things for python like that too | 20:54 |
hogepodge | openstack needs to touch a lot of system stuff, which complicates matters | 20:54 |
clarkb | that is true. | 20:55 |
hogepodge | but are limited in scope to python. venv doesn't solve your mysql problems | 20:55 |
clarkb | hogepodge: no, but also mysql doesn't require to be run on any one of those many distros | 20:55 |
clarkb | and https://hub.docker.com/r/mysql/mysql-server/ has made a choice for you | 20:56 |
clarkb | and you aren't supposed to care if it is alpine or ubuntu or centos or something else | 20:56 |
hogepodge | True. I always use the mariadb provided mysql container image, and I don't know what it's built on, which is fine by me | 20:56 |
mordred | hogepodge: exactly :) | 20:56 |
clarkb | oraclelinux | 20:57 |
hogepodge | but I may care about what neutron or nova-compute is built on, because it wants to control my lower level system | 20:57 |
clarkb | not surprising, also I don't care | 20:57 |
*** kota_ has joined #openstack-infra | 20:57 | |
hogepodge | for most services it doesn't matter, totally agree. | 20:57 |
clarkb | mariadb is based on ubuntu | 20:58 |
persia | Interestingly, there are more ways to do "containers" than folk who like to use the word, so there's no answer to this. In practice, it makes sense to build whatever folk want to use: an integration test that runs on some mix of things that can be identified but doesn't match someone's plan doesn't let them reuse the result. | 20:58 |
hogepodge | right or wrong, we live in a multi-vendor world and you're going to have a hard time getting VendorA to use packages built on VendorB when each is selling their own OpenStack solution (there was an instance where we tried, and once someone figured out what was going on they were eUpset) | 20:59 |
mordred | I agree with both hogepodge and persia that this is the state of the world | 20:59 |
clarkb | hogepodge: ya that is why I think the actual answer there is a non vendor base. Like the libc + binary go stuff | 20:59 |
persia | hogepodge: There are semantic ways around that, but they are hard. Happy to discuss in depth sometime, but I suspect it will take a very long time to make progress. | 21:00 |
hogepodge | fwiw someone just sent up a patch to loci to add another base image and I was strongly "nope, we have configuration to let you build your own images from your own base... we don't want to be responsible for maintaining your special needs" | 21:00 |
mordred | however, I agree with clarkb that the specifics of that state of the world are sad, as they remove a major benefit | 21:00 |
mordred | there is a reason we only support one base image in pbrx :) | 21:01 |
persia | flatpak.org has an interesting answer to that. They run on any distro, but ship the freedesktop SDK as a base, and build all the flatpaks against that. | 21:01 |
clarkb | persia: yup I am actually hopeful that flatpak and/or snaps (and there is a third option that does the same thing iirc) will push things more towards the nix model for packaging with containers in mind | 21:02 |
mordred | persia: I haven't been able to get myself as excited about flatpak as I'd like to be - especially given it aligns very well with things we discussed at UDS 10 years ago :) | 21:02 |
clarkb | because one of the things we've (openstack, infra, kolla, tripleo) struggled with here is that with the multi distro vendor container stuff we end up with massive containers for tiny packages | 21:02 |
mordred | although I will admit I have used a snap for something | 21:02 |
clarkb | like tripleo rsyslog container is >600MB | 21:02 |
clarkb | I'm sure the package on centos is like 2MB | 21:03 |
persia | I have only limited excitement about flatpak, due to all the issues with distributing blobs, but I am really excited about freedesktop SDK. I hope we can reach a point where we can use that as a reference point, and distros can value-add to that ABI, rather than the current confusion about "versions" of things. | 21:03 |
clarkb | I just want packages to be 2MB again and not 700MB :) | 21:04 |
clarkb | so that you can realistically have 100 of them in your system and not want to cry doing CI | 21:04 |
clarkb | (then multiply that by number of supported distros) | 21:04 |
mordred | clarkb: or run out of filesystem space when trying to use docker to install a web app | 21:04 |
persia | clarkb: If you don't want to cry during CI, you might consider some of the non-package models (nix is a good example) :) | 21:04 |
clarkb | ya nix has been high on my list of things I should look at but then when I have a free moment I get distracted by brisket and fishing and my kids | 21:05 |
persia | A few years ago I was playing with some systems that let you define your system in yaml, talked to a build farm, and let you "checkout" specific configurations by git sha of the yaml config which applied binary diffs to the filesystem, and it made everything super-fast. Unfortunately, it didn't work in lots of other ways, and that project isn't active anymore. We're getting closer to sane with these sorts of things. | 21:06 |
*** ifat_afek has joined #openstack-infra | 21:06 | |
clarkb | in a former life I did linux sysadmin for a university and nix would be perfect there because then users can install packages and get the specific version of a thing they want | 21:06 |
clarkb | which seemed like half the work we did as admins. Oh you need valgrind version x.y.z_1 because _2 has a bug? ugh | 21:06 |
persia | (side note: Apache doesn't like it very much when you upgrade the live-running server by slowly switching network namespaces on each new connection to the new backend version in the new filesystem namespace) | 21:07 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: WIP: Report tenant and project specific resource usage stats https://review.openstack.org/616306 | 21:07 |
clarkb | This was also my first run in with lisp. Debugging why certain versions of valgrind would crash under lush2 | 21:08 |
persia | In a former life I spent most of my time trying to untangle the mess that was created after that, and make sure that everyone could install everything they wanted on a single set of library versions. I like nix in lots of ways, but the security implications still scare me. | 21:08 |
clarkb | persia: don't user package installs theoretically isolate the user to what they could already do? they can run a webserver on high ports but no setuid etc | 21:09 |
clarkb | *nix user packages | 21:09 |
persia | clarkb: Right, which means they can run a webserver with a remote exploit, and maybe something else with a privilege escalation exploit, and ... | 21:09 |
clarkb | persia: ya but they could do that anyway (without nix helping) | 21:10 |
*** rh-jelabarre has quit IRC | 21:10 | |
persia | True, but in a package world, when the admins update the mirrors to have patched versions, any systems using those mirrors that are configured to upgrade, upgrade. Users have to do more work to cause issues. | 21:10 |
clarkb | this is true the barrier to entry is a bit lower | 21:11 |
persia | Note that it's possible to cause nix to deal with it: it's just harder than with package-based systems. | 21:11 |
*** gfidente|afk has quit IRC | 21:11 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Add resource metadata to nodes https://review.openstack.org/616262 | 21:13 |
*** dpawlik has joined #openstack-infra | 21:14 | |
openstackgerrit | Merged openstack-infra/system-config master: Remove infracloud from cacti https://review.openstack.org/616265 | 21:14 |
clarkb | ianw: should we remove planet01 from the emergency file? | 21:14 |
dmsimard | clarkb: did you have a reliable way of reproducing the logs.o.o issue ? | 21:17 |
clarkb | dmsimard: no, but it was reproducing itself about once an hour or so | 21:17 |
clarkb | dmsimard: http://grafana.openstack.org/d/T6vSHcSik/zuul-status?panelId=16&fullscreen&orgId=1&from=now-6h&to=now the large spikes there seem to correlate with it | 21:18 |
dmsimard | oh, interesting | 21:19 |
clarkb | dmsimard: the logprocessor nodes were backing up like web clients | 21:20 |
clarkb | so their queues would grow | 21:20 |
ianw | clarkb: if it's back, yep | 21:21 |
*** dpawlik has quit IRC | 21:22 | |
ianw | clarkb: done | 21:22 |
*** dpawlik has joined #openstack-infra | 21:22 | |
clarkb | thanks | 21:23 |
ianw | corvus/clarkb: https://review.openstack.org/#/c/614894/ is for pinning ansible, and came out of prior discussions on corvus' pin. thoughts welcome :) there's a follow-on for openstacksdk too | 21:23 |
*** ifat_afek has quit IRC | 21:24 | |
corvus | ianw: my main concern with voting on that is github reliability -- however, we do have ansible in our zuul; we can add it to required-projects and install from there | 21:26 |
*** dpawlik has quit IRC | 21:27 | |
*** rh-jelabarre has joined #openstack-infra | 21:28 | |
ianw | corvus: hrrm, good point, the devel job should be able to use a local path just as easily, i'll look at that | 21:29 |
corvus | ianw: bonus: depends-on will work :) | 21:29 |
ianw | ++ | 21:30 |
ianw | i was a bit tunnel visioned on getting a URL working and being able to leave the version out, should have thought of pulling it via zuul | 21:31 |
*** jcoufal has quit IRC | 21:32 | |
openstackgerrit | James E. Blair proposed openstack-infra/project-config master: Set ansible's default branch to devel https://review.openstack.org/616314 | 21:33 |
corvus | ianw: we should land that first ^ :) | 21:33 |
*** dpawlik has joined #openstack-infra | 21:33 | |
corvus | mordred, Shrews: ^ | 21:34 |
dmsimard | At what point should we consider moving the base job from xenial to bionic ? | 21:35 |
dmsimard | I had to move certain jobs to bionic because they required a more recent version of py3 | 21:35 |
clarkb | dmsimard: the TC and QA team are still trying to figure that out I think. We've asked that infra not be the decision maker on that | 21:36 |
dmsimard | fair | 21:36 |
clarkb | (this is one reason that multiple tenants in zuul will be helpful) | 21:36 |
clarkb | fungi: fyi I created https://etherpad.openstack.org/p/BER-opendev-feedback-and-missing-features for your forum session | 21:37 |
*** ansmith has quit IRC | 21:37 | |
*** dpawlik has quit IRC | 21:38 | |
*** dpawlik has joined #openstack-infra | 21:38 | |
clarkb | added it to the etherpad too | 21:38 |
clarkb | er wiki page for etherpads | 21:38 |
*** dpawlik has quit IRC | 21:39 | |
*** dpawlik_ has joined #openstack-infra | 21:39 | |
clarkb | fungi: prepopulated it with the list of things from the schedule which is probably a good start | 21:40 |
clarkb | If anyone else has forum sessions they intend on using an etherpad with please update https://wiki.openstack.org/wiki/Forum/Berlin2018 :) | 21:41 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: run_cloud_launcher.sh : generate runtime stats https://review.openstack.org/616355 | 21:43 |
*** bobh has joined #openstack-infra | 21:48 | |
*** bobh has quit IRC | 21:51 | |
*** fuentess has joined #openstack-infra | 21:52 | |
clarkb | hrm I think there was another case of slowness at ~21:24-21:30 | 21:56 |
clarkb | and the wsgi change went in at 20:49 | 21:56 |
clarkb | LogLevel info next? | 21:56 |
*** jungleboyj has joined #openstack-infra | 21:57 | |
dmsimard | clarkb: it's recurring every hour, do we have something that could explain that ? There's some spikes in load elsewhere http://grafana.openstack.org/d/T6vSHcSik/zuul-status?panelId=25&fullscreen&orgId=1 | 21:57 |
clarkb | dmsimard: could be gate resets | 21:58 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP: Set relative priority of node requests https://review.openstack.org/615356 | 21:58 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Remove unneeded nodepool test skips https://review.openstack.org/616358 | 21:58 |
dmsimard | clarkb: the logstash workers are kinda spiky too | 21:58 |
dmsimard | http://cacti.openstack.org/cacti/graph.php?action=zoom&local_graph_id=1267&rra_id=0&view_type=tree&graph_start=1541541508&graph_end=1541627908 | 21:58 |
clarkb | dmsimard: ya ultimately it's all driven by how quickly zuul pushes stuff out | 21:59 |
Shrews | corvus: what does that fix (if anything)? | 22:03 |
clarkb | ianw: do the wsgi timeouts ring any bells for you? the logs.o.o server gets slow periodically, then it goes away (seems to coincide with when log processing has a backlog: http://grafana.openstack.org/d/T6vSHcSik/zuul-status?panelId=16&fullscreen&orgId=1&from=now-6h&to=now ) | 22:03 |
Shrews | corvus: oh, ok. helps to read scrollback | 22:03 |
clarkb | ianw: best I can tell logs-dev.o.o doesn't have that issue on the same log files, implying it's something to do with apache + mod_wsgi + our wsgi apps in that particular process group | 22:04 |
corvus | Shrews: ok; was about to answer it's for jobs like ianw's to cross-test our playbooks with ansible, but let me know if you have further questions. | 22:04 |
corvus | Shrews: (basically gets rid of a bunch of override-checkouts. i don't think it will affect openstacksdk, which *does* have a bunch of override checkouts -- i wouldn't remove them since there's complex branch logic that probably has to stay anyway) | 22:05 |
Shrews | corvus: yeah, i was seeing the "branches: devel" in the job and was confused at first | 22:05 |
*** trown is now known as trown|outtypewww | 22:06 | |
hogepodge | clarkb: it looks to me like async doesn't give us any performance advantage, so I may just pull it all out | 22:07 |
clarkb | hogepodge: ok, will be curious to know if it is more reliable | 22:07 |
ianw | clarkb: sorry no :/ | 22:07 |
clarkb | dmsimard: at full steam ahead logprocessors will process 80 files at once. Though the vast majority of the time involved there should be in the file munging not downloading (so I don't expect we'd see 80 concurrent requests often). Does make me think the issue is elsewhere and log processing just sees the symptom as fungi put it | 22:10 |
openstackgerrit | Merged openstack-infra/project-config master: Set ansible's default branch to devel https://review.openstack.org/616314 | 22:11 |
dmsimard | clarkb: need to take care of dinner, I can take another look later tonight | 22:13 |
clarkb | dmsimard: ya I'm beginning to think this may be a sleep on it problem | 22:13 |
clarkb | I've got a tail on the error log now. If I catch it erroring out in the near future I'll poke around and try to see if we had a gate reset or any other external event that may be clogging the pipeline | 22:13 |
corvus | clarkb: or we could switch to swift logs | 22:14 |
clarkb | but otherwise I need to clear head and get some other stuff done and look at it with fresh eyes | 22:14 |
clarkb | corvus: ya there is that too | 22:14 |
clarkb | corvus: the last remaining bits were mostly edge cases that tobiash was finding? | 22:14 |
corvus | oh, i'm not aware of any issues. i thought they're ready to go; i've mostly just been hoping we can get other stuff in place in zuul to make them nicer, before the next system crash forces us to switch. | 22:15 |
clarkb | gotcha | 22:15 |
*** kjackal has quit IRC | 22:20 | |
*** mriedem_afk is now known as mriedem | 22:22 | |
*** dpawlik_ has quit IRC | 22:23 | |
*** dpawlik has joined #openstack-infra | 22:24 | |
clarkb | I do worry that if we change it and all get on planes the next few days we might not be able to fix bigger issues if something goes wrong (right now jobs are functioning and uploading logs and you can usually get the log files) | 22:25 |
clarkb | maybe plan for it the week after summit (when it's a bit quieter for US thanksgiving) and we can watch it a bit closer? | 22:26 |
*** dpawlik has quit IRC | 22:28 | |
corvus | yeah. | 22:30 |
hogepodge | clarkb: well, failure rate is still the same, some jobs are about 50% slower, but now I can actually get a log of what went wrong | 22:31 |
hogepodge | for example http://logs.openstack.org/37/616337/2/check/loci-glance/89bbd6e/job-output.txt.gz#_2018-11-07_22_26_39_546696 | 22:31 |
*** rfolco|off has quit IRC | 22:34 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Pin bridge.o.o to ansible 2.7.0, add devel testing job https://review.openstack.org/614894 | 22:35 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: bridge.o.o: Use latest openstacksdk https://review.openstack.org/615982 | 22:35 |
*** boden has quit IRC | 22:36 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: bridge.o.o: Use latest openstacksdk https://review.openstack.org/615982 | 22:36 |
*** ssbarnea has quit IRC | 22:37 | |
clarkb | well other than that one at 21:30 ish we've not appeared to see any more slowdowns since. Maybe the 16 threads did help but not fix the problem (eg need more processes?) | 22:38 |
clarkb | need more data | 22:38 |
hogepodge | clarkb: is there a way to recheck just a single job, or do I need to do them all? | 22:40 |
*** dpawlik has joined #openstack-infra | 22:40 | |
clarkb | hogepodge: you have to do all of them (this is by design to prevent locking in of results for flaky jobs) | 22:40 |
ianw | hogepodge: i have no idea what you're doing :) but for intermediate testing i often edit down the zuul file to just the interesting jobs | 22:41 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP: Set relative priority of node requests https://review.openstack.org/615356 | 22:41 |
hogepodge | ianw: I'm trying to figure out what I'm doing. loci testing does a bunch of things. It has one job for every supported openstack project (and requirements), and within those jobs it does three builds. I think I want to do a major refactoring, but have to think about how to organize it. I'm just trying to understand why jobs have a 10% failure rate in infra | 22:43 |
hogepodge | For that particular project. It's too high and has turned into a blocker for us. | 22:43 |
lbragstad | has anyone seen things like this crop up recently wrt reno? http://logs.openstack.org/61/610661/4/check/test-release-openstack/17eca5b/job-output.txt.gz#_2018-11-07_14_04_51_887972 | 22:43 |
clarkb | lbragstad: I think that is the twine based linting of your python package | 22:44 |
clarkb | lbragstad: basically python tooling is saying your package is bad please fix it | 22:44 |
dhellmann | yeah, look at the links in your readme | 22:44 |
hogepodge | samyaple set up all those jobs and is mia, so there's a bit of learning going on. | 22:44 |
*** dpawlik has quit IRC | 22:44 | |
lbragstad | ok - so glance master would be broken, too? | 22:45 |
dhellmann | lbragstad : that job only runs if someone tries to change the packaging files (including the readme) | 22:45 |
lbragstad | aha - ok that makes sense | 22:45 |
dhellmann | you can run the check locally by installing twine and docutils, building an sdist, then running twine check on the output file | 22:46 |
dhellmann | in this case there is almost certainly a redundant link with the title "release notes" in the readme | 22:46 |
dhellmann | to fix that you can make them anonymous links (use double underscore at the end like : `blah <url>`__) | 22:47 |
*** priteau has quit IRC | 22:47 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Pin bridge.o.o to ansible 2.7.0, add devel testing job https://review.openstack.org/614894 | 22:51 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: bridge.o.o: Use latest openstacksdk https://review.openstack.org/615982 | 22:51 |
lbragstad | thanks dhellmann clarkb | 22:51 |
*** rh-jelabarre has quit IRC | 22:54 | |
*** slaweq has quit IRC | 23:01 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [dnm] testing depends-on for openstacksdk in devel job https://review.openstack.org/616372 | 23:06 |
fungi | okay, back and catching up now | 23:11 |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: [dnm] testing depends-on for openstacksdk in devel job https://review.openstack.org/616372 | 23:15 |
clarkb | I tried inducing load on apache by downloading the same file 80 times in different processes and that didn't do it either (which would be similar-ish to what the logprocessors do) | 23:16 |
*** rlandy is now known as rlandy|bbl | 23:29 | |
ianw | corvus / clarkb: https://review.openstack.org/#/c/614894 & https://review.openstack.org/#/c/615982 updated to use zuul checkouts for devel job, thanks. (it even works https://review.openstack.org/#/c/616372 :) | 23:31 |
clarkb | ianw: other than needing to figure out router things for citycloud did you get what you needed for bringing up the armci cloud? | 23:31 |
ianw | clarkb: i think so, the cloud launcher has at least run there now so i'll try bringing up a mirror shortly | 23:32 |
ianw | clarkb: have we restarted nodepool launcher with recent sdk's lately? | 23:35 |
clarkb | ianw: yes, I did so last week | 23:36 |
clarkb | and removed all the nodepool nodes from the emergency file at that time | 23:36 |
ianw | clarkb: hrm, we still don't seem to be getting stats; i wonder if we missed the fixes or something else ... | 23:36 |
ianw | 0.19.0 according to pip | 23:37 |
ianw | that should have the fix ... hrm | 23:38 |
clarkb | ianw: I think I caught a bug in the ansible and sdk install changes (please double check me) | 23:39 |
clarkb | on the logs front no issues since I started tailing the error log. Clearly we should just all look at the logs server sternly and then it will work | 23:39 |
clarkb | ianw: also we've been iterating on https://etherpad.openstack.org/p/WUNOTv8MuP for opendev messaging. If you want to take a look at that sometime. My goal is to send it out tomorrow morning local time | 23:41 |
ianw | clarkb: replied, it's as i intended but i'm very open to better ways of doing it ... | 23:42 |
clarkb | ianw: looking at that chain of logic though we'll set the var to latest if it is not defined. Then the next block won't set the _foo_var name and we'll omit version at the end because _foo_var isn't set? | 23:43 |
clarkb | ianw: it's that middle block not setting the _foo_var when latest that I think breaks the first block's intent? | 23:43 |
ianw | clarkb: that's right, because "latest" is a special variable where we don't want to set the version:, but the state: | 23:44 |
*** dhellmann has quit IRC | 23:45 | |
clarkb | ah | 23:45 |
clarkb | this is ansible being fancy | 23:45 |
clarkb | ok | 23:45 |
ianw | yeah, i guess as far as the pip module is concerned a version could be anything, so it's quite happy trying to do "pip install package@latest" | 23:46 |
*** eharney has quit IRC | 23:47 | |
clarkb | I've updated my reviews and left a breadcrumb for anyone that is confused like me | 23:47 |
ianw | i think it's a bit nicer to hide the complexity in the role, and then at the top level we just either set the version to "x.y.z" or "latest" and don't have to worry about fiddling state tags, etc | 23:49 |
clarkb | ya | 23:49 |
clarkb | ianw: fwiw I'm not seeing anything wrong with the nodepool and sdk installations | 23:51 |
clarkb | ianw: is https://git.openstack.org/cgit/openstack/openstacksdk/commit/?id=dd5f0f68274df4106902aa71a3f882d70c673dab the fix for stats? | 23:52 |
clarkb | ianw: I want to say nodepool doesn't use the taskmanager in sdk yet | 23:52 |
ianw | clarkb: that was my fix but clearly something is wrong ... | 23:56 |
clarkb | ianw: I think it may be because nodepool doesn't use that code? I'm working to confirm | 23:56 |
ianw | i'm not seeing any "Manager %s ran task %s in %ss" messages being triggered | 23:57 |
clarkb | ianw: ya it subclasses the sdk class but doesn't super anything | 23:57 |
clarkb | so submit_task for example is all nodepool | 23:57 |
clarkb | same with post_run_task | 23:58 |
ianw | hrmm, i can't see that post_run_task is being triggered at all ... and then why did we break with the new sdk? | 23:59 |
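
The last exchange turns on a subclass that overrides methods such as submit_task without calling super(), so any timing or logging done in the parent class is never reached. A minimal sketch of that effect follows. The class names here are hypothetical stand-ins and are not the real openstacksdk or nodepool classes; only the log message format is taken from the discussion above.

```python
import logging
import time

log = logging.getLogger("taskmanager-sketch")


class BaseTaskManager:
    """Hypothetical stand-in for a library task manager (not the real sdk class)."""

    def __init__(self, name):
        self.name = name

    def submit_task(self, task_name, func):
        start = time.monotonic()
        result = func()
        self.post_run_task(time.monotonic() - start, task_name)
        return result

    def post_run_task(self, elapsed, task_name):
        # The base class is the only place the timing message is emitted.
        log.debug("Manager %s ran task %s in %ss", self.name, task_name, elapsed)


class OverridingTaskManager(BaseTaskManager):
    """Replaces submit_task wholesale, so the base hook is never reached."""

    def submit_task(self, task_name, func):
        return func()


class CooperativeTaskManager(BaseTaskManager):
    """Keeps its own entry point but still delegates to the base class."""

    def submit_task(self, task_name, func):
        return super().submit_task(task_name, func)


class ListHandler(logging.Handler):
    """Collects formatted log messages in a list for inspection."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def run_and_capture(manager):
    """Run one dummy task and return the messages the manager logged."""
    handler = ListHandler()
    log.addHandler(handler)
    log.setLevel(logging.DEBUG)
    try:
        manager.submit_task("list_servers", lambda: ["server-a"])
    finally:
        log.removeHandler(handler)
    return handler.messages
```

Calling run_and_capture on an OverridingTaskManager returns no messages, while a CooperativeTaskManager produces the timing line. If the real subclass behaves like the first case, a fix that lives only in the parent's methods would have no visible effect, which would match what was observed.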
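
The "x.y.z" or "latest" handling discussed in the log (an undefined version defaults to "latest", and "latest" is passed as state: rather than version:) can be sketched in plain Python. This is an illustration only and is not the role's actual code: the function name pip_task_args is hypothetical, and the returned dict merely mirrors the name, version and state arguments of Ansible's pip module.

```python
def pip_task_args(name, version=None):
    """Build pip-module-style arguments for a package.

    Hypothetical helper: a concrete version such as "2.7.0" is passed
    through as version, while the special value "latest" is expressed
    as state=latest with no version at all.
    """
    # Step 1: an undefined version defaults to the special value "latest".
    if version is None:
        version = "latest"

    args = {"name": name}
    if version == "latest":
        # Step 2: "latest" is a state, not a version, so version is omitted.
        args["state"] = "latest"
    else:
        # Step 3: anything else is treated as a literal version pin.
        args["version"] = version
    return args
```

With this shape the caller only ever chooses between a pin and "latest", and the translation into state or version stays inside one helper, which is the design preference expressed in the discussion.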