*** rlandy|bbl is now known as rlandy|out | 02:05 | |
*** diablo_rojo_phone is now known as Guest86 | 03:40 | |
*** bhagyashris_ is now known as bhagyashris | 04:55 | |
*** ysandeep|out is now known as ysandeep | 04:56 | |
*** ysandeep is now known as ysandeep|afk | 07:19 | |
*** clarkb is now known as Guest106 | 07:31 | |
*** gibi_pto is now known as gibi | 07:56 | |
*** ysandeep|afk is now known as ysandeep | 08:04 | |
*** arxcruz is now known as arxcruz|ruck | 08:06 | |
*** jpena|off is now known as jpena | 08:37 | |
opendevreview | daniel.pawlik proposed openstack/ci-log-processing master: Execute vacuum command after cleaning up DB https://review.opendev.org/c/openstack/ci-log-processing/+/834828 | 09:22 |
---|---|---|
*** ysandeep is now known as ysandeep|lunch | 09:35 | |
opendevreview | daniel.pawlik proposed openstack/ci-log-processing master: Execute vacuum command after cleaning up DB https://review.opendev.org/c/openstack/ci-log-processing/+/834828 | 09:52 |
*** rlandy|out is now known as rlandy | 10:36 | |
opendevreview | daniel.pawlik proposed openstack/ci-log-processing master: Move log files to check to external config file https://review.opendev.org/c/openstack/ci-log-processing/+/833624 | 10:48 |
opendevreview | daniel.pawlik proposed openstack/ci-log-processing master: Move log files to check to external config file https://review.opendev.org/c/openstack/ci-log-processing/+/833624 | 10:49 |
opendevreview | daniel.pawlik proposed openstack/ci-log-processing master: Improve fields in logsender https://review.opendev.org/c/openstack/ci-log-processing/+/833360 | 10:51 |
*** ysandeep|lunch is now known as ysandeep | 10:51 | |
opendevreview | Merged openstack/ci-log-processing master: Add missing parameters for logscraper service https://review.opendev.org/c/openstack/ci-log-processing/+/834116 | 11:16 |
*** dviroel|out is now known as dviroel | 11:19 | |
opendevreview | daniel.pawlik proposed openstack/ci-log-processing master: Move log files to check to external config file https://review.opendev.org/c/openstack/ci-log-processing/+/833624 | 12:54 |
opendevreview | daniel.pawlik proposed openstack/ci-log-processing master: Parse timestamps in log files correctly https://review.opendev.org/c/openstack/ci-log-processing/+/834126 | 12:57 |
arxcruz|ruck | fungi hi, we are getting a ubuntu node on centos-9-stream nodeset did something change? | 13:22 |
arxcruz|ruck | https://dc0059b2deec140455ec-8e6063eece8c96bdec38e25d6079d8b4.ssl.cf5.rackcdn.com/834051/2/gate/tripleo-ci-centos-9-content-provider/bf287b9/job-output.txt | 13:22 |
fungi | arxcruz|ruck: that would be extremely odd | 13:22 |
fungi | looking to see if i can spot the cause | 13:22 |
arxcruz|ruck | 2022-03-23 11:54:48.700386 | localhost | Hostname: ubuntu-focal-iweb-mtl01-0028848617 | 13:22 |
arxcruz|ruck | 2022-03-23 11:54:48.700453 | localhost | Username: zuul | 13:22 |
arxcruz|ruck | 2022-03-23 11:54:48.700529 | localhost | Distro: Ubuntu 20.04 | 13:22 |
arxcruz|ruck | 2022-03-23 11:54:48.700651 | localhost | Provider: iweb-mtl01 | 13:22 |
arxcruz|ruck | 2022-03-23 11:54:48.700720 | localhost | Label: centos-9-stream | 13:22 |
fungi | do you have a link to the build result page for that? it's much easier to browse and investigate | 13:23 |
fungi | otherwise i end up reconstructing it myself by digging the build id out of the log files | 13:23 |
arxcruz|ruck | https://dc0059b2deec140455ec-8e6063eece8c96bdec38e25d6079d8b4.ssl.cf5.rackcdn.com/834051/2/gate/tripleo-ci-centos-9-content-provider/bf287b9/job-output.txt | 13:23 |
arxcruz|ruck | the log | 13:23 |
fungi | the build result page allows me to do things like look at the job description info in the api, and trace its parentage | 13:24 |
fungi | the url reported to gerrit by zuul | 13:24 |
ysandeep | https://zuul.opendev.org/t/openstack/buildset/7887b0b1a23344cb9560b7a45c1ac1d6 ? | 13:24 |
arxcruz|ruck | fungi ^ | 13:25 |
fungi | close enough, i can find the build page from the buildset page, thanks | 13:25 |
arxcruz|ruck | ty ysandeep | 13:25 |
fungi | though the build result page is what zuul puts in gerrit comments, not the buildset page, for reference | 13:25 |
fungi | https://zuul.opendev.org/t/openstack/build/bf287b96d99442c58222fde07992e5d7 is the url i was looking for, looks like | 13:26 |
ysandeep | fungi: ack, thanks for informing | 13:27 |
arxcruz|ruck | fungi ack, next time i'll check it out :) usually we go more on the console log file | 13:27 |
opendevreview | Merged openstack/ci-log-processing master: Improve fields in logsender https://review.opendev.org/c/openstack/ci-log-processing/+/833360 | 13:29 |
fungi | this is definitely strange | 13:34 |
fungi | 2022-03-23 11:53:01,706 INFO nodepool.NodeLauncher: [e: e58f618f98a74351b8bc6867b3e803ce] [node_request: 200-0017642005] [node: 0028979246] Creating server with hostname centos-9-stream-iweb-mtl01-0028979246 in iweb-mtl01 from image centos-9-stream | 13:34 |
fungi | 2022-03-23 11:53:03,134 DEBUG nodepool.NodeLauncher: [e: e58f618f98a74351b8bc6867b3e803ce] [node_request: 200-0017642005] [node: 0028979246] Waiting for server 7d657092-6bc2-42ef-8e54-8b14b7586634 | 13:35 |
fungi | that server instance uuid matches what's in the inventory: https://zuul.opendev.org/t/openstack/build/bf287b96d99442c58222fde07992e5d7/log/zuul-info/inventory.yaml#34 | 13:36 |
fungi | it was creating 0028979246 but the job ended up running on older 0028848617 | 13:37 |
fungi | i'll see if i can find when/where 0028848617 was created | 13:38 |
fungi | but my gut says that's a rogue vm which never got cleared out in iweb's environment and got into an arp fight with the correct node for the job | 13:38 |
arxcruz|ruck | rlandy ^ | 13:39 |
frickler | fungi: that node is running for 7d: | 13:39 |
frickler | root@ubuntu-focal-iweb-mtl01-0028848617:~# w 13:39:11 up 7 days, 11:09, 1 user, load average: 0.00, 0.00, 0.00 | 13:39 |
ysandeep | fungi, we already hit that issue thrice in last 3 run: https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-9-content-provider&skip=0 | 13:40 |
fungi | yeah, week-old ubuntu node at the ip address nodepool expected the new node to have | 13:40 |
ysandeep | but they all have same Interface IP: | 13:40 |
fungi | ysandeep: always in inap-mtl01? | 13:40 |
fungi | yeah | 13:40 |
fungi | that pretty much confirms it | 13:40 |
fungi | nova probably failed to completely delete the node and then lost track of it existing | 13:41 |
fungi | and the ip address keeps getting handed to new nodes at random | 13:41 |
ysandeep | that's interesting, I think I have seen that somewhere in an old OpenStack env, I wonder which version of Openstack we run on? | 13:42 |
fungi | supposedly inap is discarding their openstack services completely a week from tomorrow, so i wouldn't be surprised if they've stopped any manual cleanup/grooming they might previously have been performing on that environment | 13:42 |
fungi | but since we basically have no operator contacts there any longer, there's probably not much we can do about this other than turn them off a week early and take the (significant) capacity hit | 13:44 |
frickler | we could shutdown the rogue node manually until then | 13:44 |
fungi | oh, yeah that's not a bad idea | 13:45 |
frickler | not sure though what neutron would do with the port then | 13:45 |
fungi | do a `sudo poewroff` so it hopefully won't get rebooted if the host restarts | 13:45 |
fungi | er, `sudo poweroff` | 13:45 |
frickler | but at least it should give a network failure instead of the current issue | 13:45 |
fungi | i'll do that now. it can't be any worse than having jobs run on a reused node which may be for an entirely different distro/version | 13:46 |
frickler | did it already | 13:46 |
frickler | now I get onto a different node immediately | 13:46 |
fungi | yeah, in that case i probably powered off the wrong (newer) node, but whatever job was using that was also almost certainly struggling or about to be | 13:47 |
fungi | yep, looking back, shell prompt says it was node 0028982952 | 13:48 |
fungi | i should have compared that before issuing the poweroff | 13:48 |
frickler | nodepool should clean that node up soon, so the impact shouldn't be too bad. and I was surprised myself when I still could connect to that address after the poweroff | 13:54 |
*** dasm|off is now known as dasm | 13:58 | |
fungi | yeah, as soon as the executor for whatever job was running there ceased to be able to connect, it would have ended the build and probably retried it | 14:22 |
opendevreview | Teresa Ho proposed openstack/project-config master: Add Istio app to StarlingX https://review.opendev.org/c/openstack/project-config/+/834896 | 14:37 |
*** Guest106 is now known as clarkb | 15:24 | |
*** dviroel is now known as dviroel|lunch | 15:36 | |
opendevreview | Teresa Ho proposed openstack/project-config master: Add Istio app to StarlingX https://review.opendev.org/c/openstack/project-config/+/834896 | 15:48 |
*** ysandeep is now known as ysandeep|out | 16:09 | |
opendevreview | daniel.pawlik proposed openstack/ci-log-processing master: WIP Parse timestamps in log files correctly https://review.opendev.org/c/openstack/ci-log-processing/+/834126 | 16:31 |
*** dviroel|lunch is now known as dviroel | 16:46 | |
*** jpena is now known as jpena|off | 17:40 | |
*** timburke__ is now known as timburke | 20:58 | |
*** dviroel is now known as dviroel|afk | 20:59 | |
*** rlandy is now known as rlandy|out | 21:55 | |
*** dviroel|afk is now known as dviroel\ | 23:44 | |
*** dviroel\ is now known as dviroel | 23:44 |
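
The mismatch discussed above (a `centos-9-stream` label landing on a host named `ubuntu-focal-iweb-mtl01-0028848617`) can be checked mechanically from the job-output header. The following is only a minimal sketch. It assumes node hostnames follow the `<label>-<provider>-<node id>` pattern seen in the nodepool log line quoted by fungi, and that the header lines look like the ones arxcruz pasted. The function names `parse_node_fields` and `hostname_matches_label` are hypothetical helpers, not part of Zuul or Nodepool.

```python
import re

# Matches the "Key: value" tail of a job-output header line, e.g.
# "2022-03-23 11:54:48.700386 | localhost | Hostname: ubuntu-focal-iweb-mtl01-0028848617"
FIELD_RE = re.compile(r"\|\s*(Hostname|Provider|Label):\s*(\S+)\s*$")

# Header lines as pasted in the channel (without the IRC timestamp column).
SAMPLE = [
    "2022-03-23 11:54:48.700386 | localhost | Hostname: ubuntu-focal-iweb-mtl01-0028848617",
    "2022-03-23 11:54:48.700453 | localhost | Username: zuul",
    "2022-03-23 11:54:48.700529 | localhost | Distro: Ubuntu 20.04",
    "2022-03-23 11:54:48.700651 | localhost | Provider: iweb-mtl01",
    "2022-03-23 11:54:48.700720 | localhost | Label: centos-9-stream",
]


def parse_node_fields(lines):
    """Collect the Hostname, Provider and Label values from header lines."""
    fields = {}
    for line in lines:
        match = FIELD_RE.search(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def hostname_matches_label(fields):
    """Return True if the hostname starts with '<label>-<provider>-'.

    Assumption: hostnames are built as <label>-<provider>-<node id>.
    """
    expected_prefix = f"{fields['Label']}-{fields['Provider']}-"
    return fields["Hostname"].startswith(expected_prefix)
```

With the sample header, `hostname_matches_label(parse_node_fields(SAMPLE))` is false, which is the symptom reported in the channel. A hostname such as `centos-9-stream-iweb-mtl01-0028979246` with the same label and provider would pass the check.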