*** rfolco has joined #openstack-third-party-ci | 01:11 | |
*** apoorvad has quit IRC | 01:56 | |
*** rfolco has quit IRC | 03:26 | |
*** openstackgerrit has quit IRC | 06:17 | |
*** openstackgerrit has joined #openstack-third-party-ci | 06:18 | |
*** rfolco has joined #openstack-third-party-ci | 12:09 | |
*** apoorvad has joined #openstack-third-party-ci | 15:53 | |
*** asselin has joined #openstack-third-party-ci | 15:56 | |
*** asselin__ has quit IRC | 15:58 | |
*** asselin has quit IRC | 16:23 | |
cbader | mmedvede, are you there | 19:21 |
---|---|---|
mmedvede | hi cbader | 19:21 |
cbader | hi, I have an issue with my jenkins and nodepool not talking. I have restarted zuul, zuul-merger, jenkins and nodepool and I can't get them to respond to each other. any clue where to look? | 19:22 |
mmedvede | cbader: first thing to check is if you have firewall, e.g. iptables | 19:23 |
mmedvede | iptables -L | 19:23 |
mmedvede | try disabling it | 19:23 |
mmedvede | if you have any rules there | 19:23 |
cbader | policy ACCEPT for input forward and output | 19:24 |
cbader | no targets listed | 19:24 |
cbader | this just stopped working last night. | 19:24 |
cbader | was working yesterday fine | 19:25 |
mmedvede | cbader: so nodepool boots a VM, but fails to register it with jenkins? | 19:25 |
cbader | mmedvede, yes all services are on the same vm | 19:25 |
cbader | mmedvede, zuul, jenkins, nodepool on same vm | 19:26 |
mmedvede | ok, that makes it easier | 19:26 |
mmedvede | jenkins/nodepool use zmq to talk | 19:26 |
cbader | mmedvede, I even stopped them all and rebooted the vm | 19:26 |
mmedvede | there is a tool in nodepool repo, nodepool/tools/zmq-stream.py | 19:26 |
cbader | mmedvede, yes I know it is configured for the jobs | 19:27 |
cbader | mmedvede, is that under system-config | 19:27 |
mmedvede | cbader: I would guess you should find it under /opt/nodepool/tools/zmq-stream.py | 19:28 |
mmedvede | on your all-in-one VM | 19:28 |
cbader | mmedvede: found it | 19:28 |
cbader | mmedvede: ran python zmq-stream.py, it returned ready. | 19:29 |
mmedvede | I am checking if the 8888 port is correct | 19:30 |
cbader | mmedvede, is there another way to check if it is up. I didn't find it as a process | 19:30 |
cbader | mmedvede, my global is set to 8888 | 19:30 |
*** apoorvad has quit IRC | 19:31 | |
cbader | mmedvede: so zmq-stream.py uses 8888 and it responded ready, so that seems to work. but if the script works, does that mean the port is not being used by jenkins at the time? | 19:34 |
mmedvede | cbader: which process were you looking for? | 19:34 |
mmedvede | cbader: is your jenkins installed on the same VM? | 19:34 |
cbader | mmedvede: yes all-in-one | 19:34 |
cbader | mmedvede, my nodepool list only shows a status of ready even when the nodes are being used by jenkins, so that is why I was wondering if there was some change made upstream. I didn't see anything. | 19:36 |
mmedvede | cbader: I've seen this before | 19:37 |
mmedvede | cbader: I normally fix it by restarting nodepool and jenkins | 19:38 |
cbader | mmedvede: I have tried that twice seems to come back | 19:38 |
mmedvede | cbader: ordering could matter too | 19:38 |
mmedvede | cbader: also make sure you do not have left-over processes. So instead of simply restarting, stop the process, make sure it has actually stopped, and then start it | 19:39 |
cbader | mmedvede: I wonder if I need to remove the vms from the providers before starting nodepool and jenkins | 19:39 |
cbader | mmedvede: ok will do thank you for your time. | 19:40 |
mmedvede | once it works (VMs get marked as being used), you'll need to manually clean up VMs that were never marked as used | 19:40 |
cbader | mmedvede: great thanks will do. have a nice day. so where are you located. | 19:41 |
mmedvede | cbader: yw. let me know if it did not help | 19:41 |
mmedvede | cbader: I am in Texas | 19:41 |
cbader | mmedvede: oh so two hours ahead. I am in California | 19:42 |
mmedvede | it is still day here :) | 19:42 |
cbader | mmedvede: my day starts at 5:00 so almost over if I can fix this else stay till done. | 19:43 |
cbader | mmedvede: do you work in office or from home. I am in office 5 days a week | 19:44 |
mmedvede | cbader: frequently from home, most of my team is remote | 19:45 |
mmedvede | cbader: which CI are you running? | 19:46 |
mmedvede | found it, HP Storage CI | 19:47 |
cbader | mmedvede: I run openstackci to report cinder, manila, for 3par then two others for inside company to test on. | 19:47 |
*** openstackgerrit has quit IRC | 19:48 | |
cbader | mmedvede: yup sorry, I have been having an internal network error with pypi and apt-get which is causing all my errors. | 19:48 |
*** openstackgerrit has joined #openstack-third-party-ci | 19:48 | |
mmedvede | I've got pypi-mirror working, also using aptcacher to mirror apt repos | 19:49 |
mmedvede | but it is not HA | 19:49 |
mmedvede | saves tons of traffic to outside net | 19:50 |
cbader | mmedvede: for some reason I get can't connect. the blades and the mirror are on the same blade enclosure so they don't even go to the wire. | 19:51 |
*** apoorvad has joined #openstack-third-party-ci | 20:15 | |
cbader | mmedvede: so got it back up. shut down all services (zuul, nodepool, jenkins), cleared all running vms, rebooted providers, cleared node entries in mysql db, then restarted jenkins, nodepool, zuul in order and zmq is working now. thanks for your help. | 20:20 |
mmedvede | cbader: awesome | 20:22 |
mmedvede | you're welcome | 20:23 |
cbader | mmedvede: get to leave on time for a change. worked all last weekend trying to figure out the libffi.h issue with my nodes. | 20:27 |
mmedvede | sorry to hear that. cbader, next time it happens, it would be faster to figure out | 20:28 |
mmedvede | dependency problems always lurking around | 20:28 |
cbader | mmedvede: only problem, job is going away on Oct 31 | 20:29 |
mmedvede | why? | 20:29 |
cbader | mmedvede, well it is not fixed. my jenkins shows slaves offline, so lost communication with nodepool. crud. | 20:50 |
mmedvede | cbader: I forgot to ask, are you pinning your nodepool/zuul versions? That would help to avoid random failures | 20:52 |
mmedvede | it does make sense to pin nodepool, and only update it manually from time to time | 20:52 |
cbader | mmedvede, yes I am | 20:52 |
cbader | mmedvede; zuul_revision: 7fca9c1cc625dec94b1c06a6a65216cd1a041e85 and nodepool_revision: c8c680912384f041dd1a824e9970c98dc74c7ef0 | 20:54 |
cbader | mmedvede: it has been pretty stable. but yesterday night went in the toilet. | 20:55 |
mmedvede | cbader: also, slaves offline on jenkins does not necessarily mean connection lost to nodepool | 20:58 |
mmedvede | if you deleted all VMs manually, you should expect to see them as offline once jenkins starts back up | 20:59 |
cbader | mmedvede: this is after it recreated them new after the restart. they were reporting normal then went off-line | 20:59 |
mmedvede | cbader: ok, just sanity checking | 21:00 |
cbader | mmedvede: so this is different. the ssl connection can't be made. connection closed that is different | 21:00 |
mmedvede | I am out of ideas, other than going deeper into logs and debugging lower level | 21:00 |
cbader | mmedvede: need to see if the ssh key got messed up somehow, thanks. might just leave and work on it at home. | 21:02 |
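
On the 19:30 question of whether there is another way to check that the ZMQ endpoint is up: a plain TCP connect test is one option. The sketch below is not part of nodepool; `port_is_open` is a hypothetical helper, and `127.0.0.1` with port 8888 are assumptions taken from the all-in-one setup and the port discussed in the log. A successful connect only shows that something is listening on the port. It does not confirm that Jenkins is publishing events there.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed values: localhost because all services share one VM,
    # 8888 because that is the port mentioned in the conversation.
    state = "listening" if port_is_open("127.0.0.1", 8888) else "not reachable"
    print(f"127.0.0.1:8888 is {state}")
```

If the port turns out to be closed while Jenkins is running, the Jenkins side would be the first place to look. If it is open and nodes still stay in the ready state, the problem is more likely on the nodepool side.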
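
The advice at 19:39 (stop each process, confirm it has actually stopped, then start it) together with the order reported at 20:20 (shut down zuul, nodepool, jenkins, then start jenkins, nodepool, zuul) can be sketched as a small routine. This is an illustration only: `restart_in_order` and its `stop`, `is_running`, and `start` callables are hypothetical, so how a service is really stopped or probed on a given host is left to the caller. The other steps from the log (clearing VMs and node entries in the database) are not covered here.

```python
import time
from typing import Callable, Sequence


def restart_in_order(
    services: Sequence[str],
    stop: Callable[[str], None],
    is_running: Callable[[str], bool],
    start: Callable[[str], None],
    timeout: float = 30.0,
    poll: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Stop all services, wait until none is left running, then start them.

    `services` is given in start order; stopping happens in reverse order.
    """
    # Stop in reverse start order (zuul, nodepool, jenkins in the log).
    for name in reversed(services):
        stop(name)

    # Do not start anything while a left-over process is still around.
    for name in services:
        deadline = time.monotonic() + timeout
        while is_running(name):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{name} is still running after {timeout}s")
            sleep(poll)

    # Start in the given order (jenkins, nodepool, zuul in the log).
    for name in services:
        start(name)
```

Passing the callables in keeps the ordering logic separate from the host-specific commands, so the same routine can be exercised with fakes before pointing it at real services.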