*** akahat is now known as akahat|ruck | 05:00 | |
*** ysandeep|out is now known as ysandeep | 05:05 | |
*** chkumar|rover is now known as chandankumar | 05:05 | |
*** arxcruz is now known as arxcruz|rover | 06:54 | |
noonedeadpunk | that is a _suuuper_ interesting read | 07:15 |
noonedeadpunk | but right now we use 2.12.6 and kind of see the issue in CI? | 07:16 |
opendevreview | Dmitriy Rabotyagov proposed openstack/ansible-config_template master: Use release-ansible-collections from project-config https://review.opendev.org/c/openstack/ansible-config_template/+/850666 | 07:38 |
*** ysandeep is now known as ysandeep|lunch | 07:44 | |
*** ysandeep|lunch is now known as ysandeep | 09:42 | |
mgariepy | hmm. sad news.. i cannot reproduce with only the ssh plugins on a fresh jammy container | 11:44 |
mgariepy | https://paste.openstack.org/show/bdoLY0fLP7dEHxaFfOGo/ | 12:11 |
mgariepy | hmm. | 12:12 |
mgariepy | https://zuul.opendev.org/t/openstack/build/4211b9d532a247029df82a57cd7e2fa3/log/job-output.txt#35 | 12:13 |
mgariepy | hmm, running a 500-iteration loop on 2.12.6 aio | 12:21 |
mgariepy | ho.. just took a bit longer.. | 12:23 |
jrosser | is that the same? in a zuul job i don't think i've ever seen stdout/stderr have anything other than "" | 12:30 |
mgariepy | https://zuul.opendev.org/t/openstack/build/4211b9d532a247029df82a57cd7e2fa3/log/job-output.txt#13621 | 12:30 |
mgariepy | that's xena.. ansible 2.11.6 | 12:30 |
mgariepy | in the same AIO i can reproduce the error on 2.12.6, 2.12.7 and 2.13.2 | 12:32 |
mgariepy | looks like our connection plugin is causing issue | 12:32 |
mgariepy | i do run the same tests in 2 containers (1 jammy, 1 focal) on my laptop running 200 long loops and it doesn't crash | 12:33 |
mgariepy | in my aio, ansible 2.12.7+ does crash in less than 20 loops most of the time | 12:34 |
mgariepy | 2.12.6 took 96 iterations | 12:34 |
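To compare how quickly each Ansible version breaks (96 iterations on 2.12.6 versus fewer than 20 on 2.12.7+ above), it helps to have a small harness that reruns a command and reports the first failing iteration. The sketch below assumes only that the command exits non-zero on failure. The helper name `run_until_failure` is hypothetical, and the contents of the loop playbook are not shown in this log.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_until_failure(cmd: Sequence[str], max_iterations: int) -> Optional[int]:
    """Run `cmd` up to `max_iterations` times (hypothetical helper).

    Returns the 1-based iteration on which it first exited non-zero,
    or None if every run succeeded.
    """
    for iteration in range(1, max_iterations + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Keep stderr: for an ssh race the useful part is the
            # connection error, not the module output.
            print(f"failed on iteration {iteration} (rc={result.returncode})",
                  file=sys.stderr)
            print(result.stderr, file=sys.stderr)
            return iteration
    return None
```

For example, `run_until_failure(["openstack-ansible", "loop-test.yml"], 500)`. Because the failure looks like a race, a single run per version says little. Repeating the harness a few times gives a fairer comparison.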
mgariepy | looks to me like it's a race on the control socket | 12:35 |
mgariepy | with our connection plugin | 12:36 |
mgariepy | oops ERROR! Exceeded maximum object depth. This may have been caused by excessive role recursion. maximum recursion depth exceeded | 12:50 |
mgariepy | jrosser, i think it's the -vvvv that changes it. | 12:54 |
mgariepy | that's without -vvv : https://paste.openstack.org/show/bySeiRKYlalcPruPPYgR/ | 12:57 |
*** ysandeep is now known as ysandeep|afk | 13:07 | |
mgariepy | i think it has to do with the retry decorator but my python is not quite good enough to try to fix that. | 13:11 |
spatel | jamesdenton around? | 13:20 |
spatel | I am stuck here in ovn-bgp-agent deployment, any idea what is wrong here - https://paste.opendev.org/show/b0jSgXWlkkkws8Wp18Xz/ | 13:20 |
*** ysandeep|afk is now known as ysandeep | 13:31 | |
*** ysandeep is now known as ysandeep|mtg | 13:31 | |
jamesdenton | is that the latest FRR? | 13:53 |
spatel | jamesdenton yes.. 7.2.1 | 14:03 |
jamesdenton | i don't think that's the latest? | 14:03 |
spatel | If you look at the openstack-neutron channel, i am already talking to one of the developers | 14:03 |
jamesdenton | i see | 14:03 |
spatel | made a little progress but stuck again on OVS schema-related stuff | 14:03 |
jamesdenton | kk | 14:03 |
spatel | This is latest error - https://paste.opendev.org/show/bYxy4cBWygOQ21JgEJZb/ | 14:04 |
spatel | Error starting thread.: AssertionError - not sure what that is | 14:05 |
jamesdenton | i guess it didn't like: assert table_name in schema.tables | 14:06 |
spatel | i am running the latest devstack version of openstack and ovn 20.03 | 14:07 |
spatel | latest version of ovn is 22.03 (LTS) | 14:08 |
spatel | just curious, maybe that is the issue because i am on a much older version of ovn | 14:10 |
jamesdenton | quite possible | 14:15 |
spatel | jamesdenton i talked to one of the developers and he suggested going back 5 commits in ovn-bgp-agent (he believes it's an lb-related issue) | 14:28 |
spatel | git question: how do i check out a specific SHA in git :) | 14:28 |
spatel | let me google it out | 14:29 |
noonedeadpunk | `git checkout $SHA`? :D | 14:29 |
spatel | help is here let me try | 14:29 |
mgariepy | noonedeadpunk, how is your python ? | 14:34 |
noonedeadpunk | you literally had an answer in your question | 14:34 |
noonedeadpunk | mgariepy: meh, I'd say quite average. why? | 14:34 |
mgariepy | our plugin seems to cause issue | 14:34 |
mgariepy | our ssh plugins** | 14:34 |
mgariepy | from what i understand it's only a subclass of the original one. but it doesn't seem to work correctly | 14:35 |
noonedeadpunk | yeah, we basically inherit from it. I wonder if it's indeed the ssh plugin and not our strategy thing? | 14:37 |
noonedeadpunk | but yes, likely you're right about connection plugin | 14:38 |
noonedeadpunk | mgariepy: so the way to reproduce is just loop for gather facts in aio? | 14:39 |
mgariepy | i think it's only a matter of delegating the tasks. | 14:40 |
mgariepy | from the error i get it's not the setup module that fails but the ssh connection | 14:40 |
mgariepy | this: https://paste.openstack.org/show/b8sxbUflXJJXF7PaiXZw/ | 14:43 |
mgariepy | running it without -vvvv does crash faster | 14:43 |
mgariepy | the debug adds some delay that makes the code not crashing as much :D | 14:44 |
noonedeadpunk | and since you run against just aio, this is not reproducible once you don't use our connection plugin? | 14:45 |
mgariepy | lol my play name is not great | 14:45 |
mgariepy | i've not been able to | 14:46 |
noonedeadpunk | damn... I can't really dig into that right now even though it's super interesting | 14:46 |
mgariepy | as a workaround/ugly hack we could always set ControlPersist to something like 600s | 14:47 |
mgariepy | the default is 60s | 14:48 |
mgariepy | so i guess it should happen 10x less.. | 14:49 |
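The "10x less" estimate can be checked against a toy model. In this model, the ControlMaster exits whenever the socket sits idle longer than ControlPersist, and each teardown/re-setup is one more window for the suspected race. The sketch is a simplification: the idle gaps are made-up numbers, and it ignores anything the real ssh master does beyond the idle timeout.

```python
from typing import Iterable


def count_master_expiries(idle_gaps_s: Iterable[float], control_persist_s: float) -> int:
    """Toy model: how often an ssh ControlMaster would exit.

    idle_gaps_s: seconds between consecutive uses of the same control socket.
    Each gap longer than ControlPersist means the master exits and the next
    task has to set up a fresh one, i.e. one more chance to hit the race.
    """
    return sum(1 for gap in idle_gaps_s if gap > control_persist_s)


# Hypothetical idle gaps between tasks that reuse the socket.
gaps = [5, 70, 30, 700, 90]
print(count_master_expiries(gaps, 60), count_master_expiries(gaps, 600))
```

With these invented gaps, raising ControlPersist from 60s to 600s cuts teardowns from 3 to 1. The real reduction depends on how the idle gaps are distributed, so it is not necessarily proportional to the timeout.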
noonedeadpunk | mgariepy: well, that looks quite differently from what we have https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/connection/ssh.py#L462-L538 | 14:49 |
noonedeadpunk | and basically, exec_command is not covered by retry, as what's calling it is covered | 14:50 |
mgariepy | yes but we don't overwrite it. | 14:50 |
noonedeadpunk | it's the only place where retry is used? https://opendev.org/openstack/openstack-ansible-plugins/src/branch/master/plugins/connection/ssh.py#L422 | 14:50 |
noonedeadpunk | so I have some doubts whether we should keep our retry implementation nowadays, as it seems to be covered now.... | 14:51 |
mgariepy | i think it's only for the lxc exec stuff | 14:51 |
mgariepy | https://opendev.org/openstack/openstack-ansible-plugins/src/branch/master/plugins/connection/ssh.py#L503-L507 | 14:52 |
mgariepy | should call https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/connection/ssh.py#L1178 | 14:52 |
mgariepy | with its decorator | 14:52 |
mgariepy | not ours | 14:52 |
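The question being discussed is which retry decorator actually wraps the call. In Python, a decorator is bound to the method it decorates. If a subclass's undecorated `exec_command` calls the parent's decorated `_run`, only the parent's retry logic applies. The sketch below illustrates this point. The `retry` decorator and both classes are hypothetical stand-ins that mirror the shape of the discussion. They are not Ansible's `_ssh_retry` or the OSA plugin.

```python
import functools
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(attempts: int, exceptions: Tuple[Type[BaseException], ...], delay_s: float = 0.0):
    """Hypothetical retry decorator (not Ansible's _ssh_retry)."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay_s * attempt)  # linear backoff
            raise AssertionError("unreachable")
        return wrapper
    return decorator


class BaseConnection:
    """Stand-in for the upstream plugin: its _run carries the retry."""

    def __init__(self, failures_before_success: int) -> None:
        self.calls = 0
        self._failures = failures_before_success

    @retry(attempts=3, exceptions=(ConnectionError,))
    def _run(self, cmd: str) -> str:
        self.calls += 1
        if self.calls <= self._failures:
            raise ConnectionError("control socket went away")
        return f"ran {cmd}"


class LxcAwareConnection(BaseConnection):
    """Stand-in for a subclass that only adds behaviour on top."""

    def exec_command(self, cmd: str) -> str:
        # Not decorated here: any retrying comes from BaseConnection._run.
        return self._run(cmd)
```

A connection that fails twice still succeeds on the third call to `_run`. One that fails three times raises after the parent decorator's three attempts, regardless of any retry defined elsewhere in the subclass.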
noonedeadpunk | yeah | 14:53 |
mgariepy | unless the issue is present on the ssh plugin but our code adds just enough delay to crash it? | 14:53 |
noonedeadpunk | uh, let me try it out to reproduce :D | 14:54 |
mgariepy | it's quite puzzling haha | 14:54 |
noonedeadpunk | export ControlPersist is essential for that, right? | 14:55 |
mgariepy | it does make the issue reproduce frequently | 14:55 |
noonedeadpunk | hm...... | 15:00 |
noonedeadpunk | on loop 50 and works fine so far.... | 15:00 |
mgariepy | with our plugin? | 15:00 |
noonedeadpunk | yup | 15:00 |
mgariepy | without -vvv ? | 15:00 |
noonedeadpunk | without any -v | 15:00 |
mgariepy | what version of ansible? | 15:01 |
noonedeadpunk | no freaking idea :D Could be 2.13.2 | 15:01 |
mgariepy | haha | 15:01 |
mgariepy | lol | 15:01 |
noonedeadpunk | will tell you once it finish | 15:01 |
noonedeadpunk | yes, 2.13.2 | 15:02 |
mgariepy | i dont get up to 10 on my server. | 15:02 |
mgariepy | stupid race condition | 15:02 |
mgariepy | your server is either too fast or too slow ;p | 15:02 |
noonedeadpunk | sounds like it also depends on performance of hardware | 15:02 |
mgariepy | if it's race it sure is | 15:03 |
noonedeadpunk | already on 110 | 15:03 |
mgariepy | you did bootstrap-ansible.sh ? | 15:04 |
noonedeadpunk | sure | 15:06 |
noonedeadpunk | can place your key.... | 15:06 |
mgariepy | what is your hw ? | 15:06 |
noonedeadpunk | not sure what's underneath, but virtualized as Haswell-noTSX-IBRS | 15:07 |
noonedeadpunk | and have 6 cores/12Gb RAM on VM | 15:07 |
mgariepy | 4 core / 30gb ram | 15:07 |
mgariepy | on ceph, sata ssds | 15:07 |
noonedeadpunk | ah, ceph with nvmes :) | 15:08 |
mgariepy | i'm too poor for that haha | 15:09 |
noonedeadpunk | but still, ssds should be fine... | 15:09 |
noonedeadpunk | so I can not reproduce that.... | 15:12 |
mgariepy | https://launchpad.net/~mgariepy/+sshkeys | 15:14 |
noonedeadpunk | hardware is shitty - E5-2680 v3 | 15:14 |
jrosser | I can try this later on a xeon gold type cpu | 15:16 |
mgariepy | it fails. | 15:20 |
noonedeadpunk | yeah, have these as well | 15:20 |
noonedeadpunk | whaat | 15:20 |
mgariepy | must be me.. | 15:20 |
noonedeadpunk | now I really do wonder if that could be some SSH env you bring in ? | 15:21 |
noonedeadpunk | ok, mgariepy, tell me what you did, step by step :D | 15:22 |
mgariepy | first time 11. | 15:22 |
noonedeadpunk | I'm right now on 40 | 15:22 |
mgariepy | ssh ubuntu@yourvm | 15:22 |
mgariepy | sudo -i | 15:22 |
mgariepy | cd /home/ubuntu | 15:22 |
mgariepy | export ANSIBLE_SSH_ARGS="${ANSIBLE_SSH_ARGS:-'-C -o ControlMaster=auto -o ControlPersist=2s'}" | 15:22 |
mgariepy | openstack-ansible loop-test.yml | 15:23 |
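The `${ANSIBLE_SSH_ARGS:-...}` export above falls back to the default when the variable is unset *or* empty. The sketch below mirrors that rule in Python and shows the ControlMaster/ControlPersist options as an argv list. Splitting with `shlex` is an illustration only. It is not a statement about how Ansible itself parses `ssh_args`, and the helper name is hypothetical.

```python
import shlex
from typing import List, Mapping

# Options taken from the export above.
DEFAULT_SSH_ARGS = "-C -o ControlMaster=auto -o ControlPersist=2s"


def effective_ssh_args(env: Mapping[str, str]) -> List[str]:
    """Hypothetical helper mirroring ${ANSIBLE_SSH_ARGS:-default}."""
    # `:-` also replaces an empty value, which `or` reproduces;
    # env.get(key, default) would keep an empty string instead.
    raw = env.get("ANSIBLE_SSH_ARGS") or DEFAULT_SSH_ARGS
    return shlex.split(raw)
```

Passing `os.environ` shows what the current shell would produce. A short ControlPersist such as `2s` makes the master expire between most tasks, which is why it makes the issue easier to reproduce.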
mgariepy | i'm at 51 on the second run | 15:23 |
mgariepy | not failed yet. | 15:23 |
*** dviroel is now known as dviroel|lunch | 15:25 | |
*** ysandeep|mtg is now known as ysandeep|out | 15:26 | |
*** elenalindq_ is now known as elenalindq | 15:27 | |
*** johnsom_ is now known as johnsom | 15:27 | |
mgariepy | looks like this time it won't fail | 15:30 |
noonedeadpunk | it's annoying | 15:31 |
mgariepy | well it doesn't fail that often.. but often enough to see it a couple of times per day in the CI.. | 15:32 |
noonedeadpunk | nah, I can't reproduce anywhere.... | 15:58 |
mgariepy | o | 16:01 |
mgariepy | well. it sucks | 16:01 |
mgariepy | fucking race.. | 16:01 |
noonedeadpunk | ok, just caught it | 16:07 |
noonedeadpunk | and again o_O | 16:07 |
mgariepy | ouf now you know i'm not lying | 16:08 |
noonedeadpunk | lol, but now it's being reproduced each and every time | 16:09 |
mgariepy | have you seen it in your vm ? | 16:10 |
noonedeadpunk | in your | 16:10 |
mgariepy | so i was lucky :D i had just the right combination of hw | 16:11 |
mgariepy | your vm just did it.. at iteration 159.. | 16:13 |
*** dviroel|lunch is now known as dviroel | 16:42 | |
noonedeadpunk | I think it's not our plugin that is broken, but rather some changes to ansible one that are | 16:43 |
noonedeadpunk | Since even with `export ANSIBLE_TRANSPORT="ssh"` it still reoccurred | 16:44 |
noonedeadpunk | But, once I did `export ANSIBLE_TRANSPORT="paramiko"` I can't reproduce again | 16:44 |
noonedeadpunk | So likely we just need to adjust our connection plugin to leverage paramiko rather than ssh.... | 16:44 |
noonedeadpunk | or well. just make another one | 16:45 |
mgariepy | it's been broken for a while. | 16:45 |
noonedeadpunk | paramiko? | 16:45 |
mgariepy | ansible | 16:45 |
mgariepy | it still have issue on xena. | 16:45 |
noonedeadpunk | ah. well. they use smart as default. and I can recall reading in release notes that they will prefer paramiko when they can | 16:46 |
mgariepy | not sure how far we need to test to see. | 16:46 |
noonedeadpunk | likely that was for ansible-core 2.12 | 16:46 |
noonedeadpunk | just to be exact https://paste.openstack.org/show/bwqxhLNtCJroaaBgsgOJ/ | 16:48 |
mgariepy | if you want to change the ansible version in that vm you can | 16:50 |
noonedeadpunk | basically, paramiko does not support ControlPersist | 16:51 |
jrosser | iirc paramiko doesn't understand anything in ~/.ssh/config | 16:52 |
noonedeadpunk | you're right here | 16:54 |
jrosser | that already causes us issues with the ansible networking modules | 16:55 |
noonedeadpunk | anyway I can totally reproduce it even without our connection plugin | 17:01 |
noonedeadpunk | So it's smth else.... | 17:01 |
jrosser | oh that's good | 17:02 |
noonedeadpunk | is it?:) | 17:02 |
jrosser | well, we would have no hope making a bug report using our connection plugin | 17:03 |
noonedeadpunk | yeah. but I can't catch the issue when ansible is installed inside a venv | 17:03 |
noonedeadpunk | and not sourcing our .rc | 17:03 |
mgariepy | might be interaction with our inventory script ? | 17:04 |
noonedeadpunk | I passed inventory | 17:05 |
noonedeadpunk | Also I would say ansible from venv is kind of faster | 17:05 |
noonedeadpunk | so maybe also a bunch of vars... | 17:05 |
jrosser | that's actually more tech debt we have - the script should be converted to an inventory plugin | 17:05 |
spatel | jamesdenton ovn-bgp-agent working :) | 17:06 |
spatel | I am able to advertise vm fips in EVPN fabric | 17:06 |
spatel | Only floating IPs are getting advertised, not the VM's original address.. | 17:07 |
noonedeadpunk | but totally smth out of our .rc makes ansible fail | 17:12 |
noonedeadpunk | as once I fully source it issue gets reproduced | 17:12 |
mgariepy | strip the env vars from it 1 by 1? | 17:12 |
noonedeadpunk | yeah, but I'm populating instead ;) | 17:12 |
mgariepy | wouldn't it be faster the other way around? | 17:13 |
mgariepy | lol | 17:13 |
noonedeadpunk | and I've commented out connection plugin before sourcing to be sure it's not it | 17:13 |
mgariepy | ANSIBLE_SSH_PIPELINING ? | 17:15 |
noonedeadpunk | I _think_ it wasn't it | 17:15 |
mgariepy | https://xkcd.com/1722/ | 17:21 |
mgariepy | lol | 17:21 |
noonedeadpunk | :D | 17:22 |
mgariepy | thank god it's friday ! :) haha | 17:22 |
noonedeadpunk | damn, now I can't reproduce at all again :facepalm: | 17:26 |
noonedeadpunk | what I did though, I stupidly dropped facts for aio1 from | 17:28 |
*** dviroel is now known as dviroel|afk | 17:28 | |
noonedeadpunk | instead of moving them somewhere else | 17:28 |
mgariepy | do you think it's related ? | 17:32 |
jamesdenton | spatel awesome! can't wait to read about it :) | 17:38 |
spatel | Yes.. i will blog it out.. | 17:39 |
jamesdenton | i am very behind in fun stuff | 17:39 |
spatel | currently figuring out how to advertise tenant network | 17:39 |
spatel | provider network is working fine.. | 17:39 |
spatel | The issue was the OVN version: the newer version of OVN tries to find the Load_Balancer schema table, which does not exist in older OVN | 17:41 |
spatel | I moved back 5 commits and that fixed my issue. | 17:41 |
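The failure came from `assert table_name in schema.tables` when the Load_Balancer table was missing from an older OVN schema. An OVSDB schema (RFC 7047) is a JSON object whose `"tables"` member maps table names to definitions. A client can therefore probe for the tables it needs up front and report exactly which ones are missing, instead of failing on a bare assertion. The sketch below works on trimmed, hypothetical schema documents, not real OVN schema files, and the helper names are illustrative.

```python
import json
from typing import Any, Dict, Iterable, List


def has_table(schema: Dict[str, Any], table_name: str) -> bool:
    # An OVSDB schema (RFC 7047) is a JSON object whose "tables" member
    # maps table names to table definitions.
    return table_name in schema.get("tables", {})


def missing_tables(schema: Dict[str, Any], required: Iterable[str]) -> List[str]:
    """Return the required tables absent from the schema, sorted by name."""
    return sorted(name for name in required if not has_table(schema, name))


# Trimmed, hypothetical schemas: real OVN schemas have many more tables.
OLD_SCHEMA = json.loads('{"name": "Example_DB", "tables": {"Chassis": {"columns": {}}}}')
NEW_SCHEMA = json.loads(
    '{"name": "Example_DB", "tables": '
    '{"Chassis": {"columns": {}}, "Load_Balancer": {"columns": {}}}}'
)

print(missing_tables(OLD_SCHEMA, ["Chassis", "Load_Balancer"]))
```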
jamesdenton | ahh good to know | 17:41 |
jamesdenton | and that FRR error? with the running-config? | 17:41 |
spatel | It was code issue https://opendev.org/x/ovn-bgp-agent/src/branch/master/ovn_bgp_agent/privileged/vtysh.py#L26-L28 | 17:42 |
spatel | This is modified code - https://paste.opendev.org/show/bDvYGmrG21HDh8xRH0K2/ | 17:42 |
spatel | I will submit patch or ask dev to see what is going on | 17:42 |
spatel | command should be split | 17:43 |
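The exact change in the paste isn't reproduced here. As a general illustration, a command string has to be turned into an argv list before it reaches an exec call that doesn't go through a shell. Naive whitespace splitting and shell-style splitting give different results once quotes are involved. The vtysh command string below is only an example.

```python
import shlex

command = "vtysh -c 'show running-config'"

# Naive whitespace splitting keeps the quote characters and breaks the
# vtysh command into two separate arguments.
naive = command.split()

# Shell-style splitting honours the quotes, so vtysh receives one -c argument.
argv = shlex.split(command)

print(naive)
print(argv)
```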
spatel | jamesdenton i can see my floating ip getting advertised in the BGP fabric - https://paste.opendev.org/show/bSqLjbj3DFcKG2u2E66e/ | 17:45 |
spatel | Trying to find out how to advertise tenant network | 17:46 |
opendevreview | Merged openstack/openstack-ansible-haproxy_server stable/xena: Don't restrict haproxy tunable options https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/850480 | 18:21 |
mgariepy | ANSIBLE_SSH_PIPELINING=True seems to be related. | 18:23 |
mgariepy | jrosser, noonedeadpunk ^^ | 18:23 |
jrosser | `When pipelining is enabled, Ansible does not save the module to a temporary file on the client. Instead it pipes the module to the remote python interpreter's stdin` | 18:29 |
mgariepy | so if the socket drops mid-air it fails.. | 18:30 |
mgariepy | obviously :D | 18:30 |
jrosser | is that -EPIPE? | 18:31 |
jrosser | the -13 | 18:31 |
mgariepy | https://paste.openstack.org/show/bdoLY0fLP7dEHxaFfOGo/ | 18:31 |
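On the "-EPIPE?" question: on Linux, EPIPE is errno 32, while 13 is the signal number of SIGPIPE. Python's `subprocess` reports a child that was killed by a signal as a negative return code. So, assuming the -13 in the paste is a process return code, it points at the ssh process being killed by SIGPIPE. That would be consistent with the pipe dropping mid-write while pipelining. The sketch below demonstrates both behaviours on Linux. The helper names are hypothetical.

```python
import errno
import os
import signal
import subprocess
import sys


def write_to_closed_pipe() -> int:
    """Return the errno from writing to a pipe whose reader has gone away."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # no reader left, standing in for a dropped channel
    try:
        os.write(write_fd, b"module payload")
    except BrokenPipeError as exc:
        # CPython ignores SIGPIPE by default, so EPIPE surfaces as an exception.
        return exc.errno
    finally:
        os.close(write_fd)
    return 0


def returncode_of_sigpipe_child() -> int:
    """Run a child that dies from SIGPIPE and return its returncode."""
    child = subprocess.run([
        sys.executable, "-c",
        # Restore the default action (terminate) before signalling itself.
        "import os, signal; "
        "signal.signal(signal.SIGPIPE, signal.SIG_DFL); "
        "os.kill(os.getpid(), signal.SIGPIPE)",
    ])
    return child.returncode


print(errno.EPIPE, int(signal.SIGPIPE))
print(write_to_closed_pipe(), returncode_of_sigpipe_child())
```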
jrosser | why would it drop it though - that's odd | 18:34 |
jrosser | `the backgrounded master connection will automatically terminate after it has remained idle (with no client connections) for the specified time` | 18:34 |
mgariepy | because we do not use it often. | 18:34 |
mgariepy | since we probably mostly interact with containers via lxc and not ssh, except when we delegate? | 18:34 |
jrosser | does that ssh to the container with delegate? | 18:36 |
jrosser | doesn't it still realise the container is local and not use ssh....? | 18:36 |
jrosser | perhaps i'm confused here too tbh | 18:36 |
jrosser | or perhaps i mean it ssh's to aio1 and then uses the special stuff in the lxc-aware connection plugin | 18:37 |
mgariepy | how do we look at ara report with the sqlite now ? | 18:38 |
jrosser | oh well thats difficult | 18:38 |
mgariepy | lol | 18:38 |
jrosser | do you have a build result handy? | 18:38 |
mgariepy | https://a7e4a39e8fb82330ec44-84709aab2060acc1565f07c661aff448.ssl.cf5.rackcdn.com/846123/2/check/openstack-ansible-deploy-aio_lxc-ubuntu-focal/4211b9d/logs/ara-report/index.html | 18:39 |
jrosser | do you have the link to the job? | 18:39 |
mgariepy | https://zuul.opendev.org/t/openstack/build/4211b9d532a247029df82a57cd7e2fa3/log | 18:39 |
mgariepy | i can reproduce with the playbook without osa :D | 18:48 |
jrosser | \o/ | 18:48 |
mgariepy | ho. wait.. not completely lol | 18:49 |
mgariepy | lol :( | 18:49 |
mgariepy | i'm getting tired haha | 18:49 |
mgariepy | if i disalbe pipelining. it does not fail on openstack-ansible run . | 18:51 |
mgariepy | disable* wow can't type either | 18:51 |
mgariepy | i can reproduce on my laptop now tho.. | 18:57 |
mgariepy | have a great weekend guys | 19:06 |
noonedeadpunk | sorry I had to drop this for today :( | 19:14 |
noonedeadpunk | hopefully will have some time on monday or during the weekend | 19:14 |
noonedeadpunk | what if we replace ANSIBLE_SSH_PIPELINING with ANSIBLE_PIPELINING.... I bet nothing will change actually... | 19:18 |
spatel | jamesdenton figured out how to expose tenant network in BGP :) | 20:01 |
spatel | still getting some strange error which i need to figure out before i close this loop | 20:02 |
*** qwebirc59942 is now known as batman | 20:20 | |
*** batman is now known as r98geh31rt | 20:20 | |
manowarrior | Hi all, looking to cross reference osa periodic tasks with https://docs.openstack.org/operations-guide/ops-advanced-configuration.html#implementing-periodic-tasks. I see https://docs.openstack.org/openstack-ansible/latest/admin/maintenance-tasks.html but that doesn't seem to cover cron stuff eg: image cache pruning, etc. Anyone know a good place to check as a new osa user? | 21:11 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!