dviroel|rover | master promoted, reverting | 00:08 |
dviroel|rover | current-tripleo/2022-08-26 00:02 | 00:08 |
rlandy | nice | 01:05 |
rlandy | train is promoting | 01:05 |
*** rlandy is now known as rlandy|out | 01:33 | |
*** pojadhav|out is now known as pojadhav | 01:35 | |
*** dviroel|rover is now known as dviroel|out | 01:37 | |
*** ysandeep|out is now known as ysandeep | 02:21 | |
ysandeep | good morning oooci o/ | 02:25 |
ysandeep | rlandy|out, regarding sc01 slowness card.. its not only sc01 - all the rdo jobs are slow | 03:28 |
ysandeep | https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-standalone-wallaby/bbc1dd7/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz | 03:28 |
ysandeep | took ~30 mins: 2022-08-25 18:51:02.273791 | fa163e1b-009f-a236-db4b-000000000889 | TIMING | tripleo_firewall : Manage firewall rules | standalone | 0:29:08.429153 | 1684.59s | 03:28 |
ysandeep | Simply running: iptables -t filter -L INPUT takes ~40 seconds | 03:28 |
ysandeep | iptables -t filter -L INPUT -n returns quite quickly | 03:29 |
ysandeep | I am debugging with takashi | 03:29 |
ysandeep | https://www.fir3net.com/UNIX/Linux/iptables-l-output-displays-slowly.html | 03:29 |
ysandeep | https://serverfault.com/questions/791911/centos-extremely-slow-dns-lookup suggests we remove 127.0.0.1 from resolv.conf entry and that resolved the issue | 03:55 |
ysandeep | looks like there was a historical reason they added 127.0.0.1 : https://opendev.org/openstack/tripleo-ci/commit/5d612318c95b9b3ff78e66e91d5c225274cb8b09 | 04:06 |
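A minimal sketch of the diagnosis and workaround discussed above: `iptables -L` reverse-resolves every address and hangs when resolv.conf points at a resolver that is not answering (127.0.0.1 with no local unbound), while `-n` skips DNS entirely. The sample file below stands in for /etc/resolv.conf so the sketch is safe to run, and 10.0.0.2 is a placeholder resolver; this is not the actual CI change.

```shell
# -L reverse-resolves each rule's addresses; with a dead 127.0.0.1 resolver
# every lookup waits for a timeout, hence the ~40s. -n prints numeric output:
#   iptables -t filter -L INPUT      # slow
#   iptables -t filter -L INPUT -n   # fast
# Workaround: drop the stale loopback nameserver entry.
cat > /tmp/resolv.conf.sample <<'EOF'
nameserver 127.0.0.1
nameserver 10.0.0.2
EOF
sed -i '/^nameserver 127\.0\.0\.1$/d' /tmp/resolv.conf.sample
cat /tmp/resolv.conf.sample
```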
*** soniya29|ruck is now known as soniya29 | 04:56 | |
ysandeep | finally, we know what the issue is: the unbound service is not present in RDO images | 06:04 |
ysandeep | that's the difference with Upstream.. causing slowness in dns resolution | 06:05 |
ysandeep | let me check with infra and save ~40 min on each of our job run | 06:05 |
ysandeep | fyi.. in case someone wants to know the details https://bugs.launchpad.net/tripleo/+bug/1983718/comments/10 | 06:06 |
ysandeep | updated: https://bugs.launchpad.net/tripleo/+bug/1983718 and https://trello.com/c/R6SuOv6E/2661-cixlp1983718tripleociproa-periodic-master-scen1-standalone-fails-timeout-manage-firewall-rules | 06:25 |
*** pojadhav is now known as pojadhav|ruck | 06:47 | |
*** jm1|ruck is now known as jm1|rover | 06:48 | |
jm1 | happy friday #oooq ysandeep pojadhav|ruck :) | 06:48 |
pojadhav|ruck | jm1, happy friday you too :) | 06:48 |
ysandeep | jm1, good morning o/ | 06:49 |
frenzyfriday | hey, does anyone know this error: SELinux relabeling of /etc/pki is not allowed - I am getting this while bringing the telegraf container up in downstream. I have already set selinux on the host to permissive | 07:33 |
frenzyfriday | I have tried adding privileged: true to the docker compose | 07:36 |
jm1 | frenzyfriday: so the line this is coming from is https://github.com/rdo-infra/ci-config/blob/63b70523433c31df47eb5cddef19a7a3e9c96a2d/ci-scripts/infra-setup/roles/rrcockpit/files/docker-compose.yml#L87 | 07:44 |
jm1 | frenzyfriday: question is, why do we have this volume in the first place? | 07:44 |
frenzyfriday | yep, it says we are not supposed to mount /etc and some other dirs | 07:45 |
jm1 | frenzyfriday: i guess the only reason is that we want to pass the rh internal ca cert to the container? | 07:45 |
frenzyfriday | We need this to get the certs to connect to downstream | 07:45 |
frenzyfriday | I was wondering, if mounting does not work, probably we have to copy the certs into the container itself in the Dockerfile | 07:45 |
frenzyfriday | But how did it work for the c7 internal cockpit vm? | 07:46 |
frenzyfriday | But in this case ^ whenever the certs change , we will have to rebuild the container | 07:46 |
jm1 | frenzyfriday: c7 had docker which behaves differently from podman in c8/9 in some parts | 07:46 |
chandankumar | frenzyfriday: regarding selinux relabelling issue https://github.com/containers/podman/issues/2379#issuecomment-466770671 might help | 07:48 |
jm1 | frenzyfriday: why not download the ca cert in infra-setup playbook (only for incockpit) to both /etc/pki and another place and then mount it into the container? | 07:48 |
frenzyfriday | yeah, that might work! /m tries | 07:49 |
jm1 | chandankumar, frenzyfriday: from security side this is not a good idea https://bugzilla.redhat.com/show_bug.cgi?id=1594485#c4 | 07:49 |
jm1 | chandankumar, frenzyfriday: we are running containers as root hence we could face exactly the issue described in bz | 07:50 |
*** jpena|off is now known as jpena | 07:52 | |
jm1 | frenzyfriday: we have to add the "ca download step" (to /etc/pki on incockpit) as a task to the ansible playbook/role anyway. so copying it to a safe location as well shouldnt be too hard | 07:53 |
ysandeep | frenzyfriday, jm1 try removing :z from https://github.com/rdo-infra/ci-config/blame/63b70523433c31df47eb5cddef19a7a3e9c96a2d/ci-scripts/infra-setup/roles/rrcockpit/files/docker-compose.yml#L87 | 07:54 |
ysandeep | i think I added that because it was needed for C7 | 07:54 |
jm1 | ysandeep: and it probably still is required because we access https pages which are signed with rh internal ca | 07:55 |
ysandeep | keep the mount just remove :z , I mean revert this patch: https://github.com/rdo-infra/ci-config/commit/ef60d16d9a1ff86ec2087e82f1d26a1b80af77c8 | 07:56 |
ysandeep | not 100% sure that will fix the issue but worth a try. | 07:57 |
jm1 | ysandeep, frenzyfriday: very certainly the rh ca is still required because ruck rover script needs it to access internal pages | 07:57 |
jm1 | ysandeep, frenzyfriday: good point about removing :z, worth a try | 07:59 |
jm1 | ysandeep,frenzyfriday: although it would be nice to use selinux in enforcing mode instead of permissive mode | 08:00 |
ysandeep | +1, remove :z and turn selinux in enforcing mode | 08:02 |
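The suggestion above amounts to keeping the bind mount but dropping the relabel flag. Roughly, in the compose file (only the /etc/pki path comes from the linked line; the service name and the `:ro` flag are illustrative assumptions):

```yaml
services:
  rrcockpit:            # illustrative service name
    volumes:
      # was "- /etc/pki:/etc/pki:z" -- :z asks the runtime to relabel the
      # host directory, which SELinux refuses for /etc/pki
      - /etc/pki:/etc/pki:ro
```

Dropping `:z` keeps the host's SELinux labels intact, which is why it also works with enforcing mode.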
*** ysandeep is now known as ysandeep|away | 08:18 | |
frenzyfriday | containers are up \o/ | 08:29 |
frenzyfriday | ysandeep|away, jm1, chandankumar thank you guys! | 08:29 |
* frenzyfriday now tweaking nginx | 08:29 | |
jm1 | frenzyfriday: awesome! | 08:29 |
chandankumar | yw :-) | 08:32 |
chandankumar | jm1: thank you for the bz link. :-) | 08:33 |
*** pojadhav is now known as pojadhav|ruck | 08:57 | |
*** pojadhav is now known as pojadhav|ruck | 09:03 | |
jm1 | pojadhav|ruck, rlandy|out: rr notes are up to date now. they are now also more or less in sync with cix cards, ordering of known bugs matches "Prodchain Blocked" lane in cix trello board. | 09:16 |
pojadhav|ruck | jm1, I have started looking at upstream components | 09:17 |
jm1 | pojadhav|ruck, rlandy|out: does not look too bad: c8 train compute is 6 days out, c9 wallaby security is 5 days out, the rest is less than 4 days out | 09:17 |
jm1 | * less or equal | 09:18 |
soniya29 | pojadhav|ruck, jm1, let me know when can we have sync-up? | 09:19 |
jm1 | soniya29, pojadhav|ruck: do we want to wait for rlandy|out? i guess she will be here soon | 09:20 |
pojadhav|ruck | yeah.. we can wait 1 more hr | 09:21 |
pojadhav|ruck | then lets sync | 09:21 |
soniya29 | pojadhav|ruck, jm1, okay | 09:21 |
*** pojadhav|ruck is now known as pojadhav|lunch | 09:22 | |
jm1 | pojadhav: i will start rerunning jobs for upstream | 09:22 |
* pojadhav|lunch will be back in half hr | 09:22 | |
pojadhav|lunch | jm1, sure | 09:22 |
pojadhav|lunch | jm1, based on cix card update I am rerunning validation master job here: https://review.rdoproject.org/r/c/testproject/+/40897 | 09:23 |
jm1 | pojadhav|lunch: ack | 09:24 |
frenzyfriday | incockpit: http://10.0.109.28/ finally | 09:41 |
frenzyfriday | The scripts need some change. I will create a patch | 09:42 |
jm1 | frenzyfriday: cool! where do you want to put internal tenant_vars? | 09:56 |
frenzyfriday | there is a base repo downstream, lemme check | 10:05 |
*** pojadhav- is now known as pojadhav|ruck | 10:23 | |
jm1 | pojadhav|ruck, rlandy|out: went through all failing rdo jobs and annotated failing component jobs in rr notes | 10:29 |
* jm1 lunch | 10:30 | |
*** rlandy|out is now known as rlandy | 10:30 | |
rlandy | jm1: thanks | 10:30 |
rlandy | let's sync when dviroel|out is in | 10:30 |
rlandy | pojadhav|ruck: ^^ | 10:30 |
rlandy | is there a new rr hackmd | 10:30 |
rlandy | https://hackmd.io/94uNoMlnQgegrgy1iXV1kQ - ok great | 10:30 |
rlandy | jm1: frenzyfriday: pojadhav|ruck: cockpit upstream is broken | 10:33 |
rlandy | out of date | 10:33 |
rlandy | http://dashboard-ci.tripleo.org/d/HkOLImOMk/upstream-and-rdo-promotions?orgId=1 | 10:33 |
rlandy | promotions are more recent than that | 10:33 |
rlandy | pojadhav|ruck: hello - you around? | 10:34 |
rlandy | chandankumar: hi | 10:34 |
chandankumar | rlandy: hello | 10:34 |
rlandy | arrived safe? | 10:34 |
chandankumar | yes | 10:34 |
chandankumar | thank you :-) | 10:34 |
pojadhav|ruck | rlandy, yes around | 10:35 |
rlandy | pojadhav|ruck: how are you and jm1 dividing the rr work? | 10:36 |
rlandy | are you working downstream? | 10:36 |
pojadhav|ruck | yes I started looking at d/stream, but I didn't find any major blocker, only rerunning jobs which are having single failures. now started looking at upstream components. | 10:37 |
rlandy | pojadhav|ruck: we will need to meet about the TC stuff when dasm gets in | 10:37 |
pojadhav|ruck | rlandy, yes | 10:37 |
rlandy | pojadhav|ruck: pls put notes and last promote dates on the rr notes like jm1 has done | 10:37 |
rlandy | downstream is in good shape | 10:37 |
pojadhav|ruck | rlandy, yep I will add | 10:37 |
rlandy | frenzyfriday: hi | 10:40 |
rlandy | pls ping when around | 10:40 |
rlandy | looks like upstream cockpit is not getting latest data | 10:40 |
frenzyfriday | rlandy, hi | 10:41 |
frenzyfriday | upstream also? :'( | 10:41 |
rlandy | frenzyfriday: hi - can you meet for a few? | 10:41 |
* frenzyfriday checks | 10:41 | |
frenzyfriday | yep sure | 10:41 |
rlandy | frenzyfriday: https://meet.google.com/tug-xmbn-rag?pli=1&authuser=0 | 10:41 |
rlandy | bhagyashris: hi - will need to meet with you next | 10:41 |
rlandy | pls ping when you are around | 10:42 |
rlandy | jm1: pls nicj | 10:47 |
rlandy | nick | 10:47 |
rlandy | jm1: pojadhav|ruck: promotions are in better shape than that - the cockpit is not getting latest data - spoke with frenzyfriday - pls check dlrn for your data | 10:47 |
pojadhav|ruck | ack | 10:48 |
*** ysandeep|away is now known as ysandeep | 10:53 | |
rlandy | bhagyashris: thanks for https://code.engineering.redhat.com/gerrit/c/openstack/tripleo-ci-internal-config/+/426246 - merging | 10:56 |
rlandy | we will need a zuul restart to get the line to show | 10:56 |
rlandy | bhagyashris: can you add 17.1 on rhel8 to the promoter? | 11:00 |
chandankumar | ysandeep+++++++++++++++++++++ | 11:04 |
ysandeep | chandankumar++ thanks for suggestions on c9 dib image :D | 11:07 |
ysandeep | reviewbot: please add in review list: https://review.opendev.org/c/openstack/tripleo-ci/+/854751 | 11:07 |
reviewbot | I could not add the review to Review List | 11:07 |
rlandy | ysandeep: chandankumar: ok - will merge | 11:07 |
bhagyashris | rlandy, hey i am around... | 11:14 |
rlandy | bhagyashris: hi ... review time now | 11:15 |
rlandy | pls join that - will chat with you afterwards | 11:15 |
bhagyashris | sure | 11:15 |
bhagyashris | joining... | 11:15 |
bhagyashris | folks review time... | 11:17 |
bhagyashris | rlandy, https://review.opendev.org/c/openstack/tripleo-heat-templates/+/852446/ | 11:21 |
*** ysandeep is now known as ysandeep|afk | 11:23 | |
bhagyashris | rlandy, https://hackmd.io/kqJ-XcUeQ_24bjQ7pjTVJQ | 11:23 |
*** dviroel|out is now known as dviroel | 11:23 | |
rlandy | akahat: hey - https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-9-multinode-mixed-os&pipeline=openstack-periodic-integration-stable1-cs8&skip=0 | 11:27 |
rlandy | failing | 11:27 |
rlandy | can you look into that | 11:27 |
rlandy | ysandeep|afk: dviroel: you should be unblocked now on resources | 11:30 |
rlandy | dviroel: good morning | 11:30 |
rlandy | jm1: pojadhav|ruck: soniya: dviroel:pls ping when you are all around so we can rr sync | 11:30 |
ysandeep|afk | rlandy++ wohoo awesome \o/ | 11:31 |
ysandeep|afk | thanks! | 11:31 |
dviroel | rlandy: nice o/ | 11:32 |
rlandy | dviroel: fyi - cockpit is not updating | 11:32 |
rlandy | so last nights promos are not showing | 11:32 |
frenzyfriday | there is a problem with the podman networking in the incockpit. the containers are up but they cannot communicate with each other | 11:32 |
rlandy | which is why jm1 thinks things are worse than they are | 11:33 |
rlandy | dviroel: did we promote w c9 security component? | 11:33 |
dviroel | yeah, i see a comment about that above | 11:33 |
dviroel | yes, it is promoted | 11:33 |
rlandy | dviroel++ | 11:33 |
rlandy | so what's out is w c8 integ and w c9 | 11:33 |
rlandy | ok - we can deal with those when jm1 and pojadhav|ruck sync happens | 11:33 |
dviroel | https://trunk.rdoproject.org/centos9-wallaby/component/security/ | 11:33 |
dviroel | rlandy: is this dns fix for vexx affecting everything | 11:34 |
dviroel | ? | 11:34 |
pojadhav|ruck | rlandy, i am available for the sync, will wait for others | 11:35 |
ysandeep|afk | frenzyfriday, podman networking -- hmm are we starting containers as root? | 11:36 |
*** ysandeep|afk is now known as ysandeep | 11:36 | |
frenzyfriday | ysandeep, yep | 11:36 |
ysandeep | I think I know what's the issue and possible workaround, fetching.. | 11:37 |
frenzyfriday | ysandeep, even if I do docker exec it keeps throwing me out | 11:37 |
jm1 | rlandy: nick what? | 11:38 |
jm1|rover | rlandy: i am here | 11:38 |
jm1 | rlandy: me too | 11:38 |
rlandy | ok- think we are all here | 11:39 |
ysandeep | frenzyfriday, read this: https://bugzilla.redhat.com/show_bug.cgi?id=2091840 | 11:39 |
rlandy | dviroel: jm1|rover: pojadhav|ruck: soniya: https://meet.google.com/ocx-ttaj-zbe?pli=1&authuser=0 | 11:39 |
ysandeep | frenzyfriday, run this as root - ip link | grep mtu | 11:40 |
ysandeep | and see if podman bridge have higher mtu than eth0? | 11:40 |
ysandeep | frenzyfriday, let me know if you want to discuss over a gmeet? | 11:42 |
frenzyfriday | ysandeep, yep, cni-podman1: 1500, eth0 1450 | 11:43 |
frenzyfriday | ysandeep, yep sure: https://meet.google.com/sug-xzxz-xvn | 11:44 |
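The check ysandeep suggests, sketched against the values just reported; the heredoc stands in for live `ip link` output so the sketch runs anywhere:

```shell
# Extract interface name and MTU from `ip link`-style output. A bridge MTU
# larger than the uplink's (cni-podman1 1500 > eth0 1450) means container
# packets can exceed what the tenant network carries and get dropped, which
# matches the "containers can't talk to each other" symptom.
mtus="$(awk '/mtu/ {for (i=1;i<=NF;i++) if ($i=="mtu") print $2, $(i+1)}' <<'EOF'
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP
3: cni-podman1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
EOF
)"
echo "$mtus"
# Fix per the BZ above: pin the bridge "mtu" in the CNI network config so it
# matches eth0, then recreate the podman network.
```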
akahat | rlandy, looking.. | 11:56 |
chandankumar | I saw some chatter around downstream promoter , Does it sorted out? | 11:58 |
chandankumar | *it got | 11:58 |
*** ysandeep is now known as ysandeep|afk | 12:05 | |
frenzyfriday | chandankumar, nope. Containers are up - but now they cannot communicate between each other | 12:06 |
rlandy | jm1: we missed our 1-1 | 12:15 |
jm1 | rlandy: want to do it in 5mins? | 12:15 |
rlandy | scheduling that for next week when you are off rr | 12:15 |
rlandy | can't have other meetings today | 12:15 |
jm1 | rlandy: ok sure | 12:17 |
jm1 | dviroel, soniya: thanks for your rr notes, that really helped us getting started :) | 12:19 |
jm1 | frenzyfriday: need help with upstream cockpit? | 12:20 |
dviroel | jm1: pojadhav|ruck: testing cloudops fix here https://review.rdoproject.org/zuul/status#44652 | 12:21 |
pojadhav|ruck | dviroel, ack | 12:21 |
jm1 | pojadhav|ruck: will sync our rr notes against cix board again and then update promotion dates in rr notes | 12:24 |
pojadhav|ruck | jm1, sure | 12:24 |
pojadhav|ruck | d/stream one already update based on dlrn data | 12:24 |
pojadhav|ruck | updated* | 12:24 |
pojadhav|ruck | promotion dates | 12:24 |
jm1 | pojadhav|ruck: ack, thanks! | 12:25 |
frenzyfriday | jm1, hey, do you know if there is an ansible role for podman-compose like https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/roles/rrcockpit/tasks/start_services.yml#L16 ? | 12:27 |
frenzyfriday | I can see containers.podman.podman_container but it does not accept compose files | 12:27 |
jm1 | frenzyfriday: i dont think so, this does not show anything https://github.com/containers/ansible-podman-collections/tree/master/plugins/modules | 12:28 |
jm1 | frenzyfriday: podman has kube files. but iirc one can use docker compose with podman. are you running in issues with that? | 12:29 |
jm1 | frenzyfriday: we can debug together if you want | 12:29 |
frenzyfriday | yes, if we have podman installed and try to set up the containers with Dockerfile + docker-compose, then the containers cannot communicate with each other. But if I use podman-compose up it works fine. Now our playbook uses the docker-compose module and passes the compose file to it. We need to do it through podman-compose | 12:30 |
frenzyfriday | jm1, you are RRing this week. I'll try to set up the incockpit manually for now (the compose part) and we can check after your RR | 12:31 |
jm1 | frenzyfriday: whatever you prefer ^^ | 12:31 |
frenzyfriday | I am also off next week | 12:31 |
jm1 | frenzyfriday: clean solution would be switching to kube files on podman | 12:32 |
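Since no containers.podman module accepts a compose file, one stopgap (short of switching to kube files) is shelling out to podman-compose from the playbook. A hypothetical task sketch; the task name and chdir path are illustrative, not the actual start_services.yml change:

```yaml
# Stopgap replacing the docker_compose module call: invoke podman-compose
# directly, since the collection has no compose-file module.
- name: Bring up rrcockpit services with podman-compose
  ansible.builtin.command:
    cmd: podman-compose -f docker-compose.yml up -d
    chdir: /opt/rrcockpit
  become: true
  changed_when: true
```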
jm1 | frenzyfriday: what about upstream cockpit? | 12:32 |
jm1 | frenzyfriday: do you need help there? is it working again? | 12:32 |
frenzyfriday | jm1, no, did not get to the upstream yet | 12:33 |
jm1 | frenzyfriday: ack | 12:33 |
frenzyfriday | jm1, ysandeep|afk for the upstream cockpit I see an error in the telegraf container logs: https://paste.opendev.org/show/bHgnf4CZDrzgkJart8bD/ | 12:36 |
frenzyfriday | maybe thats why we arent getting new content? | 12:36 |
frenzyfriday | Error in plugin: metric parse error: expected tag at 1:127: "zuul-queue-status,url=https://softwarefactory-project.io/zuul/api/tenant/rdoproject.org/status,pipeline=openstack-check,queue=,job=tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-wallaby,review=852845,patch_set=1 result=\"None\",enqueue_time=1661511140489,enqueued_time=103.6614,result_code=-1" | 12:37 |
frenzyfriday | 2022-08-26T12:36:05Z D! [outputs.influxdb] Wrote batch of 1 metrics in 4.045052ms | 12:37 |
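The parse error pasted above points at an empty tag value: influx line protocol rejects `queue=,` (a tag key with no value). A producer-side sketch of a fix -- strip empty tags before writing the metric; the sed expression and the shortened sample line are illustrative, not the actual ruck_rover.py change:

```shell
# "expected tag at 1:127" is the parser stopping at the "queue=," tag.
line='zuul-queue-status,url=https://example/status,pipeline=openstack-check,queue=,job=myjob result="None"'
# Drop any ",key=," pair (tag with empty value) from the tag set.
fixed="$(printf '%s\n' "$line" | sed -E 's/,[a-z_]+=,/,/g')"
echo "$fixed"
```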
ysandeep|afk | dasm|off, ^^ | 12:37 |
ysandeep|afk | frenzyfriday, are you seeing that error continuously or was it only once? | 12:40 |
frenzyfriday | ysandeep|afk, repeating | 12:44 |
frenzyfriday | ysandeep|afk, https://paste.opendev.org/show/bfphcqOi78l7eOWqC2fK/ | 12:44 |
* pojadhav|ruck brb | 12:44 | |
ysandeep|afk | frenzyfriday, probably related to some recent changes, I will let dasm|off take a look otherwise I can take a look on my Monday morning. | 12:47 |
frenzyfriday | ysandeep|afk, cool, thanks! | 12:47 |
frenzyfriday | rlandy, ^ related to upstream cockpit | 12:47 |
rlandy | frenzyfriday: probably last few merged changes on rr tool | 12:48 |
rlandy | soniya also noticed some data being off lately on rr tool | 12:49 |
rlandy | dasm|off: ^^ when you are in, pls look at these errors | 12:49 |
rlandy | pojadhav|ruck: hi - can you start looking at the node provision failure on check? | 12:51 |
pojadhav|ruck | rlandy, sure | 12:53 |
jm1 | frenzyfriday, ysandeep|afk, dasm|off: docker containers such as telegraf_py3 are not automatically updated when ruck_rover.py script is changed. for example, on upstream cockpit the telegraf container is 3 month old, but the latest change to ruck_rover.py is 4 days old | 12:53 |
jm1 | rlandy: ^ | 12:53 |
ysandeep|afk | soniya, could you please elaborate what issues you are seeing on rr tool? | 12:55 |
jm1 | frenzyfriday, ysandeep|afk, dasm|off, rlandy: docker-compose up tries to be smart and recreates containers when docker-compose.yml has been altered, but docker-compose will NOT watch for updates of Dockerfile(s) or files added in those Dockerfile(s) | 12:56 |
ysandeep|afk | jm1, :( we need a way to rebuild container for changes in files. | 12:57 |
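Until the infra code handles this, the manual escape hatch is forcing an image rebuild, which docker-compose never does on its own when only a Dockerfile or a file it COPYs (like ruck_rover.py) changes. A command sketch; the service name telegraf_py3 comes from the chat, the flags are standard docker-compose, and this is not run here:

```shell
# Recreate-on-yml-change does NOT imply rebuild-on-Dockerfile-change.
# Force the rebuild explicitly:
docker-compose build --no-cache telegraf_py3
docker-compose up -d telegraf_py3
# or in one step:
docker-compose up -d --build telegraf_py3
```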
rlandy | yeah - we need to rethink this | 12:57 |
rlandy | also - how many changes are going in rr tool | 12:57 |
rlandy | if they are all needed | 12:57 |
jm1 | ysandeep|afk, rlandy: yeah. how about starting to sync our ansible code in infra-setup with our existing infra? ;) | 12:57 |
soniya | ysandeep|afk, rlandy, sorry i missed your pings | 12:58 |
soniya | ysandeep|afk, the status shown with the rr tool was not so correct, like for a pipeline the rr tool shows status=RED but the pipeline didn't have so many blockers as such | 12:59 |
ysandeep|afk | soniya, by chance do you still have the output/if you are still seeing somewhere? | 13:00 |
soniya | ysandeep|afk, nope, i noticed this while filling the program call doc | 13:02 |
ysandeep|afk | ack, if even one job is failing, whether it's in criteria or not, then red is correct | 13:02 |
jm1 | soniya: what do you mean with "pipeline did not have so many blockers"? are you checking the pipeline with cockpit? | 13:02 |
ysandeep|afk | so that ruck/rover will check the failing job | 13:02 |
* ysandeep|afk goes in a mtg | 13:02 | |
soniya | jm1: we are discussing about rr tool | 13:03 |
rlandy | pojadhav|ruck: 1-1?? | 13:05 |
jm1 | soniya: yeah and in that context i am wondering what you took as a comparison for rr tool to find out that rr tool is broken. i am asking that because maybe we have issues somewhere else as well | 13:05 |
soniya | ysandeep|afk, jm1, so IMHO i think the status shown in rr tool should also consider the latest promotion happened and not just failing criteria jobs, because in our case, we got promotion 2 days back and status shown was 'RED'. just to mention we didn't skip any of jobs as far as i remember | 13:09 |
soniya | <ysandeep|afk> so that ruck/rover will check the failing job - for this we already have rr tool showing the testproject ready for them, that should be enough, right? | 13:10 |
frenzyfriday | hey folks, the CRE team wanted to know if we have rocky linux running any of the jobs of tripleo ci? | 13:11 |
frenzyfriday | afuscoar, ^ | 13:11 |
*** rcastillo|rover is now known as rcastillo | 13:15 | |
rcastillo | o/ | 13:19 |
rcastillo | don't know why my bouncer wants me to be rover :( | 13:19 |
soniya | <ysandeep|afk> soniya, by chance do you still have the output/if you are still seeing somewhere? - may be new ruck/rovers can confirm this | 13:25 |
soniya | if time permits then | 13:25 |
bhagyashris | rlandy, add jobs in criteria https://code.engineering.redhat.com/gerrit/c/tripleo-environments/+/426268 and added 17.1 on rhel8 promotion https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/44663 | 13:29 |
ysandeep|afk | rlandy, mtg time | 13:31 |
rlandy | bhagyashris: thanks - will look after meeting | 13:31 |
jm1 | soniya: maybe rr tool is right and our cockpit is wrong? we having issues with cockpit atm | 13:42 |
soniya | jm1, ack | 13:51 |
jm1 | frenzyfriday: time for a chat? | 13:52 |
frenzyfriday | jm1, yep sure | 13:53 |
pojadhav|ruck | rlandy, dasm|off still not in yet :( | 14:00 |
rlandy | pojadhav|ruck: let's wait for him | 14:02 |
rlandy | pojadhav|ruck: ping me when you are eod | 14:02 |
pojadhav|ruck | rlandy, i was about to eod by 1:30 pm UTC, now it's 2 pm UTC :D | 14:04 |
*** dasm|off is now known as dasm | 14:04 | |
dasm | o/ | 14:04 |
pojadhav|ruck | omg | 14:04 |
pojadhav|ruck | dasm here :D | 14:04 |
pojadhav|ruck | I am waiting for you only dasm | 14:05 |
dasm | pojadhav|ruck: where? i joined a call, but no one is here | 14:05 |
pojadhav|ruck | dasm joining | 14:05 |
dasm | ack | 14:05 |
*** njohnston_ is now known as njohnston | 14:05 | |
pojadhav|ruck | rlandy, TC stuff ? | 14:06 |
dasm | 12:53 <jm1> frenzyfriday, ysandeep|afk, dasm|off: docker containers such as telegraf_py3 are not automatically updated when ruck_rover.py script is changed. | 14:08 |
dasm | jm1: hmm. i was told it's autobuilt. i don't remember who said that. | 14:09 |
dasm | that's a bad thing. | 14:09 |
rlandy | joining | 14:10 |
dasm | 12:37:30 frenzyfriday | 2022-08-26T12:36:05Z D! [outputs.influxdb] Wrote batch of 1 metrics in 4.045052ms | 14:19 |
dasm | i'm gonna check that, but if we're not deploying new code, we might not have newer changes | 14:19 |
dasm | ysandeep|afk: ^ | 14:19 |
jm1 | dasm: i am on upstream cockpit right now, trying to rebuild telegraf | 14:20 |
dasm | jm1: ack | 14:21 |
ysandeep|afk | dasm, true | 14:21 |
*** ysandeep|afk is now known as ysandeep|out | 14:22 | |
frenzyfriday | folk, lemme know if http://tripleo-cockpit.lab4.eng.bos.redhat.com/?orgId=1 is working for you | 14:24 |
frenzyfriday | I have changed the centos9 image to 8 for telegraf. thanks jm1 for the suggestion! The script will need a lot of work which will take time | 14:25 |
dviroel | frenzyfriday: works for me, i see 17.1 rhel 9 there, no data yet but, at least, the board is there now | 14:26 |
afuscoar | frenzyfriday: in my case the downstream dashboard I've created is there http://tripleo-cockpit.lab4.eng.bos.redhat.com/d/tbUsg0Z4k/downstream-data?orgId=1 | 14:28 |
afuscoar | Just no information to populate. I guess bc telegraf problem | 14:28 |
frenzyfriday | afuscoar, hm.. lemme check what happened to the data | 14:29 |
*** jpena is now known as jpena|off | 14:29 | |
afuscoar | Idk if the script is located in the right path | 14:31 |
chandankumar | see ya people | 14:36 |
chandankumar | happy weekend :-) | 14:37 |
rlandy | frenzyfriday: http://tripleo-cockpit.lab4.eng.bos.redhat.com/d/bSwsg0WVz/rhel9-rhos17-1-full-component-pipeline - yep | 14:37 |
rlandy | no data yet on the downstream side | 14:37 |
afuscoar | Oh, it's also happening in other dashboards | 14:37 |
afuscoar | happy weekend chandankumar | 14:37 |
rlandy | frenzyfriday++++++++++++++ | 14:38 |
rlandy | jm1++++++++++++++++++ | 14:38 |
rlandy | thank you both | 14:38 |
rlandy | dasm: when you are done with UA stuff, can you look into the upstream cockpit error with frenzyfriday? | 14:39 |
rlandy | hasn't collected new data | 14:39 |
dasm | rlandy: it takes few minutes to gather data, but i'll check that one | 14:39 |
dviroel | jm1: I added you to the cloudops space, you might receive tons of notifications, but you can disable them | 14:40 |
dviroel | jm1: we are discussing cloudops issue there since yesterday | 14:40 |
jm1 | dviroel: ack ok thanks! | 14:41 |
dasm | frenzyfriday: i logged into lab4 instance and manually ran ruck_rover.py --influx. I see results. They should be landing in the cockpit soon | 14:43 |
jm1 | dasm: frenzyfriday has fixed downstream (manually) | 14:43 |
frenzyfriday | dasm, ack, yeah the logs say it is still pulling data | 14:43 |
* frenzyfriday lunch | 14:43 | |
jm1 | rlandy, pojadhav|ruck: upstream cockpit is currently refreshing but might still be broken. you can use downstream cockpit for the moment, frenzyfriday has fixed it | 14:44 |
rlandy | jm1: frenzyfriday: thank you | 14:44 |
rlandy | jm1: putting in patch to chase wallaby c9 | 14:45 |
jm1 | rlandy: thank you! put my latest update to rr notes. | 14:45 |
jm1 | rlandy, dasm, ysandeep|out: upstream cockpit has several issues which we fixed manually for now, e.g. mtu was wrong which prevented container rebuilds for months. (this actually should have caused issues since the beginning of that vm on vexxhost). anyway, we will fix infra code next week. i am eod for today | 14:48 |
rlandy | jm1: ok - have wallaby c8 and c9 in rerun | 14:48 |
dasm | jm1: o/ have a good weekend mate | 14:48 |
rlandy | jm1: asked dasm to add a new eoic | 14:48 |
rlandy | epic | 14:48 |
rlandy | to track needed cockpit changes | 14:48 |
dasm | it's gonna be EPIC! | 14:48 |
rlandy | to take up next sprint | 14:48 |
rlandy | or this one - if anyone has time | 14:49 |
dasm | EPIC sprint for EPIC epic with EPIC task :) | 14:49 |
rlandy | jm1: dasm: pls log these tasks there | 14:49 |
dasm | (sorry, i'm in a weekend mode atm) | 14:49 |
rlandy | so we capture what needs to be done | 14:49 |
rlandy | jm1: have a good weekend | 14:49 |
jm1 | rlandy, dasm: ack. dasm please add me to 'Watchers' on jira cards (in case you are creating new ones for infra) | 14:51 |
dasm | jm1: ack. cc pojadhav|ruck | 14:51 |
* jm1 have a nice weekend #oooq | 14:53 | |
pojadhav|ruck | rlandy, leaving for the day | 14:54 |
rlandy | frenzyfriday: huge thank you for dealing with all this | 14:54 |
rlandy | frenzyfriday+++++++++++++++++++++++++++ | 14:54 |
pojadhav|ruck | did most of the JIRA stuff with dasm | 14:54 |
rlandy | pojadhav|ruck: dasm: thank you both!!! | 14:55 |
rlandy | Team is rocking out!!!! | 14:55 |
* pojadhav|ruck out | 14:55 | |
*** pojadhav|ruck is now known as pojadhav|out | 14:55 | |
pojadhav|out | see you all on monday !! | 14:55 |
pojadhav|out | bbye | 14:55 |
rlandy | have a great weekend | 14:56 |
*** dviroel is now known as dviroel|lunch | 15:00 | |
Tengu | chandankumar: heya! I think there's an issue with the way molecule are launched in tripleo-ansible - it seems to limit to the "default" scenario... how can I make sure it's running *all* ? | 15:02 |
rlandy | Tengu: you missed chandankumar - is it urgent? | 15:13 |
Tengu | rlandy: not really - just making tripleo-ansible molecule tests less reliable. | 15:13 |
Tengu | I'll check next week. | 15:13 |
Tengu | but basically, this explains why molecule tests are all green for my tripleo_httpd_vhost role, while it's NOT working :). | 15:14 |
Tengu | "woops". | 15:14 |
rlandy | yep - too good to be true type thing | 15:15 |
Tengu | exactly. | 15:15 |
Tengu | I was testing my role "in real situation" and it crashes with a template thingy that should have crashed zuul. | 15:16 |
Tengu | it didn't. | 15:16 |
Tengu | so I hunted down the molecule report and, wow, only one test was launched :). the "default" scenario. «woopsss» | 15:16 |
rlandy | :) | 15:16 |
rlandy | ok - so tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001 works | 15:17 |
rlandy | but c9 doesn't provision nodes .... fhmmm | 15:17 |
rlandy | last success 08/24 | 15:18 |
rlandy | charming | 15:18 |
Tengu | rlandy: I may have a working change for the tripleo-ansible molecule thingy. | 15:43 |
Tengu | testing on my env. | 15:43 |
Tengu | it's "meh", but it allows to run all the scenarios. | 15:44 |
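For the record, molecule only runs the `default` scenario unless told otherwise; the standard CLI way to cover everything (flags are molecule's own, the scenario name is illustrative):

```shell
molecule test --all           # run every scenario, not just "default"
molecule test -s my_scenario  # or target one scenario by name (illustrative)
```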
*** frenzyfriday is now known as frenzyfriday|pto | 15:48 | |
*** dviroel|lunch is now known as dviroel | 16:10 | |
rlandy | dasm: hi | 16:22 |
rlandy | when was the last working centos image you could use for bmc? | 16:22 |
rlandy | date of that image? | 16:22 |
dasm | rlandy: 20220606.0 next one which fails is 20220621.1 | 16:23 |
rlandy | ok it's not that | 16:24 |
dasm | > CentOS-Stream-GenericCloud-9-20220606.0. Following releases, none of them starts. | 16:24 |
rlandy | yeah - I think we need to promote master | 16:27 |
rlandy | current-tripleo hash is from 08/22 | 16:28 |
rlandy | we need 08/25 or later | 16:28 |
dviroel | rlandy: on 08/25 we already have cloudops bug? | 16:41 |
dviroel | yes right, it was yesterday | 16:41 |
dviroel | :P | 16:41 |
*** rcastillo|rover is now known as rcastillo | 16:55 | |
dviroel | rcastillo: do you miss rover times? | 16:55 |
dviroel | :) | 16:55 |
rcastillo | every day I miss it | 16:57 |
rlandy | rcastillo: we can always put you back on :) | 17:16 |
rcastillo | it'd be unfair not to let others in on the fun :) | 17:16 |
rlandy | you're so considerate :) | 17:16 |
dasm | rcastillo: i believe others don't mind to allow you to have all fun ;) | 17:20 |
dasm | *all the fun | 17:20 |
rcastillo | you're just saying that to be nice | 17:23 |
dviroel | rlandy: that tempest test failure on wallaby c8 seems to be a real issue? | 18:12 |
dviroel | not first time? | 18:12 |
rlandy | dviroel: got one test failing on latest run | 18:12 |
rlandy | not repeated | 18:12 |
rlandy | running again | 18:12 |
rlandy | to compare | 18:12 |
rlandy | different hash | 18:13 |
dviroel | i think I saw that failing yesterday, not sure which hash | 18:13 |
rlandy | dviroel: wrt node prov failure - I really think we need to promote a more recent version of master | 18:13 |
rlandy | dviroel: that was a different hash - and a different test | 18:13 |
dviroel | ok | 18:13 |
rlandy | following both hashes | 18:13 |
rlandy | dviroel: for node prov - you promoted hash from 08/22 I think | 18:14 |
dviroel | rlandy: yeah, but we are blocked due to scenario001 | 18:14 |
rlandy | we need 08/25 | 18:14 |
rlandy | correct | 18:14 |
rlandy | so once that is cleared | 18:14 |
dviroel | cloudops is out already | 18:14 |
dviroel | their fix didn't work | 18:15 |
dviroel | they use Spaces to chat, I can add you if you want | 18:15 |
dviroel | if you want notifications on your gmail | 18:15 |
dviroel | :) | 18:16 |
rlandy | now looking at wallaby c9 | 18:36 |
rlandy | ugh - I give up - skipping fs001 to promote wallaby c8 | 19:58 |
rlandy | dviroel: ^^ https://logserver.rdoproject.org/54/36254/156/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-wallaby/e4928dd/logs/undercloud/var/log/tempest/stestr_results.html.gz - best result I can get | 19:59 |
rlandy | skipping and promoting this hash | 19:59 |
rlandy | ugh - tempest you win, I give up | 19:59 |
dviroel | rlandy: you have my vote on this | 20:03 |
rlandy | dviroel: will need - patch in progress -one sec | 20:04 |
rlandy | dviroel: ok - https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/44664 - here | 20:06 |
rlandy | dviroel: thank you | 20:08 |
rlandy | openstack-periodic-integration-main - quite good | 20:10 |
rlandy | waiting on sc01 | 20:11 |
rlandy | dviroel: pls check vexx flavors are there | 20:11 |
rlandy | ticket says done | 20:11 |
dviroel | they missed one, but it is ok for now | 20:17 |
dviroel | will unblock our work | 20:17 |
dviroel | the missing flavor may or may not be needed | 20:17 |
dviroel | the one with extra memory | 20:17 |
dviroel | rlandy: added a new section on podified doc, about cluster provision | 20:18 |
rlandy | checking which one was missed | 20:18 |
* dviroel brb - walk | 20:26 | |
rlandy | dviroel: found which one is missing | 20:28 |
rlandy | responded on ticket | 20:28 |
rlandy | dviroel: dasm: rcastillo: one of you guys, pls check me on this before I merge: https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/44663 | 20:31 |
rlandy | from bhagyashris | 20:31 |
rcastillo | looking | 20:31 |
rlandy | t | 20:32 |
rlandy | ty | 20:32 |
rcastillo | looks good | 20:35 |
rcastillo | do depends-on work on rdo -> downstream gerrit? | 20:35 |
dasm | rlandy: lgtm | 20:39 |
dasm | but, there is one minor thing. i'm double checking | 20:40 |
dasm | ok, it's good. | 20:42 |
rlandy | thanks | 20:52 |
* dviroel back | 21:00 | |
* dasm => offline | 21:06 | |
dasm | see you Monday! | 21:06 |
*** dasm is now known as dasm|off | 21:07 | |
rlandy | bye dasm | 21:11 |
rlandy | dviroel++ thanks for onboarding help | 21:25 |
rlandy | still trying to promo c8 | 21:42 |
rlandy | patch keeps failing on a diff test | 21:42 |
dviroel | docker rate limiting | 21:43 |
rlandy | mumble | 21:43 |
rlandy | I pay $7 a month for us to have a paid account | 21:43 |
dviroel | haha | 21:43 |
dviroel | we just need to add container login on these jobs then | 21:44 |
dviroel | will solve that issue | 21:44 |
dviroel | I still need to move rdo to container-login role | 21:44 |
dviroel | i have a note about that, but it is good to create tasks | 21:45 |
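The fix dviroel mentions is logging in to the registry so the jobs stop hitting Docker Hub's anonymous pull limit. As a side illustration of why anonymous pulls flake, here is a hedged sketch of retrying a pull with exponential backoff; `pull` and `RateLimited` are hypothetical stand-ins, not part of any real client, and the real remedy discussed above is the container-login role, not retries.

```python
import time


class RateLimited(Exception):
    """Stand-in for a registry 429 'too many requests' error."""


def pull_with_backoff(pull, image, attempts=4, base_delay=1.0):
    """Call pull(image), backing off exponentially on rate-limit errors."""
    for attempt in range(attempts):
        try:
            return pull(image)
        except RateLimited:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * 2 ** attempt)


# Example: a pull that is rate-limited twice, then succeeds.
calls = []

def flaky_pull(image):
    calls.append(image)
    if len(calls) < 3:
        raise RateLimited()
    return "pulled:" + image

print(pull_with_backoff(flaky_pull, "quay.io/example/image", base_delay=0))
```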
rlandy | otherwise will try on sunday | 21:45 |
rlandy | when less people are active | 21:45 |
dviroel | are we missing wallaby c8 and c9? | 21:46 |
dviroel | how is wallaby c9? | 21:46 |
rlandy | don't ask | 21:51 |
rlandy | kvm job now passed | 21:51 |
rlandy | as did tempest | 21:51 |
rlandy | checking ovb | 21:51 |
rlandy | new hash just started | 21:52 |
rlandy | old one was out 64, 39, 1 | 21:52 |
rlandy | 35 | 21:52 |
rlandy | https://review.rdoproject.org/r/c/testproject/+/43883 | 21:52 |
rlandy | was my rerun | 21:52 |
rlandy | 2022-08-26 15:31:24.107091 | primary | TASK [print content of 'resolv.conf' after modifications] ********************** | 21:53 |
rlandy | 2022-08-26 15:31:24.107327 | primary | Friday 26 August 2022 15:31:24 -0400 (0:00:01.672) 0:15:50.294 ********* | 21:53 |
rlandy | 2022-08-26 15:31:24.133797 | primary | ok: [undercloud] => { | 21:53 |
rlandy | 2022-08-26 15:31:24.133846 | primary | "msg": "Content of resolv.conf: # Generated by NetworkManager\nsearch openstacklocal novalocal\nsearch ooo.test\nnameserver 10.0.0.250\n# NOTE: the libc resolver may not support more than 3 nameservers.\n# The nameservers listed below may not be recognized." | 21:53 |
rlandy | ha | 21:53 |
rlandy | that may be the change made yesterday | 21:54 |
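Earlier in the day the slow-DNS bug was traced to a loopback `nameserver` entry in resolv.conf with no local resolver behind it. A minimal sketch of the kind of check involved, parsing a resolv.conf blob like the one the Ansible task prints above (`loopback_nameservers` is a hypothetical helper, not part of the CI code):

```python
def loopback_nameservers(resolv_conf: str) -> list:
    """Return nameserver entries in a resolv.conf blob that point at localhost."""
    found = []
    for line in resolv_conf.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "nameserver" and parts[1].startswith("127."):
            found.append(parts[1])
    return found


# The content printed by the job above: only the external nameserver remains.
sample = (
    "# Generated by NetworkManager\n"
    "search openstacklocal novalocal\n"
    "search ooo.test\n"
    "nameserver 10.0.0.250\n"
)
print(loopback_nameservers(sample))  # -> []
```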
rlandy | diff overcloud deploy errors in multiple runs | 21:57 |
rlandy | on 064 | 21:57 |
rlandy | we're getting nowhere now | 22:16 |
dviroel | hum | 22:17 |
dviroel | yeah, 064 and 039 were like that in master too | 22:18 |
dviroel | did you try another cloud? ibm, internal? | 22:18 |
* dviroel it is happening, I see 3 control plane and 3 worker vms | 22:19 | |
dviroel | 7 instances if you add bootstrap | 22:19 |
dviroel | rlandy: i think that these jobs are testing you | 22:23 |
dviroel | rlandy: failing again | 22:23 |
* dviroel brb | 22:31 | |
*** dviroel is now known as dviroel|afk | 22:31 | |
*** dviroel|afk is now known as dviroel | 22:58 | |
* dviroel almost there with ocp cluster | 23:00 | |
dviroel | have a great weekend team? | 23:00 |
dviroel | s/?/! | 23:01 |
rlandy | bye all | 23:01 |
dviroel | o/ | 23:01 |
rlandy | have a great weekend | 23:01 |
*** dviroel is now known as dviroel|out | 23:01 |