*** dasm|off is now known as Guest1645 | 04:00 | |
*** jm1|ruck is now known as jm1|rover | 07:00 | |
jm1 | moin #oooq :) | 07:00 |
---|---|---|
chandankumar | jm1: o/ | 07:14 |
chandankumar | jm1: need any help on ruck rover, do let me know :-) | 07:14 |
jm1 | chandankumar: ok thanks! will ping you when i need help ^^ | 07:28 |
*** jpena|off is now known as jpena | 07:33 | |
jpodivin | reviewbot: https://review.rdoproject.org/r/c/rdo-jobs/+/44608 | 08:10 |
jpodivin | reviewbot: add to review list https://review.rdoproject.org/r/c/rdo-jobs/+/44608 | 08:19 |
reviewbot | I could not add the review to Review List | 08:19 |
jm1 | chandankumar: ./ruck_rover.py --release osp17-1 --distro rhel-9 --component all fails for me :/ does it work for you? | 09:13 |
chandankumar | jm1: yes it fails for me also https://paste.centos.org/view/ed91d40d | 09:14 |
jm1 | chandankumar: yeah thats the error i get as well | 09:16 |
afuscoar | Hii. One little question, do you have any update about what could be happening in the dashboard that says no data to show (http://tripleo-cockpit.lab4.eng.bos.redhat.com/d/tbUsg0Z4k/downstream-data?orgId=1), thx | 09:17 |
chandankumar | jm1: https://code.engineering.redhat.com/gerrit/plugins/gitiles/tripleo-environments/+/refs/heads/master/ci-scripts/dlrnapi_promoter/config/RedHat-9/component/rhos-17.1.yaml | 09:19 |
chandankumar | criteria is currently empty, that's why it is failing | 09:19 |
chandankumar | I think the ruck/rover tool does not handle the case when criteria is empty | 09:19 |
jm1 | afuscoar: ananya is currently working on downstream cockpit but she is on PTO this week | 09:21 |
jm1 | chandankumar: nice catch! wanna submit a patch? ;) | 09:22 |
afuscoar | Thank you jm1 | 09:23 |
chandankumar | let me put a patch | 09:24 |
chandankumar | jm1: https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/44694 | 09:53 |
chandankumar | jpodivin: o/ please have a look at last comment https://review.rdoproject.org/r/c/rdo-jobs/+/44608/3#message-8eb50329a9b09c43b335f40d499ef6675ec07755 | 09:53 |
jpodivin | chandankumar: right. Thanks for pointing that out | 09:54 |
jpodivin | reviewbot: add to review list https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/44695 | 10:07 |
reviewbot | I could not add the review to Review List | 10:08 |
jm1 | chandankumar: thank you! voted on your patch | 10:11 |
jm1 | chandankumar: why is ruck_rover.py not printing any component jobs to rerun although it has a lot of red components? | 10:12 |
* jm1 lunch | 10:17 | |
rlandy | jm1: hey - when you are back from lunch will run through the program doc with you | 10:37 |
rlandy | <jm1> chandankumar: ./ruck_rover.py --release osp17-1 --distro rhel-9 --component all fails for me | 10:37 |
rlandy | there are no components for 17.1 | 10:37 |
rlandy | waiting on CRE team to develop them | 10:37 |
chandankumar | rlandy: https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/44694 will fix the current issue | 10:41 |
rlandy | chandankumar: ok - want me to merge that or bring it to review time? | 10:42 |
chandankumar | rlandy: feel free to merge that | 10:43 |
chandankumar | rlandy: thank you :-) | 10:52 |
jm1 | rlandy: o/ wanna sync? | 11:10 |
rlandy | jm1: sure | 11:10 |
rlandy | https://meet.google.com/qsr-cfbr-mep?pli=1&authuser=0 | 11:11 |
chandankumar | anyone wants to join review meeting? there is not much there , all of them already reviewed | 11:16 |
chandankumar | india having public holiday, I think we can skip the meeting | 11:17 |
chandankumar | rlandy: please merge https://review.rdoproject.org/r/c/rdo-jobs/+/44608 https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/44695 and https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/43307 | 11:22 |
*** dviroel|out is now known as dviroel | 11:23 | |
dviroel | o/ | 11:23 |
rlandy | dviroel: hi - pls see pvt chat | 11:47 |
rlandy | chandankumar: dviroel: lol - this is fun - 5 mins pls ... https://meet.google.com/ucs-kaow-bkd?pli=1&authuser=0 | 11:50 |
rlandy | dviroel: ^^ pls | 11:51 |
rlandy | I only have a few minutes | 11:51 |
jm1 | dviroel: your kvm patch is working 🥳 | 11:53 |
dviroel | jm1: thats great, thanks for updating it | 11:56 |
dviroel | jm1: i will check in a bit | 11:57 |
jm1 | rlandy: want to add your alternative solution for BZ:2105408 (switching back to vexxhost jobs) to cix card? :) | 12:07 |
rlandy | jm1: we would still need the fix for internal | 12:07 |
rlandy | so it's not really an alternative | 12:07 |
rlandy | but I will put in the testproject job to test out vexx on w and t | 12:07 |
jm1 | rlandy: not a complete alternative solution but a partial solution which could "fix" our c9 master/wallaby jobs (and consolidate it on one platform: vexxhost) | 12:09 |
jm1 | rlandy: will also make rerunning jobs easier because we only need rdo testprojects | 12:09 |
rlandy | jm1: yep - working on that testproject now | 12:19 |
rlandy | akahat: hello ... | 12:36 |
rlandy | remember the work to move sc010 all to vexx | 12:36 |
rlandy | do you still have those reviews? | 12:36 |
reviewbot | Do you want me to add your patch to the Review list? Please type something like add to review list <your_patch> so that I can understand. Thanks. | 12:36 |
rlandy | testprojects? | 12:37 |
rlandy | akahat: master sc010 on vexx is looking much more stable than psi now | 12:37 |
rlandy | I want to recheck your jobs | 12:37 |
rlandy | i think we merged the definitions | 12:41 |
rlandy | https://review.rdoproject.org/r/q/topic:sc10_cs9 | 12:46 |
rlandy | ok - so maybe it was just master | 12:46 |
rlandy | adding wallaby c9 and train | 12:46 |
jm1 | rlandy: we can get a c9 master promotion (integration line) today. only three jobs missing which failed on different intermittent errors | 12:49 |
jm1 | rlandy: internal kvm job on c9 master did not fail today so apparently it ran on a patched L0 host | 12:49 |
rlandy | great | 12:51 |
rlandy | jm1: do we need to skip promote or it will just go through? | 12:51 |
jm1 | rlandy: simply wait | 12:51 |
rlandy | awesome | 12:51 |
jm1 | rlandy: same for c9 wallaby: internal kvm job passed, so psi is maybe upgrading their nodes to rhel 8.6? | 12:52 |
rlandy | it's intermittent | 12:52 |
rlandy | sometimes we get a lucky run | 12:53 |
jm1 | rlandy: yeah because psi has some nodes with rhel 8.6 or code is running on amd cpus, i guess. and since these internal kvm jobs are passing more often now, we can imply they are upgrading their infra. anyway, good to see its getting better | 12:55 |
jm1 | rlandy: c9 wallaby integration jobs failed on different intermittent errors, hence we can simply rekick them until they pass. then c9 wallaby should promote | 13:24 |
rlandy | yep - give it another shot | 13:24 |
jm1 | rlandy: already did. proceeding with c8 train | 13:25 |
jm1 | rlandy: omg c8 train should also promote today when we rerun jobs often enough 🙄 | 13:31 |
rlandy | jm1: your reactions are so funny :) | 13:32 |
rlandy | https://review.rdoproject.org/r/c/rdo-jobs/+/44696 Add wallaby c9 and train c8 kvm jobs | 13:35 |
rlandy | testprojecting that | 13:39 |
rlandy | https://review.rdoproject.org/r/c/testproject/+/36254 | 13:44 |
rlandy | let's see what that does | 13:44 |
chandankumar | dviroel: please merge https://review.rdoproject.org/r/c/rdo-jobs/+/44608 https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/44695 and https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/43307 | 13:49 |
*** Guest1645 is now known as dasm | 13:50 | |
chandankumar | rlandy: ^^ | 13:50 |
chandankumar | dasm: o/ | 13:50 |
dasm | o/ | 13:51 |
rlandy | looking | 13:52 |
jm1 | rlandy: thanks! | 13:53 |
jm1 | rlandy: most c9 master components should promote as well. only exception is tripleo component, since takashi is still working on patches | 13:54 |
rlandy | done | 13:54 |
rlandy | jm1+++++++++++++++++ | 13:55 |
chandankumar | rlandy: dviroel on IBM cloud, we can run 5 fs01 ovb jobs successfully in one go | 13:57 |
chandankumar | no more retry limit issue there on OVB i think | 13:57 |
dviroel | chandankumar: nice, how do we control that? if there is 5 ovb running there, and we trigger a testproject, it will fail? | 14:00 |
jm1 | rlandy: wanna join our meeting with phil? | 14:01 |
chandankumar | dviroel: those are 5 ovb fs01 jobs | 14:02 |
chandankumar | https://review.rdoproject.org/r/c/testproject/+/44388/10#message-13e83612b696e4d730b924d4c3ed80c3d745ad92 | 14:02 |
chandankumar | fs01 consume maximum resource | 14:02 |
chandankumar | in our pipeline, we have multiple ovb jobs with different config | 14:04 |
chandankumar | I think testproject will also work fine | 14:04 |
jpodivin | chandankumar: The https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/44695 failed in gate. Something about not sharing the change queue with dependency ... | 14:05 |
jpodivin | should I hit recheck or is there something else that should be done? | 14:05 |
chandankumar | dviroel: regarding controlling, if we start seeing issues or retry_limit we can ask infra to increase the space | 14:05 |
chandankumar | jpodivin: recheck would be fine | 14:05 |
chandankumar | once dependson merges | 14:05 |
jpodivin | chandankumar: done. Thank you very much | 14:05 |
jm1 | rlandy: wanna join cix? | 14:31 |
dviroel | jm1: false positive on kvm patch | 14:49 |
dviroel | jm1: not using cpu_model on libvirt conf | 14:50 |
chandankumar | see ya people! | 14:53 |
dasm | chandankumar: o/ | 14:54 |
dviroel | jm1: we can't remove 'libvirt_', because it is stable/train | 14:56 |
dviroel | jm1: if you go to "logs/undercloud/var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf" you can check cpu_model config there | 14:59 |
dviroel | jm1: updated the patch, rerunning testproject | 14:59 |
jm1 | dviroel: omg true, thanks for checking and fixing this. so the issue was simply that cascadelake was used instead of skylake | 15:02 |
jm1 | dviroel: our internal kvm jobs for c9 master and c9 wallaby passed today and my not-doing-anything-patch job passed as well. does this mean psi has upgraded their environment? | 15:04 |
jm1 | dviroel: for example check this here, no cpu_mode=host-model and it is still passing https://sf.hosted.upshift.rdu2.redhat.com/logs/32/425432/6/check/periodic-tripleo-ci-rhel-8-scenario010-standalone-network-rhos-16.2-keys/9c6085d/logs/undercloud/var/log/containers/nova/nova-compute.log | 15:05 |
jm1 | dviroel: "no cpu_mode=host-model" => "cpu_mode=host-model" | 15:05 |
tosky | arxcruz: hi, I've just noticed that https://review.opendev.org/c/openinfra/python-tempestconf/+/849127/ now fails with a weird error in one of the jobs, which was working in the previous recheck | 15:07 |
tosky | arxcruz: should we just recheck or do you think it may be related to something else going on? | 15:07 |
arxcruz | tosky checking | 15:07 |
dviroel | jm1: not sure, we have a mix of successes and failures I think, so it may depend on where the instance gets scheduled | 15:08 |
dviroel | jm1: https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/builds?job_name=periodic-tripleo-ci-rhel-8-scenario010-standalone-network-rhos-16.2&skip=0 | 15:08 |
arxcruz | tosky it's related to the rpm build, i would recheck just in case, since the other jobs are running fine. if it still fails, we need to check if something changed in the cfg to build the rpm | 15:09 |
tosky | arxcruz: oki, thanks | 15:11 |
jm1 | dviroel: your last failing job (before i played with it) had skylake cpus but the later jobs got icelake cpus | 15:16 |
jm1 | dviroel: but both had rhel8.4 below | 15:17 |
dviroel | jm1: i see, lets see if this workaround works for everything | 15:22 |
* dviroel lunch | 15:25 | |
*** dviroel is now known as dviroel|lunch | 15:25 | |
jm1 | dviroel: what makes it difficult is that e.g. periodic-tripleo-ci-centos-9-scenario010-kvm-internal-standalone-master failed with the same error yesterday but was running on icelake cpus https://sf.hosted.upshift.rdu2.redhat.com/logs/29/426429/2/check/periodic-tripleo-ci-centos-9-scenario010-kvm-internal-standalone-master/e31bce2/zuul-info/host-info.primary.yaml | 15:27 |
* jm1 bbl | 15:39 | |
rlandy | jm1: c8 - nice clean run :) https://review.rdoproject.org/zuul/buildset/d89307d240ed41d8a15cc84e0bb6ab18 | 15:41 |
rlandy | should promote | 15:41 |
rlandy | OVB stacks are not being deleted in downstream | 16:07 |
rlandy | lunch - brb | 16:20 |
*** dviroel|lunch is now known as dviroel | 16:25 | |
dviroel | jm1: damn, failed, trying again. It should be cpu_model not cpu_models | 16:43 |
*** jpena is now known as jpena|off | 16:43 | |
rlandy | dasm: hi | 17:58 |
rlandy | dasm: could you upload the bmc-template? | 17:59 |
rlandy | jm1: definite issue with component jobs and ovb in downstream | 18:02 |
dasm | rlandy: it's done | 18:03 |
rlandy | great | 18:03 |
dasm | i left notes on rr review list | 18:03 |
carloss | dviroel++ - thank you for all the reviews in Manila features for feature freeze. I owe you a beer! | 18:51 |
reviewbot | Do you want me to add your patch to the Review list? Please type something like add to review list <your_patch> so that I can understand. Thanks. | 18:51 |
dviroel | carloss: np, i will update the beer counter | 18:54 |
dviroel | carloss: you can ignore our friend reviewbot | 18:55 |
carloss | haha | 18:55 |
carloss | I wondered if he was talking to me or not | 18:55 |
carloss | s/he/it | 18:55 |
jm1 | rlandy: fs64 and fs35 on c9 wallaby and c9 master are very unstable. both are the only jobs missing for promotion. they are failing on intermittent issues again and again :( | 19:09 |
rlandy | jm1: if there are only tempest tests failing and they are not the same, we can skip promote | 19:10 |
rlandy | there is an issue with OVB with downstream | 19:10 |
rlandy | metadata service | 19:10 |
rlandy | on meeting with phil now | 19:10 |
rlandy | need to chat with rhos-ops about that after meeting | 19:11 |
dviroel | jm1: worked with skylake cpu type: https://sf.hosted.upshift.rdu2.redhat.com/logs/32/425432/6/check/periodic-tripleo-ci-rhel-8-scenario010-standalone-network-rhos-16.2-keys/4acf7b6/logs/undercloud/var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf | 19:19 |
dviroel | jm1: running test again, on 16-2 and 17 now | 19:19 |
jm1 | rlandy: jobs for fs35 and fs64 on c9 master and c9 wallaby always fail for different reasons. but you will see the same intermittent errors across those different jobs | 19:23 |
jm1 | rlandy: i reformatted the rr notes and now you can better see that intermittent errors appear very often across jobs but very rarely twice on the same job | 19:28 |
jm1 | rlandy: for example "‘arecord’ not found." and "ensure apache is installed" happens often | 19:29 |
dasm | jm1: is there any pattern to the failure? Can we investigate some global|common fixes? | 19:29 |
dasm | I'd like to add some extra tweaks to make our jobs more reliable | 19:30 |
jm1 | dasm: https://hackmd.io/94uNoMlnQgegrgy1iXV1kQ?view | 19:32 |
jm1 | dasm: i cannot see a pattern but its late here so... | 19:33 |
dasm | > Cannot download, all mirrors were already tried without success | 19:33 |
jm1 | dasm: looks like network or connectivity is often an issue | 19:33 |
dasm | hmm... repos might be out of sync. Unfortunately it happens very often | 19:33 |
rlandy | jm1: sorry - just coming out of meeting | 19:34 |
jm1 | dasm: but it only happens for one run and on the next run it passes and fails on something else | 19:34 |
dasm | hmm | 19:34 |
rlandy | getting back to ovb downstream | 19:34 |
jm1 | dasm, rlandy: i think this "'arecord' not found." is a real issue | 19:34 |
rlandy | jm1: when did it start? | 19:34 |
rlandy | can you track it to a specific change? | 19:35 |
jm1 | rlandy: its intermittent and i see it since yesterday but i had no time to check older jobs | 19:36 |
rlandy | dasm: can I merge https://review.rdoproject.org/r/c/config/+/44595? | 19:36 |
jm1 | rlandy: actually i started tracking intermittent issues in detail only since yesterday | 19:36 |
rlandy | jm1: ok - I'll look at it | 19:36 |
jm1 | rlandy: better deal with downstream, for these intermittent bugs we should simply create launchpad bugs tomorrow | 19:37 |
jm1 | rlandy: please simply recheck my testprojects for c9 master and c9 wallaby. those fs35 and fs64 jobs are the only ones missing for promotion and they really try hard to fail on something else everytime | 19:39 |
rlandy | jm1: sure - will leave you notes | 19:39 |
rlandy | wallaby c9 is only out on fs064 | 19:41 |
rlandy | will recheck | 19:41 |
rlandy | jm1: master and wallaby are both only out on fs064 | 19:43 |
rlandy | retesting those | 19:43 |
jm1 | rlandy: oh fs35 just passed, awesome! | 19:45 |
* rlandy check fs064 failure | 19:45 | |
jm1 | rlandy: already checked it | 19:46 |
jm1 | rlandy: its in rr notes | 19:46 |
* dviroel going afk - running an errand | 19:46 | |
rlandy | jm1: ok - late for you - I'll check it | 19:46 |
jm1 | rlandy: fs64 both failed on "'arecord' not found." | 19:47 |
*** dviroel is now known as dviroel|afk | 19:47 | |
rlandy | 2022-08-31 18:45:23 | 2022-08-31 18:45:23.950835 | fa163e7d-a1a6-78e8-a52d-000000006cc1 | FATAL | try modifying forward dns record | undercloud | error={"changed": false, "msg": "`arecord` not found."} | 19:48 |
rlandy | I see that | 19:48 |
rlandy | but it carries on | 19:48 |
jm1 | rlandy: yes, i shorten it in job overview to "'arecord' not found." in rr notes. if you jump to section "Intermittent Failures" you will find a longer description | 19:50 |
rlandy | jm1: think the arecord thing looks legit | 19:50 |
jm1 | rlandy: i will file a bug tomorrow with some more details | 19:51 |
* rlandy asks on sec channel | 19:51 | |
jm1 | rlandy: ok thanks! i am eod now | 19:53 |
* jm1 have a nice evening #oooq 🥂 | 19:54 | |
rlandy | jm1: have a good night | 19:54 |
rlandy | logging bug | 19:54 |
rlandy | to pass to sec team | 19:54 |
*** dasm is now known as dasm|off | 21:40 | |
dasm|off | o/ see you tomorrow | 21:40 |
*** rlandy is now known as rlandy|bbl | 22:25 | |
*** dviroel|afk is now known as dviroel | 23:12 | |
rcastillo | o/ | 23:21 |
rcastillo | I'll be on pto until monday, see you next week | 23:21 |
dviroel | rcastillo: enjoy o/ | 23:26 |
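
Note on the 09:19 discussion: the log says ruck_rover.py failed because the promotion criteria for rhos-17.1 components is currently empty. The sketch below only illustrates that idea and is not the patch from review 44694. The function name `missing_jobs` and the shape of its inputs (a list of required job names and a list of passed job names) are assumptions made for this example.

```python
from typing import Iterable, List, Optional


def missing_jobs(criteria: Optional[Iterable[str]],
                 passed: Iterable[str]) -> List[str]:
    """Return the criteria jobs that have not passed yet, sorted by name.

    Hypothetical helper: an absent (None) or empty criteria list is
    treated as "no jobs required" instead of raising an error.
    """
    required = set(criteria or [])
    return sorted(required - set(passed))


# Empty or missing criteria yields an empty result instead of a crash.
print(missing_jobs(None, ["job-a"]))
print(missing_jobs(["job-b", "job-a"], ["job-a"]))
```

With this shape, a release without defined component criteria simply reports nothing to rerun, which matches the 10:37 remark that there are no components for 17.1 yet.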