*** xiaolin has joined #opendev | 00:53 | |
*** ysandeep|PTO is now known as ysandeep | 02:00 | |
*** sgw has quit IRC | 02:20 | |
*** sgw has joined #opendev | 03:46 | |
*** ysandeep is now known as ysandeep|ruck | 04:32 | |
*** sgw has quit IRC | 04:56 | |
*** sgw has joined #opendev | 05:07 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: Pre-install python3 for CentOS https://review.opendev.org/741868 | 05:35 |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: Deprecate dib-python; remove from in-tree elements https://review.opendev.org/741877 | 05:35 |
*** marios has joined #opendev | 06:20 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: build-python-release: default to Python 3 https://review.opendev.org/742799 | 06:41 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: build-python-release: default to Python 3 https://review.opendev.org/742799 | 06:42 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: build-python-release: default to Python 3 https://review.opendev.org/742799 | 06:43 |
*** qchris has quit IRC | 06:51 | |
*** qchris has joined #opendev | 07:04 | |
*** fressi has joined #opendev | 07:12 | |
*** tosky has joined #opendev | 07:37 | |
*** dougsz has joined #opendev | 07:40 | |
*** fressi has quit IRC | 07:43 | |
*** DSpider has joined #opendev | 07:44 | |
*** ysandeep|ruck is now known as ysandeep|lunch | 07:49 | |
*** fressi has joined #opendev | 07:51 | |
*** moppy has quit IRC | 08:01 | |
*** moppy has joined #opendev | 08:03 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul-jobs master: Consolidate common log upload code into module_utils https://review.opendev.org/742736 | 08:14 |
*** fressi has quit IRC | 08:14 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul-jobs master: Consolidate common log upload code into module_utils https://review.opendev.org/742736 | 08:25 |
openstackgerrit | Tobias Henkel proposed zuul/zuul-jobs master: Merge upload logs modules into common role https://review.opendev.org/742732 | 08:29 |
openstackgerrit | Tobias Henkel proposed zuul/zuul-jobs master: Consolidate common log upload code into module_utils https://review.opendev.org/742736 | 08:29 |
*** ysandeep|lunch is now known as ysandeep|ruck | 08:31 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul-jobs master: Merge upload logs modules into common role https://review.opendev.org/742732 | 08:32 |
openstackgerrit | Tobias Henkel proposed zuul/zuul-jobs master: Consolidate common log upload code into module_utils https://review.opendev.org/742736 | 08:32 |
*** dougsz has quit IRC | 08:35 | |
*** dougsz has joined #opendev | 08:48 | |
*** dtantsur|afk is now known as dtantsur | 08:53 | |
*** zbr is now known as zbr|ruck | 09:26 | |
openstackgerrit | Merged openstack/diskimage-builder master: Support non-x86_64 DIB_DISTRIBUTION_MIRROR variable for CentOS 7 https://review.opendev.org/740183 | 09:38 |
*** fressi has joined #opendev | 09:59 | |
openstackgerrit | Merged opendev/irc-meetings master: Rotate Large Scale SIG meeting https://review.opendev.org/742386 | 10:10 |
*** fressi has quit IRC | 11:57 | |
*** fressi has joined #opendev | 12:00 | |
*** ysandeep|ruck is now known as ysandeep|afk | 12:15 | |
*** ysandeep|afk is now known as ysandeep|ruck | 12:40 | |
*** avass has joined #opendev | 12:56 | |
*** ysandeep|ruck is now known as ysandeep|away | 12:57 | |
*** ysandeep|away is now known as ysandeep | 12:58 | |
*** ysandeep is now known as ysandeep|away | 13:44 | |
*** mlavalle has joined #opendev | 13:58 | |
clarkb | hello! | 13:59 |
clarkb | fungi: one thing I realized we may want to do is clean up the disk on review.o.o a bit. I think we have enough free space for our index backup today (it will be in the 7GB range and we have 23GB available) but before we forget again we should clean up stale backup material | 14:00 |
clarkb | fungi: do you think we should try and do that now really quick or do it after the downtime? | 14:00 |
clarkb | in particular we've got old index backups that I think can go away as well as old mysql backups | 14:00 |
clarkb | I probably lean towards after simply because it's early and I don't want to think extra hard :) | 14:01 |
*** dpawlik2 has quit IRC | 14:11 | |
clarkb | fungi: (and infra-root ) I added some disk cleanup notes to https://etherpad.opendev.org/p/gerrit-2020-07-24 so that we have that ready to go. If you get a chance double checking those files and dirs would be good | 14:14 |
corvus | clarkb: what time is start? | 14:16 |
clarkb | corvus: 15:00 | 14:16 |
clarkb | about 44 minutes from now | 14:16 |
clarkb | I'll stop zuul in about half an hour so that we can confirm it is paused well before we start the outage | 14:17 |
corvus | that confused me; i think you mean "stop the periodic ansible runs executed by zuul on bridge" yeah? :) | 14:23 |
clarkb | yes, sorry not the zuul service but our consumption of it to run playbooks on bridge | 14:24 |
clarkb | via the disable ansible command on bridge | 14:24 |
fungi | clarkb: i would do the cleanup after | 14:25 |
fungi | also is disabling 15 minutes before downtime sufficiently early to avoid having an earlier build still running by the maintenance start? | 14:27 |
fungi | i guess we can always check if one's running and delay the maintenance if we need | 14:27 |
clarkb | fungi: it should be because we haven't approved anything and the hourly deploys happen at top of the hour | 14:27 |
openstackgerrit | Oleksandr Kozachenko proposed zuul/zuul-jobs master: Add docker format option to skopeo in push-to-intermediate-registry role https://review.opendev.org/742892 | 14:27 |
clarkb | basically that gets us well ahead of the hourly deploy and should be enough | 14:27 |
fungi | oh, yep. perfect | 14:27 |
openstackgerrit | Oleksandr Kozachenko proposed zuul/zuul-jobs master: Add docker format option to skopeo in push-to-intermediate-registry role https://review.opendev.org/742892 | 14:28 |
openstackgerrit | Oleksandr Kozachenko proposed zuul/zuul-jobs master: Modify push-to-intermediate-registry role https://review.opendev.org/742892 | 14:29 |
clarkb | the other rename step that would be good is to have someone double check the content of https://etherpad.opendev.org/p/gerrit-2020-07-24 looks correct and that I copied it to bridge at /root/renames/20200724/20200724.yaml properly | 14:31 |
clarkb | if that looks good then I think we are pretty well set up to go now | 14:32 |
fungi | i'll take another look in just a sec | 14:33 |
clarkb | er I meant to ask to review https://review.opendev.org/742731 | 14:34 |
clarkb | not the etherpad itself necessarily (but double checking the etherpad too is good as well :) | 14:34 |
clarkb | thats our input to the playbook | 14:34 |
fungi | ahh, yup | 14:35 |
openstackgerrit | Oleksandr Kozachenko proposed opendev/base-jobs master: Modify opendev-build-docker-image job https://review.opendev.org/742895 | 14:36 |
clarkb | I added an IRC status notice to the etherpad | 14:37 |
*** mlavalle has quit IRC | 14:40 | |
*** mlavalle has joined #opendev | 14:43 | |
corvus | on screen, status and renames files lgtm. | 14:43 |
clarkb | corvus: thanks for checking. Unfortunately it seems that openstack release team isn't subbed to our service-announce ml and they are doing a nova release right now :/ I've since warned them and those jobs don't tend to take long so we can likely wait for that to flush out | 14:45 |
corvus | cool. they subbed now? :) | 14:45 |
clarkb | I've been told the old rule of no friday releases has been relaxed because the testing and automation works so well (the upside in all this I guess) | 14:45 |
clarkb | corvus: yup | 14:45 |
corvus | then all's well that ends well :) | 14:45 |
fungi | heh, we've sabotaged our maintenance availability by being too stable ;) | 14:46 |
corvus | i'm planning on re-enqueing that nodepool release after we're done | 14:46 |
fungi | clarkb: disabling ansible deploys now? | 14:48 |
clarkb | fungi: yup just did it in the screen | 14:48 |
fungi | oh, cool. i've joined that screen session now | 14:49 |
clarkb | hopefully I made my terminal window small enough | 14:49 |
fungi | i suppose there's no need to move conversation to #opendev-meeting for this maintenance | 14:49 |
clarkb | we can if you'd prefer | 14:50 |
clarkb | maybe that's a good habit to get into regardless | 14:50 |
fungi | doesn't matter to me, just suddenly remembered that's one of the reasons we created it | 14:50 |
openstackgerrit | Oleksandr Kozachenko proposed opendev/base-jobs master: Modify opendev-buildset-registry job https://review.opendev.org/742895 | 14:54 |
*** sgw is now known as sgw_away | 15:06 | |
openstackgerrit | Oleksandr Kozachenko proposed opendev/base-jobs master: Modify opendev-buildset-registry job https://review.opendev.org/742895 | 15:11 |
-openstackstatus- NOTICE: We are renaming projects in Gerrit and review.opendev.org will experience a short outage. Thank you for your patience. | 15:21 | |
*** fressi has quit IRC | 15:24 | |
openstackgerrit | Merged openstack/project-config master: Rename transparency-policy from openstack/ to osf/ namespace https://review.opendev.org/739286 | 15:37 |
openstackgerrit | Merged openstack/project-config master: Fix x/devstack-plugin-tobiko name https://review.opendev.org/738979 | 15:37 |
corvus | mnaser: as requested in #zuul i ran this: zuul autohold --tenant vexxhost --project vexxhost/node-labeler --job node-labeler:image:build --change 742276 --reason "mnaser debug multi-arch containers" --count 1 | 15:41 |
corvus | mnaser: and issued a recheck on that change | 15:41 |
*** marios has quit IRC | 15:46 | |
*** dtantsur is now known as dtantsur|afk | 15:53 | |
mnaser | corvus: thanks -- infra-root, appreciate access to root@198.72.124.203 which is the failed held node :) | 15:56 |
fungi | mnaser: where can i find your ssh public key? | 15:57 |
mnaser | fungi: curl https://github.com/mnaser.keys >> ~/.ssh/authorized_keys :) | 15:57 |
mnaser | but with your luck: curl: command not found | 15:57 |
mnaser | :P | 15:57 |
fungi | i did it with wget anyway | 15:58 |
fungi | you should be all set. let us know when you're done with it | 15:58 |
openstackgerrit | James E. Blair proposed opendev/base-jobs master: Add infra-prod-base job to set up git repos https://review.opendev.org/742934 | 15:59 |
mnaser | fungi: awesome. thank you. | 16:01 |
openstackgerrit | James E. Blair proposed opendev/base-jobs master: Add infra-prod-base job to set up git repos https://review.opendev.org/742934 | 16:03 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Use infra-prod-base in infra-prod jobs https://review.opendev.org/742935 | 16:04 |
corvus | clarkb, fungi: ^ i think those 2 should get us moving again | 16:05 |
openstackgerrit | Oleksandr Kozachenko proposed opendev/base-jobs master: Modify opendev-buildset-registry job https://review.opendev.org/742895 | 16:06 |
clarkb | corvus: both lgtm | 16:07 |
clarkb | that's actually cleaner than I thought it would be, making me way more comfortable with just merging them without disabling most of the jobs first | 16:08 |
corvus | yeah, i'd be comfortable merging that and just watching the next run | 16:08 |
fungi | approved both | 16:10 |
*** chandankumar is now known as raukadah | 16:18 | |
corvus | clarkb, fungi: i think things are settling down... | 16:23 |
corvus | i have to prepare for my hazmat shopping expedition; i'll be back in a little while and will check back in then, sound good? | 16:24 |
openstackgerrit | Merged opendev/project-config master: Add record for project renames on July 24, 2020 https://review.opendev.org/742731 | 16:24 |
clarkb | corvus: ya thats fine | 16:24 |
fungi | corvus: have fun storming the castle! | 16:24 |
clarkb | corvus: before you go should we recheck https://review.opendev.org/742935 ? | 16:24 |
clarkb | corvus: mostly wondering if you think you want to be around for that landing | 16:25 |
clarkb | which will trigger the jobs I think | 16:25 |
clarkb | we can also wait for you to return to do that | 16:25 |
corvus | clarkb: i say don't wait for me, but it does look like 742934 needs whitespace fixing | 16:25 |
corvus | oh actually | 16:25 |
corvus | it's that the job isn't documented | 16:26 |
corvus | lemme fix that real quick | 16:26 |
clarkb | k | 16:26 |
clarkb | actually if we're going to pause I'm happy to get my bike ride in too then we can land the child change when people have returned? | 16:27 |
clarkb | fungi: ^ unless you'd prefer to just push forward I can stick around and bike later (it's actually somewhat cool again here) | 16:27 |
fungi | i have plenty of things i can knock out in the meantime, go for it | 16:27 |
fungi | still baffled by the release copy to tarballs failure, i was able to write to the same path from the same executor | 16:28 |
openstackgerrit | James E. Blair proposed opendev/base-jobs master: Add infra-prod-base job to set up git repos https://review.opendev.org/742934 | 16:29 |
clarkb | fungi: could it be a corner case issue with afs in docker in bwrap? | 16:29 |
corvus | clarkb, fungi: ^ i think that should take care of it | 16:29 |
clarkb | fungi: you might need to docker exec then run bwrap and try? | 16:29 |
fungi | clarkb: well, i also didn't use the executor's creds when i tested, i used my own... but yeah maybe something where certain executors aren't able to use their kerberos accounts for some reason? | 16:30 |
*** dougsz has quit IRC | 16:31 | |
fungi | it's just odd since the job runs fine most of the time, but we've had three failures in as many days... an order of magnitude more successes than failures | 16:32 |
fungi | for the same jobs | 16:32 |
clarkb | ya | 16:33 |
fungi | i wonder if the executor's bubblewrap container started before ntpd corrected the boot-time clock skew, and whether that has caused kerberos tickets to be invalid or something | 16:35 |
fungi | er, docker container, not bubblewrap | 16:38 |
fungi | though `date` run under docker-compose exec matches | 16:40 |
*** zbr|ruck is now known as zbr | 16:41 | |
mnaser | hmm | 16:52 |
mnaser | i see some afs wording in the backlog | 16:52 |
mnaser | is there any known issues currently? | 16:53 |
mnaser | doc promote job here failed -- but on vexxhost tenant -- https://zuul.opendev.org/t/vexxhost/build/2aeeb5d8d38d427ab03b95ca475fa408 | 16:53 |
mnaser | There was an issue creating /afs/.openstack.org as requested: [Errno 13] Permission denied: b'/afs/.openstack.org' | 16:53 |
clarkb | mnaser: yes | 16:53 |
clarkb | we don't know why yet and seems to only affect a subset of executors (we should check your job's executor for that) | 16:54 |
mnaser | let me grab it for you | 16:54 |
mnaser | Executor: ze10.openstack.org | 16:54 |
mnaser | a successful run in the same merge (minutes apart was on Executor: ze03.openstack.org) | 16:55 |
clarkb | ya thats one of the unhappy ones | 16:55 |
mnaser | so at least it consistently breaks, which is .. nice | 16:55 |
clarkb | fungi: fwiw I would try an fs checkvolumes on those servers | 16:57 |
clarkb | and if that doesn't help a reboot :/ | 16:57 |
*** redrobot has joined #opendev | 16:58 | |
openstackgerrit | Merged opendev/base-jobs master: Add infra-prod-base job to set up git repos https://review.opendev.org/742934 | 16:59 |
clarkb | I've rechecked https://review.opendev.org/#/c/742935/1 since ^ merged | 16:59 |
fungi | "All volumeID/name mappings checked." | 17:03 |
fungi | ran `sudo fs checkvolumes` on both ze10 and ze11 | 17:03 |
clarkb | ya I think thats all it ever says | 17:03 |
clarkb | it doesn't report if it fixed anything but has fixed things for wheel caches | 17:03 |
clarkb | also not sure if it matters but I ran fs checkvolumes aklogged into my admin account on the server | 17:04 |
fungi | i think part of the lack of reporting is because it's async | 17:04 |
clarkb | ah | 17:04 |
fungi | if memory serves, it flags them for check on next use | 17:04 |
fungi | but if that were the problem, then i shouldn't have been able to write to the same volume from those servers | 17:05 |
fungi | in theory | 17:05 |
clarkb | re gerrit disk cleanup. /opt actually has plenty of room so I'm thinking I'll move the things I identified into there then we can let them sit another week or whatever then rm off of /opt | 17:08 |
clarkb | this is my paranoia coming through :) | 17:08 |
fungi | wfm, though i really don't think that stuff you identified will be missed, and also we've been backing it up too right? | 17:08 |
clarkb | fungi: yes it should be included in our backups. ianw was going to double check recovery after the local index was cleaned | 17:09 |
clarkb | I'm not sure if that happened | 17:09 |
clarkb | (but also these files are all so old they'd have been covered pre index cleanup anyway) | 17:09 |
clarkb | fungi: you're good with the list I have on the etherpad? | 17:09 |
fungi | where was it again? | 17:18 |
fungi | i remember looking at it some weeks ago and being fine with it | 17:18 |
clarkb | I added it to the maintenance etherpad https://etherpad.opendev.org/p/gerrit-2020-07-24 | 17:19 |
clarkb | that way it would be top of mind :) | 17:19 |
fungi | oh, cool, it's there now | 17:20 |
fungi | clarkb: yep, that list looks plenty safe to clean up | 17:22 |
clarkb | ok I'll start mv's now | 17:22 |
fungi | thanks! | 17:23 |
*** sgw_away is now known as sgw | 17:23 | |
fungi | what happened to your bike ride? | 17:23 |
clarkb | fungi: I decided to get some things done. I'm just in the habit of early bike rides because it gets hot later in the day but today is supposed to be cool so I'll do it in a bit | 17:24 |
clarkb | also I got hungry | 17:24 |
fungi | i moved to the patio. first day this week without a heat advisory, only a little over 30c with a pleasant breeze now | 17:26 |
clarkb | system-config-run-base failed on the infra-prod fix | 17:29 |
clarkb | I want to finish up the file mv's before I do anything else | 17:29 |
fungi | it's going to be a game of whack-a-mole | 17:31 |
fungi | "Check that ARA is installed" also runs afoul of the same problem | 17:32 |
fungi | https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/run-base-post.yaml includes the ara-report role which runs `bash -c "type -p {{ ara_report_executable }}"` | 17:37 |
fungi | https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/ara-report/tasks/main.yaml#L3 | 17:37 |
clarkb | fungi: I think that is doing an ara report for the nested ansible ? if so maybe we can fix that by running it on the remote node | 17:37 |
fungi | it runs on "localhost" https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/run-base-post.yaml#L73-L81 | 17:38 |
clarkb | ya but its running it against the sqlite file produced by nested ansible so I think we can instead generate that report on the remote node before we copy logs | 17:39 |
clarkb | basically generate then copy to logs dir on executor not copy to executor then generate | 17:40 |
fungi | so we'll need to adjust ara_database_path and ara_report_path too | 17:41 |
clarkb | yes | 17:41 |
clarkb | as well as move the task location so it happens before logs are copied | 17:41 |
clarkb | above line 53 ish | 17:42 |
fungi | ahh, yep, so we'd have it use ara_database_path: "{{ log_dir }}/ara-report/ansible.sqlite" | 17:42 |
clarkb | no {{ log_dir }} is executor relative already | 17:42 |
clarkb | /var/cache/ansible/ara.sqlite is the bridge.o.o file | 17:43 |
fungi | oh, duh, "/var/cache/ansible/ara.sqlite" | 17:43 |
fungi | yep, i mixed up src and dest | 17:43 |
clarkb | I'm going to close our root screen now since it isn't being used | 17:44 |
fungi | should we tell ara-report to write the html report there too i guess? | 17:44 |
clarkb | file moves are done | 17:44 |
clarkb | fungi: we should write the html report to /home/zuul/logs or similar I think there is a convention for that then base jobs automatically grab them | 17:44 |
*** AJaeger has quit IRC | 17:45 | |
clarkb | #status log Moved files out of gerrit production fs and onto ephemeral drive. Assuming this causes no immediate problems those files can be removed in the near future. This has freed up space for gerrit production efforts. | 17:47 |
openstackstatus | clarkb: finished logging | 17:47 |
clarkb | fungi: also as another option we can just disable ara for now | 17:48 |
clarkb | fungi: continue to copy the sqlite file then if we need it we can manually generate a report | 17:48 |
clarkb | that may be quickest for now and keep things moving | 17:48 |
fungi | looks like it's /home/zuul/zuul-output/logs | 17:51 |
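The approach being discussed here — generate the nested ansible's ARA report on bridge itself and stage it where the base job's normal log collection will pick it up — would look roughly like the following. This is a minimal sketch, not the actual 742955 patch: the sqlite path and the zuul-output convention are taken from the conversation above, and the ara-manage invocation assumes the ara 0.x CLI.

```yaml
# Sketch only: run the nested-ansible ARA report on the test node (bridge)
# rather than on the executor, then let normal log collection copy it back.
- hosts: bridge.openstack.org
  tasks:
    - name: Ensure the ara-report directory exists in the collected log tree
      file:
        path: /home/zuul/zuul-output/logs/ara-report
        state: directory

    - name: Copy the nested ansible sqlite database alongside the report
      copy:
        src: /var/cache/ansible/ara.sqlite
        dest: /home/zuul/zuul-output/logs/ara-report/ansible.sqlite
        remote_src: true

    - name: Generate the HTML report from the nested database
      # ara 0.x syntax assumed; ARA_DATABASE points at the nested run's db
      command: ara-manage generate /home/zuul/zuul-output/logs/ara-report
      environment:
        ARA_DATABASE: "sqlite:////var/cache/ansible/ara.sqlite"
```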
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Run ara-report on bridge in run-base-post https://review.opendev.org/742955 | 17:55 |
fungi | clarkb: ^ is that what you were thinking? | 17:55 |
clarkb | looking | 17:55 |
fungi | may still need that task which creates the report dir, if ara won't do so itself | 17:56 |
fungi | in retrospect, we likely added it for a reason | 17:57 |
clarkb | fungi: I think the task you rm'd at line 10 is still needed for the task at line 54. As an alternative copy the sqlite file into the test node's zuul/zuul-output/logs dir | 17:57 |
clarkb | fungi: I think its correct to drop the hostname when working on the remote host because then when we copy we automatically prefix with the hostname | 17:58 |
fungi | yeah, that's the one i was thinking i'd need to put back (but also move it to run on the node) | 17:58 |
clarkb | but you are right that you may need to create zuul-output/logs/ara-report/ ? | 17:58 |
clarkb | then also change the sqlite copy to go into zuul-output/logs/ | 17:58 |
clarkb | but ya that looks like the thing we want | 17:59 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Run ara-report on bridge in run-base-post https://review.opendev.org/742955 | 18:01 |
fungi | oh, i see what you meant about the sqlite copy | 18:01 |
fungi | what's the best way to copy/move a file on the node in ansible? | 18:02 |
clarkb | fungi: left comments on the change itself | 18:04 |
clarkb | fungi: oddly one of your responses was to the base of the change. Not ps2 | 18:09 |
clarkb | took me half a second to understand what the question was :) | 18:09 |
fungi | huh, i wonder if gertty is having trouble aligning comments with the right lines | 18:10 |
*** gmann is now known as gmann_lunch | 18:13 | |
fungi | oh, i see, if you comment on an unchanged line gertty seems to associate it with the base not new | 18:16 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Run ara-report on bridge in run-base-post https://review.opendev.org/742955 | 18:18 |
clarkb | fungi: you missed the last synchronise, but that looks like what I would expect otherwise | 18:19 |
clarkb | maybe we want to see if this run works before updating? | 18:19 |
fungi | not sure what you mean by "missed the last synchronize" | 18:20 |
clarkb | I left a comment in the change | 18:20 |
fungi | i asked if you meant it on a different line and you replied yes? | 18:21 |
fungi | are you saying the synchronize from /etc/ansible to "{{ log_dir }}/etc" needs a separate call to create that path too? | 18:21 |
fungi | i didn't see where the job was originally doing that, if so | 18:21 |
clarkb | oh I did misparse the moved comment then. I thought you were asking if the see above was referring to the prior comment | 18:22 |
clarkb | fungi: that last task in the file is copying from test node to the executor. But we're no longer creating the directory on the executor to copy into | 18:22 |
clarkb | I think it may be better to copy everything as we'd normally do via the zuul-output dir and have it all get collected together from there | 18:23 |
clarkb | that said if rsync will create the dir for us it may just work | 18:23 |
clarkb | fungi: the task on line 10 is what creates that dir which is removed in your change | 18:23 |
fungi | gertty and the gerrit webui totally don't seem to agree about which line numbers those comments are supposed to be on | 18:23 |
fungi | so gimme a sec, i'm completely confused about which synchronize you're talking about. i thought i removed the one for the dest we were directly creating previously | 18:24 |
clarkb | fungi: you removed one of the two | 18:24 |
clarkb | we need to remove the second one as well. | 18:24 |
fungi | but we weren't creating the path for the other one? | 18:24 |
fungi | the one for etc? | 18:24 |
clarkb | oh I see it's /etc instead of /ara-report so ya the synchronize will probably work there | 18:25 |
fungi | i'm pulling up the latest patchset in the gerrit webui so i can go by the line numbers it displays, just a sec | 18:25 |
clarkb | I still think its odd to copy directly like that if we're relying on zuul-output but it should be functional | 18:25 |
clarkb | maybe lets wait on CI results for this? | 18:25 |
fungi | so in the newest version, the "{{ log_dir }}/etc" dest at line 69, the playbook never created that before now either | 18:26 |
fungi | at least not that i can see | 18:26 |
clarkb | ya when I was reading it before I was thinking etc was under ara-report | 18:26 |
clarkb | but it isn't | 18:26 |
fungi | okay, cool | 18:26 |
clarkb | so this is likely fine. Except it's weird to copy files like that and also via zuul-output. But let's let it run and see if the changes as is work as there may be things we haven't considered yet | 18:27 |
fungi | i *suspect* the reason the playbook used to have a "ensure bridge ara log directories exist" task is that ara itself wouldn't create the directory where it's told to dump its report | 18:27 |
fungi | so to satisfy it, i switched to creating the report directory on the node (before generating the report on the node) | 18:28 |
fungi | i doubt it was there to satisfy the synchronize to the executor | 18:29 |
dmsimard | o/ reading scrollback | 18:29 |
corvus | clarkb: i agree with your comment but am not too fussed about it :) | 18:30 |
dmsimard | thanks for looking into this! I haven't had the chance to troubleshoot yet but I had provided some info http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2020-07-22.log.html#t2020-07-22T02:52:55 | 18:31 |
corvus | dmsimard: can you change that play to run ara-report on the remote node instead of the executor? that's what fungi is doing for the system-config jobs | 18:32 |
clarkb | the devel job is failing but I think because of ansible devel being unhappy with our stuff | 18:32 |
dmsimard | ara-report originally ran from the executor to generate a report based on the executor's perspective (not the nested one) | 18:33 |
dmsimard | there is a use case for that and there is a different use case for the nested one | 18:33 |
clarkb | dmsimard: ya for the non nested case that will have to be run trusted | 18:34 |
corvus | dmsimard: imeant the zuul-executor | 18:34 |
fungi | argh, i replied to your comment with gertty and ended up commenting on the base side again | 18:34 |
corvus | fungi: hit "tab" :) | 18:34 |
clarkb | but that should all be handled by base jobs if it is what the zuul install wants? its only the nested case that is a problem and fungi's change shows how we can fix that | 18:34 |
dmsimard | clarkb: yeah, my understanding is that the one from the executor perspective now has to be trusted | 18:34 |
corvus | fungi: oh, this may be a unified diff usability issue | 18:34 |
corvus | dmsimard: were you running ara from the zuul executor before? | 18:35 |
fungi | corvus: oh, yep! i'm using unified diff, and gertty seems to assume it should comment on the base if commenting on an unchanged line | 18:35 |
fungi | (tab does nothing in unified) | 18:35 |
fungi | i can work around it | 18:36 |
corvus | dmsimard: sorry i meant ansible | 18:36 |
corvus | dmsimard, clarkb: let me rephrase this to either clarify or elucidate my confusion: the job currently runs ansible $somewhere; it seems that since it's failing when running ara on zuul-executor in post, it must be the case that the job got past the point where it ran ansible, so it must be running ansible on some test node. can the post playbook be updated to run ara-report on that node? | 18:37 |
clarkb | "2020-07-24 18:37:34.709316 | Something failed during the generation of the HTML report." | 18:38 |
dmsimard | corvus: zuul executor runs ansible with the ara callback enabled, right ? in the executor's buildroot there'll be a ~/.ara/ansible.sqlite file from which we can generate html reports | 18:38 |
clarkb | corvus: that is exactly what we are doing with system-config (or attempting to do). I think it would be possible for other jobs too. We may even be able to copy the sqlite file from executor to remote then do the "compile" on remote | 18:39 |
corvus | oh, this isn't ara-report for the ara tests | 18:39 |
corvus | this is ara-report for this job? | 18:39 |
dmsimard | now, there is also a nested report -- because inside the job we install ansible and ara and then run ansible... and then generate a nested report | 18:39 |
clarkb | corvus: the thing fungi and I are looking at is for system-config-run jobs which is ara report on nested ansible. I'm not sure if dmsimard is talking about another case? | 18:40 |
corvus | clarkb: i am sure dmsimard is talking about another case (the ones he linked to :) | 18:40 |
corvus | clarkb: we're good in system-config. i +2d the change. i expect it to work. | 18:40 |
clarkb | corvus: it failed fwiw | 18:40 |
fungi | yeah, but we're probably close | 18:40 |
corvus | clarkb: i still stand by my statement :) | 18:40 |
fungi | ;) | 18:41 |
clarkb | oh wait the job succeeded but ara failed | 18:41 |
clarkb | maybe that is good enough for now | 18:41 |
clarkb | https://ca46ccff70fd6ee77e6c-5f381a9e8c14b627196c6ef3340b4d4e.ssl.cf1.rackcdn.com/742955/3/check/system-config-run-base/2a5c90d/bridge.openstack.org/ara-report/ ya we didn't get a report but do get the sqlite file | 18:41 |
clarkb | I think I can live with that for now as we can always grab the sqlite file and generate a report locally if necessary. fungi corvus any objections to approving the change given ^ | 18:42 |
corvus | clarkb: err | 18:42 |
corvus | clarkb: i pretty much rely on the ara report to debug those jobs | 18:42 |
dmsimard | http://paste.openstack.org/show/796296/ is the error from the generation | 18:42 |
fungi | yeah, trying to see if i can work out why it failed | 18:42 |
dmsimard | not very helpful :( | 18:43 |
clarkb | k I'll hold off on approving | 18:43 |
dmsimard | let me pull the database and check | 18:43 |
clarkb | https://zuul.opendev.org/t/openstack/build/2a5c90ddf6dc4848955e0923862acf3e/console#3/2/11/bridge.openstack.org | 18:43 |
corvus | clarkb: i'll say it's worth another 15 minutes effort before we cut our losses and approve it just to get things moving :) | 18:43 |
clarkb | the command line there looks wrong | 18:44 |
clarkb | almost as if it assumes it's running on the executor | 18:44 |
corvus | clarkb: yep | 18:44 |
fungi | looks like "install-ansible: Verify ansible install" failed? | 18:44 |
clarkb | which may be why we did the old process | 18:44 |
dmsimard | I'm not sure how that's worked before | 18:44 |
clarkb | dmsimard: before it relied on the zuul bug to run on the executor | 18:44 |
dmsimard | but the nested report worked, right ? | 18:45 |
corvus | final_ara_report_path: "{{ zuul.executor.log_root }}/{{ ara_report_path }}" | 18:45 |
corvus | that's hardcoded in the ara-report role | 18:45 |
fungi | ohh | 18:45 |
clarkb | dmsimard: the nested report was generated on the executor but with the remote sqlite file | 18:45 |
clarkb | dmsimard: now we know why :) | 18:45 |
dmsimard | clarkb: or maybe it ran ara-manage on bridge instead ? | 18:46 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: dnm: test multiarch https://review.opendev.org/742967 | 18:46 |
fungi | also i just realized i was looking at the wrong build result | 18:46 |
dmsimard | although as I read the code it would run ara-report on localhost which I guess is the executor | 18:47 |
fungi | yep, that's the executor | 18:48 |
fungi | we're trying to run it on the node where the nested ansible is invoked instead | 18:48 |
corvus | i'm going to propose a change, 1 sec | 18:48 |
fungi | and then copy the html report back to the executor for publication | 18:49 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Allow ara-report to run on any node https://review.opendev.org/742971 | 18:52 |
corvus | dmsimard, clarkb, fungi: ^ i believe this is how we imagined that role worked | 18:53 |
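In rough terms, the role as released pins the report destination to the executor's log root (the final_ara_report_path fact quoted just above), so the proposed change amounts to letting ara_report_path stand on its own on whichever host the role is run against. A sketch of that idea, not the literal 742971 diff:

```yaml
# Before: the role always composed the destination from the executor's log root,
#   final_ara_report_path: "{{ zuul.executor.log_root }}/{{ ara_report_path }}"
# which only makes sense when the role runs on localhost (the executor).
#
# After (sketch): honor ara_report_path as given so the role can run on a
# remote node; callers wanting the old behavior pass the executor path in.
- name: Determine where to write the ARA report
  set_fact:
    final_ara_report_path: "{{ ara_report_path }}"
```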
corvus | i think that should be backwards compatible for anyone using the role in a base job post playbook with default values | 18:53 |
dmsimard | yeah that makes sense | 18:53 |
corvus | i'm not sure if we want to announce that change though in case people are using it in another way | 18:54 |
corvus | my feeling is that i don't expect the change to break anyone, so we should go with that approach rather than, say, adding a bunch of new forward-compatible variables. but we still should probably announce it with a warning period. | 18:55 |
clarkb | corvus: in the commit message is ara_report_root meant to be ara_report_path? /me trying to understand the risk of the change now | 18:55 |
corvus | clarkb: yes | 18:55 |
clarkb | ah yup, because it was appended as a relative path before | 18:56 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Allow ara-report to run on any node https://review.opendev.org/742971 | 18:56 |
clarkb | so if they had a relative path before it will break | 18:56 |
clarkb | I agree I think risk is reasonably low because there aren't a ton of useful paths to operate on within the executor | 18:56 |
clarkb | a warning period should be fine | 18:56 |
clarkb | if we do that would the plan be to land fungi's change as is, then in a week or two land the ara role update and it will start working in fungi's change? | 18:57 |
clarkb | we may even be able to test it with a depends on ? | 18:57 |
corvus | clarkb: sgtm and yes | 18:58 |
corvus | dmsimard: i'll wait for a code-review vote from you on https://review.opendev.org/742971 and if you like it, i can send the announcement | 18:58 |
dmsimard | is the change for localhost execution released yet ? then I guess if people update without a change they would have the same breakage | 18:58 |
clarkb | dmsimard: yes it was released yesterday | 18:58 |
corvus | dmsimard: note that if they're using ara-report in a trusted base playbook, it'll still work | 18:58 |
dmsimard | change is good +2 :) | 19:01 |
corvus | case A) ara-report in base job trusted post playbook with default values: works past, present, and future. case B) ara-report in base job trusted post playbook with a non-standard ara_report_path: works past, present, will break with my change. case C) ara-report in untrusted playbook: worked in past, broken now, can be fixed in future if run on a worker node after my change merges. | 19:01 |
corvus | cool, i'm going to grab lunch now, will send email after | 19:01 |
clarkb | corvus: should I approve fungi's change? | 19:02 |
clarkb | you're ok with it landing half broken I mean? | 19:02 |
clarkb | well it's got my +2 now. I'm going to similarly do the bike ride now and find food and all that | 19:02 |
clarkb | I think the ara report change in system-config is fine as well as the infra-prod reparenting change if yall want to land that while I'm out | 19:03 |
dmsimard | I downloaded the database and trying to figure it out.. haven't touched 0.x in a bit and getting tracebacks :D | 19:03 |
dmsimard | need to look up the python and ansible version used | 19:03 |
dmsimard | python3.6 and ansible 2.9.11 | 19:09 |
dmsimard | ok the tracebacks were my fault, was running from an old checkout o_O | 19:15 |
dmsimard | nothing wrong with generating the report from the database locally | 19:16 |
dmsimard | all green in the playbooks too | 19:17 |
*** gmann_lunch is now known as gmann | 19:19 | |
corvus | dmsimard: yeah, i suspect the error is just "can't open file at path" | 19:26 |
corvus | clarkb: i approved fungi's change | 19:27 |
openstackgerrit | Merged opendev/system-config master: Run ara-report on bridge in run-base-post https://review.opendev.org/742955 | 19:46 |
*** DSpider has quit IRC | 19:47 | |
fungi | i'm done grilling dinner and back now too | 20:19 |
clarkb | Im done biking | 21:03 |
clarkb | has the fix for infra prod been rechecked? | 21:03 |
clarkb | it has been | 21:04 |
clarkb | but its failing in the gate on a puppet change :/ | 21:06 |
fungi | yep, system-config-puppet-apply-4-ubuntu-xenial | 21:07 |
fungi | just noticed it too | 21:07 |
clarkb | I'm making lunch but should be around for a few more hours to help shepherd that in, I'm also happy if we decide it can wait for monday :) | 21:08 |
clarkb | (I'm not sure how close to wanting a weekend others are) | 21:08 |
fungi | i'm always weekending, but taking a look at it in a sec | 21:11 |
fungi | https://zuul.opendev.org/t/openstack/build/323bf84fc8a545c8bba16a5ee2dbc7ac/log/applytest/puppetapplytest11.final.out.FAILED#20 | 21:12 |
clarkb | thats a nested ansible I believe | 21:13 |
fungi | ssh host key verification failure connecting to localhost | 21:13 |
clarkb | so localhost should be literally right there | 21:13 |
clarkb | also we run ansible a bunch of times for all the other puppet applies | 21:13 |
fungi | yeah | 21:13 |
clarkb | I'm guessing thats a recheck and ignore it for now? | 21:13 |
fungi | also this same job passed in check | 21:13 |
fungi | yup | 21:13 |
fungi | very odd though | 21:14 |
clarkb | fungi: ya and in that same job it will have run ansible for the other apply tests | 21:14 |
clarkb | it does one for each different puppet host | 21:14 |
fungi | right | 21:14 |
clarkb | it is in the gate again | 21:44 |
openstackgerrit | Merged opendev/system-config master: Use infra-prod-base in infra-prod jobs https://review.opendev.org/742935 | 22:01 |
fungi | yay! | 22:02 |
fungi | now wait for the next deploy? | 22:02 |
clarkb | it should enqueue a deploy from that change I think | 22:03 |
clarkb | https://zuul.opendev.org/t/openstack/stream/7305e6f77df04045b2e9350c657895f8?logfile=console.log that job is one of them I think | 22:04 |
clarkb | then ya the next hourly run should include manage-projects (we just missed the previous one I checked it failed) | 22:04 |
clarkb | we have permissions issues now | 22:04 |
fungi | missed it by one minute | 22:05 |
clarkb | so its still not working | 22:05 |
clarkb | oh maybe those were happening already | 22:05 |
clarkb | ok maybe it did work? I'm trying to find the logs on bridge now | 22:06 |
clarkb | nope install ansible log is still a week old. I'm confused as to what actually failed | 22:07 |
clarkb | the job succeeded but it didn't really do anything? | 22:08 |
clarkb | https://zuul.opendev.org/t/openstack/build/7305e6f77df04045b2e9350c657895f8 | 22:08 |
clarkb | its like the run playbook didn't run | 22:09 |
clarkb | so we ran pre, then post and succeeded | 22:09 |
fungi | skipped run... why? | 22:10 |
clarkb | no clue | 22:11 |
clarkb | I've grepped the logs for the run on ze03 now and am trying to see if the executor says more | 22:12 |
clarkb | provided hosts list is empty, only localhost is available. | 22:12 |
clarkb | I think that is the problem | 22:12 |
fungi | the console view is having trouble rendering the json or else it's really just empty for those plays | 22:12 |
clarkb | ya its empty because it had no hosts to run on | 22:13 |
clarkb | because the base-jobs side is a separate playbook we need to add host again in the system-config side | 22:13 |
clarkb | I'll work on that change now | 22:13 |
fungi | why did that change? | 22:13 |
fungi | oh! right | 22:14 |
fungi | the stuff that went into opendev/base-jobs | 22:14 |
fungi | okay, got it | 22:14 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Continue to add_host here even though we do it in base-jobs https://review.opendev.org/743005 | 22:16 |
clarkb | corvus: fungi ^ | 22:16 |
clarkb | I've also added in the ssh host key but it may not be strictly necessary | 22:17 |
clarkb | now separately there are some pyc files that the ansible complains about not being able to cleanup but I suspect that has been the case for a while and isn't a regression | 22:17 |
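For reference, the shape of the 743005 fix is a repeat of the add_host done in the base-jobs playbook, since in-memory inventory additions don't carry across playbooks. A sketch under those assumptions — the group name and host-key variable here are illustrative, not copied from the change:

```yaml
# Sketch: re-add bridge to the in-memory inventory at the start of the
# system-config run playbook; the add_host done in the earlier base-jobs
# playbook does not persist into this one.
- hosts: localhost
  tasks:
    - name: Add bridge.openstack.org back to the inventory
      add_host:
        name: bridge.openstack.org
        groups: prod_bastion   # illustrative group name, not from the change

    - name: Make sure bridge's ssh host key is accepted
      known_hosts:
        name: bridge.openstack.org
        key: "{{ bridge_ssh_host_key }}"   # hypothetical variable holding the host key
```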
corvus | clarkb: lgtm | 22:18 |
clarkb | corvus: I wonder if we can update the zuul console to catch the "no hosts matched" situation and give that info | 22:19 |
clarkb | (I'm not sure what the json looks like) | 22:19 |
clarkb | the fix is in the gate now | 22:36 |
openstackgerrit | Merged opendev/system-config master: Continue to add_host here even though we do it in base-jobs https://review.opendev.org/743005 | 22:54 |
clarkb | https://zuul.opendev.org/t/openstack/stream/ad587c4bfafc49d9a5b1ec535f5f8229?logfile=console.log is running now | 22:55 |
clarkb | Running 2020-07-24T22:55:17Z: ansible-playbook -v -f 5 /home/zuul/src/opendev.org/opendev/system-config/playbooks/install-ansible.yaml has shown up on bridge in /var/log/ansible/install-ansible.something.log | 22:56 |
clarkb | its looking happy so far | 22:56 |
clarkb | bridge.openstack.org : ok=27 changed=2 unreachable=0 failed=0 skipped=8 rescued=0 ignored=0 | 22:57 |
fungi | yay! | 22:57 |
clarkb | it updated virtualenv a point release | 22:58 |
clarkb | and that seems to have been about it | 22:58 |
clarkb | and in a few minutes we should have hourly jobs I think | 22:58 |
clarkb | and we should finally update the ssl cert for the linaro mirror now that things are running again | 22:58 |
clarkb | that was our canary I think | 22:58 |
fungi | we're up to 2 or 3 mirrors with certs expiring <30 days now | 22:59 |
clarkb | they should update today/tonight I think | 22:59 |
clarkb | assuming this fix is globally happy (it should be we addressed a thing that is happy on one job and used the same way by other jobs) | 22:59 |
fungi | yep, i agree | 23:00 |
clarkb | and now we have hourly jobs. No manage-projects jobs which would've been good to see but we really should be fine there too since we use latest project-config in them so I think if the jobs we do have are happy running now we're all set | 23:02 |
*** mlavalle has quit IRC | 23:02 | |
fungi | wfm | 23:02 |
clarkb | one down 7 to go | 23:04 |
clarkb | probably the most important one is the zuul one to pick up the project renames in zuul's config | 23:04 |
fungi | right | 23:04 |
clarkb | bridge.openstack.org : ok=36 changed=0 unreachable=0 failed=0 skipped=4 rescued=0 ignored=0 from service-bridge | 23:06 |
clarkb | cloud launcher is doing its thing. Looks as expected to me so far | 23:11 |
clarkb | bridge.openstack.org : ok=477 changed=0 unreachable=0 failed=0 skipped=1022 rescued=0 ignored=0 thats cloud launcher | 23:19 |
clarkb | nodepool runs on 7 hosts so I won't paste all of it here but it looked good to me | 23:26 |
clarkb | registry is just finishing up. Zuul is next | 23:28 |
*** shtepanie has joined #opendev | 23:31 | |
clarkb | zuul is spending a lot of time gathering facts | 23:34 |
clarkb | I think zm01 is out to lunch | 23:38 |
clarkb | so I guess we wait until it gives up then hope that we update zuul01 anyway? | 23:38 |
clarkb | fungi: zm01 is pingable from here but not responding to ssh, do you see the same? | 23:38 |
fungi | um, checking | 23:45 |
fungi | i can ssh to it | 23:45 |
fungi | both via ipv4 and ipv6 | 23:45 |
fungi | clarkb: did it recover for you? | 23:45 |
fungi | oh, zm01 | 23:45 |
fungi | hold on | 23:45 |
fungi | yeah, it gets partway through the ssh key exchange | 23:46 |
fungi | and then hangs | 23:46 |
clarkb | ya | 23:46 |
clarkb | so that might be another case where we need to reboot :/ they keep dropping like flies | 23:47 |
clarkb | I'm running out of steam | 23:47 |
clarkb | (I expect we're in a happy state now, aside from that ssh connection not timing out in a reasonable amount of time) | 23:47 |
*** tosky has quit IRC | 23:55 |