Monday, 2022-02-21

ianwit's rather nail-biting having 25 system-config jobs in gate00:29
opendevreviewMerged opendev/system-config master: Base work for exporting encrypted logs
opendevreviewMerged opendev/system-config master: run-production-playbook: return encrypted logs
opendevreviewIan Wienand proposed opendev/system-config master: run-production-playbook: default false when encrypting logs
NeilHanloni'm not particularly opinionated on where it's fixed as i mentioned in the comments for the swap change for dib. I think ultimately either is fine and if something happens down the line we can always fix it then01:11
opendevreviewMerged opendev/system-config master: run-production-playbook: default false when encrypting logs
opendevreviewIan Wienand proposed opendev/system-config master: infra-prod: bump codesearch playbook
ianw has run post ^^ so we're back to working on the regular path01:59
opendevreviewIan Wienand proposed openstack/diskimage-builder master: bootloader: fix arm64 install path
opendevreviewMerged opendev/system-config master: infra-prod: bump codesearch playbook
*** pojadhav is now known as pojadhav|ruck03:32
ianwhrm, i guess ^ has fallen behind the periodic jobs, so it may be a while before it runs.  hopefully i can check on it03:35
opendevreviewIan Wienand proposed zuul/zuul-jobs master: encrypt-file: become when installing packages
opendevreviewIan Wienand proposed zuul/zuul-jobs master: encrypt-file: become when installing packages
*** frenzy_friday is now known as frenzyfriday|ruck03:59
*** frenzyfriday|ruck is now known as frenzyfriday|rover04:00
opendevreviewMerged zuul/zuul-jobs master: encrypt-file: become when installing packages
*** ysandeep|out is now known as ysandeep04:38
opendevreviewIan Wienand proposed openstack/diskimage-builder master: Update fedora element testing to F35
*** prometheanfire is now known as Guest204:59
*** Guest2 is now known as prometheanfire05:09
ianwclarkb / fungi : not really going to argue much over the rocky fix, if you still prefer the dib side feel free to merge.  if we do though, i think we've got two more for probably a 3.18.1 release05:23
ianw as gentoo is currently failing05:23
ianwand (or something like it) to fix a regression in arm64 bootloader path i should have realised when we cleaned that up05:24
ianwi'm not 100% sure of the status of stevebaker's changes05:26
*** amoralej|off is now known as amoralej07:01
*** ysandeep is now known as ysandeep|afk07:07
opendevreviewIan Wienand proposed opendev/system-config master: infra-prod: bump codesearch playbook (again)
*** jpena|off is now known as jpena08:36
*** ysandeep|afk is now known as ysandeep08:47
*** sshnaidm|afk is now known as sshnaidm08:55
opendevreviewMerged opendev/system-config master: infra-prod: bump codesearch playbook (again)
dpawlikfungi, Clark[m]: hey, could you tell me in which region you spawned the logscraper instance? I would like to have that information in my notes09:31
ianwi found it, it's vexxhost ca-ymq-1.  we just wanted to hard reboot it09:39
ianwit looks like permissions errors with the gpg signing.  have to think about that one, but i think it can wait09:39
dpawlikthanks ianw++09:43
*** rlandy|out is now known as rlandy|ruck11:13
*** dviroel_ is now known as dviroel11:16
*** ysandeep is now known as ysandeep|afk11:57
*** rcastillo|rover is now known as rcastillo12:35
*** arxcruz|ruck is now known as arxcruz12:38
*** ysandeep|afk is now known as ysandeep12:48
*** amoralej is now known as amoralej|lunch13:09
fungiianw: gpg signing of what?13:15
fungiclarkb: this is worth keeping an eye on from a "where's setuptools going next" perspective:
*** amoralej|lunch is now known as amoralek13:59
fungiinfra-root: if the gitea links in gerrit check out for you (they're looking fine to me so far) we should be able to proceed with to block public access to the gitiles plugin14:01
opendevreviewPierre Riteau proposed opendev/irc-meetings master: Remove inactive IRC chairs
amusilgtema: Hi, is there any plan about when openstacksdk 0.62.0 will be released?14:25
gtemanext planned is 1.0, but that will take a little bit more time14:28
amusilOk, thanks 14:30
*** pojadhav|ruck is now known as pojadhav|dinner14:51
dpawlikfungi, Clark[m]: it took some time today to ensure that the logscraper workflow is ok, the system is also ok (after an upgrade), but logstash seems to be "frozen".  Some time ago I got the same issue with the logstash service. That prompts me to think about whether the current log workflow is correct and whether some services can be reduced, if possible. 14:56
*** dviroel is now known as dviroel|lunch15:26
*** ysandeep is now known as ysandeep|out15:31
*** pojadhav|dinner is now known as pojadhav|ruck15:44
clarkbdpawlik: I'm not sure I understand what you mean by reduce some services. Do you mean index less data?16:00
*** ykarel is now known as ykarel|away16:03
dpawlikclarkb: because the logstash service is frozen once again, maybe the whole log workflow can be improved:
dpawlikclarkb: if you are checking that etherpad, feel free to comment16:10
dpawlikclarkb: I will write an email today/tomorrow16:10
clarkbdpawlik: thanks, left a couple thoughts for things I noticed.16:17
dpawlikclarkb: yep, I saw. Thanks for commenting. I will do an email and send on the mailing list tomorrow16:17
dpawlikif nobody is against that, I will start working on the improvements16:18
dpawlikclarkb: TBH I would leave it as it is, but logstash froze a second time, and if nobody is able to restart it or check what's going on, that is a problem16:19
clarkbdpawlik: yes, problems like this are why we haven't been able to maintain and operate the service let alone upgrade it. It needs a lot of care16:19
dpawlikthat's why I was thinking with tristanC for the log workflow improvement16:20
clarkbbecause the volume of logs is non trivial16:20
dpawlikyup, and there will be a lot of them. I'm hoping that it will be just "download" and in few minutes it will be computed by other service that send it to the elasticsearch and log should be deleted16:21
clarkbfungi: I think what I'm reading re setuptools is that more and more of what made PBR useful is getting consumed by upstream. Seems like most of what PBR would be doing for us is versioning? I half wonder if a good approach here is to replace PBR with a git versioning specific plugin and then rely on setuptools-scm and setuptools and pepwhateveritis to get the tools installed via the16:21
clarkbtoml spec16:21
fungispecifically what pbr is doing for us is pep-440 compliant semver versioning based on git tags, recording of git commit info in custom package metadata, and generation of changelog and authors files based on git history16:23
*** dviroel|lunch is now known as dviroel16:26
clarkbfungi: one of the things I don't quite understand with modern setuptools is how you are expected to hook into it these days. But ya it seems like hooking into setuptools for that subset of functionality should be doable then we don't have to worry about the other bits16:26
fungifor the versioning, pbr is solving a couple of problems i haven't seen setuptools-scm cover: scalar dev versions, and determination of upcoming major/minor/patch level increment16:26
clarkb-scm also doesn't handle the git hashes safely iirc. Maybe that is what you meant by scalar dev versions16:27
fungithat, yes16:29
fungisetuptools-scm creates non-pep440-compliant dev versions16:29
clarkbfrickler: did you have a chance to test the mergeability update on our held gerrit yet? I'm happy to help with that if I can just let me know16:30
fungipbr solves it by putting the commit id in separate metadata and recording commit count in the dev version string, relying on clues in commit messages within the commit history to disambiguate upcoming versions across concurrent branches16:31
fungithat comes in really handy for projects like openstack which release from multiple branches16:31
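The versioning scheme fungi describes (commit count in the dev version, clues in commit messages to pick the next increment) can be illustrated with a small Python sketch. This is not pbr's code: the function name is hypothetical, and the parsing of `Sem-Ver:` commit-message lines is a simplified assumption about how such clues might be read.

```python
import re


def next_dev_version(last_tag, commit_messages):
    """Return a PEP 440 style dev version for commits made after last_tag.

    Hypothetical helper for illustration only; last_tag is assumed to be
    a plain "MAJOR.MINOR.PATCH" string.
    """
    major, minor, patch = (int(part) for part in last_tag.split("."))
    if not commit_messages:
        # Sitting exactly on the tag: the version is the tag itself.
        return last_tag

    # Collect clues such as "Sem-Ver: feature" from the commit messages.
    clues = set()
    for message in commit_messages:
        for match in re.finditer(r"^sem-ver:\s*(.+)$", message, re.I | re.M):
            clues.update(c.strip().lower() for c in match.group(1).split(","))

    # Decide which component of the upcoming release gets incremented.
    if "api-break" in clues:
        major, minor, patch = major + 1, 0, 0
    elif clues & {"feature", "deprecation"}:
        minor, patch = minor + 1, 0
    else:
        patch += 1

    # The commit count is a scalar, so the result sorts under PEP 440;
    # the commit id would be recorded elsewhere, not in the version.
    return f"{major}.{minor}.{patch}.dev{len(commit_messages)}"
```

Because the dev segment is a plain integer rather than a git hash, two builds from the same branch compare in commit order, which is the property discussed above.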
clarkbfungi: ianw  NeilHanlon I have approved to address the coreutils rocky problem outside of dib. Once that lands we can unpause image builds and see what happens next :)16:32
fungisgtm, thx!16:32
clarkbinfra-root that large set of changes is the next step for OpenDev project retirements. Once those land I'll push up a change to remove them from zuul and we can abandon their open changes16:33
clarkbfungi: its also useful when you are on a single branch as it makes clear that things aren't sortable as expected if you are testing two bugfixes to a single branch without stacking them16:34
clarkbfungi: re your question to ianw about gpg signing I think this is the error and context
fungiaha, thanks16:39
opendevreviewMerged openstack/project-config master: infra-package-needs: don't require coreutils for Rocky Linux 8
*** amoralek is now known as amoralej|off16:43
fungiclarkb: ianw: i think that's a red herring. "gpg: can't create '/var/log/ansible/service-codesearch.yaml.log.gpg': Permission denied" seems more likely to indicate that the /var/log/ansible/ path isn't writeable by the zuul user (try adding become:true?)16:44
clarkbthat makes sense16:44
clarkbor stage and then copy with perms16:44
fungithe logs we write there on production bridge.o.o are created by root16:45
clarkbfungi: ya and I want to say the dir is 775 and zuul isn't in the root group16:48
clarkbthe testing updated it to 755 too maybe even16:49
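A rough Python sketch of the permission reasoning above: with a root-owned directory at mode 775 (or 755) and a user that is not in the owning group, only the "other" bits apply, so creating a file fails. The function name and the uid/gid values are hypothetical, and the check deliberately ignores ACLs and capabilities.

```python
import stat


def can_create_in_dir(mode, owner_uid, owner_gid, uid, gids):
    """Approximate the classic Unix check for creating a file in a directory.

    Hypothetical helper: exactly one permission class (owner, group or
    other) is consulted, and both write and search bits are required.
    """
    if uid == 0:
        return True  # root bypasses the mode bits
    if uid == owner_uid:
        write_bit, search_bit = stat.S_IWUSR, stat.S_IXUSR
    elif owner_gid in gids:
        write_bit, search_bit = stat.S_IWGRP, stat.S_IXGRP
    else:
        write_bit, search_bit = stat.S_IWOTH, stat.S_IXOTH
    return bool(mode & write_bit) and bool(mode & search_bit)


# Assumed ids for illustration: directory owned by root:root (0:0),
# the zuul user as uid 1000 with only its own group 1000.
zuul_can_write = can_create_in_dir(0o775, 0, 0, 1000, {1000})
```

Under these assumptions `zuul_can_write` is false for both 775 and 755, which matches the "Permission denied" from gpg when it tries to create its output file.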
*** jpena is now known as jpena|off17:43
fricklerclarkb: didn't get to it today, feel free to take over if you have time17:44
clarkbok will see17:48
*** pojadhav|ruck is now known as pojadhav|out17:49
opendevreviewMerged opendev/irc-meetings master: Remove inactive IRC chairs
NeilHanlonclarkb: ack, thank you!17:53
clarkbthe updated element made it onto the builders and I've run `nodepool image-unpause rockylinux-8`18:19
clarkbnb02 is building the image now so we should be able to follow that log and see if there are more things to look at18:20
clarkb+ dnf -y install --enablerepo=epel haveged then Error: Unknown repo: 'epel'18:54
clarkbI'll run the pause against that image next18:54
fungiNeilHanlon: ^ next iteration18:55
clarkband pause is failing on a 503 from vexxhost trying to get their client profile. I'll try again in a few minutes in case this is not persistent18:55
opendevreviewMerged openstack/diskimage-builder master: dhcp-all-interfaces: opt let NetworkManager doit.
clarkbyup rerunning a minute later seems to have worked18:56
NeilHanlonclarkb, fungi: i have some feeling that is due to something jrosser was doing the other day with nodepool and epel19:15
clarkbNeilHanlon: the root of the problem is we install haveged on the VMs to increase entropy on the VMs. To do that on other rhel likes we have to install it from epel. I wonder if we just need to add epel as a repo source to the images19:16
* jrosser hopes there is not another repo to have to opt out of….19:17
clarkbjrosser: I don't think so. This is our rocky image builds failing because it cannot install haveged from epel19:19
clarkbI think we either need to find somewhere to install haveged from or don't install it and accept less entropy on rocky19:19
clarkbI believe starting with 9 we expect haveged to not be necessary due to kernel changes?19:19
NeilHanlonI think jrosser made a change to not auto-enable epel; but i may be misremembering19:22
NeilHanlonall that needs doing is installing epel-release first (or otherwise enabling it) which is available from  the extras repo on the image19:23
clarkbI suspect that the centos elements may be preinstalling epel for us19:23
clarkband that is why it has caught us out here19:23
clarkbah nope I see it, one sec19:24
jrosserso long as whatever happens matches what we find in the wild with a bare metal host, I’m happy :)19:26
clarkbjrosser: I think that ship has already sailed on coreutils19:26
clarkbbut this would be setup to mimic what centos is doing19:26
opendevreviewMerged openstack/diskimage-builder master: update gpg / file verification for Gentoo
opendevreviewMerged openstack/diskimage-builder master: Make growvols config path platform independent
opendevreviewClark Boylan proposed openstack/diskimage-builder master: Add rocky support to the epel element
clarkbI think ^ is needed to make the epel role do what we want?19:35
clarkbthe role is included it just doesn't know how to handle rocky yet19:35
clarkb is the relevant bit of the build log and I think 830278 should address that by ending up in the other block in the if else19:37
NeilHanlonthat seems sane to me clarkb19:51
clarkbthank you for looking19:51
ianwthanks for looking in on the rocky bits, i'll double check but i think the epel element is the way to go for our images20:46
ianwyes the gpg signing was referring to the codesearch deploy job20:46
ianwit's probably better to open the permissions on the log dir more than run as root, i'm thinking?20:47
tristanCis unreachable?20:47
ianwtristanC: hrm, yes it could be ... i got an error20:48
clarkbyes its been discussed in #openstack-infra20:48
clarkbthe hypervisor hosting the instance is apparently dead dead and is being recovered now20:48
clarkbI've mentioned to mnaser that we can rebuild the host if necessary and to give us an indication if we should do that (the assumption being recovering the instance on another hv will be quicker right now though)20:48
mnaserclarkb: its back now20:49
ianw++ can confirm from here :)20:50
ianwwell that was a 1m 30s of excitement :)20:50
clarkbyup confirmed and I see in the haproxy log that gitea01 was marked up20:50
clarkbti was the other server I noticed that may have had trouble. We may need to resync gitea01 from gerrit though20:50
mnaserfwiw, gitea01-lb is a spof so i guess everything will go down if it does20:51
clarkbya confirmed gitea01 uptime is short20:51
mnaseri can confirm both gitea01 and gitea-lb01 were moved off20:51
clarkbmnaser: yup, iirc because the only way to run haproxy with neutron and openstack is via octavia?20:51
clarkbwe're happy to run an lb pair ourselves but iirc the networking in openstack doesn't make this very viable. But maybe there are layer 7 workarounds we could make use of20:52
mnaserclarkb: i've actually been toying with some potential ideas without octavia but yes, that's the straight forward one 20:52
clarkbour previous experiences with some of the managed services have made us cautious of going that route20:52
clarkbbut ya I think octavia is available now in vexxhost?20:52
mnaseryep it's been for a while :)20:53
clarkbinfra-root I have manually disabled gitea01 now. I'm going to go eat lunch, but the next thing there is likely to tell gerrit replication to replicate to gitea0120:54
clarkbI can do that after lunch if no one beats me to it20:54
ianwi can get gerrit started on that20:55
clarkbianw: I think it is the url flag that allows you to do something like gerrit replication start --url gitea01 and it will only replicate to gitea0120:55
clarkbianw: the show queue command will also show you all of those inflight and queued tasks once requested20:55
ianwi ran replication start --url 'ssh://'20:58
fungithat sounds right20:58
ianw2160 queue jobs for gitea01 which seems about right20:58
fungiit'll probably match whatever the entry is in our replication plugin config20:58
ianwcurrently the job is doing "gpg2 --encrypt --output /var/log/ansible/service-codesearch.yaml.log.gpg --recipient=0x9615aec8 /var/log/ansible/service-codesearch.yaml.log"21:01
ianwwe can either open the permissions on /var/log/ansible so zuul can write there, or update the role so that it can output to a different path21:01
ianwor maybe just copy it and do it all in a tmpdir21:05
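The tmpdir option can be sketched in Python as follows. The actual fix was made in the Ansible role, so this only illustrates the idea: copy the root-written log into a staging directory the unprivileged user owns, and point gpg2's output there. The function name is hypothetical; the gpg2 arguments are the ones quoted above.

```python
import shutil
import tempfile
from pathlib import Path


def stage_and_build_command(log_path, recipient, staging_dir=None):
    """Copy a log into a writable staging dir and build the gpg2 command.

    Hypothetical helper: returns the argument list and the expected
    output path without running gpg2, so the caller decides how to run it.
    """
    staging = Path(staging_dir or tempfile.mkdtemp(prefix="encrypt-logs-"))
    staged = staging / Path(log_path).name
    shutil.copy(log_path, staged)  # reading the source only needs read access

    output = staged.with_name(staged.name + ".gpg")
    command = [
        "gpg2", "--encrypt",
        "--output", str(output),
        f"--recipient={recipient}",
        str(staged),
    ]
    return command, output
```

The command could then be passed to `subprocess.run(command, check=True)`; nothing is written next to the original log, so the mode of the source directory no longer matters.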
opendevreviewIan Wienand proposed opendev/system-config master: run-production-playbook: encrypt logs in temporary staging directory
opendevreviewIan Wienand proposed opendev/system-config master: run-production-playbook: encrypt logs in temporary staging directory
fungiianw: yes, any of those seems like a fine solution to me21:27
clarkbgerrit show queue looks empty now. Should I reenable gitea01?21:41
clarkbianw: doesn't look like the rocky functests are run by the change I pushed to update the epel element: ?21:42
clarkbis there another way to test that?21:42
clarkboh I see I can just update the change to run them in functests I think21:45
clarkbthen remove if people don't want them permanently after we've seen them be happy21:45
fungior push a follow-on addition21:45
fungiwhich would run the additional testing optionally without blocking merge of the fix21:45
opendevreviewClark Boylan proposed openstack/diskimage-builder master: Add rocky support to the epel element
clarkbya I think having the extra checks shouldn't be too much extra burden on the test runs so I figure keep it if this generates data showing it isn't a big impact21:47
clarkbotherwise I'll split it out21:47
clarkbI'm reenabling gitea01 now that 830278's update shows up on it21:50
*** dviroel is now known as dviroel|out21:52
ianwsorry back now, yeah we don't run that test in gate22:09
ianwi'd probably prefer to not have it in functests too, because we already have so much going on running tests on the same element twice seems a bit wasteful22:10
clarkbianw: I don't know that we are using epel anywhere in the tests? But I may be wrong about that22:10
ianwprobably not, i mentioned in the change we could think about adding it to the nodepool tests22:12
clarkbah yup looks like we support an extra element argument. I can update for that22:13
ianwmy only concern with that though is that it ties us more to epel reliability with gate testing22:14
clarkbI guess this comes down to how willing dib is to accept changes to the epel element without testing22:15
clarkbit looks like a year ago we updated the role for centos-stream-9. Not sure how that was tested if at all22:15
ianwwe just have to trade off the cost of building a whole distro for one element, or the extra dependencies if adding it to the extant tests22:16
ianwobviously the best solution would be to only test it when it changes, but i'm not sure how we could achieve that22:16
ianwthat is a bit of a general problem with dib, in that it throws everything at most every change22:17
clarkbwe could modify the nodepool tests to only run when key bits change, but the key bits tend to be modified often enough that may not help as much as we hope22:17
clarkb(things like the partitioning and so on)22:18
fungiwe could put some separate jobs in experimental22:19
clarkboh thats a thought22:19
ianwit's probably easiest to do the follow-on idea and have one run that double-checks the match works, given it is unlikely to ever change after that it's, to me, an ok trade-off22:19
ianwfungi / clarkb: if you could just double-check, i'd like to get that in to make sure the codesearch prod job starts working22:20
fungioh, yep, i already had that one pulled up to see how the job did22:21
ianwthanks; it's an annoying path because it really only shows up in deploy22:22
fungiand for future reference, those gpg messages about trust levels are just noise22:22
ianwfungi: which messages are they?22:27
fungithe ones you thought indicated bad signatures earlier22:27
ianwfungi: umm, sorry still not sure which ones?  do you mean where we're updating the trustdb in the encrypt-file role?22:30
ianwclarkb: if you have time too, in dib can go in with the epel fix, if you can just double check I haven't missed anything in the matching/logic path there22:30
ianwit fixes a regression in arm64 introduced by recent bootloader cleanup22:31
fungi09:39 <ianw> it looks like permissions errors with the gpg signing.  have to think about that one, but i think it can wait22:31
clarkb one thing on that change22:31
ianwfungi: oh sorry, i meant on-disk file permissions22:31
fungiwell, there was also no signing going on there22:32
fungihence my confusion about what you meant22:32
ianwclarkb: oh, doh, good catch22:34
clarkbianw: for does that imply the elements list for build-succeeds is insufficient?22:34
clarkbI'd like to figure out how to fix that but split it out into a child change22:35
clarkbI see the fedora one has block-device-gpt. I'll try that22:35
ianwahh yeah, that will need block-device-*, mbr or gpt will work22:36
ianwbecause it has the vm element22:36
ianwnow it says remote_src doesn't work with mode setting22:37
opendevreviewClark Boylan proposed openstack/diskimage-builder master: Add rocky support to the epel element
clarkbsomething like that maybe22:39
opendevreviewClark Boylan proposed openstack/diskimage-builder master: DNM Follow on commit to test rocky + epel
clarkbianw: hrm22:39
clarkbianw: I find this aspect of ansible to be extremely confusing. Maybe we shell out a cp?22:40
ianwi just tried it, and it works as i expected22:40
opendevreviewIan Wienand proposed opendev/system-config master: run-production-playbook: encrypt logs in temporary staging directory
clarkb indicates that src is not remote unless remote_src is set22:41
ianwclarkb: the other interesting thing with that change is that we probably do not setup podman to be able to run a containerfile element on the functest host22:44
ianwso i wouldn't be surprised if it maybe fails trying to get the base image22:45
clarkbianw: ya that is why I also went with the nodepool functest update too :)22:45
clarkbI figured I had twice the opportunity to get something working that way22:46
clarkbianw: for `&& ! -d /usr/lib/grub/*-efi` what are we trying to accomplish there? I'm comparing against the code that added the regression and it isn't clear22:46
clarkbianw: seems like before we always set the i386-pc target but now we're adding an additional spot where we don't?22:47
ianwif "/usr/lib/grub/<arch>-efi" is there, it means the grub efi packages are installed.  so this is trying to say "we are on a system that doesn't have grub-efi installed"22:47
clarkbbut in both cases don't we need to set the target for when it updates the grub install? Or maybe we just don't care about updating if it is already there?22:48
ianwno, but now i look again, perhaps we should move this check into the section below22:50
ianwactually, no, that doesn't work22:50
clarkbianw: I think the way it works is you always get both bios and uefi compat the way it was written22:51
clarkbbut now I think we may only get uefi depending on whether or not /usr/lib/grub/*-efi exist22:51
ianwwe added something so that uefi would make bios compatible images22:53
ianwthis bit22:54
clarkbianw: that sets the same flag the code you are updating modifies22:55
clarkbis the code you are modifying entirely redundant?22:55
ianwno i don't think so, because we need to explicitly set the target (i think) in the case where you're building on a BIOS only image on an EFI system -- to avoid it trying to guess from /sys22:56
clarkbI suspect it is since both seem to check for use of efi then set the --target=i386-pc flag22:56
ianwif we fall into the mbr/gpt bits we want that flag set22:57
clarkbwhy are we checking /sys and /usr then?22:57
clarkbShouldn't we just check if this is mbr/gpt?22:57
clarkbI think that is what I'm confused about. We seem to be checking the same thing multiple different ways and setting the same flags either way. If efi then set --target=i386-pc22:58
ianwwe may be able to refactor it to that, yes22:58
ianwi think history might show that we have added the gpt/mbr path well after this check, and never gone through and read it top-to-bottom23:00
clarkbI also may be getting a bit confused by the values in DIB_BLOCK_DEVICE23:01
clarkbsince efi can boot mbr and gpt (though apparently on arm they neglected the part of the spec that requires they be backward compatible?)23:01
clarkband I thought bios only did mbr? But maybe if you have a newer bios it does both too23:01
clarkbanyway I see now where it is different as mbr or gpt doesn't seem to imply efi. Even though we check if efi is used then set the value for mbr or gpt :)23:03
ianwi believe that efi can only boot from gpt23:03
clarkbianw: I think I understand this better and it seems like if DIB_BLOCK_DEVICE == gpt or mbr then we always want to set i386-pc because those imply non efi systems. I think the efi check is because grub will automatically determine i386-pc properly unless booted with efi23:04
ianwso in our code, efi implies gpt, but gpt does not imply efi23:04
clarkbianw: to simplify this I think you can just add a check for x86 in the mbr/gpt block and set the flag there. Then it is a lot more direct and makes sense23:04
clarkbyou shouldn't need the efi check in there since we've already determined we are not efi?23:04
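The simplification proposed here can be written down as a small Python sketch. The real logic lives in dib's bootloader shell script, so this is only a model of the decision: the function name is hypothetical and the set of x86 architecture names is an assumption.

```python
# Architecture names treated as x86 here are an assumption for illustration.
X86_ARCHES = {"x86_64", "amd64", "i386"}


def grub_target_args(block_device, arch):
    """Return extra grub install arguments for a DIB_BLOCK_DEVICE value.

    Models the proposal from the discussion: "mbr" and "gpt" mean a
    non-EFI image, so on x86 the BIOS target is always set explicitly
    rather than letting grub guess from the build host.
    """
    if block_device in ("mbr", "gpt") and arch in X86_ARCHES:
        return ["--target=i386-pc"]
    # "efi" images (and non-x86 arches) get no forced BIOS target here.
    return []
```

In this model `grub_target_args("gpt", "x86_64")` forces the BIOS target even when the build host booted via EFI, while `grub_target_args("gpt", "arm64")` adds nothing, which is the case the arm64 regression was about.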
ianwok, so this check came in with23:06
ianwwhich interestingly gives a gerrit 500 error23:06
clarkbreally I think the confusing part is checking if we want to manually set the flag because we're on efi and grub won't autodetect. But instead we can just always set the target if not efi regardless of the host situation23:07
ianwthis *way* predates any of the block-device gpt/mbr/efi stuff23:07
ianwit seems like we've added those bits, but not really pulled out the old check23:07
clarkbI think those gerrit errors are related to comments that didn't make the migration :/23:07
ianwso i tend to agree a full refactor will work23:08
clarkb*notedb migration23:08
clarkbianw: I'm happy to approve it now that I understand the tangle there a bit better :)23:08
clarkbor would you prefer to try and refactor first?23:08
ianwperhaps i'll propose a follow-on with the refactor to keep it separate23:08
clarkbthe !-d /usr/.... check still seems unnecessary since we should be able to set the target regardless in this case23:08
ianwright; now we actually know what we are building, in the 36861 days we didn't23:09
clarkbjentoio: does tomorrow afternoon work for syncing up on the container stuff? we can do a jitsimeet call?23:18
clarkbI've updated the meeting agenda. Please edit to add or fixup any topics in the next little bit then I'll send it out23:20
opendevreviewIan Wienand proposed openstack/diskimage-builder master: bootloader: clean up EFI checking
ianwclarkb: thanks, fresh eyes on bits are always good!23:22
clarkbcool +2'd both but didn't approve the first one in case you wanted to land them together or squash23:24
ianwlet's make sure it passes separately and see23:26
*** rlandy|ruck is now known as rlandy|ruck|bbl23:30
clarkbianw: I think that shows the epel for rocky change working23:39
ianwthanks, lgtm23:45
opendevreviewMerged opendev/system-config master: run-production-playbook: encrypt logs in temporary staging directory
clarkbGerrit is doing a hackathon in May with in person and remote attendance. The inperson is super limited and I'm not sure I'm up for the travel anyway. So then I look at the remote stuff and wonder if I can be away from 09:00 - 17:00 London time23:52
clarkbI suspect that would be very difficult :)23:52
clarkbpython3.10 adds additional determinism to python thread scheduling. I guess we'll want to keep testing an old python with anything that might have races23:54
opendevreviewIan Wienand proposed opendev/system-config master: run-production-playbook: fix path typo