Thursday, 2024-10-10

opendevreviewMerged openstack/project-config master: Use vexxhost/project-config in vexxhost tenant  https://review.opendev.org/c/openstack/project-config/+/93199800:00
tkajinamE: Failed to fetch https://mirror.sjc3.raxflex.opendev.org/ubuntu/dists/focal/main/binary-amd64/Packages  403  Forbidden [IP: 65.17.193.187 443]10:15
tkajinamI've seen a few 403 from package mirrors today. there might be some issues with a few specific mirrors10:16
tkajinamI've seen 403 from the mirror 5 times so far10:36
tkajinamhmm looks like this specific mirror is still broken. and we see more failures caused by it since more people start their day11:16
priteauHello. Anyone know when https://review.opendev.org/c/openstack/project-config/+/931631 will be deployed? The new image doesn't seem to be available yet.12:16
mnasiadkaclarkb: It seems the locale patch in DIB did not help - I'll do some debug tomorrow to find out what's wrong now ;-) https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_acc/925581/59/check/kolla-ansible-centos9s/accafe5/primary/logs/ansible/bootstrap-servers12:27
Clark[m]tkajinam it looks like afs crashed or is no longer serving content there as the entire root is empty except for robots.txt. I can't look closer until I get kids to school though.13:55
Clark[m]priteau it depends on when images are built which is on a more staggered schedule these days. The code in your jobs should treat that as a cache and fetch the newer images directly if they are not available though13:56
Clark[m]Grafana has an image build dashboard that will show you when things were last built13:56
Clark[m]mnasiadka: maybe there is extra stuff that needs to be installed in addition to configuring rpm to build that locale when installing packages? As mentioned previously I'm not really clued into how that platform handles locales and am still hoping someone who is takes a closer look13:58
mnasiadkaI will tomorrow and report back ;-)14:02
Clark[m]mnasiadka: I guess the other thing to check is the image was the newly built one. We capture that in the zuul info log files somewhere I think14:21
fungimy isp has been down again all morning (latest estimate is they'll have it working again by 18:00 utc), so i'm trying to work over a phone tether and don't have access to much besides irc and e-mail for the moment14:28
opendevreviewMerged openstack/project-config master: Enable nodepool-in-zuul for opendev tenant  https://review.opendev.org/c/openstack/project-config/+/93199914:30
corvusdo we happen to know what image format rax-flex wants?14:43
corvuswe don't specify it in clouds.yaml; does that mean it's qcow2?14:43
opendevreviewJames E. Blair proposed opendev/system-config master: Install clouds.yaml on zuul-launcher  https://review.opendev.org/c/opendev/system-config/+/93208714:44
Clark[m]corvus: I think it does default to qcow214:47
corvuscool, that'll make for easier testing.  i've got patches staged to add rax-flex to the zuul-launcher, so we can test the image upload there14:49
corvusClark: fungi can you review https://review.opendev.org/931996 and https://review.opendev.org/932087 ?14:53
clarkbya dmesg reports afs is very unhappy: afs: Cannot open cache file (code -30). Trying to continue, but AFS accesses may return errors or panic the system14:53
corvusthey should be quick, and i'd like to get that deployment going14:53
clarkbalso io errors like Buffer I/O error on dev dm-1, logical block 1070, async page read14:53
corvusclarkb: where are the afs errors?14:53
clarkbcorvus: dmesg is where I found the one above14:53
clarkbon mirror.sjc3.raxflex.opendev.org14:54
corvusi mean server (and what volume are you looking at)14:54
corvusoh ok client side14:54
fungiaha, so not afs-wide, sounds like there might have been an iscsi outage impacting that cinder volume14:54
fungion the client where the cache resides14:54
clarkboh yup it seems to be specific to the mirror node. When I checked earlier I checked other mirrors were returning content too and they were14:54
corvusis this affecting one or all volumes on that server?14:55
fungimaybe a quick reboot will sort it, might have to stop the daemon and blow away the cache dir14:55
corvusah i see, it's at least the root mirror volume, so no accessing any mirrors14:56
clarkbcorvus: I think just vdd which I'm trying to figure out what that maps to14:56
clarkboh you mean afs volumes ya its the whole thing as far as mirroring goes14:56
clarkbvdd isn't in fstab so that must be the cinder volume? lvs/pvs doesn't return any info14:57
corvusyeah, but with a name like "main-openafs" i'm pretty sure that's lvm14:57
corvusso maybe lvm is in a broken state?14:58
clarkbya and they remounted ro looks like. So maybe things are sad enough that even lvm can't report things back?14:58
corvusseems like a good guess to me14:58
clarkbin that case should we try a reboot and see if things come back cleaner?14:59
corvusi wonder if we rescan devices/partitions if lvm would report info14:59
clarkboh that is a good idea14:59
corvuspartprobe /dev/vdd did not output happy messages15:00
corvusi'm on team reboot now15:00
clarkbcorvus: do you want to do the typing for that or should I?15:00
clarkb(and I guess if it doesn't come up clean we set max-servers to 0 for now)15:00
corvusclarkb: i can15:01
clarkbthanks15:01
corvuskpartx -u /dev/vdd also was unhappy ftr15:01
corvusi suppose there's a chance it got reattached as a different device... 🤷15:02
clarkbI want to say we use uuids for the setup so that would hopefully emit errors about the device not being present rather than being confused? But ya that seems theoretically possible15:02
clarkbno dev mapper mounts at all on reboot so it's still not finding the lvm stuff on the disk I think15:04
clarkbI/O error, dev vdd, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 015:05
clarkbthat seems pretty damning :/15:05
clarkbI'll get max-servers set to 0 as this is unlikely to resolve quickly15:05
opendevreviewClark Boylan proposed openstack/project-config master: Set raxflex max-servers to 0  https://review.opendev.org/c/openstack/project-config/+/93209015:07
corvusshould we just launch a new mirror?  or do we want to practice recovery skills?15:07
clarkbnow to do that manually. I'll put nl01 in the emergency file and then edit the config manually too15:07
clarkbcorvus: I was wondering about that. I think we shouldn't delete the old mirror so that rackspace can investigate and ensure there isn't a systemic issue to address. But on our side spinning up a new mirror is likely a good course of action15:08
fungiso the cinder volume is present and the logical volumes on it get detected at boot but the disk is still reporting i/o errors?15:08
mthebeauHi, I have an odd low priority request. The gerrit server at review.opendev.org interprets the text "Candidat" on the first line of a file as an image.  Here is an example review: https://review.opendev.org/c/starlingx/vault-armada-app/+/932089  How would I report something like that?15:08
fungiif the problem is just the cinder volume, can't we detach it and attach a new one? that would basically be one of the steps in creating another server regardless15:08
clarkbfungi: I don't think the lvm partition scheme is detected at all15:09
clarkbfungi: we're trying to read sector 0 and failing15:09
fungiokay, but still the cinder volume, sounds like. should be able to detach it and attach a new one, unless the hypervisor host has totally lost its access to the storage network or something15:10
clarkbfungi: oh ya that is another option for recovery15:10
fungilike i said, if you built a replacement server you'd still need to create and attach a cinder volume and do the lvm setup to create the volumes for apache and openafs caches regardless15:11
clarkbmthebeau: I suspect that string corresponds to the image/x-quicktime format and whatever file type detection library gerrit is using interprets it that way. The best place to report that is probably in the gerrit bug tracker15:11
clarkbmthebeau: https://issues.gerritcodereview.com/issues?q=status:open15:11
clarkbone of us can probably file the issue as well. It helps you've got the example change up that we can refer to. Does it happen if the file contains additional text after the string starts that way?15:12
corvusswapping the volume sounds good to me; clarkb are you able to do that?15:13
clarkbcorvus: yes I should be able to figure that out this morning15:13
mthebeauThanks clarkb15:13
clarkbas far as process for that goes I should probably remove the existing mounts from fstab, detach the existing cinder volume, attach a new cinder volume then go through the lvm and fs creation process then readd the mounts and maybe reboot for good measure?15:14
clarkband i guess we need to stop apache and openafs so they stop writing to the root fs?15:14
corvusisn't the root fs fine?15:15
clarkbcorvus: it is but we don't need it to fill up with content that will be masked by the new mounts15:15
corvusoh they're writing to mountpoints15:15
corvusyeah; probably worth remembering to do a rm on those before we cover them15:15
clarkblooks like /var/cache/apache2 is empty but /var/cache/openafs is not15:16
clarkbI'll put this node in the emergency file list so that we don't accidentally restart things15:16
clarkbnode is in the emergency file and openafs-client and apache2 are stopped15:20
fungithat process sounds right. you may also need ansible re-run to recreate subdirectories under those volumes, i don't recall15:21
clarkbthe old fstab entries are commented out now to go inspect what openstack sees15:23
clarkbdfcb0b9c-cef6-45e6-aab5-c23b4e26659b is the current volume and cinder reports it as being attached as vdd (just to confirm what we've surmised so far)15:26
opendevreviewMerged openstack/project-config master: Set raxflex max-servers to 0  https://review.opendev.org/c/openstack/project-config/+/93209015:27
corvusfungi: clarkb any chance you could review https://review.opendev.org/931996 and https://review.opendev.org/932087 real quick?15:30
clarkbsure15:30
mthebeauclarkb, I created the issue here: https://issues.gerritcodereview.com/issues/372628496  It seems that the only required text is "Candidat" at the beginning of the first line.  The original review where it was noticed was here:  https://review.opendev.org/c/starlingx/election/+/93199015:30
corvusi can help with the recovery while those are working through the system :)15:30
mthebeauMore or less text after that "Candidat" is not impactful15:31
clarkbmthebeau: thanks. I suspect that the detection will have to use additional heuristics to make a best effort (like checking file extension and the prefix and if they disagree take the conservative approach, or scanning for binary data after the magic number)15:32
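A minimal sketch of the kind of heuristic clarkb describes above, assuming a prefix-based detector has already proposed a type. This is not Gerrit's detection code: the function names, the `TEXT_EXTENSIONS` set, and the example file names are all hypothetical. It falls back to plain text when the extension says "text" and the content contains no binary-looking bytes.

```python
import os

# Hypothetical list of extensions treated as text for this illustration.
TEXT_EXTENSIONS = {".txt", ".rst", ".md", ".yaml", ".yml"}


def looks_binary(data: bytes, sample_size: int = 1024) -> bool:
    """Return True if the sample holds control bytes other than tab, LF, or CR."""
    allowed = {9, 10, 13}
    return any(b < 32 and b not in allowed for b in data[:sample_size])


def classify(filename: str, data: bytes, magic_guess: str) -> str:
    """Reconcile a magic-number guess with the file extension and content.

    magic_guess is whatever a prefix-based detector proposed,
    for example "image/x-quicktime".
    """
    ext = os.path.splitext(filename)[1].lower()
    if magic_guess.startswith("text/"):
        return magic_guess
    # Extension and magic number disagree: take the conservative answer
    # only when the content itself looks like text.
    if ext in TEXT_EXTENSIONS and not looks_binary(data):
        return "text/plain"
    return magic_guess


if __name__ == "__main__":
    print(classify("candidacy.txt", b"Candidate statement\n", "image/x-quicktime"))
    print(classify("clip.mov", b"\x00\x00\x00\x14ftypqt  ", "video/quicktime"))
```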
clarkbI have detached the old volume from the mirror node (I was worried this would not work but it did and /dev/vdd is gone)15:34
clarkbnew volume is attached and /dev/vdd is back15:35
clarkbnow I'm going to need a few minutes to go refresh our block device management setup15:36
clarkbwe have a mirror_volumes.sh script in system-config which I'll use once I figure out the correct invocation15:37
clarkbhttps://opendev.org/opendev/system-config/src/branch/master/launch/src/opendev_launch/mirror_volumes.sh it has a nice example in the help output. Perfect and thank you to whoever added that15:38
clarkbtonyb I suspect15:38
clarkbthe script doesn't appear to mount -a so I can run it and then clean up /var/cache/openafs15:39
clarkbcorvus: with openafs-client stopped it should be safe to just completely rm /var/cache/openafs right? I mean it says cache in the path :)15:40
corvusyep15:40
clarkbcool I'm just going to delete the two cache dirs since the script recreates the mountpoint if they are not precreated15:41
clarkbscript is done running, cache mount points were created and fstab was updated15:45
clarkblvs shows data now15:45
clarkbI'm going to mount -a then start apache2 and openafs-client again15:45
clarkbmount: (hint) your fstab has been modified, but systemd still uses the old version; use 'systemctl daemon-reload' to reload.15:46
clarkbits been a while since I found a systemd behavior particularly problematic.15:46
clarkbreset the timer15:46
clarkbit did mount things, but I'll do the daemon reload and rerun to make sure it is happy15:47
clarkbstarting openafs client is slow...15:48
clarkbhttps://mirror.sjc3.raxflex.opendev.org/ looks good. Do we want to reboot the server for good measure before resetting max-servers back to 32?15:50
clarkbprobably a good idea to catch any errors I may have introduced with boot time startup of all this stuff15:50
clarkbI'm going to grab something to drink and if no one objects by the time that is done I'll reboot15:51
clarkbrebooting now15:52
clarkbserver is back and https://mirror.sjc3.raxflex.opendev.org/ still looks good15:55
clarkbany objections to putting that cloud region back into service?15:55
corvusclarkb: lgtm15:56
opendevreviewJames E. Blair proposed opendev/system-config master: Add some documentation about the mirror volume config script  https://review.opendev.org/c/opendev/system-config/+/93209215:57
corvusclarkb: ^ i didn't find *any* reference to that mirror volume config script in our docs.  that isn't nearly enough information (because i don't actually know what to write), but that's at least a start15:57
clarkb++ I'll take a look once I'm caught up elsewhere15:58
corvusit would probably be good for someone who has actually set up a mirror to update the docs to explain how it's done :)15:58
clarkbso nl01 has max-servers set to 32. maybe I got the node wrong in the emergency file?15:58
clarkbin any case I think this is already in service :/ and I just need to revert the change to set it to 0 then after the revert lands remove emergency file entries15:58
corvuslooks right to me; maybe some racing15:59
opendevreviewClark Boylan proposed openstack/project-config master: Revert "Set raxflex max-servers to 0"  https://review.opendev.org/c/openstack/project-config/+/93209315:59
* clarkb writes down a todo list so that updating docs and emailing rackspace with questions about the old volume don't get forgotten16:00
opendevreviewJames E. Blair proposed opendev/system-config master: Install clouds.yaml on zuul-launcher  https://review.opendev.org/c/opendev/system-config/+/93208716:06
corvusclarkb: i missed a small thing on that change ^ 16:06
clarkbcorvus: looking I'm going to double check that group isn't used for deploying a whole nodepool or something (I don't think it is)16:07
clarkbnot finding evidence of host selection in plays based on that. Must only be used for var access?16:08
clarkb+2 from me based on ^16:09
corvusyeah, that was the conclusion i came to as well, so i decided that was the cleanest way to do that16:10
corvusthe launchers and builders have their own groups used for host targeting in playbooks16:11
clarkbcorvus: for the mirror docs update the two main pieces of info that come to mind are the size of the volume (200GB minimum with 100GB each for apache and openafs) and then the command invocation from the script help is probably worth including too. I assume you'd prefer me to make that update but let me know if you'd like to keep pushing it along16:15
corvusclarkb: yeah, i think if we add that it should be fairly complete; i say you go for it :)16:18
opendevreviewClark Boylan proposed opendev/system-config master: Add some documentation about the mirror volume config script  https://review.opendev.org/c/opendev/system-config/+/93209216:24
clarkbdone16:24
clarkbcorvus: fungi  can you review https://review.opendev.org/c/openstack/project-config/+/932093 so that I can undo the emergency file entries?16:25
fungiyay! i can now that my internet's back on16:29
fungiapproved16:29
clarkbthank you16:30
clarkbI'm going to work on an email now16:30
corvushuh, i thought i did, but it turns out i just clicked the link to open it and then the browser sat there...16:31
fungiand thanks for fixing the raxflex mirror, sorry i was inaccessible16:35
opendevreviewMerged openstack/project-config master: Revert "Set raxflex max-servers to 0"  https://review.opendev.org/c/openstack/project-config/+/93209316:39
opendevreviewMerged opendev/project-config master: Add image build pipelines  https://review.opendev.org/c/opendev/project-config/+/93200016:40
corvusi made 2 pipelines there... i think we could combine them.  but for now, we might want to keep them separate, just to have a little more control.16:41
clarkbemail sent16:43
opendevreviewClark Boylan proposed opendev/system-config master: Cleanup python 3.10 bullseye images  https://review.opendev.org/c/opendev/system-config/+/93210216:51
clarkbfinally we can drop the bullseye image builds \o/16:52
opendevreviewMerged opendev/system-config master: Configure zuul-launcher to use its logging config file  https://review.opendev.org/c/opendev/system-config/+/93199616:53
opendevreviewJames E. Blair proposed opendev/zuul-jobs master: Add raxflex provider and debian-bullseye image  https://review.opendev.org/c/opendev/zuul-jobs/+/93210316:53
corvusokay that's the actual fun change ^ (feel free to review now, but i'll approve once all the pre-reqs are confirmed in place)16:53
clarkbcorvus: is the name overlap with what nodepool is building going to cause us to potentially boot the new niz label using a nodepool image or nodepool boot the nodepool label with a zuul image?16:56
clarkbalso should we use the same flavor for better comparisons between the two?16:56
clarkbgp.0.4.8 is what we use on the nodepool side for the flavor16:57
corvusclarkb: both systems use the external image ids they store for deciding what to boot, so no problem there16:57
corvusi thought i did set it to use gp.0.4.8?16:57
clarkbcorvus: oh there is an extra level of indication16:58
clarkbs/indication/indirection/16:58
clarkbnormal == gp.0.4.816:58
clarkbI'm not sure I understand why that is necessary if the flavor only maps a name to a flavor-name. Are there additional attributes we expect the flavor objects to carry?16:59
corvusyep, that way we can map the different flavors on different clouds to one zuul flavor16:59
clarkbah I see17:00
corvus(in nodepool we have to do that for every cloud plus every label, so it's n(cloud*label) mappings.  in zuul it will be n(cloud) mappings)17:00
clarkbits the sort of thing that would be intuitive with multiple clouds from the start but less so with one. Anyway makes sense to me17:06
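A rough illustration of the mapping counts corvus mentions, written as plain Python dictionaries rather than real Nodepool or Zuul configuration. Only `raxflex`, `gp.0.4.8`, `normal`, and `debian-bullseye` come from the discussion; the other cloud, flavor, and label names are made up for the example.

```python
# Cloud-specific flavor names. Only raxflex/gp.0.4.8 is from the log;
# cloud-b and cloud-c are hypothetical.
cloud_flavors = {
    "raxflex": "gp.0.4.8",
    "cloud-b": "m1.large",
    "cloud-c": "standard-4",
}
# debian-bullseye is from the log; the other labels are hypothetical.
labels = ["debian-bullseye", "ubuntu-noble", "centos-9-stream"]

# Nodepool style: every (cloud, label) pair names the cloud flavor directly.
per_label = {
    (cloud, label): flavor
    for cloud, flavor in cloud_flavors.items()
    for label in labels
}

# Zuul style: labels point at one abstract flavor ("normal"),
# and each cloud maps that abstract flavor exactly once.
label_to_flavor = {label: "normal" for label in labels}
per_cloud = {(cloud, "normal"): flavor for cloud, flavor in cloud_flavors.items()}


def resolve(cloud: str, label: str) -> str:
    """Look up the cloud-specific flavor through the extra level of indirection."""
    return per_cloud[(cloud, label_to_flavor[label])]


if __name__ == "__main__":
    print(len(per_label), "mappings with one entry per cloud and label")
    print(len(per_cloud), "mappings with one entry per cloud")
    print(resolve("raxflex", "debian-bullseye"))
```

With a single cloud the two approaches cost the same, which matches clarkb's remark that the indirection is less intuitive until a second cloud is added.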
fungisounds like we should also take raxflex back out of nodepool temporarily in the near future and recreate our tenant networks there (which i guess will also mean reattaching the mirror server, so probably a change of ip address?)17:06
clarkbfungi: it uses a floating ip so shouldn't affect that. Only the private ip would change and thats not a big deal17:07
fungiah, yeah i guess we should be able to retain and move that fip17:07
clarkbonce the hourly nodepool job ends I'll clean up the emergency file (this way I avoid any unexpected race conditions between that job and things merging earlier)17:08
clarkbemergency file is updated. I note that jvb01 is in there with a note from tonyb about upgrades. I think the upgrades occurred and we might be able to clean that up now?17:14
clarkbya only jvb02 is in inventory so its a noop17:14
clarkbwow I'm just noticing now the level of spam in the gerrit issue tracker...17:19
corvus(fyi i made a fix to the less-than-ideal config error zuul returned on that change: https://review.opendev.org/932106 )17:20
corvusi'm starting to use the add-filter magnifying-glass icon thingy on the status page a bit more.  as long as there's something in gate, i'm finding it pretty easy to just use that to add a filter for system-config.  if there isn't something in gate, it's a bit harder.17:22
opendevreviewMerged opendev/system-config master: Install clouds.yaml on zuul-launcher  https://review.opendev.org/c/opendev/system-config/+/93208717:37
corvusi think we need to restart the schedulers, web, and launcher to see the new connection, so i'm going to do that now18:24
corvusokay, amusing error due to mixture of speculative / non-speculative there18:37
opendevreviewJames E. Blair proposed opendev/zuul-jobs master: Add raxflex provider and debian-bullseye image  https://review.opendev.org/c/opendev/zuul-jobs/+/93210318:38
opendevreviewJames E. Blair proposed opendev/zuul-jobs master: Enable the debian-bullseye image build job  https://review.opendev.org/c/opendev/zuul-jobs/+/93211918:38
corvusthat may need to be a two-step process for now; we might want to see if we can change that in zuul18:38
clarkbthe restart is necessary for config validation since the schedulers do that right?18:49
corvusclarkb: yes, but also, they generate the config that the launchers use, so even a force-merge wouldn't work19:02
corvuswe should probably put some file matchers on those image build jobs :)19:21
opendevreviewJames E. Blair proposed opendev/zuul-jobs master: Add file matchers to image build jobs  https://review.opendev.org/c/opendev/zuul-jobs/+/93213219:25
opendevreviewJames E. Blair proposed opendev/zuul-jobs master: Add file matchers to image build jobs  https://review.opendev.org/c/opendev/zuul-jobs/+/93213219:29
corvusclarkb fungi ^ that might speed things up if you have a sec :)19:57
fungilookin'19:58
fungilgtm19:59
fungii think one reviewer is enough for that, it's trivial and isn't going to break anything for now even if it's missing something, we just need to remember to expand the list of file patterns if we add relevant bits outside those directories20:00
clarkbsorry lunch went long20:16
opendevreviewMerged opendev/zuul-jobs master: Add raxflex provider and debian-bullseye image  https://review.opendev.org/c/opendev/zuul-jobs/+/93210320:21
opendevreviewMerged opendev/zuul-jobs master: Add file matchers to image build jobs  https://review.opendev.org/c/opendev/zuul-jobs/+/93213220:21
corvus2024-10-10 20:21:49,461 ERROR zuul.Launcher:   keystoneauth1.exceptions.auth_plugins.MissingRequiredOptions: Auth plugin requires parameters which were not given: auth_url20:22
corvushrm.  that's a little confusing.20:23
corvuson account of it is in the file20:23
corvusoh derp20:28
opendevreviewJames E. Blair proposed opendev/system-config master: Fix raxflex connection entry for zuul-launcher  https://review.opendev.org/c/opendev/system-config/+/93214720:30
corvusclarkb fungi ^ that fixes an oops.  i'm going to make that edit on the launcher to keep moving.20:31
clarkbapproved20:33
clarkbarg restarted firefox to pick up some updates and didn't realize I had a second window open so it saved the wrong set of tabs on exit20:35
clarkbgood way to clear house on open tabs I guess20:35
opendevreviewJames E. Blair proposed opendev/zuul-jobs master: Revert "Add raxflex provider and debian-bullseye image"  https://review.opendev.org/c/opendev/zuul-jobs/+/93214820:46
corvusokay, i've run into a zuul bug that needs fixing; i think we should yank ^ until it's fixed to avoid looping on builds20:46
corvus(good news! image build job is running in the image build pipeline! :)20:46
gouthamro/ need help to check if shubham.kumar.yadav369<at>gmail<dot>com has been subscribed to the openstack-discuss list? they have questions on the list, but, unfortunately doesn't respond to the list .. :( 20:47
clarkbI guess mm3 doesn't tell you who the list admins are anymore?20:52
clarkbgouthamr: fungi and a couple of others have been managing that list but I don't recall who they all are at this point20:52
gouthamrah; no i couldn't tell on the interface clarkb 20:53
clarkbcorvus: +2 from me I'll let you decide on if it needs a fast approval20:54
clarkbgouthamr: the only email I see on the thread from them did go to the list. So if anyone responded I think they responded off list20:58
clarkbwhich would then perpetuate via reply buttons in clients.20:58
clarkbbut it is possible that fungi or someone else moderated through the original email earlier today20:58
clarkbgouthamr: X-Mailman-Rule-Hits: nonmember-moderation <- from the headers on the email that did make it to the list. I think that does indicate they are not a list member20:59
gouthamrclarkb: ah! their initial email goes to the list, but then they start replying off-list.. so i thought i'd drop them from the CC if they're a list subscriber to prevent it :) 21:00
gouthamrthanks for sharing that tip! i'll watch for that next time21:00
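The header check clarkb describes can be sketched with Python's standard `email` package. The helper names are hypothetical, and splitting the header value on `;` is an assumption about how multiple rule names would be listed. As clarkb says, a `nonmember-moderation` hit only suggests the sender is not subscribed; it is not proof.

```python
from email import message_from_string


def rule_hits(raw_message: str) -> list:
    """Return the rule names listed in any X-Mailman-Rule-Hits headers."""
    msg = message_from_string(raw_message)
    hits = []
    for value in msg.get_all("X-Mailman-Rule-Hits", []):
        # Assumption: multiple rule names, if present, are separated by ";".
        hits.extend(part.strip() for part in value.split(";") if part.strip())
    return hits


def held_as_nonmember(raw_message: str) -> bool:
    """True when the message was passed through non-member moderation."""
    return "nonmember-moderation" in rule_hits(raw_message)


if __name__ == "__main__":
    sample = (
        "From: someone@example.com\n"
        "Subject: question\n"
        "X-Mailman-Rule-Hits: nonmember-moderation\n"
        "\n"
        "body text\n"
    )
    print(held_as_nonmember(sample))
```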
corvushttps://review.opendev.org/932150 is the zuul change needed to continue launcher work (that should explain the issue we ran into)21:47
corvus5 lines to fix, 195 to test21:47
opendevreviewMerged opendev/system-config master: Fix raxflex connection entry for zuul-launcher  https://review.opendev.org/c/opendev/system-config/+/93214722:50
opendevreviewJay Faulkner proposed openstack/project-config master: Proposed new Ironic core structure  https://review.opendev.org/c/openstack/project-config/+/93199122:58