*** jamesmcarthur has quit IRC | 00:01 | |
*** jamesdenton has quit IRC | 00:02 | |
*** jamesdenton has joined #openstack-infra | 00:02 | |
*** jamesmcarthur has joined #openstack-infra | 00:03 | |
*** sshnaidm|ruck is now known as sshnaidm|afk | 00:08 | |
*** tosky has quit IRC | 00:09 | |
*** jamesmcarthur has quit IRC | 00:15 | |
*** jamesmcarthur has joined #openstack-infra | 00:15 | |
*** jamesmcarthur has quit IRC | 00:22 | |
*** jamesmcarthur has joined #openstack-infra | 00:22 | |
*** jamesmcarthur has quit IRC | 00:27 | |
*** jamesmcarthur has joined #openstack-infra | 00:28 | |
*** jamesmcarthur has quit IRC | 00:30 | |
*** rcernin has quit IRC | 00:32 | |
*** rcernin has joined #openstack-infra | 00:37 | |
*** jamesmcarthur has joined #openstack-infra | 00:37 | |
*** jamesmcarthur has quit IRC | 00:38 | |
*** lbragstad_ has joined #openstack-infra | 00:38 | |
*** jamesmcarthur has joined #openstack-infra | 00:38 | |
*** jamesmcarthur has quit IRC | 00:38 | |
*** lbragstad has quit IRC | 00:40 | |
*** dychen has joined #openstack-infra | 01:15 | |
*** dave-mccowan has quit IRC | 01:15 | |
*** dchen has quit IRC | 01:17 | |
*** jamesmcarthur has joined #openstack-infra | 01:22 | |
*** jamesmcarthur has quit IRC | 01:23 | |
*** jamesmcarthur has joined #openstack-infra | 01:23 | |
*** dave-mccowan has joined #openstack-infra | 01:30 | |
*** jamesmcarthur has quit IRC | 01:34 | |
*** jamesmcarthur has joined #openstack-infra | 01:35 | |
*** lbragstad_ is now known as lbragstad | 01:37 | |
*** lbragstad has quit IRC | 01:46 | |
openstackgerrit | Merged openstack/project-config master: Set up access for #openinfra channel https://review.opendev.org/c/openstack/project-config/+/771073 | 01:57 |
*** dychen has quit IRC | 01:59 | |
*** dychen has joined #openstack-infra | 02:00 | |
*** hamalq has quit IRC | 02:24 | |
*** dingyichen has joined #openstack-infra | 02:25 | |
*** dychen has quit IRC | 02:27 | |
*** jamesmcarthur has quit IRC | 02:28 | |
*** jamesmcarthur has joined #openstack-infra | 02:30 | |
*** jamesmcarthur has quit IRC | 02:35 | |
*** dingyichen has quit IRC | 02:39 | |
*** dingyichen has joined #openstack-infra | 02:40 | |
*** ysandeep|away is now known as ysandeep | 02:43 | |
*** armax has joined #openstack-infra | 02:50 | |
*** dklyle has quit IRC | 02:54 | |
*** david-lyle has joined #openstack-infra | 02:54 | |
*** armax has quit IRC | 02:56 | |
*** jamesmcarthur has joined #openstack-infra | 02:58 | |
*** rcernin has quit IRC | 03:08 | |
*** ianw is now known as ianw_pto | 03:12 | |
*** armax has joined #openstack-infra | 03:17 | |
*** lbragstad has joined #openstack-infra | 03:25 | |
*** armax has quit IRC | 03:31 | |
*** rcernin has joined #openstack-infra | 03:33 | |
*** gyee has quit IRC | 03:36 | |
*** david-lyle has quit IRC | 03:38 | |
*** rcernin has quit IRC | 03:48 | |
*** armax has joined #openstack-infra | 03:59 | |
*** jamesmcarthur has quit IRC | 04:03 | |
*** jamesmcarthur has joined #openstack-infra | 04:03 | |
*** rcernin has joined #openstack-infra | 04:04 | |
*** rcernin has quit IRC | 04:05 | |
*** rcernin has joined #openstack-infra | 04:05 | |
*** Ajohn has joined #openstack-infra | 04:14 | |
*** armax has quit IRC | 04:22 | |
*** vishalmanchanda has joined #openstack-infra | 04:47 | |
*** Ajohn has quit IRC | 04:55 | |
*** Ajohn has joined #openstack-infra | 04:58 | |
*** ykarel has joined #openstack-infra | 04:59 | |
*** lbragstad has quit IRC | 05:11 | |
*** psachin has joined #openstack-infra | 05:22 | |
*** Ajohn has quit IRC | 05:41 | |
*** jamesmcarthur has quit IRC | 05:51 | |
*** jamesmcarthur has joined #openstack-infra | 05:52 | |
*** jamesmcarthur has quit IRC | 05:57 | |
*** Ajohn has joined #openstack-infra | 05:57 | |
*** matt_kosut has joined #openstack-infra | 05:59 | |
*** jamesmcarthur has joined #openstack-infra | 06:01 | |
*** jamesmcarthur has quit IRC | 06:01 | |
*** Ajohn has quit IRC | 06:02 | |
*** Ajohn has joined #openstack-infra | 06:03 | |
*** Ajohn has quit IRC | 06:23 | |
openstackgerrit | Rico Lin proposed openstack/project-config master: Mark min-ready for ubuntu-focal-arm64 https://review.opendev.org/c/openstack/project-config/+/771912 | 06:30 |
*** sboyron has joined #openstack-infra | 06:32 | |
*** rcernin has quit IRC | 07:25 | |
*** rpittau|afk is now known as rpittau | 07:34 | |
*** rcernin has joined #openstack-infra | 07:37 | |
*** jcapitao has joined #openstack-infra | 07:37 | |
*** hashar has joined #openstack-infra | 07:41 | |
*** rcernin has quit IRC | 07:42 | |
*** ralonsoh has joined #openstack-infra | 07:48 | |
*** rcernin has joined #openstack-infra | 07:50 | |
*** eolivare has joined #openstack-infra | 07:50 | |
*** yamamoto has quit IRC | 07:53 | |
*** yamamoto has joined #openstack-infra | 07:54 | |
*** slaweq has joined #openstack-infra | 07:54 | |
*** rcernin has quit IRC | 07:55 | |
*** ysandeep is now known as ysandeep|lunch | 08:00 | |
*** jamesmcarthur has joined #openstack-infra | 08:02 | |
*** yamamoto has quit IRC | 08:04 | |
*** jamesmcarthur has quit IRC | 08:06 | |
*** rcernin has joined #openstack-infra | 08:08 | |
*** mgoddard has quit IRC | 08:11 | |
*** mgoddard has joined #openstack-infra | 08:11 | |
*** andrewbonney has joined #openstack-infra | 08:13 | |
*** rcernin has quit IRC | 08:13 | |
*** rcernin has joined #openstack-infra | 08:14 | |
*** amoralej|off is now known as amoralej | 08:17 | |
*** rcernin has quit IRC | 08:19 | |
*** rcernin has joined #openstack-infra | 08:26 | |
*** rcernin has quit IRC | 08:31 | |
*** xek_ has joined #openstack-infra | 08:34 | |
*** yamamoto has joined #openstack-infra | 08:36 | |
*** dingyichen has quit IRC | 08:41 | |
*** rcernin has joined #openstack-infra | 08:44 | |
*** tosky has joined #openstack-infra | 08:44 | |
*** rcernin has quit IRC | 08:49 | |
*** jamesdenton has quit IRC | 08:49 | |
*** jamesdenton has joined #openstack-infra | 08:49 | |
*** yamamoto has quit IRC | 08:51 | |
*** gfidente|afk is now known as gfidente | 08:52 | |
*** jpena|off is now known as jpena | 08:57 | |
*** nightmare_unreal has joined #openstack-infra | 08:58 | |
*** ociuhandu has joined #openstack-infra | 09:02 | |
*** lucasagomes has joined #openstack-infra | 09:02 | |
*** rcernin has joined #openstack-infra | 09:08 | |
*** jamesmcarthur has joined #openstack-infra | 09:13 | |
*** rcernin has quit IRC | 09:13 | |
*** zxiiro has quit IRC | 09:14 | |
*** PrinzElvis has quit IRC | 09:14 | |
*** masayukig has quit IRC | 09:14 | |
*** zigo has quit IRC | 09:14 | |
*** sorrison has quit IRC | 09:14 | |
*** zxiiro has joined #openstack-infra | 09:15 | |
*** PrinzElvis has joined #openstack-infra | 09:15 | |
*** masayukig has joined #openstack-infra | 09:15 | |
*** zigo has joined #openstack-infra | 09:15 | |
*** sorrison has joined #openstack-infra | 09:15 | |
*** wolsen has quit IRC | 09:18 | |
*** jamesmcarthur has quit IRC | 09:20 | |
*** jamesmcarthur has joined #openstack-infra | 09:21 | |
*** mordred has quit IRC | 09:21 | |
*** JanZerebecki[m] has quit IRC | 09:21 | |
*** rcernin has joined #openstack-infra | 09:25 | |
*** jamesmcarthur has quit IRC | 09:26 | |
*** psachin has quit IRC | 09:26 | |
*** rcernin has quit IRC | 09:30 | |
*** rcernin has joined #openstack-infra | 09:32 | |
*** ysandeep|lunch is now known as ysandeep | 09:33 | |
openstackgerrit | Pranali Deore proposed openstack/project-config master: Add official-openstack-repo-jobs for openstack/glance-tempest-plugin https://review.opendev.org/c/openstack/project-config/+/771954 | 09:39 |
*** derekh has joined #openstack-infra | 09:48 | |
*** dtantsur|afk is now known as dtantsur | 09:59 | |
*** yamamoto has joined #openstack-infra | 10:05 | |
*** yamamoto has quit IRC | 10:09 | |
*** wolsen has joined #openstack-infra | 10:10 | |
*** yamamoto has joined #openstack-infra | 10:16 | |
*** yamamoto has quit IRC | 10:16 | |
*** yamamoto has joined #openstack-infra | 10:16 | |
*** yamamoto has quit IRC | 10:20 | |
*** yamamoto has joined #openstack-infra | 10:20 | |
*** yamamoto has quit IRC | 10:20 | |
*** yamamoto has joined #openstack-infra | 10:21 | |
*** yamamoto has quit IRC | 10:22 | |
*** yamamoto has joined #openstack-infra | 10:22 | |
*** yamamoto has quit IRC | 10:22 | |
*** yamamoto has joined #openstack-infra | 10:23 | |
*** yamamoto has quit IRC | 10:27 | |
*** yonglihe has quit IRC | 10:29 | |
*** JanZerebecki[m] has joined #openstack-infra | 10:43 | |
*** mordred has joined #openstack-infra | 10:43 | |
*** systemc is now known as systemb | 10:44 | |
*** hashar is now known as hasharAway | 10:44 | |
*** ykarel_ has joined #openstack-infra | 11:08 | |
*** ykarel has quit IRC | 11:08 | |
*** ykarel__ has joined #openstack-infra | 11:16 | |
*** ykarel_ has quit IRC | 11:19 | |
*** jcapitao is now known as jcapitao_lunch | 11:23 | |
*** mgoddard has quit IRC | 11:30 | |
*** rcernin has quit IRC | 11:31 | |
*** rcernin has joined #openstack-infra | 11:38 | |
*** rcernin has quit IRC | 12:00 | |
*** ysandeep is now known as ysandeep|afk | 12:15 | |
*** rlandy has joined #openstack-infra | 12:25 | |
*** rcernin has joined #openstack-infra | 12:27 | |
*** iurygregory_ has joined #openstack-infra | 12:28 | |
*** iurygregory has quit IRC | 12:28 | |
*** iurygregory_ is now known as iurygregory | 12:29 | |
*** jpena is now known as jpena|lunch | 12:35 | |
*** yamamoto has joined #openstack-infra | 12:44 | |
*** rcernin has quit IRC | 12:48 | |
*** yamamoto has quit IRC | 12:49 | |
*** hasharAway is now known as hashar | 12:52 | |
*** AJaeger has joined #openstack-infra | 12:54 | |
*** ociuhandu has quit IRC | 12:55 | |
*** ysandeep|afk is now known as ysandeep | 12:56 | |
*** ociuhandu has joined #openstack-infra | 12:56 | |
*** ociuhandu has quit IRC | 12:58 | |
*** ociuhandu has joined #openstack-infra | 12:59 | |
*** jcapitao_lunch is now known as jcapitao | 13:01 | |
*** ociuhandu has quit IRC | 13:02 | |
*** ociuhandu has joined #openstack-infra | 13:02 | |
*** ociuhandu has quit IRC | 13:02 | |
*** ociuhandu has joined #openstack-infra | 13:03 | |
*** ociuhandu has quit IRC | 13:15 | |
*** tkajinam_ has quit IRC | 13:16 | |
*** sboyron has quit IRC | 13:16 | |
*** sboyron has joined #openstack-infra | 13:27 | |
*** mgoddard has joined #openstack-infra | 13:32 | |
*** jpena|lunch is now known as jpena | 13:34 | |
*** hemna has quit IRC | 13:39 | |
*** ociuhandu has joined #openstack-infra | 13:39 | |
*** lbragstad has joined #openstack-infra | 13:42 | |
*** hemna has joined #openstack-infra | 13:44 | |
*** ykarel__ is now known as ykarel | 13:44 | |
*** ociuhandu has quit IRC | 13:49 | |
*** ociuhandu has joined #openstack-infra | 13:50 | |
*** AJaeger has quit IRC | 13:53 | |
*** ysandeep is now known as ysandeep|away | 14:06 | |
*** amoralej is now known as amoralej|lunch | 14:08 | |
*** hemna has quit IRC | 14:12 | |
*** ociuhandu has quit IRC | 14:20 | |
*** jamesdenton has quit IRC | 14:20 | |
*** kashyap has joined #openstack-infra | 14:23 | |
*** hemna has joined #openstack-infra | 14:24 | |
*** amoralej|lunch is now known as amoralej | 14:43 | |
*** jamesdenton has joined #openstack-infra | 14:45 | |
*** jamesmcarthur has joined #openstack-infra | 14:48 | |
*** ociuhandu has joined #openstack-infra | 14:49 | |
*** rpittau is now known as rpittau|afk | 14:58 | |
*** mugsie has joined #openstack-infra | 15:09 | |
zbr | infra-core: who can help me do a git-review release? i managed to clean the backlog. | 15:09 |
zbr | i do use git-review from master branch, and last release was in 2019 | 15:13 |
zbr | probably we want to name this 2.0? due to dropping support for py27? | 15:13 |
*** vishalmanchanda has quit IRC | 15:15 | |
*** armax has joined #openstack-infra | 15:17 | |
*** mugsie has quit IRC | 15:18 | |
*** zbr3 has joined #openstack-infra | 15:22 | |
fungi | zbr: i can do it today, i also need to cut a release for bindep | 15:23 |
fungi | and yeah, i might make two git-review releases for the py27 drop depending on what order it falls in the history. would be nice to tag the last py27-supporting version and then do the major version bump with the py27 drop | 15:23 |
fungi | i'll try to look once i finish catching up on channel backlog and mailing lists | 15:24 |
*** zbr has quit IRC | 15:24 | |
*** zbr3 is now known as zbr | 15:24 | |
zbr | fungi: cool, if you can also do some testing it would be cool. like using it from master ;) | 15:26 |
zbr | i have no pressure on that, master works fine for me. | 15:27 |
fungi | yeah, i do tend to use git-review from master but i don't refresh it as often as i should, thanks for the reminder! | 15:28 |
zbr | do a --help after you do, you will see a cool feature mentioned at the end. | 15:32 |
kashyap | fungi: Heya. When you get a minute, a small topic to discuss on adding a custom disk image for testing: | 15:32 |
zbr | using username instead of name for branch names. | 15:32 |
*** armax has quit IRC | 15:32 | |
kashyap | fungi: So, we're trying to add support for secure boot in Nova; for that we'd need a disk image in the gate that has an EFI partition. | 15:33 |
kashyap | fungi: Now, none of the default images, nor any cloud images distributed by distro vendors, have an EFI partition | 15:33 |
fungi | kashyap: i think our amd64 nodes already (have to) boot via uefi | 15:33 |
fungi | er, sorry, i meant arm64 | 15:34 |
kashyap | fungi: I see; I'm looking for instance booting | 15:34 |
fungi | aarch64 | 15:34 |
kashyap | (Right; I've had some devel boards in the distant past) | 15:34 |
kashyap | fungi: So, in a disk image I should see something like this: http://paste.openstack.org/show/801871/ | 15:34 |
kashyap | fungi: For now, my testing has been by creating custom images from a distro install tree, like this: | 15:34 |
kashyap | https://kashyapc.fedorapeople.org/Create-a-SecureBoot-enabled-VM.bash | 15:35 |
fungi | er, to be clear, you want custom test nodes booted with a uefi partition? or you're nesting instances and want an image file booted in devstack in a job or something like that? | 15:35 |
kashyap | So one idea I'm exploring is this: | 15:35 |
kashyap | fungi: My bad, I didn't explain well. Let me retry :-) | 15:35 |
kashyap | fungi: The goal is to be able to test a VM with OVMF ("UEFI for VMs") in DevStack in a job. The physical host _itself_ doesn't have to have a UEFI partition | 15:36 |
fungi | okay, and by "physical host" you mean the virtual machine instance we boot with nodepool to run the jobs on top of | 15:37 |
kashyap | Right | 15:37 |
kashyap | I see that the infra uses nesting under the hood | 15:37 |
fungi | (because there's also an actual physical bare metal host under that which the cloud provider controls) | 15:37 |
kashyap | Because the "hosts" the Infra gets are essentially level-1 VMs -- right? | 15:37 |
fungi | right, we get accounts at public cloud providers and boot nova instances with images we build in diskimage-builder | 15:38 |
kashyap | Right | 15:38 |
fungi | and ssh into those with ansible to start jobs | 15:38 |
kashyap | So do you have a suggestion on how best to go forward here? Another idea that I'm exploring is: | 15:38 |
kashyap | Make a disk image with a real Fedora grub and kernel, and a custom tiny initrd that just prints the secure boot status. That image will be <10MB, and doesn't require frequent "updating". | 15:39 |
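A minimal sketch of the status check such a tiny initrd could perform, written in Python purely for illustration (a real initrd would more likely be a few lines of shell); the efivarfs path and GUID below are the standard EFI global-variable names:

    # Report whether Secure Boot is active in the booted guest by reading the
    # SecureBoot variable exposed through efivarfs.
    from pathlib import Path

    # efivarfs entries carry 4 bytes of attributes followed by the variable
    # data; for SecureBoot the data is a single byte (1 = enabled).
    SECUREBOOT_VAR = Path(
        "/sys/firmware/efi/efivars/"
        "SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"
    )

    def secure_boot_enabled() -> bool:
        try:
            raw = SECUREBOOT_VAR.read_bytes()
        except FileNotFoundError:
            # No variable usually means the guest was not booted via UEFI at all.
            return False
        return len(raw) >= 5 and raw[4] == 1

    print("Secure Boot is in effect" if secure_boot_enabled()
          else "Secure Boot is NOT in effect")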
fungi | yeah, so within a job you want devstack to download this image and boot it and verify it booted, basically? | 15:41 |
fungi | seems like that test could be worked into devstack/tempest and just included in whatever other tests are run in one of nova's existing ci jobs | 15:42 |
kashyap | fungi: Boot it based on a Nova config that enables the secure-bootability -- by picking the right OVMF binary, etc. | 15:42 |
kashyap | fungi: Where can I host this image? | 15:43 |
kashyap | s/image/template image/ | 15:43 |
fungi | and i agree, one of the first steps would be finding or making the image | 15:43 |
*** dklyle has joined #openstack-infra | 15:44 | |
kashyap | Yes, how about this: I'll prepare the image, and then write an email to the list w/ some details? | 15:44 |
kashyap | (So others can follow along, too, who's interested in the topic.) | 15:44 |
fungi | you say you've been unable to find a reliable source for a small uefi test image? | 15:44 |
kashyap | fungi: Yeah, for example: | 15:45 |
kashyap | Fedora's cloud images don't have any EFI partitions, and I was told "cloud images are not intended to do EFI" | 15:45 |
kashyap | fungi: And I haven't explored what Ubuntu / Debian offer | 15:46 |
frickler | or create a small project that creates such an image and has a job that uploads it as an artefact to some of our sites? (not sure tarballs.o.o would be the right one) | 15:47 |
fungi | yeah, even uefi.org seems to recommend fat (ubuntu live) images for validation testing | 15:47 |
*** armax has joined #openstack-infra | 15:47 | |
kashyap | Yeah, that's too much | 15:47 |
kashyap | fungi: Yeah, we've had various ad-hoc scripts for testing the QEMU bits, etc. Perhaps should create a tester project | 15:47 |
fungi | whoever said "cloud images are not intended to do EFI" also probably thinks arm hardware is not used for making clouds | 15:48 |
kashyap | Heh | 15:48 |
kashyap | frickler: fungi: In fact, we did this in the past when testing OVMF / QEMU stuff: https://github.com/puiterwijk/qemu-secureboot-tester/blob/master/sbtest | 15:49 |
kashyap | (Warning, "tall" script ... but I'll work through to see how to make use of it in this context) | 15:50 |
*** mugsie has joined #openstack-infra | 15:51 | |
fungi | given the simplicity of uefi, i wonder if it wouldn't be simpler to just make a uefi image generator which devstack can run... would probably take all of a few seconds during the job if you really just want something which will boot and echo a string, doesn't need a kernel/userspace | 15:51 |
kashyap | Thanks, both for the input. I still have to work out some Nova code. | 15:51 |
kashyap | fungi: Heh, "simplicity" | 15:51 |
fungi | of course if you want to test through booting a signed kernel and then do userspace attestation, you'll need more than just that | 15:52 |
kashyap | Whoa ... no need for "attestation"-level stuff | 15:52 |
kashyap | fungi: All we want to validate is: | 15:52 |
kashyap | - Has the VM booted with the right OVMF binaries for secure boot (there's a whole bunch of them!) | 15:53 |
kashyap | - And has the guest kernel emitted "Secure Boot is in effect" or whatever is the latest message | 15:53 |
fungi | okay, but you do need a kernel of some sort, so just booting into the uefi manager and echoing something isn't enough | 15:54 |
fungi | though you could likely boot something small like syslinux | 15:54 |
fungi | (i think the current debian installer chains uefi to syslinux, as an example) | 15:55 |
kashyap | fungi: Yes, yes; a proper kernel is needed, definitely | 15:55 |
kashyap | fungi: So, I know you're curious of low-level bits ... just to show the terrible messiness involved here: | 15:55 |
kashyap | fungi: There are sooooooooooo many OVMF binary names, and each distro has their _own_ naming scheme! Thankfully, QEMU solved this problem by creating a firmware "schema" that all distros can use | 15:56 |
kashyap | And we've slowly got that work done in distros over time. | 15:56 |
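For anyone following along, the descriptor files mentioned here are small JSON documents (installed under a path such as /usr/share/qemu/firmware/) describing a firmware build's capabilities so management tools can pick the right binaries automatically. A rough illustration in the shape of QEMU's firmware interop schema; the filenames are placeholders and differ per distro:

    {
        "description": "OVMF build with Secure Boot support (illustrative)",
        "interface-types": ["uefi"],
        "mapping": {
            "device": "flash",
            "executable": {
                "filename": "/usr/share/OVMF/OVMF_CODE.secboot.fd",
                "format": "raw"
            },
            "nvram-template": {
                "filename": "/usr/share/OVMF/OVMF_VARS.fd",
                "format": "raw"
            }
        },
        "targets": [
            {"architecture": "x86_64", "machines": ["pc-q35-*"]}
        ],
        "features": ["requires-smm", "secure-boot"],
        "tags": []
    }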
kashyap | Check this out (resolved): https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=932269 | 15:57 |
openstack | Debian bug 932269 in ovmf "Ship the firmware "descriptor files" as part of the 'ovmf' package" [Normal,Fixed] | 15:57 |
fungi | oh yikes | 15:58 |
fungi | so, yeah, i see two possible approaches: | 15:58 |
kashyap | Yeah ... I did the packaging work in Fedora, and now Debian, Ubuntu, and Fedora ship these files | 15:58 |
kashyap | fungi: Actually this all makes life _much_ easier. Why? | 15:58 |
fungi | 1. have devstack make a uefi test image on the fly and use that (if it's reasonably self-contained, simple-ish, and only takes a few seconds to run) | 15:59 |
kashyap | (libvirt has done enough work to take advantage of these JSON files, and will just do the right thing -- you only have to tell libvirt to boot with 'efi') | 15:59 |
* kashyap listens | 15:59 | |
kashyap | fungi: On (1) -- is there a prior example in-tree? | 16:00 |
clarkb | crazy idea: update cirros to do both legacy bios and uefi then use that image for all the things | 16:00 |
clarkb | or cirros standin | 16:00 |
fungi | 2. write a python tool to create a uefi test image and run a ci job to use it to publish the image to somewhere like tarballs.opendev.org/openstack/uefi-test/uefi-test_2021-01-22.img and tell devstack to grab from there | 16:00 |
* kashyap taps on the table and thinks | 16:01 | |
fungi | yeah, 3. fork/take over cirros maintenance and make it support uefi would be awesome but... | 16:01 |
kashyap | clarkb: Heh; doesn't sound that crazy -- but no, I just don't have time to maintain that | 16:01 |
clarkb | I know sean was looking at building a small image with dib to replace cirros (alpine based maybe?) and I think frickler has helped smoser a bit in the past. They may have ideas on #3 | 16:02 |
frickler | actually I think hrw made cirros work on arm, so it should have at least some kind of efi support I guess | 16:02 |
fungi | #2 would be the "this is a little too complex to fit in devstack and/or runs too long to just do within the job" | 16:02 |
kashyap | fungi: Yeah; I see what you mean | 16:03 |
kashyap | clarkb: Oh, a complication for CirrOS: it doesn't ship the JSON "firmware descriptor files" (see above Debian RFC) | 16:03 |
kashyap | fungi: Good thing is: it should be quicker now, because, Debian, Ubuntu, and Fedora -- all should ship the firmware descriptor files, with pre-made OVMF "vars" files | 16:04 |
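Tying that to kashyap's earlier point about libvirt: with firmware autoselection the domain XML only has to ask for 'efi' and, for the secure boot case, a secure-capable loader plus SMM; libvirt then consults the descriptor files to pick matching OVMF code and vars. A hedged sketch only, since exact element requirements vary with the libvirt version in use:

    <os firmware='efi'>
      <type arch='x86_64' machine='q35'>hvm</type>
      <loader secure='yes'/>
    </os>
    <features>
      <smm state='on'/>
    </features>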
clarkb | kashyap: well we would be modifying cirros either way? I assume any problems like that can be sorted out too | 16:04 |
kashyap | (Sorry for the terminology) | 16:04 |
clarkb | but I'm not super familiar with cirros' dev process, frickler would definitely be a better input on that | 16:04 |
kashyap | clarkb: Would we? I lost track :-) | 16:04 |
fungi | kashyap: to answer your earlier question about publication, we run any services which allow someone to just stick a file somewhere, but we do run services which can be used to publish ci build artifacts (ephemerally or durably), or to upload to other services which will host them for you | 16:05 |
fungi | er, i tried to say "we DON'T run any services which allow someone to just stick a file somewhere..." | 16:05 |
kashyap | fungi: Okay; I'll come back with some more details next week, after some tinkering. And discuss more concretely, instead of waving my hands. | 16:05 |
kashyap | fungi: Fair enough | 16:06 |
kashyap | fungi: By "other services", you mean non-OpenStack infra ones? | 16:06 |
fungi | yeah | 16:06 |
frickler | there's also #cirros if you want to discuss modifications or shortcomings | 16:06 |
fungi | or, terrible idea here, you could also just stick the image in a git repo, it's sort of a cheat, but if it's not huge and is the only thing in the repo and doesn't update often/ever... | 16:06 |
kashyap | Alright; got it. I'll tinker some more and see what's the least invasive that I can come up w/ for our purposes here. | 16:07 |
kashyap | fungi: 10MB -- you can forgive that, right, for sticking in a Git repo? :-) | 16:07 |
kashyap | (I'm not sure yet; 10MB is a guesstimate) | 16:07 |
fungi | we do have git repos hosting things like video files to embed in web pages we publish, but yeah it's not the best idea | 16:07 |
kashyap | And no; it doesn't update often; not even once every 8 months. | 16:07 |
kashyap | Okay; thanks for the useful discussion, folks. I'll come back w/ something more concrete. | 16:08 |
*** lpetrut has joined #openstack-infra | 16:10 | |
fungi | any time! | 16:10 |
*** zul has quit IRC | 16:11 | |
*** lpetrut has quit IRC | 16:14 | |
*** ociuhandu has quit IRC | 16:18 | |
*** ociuhandu has joined #openstack-infra | 16:19 | |
*** ociuhandu has quit IRC | 16:24 | |
*** __ministry1 has joined #openstack-infra | 16:30 | |
*** __ministry1 has quit IRC | 16:30 | |
*** ociuhandu has joined #openstack-infra | 16:38 | |
zbr | fungi: if you are pleased with what i did with git-review, i could also try to take care of bindep, https://review.opendev.org/q/project:opendev%252Fbindep+status:open | 16:39 |
zbr | somehow i have the impression you will never find time to rework your 3y+ incomplete patches. | 16:40 |
*** ociuhandu has quit IRC | 16:42 | |
*** ykarel has quit IRC | 16:47 | |
*** ociuhandu has joined #openstack-infra | 16:47 | |
*** matt_kosut has quit IRC | 16:56 | |
*** lucasagomes has quit IRC | 17:08 | |
*** zzzeek has quit IRC | 17:12 | |
*** zzzeek has joined #openstack-infra | 17:13 | |
*** jamesmcarthur has quit IRC | 17:17 | |
*** eolivare has quit IRC | 17:18 | |
*** jamesmcarthur has joined #openstack-infra | 17:18 | |
*** ociuhandu_ has joined #openstack-infra | 17:22 | |
*** armax has quit IRC | 17:22 | |
*** jamesmcarthur has quit IRC | 17:23 | |
*** hashar has quit IRC | 17:25 | |
*** ociuhandu has quit IRC | 17:26 | |
*** armax has joined #openstack-infra | 17:26 | |
*** ociuhandu_ has quit IRC | 17:27 | |
*** jamesmcarthur has joined #openstack-infra | 17:29 | |
*** jamesmcarthur has quit IRC | 17:34 | |
*** amoralej is now known as amoralej|off | 17:39 | |
*** ociuhandu has joined #openstack-infra | 17:47 | |
*** jamesmcarthur has joined #openstack-infra | 17:50 | |
*** gyee has joined #openstack-infra | 17:51 | |
*** ralonsoh has quit IRC | 17:51 | |
*** ociuhandu has quit IRC | 17:53 | |
*** gfidente has quit IRC | 17:55 | |
*** jpena is now known as jpena|off | 17:55 | |
*** jcapitao has quit IRC | 17:59 | |
*** derekh has quit IRC | 18:01 | |
*** jamesmcarthur has quit IRC | 18:10 | |
*** slaweq has quit IRC | 18:12 | |
*** lbragstad has quit IRC | 18:16 | |
*** dtantsur is now known as dtantsur|afk | 18:17 | |
*** lbragstad has joined #openstack-infra | 18:22 | |
*** jamesmcarthur has joined #openstack-infra | 18:25 | |
*** nightmare_unreal has quit IRC | 18:25 | |
*** ramishra has quit IRC | 18:38 | |
*** andrewbonney has quit IRC | 18:56 | |
*** matt_kosut has joined #openstack-infra | 18:57 | |
*** matt_kosut has quit IRC | 19:02 | |
*** jamesmcarthur has quit IRC | 19:28 | |
*** ThePherm has joined #openstack-infra | 19:33 | |
*** jamesmcarthur has joined #openstack-infra | 19:43 | |
*** jamesdenton has quit IRC | 19:43 | |
*** jamesdenton has joined #openstack-infra | 19:43 | |
*** slaweq has joined #openstack-infra | 19:46 | |
*** jamesmcarthur has quit IRC | 19:47 | |
dansmith | fungi: you mean tempest would ask nova to create an instance and then when it deleted it, the instance seemingly didn't get deleted (because of arp indications)? | 19:59 |
*** jamesmcarthur has joined #openstack-infra | 20:00 | |
*** slaweq has quit IRC | 20:01 | |
fungi | dansmith: nope, nodepool would ask the cloud provider to boot an instance and then the address it gets assigned happens to already be in use by a vm elsewhere on the lan, typically it was an existing server instance at some point but nova no longer had record of it, and the provider ends up running virsh across their hosts to find it | 20:01 |
dansmith | ah | 20:01 |
dansmith | that would seem like a huge problem for them | 20:01 |
dansmith | unrelated to us | 20:01 |
fungi | well, *maybe* unrelated. could also be a bug in nova/neutron/something we've provided them ;) | 20:02 |
dansmith | no, | 20:02 |
dansmith | I meant unrelated to the aggravation it causes us as a "customer", | 20:02 |
fungi | oh, yep | 20:02 |
dansmith | I would think that would be a problem for lots of customers and thus a real problem for them | 20:02 |
dansmith | you say virsh, so does that mean they end up actually finding libvirt domains that never went away? | 20:03 |
dansmith | because if they have logs, we could certainly look at them, but I've really never heard of that happening | 20:03 |
fungi | where it winds up impacting jobs is that sometimes the new node will initially "win" in the arp fight with the router and zuul will be able to ssh into it, but then at some point it loses the battle in an arp cache refresh and suddenly zuul experiences a connection timeout, connection refused, or ssh host key changed error and fails or retries the job | 20:03 |
dansmith | now, asking to delete an instance and nova not being able to do it? sure, but.. not acting like it's gone and it's now | 20:04 |
dansmith | *not | 20:04 |
dansmith | sure, that's a huge problem | 20:04 |
fungi | how different providers deal with it varies, and they're not all so forthcoming with details. i've heard at least one provider say they ran virsh on all their hosts periodically to find virtual machines that nova had "forgotten about" so they could be cleaned up | 20:05 |
fungi | but this is all second-hand | 20:05 |
dansmith | hmm, I'm skeptical :) | 20:05 |
dansmith | nova even periodically reaps instances that might've been deleted when it was offline, if properly configured | 20:05 |
fungi | it's just as possible some overzealous cleanup script is muddling things behind nova's back | 20:06 |
fungi | i'm not really privy to what goes on in anyone's networks, just a user | 20:06 |
dansmith | yeah I mean there's lots of custom stuff that could be getting in the way | 20:06 |
fungi | i wouldn't be surprised to learn that it's either a bug in some openstack software (maybe in a very old release they're still running) or the side effects of custom attempts to work around yet another bug of some sort | 20:07 |
fungi | but possibilities are endless | 20:08 |
dansmith | I mean, I'm not at all saying it's not a bug in nova, | 20:09 |
dansmith | I'm just saying that if we had this problem, I would think people would be raising hell about it because it's such a big deal | 20:09 |
dansmith | so it makes me wonder if people have local hacks, some other buggy network hook thing, etc | 20:10 |
fungi | i can say that we see evidence of it in varying degrees across most of our cloud donors from time to time | 20:10 |
dansmith | mnaser: you see this? ^ | 20:11 |
fungi | usually it's background noise/minor annoyance for us, sometimes it's debilitating and we temporarily stop using that provider and/or give them a list of addresses we saw impacted so they can clean them up | 20:11 |
dansmith | fungi: if they will file bugs (or if there have been some already) then please encourage them to speak up about them next time you hear it | 20:11 |
fungi | i don't recall if we've seen it happen in vexxhost | 20:11 |
fungi | though in some cases it seems to be coupled with system outages/restarts where something may have gotten out of sync or maybe a database got rolled back | 20:12 |
dansmith | yeah, so, | 20:12 |
fungi | in very large providers, that may be a constant occurrence, i suppose | 20:12 |
dansmith | if they don't have nova set to reap unknown VMs (maybe because they're cheating and running their own under the covers) then a delete while the system is not healthy could leave one running | 20:13 |
dansmith | but that's the point of that option | 20:13 |
fungi | right, and whether or not a particular donor takes advantage of that feature we generally won't know. it may also be racy and we're hitting the problem between whatever points in time nova rediscovers rogue virtual machines | 20:14 |
dansmith | it's possible, but again that case has to involve a compute node going offline (via rabbit) for more than a $service_timeout period right before you tried to delete your instance | 20:15 |
dansmith | and actually, it might even require them to have archived their DB in there, I forget all the semantics of that check | 20:15 |
clarkb | fungi: did this come up again because we've seen errors in inap after reenabling it? | 20:15 |
fungi | the varying degrees to which we see it in different providers could be a mix of the stability of their infrastructure, whether they're relying on that feature, and what sort of frequency it's checking | 20:15 |
fungi | clarkb: not entirely, i was explaining in #openstack-tc why we had disabled it previously | 20:16 |
dansmith | fungi: maybe but as above, some real badness has to happen to even get into a situation where nova needs to reap one of its own | 20:16 |
fungi | i don't know if the problem there has resurfaced since readding them yesterday | 20:16 |
dansmith | inap was on super old cellsv1 for quite a while.. do we know what they run now? | 20:16 |
fungi | real badness is the beef stew on which cloud operators dine daily ;) | 20:17 |
dansmith | because cellsv1 had all kinds of problems with stuff like that because of all the forwarding | 20:17 |
clarkb | fungi: got it | 20:17 |
fungi | dansmith: i have no idea if they've upgraded, but it could explain why it's been more of a problem there than elsewhere | 20:17 |
dansmith | yeah for sure | 20:17 |
fungi | the second highest incidence of it has tended to be in rackspace, though seems less common lately. we've also seen it come and go in ovh from time to time | 20:18 |
dansmith | I also don't know if rax ever moved :) | 20:18 |
fungi | this sounds like a potential correlation ;) | 20:19 |
dansmith | not sure about ovh, but I kinda expect they are more current | 20:19 |
dansmith | I thought rax was mostly frozen in time from the sale, which was cellsv1 IIRC | 20:19 |
*** auristor has quit IRC | 20:20 | |
*** auristor has joined #openstack-infra | 20:23 | |
melwitt | there was/is a patch proposed to reap unknown VMs but we nacked it because archive --before could avoid that state | 20:30 |
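For context on the knob melwitt refers to, a sketch of how an operator cron might invoke it so rows are only archived once they are comfortably older than any transient compute outage; the one-week window is an arbitrary example:

    # Archive only instance rows soft-deleted more than a week ago, so a compute
    # node that was briefly unreachable still finds the deleted record (and can
    # reap the leftover guest) when it comes back.
    nova-manage db archive_deleted_rows \
        --before "$(date -d '7 days ago' +%Y-%m-%d)" --until-complete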
*** jamesmcarthur has quit IRC | 20:34 | |
*** jamesmcarthur has joined #openstack-infra | 20:34 | |
*** thogarre has joined #openstack-infra | 20:35 | |
dansmith | melwitt: isn't there a setting for that reap periodic that will nuke non-instance vms? | 20:35 |
melwitt | no there isn't. that's what the person wanted to add | 20:35 |
melwitt | well, sorry. they wanted to nuke instance (owned by nova) vms that were no longer in the database | 20:36 |
melwitt | but yeah we don't have any options for destroying guest vms that are unknown to nova | 20:37 |
dansmith | hmm, I really thought there was | 20:37 |
dansmith | not really sure why we have the option to turn off the reaping then, if it's only things we know used to be ours | 20:38 |
melwitt | this was the patch I'm thinking of https://review.opendev.org/c/openstack/nova/+/627765 | 20:39 |
melwitt | I don't know the historical reasons to not reap but there are other options like 'log' and 'shutdown'. they look to be options for if the operator wanted to debug or otherwise do forensic stuff | 20:40 |
dansmith | I distinctly remember some issue with nova reaping vms that weren't nova instances on hosts, from like 2012 (an IBM internal customer) | 20:40 |
dansmith | but a lot has changed since then, including all that power state goo from the AT&T people | 20:41 |
melwitt | https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L1323 | 20:41 |
melwitt | default is reap | 20:41 |
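The setting under discussion is the compute driver's handling of guests found running for instances the database says are deleted; a minimal nova.conf sketch, with the poll interval shown purely as an example value:

    [DEFAULT]
    # What to do with guests still running whose instance records are marked
    # deleted: noop, log, shutdown, or reap.
    running_deleted_instance_action = reap
    # How often (in seconds) the periodic task scans for such guests.
    running_deleted_instance_poll_interval = 1800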
*** dciabrin__ has joined #openstack-infra | 20:41 | |
dansmith | yeah, the comment even says that "log is what you want for production" which I don't think is legit :) | 20:41 |
dansmith | yeah, I'm looking at it | 20:41 |
melwitt | oh, yeah I wasn't aware of the 2012 issue of reaping vms that weren't nova instances. maybe that was "fixed" since | 20:42 |
dansmith | well, a ton of that periodic stuff that tries to correct power state (to the chagrin of the ironic people) was all around/after that point | 20:42 |
dansmith | so .. yeah, I wonder if they're not running reap, but should (or at least shutdown) | 20:43 |
*** dciabrin_ has quit IRC | 20:43 | |
dansmith | but even still, this would only be a thing if they're local deleting from an api node | 20:43 |
melwitt | yeah ... I dunno, as far as I've known, it's only reaped instances in the database and has caused problems for those who happened to have an archive cron run while any computes were "down" | 20:43 |
dansmith | without logs it's really impossible to tell | 20:44 |
melwitt | right | 20:44 |
melwitt | so if internally a provider cloud had a network partition or some other issue to a compute when a delete was requested, it would do the local delete | 20:44 |
melwitt | and then if archive ran before the compute came back to being accessible, this would happen | 20:45 |
fungi | also it may impact us more by virtue of the fact that we constantly boot and delete instances in tight loops... if their typical users treat the service more like a vps and boot new instances a few times a month and keep them for years, they're unlikely to be severely impacted | 20:45 |
dansmith | melwitt: right, it's a lot of hoops to jump through for a thing that seems to be pervasive | 20:45 |
fungi | so it's just those crazy openstack testing people who are complaining about it ;) | 20:46 |
melwitt | fungi: yeah, I think so. internally the most I've seen this is on an internal CI system, they call them "zombie vms" | 20:46 |
dansmith | melwitt: but, I can totally see cellsv1 database replication over rabbit leaving residue that causes the compute to never delete it | 20:46 |
dansmith | melwitt: because a delete is always local at the top cell anyway, has to be replicated down to the cell and replayed to actually delete stuff | 20:46 |
dansmith | that might also be why lots of delete/create ends up with conflicts, even if they're temporary.. that's how cellsv1 worked | 20:47 |
melwitt | fungi: and it's because there's a cron running archive + the env struggles with high load issues and computes go "down" often-ish | 20:47 |
fungi | melwitt: zombie works. i've tended to refer to them as "rogue vms" (reminds me of rogue ais, i guess) | 20:47 |
dansmith | melwitt: yeah, so those are easily explainable situations for why it happens on that internal cloud right? | 20:48 |
melwitt | dansmith: yeah, agreed cells v1 could be at play here. I keep forgetting it's still in rax at the least | 20:48 |
dansmith | the thing I find so bizarre is why someone like inap would be experiencing this with us and other customers and just sweep it under the rug instead of complaining | 20:48 |
*** jamesmcarthur has quit IRC | 20:48 | |
dansmith | unless they're on cellsv1 and they know why they have consistency problems :) | 20:48 |
fungi | could be. next time someone sees mgagne we can ask for more details | 20:49 |
melwitt | yeah... I _thought_ mgagne had moved to cells v2. but I don't feel sure | 20:50 |
*** viks____ has quit IRC | 20:50 | |
fungi | anyway, it's possible this last bout was just protracted fallout from a major outage or maintenance which took them a couple months to get around to cleaning up completely for $reasons, and it's cleared up since. i haven't seen anyone complaining about a spike in these sorts of failures in the past 24 hours since we've reenabled them in nodepool again | 20:52 |
* fungi hunts down the elastic-recheck graph for that pattern | 20:52 | |
fungi | yeah, i can't even find the query which was tracking that condition | 20:57 |
*** rcernin has joined #openstack-infra | 20:58 | |
fungi | entirely possible that in the zuul v3 era it mostly surfaces as occasional retry_limit results because zuul sees it as a connectivity issue and so builds occasionally just get unlucky and run there three times and happen to trip over bad addresses every time | 20:59 |
melwitt | ah, yeah | 21:01 |
*** rcernin has quit IRC | 21:02 | |
*** rcernin has joined #openstack-infra | 21:03 | |
*** jamesmcarthur has joined #openstack-infra | 21:15 | |
fungi | i think it was hitting tripleo hardest because they already had some jobs which were just intermittently knocking test nodes offline, so if they were already relying on builds getting retried once or twice, having an increased source of retries in some provider would nudge them over the edge into retry_limit | 21:19 |
*** thogarre has quit IRC | 21:20 | |
*** jamesmcarthur has quit IRC | 21:23 | |
*** rcernin has quit IRC | 21:28 | |
*** jamesmcarthur has joined #openstack-infra | 21:29 | |
*** rlandy has quit IRC | 21:40 | |
*** xek_ has quit IRC | 21:48 | |
*** jamesmcarthur has quit IRC | 22:16 | |
*** jamesmcarthur has joined #openstack-infra | 22:17 | |
fungi | zbr: looks like i'll probably wind up doing the git-review and bindep releases tomorrow, i've had more stuff than i expected crop up today and am running out of steam at this point | 22:27 |
*** rcernin has joined #openstack-infra | 22:46 | |
*** zzzeek has quit IRC | 22:56 | |
*** zzzeek has joined #openstack-infra | 22:57 | |
*** yamamoto has joined #openstack-infra | 22:58 | |
*** matt_kosut has joined #openstack-infra | 22:59 | |
*** ociuhandu has joined #openstack-infra | 22:59 | |
*** matt_kosut has quit IRC | 23:03 | |
*** ociuhandu has quit IRC | 23:03 | |
*** paladox has quit IRC | 23:14 | |
*** paladox has joined #openstack-infra | 23:17 | |
*** jamesmcarthur has quit IRC | 23:34 | |
*** paladox has quit IRC | 23:40 | |
*** jamesdenton has quit IRC | 23:40 | |
*** jamesdenton has joined #openstack-infra | 23:41 | |
*** paladox has joined #openstack-infra | 23:55 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!