hashar | my ISP is probably too picky since that works fine from a gmail address | 00:00 |
---|---|---|
hashar | I tried with hashar@free.fr | 00:00 |
hashar | so maybe there is a misconfiguration in your config or it is just my ISP being annoying | 00:00 |
corvus | hashar: we got this from your mail server: "SMTP error from remote mail server after end of data: 550 spam detected" | 00:01 |
hashar | ah yeah | 00:01 |
hashar | that is my isp ;] | 00:01 |
hashar | thank you for checking it! | 00:01 |
corvus | hashar: np, sorry :( | 00:01 |
hashar | it is one of the largest ISPs in France and they went with a few hammers when it comes to dealing with inbound spam | 00:01 |
hashar | anyway | 00:01 |
hashar | the reason was to ask about the status of opendev/gear since it has a lot of small patches that could use review | 00:02 |
*** cloudnull has joined #opendev | 00:02 | |
corvus | hashar: ah, i can try to take a pass through those soon. it mostly "just works" so i haven't really been looking | 00:04 |
hashar | an idea I had was to write down a mail listing the patches and giving a brief overview for each of them | 00:05 |
hashar | that might be less intimidating / easier to process them in bulk | 00:05 |
* hashar mailed the postmaster | 00:24 | |
*** tkajinam has quit IRC | 00:59 | |
*** tkajinam has joined #opendev | 00:59 | |
*** elod has quit IRC | 01:35 | |
*** elod has joined #opendev | 01:37 | |
*** hashar has quit IRC | 01:49 | |
*** euclidsun has joined #opendev | 02:53 | |
*** euclidsun has left #opendev | 02:58 | |
*** zbr4 has joined #opendev | 05:03 | |
*** zbr has quit IRC | 05:06 | |
*** zbr4 is now known as zbr | 05:06 | |
*** ysandeep|away is now known as ysandeep | 05:06 | |
*** qchris has quit IRC | 06:20 | |
*** qchris has joined #opendev | 06:33 | |
*** Gyuseok_Jung has quit IRC | 06:51 | |
yoctozepto | morning | 07:16 |
yoctozepto | how can infra help us (kolla) deal with the rate-limiting problem of docker.io - could we set up a caching docker registry? | 07:17 |
*** tosky has joined #opendev | 07:25 | |
*** andrewbonney has joined #opendev | 07:40 | |
*** moppy has quit IRC | 08:01 | |
*** moppy has joined #opendev | 08:03 | |
*** hashar has joined #opendev | 08:19 | |
*** pushparajkvp has joined #opendev | 08:19 | |
*** dtantsur|afk is now known as dtantsur | 08:24 | |
*** moppy has quit IRC | 08:28 | |
*** moppy has joined #opendev | 08:28 | |
*** moppiner has joined #opendev | 08:32 | |
*** moppy has quit IRC | 08:33 | |
*** DSpider has joined #opendev | 08:51 | |
*** pushparajkvp has quit IRC | 08:54 | |
*** xiaolin has joined #opendev | 09:15 | |
*** xiaolin has quit IRC | 09:28 | |
*** stephenfin has quit IRC | 10:27 | |
*** hashar is now known as hasharAway | 11:43 | |
*** redrobot has quit IRC | 12:10 | |
*** Eighth_Doctor has quit IRC | 12:21 | |
*** mordred has quit IRC | 12:22 | |
*** mordred has joined #opendev | 12:30 | |
*** hasharAway has quit IRC | 12:45 | |
*** Eighth_Doctor has joined #opendev | 13:00 | |
*** lpetrut has joined #opendev | 13:21 | |
fungi | yoctozepto: we were talking about that yesterday (either in here or #openstack-infra, maybe both) | 13:50 |
fungi | yoctozepto: docker has promised to publish recommendations for operators of ci systems as to how best to solve the problem, so we're mostly holding out for that | 13:52 |
fungi | though if running a proxy registry does wind up being their recommended solution, i wonder if we should do a double-layered solution where we used a proxy registry to cache images somewhere centrally, and then pointed our current caching http proxies in each provider at that instead of at dockerhub. that way you get images cached near the nodes, but also have the caches hitting a registry which doesn't rate limit them (we could even restrict access to it so only our http proxies were allowed to make requests if we needed to mitigate abuse) | 13:54 |
frickler | fungi: yoctozepto: the docker blog says "To apply for an open source plan, please complete the short form here.", did anyone do that? Not sure whether they'll announce more details only to those that leave their data there, instead of publicly | 13:56 |
*** ysandeep is now known as ysandeep|away | 13:56 | |
frickler | the form starts by asking for personal data including a docker id | 13:57 |
frickler | oh, the form even says "Please complete our survey to get more information about how Docker can support your open source project on Docker Hub. | 14:01 |
frickler | " at the top https://forms.gle/vvKURDTYwok7Pc4r5 | 14:02 |
fungi | i think we also assumed that it would require some sort of authentication to make use of a special plab | 14:07 |
fungi | plan | 14:07 |
*** qchris has quit IRC | 14:13 | |
*** qchris has joined #opendev | 14:14 | |
clarkb | to be clear we do already cache. The specific issue is we cache blobs not manifests. The old rate limits were based on blob fetches because they are the actual data, but docker changed the rate limiting to be based on manifests because people found blob limits confusing | 14:24 |
clarkb | it is unfortunate because we were doing the right thing for the previous situation | 14:25 |
clarkb | and ya they promised a blog post specifically related to CI | 14:25 |
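
The blob-versus-manifest distinction above can be illustrated with a minimal Python sketch. It assumes the standard registry v2 URL layout (`/v2/<name>/manifests/<reference>` and `/v2/<name>/blobs/<digest>`) and a deliberately simplified proxy model in which blobs are cached by digest and manifests are always passed upstream. The function names and the cache model are hypothetical and are not OpenDev's actual proxy configuration.

```python
import re
from collections import Counter

# Registry v2 request paths: /v2/<name>/manifests/<ref> or /v2/<name>/blobs/<digest>.
# <name> may itself contain slashes (for example "library/ubuntu").
_PATH_RE = re.compile(r"^/v2/(?P<name>.+)/(?P<kind>manifests|blobs)/(?P<ref>[^/]+)$")


def classify(path):
    """Return 'manifests', 'blobs' or 'other' for a registry request path."""
    match = _PATH_RE.match(path)
    return match.group("kind") if match else "other"


def upstream_requests(paths):
    """Count requests reaching the upstream registry behind a blob-only cache.

    Simplified model (an assumption): a blob is fetched upstream once per
    digest and served from cache afterwards; manifests are never cached.
    """
    seen_blobs = set()
    upstream = Counter()
    for path in paths:
        kind = classify(path)
        if kind == "blobs":
            if path in seen_blobs:
                continue  # cache hit, nothing goes upstream
            seen_blobs.add(path)
        upstream[kind] += 1
    return upstream
```

With two jobs pulling the same image through such a proxy, the blob is fetched upstream once but both manifest requests still reach the upstream registry, which is why a manifest-based limit hits a blob-only cache.
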
fungi | new yesterday, https://discuss.python.org/t/pep-632-deprecate-distutils-module/5134 "Deprecate distutils module" | 14:35 |
fungi | (in the ongoing setuptools/distutils saga) | 14:35 |
fungi | er, i meant to link https://www.python.org/dev/peps/pep-0632/ but that's the discussion on their discourse | 14:35 |
fungi | maybe this will finally force distros who want to be able to split files under their package management from those managed by pip et cetera to better come to a compromise with the upstream python devs and package ecosystem | 14:37 |
fungi | since "just patch distutils" will cease to be an option | 14:37 |
fungi | "Code that imports distutils will no longer work from Python 3.12." | 14:39 |
fungi | that's going to be un | 14:39 |
fungi | also fun | 14:39 |
clarkb | and setuptools is vendoring distutils because it does import distutils? | 14:42 |
*** priteau has joined #opendev | 14:43 | |
clarkb | for docker I expect we have two simple options in the short term. First is stop using our caches, then the requests will be distributed across many more IPs | 14:57 |
clarkb | Second is set up per project accounts and then use those with the mirrors so that manifest fetches are associated to accounts not IPs but we still get blob caching for reliability (and perhaps speed) | 14:58 |
*** lpetrut has quit IRC | 15:06 | |
fungi | setuptools is vendoring distutils because it needs new distutils features and doesn't want to have to maintain backward compatibility with whatever the implementations in various 5-year-old stdlib might be | 15:12 |
fungi | and also as indicated by pep 632, the python stdlib maintainers would like to be able to stop maintaining it themselves (it's currently used for building the stdlib modules, but they're looking to switch to using makefiles directly like the interpreter does) | 15:14 |
*** mlavalle has joined #opendev | 15:14 | |
*** rpittau is now known as rpittau|afk | 15:18 | |
openstackgerrit | Nate Johnston proposed openstack/project-config master: Make the Backport-Candidate field in Octavia reviews persist https://review.opendev.org/749986 | 15:19 |
*** hashar has joined #opendev | 15:22 | |
fungi | hrm, the old afs02.dfw cinder volumes i cleaned up went into error_deleting state for some reason | 15:24 |
fungi | i don't think afs01.dfw's did that | 15:25 |
fungi | #status log all four cinder volumes for afs02.dfw have been replaced and cleaned up | 15:26 |
openstackstatus | fungi: finished logging | 15:26 |
fungi | i'll get to work on the dfw mirror server's volume shortly | 15:26 |
clarkb | infra-root I intend to catch up on email and any review response, then pop out for a bike ride. When I get back I plan to try booting an nb03.opendev.org server in linaro-us which can serve as our new dockerized nodepool builder for arm | 15:37 |
fungi | awesome | 15:38 |
clarkb | I do wonder if I should boot a nb05.opendev.org instead to avoid the hostname conflicts but iirc we fixed that in nodepool | 15:40 |
clarkb | and maybe this is a good test of that | 15:40 |
clarkb | fungi: re error deleting volumes, did they successfully detach from the VM at least? | 15:48 |
*** pushparajkvp has joined #opendev | 15:57 | |
*** dtantsur is now known as dtantsur|afk | 16:06 | |
fungi | yep, or at least cinder thought they did | 16:12 |
fungi | it reported them as "available" rather than "in-use" | 16:12 |
fungi | ick, the dfw mirror recorded a page allocation failure in xenwatch the same time i attached its new volume | 16:16 |
fungi | infra-root: ^ should we turn down that region temporarily and reboot the mirror? | 16:17 |
clarkb | is it persistently unhappy? | 16:18 |
clarkb | page allocation failures would indicate some type of OOM? | 16:18 |
fungi | i have a feeling the new volume wasn't successfully hot-added | 16:18 |
clarkb | ah | 16:18 |
fungi | that was the first entry in dmesg in roughly a month | 16:19 |
fungi | and it didn't claim to be out of memory, or even close | 16:19 |
clarkb | ya unless you want to quickly write a partition table and mkfs and do some tests a reboot sounds practical (to be clear I'm saying reboot is probably simpler and easier than the alternative) | 16:21 |
yoctozepto | fungi, frickler, clarkb: I haven't done that form for sure; glad to know there is plan to have some cache; it might be beneficial in general - dockerhub likes to go awry; please ping me wherever you discuss docker issues, I won't mind but rather be thankful :-) | 16:23 |
openstackgerrit | Jeremy Stanley proposed openstack/project-config master: Temporarily disable rax-dfw for mirror reboot https://review.opendev.org/749993 | 16:24 |
openstackgerrit | Jeremy Stanley proposed openstack/project-config master: Revert "Temporarily disable rax-dfw for mirror reboot" https://review.opendev.org/749994 | 16:24 |
fungi | i'll wip the revert for now | 16:25 |
openstackgerrit | Nate Johnston proposed openstack/project-config master: Allow copyAnyScore in gerrit ACLs https://review.opendev.org/749995 | 16:31 |
openstackgerrit | Nate Johnston proposed openstack/project-config master: Make the Backport-Candidate field in Octavia reviews persist https://review.opendev.org/749986 | 16:34 |
fungi | while we wait for a safe mirror reboot, i'll work on etherpad, and then maybe gerrit after that since hopefully activity level will be dropping headed into the weekend so if there is any (unlikely) disruption from the pvmove there it won't be too painful | 16:41 |
*** hashar has quit IRC | 17:11 | |
openstackgerrit | Merged openstack/project-config master: Temporarily disable rax-dfw for mirror reboot https://review.opendev.org/749993 | 17:12 |
fungi | etherpad pvmove is in progress under a root screen session now | 17:13 |
fungi | shouldn't take long, it's a 50gb ssd | 17:14 |
fungi | not like the afs servers where we have 4tb attached | 17:14 |
fungi | already 10% complete | 17:14 |
fungi | #status log cinder volume for etherpad01 has been replaced and cleaned up | 17:28 |
openstackstatus | fungi: finished logging | 17:28 |
fungi | clarkb: if we're worried about i/o performance, should we give review.o.o an ssd volume instead of sata? | 17:34 |
fungi | easy enough to do while i'm replacing anyway | 17:34 |
fungi | it might help with the upgrade | 17:34 |
fungi | looks like the rax-dfw max-servers was zeroed at 17:25z, so in use counts there are dwindling | 18:07 |
fungi | once they bottom out i'll reboot it and then may as well do its pvmove before bringing it back into service | 18:08 |
fungi | demand's not that high now anyway | 18:08 |
fungi | and after that maybe we'll have an idea of whether we want to make changes to the volume for review.o.o | 18:08 |
*** pushparajkvp has quit IRC | 18:11 | |
*** andrewbonney has quit IRC | 18:22 | |
clarkb | fungi: I thought it was an ssd volume already | 18:42 |
clarkb | but yes I think we want an ssd volume for the upgrade process | 18:43 |
fungi | it's a 200gb sata volume right now. i could make it a... 256gb? ssd | 18:43 |
fungi | it's around half-used currently, but i figure we've got db content moving into it too | 18:44 |
fungi | when the notedb migration happens that is | 18:44 |
clarkb | ya one of the things we'll need to sort out is how much extra disk we need | 18:44 |
clarkb | one reason the current volume is so full is we've got a bit of old stuff in /home/gerrit2 | 18:44 |
clarkb | I cleaned up some of that recently though | 18:44 |
clarkb | ~250gb seems fine and if we need more we can always attach another | 18:45 |
fungi | or migrate to a larger volume, yep | 18:46 |
fungi | down to 7 nodes in-use for rax-dfw | 18:47 |
fungi | though we have a bunch of nodes there which look to be stuck in a "deleting" state according to grafana/graphite/nodepool | 18:48 |
fungi | the cinder volume for review.o.o is undergoing pvmove to a 256gb ssd in a root screen session | 18:50 |
fungi | hopefully should be done within the hour | 18:50 |
fungi | it's already 3% complete | 18:50 |
clarkb | it looks like the cinder volume on nb03.openstack.org isn't lvm'd | 18:51 |
fungi | by then we ought to be clear to do the mirror | 18:51 |
clarkb | I think I'll lvm the new server and if that's wrong because of arm64 we'll sort that out | 18:51 |
fungi | yeah, having the lvm layer in place allows us to do stuff like this ;) | 18:51 |
*** zbr0 has joined #opendev | 18:52 | |
fungi | once the pvmove on review.o.o is done and cleaned up, we can either do the logical volume and filesystem resize, or just wait and remember we want to do that during the next gerrit maintenance | 18:53 |
fungi | but risk for online resize is low in my opinion | 18:53 |
*** zbr has quit IRC | 18:53 | |
*** zbr0 is now known as zbr | 18:53 | |
*** priteau has quit IRC | 19:00 | |
clarkb | launch node is running (I had to specify a network to make it work but otherwise smooth sailing so far) | 19:02 |
clarkb | hrm ping6 to wiki failed | 19:04 |
clarkb | I can do --ignore_ipv6 which maybe is necessary in this cloud (it is for ovh) | 19:04 |
clarkb | I'll see where that gets me I guess | 19:05 |
fungi | aha, yeah ping6 in ovh will likely fail out of the gate | 19:05 |
fungi | we should see if we can think up a round-trip mechanism to push a working v6 config to the instances there | 19:05 |
clarkb | we can configure it statically with launch node there | 19:06 |
fungi | ought to be able to query the nova api and then generate something to scp onto it | 19:06 |
clarkb | (still not sure what is up with linaro-us ipv6 but we can always scrap this server and start over if necessary. Figure some progress is better than none) | 19:06 |
fungi | review pvmove is 60% complete | 19:09 |
fungi | 0 nodes in use for rax-dfw. i'll reboot it now | 19:11 |
clarkb | now I seem to have a conflict between make swap script and mount volume script | 19:12 |
fungi | #status log rebooted mirror01.dfw.rax to resolve a page allocation failure during volume attach | 19:13 |
openstackstatus | fungi: finished logging | 19:13 |
clarkb | the issue is that /dev/vdb is the device and make_swap.sh assumes it owns that, then mount_volume.sh tries to use the same device and fails | 19:14 |
clarkb | Instead of using launch_node to mount and configure the volume I'll do that after I've booted the instance and launch node is happy | 19:14 |
clarkb | I need to pop out for a few minutes and finish lunch. I'll try again without automated volume mounting after | 19:17 |
fungi | the old sata volume for review.o.o went into error_deleting too, but it's migrated | 19:31 |
fungi | #status log cinder volume for review.o.o has been replaced, upgraded from 200gb sata to 256gb ssd, and cleaned up | 19:32 |
openstackstatus | fungi: finished logging | 19:32 |
fungi | i went ahead and resized the logical volume and fs to fill it | 19:35 |
fungi | activity seems pretty low at the moment | 19:35 |
fungi | infra-root: ^ just in case anyone spots anything amiss | 19:36 |
fungi | the /home/gerrit2 fs is now 36% used | 19:36 |
fungi | so we've got lots of headroom | 19:36 |
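
The volume replacement fungi describes (attach the new cinder volume, `pvmove` the data across, drop the old volume, then grow the logical volume and filesystem) can be sketched as an ordered list of standard LVM commands. This is only a sketch under stated assumptions: the volume group, logical volume and device names are hypothetical placeholders, `resize2fs` assumes an ext2/3/4 filesystem, and the exact commands run on these servers are not shown in the log. The function only builds the command list and executes nothing.

```python
def pvmove_plan(vg, old_pv, new_pv, lv=None):
    """Return the LVM commands to migrate a volume group onto a new device.

    All arguments are placeholders supplied by the caller; nothing is run.
    """
    steps = [
        ["pvcreate", new_pv],        # initialise the new device as a physical volume
        ["vgextend", vg, new_pv],    # add it to the existing volume group
        ["pvmove", old_pv, new_pv],  # move extents online from old to new
        ["vgreduce", vg, old_pv],    # drop the now-empty old device from the group
        ["pvremove", old_pv],        # wipe the LVM label before detaching the old volume
    ]
    if lv is not None:
        lv_path = f"/dev/{vg}/{lv}"
        # Optional: grow the logical volume into the extra space, then the filesystem.
        steps.append(["lvextend", "-l", "+100%FREE", lv_path])
        steps.append(["resize2fs", lv_path])  # assumes ext2/3/4
    return steps


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in pvmove_plan("main", "/dev/xvdb1", "/dev/xvdc1", lv="gerrit"):
        print(" ".join(cmd))
```

Because the data sits behind the LVM layer, every step up to and including the resize can happen while the filesystem stays mounted, which matches the online migrations logged above.
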
fungi | pvmove for the mirror01.dfw.rax volume is underway in a root screen session on the server | 19:40 |
clarkb | thank you for keeping us up to date | 19:42 |
clarkb | I've just started a new launch without volume automation to get around make swap and mount volume fighting | 19:42 |
fungi | yw | 19:42 |
clarkb | server booted successfully this time. And it got its ipv6 address. It appears that the RAs may have a delay which is why it failed before | 20:00 |
clarkb | we may want launch node to have some reasonable timeout for global ipv6 addr to show up via RAs but will worry about that once I've got this all sorted | 20:01 |
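
The idea of giving launch node a bounded wait for a router-advertised address could look roughly like the following Python sketch. It is not the change clarkb later proposed. `get_addresses` is a hypothetical callable that returns the instance's current addresses as strings, and the timeout and interval defaults are arbitrary.

```python
import ipaddress
import time


def has_global_ipv6(addresses):
    """Return True if any address string is a globally routable IPv6 address."""
    for addr in addresses:
        # Strip an optional zone id ("%eth0") or prefix length ("/64").
        bare = addr.split("%")[0].split("/")[0]
        try:
            ip = ipaddress.ip_address(bare)
        except ValueError:
            continue
        if ip.version == 6 and ip.is_global:
            return True
    return False


def wait_for_global_ipv6(get_addresses, timeout=60.0, interval=2.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll get_addresses() until a global IPv6 address appears or timeout expires."""
    deadline = clock() + timeout
    while True:
        if has_global_ipv6(get_addresses()):
            return True
        if clock() >= deadline:
            return False
        # Say why we are pausing so the delay is not mistaken for a hang.
        print("waiting for a global IPv6 address from router advertisements...")
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays; a caller would normally leave them at their defaults.
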
clarkb | fungi: re uuids and devfs it's the /dev/disk/by-id path but it's not strictly 1:1 | 20:04 |
fungi | yeah, i think that makes sense (so long as we output a message saying that's the delay) | 20:04 |
clarkb | you get a entry there that is virtio-truncateduuid | 20:04 |
fungi | oh, fun | 20:04 |
clarkb | this is on kvm though | 20:04 |
fungi | yep, in rackspace (maybe because of xen?) we get no uuid for raw disk, only for partitions | 20:04 |
fungi | at least according to the kernel | 20:05 |
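
A small Python sketch of how a cinder volume UUID might be matched to the truncated `/dev/disk/by-id/virtio-...` entry clarkb mentions. The log only says the UUID is truncated; the 20-character length used by `expected_virtio_by_id` is an assumption based on the usual virtio-blk serial size, so `match_by_id` uses prefix matching and does not depend on that number. Both helper names are hypothetical.

```python
def expected_virtio_by_id(volume_uuid, serial_len=20):
    """Guess the by-id path for a volume, assuming a 20-character serial."""
    return f"/dev/disk/by-id/virtio-{volume_uuid[:serial_len]}"


def match_by_id(volume_uuid, entries):
    """Return whole-disk virtio entries whose serial is a prefix of the UUID.

    `entries` is a list of file names as found under /dev/disk/by-id.
    """
    prefix = "virtio-"
    matches = []
    for name in entries:
        if not name.startswith(prefix):
            continue
        serial = name[len(prefix):]
        if "-part" in serial:
            continue  # partition entry, not the whole disk
        if serial and volume_uuid.startswith(serial):
            matches.append(name)
    return matches
```

In practice `entries` could come from `os.listdir("/dev/disk/by-id")` on a KVM guest. As fungi notes, Rackspace instances did not expose such an identifier for raw disks, so this approach would not carry over there.
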
fungi | pvmove on the rax-dfw mirror is about half done | 20:05 |
clarkb | ok I think the server is all good now with a volume attached and part of lvm and fstab. Working on the changes needed to turn it into a builder now | 20:08 |
openstackgerrit | Clark Boylan proposed opendev/zone-opendev.org master: Add nb03.opendev.org to DNS https://review.opendev.org/750037 | 20:16 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Remove nodepool builder puppetry and nb03.openstack.org https://review.opendev.org/749853 | 20:23 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Add nb03.opendev.org https://review.opendev.org/750039 | 20:23 |
clarkb | there is a whole depends on chain in there to ensure we don't configure the nb03 with the wrong config | 20:23 |
clarkb | if we do that I think it may delete images in all the clouds (because of our default config?) | 20:23 |
fungi | yep | 20:24 |
clarkb | and I'll WIP the last one in the stack which is the removal of the puppetry | 20:24 |
fungi | that got ugly last time | 20:24 |
clarkb | (we should only do that once we are happy with the new server) | 20:24 |
clarkb | It is a holiday on monday | 20:49 |
clarkb | chances are I'll end up being around but maybe not as much as a typical monday | 20:49 |
clarkb | infra-root https://review.opendev.org/#/c/744821/ is a useful launch node change from ianw to add sshfp info to the output of the script | 21:05 |
clarkb | I had to manually figure that out (based on what the script does) | 21:05 |
fungi | yeah, i think christine doesn't expect me to be on the computer much monday, but i'll still try to be around for emergencies | 21:24 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Wait for ipv6 addrs when launching nodes https://review.opendev.org/750049 | 21:27 |
clarkb | that change feels hacky but should do the job | 21:27 |
fungi | #status log cinder volume for mirror01.dfw.rax.o.o has been replaced and cleaned up | 21:32 |
openstackstatus | fungi: finished logging | 21:32 |
fungi | and i've approved the revert of its max-servers zeroing | 21:33 |
fungi | after quickly browsing around it and making sure it's not obviously broken | 21:33 |
openstackgerrit | Merged openstack/project-config master: Revert "Temporarily disable rax-dfw for mirror reboot" https://review.opendev.org/749994 | 21:39 |
*** mlavalle has quit IRC | 22:59 | |
*** tosky has quit IRC | 23:07 | |
*** DSpider has quit IRC | 23:41 |