*** hamalq has joined #opendev | 00:10 | |
*** hamalq has quit IRC | 00:15 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] reprepro: convert to Ansible https://review.opendev.org/757660 | 00:15 |
*** hamalq has joined #opendev | 00:28 | |
*** hamalq_ has joined #opendev | 00:33 | |
*** hamalq has quit IRC | 00:33 | |
*** hamalq_ has quit IRC | 00:38 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] reprepro: convert to Ansible https://review.opendev.org/757660 | 00:54 |
*** hamalq has joined #opendev | 01:29 | |
*** hamalq has quit IRC | 01:34 | |
*** DSpider has quit IRC | 01:45 | |
*** hamalq has joined #opendev | 01:45 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] reprepro: convert to Ansible https://review.opendev.org/757660 | 01:48 |
*** hamalq has quit IRC | 01:49 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: reprepro: convert to Ansible https://review.opendev.org/757660 | 02:27 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: reprepro: convert to Ansible https://review.opendev.org/757660 | 03:08 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: mirror-update/reprepro : use common functions https://review.opendev.org/758695 | 03:08 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Remove rsyslogd xconsole workaround https://review.opendev.org/756628 | 05:06 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: docker: install rsyslog to capture container output https://review.opendev.org/756605 | 05:06 |
*** marios has joined #opendev | 05:08 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: ARM64 : run base test on Focal too https://review.opendev.org/756629 | 05:48 |
*** ralonsoh has joined #opendev | 06:43 | |
*** sboyron has joined #opendev | 06:45 | |
*** eolivare has joined #opendev | 06:49 | |
*** iurygregory has joined #opendev | 06:51 | |
*** slaweq has joined #opendev | 06:57 | |
*** andrewbonney has joined #opendev | 06:59 | |
*** sshnaidm is now known as sshnaidm|afk | 07:00 | |
*** hamalq has joined #opendev | 07:26 | |
*** rpittau|afk is now known as rpittau | 07:26 | |
*** tosky has joined #opendev | 07:29 | |
*** sgw has quit IRC | 07:29 | |
*** hamalq has quit IRC | 07:30 | |
*** sgw has joined #opendev | 07:47 | |
*** hashar has joined #opendev | 07:52 | |
*** sgw has quit IRC | 08:00 | |
*** hamalq has joined #opendev | 08:19 | |
*** hamalq has quit IRC | 08:24 | |
*** sgw has joined #opendev | 08:25 | |
*** hamalq has joined #opendev | 08:40 | |
*** hamalq has quit IRC | 08:45 | |
*** mkalcok has joined #opendev | 08:57 | |
*** sshnaidm|afk is now known as sshnaidm | 09:37 | |
*** DSpider has joined #opendev | 09:57 | |
*** marios has quit IRC | 10:30 | |
*** ysandeep is now known as ysandeep|coffee | 11:23 | |
*** marios has joined #opendev | 11:30 | |
sshnaidm | hi, all | 11:50 |
sshnaidm | I saw that some vms have a private IPv4 like 10.45.1.98 and a public IPv6 2607:ff68:100:54:f816:3eff:fe91:803 - is that specific to certain cloud providers only? Can I connect from another vm that happens to be in a different cloud provider to the private IPv4 10.45.1.98? | 11:50 |
sshnaidm | is there a routing? | 11:50 |
fungi | sshnaidm: multi-node jobs always satisfy all nodes for their nodeset from a single provider, and it's expected that the public interfaces of all nodes in that build will be able to communicate with each other | 11:56 |
sshnaidm | fungi, yeah, but we have a case with dependency job | 11:56 |
sshnaidm | and seems like they can be in different clouds | 11:56 |
sshnaidm | fungi, can we ensure dependent jobs will run on same cloud as its "parent"? | 11:57 |
fungi | i thought we had solved that already. we have container building workflows which start a registry service in a paused build and then interact with it from others. we wouldn't be able to do that consistently mixing nodes from ipv4-only and ipv6-only providers | 12:03 |
sshnaidm | fungi, yeah, I think I have exactly same jobs pattern - registry and dependent jobs, they happen to be from ipv4 and ipv6. | 12:04 |
sshnaidm | fungi, lemme find logs | 12:05 |
fungi | maybe clarkb or corvus can say for certain how that works when they wake up. i'm a bit swamped getting ready for summit sessions to start and don't have time to go digging in the docs right this moment | 12:05 |
fungi | but yeah, an example will help | 12:06 |
sshnaidm | fungi, hmm.. seems like I was wrong, the dependency also has IPv6. It just can't pull from the ipv6 registry.. | 12:07 |
sshnaidm | fungi, maybe need rule in ip6tables also | 12:07 |
sshnaidm | fungi, ack, will ping someone if I still have trouble, thanks | 12:08 |
*** ysandeep|coffee is now known as ysandeep | 12:11 | |
*** slaweq has quit IRC | 12:19 | |
fungi | sshnaidm: i think there may also be a firewall role we use from zuul-jobs which will open ports for both ipv4 and ipv6, but i may have also imagined it | 12:21 |
*** slaweq has joined #opendev | 12:21 | |
sshnaidm | fungi, yep, trying this now, but if the dep jobs are in the same cloud, then I can use the private ipv4 as well | 12:22 |
sshnaidm | and not to deal with ipv6 at all | 12:22 |
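sshnaidm's hunch about a missing ip6tables rule, and the zuul-jobs firewall role fungi mentions later, both come down to opening the service port for both address families. A minimal manual sketch; the port number and the function name are illustrative assumptions, not taken from the log:

```shell
# Open a registry port in both the IPv4 and IPv6 firewalls, mirroring what
# a multi-node firewall role would do for every node in the nodeset.
# Defined as a function so it can be reviewed before being run with sudo.
open_registry_port() {
    port="${1:-5000}"   # assumed registry port, not confirmed in the log
    sudo iptables  -A INPUT -p tcp --dport "$port" -j ACCEPT
    sudo ip6tables -A INPUT -p tcp --dport "$port" -j ACCEPT
}
```

In a Zuul job the equivalent is handled per-node by a firewall role rather than by hand, which avoids exactly the ipv4/ipv6 asymmetry being debugged here.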
*** slaweq has quit IRC | 12:25 | |
*** priteau has joined #opendev | 12:30 | |
openstackgerrit | sebastian marcet proposed opendev/puppet-openstackid master: Changed default queue driver from database to redis to prevent deadlocks https://review.opendev.org/758806 | 12:31 |
*** slaweq has joined #opendev | 12:31 | |
*** Khodayar has joined #opendev | 12:54 | |
*** slaweq has quit IRC | 12:54 | |
Khodayar | Hi, anyone with OpenStack Monitoring experience? | 12:55 |
*** slaweq has joined #opendev | 12:57 | |
TheJulia | Hey, is openstackid-resources.openstack.org run by the infra-team? | 13:08 |
TheJulia | or is that purely OSF ? | 13:08 |
fungi | TheJulia: it is not run by us, no, it's managed by the osf webdev team and contractors | 13:08 |
fungi | they're working to get it back on track from what i understand | 13:08 |
TheJulia | fungi: thanks | 13:09 |
TheJulia | fungi: any specific communication channel I should keep an eye on? | 13:14 |
fungi | TheJulia: they've been posting updates in the "lobby" main page for the conference | 13:15 |
TheJulia | fungi: almost nobody can even get that far... :\ | 13:16 |
TheJulia | we just get "Checking credentials" for at least most people | 13:16 |
gouthamr | +++ | 13:16 |
gouthamr | :( | 13:16 |
TheJulia | oh hey, it just loaded on one of my computers | 13:16 |
* TheJulia looks at how many requests this is | 13:16 | |
gouthamr | I'm moderating a session that was supposed to start a couple of minutes ago - zoom room not working sigh | 13:17 |
TheJulia | gouthamr: the link not working or the supplied credential information ? | 13:17 |
TheJulia | ugh 138 successful of 185 attempted requests | 13:18 |
gouthamr | TheJulia: the zoom room error is "the meeting ID is invalid", dunno who to get hold of, sent an email to speakersupport | 13:18 |
gouthamr | speakersupport@openstack.org* | 13:18 |
TheJulia | gouthamr: have you updated zoom in say the last two months? | 13:19 |
TheJulia | maybe 3 | 13:20 |
gouthamr | TheJulia: yep - the meeting URL is probably incorrect, i get a "This meeting link is invalid (3,001)" even from the browser | 13:20 |
fungi | as of a few minutes ago we've got a #openinfra-summit channel on freenode | 13:20 |
TheJulia | Have you tried the meeting id manually in the zoom client and then paste the password from the url? | 13:20 |
TheJulia | oooh | 13:20 |
fungi | also yes, e-mail to speakersupport is the preferred way for the events staff to help you sort out session issues | 13:22 |
yoctozepto | TheJulia: I don't seem to be able to get the zoom link from anywhere for your session, only getting disqus... would you mind sharing the link if you have one? | 13:23 |
*** sboyron_ has joined #opendev | 13:23 | |
ttx | Let's centralize the discussion on those issues on #openinfra-summit | 13:23 |
yoctozepto | ++ | 13:24 |
frickler | ttx: fungi: want to send a status notice for that? | 13:25 |
fungi | i can, sure | 13:25 |
*** sboyron has quit IRC | 13:26 | |
openstackgerrit | Merged opendev/puppet-openstackid master: Changed default queue driver from database to redis to prevent deadlocks https://review.opendev.org/758806 | 13:26 |
fungi | though we don't have logging set up for that channel since it was just created | 13:26 |
fungi | something like: #status notice open infrastructure summit platform status updates will be available in the #openinfra-summit channel (though it is presently not logged) | 13:27 |
*** snbuback2 has joined #opendev | 13:28 | |
*** sboyron_ has quit IRC | 13:32 | |
fungi | #status notice Open Infrastructure Summit platform issues are being worked on by OSF events and webdev teams, status updates will be available in the conference "lobby" page as well as the #openinfra-summit channel on Freenode (though it is presently not logged) | 13:33 |
openstackstatus | fungi: sending notice | 13:33 |
-openstackstatus- NOTICE: Open Infrastructure Summit platform issues are being worked on by OSF events and webdev teams, status updates will be available in the conference "lobby" page as well as the #openinfra-summit channel on Freenode (though it is presently not logged) | 13:33 | |
openstackstatus | fungi: finished sending notice | 13:36 |
fungi | reports in #openinfra-summit that etherpad is slow... looking into it now | 13:37 |
fungi | etherpad server is up over 1.2k concurrent connections according to cacti, but seems to be holding out okay. rootfs utilization is kinda risky though, i'll see if there's something i should clean up in the short term | 13:44 |
clarkb | fungi: older db backups can likely go. We keep 7 locally iirc | 13:50 |
clarkb | docker logs to journalctl another likely candidate | 13:51 |
clarkb | we can prune those with a journalctl command iirc | 13:51 |
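The pruning command clarkb is recalling is journald's built-in vacuum. A sketch, wrapped in functions so nothing runs on definition; the 1G cap is an example value, not the size used on the server:

```shell
# Show how much disk the journal occupies, then cap it. --vacuum-size only
# removes archived (rotated) journal files, never the active one.
journal_usage()  { journalctl --disk-usage; }
journal_vacuum() { sudo journalctl --vacuum-size=1G; }   # 1G is an example cap
```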
*** diablo_rojo has joined #opendev | 13:57 | |
clarkb | the journal is 4GB | 13:58 |
clarkb | looks like we have space there now, I assume you cleaned some stuff up? | 13:59 |
fungi | no, it's just the amount of available space is barely enough for an uncompressed db backup i think, so we nearly fill it daily when mysqldump runs | 14:00 |
fungi | (looking at the cacti graph) | 14:00 |
clarkb | ah | 14:00 |
*** sgw has left #opendev | 14:00 | |
clarkb | fungi: hrm I'm not sure that is it either unless gzip spools to disk | 14:01 |
clarkb | we do mysqldump | gzip > file | 14:01 |
clarkb | which should mean the uncompressed content is only ever in memory | 14:01 |
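The pipeline clarkb describes can be sketched with a stand-in data generator in place of mysqldump; the point is that only compressed bytes are ever written to disk, because the uncompressed stream exists only inside the pipe:

```shell
# mysqldump ... | gzip > file, with `seq` standing in for the real dump.
# No uncompressed temporary file is ever created on disk.
backup_dir=$(mktemp -d)
seq 1 1000 | gzip > "$backup_dir/etherpad.sql.gz"
zcat "$backup_dir/etherpad.sql.gz" | wc -l   # → 1000
```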
fungi | oh, maybe not then | 14:01 |
fungi | heh, apparently we're recompressing those? | 14:02 |
fungi | looking in /var/backups/etherpad-mariadb/ there's etherpad-mariadb.sql.gz.2.gz et cetera | 14:03 |
fungi | i think we have logrotate set to compress them when rotating, which would explain the spikes | 14:03 |
fungi | but of course it can't effectively compress them, so it just winds up being an extra copy while rotating | 14:03 |
clarkb | ah yup its set to compress | 14:04 |
clarkb | I bet that is a bug in our ansible conversion | 14:04 |
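The bug under discussion is logrotate's `compress` directive being applied to dumps that are already gzipped, which just produces an extra copy during rotation. A sketch of the corrected stanza; the path and retention values are illustrative, not copied from the actual fix in review 758824:

```
/var/backups/etherpad-mariadb/etherpad-mariadb.sql.gz {
    daily
    rotate 7
    missingok
    nocompress    # dumps are gzipped by the backup job; recompressing only duplicates them
}
```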
*** sgw has joined #opendev | 14:07 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Don't recompress db backups https://review.opendev.org/758824 | 14:07 |
clarkb | fungi: ^ that may fix it | 14:07 |
fungi | aha, thanks! | 14:07 |
*** Khodayar has quit IRC | 14:08 | |
fungi | indeed, i guess we were doing it on gitea as well | 14:09 |
*** sshnaidm is now known as sshnaidm|afk | 14:12 | |
*** elod has quit IRC | 14:17 | |
*** elod has joined #opendev | 14:19 | |
fungi | etherpad concurrent tcp connection count is up to 1.4k now | 14:47 |
fungi | server still looks reasonably happy | 14:47 |
fungi | server-status scoreboard has lots of open slots | 14:49 |
clarkb | I think we can do 8l | 14:51 |
clarkb | *8k | 14:52 |
*** mlavalle has joined #opendev | 15:01 | |
*** ysandeep is now known as ysandeep|away | 15:18 | |
*** hashar has quit IRC | 15:38 | |
*** slaweq has quit IRC | 15:55 | |
*** slaweq has joined #opendev | 15:59 | |
*** hamalq has joined #opendev | 16:00 | |
*** marios has quit IRC | 16:01 | |
*** prometheanfire has quit IRC | 16:05 | |
*** tosky has quit IRC | 16:12 | |
*** prometheanfire has joined #opendev | 16:14 | |
*** dtroyer has joined #opendev | 16:29 | |
*** eolivare has quit IRC | 16:36 | |
*** snbuback has joined #opendev | 16:41 | |
*** rpittau is now known as rpittau|afk | 16:43 | |
*** snbuback92 has joined #opendev | 16:45 | |
*** snbuback92 has quit IRC | 16:55 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Add two more openstackid servers so that we can load balnace them https://review.opendev.org/758846 | 16:55 |
clarkb | I based ^ on the logrotate fix let me rebase really quickly | 16:56 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Add two more openstackid servers so that we can load balnace them https://review.opendev.org/758846 | 16:56 |
clarkb | mnaser: fungi ^ fyi please double check the ips there | 16:56 |
fungi | yup | 16:57 |
clarkb | I'm finishing up dns changes for them. NOTE I cannot do reverse PTR records so that will need to happen vexxhost side if email is a concern | 16:57 |
clarkb | A and AAAA records for both should be up now | 16:57 |
*** snbuback2 has quit IRC | 16:58 | |
clarkb | note the groups already match the variable digits in the names so I think we should be good on that side of things | 16:59 |
*** hamalq has quit IRC | 17:04 | |
*** mkalcok has quit IRC | 17:05 | |
fungi | i've enqueued 758846,2 directly to the gate pipeline | 17:06 |
*** hamalq has joined #opendev | 17:31 | |
*** hamalq has quit IRC | 17:36 | |
openstackgerrit | Merged opendev/system-config master: Add two more openstackid servers so that we can load balnace them https://review.opendev.org/758846 | 17:42 |
*** andrewbonney has quit IRC | 17:49 | |
*** ralonsoh has quit IRC | 18:13 | |
*** priteau has quit IRC | 18:22 | |
clarkb | infra-root I've disabled ansible on bridge with `disable-ansible`; LE updates failed, which is causing us to not run anything else | 18:36 |
clarkb | we're going to manually run puppet on openstackid02 and 03 to pick up those changes | 18:37 |
clarkb | re LE failure I think it is due to nb01 filling its disk | 19:19 |
clarkb | ianw: ^ fyi since I think you were looking at that recently | 19:19 |
clarkb | also I had planned to try and do a quiet infra meeting tomorrow, but given today's fires I doubt I'll have the attention span for it | 19:24 |
clarkb | anyone object to cancelling the meeting? | 19:24 |
fungi | i have no burning desire for a meeting | 19:25 |
corvus | that's fine | 19:32 |
*** tosky has joined #opendev | 19:35 | |
*** weshay has quit IRC | 20:07 | |
*** slaweq has quit IRC | 20:12 | |
*** slaweq has joined #opendev | 20:14 | |
*** hamalq has joined #opendev | 20:32 | |
*** hamalq has quit IRC | 20:36 | |
*** hamalq has joined #opendev | 20:48 | |
*** hamalq has quit IRC | 20:52 | |
clarkb | openstackid 02 and 03 are puppeted now | 20:55 |
clarkb | fungi: should we rm the DISABLE-ANSIBLE file? | 20:55 |
fungi | yeah, i think we're all clear now | 20:57 |
clarkb | done | 20:58 |
ianw | hrm, was it 01 disk filling before? | 21:01 |
clarkb | ianw: I think so | 21:01 |
ianw | fungi: for the reprepro work i turned off the cron jobs as suggested, so applying it should be essentially a no-op now : https://review.opendev.org/#/c/757660/ | 21:03 |
clarkb | https://review.opendev.org/758824 is another good one to get in | 21:03 |
ianw | clarkb: ^ if you have time to look it over, i can work on it | 21:03 |
clarkb | fwiw I'm still trying to coordinate some openstackid scale up | 21:04 |
clarkb | but once done I can try and take a quick look | 21:04 |
ianw | yeah, i saw that. i have generic roles to setup a load-balancer that didn't go in i think | 21:04 |
ianw | https://review.opendev.org/#/c/677903/ | 21:05 |
clarkb | ah cool that could be useful if we stop puppeting this server :) | 21:05 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Make haproxy role more generic https://review.opendev.org/677903 | 21:07 |
ianw | ahh, right. well yeah, that at least makes it possible to more generically install haproxy | 21:09 |
ianw | that was the idea of https://review.opendev.org/#/c/678159/ - to be a generic-ish haproxy for situations like this. that was going to go in front of static but we dropped that bit | 21:10 |
*** ianw has quit IRC | 21:13 | |
*** ianw has joined #opendev | 21:15 | |
openstackgerrit | Goutham Pacha Ravi proposed openstack/project-config master: Add manila client,ui,tempest plugin core teams https://review.opendev.org/758868 | 21:18 |
openstackgerrit | Goutham Pacha Ravi proposed openstack/project-config master: Add manila client,ui,tempest plugin core teams https://review.opendev.org/758868 | 21:30 |
*** slaweq has quit IRC | 21:35 | |
*** slaweq has joined #opendev | 21:35 | |
*** slaweq has quit IRC | 21:41 | |
ianw | there's a bunch of .raw images left on nb01 | 22:05 |
ianw | i think all the interesting logs have scrolled away but i'm looking | 22:05 |
openstackgerrit | Merged opendev/system-config master: Don't recompress db backups https://review.opendev.org/758824 | 22:06 |
ianw | [Mon Oct 19 22:05:54 2020] EXT4-fs error (device dm-1): ext4_put_super:935: Couldn't clean up the journal | 22:06 |
ianw | [Mon Oct 19 22:05:54 2020] EXT4-fs (dm-1): Remounting filesystem read-only | 22:06 |
clarkb | that will do it | 22:08 |
ianw | the logs are full of bad stuff. i'm starting to think maybe it's beyond fsck | 22:09 |
ianw | i think we can just mkfs /opt and an ansible run will restore everything | 22:10 |
ianw | i wonder if this is related to container updates. | 22:21 |
clarkb | we should have a step that does a container image clean but maybe that isn't working or was missed on nodepool builders? | 22:22 |
ianw | in theory a graceful stop of the container should wait for the current dib to finish and shutdown but i doubt it practically does | 22:23 |
clarkb | but that keeps the running container image and the most up to date container images if working | 22:23 |
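The image-clean step clarkb describes maps onto docker's own prune command. A sketch with an example age filter; the function name and the 168h window are assumptions, and the real cleanup role may work differently:

```shell
# Remove images that no container references and that are older than the
# filter window; images backing running containers are never pruned.
prune_old_images() {
    docker image prune --all --force --filter "until=168h"   # 168h = 7 days, example value
}
```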
ianw | the /opt fs fsck's clean, maybe all the dm-1 stuff is from dib's loopback mounts | 22:24 |
ianw | zuul/nodepool-builder latest 7749c9547286 3 weeks ago 792MB | 22:25 |
ianw | i guess that's unlikely then | 22:25 |
ianw | there's no leaked images, they are legitimately taking up 924G | 22:27 |
*** DSpider has quit IRC | 22:27 | |
clarkb | hrm what does nb02 look like? is it not helping out? | 22:28 |
ianw | yeah, it's seeming not doing much | 22:30 |
ianw | openstack.exceptions.HttpException: HttpException: 500: Server Error for url: https://image.api.mtl01.cloud.iweb.com/v2/images/7360eac0-a157-4975-acb3-8b87bfbf53ee, The server has either erred or is incapable of performing the requested operation.: 500 Internal Server Error | 22:31 |
ianw | it just seems to be looping around doing that | 22:31 |
ianw | grep 'Deleting image build ' * | awk '{print $8}' | sort | uniq -c | sort | 22:34 |
ianw | ... 20 fedora-32-0000000038 | 22:34 |
ianw | 20 ubuntu-bionic-0000120991 | 22:34 |
ianw | 27468 debian-stretch-0000116039 | 22:34 |
ianw | so not quite true, but it's certainly hung up on that image | 22:34 |
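ianw's one-liner can be replayed against synthetic log lines. With a `filename:` prefix from grepping multiple files, the build name lands in field 8; the sample lines below are shaped to match that positioning, not copied from the real logs:

```shell
# Count how many times each image build appears in delete messages.
count_deletes() {
    grep 'Deleting image build ' | awk '{print $8}' | sort | uniq -c | sort -n
}

# prints count 1 for fedora-32, count 2 for debian-stretch
printf '%s\n' \
  'nodepool.log: 2020-10-16 INFO nodepool Deleting image build debian-stretch-0000116039' \
  'nodepool.log: 2020-10-16 INFO nodepool Deleting image build debian-stretch-0000116039' \
  'nodepool.log: 2020-10-16 INFO nodepool Deleting image build fedora-32-0000000038' \
  | count_deletes
```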
clarkb | that should be in a separate thread I think | 22:38 |
ianw | 2020-10-16 17:12:46.590 | DEBUG diskimage_builder.block_device.utils [-] exec_sudo: losetup: /opt/dib_tmp/dib_image.JPvC9WST/image0.raw: failed to set up loop device: No such file or directory exec_sudo /usr/local/lib/python3.7/site-packages/diskimage_builder/block_device/utils.py:135 | 22:39 |
ianw | it is trying to build ... but getting weird errors | 22:40 |
ianw | there's a bunch of stuff in /dev/mapper, but no mounts | 22:41 |
*** qchris has quit IRC | 22:41 | |
ianw | i don't know, i think i'm going to reboot it, there's not much i can do with it now | 22:41 |
clarkb | ok | 22:41 |
ianw | we could also consider btrfs and dedup these, as i think the raw and vhd files are basically the same | 22:43 |
ianw | nb02 is trying centos-8 now and i'm watching it | 22:44 |
*** qchris has joined #opendev | 22:54 | |
clarkb | fungi should I exit our root screen on bridge? I think we're steady state on our side of things and now it's up to smarcet et al | 23:06 |
fungi | yeah, i already detached | 23:07 |
clarkb | and now I'm out too | 23:07 |
*** tosky has quit IRC | 23:16 | |
ianw | nb02 seems to have mounted its loop device and is making the image | 23:18 |
ianw | 60G free doesn't leave much headroom on nb01 i guess, as i think vhdutil makes about 3 copies at various points | 23:23 |
ianw | if i let nb02 go for a while and build a few images, presumably then the images will be old on nb01 and should free up some space when i start it | 23:23 |
ianw | i will re-enable ansible though | 23:25 |
clarkb | ya usually we end up about 50:50 between them | 23:25 |
clarkb | I guess if one breaks the other does too after its disk fills | 23:26 |
ianw | yeah, i think that's at the root of this | 23:26 |
ianw | and i think all the corruption might be because we may make sparse .raw files? and then when the disk fills ... bang, nothing can handle that | 23:27 |
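ianw's sparse-file theory is easy to demonstrate: a sparse `.raw` image reports a large apparent size while using almost no blocks, so writes that land later can blow up once the disk is already full:

```shell
# Create a sparse 1 GiB file and compare apparent size with allocated size.
img=$(mktemp)
truncate -s 1G "$img"
du -h --apparent-size "$img"   # ~1.0G apparent
du -h "$img"                   # ~0 actually allocated
rm "$img"
```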
ianw | speaking of backups, "Connection closed by remote host. Is borg working on the server?" on ethercalc ... so something up there | 23:41 |
ianw | ahh, i think i might know, i think the bup config for .ssh/config overwrites the borg one | 23:42 |
fungi | that seems entirely likely | 23:47 |
ianw | hrm, it uses blockinfile ... but still the borg config seems to not be there | 23:47 |
ianw | oh, i think you have to set unique markers | 23:49 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: borg-backup: use unique mark in .ssh/config https://review.opendev.org/758879 | 23:56 |
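The marker collision ianw found comes from blockinfile's default `# BEGIN/END ANSIBLE MANAGED BLOCK` markers being identical for the bup and borg tasks, so the second task replaces the first task's block. A sketch of the shape of the fix in 758879, with hypothetical host and path values:

```yaml
- name: Add borg backup host to root's ssh config
  blockinfile:
    path: /root/.ssh/config
    create: yes
    marker: "# {mark} ANSIBLE MANAGED BLOCK borg-backup"   # unique per tool
    block: |
      Host borg-backup
        HostName backup99.example.opendev.org   # hypothetical server name
        User borg
```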
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!