*** rlandy_ is now known as rlandy|out | 00:05 | |
*** ministry is now known as __ministry | 02:35 | |
*** Tengu_ is now known as Tengu | 08:26 | |
*** rlandy|out is now known as rlandy | 10:31 | |
*** rlandy is now known as rlandy|mtg | 13:03 | |
*** dasm|off is now known as dasm | 13:25 | |
*** dviroel is now known as dviroel|mtg | 14:15 | |
fzzf[m] | fungi: hi, when I use DIB build diskimage have error, log like... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/tHheIRSvyYwZKoxlWNObiPHh) | 15:20 |
---|---|---|
fzzf[m] | * clarkb: fungi: hi, when I use DIB build diskimage have error, log like... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/lrnCfSDQjOEuYYKoxHMYfqRf) | 15:20 |
fungi | fzzf[m]: it's hard to identify the cause without a lot more detail, but usually the reason is that some process run during package installation within the chroot didn't terminate or mounted something in a subtree of that filesystem which didn't get umounted | 15:24 |
fungi | you might try using df and lsof to figure out what's still using that mount | 15:25 |
fzzf[m] | fungi: That's one way, I'll try checking it. I have checked the diskimage-builder log. Will any useful information be shown there? thanks :) | 15:29 |
fungi | fzzf[m]: probably, but you'll want the context of what on the system is still using that block device/filesystem first, and then you may be able to track down the reason for it in the log | 15:30 |
fzzf[m] | fungi: fine. I get it. thanks :d | 15:31 |
fungi | unfortunately, packages have a tendency to run maintscripts at installation which sometimes mount other things (especially virtual device trees under /dev) or leave processes going with open file handles (e.g., logging to something under /var) | 15:33 |
fzzf[m] | fungi: Is there any way to avoid it? I get this umount error every time. I looked at the diskimage build log: the cirros image download completed, most elements completed their jobs, and then an error occurred during unmount, resulting in an unsuccessful final build. It also shows this at the end, but this should not be the reason for... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/hvtKLSzDDslWeFeiJnaBHmhZ) | 15:43 |
clarkb | dib does set the apt settings to not run scripts iirc | 15:44 |
clarkb | you can interrupt the build process and then examine what processes have files open | 15:44 |
clarkb | to do this you can put a line that is `bash` in the element script that runs at the end before failure | 15:45 |
fungi | also keep in mind that the log entry about being unable to umount the block device is coming from the cleanup phase, so you may be encountering an error before that which is terminating the image build | 15:45 |
clarkb | good point | 15:45 |
fzzf[m] | fungi: sry, I don't understand. do you mean the error comes from the cleanup phase? | 15:49 |
clarkb | fzzf[m]: no the cleanup phase runs after successful or errored builds. This means that if the cleanup phase fails it could be due to an earlier fail during the actual build | 15:50 |
fungi | fzzf[m]: dib tries to install/configure things in the image, then once it's done it cleans up after itself. if something goes wrong during the install/config phase, then it could leave things in a "dirty" state which dib is unable to properly clean up | 15:51 |
fungi | so you'll want to look at the log entries prior to when cleanup started for the build to see if things were successful or whether there's some other failure you need to address | 15:51 |
fzzf[m] | fungi: This sudo kpartx -d /dev/loop4 is before trap_cleanup. I haven't found any other errors... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/fYWaPwBDtZEWPNimhsqmnhOQ) | 15:57 |
fzzf[m] | <clarkb> "to do this you can put a line..." <- do you mean add /bin/bash in the element's script? | 15:58 |
clarkb | fzzf[m]: yes if you do that then manually run the image build your terminal will enter that bash shell and you can interact with the system. Then when you exit that shell the build will continue | 15:59 |
fungi | so it umounts /var/cache/nodepool/dib_tmp/dib_build.OuW0QXUa/mnt/ and then tries to delete /dev/loop4 but gets back an error that loop4p2 (a partition of the loop4 device) is still busy. what does the log say that filesystem was mounted on? or where does it say that partition was mounted? | 16:00 |
*** dviroel|mtg is now known as dviroel|lunch | 16:02 | |
fzzf[m] | clarkb: In this case, I need to edit element script, manually build the diskimage, and set the env variable first, right? I used nodepool-builder to automatically build it before. | 16:03 |
*** rlandy|mtg is now known as rlandy | 16:03 | |
clarkb | yes if you want to do interactive debugging of the builds you need to do manual steps | 16:06 |
clarkb | nodepool runs automatically as a daemon and retries in a loop and there isn't a good way to break into it from there | 16:07 |
fzzf[m] | fungi: find this... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/qWMVqhpjsnVqOiGbrCvspUfO) | 16:08 |
fzzf[m] | clarkb: okay, get it. | 16:09 |
fungi | fzzf[m]: is that a log from a different build? it's talking about a loop2p3 partition, not the loop4p2 your previous sample was complaining about being in use. switching between logs from multiple is just going to get confusing | 16:22 |
fungi | er, between logs from multiple image builds | 16:23 |
fzzf[m] | fungi: sry. that's other. this is loop4p2 log.... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/MTLasbUvSFkzrPHoKXqURRub) | 16:29 |
opendevreview | sean mooney proposed openstack/project-config master: update Review-Priority lable for nova related projects https://review.opendev.org/c/openstack/project-config/+/837595 | 16:32 |
clarkb | sean-k-mooney: re ^ why explicitly list +0 permissions? | 16:33 |
clarkb | I don't think it hurts but also want to make sure I'm not missing anything | 16:34 |
fungi | okay, so the /var/cache/nodepool/dib_tmp/dib_build.OuW0QXUa/mnt/ being umounted is where loop4p2 was mounted, that helps. i wonder if it's somehow acting implicitly as a "lazy umount" (-l/--lazy) so it's not actually umounted when the kpartx -d call happens | 16:34 |
sean-k-mooney | clarkb: we want the patch owner to be able to clear it but not request it | 16:36 |
sean-k-mooney | so we dont want the patch owner to be able to set +1 | 16:36 |
sean-k-mooney | we might relax that but that is why i put change owner +0 | 16:36 |
clarkb | sean-k-mooney: right but everyone can always +0 | 16:36 |
clarkb | and that doesn't clear other people's votes | 16:36 |
sean-k-mooney | right but i was hoping that would override registered owner | 16:37 |
sean-k-mooney | ah ok | 16:37 |
clarkb | oh no I don't think it will | 16:37 |
clarkb | I see what you are saying now. Pretty sure that isn't how it will work | 16:37 |
sean-k-mooney | ok | 16:37 |
clarkb | everyone will be able to +1 | 16:37 |
sean-k-mooney | we technically don't need to enforce that i guess | 16:37 |
sean-k-mooney | ya i guess that is ok | 16:38 |
sean-k-mooney | we can just discourage self use | 16:38 |
sean-k-mooney | we had considered allowing self use but were not sure if it would be abused | 16:38 |
sean-k-mooney | clarkb: so you would suggest just dropping the change owner line | 16:39 |
clarkb | sean-k-mooney: yes to avoid confusion | 16:39 |
sean-k-mooney | running tox locally i seem to have some failures | 16:39 |
sean-k-mooney | so ill be respinning anyway | 16:39 |
sean-k-mooney | thanks will do | 16:39 |
sean-k-mooney | clarkb: ill respin that tomorrow thanks for taking a look. ill get sylvain as ptl to give it a review too to make sure he is happy and then remove -w when its ready for review by infra or ping ye here | 16:45 |
clarkb | sounds good | 16:45 |
*** dmellado_ is now known as dmellado | 16:46 | |
fzzf[m] | fungi: start from line 81. seems like it uses umount -fl. https://paste.opendev.org/show/btoz1JnttNjtouE58mau/ | 16:49 |
clarkb | fzzf[m]: is the filesystem that you are running on network provided? just noting the umount man page for --lazy indicates network filesystems may cause problems | 16:50 |
fungi | well, it's rather than -l/--lazy is there to counter problems with unresponsive network filesystems | 16:51 |
fungi | it's rather that, i mean | 16:51 |
clarkb | right but maybe the mount wasn't actually gone at the end because it was on a network fs? lazy unmounting allows you to ignore that but we aren't ignoring it later due to the loopback device handling | 16:52 |
clarkb | anyway dib should probably not do a lazy umount given the later loopback device handling | 16:52 |
fungi | lazy umounting will also return even if there are still submounts | 16:53 |
fungi | per the umount manpage | 16:53 |
fungi | "A system reboot would be expected in near future if you’re going to use this option for [...] local filesystem with submounts." | 16:54 |
fungi | anyway, it's probably not great to be trying to delete block devices after lazy-umounting them, since there's no guarantee that the umount has completed by the time device deletion starts | 16:55 |
fungi | hard-umounting would solve that, but could lead to the process hanging indefinitely | 16:55 |
fungi | this likely explains the loop device leaks we see on builders from time to time | 16:56 |
clarkb | ya it may be better to lazy unmount, then check in a loop with a timeout and if after say 5 minutes we still haven't unmounted then error | 16:57 |
*** dviroel|lunch is now known as dviroel | 16:58 | |
dansmith | dpawlik: https://af03dfc56dd1bea1c6a5-57b719e0009d4036c44d6542bd77bfc6.ssl.cf1.rackcdn.com/837139/11/check/tempest-full-py3/57baa39/controller/logs/performance.json | 20:10 |
dansmith | clarkb: ^ | 20:10 |
opendevreview | Clark Boylan proposed openstack/project-config master: Remove geard graphing from zuul-status dashboard https://review.opendev.org/c/openstack/project-config/+/837621 | 20:26 |
opendevreview | Clark Boylan proposed openstack/project-config master: Remove geard graphing from zuul-status dashboard https://review.opendev.org/c/openstack/project-config/+/837621 | 20:31 |
clarkb | dansmith: might want to graph rabbit and mysql and etcd in the processes list? but that is looking pretty good | 20:34 |
dansmith | ah, yeah | 20:34 |
clarkb | dansmith: is that ~ half a gig of memory just for privsep though? | 20:34 |
dansmith | no, ~50ish | 20:35 |
clarkb | 514330624 is the rss added together from your example and then divided by 1024^2 is ~490 | 20:38 |
dansmith | oh sorry, you mean total privsep usage, I see | 20:40 |
dansmith | I thought you mean the individual ones, most of which are around 50mb | 20:41 |
dansmith | but yeah neutron is 100m on its own for some reason | 20:41 |
clarkb | ah yup | 20:41 |
clarkb | dansmith: my hunch is they have more regexes/rules. I wonder if they all get precompiled for performance reasons but that means regardless of how the software is used we carry the memory cost of all the rules at all times | 20:42 |
dansmith | maybe, I was thinking more like they're doing large dumps of netlink outputs which inflate the heap and get proxied to the neutron process | 20:42 |
dansmith | like "iptables --line-numbers -L -nv" | 20:42 |
clarkb | hrm ya, iptables can be chatty | 20:43 |
dansmith | IIRC with privsep it should be mostly python code that is resident, not rootwrap-style rules right? | 20:43 |
clarkb | I thought it still did rootwrap style rules, but I may be mistaken. I didn't follow that migration super closely | 20:44 |
dansmith | okay, I didn't think so but... me either | 20:44 |
dansmith | also that doesn't include the api call counts because I specified the wrong file, but it'll have those too | 20:44 |
dansmith | also added erlang (rabbit) and mysqld to the default process list | 20:44 |
clarkb | etcd is probably worthwhile too since it is included by default. Though unsure if anything actually uses it at this point | 20:45 |
clarkb | dansmith: looking at neutron/etc/neutron/rootwrap.d/rootwrap.filters it seems that there are path filters too which may imply privsep is reading and writing files too | 20:47 |
clarkb | that could explain buffer bloat too if files are large | 20:47 |
dansmith | oh is their privsep using rootwrap instead of the native stuff? | 20:47 |
clarkb | the config files seem to use rootwrap paths at least | 20:48 |
dansmith | I thought the native way privsep works is that the privsep binary is run as root (potentially using rootwrap for just that) and then it proxies actual python calls rpc-style to/from the parent | 20:49 |
dansmith | so only one rootwrap rule should be needed if that's what you use | 20:49 |
dansmith | cinder is using etcd via tooz, IIRC | 20:50 |
clarkb | got it re rootwrap and privsep. That sounds likely | 20:50 |
*** dviroel is now known as dviroel|out | 20:55 | |
fungi | afaik, all privsep rule evaluation should be taking place in python, however the migration was possible piecemeal, so if neutron still hasn't completed it then they may be doing both rootwrap and privsep | 21:12 |
clarkb | well it's mostly that the configs all seem to be in rootwrap files hinting at that. But maybe that was a compatibility thing | 21:17 |
fungi | i wouldn't be surprised if they never finished | 21:18 |
clarkb | looks like c-bak is still a big consumer of memory too. I thought we had addressed that by not running c-bak since nothing was testing cinder backup in those installations | 21:31 |
clarkb | looking at devstack's .zuul.yaml c-bak is explicitly enabled. I wonder if that disablement got lost in the d-g to zuul config shuffle /me tries to figure that out | 21:32 |
clarkb | looks like we removed it from grenade in d-g | 21:34 |
dansmith | fungi: right | 21:40 |
dansmith | clarkb: I thought we did have some c-bak tests, but I could be wrong | 21:40 |
dansmith | I disabled c-bak and swift in some other jobs specifically to get some memory back since we were ooming | 21:40 |
dansmith | (and apparently c-bak requires swift) | 21:41 |
clarkb | dansmith: ya looking at the c-bak logs it seems to be doing something. My memory may be related specifically to the grenade situation which likely doesn't check c-bak | 21:43 |
dansmith | ack | 21:44 |
*** rlandy is now known as rlandy|bbl | 22:35 | |
*** dasm is now known as dasm|off | 22:55 |
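
The 16:57 suggestion (lazy unmount, then check in a loop with a timeout and error out if the device is still busy) could be sketched roughly as below. This is not dib's code. The function names `device_busy` and `wait_until_free` are hypothetical, and the busy check relies on the Linux-specific behavior documented in open(2): opening a block device with `O_EXCL` fails with `EBUSY` while the system still has it in use (e.g. mounted). Polling the mount table would not work here, because a lazy umount detaches the mount point from it immediately.

```python
import errno
import os
import time
from typing import Callable


def device_busy(device: str) -> bool:
    """Return True if the block device is still in use (Linux only).

    Assumes `device` is a block device such as /dev/loop4p2 and that the
    caller may open it (normally root). Other errors are re-raised.
    """
    try:
        fd = os.open(device, os.O_RDONLY | os.O_EXCL)
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            return True
        raise
    os.close(fd)
    return False


def wait_until_free(
    device: str,
    timeout: float = 300.0,  # the "say 5 minutes" from the discussion
    interval: float = 1.0,
    busy: Callable[[str], bool] = device_busy,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until busy(device) is False; return False once timeout expires."""
    deadline = time.monotonic() + timeout
    while busy(device):
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
    return True
```

A caller would run the lazy umount first, then call `wait_until_free("/dev/loop4p2")`, and only go on to `kpartx -d /dev/loop4` if it returned True; a False result would be reported as a build error rather than leaking the loop device silently.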