Friday, 2020-07-31

*** ryohayakawa has joined #opendev  00:05
<openstackgerrit> Merged opendev/system-config master: Revert "Cap pytest to <6.0.0 to fix pytest-html"
*** ryohayakawa has quit IRC  00:11
*** ryohayakawa has joined #opendev  00:27
<kevinz> ianw: Hi, I'm online  01:00
<kevinz> ianw: What do you mean by "leaked node"?  01:01
<ianw> kevinz: hey :)  so i can't delete any of the servers at the moment, they all seem stuck in state deleting  01:01
<kevinz> ianw: OK, let me check  01:01
<ianw> | OS-EXT-STS:task_state       | deleting  01:02
<ianw> one example  01:02
<kevinz> Aha, yes I see. a lot of  01:03
*** elfenix has quit IRC  01:10
<kevinz> ianw: it looks recovered  01:18
<kevinz> ianw: The nova compute service can not talk to rabbitmq  01:19
<ianw> kevinz: cool, thanks yeah i see it blank now too.  let's try pyca/cryptography recheck :)  01:19
<kevinz> ianw: np  01:19
<kevinz> ianw: I wonder why it could not connect to rabbitmq, and why just restarting nova_compute solves it  01:20
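(The symptom described above — servers wedged in `deleting` because nova-compute lost its RabbitMQ connection — can be spotted from the CLI. A minimal sketch, assuming the JSON column names produced by `openstack server list --long -f json`; the state list and helper name are illustrative, not an OpenDev tool.)

```python
import json

# Task states that should only ever be transient; a server that sits in
# one of these usually means nova-compute dropped its RabbitMQ connection
# and never processed the RPC message.
TRANSIENT_TASK_STATES = {"deleting", "powering-on", "powering-off", "spawning"}

def find_stuck_servers(server_list_json):
    """Return names of servers wedged in a transient task state.

    Expects the JSON emitted by `openstack server list --long -f json`,
    where the task state column is titled "Task State".
    """
    return [
        s["Name"]
        for s in json.loads(server_list_json)
        if s.get("Task State") in TRANSIENT_TASK_STATES
    ]
```

Running this repeatedly and alerting when the same name shows up for several minutes distinguishes a genuinely in-flight operation from a lost message.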
<kevinz> ianw: btw I saw that has been closed  01:21
<ianw> oh ... that should not be?  as in shutoff?  let me check  01:21
<ianw> i'm powering it up now ...  01:23
<kevinz> ianw: OK  01:23
<ianw> kevinz: ^ does that timestamp correlate to anything?  i don't think we did anything to shut it down  01:23
<kevinz> ianw: I don't think we had any operation at that time...  01:24
<kevinz> ianw: let me check the log why it has been closed  01:24
<ianw> kevinz: hrm, i'm not sure it's turning on ... the console log is giving me an Unknown Error (HTTP 504)  01:25
<ianw> | OS-EXT-STS:task_state       | powering-on  01:25
<ianw> but ... yeah, it doesn't seem to be  01:25
<kevinz> Instance in transitional state powering-off at start-up retrying stop request _init_instance  01:29
<kevinz> ianw: looks like this mirror is started now  01:32
<ianw> yep; although "console log" on it still shows a 504 error for me, so maybe the console service is unhappy?  01:33
<ianw> pyca-cryptography-centos-8-py36-arm64 node_failure  01:35
<kevinz> let me see  01:35
<kevinz> and I see that scheduling failed  01:37
<ianw> yeah, all of them seem to be going into node_failure  01:41
<ianw> do you need the logs from our side?  i guess you can see the problem on your side  01:41
<kevinz> ianw: I saw that it was scheduled to one node already, and then the compute log shows: Instance spawn was interrupted before instance_claim, setting instance to ERROR state _error_out_instances_whose_build_was_interrupted /var/lib/kolla/venv/lib/python3.7/site-packages/nova/compute/manager.py  01:44
<ianw> ... hrm ... that message doesn't ring any instant bells for me, sorry  01:45
<clarkb> check libvirt log?  01:48
<kevinz> clarkb: the process has stopped at nova-compute; it does not kick off the call to libvirt to create the vm  01:50
<clarkb> huh not sure then, nova channel may know?  01:54
<kevinz> clarkb: ianw: looks recovered now  02:03
<kevinz> I see that the problem is due to rabbitmq  02:03
<kevinz> all the compute nodes failed to connect to the rabbitmq server, and got stuck creating the connection  02:03
<kevinz> ianw: could you help retry the creation? I created 5 instances and they are working fine now  02:04
<ianw> kevinz: yep, i've just retriggered some testing for
<ianw> looks like all the nodes are building, a good sign :)  02:05
<kevinz> ianw: yes I see, cool  02:07
<ianw> kevinz: not sure if you saw the context for pyca/cryptography testing, mostly described in
<kevinz> ianw: thanks for the info. I will take a look at this. It looks like OpenDev can offer external CI for testing.  02:17
<kevinz> This may be a better way to involve more CI jobs outside the OpenStack foundation  02:19
<ianw> yeah, we don't want to become travisci exactly, but contributing where it makes sense on a more "strategic" model  02:20
<ianw> ... yay, results! :)  02:27
*** owalsh has joined #opendev  02:29
*** owalsh_ has quit IRC  02:33
<corvus> "TypeError: 'ellipsis' object is not iterable"  neat  02:48
<corvus> ianw: have they perhaps dropped py35 support?  02:48
<ianw> corvus: yeah, the setup.cfg seems to say it's supported  02:48
<ianw> but yeah  02:49
<corvus> yeah, the travis 3.5 builds are passing  02:49
<corvus> i wonder why that failed then  02:49
<corvus> or, rather, the github actions builds  02:52
<corvus> i dunno about travis, i'm not logged in  02:52
<ianw> it looks like maybe an issue with the typing library on xenial :
<ianw> hrm, that can't be it ... that must be part of the tox run  02:54
<ianw> maybe not, typing isn't listed in pip list  02:55
<ianw> oh, hang on, i'm getting confused by the backport package  02:57
<ianw> "OK, I see you are using 3.5.2 in CI, then you need to either upgrade to 3.5.3, " xenial is 3.5.3  02:57
<ianw> 3.5.2 i mean  02:57
<ianw> that suggests the travis xenial tests are not using the xenial python  02:58
<ianw> right, looks like xenial testing is restricted to python 2.7  03:00
<corvus> i'm going to go out on a limb and guess that the pyca folks are gonna be in the "test latest upstream python 3.5" camp and not in the "test what the distros ship" camp  03:01
<ianw> ok, well there is no coverage for python3 on xenial afaics anyway.  we can switch the xenial test to 2.7 and that would be the equivalent of the x86 tests  03:02
<ianw> Downloading archive:
<ianw> $ python --version  03:03
<ianw> Python 3.5.7  03:03
<ianw> that's how it gets tested on 3.5 effectively  03:03
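(The point ianw lands on above: the Travis "xenial" jobs run a downloaded CPython 3.5.7, not the distro's 3.5.2, which is why the typing regression never showed up there. A small guard sketch; the 3.5.2-vs-3.5.3 cutoff comes from the quoted upstream advice, and the helper name is mine.)

```python
import sys

def typing_module_is_safe(version_info=sys.version_info):
    """True when the interpreter avoids the 3.5-series typing bug
    discussed above (3.5.2 affected, 3.5.3 and later fine)."""
    major, minor, micro = version_info[:3]
    if (major, minor) != (3, 5):
        return True  # only the 3.5 series is in question here
    return micro >= 3
```

A check like this at the top of a test environment makes the "CI python != distro python" mismatch visible instead of failing with an obscure TypeError.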
*** bhagyashris|away is now known as bhagyashris  03:52
<fungi> we have an ensure-python role which installs a built latest minor release using stow, right?  03:59
<fungi> or did i imagine that?  03:59
*** DSpider has joined #opendev  05:00
*** Dmitrii-Sh has quit IRC  05:07
*** Dmitrii-Sh has joined #opendev  05:08
*** redrobot has quit IRC  05:17
*** lpetrut has joined #opendev  06:10
<ianw> fungi: yeah, but not sure about arm64 support ... and also there's the speed to consider if we do that  06:13
<openstackgerrit> OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml
*** Meiyan has joined #opendev  06:32
*** ysandeep|away is now known as ysandeep  06:48
*** ssaemann has joined #opendev  06:50
*** qchris has quit IRC  06:51
*** avass has quit IRC  06:56
*** ssaemann has quit IRC  07:04
*** qchris has joined #opendev  07:05
*** tosky has joined #opendev  07:19
<zbr> I managed to get ownership of -- we can try to use it to make the community more accessible to others.  07:21
*** ssaemann has joined #opendev  07:28
*** ssaemann has quit IRC  07:36
*** moppy has quit IRC  08:01
*** moppy has joined #opendev  08:01
*** ianw has quit IRC  08:01
*** ianw has joined #opendev  08:02
*** ysandeep is now known as ysandeep|afk  09:07
*** dtantsur|afk is now known as dtantsur  09:19
*** Meiyan has quit IRC  09:46
<openstackgerrit> Lajos Katona proposed openstack/project-config master: Import netowrking-l2gw & networking-l2gw-tempest-plugin to x/
*** lpetrut has quit IRC  10:17
*** ysandeep|afk is now known as ysandeep  10:37
*** lpetrut has joined #opendev  10:51
*** zbr is now known as zbr|pto  10:56
<openstackgerrit> Lajos Katona proposed openstack/project-config master: Import netowrking-l2gw & networking-l2gw-tempest-plugin to x/
*** tkajinam has quit IRC  11:36
*** ryohayakawa has quit IRC  11:48
*** ysandeep is now known as ysandeep|brb  12:03
*** ssaemann has joined #opendev  12:10
*** ysandeep|brb is now known as ysandeep  12:16
*** ssaemann has quit IRC  13:58
*** mlavalle has joined #opendev  14:03
*** lpetrut has quit IRC  14:06
<mordred> fungi: we have a role for it that uses stow but I think so far only mnaser is taking advantage of it  14:12
<mnaser> i don't think we have added it yet sadly :( but i think its .. tested  14:12
<mnaser> as in tested inside zuul-jobs, i think  14:12
<mnaser> is there a way that we can get a nodeset for the vexxhost tenant carved out of our nodepool allocation?  14:39
<mnaser> reasoning is the operator stuff is hard to fit into a single 8gb system.  the only other option is to deploy over multinode to distribute this stuff but that's a whole other realm of issues  14:40
<clarkb> mnaser: vexxhost doesn't currently provide any larger flavors iirc. several other clouds do. You could add the expanded labels to vexxhost then consume them from the global pool?  14:43
<clarkb> that said multinode is valuable because it's more like what the operator will do in the real world  14:44
<clarkb> more issues yes, but better to address them early?  14:44
<mnaser> clarkb: yeah -- i agree on that second statement very much.  14:44
<mnaser> we were very stable until we started adding more things and then it wasn't happy. i mean, we run it in multinode right now (the operator is how our cloud runs those services)  14:45
<mnaser> but i think this might be something we should do to replicate a more production-like environment  14:45
<mnaser> i think for now we might have to use bigger nodesets just to help us unblock the progress  14:50
<mnaser> clarkb: i could swear zuul had a thing where the buildset vm needed to be colocated with the jobs that consumed it, is that right?  14:53
<mnaser> which means i might need 2 actual vexxhost-specific flavors, as i really don't need a big nodeset holding that..  14:54
<clarkb> mnaser: by buildset you mean buildset registry? then yes. This is why I suggest adding to the existing pool of those resources instead  14:56
<clarkb> you'll avoid node failures through extra retries and schedule more quickly if there is headroom  14:57
<mnaser> clarkb: so using the -expanded thing? i think it looks like openedge and airship are the only ones that maintain it?  14:57
<mnaser> and one of those is gone and im not sure about the other  14:57
<clarkb> yes, and openedge is off for the summer but should be back, and airship is still providing those resources if they can be scheduled  14:58
<fungi> it's technically citycloud providing them, and yeah the 16gb flavor seems to schedule somewhat reliably there, just not the 32gb one  15:06
<openstackgerrit> Mohammed Naser proposed openstack/project-config master: Re-add vexxhost-specific labels
<mnaser> ^ we'll unblock ourselves and i'll work on moving those jobs to multi-node over the weekend so we can kill those labels  15:11
*** chkumar|rover is now known as raukadah  15:12
<donnyd> Hoping in the next 7 days to have it back online  15:16
<openstackgerrit> Mohammed Naser proposed openstack/project-config master: Re-add vexxhost-specific labels
<mnaser> fungi: i missed updating labels  15:17
<mnaser> cc clarkb mordred ^  15:17
<clarkb> I'm going to pop out shortly and get a bike ride in. Then back to land that gerrit /p/ change and maybe upgrade gitea  15:35
*** redrobot has joined #opendev  15:37
*** auristor has quit IRC  15:43
*** auristor has joined #opendev  15:44
<fungi> i'll probably be semi-around  15:45
<openstackgerrit> Merged openstack/project-config master: Re-add vexxhost-specific labels
<mnaser> is it _possible_ that we have bad images uploaded?  17:27
<mnaser> oddly enough it happens _only_ for the new expanded flavor, but the non-expanded one works just fine, and they're both spawning on the same exact hypervisors..  17:28
*** dtantsur is now known as dtantsur|afk  17:33
<fungi> anything funny with the way bfv is set for those flavors? (no clue if that can vary, complete shot in the dark, seems to be having trouble identifying the boot partition)  17:45
<clarkb> mnaser: we've seen it before because there is no verification of hash sums with glance  17:48
<clarkb> but it's been a universal failure when that happened  17:48
<clarkb> corvus: mordred: can I get a second review on I'll approve it shortly if it looks good  17:48
<mnaser> fungi: those flavors are not bfv, they actually are specific to opendev with built-in local storage (hence the osf- prefix)  17:50
<mnaser> i can't imagine why a larger instance fails to boot but a smaller one does just fine.  the diskimage seems the same and nodepool won't be uploading another image...  17:51
<mnaser> "here are the available partitions:" does not list the actual local drive  17:51
<mnaser> did i create the flavor incorrectly in openstack... /me checks  17:51
<mnaser> oh my god  17:52
<mnaser> i created a flavor with 64mb of memory :-)  17:52
<mnaser> that'll do it.  17:52
<dmsimard> that's a lot of megabytes  17:57
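(The root cause above is easy to hit because flavor RAM is specified in MiB, so "64" reads plausibly if you are thinking in GiB. A sanity-check sketch for freshly created flavors; the thresholds and helper name are illustrative assumptions, not OpenStack defaults.)

```python
# Minimum specs below which a general-purpose CI guest won't boot
# usefully; tune these to the workload (illustrative values only).
MIN_RAM_MIB = 512
MIN_DISK_GB = 10

def flavor_looks_sane(flavor):
    """flavor: dict with 'ram' (MiB), 'disk' (GB) and 'vcpus' keys,
    as in `openstack flavor show -f json` output."""
    return (
        flavor["ram"] >= MIN_RAM_MIB
        and flavor["disk"] >= MIN_DISK_GB
        and flavor["vcpus"] >= 1
    )
```

The 64 MiB flavor in the log would fail this check immediately, long before a guest boots far enough to print "here are the available partitions:".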
*** ysandeep is now known as ysandeep|brb  18:00
<fungi> all the megabytes  18:01
*** ysandeep|brb is now known as ysandeep  18:07
<mordred> clarkb: +2 - +W at will  18:15
<corvus> ditto clarkb (fungi has a comment)  18:15
<clarkb> ya I think I'll address fungi's comment in a followup along with another cleanup  18:18
<clarkb> change is approved, I'll keep an eye on it  18:18
<openstackgerrit> Clark Boylan proposed opendev/system-config master: Cleanup /p/ further and add reminder comments
<clarkb> ^ addresses fungi's comment and adds another cleanup  18:23
<openstackgerrit> Merged openstack/project-config master: Remove os_congress gating
*** ysandeep is now known as ysandeep|away  19:05
<openstackgerrit> Merged opendev/system-config master: Deny Gerrit /p/ requests
<clarkb> the job to apply ^ has started  19:22
<clarkb> apache has restarted  19:25
<clarkb> gerrit itself still works for me so that's all happy looking  19:25
<clarkb> fatal: unable to access '': The requested URL returned error: 403  19:26
<clarkb> that looks correct too  19:26
<mordred> I agree - gerrit still works  19:27
<fungi> yep, lgtm generally  19:27
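(The verification above — a clone against the legacy /p/ path returning 403 while the rest of Gerrit keeps working — boils down to a path-prefix rule. The real rule lives in the Apache vhost config; this is only a sketch of its intent, with a hypothetical helper name.)

```python
def gerrit_path_is_denied(path):
    """True for Gerrit's legacy /p/ git paths, which the change above
    blocks at the Apache layer; everything else passes through."""
    return path == "/p" or path.startswith("/p/")
```

Note the exact-match and trailing-slash handling: a rule matching the bare prefix `/p` would also catch `/plugins/...`, which must keep working.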
<clarkb> I'll give that a bit just to be sure nothing pops up then approve the gitea upgrade  19:28
<clarkb> ok no screaming yet, I'll approve the gitea upgrade now  19:55
<mordred> clarkb: aren't you so glad we don't actually hear physical screaming through IRC? :)  20:00
<clarkb> ya I get enough from my kids  20:01
*** smcginnis has quit IRC  20:12
*** smcginnis has joined #opendev  20:13
<openstackgerrit> Merged opendev/system-config master: Cleanup /p/ further and add reminder comments
<clarkb> rhel and centos users:
<clarkb> I wonder if that will affect our image builds, it isn't clear to me if secureboot is necessary to trip it  20:21
<clarkb> oh it's any uefi setup, with secureboot or without, but we only uefi boot on arm64 so we're probably ok to just roll with it  20:22
<clarkb> mnaser: ^ I think you run centos  20:23
<mnaser> clarkb: thanks for that, we've actually moved most of our fleet to debian :>  20:24
<mnaser> and i think we're already patched and the debian patches are largely ok  20:24
*** owalsh has quit IRC  20:24
<clarkb> cool, just wanted to point it out as that could make for a very bad weekend :)  20:25
* prometheanfire uses his own key :D  20:28
<clarkb> prometheanfire: this bug doesn't seem to require secureboot, it's an issue with the update for secureboot that breaks all uefi boots  20:28
<clarkb> so if you auto update and uefi boot you're still likely to break?  20:28
<prometheanfire> maybe, I'm using systemd-boot personally, so not sure if that's impacted  20:31
<fungi> yeah, i haven't seen any complaints about the debian patches, and i'm subscribed to all the mailing lists which would be blowing up about now if there were any  20:37
<openstackgerrit> Monty Taylor proposed zuul/zuul-jobs master: Add a job for publishing a site to netlify
<fungi> also i have systems which boot uefi and have rebooted them since the patches were applied, without issue  20:38
<fungi> (currently testing a bunch of experimental kernel rebuilds on one, so i definitely would have noticed)  20:38
<clarkb> ya I think it is a rh specific bug  20:40
*** owalsh has joined #opendev  20:45
<fungi> seems so  20:45
<clarkb> gitea change should merge in a couple minutes  20:55
<clarkb> just waiting on zuul to process its results queue  21:01
<openstackgerrit> Merged opendev/system-config master: Upgrade Gitea to v1.12.3
<clarkb> hrm gitea01 is done but I'm not convinced its gitea process is happy  21:11
<clarkb> I think it may be restarting?  21:11
<clarkb> yup it just did it again  21:12
<clarkb> 2020/07/31 21:11:45 ...exer/code/indexer.go:125:func3() [F] Repository Indexer Initialization Timed-Out after: 30s  21:12
<clarkb> I think that is the issue  21:12
<clarkb> gitea02 seems to not be experiencing this  21:14
<clarkb> hrm and now gitea01 has been running for longer than 30 seconds  21:14
<clarkb> that makes me wonder if it is simply an incremental process with a timeout that is too short  21:14
<clarkb> I'll continue to monitor  21:14
<clarkb> other than that the web ui seems to render ok and I haven't seen any other issues  21:15
<clarkb> 5 minutes ago       Up 3 minutes <- is how to identify if it has happened without grepping the logs  21:15
<clarkb> that's the docker ps -a output showing container creation time and start time  21:16
<clarkb> I'll check the rest of them but I think we're ok as gitea01 has caught up and is no longer restarting now  21:16
<clarkb> however the problem with that is it makes our graceful restarts for gerrit replication less graceful, as web can be down with ssh up  21:16
*** owalsh has quit IRC  21:17
<clarkb> I think they may all experience it but then they recover. I'm rtfsing now to see if that is a configurable timeout  21:17
<openstackgerrit> Clark Boylan proposed opendev/system-config master: Increase gitea indexer startup timeout
<clarkb> infra-root ^ I think that should address the problem  21:24
<clarkb> lgtm I'll check that 02-08 render it properly too  21:24
<clarkb> if others can do a quick check too that would be great  21:25
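(For context on the fix being proposed above: the fatal log line shows the indexer giving up after 30s, which matches Gitea 1.12's default repository-indexer startup timeout, a knob in the `[indexer]` section of app.ini. A sketch of the setting, assuming that section layout; the 5m value here is an illustrative choice, not necessarily what the change uses.)

```ini
; app.ini fragment (illustrative): give the repository indexer longer
; to initialize at startup instead of the 30s default that was tripping
[indexer]
REPO_INDEXER_ENABLED = true
STARTUP_TIMEOUT = 5m
```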
<mnaser> i'm trying `docker run -it --rm` which i remembered corvus shared a usage example of in the past  21:25
<mnaser> but it looks like im getting 404 not found (plain text) back..  21:25
<mnaser> is it possible that it just got pruned?  21:25
<openstackgerrit> Monty Taylor proposed zuul/zuul-jobs master: Add a job for publishing a site to netlify
<clarkb> pruning is done when you promote the next image and we prune only those that are more than 24 hours old iirc  21:26
<mnaser> they were pushed here
<mnaser> i don't think we've done a promotion yet  21:26
<mnaser> im not even getting a proper 404, it's a cherrypy 404  21:26
<mordred> insecure-ci-registry is cherrypy  21:26
<clarkb> 02-08 also lgtm so I think we're good, double checking still appreciated  21:27
<mordred> oh - but you mean it's a text not a json  21:27
<mordred> or whatever  21:27
<mnaser> yeah, text not json  21:27
<mnaser> docker: Error response from daemon: error parsing HTTP 404 response body: invalid character '<' looking for beginning of value: "<!DOCTYPE html PUBLIC\ etc  21:27
<mordred> corvus: ^^  21:28
<corvus> mnaser: first of all, do you have the zuul build link for that rather than the log?  21:28
* mnaser needs to break the log link habit  21:28
<corvus> mnaser: docker:// is what you want  21:28
<corvus> mnaser: it's an artifact link on that page  21:28
<corvus> mnaser: (right click / copy url on "vexxhost/glance-api:latest")  21:29
<mnaser> corvus: wow, that's very easy.  21:29
<corvus> mnaser: too easy apparently ;)  21:29
<mnaser> corvus: `import rbd` and i can see my error, why it failed in ci  21:30
<mnaser> this is _awesome_  21:30
<mnaser> now back to finding the fun of why `ImportError: /usr/local/lib/python3.7/site-packages/ undefined symbol: rbd_aio_write_zeroes`  21:30
<corvus> mnaser: yeah, i think this could be revolutionary for the "why did my change fail in ci?" use case  21:30
<mnaser> 100% -- no more retrying things in ci  21:30
<corvus> just put all of the complexity of making a reproducible build aside, and just run the actual build  21:30
<corvus> the only requirement is to do everything with containers.  seems we're heading that way anyway  21:31
<mordred> corvus: I hear containers are just linux after all  21:31
*** owalsh has joined #opendev  21:32
<clarkb> if only we could fetch and cache them in a reliable manner  21:34
* clarkb spent the early part of the week debugging tripleo's request limit woes  21:34
<openstackgerrit> Monty Taylor proposed zuul/zuul-jobs master: Add a job for publishing a site to netlify
<corvus> clarkb: i haven't gotten my signet hc yet... i'm starting to wonder if they're walking it down from portland.  ;)  21:47
<corvus> (still shows as pending on crowdsupply, so they haven't forgotten)  21:47
<clarkb> huh, maybe you're on a second round of prints?  21:47
<corvus> maybe; ordered in feb  21:47
<corvus> maybe i'll be lucky and they'll have fixed the mmc issue  21:48
<clarkb> I also noticed there is a newer client version I need to try  21:48
<clarkb> that may make a good weekend project, to finally dig into that and see if I can make it reliable  21:49
<clarkb> (different usb ports, new client, etc)  21:49
<clarkb> last weekend I upgraded my home fileserver  21:50
*** tosky has quit IRC  22:10
*** DSpider has quit IRC  22:24
<openstackgerrit> Monty Taylor proposed zuul/zuul-jobs master: Add a job for publishing a site to netlify
<openstackgerrit> Monty Taylor proposed zuul/zuul-jobs master: Pass node_version through to included roles
<clarkb> just confirming no more recent gitea restarts on gitea01  23:17
<clarkb> definitely seems to be something that increasing the timeout as in would fix for next time  23:18
<clarkb> erps, wrong link :)  23:18
<clarkb> that change  23:18
<fungi> but also the mlb  23:45

Generated by 2.17.2 by Marius Gedminas - find it at!