17:00:38 <krtaylor> #startmeeting third-party
17:00:39 <openstack> Meeting started Tue Jul 21 17:00:38 2015 UTC and is due to finish in 60 minutes. The chair is krtaylor. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:41 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:43 <openstack> The meeting name has been set to 'third_party'
17:00:52 <asselin> o/
17:00:57 <krtaylor> who's here for the third party CI working group?
17:00:59 <mmedvede> o/
17:01:10 <krtaylor> hi asselin, mmedvede
17:01:54 <krtaylor> asselin, thanks again for running the last meeting
17:02:12 <patrickeast> hi
17:02:13 * krtaylor feels relaxed after vacation time off
17:02:21 <krtaylor> hi patrickeast
17:02:28 <asselin> you're welcome
17:03:16 <krtaylor> I realized that I put the wrong date on the agenda, glad you all are here anyway
17:03:24 <krtaylor> here's the agenda
17:03:27 <krtaylor> #link https://wiki.openstack.org/wiki/Meetings/ThirdParty#7.2F21.2F15_1700_UTC
17:03:57 <krtaylor> #topic Announcements
17:04:06 <krtaylor> I don't have any, none listed
17:04:17 <krtaylor> anyone have anything to quickly announce?
17:04:38 <krtaylor> deadlines? news?
17:05:13 <krtaylor> #topic Common CI Vsprint
17:05:29 <krtaylor> looks like it went well, and now 4 are done!
17:05:48 <asselin> hi, yes, didn't finish, but made a lot of progress
17:06:13 <asselin> nodepool is the most challenging one because it involves changes to nodepool itself
17:06:27 <krtaylor> asselin, what are you thinking for the next steps?
17:06:36 <krtaylor> want to have a second vsprint?
17:07:08 <asselin> krtaylor, not thinking about that. I think we can just do normal reviews and get it done that way
17:07:48 <krtaylor> asselin, fair enough
17:07:51 <asselin> part of the issue is that we are too dispersed geographically for nodepool, so it's difficult to iterate
17:08:03 <mmedvede> I had a general question on the sprint. I noticed some refactoring/move patches were not just moving things, but combining more changes in a single patch
17:09:03 <mmedvede> I think it did slow things down, e.g. my patch (low priority) had a comment to have things changed, in comparison to how they were in system-config
17:09:06 <mmedvede> #link https://review.openstack.org/#/c/199790/
17:09:16 <asselin> mmedvede, yes, I try to limit & enforce scope, but definitely need to do better with that
17:09:36 <krtaylor> it would be good to limit the initial drop to be just the refactoring to make it work
17:09:55 <mmedvede> asselin: ok, good to know. I wanted to do the move without regression in one patch, and any improvements in different ones
17:10:44 <mmedvede> krtaylor: +1
17:10:44 <asselin> mmedvede, there are exceptions of course, but that is what we should aim for
17:11:16 <asselin> perhaps submitting follow-up patches and commenting with 'done in patch #' can help
17:11:32 <mmedvede> I feel there is sometimes a push to do more than necessary in a single patch, not sure why
17:12:43 <krtaylor> as long as we agree, then comments to split out work will be supported
17:13:12 <krtaylor> why would someone refuse a higher patch count? :)
17:13:46 <asselin> I think we need to stand stronger on that.
17:14:44 <asselin> I will add comments to that end: separate refactor from improvement patch.
17:15:12 <krtaylor> #agreed Take a stronger review position on common ci patches that do more than minimal refactoring
17:15:17 <mmedvede> asselin: thank you
17:15:31 <asselin> mmedvede, thanks for bringing it up
17:15:47 <krtaylor> hm, not sure agreed worked, whatever
17:15:56 <krtaylor> we'll see in logs
17:16:08 <krtaylor> asselin re: iterate on nodepool, not sure I understood that
17:16:42 <asselin> just mean working through the patch review cycle https://etherpad.openstack.org/p/common-ci-sprint
17:16:57 <asselin> there are quite a few interrelated patches
17:17:18 <asselin> those are definitely more than a refactor
17:17:46 <asselin> but necessary to not have the nodepool.yaml file being a template
17:18:08 <asselin> so it's more of an improvement followed by a refactor
17:18:15 <krtaylor> so just quicker reviews to land everything in one group
17:18:45 <krtaylor> I understand what you are saying
17:19:30 <krtaylor> actually, that sounds like a good exercise for a vsprint, with lots of ci and infra involvement
17:19:56 <krtaylor> or at least focus hours during/after an infra meeting
17:21:39 <asselin> honestly, we got quite a bit done prior to the virtual sprint, so I think we should do that by keeping reviews & testing active
17:21:54 <krtaylor> asselin, your call, let us know how we can help
17:22:23 <mmedvede> asselin: any idea why this one did not merge? https://review.openstack.org/#/c/199737/
17:22:36 * asselin looks
17:23:04 <mmedvede> might need a re-nudge, maybe gerrit had problems at the time
17:23:21 <asselin> yes, seems like it
17:25:11 <asselin> oh I see....its depends-on is still in review
17:25:55 <mmedvede> asselin: good catch
17:26:42 <krtaylor> anything else for common ci?
17:26:45 <asselin> i've nothing else
17:27:09 <mmedvede> krtaylor: proceed
17:27:10 <krtaylor> #topic Spec to have infra host monitoring dashboard
17:27:42 <krtaylor> so this was moving well, no major problems
17:28:02 <krtaylor> I learned about another dashboard
17:28:29 <mmedvede> #link https://review.openstack.org/#/c/194437/
17:28:32 <krtaylor> jogo wrote lastcomment
17:28:44 <krtaylor> thanks mmedvede
17:28:50 <krtaylor> #link http://jogo.github.io/lastcomment/
17:29:23 <wznoinsk> krtaylor: sorry to inject it here, did you look into having nagios + nagstamon (as a desktop app instead of dashboard)?
17:30:20 <krtaylor> wznoinsk, no, although I'm not sure it's a bad idea
17:30:47 <krtaylor> but it would need to be a service that infra would host
17:31:12 <krtaylor> that way, it would be available, else it would be dependent on someone privately hosting it
17:31:13 <mmedvede> krtaylor: jhesketh suggested to rename the spec files to avoid confusion, I do not think it has been addressed
17:31:22 <wznoinsk> nagstamon is just an app on your workstation, you have it in your systray, that polls the nagios server (over http) for any alerts the nagios server is seeing
17:32:04 <krtaylor> mmedvede, I changed the topic, I didn't agree :) Also, we are moving the original spec...eventually
17:32:06 <wznoinsk> krtaylor: anyways, we can take it offline, I've got some experience with that and really like it (over dashboards or emails)
17:32:13 <krtaylor> sweston, are you around?
17:33:10 <sweston> krtaylor: yes, sir
17:33:32 <krtaylor> wznoinsk, it would mean that someone would have to install that to see the history of a system that just posted a failed comment
17:33:34 <sweston> reading backlog
17:34:01 <krtaylor> wznoinsk, I'd think that a page would be easier for a dev to hit to see if a system was off in the weeds
17:34:24 <wznoinsk> krtaylor: nagstamon is just a desktop version of what you normally see on nagios dashboard(s)
17:34:50 <wznoinsk> you can use nagios webpages for 'non-infra' (devs)
17:34:58 <krtaylor> sweston, thanks for joining us! I had pinged you yesterday, was wondering if you had a chance to see if a patch could change projects in gerrit
17:35:20 * asselin will be back in a few
17:35:33 <jogo> krtaylor: I tried to keep lastcomment as simple as possible, it is a tiny python requests script that runs from a cron job right now
17:35:42 <sweston> krtaylor: no, unfortunately I have not had any spare time at all. I might be able to get to it later in the week
17:35:58 <krtaylor> wznoinsk, I am certainly open to suggestions, I know that others are using nagios (we are too)
17:36:31 <sweston> krtaylor: actually, I am getting ready to upgrade my systems again, so today or tomorrow would be a good time to test this
17:36:37 <krtaylor> sweston, thanks, let me know if you can't, but it would be good to keep all the history and comments with the original spec
17:36:50 <wznoinsk> krtaylor: let's talk some other time as it's go-home time for me already, I'll catch you on #openstack-infra if you don't mind
17:36:52 <mmedvede> wznoinsk: nagios is generally good at monitoring multiple hosts, I am confused how it can be used to show status of third-party CIs
17:37:06 <krtaylor> sweston, else, we can capture the test and include it with a txt file when it is moved to third-party-ci-tools
17:37:38 <krtaylor> jogo, thanks, I do really like the layout, clean and simple
17:37:47 <sweston> krtaylor: yes, we can consider that as a last pass option
17:37:54 <wznoinsk> mmedvede: nagios is powerful, you can use any script or program (bash, python, perl, java or whatever you pick) to check 'a thing' for you and feed the status back to nagios
17:38:41 <krtaylor> wznoinsk, thanks for coming, and let me know what you are thinking, if it provides more with its framework, it may be useful for other tasks in infra as well
17:38:57 <wznoinsk> mmedvede: krtaylor: I'm a huge fan of nagios so I'll sound like everything is doable in nagios, because it is in my opinion :-)
17:39:10 <mmedvede> wznoinsk: I see, so you basically suggest not to write a web frontend, but reuse nagios
17:39:16 <wznoinsk> krtaylor: sure, I'll catch you in the week
17:39:22 <krtaylor> wznoinsk, it is a good tool, and widely used
17:39:30 <krtaylor> wznoinsk, thanks
17:39:44 <wznoinsk> mmedvede: web frontend is probably not the strongest part of nagios, especially for a wider (than just infra) public, so it may still be needed
17:39:59 <wznoinsk> but the infra part should be well covered by nagios out of the box
17:40:13 * krtaylor is trying to keep up with the different threads
17:40:24 <mmedvede> wznoinsk: ok, now I am more confused. Would definitely want to learn more about what you suggest :)
17:40:48 <mmedvede> wznoinsk: I think we might be talking about monitoring different things
17:40:57 <krtaylor> sweston, I'll ping you tomorrow and see where you are at, I think it would help speed the infra hosting spec if it were moved
17:41:19 <jogo> krtaylor: and the code is fairly compact too
17:41:21 * asselin returns
17:41:26 <sweston> krtaylor: agreed
17:42:21 <krtaylor> jogo, it would be ideal if we could combine efforts with patrickeast and have a super simple dashboard
17:42:59 <krtaylor> jogo, patrickeast - have either of you compared with the other dashboard?
17:43:21 <wznoinsk> mmedvede: I monitor my 3rdparty CI with a script under nagios, not sure what exactly you want to monitor, could you ping me the spec pls?
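wznoinsk's point is that any script can act as a Nagios check. Nagios plugins report their result through the process exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) plus a one-line status message on stdout. The sketch below is not code from the meeting; it assumes the recent pass/fail verdicts of one third-party CI have already been collected elsewhere (for example from its Gerrit comments), and the function names and the 30%/60% failure-rate thresholds are hypothetical.

```python
"""Hypothetical Nagios-style check for one third-party CI account."""

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_failure_rate(results, warn=0.3, crit=0.6):
    """Map recent CI verdicts (True = passed) to a (status, message) pair.

    `warn` and `crit` are illustrative failure-rate thresholds, not agreed values.
    """
    if not results:
        return UNKNOWN, "UNKNOWN - no recent results"
    failures = sum(1 for passed in results if not passed)
    rate = failures / len(results)
    msg = f"{failures}/{len(results)} recent runs failed ({rate:.0%})"
    if rate >= crit:
        return CRITICAL, "CRITICAL - " + msg
    if rate >= warn:
        return WARNING, "WARNING - " + msg
    return OK, "OK - " + msg


def main(results):
    """Print the one-line status Nagios displays and return the exit code."""
    status, message = check_failure_rate(results)
    print(message)
    return status
```

A wrapper script would end with `sys.exit(main(verdicts))`, so that Nagios (and nagstamon polling the Nagios server) picks up the status; this gives the threshold-based alerting wznoinsk prefers without anyone scanning a dashboard.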
17:43:39 <patrickeast> yea we’ve chatted a bit about them
17:44:13 <mmedvede> wznoinsk: this is the spec in discussion: https://review.openstack.org/#/c/194437/
17:44:31 * krtaylor has not had the time to do a functional comparison of dashboards
17:44:32 <patrickeast> i think one issue is that they are kind of targeting different audiences, so they prioritize different features/designs
17:44:56 <patrickeast> mine is more focused on someone who wants to troubleshoot a ci system and know what/where things broke
17:45:05 <jogo> patrickeast: yeah, it would be possible to make two views. I think the big difference is how we collect data
17:45:21 <mmedvede> wznoinsk: this is patrickeast's dashboard #link http://ec2-54-67-102-119.us-west-1.compute.amazonaws.com:5000/?project=openstack%2Fnova&user=&timeframe=24&start=&end=
17:45:29 <jogo> I just use the gerrit REST API periodically, patrickeast wanted real time data so gerrit stream ... which makes things a lot more complex
17:45:47 <jogo> I am happy to see any solution that is agreed upon
17:45:58 <jogo> I would be more than happy to stop running lastcomment
17:46:12 <patrickeast> haha, same boat here, i’m ok either way too
17:46:20 <patrickeast> i just want *something*
17:46:37 <wznoinsk> mmedvede: yes, I like patrickeast's work, even when that dashboard first saw daylight (and it was more basic)
17:46:45 <krtaylor> jogo, patrickeast thanks for your flexibility
17:47:42 <krtaylor> jogo, as patrickeast said, yours is useful too, I don't see why you'd have to stop running it, unless you wanted to
17:47:53 <wznoinsk> I do understand there will be different criteria you'd be scoring the CIs on (e.g.: how many times per 10 a CI failed, how many times per 10 a 3rdparty CI failed when upstream jenkins DID NOT etc.)
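jogo describes lastcomment as polling the Gerrit REST API periodically rather than consuming the real-time event stream. The log does not include lastcomment's code, so what follows is only a sketch of that polling style under stated assumptions: it presumes the body of a change query made with the `o=MESSAGES` option has already been fetched (for instance from `https://review.openstack.org/changes/?q=project:openstack/nova&o=MESSAGES`), and the helper names are hypothetical. Gerrit prefixes its JSON responses with the line `)]}'` as a guard against cross-site script inclusion, so that prefix is stripped before decoding.

```python
import json
from datetime import datetime

# Gerrit prepends this line to its JSON responses (anti-XSSI guard).
GERRIT_MAGIC_PREFIX = ")]}'"


def parse_gerrit_json(body):
    """Strip Gerrit's magic prefix, then decode the JSON payload."""
    if body.startswith(GERRIT_MAGIC_PREFIX):
        body = body[len(GERRIT_MAGIC_PREFIX):]
    return json.loads(body)


def parse_gerrit_date(value):
    """Parse a Gerrit UTC timestamp such as '2015-07-21 17:00:38.000000000'.

    The nanosecond fraction is dropped because strptime's %f accepts at most
    six digits.
    """
    return datetime.strptime(value.split(".")[0], "%Y-%m-%d %H:%M:%S")


def last_comment_times(changes):
    """Return {author name: time of that author's most recent comment}."""
    latest = {}
    for change in changes:
        for message in change.get("messages", []):
            author = message.get("author", {}).get("name")
            if author is None:
                continue  # messages written by Gerrit itself carry no author
            when = parse_gerrit_date(message["date"])
            if author not in latest or when > latest[author]:
                latest[author] = when
    return latest
```

Polling like this trades freshness for simplicity: each cron run is a plain HTTP request, whereas the real-time approach patrickeast chose means keeping a Gerrit event-stream connection open.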
17:47:55 <patrickeast> krtaylor: +1
17:48:06 <krtaylor> but having everyone jump in on maintaining one dashboard and improving it would be MUCH better
17:48:20 <patrickeast> jogo: i think yours is actually more useful for someone who just wants to know what systems are ok for a project at a glance
17:48:32 <patrickeast> it's just harder to figure out *why* they are broken
17:48:42 <patrickeast> but that doesn’t matter to 99% of the openstack devs
17:48:44 <sweston> krtaylor: +1. I would prefer this as well, and would rather have folks contributing to radar
17:48:59 <wznoinsk> mmedvede: with more criteria and logic you want a more advanced dashboard, or easy-to-read dashboards 'per problem'
17:49:21 <asselin> sweston has a good point
17:49:40 <krtaylor> I think the most important thing is that we all agree, very quickly, on one dashboard to get the hosting spec done
17:49:43 <jogo> patrickeast: right, that is exactly what I had in mind when I put it together
17:49:51 <asselin> these others are supposed to be 'temporary'
17:50:03 <krtaylor> then we can start working on the full featured solution
17:50:03 <mmedvede> wznoinsk: I want something I can use to detect anomalies in our CI compared to others. patrickeast's scoreboard is currently sufficient for our use case
17:50:28 <jogo> patrickeast: it wouldn't be too hard to add an option to my dashboard to list failed cases when you click something
17:50:39 <mmedvede> wznoinsk: I do not necessarily want 'automagic' detection, I just want data presented in a consumable way
17:51:18 <krtaylor> ok so do we have agreement to stay with scoreboard for now? else we prob have to wait 2 weeks to see any movement
17:52:28 <krtaylor> jogo, would you be willing to move yours to third-party-ci-tools so others could contribute through the gerrit process?
17:52:43 <jogo> krtaylor: no problem
17:53:09 <mmedvede> If we would ask infra to deploy scoreboard or lastcomment, where would puppet modules go
17:53:11 <mmedvede> ?
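One of wznoinsk's example criteria, how often a third-party CI fails a patch that upstream Jenkins passed, is also a simple way to surface the anomalies mmedvede mentions, since it compares a system against everyone else that tested the same patchset. A minimal sketch, assuming per-patchset verdicts have already been extracted from Gerrit comments; the data layout, the `"Jenkins"` account-name default, and the function name are hypothetical.

```python
def failures_when_upstream_passed(results, ci_name, upstream="Jenkins"):
    """Count patchsets where `ci_name` failed although `upstream` passed.

    `results` maps a patchset identifier (e.g. "12345,1") to a dict of
    {account name: True if that system passed}. Returns a tuple
    (suspicious_failures, patchsets_tested_by_both).
    """
    suspicious = 0
    compared = 0
    for verdicts in results.values():
        # Only compare patchsets that both systems actually tested.
        if ci_name not in verdicts or upstream not in verdicts:
            continue
        compared += 1
        if verdicts[upstream] and not verdicts[ci_name]:
            suspicious += 1
    return suspicious, compared
```

Returning the denominator alongside the count keeps a CI that rarely reports from looking healthier than it is; the resulting ratio could be shown next to each CI on a dashboard or fed to a threshold-based alert.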
17:53:36 <krtaylor> mmedvede, it's in the spec (roughly)
17:53:48 <mmedvede> krtaylor: yes, sorry
17:53:55 <jogo> krtaylor: I don't think it makes sense to have infra host a temporary solution
17:54:19 <krtaylor> jogo, actually, that was their suggestion
17:54:25 <jogo> in the time we are sorting out a 'temporary' thing a final thing could be done
17:54:47 <krtaylor> they wanted something now, and it would get all the structure in place
17:55:18 <krtaylor> that's really the catch here, we need a basic solution NOW
17:55:27 <mmedvede> krtaylor: +1 for NOW
17:56:32 <krtaylor> ok, well, we are running out of time for the meeting, move to email thread? or a quick decision?
17:56:37 <wznoinsk> mmedvede: I'm afraid that checking the dashboard for all projects, all CIs in each project, once or a few times a day takes a lot of human cycles, I'd prefer to have a check and threshold defined that lets me know
17:57:10 <krtaylor> wznoinsk, it would really only be visited when a dev got a neg comment
17:57:21 <krtaylor> and it would be filtered by project
17:57:36 <krtaylor> just meant to see if a system is off in the weeds or not
17:58:07 <krtaylor> and for that reason, I'd choose patrickeast's if I had to pick one today
17:58:34 <wznoinsk> krtaylor: yes, it's good for ad hoc checks
17:58:37 <krtaylor> it's easy to compare a system's results to everyone else that tested the same patch
17:58:52 <krtaylor> so do we agree?
17:59:53 <mmedvede> +1
17:59:53 <krtaylor> out of time
18:00:08 <krtaylor> I'll move to email thread
18:00:19 <krtaylor> thanks everyone, really good meeting
18:00:33 <krtaylor> #endmeeting