My development career has been focused on building Web sites and Web-based applications for a variety of companies. I enjoy working on Ruby and Rails-based applications.
My drive is in making the lives of my peers better and more consistent. I want to work with teams that do that by building applications and tools that automate deployments, environment provisioning and communication. For my customers and clients, I like building simple, well-designed applications, tailored to a specific purpose, that allow efficient job completion.
I value proving software development through concurrent testing (using test- and behavior-driven development). I value automating the development-to-production pipeline, from environment set-up to shipping the bits to customers.
I help build and ship valuable, fault-tolerant and loosely-coupled software services for businesses that embrace agile principles as they iteratively and continuously build tailored, pragmatic solutions for their customers. That help can come through directly building those services, or it can come from providing stable and consistent development and QA tooling to prepare for successful production deployments.
About the talk
RailsConf 2019 - Growing internal tooling from the console up by Nathan L Walls
Your site was built for your external customers first. Data or workflow problems are solved on the Rails console.
But, two years in, your app has grown. Identifying, researching, and fixing those data and workflow problems takes more of your time and attention. It frustrates your business stakeholders, your customers and, of course, you.
This talk will look at a Rails-based web store–including inventory, payment processing, fraud mitigation and customer notifications–and explore how we can build tools into our apps to discover when things go sideways and then help get things back on track.
Alrighty, how's everybody doing today? All right. Today I'm talking about building your internal tooling from the console up. I want to start with an initial question: how old is your codebase? Just a show of hands. I'm not going to be able to see all of you that well because of the lights in here, but how many have apps that are essentially new Rails apps, less than 6 months old? 6 months to 18 months? 18 months to 3 years?
All right, and more than four years old? So the vast majority of you are here. All right, so here are some central questions to keep in mind throughout the course of the talk. When is a good time for a team to start making their internal life better? When you start building an application, you are generally starting to solve an external need. But when do you actually start paying more attention to what your internal needs are? What might that look like? And who should advocate for it
and who should do it? So, about myself: I am a senior developer and team lead-ish at an education-focused e-book store. My team builds the store on a Rails codebase. We aren't using anything other than Rails. We're not using a commercial e-commerce package; it is basically custom. I am one of three production key holders, and I review a lot of code. So our store looks like this. Internally we call it Stargate. On the administrative side, we have a rudimentary interface. We have some nice graphics and stuff like that. We have a few
things that we actually take care of in terms of administrative tasks. One of them is looking at what we call catalog imports, which is essentially our inventory ingestion process. And just a little bit of a zoom in here: I checked all of this out earlier and noticed that my full-frame screenshots were very, very tiny from the very back of the room, so I tried my best to enlarge these for the folks in the back. You'll see here that we've got what type of inventory import we're doing (these extra columns don't get used a whole lot),
how long things took, who started them, and whether or not they were successful. You'll see about halfway down there something called "in progress", and we'll talk about that a little bit more. As for the application itself, we're currently on Rails 5.1 and Ruby 2.5. It's React for the front end, with a fair amount of ERB in there feeding React as well. We use MySQL and Redis for data stores, Sidekiq for background job processing, and Kubernetes on Google Cloud is how we deploy.
It's pretty much an app that is right at the sweet spot of Rails: we're not doing really hard computer science, and we're not inventing new algorithms for the world. We're running an online store. I spend a lot of my time helping my team, and in practice that means helping them find the answers to questions that they can't find on their own. It's not because they're not smart people; I work with some very, very smart people. It's because they don't have the access to go find the answers themselves. So,
accordingly, I am a bottleneck of knowledge and access. Now, if anybody has read The Phoenix Project, I'm Brent. All right. What defines Brent in The Phoenix Project? Brent is a focal point of multi-team dependencies. He can't focus because he either seeks out or gets pulled into every emergency or emergent situation that comes up, and he spends all of his time being a firefighter instead of being a mentor. Now, my situation is not nearly as bad as the one that is
conveyed in the book about Brent. It's not catastrophic or anything, but it has at times been incredibly frustrating to feel the weight of expectations and demands on my time and not feel like I can bring full value other than in a firefighter role. So I want to make this dynamic better. I don't want to be Brent, and I don't want anybody else on my team to be Brent. So we're going to talk about how to mitigate this sort of team bottlenecking. What are we going to cover? We're going to talk about some overarching goals for this. We're going to
build out the state of the world for how my application is presently, and I'm hoping that you all can find something to relate to for where you're at. We'll talk about some initial pain points with all of this, and about approaches to problem-solving. Then we're going to step up from the command line and the Rails console, and go into some initial automation and notifications. We'll take an opportunity to reevaluate our pain points, we'll look at some
administrative frameworks, and then we'll talk about building your own tooling. So, goals. We want to make problems easy to see and evaluate, and not a pain, for the entire team. We don't want to limit who can actually see that a problem exists, and we don't want anybody to have to ask a production key holder to discover whether a problem exists. We want to limit key-holder-specific tools to those that are really needed, well defined, and basically for higher-risk situations. We want to develop, observe,
and iterate on this. We want to take this approach and then build on it over time. Some further goals: less involvement for myself and the other production key holders in emergent situations. When I say emergent situation, I mean something that has some sort of business urgency to it but isn't necessarily an emergency, just something that comes up as unplanned work. A lot of these boil down to: a publisher that we're doing business with doesn't
have an answer to why their title isn't in the store, so we have to go find the answer. That is an emergent situation; an emergency would be the store being down. We want to have fewer emergent situations overall. Also, we want to provide proactive tools so folks can discover problems and address them quickly, or find answers to the questions they have. And then I want to take the time that I'm currently putting into these emergent situations and redirect it into longer-term
mentorship: helping my team improve the tools they work with to better answer their questions and better operate the site, and then focusing on things like technical career growth. Finally, I don't want to just change Brent's name. I don't want to be Brent, but I don't want anybody else on my team to be Brent either. Some caveats: this is all a work in progress, and I'm okay exploring the ideas that will make our collective life and experience better. And this is all iterative. I don't want this to be a
final state in any sense of the word. So, on to the world-building. The codebase here is about 4 years old. It was started with a consulting company, and over time we've both grown the team and scaled down the consultancy's involvement, and the team has turned over. The development effort has primarily been around implementing sales-focused features and solving external problems, not so much addressing our internal life. Production access is limited. We have some notifications for automated jobs,
but not for all of them. Production access is actually required to determine the state of automated jobs and generated artifacts, like our feeds that go out to certain indexes and stuff like that. We have some notifications wired up, but not a whole lot. Problem-solving presently involves a lot of ad hoc Rails console or database digging. And again, we have the constraint that we only have three key holders, and the vast majority of these operational questions end up requiring that specialized access. So let's talk more about pain points.
We need to reset stale data. We need to investigate error states of transactions: people attempting to make purchases on the site who, for whatever reason, can't, whether their address doesn't validate, the credit card doesn't work, or we can't generate the proper entitlement when they try to make a purchase. We have to troubleshoot and restart failed jobs, like the inventory ingestion. And we have to find and verify artifacts, like our sitemaps or our store feeds to external sites. We actually have to make sure that those built successfully and that they look the way we
think they should. Context switching is painful. For myself, I find that the disruption from these emergent situations and ad hoc requests is a bit of a productivity pit. It manifests as me spending a lot of my time context switching or answering questions instead of doing the feature work, technical debt paydown, or team mentorship and improvement that I would prefer to do. As we talked about a little bit, we will have ingestion failures. We will have notifications to third-party sites fail, publishers asking why the
title isn't in inventory, or we might have to troubleshoot payment transaction problems. Another one that's pretty popular for us is email notification troubleshooting: why didn't a particular user get an email, like a password reset? We also run into DDoS issues. We can get overly aggressive site crawling from certain indexes, which looks a lot like a DDoS. We deal with fraudulent purchases and all of the other general issues that can come up with running an e-commerce store. Some of these are one-offs, some of them are ongoing, and
our sales cycle is strong in the fall and in the spring, tied to the college semesters, with little outside of that. So let's start talking about how to make this better. Start with your approach to the issues themselves. What defines the issue? This could be just a simple restatement of the problem. Who has the issue, or who has to address the issue now? By this I mean: who are the limited population of your team members who can actually address the issue or question, and what makes that true,
and who could potentially address this issue instead? If the prior constraint were removed and things were different, who would be able to act on and answer that issue in a future instance of the problem? How might solving this issue be easier? That means prevention, easier mitigation, higher visibility, that sort of thing. And how might this issue be easier to spot? That brings us to issue visibility, where we'll talk about alerts and notifications. You can
obviously use monitoring with New Relic or Skylight, where you have operational visibility. These needs cut across order problems, payment problems, fraud, inventory operations, and other periodic jobs that you'll run. We also want to improve resilience. We want to make expensive things fault tolerant. For us, our sitemap extraction process is very, very expensive. It takes multiple hours to complete: we take the contents of our inventory and write it out to a series of XML files. This isn't a resilience engineering talk, but
we want to consider what we make bulletproof and, for less expensive things, make them easier to recover from failure. We want to make it as practical as possible to recover from failure, so we want a well-defined path for how to rerun something if we have to rerun it. The other thing I'll add here is that you might have a general approach for these sorts of problems, but I encourage you to think pragmatically about how you solve each one individually and let the actual individual circumstances dictate how you
approach it, starting from the set of general principles that you have. You want to involve your team here. You want to socialize the issues, review and iterate on solutions with them, and you don't want to be the only person working on solving these problems. If one of the key problems we're facing here is bottlenecking, don't perpetuate the bottleneck by being the only person working the problem. For my team, what we have is a point developer, and this is a rotating responsibility, presently every couple of weeks.
This person acts as developer triage as needed for bugs. They are the general technical question answerer, and they become the focal point of interruption for our business stakeholders: our product manager, business analysts, and some of the other business folks elsewhere as well. What we want to get them to is being able to iteratively improve internal tooling so that they can be more effective. Right now we're on our second or third iteration of what our point developer does, and what we have is a
DevOps daily ops spreadsheet. I'm going to zoom in on this a little bit, but what we have is a series of things that we break down. In the last 24 hours, have you seen any spikes in New Relic on the back end or on the front end? Do we have any new Honeybadger issues that we need to look at? Across our different environments, is catalog, our inventory ingestion, working correctly? Is background job processing working as we expect it to? All right, so now let's actually start talking about how to make improvements here. So
we have this initial state with the command line and the Rails console. What's a console good for? You can investigate your data and your state changes. You can try out and apply one-off fixes. Because you're in a REPL, you can work with your production data and write new classes and methods against it to test out solutions to things. It's also a handy backdoor way of running SQL queries: if you don't have live access to a database console, you can execute that SQL from the Rails console itself.
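A minimal sketch of running SQL from the console, assuming a hypothetical `transactions` table with a `status` column (neither comes from the talk). `select_all` is the standard Active Record connection method for raw SQL; the helper takes the connection as an argument so it can be tried against a stand-in object before going anywhere near production.

```ruby
# Hypothetical console helper. The `transactions` table and its `status`
# column are assumptions for illustration, not the store's real schema.
def transaction_counts_by_status(connection)
  sql = <<~SQL
    SELECT status, COUNT(*) AS total
    FROM transactions
    GROUP BY status
    ORDER BY status
  SQL

  # In Rails, `select_all` returns a result whose rows behave like hashes;
  # `to_a` turns it into a plain array that is easy to inspect in the console.
  connection.select_all(sql).to_a
end
```

In a Rails console the call would look like `transaction_counts_by_status(ActiveRecord::Base.connection)`.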
And actually this is kind of nice, because you can intermix testing out Rails objects with working with SQL queries and seeing what you've got. We use this to look at things like error states on pending transactions or failed purchases. We look at why a particular book is not in inventory or not showing up in the store, or why our inventory ingestion process isn't working correctly. We also use it for flipping feature flags, and we'll have to make data corrections as
well, as requested by our business partners. Here's an example use case. This is something that we'll touch on a little bit further, but we're actually planning out this work in Slack. We have a couple of actual commands that we're going to be issuing in a pre-formatted block there. So we're calling out the work, we explain what's going on, and we get a couple of sign-offs. If you notice down at the very bottom there, you've got a couple of check marks: that's me and my manager basically saying, yes, thumbs up, we're going to go ahead
and do this work. The benefits of the console: it's ad hoc. You have access to your scopes and your production data. You can reopen classes and write new methods on a trial basis, and you probably already have access to it on your production servers once you're logged in. Caveats, of course: you're limited here to your production key holders, so you're not actually changing the dynamic of that bottleneck at this point. Secondly, you are live in production, so you're potentially working without a safety net. Your ad hoc solutions aren't
saved here. You do all that hard work to fix the problem, and you don't really have a natural artifact to take away from it. And it's not great from the standpoint of review, visibility, or auditability. You have to announce what's going to happen; this isn't something that's automatically going to end up as a GitHub pull request and be socialized with the rest of your team. So how do we make that a little bit better? First, plan your actions
in a non-production environment. This is actually a great time to plan your actions with a peer, particularly one who is not a production key holder. If you're a bottleneck, make sure that your team understands what it is you're doing, and plan the work out with them so they get the experience of seeing it happen. Then you want to inform your broader team of what your intentions are, and this breaks down to what you're doing, how it's going to happen, and why it's happening, as in: why are we making this data correction? Why are we making a manipulation to a bunch of business
records? And then you want to say when it's going to happen. Ideally it's not just, "Oh hey, now," but you actually say, "Hey, I'm going to carry out this action here." Along with working with a peer, I think working from a script is a great approach. And when your task is complete, get out. Don't linger in production. So let's get into initial automation and notification on jobs. Automate what you can and make it visible. When we started off, we had some big automated tasks, and sometimes they wouldn't complete, or they'd require changes, and they're time sensitive. And so overall
we cascaded through things and made some of it visible, but not all of it. We wound up with a problem: we have an alerts room, and it is chock-full of notifications. Zooming in, it's just a lot of notifications for what's going on. We do have the benefit that we're increasing the visibility of our system. You want to socialize where these are happening and socialize the purpose of each notification, so the broader business can understand that things are happening. It's easy to do chat
notifications; the barrier to getting these into Slack is pretty low, or into whatever you might be using, Teams or whatever else. We use this for both background jobs and live events that we tie in. As an example of live events: anytime we get a hit from our fraud service saying we don't want to allow a particular user to go through, we get an alert, and that prompts us to look at the account and stuff like that. That just gives us a little bit of a closer look. So, problems with
notifications here. Dumping everything into one or more Slack channels can get very, very noisy. Secondly, the map is not the territory. What I mean by this is that you have to have the right things notifying, you have to understand what not to notify about, and you have to make sure that your team understands which is which and what parts of the system you don't have notifying at all. You don't want to jump to an assumption about the state of the system unless you know what's being reported in and what
isn't. You can do more, but fixing that up requires more complex interaction. You can get into ChatOps, so you could work with webhooks, or things like, as in the previous talk, Hubot from GitHub, or there could be some other custom solutions that you come up with. So, automation: we've done pretty well with this. You discover you're solving the same sort of problems over time, and you can automate them.
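As a rough sketch of capturing one of those recurring console fixes in code: the example below picks out import runs that failed, or that have sat "in progress" too long, and hands each one to a restart block. `ImportRun`, the status symbols, and the six-hour cutoff are all hypothetical stand-ins, not the store's actual models or thresholds.

```ruby
# Hypothetical sketch: ImportRun, the :failed / :in_progress statuses, and the
# six-hour staleness cutoff are assumptions, not the store's actual code.
ImportRun = Struct.new(:id, :kind, :status, :started_at)

class RerunStalledImports
  STALE_AFTER = 6 * 60 * 60 # seconds

  def initialize(runs, now: Time.now, &restart)
    @runs = runs
    @now = now
    @restart = restart
  end

  # Runs that failed outright, or have sat "in progress" past the cutoff.
  def candidates
    @runs.select do |run|
      run.status == :failed ||
        (run.status == :in_progress && @now - run.started_at > STALE_AFTER)
    end
  end

  # Restart each candidate and return the ids touched, so the action can be
  # reported back to the team.
  def call
    candidates.map do |run|
      @restart.call(run)
      run.id
    end
  end
end
```

Passing the restart step in as a block keeps the selection logic testable without a production system; in a real app the block might enqueue a background job, which is left out here because the talk does not show that code.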
And when's a good time to do this? As soon as you start to detect the pattern. You still have the problem of production access being constrained to a limited number of people, but the goal here is that you're getting the knowledge out of people's heads and into your codebase, so you can still broaden your team's understanding. For us, we can quickly re-invoke our catalog import, our inventory ingestion process, with a separate command. I just have to hop on a production server and run a command like this,
and we can get that kicked off again if it fails. So now we land at the point where we can start reevaluating things. We've talked about the Rails console, we've talked about some automation, and we've talked about notifications. We've made some improvements, but we still haven't really solved the bottlenecking issue. So where can we go from here? Let's talk about administrative frameworks. These are things like (let me zoom in here) Active Admin, there's RailsAdmin, there is Administrate. Active Admin looks like this on their main site, and if you view orders, you'll
end up with a screen like this. Zooming in a little bit, you'll see the order number, the date, the customer, the total, and that sort of thing. Administrate gives you the same sort of view. So it is a great way to get a broader data view of what's going on with your application. You're increasing the visibility of the data inside the app, and for myself as a production key holder, that would actually solve a good swath of the problem, because suddenly I'm not the only person who can look at that
data. So we can uncouple answering questions about production data from systems-level access, which I think is a good approach, and there's not a whole lot of work to do: you set up your routes and there you go. The caveats are that you need to make sure you work this into your authentication and authorization schemes, because you are providing a live view into your production data. You don't want to leave it unprotected. You don't want just anybody in your app to be able to get into it. And they
can be a sizable addition to your app. And finally, they won't cover everything you need, including specialized use cases. So that bridges over to building your own admin interfaces. Why would you want to build your own? You're not limited by the structure of an admin framework, and I like to think about this as a way of representing more complex states of things: how things drop out of workflows versus just how data would be represented in the database. Caveat here, of course,
you're essentially doing full feature work at this point. So it has all the accompanying expense of going through a full feature development cycle: testing, getting interface work done, QA, everything, and getting approval from your stakeholders. The stakeholders in this case are both the rest of your technical team and your business stakeholders, and you're making sure that they are set up for success with this as well. So what kind of things might we go to a customized interface
for? In our case, we presently have an interface for currency conversion. What we use currency conversion for is this: we get a lot of our product pricing in U.S. dollars. We sell things in several other countries through their own regional storefronts, and we don't necessarily get our publishing partners to provide us native, in-country pricing for those other countries. So on a daily basis we get a conversion rate from U.S. dollars into these other currencies, and then we can
apply those against the available inventory for those stores to say, okay, based on the U.S. dollar price and this conversion rate, we should price this product at this amount for this other store. This is just zoomed in a bit on that. You'll see in that top line a currency conversion from U.S. dollar to U.S. dollar, which converts at 1.0; the conversion to the pound is 0.77, and so on and so forth. We just add a new entry on a daily basis for the currencies we're converting to. There's more that
I want to see us do, too, and this gets into hypothetical stuff that I've thought about in the last few days and weeks. I showed you an example before where we are getting a lot of notifications about fraud and fraud prevention in a chat room. I would love to get this into the app instead. Instead of dumping everything into Slack for manual action out-of-band, what I would like to do is provide summaries in Slack but have an actual dashboard, something that's actionable in the app itself,
where we could say: yeah, we had this email, we have this user account, the site that they came in on. We got a score back from our fraud provider that basically says how risky we think this person is, and we use that as a threshold for whether or not we're going to allow a purchase. How many purchase attempts are they making? That helps us evaluate. And then we give ourselves some opportunity to say either it's okay or
let's go ahead and block them. We could do something similar with purchase troubleshooting. Thankfully, most transactions go through smoothly, so it's very easy to just say, hey, okay, this is fine. But one of the things that we get is users putting in an address that the bank doesn't think is their address, and we get what's called an address verification failure. You can also get failures where the card is simply declined for various reasons, and we can get
at all this information right now through our payment processor. We actually get this information back and we store it in the database, but we don't present it. So when we have a user calling with a problem, somebody has to go look up why, instead of us just saying, hey, customer support, here's the latest running list of issues that have come up; you might hear from these folks. One of the things that comes up sometimes, and this is on that second line there, is that we call out to a different service that generates the entitlement
for someone to have a license for a book. If that fails, we want to be able to say, hey, go retry that attempt, generate the code, and then activate the book for the user. Instead of having them contact support, we can do that proactively. And finally, we do run into the trouble of folks potentially buying the same thing twice in a short period of time. Either they ran into the first problem, where a code wasn't generated, or they genuinely forgot that they bought something and bought it again. And when you're buying college textbooks that are 50 or 100 dollars or
whatever, that's definitely money that you want back quickly as a student. So if we can say, oh, that's probably a duplicate, you might hear from this user about that, we can proactively cancel that transaction. One of the other things that I think would be very helpful for us to do, too, is to start thinking about our automated jobs as a stream of events, and then get a summary of these. This is a sampling of the sort of things that we do. We have our inventory ingestion; that's that first line with the catalog delta. If we just saw when the
last time it ran and was successful, and got a status for it, we could say, hey, okay. If our store feed failed, we'd get a heads-up about that, and if it failed twice, we could escalate it. We could fire a notification saying, hey, this needs some further attention. And then we would see a stream of events: here are the actual events that have happened, and here's how they resolved themselves. It would be very easy for anybody on the team to get a sense of how things are working rather than having to go dig.
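One way the job-events idea could be sketched, purely as an illustration: each automated job emits an event, and a small summary reports the latest status, the last successful run, and whether the job has failed enough times in a row to escalate. `JobEvent`, the job names, and the two-consecutive-failures rule are assumptions made for this example; the talk describes the idea, not an implementation.

```ruby
# Hypothetical sketch: JobEvent, the job names, and the "two failures in a row"
# escalation rule are illustrative assumptions.
JobEvent = Struct.new(:job, :status, :finished_at)

class JobEventSummary
  ESCALATE_AFTER = 2 # consecutive failures

  def initialize(events)
    # Group per job, oldest first, so the last element is the latest run.
    @by_job = events.sort_by(&:finished_at).group_by(&:job)
  end

  # One row per job: latest status, last success time, and whether to escalate.
  def rows
    @by_job.map do |job, events|
      last_success = events.reverse.find { |event| event.status == :success }
      {
        job: job,
        latest_status: events.last.status,
        last_success_at: last_success&.finished_at,
        escalate: trailing_failures(events) >= ESCALATE_AFTER
      }
    end
  end

  private

  # Number of failures since the most recent non-failure.
  def trailing_failures(events)
    events.reverse.take_while { |event| event.status == :failure }.size
  end
end
```

A dashboard page or a daily chat summary could both be rendered from `rows`, which is the kind of view that would let anyone on the team check job health without production access.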
What I want this to be: I talked before about our point developer filling out the spreadsheet. That's currently about 15 to 45 minutes a day for somebody. That's better than it has been, and we have a good sense of what's going on in production, but we could drop that to five minutes and make this easily accessible for anybody on the team to go look at. That's a lot of tools, and tools are not the be-all and end-all here. Meaning, you want to work with your team and make sure that you understand what problems different
members of your team, different constituencies of your team, see and face, and the questions that they have, including questions they may have in the back of their heads but don't ask because they don't think there's bandwidth to answer them. So let's come back to the central questions and look at them again. When is a good moment for a team to start making their internal life better? I think now is a fine answer for that. Obviously, how much you can do at any given time depends on your circumstances, but start
advocating for getting the time to start making your internal life better. What might that look like? Again, it could be standardizing on cron jobs or runnable jobs for things that are manual tasks now. It could be building, or just getting, an admin interface in place and providing access to data for your technical stakeholders, and ideally your business stakeholders as well, without requiring them to have production systems access. And who should advocate for it, and who should do it? I think everybody on the team: technical stakeholders, technical
practitioners, business stakeholders, all together, should be advocating for this. I think you get a lot of value when you can start delivering more information to your business stakeholders without having to send them away from your app to things that aren't actually part of your system but that you're sending data out to. Our business folks, for example, spend a lot of time in Google Analytics, and they kind of have to piece things together in there. What I want to do is give them better tooling to answer a lot of the questions they have about how purchasing is
going, how sales are going, inside the app itself. There will still be a purpose for the analytics that they're using, but there are some core business questions that we can answer, that we have the answers to; we just need to surface them. So, some additional questions here. What is the time commitment? How often should we review? And who should be involved? For your time commitment, it could be that everybody does 20%. It could be that you take a week out of every month or quarter. Or you dedicate a point developer;
you know, if they're not spending time answering questions, they're spending time improving this internal tooling. There are a lot of different ways to approach this problem. I think you should review your approach to this, if not every retrospective, at least every other retrospective, or particularly when you cycle over who is looking at this internal tooling. If you have a point dev, like we do (we use a two-week iteration right now), I think two weeks is a good time for the person who's leaving point dev and the person who's incoming as point dev to
touch base and at least have a two-person retrospective: hey, here's what I tried, and here's what I think would be good to work on next. [unintelligible] And really think critically about what feels like excessive friction: where the business folks are asking questions of you and you have to turn to a production key holder, or you are that production key holder who has to go answer them. What would have to be true for them to not have to interrupt you? What
kind of problems do you want those production key holders to focus on and worry about more, rather than solving everything? I have some related resources around all of this. As part of the preparation for this talk, I reread The Phoenix Project for the first time in about 5 years. I think it's really good. It's a fictional business novelization, kind of a rewrite of The Goal, which came out, I think, in the late 80s or early 90s. That's also a good one. Work Clean.
This book came out late last year or early this year, and it helped me really think about my role as a senior developer focusing on quality control and mentorship, and about developing a personal framework for optimizing my organization and my productivity. It really shifted my thinking in terms of unsticking myself and helping my team. The Nature Fix: this reminded me of the importance of getting up from my desk regularly and getting out in the woods, just getting outside and not staying so focused
on solving problems all the time, so that I actually get some perspective on things. Finally, the [unintelligible] Code podcast: they consistently offer insightful topics and interviews, and they have a great lineup of empathetic panelists and guests. So, just a word of appreciation for all the hard work they've been doing over the run of that podcast. The graphics are by Stephanie Shaffer, a VitalSource Technologies designer. I'm at wallscorp.us. I write there very infrequently,
but I'll have a lot of stuff up there later today or tomorrow at wallscorp.us/presentations. I'm also on Twitter. That's my talk. Thank you all so much, and I hope you all have a great rest of your conference. You all have been a great audience, and you chose this session out of all the others, so I'm honored. Thank you very much.