• Strong technologist with experience in telecommunications, healthcare, technology and software consulting verticals.
• Certified DevOps and API Engineer/Practitioner
• Google Cloud Apigee API Ninja
• Passionate about APIs, Microservices, Kubernetes, Serverless, CI/CD, Automated Delivery Pipelines, Cloud Computing.
About the talk
Most business-critical applications today are powered by APIs, so any downtime or performance degradation can lead to significant loss in revenue, customers, and brand value. This puts pressure on operations teams to monitor real-time API performance and security. Adding to the challenge is the need to support new releases and reduce resolution times.
See how Google Cloud’s Apigee API management platform brings the power of industry-leading AI and ML technologies to simplify operations and help ensure APIs are always available, secure, and performing as expected. Learn best practices of API operations and see a live demo from T-Mobile.
Speakers: Manoj Gunti, Mudit Purwar
Google Cloud Next ’20: OnAir → https://goo.gle/next2020
Product: Apigee; Speaker: Manoj Gunti; Event: Google Cloud Next 2020
Hi everyone, thank you so much for tuning in. My name is Manoj and I'm the product marketing manager for Google Cloud's Apigee API management. Mudit?
Thank you, Manoj, and hello everyone. My name is Mudit and I'm a principal engineer in the API Center for Enablement. I lead the API development engineering team and drive the CI/CD and self-service automation tools initiative at T-Mobile. Back to you, Manoj.
In this session we want to explain, and show you a live demo of, how you can extend Google's industry-leading AI and ML technologies to your API programs. Over many years, we have had the opportunity to work with customers across the globe and across industries. We have observed that the role of APIs and API management has been continuously evolving. What used to be just an agent for accessing various systems is now an agent for driving digital transformation. As API programs continue to be built and consumed, developers, marketing teams, and business owners have started looking at APIs as something beyond just a piece of technology: a way to drive digital transformation for their companies.
And when you align APIs with your digital experiences, it also creates an extreme impact on the value chain. When your core business-critical applications are powered by APIs, imagine the impact that outages, errors, performance degradations, or even security incidents can have on your customers and partners. API operations therefore becomes extremely critical to ensure a connected experience is delivered to all the stakeholders.
Operations teams are under immense pressure. Are my APIs up and running and performing as expected? If there's an issue, how do I quickly identify the root cause? And I want to do all of this without getting distracted by false alerts. From a security standpoint, I want to ensure my APIs adhere to compliance requirements, and I want to protect my sensitive data from abuse and security threats.
As you scale your API programs, API data is collected across the entire API value chain, all the way from your backend targets to the frontend apps, and with this data you can deal with all these problems. The Apigee API management platform applies industry-leading AI and ML models to harness the power of your data. With features like API monitoring, anomaly detection, security reporting, and bot protection, you can now ensure that your APIs are always available, secure, and performing as expected. Let's now hear about T-Mobile's API journey and how they have put our AI/ML capabilities into practice.
Thank you. At T-Mobile, we are on a journey to change the wireless experience for our customers. In today's digital age of mobile, web, artificial intelligence, machine learning, and the Internet of Things, this is impossible without leveraging the power of APIs across the digital value chain.
Let's reflect back on our API journey so far. We started with Apigee back in 2015, with two dedicated API teams developing APIs to expose legacy services for a new, radically simplified frontline application. In 2017, we launched a code-generation development tool to bring some cohesiveness among the growing number of API teams. It took the API contract (the Swagger) as input and generated the Apigee API proxy bundle and the corresponding source code. That source code was then pushed through the CI/CD pipeline to build and deploy the API proxy into Apigee Edge. T-Mobile's API Center for Enablement was launched in 2018 to address the growing challenges of an API program that was gradually maturing. We launched a suite of API-first development tools to enable the domain teams to serve themselves for any need on the Apigee API platform. And here we are in 2020, with 80+ business domain teams developing APIs within T-Mobile and 1,300+ API proxies, resulting in over 9 billion transactions per month.
Let's look at some of the API operational challenges that come with this kind of growth. Here are some of the challenges we continually face every day in the API operations space; some of these might resonate with your company's API operations as well:
• Securing and protecting APIs from external and internal bad actors (bot traffic, broken authentication, and so on), and continually maintaining a secure posture to provide a seamless digital experience for our customers.
• Observability into unexpected and unforeseen API anomalies, with contextual information about those issues, which is key to improving availability and performance.
• API platform scalability to handle sudden traffic bursts. These come during our Un-carrier move announcements, like our latest Un-carrier move, Scam Shield, and during promotional events like Black Friday and the holiday season.
• On a similar note, capacity planning to support new, innovative services beyond wireless, like the banking service T-Mobile MONEY, the TV service TVision, and others.
• Streamlining the consumer and partner onboarding process for accessing T-Mobile APIs, to improve consumer and partner experience and agility.
AI and ML have definitely helped us address some of these challenges. AI/ML features like API monitoring and anomaly detection have empowered our everyday API operations. They act as our eyes on the blind side for unforeseen API traffic and performance issues, which could otherwise go undetected, cause API availability incidents, and impact the business negatively.
Applying AI and ML models to anomaly detection in API operations is aligned with T-Mobile's observability objectives. These include proactively monitoring systems and APIs, detecting issues before they have any negative impact on the business, and developing auto-healing capabilities where possible. Alerting and notifications combined with anomaly detection of real-time traffic and performance issues have improved our team's productivity. They have also significantly reduced our MTTD, the mean time to diagnose API incidents. Add to this operational control in the form of predetermined resolution playbooks for issues, and we have notably reduced our MTTR, the mean time to resolution, for API incidents.
Let's look at a demo of how AI and ML have empowered our API operations at T-Mobile. We'll check out some of the built-in AI/ML features in real time, and see how they help keep T-Mobile APIs available and performing for a seamless digital experience for T-Mobile customers. When you log in to Apigee Edge, this is what the landing page looks like. You can get to all the observability-related features under the Analyze tab.
As a DevOps engineer, I visit the API Monitoring overview dashboard quite frequently. This dashboard gives me a quick summary of the runtime telemetry of all the API proxies deployed in the production environment. For the purposes of this demo, we will be using one of our development environments. Hopefully you will not see a lot of broken stuff.
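The overview dashboard's headline numbers are total traffic in TPS, average error rate, P99 latency, and top proxies per category. For readers who want to reproduce them outside Apigee, here is a minimal Python sketch. It assumes a hypothetical list of per-transaction records collected over a fixed window; the `Txn` fields `proxy`, `status`, and `latency_ms` are illustrative, not an Apigee schema. It treats any status of 400 or above as an error and uses the nearest-rank method for P99. Apigee's own aggregation may differ.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Txn:
    # Hypothetical per-transaction record; field names are illustrative only.
    proxy: str
    status: int
    latency_ms: float


def p99(values):
    """Nearest-rank 99th percentile: the ceil(0.99 * n)-th smallest value."""
    ordered = sorted(values)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(99n / 100), 1-based
    return ordered[rank - 1]


def summarize(txns, window_seconds):
    """Environment-level summary over one window of transactions."""
    total_tps = len(txns) / window_seconds
    # Assumption: any 4xx/5xx response counts as an error.
    errors = [t for t in txns if t.status >= 400]
    error_rate_pct = 100.0 * len(errors) / len(txns)
    latency_p99 = p99([t.latency_ms for t in txns])
    # Count errors per proxy to rank the "top 3" proxies in the error category.
    per_proxy_errors = defaultdict(int)
    for t in errors:
        per_proxy_errors[t.proxy] += 1
    top3 = sorted(per_proxy_errors.items(), key=lambda kv: kv[1], reverse=True)[:3]
    return {
        "tps": total_tps,
        "error_rate_pct": error_rate_pct,
        "p99_ms": latency_p99,
        "top_error_proxies": top3,
    }
```

Ranking by error count is just one choice. The same grouping works for ranking proxies by traffic or by latency.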
As you can see, this dashboard gives us summary information about a given API environment: total traffic in transactions per second, overall average error rate percentage, overall API proxy response P99 latency, and total number of recent alerts. It also shows the top three API proxies with the highest values in each major category. So this dashboard gives a DevOps engineer a pretty good overview of any potential runtime issues with any API proxy. Now let's move on to the Events tab to check if there have been any recently triggered alerts or any new anomalies detected. Looks like we'll have to go a little bit back in time to see any alerts and anomalies. Here you go. The Alerts view shows all the recently triggered alerts based on the predefined conditions set by us. The Anomalies view shows all the events that deviate from normal behavior at the Apigee region, organization, and environment level. These deviations, as well as their severity, are detected by applying the built-in ML models to historical API data. The cool thing about the anomalies is the contextual information they provide: environment name, API proxy name, target, fault code, region, and the time of the trigger. For a DevOps engineer, this contextual information about an anomaly is critical. It significantly reduces the overall MTTD (mean time to diagnose) and MTTR (mean time to resolution) for unexpected and unforeseen issues that might otherwise be on our blind side.
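MTTD and MTTR, as the talk uses them, are simple averages over incidents. Here is a small sketch. It assumes each incident is recorded as hypothetical (started, diagnosed, resolved) timestamps and that both figures are measured from the incident start; some teams measure MTTR from detection instead.

```python
from datetime import datetime
from statistics import mean


def mttd_mttr(incidents):
    """incidents: list of (started, diagnosed, resolved) datetimes.

    Returns (MTTD, MTTR) in minutes. Assumption: both are measured from
    the moment the incident started, not from when it was detected.
    """
    to_diagnose = [(d - s).total_seconds() / 60 for s, d, _ in incidents]
    to_resolve = [(r - s).total_seconds() / 60 for s, _, r in incidents]
    return mean(to_diagnose), mean(to_resolve)
```

For example, with two hypothetical incidents taking 10 and 20 minutes to diagnose and 60 and 40 minutes to resolve, the function returns an MTTD of 15 minutes and an MTTR of 50 minutes.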
And the amazing thing about this feature is that we do not have to do anything to set all this up. It comes by default; it is built in. Based on this contextual information, a DevOps engineer who decides to dive deeper into an anomaly can drill down into it, which opens the Investigate page. So let's investigate one of the anomalies. Looking at the anomaly details in the table, this anomaly was detected due to a reasonably high number of 504 response codes in the API flow.
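Apigee's built-in anomaly models are not public, so the sketch below swaps in a much simpler, well-known technique: a trailing-window z-score on a per-minute 504 error rate. It flags a minute whose rate sits more than `k` standard deviations above the mean of the previous `window` minutes. The window size, `k`, and the input series are assumptions for illustration.

```python
from statistics import mean, pstdev


def detect_anomalies(rates, window=30, k=3.0, min_std=1e-6):
    """Flag indices whose value exceeds the trailing mean by more than k std devs.

    rates: per-minute 504 (or 5xx) error rates, oldest first.
    Returns a list of (index, z_score) pairs for flagged minutes.
    """
    anomalies = []
    for i in range(window, len(rates)):
        baseline = rates[i - window:i]
        mu = mean(baseline)
        # Floor the spread so a perfectly flat baseline does not divide by zero.
        sigma = max(pstdev(baseline), min_std)
        z = (rates[i] - mu) / sigma
        if z > k:
            anomalies.append((i, round(z, 1)))
    return anomalies
```

With a spike such as `[0.5, 0.6, 0.4] * 10 + [0.5, 8.0, 0.5]`, only index 31 is flagged. Unlike a learned model, this baseline ignores seasonality such as daily traffic cycles.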
It started at around 3:17 in the afternoon. Let's look at why the API is returning 504 responses to the clients. I have various ways and different dimensions by which I can find the root cause of this issue. Let's try to determine the fault source in the API flow by looking at different dimensions. I want to look at the fault source at this point in time, so I'll select it from the drop-down. As you can see in the API flow, the target endpoint is the problematic area. By clicking on the target cell, we can see the suspected cause identified by the ML model. It is attributed to the fault code, which clearly states that the response code is being returned by the backend service. There are also details about the impacted clients and the target endpoints that this API depends on and that are at fault for this anomaly. Now, if we want to look at any specific transaction-level information, we can do so by going into View Logs.
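The drill-down above narrows a 504 spike to the target endpoint. The same idea can be approximated over exported transaction logs with a frequency count. This sketch assumes hypothetical log records with `status`, `fault_source`, and `target` keys. It is a plain tally, not the ML attribution that the Investigate page performs.

```python
from collections import Counter


def attribute_faults(records, status=504):
    """Tally records with the given status by (fault_source, target).

    Returns the most common pair and its share of those errors,
    or None when no record has that status.
    """
    matching = [r for r in records if r["status"] == status]
    if not matching:
        return None
    counts = Counter((r["fault_source"], r["target"]) for r in matching)
    (source, target), n = counts.most_common(1)[0]
    return {"fault_source": source, "target": target, "share": n / len(matching)}
```

A share close to 1.0 for `("target", <backend>)` points at the backend, as in the demo. A mixed split suggests looking at proxy-side faults too.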
This view gives us specific details about each transaction. These include the request URI (basically the URI being invoked), the request message ID, the status code, response time, fault source, and other details about the client and the transaction. Next, I can do a number of things with this anomaly, like creating an alert based on it. So let's quickly create an alert from this anomaly. You hit the Create Alert button and then make a choice: you can base the alert on a specific condition, or leave it at anomaly. If we choose to, we can adjust the condition, or let it remain the same as suggested by the anomaly. We then select the alert notification type we want, such as email, Slack, or PagerDuty. Once we save this, we have a preconfigured alert for the given anomaly condition for future proactive notifications. Let me share some sample alert notifications.
This is what an email alert notification looks like. It includes the timestamp of the trigger, the org name, the alert name, and a summary of the alert. It also has details like the condition, trigger value, proxy, and region, plus a link to view the event directly in Apigee. This is what a sample Slack notification looks like. It provides the alert name, organization name, and environment, as well as details like the target latency, proxy name, region, condition information, and so on.
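An alert like the one created above combines a condition (metric, threshold, duration) with one or more notification channels. The sketch below models that shape with a hypothetical `AlertRule`, which fires when the metric stays above the threshold for `duration_min` consecutive samples. The field names are assumptions. The sketch only builds one message per channel (email, Slack, PagerDuty) and does not call any real Apigee, Slack, or PagerDuty API.

```python
from dataclasses import dataclass, field


@dataclass
class AlertRule:
    # Hypothetical rule shape; this is not the Apigee alert API.
    name: str
    metric: str
    threshold: float
    duration_min: int  # consecutive per-minute samples above threshold
    channels: list = field(default_factory=lambda: ["email"])


def evaluate(rule, samples):
    """samples: list of (minute_timestamp, value), oldest first.

    Returns (timestamp, value) where the rule first fires, or None.
    """
    streak = 0
    for ts, value in samples:
        streak = streak + 1 if value > rule.threshold else 0
        if streak >= rule.duration_min:
            return ts, value
    return None


def build_notifications(rule, fired, org, env, proxy, region):
    """Build one payload per configured channel; delivery is out of scope."""
    ts, value = fired
    summary = (f"[{rule.name}] {rule.metric} > {rule.threshold} for "
               f"{rule.duration_min} min (value {value}) on {proxy}")
    details = {"org": org, "env": env, "proxy": proxy, "region": region,
               "condition": f"{rule.metric} > {rule.threshold}",
               "trigger_value": value, "timestamp": ts}
    return {ch: {"summary": summary, **details} for ch in rule.channels}
```

Requiring a streak rather than a single breach is one simple way to cut down the false alerts mentioned earlier in the talk.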
It also lets us view the anomaly details directly in the Events tab of Apigee Edge. So, let's go back. I can do further things with this anomaly, like creating a report based on it to look at additional dimensions. That helps with further analysis, or with RCA (root cause analysis) and management reporting needs. We're really impressed by this feature. It has helped us mitigate latency and network issues in the past, resulting in increased API availability along with improved API performance.
Next, let's see how collections can be really useful in making the most out of these AI/ML features. Collections group API proxies, targets, or developer apps by a dimension of your choice, such as business domain. This helps us set up appropriate alert conditions and threshold values for each group of APIs.
Next, I want to move on to security. Security is one of the single biggest challenges that most digital enterprises are trying to solve, and it never gets easier. It's ever evolving, with new threats we face every day. The Security Reporting overview dashboard shows information about northbound API traffic across all the given environments. It also shows how many of the deployed API proxies are taking active traffic, along with the change in traffic. It also provides API client traffic composition by region. This is one of our development environments, and most of our developers are based on the US West Coast. That's why most of the client traffic is in the US West region.
Let's move on to the Runtime tab, which provides more specific and detailed runtime information about the API proxies. It shows per-proxy traffic percentage, change in traffic, and the percentage of traffic on non-HTTPS (unsecured) protocols. It also shows the total number of apps invoking or accessing each API. If I want to look at the details of a proxy, it provides a graph of TPS by access protocol (HTTPS versus HTTP). It also provides further data points about the developer apps, virtual hosts, and target endpoints that the API proxy depends on.
Let's move on to the Configuration tab for the API proxies. This provides insight into API proxy security configuration and policy composition. It also shows which policies are included, such as traffic management, security, and extension policies, as well as reusable pieces of code in the form of shared flows. The virtual host is the definition in an API proxy that determines the domain and the protocol through which API clients access it. If we want to dig deeper into the specific security policies that are part of a proxy, we can click on that proxy and see the specific policy names it contains. Our API program has also established security policy standards, and we can evaluate each and every proxy against those standards to see whether these APIs are in compliance or not. This dashboard is very helpful for a quick look at the security posture of our runtime and API configurations. All of the data visualized on this dashboard is also available via API, so the security organization can integrate it into their existing dashboards and reporting tools.
Finally, I'd like to talk about user activity. The user activity dashboard contains individual user IDs and other information that I cannot display here, so I have screenshots of the user activity details dashboard instead. It provides insight into user access and access behavior, whether through the Apigee UI or via the management APIs. It also shows changes in user activity over a period of time, what type of data users are accessing, and the percentage of potentially sensitive operations they have performed. If we identify any suspicious activity by a user, we can drill down further into all the activities and actions performed by that particular user, such as the resources they have accessed and the actions they have taken on them. The security reporting feature gives us the data to evaluate whether we comply with the processes and requirements set by the organization's digital security team.
Lastly, realizing the value of these proactive AI/ML features is key to having highly available and performant APIs in any API program. Next, let's look at some real-world business outcomes where these features were immensely valuable for our DevOps engineering team.
Bots are estimated to make up about 40% of all internet traffic, and of that, about 20% is bad bot traffic. One of the anomalies we had was a sudden increase in traffic to one of our APIs during off-hours, which is odd. On diagnosing the issue, we found that some bad actors overseas were using bad bots to try to harvest access credentials from the auth API. As this was a major security incident, we moved fast to remedy the issue and were able to successfully block the bot traffic very quickly.
Next, network latency can cause substantial degradation in service and negatively impact the SLAs. Another anomaly detected was network latency in API response times. Even though the traffic was normal, the increased latency was negatively impacting our customers' experience. While working the issue, we found that the cloud provider had inadvertently made changes to the network routing logic, and those changes caused the slowness. Ultimately, the cloud provider was able to permanently fix the network issue on their side.
New product launches, new promotions like our latest Un-carrier move, and big events can bring huge traffic to our websites. The next anomaly detected was a sudden burst in traffic for the TVision APIs when they were launched. We expected a surge, but not to the extent of a 700% increase. Given the size of this event, we contacted the Apigee support team at Google. They checked the autoscaling capabilities of the Apigee runtime components to make sure they could handle the additional load.
All of these incidents were detected by anomaly detection. That is a testament to AI and ML capabilities for automated monitoring and alerting, along with the ability to dig deeper into an anomaly or incident in real time, diagnose the issue, and find the resolution quickly. To conclude, AI and ML have significantly transformed the business of operating APIs. In the future, advanced predictive analytics should let us predict impact and proactively self-heal, elevating API operations to the next level. Thank you, and back to you, Manoj.
That's awesome. Thank you, everyone, for tuning in. I hope you got good insight into how you can apply AI and ML to your API program. Please post any questions or comments that you may have in the Q&A module, and we will be more than happy to answer them. Have a great day.