Rebuilding our API Call Logging Feature from Scratch
Bearer is shedding its winter coat. Staying safe at home during the COVID-19 crisis gave us the opportunity to think about our vision for the API Monitoring industry. Today, we are releasing a brand-new dashboard, a rebuilt navigation, and improvements to many of our existing features. One change is especially big, because it reworks one of the core features of our product.
We have completely rebuilt the way API call logs are managed in Bearer. It has been a long process, influenced by what makes the most sense for our users. Bearer’s goal and vision are to instrument API calls and pinpoint what is going on between your application and the API it consumes.
TL;DR: Bearer is not a log management product. It therefore provides a specialized, targeted logging feature that focuses on the 1% of relevant logs instead of brute-forcing its way through 100% of the load.
As we rolled out our agent technology to new stacks, adoption grew. More users meant more API calls and more data to process. We rewrote some of our services and improved scaling to accommodate the increased demand.
However, it raised a broader question: what product are we building? Do we build yet another log management system, this one highly specialized in APIs, or do we get back to our mission of helping developers monitor their API usage? Many great solutions for complete log storage and management already exist on the market, and most high-volume customers already have one. That left us with two possible paths for our product:
- Build a fully-scalable logging system, which is highly complex and already done well by existing products.
- Or recognize that we don’t need yet another logging solution, and instead understand the value of an API call log in the context of API monitoring: in other words, manage the user’s expectations, needs, and cost target.
The first expected use-case of Bearer is to point out what is going wrong between the application and its integrations. We provide that through our anomaly detection feature. The built-in detection rules we provide will find the needle in the haystack, making sure that errors, performance issues, and unexpected qualities such as deprecation warning headers, are reported.
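To make the idea of detection rules concrete, here is a minimal sketch of the three kinds of checks mentioned above. It is not Bearer's implementation: the `ApiCall` type, the `detect_anomalies` function, and the `SLOW_CALL_MS` threshold are hypothetical names chosen for illustration, and the `Deprecation` and `Sunset` response headers are used only as examples of deprecation warning headers.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical threshold: calls slower than this count as a performance issue.
SLOW_CALL_MS = 2000.0


@dataclass
class ApiCall:
    """A simplified record of one outgoing API call (illustrative only)."""
    status: int
    duration_ms: float
    response_headers: Dict[str, str] = field(default_factory=dict)


def detect_anomalies(call: ApiCall) -> List[str]:
    """Return the names of the illustrative rules that this call triggers."""
    found = []
    # Errors: the HTTP status code is enough to spot these.
    if call.status >= 400:
        found.append("error")
    # Performance issues: compare the duration against the assumed threshold.
    if call.duration_ms > SLOW_CALL_MS:
        found.append("slow")
    # Deprecation warnings: header names are case-insensitive, so normalize.
    header_names = {name.lower() for name in call.response_headers}
    if "deprecation" in header_names or "sunset" in header_names:
        found.append("deprecation")
    return found
```

A call that returns `200` quickly and without warning headers triggers no rule here, which is exactly the kind of "non-faulty" call that built-in rules alone cannot surface.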
After extensive research and additional interviews with current and prospective users, we found that right after error and problem detection, our logging feature is used to investigate “non-faulty” API calls. Finding error calls is quite easy, since the HTTP status code takes care of that for us. What is more challenging is finding API calls that present as successful but return unexpected results. Around the same time, we discovered that the interaction model for our log search didn’t scale well to a very large number of log entries. This made finding a specific non-faulty call more difficult for users.
This use case accounts for about 1% of the total volume of logs we process. The other 99% is not that useful and, if recorded at all, should be stored in a dedicated product, for compliance purposes for example, but also, let’s face it, for cost reasons.
So we decided to be bold and deeply modify how we provide value with our logging feature. It should be easy to use, adaptable, and offer a complete search interface that makes finding the right API calls easier. It should record everything users need for debugging, all while tying into our anomaly detection and remediation features.
To tackle those challenges, we have built a new feature called Log Collections. It leverages our agent filter processing technology, deciding on the fly what to do with the logs we are generating. That decision is made by our platform. As the situation evolves and anomalies are detected, the platform updates the agent’s configuration in seconds to change the routing pattern of the logged events.
Log collections are based on filters. They sort logs into specific buckets, or collections, to make finding the right log easier. A log is matched either by an anomaly or remediation rule, in which case it goes to the corresponding collection, or by a Custom Log Collection with its own filters. When a log is flagged for full processing, it’s sent into the log collection queue, where it is sorted, stored, and made available for full-text payload search. If a log doesn’t match any of the collection filters, we only process it for metrics and pattern detection before deleting it. This change makes it easier than ever to find the logs you’re looking for. By default, we provide one collection for Anomalies, one collection for Remediations, and a number of custom log collections that depends on your plan.
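The routing described above can be sketched in a few lines. This is an illustration under stated assumptions, not Bearer's agent code: `LogEntry`, `Collection`, `route`, the `api.payments.example` hostname, and the "Payments 2xx" custom collection are all hypothetical. The sketch also assumes that a log goes to the first collection whose filter matches, and it uses a static list where the real platform pushes updated configuration to the agent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LogEntry:
    """A simplified API call log (illustrative fields only)."""
    api: str
    status: int
    anomaly: bool = False      # set when an anomaly rule matched this call
    remediated: bool = False   # set when a remediation rule was applied


@dataclass
class Collection:
    """A named bucket plus the filter that decides what belongs in it."""
    name: str
    matches: Callable[[LogEntry], bool]


def route(entry: LogEntry, collections: List[Collection]) -> Optional[str]:
    """Return the name of the first matching collection, or None.

    None means the log is only used for metrics and pattern detection
    and is then discarded instead of being stored.
    """
    for collection in collections:
        if collection.matches(entry):
            return collection.name
    return None


# Two default collections plus one hypothetical custom collection.
collections = [
    Collection("Anomalies", lambda e: e.anomaly),
    Collection("Remediations", lambda e: e.remediated),
    Collection(
        "Payments 2xx",
        lambda e: e.api == "api.payments.example" and 200 <= e.status < 300,
    ),
]
```

With this configuration, a successful call to `api.payments.example` is kept in the custom collection, while a successful call to any other API matches nothing and is dropped after metrics are computed. That is how most of the volume stays out of storage.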
This feature is available live for everyone! We hope it will continue to help developers monitor and troubleshoot their API issues. Check out the changelog entries for a complete overview of our last major update.
We wouldn’t have been able to make such a change without our users’ feedback. Thanks to all of them.