Making Ad Blocking Easy — Nimbus: Maximize your Programmatic Ad Revenue

It was a regular day when I received a message from our Programmatic Strategy Manager: "Hey, I've just received an email from one of our publishers. There is an advertiser running multiple campaigns with invalid creatives, so their ads are not being rendered properly. The publisher wants to make sure that none of their placements affect user experience, as well as the interface of the app. Can we do something about it?"

Such a message was not uncommon. At Timehop, we have to deal with bad ads quite a few times each week. The process was cumbersome and barely scalable. The main reason was that the old functionality kept the list of bad advertisers hardcoded, so it required us to make a deployment every time we had a request to block yet another ad.

var blockedAdvertisers = []string{
	"foobar.com",
	"foo.barbaz.com",
	"barbaz.com",
}

var blockedApps = []string{
	"com.harmful.app",
}

var blockedCategories = []string{
	"IAB11-4",
	"IAB25-4",
	"IAB26-3",

}

func makeBidRequest() BidRequest {
	...
	...

	bidRequest.BAdv = blockedAdvertisers
	bidRequest.BApp = blockedApps
	bidRequest.BCat = blockedCategories
	
	return bidRequest
}

It was not convenient, the code was prone to human error and frequent changes led to a polluted Git history. We all badly wanted to rearchitect this logic. Given that the number of publishers monetizing with Nimbus keeps growing, we wanted to make sure we’d be able to keep up with the growth and wouldn’t drown in a flood of requests to block certain ads. On that day I had little work to do and quickly recognized that it was a perfect time to show some love to the blocking ads architecture. The idea was to decentralize the entire process, so that down the road it would minimize the engineering time required to maintain the logic, as well as provide fine-grained control over bad ads to all of our publishers through the means of a simple UI. I also wanted to make sure the code was free from hardcoded values.

Nimbus’ exchange runs on OpenRTB specification, an industry standard for communication between buyers of advertising and sellers of publisher inventory. The specification allows us to block advertisers by their domains (badv), application bundle identifiers (bapp) and content categories (bcat), such as Politics, Profane Content, Cocktails/Beer, etc. Our goal is to be able to handle this data in such a way that it's both flexible and efficient. Having a good data model solves many of the fundamental business problems, so let's get to that first.

Data Structures:

The first step is to organize elements of data and standardize how they relate to one another and to the properties of real-world entities. Let's step back for a moment and review our requirements. There are three types of entities we need to abstract away:

Advertiser domains
Applications
Content categories

The naive solution is to introduce a data type for each of these use cases:

type BAdv struct {
	domain string
}

type BApp struct {
	application string
}

type BCat struct {
	category string
}

Now while this approach is perfectly valid, it can be a little tedious to maintain three structures that share the same responsibilities. In fact, if we zoom out and look at the problem from the birds-eye view, we can notice that all these structures are responsible for the exact same thing— passing certain values in the RTB request. Having said that, we can simplify it all by introducing a dedicated BlockedEntity data structure.

type BlockedEntityType string

const (
	EntityTypeBApp BlockedEntityType = "bapp"
	EntityTypeBAdv BlockedEntityType = "badv"
	EntityTypeBCat BlockedEntityType = "bcat"
)

type BlockedEntity {
	BlockedType BlockedEntityType
	Value       string
}

Such abstraction will help to smooth the serialization/deserialization process when it comes to working with the persistence layer. We won't have duplicated code as if we were leaving BAdv, BApp and BCat in place. For type safety, we also abstract away each blocking variant in the BlockedEntityType data type.

Persistence:

Having designed domain models, it's time to look into persistence. Persistence is a term used to describe how objects utilize a secondary storage medium to maintain their state across discrete sessions. This is an essential step as we obviously don't want to lose data every time the service is restarted. Nimbus uses a SQL-based persistent storage to store publisher settings. This data is neither read- nor write-heavy, so there are no strict requirements on how to design database schema. It would be fair to say that a flexible and concise schema is a decent requirement we could try to fulfill. Let's review some options.

Each publisher should be able to populate a list of applications, advertisers and categories they would like to block. Let's look at the applications list first. To create one, we could define a dedicated *bapps* table with a sole responsibility of keeping application names. To block the application for the given publisher, a many-to-many mapping table between publishers and bapps is used. Essentially, 2 tables per blocked entity type, which means we'll end up with a minimum of 6 tables. Here is the relationship diagram of the described design.

This is a heavily-normalized schema which ensures no redundant data across tables. Since the actual block is defined by a single relationship in the mapping table, a nice side-effect of this design is that we have access to all the advertisers (as well as applications and IAB categories) ever blocked in the system, even if the block is no longer in place, hence the relationship record is gone. The downside of the solution is its bulkiness. It's just way too many tables for such a simple task. Luckily, we don't have a requirement to keep track of all the blockings ever created, so we can trade this feature for a simpler and more minimalistic design.

One way to simplify the schema is to drop all the mapping tables and move the publisher id to the entities tables.

The schema now reflects the very first domain model we reviewed in the previous section. One could argue there is no point in simplifying further, and they wouldn't be wrong. In fact, each table has its own purpose now and is pretty much self-contained. But it's always easier to work with a system in which deviation between application data structures and database data models is minimized. In the same fashion as we introduced BlockedEntity data type, why don't we define *blocked_entities* relation to represent all of our entity types?

The data model is now very concise and easy to work with. Its concepts are somewhat similar to Entity–attribute–value model, in which each attribute-value pair is a fact describing an entity. Since we are bounded by the RTB specification, the number of unique *blocked_type* values are limited, which is why such schema is not going to be a concern.

Data Validation

When it comes to working with production systems, you want to make sure the data you are operating on is both valid and real. I'll give you a quick example to illustrate what I mean in the context of Nimbus. Say you as a publisher want to block an advertiser who's domain name is Google.COM (for the sake of example, haha). An ad operator mistyped the name and entered Google.CMO instead. Is such a domain name valid? No, obviously, it's not. .CMO TLD simply does not exist, so the operation should fail. In a similar fashion, an operator's finger slipped off the E key and the blocking name ended up looking like Googlw.COM. Is it a valid domain name? Sure it is. But is it real? No, it's not (at least at the time of writing this post). Such value should also be declined by the system.

If you ever had a chance to work with data, you know that validation code is usually messy unless you are using a very cool helper library. At the same time, libraries are dependencies. The more dependencies you have, the more locked in you are. A good philosophy to follow is that one should always strive for elegance. For the domain-specific validation, we can kill two birds with one stone by simply looking up the host using the local resolver. By doing so, you guarantee that the domain name is both valid and real. In Go, the primary language at TechByNimbus, you can do it with the help of net.LookupIP function, which is part of the language's Standard Library.

Validating IAB categories is the most trivial problem. Nimbus keeps the entire list of categories in-memory and a simple logic guarantees to validate user input against that list. It's boring and just works.

Let's take a look at how to validate mobile apps, as they are neither domain names nor specification-defined categories. Each application has a unique identifier. It's known as Bundle ID on iOS and a package on Android. For example, com.timehop is the name of the package that belongs to the Android version of Timehop. In contrast, 569077959 is the Bundle ID of an iOS version of Timehop. Again, we strive for elegance and do not want to be dependent on third-party libraries. Similar to domain validation technique, we can check whether the given application ID exists in the stores of both iOS and Android platforms. The easiest way to do so is by sending a simple GET request to https://play.google.com/store/apps/details?id=com.timehop and make sure the response is not 404. Since we are only interested in response status code, to make things even more efficient, we can send a HEAD request instead. The HTTP HEAD method requests the headers that are returned if the specified resource would be requested with an HTTP GET method. Such validation is super simple to abstract away and it's almost guaranteed to always work.

Conclusion

It all started with a regular message from Nimbus Programmatic Strategy Manager, but the team takes it seriously even when it comes to a basic CRUD application. Before releasing it to our publishers, we went through the entire process of planning, designing, implementing and testing. We always strive for elegance, which is why delivering Ad Blocking functionality was a short, but super fun journey.