In this series of articles, we discuss some of our learnings along with a few examples that illustrate the biggest challenges with location data and also offer insights on how important accuracy is for us at InMobi. Here is the first part - Location Accuracy: Who can you Trust? Part 1/2, in case you missed it.
Detecting errors by comparing latitude/longitude country with IP country
A request that comes with a latitude/longitude (henceforth also referred to as lat/long) value will typically also be associated with an IP address. Even when the request is forwarded via multiple servers, those servers are expected to pass along the IP address of the user’s device. If the country derived from the IP address does not match the country derived from the lat/long, at least one of the two signals is likely to be wrong.
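As a rough illustration of this check, here is a minimal sketch in Python. It assumes the IP address has already been resolved to a country code by some other component, and it uses a small table of very coarse bounding boxes in place of real country boundaries. The table, the function name and the box values are hypothetical and only meant to show the shape of the comparison.

```python
from typing import Dict, Optional, Tuple

# Hypothetical, very coarse boxes: (min_lat, max_lat, min_lon, max_lon).
# A production system would use real country polygons instead.
COUNTRY_BOXES: Dict[str, Tuple[float, float, float, float]] = {
    "IN": (6.5, 35.7, 68.0, 97.5),
    "BR": (-33.8, 5.3, -74.0, -34.7),
    "US": (24.5, 49.4, -125.0, -66.9),  # contiguous states only
}


def latlong_matches_ip_country(lat: float, lon: float, ip_country: str) -> Optional[bool]:
    """Return True/False when the check is possible, None when the country is not in the table."""
    box = COUNTRY_BOXES.get(ip_country)
    if box is None:
        return None
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


# A Bangalore coordinate agrees with an Indian IP; a New York coordinate does not.
print(latlong_matches_ip_country(12.97, 77.59, "IN"))
print(latlong_matches_ip_country(40.71, -74.01, "IN"))
```

A mismatch only says that one of the two signals is suspect. It does not say which one, so it is best treated as a trigger for the other checks described in this article.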
Detecting errors by comparing ad request volumes to population densities
Looking for anomalies in the geo-distribution of ad requests is one of the techniques we use to identify errors in location signals. The human population is not evenly distributed across the globe and there are several low-density regions. Anytime we see a large number of ad requests coming from regions of low user density, it is an anomaly that requires further investigation.
One common anomaly in geo-distribution is a large number of requests from latitude = 0 and longitude = 0. The location (0,0) is in the middle of the ocean off the coast of Ghana. This anomaly occurs when the ad request simply sends the default system values of lat/long when it does not have access to a precise user location. Ideally, the lat/long values should be omitted from the request when they are unavailable.
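The comparison of request volume against population can be sketched as follows. This is an assumption-laden illustration, not our production method: it buckets coordinates into 1-degree grid cells and flags cells that receive many requests but have few or no residents. The population table, the thresholds and the function names are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Cell = Tuple[int, int]


def cell_of(lat: float, lon: float) -> Cell:
    """Bucket a coordinate into a 1-degree grid cell."""
    return (math.floor(lat), math.floor(lon))


def suspicious_cells(
    points: Iterable[Tuple[float, float]],
    population: Dict[Cell, int],
    min_requests: int = 100,                # hypothetical: ignore cells with little traffic
    max_requests_per_person: float = 0.01,  # hypothetical threshold
) -> List[Cell]:
    """Return cells whose request count is implausible for their population."""
    counts = Counter(cell_of(lat, lon) for lat, lon in points)
    flagged = []
    for cell, n in counts.items():
        if n < min_requests:
            continue
        people = population.get(cell, 0)  # cells missing from the table count as empty
        if people == 0 or n / people > max_requests_per_person:
            flagged.append(cell)
    return sorted(flagged)


# Illustrative data only: 150 requests at (0, 0) and 150 from a populated cell.
points = [(0.0, 0.0)] * 150 + [(12.97, 77.59)] * 150
population = {(12, 77): 10_000_000}  # made-up population figure
print(suspicious_cells(points, population))
```

With this made-up data, only the cell containing (0, 0) is flagged, because it has traffic but no recorded population.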
Swapped or equal lat/long values: typical errors
Another anomaly, shown in the pictures below, is a high volume of requests coming from the seas off the Scandinavian coast and from the South Atlantic Ocean. Further investigation showed that these ad requests had swapped the values of latitude and longitude, making requests from India appear to come from Scandinavia and requests from Brazil appear to come from the South Atlantic. These errors are typically observed in ad requests that come to us from other servers, possibly from applications that have a lat/long swapping bug. We are able to confirm the existence of this bug when we receive SDK-originated ad requests from the same devices containing correct lat/long values. This kind of problem is tricky to detect if both the correct and the swapped locations fall in regions with reasonable population density, which adds complexity to location hygiene methodologies.
The picture below shows another interesting anomaly: a high density of ad requests originating along a line that cuts across the Mediterranean Sea. This issue seems to result from a bug where the application copies the same value into both the latitude and the longitude parameters. We have also seen such streaks along the equator and the prime meridian, where only one of the two values is populated while the other is left as zero.
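The patterns in this section are simple enough to express as rule checks. The sketch below is a hypothetical illustration: the label names, the tolerance and the idea of testing a swap against an expected bounding box (for example, one derived from the IP country) are assumptions made for the example.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (min_lat, max_lat, min_lon, max_lon)


def classify_latlong(lat: float, lon: float, tol: float = 1e-6) -> str:
    """Label a coordinate with the first simple error pattern it matches."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return "out_of_range"
    if abs(lat) < tol and abs(lon) < tol:
        return "zero_zero"        # default values sent instead of a real fix
    if abs(lat) < tol or abs(lon) < tol:
        return "one_axis_zero"    # streaks along the equator or prime meridian
    if abs(lat - lon) < tol:
        return "lat_equals_lon"   # same value copied into both fields
    return "ok"


def looks_swapped(lat: float, lon: float, expected: Box) -> bool:
    """True if the point is outside the expected box but lands inside it once swapped."""
    min_lat, max_lat, min_lon, max_lon = expected

    def inside(a: float, b: float) -> bool:
        return min_lat <= a <= max_lat and min_lon <= b <= max_lon

    return not inside(lat, lon) and inside(lon, lat)


INDIA_BOX: Box = (6.5, 35.7, 68.0, 97.5)  # coarse, illustrative box

print(classify_latlong(35.2, 35.2))
print(looks_swapped(77.59, 12.97, INDIA_BOX))
```

The swap test needs an independent hint about where the user should be. Without one, a swapped coordinate that falls in a populated region looks like any other valid point.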
First-party location data from the SDK: manageable and reliable
Almost all of these observed bugs occur with ad requests that come to us via server-to-server integration, where our SDK is not involved. In these cases, when requests come from blinded apps, we are forced to discard all the lat/long signals from those servers to avoid erroneously targeting location-specific ads. Wherever we are able to identify the application with the bugs described above, we try to help the developers fix the bug.
If we were to serve ads on requests that always set longitude to zero, we might end up targeting a user in the western part of London with ads meant for people on the eastern side, closer to longitude zero.
Detecting errors based on multiple requests from the same user
Observing the different lat/long samples from the same user over time also helps in the identification of errors in the location signals. Here is an example of ad requests that we see from a certain device closely spaced in time but from two different applications. One request appears to come from the New York area and a short while later we see another request from the mountains of Kyrgyzstan.
It is unlikely that the user actually traveled that far!
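Also, the IP country did not match Kyrgyzstan. The root cause seems to be an error in one of the apps, where the longitude value might have lost its negative sign: New York lies at a longitude of roughly -74 degrees, and the same latitude at +74 degrees falls in Kyrgyzstan.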
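A check of this kind can be sketched with the haversine great-circle distance. The example below is illustrative only: the sample format (lat, lon, Unix seconds), the function names and the 50 km tolerance are assumptions, and any speed threshold applied to the result would need tuning.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0
Sample = Tuple[float, float, float]  # (lat, lon, unix_seconds)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speed_kmh(a: Sample, b: Sample) -> float:
    """Speed the device would need to get from sample a to sample b."""
    dist = haversine_km(a[0], a[1], b[0], b[1])
    hours = abs(b[2] - a[2]) / 3600.0
    if hours == 0:
        return 0.0 if dist == 0 else math.inf
    return dist / hours


def sign_flip_explains(a: Sample, b: Sample, max_km: float = 50.0) -> bool:
    """True if negating b's longitude brings it within max_km of a."""
    return haversine_km(a[0], a[1], b[0], -b[1]) <= max_km


new_york = (40.71, -74.01, 0.0)
kyrgyzstan = (40.71, 74.01, 600.0)  # ten minutes later, longitude sign lost

print(implied_speed_kmh(new_york, kyrgyzstan) > 1000)  # far faster than an airliner
print(sign_flip_explains(new_york, kyrgyzstan))
```

When the implied speed is implausible and a sign flip reconciles the two samples, the more likely explanation is a bug in one app rather than actual travel.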
Detecting deliberate abuse
Advertisers are often willing to pay a premium for precisely location-targeted ads, so app developers who can send requests with precise lat/long values can make more money. This creates an incentive for app developers to abuse the lat/long signals and populate these fields with values that have nothing to do with where the user is actually located. Some of the techniques described above, especially looking at location points over time for the same user or at high request density from sparse regions, can help identify such abuse, but confirming it often requires a trustworthy location signal such as an SDK signal.
Not all users have apps that are integrated with the InMobi SDK. In such cases detecting the abuse is harder. We list a few methods of abuse that we have detected and eliminated.
- User lat/longs are sometimes populated in the request by geo-coding the corresponding IP addresses. Populating the lat/long fields this way allows the supply source to command a higher premium by pretending to have access to precise user locations. Such abuse tends to place a large number of users at the centroid of the US (Potwin, Kansas), for instance, where these IP addresses are typically geo-coded. When we detect such abuse, we discard the lat/long values from these requests.
- Some supply sources generate a random lat/long value within a region where they think the user is located based on some other signal such as IP address. Detecting such abuse requires intense analytical efforts and clever algorithms.
An example of such abuse is depicted in the following snapshot, where the location signals appear to be evenly distributed across a wide region, some of it spilling into the South China Sea.
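The first pattern in the list above, many users pinned to a single geo-coded point, lends itself to a simple count. The sketch below is a hypothetical illustration: the request tuple format, the rounding precision and the device threshold are assumptions. It does not address the randomized pattern, which needs a statistical look at the spatial distribution.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Point = Tuple[float, float]


def hotspot_coordinates(
    requests: Iterable[Tuple[str, float, float]],  # (device_id, lat, lon)
    min_devices: int = 1000,                       # hypothetical threshold
    decimals: int = 4,                             # roughly 10 m of precision
) -> Dict[Point, int]:
    """Return coordinates shared by at least min_devices distinct devices."""
    devices: Dict[Point, Set[str]] = defaultdict(set)
    for device_id, lat, lon in requests:
        devices[(round(lat, decimals), round(lon, decimals))].add(device_id)
    return {pt: len(ids) for pt, ids in devices.items() if len(ids) >= min_devices}


# Illustrative data: five devices report the identical point, one reports elsewhere.
requests = [(f"d{i}", 38.0, -97.0) for i in range(5)] + [("x", 12.9716, 77.5946)]
print(hotspot_coordinates(requests, min_devices=3))
```

Many distinct devices reporting exactly the same point to four decimal places is unlikely for genuine device-derived locations, which is why counting devices rather than requests is used here.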
There are more types of abuse we have observed that are beyond the scope of this blog.
Best practices and insights
Location hygiene is a complex task that requires a deep understanding of diverse, complex data sets and the use of sophisticated analytical algorithms to detect and prevent abuse. Before subscribing to a vendor’s location solutions, it is important to understand (1) whether the solutions are based on first-party (SDK) data or on third-party (off-exchange) data, and (2) whether the vendor has methodologies in place to detect and guard against anomalies. Such anomalies, whether introduced through unintentional errors or through systematic abuse or fraud, can derail the promise of precisely geo-targeted services and advertising.
At InMobi, we take the user’s experience seriously - serving relevant content while assigning utmost importance to concerns such as propriety and privacy. Having a set of trusted signals helps us identify issues in the broader signal space. Our experience teaches us that unless service providers deploy safeguards against errors in geo signals, these signal errors can degrade the user experience.