Bot name

Our system uses multiple methods to identify and manage non-human traffic to your site:

  1. Amazon Bot Control:
    Detects sophisticated bots that don't clearly identify themselves
  2. User Agent blocking:
    Blocks traffic based on keywords in its user agent string
  3. User Agent matching:
    Identifies bots that declare themselves in their user agent string

How we identify bot traffic

Amazon Bot Control integration

For traffic that doesn't explicitly identify itself as non-human, we leverage Amazon Bot Control to detect and block sophisticated bots based on behavioral patterns and known signatures.

On average, this already blocks 12% to 14% of traffic.
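
As an illustration, the sketch below shows the general shape of an AWS WAF rule that enables the Bot Control managed rule group. It follows the structure of the AWS WAF API, but the rule name and metric name are placeholders rather than our actual configuration:

  // Minimal sketch of an AWS WAF rule enabling the Bot Control
  // managed rule group. The shape matches the wafv2 CreateWebACL API;
  // "BotControl" is a placeholder name, not our real setup.
  const botControlRule = {
    Name: "BotControl",
    Priority: 0,
    Statement: {
      ManagedRuleGroupStatement: {
        VendorName: "AWS",
        Name: "AWSManagedRulesBotControlRuleSet",
      },
    },
    // Let the managed rule group's own block/allow actions apply.
    OverrideAction: { None: {} },
    VisibilityConfig: {
      SampledRequestsEnabled: true,
      CloudWatchMetricsEnabled: true,
      MetricName: "BotControl", // placeholder metric name
    },
  };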

User Agent blocking

We also block additional traffic that identifies itself as a bot through its user agent (a sketch of this kind of matching follows the list). For example:

  • HeadlessChrome
  • Applebot
  • Googlebot
  • AdsBot
  • Baiduspider
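
As a rough sketch of this kind of keyword-based blocking (the keyword list and function below are illustrative, not our exact rules):

  // Hypothetical sketch of keyword-based user agent blocking.
  const BLOCKED_UA_KEYWORDS = [
    "headlesschrome", "applebot", "googlebot", "adsbot", "baiduspider",
  ];

  function isBlockedUserAgent(userAgent: string): boolean {
    const ua = userAgent.toLowerCase();
    return BLOCKED_UA_KEYWORDS.some((keyword) => ua.includes(keyword));
  }

  // A declared crawler matches; a regular browser does not.
  isBlockedUserAgent("Mozilla/5.0 (compatible; Googlebot/2.1)");  // true
  isBlockedUserAgent("Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"); // false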

User Agent matching

As a result of the earlier blocking rules, only about 0.08% of our data still contains signs of bots. While RUMvision already applies basic patterns to match bots, site owners can add additional strings to recognize and group traffic based on their user agent string. Do note that we never collect or store full user agent strings.

The regular expression that we use to identify declared bots based on their user agent string is as follows:

/\b\w*((gle|ing|ro)?bot|crawl|spider|headless)[\w-]*\b/i

This helps identify requests from services that openly declare themselves as bots, crawlers, or spiders in their user agent.
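
To illustrate how the pattern behaves, here it is applied to a few sample user agent strings:

  // The declared-bot pattern from above, tested against sample user agents.
  const BOT_PATTERN = /\b\w*((gle|ing|ro)?bot|crawl|spider|headless)[\w-]*\b/i;

  BOT_PATTERN.test("Mozilla/5.0 (compatible; Googlebot/2.1)");    // true ("Googlebot")
  BOT_PATTERN.test("Mozilla/5.0 (compatible; bingbot/2.0)");      // true ("bingbot")
  BOT_PATTERN.test("Mozilla/5.0 HeadlessChrome/120.0.0.0");       // true ("HeadlessChrome")
  BOT_PATTERN.test("Baiduspider+(+http://www.baidu.com/search)"); // true ("Baiduspider")
  BOT_PATTERN.test("Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"); // false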

Amazon Bot Control rules

RUMvision automatically blocks traffic from the following categories:

  • CategoryAdvertising
  • CategoryArchiver
  • CategoryContentFetcher
  • CategoryEmailClient
  • CategoryHttpLibrary
  • CategoryLinkChecker
  • CategoryMiscellaneous
  • CategoryMonitoring
  • CategoryScrapingFramework
  • CategorySearchEngine
  • CategorySecurity
  • CategorySeo
  • CategorySocialMedia
  • CategoryAI
  • SignalAutomatedBrowser
  • SignalKnownBotDataCenter
  • SignalNonBrowserUserAgent

For detailed explanations of each category, refer to the Amazon Bot Control documentation.
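
For illustration, the sketch below shows how these categories and signals can be set to block via rule action overrides in the AWS WAF API; the override mechanism is real, but treat the snippet as an assumed configuration rather than our production setup:

  // Sketch: explicitly set the listed category/signal rules to Block
  // inside the Bot Control managed rule group. RuleActionOverrides is
  // part of the AWS WAF API; the override set mirrors the list above
  // for illustration only.
  const blockedRules = [
    "CategoryAdvertising", "CategoryArchiver", "CategoryContentFetcher",
    "CategoryEmailClient", "CategoryHttpLibrary", "CategoryLinkChecker",
    "CategoryMiscellaneous", "CategoryMonitoring", "CategoryScrapingFramework",
    "CategorySearchEngine", "CategorySecurity", "CategorySeo",
    "CategorySocialMedia", "CategoryAI", "SignalAutomatedBrowser",
    "SignalKnownBotDataCenter", "SignalNonBrowserUserAgent",
  ];

  const managedRuleGroupStatement = {
    VendorName: "AWS",
    Name: "AWSManagedRulesBotControlRuleSet",
    RuleActionOverrides: blockedRules.map((name) => ({
      Name: name,
      ActionToUse: { Block: {} },
    })),
  };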

Examples of blocked bots

The following are examples of bots that are proactively blocked by our system (meaning no data is collected from these sources):

Search Engines & Crawlers

  • Google Bot, Google Ads bot, Google User Triggered fetcher, Google Special-Case Crawler
  • Ahrefsbot, Applebot, Baidu, Bingbot, Yandexbot, DuckDuckBot, Seekportbot, Bytespider

Monitoring Tools

  • New Relic, GTmetrix, Datadog, Google + Chrome Lighthouse
  • HeadlessChrome, Savvii Munin LoadTimes Measure, Pingdom

AI & Content Analysis

  • GPTBot, ClaudeBot
  • Prerender, HubSpot Page Screenshot Service

Social Media & Others

  • Facebook, Pinterest, DisplayMetrics (Facebook Lite), meta-externalagent
  • AmazonProductDiscovery, Screaming Frog SEO Spider, SiteAuditBot (SEMrush), SEBot-WA, YisouSpider, kioskbrowser, tbtbot

Advanced bot identification

You can further customize how bot traffic is handled on your site:

Custom bot identification

If you need to identify automated traffic beyond our default configuration, contact us for a tailored approach. All identified bot traffic (that isn't already blocked by our Amazon Bot Control rules) will be visible through the bot_name filter in your dashboard.

Blocking additional traffic

To block user agents not covered by our default rules, take the following steps to keep such traffic out of your dataset (a conceptual sketch of the matching follows):

  1. Navigate to the snippet configuration of your domain
  2. Add an exclude rule like bot_name:BlockWhenUserAgentContainsThisString
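
Given the example above, such a rule conceptually behaves like a case-insensitive substring match on the user agent. The helper below is a simplified, assumed sketch of that matching, not RUMvision's actual implementation:

  // Simplified sketch of how a bot_name exclude rule could be evaluated.
  // The matching semantics are assumed here for illustration.
  function matchesExcludeRule(userAgent: string, rule: string): boolean {
    // e.g. rule = "bot_name:BlockWhenUserAgentContainsThisString"
    const needle = rule.replace(/^bot_name:/, "").toLowerCase();
    return userAgent.toLowerCase().includes(needle);
  }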