Bot name
Our system uses multiple methods to identify and manage non-human traffic to your site:
- Amazon Bot Control: detects sophisticated bots that don't clearly identify themselves
- User Agent blocking: blocks traffic based on keywords in the user agent
- User Agent matching: identifies bots that declare themselves in their user agent string
How we identify bot traffic
Amazon Bot Control integration
For traffic that doesn't explicitly identify itself as non-human, we leverage Amazon Bot Control to detect and block sophisticated bots based on behavioral patterns and known signatures.
On average, this already blocks 12% to 14% of traffic.
User Agent blocking
We also block additional traffic that identifies itself as a bot in its user agent. For example:
- HeadlessChrome
- Applebot
- Googlebot
- AdsBot
- Baiduspider
User Agent matching
As a result of the earlier blocking rules, only about 0.08% of our data contains signs of bots. While RUMvision already applies basic patterns to match bots, site owners can add additional strings to recognize and group traffic based on its user agent string. Do note that we never collect or store full user agent strings.
The regular expression that we use to identify declared bots based on their user agent string is as follows:
/\b\w*((gle|ing|ro)?bot|crawl|spider|headless)[\w-]*\b/i
This helps identify requests from services that openly declare themselves as bots, crawlers, or spiders in their user agent.
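To illustrate, here is a minimal TypeScript sketch that applies the pattern above to a few user agent strings. The sample strings are purely illustrative, not collected data:

```ts
// Minimal sketch: applying the declared-bot pattern from above to sample
// user agent strings.
const declaredBot = /\b\w*((gle|ing|ro)?bot|crawl|spider|headless)[\w-]*\b/i;

const samples = [
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/120.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
];

for (const ua of samples) {
  console.log(`${declaredBot.test(ua)}  ${ua}`);
}
// -> true for the first two (Googlebot, HeadlessChrome),
//    false for the regular browser
```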
Amazon Bot Control rules
RUMvision automatically blocks traffic from the following categories:
- CategoryAdvertising
- CategoryArchiver
- CategoryContentFetcher
- CategoryEmailClient
- CategoryHttpLibrary
- CategoryLinkChecker
- CategoryMiscellaneous
- CategoryMonitoring
- CategoryScrapingFramework
- CategorySearchEngine
- CategorySecurity
- CategorySeo
- CategorySocialMedia
- CategoryAI
- SignalAutomatedBrowser
- SignalKnownBotDataCenter
- SignalNonBrowserUserAgent
For detailed explanations of each category, refer to the Amazon Bot Control documentation.
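To show how such categories map to rules, the sketch below describes an AWS WAFV2 rule that enables the Bot Control managed rule group and overrides two of the listed rules to a Block action. The field names follow the public WAFV2 API shape, but this is an illustrative sketch, not RUMvision's actual configuration:

```ts
// Illustrative WAFV2 rule definition (not RUMvision's actual configuration).
// Category and Signal names correspond to rules inside the managed rule group.
const botControlRule = {
  Name: "bot-control",
  Priority: 0,
  OverrideAction: { None: {} }, // keep the rule group's own rule actions
  Statement: {
    ManagedRuleGroupStatement: {
      VendorName: "AWS",
      Name: "AWSManagedRulesBotControlRuleSet",
      // Force a Block action for specific rules, regardless of their default.
      RuleActionOverrides: [
        { Name: "CategoryAdvertising", ActionToUse: { Block: {} } },
        { Name: "SignalAutomatedBrowser", ActionToUse: { Block: {} } },
      ],
    },
  },
  VisibilityConfig: {
    SampledRequestsEnabled: true,
    CloudWatchMetricsEnabled: true,
    MetricName: "bot-control",
  },
};
```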
Examples of blocked bots
The following are examples of bots that are proactively blocked by our system, meaning no data is collected from these sources:
Search Engines & Crawlers
- Google Bot, Google Ads bot, Google User Triggered fetcher, Google Special-Case Crawler
- Ahrefsbot, Applebot, Baidu, Bingbot, Yandexbot, DuckDuck bot, Seekportbot, Bytespider
Monitoring Tools
- New Relic, GTmetrix, Datadog, Google + Chrome Lighthouse
- HeadlessChrome, Savvii Munin LoadTimes Measure, Pingdom
AI & Content Analysis
- GPTBot, ClaudeBot
- Prerender, HubSpot Page Screenshot Service
Social Media & Others
- Facebook, Pinterest, DisplayMetrics (Facebook Lite), meta-externalagent
- AmazonProductDiscovery, Screaming Frog SEO Spider, SiteAuditBot (SEMrush), SEBot-WA, YisouSpider, kioskbrowser, tbtbot
Advanced bot identification
You can further customize how bot traffic is handled on your site:
Custom bot identification
If you need to identify automated traffic beyond our default configuration, contact us for a tailored approach. All identified bot traffic (that isn't already blocked by our Amazon Bot Control rules) will be visible through the bot_name filter in your dashboard.
Blocking additional traffic
To block user agents not covered by our default rules and keep that traffic out of your dataset:
- Navigate to the snippet configuration of your domain
- Add an exclude rule such as bot_name:BlockWhenUserAgentContainsThisString (illustrated below)
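Conceptually, such an exclude rule acts as a substring check against the visitor's user agent before any data is sent. The sketch below illustrates the effect, assuming a case-insensitive substring match; this is not RUMvision's actual snippet code, and the function name is hypothetical:

```ts
// Hypothetical sketch of what an exclude rule such as
// bot_name:BlockWhenUserAgentContainsThisString does (assumed
// case-insensitive substring match; not RUMvision's actual snippet).
function shouldCollect(userAgent: string, excludedStrings: string[]): boolean {
  const ua = userAgent.toLowerCase();
  return !excludedStrings.some((s) => ua.includes(s.toLowerCase()));
}

// Traffic whose user agent contains the configured string is never collected.
shouldCollect("MyCrawler/1.0 BlockWhenUserAgentContainsThisString", [
  "BlockWhenUserAgentContainsThisString",
]); // -> false
```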