Data continues to be one of the most valuable and sought-after commodities on the planet. Billions of dollars are being invested in figuring out how to acquire and process large volumes of data. From travel to retail to government to media to healthcare, competitive data in particular plays a vital role in shaping the course of businesses in every industry. Some of the jobs our customers do with the competitive data we provide are:
- Ensuring route and price parity in the market
- Ensuring product catalogs are in line with competitor offerings and the market
- Evaluating competitor inventory quantity and assortment to meet customer demand
- Increasing understanding of trending products on a global scale
- Eliminating pricing gaps on most important SKUs
- Increasing match coverage
Accessing competitive data is difficult at the best of times, and at scale the difficulties can feel close to insurmountable. Let’s take a deeper dive into exactly why data acquisition is becoming more difficult.
Investments in Anti-Bot & Anti-Scraping Technology
Continued investment in research and tools that protect sites makes it steadily harder for automated processes to access data on websites. A quick internet search turns up articles such as “Anti-Bot Fraud Detection Firm HUMAN Snags $100M Investment” or “Kasada raises $23m, continues to revolutionise bot security.” Articles like these show that there is tremendous focus on limiting autonomous web navigation tools. Three main types of players have emerged in the web security space:
- Security and Blocking Companies (e.g., PerimeterX, F5 Shape Security): Marketed as web application protection solutions, with built-in tools that detect trends in traffic, detect fingerprint signatures, and use machine learning to help identify automated traffic.
- Content Delivery Networks (CDN) (e.g., Akamai, Cloudflare): Provide a CDN or a set of web hosting services, with the option to add an extra layer of security on top of the tools they specialize in. This makes some blocking technologies more widespread than they would otherwise be, giving customers the ability to lock down their sites with the press of a button.
- CAPTCHA & Others (e.g., hCaptcha, GeeTest, Google): Build CAPTCHAs and other tools that supplement the anti-bot industry.
When One Bad Bot Spoils the Bunch
With all this investment in anti-scraping and anti-bot technology, the question is: why are companies spending so much money to stop bot traffic? The quick and easy answer is that there are malicious players out there. You’ll likely recognize most of these types of “bad bots”:
- Bots or automated processes that hack into or try to take over personal accounts.
- Bots that game the system by scalping tickets or products, buying in bulk at a low price and then reselling at a higher price. (This is why it’s been so hard to get that coveted PS5 or the latest Nikes!)
- Bots set up to perpetrate credit card fraud.
- Content-spamming bots set up to distribute large volumes of content across numerous platforms and sites, such as disseminating political views en masse or flooding the comments of your favorite blog with embedded sales content.
A burgeoning industry is focused on data protection and on keeping sites from being compromised. But the same tools that keep website content safe also make accessing and sharing competitive data in a free market extremely difficult.
Stay tuned for next week’s installment of our competitive data acquisition blog series, which will go into the “how”: how data acquisition is getting more difficult, what types of blocking techniques are being used, and how QL2 ensures ethical data acquisition.
For more information on the topics discussed in this blog series, check out our webinar, The Trials and Tribulations of Competitive Data Acquisition.
Written by: Jeremy Frank, SVP of Engineering