ByteDance, the parent company of TikTok, is collecting web data at an unprecedented rate. Its “Bytespider” web crawler outpaces those of competitors such as OpenAI, Google, Meta, and Anthropic, scraping data 25 times faster than OpenAI’s GPTBot and 3,000 times faster than Anthropic’s ClaudeBot.
While ByteDance amasses this data, the US government is taking steps to limit Chinese authorities’ access to American user data. President Biden has signed a bill requiring ByteDance to sell TikTok within a year.
ByteDance may be planning to release its own large language model (LLM) using the collected data. The company has already launched several AI-powered features on TikTok, including tools for advertisers and AI-generated avatars. There are also rumors of an internal AI-powered search engine, potentially built on ChatGPT technology.
The sheer volume of web data being collected raises concerns about how ByteDance plans to use it. With a deadline to sell TikTok looming, the company’s actions are under intense scrutiny.
Source: https://mashable.com/article/tiktok-parent-company-bytedance-web-crawler-25-times-faster-than-openai