More and more sites update their content dynamically: new items are added as the user scrolls down. Twitter is one of them. It initially displays only a limited number of items and loads more on demand. How can sites with this behavior be scraped? The steps below use a Google Chrome extension that scrolls the page automatically and outputs the scraped items:
- Download the code from GitHub
- Load the extension in Google Chrome: Settings => Extensions => check “Developer mode” => “Load unpacked extension”
- An “eye” icon now appears on the Google Chrome bar
- Go to Twitter’s search page https://twitter.com/search-home and enter your search keywords
- Now click the “eye” icon and then the start button
- The scraping output is displayed on the console as JSON
- To change the number of news items to be scraped, open the file inject.js and replace the argument in the scrollBottom(100); line with the number of items you would like (e.g. scrollBottom(200);)
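The extension's own scrollBottom code is not reproduced here, so the following is only a minimal sketch of the general technique, not the extension's actual implementation: scroll to the bottom, wait for new items to load, and stop once the requested number of items is present or the page stops growing. The names scrapeWithScrolling and browserPage, the maxIdleRounds and waitMs parameters, and the CSS selector are all hypothetical and chosen for illustration.

```javascript
// Hypothetical sketch, not the extension's actual scrollBottom implementation.
// `page` abstracts the browser so the loop can be exercised outside of one:
//   countItems()     -> number of items currently loaded
//   scrollToBottom() -> Promise that resolves after scrolling and waiting
//   readItems()      -> array with the items currently loaded
async function scrapeWithScrolling(page, targetCount, maxIdleRounds = 3) {
  let idleRounds = 0;
  let previous = page.countItems();

  // Keep scrolling until enough items are loaded, or until several
  // consecutive scrolls produced nothing new (end of the results).
  while (previous < targetCount && idleRounds < maxIdleRounds) {
    await page.scrollToBottom();
    const current = page.countItems();
    idleRounds = current > previous ? 0 : idleRounds + 1;
    previous = current;
  }

  return page.readItems().slice(0, targetCount);
}

// Browser adapter. The selector is an assumption: inspect the page and pass
// whatever selector matches one item in its current markup.
function browserPage(itemSelector, waitMs = 1000) {
  return {
    countItems: () => document.querySelectorAll(itemSelector).length,
    scrollToBottom: () =>
      new Promise((resolve) => {
        window.scrollTo(0, document.body.scrollHeight);
        setTimeout(resolve, waitMs); // give the site time to load more items
      }),
    readItems: () =>
      Array.from(document.querySelectorAll(itemSelector), (el) => ({
        text: el.textContent.trim(),
      })),
  };
}

// Example use from a content script or the DevTools console
// ('.item-selector' is a placeholder):
//   scrapeWithScrolling(browserPage('.item-selector'), 100)
//     .then((items) => console.log(JSON.stringify(items, null, 2)));
```

The idle-round limit is a design choice that keeps the loop from running forever when a search returns fewer results than requested. The fixed wait is the simplest option; a slow connection may need a larger waitMs.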
This source code was written by Matias Palomera from Nektra Advanced Computing.