google-chrome-extension | Data Big Bang Blog

Preface

More and more sites update their content dynamically, adding new items as the user scrolls down. Twitter is one of them: it initially displays only a limited number of news items and loads more on demand. How can sites with this behavior be scraped?

In the previous article we used a Google Chrome extension to scrape a forum that depends on JavaScript and XMLHttpRequest. Here we apply the same technique to retrieve a specific number of news items matching a given search. Other alternatives are listed in the Web Scraping Ajax and Javascript Sites article.

Code

Instructions

Download the code from GitHub
Load the extension in Google Chrome: Settings => Extensions => check “Developer mode” => “Load unpacked extension”
An “eye” icon now appears on the Google Chrome toolbar
Go to Twitter’s search page https://twitter.com/search-home and enter your search keywords
Click the “eye” icon and then the start button
The scraping output is displayed in the console as JSON
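The start-button flow could be wired roughly as follows. This is a minimal sketch, not the extension's actual code: the message shape `{ action: "start", count }`, the `scrape` callback and the record field `id` are hypothetical. It shows a content-script listener that runs the scraper when the start message arrives, drops records it has already seen (re-scanning the timeline after each scroll returns the earlier items again), and prints the result to the console as JSON.

```javascript
// Drop records whose `id` was already seen; re-scanning the page after
// each scroll returns the earlier items again. `id` is a hypothetical field.
function dedupeById(records) {
  const seen = new Set();
  return records.filter((record) => {
    if (seen.has(record.id)) return false;
    seen.add(record.id);
    return true;
  });
}

// Build a chrome.runtime.onMessage listener. `scrape(count)` is a
// hypothetical async function that resolves to an array of records.
function makeStartListener(scrape, log = console.log) {
  return (message, sender, sendResponse) => {
    if (!message || message.action !== "start") return false;
    scrape(message.count)
      .then((records) => {
        const items = dedupeById(records);
        log(JSON.stringify(items, null, 2)); // printed to the page's DevTools console
        sendResponse({ ok: true, total: items.length });
      })
      .catch((err) => sendResponse({ ok: false, error: String(err) }));
    return true; // keep sendResponse valid until the promise settles
  };
}

// Register only when running inside an extension content script.
if (typeof chrome !== "undefined" && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener(
    makeStartListener(async (count) => []) // placeholder: plug in the real scraper here
  );
}
```

Returning `true` from the listener is what lets the response be sent after the asynchronous scrape finishes; without it, Chrome closes the message channel as soon as the listener returns.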

Customization

To change the number of news items to scrape, open the file inject.js and replace the argument of the scrollBottom(100); call with the number of items you want (e.g., scrollBottom(200);).
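A scrollBottom(n)-style loop could look roughly like this. It is a hypothetical sketch, not the code in inject.js: the names `scrollUntilCount`, `countItems` and `scrollToBottom`, the 1-second delay and the `maxIdleRounds` stop rule are all assumptions. The idea is to keep scrolling to the bottom so the page requests more items, and to stop once `n` items are loaded or the count stops growing (for example, when the search has no more results).

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Keep scrolling until at least `target` items are on the page, or until the
// item count has not grown for `maxIdleRounds` consecutive rounds.
// In a browser, pass DOM-backed functions, e.g.
//   countItems: () => document.querySelectorAll(ITEM_SELECTOR).length
//   scrollToBottom: () => window.scrollTo(0, document.body.scrollHeight)
// where ITEM_SELECTOR is a hypothetical selector matching one news item.
async function scrollUntilCount(target, { countItems, scrollToBottom, delayMs = 1000, maxIdleRounds = 3 }) {
  let count = countItems();
  let idleRounds = 0;
  while (count < target && idleRounds < maxIdleRounds) {
    scrollToBottom();     // reaching the bottom triggers the next load
    await sleep(delayMs); // give the XMLHttpRequest time to complete
    const next = countItems();
    idleRounds = next > count ? 0 : idleRounds + 1;
    count = next;
  }
  return count; // may exceed `target`, since items arrive in batches
}
```

Here `scrollUntilCount(200, { countItems, scrollToBottom })` plays the role of scrollBottom(200). The idle-round limit keeps the loop from spinning forever when the search returns fewer results than requested.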

Acknowledgments

This source code was written by Matias Palomera from Nektra Advanced Computing.

Data Big Bang Blog

Creativity and Problem Solving for Data Science (whatever it may mean…) | An experimental spin-off from Nektra Advanced Computing

Scraping Web Sites which Dynamically Load Data