Articles Summary

This is a summary of all the Data Big Bang blog articles by subject.


A summary of information retrieval stages and current data science articles.


  1. Distributed Scraping With Multiple Tor Circuits
  2. Running Your Own Anonymous Rotating Proxies


  1. HTML Cleaners and Tidiers


Handling of Active Content

  1. Web Scraping Ajax and Javascript Sites

Main Content Extraction

  1. Extraction of Main Text Content Using the Google Reader NoAPI
  2. Voice Recognition + Content Extraction + TTS = Innovative Web Browsing

Language Identification

  1. Language Identification for Text Mining and NLP


  1. Automated Browserless OAuth Authentication for Twitter
  2. The Python POPO’s Way to Integrate PayPal Instant Payment Notification

APIs and NoAPIs

  1. Google Search NoAPI
  2. Exporting StackOverflow users blogs to Excel Hyperlinks
  3. Extraction of Main Text Content Using the Google Reader NoAPI
  4. Integrating Google Analytics into your Company Loop with a Microsoft Excel Add-on

Policies and Data Issues

  1. Scraping vs Antiscraping
  2. The Data Portability Fact Sheet


  1. Ideas and Execution Magic Chart
  2. Ideas: Egont, A Web Orchestration Language
  3. Egont Part II

Marketing and Sales

  1. Automated Discovery of Blog Feeds and Twitter, Facebook, LinkedIn Accounts Connected to Business Website


  1. Integrating Google Analytics into your Company Loop with a Microsoft Excel Add-on

Big Data Stack

  1. Using Queues in Web Crawling and Analysis Infrastructure
  2. Persisting Native Python Queues
  3. Adding Acknowledgement Semantics to a Persistent Queue
  4. Esoteric Queue Scheduling Disciplines


  1. Running Microsoft Windows Console Applications with Invisible Windows



  1. Data Science Resources

Digital Art by Don Relyea