Enriching a List of URLs with Google Page Rank

Dealing with a large body of web resources can be daunting. You make a list of hundreds of blogs, but how do you share or recall those resources later? You must somehow organize your list. Many people do this with tags, but this is not necessarily the best option. Manual organization is also tedious, so tools for enriching data automatically came in handy. The relevance of different resources changes over time. What we originally tagged as “breakthrough” may come insignificant.

Last week I saw a friend who had recently started a new job and wanted my opinion about current and future technological trends. I wanted to give him links to thousands of resources that I have been accumulating over the years, but organized in such a way that he would not have to view them one at a time. This triggered an avalanche of ideas about how to enrich lists of links. My first thought was to rank my list of sites about venture capital and data science using Google Page Rank. I also considered adding the number of tweets, likes, and “+1” for each site but these are generally awarded for individual articles, not whole sites. I ended up adding the Google Page Rank with project pagerank.

The most interesting ideas to explore, though, are in another direction: how to boost items that are in the long tail. The best music may not make the Top 40, and so remains invisible. Algorithms better at recognizing value in the long tail would revolutionise the economy.

The code is available on github. Two examples of the output are available on data-science-bundle and venture-capital-bundle.

See Also

  1. Automated Discovery of Blog Feeds and Twitter, Facebook, LinkedIn Accounts Connected to Business Website
  2. Exporting StackOverflow users blogs to Excel
  3. Data Science Resources

Exporting StackOverflow User Blogs to Excel

It’s more simple than what you think

Do you find yourself with the need to automatically convert URLs that you have imported in your cells to hyperlinks? If you search on Google there are many solutions but the top ones add complexity using macros or VBA solutions.

I needed to do it quickly in the context of retrieving information from the StackOverflow API. I was searching for all sites from users in a specific country to keep track of their blogs. So I added

SELECT
   Id,
   DisplayName,
   Reputation,
   WebsiteUrl
FROM
   Users
WHERE
   Location like '%Buenos Aires%' or Location like '%Argentina%'
ORDER BY
   Reputation DESC

on http://data.stackexchange.com/stackoverflow/q/110548/  and exported it to CSV but when they were imported to Microsoft Excel I needed to convert the URLs manually. The results are ordered by StackOverflow ranking.

I found this solution, you can just use the Microsoft Excel hyperlink function. Say you have URLs as plain text in the cell D2, then on column E2 add something like =hyperlink(D2) and this will do the trick. You can now click on all the cells!

Please don’t tell recruiters about this. Hopefully they don’t develop web mashups.

See Also

  1. Integrating Google Analytics into your Company Loop with a Microsoft Excel Add-on
  2. Automated Discovery of Blog Feeds and Twitter, Facebook, LinkedIn Accounts Connected to Business Website

Resources

  1. Most influential github users by location
  2. Implement OData API for StackOverflow
  3. Creative Commons Stack Overflow Data Dump