Egont Part II

(part I here)

Description

Egont is a shared space where users mash up personal information.
Its top goals are:
  • Discovering and curating new information in a personalized and dynamic way.
  • Promoting emergent behavior in a shared programming environment.
  • Facilitating serendipity.

Egont is a personalization environment where users can connect to, import, expose, and index data from their web services. They can also apply functions to build mashups around their personal interests, much as in a spreadsheet. On Egont, users can combine and exchange information. For example, users can connect their Egont accounts to a variety of services, such as movie rankings, and merge rankings from their social networks. If they want to find independent films, they can filter out blockbusters. When users in their social networks update their rankings, the updates are processed and the result is automatically recalculated. The same idea can be applied to Twitter streams or blog posts: one user can apply a filter to those streams to curate information apart from mainstream trends and recommendation systems, while other users can build new filters on top of this user’s data. Third parties can take advantage of the data flowing through this shared environment by developing new information functions.
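
To make the ranking example concrete, here is a minimal Python sketch of the same mashup outside Egont. The data and helper names are made up for illustration; in Egont the merged result would be recomputed automatically whenever a friend updates a rating, while here it is a one-off calculation.

from statistics import mean

# Hypothetical ratings (movie title -> score from 0 to 10); purely illustrative.
alice = {"Primer": 9, "Avatar": 6, "Pi": 8}
bob = {"Primer": 8, "Avatar": 9}
me = {"Pi": 9, "Avatar": 5}

BLOCKBUSTERS = {"Avatar"}  # titles to drop when looking for independent films

def merge_rankings(*rankings):
    # Average each movie's score across everyone who ranked it.
    titles = set().union(*rankings)
    return {title: mean(r[title] for r in rankings if title in r) for title in titles}

merged = merge_rankings(alice, bob, me)
independent = {title: score for title, score in merged.items() if title not in BLOCKBUSTERS}
print(sorted(independent.items(), key=lambda item: -item[1]))  # independent films by average score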

Egont has a simple programming language in which experienced users can access other users’ variable namespaces and manage security at different granularities to enable or restrict the flow of information. Less experienced users personalize their Egont experience through a simpler web interface.

Summary

Egont is composed of the following elements:
  1. A data flow engine.
  2. A data store where cell values are persisted.
  3. A web application.
  4. A simple programming language.

Data Flow Engine

The data flow engine works like a spreadsheet. Some cells may be dependent on others, and values are recalculated only when necessary. For example, one cell may contain a function that retrieves new tweets, while another cell takes those tweets and uses a second function to extract named entities such as places or proper names. Users can tame the vast flow of information from many sources by processing, aggregating, and filtering it. The data flow engine limits recalculation to the affected cells only.
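
As a rough illustration of this recalculation model (not Egont’s actual implementation), the following Python sketch keeps a dependency graph of cells and, when a cell changes, recomputes only its dependents, assuming the graph is acyclic. All names and the toy entity extractor are made up for the example.

class Engine:
    def __init__(self):
        self.values = {}      # cell name -> current value
        self.formulas = {}    # cell name -> (function, dependency names)
        self.dependents = {}  # cell name -> set of cells that read it

    def set_value(self, name, value):
        self.values[name] = value
        self._recalculate(name)

    def set_formula(self, name, func, deps):
        self.formulas[name] = (func, deps)
        for dep in deps:
            self.dependents.setdefault(dep, set()).add(name)
        self.values[name] = func(*(self.values[dep] for dep in deps))
        self._recalculate(name)

    def _recalculate(self, changed):
        # Walk only the affected part of the dependency graph.
        for cell in self.dependents.get(changed, set()):
            func, deps = self.formulas[cell]
            self.values[cell] = func(*(self.values[dep] for dep in deps))
            self._recalculate(cell)

engine = Engine()
engine.set_value("tweets", ["Lunch in Paris", "Reading about Turing"])
engine.set_formula("entities",
                   # naive stand-in for named-entity extraction: keep capitalized words
                   lambda tweets: [w for t in tweets for w in t.split() if w.istitle()],
                   ["tweets"])
engine.set_value("tweets", ["Flying to Tokyo"])  # only the "entities" cell is recomputed
print(engine.values["entities"])                 # ['Flying', 'Tokyo']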

The key feature of the engine is its ability to apply functions to a set of shared cells from other users. Another important feature is the handling of security settings. Users can configure which cells are shared with which users at a very granular level.
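
As a sketch of what per-cell sharing could look like (an assumption for illustration, not Egont’s real security model), each cell below carries its own access list, so a user can expose some cells to specific users while keeping others private.

class SharedCell:
    def __init__(self, owner, value, shared_with=()):
        self.owner = owner
        self.value = value
        self.shared_with = set(shared_with)

    def read(self, user):
        # Only the owner and explicitly listed users may read the cell.
        if user == self.owner or user in self.shared_with:
            return self.value
        raise PermissionError(f"{user} cannot read {self.owner}'s cell")

# Alice shares her movie ranking with Bob but keeps her reading list private.
movies = SharedCell("alice", {"Primer": 9}, shared_with={"bob"})
books = SharedCell("alice", ["Gödel, Escher, Bach"])

print(movies.read("bob"))  # allowed: {'Primer': 9}
books.read("bob")          # raises PermissionError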

Web Application

The web application has two important parts. One is the editor where advanced users can use the browser to edit their Egont scripts. The other is a simpler user interface where users are able to define their sources of information and apply functions to them more easily.

Programming Language

The goal of Egont is to simplify the building of personalizations and mashups, so its programming language is oriented toward quickly orchestrating user information.

This is a rough example of how an advanced user could use the Egont programming language to merge friends’ movie rankings.

friends <- [egont.users.alice, egont.users.bob, me] # list of friends.
movies_ranking <- imdb.ranking("swain-4") # persist the movie ranking of my IMDB user in movies_ranking.
movies_average <- average(apply(friends, 'movies_ranking')) # average the movie rankings of the specified friends; it only changes when a ranking is updated.
egont.feeds <- movies_average # expose the result as a feed in the web application.

Whenever any of the above users modifies a movie’s ranking, Egont recalculates that movie’s average score.

With Egont, we will have a place where we can discover new resources, research our interests, and create a community capable of sifting through the ever more vast sea of data available on today’s web.

See Also

  1. Parsing S-Expressions in C# using OMeta

Resources

  1. A Brief History of Spreadsheets
  2. Kahn process networks
  3. Directed acyclic graph
  4. Advances in IC-Scheduling Theory: Scheduling Expansive and Reductive Dags and Scheduling Dags via Duality
  5. Pregel: A System for Large-Scale Graph Processing
  6. Grzegorz Malewicz’s Google Research page
  7. CIEL: a universal execution engine for distributed data-flow computing
  8. Bloom Programming Language (via ComingThoughts)

The Data Portability Fact Sheet

Introduction

Parallego has been announced on TechCrunch after a stealth period as the latest social network that will challenge Facebook and Google Plus. Their investors include big names like Sequoia Capital, Andreessen Horowitz and Union Square Ventures, and they have top angels like Ron Conway. They really love developers, so they offer an API to show their commitment to openness.

Parallego doesn’t really exist, but announcements like this are a staple of breaking startup news about the web and entrepreneurship. These companies emphasize their love for developers and claim to be open because they provide APIs. The truth is that when you test their APIs, you usually find a number of problems:

  1. You can read the information but cannot write or modify it.
  2. You have access to certain information but other information is unavailable.
  3. The rate limit on API calls is low, so you can only make a few calls and must wait a certain period of time to continue (a retry sketch for this case follows the list).
  4. You cannot make parallel requests in a multiprocess or multithreaded application.
  5. There is no way to quickly pay for the service in order to get better access. The Google API Console is a step in that direction, but a lot of important Google NoAPIs remain unavailable.
  6. Some OAuth2 protocol implementations do not work with the existing development libraries.
  7. The service says it welcomes new applications, but this is not the case for new UIs and mobile clients. See Twitter to Devs: Don’t Make Twitter Clients… Or Else [mashable.com]
  8. You cannot even export your own information. The time you have spent adding content to this service is lost once you leave it.
  9. There is no love for developers: the forums are filled with questions and there are no official answers. See Rate limit with billing enabled [google.com] and Graph API rate limit? [facebook.com]
  10. The company often changes its policies. The web mashup you built seven months ago that attracted thousands of users is now useless because the new API revision does not give you the data you need for some specific features. See Should facebook pay compensation for deprecated API calls and changes [facebook.com]
  11. Old content is removed without warning.
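
As a sketch of one common workaround for problem 3 above (it works around the symptom, not the policy), the Python snippet below retries a call with exponential backoff when the server answers with HTTP 429 ("too many requests"); the URL and parameters are placeholders, not any particular service’s API.

import time
import urllib.error
import urllib.request

def fetch_with_backoff(url, max_retries=5, base_delay=1.0):
    for attempt in range(max_retries):
        try:
            with urllib.request.urlopen(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # only retry when the service reports rate limiting
            time.sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
    raise RuntimeError(f"still rate limited after {max_retries} attempts")

# data = fetch_with_backoff("https://api.example.com/v1/items")  # placeholder URL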

After a while, you begin to doubt; you close your eyes and think again about the word “Open”. It seems somewhat meaningless. If you are older, you may remember that Microsoft was accused of being closed, but you may also remember that in the worst case you could reverse engineer the software and access all the internals yourself. You needed advanced knowledge of tools like IDA Pro, OllyDbg, and WinDbg, of course, but it was possible. You can’t reverse engineer the cloud; you can scrape the information, but that is time consuming both in development effort and in running time.

And while “Open” is repeated in every announcement from high-profile web companies, your brain no longer registers the word, just as you no longer see any of the ads on Google because your brain has built its own AdBlock extension.

Data Portability Classification

For all of the above reasons, we think the best initiative towards transparency is adding a fact sheet to every service, so we can compare them and know how “open” they really are. WikiMatrix is a good example of how such comparisons could be made.

Marco Paol from DBB has been informally collecting information about some web services and has put it in a public spreadsheet: Data Portability Comparison.

Please feel free to send us clarifications, suggestions, and fixes.

Resources

  1. Open Data and Linked Data [wikipedia.org]
  2. DataPortability project [wikipedia.org]
  3. Small data [smalldata.org]
  4. The open data manual [opendatamanual.org]
  5. Is It Open Data?
  6. Open Data mailing lists [okfn.org]
  7. Synaptic/Web
  8. Open Knowledge Foundation Blog
  9. The Friend of a Friend (FOAF) project
  10. theinfo.org: Community for Getting, Processing, and Visualizing Large Data Sets
  11. Plagiarism Today
  12. PeopleBrowsr’s case against Twitter heads back to state court after federal court ruling
  13. Archive Team archivists