Parsing S-Expressions in C# using OMeta

It is easy to parse S-Expressions in C# with OMeta. Our grammar is limited to lists and to atoms of string, symbol, and number types. It is therefore not complete, but it can easily be extended with OMeta. What motivated me to write this article was the lack of publicly available S-Expression parsers in C#/.NET.

Our parser converts the expression (+ (* 3 4 5 6) (- 7 1) ) to the following tree:

[Figure: parsed S-expression tree]
where each vertex is represented by a C# class containing an ArrayList, Symbol, String, or Integer. Note that the expression (1) is different from the same expression without parentheses, 1. The first is a list containing one atom and the second is just the atom.
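
To make the tree structure concrete, here is a minimal hand-written sketch in C++. It is not the OMeta grammar from the project and does not mirror its C# classes. The names Node, Parser, parseSExpression, and toString are hypothetical, and the sketch assumes unsigned integers only, so an atom such as -7 would be read as a symbol.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// One vertex of the tree: an atom (number, string, symbol) or a list of vertices.
struct Node {
    enum class Kind { Number, String, Symbol, List };
    Kind kind;
    std::string text;            // atom text; empty for lists
    std::vector<Node> children;  // list elements; empty for atoms
};

// Hypothetical recursive-descent parser for the restricted grammar.
class Parser {
public:
    explicit Parser(const std::string& src) : s_(src), pos_(0) {}

    Node parse() {
        Node n = parseExpr();
        skipSpace();
        if (pos_ != s_.size()) throw std::runtime_error("trailing input");
        return n;
    }

private:
    bool atEnd() const { return pos_ >= s_.size(); }
    static bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
    static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
    void skipSpace() { while (!atEnd() && isSpace(s_[pos_])) ++pos_; }

    Node parseExpr() {
        skipSpace();
        if (atEnd()) throw std::runtime_error("unexpected end of input");
        if (s_[pos_] == '(') return parseList();
        if (s_[pos_] == ')') throw std::runtime_error("unexpected ')'");
        if (s_[pos_] == '"') return parseString();
        return parseAtom();
    }

    Node parseList() {
        ++pos_;  // consume '('
        Node list{Node::Kind::List, "", {}};
        for (;;) {
            skipSpace();
            if (atEnd()) throw std::runtime_error("missing ')'");
            if (s_[pos_] == ')') { ++pos_; return list; }
            list.children.push_back(parseExpr());
        }
    }

    Node parseString() {
        ++pos_;  // consume opening quote
        std::string text;
        while (!atEnd() && s_[pos_] != '"') text += s_[pos_++];
        if (atEnd()) throw std::runtime_error("unterminated string");
        ++pos_;  // consume closing quote
        return Node{Node::Kind::String, text, {}};
    }

    Node parseAtom() {
        std::string text;
        while (!atEnd() && !isSpace(s_[pos_]) && s_[pos_] != '(' && s_[pos_] != ')')
            text += s_[pos_++];
        assert(!text.empty());  // parseExpr only calls this on an atom character
        bool allDigits = true;
        for (char c : text) allDigits = allDigits && isDigit(c);
        return Node{allDigits ? Node::Kind::Number : Node::Kind::Symbol, text, {}};
    }

    std::string s_;
    std::size_t pos_;
};

Node parseSExpression(const std::string& src) { return Parser(src).parse(); }

// Prints a tree back in normalized S-expression form.
std::string toString(const Node& n) {
    if (n.kind == Node::Kind::String) return "\"" + n.text + "\"";
    if (n.kind != Node::Kind::List) return n.text;
    std::string out = "(";
    for (std::size_t i = 0; i < n.children.size(); ++i) {
        if (i != 0) out += " ";
        out += toString(n.children[i]);
    }
    return out + ")";
}
```

Under these assumptions, parseSExpression("(1)") yields a List node with one Number child, while parseSExpression("1") yields the Number node itself.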

S-Expressions are a compact way to express programs and data structures. They were first defined for Lisp, but are used in a variety of areas including public key infrastructure. We use S-Expressions to define data flows in Egont, our web orchestration language. In Egont, each S-Expression produces a tree which is converted into a directed acyclic graph, the subject of a future post.

OMeta can be used under C# via the OMeta# project. That makes it more interesting since the classical lexical analyzer and parser generators such as Lex/flex and Yacc/GNU bison do not produce C# code. ANTLR is an interesting alternative but at the time of this post the latest version, ANTLR 4, does not support C#. OMeta’s ability to deal with ambiguities makes it more suited to playing with grammars. However, there are performance penalties in OMeta which must be taken into account.

Code

The code is available as SExpression.NET [github.com].

  1. Compile the RebuildParser project first
  2. Run the Test project
  3. The SExpression project contains the SExpression.ometacs parser and its related C# classes

See Also

  1. Egont, a [Social] Web Orchestration Language
  2. Egont Part II

Additional Resources

  1. IronMeta: another OMeta implementation in C#
  2. YaYAML: a YAML parser written in OMeta#
  3. OMeta Performance
  4. Domain-Specific Languages: An Annotated Bibliography

Searching for Substrings in Streams: a Slight Modification of the Knuth-Morris-Pratt Algorithm in Haxe

It is odd that the base libraries for most programming languages do not allow you to search for regular expressions and substrings in streams or partial reads. We have modified the KMP algorithm so that it accepts virtually infinite partial strings. The code is implemented in Haxe, so it can generate code in multiple programming languages.
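
The idea of carrying the matcher state across partial reads can be sketched as follows. This is an illustrative C++ version, not the Haxe code from the repository. The class name StreamMatcher and its feed method are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Incremental Knuth-Morris-Pratt matcher (hypothetical name). The only state
// kept between chunks is the number of pattern characters currently matched
// and the number of stream characters consumed so far.
class StreamMatcher {
public:
    explicit StreamMatcher(const std::string& pattern)
        : pattern_(pattern), failure_(pattern.size(), 0), matched_(0), consumed_(0) {
        assert(!pattern_.empty());
        // Standard KMP failure table: failure_[i] is the length of the longest
        // proper prefix of pattern[0..i] that is also a suffix of it.
        for (std::size_t i = 1, k = 0; i < pattern_.size(); ++i) {
            while (k > 0 && pattern_[i] != pattern_[k]) k = failure_[k - 1];
            if (pattern_[i] == pattern_[k]) ++k;
            failure_[i] = k;
        }
    }

    // Consumes one chunk of the stream. Returns the start offsets, counted
    // from the beginning of the whole stream, of matches ending in this chunk.
    std::vector<std::size_t> feed(const std::string& chunk) {
        std::vector<std::size_t> hits;
        for (char c : chunk) {
            while (matched_ > 0 && c != pattern_[matched_]) matched_ = failure_[matched_ - 1];
            if (c == pattern_[matched_]) ++matched_;
            ++consumed_;
            if (matched_ == pattern_.size()) {
                hits.push_back(consumed_ - pattern_.size());
                matched_ = failure_[matched_ - 1];  // keep going; allows overlapping matches
            }
        }
        return hits;
    }

private:
    std::string pattern_;
    std::vector<std::size_t> failure_;
    std::size_t matched_;   // pattern characters matched so far
    std::size_t consumed_;  // stream characters seen so far
};
```

Because the matcher never looks back at earlier input, a match that straddles two chunks is still found, and no chunk has to be kept in memory after feed returns.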

Streams are important when working with data that does not fit in main memory, such as large files, or with data that is still being transferred. Only a few implementations of regular expression and substring matching support them. One is Jakarta Regexp, now retired and resting in the Apache Attic: the “match” method of its RE class accepts a CharacterIterator as a parameter. In C++, Boost.Regex implements partial matches.

Our code is implemented in Haxe, so the same source can target JavaScript, ActionScript, Flash SWF, NekoVM, PHP, C++, C#, and Java. We really like the idea of writing code once and extending it to a variety of platforms with minimal effort. There are excellent libraries in specific environments that would work perfectly well in others, but porting a library from one programming language to another is tedious. For example, the amazing NetworkX graph library, implemented in Python, could benefit a broader audience if it were available in C#.

Code

Prerequisites

  1. Haxe (tested on version 2.10)
  2. For C++: hxcpp (run haxelib install hxcpp)
  3. For Java: hxjava (run haxelib install hxjava)
  4. For Mono/C#: hxcs (run haxelib install hxcs)

The source code is available on GitHub.

See Also

  1. Parsing S-Expressions in C# using OMeta
  2. Esoteric Queue Scheduling Disciplines

Resources

  1. Knuth-Morris-Pratt string matching
  2. Text Searching: Theory and Practice
  3. Boyer–Moore–Horspool algorithm
  4. Rabin–Karp algorithm
  5. Aho–Corasick string matching algorithm
  6. Lexicographically minimal string rotation
  7. Efficient way to search a stream for a string

Running Console Applications with Invisible Windows

Hiding Console Application Windows

With the simple free source code and executable application below you can launch and run multiple console applications using cmd.exe without displaying the console window. This code eliminates the need to build a service, and is a useful complement to the Distributed Scraping With Multiple Tor Circuits article for those running Tor under the Microsoft Windows operating system. It includes a batch file to enable you to run multiple Tor proxies. Other companies charge for a similar application.

Application

You can download the application hideconsole.exe here.

Code

The code is short. If you want to download the entire Visual Studio 2010 project or fork it, please use the GitHub project.

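// hideconsole.cpp : Launches a command with its console window hidden.
//

#include "stdafx.h"
#include <windows.h>
#include <shellapi.h>
#include <tchar.h>
#include <string>
#include <iostream>

// Pick narrow or wide strings and streams to match the project's character set.
// These are declared at global scope: adding names to namespace std is
// undefined behavior.
#ifdef _UNICODE
typedef std::wstring tstring;
#define tout std::wcout
#else
typedef std::string tstring;
#define tout std::cout
#endif

int _tmain(int argc, _TCHAR* argv[])
{
	if (argc == 1) {
		tout << "Usage: " << argv[0] << " <cmd> [<parameter1>..<parametern>]" << std::endl;
		return 0;
	}

	tstring file = argv[1];
	tstring parameters;

	// Join the remaining arguments with single spaces; they are passed on as-is.
	for (int i = 2; i < argc; i++) {
		if (i != 2)
			parameters.append(_T(" "));
		parameters.append(argv[i]);
	}

	// SW_HIDE starts the process without showing its window.
	HINSTANCE result = ShellExecute(NULL, _T("open"), file.c_str(), parameters.c_str(), NULL, SW_HIDE);
	tout << "Running cmd = " << file << std::endl;
	tout << "Arguments = " << parameters << std::endl;

	// ShellExecute reports success with a value greater than 32.
	if ((INT_PTR)result <= 32) {
		tout << "ShellExecute failed with code " << (INT_PTR)result << std::endl;
		return 1;
	}

	return 0;
}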
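
The loop above joins the arguments with plain spaces, so an argument that itself contains a space can no longer be told apart once joined. One possible alternative, shown only as a sketch, is to re-quote each argument following the conventions the Microsoft C runtime uses when it splits a command line. The helper names quoteArg and joinArgs are hypothetical, the sketch uses narrow strings for brevity, and the batch files below are written for the unquoted behavior, so they would need adjusting if this were adopted. Note also that cmd.exe applies its own quoting rules, which this sketch does not address.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Quotes one argument so that a program using the Microsoft C runtime's
// command-line splitting receives it unchanged (hypothetical helper).
std::string quoteArg(const std::string& arg) {
    // Arguments without whitespace or quotes can be passed through as-is.
    if (!arg.empty() && arg.find_first_of(" \t\n\v\"") == std::string::npos)
        return arg;

    std::string out = "\"";
    for (std::size_t i = 0; ; ++i) {
        std::size_t backslashes = 0;
        while (i < arg.size() && arg[i] == '\\') { ++i; ++backslashes; }

        if (i == arg.size()) {
            // Trailing backslashes are doubled so they do not escape the closing quote.
            out.append(backslashes * 2, '\\');
            break;
        } else if (arg[i] == '"') {
            // Backslashes before a quote are doubled, then the quote is escaped.
            out.append(backslashes * 2 + 1, '\\');
            out.push_back('"');
        } else {
            // Backslashes elsewhere are literal.
            out.append(backslashes, '\\');
            out.push_back(arg[i]);
        }
    }
    out.push_back('"');
    return out;
}

// Joins already-split arguments into one parameter string (hypothetical helper).
std::string joinArgs(const std::vector<std::string>& args) {
    std::string out;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (i != 0) out += " ";
        out += quoteArg(args[i]);
    }
    return out;
}
```

In the program itself the same logic would operate on tstring values taken from argv[2] onward.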

Batch for Running Multiple Tor Instances

mytor.bat

@echo off
echo %3
IF EXIST data\tor%3 GOTO DATASUBDIREXISTS
	mkdir data\tor%3
:DATASUBDIREXISTS
.\hideconsole.exe c:\windows\system32\cmd.exe /k """"""c:\Program Files (x86)\tor\tor.exe""" --RunAsDaemon 1 --CookieAuthentication 0 --HashedControlPassword """" --SocksListenAddress 192.168.0.178 --ControlPort %2 --PidFile tor%3.pid --SocksPort %1 --DataDirectory data\tor%3""""

start-multiple-tor.bat

@echo off
setlocal enabledelayedexpansion

set /A base_socks_port=9050
set /A base_control_port=8118
set /A idx=0

IF EXIST data GOTO DATAEXISTS
mkdir data

:DATAEXISTS

FOR /L %%i IN (1,1,80) DO (
	call .\mytor !base_socks_port! !base_control_port! !idx!
	set /A base_socks_port+=1
	set /A base_control_port+=1
	set /A idx+=1
	rem echo !idx!
)

Note

  • The multiple quotes that you see in the batch file are necessary since we are escaping quotes three times.

Resources

  1. CMD.EXE: Escape Characters, Delimiters and Quotes
  2. CMD.EXE Escape Character, Caret. OK, But What About Backslash?
