Parsing S-Expressions in C# using OMeta

It is easy to parse S-Expressions in C# with OMeta. Our code limits the grammar to lists, and atoms of string, symbol, and number types. So, it is not complete, but it can easily be expanded with OMeta. What motivated me to write this article was the lack of publicly available S-Expression parsers in C#/.NET.

Our parser converts the expression (+ (* 3 4 5 6) (- 7 1) ) to the following tree:

parsed-s-expression
where each vertex is represented by a C# class containing an ArrayList, Symbol, String, or Integer. Note that the expression (1) is different from the expression without parenthesis. The first is a list with one atom and the other is just the atom.

S-Expressions are a compact way to express programs and data structures. They were first defined for Lisp, but are used in a variety of areas including public key infrastructure. We use S-Expressions to define data flows in Egont, our web orchestration language. In Egont, each S-Expression produces a tree which is converted into a directed acyclic graph, the subject of a future post.

OMeta can be used under C# via the OMeta# project. That makes it more interesting since the classical lexical analyzer and parser generators such as Lex/flex and Yacc/GNU bison do not produce C# code. ANTLR is an interesting alternative but at the time of this post the latest version, ANTLR 4, does not support C#. OMeta’s ability to deal with ambiguities makes it more suited to playing with grammars. However, there are performance penalties in OMeta which must be taken into account.

Code

The code is available as SExpression.NET [github.com].

  1. Compile the RebuildParser project first
  2. Run the Test project
  3. The SExpression project contains the SExpression.ometacs parser and its related C# classes

See Also

  1. Egont, a [Social] Web Orchestration Language
  2. Egont Part II

Additional Resources

  1. IronMeta: another OMeta implementation in C#
  2. YaYAML: a YAML parser written in OMeta#
  3. OMeta Performance
  4. Domain-Specific Languages: An Annotated Bibliography

Searching for Substrings in Streams: a Slight Modification of the Knuth-Morris-Pratt Algorithm in Haxe

It is odd that the base libraries for most programming languages do not allow you to search for regular expressions and substrings in streams or partial reads. We have modified the KMP algorithm so that it accepts virtually infinite partial strings. The code is implemented in Haxe, so it can generate code in multiple programming languages.

Streams are important when working with data that does not fit in main memory, such as large files, or with data which is being transferred. There are a few implementations of regular expressions and substrings matching. One is the Jakarta Regexp, now retired and resting in the Apache Attic. The Jakarta Regexp library “match” method in the RE class uses a CharacterIterator as a parameter. In C++, Boost.Regex implements partial matches.

Our code is implemented in Haxe so the same code can target Javascript, ActionScript, Flash SWF, NekoVM, PHP, C++, C#, and Java. We really like the concept of writing one code and expanding it to a variety of platforms with minimum effort. There are excellent libraries in specific environments that can work perfectly in other environments. Porting libraries from one programming language to another is tedious. For example, the amazing NetworkX graph library implemented in Python can be easily ported to C# to benefit a broader audience.

Code

Prerequisites

  1. Haxe (tested on version 2.10)
  2. For C++: hxcpp (run haxelib install hxcpp)
  3. For Java: hxjava (run haxelib install hxjava)
  4. For Mono/C#: jxcs (run haxelib install hxcs)

Source code available on github.

See Also

  1. Parsing S-Expressions in C# using OMeta
  2. Esoteric Queue Scheduling Disciplines

Resources

  1. Knuth-Morris-Pratt string matching
  2. Text Searching: Theory and Practice
  3. Boyer–Moore–Horspool algorithm
  4. Rabin–Karp algorithm
  5. Aho–Corasick string matching algorithm
  6. Lexicographically minimal string rotation
  7. Efficient way to search a stream for a string