Slightly amusing

Showing posts from 2016

Testdriving Earley: Part 1 - Basics

February 08, 2016

So, I decided to write a parser. Silly me, I know. If I am going to do this, I need to do it properly, so I am going to develop it using a test-driven approach and document the whole process in this blog as I go. This is the first article in the series, where I try to establish the basics and bootstrap a project. First things first. To write a parser, you need to know at least some of the theory: the basic terminology around formal grammars, the Chomsky hierarchy of formal grammars, and the basics of automata theory. You don't have to become an expert (I am certainly not one at this point), but the terms and concepts from these theoretical disciplines will crop up here and there and it ... There is a lot of literature out there about Earley parsers. Some of it is even readable for mere mortal developers not used to the academic vernacular. The initial set of source info...

Testdriving Earley - Introduction

January 15, 2016

So, I've been interested in parsers for a while. There are plenty of opportunities to use a parser in our everyday jobs, but for some reason it does not happen very often. Most of the time, when a task comes up that requires us to recognize some user input, we almost instinctively reach for regular expressions. Regular expressions are a very powerful tool, to be sure. Maybe even too powerful. With all the fancy additions like non-capturing lookahead and back references, regular expressions have become a legendary source of obfuscation and subtle bugs in user input verification. Over the years, I've tried my hand at various parser generator and combinator libraries (ANTLR, JavaCC, JParsec, to name a few) and hand-rolled my own in a couple of instances. Most of the time when using parser generators, I've felt like I was invoking big machinery just to get something simple up and running. And then there are the quirks specific to each parsing technology that I just have to know...