This repository has been archived by the owner on May 28, 2019. It is now read-only.

LexingError

Alex Speller edited this page Jul 12, 2013 · 41 revisions

Note: if you are sure your feature is correct, please see this issue

Lexing errors look like this:

features/h2.feature: Lexing error on line 1: 'As anybody'. (Gherkin::LexingError)

If you see a LexingError when using Gherkin (or other software, like Cucumber, that uses Gherkin), it is because Gherkin has been fed some input which it cannot recognize at all. It is a situation similar to compiling HTML with gcc: it can’t even begin to work. That being said, there are a few common causes of lexing errors.

Common Causes of LexingErrors

Missing Feature header

  • To fix: Add a Feature: keyword (or appropriate translation) to the top of the offending file
  • Usually affects users upgrading from versions of Cucumber older than 0.7.0
  • Gherkin replaced Treetop as the feature-file parser beginning with v0.7.0, and Gherkin is less permissive than Treetop

For example, this feature file will throw a lexing error with Cucumber v0.7.0 and later:

As a user
I want something
So I can be rich

Scenario: Foo
  When Bar
  Then Baz

To fix the error, add a Feature: keyword to the top of the file:

Feature: Get rich
  As a user
  I want something
  So I can be rich
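
When upgrading many feature files at once, a quick pre-flight check can flag the ones that lack a header. The following is a minimal Python sketch, not part of Gherkin or Cucumber: the function name `has_feature_header` is hypothetical, and it assumes that only blank lines, comments, and tags may precede the header. Pass a translated keyword through the `keyword` parameter for non-English features.

```python
def has_feature_header(text, keyword="Feature"):
    """Return True if the first meaningful line starts with `keyword:`.

    Hypothetical helper for illustration; this is not Gherkin's own logic.
    """
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines, comments, and tags, which may precede the header.
        if not stripped or stripped.startswith("#") or stripped.startswith("@"):
            continue
        return stripped.startswith(keyword + ":")
    return False  # empty input, or nothing but comments and tags


broken = """As a user
I want something
So I can be rich

Scenario: Foo
  When Bar
  Then Baz
"""

fixed = "Feature: Get rich\n" + broken

print(has_feature_header(broken))  # False
print(has_feature_header(fixed))   # True
```

A check like this only narrows down which files to look at; Gherkin itself remains the authority on whether a file lexes.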

Typos

They happen. When you see a lexing error, it might be a typo. Read the error carefully and review the input source at the line mentioned. You’ll usually see the typo right away, often a missing or misplaced colon after a keyword.

EOL Comments

  • Comments are allowed only at the start of a line
  • Comments are not allowed at the end of the line, and in some cases may be interpreted as part of that line (as with a Given/When/Then step)
  • Comments are best used to comment out a single step or table row. If you feel like using a comment to clarify your intention, ask yourself if it is possible to include that detail in a step name or a scenario or feature description.
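
The rules above can be illustrated with a small Python sketch of a line classifier. It is a simplified model, not Gherkin's actual lexer: the `classify` function and the `STEP_KEYWORDS` tuple are hypothetical names, and only English step keywords are assumed. It shows why a `#` at the end of a step ends up inside the step name.

```python
# Hypothetical, English-only keyword list for illustration.
STEP_KEYWORDS = ("Given ", "When ", "Then ", "And ", "But ")


def classify(line):
    """Classify one line as a comment, a step, or something else."""
    stripped = line.strip()
    # A comment is recognized only when '#' starts the line.
    if stripped.startswith("#"):
        return ("comment", stripped[1:].strip())
    for keyword in STEP_KEYWORDS:
        if stripped.startswith(keyword):
            # Everything after the keyword is the step name, '#' included.
            return ("step", stripped[len(keyword):])
    return ("other", stripped)


print(classify("  # Given a disabled step"))
# ('comment', 'Given a disabled step')
print(classify("  Given a step # not a comment"))
# ('step', 'a step # not a comment')
```

In this model the trailing text is not discarded: it becomes part of the step name, so the step would no longer match a step definition written for "a step".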

What to do when you encounter a LexingError

Look over the common causes. If you still can’t figure out what is wrong, you may have found a bug in the Lexer. Open an issue in the Gherkin tracker describing how to reproduce the bug, or hop onto the #cucumber IRC channel and ask for help or verification there. See the cucumber wiki on getting in touch for more details on finding help.

LexingError vs. ParseError

If you are wondering why lexing errors are so cryptic while parsing errors are much clearer (as far as errors go), the answer lies in where each kind of error is raised during typical usage. This also explains why it is so difficult to create a more helpful lexing error.

Gherkin processes all of its input in two phases: lexing and parsing. The codebase is divided accordingly into a lexer and a parser (actually several of each, but that’s an implementation detail at the moment). The lexer’s responsibility is to match an input stream against a regular language (think of a big regular expression) and “pick it apart” into a sequence of tokens that it sends to the parser. The parser’s responsibility, on the other hand, is to verify that the sequence of tokens it receives from the lexer “makes sense” in the order they are received. So the lexer only identifies valid tokens, while the parser only identifies valid sequences of tokens.

From this you can infer why the two kinds of errors differ so much. Each Gherkin parser (there are currently two) is a state machine. As such, it knows which tokens are allowable given its current state, and when it receives a token it was not expecting, it can easily report every token it would have accepted. The lexer, because it does not track state at the token level, has no such luxury. It is busy determining whether a sequence of bytes is a valid token to begin with.

As a quick example consider the following Gherkin:

Given a step
Scenario: Confused
@tag1
Feature: backwards

This can be lexed—it is composed purely of valid tokens—but it is not a valid feature: it is backwards. As such it should not be parsed, and a ParseError is raised. Now consider this:

Feature: my feature
Scenario: my scenario
Given a step
Or another step

This is “corrupt” Gherkin. “Or” is not a valid keyword in any language. It cannot even be lexed, so Gherkin does the simple thing and raises a LexingError.
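
The difference between the two examples above can be sketched in Python. This is a toy model under stated assumptions, not Gherkin's implementation: the token patterns, the `TRANSITIONS` table, and the `lex` and `parse` functions are all hypothetical and cover only a handful of English keywords. The lexer checks each line in isolation, so all it can report is the offending text; the parser is a state machine, so it can list what it expected.

```python
import re


class LexingError(Exception):
    pass


class ParseError(Exception):
    pass


# Each token is recognized on its own, with no knowledge of context.
TOKEN_PATTERNS = [
    ("feature", re.compile(r"^Feature:")),
    ("scenario", re.compile(r"^Scenario:")),
    ("step", re.compile(r"^(Given|When|Then|And|But) ")),
    ("tag", re.compile(r"^@\w+")),
]

# Parser state machine: state -> {allowed token: next state}.
TRANSITIONS = {
    "start": {"tag": "start", "feature": "feature"},
    "feature": {"tag": "feature", "scenario": "scenario"},
    "scenario": {"step": "scenario", "tag": "feature", "scenario": "scenario"},
}


def lex(text):
    """Turn text into (token name, line number) pairs, or raise LexingError."""
    tokens = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue
        for name, pattern in TOKEN_PATTERNS:
            if pattern.match(stripped):
                tokens.append((name, number))
                break
        else:
            # No pattern matched: the lexer can only quote the line.
            raise LexingError(f"Lexing error on line {number}: '{stripped}'")
    return tokens


def parse(tokens):
    """Check token order against TRANSITIONS, or raise ParseError."""
    state = "start"
    for name, number in tokens:
        allowed = TRANSITIONS[state]
        if name not in allowed:
            # The current state tells us exactly which tokens were acceptable.
            expected = ", ".join(sorted(allowed))
            raise ParseError(
                f"Parse error on line {number}: found {name}, "
                f"expected one of: {expected}"
            )
        state = allowed[name]


backwards = "Given a step\nScenario: Confused\n@tag1\nFeature: backwards"
corrupt = "Feature: my feature\nScenario: my scenario\nGiven a step\nOr another step"

for source in (backwards, corrupt):
    try:
        parse(lex(source))
    except (LexingError, ParseError) as error:
        print(type(error).__name__, "-", error)
# ParseError - Parse error on line 1: found step, expected one of: feature, tag
# LexingError - Lexing error on line 4: 'Or another step'
```

The sketch ignores free-form description lines, tables, and translations, all of which a real lexer has to handle. It is meant only to show why the parser's message can be specific while the lexer's cannot.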

If you’ve gotten this far you may as well learn about the payoff. This design allows us to define different valid subsets of the Gherkin language by implementing different parsers, which is much easier than changing the lexer to accommodate all sorts of special cases, or “faking it out” whenever you need just a small bite of Gherkin. Cucumber already takes advantage of this by using separate parsers for feature files and lists of steps.

See Wikipedia for more information: