Understanding Automatic Semicolon Insertion in JavaScript

A Busy Person's Guide to Understanding Automatic Semicolon Insertion in JavaScript.

Bradley Braithwaite on August 26, 2015
on javascript

The JavaScript specification describes a feature called Automatic Semicolon Insertion. This feature has a special place in the Awful Parts section of Douglas Crockford’s book JavaScript: The Good Parts, where he suggests that you shouldn’t use it. I generally avoid this feature. So if you shouldn’t use it, why am I writing about it? Firstly, because telling somebody they shouldn’t use a feature without explaining why is rude, and secondly because knowledge of the feature may be useful in the event of a legacy code incident!

You can read the official ECMA descriptions of the feature here (but the detail will be covered in this post):

5th Edition: 7.9 Automatic Semicolon Insertion
6th Edition: 11.9 Automatic Semicolon Insertion

NB The 6th Edition of the specification was published only in June 2015 and many browsers have yet to implement all the features. At the time of writing most of the JavaScript around conforms to the 5th Edition.

What’s the Point of Automatic Semicolon Insertion?

The principle of the feature is to provide a little leniency when evaluating the syntax of a JavaScript program by conceptually inserting missing semicolons. I say conceptually, as it could just be a case that a program parses successfully based on this rule, as opposed to actually changing the code and adding the semicolons.

From the specification:

Certain ECMAScript statements must be terminated with semicolons. Such semicolons may always appear explicitly in the source text

But it goes on to state:

For convenience, however, such semicolons may be omitted from the source text in certain situations.

Here’s a simple code example. According to the specification, these variable statements should each be terminated by a semicolon:

var foo = 1
var bar = 2
var baz = 3

These omissions are accommodated, and the code will be treated as:

var foo = 1;
var bar = 2;
var baz = 3;

This means that when typing JavaScript code the semicolons are optional… except when they’re not. Not very convenient. To use this feature correctly we need to understand the exact rules of the grammar and how the automatic insertion works, which is prone to human error! It’s much easier to debug a program with an outright syntax error, where we are given the line number of the problem, than a program that’s buggy at runtime where we can’t see where semicolons may or may not have been inserted.

What Problem Does It Cause?

There are cases when this feature can alter the intended behaviour of our program when it would have been better to simply stop us with a syntax error. The two most common issues are unintentionally returning undefined from a function or creating global variables by accident.

The most common problem caused by missing out a semicolon is with the return keyword. Consider the following code snippet:

var foo = function() {
  var bar = 'baz'
  return 
  {
    bar: bar
  }
}

console.log(foo());

This code is syntactically incorrect and would be better treated as a syntax error. The grammar rule for the return statement is as follows:

return [no LineTerminator here] Expression ;

I.e. there should be no line terminator after the return keyword. But automatic semicolon insertion accommodates this error by transforming the return statement to this:

return;
  {
    bar: bar
  }

When this code executes the output to the console is undefined and not the object. The inclusion of the semicolon has turned my erroneous return statement into a valid one, thus separating it from the value I intended to return. What follows is no longer an object literal: it is parsed as a separate, unreachable block statement containing the label bar: and the expression bar.

The next snippet also omits semicolons, but has no line terminator after the return keyword, so it works as intended:

var foo = function() {
  var bar = 'baz'
  return {
    bar: bar
  }
}
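
As a quick check of the two snippets above, the following sketch runs both versions side by side. The names brokenFoo and fixedFoo are hypothetical, chosen here only to tell the two functions apart.

```javascript
// brokenFoo and fixedFoo are illustrative names, not from the original snippets.
var brokenFoo = function () {
  var bar = 'baz'
  return
  {
    bar: bar
  }
}

var fixedFoo = function () {
  var bar = 'baz'
  return {
    bar: bar
  }
}

console.log(brokenFoo()); // undefined
console.log(fixedFoo());  // { bar: 'baz' }
```

The only difference between the two functions is the position of the opening brace relative to return.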

Here’s another example. In the following code snippet I’ve been careless and omitted a comma after the foo variable:

var foo 
    bar = 2;

console.log(foo);
console.log(bar);

Rather than blowing up with a syntax error, the semicolon is added like this:

var foo; 
    bar = 2;

console.log(foo);
console.log(bar);

This problem is a little more nuanced. The inclusion of the semicolon would result in the bar variable being declared on the global scope. This would cause an error in strict mode, but that’s a discussion for another post.
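
A minimal sketch of that difference, assuming a runtime that provides globalThis (for example a recent Node.js or browser) and that no global named bar exists yet. The Function constructor is used here only so that the same source text can be evaluated in both strict and non-strict mode; source, strictError and sloppyResult are illustrative names.

```javascript
// Source text from the example above, with a return added so the values can be inspected.
var source = 'var foo\n    bar = 2;\nreturn [foo, typeof bar];';

// Strict mode first: assigning to the undeclared bar throws.
var strictError = null;
try {
  new Function('"use strict";\n' + source)();
} catch (e) {
  strictError = e;
}
console.log(strictError instanceof ReferenceError); // true

// Non-strict mode: bar silently becomes a property of the global object.
var sloppyResult = new Function(source)();
console.log(sloppyResult);   // [ undefined, 'number' ]
console.log(globalThis.bar); // 2
```

The strict call runs first on purpose: once the non-strict call has created the global bar, a later strict assignment to it would no longer throw.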

The key point is that this feature can have some very subtle side-effects that can be hard to track in larger code bases.

How to Avoid Automatic Semicolon Insertion

A common misconception is that the Strict Variant (Strict Mode) of JavaScript suppresses this feature. It doesn’t. The only way to avoid this feature is to ensure that semicolons are always written in the correct place. The most effective way to detect missing semicolons in JavaScript is via a lint tool such as JSHint or JSLint. I generally use JSHint and have done so on most of the projects I’ve been involved with that use JavaScript extensively. This will help to detect instances of this type of problem (and others) at coding time.

For example, the following line of JavaScript should be terminated by a semicolon:

var foo = 'bar'

Using the jshint npm package, I can run a command such as the following via the command line and check for errors:

jshint semicolon.js

semicolon.js: line 1, col 16, Missing semicolon.

1 error

I also use Sublime Text 3 which has plug-ins that will show such lint errors within the editor. In the following example the yellow dot indicates the problem line, clicking on it indicates the specific ‘Missing semicolon’ error:

Sublime Text 3 JSHint

As a side-note, you can read more about my Sublime Text 3 Developer Setup.

How it Works

If you’re really in a hurry, you can finish reading here. Knowing about automatic semicolon insertion and how to avoid it will keep you safe. But, if you are curious and want to know more about how it works precisely then read on.

The following statements must be terminated by semicolons:

statements
- empty (an empty statement is nothing, and therefore would just be a ;)
- variable (var foo = ‘bar’;)
- expression (1 + 1; can also be function expressions)
- do-while
- continue
- break
- return
- throw
- debugger (6th edition only)
declarations (6th edition only)
- let
- const
- import
- export

Before we dive into the specific rules, let’s review the structure of a JavaScript program at a high level. A JavaScript program that parses correctly is made up of smaller statements that must match its grammar rules. Generally speaking, each statement that makes up a program is separated by semicolons, so that when a sequence of tokens that make up a JavaScript program are read from left to right, it’s easier to determine the end of a statement and the start of the next by the semicolons.

To muddy the water a little, there are some statements in the grammar that are not terminated by semicolons, e.g. the if, while and for statements. But we can still follow these with a semicolon, since the trailing semicolon would be handled as an empty statement.

In code, that means:

if (true) {
  return 1;
}; // <-- this semicolon is not part of the IF statement grammar rule

There are three basic rules of semicolon insertion:

Rule #1

When reading the tokens of a program from left to right, a token that doesn’t match the grammar rule has a semicolon inserted before it if either of the following two conditions is met:

The error token is separated from the previous by at least one LineTerminator
The error token is }

Here’s an example. Where do you think the semicolon would be added for this snippet?

var a 
    b = 
    3;

The answer is:

var a;
    b = 
    3;

The semicolon is inserted at the end of the first line, since the grammar rule didn’t match (i.e. it didn’t expect to encounter b immediately after a) and there was a new line between the identifiers a and b. Note that it was ok to have the new line between = and 3 since this is allowed by the grammar rule.

To show how the second condition works, we have the following code:

var foo = 0;

if (true) { do { foo++; } while ( foo < 5 ) }

console.log(foo);

The grammar for the do statement is:

do Statement while ( Expression );

NB it must be terminated by a semicolon. In this example, after the ( foo < 5 ) a right brace is encountered and not a semicolon, so the second condition holds (the error token is }). This rule follows the assumption that we’ve finished a block of code but have likely forgotten the semicolon.

Our code will be changed to this, accommodating our mistake in forgetting the semicolon:

if (true) { do { foo++; } while ( foo < 5 ); }
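
Both conditions of Rule #1 can be probed without running any code. The sketch below assumes that handing a string to the Function constructor is an acceptable stand-in for parsing it as a program; parses is a hypothetical helper written for this illustration.

```javascript
// parses() is a hypothetical helper: it reports whether a string is accepted
// as a function body, without executing it.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) {
      return false;
    }
    throw e;
  }
}

// No line terminator and no } before the error token: nothing is inserted.
console.log(parses('var a b = 3;'));           // false
// Condition 1: the error token b is on a new line.
console.log(parses('var a\n    b =\n    3;')); // true
// Condition 2: the error token is }.
console.log(parses('{ foo++ }'));              // true
// Same block, but the error token is bar on the same line, so the rule does not apply.
console.log(parses('{ foo++ bar++ }'));        // false
```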

Here is this rule visually:

Rule 1

Rule #2

If the end of the input is reached and the tokens read so far do not yet form a complete program (i.e. there were no outright errors that would have stopped parsing before the last token), a semicolon is appended to the end of the input.

Following on from the previous example, this time we haven’t wrapped the code in a { block }:

var foo = 0;
do { foo++; console.log(foo) } while ( foo < 5 )

This is technically not a valid program, since the trailing semicolon is missing, but with the automatic insertion the program still executes since it becomes:

var foo = 0;
do { foo++; console.log(foo) } while ( foo < 5 );

NB the trailing semicolon means that this now matches the production for the do-while iteration statement.
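
A small sketch of the same idea, assuming the engine parses the string given to the Function constructor as a complete function body, so that the end of the string plays the role of the end of the input. countToFive is an illustrative name, and the return statement is added only to make the result observable.

```javascript
// The source ends right after `foo`, with no semicolon and no trailing newline.
var countToFive = new Function(
  'var foo = 0;\n' +
  'do { foo++; } while ( foo < 5 )\n' +
  'return foo'
);

console.log(countToFive()); // 5
```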

Here is this rule visually:

Rule 2

Rule #3

If you’ve read the specification, this is the most difficult rule to read. What it means is that for the following grammar rules, which do not allow line terminators where indicated, the parser will try to save you when it does encounter a line terminator by inserting a semicolon before it:

PostfixExpression :
LeftHandSideExpression [no LineTerminator here] ++
LeftHandSideExpression [no LineTerminator here] --

ContinueStatement :
continue [no LineTerminator here] Identifier ;

BreakStatement :
break [no LineTerminator here] Identifier ;

ReturnStatement :
return [no LineTerminator here] Expression ;

ThrowStatement :
throw [no LineTerminator here] Expression ;

/* ES6 Only */

ArrowFunction :
ArrowParameters [no LineTerminator here] => ConciseBody

YieldExpression :
yield [no LineTerminator here] * AssignmentExpression
yield [no LineTerminator here] AssignmentExpression

As an example, consider the grammar rule for the return statement. This states that there should be no new line after the return keyword:

return [no LineTerminator here] Expression ;

In this example, we have a return keyword which is followed by a new line before we see the expression (in this case it’s a number expression).

  return 
  1

So the semicolon is automatically inserted. E.g. the executed code becomes:

  return;
  1

This is similar to what we saw in Rule #1, but these are special cases for these restricted grammar rules that explicitly state that the line terminator is the error token.

This means that if you do wish to use semicolons optionally in your code, you must keep the expression to be used with the below keywords on the same line:

continue
break
return
throw
yield
postfix ++ or --
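
Two of the restricted productions listed above behave in ways that are easy to check. This is a sketch only; the variable names a, b and throwError are made up for the illustration, and the Function constructor is used just to test whether a string parses.

```javascript
var a = 1;
var b = 1;

// Read as `a; ++b;` because ++ may not follow its operand across a line break.
a
++
b

console.log(a, b); // 1 2

// throw followed by a line break becomes `throw;`, which is itself a syntax error.
var throwError = null;
try {
  new Function('throw\nnew Error("boom")');
} catch (e) {
  throwError = e;
}
console.log(throwError instanceof SyntaxError); // true
```

In the first case the inserted semicolon silently changes which variable is incremented; in the second it at least fails loudly.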

Here is this rule visually:

Rule 3

An Overriding Rule

However, there is an additional overriding condition on the preceding rules. A semicolon is never inserted automatically if either of the following would be true:

The added semicolon would become one of the two semicolons in the header of a for statement.

The added semicolon would be parsed as an empty statement.

The first case is straightforward: it means that a semicolon will never be automatically inserted between the parentheses of a for statement:

for ( Expression ; Expression ; Expression ) Statement

The second is a little more complex. It’s best illustrated via the following example snippet:

for (var i = 0;
     i < 10;
     i++)

Here, the last token would be reached but it would still not make up a complete program (since the grammar rule of the for statement would not be matched) and Rule #2 would apply. However a trailing semicolon cannot be added in this instance, since doing so would produce this:

for (var i = 0;
     i < 10;
     i++);

If you were to type out this code, it would be valid. But the trailing semicolon in this code snippet is an empty statement. To recap, the syntax ;;; is valid JavaScript and equates to three empty statements. Therefore semicolon insertion would not be valid in this example, since the automatically added semicolon would be parsed as an empty statement.
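
Both overriding conditions can be checked with a parse-only test. As before, this is a sketch that assumes the Function constructor is a fair stand-in for parsing a program; isSyntaxError is a hypothetical helper.

```javascript
// isSyntaxError() is a hypothetical helper: true if the parser rejects the source.
function isSyntaxError(source) {
  try {
    new Function(source);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}

// Header semicolons are never inserted, even across line breaks.
console.log(isSyntaxError('for (var i = 0\n i < 10\n i++) {}')); // true
// The missing loop body is not repaired with an inserted empty statement.
console.log(isSyntaxError('for (var i = 0;\n i < 10;\n i++)'));  // true
// Typing the empty statement explicitly is fine.
console.log(isSyntaxError('for (var i = 0;\n i < 10;\n i++);')); // false
```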

Here is this rule visually:

Rule 3