: an attempt to make computer machines run better

Javascript: worst language ever? A totally objective analysis

To be fair, I hate a lot of programming languages. However, after working in a variety of different languages, I think my hatred is justified. I also think I can convince you if you read this article.

Javascript has been around for a long time, it's one of today's most popular languages, and if you develop anything for the internet, it's impossible to avoid using some Javascript. However, we would all be better off without it.

For some reason, a lot of really bad decisions have been made in web standards. HTML was the first. When most of the internet was made up of incredibly slow dial-up connections, bandwidth should have been treated as a precious resource. Instead, we created HTML, based on SGML: a ridiculously verbose language intended to be transmitted in uncompressed text form, taking up as much bandwidth as possible. On top of that, HTML was at the time probably the most difficult language for a computer to parse, which was made even worse by the loose standard; parsers had to handle not only valid HTML, but invalid HTML as well. It was a worst-case scenario for both bandwidth and CPU, all served over the text-based HTTP protocol. However, it caught on.

Then along came Flash, Java, and Javascript. I don't need to say anything about Flash, and Java is an entirely different subject. But Javascript, or ECMAScript as the standard calls it, is what I want to talk about. It's quite possibly the biggest problem with modern computing, maybe even worse than systemd.

To make my case, I want to first talk about how computers do math. Addition and subtraction are easy: each number is stored in binary, with a high voltage representing a 1 and a low voltage representing a 0. Then, to add two numbers together, there's a little circuit made up of two XOR gates, two AND gates, and one OR gate, called a full adder. It takes three input signals (one bit from each input number, plus the carry bit from the previous bit's addition) and produces one bit of the sum and one carry bit. To add two 64 bit numbers together, all you need is 64 full adders chained together by their carry inputs and outputs. This might sound like a lot, but CPUs really do have this circuitry in them, and it takes up a very small amount of space on the chip. Also note that this operation doesn't have any clock signal input, so a grand total of 0 clock cycles are required to complete it. It's entirely combinational, making it incredibly fast.
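The adder described above can be sketched in, ironically, Javascript. This is purely an illustration of the logic (`fullAdder` and `rippleAdd` are made-up names, and BigInt stands in for the 64-bit wires):

```javascript
// One full adder: sum = a XOR b XOR carry-in,
// carry-out = (a AND b) OR (carry-in AND (a XOR b)).
function fullAdder(a, b, cin) {
  const sum = a ^ b ^ cin;
  const cout = (a & b) | (cin & (a ^ b));
  return [sum, cout];
}

// 64 full adders chained by their carry bits: a ripple-carry adder.
// BigInt is used here only to model 64-bit values.
function rippleAdd(x, y) {
  let result = 0n;
  let carry = 0;
  for (let i = 0n; i < 64n; i++) {
    const [s, c] = fullAdder(Number((x >> i) & 1n), Number((y >> i) & 1n), carry);
    result |= BigInt(s) << i;
    carry = c;
  }
  return result;
}

console.log(rippleAdd(25n, 17n)); // 42n
```

In actual hardware, of course, all 64 adders settle simultaneously as the voltages propagate; the loop is only a software stand-in for that wiring.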
For negative numbers, computers use the two's complement of a positive number to represent its negative. To calculate a two's complement, simply flip all the bits of the number, then add 1 to the result. Computers also use the highest bit to signify the sign of a number: 0 for positive and 1 for negative. The two's complement operation can also be implemented with combinational logic. Two's complement negatives are important because they allow the same addition hardware to work with negative numbers as well as positive ones. This also allows subtraction to be implemented using the same hardware as addition, although dedicated combinational subtraction is also possible.
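Conveniently, Javascript's own bitwise operators (which, as we'll see later, operate on 32-bit integers) can demonstrate the flip-and-add-one rule. A quick sketch (`negate` is an illustrative name, not an API):

```javascript
// Two's complement negation: flip all the bits, then add 1.
// The `| 0` forces the result to be read as a signed 32-bit integer.
function negate(x) {
  return (~x + 1) | 0;
}

console.log(negate(5));  // -5
console.log(negate(-7)); // 7
console.log(negate(0));  // 0
```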
These simple operations and others like them handle a lot of what computers need to do, so it's extremely important that they be implemented efficiently. They make our computers fast, cheap, and power efficient.
Not all operations are implemented entirely in combinational logic though. Multiplication and division are much more complicated, and require clock cycles to complete. This is especially true for division, which is consistently the slowest operation across most CPUs by a significant margin.

So how does this prove that Javascript is a bad language? Where could I possibly be going with this?
Well, all the operations I have described were for integer operations, which are the vast majority of operations performed by almost all software, from operating systems to text editors to image editors and even web browsers. Pretty much everything except Javascript.
Not many people know this, including veteran Javascript developers, but Javascript doesn't actually have an integer data type; there is only "number".
Now, you might think that's fine because the Javascript interpreter is smart enough to figure out when it's OK to use integers behind the scenes. However, you would be wrong.

The ECMAScript standard (Javascript standard) actually requires that the number type be implemented using IEEE double-precision floating point numbers. That means that the fastest, most optimized, most common, most important operations that a CPU can do are simply not available in Javascript...
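You can see the consequences in any Javascript console; these are standard IEEE 754 artifacts, not implementation bugs:

```javascript
// Classic floating point rounding, even though no "float" was requested:
console.log(0.1 + 0.2);               // 0.30000000000000004
console.log(0.1 + 0.2 === 0.3);       // false

// Integers are only represented exactly up to 2^53:
console.log(Number.MAX_SAFE_INTEGER); // 9007199254740991
console.log(2 ** 53 + 1 === 2 ** 53); // true: the +1 is silently lost
```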
Now let's look briefly at how IEEE double-precision floating point numbers work to get better insight into how Javascript does things. For brevity, from now on I will use the term "double" (C's data type for the same thing) to refer to IEEE double-precision floating point numbers.
Doubles are stored using 64 bits, broken up into 3 parts. In order, they are the sign bit, 11 bits representing the biased exponent, and 52 bits representing the low 52 bits of the 53-bit significand (also known as the mantissa). The high bit of the significand is implicit: always 1 in normal numbers, and an implied 0 for zero and all "subnormal" numbers. The biased exponent is an 11-bit integer with a bias of 1023 (subtract 1023 to get the true exponent). So to find the value of a double, all you need to do is apply the following formula:
pow(-1, signbit) * (1 + significand * pow(2, -52)) * pow(2, exponent - 1023)
Simple right?
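If you want to see the three fields for yourself, Javascript's typed arrays can pull a double apart. A sketch (`decodeDouble` is a made-up helper, not a standard API):

```javascript
// Split a double into sign, biased exponent, and the 52 stored
// significand bits, using a DataView over its raw 64 bits.
function decodeDouble(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x); // big-endian by default
  const hi = view.getUint32(0); // sign, exponent, top 20 significand bits
  const lo = view.getUint32(4); // low 32 significand bits
  return {
    sign: hi >>> 31,
    exponent: (hi >>> 20) & 0x7ff, // 11-bit biased exponent
    significand: (BigInt(hi & 0xfffff) << 32n) | BigInt(lo),
  };
}

console.log(decodeDouble(1.0)); // { sign: 0, exponent: 1023, significand: 0n }
console.log(decodeDouble(-2));  // { sign: 1, exponent: 1024, significand: 0n }
```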

This means that converting back and forth between integers and doubles requires a significant amount of computation, involving multiplication and division. Some of that can be optimized away with bit shifts (another combinational logic operation), but not all of it. On top of that, actually doing math with floating point numbers becomes significantly more difficult. Addition and subtraction can only be done on numbers that have the same exponent, and then the result has to be normalized. This rules out the exclusive use of fast combinational logic. Multiplication and division are similarly much more complicated for doubles. What all this means is that using doubles is significantly slower than using integral types.

Now the unfortunate argument that I always hear is that speed doesn't really matter, computers are fast enough. This, of course, goes against everything this site and I personally stand for, but with Javascript, there is an excess of evidence that speed and efficiency are important. Just think about how much better everything would be if your computer didn't slow to a crawl when browsing Javascript-heavy websites.

Having no access to integer data types doesn't prevent the Javascript language from including a full complement of bitwise operators. The question is: what do they do? Bitwise operations rarely make sense for floating point numbers, and most architectures don't implement bitwise operations in the floating point unit for this reason, so why does Javascript implement them? The answer is that it pretends the number is an integer. Whether it converts to an integer or emulates integer operations, I don't know; it probably depends on the implementation. Bitwise operations are used for a variety of reasons, but a common one is their speed. Left and right bitshifts allow a programmer to do some kinds of multiplication and division (on integers) without using the expensive multiplication and division operations; bitshifts bring them into the combinational logic domain. However, in Javascript, this also involves converting from double to integer, doing the operation, then converting back, completely negating the performance advantage. Other bitwise operations carry the same performance penalty in Javascript, while the same operations in other languages would likely increase performance. On top of this, bitwise operations treat Javascript numbers as 32 bit integers, but a normal Javascript double has 53 bits of significand, meaning these operations can truncate. Bitwise operations just make no sense in Javascript, which is probably why Javascript developers have really silly ideas about what they are for and what they do. I have seen bitwise OR used to convert strings to numbers.
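A quick demonstration of both problems: the silent 32-bit truncation, and the string-conversion abuse:

```javascript
// Bitwise operators convert the double to a 32-bit integer first,
// silently dropping everything above bit 31:
console.log((2 ** 40 + 5) | 0); // 5

// The "fast multiply" idiom, which in Javascript costs conversions:
console.log(5 << 1); // 10

// The string-to-number trick mentioned above: `| 0` forces numeric
// conversion, so some developers use it in place of parseInt.
console.log("42" | 0); // 42
```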

If you still aren't on my side... then you are either crazy, or in denial, or maybe just very stubborn. However, I'm not done yet, performance isn't my only argument. Next, let's talk about the language itself.
The most commonly cited "benefit" of Javascript is that it is "easy for beginners". However, I have noticed that this claim seems to be made for almost all programming languages. In fact, I assert that this is the case for C as well. In reality, this can be said of any programming language, because programming just isn't very difficult. Anybody can pick up any programming language and start doing things with it in a short time with just a little bit of effort. I would assert that C is actually the easiest language to start with, because the language itself is so small and its rules so consistent. Javascript, on the other hand, is significantly bigger, and comes with built-in functions, objects, and methods.
What people really mean when they claim Javascript is easy is that Javascript is very forgiving. It tends to keep working even when there are mistakes and errors in the code. This is not a good thing, especially for beginners. What it actually does is hinder debugging and promote sloppy, lazy code. The best example of this is Javascript's infamous Automatic Semicolon Insertion (ASI). The ECMAScript standard states that semicolons are required for terminating statements in Javascript. However, if the Javascript parser encounters a situation that would result in a parse error, it must attempt to resolve the situation by inserting a semicolon. Parsing only fails if the source still can't be parsed after adding semicolons. Sounds insane, right? But read the ECMAScript standard on ASI (section 11.9) for the most bizarre paragraph you will ever read in formal documentation, and maybe the most bizarre in your entire life:

Most ECMAScript statements and declarations must be terminated with a semicolon. Such semicolons may always appear explicitly in the source text. For convenience, however, such semicolons may be omitted from the source text in certain situations.

This makes parsing Javascript a nightmare, and it also results in some kinds of common errors being hidden. To compound the problem, Javascript's grammar has some very strange symbols and production rules in it. For instance, Javascript actually has two different terminal symbols for whitespace: one that includes a newline, and one that doesn't. The whitespace-no-newline terminal is used in 3 statements, most notably the return statement, coming between the return keyword and its operand. I have made this mistake myself: putting the return value on the line below the return keyword can sometimes make code easier to read, but in Javascript it causes a parse error. Javascript will never tell you that you made this mistake, though, because ASI resolves the parse error for you by placing a semicolon after return, so the operand is never evaluated.
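Here's the bug in miniature; ASI turns the return into `return;` and the intended value into unreachable code:

```javascript
function brokenReturn() {
  return            // ASI inserts a semicolon right here
    { answer: 42 }; // parsed as an unreachable block, never returned
}

console.log(brokenReturn()); // undefined, with no error or warning
```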

Now a sane programmer (e.g. a non-Javascript programmer) might expect this type of error hiding to be a non-issue, because an error would be thrown immediately by anything attempting to use the return value of a function containing this type of mistake. However, once again Javascript has a solution to hide this kind of bug from the developer. See, the return value is undefined, and you can't use an undefined value... right...? Wrong. Javascript has defined undefined values. Your function returning nothing will actually return a special value called undefined. You can do almost anything with an undefined value without causing Javascript to even generate a warning. Your code will not work correctly, but good luck figuring out where the bug is. Debugging Javascript is impossible.
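To see how far an undefined value can travel without a single complaint:

```javascript
let x; // declared but never assigned: its value is undefined

console.log(x + 1);           // NaN, no error
console.log(x === undefined); // true: undefined is a real value
console.log(String(x));       // "undefined"

const obj = {};
console.log(obj.missing);     // undefined: absent properties read fine
```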

At this point, I think you have to concede the following:
1. Given Javascript's lack of integers and resultant performance limitations, it's a very poor choice for skilled developers seeking to write high-performance code.
2. Given Javascript's strange parsing, grammar production rules, and error hiding behavior, it's a very poor choice for beginners.

However, I know that some people will still argue that Javascript is somehow the right language choice, despite all the issues I have highlighted so far. Many developers will likely argue that they fall into another category of programmer that I haven't covered: a skilled programmer that doesn't care about performance. I assert that such programmers are misguided; disregarding performance is disregarding the users of your software, and is likely to cause expensive issues in the future when your software is brought to scale, but that's another debate. Believe it or not, I still have more issues with Javascript to talk about that affect these types of developers.

Javascript also has issues with basic programming concepts. For instance, once a variable is declared, one expects that variable to exist in a specific area of the code defined by where it was declared. This is called scope, and it allows variable names to be reused in different places without conflict. This is important for a lot of reasons, but mainly it's important because it helps programmers effectively manage large numbers of variables. It's a very important concept. The problem, of course, is that Javascript does it wrong.

I tend to think of scope as a property of a variable, and as far as I can tell from experience and research, my understanding of scope is technically correct. However, Javascript developers seem to invariably think of variables as being contained within a scope, which exists regardless of the existence of variables. There are reasons for this: variables in Javascript can only have either a scope spanning the entire script, or a scope spanning the function in which they are declared. This is contrary to most sane languages, where scope begins at declaration and ends at the end of the compound statement in which the variable was declared. This distinction seems minor, but consider the difference between a compound statement and a function: if statements, while loops, and for loops are all usually followed by compound statements. Also, consider that Javascript variables' scopes span entire functions or the entire script, not beginning at their declaration as they would in a sane language. This means that variables can exist and be used before they are declared. Note that this is not the same as implicit declaration, which is also a part of Javascript; we will get to that soon. This behavior is known as hoisting, and to my knowledge, no other language does this. I think it tends to lead Javascript developers to the wrong idea about what scope means, as does the fact that the ECMAScript standard seems to imply this definition of scope as well, although it's hard to tell because the ECMAScript standard does not do a great job of defining scope (see section 8.1), and sometimes uses very bizarre language (see section 11.9). Why does this matter? For small programs it doesn't matter much, but it can cause serious complications in large, complex projects, which seem to be Javascript developers' favorite type of project.
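Hoisting in action; the variable exists (with the value undefined) before the line that appears to create it:

```javascript
function hoistingDemo() {
  const before = x; // no ReferenceError: `var x` below is hoisted up here
  var x = 5;
  return [before, x];
}

console.log(hoistingDemo()); // [ undefined, 5 ]
```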

Now let's get back to implicit declarations. Some other languages do this too, even C, but in C it only applies to functions, and since scope makes sense in C, and the actual linking of functions is done by the linker, everything still makes sense. In Javascript, though, assigning to a name you never declared implicitly creates a variable, with no warnings or errors. This means that once again, Javascript does a good job of hiding errors from you, hindering debugging. What a wonderful language for beginners. Typos in variable names are not considered an error in Javascript. On top of this, implicitly declared variables get global scope, further increasing the chance of hard-to-debug issues in the future.

Finally, let's talk about strings. I have already demonstrated that Javascript is terrible with numbers and math, and terrible with basic language concepts; maybe, since it was made for the web, it's good with strings.
Well, not quite. The most common [natural] languages used on the internet are English, Russian, German, Japanese, Spanish, French, Chinese, Portuguese, and Italian. All except Russian, Japanese, and Chinese use a Latin-based alphabet. The most common character encoding on the web is UTF-8 (by an enormous margin: 93.1% UTF-8 vs 3.4% for the next most common encoding, ISO-8859-1). This makes sense, since most Latin-based characters fit into a single byte of UTF-8, and all other Latin-based characters fit in two bytes. Cyrillic also fits in two bytes of UTF-8, which covers Russian as well. UTF-8 is an excellent choice for most text. There are plenty of other good reasons for UTF-8 as well, like avoiding endianness issues; it's the best general-purpose text encoding we have.

So what does Javascript do? Well, it uses UTF-16 for all strings. This is not a case of poor implementations; just like IEEE floating point, it's actually mandated by the ECMAScript standard (sections 4.2.17 and 11.8.4). Javascript follows the example of Microsoft, using an inefficient encoding for everything. Even worse, Javascript developers don't seem to be aware that strings are stored in UTF-16; I know developers who have insisted that UTF-8 is the encoding in use.
UTF-16 doesn't even have the advantage of being a fixed-width encoding: characters above U+FFFF get encoded as surrogate pairs, taking four bytes per character.
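You can watch the UTF-16 seams in any string that leaves the Basic Multilingual Plane:

```javascript
const s = "\u{1F600}"; // a single emoji, code point U+1F600

console.log(s.length);      // 2: length counts UTF-16 code units
console.log([...s].length); // 1: iteration walks code points
console.log(s.charCodeAt(0).toString(16)); // "d83d", half a surrogate pair
```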
However, this doesn't mean that UTF-8 would have been the right choice, UTF-8 becomes less efficient for most Asian languages, which is why sane languages don't mandate any specific encoding, but instead leave the decision up to the developer. Even PHP (which is not a sane language) leaves this decision up to the developer.

There is still a lot wrong with Javascript that I haven't covered. There are weird type coercion rules (1 + "1" === "11"), the lack of real arrays (array objects are not arrays), and of course trying to figure out what object "this" is... What I have elaborated on, however, should be more than enough to convince almost everyone. To reiterate what I have written:

1. Using IEEE double-precision floating point for all numbers was a poor decision for any application seeking high performance or targeting resource-constrained systems

2. IEEE double precision floating point numbers make no semantic sense with bitwise operators

3. Automatic Semicolon Insertion results in subtle bugs being hidden from the developer

4. Javascript defines undefined values, causing further debugging issues

5. Javascript's error hiding behavior makes it a poor choice for beginners

6. Javascript's strange parser production rules make little sense

7. Javascript defines scope incorrectly, and its scope rules are terrible and hinder development of complicated software

8. Variable hoisting and implicit (and global) variable declarations lead to hard-to-debug issues

9. Javascript forces the use of UTF-16 for all strings, making it inefficient for the vast majority of uses

Given this evidence, I think I can finally and safely answer this article's titular question:
Is Javascript the worst language ever?