A Glance Into Web Tech.-Based Text Editors' Text Management

It's been nearly two years since I wrote "A brief glance at various text editors", which was fun to write. A lot of people wanted to see how modern web-based editors manage their text, and I've finally found the time and motivation to write another article about these wonderful tools. This article covers only source code editors, not rich-text web-based editors.

As with the last article, this is in no way 100% accurate. There isn't as much source this time, because the majority of these editors use plain JavaScript arrays... I've left out the "physically, virtually, and space efficient" triangle graphics. Sure, they were nice to look at, but they offered little and were a lot of work. To make up for it, I have a little surprise at the end that you hardcore text editor fans will love. Corrections normally follow when writing this type of material. If you have any suggestions, I'll do my best to find you in the various Reddit and Hacker News threads that sprout from this...or I'll reply to your comment here right away! Anyone can comment. No registration required.

Alright, grab a drink, sit back, and read on!

Snazzy

GitHub's Atom

Atom is infamous for being slow. Since other JavaScript-based editors like Visual Studio Code manage to be fast, there is no excuse for this bad experience (and we'll take a look at VSC soon too). I began by going to Atom's GitHub repository page and searching the src/ directory for anything to do with a buffer or text editor component. One thing that's apparent right away is the amount of abstraction and decoupling. Atom truly does lend itself to being very malleable. After 10 minutes I realized it was importing a "text-buffer" package that it downloads from npm. It's still written by the Atom developers; they've just separated the buffer out so it's easier to test and maintain. Personally, I like how this further demonstrates their excellent separation of concerns.

All text insertion, whatever event triggers it, is "funneled" through a single function called setTextInRange. In theory, the asynchronous event system should keep Atom responsive at all times. The real meat is when applyChange is called. Below is a copy of the code:



I'm not sure whether Atom buffers key presses. In the worst case, this roughly 80-line function is called every time you press a key. In reality, though, I think key presses are buffered and the routine runs once the event system gets time to handle the event.

Every time text is inserted, it's inserted as one "chunk" and then split up at its line endings using a regular expression. Personally, I think this is overkill, but it certainly keeps Atom easy to modify. I imagine the same thought is running through a few readers' heads. It pushes all the new lines onto a stack (or, more precisely, a regular JavaScript array). Already I don't want to find myself opening a large file. It then uses "spliceArray" to replace a range of lines.
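The split-then-splice insert path described above can be sketched roughly as follows. This is a minimal illustration written from the description, not Atom's code: `LineArrayBuffer` and `replaceRange` are hypothetical names, and positions are assumed to be `{row, column}` pairs with an exclusive end.

```javascript
// Hypothetical sketch of a line-array buffer in the spirit described above;
// not Atom's actual text-buffer code.
class LineArrayBuffer {
  constructor(text = "") {
    // One array entry per line, split on \r\n, \n, or \r.
    this.lines = text.split(/\r\n|\n|\r/);
  }

  getText() {
    return this.lines.join("\n");
  }

  // Replace the text from start to end ({row, column}, end exclusive).
  replaceRange(start, end, newText) {
    const prefix = this.lines[start.row].slice(0, start.column);
    const suffix = this.lines[end.row].slice(end.column);
    // Split the inserted chunk into lines with a regular expression.
    const newLines = newText.split(/\r\n|\n|\r/);
    // Re-attach the untouched parts of the first and last affected lines.
    newLines[0] = prefix + newLines[0];
    newLines[newLines.length - 1] += suffix;
    // One splice replaces the affected rows, but every later entry in the
    // array may have to shift, so the cost grows with the file's line count.
    this.lines.splice(start.row, end.row - start.row + 1, ...newLines);
  }
}

const buf = new LineArrayBuffer("hello\nworld");
buf.replaceRange({ row: 0, column: 5 }, { row: 1, column: 0 }, ", big\nwide ");
// buf.lines is now ["hello, big", "wide world"]
```

Nothing here is expensive for a small file; the concern is that a single keystroke near the top of a huge file still touches an array with one entry per line.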

So what is the actual data structure of the great Atom text buffer?...

@lines = [''];

A regular JavaScript array. Ooof.

Atom will have no problem growing. Its speed could definitely be on par with Visual Studio Code's if someone truly wanted to make it happen. Currently, it looks like the Atom developers are relying on the built-in arrays for performance. There is a huge opportunity here to contribute to Atom and make it way better than it already is. Write a piece chain text buffer implementation. Write a gap buffer implementation. Anything but a regular array like this.
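A gap buffer, one of the alternatives suggested above, keeps the whole text in one array with an empty "gap" at the cursor: typing at the cursor just fills the gap, and only moving the cursor elsewhere costs a copy proportional to the distance moved. Here is a minimal sketch, assuming a single cursor and one character per array slot for readability (a real implementation would more likely use a typed array). `GapBuffer` and its methods are hypothetical names, not code from any of these editors.

```javascript
// Minimal gap buffer sketch with a single cursor. Illustrative only.
class GapBuffer {
  constructor(text = "", capacity = 16) {
    const size = Math.max(1, capacity, text.length * 2);
    this.buf = new Array(size);
    for (let i = 0; i < text.length; i++) this.buf[i] = text[i];
    // The gap occupies buf[gapStart .. gapEnd - 1]; the cursor sits at gapStart.
    this.gapStart = text.length;
    this.gapEnd = size;
  }

  get length() {
    return this.buf.length - (this.gapEnd - this.gapStart);
  }

  // Move the gap to position pos (0 <= pos <= length). Cost is proportional
  // to the distance moved, not to the size of the document.
  moveTo(pos) {
    while (this.gapStart > pos) this.buf[--this.gapEnd] = this.buf[--this.gapStart];
    while (this.gapStart < pos) this.buf[this.gapStart++] = this.buf[this.gapEnd++];
  }

  // Double the capacity, keeping text before the gap at the front and text
  // after the gap at the back.
  grow() {
    const tail = this.buf.slice(this.gapEnd);
    const next = new Array(this.buf.length * 2);
    for (let i = 0; i < this.gapStart; i++) next[i] = this.buf[i];
    const newGapEnd = next.length - tail.length;
    for (let i = 0; i < tail.length; i++) next[newGapEnd + i] = tail[i];
    this.buf = next;
    this.gapEnd = newGapEnd;
  }

  insert(pos, text) {
    this.moveTo(pos);
    for (let i = 0; i < text.length; i++) {
      if (this.gapStart === this.gapEnd) this.grow();
      this.buf[this.gapStart++] = text[i];
    }
  }

  // Deleting right after the cursor just widens the gap; nothing is copied.
  delete(pos, count) {
    this.moveTo(pos);
    this.gapEnd = Math.min(this.gapEnd + count, this.buf.length);
  }

  toString() {
    return this.buf.slice(0, this.gapStart).join("") + this.buf.slice(this.gapEnd).join("");
  }
}

const gb = new GapBuffer("helo world");
gb.insert(3, "l"); // the gap moves to offset 3, then the insert is a single write
gb.delete(5, 6);   // removes " world" by widening the gap
// gb.toString() is now "hello"
```

The trade-off is the opposite of the line array: edits clustered around the cursor are cheap no matter how big the file is, while jumping around the document pays for moving the gap.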


Oh this is snazzy too

Microsoft's Visual Studio Code

Visual Studio Code is the new player on the block. With its 1.0 release, Visual Studio Code has reached a stable API, allowing its developers to devote their efforts to new plugins. It's a direct competitor to GitHub's Atom. One major advantage VSC has over Atom is responsiveness: in my experience, and the experience of many others, VSC is just plain faster than Atom overall. I'm not sure about resource usage, but I've heard stories of Atom eating 8 GB of RAM and none of VSC doing the same.

It was very difficult to find out how VSC manages its text. There are several applyEdits methods peppered all over the source. Instead of copying the code here, I'll link to these applyEdits methods for you to read and interpret yourself.

https://github.com/Microsoft/vscode/blob/2f76c44632b0d47ba97f66fbc158c763628e30b3/build/monaco/api.js#L178

https://github.com/Microsoft/vscode/blob/04f5c82604307d7928656a2a3fccaf321a6c4f3f/src/vs/editor/common/model/editableTextModel.ts#L259

https://github.com/Microsoft/vscode/blob/7f438ed3c0cbdf2caa98bd89d44d37c300573a99/src/vs/editor/common/model/modelLine.ts

If we look at api.js, it appears VSC doesn't do anything fancy either. It uses plain JavaScript arrays and calls .splice() to insert new text. One difference is that VSC appears to buffer many edits. But again, like Atom, it is event-based and uses a very, very similar architecture. The code looks less organized and more complex. I guess VSC uses these different text models when appropriate, and it pays off overall. Other than that, I have no idea. Maybe that one difference is all it takes compared to Atom's simple approach. I still think VSC would choke on large files just like Atom would.
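Exactly how VSC applies its buffered edits isn't spelled out here, so the following is only a hypothetical sketch of the general idea of batching: collect a set of non-overlapping edits whose positions all refer to the text before the batch, then apply them bottom-up with .splice() so that no applied edit shifts the position of one still waiting. `applyEditBatch` and the `{start, end, text}` edit shape are illustrative assumptions, not VS Code's API.

```javascript
// Hypothetical sketch: apply a batch of non-overlapping edits to an array of
// lines in one pass. All positions refer to the text BEFORE the batch.
// Not VS Code's implementation.
function applyEditBatch(lines, edits) {
  // Apply from the bottom of the document upward, so an applied edit never
  // shifts the rows or columns of an edit that is still waiting.
  const sorted = [...edits].sort(
    (a, b) => b.start.row - a.start.row || b.start.column - a.start.column
  );
  for (const { start, end, text } of sorted) {
    const prefix = lines[start.row].slice(0, start.column);
    const suffix = lines[end.row].slice(end.column);
    const newLines = text.split(/\r\n|\n|\r/);
    newLines[0] = prefix + newLines[0];
    newLines[newLines.length - 1] += suffix;
    lines.splice(start.row, end.row - start.row + 1, ...newLines);
  }
  return lines;
}

const doc = ["let a = 1;", "let b = 2;"];
applyEditBatch(doc, [
  { start: { row: 0, column: 10 }, end: { row: 0, column: 10 }, text: "\n// added" },
  { start: { row: 0, column: 4 }, end: { row: 0, column: 5 }, text: "x" },
  { start: { row: 1, column: 4 }, end: { row: 1, column: 5 }, text: "y" },
]);
// doc is now ["let x = 1;", "// added", "let y = 2;"]
```

Applied naively in the order given, the first edit would add a line and push "let b = 2;" down to row 2, so the third edit would land on the wrong line; sorting the batch avoids that without recomputing any positions.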

"Arrays" could've been a good name

Adobe's Brackets

CodeMirror is another competitor in the web-based editor arena. CodeMirror 2 is a rewrite of the original CodeMirror, and Adobe's Brackets is built on top of it, so you may also know this editor as "Brackets". It's also used by the Codeanywhere service. There is already an excellent write-up about it. Both versions use binary trees where the nodes are lines, so not quite rope data structures. Finally, a web editor that steps up its text management game! The only issue I have with this editor is that the source code is a single 9000-line file. Here is the first version's insert code walking through the tree and inserting a new leaf. It's cool to see that some businesses have recognized CodeMirror's efficiency and power.
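The appeal of a tree of lines is that finding line n means walking down a few nodes instead of scanning, and an insert only splices one small array. Below is a rough sketch of that idea, assuming leaves that hold small chunks of lines and branch nodes that cache their line counts. It is written from the general idea rather than from CodeMirror's source; the names (`LineTree`, `MAX`, `findLine`, `insertInto`) are hypothetical, and a real implementation would also need deletion and rebalancing.

```javascript
// Hypothetical sketch of a tree of lines: leaves hold small arrays of lines,
// branches cache how many lines live beneath them. Not CodeMirror's code.
const MAX = 4; // tiny on purpose so splits happen quickly

class Leaf {
  constructor(lines) { this.lines = lines; }
  get size() { return this.lines.length; }
}

class Branch {
  constructor(children) {
    this.children = children;
    // Cached line count, so a lookup can skip whole subtrees at once.
    this.size = children.reduce((sum, c) => sum + c.size, 0);
  }
}

// Group items into nodes of at most MAX items each.
function pack(items, make) {
  const out = [];
  for (let i = 0; i < items.length; i += MAX) out.push(make(items.slice(i, i + MAX)));
  return out;
}

// Walk down the cached sizes to line n (0-based): O(depth * MAX) steps.
function findLine(node, n) {
  while (node instanceof Branch) {
    let i = 0;
    while (n >= node.children[i].size) n -= node.children[i++].size;
    node = node.children[i];
  }
  return node.lines[n];
}

// Insert newLines before line n. Returns the node(s) replacing `node`:
// more than one when it overflowed MAX and had to be split.
function insertInto(node, n, newLines) {
  if (node instanceof Leaf) {
    node.lines.splice(n, 0, ...newLines);
    return node.size <= MAX ? [node] : pack(node.lines, (part) => new Leaf(part));
  }
  let i = 0;
  while (i < node.children.length - 1 && n > node.children[i].size) {
    n -= node.children[i++].size;
  }
  node.children.splice(i, 1, ...insertInto(node.children[i], n, newLines));
  node.size += newLines.length;
  return node.children.length <= MAX ? [node] : pack(node.children, (part) => new Branch(part));
}

class LineTree {
  // Expects at least one line (an empty document is [""]).
  constructor(lines) {
    let nodes = pack(lines, (part) => new Leaf(part));
    while (nodes.length > 1) nodes = pack(nodes, (part) => new Branch(part));
    this.root = nodes[0];
  }
  get size() { return this.root.size; }
  getLine(n) { return findLine(this.root, n); }
  insertLines(n, newLines) {
    const parts = insertInto(this.root, n, newLines);
    this.root = parts.length === 1 ? parts[0] : new Branch(parts);
  }
}

const tree = new LineTree(["a", "b", "c", "d", "e", "f"]);
tree.insertLines(2, ["x", "y"]);
// tree.getLine(2) is "x", tree.getLine(4) is "c", tree.size is 8
```

Compared with a flat array, an insert here splices only a leaf of at most a handful of lines plus the cached counts on the path to the root.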

Wow a non-flat icon?!
It actually looks great.


Cloud9's Ace

Ace is another web-based editor that's been around for a while. It's the editor used over at Cloud9. There's not much more to say about it: at its core, it's just a simple JavaScript array.





Orion and The 3 Moons

Eclipse's Orion

And that leaves us with just Orion, an editor I haven't heard much about. It's used in Eclipse Che, a web-based Eclipse. Its mission statement is noble:


The goal of Orion is to build developer tooling that works in the browser, at web scale. The vision behind Orion is to move software development to the web as a web experience, by enabling open tool integration through HTTP and REST, JSON, OAuth, OpenID, and others. The idea is to exploit internet design principles throughout, instead of trying to bring existing desktop IDE concepts to the browser.

I browsed the GitHub repository for a bit, but I couldn't really find anything, just a lot of request-based code. If anyone figures it out, I'll add your findings here.


So those are all the popular web-technology-based source code editors out there right now. Did you notice something? They are all backed by some company. This leads me to ask: why are these companies competing in this market? What is there to gain? What is there to...lose? Well, money, time, and energy.

One thing I'm taking away from this is that if I ever need a web-based source code editor, hands down I'm going with CodeMirror. I also expect some big changes in Atom that will make its efficiency on par with Visual Studio Code's. That will be exciting. I welcome the JavaScript Emacs of tomorrow.


And now, for our special feature presentation...!

Microsoft's Word 1.1a


Holy this tool looks serious...I think I like it.

Yep. And boy, is the source not pretty. After about an hour and a half, I finally found the code that manages the text. I have to thank the people who worked on this project: without the comments, I'd be sitting here all night trying to figure out what it does.



And that's not even the best part. The FcAppendRgchToFn function is in fricking assembly.


I've got to say, though, the assembly is nicely written and commented. It's actually how I write my Z80 code: write out pseudocode in a comment, then translate it below.

Word uses a "scratch buffer" to hold all changes. At first I took the scratch buffer for a gap buffer, because text is always appended to it and then the end pointers are fixed up, but it is actually a piece table, which is great to see. While looking at the other parts of the source code, it's apparent there is a ton of optimization all over. Maybe I should start writing all my blog posts in Word 1.1 and use an RTF-to-HTML converter...
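The piece-table idea can be sketched in a few lines: the original text is never modified, every insertion is appended to an append-only "add" buffer (the role a scratch buffer plays), and the document is just a list of pieces pointing into one buffer or the other. This is a minimal JavaScript sketch assuming plain string offsets; `PieceTable` and its methods are hypothetical names, and it is not a transcription of Word's assembly. It does include one common trick, which may or may not be what Word's end-pointer fix-ups amount to: when you keep typing right after your previous insertion, the last piece is simply extended.

```javascript
// Minimal piece table sketch. Illustrative only; not Word's code.
class PieceTable {
  constructor(original) {
    // The original text is never modified; new text only ever goes on the
    // end of the append-only "add" buffer.
    this.buffers = { original, add: "" };
    // Each piece is the span [start, start + length) of one buffer.
    this.pieces = original.length > 0
      ? [{ buf: "original", start: 0, length: original.length }]
      : [];
  }

  toString() {
    return this.pieces
      .map((p) => this.buffers[p.buf].slice(p.start, p.start + p.length))
      .join("");
  }

  insert(pos, text) {
    if (text.length === 0) return;
    const addEnd = this.buffers.add.length;
    this.buffers.add += text;
    const piece = { buf: "add", start: addEnd, length: text.length };
    let offset = 0;
    for (let i = 0; i < this.pieces.length; i++) {
      const p = this.pieces[i];
      if (pos <= offset + p.length) {
        const k = pos - offset; // split point inside piece p
        // Typing right after the previous insertion: just move that piece's
        // end forward instead of creating a new piece.
        if (k === p.length && p.buf === "add" && p.start + p.length === addEnd) {
          p.length += text.length;
          return;
        }
        const before = { buf: p.buf, start: p.start, length: k };
        const after = { buf: p.buf, start: p.start + k, length: p.length - k };
        this.pieces.splice(i, 1, ...[before, piece, after].filter((q) => q.length > 0));
        return;
      }
      offset += p.length;
    }
    this.pieces.push(piece); // empty document, or pos past the end
  }

  delete(pos, count) {
    const end = pos + count;
    const kept = [];
    let offset = 0;
    for (const p of this.pieces) {
      const pStart = offset;
      const pEnd = offset + p.length;
      offset = pEnd;
      if (pEnd <= pos || pStart >= end) {
        kept.push(p); // untouched by the deletion
        continue;
      }
      // Keep whatever part of this piece lies outside [pos, end).
      if (pStart < pos) kept.push({ buf: p.buf, start: p.start, length: pos - pStart });
      if (pEnd > end) kept.push({ buf: p.buf, start: p.start + (end - pStart), length: pEnd - end });
    }
    this.pieces = kept;
  }
}

const pt = new PieceTable("hello world");
pt.insert(5, ",");
pt.insert(6, "!"); // extends the "," piece rather than adding a new one
// pt.toString() is "hello,! world"; pt.buffers.original is untouched
```

Because nothing is ever overwritten, undo can be as simple as restoring an earlier list of pieces, which is part of why piece tables suited machines with little memory.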

And that's all, folks! Have a nice week. I hope you enjoyed that; I certainly did. Want to help me out? Email me about JavaScript work (Angular or Node! ES6! Waaa!!). Alright, I'm going to play Age of Empires 2. Good night.

Comments

  1. Looks to me like Orion also defaults to using a JS array:

    https://github.com/eclipse/orion.client/blob/3f256ab112f1d3f127b674a8da63eee671e57e8a/bundles/org.eclipse.orion.client.editor/web/orion/editor/textModel.js

    Line 39:
    https://github.com/eclipse/orion.client/blob/3f256ab112f1d3f127b674a8da63eee671e57e8a/bundles/org.eclipse.orion.client.editor/web/orion/editor/textModel.js#L39
