How MDN’s autocomplete search works

Last month, Gregor Weber and I added an autocomplete search to MDN Web Docs that lets you jump straight to the document you’re looking for by typing parts of the document title. This is the story of how that’s implemented. If you stick around to the end, I’ll share an “easter egg” feature that, once you’ve learned it, will make you look really cool at dinner parties. Or perhaps you just want to navigate MDN faster than mere mortals.

MDN's autocomplete search in action

In its simplest form, the input field has an onkeypress event listener that filters through a complete list of every single document title (per locale). At the time of writing, there are 11,690 different document titles (and their URLs) for US English (en-US). You can see a preview by opening https://developer.mozilla.org/en-US/search-index.json. Yes, it’s huge, but it’s not too huge to load entirely into memory. After all, together with the code that does the searching, it’s only loaded once the user has indicated intent to type something. And speaking of size, because the file is compressed with Brotli, it’s only 144KB over the network.

Implementation details

By default, the only JavaScript code that’s loaded is a small shim that watches for onmouseover and onfocus on the search <input> field. There’s also an event listener on the whole document that looks for a certain keystroke: pressing / at any point acts the same as if you had used your mouse cursor to put focus into the <input> field. As soon as focus is triggered, the first thing it does is download two JavaScript bundles, which turn the <input> field into something much more advanced. In its simplest (pseudo) form, here’s how it works:

<input 
 type="search" 
 name="q"
 onfocus="startAutocomplete()" 
 onmouseover="startAutocomplete()"
 placeholder="Site search..." 
 value="q">
<script>
let started = false;
function startAutocomplete() {
  // Only inject the script once, however often the events fire.
  if (started) {
    return;
  }
  started = true;
  const script = document.createElement("script");
  script.src = "/static/js/autocomplete.js";
  document.head.appendChild(script);
}
</script>
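
The pseudo code above leaves out the document-wide listener for the / key. Here’s a minimal sketch of what such a listener could look like. The helper name shouldFocusSearch, the use of event.key, and the rule about ignoring keystrokes inside form fields are all assumptions made for illustration, not the actual MDN code:

```javascript
// Hypothetical helper (not the real MDN code): decides whether a keydown
// event should move focus to the search field.
function shouldFocusSearch(event) {
  if (event.key !== "/") return false;
  if (event.ctrlKey || event.metaKey || event.altKey) return false;
  // Don't hijack "/" while the user is already typing in a form field.
  const tag = event.target && event.target.tagName;
  if (tag === "INPUT" || tag === "TEXTAREA" || tag === "SELECT") return false;
  if (event.target && event.target.isContentEditable) return false;
  return true;
}

// Wire it up only when a DOM is available (that is, in a browser).
if (typeof document !== "undefined") {
  document.addEventListener("keydown", (event) => {
    if (!shouldFocusSearch(event)) return;
    const input = document.querySelector('input[type="search"]');
    if (input) {
      event.preventDefault(); // stop the browser's own handling of the key
      input.focus();
    }
  });
}
```

Once focus lands in the field, the onfocus handler shown earlier takes over.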

Then it loads /static/js/autocomplete.js which is where the real magic happens. Let’s dig deeper with the pseudo code:

(async function() {
  const response = await fetch('/en-US/search-index.json');
  const documents = await response.json();
  
  const inputValue = document.querySelector(
    'input[type="search"]'
  ).value;
  const flex = FlexSearch.create();
  documents.forEach(({ title }, i) => {
    flex.add(i, title);
  });

  const indexResults = flex.search(inputValue);
  const foundDocuments = indexResults.map((index) => documents[index]);
  displayFoundDocuments(foundDocuments.slice(0, 10));
})();

As you can probably see, this is an oversimplification of how it actually works, but it’s not yet time to dig into the details. The next step is to display the matches. We use (TypeScript) React to do this, but the following pseudo code is easier to follow:

function displayFoundDocuments(documents) {
  const container = document.createElement("ul");
  documents.forEach(({url, title}) => {
    const row = document.createElement("li");
    const link = document.createElement("a");
    link.href = url;
    link.textContent = title;
    row.appendChild(link);
    container.appendChild(row);
  });
  document.querySelector('#search').appendChild(container);
}

Then, with some CSS, we display this as an overlay just beneath the <input> field. There are refinements on top of that: for example, we highlight each title according to the inputValue, and various keystroke event handlers take care of highlighting the relevant row when you navigate up and down.
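
To make the title highlighting a bit more concrete, here’s a rough sketch of one way to do it: split the title into segments and flag the ones that match what the user typed. The function name highlightSegments and the plain case-insensitive substring matching are assumptions for illustration; the real component may well do it differently.

```javascript
// Hypothetical helper: split a title into segments, flagging the parts
// that match the query (case-insensitively).
// Assumes lowercasing does not change the string's length, which holds
// for typical document titles.
function highlightSegments(title, query) {
  if (!query) return [{ text: title, match: false }];
  const segments = [];
  const haystack = title.toLowerCase();
  const needle = query.toLowerCase();
  let position = 0;
  while (position < title.length) {
    const found = haystack.indexOf(needle, position);
    if (found === -1) {
      // No more matches: the rest of the title is plain text.
      segments.push({ text: title.slice(position), match: false });
      break;
    }
    if (found > position) {
      segments.push({ text: title.slice(position, found), match: false });
    }
    segments.push({
      text: title.slice(found, found + needle.length),
      match: true,
    });
    position = found + needle.length;
  }
  return segments;
}
```

Each segment with match: true could then be wrapped in a <mark> element (or similar) when the row is rendered.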

Ok, let’s dig deeper into the implementation details

We create the FlexSearch index just once and re-use it for every new keystroke. Because the user might type more while waiting for the network, the search is reactive: it runs against whatever is currently in the input field once all the JavaScript and the JSON download have arrived.
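
Here’s a simplified sketch of that “build once, search the latest input” idea. It’s not the real code: createLazySearcher, loadDocuments and onResults are hypothetical names, and a plain substring filter stands in for FlexSearch so the example stays self-contained.

```javascript
// Hypothetical sketch: load the data once, then answer every keystroke
// from memory. A plain substring filter stands in for FlexSearch here.
function createLazySearcher(loadDocuments, onResults) {
  let documentsPromise = null; // set on first use, then re-used
  let latestQuery = "";

  return async function onInput(query) {
    latestQuery = query;
    if (!documentsPromise) {
      documentsPromise = loadDocuments(); // starts the download only once
    }
    const documents = await documentsPromise;
    // The user may have typed more while we were waiting. If so, a later
    // call handles the newer query and this one can be dropped.
    if (query !== latestQuery) return;
    const needle = query.toLowerCase();
    const found = documents.filter(({ title }) =>
      title.toLowerCase().includes(needle)
    );
    onResults(found.slice(0, 10));
  };
}
```

In a browser, loadDocuments could be a function that fetches and parses search-index.json, and onInput would be called from the input’s event handler with the current value.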

Before we dig into what this FlexSearch is, let’s talk about how the display actually works. For that we use a React library called downshift which handles all the interactions, displays, and makes sure the displayed search results are accessible. downshift is a mature library that handles a myriad of challenges with building a widget like that, especially the aspects of making it accessible.

So, what is this FlexSearch library? It’s another third-party library, one that makes sure that searching on titles is done with natural language in mind. It describes itself as the “Web’s fastest and most memory-flexible full-text search library with zero dependencies”, and it’s a lot more performant and accurate than attempting to simply look for one string in a long list of other strings.

Deciding which result to show first

In fairness, if the user types foreac, it’s not that hard to reduce a list of 10,000+ document titles down to only those that contain foreac in the title. The harder part is deciding which result to show first. We do that by relying on pageview stats: for every single MDN URL, we record how many pageviews it gets as a form of determining “popularity”. The documents that most people arrive on are most probably what the user was searching for.

Our build process that generates the search-index.json file knows each URL’s number of pageviews. We don’t actually care about the absolute numbers; what we do care about is the relative differences. For example, we know that Array.prototype.forEach() (that’s one of the document titles) is a more popular page than TypedArray.prototype.forEach(), so we leverage that and sort the entries in search-index.json accordingly. Now, with FlexSearch doing the reduction, we use the “natural order” of the array as the trick that tries to give users the document they were probably looking for. It’s actually the same technique we use for Elasticsearch in our full site-search. More about that in: How MDN’s site-search works.
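
As a rough illustration of the “sort once at build time, then rely on array order” trick, here’s a small sketch. Everything in it is made up for the example: the function names, the shortened URLs, and the pageview numbers. A plain substring filter again stands in for FlexSearch.

```javascript
// Hypothetical build step: order documents by pageviews, most popular first.
function buildSearchIndex(documents, pageviews) {
  return documents
    .slice() // don't mutate the caller's array
    .sort((a, b) => (pageviews[b.url] || 0) - (pageviews[a.url] || 0))
    .map(({ url, title }) => ({ url, title })); // only the order is kept
}

// Client side: filter() keeps array order, so matches stay sorted by
// popularity without any per-keystroke ranking.
function searchTitles(index, query) {
  const needle = query.toLowerCase();
  return index.filter(({ title }) => title.toLowerCase().includes(needle));
}

// Made-up URLs and pageview counts, for illustration only.
const index = buildSearchIndex(
  [
    { url: "/typedarray-foreach", title: "TypedArray.prototype.forEach()" },
    { url: "/array-foreach", title: "Array.prototype.forEach()" },
    { url: "/array-map", title: "Array.prototype.map()" },
  ],
  { "/array-foreach": 900, "/array-map": 1000, "/typedarray-foreach": 40 }
);

console.log(searchTitles(index, "foreac").map((doc) => doc.title));
// → [ 'Array.prototype.forEach()', 'TypedArray.prototype.forEach()' ]
```

The nice thing about this design is that the client never needs the pageview numbers at all; the ordering of the array carries that information.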

The easter egg: How to search by URL

Actually, it’s not a whimsical easter egg, but a feature that came from the fact that this autocomplete needs to work for our content creators. You see, when you work on the content in MDN you start a local “preview server” which is a complete copy of all documents but all running locally, as a static site, under http://localhost:5000. There, you don’t want to rely on a server to do searches. Content authors need to quickly move between documents, so much of the reason why the autocomplete search is done entirely in the client is because of that.

Tools like the VSCode and Atom IDEs commonly let you do “fuzzy searches” to find and open files simply by typing portions of the file path. For example, searching for whmlemvo should find the file files/web/html/element/video, because the letters w, h, m, l, e, m, v and o appear in that order in the path. You can do that with MDN’s autocomplete search too. The way you do it is by typing / as the first input character.

Activate "fuzzy search" on MDN

It makes it really quick to jump straight to a document if you know its URL but don’t want to spell it out exactly.
In fact, there’s another way to navigate and that is to first press / anywhere when browsing MDN, which activates the autocomplete search. Then you type / again, and you’re off to the races!
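
If you’re curious what a “fuzzy” match on a URL boils down to, here’s a bare-bones sketch. It treats the query as a subsequence of the path: every typed character has to appear in the same order, but not necessarily next to each other. The function names are hypothetical, and the real implementation likely does more (ranking the matches, for instance), so treat this as an illustration of the idea only.

```javascript
// Hypothetical helper: true if every character of `query` appears in `path`
// in the same order, not necessarily adjacent (a subsequence match).
function fuzzyMatch(query, path) {
  const haystack = path.toLowerCase();
  let from = 0;
  for (const char of query.toLowerCase()) {
    const found = haystack.indexOf(char, from);
    if (found === -1) return false;
    from = found + 1; // keep searching to the right of this match
  }
  return true;
}

// Assumption for this sketch: the leading "/" only switches to URL mode
// and is not itself part of what gets matched.
function searchByUrl(documents, input) {
  const query = input.slice(1);
  return documents.filter(({ url }) => fuzzyMatch(query, url));
}
```

With that, an input like /whmlemvo would keep a document whose URL is /en-US/docs/Web/HTML/Element/video and drop one ending in /Element/audio.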

How to get really deep into the implementation details

The code for all of this is in the Yari repo which is the project that builds and previews all of the MDN content. To find the exact code, click into the client/src/search.tsx source code and you’ll find all the code for lazy-loading, searching, preloading, and displaying autocomplete searches.

About Peter Bengtsson

Peter is a staff web developer at Mozilla working on MDN Web Docs. He blogs on www.peterbe.com



11 comments

  1. Sea Man

    So you overrode a useful key to make it less useful? The slash activates Firefox’s quick search by default. Seems like an odd choice to override that.

    August 3rd, 2021 at 10:47

    1. Peter Bengtsson

      It was a trade-off we picked. I think the inspiration came from how Algolia does/did it.
      I think the reason I never noticed myself was because I use ⌘-f to find in page.

      Either way, putting focus into the search field is orthogonal to what happens when you’re in the input widget.

      By the way, you can actually use ? too to the same effect. Is that also problematic as a shortcut?

      Would you mind joining us on https://github.com/mdn/yari/issues/new and we can dig into the details. Your feedback is much appreciated!

      August 3rd, 2021 at 10:57

  2. Patrick H. Lauke

    Screen reader behaviour could be improved a tad. The a11y-status-message container is not always updated correctly – try typing in “test” letter by letter; on the first letter, it announces that 10 results are available; any subsequent letters that are typed in (and change the number of results) don’t update the message container/make any announcement. Other words are fine though … e.g. “axes” announces the different numbers of results as you go along. It’s a bit flakey.

    Also, for keyboard users in general, once one of the suggestions is highlighted (after reaching it with cursor keys), but the user decides to bail out by using “Tab”, the result is triggered anyway as if they had pressed enter. It should just close the autocomplete results and move focus along.

    Pressing Esc to close the autocomplete popup leads to focus being lost. While browsers generally error correct for this, it’s not reliable. Focus should be explicitly moved to the search input again.

    August 3rd, 2021 at 18:26

    1. Peter Bengtsson

      I see you posted https://github.com/mdn/yari/issues/4407 and that’s very much appreciated. I hope you can join us in fixing it.

      August 4th, 2021 at 10:07

  3. Konstantin

    It’s an interesting post, but I’d like to point out that the focus on `/` does not seem to work with all keyboard layouts. Might be an a11y issue you might want to address.

    August 3rd, 2021 at 23:23

    1. Peter Bengtsson

      Does typing ? work with all keyboard layouts?

      August 4th, 2021 at 10:10

  4. Jonas Jensen

    This was a cool read, thank you for posting this.

    August 4th, 2021 at 06:08

  5. pmario

    The “/” key may be nice if you have the right keyboard layout but it doesn’t work if it is different eg: German, where you have to type: SHIFT-7 to get “/” …

    August 4th, 2021 at 12:31

    1. Peter Bengtsson

      So pressing Shift and 7 doesn’t register, in the event listener, as a `/` in the end?

      August 4th, 2021 at 13:15

      1. Sandro

        No, it doesn’t work (I use a Swiss German keyboard layout). The useFocusOnSlash hook (in search utils) checks the event.code property which is “Digit7” in that case. I need to press “-” (the key to the left of the right shift key) to focus the search input, this results in event.code = “Slash”.

        I’m not sure if event.code is the right property for that job as MDN says:
        > This property is useful when you want to handle keys based on their physical positions on the input device rather than the characters associated with those keys

        And that’s not what you want to do… I guess.

        August 7th, 2021 at 11:46

