import * as React from 'react'
  /* @jsx mdx */
import { mdx } from '@mdx-js/react';
/* @jsxRuntime classic */

/* @jsx mdx */

import DefaultLayout from "/home/vincent/Documents/Develop/Web/PersonalWebsite/website/src/components/layout-markdown.tsx";
export const _frontmatter = {};
const layoutProps = {
  _frontmatter
};
const MDXLayout = DefaultLayout;
export default function MDXContent({
  components,
  ...props
}) {
  return <MDXLayout {...layoutProps} {...props} components={components} mdxType="MDXLayout">


    <p><span parentName="p" {...{
        "className": "gatsby-resp-image-wrapper",
        "style": {
          "position": "relative",
          "display": "block",
          "marginLeft": "auto",
          "marginRight": "auto",
          "maxWidth": "885px"
        }
      }}>{`
      `}<a parentName="span" {...{
          "className": "gatsby-resp-image-link",
          "href": "/static/b6a426411a7b16bcafd9d465d44c9665/efc66/2025-02-11-yaml-banner.png",
          "style": {
            "display": "block"
          },
          "target": "_blank",
          "rel": "noopener"
        }}>{`
    `}<span parentName="a" {...{
            "className": "gatsby-resp-image-background-image",
            "style": {
              "paddingBottom": "56.333333333333336%",
              "position": "relative",
              "bottom": "0",
              "left": "0",
              "backgroundImage": "url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAALCAIAAADwazoUAAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAqUlEQVQoz6WQRw7DMAwE85CIqiwqjpSCGPn/xyKXHJyjTBDEXma52ItPwTirQQ/MxSc8A4eTsNEAHYcB2HpnQHcHZ4xdd5urAqUW1813EwcYM4bSpL4f7f6q6dPyfMvPEueaa5Qambq5tc72F/2aI1zIsSBRFk6MicIkFClEDC1y14I+EQqGzP3aA0yT6N6XWgeWnEvaXahd/xbgL3bh8bZx4rXtQViG4S9c9lQ9zrcSqgAAAABJRU5ErkJggg==')",
              "backgroundSize": "cover",
              "display": "block"
            }
          }}></span>{`
  `}<img parentName="a" {...{
            "className": "gatsby-resp-image-image",
            "alt": "Banner",
            "title": "Banner",
            "src": "/static/b6a426411a7b16bcafd9d465d44c9665/efc66/2025-02-11-yaml-banner.png",
            "srcSet": ["/static/b6a426411a7b16bcafd9d465d44c9665/5a46d/2025-02-11-yaml-banner.png 300w", "/static/b6a426411a7b16bcafd9d465d44c9665/0a47e/2025-02-11-yaml-banner.png 600w", "/static/b6a426411a7b16bcafd9d465d44c9665/efc66/2025-02-11-yaml-banner.png 885w"],
            "sizes": "(max-width: 885px) 100vw, 885px",
            "style": {
              "width": "100%",
              "height": "100%",
              "margin": "0",
              "verticalAlign": "middle",
              "position": "absolute",
              "top": "0",
              "left": "0"
            },
            "loading": "lazy",
            "decoding": "async"
          }}></img>{`
  `}</a>{`
    `}</span></p>
    <p>{`I’m `}<a parentName="p" {...{
        "href": "/blog/architecture-docs-prototype/"
      }}>{`writing a software architecture visualization program in Zig`}</a>{` that lets both developers and designers easily navigate a codebase and understand product functionality. Since it combines automatically generated data with human input, I needed a data format that was easy to generate from different languages, yet also easy to edit by hand. So I went with YAML.`}</p>
    <p>{`But Zig is a new language, and the YAML parsers out there didn’t seem to do what I needed, mainly parsing recursive data structures for the nested diagrams. Since this is a learning project and I was curious to experiment with different techniques for writing parsers, I decided to write my own. Specifically, I wanted to see whether I could make a zero-allocation parser and how well it would perform.`}</p>
    <p>{`But did I really want to write a full YAML parser? Of course not! The advantage of writing things from scratch is that you know exactly what you need and can choose the trade-off between features and complexity or speed. So: no crazy features, and ‘yes’ and ‘no’ are strings, not boolean values.`}</p>
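    <p>{`To make that trade-off concrete, here is a minimal TypeScript sketch (not the Zig code from this project) of a reduced rule for plain scalars. The names Scalar and classifyScalar are hypothetical, and the sketch assumes that only the lowercase words true and false become booleans, while YAML 1.1 words such as yes, no, on and off stay plain strings.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-typescript"
      }}>{`// Hypothetical scalar rule for a deliberately reduced YAML subset.
// Assumption: only "true" and "false" are booleans; "yes", "no", "on" and
// "off" get no special treatment and stay strings.
type Scalar =
  | { kind: "boolean"; value: boolean }
  | { kind: "integer"; value: number }
  | { kind: "string"; value: string };

function isDigit(ch: string): boolean {
  return ch >= "0" && ch <= "9";
}

// Optional leading minus sign followed by at least one digit.
function isInteger(text: string): boolean {
  let i = text.charAt(0) === "-" ? 1 : 0;
  if (i >= text.length) return false;
  for (; i < text.length; i++) {
    if (!isDigit(text.charAt(i))) return false;
  }
  return true;
}

function classifyScalar(text: string): Scalar {
  if (text === "true") return { kind: "boolean", value: true };
  if (text === "false") return { kind: "boolean", value: false };
  if (isInteger(text)) return { kind: "integer", value: Number(text) };
  return { kind: "string", value: text };
}

console.log(classifyScalar("yes").kind); // string
console.log(classifyScalar("-42").kind); // integer
`}</code></pre>
    <p>{`Keeping the rule this small means a value such as the country code “NO” can never silently turn into false.`}</p>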
    <p>{`There were three parts I needed to write: the generic YAML lexer, the generic YAML parser, and the application-specific parser that turns the parsed YAML into the actual structs my application uses at runtime. The lexer divides the source file into chunks that are a bit easier to process. For example, { foo: 5 } produces the tokens “object_start”, “whitespace”, “single_line_string”, “colon”, “whitespace”, “integer”, “whitespace” and “object_end”. What I wanted from the generic parser is “object_entry_start” (storing the start and end indices of “foo” in the source), “integer” and “object_entry_end”. The final, application-specific parser then needs to turn this into struct { foo: number }.`}</p>
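    <p>{`For the { foo: 5 } example, the step from lexer tokens to parser events, plus the final application-specific step, could be sketched in TypeScript as follows. This is a hypothetical illustration rather than the project’s Zig code: the token list is written out by hand with start and end offsets into the source, the extra “comma” token kind is an assumption, and only integer values are handled.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-typescript"
      }}>{`// Hypothetical sketch of the generic parser step for "{ foo: 5 }".
// Token kinds mirror the lexer output above; "comma" is an assumption.
type TokenKind =
  | "object_start" | "object_end" | "whitespace"
  | "single_line_string" | "colon" | "comma" | "integer";

interface Token { kind: TokenKind; start: number; end: number }

type ParseEvent =
  | { kind: "object_entry_start"; keyStart: number; keyEnd: number }
  | { kind: "integer"; start: number; end: number }
  | { kind: "object_entry_end" };

function parseFlowObject(tokens: Token[]): ParseEvent[] {
  // Whitespace carries no meaning inside a flow object, so drop it.
  const t = tokens.filter((tok) => tok.kind !== "whitespace");
  const events: ParseEvent[] = [];
  if (t.length === 0 || t[0].kind !== "object_start") throw new Error("expected object_start");
  let i = 1;
  while (i < t.length && t[i].kind !== "object_end") {
    // A string directly followed by a colon is a key: the parser decides this.
    const key = t[i];
    if (key.kind !== "single_line_string" || t[i + 1]?.kind !== "colon") {
      throw new Error("expected a key followed by a colon");
    }
    const value = t[i + 2];
    if (value?.kind !== "integer") throw new Error("this sketch only handles integers");
    // Events store indices into the source instead of copying the key text.
    events.push({ kind: "object_entry_start", keyStart: key.start, keyEnd: key.end });
    events.push({ kind: "integer", start: value.start, end: value.end });
    events.push({ kind: "object_entry_end" });
    i += 3;
    if (t[i]?.kind === "comma") i++;
  }
  if (i >= t.length) throw new Error("expected object_end");
  return events;
}

// Application-specific step: turn the events into a plain { foo: number } object.
function toRecord(source: string, events: ParseEvent[]): Record<string, number> {
  const out: Record<string, number> = {};
  let key = "";
  for (const e of events) {
    if (e.kind === "object_entry_start") key = source.slice(e.keyStart, e.keyEnd);
    else if (e.kind === "integer") out[key] = Number(source.slice(e.start, e.end));
  }
  return out;
}

const source = "{ foo: 5 }";
const tokens: Token[] = [
  { kind: "object_start", start: 0, end: 1 },
  { kind: "whitespace", start: 1, end: 2 },
  { kind: "single_line_string", start: 2, end: 5 },
  { kind: "colon", start: 5, end: 6 },
  { kind: "whitespace", start: 6, end: 7 },
  { kind: "integer", start: 7, end: 8 },
  { kind: "whitespace", start: 8, end: 9 },
  { kind: "object_end", start: 9, end: 10 },
];
console.log(toRecord(source, parseFlowObject(tokens))); // { foo: 5 }
`}</code></pre>
    <p>{`The parser events only carry indices into the source; the key text is sliced out once, in the application-specific step.`}</p>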
    <p>{`The lexer was pretty quick and fun to write. I store a pointer to the source and the start index of the next text to parse. Most of it was simple, but I tended to do too much in the lexer. At first I skipped comments in the lexer, but I later moved that to the parser. I did, however, keep track of how deeply nested I was in arrays and objects so I could tell object keys apart from strings, which should have been the parser’s job and added needless complexity. Overall, though, most of the code looks like this and is easy to follow.`}</p>
    <p><span parentName="p" {...{
        "className": "gatsby-resp-image-wrapper",
        "style": {
          "position": "relative",
          "display": "block",
          "marginLeft": "auto",
          "marginRight": "auto",
          "maxWidth": "938px"
        }
      }}>{`
      `}<a parentName="span" {...{
          "className": "gatsby-resp-image-link",
          "href": "/static/2b72f9c8d40f88fdf942abd6c8e915f4/dc333/2025-02-11-yaml-lexer-zig.png",
          "style": {
            "display": "block"
          },
          "target": "_blank",
          "rel": "noopener"
        }}>{`
    `}<span parentName="a" {...{
            "className": "gatsby-resp-image-background-image",
            "style": {
              "paddingBottom": "94.66666666666667%",
              "position": "relative",
              "bottom": "0",
              "left": "0",
              "backgroundImage": "url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAATCAIAAAAf7rriAAAACXBIWXMAAA7EAAAOxAGVKw4bAAACkklEQVQ4y2VUWa7bMAzMQRpb+75LtmXnJX0Butz/RqWdpM1DaUHgh6gZckY+EWmpraZkPzdVm5gKn6ts2dfMYlIuciOxUcQozDka0fgWp+F8Po/YSu+5laHZuti2mtKEYKOQI0LDOA6wo3GAo9++nd/ihDCG2xiXqW22zGm5TH1ZLlvvNc+LjskawymDA9IGYZyUkgtFGacYnxACLkhLF0K1dU39GloP0+pjVGniUsHVCGH0jGcyHvxPj5wDcll25H5pfZl35BLaopwzSjEKnxDWS22kFIQJxtizGG5Wyoc0u7al/gE1ae7aR99Wm5s0HhbjnDOOqHxR+IeMGBXBFT+voV9dmcJ8CWWG3cVonHN1MbFwRgkZhmGf8/CY9t4H3juhhLkwlX7ba1IVgqtYoSxOlzBtLmTN5RehXshQPuIRMUwl5aDIvoZjHZKASpCP/8WLNqPaRaoVjZ4FL5wmhGDKMOOE8eEtXpTfaENGKMvbvVzupbVca1o20Fz7IJSBesoFHKCUoqPHL8iEIMqZdFEoDWGsM85b54QQ43mgUoM9GKgkOCIMY/Is3mHBJApM0nSsMC2bJ5ebdd6kqlOV1ivjuRAUQ+FDqvEBeXowoWBA5Wzb8uUzztuuswsgT1q/h6n7VDhgAkFK91kcAcmTNqRMe1dXENmVAj37MqV5i1N3wN5ZsPdDFnRUjkffp4dZCBdA2Fdw9ZJbrS3HWmK/6ZAFeFJrMBg4dB8KvAwpMBWUoOe0tfY+VD9tcfnYX8W02ph1XZnUh+zDF2f9lWrf0Mhg0i67CvZeAc3V7jL8IWahQCEMV8Bj2I2E3iUfTkBCHRFgLP3a77+X249pu9XW6nqdr/f182f//LV83HNMWsn3+AO6nbJLZrD2hgAAAABJRU5ErkJggg==')",
              "backgroundSize": "cover",
              "display": "block"
            }
          }}></span>{`
  `}<img parentName="a" {...{
            "className": "gatsby-resp-image-image",
            "alt": "Snippet of Lexer",
            "title": "Snippet of Lexer",
            "src": "/static/2b72f9c8d40f88fdf942abd6c8e915f4/dc333/2025-02-11-yaml-lexer-zig.png",
            "srcSet": ["/static/2b72f9c8d40f88fdf942abd6c8e915f4/5a46d/2025-02-11-yaml-lexer-zig.png 300w", "/static/2b72f9c8d40f88fdf942abd6c8e915f4/0a47e/2025-02-11-yaml-lexer-zig.png 600w", "/static/2b72f9c8d40f88fdf942abd6c8e915f4/dc333/2025-02-11-yaml-lexer-zig.png 938w"],
            "sizes": "(max-width: 938px) 100vw, 938px",
            "style": {
              "width": "100%",
              "height": "100%",
              "margin": "0",
              "verticalAlign": "middle",
              "position": "absolute",
              "top": "0",
              "left": "0"
            },
            "loading": "lazy",
            "decoding": "async"
          }}></img>{`
  `}</a>{`
    `}</span></p>
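    <p>{`As a rough, hypothetical TypeScript counterpart to that lexer (not a port of the Zig code), a lexer that keeps only the source and the index where the next token starts, and leaves the key-versus-string decision to the parser, might look like this. It assumes single-line input where spaces are the only whitespace.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-typescript"
      }}>{`// Hypothetical lexer sketch: it keeps only the source and the index where the
// next token starts, and returns one token (kind plus offsets) per call.
// It does not track nesting depth; deciding what is a key is the parser's job.
type TokenKind =
  | "object_start" | "object_end" | "colon" | "whitespace"
  | "integer" | "single_line_string" | "eof";

interface Token { kind: TokenKind; start: number; end: number }

function isDigit(ch: string): boolean {
  return ch >= "0" && ch <= "9";
}

// Assumption: single-line input, so spaces are the only whitespace.
function isStringChar(ch: string): boolean {
  return "{}: ".indexOf(ch) < 0;
}

class Lexer {
  private readonly source: string;
  private index = 0;

  constructor(source: string) {
    this.source = source;
  }

  next(): Token {
    const start = this.index;
    if (start >= this.source.length) return { kind: "eof", start, end: start };
    const ch = this.source.charAt(start);
    if (ch === "{") return this.emit("object_start", start + 1);
    if (ch === "}") return this.emit("object_end", start + 1);
    if (ch === ":") return this.emit("colon", start + 1);
    if (ch === " ") return this.emit("whitespace", this.scanWhile(start, (c) => c === " "));
    if (isDigit(ch)) return this.emit("integer", this.scanWhile(start, isDigit));
    return this.emit("single_line_string", this.scanWhile(start, isStringChar));
  }

  // Advance past every character the predicate accepts; returns the end index.
  private scanWhile(from: number, accept: (ch: string) => boolean): number {
    let i = from;
    while (i < this.source.length && accept(this.source.charAt(i))) i++;
    return i;
  }

  // Build the token and move the start index past it; no text is copied.
  private emit(kind: TokenKind, end: number): Token {
    const token: Token = { kind, start: this.index, end };
    this.index = end;
    return token;
  }
}

const lexer = new Lexer("{ foo: 5 }");
const kinds: TokenKind[] = [];
for (let tok = lexer.next(); tok.kind !== "eof"; tok = lexer.next()) kinds.push(tok.kind);
console.log(kinds.join(", "));
// object_start, whitespace, single_line_string, colon, whitespace, integer, whitespace, object_end
`}</code></pre>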
    <p>{`Writing the parser, however, was a different story. I shot myself in the foot by trying to make it zero-allocation, which introduced a ton of edge cases. For example, when you encounter a new line in YAML, you might be closing multiple objects and arrays at the same time. But since the parser returns only one token at a time, it has to remember how many object/array closes are still pending. There were a few more things like that. It would’ve been simpler to use an arena allocator, which makes allocation much cheaper, and that might even have improved performance thanks to less branching.`}</p>
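    <p>{`The problem of closing several levels on one new line can be sketched with a small pull parser that hands out one event per call. This TypeScript version is hypothetical and simplified: it only handles block mappings with consistent space indentation, its event names are made up for the example, and it does not try to be allocation-free; it only shows the bookkeeping for closes that are still owed.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-typescript"
      }}>{`// Hypothetical pull parser for indentation-based mappings. next() returns ONE
// event per call, so when a dedent closes several levels at once, the number
// of object closes still owed is kept in pendingCloses.
class BlockEvents {
  private readonly lines: string[];
  private indents: number[] = [0]; // indentation of each open mapping
  private pendingCloses = 0; // object_end events owed from a dedent
  private heldEntry: string | null = null; // entry to emit after start/closes
  private lineIndex = 0;

  constructor(lines: string[]) {
    this.lines = lines;
  }

  next(): string {
    // Closes owed from an earlier dedent come before anything else.
    if (this.pendingCloses > 0) {
      this.pendingCloses--;
      return "object_end";
    }
    if (this.heldEntry !== null) {
      const held = this.heldEntry;
      this.heldEntry = null;
      return held;
    }
    if (this.lineIndex >= this.lines.length) {
      // End of input closes every nested mapping that is still open.
      if (this.indents.length > 1) {
        this.indents.pop();
        return "object_end";
      }
      return "eof";
    }
    const line = this.lines[this.lineIndex++];
    const indent = leadingSpaces(line);
    const entry = "entry " + line.trim();
    if (indent > this.indents[this.indents.length - 1]) {
      // Deeper indentation opens a nested mapping before this entry.
      this.indents.push(indent);
      this.heldEntry = entry;
      return "object_start";
    }
    // Shallower indentation may close several mappings at once.
    while (indent < this.indents[this.indents.length - 1]) {
      this.indents.pop();
      this.pendingCloses++;
    }
    if (this.pendingCloses > 0) {
      this.pendingCloses--;
      this.heldEntry = entry;
      return "object_end";
    }
    return entry;
  }
}

function leadingSpaces(line: string): number {
  let n = 0;
  while (n < line.length && line.charAt(n) === " ") n++;
  return n;
}

const blockParser = new BlockEvents(["a:", "  b:", "    c: 1", "d: 2"]);
const blockEvents: string[] = [];
for (let e = blockParser.next(); e !== "eof"; e = blockParser.next()) blockEvents.push(e);
console.log(blockEvents);
`}</code></pre>
    <p>{`For those four lines, the events come out as entry a:, object_start, entry b:, object_start, entry c: 1, object_end, object_end, entry d: 2. The single line “d: 2” owes two closes, and they are handed out over two separate calls before its own entry.`}</p>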
    <p>{`The application-specific parsing was simple and fun again! Because the generic YAML parser has no opinion on what you want to parse the data into, I could easily lay out the data in a cache-friendly way. For example, instead of storing the tree as nested hash maps, I could use a struct of arrays: the nodes go in one linear array and their names in another. When rendering, I can simply iterate over the node array, because the nodes are already laid out depth-first.`}</p>
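    <p>{`A rough TypeScript sketch of that struct-of-arrays idea follows. The NodeStore class and its fields are hypothetical, and JavaScript arrays do not give the memory-layout guarantees that Zig slices do, so this only shows the shape of the data: one entry per node in each parallel array, appended in depth-first order so that rendering is a single linear loop.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-typescript"
      }}>{`// Hypothetical struct-of-arrays tree: index i describes the same node in every
// array. Assumption: nodes are added in depth-first (pre-order) order.
class NodeStore {
  readonly names: string[] = [];
  readonly parents: number[] = []; // -1 marks a root node
  readonly depths: number[] = [];

  add(name: string, parent: number): number {
    const index = this.names.length;
    this.names.push(name);
    this.parents.push(parent);
    this.depths.push(parent < 0 ? 0 : this.depths[parent] + 1);
    return index;
  }
}

function indentation(depth: number): string {
  let s = "";
  for (let d = 0; d < depth; d++) s += "  ";
  return s;
}

// Rendering is one linear pass: no recursion and no hash-map lookups.
function render(store: NodeStore): string[] {
  const out: string[] = [];
  for (let i = 0; i < store.names.length; i++) {
    out.push(indentation(store.depths[i]) + store.names[i]);
  }
  return out;
}

const store = new NodeStore();
const appNode = store.add("app", -1);
const uiNode = store.add("ui", appNode);
store.add("button", uiNode);
store.add("db", appNode);
console.log(render(store));
`}</code></pre>
    <p>{`Because children are appended right after their parent’s subtree starts, the depth-first order falls out of the insertion order, and a renderer never has to chase pointers.`}</p>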
    <p>{`Now, how fast is this? In my first tests, it parses at about 122 MB/s with a ReleaseFast build (not counting time spent reading from disk), which drops to 96 MB/s with ReleaseSafe and 27 MB/s with Debug. With ReleaseFast, my big but simple stress-test file of 123 MB (2,351,461 lines) parses in just under a second, which works out to more than 2.3 million lines per second on my Intel i7-12700H CPU. But how much faster could this be? That’s the whole reason I started this project: to have a playground for reasoning about these kinds of questions, and to get better at using the CPU at its full speed.`}</p>
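    <p>{`Measuring throughput this way comes down to timing only the parse call on input that is already in memory, then dividing bytes and lines by the elapsed time. Here is a hedged TypeScript sketch of such a harness; the function names are hypothetical, it treats 1 MB as 1,000,000 bytes, and it assumes ASCII input so that string length equals the byte count.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-typescript"
      }}>{`// Hypothetical benchmark helpers. The source is already in memory, so only
// parsing is timed, not reading the file from disk.
interface Throughput {
  megabytesPerSecond: number;
  linesPerSecond: number;
}

function countLines(source: string): number {
  let lines = 0;
  for (let i = 0; i < source.length; i++) {
    if (source.charCodeAt(i) === 10) lines++; // 10 is the line-feed character
  }
  // A final line without a trailing newline still counts.
  if (source.length > 0 && source.charCodeAt(source.length - 1) !== 10) lines++;
  return lines;
}

function throughput(bytes: number, lines: number, seconds: number): Throughput {
  return {
    megabytesPerSecond: bytes / 1e6 / seconds, // assumption: 1 MB = 1,000,000 bytes
    linesPerSecond: lines / seconds,
  };
}

function measure(parse: (source: string) => void, source: string, runs: number): Throughput {
  // Only the parse calls are inside the timed region.
  const startMs = Date.now();
  for (let r = 0; r < runs; r++) parse(source);
  // Clamp to 1 ms so a run faster than the clock resolution cannot divide by zero.
  const elapsedMs = Math.max(Date.now() - startMs, 1);
  const seconds = elapsedMs / 1000 / runs;
  // Assumes ASCII input, where string length equals the byte count.
  return throughput(source.length, countLines(source), seconds);
}

console.log(throughput(250e6, 5e6, 2)); // { megabytesPerSecond: 125, linesPerSecond: 2500000 }
`}</code></pre>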
    <p>{`For now, however, I’m busy playing around with other features, like parsing a TypeScript codebase to see whether I can generate useful information (in YAML) about its architecture to visualize in this program. For those curious, the full parser is available as a Gist on GitHub `}{`[1]`}{`, but I haven’t had time yet to turn it into a proper package.`}</p>
    <p>{`Are you interested in this kind of stuff? Feel free to reach out, and I may run more performance experiments and write about them so you can follow along :)`}</p>

    </MDXLayout>;
}
;
MDXContent.isMDXComponent = true;
      