Serge Toarca

Startups, AI, and macroeconomics.

  1. Content-defined chunking: unreasonably effective compression

    Content-defined chunking runs approximately as fast as gzip and can achieve compression ratios better than 100:1 on certain classes of data.[1]

    Suppose we have several webpages that are updated frequently, and we want to store every update of every page. The most naive approach is to save a full copy of the page every time we visit it and compress each copy with a standard algorithm like zstd. For simplicity, let's say that each page is about 100kB and that zstd achieves a 5:1 compression ratio on a typical page.[2]

    At 100kB / 5 = 20kB per compressed copy, storing n updates requires n × 20kB. Can we do better? Occasionally, an update is identical to a previous one, which is an opportunity to improve: before saving, we hash the content to check for duplication. We only save the content and
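    The hash-before-save step can be sketched in Python. This is a minimal illustration under stated assumptions, not the article's implementation: it hashes each whole snapshot with SHA-256, uses the standard library's `zlib` as a stand-in for zstd, and the `SnapshotStore` class and its methods are hypothetical. Each update records the digest of its content, so a repeated update costs a reference rather than another ~20kB blob.

```python
import hashlib
import zlib


class SnapshotStore:
    """Hypothetical store: keeps one compressed blob per distinct snapshot."""

    def __init__(self):
        self.blobs = {}    # SHA-256 hex digest -> compressed snapshot bytes
        self.history = []  # digest of every saved update, in save order

    def save(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self.blobs:
            # Only unseen content is compressed and stored (zlib stands in for zstd).
            self.blobs[digest] = zlib.compress(content, 9)
        # Duplicates are recorded by reference so every update remains retrievable.
        self.history.append(digest)
        return digest

    def load(self, index: int) -> bytes:
        """Return the content of the index-th saved update."""
        return zlib.decompress(self.blobs[self.history[index]])

    def stored_bytes(self) -> int:
        """Total compressed bytes held, counting each distinct snapshot once."""
        return sum(len(blob) for blob in self.blobs.values())
```

    For example, saving versions v1, v2, v1 of a page records three updates but stores only two compressed blobs. Exact-match hashing only helps when an update is byte-for-byte identical to an earlier one; a single changed byte produces a new digest and a full new copy.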


  2. BitCollapse: Bitcoin's hard problem of latency

    High latency between miners gives an exponential advantage to miners closer to the highest concentration of compute. Unless this problem is solved, it is not possible to mine Bitcoin on multiple planets at the same time. Mining "collapses" to the single largest low-latency mining cluster.

    Consider the following thought experiment.

    There are exactly 2 computers in the universe, sitting 60 light-minutes apart. They are equally powerful, and both are spending all of their compute mining bitcoin. Suppose they both start mining the same block at the same time ("same time" according to an observer sitting halfway between them). After about 10 minutes, one of them (call it computer A) will have mined the first block. A sends it over to B.
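    A back-of-the-envelope calculation shows how much B can do while A's block is in flight. This sketch assumes block discovery is a Poisson process and that difficulty is set so the two computers together find one block every 10 minutes on average (so each alone finds one every 20 minutes); the function names are hypothetical. Under those assumptions, B expects to find 3 blocks of its own during a 60-minute delay, and the chance it finds at least one is about 95%.

```python
import math

# Assumption: the two equal computers together average one block per 10 minutes.
NETWORK_BLOCK_INTERVAL_MIN = 10.0


def expected_blocks(hashrate_share: float, delay_min: float,
                    interval_min: float = NETWORK_BLOCK_INTERVAL_MIN) -> float:
    """Expected blocks a miner finds during the delay (its rate is share / interval)."""
    return hashrate_share * delay_min / interval_min


def prob_at_least_one(hashrate_share: float, delay_min: float,
                      interval_min: float = NETWORK_BLOCK_INTERVAL_MIN) -> float:
    """P(at least one block) when block discovery is modeled as a Poisson process."""
    lam = expected_blocks(hashrate_share, delay_min, interval_min)
    return 1.0 - math.exp(-lam)


# Computer B holds half the hashrate; A's block takes 60 minutes to reach it.
print(expected_blocks(0.5, 60.0))    # 3.0
print(prob_at_least_one(0.5, 60.0))  # about 0.950
```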

    But it takes 60 minutes for the block to arrive at computer B. In that time, B will have mined an


  3. Why the hell am I building a product with a tiny market?

    This article was originally published on

    Two months ago, I launched a regex tester.

    Why would I ever build a product around helping people with their regular expressions? The market is tiny. There are dozens of free alternatives, and only a small percentage of the people I've asked said they would pay for my product.

    There are a few big advantages to competing in a smaller market.

    In a smaller market, everything happens on a smaller scale. Successes and failures are smaller in magnitude and take less time to pan out. Less effort is required to build a competitive product, since the existing ones are not as well-developed. The result is a tighter feedback loop for your learning. Debuggex is my first product, so I want to optimize for learning and profit rather than just profit.

    To date, I've spent less than
