


CRDT

đŸ”ïž 2025-11-29 The CRDT Dictionary: A Field Guide to Conflict-Free Replicated Data Types - Ian Duncan - Ian Duncan { www.iankduncan.com }



Explains how to design and use conflict free replicated data types to handle concurrent updates without coordination, walking through the core idea of lattices and monotone joins, state based vs operation based variants, and concrete structures for counters, sets, registers, maps, and sequences. Shows how different semantics arise (grow only, two phase, last write wins, add wins, multi value) and how to compose these pieces into practical data structures like shopping carts, collaborative text, and replicated maps, including causal and delta based optimizations.

Digs into the real tradeoffs: metadata growth, tombstones, garbage collection, causal tracking, bandwidth, and the need for supporting protocols like causal broadcast. Stresses that nothing is free; each structure trades coordination for more state and weaker semantics, so the right choice depends on operations needed, tolerance for lost updates, and operational constraints, with a strong push to treat CRDTs as a targeted tool to be combined and tuned rather than a default magic solution.


  • G-Counter: Grow-only counter where each replica keeps its own count and merge takes per-replica max; use for monotonic metrics like page views, likes, or any count that only increases.
  • PN-Counter: Counter built from two G-Counters (increments and decrements) whose values are subtracted; use for inventory, resource pools, or any count that must go up and down.
  • G-Set: Grow-only set that supports add and merge=union but no removals; use for append-only collections like tag registries, logs of seen items, or immutable membership.
  • 2P-Set: Two-phase set with separate grow-only add and remove sets where removal is permanent; use when elements can be created then permanently retired but never re-added (e.g., tombstones, revoked IDs).
  • LWW-Element-Set: Set that tracks per-element add/remove timestamps and lets the latest operation win; use when you need add/remove/re-add and can tolerate last-write-wins data loss (preferences, feature flags, cached sets).
  • OR-Set: Observed-remove set that tracks per-element tags so removes only delete observed additions, giving add-wins semantics; use when concurrent adds must never be lost (collaborative lists, shopping carts, shared sets); a minimal sketch follows this list.
  • LWW-Register: Single-value cell with a timestamped value where the latest timestamp wins; use for fields where occasional lost concurrent updates are acceptable (profile fields, cached config).
  • MV-Register: Multi-value register that stores all concurrent writes instead of discarding them; use when you must detect and resolve conflicts in application logic (collaborative text fields, conflict-aware configs).
  • Causal Register: Register keyed by version vectors that keeps only values with concurrent causal histories; use when you want MV-Register behavior plus precise causal conflict detection and better GC.
  • OR-Map: Map whose keys and/or values are backed by OR-Set semantics, often with nested CRDTs per value; use for replicated JSON-like documents, distributed configuration maps, and nested structures.
  • RGA (Replicated Growable Array): Sequence where elements have immutable IDs and parent links, supporting inserts-after and tombstoned deletes; use for collaborative text or lists where arbitrary-position inserts must merge cleanly.
  • WOOT: Sequence CRDT representing characters as objects with prev/next links and visibility flags, resolving order via constraints; use mainly as a historical or academic model, not typically in new production systems.
  • Logoot: Sequence CRDT assigning each element a dense ordered position identifier; use for collaborative sequences when you prefer position-based ordering over pointer-based structures.
  • LSEQ: Variant of Logoot with adaptive position allocation to keep identifiers shorter; use as a practical improvement over plain Logoot when identifier growth is a concern.
  • Tree CRDTs: Family of structures for replicated trees that preserve parent-child relationships under concurrency; use when you truly need CRDT-level guarantees over hierarchical data like file trees or document outlines.
  • OR-Tree: Tree CRDT that stores an OR-Set of parents per node and resolves parent conflicts with policies like LWW or first-wins; use for replicated hierarchies where concurrent moves must be reconciled automatically.
  • CRDT-Tree: Tree design that relies on causal ordering of move operations to pick winners; use when you already enforce causal delivery and want deterministic, causality-driven resolution of structural conflicts.
  • Log-based Trees: Tree approach that logs operations and rebuilds structure on read from a replicated log; use when reads can afford reconstruction cost and you want simple, append-only operational histories.
  • Delta CRDTs: Any state-based CRDT extended with a delta mechanism that sends only changes instead of full state; use whenever state is large or bandwidth is a concern, especially in production systems.
  • Causal CRDTs (e.g., Causal OR-Set, causal maps): CRDTs augmented with version vectors or similar clocks to track happens-before and prune dominated history; use when you need precise conflict classification and safer garbage collection.
  • Causal OR-Set: OR-Set variant that attaches version vectors to tags and uses them to decide what metadata can be safely discarded; use for long-lived sets where tag GC matters and causal tracking is already in place.
  • CheckpointedCRDT: Wrapper pattern that periodically compacts history into a baseline snapshot plus recent deltas; use when most replicas are online often and you want aggressive pruning at the cost of occasional full resyncs.
  • Observed-Remove Shopping Cart (OR-Set + PN-Counter): Composite CRDT mapping products to PN-Counters under OR-Set semantics; use for offline-capable carts where concurrent adds/removes and quantity changes must merge without data loss.
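
To make the add-wins behavior of the OR-Set concrete, here is a minimal state-based sketch in TypeScript. It is illustrative only: the class and field names are mine, and real implementations add causal metadata so removed tags can eventually be garbage-collected.

```typescript
// Minimal state-based OR-Set sketch (illustrative; not any particular library's API).
// Every add attaches a globally unique tag; remove tombstones only the tags it has observed,
// so a concurrent add with a fresh tag survives the remove: add-wins semantics.

type Tag = string;

class ORSet<E> {
  private adds = new Map<E, Set<Tag>>();   // element -> all tags ever added for it
  private removed = new Set<Tag>();        // tags that have been observed and removed

  constructor(private replicaId: string, private counter = 0) {}

  add(element: E): void {
    const tag = `${this.replicaId}:${++this.counter}`;  // unique per (replica, counter)
    if (!this.adds.has(element)) this.adds.set(element, new Set());
    this.adds.get(element)!.add(tag);
  }

  remove(element: E): void {
    const tags = this.adds.get(element);
    if (!tags) return;
    for (const tag of tags) this.removed.add(tag);      // tombstone only the observed tags
  }

  has(element: E): boolean {
    for (const tag of this.adds.get(element) ?? new Set<Tag>()) {
      if (!this.removed.has(tag)) return true;          // any live tag keeps the element in
    }
    return false;
  }

  merge(other: ORSet<E>): void {
    // Join = union of add-tags plus union of removed tags (commutative, associative, idempotent).
    for (const [element, tags] of other.adds) {
      if (!this.adds.has(element)) this.adds.set(element, new Set());
      for (const tag of tags) this.adds.get(element)!.add(tag);
    }
    for (const tag of other.removed) this.removed.add(tag);
  }
}
```

If replica A removes "milk" while replica B concurrently adds it again, A only tombstones the tags it had seen; B's fresh tag survives the merge, so "milk" stays in the set, which is the add-wins behavior described above.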

  1. The term “Conflict-free Replicated Data Type” was coined by Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski in their 2011 technical report “A comprehensive study of Convergent and Commutative Replicated Data Types” and the 2011 SSS conference paper “Conflict-free Replicated Data Types”. The theoretical foundations draw from earlier work on commutative replicated data types and optimistic replication.
  2. WOOT was introduced by Oster, Urso, Molli, and Imine in “Data Consistency for P2P Collaborative Editing” (2006). The name is a play on “OT” (Operational Transformation), emphasizing that it achieves similar goals “WithOut OT.” WOOT was one of the first practical sequence CRDTs and influenced many subsequent designs.
  3. State-based CRDTs are also called “convergent” replicated data types (CvRDT). The “Cv” stands for “convergent” - emphasizing that replicas converge to the same state by repeatedly applying the join operation.
  4. Operation-based CRDTs are also called “commutative” replicated data types (CmRDT). They require causal delivery of operations - if operation A happened before operation B on the same replica, B must not be delivered before A at any other replica.
  5. The G-Counter appears in Shapiro et al.’s 2011 technical report “A Comprehensive Study of Convergent and Commutative Replicated Data Types” as one of the foundational examples demonstrating CRDT principles.
  6. The space complexity is O(n) where n is the number of replicas, not the number of increments. This means G-Counters scale well with the number of operations but require tracking all replicas that have ever incremented the counter.
  7. The OR-Set (Observed-Remove Set) was introduced by Shapiro et al. in their 2011 technical report. It’s also known as the “Add-Wins Set” because concurrent add and remove operations result in the element remaining in the set. The key innovation is using unique tags to distinguish between different additions of the same element.
  8. Sequence CRDTs are particularly challenging because positional indices change as elements are inserted or deleted. Unlike sets or counters where elements have stable identity, sequences must maintain ordering despite concurrent modifications at arbitrary positions.
  9. RGA was introduced by Roh et al. in “Replicated Abstract Data Types: Building Blocks for Collaborative Applications” (2011). The name “Replicated Growable Array” emphasizes that it’s an array-like structure that can grow through replication.
  10. YATA (Yet Another Transformation Approach) was developed by Kevin Jahns for the Yjs collaborative editing library. It combines ideas from RGA and WOOT while optimizing for the common case of sequential insertions (typing). Yjs is used in production by companies like Braid, Row Zero, and others for real-time collaboration.
  11. Version vectors were introduced by Parker et al. in “Detection of Mutual Inconsistency in Distributed Systems” (1983). They extend Lamport’s logical clocks to track causality in distributed systems. Each replica maintains a vector of logical clocks (one for each replica), enabling precise causal ordering without requiring synchronized physical clocks.
  12. Delta CRDTs were introduced by Almeida, Shoker, and Baquero in “Delta State Replicated Data Types” (2018). They bridge the gap between state-based and operation-based CRDTs, achieving operation-based bandwidth efficiency while maintaining state-based simplicity. Most production CRDT systems (Riak, Automerge) use delta-state internally.
  13. Logoot was introduced by Weiss, Urso, and Molli in “Logoot: A Scalable Optimistic Replication Algorithm for Collaborative Editing” (2009). The name combines “log” (logarithmic complexity) with “oot” from WOOT, its predecessor. Logoot’s position-based approach influenced many subsequent CRDTs including LSEQ and Treedoc.
  14. LSEQ was introduced by NĂ©delec, Molli, MostĂ©faoui, and Desmontils in “LSEQ: An Adaptive Structure for Sequences in Distributed Collaborative Editing” (2013). The key innovation is using different allocation strategies (boundary+ vs boundary-) based on tree depth, which keeps position identifiers shorter in practice compared to Logoot’s fixed strategy.
  15. Automerge, created by Martin Kleppmann and collaborators, implements a JSON CRDT described in “A Conflict-Free Replicated JSON Datatype” (2017). It uses a columnar encoding for efficiency and has been rewritten in Rust for performance. Used by production apps like Inkandswitch’s Pushpin.
  16. Yjs, created by Kevin Jahns, is optimized for text editing and uses the YATA algorithm. It’s notably faster than Automerge for text operations and includes bindings for popular editors like CodeMirror, Monaco, Quill, and ProseMirror.
  17. Riak, a distributed database from Basho, was one of the first production systems to adopt CRDTs (2012). It implements counters, sets, and maps as native data types, using Delta CRDTs internally to minimize bandwidth. Sadly, the company collapsed dramatically, and the project was abandoned for quite some time. I think it’s still around in a diminished form, but haven’t tried it in a while.
  18. Redis Enterprise’s CRDT support (Active-Active deployment) uses operation-based CRDTs with causal consistency. It supports strings, hashes, sets, and sorted sets with CRDT semantics, enabling multi-master Redis deployments.
  19. AntidoteDB is a research database from the SyncFree project that makes CRDTs the primary abstraction. Unlike other databases where CRDTs are a feature, AntidoteDB is designed from the ground up around CRDT semantics, providing highly available transactions over CRDTs.

2025-12-06 Martin Kleppmann - CRDTs: The hard parts - YouTube { www.youtube.com }

The talk introduces conflict free replicated data types as a way to build collaboration software where several people can edit shared state, such as documents, graphics, or task boards, even while offline, and then have all changes merged automatically without manual conflict resolution.

A conflict free replicated data type is a data structure that guarantees all replicas end up in the same state after exchanging all updates, without needing central coordination.

One established approach to collaborative text editing is operational transformation, where every change is recorded as an operation like insert or delete at a numeric index in the document, and when concurrent edits arrive, their positions are transformed so they still apply correctly to the modified document, a process that assumes all operations are totally ordered by a single server.

Operational transformation is a method where edits are indexed by position and later adjusted so they still make sense after other edits change the document.

A key limitation of that older family of algorithms is the reliance on a central server that sequences all edits, which prevents using peer to peer channels, local networks, or offline media to synchronize, because any side channel would break the single ordered stream of operations the method depends on.

The newer family of replicated data types solves the same general problem but avoids indexes and central ordering by giving each element in the document a unique identifier, allowing edits to commute regardless of network topology, and targeting a core correctness property called convergence: if two replicas have seen the same set of operations, they must be in the same state, no matter the order of delivery.

Convergence means that any two replicas that have processed the same updates, in any order, must show exactly the same data.

However, convergence alone is not sufficient, because it says only that everyone ends up in the same state, not that this state is meaningful or desirable for users; many simple designs converge to results that are technically consistent but clearly wrong or unusable from a human perspective, so additional constraints and better algorithms are needed.

One common design for text in these data types is to represent each character with a fractional position between 0 and 1 instead of an index, assigning numbers like 0.2, 0.4, 0.6, and 0.8 to successive characters, and when inserting between two positions, choosing any number between them, perhaps randomly, which allows new characters to be ordered without shifting indexes.

With that scheme, if two people independently insert different words at the same place, they both generate multiple positions in the same numeric interval between two existing characters, and when those sets of characters are merged and sorted by position, the letters from both words can become arbitrarily interleaved, producing output that is a jumble of mixed characters rather than two readable words in sequence.
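
A few lines make the failure mode visible (my own toy illustration, not code from the talk): sorting characters by fractional position merges two concurrent insertions into an interleaved jumble.

```typescript
// Toy model: each character is keyed by a fractional position, and the document is whatever
// falls out of sorting by position. Two users concurrently insert different words into the
// same gap between 0.4 and 0.6, each picking arbitrary fractions; the merge interleaves them.

type Char = { pos: number; ch: string };

const base: Char[] = [{ pos: 0.2, ch: "H" }, { pos: 0.4, ch: "e" }, { pos: 0.6, ch: "l" }];

const userA: Char[] = [{ pos: 0.45, ch: "A" }, { pos: 0.55, ch: "A" }];   // inserts "AA"
const userB: Char[] = [{ pos: 0.48, ch: "B" }, { pos: 0.52, ch: "B" }];   // inserts "BB"

const merged = [...base, ...userA, ...userB].sort((a, b) => a.pos - b.pos);
console.log(merged.map(c => c.ch).join("")); // "HeABBAl" - neither "AABB" nor "BBAA" survives intact
```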

This interleaving anomaly has been found in at least two specific list based algorithms in the literature, where the design of their position identifiers makes it impossible to prevent such mixing without completely changing the algorithm, while other schemes that do not suffer from this issue, like some tree based ones, are much less efficient and were partly the motivation for the problematic designs.

An interleaving anomaly is when concurrent inserts at the same place are merged in a way that mixes characters or chunks from different users, producing unreadable or surprising text.

Another widely discussed list algorithm uses a different structure: each inserted element remembers its predecessor at the time of insertion, forming a tree like structure based on cursors; this avoids arbitrary character level interleaving in typical use, but still allows block level interleaving where whole words or segments can end up woven between each other under some cursor movement patterns.

In this predecessor based structure, a user might type “dear reader” by first inserting “reader” and then moving the cursor back and inserting “dear”, while another user inserts “Alice” at the same place; depending on the order of operations, outcomes like “hello dear Alice reader” are allowed, which are sometimes acceptable but show that concurrent insertions can slip between earlier insert segments.

The worst theoretical behavior of that structure occurs if a user types the entire document backwards, constantly jumping the cursor to the front, which would allow arbitrary character interleaving, but under the realistic assumption that people mostly type forward with occasional cursor moves, the problem is much smaller, and this makes that algorithm more attractive than ones with inherent character scrambling.

The speaker and collaborators prefer this predecessor based approach over the more pathological schemes and have developed an extended version that eliminates even these less severe interleavings by refining the insertion rules; the details and proofs are in a separate research paper that formalizes the problem and presents a corrected variant.

The talk then turns to moving items in lists, as in a to do app where a user drags an item like “Phone Joe” to the top, and points out that existing list structures built for text only support insertion and deletion, so developers often simulate a move by deleting at the old position and inserting at the new one.

If two replicas simulate a move in this way and both perform the same move concurrently, each one deletes the old item once but inserts it at the new position twice, so when their changes are merged the list contains duplicated items, which is not what users expect when they drag a single entry.

To define more reasonable behavior, the speaker considers the case where two people move the same item to different places: instead of duplicating, a more useful semantics is that the item appears only once in the final list, at one of the requested positions chosen arbitrarily but deterministically, which mirrors how a last writer wins register handles conflicting updates.

A last writer wins register is a variable where concurrent writes are resolved by picking a single winner deterministically, usually based on timestamps or IDs.

Using this analogy, each list item can have an associated value that describes its position, stored in a last writer wins register, and when different replicas move the item, they simply assign different position values; when merged, the register chooses one winning position, so the item appears only once, and all replicas agree on where it ended up.

To implement this, the design reuses any existing list structure that already produces stable, unique position identifiers for arbitrary insertion points, and combines it with a set structure that holds items and their position registers, so moving an entry becomes “allocate a new position ID where you want it to appear, and update the item’s register to that ID,” giving a move operation that works with any underlying list algorithm.

This compositional construction yields a new list data type that supports atomic moves of single items with sensible semantics under concurrency, without modifying the underlying sequence structure, and shows how combining simple replicated components such as sets, registers, and list position IDs can express more complex operations.
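
A sketch of that composition, under my own naming (the talk does not prescribe an API): each item's position lives in a last-writer-wins register, and a move writes a freshly allocated position identifier into it.

```typescript
// Sketch of "move = update an LWW position register". Assumes some underlying list CRDT can
// hand out stable position identifiers (PositionId); the helper names are mine.

type Timestamp = number;                  // stand-in for a Lamport timestamp or HLC
type PositionId = string;                 // stable ID from the underlying sequence CRDT

interface LWWRegister<T> { value: T; ts: Timestamp; }

function writeLWW<T>(reg: LWWRegister<T>, value: T, ts: Timestamp): LWWRegister<T> {
  return ts > reg.ts ? { value, ts } : reg;     // later timestamp wins; ties broken elsewhere
}

function mergeLWW<T>(a: LWWRegister<T>, b: LWWRegister<T>): LWWRegister<T> {
  return b.ts > a.ts ? b : a;                   // merge keeps the register with the newer write
}

// Items and their current positions; the visible list is the items sorted by register value.
const positions = new Map<string, LWWRegister<PositionId>>();

function move(itemId: string, newPos: PositionId, ts: Timestamp): void {
  const current = positions.get(itemId) ?? { value: newPos, ts: -1 };
  positions.set(itemId, writeLWW(current, newPos, ts));
}

// If two replicas move the same item concurrently, merging their registers picks exactly one
// winning position, so the item appears once instead of being duplicated.
```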

The talk then examines moving ranges of text, such as moving a whole list item represented as a line of characters with a bullet and newline, and shows a counterexample where one replica moves the range “milk” above “bacon” while another edits “milk” into “soy milk”; the intuitive desired result is that the edited text appears in its new position.

Applying the single item move construction naïvely to each character or range causes edits to remain tied to the old positions, so the moved copy is “milk” while the edit turns the original location into a partially applied change like a stray “soy m” after “bacon”, revealing that properly moving segments while preserving concurrent edits is significantly harder than moving individual logical items.

The speaker notes that they do not yet have a fully satisfactory and safe general solution for moving ranges of characters with correct interaction with concurrent edits, and that this remains an open research question that others are invited to work on.

The next topic is moving nodes in tree shaped data structures such as file systems, JSON documents, or XML, where nodes represent directories or objects and operations can move an entire subtree from one parent to another, and the same concurrency issues appear when multiple replicas move the same node.

If one replica moves a node under B and another concurrently moves it under C, simple strategies include duplicating the subtree so it appears under both parents, or treating the structure as a general graph where a node can have multiple parents, but both are undesirable in many applications that expect a proper tree, so again the best option is to choose one destination as the winner and discard the other move.

Trees add a second challenge absent from linear lists: cycles, as illustrated by trying to move directory A into its own child B or more subtly by two replicas moving A under B and B under A concurrently, which can create a loop in the parent pointers and break the tree structure if not detected and prevented.

Real systems like file systems detect direct self moves and reject them as invalid, but concurrent cross moves from different replicas cannot be caught locally in the same simple way, and the talk describes experiments where cloud storage failed with vague errors under such patterns, motivating a rigorous algorithm that guarantees the tree remains acyclic.

To handle these moves, each operation is represented with a globally unique timestamp (for example a Lamport clock), the identifier of the node being moved, the new parent, and some metadata like a local name, and all operations on a replica are conceptually ordered by their timestamps so that the effect of concurrent moves can be judged in a consistent historical order.

Because operations may arrive out of timestamp order, the algorithm maintains a log that supports undo and redo: when an operation with an earlier timestamp arrives, the system temporarily undoes all later operations, applies the new one, and then reapplies the undone ones, so that, in effect, the tree has been modified as if all operations were processed strictly in timestamp order.

The cost of this backward and forward replay grows with the number of operations processed, but experiments with three replicas on different continents performing many moves show that even with this overhead, a simple implementation can handle on the order of hundreds of moves per second, which is sufficient for interactive applications where humans generate edits relatively slowly.

Within this framework, the algorithm defines an ancestor relation on nodes: a is an ancestor of b if it is the parent of b or the parent of some ancestor of b; before applying a move of child under parent, it checks whether child is already an ancestor of parent or identical to parent, and if so, it discards the move because it would introduce a cycle.

If the move passes this check, the operation removes the old parent child edge from the tree and inserts the new one with its metadata; the authors prove that this preserves the tree properties of unique parents and absence of cycles, and that for any set of moves, the final tree is the same on all replicas, so the structure is a valid replicated data type.
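
A condensed sketch of the cycle check at the core of that algorithm, with data layout and helper names of my own choosing; the timestamp-ordered undo/redo replay described earlier is assumed to happen around it.

```typescript
// Sketch of the safety check for replicated tree moves: before re-parenting `child` under
// `newParent`, reject the move if `child` is `newParent` or an ancestor of it, which would
// create a cycle. Applying moves in timestamp order (undoing and redoing later operations as
// needed) is assumed to happen around this function, as described in the talk.

type NodeId = string;
const parentOf = new Map<NodeId, NodeId>();   // child -> parent edges of the current tree

function isAncestor(maybeAncestor: NodeId, node: NodeId): boolean {
  let current: NodeId | undefined = parentOf.get(node);
  while (current !== undefined) {
    if (current === maybeAncestor) return true;
    current = parentOf.get(current);          // walk up the parent pointers toward the root
  }
  return false;
}

interface MoveOp { time: number; child: NodeId; newParent: NodeId; }

function applyMove(op: MoveOp): void {
  // Discard moves that would make a node its own ancestor; every replica applies the same
  // deterministic rule in the same timestamp order, so all replicas converge on the same tree.
  if (op.child === op.newParent || isAncestor(op.child, op.newParent)) return;
  parentOf.set(op.child, op.newParent);
}
```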

The final major topic is performance and space overhead, especially for text, where each character carries not only its byte of content but also a position identifier, an actor identifier, and additional metadata, so the per character overhead can easily be tens or hundreds of bytes, making naive implementations impractical.

The speaker reports on work in the Automerge project using a real dataset: the full editing history of an academic paper written in a custom editor that logged every keystroke and cursor move, producing a final LaTeX file of about 100 kilobytes and roughly 300,000 recorded changes including insertions, deletions, and cursor movements.

Storing this history as a simple JSON log of operations yields about 150 megabytes, which compresses to around 6 megabytes with gzip, but by redesigning the storage format they can encode the same full history in about 700 kilobytes, a roughly 200x improvement over the naive encoding, without losing any information about past edits.

They then explore further tradeoffs: discarding cursor movement events reduces the size by roughly a fifth, discarding full editing history while keeping only the data needed to merge the current state cuts it further down to a few hundred kilobytes, and if one also removes tombstones that track deleted characters, the metadata overhead shrinks to on the order of tens of kilobytes.

Tombstones in this context are markers that remember where deleted elements used to be so that concurrent edits can still be merged correctly.

One version of the compressed format, with history for text but not cursors and with merge relevant metadata retained, gzips to almost the same size as the raw LaTeX text, showing that with careful design, these data types can be implemented with overhead comparable to traditional version control while still supporting rich merging and offline edits.

The compression method keeps the idea of storing all operations with unique identifiers, often Lamport timestamps composed of a counter and an actor ID, and references predecessors (as in the predecessor based list algorithm) to specify where new characters are inserted, but organizes these operations into columns and encodes each column separately.

For a simple example, operations are tabulated with columns for timestamp counter, actor ID, predecessor reference, inserted text, length of the inserted UTF 8 sequence, and flags for deletion, then numeric columns are delta encoded so that successive values become small differences, run length encoded where repeated values occur, and finally written using variable length integer encoding that uses fewer bytes for small numbers.

The text column is compacted by concatenating the bytes of all inserted characters while lengths and deletion flags allow reconstructing which subsequences belong to which operations, so together with some modest metadata about event grouping and ranges of counters, the system can reconstruct the document at any past time while storing the entire operation log in a very compact binary representation.
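
As a toy illustration of the delta-plus-varint idea (not Automerge's actual columnar format), here is how a column of operation counters shrinks:

```typescript
// Illustrative column compression: delta-encode a mostly-increasing numeric column, then
// write each delta as an unsigned LEB128-style varint, so small numbers take a single byte.
// This is a toy version of the idea, not the real encoding.

function deltaEncode(values: number[]): number[] {
  const deltas: number[] = [];
  let prev = 0;
  for (const v of values) { deltas.push(v - prev); prev = v; }
  return deltas;
}

function encodeVarint(n: number, out: number[]): void {
  // Unsigned varint; assumes non-negative deltas for simplicity (real formats zig-zag encode).
  do {
    let byte = n & 0x7f;
    n = Math.floor(n / 128);
    if (n > 0) byte |= 0x80;
    out.push(byte);
  } while (n > 0);
}

const counters = [100, 101, 102, 103, 110, 111];   // e.g. a column of Lamport counters
const bytes: number[] = [];
for (const d of deltaEncode(counters)) encodeVarint(d, bytes);
console.log(bytes.length); // 6 bytes total instead of several bytes per absolute value
```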

The question period turns to delta based replicated data types, which combine characteristics of state based and operation based approaches by aggregating several adjacent changes into small deltas that can be merged idempotently, and the speaker notes that while this is natural for counters and sets, it is less useful for text and list structures where operations are insertions and deletions at specific places rather than arithmetic updates.

A delta based replicated structure sends compact summaries of recent changes instead of individual operations or full state, but still provides a merge function that can be applied repeatedly without changing the result.

Another question concerns snapshotting or garbage collecting the operation log used for undo and redo; the answer is that logs can be truncated safely once causal stability is reached, meaning all nodes are known to have applied all operations up to some timestamp, beyond which no older operations will arrive, but determining that point is hard in practice because a single offline node can delay the stability frontier.

Causal stability is the point in time up to which every replica has seen all updates, so older metadata can be safely discarded.

There is discussion about whether it still makes sense to use these replicated structures when a system already uses a single server for synchronization, and the speaker explains that historically operational transformation had an efficiency advantage for plain text in such settings, but the new metadata compression makes the newer approach competitive, while the latter also scales better to richer data types and multi data center server replication.

Comparing the two families, the speaker suggests that if a system only needs plain linear text and can rely on a robust single sequencer with well tested implementations, the older approach can be acceptable, but for applications that need trees, complex documents, or server side replication across data centers, the more recent data types provide a simpler correctness story and avoid the fragile single server requirement.

The final question asks about implementing modal editors like Vim on top of these structures, and the response is that most editor commands ultimately decompose into insertions, deletions, cuts, copies, and moves of ranges, which can in principle be expressed using the foundational operations discussed, though there is still open work on recognizing and coalescing sequences like cut then paste into semantic moves.

Throughout the answers, the speaker emphasizes that beyond formal convergence, the ultimate test for a merging strategy in editors is whether it matches human expectations in real use, and that much of the ongoing research is about refining the behavior of these replicated data types until they converge not just to a single state, but to one that users experience as natural and correct.

2025-11-30 dotJS 2019 - James Long - CRDTs for Mortals - YouTube { www.youtube.com }


2025-11-30 jlongster/crdt-example-app: A full implementation of CRDTs using hybrid logical clocks and a demo app that uses it { github.com }

This is a demo app used for my dotJS 2019 talk "CRDTs for Mortals"

Slides here: https://jlongster.com/s/dotjs-crdt-slides.pdf

View this app here: https://crdt.jlongster.com

It contains a full implementation of hybrid logical clocks to generate timestamps for causal ordering of messages. Using these timestamps, CRDTs can be easily used to change local data that also syncs to multiple devices. This also contains an implementation of a merkle tree to check consistency of the data to make sure all clients are in sync.

It provides a server to store and retrieve messages, so that clients don't have to connect peer-to-peer.

The entire implementation is tiny, but provides a robust mechanism for writing distributed apps:

  • Server: 132 lines of JS
  • Client: 639 lines of JS

(This does not include main.js in the client which is the implementation of the app. This is just showing the tiny size of everything needed to build an app)



The talk starts from the question of why apps that work offline by design have not become common. The core claim is that making everything local - all code and data stored on the device - is straightforward, but the hard part is syncing that local state across devices without data loss or scary "changes may not be saved" errors.

The key step is to recognize that a local app used on multiple devices is a distributed system. Each device runs its own copy, can go offline, make changes, then later reconnect, and all those independent histories must be merged safely.

The speaker describes building a personal finance app that is fully local but syncs across devices. The design goals are instant offline availability, high speed, strong privacy, and the ability to run arbitrary queries, all of which naturally follow when all data lives on the device.

Because all data is local, the app can expose a query interface that directly compiles user input to SQLite, allowing custom reports and even query-like code from the user. This would be unsafe or unacceptable in a cloud environment, where arbitrary code on the server is a security and reliability risk, but is fine when it only touches the user's own local database.

The need for a mobile client to record transactions on the go forced the creation of a sync engine. The app's data is small - a few megabytes in SQLite - and the author refused to switch databases because SQLite's extremely fast reads are central to the user experience, so syncing had to be built as a thin layer on top of SQLite rather than as a replacement.

Syncing is described as hard because of two fundamental challenges: unreliable ordering of changes between devices, and conflicts when multiple devices edit the same data. The solution must run correctly 100 percent of the time, with no data loss and no irrecoverable states, because the app is local and cannot be "fixed" by refreshing a browser tab.

Unreliable ordering arises because different devices make changes in parallel and receive each other's updates at different times. If each device simply applied incoming operations in whatever order they arrived, the final states would diverge, since one client might apply A, C, D, B and another might apply B, A, D, C.

Back end systems traditionally deal with this by enforcing strong consistency, which relies on heavy coordination and complex algorithms. The talk instead advocates eventual consistency, where the system accepts that multiple timelines exist and is designed so that once every device has seen all the same changes, they all converge to the same state, regardless of the order in which those changes arrived.

Eventual consistency means every copy of the data ends up the same after all changes have been delivered, even if they were applied in different orders.

To get convergence under reordering, each change needs a timestamp that encodes its position relative to other changes on that device, not a wall clock time. The timestamp must capture what events the device had already seen when the new change was made, so that later merges can respect causal order.

The solution is to use logical clocks, such as vector clocks or hybrid logical clocks, that exist per device and generate timestamps that can be compared using a simple less-than comparison. The talk focuses on hybrid logical clocks, which produce string timestamps that combine physical time with logical counters while remaining easy to serialize and compare.

A hybrid logical clock is a per-device counter that mixes real time and a logical sequence so you can tell which of two events happened "later" without relying on a perfectly accurate clock.

Each change gets an HLC timestamp, and operations like "set X to value" are ordered by comparing these timestamps. In a last-write-wins strategy, the change with the larger timestamp wins. The important point is that these timestamps are not trusted as actual times, only as a consistent way to order events, and the full implementation can fit in a couple hundred lines of JavaScript without dependencies.
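
A compact hybrid logical clock sketch along those lines, with a field layout and string format of my own choosing:

```typescript
// Minimal hybrid logical clock: each timestamp combines wall-clock millis with a logical
// counter, so events can be ordered with a plain comparison even when clocks drift.
// Illustrative sketch; the node ID is appended only as a tie-breaker.

class HLC {
  private millis = 0;
  private counter = 0;

  constructor(private nodeId: string) {}

  // Called when this node creates a local event (e.g. a new message).
  now(): string {
    const wall = Date.now();
    if (wall > this.millis) { this.millis = wall; this.counter = 0; }
    else { this.counter++; }                       // same millisecond: bump the logical counter
    return this.format();
  }

  // Called when a remote timestamp is received, so local time never falls behind it.
  receive(remote: string): void {
    const [remoteMillis, remoteCounter] = remote.split(":").map(Number);
    const maxMillis = Math.max(Date.now(), this.millis, remoteMillis);
    if (maxMillis === this.millis && maxMillis === remoteMillis) {
      this.counter = Math.max(this.counter, remoteCounter) + 1;
    } else if (maxMillis === this.millis) {
      this.counter++;
    } else if (maxMillis === remoteMillis) {
      this.counter = remoteCounter + 1;
    } else {
      this.counter = 0;
    }
    this.millis = maxMillis;
  }

  private format(): string {
    // Zero-padded so that plain string comparison matches timestamp order.
    return `${String(this.millis).padStart(15, "0")}:${String(this.counter).padStart(5, "0")}:${this.nodeId}`;
  }
}
```

Because the string is zero-padded, a plain `<` comparison between two timestamps matches their clock order, which is all the last-write-wins logic needs.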

Even with reliable ordering, conflicts still happen when two devices set the same field while offline and then sync later. Many existing systems hand this problem off to developers by requiring manual conflict resolution logic, but the speaker argues this is unrealistic and error-prone, because conflict handling is subtle and must be designed into the data model from the beginning, not tacked on afterward.

Conflict-free replicated data types are presented as the solution to conflict handling in a distributed setting. These are special data structures that are designed so that concurrent updates can always be merged automatically in a well-defined way.

A CRDT is a data structure that you can copy to many devices and update in any order, and it will still end up the same everywhere when you merge the changes.

The specific flavor of these structures that matters in practice is defined by two properties. Operations must be commutative, meaning applying changes in different orders gives the same result, and idempotent, meaning applying the same change more than once does not change the result after the first time.

Commutative means you can swap the order of two operations and still get the same final state.

Idempotent means doing the same operation multiple times has the same effect as doing it once.

An example structure is a last-write-wins map. Here, each update to a property carries a timestamp, and when applying a change, the system checks whether the new timestamp is later than the one already stored. If it is later, it overwrites the value; if it is earlier, it is ignored. Because only the update with the newest timestamp is used per property, applying the same set of updates in any order yields the same map.

A last-write-wins map is a key-value map where, for each key, the value from the newest timestamp always wins over older values.
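
Written out, the structure is only a few lines (a sketch; the names are mine):

```typescript
// Last-write-wins map sketch: each key stores its value together with the timestamp of the
// write that produced it, and an incoming update is applied only if its timestamp is newer.
// Applying the same set of updates in any order therefore yields the same map.

type Entry<V> = { value: V; ts: string };

class LWWMap<V> {
  private entries = new Map<string, Entry<V>>();

  apply(key: string, value: V, ts: string): void {
    const existing = this.entries.get(key);
    if (!existing || ts > existing.ts) {          // padded HLC strings compare lexically
      this.entries.set(key, { value, ts });
    }                                             // older or duplicate updates are ignored
  }

  get(key: string): V | undefined {
    return this.entries.get(key)?.value;
  }
}
```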

Another example is a grow-only set, where elements can be added but never removed. Duplicate additions have no effect because membership is just true or false for each id, and once true it stays true. In a distributed setting, the reason nothing is ever removed is that future changes might still reference an element that has not yet been seen locally, so permanent deletion would make merging unsafe.

A grow-only set is a set where you can only add elements and never delete them, so any number of adds for the same element has the same effect as one add.

To bring these ideas into a relational world, a SQLite table is treated as a grow-only set of last-write-wins maps, one map per row. The concrete implementation adds a single messages table to the database that records every change ever observed, whether created locally or received from another device.

Each message row contains a timestamp, the target dataset or table name, the row id, the column name, and the new value. Applying a message is conceptually like selecting a cell at (table, row id, column) and writing the value there, but only if the timestamp is newer than whatever has been recorded before for that cell.

If a message refers to a row id that does not yet exist, the system creates that row on the fly and sets the specified column. Over time, as more messages arrive, rows get more fields filled in, so the full relational data structure emerges from this stream of CRDT updates.

Reads stay simple and fast because the app still uses plain SQLite queries to access the reconstructed tables. Writes are routed through helper functions such as an update function that takes the table name, row id, and changed fields, generates messages with timestamps, and feeds them through the same sync pipeline that handles incoming messages from other devices.

Deletion is handled using tombstones instead of actually removing rows. A delete function generates a message that sets a special tombstone field on the row to 1, and read queries are written to ignore rows with the tombstone set. The row remains present in the underlying grow-only set so that future sync operations can still reason about it correctly.

A tombstone in this context is a flag on a record that marks it as deleted without physically removing it from the data set.
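
A sketch of the write path just described; the function and field names are my own stand-ins, not the demo app's actual API:

```typescript
// Sketch of routing writes through per-column messages. Each changed field becomes one
// message (timestamp, table, row, column, value); deletion sets a tombstone column to 1
// instead of removing the row. The same messages are applied locally and sent to peers.

interface Message {
  timestamp: string;               // HLC timestamp, e.g. in the format of the clock sketch above
  dataset: string;                 // table name
  row: string;                     // row id
  column: string;
  value: unknown;
}

declare function nextTimestamp(): string;                  // assumed HLC source
declare function applyAndBroadcast(msgs: Message[]): void; // assumed sync pipeline

function update(dataset: string, row: string, fields: Record<string, unknown>): void {
  const messages = Object.entries(fields).map(([column, value]) => ({
    timestamp: nextTimestamp(), dataset, row, column, value,
  }));
  applyAndBroadcast(messages);
}

function remove(dataset: string, row: string): void {
  // Tombstone instead of a physical delete; reads filter out rows where tombstone = 1.
  update(dataset, row, { tombstone: 1 });
}
```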

To keep devices efficiently in sync, the system can use a Merkle tree built over the set of timestamps. This tree of hashes summarizes which changes a device has seen, so two clients can quickly compare trees to figure out what messages they are missing and only exchange the necessary differences.

A Merkle tree is a tree of hashes that lets two sides compare large sets of data by comparing small hash values instead of every item.
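
As a simplified, flat stand-in for that idea (not the demo app's actual Merkle trie), replicas can hash their timestamps into per-minute buckets and compare only the bucket hashes; a real Merkle tree nests these buckets under a single root hash so the comparison can stop early.

```typescript
// Summarize which messages a replica has seen by hashing its timestamps into per-minute
// buckets. Two replicas exchange the small bucket->hash maps and re-request only the buckets
// that differ, instead of comparing every message.

function bucketKey(hlcTimestamp: string): string {
  const millis = Number(hlcTimestamp.split(":")[0]);
  return String(Math.floor(millis / 60_000));          // one bucket per minute
}

function hashCombine(acc: number, s: string): number {
  let h = acc;
  for (let i = 0; i < s.length; i++) h = (h * 31 + s.charCodeAt(i)) >>> 0;
  return h;
}

function summarize(timestamps: string[]): Map<string, number> {
  const buckets = new Map<string, number>();
  for (const ts of [...timestamps].sort()) {           // canonical order => deterministic hashes
    const key = bucketKey(ts);
    buckets.set(key, hashCombine(buckets.get(key) ?? 0, ts));
  }
  return buckets;
}

function differingBuckets(a: Map<string, number>, b: Map<string, number>): string[] {
  const keys = new Set([...a.keys(), ...b.keys()]);
  return [...keys].filter(k => a.get(k) !== b.get(k)); // only these ranges need re-syncing
}
```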

The architecture lends itself to end-to-end encryption and a very lightweight sync server, because the server only needs to accept messages and send them back out, without needing to understand or inspect the actual data contents.

The talk emphasizes that data shapes should be designed to avoid conflicts altogether when possible. For example, a mapping table can be used so that category ids from different devices are mapped into a canonical set of categories, ensuring that cases like "item added in a category that was deleted elsewhere" resolve automatically to a safe default without manual conflict code.

The resulting sync implementation is surprisingly small: the server side is roughly a hundred lines of JavaScript that just stores and forwards messages, and the client side - including database handling, clocks, and CRDT logic - is only a few hundred lines with minimal dependencies. This demonstrates that robust local-first sync can be achieved with compact, understandable code.

The conclusion is that fully local applications provide a far superior experience in speed, offline behavior, privacy, and flexibility, and developers are encouraged to explore this direction using CRDTs, simple logical clocks, and deliberately small implementations instead of relying on complex, heavyweight systems.

2025-11-30 John Mumm - A CRDT Primer: Defanging Order Theory - YouTube { www.youtube.com }



Imagine we are building the Birdwatch app from the talk: people click a little bird icon on a post, and we want to count how many times that has happened across several servers.

Step 1: We decide that each server will keep its own local copy of the counter, but instead of storing a single integer, each server stores a vector of integers. If we have three servers, the state looks like [c0, c1, c2], where c0 is how many clicks server 0 believes it has processed, c1 is how many clicks it believes server 1 has processed, and so on. At the very beginning, all servers start at [0, 0, 0].

Step 2: A user request to click the bird hits server 0. Server 0 handles that click by applying the local update. The update rule is: "increment my own slot in the vector." Since this is server 0, it increments the first component and changes its local state from [0, 0, 0] to [1, 0, 0]. The other servers have not seen this yet, so they still sit at [0, 0, 0].

Step 3: Another user click arrives at server 2. Server 2 uses the same rule, but on its own index. It increments the third component and changes its local state from [0, 0, 0] to [0, 0, 1]. Now the system has two different local views: server 0 believes the state is [1, 0, 0], server 2 believes it is [0, 0, 1], and server 1 still believes [0, 0, 0].

Step 4: Periodically, servers gossip their state to each other. Suppose server 0 sends its state [1, 0, 0] to server 2. When a server receives a remote state, it merges it into its own local one using the merge function. The merge rule is: "take the componentwise maximum." Server 2 merges [0, 0, 1] (its own) and [1, 0, 0] (received) and gets [max(0,1), max(0,0), max(1,0)] = [1, 0, 1]. After this merge, server 2 now knows that server 0 has seen one click and server 2 itself has seen one click.

Step 5: At any point, a client can ask a server, "what is the current value of the counter?" The rule for answering is simple: sum all components of the local vector. For server 2, which now holds [1, 0, 1], the visible count is 1 + 0 + 1 = 2. That is exactly the total number of clicks the whole system has processed so far, even though not all servers know this yet.

Step 6: A third click arrives, this time at server 1. Using the same update rule, server 1 increments its own slot and changes its local state from [0, 0, 0] to [0, 1, 0]. Now the true global situation, if we conceptually add everything up, is three clicks: one at server 0, one at server 1, and one at server 2. But the replicas do not yet all agree.

Step 7: Gossip continues. Suppose server 1 sends [0, 1, 0] to server 0. Server 0 merges its own [1, 0, 0] with that remote state componentwise and gets [max(1,0), max(0,1), max(0,0)] = [1, 1, 0]. Server 0 now believes one click happened on itself and one on server 1, but still knows nothing about server 2. If a client queries server 0 at this moment, it answers 1 + 1 + 0 = 2, which is slightly behind the real total of three, but it is not wrong with respect to anything it has seen.

Step 8: Later, server 2 gossips [1, 0, 1] to server 1. Server 1 merges [0, 1, 0] and [1, 0, 1] and gets [1, 1, 1]. Now server 1 has a full picture: one click per server. A query to server 1 now gets 1 + 1 + 1 = 3, which matches the true global count. Nothing has forced all servers to synchronize at once; this has happened through normal asynchronous gossip and merging.

Step 9: Eventually server 1 will gossip [1, 1, 1] to server 0 and server 2. When they merge, both will also obtain [1, 1, 1]. At that point all replicas agree, but the key point is that this agreement was not required for correctness at intermediate steps. Every merge was just a componentwise max; every local update only increased one component; and any sequence of these operations keeps moving the state upward in the partial order defined by comparing vectors componentwise.

Step 10: Because the merge is associative, commutative, and idempotent, the final state each replica converges to does not depend on the order of gossip messages, nor on whether some states are received multiple times. Re-merging [1, 1, 1] with [1, 1, 1] does nothing, since the max of equal components is the same number. Delayed messages do not break anything; when an old state finally arrives, merging it with a newer state simply keeps the newer information because the newer components are greater.

Step 11: From the client perspective, the counter behaves very naturally. When they click, the local node increases its own component, so the next read from that same node will show a value at least as large as what they saw before, often strictly larger. As more gossip completes, reads from any node monotonically rise toward the true total. There is no possibility of the count going down, and no risk that merging creates phantom extra clicks, because every update is a local increment and every merge is a join that preserves the maximum seen at each replica.

Step 12: The same pattern works for more complex replicated structures. Once you choose a state representation with a partial order, define a merge that is the join in that order, and design updates that only move states upward, you obtain the same behavior: independent replicas, asynchronous gossip, arbitrary reordering and duplication of messages, and eventual convergence on a coherent global value without coordination.
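
The walkthrough translates almost line for line into code; here is a minimal sketch (class and method names are mine):

```typescript
// Minimal G-Counter following the walkthrough: the state is one slot per replica, a local
// increment bumps only this replica's slot, merge takes the componentwise maximum, and the
// visible value is the sum of all slots.

class GCounter {
  private slots: number[];

  constructor(private replicaIndex: number, replicaCount: number) {
    this.slots = new Array(replicaCount).fill(0);
  }

  increment(): void {
    this.slots[this.replicaIndex]++;                  // update: move only our own component up
  }

  merge(remote: number[]): void {
    this.slots = this.slots.map((v, i) => Math.max(v, remote[i]));  // join = componentwise max
  }

  value(): number {
    return this.slots.reduce((sum, v) => sum + v, 0); // monotone read: never decreases
  }

  state(): number[] {
    return [...this.slots];                           // what gets gossiped to other replicas
  }
}

// Reproducing steps 2-5: a click on server 0, a click on server 2, then gossip 0 -> 2.
const s0 = new GCounter(0, 3), s2 = new GCounter(2, 3);
s0.increment();                                       // [1,0,0]
s2.increment();                                       // [0,0,1]
s2.merge(s0.state());                                 // [1,0,1]
console.log(s2.value());                              // 2
```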


At the heart of this structure there are three moving parts: the update function, the merge function, and the value function. Each has a very simple job, and the combination of those jobs is what makes the whole thing work in a hostile distributed system.

First, think about the order we put on states. For the G-counter, each state is a vector like [c0, c1, c2]. We say one state is less than or equal to another when every component is less than or equal componentwise. So [1,0,1] <= [2,0,3] because each position is less than or equal. But [1,3,0] and [2,1,0] are incomparable: the first is larger in the second component but smaller in the first, so neither dominates the other.

A state s1 is below s2 in the order if every component of s1 is less than or equal to the corresponding component of s2.

Now look at the update function. On node i, update is “add one to component i, leave the rest alone.” If the old state is v and the new state is v', then every component except i is identical, and component i has increased by one. That means v is always less than or equal to v' in our componentwise order. The state never moves sideways or down; an update always pushes it strictly upward.

The update function is monotone: applying it produces a new state that is greater than or equal to the old one in the order.

Because updates only move upward, they never erase information. A click that has been recorded in some component is never undone by a later update anywhere. If you imagine the partial order as a graph, every update is a step along one of the arrows that go upward.

Next, the merge function. Merge takes two states and computes the componentwise maximum. If we merge [1,0,2] and [0,3,1], we get [max(1,0), max(0,3), max(2,1)] = [1,3,2]. This merged state is exactly the least upper bound of the two in our order: it is above both input states, and it is the smallest state with that property. So merge is literally implementing the join operation of the join-semilattice.

The merge function is the join: it returns the smallest state that is above both of its arguments in the order.

Because merge is a join, it obeys three key algebraic laws. It is associative, so (a merge b) merge c is the same as a merge (b merge c). That means if node A gossips to B, then B gossips to C, you get the same combined information as if A had gossiped directly to C and then C merged with B later. It is commutative, so a merge b equals b merge a; it does not matter which direction the message flowed or which replica is considered “left” or “right” in the code. And it is idempotent, so a merge a is just a; if the same state is transmitted twice and merged twice, nothing changes the second time.

Idempotence means merging the same information again does not change the state.

Those three properties of merge are exactly why message reordering, duplication, and fan-out do not break convergence. Any finite pattern of gossip is equivalent to “take the join of all states you have ever seen,” because you can regroup (associativity), reorder (commutativity), and drop duplicate merges (idempotence) without changing the result. The unique result of “join everything” is the least upper bound of all the replica states at that moment, which we can think of as the ghost global state.
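
A quick spot-check of those three laws on the componentwise-max merge (illustrative, not a proof):

```typescript
// Spot-checking that componentwise max behaves like a join: associative, commutative,
// idempotent. Any gossip pattern built from such merges collapses to "join everything seen".

const join = (a: number[], b: number[]) => a.map((v, i) => Math.max(v, b[i]));
const eq = (a: number[], b: number[]) => a.every((v, i) => v === b[i]);

const x = [1, 0, 2], y = [0, 3, 1], z = [2, 1, 0];

console.log(eq(join(join(x, y), z), join(x, join(y, z)))); // associative: true
console.log(eq(join(x, y), join(y, x)));                   // commutative: true
console.log(eq(join(x, x), x));                            // idempotent:  true
```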

Now combine update and merge. Every local update moves a state upward. Every merge also moves the receiving state upward or leaves it where it is, because the result is an upper bound of the two arguments in the order. There is no operation in the system that moves a state downward. So if you watch any one replica over time, its state follows some path that only climbs in the partial order, sometimes by local increments, sometimes by merges. If you conceptually join all the states that have been created so far, you get the current ghost global upper bound. And because each replica is repeatedly joining in more and more of these states through gossip, its own local state keeps moving toward that upper bound.

Because every transition is monotone, any sequence of updates and merges always moves replicas upward toward the current global upper bound.

The last piece is the value function. For the G-counter, value is “sum all components of the vector.” If state v is below state w in the componentwise order, then every component of v is less than or equal to the corresponding component of w, so the sum of v is less than or equal to the sum of w. That means the value function itself is monotone with respect to the state order: when the state goes up, the visible integer count never goes down.

A monotone value function never reports a smaller observable value when the underlying state becomes larger in the order.

This is precisely why a client never observes the counter decreasing. When you click on a post, your node applies an update, which raises its local state. The next time you ask for the value on that node, the state cannot be lower than it was before, so the sum cannot be lower either. Later, when gossip brings in information about clicks seen on other nodes, merges will raise the local state again, and the sum will increase again. At worst, the value you see is a little behind the ghost global sum because you have not yet heard about all remote updates, but it is always consistent with some prefix of the system’s history and always nondecreasing from your point of view.

Putting all of this together, the structure works because of a very tight alignment between these three parts. The state space with its order is a join-semilattice. Merge computes joins in that semilattice and so has the algebraic properties that tolerate arbitrary gossip. Update is monotone in that same order, so it moves states up without breaking the lattice structure. Value is also monotone, so as states climb, observable values climb too. Given those three conditions, no matter how many nodes you have, no matter how messages are reordered, delayed, or duplicated, all replicas are always climbing toward the same upper bound, and all clients see counters that only move forward.

2025-12-02 Conflict-Free Replicated Data Types (CRDT) for Distributed JavaScript Apps. - YouTube { www.youtube.com }

Speaker: https://jonathanleemartin.com/

2025-12-02 coast-team/dotted-logootsplit: A delta-state block-wise sequence CRDT { github.com }

2025-12-02 automerge/automerge: A JSON-like data structure (a CRDT) that can be modified concurrently by different users, and merged again automatically. { github.com }



Conflict-free replicated data types are presented as a different paradigm for handling concurrent edits: instead of transforming operations like in operational transformation, they define data structures whose update rules guarantee that all replicas converge without needing a special conflict-resolution step.

A conflict-free replicated data type is a way of storing data so that many copies can be edited independently and still end up in the same state automatically.

Basic instances such as sets and counters are described as largely solved, and are already used in systems like distributed databases; however, creating good structures for ordered sequences such as text is much harder and has driven a lot of recent research.

One particular sequence design, called LSEQ (linear sequence), is highlighted as a favorite because it aims to be fast, memory-efficient, and suitable for multi-peer collaboration without a central server.

LSEQ is a tree-based representation of a sequence that assigns each element a carefully chosen position identifier so replicas can merge edits consistently.

Instead of representing text as a simple array of characters, this approach uses a special exponential tree (based on the Logoot tree): a root with multiple numbered branches at each level, where characters live at leaves, and a left-to-right depth-first traversal of the tree yields the visible string.

In this scheme there is a clear split between the model and the view: the underlying tree is the model optimized for correctness and performance, while the string shown to the user is a view obtained by traversing that structure.

The model is the internal data structure the algorithm manipulates, while the view is the human-readable form derived from it.

Each character node is addressed by an identifier constructed from the sequence of branch labels taken from the root to that node (for example, "3.7.7" might name one letter), and this identifier serves as a permanent, unique name for that position.

An identifier here is a structured label that uniquely and permanently names a position in the collaborative document.

When inserting between two characters, the editor chooses a new identifier that sorts between their identifiers (for example, inserting between "3.7.7" and "3.8" might produce "3.7.9"), adds a new node in the appropriate place in the tree, and associates the inserted character with that new name.
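
A toy version of that allocation step (my own simplification, not the real LSEQ allocator): identifiers are digit paths compared level by level, and inserting between two neighbours means finding a path that sorts strictly between them.

```typescript
// Toy position identifiers as digit paths (e.g. [3,7,7] for "3.7.7"), compared level by
// level; insertion picks a path strictly between two neighbours, descending a level when the
// gap is too small. Real LSEQ adds replica IDs, randomness, and an adaptive allocation
// strategy; this only illustrates the ordering idea.

type Path = number[];
const BASE = 10;

function comparePaths(a: Path, b: Path): number {
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const da = a[i] ?? 0, db = b[i] ?? 0;          // missing digits count as 0
    if (da !== db) return da - db;
  }
  return 0;
}

function allocateBetween(left: Path, right: Path): Path {
  const prefix: Path = [];
  for (let i = 0; ; i++) {
    const lo = left[i] ?? 0;
    const hi = right[i] ?? BASE;                   // open upper bound past the end of `right`
    if (hi - lo > 1) return [...prefix, lo + 1];   // room at this level: pick a digit in the gap
    prefix.push(lo);                               // no room: copy the digit and go one level deeper
  }
}

console.log(allocateBetween([3, 7, 7], [3, 8]));   // e.g. [3,7,8], which sorts between them
```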

When replicas exchange operations, they do not transform indices; they simply share the inserted or deleted identifiers, and each participant integrates them into its own tree and then traverses that tree, so that all end up with the same character order even if operations are applied in different orders.

The core idea is that the total order over identifiers replaces explicit conflict resolution rules.

For this to work, the identifiers must be immutable and unique: once a name is assigned to a position, it can never change or be reused, because later merges may depend on that exact label as a stable reference.

Immutability here means that once a position label is created, it never changes value.

Because identifiers are never reused and insertions may require creating names between existing ones, their length and the depth of the tree can grow as the document is edited; if this were done naively, the tree could degenerate into something like a linked list with poor performance.

LSEQ therefore focuses heavily on an allocation strategy for identifiers: it uses an exponential branching pattern and some randomness so that, even under adversarial editing patterns, the tree stays roughly balanced, identifier growth is slow, and lookups remain close to logarithmic time.

A key advantage over central-server operational transformation is that this tree-based sequence allows true multi-peer operation: any number of replicas can edit offline and then synchronize changes directly with one another, yet still converge without relying on a single authoritative server.

Another advantage is that it does not require an explicit tombstone mechanism with coordinated garbage collection; instead, because identifiers encode position independently of the current tree, nodes associated only with deleted content can eventually be dropped locally without a global clean-up phase.

A tombstone is a marker that remembers where a deleted element used to be so later operations that refer to it can still be interpreted.

In the question-and-answer discussion, it is clarified that deletions do still leave structural traces: if a node has children, its character payload is removed but the node stays as a kind of implicit tombstone until all possible descendants are gone, after which the structure can naturally disappear from that replica.

The same discussion explains that even if a subtree was removed on one machine, a later operation from another replica that references an identifier inside that region can reconstruct the necessary path from the identifier itself, because the position is encoded in the ID rather than in surrounding context.

To keep different editors from picking exactly the same position name, identifiers include more than just the tree path: they also contain a replica or site ID, a local counter, and possibly causal metadata, which together ensure uniqueness and provide a deterministic tie-breaker when different users choose numerically similar positions.

Replica identifiers and counters let the system break ties in a consistent way whenever different users generate conflicting-looking position labels.

The algorithm has non-obvious performance costs: every time a user inserts at a given character index in the visible text, the system must map that view index to the correct node in the tree, and this mapping step can be more expensive than the ideal logarithmic time suggested by the tree shape.

There is also a behavioral caveat: if two people independently insert different words at the same visual position while offline, the merged result may consist of their letters interleaved character-by-character, producing a structurally valid but linguistically unnatural string that does not match either person's intention.

Subsequent research has built on this scheme, leading to newer sequence structures such as dotted Logoot-split AVL trees that aim to reduce identifier growth, improve memory usage, and mitigate interleaving problems while preserving the same convergence guarantees.

Practical JavaScript implementations now exist: tree-based text CRDTs like logoot-split offer LSEQ-style behavior for documents, while a library such as Automerge provides a CRDT for arbitrary JSON data, so application state or Redux-like stores can be replicated and merged without conflicts at the structural level.

Automerge treats strings as lists and uses a different sequence algorithm (RGA-split) that relies on tombstones; this works well for many small text fields such as card titles or descriptions but is not ideal for very long, heavily edited documents because the tombstone history can grow large in memory.

Beyond linear text, these ideas generalize to many collaborative domains: any state that can be modeled as sets, counters, ordered sequences, or nested JSON-like structures can be given CRDT semantics, enabling distributed editing of things like boards, documents, diagrams, or even music notation.

The speaker stresses that structural convergence does not automatically guarantee semantic correctness; designers still need to choose good data models, so that even when different edits are merged mechanically, the resulting state respects as many application-level invariants as possible, with manual conflict resolution reserved for rare edge cases.

Finally, the algorithms rely only on per-replica ordering of operations, typically via local counters or timestamps; there is no need for globally synchronized clocks, and batching operations for network transmission is independent of the logical order used for merging.


A00 Let us first fix a simple mental model. Think of an LSEQ style CRDT as keeping two things at once. There is the view, which is the text the user sees, like "cat". Underneath there is the model, which is a list of characters, and each character has a special position id that can be compared and sorted. When two replicas merge, they do not argue about indices like "insert at index 1". They only collect all characters with their ids, sort by id, and the sorted order defines the final text.

A01 A position id in LSEQ is not just a single number. It is a small vector of integers, like [3] or [3,5] or [10,2]. Two ids are compared lexicographically: first element, then second, and so on. Between two ids you can often create a new id that sorts between them by choosing new numbers or by going one level deeper. This ability to always find a fresh id between two existing ids is the key that lets many users insert in the "same" place without conflicts.
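
A minimal sketch of that comparison rule in TypeScript (the function name and the prefix-sorts-first convention are choices made for this sketch, not taken from a particular library):

```typescript
// Compare two position ids (vectors of integers) lexicographically.
// Returns a negative, zero, or positive number, like a standard comparator.
function comparePaths(a: number[], b: number[]): number {
  const len = Math.min(a.length, b.length);
  for (let i = 0; i < len; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  // If one path is a prefix of the other, the shorter one sorts first.
  return a.length - b.length;
}

comparePaths([3], [7]);    // negative: [3] comes before [7]
comparePaths([3, 5], [4]); // negative: 3 < 4 already decides it
comparePaths([3], [3, 5]); // negative: [3] is a prefix of [3,5]
```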

A02 For this example, there will be two users, Alice and Bob. They both start from the same initial document containing the text "cat". At the model level, the document is stored as a list of entries of the form (id, character). At the beginning we will assume some simple ids, chosen by a library when the document was created:

(id: [3], char: "c") (id: [7], char: "a") (id: [11], char: "t")

A03 The view is the string that results from sorting these entries by id and reading out the characters: [3] < [7] < [11], so the text is "cat". Alice and Bob both see exactly this text and those same ids at the start.

A04 Now both go offline and make edits concurrently. Alice wants to insert the letter "h" right after "c" to start writing "chat". In the view, that means insert "h" at index 1. In the model, Alice looks at the two neighboring ids: the "c" has id [3], the next character "a" has id [7]. LSEQ asks: choose a new id that compares strictly between [3] and [7]. There are many choices; a simple one is [5]. So Alice creates a new entry:

(id: [5], char: "h")

and inserts it in her local structure between [3] and [7]. Locally her model is now:

[3] "c", [5] "h", [7] "a", [11] "t"

and her view shows "chat".

A05 At the same time, Bob wants to insert the letters "r" and "e" after "c" to write "creat" on his side. For illustration, let us say Bob first inserts "r" after "c", then "e" after that. When he inserts "r", he also looks at ids [3] and [7], just like Alice did, but his local random strategy picks a different id between them, say [4]. So after inserting "r" he has:

[3] "c", [4] "r", [7] "a", [11] "t"

and his view shows "crat" for a moment.

A06 Then Bob inserts "e" right after "r". In the view this is between "r" and "a". In the model this is between ids [4] and [7]. He can choose a new id between [4] and [7], say [6]. After that change, his model is:

[3] "c", [4] "r", [6] "e", [7] "a", [11] "t"

and his view shows "creat". So now, offline, Alice sees "chat" and Bob sees "creat".

A07 Each replica has also recorded the operations it made in terms of CRDT events. An event is something like "insert character 'h' at id [5]" or "insert character 'r' at id [4]". Notice that the event contains the chosen id, not "insert at index 1". The index is only a local convenience for the user; the id is the durable address.

A08 When Alice and Bob come back online, they exchange their operations. The important point: they do not need to know the exact order in which things happened in real time, and they do not need to transform indices. Each side simply takes the operations that came from the other side and applies them to its own model.

A09 Consider what happens on Alice’s side when she receives Bob’s operations. Her model currently has entries:

[3] "c", [5] "h", [7] "a", [11] "t"

She receives an operation "insert 'r' at id [4]" and an operation "insert 'e' at id [6]". The application rule is very simple: add these new entries into the set keyed by id, assuming those ids are not already present. After inserting both, Alice’s local list of entries is:

[3] "c", [4] "r", [5] "h", [6] "e", [7] "a", [11] "t"

A10 To build the view, Alice sorts by id. Lexicographically [3] < [4] < [5] < [6] < [7] < [11], so the visible text becomes "crheat". This looks odd in English, but structurally it is fully defined by the ids and independent of the order in which the operations arrived.
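
In code, Alice's integration step is just a union keyed by id followed by a sort. A small sketch reusing the comparePaths helper above (Bob's two ids are not present yet, so the union is a plain concatenation):

```typescript
type Entry = { path: number[]; char: string };

const alice: Entry[] = [
  { path: [3], char: "c" },
  { path: [5], char: "h" },
  { path: [7], char: "a" },
  { path: [11], char: "t" },
];
const fromBob: Entry[] = [
  { path: [4], char: "r" },
  { path: [6], char: "e" },
];

// Integrate the received entries, then derive the view by sorting on id.
const merged = [...alice, ...fromBob].sort((x, y) => comparePaths(x.path, y.path));
const view = merged.map((e) => e.char).join("");
console.log(view); // "crheat"
```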

A11 On Bob’s side the same thing happens in the other direction. His model currently has:

[3] "c", [4] "r", [6] "e", [7] "a", [11] "t"

He receives Alice’s operation "insert 'h' at id [5]". That id is not present yet, so Bob adds the new entry and now has:

[3] "c", [4] "r", [5] "h", [6] "e", [7] "a", [11] "t"

He also sorts by ids and his view becomes "crheat". Even though the messages may have arrived in a different order, the final sorted order by id is exactly the same as on Alice’s side.

A12 This is how convergence is achieved. The CRDT guarantees two properties. First, every replica eventually receives the same set of operations, which means the same set of (id, character) pairs. Second, the order is determined only by a pure function over these immutable ids. So as long as id generation follows the same rules on all replicas, sorting will always produce the same sequence of characters.

A13 The example above uses only single-level ids like [3], [4], [5], [6], [7]. In practice, you will eventually need to insert between two ids that have no free integer between them. For example, suppose you have ids [3] and [4], and someone wants to insert one more character between them. There is no integer strictly between 3 and 4, so LSEQ drops to a deeper level. It creates a two-component id, for example [3,5]. When you compare ids, [3,5] still sorts between [3] and [4]: it comes after [3] because [3] is a prefix of [3,5] and the shorter id sorts first, and it comes before [4] because the first components already differ (3 < 4).

A14 Because ids are vectors of integers and not just single numbers, the CRDT can always proceed by adding a deeper component when it runs out of space at the current level. Different replicas can choose different random numbers for these components and still end up with a total order because the tie breaking rules are deterministic. Over time this gives you a tree shape inside the id space, even though you interact with it as a sorted sequence.

A15 Deletion works in a similar fashion. There is no "delete at index 2" in the CRDT protocol. There is "delete the character that has id [5]". When Alice deletes that character, she marks id [5] as removed in her model. When Bob later receives this deletion operation, he also removes the character associated with id [5] from his model. Since both replicas drop the same id, they both agree on which character disappeared. They then sort the remaining ids and reach the same view.

A16 This example shows the main ideas without the full tree representation. The model stores characters keyed by immutable ids. Insertion generates a fresh id between two neighboring ids. Deletion removes the character for a specific id. Replicas exchange operations that mention these ids, not raw indices. Each replica independently maintains a set of entries and derives a view by sorting them. Because sorting is deterministic and ids never change, all replicas converge to the same sequence once they have seen the same operations, even if their edits were concurrent and even if messages arrive in different orders.


To make LSEQ-style ids immutable and unique you mainly have to decide two things: what the id looks like, and how you allocate a new one between two existing ids.

A practical shape for an id is a triple: a path, a replica identifier, and a local counter. The path is the list of integers you already saw, for example [3], [3,7], [3,7,9]. The replica identifier is a value that is globally unique per device or browser session, such as a UUID or random 128-bit number. The counter is a number that starts at 0 on each replica and is incremented every time that replica allocates a new id. Once you create a triple (path, replicaId, counter) you never change any part of it again; that is where immutability comes from.

Uniqueness comes from combining the path with the replica identifier and counter. Two replicas may occasionally choose the same path when inserting between the same neighbors, but because they have different replica identifiers they still produce different overall ids. Even on a single replica, the counter ensures you never accidentally reuse the same (path, replicaId) pair twice. When you compare ids to sort characters, you use a fixed lexicographic rule: first compare paths component by component; if the paths are equal, compare replicaId; if still equal, compare counter. That comparison rule is the same on every replica, so they all derive the same total order.

The interesting part is how to generate a new path between two neighbors. Suppose you want to insert between left path L and right path R. At some depth d you look at the d-th component of L and R. If they differ and there is room between them, you choose a random integer between them at that depth, and copy the prefix from the left side. For example, if L is [3] and R is [7], you can choose 4, 5, or 6 as the new component and get a path [5]. Because 3 < 5 < 7, [5] will sort between the two neighbors on all replicas. If there is no room between L and R at the current depth, you go one level deeper. For example, if L is [3] and R is [4], there is no integer strictly between 3 and 4, so you extend the shorter path. You might treat L as [3,0] and R as [4,0] conceptually, or you might define a depth-dependent base, and then pick a number between those new bounds at the deeper level, for example [3,5]. The result [3,5] still sorts between [3] and [4]: it comes after [3] because [3] is a prefix (conceptually [3,0], and 5 > 0 at the deeper level), and it comes before [4] because the first components already differ (3 < 4). The LSEQ paper chooses these ranges carefully and often uses randomness within them so that, over time, insertions are spread out and the tree stays fairly balanced.
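
A simplified allocation procedure in TypeScript, to make the "go deeper when there is no room" step concrete. It uses a fixed base of 16 at every depth and plain randomness, whereas LSEQ grows the base exponentially per level and biases the random choice toward one boundary; treat it as a sketch of the idea, not the published algorithm:

```typescript
const BASE = 16; // fixed branching factor for this sketch; LSEQ doubles it per level

// Allocate a fresh path strictly between `left` and `right` (requires left < right).
function allocatePathBetween(left: number[], right: number[]): number[] {
  const path: number[] = [];
  let boundedByRight = true; // true while the prefix built so far still matches right
  for (let depth = 0; ; depth++) {
    // Missing components act as 0 on the left bound and as BASE on the right bound.
    const lo = depth < left.length ? left[depth] : 0;
    const hi = boundedByRight && depth < right.length ? right[depth] : BASE;
    if (hi - lo > 1) {
      // Room at this depth: pick a component strictly between lo and hi.
      path.push(lo + 1 + Math.floor(Math.random() * (hi - lo - 1)));
      return path;
    }
    // No room: copy the left bound's component and descend one level.
    path.push(lo);
    boundedByRight = boundedByRight && depth < right.length && lo === right[depth];
  }
}

allocatePathBetween([3], [7]); // e.g. [5]    (room at depth 0: 4, 5 or 6)
allocatePathBetween([3], [4]); // e.g. [3, 9] (no room at depth 0, so go one level deeper)
```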

Once you have a procedure like that, generating a new id when a user inserts a character looks like this in words. You take the id of the character just before the insertion point and the id of the character just after it. You run your “allocate path between L and R” procedure to get a fresh path. You increment your local counter. You form the id as (newPath, replicaId, counter). You attach that id to the inserted character. After that, that id is frozen forever: you never edit it, and you never reuse it.
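
Putting the pieces together, a hedged sketch of id construction that reuses comparePaths and allocatePathBetween from the sketches above (the field names, the class, and the "origin" replica id in the example are illustrative, not from a specific library):

```typescript
type Id = { path: number[]; replicaId: string; counter: number };

// Deterministic total order: path first, then replicaId, then counter.
function compareIds(a: Id, b: Id): number {
  const byPath = comparePaths(a.path, b.path);
  if (byPath !== 0) return byPath;
  if (a.replicaId !== b.replicaId) return a.replicaId < b.replicaId ? -1 : 1;
  return a.counter - b.counter;
}

class IdAllocator {
  private counter = 0;
  constructor(private replicaId: string) {} // globally unique per device or session

  // Build a fresh, immutable id between two neighbouring ids; never mutated afterwards.
  between(left: Id, right: Id): Id {
    const path = allocatePathBetween(left.path, right.path);
    return { path, replicaId: this.replicaId, counter: this.counter++ };
  }
}

// Example: Alice inserts "h" between the ids carrying "c" (path [3]) and "a" (path [7]).
const aliceIds = new IdAllocator("alice-uuid");
const hId = aliceIds.between(
  { path: [3], replicaId: "origin", counter: 0 }, // "origin" is a hypothetical creator id
  { path: [7], replicaId: "origin", counter: 1 },
);
// hId.path sorts strictly between [3] and [7]; (replicaId, counter) make the id unique.
```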

Deletions then talk only about existing ids. When a user deletes a character, the operation you broadcast is “delete id X”. Every replica that receives that operation finds the entry with id X and removes it or marks it deleted. Because ids never change and are globally unique, every replica removes the same character. Insertions only ever introduce new ids, never modify old ones. Deletions only ever remove existing ids, never create new ones. As replicas exchange these operations, each one ends up with the same set of ids and associated characters, and because the comparison rule is deterministic and based on immutable fields, sorting them always yields the same order.

So the recipe is: design ids as immutable tuples that include a path, a globally unique replica identifier, and a per-replica counter; define a deterministic lexicographic comparison over those fields; implement a “pick a path between two paths” function that can always find a new path by going deeper when necessary and that tends to keep the tree balanced. With those three pieces in place, the ids are naturally immutable, unique, and sufficient to drive convergence.

2025-12-02 CRDT Papers Conflict-free Replicated Data Types { crdt.tech }

This page contains a comprehensive list of research publications on CRDTs. The data is available in BibTeX format. If you have anything to add or correct, please edit the file on GitHub and send us a pull request. image-20251201235239094

2025-10-27 Conflict-Free Replicated Data Types (CRDTs): Convergence Without Coordination { read.thecoder.cafe }

☕ Welcome to The Coder Cafe! Today, we will explore CRDTs, why they matter in distributed systems, and how they keep nodes in sync. Get cozy, grab a coffee, and let’s begin!

image-20251026213014196Concurrency is about causality, not timing Two operations are concurrent if neither was aware of the other — regardless of when they happened. For example, edits made hours apart can still be concurrent if there was no shared knowledge of each other's changes. Action: Classify operations as concurrent by checking whether either causally depends on the other, not by comparing timestamps.

Coordination is costly and optional with CRDTs Traditional systems need replicas to coordinate to agree on one valid result before responding, which delays responses. CRDTs remove this need by defining deterministic merge functions, enabling replicas to process updates locally. Action: Use CRDTs when you need immediate local writes, even under network partitions.

CRDTs ensure Strong Eventual Consistency (SEC) CRDTs use deterministic merge rules that ensure all replicas converge on the same result without central coordination. Action: Choose CRDTs when you want high availability and are willing to trade off strong immediate consistency.

G-Counter shows simple merge rules Each node only increments its own counter. Merging is done by taking the element-wise maximum, ensuring all replicas converge. Action: Use G-Counter for cases like "likes" or counters where values only grow.

PN-Counter handles increments and decrements Maintains separate vectors for additions and subtractions. After merge, computes total by subtracting summed decrements from increments. Action: Use PN-Counter for distributed systems that need to both increment and decrement reliably.
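
A compact sketch of both counters, following the per-replica-entry design described above (replica ids are arbitrary strings; this is not any particular library's API):

```typescript
type GCounter = Record<string, number>; // replicaId -> increments observed from that replica

const gIncrement = (c: GCounter, replica: string, by = 1): GCounter =>
  ({ ...c, [replica]: (c[replica] ?? 0) + by });

const gValue = (c: GCounter): number =>
  Object.values(c).reduce((a, b) => a + b, 0);

// Merge = element-wise maximum: commutative, associative, idempotent.
function gMerge(a: GCounter, b: GCounter): GCounter {
  const out: GCounter = { ...a };
  for (const [replica, n] of Object.entries(b)) out[replica] = Math.max(out[replica] ?? 0, n);
  return out;
}

// PN-Counter: one G-Counter for increments, one for decrements.
type PNCounter = { p: GCounter; n: GCounter };
const pnValue = (c: PNCounter) => gValue(c.p) - gValue(c.n);
const pnMerge = (a: PNCounter, b: PNCounter): PNCounter =>
  ({ p: gMerge(a.p, b.p), n: gMerge(a.n, b.n) });

// Two replicas count likes independently, then merge in either order.
const r1 = gIncrement({}, "r1");                   // r1 saw one like
const r2 = gIncrement(gIncrement({}, "r2"), "r2"); // r2 saw two
gValue(gMerge(r1, r2)); // 3
```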

Three CRDT sync models exist

  • State-based: send full state and merge using associative, commutative, idempotent logic.
  • Operation-based: send ops like "add 5", needing causal delivery.
  • Delta-based: send only changed fragments. Action: Match the sync strategy to your bandwidth and delivery guarantee constraints.

Offline collaboration is a natural CRDT use case CRDTs let users make changes while offline and merge later without conflict. Notion and other modern tools leverage this for seamless collaboration. Action: Implement CRDTs in editors and UIs where users may go offline.

Active-active replication is CRDT-backed Systems like Redis use CRDTs to support multi-region writes with local latency and no central authority. Action: Apply CRDTs in systems needing cross-region availability and latency-sensitive updates.

CRDTs vs Operational Transformation (OT) OT needs a central arbiter to coordinate edits. CRDTs allow fully decentralized, offline-safe updates. Action: Use CRDTs for peer-to-peer or offline-first architectures.

Beyond text: CRDTs fit edge and IoT scenarios CRDTs naturally support devices that store local state and sync later, like IoT or CDN edge caches. Action: Consider CRDTs in environments where connectivity is intermittent or decentralized coordination is impractical.

2025-12-06 Microsoft Research Video 153540 Strong Eventual Consistency and Conflict free Replicated Data Types - YouTube { www.youtube.com }

image-20251206114520819

🌈Year 2011 Marc Shapiro

The talk explains how to build very fast, highly scalable replicated data structures in the cloud by relaxing traditional consistency guarantees, so that replicas can update their local state without coordination yet still converge to the same value later.

Strong consistency is described as the model where all updates are totally ordered, so every replica sees the same sequence of operations and always has the same view of the world, often called linearizability or sequential consistency.

Strong consistency: every operation appears to run one after another in a single global order.

To implement this total order, systems use consensus protocols that force all participants to agree on each next step, turning a parallel system into a logically sequential one and creating both a performance bottleneck and a reliability bottleneck because progress depends on a majority of nodes being alive.

Consensus: a protocol by which multiple nodes agree on a single value or decision.

Eventual consistency arose as a way to avoid putting consensus on the critical path: replicas can update independently, diverge for a while, then later reconcile their differences with a global arbitration step that resolves conflicts, often again using some form of consensus in the background.

In this weaker model, a replica may temporarily see invalid or conflicting states; reconciliation can be complex, may roll back previous outcomes, and can be difficult to design correctly despite the apparent simplicity of letting updates proceed without initial coordination.

Eventual consistency: if updates stop, all replicas will eventually hold the same state, but what happens before that is unconstrained.

Strong eventual consistency is introduced as a stricter and more useful variant: as soon as two replicas have received the same set of updates, they must already be in the same state, with no rollbacks or later corrections.

Under this model, updates are applied locally without synchronization, but the way operations are defined guarantees that any replica that has seen the same operations will converge to exactly the same result, independent of delivery order or interleaving.

Strong eventual consistency: replicas that have applied the same updates are already identical.

With strong eventual consistency, the classical CAP tradeoff is reframed: instead of choosing between strong consistency and availability under network partitions, one can keep availability and partition tolerance and still have a well-defined consistency property, provided one is willing to adopt this weaker but deterministic convergence model.

The talk formally characterizes traditional eventual consistency with three properties: every operation is eventually delivered everywhere, every operation eventually terminates, and replicas eventually converge when updates stop; strong eventual consistency keeps the first two but strengthens convergence to hold immediately once the same updates have been seen, eliminating the need for rollbacks.

To realize strong eventual consistency, the speaker proposes designing special data types whose operations guarantee convergence without coordination; these are called conflict-free replicated data types, or CRDTs.

CRDT: a replicated data type whose operations are designed so that replicas converge without needing coordination.

Two replication styles are considered: state-based and operation-based. In the state-based style, replicas occasionally send their entire state (or summaries) to other replicas, which merge the received state with their own using a merge function defined by the data type.

For state-based replication, a simple mathematical condition guarantees convergence: the local payload must form a join-semilattice, every update must move the state monotonically upward in that partial order, and the merge function must compute the least upper bound of the two states.

Semilattice: a set with a partial order and an operation that gives a least upper bound for any two elements.

This condition means that replicas can exchange states in any pattern, repeatedly, and still converge, because each merge moves them upward in a way that never loses information and never cycles.
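
A grow-only set is the smallest concrete example of this condition: states are ordered by set inclusion, adding an element only moves the state upward, and union is the least upper bound. A minimal sketch:

```typescript
type GSet<T> = Set<T>; // state; the partial order is subset inclusion

// Update: adding an element can only move the state upward in that order.
const gsetAdd = <T>(s: GSet<T>, x: T): GSet<T> => new Set([...s, x]);

// Merge: union is the least upper bound, so it is commutative,
// associative, and idempotent, which is exactly what convergence needs.
const gsetMerge = <T>(a: GSet<T>, b: GSet<T>): GSet<T> => new Set([...a, ...b]);

// Replicas can exchange states in any pattern, any number of times:
// gsetMerge(gsetMerge(a, b), a) always holds the same elements as gsetMerge(b, a).
```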

For operation-based replication, replicas broadcast update operations instead of full state; each operation is applied at its origin and then delivered to all other replicas, which replay it on their local state.

In the operation-based style, a sufficient condition for strong eventual consistency is that all concurrent operations commute, while operations that are causally ordered must be delivered and applied in that same causality order at every replica.

Commutativity: two operations commute if applying them in either order gives the same result.

The talk explains that these two conditions, one for state-based and one for operation-based replication, are in fact equivalent: any state-based CRDT can be emulated in an operation-based system and vice versa, and convergence in one model implies convergence in the other.

State-based specifications are often easier to reason about mathematically, while operation-based implementations are usually more efficient in practice, because they avoid sending entire state and can instead send small deltas or individual operations.

The compositional properties of CRDTs are highlighted: the product of two independent CRDTs is again a CRDT, and a single CRDT can be partitioned into independent components that remain CRDTs, which directly supports static sharding of large data structures across many nodes.

This compositionality underpins a clean story for scaling: a large structure, such as a graph, can be partitioned deterministically across shards by a hash function, with each shard maintaining its own CRDT state, as long as the partitioning remains static or any re-partitioning is done with some coordination.

The speaker emphasizes that the CRDT conditions are not just sufficient but effectively necessary for strong eventual consistency: if a data type allows two replicas to apply the same concurrent updates and still end up consistent regardless of delivery order, then those concurrent operations must commute for all initial states and arguments.

The talk then explores concrete CRDT designs, starting with counters. Even for a simple grow-only counter, the state-based approach requires more structure than a single integer: each replica keeps a vector of per-replica counters, with each replica only incrementing its own entry, and merge taking elementwise maxima.

The value of such a grow-only counter is the sum of the entries in this vector, the partial order is the usual componentwise order between vectors, and each increment moves the state upward, so merge-as-max satisfies the semilattice condition and convergence is guaranteed.

To support both increments and decrements while staying within the semilattice framework, the design uses two grow-only counters, one tracking increments and one tracking decrements; the exposed value is the difference between these two internal counters, and each update only ever grows one of the components.

The talk stresses that naĂŻve approaches, such as using a single counter that increments and decrements arbitrarily and merging by taking maxima, do not work because decrements would be lost or behave as no-ops, violating the intended semantics.

Sets provide a richer case study. Sequentially, a set offers add and remove operations with the obvious invariants: after add(e), element e is in the set; after remove(e), e is not; the challenge is to define what should happen for concurrent add(e) and remove(e) operations on different replicas.

Several conflict-resolution strategies are possible: mark the result as an error, use last-writer-wins based on timestamps, choose add-wins, or choose remove-wins; all of them satisfy strong eventual consistency, and the right choice depends on the application.

The talk focuses on an add-wins design that closely matches intuitive expectations for many applications: when an add and a remove are concurrent, the element should remain present, because the remove could not have seen the add.

To realize add-wins semantics as a CRDT, the observed-remove set (OR-set) is introduced. Each call to add(e) actually creates an internal instance of e tagged with a unique identifier, and remove(e) removes only those internal instances of e that were observable in the local state at the time of the remove.

Tombstone: a marker that an element instance was removed, kept so that later merges can recognize that removal.

In this OR-set, the internal state includes both live element instances and tombstones for removed instances; merge is defined as a union of these internal records, and the externally visible membership test for e only checks whether there is any live instance of e that has not been tombstoned.

Because a remove can only tombstone internal instances it has observed, a concurrent add of e with a fresh identifier that the remove did not see will survive, so after all updates are propagated and merged, e remains in the set, giving the intended add-wins behavior.
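
A simplified OR-set along the lines just described: every add creates a freshly tagged instance, remove tombstones only the tags it has observed, and merge unions both parts. Tag generation and tombstone garbage collection are elided, and the shape of the state is an illustrative choice rather than the talk's exact formulation:

```typescript
type Tag = string; // unique per add, e.g. `${replicaId}:${localCounter}`

type ORSet<T> = {
  live: Map<Tag, T>; // tagged element instances currently visible
  dead: Set<Tag>;    // tombstones for instances some replica has removed
};

// add(e): create a fresh tagged instance of the element.
function orAdd<T>(s: ORSet<T>, element: T, tag: Tag): void {
  s.live.set(tag, element);
}

// remove(e): tombstone only the instances of `element` observed locally.
function orRemove<T>(s: ORSet<T>, element: T): void {
  const observed = [...s.live].filter(([, e]) => e === element).map(([tag]) => tag);
  for (const tag of observed) {
    s.live.delete(tag);
    s.dead.add(tag);
  }
}

// Merge: union tombstones, union live instances, then drop anything tombstoned.
function orMerge<T>(a: ORSet<T>, b: ORSet<T>): ORSet<T> {
  const dead = new Set([...a.dead, ...b.dead]);
  const live = new Map([...a.live, ...b.live]);
  for (const tag of dead) live.delete(tag);
  return { live, dead };
}

const orContains = <T>(s: ORSet<T>, element: T): boolean =>
  [...s.live.values()].includes(element);

// Add-wins: a concurrent add carries a fresh tag that the remover never observed,
// so after orMerge that instance is still live and the element stays in the set.
```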

The talk explains why simpler ideas like using a per-element counter (increment on add, decrement on remove) can fail in the presence of concurrency: different replicas may both decrement based on the same single add, causing the counter to go negative or to misrepresent membership when operations are merged.

The need to keep tombstones raises garbage-collection concerns. The speaker notes that vector clocks or version vectors can be used to track causality, allowing implementations to detect when a deletion has been seen by all replicas, at which point the corresponding tombstones can be safely discarded in optimized implementations.

A practical motivation is given through Amazon Dynamo’s shopping cart example. Dynamo used a multi-value register that stored a set of possible values instead of a true set CRDT, which can cause removed items to reappear; a correctly designed set CRDT like the OR-set would avoid such anomalies.

Building on sets, the talk constructs a graph CRDT. A graph is modeled as a pair of sets: a set of vertices and a set of edges, where each edge is a pair of vertices; sequentially, one typically enforces an invariant that edges reference existing vertices and that vertices cannot be removed while incident edges exist.

In the distributed setting, concurrency between removing a vertex and adding an edge to that vertex introduces a new kind of conflict; as with sets, there are multiple possible semantics, such as enforcing strict invariants with coordination or choosing which operation should dominate.

For modeling the web, the talk adopts add-edge-wins semantics and relaxes the invariant: operations to add edges or delete vertices are always accepted, the invariant is not enforced at update time, and instead lookups interpret the stored state, treating edges to non-existent vertices as absent.

This behavior matches URLs pointing to pages that may not exist yet or anymore and supports sharding, because add operations do not need to synchronously consult remote shards to check invariants; only lookups need to traverse shards, and they can do so on a consistent snapshot.

Consistent snapshots are achieved by using the unique identifiers of elements as timestamps, combined with version vectors: for each replica, a timestamp summarizes what it has seen, and a global snapshot is defined by choosing a vector of local timestamps that describes a cut in the distributed execution.

Given such a snapshot vector and retained tombstones, the system can answer membership or reachability queries as of that logical time, which is useful for operations like computing PageRank that require a stable view of the graph.

The talk then sketches how these ideas could structure a large-scale web indexing and search system: crawlers build local maps from URLs to content, a CRDT graph for links, and CRDT maps for words to postings; updates are propagated as operations through a dataflow pipeline.

Because CRDTs allow asynchronous updates and strong eventual consistency, the system can run many replicas and shards around the world, make progress despite partitions, and adapt resource usage by throttling or accelerating update propagation without violating convergence guarantees.

Finally, the limitations of CRDTs are discussed: they make it difficult to enforce strong global invariants or multi-object constraints, such as ensuring a bank account never goes negative or atomically transferring money between accounts without ever showing intermediate double-credit or double-debit states.

In such cases, some form of synchronization, transactions, or centralized computation on consistent snapshots is still required, and the lesson is that CRDTs and strong eventual consistency are powerful for a wide class of problems but do not replace consensus-based mechanisms for all data types or invariants.

2025-09-29 Why Local-First Apps Haven’t Become Popular? { marcobambini.substack.com }

image-20250929162449926 Local-first apps promise instant load times, resilience to flaky networks, and better privacy, but they remain rare because sync is hard. The article frames local-first as a distributed-systems problem: multiple devices mutate the same data while offline, then must converge to one state without losing intent. Two core challenges block adoption. First, unreliable ordering: operations arrive out of order, so naïve “last write wins” produces surprising losses. Second, conflicts: concurrent edits require semantics that match user expectations, not just technical convergence.

The proposed path uses Hybrid Logical Clocks (to assign a consistent happens-before order across devices) and CRDTs (to merge concurrent changes without coordination). Together they remove reliance on network ordering and allow safe, client-side merges. The piece argues that a robust local database is the right foundation and positions SQLite as a strong fit because it is embeddable, fast, and ubiquitous across platforms. With a local DB handling durability and indexes, and a CRDT/clock layer handling causality and merge, you can deliver true offline-first behavior and sync later without central coordination.

This matters because local-first improves UX (zero-latency reads/writes), reliability (works through outages), and privacy (data persists on devices). For developers, the takeaway is to treat sync as a first-class concern: model data with CRDTs where appropriate, capture causality with logical clocks, and store everything in a proven local database. The combination reduces edge-case complexity and makes offline-capable apps practical beyond demos.

· 23 min read

image-20251227001921810

Scheduled post: Put it off and read it on Dec 31!

Good Reads​

2025-12-24 Nobody knows how large software products work { www.seangoedecke.com }

image-20251224141056474

Big software products become hard to understand because growth usually means adding options that widen the audience: enterprise controls, compliance, localization, billing variants, trials, and many special cases.

Each new option changes the meaning of existing features, so the overall behavior becomes a mesh of conditions and exceptions rather than a single clear rule set.

At that scale, many questions cannot be answered from memory or docs; the practical source of truth is often the code, plus experiments to see what happens in real environments.

Keeping complete, accurate documentation is usually infeasible because the system changes faster than teams can write, review, and maintain descriptions of every interaction.

A lot of behavior is not explicitly designed in one place; it emerges from many local decisions and defaults interacting, so "documenting it" often means discovering it.

Because understanding decays when ownership changes or people leave, organizations repeatedly pay the cost of re-investigation, and engineers who can reliably trace and explain behavior become disproportionately valuable.

~~ GPT 5.2 brainstorming ~~


Treat "how does it work" questions as first-class work, not as interruptions. Make a visible queue for them, timebox investigations, and record the answer where future readers will actually look (runbook, ownership doc, or an internal Q and A page tied to the repo). If you do not create a home for answers, the organization will keep paying the same investigation cost.

Set explicit boundaries for complexity before you add more surface area. When someone proposes a new audience-expanding capability (trial variants, enterprise policy, compliance mode, region-specific behavior), require them to also name the ownership model, the invariant it must not break, and the cross-cutting areas it touches:

  • authz
  • billing
  • data retention
  • permissions
  ‱ UI
  • APIs

If that extra work cannot be staffed, defer the feature or narrow it.

Prefer designs that localize rules rather than sprinkling conditions everywhere. Centralize entitlement and policy decisions behind a small number of well-named interfaces and treat them as critical infrastructure with tests and monitoring. Avoid copying "if customer has X then Y" logic across services, UI layers, and scripts, because that is how the system turns into an untraceable tangle.

Localize rules means keep one decision in one place, not duplicated across the product.

Make "explainability" a design requirement. Every major user-visible decision should have a reason code that can be logged, surfaced to support, and traced back to a single decision point. This turns investigations from archaeology into lookup: you can answer "why did it deny access" by reading the reason and following a link.

Explainability means the system can tell you why it made a decision.

Invest in a small set of canonical sources of truth, and be ruthless about what is not canonical. For example, code and automated tests are canonical; a wiki page is not unless it is owned, reviewed, and updated with changes. When people ask for "documentation", decide whether you need durable truth (tests, typed schemas, contract checks) or just a short-lived explanation (a note in an incident channel).

Use tests to document the behavior you care about, not everything. Write high-level contract tests for the most expensive-to-relearn areas: entitlements, billing state transitions, permissions, data deletion, compliance modes, and migrations. The goal is not more test coverage; it is protecting the system against silent drift in the places where drift creates confusion and risk.

Contract test means a test that asserts the externally visible behavior stays the same.

Turn emergent behavior into intentional behavior where it matters. If something "just happens" because of interacting defaults, and customers rely on it, promote it to an explicit rule with an owner, a test, and a clear place in the code. If it is accidental and harmful, add guardrails that prevent it from reappearing.

Manage knowledge loss by making ownership concrete and durable. Assign an accountable owner for each cross-cutting domain (authz, billing, data lifecycle, compliance), keep an oncall or escalation path, and require a handoff checklist when teams change. Reorgs will still happen, but you can avoid re-learning the same things from scratch.

Treat investigations as an engineering skill and teach it directly. Provide playbooks for reading logs, reproducing production behavior safely, tracing through services, and using feature flags to isolate paths. Review "investigation writeups" the same way you review code: what evidence was used, what was ruled out, and what changed in the system to prevent recurrence.

Investigation playbook means a repeatable method for finding the real cause of behavior.

Instrument the product so answers come from data instead of people. Log decision points with stable identifiers, record key inputs (tenant type, plan, flags, region, policy), and make those logs easy to query. When you can reconstruct the decision path from telemetry, you reduce dependence on tribal knowledge.

Keep the number of variants smaller than it wants to be. Standardize on a limited set of plan types, policy knobs, and deployment modes, and aggressively retire rarely used branches. If you cannot delete variants, you should at least measure their usage and cost so the business can see the tradeoff.

Create a "complexity budget" tied to revenue impact. For each cross-cutting feature, estimate ongoing cost (support, incidents, engineering time, cognitive load) and compare it to the expected value. This makes complexity a managed resource rather than an untracked byproduct of ambition.

When documentation is necessary, make it executable or tightly coupled to change. Put key behavior in schemas, config definitions, and code comments that are enforced by review. For narrative docs, use ownership, review gates, and small scope: short runbooks for common questions beat long encyclopedias that rot.

Make support and engineering share the same debugging artifacts. Provide a way for support to capture a "decision trace" (inputs and reason codes) that engineering can replay. This reduces back-and-forth and prevents engineers from having to start every investigation by reconstructing the scenario.

Finally, accept that some ambiguity is structural, and optimize for fast, reliable rediscovery. If you cannot fully prevent the fog, build systems that let you cut through it quickly: centralized decision logic, reason codes, strong observability, and a habit of writing down answers where they will be reused.

2025-12-20 abseil / Performance Hints { abseil.io }

image-20251219212548654

2025-12-13 AI Can Write Your Code. It Can't Do Your Job. Terrible Software ✹ { terriblesoftware.org } ✹

image-20251212221823279

If you’re reading this, you’re already thinking about this stuff. That puts you ahead. Here’s how to stay there:

  1. Get hands-on with AI tools. Learn what they’re actually useful for. Figure out where they save you time and where they waste it. The engineers who are doing this now will be ahead.
  2. Practice the non-programming parts. Judgment, trade-offs, understanding requirements, communicating with stakeholders. These skills matter more now, not less.
  3. Build things end-to-end. The more you understand the full picture, from requirements to deployment to maintenance, the harder you are to replace.
  4. Document your impact, not your output. Frame your work in terms of problems solved, not lines of code written.
  5. Stay curious, not defensive. The engineers who will struggle are the ones who see AI as a threat to defend against rather than a tool to master.

The shape of the work is changing: some tasks that used to take hours now take minutes, some skills matter less, others more.

But different isn’t dead. The engineers who will thrive understand that their value was never in the typing, but in the thinking, in knowing which problems to solve, in making the right trade-offs, in shipping software that actually helps people.

OpenAI and Anthropic could build their own tools. They have the best AI in the world. Instead, they’re spending billions on engineers. That should tell you something.

2025-12-07 The Math of Why You Can't Focus at Work | Off by One { justoffbyone.com }

image-20251207134426233

found in: Leadership in Tech Why you can't focus at work

image-20251207134528777

2025-11-28 How good engineers write bad code at big companies { www.seangoedecke.com }

image-20251128133307489


Bad code at large tech companies emerges not from weak engineers but from structural pressures like constant team churn, tight deadlines, and unfamiliar legacy systems that force competent people to write quick, imperfect fixes.

Short engineer tenure combined with decade-old codebases means most changes are made by relative beginners who lack deep context, making messy or fragile solutions an inevitable byproduct of the environment.

Code quality depends heavily on a small group of overloaded experts who informally guard the system, but because companies rarely reward long-term ownership, their influence is fragile and inconsistently applied.

Big tech intentionally optimizes for organizational legibility (Seeing like a software company) and engineer fungibility over deep expertise, creating conditions where hacky code is a predictable side effect of easy reassignability and constant reprioritization.

Individual engineers have limited power to counteract these structural forces, so the most effective strategy is to become an "old hand" who selectively blocks high-risk decisions while accepting that not all ugliness is worth fighting.

Most big-company work is impure engineering driven by business constraints rather than technical elegance, so ugly-but-working code is often the rational outcome rather than a failure of ability or care.

Public ridicule of bad big-company code usually misattributes blame to an individual rather than recognizing that the organization’s incentives, review bandwidth, and churn make such outcomes routine.

Improving quality meaningfully requires leadership to change incentives, stabilize ownership, and prioritize expertise, because simply hiring better engineers cannot overcome a system designed around speed and reassignability.

2025-11-28 Feedback doesn't scale | Another Rodeo { another.rodeo }

When you're leading a team of five or 10 people, feedback is pretty easy. It's not even really “feedback”: you’re just talking. You may have hired everyone yourself. You might sit near them (or at least sit near them virtually). Maybe you have lunch with them regularly. You know their kids' names, their coffee preferences, and what they're reading. So when someone has a concern about the direction you're taking things, they just... tell you.

You trust them. They trust you. It's just friends talking. You know where they're coming from.

At twenty people, things begin to shift a little. You’re probably starting to build up a second layer of leadership and there are multiple teams under you, but you're still fairly close to everyone. The relationships are there, they just may be a bit weaker than before. When someone has a pointed question about your strategy, you probably mostly know their story, their perspective, and what motivates them. The context is fuzzy, but it’s still there.

Then you hit 100

image-20251127225216543

2025-11-28 Tiger Style { tigerstyle.dev }

Tiger Style is a coding philosophy focused on safety, performance, and developer experience. Inspired by the practices of TigerBeetle, it focuses on building robust, efficient, and maintainable software through disciplined engineering.

Summary

  1. Core principles
  2. Design goals
    1. Safety
    2. Performance
    3. Developer experience

image-20251127225023662

2025-11-09 To get better at technical writing, lower your expectations { www.seangoedecke.com }

image-20251108161947048


Write for people who will not really read you. Put your main point in the first sentence and, if possible, in the title. Keep everything as short as you can. Drop most nuance and background. Say one clear thing for a broad audience, like "this is hard" or "this is slow." Reserve long, detailed docs for the tiny group of engineers who actually need all the details. Before you write, force yourself to express your idea in one or two sharp sentences, and build only the minimum around that.

Do this because almost nobody will give your writing full attention. Most readers will glance at the first line, skim a bit, then stop. They do not share your context, they do not care as much as you do, and they do not have time. No document will transfer your full understanding or perfectly align everyone. Real understanding comes from working with the system itself. In that reality, a short, front-loaded note that lands a single important idea is far more useful than a long, careful essay that most people never finish.

2025-11-05 Send this article to your friend who still thinks the cloud is a good idea { rameerez.com }

image-20251104195529880

2025-11-02 Your URL Is Your State { alfy.blog }

image-20251102114702224

Couple of weeks ago when I was publishing The Hidden Cost of URL Design I needed to add SQL syntax highlighting. I headed to PrismJS website trying to remember if it should be added as a plugin or what. I was overwhelmed with the amount of options in the download page so I headed back to my code. I checked the file for PrismJS and at the top of the file, I found a comment containing a URL:

/* https://prismjs.com/download.html#themes=prism&languages=markup+css+clike+javascript+bash+css-extras+markdown+scss+sql&plugins=line-highlight+line-numbers+autolinker */

I had completely forgotten about this. I clicked the URL, and it was the PrismJS download page with every checkbox, dropdown, and option pre-selected to match my exact configuration. Themes chosen. Languages selected. Plugins enabled. Everything, perfectly reconstructed from that single URL.

It was one of those moments where something you once knew suddenly clicks again with fresh significance. Here was a URL doing far more than just pointing to a page. It was storing state, encoding intent, and making my entire setup shareable and recoverable. No database. No cookies. No localStorage. Just a URL.

This is scary:

image-20251102120457125

2025-11-04 hwayne/awesome-cold-showers: For when people get too hyped up about things { github.com }

image-20251103200937106

đŸȘ» Bloom Filter​

2025-11-25 Bloom filters: the niche trick behind a 16× faster API | Blog | incident.io { incident.io }

image-20251124210734023

The goal of the optimization is to reduce the cost of fetching and decoding data by pushing as much of the filtering as possible into Postgres itself, specifically by making better use of the JSONB attribute data so that fewer irrelevant rows ever reach the application.

Two approaches are considered: using a GIN index on the JSONB column, which is the standard Postgres solution for complex types, or introducing a custom encoding where attribute values are turned into bit strings so the database can perform fast bitwise membership checks instead.

A bloom filter is introduced as the core idea of the second approach: a probabilistic data structure that can say an item is definitely not in a set or might be in it, with the benefit of very efficient use of time and space.

A bloom filter is a compact data structure that lets you test set membership quickly by allowing some false positives but no false negatives.
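
A toy bloom filter makes the "definitely not / maybe" behavior concrete. Hash functions are passed in so any string hash works; the class, sizes, and API below are illustrative and are not what incident.io actually shipped:

```typescript
class BloomFilter {
  private bits: Uint8Array;

  constructor(private size: number, private hashFns: Array<(s: string) => number>) {
    this.bits = new Uint8Array(Math.ceil(size / 8)); // one bit per position, packed into bytes
  }

  private positions(item: string): number[] {
    return this.hashFns.map((h) => (h(item) >>> 0) % this.size);
  }

  add(item: string): void {
    for (const p of this.positions(item)) this.bits[p >> 3] |= 1 << (p & 7);
  }

  // false means definitely not present; true means possibly present (false positives allowed).
  mightContain(item: string): boolean {
    return this.positions(item).every((p) => (this.bits[p >> 3] & (1 << (p & 7))) !== 0);
  }
}
```

Encoded as a bit string per row, the same membership test becomes a cheap bitwise check the database can evaluate before rows ever reach the application, which is the idea behind the second approach.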

2025-12-15 Bloom Filters { www.jasondavies.com }

image-20251214200242738

2025-12-15 jasondavies/bloomfilter.js: JavaScript bloom filter using FNV for fast hashing { github.com }

FNV Hash

2025-12-15 lcn2/fnv: FNV hash tools { github.com }

Fowler/Noll/Vo hash

The basis of this hash algorithm was taken from an idea sent as reviewer comments to the IEEE POSIX P1003.2 committee by:

Phong Vo
Glenn Fowler

In a subsequent ballot round Landon Curt Noll improved on their algorithm. Some people tried this hash and found that it worked rather well. In an email message to Landon, they named it the Fowler/Noll/Vo or FNV hash.

FNV hashes are designed to be fast while maintaining a low collision rate. The FNV speed allows one to quickly hash lots of data while maintaining a reasonable collision rate. See http://www.isthe.com/chongo/tech/comp/fnv/index.html for more details as well as other forms of the FNV hash. Comments, questions, bug fixes and suggestions welcome at the address given in the above URL.
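
For reference, FNV-1a in TypeScript is only a few lines. The 32-bit offset basis (2166136261) and prime (16777619) below are the published FNV parameters; hashing UTF-16 code units instead of raw octets and varying the seed to get extra hash functions are shortcuts for this sketch:

```typescript
// FNV-1a, 32-bit: XOR each input unit into the hash, then multiply by the FNV prime.
function fnv1a(s: string, seed = 0x811c9dc5 /* offset basis 2166136261 */): number {
  let h = seed >>> 0;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);               // UTF-16 code unit; canonical FNV hashes octets
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by the 32-bit FNV prime 16777619
  }
  return h;
}

// Two differently seeded variants are enough to drive the bloom filter sketch above:
// new BloomFilter(1024, [s => fnv1a(s), s => fnv1a(s, 0x9747b28c)]);
```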

😁 Fun / Retro​

2025-12-09 Legacy Update: Get back online, activate, and install updates on your legacy Windows PC { legacyupdate.net }

image-20251208221839451

image-20251209182938697

2025-12-09 Legacy Update { github.com } image-20251208221932089

2025-11-29 Mac OS 9 Images < Mac OS 9 Lives { macos9lives.com }

image-20251128213821336 2025-11-29 Tiernan's Comms Closet How to Set Up Mac OS 9 on QEMU { www.tiernanotoole.ie }

👂 The Ear of AI (LLMs)​

2025-12-24 Tencent-Hunyuan/AutoCodeBenchmark { github.com }

image-20251224152333225

2025-12-23 The Illustrated Transformer Jay Alammar Visualizing machine learning one concept at a time. { jalammar.github.io }

Discussions: Hacker News (65 points, 4 comments), Reddit r/MachineLearning (29 points, 3 comments) Translations: Arabic, Chinese (Simplified) 1, Chinese (Simplified) 2, French 1, French 2, Italian, Japanese, Korean, Persian, Russian, Spanish 1, Spanish 2, Vietnamese Watch: MIT’s Deep Learning State of the Art lecture referencing this post Featured in courses at Stanford, Harvard, MIT, Princeton, CMU and others

image-20251222213442305 image-20251222213511023

2025-12-23 Scaling LLMs to larger codebases - Kieran Gill { blog.kierangill.xyz }

image-20251222212757899


This was the third part of a series on LLMs in software engineering.

First we learned what LLMs and genetics have in common. (part 1) LLMs don't simply improve all facets of engineering. Understanding which areas LLMs do improve (part 2) is important for knowing how to focus our investments. (part 3)


Invest in reusable context that makes the model behave like someone who already knows your codebase, so prompts can stay focused on requirements instead of restating conventions every time.

A prompt library is reusable context you give a model so it follows your codebase conventions.

Aim for workflows where output is usable in one pass, because the main cost comes from rework when you have to repeatedly intervene and patch what it produced.

One-shotting is getting a usable solution from a model in a single attempt.

Reduce hidden complexity in the system first, because accumulated compromises make every change harder for both humans and tools, which limits automation gains.

Technical debt is accumulated design and code compromises that make future changes harder and riskier.

Structure the system into clear, independent parts with stable boundaries, so changes can be localized and the context needed for edits stays small and high-quality.

Modularity is organizing software into well-defined parts that can be understood and changed independently.

Treat quality and safety as a checking problem, not a prompting problem, and build verification into the process because instructions do not guarantee the code actually meets the intent.

Verification is the process of checking that changes are correct, safe, and meet requirements.


Aside: Example usage of a prompt library.

You are helping me build a new feature. 
Here is the relevant documentation to onboard yourself to our system:
- @prompts/How_To_Write_Views.md -- Our conventions and security practices for sanitizing inputs.
- @prompts/How_To_Write_Unit_Tests.md -- Features should come with tests. Here are docs on creating test data and writing quality tests.
- @prompts/Testing_Best_Practices.md -- How to make a test readable. When to DRY test data creation.
- @prompts/The_API_File.md -- How to find pre-existing functionality in our system.

Feature request:

Extend our scheduler to allow for bulk uploads.
- This will happen via a csv file with the format `name,start_time,end_time`.
- Times are given in ET.
- Please validate the user exists and that the start and end times don't overlap. You should also make sure there are no pre-existing events for a given row; we don't want duplicates.
- I recommend starting in `@server/apps/scheduler/views.py`.

Or, better yet, the preamble is preloaded into the model's context (for example, by using CLAUDE.md).

Your prompt should be thorough enough to guide the LLM to the right choices. But verification is required. Read every line of generated code. Just because you told an LLM to sanitize inputs doesn't mean it actually did.

2025-12-09 Has the cost of building software just dropped 90%? - Martin Alderson { martinalderson.com }

image-20251208220102525


Domain knowledge is the only moat

So where does that leave us? Right now there is still enormous value in having a human 'babysit' the agent - checking its work, suggesting the approach and shortcutting bad approaches. Pure YOLO vibe coding ends up in a total mess very quickly, but with a human in the loop I think you can build incredibly good quality software, very quickly.

This then allows developers who really master this technology to be hugely effective at solving business problems. Their domain and industry knowledge becomes a huge lever - knowing the best architectural decisions for a project, knowing which framework to use and which libraries work best.

Layer on understanding of the business domain and it does genuinely feel like the mythical 10x engineer is here. Equally, the pairing of a business domain expert with a motivated developer and these tools becomes an incredibly powerful combination, and something I think we'll see becoming quite common - instead of a 'squad' of a business specialist and a set of developers, we'll see a far tighter pairing of a couple of people.

This combination allows you to iterate incredibly quickly, and software becomes almost disposable - if the direction is bad, then throw it away and start again, using those learnings. This takes a fairly large mindset shift, but the hard work is the conceptual thinking, not the typing.

2025-12-09 AI should only run as fast as we can catch up - Higashi.blog { higashi.blog }

image-20251208220410815

2025-12-02 Codex, Opus, Gemini try to build Counter Strike { www.instantdb.com }

image-20251201222631428

image-20251201222543416

2025-11-29 So you wanna build a local RAG? { blog.yakkomajuri.com }

When we launched Skald, we wanted it to not only be self-hostable, but also for one to be able to run it without sending any data to third-parties.

With LLMs getting better and better, privacy-sensitive organizations shouldn't have to choose between being left behind by not accessing frontier models and doing away with their commitment to (or legal requirement for) data privacy.

So here's what we did to support this use case and also some benchmarks comparing performance when using proprietary APIs vs self-hosted open-source tech.

image-20251128215551615

2025-11-28 InstaVM - Secure Code Execution Platform { instavm.io }

image-20251128135958931

LLMs perform more reliably when you avoid sending redundant or unchanged context. LLMs handle exact or brittle tasks poorly, so shift precision work into generated code or external tools. Long prompts degrade accuracy, making it essential to keep context well below the model's limits. Models struggle with obscure or rapidly evolving topics unless you supply up-to-date, targeted documentation. AI-generated code remains fallible, requiring disciplined human review to prevent security and correctness issues.

2025-11-27 addyosmani/gemini-cli-tips: Gemini CLI Tips and Tricks { github.com }

image-20251126202046041

This guide covers ~30 pro-tips for effectively using Gemini CLI for agentic coding.

Gemini CLI is an open-source AI assistant that brings the power of Google's Gemini model directly into your [terminal](https://www.philschmid.de/gemini-cli-cheatsheet). It functions as a conversational, "agentic" command-line tool - meaning it can reason about your requests, choose tools (like running shell commands or editing files), and execute multi-step plans to help with your development [workflow](https://cloud.google.com/blog/topics/developers-practitioners/agent-factory-recap-deep-dive-into-gemini-cli-with-taylor-mullen).

In practical terms, Gemini CLI acts like a supercharged pair programmer and command-line assistant. It excels at coding tasks, debugging, content generation, and even system automation, all through natural language prompts. Before diving into pro tips, let's quickly recap how to set up Gemini CLI and get it running.

2025-11-15 Anthropic admits that MCP sucks - YouTube { www.youtube.com }

image-20251114190038807

image-20251114193003068

2025-11-15 Code execution with MCP: building more efficient AI agents \ Anthropic { www.anthropic.com }

The core problem is context bloat. MCP clients typically load all tool definitions into the system prompt, then run multi-step flows like gdrive.find_document followed by gdrive.get_document, then another tool, and so on. Every tool definition and every intermediate result lives in the context window, so each new tool call resends the whole history as input tokens. This quickly explodes into hundreds of thousands of tokens, increases latency, raises costs, and raises the chance of mistakes. Real-world setups like Trey show agents with dozens of tools, including irrelevant ones like Supabase for users who do not even use it, which only adds noise.

Anthropic’s proposed fix is to treat MCP servers as code APIs and let the model write code that calls them from a sandboxed execution environment, usually using TypeScript. Tools become normal functions in a file tree, and the model discovers and imports only what it needs. Most of the heavy lifting happens in code, not in the LLM context. That cuts token usage dramatically, makes workflows faster, and lets the model leverage what it is actually good at, which is writing and navigating code rather than juggling hundreds of tool definitions and transcripts.

This code first approach also solves privacy, composition, and state in a more normal way. Large documents, big tables, and joined data can be filtered, aggregated, and matched in code, then only the small, relevant results are sent back to the model. Sensitive fields can be tokenized before they ever reach the LLM and untokenized only when communicating between backends like Google Sheets and Salesforce. State can live in memory or in files, and reusable workflows can be saved as functions or skills, which starts to look a lot like conventional SDKs and libraries.

· 51 min read

⌚ Nice watch!​

2025-11-27 Platform Engineering is Domain Driven Design - Gregor Hohpe - DDD Europe 2025 - YouTube { www.youtube.com }

image-20251127095532793

We tend to see ourselves as the ones pushing technology forward, but once systems become more complex than our own ability to understand and safely operate them, we become the limiting factor, not the machinery.

The attempt to "shift left" in software delivery has often turned into "pile left": a single developer is expected to own product, analysis, design, security, operations, and only then write code, which drives up mental load to an unsustainable level.

Cognitive load is the amount of mental effort required to understand and work with a system.

In response to this overload, organizations promote platforms as a cure-all: a shared environment that promises abstractions, guardrails, and higher productivity, but people often treat the label as magic without understanding how such a thing actually needs to work.

image-20251127100132126

The idea of building on a shared base layer is old, but historically it was justified by cost and reuse: if everyone uses the same foundation, the fixed investment is amortized and you save money compared to building everything from scratch.

When the main driver shifts from cost to speed, the picture changes: what matters is how quickly teams can move, not how perfectly everything is standardized, and that leads to very different design choices for shared infrastructure.

If a single, broad platform becomes a lowest common denominator that serves nobody particularly well, it reduces velocity; in a fast-changing environment, smaller, opinionated building blocks are often better for speed than one huge, generic base.

The classic "pyramid" model of software delivery assumes a big shared base and tiny bespoke tops; it looks economically attractive on paper, but it only works if the people designing the base can anticipate almost everyone else’s needs in advance.

That anticipation requirement is both unrealistic and undesirable, because it suppresses unexpected ideas: a system designed only around predicted use cases leaves little room for people to do things no one thought of, which is where real innovation tends to come from.

Innovation is the emergence of valuable new behavior that was not anticipated or centrally planned.

A healthy shared environment should widen the playing field instead of narrowing it, improving the economics of experimentation so that you get more diverse solutions on top, not fewer.

The tension in many organizations is that developers expect the "open" picture with lots of room to explore, while infrastructure or operations groups favor the tight pyramid for control and standardization, and trying to satisfy both with one design often fails.

The car industry illustrates a better model: auto makers invest heavily in shared components like chassis, engines, and electronics, but use them to support a wide variety of models and trims, turning a deep common platform into dozens of differentiated products.

Cloud providers work similarly: they use massive economies of scale to build data centers, networks, and complex services, yet expose them so that individual teams can move in "speed economics", renting small slices and iterating quickly without feeling the cost of the underlying machinery.

Economies of scale mean shared resources become cheaper per unit as you grow; economies of speed mean you optimize for fast learning and change, not unit cost.

A good shared platform combines these two worlds: underneath, it leverages scale to justify big investments; on top, it preserves speed and diversity so teams can innovate rapidly without needing to build the heavy substrate themselves.

image-20251127102658544

A simple test of whether an internal platform is real or just a renamed "common services layer" is whether teams have built something on it that its creators did not foresee; unanticipated but successful uses are a strong signal that the top of the pyramid is truly open.

The Cadillac Cimarron story shows the danger of shrinking the "top" too much: when manufacturers tried to pass off a lightly modified mass-market car as a luxury model, customers rejected it because there was no genuine differentiation, mirroring what happens when a platform leaves no room for innovation above it.

Organizational charts that show "applications" sitting on top of "platform" can be misleading, because they are static; speed, learning, and change are dynamic phenomena, so you need a model that shows iterative loops, not just boxes and layers.

Viewed as a cycle, software delivery has an "inner loop" (build, run, adjust) and "outer loop" (observe business impact and refine), and if the platform team is inserted directly into that loop for every change, it becomes a bottleneck instead of an accelerator.

A well-designed internal environment makes itself invisible in the day-to-day flow: teams deploy, operate, and learn without filing tickets or needing manual help from the platform group, which should provide the rails and instruments, not sit in the control loop.

The metaphor of guardrails is often misused: guardrails are emergency safety devices, not steering mechanisms, and if developers constantly "bounce off" organizational guardrails, the system is high friction and fragile rather than smooth and safe.

Instead of relying on hard boundaries, you want self-centering mechanisms like a train’s conical wheels or a car’s lane assist: low-friction feedback that keeps teams in a good path most of the time and only resorts to hard stops or blocks in real emergencies.

Transparency is one of those self-centering mechanisms: if teams can see cost, reliability, and performance effects of their choices immediately, they can adjust their behavior without needing central enforcement to correct every mistake.

Abstraction is central to platforms, but it is often misunderstood: hiding controls or renaming knobs is not enough; you must decide what concepts to emphasize, what to downplay, and how to express the underlying complexity without creating a false sense of simplicity.

An abstraction is a new way of describing a system that hides some details while emphasizing others that matter more for a given purpose.

Naming is part of this: good names describe the essential property of the whole system (like "automobile", something that moves itself), not the list of internal parts, while bad names like "gas pedal" leak implementation details that may no longer be true.

image-20251127104529860

Remote procedure call is a canonical example of a leaky abstraction: calling something "just like a local method" hides huge differences in latency, failure modes, and consistency, leading people to misuse the mechanism unless they understand what is really going on.

Because of this, there is no formula for "perfect abstraction level": it is a calibration exercise where you use feedback from actual use to adjust what you expose, much like steering a car by continually correcting rather than calculating the perfect trajectory in advance.

We often think that hiding details means people see less, but in practice careful omission can make important properties more visible; by removing noise you highlight structure, but this only works if you choose what to omit and what to emphasize wisely.

The number of parameters or methods in an interface is not a reliable measure of mental burden; a single overloaded flag or magic string can encode a huge amount of hidden behavior, yielding high cognitive load under the appearance of simplicity.

Overloaded parameters, "do everything" flags, and similar tricks repeat old mainframe-era patterns where fields were reused for multiple meanings because schemas were hard to change; they lower the visible surface but raise the mental work required to use them correctly.

The goal is not to make inherently complex things "simple" at all costs, but to make them intuitive: the surface should reflect the real challenges of the domain in a way that aligns with how users think and work, without pretending those challenges do not exist.

When teams build shared systems, the natural gravitational pull is toward more capability: adding more features, more knobs, more supported cases, because that is how products are typically evaluated and how internal stakeholders ask for enhancements.

What actually matters for fast, safe change is composability: small, coherent building blocks that can be combined easily, consistently, and predictably, so teams can assemble new solutions without needing a giant one-size-fits-all service.

Composability is the ability to combine simple pieces into more complex behavior without unexpected side effects.

A platform team must consciously push against the "more features" gravity by investing in coherence and composition, ensuring that services fit together cleanly rather than just accumulating independent capabilities at the bottom.

Historical platforms like operating systems did not reduce cognitive load by pre-filling disk drive numbers; they introduced new concepts such as files, streams, and processes, building a fresh vocabulary that sits between raw hardware and application logic.

For software teams, the sweet spot for internal platforms lies in the space between the business domain (customers, ledgers, policies) and the raw cloud services (databases, queues, functions) where you define concepts that encode both technical and organizational concerns.

Examples include distinguishing a "customer-data database" from a "non-PII database", or creating a "ledger database" that embodies auditability and change tracking; these abstractions bake in compliance, risk, and governance assumptions in a way infrastructure providers cannot.

To do this well, you must understand both sides: the technical constraints of services like managed databases and the specific business rules around data, regulation, and risk in your company, then reflect that understanding in the platform’s language and APIs.

Abstraction should also be two-way: when a higher-level concept compiles down into multiple low-level resources, there must be a clear mapping back, like stack traces and line numbers, so that when something goes wrong you can see which high-level construct caused it.

Physical properties such as latency, cost, quotas, and availability cannot be fully abstracted away; any attempt to pretend they are uniform across providers or environments creates leaky abstractions that eventually surprise and hurt the teams relying on them.

Generative AI does not remove the need for good platforms and abstractions: large models can generate code, but they still need clear, well-designed APIs and domain concepts to target, and they do not replace the human work of shaping those conceptual structures.

Finally, illusions in models and diagrams are like visual illusions in geometry: they can be surprisingly convincing, but physical reality eventually wins, so platform engineering has to respect the real constraints and behaviors of systems rather than relying on attractive but unrealistic pictures.

2025-11-23 Modern Architecture 101 for New Engineers & Forgetful Experts - Jerry Nixon - NDC Copenhagen 2025 - YouTube { www.youtube.com }

Author: Jerry Nixon {MSFT}

image-20251122173112292 image-20251122173206297

In this talk I walk through how I think as an architect in a world that is absurdly noisy for developers, especially the newer ones. I start by pushing back on the idea of universal “best practices” and reframe architecture as a series of context-heavy tradeoffs: politics, budget, legacy tech, team skills and time all matter more than whatever some big company did on a podcast. From there I define my job as the person who makes the decisions that are the most expensive to change later and, even more importantly, the person who protects the system from unnecessary stuff. Rather than chasing every fad, I care about keeping things as simple as they can reasonably be, knowing that every new box I add to the diagram is one more thing I own when it breaks.

Then I use the big architecture picture as a thinking model, not a prescription. I start with the simplest shape – a client, an API, a database – and show how I decide when to layer in new capabilities: separating reads from writes, adding replicas, splitting into services, introducing a service bus so systems can talk, hanging retries, caching and queues around the database so the user has a smooth experience even when the backend is struggling. I generalize patterns like event hubs and functions for reacting to changes, API gateways for versioning and protection, static hosting and CDNs at the edge, different storage patterns inside the database, data lakes and ML for analytics that feed back into the product, and identity with JWTs for access control. The point is to teach a way of reasoning: for each of these pieces, understand what it gives you, understand its caveats, and then deliberately choose what to leave out and what to postpone, because in the end simple, deferrable decisions usually make the strongest architecture.

2025-11-23 Workflows of Highly Functional App & Data Engineering Teams - Jerry Nixon - YouTube { www.youtube.com }

Author: Jerry Nixon {MSFT}

image-20251123152108766


{2025-11-23 note: LLM transcript test on this talk }

There is a common three-layer data model for analytics where data flows from a raw zone to a cleaned zone and finally to a curated zone; this structure helps separate ingestion, cleaning, and business-ready views of data.

Medallion strategy: A three-step way to organize data into raw, cleaned, and curated layers.

The first tier is a raw zone where copies of data from systems such as HR, sales, or factory applications are dumped into a data lake with no transformation, typically through extract-and-load jobs scheduled nightly, weekly, or by other batch or event processes.

Bronze layer: A place where raw copies of data from source systems are stored without changes.

The second tier is a cleaned zone where duplicates are removed, keys between different applications are aligned, value formats such as country codes are standardized, and fields are reshaped so that data from multiple systems can be reliably joined and queried together.

Silver layer: A cleaned area where data from different systems is deduplicated and made consistent.

The top tier is a curated zone where experts design models that are easy to use, well documented, and enriched with pre-aggregations and external sources like weather, so that common business questions can be answered quickly with minimal extra logic.

Gold layer: A curated area where data is modeled, documented, enriched, and pre-aggregated for business use.

In reality different applications inside the same company reach different tiers: one system might have fully curated data, another only cleaned data, and another just raw dumps, yet even the raw tier is valuable because it removes pressure from production systems and takes advantage of cheap storage, at the cost of large, messy data lakes.

Every move from one tier to the next adds latency because teams must negotiate extract schedules with system owners and then run heavy transformations, so when managers ask for real-time dashboards they usually mean updates as fast as is practical without huge extra cost, not truly instantaneous streaming like stock tickers.

Near real time: Data that is updated frequently enough to be useful but not instantly as events happen.

Modern pipelines often follow an extract-load-then-transform pattern in which data first lands in the lake and is transformed later once other sources arrive, and for this flow it is better to rely on a dedicated orchestration tool rather than hand-rolled notebooks with custom logging, auditing, and dashboards that become a maintenance burden for future developers.

ETL orchestration tool: A system that coordinates moving and transforming data between sources and targets in a repeatable way.

The three tiers differ in quality and timing: raw dumps may still expose subtle bugs from source systems, cleaned data starts to encode business rules and becomes much more reliable, and curated data is the most trustworthy and enriched but also the furthest behind real time due to all the work needed to prepare it.

Collaborative development on a database often degenerates into direct edits on a shared instance with scattered documentation about changes, which makes it hard to track differences between environments; a better approach is to capture the desired schema as declarative code under source control and let tools apply it to databases.

Declarative schema: A description of what the database should look like, stored as code, that tools use to create or update the actual database.

SQL Server database projects in Visual Studio provide this declarative model by letting teams define tables, views, procedures, and options as files, target specific SQL Server versions, build the project like any other codebase, and catch issues such as views referencing missing tables during compilation instead of at runtime.

Building such a project produces a schema package called a DACPAC that can be compared with a live database to generate upgrade scripts; within the project, all objects are defined with create statements, semantic renaming can safely update column names across all dependent objects, and pre- and post-deployment scripts plus publish profiles control behaviors such as blocking data-loss changes and recreating databases, enabling repeatable deployments and clean test-database resets.

DACPAC: A compiled package of a database schema that can be used to compare, deploy, or update databases consistently.

Similar capabilities exist in tools like Liquibase, Flyway, and DbUp, and the essential practice is to treat database schema as code managed in source control rather than working only against a live instance, regardless of which product is chosen.

A practical convenience for development environments is the Windows package manager winget, which can install tools such as SQL Server, SQL Server Management Studio, and Visual Studio through simple commands or scripts, making it easy to standardize and automate machine setup.

A sensible continuous integration pipeline for database changes starts by linting SQL to enforce style and consistency, and a dedicated T-SQL linting tool can run both locally and in build agents to keep the codebase clean and uniform.

After linting, regular MSBuild compiles the database project and produces the DACPAC, and a companion command-line tool can either apply the package directly to a target database or generate an incremental update script; the build agent that runs this often uses Windows, even in mostly cross-platform setups.

In CI it is safer to deploy schema changes into ephemeral SQL Server instances running in Linux containers from the official registry image, first applying a snapshot of the current production schema and then applying the new version to confirm that the upgrade path is valid, and only then running automated tests before any human reviews and approves scripts for later environments.

A modern cross-platform SQL command-line tool written in Go integrates with Docker to spin up and manage these containers, making it straightforward to create and tear down temporary test databases inside pipelines with single commands.

Database code such as stored procedures and views deserves automated unit tests just as much as application code, because future changes can silently break behavior, and tests serve as a safety net that guards against mistakes by any team member.

One effective pattern is to write tests as stored procedures that begin a transaction, prepare test data, call the code under test, and then use assertion helpers to check results before rolling back; these test procedures live in a separate database project that references the main project and an assertion schema, and an xUnit-based runner in application code can query for procedures that follow a naming convention and execute thousands of tests automatically in CI.

A shared assertion library for T-SQL provides helpers like equals and contains, along with example runner code, so teams can plug in a ready-made framework for writing and executing database tests instead of building all the infrastructure from scratch.

For the application-to-database boundary, a configurable data API engine can read a JSON description of the database and automatically expose REST and GraphQL endpoints with built-in security, telemetry, and resolvers, avoiding the need to hand-write CRUD APIs that repeat effort and require their own full testing and maintenance.

Data API engine: A configurable service that automatically exposes database tables as secure HTTP APIs without custom backend code.

The impact of these practices is not only technical but financial; one example describes a beverage company where surfacing data into a cleaned layer and analyzing it led to an 8 percent sales increase, roughly 440 million dollars, showing how modest engineering work on data quality and access can unlock large business gains.

A careful pattern for evolving schemas, such as splitting a full-name column into separate first- and last-name columns, starts by adding new columns while keeping the original, populating them via a trigger based on existing writes, holding back application changes behind configuration so production still writes only the old column, validating that downstream systems handle the new fields, and only then flipping the flag so the app writes directly to the new structure before eventually retiring the old one.

Feature flag: A configuration switch that turns a behavior in the application on or off without redeploying code.

Even seemingly simple changes like splitting names become complex with real data containing middle names and multiword surnames, so schema evolution requires patience, defensive design, and clear communication about effort and risk to managers who may assume the change is trivial.

2025-11-23 DORA Report 2024 explained for non-teching | DevOps Research & Assessment - YouTube { www.youtube.com }

image-20251122173927672

Elite teams in the DORA dataset deploy to production multiple times per day, have about a 5% change failure rate, can recover from failures within an hour, and can get a change from code to production in roughly a day.

Low-performing teams deploy at most once a month, have about a 40% change failure rate, can take up to a month to recover from failures, and may need up to six months to get a change into production.

Throughput and stability move together: teams that deploy less frequently tend to have higher failure rates; teams that deploy more often tend to have lower failure rates (correlation, not causation).

DORA’s main recommendation is that “elite improvement” matters more than “elite performance”: teams that keep improving their delivery metrics over time are healthier than teams that just chase a target level.

When a platform is present, developers report feeling about 8% more productive individually and teams about 10% more productive, with overall organizational performance up about 6% compared to teams without a platform.

Those same platform setups also see about an 8% drop in throughput and a 14% drop in change stability, and these drops correlate with higher burnout, so the platform brings both benefits and pain.

Developer independence (being able to do end-to-end work without waiting on an enabling team) is associated with about a 5% productivity improvement at both individual and team level.

The report strongly links a user-centered mindset to better outcomes: when teams align what they build with real user needs and feedback, they see higher productivity, lower burnout, and higher product quality.

Frequent priority churn hurts; stable priorities are associated with more productive teams and less burnout.

Cross-functional collaboration and good documentation show up as clear positive factors for long-term product quality and team health; optimizing only for throughput increases the risk of building “shiny but unused” features.

A 25% increase in transformational leadership (leaders who give purpose, context, and autonomy) is associated with about a 9% increase in employee productivity, plus lower burnout and higher job, team, product, and organizational performance.

Across a decade of data, DORA’s conclusion is that continuous improvement is not optional: organizations that do not keep improving are effectively falling behind, while those that treat transformation as an ongoing practice achieve the best results.

2025-11-23 Software Engineering at Google ‱ Titus Winters & Matt Kulukundis ‱ GOTO 2022 - YouTube { www.youtube.com }

image-20251122175056602

Software Engineering at Google

Book abseil / Software Engineering at Google { abseil.io }

In March, 2020, we published a book titled “Software Engineering at Google” curated by Titus Winters, Tom Manshreck and Hyrum Wright.

The Software Engineering at Google book (“SWE Book”) is not about programming, per se, but about the engineering practices utilized at Google to make their codebase sustainable and healthy. (These practices are paramount for common infrastructural code such as Abseil.)

We are happy to announce that we are providing a digital version of this book in HTML free of charge. Of course, we encourage you to get yourself a hard copy from O’Reilly if you wish.

Most chapters in “Software Engineering at Google” were written by subject-matter experts across Google; Titus, Tom Manshreck, and Hyrum Wright chose topics, matched authors, and did heavy editing, but did not write most of the content themselves.

A core rule of their build philosophy: if you ever need to run make clean (or the equivalent) to get a correct build, the build system is considered broken.

Hermetic build: every dependency (including toolchains and libraries) must be tracked by the build and version control; relying on whatever headers or .so files happen to be installed on a developer’s OS is treated as a violation.

Titus’s “heresy”: you should be able to update the OS on your dev machine without changing or breaking your build, because the build environment is controlled separately (containers, toolchains, etc).

The same hermeticity idea is applied to tests: especially unit tests should not depend on shared, mutable state (like a shared DB), otherwise only a few people can run them and they do not scale.

Testing is framed as “defense in depth”: cheap, fast unit tests; more expensive but higher-fidelity integration tests; and finally end-to-end tests and canary releases in production as the highest-fidelity checks.

Canarying small releases in production, frequently and safely, is described as the strongest possible way to validate a release, because it runs in the real environment against real traffic with controlled risk.

Dependency management is called out as Google’s biggest unsolved technical problem: they know many approaches that do not scale or are too high-toil, but do not yet have a dependency management model that is cheap, intuitive, and scales well.

Titus would rather deal with “any number of version-control problems than one dependency-management problem,” because once you lose track of what depends on what version, coordination and iteration become extremely expensive.

He leans heavily on DORA / “Accelerate”: trunk-based development, strong documentation, and especially continuous delivery correlate with better technical outcomes; his wish is for directors to set explicit CD and quality targets so teams have cover to fix flaky tests and infrastructure.

Software engineering is described as “multi-person, multi-version” work; the way universities train students (solo assignments, code rarely seen by others, slow feedback) does not prepare them for real team environments.

New hires are framed as “fresh eyes”: their confusion during onboarding is valuable signal about broken documentation, onboarding flows, and mental models, and they are expected to ask questions early instead of hiding until work is “perfect.”

On task assignment, the rule of thumb is: give work to the most junior engineer who can do it with reasonable oversight, not to the “best” engineer; that grows the team and frees seniors to tackle harder problems.

Leadership anti-pattern: “because I said so” or hard overruling is treated as a failure; you only override when the cost of a mistake is clearly higher than the learning value, and you should normally guide people to understand the tradeoff instead.

Real delegation means actually letting go: if you leave a project but keep trying to control it from the sidelines, you block growth and ownership in the new leads.

Design and evolution rule: always put version numbers on protocols and file formats; once data is on disk or “in the wild,” you accumulate permanent compatibility hacks if you can’t distinguish old from new.

Example of irreversible complexity: a historical bug wrote a specific bad checksum pattern for several days; because they do not know if those records still exist, the code has to keep a “special case” branch forever to accept that pattern.

Large-scale changes (LSCs) across Google’s codebase are only possible because of heavy investment in unified build systems, testing, and a monorepo; trying to do the same over today’s heterogeneous open-source ecosystem (inconsistent builds, many projects already broken) is seen as practically infeasible.

2025-11-20 Algorithms Demystified - Dylan Beattie - NDC Copenhagen 2025 - YouTube { www.youtube.com }

image-20251120000410212

image-20251120002232792

image-20251120004015553


Have you ever got stuck on a coding problem? Maybe you're implementing a feature on one of your projects, maybe you're solving puzzles for something like Advent of Code, and you get stuck. You just can't figure out how to get the result you need. So you head over to Stack Overflow, or Reddit, or ask a colleague for help, and you get an answer like "oh, that's easy, just use Dijkstra's algorithm"... and your brain crashes. Use what? So you go and look it up and discover it's for "finding the shortest paths between nodes in a weighted graph", and now you've got to look up what a "node" is, and what a "weighted graph" is... and then figure out how to turn all that into working code? Nightmare. Well, it doesn't have to be like that. Algorithms are the key to all kinds of features and systems, from networks to autocorrect, and understanding how they work will help you build better software, fix subtle bugs - and solve Advent of Code. In this talk, we'll meet some of his favourite algorithms, explain why they're important, and help you understand what they do, and how they do it.

2025-11-23 Stoicism as a philosophy for an ordinary life | Massimo Pigliucci | TEDxAthens - YouTube { www.youtube.com }

image-20251123004845461


This approach to life started with Zeno, a merchant who lost everything in a shipwreck, went to Athens, studied with local thinkers, and then founded a school in the public market to help ordinary people live better lives.

This school of thought became one of the main currents in the ancient world, shaping writers like Seneca and Marcus Aurelius and later influencing Paul, Aquinas, Descartes, and Spinoza.

The teaching says we should live “according to nature”, meaning we take human nature seriously: we are social and we can reason, so a good life uses our mind to improve life in the community, not only our own comfort.

Living according to nature means using your mind and social side to make life better for yourself and others.

This way of living rests on four main virtues: practical wisdom (knowing what truly helps or harms you), courage (especially moral courage), justice (treating others fairly), and temperance (doing things in the right measure, avoiding excess and lack).

A key idea is the “dichotomy of control”:

  • some things are up to us (our opinions, motives, and choices)
  • others are not (our body, property, reputation and most outcomes), so we should focus only on what is really in our power.

The dichotomy of control means separating what you can change from what you cannot.

We control our intentions, effort, and preparation, but we do not control final results; like an archer, we can train and aim well, but once the arrow leaves the bow, wind and movement can change what happens.

Applied to work, this view says we should stop worrying about “getting the promotion”, which depends on others and luck, and instead care about doing our best work, preparing well, and truly earning that promotion.

Applied to relationships, we cannot force someone to love us or stay with us; what we can control is being kind, reliable, and loving.

Epictetus, a former slave who became a teacher, says that if we clearly see what belongs to us (our judgments and choices) and what does not (most external things), we stop blaming others and stop feeling stuck because of what we cannot change.

He adds that peace comes from internalizing goals: judging ourselves by whether we acted well and tried our best, not by whether events turned out the way we wanted.

Their ethics can be seen as “role ethics”: a good life means playing our different roles well and balancing them as wisely as we can.

Role ethics means judging your life by how well you carry out your different roles.

We all have three types of roles: the basic role of human being and world citizen; roles given by circumstance (child, sibling, etc); and roles we choose (career, partner, parent, and so on).

The role “human being” comes first; whenever we act, we should ask whether the action is good for humanity, and if it is not, we should not do it.

Other roles must be balanced; we cannot be perfect in all roles at once, so this way of thinking helps us handle tradeoffs between being, for example, a good parent, worker, friend, and citizen.

Integrity is central: Epictetus warns us not to “sell” our integrity cheaply; the aim is not to be flawless but to become a bit better than yesterday without betraying our core values.

Integrity means sticking to your values even when it costs you.

We should not confuse feelings with duties: strong emotions like fear, grief, or distress arise on their own, but our actions in response remain our responsibility.

In the story of the distraught father and sick daughter, feeling overwhelmed is natural, but leaving the child and the burden to the mother fails his duty; real courage and justice mean staying and caring for his daughter despite the pain.

The virtues guide how we handle roles and role conflicts: practical wisdom to see what is truly good, courage to do it, justice to treat others rightly, and temperance to spread our time and energy in a balanced way.

These thinkers suggest seeing yourself as an actor: life hands you a role and a rough script, but how you play that role is up to you.

We learn to play our roles better by using role models, real or fictional, as patterns to imitate in our own choices.

Ancient writers admired figures like Cato the Younger for his honesty and Odysseus for his steady effort to return home to his family, even when faced with comfort or immortality.

Modern examples include Nelson Mandela, who used ideas from Marcus Aurelius to move from anger to reconciliation, and Susan Fowler, who risked her career to expose wrongdoing at her company out of a sense of duty.

Even fictional characters can help: Spider-Man’s line “with great power comes great responsibility” sums up the idea that any real power or choice we have always comes with moral responsibility.

Overall, the picture is that we wear many “masks” or roles in life and switch among them as things change, and a good life is one where we play all these roles with skill, balance, honesty, and focus on what is truly under our control.

2025-11-29 C++ The worst programming language of all time - YouTube { www.youtube.com }

image-20251128215827430

image-20251129090051081

C++ uses many different syntaxes and rules for simple tasks such as initialization, printing, random numbers, casting, globals, and inheritance, so code often becomes verbose, irregular, and hard to recall from memory.

Language constructs for casting are split across static_cast, reinterpret_cast, const_cast, dynamic_cast, and bit_cast, and the correct choice depends on subtle context; hiding these behind user-defined aliases is discouraged because it makes code harder for others to read.
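
As a quick illustration of how the choice of cast depends on intent, here is a minimal sketch (the Base/Derived types and variable names are invented; std::bit_cast is omitted since it needs C++20):

```cpp
#include <cstdint>
#include <iostream>

struct Base { virtual ~Base() = default; };   // polymorphic, so dynamic_cast works
struct Derived : Base { int payload = 42; };

int main() {
    double ratio = 3.9;
    int truncated = static_cast<int>(ratio);         // well-defined value conversion: 3

    const int locked = 7;
    int* unlocked = const_cast<int*>(&locked);       // strips const; writing through it would be UB
    (void)unlocked;

    int value = 1;
    auto bits = reinterpret_cast<std::uintptr_t>(&value);   // pointer re-read as an integer

    Derived d;
    Base* base = &d;
    Derived* back = dynamic_cast<Derived*>(base);    // checked downcast, nullptr on failure

    std::cout << truncated << ' ' << bits << ' ' << back->payload << '\n';
}
```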

Keywords such as static, extern, inline, const, and constexpr have multiple meanings depending on where they appear, including effects on storage duration, linkage, visibility, optimization hints, and One Definition Rule handling, and these meanings must be memorized rather than inferred from their names.

The keyword static is reused for several unrelated purposes, including persisting local variables between function calls, sharing class variables and functions between instances, and limiting visibility of functions and variables to a translation unit.
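
A minimal sketch of the three unrelated jobs `static` does (all names invented for illustration):

```cpp
#include <iostream>

static int file_local = 0;        // internal linkage: visible only in this translation unit

struct Widget {
    static int live_count;        // one variable shared by every Widget instance
    Widget()  { ++live_count; }
    ~Widget() { --live_count; }
};
int Widget::live_count = 0;

int next_id() {
    static int id = 0;            // persists between calls to next_id()
    return ++id;
}

int main() {
    Widget a, b;
    ++file_local;
    std::cout << Widget::live_count << ' ' << next_id() << ' ' << next_id() << '\n';   // 2 1 2
}
```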

The keyword inline historically requested inlining of functions but now mostly manages One Definition Rule issues; using it on functions and variables affects how many instances can appear across a program, and inline on namespaces is used for library versioning.

The keyword constexpr enables compile-time evaluation for expressions, functions, and variables, but each standard version changes which constructs are allowed, and constexpr functions are implicitly inline while constexpr variables are not.

constexpr: A specifier that allows certain expressions and functions to be evaluated at compile time if they satisfy strict rules.

C++ builds interfaces by combining virtual functions and pure virtual members written as = 0 instead of a dedicated interface keyword, and the override specifier is optional, so incorrect overrides can compile without warning if override is omitted.
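
A minimal sketch of this (Shape/Square are invented names):

```cpp
#include <iostream>

// There is no `interface` keyword: an interface is just a class whose
// member functions are pure virtual (`= 0`).
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape {
    explicit Square(double side) : side_(side) {}

    // `override` is optional, but it turns a signature mismatch (e.g. forgetting
    // `const`) into a compile error instead of a silently unrelated function.
    double area() const override { return side_ * side_; }

private:
    double side_;
};

int main() {
    Square sq{3.0};
    const Shape& shape = sq;
    std::cout << shape.area() << '\n';   // 9, dispatched through the vtable
}
```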

The integer type system is large and implementation-dependent; sizes for short, int, long, long long, and size_t vary by platform and compiler, and only constraints like sizeof(short) <= sizeof(int) <= sizeof(long) are guaranteed.

Character and string handling uses multiple types, including char, signed char, unsigned char, wchar_t, char16_t, char32_t, std::string, and std::wstring, and developers must know which types represent text, bytes, or wide encodings and how they interact.

Many tasks can be expressed with several different syntaxes: functions can use traditional declarations or trailing return types; callables can be implemented as functions, struct function objects, or lambdas; and const can appear on either side of the type, or on pointers, references, and member functions.

The const keyword serves multiple purposes: it can make variables read-only, mark member functions that do not modify logical state, apply to pointers versus pointed-to values, and be combined with mutable to allow specific fields to change even in const objects, and const_cast can remove constness where allowed.

The same type can be written as const int or int const with no semantic difference, and both pointers and pointees can be const, so developers must learn rules such as reading declarations right-to-left or using patterns like “east const” versus “west const” to understand them.
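
A compact sketch of these placement rules: `a` and `b` have identical types, and the pointer declarations are easiest to read right-to-left:

```cpp
#include <iostream>

int main() {
    int x = 1, y = 2;

    const int a = x;     // "west const"
    int const b = x;     // "east const" -- exactly the same type as `a`

    const int* p = &x;   // pointer to const int: *p cannot be written, p can be reseated
    p = &y;              // ok
    // *p = 3;           // error: pointee is const

    int* const q = &x;   // const pointer to int: *q can be written, q cannot be reseated
    *q = 3;              // ok (x is now 3)
    // q = &y;           // error: the pointer itself is const

    const int* const r = &y;   // neither the pointer nor the pointee can change

    std::cout << a << b << *p << *q << *r << '\n';   // prints 11232
}
```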

Code style varies widely between projects; conventions for naming, brace placement, order of public and private sections, pointer and reference binding (int* p vs int *p), snake_case versus camelCase, and east const versus west const differ, making each codebase feel like a separate dialect.

Guides such as large corporate style guides and the C++ Core Guidelines exist but are incomplete, opinionated, or not widely followed, so there is no single canonical style or “modern” subset that all projects share.

Standard library names are often terse or misleading; std::vector is a dynamic array rather than a mathematical vector, std::set and std::map are tree-based containers with logarithmic lookups, and hash-based containers are named std::unordered_set and std::unordered_map.

Important idioms use non-obvious acronyms, such as RAII and CRTP, and their names do not describe their behavior; understanding them requires learning an additional vocabulary beyond the core language.

RAII: A pattern where creating an object acquires a resource and destroying the object releases it automatically.

Many newer types such as std::scoped_lock, std::jthread, and std::copyable_function are improved forms of earlier std::lock_guard, std::thread, and std::function, but their names do not clearly indicate that they supersede older versions, so developers must remember which ones to prefer.

The header system requires splitting declarations into header files and definitions into source files and keeping them synchronized; changing a function signature, parameter order, or constructor often means editing both files and can introduce mismatches that still compile.

The #include mechanism is textual; including a header effectively copies its content into each translation unit, so any widely used header such as <string> or <vector> may be parsed and compiled hundreds or thousands of times per build.

Header guards implemented with #ifndef, #define, and #endif, or #pragma once, are required in every header to prevent multiple definition errors; they add boilerplate and depend on conventionally unique macro names that must be maintained when files are renamed.
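
A self-contained sketch of the mechanism: the second block stands in for what the preprocessor pastes when the same header is included a second time (MYLIB_WIDGET_H and Widget are invented names):

```cpp
// Contents of widget.h: the guard makes a repeated textual inclusion a no-op.
#ifndef MYLIB_WIDGET_H
#define MYLIB_WIDGET_H
struct Widget { int id = 0; };
#endif  // MYLIB_WIDGET_H

// Second inclusion of the same header, as the preprocessor would paste it:
#ifndef MYLIB_WIDGET_H
#define MYLIB_WIDGET_H
struct Widget { int id = 0; };   // skipped, so Widget is not redefined
#endif  // MYLIB_WIDGET_H

int main() {
    Widget w;
    return w.id;
}
```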

Private implementation details appear in headers for classes, including private members and helper functions, which leaks internal structure, forces dependents to recompile when private fields change, and complicates attempts to treat headers as pure interfaces.

Patterns such as PIMPL move private data into an opaque implementation type defined in a .cpp file to reduce recompilation and hide details but require extra allocation, indirection, and hand-written forwarding code.

Header inclusion chains bring many transitive declarations and macros into a translation unit, which pollutes autocompletion with irrelevant symbols and increases the risk of name collisions and accidental dependency on internal details of other libraries.

Macros defined with #define perform raw text substitution before compilation without respect for types, scopes, or namespaces, so they can override function names, change access control (for example, redefining private as public), or interfere with later includes.

macro: A preprocessor construct that replaces text based on simple rules before the compiler parses the code.

Including large system headers such as windows.h can inject many macros and type aliases, including names like SendMessage, that may silently override user code or require careful include ordering with other headers.

Namespaces are used to prevent name clashes by grouping related identifiers into scopes such as std::, but avoiding using namespace in headers leads to verbose code with frequent short, cryptic qualifiers like ab::cd::ef::Type.

Unqualified name lookup searches local scope, then enclosing namespaces, then global scope, so adding a new function into an inner namespace can silently change which overload is chosen for an existing unqualified call in nested namespaces.

Deep namespace hierarchies increase the chance that new symbols overshadow existing ones through lookup rules, so many projects constrain themselves to a single top-level namespace or shallow hierarchies to reduce accidental coupling.

Macros do not respect namespaces at all, so even with carefully designed namespaces, a macro from an included header can alter identifiers anywhere downstream in the translation unit.

Repeated parsing and instantiation of templates and includes across translation units leads to long compile times for large projects, and small changes can trigger large rebuilds due to header dependencies.

Precompiled headers collect commonly used includes into a single compiled artifact to reduce parse times, but they introduce extra configuration, may differ across compilers, and can confuse newcomers who do not understand why certain symbols “magically” exist.

Modules are intended to replace headers with compiled interface units and import statements, but partial toolchain support, bugs, and the large installed base of header-only libraries mean that headers and modules will coexist for a long time.

The phrase “modern C++” has no precise definition; resources labeled as modern may target C++11, C++14, C++17, or later, and many tutorials teach mixtures of newer and older practices without clearly identifying their assumptions.

Introductory material often explains syntax for raw pointers, new / delete, and basic templates but omits ownership semantics, lifetimes, and practical rules such as preferring std::unique_ptr over std::shared_ptr in most designs.

In practice, std::shared_ptr is appropriate only in relatively rare scenarios where ownership is genuinely shared and lifetimes cannot be expressed through simpler ownership relationships, but it is frequently used by beginners as a default pointer type.

Because many production codebases are stuck on older standards such as C++14 or C++17, developers must understand both old idioms like the Rule of Three and manual memory management and newer idioms like the Rule of Zero and move semantics.

The ecosystem has no single standard compiler, build system, package manager, or ABI; major compilers such as MSVC, GCC, and Clang differ in extensions, warning flags, optimization behavior, and pace of implementing new language features.

ABI: An agreement at the binary level that defines how functions are called, how data is laid out, and how different compiled units interoperate.

Each compiler exposes many flags controlling warnings, optimizations, security hardening, and language modes; the same intent may require different flags on each compiler, and some flags such as “warn all” do not actually enable all warnings.

Build systems vary widely; many projects use platform-specific systems such as MSBuild, Xcode projects, or Makefiles, while cross-platform projects often rely on meta-build tools like CMake, Meson, or others, each with its own language and conventions.

CMake has become the de facto cross-platform meta-build system, but it uses a custom scripting language with unusual scoping and variable rules; its API has evolved over time, leading to several coexisting styles and many outdated tutorials.

Package managers such as vcpkg, Conan, and others exist but are not universally adopted; some libraries are distributed only as source or prebuilt binaries, and integrating them still requires configuring include paths, library paths, and specific link directives.

Because there is no standard ABI, precompiled C++ libraries typically must match the consumer’s compiler, version, architecture, and key flags; mismatches can cause link-time symbol errors or runtime crashes due to layout or calling convention differences.

Integrating a precompiled library often involves choosing between static (.lib, .a) and dynamic (.dll, .so) variants, placing headers and binaries in structured directories, and wiring build system or IDE settings to find them correctly on each platform.

If a suitable binary is not available, developers must build third-party libraries from source using the library’s chosen build system, which may differ from the application’s build system and may introduce further dependencies.

Header-only libraries eliminate linking steps by placing all code in headers, simplifying distribution, but they create huge single headers or header sets, which increase compile times because every translation unit recompiles the same template-heavy code.

Low-level system APIs, such as the Win32 API, use older naming schemes, macros, and verbose struct initialization with many flags and pointer parameters; accomplishing simple tasks like opening a file dialog or reading the registry requires dozens of lines of boilerplate.

The C++ standard library does not include built-in facilities for networking, HTTP, JSON, or comprehensive Unicode text handling, so developers commonly rely on third-party libraries for tasks that other languages provide in their standard libraries.

std::string stores bytes rather than Unicode characters, has historically lacked convenience functions such as split, join, and trim (recent standards have added only some of them), and cannot natively handle complex Unicode text segmentation.

std::vector and other containers expose only basic methods; higher-level operations such as mapping, filtering, reduction, and slicing are implemented as free algorithms like std::transform and std::find_if, which require verbose iterator-based calls.

Many algorithms require passing container.begin() and container.end() explicitly, even when the full container is the intended range, and mixing iterators from different containers can compile but produce undefined behavior.
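
A small sketch of the iterator-pair verbosity and the mixing hazard described above:

```cpp
#include <algorithm>
#include <vector>

int main() {
    std::vector<int> xs = {1, 2, 3, 4};
    std::vector<int> squares(xs.size());

    // The whole container is the intended range, yet begin()/end() must be spelled out.
    std::transform(xs.begin(), xs.end(), squares.begin(),
                   [](int x) { return x * x; });

    auto it = std::find_if(squares.begin(), squares.end(),
                           [](int x) { return x > 8; });

    // std::find_if(xs.begin(), squares.end(), ...) would also compile,
    // but mixing iterators from different containers is undefined behavior.
    return (it != squares.end() && *it == 9) ? 0 : 1;
}
```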

The iterator returned by vector.end() refers to a position past the last element rather than the last element itself, while vector.back() returns the last element value, so “end” in the iterator sense differs from the last valid index.

The specialization std::vector<bool> packs bits instead of storing real bool values, returns proxy objects rather than standard references, and introduces performance and semantic differences that do not exist for other std::vector<T> instantiations.
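
A minimal sketch of the proxy-reference behaviour; note the commented-out line that would compile for every other element type:

```cpp
#include <type_traits>
#include <vector>

int main() {
    std::vector<int> ints = {1, 2, 3};
    int& first_int = ints[0];            // a real reference into the vector's storage

    std::vector<bool> flags = {true, false};
    // bool& first_flag = flags[0];      // does not compile: operator[] returns a proxy object
    auto first_flag = flags[0];          // std::vector<bool>::reference, not bool&
    first_flag = false;                  // writes through the proxy into the packed bits

    static_assert(!std::is_same_v<decltype(flags[0]), bool&>);
    return first_int + static_cast<int>(flags[0]) - 1;   // exits with 0
}
```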

Associative containers such as std::unordered_map expose operator[] that inserts a default-constructed value when a key is missing, so using [] to test for existence can silently mutate the map instead of acting as a pure query.

Inserting key-value pairs into std::map or std::unordered_map with insert or emplace does nothing if the key already exists, so updates require explicit logic using insert_or_assign or checking the result of insert.

Map entries are stored as std::pair<const Key, T> with members .first and .second instead of more descriptive names like key and value, reducing readability in code that manipulates entries.
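
The three map pitfalls above in one runnable sketch (key and value names are made up):

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> ages;

    // operator[] on a missing key default-constructs the value and inserts it:
    // this "lookup" mutates the map.
    int ghost = ages["alice"];
    assert(ghost == 0 && ages.size() == 1);

    // insert/emplace do nothing when the key already exists...
    ages.insert({"alice", 30});
    assert(ages["alice"] == 0);

    // ...so updating requires insert_or_assign (or an explicit operator[] assignment).
    auto [it, inserted] = ages.insert_or_assign("alice", 30);
    assert(!inserted);   // key existed, value was overwritten

    // Entries are pairs with .first/.second rather than .key/.value.
    assert(it->first == "alice" && it->second == 30);
    return 0;
}
```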

New abstractions such as std::optional, std::expected, std::span, std::string_view, and std::variant add useful semantics but have limitations like difficulties with references, and many existing APIs do not use them, relying on older patterns instead.

The standard library offers functions that are technically available but discouraged, such as some character conversion routines that do not handle Unicode correctly and old random number facilities with poor statistical properties.

Large production projects often build custom utility layers or wrapper libraries on top of the standard library to provide safer defaults, better naming, or richer operations, but these layers themselves require advanced template and generic programming skills.

Mainstream IDEs and editors expose complex project templates and settings; initial configurations may default to older C++ language versions, suboptimal optimization levels, or confusing folder layouts that many developers immediately adjust.

Refactoring operations such as renaming header files or changing extensions can break underlying project files such as .vcxproj files in Visual Studio, forcing developers to manually edit generated XML or rebuild projects.

IntelliSense and code navigation tools often struggle with large projects, heavy template usage, macros, and nontrivial build configurations; as codebases grow, features like “go to definition” and autocompletion can become unreliable or slow.

C++ requires that functions be declared before they are used, so function ordering and forward declarations matter, unlike languages with module or script semantics where declarations can appear in any order.

Aggregate initialization and std::initializer_list both use brace syntax, and adding or removing an = in an initialization expression can change which initialization form is selected, so the same-looking code may produce different types or behaviors across language versions.
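
A minimal sketch of how parentheses and braces pick different constructors for the same arguments:

#include <cassert>
#include <vector>

int main() {
    std::vector<int> a(3, 7);   // size/value constructor: {7, 7, 7}
    std::vector<int> b{3, 7};   // std::initializer_list constructor: {3, 7}

    assert(a.size() == 3 && a[0] == 7);
    assert(b.size() == 2 && b[0] == 3);
}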

Member initializer lists use a distinct syntax between the constructor parameter list and body; members are initialized in the order of declaration in the class, not in the order listed in the initializer list, so relying on the written order can cause bugs.
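
A small example of the declaration-order rule; the Range type is invented for illustration, and the written initializer order below is exactly the misleading part:

struct Range {
    int low;    // initialized first: it is declared first
    int high;   // initialized second, regardless of the list below

    // Reads as "set high, then low", but low(high - 10) actually runs first and
    // reads an uninitialized high. Initializing low from the parameter (h - 10)
    // fixes it; compilers flag the mismatch with -Wreorder.
    Range(int h) : high(h), low(high - 10) {}
};

int main() { Range r(20); (void)r; }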

Struct and class layout is affected by member ordering; padding inserted to satisfy alignment requirements can increase object size, so placing larger types first can reduce memory footprint and improve cache behavior.
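
A sketch of how member order changes object size through padding; the byte counts assume a typical 64-bit ABI:

#include <cstdio>

struct Padded {      // 1 + 7 (padding) + 8 + 4 + 4 (padding) = 24 bytes
    char   tag;
    double value;
    int    id;
};

struct Packed {      // 8 + 4 + 1 + 3 (padding) = 16 bytes
    double value;
    int    id;
    char   tag;
};

int main() {
    std::printf("%zu %zu\n", sizeof(Padded), sizeof(Packed));  // typically prints: 24 16
}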

Static object initialization order across different translation units is unspecified, so static objects that depend on one another across .cpp files can access uninitialized values, leading to the “static initialization order fiasco.”
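
A common workaround is "construct on first use" via a function-local static; a minimal sketch with an invented Config type:

#include <string>

struct Config { std::string name = "default"; };

// Instead of a namespace-scope `Config config;` that other translation units might
// touch before it is constructed, hand out a function-local static:
Config& global_config() {
    static Config instance;   // constructed on first call, thread-safe since C++11
    return instance;
}

// Any other .cpp file can call global_config() during its own static
// initialization and still get a fully constructed object.
int main() { return global_config().name.empty() ? 1 : 0; }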

Copying and implicit conversions can hide expensive operations; a simple = assignment may perform deep copies, heap allocations, or other costly work, and function calls may create multiple temporary objects whose lifetimes are not obvious from syntax.

Implicit conversions allow surprising behaviors such as converting string literals to const char*, then to bool, and selecting overloads based on that bool, or returning integers where strings are expected, relying on converting constructors.

Operator overloading lets developers redefine the behavior of symbols like +, -, *, /, &, and [] for user-defined types; overloaded operators can deviate from arithmetic meaning and assign domain-specific semantics, which may be confusing to readers.

The address-of operator & can be overloaded, so writing &obj can call a user-defined function instead of yielding the real memory address; std::addressof exists to obtain the actual address even when operator& is overloaded.
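
A tiny sketch of why generic code reaches for std::addressof (the Wrapped type is invented for illustration):

#include <cassert>
#include <memory>

struct Wrapped {
    int* operator&() { return nullptr; }   // hijacks the & operator
};

int main() {
    Wrapped w;
    assert(&w == nullptr);                 // calls the overload, not the built-in &
    assert(std::addressof(w) != nullptr);  // the real address, bypassing operator&
}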

Constructors without the explicit keyword allow implicit conversions; passing an int where an Animal is expected can cause automatic construction, so many constructors should be marked explicit to avoid accidental conversions.
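
A short sketch of the accidental conversion; Animal follows the example in the text, while feed is an invented helper:

struct Animal {
    Animal(int legs) : legs(legs) {}             // implicit: feed(4) compiles silently
    // explicit Animal(int legs) : legs(legs) {} // with explicit, feed(4) becomes a compile error
    int legs;
};

void feed(const Animal&) {}

int main() {
    feed(4);           // constructs a temporary Animal from the int with no hint at the call site
    feed(Animal{4});   // what the call would have to look like once the constructor is explicit
}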

undefined behavior: A condition where the language standard does not prescribe any result, allowing the compiler to assume the situation never happens and enabling optimizations that can produce arbitrary outcomes if it does.

Variables are mutable by default and may be left uninitialized, so reading from an uninitialized int or using an uninitialized pointer can yield arbitrary values and cause subtle data corruption or crashes.

Numeric types do not automatically initialize to zero and pointers do not default to nullptr, so safe initialization must be performed explicitly in constructors or variable declarations.
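
A minimal sketch contrasting default and value initialization:

int main() {
    int  a;            // indeterminate value; reading it is undefined behavior
    int* p;            // not automatically nullptr

    int  b{};          // value-initialized to 0
    int* q = nullptr;  // explicit

    (void)a; (void)p;  // never read; only the initialized variables below are used
    return b + (q == nullptr ? 0 : 1);
}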

Destructors are non-virtual by default; deleting a derived object through a base class pointer without a virtual destructor yields undefined behavior and often leaks resources, so base classes intended for polymorphism must declare virtual destructors.
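
A minimal sketch of the rule:

#include <memory>

struct Base {
    virtual ~Base() = default;       // without `virtual`, the delete below is undefined behavior
};

struct Derived : Base {
    ~Derived() override = default;   // now guaranteed to run
};

int main() {
    std::unique_ptr<Base> p = std::make_unique<Derived>();
}   // Base's virtual destructor dispatches to ~Derived on destruction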

Inheritance using class defaults to private inheritance, which hides base public members, while inheritance using struct defaults to public; this asymmetry differs from many other languages and must be remembered.
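
The asymmetry in one sketch:

struct Base { void hello() {} };

class  PrivatelyDerived : Base {};   // class: Base becomes a private base by default
struct PubliclyDerived  : Base {};   // struct: Base stays public by default

int main() {
    PubliclyDerived{}.hello();       // fine
    // PrivatelyDerived{}.hello();   // error: hello() is inaccessible through a private base
}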

switch statements fall through between case labels unless a break is added; missing break is a common source of bugs, and only newer attributes provide explicit [[fallthrough]] markers to document intentional fallthrough.
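
A small sketch, with the intentional case documented via the attribute:

#include <cstdio>

void describe(int n) {
    switch (n) {
        case 0:
            std::puts("zero");
            [[fallthrough]];          // intentional: zero should also be reported as "small"
        case 1:
            std::puts("small");
            break;                    // without this break, execution would fall into default
        default:
            std::puts("other");
    }
}

int main() { describe(0); }           // prints "zero" then "small"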

For containers such as std::vector, operator[] performs unchecked access and can produce undefined behavior on out-of-range indices, while .at() performs a bounds check and throws std::out_of_range, but it is less frequently used in performance-critical code.
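
A quick sketch of the two access paths:

#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3};

    int ok = v[2];            // unchecked; v[10] here would be undefined behavior
    try {
        int bad = v.at(10);   // checked; throws instead of silently corrupting memory
        (void)bad;
    } catch (const std::out_of_range& e) {
        std::cout << "caught: " << e.what() << '\n';
    }
    (void)ok;
}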

Attributes such as [[nodiscard]] can mark functions whose return values should not be ignored, but they must be added manually; many functions that conceptually should be checked, including error-returning ones, lack this attribute in older code.
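
A tiny sketch; save_settings is a made-up function that pretends it can fail:

#include <cstdio>

[[nodiscard]] bool save_settings() { return false; }

int main() {
    save_settings();                // compiler warning: return value ignored
    if (!save_settings()) {         // the attribute nudges callers toward checking
        std::puts("save failed");
    }
}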

memory safety: A property where a program prevents invalid memory accesses, such as use-after-free, out-of-bounds reads or writes, and use of uninitialized storage.

C++ allows low-level memory errors such as dangling pointers, use-after-free, double-free, buffer overflows, uninitialized reads, null pointer dereferences, iterator invalidation, and lifetime issues with references, all of which can corrupt state or create security vulnerabilities.

Undefined behavior is left intentionally unspecified to enable aggressive optimizations; out-of-bounds access, signed integer overflow, and many other operations permit the compiler to assume they never happen, which can transform the program in unpredictable ways when they do occur.

Error handling strategies vary by project; some use exceptions, some return error codes, and others adopt newer types like std::expected, each with different tradeoffs in readability, performance, ABI stability, and control flow clarity.

Exceptions can introduce runtime cost, larger binaries, and difficulties when thrown across shared library boundaries; mismatched compilers or standard libraries can make cross-module exception handling unsafe.

std::expected and similar result types enforce explicit handling at call sites, making control flow visible in the source, but they introduce syntactic overhead and are not yet pervasive in existing APIs.

Testing frameworks such as GoogleTest, Catch2, and Boost.Test are not standardized; projects pick their own frameworks and test structures, often organizing core code as static libraries that are linked both into the main executable and into test binaries.

Accessing private or internal functions for tests has no single standard approach; practices include using friend declarations, changing private to protected for subclassing in tests, macros to modify access specifiers, or refactoring logic into free functions.

C++ presents itself as offering “zero-cost abstractions,” but practical use reveals cases where abstractions like std::unique_ptr, std::span, and move semantics introduce measurable costs compared to equivalent raw C code, especially due to calling conventions and non-destructive moves.

Because move constructors must still run destructors on moved-from objects and may throw exceptions, move semantics are not always zero cost, and some ideal optimizations such as destructive moves were not adopted due to backward compatibility.

Some standard components are known to be underperforming, such as std::regex implementations that are much slower than alternative regex libraries and std::unordered_map and std::unordered_set implementations that use cache-unfriendly chained buckets.

Faster hash map and hash set implementations exist in third-party libraries, such as flat hash table variants, but they cannot replace standard containers in the core library due to compatibility and ABI concerns.

The language’s semantics encourage value copying in many places, and optimizations such as copy elision, return value optimization, and move elision are relied upon to keep code fast; without these optimizations, naive code would copy objects many more times.

copy elision: A compiler optimization that removes temporary copies implied by the language’s value semantics, especially when returning objects from functions.
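
A minimal sketch, assuming a C++17 compiler; the Big type is invented for illustration, and the elision of the returned prvalue below is guaranteed, so "copied" never prints:

#include <cstdio>
#include <vector>

struct Big {
    std::vector<int> data = std::vector<int>(1'000'000);
    Big() { std::puts("constructed"); }
    Big(const Big&) { std::puts("copied"); }   // never runs below
};

Big make_big() {
    return Big{};          // prvalue: constructed directly in the caller's storage
}

int main() {
    Big b = make_big();    // prints "constructed" exactly once, no "copied"
    (void)b;
}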

Developers cannot easily see or guarantee optimizations like tail-call elimination or specific inlining decisions from the source code alone and may need to inspect assembly to verify performance-critical behavior.

Debug builds and release builds often differ drastically in performance and behavior due to optimization settings, so teams sometimes introduce intermediate configurations to balance debuggability and runtime speed.

C++ projects span many domains such as operating systems, browsers, game engines, embedded systems, high-frequency trading, simulations, robotics, and infrastructure libraries, so knowledge of the language often must be combined with deep domain expertise.

Job opportunities in these domains can pay well, especially in finance or specialized infrastructure, but they may demand long hours, high stress, and strong mathematical or algorithmic background, while game development roles often pay less despite technical difficulty.

Skills built in one domain, such as graphics programming, may not transfer directly to another, such as quant finance, so developers can become tied to narrow niches based on their early career choices.

Learning C++ to effective professional proficiency generally takes longer than for simpler languages, because developers must absorb a large amount of language detail, legacy knowledge, tooling, idioms, and domain-specific patterns.

Rust offers a contrasting design, with a standard compiler rustc, a unified build and package tool cargo, a built-in test harness, and modules instead of headers, so starting projects and integrating dependencies is more uniform.

Rust enforces memory safety without a garbage collector by using ownership and borrowing rules checked by the compiler; this design prevents many categories of memory errors that are common in C++.

borrow checker: The part of the Rust compiler that enforces ownership and lifetime rules to guarantee safe memory access.

Rust provides expressive enum types with data payloads, pattern matching with exhaustive checks, and high-level standard types like Option and Result that are tightly integrated with the language and tooling.

Rust includes UTF-8 string handling, strong hash map implementations in the standard library, and convenient JSON serialization through libraries like Serde, with clear and human-readable type and function names.

Rust error messages are detailed and explanatory and often contain suggestions, which makes it easier to understand compilation failures compared to many C++ diagnostics.

Rust still has limitations; its ecosystem for certain domains such as GPU programming, some game engines, or legacy APIs is younger and sometimes thinner than C++’s, and compile times can also be substantial.

Rust’s unsafe block allows low-level operations but is more constrained than raw pointer work in C or C++; some patterns used in existing C++ code are harder to express or port directly.

As a newer language, Rust has fewer job openings than C++ in traditional systems domains, although cloud infrastructure and some new systems projects are adopting it rapidly.

Despite Rust’s advantages and the existence of many alternative languages, C++ remains deeply embedded in critical software and infrastructure, and its installed base and ecosystem make it likely to remain important for many years.

C++ combines high runtime performance, wide deployment, powerful abstractions such as RAII and templates, and a vast ecosystem with substantial complexity, many pitfalls, heavy boilerplate, weak defaults, and unresolved safety issues, so effective use requires significant time, care, and focused motivation.

· 18 min read

SQLite​

2025-10-06 SQLite File Format Viewer { sqlite-internal.pages.dev }

SQLite File Format Viewer. This tool helps you explore the SQLite file format internals according to the official specification. It's designed for developers and database enthusiasts who want to understand the internal structure of SQLite database files.

invisal/sqlite-internal: Playaround with SQLite internal { github.com }

image-20251005214535369 image-20251005214622301

2025-10-06 Download HeidiSQL { www.heidisql.com }

image-20251005214700240

2025-10-06 How bloom filters made SQLite 10x faster - blag { avi.im }

That’s what the researchers did! They used a Bloom filter, which is very space efficient and fits in a CPU cache line. It was also easy to implement.

They added two opcodes: Filter and FilterAdd. At the start of the join operation, we go over all the rows of the dimension tables and, for each row that matches the query predicate, set the corresponding bits in the Bloom filter. This is the FilterAdd opcode.

During the join operation, we first check if the row exists in the Bloom filter at each stage. If it does, then we do the B-tree probe. This is the Filter opcode.
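
A rough sketch of the idea in plain C++, not SQLite's actual VDBE code; the sizes and hash constants are arbitrary, and the point is only that a cheap, cache-resident membership test filters out most keys before the expensive B-tree probe:

#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

struct BloomFilter {
    std::vector<uint64_t> words = std::vector<uint64_t>(1024);   // ~8 KB of bits

    void add(uint64_t key) {                    // the "FilterAdd" step
        for (uint64_t h : hashes(key)) words[slot(h)] |= 1ull << (h % 64);
    }
    bool may_contain(uint64_t key) const {      // the "Filter" step: false means definitely absent
        for (uint64_t h : hashes(key))
            if (!(words[slot(h)] & (1ull << (h % 64)))) return false;
        return true;                            // maybe present: worth the B-tree probe
    }

private:
    size_t slot(uint64_t h) const { return (h / 64) % words.size(); }
    static std::array<uint64_t, 2> hashes(uint64_t key) {
        return {key * 0x9E3779B97F4A7C15ull, key * 0xC2B2AE3D27D4EB4Full};
    }
};

int main() {
    BloomFilter f;
    f.add(42);                          // dimension-table row that matched the predicate
    return f.may_contain(42) ? 0 : 1;   // only keys that pass this check pay for the probe
}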

2025-10-06 SQLiteStudio { sqlitestudio.pl }

image-20251005215049159

.NET​

2025-07-14 Rejigs: Making Regular Expressions Human-Readable | by Omar | Jul, 2025 | Medium { medium.com }

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
var emailRegex = 
Rejigs.Create()
.AtStart()
.OneOrMore(r => r.AnyLetterOrDigit().Or().AnyOf("._%+-"))
.Text("@")
.OneOrMore(r => r.AnyLetterOrDigit().Or().AnyOf(".-"))
.Text(".")
.AnyLetterOrDigit().AtLeast(2)
.AtEnd()
.Build();

Nice! The improved version at the top looks very concise and clean! 💖 :D

C || C++​

2025-11-15 Reallocate memory | Field Guide { programmers.guide }

image-20251114192325977

image-20251114192239014

2025-11-05 zserge/grayskull: A tiny, dependency-free computer vision library in C for embedded systems, drones, and robotics. { github.com }

🏰 Grayskull

Grayskull is a minimalist, dependency-free computer vision library designed for microcontrollers and other resource-constrained devices. It focuses on grayscale images and provides modern, practical algorithms that fit in a few kilobytes of code. Single-header design, integer-based operations, pure C99.

Features

  • Image operations: copy, crop, resize (bilinear), downsample
  • Filtering: blur, Sobel edges, thresholding (global, Otsu, adaptive)
  • Morphology: erosion, dilation
  • Geometry: connected components, perspective warp
  • Features: FAST/ORB keypoints and descriptors (object tracking)
  • Local binary patterns: LBP cascades to detect faces, vehicles etc
  • Utilities: PGM read/write

2025-11-05 By the power of grayscale! { zserge.com }

When people talk about computer vision, they usually think of OpenCV or deep neural networks like YOLO. But in most cases, doing computer vision implies understanding of the core algorithms, so you can use or adapt them for your own needs.

I wanted to see how far I could go by stripping computer vision down to the bare minimum: only grayscale 8-bit images, no fancy data structures, plain old C, some byte arrays and a single header file. After all, an image is just a rectangle of numbers, right?

This post is a guided tour through the algorithms behind Grayskull – a minimal computer vision library designed for resource-constrained devices.

image-20251104191746300

2025-11-02 Notes by djb on using Fil-C (2025) { cr.yp.to }

I'm impressed with the level of compatibility of the new memory-safe C/C++ compiler Fil-C (filcc, fil++). Many libraries and applications that I've tried work under Fil-C without changes, and the exceptions haven't been hard to get working.

I've started accumulating miscellaneous notes on this page regarding usage of Fil-C. My selfish objective here is to protect various machines that I manage by switching them over to code compiled with Fil-C, but maybe you'll find something useful here too.

Timings below are from a mini-PC named phoenix except where otherwise mentioned. This mini-PC has a 6-core (12-thread) AMD Ryzen 5 7640HS (Zen 4) CPU, 12GB RAM, and 36GB swap. The OS is Debian 13. (I normally run LTS software, periodically upgrading from software that's 4–5 years old such as Debian 11 today to software that's 2–3 years old such as Debian 12 today; but some of the packages included in Fil-C expect newer utilities to be available.)

Related:

  • I've posted a script to help auditors see how Fil-C differs from upstream sources (clang, glibc, ...).
  • I've posted a self-contained filian-install-compiler script (replacing the 20251029 version) to download+compile+install Fil-C on Debian 13 in what I think are Debian-appropriate locations, along with glibc and binutils compiled with Fil-C. A run took 86 minutes real time (for 477 minutes user time and 52 minutes system time).
  • I've posted the start of a filian-install-packages script to download+compile+install Debian source packages, using Fil-C as the compiler (after filian-install-compiler has finished). This script has various limitations that need fixing, but it does work for a few packages already (e.g., ./filian-install-packages bzip2), after the installation of dh-exec etc. described below.
  • I've posted a graph showing nearly 9000 microbenchmarks of Fil-C vs. clang on cryptographic software (each run pinned to 1 core on the same Zen 4). Typically code compiled with Fil-C takes between 1x and 4x as many cycles as the same code compiled with clang.

2025-11-02 FreeDOS Books { www.freedos.org }

image-20251102114516953

Teach yourself how to write programs with the C programming language. We'll start with simple command line programs, and work our way up to writing a turn-based game.

2025-10-18 EbookFoundation/free-programming-books: 📚 Freely available programming books { github.com }

https://ebookfoundation.github.io/free-programming-books/

image-20251018103813447

C​

C++​

2025-10-07 Heap based scheme machine. · GitHub { gist.github.com }

as a single C file

image-20251006223335240

2025-09-27 Jacob Sorber - YouTube { www.youtube.com }

image-20250927123455585

2025-09-27 Sean Barrett - YouTube { www.youtube.com }

image-20250927123557883

2025-09-05 Memory is slow, Disk is fast - Part 2 { www.bitflux.ai }

Sourcing data directly from disk IS faster than caching in memory. I brought receipts. Because hardware got wider but not faster, the old methods don't get you there. You need new tools to use what is scaling and avoid what isn't.

The article benchmarks different ways of scanning a 50 GB dataset and shows that a carefully pipelined disk I/O path can outperform naive in-memory access when using mmap and the page cache. It highlights how modern hardware favors bandwidth scaling over latency improvements, making streaming, batching, and overlapping computation with I/O essential for high throughput.

Key Takeaways:

  • In tests, mmap with page cache delivered 3.71 GB/s, SIMD unrolled loops on cached data 5.51 GB/s, disk I/O with io_uring 5.81 GB/s, and preallocated in-RAM reads 7.90 GB/s. Disk streaming with io_uring outperformed naive cached-RAM paths.
  • mmap overhead from page faults makes it slower than reading into preallocated buffers, even when data is in RAM.
  ‱ io_uring enables deep queues, batched async I/O, and overlap of compute and fetch, making it ideal for streaming workloads (a minimal liburing sketch follows this list).
  • Modern hardware has flat latency but rapidly increasing bandwidth, so performance comes from streaming and batching, not random fine-grained access.
  • Proper tuning matters: high queue depth, multiple workers, 16 KB aligned buffers, NUMA pinning, and RAID0 across SSDs all improved throughput.
  • Profiling showed that cached-RAM scans were compute-limited until vectorization was optimized; memory was not the bottleneck.
  • With multiple SSDs and DMA-to-cache features, disk throughput can approach or exceed naive in-memory scans, making out-of-core processing viable.
  • Best practices: build async pipelines, profile compute loops, use aligned preallocated buffers, disable unnecessary FS features, and pin workloads to NUMA domains.
  • Key advice: do not assume RAM is always faster; measure, profile, and design for streaming pipelines.
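
A stripped-down liburing sketch of that io_uring pattern; the file name, queue depth, and block size are placeholders, and error handling plus continuous resubmission are omitted:

#include <fcntl.h>
#include <liburing.h>
#include <unistd.h>
#include <cstddef>
#include <vector>

int main() {
    constexpr unsigned QUEUE_DEPTH = 64;         // deep queue keeps the SSD busy
    constexpr size_t   BLOCK       = 16 * 1024;  // 16 KB requests, as in the article

    int fd = open("data.bin", O_RDONLY);         // placeholder file
    if (fd < 0) return 1;

    io_uring ring;
    if (io_uring_queue_init(QUEUE_DEPTH, &ring, 0) < 0) return 1;

    std::vector<std::vector<char>> bufs(QUEUE_DEPTH, std::vector<char>(BLOCK));

    // Submit one batch of reads; a real scanner keeps the queue full so compute on
    // completed blocks overlaps with I/O on in-flight ones.
    for (unsigned i = 0; i < QUEUE_DEPTH; ++i) {
        io_uring_sqe* sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, bufs[i].data(), BLOCK, (__u64)i * BLOCK);
        sqe->user_data = i;                      // remember which buffer this read fills
    }
    io_uring_submit(&ring);

    for (unsigned done = 0; done < QUEUE_DEPTH; ++done) {
        io_uring_cqe* cqe = nullptr;
        io_uring_wait_cqe(&ring, &cqe);
        // cqe->res is the byte count; process bufs[cqe->user_data] here, then reuse the slot.
        io_uring_cqe_seen(&ring, cqe);
    }

    io_uring_queue_exit(&ring);
    close(fd);
}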

image-20250904193416094

2025-10-10 ashtonjamesd/lavandula: A fast, lightweight web framework in C for building modern web applications { github.com }

image-20251009213359084

2025-10-10 Show HN: I built a web framework in C | Hacker News { news.ycombinator.com }

faxmeyourcode on Hacker News:

This is some of the cleanest, modern looking, beautiful C code I've seen in a while. I know it's not the kernel, and there's probably good reasons for lots of #ifdef conditionals, random underscored types, etc in bigger projects, but this is actually a great learning piece to teach folks the beauty of C.

I've also never seen tests written this way in C. Great work.

C was the first programming language I learned when I was still in middle/high school, raising the family PC out of the grave by installing free software - which I learned was mostly built in C. I never had many options for coursework in compsci until I was in college, where we did data structures and algorithms in C++, so I had a leg up as I'd already understood pointers. :-)

Happy to see C appreciated for what it is, a very clean and nice/simple language if you stay away from some of the nuts and bolts. Of course, the accessibility of the underlying nuts and bolts is one of the reasons for using C, so there's a balance.

2025-10-10 Love C, Hate C: Web Framework Memory Problems { alew.is }

image-20251009230900726

Line [1] takes Content-Length off the HTTP packet; this is an unvalidated value coming essentially straight from the socket. Line [2] allocates a buffer based on that size. Line [3] copies data into that buffer based on that size, but it is copying out of a buffer of arbitrary size, so passing a Content-Length larger than the request actually sent will start copying heap data into parser.request.body.
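
A hedged sketch of the usual fix (copy_body, recv_len, and kMaxBody are invented names, not the project's identifiers): never trust the declared length, clamp the copy to what actually arrived, and cap the total.

#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical parser fragment: content_length comes from the header,
// recv_len is how many body bytes actually arrived on the socket.
char* copy_body(const char* recv_buf, size_t recv_len, size_t content_length) {
    const size_t kMaxBody = 1 << 20;                       // reject absurd declared sizes
    if (content_length > kMaxBody) return nullptr;

    size_t n = content_length < recv_len ? content_length : recv_len;  // never read past what we got
    char* body = static_cast<char*>(std::malloc(n + 1));
    if (!body) return nullptr;

    std::memcpy(body, recv_buf, n);                        // bounded by both lengths
    body[n] = '\0';
    return body;
}

int main() {
    const char req[] = "hello";
    char* body = copy_body(req, sizeof req - 1, 1024);     // declared length lies; the copy stays bounded
    std::free(body);
}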

Another interesting choice in this project is to make lengths signed.

😁 Fun / Retro​

2025-11-15 DOOMscroll — The Game { gisnep.com }

image-20251114190733129

2025-11-04 When Stick Figures Fought - by Animation Obsessive Staff { animationobsessive.substack.com }

image-20251103200523861

2025-10-27 DOSBox SVN, CPU speed: 3000 cycles, Frameskip 0, Program: MARIO { www.myabandonware.com }

image-20251026214407849

2025-09-29 I'm Not a Robot { neal.fun }

This is a game! image-20250929162638634

2025-09-29 MitchIvin XP { mitchivin.com }

image-20250929164857194

💖 Inspiration!​

2025-11-13 Visual Types { types.kitlangton.com } Typescript

image-20251112211113670

image-20251112211445604 Visual Types is an animated, semi-interactive TypeScript curriculum by Kit Langton that teaches the type system through strong visual metaphors and motion rather than walls of text. It presents itself as a "humble collection of semi-interactive TypeScript lessons", focusing on giving newcomers durable mental models for how types behave at compile time and how they relate to runtime values.

2025-11-02 raine/anki-llm: A CLI toolkit for bulk-processing and generating Anki flashcards with LLMs. { github.com }

A CLI toolkit for bulk-processing and generating Anki flashcards with LLMs. image-20251102115322004

2025-10-27 DeadStack / Technology { deadstack.net }

image-20251026212653384

2025-10-26 ZzFX - Zuper Zmall Zound Zynth {killedbyapixel.github.io}

image-20251026112018441

2025-10-19 Notepad.exe - Native macOS Code Editor for Swift & Python { notepadexe.com }

image-20251019145620354 image-20251019145638050

2025-10-11 BreadOnPenguins/scripts: my scripts! { github.com }

image-20251011001955457

image-20251011001823643

2025-10-10 mafik/keyer: Firmware & goodies for making a KEYER (one-handed chorded keyboard). { github.com }

I've built a tiny hand-held keyboard image-20251009200630319

2025-10-04 Fluid Glass { chiuhans111.github.io }

image-20251003185844250

2025-09-29 Handy { handy.computer }

image-20250929121034886 Handy is a free, open-source speech-to-text application that runs locally on your computer. It allows users to speak into any text field by pressing a keyboard shortcut, with the app instantly transcribing speech into text. Designed for accessibility, privacy, and simplicity, Handy ensures that transcription happens on-device without sending data to the cloud.

Handy is an open-source speech-to-text app that anyone can download, modify, and contribute to.

It works via a keyboard shortcut (push-to-talk) that lets users dictate text directly into any text field.

Users can customize key bindings and choose between push-to-hold or press-to-toggle transcription modes.

All transcription is processed locally, ensuring privacy since no voice data is sent to external servers.

The app emphasizes accessibility, making advanced speech tools available for free without a paywall.

2025-09-29 consumed.today { consumed.today }

image-20250929121433709

2025-09-29 Learning Persian with Anki, ChatGPT and YouTube | Christian Jauvin { cjauvin.github.io }

image-20250929121642706 The article by Christian Jauvin describes his personal journey of learning Persian (Farsi) using a combination of tools and strategies: Anki for spaced repetition, ChatGPT for clarification and reinforcement, and YouTube with browser extensions for immersive listening practice. He emphasizes creating personalized flashcards, integrating visual aids, leveraging dual subtitles, and repeating structured listening exercises to deepen both reading and auditory comprehension.

Anki as the core tool: Building a continuous deck with grammar-focused phrases, often sourced from YouTube lessons, helps reinforce memory more effectively than single words.

Card variety matters: Using "basic" cards for reading practice and "basic and reversed" cards for translation fosters both recognition and recall skills.

Challenge of Persian script: Despite knowing the alphabet, different letter forms and the absence of vowels make reading slow and difficult, requiring consistent practice.

ChatGPT as a tutor: By pasting screenshots of flashcards into a ChatGPT project, Jauvin gets instant explanations and contextual clarifications, supporting faster knowledge consolidation.

Dual Subtitles extension: Watching Persian YouTube videos with synchronized English and Farsi subtitles provides both learning material for new cards and contextual understanding.

Tweaks for YouTube extension: Fine-grained playback control (1-second skips) aids focused listening and pronunciation practice.

Listening technique: Steps include slowing playback to 75%, reading subtitles first in English, listening carefully to Farsi, cross-checking with Farsi script, and repeating out loud.

Iterative repetition: Rewatching videos multiple times allows the learner to progress from partial recognition to real-time understanding, which feels both effective and motivating.

Immersion mindset: Jauvin stresses the importance of “feeling” comprehension, even when not every word is known, by aligning meaning and sound during active listening.

Practical and replicable system: The method combines accessible digital tools with structured repetition, offering a practical framework for self-directed language learners.

2025-09-29 UTF-8 Playground { utf8-playground.netlify.app }

image-20250929163930599

2025-09-29 Why our website looks like an operating system - PostHog { posthog.com }

image-20250929164413791

2025-09-29 Elements of Rust – Core Types and Traits { rustcurious.com }

image-20250929164546808

· 14 min read

⌚ Nice watch!​

2025-11-02 How To Not Strangle Your Coworkers: Resolving Conflict with Collaboration - Arthur Doler - YouTube { www.youtube.com }

image-20251101221829769


What counts as conflict, and why it matters. Disagreement becomes conflict when the issue feels important, you are interdependent, and both sides think the evidence favors them.

Three kinds of conflict.

  • Task conflict (what to do, how to build) is good fuel for better solutions.
  • Process conflict (who decides, how we work) helps early, turns toxic if it persists.
  • Relationship conflict (who we work with, power plays) is corrosive and should be minimized.

Where conflict lives. It appears inside teams and between teams, especially with fuzzy ownership or misaligned priorities.

Two conflict mindsets (and the traps)

  • Model 1 (win-lose). Tries to control others’ emotions and “win” the exchange. Produces:
    • Self-fulfilling prophecies: your beliefs provoke the behavior you expected.
    • Self-sealing processes: your beliefs block the very conversation that could change them.
  • Model 2 (win-win). Aims for outcomes both sides can accept, accepts emotions as data, avoids self-sealing by talking openly.
  • Avoid Model 1 moves. Don’t swat opinions or moralize (“you’re wrong/bad”); it escalates and locks the trap.

Sound receptive on purpose: the HEAR method

  • Hedging. Use softeners like “perhaps,” “sometimes,” “maybe” to keep doors open.
  • Emphasize agreement. State shared premises before you differ.
  • Acknowledge. Paraphrase their point so they feel understood.
  • Reframe to the positive. Prefer “It helps me when I can complete my point.” over “I hate being interrupted.”

Confrontational styles

  • Avoiding, yielding, fighting, cooperating, conciliating.
    • Styles shift with context, status, and emotion.
    • Cooperating aligns with Model 2.
    • Conciliating mixes styles and can look like mid-conversation switching.
  • Use this awareness. Infer goals, adjust your approach, and decide when to continue, pause, or withdraw.

Conflict-handling: Avoiding, Competing, Accommodating, Compromising, Collaborating { dmytro.zharii.com }

In conflict situations, individuals often exhibit different behavioral strategies based on their approach to managing disagreements. Avoiding is one strategy, and here are four others, alongside avoiding, commonly identified within conflict management models like the Thomas-Kilmann Conflict Mode Instrument (TKI):

Avoiding

  • Behavior: The individual sidesteps or withdraws from the conflict, neither pursuing their own concerns nor those of the other party.
  • When it's useful: When the conflict is trivial, emotions are too high for constructive dialogue, or more time is needed to gather information.
  • Risk: Prolonging the issue may lead to unresolved tensions or escalation.

Competing

  • Behavior: The individual seeks to win the conflict by asserting their own position, often at the expense of the other party.
  • When it's useful: When quick, decisive action is needed (e.g., in emergencies) or in matters of principle.
  • Risk: Can damage relationships and lead to resentment if overused or applied inappropriately.

Accommodating

  • Behavior: The individual prioritizes the concerns of the other party over their own, often sacrificing their own needs to maintain harmony.
  • When it's useful: To preserve relationships, resolve minor issues quickly, or demonstrate goodwill.
  • Risk: May lead to feelings of frustration or being undervalued if used excessively.

Compromising

  • Behavior: Both parties make concessions to reach a mutually acceptable solution, often splitting the difference.
  • When it's useful: When a quick resolution is needed and both parties are willing to make sacrifices.
  • Risk: May result in a suboptimal solution where neither party is fully satisfied.

Collaborating

  • Behavior: The individual works with the other party to find a win-win solution that fully satisfies the needs of both.
  • When it's useful: When the issue is important to both parties and requires creative problem-solving to achieve the best outcome.
  • Risk: Requires time and effort, which may not always be feasible in time-sensitive situations.

Self-fulfilling prophecies and Self-sealing processes

Self-fulfilling prophecies start as hunches and end as evidence. You label a teammate “unreliable,” so you stop looping them in early and keep updates tight to your chest. They hear about changes late, respond late, and your label hardens. You brace for a “hostile” stakeholder, arrive with a defensive deck and no questions, and they bristle at being steamrolled. You decide your junior “isn’t ready,” so you never give them stretch work; months later they still lack reps and look, to your eye, not ready. In each case the belief choreographs micro-moves -- who you cc, when you invite, how you ask -- that nudge the other person toward the very behavior you expected.

Breaking the spell is less grand than it sounds. Treat the belief as a hypothesis, not a verdict. Make one small change that would disconfirm it: add the “unreliable” teammate to the kickoff and define a clear, narrow success; open the “hostile” meeting with a shared goal and one genuine question; give the junior a contained, visible challenge with support and a check-in. When new behavior shows up, write it down. If you do not capture counter-evidence, your story erases it.

Self-sealing processes are trickier. Here the belief blocks the only conversation that could revise the belief. A manager thinks, “If I give direct feedback, they’ll blow up,” so they route around the issue with busywork and praise. The developer senses the dodge, digs in, and the manager sighs, “See? Impossible.” Engineering mutters, “Design never listens,” so they bring finished solutions, not problems. Design, excluded from shaping the brief, critiques what it can, the surface, and everyone leaves resentful, certain they were right. Product insists “Ops will block this,” skips early review, then hits a late veto. The loop seals itself because the corrective talk never happens.

Unsealing it means naming the cost of avoidance and asking for a bounded, specific conversation with a shared purpose. “We keep learning about scope changes after handoff. It’s creating rework. Can we spend ten minutes on a pre-handoff check so we catch this earlier?” Keep the frame neutral: what happened, the impact, the request, and invite correction: “What am I missing?” If they can edit your story, the seal is already cracking.

The difference is simple: prophecies steer people into your expectation; sealing blocks the talk that could change it. In both cases, curiosity plus one small, testable change is usually enough to bend the plot.

2025-10-18 5 Office Politics Rules That Get Managers Promoted - YouTube { www.youtube.com }

image-20251017182829612

You do the work, you hit your numbers, yet the promotion goes to someone who smiles wider and says less. I learned the hard way, twice passed over, until I stopped assuming merit speaks and started speaking the language of power. Here is the short version, straight and useful.

At the office, never outshine the bride at her own wedding. Translation: Never outshine the master. If your excellence makes your boss feel replaceable, your growth stalls. A Harvard study found that managers who align with their boss’s goals are 31% more promotable than peers who focus only on their own performance. Use the 3S Formula: Spotlight up (frame updates in your boss’s KPIs), Share credit (“This direction came from my manager”), and Strategic support (ask, “What is one thing I can take off your plate this month?”). This is not brown-nosing, it is showing you are on the same team.

Ambition is flammable. Conceal your intentions. Use the Ambition Pyramid: the bottom layer, most people, gets nothing but results; the middle, your boss and peers, gets today’s impact, not tomorrow’s titles; the tip, mentors, sponsors, and decision makers, gets the real plan because they can pull you up, not push you out. Remember Eduardo Saverin at early Facebook: oversharing ambitions created a rival power center, then his shares were diluted and he was pushed aside.

Your work is what you do; your reputation is what they remember. Guard it with your life. Define one line you want to shrink to, keep your word by under-promising and over-delivering, and stay out of gossip. Invest that energy in one ally who will defend you when you are not in the room.

Impact invisible is impact ignored. Court attention at all costs. Run the 10x Funnel: cut or delegate –10x busywork (inboxes, admin, overhelping), downplay 2x tweaks (necessary, forgettable), and spotlight 10x wins (new clients, major savings, strategic projects). This week: list and cut a –10x task, drop one 2x item from your update, and make sure the people responsible for promotions see one 10x result.

People promote the person who already feels like the job. Act like a king to be treated like one. Build presence with the 3Ps: Presence (sit tall, project your voice, cut filler, record yourself once), Point (enter each meeting with one clear strategic point, say it, then stop), Positioning (speak in outcomes, not tasks: “We drove 8% growth,” not “We finished the project”). Confidence, clarity, and composure signal readiness.

Play fair if you like; play smart if you want the title. Quick checklist for this week: spotlight up, share credit, take something off your boss’s plate, share plans only upward, define your one-line, keep one promise small and solid, avoid gossip and build one ally, cut a –10x task, drop one 2x, broadcast one 10x, and bring one sharp point and outcome language to every room. And one last trap the transcript flags: protecting your employees can backfire if it hides your results. Do not hide behind the team, scale them and make the impact visible.

2025-10-08 Answering behavioral interview questions is shockingly uncomplicated - YouTube { www.youtube.com }

image-20251007221323195​

Big idea Every behavioral question is a proxy test for a small set of core qualities. Map the question to the quality, tell a tight story using STAR, and land a crisp takeaway you learned.

The 5 qualities employers keep probing

  1. Leadership or Initiative. Not just titles. Do you take the lead without being asked.
  2. Resilience. How you respond to setbacks and failure.
  3. Teamwork. How you operate with and across people.
  4. Influence. How you persuade peers and leaders, especially senior to you.
  5. Integrity. What you do when the right choice is hard or awkward.

How the questions get asked, with quick answer hints

  1. Leadership or Initiative:
    • Phrasings: Tell me about a time you led. Tell me about a time you took initiative. Tell me about taking the lead without formal authority.
    • Hint: Show a moment you noticed a gap, acted without waiting, rallied others, and created a result.
  2. Resilience:
    • Phrasings: Tell me about a failure. Tell me about a tough challenge. Tell me about your proudest accomplishment and what it took.
    • Hint: Spend more time on the climb than the summit. What went wrong, what you changed, how you bounced back.
  3. Teamwork:
    • Phrasings: Tell me about working in a team. Tell me about bringing together people you did not know or with different backgrounds.
    • Hint: Name the goal, the mix of people, the friction points, and how you enabled collaboration.
  4. Influence:
    • Phrasings: Tell me about persuading someone. Tell me about convincing someone more senior who disagreed.
    • Hint: Show your evidence, empathy, and escalation path. Data plus listening beats volume.
  5. Integrity:
    • Phrasings: Tell me about an ethical dilemma. Tell me about seeing something off at work.
    • Hint: Show judgment, discretion, and action. Neither tattletale nor blind eye.

Prep system the author uses

  1. Brain dump:
    • Open a doc and list every personal and professional experience that could reflect the 5 qualities. Small stories count. Do not filter yet.
  2. Craft your arsenal with STAR:
    • Situation in 1 to 2 lines. Task in 1 line. Action in crisp verbs. Result in facts. Then add one line: What I learned was X.
  3. Practice delivery the right way:
    • Use bullets, not scripts. Force fluid speech.
    • Record yourself on video. Watch for filler words, eye contact, pacing.
    • Prefer pauses over fillers. Pauses feel longer to you than to them.

Storytelling rules that separate you

  1. Show, do not tell. Replace "I felt upset" with the visceral beat: "My first thought was, boy am I screwed."
  2. Build a single flowing narrative. No blocky transitions. Make STAR feel like a story, not sections.
  3. Have at least 2 stories per quality. Many stories cover multiple qualities, but do not burn your only one twice.

Example snapshots you can mirror

  1. Influence senior leader, data first:
    • S: Team used PitchBook, MD wanted to cancel due to cost.
    • T: Prove value.
    • A: Surveyed analysts, aggregated time saved and workflows unblocked, presented results.
    • R: Subscription renewed. Learned: bring data and do your own digging before making the case.
  2. Resilience via instrument switch:
    • S: Missed top orchestra on violin senior year.
    • T: Earn a second shot.
    • A: Took viola offer, hired teacher, practiced hard all summer.
    • R: Made the tour, 5 cities in Norway. Learned: treat setbacks as pivots, keep an open mind for serendipity.
  3. Integrity on the floor:
    • S: UPS coworker gaming punch times.
    • T: Decide whether to raise it.
    • A: Sought advice, raised discreetly, asked for no punitive outcome.
    • R: System improved, no one fired. Learned: character shows in small, unseen choices.

Fast checklist before your next interview

  1. For each quality, pick 2 stories, bullet them with 4 to 6 beats.
  2. Rehearse out loud from bullets only. Record and review twice.
  3. In the room, map the question to the quality before speaking.
  4. Tell the story, then say the line: What I learned from that experience was X.
  5. Keep it tight. 60 to 120 seconds per answer unless probed.

2025-10-14 Never Send These 4 Emails at Work (Lawyer's Warning) - YouTube { www.youtube.com }

image-20251013232421490

Ed Hones, an employment lawyer, explains four common email mistakes that cost people their jobs and what to do instead. The talk focuses on how routine workplace emails can create legal exposure even when they seem harmless.

Key points:

  • Complaining about your boss: Unless you connect your complaint to a protected activity like discrimination or harassment, your email gives you no legal protection.
  • Emotional replies to performance reviews: Don’t argue or vent. Acknowledge any fair criticism and calmly correct inaccuracies with evidence.
  • Vague health updates: Saying “I’m dealing with anxiety” or “not feeling well” gives no legal notice. State that it’s a diagnosed medical condition to trigger legal protections.
  • Personal or job-search emails from work: Your employer owns the system and can read everything. Using it for personal messages or job hunting gives them cause to fire you legally.

Bottom line: Stay factual, calm, and specific. Make protected complaints in writing, and never assume work email is private.

· 15 min read

Good Reads​

2025-10-27 Seeing like a software company { www.seangoedecke.com }

Legibility is the product large software companies sell. Legible work is estimable, plannable, and explainable, even if it’s less efficient. Illegible work—fast patches, favors, side channels—gets things done but is invisible to executive oversight. Companies value legibility because it enables planning, compliance, and customer trust.

Small teams move faster because they remain illegible. They skip coordination rituals, roadmap alignment, and approval processes. As companies grow, this speed is sacrificed in favor of legibility. Large orgs trade efficiency for predictability.

Enterprise revenue drives the need for legibility. Large customers demand multi-quarter delivery guarantees, clear escalation paths, and process visibility. To win and retain these deals, companies adopt layers of coordination, planning, and status reporting.

Urgent problems bypass process through sanctioned illegibility. Companies create strike teams or tiger teams that skip approvals, break rules, and act fast. These teams rely on senior engineers, social capital, and informal coordination. Their existence confirms that normal processes are too slow for real emergencies.

image-20251026215853892

2025-10-27 Abstraction, not syntax { ruudvanasseldonk.com }

image-20251026215455550

2025-10-10 My Approach to Building Large Technical Projects – Mitchell Hashimoto { mitchellh.com }

image-20251009222639033

I stay motivated on big projects by chasing visible progress. I break the work into small pieces I can see or test now, not later. I start with backends that are easy to unit test, then sprint to scrappy demos. I aim for good enough, not perfect, so I can move to the next demo. I build only what I need to use the thing myself, then iterate as real use reveals gaps.

Five takeaways: decompose into demoable chunks; write tests to create early wins; build quick demos regularly; adopt your own tool fast; loop back to improve once it works for you. Main advice: always give myself a good demo, do not let perfection block progress, optimize for momentum, build only what I need now, iterate later with purpose.

2025-10-01 Stop Avoiding Politics – Terrible Software { terriblesoftware.org }

image-20251001144303465

Here’s what good politics looks like in practice:

  1. Building relationships before you need them. That random coffee with someone from the data team? Six months later, they’re your biggest advocate for getting engineering resources for your data pipeline project.
  2. Understanding the real incentives. Your VP doesn’t care about your beautiful microservices architecture. They care about shipping features faster. Frame your technical proposals in terms of what they actually care about.
  3. Managing up effectively. Your manager is juggling competing priorities you don’t see. Keep them informed about what matters, flag problems early with potential solutions, and help them make good decisions. When they trust you to handle things, they’ll fight for you when it matters
  4. Creating win-win situations. Instead of fighting for resources, find ways to help other teams while getting what you need. It doesn’t have to be a zero-sum game.
  5. Being visible. If you do great work but nobody knows about it, did it really happen? Share your wins, present at all-hands, write those design docs that everyone will reference later.

2025-09-29 Taking a Look at Compression Algorithms | Moncef Abboud { cefboud.com }

image-20250929120137987 The article is a practical tour of lossless compression, focusing on how common schemes balance three levers: compression ratio, compression speed, and decompression speed. It explains core building blocks like LZ77 and Huffman coding, then dives into DEFLATE as used by gzip, before comparing speed and ratio tradeoffs across Snappy, LZ4, Brotli, and Zstandard. It also highlights implementation details from Go’s DEFLATE, and calls out features like dictionary compression in zstd.

💖 2025-09-29 Keeping Secrets Out of Logs - allan.reyes.sh { allan.reyes.sh }

image-20250929163228032

Treat it as a data-flow problem. Centralize logging through one pipeline and one library. Make it the only way to emit logs and the only way to view them.

Transform data early. Favor minimization, then redaction; consider tokenization or hashing; treat masking as last resort. Apply before crossing trust boundaries or logger calls.

Introduce domain primitives for secrets. Stop passing raw strings. Give secrets types/objects that default to safe serialization and require explicit unwraps.

Use read-once wrappers. Allow a single, intentional read; any second read throws. This turns accidental logging into a loud failure in tests and staging.
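
A rough sketch of the domain-primitive plus read-once idea in C++; the Secret type and its method names are invented for illustration:

#include <optional>
#include <ostream>
#include <stdexcept>
#include <string>

class Secret {
public:
    explicit Secret(std::string value) : value_(std::move(value)) {}

    // The only way to get the raw bytes; a second call is a loud failure,
    // which turns accidental double use (e.g., logging) into a test error.
    std::string reveal_once() {
        if (!value_) throw std::logic_error("secret already consumed");
        std::string out = std::move(*value_);
        value_.reset();
        return out;
    }

private:
    std::optional<std::string> value_;

    // Safe default serialization: loggers that stream the object never see the value.
    friend std::ostream& operator<<(std::ostream& os, const Secret&) {
        return os << "[REDACTED]";
    }
};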

Own the log formatter. Enforce structured JSON. Traverse objects, drop risky paths (e.g., headers, request, response.body), redact known fields, and block generic .toString().

Add taint checking. Mark sources (decrypt, DB reads, request bodies). Forbid sinks (logger). Whitelist sanitizers (tokenize). Run in CI and on large diffs; expect rules to evolve.

Test like a pessimist. Capture stdout/stderr; fail tests on unredacted secrets. In prod, redact; in tests, error. Cover hot paths that produce “kitchen sinks.”

Scan on the pipeline. Use secret scanners in CI and at the log ingress. Prefer sampling per-log-type over a flat global rate so low-volume types still get scanned.

Insert a pre-processor hop. Put Vector/Fluent Bit between emitters and storage to redact, drop, tokenize, and sample for heavy scanners before persistence.

Invest in people. Teach “secret vs sensitive,” publish paved paths, and make it safe and fast to report leaks.

Lay the foundation. Align on a definition of “secret,” move to structured logs, and consolidate emit/view into one pipeline. Expect to find more issues at first; that’s progress.

Map the data flow. Draw sources, sinks, and side channels. Include front-end analytics, ALB/NGINX access logs, error trackers, and any bypasses of your main path.

Fortify chokepoints. Put most controls where all logs must pass: the library, formatter, CI taint rules, scanners, and the pre-processor. Pull teams onto the paved path.

Apply defense-in-depth. Pair every preventative with a detective one step downstream. If formatter redacts, scanners verify. If types prevent, tests break on regressions.

Plan response and recovery. When a leak happens: scope, restrict access, stop the source, clean stores and indexes, restore access, run a post-mortem, and harden to prevent recurrence.

2025-09-29 The yaml document from hell { ruudvanasseldonk.com }

image-20250929121905645

Ruud van Asseldonk’s article The YAML Document from Hell critiques YAML as overly complex and error-prone compared to JSON. Through detailed examples, he shows how YAML’s hidden features, ambiguous syntax, and inconsistent versioning can produce confusing or dangerous outcomes, making it risky for configuration files.

Key Takeaways

  1. YAML’s complexity stems from numerous features and a large specification, unlike JSON’s simplicity and stability.
  2. Ambiguous syntax such as 22:22 may be parsed as a sexagesimal number in YAML 1.1 but as a string in YAML 1.2.
  3. Tags (!) and aliases (*) can lead to invalid documents or even security risks, since untrusted YAML can trigger arbitrary code execution.
  4. The “Norway problem” highlights how literals like no or off become false in YAML 1.1, leading to unexpected values.
  5. Non-string keys (e.g., on) may be parsed as booleans, creating inconsistent mappings across parsers and languages.
  6. Unquoted strings resembling numbers (e.g., 10.23) are often misinterpreted as numeric values, corrupting intended data.
  7. YAML version differences (1.1 vs 1.2) mean the same file may parse differently across tools, causing portability issues.
  8. Popular libraries like PyYAML or Go’s yaml use hybrid or outdated interpretations, making reliable parsing difficult.
  9. The abundance of edge cases (63+ string syntaxes) makes YAML unpredictable and fragile in real-world use.
  10. Author’s recommendation: avoid YAML when correctness and predictability are critical, and prefer simpler formats like JSON.

đŸ› ïž How the things work​

2025-10-27 Build Your Own Database { www.nan.fyi }

If you were to build your own database today, not knowing that databases exist already, how would you do it? In this post, we'll explore how to build a key-value database from the ground up. image-20251026210659851

2025-10-27 An Illustrated Introduction to Linear Algebra { www.ducktyped.org }

image-20251026215702717

Activity tracking​

2025-10-06 GitHub - ActivityWatch/activitywatch: The best free and open-source automated time tracker. Cross-platform, extensible, privacy-focused. {github.com}

image-20251005210310484

Cross-platform automated activity tracker with watchers for active window titles and AFK detection. Data stored locally; JSONL and SQLite via modules. Add aw-watcher-input to count keypresses and mouse movement without recording the actual keys.


2025-10-06 arbtt: the automatic, rule-based time tracker {arbtt.nomeata.de}

image-20251005210426544

🐙🐈 GitHub - nomeata/arbtt: arbtt, the automatic rule-based time-tracker {github.com}

2025-10-06 GitHub - MayGo/tockler: An application that tracks your time by monitoring your active window title and idle time. {github.com}

image-20251005211424673

Tockler is a free application that automatically tracks your computer usage and working time. It provides detailed insights into:

  • Application usage and window titles
  • Computer state (idle, offline, online)
  • Interactive timeline visualization
  • Daily, weekly, and monthly usage statistics
  • Calendar views and charts

Features

  • Time Tracking: Go back in time and see what you were working on
  • Application Monitoring: Track which apps were used and their window titles
  • Usage Analytics: View total online time, application usage patterns, and trends
  • Interactive Timeline: Visualize your computer usage with an interactive chart
  • Cross-Platform: Available for Windows, macOS, and Linux

2025-10-06 Welcome to Workrave · Workrave {workrave.org}

Take a break and relax Workrave is a free program that assists in the recovery and prevention of Repetitive Strain Injury (RSI). It monitors your keyboard and mouse usage and using this information, it frequently alerts you to take microbreaks, rest breaks and restricts you to your daily computer usage.

image-20251005211622763

image-20251005211759932

image-20251005211842500

🐙🐈 2025-10-06 GitHub - rcaelers/workrave: {github.com}

ADHD​

2025-10-01 ADHD wiki — Explaining ADHD with memes { romankogan.net }

It’s a personal “ADHD wiki” by Roman Kogan: short, plain-language pages that explain common adult ADHD patterns (e.g., procrastination, perfectionism, prioritizing, planning), with concrete coping tips and meme-style illustrations; sections include ideas like “Body Double” and “False Dependency Chain.”

See also: 2025-10-01 Show HN: Autism Simulator | Hacker News { news.ycombinator.com }


image-20251001140046232

👂 The Ear of AI (LLMs)​

2025-10-27 LLMs Can Get "Brain Rot"! { llm-brain-rot.github.io }

Low quality data causes measurable cognitive decline in LLMs. The authors report that continually pretraining on junk data leads to statistically meaningful performance drops, with Hedges' g > 0.3 across reasoning, long context understanding, and safety. This suggests that data quality alone, holding training scale constant, can materially degrade core capabilities of a model. Actionable insight: data going into continual pretraining is not neutral, and "more data" is not automatically better.

image-20251026210759371

2025-10-05 Which Table Format Do LLMs Understand Best? (Results for 11 Formats) { www.improvingagents.com }

Study tests 11 data formats for LLM table comprehension using GPT-4.1-nano on 1,000 records and 1,000 queries. Accuracy varies by format: Markdown-KV ranks highest at 60.7 percent, while CSV and JSONL rank lowest in the mid-40s. Higher accuracy costs more tokens; Markdown-KV uses about 2.7 times as many tokens as CSV. Markdown tables offer a balance of readability and cost. Use headers and consider repeating them for long tables. Results are limited to one model and one dataset. Try format transforms in your pipeline to improve accuracy, and validate on your own data.

image-20251005102907707

2025-10-27 What Actually Happens When You Press ‘Send’ to ChatGPT { blog.bytebytego.com }

image-20251026205412821

2025-10-06 Stevens: a hackable AI assistant using a single SQLite table and a handful of cron jobs { www.geoffreylitt.com }

image-20251005215743741

I built a useful AI assistant using a single SQLite memories table and a handful of cron jobs running on Val.town. It sends my wife and me daily Telegram briefs powered by Claude, and its simplicity makes it both reliable and fun to extend.

  • The system centers on one memories table and a few scheduled jobs. Each day’s brief combines next week’s dated items and undated background entries.
  • I wrote small importers that run hourly or weekly: Google Calendar events, weather updates, USPS Informed Delivery OCR via Claude, Telegram and email messages, and even fun facts.
  • Everything runs entirely on Val.town — storage, HTTP endpoints, scheduled jobs, and email.
  • The assistant delivers a daily summary to Telegram and answers ad hoc reminders or queries on demand.
  • I designed a “butler” persona and a playful admin UI through casual “vibe coding.”
  • Instead of starting with a complex agent or RAG setup, I focused on simple, inspectable building blocks, planning to add RAG only when needed.
  • I shared all the code on Val.town for others to fork, though it’s not a packaged app.

2025-09-30 2025 AI Darwin Award Nominees - Worst AI Failures of the Year { aidarwinawards.org }

What Are the AI Darwin Awards? Named after Charles Darwin's theory of natural selection, the original Darwin Awards celebrated those who "improved the gene pool by removing themselves from it" through spectacularly stupid acts. Well, guess what? Humans have evolved! We're now so advanced that we've outsourced our poor decision-making to machines.

The AI Darwin Awards proudly continue this noble tradition by honouring the visionaries who looked at artificial intelligence—a technology capable of reshaping civilisation—and thought, "You know what this needs? Less safety testing and more venture capital!" These brave pioneers remind us that natural selection isn't just for biology anymore; it's gone digital, and it's coming for our entire species.

Because why stop at individual acts of spectacular stupidity when you can scale them to global proportions with machine learning?

image-20250929221917787

2025-09-29 Varietyz/Disciplined-AI-Software-Development { github.com }

image-20250929165126218

This methodology provides a structured approach for collaborating with AI systems on software development projects. It addresses common issues like code bloat, architectural drift, and context dilution through systematic constraints and validation checkpoints.

2025-09-05 LLM Visualization { bbycroft.net }

image-20250904193058957

2025-09-29 The AI coding trap | Chris Loy { chrisloy.dev }

image-20250929120500935

  • Coding is primarily problem-solving; typing is a small fraction of the work.
  • AI coding tools accelerate code generation but often create more work in integration, debugging, and documentation.
  • Productivity gains from AI are overstated; real-world improvements hover around 10 percent.
  • Developers risk spending most of their time cleaning up AI output rather than engaging in creative coding.
  • The situation mirrors the tech lead’s dilemma: speed versus team growth and long-term sustainability.
  • Effective teams balance delivery with learning through practices like code reviews, TDD, modular design, and pair programming.
  • AI agents act like junior engineers: fast but lacking growth, requiring careful management.
  • Two approaches exist: sustainable AI-driven engineering versus reckless “vibe coding.” The latter collapses at scale.
  • Prototyping with AI works well, but complex systems still demand structured human thinking.
  • The path forward lies in integrating AI into established engineering practices to boost both velocity and quality without sacrificing maintainability.

💖 2025-09-29 Getting AI to Work in Complex Codebases { github.com }

image-20250929122409262

The writeup explains how to make AI coding agents productive in large, messy codebases by treating context as the main engineering surface. The core method is frequent intentional compaction: repeatedly distilling findings, plans, and decisions into short, structured artifacts, keeping the active window lean, using side processes for noisy exploration, and resetting context to avoid drift. The piece sits alongside a YC talk and HumanLayer tools that operationalize these practices for teams.

  • Create progress.md to track objective, constraints, plan, decisions, next steps.
  • Keep a short spec.md with intent, interfaces, acceptance checks.
  • Work in small verifiable steps; open tiny PRs with one change each.
  • Reset context often; reload only spec and latest progress.md.
  • Leave headroom in context; do not fill the window to max.
  • Use side scratchpads or subagents for noisy searches; paste back only distilled facts.
  • Select minimal relevant files/snippets; avoid dumping whole files.
  • Compact after each step: summarize what you learned and what changed.
  • Write interface contracts first; generate code to those contracts.
  • Define acceptance tests upfront; run them after every change.
  • Use checklists: goal, risks, dependencies, test plan.
  • Capture decisions in commit messages so resets can rehydrate fast.
  • Prefer diff-based edits; show before and after for each file.
  • Maintain a file map of key modules and entry points.
  • Record open questions and assumptions; resolve or delete quickly.
  • Pin critical facts and constraints at the top of progress.md.
  • Limit active artifacts to spec.md, progress.md, and the files you are editing.
  • Timebox exploration; convert findings into 3–5 bullet truths.
  • Avoid long logs in context; attach only error excerpts needed for next step.
  • Re-run tests after every edit; paste only failing lines and stack frames.
  • Use a stable prompt template: objective, constraints, context, task, checks.
  • Prefer rewriting small functions over editing large ones in place.
  • Name a single current objective; block unrelated requests until done.
  • Create a rollback plan; keep last good commit hash noted.
  • End each session by compacting into progress.md and updating spec if stable.


⌚ Nice watch!​

2025-09-17 Sam H. Smith – Parsing without ASTs and Optimizing with Sea of Nodes – BSC 2025 - YouTube { www.youtube.com }

image-20250916232906913

Summary

  • Prefer simple, fast tokenization with a cached peek and a rewindable savepoint instead of building token arrays or trees. See Tiny C Compiler’s one-pass design for inspiration: Tiny C Compiler documentation.
  • Parse expressions without an AST using a right-recursive, precedence-aware function that sometimes returns early when the parent operator has higher precedence. This is equivalent in spirit to Pratt or precedence-climbing parsing. A clear tutorial: Simple but Powerful Pratt Parsing.
  • When a later token retroactively changes meaning, rewind to a saved scanner position and re-parse with the new mode rather than maintaining an AST.
  • Start with a trivial linear IR using value numbers and stack slots so you can get codegen working early.
  • Treat variables as stack addresses in the naive IR, but in the optimized pipeline treat variables as names bound to prior computations, not places in memory.
  • Generate control flow with simple labels and conditional branches, then add else, while, and defer by re-parsing the relevant scopes from savepoints to emit the missing pieces.
  • Inline small functions by jumping to the callee’s source, parsing it as a scope, and treating a return as a jump to the end of the inlined region.
  • Move to a Sea-of-Nodes SSA graph as the optimization IR so that constant folding, CSE, and reordering fall out of local rewrites. Overview and history: Sea of nodes on Wikipedia and Cliff Click’s slide deck: The Sea of Nodes and the HotSpot JIT.
  • Hash-cons nodes to deduplicate identical subgraphs and attach temporary keep-alive pins while constructing; remove pins to let unused nodes free. A hands-on reference implementation: SeaOfNodes/Simple.
  • Represent control with If nodes that produce projections, merge with Region nodes, and merge values with Phi nodes. A compact SSA primer: Static single-assignment form and LLVM PHI example: SSA and PHI in LLVM IR.
  • Convert the Sea-of-Nodes graph back to a CFG using Global Code Motion; then eliminate Phi by inserting edge moves. Foundational paper: Global Code Motion / Global Value Numbering.
  • Build a dominator tree and schedule late to avoid hoisting constants and work into hot blocks. A modern overview of SSA placement and related algorithms: A catalog of ways to generate SSA.
  • Prefer local peephole rewrites applied continuously as you build the graph; ensure the rewrite set is confluent enough to terminate. A readable walkthrough with code and GCM illustrations: Sea of Nodes by Fedor Indutny.
  • Keep memory effects simple at first by modeling loads, stores, and calls on a single control chain; only add memory dependence graphs once everything else is stable.
  • For debug info, insert special debug nodes that capture in-scope values at control points so later scheduling and register allocation can still recover variable locations.
  • Expect tokenizer speed to matter when you rely on rewinds; invest in fast scanning and cached peek results.
  • In language design, favor unique top-level keywords so you can pre-scan files, discover declarations, and compile procedure bodies in parallel.
  • Recognize limits and tradeoffs. One-pass compilers are fast but produce naive code without a strong optimizing IR; see the discussion and TCC’s own docs: Do one-pass compilers still exist and Tiny C Compiler documentation.
  • Know the current landscape. Sea-of-Nodes is widely used, but some engines have moved away for language-specific reasons; see V8’s 2025 write-up: Land ahoy: leaving the Sea of Nodes.

Minimal tokenizer with peek, consume, and rewind

typedef struct {
    const char* src;   // start of file
    const char* cur;   // current byte
    Token cached;      // last peeked token
    bool has_cached;
} Scanner;

typedef struct { const char* cur; bool has_cached; Token cached; } Savepoint;

Token peek(Scanner* S) {
    if (S->has_cached) return S->cached;
    S->cached = scan_one_token(S->cur); // returns enum + slice/span
    S->has_cached = true;
    return S->cached;
}

Token consume(Scanner* S) {
    Token t = peek(S);
    S->cur = t.end;          // advance by span
    S->has_cached = false;
    return t;
}

Savepoint mark(Scanner* S) { return (Savepoint){ S->cur, S->has_cached, S->cached }; }
void rewind(Scanner* S, Savepoint sp) { S->cur = sp.cur; S->has_cached = sp.has_cached; S->cached = sp.cached; }

Expression parsing without an AST, with early return on higher-precedence parent

// precedence: larger = binds tighter. e.g., '*' > '+'
int prec(TokenKind op);

bool parse_expr(Scanner* S, int parent_prec, Value* out);

// parse a primary or unary, then loop binary ops of >= parent_prec
bool parse_expr(Scanner* S, int parent_prec, Value* out) {
    Value lhs;
    if (!parse_unary_or_primary(S, &lhs)) return false;

    for (;;) {
        Token op = peek(S);
        if (!is_binary(op.kind)) break;
        int myp = prec(op.kind);
        if (myp <= parent_prec) break;   // go-left-sometimes: return to parent
        consume(S);                      // eat operator
        Value rhs;
        if (!parse_expr(S, myp, &rhs)) return false;
        lhs = emit_binop(op.kind, lhs, rhs); // compute or build IR
    }

    *out = lhs;
    return true;
}

Rewind on forward knowledge

Savepoint sp = mark(&scanner);
Value v;
bool ok = parse_expr(&scanner, -1, &v);

if (ok && peek(&scanner).kind == TOK_DOLLAR) {
    consume(&scanner);
    rewind(&scanner, sp);
    set_mode(EXPR_MODE_DOLLAR_PLUS);    // switch semantics
    ok = parse_expr(&scanner, -1, &v);  // re-parse
}

Toy linear IR with value numbers and stack slots

// vN are SSA-like value numbers, but we spill everything initially.
int v_lit(int64_t k); // emit literal -> v#
int v_addr(StackSlot s); // address-of a local -> v#
int v_load(int v_addr); // load [v_addr] -> v#
void v_store(int v_addr, int v_val); // store v_val -> [v_addr]
void br_eqz(int v_cond, Label target);

Phi and region construction at a merge

int then_x = build_then(...);   // returns value number
int else_x = build_else(...);

Region r = new_region();
int phi_x = new_phi(r, then_x, else_x); // SSA merge point
bind_var(env, "x", phi_x);

Global code motion back to a CFG and Phi removal

// For each block that flows into region R with phi v = phi(a from B1, b from B2):
// insert edge moves at end of predecessors, then kill phi.
emit_in(B1, "mov v <- a");
emit_in(B2, "mov v <- b");
remove_phi(R, v);

Local peephole rules to run during graph build

// Commutativity and constant folding
rule add(x, y) -> add(y, x) if is_const(y) && !is_const(x);
rule add(k1, k2) -> lit(k1+k2);
rule mul(k1, k2) -> lit(k1*k2);

// Strength reductions
rule mul(x, lit(1)) -> x;
rule mul(x, lit(0)) -> lit(0);
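
Hash-consing nodes during graph construction (sketch)

The hash-consing bullet above does a lot of quiet work: if node creation always goes through a structural hash table, identical subgraphs collapse into one node and common subexpression elimination falls out for free. A minimal sketch follows, with illustrative types, a fixed-size open-addressing table, and no overflow checks; this is my own illustration, not code from the talk.

typedef struct Node { int op; int lhs, rhs; } Node;

#define MAX_NODES 4096
static Node nodes[MAX_NODES];
static int node_count;
static int table[2 * MAX_NODES];   // open addressing: hash -> node id + 1, 0 = empty

static unsigned hash3(int op, int lhs, int rhs) {
    unsigned h = 2166136261u;
    h = (h ^ (unsigned)op)  * 16777619u;
    h = (h ^ (unsigned)lhs) * 16777619u;
    h = (h ^ (unsigned)rhs) * 16777619u;
    return h;
}

// Return an existing identical node if one exists, otherwise intern a new one.
int make_node(int op, int lhs, int rhs) {
    unsigned i = hash3(op, lhs, rhs) % (2 * MAX_NODES);
    for (;;) {
        int slot = table[i];
        if (slot == 0) break;            // empty slot: not seen before
        Node *n = &nodes[slot - 1];
        if (n->op == op && n->lhs == lhs && n->rhs == rhs)
            return slot - 1;             // structural hit: reuse (CSE for free)
        i = (i + 1) % (2 * MAX_NODES);
    }
    nodes[node_count] = (Node){ op, lhs, rhs };
    table[i] = ++node_count;
    return node_count - 1;
}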

What to watch out for

  • Tokenizer performance matters because you will peek and rewind frequently.
  • Ensure your rewrite set terminates; run to a fixed point in release builds and assert progress stops.
  • Keep memory ordering strict at first by threading loads, stores, and calls on the control chain; only then add memory dependence edges.
  • Dominance and latest safe placement are key for late scheduling; compute the dominator tree over the finalized CFG and sink work accordingly. Background: Code motion.
  • Sea-of-Nodes is powerful but not universal; language and runtime constraints may push you toward different IRs, as V8 discusses here: Land ahoy: leaving the Sea of Nodes.

Bottom line

  • Parse without trees using a fast scanner, precedence-aware recursion, and savepoints.
  • Get a simple linear IR running, then switch to a Sea-of-Nodes SSA graph with hash-consing and continuous peephole rewrites.
  • Reconstruct a CFG via Global Code Motion, eliminate Phi with edge moves, and schedule late using a dominator tree.
  • Keep memory simple first; add sophistication only when the rest is solid.
  • Prefer local, incremental rewrites and measure.

2025-09-03 Please stop vibe coding like this - YouTube { www.youtube.com }

image-20250902230358413

  1. Know the craft; do not let tools outrun skill.
  2. Use vibe coding for throwaway and legacy work, not for core craftsmanship.
  3. Name the mode: agentic coding vs vibe coding, and pick deliberately.
  4. Prefer small local code over extra dependencies when the task is tiny.
  5. Use AI to replace low-value engineering, not engineers.

"You still need to know how code works if you want to be a coder." I keep the skill floor high. If I feel the tool exceeding my understanding, I stop, turn off the agent, and read. I ask chat to teach, not to substitute thinking. I refuse the comfort of not knowing because comfort in ignorance is corrosive. If the tool is better than me at the task, I train until that is no longer true, then use the tool as a multiplier rather than a crutch.

"The majority of code we write is throwaway code." I point vibe coding at disposable work: scripts, scaffolding, glue, UI boilerplate, exploratory benchmarks. I optimize for speed, learning, and deletion, not polish. Good code solves the right problem and does not suck to read; here I bias the first trait and accept that readability is optional when the artifact is destined to be forgotten. I ship, test the idea, and freely discard because throwing it away never hurts.

"Agentic coding is using prompts that use tools to then generate code. Vibe coding is when you don't read the code after." I name the mode so I do not confuse capabilities with habits. Agentic flows can plan edits across a repo; vibe coding is a behavior choice to stop reading and just prompt. If I neither know nor read the code, I am stuck. If I know the code and sometimes choose not to read it for low-stakes tasks, I am fast. Clear terms prevent hype and let me pick the right tool for the job.

"You cant be mad at vibe coding and be mad at left-pad." For tiny problems, I keep ownership by generating a few lines locally instead of importing yet another dependency with alien opinions. When a package bites, patching generated local code is easier than vendoring the world. Vibe coding solves the same pain that excessive deps create, but without surrendering control of the codebase.

"Vibe coding isn't about replacing engineers. Its about replacing engineering." I aim AI at the low-value engineering I never wanted to do: a quick SVG->PNG converter in the browser, a square image maker for YouTube previews, lightweight benchmarking harnesses. These are small, tailor-made tools that unlock output with near-zero ceremony. Experts remain essential for the hard parts; AI just clears the gravel so we can climb.

2025-08-30 Burnout from the Top – A Story of Falling, Learning, and Rising Again - Tom Erik Rozmara Frydenlund - YouTube { www.youtube.com }

burnout, leadership, mental health, recovery, resilience, psychological safety, work culture, boundaries, calendar management

image-20250830161310226

image-20250830161514640

I used to treat leadership like armor. Stand in front. Be strong. Say yes. Keep moving. Then my own body called time. One night my heart raced past 220. The doctor said drive in. The nurse called an ambulance. It was not a heart attack, but it was close enough to stop me. That was the day I learned burnout is an invisible injury. You look fine. You are not.

The signs were there for weeks. I stopped sleeping. I lost motivation. My focus frayed. I snapped at home. I withdrew. My personality shifted. People saw the change before I did. If you notice this in yourself or in a colleague, ask the simple question: are you OK. That question can be the lifeline.

The causes were obvious in hindsight. Too much work, all channels open, phone always on. Unclear expectations I filled with extra effort. A culture that prized speed over quality. Isolation. Perfectionism. I tried to deliver 100 percent on everything. That is expensive in hours and in health. Ask what is good enough. Leave room to breathe.

Recovery was not heroic. It was slow and dull and necessary. I accepted that I was sick even if no one could see it. I told people. That made everyday life less awkward and it cut the shame. My days became simple: wake, breakfast, long walk, read, sleep, repeat. Minus 20 or pouring rain, I walked. Some days I felt strong and tried to do too much. The next day I crashed. I learned to pace. Think Amundsen, not Scott. Prepare. March the same distance in bad weather and good. Quality every day beats bursts and collapses.

Talking helped. Family, colleagues, a professional if you need it. Do not keep it inside. Burnout is now a described syndrome of unmanaged work stress. You are not unique in this, and that is a relief. The earlier you talk, the earlier you can turn. There are stages. I hit the last one. You do not need to.

Returning to work took time. Six months from ambulance to office. Do not sprint back. Start part time. Expect bumps. Leaders must make space for this. Do not load the diesel engine on a frozen morning. Warm it first. If you lead, build a ramp, not a wall.

I changed how I use time. I own my calendar. I block focus time before other people fill my week. I add buffers between meetings. I add travel time. I prepare on purpose. I ask why I am needed. I ask what is expected. If there is no answer, I decline. I say no when I am tired or when I will not add value. I reschedule when urgency is fake. Many meetings become an email or a short call when you ask the right question.

I changed how I care for the basics. I set realistic goals. I move every day. Long walks feed the brain. I go to bed on time. I protect rest. I learned to say no and to hold the line. I built places to recharge. For me it is a cabin and a fire. Quiet. Books. Music. You find your own battery and you guard it.

I changed how I lead. Psychological safety is not a slide. It is daily behavior. We build trust. We keep confidences. We invite dissent and keep respect. We cheer good work and we say the missing word: thank you. Recognition costs little and pays back a culture where people speak up before they break. I aim for long term quality over quick gains. The 20 mile march beats the sprint for the next quarter. Greatness is choice and discipline, not luck.

I dropped the mask. Pretending to be superhuman drains energy you need for the real work. I am the same person at home and at work. I can be personal. I can admit fear. I can cry. That honesty gives others permission to be human too. It also prevents the slow leak of acting all day.

On motivation, I look small and near. You do not need fireworks every morning. You need a reason. Clean dishes. A solved bug. A customer who can sleep because the system is stable. Ask why. Ask it again. Clear purpose turns effort into progress. When the honeymoon buzz fades, purpose stays.

If you are early on this path, take these moves now. Notice the signs. Talk sooner. Cut the always-on loop. Define good enough. Pace like Amundsen. If you are coming back, ramp slowly and let others help. If you lead, design conditions for health: time to think, time to rest, time to do quality work. Own the calendar. Guard the buffers. Reward preparation. Thank people. And remember the simplest goal. Wake up. You are here. Build from there.

2025-08-10 You're using AI coding tools wrong - YouTube { www.youtube.com }

image-20250810150835021

Key Takeaways – The Real Bottleneck in Software Development (and How AI Should Actually Help)

  • Writing code was never the bottleneck – Shipping slow isn’t because typing is slow. Code reviews, testing, debugging, knowledge transfer, coordination, and decision-making are the real pace-setters.
  • Processes kill speed when misused – Long specs, excessive meetings, and rigid “research → design → spec → build → ship” flows often lock in bad assumptions before real user feedback happens.
  • Prototype early, prototype often – Fast, rough builds are a cheap way to learn if an idea is worth pursuing. The goal is insight, not production-grade quality at first.
  • Optimize for “time to next realization” – The fastest path from assumption to new learning wins. Use prototypes to expose wrong assumptions before investing heavily.
  • Throwaway code vs. production code – Treat them differently. Throwaway code is for learning, experiments, and iteration; production code is for maintainability and scale. Confusing the two makes AI tools look worse than they are.
  • AI’s best use is speeding up iteration, not replacing devs – Let AI help create quick prototypes, test tech approaches, and refine concepts. Don’t just use it to auto-generate bloated specs or production code you don’t understand.
  • Bad specs cost more than slow typing – If research and design start from faulty assumptions, all the downstream work is wasted. Prototypes fix this by providing a working reference early.
  • Smaller teams + working prototypes = better communication – Three people iterating on a small demo is more effective than 20 people debating a massive spec.
  • Culture shift needed – Many engineers and PMs resist prototypes, clinging to big upfront design. This causes conflict when AI makes rapid prototyping possible.
  • Fun matters – Iterating on ideas with quick feedback loops is engaging. Endless Jira tickets and reviewing AI-generated slop are not.
  • Main warning – If AI tools only make it easier to produce large amounts of code without improving understanding, you slow down the real bottleneck: team alignment and decision-making.

Source article:

2025-08-10 Writing Code Was Never The Bottleneck - ordep.dev { ordep.dev }

The actual bottlenecks were, and still are,

  • đŸ–„ïžđŸ” code reviews,
  • đŸ“šđŸ€ knowledge transfer through mentoring and pairing,
  • đŸ§Ș✅ testing,
  • 🔎🐛 debugging, and
  • the human overhead of đŸ“…đŸ—ŁïžđŸ€ coordination and communication.

All of this wrapped inside the labyrinth of tickets, planning meetings, and agile rituals.

image-20250810150922779

2025-09-17 10 Things I Do On Every .NET App - Scott Sauber - NDC Oslo 2025 - YouTube { www.youtube.com }

image-20250916224718622

1. Organize by feature folders, not by technical layer. Group controllers, views, view models, client assets, and tests by feature to increase cohesion and make adding or removing a feature localized. This applies equally to MVC, Razor Pages, Blazor, and React front ends.

Code sketch:

/Features
  /MyProfile
    MyProfileController.cs
    MyProfileViewModel.cs
    Index.cshtml
    index.css
    index.tsx          // if co-locating SPA bits
    MyProfile.tests.cs

Reference: Feature Slices for ASP.NET Core


2. Treat warnings as errors. Fail the build on warnings to keep the codebase clean from day 1. Prefer the project-wide MSBuild setting. For full coverage across tasks, also use the CLI switch.

Code:

<!-- .csproj -->
<PropertyGroup>
  <TreatWarningsAsErrors>true</TreatWarningsAsErrors>
</PropertyGroup>

CLI:

dotnet build -warnaserror

Reference: MSBuild TreatWarningsAsErrors property


3. Prefer structured logging with Serilog via ILogger, enriched with context. Use structured properties rather than string concatenation, enrich logs with correlation id, user id, request url, version, etc. Always program against ILogger and configure Serilog only in bootstrap.

Code:

// Program.cs
Log.Logger = new LoggerConfiguration()
    .Enrich.FromLogContext()
    .WriteTo.Console()
    .CreateLogger();

builder.Host.UseSerilog((ctx, lc) => lc
    .ReadFrom.Configuration(ctx.Configuration));

// In a handler/service
public Task Handle(Guid userId) {
    _logger.LogInformation("Retrieving user {@UserId}", userId);
    return Task.CompletedTask;
}

Reference: Serilog Documentation


4. Distinguish logs vs metrics vs audits; store audits in your primary data store. Keep developer-focused logs separate from business metrics; store audit trails where loss is unacceptable in your transactional store, not only in logs. Security and compliance often require retention beyond default log windows.


5. Secure by default with a global fallback authorization policy. Make endpoints require authentication unless explicitly opted out by AllowAnonymous or a policy override.

Code:

// Program.cs
builder.Services.AddAuthorization(options =>
{
    options.FallbackPolicy = new AuthorizationPolicyBuilder()
        .RequireAuthenticatedUser()
        .Build();
});

Reference: ASP.NET Core Authorization Policies


6. Prefer FluentValidation over data annotations for complex rules. FluentValidation offers readable, testable rules and rich composition.

Code:

public sealed class RegisterModelValidator : AbstractValidator<RegisterModel>
{
    public RegisterModelValidator()
    {
        RuleFor(x => x.Email).NotEmpty().EmailAddress();
        RuleFor(x => x.Password).NotEmpty().MinimumLength(12);
        RuleFor(x => x.BirthDate)
            .Must(d => d <= DateOnly.FromDateTime(DateTime.UtcNow).AddYears(-18))
            .WithMessage("Must be 18+");
    }
}

Reference: FluentValidation for .NET


7. Remove the Server header from Kestrel. Avoid advertising your stack to scanners by disabling the Kestrel Server response header.

Code:

// Program.cs
builder.WebHost.ConfigureKestrel(o => o.AddServerHeader = false);

Reference: Kestrel Web Server in ASP.NET Core


8. Inject options as a POCO by registering the Value. Keep the Options pattern at the edges and inject your settings class directly into consumers by registering the bound value; use IOptionsSnapshot when settings can change per request.

Code:

// Program.cs
builder.Services.Configure<MyAppSettings>(builder.Configuration.GetSection("MyApp"));
builder.Services.AddSingleton(sp => sp.GetRequiredService<IOptions<MyAppSettings>>().Value);

// Consumer
public sealed class WidgetService(MyAppSettings settings) { ... }

Reference: Options pattern in ASP.NET Core


9. Favor early returns and keep the happy path at the end. Minimize nesting, return early for error and guard cases, and let the successful flow be visible at the bottom of a method for readability.


10. Adopt the new XML solution format (.slnx). The new .slnx format is human-readable XML, reduces merge conflicts, and is supported by the dotnet CLI and Visual Studio.

CLI:

dotnet sln MySolution.sln migrate
# produces MySolution.slnx

Reference: Modern .slnx solution format


11. Add HTTP security headers. Enable CSP, X-Frame-Options, Referrer-Policy, Permissions-Policy, etc., or use a helper package with sane defaults. Test with securityheaders.com.

Code:

// Using NetEscapades.AspNetCore.SecurityHeaders
app.UseSecurityHeaders(policies =>
    policies.AddDefaultSecurityHeaders()
        .AddContentSecurityPolicy(b => b.BlockAllMixedContent()));

Reference: NetEscapades.AspNetCore.SecurityHeaders


12. Build once, deploy many; prefer trunk-based development. Use a single long-lived main branch, short-lived feature branches, and promote the same build artifact through environments.

Reference: Atlassian Gitflow vs Trunk-based development


13. Validate your DI container on startup. Enable ValidateOnBuild and ValidateScopes to catch captive dependencies and lifetime errors during startup.

Code:

builder.Host.UseDefaultServiceProvider(o =>
{
    o.ValidateScopes = true;
    o.ValidateOnBuild = true;
});

Reference: .NET Generic Host Service Provider


14. Write automated tests; prefer xUnit, upgrade to v3. Automated tests improve speed and reliability. xUnit v3 is current and supports the new Microsoft testing platform.

Code:

<!-- Test.csproj -->
<ItemGroup>
  <PackageReference Include="xunit.v3" Version="1.0.1" />
  <PackageReference Include="xunit.runner.visualstudio" Version="3.*" />
</ItemGroup>
<PropertyGroup>
  <UseMicrosoftTestingPlatformRunner>true</UseMicrosoftTestingPlatformRunner>
</PropertyGroup>

Reference: xUnit.net v3


15. Use Central Package Management. Keep package versions in Directory.Packages.props to synchronize versions across projects.

Code:

<!-- Directory.Packages.props -->
<Project>
  <PropertyGroup>
    <!-- required to turn central package management on -->
    <ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
  </PropertyGroup>
  <ItemGroup>
    <PackageVersion Include="xunit.v3" Version="1.0.1" />
    <PackageVersion Include="Serilog.AspNetCore" Version="8.0.0" />
  </ItemGroup>
</Project>

<!-- In .csproj files -->
<ItemGroup>
  <PackageReference Include="xunit.v3" />
  <PackageReference Include="Serilog.AspNetCore" />
</ItemGroup>

Reference: Central Package Management in .NET


16. Log EF Core SQL locally by raising the EF category to Information. Enable Microsoft.EntityFrameworkCore.Database.Command at Information to see executed SQL. Use only for development.

Code:

// appsettings.Development.json
{
  "Logging": {
    "LogLevel": {
      "Microsoft.EntityFrameworkCore.Database.Command": "Information"
    }
  }
}

Reference: EF Core Logging and Events


17. CI/CD and continuous deployment with feature toggles; ship in small batches. Aim for pipelines that deploy green builds to production; replace manual checks with automated tests; use feature flags to keep unfinished work dark.

Reference: DORA: Trunk-Based Development

2025-09-28 Programming in Modern C with a Sneak Peek into C23 - Dawid Zalewski - ACCU 2023 - YouTube { www.youtube.com }

image-20250928150824455


A high-level tour of Programming in Modern C with a Sneak Peek into C23 (by Dawid Zalewski) shows how C remains alive and evolving. The talk focuses on practical, post-C99 techniques, especially useful in systems and embedded work. It demonstrates idioms that improve clarity, safety, and ergonomics without giving up low-level control.

Topics covered

Modern initialization: Brace and designated initializers, empty initialization with {} in C23, and mixed positional and designated forms.

Arrays: Array designators, rules for inferred array size, and guidance on when to avoid variable-length arrays as storage while still using VLA syntax to declare function parameter bounds.

Pointer and API contracts: Sized array parameters T a[n], static qualifiers like T a[static 3] to require that many valid elements, and const char s[static 1] to enforce non-null strings.

Multidimensional data: Strongly typed pointers to VLA-shaped arrays for natural a[i][j] indexing and safer sizeof expressions.

Compound literals: Creating unnamed lvalues to reassign structs, pass inline structs to functions, and zero objects succinctly.

Macro patterns: Named-argument style wrappers around compound literals, simple defaults, _Generic for ad-hoc overloading by type, and a macro trick for argument-count dispatch.

Memory layout: Flexible array members for allocating a header plus payload in one contiguous block, reducing double-allocation pitfalls.

C23 highlights: New keywords for bool, true, and false, the nullptr constant, auto type inference in specific contexts, a note on constexpr, and current compiler support caveats.
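
A minimal sketch pulling a few of these together: a designated initializer, a compound literal passed inline, a [static 1] parameter contract, and a flexible array member that allocates header plus payload in one block. This is my own illustration of the features the talk covers, not code from the talk.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct Point { int x, y; };

// Header plus payload in one allocation via a flexible array member.
struct Buffer {
    size_t len;
    char data[];     // flexible array member (C99+)
};

// s[static 1] tells the compiler (and the reader) that s must point
// to at least one valid char, i.e. it must not be NULL.
size_t my_strlen(const char s[static 1]) {
    return strlen(s);
}

static struct Buffer *buffer_new(const char *text) {
    size_t n = strlen(text);
    struct Buffer *b = malloc(sizeof *b + n + 1);   // one contiguous block
    if (!b) return NULL;
    b->len = n;
    memcpy(b->data, text, n + 1);
    return b;
}

int main(void) {
    // Designated initializer: unnamed fields are zero.
    struct Point p = { .y = 2 };

    // Compound literal passed inline.
    printf("len=%zu\n", my_strlen((char[]){ "hello" }));

    struct Buffer *b = buffer_new("payload");
    if (b) {
        printf("p=(%d,%d) buffer=%s (%zu bytes)\n", p.x, p.y, b->data, b->len);
        free(b);
    }
    return 0;
}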

2025-09-27 Advice for Writing Small Programs in C - YouTube { www.youtube.com }

image-20250927123639285

Main point

  • I spend my time writing code that gets real work done, and I rely on aggressive code reuse. In C that means I bring a better replacement for the C standard library to the party.

Key advice for writing C

  • Build your own reusable toolkit. My answer was stb: single-file, public-domain utilities that replace weak parts of libc.
  • Use dynamic arrays and treat them like vectors. I use macros so that arr[i] works and capacity/length live in bytes before the pointer.
  • Prefer hash tables and dynamic arrays by default. They make small programs both simpler and usually faster.
  • Be pragmatic with the C standard library. Use printf, malloc, free, qsort; avoid footguns like gets and be careful with strncpy and realloc.
  ‱ Handle realloc safely. Assign to a temp pointer first, then swap it back only if allocation succeeds (see the sketch after this list).
  • Do not cache dynamic array lengths. It is a source of bugs when the array grows or shrinks.
  • Accept small inefficiencies if they improve iteration speed. Optimize only when it affects the edit-run loop or output.
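
A minimal sketch of the dynamic-array and realloc advice above, assuming nothing about stb's actual macros: length and capacity live in a small header just before the pointer the caller indexes (so plain arr[i] keeps working), and growth goes through a temporary pointer so a failed realloc leaves the old array intact.

#include <stdio.h>
#include <stdlib.h>

// Simplified stretchy-buffer idea; this is my own version, not stb's code.
typedef struct { size_t len, cap; } ArrHeader;

#define arr_header(a) ((ArrHeader *)(a) - 1)
#define arr_len(a)    ((a) ? arr_header(a)->len : 0)

// Returns the (possibly moved) array, or the old array unchanged on failure.
static void *arr_grow(void *a, size_t elem_size) {
    size_t cap = a ? arr_header(a)->cap * 2 : 8;
    size_t len = a ? arr_header(a)->len : 0;
    ArrHeader *old = a ? arr_header(a) : NULL;

    // Safe realloc: assign to a temporary first; only commit on success.
    ArrHeader *tmp = realloc(old, sizeof(ArrHeader) + cap * elem_size);
    if (!tmp) return a;          // old block is still valid and unchanged
    tmp->len = len;
    tmp->cap = cap;
    return tmp + 1;              // caller keeps indexing past the header
}

// On out-of-memory the push is silently dropped; a real library would report it.
#define arr_push(a, v) do {                                        \
        if (!(a) || arr_header(a)->len == arr_header(a)->cap)      \
            (a) = arr_grow((a), sizeof *(a));                      \
        if ((a) && arr_header(a)->len < arr_header(a)->cap)        \
            (a)[arr_header(a)->len++] = (v);                       \
    } while (0)

int main(void) {
    int *xs = NULL;                 // empty dynamic array
    for (int i = 0; i < 20; i++)
        arr_push(xs, i * i);
    printf("len=%zu, xs[7]=%d\n", arr_len(xs), xs[7]);
    free(arr_header(xs));           // free the header, not xs itself
    return 0;
}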

Workflow and productivity

  • Remove setup friction. I keep a single quick.c workspace I can open, type, build, and run immediately.
  • Automate the boring steps. I have a one-command install that copies today’s build into my bin directory.
  • Write tiny, disposable tools. 5 to 120 minute utilities solve real problems now and often get reused later.
  • Favor tools that make easy things easy. Avoid frameworks that only make complicated things possible but make simple things tedious.
  • Keep programs single-file when you can. Deployment matters for speed and reuse.

Code reuse and licensing philosophy

  • Make reuse non-negotiable. I do not want to rewrite the same helper twice.
  • Ship as single-header libraries and make them easy to drop in. Easy to deploy, easy to use, easy to license.
  • Public domain licensing removes friction for future me and everyone else.

Language and ecosystem perspective

  • C can be great for small programs if you fix the library problem and streamline your workflow.
  • Conciseness matters. Shorter code usually means faster writing and iteration.
  • I choose C over dynamic languages for these tasks because my toolkit gives me comparable concision with better control.

API and library design principles

  • Simple, focused APIs with minimal surface area.
  • Make the common path trivial. Optional flexibility is fine, but do not tax the simple case.
  • Prefer data and functions over deep hierarchies or heavy abstractions.


Good Reads​

2025-08-24 The Management Skill Nobody Talks About – Terrible Software { terriblesoftware.org }

image-20250824100027877

The real question isn’t whether you’ll make mistakes; it’s what you do after.

I recently read “Good Inside” by Dr. Becky Kennedy, a parenting book that completely changed how I think about this. She talks about how the most important parenting skill isn’t being perfect — it’s repair. When you inevitably lose your patience with your kid or handle something poorly, what matters most is going back and fixing it. Acknowledging what happened, taking responsibility, and reconnecting.

Sound familiar? Because that’s what good management is about too.

Think about the worst manager you ever had. I bet they weren’t necessarily the ones who made the most mistakes. But they were probably the ones who never acknowledged them. Who doubled down when they were wrong. Who let their ego prevent them from admitting they didn’t have all the answers.

2025-07-30 How to increase your surface area for luck - by Cate Hall { usefulfictions.substack.com }

image-20250730110328638

Cate Hall explains how increasing your “surface area”―the combination of doing meaningful work and making it visible―invites more serendipity. By writing, attending events, joining curated communities, and reaching out directly, you raise the probability that unexpected opportunities will find you.

Key Takeaways

  • Luck is not random; it grows when valuable work is paired with consistent public sharing.
  • Publishing ideas extends your reach indefinitely; a single post can keep generating inquiries for years.
  • Showing up at meetups, conferences, or gatherings multiplies chance encounters that can turn into collaborations.
  • Curated communities act as quality filters, putting you in front of people who already share your interests and standards.
  • Thoughtful, high‑volume cold outreach broadens your network and seeds future partnerships.
  • Deep expertise built on genuine passion attracts attention and referrals more naturally than broad generalism.
  • Balance is critical: “doing” without “telling” hides impact, while “telling” without substance destroys credibility.
  • Serendipity compounds over time; treat visibility efforts as long‑term investments, not quick wins.
  • Track views, replies, and introductions to identify which activities generate the most valuable contacts.

image-20250730110507789

2025-07-30 The actual reason you can't get a job - YouTube { www.youtube.com }

image-20250730110439030

2025-07-19 Why Most Feedback Shouldn’t Exist – Terrible Software { terriblesoftware.org }

image-20250718183638669

When everything is “an opportunity for growth,” nothing is.

2025-06-25 Why Engineers Hate Their Managers (And What to Do About It) – Terrible Software { terriblesoftware.org }

image-20250624190706665

Most engineers have a complicated relationship with their managers. And by “complicated,” I mean somewhere between mild annoyance and seething resentment. Having been on both sides of this — more than a decade as an engineer before switching to management — I’ve experienced this tension from every angle.

Here’s the uncomfortable truth: engineers often have good reasons to be frustrated with their managers. But understanding why this happens is the first step toward fixing (or just coping with?) it.


Let me walk you through the most common management anti-patterns that make engineers want to flip tables — and stick around, because I’ll also share what the best managers do differently to actually earn their engineers’ respect.

If you’re an engineer, you’ll probably nod along thinking “finally, someone gets it.” If you’re a manager, well
 you might recognize yourself in here. And that’s okay — awareness is the first step.

2025-06-15 Good Engineer/Bad Engineer – Terrible Software { terriblesoftware.org }

(I've tried to summarize, but it is too good for summarizing, I can't! )

By Matheus Lima on June 13, 2025

This is inspired by Ben Horowitz’s “Good Product Manager/Bad Product Manager.” We all exhibit both behaviors — what matters is which ones we choose to reinforce.


Bad engineers think their job is to write code. Good engineers know their job is to ship working software that adds real value to users.

Bad engineers dive straight into implementation. Good engineers first ask “why?”. They know that perfectly executed solutions to the wrong problems are worthless. They’ll push back — not to be difficult, but to find the simplest path to real value. “Can we ship this in three parts instead of one big release?” “What if we tested the riskiest assumption first?”

Bad engineers work in isolation, perfecting their code in darkness. Good engineers share early and often. They’ll throw up a draft PR after a few hours with “WIP – thoughts on this approach?” They understand that course corrections at 20% are cheap, but at 80% they are expensive.

Bad engineers measure their worth by the complexity of their solutions. They build elaborate architectures for simple problems, write clever code that requires a PhD to understand, and mistake motion for progress. Good engineers reach for simple solutions first, write code their junior colleagues can maintain, and have the confidence to choose “boring” technology that just works.

Bad engineers treat code reviews as battles to be won. They defend every line like it’s their firstborn child, taking feedback as personal attacks. Good engineers see code reviews differently — they’re opportunities to teach and learn, not contests. They’ll often review their own PR first, leaving comments like “This feels hacky, any better ideas?” They know that your strengths are your weaknesses, and they want their teammates to catch their blind spots.

Bad engineers say yes to everything, drowning in a sea of commitments they can’t keep. Good engineers have learned the art of the strategic no. “I could do that, but it means X won’t ship this sprint. Which is more important?”.

Bad engineers guard knowledge like treasure, making themselves indispensable through obscurity. Good engineers document as they go, pair with juniors, and celebrate when someone else can maintain their code. They know job security comes from impact, not from being a single point of failure.

Bad engineers chase the newest framework, the hottest language, the latest trend. They’ve rewritten the same app four times in four different frameworks. Good engineers are pragmatists. They’ll choose the tech that the team knows, the solution that can be hired for, the approach that lets them focus on the actual problem.

Bad engineers think in absolutes — always DRY, never compromise, perfect or nothing. Good engineers know when to break their own rules, when good enough truly is good enough, and when to ship the 80% solution today rather than the 100% solution never.

Bad engineers write code. Good engineers solve problems. Bad engineers focus on themselves. Good engineers focus on their team. Bad engineers optimize for looking smart. Good engineers optimize for being useful.

The best engineers I’ve worked with weren’t necessarily the smartest — they were simply the most effective. And effectiveness isn’t about perfection. It’s about progress.

2025-07-10 Mellow Drama: Turning Browsers Into Request Brokers | Secure Annex { secureannex.com }

image-20250709171941011

The SecureAnnex blog post “Mellow Drama: Turning Browsers Into Request Brokers” investigates a JavaScript library called Mellowtel, which is embedded in hundreds of browser extensions. This library covertly leverages user browsers to load hidden iframes for web scraping, effectively creating a distributed scraping network. The behavior weakens security protections like Content-Security-Policy, and participants include Chrome, Edge, and Firefox users—nearly one million installations in total. SecureAnnex traces this operation to Olostep, a web scraping API provider.

Takeaways:

Widespread involuntary participation: Mellowtel is embedded in 245 browser extensions across Chrome, Edge, and Firefox, with around 1 million active installations as of July 2025.

Library functionality explained: The script activates during user inactivity, strips critical security headers, injects hidden iframes, parses content via service workers, and exfiltrates data to AWS Lambda endpoints.

Monetization-driven inclusion: Developers integrated Mellowtel to monetize unused bandwidth. The library operates silently using existing web access permissions.

Olostep’s connection: Olostep, run by Arslan Ali and Hamza Ali, appears to be behind Mellowtel and uses it to power their scraping API for bypassing anti-bot defenses.

Security implications: Removing headers like Content-Security-Policy and X-Frame-Options increases risk of XSS, phishing, and internal data leaks, especially in corporate settings.

Partial takedown by browser vendors: Chrome, Edge, and Firefox have begun removing some affected extensions, but most remain available and active.

Shady transparency practices: Some extensions vaguely mention monetization or offer small payments, but disclosures are often misleading or obscured.

Mitigation and detection guidance: Users should audit installed extensions, block traffic to request.mellow.tel, and restrict iframe injection and webRequest permissions.

Community-driven defense: Researchers like John Tuckner are sharing IOCs and YARA rules to detect compromised extensions and raise awareness.

Broader security trend: This incident exemplifies a growing class of browser-based supply chain attacks using benign-looking extensions as distributed scraping nodes.

2025-07-27 Reading QR codes without a computer! {qr.blinry.org}

Did you ever wonder how QR codes work? You've come to the right place! This is an interactive explanation that we've written for a workshop at 37C3, but you can also use it on your own. You will learn:

  • The anatomy of QR codes
  • How to decode QR codes by hand (using our cheat sheet)

image-20250727135009283

2025-08-31 Notes on Managing ADHD { borretti.me }

image-20250831115139380

I build external scaffolding so my brain has fewer places to drop things: memory lives in a single todo list, and the one meta habit is to open it every morning; projects get their own entries so half-read books and half-built ideas do not evaporate; I keep the list pinned on the left third of the screen so it is always in my visual field. I manage energy like voltage: early morning is for the thing I dread, mid-day is for creative work, later is for chores; when I feel avoidance, I treat procrastination by type—do it scared for anxiety, ask a human to sit with me for accountability, and write to think when choice paralysis hits; timers manufacture urgency to start and, importantly, to stop so one project does not eat the day. I practice journaling across daily, weekly, monthly, yearly reviews to surface patterns and measure progress; for time, I keep a light calendar for social and gym blocks and add explicit travel time so I actually leave; the todo list holds the fine-grained work, the calendar holds the big rocks.

On the ground I favor task selection by shortest-first, with exceptions for anything old and for staying within the active project; I do project check-ins—even 15 minutes of reading the code or draft—to refresh caches so momentum is cheap; I centralize inboxes by sweeping mail, chats, downloads, and bookmarks into the list, run Inbox Zero so nothing camouflages, and declare bankruptcy once to reset a swampy backlog. I plan first, do later so mid-task derailments do not erase intent—walk the apartment, list every fix, then execute; I replace interrupts with polling by turning on DND and scheduling comms passes; I do it on my own terms by drafting scary emails in a text editor or mocking forms in a spreadsheet before pasting; I watch derailers like morning lifting, pacing, or music and design around them; I avoid becoming the master of drudgery who optimizes the system but ships nothing; when one task blocks everything, I curb thrashing by timeboxing it daily and moving other pieces forward; and I pick tools I like and stick to one, because one app is better than two and building my own is just artisan procrastination.

ADHD, productivity, todo list, journaling, energy management, procrastination, timers, inbox zero, task selection, planning, timeboxing, focus, tools

Inspiration!​

2025-08-01 Cry Once a Week { www.cryonceaweek.com }

image-20250731184434171

image-20250731184229129

2025-09-01 Eternal Struggle { yoavg.github.io }

image-20250831175202480

👂 The Ear of AI (LLMs)​

2025-08-18 tokens are getting more expensive - by Ethan Ding { ethanding.substack.com }

image-20250818151926364

Flat subscriptions cannot scale: The assumption that margins would expand as LLMs became cheaper is flawed. Users always want the best model, which keeps a constant price floor. AI Subscriptions Get Short Squeezed

Token usage per task is exploding: Tasks that used ~1k tokens now often consume 100k or more due to long reasoning chains, browsing, and planning.

Unlimited plans are collapsing: Anthropic announced weekly rate limits for Claude subscribers starting August 28, 2025. Anthropic news update

Heavy users (“inference whales”) break economics: Some Claude Code customers consumed tens of thousands of dollars in compute while only paying $200/month. The Register reporting

Shift toward usage credits: Cursor restructured pricing; Pro plans now include credits with at-cost overages, plus a new $200 Ultra tier. Cursor pricing page

2025-06-15 Field Notes From Shipping Real Code With Claude - diwank's space { diwank.space }

Tags: AI‑assisted coding, vibe coding, CLAUDE.md, anchor comments, testing discipline, token management, git workflows, session isolation, developer efficiency

image-20250615144843760

Here’s What You’re Going to Learn

First, we’ll explore how to genuinely achieve a 10x productivity boost—not through magic, but through deliberate practices that amplify AI’s strengths while compensating for its weaknesses.

Next, I’ll walk you through the infrastructure we use at Julep to ship production code daily with Claude’s help. You’ll see our CLAUDE.md templates, our commit strategies, and guardrails.

Most importantly, you’ll understand why writing your own tests remains absolutely sacred, even (especially) in the age of AI. This single principle will save you from many a midnight debugging session.

Steve Yegge brilliantly coined the term CHOP (Chat-Oriented Programming) in his somewhat dramatically titled post “The death of the junior developer”. It’s a perfect, no-BS description of what it’s like to code with Claude.

There are three distinct postures you can take when vibe-coding, each suited to different phases in the development cycle:

  1. AI as First-Drafter: Here, AI generates initial implementations while you focus on architecture and design. It’s like having a junior developer who can type at the speed of thought but needs constant guidance. Perfect for boilerplate, CRUD operations, and standard patterns.
  2. AI as Pair-Programmer: This is the sweet spot for most development. You’re actively collaborating, bouncing ideas back and forth. The AI suggests approaches, you refine them. You sketch the outline, AI fills in details. It’s like pair programming with someone who has read every programming book ever written but has never actually shipped code.
  3. AI as Validator: Sometimes you write code and want a sanity check. AI reviews for bugs, suggests improvements, spots patterns you might have missed. Think of it as an incredibly well-read code reviewer who never gets tired or cranky.

Instead of crafting every line, you’re reviewing, refining, directing. But—and this cannot be overstated—you remain the architect. Claude is your intern with encyclopedic knowledge but zero context about your specific system, your users, your business logic.

2025-06-02 MCP explained without hype or fluff - nilenso blog { blog.nilenso.com }

image-20250601185649415

Model Context Protocol (MCP) helps AI apps connect with different tools and data sources more easily. Usually, if many apps need to work with many tools, each app would have to build a custom connection for every tool, which becomes complicated very quickly. MCP fixes this by creating one common way for apps and tools to talk. Now, each app only needs to understand MCP, and each tool only needs to support MCP.

An AI app that uses MCP doesn't need to know how each platform works. Instead, MCP servers handle the details. They offer tools the AI can use, like searching for files or sending emails, as well as prompts, data resources, and ways to request help from the AI model itself. In most cases, it's easier to build servers than clients.

The author shares a simple example: building an MCP server for CKAN, a platform that hosts public datasets. This server allows AI models like Claude to search and analyze data on CKAN without any special code for CKAN itself. The AI can then show summaries, lists of datasets, and even create dashboards based on the data.

MCP has become popular because it gives a clear and stable way for AI apps to work with many different systems. But it also adds some extra work. Setting up MCP takes time, and using too many tools can slow down AI responses or lower quality. MCP works best when you need to integrate many systems, but may not be necessary for smaller, controlled projects where fine-tuned AI models already perform well.

ai integration, model context protocol, simple architecture, ckan, open data, ai tools, tradeoffs, protocol design

2025-07-19 Nobody Knows How To Build With AI Yet - by Scott Werner { worksonmymachine.substack.com }

image-20250719101340845

The Architecture Overview - Started as a README. "Here's what this thing probably does, I think."

The Technical Considerations - My accumulated frustrations turned into documentation. Every time Claude had trouble, we added more details.

The Workflow Process - I noticed I kept doing the same dance. So I had Claude write down the steps. Now I follow my own instructions like they're sacred text. They're not. They're just what happened to work this time.

The Story Breakdown - Everything in 15-30 minute chunks. Why? Because that's roughly how long before Claude starts forgetting what we discussed ten minutes ago. Like a goldfish with a PhD.


It's like being a professional surfer on an ocean that keeps changing its physics. Just when you think you understand waves, they start moving sideways. Or backwards. Or turning into birds.

This is either terrifying or liberating, depending on your relationship with control.

C || C++​

2025-08-16 Vjekoslav Krajačić on X: "C macro approach to using optional / default / named function params, using struct designated initializers." {x.com}

image-20250815235223750

image-20250815235236421

This approach uses C macros combined with struct designated initializers to mimic optional, default, and named parameters in C, something the language does not natively support.

Core Idea

  1. Separate mandatory and optional parameters
    • Mandatory arguments are given as normal function parameters.
    • Optional parameters are bundled into a struct with sensible defaults.
  2. Designated initializers for named parameter-like syntax
    • You can specify only the fields you want to override, in any order.
    • Unspecified fields automatically keep the default values.
  3. Macro wrapper to simplify usage
    • The macro accepts the mandatory arguments and any number of struct field assignments for optional parameters.
    • Inside the macro, a default struct is created and then overridden with user-provided values.

From the screenshot:

struct TransformParams {
    Vec2 pivot;   // Default TopLeft
    Dim2 scale;   // Default One (1,1)
    f32  angle;   // Default 0
    b32  clip;    // Default false
};

#define PushTransform(rect, ...) \
    _PushTransform((rect), (TransformParams){ .scale = Vec2One, __VA_ARGS__ })

void _PushTransform(Rect rect, TransformParams params) {
    // Implementation here
}

How it works

  • Default values: .scale = Vec2One is set in the macro. Other defaults could be set directly in the struct or in _PushTransform.
  • Optional override: When calling PushTransform, you can pass only the fields you care about:
PushTransform(rect);                               // all defaults
PushTransform(rect, .scale = Vec2Half); // override scale
PushTransform(rect, .angle = 90.0f, .pivot = Center); // override multiple

Advantages

  • Named parameter feel in plain C.
  • Optional arguments without multiple overloaded functions.
  • Defaults are enforced at the call site or macro expansion.
  • Zero-cost at runtime since it’s all resolved at compile time.

Limitations

  • Only works cleanly if the optional parameters are grouped in a single struct.
  • Macro syntax can be tricky if parameters require expressions with commas (needs parentheses or extra care).
  • Debugging through macro expansions is sometimes less clear.

Watch it also here: https://www.youtube.com/watch?v=VdmeoMZjIgs

2025-08-01 How I program C - YouTube { www.youtube.com }

image-20250801102620753

Speaker: Eskil Steenberg – game-engine and tools developer (Quel Solaar). Recording: Seattle, Oct 2016 (2 h 11 m)

Key themes

  • Results first, control later – why explicit memory management, crashes, and compiler errors are desirable.
  • Minimise technology footprint – target C89/C90, wrap every dependency, zero un-wrapped libraries.
  • Code is for humans – long descriptive names, uniform naming schemes, wide functions, avoid cleverness (e.g. operator overloading).
  • Favour simple languages plus strong tooling – write parsers, debuggers, doc generators yourself.
  • Memory mastery – pointers as arrays, alignment and padding, struct packing, cache-friendly dynamic arrays + realloc, dangers of linked lists.
  ‱ API design – opaque handles (void *), start with public interface, isolate implementation, macro-assisted debug wrappers (__FILE__, __LINE__); see the sketch after this list.
  • Build a mountain – own your stack, keep technical debt near zero, rewrite early.
  • UI toolkit pattern – single pass, stateless widgets keyed by pointer IDs; layout and hit-testing resolved internally.
  • Tools and snippets – Carmack inverse-sqrt; xorshift32 PRNG; GFlags page-guarding for memory bugs; Seduce UI; Testify binary packer; Ministry of Flat un-wrapper.
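
A minimal sketch of the opaque-handle API style and the __FILE__/__LINE__ debug-wrapper idea from the list above. The names and the Thing type are mine, not from the talk, and where the talk favors raw void * handles this uses an incomplete struct type for the same effect.

#include <stdio.h>
#include <stdlib.h>

/* --- public "header": callers only see an opaque handle ------------------ */
typedef struct Thing Thing;

Thing *thing_create_(const char *file, int line);
void   thing_where(const Thing *t);   /* debug: report where it was created */
void   thing_destroy(Thing *t);

/* call sites get their location recorded for free via the wrapper macro */
#define thing_create() thing_create_(__FILE__, __LINE__)

/* --- private implementation: layout never leaks into the header ---------- */
struct Thing {
    const char *created_file;
    int         created_line;
    int         payload;
};

Thing *thing_create_(const char *file, int line) {
    Thing *t = calloc(1, sizeof *t);
    if (!t) return NULL;
    t->created_file = file;   /* handy when hunting leaks and double frees */
    t->created_line = line;
    return t;
}

void thing_where(const Thing *t) {
    printf("Thing created at %s:%d\n", t->created_file, t->created_line);
}

void thing_destroy(Thing *t) {
    free(t);
}

int main(void) {
    Thing *t = thing_create();   /* expands to thing_create_(__FILE__, __LINE__) */
    if (t) {
        thing_where(t);
        thing_destroy(t);
    }
    return 0;
}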

Talk structure in order of appearance

  • Motivation and philosophy
  • Results vs control; garbage collection vs manual free
  • Small footprint and dependency wrapping
  • Naming conventions and formatting policies
  • Crashes and compiler errors as friends
  • Macros: when to use, when to avoid
  • Deep dive: pointers, arrays, structs, alignment, packed allocations
  • Cache-aware data structures; realloc growth patterns; backwards remove
  • API style with opaque handles; object orientation in C
  • Memory-debug and binary-packing helpers using __FILE__ __LINE__
  • UI toolkit design example (Seduce)
  • Build-your-own-tools mindset; “build a mountain” analogy
  • Closing resources and project links

2025-08-01 C Programming Full Course for free ⚙ (2025) - YouTube {www.youtube.com}

image-20250801103430609

2025-08-02 Go from mid-level to advanced C programmer in two hours - YouTube {www.youtube.com}

image-20250801230639700

2025-08-02 C Programming and Memory Management - Full Course - YouTube {www.youtube.com}

Interesting: "Boot Dev", known for its very annoying ads, also has some quality content.

image-20250801230857693

2025-08-02 I'm Building C with C without CMake - YouTube { www.youtube.com }

image-20250801233701958

2025-08-02 Tips for C Programming - YouTube { www.youtube.com }

image-20250802105718241

2025-07-13 Parse, Don’t Validate AKA Some C Safety Tips { www.lelanthran.com }

The article "Parse, Don’t Validate AKA Some C Safety Tips" by Lelanthran expands on the concept of converting input into strong types rather than merely validating it as plain strings. It demonstrates how this approach, when applied in C, reduces error-prone code and security risks. The post outlines three practical benefits: boundary handling with opaque types, safer memory cleanup via pointer‑setting destructors, and compile‑time type safety that prevents misuse deeper in the codebase.

Key Takeaways:

  1. Use Strong, Opaque Types for Input
    • Instead of handling raw char *, parse untrusted input into dedicated types like email_t or name_t.
    • This restricts raw input to the system boundary and ensures all later code works with validated, structured data.
  2. Reduce Attack Surface
    • Only boundary functions see untrusted strings; internal functions operate on safe, strongly typed data.
    • This prevents deeper code from encountering malformed or malicious input.
  3. Enforce Correctness at Compile Time
    • With distinct types, the compiler prohibits misuse, such as passing an email_t* to a function expecting a name_t*.
    • What would be a runtime bug becomes a compiler error.
  4. Implement Defensive Destructors
    • Design destructor functions to take a double pointer (T **) so they can free and then set the pointer to NULL.
    ‱ This prevents double-free errors and related memory safety issues (see the sketch after this list).
  5. Eliminate Internal String Handling
    • By centralizing parsing near the system entry and eliminating char * downstream, code becomes safer and clearer.
    • Once input is parsed, the rest of the system works with well-typed data only.
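
A minimal sketch of takeaways 1 and 4 together (email_t and the helper names are illustrative, not from the article); in real code the struct body would live only in the .c file, so callers see just the opaque typedef:

#include <stdlib.h>
#include <string.h>

/* Opaque boundary type: only this file knows what an email_t looks like. */
typedef struct email_t email_t;

struct email_t {
    char *address;
};

/* Parse at the boundary: reject bad input once, return NULL on failure. */
email_t *email_parse(const char *raw)
{
    if (raw == NULL || strchr(raw, '@') == NULL)
        return NULL;

    email_t *email = malloc(sizeof *email);
    if (email == NULL)
        return NULL;

    size_t len = strlen(raw) + 1;
    email->address = malloc(len);
    if (email->address == NULL) {
        free(email);
        return NULL;
    }
    memcpy(email->address, raw, len);
    return email;
}

/* Defensive destructor: takes email_t ** so it can free AND null the caller's
   pointer, turning an accidental second call into a harmless no-op. */
void email_del(email_t **email)
{
    if (email == NULL || *email == NULL)
        return;
    free((*email)->address);
    free(*email);
    *email = NULL;
}

Downstream functions then take email_t * instead of char *, so the compiler rejects passing an unparsed string (or a differently typed value such as a name_t *) where an email is expected.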

image-20250713115328366

Emacs​

2025-04-12 Emacs Lisp Elements | Protesilaos Stavrou { protesilaos.com }

Tags: Emacs, Emacs Lisp, Elisp, Programming, Text Editor, Customization, Macros, Buffers, Control Flow, Pattern Matching

A comprehensive guide offering a conceptual overview of Emacs Lisp to help users effectively customize and extend Emacs.

  • Emphasizes the importance of understanding Emacs Lisp for enhancing productivity and personalizing the Emacs environment.
  • Covers foundational topics such as evaluation, side effects, and return values.
  • Explores advanced concepts including macros, pattern matching with pcase, and control flow constructs like if-let*.
  • Discusses practical applications like buffer manipulation, text properties, and function definitions.
  • Includes indices for functions, variables, and concepts to facilitate navigation.

This resource is valuable for both beginners and experienced users aiming to deepen their understanding of Emacs Lisp and leverage it to tailor Emacs to their specific workflows.

image-20250412115256181

2025-01-11 ewantown/nice-org-html: Modern Org to HTML pipeline with CSS injection from Emacs themes { github.com }

This package generates pretty, responsive websites from .org files and your choice of Emacs themes. You can optionally specify a header, footer, and additional CSS and JS to be included. To see the default output, for my chosen themes and with no header, footer or extras, view this README in your browser here. If you're already there, you can find the GitHub repo here.

· 19 min read

⌚ Nice watch!​

2025-08-05 Architecting LARGE software projects. - YouTube { www.youtube.com }

image-20250804204435319

Define clear, long-term project goals: dependability, extendability, team scalability, and sustained velocity before writing any code, so every subsequent decision aligns with these objectives. Dependability keeps software running for decades; extendability welcomes new features without rewrites; team scalability lets one person own each module instead of forcing many into one file; sustained velocity prevents the slowdown that occurs when fixes trigger more breakage. Listing likely changes such as platform APIs, language toolchains, hardware, shifting priorities, and staff turnover guides risk mitigation and keeps the plan realistic.

Encapsulate change inside small black-box modules that expose only a stable API, allowing one engineer to own, test, and later replace each module without disturbing others. Header-level boundaries cut meeting load, permit isolated rewrites, and match task difficulty to developer experience by giving complex boxes to seniors and simpler ones to juniors.

Write code completely and explicitly the first time, choosing clarity over brevity to prevent costly future rework. Five straightforward lines now are cheaper than one clever shortcut that demands archaeology years later.

Shield software from platform volatility by funnelling all OS and third-party calls through a thin, portable wrapper that you can port once and reuse everywhere. A tiny demo app exercises every call, proving a new backend before millions of downstream lines even compile.
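
A sketch of what such a wrapper boundary can look like in C (the platform_file_* names are invented for illustration); the rest of the codebase includes only the interface at the top, and porting means reimplementing these few functions:

#include <stdio.h>
#include <stdlib.h>

/* platform.h: the only file-I/O interface the rest of the codebase sees. */
typedef struct PlatformFile PlatformFile;

PlatformFile *platform_file_open(const char *path);
long          platform_file_read(PlatformFile *file, void *buffer, long bytes);
void          platform_file_close(PlatformFile *file);

/* One backend, here plain C stdio. A win32, console, or web build swaps in a
   different implementation of these three functions and nothing else changes. */
struct PlatformFile { FILE *fp; };

PlatformFile *platform_file_open(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    PlatformFile *file = malloc(sizeof *file);
    if (file == NULL) {
        fclose(fp);
        return NULL;
    }
    file->fp = fp;
    return file;
}

long platform_file_read(PlatformFile *file, void *buffer, long bytes)
{
    return (long)fread(buffer, 1, (size_t)bytes, file->fp);
}

void platform_file_close(PlatformFile *file)
{
    if (file == NULL)
        return;
    fclose(file->fp);
    free(file);
}

The "tiny demo app" from the paragraph above would simply exercise these calls, so a new backend is proven the moment that demo runs.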

Build reusable helper libraries for common concerns such as rendering, UI, text, and networking, starting with the simplest working implementation but designing APIs for eventual full features so callers never refactor. A bitmap font renderer, for example, already accepts UTF-8, kerning, and color so a future anti-aliased engine drops in invisibly.

Keep domain logic in a UI-agnostic core layer and let GUIs or headless tools interact with that core solely through its published API. A timeline core powers both a desktop video editor and a command-line renderer without duplicating logic.

Use plugin architectures for both user features and platform integrations, loading optional capabilities from separate binaries to keep the main build lean and flexible. In the Stellar lighting tool, every effect and even controller input ships as an external module, so missing a plugin merely disables one function, not the whole app.
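
One common way to wire this up on POSIX systems is dlopen plus a tiny exported descriptor; the EffectPlugin contract below is invented for illustration, not taken from the talk:

#include <dlfcn.h>
#include <stdio.h>

/* The contract every plugin binary implements: one exported descriptor. */
typedef struct {
    const char *name;
    void (*apply)(void *frame);   /* e.g. an effect applied to one frame */
} EffectPlugin;

/* Each shared object exports: const EffectPlugin *get_effect_plugin(void); */
typedef const EffectPlugin *(*GetPluginFn)(void);

static const EffectPlugin *load_effect(const char *path)
{
    void *lib = dlopen(path, RTLD_NOW);
    if (lib == NULL) {
        fprintf(stderr, "optional plugin missing: %s\n", path);
        return NULL;   /* that one feature is disabled; the app keeps running */
    }

    GetPluginFn get_plugin = (GetPluginFn)dlsym(lib, "get_effect_plugin");
    return get_plugin != NULL ? get_plugin() : NULL;
}

A missing or failing plugin only means load_effect returns NULL, which matches the "missing a plugin merely disables one function" behaviour described above.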

Migrate legacy systems by synchronizing them through adapters to a new core store, enabling gradual cut-over while exposing modern bindings such as C, Python, and REST. Healthcare events recorded in the new engine echo to the old database until clinics finish the transition.

Model real-time embedded systems as a shared authoritative world state that edge devices subscribe to, enabling redundancy, simulation, and testing without altering subscriber code. Sensors push contacts, fuel, and confidence scores into the core; wing computers request only the fields they need, redundant cores vote for fault tolerance, and the same channel feeds record-and-replay tools for contractors.

Design every interface, file format, and protocol to be minimal yet expressive, separating structure from semantics so implementations stay simple and evolvable. Choosing one primitive such as polygons, voxels, or text avoids dual support, keeps loaders small, and lets any backend change without touching callers.

Prefer architectures where external components plug into your stable core rather than embedding your code inside their ecosystems, preserving control over versioning and direction. Hosting the plugin point secures compatibility rules and leaves internals free to evolve.

2025-07-26 Ted Bendixson – Most of your projects are stupid. Please make some actual games. – BSC 2025 - YouTube { www.youtube.com }

image-20250726165734918

  1. Focus on making actual games and software people use, not tech demos of rotating cubes; he observes most showcases are rendering stress tests instead of finished games.
  2. Prioritize design because top-selling Steam games succeed on gameplay design, not just graphics; he cites "Balatro" competing with "Civilization 7".
  3. Always ask "What do we do that they don't?" to define your product’s unique hook; he references the Sega Genesis ad campaign as an example of aspirational marketing.
  4. Start from a concrete player action or interaction (e.g., connecting planets in "Slipways", rewinding time in "Braid") rather than from story or vibe.
  5. Use genres as starting templates to get an initial action set, then diverge as you discover your own twist; he compares "Into the Breach" evolving from "Advance Wars".
  6. Skip paper prototyping for video games; rely on the computer to run simulations and build low-friction playable prototypes instead.
  7. Prototype with extremely low-fidelity art and UI; examples include his own early "Moose Solutions" and the first "Balatro" mockups.
  8. Beat blank-page paralysis by immediately putting the first bad version of a feature into the game without overthinking interactions; iterate afterward.
  9. Let the running game (the simulation) reveal what works; you are not Paul Atreides, you cannot foresee every system interaction.
  10. Move fast in code: early entities can just be one big struct; do not over-engineer ECS or architecture in prototypes.
  11. Use simple bit flags (e.g., a u32) for many booleans to get minor performance without heavy systems (a minimal sketch follows this list).
  12. Combine editor and game into one executable so you can drop entities and test instantly; he shows his Cave Factory editor mode.
  13. Do not obsess over memory early; statically allocate big arenas, use scratch and lifetime-specific arenas, and worry about optimization later.
  14. Never design abstractions up front; implement features, notice repetition, and then compress into functions/structs (Casey Muratori’s semantic compression).
  15. Avoid high-friction languages/processes (Rust borrow checking, strict TDD) during exploration; add safety and tests only after proving people want the product.
  16. Do not hire expensive artists during prototyping; you will throw work away. Bring art in later, like Jonathan Blow did with "Braid".
  17. Spend real money on capsule/storefront art when you are shipping because that is your storefront on Steam.
  18. Keep the team tiny early; people consume time and meetings. If you collaborate, give each person a clear lane.
  19. Build a custom engine only when the gameplay itself demands engine-level control (examples: "Fez" rotation mechanic, "Noita" per-pixel simulation).
  20. If you are tinkering with tech (cellular automata, voxel sims), consciously pivot it toward a real game concept as the Noita team did.
  21. Cut distractions; social media is a time sink. Optimize for the Steam algorithm, not Twitter likes.
  22. Let streamers and influencers announce and showcase your game instead of doing it yourself to avoid social media toxicity.
  23. Do not polish and ship if players are not finishing or engaging deeply; scrap or rework instead of spending on shine.
  24. Tie polish and art budget to gameplay hours and depth; 1,000-hour games like Factorio justify heavy investment.
  25. Shipping a game hardens your tech; the leftover code base becomes your engine for future projects.
  26. Low-level programming is power, but it must be aimed at a marketable design, not just technical feats.
  27. Play many successful indie games as market research; find overlap between what you love and what the market buys.
  28. When you play for research, identify the hook and why people like it; you do not need to finish every game.
  29. Treat hardcore design like weight training; alternate intense design days with lighter tasks (art, sound) to recover mentally.
  30. Prototype while still employed; build skills and a near-complete prototype before quitting.
  31. Know your annual spending before leaving your job; runway is meaningless without that number.
  32. Aim for a long runway (around two years or more) to avoid the high cost of reentering the workforce mid-project.
  33. Do not bounce in and out of jobs; it drains momentum.
  34. Save and invest to create a financial buffer (FIRE-style) so you can focus on games full time.
  35. Maintain full control and ownership of your tech to mitigate platform risk (Unity’s policy changes are cited as a cautionary tale).
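
Item 11's bit-flag suggestion as a minimal C sketch (the flag names and the Entity struct are made up for illustration):

#include <stdint.h>

/* Many booleans packed into one u32 instead of a struct full of bools. */
enum {
    ENT_VISIBLE = 1u << 0,
    ENT_HOSTILE = 1u << 1,
    ENT_ON_FIRE = 1u << 2
};

typedef struct {
    uint32_t flags;
    float x, y;
} Entity;

static void update(Entity *e)
{
    e->flags |= ENT_ON_FIRE;      /* set a flag   */
    e->flags &= ~ENT_HOSTILE;     /* clear a flag */
    if (e->flags & ENT_VISIBLE)   /* test a flag  */
        e->x += 1.0f;
}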

2025-07-25 Does AI Actually Boost Developer Productivity? (100k Devs Study) - Yegor Denisov-Blanch, Stanford - YouTube { www.youtube.com }

image-20250724214222583

A Stanford research group has conducted a multi-year time-series and cross-sectional study on software-engineering productivity involving more than 600 companies

Current dataset: over 100,000 software engineers, dozens of millions of commits, billions of lines of code, predominantly from private repositories

Late last year, an analysis of about 50,000 engineers identified roughly 10 percent as ghost engineers who collect paychecks but contribute almost no work

Study team members include Simon (former CTO of a 700-developer unicorn), a Stanford researcher active since 2022 on data-driven decision-making, and Professor Kosinski (whose research was at the center of the Cambridge Analytica story)

A 43‑developer experiment showed self‑assessment of productivity was off by about 30 percentile points on average; only one in three developers ranked themselves within their correct quartile

The research built a model that evaluates every commit’s functional change via git metadata, correlates with expert judgments, and scales faster and cheaper than manual panels

At one enterprise with 120 developers, introducing AI in September produced an overall productivity boost of about 15–20 percent and a marked rise in rework

Across industries gross AI coding output rises roughly 30–40 percent, but net average productivity gain after rework is about 15–20 percent

Median productivity gains by task and project type: low‑complexity greenfield 30–40 percent; high‑complexity greenfield 10–15 percent; low‑complexity brownfield 15–20 percent; high‑complexity brownfield 0–10 percent (sample 136 teams across 27 companies)

AI benefits low‑complexity tasks more than high‑complexity tasks and can lower productivity on some high‑complexity work

For high‑popularity languages (Python, Java, JavaScript, TypeScript) gains average about 20 percent on low‑complexity tasks and 10–15 percent on high‑complexity tasks; for low‑popularity languages (Cobol, Haskell, Elixir) assistance is marginal and can be negative on complex tasks

Productivity gains decline sharply as codebase size grows from tens of thousands to millions of lines

LLM coding accuracy drops as context length rises: performance falls from about 90 percent at 1 k tokens to roughly 50 percent at 32 k tokens (NoLIMA paper)

Key factors affecting AI effectiveness: task complexity, project maturity, language popularity, codebase size, and context window length

2025-07-23 So You Want to Maintain a Reliable Event Driven System - James Eastham - NDC Oslo 2025 - YouTube { www.youtube.com }

image-20250722221746915

James Eastham shares hard‑won lessons on maintaining and evolving event‑driven systems after the initial excitement fades. Using a plant‑based pizza app as a running example (order → kitchen → delivery), he covers how to version events, test asynchronous flows, ensure idempotency, apply the outbox pattern, build a generic test harness, and instrument rich observability (traces, logs, metrics). The core message: your events are your API, change is inevitable, and reliability comes from deliberate versioning, requirements‑driven testing, and context‑rich telemetry.

Key Takeaways (9 items)

  • Treat events as first‑class APIs; version them explicitly (e.g. type: order.confirmed.v1) and publish deprecation dates so you never juggle endless parallel versions.
  ‱ Adopt a standard event schema (e.g. CloudEvents) with fields for id, time, type, source, data, and datacontenttype; this enables compatibility checks, idempotency, and richer telemetry. https://cloudevents.io
  • Use the outbox pattern to atomically persist state changes and events, then have a worker publish from the outbox; test that both the state row and the outbox row exist, not just your business logic.
  • Build a reusable test harness subscriber: spin up infra locally (Docker, Aspire, etc.), inject commands/events, and assert that expected events actually appear on the bus; poll with SLO‑aligned timeouts to avoid flaky tests.
  • Validate event structure at publish time with schema checks (JSON Schema, System.Text.Json contract validation) to catch breaking changes before they hit the wire.
  • Test unhappy paths: duplicate deliveries (at‑least‑once semantics), malformed payloads, upstream schema shifts, and downstream outages; verify DLQs and idempotent handlers behave correctly.
  • Instrument distributed tracing plus rich context: technical (operation=send/receive/process, system=kafka/sqs, destination name, event version) and business (order_id, customer_id) so you can answer unknown questions later. See OpenTelemetry messaging semantic conventions: https://opentelemetry.io/docs/specs/semconv/messaging
  • Decide when to propagate trace context vs use span links: propagate within a domain boundary, link across domains to avoid 15‑hour monster traces from batch jobs.
  • Monitor the macro picture too: queue depth, message age, in‑flight latency, payload size shifts, error counts, and success rates; alert on absence of success as well as presence of failure.

2025-07-13 The New Code — Sean Grove, OpenAI - YouTube { www.youtube.com }

This talk by Sean from OpenAI explores the paradigm shift from code-centric software development to intent-driven specification writing. He argues that as AI models become more capable, the bottleneck in software creation will no longer be code implementation but the clarity and precision with which humans communicate their intentions. Sean advocates for a future where structured, executable specifications—not code—serve as the core professional artifact. Drawing on OpenAI’s model specification (Model Spec), he illustrates how specifications can guide both human alignment and model behavior, serving as trust anchors, training data, and test suites. The talk concludes by equating specification writing with modern programming and calls for new tooling—like thought-clarifying IDEs—to support this transition.

Code is Secondary; Communication is Primary

  • Only 10–20% of a developer's value lies in the code they write; the remaining 80–90% comes from structured communication—understanding requirements, planning, testing, and translating intentions.
  • Effective communication will define the most valuable programmers of the future.

Vibe Coding Highlights a Shift in Workflow

  • “Vibe coding” with AI models focuses on expressing intent and outcomes first, letting the model generate code.
  • Yet, developers discard the prompt (intent) and keep only the generated code—akin to version-controlling a binary but shredding the source.

Specifications Align Humans and Models

  • Written specs clarify, codify, and align intentions across teams—engineering, product, legal, and policy.
  • OpenAI’s Model Spec (available on GitHub) exemplifies this, using human-readable Markdown that is versioned, testable, and extensible.

Specifications Outperform Code in Expressing Intent

  • Code is a lossy projection of intention; reverse engineering code does not reliably recover the original goals or values.
  • A robust specification can generate many artifacts: TypeScript, Rust, clients, servers, docs, even podcasts—whereas code alone cannot.

Specs Enable Deliberative Alignment

  • Using techniques like deliberative alignment, models are evaluated and trained using challenging prompts linked to spec clauses.
  • This transforms specs into both training and evaluation material, reinforcing model alignment with intended values.

image-20250713152914031

💡 Integrated Thought Clarifier! 💡 (I need one!)

image-20250713154457083

2025-07-17 Brand your types - Join me in the fight against weakly typed codebases! - Theodor René Carlsen - YouTube { www.youtube.com }

image-20250717011921769

This talk, titled "Branding Your Types", is delivered by Theo from the Danish Broadcasting Corporation. It explores the concept of branded types in TypeScript—a compile-time technique to semantically differentiate values of the same base type (e.g., different kinds of string or number) without runtime overhead.

Theo illustrates how weak typing with generic primitives like string can introduce subtle and costly bugs, especially in complex codebases where similar-looking data (e.g., URLs, usernames, passwords) are handled inconsistently.

The talk promotes a mindset of parsing, not validating—emphasizing data cleaning and refinement at the edges of systems, ensuring internal business logic can remain clean, type-safe, and predictable.

Generic Primitives Are Dangerous

  • Treating all strings or numbers the same can lead to bugs (e.g., swapped username and password). Using string to represent IDs, dates, booleans, or URLs adds ambiguity and increases cognitive load.

Use Branded Types for Clarity and Safety

  • TypeScript allows developers to brand primitive types with compile-time tags (e.g., Username, Password, RelativeURL) to distinguish otherwise identical types. This prevents bugs by catching misused values during compilation.

No Runtime Cost, Full Type Safety

  • Branded types are purely a TypeScript feature; they vanish during transpilation. You get stronger type guarantees without impacting performance or runtime behavior.

Protect Your Business Logic with Early Parsing

  • Don’t validate deep within your core logic. Instead, parse data from APIs or forms as early as possible. Converting "dirty" input into refined types early allows the rest of the code to assume correctness.

Parsing vs. Validation

  • Inspired by Alexis King’s blog post “Parse, Don’t Validate”, Theo stresses that parsing should transform unstructured input into structured, meaningful types. Validations check, but parsing commits and transforms.

📝 Original article: Parse, Don’t Validate by Alexis King — https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/

Use types to encode guarantees: Replace validate(x): boolean with parse(x): Result<T, Error>. This enforces correctness via types, ensuring only valid data proceeds through the system.

Parse at the boundaries: Parse incoming data at the system's edges (e.g. API handlers), and keep the rest of the application logic free from unverified values.

Avoid repeated validation logic: Parsing once eliminates the need for multiple validations in different places, reducing complexity and inconsistency.

Preserve knowledge through types: Using types like Maybe or Result lets you carry the status of values through your code rather than flattening them prematurely.

Demand strong input, return flexible output: Functions should accept well-formed types (e.g. NonEmptyList<T>) and return optional or error-aware outputs.

Capitalize on language features: Statically typed languages (e.g. Haskell, Elm, TypeScript) support defining precise types that embed business rules—use them.

Structured data beats flags: Avoid returning booleans to indicate validity. Instead, return parsed data or detailed errors to make failures explicit.

Better testing and fewer bugs: Strong input types reduce the number of test cases needed and prevent entire categories of bugs from entering the system.

Design toward domain modeling: Prefer domain-specific types like Email, UUID, or URL rather than generic strings—improves readability and safety.

Applicable across many languages: Though examples come from functional programming, the strategy works in many ecosystems—Elm, Haskell, Kotlin, TypeScript, etc.


Parse, Don’t Validate

The Parse, Don’t Validate approach emphasizes transforming potentially untrusted or loosely-structured data into domain-safe types as early as possible in a system. This typically happens at the "edges"—where raw input enters from the outside world (e.g. HTTP requests, environment variables, or file I/O). Instead of validating that the data meets certain criteria and continuing to use it in its original form (e.g. raw string or any), this pattern calls for parsing: producing a new, enriched type that encodes the constraints and guarantees. For example, given a JSON payload containing an email field, you wouldn’t just check whether the email is non-empty or contains “@”; you'd parse it into a specific Email type that can only be constructed from valid input. This guarantees that any part of the system which receives an Email value doesn’t need to perform checks—it can assume the input is safe by construction.

The goal of parsing is to front-load correctness and allow business logic to operate under safe assumptions. This leads to simpler, more expressive, and bug-resistant code, especially in strongly-typed languages. Parsing typically returns a result type (like Result<T, Error> or Option<T>) to indicate success or failure. If parsing fails, the error is handled at the boundary. Internally, the program deals only with parsed, safe values. This eliminates duplication of validation logic and prevents errors caused by invalid data slipping past checks. It also improves the readability and maintainability of code, as type declarations themselves serve as documentation for business rules. This approach does not inherently enforce encapsulation or behavior within types—it’s more about asserting the shape and constraints of data as early and clearly as possible. Parsing can be implemented manually (e.g. via custom functions and type guards) or with libraries (like zod, io-ts, or Elm’s JSON decoders).

Value Objects

The Value Object pattern, originating from Domain-Driven Design (DDD), is focused on modeling business concepts explicitly in the domain layer. A value object is an immutable, self-contained type that represents a concept such as Money, Email, PhoneNumber, or Temperature. Unlike simple primitives (string, number), value objects embed both data and behavior, enforcing invariants at construction and encapsulating domain logic relevant to the value. For instance, a Money value object might validate that the currency code is valid, store amount and currency together, and expose operations like add or convert. Value objects are compared by value (not identity), and immutability ensures they are predictable and side-effect free.

The key distinction in value objects is that correctness is enforced through encapsulation. You can't create an invalid Email object unless you bypass the constructor or factory method (which should be avoided by design). This encapsulated validation is often combined with private constructors and public factory methods (tryCreate, from, etc.) to ensure that the only way to instantiate a value object is through validated input. This centralizes responsibility for maintaining business rules. Compared to Parse, Don’t Validate, value objects focus more on modeling than on data conversion. While parsing is concerned with creating safe types from raw data, value objects are concerned with expressing the domain in a way that’s aligned with business intent and constraints.

In practice, value objects may internally use a parsing step during construction, but they emphasize type richness and encapsulated logic. Where Parse, Don’t Validate advocates that you return structured types early for safety, Value Objects argue that you return behavior-rich types for expressiveness and robustness. The two can—and often should—be used together: parse incoming data into value objects, and rely on their methods and invariants throughout your core domain logic. Parsing is about moving from unsafe to safe. Value objects are about enriching the safe values with meaning, rules, and operations.

· 36 min read

⌚ Nice watch!​

2025-07-06 Effective Ranges: A Tutorial for Using C++2x Ranges - Jeff Garland - CppCon 2023 - YouTube { www.youtube.com }

image-20250706101448811

  1. Ranges Abstract Iteration and Enable Composable Pipelines:
    • Ranges allow working with sequences in a more expressive and declarative way.
    • The move from manual begin and end iterator management to range-based algorithms improves safety and readability.
  2. Views Provide Lazy, Non-Owning Computation:
    • Views are lightweight wrappers that delay computation until necessary.
    • Useful for building efficient pipelines where intermediate results are not stored in memory.
  3. Pipe Syntax Enhances Readability and Function Composition:
    • Using | operator with view adapters enables cleaner syntax resembling Unix pipes.
    • Improves clarity by expressing transformations step-by-step.
  4. C++23 Introduces New Views and Algorithms:
    • Includes find_last, fold_left, chunk_by, join, zip, and more.
    • find_last simplifies reverse iteration compared to STL's reverse iterators.
  5. Views Must Be Passed Carefully in Functions:
    • Do not pass views as const references due to internal caching behaviors.
    • Prefer forwarding references (T&&) to preserve intended behavior.
  6. Projections Simplify Algorithm Customization:
    • Range algorithms allow projections, enabling operations on subfields without writing custom comparators.
    • E.g., ranges::sort(data, {}, &Data::field).
  7. Improved Return Values Preserve Computation Context:
    • Algorithms like find_last return subranges, not just iterators.
    • Encourages better code by retaining useful information.
  8. Range-based Construction in Containers (C++23):
    • STL containers now support constructing and assigning from ranges.
    • Enables direct pipeline-to-container conversions (e.g., to<std::deque>).
  9. You Can Write Your Own Views in C++23:
    • C++20 lacked a standard mechanism for user-defined views; C++23 adds it.
    ‱ Writing custom views is complex due to required iterator machinery, but tools like Boost.STLInterfaces can help.
  10. Use Range Algorithms First When Possible:
    • They offer better constraints, return types, and support for projections.
    • Cleaner syntax and fewer error-prone constructs compared to raw STL iterator use.

2025-07-06 ericniebler/range-v3: Range library for C++14/17/20, basis for C++20's std::ranges { github.com }

range-v3

Range library for C++14/17/20. This code was the basis of a formal proposal to add range support to the C++ standard library. That proposal evolved through a Technical Specification, and finally into P0896R4 "The One Ranges Proposal" which was merged into the C++20 working drafts in November 2018.

About

Ranges are an extension of the Standard Template Library that makes its iterators and algorithms more powerful by making them composable. Unlike other range-like solutions which seek to do away with iterators, in range-v3 ranges are an abstraction layer on top of iterators.

Range-v3 is built on three pillars: Views, Actions, and Algorithms. The algorithms are the same as those with which you are already familiar in the STL, except that in range-v3 all the algorithms have overloads that take ranges in addition to the overloads that take iterators. Views are composable adaptations of ranges where the adaptation happens lazily as the view is iterated. And an action is an eager application of an algorithm to a container that mutates the container in-place and returns it for further processing.

Views and actions use the pipe syntax (e.g., rng | adapt1 | adapt2 | ...) so your code is terse and readable from left to right.

2025-07-06 Resilient by Design - Chris Ayers - NDC Oslo 2025 - YouTube { www.youtube.com }

image-20250705194958121

2025-07-06 Aspiring .NET & Resilience @ NDC Oslo 2025 - Chris’s Tech ADHD { chris-ayers.com }

2025-07-06 Azure/Azure-Proactive-Resiliency-Library-v2: Azure Proactive Resiliency Library v2 (APRL) - Source for Azure WAF reliability guidance and associated ARG queries { github.com }

2025-07-06 Azure Well-Architected Framework - Microsoft Azure Well-Architected Framework | Microsoft Learn { learn.microsoft.com }

2025-07-05 Let’s catch up with C#! Exciting new features in C# 9, 10, 11, 12 and 13! - Filip Ekberg - NDC Oslo - YouTube { www.youtube.com }

image-20250705125149517

The following examples are GPT4o generated, inspired by the talk transcript:



Nullable Reference Types (C# 8)

Compiler warns when a possibly null reference is dereferenced or assigned where null is not allowed, helping avoid NullReferenceException.

string? name = null;
Console.WriteLine(name?.Length); // Safe check

Required Keyword (C# 11)

Forces initialization of essential properties via object initializers.

public class Person {
    public required string Name { get; init; }
}

Init‑only Setters (C# 9)

Allows setting properties only during instantiation, promoting immutability.

public class User {
    public string Username { get; init; }
}

Target‑Typed new Expressions (C# 9)

Omits redundant type when it can be inferred.

Person p = new();

Treat Warnings as Errors (Compiler Option)

Fails build on warnings to enforce high code quality.

<PropertyGroup>
    <TreatWarningsAsErrors>true</TreatWarningsAsErrors>
</PropertyGroup>

Top‑level Statements (C# 9)

Removes boilerplate Main method for quick scripting or minimal APIs.

Console.WriteLine("Hello, World!");

Pattern Matching Enhancements & List Patterns (C# 8–11)

Supports rich switch/is patterns including lists and tuples.

if (numbers is [1, 2, .., 99]) {
    Console.WriteLine("Starts with 1,2 ends with 99");
}

Switch Expressions (C# 8)

Simplifies switch statements into expressions.

string result = x switch {
    1 => "One",
    2 => "Two",
    _ => "Other"
};

Record Types (C# 9)

Immutable reference types with built‑in equality and concise syntax.

public record Person(string Name, int Age);

File‑scoped Namespaces (C# 10)

Flattens indentation and streamlines namespace declaration.

namespace MyApp;

class Program { }

Global Using Directives (C# 10)

Applies using directives across the project from a single file.

global using System.Text;

Raw String Literals (C# 11)

Simplifies multiline or escaped strings.

var json = """
    {
        "name": "John",
        "age": 30
    }
    """;

UTF‑8 String Literals (C# 11)

Creates UTF‑8 encoded string spans to improve performance.

ReadOnlySpan<byte> utf8 = "hello"u8;

Readonly Span Usage (C# 7.2+)

Enables memory-safe, high-performance access to underlying data.

ReadOnlySpan<char> span = "example";
Console.WriteLine(span.Slice(0,3).ToString()); // "exa"

Zero‑allocation String Manipulation (C# 11)

Slices strings without creating new copies.

ReadOnlySpan<char> name = "John Doe".AsSpan();
ReadOnlySpan<char> first = name[..4]; // "John"

Static Abstract Members in Interfaces (C# 11)

Supports generic math by allowing static operations in interfaces.

interface IAdd<T> {
    static abstract T Add(T x, T y);
}

Delegate Caching Optimization (C# 11)

Improves performance by caching method group delegates.

Action a = MyStaticMethod;

Generic Math Support (C# 11)

Uses arithmetic operators in generic constraints.

T Sum<T>(T a, T b) where T : INumber<T> => a + b;

Generic Attributes (C# 11)

Enables attributes with generic parameters.

[MyAttribute<string>]
public class MyClass { }

Primary Constructors (C# 12)

Declares constructor parameters directly in the class header.

class Widget(string id) {
    public void Print() => Console.WriteLine(id);
}

Collection Expressions (C# 12)

Simplifies array, list, or span creation.

int[] numbers = [1, 2, 3];

Spread Element in Collection Expressions (C# 12)

Flattens multiple collections into one.

int[] merged = [..arr1, ..arr2];

Optional Parameters in Lambdas (C# 12)

Adds default values to lambda parameters.

var sum = (int x = 1, int y = 2) => x + y; // sum() == 3

Lambda Natural Types (C# 10)

The compiler infers a delegate type for the lambda itself, so the variable holding it can be declared with var.

var toInt = (string s) => int.Parse(s);

Alias Any Type (C# 12)

Creates type aliases for readability.

using Size = (int Width, int Height);
Size dims = (800, 600);

Interceptors for Source Generation (C# 12)

Enables compile-time method interception for code injection.

[InterceptsLocation]
void Log(string msg) => Console.WriteLine(msg);

Minimal APIs (C# 9+)

Lightweight, endpoint-focused API creation.

app.MapGet("/", () => "Hello World");

Lock Object (New Synchronization Primitive) (C# 13/.NET 9)

Offers optimized thread synchronization via Lock.

private static Lock _lock = new();

Field Keyword (C# 14)

Accesses auto-property backing field inside the accessor.

public string Name {
    get => field.ToUpper();
    set => field = value;
}

Implicit Index Access (^) (C# 8)

Enables accessing from end of collections.

int last = numbers[^1];

Overload Resolution Priority Attribute (C# 13)

Controls which overload is chosen when ambiguous.

[OverloadResolutionPriority(1)]
void M(List<int> list) { }

Partial Members Enhancements (C# 13)

Extends partial members to properties and indexers (partial methods already existed), mainly so source generators can supply the implementations.

partial class Person {
    public partial string Name { get; set; }
}

Implicit Span Conversions (C# 14)

Promotes span conversions to language-level implicit conversions, so Span<T> and ReadOnlySpan<T> participate in more scenarios (extension receivers, type inference).

ReadOnlySpan<int> view = new[] { 1, 2, 3 }; // array-to-span as a built-in conversion

Extension Members (C# 14)

Declares instance-like extension members (including properties) in an extension block inside a static class.

static class UserExtensions {
    extension(User user) {
        public string Display => $"{user.Name} ({user.Email})";
    }
}

Null‑Conditional Assignment (C# 14)

Allows ?. on the left-hand side of an assignment; the write happens only when user is non-null (and, with ??=, only when Name is currently null).

user?.Name ??= "Default Name";

2025-06-24 Building Rock-Solid Encrypted Applications - Ben Dechrai - NDC Melbourne 2025 - YouTube { www.youtube.com }

image-20250623215532608

Summary: Building Rock-Solid Encrypted Applications – Ben Dechrai

Ben Dechrai walks through building a secure chat application, starting with plain-text messages and evolving to an end-to-end encrypted, multi-device system. He explains how to apply AES symmetric encryption, Curve25519 key pairs, and Diffie-Hellman key exchange. The talk covers how to do secure key rotation, share keys across devices without leaks, scale encrypted messaging systems without data bloat, and defend against metadata analysis.

Key Insights

  1. Encryption is mandatory
    Regulatory frameworks like GDPR allow fines up to €20 million or 4% of annual global revenue.
    See GDPR Summary – EU Commission

  2. Use AES-256-GCM for payload encryption
    This is a well-audited symmetric cipher standardized in FIPS 197.
    See NIST FIPS 197: AES Specification

  3. Use Curve25519 key pairs per device
    Each device generates its own key pair; public key is shared, private key is never uploaded.
    See RFC 7748: Elliptic Curves for DH Key Agreement

  4. Encrypt the symmetric key for each participant
    Encrypt the actual message once with AES, then encrypt the AES key for each recipient using their public key. This avoids the large ciphertext problem seen in naive PGP-style encryption.

  5. Rotate ephemeral keys regularly for forward secrecy
    Generate a new key pair for each chat session and rotate keys on time or message count to ensure Perfect Forward Secrecy.
    See Cloudflare on Perfect Forward Secrecy

  6. Use Diffie-Hellman to agree on session keys securely
    Clients can agree on a shared secret without sending it over the wire. This makes it possible to use symmetric encryption without needing to exchange the key.
    See Wikipedia: Diffie–Hellman Key Exchange

  7. Use QR codes to securely pair devices
    When onboarding a second device (e.g. laptop + phone), generate keys locally and transfer only a temporary public key via QR. Use it to establish identity without a central login.

  8. Mask metadata to avoid traffic analysis
    Even encrypted messages can leak patterns through metadata. Pad messages to fixed sizes, send decoy traffic, and let all clients pull all messages to make inference harder.

  9. Adopt battle-tested protocols like Signal
    Don’t invent your own protocol if you're building secure messaging. The Signal Protocol already solves identity, authentication, and key ratcheting securely.
    See Signal Protocol Specification

  10. Store only ciphertext and public keys on servers
    All decryption happens on the device. Retaining private keys or decrypted messages is risky unless legally required. Private key loss or compromise must only affect a small slice of messages, not entire histories.

2025-06-23 I locked 1000 architects in a room until they all agreed on the best solution - Bronwen Zande - YouTube { www.youtube.com }

image-20250623001226688

I built 1,000 architects using AI—each with a name, country, skillset, and headshot. I asked a language model to make me architect profiles. I told it their gender, country, and years of experience. Then I got it to generate a photo too. I used tools like DALL-E and ChatGPT to get realistic images.

They all designed the same web app—based on a spec for a startup called Loom Ventures. I created a pretend company and asked for a functional spec (nothing too crazy—blogs, search, logins, some CMS). Then I gave that spec to every AI architect and asked each to give me a full software design in Markdown.

I made them battle it out, tournament style, until we found "the best" design. At first, designs were grouped and reviewed by four other architects (randomly picked). The best ones moved on to knockout rounds. In the final round, the last two designs were judged by all remaining architects.

The reviews weren't just random—they had reasons and scores. Each reviewer gave a score out of 100 and explained why. I asked them to be clear, compare trade-offs, and explain how well the design met the client's needs. The reviews came out in JSON so I could process them easily.

Experience and job titles really affected scores. If a design said it was written by a "junior" architect, it got lower marks—even if the content was decent. When I removed the titles and re-ran reviews, scores jumped by 15%. So even the AIs showed bias.

Early mistakes in the prompt skewed my data badly. My first example profile included cybersecurity, and the AI just kept making cyber-focused architects. Nearly all designs were security-heavy. I had to redo everything with simpler prompts and let the model be more creative.

The best designs added diagrams, workflows, and Markdown structure. The winning entries used flowcharts (Mermaid), ASCII diagrams, and detailed explanations. They felt almost like something you'd see in a real architecture doc. A lot better than a wall of plain text.

Personas from different countries mentioned local laws. That was cool. The architects from Australia talked about the APP (privacy laws). The ones from Poland mentioned GDPR. That means the AI was paying attention to the persona's background.

2025-06-19 Andrej Karpathy: Software Is Changing (Again) - YouTube { www.youtube.com }

image-20250618232722060

Software 3.0 builds on earlier paradigms: It extends Software 1.0 (explicit code) and Software 2.0 (learned neural networks) by allowing developers to program using prompts in natural language.

Prompts are the new source code: In Software 3.0, well-crafted prompts function like programs and are central to instructing LLMs on what to do, replacing large parts of traditional code.

LLMs act as computing platforms: Language models serve as runtime engines, available on demand, capable of executing complex tasks, and forming a new computational substrate.

Feedback loops are essential: Effective use of LLMs involves iterative cycles—prompt, generate, review, and refine—to maintain control and quality over generated outputs.

Jagged intelligence introduces unpredictability: LLMs can solve complex problems but often fail on simple tasks, requiring human validation and cautious deployment.

LLMs lack persistent memory: Since models don’t retain long-term state, developers must handle context management and continuity externally.

“Vibe coding” accelerates prototyping: Rapid generation of code structures via conversational prompts can quickly build scaffolds but should be used cautiously for production-grade code.

Security and maintainability remain concerns: Generated code may be brittle, insecure, or poorly understood, necessitating rigorous testing and oversight.

Multiple paradigms must coexist: Developers should blend Software 1.0, 2.0, and 3.0 techniques based on task complexity, clarity of logic, and risk tolerance.

Infrastructure reliability is critical: As LLMs become central to development workflows, outages or latency can cause significant disruption, underscoring dependency risks.

Movies from the talk:

  • Rain Man (1988) - IMDb { www.imdb.com }

    When self-centered car dealer Charlie Babbitt learns that his estranged father's fortune has been left to an institutionalized older brother he never knew, Raymond, he kidnaps him in hopes of securing the inheritance. What follows is a transformative cross-country journey where Charlie discovers Raymond is an autistic savant with extraordinary memory and numerical skills. The film’s uniqueness lies in its sensitive portrayal of autism and the emotional evolution of a man reconnecting with family through empathy and acceptance.

  • Memento (2000) - IMDb { www.imdb.com }

    Leonard Shelby suffers from short-term memory loss, unable to form new memories after a traumatic event. He relies on Polaroid photos and tattoos to track clues in his obsessive search for his wife's killer. Told in a non-linear, reverse chronology that mirrors Leonard’s disoriented mental state, the film uniquely immerses the viewer in the protagonist’s fractured perception, making the mystery unravel in a mind-bending and emotionally charged fashion.

  • 50 First Dates (2004) - IMDb { www.imdb.com }

    Henry Roth, a commitment-phobic marine veterinarian in Hawaii, falls for Lucy Whitmore, a woman with anterograde amnesia who forgets each day anew after a car accident. To win her love, he must make her fall for him again every day. The film blends romantic comedy with neurological drama, and its charm comes from turning a memory disorder into a heartfelt and humorous exploration of persistence, love, and hope.

Tools:

2025-06-19 Andrej Karpathy on Software 3.0: Software in the Age of AI { www.latent.space }

image-20250618233234182

2025-06-16 Common Software Architectures and How they Fail - YouTube { www.youtube.com }

image-20250615202357978

Key Takeaways:

  • Modern deployment models (cloud, containers, serverless) simplify infrastructure maintenance, enhancing scalability and agility.
  • Horizontal scalability (using multiple servers) improves fault tolerance but introduces stateless application constraints.
  • Database performance optimization includes caching, read replicas, and CQRS but involves complexity and eventual consistency trade-offs.
  • Microservices address team and scalability issues but require careful handling of inter-service communication, fault tolerance, and increased operational complexity.
  • Modular monoliths, feature flags, blue-green deployments, and experimentation libraries like Scientist effectively mitigate deployment risks and complexity.

2025-06-09 Microservices, Where Did It All Go Wrong? - Ian Cooper - NDC Melbourne 2025 - YouTube { www.youtube.com }

image-20250608225447664

image-20250608230718013

This talk peels back the hype around microservices and asks why our bold leap into dozens—or even hundreds—of tiny, replaceable services has sometimes left us tangled in latency, brittle tests and orchestration nightmares. Drawing on the 1975 Fundamental Theory of Software Engineering, the speaker reminds us that splitting a problem into “manageably small” pieces only pays off if those pieces map to real business domains and stay on the right side of the intramodule vs intermodule cost curve. Through vivid “death star” diagrams and anecdotes of vestigial “restaurant hours” APIs, we see how team availability, misunderstood terminology and the lure of containers have driven us toward the anti-pattern of nano-services.

The remedy is framed via the 4+1 architectural views and a return to purpose-first design: start with a modular monolith until your domain boundaries—and team size—demand independent services; adopt classic microservices for clear subdomains owned by two-pizza teams; or embrace macroservices when fine-grained services impose too much overhead. By aligning services to business capabilities, designing for failure, and choosing process types per the 12-factor model, we strike the balance where cognitive load is low, deployments stay smooth and each component remains genuinely replaceable.

Tags: microservices, modular monolith, macroservices, bounded context, domain storytelling, 4+1 architecture, service granularity, team topologies

2025-06-04 The CIA method for making quick decisions under stress | Andrew Bustamante - YouTube { www.youtube.com }

image-20250603235631029

Understand that time is your most valuable asset because, unlike energy and money, you cannot create more of it; recognizing time’s finite nature shifts your mindset to treat each moment as critical.

When the number of tasks exceeds your capacity, you experience task saturation, which leads to decreased cognitive ability and increased stress; acknowledging this helps you avoid inefficiency and negative self-perception.

Apply the “subtract two” rule by carrying out two fewer tasks than you believe you can handle simultaneously; reducing your focus allows you to allocate more resources to each task and increases overall productivity.

Use operational prioritization by asking, “What is the next task I can complete in the shortest amount of time?”; this elementary approach leverages time’s objectivity to build momentum and confidence as you rapidly reduce your task load.

In high-pressure or dangerous situations, focus on executing the next fastest action—such as seeking cover—because immediate, simple decisions create space and momentum for subsequent choices that enhance survival.

Combat “head trash,” the negative self-talk that arises when you’re overwhelmed, by centering on the next simplest task; staying grounded in rational, achievable actions prevents emotional derailment and keeps you moving forward.

Practice operational prioritization consistently at home and work so that when you reach task saturation, doing the next simplest thing becomes an automatic response; repeated drilling transforms this method into a reliable tool that fosters resilience and peak performance.

Tags: time management, task saturation, operational prioritization, productivity, decision making, CIA methods, cognitive load, stress management, momentum, next-task focus, head trash, high-pressure situations, survival mindset, resource allocation, time as asset

2025-06-03 What You Should Know About FUTO Keyboard - YouTube { www.youtube.com }

image-20250603001715964

“with a PC keyboard. it bridges an electrical circuit to send a signal to your computer.” As typing evolved from mechanical typewriters to touchscreen apps, “software has become increasingly developed to serve its creators more than the users.” In many popular keyboards, “it sends everything you typed in that text field to somebody else's computer,” and “they say they may then go and train AI models on your data.” Even disabling obvious data-sharing options doesn’t fully stop collection—“in swift key there's a setting to share data for ads personalization and it's enabled by default.”

FUTO Keyboard addresses this by offering a fully offline experience: “it's this modern keyboard that has a more advanced auto correct,” and “the app never connects to the internet.” It provides “Swipe to Type,” “Smart Autocorrect,” “Predictive Text,” and “Offline Voice Input.” Its source code is under the “FUTO Source First License 1.1,” and it guarantees “no data collected” and “no data shared with third parties.”

privacy, offline, swipe typing, voice input, open source

2025-06-02 The internet shouldn't consume your life - YouTube { www.youtube.com }

image-20250601192845764

Dude, I’ve just been thinking a lot about how much we rely on the internet — like, way too much. Social media, video games, just endless scrolling — it’s all starting to feel like we’re letting the internet run our lives, you know? And yeah, I’m not saying we need to go full Amish or anything — there’s definitely real meaning you can find online, I’ve made some of my closest friends here. But we can’t keep letting it eat up all our time and attention. I’ve been lucky, my parents didn’t let me get video games as a kid, so I learned early on to find value outside of screens. But even now, it’s so easy to get sucked into that doom-scrolling hole — like, one minute you’re checking YouTube, and suddenly three hours are gone. We’ve gotta train ourselves, catch those moments, and build real focus again. It's not about quitting everything cold turkey, unless that works for you — it’s about moderation and making sure you’ve got stuff in your life that isn’t just online.

internet dependence, social media, balance, personal growth, generational habits

2025-06-02 Why good engineers keep burning out - YouTube { www.youtube.com }

image-20250601185939937

I've been thinking a lot about something I call change energy. Everyone's got a different threshold for how much change they can handle — their living situation, work, even what they eat. Too much stability feels boring, but too much change feels overwhelming. It's all about where you sit on that spectrum.

For me, I don’t love moving, I burn out fast while traveling, but when it comes to my work, I need some change to stay engaged — not so much that everything’s new every day, but not so little that it gets stale. Developers usually sit on the lower end of that spectrum at work: stuck in old codebases, hungry for something fresh, constantly exploring new frameworks and tools because they’re not hitting their change threshold on the job.

Creators, though? It’s the opposite. We’re maxed out every single day. Every video has to be new, every thumbnail, every format — constant change. So any extra change outside of the content feels like too much. That’s why I didn’t adopt Frame.io for over a year, even though I knew it would help — I simply didn’t have the change energy to spare.

This difference is why creator tools are hard to sell to great creators: they're already burning all their change energy on making content. Meanwhile, great developers still have room to try new tools and get excited about them. That realization made us shift from creator tools to dev tools — because that’s where the most excited, curious people are.


meaningful quotes:

  1. "Humans need some level of stability in their lives or they feel like they’re going insane."
  2. "Most great developers are looking for more change. Most great creators are looking for less change."
  3. "Good creators are constantly trying new things with their content, so they’re unwilling to try new things anywhere else."
  4. "We need to feel this mutual excitement. We need to be excited about what we're building and the people that we're showing it to need to be excited as well."

image-20250601190733500

2025-05-30 Reflections on 25 years of writing secure code | BRK235 - YouTube { www.youtube.com }

image-20250529181550825

Michael Howard reflects on 25 years of Writing Secure Code, sharing insights from his career at Microsoft and the evolution of software security. He emphasizes that while security features do not equate to secure systems, the industry has made significant progress in eliminating many simple vulnerabilities, such as basic memory corruption bugs. However, new threats like server-side request forgery (SSRF) have emerged, highlighting that security challenges continue to evolve. Howard stresses the enduring importance of input validation, noting it remains the root cause of most security flaws even after two decades.

He advocates for a shift away from C and C++ towards memory-safe languages like Rust, C#, Java, and Go, citing their advantages in eliminating classes of vulnerabilities tied to undefined behavior and memory safety issues. Tools like fuzzing, static analysis (e.g., CodeQL), and GitHub's advanced security features play critical roles in identifying vulnerabilities early. Ultimately, Howard underscores that secure code alone isn’t sufficient; compensating controls, layered defenses, threat modeling, and continuous learning are essential. Security storytelling, he notes, remains a powerful tool for driving cultural change within organizations.

Quotes:

“Um, I hate JavaScript. God, I hate JavaScript. There are no words to describe how much I hate JavaScript.” Context: Michael Howard expressing his frustration with JavaScript during a live fuzzing demo.

“This thing is dumber than a bucket of rocks.” Context: Describing the simplicity of a custom fuzzer that nonetheless found serious bugs in seconds.

“If you don’t ask, it’s like being told no.” Context: The life lesson Michael learned when he decided to invite Bill Gates to write the foreword for his book.

“Security features does not equal secure features.” Context: Highlighting the gap between adding security controls and truly building secure systems.

“All input is evil until proven otherwise.” Context: A core principle from Writing Secure Code on why rigorous input validation remains critical.

“It’s better to crash an app than to run malicious code. They both suck, but one sucks a heck of a lot less.” Context: Advocating for secure-by-default defenses that fail safely rather than enable exploits.

“45 minutes later, he emailed back with one word, ‘absolutely.’” Context: Bill Gates’s rapid, enthusiastic response to writing the second-edition foreword.

“I often joke that I actually know nothing about security. I just know a lot of stories.” Context: Emphasizing the power of storytelling to make security lessons memorable and drive action.

software security, input validation, memory safety, secure coding, fuzzing, CodeQL, Rust, C/C++, SSRF, compensating controls, Microsoft, Secure Future Initiative

2025-05-27 Oddly useful Linux tools you probably haven't seen before - YouTube { www.youtube.com }

image-20250526204837094

Caught in a heavy downpour but grateful to be warm and dry inside, the speaker dives into a list of surprisingly useful tools. First is Webcam Eyes, a 200-line shell script that effortlessly mounts most modern cameras as webcams—especially useful for recording with tools like ffmpeg. After testing on multiple Canon and Sony cameras, it proved flawless. Next up is Disk, a colorful, graph-based alternative to df, offering cleaner output and useful export options like JSON and CSV, written in Rust and only marginally slower.

The Pure Bash Bible follows—a compendium of bash-only alternatives to common scripting tasks typically handled by external tools. It emphasizes performance and optimization for shell scripts. Then comes Xephyr, a nested X server useful for window manager development, poorly-behaved applications, or sandboxing within X11. Finally, a patch for cp and mv brings progress bars to these core utilities—helpful when rsync isn’t an option, even if coreutils maintainers deemed these tools “feature complete.”

tools, shell scripting, webcams, disk utilities, bash, X11, developer tools

2025-05-18 I changed databases again (please learn from my mistakes) - YouTube { www.youtube.com }

tags: database migration, Convex, IndexedDB, Dexie, sync engine, T3 Chat, optimistic updates, live queries, SSE streaming, resumable streams, PlanetScale, Drizzle ORM, Replicache Zero, feature flags, WebSocket authentication, TypeScript, JWT, session management, migration debugging, client-server architecture

image-20250518160147539

  1. Overview The speaker has completed yet another database migration—this time to Convex—and hopes it’s the last. After five grueling years of building and maintaining a custom sync engine and debugging for days on end, they finally reached a setup they trust for their T3 Chat application.
  2. Original Local-First Architecture
    ‱ IndexedDB + Dexie: Entire client state (threads, messages) was serialized with SuperJSON, gzipped, and stored as one blob. Syncing required blobs to be re-zipped and uploaded whole, leading to race conditions (only one tab at a time), performance bottlenecks, and edge-case bugs in Safari.
    • Upstash Redis: Moved to Upstash with key patterns like message:userId:uuid, but querying thousands of keys on load proved unsustainable.
    • PlanetScale + Drizzle: Spun up a traditional SQL schema in two days. Unfortunately, the schema stored only a single SuperJSON field, bloating data and preventing efficient relational queries.
  3. Required Capabilities
    • Eliminate IndexedDB’s quirks.
    • One source of truth (no split brain between client and server).
    • Instant optimistic UI updates for renames, deletions, and new messages.
    • Resumable AI-generation streams.
    • Strong signed-out experience.
    • Unblock the engineering team by offloading sync complexity.
  4. Rejected Alternatives
    • Zero (Replicache): Required Postgres + custom WebSocket infra and separate schema definitions in SQL, client, and server permissions layers.
    • Other SDKs/ORMs: All suffered from duplicate definitions and didn’t fully solve client-as-source issues or resumable streams.
  5. Why Convex Won
    • TypeScript-first application database: Single schema file, no migrations for shape changes.
    • Built-in sync engine: WebSocket transactions automatically push updates to subscribed queries.
    • Permissions in code: Easily enforce row-level security in TS handlers.
    • Live queries: Any mutation (e.g. updating a message’s title) immediately updates all listeners without manual cache management.
  6. Refactored Message Flow
    1. Create mutations in Convex for new user and assistant messages before calling the AI.
    2. Stream SSE from /api/chat to the client for optimistic token-by-token rendering.
    3. Chunked writes: Instead of re-writing the entire message on every token, batch updates to Convex every 500 ms (future improvement: use a streamId field and Vercel’s resumable-stream helper).
    4. Title generation moved from brittle SSE event parsing & IndexedDB writes to a simple convex.client.mutation('chat/updateTitle', { threadId, title }). The client auto-refreshes via live query (a rough sketch of this pattern follows the outline below).
  7. Migration Path
    • Feature flag: Users opt into the Convex beta via a settings toggle.
    • Chunked data import: Server-side Convex mutations ingest threads (500 per chunk), messages (100 per chunk), and attachments from PlanetScale.
    • Cookie & auth handling: Adjusted HttpOnly, Expires, and JWT parsing (switched from a custom-sliced ID to the token’s subject field) to ensure WebSocket authentication and avoid Brave-specific bugs.
  8. Major Debugging Saga: A rare OpenAuth library change caused early users’ tokens to carry identifiers prefixed with user: instead of numeric Google IDs. Only by logging raw JWT fields and collaborating with an early adopter could this be traced—and fixed by reading the subject claim directly.
  9. Outcomes & Benefits
    • Eliminated IndexedDB’s instability and custom sync engine maintenance.
    • Unified schema and storage in Convex for all client and server state.
    • Robust optimistic updates and live data subscriptions.
    • Resumable AI streams via planned streamId support.
    • Improved signed-out flow using Convex sessions.
    • Team now free to focus on product features rather than sync orchestration.
  10. Next Steps
    • Migrate full user base.
    • Integrate resumable-stream IDs into messages for fault-tolerant AI responses.
    • Monitor Convex search indexing improvements under high write load.
    • Celebrate the end of database migrations—at least until the next big feature!
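
Below is a rough TypeScript sketch of the live-query flow described in the outline, using Convex's mutation and query helpers; the file, table, and function names (chat.ts, threads, messages, updateTitle, listMessages) are assumptions based on the summary rather than T3 Chat's actual schema:

```typescript
// convex/chat.ts: server-side Convex functions (illustrative names only).
import { mutation, query } from "./_generated/server";
import { v } from "convex/values";

// Renaming a thread is a plain TypeScript mutation; row-level permission
// checks can run here in code before the write.
export const updateTitle = mutation({
  args: { threadId: v.id("threads"), title: v.string() },
  handler: async (ctx, { threadId, title }) => {
    await ctx.db.patch(threadId, { title });
  },
});

// A live query: subscribed clients are re-run automatically after any
// mutation that changes matching documents, with no manual cache management.
export const listMessages = query({
  args: { threadId: v.id("threads") },
  handler: async (ctx, { threadId }) =>
    ctx.db
      .query("messages")
      .filter((q) => q.eq(q.field("threadId"), threadId))
      .collect(),
});
```

On the React client, something like useQuery(api.chat.listMessages, { threadId }) and useMutation(api.chat.updateTitle) from convex/react would provide the subscription and the write, so a title change pushed over the WebSocket shows up in every open tab.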

2025-04-25 The Inside Story of the Windows Start Menu - YouTube { www.youtube.com }

image-20250424215407832

The Windows Start Menu has a deep history that mirrors the evolution of Microsoft's operating systems. Beginning with the command-line MS-DOS interface in 1981 and the basic graphical MS-DOS Executive in Windows 1.0, Microsoft gradually developed more user-friendly navigation systems. Windows 3.1's Program Manager introduced grouped icons for application access, but the major breakthrough came with Windows 95, which debuted the hierarchical Start Menu. Inspired by the Cairo project, this menu featured structured sections like Programs, Documents, and Settings, designed for easy navigation on limited consumer hardware.

Subsequent versions saw both visual and technical advancements: NT4 brought Unicode support and multithreading; XP introduced the iconic two-column layout with pinned and recent apps; Vista added search integration and the Aero glass aesthetic; and Windows 7 refined usability with taskbar pinning. Windows 8's touch-focused Start Screen alienated many users, leading to a partial rollback in 8.1 and a full restoration in Windows 10, which blended traditional menus with live tiles. Windows 11 centered the Start Menu, removing live tiles and focusing on simplicity.

Technically, the Start Menu operates as a shell namespace extension managed by Explorer.exe, using Win32 APIs and COM interfaces. It dynamically enumerates shortcuts and folders via Shell Folder interfaces, rendering content through Windows' menu systems. A personal anecdote from developer Dave Plummer highlights an attempted upgrade to the NT Start Menu's sidebar using programmatic text rendering, which was ultimately abandoned in favor of simpler bitmap graphics due to localization complexities. This story underscores the blend of technical ambition and practical constraints that have shaped the Start Menu's legacy.

windows history, start menu, user interface design, microsoft development, operating systems, windows architecture, software engineering lessons

2025-03-24 Keynote: The past, present, and future of AI for application developers - Steve Sanderson - YouTube { www.youtube.com }

Tags: AI, application development, history, chatbots, neural networks, Markov models, GPT, large language models, small language models, business automation, agents, speech recognition, API integration.

image-20250323204028609

image-20250323210700467

2025-03-16 The Definition Of Insanity - Sam Newman - NDC London 2025 - YouTube { www.youtube.com }

Tags: Distributed Systems, Timeouts, Retries, Idempotency, Resilience, Reliability, Fault Tolerance, Network Communication, System Design, Exponential Backoff, Unique Request IDs, Request Fingerprinting, Latency Management, Resource Management, System Robustness, Software Engineering, Architecture Best Practices

image-20250316144920370

Timeouts: In distributed systems, waiting indefinitely leads to resource exhaustion, degraded performance, and cascading failures. Timeouts establish explicit limits on how long your system waits for responses, preventing unnecessary resource consumption (e.g., tied-up threads, blocked connections) and ensuring the system remains responsive under load.

Purpose: Timeouts help maintain system stability, resource efficiency, and predictable performance by immediately freeing resources from stalled or unresponsive requests.

Implementation: Clearly define timeout thresholds aligned with realistic user expectations, network conditions, and system capabilities. Even asynchronous or non-blocking architectures require explicit timeout enforcement to prevent resource saturation.

Challenges: Selecting appropriate timeout durations is complex—timeouts that are too short risk prematurely dropping legitimate operations, while excessively long durations cause resource waste and poor user experience. Dynamically adjusting timeouts based on system conditions adds complexity but improves responsiveness.

Tips:

  • Regularly monitor and adjust timeout values based on actual system performance metrics.
  • Clearly document timeout settings and rationale to facilitate maintenance and future adjustments.
  • Avoid overly aggressive or overly conservative timeouts; aim for a balance informed by real usage patterns.
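
As a concrete illustration of an explicit wait limit, here is a minimal sketch using the standard AbortController API; the five-second budget and the bare fetch call are placeholder choices, not recommendations from the talk:

```typescript
// Abort an outbound HTTP call once an explicit time budget is exhausted.
// The 5000 ms default is an illustrative placeholder; tune it to real metrics.
async function fetchWithTimeout(url: string, timeoutMs = 5000): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // fetch rejects with an AbortError when the signal fires, so the caller
    // is released instead of waiting indefinitely on a stalled request.
    return await fetch(url, { signal: controller.signal });
  } finally {
    clearTimeout(timer); // always release the timer, success or failure
  }
}
```

Even in fully asynchronous code, it is the timer that actually frees the waiting caller; without it the awaited promise can stay parked for as long as the remote side keeps the connection open.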

Retries: Transient failures in distributed systems are inevitable, but effective retries allow your application to gracefully recover from temporary issues like network glitches or brief service disruptions without manual intervention.

Purpose: Retries improve reliability and user experience by automatically overcoming short-lived errors, reducing downtime, and enhancing system resilience.

Implementation: Implement retries using explicit retry limits to prevent repeated attempts from overwhelming system resources. Employ exponential backoff techniques to progressively delay retries, minimizing retry storms. Introducing jitter (randomized delays) can further reduce the risk of synchronized retries.

Challenges: Differentiating between transient errors (which justify retries) and systemic problems (which do not) can be difficult. Excessive retries can compound problems, causing resource contention, performance degradation, and potential system-wide failures. Retries also introduce latency, potentially affecting user experience.

Tips:

  • Set clear maximum retry limits to prevent endless retry loops.
  • Closely monitor retry attempts and outcomes to identify patterns that signal deeper system issues.
  • Use exponential backoff and jitter to smooth retry load, avoiding spikes and cascades in resource use.
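
A minimal sketch of bounded retries with exponential backoff and full jitter, along the lines described above; the attempt limit, base delay, and the isTransient callback are assumptions the caller would supply, not values from the talk:

```typescript
// Retry an operation a bounded number of times, backing off exponentially
// with full jitter so concurrent clients do not retry in lockstep.
async function withRetries<T>(
  op: () => Promise<T>,
  isTransient: (err: unknown) => boolean, // caller classifies retryable errors
  maxAttempts = 4,
  baseDelayMs = 100,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Give up on systemic errors or once the retry budget is spent.
      if (attempt >= maxAttempts || !isTransient(err)) throw err;
      // Full jitter: random delay in [0, baseDelayMs * 2^attempt).
      const delayMs = Math.random() * baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Logging each attempt and its outcome (omitted here) is what makes the monitoring tip above actionable.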

Idempotency: Safely retrying operations depends heavily on idempotency—the principle that repeating the same operation multiple times yields the exact same outcome without unintended side effects. This is similar to repeatedly pressing an elevator button: multiple presses don't summon additional elevators; they simply confirm your original request.

Purpose: Idempotency guarantees safe and predictable retries, preventing duplicated transactions, unintended state changes, and inconsistent data outcomes.

Implementation Approaches:

  • Unique Request IDs: Assign each request a unique identifier, allowing the system to recognize and manage duplicate requests effectively.
  • Request Fingerprinting: Generate unique "fingerprints" (hashes) for requests based on key attributes (user ID, timestamp, request content) to detect and safely handle duplicates. Fingerprints help differentiate legitimate retries from genuinely new operations, mitigating risks of duplication.
  • Naturally Idempotent Operations: Architect operations to inherently produce identical outcomes upon repeated execution, using methods such as stateless operations or RESTful idempotent verbs (e.g., PUT instead of POST).

Challenges: Achieving true idempotency is complex when operations involve external resources, mutable states, or multiple integrated services. Fingerprinting accurately without false positives is challenging, and maintaining idempotency alongside rate-limiting or throttling mechanisms requires careful system design.

Tips:

  • Clearly mark operations as idempotent or non-idempotent in API documentation, helping developers and maintainers understand system behaviors.
  • Combine multiple idempotency strategies (unique IDs and fingerprints) for higher reliability.
  • Regularly validate and review idempotency mechanisms in real-world production conditions.
  • Ensure robust logging and tracing to monitor idempotency effectiveness, catching issues early.
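
One way the unique-ID and fingerprinting ideas might fit together, sketched in TypeScript; the key format, the hash inputs, and the in-memory store are illustrative assumptions rather than a prescribed design:

```typescript
import { createHash, randomUUID } from "node:crypto";

// Client side: attach a fresh idempotency key so the server can recognize
// a retry of the same logical operation.
function newIdempotencyKey(): string {
  return randomUUID();
}

// Server side: fingerprint the request content so a replayed key with the
// same payload is treated as a duplicate rather than a new operation.
function fingerprint(userId: string, body: unknown): string {
  return createHash("sha256")
    .update(userId)
    .update(JSON.stringify(body))
    .digest("hex");
}

// Illustrative in-memory store; a real system would use a shared, expiring
// store so duplicates are caught across instances and old keys age out.
const seen = new Map<string, string>();

function isDuplicate(key: string, userId: string, body: unknown): boolean {
  const fp = fingerprint(userId, body);
  const prior = seen.get(key);
  if (prior !== undefined) {
    return prior === fp; // same key and content: a safe-to-ignore retry
  }
  seen.set(key, fp);
  return false;
}
```

A reused key whose fingerprint does not match the stored one is worth rejecting outright, since it usually signals a client bug rather than a legitimate retry.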

2025-03-30 Using GenAI on your code, what could possibly go wrong? - - YouTube { www.youtube.com }

Tags: AI security, code generation, prompt injection

  ‱ Generative AI tools like Copilot and ChatGPT often create insecure code—up to 40% of generated code has flaws.
  • Developers tend to trust AI output too much without proper validation.
  • Security debt is rising; most teams can’t fix issues faster than they appear.
  • AI risks fall into three layers: usage (how it’s used), application (how it’s integrated), and platform (how the models are built).
  • Plugins and prompt injection can give attackers unintended access and control.
  • LLM output is only as safe as its most sensitive input—plugins must be tightly controlled.
  • Backdoored models and poisoned training data are real threats.
  • Better prompts, human review, and secure defaults are essential.
  • Tools like PentestGPT and Oxbo show potential for AI to help find and fix security flaws.

Educational resources mentioned:

  • AI Security Fundamentals (Microsoft Learn)
  • Generative AI with Large Language Models (Coursera)
  • 3Blue1Brown (YouTube)
  • BlueHat talk on prompt injection
  • Microsoft Build/Ignite AI security talks
  • OpenFuse (model supply chain security)
  • AWS Bedrock Guardrails
  • Josh Brown-White talk on secure code fixing with AI

image-20250329171956006 image-20250329172051922