Links from my inbox 2024-08-30

· 15 min read

[[TOC]]

How the things work

2024-08-31 Hypervisor From Scratch - Part 1: Basic Concepts & Configure Testing Environment | Rayanfam Blog { rayanfam.com }

Hypervisor From Scratch

The source code for Hypervisor From Scratch is available on GitHub:

https://github.com/SinaKarvandi/Hypervisor-From-Scratch/

2024-08-31 Reversing Windows Internals (Part 1) - Digging Into Handles, Callbacks & ObjectTypes | Rayanfam Blog { rayanfam.com }

2024-08-31 A Tour of Mount in Linux | Rayanfam Blog { rayanfam.com }

2024-09-01 tandasat/Hypervisor-101-in-Rust: { github.com }

The materials of "Hypervisor 101 in Rust", a one-day long course, to quickly learn hardware-assisted virtualization technology and its application for high-performance fuzzing on Intel/AMD processors.

https://tandasat.github.io/Hypervisor-101-in-Rust/

SAML

2024-09-02 A gentle introduction to SAML | SSOReady { ssoready.com }

2024-09-02 Visual explanation of SAML authentication { www.sheshbabu.com }

:thinking: Tricks!

2024-09-02 saving my git email from spam { halb.it }

GitHub has a cool option that replaces your private email with a noreply GitHub email, which looks like this: 14497532+username@users.noreply.github.com. You just have to enable “keep my email address private” in the email settings. You can read the details in the GitHub guide for setting your email privacy.

With this solution your email will remain private without losing precious green squares in the contribution graph.
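
A minimal sketch of the address format quoted above, assuming you already know your numeric GitHub user ID (GitHub shows the full noreply address in the email settings once the option is enabled). The helper name `github_noreply_email` is hypothetical; the address it builds is what you would pass to `git config --global user.email`.

```python
def github_noreply_email(user_id: int, username: str) -> str:
    """Build a noreply commit email in the ID+username@users.noreply.github.com format."""
    if user_id <= 0 or not username:
        raise ValueError("need a positive numeric user ID and a non-empty username")
    return f"{user_id}+{username}@users.noreply.github.com"


# Hypothetical ID and login taken from the example address above.
email = github_noreply_email(14497532, "username")
print(email)  # 14497532+username@users.noreply.github.com
# Then, in a shell: git config --global user.email "<printed address>"
```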

CRDT

2024-09-01 Movable tree CRDTs and Loro's implementation – Loro { loro.dev }

This article covers the difficulties of implementing Movable Tree CRDTs for real-time collaboration, and how Loro implements them and sorts child nodes.
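
The core difficulty with concurrent moves is that two individually valid moves (A under B on one replica, B under A on another) can together create a cycle. One well-known approach, from Kleppmann et al.'s work on a highly-available move operation for replicated trees, is to order all moves by a unique timestamp and skip any move that would make a node its own ancestor. The sketch below is a simplified version of that idea that replays the whole log on every call; it is not Loro's implementation, and the `Move` type and `(counter, replica_id)` timestamps are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    timestamp: tuple[int, str]  # (Lamport counter, replica id): a total order
    node: str
    new_parent: str


def is_ancestor(parents: dict[str, str], ancestor: str, node: str) -> bool:
    """True if `ancestor` is `node` itself or lies on node's path to the root."""
    current = node
    while current is not None:
        if current == ancestor:
            return True
        current = parents.get(current)  # None once we pass the root
    return False


def apply_moves(moves: list[Move]) -> dict[str, str]:
    """Replay moves in timestamp order, skipping any move that would create a cycle."""
    parents: dict[str, str] = {}
    for move in sorted(moves, key=lambda m: m.timestamp):
        # Moving a node under itself or under one of its descendants makes a loop.
        if is_ancestor(parents, move.node, move.new_parent):
            continue
        parents[move.node] = move.new_parent
    return parents


log = [
    Move((1, "r1"), "a", "root"),
    Move((1, "r2"), "b", "root"),
    Move((2, "r1"), "a", "b"),  # concurrent with the next move
    Move((2, "r2"), "b", "a"),  # would close the loop a -> b -> a
]
print(apply_moves(log))  # {'a': 'b', 'b': 'root'}
```

Because every replica sorts the same log the same way, all replicas converge on the same tree: the move ordered first wins, and the concurrent move that would create a cycle is skipped.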

Art and Assets

2024-09-01 Public Work by Cosmos { public.work }

Game Theory 101

2024-09-01 ⭐️ Game Theory 101 (#1): Introduction - YouTube { www.youtube.com }

2024-09-01 Finding Nash Equilibria through Simulation { coe.psu.ac.th }
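
The linked page finds equilibria by simulation; for small games it helps to have a brute-force reference to check against. This sketch (not the page's method) enumerates every pure-strategy profile of a two-player game and keeps those where neither player can gain by deviating alone. The payoff numbers are illustrative, and mixed-strategy equilibria are out of scope.

```python
from itertools import product


def pure_nash_equilibria(payoffs):
    """payoffs[(row, col)] = (row player's payoff, column player's payoff)."""
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    equilibria = []
    for r, c in product(rows, cols):
        row_payoff, col_payoff = payoffs[(r, c)]
        # Neither player may improve by switching strategy while the other stays put.
        row_best = all(payoffs[(r2, c)][0] <= row_payoff for r2 in rows)
        col_best = all(payoffs[(r, c2)][1] <= col_payoff for c2 in cols)
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria


# Prisoner's dilemma, with years in prison written as negative payoffs.
PD = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-3, 0),
    ("defect", "cooperate"): (0, -3),
    ("defect", "defect"): (-2, -2),
}
print(pure_nash_equilibria(PD))  # [('defect', 'defect')]
```

A game such as matching pennies returns an empty list, since its only equilibrium is a mixed one.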

(Emacs)

2024-09-01 A Simple Guide to Writing & Publishing Emacs Packages { spin.atomicobject.com }

2024-09-01 Emacs starter kit { emacs-config-generator.fly.dev }

2024-09-01 dot-files/emacs-blog.org at 1b54fe75d74670dc7bcbb6b01ea560c45528c628 · howardabrams/dot-files { github.com }

2024-08-31 ⭐️ The Organized Life - An Expert’s Guide to Emacs Org-Mode – TheLinuxCode { thelinuxcode.com }

2024-08-31 ⭐️ Mastering Organization with Emacs Org Mode: A Complete Guide for Beginners – TheLinuxCode { thelinuxcode.com }

2024-08-30 chrisdone-archive/elisp-guide: A quick guide to Emacs Lisp programming { github.com }

2024-08-30 Getting Started With Emacs Lisp Hands On - A Practical Beginners Tutorial – Ben Windsor – Strat at an investment bank { benwindsorcode.github.io }

Retro / Fun

2024-08-30 VisiCalc - The Early History - Peter Jennings { benlo.com }

2024-09-01 paperclips { www.decisionproblem.com }

2024-09-02 Seiko Originals: The UC-2000, A Smartwatch from 1984 – namokiMODS { www.namokimods.com }

Inspiration

2024-09-02 Navigating Corporate Giants Jeffrey Snover and the Making of PowerShell - CoRecursive Podcast { corecursive.com }

I joined Microsoft at a time when the company was struggling to break into the enterprise market. While we dominated personal computing, our tools weren’t suitable for managing large data centers. I knew we needed a command-line interface (CLI) to compete with Unix, but Microsoft’s culture was deeply rooted in graphical user interfaces (GUIs). Despite widespread skepticism, I was determined to create a tool that could empower administrators to script and automate complex tasks.

My first major realization was that traditional Unix tools wouldn’t work on Windows because Unix is file-oriented, while Windows is API-oriented. This led me to focus on Windows Management Instrumentation (WMI) as the backbone for our CLI. Despite this, I faced resistance from within. The company only approved a handful of commands when we needed thousands. To solve this, I developed a metadata-driven architecture that allowed us to efficiently create and scale commands, laying the foundation for PowerShell.

However, getting others on board was a challenge. When I encountered a team planning to port a Unix shell to Windows, I knew they were missing the bigger picture. To demonstrate my vision, I locked myself away and wrote a 10,000-line prototype of what would become PowerShell. This convinced the team to embrace my approach.

I was able to show them and they said, ‘Well, what about this?’ And I showed them. And they said, ‘What about that?’ And I showed them. Their eyes just got big and they’re like, ‘This, this, this.’

Pursuing this project meant taking a demotion, a decision that was financially and personally difficult. But I was convinced that PowerShell could change the world, and that belief kept me going. To align the team, I wrote the Monad Manifesto, which became the guiding document for the project. Slowly, I convinced product teams like Active Directory to support us, which helped build momentum.

The project faced another major challenge during Microsoft’s push to integrate everything with .NET. PowerShell, built on .NET, was temporarily removed from Windows due to broader integration issues. It took years of persistence to get it back in, but I eventually succeeded.

PowerShell shipped with Windows Vista, but I continued refining it through multiple versions, despite warnings that focusing on this project could harm my career. Over time, PowerShell became a critical tool for managing data centers and was instrumental in enabling Microsoft’s move to the cloud.

In the end, the key decisions—pushing for a CLI, accepting a demotion, and persisting through internal resistance—led to PowerShell's success and allowed me to make a lasting impact on how Windows is managed.

2024-09-02 Netflix/maestro: Maestro: Netflix’s Workflow Orchestrator { github.com }

2024-09-01 The Scale of Life { www.thescaleoflife.com }

2024-09-01 opslane/opslane: Making on-call suck less for engineers { github.com }

2024-09-01 Azure Quantum | Learn with quantum katas { quantum.microsoft.com }

2024-09-01 microsoft/QuantumKatas: Tutorials and programming exercises for learning Q# and quantum computing { github.com }

2024-09-01 EP122: API Gateway 101 - ByteByteGo Newsletter { blog.bytebytego.com }

2024-09-01 pladams9/hexsheets: A basic spreadsheet application with hexagonal cells inspired by: http://www.secretgeek.net/hexcel. { github.com }

2024-09-01 Do Quests, Not Goals { www.raptitude.com }

The other problem with goals is that, outside of sports, “goal” has become an uninspiring, institutional word. Goals are things your teachers and managers have for you. Goals are made of quotas and Key Performance Indicators. As soon as I write the word “goals” on a sheet of paper I get drowsy.

Here are some of the quests people took on:

  • Declutter the whole house
  • Record an EP
  • Prep six months’ worth of lessons for my students
  • Set up an artist’s workspace
  • Finish two short stories
  • Gain a basic knowledge of classical music
  • Fill every page in a sketchbook with drawings
  • Complete a classical guitar program
  • Make an “If I get hit by a bus” folder for my family

2024-08-30 oTranscribe { otranscribe.com }

Security

2024-08-31 The State of Application Security 2023 • Sebastian Brandes • GOTO 2023 - YouTube { www.youtube.com }

Sebastian, co-founder of Hey Hack, a Danish startup focused on web application security, presented findings from a large-scale study involving the scanning of nearly 4 million hosts globally. The study uncovered widespread vulnerabilities in web applications, including file leaks, dangling DNS records, vulnerable FTP servers, and persistent cross-site scripting (XSS) issues.

Key findings include:

  • File leaks: 29% of organizations had exposed sensitive data like source code, passwords, and private keys.
  • Dangling DNS records: Risks of subdomain takeover attacks due to outdated DNS entries.
  • Vulnerable FTP servers: 7.9% of servers running ProFTPD 1.3.5 were at risk due to a file copy module vulnerability.
  • XSS vulnerabilities: 4% of companies had known XSS issues, posing significant security risks.

Sebastian stressed that web application firewalls (WAFs) are not foolproof and cannot replace fixing underlying vulnerabilities. He concluded by emphasizing the importance of early investment in application security during the development process to prevent future attacks.
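
As a small illustration of fixing the issue in the application itself rather than at the WAF, here is a Python sketch for the reflected-XSS case: untrusted input is encoded for the HTML context before it is rendered. `render_greeting` is a hypothetical function; `html.escape` is from the standard library. In practice an auto-escaping template engine usually does this for you, and other contexts (URLs, inline JavaScript, CSS) need their own encoding.

```python
import html


def render_greeting(name: str) -> str:
    """Build an HTML fragment, encoding user input so the browser treats it as text."""
    # html.escape turns &, <, > and (with the default quote=True) " and ' into
    # entities, so injected markup cannot open a <script> element.
    return f"<p>Hello, {html.escape(name)}!</p>"


print(render_greeting('<script>alert("xss")</script>'))
# <p>Hello, &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;!</p>
```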

"We’ve seen lots of leaks or file leaks that are sitting out there—files that you probably would not want to expose to the public internet."

"Web application firewalls can maybe do something, but they’re not going to save you. It’s much, much better to go ahead and fix the actual issues in your application."

2024-08-30 BeEF - The Browser Exploitation Framework Project { beefproject.com }

2024-08-31 stack-auth/stack: Open-source Clerk/Auth0 alternative { github.com }

Stack Auth is a managed user authentication solution. It is developer-friendly and fully open-source (licensed under MIT and AGPL).

Stack gets you started in just five minutes, after which you'll be ready to use all of its features as you grow your project. Our managed service is completely optional and you can export your user data and self-host, for free, at any time.

Markdown

2024-09-02 romansky/dom-to-semantic-markdown: DOM to Semantic-Markdown for use in LLMs { github.com }

C || C++

2024-09-02 Faster Integer Parsing { kholdstare.github.io }
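
A Python model of one widely used SWAR (SIMD within a register) trick for this kind of parsing: eight ASCII digits are loaded as one 64-bit word and combined pairwise with three multiply-shift-mask steps instead of eight multiply-adds. This is a sketch under the assumption of a little-endian load and exactly eight digits, not the linked article's code. Python ints are unbounded, so each product is masked to 64 bits to mimic `uint64_t`; in Python itself this is slower than `int(s)`, and only the arithmetic is the point.

```python
MASK64 = (1 << 64) - 1  # mimic uint64_t wraparound


def parse_eight_digits(s: str) -> int:
    """Parse exactly eight ASCII digits using three multiply-shift-mask steps."""
    if len(s) != 8 or not (s.isascii() and s.isdigit()):
        raise ValueError("expected exactly eight ASCII digits")
    # Little-endian load: the first (most significant) digit lands in the lowest byte.
    v = int.from_bytes(s.encode("ascii"), "little")
    v &= 0x0F0F0F0F0F0F0F0F                    # '0'..'9' (0x30..0x39) -> 0..9 per byte
    v = ((v * 2561) & MASK64) >> 8             # 2561 = 10 * 2**8 + 1: 10*a + b per pair
    v &= 0x00FF00FF00FF00FF                    # keep four 2-digit values, one per 16 bits
    v = ((v * 6553601) & MASK64) >> 16         # 6553601 = 100 * 2**16 + 1
    v &= 0x0000FFFF0000FFFF                    # keep two 4-digit values, one per 32 bits
    v = ((v * 42949672960001) & MASK64) >> 32  # 42949672960001 = 10000 * 2**32 + 1
    return v


print(parse_eight_digits("12345678"))  # 12345678
```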

2024-09-01 c++ - What is the curiously recurring template pattern (CRTP)? - Stack Overflow { stackoverflow.com }

The Era of AI

2024-09-02 txtai { neuml.github.io }

txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.

2024-09-02 Solving the out-of-context chunk problem for RAG { d-star.ai }

Many of the problems developers face with RAG come down to this: Individual chunks don’t contain sufficient context to be properly used by the retrieval system or the LLM. This leads to the inability to answer seemingly simple questions and, more worryingly, hallucinations.

Examples of this problem

  • Chunks oftentimes refer to their subject via implicit references and pronouns. This causes them to not be retrieved when they should be, or to not be properly understood by the LLM.
  • Individual chunks oftentimes don’t contain the complete answer to a question. The answer may be scattered across a few adjacent chunks.
  • Adjacent chunks presented to the LLM out of order cause confusion and can lead to hallucinations.
  • Naive chunking can lead to text being split “mid-thought” leaving neither chunk with useful context.
  • Individual chunks oftentimes only make sense in the context of the entire section or document, and can be misleading when read on their own.
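
One common mitigation for several of these problems, offered here as an assumption rather than as the approach in the linked post, is to split on paragraph boundaries instead of mid-thought and to prefix every chunk with its document title and section heading, so a chunk that only says "it" still carries its subject. A minimal Python sketch; the header format, the function names, and the `max_words` budget are arbitrary choices for illustration.

```python
def _render_chunk(title: str, heading: str, paragraphs: list[str]) -> str:
    """Prefix the chunk text with its document title and section heading."""
    return f"Document: {title}\nSection: {heading}\n\n" + "\n\n".join(paragraphs)


def chunk_with_context(title: str, sections: dict[str, list[str]], max_words: int = 200) -> list[str]:
    """Split each section into paragraph-aligned chunks of roughly max_words words."""
    chunks: list[str] = []
    for heading, paragraphs in sections.items():
        current: list[str] = []
        count = 0
        for paragraph in paragraphs:
            words = len(paragraph.split())
            # Close the chunk at a paragraph boundary rather than mid-thought.
            # A single paragraph longer than max_words becomes its own chunk.
            if current and count + words > max_words:
                chunks.append(_render_chunk(title, heading, current))
                current, count = [], 0
            current.append(paragraph)
            count += words
        if current:
            chunks.append(_render_chunk(title, heading, current))
    return chunks


manual = {
    "Resetting the router": [
        "Hold the reset button for ten seconds.",
        "It will reboot and restore the factory settings.",
    ],
}
for chunk in chunk_with_context("Acme X1 Router Manual", manual, max_words=8):
    print(chunk, end="\n---\n")
```

With the tiny budget above, the second paragraph lands in its own chunk, but the header still tells the retriever and the LLM what "It" refers to.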

2024-08-30 MahmoudAshraf97/whisper-diarization: Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper { github.com }

2024-08-30 openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision { github.com }

2024-08-30 ggerganov/whisper.cpp: Port of OpenAI's Whisper model in C/C++ { github.com }

2024-09-01 microsoft/semantic-kernel: Integrate cutting-edge LLM technology quickly and easily into your apps { github.com }

2024-09-01 How to add genuinely useful AI to your webapp (not just chatbots) - Steve Sanderson - YouTube { www.youtube.com }

The talk presented here dives into the integration of AI within applications, particularly focusing on how developers, especially those familiar with .NET and web technologies, can leverage AI to enhance user experiences. Here are the key takeaways and approaches from the session:

Making Applications Intelligent: The speaker discusses various interpretations of making an app "intelligent." It’s not just about adding a chatbot. While chatbots can create impressive demos quickly, they may not necessarily be useful in production. For AI to be genuinely beneficial, it must save time, improve job performance, and be accurate. The speaker challenges developers to quantify these benefits rather than rely on assumptions.

"If you try to put it into production, are people going to actually use it? Well, maybe it depends... does this thing actually save people time and enable them to do their job better than they would have otherwise?"

Patterns of AI Integration: The speaker introduces several UI-level AI enhancements such as Smart Components. These are experiments allowing developers to add AI to the UI layer without needing to rebuild the entire app. An example given is a Smart Paste feature that allows users to paste large chunks of text, which AI then parses and fills out the corresponding fields in a form. This feature improves user efficiency by reducing the need for repetitive and mundane tasks.

Another example is the Smart ComboBox, which uses semantic search to match user input with relevant categories, even when the exact terms do not appear in the list. This feature is particularly useful in scenarios where users may not know the exact terminology.
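
The Smart ComboBox idea boils down to comparing embedding vectors. A minimal sketch, assuming the vectors have already been produced by some embedding model; the three-dimensional vectors below are made up for illustration (real embeddings have hundreds of dimensions), and `best_category` with its `min_score` cutoff is hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_category(query_vec, category_vecs, min_score=0.0):
    """Return (name, score) of the closest category, or None if it is below min_score."""
    name = max(category_vecs, key=lambda c: cosine_similarity(query_vec, category_vecs[c]))
    score = cosine_similarity(query_vec, category_vecs[name])
    return (name, score) if score >= min_score else None


# Made-up vectors standing in for real embeddings of each category label.
categories = {
    "Footwear": [0.9, 0.1, 0.0],
    "Outdoor gear": [0.3, 0.9, 0.1],
    "Kitchen": [0.0, 0.2, 0.9],
}
query = [0.8, 0.5, 0.0]  # pretend embedding of the user's text "hiking boots"
print(best_category(query, categories))
```

The cutoff lets the UI fall back to "no match" instead of always suggesting the nearest category, however distant it is.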

Deeper AI Integration: Moving beyond UI enhancements, the speaker explores deeper layers of AI integration within traditional web applications like e-commerce platforms. For instance, AI can be used for:

  • Semantic Search: Improve search functionality so that users don't need to know the exact phrasing.
  • Summarization: Automatically generate descriptive titles for support tickets to help staff quickly identify issues.
  • Classification: Automatically categorize support tickets to streamline workflows and save staff time.
  • Sentiment Analysis: Provide sentiment scores to help staff prioritize urgent issues.

"I think even in this very traditional web application, there's clearly lots of opportunity for AI to add a lot of genuine value that will help your staff actually be more productive."

Data and AI Integration: The talk also delves into the importance of data in AI applications. The speaker introduces Semantic Kernel, Microsoft's SDK for working with AI models (used here from .NET), and demonstrates how to generate sample data with large language models (LLMs) running locally on the development machine through Ollama. The process involves creating categories, products, and related data (like product manuals) in a structured manner.

Data Ingestion and Semantic Search: The speaker showcases how to ingest unstructured data, such as PDFs, and convert them into a format that AI can use for semantic search. Using the PDFPig library, the speaker demonstrates extracting text from PDFs, chunking it into smaller, meaningful fragments, and then embedding these chunks into a semantic space. This allows for efficient, relevant searches within the data, enhancing the AI’s ability to provide accurate information quickly.
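
The talk does this in .NET, with PDFPig for text extraction; the following Python sketch only models the chunking step, using fixed windows of words with an overlap so that text near a boundary appears in both neighbouring chunks. The function name and default sizes are assumptions, and many pipelines count tokens or split on sentences instead of words. Embedding and storing each chunk is not shown.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reached the end
            break
    return chunks


print(chunk_words("one two three four five six seven", size=4, overlap=1))
# ['one two three four', 'four five six seven']
```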

Implementing Inference with AI: As the talk progresses, the speaker moves on to implementing AI-based inference within a Blazor application. By integrating summarization directly into the workflow, the application can automatically generate summaries of customer interactions, helping support staff to quickly understand the context of a ticket without reading through the entire conversation history.

"I want to generate an updated summary for it... Generate a summary of the entire conversation log at that point."

Function Calling and RAG (Retrieval-Augmented Generation): The speaker discusses a more complex AI pattern, RAG, in which the AI model retrieves specific data to answer queries. While standard RAG implementations rely on specific AI platforms, the speaker demonstrates a custom approach that works across various models, including models run locally through Ollama. This approach involves checking whether the AI has enough context to answer a question and then retrieving relevant information if needed.
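
A minimal sketch of that check-then-retrieve loop that works with any text-in, text-out model: the model is asked either to answer or to request a search, and retrieved passages are appended to the context before asking again. The `SEARCH:`/`ANSWER:` reply protocol, the prompt wording, and the `ask_model`/`search` callables are all assumptions for illustration, not the talk's code.

```python
from typing import Callable


def answer_with_optional_retrieval(
    question: str,
    context: list[str],
    ask_model: Callable[[str], str],
    search: Callable[[str], list[str]],
    max_rounds: int = 2,
) -> str:
    """Ask the model; if it requests a search, add the results to the context and retry."""
    for _ in range(max_rounds):
        prompt = (
            "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}\n"
            "If the context is insufficient, reply exactly 'SEARCH: <query>'. "
            "Otherwise reply 'ANSWER: <answer>'."
        )
        reply = ask_model(prompt).strip()
        if reply.startswith("ANSWER:"):
            return reply[len("ANSWER:"):].strip()
        if reply.startswith("SEARCH:"):
            context = context + search(reply[len("SEARCH:"):].strip())
            continue
        return reply  # the model ignored the protocol; pass its text through
    return "I couldn't find enough information to answer that."


# Stand-ins for a real model and a real search index.
def fake_model(prompt: str) -> str:
    return "ANSWER: 30 days" if "30-day" in prompt else "SEARCH: return policy"


def fake_search(query: str) -> list[str]:
    return ["Returns are accepted within a 30-day window."]


print(answer_with_optional_retrieval("How long do I have to return an item?", [], fake_model, fake_search))
```

Capping the number of rounds keeps a model that keeps asking for more context from looping forever.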

Job interview / Algorithms

2024-09-01 Understanding B-Trees: The Data Structure Behind Modern Databases - YouTube { www.youtube.com }
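
The lookup path is the part of a B-tree that is easiest to show in a few lines. In this sketch (node layout hypothetical, insertion and node splitting omitted), each node stores sorted keys and, if it is internal, one more child than keys; a search does one binary search per level, which is why a shallow, wide B-tree needs few page reads.

```python
from __future__ import annotations

from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class BTreeNode:
    keys: list[int]                                          # sorted keys in this node
    children: list[BTreeNode] = field(default_factory=list)  # empty for a leaf


def btree_contains(root: BTreeNode, key: int) -> bool:
    """Walk from the root towards a leaf, visiting one node per level."""
    node = root
    while True:
        # bisect_left gives keys[i-1] < key <= keys[i].
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:  # reached a leaf without finding the key
            return False
        node = node.children[i]  # the only subtree that can hold the key


tree = BTreeNode(
    keys=[10, 20],
    children=[BTreeNode([1, 5]), BTreeNode([12, 15]), BTreeNode([25, 30])],
)
print(btree_contains(tree, 15), btree_contains(tree, 7))  # True False
```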

Edit Distance

2024-09-02 Needleman–Wunsch algorithm - Wikipedia { en.wikipedia.org }
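
Needleman–Wunsch fills the same kind of table as the Levenshtein pseudocode below, but maximizes an alignment score instead of minimizing an edit count. A score-only sketch (no traceback), with match +1, mismatch -1 and gap -1 as default weights; those values are a common textbook choice, not the only one.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score of a and b under a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


print(needleman_wunsch_score("GCATGCG", "GATTACA"))  # 0
```

With `match=0` and the other defaults, the result is exactly the negated Levenshtein distance, e.g. `needleman_wunsch_score("kitten", "sitting", match=0)` gives -3.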

2024-09-02 Levenshtein distance - Wikipedia { en.wikipedia.org }

function LevenshteinDistance(char s[1..m], char t[1..n]):
    // for all i and j, d[i,j] will hold the Levenshtein distance between
    // the first i characters of s and the first j characters of t
    declare int d[0..m, 0..n]

    set each element in d to zero

    // source prefixes can be transformed into empty string by
    // dropping all characters
    for i from 1 to m:
        d[i, 0] := i

    // target prefixes can be reached from empty source prefix
    // by inserting every character
    for j from 1 to n:
        d[0, j] := j

    for j from 1 to n:
        for i from 1 to m:
            if s[i] = t[j]:
                substitutionCost := 0
            else:
                substitutionCost := 1

            d[i, j] := minimum(d[i-1, j] + 1,                   // deletion
                               d[i, j-1] + 1,                   // insertion
                               d[i-1, j-1] + substitutionCost)  // substitution

    return d[m, n]
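
A direct Python translation of the pseudocode above, shifted to 0-based string indexing; the function name is just a choice for this sketch. It keeps the full (m+1) × (n+1) table; if only the distance is needed, keeping two rows cuts memory to O(n).

```python
def levenshtein_distance(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    m, n = len(s), len(t)
    # d[i][j] = distance between the first i chars of s and the first j chars of t
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i  # drop all i characters of the source prefix
    for j in range(1, n + 1):
        d[0][j] = j  # insert all j characters of the target prefix
    for j in range(1, n + 1):
        for i in range(1, m + 1):
            # s[i - 1] is the pseudocode's s[i] (Python strings are 0-indexed)
            substitution_cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,                      # deletion
                d[i][j - 1] + 1,                      # insertion
                d[i - 1][j - 1] + substitution_cost,  # substitution
            )
    return d[m][n]


print(levenshtein_distance("kitten", "sitting"))  # 3
```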