Michael is an exceptionally effective, clear-thinking product and platform engineer. He blends practicality with taste—fast to ship, careful with systems, and relentlessly user-focused. He leads with clarity, kindness, and high standards—teams move faster and feel calmer.

Skills & Competencies:

- Customer analytics: Pub/Sub, BigQuery, Metabase; Tinybird; Looker/Tableau
- AI/ML: custom models, CUDA, SageMaker, RAG, embeddings & vector DBs
- Platform/DevOps: multi-region (US/AU/UK/EU, GDPR DE), SRE, eventing
- Product engineering: SaaS, pricing & packaging, experimentation (A/B)
- Integrations: marketplaces, webhook/event architectures, Supabase portals
- Growth engineering: scaled from early traction to tens of millions in ARR
- Domains: structural engineering workflows (calcs.com), martech, CV/ads, data platforms

Full posts:

URL: /blog/build-as-fast-as-you-think
Title: Build as fast as you think
Date: 2025-12-28T00:00:00.000Z

---
title: "Build as fast as you think"
description: "How compounding speed, smarts and results keep Product Engineers in flow."
date: "2025-12-28"
---

We are at the frontier of the software craft. Soon the speed of thought will match the speed of shipping: you think it, and it exists in a branch for you to see, the system crunching existing embedded patterns, workflows, and paradigms. Huge swathes of job titles will simply go. The people in those roles would sit down with a clear vision, only to spend the next few hours fighting with setup scripts, syntax errors, and broken dependencies. Your brain was moving at light speed, but your fingers could only go so fast.

That gap is finally closing. Today, being a product engineer feels less like manual labor and more like being a puppet master. The loop from customer feedback to fix is instant, and everyone's happy, except those who don't adapt and get stuck in their grief cycle. The speed of shipping has finally caught up to the speed of thought.

## Beyond the syntax

We used to believe that you had to write every line of code yourself to truly understand a system.
There was a sense of pride in knowing where every bit lived and how it ran. But as the tools we use have become more capable, that belief has started to fade. Many of us have stopped reading most of the code our systems produce. We watch the stream of work happen in real time, and we step in only when something looks off.

This shift works because we have moved our focus from the lines of code to the results they produce. The living blob of a system will improve over time instead of being laboured over up front: if it works and feels right, ship it. If you can build a small tool, run it, and see that it works perfectly, do you really need to spend twenty minutes checking the logic? For most of what we build, the answer is no. We are learning to trust the output and focus our energy on the design and the user experience instead of the plumbing. We have transcended text and learned to trust the systems that spit out results.

## Parallel progress

My current workflow has changed to match this speed. Instead of grinding away at one task until it is perfect, I tend to work on one project locally while spinning up several ideas at once in the background. Each idea becomes a job for an agent. I don't wait for one thing to finish before starting the next. I might have three or four "vines" of thought growing at the same time. While I am tweaking the UI on the main project, an agent is elsewhere refactoring a data layer or building out a new CLI tool. By the time I finish my manual task, I have three ready-to-review pieces of work waiting for me. It keeps the momentum high and prevents that frustrating feeling of being stuck on a single problem.

## Backlog -> Review Queue

We've all had backlogs that sit and rot for months. In the past, managing a project meant staring at a long list of tickets in a tool like Linear. While those tools are still great for organization, they can sometimes feel like too much overhead when the work itself has become so fast.
These days, the goal is to keep the "work in progress" moving as quickly as possible. The biggest change is how we treat those old tickets. Gone are the days when a backlog just sits there. Now, you can take your entire backlog—even the ideas you haven't looked at in weeks—and run them all at once. As long as the ticket has a decent heading and a bit of context, you can dispatch them as jobs to background agents. You don't even need to stay in the room. You let the agents take a crack at "one-shotting" the task or at least suggesting the best path forward.

When you come back, you aren't looking at a list of things to do; you are looking at a list of pull requests to review and merge. Most of them will probably work. For the ones that don't, the agent's attempt usually gives you enough context or a "quirk" to help you reprompt and try again instantly. Take your backlog, give the agents a go at one-shotting it all, and report back on how you go.

## What stays human

If the machine is doing the heavy lifting, people wonder what is left for the engineer. The answer is the hard thinking. Most software is just moving data from one place to another, and the tools are great at that. But deciding how the data should be structured, and why the app needs to exist in the first place, is still a human job.

We are moving away from being people who type and toward being people who decide. Our value is no longer in our ability to remember a specific library or fix a bug in the dark. Our value is in our taste, our ability to see the big picture, and our drive to solve real problems for real people. We are finally free to build as fast as we can think.

## Anti-body reaction (natural pessimism)

This is a pretty hot topic at the moment and some purists aren't happy about it. "Oh, AI slop is everywhere, so I'm guaranteed a job for life!" is massive cope.
You're betting your career on the idea that AI will stay mediocre enough to constantly produce garbage needing human babysitting, all while ignoring the compounding exponentials of self-improvement these systems have. That's not ambition; that's praying for perpetual inefficiency so you can mop up the mess. It's choosing to clean the pie crumbs off the floor instead of baking bigger pies, or better yet, building the factories that churn out thousands of pies a day. "I'll be the human in the loop fixing hallucinations" is like a 1910 carriage repairman bragging he'll always have work because those new automobiles break down so often. Embrace the force multipliers as they improve and get over yourself.

## Appendix

A few things related to this topic for further investigation:

- Shipping at Inference-Speed: https://steipete.me/posts/2025/shipping-at-inference-speed
- Boris Cherny on Claude Code: https://x.com/bcherny/status/2004887829252317325
- Lee Robinson on Pixo: https://x.com/leerob/status/2005700621463330888

---

URL: /blog/async-home
Title: async home
Date: 2025-11-25T00:00:00.000Z

---
title: "async home"
description: "operate your home as you do your APIs - async supply"
date: "2025-11-25"
---

# Nail household procurement, logistics and JIT operations

Why are we spending our Saturday mornings dodging shopping carts and queuing at fuel pumps? For the last year, I've been treating my household logistics like a software engineering problem. I call it the "Async Pantry." It's the shift from "synchronous" errands—where you stop everything to go fetch supplies—to a background system where necessities arrive automatically. The goal isn't just to be "efficient." It's about reclaiming the mental bandwidth we waste on the question: "Do we have enough toilet paper?"

## The High Cost of "Synchronous" Home Operations

Think about the traditional grocery run.
It's a sensory nightmare: fluorescent lights, uninspiring space, and the "decision fatigue" of choosing between thirty brands of olive oil while you're hungry and tired. Studies show the average person spends hours every week on errands. But it's not just the time - it's the cognitive load. It's the constant, low-level hum of a "to-do" list running in the back of your mind. I realized that if I could automate the boring stuff, I could spend those hours on things that actually matter—like a long dinner or a side project—without the "errand hangover."

## Phase 1: The "Dull Stuff" Auto-Pilot

The foundation of an Async Pantry is delegating the non-perishables. Anything that doesn't rot should never require a special trip to the store. I use Subscribe & Save (Amazon) for the "Core Four":

- Paper & Plastic: Toilet paper, paper towels, trash bags.
- Laundry & Cleaning: Detergents, dish soap, sponges.
- Pantry Staples: Rice, pasta, oils, spices, and canned goods.

The trick: Set the frequency longer than you think you need. It's better to have a slightly light pantry than to be buried in boxes of dish soap. Most services give you a 5-15% discount for this, which usually offsets the "convenience fee" of delivery.

Where it doesn't really work is when Amazon is far more expensive than the shop down your street, or its quality is a bit off. We still go down there, but it's almost monthly, or on a last-minute basis if we have an idea and need something specific.

## Phase 2: Seasonal Freshness (The "Mystery Box" Strategy)

Automation usually fails when it comes to fresh produce because "automatic" often means "lower quality." The fix? Farm boxes. Instead of picking through sad, refrigerated tomatoes at the supermarket, I have a local farm box (like Farmers Pick) delivered weekly or fortnightly.

The benefit: You get what's in season. The mindset shift: You stop meal planning by looking at a screen - you plan based on what showed up on your porch.
It forces variety into your diet and supports local growers. Restrictions fuel creativity! It's mostly the good stuff anyway, so you can always keep your normal list, but sometimes new things pop up.

## Moving the Needle: Background Fueling

Once the pantry is async, you start noticing other "synchronous" leaks in your schedule: fuel, coffee beans (good ones), and probably a few of your own. Switching to an EV changed my relationship with time more than I expected. When you charge at home, "fueling" becomes a background task that happens while you sleep. It's the ultimate async move. No more stopping at a pump on a cold Tuesday morning because the light came on. You just wake up with a full "tank" every day.

## Why Bother? (The Reality Check)

This is about intentionality. When you offload the repetitive, low-value tasks of modern life to background systems, your RAM clears up. You're more present. You aren't "running out" for milk - you're already home, cooking with fresh ingredients that were delivered while you were at work.

A warning: It's not a "set it and forget it" miracle. You'll occasionally have too many onions or run out of trash bags a week early. But the transition from a life of "constant errands" to "background flow" is transformative.

Start with one item. Put your coffee or your laundry detergent on auto-ship today. Your future, more relaxed self will appreciate the extra hour of sleep.

---

URL: /blog/evals
Title: llm eval explorations
Date: 2025-10-07T00:00:00.000Z

---
title: "llm eval explorations"
description: "A lightweight setup for testing and improving LLM-based products."
date: "2025-10-07"
---

Most people overcomplicate model evaluation. If you're building with LLMs, you don't need a research lab. You just need a way to measure whether your system behaves as expected. This is the simple setup I use across my own products.
## What I Measure

I only track a few things that directly affect users:

- **Accuracy** – factual correctness
- **Context recall** – especially for RAG
- **Tone and safety** – is it something I'd ship
- **Latency and cost** – practical trade-offs

That's enough to catch drift and guide iteration.

## Tools

### DeepEval

[DeepEval](https://github.com/confident-ai/deepeval) is a Python library that works like Pytest for LLMs. It's quick to script and easy to drop into CI.

```python
from deepeval import assert_test
from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="Paris",
    retrieval_context=["France's capital is Paris"]
)

metric = FaithfulnessMetric(threshold=0.7)
assert_test(test_case, [metric])
```

Run these tests automatically before deploys and track regressions over time.

### Ragas

If your product uses retrieval, [Ragas](https://github.com/explodinggradients/ragas) adds metrics for context relevance and hallucination. It's useful for debugging when your model starts citing the wrong documents.

### Langfuse or LangSmith

For production, I log all prompts and outputs to [Langfuse](https://langfuse.com) (self-hosted) or [LangSmith](https://smith.langchain.com). This gives visibility into how the model performs on real traffic. It's tracing, not research.

## My Workflow

1. Collect recent user queries and responses.
2. Run DeepEval and Ragas locally.
3. Review low scores and fix prompts or retrieval.
4. Re-run tests until consistent.
5. Monitor production traces in Langfuse.

Simple, repeatable, and fast to maintain.

## Why This Works

* Easy to automate
* Works with any model
* Scales with your product
* Costs nothing to start

If you ever outgrow it, you can move to Arize or Humanloop. But for most builders, this stack is enough.
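The workflow above can be sketched as a tiny regression gate that needs no framework at all. This is a minimal sketch, not the author's actual setup: `contains_facts` is a toy judge standing in for a real scorer such as DeepEval's `FaithfulnessMetric`, and the 0.7 threshold mirrors the example earlier in the post.

```python
# Minimal regression gate: score each case, fail the run if the
# average drops below a threshold. `contains_facts` is a toy judge
# (an assumption), not a real LLM metric.

def contains_facts(output: str, facts: list[str]) -> float:
    """Fraction of expected facts that appear in the model output."""
    if not facts:
        return 1.0
    hits = sum(1 for f in facts if f.lower() in output.lower())
    return hits / len(facts)

def run_gate(cases: list[dict], threshold: float = 0.7) -> bool:
    """Return True if the average score clears the threshold."""
    scores = [contains_facts(c["actual_output"], c["expected_facts"])
              for c in cases]
    return sum(scores) / len(scores) >= threshold

cases = [
    {"actual_output": "Paris is the capital of France.",
     "expected_facts": ["Paris"]},
    {"actual_output": "The Eiffel Tower is in Lyon.",
     "expected_facts": ["Paris"]},
]
print(run_gate(cases))  # 1 of 2 cases passes, avg 0.5 < 0.7, prints False
```

The point is the shape, not the judge: swap `contains_facts` for a real metric and wire `run_gate` into CI so a failing average blocks the deploy.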
---

URL: /blog/always-send-events
Title: Always Send Events
Date: 2025-10-05T00:00:00.000Z

---
title: "Always Send Events"
description: "Why every server should emit rich events, even if you don't use them directly yet."
date: "2025-10-05"
---

When something happens in your system - an order placed, a user invited, an entity archived or updated — emit an event. Always. It doesn't matter if you think you'll use it later or not. This is complementary to having an "audit log" db table.

## Send Everything to Pub/Sub

Push all events into a message bus like Pub/Sub. (My preference for DX, native BigQuery destinations and getting the job done is GCP Pub/Sub.) Give each one as much contextual data as makes sense. Don't overthink schema perfection early. It's better to have a rich event history than a minimal one that needs patching later. Every event should describe what happened, when, and with what metadata. Then let it flow downstream.

I usually fan these events into **BigQuery**. It becomes the historical ledger of the system. Even if you never query it, the data accumulates quietly. One day, someone will want to know how many users upgraded within 30 seconds of a push, or how long a device stayed offline. If you send events, the answer will already exist.

## Why It Matters

1. **Analytics** You get instant access to time series data across everything. No extra tracking layer needed.
2. **Extensibility** Any teammate can build a new service off the event stream. Want to send a webhook when a payment fails? Just subscribe.
3. **Side Effects as Code** Internal services can react asynchronously — log, update, notify — without coupling.
4. **Future-Proofing** You can build real-time dashboards, train models, or detect anomalies later, all from the same stream.

You have ALL historical time series data from day 1, and it's basically free since events and BigQuery storage are dirt cheap.
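As a hedged sketch of the emit step, here is what building the event envelope might look like before it hits the bus. The envelope fields mirror the shape described below; the topic and the `publisher.publish` call via the `google-cloud-pubsub` client are assumptions, so the publishing line is left as a comment.

```python
import json
from datetime import datetime, timezone

def build_event(event_name: str, event_data: dict, event_context: dict) -> bytes:
    """Wrap a domain event in a consistent envelope before publishing."""
    envelope = {
        "event_name": event_name,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_data": event_data,        # unique attributes for this event
        "event_context": event_context,  # consistent best-effort metadata
    }
    return json.dumps(envelope).encode("utf-8")

# Hypothetical event: field names here are illustrative, not a schema.
payload = build_event(
    "user_invited",
    {"invited_user_id": "u_123", "role": "editor"},
    {"actor_user_id": "u_042", "ip": "203.0.113.7"},
)
# With the google-cloud-pubsub client (assumed), publishing would be:
# publisher.publish(topic_path, payload)
```

Keeping the envelope construction in one helper is what makes the "same best-effort shape for all events" rule cheap to enforce.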
## Setups that have worked for me

- Application emits JSON events to Pub/Sub
- Pub/Sub → BigQuery via a subscription that always lands in BQ
- Optional consumers subscribe to topics for side effects

No complex pipelines. Just events flowing downstream, waiting to be useful.

## What Makes a Good Event

```json
{
  "event_name": "user_invited",
  "timestamp": "2025-01-01T00:00:00Z",
  "event_data": {
    // whatever data you want to store for that event
    // unique attributes for that event
    // you probably don't need to store the previous state; just what "changed" matters. In the data you can compare each change.
  },
  "event_context": {
    // any metadata/context you have
    // e.g. user agents, IP, user ID, timestamps, latency, whatever
    // this should be reasonably consistent and ALWAYS the same best-effort shape for all events no matter what
  }
}
```

You don't need to know all use cases up front. Events are an investment in optionality.

## The Mindset

Treat your system as a stream of facts. Every event is a record of something that happened. Once you start emitting them, you stop worrying about what data you forgot to collect. Don't overthink whether you are "event sourcing" or "event driven" - you can do both at the same time. Just send events.

---

URL: /blog/best-of-breed-vendors
Title: Best-of-breed vendor stack
Date: 2025-09-15T00:00:00.000Z

---
title: Best-of-breed vendor stack
description: Opinionated picks for startup tooling with quick links.
date: 2025-09-15
---

Here's a current list of vendors I think are doing well.

## Startups who are accelerating

**support** -- [@plainsupport](https://x.com/plainsupport) (sync and push comms to edges - Teams/Slack/Discourse - is next-level good)

**docs** -- [@mintlify](https://x.com/mintlify) (suddenly made all software documentation 10x better)

**customer analytics** -- [@tinybirdco](https://x.com/tinybirdco) (IaC for data work; it's fast, with an excellent local experience)
**auth + subscriptions** -- [@ClerkDev](https://x.com/ClerkDev)

**analytics & product insights, gating experiments** -- [@statsig](https://x.com/statsig) (runner-up PostHog, but Statsig is much more precise on hardcore statistics)

**secrets** -- [@doppler](https://x.com/doppler) (excellent syncing experience)

**mail etc.** -- [@resend](https://x.com/resend) (local email experience)

**incidents, ops** -- [@rootlyhq](https://x.com/rootlyhq) (Slack-driven workflows are top tier)

**ai agents / code** -- [@cursor_ai](https://x.com/cursor_ai) (still winning)

**ticket/agent mgmt + triage etc.** -- [@linear](https://x.com/linear) (elite quality)

**hosting/deploys** -- [@Railway](https://x.com/Railway) / [@vercel](https://x.com/vercel) (out-of-your-face infra to stay focussed on customers & product)

**integrations marketplace** -- [@useparagon](https://x.com/useparagon) (IaC for integrations, event-driven and solid management; even an OOTB workflow-builder canvas)

## Incumbents (who can't seem to be beaten)

**search** -- Algolia

**data warehouse** -- BigQuery (gsheets-native integration can't be beaten)

**payments** -- Stripe (too big to beat)

**internal docs** -- [@NotionHQ](https://x.com/NotionHQ) (sort of incumbent, but cool)

**all ops** -- [@datadoghq](https://x.com/datadoghq) (so feature-dense it can't be beaten, esp. for speed of getting started)

**code / cicd / automations** -- [@github](https://x.com/github) (native GHA means nothing else needs to be used)

---

URL: /blog/best-llm-ai-ecosystem-stack
Title: Best LLM/AI ecosystem stack
Date: 2025-09-14T00:00:00.000Z

---
title: Best LLM/AI ecosystem stack
description: A living guide to building with LLMs—tools, patterns, and tradeoffs.
date: 2025-09-14
---

The AI tooling & LLM ecosystem moves quickly. As always, see x.com/the_mewc for the latest commentary.
## General

- Cursor with MCP + background agents for long-running tasks
- Cursor CLI and Claude Code CLI are good contenders for CI; OAI are catching up
- Replit, Lovable, v0 and the like are early toys but aren't ready for serious work
- builder.io is promising for marketing sites

## Tooling

- Firecrawl for structured web extraction and crawl orchestration
- AI SDK / OpenAI / OpenRouter clients with retries, timeouts, backoff
- Unify observability: request/response traces, redaction, prompt versioning

## Retrieval & data pipelines

- ETL: Postgres/ClickHouse as the ground truth; CDC for freshness
- Embedding stores with predictable chunking and TTLs; hybrid search where needed
- Permissions in the retrieval layer, not in prompts

## Orchestration & agents

- Task graphs > monolithic agents; isolate tools; sandbox risky actions
- Background workers with idempotency keys; retries with dead-letter queues
- Human-in-the-loop for risky or user-facing actions

## Testing, evals & observability

- Golden sets; regression gates in CI; online vs offline evals split
- Diff prompts and models; measure cost, error rate, and outcome quality
- Capture traces; scrub PII; keep reproduction kits for incidents

## Deployment & ops

- Multi-region routing and failover; rate limits and circuit breakers
- Secrets via KMS; key rotation; request signing and audit trails
- Model routing by use case and budget; fallbacks for degraded modes

## Safety, governance & compliance

- Content filtering; jailbreak tests; red-teaming as a practice
- Tenant isolation; retention windows; GDPR/region pinning
- Provenance for prompts and retrieved data

---

I'll keep this post fresh with concrete vendor picks and example configs.

> 📝 **Snapshot vs narrative**
> The homepage section is a quick reference. This post carries the fuller reasoning and tradeoffs.

> 💡 **Defaults**
> Start simple. Add complexity only when the bottleneck is proven by traces and cost reports.
> ⚠️ **Risky actions**
> Sandbox tools that can change state, and require human review where appropriate.

### Example code block

```ts
type Vendor = "OpenAI" | "OpenRouter" | "Anthropic";

const chooseVendor = (useCase: string): Vendor => {
  if (useCase === "evaluation") return "OpenAI";
  if (useCase === "cost-sensitive") return "OpenRouter";
  return "Anthropic";
};
```

---

Summary: /llms.txt