One of the features we're most excited about in Postiller is serendipitous discovery—surfacing unexpected connections between bookmarks that might spark new ideas. You save an article about team dynamics, another about product strategy, and months later the app shows you a thread you hadn't noticed.
That's the vision. The reality has been more complicated.
The Embedding Problem We Didn't Expect
Our initial approach used embeddings to find similar content. Embeddings are numerical vectors that represent meaning: content with similar embeddings should be about similar topics. Elegant in theory.
In practice, we kept seeing strange clusters. Articles about completely different subjects grouping together: AI ethics alongside meal planning, architecture next to leadership advice. After digging into the data, we found the common thread: they all used problem-solution narrative framing.
The embedding model was detecting writing style, not subject matter.
This appears to be a known limitation, though we hadn't anticipated how much it would affect discovery. Embeddings capture semantic patterns broadly—including rhetorical structure, tone, and framing. For search, where you query with specific terms, this works fine. For open-ended discovery where you want unexpected topic connections, it becomes noise.
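To make the failure mode concrete, here's a minimal Python sketch of cosine similarity, the standard way embedding closeness is scored. (Postiller isn't written in Python, and the vectors below are made up for illustration.) If a few dimensions encoding rhetorical style dominate, two articles on unrelated subjects still score as near-identical:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors: pretend dimensions 0-1 encode "problem-solution framing"
# and dimensions 2-3 encode subject matter.
ai_ethics     = [0.9, 0.8, 0.2, 0.1]  # strong framing signal, topic A
meal_planning = [0.9, 0.7, 0.1, 0.3]  # strong framing signal, topic B

# The shared rhetorical structure dominates the score (~0.98),
# even though the topic dimensions barely overlap.
print(cosine_similarity(ai_ethics, meal_planning))
```

Real embedding models have hundreds of dimensions and no such clean separation, but the dynamic is the same: style can carry most of the weight.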
An Experiment: Explicit Topic Extraction
We're now testing a different approach: instead of relying on implicit semantic similarity, we're extracting explicit topic tags from each content chunk.
The hypothesis is straightforward. Rather than asking "what does this content feel like semantically?" we ask "what is this content actually about?" and get concrete answers: "customer retention," "async communication," "pricing psychology."
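With explicit tags, matching can be as simple as set overlap. This is a hedged sketch, not our actual matching logic (which is still in flux), and the tags are hypothetical:

```python
def tag_overlap(tags_a: set[str], tags_b: set[str]) -> float:
    """Jaccard similarity: shared tags divided by total distinct tags."""
    if not tags_a or not tags_b:
        return 0.0
    return len(tags_a & tags_b) / len(tags_a | tags_b)

chunk_a = {"customer retention", "pricing psychology"}
chunk_b = {"pricing psychology", "async communication"}

# One shared tag out of three distinct tags.
print(tag_overlap(chunk_a, chunk_b))
```

The appeal is interpretability: when two chunks match, you can see exactly which topics they share, instead of reverse-engineering a similarity score.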
We're using Apple's on-device language model for extraction. This keeps everything private—no cloud calls for this processing—but it also means we're constrained by what the on-device model can do. Early results are promising, though we're still tuning the prompts and evaluating output quality.
Chunk Size as a Variable
One unexpected finding: chunk size matters more than we thought.
Our original 800-character chunks often blended multiple ideas together. A single chunk might discuss three different points, producing topic tags that were a muddy average of all of them. We're experimenting with 400-character chunks—more focused, cleaner signals, but also more processing overhead.
The tradeoff is still being evaluated. Smaller chunks mean more extraction calls, which means longer initial processing time. We're testing background processing approaches to make this invisible, but it's not clear yet whether the quality improvement justifies the cost.
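For the curious, greedy sentence-packing is one simple way to build chunks around a character budget. This Python sketch is illustrative rather than our production chunker, and the 400-character cap is just the value we're currently testing:

```python
import re

def chunk_text(text: str, max_chars: int = 400) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars becomes its own
    oversized chunk rather than being split mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Halving `max_chars` roughly doubles the number of chunks, and therefore the number of extraction calls, which is where the processing-time cost comes from.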
Focused Generation: A New Direction
One thing we're exploring is passing specific chunks—not whole bookmarks—to generation. The idea: if a discovery card surfaces a connection between two specific excerpts, why send both 3,000-word articles in full to the LLM? The relevant content might be one paragraph from each.
This is still experimental. We need to understand whether focused excerpts produce better output or whether the surrounding context from full articles actually helps. User testing will tell us.
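As a sketch of the idea: focused generation just means the prompt carries the two connected excerpts rather than both full articles. The function name, prompt wording, and character cap below are all hypothetical:

```python
def build_focused_prompt(excerpt_a: str, excerpt_b: str,
                         max_chars: int = 1200) -> str:
    """Assemble a generation prompt from the two connected excerpts
    only, instead of the full source articles."""
    return (
        "These two excerpts were flagged as related. "
        "Explain the connection in one or two sentences.\n\n"
        f"Excerpt A:\n{excerpt_a[:max_chars]}\n\n"
        f"Excerpt B:\n{excerpt_b[:max_chars]}\n"
    )
```

Beyond token savings, a smaller prompt matters on-device, where context windows are tighter than in cloud models. The open question is whether stripping the surrounding article context hurts output quality.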
What We're Learning
A few takeaways so far:
Embeddings are powerful but limited. They excel at finding semantic similarity, but "semantic similarity" includes things like writing style and rhetorical structure. For some use cases, that's exactly what you want. For topic-based discovery, it can be misleading.
On-device AI enables privacy-first experimentation. Being able to run extraction and matching entirely on-device means we can iterate quickly without privacy concerns. The capability constraints are real, but so is the freedom to experiment.
The right abstraction level is unclear. Should discovery operate on whole documents, chunks, or something else? We don't know yet. The answer probably depends on the type of content and the user's goals.
Where This Is Heading
If you're in the beta and have opinions about what makes a discovery feel useful—or frustrating—we'd love to hear about it. This is exactly the kind of feature where user feedback shapes the direction.
More updates as we learn. If you want to follow along or participate in testing, reach out.