Reading across books with Claude Code
I'm a Literature major and avid reader, but projects like this are still incredibly exciting to me. I salivate at the thought of new kinds of literary analysis that AI is going to open up.
It kills the author's tone, pace, and expression. It is pretty much the same as an assistant summarizing the whole book for you, if that's what you want. It misses the entire experience the author delivers.
The cost of indexing with third-party APIs is extremely high, however. Might this work out better with an open-source model and a cluster of Raspberry Pis for indexing a large library?
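For a sense of what local indexing could look like, here's a minimal sketch assuming sentence-transformers and the small open-source all-MiniLM-L6-v2 embedding model (my choices, not the article's); the chunk sizes are guesses, and on a Pi cluster you'd shard the chunk list across nodes while the per-node work stays the same:

    from sentence_transformers import SentenceTransformer
    import numpy as np

    # Small open-source model; assumed light enough for a Raspberry Pi 5.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
        # Split a book into overlapping character chunks (sizes are arbitrary).
        chunks, start = [], 0
        while start < len(text):
            chunks.append(text[start:start + size])
            start += size - overlap
        return chunks

    def index_book(text: str) -> np.ndarray:
        # One embedding per chunk; persist these once instead of paying an API per run.
        return model.encode(chunk_text(text), batch_size=8, show_progress_bar=False)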
I'd just reiterate two general points of critique:
1. The point of establishing connections between texts is semantic, and terms can have vastly different meanings depending on the sphere of discourse in which they occur. Because of the way LLMs work, the really novel connections probably won't be found by one: an LLM functions, quite literally, by surfacing what isn't novel.
2. Part of the point of making these connections is the effect the process has on the human making them. Handing it all off to an LLM is no better than blindly trusting authority figures. If you want to use LLMs as generators of possible starting points, or of leads to verify and research yourself, that seems totally fine.
It's the usual jargon soup. Publish a vetted paper with repeatable steps instead of hyped-up garbage about a supposed 100x productivity bomb.
And his best result is mechanical findings from wherever the LLM got the highest correlations between its vectors. Bravo: there's always going to be a top item in any ordered list, but that doesn't automatically make it interesting. Reading literature is about witnessing the journey the characters take. Reading technical material is about memorizing enough of it. In both cases the material has to go through a brain. I find it idiotic to assign any value to outputs like "Oh, King Lear's X is highly correlated with Antigone's Y."
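To make the "top item in any ordered list" point concrete, here's a toy sketch (not the article's pipeline) that ranks chunk pairs by cosine similarity over random stand-in embeddings; a "best match" pops out regardless of whether it means anything:

    import numpy as np

    # Random stand-in embeddings -- no real model, no real text.
    rng = np.random.default_rng(0)
    lear = rng.normal(size=(50, 384))      # pretend: 50 King Lear chunks
    antigone = rng.normal(size=(40, 384))  # pretend: 40 Antigone chunks

    def cosine_matrix(a, b):
        # Normalize rows, then the dot product is cosine similarity.
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
        return a @ b.T

    sims = cosine_matrix(lear, antigone)
    i, j = np.unravel_index(np.argmax(sims), sims.shape)
    print(f"Top pair: Lear chunk {i} vs Antigone chunk {j}, similarity {sims[i, j]:.3f}")
    # Even pure noise yields a confident-looking "best match".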
In your example, you're doing the inverse (give me a lot of text based on a little), and that's where LLMs have no problem hallucinating new information.
EDIT: Whoops, I found more details at the very end of the article.
Still experimental and way outside my expertise; I'd love to hear from anyone with ideas or experience with this kind of problem.