Reading across books with Claude Code
I'm a Literature major and avid reader, but projects like this are still incredibly exciting to me. I salivate at the thought of new kinds of literary analysis that AI is going to open up.
It kills the author's tone, pace, and expression. It is pretty much the same as an assistant summarizing the whole book for you, if that's what you want. It misses the entire experience the author delivers.
The cost of indexing with third-party APIs is extremely high, however. Might this work out better with an open-source model and a cluster of Raspberry Pis for indexing a large library?
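For a sense of what local indexing could look like, here's a minimal sketch assuming sentence-transformers and the small open-source all-MiniLM-L6-v2 embedding model (my choices, not the article's); the chunk sizes are guesses, and on a Pi cluster you'd shard the chunk list across nodes while the per-node work stays the same:

    from sentence_transformers import SentenceTransformer
    import numpy as np

    # Small open-source model; assumed light enough for a Raspberry Pi 5.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
        # Split a book into overlapping character chunks (sizes are arbitrary).
        chunks, start = [], 0
        while start < len(text):
            chunks.append(text[start:start + size])
            start += size - overlap
        return chunks

    def index_book(text: str) -> np.ndarray:
        # One embedding per chunk; persist these once instead of paying an API per run.
        return model.encode(chunk_text(text), batch_size=8, show_progress_bar=False)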
I'd just reiterate two general points of critique:
1. The point of establishing connections between texts is semantic, and terms can have vastly different meanings depending on the sphere of discourse in which they occur. Because of the way LLMs work, the really novel connections probably won't be found by one: an LLM functions, quite literally, by surfacing what isn't novel.
2. Part of the point of making these connections is the effect the process has on the human making them. Handing it all off to an LLM is no better than blindly trusting authority figures. If you want to use LLMs as generators of possible starting points, or of leads to verify and research yourself, that seems totally fine.
It's the usual jargon soup. Publish a vetted paper with repeatable steps instead of hyped-up garbage about a supposed 100x productivity bomb.
And his best result is mechanical findings from wherever the LLM got the highest correlations between its vectors. Bravo: there's always going to be a top item in any ordered list, but that doesn't automatically make it interesting. Reading literature is about witnessing the journey the characters take. Reading technical material is about memorizing enough of it. In both cases the material has to go through a brain. I find it idiotic to assign any value to outputs like "Oh, King Lear's X is highly correlated with Antigone's Y."
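To make the "top item in any ordered list" point concrete, here's a toy sketch (not the article's pipeline) that ranks chunk pairs by cosine similarity over random stand-in embeddings; a "best match" pops out regardless of whether it means anything:

    import numpy as np

    # Random stand-in embeddings -- no real model, no real text.
    rng = np.random.default_rng(0)
    lear = rng.normal(size=(50, 384))      # pretend: 50 King Lear chunks
    antigone = rng.normal(size=(40, 384))  # pretend: 40 Antigone chunks

    def cosine_matrix(a, b):
        # Normalize rows, then the dot product is cosine similarity.
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
        return a @ b.T

    sims = cosine_matrix(lear, antigone)
    i, j = np.unravel_index(np.argmax(sims), sims.shape)
    print(f"Top pair: Lear chunk {i} vs Antigone chunk {j}, similarity {sims[i, j]:.3f}")
    # Even pure noise yields a confident-looking "best match".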
In your example, you're doing the inverse (give me a lot of text based on a little), and that's where LLMs have no problem hallucinating new information.
EDIT: Whoops, I found more details at the very end of the article.
Still experimental and way outside my expertise; I'd love to hear from anyone with ideas or experience with this kind of problem.