Contrariwise the Wizardly

A Personal Interest Rubric


Have you ever lost something and then spent the next ten years trying to find it?

I have. It was an essay about the fall of the Roman Empire. Did the average citizen know the empire had fallen?

No. It took them about 200 years to figure it out, but you’re just going to have to trust me, because all the king’s horses and all the king’s men haven’t been able to find that paper again.

All of this is backstory

Back in 2020 I got way into the idea of Personal Knowledge Management (PKM) and went about setting up my first “PKM Stack,” or set of tools for solving the problem: how do I stop losing all the interesting things I read?

The first version of this stack used Instapaper as my read-it-later app, plus IFTTT, Dropbox, and Obsidian. The workflow sucked. But the basic idea was: I’d read something; if it had any highlights, IFTTT would put them in a markdown file in Dropbox; and eventually I’d manually drag that file over to Obsidian.

When AI first started taking off, Instapaper used it as an opportunity to raise their rates. Not having been CRAZY happy with that solution, and not seeing any value in the then-state of AI, I used it as a chance to rethink, and ended up hopping to Readwise Reader.

Instapaper to Reader

When I migrated from Instapaper to Reader it was, by necessity, an imperfect migration. Instapaper’s export included a link to the original document and the parts I had highlighted, but not the document text itself. Reader had to go out and fetch the document bodies again, and in a handful of cases it failed, either because it couldn’t parse the page or because the document was gone.

I also had the habit of archiving everything in the hope I would be able to find it later if I needed. This comes from deep trauma: I’m still looking for a paper I read a decade ago about whether the average imperial citizen knew the Roman Empire had fallen.

The result was a large backlog of low-value things that I had read and saved just in case, that I was unlikely to ever look at again, or find any value in if I did.

Reader to the Future

Reader is fine for, well, reading, but it’s got usability gaps for note-taking and long-term storage, especially since my primary modality for using it is an iPad mini on a flight somewhere. Most of my life runs out of Obsidian; why not this?

After thinking about it I decided that I want every document that I “archive” in Reader to go through an enrichment pipeline that summarizes it, extracts any highlights, extracts any key ideas or topics, and puts a markdown file in the right part of my Obsidian vault with all that information and a link back to the Reader copy, so that I can reference it if needed.
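As a sketch of what that enrichment output could look like, here’s a hypothetical note builder. The field names and note layout are my assumptions, not Reader’s actual export schema or my final format:

```python
def build_note(doc: dict) -> str:
    """Render an enriched document as an Obsidian-ready markdown note.

    Field names ("summary", "topics", "reader_url", etc.) are
    illustrative placeholders, not a real export schema.
    """
    highlights = "\n".join(f"- {h}" for h in doc.get("highlights", [])) or "- (none)"
    topics = ", ".join(doc.get("topics", []))
    return (
        f"# {doc['title']}\n\n"
        f"**Summary:** {doc['summary']}\n\n"
        f"**Topics:** {topics}\n\n"
        f"## Highlights\n{highlights}\n\n"
        f"[Open in Reader]({doc['reader_url']})\n"
    )

note = build_note({
    "title": "The Fall of Rome",
    "summary": "Did the average citizen notice?",
    "topics": ["history", "epistemology"],
    "highlights": ["It took about 200 years."],
    "reader_url": "https://read.readwise.io/read/example",
})
```

The link back to the Reader copy is the important part: the vault note becomes the searchable index entry, and Reader stays the archive of record.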

To make it maximally useful, I needed to do this for everything in the archive, and to do that I needed to get rid of all the low-value documents hanging around in there.

How to Build A Rubric And Learn to Love the Atomic Bomb

Going through the entire archive was going to require a system that could look at everything and reliably decide what I should or shouldn’t keep. To do that, I needed to be able to describe what I would want to keep. I needed an interest rubric.

Have you ever tried to describe yourself? Like really describe yourself? Maybe this is easier for other people but man I struggled. I came up with I think about four core interests, but I knew that wasn’t accurate.

My next attempt was to open a conversation with Claude, explain to it what I was trying to do, and ask it to interview me. This was... medium successful? It ended up being a much better place to start from, but was not in and of itself nearly comprehensive. It still relied on the things I could think of to prompt it with.

Finally I thought, okay. Well. I’ve got this backlog of things I’ve read. I know it’s got some stuff I want to get rid of, but it’s also filled with lots of stuff I’d want to keep. Can I use that as the basis for a rubric?

Bingo.

The next problem was purely technical. Over 700 documents, many running into the tens of thousands of words. There was going to be no way to put them all into Claude’s context at once. I thought about it and figured “good enough” might be “good enough” at scale: I had Claude write a series of Python scripts that would randomly sample about half of the documents and post them to an OpenAI batch job with the goal of summarizing and extracting key insights.
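The sampling-and-batching step can be sketched roughly like this. The JSONL line shape matches OpenAI’s batch request format, but the file path, model name, and prompt here are placeholders, not the actual scripts:

```python
import json
import random

def build_batch_file(docs: list[dict], out_path: str, sample_frac: float = 0.5) -> int:
    """Randomly sample a fraction of documents and write one OpenAI
    batch request per line, in the JSONL shape the Batch API expects.
    Returns the number of requests written."""
    sampled = random.sample(docs, k=int(len(docs) * sample_frac))
    with open(out_path, "w") as f:
        for doc in sampled:
            request = {
                "custom_id": doc["id"],  # lets you match results back to documents
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": "gpt-4o-mini",  # model choice is an assumption
                    "messages": [
                        {
                            "role": "user",
                            "content": "Summarize this document and extract key insights:\n\n"
                                       + doc["extracted_content"][:15000],
                        },
                    ],
                },
            }
            f.write(json.dumps(request) + "\n")
    return len(sampled)
```

The resulting file then gets uploaded with `purpose="batch"` and submitted to the batches endpoint; results come back asynchronously, keyed by `custom_id`.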

I’m not certain if this was the absolute final version of the prompt, but it looked a lot like this:

Your task is to analyze this document and create a comprehensive summary that would allow someone to understand its core value without reading the full text.

Document Title: {doc['title']}
Author: {doc.get('author', 'Unknown')}
URL: {doc.get('url', 'Not provided')}

Content: {doc['extracted_content'][:15000]} # Truncate very long content

Please provide:

  1. A concise but information-dense summary (2-3 paragraphs) that captures the key ideas, arguments, and context of the document.

  2. 3-5 key insights or takeaways from the document that represent its most valuable content.

  3. A brief assessment of why this content might be valuable to someone with interests in rationality, economics, science fiction, fantasy, and tabletop role-playing games. If the content seems unrelated to these areas, explain what type of person might find it valuable instead.

Format your response as structured JSON with the following keys: "summary", "key_insights" (as an array), and "value_assessment".

I think of “summarize a bunch of text” as something Large Language Models (LLMs, such as Claude or ChatGPT) are good at. Perfect use case.

My next step was to take the first markdown file, along with the basis of my rubric from the interviews, and attach them to a conversation with Claude 3.7 Sonnet with Extended Thinking. I typed out like four paragraphs of stream-of-consciousness about what I was trying to do and asked it, in so many words: “based on all these summaries, what interests do you think I have that you don’t see on the rubric?”

Thus began a long dialogue. It would propose, based on what it saw, what it thought I liked. I would correct, confirm, cajole, and provide insight. Why yes, I am deeply interested in Philosophy. Actually no I’m not that interested in international finance, that one article happened to be really interesting.

Claude:

Governance and Legal Systems: Your reading suggests deep interest in specific governance mechanisms and legal frameworks beyond general institutional analysis. How central is this comparative approach to governance in your thinking?

Me:

I think I'm interested in how the world works, and how it could be made to work better. How do we build a better world, or at least minimize the chances of a worse world. I don't think I'm specifically interested in governance and legal systems outside of that context.

Claude:

There's a recurring pattern of interest in Philosophical Frameworks for analyzing complex issues. Would you consider this a hidden interest, or do you see it as simply an aspect of Rationality?

Me:

I am definitely interested in philosophy and philosophical frameworks. If I didn't need to "work for a living," I would strongly consider going back to school for a PhD in Philosophy. I think all of life's interesting questions are in the world of philosophy, and that it's a shame there's no way to make money doing it.

At the end, I’d ask it to spit out an updated version of the rubric, which I would take to a new conversation with Claude, the next markdown file, and repeat the process. After the second round I asked it to provide me with a prompt that captured everything I was trying to do, which I used for all subsequent iterations of the conversation.

The output of all these rounds was a several-kilobyte markdown file which accurately captures some, but not all, of my various interests. Not all my interests are captured in the things I read, and a random sampling was always bound to miss something. Nevertheless it’s good for what it got!

If you think it’d be interesting to see the final result (either to see what we built, or to get a better sense of who I am as a person), I’ll attach a lightly edited version at the end.

Actually Doing the Thing, for Exceptionally Large Values of Thing

Originally I was going to take my rubric and repurpose those scripts to have ChatGPT do the review, but I ran into a technical issue and had to pivot to Anthropic for this part: the ChatGPT batch API is convinced that I have batches in progress even when I don’t, and won’t let me submit any more.

Anthropic ended up working out well because their API will let you set a “system” prompt that is distinct from the “user” prompt. The entire rubric and guidance on the return format went into the system prompt, and the user prompt ended up being the document data.
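That split looks roughly like the following in the Anthropic Message Batches shape, where `system` is a top-level parameter rather than a message. The rubric text, model name, and field names here are placeholders:

```python
import json

def build_review_request(doc: dict, rubric: str) -> dict:
    """Build one Anthropic message-batch entry. The rubric plus
    return-format guidance goes in the top-level "system" field;
    the document data is the sole user message."""
    return {
        "custom_id": doc["id"],
        "params": {
            "model": "claude-3-7-sonnet-latest",  # model name is an assumption
            "max_tokens": 1024,
            "system": rubric,  # entire rubric + output-format guidance lives here
            "messages": [
                {
                    "role": "user",
                    "content": json.dumps({
                        "title": doc["title"],
                        "content": doc["extracted_content"][:15000],
                    }),
                },
            ],
        },
    }
```

Keeping the rubric in the system prompt means every document gets judged against identical instructions, and the per-document payload stays small.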

The first run, everything scored too high. Every document was a keeper, even ones I knew I didn’t want to keep.

I adjusted the rubric slightly, and it got a little better.

Finally, I added a point deduction metric for certain topics I knew I didn’t like, and that hit the sweet spot. The final run resulted in a markdown file reviewing everything in my Reader backlog that didn’t have highlights, and suggested around 83 deletions.
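The keep/delete decision itself is simple once the model returns a score. In my case the deductions were expressed in the rubric prompt, but the effect is equivalent to something like this post-processing sketch (topic names, weights, and threshold are all made up):

```python
# Hypothetical disliked topics and their point penalties.
DEDUCTIONS = {"celebrity gossip": 3, "stock tips": 2}
KEEP_THRESHOLD = 5  # hypothetical cutoff for keeping a document

def should_keep(score: int, topics: list[str]) -> bool:
    """Apply per-topic point deductions to the model's rubric score,
    then keep anything at or above the threshold."""
    adjusted = score - sum(DEDUCTIONS.get(t, 0) for t in topics)
    return adjusted >= KEEP_THRESHOLD
```

Without the deductions, everything near the threshold survives; the penalties are what finally pushed the genuinely low-value documents below the line.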

I manually reviewed all suggested deletions, and agreed with all but one of them, which revealed an obvious gap in the rubric. Success! The “keeps” were lightly spot-checked. I don’t need it to be absolutely right about them, because the worst case there is that a future step of this project costs me a little extra money by doing enrichment on a document that has nothing of value to give. I’m not that worried about it.

Conclusion

So what did I learn? I learned a lot about how batch jobs to OpenAI and Anthropic work. I learned a medium amount about my own interests. I feel like if I had any need to do it I could write an extremely good dating profile now. “Enjoys long walks in the forest and applied epistemology.” I learned a bit more about how to wrangle the best results out of the current state of LLMs.

My next steps are to take everything I learned and built and begin working on the enrichment pipeline, which I think will look pretty similar in a lot of ways to start. Because I first need to deal with my archive, batch jobs are the most cost-effective option. Eventually I’m imagining an AWS Step Function. All of which is better than Ghostreader. Most of which is better than nothing.

Did this help me find my Roman Empire? Well, no. As far as I can tell that’s well and truly gone. But I think it’s improving the process by which I never let the Empire fall again.


2026-02-01 Update

I used to have the entire Personal Interest Rubric here, because I thought it was somewhat interesting and people would want to see a sample of what I was talking about. Unfortunately, it made the entire post unreadable. Also, I'm no longer using the rubric. More on what I'm doing now to come.