I Now Remember Every Idea on my Walks
I lose my best ideas in the dumbest ways. I’ll be walking the dog, think of a solid video hook, and by the time I unlock my phone and find the right app… it’s gone. Or worse, I write it down “somewhere,” and never see it again.
So in this build, we’re continuing to fix that with one rule: hands-free capture. One trigger, one voice note, and the system figures out where it belongs.
Today we build the first end-to-end version: phone to a secure n8n webhook, a journal entry for every note, AI routing, and one real action: creating a video idea in Obsidian. It doesn’t matter if your end action is different; the pattern is the point. If you missed the other two videos that led us to this point, check out this playlist to see the whole collection.
“Just take a note” sounds simple until you build it. Mobile networks drop. Shortcuts retries. You accidentally trigger it twice. If your flow isn’t built for that, it will silently lose inputs or spam duplicates.
So here’s the baseline I want before any AI touches this: security, audit, reliability.
Security means the webhook is not a public suggestion box. It needs an Authorization header with a Bearer token, or it gets rejected.
Audit means we store the raw text, the intent we classified, and the final outcome.
Reliability means idempotency. If the same text comes in twice, even with different timestamps, we treat the second one as “already seen.”
Alright, let’s open n8n and start from a blank workflow. I want one entry point. One intake path. Everything else happens after we’ve recorded the input.
First node is a Webhook. POST only. Require an Authorization header that looks like “Bearer” plus a secret token. It’s a shared secret so random internet traffic can’t hit your endpoint.
Right after the webhook, check the text. Nothing there? Reject. Then respond immediately with a 200 so Shortcuts doesn’t hang and retry. The rest of the workflow can run behind the scenes where we can log and recover.
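The gate those first two nodes form can be sketched as a pure function. This is illustrative Python, not n8n's API: the node does these checks visually, and `SECRET` is a placeholder for your own token.

```python
SECRET = "my-long-random-token"  # placeholder; generate your own shared secret

def handle_intake(headers: dict, body: dict) -> tuple[int, str]:
    """Mirror the webhook's gate: auth first, then a non-empty text check."""
    if headers.get("Authorization") != f"Bearer {SECRET}":
        return 401, "unauthorized"       # random internet traffic stops here
    text = (body.get("text") or "").strip()
    if not text:
        return 400, "empty note"         # nothing to record, reject
    return 200, "received"               # respond fast; route in the background
```

The important design choice is that the 200 goes back before any routing happens, so the phone never waits on AI.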
Publish the workflow, and now we can send real data from the phone. I’m demonstrating on an iPhone because, well, that’s what I own. All of this should work on other platforms, but I haven’t owned anything else since my Windows phone 30 years ago, so I can’t show you how to do it there. I’m going to assume you’re smart enough to take the pattern and apply it to whatever you use, rather than being the person who complains in the comments.
On the iPhone side, we’re building a Shortcut that does three jobs: capture your voice, package a tiny payload, and POST it to n8n with the Bearer token header.
Start with Dictate Text. Keep it short. This is for quick thoughts, not essays.
Next, build a Dictionary with one field: text. That’s it. The whole system is built around capturing fast, and then sorting later.
Now add “Get Contents of URL.” This is your POST to the n8n webhook URL. Method: POST. Body: JSON. And add the Authorization header with “Bearer YOUR_TOKEN”.
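Expressed as an equivalent request in Python, here is what the Shortcut sends. The URL path and token are placeholders for your own values; the shape of the payload and headers is the part that matters.

```python
import json
import urllib.request

WEBHOOK_URL = "https://your-n8n-host/webhook/voice-note"  # placeholder URL
TOKEN = "my-long-random-token"                            # placeholder token

def build_request(text: str) -> urllib.request.Request:
    """One field, JSON body, POST, Bearer header: the whole contract."""
    payload = json.dumps({"text": text}).encode()
    return urllib.request.Request(
        WEBHOOK_URL,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {TOKEN}",  # must match the webhook's check
        },
    )

# To actually send it:
# urllib.request.urlopen(build_request("video hook about idempotency"))
```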
Then bind it to a trigger. On iPhone, I use triple-tap which I can get to by going to Settings → Accessibility → Touch → Back Tap. You can do one thing with a double tap and something else with a triple tap. I was using double tap for something so I used triple tap for this workflow. On other platforms, use whatever gets you to “one move, no app hunting.”
Test it with a few phrases. In n8n, you should see the webhook fire every time.
If it doesn’t, troubleshoot in this order: the URL is correct, the method is POST, the body is JSON, and the Authorization header is exactly right. If nothing makes sense, it’s probably the auth header…
Before we let AI touch anything, we build the part that makes this trustworthy: the journal. It’s a write-ahead log. Record first. Route second.
In n8n, I’m using Data Tables. I used to use an external table for this, but then n8n added Data Tables and it’s changed everything. It’s just a window into the database n8n is built on, so there’s nothing extra to run, and it’s incredibly easy to work with. Make a table like NotesJournal with a boring schema: sourceText, destination, outputText, status.
In the workflow, after auth, look up sourceText in the table. If it already exists, stop. That’s idempotency, and it keys purely on the text. Same words twice means “ignore the second one,” even if the timestamps differ.
If it doesn’t exist, create the row right away and then respond to the shortcut. That receipt is the whole point. Capture shouldn’t wait for AI.
From there on out, every step updates the same row: classified, extracted, done. If something fails, you have a record you can replay later without re-recording.
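The whole journal pattern fits in a few lines. In the real build this is an n8n Data Table; this in-memory Python sketch just shows the logic, with the same column names as the NotesJournal table above.

```python
journal: dict[str, dict] = {}  # keyed by sourceText, like the table lookup

def record(text: str) -> bool:
    """Record first, route second. Returns False if we've seen this text."""
    if text in journal:
        return False                   # idempotency: same words, ignore
    journal[text] = {"sourceText": text, "status": "received",
                     "destination": None, "outputText": None}
    return True

def advance(text: str, status: str, **fields) -> None:
    """Each later step updates the same row: classified, extracted, done."""
    journal[text].update(status=status, **fields)
```

Because `record` runs before anything else, a failure later never loses the input; you can replay the row.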
Here’s the trick with AI in automations: you don’t ask it to “do the automation.” You ask it to label and structure text so your workflow can do the automation. We talked about that in the last two videos. It’s tempting to think that MoltBot or ClaudeBot can do it all for you, but sometimes you get lucky and often you don’t. And the permissions you have to give up to get there are staggering. What we are building here is reliable every single time.
So we start with a small set of buckets. Mine are: purchases, 3d print ideas, groceries, calendar, todo, wins, video ideas, and other. And I’m keeping those labels stable. If you rename buckets every week, you’ll spend your life fixing Switch nodes. But feel free to define what makes sense for you.
So the first AI call is text classification. I’m using Ollama Cloud for this because I think it is pretty amazing. I pay $20 a month and that covers all of my workflows, all my interactive prompts, and a lot of time in Open Code as well, with plenty of overhead to spare. You’d think that after being on the project at the very beginning I would qualify for some sort of friends-and-family pricing, but nope, I pay full fare. But it’s far faster than my GPU, and I get all the same privacy and security benefits as running it offline.
This video isn’t sponsored by Ollama, none of my videos ever have been, though I am open to it…Jeff? Michael? Patrick? You have my number…unless you deleted it already. But this video is sponsored by n8n. Now, I didn’t make this in n8n because they sponsored it. In fact, they had no say in what I say in this video; they just help me make the videos I was going to make anyway about n8n. Pretty cool that they’re OK with that. It’s been the automation platform I’ve used for everything since it was a tiny little project back when I was still at Datadog, and I keep going back to it. You can self-host like I do, or let them do the hard stuff for you for a small fee. It’s up to you. Find out more at n8n.io.
For the text classifier, you provide the text to classify, and then add the categories. For each category tell the AI how to identify the text. It doesn’t have to be complicated. This is what I have and it works incredibly well. Like shockingly well. For the model, I am using Minimax m2.
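Under the hood, the classifier node assembles something like the prompt below: the note, the allowed labels, and a one-line hint per label. The hints here are examples of the kind I mean, not the exact text from my workflow.

```python
# Hypothetical label hints; write yours to match how you actually talk.
CATEGORIES = {
    "purchases":   "something to buy, usually a product name or price",
    "3d_print":    "an object to design or print",
    "groceries":   "food or household items to pick up",
    "calendar":    "an event with a date or time",
    "todo":        "a task to do later",
    "wins":        "something that went well",
    "video_ideas": "a hook, topic, or title for a video",
    "other":       "anything that fits nowhere else",
}

def build_classifier_prompt(text: str) -> str:
    """One label in, one label out: the model only names a bucket."""
    labels = "\n".join(f"- {name}: {hint}" for name, hint in CATEGORIES.items())
    return (
        "Classify the note into exactly one category. "
        "Reply with the category name only.\n\n"
        f"Categories:\n{labels}\n\nNote: {text}"
    )
```

Keeping the labels stable means this prompt, and every Switch branch downstream, never has to change.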
Once we have intent, we update the journal row: the destination gets saved, status becomes classified, and we keep going. The destination determines which branch to follow because we do different things depending on where it goes.
Second AI call is extraction. This is where we turn freeform speech into structured output that we can route. The big move here is schema-first: you decide the fields, the model fills them. For this example I have just one field, the video idea, and it’s required. This just extracts the idea from the text and returns it. But depending on your needs, you may extract other fields.
For example, for a task you might extract title, notes, dueDate, and priority. For calendar: title, startDateTime, endDateTime, and location. But don’t overcomplicate it. Keep it simple enough that you’ll actually use it; that’s why I just stick with the rough idea to capture. For this one I am using GPT OSS 120b, not for any particular reason. It worked well and I just never changed it. Generally I try to use the smallest, fastest model that does the job every time; in most cases you rarely need the most expensive model.
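Schema-first also means validating before trusting. A minimal sketch, assuming the model replies with JSON; the schemas mirror the video-idea, task, and calendar examples above, and the field names are illustrative.

```python
import json

# Hypothetical schemas matching the examples in the text.
SCHEMAS = {
    "video_ideas": {"required": ["idea"], "optional": []},
    "todo":        {"required": ["title"],
                    "optional": ["notes", "dueDate", "priority"]},
    "calendar":    {"required": ["title", "startDateTime"],
                    "optional": ["endDateTime", "location"]},
}

def validate_extraction(destination: str, model_reply: str) -> dict:
    """Parse the model's JSON, refuse missing required fields,
    and silently drop any fields the model invented."""
    schema = SCHEMAS[destination]
    data = json.loads(model_reply)
    missing = [f for f in schema["required"] if not data.get(f)]
    if missing:
        raise ValueError(f"extraction missing required fields: {missing}")
    allowed = set(schema["required"]) | set(schema["optional"])
    return {k: v for k, v in data.items() if k in allowed}
```

Dropping unknown fields is the point of schema-first: every extra field is one more thing the model can invent.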
After extraction, update the journal row: output text saved, status set to extracted.
In this video, we’re implementing one full action: video_idea to Obsidian. There are a few ways to do this, but the pattern is the same. Build a note body from extractedJson. Use a deterministic filename if you can, like a slug of the title, so repeated runs don’t create ten copies. Then create or update the file.
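The deterministic filename can be as simple as a slug of the title. The exact rules are a choice, not a standard; this version lowercases, keeps alphanumeric runs, and joins with dashes, so the same idea always maps to the same file.

```python
import re

def slug_filename(title: str, max_words: int = 8) -> str:
    """Same title in, same filename out, so re-runs overwrite one note
    instead of creating ten copies."""
    words = re.findall(r"[a-z0-9]+", title.lower())[:max_words]
    return "-".join(words) + ".md"
```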
When that succeeds, mark status as done.
And yes, this is where n8n earns its keep. You’re not just wiring API calls together. You’re building an automation that can be trusted: secure intake, durable journal, narrow AI calls, and clean branching.
When I said we were done, we weren’t really done. We wrote a file, but that doesn’t get it straight into Obsidian. n8n is strict about the parts of the filesystem it can and cannot write to, so we wrote the file using the Dropbox integration. Dropbox takes care of replicating the file to my other machines, including my Mac. And Shortcuts comes back into play.
Shortcuts isn’t perfect. Things get way too complicated way too quickly, and I could solve this more easily with code. But I don’t have to maintain the Shortcuts engine, which is a huge win. In fact, I often hear folks say that using n8n is crazy when they can build something custom that does it faster. Everyone thinks that at first, until they experience the mental cost of maintenance. It’s OK to let someone else take care of that. So let’s take a look at this shortcut.
It sees the files added to the Dropbox folder, and for each one, gets the content and type. Depending on the type, the content gets written to one of a few different markdown files in my Obsidian vault. I also delete the file from the Dropbox folder since it isn’t needed anymore. The actual file writing is done with the action included in Actions for Obsidian, which is a great tool to have if you use Obsidian and Shortcuts.
Next, we tweak it: prompt tuning, and a quick log review to prove that notes really go to the right place.
n8n does offer logs we can review, but for most cases we have something even better: the journal. We can see everything that came in and where it ended up. Pick one row in your Data Table and follow the status: received, classified, extracted, done. This is why we store status at every step; it’s a breadcrumb trail you can actually use when something breaks. If a note got stuck, review that node and its logs. To make things even better, you might record the execution ID in the table; then you can see exactly what happened in that run.
Then prompt tuning. For the classifier, add a couple of short examples that match your real voice notes. If things are going into the wrong buckets, tighten your allowed labels and keep “other” as the safe bin.
For extraction, keep the schema smaller than you think it should be. Every extra field is one more thing the model can invent. Set temperature low, because you want boring, consistent output.
And now the test I care about: send the exact same phrase twice with a few minutes in between. The first one should create a journal entry and add the Obsidian file. The second one should hit the idempotency check and stop cold. No duplicate note. No extra action. Just a clean “already seen this.”
At that point, you’ve got something you can trust.
So here’s the new normal.
You triple tap, you talk, and the note lands in a system that’s secure, durable, and searchable. It responds fast, it journals every capture, and it only uses AI for the narrow part: classify and extract.
And because we built idempotency around the text, retries stop being scary. Same phrase twice means one result, not two.
Right now, we’ve implemented one full action: video ideas spoken into my phone while walking the dog, landing in Obsidian. Next, add the other branches one at a time: tasks, calendar, and plain notes. Keep the same pattern: record first, then decide, then act, then mark done.
If you don’t want to pay for the subscription, swap Ollama Cloud for a local Ollama instance. That’s just a matter of changing the Ollama credential. Same workflow. Or change it to any other AI service, but note the potential loss of security or privacy depending on your choice here.
If you want the rest of this build, watch the playlist. Like the video and Subscribe for the next episodes. And in the comments, tell me what your “one move” trigger is, because I’m always stealing good ideas. Thanks so much for watching. Goodbye.