Current AI LLMs are so terrible. Basic task failure beyond writing is everywhere.

We have access to the world's greatest AI programs, and they all, every single one, fail to deliver beyond basic tasks such as writing (they choke on even mildly complex tasks, e.g. combine A, B, and C without duplicates).

The "brain" part is just missing. The moving around text/generating text is there, but the logic to realize there's duplicates, it doesn't flow, is just not there... You can say it's input failure, but it's not.

I'm assuming all the AI companies are dumping everything into music, video, and image generation, and text is a thing of the past...
 
I'm not so sure of this.

Granted, I don't know what you're doing and the example you gave might just be weak (or I don't understand it), but I have gotten AI to do whatever I ask, including some complex shit.

I think it comes down to a few things:

1. The model you're working with. I only use ChatGPT for certain XYZ tasks, Claude for certain ABC tasks, and Gemini for certain JKL tasks. You gotta understand each one's strengths.

2. Token usage. If you're trying to one-shot everything, it's going to fail when there's a ton of thinking it needs to do. The first 100k tokens work 500% better than the last 100k tokens in a one-shot prompt. Even if you're not one-shotting it and are going back into the same chat window for multiple asks, it's the same idea until you hit the token limit. (Rough sketch of breaking the work up below, after the list.)

3. Chat window vs. API. This kinda touches on #2 but is different. I find the "chat window" of any model worse than going through the API with N8N or Make or w/e. Some of this is token use, but I also think it's because of the workflow and how you have to ask/prompt/design the end result. IYKYK.

4. Repeat what's important. Even with all the experience I have, an AI/LLM will forget or leave something out. If something is important, I put it in the prompt 3 times: at the start, the middle, and the end, worded slightly differently in each spot.

5. Sometimes you have to teach the AI. AI is just a summary of everything it has been given: its initial training, plus whatever is in your chat history with it, or your RAG setup. If your duplicates aren't exact-match wordings (character by character), you're going to have to teach it and give it instructions on what a duplicate actually is. Maybe instead of a duplicate being an exact match, it's a duplicate by intent. You'll have to teach it that intent and provide examples so it can learn and do it. (There's a sketch below combining this with #4.)
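
Since #2 and #3 keep coming up: here's a rough sketch of what "don't one-shot it" can look like in code. It's just an illustration, assuming the official `openai` Python package (v1+) with an API key in the environment; the model name and the helper names (`ask`, `combine_without_duplicates`) are placeholders, not anything from the posts above.

```python
# Rough sketch of #2/#3: break the job into small API calls instead of one giant
# one-shot prompt, so each call stays in the early, "good" part of its context window.
# Assumes the `openai` Python package (v1+) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment


def ask(prompt: str) -> str:
    """One small, focused request per call."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder -- swap in whatever model you actually use
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def combine_without_duplicates(sources: list[str]) -> str:
    # Pass 1: clean each source on its own (small prompt, little "thinking" per call).
    cleaned = [
        ask(f"List the distinct items in this text, one per line:\n\n{s}")
        for s in sources
    ]
    # Pass 2: merge the already-cleaned lists and dedupe across them.
    return ask(
        "Merge these lists into one list with no duplicates. "
        "Treat items that mean the same thing as duplicates, even if worded differently:\n\n"
        + "\n\n".join(cleaned)
    )
```

Same idea whether you wire it up in N8N/Make or in a script: lots of small, focused calls instead of one monster prompt.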
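
And a rough sketch of #4 and #5 combined: repeat the critical instruction in three places (worded slightly differently each time) and teach "duplicate by intent" with a couple of examples. The examples and wording here are made up for illustration; the point is the shape of the prompt.

```python
# Rough sketch of #4 + #5: repeat the key instruction at the start, middle, and end,
# and show the model what a "duplicate by intent" looks like. Examples are illustrative.

INTENT_EXAMPLES = """\
Duplicates BY INTENT (keep only one):
- "Back up the database nightly"
- "Run a DB backup every night"

NOT duplicates (keep both):
- "Back up the database nightly"
- "Back up the file server nightly"
"""


def build_dedupe_prompt(items: list[str]) -> str:
    listed = "\n".join(f"- {item}" for item in items)
    return (
        # 1st placement: top of the prompt.
        "IMPORTANT: remove duplicates, including items that say the same thing in different words.\n\n"
        + INTENT_EXAMPLES
        # 2nd placement: right before the data, reworded.
        + "\nReminder: two items are duplicates if they share the same intent, not just the exact same wording.\n\n"
        + "Items to merge:\n"
        + listed
        # 3rd placement: end of the prompt, reworded again.
        + "\n\nFinal check before you answer: no two items in your output should mean the same thing."
    )


print(build_dedupe_prompt(["Buy milk", "Purchase milk", "Buy bread"]))
```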
 
^^ damn that's a good post
Summarizes my experience, with one exception.

Number 3. The chat window has given me pure brilliance a couple of times, almost like it's tuned differently or allowed a different set of resource constraints. Specifically with ChatGPT. But it won't build shit in the chat window the way you can with API credits, or with Claude's more templated chat window that forces everything down a couple of design trees.

5 is really, really true, along with importance repetition, calling out exceptions, and asking it to rebuild its model so it doesn't miss items. ChatGPT has ingested an enormous quantity of data; I've asked it specifically to use its word-vector summaries or to emphasize social data and build a model to reach a conclusion. Then I've given it reference items and asked where they go relative to a list or some other hierarchy to tune it, with pretty insane results.
It's clearly got some organized datasets it's not just allowed to spit out, but it can pull real conclusions from them if you tell it to lay off the patronizing, language-model-projected-answer bullshit and tune in based on summaries of the data it was exposed to.
 