Yesterday I posted a video testing claude-4 sonnet solving an https://simstack.io long-horizon SWE challenge unaided (https://news.ycombinator.com/item?id=44424468). For comparison, here's gemini 2.5-pro.
I noticed that 2.5-pro is way more cavalier than claude: it skips backups and tries "more stuff more quickly", where claude takes a more cautious approach.
I've been using gemini cli.
A couple of times it got into a loop: "I need to take X out." "OK, now that's gone, I'll add Y." "OK, now that I've added Y, I need to add X." "But now that I did that, I'll remove Y."
To be fair, I ran out of limit on claude and chatgpt trying to make the same changes. It was a curious one; the code was a bit above my head, but I eventually spent about 45 minutes tweaking it on my own to get it working. I was shocked when I accidentally got it fixed.