
PM's AI toolkit - OpenAI Codex and Google Jules

If I'd tried Jules first, I'd have a totally different impression of the state of coding agents today. A much worse one. Here's why.

Everyone trying AI coding agents today arrives at them in a different way: a colleague's recommendation, an ad, a review they read or watched... AI platforms often try to cross-promote, but they do it poorly.

I was in my second month of using AI for coding when I learned of Codex's existence. My routine was very basic and slow. I'd paste a chunk of code into ChatGPT and ask it to solve my problem. Then I'd copy the output code into my IDE and try to run it. Obviously, I felt there should be a better way, but it just wasn't easy to find.

Then one bright day, I described my "process" to a colleague, and they said: "Surely you use Codex for that?" And I was like: "Yeah, surely", while simultaneously thinking, "What is Codex?"

OpenAI Codex

Then I went and tried Codex, and almost immediately it became my main tool for vibe coding. Codex is OpenAI's coding agent that connects to your GitHub repo and is able to solve small and large programming tasks for you.

The interface is very simple and familiar. It's just a chat box with two settings: planning and doing. In the default doing mode, you describe the task, click a button, and it starts coding. You can watch it boot up a virtual machine, run scripts, write code, and so on. When it's done, you get a summary of changes and a diff view of the code. If you're happy with the changes, you can create a PR and merge it into your codebase.
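
For a sense of what this looks like in practice, a doing-mode task from me would be something like this (a made-up example, not from a real project):

    Add a "Copy link" button next to each item on the history page.
    Reuse the styling of the existing "Share" button and show a short
    "Copied!" confirmation on click.

Codex picks the task up, works on it in its sandbox, and comes back with a diff to review.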

So Codex is not an all-in-one tool for agentic AI development, like v0 or Replit promise to be. You still need to combine it with at least an IDE and a hosting solution.

Depending on the complexity of the task, Codex takes anywhere from a couple of minutes to a dozen to complete the work. I've rarely seen it work for more than 20 minutes, and I tried some rather bulky tasks. Very rarely, Codex fails to complete a task; splitting it into smaller steps, as sketched below, worked for me on a second attempt.
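
For example, instead of one bulky task like "rework the whole settings page", I'd resubmit it as a sequence of smaller ones (hypothetical tasks, just to show the idea):

    1. Extract the settings form into its own component.
    2. Add validation to the form fields.
    3. Wire the form up to saving and show errors inline.

Each smaller task is quicker for Codex to complete and easier to review on its own.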

If you type /plan before describing your task, Codex runs in planning mode. In this mode, it won't make changes to your code; instead, it analyses what needs to be done and suggests tasks for you to run. I found this mode very helpful when debugging a problem or weighing solutions for a complex feature.
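
For illustration, a planning request could look like this (the bug itself is made up):

    /plan The export button sometimes produces an empty CSV. Find the
    likely cause and outline the steps to fix it, without changing any
    code yet.

Codex then replies with its analysis and a list of suggested tasks, each of which you can run as a regular doing-mode task.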

What didn't I like about Codex?

It doesn't say what it'll do before doing it. It can be annoying and wasteful (credits-wise) when Codex works on a task, completes it, and only then you realise the output is not what you wanted. Maybe you described it poorly, or the AI ran wild, but either way you've wasted time and credits. I wish Codex ran in planning mode by default, first telling you exactly what changes it plans to make and asking for your confirmation or corrections. Maybe OpenAI can get "inspired" by how Jules does it (see below).

Sometimes, Codex gets stuck in what feels like an infinite loop: not a literal one, but repeated attempts to solve a problem without success. I wish it could just "give up" and suggest alternative ways to find a solution.

Also, I noticed Codex is very cautious about deleting code, even when explicitly given permission to do so. This, I assume, is a feature, not a bug. For me, though, it means the codebase keeps growing with no-longer-used code, and I have to do a separate trimming pass later, with potential debugging when issues arise.

Lastly, and this isn't about Codex's performance, the usage information is all over the place. It's very hard to know how much Codex you actually have left. One day it showed me 26% of my weekly limit remaining and 5,000 credits available. The next day (and I didn't use Codex in between), it showed 0 credits and 100% of the weekly limit available. While I was just playing around with the tool, it didn't matter much, but once I published one of my projects, I began to worry that I'd run out of Codex credits exactly when something important needed fixing.

And then I tried Google's Jules coding agent, which put my Codex experience into perspective. I wasn't seeking an alternative; I'd just bought a Gemini Pro subscription for unrelated reasons and noticed it comes with access to Jules.

Google Jules


My first impression was rather good. The interface is familiar, and connecting to GitHub took seconds. Then I tried a very simple prompt: a little change to my unpublished app. Before starting work on a task, Jules creates a plan. I really like that feature, for the reasons outlined above. I only wish it were easier to modify the plan without re-prompting. At times, I just needed to "cross out" a particular step in the plan Jules suggested, but I didn't find an easy way to do it.

The first surprise of using Jules was how long it took to complete the first task. I genuinely thought it had silently failed, because it spent more than half an hour on a task Codex usually completes within a few minutes. The final timer said 36 minutes. "Okay," I thought, "maybe it's just getting familiar with the codebase and next time will be quicker." But alas, every task I tried in Jules took ages compared to Codex. Maybe it's Codex being very quick rather than Jules being too slow, but speed remains my top complaint about Jules.

And then there's the failure rate. My sample size isn't big, only about a dozen tasks that I tried in both Jules and Codex. Unfortunately, in Jules I saw "unable to complete the task" much more often than in Codex. Combined with how slow Jules is, it's super frustrating to wait the better part of an hour for it to finish, only to see the dreaded failure message.

I really hope Jules improves quickly. I wanted to like it: having it included in the Google AI suite is an attractive proposition. But in its current shape, it's barely an alternative to Codex.

In general, it's great that we have such strong competition among AI coding agents. Already today, product creators can do so much more than even a year ago, and it's very exciting to see what new, cool products will be built with their help.
