#debugging #workflow #morning-review

How I debug failed overnight tasks at 7 AM

Wake up, open Zowl, check the dashboard. 48 done, 4 failed, 2 skipped. Here's my 15-minute morning review process.

The alarm goes off

6:58 AM. Coffee isn't ready yet but I'm already reaching for the MacBook. This is the part most people ask about: what does the morning actually look like when you've had agents coding all night?

It looks boring. That's the honest answer. I open Zowl, look at the dashboard, and read numbers. Today's numbers: 48 done, 4 failed, 2 skipped. That's a solid night.

Step one: ignore the green

The 48 passed tasks don't need me yet. I'll review their code later. Right now, I only care about the red.

I open the failed tasks view and sort by pipeline step. This matters because where a task failed tells me what kind of problem it is before I even read the log:

  • Failed at pre-check = the agent couldn't understand the task or found a conflict with existing code
  • Failed at implement = the agent got stuck during execution
  • Failed at validate = the code was written but didn't pass checks

Each failure type has a different fix. Knowing the step saves me from reading the entire session log top to bottom.

Step two: read the error, not the whole log

Session logs from overnight runs can be long. I'm talking 3,000+ lines for a complex task. Reading all of that at 7 AM with half a cup of coffee in me is not happening.

Here's my shortcut. I open the session for a failed task and go straight to the step that failed. Zowl marks the failure point, so I don't have to scroll. I read the last 20-30 lines before the failure. That's where the actual error lives.

This morning, task #12 failed at validation. The last few lines:

FAIL src/lib/scheduler/__tests__/recurrence.test.ts
  ● Recurrence › should handle monthly recurrence on the 31st

    Expected: "2026-03-31"
    Received: "2026-03-28"

Tests: 1 failed, 14 passed, 15 total

That tells me everything. The agent wrote a recurrence function that doesn't handle months with fewer than 31 days. It's a logic bug, not a hallucination. The validation step caught it because the acceptance criteria in the PRD specified "handle months with varying lengths." Good. The test worked. The implementation didn't.

Step three: diagnose the root cause

This is the part that takes practice. The error tells you what broke. But the question is: why did it break? And 90% of the time, I already know the answer.

It's the PRD.

Not always. But almost always. Here's my mental checklist:

  1. Was the requirement specific enough? If the PRD said "handle recurrence" without specifying edge cases, that's on me.
  2. Was the context complete? Did I tell the agent about existing date utility functions it should've used?
  3. Was there an ambiguity? If two reasonable interpretations exist, the agent picked one and I wanted the other.

For task #12, the PRD said "support monthly recurrence" but didn't mention end-of-month edge cases. That's a PRD problem. The agent implemented a naive "same day next month" approach, which falls apart as soon as a 31st-of-the-month schedule hits a shorter month. Fair enough. I didn't specify, it didn't handle it.
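I can't see the implementation without opening the diff, but the received date gives the shape of the bug away. A reconstruction, purely my guess at what "same day next month" looks like in code, produces exactly that drift:

// Hypothetical reconstruction of the naive approach: advance the *previous*
// occurrence by one month and let date-fns clamp. Once Jan 31 becomes Feb 28,
// the original "31" is gone, so every later occurrence drifts to the 28th.
import { addMonths } from "date-fns";

function naiveNextOccurrence(previous: Date): Date {
  return addMonths(previous, 1); // Jan 31 -> Feb 28 -> Mar 28, never back to the 31st
}

That's how you end up expecting 2026-03-31 and receiving 2026-03-28.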

Step four: fix the PRD, not the code

This is the thing that took me months to learn. My instinct used to be: open the generated code, find the bug, fix it manually. But that defeats the entire purpose. If I'm hand-editing generated code, I'm just a slow IDE.

Instead, I fix the PRD. For the recurrence bug:

### Requirements (updated)
4. Monthly recurrence: schedule on the same day each month.
   If the target day doesn't exist in a month (e.g., Jan 31
   → Feb), clamp to the last day of that month.
   Use date-fns `setDate` + `lastDayOfMonth` from src/lib/dates.ts.

A few lines. That's the fix. Not code. Instructions.
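The agent turns those instructions into code on the re-run. For the curious, the behavior the requirement describes works out to roughly this; a sketch using date-fns directly, whereas the real task would go through the wrappers in src/lib/dates.ts:

// Sketch of the clamping behavior the updated requirement asks for: remember
// the target day (e.g. 31), advance one month, and clamp to the last day that
// month actually has, so the schedule snaps back to the 31st when it can.
import { addMonths, getDate, lastDayOfMonth, setDate, startOfMonth } from "date-fns";

function nextMonthlyOccurrence(current: Date, targetDay: number): Date {
  const nextMonth = addMonths(startOfMonth(current), 1); // month arithmetic from the 1st never overflows
  const lastDay = getDate(lastDayOfMonth(nextMonth));
  return setDate(nextMonth, Math.min(targetDay, lastDay));
}

nextMonthlyOccurrence(new Date(2026, 0, 31), 31); // 2026-02-28
nextMonthlyOccurrence(new Date(2026, 1, 28), 31); // 2026-03-31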

Step five: re-run only what failed

I select the 4 failed tasks and the 2 that were skipped (those two depended on a failed task). Six tasks total. I hit re-run.

This is something that tripped me up early on. I used to re-run the entire pipeline. All 54 tasks. That's wasteful and slow. Zowl tracks dependencies, so it knows which skipped tasks were blocked by which failures. I just re-run the failures and their dependents. Six tasks instead of fifty-four.
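Under the hood, picking that subset is just a reachability walk over the task graph. This isn't Zowl's actual code, but the idea looks roughly like this, with a made-up Task shape:

// Hypothetical sketch: collect every failed task plus everything downstream of
// it (directly or transitively), so only that subset gets re-run.
interface Task {
  id: string;
  dependsOn: string[];
  status: "done" | "failed" | "skipped";
}

function tasksToRerun(tasks: Task[]): Set<string> {
  const rerun = new Set(tasks.filter(t => t.status === "failed").map(t => t.id));
  let changed = true;
  while (changed) {
    changed = false;
    for (const t of tasks) {
      // A task joins the set if anything it depends on is already in it.
      if (!rerun.has(t.id) && t.dependsOn.some(d => rerun.has(d))) {
        rerun.add(t.id);
        changed = true;
      }
    }
  }
  return rerun;
}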

The re-run usually finishes in 15-20 minutes for a batch this size. I use that time to start reviewing the 48 that passed.

The full morning timeline

Here's how it actually breaks down:

6:58  Open Zowl, check dashboard
7:00  Open 4 failed tasks, sort by step
7:02  Task #12: read validation error, update PRD
7:05  Task #19: read pre-check error, add missing file path context to PRD
7:08  Task #33: read implementation error, fix ambiguous requirement
7:11  Task #41: read validation error, add edge case to acceptance criteria
7:13  Re-run 6 tasks (4 failed + 2 skipped)
7:14  Start reviewing passed tasks while re-run executes

Fifteen minutes from opening the laptop to starting the re-run. Sixteen on a slow day.

The patterns I've memorized

After months of mornings like this, certain failure patterns are automatic for me now. I don't even need to think about them.

"Cannot find module X" = I forgot to mention an existing file in the PRD's context section. The agent created its own version instead of importing the existing one. Fix: add the import path to the PRD.

Test expects X, received Y = the PRD was ambiguous about behavior. The agent made a reasonable choice that wasn't my choice. Fix: make the requirement explicit.

"File already exists" = two tasks tried to create the same file. Dependency order issue. Fix: add the dependency link so they run sequentially.

Pre-check says "conflicting pattern" = the agent read the codebase and found that what I'm asking for contradicts something that already exists. This one's actually the pre-check doing its job. Fix: update the PRD to acknowledge the existing code and describe how to integrate with it.

Why I don't get frustrated

I used to. Trust me. Waking up to failed tasks felt personal, like the tool was failing me. But once I started tracking root causes, the data changed my perspective.

Out of the last 200 failed tasks I've logged, 178 were PRD problems. That's 89%. The agent did exactly what I told it to do; I just told it the wrong thing, or didn't tell it enough. Reading back through session histories is what made that pattern impossible to ignore.

The morning review isn't debugging the agent. It's debugging my own instructions. And once you see it that way, the fifteen minutes stops feeling like cleanup and starts feeling like a feedback loop. Every failed task makes tomorrow's PRDs better.

My one rule

Don't fix generated code by hand unless it's truly faster. Fix the PRD and re-run. It takes discipline because the manual fix is right there, tempting you. But every hand-edit is a lesson you didn't teach the system.

The re-run is the lesson. The updated PRD is the documentation. And next time you write a similar task, you'll include that edge case from the start.

Fifteen minutes. That's the cost of overnight automation when the PRDs are decent. I'll take that trade every morning. If you want to understand how to write PRDs that prevent these failures in the first place, see what "done" means to an AI agent. And if you're ready to try this workflow, Zowl handles the orchestration automatically.