AI Website Audits: Fix INP & Lighthouse Scores the Smart Way
Learn how AI-assisted audits reveal hidden performance issues, including INP, and how to apply fixes that boost SEO and conversions.
Why Website Performance Is the Battleground
Rankings and conversions increasingly hinge on core site performance metrics rather than keywords alone. Google’s Interaction to Next Paint (INP) is a prime example of an often overlooked score: it became a Core Web Vital in 2024, replacing First Input Delay. It measures responsiveness in the real world: how fast a page reacts when users click, tap, or type — direct interactions. A slow response means lost trust, higher bounce rates, and weaker rankings. If you’ve invested in content and backlinks but neglected INP or the other Core Web Vitals, you’re leaving revenue on the table.
The reality is that traditional site and digital asset audits tend to be heavily labor-intensive. Developers have historically combed through Lighthouse reports, Chrome DevTools traces, and performance budgets manually, page by page. AI accelerates this work by correlating metrics, spotting anomalies, and proposing fixes aligned with Lighthouse guidance in a fraction of the traditional audit time. The result is not a replacement for engineers but a way to let them focus on execution instead of investigation, and to translate findings into plain language for stakeholders when necessary.
How AI-Assisted Audits Work
AI models currently excel at pattern recognition across structured data; push them outside that territory and error rates spike, and the cost of correcting their mistakes erodes any efficiency gain. Feeding them Lighthouse JSON reports, CrUX field data, and RUM logs so they can flag correlations that human reviewers may miss is an ideal application of AI tooling as it stands. For example, if an INP issue appears only on mobile Safari with long event handlers, AI will surface that issue far faster than manual inspection would (and offer several fix options, too). This mirrors how we already use log aggregation in operations: humans define objectives and machines highlight anomalies.
A simple but effective AI audit pipeline may look like this: run Lighthouse in CI/CD, collect field data via the Chrome UX Report (CrUX), feed the outputs into an AI analysis layer, and then generate prioritized, actionable recommendations. With this approach, each recommendation links back to its source metric and a concrete user impact, so every decision is less about technical pride and more about knowing exactly which revenue path you’re protecting.
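To make the field-data step concrete, here is a minimal sketch of querying the CrUX API directly for p75 values. The endpoint and request shape come from Google's public API; the example URL, the PHONE form factor, and the CRUX_API_KEY environment variable are placeholder choices you would adapt.

// crux.mjs: pull p75 field data that the AI analysis layer will consume
const endpoint = `https://chromeuxreport.googleapis.com/v1/records:queryRecord?key=${process.env.CRUX_API_KEY}`;
const res = await fetch(endpoint, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    url: "https://example.com/",
    formFactor: "PHONE",
    metrics: ["interaction_to_next_paint", "largest_contentful_paint", "cumulative_layout_shift"]
  })
});
if (!res.ok) throw new Error(`CrUX query failed: ${res.status}`);
const { record } = await res.json();
// p75 values slot straight into the "Field summary" line of the audit prompt
console.log("INP p75 (ms):", record.metrics.interaction_to_next_paint?.percentiles?.p75);
console.log("LCP p75 (ms):", record.metrics.largest_contentful_paint?.percentiles?.p75);
console.log("CLS p75:", record.metrics.cumulative_layout_shift?.percentiles?.p75);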
To be clear, this process is not theoretical — at Maelstrom Web Services, I use similar audit flows when tuning site speed and reducing script bloat. Automation catches regressions early, which frees my attention for the high-impact engineering fixes.
INP: The Metric That Exposes Hidden Lag
Unlike First Input Delay, which only measured the first interaction, INP tracks responsiveness across the full session. The difference may seem minute, but if a button takes 300ms to react at checkout, users notice and their perception of the brand and the experience drops. Google’s own documentation stresses that “low INP values are critical for perceived performance.” Fixing INP often reveals deeper architectural issues on the affected page: too much main-thread JavaScript, blocking third-party scripts, or poorly optimized event listeners. In that sense it is a "distracting vital": the number itself matters less than the larger problems it points to.
Internal links play a part in this discussion as well. I explained the engineering trade-offs in Progressive Enhancement in Practice, which shows why building resilient fallbacks often improves INP naturally; if your site collapses without JavaScript, INP is almost certainly suffering too.
External research backs these concepts up; it is not just my own experience informing this advice. The PageSpeed Insights tool explicitly scores INP alongside CLS and LCP, emphasizing that responsiveness is inseparable from layout stability and loading speed.
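If you want field INP from your own traffic rather than only CrUX samples, a few lines of RUM code will do it. Here is a minimal sketch using the open-source web-vitals library; the /analytics endpoint is a placeholder for whatever collector you already run.

// Ship this in the page bundle (npm install web-vitals) to log real-user vitals.
import { onINP, onLCP, onCLS } from "web-vitals";

function sendToAnalytics(metric) {
  // metric.name ("INP"), metric.value (ms for INP/LCP, unitless for CLS), metric.rating ("good" | "needs-improvement" | "poor")
  const body = JSON.stringify({ name: metric.name, value: metric.value, rating: metric.rating });
  navigator.sendBeacon?.("/analytics", body); // sendBeacon survives page unloads
}

onINP(sendToAnalytics);
onLCP(sendToAnalytics);
onCLS(sendToAnalytics);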
Common Fixes AI Will Highlight
When AI parses Lighthouse reports, it tends to flag recurring offenders. These are not mysteries but neglected basics. I often see recommendations like:
- Break up long tasks on the main thread into smaller async chunks (see the sketch after this list).
- Remove unused third-party scripts or defer them behind user interaction.
- Compress and properly size images; see our guide on Optimizing Images for Performance.
- Audit security headers that can block preloads; see All About Headers.
- Minimize CSS and JS payloads with techniques covered in Minimizing CSS & JS for Faster Loads.
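To make the first item concrete: "breaking up a long task" mostly means yielding back to the main thread between units of work so a pending tap or click can be handled. A rough sketch follows; renderRow and items are placeholder names, and scheduler.yield() is a newer Chromium API, so the setTimeout fallback carries other browsers.

// Yield to the main thread so queued input events can run between chunks of work.
function yieldToMain() {
  if (globalThis.scheduler?.yield) return scheduler.yield(); // newer Chromium API
  return new Promise((resolve) => setTimeout(resolve, 0));   // broadly supported fallback
}

// Before this refactor, renderRows(items) ran as one long task; after, many short ones.
async function renderRows(items) {
  let lastYield = performance.now();
  for (const item of items) {
    renderRow(item); // placeholder for the real per-item work
    if (performance.now() - lastYield > 50) { // roughly the long-task threshold
      await yieldToMain();
      lastYield = performance.now();
    }
  }
}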
AI helps by ranking these fixes based on real-world impact. Instead of a laundry list of technical jargon, you get a triage: “Fix this first; it moves INP from 300ms to 180ms. With your setup, you do this by implementing X, then Y, then Z.” Given the workload most web developers carry, that context is the difference between noise and value.
Performance, Accessibility, and Brand Trust
Fixing INP and other web vitals is not just about numbers on a dashboard — it’s about perception. As any user can attest, something as simple as a laggy button makes a brand feel careless. Conversely, a stable, responsive interaction builds confidence. It really is that simple. I covered this dynamic in Building Trust Through Brand Consistency because I have seen how performance and brand identity are inseparable. Accessibility is tied into the same equation. The MDN accessibility docs emphasize that clear semantics and reduced input delay help not only Googlebot but also real users with assistive tech — and AI audits surface accessibility regressions faster by correlating them with INP outliers.
This is why performance audits should work in conjunction with UX reviews; a consistent brand promise requires consistent responsiveness and flow.
How We Actually Use AI for Website Audits (INP-first, Code Included)
AI is not a magic wand... mostly. It is a fast, disciplined analyst only when you feed it the right evidence and ask for a structured, verifiable output that leaves little room for guesswork or qualitative interpretation. Here’s the exact, repeatable workflow I use to find INP bottlenecks and turn them into a short, prioritized punch list — then prove the improvement.
The 5-stage loop
- Collect evidence: Lighthouse JSON (lab) + CrUX/RUM (field) + optional DevTools trace.
- Constrain the model: strict schema, conservative instructions, numeric estimates required.
- Rank by user impact: fixes that lower interaction blocking (INP) come first.
- Implement low-risk changes quickly (split long tasks, defer third-party, reduce handler work).
- Verify: re-run lab, profile in DevTools, watch field data trend down over 28 days.
1) Inputs AI can trust
- Lighthouse JSON (mobile + desktop). Keep the full JSON; excerpt audits.metrics, diagnostics, and script-treemap-data if you must.
- Field data: CrUX or your RUM (p75 INP/LCP/CLS per device). Field wins arguments; lab guides fixes.
- DevTools trace (optional but gold): validates “long tasks”, event timing, and who’s blocking the main thread.
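When a full trace isn't practical, the browser can report similar evidence itself. Here is a small sketch using the Long Tasks and Event Timing APIs (long-task reporting is Chromium-only, so treat this as supplementary notes for the AI, not ground truth).

// Log long tasks (>50ms) that block the main thread.
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log(`Long task: ${Math.round(entry.duration)}ms at ${Math.round(entry.startTime)}ms`);
  }
}).observe({ type: "longtask", buffered: true });

// Log interactions whose total duration exceeds 200ms (the kind that drag INP up).
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log(`Slow ${entry.name}: ${Math.round(entry.duration)}ms`, entry.target);
  }
}).observe({ type: "event", durationThreshold: 200, buffered: true });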
2) Output shape (no essays—give me JSON)
I force a strict schema so recommendations are ranked and testable:
{
  "url": "https://example.com/",
  "summary": "Plain-English 1–2 sentences",
  "vitals": { "INP": {"field_ms": null, "lab_ms": null}, "LCP": {"field_ms": null, "lab_ms": null}, "CLS": {"field": null} },
  "root_causes": [
    { "id": "long-task-in-click-handler", "evidence": ["trace: 280ms task @ app.js:342"], "confidence": "high" }
  ],
  "fixes_ranked": [
    {
      "title": "Split 280ms click handler; lazy-init 3P widget",
      "why": "Reduces interaction blocking → INP",
      "steps": ["Wrap heavy work in requestIdleCallback", "Dynamic import 3P code post-interaction"],
      "est_impact_ms": 120,
      "effort": "M"
    }
  ],
  "regression_tests": ["LH budget: INP_lab ≤ 200ms", "Long tasks count ≤ 10", "No 3P init inside click handlers"]
}
3) My exact intake prompt (ChatGPT)
SYSTEM:
You are a senior web performance engineer. Be conservative, cite evidence, and output ONLY valid JSON matching the schema.
USER:
Goal: Find INP root causes and return ranked, code-level fixes with ms estimates. Audience: engineer implementing this week.
Inputs:
- Lighthouse JSON (mobile): <paste audits.metrics/diagnostics or full JSON>
- Field summary (CrUX/RUM): INP p75: 320ms (mobile); LCP p75: 2.7s; CLS p75: 0.05
- Optional trace notes: 280ms long task during click on #checkout; 3P widget init inside handler.
Schema (STRICT): <paste schema above>
Rules:
- Prefer INP fixes over LCP/CLS if trade-offs appear.
- Steps must be specific (what file, which handler, which API).
- Provide est_impact_ms for each fix. Use null only if unknowable.
Return JSON only.
4) Minimal, practical code (run local or CI)
// audit.mjs — fetch Lighthouse via PSI, ask ChatGPT, write ai-audit.json
import fs from "node:fs/promises";

const URL_TO_AUDIT = process.argv[2] || "https://example.com/";
const PSI_KEY = process.env.PSI_KEY; // PageSpeed Insights key
const OPENAI_KEY = process.env.OPENAI_API_KEY; // OpenAI key

const psi = new URL("https://www.googleapis.com/pagespeedonline/v5/runPagespeed");
psi.searchParams.set("url", URL_TO_AUDIT);
psi.searchParams.set("strategy", "MOBILE");
psi.searchParams.set("category", "PERFORMANCE");
if (PSI_KEY) psi.searchParams.set("key", PSI_KEY);

const psiRes = await fetch(psi);
if (!psiRes.ok) throw new Error(`PSI failed: ${psiRes.status}`);
const psiJson = await psiRes.json();

// extract compact metric block to keep token usage sane
const labs = psiJson.lighthouseResult?.audits?.metrics ?? {};
await fs.writeFile("./lighthouse.json", JSON.stringify(psiJson, null, 2));

const prompt = `
URL: ${URL_TO_AUDIT}
Field Summary: {"INP_p75_ms": null, "LCP_p75_ms": null, "CLS_p75": null}
Lighthouse metrics (mobile): ${JSON.stringify(labs)}
Schema:
{
  "url":"string","summary":"string",
  "vitals":{"INP":{"field_ms":null,"lab_ms":null},"LCP":{"field_ms":null,"lab_ms":null},"CLS":{"field":null}},
  "root_causes":[{"id":"string","evidence":["string"],"confidence":"low|med|high"}],
  "fixes_ranked":[{"title":"string","why":"string","steps":["string"],"est_impact_ms":0,"effort":"S|M|L"}],
  "regression_tests":["string"]
}
Rules: Prefer INP; concrete steps; include est_impact_ms. Output JSON only.
`;

const aiRes = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: { "Authorization": `Bearer ${OPENAI_KEY}`, "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    temperature: 0.2,
    messages: [
      { role: "system", content: "You are a strict performance auditor. Output valid JSON only." },
      { role: "user", content: prompt }
    ]
  })
});
if (!aiRes.ok) throw new Error(`OpenAI failed: ${aiRes.status}`);

const aiJson = await aiRes.json();
const content = aiJson.choices?.[0]?.message?.content ?? "{}";
await fs.writeFile("./ai-audit.json", content);
console.log("Wrote ai-audit.json");
Run: OPENAI_API_KEY=xxx PSI_KEY=xxx node audit.mjs https://your-site.com/
Outputs: lighthouse.json (raw), ai-audit.json (ranked fixes).
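Because the whole point is a verifiable output, I would also refuse to act on a malformed response. Here is a minimal sanity check; check-audit.mjs is a hypothetical companion script, and the required keys simply mirror the schema above.

// check-audit.mjs: bail out if the model's output doesn't match the expected shape
import fs from "node:fs/promises";

const audit = JSON.parse(await fs.readFile("./ai-audit.json", "utf8")); // throws if not valid JSON

for (const key of ["url", "summary", "vitals", "root_causes", "fixes_ranked", "regression_tests"]) {
  if (!(key in audit)) throw new Error(`ai-audit.json is missing "${key}"`);
}
for (const fix of audit.fixes_ranked) {
  if (typeof fix.est_impact_ms !== "number" && fix.est_impact_ms !== null) {
    throw new Error(`Fix "${fix.title}" lacks a usable est_impact_ms`);
  }
}
console.log(`OK: ${audit.fixes_ranked.length} ranked fixes for ${audit.url}`);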
5) Example fix pattern (before → after)
Defer heavy work out of the click path
// BEFORE: expensive sync work inside click handler (blocks interaction)
btn.addEventListener("click", (e) => {
  heavyWidget.init(); // 200–300ms main-thread block
  doCheckout(e);
});
// AFTER: keep interaction "thin"; lazy-init on idle; code-split heavy module
btn.addEventListener("click", (e) => {
  requestAnimationFrame(() => doCheckout(e)); // interaction remains responsive
  if (!window.__widgetReady) {
    // note: requestIdleCallback is unavailable in Safari; fall back to setTimeout there
    requestIdleCallback(() => {
      import("./heavyWidget.js").then(m => { m.init(); window.__widgetReady = true; });
    });
  }
});
Typical impact observed: −120–180ms from the click’s blocking time → measurable INP improvement.
6) Verification (we don’t ship vibes)
- Lab: re-run Lighthouse (mobile). Track Interaction to Next Paint and long tasks count.
- Profile: DevTools Performance → confirm the specific long task is gone or split.
- Field: watch p75 INP trend (CrUX/RUM) over the rolling 28-day window. Log deploy date for attribution.
- Budgets: add LHCI budgets to CI (e.g., INP_lab ≤ 200ms, long tasks ≤ 10, JS ≤ 250KB); a minimal config sketch follows this list.
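Here is a minimal lighthouserc.js sketch for that budget step. One honest caveat: Lighthouse's default navigation runs do not record INP, so Total Blocking Time is the usual lab stand-in; the staging URL and the thresholds below are illustrative assumptions, not prescriptions.

// lighthouserc.js: fail CI when lab metrics drift past the budget
module.exports = {
  ci: {
    collect: {
      url: ["https://staging.example.com/"], // placeholder staging URL
      numberOfRuns: 3,
    },
    assert: {
      assertions: {
        "categories:performance": ["error", { minScore: 0.9 }],
        // Navigation-mode Lighthouse has no INP, so budget TBT as the lab proxy.
        "total-blocking-time": ["error", { maxNumericValue: 200 }],
        // Keep shipped JavaScript under roughly 250 KB (value is in bytes).
        "resource-summary:script:size": ["error", { maxNumericValue: 250000 }],
      },
    },
  },
};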
7) 60-second quick check (no CI)
When I’m triaging a single URL, I paste this into ChatGPT with a Lighthouse excerpt:
Act as a conservative performance engineer. Using the Lighthouse metrics/diagnostics below, return:
1) A 2-sentence summary in plain English.
2) A ranked list of 3–5 INP-focused fixes with code-level steps and estimated ms impact.
Prefer splitting long tasks, deferring third-party, and shrinking handler work. Keep it practical.
Lighthouse excerpt:
<paste audits.metrics + diagnostics>
The philosophy is simple: evidence → structured AI analysis → small, safe code changes → proof. We respect constraints, measure deltas, and ship fewer regressions.
Implementation: Build AI Into Your Workflow
The smartest way to use AI audits is not as a one-off experiment but as a repeatable guardrail in production. Integrate AI processes into CI/CD pipelines so regressions trigger alerts before deployment, and store baseline data so you can measure drift over time. For example, we use automated audits in the same loop as our Checklist Before Launching a Site to maintain quality across an entire site. Every release gets tested for SEO, security, and performance before it ever reaches production.
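One lightweight way to wire that guardrail in is to keep a committed baseline of lab metrics and fail the pipeline when a fresh run drifts past it. The sketch below makes two assumptions: baseline.json is a previously saved PSI/Lighthouse response, and the 10% tolerance is an arbitrary starting point.

// compare-baseline.mjs: exit nonzero when key lab metrics regress versus the stored baseline
import fs from "node:fs/promises";

const TOLERANCE = 1.10; // allow ~10% run-to-run noise before calling it a regression
const current = JSON.parse(await fs.readFile("./lighthouse.json", "utf8"));
const baseline = JSON.parse(await fs.readFile("./baseline.json", "utf8"));

const metric = (report, id) => report.lighthouseResult?.audits?.[id]?.numericValue;

let failed = false;
for (const id of ["total-blocking-time", "largest-contentful-paint", "cumulative-layout-shift"]) {
  const was = metric(baseline, id);
  const now = metric(current, id);
  if (was != null && now != null && now > was * TOLERANCE) {
    console.error(`REGRESSION: ${id} went from ${was} to ${now}`);
    failed = true;
  }
}
process.exit(failed ? 1 : 0); // nonzero exit blocks the deploy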
None of this is to say that AI replaces human employees. Engineers should still validate fixes with manual inspection and diagnose the deeper problems that automated audits are likely to miss — and there are plenty of those. AI is appropriate as a second set of eyes; just do not treat it as an infallible oracle. By pairing automation with professional judgment, you get both speed and rigor at a fraction of the resource expenditure. The goal is discipline: fewer regressions, more predictable launches, and higher confidence in every release.
Fix Responsiveness With Intelligence
As we always say, Google does not rank intentions; it ranks experiences — and INP, CLS, and LCP quantify that experience and point to actionable technical improvements. AI-assisted audits let you see hidden drag, prioritize intelligently, and fix with focus. This combination of machine speed and human discipline is the new baseline for competitive websites.
The payoff compounds: higher Lighthouse scores, stronger SEO, and smoother conversions; all contribute to the deeper win of establishing user trust. When your site feels instant and reliable, users return. That is the real measure of performance.