SEO Data Analysis & Visualization with ChatGPT's Code Interpreter

Exploring AI possibilities

Jul 28, 2023

ChatGPT’s experimental Code Interpreter model is here and OpenAI claims it’s especially useful for doing data analysis and visualization. Let’s put it to the test and run through some examples.

I’m going to break this down as follows:

A quick introduction
3 example scenarios with detailed steps to replicate
Initial thoughts and takeaways

Introduction

The TL;DR is: ChatGPT’s code interpreter is good. It’s really good. If you work in SEO with lots of data, an AI-powered assistant now makes your life easier.

Before we get into the examples, let’s just make a few things clear:

Code Interpreter is an experimental model, so use it responsibly
Sometimes it doesn’t work, or it gives you bad or inconsistent outputs
Good prompts are critical: you need to experiment and get familiar with the conversational approach

You’ve got to try it for yourself and experiment. You need to be specific and detailed with your prompts, but it soon starts to become intuitive.

Let’s jump into the examples.

Example 1 — Custom CTR Model

In this first example, we are going to use Google Search Console (GSC) performance data to build a quick custom CTR model. The steps are as follows:

1. Export GSC performance data. We recommend using at least 12 months of data and using ‘Queries.csv’ for query-level CTR data
2. Upload ‘Queries.csv’ into ChatGPT
3. Make your requests with prompts to analyze the data
4. Make your requests with prompts to visualize the data

Let’s walk through the prompts.

Work out the average CTR by rounded position. Show the data for positions 1-20 in a modern table format with CTR displayed as a percentage

As we watch ChatGPT interpret the data and explain its process, we can validate that these steps make sense. Then it’ll quickly run those steps and present the data in the table format we requested.

This is exactly the data we want. We now have a custom CTR model, based on internal query data, that can help us better understand the available search opportunity.
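To sanity-check the table by hand, the calculation behind this prompt can be sketched in plain Python. This is a minimal sketch and not the code ChatGPT generated: the sample rows are invented, the column names (`Top queries`, `CTR`, `Position`) are assumptions about the GSC export, and it takes a simple per-query mean rather than an impression-weighted average.

```python
from collections import defaultdict

# Invented sample rows standing in for 'Queries.csv'. The column names are
# assumptions about the GSC export; adjust them to match your file.
rows = [
    {"Top queries": "blue widgets", "CTR": "30%", "Position": "1.2"},
    {"Top queries": "buy widgets", "CTR": "20%", "Position": "1.4"},
    {"Top queries": "widget repair", "CTR": "12%", "Position": "2.3"},
    {"Top queries": "widget shop", "CTR": "8%", "Position": "1.8"},
    {"Top queries": "widget sizes", "CTR": "6%", "Position": "3.1"},
    {"Top queries": "what is a widget", "CTR": "0.5%", "Position": "24.0"},
]


def parse_ctr(value):
    """Convert a percentage string such as '5.2%' into the fraction 0.052."""
    return float(value.strip().rstrip("%")) / 100


def ctr_by_position(rows, max_position=20):
    """Mean CTR per rounded position, limited to positions 1..max_position."""
    buckets = defaultdict(list)
    for row in rows:
        # Round half up; Python's built-in round() rounds halves to even.
        position = int(float(row["Position"]) + 0.5)
        if 1 <= position <= max_position:
            buckets[position].append(parse_ctr(row["CTR"]))
    return {pos: sum(ctrs) / len(ctrs) for pos, ctrs in sorted(buckets.items())}


ctr_model = ctr_by_position(rows)
for position, ctr in ctr_model.items():
    print(f"{position:>2}  {ctr:.1%}")
```

With a real export, `rows` could be read with `csv.DictReader` from the standard library instead of being typed in by hand.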

Now how do we visualize this in a more meaningful way?

Visualize the table in a graph showing the curve line for positions 1 through 20

This is a much better way to visualize the CTR curve. We could combine this visual with insights we can derive from the data.

Improving from position 3 to 2 will result in an ~84% increase in clicks
Improving from position 4 to 2 will result in an ~152% increase in clicks
Improving from position 7 to 2 will result in an ~863% increase in clicks

Going back to our original table, we can easily request additional outputs based on how we want to manipulate the data. For example, let’s say I want to better understand the percentage improvement between positions and the effect that has on total traffic.

Add 2 new columns: show the percentage increase when a position improves, show the total clicks for 100k searches for each position

This gives me a better directional understanding of search traffic opportunities, providing useful context when looking at the search volume.
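The two extra columns are simple arithmetic on the CTR model, which makes them easy to verify. The sketch below assumes the percentage increase is measured for a one-place improvement, and the CTR values are placeholders for illustration, not figures from this analysis.

```python
# Hypothetical CTR model (position -> CTR as a fraction). These values are
# placeholders for illustration only.
ctr_model = {1: 0.30, 2: 0.15, 3: 0.10, 4: 0.05}


def click_uplift(ctr_model, from_pos, to_pos):
    """Percentage change in clicks when moving from from_pos to to_pos."""
    return (ctr_model[to_pos] / ctr_model[from_pos] - 1) * 100


def extend_table(ctr_model, searches=100_000):
    """Add a one-place uplift column and expected clicks per `searches` searches."""
    table = []
    for pos in sorted(ctr_model):
        better = pos - 1
        uplift = click_uplift(ctr_model, pos, better) if better in ctr_model else None
        table.append({
            "position": pos,
            "ctr": ctr_model[pos],
            "uplift_pct": uplift,  # None for position 1: nowhere left to climb
            "clicks_per_100k": round(ctr_model[pos] * searches),
        })
    return table


for row in extend_table(ctr_model):
    print(row)
```

The same `click_uplift` helper also covers multi-place jumps such as position 4 to position 2, since it only compares the two CTR values.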

I’m sure we could make a nice visual of the above too, but let’s move on to the second example.

Example 2 — Using CTR Models to Understand Traffic Opportunity

In our second example, we’re going to combine our CTR model with third-party keyword data from Semrush. The steps are as follows:

1. Export Semrush keyword data. We want to include keywords, search volume & keyword difficulty, and we’ll save them as ‘Keywords.csv’
2. Upload ‘Queries.csv’ into ChatGPT and build a custom CTR model per the steps above
3. Upload ‘Keywords.csv’ into ChatGPT
4. Make your requests with prompts to analyze the data
5. Make your requests with prompts to visualize the data

We’ll assume the CTR model has been created and walk through the prompts for pulling in the Semrush keyword data.

This time we’ll combine some of our prompts and ask for the visual right away, rather than getting a table first and then a visual as in the first example. As you experiment with prompts you’ll start to figure out how to string things together.

Calculate traffic for each keyword for positions 1 through 20. Traffic = volume * CTR by position. Aggregate all the keyword data and visualize as a graph showing the curve line for positions 1 through 20

Remember how we included keyword difficulty in our data? We can now make custom views of our data with KD% breakdowns.

Modify the graph by filtering to use only keywords with a 'Keyword Difficulty' number of 30 or lower. Keep the same color trendline

The formatting of visuals can be inconsistent, so you can get a little more specific in your prompts by including things like ‘keep the same color trendline’.

What if we combine these 2 views?

Show 2 lines: one with all keywords and one with only keywords with a 'keyword difficulty' number of 30 or lower

This is a good way to add some context to the traffic opportunity. The curve growth for KD 30 and lower looks a lot less pronounced in this view.
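The numbers behind both lines can be reproduced with a short sketch. It assumes the Semrush export uses the column names `Volume` and `Keyword Difficulty` (as in the prompts), and it uses an invented keyword list and a hypothetical placeholder CTR curve rather than real data.

```python
# Hypothetical placeholder CTR curve for positions 1-20 (not real data).
ctr_model = {pos: 0.30 / pos for pos in range(1, 21)}

# Invented rows standing in for 'Keywords.csv'.
keywords = [
    {"Keyword": "blue widgets", "Volume": 1000, "Keyword Difficulty": 25},
    {"Keyword": "widget repair", "Volume": 500, "Keyword Difficulty": 60},
    {"Keyword": "cheap widgets", "Volume": 2000, "Keyword Difficulty": 30},
]


def traffic_curve(keywords, ctr_model, max_kd=None):
    """Total traffic if every selected keyword ranked at each position.

    Traffic = volume * CTR by position; max_kd optionally filters by difficulty.
    """
    selected = [
        kw for kw in keywords
        if max_kd is None or kw["Keyword Difficulty"] <= max_kd
    ]
    return {
        pos: sum(kw["Volume"] * ctr for kw in selected)
        for pos, ctr in sorted(ctr_model.items())
    }


all_keywords = traffic_curve(keywords, ctr_model)
low_kd = traffic_curve(keywords, ctr_model, max_kd=30)
for pos in (1, 5, 10, 20):
    print(pos, round(all_keywords[pos]), round(low_kd[pos]))
```

Because every keyword is multiplied by the same CTR at a given position, each line is the CTR curve scaled by that group's total search volume. The filtered line therefore has the same shape at a lower height.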

The ability to quickly make these modifications is really powerful. For another quick example, you could segment keywords by search intent, e.g. informational, commercial, transactional. This is a powerful way for ecommerce brands to segment conversion-focused keywords that can be used with revenue metrics.

Example 3 — Using CTR Models & Rank Data to Forecast Growth

In our third example, we’re going to combine our CTR model with third-party rank data from Semrush. The steps are as follows:

1. Export Semrush rank data. We want to include keywords, position, search volume & keyword difficulty, and we’ll save them as ‘Rankings.csv’
2. Upload ‘Queries.csv’ into ChatGPT and build a custom CTR model per the steps above
3. Upload ‘Rankings.csv’ into ChatGPT
4. Make your requests with prompts to analyze the data
5. Make your requests with prompts to visualize the data

We start again with building our custom CTR model and then we’re ready to walk through the prompts for pulling in the Semrush rank data.

For our forecast, we’re going to make a very broad assumption that every keyword will improve by 1 position each month for the next 12 months. We will use our CTR model to calculate estimated traffic for each keyword and visualize how traffic will change over the 12 months.

Let’s pull our estimated traffic first.

Calculate the current traffic for each keyword using the 'Average CTR' based on 'Position'. Traffic = volume * CTR by position. Include keyword 'Volume' and 'Position' and present data in a table. Exclude keywords with positions >20

We specified ‘exclude keywords with positions >20’ because our CTR model only goes to 20 and there are keywords ranking >20 in the Semrush data.
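As a rough illustration of what this prompt asks for, the lookup can be sketched as follows. The rankings and the CTR curve are invented placeholders, and the column names `Keyword`, `Position` and `Volume` are assumed to match the export.

```python
# Hypothetical placeholder CTR curve for positions 1-20 (not real data).
ctr_model = {pos: 0.30 / pos for pos in range(1, 21)}

# Invented rows standing in for 'Rankings.csv'.
rankings = [
    {"Keyword": "blue widgets", "Position": 2, "Volume": 1000},
    {"Keyword": "widget repair", "Position": 12, "Volume": 500},
    {"Keyword": "cheap widgets", "Position": 35, "Volume": 2000},
]


def current_traffic(rankings, ctr_model):
    """Estimated traffic per keyword; skips positions the CTR model lacks."""
    table = []
    for row in rankings:
        ctr = ctr_model.get(row["Position"])
        if ctr is None:  # e.g. a position beyond 20
            continue
        table.append({**row, "Traffic": row["Volume"] * ctr})
    return table


for row in current_traffic(rankings, ctr_model):
    print(row["Keyword"], row["Position"], row["Volume"], round(row["Traffic"], 1))
```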

Now let’s build out our 12 months of improvements.

Keeping the table format above, help forecast 12 months of traffic where the position improves by 1 place each month. Once a position reaches 1 it will stay in that position

This looks good, but a table like this is hard to interpret at a glance. You know what’s next.

Aggregate all the traffic data by month and visualize as a graph showing the curve line for months 1 to 12

We made a slight adjustment to the x-axis labels, and yes, those small modifications work exactly as you’d expect.
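The forecast logic itself is small enough to check independently. The sketch below is one possible reading of the prompts, with month 0 treated as the current state; the rankings and CTR curve are invented placeholders.

```python
# Hypothetical placeholder CTR curve for positions 1-20 (not real data).
ctr_model = {pos: 0.30 / pos for pos in range(1, 21)}

# Invented rankings; every position must exist in the CTR model (1-20).
rankings = [
    {"Keyword": "blue widgets", "Position": 2, "Volume": 1000},
    {"Keyword": "widget repair", "Position": 12, "Volume": 500},
]


def forecast(rankings, ctr_model, months=12):
    """Total estimated traffic per month; index 0 is the current state."""
    totals = []
    for month in range(months + 1):
        total = 0.0
        for row in rankings:
            # Improve one place per month, but never go above position 1.
            position = max(1, row["Position"] - month)
            total += row["Volume"] * ctr_model[position]
        totals.append(total)
    return totals


monthly_totals = forecast(rankings, ctr_model)
for month, total in enumerate(monthly_totals):
    print(f"Month {month:>2}: {total:,.0f}")
```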

One thing that stands out with this forecast is the immediate growth. SEO takes time, right? Well, in our data we do have a lot of keywords sitting in prominent positions, so it makes sense that they’re able to produce short-term traffic improvements.

Let’s modify this to show our “striking distance” keywords that sit in positions 11-20.

Repeat but for keywords in positions 11-20. Use the same color trendline

Our model tells us that it takes around 6 months for keywords in positions 11-20 to move into positions high enough to show a meaningful increase in traffic.

The good thing here is that growth is going to continue past 12 months. Let’s modify it so we can see a 24-month view.

Keep the table the same but show me 24 months using the same position improvement method

We can now see that it’s going to take ~18 months for keywords in positions 11-20 to hit their max traffic potential.
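These timings follow from the one-place-per-month assumption, and a quick calculation shows why. The sketch below is plain arithmetic under that assumption; the exact plateau month in any real forecast depends on the lowest starting position actually present in the data.

```python
def months_to_reach_top(position):
    """Months needed to reach position 1 when improving one place per month."""
    return max(position - 1, 0)


def position_after(position, months):
    """Position after a number of months, never better than 1."""
    return max(1, position - months)


striking_distance = range(11, 21)  # positions 11-20
fastest = min(months_to_reach_top(p) for p in striking_distance)
slowest = max(months_to_reach_top(p) for p in striking_distance)

print(f"Positions 11-20 reach position 1 after {fastest} to {slowest} months")
print("After 6 months they sit in positions",
      position_after(11, 6), "to", position_after(20, 6))
```

A keyword starting in position 11 needs 10 months to reach the top and one starting in position 20 needs 19, so the group as a whole stops growing somewhere in the late teens of months.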

What if we combine views?

Modify the graph to include 2 lines: one for only keywords in positions 1-5, and one for only keywords in positions 6-20

If we assume we’re making site-wide improvements for all of these keywords, we can see that there is potential for short-term wins, but it will take longer for keywords outside of the top 5 positions to drive significant traffic increases.

What if we have limited resources and can only focus on one of these groups at a time? For example, let’s assume that we won’t have available resources to focus on keywords in positions 6-20 for the first 4 months, so we expect growth to start from month 5 onwards.

Repeat but have keywords in positions 6-20 only improving position from month 5 onwards. They will improve with the same position logic as above

Our blue line now has a nice gradual curve.
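The delayed start is a one-line change to the position logic. The sketch below is a hedged illustration: `first_improving_month` is a hypothetical parameter name, and the rankings and CTR curve are invented placeholders.

```python
# Hypothetical placeholder CTR curve for positions 1-20 (not real data).
ctr_model = {pos: 0.30 / pos for pos in range(1, 21)}

# Invented rankings standing in for 'Rankings.csv'.
rankings = [
    {"Keyword": "blue widgets", "Position": 3, "Volume": 1000},
    {"Keyword": "widget repair", "Position": 12, "Volume": 500},
]


def position_in_month(start_position, month, first_improving_month=1):
    """Position in a given month when improvement begins in first_improving_month."""
    months_improving = max(0, month - first_improving_month + 1)
    return max(1, start_position - months_improving)


def group_forecast(rankings, ctr_model, low, high, months=24,
                   first_improving_month=1):
    """Monthly traffic totals for keywords starting in positions low..high."""
    group = [r for r in rankings if low <= r["Position"] <= high]
    return [
        sum(
            r["Volume"]
            * ctr_model[position_in_month(r["Position"], m, first_improving_month)]
            for r in group
        )
        for m in range(months + 1)
    ]


top_five = group_forecast(rankings, ctr_model, 1, 5)
delayed = group_forecast(rankings, ctr_model, 6, 20, first_improving_month=5)
for month in (0, 4, 5, 12, 24):
    print(month, round(top_five[month], 1), round(delayed[month], 1))
```

The delayed group stays flat through month 4 and only starts climbing in month 5, which is what produces the more gradual curve.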

As you can see, there are lots of different ways you can modify and customize your visuals, as well as the ability to blend data sources for additional refinements.

While it’s generally good to be more detailed and specific with prompts, you can be more vague if you’re looking for inspiration. Use ‘help me visualize’ and make refinements based on your outputs.

Initial Thoughts & Takeaways

I'm having a lot of fun playing with the code interpreter, so that's a pretty good endorsement if you ask me. This is a significant development for SEOs working with data, as it unlocks new possibilities and improves existing workflows.

It’s worth spending some time with the code interpreter to fully understand its capabilities. Hopefully, the examples above can inspire you and help you get started.

The bigger picture here is the need to figure out the right way to integrate AI responsibly into workflows, understand what it can and can't handle, and provide details on how to achieve consistently reliable outputs.

I'm excited to continue exploring possibilities and refining my approach. If you haven't started experimenting yet, now is a good time to get started.

Simon’s Substack
