Use IMPORTXML to Compile YouTube Analytics for Any User
Paul Koch, Former Data & Analytics Director
Scrape your way to a better picture of account performance.
Hi there — Google recently changed their Sheets setup, so the solution below no longer works. Feel free to use this article as a starting point for your own work, and share in the comments if you're able to create a functioning work-around. We're hoping to post an updated solution soon ourselves!
"How does our YouTube posting frequency compare to our competitors?” “Are our organization’s videos getting longer or shorter over time?” “How does this year's publishing volume compare to each of the past 5 years?” The answers to these questions are surprisingly difficult to extract from YouTube’s Analytics interface or API, but simple to get through page scraping.
Here’s the type of data you can quickly get, even for other usernames, such as NHLVideo:
A quick shout-out: the process below builds on the solid resources written by Nate Shivar and Tom Woodward.
To start, you’ll need Google Chrome with the Scraper add-on, as well as Google Sheets.
Go to the username’s “Videos” tab, for example, https://www.youtube.com/user/NHLVideo/videos.
Click “Load More” until all videos have loaded.
Right click on a video title and select “Scrape Similar.”
Scraper will open a window that lists the title and URL; click “Copy to clipboard.”
Make a copy of this Google Sheet, paste the data into cell A3, and drag down the formulas from the remaining cells. That’s it!
Here’s a look at what’s happening under the hood of the spreadsheet. In each of these, we’re assuming that C3 is the full YouTube URL of a given video.
=value(REGEXREPLACE(text(importxml(C3,"//*[contains(@class, 'watch-view-count')]"),0)," view(s)?",""))
The IMPORTXML function pulls in the class of 'watch-view-count', which includes the words “view” or “views.” Because we just want the number, IMPORTXML strips off the text.
Video Duration in Seconds:
The IMPORTXML function pulls in the itemprop of 'duration', but includes the time in a format such as 'PT137M17S'. In this format, the number of minutes appear between the T and the M, and the number of seconds appear between the M and the S. REGEXEXTRACT pulls out these minute and second values, then we convert everything to total seconds.
The IMPORTXML function pulls in the class of 'like-button-renderer-like-button', but Google Sheets will return an “N/A” if there are no likes. We use the ISNA formula to check for the error and, if detected, return a 0.
This formula is similar to the Likes except that we’re importing the class 'like-button-renderer-dislike-button'.
=regexreplace(importxml(C3,"//*[contains(@class, 'watch-time-text')]"),"((Uploaded)|(Published)|(Streamed live)) on ","")
The IMPORTXML function pulls in the class of 'watch-time-text', but the results can vary, potentially saying, “Uploaded on,” “Published on,” or “Streamed live on,” and no matter what, we just want the date. The REGEXREPLACE formula strips off this extraneous information.
Hope this post helped you better understand your and your competitors’ YouTube videos!