By Atanu Maity

Want Data from YouTube? Great Idea! Let's Have It

Like other social media platforms, YouTube has become a point of great interest for data science folks looking to collect data. By data, I mean the number of views, likes, dislikes, shares, and lots of other metrics. These metrics can be very useful for video-specific content analysis. Along with them, we can also extract comments. YouTube comments are a great source of public reviews on a particular video. You can apply several text mining techniques to them to extract meaningful insights, which can really help you understand the mood of the audience.


Now the question is: how do we get that data?


Here comes the topic of this discussion. We will show you how to do it. Because of space limitations, though, we will only cover the extraction methodology for comments, along with hands-on Python code. Great!


The first thing you need to do is get an API key from the Google Developer Console. To fetch any kind of Google data (YouTube is, after all, a Google product), you need a developer API key of your own. Using this key in your code authorizes you to fetch data from their services. How to get the key, and the detailed methodology, are out of the scope of this discussion; we will cover that in another post.
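To give an idea of how the key is used, here is a minimal sketch of building the API client with the official google-api-python-client package (the variable name youtube_client is mine, for illustration):

from googleapiclient.discovery import build

# Paste your own key from the Google Developer Console here
DEVELOPER_KEY = "your youtube-api key"

# Build a client for YouTube Data API v3; every request made
# through this client is authorized with your developer key
youtube_client = build('youtube', 'v3', developerKey=DEVELOPER_KEY)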


Packages to be installed: google-api-python-client, oauth2client


How to install:

pip install google-api-python-client
pip install oauth2client


How to do it: Instead of putting the full code here, I have put it on GitHub. There are three scripts, which depend on one another.

The scripts are crawler.py, youtubecrawlerfunctionmode.py, and youtube.py.

crawler.py depends on the other two scripts. The only thing you have to change is in youtubecrawlerfunctionmode.py and youtube.py: in the DEVELOPER_KEY = "your youtube-api key" line, supply your own API key collected from your Google Developer Console. That's it.
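To give you a flavour of what the helper scripts do under the hood, here is a simplified sketch of the same idea (an illustration, not the exact code from the repository; the function names search_video_ids and fetch_comments are placeholders of mine): first search for videos matching the term, then pull the comment threads for each video id.

from googleapiclient.discovery import build

DEVELOPER_KEY = "your youtube-api key"
youtube_client = build('youtube', 'v3', developerKey=DEVELOPER_KEY)

def search_video_ids(search_term, max_results=10):
    # Search for videos matching the term and collect their video ids
    response = youtube_client.search().list(
        q=search_term, part='id', type='video', maxResults=max_results
    ).execute()
    return [item['id']['videoId'] for item in response['items']]

def fetch_comments(video_id, max_results=100):
    # Fetch top-level comment threads for one video as a {user: comment} dictionary
    response = youtube_client.commentThreads().list(
        part='snippet', videoId=video_id,
        textFormat='plainText', maxResults=max_results
    ).execute()
    comments = {}
    for item in response['items']:
        snippet = item['snippet']['topLevelComment']['snippet']
        comments[snippet['authorDisplayName']] = snippet['textDisplay']
    return comments

Note that the real API paginates results and enforces daily quotas, so a full crawler also has to follow the nextPageToken field in each response.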


Now suppose you want to extract comments from videos matching the search term 'Coldplay Yellow'. The only script you have to run is crawler.py, passing your desired search term to it.


crawler.py:

from youtube import youtube      # youtube.py from the repository

search_term = 'coldplay yellow'  # your desired search term

output = youtube(search_term)    # list of {v_id: {user: comment}} dictionaries


In output, you will get a list of dictionaries, where each dictionary contains the comments for a particular v_id (video id: in a YouTube link like https://www.youtube.com/watch?v=RJaj39jI-qk, the v_id is 'RJaj39jI-qk') as {'user': 'comment'} key-value pairs. One search_term returns multiple videos, hence multiple video ids, so the output will look like:

[{'v_id1': {'user1': 'comment1', 'user2': 'comment2', ...}}, ..., {'v_idn': {'user1': 'comment1', 'user2': 'comment2', ...}}]

Afterwards, you can convert this list structure into a dataframe and use the data however you like.
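For example, flattening output into a pandas dataframe (assuming the list-of-dictionaries shape described above) could be sketched like this:

import pandas as pd

rows = []
for video in output:                    # each element maps one v_id to its comments
    for v_id, comments in video.items():
        for user, comment in comments.items():
            rows.append({'v_id': v_id, 'user': user, 'comment': comment})

df = pd.DataFrame(rows)                 # columns: v_id, user, comment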


In upcoming posts, if we get a chance, we will discuss the other two scripts and the methodology in detail. Until then, try your hands at this code and let us know if you get stuck anywhere.


Happy Learning.
