Python Simplified

How to Scrape Tweets Without Twitter’s API Using TWINT

Introduction

If you are working on a project that deals with tweets, you are probably using the Twitter API. To use the Twitter API, you first need to apply for a developer account and get it approved. Once approved, you are given an API Key, API Key Secret, Access Token, and Access Token Secret. Only with these keys and tokens can you start using the Twitter API to fetch tweets for your project.

With the Twint package, you don't need any of these keys or tokens. All you need to do is install Twint and start using it right away.

What is Twint

Twint is an advanced tool for Twitter scraping. We can use it to scrape any user's followers, following, tweets, etc. without having to use the Twitter API.

Here are some of the benefits:

  • The Twitter API restricts you to a user's last 3,200 tweets, whereas Twint can fetch almost all tweets.
  • Setup is quick because there is no Twitter API access to configure.
  • It can be used anonymously, without signing up for Twitter.
  • It's free, with no pricing limits.
  • It provides easy-to-use options to store scraped tweets in different formats: CSV, JSON, SQLite, and Elasticsearch.

Installation

Using Pip

				
					pip3 install twint
				
			

Directly from Git

				
					git clone --depth=1 https://github.com/twintproject/twint.git
cd twint
pip3 install . -r requirements.txt
				
			

Initial setup

First, we create a Twint configuration object called c and set different parameters on it. These parameters define how the tweets are scraped.

In the example below, we set only two parameters, Username and Limit. Username is the Twitter handle of the user, and Limit is the number of tweets to scrape. Limit works in increments of 100, so Limit = 1 means 100 tweets. Finally, twint.run.Search scrapes Twitter and prints the tweets.

				
					import twint

# Configure
c = twint.Config()
c.Username = "sonusood"
c.Limit = 1

# Run
twint.run.Search(c)
				
			
				
					# Sample output
---------------
1401229868600225794 2021-06-05 17:30:08 +0000 <SonuSood> There are many stories like these that I have experienced very closely in last few days. We get to know only about few. It's high time that the previleged ones come forward and support such needy families. They need us. Please find them because they might not be able to find u🙏
1401229866968576007 2021-06-05 17:30:08 +0000 <SonuSood> City: Nagpur Day:1 Mother, Father, Brother &amp; Sister all tested positive. Day:7  Brother dies, Mom n Dad who are critical are not informed.  Day:9 Father dies without knowing the son passed away 2 days ago.  Day:10 Mother is very critical.  Only survivor is this 19 yr old girl💔
1401133094615150592 2021-06-05 11:05:36 +0000 <SonuSood> #Throwback to the modelling days in Mumbai.  https://t.co/Fxpc9KDwEJ
1401132849906917377 2021-06-05 11:04:37 +0000 <SonuSood> You are a Star 🌟
1401121032145018882 2021-06-05 10:17:40 +0000 <SonuSood> This is so awesome 🙏
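
Each printed line in the sample output follows the same layout: tweet ID, date, time, UTC offset, the username in angle brackets, and then the tweet text. If you capture this console output and want the fields separately, a regular expression is one option. The following is a minimal sketch; the pattern is inferred from the sample lines above rather than from any Twint specification, and parse_printed_tweet and PrintedTweet are hypothetical names introduced here.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the sample output (an assumption, not a documented format):
# <tweet id> <date> <time> <UTC offset> <username in angle brackets> <text>
LINE_PATTERN = re.compile(
    r"^(?P<id>\d+) (?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<offset>[+-]\d{4}) <(?P<username>[^>]+)> (?P<text>.*)$",
    re.DOTALL,
)


class PrintedTweet(NamedTuple):
    tweet_id: int
    date: str
    time: str
    offset: str
    username: str
    text: str


def parse_printed_tweet(line: str) -> Optional[PrintedTweet]:
    """Return the parsed fields, or None if the line does not match the layout."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return PrintedTweet(
        tweet_id=int(match["id"]),
        date=match["date"],
        time=match["time"],
        offset=match["offset"],
        username=match["username"],
        text=match["text"],
    )


# One of the lines from the sample output above.
sample = "1401132849906917377 2021-06-05 11:04:37 +0000 <SonuSood> You are a Star 🌟"
parsed = parse_printed_tweet(sample)
print(parsed.username, parsed.date, parsed.text)
```

For anything beyond quick inspection, the CSV, JSON, or Pandas storage options covered later in this article are likely a more robust choice than parsing console output.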
				
			

If you try this in a notebook, you may run into the error “RuntimeError: This event loop is already running”. To resolve it, follow the steps below:

				
					!pip install nest_asyncio
import nest_asyncio
nest_asyncio.apply()
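
The error itself comes from asyncio rather than from Twint: a notebook already runs an event loop, and asyncio refuses to start a second blocking run on a loop that is running. The following sketch reproduces the same message with the standard library only, so it runs without Twint or a notebook. The names fetch and notebook_cell are hypothetical, and the assumption that Twint triggers the error in the same way is mine, not something stated in this article.

```python
import asyncio


async def fetch():
    # Stand-in for any coroutine a library might want to run.
    return "tweets"


async def notebook_cell():
    # Inside this coroutine a loop is already running, as in a notebook.
    loop = asyncio.get_running_loop()
    coro = fetch()
    try:
        # Asking the running loop to block on another coroutine is rejected.
        loop.run_until_complete(coro)
    except RuntimeError as err:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(err)
    return "no error"


message = asyncio.run(notebook_cell())
print(message)
```

nest_asyncio.apply() patches asyncio so that such nested runs are allowed, which is why the fix above works in notebooks.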
				
			

Now that the initial setup is done, let us go through some examples.

Examples

For the examples below, I will be using the Twitter handle SonuSood. You probably know him very well if you are from India. He has been working tirelessly to help the needy since the pandemic started last year.

1. Tweets from a specific date

The code below scrapes Sonu Sood's tweets posted since 05-Jun-2021.

				
					# Configure
c = twint.Config()
c.Username = "sonusood"
c.Limit = 1
c.Since = '2021-06-05'

# Run
twint.run.Search(c)
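
The example above hard-codes the date. If you would rather scrape a rolling window such as "the last 7 days", you can compute the string instead. This is a minimal sketch using only the standard library; since_days_before is a hypothetical helper, and it assumes c.Since accepts the same YYYY-MM-DD form used in the example above.

```python
from datetime import date, timedelta


def since_days_before(reference: date, days: int) -> str:
    """Return the date `days` before `reference` as a YYYY-MM-DD string."""
    return (reference - timedelta(days=days)).isoformat()


# A fixed reference date keeps the example reproducible;
# in practice you would likely pass date.today().
since_value = since_days_before(date(2021, 6, 12), days=7)
print(since_value)  # 2021-06-05
```

The resulting string could then be assigned with c.Since = since_value before calling twint.run.Search(c).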
				
			

2. Tweets with specific search strings

This code scrapes Sonu Sood's tweets that contain the string ‘India’.

				
					# Configure
c = twint.Config()
c.Username = "sonusood"
c.Limit = 1
c.Search = 'India'

# Run
twint.run.Search(c)
				
			

3. Tweets with Images, Videos or Media (images or videos)

The code below scrapes Sonu Sood's tweets that contain images, videos, or both. It sets c.Media, which matches both images and videos. If you want only images or only videos, set c.Images or c.Videos to True instead.

				
					# Configure
c = twint.Config()
c.Username = "sonusood"
c.Limit = 1
# c.Images = True
# c.Videos = True
c.Media = True

# Run
twint.run.Search(c)
				
			

4. Popular tweets of a user

The code below gets you the popular tweets of Sonu Sood. All you have to do is set Popular_tweets to True.

				
					# Configure
c = twint.Config()
c.Username = "sonusood"
c.Limit = 1
c.Popular_tweets = True

# Run
twint.run.Search(c)
				
			

5. Tweets based on min likes, min retweets, and min replies

You can also filter tweets by minimum likes, retweets, and replies. Set Min_likes, Min_replies, or Min_retweets to the required value. The code below scrapes tweets with a minimum of 30,000 likes.

				
					c = twint.Config()
c.Username = "sonusood"
c.Limit = 1
c.Min_likes = 30000
# c.Min_replies = 1000
# c.Min_retweets = 100

twint.run.Search(c)
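
To make the effect of these thresholds concrete, here is a small sketch that applies the same kind of filter locally to records you already hold. It does not call Twint. The records are made-up placeholders, meets_minimums is a hypothetical helper, and treating each threshold as inclusive (greater than or equal) is an assumption.

```python
from typing import Dict


def meets_minimums(
    tweet: Dict[str, int],
    min_likes: int = 0,
    min_replies: int = 0,
    min_retweets: int = 0,
) -> bool:
    """Return True if the tweet reaches every given threshold (inclusive)."""
    return (
        tweet["likes"] >= min_likes
        and tweet["replies"] >= min_replies
        and tweet["retweets"] >= min_retweets
    )


# Placeholder records, not real tweet statistics.
tweets = [
    {"id": 1, "likes": 45000, "replies": 1200, "retweets": 5000},
    {"id": 2, "likes": 12000, "replies": 300, "retweets": 900},
    {"id": 3, "likes": 30000, "replies": 50, "retweets": 80},
]

kept = [t["id"] for t in tweets if meets_minimums(t, min_likes=30000)]
print(kept)  # [1, 3]
```

Letting Twint filter at scrape time, as in the example above, avoids downloading tweets you would discard anyway; a local filter like this is mainly useful for refining data you have already stored.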
				
			

6. Store tweets as CSV or JSON

You can store the scraped tweets in a CSV or JSON file. Set Store_csv or Store_json to True and assign the name of the output file to c.Output, as shown below.

				
					c = twint.Config()
c.Limit = 1
c.Username = 'sonusood'
c.Min_likes = 30000
# c.Store_csv = True
c.Store_json = True
c.Output = "tweets.json"

twint.run.Search(c)
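
Once the file exists, you will usually want to read it back. The sketch below assumes the JSON output holds one JSON object per line (JSON Lines); check your own file before relying on that. To stay runnable without Twint, it first writes a small stand-in file with placeholder records. The field names tweet and likes_count, and the helper load_json_lines, are assumptions introduced for illustration.

```python
import json
import tempfile
from pathlib import Path


def load_json_lines(path):
    """Read a file with one JSON object per line into a list of dicts."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Stand-in for the scraped file: placeholder records, not real tweets.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "tweets.json"
    sample_path.write_text(
        '{"id": 1, "tweet": "first placeholder", "likes_count": 31000}\n'
        '{"id": 2, "tweet": "second placeholder", "likes_count": 45000}\n',
        encoding="utf-8",
    )
    records = load_json_lines(sample_path)

print(len(records), records[0]["tweet"])
```

With a real scrape, you would call load_json_lines("tweets.json") on the file named in c.Output.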

				
			

7. Store tweets as Pandas DataFrame

Twint also supports Pandas, so the scraped tweets can be loaded into a Pandas DataFrame for further use. The code below scrapes Sonu Sood's tweets with a minimum of 30,000 likes (Limit = 1, that is, up to 100 tweets) and puts them into a DataFrame.

				
					c = twint.Config()
c.Limit = 1
c.Username = 'sonusood'
c.Min_likes = 30000
c.Pandas = True

twint.run.Search(c)

Tweets_df = twint.storage.panda.Tweets_df
				
			

These are just some of the features Twint provides. It offers many more configuration options, all of which are listed in the official Twint documentation. You can combine these options to filter the tweets of interest for further analysis.
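
Because every example follows the same configure-then-run pattern, combining options can be wrapped in a small helper. The following is a sketch under stated assumptions: apply_options is a hypothetical helper, not part of Twint, and types.SimpleNamespace stands in for twint.Config() so that the snippet runs without Twint installed. The option names are the ones used earlier in this article.

```python
from types import SimpleNamespace


def apply_options(config, **options):
    """Set each keyword argument as an attribute on config and return it."""
    for name, value in options.items():
        setattr(config, name, value)
    return config


# SimpleNamespace is a stand-in for twint.Config() (an assumption for this sketch).
c = apply_options(
    SimpleNamespace(),
    Username="sonusood",
    Limit=1,
    Since="2021-06-05",
    Min_likes=30000,
    Store_json=True,
    Output="tweets.json",
)
print(vars(c))
```

With Twint installed, you would pass twint.Config() instead of SimpleNamespace() and then call twint.run.Search(c) as in the examples above.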

Conclusion

In this article, we went through some of the features Twint provides for scraping tweets. It also provides advanced capabilities such as storing tweets in a database, in memory (Python lists), and in Elasticsearch. With what you have learned here, you can quickly scrape tweets and start using them in your own analysis.

Originally published at Medium on Oct 15, 2020.

Chetan Ambi

A Software Engineer & Team Lead with 10+ years of IT experience and a technical blogger with a passion for cutting-edge technology. Currently working in the field of Python, Machine Learning & Data Science. Chetan Ambi holds a Bachelor of Engineering degree in Computer Science.