In this digital era, whether you are running a successful business, monitoring social media, or working for a government that wants insight into what is going on in the digital world, you are likely to run into data frequently and to need it for data-driven decisions.

The interesting thing about Twitter, unlike other social media platforms, is that almost all users' tweets are publicly available and can be mined and analyzed using data processing tools and algorithms. One can learn almost anything from tweets, from a company's customer satisfaction level and the chance of customers coming back, to who is likely to win the next election in a certain area, or even where and when the next massive protest is likely to happen. It is also possible to gain insights into how customers feel about certain topics and to detect urgent issues in real time before they spiral out of control.

The main task for the project was social media tweet analysis on a Twitter dataset: discovering abstract topics from the tweets and classifying each tweet as positive or negative, sentiment-wise.

Twitter's API allows you to do complex queries, like pulling every tweet about a certain topic within the last twenty minutes or pulling a certain user's non-retweeted tweets. For this project, we were provided pre-downloaded data on COVID19 related topics.

The data was collected using the following keywords:

created_at, source, original_text, polarity, subjectivity, lang, favorite_count, retweet_count, original_author, followers_count, friends_count, possibly_sensitive, hashtags, user_mentions, place

To load the data from JSON format we need to install the required libraries. I loaded the Twitter data into a pandas data frame using Python code. Here you can find the basic functions to do it. I extracted the loaded data using this function, appending every tweet to a list. Different points of interest can be extracted using different functions like this one.

def find_full_text(self)->list:

Cleaning the data was the next step I took on the data frame. Since it is text data in a table format, the basic cleaning one can do is removing numbers, symbols, abbreviations and duplicates. For example, I removed duplicates using this function:

def drop_duplicate(self, df: pd.DataFrame) -> pd.DataFrame:

Just like in the steps above, using different libraries and functions we can classify our dataset into the topics of interest. To mention a few: classification by hashtags, languages, user count …

Topic modeling

The best explanation of topic modeling comes from Wikipedia: "In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Topic modeling is a frequently used text-mining tool for discovery of hidden semantic structures in a text body. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, "cat" and "meow" will appear in documents about cats, and "the" and "is" will appear approximately equally in both. A document typically concerns multiple topics in different proportions; thus, in a document that is 10% about cats and 90% about dogs, there would probably be about 9 times more dog words than cat words. The "topics" produced by topic modeling techniques are clusters of similar words."
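As a rough illustration of the 10% cats / 90% dogs intuition from the Wikipedia explanation of topic modeling quoted above, here is a minimal Python sketch. It is not a topic model and it is not the code used in this project: the word lists in TOPIC_WORDS and the function topic_proportions are made up for this example, whereas a real topic model would learn such word clusters from the tweets themselves. The sketch only counts how many words of each hand-written topic a document contains.

```python
import re
from collections import Counter

# Hypothetical, hand-written word lists standing in for two learned topics.
# A real topic model would discover clusters like these from the data.
TOPIC_WORDS = {
    "dogs": {"dog", "bone", "bark"},
    "cats": {"cat", "meow"},
}


def topic_proportions(text: str) -> dict:
    """Return, for each topic, its share of all topic words found in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        for topic, words in TOPIC_WORDS.items():
            if token in words:
                counts[topic] += 1
    total = sum(counts.values())
    if total == 0:
        # No topic words at all: report a zero share for every topic.
        return {topic: 0.0 for topic in TOPIC_WORDS}
    return {topic: counts[topic] / total for topic in TOPIC_WORDS}


# A toy document with nine "dog" sentences and one "cat" sentence.
document = "the dog is chewing a bone " * 9 + "the cat says meow"
print(topic_proportions(document))
```

In the toy document, 18 of the 20 topic words are dog words, so the shares come out as 0.9 for dogs and 0.1 for cats, which is the "about 9 times more dog words than cat words" ratio from the quote. Words such as "the" and "is" are ignored because they belong to neither word list.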