Data Extraction & Labeling
With ChatGPT Prompts Only!

Data extraction and labeling are crucial tasks in the field of data mining, enabling organizations to unlock valuable insights from vast amounts of information. However, extracting structured data from unstructured text has historically been a complex and challenging endeavor. Fortunately, advancements in natural language processing (NLP) have paved the way for innovative solutions. One such breakthrough is ChatGPT, a cutting-edge language model developed by OpenAI. By harnessing the power of ChatGPT prompts, data labeling and extraction processes have become more accessible and streamlined than ever before.

Sentiment Labeling

Sentiment labeling is the process of categorizing text by the sentiment it expresses, such as positive, negative, or neutral, in order to analyze attitudes or opinions. You can use it to find out whether a movie review, a tweet, or any other paragraph of text is positive or negative. Here is my prompt template. It classifies the input text between the ### markers and also returns the scores and the important words and phrases that drive those scores.

    The following text is a movie rating. Is it mostly positive, negative or neutral? 
    Give me your scores for positive, negative, neutral evaluations.
    Give me the words or phrases which are driving the score most.
    Based on the text, what would be your estimated star rating from 1-5?
    ### [USER REVIEW TEXT] ###
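
If you want to label many reviews, it helps to fill the template programmatically. The following is only a minimal sketch of how that could look in Python. The names `build_sentiment_prompt`, `label_review` and `ask_model` are my own placeholders for illustration, and `ask_model` stands for whatever function you use to send a prompt to ChatGPT and get the reply back as text. The sketch also assumes you want to strip any `###` inside the review itself, so that the markers stay unambiguous.

```python
import re
from typing import Callable

# The prompt template from above, with a placeholder for the review text.
PROMPT_TEMPLATE = (
    "The following text is a movie rating. "
    "Is it mostly positive, negative or neutral?\n"
    "Give me your scores for positive, negative, neutral evaluations.\n"
    "Give me the words or phrases which are driving the score most.\n"
    "Based on the text, what would be your estimated star rating from 1-5?\n"
    "### {review} ###"
)


def build_sentiment_prompt(review: str) -> str:
    """Fill the template with one review (hypothetical helper)."""
    # Collapse line breaks and repeated spaces into single spaces.
    cleaned = " ".join(review.split())
    # Assumption: shorten any run of three or more '#' inside the review,
    # so the only '###' markers in the prompt are the two delimiters.
    cleaned = re.sub(r"#{3,}", "#", cleaned)
    return PROMPT_TEMPLATE.format(review=cleaned)


def label_review(review: str, ask_model: Callable[[str], str]) -> str:
    """Send the filled prompt through a caller-supplied function.

    `ask_model` is a placeholder: plug in your own function that sends
    a prompt to the model and returns the answer as a string.
    """
    return ask_model(build_sentiment_prompt(review))


if __name__ == "__main__":
    # Print the prompt you would paste into ChatGPT for one review.
    print(build_sentiment_prompt("I must say, this is not a bad movie!"))
```

Passing the model call in as a function keeps the sketch independent of any specific client library, and lets you try the prompt building without sending anything to the model.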

The typical output would be something like this ...

Sentiment labeling can be easy if you have phrases like "bad movie" or "great movie", but it can be pretty complicated with negations such as "not a bad movie", or even double negations such as "it is not not a bad movie".

The older version of ChatGPT (GPT-3.5) reliably finds negations in review texts and labels them the right way. But with reviews like "I must say, this is not a movie that I would not recommend!", the limits of GPT-3.5 are reached, and it labels the "not recommend" and the "not" as negative statements. On the other hand, the current version, GPT-4, gives you this output ...