ValueError: not enough values to unpack (expected 3, got 2) when extracting data using zip() in Pandas

I’m trying to clean and organize my data from a CSV file using Python and Pandas. Specifically, I want to extract structured information (like Social Security Numbers, Date of Birth, and Relationships) from the ‘Notes’ column of my DataFrame. However, I keep encountering this error:

PS C:\Users\hokop\Documents\GitHub\Tina-Agency-of-Texas-Data> python test2.py
Traceback (most recent call last):
  File "C:\Users\hokop\Documents\GitHub\Tina-Agency-of-Texas-Data\test2.py", line 80, in <module>
    df['SSN'],df['DOB'],df['Relationship'] = zip(*df['Notes'].apply(extract_info))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: not enough values to unpack (expected 3, got 2)

I’m confident that my extract_info function is returning three values (SSN, DOB, Relationship). When I print the output within the function, all three variables are there. Here’s a simplified version of my code:

import re
import pandas as pd

# Load the input data
df = pd.read_csv('contacts.csv')

# Define regex patterns for DOB and SSN
dob_pattern = r'\b(?:DOB:|DOB;|DOB: |DOB;)\s*:? ?([0-9]{2}/[0-9]{2}/[0-9]{4})\b'
ssn_pattern = r'\b(?:SS|SS |SS#|SS:|SS: |SS;|SS; |SS# |SS#:|SS#: )\s*:? ?([0-9]{3}-[0-9]{2}-[0-9]{4}|[0-9]{9})\b'
name_pattern3 = r'(?P<first>[A-Za-z]+)(?:\s+(?P<middle>[A-Za-z]+))?\s+(?P<last>[A-Za-z]+)'
name_pattern2 = r'(?P<first>[A-Za-z\'-]+)\s+(?P<last>[A-Za-z\'-]+)'

# Define a list of relationship keywords
relationship_keywords = [
    "father",
    "mother",
    "brother",
    "sister",
    "friend",
    "spouse",
    "partner",
    "child",
    "aunt",
    "uncle",
    "cousin"
]

# Build a regex pattern for the relationships
relationship_pattern = r'\b(?:' + '|'.join(relationship_keywords) + r')\b'

# Function to extract structured information
def extract_info(entry):
    if not isinstance(entry, str):  # Check if the entry is a string
        return '',''  # Return empty values for non-strings

    
    # Initialize variables
    name = ""
    dob = ""
    ssn = ""
    relationship = "asd"
    
    # Split entry into lines
    lines = entry.splitlines()
    for line in lines:
        line = line.strip()
        
        # if re.match(relationship_pattern, line): 
        #     relationship = re.search(relationship_pattern, line).group(1)
            
        #     if re.match(name_pattern3, line): 
        #         name = re.search(name_pattern3, line).group(1)
        #     if re.match(name_pattern2, line):
        #         name = re.search(name_pattern2, line).group(1)
        # elif not relationship:
        #     relationship = 'asd'
        if re.match(name_pattern3, line): 
            
            name = re.search(name_pattern3, line).group(1)
        elif re.match(name_pattern2, line):
            name = re.search(name_pattern2, line).group(1)
        elif re.match(ssn_pattern, line):
            # Extract SSN
            ssn = re.search(ssn_pattern, line).group(1)
        elif re.match(dob_pattern, line):
            # Extract DOB
            dob = re.search(dob_pattern, line).group(1)
        else:
            # Assume the remaining line is the name
            if line.strip() != '':
                name = line
            else:
                name=""
    relationship = "asd"


    return ssn, dob, relationship
# Process each entry and unpack the results into three new columns

df['SSN'],df['DOB'],df['Relationship'] = zip(*df['Notes'].apply(extract_info))

# Save the updated DataFrame to a CSV file
df.to_csv('ssn.csv', index=False)

# Display the DataFrame
print(df)

I’m expecting the extract_info function to return a tuple of three values, which should be unpacked into three new columns (SSN, DOB, Relationship). But the error suggests that sometimes only two values are returned.

Here are a few details about my setup:

I’m using regex to extract specific patterns.
If an entry doesn’t match the expected patterns, I want the corresponding values to default to empty strings.
What could be causing the function to return only two values instead of three in some cases? Any advice on how to debug or fix this issue would be greatly appreciated!
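
A likely cause, judging only from the code posted above: the early return for non-string entries is return '','' (two values), while the normal path returns three. Empty cells read by pd.read_csv usually arrive as NaN floats, so they take that early return. zip() stops at the shortest tuple, so one short tuple is enough to leave only two columns to unpack. The sketch below reproduces this with a plain list instead of a DataFrame. The function names and sample notes are hypothetical stand-ins, not the original data.

```python
import math

# Hypothetical stand-in for extract_info: same return shapes as the posted
# function (two values on the early return, three on the normal path).
def extract_two_or_three(entry):
    if not isinstance(entry, str):
        return '', ''              # only two values: this is the suspect line
    return 'ssn', 'dob', 'rel'

# Same function with the early return widened to three values.
def extract_fixed(entry):
    if not isinstance(entry, str):
        return '', '', ''          # one empty string per output column
    return 'ssn', 'dob', 'rel'

# Made-up notes; math.nan mimics an empty CSV cell.
notes = ["SS: 123-45-6789", math.nan, "DOB: 01/02/1990"]

# zip(*...) transposes rows into columns but stops at the shortest row,
# so a single two-value row leaves only two columns.
broken_columns = list(zip(*map(extract_two_or_three, notes)))
print(len(broken_columns))

# With three values on every path, the unpacking works.
ssn, dob, rel = zip(*map(extract_fixed, notes))
print(ssn, dob, rel)
```

If this is what is happening in the real script, changing the early return to return '', '', '' should be enough. To confirm it first, df['Notes'].apply(extract_info).apply(len).value_counts() should show whether any row produces a tuple of length 2.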
