October 22, 2024
python

Can someone help me figure out what is wrong with my code?


```python
import os
import zipfile

import pandas as pd


# Function to find ZIP files with the relevant keywords (VTE, CLI, ART)
def find_zip_files(month_folder_path):
    zip_files = {"vte": None, "cli": None, "art": None}
    # List all files in the month folder
    for filename in os.listdir(month_folder_path):
        if "VTE" in filename and filename.endswith('.zip'):
            zip_files["vte"] = os.path.join(month_folder_path, filename)
        elif "CLI" in filename and filename.endswith('.zip'):
            zip_files["cli"] = os.path.join(month_folder_path, filename)
        elif "ART" in filename and filename.endswith('.zip'):
            zip_files["art"] = os.path.join(month_folder_path, filename)
    return zip_files


# Data extraction function
def extract_csv_from_zip(zip_path):
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        for file in zip_ref.namelist():
            if file.endswith('.csv'):  # Look for CSV files in the ZIP
                with zip_ref.open(file) as csvfile:
                    return pd.read_csv(csvfile)
    return None


# Merge function
def merge_data(vte_df, cli_df, art_df):
    if vte_df is not None and cli_df is not None:
        merged_vte_cli = pd.merge(vte_df, cli_df, on='clicod', how='outer')
    else:
        raise ValueError("VTE or CLI data missing, cannot merge.")
    if art_df is not None:
        merged_final = pd.merge(merged_vte_cli, art_df, on='artcod', how='outer')
    else:
        merged_final = merged_vte_cli
    return merged_final


# Main loop: iterate through years and months
def process_folders(base_path, years, max_months_per_year):
    for year in years:
        year_folder_path = os.path.join(base_path, year)
        max_month = max_months_per_year.get(year, 12)
        for month in range(1, max_month + 1):
            month_folder = f'M{month:02d}'
            month_folder_path = os.path.join(year_folder_path, month_folder)
            if os.path.exists(month_folder_path):
                # Find ZIP files in the current month folder
                zip_files = find_zip_files(month_folder_path)
                # Extract CSV files from the ZIPs
                vte_df = extract_csv_from_zip(zip_files["vte"]) if zip_files["vte"] else None
                cli_df = extract_csv_from_zip(zip_files["cli"]) if zip_files["cli"] else None
                art_df = extract_csv_from_zip(zip_files["art"]) if zip_files["art"] else None
                # If VTE or CLI files are missing, skip the month
                if vte_df is None or cli_df is None:
                    print(f"Skipping {month_folder} in {year} due to missing VTE or CLI data.")
                    continue
                # Merge the data
                merged_data = merge_data(vte_df, cli_df, art_df)
                # Output the merged data to a CSV file
                output_file = f'merged_data_{year}_{month_folder}.csv'
                merged_data.to_csv(output_file, index=False)
                print(f"Merged data for {year} {month_folder} saved to {output_file}.")
            else:
                print(f"{month_folder_path} does not exist. Skipping...")


# Define the base path and years
base_path = r"C:\Users\DATA\Wholesalers"
years = ['Y2023', 'Y2024']
max_months_per_year = {'Y2023': 12, 'Y2024': 8}

# Process all folders
process_folders(base_path, years, max_months_per_year)
```

Context
I have a directory with two folders (Y2023, Y2024), each containing month folders (M01, M02, etc.). Each month folder holds ZIP files whose names contain a keyword (VTE, CLI, or ART), and each ZIP contains a single CSV with the same name.
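To see exactly what `find_zip_files()` has to work with in each month folder, a small diagnostic can help. This is a minimal sketch, assuming the `Y*/M*` layout described above; `report_month_folders` and `KEYWORDS` are hypothetical names. It prints every entry in each month folder, which keyword the name contains regardless of case, and whether the entry is an actual `.zip` file rather than a regular folder. That last point matters because Windows Explorer shows ZIP archives as folders, and an already-extracted folder will never pass an `.endswith('.zip')` check.

```python
from pathlib import Path

# Keywords the original script looks for in ZIP file names.
KEYWORDS = ("VTE", "CLI", "ART")


def report_month_folders(base_path):
    """Print each month folder's entries and which keyword each one matches.

    Matching is case-insensitive on both the keyword and the '.zip'
    extension, unlike the original find_zip_files().
    Returns {(year_folder, month_folder): {keyword: Path}} for real ZIP files.
    """
    report = {}
    for month_dir in sorted(Path(base_path).glob("Y*/M*")):
        if not month_dir.is_dir():
            continue
        matches = {}
        for entry in sorted(month_dir.iterdir()):
            name = entry.name.upper()
            # First keyword contained in the name, or None.
            hit = next((k for k in KEYWORDS if k in name), None)
            # A folder named like a ZIP is not a ZIP archive.
            is_zip = entry.is_file() and name.endswith(".ZIP")
            print(f"{month_dir.parent.name}/{month_dir.name}: {entry.name!r} "
                  f"keyword={hit} zip_file={is_zip}")
            if hit and is_zip:
                matches[hit] = entry
        report[(month_dir.parent.name, month_dir.name)] = matches
    return report
```

Usage would be `report_month_folders(r"C:\Users\DATA\Wholesalers")`; any month whose printed entries show `keyword=None` or `zip_file=False` for VTE or CLI explains a skip.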
I need to extract the CSVs from the ZIP files, merge the VTE and CLI data on client code, and then merge that result with the ART data on article code. This has to be done for each month folder separately, repeated for every month folder and every year folder, and finally grouped to total the quantity sold (QT).
The goal is a DataFrame that contains all sales from the VTE files, matched by client code to the CLI file to gather client information and by article code to the ART file to gather article information.
The VTE file contains the article code, the client code, and the quantity sold.
The DataFrame should also include the total quantity sold per month per product.
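Once the CSVs are loaded, the merge-and-aggregate steps described above might look roughly like this. It is a sketch under assumptions: the join keys `clicod` and `artcod` come from the original code, but the quantity column name `qt` and the helpers `merge_month` and `combine_months` are hypothetical. It uses left joins so that every VTE sales row is kept even when a client or article code has no match; the original script uses `how='outer'`, which would also add rows for clients and articles that have no sales.

```python
import pandas as pd


def merge_month(vte_df, cli_df, art_df):
    """Attach client info (on 'clicod') and article info (on 'artcod') to each sale."""
    # Left joins keep every VTE row; unmatched codes get NaN for the joined columns.
    merged = vte_df.merge(cli_df, on="clicod", how="left")
    return merged.merge(art_df, on="artcod", how="left")


def combine_months(monthly_frames, qty_col="qt"):
    """Stack merged monthly data and total the quantity per month per product.

    monthly_frames: iterable of (year, month, vte_df, cli_df, art_df) tuples.
    qty_col: name of the quantity-sold column in VTE (hypothetical default).
    """
    parts = []
    for year, month, vte_df, cli_df, art_df in monthly_frames:
        merged = merge_month(vte_df, cli_df, art_df)
        # Tag each row with its folder names so months stay distinguishable.
        parts.append(merged.assign(year=year, month=month))
    all_sales = pd.concat(parts, ignore_index=True)
    totals = (
        all_sales.groupby(["year", "month", "artcod"], as_index=False)[qty_col]
        .sum()
    )
    return all_sales, totals
```

`all_sales` is the full sales table with client and article details; `totals` has one row per year, month and article code. Replace `qt` with the real quantity column name from the VTE files.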
The Wholesalers folder has two folders, Y2023 and Y2024, and each of them holds the monthly data in folders M01, M02, etc.
Problem
My code skips every month as if the files were empty, even though they are not.
When I run the code above, it reports that every month folder was skipped. Here is the output:
Skipping M01 in Y2023 due to missing VTE or CLI data.
Skipping M02 in Y2023 due to missing VTE or CLI data.
Skipping M03 in Y2023 due to missing VTE or CLI data.
Skipping M04 in Y2023 due to missing VTE or CLI data.
Skipping M05 in Y2023 due to missing VTE or CLI data.
Skipping M06 in Y2023 due to missing VTE or CLI data.
Skipping M07 in Y2023 due to missing VTE or CLI data.
Skipping M08 in Y2023 due to missing VTE or CLI data.
Skipping M09 in Y2023 due to missing VTE or CLI data.
Skipping M10 in Y2023 due to missing VTE or CLI data.
Skipping M11 in Y2023 due to missing VTE or CLI data.
Skipping M12 in Y2023 due to missing VTE or CLI data.
Skipping M01 in Y2024 due to missing VTE or CLI data.
Skipping M02 in Y2024 due to missing VTE or CLI data.
Skipping M03 in Y2024 due to missing VTE or CLI data.
Skipping M04 in Y2024 due to missing VTE or CLI data.
Skipping M05 in Y2024 due to missing VTE or CLI data.
Skipping M06 in Y2024 due to missing VTE or CLI data.
Skipping M07 in Y2024 due to missing VTE or CLI data.
Skipping M08 in Y2024 due to missing VTE or CLI data.
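One plausible cause of this symptom (an assumption, since the actual file names are not shown) is case-sensitive matching: `"VTE" in filename` fails for `vte_...zip`, `filename.endswith('.zip')` fails for `.ZIP`, and `file.endswith('.csv')` fails for `.CSV`, so either a dictionary entry stays `None` or `extract_csv_from_zip()` returns `None`. Another possibility is that the "ZIP folders" are already-extracted regular folders. Below is a sketch of drop-in replacements for the two helpers that match case-insensitively, ignore anything that is not a file, and print the archive contents when no CSV is found; it is not a confirmed fix.

```python
import os
import zipfile

import pandas as pd


def find_zip_files(month_folder_path):
    """Case-insensitive variant: matches 'vte', 'Vte', '.ZIP', etc."""
    zip_files = {"vte": None, "cli": None, "art": None}
    for filename in os.listdir(month_folder_path):
        full_path = os.path.join(month_folder_path, filename)
        upper = filename.upper()
        # Only real .zip files count; an already-extracted folder does not.
        if not (os.path.isfile(full_path) and upper.endswith(".ZIP")):
            continue
        # Same priority as the original if/elif chain: VTE, then CLI, then ART.
        for key in zip_files:
            if key.upper() in upper:
                zip_files[key] = full_path
                break
    return zip_files


def extract_csv_from_zip(zip_path):
    """Read the first CSV in the archive, accepting '.csv' or '.CSV'."""
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        members = zip_ref.namelist()
        for member in members:
            if member.lower().endswith(".csv"):
                with zip_ref.open(member) as csvfile:
                    return pd.read_csv(csvfile)
    # Report what the archive holds instead of failing silently.
    print(f"No CSV found in {zip_path}; members: {members}")
    return None
```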


