OiO.lk Community platform!


"Extra data" decoding error in a JSON file. Does anyone know what could be causing this?

  • Thread starter: Bruno Ciccarino (Guest)
I decided to build a Telegram bot that uses a Markov chain. To train the chain, I downloaded the chat history of my group of friends and wrote a script to filter out and keep only the group's messages. Some errors turned up in the JSON, and because the file is very large (7193776 lines), correcting it by hand is not feasible, so I wrote a script that automates the correction. I handle exceptions in it, and it reports this decoding error:

Code:
It was not possible correct the JSON: Extra data: line 1 column 3 (char 2)
JSON decoding error on line: Extra data: line 1 column 7 (char 6)

I think it's because the JSON has more than one object per line. What do you think the cause could be?
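For reference, the second message can be reproduced by handing `json.loads` a single line taken out of a pretty-printed object. The sketch below assumes the input line is `"type": "message",` (copied from the sample further down); `describe_line` is a hypothetical helper name used only for illustration, not part of the script.

```python
import json

def describe_line(line):
    """Return the decoder's error message for one line, or None if it parses."""
    try:
        json.loads(line)
        return None
    except json.JSONDecodeError as e:
        return str(e)

# One line of a pretty-printed object is not a JSON document on its own:
# the decoder reads the string "type" (6 characters including the quotes)
# and then finds ': "message",' left over, which it reports as extra data.
error = describe_line('"type": "message",')
print(error)
```

If this is what is happening, the error would come from reading a multi-line document one line at a time rather than from damage in the file itself.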

Here is a snippet from the script (I won't post all of it, to keep this post short):

Code:
            item = json.loads(line)
            return item
        except json.JSONDecodeError as e:
            print(f"Unable to fix JSON: {e}")
            return None

# Open input file
with open(input_file, 'r', encoding='utf-8') as in_file:
    for line in in_file:
        # Try to fix the line
        fixed_line = fix_json_line(line.strip())
        if fixed_line:
            fixed_messages.append(fixed_line)
        else:
            unfixed_lines.append(line)

# Saves corrected messages to a new JSON file
with open(output_file, 'w', encoding='utf-8') as out_file:
    json.dump(fixed_messages, out_file, ensure_ascii=False, indent=2)

# Saves lines that could not be corrected in a separate file
if unfixed_lines:
    with open('unfixed_lines.txt', 'w', encoding='utf-8') as unfixed_file:
        unfixed_file.writelines(unfixed_lines)
    print("Lines that could not be corrected were saved in 'unfixed_lines.txt'")
else:
    print("All lines were successfully corrected.")

print(f"Lines corrected and saved in {output_file}")
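If a line really did hold several JSON values back to back, one way to read them is `json.JSONDecoder.raw_decode`, which returns the decoded value together with the index where it stopped. This is a minimal sketch under that assumption; `iter_json_values` is a hypothetical helper, and it expects the values to be separated by whitespace only, not by commas.

```python
import json

def iter_json_values(text):
    """Yield every JSON value found back to back in `text`."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # Returns the decoded value and the index just past it.
        value, pos = decoder.raw_decode(text, pos)
        yield value

values = list(iter_json_values('{"id": 1} {"id": 2}\n{"id": 3}'))
print(values)
```

Any other character between values, such as a comma, still raises `json.JSONDecodeError`, so this only helps when the input is a plain sequence of complete values.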

I've already tried searching, but the error description is very vague (I couldn't narrow it down any further), so Google returned several possible causes and I don't know which one applies. I asked ChatGPT, but some days it doesn't seem to work properly. I also asked my technical course teacher, but he couldn't answer either. This is an example of how the JSON is formatted:

Code:
{
   "id": 610775,
   "type": "message",
   "date": "2024-06-27T13:55:13",
   "date_unixtime": "1719507313",
   "from": "Swelve",
   "from_id": "user5957514107",
   "reply_to_message_id": 610761,
   "text": "antes de pisar numa universidade ele deveria revisar esse português dele",
   "text_entities": [
    {
     "type": "plain",
     "text": "antes de pisar numa universidade ele deveria revisar esse português dele"
    }
   ]
  },
  {
   "id": 610776,
   "type": "message",
   "date": "2024-06-27T13:56:31",
   "date_unixtime": "1719507391",
   "from": "Old dirty bastard λ",
   "from_id": "user1758042831",
   "text": "No mínimo",
   "text_entities": [
    {
     "type": "plain",
     "text": "No mínimo"
    }
   ]
  }
 ]
}

From my point of view there is nothing wrong with the JSON, but since I have only just started with Python and machine learning, I could be very wrong. (I used to be a functional-programming-loving hippie, but there are more Python jobs in my country, so I started studying Python to have a chance at a job.) I'm asking because to fix the error I first have to know what the error is, and I don't even know that yet.
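Since the sample spans many lines and closes with `]` and `}`, the file may be one single JSON document rather than one object per line. A sketch of parsing it in one go follows. It assumes the messages sit under a top-level `"messages"` key, which the excerpt does not show, and that `"text"` may sometimes be a list of strings and entity objects instead of a plain string. `load_export` and `extract_texts` are hypothetical names.

```python
import json

def load_export(path):
    """Parse the whole export as one JSON document."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

def extract_texts(export):
    """Collect the plain text of each message in a parsed export."""
    texts = []
    # Assumption: messages live under a top-level "messages" key.
    for message in export.get("messages", []):
        if message.get("type") != "message":
            continue
        text = message.get("text", "")
        if isinstance(text, list):
            # Assumption: formatted text is a list of strings and entity dicts.
            text = "".join(
                part if isinstance(part, str) else part.get("text", "")
                for part in text
            )
        if text:
            texts.append(text)
    return texts

# Small in-memory stand-in for the real file, based on the sample above.
sample = json.loads("""
{
  "messages": [
    {"id": 610775, "type": "message", "from": "Swelve",
     "text": "antes de pisar numa universidade ele deveria revisar esse português dele"},
    {"id": 610776, "type": "message", "from": "Old dirty bastard λ",
     "text": "No mínimo"}
  ]
}
""")
print(extract_texts(sample))
```

With the real file, `extract_texts(load_export(input_file))` would replace the in-memory sample. Loading the whole document at once needs enough memory for the full file.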