OiO.lk Community platform!

Oio.lk is an excellent forum for developers, providing a wide range of resources, discussions, and support for those in the developer community. Join oio.lk today to connect with like-minded professionals, share insights, and stay updated on the latest trends and technologies in the development field.
  You need to log in or register to access the solved answers to this problem.
  • You have reached the maximum number of guest views allowed
  • Please register below to remove this limitation

Formatting JSON for Python - need to remove \"

  • Thread starter Thread starter alexander s
  • Start date Start date
A

alexander s

Guest
I've got some JSON that looks like this:

Code:
{"name": "John",
 "description": "I'm just \"A BOY\" okay? He said \"Hello, World!\" to everyone.",
 "remark": "\"This is a test\" he mentioned."}

And the \" instances are breaking json.loads().

I feel like I've tried every regex under the sun to target these instances (but leave all the other double quotes, not preceded by a backslash) and replace them with an empty string (functionally just strip them). If anyone has tips I'd appreciate it.

My implementation right now is something like:

Code:
import re

# Example JSON string with single backslash escaped double quotes
json_string = '''
"name": "John",
"description": "I'm just \"A BOY\" okay? He said \"Hello, World!\" to everyone.",
"remark": "\"This is a test\" he mentioned."
'''

# Define a regular expression pattern to match \" within a string
pattern = r'\\"'

# Use re.sub to replace all occurrences of the pattern with an empty string
cleaned_string = re.sub(pattern, '', json_string)

print(cleaned_string)

But when i run this in a repl, nothing changes.

For reference, Id just like the output to be:

Code:
{"name": "John",
 "description": "I'm just A BOY okay? He said Hello, World! to everyone.",
 "remark": "This is a test he mentioned."}

Edit: for clarity this is just an example of the nature of the input data im working with, its coming from AWS Cloudwatch logs so I don't have an easy way to manipulate the input before dragging it into Python. For example, part of the payload is something like

Code:
"\"Girl Let's Talk\" Virtual 90s Kickback"

In context:

Code:
{"search_ads": [ {"event_id": "4838383", "ad_id": "1112", "budget_amount": 5.0, "currency": "USD", "marketplace": "Online_US", "score": 18.205433, "p_click": 0.0, "p_order": 0.0, "goal": 2, "category_id": 113, "subcategory_id": 13999, "format": null, "is_paid": false, "online_event": true, "event_start_date": "2024-06-28T00:00:00Z", "latitude": null, "longitude": null, "name": "\"Girl Let's Talk\" Virtual 90s Kickback", "vip_status": false, "is_participant": true}]}

so the \" characters are really the only problem - if I copy all that input into VS Code and just search for/delete that pattern, json.loads() works great as is.

As one commenter mentioned, i think what im looking for is a regex that will match and strip the pattern \" but ive had no luck with that so far! Ive only been able to strip either the \s, which leaves me with double quotes that break json.loads() (expecting delimiter aka thinks this is another JSON key/val pair) or stripping all the double-quotes, which of course completely breaks the same.
<p>I've got some JSON that looks like this:</p>
<pre class="lang-json prettyprint-override"><code>{"name": "John",
"description": "I'm just \"A BOY\" okay? He said \"Hello, World!\" to everyone.",
"remark": "\"This is a test\" he mentioned."}
</code></pre>
<p>And the <code>\"</code> instances are breaking <code>json.loads()</code>.</p>
<p>I feel like I've tried every regex under the sun to target these instances (but leave all the other double quotes, not preceded by a backslash) and replace them with an empty string (functionally just strip them). If anyone has tips I'd appreciate it.</p>
<p>My implementation right now is something like:</p>
<pre class="lang-py prettyprint-override"><code>import re

# Example JSON string with single backslash escaped double quotes
json_string = '''
"name": "John",
"description": "I'm just \"A BOY\" okay? He said \"Hello, World!\" to everyone.",
"remark": "\"This is a test\" he mentioned."
'''

# Define a regular expression pattern to match \" within a string
pattern = r'\\"'

# Use re.sub to replace all occurrences of the pattern with an empty string
cleaned_string = re.sub(pattern, '', json_string)

print(cleaned_string)
</code></pre>
<p>But when i run this in a repl, nothing changes.</p>
<p>For reference, Id just like the output to be:</p>
<pre class="lang-json prettyprint-override"><code>{"name": "John",
"description": "I'm just A BOY okay? He said Hello, World! to everyone.",
"remark": "This is a test he mentioned."}
</code></pre>
<p>Edit: for clarity this is just an example of the nature of the input data im working with, its coming from AWS Cloudwatch logs so I don't have an easy way to manipulate the input before dragging it into Python. For example, part of the payload is something like</p>
<pre class="lang-json prettyprint-override"><code>"\"Girl Let's Talk\" Virtual 90s Kickback"
</code></pre>
<p>In context:</p>
<pre class="lang-json prettyprint-override"><code>{"search_ads": [ {"event_id": "4838383", "ad_id": "1112", "budget_amount": 5.0, "currency": "USD", "marketplace": "Online_US", "score": 18.205433, "p_click": 0.0, "p_order": 0.0, "goal": 2, "category_id": 113, "subcategory_id": 13999, "format": null, "is_paid": false, "online_event": true, "event_start_date": "2024-06-28T00:00:00Z", "latitude": null, "longitude": null, "name": "\"Girl Let's Talk\" Virtual 90s Kickback", "vip_status": false, "is_participant": true}]}
</code></pre>
<p>so the <code>\"</code> characters are really the only problem - if I copy all that input into VS Code and just search for/delete that pattern, <code>json.loads()</code> works great as is.</p>
<p>As one commenter mentioned, i think what im looking for is a regex that will match and strip the pattern <code>\"</code> but ive had no luck with that so far! Ive only been able to strip either the <code>\s</code>, which leaves me with double quotes that break <code>json.loads()</code> (expecting delimiter aka thinks this is another JSON key/val pair) or stripping all the double-quotes, which of course completely breaks the same.</p>
 

Latest posts

Top