OiO.lk Community platform!


Remove double quotes from tab-delimited data in Python

Thread starter: Sandeep Mohanty (Guest)
I have some tab-delimited data stored in a .csv file. I am trying to read the file and split each row on the separator '\t', but the data contains extra double quotes, so I am not getting the desired output. I need help with this.

Input sample data

Code:
"id name    class   age school  doj dol status  source"
"001    sandeep 10  16  dav 2012.12.12  2023.12.12  passed  database"
"002    rahul   09  15  ximb    2023.11.11  ""inprogress    manual"
"003    aditya  12  18  kmbb    2024.01.12  ""inprogress    schoolrecord"
"004    ved ""09    15  ximb    2022.11.11  2023.12.13  passed  manual"


Code:

import pandas as pd

file = 'data_tab_delimited.csv'
data = pd.read_csv(file, sep="\t")
print(data)

data.to_csv('school.csv')

Output obtained:

Code:
,id name    class   age school  doj dol status  source
0,001   sandeep 10  16  dav 2012.12.12  2023.12.12  passed  database
1,"002  rahul   09  15  ximb    2023.11.11  ""inprogress    manual"
2,"003  aditya  12  18  kmbb    2024.01.12  ""inprogress    schoolrecord"
3,"004  ved ""09    15  ximb    2022.11.11  2023.12.13  passed  manual"
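The single-column result fits a file in which every row is wrapped in double quotes: with pandas' default quoting, a field that starts with `"` continues to the closing quote, so the tabs inside it are kept as data instead of being used as separators, and each doubled `""` is read as one literal quote (which `to_csv` then escapes again). One possible workaround is sketched below, assuming the file contains real tab characters; the `sample` string is a hypothetical stand-in for `data_tab_delimited.csv`, and the empty `dol` field in row `002` is assumed to be present as two consecutive tabs. Quote handling is switched off with `quoting=csv.QUOTE_NONE`, so pandas splits on every tab, and the leftover `"` characters are stripped afterwards.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for data_tab_delimited.csv; real tab characters assumed.
sample = (
    '"id\tname\tclass\tage\tschool\tdoj\tdol\tstatus\tsource"\n'
    '"001\tsandeep\t10\t16\tdav\t2012.12.12\t2023.12.12\tpassed\tdatabase"\n'
    '"002\trahul\t09\t15\tximb\t2023.11.11\t\t""inprogress\tmanual"\n'
)

# QUOTE_NONE: treat " as an ordinary character, so every tab is a separator.
data = pd.read_csv(io.StringIO(sample), sep="\t", quoting=csv.QUOTE_NONE, dtype=str)

# Strip the leftover quote characters from the header and from every cell;
# empty cells were already read as NaN and stay NaN.
data.columns = data.columns.str.replace('"', '', regex=False)
data = data.apply(lambda col: col.str.replace('"', '', regex=False))
print(data)
```

Applied to the file from the question, the read step would be `pd.read_csv(file, sep="\t", quoting=csv.QUOTE_NONE, dtype=str)`. `dtype=str` keeps codes such as `001` as text; note that this approach also removes any quote character that genuinely belongs inside a value.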

Desired output:

Code:
    id     name  class  age school         doj         dol    status        source
0  001  sandeep     10   16    dav  2012.12.12  2023.12.12    passed     database
1  002    rahul      9   15   ximb  2023.11.11         NaN  inprogress      manual
2  003   aditya     12   18   kmbb  2024.01.12         NaN  inprogress  schoolrecord
3  004      ved    NaN   15   ximb  2022.11.11  2023.12.13    passed       manual
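If you would rather clean the text before pandas parses it, a line-by-line pass can drop the quotes that wrap each row and the doubled `""` inside it, and skip blank lines such as the ones in the source-data sample. This is a sketch under assumptions: `read_quoted_tsv` and its `text_columns` parameter are hypothetical names, the sample uses real tab characters, and the empty `dol` fields in rows `002` and `003` are assumed to be present as consecutive tabs. Keeping only `id` as text gives the layout of the desired output for these rows: `001` keeps its leading zeros, `09` in `class` is parsed as `9`, and the empty `dol` cells become `NaN`.

```python
import csv
import io

import pandas as pd


def read_quoted_tsv(text, text_columns=("id",)):
    """Hypothetical helper: parse tab-delimited text whose rows are each
    wrapped in double quotes, with doubled ("") quotes inside the rows."""
    cleaned = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between the header and the data
        if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
            line = line[1:-1]  # remove the quotes around the whole row
        cleaned.append(line.replace('""', ''))  # remove the escaped inner quotes
    return pd.read_csv(
        io.StringIO("\n".join(cleaned)),
        sep="\t",
        quoting=csv.QUOTE_NONE,  # any quote that is left over is ordinary text
        dtype={col: str for col in text_columns},  # e.g. keep "001" as text
    )


sample = (
    '"id\tname\tclass\tage\tschool\tdoj\tdol\tstatus\tsource"\n'
    '"001\tsandeep\t10\t16\tdav\t2012.12.12\t2023.12.12\tpassed\tdatabase"\n'
    '"002\trahul\t09\t15\tximb\t2023.11.11\t\t""inprogress\tmanual"\n'
    '"003\taditya\t12\t18\tkmbb\t2024.01.12\t\t""inprogress\tschoolrecord"\n'
)
df = read_quoted_tsv(sample)
print(df)
```

For the real export, read the file into a string first (for example with `pathlib.Path(path).read_text(encoding="utf-8")`, where the encoding is an assumption) and pass the columns that hold codes, such as `Equipment Number`, in `text_columns`.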

Sample of source data:

Code:
"Equipment Number   Equipment Desc  Equipment category  Type of Technical Object    Technical Object Desc   Object number   Maintenance Plan    PLANT   Planner Group   Planner Group Desc  Work Center Work Center Desc    ABC indicator   Maintenance plant   LOCATION    Location Desc   Valid To Date   Start-up Date   Manufacturer serial number  Manufacturer model number   Manufacturer part number    Manufacturer of asset   COUNTRY Year of construction    Month of construction   ROOM    Sort field  Cost Center Catalog Profile Catalog Profile Desc    Superordinate Equipment Guarantee date  Warranty ends   Created on  FUNCTION_LOCATION   STATUS  SOURCE_ID"



"0000101    U02 GENANC RELAY PANEL  K   PWELE-OBJ   ELECTRICAL OBJECTS  IE0567      5010    TM2 Ture Mai-Elec   EGXX1   ELECTRICAL MAINT. Unit-2(GENEANC)   C   5XX SXX     9999.12.31      ""G9876.PG1/0ABC            XX ABXD CO. LTD                     50MAABC PWABC   ELEC SYS GEN                2011.12.15      INACTIVE    DUMMY"
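Because the source data has many empty fields, a row that lost a tab next to a stray quote would not raise an error: pandas pads a short row with `NaN` at the end, so every later value lands one column too far to the left. A minimal check is sketched below; `mismatched_rows` is an illustrative, hypothetical name, real tab characters are assumed, and the sample deliberately reproduces row `002` exactly as pasted (without an empty `dol` field). It applies the same quote cleanup and reports data rows whose field count differs from the header's.

```python
def mismatched_rows(text):
    """Return (data_row_number, field_count) for every row whose number of
    tab-separated fields differs from the header row's."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
            line = line[1:-1]  # quotes wrapping the whole row
        rows.append(line.replace('""', ''))  # doubled quotes inside the row
    expected = rows[0].count("\t") + 1  # field count of the header row
    return [
        (number, row.count("\t") + 1)
        for number, row in enumerate(rows[1:], start=1)
        if row.count("\t") + 1 != expected
    ]


sample = (
    '"id\tname\tclass\tage\tschool\tdoj\tdol\tstatus\tsource"\n'
    '"001\tsandeep\t10\t16\tdav\t2012.12.12\t2023.12.12\tpassed\tdatabase"\n'
    '"002\trahul\t09\t15\tximb\t2023.11.11\t""inprogress\tmanual"\n'
)
print(mismatched_rows(sample))  # → [(2, 8)]
```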