Text file lines → table rows via existing stored procedure that inserts a single row

Been struggling with this one. Long story short, I have a problem where we need to read a file, provided by the client, at different times throughout the month. The file can be a few hundred lines or over a million lines. I've searched and nothing quite addresses the stored-procedure-call issue (I've seen suggestions to use executemany, etc.).

For each line, I need to read the line contents (delimited) and write them to a database. I currently have ALL of it working. The issue is timing. I took this out of an interface engine because it took 1 hour to read roughly 5,000 lines. I created this application in Python in the hope of significantly speeding up the process.

My ask is: what would be the best way to "multi-thread" this to increase the speed? Basically I need to read the file as fast as possible and get the data inserted into the DB as fast as possible. The marked section in the code below (between the comments) is the part I'd like to somehow multi-thread so that multiple calls execute at a time.

Code:
import csv
import os

import mysql.connector

def ReadMetaDataFile(srcFile, fileDelimiter, clientID, mysqlCnx):
    functionResponse = ''
    path, filename = os.path.split(srcFile)

    try:
        with open(srcFile, 'r') as delimitedFile:
            fileReader = csv.DictReader(delimitedFile, delimiter=fileDelimiter)
            #next(fileReader, None)

            if mysqlCnx and mysqlCnx.is_connected():
                cursor = mysqlCnx.cursor()
                for row in fileReader:
                    # --- section I'd like to multi-thread ---
                    rowArgs = (clientID, row['FILENAME'], '', row['DATA_FORMAT'], 1)
                    cursor.callproc('sp_InsertMetaData', rowArgs)
                    mysqlCnx.commit()  # commits once per row
                    # --- end of section ---
                cursor.close()
                mysqlCnx.close()
                functionResponse = "MetaData File Loaded Successfully"
    except mysql.connector.Error as err:
        functionResponse = "Error processing MySQL SPROC call: " + str(err)
    except IOError as err:
        functionResponse = "Error reading the source file in function: " + str(err)
    except Exception as err:
        functionResponse = "Exception Found in function: " + str(err)
    finally:
        mysqlCnx.close()
        return functionResponse

--UPDATE: I've tried several different things, and at this point I'm just setting up the sproc calls (the commit is commented out) and it's taking over 20 minutes to read 20k rows from a file. With some files having millions of rows, this is simply too long. Any suggestions, please.

Code:
def ReadMetaDataFile(srcFile, fileDelimiter, clientID, engine):
    functionResponse = ''
    path, filename = os.path.split(srcFile)

    try:
        # one %s per procedure parameter; not used in this version
        query = 'CALL sp_InsertMetaData (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)'
        with open(srcFile, 'r') as delimitedFile:
            fileReader = csv.DictReader(delimitedFile, delimiter=fileDelimiter)
            conn = engine.raw_connection()
            cursor = conn.cursor()
            for row in fileReader:
                rowArgs = (clientID, row['FILENAME'], '', row['DATA_FORMAT'], row['SOURCE_SYSTEM_NAME'], row['CONFIDENTIALITY_CODE'], row['STATUS'], row['DOCUMENT_TYPE_SOURCE_SYSTEM'], row['DOCUMENT_TYPE_ID'], row['DOCUMENT_TYPE_DESCRIPTION'], row['DATE_OF_SERVICE'], row['DOCUMENT_ID'], row['SOURCE_CREATED_DATE'], row['SOURCE_LAST_MODIFIED_DATE'], row['TIMEZONE'], row['MRNSOURCE_SYSTEM'], row['PATIENT_MRN'], row['MEMBER_NBR'], row['PATIENT_LAST_NAME'], row['PATIENT_FIRST_NAME'], row['PATIENT_MIDDLE_NAME'], row['GENDER'], row['PATIENT_DATE_OF_BIRTH'], row['ENCOUNTER_SOURCE'], row['ENCOUNTER_ID'], row['ENCOUNTER_TYPE'], row['ADMIT_TIME'], row['DISCHARGE_TIME'], row['FACILITY_NAME'], row['FACILITY_SOURCE_SYSTEM'], row['PROVIDER_TYPE'], row['PROVIDER_SOURCE_SYSTEM'], row['PROVIDER_IDENTIFIER'], row['PROVIDER_LAST_NAME'], row['PROVIDER_FIRST_NAME'], row['PROVIDER_MIDDLE_NAME'], row['PROVIDER_CREDENTIAL'], row['PROVIDER_SPECIALTY'], 1)
                cursor.callproc("sp_InsertMetaData", rowArgs)
            #conn.commit()  # left commented out while timing the calls
            cursor.close()
            conn.close()

            functionResponse = "MetaData File Loaded Successfully"
    except Exception as err:
        functionResponse = "Exception Found in function: " + str(err)

    return functionResponse
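
One thing worth checking before tuning further: cursor.callproc still sends a CALL to the server for every row even with the commit commented out, so the 20 minutes above is not pure file-reading time. A small sketch (hypothetical helper, not from the thread) that iterates the file with DictReader and skips all database work can show how much of the cost is parsing versus round trips.

Code:
import csv
import time


def time_parse_only(srcFile, fileDelimiter):
    # parse the whole file exactly as the loader does, but do no database work
    start = time.perf_counter()
    with open(srcFile, 'r') as delimitedFile:
        fileReader = csv.DictReader(delimitedFile, delimiter=fileDelimiter)
        rowCount = sum(1 for _ in fileReader)
    elapsed = time.perf_counter() - start
    print(f"parsed {rowCount} rows in {elapsed:.2f}s")
    return rowCount, elapsed

If the parse-only pass finishes in seconds, the per-row round trip to the server is the thing to attack (fewer commits, batching, or loading into a staging table and doing the insert set-based); if it is slow on its own, the problem is upstream of MySQL.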