OiO.lk Community platform!

Oio.lk is an excellent forum for developers, providing a wide range of resources, discussions, and support for those in the developer community. Join oio.lk today to connect with like-minded professionals, share insights, and stay updated on the latest trends and technologies in the development field.
  You need to log in or register to access the solved answers to this problem.
  • You have reached the maximum number of guest views allowed
  • Please register below to remove this limitation

Dataframe to SQLAlchemy Update Statement Improvement

  • Thread starter Thread starter max
  • Start date Start date
M

max

Guest
I created a method that updates a table called Trades from the content of a dataframe. Therefore I pass the index_cols (required for the where statement) and the update_cols (required for the set statement). The method then iterates through each row and then updates only one row inside the sql table (each row in the df represents a row in sql and thats why I have to iterate through the df).

The problem I have now is the following:

  • The update statement is rather slow, it takes a lot of time to update the values inside the table.
  • I tried different chunk sizes, but there was not really an improvement

Are there any improvement possibilities? Or are there best practices I didn't follow?

Code:
def update_trades_from_df(session_maker: sessionmaker, df: pd.DataFrame, index_cols: str | list[str],
                          update_cols: str | list[str] = None, chunk_size: int = 10000):
    if isinstance(index_cols, str):
        index_cols = [index_cols]

    if isinstance(update_cols, str):
        update_cols = [update_cols]

    valid_columns = Trades.__table__.columns.keys()
    if not update_cols:
        update_cols = [col for col in df.columns if col not in index_cols and col in valid_columns]

    with session_maker() as session:
        for chunk in chunk_using_generators(df, chunk_size):
            for idx, row in chunk.iterrows():
                filter_conditions = [getattr(Trades, col) == row[col] for col in index_cols]
                filter_condition = and_(*filter_conditions)
                values = {col: row[col] for col in update_cols}
                stmt = update(Trades).where(filter_condition).values(**values)
                logger.debug(stmt.compile(bind=get_db_engine(), dialect=mssql.dialect(), compile_kwargs={"literal_binds": True}))
                session.execute(stmt)
            session.commit()
<p>I created a method that updates a table called <code>Trades</code> from the content of a dataframe. Therefore I pass the <code>index_cols</code> (required for the where statement) and the <code>update_cols</code> (required for the set statement). The method then iterates through each row and then updates only one row inside the sql table (each row in the df represents a row in sql and thats why I have to iterate through the df).</p>
<p>The problem I have now is the following:</p>
<ul>
<li>The update statement is rather slow, it takes a lot of time to update the values inside the table.</li>
<li>I tried different chunk sizes, but there was not really an improvement</li>
</ul>
<p>Are there any improvement possibilities? Or are there best practices I didn't follow?</p>
<pre class="lang-py prettyprint-override"><code>def update_trades_from_df(session_maker: sessionmaker, df: pd.DataFrame, index_cols: str | list[str],
update_cols: str | list[str] = None, chunk_size: int = 10000):
if isinstance(index_cols, str):
index_cols = [index_cols]

if isinstance(update_cols, str):
update_cols = [update_cols]

valid_columns = Trades.__table__.columns.keys()
if not update_cols:
update_cols = [col for col in df.columns if col not in index_cols and col in valid_columns]

with session_maker() as session:
for chunk in chunk_using_generators(df, chunk_size):
for idx, row in chunk.iterrows():
filter_conditions = [getattr(Trades, col) == row[col] for col in index_cols]
filter_condition = and_(*filter_conditions)
values = {col: row[col] for col in update_cols}
stmt = update(Trades).where(filter_condition).values(**values)
logger.debug(stmt.compile(bind=get_db_engine(), dialect=mssql.dialect(), compile_kwargs={"literal_binds": True}))
session.execute(stmt)
session.commit()
</code></pre>
 

Latest posts

Í
Replies
0
Views
1
Íñigo Enrique Hernández
Í
A
Replies
0
Views
1
ahmed siala
A
Top