Currently I'm querying two separate tables in a SQL database: one from which I get "data" (my_table) and the other from which I get "units" (a config table). The units correspond 1:1 with the values in the data, i.e., each column of data values has a corresponding unit.
Because query_units returns a single column of info, I end up transposing the resulting DataFrame with .T. Then I concat that transposed DataFrame with my main data to end up with my desired result, i.e. a row of "units" followed by the rest of the data rows (see the example at the bottom).
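To illustrate just the reshaping step in isolation, here's a minimal sketch with made-up values. The column names, timestamps, and numbers are placeholders rather than my real data, and it assumes the units come back in the same order as the data columns.

```python
import pandas as pd

# Placeholder data standing in for the result of the "data" query
df = pd.DataFrame(
    {"data1": [10, 11, 12], "data2": [5, 5, 6]},
    index=pd.Index(["11:00", "11:30", "12:00"], name="timestamp"),
)

# Placeholder single-column result standing in for the "units" query;
# assumed to be in the same order as the data columns
units = pd.DataFrame({"units": ["volts", "amps"]})

# Label each unit with the data column it belongs to, then transpose
# so the single column becomes a single row named "units"
units.index = df.columns
units_row = units.T

# Stack the units row on top of the data rows
result = pd.concat([units_row, df])
print(result)
```

One side effect worth noting: because each column now holds a string followed by numbers, the combined columns end up with object dtype.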
Ultimately I’m trying to determine if there’s a more efficient way to handle this setup.
Running two separate queries and concatenating the results (albeit with some transposition) works, but I'm wondering if there's a way to, say, handle this with a single SQL query which I can use to get a single DataFrame.
import pandas as pd
import pyodbc
# connect to database
conn = pyodbc.connect('<connection info here>')
# query to get all data from 'my_table' between these two timestamps
query_data = "SELECT * FROM my_table WHERE timestamp BETWEEN '2024/10/24 11:00:00' AND '2024/10/24 12:00:00'"
# query to get units from the 'config' table for the items in 'my_table'
query_units = "SELECT units FROM config WHERE t_name = 'my_table'"
# get a DataFrame containing the requested data
df = pd.read_sql_query(query_data, conn, index_col='timestamp')
# and a DataFrame of the units for each data item, transposed because this returns a
# single column and I want it as a row (using the data column names as its index);
# this relies on the units coming back in the same order as the data columns
eu = pd.read_sql_query(query_units, conn)
eu.index = df.columns
eu = eu.T
# combine (concat) the units row and the rest of the data, joined along 'df.columns'
# (the name of each data value)
df = pd.concat(objs=(eu, df))
This is an example result of the above (NB: this is correct)
       data1  data2  data3  ...  dataN
units  volts  amps   temp   ...  volts   (this is the row that gets inserted)
11:00  10     5      69     ...  9
11:30  11     5      70     ...  10
12:00  12     6      72     ...  9
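For anyone who wants to reproduce the two-query workflow without access to my database, here's a minimal sketch that swaps pyodbc for an in-memory SQLite database. The table contents and the two data columns are invented for illustration, the ORDER BY rowid is a SQLite-specific way of keeping the units in insertion order, and the ? placeholders are SQLite's parameter style (other drivers may differ).

```python
import sqlite3

import pandas as pd

# In-memory stand-in for the real database (contents are made up)
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE my_table (timestamp TEXT, data1 REAL, data2 REAL);
    CREATE TABLE config (t_name TEXT, units TEXT);
    INSERT INTO my_table VALUES
        ('2024/10/24 11:00:00', 10, 5),
        ('2024/10/24 11:30:00', 11, 5),
        ('2024/10/24 13:00:00', 99, 99);
    INSERT INTO config VALUES
        ('my_table', 'volts'),
        ('my_table', 'amps'),
        ('other_table', 'temp');
    """
)

# Same two queries as above, with the literals passed as parameters
query_data = "SELECT * FROM my_table WHERE timestamp BETWEEN ? AND ?"
query_units = "SELECT units FROM config WHERE t_name = ? ORDER BY rowid"

df = pd.read_sql_query(
    query_data,
    conn,
    index_col="timestamp",
    params=("2024/10/24 11:00:00", "2024/10/24 12:00:00"),
)

# One unit per data column, relabelled with the data column names
eu = pd.read_sql_query(query_units, conn, params=("my_table",))
eu.index = df.columns

# Units row first, then the data rows
result = pd.concat([eu.T, df])
conn.close()
print(result)
```

The 13:00 row and the 'other_table' unit are there only to show that both WHERE clauses filter as intended.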