How to send Arrow data from FastAPI to the JS Apache Arrow package without copying?

  • Thread starter: Dean MacGregor (Guest)

Let's say generically my setup is like this:

Code:
import io

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import pyarrow as pa
import pyarrow.ipc as ipc

app = FastAPI()


@app.get("/api/getdata")
async def getdata():
    table = pa.Table.from_pydict({
        "name": ["Alice", "Bob", "Charlie"],
        "age": [25, 30, 22]})

    ### Not really sure what goes here
    ## something like this...
    # Serialize the whole table into an in-memory buffer (this is the copy in question)
    sink = io.BytesIO()
    with ipc.new_file(sink, table.schema) as writer:
        for batch in table.to_batches():
            writer.write(batch)
    # Rewind so the response reads the buffer from the start
    sink.seek(0)
    return StreamingResponse(content=sink, media_type="application/vnd.apache.arrow.file")

This works, but I'm copying the whole table into a BytesIO first. It seems like what I need is a generator that yields whatever writer.write(batch) would write to the buffer, instead of actually writing it, but I don't know how to do that. I tried using pa.BufferOutputStream instead of BytesIO, but I can't pass that to FastAPI as the return object.

My goal is to be able to get the data on the JS side like this:

Code:
import { tableFromIPC } from "apache-arrow";
const table = await tableFromIPC(fetch("/api/getdata"));
console.table([...table]);

With my approach this works; I'd just like to know if there's a way to do it without the copying.
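
For reference, here is a minimal sketch of the generator idea described above, not a confirmed solution. It reuses one small BytesIO as a scratch buffer and drains it after every writer.write(batch), so only one batch's worth of serialized bytes is held at a time. Each drained chunk is still copied once into a bytes object, so this bounds memory rather than being strictly zero-copy. The names iter_written_chunks and arrow_ipc_chunks are hypothetical, and the sketch swaps in the IPC stream format (ipc.new_stream) for the file format used above, on the assumption that tableFromIPC reads the stream format as well.

```python
import io
from typing import Any, Callable, Iterable, Iterator, Optional


def iter_written_chunks(
    open_writer: Callable[[io.BytesIO], Any],
    items: Iterable[Any],
) -> Iterator[bytes]:
    """Yield whatever the writer emits into a scratch buffer, item by item."""
    sink = io.BytesIO()

    def drain() -> bytes:
        # Copy out what has been written so far, then empty the buffer for reuse.
        chunk = sink.getvalue()
        sink.seek(0)
        sink.truncate()
        return chunk

    with open_writer(sink) as writer:
        for item in items:
            writer.write(item)
            chunk = drain()
            if chunk:
                yield chunk
    # Closing the writer may append trailing bytes
    # (for Arrow, presumably the end-of-stream marker).
    tail = drain()
    if tail:
        yield tail


def arrow_ipc_chunks(table: Any, max_chunksize: Optional[int] = None) -> Iterator[bytes]:
    """Hypothetical helper: serialize a pyarrow Table as Arrow IPC stream chunks."""
    import pyarrow.ipc as ipc  # local import: only this helper depends on pyarrow

    return iter_written_chunks(
        lambda sink: ipc.new_stream(sink, table.schema),
        table.to_batches(max_chunksize=max_chunksize),
    )
```

In the endpoint above, the return line would then become something like StreamingResponse(arrow_ipc_chunks(table), media_type="application/vnd.apache.arrow.stream"). The three-row example table is a single batch, so the chunking only pays off for larger tables or when max_chunksize is set.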
