Vectors in Redis Search index are corrupted even though index searches work correctly

Posted by magnanimousllamacopter (Guest)
I have a Redis cache using Redis Search with an HNSW index on a 512-element vector of float32 values.

It is defined like this:

Code:
from redis.commands.search.field import VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType

schema = (
    VectorField(
        "vector",
        "HNSW",
        {
            "TYPE": "FLOAT32",
            "DIM": 512,
            "DISTANCE_METRIC": "IP",
            "EF_RUNTIME": 400,
            "EPSILON": 0.4,
        },
        as_name="vector",
    ),
)

definition = IndexDefinition(prefix=[REDIS_PREFIX], index_type=IndexType.HASH)
res = client.ft(REDIS_INDEX_NAME).create_index(
    fields=schema, definition=definition
)

I insert numpy float32 vectors into this index by writing the result of vector.tobytes() directly into the hash field. I can then accurately query those same vectors using vector similarity search.
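To make the byte layout concrete, here is a minimal sketch (not from the original post) of the serialization being relied on: a 512-element float32 numpy vector serializes to exactly 512 × 4 = 2048 bytes with tobytes(), and frombuffer() restores it losslessly, provided the bytes come back unmodified:

```python
import numpy as np

# A 512-element float32 vector serializes to 512 * 4 = 2048 bytes.
vec = np.arange(512, dtype=np.float32)
blob = vec.tobytes()
print(len(blob))  # 2048

# The bytes round-trip losslessly as long as they are returned unmodified.
restored = np.frombuffer(blob, dtype=np.float32)
print(np.array_equal(restored, vec))  # True
```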

Despite search working correctly, when I read these vectors back out of the cache using client.hget(key, "vector"), I get results with a variable number of bytes. Every vector is definitely 512 elements (2048 bytes) when I insert it, but sometimes it comes back with a byte count that isn't even a multiple of 4, at which point I can't decode it back into a numpy vector.
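As a diagnostic aid (an assumption about one possible cause, not something confirmed in the post): variable, non-multiple-of-4 byte counts can arise if the raw bytes pass through a UTF-8 decode/encode cycle anywhere on the read path, for example with a redis-py client created with decode_responses=True. Each invalid byte gets replaced by the 3-byte replacement character, so the length changes:

```python
import numpy as np

vec = np.arange(512, dtype=np.float32)
raw = vec.tobytes()  # 2048 bytes, many of them invalid as UTF-8

# Hypothetical reproduction: round-trip the raw bytes through UTF-8 with
# replacement, as a text-decoding client would. Each invalid byte becomes
# the 3-byte replacement character U+FFFD, so the byte count changes.
mangled = raw.decode("utf-8", errors="replace").encode("utf-8")
print(len(raw), len(mangled))  # lengths differ whenever replacement occurred
```

Comparing len(raw) against what hget returns is a quick way to check whether the bytes are being altered between write and read.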

I can't tell if this is a bug, or if I'm doing something wrong. Either way, something clearly isn't right.