I'm currently working on caching for our chatbot, and I decided to use Redis's native similarity search. The node-redis library provides an interface for it. LangchainJS also has caching options, but they won't work the way I need: I already have embeddings to save and search with, and the Embedding class won't fit (maybe it could, but only with workarounds). So I decided to write a custom solution with node-redis. And here is the problem...
Their documentation has examples of how to work with KNN queries:
FT.SEARCH idx "*=>[KNN 10 @vec $BLOB]" PARAMS 2 BLOB "\x12\xa9\xf5\x6c" DIALECT 2
and it works (at least there are no errors) if I run it in the terminal or RedisInsight, but when I try to run this query from Node.js it throws the error [ErrorReply: Syntax error at offset 2 near > ]
async function searchSimilar(vector: number[]) {
  const query = `* => [KNN 10 @embedding $BLOB]`;
  const options = {
    PARAMS: {
      BLOB: Buffer.from(new Float32Array(vector).buffer),
      DIALECT: 2,
    },
  };
  const result = await client.ft.search("idx:answers", query, options);
  return result;
}
Maybe I didn't understand how it works... Here is an example of how it works in Langchain:
async similaritySearchVectorWithScore(query, k, filter) {
  if (filter && this.filter) {
    throw new Error("cannot provide both `filter` and `this.filter`");
  }
  const _filter = filter ?? this.filter;
  const results = await this.redisClient.ft.search(this.indexName, ...this.buildQuery(query, k, _filter));
  const result = [];
  if (results.total) {
    for (const res of results.documents) {
      if (res.value) {
        const document = res.value;
        if (document.vector_score) {
          result.push([
            new Document({
              pageContent: document[this.contentKey],
              metadata: JSON.parse(this.unEscapeSpecialChars(document.metadata)),
            }),
            Number(document.vector_score),
          ]);
        }
      }
    }
  }
  return result;
}
buildQuery(query, k, filter) {
  const vectorScoreField = "vector_score";
  let hybridFields = "*";
  // if a filter is set, modify the hybrid query
  if (filter && filter.length) {
    // `filter` is a list of strings, then it's applied using the OR operator in the metadata key
    // for example: filter = ['foo', 'bar'] => this will filter all metadata containing either 'foo' OR 'bar'
    hybridFields = `@${this.metadataKey}:(${this.prepareFilter(filter)})`;
  }
  const baseQuery = `${hybridFields} => [KNN ${k} @${this.vectorKey} $vector AS ${vectorScoreField}]`;
  const returnFields = [this.metadataKey, this.contentKey, vectorScoreField];
  const options = {
    PARAMS: {
      vector: this.getFloat32Buffer(query),
    },
    RETURN: returnFields,
    SORTBY: vectorScoreField,
    DIALECT: 2,
    LIMIT: {
      from: 0,
      size: k,
    },
  };
  return [baseQuery, options];
}
getFloat32Buffer(vector) {
  return Buffer.from(new Float32Array(vector).buffer);
}
I also found one more example, in Python, and I can't understand why my query doesn't work:
def create_query(
    return_fields: list,
    search_type: str="KNN",
    number_of_results: int=20,
    vector_field_name: str="img_vector",
    gender: t.Optional[str] = None,
    category: t.Optional[str] = None
):
    tag = "("
    if gender:
        tag += f"@gender:{{{gender}}}"
    if category:
        tag += f"@category:{{{category}}}"
    tag += ")"
    # if no tags are selected
    if len(tag) < 3:
        tag = "*"
    base_query = f'{tag}=>[{search_type} {number_of_results} @{vector_field_name} $vec_param AS vector_score]'
    return Query(base_query)\
        .sort_by("vector_score")\
        .paging(0, number_of_results)\
        .return_fields(*return_fields)\
        .dialect(2)
I've read the documentation, searched the internet, opened issues and discussions on GitHub, and looked through the built code of the Node.js libraries.
Take a look at this example. The DIALECT attribute should appear on its own, and not as part of the PARAMS, which should include only parameters for the query itself (like the BLOB in your example).
Basically you passed the DIALECT as a parameter for the query (even though it is not used there), while the parsing itself was done using the default dialect, which is currently 1 (can be changed with FT.CONFIG SET DEFAULT_DIALECT <n>). KNN search is not available in dialect 1, so you get a syntax error.
Hope that helps!
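For illustration, here is a minimal sketch of how the options from the question might look with DIALECT moved out of PARAMS. It assumes the same idx:answers index and embedding field as in the question; buildKnnSearch and toFloat32Buffer are hypothetical helper names used only for this example and are not part of node-redis.

```typescript
// Hypothetical helper: packs a number[] into the raw float32 bytes
// that the vector field expects (same conversion as in the question).
function toFloat32Buffer(vector: number[]): Buffer {
  return Buffer.from(new Float32Array(vector).buffer);
}

// Hypothetical helper: builds the query string and options for a KNN search.
// The field name "embedding" is taken from the question and may differ in your index.
function buildKnnSearch(vector: number[], k: number = 10) {
  const query = `*=>[KNN ${k} @embedding $BLOB]`;
  const options = {
    // PARAMS holds only values referenced in the query string ($BLOB here).
    PARAMS: {
      BLOB: toFloat32Buffer(vector),
    },
    // DIALECT is a top-level search option, not a query parameter.
    DIALECT: 2,
  };
  return { query, options };
}

// Usage with a connected node-redis client (not executed here):
//   const { query, options } = buildKnnSearch(vector);
//   const result = await client.ft.search("idx:answers", query, options);
```

Keeping the query and options in a small pure function makes it easy to check that DIALECT sits at the top level before the call ever reaches Redis.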