I am using Coldfusion, to do a site search for my company. I have a stored procedure being called within a cfc, that returns all results for a keyword, no limits.
Then I do a subquery that either restricts what data I have access to based on security settings, or limits it based on me filtering the result to 1 of 5 subsets of data.
Now we are searching the database using a view, and that view is a union all query, allowing me to search multiple tables at once, and the result returns columns of "id, type, title, url, rank". With different rank values for how it found this specific results, which ends up with duplicate results with different rank values....
Now my boss, wants me to add up all the rank values, update 1 record, and remove the rest of the duplicates...
So for example, if i was searching for hte word business
And I had different results like, found in title, +500, exact match for title +1000, found in description +200
But the problem is that when I tried to loop thru all thru all the results, it added significant performance drain.
So I am looking for an alternative way to sum up the rank values, and then get rid of the results.
Below is my basic logic flow for the whole process
Do Stored Procedure which searches the view for a keyword, searches in different fields in different ways generating duplicate results with different rank values.
Eliminates search results of those that i do not have access to, and if i had chosen to filter results into a certain subset of results, like in books, groceries, plants, make up any categories you like :)
Here i will show you the code:
<cfloop query="get_matches">
<!--- check if this already exists --->
<cfquery name="check_match" dbtype="query">
select id, sum(rank) as total
from get_matches
where url = '#get_matches.url#'
group by id
</cfquery>
<cfif check_match.total gt 1>
<!--- add the two rank values --->
<cfset my_rank = val(check_match.total) />
<!--- update the search results --->
<cfset get_matches.rank[get_matches.currentrow] = javacast("int",my_rank) />
<!--- get a list of rows that has that url --->
<!--- eliminate all other rows --->
<cfquery name="get_matches_bot" dbtype="query">
select id, type, title, url, rank
from get_matches
where url <> '#get_matches.url#'
group by id, type, title, url, rank
</cfquery>
<cfquery name="get_matches_top" dbtype="query">
select id, type, title, url, rank
from get_matches
where url = '#get_matches.url#'
and rank = #my_rank#
group by id, type, title, url, rank
</cfquery>
<cfquery name="get_matches" dbtype="query">
select id, type, title, url, rank
from get_matches_top
union
select id, type, title, url, rank
from get_matches_bot
group by id, type, title, url, rank
order by rank desc, title asc
</cfquery>
</cfif>
</cfloop>
Then after all that is done to help both get unique rows that have total number of rank values for where they had records in the #2 results.
<cfquery name="get_matches" dbtype="query">
select id, type, title, url, rank
from get_matches
group by id, type, title, url, rank
order by rank desc, title asc
</cfquery>
There has to be a better way, performance wise, to sum the rank values, get rid of duplicate row's without looping.
Any thoughts or suggestions?
I figured out the answer was to do a sum, around the rank field, and that made it fast.