Background:
This is part of a scheduled job that retrieves data from an external site (the site provides a web service API for retrieving the data) and updates a database with the new information. It retrieves approximately 3,500 data items. My current scheduled job creates blocks of CFThread tasks that run 10 threads at a time and joins them before starting the next block of 10.
Code:
<cfset local.nMaxThreadCount = 10>
<!---retrieve a query that contains the items that need to be updated, approximately 3,500 items--->
<cfset local.qryItemsNeedingUpdate = getItemsNeedingUpdate(dtMostRecentItemPriceDate = local.qryMostRecentItemPriceDate.dtMostRecentItemPrice[1])>
<cfset local.nThreadBlocks = Ceiling(local.qryItemsNeedingUpdate.RecordCount / local.nMaxThreadCount)>
<!---note: this try/catch only covers the spawning code; errors thrown inside a thread are not caught here--->
<cftry>
    <cfloop index="local.nThreadBlock" from="1" to="#local.nThreadBlocks#">
        <!---size of this block: a full block, or whatever is left over in the last block--->
        <!---(a plain MOD would give 0 here when RecordCount is an exact multiple of the block size)--->
        <cfset local.nThreadCount = Min(local.nMaxThreadCount, local.qryItemsNeedingUpdate.RecordCount - ((local.nThreadBlock - 1) * local.nMaxThreadCount))>
        <cfset local.lstThreads = "">
        <cfloop index="local.nThread" from="1" to="#local.nThreadCount#">
            <cfset local.nQryIdx = ((local.nThreadBlock - 1) * local.nMaxThreadCount) + local.nThread>
            <cfset local.vcThreadName = "updateThread#local.qryItemsNeedingUpdate.nItemID[local.nQryIdx]#">
            <cfset local.lstThreads = ListAppend(local.lstThreads, local.vcThreadName)>
            <!---create the attributes struct to pass to a thread--->
            <cfset local.stThread = StructNew()>
            <cfset local.stThread.action = "run">
            <cfset local.stThread.name = local.vcThreadName>
            <cfset local.stThread.nItemID = local.qryItemsNeedingUpdate.nItemID[local.nQryIdx]>
            <!---spawn thread--->
            <cfthread attributecollection="#local.stThread#">
                <cfset updateItemPrices(nItemID = attributes.nItemID)>
            </cfthread>
        </cfloop>
        <!---join threads: wait for this block to finish before starting the next--->
        <cfthread action="join" name="#local.lstThreads#" />
    </cfloop>
    <cfcatch type="any">
        <cflog text="detailed error message logged here..." type="Error" file="myDailyJob" application="yes">
    </cfcatch>
</cftry>
Questions:
Is this kind of logic needed for background processes? That is, is CFThread action="join" needed? Nothing is displayed from the threads, and the threads are independent (they do not rely on the other threads or on the process that spawned them). The threads update prices in a database and die. Is it necessary to throttle the threads, that is, run 10 at a time and join them? Could the process loop and create all 3,500 threads at once? Will ColdFusion queue the extra threads and run them as it has time?
"join" isn't necessary unless you need to output info to the page after threads complete.
Threads will queue, but how many run at once and how the rest are queued varies by the version of ColdFusion you're running.
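If you do go without the join, the loop could look roughly like the sketch below. It assumes the same local.qryItemsNeedingUpdate query and updateItemPrices function from the question, and that nItemID is unique so the thread names don't collide. Since an error thrown inside a thread does not reach a cftry around the spawning code, the sketch moves the try/catch into the thread body.

```cfml
<!--- spawn one thread per item and let ColdFusion queue whatever it cannot run right away --->
<cfloop index="local.nRow" from="1" to="#local.qryItemsNeedingUpdate.RecordCount#">
    <cfthread action="run"
              name="updateThread#local.qryItemsNeedingUpdate.nItemID[local.nRow]#"
              nItemID="#local.qryItemsNeedingUpdate.nItemID[local.nRow]#">
        <!--- errors are handled per thread, because the parent request will not see them --->
        <cftry>
            <cfset updateItemPrices(nItemID = attributes.nItemID)>
            <cfcatch type="any">
                <cflog text="Item #attributes.nItemID#: #cfcatch.message#" type="Error" file="myDailyJob" application="yes">
            </cfcatch>
        </cftry>
    </cfthread>
</cfloop>
<!--- no action="join": the scheduled task request can finish while the threads keep working --->
```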
For what you're doing, however, threads aren't what you want. You want to use a message queue, like ActiveMQ or Amazon SQS. You can use an event gateway like the ActiveMQ gateway that comes with Adobe CF, or write your own if you're working with a different message queue or CF engine. (For example, I wrote a messaging system in CFML that uses Amazon SQS and Railo event gateways.)
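As a rough sketch of the consuming side, an event gateway listener CFC handles each message in an onIncomingMessage method. The component name, the nItemID key in the message data, and the producer that puts one message per item on the queue are all assumptions for illustration; the real message format depends on the gateway and queue you use.

```cfml
<!--- ItemPriceListener.cfc: hypothetical listener CFC registered with the event gateway --->
<cfcomponent output="false">

    <cffunction name="onIncomingMessage" access="public" returntype="void" output="false">
        <cfargument name="CFEvent" type="struct" required="true">
        <!--- assumption: the message payload carries the item ID as data.nItemID --->
        <cfset var nItemID = arguments.CFEvent.data.nItemID>
        <cftry>
            <!--- updateItemPrices is the function from the question, assumed to be available in this component --->
            <cfset updateItemPrices(nItemID = nItemID)>
            <cfcatch type="any">
                <cflog text="Item #nItemID#: #cfcatch.message#" type="Error" file="myDailyJob" application="yes">
            </cfcatch>
        </cftry>
    </cffunction>

</cfcomponent>
```

With this shape, the scheduled job only has to enqueue 3,500 small messages, and the gateway decides how many are processed at a time.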