I have a network of Java Threads (Flow-Based Programming) communicating via fixed-capacity channels, running under Windows XP. Based on our experience with "green" (non-preemptive) threads, we expected that threads would switch context less often (thus reducing CPU time) if the channels were made bigger. However, we found that increasing channel size makes no difference to the run time. What seems to be happening is that Java decides to switch threads even though the channels aren't full or empty (i.e. even though a thread doesn't have to suspend), which costs CPU time for no apparent advantage. Changing Thread priorities also makes no observable difference.
My question is whether there is some way of persuading Java not to make unnecessary context switches, but to hold off until a switch is really necessary - is there some way of changing Java's dispatching logic? Or is it reacting to something I didn't pay attention to?! Or are there other asynchronous mechanisms I should be looking at, e.g. Thread factories, Runnables, maybe even daemons (!)? The answer appears to be non-obvious, as so far none of my correspondents has come up with one (including, most recently, two CS profs). Or maybe I'm missing something that's so obvious that people can't imagine my not knowing it...
I've added the send and receive code below - not very elegant, but it seems to work... ;-) In case you are wondering, I thought the goLock logic in send() might be causing the problem, but removing it temporarily made no difference.
public synchronized Packet receive() {
    if (isDrained()) {
        return null;
    }
    while (isEmpty()) {
        try {
            wait();
        } catch (InterruptedException e) {
            close();
            return null;
        }
        if (isDrained()) {
            return null;
        }
    }
    if (isDrained()) {
        return null;
    }
    if (isFull()) {
        notifyAll(); // notify other components waiting to send
    }
    Packet packet = array[receivePtr];
    array[receivePtr] = null;
    receivePtr = (receivePtr + 1) % array.length;
    // notifyAll(); // only needed if it was full
    usedSlots--;
    packet.setOwner(receiver);
    if (null == packet.getContent()) {
        traceFuncs("Received null packet");
    } else {
        traceFuncs("Received: " + packet.toString());
    }
    return packet;
}
synchronized boolean send(final Packet packet, final OutputPort op) {
    sender = op.sender;
    if (isClosed()) {
        return false;
    }
    while (isFull()) {
        try {
            wait();
        } catch (InterruptedException e) {
            indicateOneSenderClosed();
            return false;
        }
        sender = op.sender;
    }
    if (isClosed()) {
        return false;
    }
    try {
        receiver.goLock.lockInterruptibly();
    } catch (InterruptedException ex) {
        return false;
    }
    try {
        packet.clearOwner();
        array[sendPtr] = packet;
        sendPtr = (sendPtr + 1) % array.length;
        usedSlots++; // move this to here
        if (receiver.getStatus() == StatusValues.DORMANT
                || receiver.getStatus() == StatusValues.NOT_STARTED) {
            receiver.activate(); // start or wake up if necessary
        } else {
            notifyAll(); // notify receiver
            // other components waiting to send to this connection may also get
            // notified, but this is handled by the while statement
        }
        sender = null;
        Component.network.active = true;
    } finally {
        receiver.goLock.unlock();
    }
    return true;
}
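For comparison, here is a minimal sketch of the same send/receive pattern built on java.util.concurrent.ArrayBlockingQueue instead of hand-rolled wait()/notifyAll(). It omits the drained/closed logic and the goLock, so it is not a drop-in replacement for the code above - just a way of checking whether the switching behaviour changes when the blocking is delegated to the library (the class and method names here are my own, not JavaFBP's):

import java.util.concurrent.ArrayBlockingQueue;

class SimpleChannel {
    // reuses the Packet class from the code above
    private final ArrayBlockingQueue<Packet> queue;

    SimpleChannel(int capacity) {
        queue = new ArrayBlockingQueue<Packet>(capacity);
    }

    // blocks while the channel is full, like send() above
    boolean send(Packet packet) {
        try {
            queue.put(packet);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // blocks while the channel is empty, like receive() above
    Packet receive() {
        try {
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}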
Thanks for asking! I have been discussing the same question on the Sun forum, and here is my last post on that forum:
Our best guess right now is that this effect results from Windows' scheduling logic.
Microsoft seems to be acknowledging that this area needs improvement, as it is introducing User-Mode Scheduling (UMS) - I quote: "UMS is recommended for applications with high performance requirements that need to efficiently run many threads concurrently on multiprocessor or multicore systems. ... UMS is available starting with 64-bit versions of Windows 7 and Windows Server 2008 R2. This feature is not available on 32-bit versions of Windows." Hopefully, Java will take advantage of UMS in some later release.
Thanks for your help!
I'm a bit embarrassed - it suddenly occurred to me this afternoon that maybe the network whose performance I was worried about was just too simple: I only had two processes and two processors, so Windows may have been trying too hard to keep the processors balanced! So I wondered what would happen if I gave Windows lots of processes.
I set up two networks:
a) 50 Generate components feeding 50 Discard components - i.e. highly parallel network - so that's 100 threads in total
b) 50 Generate components feeding 1 Discard component - i.e. highly "funnelled" network - so that's 51 threads
I ran each one 6 times with a connection capacity of 10, and 6 times with a connection capacity of 100. Every run generated a total of 50 * 20,000 information packets, i.e. 1,000,000 packets, and ran for about 1 minute.
Here are the averages of the 4 cases:

a) connection capacity 10: 59.151 secs
a) connection capacity 100: 52.008 secs
b) connection capacity 10: 76.745 secs
b) connection capacity 100: 60.667 secs
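For anyone who wants to try reproducing case b) without pulling in JavaFBP, here is a rough stand-alone sketch. It uses ArrayBlockingQueue and plain Integers in place of JavaFBP connections and packets, so the absolute times won't match the figures above, but the effect of the capacity parameter should still be visible:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class FunnelTest {
    static final int SENDERS = 50;
    static final int PACKETS_PER_SENDER = 20000;

    public static void main(String[] args) throws InterruptedException {
        // connection capacity, e.g. 10 or 100
        final int capacity = (args.length > 0) ? Integer.parseInt(args[0]) : 10;
        final BlockingQueue<Integer> channel =
                new ArrayBlockingQueue<Integer>(capacity);
        long start = System.currentTimeMillis();

        // 50 "Generate" components, each feeding the single shared channel
        Thread[] generators = new Thread[SENDERS];
        for (int i = 0; i < SENDERS; i++) {
            generators[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        for (int n = 0; n < PACKETS_PER_SENDER; n++) {
                            channel.put(n); // blocks when the channel is full
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            generators[i].start();
        }

        // single "Discard" component: drain all 1,000,000 packets
        for (int i = 0; i < SENDERS * PACKETS_PER_SENDER; i++) {
            channel.take(); // blocks when the channel is empty
        }
        for (Thread t : generators) {
            t.join();
        }
        System.out.println("Elapsed: "
                + (System.currentTimeMillis() - start) + " ms");
    }
}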
So it looks like connection capacity does make a difference! And it looks like JavaFBP performs reasonably well... I apologize for being a bit hasty - but maybe this made us all think a bit more deeply about multithreading on a multicore machine... ;-)
Apologies again, and thanks to everyone who contributed thoughts on this topic!