What is the optimal number of processes per core? Say you're given a machine with 2 CPUs and 4 cores each, what is the number of processes that will give you the best performance?
The answer is, naturally, that it depends. Obviously, if you're interested in the performance of a certain single-threaded application, other processes just clutter your machine and compete over the shared resources. So let's look at two cases where this question may be interesting: running several independent processes at once, and running a single application that spreads its work over multiple threads.
The second case is easier to answer: it (.. wait for it ..) depends on what you're running! If you have locks, more threads may lead to higher contention and conflicts. If you're lock-free (or even some flavors of wait-free), you may still have fairness issues. It also depends on how the work is balanced internally in your application, and on how your task schedulers work. There are simply too many possible designs out there today to give one answer.
If we assume you have perfect balancing between your threads, and no overhead for increasing their number, you can perhaps align this with the other use case, where you simply run multiple independent processes. In that case, performance may have several sweet spots. The first is when you reach the number of physical cores (in your case 8, assuming you have 4 physical cores per socket). At that point, you're saturating your existing HW to the max. However, if some SMT mechanism (like Hyperthreading) is supported, you may double the number of cores the system sees, using 2 logical cores per physical one. This doesn't add any resources into the story, it just splits the existing ones, which may carry some penalty for the execution of each process, but on the other hand lets you run twice as many processes simultaneously.
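As a rough sketch of those two sweet spots, the counts can be computed from the machine's topology. The function name and its parameters are made up for illustration, and the 2 threads per core is an assumption about SMT being enabled on the machine in the question.

```python
import os


def candidate_process_counts(sockets, cores_per_socket, threads_per_core=1):
    """Return (physical, logical) core counts as candidate process counts."""
    physical = sockets * cores_per_socket
    logical = physical * threads_per_core
    return physical, logical


# The machine from the question: 2 sockets with 4 cores each.
# threads_per_core=2 is an assumption (SMT/Hyperthreading enabled).
physical, logical = candidate_process_counts(2, 4, threads_per_core=2)
print(physical, logical)  # 8 16

# Logical CPUs reported by the OS on whatever machine runs this script.
print(os.cpu_count())
```

Note that `os.cpu_count()` reports logical CPUs, so on an SMT machine it corresponds to the second sweet spot, not the first.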
The overall aggregated speedup may vary, but I've seen numbers of up to 30% on average on generic benchmarks. As a rule of thumb, processes that are memory-latency bound or have complicated control flow can benefit from this, since the core can still make progress while one thread is blocked. Code that is bound by execution bandwidth (like heavy floating-point calculations) or by memory bandwidth wouldn't gain as much.
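To put that figure in perspective, here is a back-of-the-envelope calculation. It assumes 2 hardware threads per core and takes the 30% as a given input, not as a measurement of your workload; the function name is invented for this example.

```python
def smt_estimate(physical_cores, smt_gain=0.30):
    """Estimate throughput with SMT, in units of one process owning one core.

    Assumes 2 hardware threads per physical core; smt_gain is the assumed
    aggregate speedup per core when both hardware threads are busy.
    """
    aggregate = physical_cores * (1.0 + smt_gain)
    per_process = (1.0 + smt_gain) / 2  # each of the 2 threads gets half
    return aggregate, per_process


aggregate, per_process = smt_estimate(8)
print(f"aggregate throughput: {aggregate:.1f} core-equivalents")
print(f"speed of each process: {per_process:.2f}x of a dedicated core")
```

Under these assumptions, 16 processes on 8 cores finish more total work per second than 8 processes would, but each individual process runs at roughly two thirds of the speed it would have with a core to itself.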
Beyond that number of processes, it may still be beneficial in some cases to add more. They won't run in parallel, but if the overhead of context switches isn't too high, and you want to minimize the average wait time (which is also a way to look at performance that's not pure IPC), or you depend on getting output out as early as possible, there are scenarios where this is useful.
One last point: the "optimal" number of processes may be even less than the number of cores if your processes saturate other resources before reaching that point. If, for example, each thread requires a huge chunk of virtual memory, you may start thrashing and paging memory out to disk (a painful penalty). If each thread has a large data set which it uses over and over, you could fill up your shared cache and start losing performance from that point on by adding more threads. The same goes for heavy IO, and so on.
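For the memory case, a minimal sketch of such a cap might look like the following. The function name and the 16 GiB and 3 GiB figures are hypothetical, and a real limit would also have to account for the OS and other running programs.

```python
def max_processes(cores, total_memory_bytes, per_process_bytes):
    """Cap the process count by whichever runs out first: cores or memory."""
    by_memory = total_memory_bytes // per_process_bytes
    return max(1, min(cores, by_memory))


GIB = 1024 ** 3

# Hypothetical numbers: 8 cores, 16 GiB of RAM, 3 GiB working set per process.
print(max_processes(cores=8, total_memory_bytes=16 * GIB, per_process_bytes=3 * GIB))  # 5
```

Here memory, not the core count, is the limit: a sixth process would push the combined working set past the available RAM.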
As you can see, there's no right or wrong answer here; you simply need to benchmark your code over different systems and with different process counts.
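A simple way to run such a benchmark is to launch the same job N times in parallel for several values of N and compare throughput. The sketch below is only an illustration: the `WORKLOAD` string is a stand-in CPU-bound loop, and `run_batch` and `sweep` are names invented for this example. Swap in the command line of the program you actually care about.

```python
import subprocess
import sys
import time

# Stand-in CPU-bound workload (an assumption); replace with your real program.
WORKLOAD = "sum(i * i for i in range({iterations}))"


def run_batch(num_processes, iterations=200_000):
    """Run num_processes independent interpreters; return jobs per second."""
    command = [sys.executable, "-c", WORKLOAD.format(iterations=iterations)]
    start = time.perf_counter()
    children = [subprocess.Popen(command) for _ in range(num_processes)]
    for child in children:
        if child.wait() != 0:
            raise RuntimeError("a worker process failed")
    elapsed = time.perf_counter() - start
    return num_processes / elapsed


def sweep(process_counts, iterations=200_000):
    """Map each process count to its measured throughput (jobs per second)."""
    return {n: run_batch(n, iterations) for n in process_counts}


if __name__ == "__main__":
    for n, throughput in sweep([1, 2, 4, 8, 16]).items():
        print(f"{n:3d} processes: {throughput:8.2f} jobs/s")
```

Throughput typically rises until some resource saturates and then flattens or drops. Where that happens depends on the workload and the machine, so the numbers from one system should not be assumed to carry over to another. With a job this short, interpreter startup time dominates, so use a workload that runs for at least a few seconds when measuring for real.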