Why does sleeping increase the execution time of an independent piece of code that is executed before/afterwards?Improve INSERT-per-second performance of SQLite?Why does Python code run faster in a function?Python time.sleep vs busy wait accuracyWhy does GCC generate 15-20% faster code if I optimize for size instead of speed?Strange (huge) performance difference between std::vector, QList and std::listReplacing a 32-bit loop counter with 64-bit introduces crazy performance deviationsWhy does this piece of code gets slower with time?Perf overcounting simple CPU-bound loop: mysterious kernel work?C++ What does the execution time of a class method depend on?Why is if (variable1 % variable2 == 0) inefficient?

How many Jimmys can fit?

Is this Cambridge Dictionary example of "felicitate" valid?

Interpretation of non-significant results as "trends"

How to configure apt in Debian Buster after release

What was the nature of the known bugs in the Space Shuttle software?

Did Rabbi Akiva accept arguments from ignorance?

Is there a method for differentiating informative comments from commented out code?

Why did Dumbledore ignore this line?

What could cause the sea level to massively decrease?

Found and corrected a mistake on someone's else paper -- praxis?

How do ballistic trajectories work in a rotating cylinder world?

Why is a mixture of two normally distributed variables only bimodal if their means differ by at least two times the common standard deviation?

Why AI became applicable only after Nvidia's chips were available?

Users forgetting to regenerate PDF before sending it

QR codes, do people use them?

What exactly is a "murder hobo"?

Can Jimmy hang on his rope?

Why does Trump want a citizenship question on the census?

Why does sleeping increase the execution time of an independent piece of code that is executed before/afterwards?

Why won't the U.S. sign a peace treaty with North Korea?

Why different specifications for telescopes and binoculars?

Replacement for Thyme

What do you call a situation where you have choices but no good choice?

When I press the space bar it deletes the letters in front of it



Why does sleeping increase the execution time of an independent piece of code that is executed before/afterwards?


Improve INSERT-per-second performance of SQLite?Why does Python code run faster in a function?Python time.sleep vs busy wait accuracyWhy does GCC generate 15-20% faster code if I optimize for size instead of speed?Strange (huge) performance difference between std::vector, QList and std::listReplacing a 32-bit loop counter with 64-bit introduces crazy performance deviationsWhy does this piece of code gets slower with time?Perf overcounting simple CPU-bound loop: mysterious kernel work?C++ What does the execution time of a class method depend on?Why is if (variable1 % variable2 == 0) inefficient?






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;








8















I noticed something really strange that I haven't seen before. The basic setup is described in this pseudo code:



TARGET_LOOP_TIME = X

loop forever:
before = now()
payload()
payload_time = now() - before
sleep(TARGET_LOOP_TIME - payload_time)


This setup is fairly common, e.g. for keeping the loop at 60 FPS. The interesting part is: the payload_time depends on the sleep duration! If the TARGET_LOOP_TIME is high and the program will thus sleep a lot, the payload_time is way higher compared to when the program does not sleep at all.



To measure this, I wrote this program:



use std::time::Duration, Instant;

const ITERS: usize = 100;

fn main() Duration::from_millis(n))
let mut payload_duration = Duration::from_millis(0);

for _ in 0..ITERS
let before = Instant::now();
x += count_primes(3_500);
let elapsed = before.elapsed();
payload_duration += elapsed;

// Sleep the remaining time
if loop_time > elapsed
std::thread::sleep(loop_time - elapsed);



let avg_duration = payload_duration / ITERS as u32;
println!("loop_time :.2? t=> :.2?", loop_time, avg_duration);


println!("", x);


/// Dummy function.
fn count_primes(up_to: u64) -> u64 n


I iterate over different target loop times to test (1ms to 30ms) and iterate for ITERS many times. I compiled this with cargo run --release. On my machine (Ubuntu), the program outputs:



loop_time 1.00ms => 3.37ms
loop_time 2.00ms => 3.38ms
loop_time 3.00ms => 3.17ms
loop_time 4.00ms => 3.25ms
loop_time 5.00ms => 3.38ms
loop_time 6.00ms => 4.05ms
loop_time 7.00ms => 4.09ms
loop_time 8.00ms => 4.48ms
loop_time 9.00ms => 4.43ms
loop_time 10.00ms => 4.22ms
loop_time 11.00ms => 4.59ms
loop_time 12.00ms => 5.53ms
loop_time 13.00ms => 5.82ms
loop_time 14.00ms => 6.18ms
loop_time 15.00ms => 6.32ms
loop_time 16.00ms => 6.96ms
loop_time 17.00ms => 8.00ms
loop_time 18.00ms => 7.97ms
loop_time 19.00ms => 8.28ms
loop_time 20.00ms => 8.75ms
loop_time 21.00ms => 9.70ms
loop_time 22.00ms => 9.57ms
loop_time 23.00ms => 10.48ms
loop_time 24.00ms => 10.29ms
loop_time 25.00ms => 10.31ms
loop_time 26.00ms => 10.82ms
loop_time 27.00ms => 10.84ms
loop_time 28.00ms => 10.82ms
loop_time 29.00ms => 10.91ms


I made a plot of those numbers (the sleep_time is max(0, loop_time - avg_duration)):



enter image description here



When the program does not sleep at all, the payload needs approximately 3.3ms (as the first three measurements show). As soon as the loop starts to sleep after the payload, the payload duration rises! In fact, it increases to roughly 10.5ms where it stays. Sleeping even longer does not increase the payload time.



Why? Why does the execution time of the piece of code depend on something I do afterwards (or before)? That does not make sense to me! It looks like the CPU says "I'm gonna sleep afterwards anyway, so let's take it slowly". I thought about caching effects, especially of the instruction cache, but loading instruction data from the main memory does not take 7ms! Something else is going on here!



Is there a way to fix this? I.e. to make the payload execute as quickly as possible regardless of sleep time?










share|improve this question
























  • Couldn't the compiler optmize out count_primes? It seems to be a pure function and is called with the same parameters every iteration. It should also be possible to compute x once at compile time.

    – Jens
    8 hours ago











  • @Jens That's a good point. It doesn't seem like it's optimized out, but yeah, I should fix that.

    – Lukas Kalbertodt
    8 hours ago











  • Note for everyone: it seems like this is due to CPU frequency scaling and it doesn't happen on all machines. This is still not a full answer however and I hope someone could provide more information.

    – Lukas Kalbertodt
    8 hours ago

















8















I noticed something really strange that I haven't seen before. The basic setup is described in this pseudo code:



TARGET_LOOP_TIME = X

loop forever:
before = now()
payload()
payload_time = now() - before
sleep(TARGET_LOOP_TIME - payload_time)


This setup is fairly common, e.g. for keeping the loop at 60 FPS. The interesting part is: the payload_time depends on the sleep duration! If the TARGET_LOOP_TIME is high and the program will thus sleep a lot, the payload_time is way higher compared to when the program does not sleep at all.



To measure this, I wrote this program:



use std::time::Duration, Instant;

const ITERS: usize = 100;

fn main() Duration::from_millis(n))
let mut payload_duration = Duration::from_millis(0);

for _ in 0..ITERS
let before = Instant::now();
x += count_primes(3_500);
let elapsed = before.elapsed();
payload_duration += elapsed;

// Sleep the remaining time
if loop_time > elapsed
std::thread::sleep(loop_time - elapsed);



let avg_duration = payload_duration / ITERS as u32;
println!("loop_time :.2? t=> :.2?", loop_time, avg_duration);


println!("", x);


/// Dummy function.
fn count_primes(up_to: u64) -> u64 n


I iterate over different target loop times to test (1ms to 30ms) and iterate for ITERS many times. I compiled this with cargo run --release. On my machine (Ubuntu), the program outputs:



loop_time 1.00ms => 3.37ms
loop_time 2.00ms => 3.38ms
loop_time 3.00ms => 3.17ms
loop_time 4.00ms => 3.25ms
loop_time 5.00ms => 3.38ms
loop_time 6.00ms => 4.05ms
loop_time 7.00ms => 4.09ms
loop_time 8.00ms => 4.48ms
loop_time 9.00ms => 4.43ms
loop_time 10.00ms => 4.22ms
loop_time 11.00ms => 4.59ms
loop_time 12.00ms => 5.53ms
loop_time 13.00ms => 5.82ms
loop_time 14.00ms => 6.18ms
loop_time 15.00ms => 6.32ms
loop_time 16.00ms => 6.96ms
loop_time 17.00ms => 8.00ms
loop_time 18.00ms => 7.97ms
loop_time 19.00ms => 8.28ms
loop_time 20.00ms => 8.75ms
loop_time 21.00ms => 9.70ms
loop_time 22.00ms => 9.57ms
loop_time 23.00ms => 10.48ms
loop_time 24.00ms => 10.29ms
loop_time 25.00ms => 10.31ms
loop_time 26.00ms => 10.82ms
loop_time 27.00ms => 10.84ms
loop_time 28.00ms => 10.82ms
loop_time 29.00ms => 10.91ms


I made a plot of those numbers (the sleep_time is max(0, loop_time - avg_duration)):



enter image description here



When the program does not sleep at all, the payload needs approximately 3.3ms (as the first three measurements show). As soon as the loop starts to sleep after the payload, the payload duration rises! In fact, it increases to roughly 10.5ms where it stays. Sleeping even longer does not increase the payload time.



Why? Why does the execution time of the piece of code depend on something I do afterwards (or before)? That does not make sense to me! It looks like the CPU says "I'm gonna sleep afterwards anyway, so let's take it slowly". I thought about caching effects, especially of the instruction cache, but loading instruction data from the main memory does not take 7ms! Something else is going on here!



Is there a way to fix this? I.e. to make the payload execute as quickly as possible regardless of sleep time?










share|improve this question
























  • Couldn't the compiler optmize out count_primes? It seems to be a pure function and is called with the same parameters every iteration. It should also be possible to compute x once at compile time.

    – Jens
    8 hours ago











  • @Jens That's a good point. It doesn't seem like it's optimized out, but yeah, I should fix that.

    – Lukas Kalbertodt
    8 hours ago











  • Note for everyone: it seems like this is due to CPU frequency scaling and it doesn't happen on all machines. This is still not a full answer however and I hope someone could provide more information.

    – Lukas Kalbertodt
    8 hours ago













8












8








8


1






I noticed something really strange that I haven't seen before. The basic setup is described in this pseudo code:



TARGET_LOOP_TIME = X

loop forever:
before = now()
payload()
payload_time = now() - before
sleep(TARGET_LOOP_TIME - payload_time)


This setup is fairly common, e.g. for keeping the loop at 60 FPS. The interesting part is: the payload_time depends on the sleep duration! If the TARGET_LOOP_TIME is high and the program will thus sleep a lot, the payload_time is way higher compared to when the program does not sleep at all.



To measure this, I wrote this program:



use std::time::Duration, Instant;

const ITERS: usize = 100;

fn main() Duration::from_millis(n))
let mut payload_duration = Duration::from_millis(0);

for _ in 0..ITERS
let before = Instant::now();
x += count_primes(3_500);
let elapsed = before.elapsed();
payload_duration += elapsed;

// Sleep the remaining time
if loop_time > elapsed
std::thread::sleep(loop_time - elapsed);



let avg_duration = payload_duration / ITERS as u32;
println!("loop_time :.2? t=> :.2?", loop_time, avg_duration);


println!("", x);


/// Dummy function.
fn count_primes(up_to: u64) -> u64 n


I iterate over different target loop times to test (1ms to 30ms) and iterate for ITERS many times. I compiled this with cargo run --release. On my machine (Ubuntu), the program outputs:



loop_time 1.00ms => 3.37ms
loop_time 2.00ms => 3.38ms
loop_time 3.00ms => 3.17ms
loop_time 4.00ms => 3.25ms
loop_time 5.00ms => 3.38ms
loop_time 6.00ms => 4.05ms
loop_time 7.00ms => 4.09ms
loop_time 8.00ms => 4.48ms
loop_time 9.00ms => 4.43ms
loop_time 10.00ms => 4.22ms
loop_time 11.00ms => 4.59ms
loop_time 12.00ms => 5.53ms
loop_time 13.00ms => 5.82ms
loop_time 14.00ms => 6.18ms
loop_time 15.00ms => 6.32ms
loop_time 16.00ms => 6.96ms
loop_time 17.00ms => 8.00ms
loop_time 18.00ms => 7.97ms
loop_time 19.00ms => 8.28ms
loop_time 20.00ms => 8.75ms
loop_time 21.00ms => 9.70ms
loop_time 22.00ms => 9.57ms
loop_time 23.00ms => 10.48ms
loop_time 24.00ms => 10.29ms
loop_time 25.00ms => 10.31ms
loop_time 26.00ms => 10.82ms
loop_time 27.00ms => 10.84ms
loop_time 28.00ms => 10.82ms
loop_time 29.00ms => 10.91ms


I made a plot of those numbers (the sleep_time is max(0, loop_time - avg_duration)):



enter image description here



When the program does not sleep at all, the payload needs approximately 3.3ms (as the first three measurements show). As soon as the loop starts to sleep after the payload, the payload duration rises! In fact, it increases to roughly 10.5ms where it stays. Sleeping even longer does not increase the payload time.



Why? Why does the execution time of the piece of code depend on something I do afterwards (or before)? That does not make sense to me! It looks like the CPU says "I'm gonna sleep afterwards anyway, so let's take it slowly". I thought about caching effects, especially of the instruction cache, but loading instruction data from the main memory does not take 7ms! Something else is going on here!



Is there a way to fix this? I.e. to make the payload execute as quickly as possible regardless of sleep time?










share|improve this question
















I noticed something really strange that I haven't seen before. The basic setup is described in this pseudo code:



TARGET_LOOP_TIME = X

loop forever:
before = now()
payload()
payload_time = now() - before
sleep(TARGET_LOOP_TIME - payload_time)


This setup is fairly common, e.g. for keeping the loop at 60 FPS. The interesting part is: the payload_time depends on the sleep duration! If the TARGET_LOOP_TIME is high and the program will thus sleep a lot, the payload_time is way higher compared to when the program does not sleep at all.



To measure this, I wrote this program:



use std::time::Duration, Instant;

const ITERS: usize = 100;

fn main() Duration::from_millis(n))
let mut payload_duration = Duration::from_millis(0);

for _ in 0..ITERS
let before = Instant::now();
x += count_primes(3_500);
let elapsed = before.elapsed();
payload_duration += elapsed;

// Sleep the remaining time
if loop_time > elapsed
std::thread::sleep(loop_time - elapsed);



let avg_duration = payload_duration / ITERS as u32;
println!("loop_time :.2? t=> :.2?", loop_time, avg_duration);


println!("", x);


/// Dummy function.
fn count_primes(up_to: u64) -> u64 n


I iterate over different target loop times to test (1ms to 30ms) and iterate for ITERS many times. I compiled this with cargo run --release. On my machine (Ubuntu), the program outputs:



loop_time 1.00ms => 3.37ms
loop_time 2.00ms => 3.38ms
loop_time 3.00ms => 3.17ms
loop_time 4.00ms => 3.25ms
loop_time 5.00ms => 3.38ms
loop_time 6.00ms => 4.05ms
loop_time 7.00ms => 4.09ms
loop_time 8.00ms => 4.48ms
loop_time 9.00ms => 4.43ms
loop_time 10.00ms => 4.22ms
loop_time 11.00ms => 4.59ms
loop_time 12.00ms => 5.53ms
loop_time 13.00ms => 5.82ms
loop_time 14.00ms => 6.18ms
loop_time 15.00ms => 6.32ms
loop_time 16.00ms => 6.96ms
loop_time 17.00ms => 8.00ms
loop_time 18.00ms => 7.97ms
loop_time 19.00ms => 8.28ms
loop_time 20.00ms => 8.75ms
loop_time 21.00ms => 9.70ms
loop_time 22.00ms => 9.57ms
loop_time 23.00ms => 10.48ms
loop_time 24.00ms => 10.29ms
loop_time 25.00ms => 10.31ms
loop_time 26.00ms => 10.82ms
loop_time 27.00ms => 10.84ms
loop_time 28.00ms => 10.82ms
loop_time 29.00ms => 10.91ms


I made a plot of those numbers (the sleep_time is max(0, loop_time - avg_duration)):



enter image description here



When the program does not sleep at all, the payload needs approximately 3.3ms (as the first three measurements show). As soon as the loop starts to sleep after the payload, the payload duration rises! In fact, it increases to roughly 10.5ms where it stays. Sleeping even longer does not increase the payload time.



Why? Why does the execution time of the piece of code depend on something I do afterwards (or before)? That does not make sense to me! It looks like the CPU says "I'm gonna sleep afterwards anyway, so let's take it slowly". I thought about caching effects, especially of the instruction cache, but loading instruction data from the main memory does not take 7ms! Something else is going on here!



Is there a way to fix this? I.e. to make the payload execute as quickly as possible regardless of sleep time?







performance rust






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 8 hours ago









Shepmaster

171k19 gold badges375 silver badges524 bronze badges




171k19 gold badges375 silver badges524 bronze badges










asked 9 hours ago









Lukas KalbertodtLukas Kalbertodt

28.7k4 gold badges70 silver badges133 bronze badges




28.7k4 gold badges70 silver badges133 bronze badges












  • Couldn't the compiler optmize out count_primes? It seems to be a pure function and is called with the same parameters every iteration. It should also be possible to compute x once at compile time.

    – Jens
    8 hours ago











  • @Jens That's a good point. It doesn't seem like it's optimized out, but yeah, I should fix that.

    – Lukas Kalbertodt
    8 hours ago











  • Note for everyone: it seems like this is due to CPU frequency scaling and it doesn't happen on all machines. This is still not a full answer however and I hope someone could provide more information.

    – Lukas Kalbertodt
    8 hours ago

















  • Couldn't the compiler optmize out count_primes? It seems to be a pure function and is called with the same parameters every iteration. It should also be possible to compute x once at compile time.

    – Jens
    8 hours ago











  • @Jens That's a good point. It doesn't seem like it's optimized out, but yeah, I should fix that.

    – Lukas Kalbertodt
    8 hours ago











  • Note for everyone: it seems like this is due to CPU frequency scaling and it doesn't happen on all machines. This is still not a full answer however and I hope someone could provide more information.

    – Lukas Kalbertodt
    8 hours ago
















Couldn't the compiler optmize out count_primes? It seems to be a pure function and is called with the same parameters every iteration. It should also be possible to compute x once at compile time.

– Jens
8 hours ago





Couldn't the compiler optmize out count_primes? It seems to be a pure function and is called with the same parameters every iteration. It should also be possible to compute x once at compile time.

– Jens
8 hours ago













@Jens That's a good point. It doesn't seem like it's optimized out, but yeah, I should fix that.

– Lukas Kalbertodt
8 hours ago





@Jens That's a good point. It doesn't seem like it's optimized out, but yeah, I should fix that.

– Lukas Kalbertodt
8 hours ago













Note for everyone: it seems like this is due to CPU frequency scaling and it doesn't happen on all machines. This is still not a full answer however and I hope someone could provide more information.

– Lukas Kalbertodt
8 hours ago





Note for everyone: it seems like this is due to CPU frequency scaling and it doesn't happen on all machines. This is still not a full answer however and I hope someone could provide more information.

– Lukas Kalbertodt
8 hours ago












1 Answer
1






active

oldest

votes


















4














I am quite sure that this is caused by the CPU throttling. When the OS scheduler detects there is little work to do, the CPU frequency lowers to save power.



When you do a lot of sleeps, you are telling the scheduler that you are not in such a hurry, and the CPU can take it easy.



You can see that this is the case by running a CPU intensive task in another window, with low priority. For example, in Linux you can run:



$ nice bash -c 'while true ; do : ; done'


And at the same time, in another window run your program:



$ cargo run --release
loop_time 1.00ms => 3.13ms
loop_time 2.00ms => 3.17ms
loop_time 3.00ms => 3.19ms
loop_time 4.00ms => 3.13ms
loop_time 5.00ms => 3.16ms
loop_time 6.00ms => 3.22ms
loop_time 7.00ms => 3.14ms
loop_time 8.00ms => 3.15ms
loop_time 9.00ms => 3.13ms
loop_time 10.00ms => 3.18ms
loop_time 11.00ms => 3.14ms
loop_time 12.00ms => 3.17ms
loop_time 13.00ms => 3.15ms
...


Avoiding this depends on your OS. For example, in Linux, you can fiddle with sys/devices/system/cpu/* options. I think that UPower provides some functions to manage it from a non-root application. It would be nice if there were a crate that managed this cross-system, but I don't know of any.



An easy but hacky way to fix this, if you don't mind the wasted power, is just to run an idle thread with a busy loop.



std::thread::spawn(|| 
use thread_priority::*; //external crate thread-priority
let thread_id = thread_native_id();
set_thread_priority(
thread_id,
ThreadPriority::Min,
ThreadSchedulePolicy::Normal(NormalThreadSchedulePolicy::Idle),
)
.unwrap();
loop
);


Naturally, if you just want to avoid throttling in this piece of code, you can do a busy wait:



 //if loop_time > elapsed 
// std::thread::sleep(loop_time - elapsed);
//
// Busy-wait the remaining time, to avoid CPU throttling
while loop_time > before.elapsed()
//you may want to try both with and without yield
std::thread::yield_now();






share|improve this answer

























  • Thanks! The CPU frequency thing makes sense. Regarding the solution: instead of a busy thread I could probably just busy wait instead of sleeping in my main thread.

    – Lukas Kalbertodt
    7 hours ago











  • I have an additional question though (in case someone can answer this): according to this paper (if I understand it correctly) the latency to change between two CPU frequency is fairly low (<100µs). So I would expect the CPU to increase the frequency fairly quickly after my payload started. Why isn't that happening?

    – Lukas Kalbertodt
    7 hours ago











  • @LukasKalbertodt: It is not so easy. You seem to think that the CPU is going from 100% power to 0% and then to 100% again, and the time you lose is because of the latency in the switch. But while the latency in the switch is definitely a thing, your main issue is the frequency throttling, that is that the OS scheduler sees that your code is only e.g. 30% of the time busy, so it can reduce the CPU frequency to 30% of its maximum without losing overall performance, while saving a lot in power.

    – rodrigo
    7 hours ago











  • @LukasKalbertodt: In fact, in your graph you can read that the OS is actually trying to predict the time you are going to sleep, and go to the lowers power that makes you still sleep just a bit. Up to a maximum of course, the CPU frequency has a minimum limit.

    – rodrigo
    7 hours ago






  • 1





    @LukasKalbertodt These settings are OS-specific. On Linux, you can echo performance > /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor, on Windows you select something the power saving settings. There is also Turbo Boost which dynamically increases the max. frequency of a core if there is thermal budget. I would recommand to disable power scaling and turbo boost in the BIOS for benchmarking.

    – Jens
    6 hours ago










Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f56922289%2fwhy-does-sleeping-increase-the-execution-time-of-an-independent-piece-of-code-th%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









4














I am quite sure that this is caused by the CPU throttling. When the OS scheduler detects there is little work to do, the CPU frequency lowers to save power.



When you do a lot of sleeps, you are telling the scheduler that you are not in such a hurry, and the CPU can take it easy.



You can see that this is the case by running a CPU intensive task in another window, with low priority. For example, in Linux you can run:



$ nice bash -c 'while true ; do : ; done'


And at the same time, in another window run your program:



$ cargo run --release
loop_time 1.00ms => 3.13ms
loop_time 2.00ms => 3.17ms
loop_time 3.00ms => 3.19ms
loop_time 4.00ms => 3.13ms
loop_time 5.00ms => 3.16ms
loop_time 6.00ms => 3.22ms
loop_time 7.00ms => 3.14ms
loop_time 8.00ms => 3.15ms
loop_time 9.00ms => 3.13ms
loop_time 10.00ms => 3.18ms
loop_time 11.00ms => 3.14ms
loop_time 12.00ms => 3.17ms
loop_time 13.00ms => 3.15ms
...


Avoiding this depends on your OS. For example, in Linux, you can fiddle with sys/devices/system/cpu/* options. I think that UPower provides some functions to manage it from a non-root application. It would be nice if there were a crate that managed this cross-system, but I don't know of any.



An easy but hacky way to fix this, if you don't mind the wasted power, is just to run an idle thread with a busy loop.



std::thread::spawn(|| 
use thread_priority::*; //external crate thread-priority
let thread_id = thread_native_id();
set_thread_priority(
thread_id,
ThreadPriority::Min,
ThreadSchedulePolicy::Normal(NormalThreadSchedulePolicy::Idle),
)
.unwrap();
loop
);


Naturally, if you just want to avoid throttling in this piece of code, you can do a busy wait:



 //if loop_time > elapsed 
// std::thread::sleep(loop_time - elapsed);
//
// Busy-wait the remaining time, to avoid CPU throttling
while loop_time > before.elapsed()
//you may want to try both with and without yield
std::thread::yield_now();






share|improve this answer

























  • Thanks! The CPU frequency thing makes sense. Regarding the solution: instead of a busy thread I could probably just busy wait instead of sleeping in my main thread.

    – Lukas Kalbertodt
    7 hours ago











  • I have an additional question though (in case someone can answer this): according to this paper (if I understand it correctly) the latency to change between two CPU frequency is fairly low (<100µs). So I would expect the CPU to increase the frequency fairly quickly after my payload started. Why isn't that happening?

    – Lukas Kalbertodt
    7 hours ago











  • @LukasKalbertodt: It is not so easy. You seem to think that the CPU is going from 100% power to 0% and then to 100% again, and the time you lose is because of the latency in the switch. But while the latency in the switch is definitely a thing, your main issue is the frequency throttling, that is that the OS scheduler sees that your code is only e.g. 30% of the time busy, so it can reduce the CPU frequency to 30% of its maximum without losing overall performance, while saving a lot in power.

    – rodrigo
    7 hours ago











  • @LukasKalbertodt: In fact, in your graph you can read that the OS is actually trying to predict the time you are going to sleep, and go to the lowers power that makes you still sleep just a bit. Up to a maximum of course, the CPU frequency has a minimum limit.

    – rodrigo
    7 hours ago






  • 1





    @LukasKalbertodt These settings are OS-specific. On Linux, you can echo performance > /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor, on Windows you select something the power saving settings. There is also Turbo Boost which dynamically increases the max. frequency of a core if there is thermal budget. I would recommand to disable power scaling and turbo boost in the BIOS for benchmarking.

    – Jens
    6 hours ago















4














I am quite sure that this is caused by the CPU throttling. When the OS scheduler detects there is little work to do, the CPU frequency lowers to save power.



When you do a lot of sleeps, you are telling the scheduler that you are not in such a hurry, and the CPU can take it easy.



You can see that this is the case by running a CPU intensive task in another window, with low priority. For example, in Linux you can run:



$ nice bash -c 'while true ; do : ; done'


And at the same time, in another window run your program:



$ cargo run --release
loop_time 1.00ms => 3.13ms
loop_time 2.00ms => 3.17ms
loop_time 3.00ms => 3.19ms
loop_time 4.00ms => 3.13ms
loop_time 5.00ms => 3.16ms
loop_time 6.00ms => 3.22ms
loop_time 7.00ms => 3.14ms
loop_time 8.00ms => 3.15ms
loop_time 9.00ms => 3.13ms
loop_time 10.00ms => 3.18ms
loop_time 11.00ms => 3.14ms
loop_time 12.00ms => 3.17ms
loop_time 13.00ms => 3.15ms
...


Avoiding this depends on your OS. For example, in Linux, you can fiddle with sys/devices/system/cpu/* options. I think that UPower provides some functions to manage it from a non-root application. It would be nice if there were a crate that managed this cross-system, but I don't know of any.



An easy but hacky way to fix this, if you don't mind the wasted power, is just to run an idle thread with a busy loop.



std::thread::spawn(|| 
use thread_priority::*; //external crate thread-priority
let thread_id = thread_native_id();
set_thread_priority(
thread_id,
ThreadPriority::Min,
ThreadSchedulePolicy::Normal(NormalThreadSchedulePolicy::Idle),
)
.unwrap();
loop
);


Naturally, if you just want to avoid throttling in this piece of code, you can do a busy wait:



 //if loop_time > elapsed 
// std::thread::sleep(loop_time - elapsed);
//
// Busy-wait the remaining time, to avoid CPU throttling
while loop_time > before.elapsed()
//you may want to try both with and without yield
std::thread::yield_now();






share|improve this answer

























  • Thanks! The CPU frequency thing makes sense. Regarding the solution: instead of a busy thread I could probably just busy wait instead of sleeping in my main thread.

    – Lukas Kalbertodt
    7 hours ago











  • I have an additional question though (in case someone can answer this): according to this paper (if I understand it correctly) the latency to change between two CPU frequency is fairly low (<100µs). So I would expect the CPU to increase the frequency fairly quickly after my payload started. Why isn't that happening?

    – Lukas Kalbertodt
    7 hours ago











  • @LukasKalbertodt: It is not so easy. You seem to think that the CPU is going from 100% power to 0% and then to 100% again, and the time you lose is because of the latency in the switch. But while the latency in the switch is definitely a thing, your main issue is the frequency throttling, that is that the OS scheduler sees that your code is only e.g. 30% of the time busy, so it can reduce the CPU frequency to 30% of its maximum without losing overall performance, while saving a lot in power.

    – rodrigo
    7 hours ago











  • @LukasKalbertodt: In fact, in your graph you can read that the OS is actually trying to predict the time you are going to sleep, and go to the lowers power that makes you still sleep just a bit. Up to a maximum of course, the CPU frequency has a minimum limit.

    – rodrigo
    7 hours ago






  • 1





    @LukasKalbertodt These settings are OS-specific. On Linux, you can echo performance > /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor, on Windows you select something the power saving settings. There is also Turbo Boost which dynamically increases the max. frequency of a core if there is thermal budget. I would recommand to disable power scaling and turbo boost in the BIOS for benchmarking.

    – Jens
    6 hours ago













4












4








4







I am quite sure that this is caused by the CPU throttling. When the OS scheduler detects there is little work to do, the CPU frequency lowers to save power.



When you do a lot of sleeps, you are telling the scheduler that you are not in such a hurry, and the CPU can take it easy.



You can see that this is the case by running a CPU intensive task in another window, with low priority. For example, in Linux you can run:



$ nice bash -c 'while true ; do : ; done'


And at the same time, in another window run your program:



$ cargo run --release
loop_time 1.00ms => 3.13ms
loop_time 2.00ms => 3.17ms
loop_time 3.00ms => 3.19ms
loop_time 4.00ms => 3.13ms
loop_time 5.00ms => 3.16ms
loop_time 6.00ms => 3.22ms
loop_time 7.00ms => 3.14ms
loop_time 8.00ms => 3.15ms
loop_time 9.00ms => 3.13ms
loop_time 10.00ms => 3.18ms
loop_time 11.00ms => 3.14ms
loop_time 12.00ms => 3.17ms
loop_time 13.00ms => 3.15ms
...


Avoiding this depends on your OS. For example, in Linux, you can fiddle with sys/devices/system/cpu/* options. I think that UPower provides some functions to manage it from a non-root application. It would be nice if there were a crate that managed this cross-system, but I don't know of any.



An easy but hacky way to fix this, if you don't mind the wasted power, is just to run an idle thread with a busy loop.



std::thread::spawn(|| 
use thread_priority::*; //external crate thread-priority
let thread_id = thread_native_id();
set_thread_priority(
thread_id,
ThreadPriority::Min,
ThreadSchedulePolicy::Normal(NormalThreadSchedulePolicy::Idle),
)
.unwrap();
loop
);


Naturally, if you just want to avoid throttling in this piece of code, you can do a busy wait:



 //if loop_time > elapsed 
// std::thread::sleep(loop_time - elapsed);
//
// Busy-wait the remaining time, to avoid CPU throttling
while loop_time > before.elapsed()
//you may want to try both with and without yield
std::thread::yield_now();






share|improve this answer















I am quite sure that this is caused by the CPU throttling. When the OS scheduler detects there is little work to do, the CPU frequency lowers to save power.



When you do a lot of sleeps, you are telling the scheduler that you are not in such a hurry, and the CPU can take it easy.



You can see that this is the case by running a CPU intensive task in another window, with low priority. For example, in Linux you can run:



$ nice bash -c 'while true ; do : ; done'


And at the same time, in another window run your program:



$ cargo run --release
loop_time 1.00ms => 3.13ms
loop_time 2.00ms => 3.17ms
loop_time 3.00ms => 3.19ms
loop_time 4.00ms => 3.13ms
loop_time 5.00ms => 3.16ms
loop_time 6.00ms => 3.22ms
loop_time 7.00ms => 3.14ms
loop_time 8.00ms => 3.15ms
loop_time 9.00ms => 3.13ms
loop_time 10.00ms => 3.18ms
loop_time 11.00ms => 3.14ms
loop_time 12.00ms => 3.17ms
loop_time 13.00ms => 3.15ms
...


Avoiding this depends on your OS. For example, in Linux, you can fiddle with sys/devices/system/cpu/* options. I think that UPower provides some functions to manage it from a non-root application. It would be nice if there were a crate that managed this cross-system, but I don't know of any.



An easy but hacky way to fix this, if you don't mind the wasted power, is just to run an idle thread with a busy loop.



std::thread::spawn(|| 
use thread_priority::*; //external crate thread-priority
let thread_id = thread_native_id();
set_thread_priority(
thread_id,
ThreadPriority::Min,
ThreadSchedulePolicy::Normal(NormalThreadSchedulePolicy::Idle),
)
.unwrap();
loop
);


Naturally, if you just want to avoid throttling in this piece of code, you can do a busy wait:



 //if loop_time > elapsed 
// std::thread::sleep(loop_time - elapsed);
//
// Busy-wait the remaining time, to avoid CPU throttling
while loop_time > before.elapsed()
//you may want to try both with and without yield
std::thread::yield_now();







share|improve this answer














share|improve this answer



share|improve this answer








edited 6 hours ago

























answered 8 hours ago









rodrigorodrigo

66.3k5 gold badges98 silver badges135 bronze badges




66.3k5 gold badges98 silver badges135 bronze badges












  • Thanks! The CPU frequency thing makes sense. Regarding the solution: instead of a busy thread I could probably just busy wait instead of sleeping in my main thread.

    – Lukas Kalbertodt
    7 hours ago











  • I have an additional question though (in case someone can answer this): according to this paper (if I understand it correctly) the latency to change between two CPU frequency is fairly low (<100µs). So I would expect the CPU to increase the frequency fairly quickly after my payload started. Why isn't that happening?

    – Lukas Kalbertodt
    7 hours ago











  • @LukasKalbertodt: It is not so easy. You seem to think that the CPU is going from 100% power to 0% and then to 100% again, and the time you lose is because of the latency in the switch. But while the latency in the switch is definitely a thing, your main issue is the frequency throttling, that is that the OS scheduler sees that your code is only e.g. 30% of the time busy, so it can reduce the CPU frequency to 30% of its maximum without losing overall performance, while saving a lot in power.

    – rodrigo
    7 hours ago











  • @LukasKalbertodt: In fact, in your graph you can read that the OS is actually trying to predict the time you are going to sleep, and go to the lowers power that makes you still sleep just a bit. Up to a maximum of course, the CPU frequency has a minimum limit.

    – rodrigo
    7 hours ago






  • 1





    @LukasKalbertodt These settings are OS-specific. On Linux, you can echo performance > /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor, on Windows you select something the power saving settings. There is also Turbo Boost which dynamically increases the max. frequency of a core if there is thermal budget. I would recommand to disable power scaling and turbo boost in the BIOS for benchmarking.

    – Jens
    6 hours ago

















  • Thanks! The CPU frequency thing makes sense. Regarding the solution: instead of a busy thread I could probably just busy wait instead of sleeping in my main thread.

    – Lukas Kalbertodt
    7 hours ago











  • I have an additional question though (in case someone can answer this): according to this paper (if I understand it correctly) the latency to change between two CPU frequency is fairly low (<100µs). So I would expect the CPU to increase the frequency fairly quickly after my payload started. Why isn't that happening?

    – Lukas Kalbertodt
    7 hours ago











  • @LukasKalbertodt: It is not so easy. You seem to think that the CPU is going from 100% power to 0% and then to 100% again, and the time you lose is because of the latency in the switch. But while the latency in the switch is definitely a thing, your main issue is the frequency throttling, that is that the OS scheduler sees that your code is only e.g. 30% of the time busy, so it can reduce the CPU frequency to 30% of its maximum without losing overall performance, while saving a lot in power.

    – rodrigo
    7 hours ago











  • @LukasKalbertodt: In fact, in your graph you can read that the OS is actually trying to predict the time you are going to sleep, and go to the lowers power that makes you still sleep just a bit. Up to a maximum of course, the CPU frequency has a minimum limit.

    – rodrigo
    7 hours ago






  • 1





    @LukasKalbertodt These settings are OS-specific. On Linux, you can echo performance > /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor, on Windows you select something the power saving settings. There is also Turbo Boost which dynamically increases the max. frequency of a core if there is thermal budget. I would recommand to disable power scaling and turbo boost in the BIOS for benchmarking.

    – Jens
    6 hours ago
















Thanks! The CPU frequency thing makes sense. Regarding the solution: instead of a busy thread I could probably just busy wait instead of sleeping in my main thread.

– Lukas Kalbertodt
7 hours ago





Thanks! The CPU frequency thing makes sense. Regarding the solution: instead of a busy thread I could probably just busy wait instead of sleeping in my main thread.

– Lukas Kalbertodt
7 hours ago













I have an additional question though (in case someone can answer this): according to this paper (if I understand it correctly) the latency to change between two CPU frequency is fairly low (<100µs). So I would expect the CPU to increase the frequency fairly quickly after my payload started. Why isn't that happening?

– Lukas Kalbertodt
7 hours ago





I have an additional question though (in case someone can answer this): according to this paper (if I understand it correctly) the latency to change between two CPU frequency is fairly low (<100µs). So I would expect the CPU to increase the frequency fairly quickly after my payload started. Why isn't that happening?

– Lukas Kalbertodt
7 hours ago













@LukasKalbertodt: It is not so easy. You seem to think that the CPU is going from 100% power to 0% and then to 100% again, and the time you lose is because of the latency in the switch. But while the latency in the switch is definitely a thing, your main issue is the frequency throttling, that is that the OS scheduler sees that your code is only e.g. 30% of the time busy, so it can reduce the CPU frequency to 30% of its maximum without losing overall performance, while saving a lot in power.

– rodrigo
7 hours ago





@LukasKalbertodt: It is not so easy. You seem to think that the CPU is going from 100% power to 0% and then to 100% again, and the time you lose is because of the latency in the switch. But while the latency in the switch is definitely a thing, your main issue is the frequency throttling, that is that the OS scheduler sees that your code is only e.g. 30% of the time busy, so it can reduce the CPU frequency to 30% of its maximum without losing overall performance, while saving a lot in power.

– rodrigo
7 hours ago













@LukasKalbertodt: In fact, in your graph you can read that the OS is actually trying to predict the time you are going to sleep, and go to the lowers power that makes you still sleep just a bit. Up to a maximum of course, the CPU frequency has a minimum limit.

– rodrigo
7 hours ago





@LukasKalbertodt: In fact, in your graph you can read that the OS is actually trying to predict the time you are going to sleep, and go to the lowers power that makes you still sleep just a bit. Up to a maximum of course, the CPU frequency has a minimum limit.

– rodrigo
7 hours ago




1




1





@LukasKalbertodt These settings are OS-specific. On Linux, you can echo performance > /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor, on Windows you select something the power saving settings. There is also Turbo Boost which dynamically increases the max. frequency of a core if there is thermal budget. I would recommand to disable power scaling and turbo boost in the BIOS for benchmarking.

– Jens
6 hours ago





@LukasKalbertodt These settings are OS-specific. On Linux, you can echo performance > /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor, on Windows you select something the power saving settings. There is also Turbo Boost which dynamically increases the max. frequency of a core if there is thermal budget. I would recommand to disable power scaling and turbo boost in the BIOS for benchmarking.

– Jens
6 hours ago








Got a question that you can’t ask on public Stack Overflow? Learn more about sharing private information with Stack Overflow for Teams.







Got a question that you can’t ask on public Stack Overflow? Learn more about sharing private information with Stack Overflow for Teams.



















draft saved

draft discarded
















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f56922289%2fwhy-does-sleeping-increase-the-execution-time-of-an-independent-piece-of-code-th%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Canceling a color specificationRandomly assigning color to Graphics3D objects?Default color for Filling in Mathematica 9Coloring specific elements of sets with a prime modified order in an array plotHow to pick a color differing significantly from the colors already in a given color list?Detection of the text colorColor numbers based on their valueCan color schemes for use with ColorData include opacity specification?My dynamic color schemes

Invision Community Contents History See also References External links Navigation menuProprietaryinvisioncommunity.comIPS Community ForumsIPS Community Forumsthis blog entry"License Changes, IP.Board 3.4, and the Future""Interview -- Matt Mecham of Ibforums""CEO Invision Power Board, Matt Mecham Is a Liar, Thief!"IPB License Explanation 1.3, 1.3.1, 2.0, and 2.1ArchivedSecurity Fixes, Updates And Enhancements For IPB 1.3.1Archived"New Demo Accounts - Invision Power Services"the original"New Default Skin"the original"Invision Power Board 3.0.0 and Applications Released"the original"Archived copy"the original"Perpetual licenses being done away with""Release Notes - Invision Power Services""Introducing: IPS Community Suite 4!"Invision Community Release Notes

François Viète Contents Biography Work and thought Bibliography See also Notes Further reading External links Navigation menup. 21Google Bookspp. 75–77Google BooksDe thou (from University of Saint Andrews)ArchivedGoogle BooksGoogle BooksGoogle BooksGoogle booksGoogle Bookscc-parthenay.frL'histoire universelle (fr)Universal History (en)ArchivedAdsabs.harvard.eduPagesperso-orange.frArchive.orgChikara Sasaki. Descartes' mathematical thought p.259Google BooksGoogle BooksGoogle Bookspp. 152 and onwardGoogle BooksGoogle BooksScribd.comGoogle Books1257-7979Google BooksGoogle BooksGoogle BooksGoogle BooksGoogle BooksGoogle BooksGallica.bnf.frGoogle BooksGoogle Books"François Viète"Francois Viète: Father of Modern Algebraic NotationThe Lawyer and the GamblerAbout TarporleySite de Jean-Paul GuichardL'algèbre nouvelle"About the Harmonicon"cb120511976(data)1188044800000 0001 0913 5903n82164680ola2013766880073431702w6vt1sb70287374827140948071409480