I've searched the board and couldn't find the thread where...
Smurftra
09-02-2003, 07:20 PM
...someone posted about another way of getting the tick count, a more precise method. When I read it, it always moves in leaps of 10 or 20, as if it were only precise to 10 ms.
Thanks
Smurf
Scorpio
09-02-2003, 08:24 PM
Try surrounding your timing code with timeBeginPeriod(1) and timeEndPeriod(1).
For example, a simple non-yielding sleep function would look like:
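inline void Hip_Snooze(uint32 nSnoozeTime)
{
    timeBeginPeriod(1);                  // request 1 ms timer resolution
    uint32 uStartTime = timeGetTime();
    // Busy-wait. Comparing the elapsed difference (unsigned subtraction)
    // stays correct even if timeGetTime() wraps around during the wait.
    while (timeGetTime() - uStartTime < nSnoozeTime)
    {
    }
    timeEndPeriod(1);                    // must match the timeBeginPeriod call
}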
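One thing to watch with a 32-bit millisecond counter such as timeGetTime() is that it wraps back to zero after roughly 49.7 days, so comparing absolute tick values can misbehave near the wrap. A minimal sketch of wrap-safe elapsed-time arithmetic follows. It is plain portable C++, and the helper names elapsedMs and hasElapsed are made up for illustration, not part of any Windows API:

```cpp
#include <cassert>
#include <cstdint>

// Milliseconds elapsed between two readings of a 32-bit tick counter.
// Unsigned subtraction is modulo 2^32, so the result stays correct even
// if the counter wrapped once between the two readings.
inline std::uint32_t elapsedMs(std::uint32_t startTick, std::uint32_t nowTick)
{
    return nowTick - startTick;
}

// True once at least waitMs milliseconds have passed since startTick.
inline bool hasElapsed(std::uint32_t startTick, std::uint32_t nowTick,
                       std::uint32_t waitMs)
{
    return elapsedMs(startTick, nowTick) >= waitMs;
}
```

For example, a start reading of 0xFFFFFFF0 and a current reading of 0x00000010 gives an elapsed time of 32 ms, even though the current value is numerically smaller than the start value.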
Not sure if this is what you needed...I noticed that Win2k and XP would not give me the resolution I needed unless I used the timeBegin/EndPeriod functions (I never had a problem in 95 or 98).
Hope this helps...
-Scorpio
Mike Boeh
09-02-2003, 10:01 PM
Use QueryPerformanceCounter, it's the only way to fly :)
Mike Wiering
09-03-2003, 12:52 AM
You can use the rdtsc instruction in assembly. This does require a Pentium, so your program won't work on a 486 or below (although I've heard that some 486s had it too). I use the following function in Delphi 3 for measuring code performance:
function Rdtsc: LongInt; register;
asm
db $0F, $31
end;
Mattias
09-03-2003, 01:04 AM
And here's how you would do it in C++:
DWORD dwLow;
DWORD dwHigh;
__asm
{
    rdtsc
    mov dwLow,eax
    mov dwHigh,edx
};
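The snippet above leaves the counter split across two 32-bit halves. A small sketch of joining them into one 64-bit value so that readings can be subtracted directly is shown here. The function name combineCounter is hypothetical, and std::uint32_t stands in for DWORD on the assumption that DWORD is a 32-bit unsigned integer:

```cpp
#include <cassert>
#include <cstdint>

// Join the low half (from eax) and the high half (from edx) of a
// time stamp counter reading into a single 64-bit value.
inline std::uint64_t combineCounter(std::uint32_t low, std::uint32_t high)
{
    // Widen the high half first so the shift does not discard its bits.
    return (static_cast<std::uint64_t>(high) << 32) | low;
}
```

Two combined readings can then be subtracted as ordinary 64-bit integers to get the number of cycles between them.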
If you only call it every now and then, QueryPerformanceCounter is fine, but if you call it lots of times per frame, your app can become really, really slow, and then you are better off using the above.
Also, just to clear something up: I've often heard people claim that the timeGetTime() function is inaccurate, and that you should therefore use QueryPerformanceCounter. That is not the case. QueryPerformanceCounter simply has a higher resolution, meaning you can use it to measure shorter intervals than would be possible with timeGetTime, and you will get more digits after the decimal point (if you were to convert it to seconds, that is).
If you need to measure the time in order to get framerate-independent execution, then stick with timeGetTime; it will work just fine. But if you want to measure the performance of specific sections of your code, use the performance counter.
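To illustrate what framerate-independent execution means in practice, here is a minimal sketch that scales movement by the measured frame time in milliseconds, which is the resolution timeGetTime() provides. The names advance and simulate are invented for this example, and the frame times are passed in as plain numbers rather than read from a real timer:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Advance a position by a velocity (units per second) over one frame
// that took frameMs milliseconds.
inline double advance(double position, double unitsPerSecond, std::uint32_t frameMs)
{
    return position + unitsPerSecond * (frameMs / 1000.0);
}

// Run a sequence of frames with the given durations, starting at position 0.
inline double simulate(double unitsPerSecond,
                       const std::vector<std::uint32_t>& frameTimesMs)
{
    double position = 0.0;
    for (std::uint32_t frameMs : frameTimesMs)
        position = advance(position, unitsPerSecond, frameMs);
    return position;
}
```

With this approach, two 500 ms frames and four 250 ms frames move an object the same total distance, so the game runs at the same speed whatever the frame rate.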
Mike Wiering
09-03-2003, 01:27 AM
"If you only call it every now and then, QueryPerformanceCounter is fine, but if you call it lots of times per frame, your app can become really, really slow, and then you are better off using the above."
I just tried QueryPerformanceCounter in Delphi and it actually seems to be A LOT FASTER than Rdtsc:
Caption := 'qpc: ' + IntToStr (- (qpc - qpc)) + ', rdtsc: ' + IntToStr (- (rdtsc - rdtsc));
qpc is a function I wrote around QueryPerformanceCounter to get a longint.
The result is: "qpc: 6, rdtsc: 84"
I think this is very strange since rdtsc is only one instruction! Has anyone else experimented with this?
Mattias
09-03-2003, 02:06 AM
I have an application where most of the function calls are surrounded by calls to query the current time, to measure performance. If I use QueryPerformanceCounter, the framerate slows to a crawl, and when looking at it in CodeAnalyst, it turns out that most of the time (80% or so) is spent inside hal.dll, and not inside my app. But when I change to rdtsc, the difference in framerate is barely noticeable.
I suspect that there's something wrong with your test application, but as I don't know much about Delphi, I can't tell you what... Have you tried running a profiler on it?
Mattias
09-03-2003, 02:13 AM
Heh, sorry, just remembered: rdtsc and the performance counter don't use the same unit... rdtsc counts cycles, while the performance counter returns "counts". Both should be converted to time before being compared (whenever I use them, I'm only interested in relative values, like "how much of the total time was spent in this function", and not in the absolute values).
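A sketch of the conversion described above: divide a raw tick difference by the counter's frequency to get seconds. The helper name ticksToSeconds is hypothetical. On Windows the performance counter's frequency would come from QueryPerformanceFrequency, while for rdtsc it is the CPU clock rate, which has to be measured or otherwise known:

```cpp
#include <cassert>
#include <cstdint>

// Convert a difference between two counter readings into seconds,
// given how many ticks the counter produces per second.
inline double ticksToSeconds(std::uint64_t ticks, std::uint64_t ticksPerSecond)
{
    assert(ticksPerSecond > 0);  // a zero frequency would divide by zero
    return static_cast<double>(ticks) / static_cast<double>(ticksPerSecond);
}
```

Only after both measurements have been converted this way, each with its own frequency, does it make sense to compare them.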
Mike Wiering
09-03-2003, 02:29 AM
Ah, that explains it!
Ok, if I first calculate the number of cycles in a time unit, then the results look more like I would expect:
qpc: 3360, rdtsc: 84
jaggu
09-03-2003, 02:38 AM
Because of instruction pairing and out-of-order execution on Pentium-class processors, even if the RDTSC instruction gives the right readout, it may be wrong for your program, because some of your instructions may still be waiting to execute, or be executing, when RDTSC is executed! The solution is to wait for all the instructions of your code to finish executing (called serialising), which can be done with an instruction like CPUID. I've been using RDTSC for profiling my game, and here's the code:
In rdtsc.asm:
.586
.model flat, c
.mmx
.code
; read time stamp counter
_rdtsc PROC
push ebx ; cpuid affects ebx which must not be affected across procedures
; hence push it now and pop it when we RET
cpuid ; waits for all instructions currently executing to complete
rdtsc ; value of the time stamp counter is provided in edx:eax
push edx
push eax
fild QWORD PTR [esp] ; clock count is an integer so use fild instead of fld
pop eax
pop edx
pop ebx
RET
_rdtsc ENDP
end
Assemble the above into rdtsc.obj using MASM as follows:
ml /c rdtsc.asm
Then in VC++, add rdtsc.obj and link with your program. In a header file, declare:
extern "C" {
double _rdtsc ();
}
Now to use it for timing your program:
double d1 = _rdtsc ();
// your code
double d2 = _rdtsc ();
d2 - d1 is the number of clock cycles elapsed while executing your code.
You need to find out the clock speed of the processor to print timing info in seconds. You can use this code:
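double t1 = clock (), t2, d1 = _rdtsc ();
while (1) {
    t2 = (clock () - t1) / CLOCKS_PER_SEC; // seconds elapsed so far
    if (t2 >= 1) break;                    // stop once one full second has passed
}
double cps = _rdtsc () - d1;               // clock cycles counted in that second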
I've used clock () from time.h to wait for 1 second and then find the number of clock cycles elapsed using _rdtsc; cps is the clock speed in cycles per second. You can use QueryPerformanceCounter instead of clock ().
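The same calibration idea can be sketched in portable C++, dividing by the time that actually elapsed rather than assuming exactly one second. This is only an illustration under a few assumptions: std::chrono::steady_clock stands in for clock (), the counter being calibrated is passed in as a callable because the _rdtsc routine above is not available outside that project, and the name estimateTicksPerSecond is made up:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Estimate how many ticks per second a counter produces by reading it
// at both ends of a busy-wait window timed with steady_clock.
template <typename TickSource>
double estimateTicksPerSecond(TickSource readTicks, std::chrono::milliseconds window)
{
    assert(window.count() > 0);
    using clock = std::chrono::steady_clock;

    const auto t0 = clock::now();
    const std::uint64_t c0 = readTicks();
    while (clock::now() - t0 < window) {
        // spin until the window has passed
    }
    const std::uint64_t c1 = readTicks();
    const auto t1 = clock::now();

    // Divide by the measured interval, not the requested one.
    const double seconds = std::chrono::duration<double>(t1 - t0).count();
    return static_cast<double>(c1 - c0) / seconds;
}
```

A shorter window such as 50 ms keeps start-up time down at the cost of a slightly noisier estimate.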
To profile your code and print timing info in seconds:
double d1 = _rdtsc ();
// your code
double elapsedSeconds = (_rdtsc () - d1) / cps;
Hope this helps. Love to hear what others think.
Mattias
09-03-2003, 03:56 AM
That's a good point, I didn't know that! Thanks for sharing...
By the way, is there a reason not to do the whole thing as inline assembler rather than using MASM?
jaggu
09-03-2003, 04:37 AM
I have not done any inline assembly yet, so I may be wrong, but it seems that instead of repeating the same instructions every time you need them, it's better to wrap them up in a function and call it. I'm profiling my code in many places, so inline assembly at all those locations is cumbersome and hurts readability.
The implementation I've presented is highly accurate, although not absolutely accurate. Some nanoseconds are lost because we would have to subtract the execution time of all the instructions except RDTSC. Additional nanoseconds are lost if the function is not inlined. The syntax:
extern "C" {
inline double _rdtsc ();
}
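asks the compiler to inline the function, but this is only a hint, and it can only take effect when the compiler can see the function body, which it cannot for a routine assembled separately with MASM. If you are using inline assembly, you wouldn't have to worry about this aspect.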
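The measurement overhead mentioned above can be estimated by reading the timer twice in a row and keeping the smallest gap seen. The sketch below does this with std::chrono::steady_clock rather than RDTSC so that it stays portable. The function names are hypothetical, and the smallest gap is only an approximation of the true per-call cost:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

// Smallest observed gap, in nanoseconds, between two back-to-back
// timer reads. This approximates the cost of taking one measurement.
inline std::int64_t estimateTimerOverheadNs(int samples)
{
    assert(samples > 0);
    using clock = std::chrono::steady_clock;
    std::int64_t best = std::numeric_limits<std::int64_t>::max();
    for (int i = 0; i < samples; ++i) {
        const auto a = clock::now();
        const auto b = clock::now();
        const std::int64_t gap =
            std::chrono::duration_cast<std::chrono::nanoseconds>(b - a).count();
        best = std::min(best, gap);
    }
    return best;
}

// Remove the estimated overhead from a measurement, never going below zero.
inline std::int64_t subtractOverhead(std::int64_t measuredNs, std::int64_t overheadNs)
{
    return std::max<std::int64_t>(0, measuredNs - overheadNs);
}
```

The correction matters mainly for very short sections of code; for anything that runs for milliseconds it disappears into the noise.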
princec
09-03-2003, 06:32 AM
I must say that there is only one place where you ever need a high-resolution timer, and that's for accurate frame rate capping; any other uses are much better served by running a proper profiler on your code, AFTER you've identified a performance problem.
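As a rough sketch of the frame rate capping mentioned above: work out how much of the frame's time budget is left and wait that long before starting the next frame. The alias Micros and the function remainingFrameTime are invented for this example, and the elapsed time is assumed to come from whichever high-resolution timer you use:

```cpp
#include <cassert>
#include <chrono>

using Micros = std::chrono::microseconds;

// Time left in the current frame's budget at the given target frame rate,
// or zero if the frame has already run over.
inline Micros remainingFrameTime(Micros elapsedThisFrame, int targetFps)
{
    assert(targetFps > 0);
    const Micros frameBudget(1000000 / targetFps);
    return elapsedThisFrame >= frameBudget ? Micros(0)
                                           : frameBudget - elapsedThisFrame;
}
```

At a 100 fps cap, a frame that took 4 ms leaves 6 ms to wait. If the timer only moves in 10 ms steps, that remainder cannot be measured reliably, which is why the cap needs a finer timer.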
Cas :)
Benski
09-03-2003, 02:53 PM
Here's the routine that I use in MSVC++ .NET. I seem to remember it working in 6.0 also.
#pragma warning (push)
#pragma warning (disable:4035)
__forceinline unsigned __int64 __cdecl BENCHMARK()
{
    __asm
    {
        cpuid
        rdtsc
    }
}
#pragma warning (pop)
The #pragmas are there because the function doesn't appear to return a value (warning C4035). However, rdtsc puts the cycle count in edx:eax, which is where the compiler expects a 64-bit return value.
As someone said above, cpuid stalls until all the instructions before it have finished executing. If you don't do this, the processor may execute rdtsc before it has finished running the code you want to test!
I also have a whole benchmarking class which makes it trivial to benchmark various parts of your program and get a readout later. If there is any interest I'll document the class and post it =)
Smurftra
09-03-2003, 06:42 PM
"I also have a whole benchmarking class which makes it trivial to benchmark various parts of your program and get a readout later. If there is any interest I'll document the class and post it =)"
Yeah, there is interest. :)
Thanks everyone, that was all the information I was looking for.
Smurf