The Great Big DirectDraw Blitting Thread
programmer_ted
08-28-2003, 07:27 AM
Well, I have to say the past week and a half has been eventful. I'm the lead programmer of the game Mission: Land!, developed by Fusion Apple Entertainment (http://www.fusionapple.com). Recently, we stumbled upon a bug where the game would run extremely slowly on some computers. After a bit of testing, we came to the conclusion that the problem occurred when, for whatever reason, DirectDraw was storing all (or most) surfaces in system memory.
This was a particularly difficult problem for a couple of reasons. At first glance, there was nothing we could really take out of the game to increase the framerate. Also, the game didn't appear to be spending processor time anywhere it shouldn't.
Obviously, blitting from system memory to video memory is slower than from video memory to video memory, but since there wasn't a huge amount of blitting going on in the game at any given time, my hands were tied. I posted on the forums here and got some helpful responses, one of which (thanks, MirekCz) prompted me to write my own blitter.
So, I wrote my own blitter. It turned out to run about as slowly as the DirectDraw blitter, but I learned a lot about blitting sprites. Here's the good part: after re-examining the game, I came across the stars in the background. They were four 400x300 surfaces, each blitted four times per frame. Since they were all colorkeyed and held only 40 or so visible pixels per surface, I had dismissed them as no real strain on the computer. But DirectDraw has to check EVERY pixel to see if it lies within the colorkey range, does it not? Those memory accesses were causing all the slowdown. Imagine drawing four full 800x600 surfaces every frame (which is what sixteen 400x300 blits add up to), and you'll see what I mean.
So, if you're having problems running your game on computers with older video cards, take these steps and see if they fix your problems:
1) Only draw as much as you absolutely have to each frame. Large sprites can and WILL slow down the game on systems with older video cards.
2) Store as much as possible in video memory. If at all possible, the primary surface and back buffer should be stored in video memory. This way, it's only slow when blitting system memory sprites to the back buffer.
3) Just because a surface is colorkeyed DOES NOT mean it will take less time to blit. The blitter still has to check what pixels fall within the colorkey range.
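To illustrate point 3, here is a minimal sketch of a plain software colorkey blit. It is not DirectDraw's implementation (which isn't public); it assumes 16-bit pixels, pitches measured in pixels, a single key value rather than a key range, and no clipping, and the function name is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Copy a width x height block from src to dst, skipping pixels equal to key.
   Pitches are in pixels (assumption); no clipping is done. */
void blit_colorkey16(uint16_t *dst, int dst_pitch,
                     const uint16_t *src, int src_pitch,
                     int width, int height, uint16_t key)
{
    for (int y = 0; y < height; ++y) {
        const uint16_t *s = src + (size_t)y * (size_t)src_pitch;
        uint16_t *d = dst + (size_t)y * (size_t)dst_pitch;
        for (int x = 0; x < width; ++x) {
            /* Every source pixel is read and compared, even the transparent
               ones: a 400x300 layer holding ~40 stars still costs 120,000
               reads and compares per blit. */
            if (s[x] != key)
                d[x] = s[x];
        }
    }
}
```

The loop does the same amount of reading whether a surface is 1% or 100% opaque, which is why a mostly transparent full-screen layer is so expensive.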
If anyone can think of anything else, feel free to post here. Hopefully we can eliminate the need for another thread like this one.
Oh, and...
I'm really interested in hearing if you come up with a solution because I believe my game is suffering from the same thing. - BrewKnowC
Well there you go :D
ggambett
08-28-2003, 08:00 AM
3) Just because a surface is colorkeyed DOES NOT mean it will take less time to blit. The blitter still has to check what pixels fall within the colorkey range.
Not necessarily true. Most blitters use (or should use) RLE compression of transparent areas.
programmer_ted
08-28-2003, 08:05 AM
Ah, I didn't think of this. DirectDraw didn't appear to do this when the surface was in system memory, though. I'm guessing that's because it couldn't use the hardware blitter.
Dexterity
08-28-2003, 08:20 AM
For the blitter I wrote for Dweep, I used RLE compression of all sprites, and it's very fast, even on old PCs with 1MB video cards. There's no need to check each pixel for transparency, since the transparent areas are just skipped entirely.
programmer_ted
08-28-2003, 08:52 AM
That's one of the advantages of writing your own blitter. I was writing this post mostly for those that wanted to use the DirectDraw Blt and BltFast functions.
BrewKnowC
08-28-2003, 02:56 PM
programmer_ted - Thanks for the heads-up! I was still having this problem, but had put it on the back burner for the time being. You may have just saved me the same headache you went through, and for that I am very appreciative. I'll take a closer look to see if I'm blitting any huge images or any images that are mostly transparent. Thanks again!
OT: I'm just happy I finally got my download size under 3MB! w00t!
- Bruno
programmer_ted
08-28-2003, 03:51 PM
Glad I (may have?) helped you. Of all I posted, I'm guessing the colorkey problem was the least known. I knew that when blitting from video memory to video memory I wasn't having any slow-down, but wasn't sure why. Now, I'm guessing it's because DirectDraw doesn't RLE compress transparent areas when blitting from system memory to video memory (or system memory to system memory, probably). As long as the thread helps one person, it wasn't a total waste of time ;)
Oh, and congrats on the download size :D
Crispie_Critter
08-28-2003, 05:10 PM
RLE compression only helps on large sprites that have large amounts of transparency. For most sprites with very little transparency you will not need to worry about RLE compression. Unless you're really trying to squeeze the performance out of your game, you won't notice the difference between using RLE and not using it. That being said, RLE shouldn't be a blanket fix for bad sprite design / implementation. By that I mean, you shouldn't need RLE to fix that 800 x 600 bmp that contains 32 stars and is otherwise transparent.
Now, all that being said, back to the original thrust of the post. You should be double / triple buffering your screens through video memory. Your primary and secondary video buffers should always be in video memory, otherwise you will see slowdowns. Large sprites with lots of transparency are the devil and should be split up where possible. Don't use gradients / large numbers of colours in your sprites' palettes. This is a really, really important rule. The more colours in your sprite's palette, the harder it has to work. A large palette for a sprite is 16 colours. If you're using one huge 256-colour palette for all your game features, you will get slowdowns.
Hope this helps :-P
BrewKnowC
08-28-2003, 07:47 PM
If you're using one huge 256 colour palette for all your game features you will get slow downs.
Crispie Critter - Is this true even when using non-paletted images, such as 24-bit onto a 16-bit surface? Or is this only for 8-bit paletted images?
Thanks,
-Bruno
programmer_ted
08-28-2003, 08:57 PM
Thanks for your additions, Crispie_Critter. Good advice indeed. As for the stars, they were actually just single pixels on a large (colorkeyed) black background. I plotted the pixels by themselves and that worked fine.
Crispie_Critter
08-28-2003, 11:23 PM
Originally posted by BrewKnowC
Crispie Critter - Is this true even when using non-paletted images, such as 24-bit onto a 16-bit surface? Or is this only for 8-bit paletted images?
Thanks,
-Bruno
I'm unsure exactly what you're referring to. If you mean in the 3D sense, I have no clue. Here, I'll give you a scenario.
Let's say you're making a platformer game. Game specs are 800 x 600 screen size, 16-bit colour, sound etc. Let's say you have 1 main sprite with 35 frames of animation (dying and whatnot). Let's also say you have 6 enemies with 6 frames each (basic shooting then dying or whatever). OK, in your level you have, say, 32 different tile types. These are static, non-animated tiles that the player walks on. For the moment we won't worry about backgrounds or parallax scrolling.
OK, so we have roughly 7 sprites and 32 tiles. Let's say the sprites each contain 16 different 16-bit colours, and the tiles each contain 8 different colours. That makes a total of 368 different colours (7 x 16 + 32 x 8) within the 16-bit colour palette (which contains 65,536). Now, every palette you create is basically indexed (that's the reason you create them). So let's say we don't include a palette with our sprites. That means when we blit our sprites the program must go through the whole 16-bit colour palette to find the colours it needs. This is very time consuming, so to get more speed let's create a palette with 368 different colours. OK, now searching through this palette should be a lot faster, but let's make it even faster. If all your sprites contain different colours (and I'm talking numbers here, e.g. #112-#223-#90 is a colour number) then your sprite artist should be shot!! A good spriter will reuse colours where possible. The benefits of this are twofold. One, smaller palettes, which means faster speeds. Two, artistic consistency: because you're sharing colours, it helps maintain consistency within your game. Now, as we keep going, let's say we can reduce our palette to 256 colours. Now when that sprite gets blitted it is even faster again!
The next step? Well, the next step is to split the large 256-colour palette into smaller palettes. At this stage you have to weigh up the benefits. By that I mean: too many palettes and you're going to create more overhead, too few and you won't get much benefit. OK, so let's say that of our 7 animated sprites, our main character should have a palette of his own. So we then create a palette for it of, say, 16 colours. Now when we blit that sprite to the screen... you guessed it, we only have to search through a 16-colour indexed palette. More speed!! Anyway, I think you should be able to glean the idea of what I'm saying; if anything is unclear, post below!! :-P
Edit: BTW, forgot to mention. Despite what people tell you, the Direct Draw blitter is fairly decent. You will be able to write your own blitter that is a lot better, BUT writing your own blitter won't help if you're passing the data to it the wrong way :-)
Crispie_Critter
08-28-2003, 11:26 PM
Originally posted by programmer_ted
Thanks for your additions, Crispie_Critter. Good advice indeed. As far as stars, they were only actually singular pixels on a large (colorkeyed) black background. I plotted the pixels by themselves and that worked fine.
Ouch !! Singular pixels on a black background and you were blitting the whole thing !!!
Well, at least you have seen the light :-P Just a little add-on to this; the way I would have done it would be:
1) Create a random generator for the stars and generate them on the fly
2) Create an array of the stars' screen positions (e.g. 132,211; 176,30). I would only have done it this way if I wanted them at specific points :-P
MirekCz
08-29-2003, 12:19 AM
Crispy is wrong here I believe.
alpha/colorkey RLE compression is VERY helpful for almost all kinds of transparent images. Only images that often switch between transparent and non-transparent pixels (like an X pattern) will work poorly with RLE.
The reason is obvious (data from some very old tests on a Cyrix P120 or so):
A transparent bitblt (like if x!=0 drawpixel) is about 3x slower than a normal bitblt. (I'm talking about optimized code here.)
An RLE bitblt is often FASTER than a normal bitblt.
Another good thing is that RLE files are often smaller than normal image files, so you get some lossless image compression as a free gift :-)
There are 2 reasons:
1. RLE is faster than a normal transparent blit because, instead of checking every pixel to see whether it's transparent, you check only a few times per line (in most cases; an X pattern is the obvious case where this doesn't work).
2. RLE can often be even faster than a normal bitblt, because all the transparent pixels are "jumped over" with very little effort from the CPU, while a normal bitblt has to copy every pixel.
Depending on the transparent image, RLE is usually about as fast as, or faster than, a normal bitblt.
And about the stars problem: with RLE compression you can do those 800x600 bitblts very cheaply. It can be done in software rendering with very little work,
if you build the RLE data from chunks, where each chunk contains:
1. A 32-bit number holding the count of transparent and non-transparent pixels (24 bits for the number of transparent pixels and 8 bits for the number of non-transparent pixels. You usually won't have more than 255 non-transparent pixels in a row, and a longer run can simply be split across chunks, while you can have very, very many transparent pixels in a row.)
One note: for most sprites a 16-bit chunk header will be enough, and it can save you a decent amount of file size.
2. The color data for the current chunk (8/16/32-bit, whatever you use), one value per non-transparent pixel.
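A minimal sketch of the chunk format above, assuming 16-bit pixels. Each header keeps the 24/8 split (skip count in the top 24 bits, opaque count in the low 8) and is stored as two 16-bit words in the same stream as the colour data. Chunks never cross a row boundary, opaque runs longer than 255 are split, and there is no clipping; these choices and the function names are assumptions for illustration.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Pack a chunk header: top 24 bits = transparent pixels to skip,
   low 8 bits = opaque pixels that follow (0..255). */
static uint32_t rle_header(uint32_t skip, uint32_t run)
{
    return (skip << 8) | run;
}

/* Encode a w x h 16-bit image (tightly packed) into out[].
   out must hold at least h * (3 * w + 2) words. Returns words written. */
size_t rle_encode16(const uint16_t *src, int w, int h, uint16_t key,
                    uint16_t *out)
{
    size_t n = 0;
    for (int y = 0; y < h; ++y) {
        const uint16_t *row = src + (size_t)y * (size_t)w;
        int x = 0;
        while (x < w) {
            uint32_t skip = 0, run = 0;
            while (x < w && row[x] == key) { ++skip; ++x; }
            int start = x;
            while (x < w && row[x] != key && run < 255) { ++run; ++x; }
            uint32_t hdr = rle_header(skip, run);
            out[n++] = (uint16_t)(hdr & 0xFFFFu);   /* header, low word  */
            out[n++] = (uint16_t)(hdr >> 16);       /* header, high word */
            for (uint32_t i = 0; i < run; ++i)      /* opaque colours    */
                out[n++] = row[start + (int)i];
        }
    }
    return n;
}

/* Draw RLE data produced by rle_encode16 at dst (pitch in pixels, no clipping). */
void rle_blit16(uint16_t *dst, int dst_pitch, const uint16_t *rle, int w, int h)
{
    for (int y = 0; y < h; ++y) {
        uint16_t *d = dst + (size_t)y * (size_t)dst_pitch;
        int x = 0;
        while (x < w) {
            uint32_t hdr = (uint32_t)rle[0] | ((uint32_t)rle[1] << 16);
            uint32_t skip = hdr >> 8, run = hdr & 0xFFu;
            rle += 2;
            x += (int)skip;                            /* jump over transparent pixels */
            if (run > 0)
                memcpy(d + x, rle, run * sizeof *rle); /* one copy per opaque run */
            rle += run;
            x += (int)run;
        }
    }
}
```

Encoding happens once at load time; the per-frame cost of `rle_blit16` is one header read per run plus the opaque pixels, with no per-pixel transparency test.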
Now you can have a nice star field with either 1-pixel stars or stars several pixels across, and the bitblt should be extremely fast.
Another good thing is that the amount of memory required by this RLE bitmap will be much, much lower than your 800x600 bitmap (depending on the number of stars, I would guess 10-100x lower).
About the question of whether you can use RLE compression for 8/16/32/whatever bpp images: yes, sure. If you look at my chunk description, you will notice that it's easy to put whatever data you need into it. With higher color-depth images RLE might actually work even better compared to other methods, as you skip much bigger chunks of memory when "jumping over" transparent pixels.
And the last thing: remember that RLE can be used nicely with alpha-blending. Instead of a colorkey you use the alpha channel (0 = completely transparent, >0 = semi- or non-transparent), and while drawing pixels you apply the usual alpha-blending.
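A minimal sketch of the per-pixel work inside an alpha RLE run, assuming RGB565 pixels and an 8-bit alpha stored alongside each colour. Pixels with alpha 0 are assumed to have become skips at encode time, so they never reach this code. The names are hypothetical, and the divide by 255 is written for clarity rather than speed.

```c
#include <stdint.h>

/* Blend an RGB565 source pixel over a destination pixel, alpha in 0..255. */
uint16_t blend565(uint16_t src, uint16_t dst, unsigned alpha)
{
    unsigned sr = src >> 11, sg = (src >> 5) & 0x3Fu, sb = src & 0x1Fu;
    unsigned dr = dst >> 11, dg = (dst >> 5) & 0x3Fu, db = dst & 0x1Fu;
    unsigned r = (sr * alpha + dr * (255u - alpha)) / 255u;
    unsigned g = (sg * alpha + dg * (255u - alpha)) / 255u;
    unsigned b = (sb * alpha + db * (255u - alpha)) / 255u;
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* Draw one non-transparent run of an alpha-RLE sprite. Every pixel here is
   either opaque (alpha 255, plain copy) or semi-transparent (blend). */
void blend_run565(uint16_t *dst, const uint16_t *colors,
                  const uint8_t *alphas, unsigned run)
{
    for (unsigned i = 0; i < run; ++i)
        dst[i] = (alphas[i] == 255) ? colors[i]
                                    : blend565(colors[i], dst[i], alphas[i]);
}
```

The run structure is unchanged from a colorkey RLE; only the inner copy becomes a blend, which is why the two combine so easily.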
Hope it helps, hope I have covered every question you might have.
If you need some more info, feel free to ask; RLE compression is a bit of a favorite toy of mine. The reason is probably that I "discovered" it myself several years ago... oh well, reinventing the wheel :-)))
Crispie_Critter
08-29-2003, 12:48 AM
OK, for starters, a definition:
An RLE sprite is a bitmap that has been compressed with an algorithm called Run-Length Encoding, which suits sprites with a lot of the same color.
Basically, this means an RLE sprite is quicker when dealing with large sections of transparency because, rather than testing each pixel to see if it is the transparent colour, an RLE sprite already knows which pixels are transparent. That being said, with most blitters RLE sprites have limited functions, like no rotating etc. RLE sprites are generally designed for linear access. And when using RLE sprites, I have not noticed much of a difference compared to standard bitmap sprites.
Now, my whole point in this post is this. Above, I was trying to stress that changing blitters, using RLE sprites or whatever is, in most of these cases, not where the problem lies. The problem is the type of data / sprite you're trying to pass through the blitter. You can get a lot more speed fixing up palette issues and changing the way you represent sprites than by changing the way you process them.
Now, if you use masked sprites, which is a totally different kettle of fish, I doubt you would get any performance gain from RLE. As far as the alpha-blended stuff goes, I was pretty sure RLE could just tell whether a pixel was transparent (0) or not (>0). Although this may be the blitter I am using; I haven't really investigated it fully....
*Edit: Aaahhhh, you're talking about alpha-blended RLE compression. My current blitter doesn't support that. I don't really need it either, and couldn't be bothered writing my own, so that isn't going to change for a bit :-P
Carrot
08-29-2003, 02:05 AM
Originally posted by Crispie_Critter
Direct Draw blitter is fairly decent. However you will be able to write your own blitter that is a lot better BUT writing your own blitter won't help if you're passing the data to it the wrong way :-)
Are you talking about the DD software renderer?
Am I missing something here or why is everyone writing their own blitters?! Is it just for older video cards?
Because I can't see how you could improve on the speed of even the most basic hardware-blitting cards.
freeman
08-29-2003, 02:19 AM
Originally posted by programmer_ted
Recently, we stumbled upon a bug where the game would run extremely slow on some computers. After a bit of testing, we came to the conclusion that the problem occurred when, for whatever reason, DirectDraw was storing all (or most) surfaces in system memory.
As mentioned quite a few times before on this forum, there were problems with DirectDraw on Geforce cards with at least one version of the nVidia Detonator drivers. Maybe that is the cause of your problem? Use search to find out more about it.
MirekCz
08-29-2003, 03:37 AM
freeman: it's also possible he was running it on a video card with little memory...
3x 800x600 bitmaps sets off an alarm in my head :-)
No 2MB, and probably no 4MB, card could fit everything into video memory, so speed suffered greatly.
Carrot: the problem with DirectDraw is its lack of alpha-blending...
Crispie: I don't really follow why you can't rotate RLE-compressed sprites. But all in all, rotating a sprite is the worst thing you can do, as image quality suffers greatly unless a complicated and expensive algorithm is used.
Simple rotations by 90/180/270 degrees and mirroring, plus a set of pre-rotated bitmaps, is the way to go. It's pretty much as easy to do with RLE as with normal bitmaps.
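Mirroring and 90-degree rotations are exact pixel permutations (no resampling), so they can be generated once at load time and then RLE-encoded like any other frame. A minimal sketch for tightly packed 16-bit bitmaps; the function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Horizontal mirror: dst[y][x] = src[y][w-1-x]. Both buffers are w x h. */
void mirror_h16(uint16_t *dst, const uint16_t *src, int w, int h)
{
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            dst[(size_t)y * w + x] = src[(size_t)y * w + (size_t)(w - 1 - x)];
}

/* Rotate 90 degrees clockwise: a w x h source becomes an h x w destination.
   Source pixel (x, y) lands at column h-1-y, row x of the result. */
void rotate90cw16(uint16_t *dst, const uint16_t *src, int w, int h)
{
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            dst[(size_t)x * h + (size_t)(h - 1 - y)] = src[(size_t)y * w + x];
}
```

Applying `rotate90cw16` twice gives 180 degrees and three times gives 270, so one rotation and one mirror cover the whole set.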
Using RLE-compressed sprites won't give you much unless you're drawing a LOT of them.
For example, in a puzzle game where you use alpha-blending to put a piece on the map or something, you will be perfectly fine with the simplest solution.
But in an action game where you see a lot of alpha-blended smoke, fire, explosions etc., RLE compression can really improve your drawing speed.
Actually, the biggest penalty for using software rendering is the time spent copying data from system memory to video memory. The drawing itself is often quite fast.
Originally posted by Carrot
Are you talking about the DD software renderer?
Am I missing something here or why is everyone writing their own blitters?! Is it just for older video cards?
Because I can't see how you could improve on the speed of even the most basic hardware-blitting cards.
I don't do it for speed. I do it for compatibility. In every game I have worked on to date, the bugs that have been the biggest problem are the ones caused by different computers acting differently, be it sound drivers changing the floating-point control register or stretched blits that draw totally black on some video cards.
nVidia took the final step by thumping their entire DirectDraw implementation on the head a few times with an anvil. Luckily it recovered.
For 2D I'm going with pure software rendering for now. Yes, it is much slower, but I'm still doing mostly OK speed-wise, and development is much quicker.
In the future I'm going to go with 2D on 3D (or who knows? even 3D on 3D). Both OpenGL and Direct3D (barring early Direct3D) seem to have implementations that work well across the board on all systems. I still see lots of game reports like "the floors don't appear on some GeForce4 MX cards", but if I keep things simple I'm hoping it'll work out for the best.
koder
08-29-2003, 07:34 AM
When using RLE, do you store the data (sprite etc.) in system memory or can it be stored in video memory?
freeman
08-29-2003, 07:52 AM
Originally posted by MirekCz
freeman:it's also possible he was running it on vidcard with low memory..
Of course it could be that or a lot of other things; I was just giving him a hint about the driver issue...
programmer_ted
08-29-2003, 11:14 AM
Quick note: a bunch of people have been offering ideas on how to fix the stars. I already have! Thanks for the suggestions though ;)
Basically, when the stars were surfaces, they "moved" at different speeds to lend a feeling of depth (they were also darker depending on distance). I just wrote a bit of code that chose random locations and depths, chose an RGB value based on depth, and moved the stars based on depth. Looks the same, but a lot faster.
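A minimal sketch of that approach, assuming three depth layers, a leftward scroll, and a locked 16-bit RGB565 back buffer with the pitch given in pixels. The struct, function names, speeds and brightness values are hypothetical, not taken from the game's code.

```c
#include <stdint.h>
#include <stdlib.h>

/* One background star. depth: 1 = far, 2 = middle, 3 = near (assumed). */
typedef struct { float x, y; int depth; } Star;

/* Pack 8-bit RGB into a 16-bit 5:6:5 pixel (assumed back-buffer format). */
static uint16_t rgb565(unsigned r, unsigned g, unsigned b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Scatter n stars at random positions and depths on a w x h screen. */
void stars_init(Star *s, int n, int w, int h)
{
    for (int i = 0; i < n; ++i) {
        s[i].x = (float)(rand() % w);
        s[i].y = (float)(rand() % h);
        s[i].depth = 1 + rand() % 3;
    }
}

/* Scroll each star left at a speed proportional to its depth, wrap it at
   the screen edge, and plot it as one pixel whose brightness also depends
   on depth. dt is the frame time in seconds. */
void stars_update_draw(Star *s, int n, uint16_t *dst, int pitch,
                       int w, int h, float dt)
{
    for (int i = 0; i < n; ++i) {
        s[i].x -= 20.0f * (float)s[i].depth * dt;   /* 20 px/s per depth level */
        if (s[i].x < 0.0f) {
            while (s[i].x < 0.0f)
                s[i].x += (float)w;                 /* re-enter at the right edge */
            s[i].y = (float)(rand() % h);           /* at a new random height */
        }
        int px = (int)s[i].x;
        if (px >= w)
            px = w - 1;                             /* guard against float rounding */
        int py = (int)s[i].y;
        unsigned v = 85u * (unsigned)s[i].depth;    /* 85, 170 or 255: nearer is brighter */
        dst[(size_t)py * (size_t)pitch + (size_t)px] = rgb565(v, v, v);
    }
}
```

Call `stars_init` once, then `stars_update_draw` each frame after clearing the background; each star costs one pixel write instead of a share of a full-screen colorkeyed blit.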
programmer_ted
08-29-2003, 11:15 AM
Oh, and koder, store as many sprites as possible in video memory! Blits are a LOT faster from video memory to video memory.
MirekCz
08-29-2003, 03:39 PM
koder: no hardware I know of supports RLE-compressed sprites, so you store them in system memory.
programmer_ted:
When I use software rendering, one of the reasons is to let everyone with 1/2/4MB video cards see my work.
If you think about it, those video cards rarely have room to keep any sprites besides the primary/back surfaces.
And if that equipment can handle it, all other computers will do just fine without pushing any sprites to video memory and complicating the code.
Of course, there might be situations where it's good to do so.
(For example, you need a high-res mode like 800x600 but most of the screen is static, so you perform software rendering only on the dynamic part of the screen and store the rest in video memory for reuse.)
programmer_ted
08-29-2003, 09:28 PM
Sorry about that, didn't see that he said RLE. MirekCz: makes sense. I've already made my own software blitter that I'm going to modify over the next couple of days. For some reason some frames were taking over 100 ms to draw, and I can only narrow it down to the DirectDraw blitting. I'll see how it goes.
MirekCz
08-30-2003, 04:36 AM
programmer_ted:
If you don't do it yet, use the performance counter to measure frame rates. It's highly accurate.
You can easily use it to check the speed of various parts of your code, and/or, for example, while testing, show several numbers on screen representing the last frame's drawing time split across various parts of the code
(so you do several time checks for various parts of the code (DirectDraw blitting, sysmem->vidmem blitting, sysmem sprite drawing, everything else) and display them on screen to keep track of which part of the code eats most of the 100ms).
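A minimal per-section timer along those lines. On Windows it uses QueryPerformanceCounter and QueryPerformanceFrequency; elsewhere it falls back to the standard `clock()` (CPU time, not wall time) so the sketch still compiles. The section list mirrors the split suggested above; the struct and function names are hypothetical.

```c
#ifdef _WIN32
#include <windows.h>
/* Seconds from the Win32 high-resolution performance counter. */
static double now_seconds(void)
{
    LARGE_INTEGER freq, t;
    QueryPerformanceFrequency(&freq);   /* could be cached once at startup */
    QueryPerformanceCounter(&t);
    return (double)t.QuadPart / (double)freq.QuadPart;
}
#else
#include <time.h>
/* Portable stand-in: processor time from the C standard library. */
static double now_seconds(void)
{
    return (double)clock() / (double)CLOCKS_PER_SEC;
}
#endif

/* Hypothetical section list, following the split suggested above. */
enum { SEC_DD_BLIT, SEC_SYS_TO_VID, SEC_SPRITES, SEC_OTHER, SEC_COUNT };

typedef struct {
    double total[SEC_COUNT];   /* accumulated seconds per section */
    double start;              /* when the running section began */
    int current;               /* running section, or -1 for none */
} FrameProfiler;

void prof_reset(FrameProfiler *p)
{
    for (int i = 0; i < SEC_COUNT; ++i)
        p->total[i] = 0.0;
    p->start = 0.0;
    p->current = -1;
}

/* Charge the time since the last switch to the running section, then
   start timing 'section' (pass -1 to stop timing). */
void prof_switch(FrameProfiler *p, int section)
{
    double t = now_seconds();
    if (p->current >= 0)
        p->total[p->current] += t - p->start;
    p->current = section;
    p->start = t;
}

/* Percentage of the measured time spent in 'section'. */
double prof_percent(const FrameProfiler *p, int section)
{
    double sum = 0.0;
    for (int i = 0; i < SEC_COUNT; ++i)
        sum += p->total[i];
    return sum > 0.0 ? 100.0 * p->total[section] / sum : 0.0;
}
```

Call `prof_switch(&p, SEC_DD_BLIT)` right before the DirectDraw blits, `prof_switch(&p, SEC_OTHER)` after them, and so on; `prof_switch(&p, -1)` ends the frame, after which `prof_percent` gives each section's share for the on-screen readout.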
programmer_ted
08-30-2003, 08:51 AM
Right, I was thinking of that too. I'll probably do that since implementing my blitter (again) would be a lot of work. I'm also checking out AMD CodeAnalyst, but since I'm not on XP or 2000, I'll have to do that on the laptop ;)
BrewKnowC
09-11-2003, 04:46 PM
Hi, I am reviving this thread because I am having an awful time trying to reach my minimum target system (200 MHz, 32 MB RAM). I tried what was suggested and used QueryPerformanceCounter, and found that the problem lies within my 'renderWorld' and 'renderHUD' functions (renderWorld renders all the tiles and objects).
The 'renderWorld' function used 80-84% of the ticks of each frame and the 'renderHUD' used 15-18% of the ticks. So I decided to pull apart the 'renderWorld' function and found:
basetiles: 32%
low z-order object tiles: 15%
high z-order object tiles: 64%
I'm using the examples from the level 1 of my game which contains:
77 (40x40) basetiles
2 (40x80) low order object tiles
83 (40x80) high order object tiles
The video mode is 16-bit 640x480. The above numbers DO NOT include any alpha blending or other features, and I am using the DD Blt function for all draws.
Am I being unreasonable in thinking it's possible to get this to run on my minimum target PC? The slowest computer I can get this to run on correctly is a 350-400 MHz machine, which IMHO is way too high for the type of game it is.
If you have yet to see that game and think it would help to play the game in order to find a solution, you can download the beta here: Trials of Werlin beta (http://www.bantamcity.com/files/ToWBetaInstall.exe)
I also looked into Crispie_Critter's answer about palettes, but I was unable to understand how to use a palette with 16-bit surfaces. Does anyone else understand his post and can explain better how to do this? *shrug*
Thanks
-Bruno
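As a rough sanity check on the level-1 numbers above (a back-of-the-envelope calculation, assuming every tile is drawn in full with no colorkey savings), the tiles add up to about 395,200 pixels per frame, roughly 1.29x the 307,200 pixels of a 640x480 screen. At 2 bytes per pixel that is about 790 KB per frame, or about 23.7 MB/s at a 30 fps target. The function names are hypothetical.

```c
/* Pixel counts for the level-1 scene described above (sizes in pixels). */
long frame_pixels(void)
{
    long base = 77L * 40 * 40;   /* 77 basetiles, 40x40:          123,200 */
    long low  =  2L * 40 * 80;   /*  2 low-order objects, 40x80:     6,400 */
    long high = 83L * 40 * 80;   /* 83 high-order objects, 40x80: 265,600 */
    return base + low + high;    /* 395,200 pixels per frame */
}

/* Bytes written per second at a given colour depth and frame rate. */
long bytes_per_second(long pixels, int bytes_per_pixel, int fps)
{
    return pixels * bytes_per_pixel * fps;
}

/* Overdraw factor: pixels written divided by pixels on screen. */
double overdraw(long pixels, int screen_w, int screen_h)
{
    return (double)pixels / ((double)screen_w * (double)screen_h);
}
```

Numbers like these make it easier to judge whether a target machine's memory bandwidth is the real limit, and how much drawing only changed areas would save.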
Pyabo
09-11-2003, 07:22 PM
You should not have any problem doing this on a very low-end machine. From the amount of graphics you've listed, you should be able to get everything into video memory with no trouble... is this not the case?
MirekCz
09-11-2003, 10:24 PM
Brew: this is quite a high video mode resolution, and you may have trouble making it run well on a P200.
If you look at similar games, Diablo used 640x480x8bpp and ran quite well on a P100... and a P200 doesn't have double the data transfer rate of a P100, while 16bpp doubles the amount of pixel data.
1. Are you using RLE for your Z-ordered bitmaps?
2. Are you using a specialized function to draw basetiles? (It's pretty straightforward and could be done with RLE, but having a separate function that knows how to render them will obviously be a big plus.)
3. Are you using MMX where possible? This could give you some speed advantage, and from the P200 on, all CPUs have MMX.
4. If you expect your users to have a graphics card with a few spare MB of memory (like 4), then you can move part of the work to the graphics card.
The obvious solution would be to draw the HUD on the graphics card and the rest on the CPU.
5. You can go down to 640x400...
PS. Hmm, are all those high-order object tiles drawn? 83 looks like quite a lot of them?!? Actually many more than basetiles... you might be overdoing it with them.
PPS.Please also write what fps rate you get and what fps rate you expect...
Pyabo
09-11-2003, 10:33 PM
I don't see how RLE sprites are going to help here... the FASTEST possible way to display graphics is to have the video card doing the work for you. That means holding all your graphics in VRAM as raw bitmaps and then calling DDraw's Blit function so that the blitting is done by the graphics hardware.
You might need a 4mb video card instead of 2mb... is that seriously going to affect your target audience?
MirekCz
09-12-2003, 03:13 AM
Pyabo: there are some problems...
1. In P200 days, 1/2MB video cards were quite popular...
2. No alpha-blending effects... and that's the reason I use 16-bit color mode :-) Without alpha-blending, an 8bpp mode (like Diablo's) is fine...
BrewKnowC
09-12-2003, 05:10 AM
MirekCz - Thank you for the thorough post. I will try to clarify a few things about my situation:
1. This is my first real game and I don't know anything about RLE yet. What would this entail in converting my current routines?
2. I'm not sure what you mean by 'specialized functions'. The only thing my routine does is make sure no base tile is drawn under objects that completely cover it (e.g. walls). That's why I'm rendering more 'high-order' objects than base tiles. The more walls there are per level, the fewer basetiles are rendered.
3. I'm not using MMX because I'm using the DD blitter for all my drawing. (By MMX, I'm assuming I would need to write my own blitter?)
4. This sounds like a good idea. Maybe the HUD and basetiles (which are drawn a lot) can go in VRAM (which would take less than 1MB).
I'm using 16-bit mode because I WILL have alpha blending (but the numbers I gave in my previous post DID NOT include any). I just thought it would be easier to narrow down the problem if I left alpha-blending out for the moment, so that I was certain all blitting was done by the DD blitter.
I'm trying to reach 30fps and am getting about 10fps on a 200MHz test machine. One thing about this machine that bothers me is that its video RAM isn't dedicated (it's shared), so this may be why I did not see a speed increase when I moved some things into VRAM earlier in my trials.
Thanks
-Bruno
ggambett
09-12-2003, 05:38 AM
The obvious question : you aren't drawing everything in each frame, right?
BrewKnowC
09-12-2003, 09:10 AM
ggambet - Yeah, I kind of am drawing everything each frame. :( Is there a better way? (I'm not trying to sound sarcastic, I really don't know.) I have a lot of animated sprites and a lot of sprites that actually move across the map.
ggambett
09-12-2003, 09:27 AM
Originally posted by BrewKnowC
ggambet - Yeah, I kind of am drawing everything each frame. :( Is there a better way? (I'm not trying to sound sarcastic, I really don't know.) I have a lot of animated sprites and a lot of sprites that actually move across the map.
Yes, of course. Draw only what changes. I assume you're using a back buffer. When a sprite moves, you should redraw only what was beneath it, and then draw the sprite in its new position.
That's the basic idea, which can be vastly improved. Maintain a list of areas that need redraw - this is the famous dirty rectangle list. After the update cycle and before the buffer flip, redraw anything that intersects those rectangles.
It can get tricky if you have lots of small rectangles, or big overlapping rectangles. In the end, I found a good compromise in dividing the screen in 64x64 "patches" and using these as the unit for being dirty or not.
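A minimal sketch of the 64x64 patch idea, assuming a 640x480 screen; the struct, function names and redraw callback are hypothetical. Sprites mark their old and new rectangles each update; before the flip, each dirty patch is redrawn (background first, then any sprites overlapping it) and cleared. One caveat: if the back buffer is part of a flipping chain, its contents after a flip are from an earlier frame, so you may also need to redraw the previous frame's dirty patches, or compose into a separate offscreen buffer.

```c
#include <stdbool.h>
#include <string.h>

#define SCREEN_W 640                                 /* assumed screen size */
#define SCREEN_H 480
#define PATCH    64                                  /* patch size from the post */
#define PATCHES_X ((SCREEN_W + PATCH - 1) / PATCH)   /* 10 */
#define PATCHES_Y ((SCREEN_H + PATCH - 1) / PATCH)   /* 8 (last row is 32 px tall) */

typedef struct { bool dirty[PATCHES_Y][PATCHES_X]; } DirtyGrid;

/* Called once per dirty patch with the patch rectangle (x, y, w, h). */
typedef void (*RedrawFn)(int x, int y, int w, int h, void *ctx);

void dirty_clear(DirtyGrid *g) { memset(g->dirty, 0, sizeof g->dirty); }

/* Mark every patch touched by [x, x+w) x [y, y+h), clipped to the screen. */
void dirty_mark(DirtyGrid *g, int x, int y, int w, int h)
{
    int x0 = x < 0 ? 0 : x, y0 = y < 0 ? 0 : y;
    int x1 = x + w > SCREEN_W ? SCREEN_W : x + w;
    int y1 = y + h > SCREEN_H ? SCREEN_H : y + h;
    if (x0 >= x1 || y0 >= y1)
        return;                                      /* empty or off screen */
    for (int py = y0 / PATCH; py <= (y1 - 1) / PATCH; ++py)
        for (int px = x0 / PATCH; px <= (x1 - 1) / PATCH; ++px)
            g->dirty[py][px] = true;
}

int dirty_count(const DirtyGrid *g)
{
    int n = 0;
    for (int py = 0; py < PATCHES_Y; ++py)
        for (int px = 0; px < PATCHES_X; ++px)
            n += g->dirty[py][px];
    return n;
}

/* Before the flip: hand each dirty patch to the callback, then mark it clean. */
void dirty_redraw(DirtyGrid *g, RedrawFn redraw, void *ctx)
{
    for (int py = 0; py < PATCHES_Y; ++py)
        for (int px = 0; px < PATCHES_X; ++px)
            if (g->dirty[py][px]) {
                int x = px * PATCH, y = py * PATCH;
                int w = (x + PATCH > SCREEN_W) ? SCREEN_W - x : PATCH;
                int h = (y + PATCH > SCREEN_H) ? SCREEN_H - y : PATCH;
                redraw(x, y, w, h, ctx);
                g->dirty[py][px] = false;
            }
}
```

A fixed grid trades a little overdraw for never having to merge or split overlapping rectangles, which is the compromise described above.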
BrewKnowC
09-12-2003, 06:20 PM
ggambet - Thanks! After reading your post, I researched the dirty rectangles method some more (I remembered reading about it in the past and thinking it wasn't useful with today's fast computers... this was before I started my game targeted at slower machines). I believe this idea, coupled with using BltFast instead of Blt, will help me achieve my goal! :)
Thanks a lot ggambet, programmer_ted, and others. I will let you know if I reach my target PC goal.
-Bruno
Try the beta!!
Trials of Werlin beta (http://www.bantamcity.com/files/ToWBetaInstall.exe)
koder
09-12-2003, 06:36 PM
BrewKnowC - Where did you find the information on dirty rectangles? I'm having similar problems, and after following this thread, I think I need to look into dirty rectangles too. I've used them in Windows GDI games but I am new to DX. I didn't think they would be necessary.
BrewKnowC
09-13-2003, 04:36 AM
koder - Here are a few links with source code. I was having trouble finding any theory online, but I'm sure it's out there somewhere.
link 1 (http://agdn.netfirms.com/main/html/dirtyRect.htm)
link 2 (http://www.codeguru.com/multimedia/flickerfree2d.html)
The resource I used the most was a book I already owned... here's the link from amazon:
book (http://www.amazon.com/exec/obidos/tg/detail/-/0761530894/104-5294090-3688716?v=glance)
The book is pretty good if you are making a tile/isometric game, and not bad otherwise. Hope this helps...
-Bruno
programmer_ted
09-13-2003, 10:58 AM
Glad you figured out your problem (or mostly anyway :P) - sorry for not replying sooner. If there's anything I can help with, gimme a hollar!
MirekCz
09-14-2003, 02:35 PM
ok, here's my answer...
First of all, using a system with shared video memory is rather a bad idea... it might give you all kinds of strange results, and unless this is your primary target audience, I would rather use a normal PC with a separate video card... although having a shared-memory machine around might be good for additional testing.
---------------------------------------------------------------------------------
About your answers:
1. This is my first real game and I don't know anything about RLE yet. What would this entail in converting my current routines?
2. I'm not sure what you mean by 'specialized functions'. The only thing my routine does is make sure there is no base tile under objects that completely cover the basetile (ex. Walls). Which is why I'm rendering more 'high-order' objects than base tiles. The more walls there are per level, the less basetiles that are rendered.
3. I'm not using MMX because I'm using the DD blitter for all my drawing. (By MMX, I'm assuming I would need to write my own blitter?)
-------------------------------------------------------------------------------
Your algorithm optimization is quite nice, good work... but I mean something different: I'm talking about drawing with specialized functions (read below).
The DD blitter is far from perfect... and you will have to code your own alpha-blending blitter anyway, which is only an add-on to a normal blitter (i.e. instead of just writing pixels, you alpha-blend them...)
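A minimal sketch of that "add-on" idea, assuming 32-bit XRGB pixels, a separate 8-bit alpha plane laid out like the source, and pitches measured in pixels. The names `blend_pixel` and `blit_alpha` are hypothetical, and a 16-bit 5:6:5 surface would need its channels unpacked before blending. The loop is the same as a plain copy blitter; only the per-pixel store changes.

```c
#include <assert.h>
#include <stdint.h>

/* Blend one 0x00RRGGBB source pixel over a destination pixel.
   alpha: 0 = fully transparent (keep dst), 255 = fully opaque (take src). */
static uint32_t blend_pixel(uint32_t src, uint32_t dst, uint8_t alpha)
{
    uint32_t a = alpha, inv = 255u - alpha;
    uint32_t r = (((src >> 16) & 0xFF) * a + ((dst >> 16) & 0xFF) * inv) / 255u;
    uint32_t g = (((src >> 8) & 0xFF) * a + ((dst >> 8) & 0xFF) * inv) / 255u;
    uint32_t b = ((src & 0xFF) * a + (dst & 0xFF) * inv) / 255u;
    return (r << 16) | (g << 8) | b;
}

/* Same structure as a copy blitter: walk the rows, but blend instead of
   storing. The destination must already hold the background (which is why
   it has to live in system memory). Pitches are in pixels. */
static void blit_alpha(uint32_t *dst, int dst_pitch,
                       const uint32_t *src, const uint8_t *alpha, int src_pitch,
                       int w, int h)
{
    assert(dst && src && alpha);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x)
            dst[x] = blend_pixel(src[x], dst[x], alpha[x]);
        dst += dst_pitch;
        src += src_pitch;
        alpha += src_pitch;
    }
}
```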
About RLE... it will be handy and give you a nice speedup. I would say AT LEAST 2x over a normal colorkeyed transparent function, and for stuff like tiles, walls and characters I would expect an average of 3x.
Now, you can use RLE compression to draw tiles. I assume we're talking about Diablo-like 3D iso tiles, i.e. diamonds shaped like this:
/\
\/
Drawing those colorkeyed is A LOT OF WORK.
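For comparison, a plain colorkeyed blit looks roughly like this. It's a sketch assuming a 16-bit surface and pitches in pixels; `blit_colorkey` is a hypothetical name, not a DirectDraw call. Every pixel of the tile rectangle is read and tested, even though only the diamond in the middle is visible.

```c
#include <assert.h>
#include <stdint.h>

/* Plain colorkeyed blit: each source pixel costs one read and one compare,
   including all the transparent corners around a diamond tile.
   Pitches are in pixels. */
static void blit_colorkey(uint16_t *dst, int dst_pitch,
                          const uint16_t *src, int src_pitch,
                          int w, int h, uint16_t key)
{
    assert(dst && src);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            uint16_t p = src[x];
            if (p != key)        /* per-pixel branch */
                dst[x] = p;
        }
        dst += dst_pitch;
        src += src_pitch;
    }
}
```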
You can use RLE compression and gain a lot of speed (I would say around 4x faster than colorkeyed, since these tiles suit RLE very well: only one opaque run per line and a lot of transparent pixels).
Or you can use a specialized function and gain a bit more.
By specialized function I mean that you know how your tile looks: let's say it's 32x16, and you know the shape (i.e. the first line is 3 pixels wide, the second line 7 pixels wide, the third line 11 pixels wide, etc.)
So you write a function that looks like this:
(assuming we start at the top-left corner of the 32x16 tile, the first line is 3 pixels wide and the second line is 7 pixels wide)
1. Go 14 pixels right (32/2 = 16, 16 - 2 = 14, so we draw the middle-1, middle and middle+1 pixels).
2. Draw 3 pixels from the stream.
3. Go 1 line down and 5 pixels left (3 to go back over the 3 pixels just drawn, plus 2 more because the next line is 7 pixels wide).
4. Draw 7 pixels from the stream.
etc.
So the stream holds all the visible pixels, and the output draws them in the tile's shape... no need for any additional memory reads, if statements or anything. It's decently faster than RLE and a lot faster than a normal colorkeyed BitBlt.
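The steps above can be sketched in C as follows, assuming a 16-bit surface, a pitch in pixels, and a hypothetical `draw_diamond` helper. The shape is derived from the example (row r of the top half starts at x = 14 - 2r and is 3 + 4r pixels wide, bottom half mirrored), so there is no colorkey test at all; a fully unrolled version would hard-code the 16 offsets instead of computing them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TILE_W 32
#define TILE_H 16

/* Hypothetical diamond shape matching the example: row 0 is 3 pixels wide
   starting at x = 14, each row below is 4 pixels wider and starts 2 pixels
   further left, and the bottom half mirrors the top half. */
static int row_index(int y) { return y < TILE_H / 2 ? y : TILE_H - 1 - y; }
static int row_start(int y) { return 14 - 2 * row_index(y); }
static int row_width(int y) { return 3 + 4 * row_index(y); }

/* Specialized tile blitter: 'stream' holds only the visible pixels, row by
   row. No colorkey compare, no per-pixel branch: each row is one memcpy.
   dst points at the tile's top-left corner; dst_pitch is in pixels. */
static const uint16_t *draw_diamond(uint16_t *dst, int dst_pitch,
                                    const uint16_t *stream)
{
    assert(dst && stream);
    for (int y = 0; y < TILE_H; ++y) {
        int w = row_width(y);
        memcpy(dst + row_start(y), stream, (size_t)w * sizeof *stream);
        stream += w;
        dst += dst_pitch;
    }
    return stream;  /* just past this tile's pixels (272 of them) */
}
```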
You could use this not only for tiles but also for walls: either fixed-height walls, or a slightly smarter function with a loop that allows walls of any height. Hope you get the idea.
The advantages of a specialized function:
-fastest
-images take very little space, both on disk and in memory (additional compression like gzip etc. is easily possible)
The disadvantages: you need to do some additional work...
-write the function (a piece of cake, actually, if you can handle a usual blitter)
-write a tool that takes a 32x16 image as input and outputs the stream of used pixels (this can also be done at runtime, but then you lose the advantage of taking less space on disk)
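The conversion tool can be equally small. This sketch assumes the same shape as the steps above (3, 7, 11... pixel rows, bottom half mirrored) and a full 32x16 source image stored row by row; `pack_diamond` is a hypothetical name. It simply walks the visible pixels in drawing order and writes them out, so the blitter can consume them sequentially.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TILE_W 32
#define TILE_H 16

/* Same hypothetical diamond shape: rows widen by 4 pixels from a
   3-pixel top row, bottom half mirrored. */
static int row_index(int y) { return y < TILE_H / 2 ? y : TILE_H - 1 - y; }
static int row_start(int y) { return 14 - 2 * row_index(y); }
static int row_width(int y) { return 3 + 4 * row_index(y); }

/* Offline step: copy the visible pixels of a full TILE_W x TILE_H image
   (row-major) into a packed stream. Returns the number of pixels written;
   'out' must hold at least TILE_W * TILE_H entries. */
static size_t pack_diamond(const uint16_t *image, uint16_t *out)
{
    assert(image && out);
    size_t n = 0;
    for (int y = 0; y < TILE_H; ++y)
        for (int x = row_start(y); x < row_start(y) + row_width(y); ++x)
            out[n++] = image[y * TILE_W + x];
    return n;
}
```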
RLE... I have written about it here or in another topic, so I won't repeat myself; just a short summary:
advantages:
-very fast, much faster than a colorkeyed function in 99% of cases, very often 3x+ faster
-takes less space on disk/in memory
-can be used for all kinds of transparent images (normal, alpha-blended, etc.)
disadvantages:
-a bit tricky to code (considerably harder than a specialized function)
-needs a tool that converts a normal image to an "RLE image"
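A minimal RLE sketch in plain C (not asm), assuming a 16-bit colorkeyed source and an invented stream format: per row, a segment count, then for each segment a transparent skip length, an opaque run length and the run's pixels. `rle_encode` plays the role of the conversion tool and `rle_blit` the blitter; both names and the format are hypothetical. The caller must give `rle_encode` at least h * (1 + 3 * ((w + 1) / 2)) words of output space (the worst case of alternating pixels).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode a colorkeyed w x h image (pitch in pixels). Returns words written. */
static size_t rle_encode(const uint16_t *src, int pitch, int w, int h,
                         uint16_t key, uint16_t *out)
{
    assert(src && out && w > 0 && w < 65536);
    size_t n = 0;
    for (int y = 0; y < h; ++y, src += pitch) {
        size_t count_slot = n++;                 /* segment count, set below */
        uint16_t segments = 0;
        int x = 0;
        while (x < w) {
            int skip_start = x;
            while (x < w && src[x] == key) ++x;  /* transparent run */
            if (x == w) break;                   /* rest of row is empty */
            int run_start = x;
            while (x < w && src[x] != key) ++x;  /* opaque run */
            out[n++] = (uint16_t)(run_start - skip_start);
            out[n++] = (uint16_t)(x - run_start);
            for (int i = run_start; i < x; ++i) out[n++] = src[i];
            ++segments;
        }
        out[count_slot] = segments;
    }
    return n;
}

/* Draw an RLE image: transparent runs are skipped by pointer arithmetic and
   opaque runs copied in one go; no per-pixel colorkey test. */
static void rle_blit(uint16_t *dst, int dst_pitch, const uint16_t *rle, int h)
{
    assert(dst && rle);
    for (int y = 0; y < h; ++y, dst += dst_pitch) {
        uint16_t *p = dst;
        uint16_t segments = *rle++;
        while (segments--) {
            p += *rle++;                         /* skip transparent pixels */
            uint16_t len = *rle++;
            memcpy(p, rle, (size_t)len * sizeof *rle);
            p += len;
            rle += len;
        }
    }
}
```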
I have some simple RLE code around... although it uses asm, it will give you the idea...
MMX is of course done with asm; you can get a pretty nice speedup with it, mainly for memory copies and alpha-blending...
-------------------------------------------------------------------------------
4. This sounds like a good idea. Maybe the HUD and base tiles (which are drawn a lot) can go in VRAM (which would take less than 1MB).
-------------------------------------------------------------------------------
HUD, very good; base tiles, not really...
The reason is simple: if you use alpha-blending, you need to know what's under the semi-transparent image. If the base tiles are drawn in video memory, the code that performs the alpha-blending can't possibly know about them...
One thing about the HUD: if you go with something like Diablo (i.e. the HUD is at the bottom and spans the whole width), even in 640x480 mode you can easily afford to make the HUD, for example, 80 pixels high. Then, when performing the sysmem backbuffer -> vidmem backbuffer copy, you only need to copy a 640x400 area, and the rest is covered by the hardware-drawn HUD... so not only do you not need to draw it in sysmem, you also don't need to copy that part of the screen to vidmem. Nice saving...
Of course, this can also be done, with a little hassle, when the HUD has a different position/shape (say it's on the right side of the screen): you would just modify your sysmem->vidmem copy function to copy, for example, 560x480 pixels and "jump over" the 80 pixels on the right, as they would never be used...
(I would still keep the backbuffer size at 640x480; this way you can change the HUD position and/or reuse your code in other games without rewriting all your blitting code.)
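A sketch of that partial copy, assuming both backbuffers are already locked and their pitches are in bytes (DirectDraw reports the pitch in the `lPitch` field of the surface description, and rows may be padded). `copy_region` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the w x h rectangle at (x, y) from src to the same position in dst,
   one memcpy per row. Pitches are in bytes. */
static void copy_region(uint8_t *dst, long dst_pitch,
                        const uint8_t *src, long src_pitch,
                        int x, int y, int w, int h, int bytes_per_pixel)
{
    assert(dst && src);
    size_t row_bytes = (size_t)w * bytes_per_pixel;
    dst += y * dst_pitch + (long)x * bytes_per_pixel;
    src += y * src_pitch + (long)x * bytes_per_pixel;
    for (int row = 0; row < h; ++row) {
        memcpy(dst, src, row_bytes);
        dst += dst_pitch;
        src += src_pitch;
    }
}
```

For an 80-pixel bottom HUD in 16-bit 640x480 you would call something like `copy_region(vid, vid_pitch, sys, sys_pitch, 0, 0, 640, 400, 2)`; for an 80-pixel HUD on the right, `copy_region(vid, vid_pitch, sys, sys_pitch, 0, 0, 560, 480, 2)` (the pointer and pitch names are placeholders).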
So drop the DD blitter and write your own blitters. It will take some effort, but if done right it will give you a huge speedup.
------------------------------------------------------------------------------
I'm trying to reach 30fps and am getting about 10fps on a 200MHz test machine. One thing about this machine that bothers me is that its video RAM isn't real (it's shared), so this may be why I didn't see a speed increase when I moved some stuff into VRAM earlier in my trials.
-------------------------------------------------------------------------------
Not sure you will reach it, especially since a lot of other stuff goes into a game (AI, sound, etc.).
But I would say 20fps is a reasonable expectation when you get 10fps with unoptimized code... and 20fps should be good enough to play such a game well.
And about the dirty rectangles idea... you should carefully consider whether you really want it.
It has many potential pitfalls; you can easily run into problems with the order in which objects are rendered...
Another problem is that when characters move, there's a lot of area to redraw. If you want many characters/objects (players, enemies, fireballs, missiles, whatever) on screen at once, you might find that at those all-important moments dirty rectangles are slower than redrawing everything...
If you want lots of objects on screen, I would stay far away from dirty rectangles. They can give you better fps at times, but they kill your fps at the most critical moments, when there's lots of action on screen... and it's the lowest fps value that matters most.
Hope it helps :)
ggambett
09-15-2003, 06:13 AM
write a tool that will take as input a 32x16 image and as output give you a stream of used pixels (actually this can also be done at runtime, but you lose the advantage of taking less space on disk)
Hmmm, this is not necessarily true. If you use lossless compression, yes, sure, but lossless compression can't shrink an image as much as a lossy algorithm (e.g. JPEG). I think the best approach is to store the sprites on disk as JPEG color plus a 4-bit BMP alpha channel (minimum size), and Alpha-RLE-compress them at load time (faster blits).
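One way to read "Alpha-RLE" (an assumption, not necessarily how it's done in Betty's Beer Bar): at load time, split each row by alpha value into runs that can be skipped, copied with memcpy, or must be blended pixel by pixel. `alpha_runs` and the run types are hypothetical names; a real encoder would also store the color data for the copy and blend runs.

```c
#include <assert.h>
#include <stdint.h>

enum run_type { RUN_SKIP, RUN_COPY, RUN_BLEND };

struct run { enum run_type type; int len; };

static enum run_type classify(uint8_t a)
{
    return a == 0 ? RUN_SKIP : a == 255 ? RUN_COPY : RUN_BLEND;
}

/* Split one row of 8-bit alpha values into runs: fully transparent pixels
   become SKIP runs, fully opaque ones COPY runs, and only partially
   transparent pixels (usually the sprite's edges) need BLEND.
   Returns the number of runs; 'runs' must hold at least w entries. */
static int alpha_runs(const uint8_t *alpha, int w, struct run *runs)
{
    assert(alpha && runs);
    int n = 0;
    for (int x = 0; x < w; ) {
        enum run_type t = classify(alpha[x]);
        int start = x;
        while (x < w && classify(alpha[x]) == t) ++x;
        runs[n].type = t;
        runs[n].len = x - start;
        ++n;
    }
    return n;
}
```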
If you want lots of objects on screen I would stay far away from dirty rectangles.. they can provide you with better fps at times, but kill your fps at the most critical moments when there's lots of action on screen... and it's the lowest fps value that matters most.
True. That's why I ended up using "patches" of fixed size and position. There's no overdraw; at most, some overhead from drawing 80 patches instead of one big rectangle.
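A sketch of fixed patches, assuming a 640x480 screen cut into a 10x8 grid of 64x60 patches (which gives the 80 patches mentioned; the sizes and function names are hypothetical). Sprites mark the patches under their old and new positions, and each frame only the marked patches are redrawn: background first, then every sprite overlapping the patch, clipped to it.

```c
#include <assert.h>
#include <stdbool.h>

#define SCREEN_W  640
#define SCREEN_H  480
#define PATCH_W   64
#define PATCH_H   60
#define PATCHES_X (SCREEN_W / PATCH_W)   /* 10 */
#define PATCHES_Y (SCREEN_H / PATCH_H)   /* 8 */

static bool dirty[PATCHES_Y][PATCHES_X];

/* Mark every patch touched by the rectangle [x, x+w) x [y, y+h), clipped to
   the screen. Call it for a sprite's old and new positions. */
static void mark_dirty(int x, int y, int w, int h)
{
    int x0 = x < 0 ? 0 : x, y0 = y < 0 ? 0 : y;
    int x1 = x + w > SCREEN_W ? SCREEN_W : x + w;
    int y1 = y + h > SCREEN_H ? SCREEN_H : y + h;
    if (x0 >= x1 || y0 >= y1) return;            /* fully off-screen */
    for (int py = y0 / PATCH_H; py <= (y1 - 1) / PATCH_H; ++py)
        for (int px = x0 / PATCH_W; px <= (x1 - 1) / PATCH_W; ++px)
            dirty[py][px] = true;
}

/* Write the indices (py * PATCHES_X + px) of all dirty patches to 'out',
   clear their flags, and return how many there were. The caller redraws
   exactly those patches this frame. */
static int collect_dirty(int *out)
{
    assert(out);
    int n = 0;
    for (int py = 0; py < PATCHES_Y; ++py)
        for (int px = 0; px < PATCHES_X; ++px)
            if (dirty[py][px]) {
                out[n++] = py * PATCHES_X + px;
                dirty[py][px] = false;
            }
    return n;
}
```

Because the patches never overlap, no screen area is redrawn twice in one frame; the price is per-patch overhead and clipping sprites to patch edges.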
MirekCz
09-15-2003, 07:26 AM
There are a few problems with JPEG and similar compression:
1. you lose quality
2. it's unacceptable for sprites (when they are colorkeyed, you can't recreate the colors exactly after lossy JPEG compression, so weird artifacts appear near the transparent/non-transparent edge)
In general, RLE compression or stream-like storage of sprites/tiles is a better idea to me, since you don't lose quality, and with 8bpp palette mode and/or gzip-like compression you end up with similar sizes.
I have successfully used such a compression scheme in my game with very good results (the final image was a similar size to a JPG, with lossless compression).
About patches, yes, you're right, it could work well... but in the worst case it will still bring a small performance hit (i.e. the 80-patches case).
From my point of view such a system is quite complicated. I'm not sure it's a good idea to spend time writing something like that if his engine already runs with decent performance on his target platform (i.e. a P200). I would first look at other ways (asm-optimized blitters) to get a stable frame rate, and implement such a system only if the game ran poorly with little action on screen on his target machine...
And there's an additional drawback: in a Diablo-like engine with dynamic lighting you would get awful results from such a system... there's simply no way to use something like that with dynamic lighting...
ggambett
09-15-2003, 08:22 AM
1.you lose quality
2.it's unacceptable for sprites (when they are colorkeyed you're not able to recreate colors exactly after jpeg lossy compression, so weird artifacts appear near the transparent/non-transparent edge)
For point 1, I lose as much quality as I want, not more :) It's giving really satisfactory results in Betty's Beer Bar (http://www.mrio-software.com/info.php?id=bbb). For example, I can compress many animation frames very loosely because each one is only seen for 1/20 of a second.
For point 2, true, but I don't use colorkeys; I use full alpha channels, stored separately in 4bpp LZW-compressed BMPs, so there's no quality loss there.
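Loading such a 4bpp alpha channel mostly comes down to expanding the nibbles. A sketch for one row, assuming the high nibble holds the left pixel (as in 4bpp BMPs) and that BMP row padding and bottom-up row order have already been dealt with; `expand_alpha4` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand w 4-bit alpha values (two per byte, high nibble first) to 8 bits.
   Multiplying by 17 maps 0..15 onto 0..255 exactly at both ends
   (15 * 17 = 255). */
static void expand_alpha4(const uint8_t *packed, uint8_t *out, size_t w)
{
    assert(packed && out);
    for (size_t x = 0; x < w; ++x) {
        uint8_t byte = packed[x / 2];
        uint8_t a4 = (uint8_t)((x % 2 == 0) ? (byte >> 4) : (byte & 0x0F));
        out[x] = (uint8_t)(a4 * 17);
    }
}
```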