GPU chip implementation on CPU - will it be faster?
So, modern video cards contain two devices: a video signal generator and a graphics accelerator.

The first converts data from video memory into an MDA / CGA / VGA / RCA / DVI / HDMI / etc. signal.

The second is another co-processor (like a mathematical co-processor) whose purpose is to free the CPU from graphics calculations. This co-processor implements algorithms (in hardware, as I understand it, not software) for copying bits, drawing lines, scrolling, working with sprites, and so on. It can accelerate 2D calculations, 3D calculations, or both.

Old computers, of course, had their own "video cards", or more precisely video chips. If my research is correct, not all of these "video cards" contained an accelerator - that is, the poor CPU had to compute the graphics entirely by itself, including running graphics algorithms (Bresenham's, etc.) and copying bytes. At the very least, this is how the Atari 2600 does it, and it is a very slow way to work with graphics.

The Sega Genesis, C64, Game Boy, Game Boy Color, and Amiga (just a few examples) had their own video chips for implementing graphics. Moreover, different platforms produced different graphics because of the different implementations of those video chips.

Since I now work on homebrew for my own pleasure, I can make design decisions that differ from those of the old computers.

So, suppose that in the system I design I use not a video chip from an old computer but a second CPU, whose program is independent of the main CPU and implements the video algorithms - Bresenham's, scrolling, copying bytes, and so on. These algorithms, of course, I would have to write myself. Data exchange between CPU1 and CPU2 (= GPU) goes through a buffer: for example, CPU1 writes into the buffer that the GPU should draw a line from (x1, y1) to (x2, y2), or draw a given sprite at a given position. That is just one option; the exchange could be arranged differently.
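To make the idea concrete, here is a minimal sketch in C of the kind of command ring buffer I have in mind. All names are hypothetical, the drawing routines are only declared, and real shared memory between two CPUs would need proper bus arbitration or dual-port RAM:

```c
/* Sketch of a command ring buffer shared by CPU1 (main) and CPU2 (the "GPU").
   All names are hypothetical; memory-ordering/arbitration details are ignored. */
#include <stdint.h>

enum gpu_op { OP_LINE, OP_SPRITE, OP_SCROLL };

struct gpu_cmd {
    uint8_t op;               /* one of gpu_op                     */
    uint8_t id;               /* sprite number for OP_SPRITE       */
    int16_t x1, y1, x2, y2;   /* line endpoints / position / delta */
};

#define RING_SIZE 64          /* power of two */
static struct gpu_cmd   ring[RING_SIZE];
static volatile uint8_t head; /* written only by CPU1 */
static volatile uint8_t tail; /* written only by CPU2 */

/* Drawing routines CPU2 would implement (hypothetical). */
void draw_line(int x1, int y1, int x2, int y2); /* Bresenham on CPU2 */
void blit_sprite(int id, int x, int y);
void scroll(int dx, int dy);

/* CPU1 side: enqueue one command, spinning while the ring is full. */
void gpu_submit(struct gpu_cmd c)
{
    uint8_t next = (uint8_t)((head + 1) & (RING_SIZE - 1));
    while (next == tail) { /* wait for CPU2 to drain a slot */ }
    ring[head] = c;
    head = next;
}

/* CPU2 side: drain commands forever and rasterize them. */
void gpu_main_loop(void)
{
    for (;;) {
        while (tail == head) { /* no work yet */ }
        struct gpu_cmd c = ring[tail];
        tail = (uint8_t)((tail + 1) & (RING_SIZE - 1));
        switch (c.op) {
        case OP_LINE:   draw_line(c.x1, c.y1, c.x2, c.y2); break;
        case OP_SPRITE: blit_sprite(c.id, c.x1, c.y1);     break;
        case OP_SCROLL: scroll(c.x1, c.y1);                break;
        }
    }
}
```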
Would such a method be faster than the classic old video chips?

For example, could the Sega Genesis play heavier games if its video chip were replaced with a second Motorola 68000 (8 MHz) whose program implemented the video algorithms?

(I understand that a direct replacement on the board would not work.)

I mean: what if the engineers had made that decision at the design stage?
Tags: video, cpu

asked 8 hours ago by Alex
You don't want line drawing. It's almost useless. You want copy-zooming of rectangles onto either rectangles (2D) or tetragons (3D). – Janka, 7 hours ago

Nice idea; it reminds me of arcade boards that used Z80 CPUs as sound coprocessors. – sgorozco, 7 hours ago

FWIW, the Atari 2600 is probably the worst example you could come up with for what you described there (which is often called a “dumb frame buffer” display). The Atari 2600 didn’t have a frame buffer. It barely had anything at all. The way you do video on an Atari 2600 is nothing less than hair-raising, white-knuckle-inducing. – Euro Micelli, 6 hours ago

A GPU is just a CPU that's highly optimized for its task (drawing graphics), just as a DSP is a CPU highly optimized for signal processing. So replacing a GPU with a CPU of approximately the same complexity and clock rate will make drawing graphics slower, not faster. – dirkt, 2 hours ago
2 Answers
As you surmise, one way of dividing up graphic image generation is
into two parts: building a bitmap in memory (with or without hardware
assistance) and then converting the bitmap into a video signal. And
that is how modern graphics cards do this.
Some older systems worked this way too, but there were three other common methods of dividing up the work as well.
- Instead of having a bitmap in memory, keep a "block map" that maps shorter codes to small bitmaps, and use these codes to generate the screen image on the fly. The most common use would be for text screens, usually mapping a single 8-bit value to a 64-bit or larger bitmap, letting e.g. a 960-byte array of memory map 40x24 character cells to a 240x192 screen image, which would otherwise have needed 5760 bytes for a bitmap, requiring both more memory and more processing time if the CPU was itself filling that RAM with appropriate patterns for the characters. (A minimal rendering sketch of this scheme follows the list.)
  - Text-based systems would usually have the characters/blocks in ROM; on some systems these were fixed, though usually there would be characters containing specific subpatterns that could be used to generate "block graphics."
  - Systems supporting graphical (particularly gaming) applications would often allow the programmer to specify a set of custom block patterns in RAM.
  - Systems with a block map mode might offer only that, as the Commodore PET and Sinclair ZX80 did. (The ZX80 also offered the ability to have a shorter map that left out blank characters on the screen, so if you were displaying only a dozen short lines, your map would be smaller than if you had to display a full screen of text.) Or they might offer a choice of modes, such as the Apple II with a black-and-white character block mode, a 16-colour "low-res graphics" block mode, and a 2-4 colour bitmapped "high-res" graphics mode.
- Avoid having a screen bitmap at all, and instead directly generate data that is eventually turned into the video signal. This is how the Atari 2600 worked: the CPU would load registers on the TIA that specified what was to appear on the current scan line, wait until the TIA had finished sending that scan line to the video output, reload the registers, and continue in this way until a full video frame had been written. This has the advantage of needing no RAM to hold a screen bitmap; the Atari 2600 had only 128 bytes of RAM in total.
- Have a combination of the two, using either bitmap or block mode for the "background" of the image, but additionally generating further image information from other data on top of that. Often the extra image data generated on top of the background was a lot more sophisticated than the Atari 2600 example above; typically you'd have 8x8, 8x16 or 16x16 bitmaps called "sprites" that would be placed at arbitrary locations on the screen, and as well as being used for display, their positions could be checked to see whether they had collided, and so on.
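To make the block-map idea above concrete, here is a minimal sketch in plain C (hypothetical names, 6x8-pixel cells to match the 40x24-cell, 240x192-pixel numbers in the first item) of the lookup a character generator performs for each scan line:

```c
/* Minimal block-map ("character cell") expansion for a 40x24-cell,
   240x192-pixel display. All names are hypothetical.                */
#include <stdint.h>

#define COLS   40
#define ROWS   24
#define CELL_W 6            /* 40 * 6 = 240 pixels per line */
#define CELL_H 8            /* 24 * 8 = 192 scan lines      */

/* 960-byte block map: one code per character cell. */
static uint8_t block_map[ROWS * COLS];

/* Character generator "ROM": 8 bytes per code, top 6 bits of each byte
   used. Only 'A' is filled in here as an example; a real font fills all
   256 entries, and typically lives in ROM rather than RAM.             */
static const uint8_t char_rom[256][CELL_H] = {
    ['A'] = { 0x70, 0x88, 0x88, 0xF8, 0x88, 0x88, 0x88, 0x00 },
};

/* Expand one scan line (0..191) into 40 bytes of 1-bit-per-pixel data.
   A hardware character generator does exactly this lookup on the fly,
   so no 5760-byte frame bitmap ever needs to exist in RAM.             */
void render_scanline(int line, uint8_t out[COLS])
{
    int cell_row    = line / CELL_H;   /* which row of cells          */
    int row_in_cell = line % CELL_H;   /* which line inside the cell  */
    for (int col = 0; col < COLS; col++) {
        uint8_t code = block_map[cell_row * COLS + col];
        out[col] = char_rom[code][row_in_cell];   /* 6-pixel slice */
    }
}
```

The point of the trade-off is that the CPU only ever touches the 960-byte map (and perhaps the patterns, if they live in RAM); the per-pixel expansion happens during scan-out.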
So these are the types of systems you'll be comparing with your idea
of using a coprocessor to generate graphics. You'll note that these
systems can have certain limitations on what kind of graphics they can
generate: for example, a block map system will not be able to generate
certain kinds of images that a bitmap system can generate because
typically there will not be enough different blocks to give each block
on the screen its own individual pattern. This can be worked around by
simplifying the image being generated, and this might also produce an
increase in speed of generation, but it would be up to you to decide
whether the simplified result is still acceptable for your
application.
What speed all comes down to in the end, as you compare these systems,
is how much work the CPU needs to do to generate the image. The less
data you need the CPU to move, the faster it generally works. Assuming
that your video coprocessor has full access to all of memory it should
be able to emulate any of the systems above and thus be just as fast.
If you want it to be faster, you'd need to find ways to let the CPU
send the same "instructions" to the graphics system using less memory
movement. One example might be to be able to specify vectors that the
coprocessor could render, or even go as far as a full 3D rendering
system like modern graphics cards use. These are actually a
combination of hardware and software; the developer writes small
programs ("shaders") in a special langauge that are sent to the GPU to
execute.
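As a rough, purely illustrative comparison of that "less memory movement" idea (the sizes below are assumptions, not measurements of any real system), consider how much data the main CPU must move to get one 16x16, 4-bit-per-pixel sprite on screen:

```c
/* Back-of-the-envelope comparison of CPU traffic for one 16x16 sprite,
   4 bits per pixel. Both interfaces are hypothetical.                  */
#include <stdio.h>

int main(void)
{
    /* Interface A: dumb frame buffer - the CPU writes every pixel itself. */
    int bytes_dumb = 16 * 16 / 2;            /* 128 bytes of pixel data    */

    /* Interface B: command interface - the CPU only sends "draw sprite N
       at (x, y)"; the coprocessor fetches the pixels from its own memory. */
    int bytes_cmd = 1 /*opcode*/ + 1 /*sprite id*/ + 2 /*x*/ + 2 /*y*/;

    printf("dumb frame buffer: %d bytes moved by the CPU\n", bytes_dumb);
    printf("command interface: %d bytes moved by the CPU\n", bytes_cmd);
    /* Roughly 20x less CPU bus traffic per sprite; the same idea, taken
       much further, is what display lists, blitters and modern 3D command
       queues do.                                                          */
    return 0;
}
```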
Here are some further resources you may find useful when thinking
about how you want to design your graphics coprocessor.
- The Vulcan-74 is a mostly 7400-based graphics processor that lets a 6502 do Amiga-level graphics. It's built mainly from 7400-series parts (except for its 12.5 MB of memory, for obvious reasons) on breadboards, and is full of good ideas for both the software CPU/coprocessor interface and how to actually build something like this. See also its discussion thread on forums.6502.org.
- The "Pixel Processing Unit" section of The Ultimate Game Boy Talk video provides a nice overview of the Game Boy's block map (here called "tile") plus sprites system, including information about smooth scrolling and screen overlays. It is heavily optimized for certain kinds of games, and gives you an idea of what you need to compete with if you're trying to be as fast or faster. Take careful note of the particular structures and features it offers to let the CPU specify images with minimal processing.
– Curt J. Sampson, answered 7 hours ago (edited 7 hours ago)
"For example, could the Sega Genesis play heavier games if its video chip were replaced with a second Motorola 68000 (8 MHz) whose program implemented the video algorithms?"
No, it would be slower - much slower. The Yamaha YM7101 VDP can display 80 32x32 pixel sprites (20 per scanline), over 2 tile maps which can be freely scrolled vertically and horizontally. To do that completely in software would require many logical operations and memory accesses per pixel.
An 8MHz 68000 can barely execute 2 instructions per microsecond, and most memory operations take several microseconds. Creating those VDP functions in software would be many times slower, even with the help of a 'dumb' frame buffer. If the CPU had to do it all then it would be maxed out just trying to produce a static display, let alone scrolling and rendering sprites over it.
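To put rough numbers on that (the display geometry and cycle costs below are assumptions for illustration, not measured figures):

```c
/* Rough cycle budget for software rendering on an 8 MHz 68000.
   Display geometry and cycle counts are assumptions for illustration. */
#include <stdio.h>

int main(void)
{
    const double cpu_hz = 8000000.0;   /* 8 MHz 68000                    */
    const double width  = 320;         /* assumed active width (H40)     */
    const double height = 224;
    const double fps    = 60;

    double pixels_per_sec   = width * height * fps;     /* ~4.3 million  */
    double cycles_per_pixel = cpu_hz / pixels_per_sec;  /* the budget    */

    printf("pixel rate  : %.1f Mpixel/s\n", pixels_per_sec / 1e6);
    printf("cycle budget: %.1f cycles per pixel\n", cycles_per_pixel);
    /* Under 2 CPU cycles per pixel are available, yet a single 68000 bus
       access already takes 4 cycles, and even the tightest copy loop
       costs on the order of 10-20 cycles per word written - before any
       sprite priority, masking or scroll logic is considered.            */
    return 0;
}
```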
A CPU executes general-purpose instructions, which gives it a lot of flexibility but slows it down relative to a video display 'processor' whose job is simply to push pixels out from a frame buffer. The VDP doesn't execute instructions; it has DMA buffers controlled by hardware counters that are synchronized to the video frame. Once the control registers have been set up, it just continuously reads video memory and outputs pixels.
The Amiga's custom chipset is an example of a video subsystem that does have a processor, called the 'Copper' (short for co-processor). It only has 3 instructions - Move, Skip, and Wait, all of which are 32 bits long. Move loads an immediate value into any custom chip register, Wait waits until the beam counters reach a particular horizontal and vertical position on the screen, and Skip jumps over the next instruction if past a particular position. This is about as RISC as you can get, but it still isn't fast enough to produce a full-resolution bitmap display directly.
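For a feel of what such a Copper program looks like, here is a tiny copper list written out as raw 16-bit words in C. The COLOR00 offset ($180) and the WAIT/MOVE encodings follow the commonly published Amiga documentation, but treat this as an illustrative sketch rather than a reference:

```c
/* A tiny Amiga copper list as raw 16-bit word pairs, for illustration.
   On real hardware, the COP1LC registers would point at this data.      */
#include <stdint.h>

static const uint16_t copper_list[] = {
    /* WAIT until the beam reaches vertical position $50.
       First word: VP in bits 15-8, HP in bits 7-1, bit 0 = 1 (WAIT).
       Second word: compare mask ($fffe = compare all position bits).    */
    0x5007, 0xfffe,

    /* MOVE $0f00 (red) into COLOR00 (custom register offset $180).
       First word: register offset, bit 0 = 0 (MOVE). Second word: data. */
    0x0180, 0x0f00,

    /* WAIT for line $a0, then switch the background colour to blue.     */
    0xa007, 0xfffe,
    0x0180, 0x000f,

    /* End of list: wait for an impossible beam position.                */
    0xffff, 0xfffe,
};
```

Even this small example shows the division of labour: the main CPU builds the list once, and the Copper then retires one register write per pair of words, in lockstep with the beam, with no further CPU involvement.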
– Bruce Abbott, answered 3 hours ago