Introduction
This is a series about how to program for the Game Boy Advance (GBA) using the Rust programming language.
License
The work in this project is licensed as follows:
- Rust code:
Zlib OR Apache-2.0 OR MIT
- All other content (linker scripts, book text, etc):
CC0-1.0
Support The Project
If you'd like to support the book you can sign up to be a GitHub Sponsor.
Basics
Let's program some stuff to run on the GBA.
Basic Compilation
As usual with any new Rust project we'll need a Cargo.toml
file:
# Cargo.toml
[package]
name = "gba_from_scratch"
version = "0.1.0"
edition = "2021"
And we want some sort of program to run so let's make an example called ex1.rs
in the examples/
directory.
It can just be a classic "Hello, World" type program to start.
// examples/ex1.rs
fn main() {
  println!("hello");
}
Since we're not running the compiler on the GBA itself, we'll need to "cross-compile" our program. It's called "cross compilation" when you build a program for some system other than the system that you're running the compiler on. The system running the compiler is called the "host" system, and the system you're building for is called the "target" system. In our case, the host system can be basically anything that can run a Rust toolchain. I've had success on Windows, Linux, and Mac; there are no big difficulties.
To do a cross compile, we pass --target to cargo
.
If we look up the Game Boy Advance on Wikipedia, we can see that it has an ARM7TDMI CPU.
The "ARM7T" part means that it uses the "ARMv4T" CPU architecture.
Now we go to the Platform Support page and use "ctrl+F" to look for "ARMv4T".
We can see three(-ish) entries that might(?) be what we want.
armv4t-none-eabi
armv4t-unknown-linux-gnueabi
thumbv4t-none-eabi
This is the part where my "teach like you're telling a story" style breaks down a bit.
What should happen next is that we pick the thumbv4t-none-eabi
target.
Except there's not an easy to find document that tells you this step that I can just link to and have you read a few lines.
The shortest version of the full explanation is something like "Many ARM CPUs support two code 'states', and one of them is called 'thumb', and that's the better default on the GBA."
We can certainly talk more about that later, but for now you just gotta go with it.
Let's see what happens when we pass --target thumbv4t-none-eabi
as part of a call to cargo
:
>cargo build --example ex1 --target thumbv4t-none-eabi
Compiling gba_from_scratch v0.1.0 (D:\dev\gba-from-scratch)
error[E0463]: can't find crate for `std`
|
= note: the `thumbv4t-none-eabi` target may not be installed
= help: consider downloading the target with `rustup target add thumbv4t-none-eabi`
= help: consider building the standard library from source with `cargo build -Zbuild-std`
error: requires `sized` lang_item
For more information about this error, try `rustc --explain E0463`.
error: could not compile `gba_from_scratch` (lib) due to 2 previous errors
Well we seem to have already configured something wrong, somehow. The trouble with a wrong project configuration is that the compiler can't always guess what you meant to do. This means that the error message suggestions might be helpful, but they also might lead you down the wrong path.
One suggested way to fix the problem is to add the thumbv4t-none-eabi
target with rustup
.
It seems pretty low risk to just try installing that, so let's see.
>rustup target add thumbv4t-none-eabi
error: toolchain 'nightly-x86_64-pc-windows-msvc' does not contain component 'rust-std' for target 'thumbv4t-none-eabi'; did you mean 'thumbv6m-none-eabi'?
note: not all platforms have the standard library pre-compiled: https://doc.rust-lang.org/nightly/rustc/platform-support.html
help: consider using `cargo build -Z build-std` instead
Ah, dang.
If we double check the Platform Support page we might see that thumbv4t-none-eabi
is in the "Tier 3" section.
Tier 3 targets don't have a standard library available in rustup
.
How about this build-std
thing?
The -Z
flags are all unstable flags, so we can check the unstable section of the cargo manual.
Looks like build-std lets us build our own standard library.
We're going to need Nightly rust, so set that up how you want if you need to.
You can use rustup default nightly
(which sets the system global default), or you can use a toolchain file if you want to use Nightly on just this one project.
Once we've set for Nightly use, we need to get the rust-src
component from rustup
too.
rustup default nightly
rustup component add rust-src
Okay let's try again
> cargo build --example ex1 --target thumbv4t-none-eabi -Z build-std
Compiling compiler_builtins v0.1.89
Compiling core v0.0.0 (/Users/dg/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core)
Compiling libc v0.2.140
Compiling cc v1.0.77
Compiling memchr v2.5.0
Compiling std v0.0.0 (/Users/dg/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std)
Compiling unwind v0.0.0 (/Users/dg/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/unwind)
Compiling rustc-std-workspace-core v1.99.0 (/Users/dg/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/rustc-std-workspace-core)
Compiling alloc v0.0.0 (/Users/dg/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/alloc)
Compiling cfg-if v1.0.0
Compiling adler v1.0.2
Compiling rustc-demangle v0.1.21
Compiling rustc-std-workspace-alloc v1.99.0 (/Users/dg/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/rustc-std-workspace-alloc)
Compiling panic_abort v0.0.0 (/Users/dg/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/panic_abort)
Compiling panic_unwind v0.0.0 (/Users/dg/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/panic_unwind)
Compiling gimli v0.26.2
Compiling miniz_oxide v0.5.3
Compiling hashbrown v0.12.3
Compiling object v0.29.0
Compiling std_detect v0.1.5 (/Users/dg/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/stdarch/crates/std_detect)
error[E0432]: unresolved import `alloc::sync`
--> /Users/dg/.cargo/registry/src/index.crates.io-6f17d22bba15001f/gimli-0.26.2/src/read/dwarf.rs:2:12
|
2 | use alloc::sync::Arc;
| ^^^^ could not find `sync` in `alloc`
For more information about this error, try `rustc --explain E0432`.
error: could not compile `gimli` (lib) due to previous error
warning: build failed, waiting for other jobs to finish...
Whoa... that's way too much. We didn't mean for all of that to happen. Let's check that cargo manual again. Ah, it says we need to pass an argument to our command line argument if we don't want as much stuff to be built:
> cargo build --example ex1 --target thumbv4t-none-eabi -Z build-std=core
Compiling gba_from_scratch v0.1.0 (/Users/dg/gba-from-scratch)
error[E0463]: can't find crate for `std`
|
= note: the `thumbv4t-none-eabi` target may not support the standard library
= note: `std` is required by `gba_from_scratch` because it does not declare `#![no_std]`
= help: consider building the standard library from source with `cargo build -Zbuild-std`
For more information about this error, try `rustc --explain E0463`.
error: could not compile `gba_from_scratch` (lib) due to previous error
That's different from before at least.
Well, we told it to only build core and not std, and then it said we couldn't use std.
Makes sense.
Let's change the example.
// ex1.rs
#![no_std]
fn main() {
println!("hello");
}
And we need to fix our lib.rs
to also be no_std
.
It doesn't do anything else for now, it's just blank beyond being no_std.
// lib.rs
#![no_std]
Now rust-analyzer is telling me we can't use println in our example.
Also, we're missing a #[panic_handler]
.
Here's the error.
> cargo build --example ex1 --target thumbv4t-none-eabi -Z build-std=core
Compiling gba_from_scratch v0.1.0 (/Users/dg/gba-from-scratch)
error: cannot find macro `println` in this scope
--> examples/ex1.rs:4:3
|
4 | println!("hello");
| ^^^^^^^
error: `#[panic_handler]` function required, but not found
error: could not compile `gba_from_scratch` (example "ex1") due to 2 previous errors
Well, we can comment out the println!
.
For the panic handler, we go to the Attributes part of the rust reference.
That links us to panic_handler, which sets what function gets called in event of panic.
// ex1.rs
#![no_std]
fn main() {
  //
}

#[panic_handler]
fn panic_handler(_: &core::panic::PanicInfo) -> ! {
  loop {}
}
Now we get a new, different error when we try to build:
> cargo build --example ex1 --target thumbv4t-none-eabi -Z build-std=core
Compiling gba_from_scratch v0.1.0 (/Users/dg/gba-from-scratch)
error: requires `start` lang_item
error: could not compile `gba_from_scratch` (example "ex1") due to previous error
Alright so what's this start
lang item deal?
Well it has to do with the operating system being able to run your executable.
The details aren't important for us, because there's no operating system on the GBA.
Instead of trying to work with the start
thing, we'll declare our program as #![no_main]
.
This prevents the compiler from automatically generating the main
entry fn, which is what's looking to call that start fn.
Note that this generated main
fn is separate from the main
fn that we normally think of as being the start of the program.
Because, as always, programmers are very good at naming things.
// ex1.rs
#![no_std]
#![no_main]
fn main() {
  //
}

#[panic_handler]
fn panic_handler(_: &core::panic::PanicInfo) -> ! {
  loop {}
}
Okay let's try another build.
> cargo build --example ex1 --target thumbv4t-none-eabi -Z build-std=core
Compiling gba_from_scratch v0.1.0 (/Users/dg/gba-from-scratch)
warning: function `main` is never used
--> examples/ex1.rs:4:4
|
4 | fn main() {
| ^^^^
|
= note: `#[warn(dead_code)]` on by default
warning: `gba_from_scratch` (example "ex1") generated 1 warning
Finished dev [unoptimized + debuginfo] target(s) in 0.64s
Okay. It builds.
Using mGBA
Let's see if it works I guess.
Personally I like to use mGBA as my emulator of choice, but any GBA emulator should be fine.
If you're on Windows then your executable will be called mgba.exe
by default, and if you're on Mac or Linux you'll get both mgba
(no UI) and mgba-qt
(has a menu bar and such around the video frame).
On my Windows machine I just made a copy of mgba.exe
that's called mgba-qt.exe
so that both names work on all of my devices.
> mgba target/thumbv4t-none-eabi/debug/examples/ex1
The emulator starts and then... shows a dialog box. "An error occurred." says the box's title bar. "Could not load game. Are you sure it's in the correct format?" Well, sorry mgba, but we're not sure it's in the correct format. In fact, we're pretty sure it's not the correct format right now. I guess we'll have to inspect the compilation output.
ARM Binutils
If we go to ARM's developer website we can find the ARM Toolchain Downloads page.
This lets us download the tools for working with executables for the arm-none-eabi
family of targets.
This includes our thumbv4t
program, as well as other variants of ARM code.
You can get it from their website, or if you're on Linux you can probably get it from your package manager.
The binutils package for a target family has many individual tools.
The ones we'll be using will all be named arm-none-eabi-
to start, to distinguish them from the same tool for other targets.
So if we want to use "objdump" we call it with arm-none-eabi-objdump
and so on.
That's exactly what we want to use right now.
We pass the name of the compiled executable, and then whichever other options we want.
For now let's look at the --section-headers output:
> arm-none-eabi-objdump target/thumbv4t-none-eabi/debug/examples/ex1 --section-headers
target/thumbv4t-none-eabi/debug/examples/ex1: file format elf32-littlearm
Sections:
Idx Name Size VMA LMA File off Algn
0 .debug_abbrev 000000f4 00000000 00000000 00000094 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
1 .debug_info 000005a6 00000000 00000000 00000188 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
2 .debug_aranges 00000020 00000000 00000000 0000072e 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
3 .debug_str 00000495 00000000 00000000 0000074e 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
4 .debug_pubnames 000000c0 00000000 00000000 00000be3 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
5 .debug_pubtypes 00000364 00000000 00000000 00000ca3 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
6 .ARM.attributes 00000030 00000000 00000000 00001007 2**0
CONTENTS, READONLY
7 .debug_frame 00000028 00000000 00000000 00001038 2**2
CONTENTS, READONLY, DEBUGGING, OCTETS
8 .debug_line 00000042 00000000 00000000 00001060 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
9 .comment 00000013 00000000 00000000 000010a2 2**0
CONTENTS, READONLY
There's a few columns of note:
- Size is the number of bytes for the section.
- VMA is the Virtual Memory Address. On the GBA this means the intended address when the main program is running. All of our data starts in ROM, and some of it we will copy into RAM just after boot. When a section is intended to be copied into RAM, it will have a VMA separate from the LMA.
- LMA is the Load Memory Address. On the GBA this means the address in ROM.
Which means... according to the chart... none of this data would end up in the ROM? I guess that means that, if we extracted our raw program from the ELF container file that the compiler uses, we would end up with a totally blank ROM. That certainly doesn't sound like what mgba would call the "correct format".
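To make that conclusion a bit more concrete, here's a small host-side Rust sketch (it runs on your PC, not on the GBA) that models a few rows of the table and adds up the bytes that would land in ROM. The `Section` type and the `rom_bytes` and `sections_from_first_build` functions are names invented for this illustration. The rule "a section ends up in ROM only if objdump lists the LOAD flag and its LMA is inside the ROM range" is a simplifying assumption, and the ROM range itself (32 MiB starting at 0x0800_0000) is the same one that shows up in the linker script later on.

```rust
/// One row of `objdump --section-headers`, reduced to the fields we care about.
/// (Hypothetical type, for illustration only.)
pub struct Section {
    pub name: &'static str,
    pub size: u32,
    pub lma: u32,
    /// True when objdump lists the `LOAD` flag for this section.
    pub load: bool,
}

/// ROM address range on the GBA: 32 MiB starting at 0x0800_0000.
pub const ROM_START: u32 = 0x0800_0000;
pub const ROM_END: u32 = ROM_START + 32 * 1024 * 1024;

/// Sum the sizes of all sections that would actually be placed in ROM.
pub fn rom_bytes(sections: &[Section]) -> u32 {
    sections
        .iter()
        .filter(|s| s.load && (ROM_START..ROM_END).contains(&s.lma))
        .map(|s| s.size)
        .sum()
}

/// A few rows copied from the listing above. None of them has the LOAD flag.
pub fn sections_from_first_build() -> Vec<Section> {
    vec![
        Section { name: ".debug_abbrev", size: 0xf4, lma: 0, load: false },
        Section { name: ".debug_info", size: 0x5a6, lma: 0, load: false },
        Section { name: ".ARM.attributes", size: 0x30, lma: 0, load: false },
        Section { name: ".comment", size: 0x13, lma: 0, load: false },
    ]
}
```

Calling `rom_bytes(&sections_from_first_build())` gives 0, which matches the "totally blank ROM" reading of the table.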
Linker Scripts
What's wrong is that we need to adjust the linker script.
That link goes to the documentation for the binutils linker (called ld
), and technically we're actually using the linker that ships with the compiler (called rust-lld
).
rust-lld
is the Rust version of lld
, which is LLVM's linker that's intended to be a "drop in" replacement for GNU's ld
.
Both linkers use a linker script system, and they both even use the same linker script format.
I tried to find an in depth manual for lld
specifically, but all I could find was the top level "man page" explanations.
Referring to the GNU ld
manual will have to do.
You don't have to read the whole manual, the short story goes like this: linkers take one or more "object" files and "link" them into a single "executable" file. The linker script is what guides the linker in exactly what to do. If you don't say what script to use then the linker will use a default linker script that it keeps wherever. When the target is a "normal" target like Windows or Mac then using a default linker script is just fine. When the target is something a little more esoteric, like most embedded devices, including the GBA, then the default won't be good enough. We'll have to write our own script and make the linker use that.
One complexity here is that the linker script to use is an argument passed to the linker.
And the way you pass args to the linker is that you tell rustc
to do it.
Except with cargo build
there's no way to tell rustc
an extra argument.
We could use cargo rustc
, but it's a pain to have to remember an alternate command.
As much as possible we'd like cargo build
to work.
We could use a build.rs
file to pass an arg to the linker, but making a build script just to pass one argument seems like maybe overkill.
Probably we should just set it as part of the RUSTFLAGS environment variable.
The catch with RUSTFLAGS
is that any time you change it you have to build the entire crate graph again.
We want to "write it down" (so to speak) and have it automatically be the same every time.
This can be done with a cargo configuration file.
First let's make a blank normal_boot.ld
file in a linker_scripts/
folder.
Then in the .cargo
folder we fill in config.toml
# .cargo/config.toml
[target.thumbv4t-none-eabi]
rustflags = ["-Clink-arg=-Tlinker_scripts/normal_boot.ld"]
While we're at it, we can even set a default target (which is used when we don't specify --target), and we can configure build-std to be used automatically, all in the same file.
# .cargo/config.toml
[unstable]
build-std = ["core"]
[build]
target = "thumbv4t-none-eabi"
[target.thumbv4t-none-eabi]
rustflags = ["-Clink-arg=-Tlinker_scripts/normal_boot.ld"]
Great, let's try it out
> cargo build --example ex1
warning: function `main` is never used
--> examples\ex1.rs:4:4
|
4 | fn main() {
| ^^^^
|
= note: `#[warn(dead_code)]` on by default
warning: `gba_from_scratch` (example "ex1") generated 1 warning
Finished dev [unoptimized + debuginfo] target(s) in 0.10s
Cool. It's a lot less to type, and we're ready to fill in our linker script.
Our linker script is called normal_boot.ld
because there's two ways for the GBA to boot up.
One of them is the "normal" style with a program running off of the game pak.
The other is "multiboot" where the GBA can download a program over the link cable.
Since we might want to do multiboot some day, we might as well give our linker script a specific name to start with.
Once things are set up we won't really have to think about it on a regular basis, so it's fine.
There's three things we'll have to concern ourselves with:
- The entry point
- The memory locations
- The sections
Picking an entry point is easy, it's just the name of a symbol.
The traditional entry point name is just _start
, so we'll go with that.
ENTRY(_start)
Having an entry point set doesn't really matter for running the program on actual GBA hardware. Still, when the entry point ends up at one of the usual address values, it helps the heuristic system mgba uses to determine if it should run our program as a normal game or a multiboot game, so it's not entirely useless.
Which brings us to the memory portion.
The GBA has three main chunks of memory: Read-Only Memory (ROM), Internal Work RAM (IWRAM), and External Work RAM (EWRAM). We can cover more of the fine differences later, for now it's enough to write them down into our linker script. For each one we have to specify the base address and the size in bytes.
MEMORY {
ewram (w!x) : ORIGIN = 0x2000000, LENGTH = 256K
iwram (w!x) : ORIGIN = 0x3000000, LENGTH = 32K
rom (rx) : ORIGIN = 0x8000000, LENGTH = 32M
}
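If it helps to see those three regions as plain numbers, here's a host-side Rust sketch that writes the same origins and lengths into a table and classifies an address. `REGIONS` and `region_of` are names made up for this example; only the origins and lengths come from the MEMORY block above.

```rust
/// The three regions from the `MEMORY` block, as (name, origin, length in bytes).
pub const REGIONS: [(&str, u32, u32); 3] = [
    ("ewram", 0x0200_0000, 256 * 1024),
    ("iwram", 0x0300_0000, 32 * 1024),
    ("rom", 0x0800_0000, 32 * 1024 * 1024),
];

/// Return the name of the region that contains `addr`, if any.
pub fn region_of(addr: u32) -> Option<&'static str> {
    REGIONS
        .iter()
        .find(|&&(_, origin, length)| addr >= origin && addr - origin < length)
        .map(|&(name, _, _)| name)
}
```

For example, `region_of(0x0800_0000)` is `Some("rom")`, while `region_of(0x0300_8000)` is `None` because it's one byte past the end of the 32K of IWRAM.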
Finally, we have to tell the linker which output section to assign all of the input sections it finds. This uses a glob-matching sort of system. We specify an output section that we want to have created, and then in the braces for it we list matchers that are checked against each input section the linker sees. When an input section fits one of the matchers, it goes with that output section.
Program code is supposed to end up in the .text
section, so we can start with just that.
SECTIONS {
.text : {
*(.text .text.*);
} >rom
}
Here we've got one matcher listed, *(.text .text.*);
.
The *
at the start means it applies to any input file.
We could limit what files it applies to, if we wanted, but generally we shouldn't.
Inside the parentheses is a space separated list of globs.
We've got two: .text
and .text.*
.
The first is for the exact match .text
, and the second is for anything that starts with .text.
.
The convention for section names is to start with a .
, and they can't have spaces.
Rust will default to having every function in its own section, all with the prefix .text.
.
Unused code can only be removed one entire input section at a time, so having every function in a distinct input section keeps our output as small as possible.
The >rom part after the braces allocates the entire output section into the rom memory that we declared before.
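The matching rule for `*(.text .text.*)` can be sketched in a few lines of host-side Rust. This is only a toy: real linker globs support more wildcard forms, while the hypothetical `glob_matches` below handles nothing but a trailing `*`. Both function names are invented for this example.

```rust
/// Does `pattern` match `name`? Only a trailing `*` wildcard is supported.
pub fn glob_matches(pattern: &str, name: &str) -> bool {
    match pattern.strip_suffix('*') {
        // "prefix*" matches any name starting with the prefix.
        Some(prefix) => name.starts_with(prefix),
        // No wildcard: the name has to match exactly.
        None => name == pattern,
    }
}

/// Mirror of `*(.text .text.*)`: true when an input section belongs in `.text`.
pub fn goes_in_text(input_section: &str) -> bool {
    [".text", ".text.*"]
        .iter()
        .any(|pattern| glob_matches(pattern, input_section))
}
```

So `.text` and `.text._start` are accepted, while something like `.textual` or `.data` is not, because the second glob requires the dot after `.text`.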
All together, we've got this:
/* normal_boot.ld */
/* THIS LINKER SCRIPT FILE IS RELEASED TO THE PUBLIC DOMAIN (SPDX: CC0-1.0) */
ENTRY(_start)
MEMORY {
ewram (w!x) : ORIGIN = 0x2000000, LENGTH = 256K
iwram (w!x) : ORIGIN = 0x3000000, LENGTH = 32K
rom (rx) : ORIGIN = 0x8000000, LENGTH = 32M
}
SECTIONS {
.text : {
*(.text._start);
*(.text .text.*);
} >rom
}
This isn't a complete and "final" linker script, but for now it's enough to let us proceed.
If we rebuild the program right now we still won't get anything in the output .text
section.
Remember that dead code warning we keep getting on our main
function?
Nothing in our program ever calls main
, and it's not public for outsiders to call, so it gets discarded during linking.
Since no code can call main
then no code can panic either, and the panic_handler
function gets removed as well.
We end up with nothing at all.
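One way to picture why everything disappears is as a reachability problem: the linker starts from a set of "root" symbols and keeps only what can be reached from them. Here's a host-side Rust sketch of that idea. It's a simplification (real linkers work on sections rather than a tidy call graph), and `reachable`, the `calls` map, and the choice of roots are all assumptions made up for this example.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Return every symbol reachable from `roots` by following the `calls` edges.
pub fn reachable<'a>(
    roots: &[&'a str],
    calls: &BTreeMap<&'a str, Vec<&'a str>>,
) -> BTreeSet<&'a str> {
    let mut kept = BTreeSet::new();
    let mut todo: Vec<&'a str> = roots.to_vec();
    while let Some(symbol) = todo.pop() {
        // `insert` returns true only the first time we see a symbol.
        if kept.insert(symbol) {
            if let Some(callees) = calls.get(symbol) {
                todo.extend(callees.iter().copied());
            }
        }
    }
    kept
}
```

If we model our program as `main` being the only thing that could lead to `panic_handler`, and there are no roots at all, the kept set comes back empty. Adding a root symbol that doesn't call `main` keeps that root and still nothing else.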
Writing A _start
We need to add some code to our program so that there will be something to output.
Might as well define the _start
function.
_start
doesn't work like a normal function.
The way the very start of the GBA's ROM works is special.
When the GBA first boots the BIOS (which is part of the GBA itself, not part of our ROM) takes control.
It plays the boot animation and sound that you're probably familiar with, then does a checksum on our ROM's header data.
If the checksum passes the BIOS jumps control to 0x0800_0000
(the start of ROM).
That's where our _start
will be.
The first instruction can be "anything" but immediately after that is the rest of the header data.
That means that in practice the very first instruction of _start
has to be a jump past the rest of the header data, since the header data isn't executable code.
Sticking non-executable data into the middle of a function isn't something that the compiler is really capable of dealing with, so we'll have to take direct control of the situation.
We could do this using either global_asm! or a #[naked] function.
One might think that we should pick the Stable option (global assembly), over the Nightly option (a naked function).
However, naked functions are basically much easier to work with.
Since using build-std
means that we have to use Nightly anyway, it's not that bad to also use naked functions as well.
If naked functions were the very last thing that required us to use Nightly we could move to global assembly instead.
At the top of ex1.rs
we need to add #![feature(naked_functions)]
.
Then we add our _start
function.
In addition to marking it as #[naked]
, we also mark it #[no_mangle]
.
We need to use #[instruction_set(arm::a32)]
as well.
This is part of that arm/thumb thing from before.
Because the BIOS jumps to the start of the ROM with the CPU in a32 mode, our function must be encoded appropriately.
Since _start has got to be specifically at the very start of the ROM we'll use #[link_section = ".text._start"] to assign our function a specific section name we can use in our linker script.
Since _start
is going to be "called" by the outside world we have to assign it the extern "C"
ABI.
Since it should never return we will mark the return type as -> !
.
So far it all looks like this:
// ex1.rs
#[naked]
#[no_mangle]
#[instruction_set(arm::a32)]
#[link_section = ".text._start"]
unsafe extern "C" fn _start() -> ! {
  todo!()
}
Inside of the _start
function, because it's a naked function, we must put an asm!
block as the only statement.
Our assembly will be very simple for now.
Let's look at it on its own.
b 1f
.space 0xE0
1:
b 1b
In the first line we branch (b
) to the label 1
that is "forward" from the instruction (1f
).
Then with .space
we put 0xE0 blank bytes.
This is called a "directive", it doesn't emit an instruction directly, instead it tells the assembler to do a special action.
We can tell it's a directive because it has a .
at the beginning.
The blank space is where the header data can go when we need to fill it in.
mgba doesn't check the header, so during development it's fine to leave the header blank.
We can always fix the header data after compilation using a special tool called gbafix
when we need to.
The 1:
is a label.
We know it's a label because it ends with :
.
Unlike with function names, a label can be just a number.
In fact, it's preferred to only use numeric labels whenever possible.
When a non-numeric label is defined more than once it causes problems (that's why function names are mangled by default, and we had to use no_mangle
).
When a numeric label is defined more than once, all instances of that label can co-exist just fine.
When you jump to a numbered label (forward or back), it just jumps to the closest instance of that number (in whichever direction).
Note that a label can have something else on the same line following the : character.
Usually a label will be on a line of its own so that it stands out a little more in the code, but that's just a code style thing.
If a label is on a line of its own, the label "points to" the next line that has a non-label thing on it.
You can also have more than one label point at the same line, if necessary.
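The "closest instance in whichever direction" rule for numeric labels can be sketched in host-side Rust like this. It's a simplified model, not how any real assembler is implemented: labels are just (number, address) pairs, and a label that points at the instruction doing the jump counts as "backward" because it was defined before that instruction. `Direction` and `resolve` are names invented for this example.

```rust
/// Which way a numeric label reference looks: `1b` is backward, `1f` is forward.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Direction {
    Backward,
    Forward,
}

/// Resolve a reference to numeric label `number` made by the instruction at
/// address `from`. `labels` holds (label number, address it points to) pairs,
/// and the same number may appear more than once.
pub fn resolve(labels: &[(u32, u32)], number: u32, from: u32, dir: Direction) -> Option<u32> {
    let candidates = labels
        .iter()
        .filter(|&&(n, _)| n == number)
        .map(|&(_, addr)| addr);
    match dir {
        // Closest definition at or before the referencing instruction.
        Direction::Backward => candidates.filter(|&a| a <= from).max(),
        // Closest definition after the referencing instruction.
        Direction::Forward => candidates.filter(|&a| a > from).min(),
    }
}
```

With a single label 1 at offset 0xE4, a `b 1f` at offset 0 resolves to 0xE4, and a `b 1b` sitting at 0xE4 resolves to 0xE4 as well, which is the branch-to-itself loop.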
Finally, our second actual instruction is that we want to branch backward to the label 1
.
Since that 1
label points at the branch itself, this instruction causes an infinite loop.
The same as if we'd written loop {}
in rust.
At the end of our assembly we have to put options(noreturn)
.
That's just part of how #[naked]
functions work.
So when we put it all together we get this:
// ex1.rs
#[naked]
#[no_mangle]
#[instruction_set(arm::a32)]
#[link_section = ".text._start"]
unsafe extern "C" fn _start() -> ! {
  core::arch::asm! {
    "b 1f",
    ".space 0xE0",
    "1:",
    "b 1b",
    options(noreturn)
  }
}
And we also want to adjust the linker script.
Since _start
is now in .text._start
, we'll put a special matcher for that to make sure it stays at the start of the ROM, no matter what order the linker sees our files in.
/* normal_boot.ld */
SECTIONS {
.text : {
*(.text._start);
*(.text .text.*);
} >rom
}
And after all of this, we can build our example and see that something shows up in the .text
section of the executable.
> cargo build --example ex1 && arm-none-eabi-objdump target/thumbv4t-none-eabi/debug/examples/ex1 --section-headers
Compiling core v0.0.0 (C:\Users\Daniel\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core)
Compiling rustc-std-workspace-core v1.99.0 (C:\Users\Daniel\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\rustc-std-workspace-core)
Compiling compiler_builtins v0.1.89
Compiling gba_from_scratch v0.1.0 (D:\dev\gba-from-scratch)
Finished dev [unoptimized + debuginfo] target(s) in 9.98s
target/thumbv4t-none-eabi/debug/examples/ex1: file format elf32-littlearm
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 000000e6 08000000 08000000 00010000 2**1
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .ARM.exidx 00000010 080000e8 080000e8 000100e8 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .debug_abbrev 0000010a 00000000 00000000 000100f8 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
3 .debug_info 000005b7 00000000 00000000 00010202 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
4 .debug_aranges 00000028 00000000 00000000 000107b9 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
5 .debug_ranges 00000018 00000000 00000000 000107e1 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
6 .debug_str 0000049c 00000000 00000000 000107f9 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
7 .debug_pubnames 000000cb 00000000 00000000 00010c95 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
8 .debug_pubtypes 00000364 00000000 00000000 00010d60 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
9 .ARM.attributes 00000030 00000000 00000000 000110c4 2**0
CONTENTS, READONLY
10 .debug_frame 00000038 00000000 00000000 000110f4 2**2
CONTENTS, READONLY, DEBUGGING, OCTETS
11 .debug_line 00000056 00000000 00000000 0001112c 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
12 .comment 00000013 00000000 00000000 00011182 2**0
CONTENTS, READONLY
I think we're ready to test the program.
Obviously we just use cargo run
and...
> cargo run --example ex1
Finished dev [unoptimized + debuginfo] target(s) in 0.08s
Running `target\thumbv4t-none-eabi\debug\examples\ex1`
error: could not execute process `target\thumbv4t-none-eabi\debug\examples\ex1` (never executed)
Caused by:
%1 is not a valid Win32 application. (os error 193)
Ah, right, Windows doesn't know how to run GBA programs, of course.
Instead, let's adjust the .cargo/config.toml
to set a "runner" value in our target configuration.
When we have a runner set, cargo run
will call the runner program and pass the program we picked as the first argument.
# .cargo/config.toml
[target.thumbv4t-none-eabi]
rustflags = ["-Clink-arg=-Tlinker_scripts/normal_boot.ld"]
runner = "mgba-qt" #remove the -qt part if you're on Windows!
And so we try again
> cargo run --example ex1
Finished dev [unoptimized + debuginfo] target(s) in 0.08s
Running `mgba-qt target\thumbv4t-none-eabi\debug\examples\ex1`
If everything is right so far, mGBA should launch and show a white screen. Congrats, it didn't crash.
Checking With objdump
If we want to double check that our code is showing up in the executable properly we can even use objdump
to check that.
If we pass --disassemble
we can get a printout of the assembly.
There's a bunch of other options for how to configure that output too, so check the --help
output to see what you can do.
I like to use --demangle --architecture=armv4t --no-show-raw-insn -Mreg-names-std
, and you get output like this:
> arm-none-eabi-objdump target/thumbv4t-none-eabi/debug/examples/ex1 --disassemble --demangle --architecture=armv4t --no-show-raw-insn -Mreg-names-std
target/thumbv4t-none-eabi/debug/examples/ex1: file format elf32-littlearm
Disassembly of section .text:
08000000 <_start>:
8000000: b 80000e4 <_start+0xe4>
...
80000e4: b 80000e4 <_start+0xe4>
80000e8: udf #65006 ; 0xfdee
Disassembly is a tricky thing sometimes.
It's not always clear to the disassembler what is code and what's data.
Or when it should decode a32
code (4 bytes each) or t32
code (2 bytes each).
In this case, the disassembler did notice that enough bytes in a row are all zero, and it just cuts that from the output with a ...
.
That's cool, but it doesn't always work.
Every once in a while the disassembler will interpret things wrong and a chunk of the display will be nonsense.
It's kinda just how it goes, try not to worry if you see it happen.
Also, at the end of our function we can see there's an undefined instruction.
Those will happen sometimes at the end of functions.
I'm unclear on why.
It doesn't seem to be for alignment, because going 4 bytes past 0x0800_00E8
to 0x0800_00EC
would make things less aligned.
Still, I guess it's not really a big deal when it happens.
We've got so much ROM space available that an occasional 2 or 4 bytes extra won't really break the bank.
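If you're curious how the two `b` lines in the disassembly relate to actual bytes, here's a host-side Rust sketch of the a32 unconditional branch encoding as I understand it from the ARM architecture manual: the top byte is 0xEA, and the low 24 bits hold a signed word offset measured from the instruction's own address plus 8. `encode_b` and `decode_b_target` are names made up for this example, and neither one checks range or alignment.

```rust
/// Encode an unconditional a32 `b` placed at address `from` that jumps to `to`.
pub fn encode_b(from: u32, to: u32) -> u32 {
    // In a32 state the program counter reads as "this instruction + 8".
    let offset = to.wrapping_sub(from.wrapping_add(8)) as i32;
    // The offset is stored in words (bytes / 4), truncated to 24 bits.
    let imm24 = ((offset >> 2) as u32) & 0x00FF_FFFF;
    0xEA00_0000 | imm24
}

/// Given a branch instruction word and the address it lives at, find its target.
pub fn decode_b_target(addr: u32, insn: u32) -> u32 {
    let imm24 = insn & 0x00FF_FFFF;
    // Shift up to sign-extend the 24-bit field, then down again leaving a *4 scale.
    let offset = ((imm24 << 8) as i32) >> 6;
    addr.wrapping_add(8).wrapping_add(offset as u32)
}
```

With these, the branch at 0x0800_0000 that targets 0x0800_00E4 encodes as 0xEA000037, and the branch at 0x0800_00E4 that targets itself encodes as 0xEAFFFFFE.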
Proving Our Program Is Doing Something
It's all nice and well to see a white screen, but let's finish up this section by having our program do something, anything at all, which lets us see that we're really having an effect on the GBA.
The simplest thing to do would be to make the screen turn on black instead of white. When the BIOS transfers control to our program a thing called the "forced blank" mode is active. This makes the display draw all pixels as white. If we turn off the forced blank bit we'll get a black screen instead.
All we have to do is add a few more lines of assembly to our _start
function:
// in `_start` of ex1.rs
core::arch::asm! {
  "b 1f",
  ".space 0xE0",
  "1:",
  "mov r0, #0x04000000",
  "mov r1, #0",
  "strh r1, [r0]",
  "2:",
  "b 2b",
  options(noreturn)
}
This part after the header data is what's new:
mov r0, #0x04000000
mov r1, #0
strh r1, [r0]
mov
will "move" a value into a register.
This shares the usual assignment syntax of Rust and most other programming languages:
the destination register is on the left,
and the source data to move into that register is on the right.
So you could think of it being similar to
let r0 = 0x04000000;
The #
means that the value is an "immediate" value.
It gets encoded into the instruction itself, so it doesn't have to "come from" anywhere else.
With LLVM's assembler it seems like actually putting the #
before an immediate value is optional (that is: the program will compile the same without it),
but on some assemblers putting the #
is required, so I'll be putting it in the tutorial code.
After we move values into r0
and r1
we have a strh
.
This will "store(half)" the data in the first argument to the address in the second argument.
In other words, it writes the lower 16 bits of the register to the address, as if the address was a *mut u16
.
The argument order for single loads and stores on ARM is that the address is always last, and in square brackets.
The square brackets make it fairly easy to spot when skimming through a big pile of assembly.
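To make the `*mut u16` comparison a little more concrete, here's a rough sketch of what `strh` does, written so it can run on a desktop. The `store_half` helper and the `fake_register` variable are made up for this illustration; on the GBA the destination would be a real hardware address instead.

```rust
/// Models `strh reg, [addr]`: only the low 16 bits of the 32-bit
/// register value end up at the destination.
/// (`store_half` is a hypothetical helper, not part of any library.)
fn store_half(dest: &mut u16, reg: u32) {
    // The truncating cast keeps just the lower half of the register.
    let low_half = reg as u16;
    // SAFETY: `dest` is a valid, aligned, exclusive reference.
    unsafe { (dest as *mut u16).write_volatile(low_half) }
}

fn main() {
    // A plain local variable stands in for a 16-bit hardware register.
    let mut fake_register: u16 = 0xFFFF;
    store_half(&mut fake_register, 0x1234_0000);
    // The upper 16 bits of the "register" were never written anywhere.
    println!("{:#06x}", fake_register);
}
```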
After doing that strh
we have an "empty loop" like we had before, but just using the label 2
instead of 1
this time.
And if we turn on the program...
cargo run --example ex1
Instead of a totally white screen, we'll see a totally black screen. We've had some effect on the GBA.
Which is enough to call this article over. In the next article we'll actually learn more details about what we just did, as well as more details about how else we can affect the screen.
This is the exact state of the repo when I finished this article.
User Input
So far we can build a program that shows a white screen, or a program that shows a black screen. As fascinating as this is, we can't even make the program switch from white to black while it's running. That will be our goal for this part.
For this article we'll be mostly working on a new example: ex2.rs
Memory Mapped Input/Output (MMIO)
At the end of the last article I told you to put a mysterious bit of assembly into the program
mov r0, #0x04000000
mov r1, #0
strh r1, [r0]
This "resets the forced blank bit", and that lets the display show the normal picture instead of all white. At the moment our normal picture is all black, but soon it will be something else.
What's happening is called Memory Mapped Input/Output, or Memory Mapped IO, or even just MMIO.
The CPU only knows how to do math and access memory. What "accessing memory" actually means is that a signal goes along a "bus". The signal can be pushed out to other hardware ("store"), or be pulled in from the other hardware ("load"). When the signal's address points to a memory device, that's how we store data for later. There are also other types of devices, things that don't just store data. When the signal goes there, "other stuff" happens.
The address 0x04000000
connects to a part of the display system called the Display Control.
When we set the display control's bits to 0_u16
with our strh
instruction, that includes the forced blank bit.
There's other bits too, which we'll get to soon.
All of the GBA's hardware is controlled via MMIO, so most of this series will involve explaining MMIO address values and the correct way to set the bits at each address.
Note that an MMIO address is not like normal memory:
- Sometimes an address will be read-only (writes are totally ignored) or write-only (reading produces garbage data).
- Sometimes an address will allow both reads and writes, but what you read back will be something different from what you last wrote.
- This is not the case with any of the GBA's MMIO, but on some other devices (eg: the NES) reading an MMIO location can be "destructive", changing the value just by reading it.
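If it helps to picture the difference, here's a toy model of a bus where one address goes to a read-only device register and the rest goes to plain memory. Everything in it (the `ToyBus` type, the tiny address space, the `BUTTONS_ADDR` constant) is invented for this sketch and doesn't match the GBA's real memory map.

```rust
/// A toy bus: one address is a read-only device register,
/// every other valid address is ordinary RAM.
/// All names and addresses here are hypothetical.
struct ToyBus {
    ram: [u16; 4],
    buttons: u16, // read-only from the CPU's point of view
}

const BUTTONS_ADDR: usize = 4;

impl ToyBus {
    fn load(&self, addr: usize) -> u16 {
        if addr == BUTTONS_ADDR {
            self.buttons
        } else {
            self.ram[addr]
        }
    }

    fn store(&mut self, addr: usize, value: u16) {
        if addr != BUTTONS_ADDR {
            self.ram[addr] = value;
        }
        // A store to the read-only register is silently ignored.
    }
}

fn main() {
    let mut bus = ToyBus { ram: [0; 4], buttons: 0x03FF };
    bus.store(0, 7); // normal memory remembers the value
    bus.store(BUTTONS_ADDR, 0); // the device register does not
    println!("ram[0] = {}, buttons = {:#06x}", bus.load(0), bus.load(BUTTONS_ADDR));
}
```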
Volatile Memory Access
Normally the compiler will try to keep memory accesses to a minimum. If it sees you read an address twice without a write in between, it'll (usually) only do the first read. If you write to an address twice without a read in between it'll (usually) skip the first write. I say "usually" because it depends on optimization level and such. It's simple stuff, but it makes programs fast, and we want our programs fast.
However, when working with MMIO every single memory access has to happen exactly as we write it in our program. If we're (for example) reading the address for the button data then of course we'd read it over and over without ever doing a write. But we still need every single read to actually happen so we can get the newest button data.
To tell the compiler this, we use a "volatile" load or store instead of a normal load or store.
This is done with the read_volatile and write_volatile pointer methods.
But those are unsafe
methods because the compiler naturally doesn't know if, for any given pointer, it's safe to just read or write some data.
Pointers can come from anywhere, they might be dangling, etc etc, all the normal problems with raw pointers.
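As a small sketch of what the raw pointer methods look like in use, here's a version that runs on a desktop by pointing at an ordinary variable instead of real hardware. The helper names `poll_twice` and `write_twice` are made up for this example.

```rust
/// Reads the same location twice. Because the reads are volatile,
/// the compiler must perform both loads instead of reusing the first.
fn poll_twice(reg: &u16) -> (u16, u16) {
    let p: *const u16 = reg;
    // SAFETY: `p` comes from a valid reference.
    unsafe { (p.read_volatile(), p.read_volatile()) }
}

/// Writes twice in a row. Neither volatile store may be skipped,
/// even though the second one overwrites the first.
fn write_twice(reg: &mut u16, first: u16, second: u16) {
    let p: *mut u16 = reg;
    // SAFETY: `p` comes from a valid, exclusive reference.
    unsafe {
        p.write_volatile(first);
        p.write_volatile(second);
    }
}

fn main() {
    // A local variable stands in for a hardware register.
    let mut fake_register: u16 = 0;
    write_twice(&mut fake_register, 1, 2);
    let (a, b) = poll_twice(&fake_register);
    println!("{a} {b}");
}
```

On a desktop the end result looks the same as normal reads and writes. The difference only matters when something other than our program (like the GBA's hardware) is on the other side of the address.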
Instead, we'll use the voladdress crate.
It's got some alternatives to just raw pointers that ease the volatile usage quite a bit.
I made it specifically to power the gba
crate's MMIO, so we can be fairly confident that it'll be useful for writing GBA programs.
> cargo add voladdress
Updating crates.io index
Adding voladdress v1.3.0 to dependencies.
Now in our lib.rs
we can declare DISPCNT
.
That's the short name that GBATEK (the main GBA homebrew manual) and mGBA use for the display control.
In Rust terms it's a VolAddress for a u16
value.
It's safe to read or write, and it's located at 0x0400_0000
like we saw before.
// in lib.rs
use voladdress::{Safe, VolAddress};

pub const DISPCNT: VolAddress<u16, Safe, Safe> =
  unsafe { VolAddress::new(0x0400_0000) };
Now we can adjust the display control within Rust. Neato.
Moving _start
Into The Library
When we made ex1.rs
we put the _start
function directly into the example file.
That's not a great long term plan.
We want to have a _start
function that just does the correct startup "in the background", automatically.
We don't want to be thinking about it again with each new example we make.
So first let's copy the _start
function into lib.rs
.
This will require us to put #![feature(naked_functions)]
at the top of lib.rs
.
Again, we could use global assembly instead, but I think that global assembly is just a little worse than naked functions, and we're already on Nightly.
Now all of our examples moving forward will have the _start
function (assuming they link in our library).
That's fine, except that right now _start
doesn't have a way to call any function in our executable.
We're gonna rewrite _start
to do whatever startup it needs and then we'll have it call another function.
If we pick an un-mangled name for the function that _start
calls, each executable we make will be able to define a function with that name, and the linker will weave it all together just fine.
Since it's the conventional "beginning of the program" name let's use main
.
First we update _start
:
// in lib.rs
#[naked]
#[no_mangle]
#[instruction_set(arm::a32)]
#[link_section = ".text._start"]
unsafe extern "C" fn _start() -> ! {
  core::arch::asm! {
    "b 1f",
    ".space 0xE0",
    "1:",
    "ldr r12, =main",
    "bx r12",
    options(noreturn)
  }
}
Our new assembly is this part:
ldr r12, =main
bx r12
The first line, ldr <reg>, =symbol
, is a special "pseudo instruction".
It looks like an instruction, but what the assembler actually outputs is a slight variation.
What will happen is that the assembler will insert a load operation for an address relative to this instruction, and then at that relative address the assembler will also insert the address of main
itself.
This way we don't have to know where main
is.
In fact we don't even have to have defined main
at all.
That's good, because our library won't define main
anyway.
As long as the final executable defines main
somewhere the linker will patch it all together.
The second line bx <reg>
is a "branch-exchange".
This is a special kind of branch that we have to use with ARMv4T when we want to switch between ARM code (a32) and Thumb code (t32).
It switches to the correct code mode as part of jumping the program's execution to the new address.
The _start
function must be written in a32 code, but most of the rest of the program, including main
, could be written in either code type.
Since main
might be a different code type from _start
we use bx
instead of the basic b
instruction we've been using previously.
(note: there's a third type of branch on the GBA called bl
, which we'll see eventually).
While the b
instruction jumps to a label, bx
jumps to a register.
That's why we have to load main
into r12
before we can use bx
.
I picked r12
in this case just because the convention is that it's a "scratch" register.
With the C ABI the caller will never pass data through r12
, and functions are allowed to modify r12
without restoring the value before they return.
That's all that _start
has to do for now.
Later it will have some setup work to do before calling main
, but not yet.
This Is An Incomplete Start Function
NOTE: This _start
function is "incomplete" in the sense that it doesn't initialize RAM.
This means that you can't use any static mutable data with non-zero initial values.
We're not doing that right now, so that's not a problem for us right now, and we'll get to that eventually.
But it is a non-obvious limitation worth mentioning.
Adding main
To ex2.rs
Now in ex2.rs
we need to have a main
function that's no_mangle
, extern "C"
, and that doesn't ever return.
To begin, we'll make the actual body of main
just do what we were doing before.
First write 0 to DISPCNT
, and then do a loop
forever.
// ex2.rs
#![no_std]
#![no_main]

use gba_from_scratch::DISPCNT;

#[no_mangle]
pub extern "C" fn main() -> ! {
  DISPCNT.write(0);
  loop {}
}

#[panic_handler]
fn panic_handler(_: &core::panic::PanicInfo) -> ! {
  loop {}
}
And if we run this in mGBA with cargo run --example ex2
we see... actually we see a mostly black screen but with a white line on it.
That's... not what we expected?
That's not either of the types of screen that we got before.
Here's where things get kinda weird.
If we run our program in --release
mode we don't see the line.
Let's look at the output of the compiler again with objdump
.
In fact, now that we've got more than one example let's have a script to store that "use objdump" stuff.
I'm gonna make a dump.bat
, but you can make dump.sh
if you're on Mac or Linux.
It's just a few plain commands, no special scripting.
cargo build --examples
arm-none-eabi-objdump target/thumbv4t-none-eabi/debug/examples/ex1 --section-headers --disassemble --demangle --architecture=armv4t --no-show-raw-insn -Mreg-names-std >target/ex1.txt
arm-none-eabi-objdump target/thumbv4t-none-eabi/debug/examples/ex2 --section-headers --disassemble --demangle --architecture=armv4t --no-show-raw-insn -Mreg-names-std >target/ex2.txt
Okay, and the target/ex1.txt
file has about what we expect in it. A bunch of sections like we saw before and then.
Disassembly of section .text:
08000000 <_start>:
8000000: b 80000e4 <_start+0xe4>
...
80000e4: mov r0, #67108864 ; 0x4000000
80000e8: mov r1, #0
80000ec: strh r1, [r0]
80000f0: b 80000f0 <_start+0xf0>
80000f4: udf #65006 ; 0xfdee
Yep, just what we expected.
Let's see what's in target/ex2.txt
, same basic thing, right?
Ah, wait, well there's 29 sections instead of 12.
That's probably fine, more debug info or something, probably?
Won't affect our code, I'm sure.
Disassembly of section .text:
08000000 <_start>:
8000000: b 80000e4 <_start+0xe4>
...
80000e4: ldr r12, [pc, #4] ; 80000f0 <_start+0xf0>
80000e8: bx r12
80000ec: udf #65006 ; 0xfdee
80000f0: .word 0x08000115
Sure, what we expected...
080000f4 <voladdress::voladdress_::VolAddress<T,R,voladdress::Safe>::write>:
80000f4: push {r7, lr}
80000f6: sub sp, #16
80000f8: str r1, [sp, #4]
80000fa: str r0, [sp, #8]
80000fc: add r2, sp, #12
80000fe: strh r1, [r2, #0]
8000100: bl 8000178 <core::num::nonzero::NonZeroUsize::get>
8000104: ldr r1, [sp, #4]
8000106: bl 800012c <core::ptr::write_volatile>
800010a: add sp, #16
800010c: pop {r7}
800010e: pop {r0}
8000110: mov lr, r0
8000112: bx lr
Oops.. that's... not a good way to write to a pointer.
08000114 <main>:
8000114: movs r0, #1
8000116: lsls r0, r0, #26
8000118: movs r1, #0
800011a: bl 80000f4 <voladdress::voladdress_::VolAddress<T,R,voladdress::Safe>::write>
800011e: b.n 8000120 <main+0xc>
8000120: b.n 8000120 <main+0xc>
Oh?
08000122 <rust_begin_unwind>:
8000122: sub sp, #4
8000124: str r0, [sp, #0]
8000126: b.n 8000128 <rust_begin_unwind+0x6>
8000128: b.n 8000128 <rust_begin_unwind+0x6>
800012a: bmi.n 80000d6 <_start+0xd6>
Okay that one seems okay, I think?
0800012c <core::ptr::write_volatile>:
800012c: push {r7, lr}
800012e: sub sp, #24
8000130: str r0, [sp, #0]
8000132: movs r2, r1
8000134: str r2, [sp, #4]
8000136: str r0, [sp, #12]
8000138: add r0, sp, #16
800013a: strh r1, [r0, #0]
800013c: movs r0, #1
800013e: cmp r0, #0
8000140: bne.n 8000154 <core::ptr::write_volatile+0x28>
8000142: b.n 8000144 <core::ptr::write_volatile+0x18>
8000144: ldr r0, [sp, #4]
8000146: ldr r1, [sp, #0]
8000148: strh r0, [r1, #0]
800014a: add sp, #24
800014c: pop {r7}
800014e: pop {r0}
8000150: mov lr, r0
8000152: bx lr
8000154: ldr r0, [sp, #0]
8000156: str r0, [sp, #8]
8000158: ldr r0, [sp, #8]
800015a: str r0, [sp, #20]
800015c: bl 8000180 <core::intrinsics::is_aligned_and_not_null>
8000160: cmp r0, #0
8000162: bne.n 8000170 <core::ptr::write_volatile+0x44>
8000164: b.n 8000166 <core::ptr::write_volatile+0x3a>
8000166: ldr r0, [pc, #12] ; (8000174 <core::ptr::write_volatile+0x48>)
8000168: movs r1, #111 ; 0x6f
800016a: bl 80002fc <core::panicking::panic_nounwind>
800016e: udf #254 ; 0xfe
8000170: b.n 8000144 <core::ptr::write_volatile+0x18>
8000172: nop ; (mov r8, r8)
8000174: .word 0x08000440
Oh... uh... oh no. And there's more. It goes on and on, but I think you get the joke at this point.
Yeah, rustc
outputs utter garbage code without optimizations enabled.
Just, atrocious.
The only reason it's usable at all on your desktop is because modern computers are so fast.
Our best bet is to just turn on full optimizations for the debug profile.
This can be done in Cargo.toml
.
In a new profile.dev
section we set the opt-level
to 3.
[profile.dev]
opt-level = 3
And rebuild / redump the program:
Disassembly of section .text:
08000000 <_start>:
8000000: b 80000e4 <_start+0xe4>
...
80000e4: ldr r12, [pc, #4] ; 80000f0 <_start+0xf0>
80000e8: bx r12
80000ec: udf #65006 ; 0xfdee
80000f0: .word 0x080000f5
080000f4 <main>:
80000f4: movs r0, #1
80000f6: lsls r0, r0, #26
80000f8: movs r1, #0
80000fa: strh r1, [r0, #0]
80000fc: b.n 80000fc <main+0x8>
That's it, that's our whole program once optimizations have been applied. Now we don't get the white line.
Why did we get it before?
I don't know exactly.
Our ex2
program "stops" on an infinite loop that's just as fast as the ex1
version, even if it takes longer to get there.
I'd have thought that it wouldn't make a difference, but somehow it does.
Emulators are weird like that sometimes.
Oh, and speaking of weird stuff, while we're adjusting build configuration things, I found out about that undefined instruction thing.
Our good friend Scott wrote in (so to speak) and suggested trying -Ztrap-unreachable=no
in RUSTFLAGS.
So we just add it in the .cargo/config.toml
:
[target.thumbv4t-none-eabi]
rustflags = ["-Ztrap-unreachable=no", "-Clink-arg=-Tlinker_scripts/normal_boot.ld"]
runner = "mgba-qt"
and rebuild / redump again...
Disassembly of section .text:
08000000 <_start>:
8000000: b 80000e4 <_start+0xe4>
...
80000e4: ldr r12, [pc] ; 80000ec <_start+0xec>
80000e8: bx r12
80000ec: .word 0x080000f1
080000f0 <main>:
80000f0: movs r0, #1
80000f2: lsls r0, r0, #26
80000f4: movs r1, #0
80000f6: strh r1, [r0, #0]
80000f8: b.n 80000f8 <main+0x8>
The undefined instruction is gone! Magical! I guess the explanation is that LLVM is trying to add in a "guard" against code accidentally flowing past the end of the function. When the CPU is made to execute an undefined instruction it causes a special kind of "interrupt" to happen. We'll mostly talk about interrupts later, but for now let's just say that what LLVM is expecting is that the Operating System will handle the interrupt by killing the program (the undefined instruction "traps" your program). We don't really have an OS on the GBA, we are the OS you might say. Regardless of what you call it, that undefined instruction won't "trap" like LLVM thinks it will. The undefined interrupt handler in the BIOS just returns and the device just keeps executing. So that undefined instruction is purely a waste of space to us.
We might as well leave -Ztrap-unreachable=no
set in our configuration.
The -Z
part means that it's a Nightly flag, but we're on Nightly for other stuff already so it's fine.
If we have to be on Nightly for build-std
, we might as well take advantage of the other extra flags we can.
More Assembly Details
Let's quickly take another close look at our two functions so far.
First is _start
08000000 <_start>:
8000000: b 80000e4 <_start+0xe4>
...
80000e4: ldr r12, [pc] ; 80000ec <_start+0xec>
80000e8: bx r12
80000ec: .word 0x080000f1
So the ldr r12, =main
has become ldr r12, [pc]
.
The pc
register is the "program counter".
That's storing the next address for the CPU to read and start doing an instruction.
The ARM7TDMI has a 3 stage CPU pipeline: Fetch, Decode, Execute.
The pc
register will always be pointing two instructions ahead of what instruction is actually executing.
So by the time we're executing the ldr
, the pc
register is two instructions ahead on .word 0x080000f1
.
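The "two instructions ahead" rule is easy to check against the disassembly with a bit of arithmetic. This is just a sketch of the address math for ARM state (where each instruction is 4 bytes), and the function name is made up for the example.

```rust
/// Address read by an ARM-state `ldr rX, [pc, #offset]`.
/// In ARM state `pc` reads as the instruction's own address plus 8,
/// which is two 4-byte instructions ahead.
fn arm_pc_relative_target(instruction_addr: u32, offset: u32) -> u32 {
    instruction_addr + 8 + offset
}

fn main() {
    // `ldr r12, [pc]` at 0x080000e4, from the latest dump.
    println!("{:#010x}", arm_pc_relative_target(0x0800_00e4, 0));
    // `ldr r12, [pc, #4]` at 0x080000e4, from the dump that still
    // had the undefined instruction sitting before the `.word`.
    println!("{:#010x}", arm_pc_relative_target(0x0800_00e4, 4));
}
```

Both results line up with the address that objdump printed in the comment next to each `ldr`.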
The .word
directive inserts a literal 4 byte value, in this case 0x080000f1
.
That's the address of main
, +1.
The +1 part makes the address odd, which is how bx
will know what code state to switch to.
So after we load an address into r12
, we use bx
to branch-exchange to that address.
The "exchange" part is because there's a register called the "current program status register".
This register holds several bit flags about the program's current status.
Importantly it has a T flag, which says if the CPU is running in thumb state or not.
A branch-exchange will "exchange" the lowest bit in the register holding the target address with the current value of the T flag.
- If the target address is odd then the T flag becomes set (the program will run as thumb code).
- If the target address is even then the T flag becomes cleared (the program will run as arm code).
And I know it's an "exchange", but the previous T value basically goes nowhere. They just call it an exchange to give it a fancy name, I guess.
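Here's the same idea as a tiny model in Rust. It's only a sketch of the part of `bx` described above (the function name and the tuple return are made up for the example), and it ignores everything else the CPU does during a branch.

```rust
/// Models `bx`: returns the address execution continues at,
/// and the new value of the T flag (true means thumb state).
fn branch_exchange(target: u32) -> (u32, bool) {
    // The lowest bit of the target picks the code state.
    let thumb = (target & 1) == 1;
    // That bit is not part of the actual address, so it's cleared.
    (target & !1, thumb)
}

fn main() {
    // The `.word 0x080000f1` that `_start` loads into r12.
    let (addr, thumb) = branch_exchange(0x0800_00f1);
    println!("jump to {:#010x}, thumb = {}", addr, thumb);
}
```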
I hope that wasn't too much. If it was, don't worry. It's not essential to understand the full details right away if you want to just keep going.
Let's look over at main
.
080000f0 <main>:
80000f0: movs r0, #1
80000f2: lsls r0, r0, #26
80000f4: movs r1, #0
80000f6: strh r1, [r0, #0]
80000f8: b.n 80000f8 <main+0x8>
Ah, here's something interesting.
Instead of mov
we're doing movs
, and instead of lsl
(logical shift left) we're doing lsls
.
When an instruction ends with s
then it "sets the status flags".
- _start is ARM code, and most all ARM instructions can choose to set the flags or not. We haven't set the status flags in our small amount of code so far.
- main is thumb code, and most all thumb instructions are forced to set the flags. This constrains how much you can reorder your instructions. Each operation that sets flags you care about has to come just before whatever the thing using those flags is.
But let's notice something else.
We can see the addresses of each instruction, and in _start
we can see each instruction is 4 bytes.
With main
we can see that each instruction is just 2 bytes.
This is the advantage of using thumb code.
The program is significantly smaller.
In fact, the CPU has to access ROM over a 16-bit bus. This means that if the CPU needs a 32-bit value (such as an ARM instruction) then it needs to do two reads in a row to get the complete data. This means that when running programs out of ROM they actually run slower if they're ARM code. The CPU will have to wait over and over as it tries to get each complete instruction just half an instruction at a time. This is why we're having the default encoding for our program be thumb code.
Also, the way to get 0x0400_0000
into a register has changed:
- With ARM code we can mov the value directly as one instruction.
- With thumb code we have to movs a 1 and then as a separate step we left shift it by 26 bits to get the right value.
What's happening is that the ARM mov
instruction doesn't encode the full literal 0x0400_0000
within the instruction.
It's only a 32 bit instruction, so it can't store a 32 bit value and also the bits to declare mov
.
Instead, it stores mov
and a compressed form of the data: 1<<26
.
But thumb code is only 16 bits, so it can't even store that much.
Since each thumb instruction is only 2 bytes instead of 4, there's less bits to fit immediate values and do instruction variations and such.
This means that in a lot of cases an operation that's one ARM instruction will be more than one thumb instruction.
Because of this, thumb code vs ARM code is not as simple as "your program is half as big".
You get a significant savings on average, but the exact ratio depends on the program.
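To put some detail on the "compressed form" that ARM code uses: the immediate in an ARM data instruction is an 8-bit value plus a 4-bit rotation field, and the CPU rotates the 8-bit value right by twice the rotation field. Here's a rough sketch of checking whether a value fits that scheme. The function name and return shape are made up for the example, and it just returns the first encoding it finds.

```rust
/// Tries to express `value` as an 8-bit number rotated right by an
/// even amount. Returns `(imm8, rotate_field)`, where the actual
/// rotation applied by the CPU is `2 * rotate_field`.
fn encode_arm_immediate(value: u32) -> Option<(u8, u8)> {
    for rot in 0..16u32 {
        // Undo a right-rotation of `2 * rot` by rotating left.
        let candidate = value.rotate_left(2 * rot);
        if candidate <= 0xFF {
            return Some((candidate as u8, rot as u8));
        }
    }
    None
}

fn main() {
    // DISPCNT's address fits: 1 rotated right by 6 is 1 << 26.
    println!("{:?}", encode_arm_immediate(0x0400_0000));
    // A value with bits spread all over does not fit.
    println!("{:?}", encode_arm_immediate(0x1234_5678));
    // The two-step thumb version: `movs r0, #1` then `lsls r0, r0, #26`.
    let r0 = 1u32 << 26;
    println!("{:#010x}", r0);
}
```

When a value doesn't fit, the assembler or compiler has to get it into the register some other way, such as the pc-relative load we saw in _start.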
We can see the disassembler is showing our strh
as strh r1, [r0, #0]
.
This is saying "r0
plus 0".
Actually any store or load can be "plus some immediate value", but when the modifier is plus 0 we don't need to write it.
In this case, the disassembler is just being a little silly in how it prints things.
Also, when we see b.n 80000f8
, this b.n
means "branch instruction with narrow encoding".
The explanation here is that in later versions of ARM there was a "thumb 2" introduced.
In thumb 2, some instructions will be encoded as one opcode (each of which is two bytes), but then other uses will be two opcodes.
The .n
is the "narrow" encoding, meaning it's the one opcode version.
On the GBA we don't use thumb 2 at all, but since the objdump
program is designed to work with all versions of ARM it just prints this way.
The Backdrop Color
The "backdrop" color is the color that's shown in a pixel if no background layer or object is visible in that pixel. Right now when we turn off forced blank we see a black screen because the backdrop color is black. If we were to change the backdrop color we'd see the whole screen filled with some other color.
First let's declare an MMIO for the backdrop color.
// in lib.rs
pub const BACKDROP: VolAddress<u16, Safe, Safe> =
  unsafe { VolAddress::new(0x0500_0000) };
Now let's update ex2.rs
so that we set the backdrop color before we turn off forced blank.
// in ex2.rs
use gba_from_scratch::{BACKDROP, DISPCNT};

#[no_mangle]
pub extern "C" fn main() -> ! {
  BACKDROP.write(0b11111);
  DISPCNT.write(0);
  loop {}
}
Let's look at that assembly:
080000f0 <main>:
80000f0: movs r0, #5
80000f2: lsls r0, r0, #24
80000f4: movs r1, #31
80000f6: strh r1, [r0, #0]
80000f8: movs r0, #1
80000fa: lsls r0, r0, #26
80000fc: movs r1, #0
80000fe: strh r1, [r0, #0]
8000100: b.n 8000100 <main+0x10>
So first it gets the BACKDROP
address in a register (5<<24), then the color value (31), then writes that, and the rest of the program is like we've seen before.
Makes sense.
We could also change the backdrop color after turning off forced blank if we wanted to. However, by default it's best practice to only adjust the display when forced blank is on or when you know it's the vertical blank period. Otherwise you can get accidental display artifacts on the screen.
If we run the program now we'll see a red screen.
The magic looking 0b11111
value is because the GBA has 5-bit per channel color.
A GBA color value is a u16
with the channels going from low to high:
0bX_BBBBB_GGGGG_RRRRR
So 0b11111
is "full red, no green or blue".
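If you want to poke at the bit layout on your desktop, here's a small sketch that packs and unpacks the three channels. The function names are made up for the example, and the `& 0x1F` masking is an extra precaution I've added so that an out-of-range channel can't spill into its neighbor.

```rust
/// Packs 5-bit channels into a GBA color: 0bX_BBBBB_GGGGG_RRRRR.
fn pack_color(r: u16, g: u16, b: u16) -> u16 {
    (r & 0x1F) | ((g & 0x1F) << 5) | ((b & 0x1F) << 10)
}

/// Splits a GBA color back into (red, green, blue).
fn unpack_color(color: u16) -> (u16, u16, u16) {
    (color & 0x1F, (color >> 5) & 0x1F, (color >> 10) & 0x1F)
}

fn main() {
    let red = pack_color(31, 0, 0);
    println!("red = {:#07b}", red);
    println!("channels of 0x7FFF = {:?}", unpack_color(0x7FFF));
}
```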
Using a raw u16
isn't that great.
We'd probably like to have a little bit more meaning to the type so that it's clearer what's going on.
We can put names on our functions and magic values, things like that.
If we replace the u16
in BACKDROP
with a repr(transparent)
wrapper type over an actual u16
then things will be a lot better.
This is called using a "newtype", and we'll be doing it a lot.
// in lib.rs
pub const BACKDROP: VolAddress<Color, Safe, Safe> =
  unsafe { VolAddress::new(0x0500_0000) };

#[derive(Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
pub struct Color(pub u16);
impl Color {
  pub const RED: Self = Self::rgb(31, 0, 0);

  #[inline]
  #[must_use]
  pub const fn rgb(r: u16, g: u16, b: u16) -> Self {
    Self(r | (g << 5) | (b << 10))
  }
}
Then we change ex2.rs
to use our new Color type.
use gba_from_scratch::{Color, BACKDROP, DISPCNT};

#[no_mangle]
pub extern "C" fn main() -> ! {
  BACKDROP.write(Color::RED);
  DISPCNT.write(0);
  loop {}
}
Practically self-documenting code at this point!
If we run the program again we can see a red screen too. Let's double check our assembly to make sure we didn't kill performance somehow.
080000f0 <main>:
80000f0: movs r0, #5
80000f2: lsls r0, r0, #24
80000f4: movs r1, #31
80000f6: strh r1, [r0, #0]
80000f8: movs r0, #1
80000fa: lsls r0, r0, #26
80000fc: movs r1, #0
80000fe: strh r1, [r0, #0]
8000100: b.n 8000100 <main+0x10>
Hey it's the exact same as before. We've got a zero-runtime-cost abstraction, the promise of Rust is real!
Reading The Buttons
Fun as it is to have a single static color, that's still not very exciting.
We can read the current state of the keys from the KEYINPUT
control.
This includes both the "buttons" as well as the direction-pad value.
// in lib.rs
pub const KEYINPUT: VolAddress<u16, Safe, ()> =
  unsafe { VolAddress::new(0x400_0130) };
Note that instead of Safe
as the write type we've put ()
instead.
The key data is naturally read-only.
The CPU can't just tell the GBA to make a button be pressed or not, that's not gonna move the buttons.
With this new MMIO we can read the keys and then show a color based on the value:
// in ex2.rs
#[no_mangle]
pub extern "C" fn main() -> ! {
  DISPCNT.write(0);

  loop {
    let k = KEYINPUT.read();
    BACKDROP.write(Color(k));
  }
}
Now if we run the program and press different keys we'll see the color change.
Each bit of KEYINPUT
that's connected to a key will be 0 when the key is pressed and 1 when the key is released.
It's known as a "low-active" control scheme, because when a key is pressed it goes from high (1) to low (0).
Bits not connected to any key will always just be 0.
Which key controls which bit is as follows:
Bit | Key |
---|---|
0 | A |
1 | B |
2 | Select |
3 | Start |
4 | Right |
5 | Left |
6 | Up |
7 | Down |
8 | R |
9 | L |
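Since the "0 means pressed" rule is easy to get backwards, here's a little desktop sketch that turns a raw value into a list of held keys using the table above. The `pressed_keys` helper and the `KEY_NAMES` array are made up for this example.

```rust
/// Key names in bit order, matching the table: bit 0 is A, bit 9 is L.
static KEY_NAMES: [&str; 10] =
    ["A", "B", "Select", "Start", "Right", "Left", "Up", "Down", "R", "L"];

/// Lists the keys that are held down in a raw KEYINPUT value.
/// A bit that reads as 0 means "pressed" (low-active).
fn pressed_keys(raw: u16) -> Vec<&'static str> {
    KEY_NAMES
        .iter()
        .enumerate()
        .filter(|(bit, _)| (raw & (1u16 << *bit)) == 0)
        .map(|(_, name)| *name)
        .collect()
}

fn main() {
    // All ten key bits high: nothing is pressed.
    println!("{:?}", pressed_keys(0x03FF));
    // Bit 0 low: A is pressed.
    println!("{:?}", pressed_keys(0x03FE));
}
```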
Like with the color data, we probably want to make a newtype for all this.
// in lib.rs
pub const KEYINPUT: VolAddress<KeyInput, Safe, ()> =
  unsafe { VolAddress::new(0x400_0130) };

#[derive(Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
pub struct KeyInput(pub u16);
#[rustfmt::skip]
impl KeyInput {
  #[inline] pub const fn a(self) -> bool { (self.0 & (1<<0)) == 0 }
  #[inline] pub const fn b(self) -> bool { (self.0 & (1<<1)) == 0 }
  #[inline] pub const fn select(self) -> bool { (self.0 & (1<<2)) == 0 }
  #[inline] pub const fn start(self) -> bool { (self.0 & (1<<3)) == 0 }
  #[inline] pub const fn right(self) -> bool { (self.0 & (1<<4)) == 0 }
  #[inline] pub const fn left(self) -> bool { (self.0 & (1<<5)) == 0 }
  #[inline] pub const fn up(self) -> bool { (self.0 & (1<<6)) == 0 }
  #[inline] pub const fn down(self) -> bool { (self.0 & (1<<7)) == 0 }
  #[inline] pub const fn r(self) -> bool { (self.0 & (1<<8)) == 0 }
  #[inline] pub const fn l(self) -> bool { (self.0 & (1<<9)) == 0 }
}
(This is kinda begging for a macro_rules!
, but it's basically fine to put that off until later.)
Also let's add a definition for GREEN
on our color type.
impl Color {
  pub const RED: Self = Self::rgb(31, 0, 0);
  pub const GREEN: Self = Self::rgb(0, 31, 0);
  // ...
}
Now we can read the keys, and set the color to red or green based on if a key is pressed or not:
#[no_mangle]
pub extern "C" fn main() -> ! {
  DISPCNT.write(0);

  loop {
    let k = KEYINPUT.read();
    BACKDROP.write(if k.a() { Color::RED } else { Color::GREEN })
  }
}
And I think that's enough for one article.
This is the exact commit of the project files when I finished writing this article.
Objects / Sprites
Now that we can get user input there's a lot of things that we could learn about next. Probably we should focus on how to improve our drawing abilities.
Most of the GBA's drawing abilities involve either the 4 background layers, or the 128 objects (called "OBJ" for short). The background layers let you draw a few "big" things (128x128 or bigger), and the objects let you draw many "small" things (64x64 or less).
The objects have a fairly consistent behavior, while the four background layers behave differently depending on the "video mode" that you set in the display control. That's reason enough to focus on the objects first.
Are They Objects Or Are They Sprites?
The objects are sometimes called "sprites". GBATEK calls them objects, and mGBA (v0.10 at least) calls them sprites. Some people care about the difference between the two terms, but I don't. I'm just going to say "object" most of the time in this series because the data for them is called the "object attribute memory" (OAM).
Display Control
We've already seen that the display control has a "forced blank" bit. Most of the other bits are for background control stuff, but since some of them affect object display we'll just cover that right now.
Bit(s) | Setting |
---|---|
0-2 | Video Mode |
3 | (Unused in GBA mode) |
4 | Frame Select |
5 | Unlocked H-blank |
6 | Linear object tile mapping |
7 | Forced Blank |
8 | Enable Background 0 |
9 | Enable Background 1 |
10 | Enable Background 2 |
11 | Enable Background 3 |
12 | Enable Objects |
13 | Window 0 Display Flag |
14 | Window 1 Display Flag |
15 | OBJ Window Display Flag |
- Video Mode: This sets which mode the four background layers will operate with. Despite this being a 3-bit field, only modes 0 through 5 give a useful display. Modes 6 and 7 cause garbage output.
- Frame Select: Affects which bitmap frame is used in video mode 4 or 5.
- Unlocked H-blank: GBATEK calls this "H-Blank Interval Free", and mGBA's debug controls call this "Unlocked H-blank". This bit affects what you can do during the "horizontal blank" time between each scanline being shown, but when it's on fewer objects can be drawn. We won't be doing any per-scanline drawing for now, so we'll leave it off by default.
- Linear object tile mapping: This affects how we lay out the tiles for multi-tile objects. We'll talk about the details of this in just a moment.
- Forced Blank: Hey we know about this bit. When it's on, the display won't access any memory and will just output white pixels any time it would have rendered a pixel normally.
- Enable Background: These four bits set if we want each of the four background layers on. For now we don't care.
- Enable Objects: This bit sets the objects to be displayed.
- Window Flags: These three bits affect the "window" special graphical feature. We'll ignore these bits for now.
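As a quick sanity check of the bit table, here's a sketch that builds and inspects display control values using plain integer math. The `with_bit` helper, the `video_mode` function, and the constant names are all hand-rolled for this illustration and don't come from any crate.

```rust
/// Returns `value` with bit number `bit` set or cleared.
/// (A hypothetical stand-in helper for illustration.)
const fn with_bit(bit: u32, value: u16, on: bool) -> u16 {
    let mask = 1u16 << bit;
    if on {
        value | mask
    } else {
        value & !mask
    }
}

/// Extracts the 3-bit video mode from bits 0-2.
const fn video_mode(value: u16) -> u16 {
    value & 0b111
}

// Bit positions taken from the table above.
const FORCED_BLANK_BIT: u32 = 7;
const ENABLE_OBJECTS_BIT: u32 = 12;

fn main() {
    // Start from all zeros, then turn on object display.
    let show_objects = with_bit(ENABLE_OBJECTS_BIT, 0, true);
    println!("{:#06x}", show_objects);
    // Forced blank on, then off again.
    let blanked = with_bit(FORCED_BLANK_BIT, show_objects, true);
    let unblanked = with_bit(FORCED_BLANK_BIT, blanked, false);
    println!("{:#06x} -> {:#06x}", blanked, unblanked);
    println!("video mode = {}", video_mode(unblanked));
}
```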
I'm going to use the bitfrob
crate to get some bit manipulation utilities.
> cargo add bitfrob
Updating crates.io index
Adding bitfrob v1.3.0 to dependencies.
Features:
- track_caller
Updating crates.io index
Now we can give a type to our display control value, as well as just enough methods to get started.
Unlike with our Color
type, with the DisplayControl
we want to completely prevent an invalid video mode from being set, so we'll keep the u16
that we're wrapping as a private field.
Then we just have one "builder" method for each bit or group of bits that we want to be able to change.
To start we can skip all the background related bits, so we'll only need three builders.
// in lib.rs
use bitfrob::u16_with_bit;

pub const DISPCNT: VolAddress<DisplayControl, Safe, Safe> =
  unsafe { VolAddress::new(0x0400_0000) };

#[derive(Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
pub struct DisplayControl(u16);
impl DisplayControl {
  #[inline]
  pub const fn new() -> Self {
    Self(0)
  }
  #[inline]
  pub const fn with_linear_obj_tiles(self, linear: bool) -> Self {
    Self(u16_with_bit(6, self.0, linear))
  }
  #[inline]
  pub const fn with_forced_blank(self, blank: bool) -> Self {
    Self(u16_with_bit(7, self.0, blank))
  }
  #[inline]
  pub const fn with_objects(self, objects: bool) -> Self {
    Self(u16_with_bit(12, self.0, objects))
  }
}
This will require updates to both ex2.rs
and ex3.rs
.
- For example 2, instead of writing 0 we'd write DisplayControl::new().
- For example 3, we want to enable object display, since we're about to start showing some objects.
// in ex3.rs
const JUST_SHOW_OBJECTS: DisplayControl = DisplayControl::new().with_objects(true);

#[no_mangle]
pub extern "C" fn main() -> ! {
    DISPCNT.write(JUST_SHOW_OBJECTS);
    loop {
        let k = KEYINPUT.read();
        BACKDROP.write(if k.a() { Color::RED } else { Color::GREEN })
    }
}
For now that's all we need to do for the display control.
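If you'd like to sanity check the bit positions without booting an emulator, here's a small host-side sketch. The `with_bit` helper and the `to_bits` method are hypothetical names made up for this sketch (they're stand-ins, not the bitfrob API), and the struct only mirrors the three builders from above.

```rust
/// Hypothetical stand-in for a "set or clear one bit" helper.
/// This is NOT the bitfrob function, just enough for the sketch.
const fn with_bit(bit: u32, value: u16, on: bool) -> u16 {
    let mask = 1u16 << bit;
    if on {
        value | mask
    } else {
        value & !mask
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DisplayControl(u16);

impl DisplayControl {
    pub const fn new() -> Self {
        Self(0)
    }
    pub const fn with_linear_obj_tiles(self, linear: bool) -> Self {
        Self(with_bit(6, self.0, linear))
    }
    pub const fn with_forced_blank(self, blank: bool) -> Self {
        Self(with_bit(7, self.0, blank))
    }
    pub const fn with_objects(self, objects: bool) -> Self {
        Self(with_bit(12, self.0, objects))
    }
    /// Raw bits, exposed only so the sketch can be checked (an assumption,
    /// the real type keeps its field private on purpose).
    pub const fn to_bits(self) -> u16 {
        self.0
    }
}
```

With this, "just show objects" comes out as 0x1000, and adding linear object tiles gives 0x1040, which should match the DISPCNT value shown in mGBA's I/O viewer.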
Object Palette
Objects always need to use "paletted" color. Instead of each pixel within the object's image holding a full color value, it just holds an index into the palette. This allows each pixel to only need 4 or 8 bits, instead of the 16 bits needed for a complete color.
The palette for objects starts at 0x0500_0200
, and it's 256 entries long.
Each object can use 8 bits per pixel (8bpp) or 4 bits per pixel (4bpp).
- When an object is set for 8bpp each non-zero pixel value is the 8-bit index into the object palette. A pixel value of 0 means that the object is transparent in that pixel. This allows for up to 255 colors to be used within a single object.
- When an object is set for 4bpp each non-zero pixel value is the low half of the full index value. A second setting within the object's attributes determines the upper half of the index value. This effectively splits the palette memory into 16 "palbank" groupings. As with 8bpp objects, a pixel value of 0 makes a transparent pixel. This allows for up to 15 colors within a single object.
You might notice that index 0 of the object palette isn't ever used by either mode. The memory itself exists for consistency, but the GBA will never use the color value in that position. Call it a free global variable for your own personal use, if you want.
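To make the index math concrete, here's a host-side sketch of how a pixel value turns into an object palette index under the two rules above. The function names are hypothetical and only exist for this illustration.

```rust
/// Which object palette entry a pixel ends up using, or `None` when the
/// pixel is transparent. `palbank` is only consulted for 4bpp objects.
pub fn obj_palette_index(is_8bpp: bool, palbank: u8, pixel: u8) -> Option<usize> {
    if is_8bpp {
        // 8bpp: the pixel value is the whole index.
        if pixel == 0 {
            None
        } else {
            Some(pixel as usize)
        }
    } else {
        // 4bpp: the pixel is the low half, the palbank is the upper half.
        let low = pixel & 0xF;
        if low == 0 {
            None
        } else {
            Some(((palbank & 0xF) as usize) * 16 + low as usize)
        }
    }
}

/// Byte address of an object palette entry (each color is 2 bytes).
pub fn obj_palette_addr(index: usize) -> usize {
    0x0500_0200 + index * 2
}
```

So a 4bpp object using palbank 3 with a pixel value of 5 reads entry 53, and a pixel value of 0 never reads the palette at all.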
Since we have a series of color values instead of just a single color value,
this time we'll declare the object palette as a VolBlock instead of a VolAddress
.
// in lib.rs
pub const OBJ_PALETTE: VolBlock<Color, Safe, Safe, 256> =
    unsafe { VolBlock::new(0x0500_0200) };
A VolBlock
works mostly like an array does.
We call OBJ_PALETTE.index(i)
to get a particular VolAddress
, and then we can read or write that address.
We could also use get
if we want to do an optional lookup, or we could iterate the block, etc.
First let's make some more named color constants. We'll name each of the 8 colors you get when each of the three color channels is either no-intensity or full-intensity.
// in lib.rs
impl Color {
    pub const BLACK: Self = Self::rgb(0, 0, 0);
    pub const BLUE: Self = Self::rgb(0, 0, 31);
    pub const GREEN: Self = Self::rgb(0, 31, 0);
    pub const CYAN: Self = Self::rgb(0, 31, 31);
    pub const RED: Self = Self::rgb(31, 0, 0);
    pub const MAGENTA: Self = Self::rgb(31, 0, 31);
    pub const YELLOW: Self = Self::rgb(31, 31, 0);
    pub const WHITE: Self = Self::rgb(31, 31, 31);
    // ...
}
Now we can set up a backdrop color and two different palette entries.
// in ex3.rs
#[no_mangle]
pub extern "C" fn main() -> ! {
    BACKDROP.write(Color::MAGENTA);
    OBJ_PALETTE.index(1).write(Color::RED);
    OBJ_PALETTE.index(2).write(Color::WHITE);
    DISPCNT.write(JUST_SHOW_OBJECTS);
    loop {}
}
If we run the example in mGBA we can check our work using the debug utilities. In the menu, "Tools -> Game State Views -> View Palette..." will open a dialog showing all the background and object palette info.
- The backdrop color will show up in the 0th entry of the background palette.
- The two object palette colors will be in positions 1 and 2 of the top row.
Each row of the palette is shown 16 colors at a time, so it's easy to tell what's happening in both 8bpp and 4bpp modes.
That should be enough palette setup to continue with the tutorial.
Object Tile Memory
First, what is a tile exactly:
- A tile is an 8x8 square of palette indexes.
- A palette index can be either 4 bits per pixel (4bpp) or 8 bits per pixel (8bpp). This is the "bit depth" of the indexes.
- The indexes are stored one row at a time, left to right, top to bottom.
So we might have the following Rust constants
// in lib.rs
pub const PIXELS_PER_TILE: usize = 8 * 8;
pub const BITS_PER_BYTE: usize = 8;
pub const SIZE_OF_TILE4: usize = (PIXELS_PER_TILE * 4) / BITS_PER_BYTE;
pub const SIZE_OF_TILE8: usize = (PIXELS_PER_TILE * 8) / BITS_PER_BYTE;
Also, there's 32K of object tile RAM.
// in lib.rs
macro_rules! kilobytes {
    ($bytes:expr) => {
        $bytes * 1024
    };
}
pub const SIZE_OF_OBJ_TILE_MEM: usize = kilobytes!(32);
Now we know how big everything is, in bytes.
However, the GBA's video memory does NOT work right with individual byte writes.
We can cover the details another time, but with video memory you always have to write in 16-bit or 32-bit chunks.
Also, the GBA is simply much faster at transferring bulk data around when it's aligned to 4.
Data aligned to 4 can be copied one or more u32
values at a time (one or more "words" in ARM terms).
Being more aligned than 4 doesn't help any extra, but we want to have at least alignment 4 with anything big.
Tiles, particularly if we've got dozens or hundreds of them, count as "big enough to care about alignment".
This means that instead of modeling tile data as being arrays of u8
, we'll use smaller arrays of u32
, which will keep the data aligned to 4.
// in lib.rs
pub const SIZE_OF_U32: usize = core::mem::size_of::<u32>();
pub const TILE4_WORD_COUNT: usize = SIZE_OF_TILE4 / SIZE_OF_U32;
pub const TILE8_WORD_COUNT: usize = SIZE_OF_TILE8 / SIZE_OF_U32;
pub const OBJ_TILE_MEM_WORD_COUNT: usize = SIZE_OF_OBJ_TILE_MEM / SIZE_OF_U32;
Which lets us declare the block of u32
values where our object tile data goes.
// in lib.rs
pub const OBJ_TILES_U32: VolBlock<u32, Safe, Safe, OBJ_TILE_MEM_WORD_COUNT> =
    unsafe { VolBlock::new(0x0601_0000) };
Here's where things get kinda weird. An object's attributes (most of which we'll cover lower down) include a "Tile ID" for the base tile of the object. These tile id values are used as a 32 byte index, regardless of if the object uses 4bpp or 8bpp drawing. This means that they line up perfectly with a 4bpp view of the tile data, and we get 1024 IDs.
// in lib.rs
pub type Tile4 = [u32; TILE4_WORD_COUNT];
pub const OBJ_TILE4: VolBlock<Tile4, Safe, Safe, 1024> =
    unsafe { VolBlock::new(0x0601_0000) };
But with 8bpp objects we end up in a pickle.
We could use a VolSeries, which is an alternative to the VolBlock
type, for when the stride and the element size aren't the same.
The VolSeries
type is mostly intended for when the stride is bigger than the element size, but the math will work out either way.
Note that since 8bpp tiles are twice as big we have to cut down the number of tiles from 1024 to 1023 so that using the last index doesn't go out of bounds.
// in lib.rs
pub type Tile8 = [u32; TILE8_WORD_COUNT];
pub const OBJ_TILE8: VolSeries<Tile8, Safe, Safe, 1023, 32> =
    unsafe { VolSeries::new(0x0601_0000) };
And, well, it looks kinda weird every time I look at the code but... that's how the hardware works. It's the ultimate arbiter of what's correct, so sometimes you gotta just go with it.
We can always think about this more later, and maybe improve it then. For now it's enough that we've got the right addresses at all.
One final note: In video modes 3, 4, and 5 the lower half of the object tile region instead gets used as part of the background. In this case, only object tile index values 512 and above are usable for object display.
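If the tile ID arithmetic feels slippery, this host-side sketch spells it out: where a tile ID lands in memory, and why the 8bpp view has to stop one ID early. The constant and function names here are hypothetical and just restate the numbers from this section.

```rust
pub const OBJ_TILE_BASE: usize = 0x0601_0000;
pub const OBJ_TILE_MEM_BYTES: usize = 32 * 1024;
/// Tile IDs always step by 32 bytes, for both 4bpp and 8bpp objects.
pub const TILE_ID_STRIDE: usize = 32;
pub const TILE4_BYTES: usize = 32;
pub const TILE8_BYTES: usize = 64;

/// Byte address where the data for `tile_id` begins.
pub const fn obj_tile_addr(tile_id: usize) -> usize {
    OBJ_TILE_BASE + tile_id * TILE_ID_STRIDE
}

/// True when a tile of `tile_bytes` starting at `tile_id` stays inside
/// the 32K of object tile memory.
pub const fn tile_fits(tile_id: usize, tile_bytes: usize) -> bool {
    tile_id * TILE_ID_STRIDE + tile_bytes <= OBJ_TILE_MEM_BYTES
}
```

Tile ID 1023 fits as a 4bpp tile but would run 32 bytes past the end as an 8bpp tile, which is why the 8bpp series only has 1023 entries. Tile ID 512, the first one usable in the bitmap video modes, starts at 0x0601_4000.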
Object Attribute Memory
Separate from the object tile memory, there's also the Object Attribute Memory (OAM) region. This has space for 128 "attribute" entries, which define how the objects are shown.
Each attribute needs 48 bits.
This is an unfortunate number of bits, because it's not a clean power of 2.
Normally we refer to each attribute entry as having three u16
attributes just called 0, 1, and 2.
#[derive(Clone, Copy, PartialEq, Eq, Default)]
#[repr(transparent)]
pub struct ObjAttr0(pub u16);

#[derive(Clone, Copy, PartialEq, Eq, Default)]
#[repr(transparent)]
pub struct ObjAttr1(pub u16);

#[derive(Clone, Copy, PartialEq, Eq, Default)]
#[repr(transparent)]
pub struct ObjAttr2(pub u16);
In between each attribute entry is part of an affine entry.
That's right, just a part of an affine entry.
A full affine entry is four i16
values (called A, B, C, and D).
There's one i16
affine value per three u16
attribute values.
The memory looks kinda like this.
- obj0.attr0
- obj0.attr1
- obj0.attr2
- affine0.a
- obj1.attr0
- obj1.attr1
- obj1.attr2
- affine0.b
- obj2.attr0
- obj2.attr1
- obj2.attr2
- affine0.c
- obj3.attr0
- obj3.attr1
- obj3.attr2
- affine0.d
And then that pattern repeats 32 times. It's a little strange, but the hardware does what it does.
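Since the interleaving is easy to get wrong, here's a host-side sketch of the address math implied by the layout above. Each object slot is 8 bytes (three u16 attributes plus one i16 affine value), and the function names are hypothetical helpers for this illustration only.

```rust
pub const OAM_BASE: usize = 0x0700_0000;

/// Byte address of attribute `attr` (0, 1, or 2) of object `obj` (0..128).
pub const fn obj_attr_addr(obj: usize, attr: usize) -> usize {
    OAM_BASE + obj * 8 + attr * 2
}

/// Byte address of one parameter of an affine entry.
/// `entry` is 0..32 and `param` is 0..4 for A, B, C, D.
pub const fn affine_param_addr(entry: usize, param: usize) -> usize {
    // Each affine entry is spread across four object slots (32 bytes),
    // and each parameter sits in the last 2 bytes of its 8-byte slot.
    OAM_BASE + entry * 32 + param * 8 + 6
}
```

The last object's attribute 2 lands at 0x0700_03FC and the last affine parameter at 0x0700_03FE, so the whole thing fills exactly 1K.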
We can use several VolSeries
declarations to model this.
// in lib.rs
// The stride is in bytes: each object slot is three u16 attributes
// plus one i16 affine value, so 8 bytes from one object to the next.
pub const OBJ_ATTRS_0: VolSeries<ObjAttr0, Safe, Safe, 128, 8> =
    unsafe { VolSeries::new(0x0700_0000) };
pub const OBJ_ATTRS_1: VolSeries<ObjAttr1, Safe, Safe, 128, 8> =
    unsafe { VolSeries::new(0x0700_0000 + 2) };
pub const OBJ_ATTRS_2: VolSeries<ObjAttr2, Safe, Safe, 128, 8> =
    unsafe { VolSeries::new(0x0700_0000 + 4) };
Alternately, we could group the attributes into a single struct and view things that way.
// in lib.rs
#[derive(Clone, Copy, PartialEq, Eq, Default)]
#[repr(C)]
pub struct ObjAttr(pub ObjAttr0, pub ObjAttr1, pub ObjAttr2);

pub const OBJ_ATTRS: VolSeries<ObjAttr, Safe, Safe, 128, 8> =
    unsafe { VolSeries::new(0x0700_0000) };
Using the ObjAttr
type and OBJ_ATTRS
series would make it so that all three object attribute fields get accessed.
If you're only intending to update the position of an object (in attributes 0 and 1) without touching attribute 2, then maybe you'd care.
It's pretty unlikely to matter, but maybe.
Let's go over the actual properties within each object attribute field.
Object Attribute 0
- Bits 0 through 7 are the Y coordinate of the top-left corner of the object. The screen is 160 pixels tall, and the coordinates wrap. If you want something to appear to move up past the top of the screen, then wrap the Y value around. Alternately, you can do the position math using signed values and then use an as cast to convert the value to unsigned.
- Bits 8 and 9 set what mGBA calls the "transform" of the object:
  - 0 is no transform.
  - 1 is affine rendering. Which affine entry is used is set in attribute 1.
  - 2 is no transform and the object is not drawn (it's "disabled").
  - 3 is just like 1 but the object is rendered with double size.
- Bits 10 and 11 set the special effect mode:
  - 0 is no special effect.
  - 1 is alpha blending.
  - 2 is window masking. The object isn't shown, but acts as part of the object window mask.
  - 3 is not allowed.
- Bit 12 sets if the object uses the Mosaic special effect. This can be enabled/disabled separately from the other effects above.
- Bit 13 sets if the object uses 8bpp (bit set), or 4bpp (bit cleared).
- Bits 14 and 15 set the "shape" of the object. The exact dimensions also depend on the "size" set in attribute 1:
  - 0 is square
  - 1 is wider
  - 2 is taller
  - 3 is not allowed.
(WxH) | Size 0 | Size 1 | Size 2 | Size 3 |
---|---|---|---|---|
Shape 0 | 8x8 | 16x16 | 32x32 | 64x64 |
Shape 1 | 16x8 | 32x8 | 32x16 | 64x32 |
Shape 2 | 8x16 | 8x32 | 16x32 | 32x64 |
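The table can also be written as a small lookup function. This is a host-side sketch with a hypothetical name, and it returns `None` for the disallowed shape 3 (and for any size above 3) rather than guessing at what the hardware would do.

```rust
/// Pixel dimensions `(width, height)` for a shape/size pair, following
/// the table above.
pub fn obj_dimensions(shape: u16, size: u16) -> Option<(u16, u16)> {
    const SQUARE: [(u16, u16); 4] = [(8, 8), (16, 16), (32, 32), (64, 64)];
    const WIDE: [(u16, u16); 4] = [(16, 8), (32, 8), (32, 16), (64, 32)];
    if size > 3 {
        return None;
    }
    match shape {
        0 => Some(SQUARE[size as usize]),
        1 => Some(WIDE[size as usize]),
        2 => {
            // "Taller" is just "wider" with the two dimensions swapped.
            let (w, h) = WIDE[size as usize];
            Some((h, w))
        }
        _ => None,
    }
}
```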
Object Attribute 1
- Bits 0 through 8 are the X coordinate of the top-left corner of the object. This works basically the same as with the Y coordinate, but the screen is 240 pixels wide so 9 bits are used.
- Bits 9 through 13 depend on if the object is using affine rendering or not.
  - When affine (or double sized affine) rendering is used, they set the index of the affine entry used.
  - Otherwise Bit 12 sets horizontal flip and Bit 13 sets vertical flip.
- Bits 14 and 15 set the size of the sprite.
Object Attribute 2
- Bits 0 through 9 set the base tile index of the object. As mentioned above, in video modes 3, 4, and 5 this needs to be 512 or more.
- Bits 10 and 11 are the "priority" value. Lower priority objects and layers are sorted closer to the viewer, and so they are what's seen if they overlap something farther away. Within a given priority layer, objects always draw over backgrounds, and lower index objects/backgrounds draw over higher index ones.
- Bits 12 through 15 set the palette bank the object uses if it's using 4bpp.
Object Rendering Time
There's a limit to how many objects can be drawn per scanline, but it's not a specific number of objects. Instead, the OAM engine has a buffer that's as wide as the screen, and there's a time limit per scanline on filling the buffer.
- When the "Unlocked H-blank" bit is clear in DISPCNT you get 1210 cycles (304 * 4 - 6)
- When the "Unlocked H-blank" bit is set in DISPCNT you get 954 cycles (240 * 4 - 6)
The number of cycles each object consumes depends on the object's horizontal size:
- Normal objects consume width cycles.
- Affine objects consume 2 * width + 10 cycles.
Objects are processed by their index order. Objects not on the current scanline, horizontally/vertically off-screen, or that are "disabled" as their attribute 0 transform, are skipped in rendering but still take two cycles to process. Even when an object won't be drawn on the current scanline the OAM engine has to look at the attributes to know that. If not all objects are handled and time runs out then any unprocessed objects simply won't be drawn on this scanline.
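To get a feel for the budget, here's a rough host-side model that uses only the numbers given above. All the names are hypothetical, and it makes one simplifying assumption that the text doesn't confirm: an object that can't be finished within the remaining cycles is treated as not processed at all.

```rust
#[derive(Clone, Copy)]
pub struct ScanlineObj {
    pub width: u32,
    pub affine: bool,
    /// False when the object is disabled, off-screen, or not on this scanline.
    pub visible: bool,
}

/// Cycles available for object rendering on one scanline.
pub const fn cycle_budget(unlocked_hblank: bool) -> u32 {
    if unlocked_hblank {
        240 * 4 - 6
    } else {
        304 * 4 - 6
    }
}

/// Cycles one OAM entry costs on this scanline.
pub fn obj_cost(obj: ScanlineObj) -> u32 {
    if !obj.visible {
        2
    } else if obj.affine {
        2 * obj.width + 10
    } else {
        obj.width
    }
}

/// How many leading OAM entries get fully processed before time runs out.
/// Assumption: an entry that doesn't fit in the remaining time counts as
/// unprocessed, and so does everything after it.
pub fn objs_processed(objs: &[ScanlineObj], unlocked_hblank: bool) -> usize {
    let mut remaining = cycle_budget(unlocked_hblank);
    for (i, obj) in objs.iter().enumerate() {
        let cost = obj_cost(*obj);
        if cost > remaining {
            return i;
        }
        remaining -= cost;
    }
    objs.len()
}
```

Under this model, a scanline full of 64 pixel wide affine objects (138 cycles each) only gets through 8 of them normally, or 6 with "Unlocked H-blank" set, while 128 skipped objects cost just 256 cycles in total.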
Showing Static Objects
Armed with all this knowledge we can probably show a static object.
First we want to set at least one tile in the object tile memory to some sort of pattern.
If we write a hex u32
literal, then each digit of the hex value will be 4 bits, so we can make a 4bpp tile pretty easy.
One catch is that the indexes fill the tile from left to right, but we write numbers in code with the low-place-value digits on the right.
So our "tile" as a u32
literal will be left-right flipped from how it'll appear on the GBA:
/// A tile with an extra notch on the upper left.
#[rustfmt::skip]
const TILE_UP_LEFT: [u32; 8] = [
    // Each hex digit is one 4bpp index value.
    // Also, the image is left-right flipped from how it
    // looks in code because the GBA is little-endian!
    0x11111111,
    0x12222111,
    0x12222111,
    0x12222221,
    0x12222221,
    0x12222221,
    0x12222221,
    0x11111111,
];
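If the mirrored hex literals bother you, one option is a tiny const helper that takes each row in on-screen order and does the flip for you. This is only a sketch: `row4` and `TILE_UP_LEFT_ROWS` are hypothetical names, and the data is the same tile as above written left to right.

```rust
/// Packs eight 4bpp indexes, written left to right as they should appear
/// on screen, into one `u32` row of a tile.
pub const fn row4(px: [u8; 8]) -> u32 {
    let mut out = 0u32;
    let mut i = 0u32;
    while i < 8 {
        // The leftmost pixel goes in the lowest 4 bits.
        out |= ((px[i as usize] & 0xF) as u32) << (4 * i);
        i += 1;
    }
    out
}

/// The same "notch on the upper left" tile, drawn the way it appears.
pub const TILE_UP_LEFT_ROWS: [u32; 8] = [
    row4([1, 1, 1, 1, 1, 1, 1, 1]),
    row4([1, 1, 1, 2, 2, 2, 2, 1]),
    row4([1, 1, 1, 2, 2, 2, 2, 1]),
    row4([1, 2, 2, 2, 2, 2, 2, 1]),
    row4([1, 2, 2, 2, 2, 2, 2, 1]),
    row4([1, 2, 2, 2, 2, 2, 2, 1]),
    row4([1, 2, 2, 2, 2, 2, 2, 1]),
    row4([1, 1, 1, 1, 1, 1, 1, 1]),
];
```

Because `row4` is a `const fn`, the whole table is still computed at compile time, so there's no runtime cost compared to the hand-written literals.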
And we can copy the data into object tile 1 in our main
function.
#[no_mangle]
pub extern "C" fn main() -> ! {
    BACKDROP.write(Color::MAGENTA);
    OBJ_PALETTE.index(1).write(Color::RED);
    OBJ_PALETTE.index(2).write(Color::WHITE);
    OBJ_TILE4.index(1).write(TILE_UP_LEFT);
    DISPCNT.write(JUST_SHOW_OBJECTS);
    loop {}
}
We can make other similar tiles too, one for each corner notch.
OBJ_TILE4.index(1).write(TILE_UP_LEFT);
OBJ_TILE4.index(2).write(TILE_UP_RIGHT);
OBJ_TILE4.index(3).write(TILE_DOWN_LEFT);
OBJ_TILE4.index(4).write(TILE_DOWN_RIGHT);
If we show an 8x8 object using object tile 1, then we'll see an upper-left square.
OBJ_ATTRS.index(0).write(ObjAttr(ObjAttr0(0), ObjAttr1(0), ObjAttr2(1)));
And if we make it wider we can see an upper left and upper right square too.
OBJ_ATTRS.index(0).write(ObjAttr(
    ObjAttr0(1 << 14),
    ObjAttr1(0),
    ObjAttr2(1),
));
But when we make it taller instead, we see... just one tile? Why isn't there a second tile drawn below the first?
This is that "Linear object tile mapping" flag from way back with the Display Control. It defaults to off, so by default when we want to have an object more than 8 pixels tall, the next row of the object will use +32 indexes from the previous row.
// using `x + 32*y` to get the index
OBJ_TILE4.index(1 + 32*0).write(TILE_UP_LEFT);
OBJ_TILE4.index(2 + 32*0).write(TILE_UP_RIGHT);
OBJ_TILE4.index(1 + 32*1).write(TILE_DOWN_LEFT);
OBJ_TILE4.index(2 + 32*1).write(TILE_DOWN_RIGHT);
Alternately, we can set up our display control to use the linear object tile system and then just fill tiles 1 to 4 like they're a normal array.
const JUST_OBJECTS_LINEAR: DisplayControl =
    DisplayControl::new().with_objects(true).with_linear_obj_tiles(true);

#[no_mangle]
pub extern "C" fn main() -> ! {
    BACKDROP.write(Color::MAGENTA);
    OBJ_PALETTE.index(1).write(Color::RED);
    OBJ_PALETTE.index(2).write(Color::WHITE);
    OBJ_TILE4.index(1).write(TILE_UP_LEFT);
    OBJ_TILE4.index(2).write(TILE_UP_RIGHT);
    OBJ_TILE4.index(3).write(TILE_DOWN_LEFT);
    OBJ_TILE4.index(4).write(TILE_DOWN_RIGHT);
    OBJ_ATTRS.index(0).write(ObjAttr(
        ObjAttr0(0),       // square shape
        ObjAttr1(1 << 14), // size 1
        ObjAttr2(1),       // base tile 1
    ));
    DISPCNT.write(JUST_OBJECTS_LINEAR);
    loop {}
}
It's really up to you. As long as you're consistent, either way will work.
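The two layouts boil down to one small formula each. Here's a host-side sketch for 4bpp objects only (8bpp tiles take up two tile IDs each, so the math there is different and isn't covered by this sketch). The function name is hypothetical.

```rust
/// Tile ID holding the 8x8 cell at (`tx`, `ty`) inside a 4bpp object.
/// `width_tiles` is the object's width in tiles (pixel width / 8).
pub const fn obj_tile_id_4bpp(
    base: u16,
    tx: u16,
    ty: u16,
    width_tiles: u16,
    linear: bool,
) -> u16 {
    if linear {
        // Linear mapping: rows follow each other like a normal array.
        base + tx + ty * width_tiles
    } else {
        // Default mapping: each row of the object is +32 tile IDs.
        base + tx + ty * 32
    }
}
```

For our 16x16 object with base tile 1, the lower-left cell is tile 33 with the default mapping and tile 3 with linear mapping, which matches the two ways of filling tile memory shown above.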
Of course, also you'd want to have a lot of methods for easily getting/setting the right bits of each attribute value.
I'll put those in to lib.rs
right now, but I'm not gonna show them all here in the tutorial text.
They all do just what you'd expect based on the DisplayControl
type.
For the ObjAttr
type we can have methods that dispatch to the correct inner field's method.
On the x
and y
properties we can make them take i16
instead of u16
and then just cast inside the setter.
The user will probably want to support signed positions so that stuff can go up off the screen and left off the screen.
impl ObjAttr {
    #[inline]
    pub const fn new() -> Self {
        Self(ObjAttr0::new(), ObjAttr1::new(), ObjAttr2::new())
    }
    #[inline]
    pub const fn with_size(self, size: u16) -> Self {
        Self(self.0, self.1.with_size(size), self.2)
    }
    #[inline]
    pub const fn with_tile(self, tile: u16) -> Self {
        Self(self.0, self.1, self.2.with_tile(tile))
    }
    #[inline]
    pub const fn with_x(self, x: i16) -> Self {
        Self(self.0, self.1.with_x(x), self.2)
    }
    #[inline]
    pub const fn with_y(self, y: i16) -> Self {
        Self(self.0.with_y(y), self.1, self.2)
    }
}
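For the curious, the inner setters that those methods dispatch to might look something like this. It's a host-side sketch and an assumption about the shape of the code, not the actual contents of lib.rs. It uses plain masks instead of bitfrob so that it stands alone, and it relies on X being bits 0 through 8, Y being bits 0 through 7, and size being bits 14 and 15.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ObjAttr0(pub u16);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ObjAttr1(pub u16);

impl ObjAttr0 {
    /// Y lives in bits 0..=7, so only the low 8 bits of the cast are kept.
    pub const fn with_y(self, y: i16) -> Self {
        Self((self.0 & !0x00FF) | ((y as u16) & 0x00FF))
    }
}

impl ObjAttr1 {
    /// X lives in bits 0..=8, so only the low 9 bits of the cast are kept.
    pub const fn with_x(self, x: i16) -> Self {
        Self((self.0 & !0x01FF) | ((x as u16) & 0x01FF))
    }
    /// Size lives in bits 14..=15.
    pub const fn with_size(self, size: u16) -> Self {
        Self((self.0 & !0xC000) | ((size & 0b11) << 14))
    }
}
```

The nice part of the cast is that a position like -8 just works: it becomes 0xF8 for Y and 0x1F8 for X, which the hardware treats as "8 pixels off the top or left edge".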
And our final main
goes like this:
#[no_mangle]
pub extern "C" fn main() -> ! {
    BACKDROP.write(Color::MAGENTA);
    OBJ_PALETTE.index(1).write(Color::RED);
    OBJ_PALETTE.index(2).write(Color::WHITE);
    OBJ_TILE4.index(1).write(TILE_UP_LEFT);
    OBJ_TILE4.index(2).write(TILE_UP_RIGHT);
    OBJ_TILE4.index(3).write(TILE_DOWN_LEFT);
    OBJ_TILE4.index(4).write(TILE_DOWN_RIGHT);
    let obj = ObjAttr::new().with_size(1).with_tile(1).with_x(10).with_y(23);
    OBJ_ATTRS.index(0).write(obj);
    DISPCNT.write(JUST_OBJECTS_LINEAR);
    loop {}
}
which displays a little square deal thing.
That's it for now. Next time we'll see about making our square move around and stuff.