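A software developer who hates modern C++ and is trying as hard as possible to pass as a hardware designer. I love building FPGA GPUs and retro home game consoles, my cats (Neutrino + Cosmos), playing the koto, game development, and matcha + wagashi. Tweets are my personal opinions and do not represent my company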
I’m tired. While I do really want people to be able to follow along with the project, explaining Vivado in the previous post really took a lot out of me, and resulted in a post that was way longer than it should have been. Luckily this is the final tutorial post, after which I can just assume everyone knows Vivado/Vitis/AXI, and I can transition to talking only about the GPU itself
But first, a disclaimer: Vitis can be… rough. It crashes more than it should. Sometimes project creation doesn’t actually create a project, and you have to retry the exact same thing multiple times until it works. To put it nicely, it doesn’t play well with source control. I once asked the devs what files needed to be checked in, and no one seemed to know. The devs did, however, suggest making a TCL script to regenerate the project every time, and checking that in. Later on, a Mercurial ignore file was provided, but it seems to just add almost everything to source control anyway, including most of the generated files. It’s ridiculously easy to corrupt a project, and when it happens, you have to recreate the entire thing from scratch. I suspect some things might hardcode absolute paths, and so things can break when directory names change. Even after updating the hardware, sometimes Vitis just doesn’t notice the hardware changes until you do some rain dance of unrelated operations
But I absolutely give credit to the devs. They are on the forums, and are really good about responding to issues. If you take some time to help them repro the issue, they absolutely will get it fixed. Even with the rough edges, Xilinx is doing a very good job of constantly improving Vitis. The only reason I am even bringing any of this up is that even a simple tutorial that works for me might not work for you. I can’t anticipate everything that might go wrong, and so even if you meticulously follow along, I can’t guarantee random steps won’t fail along the way
Quick recap! In our last episode, you wrote the RTL for the front end, added an AXI master for reading/writing DDR, and made an AXI slave for receiving MMIO register writes from the CPU. Hopefully all that just works the first time, because explaining how to simulate or in-hardware debug Zynq designs is going to have to be an entire post in itself. Now all that’s left to do is to write the software side in Vitis. I’ll be going through two examples: a simplified test that lights up LEDs when the CPU writes to certain addresses (shown below), and then my actual front end command buffer API
always_ff @(posedge gfx_clk) begin
    if (state == kTestStateWaitAddr) begin
        if (mmio_valid_in) begin
            // addr is 4 byte aligned, so [2] is the LSB of the reg num
            led_mask <= 1'b1 << mmio_addr_or_data_in[3:2];
            state <= kTestStateWaitData;
        end
    end else begin
        if (mmio_valid_in) begin
            for (int i = 0; i < 4; ++i) begin
                led_local[i] <=
                    (~led_mask[i] & led_local[i]) | (led_mask[i] & mmio_addr_or_data_in[0]);
            end
            state <= kTestStateWaitAddr;
        end
    end
end
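To see what this state machine does without a board, here is a rough host-side C++ model of it. This is a sketch, not code from the project: the `LedTestModel` name and its methods are made up for illustration, and it assumes each MMIO write reaches the logic as one address beat followed by one data beat, which is what the two states in the RTL suggest.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of the LED test state machine (illustration only).
struct LedTestModel {
    enum State { kWaitAddr, kWaitData };
    State state = kWaitAddr;
    uint8_t led_mask = 0;   // one-hot select for the 4 LEDs
    uint8_t led_local = 0;  // current LED values, one bit per LED

    // Call once for every cycle in which mmio_valid_in would be high.
    void push(uint32_t addr_or_data) {
        if (state == kWaitAddr) {
            // Bits [3:2] of the byte address select the register number.
            led_mask = static_cast<uint8_t>(1u << ((addr_or_data >> 2) & 0x3u));
            state = kWaitData;
        } else {
            // Only the selected LED takes bit 0 of the data; the rest hold.
            const uint8_t bit0 = (addr_or_data & 1u) ? 0xF : 0x0;
            led_local = static_cast<uint8_t>((led_local & ~led_mask) | (led_mask & bit0));
            state = kWaitAddr;
        }
    }

    // Assumed shape of one MMIO write: address beat, then data beat.
    void write(uint32_t addr, uint32_t data) {
        assert((addr & 0x3u) == 0);  // registers are 4 byte aligned
        push(addr);
        push(data);
    }
};
```

Writing 1 to offsets 0, 4, 8, and 12 in this model sets all four bits of `led_local`, and writing 0 to a single offset clears only that LED.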
Case 0: Lighting Up LEDs
Launch Vitis, choose Embedded Development → Create Platform Component, and when asked to specify a hardware design, browse to the XSA file you exported from Vivado. The OS will be standalone and you can choose ARM core 0 to run on. When that’s done, you can click the + button next to VITIS COMPONENTS → VITIS, and follow the steps to add an application using the hardware platform you just created. Building will fail, though, because there are currently no sources added, so right click on Sources → src, and add a new main.c with int main( void )
Now open xparameters.h and you’ll see the rather useful looking
/* Definitions for peripheral MMIO_REG_SLAVE_0 */
#define XPAR_MMIO_REG_SLAVE_0_BASEADDR 0x40000000
#define XPAR_MMIO_REG_SLAVE_0_HIGHADDR 0x40000fff
/* Canonical definitions for peripheral MMIO_REG_SLAVE_0 */
#define XPAR_MMIO_REG_SLAVE_0_BASEADDR 0x40000000
#define XPAR_MMIO_REG_SLAVE_0_HIGHADDR 0x40000fff
Hey, it’s the defines for the slave we made in the previous post! So include xparameters.h, xil_cache.h, and xil_io.h, and add to main()
uint64_t reg = (uint64_t)XPAR_MMIO_REG_SLAVE_0_BASEADDR;
Xil_Out32(reg + 0, 1);
Xil_Out32(reg + 4, 1);
Xil_Out32(reg + 8, 1);
Xil_Out32(reg + 12, 1);
And that’s it. Build the application, and debug just like you would in VSCode. You can click on the debug button under FLOW → Component → YourApplicationName, or you can control the debugger from the debug tab all the way on the left. If all went well, you should see all 4 LEDs illuminated
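If you would rather not repeat the offsets by hand, the four writes can be wrapped in a small helper. The following is only a sketch: `set_led`, `set_all_leds`, and `Write32Fn` are hypothetical names, not part of the Xilinx BSP, and the register write is passed in as a callback so the address math can be checked on a PC without the hardware.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical helper types and functions; not part of the Xilinx BSP.
using Write32Fn = std::function<void(uintptr_t addr, uint32_t value)>;

constexpr unsigned kNumLeds = 4;

// Each LED has its own 32-bit register, so LED i lives at base + 4 * i.
void set_led(uintptr_t base, unsigned index, bool on, const Write32Fn &write32) {
    assert(index < kNumLeds);
    write32(base + index * sizeof(uint32_t), on ? 1u : 0u);
}

// Issues one register write per LED, in order.
void set_all_leds(uintptr_t base, bool on, const Write32Fn &write32) {
    for (unsigned i = 0; i < kNumLeds; ++i) {
        set_led(base, i, on, write32);
    }
}
```

On the board, the callback would presumably be a lambda that forwards to Xil_Out32, with XPAR_MMIO_REG_SLAVE_0_BASEADDR as the base.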
Case 1: Dispatching Command Buffers
The Vivado block design diagram for the front end test looks almost identical to the simple test, except it also has some special bits for DMA to BRAM, and allowing the FPGA to trigger CPU interrupts when a rendertarget is no longer used. The software side is also similar, but the data we’re writing is the start and end addresses of command buffers. The current test looks like this
static const int kMaxCommands = 32;
// pick some address not used by rendertargets
uint64_t *command_buffer_address = (uint64_t *)(kDebugFramebufferStart - 5 * 640 * 480 * 2);
CommandBuffer cb0, cb1, cb2;
cb0.init(command_buffer_address + kMaxCommands * 0, kMaxCommands);
cb1.init(command_buffer_address + kMaxCommands * 1, kMaxCommands);
cb2.init(command_buffer_address + kMaxCommands * 2, kMaxCommands);
// cb0 turns the four LEDs on and writes label
cb0.writeMmioReg(GpuRegsMmioAddr(kGpuRegApertureDebug, kDebugApertureLed), 0xF);
uint32_t *label = cb0.nop(1);
uint32_t *label2 = cb0.nop(1);
*label = 0;
*label2 = 0;
cb0.writeLabel(label, kWriteEventImmediate, 1);
// cb1 sets rgb led 5 to green and then waits on the GPU-written label
cb1.writeMmioReg(GpuRegsMmioAddr(kGpuRegApertureDebug, kDebugApertureLed5Rgb), 0x2);
cb1.waitLabel(label, kLabelCmpEq, 1);
// cb2 waits on the CPU-written label and sets rgb led 6 to blue
cb2.waitLabel(label2, kLabelCmpEq, 123);
cb2.writeMmioReg(GpuRegsMmioAddr(kGpuRegApertureDebug, kDebugApertureLed6Rgb), 0x4);
// cb0 calls the others
cb0.call(&cb1);
cb0.call(&cb2);
// doesn't currently snoop the cache
cb1.flushToMemory();
cb2.flushToMemory();
cb0.submit();
*label2 = 123;
// still no CPU cache snooping
Xil_DCacheFlushRange((INTPTR)label2, 4);
The test doesn’t show dynamic command buffer patching, but rather focuses on call/return, and label reads and writes. Cb1 lights up an LED to verify it was called, and then waits on a label that was previously written by cb0. Cb2 waits on a label that will eventually be written by the CPU, and then lights up a different LED for verification. Cb0 just calls cb1 and then cb2
The main workhorse is the submit function, which just flushes the command buffer to memory and does the two MMIO register writes
void CommandBuffer::submit(bool flush)
{
    uint64_t first = GetMmioRegAddr(kGpuRegApertureCpuMapped, kCpuMappedApertureCbSubmitFirst);
    uint64_t last = GetMmioRegAddr(kGpuRegApertureCpuMapped, kCpuMappedApertureCbSubmitLast);
    if (flush)
    {
        flushToMemory();
    }
    Xil_Out32(first, (uint64_t)m_start);
    Xil_Out32(last, (uint64_t)m_current - PACKET_SIZE_BYTES);
}
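The one subtle part of submit is that the "last" register gets the address of the final packet, not the address one past it, which is why PACKET_SIZE_BYTES is subtracted from m_current. Here is a minimal sketch of that arithmetic. It assumes packets are 8 bytes (they are stored as uint64_t above) and that addresses fit in 32 bits, and `SubmitRange` and `make_submit_range` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: one packet is one 64-bit word.
constexpr uint32_t kPacketSizeBytes = sizeof(uint64_t);

// Hypothetical pair of values that would go to the two submit registers.
struct SubmitRange {
    uint32_t first;  // address of the first packet
    uint32_t last;   // address of the last packet (inclusive)
};

SubmitRange make_submit_range(uint32_t start_addr, uint32_t num_packets) {
    assert(num_packets > 0);                      // nothing to submit otherwise
    assert(start_addr % kPacketSizeBytes == 0);   // packets are 8 byte aligned
    return {start_addr, start_addr + (num_packets - 1) * kPacketSizeBytes};
}
```

For a buffer that starts at 0x1000 and holds three packets, this gives first = 0x1000 and last = 0x1010.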
And although it’s not that interesting, here are a few sample packet writers
uint32_t *CommandBuffer::nop(uint32_t num_doublewords)
{
    uint64_t *retval = NULL;
    if (num_doublewords)
    {
        // todo: assert if not enough memory
        uint64_t *next_addr = m_current + num_doublewords;
        retval = m_current;
        SharedAddressPacket first;
        first.opcode = kFrontEndPacketOpcodeNop;
        first.flag = 0;
        first.addr = (uint64_t)next_addr >> ADDR_SHIFT;
        first.reserved_lo = 0;
        *m_current = first.as_u64;
        m_current = next_addr;
    }
    return (uint32_t *)retval;
}
void CommandBuffer::call(CommandBuffer *dest)
{
    // this command buffer jumps to the start of the jump target command buffer
    jump(dest->m_start);
    // the target command buffer jumps to the next command in this command buffer
    // (which does not yet exist!)
    dest->jump(m_current);
    // temp: make sure this jump isn't the last packet in the command buffer
    // a better fix would be doing this before the submit if needed
    nop(1);
}
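Building call/return out of two jumps is easier to follow with a toy model. The sketch below is not the real front end: every name in it is hypothetical, packets are indices into a flat array rather than DDR addresses, and it assumes the front end processes packets from "first" until it has handled the one at "last".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of call/return built from jumps. All names are hypothetical.
enum class Op { Nop, Mark, Jump };

struct ToyPacket {
    Op op;
    uint32_t arg;  // Mark: id to record, Jump: target index, Nop: unused
};

struct ToyMemory {
    std::vector<ToyPacket> packets;
};

class ToyCommandBuffer {
public:
    ToyCommandBuffer(ToyMemory &mem, uint32_t start, uint32_t capacity)
        : m_mem(mem), m_start(start), m_current(start), m_end(start + capacity) {}

    void mark(uint32_t id) { emit({Op::Mark, id}); }
    void nop() { emit({Op::Nop, 0}); }
    void jump(uint32_t target) { emit({Op::Jump, target}); }

    void call(ToyCommandBuffer &dest) {
        jump(dest.m_start);    // caller jumps to the callee's first packet
        dest.jump(m_current);  // callee jumps back to the caller's next slot
        nop();                 // fill that slot so the return has a target
    }

    uint32_t first() const { return m_start; }
    uint32_t last() const { return m_current - 1; }

private:
    void emit(ToyPacket p) {
        assert(m_current < m_end);  // stay inside this buffer's region
        m_mem.packets[m_current++] = p;
    }

    ToyMemory &m_mem;
    uint32_t m_start;
    uint32_t m_current;
    uint32_t m_end;
};

// Processes packets from first until the packet at last has been handled,
// returning the ids of the Mark packets in the order they were reached.
std::vector<uint32_t> run(const ToyMemory &mem, uint32_t first, uint32_t last) {
    std::vector<uint32_t> trace;
    uint32_t pc = first;
    for (;;) {
        const ToyPacket &p = mem.packets[pc];
        if (p.op == Op::Mark) trace.push_back(p.arg);
        if (pc == last) break;
        pc = (p.op == Op::Jump) ? p.arg : pc + 1;
    }
    return trace;
}
```

With three buffers that each record one mark, and cb0 calling cb1 and then cb2, the marks are visited in the order cb0, cb1, cb2. The trailing nop matters in this model for the same reason as in the real code: the return jump needs a packet to land on, and the submitted range has to end on something other than a jump out of the buffer.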
void CommandBuffer::waitLabel(uint32_t *addr, LabelCmp cmp, uint32_t ref_val)
{
    // todo: assert if not enough memory
    SharedAddressPacket first;
    first.opcode = kFrontEndPacketOpcodeWaitLabel;
    first.flag = cmp;
    first.addr = (uint64_t)addr >> ADDR_SHIFT;
    first.reserved_lo = ref_val;
    *m_current++ = first.as_u64;
}
void CommandBuffer::writeMmioReg(GpuRegsMmioAddr addr, uint32_t val)
{
    union Packet
    {
        struct
        {
            uint64_t payload : 32;
            uint64_t reserved_hi : (PACKET_SIZE_BITS - 32 - kGpuRegsMmioAddrBits - NUM_OPCODE_BITS);
            uint64_t addr : kGpuRegsMmioAddrBits;
            uint64_t opcode : NUM_OPCODE_BITS;
        };
        uint64_t as_u64;
    };
    // todo: assert if not enough memory
    Packet packet;
    packet.opcode = kFrontEndPacketOpcodeWriteMmio;
    packet.addr = addr.as_u32;
    packet.reserved_hi = 0;
    packet.payload = val;
    *m_current++ = packet.as_u64;
}
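The bitfield union above relies on the compiler putting the first declared field in the lowest bits, which GCC does on little-endian ARM but which C++ leaves implementation-defined. If that ever becomes a problem, the same layout can be built with explicit shifts. The sketch below shows the idea; the field widths (4 opcode bits, 16 address bits) are made-up values for illustration and are not the project's real NUM_OPCODE_BITS or kGpuRegsMmioAddrBits, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical field widths, chosen only for illustration.
constexpr unsigned kPacketBits = 64;
constexpr unsigned kPayloadBits = 32;
constexpr unsigned kMmioAddrBits = 16;
constexpr unsigned kOpcodeBits = 4;

// Opcode sits in the top bits, the address just below it, payload at the bottom.
constexpr unsigned kOpcodeShift = kPacketBits - kOpcodeBits;
constexpr unsigned kAddrShift = kOpcodeShift - kMmioAddrBits;
static_assert(kAddrShift >= kPayloadBits, "address field overlaps the payload");

constexpr uint64_t field_mask(unsigned bits) { return (uint64_t{1} << bits) - 1; }

uint64_t pack_write_mmio(uint32_t opcode, uint32_t addr, uint32_t payload) {
    assert(opcode <= field_mask(kOpcodeBits));
    assert(addr <= field_mask(kMmioAddrBits));
    return (uint64_t{opcode} << kOpcodeShift) |
           (uint64_t{addr} << kAddrShift) |
           uint64_t{payload};  // the reserved bits in between stay zero
}

uint32_t unpack_opcode(uint64_t p) {
    return static_cast<uint32_t>((p >> kOpcodeShift) & field_mask(kOpcodeBits));
}

uint32_t unpack_addr(uint64_t p) {
    return static_cast<uint32_t>((p >> kAddrShift) & field_mask(kMmioAddrBits));
}

uint32_t unpack_payload(uint64_t p) {
    return static_cast<uint32_t>(p & field_mask(kPayloadBits));
}
```

The trade-off is more typing per packet type in exchange for a layout that does not depend on the compiler or target.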
A few notes before bringing this Vivado/Vitis tutorial trilogy to a final close. First, at some point you are going to need to make changes to the hardware. Even if you follow all the recommended steps, there is a chance that Vitis will still use the old hardware you exported from Vivado. Rather than explain the fix, here is a thread in which I solved my own problem 🙂 The TL;DR is sometimes it’s not enough to go to the platform settings, switch XSA, regenerate the BSP, clean the platform target, rebuild, and clean the application and rebuild. Sometimes you also need to pretend you’re going to create a boot image, but then cancel
Finally, you’re probably going to want to play around with interrupts, DMA, and all kinds of things that require lengthy setups involving calling lots of random functions in a very specific order. I have yet to find any useful info on the internet, but Vitis does give you some excellent samples you can directly copy. If you go to settings → Navigate To BSP Settings → your_platform_here → ps7_cortexa9_0 → zynq_fsbl → Board Support Package → standalone → drivers, you will see a list of things your current hardware design supports, with a link to an example you can add to your project and steal from. So for example, I am using DMA in my Vivado design, and so the DMA shows up in the list below