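A SW developer who hates modern C++ and is working very hard at pretending to be a HW designer. I really love making FPGA GPUs and retro home game consoles, my cats (Neutrino + Cosmos), playing the koto, game development, and matcha + wagashi. Tweets are my own personal opinions and do not represent my company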
So exactly how did we get here? About a year or so ago, a Very Successful update of WordPress by Dreamhost appears to have disappeared all my website files. A few weeks were spent debating whether or not to attempt manual recreation of the blog posts, but since it sounded so boring and tedious, and the original posts are probably available through the Internet Wayback Machine, I decided to accept that nothing lasts forever, and to just start the blog all over again to coincide with starting a new GPU design
So here’s how this is going to work. I’ll be posting about the work while doing it, rather than waiting to finish the entire project. Why? Well, I screw up a lot. I think something is an amazing idea only to find out that it will not work at all. I change my mind as I get new ideas. So maybe there is some value in documenting not only the final design, but also how I get there, what goes wrong, and what changes along the way. For those of you who want to follow along, the next post should serve as a guide to get your environment set up, but in this post I’d like to go over the prerequisites
Knowledge
Required: You should obviously be comfortable with binary and hex, and be familiar with basic concepts such as clocks, flip flops, LUTs, and logic functions (and, or, xor, etc). Anything else I will do my best to explain where needed
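If LUTs are new to you, here is a minimal Python sketch of the idea. It is not taken from the actual design. A LUT is just a small truth table stored as bits, and the inputs form the address of the bit to read out. The names `lut_eval`, `AND2_INIT`, `OR2_INIT`, and `XOR2_INIT` are made up for illustration, and treating the first input as the least significant address bit is an assumed convention.

```python
def lut_eval(init: int, inputs: list[int]) -> int:
    """Read one output bit from a truth table stored as an integer.

    inputs[0] is treated as the least significant address bit
    (an assumed convention for this sketch).
    """
    address = 0
    for position, bit in enumerate(inputs):
        address |= (bit & 1) << position
    return (init >> address) & 1


# Truth tables for 2-input functions: one bit per input combination.
AND2_INIT = 0b1000  # hex 0x8: only address 3 (both inputs 1) reads 1
OR2_INIT = 0b1110   # hex 0xE: every address except 0 reads 1
XOR2_INIT = 0b0110  # hex 0x6: addresses 1 and 2 (inputs differ) read 1

if __name__ == "__main__":
    print("a b | and or xor")
    for a in (0, 1):
        for b in (0, 1):
            row = [lut_eval(t, [a, b]) for t in (AND2_INIT, OR2_INIT, XOR2_INIT)]
            print(a, b, "|", *row)
```

The same hardware can implement any 2-input function just by changing the stored bits, which is why binary and hex keep showing up when talking about LUT contents.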
Nice to have: I won’t be posting much RTL except where it’s necessary to explain something, but it’s still a good idea to have a passing familiarity with SystemVerilog. It’s syntactically similar to high-level languages such as C, so if you come from a software background, parts of it might feel familiar. Knowing in general how SVA (SystemVerilog Assertions) works is helpful as I tend to talk about validation quite a bit. Finally, some knowledge of FPGA architecture might come in handy when optimisation and timing topics come up, but I wouldn’t call it a necessity for following along
Hardware
Strictly speaking, you don’t really *need* hardware to follow along. You can always just simulate your design in Vivado, but for those of you excited by the idea of seeing actual graphics output on a real physical screen, I can only recommend Xilinx FPGAs. The three most affordable options are Spartan, Artix, and Zynq. Spartan is the cheapest, Artix boards are slightly larger and have transceivers, and Zynq is basically an Artix with a DDR controller and an ARM CPU included as hard IP. I’ve made GPUs with all three of them, and can say they are all good choices, but I currently use Zynq so I don’t have to waste precious board resources implementing a CPU
Once you decide on one of the above, it’s time to choose a board, preferably one with HDMI output since this is a GPU, and HDMI PMODs can be a bit hard to come by. I am currently using Digilent’s excellent Zybo Z7, but you can choose whatever board fits your needs. Cheaper options exist, especially from Chinese and Taiwanese makers, but if you are a beginner, you may want to consider not just price but also support. Digilent boards can be a bit on the expensive side, but their documentation is excellent, they have massive forums, and the community tends to be pretty helpful
You are also going to need a screen. In a pinch you can use any computer monitor with HDMI input, assuming it supports 640×480, which not all do. If you’re feeling fun, you can also get a small LCD screen and pretend you’re making a portable console. I’ve had good luck with a Waveshare 12030 I bought in Akihabara for 6000 yen, but there are lots of options. Just keep in mind that many of these screens were designed for the Raspberry Pi, and so you should be prepared to write the RTL for the touchscreen driver yourself if you want to use the touch function
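For reference, 640×480 usually means the classic 60Hz VGA timing. The Python sketch below adds up the commonly published values for that mode and works out the refresh rate. These numbers are the widely used industry-standard ones, not something taken from this design, and `AxisTiming` and `refresh_rate_hz` are hypothetical names, so check your own screen's datasheet before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisTiming:
    """Timing for one axis, in pixels (horizontal) or lines (vertical)."""
    visible: int
    front_porch: int
    sync: int
    back_porch: int

    @property
    def total(self) -> int:
        return self.visible + self.front_porch + self.sync + self.back_porch


# Commonly published 640x480 @ 60Hz values (assumed, verify for your screen).
H = AxisTiming(visible=640, front_porch=16, sync=96, back_porch=48)
V = AxisTiming(visible=480, front_porch=10, sync=2, back_porch=33)
PIXEL_CLOCK_HZ = 25_175_000


def refresh_rate_hz(h: AxisTiming, v: AxisTiming, pixel_clock_hz: int) -> float:
    """Frames per second = pixel clock / (total pixels per frame)."""
    return pixel_clock_hz / (h.total * v.total)


if __name__ == "__main__":
    print(H.total, V.total)                                  # 800 525
    print(round(refresh_rate_hz(H, V, PIXEL_CLOCK_HZ), 2))   # 59.94
```

Only 640 of the 800 pixel slots per line and 480 of the 525 lines per frame carry image data. The rest is blanking, and the GPU still has to keep time through it.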
Software
I mentioned above that I can only recommend Xilinx FPGAs, and one of the main reasons is Xilinx provides a fully featured free IDE called Vivado that contains all the development and debugging tools you’ll ever need in one place. You are going to want to make a Xilinx account and download it, since FPGA dev without it would be quite painful. IIRC open source tools also exist, but I’d very strongly recommend against using them since they are not made by the people who made the hardware, they don’t really support SystemVerilog so well, and they don’t support full usage of slice resources. You’re better off using the proper professional tools from Xilinx since they are free and incredibly powerful
For those of you going with Zynq, the workflow can be a bit painful. You’ll do the hardware side in Vivado, but then use a very rough and buggy tool called Vitis to do the CPU software side. The Vitis download includes Vivado as well, so there is no need to download Vivado separately. And when you run the installer, be sure to uncheck all the devices you are not using, or else you may end up with 100GB worth of stuff you don’t need
Vitis and Vivado are both available on Windows and Linux, but it is fairly clear Xilinx mainly cares about Linux. To be fair, they have been giving the Windows version a bit of love these last few years, and the improvements are noticeable, but as of 2024 the Linux version is so much faster and more stable than the Windows version, even on the exact same PC, that sometimes it can be hard to explain. Now don’t get me wrong, I despise Linux. It is very much a product whose philosophy I strongly disagree with, but the Linux Vivado experience is so much better that I am now using it as my primary OS. If you have a PC you don’t need for work/gamedev/software programming, it might be worth going through the ordeal of setting up Linux and using it as a dedicated FPGA machine. The setup will be painful, but if you never update/change/touch anything once it’s working, maybe it’s ok?
JPUv3
Here I want to outline some of the goals for this new GPU. Previous 3D designs were more like demos, in that vertices and textures were hardcoded in BRAMs, and there wasn’t much CPU control. This time, I am going for full programmability from CPU in an attempt to make something actually usable. My current list of hard requirements is as follows
- redo tile distribution to hit 4x perf and to only dispatch valid tiles
- redo rasterisation to hit 2x perf
- bring back texturing, but fix the texel address generation bottleneck
- bring back the gfx caches, but increase performance
- better memory fabric
- bring back render to texture, limited to power of two sizes
- fix the inefficient logic that was handling tiled/linear writes
- try other primitives besides triangles (tri strip?)
- finally deal with state for multiple draws in flight
- front end command buffers with label reads/writes, conditional branches, call/return, and JTS
- 180MHz is the absolute minimum gfx clk, but I am targeting 200MHz to 220MHz
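To put the clock targets in the last requirement into perspective, here is a quick Python calculation of the cycle time each frequency allows. It is plain arithmetic and ignores real-world details such as setup time and clock uncertainty. `clock_period_ns` is a made-up helper name.

```python
def clock_period_ns(frequency_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / frequency_mhz


if __name__ == "__main__":
    for mhz in (180, 200, 220):
        print(f"{mhz}MHz -> {clock_period_ns(mhz):.3f} ns per cycle")
    # 180MHz -> 5.556 ns per cycle
    # 200MHz -> 5.000 ns per cycle
    # 220MHz -> 4.545 ns per cycle
```

Going from the 180MHz floor to 220MHz removes roughly 1 ns from the budget of every register-to-register path.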
At the top of my nice-to-have list would be depth buffers, something that’s probably not going to happen this time, so don’t get your hopes up. Next are indices for building primitives. And finally, I’d like to make slight improvements to sync, both when waiting for previous graphics work and when waiting for a render target to become unused
To pull all this off, I need a plan of attack. Personally, I need to keep my ambition in check while somehow being even more obsessed with optimisation than I was before. I need to start favouring area more heavily in tradeoffs. Previous designs traded way too much area for minor optimisations, and so this time around, to get everything to fit I really need to avoid getting too clever
That’s all for now. I’ve not blogged in a while, so I am trying to ease back into it with some non-technical posts. I am also trying very hard to write shorter posts, as writing the super long ones on the previous blog was very very hard on me. If you have any questions, feel free to leave me a comment below, or just message me on BlueSky as I am no longer using Twitter