Implementing this renderer turned out to be quite challenging, as expected. Two of the major problems I faced concerned the speed and accuracy of the output. In retrospect, I think I chose the wrong algorithms, and both suffered as a result. I did, however, have good success optimising with x86 assembly: careful use of the SSE instruction set considerably improved the speed of critical sections such as line filling and screen clearing.
While it might not be the most advanced software renderer in the world, I think the results were pretty good for a first attempt. I certainly learnt a lot from working on the project, and I have a much better understanding of how 3D hardware works internally because of it, so it was well worthwhile in that regard.