FBlockedGrid3f

Note on the block size: 8 seems to be the sweet spot, with vastly better performance in SDF generation, perhaps because a block of floats (8 x 8 x 8 values at 4 bytes each, or 2 KiB) can fit in the L1 cache with room to spare.
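
To make the size argument concrete, here is a minimal sketch of a blocked float grid with a block edge of 8. It is an illustration only and not the actual FBlockedGrid3f implementation: the names BlockedGrid3f, BlockSize, BlockCells, BlockBytes, LinearIndex, and At are hypothetical, the storage is dense rather than allocated per block, and it assumes 4-byte floats and cubic blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: names and layout are assumptions, not the real class.
constexpr int BlockSize = 8;                                   // cells per block edge
constexpr int BlockCells = BlockSize * BlockSize * BlockSize;  // 512 cells per block
constexpr std::size_t BlockBytes = BlockCells * sizeof(float); // bytes per block

// Assumes sizeof(float) == 4; one block is then 2 KiB, a small
// fraction of a typical L1 data cache (often tens of KiB).
static_assert(BlockBytes == 2048, "an 8x8x8 float block occupies 2 KiB");

struct BlockedGrid3f {
    int BlocksX, BlocksY, BlocksZ;  // number of blocks along each axis
    std::vector<float> Data;        // blocks stored back to back

    // Round each dimension up to a whole number of blocks.
    BlockedGrid3f(int DimX, int DimY, int DimZ, float Initial = 0.0f)
        : BlocksX((DimX + BlockSize - 1) / BlockSize),
          BlocksY((DimY + BlockSize - 1) / BlockSize),
          BlocksZ((DimZ + BlockSize - 1) / BlockSize),
          Data(static_cast<std::size_t>(BlocksX) * BlocksY * BlocksZ * BlockCells,
               Initial) {}

    // Map a cell coordinate to its position in Data. All cells of one
    // block are contiguous, so work confined to a block stays within 2 KiB.
    std::size_t LinearIndex(int X, int Y, int Z) const {
        assert(X >= 0 && Y >= 0 && Z >= 0);
        const int BX = X / BlockSize, BY = Y / BlockSize, BZ = Z / BlockSize;
        assert(BX < BlocksX && BY < BlocksY && BZ < BlocksZ);
        const int LX = X % BlockSize, LY = Y % BlockSize, LZ = Z % BlockSize;
        const std::size_t Block =
            (static_cast<std::size_t>(BZ) * BlocksY + BY) * BlocksX + BX;
        const std::size_t Local =
            (static_cast<std::size_t>(LZ) * BlockSize + LY) * BlockSize + LX;
        return Block * BlockCells + Local;
    }

    float& At(int X, int Y, int Z) { return Data[LinearIndex(X, Y, Z)]; }
};
```

With this layout, a 20 x 20 x 20 grid rounds up to 3 x 3 x 3 blocks, and neighboring cells inside a block are at most 2 KiB apart in memory. Whether that locality is the real cause of the speedup is only a guess; profiling would be needed to confirm it.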