How do YOU multithread?

albinopapa
Posts: 4373
Joined: February 28th, 2013, 3:23 am
Location: Oklahoma, United States

Re: How do YOU multithread?

Post by albinopapa » August 3rd, 2015, 4:47 am

From what I can tell, unaligned memory access is slower on my Phenom II, and the penalty is more significant on the Intel chips, but AMD chips seem to handle unaligned access rather well. Perhaps writing SSE code with _mm_loadu_si128 and _mm_storeu_si128 would not only be easier, but on modern processors wouldn't be all that inefficient either.

That being said though, I'm sure doing this on the pixel array makes a big difference.
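
To make that concrete, here is a rough sketch of a pixel loop that only uses the unaligned load/store intrinsics, so the buffer can start at any address. BrightenPixels and the brightening operation are made up for illustration, and the SSE2 path is only compiled when the compiler reports SSE2 support.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h> // SSE2: _mm_loadu_si128, _mm_storeu_si128, _mm_adds_epu8
#define HAVE_SSE2 1
#endif

// Hypothetical helper: saturating-add `amount` to every byte of a buffer of
// 32-bit pixels. The buffer does not need to be 16-byte aligned.
void BrightenPixels(std::uint32_t* pixels, std::size_t count, std::uint8_t amount)
{
    assert(pixels != nullptr || count == 0);
    std::size_t i = 0;
#ifdef HAVE_SSE2
    const __m128i add = _mm_set1_epi8(static_cast<char>(amount));
    // Four pixels (16 bytes) per iteration, with no alignment requirement.
    for (; i + 4 <= count; i += 4)
    {
        __m128i px = _mm_loadu_si128(reinterpret_cast<const __m128i*>(pixels + i));
        px = _mm_adds_epu8(px, add); // per-byte add, clamped at 255
        _mm_storeu_si128(reinterpret_cast<__m128i*>(pixels + i), px);
    }
#endif
    // Scalar loop for the leftover pixels (or for everything without SSE2).
    for (; i < count; ++i)
    {
        std::uint32_t out = 0;
        for (int c = 0; c < 4; ++c)
        {
            const std::uint32_t v = ((pixels[i] >> (8 * c)) & 0xFFu) + amount;
            out |= (v > 255u ? 255u : v) << (8 * c);
        }
        pixels[i] = out;
    }
}
```

There is no prologue to reach an aligned address, which is where the "easier to code" part comes from. Whether it is fast enough depends on the processor.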
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com

chili
Site Admin
Posts: 3948
Joined: December 31st, 2011, 4:53 pm
Location: Japan

Re: How do YOU multithread?

Post by chili » August 4th, 2015, 4:20 am

Unaligned cases will always be slower. The important difference here is between loading with an unaligned load instruction and loading with aligned loads and then fixing up. On my Core 2 (and my old Nehalem i7, if I remember correctly), it was faster to fix up by hand than to load directly with the unaligned load instructions. However, my new Haswell i7 (and my i5 at work, not sure which microarchitecture) is significantly better with the unaligned load and store instructions.
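
For anyone wondering what the two options look like side by side, here is a minimal sketch of one way to do the fix-up: two aligned loads combined with byte shifts, next to the single unaligned load it replaces. The helper names are made up, the offset is a compile-time constant because the SSE2 byte shifts need an immediate, and this is x86/x64 only. It is an illustration of the idea, not the exact code I benchmarked.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h> // SSE2 intrinsics (x86/x64 only)

// Hypothetical helper: read the 16 bytes that start ByteOffset bytes past a
// 16-byte-aligned address, using only aligned loads plus shifts.
// Note that it touches the full 32 bytes starting at alignedBase.
template<int ByteOffset>
__m128i LoadWithFixup(const std::uint8_t* alignedBase)
{
    static_assert(ByteOffset > 0 && ByteOffset < 16, "offset must be 1..15");
    assert(reinterpret_cast<std::uintptr_t>(alignedBase) % 16 == 0);
    const __m128i* p = reinterpret_cast<const __m128i*>(alignedBase);
    const __m128i lo = _mm_load_si128(p);     // bytes 0..15
    const __m128i hi = _mm_load_si128(p + 1); // bytes 16..31
    // Drop the first ByteOffset bytes of lo, then fill the top from hi.
    return _mm_or_si128(_mm_srli_si128(lo, ByteOffset),
                        _mm_slli_si128(hi, 16 - ByteOffset));
}

// The same read done with a single unaligned load instruction.
inline __m128i LoadUnaligned(const std::uint8_t* p)
{
    return _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
}

// True when all 16 bytes of a and b are equal.
inline bool SameBytes(__m128i a, __m128i b)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}
```

Both routes produce the same register contents; the only question is which one a given chip executes faster.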

I am impressed that your chip processes the unaligned loads so well. Maybe AMD was a little ahead of Intel in speeding up those accesses.

Still, from a programmer's perspective, it might be worthwhile to use different codepaths depending on the processor. There are still a lot of processors out there that would benefit, and the whole idea of optimizing is to let a wider range of systems reach a playable level of performance, not just to see how many pixels (or whatever) you can push per second on the fastest machines (although that is fun too).
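
One hedged way to pick a codepath without keeping a table of CPU models is to time the candidates once at startup and keep whichever wins on the machine you are running on. The sketch below assumes that approach; CopyLoop and CopyMemcpy are just placeholder codepaths standing in for something like a fix-up version and an unaligned-load version, and all the names are made up.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using CopyFn = void (*)(std::uint32_t* dst, const std::uint32_t* src, std::size_t count);

// Placeholder codepath A: plain element-by-element loop.
void CopyLoop(std::uint32_t* dst, const std::uint32_t* src, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) dst[i] = src[i];
}

// Placeholder codepath B: library memcpy.
void CopyMemcpy(std::uint32_t* dst, const std::uint32_t* src, std::size_t count)
{
    std::memcpy(dst, src, count * sizeof(std::uint32_t));
}

// Runs one codepath `repeats` times over a scratch buffer and returns seconds.
// A real benchmark would need warm-up and protection against the optimizer.
double TimeCodepath(CopyFn fn, std::size_t count, int repeats)
{
    assert(fn != nullptr);
    std::vector<std::uint32_t> src(count, 0xFF00FF00u), dst(count);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) fn(dst.data(), src.data(), count);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Returns whichever candidate measured faster on this machine.
CopyFn PickFasterCodepath(CopyFn a, CopyFn b)
{
    const double ta = TimeCodepath(a, std::size_t(1) << 16, 50);
    const double tb = TimeCodepath(b, std::size_t(1) << 16, 50);
    return ta <= tb ? a : b;
}
```

You would call PickFasterCodepath once during initialization, store the returned pointer, and call through it in the frame loop.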

Edit.

I just noticed the FX numbers you put up. Thanks for that! Seems like you're getting faster performance with the unaligned instruction even in the aligned case there. Weird. Anyways, the moral is, on recent processors, unaligned moves are strictly better than trying to roll your own alignment fixing scheme.
Chili
