How do YOU multithread?
Re: How do YOU multithread?
Derp, what is all this calamity about ?, looks interesting. I too had it crash on drawing ... something amiss here ? ...
Curiosity killed the cat, satisfaction brought him back
-
- Posts: 4373
- Joined: February 28th, 2013, 3:23 am
- Location: Oklahoma, United States
Re: How do YOU multithread?
chili wrote:
...
P.S. You can get around the branching if with the pshufb instruction I think.
Also, you might want to look into XOP instructions. They are only on AMD processors, and I think they're being discontinued, but they look pretty sexy anyways.
Phenom II processors don't have SSSE3, and apparently they don't have XOP either. Neither was implemented until the FX chips with the Bulldozer architecture, so I'm stuck with the SSE/SSE2 integer functions until I upgrade sometime around Christmas.
I've been having trouble concentrating the past few days; I still think I drank too much last weekend and haven't fully recovered. I started using the shuffle method only because I wasn't aware (or forgot) that there is a shift that moves all 128 bits as a single element rather than as smaller data types. I will still have to use if blocks, since the shift counts have to be immediate constants and can't come from an array of constants, which would have been super cool.
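To illustrate why the if blocks (or a switch) are still needed: `_mm_srli_si128` (PSRLDQ) takes its byte count as a compile-time immediate, so a runtime offset has to be dispatched to one of several hard-coded calls. This is only a minimal sketch, assuming SSE2 and offsets measured in whole 32-bit pixels. `ShiftRightPixels` is a made-up helper name, not something from the attached project.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>

// Shifts the whole 128-bit register right by `pixels` 32-bit pixels (0..4).
// PSRLDQ only accepts an immediate byte count, so each case hard-codes one.
// (Hypothetical helper for illustration.)
inline __m128i ShiftRightPixels(__m128i v, int pixels)
{
    assert(pixels >= 0 && pixels <= 4);
    switch (pixels)
    {
    case 0:  return v;
    case 1:  return _mm_srli_si128(v, 4);
    case 2:  return _mm_srli_si128(v, 8);
    case 3:  return _mm_srli_si128(v, 12);
    default: return _mm_setzero_si128();  // all 16 bytes shifted out
    }
}
```

The branch is on a value that usually stays constant for a whole row, so it should predict well. With SSSE3, a pshufb with a mask loaded from a table could replace the switch, but that isn't available on a Phenom II.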
Thanks for the offer about your code, but I'm bound and determined to work on it a bit longer. I appreciate the mini lessons here.
If you think paging some data from disk into RAM is slow, try paging it into a simian cerebrum over a pair of optical nerves. - gameprogrammingpatterns.com
Re: How do YOU multithread?
MrGodin wrote:
Derp, what is all this calamity about ?, looks interesting.
SSE stuff. Fun stuff.
albinopapa wrote:
Thanks for the offer about your code, but I'm bound and determined to work on it a bit longer. I appreciate the mini lessons here.
Awesome, that's what I like to hear, but don't knock yourself out; sometimes you need to take a break for a few days/weeks/months. It'll always be there when the interest rekindles.
By the way, could you test this out for me on your machine? Press the space bar to activate unaligned mode. I need data for aligned and unaligned mode for the cases of POS=0 and POS=1 (use the arrow keys to change position).
- Attachments
- yes.zip (174.33 KiB)
Chili
Re: How do YOU multithread?
Dead on Run. Debug shows error at movntdqa.
Code:
00171C16 add eax,dword ptr [ebx]
00171C18 ret 0F8C1h
00171C1B add al,byte ptr [ebx]
00171C1D enter 0E1C1h,4
00171C21 mov dword ptr [ebp+8],ecx
00171C24 mov edx,dword ptr [edi]
00171C26 mov eax,dword ptr [ebp-0Ch]
00171C29 mov ecx,dword ptr [edi+0Ch]
00171C2C imul edx,esi
00171C2F add eax,edx
00171C31 lea edi,[ecx+eax*4]
00171C34 mov eax,dword ptr [ebp-10h]
00171C37 add eax,edx
00171C39 lea ecx,[ecx+eax*4]
00171C3C mov eax,dword ptr [ebp-14h]
00171C3F mov eax,dword ptr [eax+14h]
00171C42 add eax,dword ptr [ebp+8]
00171C45 cmp dword ptr [ebp-0Ch],0
00171C49 je 00171C58
00171C4B movntdqa xmm4,xmmword ptr [edi-10h]
00171C51 psrldq xmm4,8
00171C56 jmp 00171C5C
00171C58 movdqa xmm4,xmm7
00171C5C cmp edi,ecx
00171C5E jae 00171D08
00171C64 movntdqa xmm6,xmmword ptr [edi] // arrow is here
00171C69 movdqa xmm5,xmm6
00171C6D add edi,10h
00171C70 pslldq xmm5,8
00171C75 por xmm5,xmm4
00171C79 movntdqa xmm4,xmmword ptr [eax]
00171C7E movdqa xmm2,xmm5
00171C82 punpcklbw xmm5,xmm7
Re: How do YOU multithread?
It also crashes on my Core 2 Duo (Vista, 32-bit), but runs on my i7 (Win7, 64-bit) and on my i5 (Win7, 32-bit) at work. It could be that the sprite isn't allocated with an aligned malloc, but then it should fail on the first mov, not the second. So maybe it's running past the end of the memory block, and some OSes are fine with that while others get pissed... Gonna have to look into this.
Chili
Re: How do YOU multithread?
Just wondering,
isn't that where you have if == and if >=? Perhaps the first one is just being skipped. Also, going by the MSDN docs, I don't think movntdqa is what _mm_stream_load uses, and it's SSE4.1, which Phenom doesn't support, but your Core 2 should. I just tried on a Core 2 Duo with Windows 10 64-bit; it crashes on open and won't let me go into the debugger.
Code:
je
jae
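For reference, Intel's intrinsics guide lists `_mm_stream_load_si128` as the intrinsic for MOVNTDQA and files it under SSE4.1. A CPU without SSE4.1 would be expected to fault on that instruction no matter how the data is aligned. A minimal sketch of a runtime check follows, assuming MSVC or GCC/Clang on x86. The helper names are made up for illustration; the bit positions are the CPUID leaf 1 ECX feature flags.

```cpp
#include <cassert>
#if defined(_MSC_VER)
#include <intrin.h>     // __cpuid
#else
#include <cpuid.h>      // __get_cpuid (GCC/Clang)
#endif

// Returns bit `bit` of ECX from CPUID leaf 1 (basic feature flags).
// (Hypothetical helper for illustration.)
inline bool CpuidLeaf1EcxBit(int bit)
{
    assert(bit >= 0 && bit < 32);
#if defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};            // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    const unsigned ecx = static_cast<unsigned>(regs[2]);
#else
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                      // leaf 1 not available
#endif
    return ((ecx >> bit) & 1u) != 0;
}

inline bool HasSSSE3() { return CpuidLeaf1EcxBit(9); }   // pshufb
inline bool HasSSE41() { return CpuidLeaf1EcxBit(19); }  // movntdqa
```

A build that uses movntdqa could call `HasSSE41()` once at startup and fall back to plain `_mm_load_si128` / `_mm_loadu_si128` paths when it returns false.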
Re: How do YOU multithread?
Alright, replaced new with aligned malloc for the sysBuffer and got rid of the nt moves. It works on my laptop now, so hopefully it'll work on your machine too.
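Swapping `new` for an aligned allocation could look roughly like the sketch below. It assumes `_mm_malloc` / `_mm_free` are available through the SSE headers, as they are on the common compilers. `PixelBuffer`, `MakePixelBuffer` and `IsAligned16` are invented names, not the ones used in the attached project.

```cpp
#include <emmintrin.h>  // SSE2; also brings in _mm_malloc / _mm_free
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Deleter so the buffer is released with _mm_free instead of delete[].
struct AlignedFree
{
    void operator()(void* p) const { _mm_free(p); }
};
using PixelBuffer = std::unique_ptr<std::uint32_t[], AlignedFree>;

// Allocates `pixelCount` 32-bit pixels on a 16-byte boundary.
// (Hypothetical helper for illustration.)
inline PixelBuffer MakePixelBuffer(std::size_t pixelCount)
{
    assert(pixelCount > 0);
    void* raw = _mm_malloc(pixelCount * sizeof(std::uint32_t), 16);
    if (raw == nullptr)
        throw std::bad_alloc();
    return PixelBuffer(static_cast<std::uint32_t*>(raw));
}

// True if p can be used with movdqa / _mm_load_si128.
inline bool IsAligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

Asserting `IsAligned16` on the sprite and sysBuffer pointers right after allocation is a cheap way to rule out alignment as the cause of a crash on an aligned move.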
Interesting results for me: on my Haswell I get no speed difference at pos0 (the aligned case), whereas at pos1 the aligned routine takes about a 17% performance hit. Unaligned access is strictly better there.
On the Core 2, unaligned is 33% slower at pos0, and the gap grows to a 2x slowdown at pos1. Aligned access is strictly better for performance (unaligned still wins for ease/elegance of code, of course).
So which instructions/routine you should choose really does depend on the processor. I'm interested in seeing how your Phenom II (and FX, if you can access one) compares.
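For anyone who wants to reproduce this kind of comparison outside the test app, here is a rough sketch of a timing harness. It is not the routine in yes.zip: it just sums pixels with either movdqa-style or movdqu-style loads, and it assumes that "pos1" means the data starts one 32-bit pixel past a 16-byte boundary. All names are hypothetical.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Sums `count` pixels (count % 4 == 0) with aligned or unaligned 16-byte loads.
// With Aligned == true, p must be 16-byte aligned.
template <bool Aligned>
std::uint32_t SumPixels(const std::uint32_t* p, std::size_t count)
{
    assert(count % 4 == 0);
    assert(!Aligned || reinterpret_cast<std::uintptr_t>(p) % 16 == 0);
    __m128i acc = _mm_setzero_si128();
    for (std::size_t i = 0; i < count; i += 4)
    {
        const __m128i* src = reinterpret_cast<const __m128i*>(p + i);
        acc = _mm_add_epi32(acc, Aligned ? _mm_load_si128(src)
                                         : _mm_loadu_si128(src));
    }
    alignas(16) std::uint32_t lanes[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

// Calls f `reps` times and returns the elapsed wall-clock time in seconds.
// The volatile sink keeps the compiler from discarding the work.
template <typename F>
double TimeSeconds(F f, int reps)
{
    volatile std::uint32_t sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i)
        sink = sink + f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Timing `SumPixels<true>(buf, n)`, `SumPixels<false>(buf, n)` and `SumPixels<false>(buf + 1, n)` on a 16-byte-aligned `buf` gives the aligned, unaligned pos0 and unaligned pos1 cases. An aligned pos1 routine would additionally need the load-shift-combine work seen in the disassembly earlier in the thread.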
- Attachments
- yes.zip (173.87 KiB)
Chili
Re: How do YOU multithread?
Not sure what the pos thing is about, but Phenom II 955 3.2GHz quad core:
Code:
aligned pos 0: 0.184 - 0.187
unaligned pos 0: 0.185 - 0.190
// frustrating to get to pos 1
aligned pos 1: 0.210 - 0.214
unaligned pos 1: 0.198 - 0.201
Re: How do YOU multithread?
I might have a solution to my original code acting funny on your machines. Check D3DGraphics::EndFrame. I don't account for any padding; I only assume the pitch will be a multiple of 16. That may not be the case at 1920x1080, but on my Core 2 Duo's onboard graphics at 800x600 the pitch was 4096 instead of 3200. The computer I code on has a GTX 560, and there the pitch has always been 4 * screen width. So it's possible the pitch is what's causing the problem.
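A pitch-aware copy might look like the sketch below. It is not the actual D3DGraphics::EndFrame code. It assumes a tightly packed 32-bit source buffer and a destination described by a base pointer and a pitch in bytes, which is the information a locked surface provides. `CopyRowsWithPitch` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies a tightly packed width x height image of 32-bit pixels into a
// destination whose rows start `pitchBytes` apart (pitchBytes >= width * 4).
// Padding bytes at the end of each destination row are left untouched.
inline void CopyRowsWithPitch(void* dstBits, std::size_t pitchBytes,
                              const std::uint32_t* src,
                              std::size_t width, std::size_t height)
{
    const std::size_t rowBytes = width * sizeof(std::uint32_t);
    assert(pitchBytes >= rowBytes);
    unsigned char* dstRow = static_cast<unsigned char*>(dstBits);
    for (std::size_t y = 0; y < height; ++y)
    {
        std::memcpy(dstRow, src + y * width, rowBytes);
        dstRow += pitchBytes;  // advance by the pitch, not by rowBytes
    }
}
```

With the 800x600 numbers above, each source row is 3200 bytes but each destination row starts 4096 bytes after the previous one. A single flat copy of width * height * 4 bytes would skew every row after the first.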
Anyway, I know we've moved on, but just thought of it and figured I'd at least put it out there.
Re: How do YOU multithread?
AMD FX 8350
Aligned POS 0: 0.104
Unaligned POS 0: 0.096
Aligned POS 1: 0.110
Unaligned POS 1: 0.100