Chapter 3 - ISA: X86 duality #44

Merged
merged 5 commits into from
Apr 30, 2024

Conversation

pveentjer
Contributor

@pveentjer pveentjer commented Apr 2, 2024

Added a note about the dual load-store and register-memory behavior of X86.

@pveentjer pveentjer changed the title ISA: X86 duality Chapter 3 - ISA: X86 duality Apr 3, 2024
@dendibakh
Owner

I'm not sure what value this provides to readers. If there is an implicit point you're trying to make, I suggest making it explicit.

@pveentjer
Contributor Author

pveentjer commented Apr 9, 2024

Hi Denis,

thank you for the review.

The point I'm trying to make is that the initial section states that modern architectures are load-store. X86 is one of the most widely used ISAs, and it is a register-memory architecture, not a load-store one. As a consequence, a reader of the book could falsely assume that the X86 ISA is a load-store architecture.

If you think this brings no value, I'll close the PR.

If you think this brings value, I'm happy to rewrite it. Perhaps add it as a footnote?

Regards,

Peter.

@dendibakh
Owner

Ok, I changed the original paragraph:
register-based, load-store architectures -> register-memory architectures
Let me know if that's good now.

@pveentjer
Contributor Author

pveentjer commented Apr 10, 2024

Hi Denis,

the original text was correct: modern ISAs like RISC-V and ARM are load-store architectures.

The X86 ISA is register-memory, but once instructions are converted into uops, the X86 microarchitecture effectively operates as a load-store architecture as well.

@dendibakh
Owner

Ooops, of course, you're right. I was implicitly thinking about x86 again. :)
Will fix.

@dendibakh
Owner

Please check now.

@pveentjer
Contributor Author

pveentjer commented Apr 13, 2024

What is still missing is that, as a result of the uops conversion, the X86 microarchitecture behaves as a load-store architecture.

Given the following code (pseudo-assembly; a real X86 instruction takes at most one explicit memory operand):

add [C],[A],[B] ;; C=A+B

After uops conversion it could look like this:

load r1, [A]         ;; load [A] in r1
load r2, [B]         ;; load [B] in r2
add r3, r1, r2       ;; add r1 and r2 and write it to r3
store r3, [C]        ;; store r3 in [C]

I find it very helpful because I need to think a lot less about the complex addressing modes and it helps me to understand the performance opportunities. In the first example, it isn't immediately clear that the loads of [A] and [B] can be performed out of order, but in the uops version, it is much more obvious.
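To make the dependency argument concrete, here is a minimal C sketch of a toy "cracker" for the pseudo-instruction `add [C],[A],[B]` above. Everything in it is an assumption made for illustration: the `uop` struct, the `crack_add_mem`, `depends_on` and `run_uops` helpers, and the numbering of temporary registers are hypothetical, and real uop formats are microarchitecture-specific. It only tracks register dependencies, not memory ordering.

```c
#include <stddef.h>

/* Hypothetical micro-op kinds for this toy model. */
enum uop_kind { UOP_LOAD, UOP_ADD, UOP_STORE };

struct uop {
    enum uop_kind kind;
    int dst;   /* temp register written, or -1 if none */
    int src1;  /* temp register read, or -1 if none */
    int src2;  /* temp register read, or -1 if none */
    int addr;  /* memory index for load/store, or -1 if none */
};

/* Crack "add [c],[a],[b]" into load/load/add/store uops.
   Returns the number of uops written to out. */
size_t crack_add_mem(int c, int a, int b, struct uop out[4]) {
    out[0] = (struct uop){ UOP_LOAD,  1, -1, -1, a  };  /* r1 = mem[a]   */
    out[1] = (struct uop){ UOP_LOAD,  2, -1, -1, b  };  /* r2 = mem[b]   */
    out[2] = (struct uop){ UOP_ADD,   3,  1,  2, -1 };  /* r3 = r1 + r2  */
    out[3] = (struct uop){ UOP_STORE, -1, 3, -1, c  };  /* mem[c] = r3   */
    return 4;
}

/* Nonzero if `later` reads a temp register that `earlier` writes
   (a read-after-write dependency between the two uops). */
int depends_on(const struct uop *later, const struct uop *earlier) {
    return earlier->dst != -1 &&
           (later->src1 == earlier->dst || later->src2 == earlier->dst);
}

/* Execute the uops in program order against a tiny memory array. */
void run_uops(const struct uop *u, size_t n, int *mem) {
    int r[4] = {0};  /* temp registers r1..r3; r[0] is unused */
    for (size_t i = 0; i < n; i++) {
        switch (u[i].kind) {
        case UOP_LOAD:  r[u[i].dst] = mem[u[i].addr];               break;
        case UOP_ADD:   r[u[i].dst] = r[u[i].src1] + r[u[i].src2];  break;
        case UOP_STORE: mem[u[i].addr] = r[u[i].src1];              break;
        }
    }
}
```

In this model, `depends_on` reports no dependency between the two loads, so they could be issued in either order, while the add depends on both loads and the store depends on the add.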

It could also help to prevent people from manually 'optimizing' code like this (C example):

register int a=A; 
register int b=B;
C=a+b;

This is written by a 'smart' engineer who wants to help the CPU by giving more opportunities for out-of-order execution because he doesn't understand the uops version of add [C],[A],[B]. But he is just making the code more complex and bigger and the CPU will already do this for him anyway.
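One way to check this claim is to put both variants side by side, as in the sketch below. The globals `A`, `B`, `C` and the function names `add_direct` and `add_via_locals` are hypothetical stand-ins for the example. I would expect an optimizing compiler to emit the same instructions for both functions, but that is an assumption worth verifying with your own compiler and flags.

```c
/* Hypothetical globals standing in for A, B and C from the example. */
int A = 2, B = 3, C;

/* Straightforward version: the compiler chooses the loads and the store. */
void add_direct(void) {
    C = A + B;
}

/* Hand-"optimized" version: same semantics, just more source code. */
void add_via_locals(void) {
    int a = A;
    int b = B;
    C = a + b;
}
```

Compiling this with something like `gcc -O2 -S` and comparing the two function bodies in the generated assembly shows whether the extra locals changed anything.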

@dendibakh
Owner

Ok, I agree, but I think this section is not the best place to discuss uops. I have a section for this: 4-4 UOP.md
There I talk about uops cracking. Please check it and let me know if you have any comments.

@dendibakh
Owner

Thank you @pveentjer !

@dendibakh dendibakh merged commit b0baf29 into dendibakh:main Apr 30, 2024
1 check passed