Renamed `bsz` to `bs` for consistency; removed dead code #299
Conversation
ghstack-source-id: 0b273e8f81013c1c632f0c505b7229d51af3e488 Pull Request resolved: #299
@@ -132,7 +132,6 @@ class Attention(nn.Module):
     Attributes:
         n_kv_heads (int): Number of key and value heads.
         n_heads (int): Number of query heads.
-        n_local_kv_heads (int): Number of local key and value heads.
Not an attribute (there is only one occurrence of `n_local_kv_heads` if you search in this file).
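For context, a minimal sketch of the distinction the comment is drawing; the class skeleton and numbers here are illustrative, not the actual torchtitan code:

```python
import torch.nn as nn


class Attention(nn.Module):
    """Illustrative skeleton, not the actual torchtitan class.

    Attributes:
        n_kv_heads (int): Number of key and value heads.
        n_heads (int): Number of query heads.
    """

    def __init__(self, n_heads: int = 8, n_kv_heads: int = 4):
        super().__init__()
        # Only values stored on self belong in the Attributes section.
        self.n_heads = n_heads
        self.n_kv_heads = n_kv_heads

    def local_kv_heads(self, world_size: int = 1) -> int:
        # Bound only inside a method: a local variable, not an attribute,
        # so it should not be listed in the class docstring.
        n_local_kv_heads = self.n_kv_heads // world_size
        return n_local_kv_heads
```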
@@ -183,12 +182,12 @@ def forward(
         torch.Tensor: Output tensor after attention.

     """
     bsz, seqlen, _ = x.shape
All inline comments in this method use `bs` for batch size, so we can make this `bs` for consistency.
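A minimal sketch of the suggested rename; the standalone function here is hypothetical, not the actual `forward` method:

```python
import torch


def attention_shapes(x: torch.Tensor) -> tuple[int, int]:
    # bs = batch size, matching the naming used by the inline comments
    # in the rest of the method; `_` discards the model dimension.
    bs, seqlen, _ = x.shape
    return bs, seqlen


print(attention_shapes(torch.zeros(2, 16, 64)))  # (2, 16)
```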
@@ -421,7 +420,7 @@ def forward(self, tokens: torch.Tensor):
         torch.Tensor: Output logits after applying the Transformer model.

     """
     _bsz, seqlen = tokens.shape
Similarly, `_bsz` is unused, so just remove it.
If it helps readability to know that `tokens.shape` is (batch size, sequence length), I can keep it and maybe rename it to `_bs`?
Although not used, it improves code readability: it tells how many dimensions `tokens` has and what they are, so IMO I'd keep it. Also, the unusedness is already indicated by the `_` prefix.
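A minimal sketch of the convention under discussion; the tensor here is illustrative:

```python
import torch

tokens = torch.zeros(2, 16, dtype=torch.long)

# The leading underscore tells readers (and most linters) that the name is
# intentionally unused, while the unpacking itself documents that `tokens`
# is 2-D: (batch size, sequence length).
_bs, seqlen = tokens.shape
print(seqlen)  # 16
```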
> If it helps readability to know that `tokens.shape` is (batch size, sequence length), I can keep it and maybe rename it to `_bs`?
Just saw this message; yeah, I agree.
Changed it to `_bs`.
One comment inline.
some minor cleanups
ghstack-source-id: bbedad3819ab9ef90b233209c34dd1dbc846b06a Pull Request resolved: #299
Stack from ghstack (oldest at bottom):
- Renamed `bsz` to `bs` for consistency; removed dead code #299