Commit

Paper Revision{2023.findings-acl.38}, closes #3885.
anthology-assist committed Sep 17, 2024
1 parent c45051c commit 536512f
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion data/xml/2023.findings.xml
@@ -3166,10 +3166,12 @@
<author><first>Ryan</first><last>Cotterell</last><affiliation>ETH Zürich</affiliation></author>
<pages>598-614</pages>
<abstract>Byte-Pair Encoding (BPE) is a popular algorithm used for tokenizing data in NLP, despite being devised initially as a compression method. BPE appears to be a greedy algorithm at face value, but the underlying optimization problem that BPE seeks to solve has not yet been laid down. We formalize BPE as a combinatorial optimization problem. Via submodular functions, we prove that the iterative greedy version is a (1/σ)(1 − e^(−σ))-approximation of an optimal merge sequence, where σ is the total backward curvature with respect to the optimal merge sequence. Empirically, the lower bound of the approximation is ≈ 0.37. We provide a faster implementation of BPE which improves the runtime complexity from O(NM) to O(N log M), where N is the sequence length and M is the merge count. Finally, we optimize the brute-force algorithm for optimal BPE using memoization.</abstract>
-<url hash="40ed62e8">2023.findings-acl.38</url>
+<url hash="8867d506">2023.findings-acl.38</url>
<bibkey>zouhar-etal-2023-formal</bibkey>
<doi>10.18653/v1/2023.findings-acl.38</doi>
<video href="2023.findings-acl.38.mp4"/>
+<revision id="1" href="2023.findings-acl.38v1" hash="40ed62e8"/>
+<revision id="2" href="2023.findings-acl.38v2" hash="8867d506" date="2024-09-17">Fix typos in Proof of Theorem 4.2 and Algorithm 3 as well as the malformed rendering of Figure 3.</revision>
</paper>
<paper id="39">
<title>Automatic Named Entity Obfuscation in Speech</title>