gobyexample/public/strings-and-runes

268 lines
16 KiB
Plaintext
Generated

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Go by Example: Strings and Runes</title>
<link rel=stylesheet href="site.css">
</head>
<script>
onkeydown = (e) => {
if (e.key == "ArrowLeft") {
window.location.href = 'pointers';
}
if (e.key == "ArrowRight") {
window.location.href = 'structs';
}
}
</script>
<body>
<div class="example" id="strings-and-runes">
<h2><a href="./">Go by Example</a>: Strings and Runes</h2>
<table>
<tr>
<td class="docs">
<p>A Go string is a read-only slice of bytes. The language
and the standard library treat strings specially - as
containers of text encoded in <a href="https://en.wikipedia.org/wiki/UTF-8">UTF-8</a>.
In other languages, strings are made of &ldquo;characters&rdquo;.
In Go, the concept of a character is called a <code>rune</code> - it&rsquo;s
an integer that represents a Unicode code point.
<a href="https://go.dev/blog/strings">This Go blog post</a> is a good
introduction to the topic.</p>
</td>
<td class="code empty leading">
</td>
</tr>
<tr>
<td class="docs">
</td>
<td class="code leading">
<a href="https://go.dev/play/p/39BpTLf6BAz"><img title="Run code" src="play.png" class="run" /></a><img title="Copy code" src="clipboard.png" class="copy" />
<pre class="chroma"><code><span class="line"><span class="cl"><span class="kn">package</span> <span class="nx">main</span></span></span></code></pre>
</td>
</tr>
<tr>
<td class="docs">
</td>
<td class="code leading">
<pre class="chroma"><code><span class="line"><span class="cl"><span class="kn">import</span> <span class="p">(</span>
</span></span><span class="line"><span class="cl"> <span class="s">&#34;fmt&#34;</span>
</span></span><span class="line"><span class="cl"> <span class="s">&#34;unicode/utf8&#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span></span></span></code></pre>
</td>
</tr>
<tr>
<td class="docs">
</td>
<td class="code leading">
<pre class="chroma"><code><span class="line"><span class="cl"><span class="kd">func</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span></span></span></code></pre>
</td>
</tr>
<tr>
<td class="docs">
<p><code>s</code> is a <code>string</code> assigned a literal value
representing the word &ldquo;hello&rdquo; in the Thai
language. Go string literals are UTF-8
encoded text.</p>
</td>
<td class="code leading">
<pre class="chroma"><code><span class="line"><span class="cl"> <span class="kd">const</span> <span class="nx">s</span> <span class="p">=</span> <span class="s">&#34;สวัสดี&#34;</span></span></span></code></pre>
</td>
</tr>
<tr>
<td class="docs">
<p>Since strings are equivalent to <code>[]byte</code>, this
will produce the length of the raw bytes stored within.</p>
</td>
<td class="code leading">
<pre class="chroma"><code><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">&#34;Len:&#34;</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">))</span></span></span></code></pre>
</td>
</tr>
<tr>
<td class="docs">
<p>Indexing into a string produces the raw byte values at
each index. This loop generates the hex values of all
the bytes that constitute the code points in <code>s</code>.</p>
</td>
<td class="code leading">
<pre class="chroma"><code><span class="line"><span class="cl"> <span class="k">for</span> <span class="nx">i</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="p">&lt;</span> <span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">);</span> <span class="nx">i</span><span class="o">++</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">&#34;%x &#34;</span><span class="p">,</span> <span class="nx">s</span><span class="p">[</span><span class="nx">i</span><span class="p">])</span>
</span></span><span class="line"><span class="cl"> <span class="p">}</span>
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">()</span></span></span></code></pre>
</td>
</tr>
<tr>
<td class="docs">
<p>To count how many <em>runes</em> are in a string, we can use
the <code>utf8</code> package. Note that the run-time of
<code>RuneCountInString</code> depends on the size of the string,
because it has to decode each UTF-8 rune sequentially.
Some Thai characters are represented by multiple UTF-8
code points, so the result of this count may be surprising.</p>
</td>
<td class="code leading">
<pre class="chroma"><code><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">&#34;Rune count:&#34;</span><span class="p">,</span> <span class="nx">utf8</span><span class="p">.</span><span class="nf">RuneCountInString</span><span class="p">(</span><span class="nx">s</span><span class="p">))</span></span></span></code></pre>
</td>
</tr>
<tr>
<td class="docs">
<p>A <code>range</code> loop handles strings specially and decodes
each <code>rune</code> along with its offset in the string.</p>
</td>
<td class="code leading">
<pre class="chroma"><code><span class="line"><span class="cl"> <span class="k">for</span> <span class="nx">idx</span><span class="p">,</span> <span class="nx">runeValue</span> <span class="o">:=</span> <span class="k">range</span> <span class="nx">s</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">&#34;%#U starts at %d\n&#34;</span><span class="p">,</span> <span class="nx">runeValue</span><span class="p">,</span> <span class="nx">idx</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"> <span class="p">}</span></span></span></code></pre>
</td>
</tr>
<tr>
<td class="docs">
<p>We can achieve the same iteration by using the
<code>utf8.DecodeRuneInString</code> function explicitly.</p>
</td>
<td class="code leading">
<pre class="chroma"><code><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">&#34;\nUsing DecodeRuneInString&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"> <span class="k">for</span> <span class="nx">i</span><span class="p">,</span> <span class="nx">w</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="p">&lt;</span> <span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">);</span> <span class="nx">i</span> <span class="o">+=</span> <span class="nx">w</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nx">runeValue</span><span class="p">,</span> <span class="nx">width</span> <span class="o">:=</span> <span class="nx">utf8</span><span class="p">.</span><span class="nf">DecodeRuneInString</span><span class="p">(</span><span class="nx">s</span><span class="p">[</span><span class="nx">i</span><span class="p">:])</span>
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">&#34;%#U starts at %d\n&#34;</span><span class="p">,</span> <span class="nx">runeValue</span><span class="p">,</span> <span class="nx">i</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"> <span class="nx">w</span> <span class="p">=</span> <span class="nx">width</span></span></span></code></pre>
</td>
</tr>
<tr>
<td class="docs">
<p>This demonstrates passing a <code>rune</code> value to a function.</p>
</td>
<td class="code leading">
<pre class="chroma"><code><span class="line"><span class="cl"> <span class="nf">examineRune</span><span class="p">(</span><span class="nx">runeValue</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"> <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre>
</td>
</tr>
<tr>
<td class="docs">
</td>
<td class="code leading">
<pre class="chroma"><code><span class="line"><span class="cl"><span class="kd">func</span> <span class="nf">examineRune</span><span class="p">(</span><span class="nx">r</span> <span class="kt">rune</span><span class="p">)</span> <span class="p">{</span></span></span></code></pre>
</td>
</tr>
<tr>
<td class="docs">
<p>Values enclosed in single quotes are <em>rune literals</em>. We
can compare a <code>rune</code> value to a rune literal directly.</p>
</td>
<td class="code">
<pre class="chroma"><code><span class="line"><span class="cl"> <span class="k">if</span> <span class="nx">r</span> <span class="o">==</span> <span class="sc">&#39;t&#39;</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">&#34;found tee&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"> <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="nx">r</span> <span class="o">==</span> <span class="sc">&#39;ส&#39;</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">&#34;found so sua&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"> <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre>
</td>
</tr>
</table>
<table>
<tr>
<td class="docs">
</td>
<td class="code leading">
<pre class="chroma"><code><span class="line"><span class="cl"><span class="gp">$</span> go run strings-and-runes.go
</span></span><span class="line"><span class="cl"><span class="go">Len: 18
</span></span></span><span class="line"><span class="cl"><span class="go">e0 b8 aa e0 b8 a7 e0 b8 b1 e0 b8 aa e0 b8 94 e0 b8 b5
</span></span></span><span class="line"><span class="cl"><span class="go">Rune count: 6
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E2A &#39;ส&#39; starts at 0
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E27 &#39;ว&#39; starts at 3
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E31 &#39;ั&#39; starts at 6
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E2A &#39;ส&#39; starts at 9
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E14 &#39;ด&#39; starts at 12
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E35 &#39;ี&#39; starts at 15</span></span></span></code></pre>
</td>
</tr>
<tr>
<td class="docs">
</td>
<td class="code">
<pre class="chroma"><code><span class="line"><span class="cl"><span class="go">Using DecodeRuneInString
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E2A &#39;ส&#39; starts at 0
</span></span></span><span class="line"><span class="cl"><span class="go">found so sua
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E27 &#39;ว&#39; starts at 3
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E31 &#39;ั&#39; starts at 6
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E2A &#39;ส&#39; starts at 9
</span></span></span><span class="line"><span class="cl"><span class="go">found so sua
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E14 &#39;ด&#39; starts at 12
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E35 &#39;ี&#39; starts at 15</span></span></span></code></pre>
</td>
</tr>
</table>
<p class="next">
Next example: <a href="structs">Structs</a>.
</p>
<p class="footer">
by <a href="https://markmcgranaghan.com">Mark McGranaghan</a> and <a href="https://eli.thegreenplace.net">Eli Bendersky</a> | <a href="https://github.com/mmcgrana/gobyexample">source</a> | <a href="https://github.com/mmcgrana/gobyexample#license">license</a>
</p>
</div>
<script>
var codeLines = [];
codeLines.push('');codeLines.push('package main\u000A');codeLines.push('import (\u000A \"fmt\"\u000A \"unicode/utf8\"\u000A)\u000A');codeLines.push('func main() {\u000A');codeLines.push(' const s \u003D \"สวัสดี\"\u000A');codeLines.push(' fmt.Println(\"Len:\", len(s))\u000A');codeLines.push(' for i :\u003D 0; i \u003C len(s); i++ {\u000A fmt.Printf(\"%x \", s[i])\u000A }\u000A fmt.Println()\u000A');codeLines.push(' fmt.Println(\"Rune count:\", utf8.RuneCountInString(s))\u000A');codeLines.push(' for idx, runeValue :\u003D range s {\u000A fmt.Printf(\"%#U starts at %d\\n\", runeValue, idx)\u000A }\u000A');codeLines.push(' fmt.Println(\"\\nUsing DecodeRuneInString\")\u000A for i, w :\u003D 0, 0; i \u003C len(s); i +\u003D w {\u000A runeValue, width :\u003D utf8.DecodeRuneInString(s[i:])\u000A fmt.Printf(\"%#U starts at %d\\n\", runeValue, i)\u000A w \u003D width\u000A');codeLines.push(' examineRune(runeValue)\u000A }\u000A}\u000A');codeLines.push('func examineRune(r rune) {\u000A');codeLines.push(' if r \u003D\u003D \'t\' {\u000A fmt.Println(\"found tee\")\u000A } else if r \u003D\u003D \'ส\' {\u000A fmt.Println(\"found so sua\")\u000A }\u000A}\u000A');codeLines.push('');codeLines.push('');
</script>
<script src="site.js" async></script>
</body>
</html>