268 lines
16 KiB
Plaintext
Generated
268 lines
16 KiB
Plaintext
Generated
<!DOCTYPE html>
|
|
<html>
|
|
<head>
|
|
<meta charset="utf-8">
|
|
<title>Go by Example: Strings and Runes</title>
|
|
<link rel=stylesheet href="site.css">
|
|
</head>
|
|
<script>
|
|
onkeydown = (e) => {
|
|
|
|
if (e.key == "ArrowLeft") {
|
|
window.location.href = 'pointers';
|
|
}
|
|
|
|
|
|
if (e.key == "ArrowRight") {
|
|
window.location.href = 'structs';
|
|
}
|
|
|
|
}
|
|
</script>
|
|
<body>
|
|
<div class="example" id="strings-and-runes">
|
|
<h2><a href="./">Go by Example</a>: Strings and Runes</h2>
|
|
|
|
<table>
|
|
|
|
<tr>
|
|
<td class="docs">
|
|
<p>A Go string is a read-only slice of bytes. The language
|
|
and the standard library treat strings specially - as
|
|
containers of text encoded in <a href="https://en.wikipedia.org/wiki/UTF-8">UTF-8</a>.
|
|
In other languages, strings are made of “characters”.
|
|
In Go, the concept of a character is called a <code>rune</code> - it’s
|
|
an integer that represents a Unicode code point.
|
|
<a href="https://go.dev/blog/strings">This Go blog post</a> is a good
|
|
introduction to the topic.</p>
|
|
|
|
</td>
|
|
<td class="code empty leading">
|
|
|
|
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td class="docs">
|
|
|
|
</td>
|
|
<td class="code leading">
|
|
<a href="https://go.dev/play/p/39BpTLf6BAz"><img title="Run code" src="play.png" class="run" /></a><img title="Copy code" src="clipboard.png" class="copy" />
|
|
<pre class="chroma"><code><span class="line"><span class="cl"><span class="kn">package</span> <span class="nx">main</span></span></span></code></pre>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td class="docs">
|
|
|
|
</td>
|
|
<td class="code leading">
|
|
|
|
<pre class="chroma"><code><span class="line"><span class="cl"><span class="kn">import</span> <span class="p">(</span>
|
|
</span></span><span class="line"><span class="cl"> <span class="s">"fmt"</span>
|
|
</span></span><span class="line"><span class="cl"> <span class="s">"unicode/utf8"</span>
|
|
</span></span><span class="line"><span class="cl"><span class="p">)</span></span></span></code></pre>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td class="docs">
|
|
|
|
</td>
|
|
<td class="code leading">
|
|
|
|
<pre class="chroma"><code><span class="line"><span class="cl"><span class="kd">func</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span></span></span></code></pre>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td class="docs">
|
|
<p><code>s</code> is a <code>string</code> assigned a literal value
|
|
representing the word “hello” in the Thai
|
|
language. Go string literals are UTF-8
|
|
encoded text.</p>
|
|
|
|
</td>
|
|
<td class="code leading">
|
|
|
|
<pre class="chroma"><code><span class="line"><span class="cl"> <span class="kd">const</span> <span class="nx">s</span> <span class="p">=</span> <span class="s">"สวัสดี"</span></span></span></code></pre>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td class="docs">
|
|
<p>Since strings are equivalent to <code>[]byte</code>, this
|
|
will produce the length of the raw bytes stored within.</p>
|
|
|
|
</td>
|
|
<td class="code leading">
|
|
|
|
<pre class="chroma"><code><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">"Len:"</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">))</span></span></span></code></pre>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td class="docs">
|
|
<p>Indexing into a string produces the raw byte values at
|
|
each index. This loop generates the hex values of all
|
|
the bytes that constitute the code points in <code>s</code>.</p>
|
|
|
|
</td>
|
|
<td class="code leading">
|
|
|
|
<pre class="chroma"><code><span class="line"><span class="cl"> <span class="k">for</span> <span class="nx">i</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="p"><</span> <span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">);</span> <span class="nx">i</span><span class="o">++</span> <span class="p">{</span>
|
|
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">"%x "</span><span class="p">,</span> <span class="nx">s</span><span class="p">[</span><span class="nx">i</span><span class="p">])</span>
|
|
</span></span><span class="line"><span class="cl"> <span class="p">}</span>
|
|
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">()</span></span></span></code></pre>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td class="docs">
|
|
<p>To count how many <em>runes</em> are in a string, we can use
|
|
the <code>utf8</code> package. Note that the run-time of
|
|
<code>RuneCountInString</code> depends on the size of the string,
|
|
because it has to decode each UTF-8 rune sequentially.
|
|
Some Thai characters are represented by multiple UTF-8
|
|
code points, so the result of this count may be surprising.</p>
|
|
|
|
</td>
|
|
<td class="code leading">
|
|
|
|
<pre class="chroma"><code><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">"Rune count:"</span><span class="p">,</span> <span class="nx">utf8</span><span class="p">.</span><span class="nf">RuneCountInString</span><span class="p">(</span><span class="nx">s</span><span class="p">))</span></span></span></code></pre>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td class="docs">
|
|
<p>A <code>range</code> loop handles strings specially and decodes
|
|
each <code>rune</code> along with its offset in the string.</p>
|
|
|
|
</td>
|
|
<td class="code leading">
|
|
|
|
<pre class="chroma"><code><span class="line"><span class="cl"> <span class="k">for</span> <span class="nx">idx</span><span class="p">,</span> <span class="nx">runeValue</span> <span class="o">:=</span> <span class="k">range</span> <span class="nx">s</span> <span class="p">{</span>
|
|
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">"%#U starts at %d\n"</span><span class="p">,</span> <span class="nx">runeValue</span><span class="p">,</span> <span class="nx">idx</span><span class="p">)</span>
|
|
</span></span><span class="line"><span class="cl"> <span class="p">}</span></span></span></code></pre>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td class="docs">
|
|
<p>We can achieve the same iteration by using the
|
|
<code>utf8.DecodeRuneInString</code> function explicitly.</p>
|
|
|
|
</td>
|
|
<td class="code leading">
|
|
|
|
<pre class="chroma"><code><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">"\nUsing DecodeRuneInString"</span><span class="p">)</span>
|
|
</span></span><span class="line"><span class="cl"> <span class="k">for</span> <span class="nx">i</span><span class="p">,</span> <span class="nx">w</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="p"><</span> <span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">);</span> <span class="nx">i</span> <span class="o">+=</span> <span class="nx">w</span> <span class="p">{</span>
|
|
</span></span><span class="line"><span class="cl"> <span class="nx">runeValue</span><span class="p">,</span> <span class="nx">width</span> <span class="o">:=</span> <span class="nx">utf8</span><span class="p">.</span><span class="nf">DecodeRuneInString</span><span class="p">(</span><span class="nx">s</span><span class="p">[</span><span class="nx">i</span><span class="p">:])</span>
|
|
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Printf</span><span class="p">(</span><span class="s">"%#U starts at %d\n"</span><span class="p">,</span> <span class="nx">runeValue</span><span class="p">,</span> <span class="nx">i</span><span class="p">)</span>
|
|
</span></span><span class="line"><span class="cl"> <span class="nx">w</span> <span class="p">=</span> <span class="nx">width</span></span></span></code></pre>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td class="docs">
|
|
<p>This demonstrates passing a <code>rune</code> value to a function.</p>
|
|
|
|
</td>
|
|
<td class="code leading">
|
|
|
|
<pre class="chroma"><code><span class="line"><span class="cl"> <span class="nf">examineRune</span><span class="p">(</span><span class="nx">runeValue</span><span class="p">)</span>
|
|
</span></span><span class="line"><span class="cl"> <span class="p">}</span>
|
|
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td class="docs">
|
|
|
|
</td>
|
|
<td class="code leading">
|
|
|
|
<pre class="chroma"><code><span class="line"><span class="cl"><span class="kd">func</span> <span class="nf">examineRune</span><span class="p">(</span><span class="nx">r</span> <span class="kt">rune</span><span class="p">)</span> <span class="p">{</span></span></span></code></pre>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td class="docs">
|
|
<p>Values enclosed in single quotes are <em>rune literals</em>. We
|
|
can compare a <code>rune</code> value to a rune literal directly.</p>
|
|
|
|
</td>
|
|
<td class="code">
|
|
|
|
<pre class="chroma"><code><span class="line"><span class="cl"> <span class="k">if</span> <span class="nx">r</span> <span class="o">==</span> <span class="sc">'t'</span> <span class="p">{</span>
|
|
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">"found tee"</span><span class="p">)</span>
|
|
</span></span><span class="line"><span class="cl"> <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="nx">r</span> <span class="o">==</span> <span class="sc">'ส'</span> <span class="p">{</span>
|
|
</span></span><span class="line"><span class="cl"> <span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">"found so sua"</span><span class="p">)</span>
|
|
</span></span><span class="line"><span class="cl"> <span class="p">}</span>
|
|
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre>
|
|
</td>
|
|
</tr>
|
|
|
|
</table>
|
|
|
|
<table>
|
|
|
|
<tr>
|
|
<td class="docs">
|
|
|
|
</td>
|
|
<td class="code leading">
|
|
|
|
<pre class="chroma"><code><span class="line"><span class="cl"><span class="gp">$</span> go run strings-and-runes.go
|
|
</span></span><span class="line"><span class="cl"><span class="go">Len: 18
|
|
</span></span></span><span class="line"><span class="cl"><span class="go">e0 b8 aa e0 b8 a7 e0 b8 b1 e0 b8 aa e0 b8 94 e0 b8 b5
|
|
</span></span></span><span class="line"><span class="cl"><span class="go">Rune count: 6
|
|
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E2A 'ส' starts at 0
|
|
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E27 'ว' starts at 3
|
|
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E31 'ั' starts at 6
|
|
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E2A 'ส' starts at 9
|
|
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E14 'ด' starts at 12
|
|
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E35 'ี' starts at 15</span></span></span></code></pre>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td class="docs">
|
|
|
|
</td>
|
|
<td class="code">
|
|
|
|
<pre class="chroma"><code><span class="line"><span class="cl"><span class="go">Using DecodeRuneInString
|
|
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E2A 'ส' starts at 0
|
|
</span></span></span><span class="line"><span class="cl"><span class="go">found so sua
|
|
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E27 'ว' starts at 3
|
|
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E31 'ั' starts at 6
|
|
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E2A 'ส' starts at 9
|
|
</span></span></span><span class="line"><span class="cl"><span class="go">found so sua
|
|
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E14 'ด' starts at 12
|
|
</span></span></span><span class="line"><span class="cl"><span class="go">U+0E35 'ี' starts at 15</span></span></span></code></pre>
|
|
</td>
|
|
</tr>
|
|
|
|
</table>
|
|
|
|
|
|
<p class="next">
|
|
Next example: <a href="structs">Structs</a>.
|
|
</p>
|
|
|
|
|
|
<p class="footer">
|
|
by <a href="https://markmcgranaghan.com">Mark McGranaghan</a> and <a href="https://eli.thegreenplace.net">Eli Bendersky</a> | <a href="https://github.com/mmcgrana/gobyexample">source</a> | <a href="https://github.com/mmcgrana/gobyexample#license">license</a>
|
|
</p>
|
|
|
|
</div>
|
|
<script>
|
|
var codeLines = [];
|
|
codeLines.push('');codeLines.push('package main\u000A');codeLines.push('import (\u000A \"fmt\"\u000A \"unicode/utf8\"\u000A)\u000A');codeLines.push('func main() {\u000A');codeLines.push(' const s \u003D \"สวัสดี\"\u000A');codeLines.push(' fmt.Println(\"Len:\", len(s))\u000A');codeLines.push(' for i :\u003D 0; i \u003C len(s); i++ {\u000A fmt.Printf(\"%x \", s[i])\u000A }\u000A fmt.Println()\u000A');codeLines.push(' fmt.Println(\"Rune count:\", utf8.RuneCountInString(s))\u000A');codeLines.push(' for idx, runeValue :\u003D range s {\u000A fmt.Printf(\"%#U starts at %d\\n\", runeValue, idx)\u000A }\u000A');codeLines.push(' fmt.Println(\"\\nUsing DecodeRuneInString\")\u000A for i, w :\u003D 0, 0; i \u003C len(s); i +\u003D w {\u000A runeValue, width :\u003D utf8.DecodeRuneInString(s[i:])\u000A fmt.Printf(\"%#U starts at %d\\n\", runeValue, i)\u000A w \u003D width\u000A');codeLines.push(' examineRune(runeValue)\u000A }\u000A}\u000A');codeLines.push('func examineRune(r rune) {\u000A');codeLines.push(' if r \u003D\u003D \'t\' {\u000A fmt.Println(\"found tee\")\u000A } else if r \u003D\u003D \'ส\' {\u000A fmt.Println(\"found so sua\")\u000A }\u000A}\u000A');codeLines.push('');codeLines.push('');
|
|
</script>
|
|
<script src="site.js" async></script>
|
|
</body>
|
|
</html>
|