Improve Unicode documentation, fix typos.

git-svn-id: file:///fltk/svn/fltk/branches/branch-1.3-porting@11549 ea41ed52-d2ee-0310-a9c1-e6b18d33e121
2016-04-07 00:03:19 +00:00 · 2016-04-07 00:03:19 +00:00 · 406fcaf305
commit 406fcaf305
parent f120334da3
1 changed files with 25 additions and 18 deletions
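Note on the 0x8F -> 0xBF correction in the UTF-8 paragraph of this change: a UTF-8 trailing byte has the bit pattern 10xxxxxx, so it lies in the range 0x80 to 0xBF. A minimal sketch of that rule in C follows. The helper names `utf8_seq_len()` and `utf8_is_continuation()` are hypothetical and are not part of the FLTK API; the sketch only inspects bit patterns and does not reject overlong or out-of-range sequences.

```c
/* Hypothetical helpers for illustration only; not FLTK functions. */

/* Returns the sequence length announced by the top bits of a lead byte,
   or 0 if the byte cannot start a sequence (e.g. it is a trailing byte). */
static int utf8_seq_len(unsigned char lead) {
  if ((lead & 0x80) == 0x00) return 1;  /* 0xxxxxxx: US-ASCII        */
  if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx: 2-byte sequence */
  if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx: 3-byte sequence */
  if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx: 4-byte sequence */
  return 0;                             /* 10xxxxxx or invalid lead  */
}

/* Returns 1 if b is a trailing (continuation) byte, i.e. 0x80..0xBF. */
static int utf8_is_continuation(unsigned char b) {
  return (b & 0xC0) == 0x80;
}
```

With these helpers, 0xBF is accepted as a trailing byte, which the old upper bound of 0x8F would have excluded.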
--- a/documentation/src/unicode.dox
+++ b/documentation/src/unicode.dox
@@ -12,8 +12,9 @@ the current state of Unicode support.
 \section unicode_about About Unicode, ISO 10646 and UTF-8

 The summary of Unicode, ISO 10646 and UTF-8 given below is
-deliberately brief, and provides just enough information for
+deliberately brief and provides just enough information for
 the rest of this chapter.
+
 For further information, please see:
 - http://www.unicode.org
 - http://www.iso.org
@@ -21,11 +22,12 @@ For further information, please see:
 - http://www.cl.cam.ac.uk/~mgk25/unicode.html
 - http://www.apps.ietf.org/rfc/rfc3629.html

+
 \par The Unicode Standard

 The Unicode Standard was originally developed by a consortium of mainly
 US computer manufacturers and developers of multi-lingual software.
-It has now become a defacto standard for character encoding,
+It has now become a defacto standard for character encoding
 and is supported by most of the major computing companies in the world.

 Before Unicode, many different systems, on different platforms,
@@ -40,7 +42,8 @@ and typographic publishing systems, such as algorithms for sorting and
 comparing text, composite character and text rendering, right-to-left
 and bi-directional text handling.

-<i>There are currently no plans to add this extra functionality to FLTK.</i>
+\note There are currently no plans to add this extra functionality to FLTK.
+

 \par ISO 10646

@@ -57,8 +60,8 @@ which contains the characters required for almost all known languages.
 The standard also defines three different implementation levels specifying
 how these characters can be combined.

-<i>There are currently no plans for handling the different implementation
-levels or the combining characters in FLTK.</i>
+\note There are currently no plans for handling the different implementation
+levels or the combining characters in FLTK.

 In UCS, characters have a unique numerical code and an official name,
 and are usually shown using 'U+' and the code in hexadecimal,
@@ -67,15 +70,15 @@ The UCS characters U+0000 to U+007F correspond to US-ASCII,
 and U+0000 to U+00FF correspond to ISO 8859-1 (Latin1).

 ISO 10646 was originally designed to handle a 31-bit character set
-from U+00000000 to U+7FFFFFFF, but the current idea is that 21-bits
+from U+00000000 to U+7FFFFFFF, but the current idea is that 21 bits
 will be sufficient for all future needs, giving characters up to
 U+10FFFF.  The complete character set is sub-divided into \e planes.
 <i>Plane 0</i>, also known as the <b>Basic Multilingual Plane</b>
 (BMP), ranges from U+0000 to U+FFFD and consists of the most commonly
 used characters from previous encoding standards. Other planes
 contain characters for specialist applications.
-\todo
-Do we need this info about planes?
+
+\todo Do we need this info about planes?

 The UCS also defines various methods of encoding characters as
 a sequence of bytes.
@@ -87,7 +90,7 @@ but this is even more wasteful for ASCII or Latin1.

 \par UTF-8

-The Unicode standard defines various UCS Transformation Formats.
+The Unicode standard defines various UCS Transformation Formats (UTF).
 UTF-16 and UTF-32 are based on units of two and four bytes.
 UCS characters requiring more than 16 bits are encoded using
 "surrogate pairs" in UTF-16.
@@ -100,9 +103,11 @@ making the transformation to Unicode quick and easy.
 All UCS characters above U+007F are encoded as a sequence of
 several bytes. The top bits of the first byte are set to show
 the length of the byte sequence, and subseqent bytes are
-always in the range 0x80 to 0x8F. This combination provides
+always in the range 0x80 to 0xBF. This combination provides
 some level of synchronisation and error detection.

+\par
+
 <table summary="Unicode character byte sequences" align="center">
 <tr>
 <td>Unicode range</td>
@@ -134,6 +139,8 @@ some level of synchronisation and error detection.
 </tr>
 </table>

+\par
+
 Moving from ASCII encoding to Unicode will allow all new FLTK
 applications to be easily internationalized and used all over
 the world. By choosing UTF-8 encoding, FLTK remains largely
@@ -176,12 +183,12 @@ the following limitations:

 - FLTK will only handle single characters, so composed characters
  consisting of a base character and floating accent characters
-  will be treated as multiple characters; 
+  will be treated as multiple characters.

 - FLTK will only compare or sort strings on a byte by byte basis
-  and not on a general Unicode character basis;
+  and not on a general Unicode character basis.

-- FLTK will not handle right-to-left or bi-directional text;
+- FLTK will not handle right-to-left or bi-directional text.
  
  \todo
  Verify 16/24 bit Unicode limit for different character sets?
@@ -189,7 +196,7 @@ the following limitations:
  appears to handle a wider set. What about illegal characters?
  See comments in %fl_utf8fromwc() and %fl_utf8toUtf16().

-\section unicode_illegals Illegal Unicode and UTF-8 sequences
+\section unicode_illegals Illegal Unicode and UTF-8 Sequences

 Three pre-processor variables are defined in the source code [1] that
 determine how %fl_utf8decode() handles illegal UTF-8 sequences:
@@ -240,7 +247,7 @@ of the sequence. Trailing bytes in a UTF-8 sequence will return -1.
 Please see the individual function description for further details
 about error handling and return values.

-\section unicode_fltk_calls FLTK Unicode and UTF-8 functions
+\section unicode_fltk_calls FLTK Unicode and UTF-8 Functions

 This section currently provides a brief overview of the functions.
 For more details, consult the main text for each function via its link.
@@ -348,8 +355,8 @@ or ISO-8859-1 characters below 0xFF are replaced with '?'.
 \par
 Both functions return the number of bytes that would be written, not
 counting the null terminator.
-\p destlen provides a means of limiting the number of bytes written,
-so setting \p destlen to zero is a means of measuring how much storage
+\p dstlen provides a means of limiting the number of bytes written,
+so setting \p dstlen to zero is a means of measuring how much storage
 would be needed before doing the real conversion.


@@ -455,7 +462,7 @@ converts the strings to lower case Unicode as part of the comparison.
 \p %flt_utf_strncasecmp() only compares the first \p n characters [bytes?]


-\section unicode_system_calls FLTK Unicode versions of system calls
+\section unicode_system_calls FLTK Unicode Versions of System Calls

 - int fl_access(const char* f, int mode)
  \b OksiD