javascript - Character encoding: â?

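I am trying to piece together the mysterious string of characters â?? that I am seeing quite a lot of in our database. I am fairly sure it is the result of a conversion between character encodings, but I am not completely certain.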

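Users can enter text (or cut and paste it) into an Ext-Js rich text editor. The data is posted to a servlet, which saves it to the database, and when I look at it in the database I see those strange characters...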

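If I can work out the correct encoding, is there any way to decode these characters back to their original meaning, or are bits or bytes lost in the conversion?
Users are cutting and pasting from multiple versions of MS Word and from PDF. Does the encoding depend on where the user copied the text from?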

Thanks

The website is UTF-8. We are using MS SQL Server 2005:

SELECT serverproperty('Collation') -- Server default collation: Latin1_General_CI_AS

SELECT databasepropertyex('xxxx', 'Collation') -- Database default collation: SQL_Latin1_General_CP1_CI_AS

And the column:

Column_name Type    Computed    Length  Prec    Scale   Nullable    TrimTrailingBlanks  FixedLenNullInSource    Collation
text    varchar no  -1                  yes no  yes SQL_Latin1_General_CP1_CI_AS

The non-Unicode equivalents of the nchar, nvarchar, and ntext data types in SQL Server 2000 are listed below. When Unicode data is inserted into one of these non-Unicode data type columns through a command string (otherwise known as a "language event"), SQL Server converts the data to the data type using the code page associated with the collation of the column. When a character cannot be represented on a code page, it is replaced by a question mark (?), indicating the data has been lost. Appearance of unexpected characters or question marks in your data indicates your data has been converted from Unicode to non-Unicode at some layer, and this conversion resulted in lost characters.

So this could be the root cause of the problem... and it will not be easy for us to fix.
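
The replacement behaviour described in the quoted passage can be imitated outside SQL Server. The sketch below is only an analogy in Python, not SQL Server's actual conversion routine: it encodes a Unicode string to code page 1252 (the code page behind SQL_Latin1_General_CP1_CI_AS) and substitutes ? for anything that code page cannot represent. The helper name to_code_page and the sample string are made up for illustration.

```python
def to_code_page(text: str, code_page: str = "cp1252") -> str:
    """Simulate a lossy Unicode -> single-byte code page conversion."""
    # errors="replace" swaps every character the code page lacks for b"?".
    raw = text.encode(code_page, errors="replace")
    # Decode with the same code page to see what a reader of the column gets.
    return raw.decode(code_page)


sample = "naïve — 日本語"          # hypothetical input, not from the question
stored = to_code_page(sample)

# The accented letter and the dash exist in code page 1252 and survive;
# the three CJK characters do not and each collapses to "?".
assert stored == "naïve — ???"
print(stored)
```

Once a character has been replaced by ? in this way, nothing in the stored value says which character it used to be.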

Best answer

â is encoded as 0xE2 in ISO-8859-1 and windows-1252. 0xE2 is also the lead byte of a three-byte sequence in UTF-8. (Specifically, for the range U+2000 to U+2FFF, which includes the windows-1252 characters –—‘’‚“”„†‡•…‰‹›€™.)
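
The byte values above can be checked directly. This is a minimal sketch using Python's built-in codecs; the variable name punctuation is just an illustrative choice.

```python
# The windows-1252 characters that live in the range U+2000..U+2FFF.
punctuation = "–—‘’‚“”„†‡•…‰‹›€™"

# "â" is the single byte 0xE2 in both single-byte encodings.
assert "â".encode("cp1252") == b"\xe2"
assert "â".encode("iso-8859-1") == b"\xe2"

for ch in punctuation:
    utf8 = ch.encode("utf-8")
    # Each character takes three bytes in UTF-8, and the first is 0xE2.
    assert len(utf8) == 3 and utf8[0] == 0xE2
    print(f"U+{ord(ch):04X} -> {utf8.hex(' ')}")
```

So whenever one of these characters is stored as UTF-8 and then read one byte at a time, the first byte on its own looks like â.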

So it looks like your text is encoded in UTF-8 but is being misinterpreted as windows-1252, and is displayed as an â followed by two unprintable characters.
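
To make the misinterpretation concrete, here is a small Python sketch. The sample sentence is invented, and the last step is only one plausible route to the â?? seen in the database; nothing in the question confirms the exact pipeline.

```python
original = "It’s here — now"   # hypothetical text pasted from Word

# Step 1: the browser sends the text as UTF-8 bytes.
wire_bytes = original.encode("utf-8")

# Step 2: something downstream decodes those bytes as windows-1252.
garbled = wire_bytes.decode("cp1252")
print(garbled)  # Itâ€™s here â€” now

# Step 3: if every byte survived, undoing the two steps restores the text.
repaired = garbled.encode("cp1252").decode("utf-8")
assert repaired == original

# Lossy variant (an assumption): the bytes are decoded as ISO-8859-1, then
# converted to code page 1252 with "?" substituted for anything unmappable.
lossy = (
    wire_bytes.decode("iso-8859-1")
    .encode("cp1252", errors="replace")
    .decode("cp1252")
)
print(lossy)  # Itâ??s here â?? now
```

In the first case the original bytes are all still present, so the text can be recovered. In the second case the two continuation bytes have already been replaced by literal question marks, and the original character can only be guessed from context.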

Regarding javascript - Character encoding: â?, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/4547149/
