C isupper() 函数

标签 c

我目前正在阅读“The C Programming Language 2nd edition”,我对这个练习不太清楚:

Functions like isupper can be implemented to save space or to save time. Explore both possibilities.

  • 如何实现这个功能?
  • 我应该如何写两个版本,一个是为了节省时间,一个是为了 节省空间(一些伪代码会很好)?

如果能给我一些建议,我将不胜感激。

最佳答案

原始答案

一个版本使用一个用适当值初始化的数组,代码集中每个字符一个字节(加 1 以允许 EOF,它也可以传递给分类函数):

static const char bits[257] = { ...initialization... };

int isupper(int ch)
{
    assert(ch == EOF || (ch >= 0 && ch <= 255));
    return((bits+1)[ch] & UPPER_MASK);
}

请注意,“位”可由所有各种函数使用,如 isupper()islower()isalpha() 等,并为掩码设置适当的值。如果您使“位”数组在运行时可变,则可以适应不同的(单字节)代码集。

这需要空间——数组。

另一个版本假设大写字符的连续性,以及有限的有效大写字符集(对 ASCII 没问题,对 ISO 8859-1 或其相关标准不太好):

int isupper(int ch)
{
    return (ch >= 'A' && ch <= 'Z');  // ASCII only - not a good implementation!
}

这可以(几乎)用宏来实现;很难避免对字符进行两次评估,这在标准中实际上是不允许的。使用非标准 (GNU) 扩展,它可以实现为仅对字符参数求值一次的宏。将其扩展到 ISO 8859-1 需要第二个条件,即:

int isupper(int ch)
{
    return ((ch >= 'A' && ch <= 'Z')) || (ch >= 0xC0 && ch <= 0xDD));
}

经常将其作为宏重复,由于位掩码具有固定大小,“空间节省”很快成为一种成本。

鉴于现代代码集的要求,映射版本在实践中几乎无一异常(exception)地被使用;它可以在运行时适应当前代码集等,而基于范围的版本则不能。


扩展答案

I still can't figure out how UPPER_MASK works. Can you explain it more specifically?

忽略 header 中符号的命名空间问题,您有一系列十二个分类宏:

  • isalpha()
  • isupper()
  • islower()
  • isalnum()
  • isgraph()
  • isprint()
  • iscntrl()
  • isdigit()
  • isblank()
  • isspace()
  • ispunct()
  • isxdigit()

isspace()isblank()的区别是:

  • isspace()空格 (' ')、换页符 ('\f')、换行符 ('\n')、回车符 ('\r')、水平制表符 ('\t') 和垂直制表符 ('\v'< )/li>
  • isblank()空格 ( ' ' ) 和水平制表符 ( '\t' )

C 标准中有这些字符集的定义,以及 C 语言环境的指南。

例如(在 C 语言环境中),如果 islower() 为真,则 isupper()isalpha() 为真,但在其他语言环境中则不必为真。

我认为必要的部分是:

  • DIGIT_MASK
  • XDIGT_MASK
  • ALPHA_MASK
  • LOWER_MASK
  • UPPER_MASK
  • PUNCT_MASK
  • SPACE_MASK
  • PRINT_MASK
  • CNTRL_MASK
  • BLANK_MASK

从这十个面具,你可以创建另外两个:

  • ALNUM_MASK = ALPHA_MASK | DIGIT_MASK
  • GRAPH_MASK = ALNUM_MASK | PUNCT_MASK

从表面上看,您也可以使用 ALPHA_MASK = UPPER_MASK | LOWER_MASK ,但在某些语言环境中,存在既不是大写也不是小写的字母字符。

因此,我们可以如下定义掩码:

enum CTYPE_MASK {
    DIGIT_MASK = 0x0001,
    XDIGT_MASK = 0x0002,
    LOWER_MASK = 0x0004,
    UPPER_MASK = 0x0008,
    ALPHA_MASK = 0x0010,
    PUNCT_MASK = 0x0020,
    SPACE_MASK = 0x0040,
    PRINT_MASK = 0x0080,
    CNTRL_MASK = 0x0100,
    BLANK_MASK = 0x0200,

    ALNUM_MASK = ALPHA_MASK | DIGIT_MASK,
    GRAPH_MASK = ALNUM_MASK | PUNCT_MASK
};

extern unsigned short ctype_bits[];

字符集的数据;显示的数据适用于 ISO 8859-1 的前半部分,但与所有 8859-x 代码集的前半部分相同。我使用 C99 指定的初始值设定项作为文档辅助,即使条目都是按顺序排列的:

unsigned short ctype_bits[] =
{
    [EOF   +1] = 0,
    ['\0'  +1] = CNTRL_MASK,
    ['\1'  +1] = CNTRL_MASK,
    ['\2'  +1] = CNTRL_MASK,
    ['\3'  +1] = CNTRL_MASK,
    ['\4'  +1] = CNTRL_MASK,
    ['\5'  +1] = CNTRL_MASK,
    ['\6'  +1] = CNTRL_MASK,
    ['\a'  +1] = CNTRL_MASK,
    ['\b'  +1] = CNTRL_MASK,
    ['\t'  +1] = CNTRL_MASK|SPACE_MASK|BLANK_MASK,
    ['\n'  +1] = CNTRL_MASK|SPACE_MASK,
    ['\v'  +1] = CNTRL_MASK|SPACE_MASK,
    ['\f'  +1] = CNTRL_MASK|SPACE_MASK,
    ['\r'  +1] = CNTRL_MASK|SPACE_MASK,
    ['\x0E'+1] = CNTRL_MASK,
    ['\x0F'+1] = CNTRL_MASK,
    ['\x10'+1] = CNTRL_MASK,
    ['\x11'+1] = CNTRL_MASK,
    ['\x12'+1] = CNTRL_MASK,
    ['\x13'+1] = CNTRL_MASK,
    ['\x14'+1] = CNTRL_MASK,
    ['\x15'+1] = CNTRL_MASK,
    ['\x16'+1] = CNTRL_MASK,
    ['\x17'+1] = CNTRL_MASK,
    ['\x18'+1] = CNTRL_MASK,
    ['\x19'+1] = CNTRL_MASK,
    ['\x1A'+1] = CNTRL_MASK,
    ['\x1B'+1] = CNTRL_MASK,
    ['\x1C'+1] = CNTRL_MASK,
    ['\x1D'+1] = CNTRL_MASK,
    ['\x1E'+1] = CNTRL_MASK,
    ['\x1F'+1] = CNTRL_MASK,

    [' '   +1] = SPACE_MASK|PRINT_MASK|BLANK_MASK,

    ['!'   +1] = PUNCT_MASK|PRINT_MASK,
    ['"'   +1] = PUNCT_MASK|PRINT_MASK,
    ['#'   +1] = PUNCT_MASK|PRINT_MASK,
    ['$'   +1] = PUNCT_MASK|PRINT_MASK,
    ['%'   +1] = PUNCT_MASK|PRINT_MASK,
    ['&'   +1] = PUNCT_MASK|PRINT_MASK,
    ['\''  +1] = PUNCT_MASK|PRINT_MASK,
    ['('   +1] = PUNCT_MASK|PRINT_MASK,
    [')'   +1] = PUNCT_MASK|PRINT_MASK,
    ['*'   +1] = PUNCT_MASK|PRINT_MASK,
    ['+'   +1] = PUNCT_MASK|PRINT_MASK,
    [','   +1] = PUNCT_MASK|PRINT_MASK,
    ['-'   +1] = PUNCT_MASK|PRINT_MASK,
    ['.'   +1] = PUNCT_MASK|PRINT_MASK,
    ['/'   +1] = PUNCT_MASK|PRINT_MASK,

    ['0'   +1] = DIGIT_MASK|PRINT_MASK|XDIGT_MASK,
    ['1'   +1] = DIGIT_MASK|PRINT_MASK|XDIGT_MASK,
    ['2'   +1] = DIGIT_MASK|PRINT_MASK|XDIGT_MASK,
    ['3'   +1] = DIGIT_MASK|PRINT_MASK|XDIGT_MASK,
    ['4'   +1] = DIGIT_MASK|PRINT_MASK|XDIGT_MASK,
    ['5'   +1] = DIGIT_MASK|PRINT_MASK|XDIGT_MASK,
    ['6'   +1] = DIGIT_MASK|PRINT_MASK|XDIGT_MASK,
    ['7'   +1] = DIGIT_MASK|PRINT_MASK|XDIGT_MASK,
    ['8'   +1] = DIGIT_MASK|PRINT_MASK|XDIGT_MASK,
    ['9'   +1] = DIGIT_MASK|PRINT_MASK|XDIGT_MASK,

    [':'   +1] = PUNCT_MASK|PRINT_MASK,
    [';'   +1] = PUNCT_MASK|PRINT_MASK,
    ['<'   +1] = PUNCT_MASK|PRINT_MASK,
    ['='   +1] = PUNCT_MASK|PRINT_MASK,
    ['>'   +1] = PUNCT_MASK|PRINT_MASK,
    ['?'   +1] = PUNCT_MASK|PRINT_MASK,
    ['@'   +1] = PUNCT_MASK|PRINT_MASK,

    ['A'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK|XDIGT_MASK,
    ['B'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK|XDIGT_MASK,
    ['C'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK|XDIGT_MASK,
    ['D'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK|XDIGT_MASK,
    ['E'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK|XDIGT_MASK,
    ['F'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK|XDIGT_MASK,
    ['G'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['H'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['I'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['J'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['K'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['L'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['M'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['N'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['O'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['P'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['Q'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['R'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['S'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['T'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['U'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['V'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['W'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['X'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['Y'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['Z'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,

    ['['   +1] = PUNCT_MASK|PRINT_MASK,
    ['\\'  +1] = PUNCT_MASK|PRINT_MASK,
    [']'   +1] = PUNCT_MASK|PRINT_MASK,
    ['^'   +1] = PUNCT_MASK|PRINT_MASK,
    ['_'   +1] = PUNCT_MASK|PRINT_MASK,
    ['`'   +1] = PUNCT_MASK|PRINT_MASK,

    ['a'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK|XDIGT_MASK,
    ['b'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK|XDIGT_MASK,
    ['c'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK|XDIGT_MASK,
    ['d'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK|XDIGT_MASK,
    ['e'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK|XDIGT_MASK,
    ['f'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK|XDIGT_MASK,
    ['g'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['h'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['i'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['j'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['k'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['l'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['m'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['n'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['o'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['p'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['q'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['r'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['s'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['t'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['u'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['v'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['w'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['x'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['y'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['z'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,

    ['{'   +1] = PUNCT_MASK|PRINT_MASK,
    ['|'   +1] = PUNCT_MASK|PRINT_MASK,
    ['}'   +1] = PUNCT_MASK|PRINT_MASK,
    ['~'   +1] = PUNCT_MASK|PRINT_MASK,
    ['\x7F'+1] = CNTRL_MASK,

    ...continue for second half of 8859-x character set...
};

#define isalpha(c)  ((ctype_bits+1)[c] & ALPHA_MASK)
#define isupper(c)  ((ctype_bits+1)[c] & UPPER_MASK)
#define islower(c)  ((ctype_bits+1)[c] & LOWER_MASK)
#define isalnum(c)  ((ctype_bits+1)[c] & ALNUM_MASK)
#define isgraph(c)  ((ctype_bits+1)[c] & GRAPH_MASK)
#define isprint(c)  ((ctype_bits+1)[c] & PRINT_MASK)
#define iscntrl(c)  ((ctype_bits+1)[c] & CNTRL_MASK)
#define isdigit(c)  ((ctype_bits+1)[c] & DIGIT_MASK)
#define isblank(c)  ((ctype_bits+1)[c] & BLANK_MASK)
#define isspace(c)  ((ctype_bits+1)[c] & SPACE_MASK)
#define ispunct(c)  ((ctype_bits+1)[c] & PUNCT_MASK)
#define isxdigit(c) ((ctype_bits+1)[c] & XDIGT_MASK)

如前所述,此处的名称实际上位于为用户保留的命名空间中,因此如果您查看 <ctype.h> header ,您会发现更多神秘名称,它们可能都以一两个下划线开头。

关于C isupper() 函数,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/2169261/

相关文章:

C 中的 Const 表达式 - 如何散列字符串?

c - 从代表不同值的不同文件访问 C 中的相同宏

c - 使用结构体在函数之间传递数组信息

c - 释放整个列表并将头设置为 NULL

c++ - 为什么比较无符号表达式 < 0 总是假的

c++ - 地址的奇怪错误

c - BST 中的段错误

c++ - C++ 中的 Win32 显示中文...我做错了什么?

c - 扫描文件并分配正确的空间来保存文件

基于C的conf文件阅读器