c - 在任意长度的位串中删除除最左边的 1 以外的所有内容的最有效方法

标签 c bit-manipulation mask bit

我最近突然想到,我不知道该怎么做。可以使用 (x & ~(x-1)) 删除字符串 x 最右边的 1 位。最左边的 1 位是否存在类似的表达式?

最佳答案

恕我直言,没有循环是做不到的。

至少在 C 语言中,按位运算只支持整数类型(即 intchar 的所有变体),但不支持数组。

关于char,我不确定,因为我最近看到了一个热门讨论是否可以考虑char。 (在 C 中,较小类型的值通常被隐式转换为 int(或 unsigned int)以进行算术或位运算。)

不直接支持任意长度的整数。它们必须类似于整数类型的数组。

也就是说,该算法实际上需要三个阶段:

  1. 跳过第一个非零元素之前的所有数组元素

  2. 用我命名为“2 的底幂”处理第一个非零元素

  3. 将第一个非零元素之后的所有数组元素设置为0

如果 1. 或 2. 到达数组末尾,当然会跳过剩余的步骤。

“之前”和“之后”的实际含义取决于 Endiannes用于在数组中存储任意长度的数字(即从最低或最高有效位开始)。

实际上,我相信,这也必须在每一种高级语言(比 C 语言)中完成——无论是显式的还是“在幕后”。 (异常(exception)情况是,如果某个 CPU 会通过此操作的特定操作码来支持此操作——恕我直言,这很难相信。)

所以,这就是我在 C 中得到的:

#include <stdio.h>

typedef unsigned char Byte;

unsigned floorPow2Byte(unsigned v)
{
  /* set all bits below left most bit */
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  /* clear all bits below left most bit */
  v &= ~(v >> 1);
  /* done */
  return v;
}

void leftMostBit(size_t size, Byte bits[])
{
  size_t i = size;
  while (i--) if (bits[i]) break;
  if (i > size) return; /* wrap around -> everything was 0 */
  bits[i] = (Byte)floorPow2Byte(bits[i]);
  while (i) bits[--i] = 0;
}

void printBits(size_t size, Byte bits[])
{
  static const char *tbl[] = {
    "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
    "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"
  };
  while (size--) {
    printf("%s%s", tbl[bits[size] >> 4], tbl[bits[size] & 0xf]);
  }
}

#define SIZE(ARRAY) (sizeof ARRAY / sizeof *ARRAY)

int main(void)
{
  /* samples */
  Byte bits1[] = { 0x00, 0xef, 0xbe, 0xad, 0x0b, 0x00 };
  Byte bits2[] = { 0xff, 0xff, 0xff };
  Byte bits3[] = { 0x00, 0x00, 0x00 };
  Byte bits4[] = { 0x00, 0x00, 0x01 };
  Byte bits5[] = { 0x00, 0x00, 0x80 };
  Byte bits6[] = { 0x00, 0x01, 0x80 };
  Byte bits7[] = { 0x00, 0x80, 0x80 };
  Byte bits8[] = { 0x80, 0x80, 0x80 };
  /* check it out */
#define DO(DATA) \
  printf("Input : "); printBits(SIZE(DATA), DATA); printf("\n"); \
  leftMostBit(SIZE(DATA), DATA); \
  printf("Output: "); printBits(SIZE(DATA), DATA); printf("\n")

  DO(bits1);
  DO(bits2);
  DO(bits3);
  DO(bits4);
  DO(bits5);
  DO(bits6);
  DO(bits7);
  DO(bits8);

#undef DO
  /* done */
  return 0;
}

ideone 上编译和测试.

输出:

Input : 000000000000101110101101101111101110111100000000
Output: 000000000000100000000000000000000000000000000000
Input : 111111111111111111111111
Output: 100000000000000000000000
Input : 000000000000000000000000
Output: 000000000000000000000000
Input : 000000010000000000000000
Output: 000000010000000000000000
Input : 100000000000000000000000
Output: 100000000000000000000000
Input : 100000000000000100000000
Output: 100000000000000000000000
Input : 100000001000000000000000
Output: 100000000000000000000000
Input : 100000001000000010000000
Output: 100000000000000000000000

最值得注意的部分可能是函数 floorPow2Byte()。这受到 Round up to the next highest power of 2 的强烈启发。但我不得不稍微修改一下。

思路很简单,调试却花费了意想不到的时间。 (TGIF)


更新:

当我使用 unsigned 而不是 Byte 时,示例显然会更有效率。但是,这不会改变整个算法。 更新的来源:

#include <assert.h>
#include <stdio.h>

/* Assuming that size_t has "machine word width"
 * this might be the unsigned int type which might be process most
 * efficiently.
 */
typedef size_t Word;

Word floorPow2(Word v)
{
  assert(sizeof v <= 8);
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  if (sizeof v > 1) {
    v |= v >> 8;
    if (sizeof v > 2) {
      v |= v >> 16;
      if (sizeof v > 4) {
        v |= v >> 32;
      }
    }
  }
  v &= ~(v >> 1);
  return v;
}

void leftMostBit(size_t size, Word bits[])
{
  size_t i = size;
  while (i--) if (bits[i]) break;
  if (i > size) return; /* wrap around -> everything was 0 */
  bits[i] = floorPow2(bits[i]);
  while (i) bits[--i] = 0;
}

void printBits(size_t size, Word bits[])
{
  static const char *tbl[] = {
    "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
    "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"
  };
  while (size--) {
    if (sizeof *bits > 1) {
      if (sizeof *bits > 2) {
        if (sizeof *bits > 4) {
          printf("%s%s", tbl[bits[size] >> 60 & 0xf], tbl[bits[size] >> 56 & 0xf]);
          printf("%s%s", tbl[bits[size] >> 52 & 0xf], tbl[bits[size] >> 48 & 0xf]);
          printf("%s%s", tbl[bits[size] >> 44 & 0xf], tbl[bits[size] >> 40 & 0xf]);
          printf("%s%s", tbl[bits[size] >> 36 & 0xf], tbl[bits[size] >> 32 & 0xf]);
        }
        printf("%s%s", tbl[bits[size] >> 28 & 0xf], tbl[bits[size] >> 24 & 0xf]);
        printf("%s%s", tbl[bits[size] >> 20 & 0xf], tbl[bits[size] >> 16 & 0xf]);
      }
      printf("%s%s", tbl[bits[size] >> 12 & 0xf], tbl[bits[size] >> 8 & 0xf]);
    }
    printf("%s%s", tbl[bits[size] >> 4 & 0xf], tbl[bits[size] & 0xf]);
  }
}

#define SIZE(ARRAY) (sizeof ARRAY / sizeof *ARRAY)

int main(void)
{
  /* samples */
  Word bits1[] = { 0x00, 0xef, 0xbe, 0xad, 0x0b, 0x00 };
  Word bits2[] = { 0xff, 0xff, 0xff };
  Word bits3[] = { 0x00, 0x00, 0x00 };
  Word bits4[] = { 0x00, 0x00, 0x01 };
  Word bits5[] = { 0x00, 0x00, 0x80 };
  Word bits6[] = { 0x00, 0x01, 0x80 };
  Word bits7[] = { 0x00, 0x80, 0x80 };
  /* check it out */
#define DO(DATA) \
  printf("Input : "); printBits(SIZE(DATA), DATA); printf("\n"); \
  leftMostBit(SIZE(DATA), DATA); \
  printf("Output: "); printBits(SIZE(DATA), DATA); printf("\n")

  DO(bits1);
  DO(bits2);
  DO(bits3);
  DO(bits4);
  DO(bits5);
  DO(bits6);
  DO(bits7);

#undef DO
  /* done */
  return 0;
}

许多 if 可能会让人感到不舒服,但一个好的 C 编译器应该认识到这些条件是编译时可计算的,并将分别“消除”它们。

在 Windows 10(64 位)上使用 VS2013 编译和测试:

Input : 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010110000000000000000000000000000000000000000000000000000000010101101000000000000000000000000000000000000000000000000000000001011111000000000000000000000000000000000000000000000000000000000111011110000000000000000000000000000000000000000000000000000000000000000
Output: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Input : 000000000000000000000000000000000000000000000000000000001111111100000000000000000000000000000000000000000000000000000000111111110000000000000000000000000000000000000000000000000000000011111111
Output: 000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Input : 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Output: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Input : 000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Output: 000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Input : 000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Output: 000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Input : 000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000
Output: 000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Input : 000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000
Output: 000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

关于c - 在任意长度的位串中删除除最左边的 1 以外的所有内容的最有效方法,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/48220836/

相关文章:

c - 使用按位运算符提取位

c - 对于我的用例,最有效的位 vector 压缩方法是什么?

c - 将 32 位变量移动 32 位有什么不好?

python匹配正则表达式

c - 字符串文字的地址长度

c - 为什么我不需要 scanf 的符号? (在 C 中)

jquery - 使用 Jquery 和 css 屏蔽屏幕

python - 在特定条件下填充 Pandas 数据框列

c - 在 C 中单独获取所有可用 CPU 的 CPU 利用率

c - 可读性与可维护性 : Condensing statements to loops