Collecting strings in C99 without wasteful mallocs

Tags: c malloc

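I'm currently reading 21st Century C and experimenting with some contrived example code.

The problem I'm trying to solve is avoiding repeated malloc() and realloc() calls on a buffer. The complete code is here, but I've inlined the important parts below:

The function _sgr_color(int modes[]) should be called like this. The two calls are equivalent; the second is a macro that wraps the compound literal: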

_sgr_color((int[]){31,45,0}); // literal function 
sgv_color(31, 45);            // va macro wrapper

Both should return something like \x1b[;31;45m.
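The macro itself is not shown in the question, so the following is only a guess at how such a wrapper might look: a variadic macro that packs its arguments into a compound literal and appends the 0 terminator. The names count_modes and COUNT_MODES are hypothetical stand-ins, and the helper only counts entries rather than building an escape sequence.

```c
/* Hypothetical stand-in for _sgr_color(): counts the entries that come
 * before the 0 terminator, walking the array the same way the real
 * function does.
 */
static int count_modes(const int modes[]) {
  int n = 0;
  while (modes[n] > 0) {
    n++;
  }
  return n;
}

/* Hypothetical variadic wrapper (C99): the arguments become the elements
 * of a compound literal, and a trailing 0 is appended as the terminator.
 * At least one argument is required, otherwise the initializer list
 * would start with a comma.
 */
#define COUNT_MODES(...) count_modes((int[]){__VA_ARGS__, 0})
```

With these definitions, COUNT_MODES(31, 45) expands to count_modes((int[]){31, 45, 0}), which mirrors the two equivalent call forms above.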

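In the original code, however, the magic numbers are defined as constants (typedef enum SGR_COLOR { SGR_COLOUR_BLACK = 31}, and so on).

Inside _sgr_color(int modes[]) I know I need to allocate a buffer and return it. That part is no problem, but I don't know how long the buffer needs to be until I have walked modes[]:

I've commented the code inline: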

/* Static, to make callers use the macro */
static char *_sgr_color(int modes[]) {
  /* We increment the writeOffset to move the "start" pointer for the memcpy */
  int writeOffset = 0;

  /* Initial length, CSI_START and CSI_END are `\x1b[' and `m' respectively */
  int len = strlen(CSI_START CSI_END);

  /* Loop over modes[] looking for a 0, then break, count the number of entries
   * this is +1 to account for the ; that we inject.
   */
  for (int i = 0; modes[i] > 0; i++) {
    len += sizeof(enum SGR_COLOR) + 1;
  }

  /* Local buffer, unsafe for return but the length at least is right (no +1 for
   * \0
   * because we are in control of reading it, and we'll allocate the +1 for the
   * buffer which we return
   */
  char buffer[len];

  /* Copy CSI_START into our buffer at position 0 */
  memcpy(buffer, CSI_START, strlen(CSI_START));
  /* Increment writeOffset by strlen(CSI_START) */
  writeOffset += strlen(CSI_START);

  /* Loop again over modes[], inefficient to walk it twice,
   * but preferable to extending a buffer in the first loop with
   * realloc().
   */
  for (int i = 0; modes[i] > 0; i++) {

    /* Copy the ; separator into the buffer and increment writeOffset by
     * sizeof(char) */
    memcpy(buffer + writeOffset, ";", sizeof(char));
    writeOffset += sizeof(char);

    /* Write the mode number (int) to the buffer, and increment the writeOffset
     * by the appropriate amount
     */
    char *modeistr;
    if (asprintf(&modeistr, "%d", modes[i]) < 0) {
      return "\0";
    }
    memcpy(buffer + writeOffset, modeistr, sizeof(enum SGR_COLOR));
    writeOffset += strlen(modeistr);
    free(modeistr);
  }

  /* Copy the CSI_END into the buffer, no need to touch writeOffset */
  memcpy(buffer + writeOffset, CSI_END, strlen(CSI_END));
  char *dest = malloc(len + 1);

  /* Copy the buffer into the return buffer, strncopy will fill the +1 with \0
   * as per the documentation:
   *
   * > The stpncpy() and strncpy() functions copy at most n characters
   * > from src into dst.  If src is less than n characters long, the
   * > remainder of dst is filled with `\0' characters.
   * > Otherwise, dst is not terminated.
   *
   */
  strncpy(dest, buffer, len);
  return dest;
}

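This code works: a sample program outputs the correct bytes in the correct order, and the color codes take effect. There is one problem, though, as noted:

  1. Using asprintf() defeats my reason for not wanting to call malloc() repeatedly.

I'm struggling to see how this code could be simplified (if at all), and how that would affect my wish to avoid repeated memory allocation.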

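Best answer

It looks like you can compute the maximum space required from the number of modes given and the known bounds on their allowed values (plus the known lengths of the constant strings involved). In that case, you get the least possible dynamic memory allocation by

  • performing just one malloc() to obtain a buffer large enough to hold everything, regardless of the modes' actual values,
  • writing directly into that buffer (no asprintf()),
  • and ultimately returning it (no local scratch buffer to copy from).

If you want, you can track how much data was actually written and use realloc() at the end to shrink the buffer to the space actually used (this should be cheap and reliable, because you are reducing the allocation). If memory is plentiful, you can skip the realloc(): the output buffer will occupy more memory than it needs, but that memory is recovered when the buffer is freed.

This will be easier, more reliable, and possibly even faster than precomputing how much buffer space each individual mode needs, which is your other option for minimizing dynamic allocation.

For example: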

/* The maximum number of decimal digits in a valid mode */
#define MODE_MAX_DIGITS 6

static char *_sgr_color(int modes[]) {
  char *buffer;
  char *buf_tail;
  char *temp;

  /* Space required for the start and end sequences, plus a string terminator
   * (-1 instead of +1 because the two sizeofs each include space for one
   * terminator)
   */
  int len = sizeof(CSI_START) + sizeof(CSI_END) - 1;

  /* Increase the required length to provide enough space for all the modes and
   * their semicolon separators.
   */
  for (int i = 0; modes[i] > 0; i++) {
    len += MODE_MAX_DIGITS + 1;
  }

  /* Allocate a buffer big enough to hold the entire result, no matter
   * what the actual mode values are
   */
  buffer = malloc(len);
  if (buffer == NULL) {
    /* Allocation failed; nothing to write into */
    return NULL;
  }
  buf_tail = buffer;

  /* Copy CSI_START into our buffer at the current position (the beginning),
   * and advance the tail pointer to the next available position
   */
  buf_tail += sprintf(buf_tail, "%s", CSI_START);

  /* Loop again over modes[].  It's more efficient to walk it twice than to
   * repeatedly extend the buffer as would be required to walk it only once.
   */
  for (int i = 0; modes[i] > 0; i++) {
    /* Write the ; separator and mode into the buffer; track the buffer tail */
    buf_tail += sprintf(buf_tail, ";%d", modes[i]);
  }

  /* Copy the CSI_END into the buffer, and update the buffer tail */
  buf_tail += sprintf(buf_tail, "%s", CSI_END);

  /* shrink the buffer to the space actually used (optional) */
  temp = realloc(buffer, 1 + buf_tail - buffer);

  /* realloc() should not fail in this case, but if it does then temp
   * will be NULL and buffer will still be valid.  Else temp PROBABLY
   * is equal to buffer, but that's not guaranteed.
   */
  return temp ? temp : buffer;
}
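For comparison, the other option mentioned above (precomputing the space each mode needs) might look roughly like the sketch below. It relies on the C99 guarantee that snprintf() with a NULL buffer and size 0 returns the number of characters it would have written. The function name sgr_color_exact is hypothetical, and the CSI_START and CSI_END values are assumed from the question's description (`\x1b[` and `m`).

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed values, taken from the question's description of the macros */
#define CSI_START "\x1b["
#define CSI_END "m"

/* Hypothetical variant: one malloc() sized exactly, no realloc() needed.
 * Returns NULL if the allocation fails; the caller frees the result.
 */
static char *sgr_color_exact(const int modes[]) {
  /* Both sizeofs include a terminator, so subtract one of them */
  size_t len = sizeof(CSI_START) + sizeof(CSI_END) - 1;

  /* First pass: ask snprintf() how many characters each ";<mode>" needs */
  for (int i = 0; modes[i] > 0; i++) {
    len += (size_t)snprintf(NULL, 0, ";%d", modes[i]);
  }

  char *buffer = malloc(len);
  if (buffer == NULL) {
    return NULL;
  }

  /* Second pass: write directly into the final buffer */
  char *tail = buffer;
  tail += sprintf(tail, "%s", CSI_START);
  for (int i = 0; modes[i] > 0; i++) {
    tail += sprintf(tail, ";%d", modes[i]);
  }
  sprintf(tail, "%s", CSI_END);

  return buffer;
}
```

This formats every mode twice, once to measure and once to write, so it trades extra formatting work for an exact allocation. Whether that is worthwhile compared with the fixed MODE_MAX_DIGITS bound depends on how much the spare bytes matter.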

For more on collecting strings in C99 without wasteful mallocs, see the original question on Stack Overflow: https://stackoverflow.com/questions/27411877/
