java - Breaking a paragraph into string tokens

Tags: java algorithm substring

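I am able to break up paragraphs of text into substrings based on a given character limit n. The problem is that my algorithm does exactly that and nothing more, so it breaks words apart. This is where I am stuck. If the character limit falls in the middle of a word, how can I backtrack to a space so that all my substrings contain only whole words?

This is the algorithm I am using: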

int arrayLength = 0;
arrayLength = (int) Math.ceil(((mText.length() / (double) charLimit)));

String[] result = new String[arrayLength];
int j = 0;
int lastIndex = result.length - 1;
for (int i = 0; i < lastIndex; i++) {
    result[i] = mText.substring(j, j + charLimit);
    j += charLimit;
}

result[lastIndex] = mText.substring(j);

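I set the charLimit variable to an arbitrary integer value n. mText is a String holding a paragraph of text. Any suggestions on how to improve this? Thanks in advance.

I have received good responses. Just so you know, to determine whether I have landed on a space I used the while loop below. I just don't know how to correct the string from there.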

while (!strTemp.substring(strTemp.length() - 1).equalsIgnoreCase(" ")) {
    // somehow refine string before added to array
}

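Best answer

I'm not sure I understood correctly what you want, but here is an answer based on my interpretation:

You can use lastIndexOf to find the last space before the character limit. Then check whether that space is close enough to the limit (this handles text without spaces), i.e.: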

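int arrayLength = (int) Math.ceil(mText.length() / (double) charLimit);

String[] result = new String[arrayLength];
int j = 0;            // start index of the current token
int tolerance = 10;   // how far before the limit a space may lie and still be used
int splitpoint;
int lastIndex = result.length - 1;
for (int i = 0; i < lastIndex; i++) {
    // last space at or before the character limit (-1 if there is none)
    splitpoint = mText.lastIndexOf(' ', j + charLimit);
    // use the space if it is close enough to the limit, otherwise cut at the limit
    splitpoint = splitpoint > j + charLimit - tolerance ? splitpoint : j + charLimit;
    result[i] = mText.substring(j, splitpoint).trim();
    j = splitpoint;
}

// the remainder; it may be longer than charLimit
result[lastIndex] = mText.substring(j).trim();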

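This searches for the last space before charLimit. If that space is less than tolerance characters away from the limit (10 is just an example value), the string is split there; otherwise it is split at charLimit.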
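
To make the split-point decision concrete, here is a minimal sketch of the two cases. The class `SplitPointDemo` and the helper `chooseSplit` are hypothetical names introduced only for illustration; the logic inside mirrors the two `splitpoint` lines from the answer.

```java
public class SplitPointDemo {
    // Hypothetical helper: returns the index at which the token starting at 'start' should end.
    static int chooseSplit(String text, int start, int charLimit, int tolerance) {
        int limit = start + charLimit;
        // lastIndexOf searches backward from 'limit' (inclusive); it returns -1 if no space is found.
        int space = text.lastIndexOf(' ', limit);
        // Accept the space only if it lies less than 'tolerance' characters before the limit.
        return space > limit - tolerance ? space : limit;
    }

    public static void main(String[] args) {
        String text = "hello wonderful world";
        // The space at index 15 is close to the limit 16, so the split happens there.
        System.out.println(chooseSplit(text, 0, 16, 3)); // 15
        // The nearest space before index 12 is at index 5, too far back, so the cut is at 12.
        System.out.println(chooseSplit(text, 0, 12, 3)); // 12
    }
}
```

A larger tolerance keeps more words intact at the cost of shorter tokens; a smaller one produces more hard cuts inside long words.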

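The only problem with this solution is that the last string token may be longer than charLimit, because every split at a space consumes fewer than charLimit characters. You may therefore need to adjust arrayLength and loop with while (mText.length() - j > charLimit) instead.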
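
One way to apply that loop condition, sketched under the assumption that a growable `List<String>` is acceptable in place of the pre-sized array. The class `WordWrapSketch` and the method `split` are hypothetical names, not part of the original answer, and the `Math.max` guard is an added safety measure so that a tolerance larger than charLimit cannot move the split point backward.

```java
import java.util.ArrayList;
import java.util.List;

public class WordWrapSketch {
    // Hypothetical helper: splits 'text' into tokens of at most 'charLimit' characters.
    static List<String> split(String text, int charLimit, int tolerance) {
        if (charLimit <= 0) {
            throw new IllegalArgumentException("charLimit must be positive");
        }
        List<String> tokens = new ArrayList<>();
        int j = 0; // start index of the current token
        // Keep cutting while more than charLimit characters remain.
        while (text.length() - j > charLimit) {
            int limit = j + charLimit;
            int space = text.lastIndexOf(' ', limit);
            // Use the space only if it is close to the limit and past the token start.
            int splitpoint = space > Math.max(j, limit - tolerance) ? space : limit;
            tokens.add(text.substring(j, splitpoint).trim());
            j = splitpoint;
        }
        // The remainder is at most charLimit characters long.
        String rest = text.substring(j).trim();
        if (!rest.isEmpty()) {
            tokens.add(rest);
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : split("The quick brown fox jumps over the lazy dog", 15, 10)) {
            System.out.println(token);
        }
    }
}
```

With a limit of 15 and a tolerance of 10, the example prints "The quick brown", "fox jumps over" and "the lazy dog" on separate lines. Because the list grows as needed, no token count has to be computed up front.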


Edit

Running the sample code:

public static void main(String[] args) {
    String mText =  "I am able to break up paragraphs of text into substrings based upon nth given character limit. The conflict I have is that my algorithm is doing exactly this, and is breaking up words. This is where I am stuck. If the character limit occurs in the middle of a word, how can I back track to a space so that all my substrings have entire words?";

    int charLimit = 40;
    int arrayLength = 0;
    arrayLength = (int) Math.ceil(((mText.length() / (double) charLimit)));

    String[] result = new String[arrayLength];
    int j = 0;
    int tolerance = 10;
    int splitpoint;
    int lastIndex = result.length - 1;
    for (int i = 0; i < lastIndex; i++) {
        splitpoint = mText.lastIndexOf(' ' ,j+charLimit);
        splitpoint = splitpoint > j+charLimit-tolerance ? splitpoint:j+charLimit;
        result[i] = mText.substring(j, splitpoint);
        j = splitpoint;
    }

    result[lastIndex] = mText.substring(j);

    for (int i = 0; i<arrayLength; i++) {
        System.out.println(result[i]);
    }
}

Output:

I am able to break up paragraphs of text
 into substrings based upon nth given
 character limit. The conflict I have is
 that my algorithm is doing exactly
 this, and is breaking up words. This is
 where I am stuck. If the character
 limit occurs in the middle of a word,
 how can I back track to a space so that
 all my substrings have entire words?

Further edit: added trim() as suggested by curiosu. It removes the whitespace around the string tokens.
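
As a small illustration of what trim() changes, the sketch below takes the second token from the output above, which begins with the space it was split on. The class `TrimDemo` and the helper `clean` are hypothetical names used only for this example.

```java
public class TrimDemo {
    // Hypothetical helper: strips leading and trailing whitespace from a token.
    static String clean(String token) {
        return token.trim();
    }

    public static void main(String[] args) {
        // A token as produced by the sample run, starting at the space it was split on.
        String token = " into substrings based upon nth given";
        System.out.println("[" + token + "]");        // [ into substrings based upon nth given]
        System.out.println("[" + clean(token) + "]"); // [into substrings based upon nth given]
    }
}
```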

Regarding "java - Breaking a paragraph into string tokens", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25411319/
