Java 8: searching an ArrayList with a Streams algorithm fails

Tags: java javafx java-stream

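We are using a Stream to search an ArrayList of Strings. The dictionary file is sorted and contains 307107 lowercase words.
We use findFirst to look for a match for each word taken from the text in a TextArea.
As long as the misspelling occurs beyond the third character, the search gives favorable results.
If the misspelled word looks like "Charriage", the results are nowhere near a match.
The obvious goal is to get as close to the correct word as possible without having to look through a large number of words.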
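
As a minimal sketch of the exact-match step described above, assuming the dictionary is already loaded into a sorted List<String>. The class name ExactLookup, both method names, and the four-word sample list are hypothetical and are not taken from the project:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class ExactLookup {

    // Linear scan with a stream; findFirst stops at the first exact match.
    static Optional<String> findExact(List<String> dictionary, String word) {
        String needle = word.toLowerCase();
        return dictionary.stream()
                .filter(needle::equals)
                .findFirst();
    }

    // Because the dictionary is sorted, a binary search can answer the same question.
    static boolean containsSorted(List<String> sortedDictionary, String word) {
        return Collections.binarySearch(sortedDictionary, word.toLowerCase()) >= 0;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the sorted word list
        List<String> dictionary = Arrays.asList("apparent", "carriage", "homemaker", "take");
        System.out.println(findExact(dictionary, "Carriage").isPresent());
        System.out.println(containsSorted(dictionary, "charriage"));
    }
}
```

An exact lookup like this only says whether a word is spelled correctly. Suggesting a replacement for a word that is not found is the harder part, and that is what the rest of the question is about.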

Here is the text we are testing with (the misspellings are intentional):
Tak achieve hommaker and aparent as Chariage NOT ME Charriag add missing vowel to Cjarroage

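We have made some major changes to the stream search filter, with a reasonable improvement.
We are editing the posted code to include only the part where the search fails,
followed by the code changes made to the stream filter.
Before the change, if the searchString had a misspelled character at position 1, no results were found in the dictionary. The new search filter fixes that.
We also added more search information by increasing the number of characters passed to endsWith.
So what still fails? If the searchString (the misspelled word) is missing a character at the end of the word and also has an incorrect character somewhere in positions 1 to 4, the search fails.
We are working on adding and removing characters, but we are not sure this is a workable solution.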

If you would like the complete project, we will post it on GitHub; just ask in the comments. Any help would be greatly appreciated.

The question remains: how do we fix this search filter when the misspelled word is missing more than one character?

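After several hours of searching for free txt dictionaries, this is one of the best we found.
A side-bar fact: it has 115726 words with length > 5 that end in a vowel, which means it has 252234 words that do not end in a vowel.
Does that mean we have a 32% chance of fixing the problem by adding a vowel to the end of the searchString? This is not a question, just an odd fact!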
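
A count like the one above can be reproduced with a stream. This is a minimal sketch, assuming the words are held in a List<String> and that only a, e, i, o, u count as vowels (the post does not say whether y was included). The class name VowelStats, the method name, and the sample list are hypothetical:

```java
import java.util.Arrays;
import java.util.List;

public class VowelStats {

    // Counts words longer than minLength whose last character is a, e, i, o or u.
    // Assumes minLength >= 0, so charAt is never called on an empty string.
    static long countEndingInVowel(List<String> words, int minLength) {
        return words.stream()
                .filter(w -> w.length() > minLength)
                .filter(w -> "aeiou".indexOf(w.charAt(w.length() - 1)) >= 0)
                .count();
    }

    public static void main(String[] args) {
        // Hypothetical sample; the real input would be the loaded dictionary
        List<String> sample = Arrays.asList("carriage", "apparent", "homemaker", "banana", "take");
        long n = countEndingInVowel(sample, 5);
        System.out.println(n + " of " + sample.size() + " words");
    }
}
```

In the sample, only "carriage" and "banana" are longer than 5 characters and end in a vowel, so the count is 2.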

Here is the link to download the dictionary: words_alpha.txt. Place the file on the C drive at C:/A_WORDS/words_alpha.txt.

Code before the change

}if(found != true){

    lvListView.setStyle("-fx-font-size:18.0;-fx-background-color: white;-fx-font-weight:bold;");
    for(int indexSC = 0; indexSC < simpleArray.length;indexSC++){

    String NewSS = txtMonitor.getText().toLowerCase();

    if(NewSS.contains(" ")||(NewSS.matches("[%&/0-9]"))){
        String NOT = txtMonitor.getText().toLowerCase();
        txtTest.setText(NOT+" Not in Dictionary");
        txaML.appendText(NOT+" Not in Dictionary");
        onCheckSpelling();
        return;
    }

    int a = NewSS.length();
    int Z;
    if(a == 0){// manage CR test with two CR's
        Z = 0;
    }else if(a == 3){
        Z = 3;
    }else if(a > 3 && a < 5){
        Z = 4;
    }else if(a >= 5 && a < 8){
        Z = 4;
    }else{
        Z = 5;
    }

    System.out.println("!!!! NewSS "+NewSS+" a "+a+" ZZ "+Z);

    if(Z == 0){// Manage CR in TextArea
        noClose = true;
        strSF = "AA";
        String NOT = txtMonitor.getText().toLowerCase();
        //txtTo.setText("Word NOT in Dictionary");// DO NO SEARCH
        //txtTest.setText("Word NOT in Dictionaary");
        txtTest.setText("Just a Space");
        onCheckSpelling();   
    }else{
        txtTest.setText("");
        txaML.clear();
        txtTest.setText("Word NOT in Dictionaary");
        txaML.appendText("Word NOT in Dictionaary");
        String strS = searchString.substring(0,Z).toLowerCase();
        strSF = strS; 
    }
    // array & list use in stream to add results to ComboBox
    List<String> cs = Arrays.asList(simpleArray);
    ArrayList<String> list = new ArrayList<>();

    cs.stream().filter(s -> s.startsWith(strSF))
      //.forEach(System.out::println); 
    .forEach(list :: add);   

    for(int X = 0; X < list.size();X++){
    String A = (String) list.get(X);  

New code with the improvements

        }if(found != true){

    for(int indexSC = 0; indexSC < simpleArray.length;indexSC++){

    String NewSS = txtMonitor.getText().toLowerCase();
    if(NewSS.contains(" ")||(NewSS.matches("[%&/0-9]"))){
        String NOT = txtMonitor.getText().toLowerCase();
        txtTest.setText(NOT+" Not in Dictionary");

        onCheckSpelling();
        return;
    }
    int a = NewSS.length();
    int Z;
    if(a == 0){// manage CR test with two CR's
        Z = 0;
    }else if(a == 3){
        Z = 3;
    }else if(a > 3 && a < 5){
        Z = 4;
    }else if(a >= 5 && a < 8){
        Z = 4;
    }else{
        Z = 5;
    }

    if(Z == 0){// Manage CR
        noClose = true;
        strSF = "AA";
        String NOT = txtMonitor.getText().toLowerCase();
        txtTest.setText("Just a Space");
        onCheckSpelling();

    }else{
        txtTest.setText("");
        txtTest.setText("Word NOT in Dictionaary");
        String strS = searchString.substring(0,Z).toLowerCase();
        strSF = strS; 
    }
    ArrayList list = new ArrayList<>(); 
    List<String> cs = Arrays.asList(simpleArray);
    // array list & list used in stream foreach filter results added to ComboBox
    // Code below provides variables for refined search
    int W = txtMonitor.getText().length();

    String nF = txtMonitor.getText().substring(0, 1).toLowerCase();

    String nE = txtMonitor.getText().substring(W - 2, W);
    if(W > 7){
    nM = txtMonitor.getText().substring(W-5, W);
    System.out.println("%%%%%%%% nE "+nE+" nF "+nF+" nM = "+nM);
    }else{
    nM = txtMonitor.getText().substring(W-1, W);   
    System.out.println("%%%%%%%% nE "+nE+" nF "+nF+" nM = "+nM);
    }

    cs.stream().filter(s -> s.startsWith(strSF)
            || s.startsWith(nF, 0)
            && s.length()<= W+2
            && s.endsWith(nE)
            && s.startsWith(nF)
            && s.contains(nM)) 
    .forEach(list :: add);

    for(int X = 0; X < list.size();X++){
    String A = (String) list.get(X);
    sort(list);

    cboSelect.setStyle("-fx-font-weight:bold;-fx-font-size:18.0;");
    cboSelect.getItems().add(A);
    }// Add search results to cboSelect
    break;
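
One thing that may be worth checking in the filter above: in Java, && binds more tightly than ||, so the predicate is evaluated as startsWith(strSF) OR (all of the remaining conditions together). The sketch below uses the same conditions with the grouping made explicit. The class name FilterGrouping, the four-word dictionary, and the hand-computed argument values are assumptions for illustration; the values were derived by following the substring logic in the listing for the inputs "charriage" and "charriag".

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FilterGrouping {

    // Same conditions as the filter above (the duplicated startsWith(nF) is written once),
    // with parentheses added so the implicit grouping (&& before ||) is visible.
    static List<String> search(List<String> words, String strSF,
                               String nF, String nE, String nM, int w) {
        return words.stream()
                .filter(s -> s.startsWith(strSF)
                        || (s.startsWith(nF)
                            && s.length() <= w + 2
                            && s.endsWith(nE)
                            && s.contains(nM)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical four-word dictionary
        List<String> words = Arrays.asList("carriage", "charade", "charity", "courage");

        // "charriage": strSF = "charr", nF = "c", nE = "ge", nM = "riage", W = 9
        System.out.println(search(words, "charr", "c", "ge", "riage", 9));

        // "charriag": strSF = "charr", nF = "c", nE = "ag", nM = "rriag", W = 8
        System.out.println(search(words, "charr", "c", "ag", "rriag", 8));
    }
}
```

With these values, "charriage" finds "carriage" through the second group, while "charriag" finds nothing: "carriage" neither starts with "charr" nor ends with "ag". That matches the failure described in the question, where a missing final character combined with a wrong character near the start defeats both halves of the filter.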

Here is a screenshot of the FXML file. The controls have the same names as the ones used in our code, with the exception of the ComboBox.
FXML layout

Best answer

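I am adding a JavaFX answer. This app uses Levenshtein Distance. You must click Check Spelling to start. You can select a word from the list to replace the word currently being checked. I noticed that Levenshtein Distance returns a lot of words, so you may want to find other ways to reduce the list further.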
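
For readers unfamiliar with it, the Levenshtein distance between two strings is the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other. The listings below call a LevenshteinDistance class, which appears to come from a library. The following is only a plain-Java sketch of the same idea, with the hypothetical names EditDistance and levenshtein, to show what is being computed:

```java
public class EditDistance {

    // Two-row dynamic programming: prev holds distances for a[0..i-1), curr for a[0..i).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1,   // insertion
                                 prev[j] + 1),      // deletion
                        prev[j - 1] + cost);        // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("charriage", "carriage"));
        System.out.println(levenshtein("charriag", "carriage"));
        System.out.println(levenshtein("cjarroage", "carriage"));
    }
}
```

"charriage" is one deletion away from "carriage", while "charriag" and "cjarroage" are each two edits away, so a threshold of 2 covers the cases that defeated the prefix and suffix filter in the question.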

Main

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javafx.application.Application;
import javafx.collections.FXCollections;
import javafx.collections.ObservableList;
import javafx.scene.Scene;
import javafx.scene.control.Button;
import javafx.scene.control.ListView;
import javafx.scene.control.TextArea;
import javafx.scene.control.TextField;
import javafx.scene.layout.VBox;
import javafx.stage.Stage;

public class App extends Application
{

    public static void main(String[] args)
    {
        launch(args);
    }

    TextArea taWords = new TextArea("Tak Carrage thiss on hoemaker answe");
    TextField tfCurrentWordBeingChecked = new TextField();
    //TextField tfMisspelledWord = new TextField();
    ListView<String> lvReplacementWords = new ListView<>();
    TextField tfReplacementWord = new TextField();

    Button btnCheckSpelling = new Button("Check Spelling");
    Button btnReplaceWord = new Button("Replace Word");

    List<String> wordList = new ArrayList<>();
    List<String> returnList = new ArrayList<>();
    HandleLevenshteinDistance handleLevenshteinDistance = new HandleLevenshteinDistance();
    ObservableList<String> listViewData = FXCollections.observableArrayList();

    @Override
    public void start(Stage primaryStage)
    {
        setupListView();
        handleBtnCheckSpelling();
        handleBtnReplaceWord();

        VBox root = new VBox(taWords, tfCurrentWordBeingChecked, lvReplacementWords, tfReplacementWord, btnCheckSpelling, btnReplaceWord);
        root.setSpacing(5);
        Scene scene = new Scene(root);
        primaryStage.setScene(scene);
        primaryStage.show();
    }

    public void handleBtnCheckSpelling()
    {
        btnCheckSpelling.setOnAction(actionEvent -> {
            if (btnCheckSpelling.getText().equals("Check Spelling")) {
                wordList = new ArrayList<>(Arrays.asList(taWords.getText().split(" ")));
                returnList = new ArrayList<>(Arrays.asList(taWords.getText().split(" ")));
                loadWord();
                btnCheckSpelling.setText("Check Next Word");
            }
            else if (btnCheckSpelling.getText().equals("Check Next Word")) {
                loadWord();
            }
        });
    }

    public void handleBtnReplaceWord()
    {
        btnReplaceWord.setOnAction(actionEvent -> {
            int indexOfWordToReplace = returnList.indexOf(tfCurrentWordBeingChecked.getText());
            // indexOf returns -1 when no word has been loaded yet; skip the replacement in that case
            if (indexOfWordToReplace >= 0) {
                returnList.set(indexOfWordToReplace, tfReplacementWord.getText());
                taWords.setText(String.join(" ", returnList));
                btnCheckSpelling.fire();
            }
        });
    }

    public void setupListView()
    {
        lvReplacementWords.setItems(listViewData);
        lvReplacementWords.getSelectionModel().selectedItemProperty().addListener((obs, oldSelection, newSelection) -> {
            tfReplacementWord.setText(newSelection);
        });
    }

    private void loadWord()
    {
        if (wordList.size() > 0) {
            tfCurrentWordBeingChecked.setText(wordList.get(0));
            wordList.remove(0);
            showPotentialCorrectSpellings();
        }
    }

    private void showPotentialCorrectSpellings()
    {
        List<String> potentialCorrectSpellings = handleLevenshteinDistance.getPotentialCorretSpellings(tfCurrentWordBeingChecked.getText().trim());
        listViewData.setAll(potentialCorrectSpellings);
    }
}

CustomWord Class

/**
 *
 * @author blj0011
 */
public class CustomWord
{

    private int distance;
    private String word;

    public CustomWord(int distance, String word)
    {
        this.distance = distance;
        this.word = word;
    }

    public String getWord()
    {
        return word;
    }

    public void setWord(String word)
    {
        this.word = word;
    }

    public int getDistance()
    {
        return distance;
    }

    public void setDistance(int distance)
    {
        this.distance = distance;
    }

    @Override
    public String toString()
    {
        return "CustomWord{" + "distance=" + distance + ", word=" + word + '}';
    }
}

HandleLevenshteinDistance Class

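import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
// Assumed: the original listing omitted its imports; these two match the calls used below
import org.apache.commons.io.FileUtils;
import org.apache.commons.text.similarity.LevenshteinDistance;

/**
 *
 * @author blj0011
 */
public class HandleLevenshteinDistance
{

    private List<String> dictionary = new ArrayList<>();

    public HandleLevenshteinDistance()
    {
        try {
            // Load the dictionary from a file.
            // If the file does not exist yet, download it from GitHub first.
            File file = new File("alpha.txt");
            if (!file.exists()) {
                FileUtils.copyURLToFile(
                        new URL("https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt"),
                        file,
                        5000,
                        5000);
            }

            // Load the file content into a List of Strings
            dictionary = FileUtils.readLines(file, StandardCharsets.UTF_8);
        }
        catch (IOException ex) {
            ex.printStackTrace();
        }

    }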

    public List<String> getPotentialCorretSpellings(String misspelledWord)
    {
        LevenshteinDistance levenshteinDistance = new LevenshteinDistance();
        List<CustomWord> customWords = new ArrayList<>();

        dictionary.stream().forEach((wordInDictionary) -> {
            int distance = levenshteinDistance.apply(misspelledWord, wordInDictionary);
            if (distance <= 2) {
                customWords.add(new CustomWord(distance, wordInDictionary));
            }
        });

        // Closest matches first
        Collections.sort(customWords, (CustomWord o1, CustomWord o2) -> Integer.compare(o1.getDistance(), o2.getDistance()));

        List<String> returnList = new ArrayList<>();
        customWords.forEach((item) -> {
            System.out.println(item.getDistance() + " - " + item.getWord());
            returnList.add(item.getWord());
        });

        return returnList;
    }
}
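
As a follow-up to the remark about reducing the list further, one possible approach is sketched here under assumptions. It skips words whose length differs from the input by more than the threshold (the edit distance can never be smaller than the length difference), sorts the rest by distance, and keeps only the first few. The names SuggestionRanker, suggest, maxDistance, and limit are hypothetical, and the levenshtein method is a plain-Java stand-in for the library class used in the listing above.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SuggestionRanker {

    // Plain-Java edit distance (insertions, deletions, substitutions).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns at most 'limit' dictionary words within 'maxDistance' edits, closest first.
    static List<String> suggest(List<String> dictionary, String word, int maxDistance, int limit) {
        String w = word.toLowerCase();
        return dictionary.stream()
                // Cheap pre-filter: the distance is at least the difference in length
                .filter(d -> Math.abs(d.length() - w.length()) <= maxDistance)
                // Compute each distance once and carry it along with the word
                .map(d -> new AbstractMap.SimpleEntry<>(d, levenshtein(w, d)))
                .filter(e -> e.getValue() <= maxDistance)
                // Stable sort: ties keep their dictionary order
                .sorted((x, y) -> Integer.compare(x.getValue(), y.getValue()))
                .limit(limit)
                .map(e -> e.getKey())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the loaded dictionary
        List<String> dictionary = Arrays.asList("carnage", "carriage", "carriages", "marriage", "zebra");
        System.out.println(suggest(dictionary, "Charriage", 2, 10));
        System.out.println(suggest(dictionary, "Charriage", 2, 1));
    }
}
```

Whether a cap of a few suggestions is appropriate depends on the UI. The point is only that sorting by distance before truncating keeps the closest candidates at the top of the list.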

Regarding "Java 8: searching an ArrayList with a Streams algorithm fails", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58651888/
