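I came up with the code below to generate 100001 random strings. The strings must be unique. However, the code takes hours to finish the job. Can anyone tell me how to optimize it, and why it is so slow?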
#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <string>
#include <vector>
using namespace std;

string getRandomString(int length) {
    static string charset = "abcdefghijklmnopqrstuvwxyz";
    string result;
    result.resize(length);
    for (int i = 0; i < length; i++) {
        result[i] = charset[rand() % charset.length()];
    }
    return result;
}

int main(){
    srand(time(NULL));
    vector<string> storeUnigrams;
    int numUnigram = 100001;
    string temp = "";
    int minLen = 3;
    int maxLen = 26;
    int range = maxLen - minLen + 1;
    int i = 0;
    while(i < numUnigram){
        int lenOfRanString = rand()%range + minLen;
        temp = getRandomString(lenOfRanString);
        bool doesithave = false;
        // Linear scan over the (sorted) vector to look for a duplicate.
        for(size_t j = 0; j < storeUnigrams.size(); j++){
            if(temp.compare(storeUnigrams[j]) == 0){
                doesithave = true;
                break;
            }
            // The vector is sorted, so stop once we are past temp's position.
            if(temp.compare(storeUnigrams[j]) < 0){
                break;
            }
        }
        if(!doesithave){
            storeUnigrams.push_back(temp);
            // Re-sorts the whole vector after every single insertion.
            sort(storeUnigrams.begin(), storeUnigrams.end());
            i++;
        }
    }
}
Best answer
Two things make your code slow:
- Checking whether the string already exists with a linear search: O(n)
- Sorting the vector on every iteration: O(n log n)
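To see how these two costs add up over the whole run, here is a rough back-of-the-envelope estimate rather than a measurement. It assumes that adding the k-th string costs at most about k string comparisons for the scan plus about k log2 k for the sort, ignores the rare duplicate draws, and ignores the implementation-dependent constant factors of std::sort. With n = 100001:

```latex
\[
T_{\text{vector}}(n) \;\approx\; \sum_{k=1}^{n} \bigl( k + k \log_2 k \bigr)
\;\le\; \frac{n(n+1)}{2} \bigl( 1 + \log_2 n \bigr)
\;\approx\; 5 \times 10^{9} \times 17.6
\;\approx\; 9 \times 10^{10}
\]
```

That is on the order of 10^11 string comparisons, plus the element moves done by each sort. A container that can look up and insert in O(log n) would need only on the order of n log2 n ≈ 1.7 × 10^6 comparisons for the same job.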
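Use, for example, a set to store the strings. It keeps its elements sorted automatically, checking whether a string already exists is fast (O(log n)), and insert simply ignores duplicates: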
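#include <cstdlib>
#include <ctime>
#include <set>
#include <string>
using namespace std;

// getRandomString() is the function from the question, unchanged.
int main(){
    srand(time(NULL));
    set<string> storeUnigrams;
    size_t numUnigram = 100001;
    int minLen = 3;
    int maxLen = 26;
    int range = maxLen - minLen + 1;
    // insert() ignores duplicates, so the set only grows on a new string.
    while(storeUnigrams.size() < numUnigram){
        int lenOfRanString = rand()%range + minLen;
        storeUnigrams.insert(getRandomString(lenOfRanString));
    }
}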
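If the strings do not need to come out sorted, a hash-based container is another option. The following is a minimal sketch of that variant, not part of the original answer: it swaps in std::unordered_set (average O(1) insert and lookup) and the <random> facilities in place of rand(). The function name makeUniqueStrings and its seed parameter are hypothetical, introduced only for this illustration.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical helper (not from the original answer): returns `count`
// distinct random lowercase strings with lengths uniform in [minLen, maxLen].
// Assumes `count` does not exceed the number of possible strings,
// otherwise the loop would never terminate.
std::unordered_set<std::string> makeUniqueStrings(std::size_t count,
                                                  int minLen, int maxLen,
                                                  unsigned seed) {
    static const std::string charset = "abcdefghijklmnopqrstuvwxyz";
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> lenDist(minLen, maxLen);
    std::uniform_int_distribution<std::size_t> charDist(0, charset.size() - 1);

    std::unordered_set<std::string> result;
    result.reserve(count);  // avoid rehashing while the set grows
    while (result.size() < count) {
        std::string s(static_cast<std::size_t>(lenDist(gen)), 'a');
        for (char& c : s) {
            c = charset[charDist(gen)];
        }
        result.insert(std::move(s));  // a duplicate is simply not inserted
    }
    return result;
}
```

Calling makeUniqueStrings(100001, 3, 26, seed) with a seed such as std::random_device{}() would correspond to the parameters in the question. The trade-off compared with set is that iteration order is unspecified, so copy the result into a vector and sort it once at the end if ordered output is needed.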
Regarding "c++ - Extremely slow random string generator", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/11914871/