给定一个包含多个散布破折号的字符的字符串,例如 string s = "A--TG-DF----GR--";
,我希望随机选择一个破折号 block (大小可以为 1、2、...、字符串中连续破折号的最大数量),并将它们复制到随机选择的字符串的另一部分。
例如,A--TG-DF(---)-GR--
移至 A--T(---)G-DF-GR--
尽管
另一次迭代可能会给出 A--TG-DF----GR(--)
移至 A--TG-(--)DF----GR
.
我通过 int i = rand() % (int) s.length();
生成字符串的随机索引。插入通过 s.insert(rand() % (int) s.length(), substr);
进行,其中substr
是破折号 block 。
我的主要问题在于随机找到一组连续的破折号。我想到使用 s.find("-")
,但这只会返回单个破折号的第一个实例,而不是破折号集合的随机位置。
最佳答案
我知道这个问题可能源于 XY problems ,但我发现它仍然是一个很好的挑战,因此我考虑使用 Boost Interval Container 库来实现它。
该库的美妙之处在于您可以忘记很多细节,同时不会牺牲很多性能。
我冒昧地概括了它,以便它能够同时移动多个破折号 block (统一随机选择)。
解决方案运行 Live On Coliru 并在大约 2673 毫秒(我的机器上为 1156 毫秒)内生成给定样本的 1,000,000 次随机转置,其中移动 block 的数量随机变化 (1..3):
Generator gen(test_case);
std::string result;
std::map<std::string, size_t> histo;
for(int i = 0; i < 1000000; ++i) {
auto const mobility = gen.gen_relocations(1 + gen.random(3)); // move n blocks of dashes
result.clear();
gen.apply_relocations(mobility, std::back_inserter(result));
histo[result]++;
}
Note: the benchmark times include the time taken to build the histogram of unique results generated
让我们在这里进行代码演练来解释一下:
我尝试使用“可读”类型:
namespace icl = boost::icl; using Position = size_t; using Map = icl::interval_set<Position>; using Region = Map::value_type;
例如构建破折号
Map
的函数很简单:template <typename It> Map region_map(It f, It l) { Map dashes; for (Position pos = 0; f!=l; ++f, ++pos) if ('-' == *f) dashes += pos; return dashes; }
Note how I didn't particularly optimize this. I let the interval_set combine adjacent dashes. We might use hinted insertion, or a parser that add consecutive dashes as a block. I opted for KISS here.
稍后,我们生成重定位,将
Region
映射到文本其余部分中不移动的Position
。using Relocs = boost::container::flat_multimap<Position, Region>;
通过使用平面多重映射,调用者可以将条目按升序插入点排序。因为我们使用了
reserve()
-ed 的预先平面多重映射,所以我们避免了这里基于节点的映射实现的开销。我们首先选择要移动的仪表板:
Map pick_dashes(int n) { Map map; if (!_dashes.empty()) for (int i = 0; i < n; ++i) map += *std::next(_dashes.begin(), _select(_rng)); return map; }
随机分布已在构造时确定尺寸,例如:
_dashes(region_map(_input.begin(), _input.end())), _rng(std::random_device {}()), _select (0, _dashes.iterative_size() - 1), _randpos(0, _input.size() - 1),
接下来,我们为每个分配插入位置。诀窍是在源的非移动(惰性)区域内分配位置。
- 这包括留在其位置的其他破折号 block
存在一种退化情况,即所有内容都是破折号 block ,我们在构造函数中检测到了这一点:
_is_degenerate(cardinality(_dashes) == _input.size())
所以代码如下:
Relocs gen_relocations(int n=1) { Map const moving = pick_dashes(n); Relocs relocs; relocs.reserve(moving.iterative_size()); if (_is_degenerate) { // degenerate case (everything is a dash); no non-moving positions // exist, just pick 0 for(auto& region : moving) relocs.emplace(0, region); } else { auto inertia = [&] { Position inert_point; while (contains(moving, inert_point = _randpos(_rng))) ; // discard insertion points that are moving return inert_point; }; for(auto& region : moving) relocs.emplace(inertia(), region); } return relocs; }
现在我们需要做的就是应用重定位。
这个的通用实现非常简单。同样,为了保持简单,它没有特别优化(KISS):
template <typename F> void do_apply_relocations(Relocs const& mobility, F const& apply) const { icl::split_interval_set<Position> steps {{0, _input.size()}}; for (auto& m : mobility) { steps += m.first; // force a split on insertion point steps -= m.second; // remove the source of the move //std::cout << m.second << " moving to #" << m.first << ": " << steps << "\n"; } auto next_mover = mobility.begin(); for(auto& step : steps) { while (next_mover!=mobility.end() && contains(step, next_mover->first)) apply((next_mover++)->second, true); apply(step, false); } }
Note The trick here is that we "abuse" the
split_interval_set
combining strategy to break the processing into sub-ranges that "stop" at the randomly generated insertion points: these artificial regions are the "steps" in our generation loop.我们实现的
apply
函数是为了得到我们想要的东西。在我们的例子中,我们想要一个像A--TG-DFGR(----)--
这样的字符串,因此我们编写一个附加到容器的重载(例如std::string
code>) 使用任何输出迭代器:template <typename Out> Out apply_relocations(Relocs const& mobility, Out out) const { if (_is_degenerate) return std::copy(_input.begin(), _input.end(), out); auto to_iter = [this](Position pos) { return _input.begin() + pos; }; do_apply_relocations(mobility, [&](Region const& r, bool relocated) { if (relocated) *out++ = '('; out = std::copy( to_iter(first(r)), to_iter(last(r)+1), out ); if (relocated) *out++ = ')'; }); return out; }
Note The "complicated" part here are mapping the
Position
to input iterators (to_iter
) and the code to optionally add()
around moved blocks.
这样,我们就看到了所有的代码。
完整列表
#include <boost/container/flat_map.hpp>
#include <boost/icl/interval_set.hpp>
#include <boost/icl/split_interval_set.hpp>
#include <boost/icl/separate_interval_set.hpp>
#include <boost/lexical_cast.hpp>
#include <boost/range/algorithm.hpp>
#include <iomanip>
#include <iostream>
#include <random>
#include <map>
#include <chrono>
namespace icl = boost::icl;
using Position = size_t;
using Map = icl::interval_set<Position>;
using Region = Map::value_type;
using Relocs = boost::container::flat_multimap<Position, Region>;
struct Generator {
Generator(std::string const& input)
: _input(input),
_dashes(region_map(_input.begin(), _input.end())),
_rng(std::random_device {}()),
_select (0, _dashes.iterative_size() - 1),
_randpos(0, _input.size() - 1),
_is_degenerate(cardinality(_dashes) == _input.size())
{
}
unsigned random(unsigned below) {
return _rng() % below; // q&d, only here to make the tests deterministic for a fixed seed
}
Map full() const {
return Map { { 0, _input.size() } };
}
Relocs gen_relocations(int n=1) {
Map const moving = pick_dashes(n);
Relocs relocs;
relocs.reserve(moving.iterative_size());
if (_is_degenerate)
{
// degenerate case (everything is a dash); no non-moving positions
// exist, just pick 0
for(auto& region : moving)
relocs.emplace(0, region);
} else {
auto inertia = [&] {
Position inert_point;
while (contains(moving, inert_point = _randpos(_rng)))
; // discard insertion points that are moving
return inert_point;
};
for(auto& region : moving)
relocs.emplace(inertia(), region);
}
return relocs;
}
template <typename Out>
Out apply_relocations(Relocs const& mobility, Out out) const {
if (_is_degenerate)
return std::copy(_input.begin(), _input.end(), out);
auto to_iter = [this](Position pos) { return _input.begin() + pos; };
do_apply_relocations(mobility, [&](Region const& r, bool relocated) {
if (relocated) *out++ = '(';
out = std::copy(
to_iter(first(r)),
to_iter(last(r)+1),
out
);
if (relocated) *out++ = ')';
});
return out;
}
template <typename F>
void do_apply_relocations(Relocs const& mobility, F const& apply) const {
icl::split_interval_set<Position> steps {{0, _input.size()}};
for (auto& m : mobility) {
steps += m.first; // force a split on insertion point
steps -= m.second; // remove the source of the move
//std::cout << m.second << " moving to #" << m.first << ": " << steps << "\n";
}
auto next_mover = mobility.begin();
for(auto& step : steps) {
while (next_mover!=mobility.end() && contains(step, next_mover->first))
apply((next_mover++)->second, true);
apply(step, false);
}
}
private:
std::string _input;
Map _dashes;
std::mt19937 _rng;
std::uniform_int_distribution<Position> _select;
std::uniform_int_distribution<Position> _randpos;
bool _is_degenerate;
Map pick_dashes(int n) {
Map map;
if (!_dashes.empty())
for (int i = 0; i < n; ++i)
map += *std::next(_dashes.begin(), _select(_rng));
return map;
}
template <typename It> Map region_map(It f, It l) {
Map dashes;
for (Position pos = 0; f!=l; ++f, ++pos)
if ('-' == *f)
dashes += pos;
return dashes;
}
};
int main() {
for (std::string test_case : {
"----",
"A--TG-DF----GR--",
"",
"ATGDFGR",
})
{
auto start = std::chrono::high_resolution_clock::now();
Generator gen(test_case);
std::string result;
std::map<std::string, size_t> histo;
for(int i = 0; i < 1000000; ++i) {
auto const mobility = gen.gen_relocations(1 + gen.random(3)); // move n blocks of dashes
result.clear();
gen.apply_relocations(mobility, std::back_inserter(result));
histo[result]++;
}
std::cout << histo.size() << " unique results for '" << test_case << "'"
<< " in " << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now()-start).count() << "ms\n";
std::multimap<size_t, std::string, std::greater<size_t> > ranked;
for (auto& entry : histo)
ranked.emplace(entry.second, entry.first);
int topN = 10;
for (auto& rank : ranked)
{
std::cout << std::setw(8) << std::right << rank.first << ": " << rank.second << "\n";
if (0 == --topN)
break;
}
}
}
打印例如
1 unique results for '----' in 186ms
1000000: ----
3430 unique results for 'A--TG-DF----GR--' in 1156ms
9251: A(----)--TG-DFGR--
9226: (----)A--TG-DFGR--
9191: A--T(----)G-DFGR--
9150: A--TG-DFGR-(----)-
9132: A--(----)TG-DFGR--
9128: A--TG(----)-DFGR--
9109: A--TG-D(----)FGR--
9098: A--TG-DFG(----)R--
9079: A--TG-DFGR(----)--
9047: A-(----)-TG-DFGR--
1 unique results for '' in 25ms
1000000:
1 unique results for 'ATGDFGR' in 77ms
1000000: ATGDFGR
关于c++ - 从字符串中随机选择特定子序列,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/28397514/