java - HBase - How do I add a super column family?

Tags: java mysql hbase super-columns

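I am trying to build a Java application that converts a MySQL database into a NoSQL HBase database.
So far it correctly reads data from MySQL and inserts it into HBase.
Now I am trying to handle the relationships between the MySQL tables. As I understand it, if a relationship exists, you should add one of the tables as a super column family.
I looked through the Apache website documentation for SuperColumn but could not find anything.
Any ideas?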

Best answer

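Column families have nothing to do with relationships. Instead, you have to design your row keys so that they act as an inverted index, which lets you retrieve data efficiently, in effectively O(1), when you know the key from the other table. Alternatively, to avoid joins, try to store all the data in a single row. Any tool that provides a SQL interface on top of HBase generates jobs that take time to start and execute. HBase is fast when you perform a Get or scan consecutive rows. Hope this helps.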
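To make the row-key idea concrete, here is a minimal sketch that uses only the Java standard library. It assumes a hypothetical one-to-many relation between customers and orders; the names `RowKeySketch`, `orderRowKey`, `putOrder`, `scanOrdersOf` and the `d:total` column are invented for illustration and do not come from the question. A `TreeMap` stands in for an HBase table, since both keep rows sorted by key. The parent key is placed first in the child's row key, so all orders of one customer sit next to each other and can be read with a single range scan instead of a join.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RowKeySketch {
    // Stand-in for an HBase table (assumption): rows are kept sorted by key.
    // HBase compares keys as bytes; String order matches that for ASCII keys.
    static final TreeMap<String, Map<String, String>> ORDERS = new TreeMap<>();

    // Composite row key: parent (customer) id first, then the order id.
    // Zero-padding makes lexicographic order agree with numeric order.
    // Assumes non-negative ids.
    static String orderRowKey(int customerId, int orderId) {
        return String.format("%010d#%010d", customerId, orderId);
    }

    // Analogue of a Put: one row per order, keyed by the composite key.
    static void putOrder(int customerId, int orderId, String total) {
        Map<String, String> cells = new TreeMap<>();
        cells.put("d:total", total); // hypothetical "family:qualifier" name
        ORDERS.put(orderRowKey(customerId, orderId), cells);
    }

    // Analogue of a prefix scan: every row whose key starts with "<customer>#".
    static List<String> scanOrdersOf(int customerId) {
        String start = String.format("%010d#", customerId);
        String stop = String.format("%010d$", customerId); // '$' sorts right after '#'
        return new ArrayList<>(ORDERS.subMap(start, true, stop, false).keySet());
    }

    public static void main(String[] args) {
        putOrder(42, 7, "19.90");
        putOrder(42, 8, "5.00");
        putOrder(43, 1, "12.00");
        System.out.println(scanOrdersOf(42)); // only the two rows of customer 42
    }
}
```

With the real HBase client, the same key layout would be written with `Put` and read back with a `Scan` bounded by the customer prefix; the sketch only illustrates the key design, not the client API.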

Update

For more details on column families, see the excellent book Architecting HBase Applications:

A column family is an HBase-specific concept that you will not find in other RDBMS applications. For the same region, different column families will store the data into different files and can be configured differently. Data with the same access pattern and the same format should be grouped into the same column family. As an example regarding the format, if you need to store a lot of textual metadata information for customer profiles in addition to image files for each customer’s profile photo, you might want to store them into two different column families: one compressed (where all the textual information will be stored), and one not compressed (where the image files will be stored). As an example regarding the access pattern, if some information is mostly read and almost never written, and some is mostly written and almost never read, you might want to separate them into two different column families. If the different columns you want to store have a similar format and access pattern, regroup them within the same column family. The write cache memory area for a given RegionServer is shared by all the column families configured for all the regions hosted by the given host. Abusing column families will put pressure on the memstore, which will generate many small files, which in turn will generate a lot of compactions that might impact the performance. There is no technical limitation on the number of column families you can configure for a table. However, over the last three years, most of the use cases we had the chance to work on only required a single column family. Some required two column families, but each time we have seen more than two column families, it has been possible and recommended to reduce the number to improve efficiency. If your design includes more than three column families, you might want to take a deeper look at it and see if all those families are really required; most likely, they can be regrouped. 
If you do not have any consistency constraints between your two column families and data will arrive into them at a different time, instead of creating two column families for a single table, you can also create two tables, each with a single column family. This strategy is useful when it comes time to decide the size of the regions. Indeed, while it was better to keep the two column families almost the same size, by splitting them across two different tables, it is now easier to let them grow independently.

This answer may also be useful.

Regarding "java - HBase - How do I add a super column family?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/44880421/