events - Reducing events to time intervals

Tags: events logging mapreduce reducing

Scenario:
I have a service that logs events, for example this CSV sample:

#TimeStamp, Name, ColorOfPullover
TimeStamp01, Peter, Green
TimeStamp02, Bob, Blue
TimeStamp03, Peter, Green
TimeStamp04, Peter, Red
TimeStamp05, Peter, Green

Events such as "Peter wearing green" often occur several times in a row.

I have two goals:
  • Keep the data as small as possible
  • Keep all relevant data

  • Relevant means: I need to know during which time span a person was wearing which color. For example:
    #StartTime, EndTime, Name, ColorOfPullover
    TimeStamp01, TimeStamp03, Peter, Green
    TimeStamp02, TimeStamp02, Bob, Blue
    TimeStamp03, TimeStamp03, Peter, Green
    TimeStamp04, TimeStamp04, Peter, Red
    TimeStamp05, TimeStamp05, Peter, Green
    

    In this format I can answer questions like: what color was Peter wearing at TimeStamp02? (I can safely assume that each person wears the same color between two logged events of the same color.)
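To make the query concrete, here is a minimal sketch (the `color_at` helper and the in-memory interval list are assumptions for illustration, not part of the question) of answering "what color at TimeStamp02?" against the compacted interval table. Timestamps are compared as strings, which works here because they sort lexicographically.

```python
# Compacted interval table: (start, end, name, color)
intervals = [
    ("TimeStamp01", "TimeStamp03", "Peter", "Green"),
    ("TimeStamp02", "TimeStamp02", "Bob", "Blue"),
    ("TimeStamp04", "TimeStamp04", "Peter", "Red"),
    ("TimeStamp05", "TimeStamp05", "Peter", "Green"),
]

def color_at(name, ts):
    """Return the color worn by `name` at timestamp `ts`, or None."""
    for start, end, person, color in intervals:
        if person == name and start <= ts <= end:
            return color
    return None

print(color_at("Peter", "TimeStamp02"))  # Green
print(color_at("Bob", "TimeStamp02"))    # Blue
```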

    Main question:
    Can I achieve this with existing technologies? I.e. can I feed them a continuous stream of events and have them extract and store the relevant data?

    To be precise, I need to implement an algorithm like the following (pseudocode). The OnNewEvent method is called for every line of the CSV sample; the event parameter already contains the line's data as member variables.
    def OnNewEvent(event)
        entry = Database.getLatestEntryFor(event.personName)
        if (entry != null and entry.pulloverColor == event.pulloverColor)
            entry.setIntervalEndDate(event.date)
            Database.store(entry)
        else
            newEntry = new Entry
            newEntry.setIntervalStartDate(event.date)
            newEntry.setIntervalEndDate(event.date)
            newEntry.setPulloverColor(event.pulloverColor)
            newEntry.setName(event.personName)
            Database.createNewEntry(newEntry)
        end
    end
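The pseudocode above can be turned into a runnable sketch. This uses an in-memory dict as a stand-in for the database (the `Entry` class and the dict-backed store are assumptions for illustration; a real implementation would query a persistent store):

```python
class Entry:
    """One interval row: a person wearing one color over [start, end]."""
    def __init__(self, name, color, start, end):
        self.name = name
        self.pullover_color = color
        self.interval_start = start
        self.interval_end = end

latest = {}    # latest entry per person (stand-in for Database.getLatestEntryFor)
entries = []   # all interval rows (stand-in for the database table)

def on_new_event(timestamp, name, color):
    entry = latest.get(name)
    if entry is not None and entry.pullover_color == color:
        entry.interval_end = timestamp          # same color: extend the interval
    else:
        entry = Entry(name, color, timestamp, timestamp)  # new color: new row
        entries.append(entry)
        latest[name] = entry

# Feed in the CSV sample from the question
for ts, name, color in [
    ("TimeStamp01", "Peter", "Green"),
    ("TimeStamp02", "Bob", "Blue"),
    ("TimeStamp03", "Peter", "Green"),
    ("TimeStamp04", "Peter", "Red"),
    ("TimeStamp05", "Peter", "Green"),
]:
    on_new_event(ts, name, color)

for e in entries:
    print(e.interval_start, e.interval_end, e.name, e.pullover_color)
```

Running this on the sample produces the compacted intervals (TimeStamp01-TimeStamp03 for Peter's first green run, and so on), which matches the goal of keeping the data small while preserving the time spans.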
    

    Best Answer

    This is a typical scenario for any streaming architecture.

    There are multiple existing technologies that work in tandem to achieve what you want:


    1.  A NoSQL database (HBase, Aerospike, Cassandra)
    2.  A streaming framework like Spark Streaming (micro-batch) or Storm
    3.  MapReduce jobs run in micro-batches to insert into the NoSQL database
    4.  Kafka as a distributed queue
    
    The end-to-end flow:

    Data -> streaming framework -> NoSQL database
    OR
    Data -> Kafka -> streaming framework -> NoSQL database
    
    
    In the NoSQL database there are two ways to model your data:
    1. Key by "Name" and insert every event for that key into the database.
       When fetching, you get back all events for that key.

    2. Key by "Name" and, every time an event arrives for a key, do an UPSERT into the existing blob (an object saved as binary). Inside the blob you maintain the time ranges and colors seen.
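Modeling option 2 can be sketched as follows. This is a hedged, in-memory illustration (the `blobs` dict stands in for the NoSQL store, and `upsert` is a hypothetical helper, not a real client API): each name maps to one "blob" holding its list of (start, end, color) intervals, and repeated colors extend the last interval instead of adding a row.

```python
blobs = {}  # name -> list of [start, end, color]; stand-in for the NoSQL store

def upsert(name, timestamp, color):
    """Merge one event into the blob for `name`, extending on repeated color."""
    intervals = blobs.setdefault(name, [])
    if intervals and intervals[-1][2] == color:
        intervals[-1][1] = timestamp            # same color: extend last interval
    else:
        intervals.append([timestamp, timestamp, color])  # new interval

upsert("Peter", "TimeStamp01", "Green")
upsert("Peter", "TimeStamp03", "Green")
upsert("Peter", "TimeStamp04", "Red")
print(blobs["Peter"])
```

Compared with option 1, this keeps one record per person and pushes the interval-merging logic into the write path, so reads return the already-compacted history.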
    
    Code samples for reading from and writing to HBase and Aerospike:
    

    Hbase:http://bytepadding.com/hbase/

    Aerospike:http://bytepadding.com/aerospike/

    Regarding "events - reducing events to time intervals", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46222812/
