Step 1: I have a Hadoop streaming job whose running time varies with the amount of data to process.
Step 2: Once the job is done, I need to import the resulting data dump into MongoDB and produce a flat CSV file from it.
Question
Is there any way to glue Step 2 onto Step 1 using Hadoop streaming, so that Step 2 does not have to be run manually?
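One straightforward way to chain the two steps without manual intervention is a small driver script that runs each command in sequence and only proceeds when the previous one succeeded. The sketch below assumes hypothetical jar names, HDFS paths, and database names; substitute your own.

```python
import subprocess
import sys

def run_pipeline(commands):
    """Run each command in order, stopping at the first failure,
    so the MongoDB import only fires after the Hadoop job succeeds."""
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            sys.exit('step failed: %s' % ' '.join(cmd))

# Hypothetical commands -- substitute your real jar, paths and database.
pipeline = [
    ['hadoop', 'jar', 'hadoop-streaming.jar',
     '-mapper', 'mapper.py', '-reducer', 'reducer.py',
     '-input', '/data/in', '-output', '/data/out'],
    ['hadoop', 'fs', '-getmerge', '/data/out', 'dump.tsv'],
    ['mongoimport', '--db', 'mydb', '--collection', 'dump',
     '--type', 'tsv', '--file', 'dump.tsv'],
]
```

Calling `run_pipeline(pipeline)` then runs the whole chain; a non-zero exit code from any step aborts the rest.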
Best Answer
I would suggest using something like https://github.com/Yelp/mrjob or https://github.com/klbostee/dumbo. For mrjob specifically, see the documentation on writing multi-step jobs, which addresses your problem: http://packages.python.org/mrjob/job.html#writing-multi-step-jobs
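With mrjob, the multi-step mechanism is a `steps()` method returning a list of `MRStep` objects, which mrjob runs back to back; your post-processing becomes just another step. A minimal sketch, assuming mrjob is installed and using placeholder mapper/reducer logic (word counting plus a CSV-formatting step, not your actual job):

```python
from mrjob.job import MRJob
from mrjob.step import MRStep

class MRDumpAndPostprocess(MRJob):
    """Hypothetical two-step job: step 1 stands in for the existing
    streaming logic; step 2 post-processes its output."""

    def steps(self):
        return [
            MRStep(mapper=self.mapper_extract, reducer=self.reducer_count),
            MRStep(reducer=self.reducer_postprocess),
        ]

    def mapper_extract(self, _, line):
        # Placeholder extraction: emit one count per word.
        for word in line.split():
            yield word, 1

    def reducer_count(self, key, values):
        yield key, sum(values)

    def reducer_postprocess(self, key, values):
        # Placeholder post-processing: format each record as a CSV row.
        yield None, '%s,%d' % (key, sum(values))

if __name__ == '__main__':
    MRDumpAndPostprocess.run()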
Regarding "python - hadoop-streaming: automatic post-processing once the job finishes?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/9994724/