Understanding Spark operators and storage

For an explanation of combineByKey, see http://luojinping.com/2016/01/…

An example application of combineByKey
Computing a mean (average temperature per month)
val rdd = sc.textFile("weather-data")
// Each line is assumed to look like "yyyyMMdd temperature";
// substring(0, 6) keys each record by its year-month
val rdd2 = rdd.map(_.split(" ")).map(x => (x(0).substring(0, 6), x(1).toInt))

// createCombiner: turn the first value seen for a key into a (sum, count) pair
val createCombiner = (v: Int) => (v, 1)

// mergeValue: fold another value for the same key into an existing accumulator
val mergeValue = (c: (Int, Int), v: Int) => (c._1 + v, c._2 + 1)

// mergeCombiners: merge accumulators for the same key from different partitions
val mergeCombiners = (c1: (Int, Int), c2: (Int, Int)) => (c1._1 + c2._1, c1._2 + c2._2)

val rdd3 = rdd2.combineByKey(createCombiner, mergeValue, mergeCombiners)
rdd3.foreach(x => println(x._1 + ": average temperature is " + x._2._1.toDouble / x._2._2))
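To see why the three functions fit together, here is a minimal local sketch (plain Scala, no Spark cluster needed) that simulates what combineByKey does: fold each "partition" with createCombiner/mergeValue, then merge the per-partition accumulators with mergeCombiners. The sample records, the two-way partition split, and the yyyyMM keys are illustrative assumptions, not part of the original post.

```scala
object CombineByKeyDemo {
  // The same three functions that combineByKey would receive
  val createCombiner = (v: Int) => (v, 1)
  val mergeValue = (c: (Int, Int), v: Int) => (c._1 + v, c._2 + 1)
  val mergeCombiners = (c1: (Int, Int), c2: (Int, Int)) => (c1._1 + c2._1, c1._2 + c2._2)

  def averages(records: Seq[(String, Int)]): Map[String, Double] = {
    // Simulate two partitions: fold each one locally, as Spark does map-side
    val (p1, p2) = records.splitAt(records.length / 2)
    def combineLocal(part: Seq[(String, Int)]): Map[String, (Int, Int)] =
      part.foldLeft(Map.empty[String, (Int, Int)]) { case (m, (k, v)) =>
        // First value for a key -> createCombiner; later values -> mergeValue
        m.updated(k, m.get(k).map(c => mergeValue(c, v)).getOrElse(createCombiner(v)))
      }
    // Merge the per-partition accumulators, as Spark does after the shuffle
    val merged = (combineLocal(p1).toSeq ++ combineLocal(p2).toSeq)
      .groupBy(_._1)
      .map { case (k, cs) => k -> cs.map(_._2).reduce(mergeCombiners) }
    merged.map { case (k, (sum, count)) => k -> sum.toDouble / count }
  }

  def main(args: Array[String]): Unit =
    averages(Seq(("201601", 5), ("201601", 7), ("201602", -3)))
      .foreach { case (k, avg) => println(k + ": average temperature is " + avg) }
}
```

The key design point: the accumulator type (Int, Int) differs from the value type Int, which is exactly what combineByKey allows and reduceByKey does not.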

For combining Spark with HBase, see these articles:
Spark算子:RDD行动Action操作(7)–saveAsNewAPIHadoopFile、saveAsNewAPIHadoopDataset
Spark算子:RDD行动Action操作(6)–saveAsHadoopFile、saveAsHadoopDataset
SparkSQL读取HBase数据,通过自定义外部数据源

This series of articles is also good: posts under the tag spark算子.

This one is also good, and still actively updated: lujinhong.

Explanations of the various operators, with examples that aid understanding, are also here: spark常用transformation和action.html
