Working with MySQL from Spark

Posted on 2016-04-19 by ZRJ

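There are two main approaches. The older one, from before Spark 1.3, is do-it-yourself. Later, Spark SQL arrived, and its DataFrame can be used as well.
=================================
The older approach is covered in: "Spark与Mysql(JdbcRDD)整合开发" (integrating Spark with MySQL via JdbcRDD) and "Spark将计算结果写入到Mysql中" (writing Spark results to MySQL). This JdbcRDD style appears to be Scala-only.
Spark SQL: JdbcRDD
==========================……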
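The two approaches can be sketched side by side. This is a minimal sketch, not the original post's code: the connection URL, credentials, and the `person` / `person_copy` tables are hypothetical, the MySQL JDBC driver is assumed to be on the classpath, and the DataFrame part assumes Spark 1.4 or later (where `read.jdbc` and `write.jdbc` exist).

```scala
import java.sql.{DriverManager, ResultSet}
import java.util.Properties

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.JdbcRDD
import org.apache.spark.sql.{SQLContext, SaveMode}

object SparkMysqlSketch {
  // Hypothetical connection settings; replace with real ones.
  val url = "jdbc:mysql://localhost:3306/testdb"
  val user = "spark"
  val password = "secret"

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("spark-mysql-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Older approach: JdbcRDD. The query must contain exactly two '?'
      // placeholders; Spark fills them with per-partition bounds over [1, 1000].
      val rows = new JdbcRDD(
        sc,
        () => DriverManager.getConnection(url, user, password),
        "SELECT id, name FROM person WHERE id >= ? AND id <= ?",
        1L, 1000L, 4,
        (rs: ResultSet) => (rs.getLong(1), rs.getString(2)))
      println(rows.count())

      // Newer approach: a Spark SQL DataFrame backed by JDBC.
      val sqlContext = new SQLContext(sc)
      val props = new Properties()
      props.setProperty("user", user)
      props.setProperty("password", password)
      val df = sqlContext.read.jdbc(url, "person", props)

      // Write a filtered result back to a (hypothetical) second table.
      df.filter(df("id") > 10)
        .write.mode(SaveMode.Append)
        .jdbc(url, "person_copy", props)
    } finally {
      sc.stop()
    }
  }
}
```

JdbcRDD only reads, so writing results with the older approach still means opening a JDBC connection by hand, typically inside `foreachPartition`. The DataFrame writer handles that part itself.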


Understanding Spark operators and storage options

Posted on 2016-04-19 by ZRJ

For an explanation of combineByKey, see http://luojinping.com/2016/01/…
Example use of combineByKey, computing a mean:
val rdd = sc.textFile("weather data")
val rdd2 = rdd.map(x => x.split(" ")).map(x => (x(0).substring("extract the year and month from the date"), x(1).toInt))
val createCombiner = (k: String, v:……
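Since the excerpt is cut off, here is a separate, minimal sketch of a per-key mean with combineByKey. It is not the original post's code: the sample `(month, reading)` pairs and the `Acc` alias are made up for illustration, and the accumulator is assumed to be a `(sum, count)` pair.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CombineByKeyMeanSketch {
  // Accumulator: (running sum, number of values seen).
  type Acc = (Int, Int)

  // First value seen for a key within a partition.
  val createCombiner: Int => Acc = v => (v, 1)
  // Fold another value for the same key into the partition-local accumulator.
  val mergeValue: (Acc, Int) => Acc = (acc, v) => (acc._1 + v, acc._2 + 1)
  // Merge accumulators for the same key coming from different partitions.
  val mergeCombiners: (Acc, Acc) => Acc = (a, b) => (a._1 + b._1, a._2 + b._2)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("combine-by-key-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical (year-month, reading) pairs standing in for parsed weather data.
      val readings = sc.parallelize(Seq(("2016-01", 10), ("2016-01", 20), ("2016-02", 5)))
      val means = readings
        .combineByKey(createCombiner, mergeValue, mergeCombiners)
        .mapValues { case (sum, count) => sum.toDouble / count }
      means.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Carrying the count alongside the sum is what makes the mean computable in one pass: averages of partition-level averages would be wrong whenever partitions hold different numbers of values per key.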


Moving average in Spark

Posted on 2016-04-18 by ZRJ

To compute a moving average in Spark, see http://stackoverflow.com/quest… You can use the sliding function from MLlib, which probably does the same thing as Daniel’s answer. You will have to sort the data by time before using the sliding function. import org.apache.spark.mllib.rdd.RDDFun……
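The sort-then-slide idea from the quoted answer might look like the sketch below. It assumes the spark-mllib artifact is on the classpath and uses `sliding` from `org.apache.spark.mllib.rdd.RDDFunctions`; the `(timestamp, value)` samples, the window size, and the `mean` helper are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.rdd.RDDFunctions._

object MovingAverageSketch {
  // Plain arithmetic mean of one window.
  def mean(window: Array[Double]): Double = window.sum / window.length

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("moving-average-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical (timestamp, value) samples, deliberately out of order.
      val samples = sc.parallelize(Seq((3L, 30.0), (1L, 10.0), (2L, 20.0), (4L, 40.0)))
      val windowSize = 3

      val movingAvg = samples
        .sortByKey()          // sort by time first, as the answer requires
        .values               // keep only the measurements, now in time order
        .sliding(windowSize)  // RDD[Array[Double]], one array per full window
        .map(mean)

      movingAvg.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Only full windows are produced, so an input of n values yields n - windowSize + 1 averages.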


Spark: Task not serializable

Posted on 2016-04-18 by ZRJ

http://stackoverflow.com/quest… When using the Java API, avoid passing an anonymous class as the mapping function. Instead of calling map(new Function), define a named class that implements your function and pass an instance of it to map(..). See: https://yanago.wordpress.com/2… ht……
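The quoted advice targets the Java API, where an anonymous inner class keeps a reference to its enclosing instance, so Spark tries to serialize that instance along with the task. The sketch below shows the same pattern in Scala rather than Java, to stay consistent with the other snippets on this page: a standalone, serializable function class is passed to `map`. The `AddOffset` class is hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical standalone function class. It is Serializable and, being
// top-level, carries no hidden reference to an enclosing object.
class AddOffset(offset: Int) extends (Int => Int) with Serializable {
  override def apply(x: Int): Int = x + offset
}

object SerializableTaskSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("serializable-task-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Only the AddOffset instance (one Int field) is shipped with the task.
      val shifted = sc.parallelize(1 to 5).map(new AddOffset(10)).collect()
      println(shifted.mkString(","))
    } finally {
      sc.stop()
    }
  }
}
```

In Scala the more common trigger is a closure that reads a field of a non-serializable enclosing class; copying the field into a local `val` before the closure is the usual alternative to a dedicated class.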


ZRJ

Study notes

Tag Archives: Spark
