There are two main approaches. The older one, from before Spark 1.3, is do-it-yourself. Later Spark SQL arrived, and using its DataFrame works as well.
=================================
The older approach:
Spark and MySQL (JdbcRDD) integration
Writing Spark computation results to MySQL
This JdbcRDD approach appears to be Scala-only; see Spark SQL: JdbcRDD
==========================……
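As a rough illustration of the DataFrame route mentioned above, here is a minimal sketch. It assumes Spark 2.x or later (`SparkSession`) and a MySQL JDBC driver on the classpath. The object name `MysqlJdbcSketch` and its helper names are hypothetical, chosen only for this example.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

// Hypothetical helper object, for illustration only.
object MysqlJdbcSketch {
  // Builds the JDBC connection properties from a user name and password.
  def connectionProps(user: String, password: String): Properties = {
    val props = new Properties()
    props.setProperty("user", user)
    props.setProperty("password", password)
    props
  }

  // Reads a whole table into a DataFrame through the JDBC data source.
  def readTable(spark: SparkSession, url: String, table: String, props: Properties): DataFrame =
    spark.read.jdbc(url, table, props)

  // Appends the rows of a DataFrame to an existing (or new) table.
  def writeTable(df: DataFrame, url: String, table: String, props: Properties): Unit =
    df.write.mode(SaveMode.Append).jdbc(url, table, props)
}
```

A call might look like `MysqlJdbcSketch.readTable(spark, "jdbc:mysql://localhost:3306/test", "results", props)`, where the URL and table name are placeholders for your own database.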
Tag Archives: Spark
Understanding Spark operators and storage methods
For an explanation of combineByKey, see http://luojinping.com/2016/01/…
combineByKey usage example
Computing the mean
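// "weather data" stands for the path to the input file
val rdd = sc.textFile("weather data")
// Key each record by year-month; substring(0, 6) assumes dates like yyyyMMdd, adjust to your format
val rdd2 = rdd.map(x => x.split(" ")).map(x => (x(0).substring(0, 6), x(1).toInt))
val createCombiner = (k: String, v:……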
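The excerpt is cut off before the combiner functions are defined, so the following is not the original code. It is a minimal sketch of one common way to compute a per-key mean with combineByKey, by carrying a (sum, count) pair for each key. The names `MeanByKey`, `SumCount`, and `meanByKey` are hypothetical.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical helper object, for illustration only.
object MeanByKey {
  // Per-key accumulator: running sum and number of values seen.
  type SumCount = (Long, Int)

  def meanByKey(pairs: RDD[(String, Int)]): RDD[(String, Double)] = {
    // Called for the first value of a key within a partition.
    val createCombiner = (v: Int) => (v.toLong, 1)
    // Called for each further value of that key within the same partition.
    val mergeValue = (acc: SumCount, v: Int) => (acc._1 + v, acc._2 + 1)
    // Called to merge accumulators of the same key across partitions.
    val mergeCombiners = (a: SumCount, b: SumCount) => (a._1 + b._1, a._2 + b._2)

    pairs
      .combineByKey(createCombiner, mergeValue, mergeCombiners)
      .mapValues { case (sum, count) => sum.toDouble / count }
  }
}
```

Applied to pairs like `rdd2` above, `MeanByKey.meanByKey(rdd2)` would yield one (year-month, mean) pair per key. Carrying the count alongside the sum is what makes the merge step correct, since means of partitions cannot simply be averaged.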
Moving average in Spark
To compute a moving average on Spark, this answer is a useful reference:
http://stackoverflow.com/quest…
You can use the sliding function from MLLIB which probably does the same thing as Daniel’s answer. You will have to sort the data by time before using the sliding function.
import org.apache.spark.mllib.rdd.RDDFun……
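The quoted answer is truncated here, so the following is only a sketch of the idea it describes: sort by time, then average each window produced by MLlib's `sliding`. It assumes the series is an RDD of (timestamp, value) pairs. The names `MovingAverage` and `movingAverage` are hypothetical.

```scala
import org.apache.spark.mllib.rdd.RDDFunctions._
import org.apache.spark.rdd.RDD

// Hypothetical helper object, for illustration only.
object MovingAverage {
  // series: (timestamp, value) pairs in any order; window: number of points per average.
  def movingAverage(series: RDD[(Long, Double)], window: Int): RDD[Double] =
    series
      .sortByKey()      // sliding relies on element order, so sort by time first
      .values
      .sliding(window)  // RDD[Array[Double]], one array per full window
      .map(w => w.sum / w.length)
}
```

Only full windows are emitted, so a series of n points gives n - window + 1 averages.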
spark Task not serializable
http://stackoverflow.com/quest…
In case of using Java API you should avoid anonymous class when passing to the mapping function closure. Instead of doing map( new Function) you need a class that extends your function and pass that to the map(..) See: https://yanago.wordpress.com/2…
ht……
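The quoted advice concerns the Java API, where an anonymous class drags its enclosing object into the task closure. A rough Scala analog of the same root cause, offered only as a sketch: a closure that reads a field of a non-serializable class captures the whole instance, and copying the field into a local value avoids that. The class `Scaler` and its methods are hypothetical.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical class, deliberately NOT Serializable, standing in for a driver-side helper.
class Scaler(val factor: Int) {
  // Risky: `factor` here means `this.factor`, so the closure captures the whole
  // Scaler instance and Spark would typically report "Task not serializable".
  def scaleCapturingThis(rdd: RDD[Int]): RDD[Int] = rdd.map(_ * factor)

  // Safer: copy the field into a local val so the closure captures only an Int.
  def scale(rdd: RDD[Int]): RDD[Int] = {
    val f = factor
    rdd.map(_ * f)
  }
}
```

The other common fix is to make the enclosing class extend `Serializable`, which is reasonable only when all of its fields can actually be shipped to the executors.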