
Amitabha (阿弥陀佛)

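Shadows of street trees drift, yet no dust is seen; the moon sinks into the pool without a sound; prajna reflects, the mind empty and still...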

 
 
 

About me

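I have long worked on meteorological forecasting and on building and applying service models. My focus is on building meteorological physical fields, observation fields, geographic information, ontology knowledge bases, and distributed meteorological content management systems. I have some experience with Barnes objective analysis, wavelets, computational neural networks, trust propagation, Bayesian inference, expert systems, and the Web Ontology Language. I program mainly in Java, Delphi, Prolog, and SQL.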


CosineSimilarity  

2015-01-13 22:38:46 | Category: Spark

/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package examples.mllib

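// import scopt.OptionParser  // needed only if the argument parser in main() is re-enabled

// Implicit pair-RDD conversions (needed for leftOuterJoin on Spark versions before 1.3).
import org.apache.spark.SparkContext._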
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{MatrixEntry, RowMatrix}
import org.apache.spark.{SparkConf, SparkContext}

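/**
 * Compute cosine similarities between the columns of a matrix, both exactly (brute force)
 * and approximately (DIMSUM sampling), and print the mean absolute error of the estimate.
 *
 * The input matrix must be stored in row-oriented dense format, one line per row with its
 * entries separated by spaces. For example,
 * {{{
 * 0.5 1.0
 * 2.0 3.0
 * 4.0 5.0
 * }}}
 * represents a 3-by-2 matrix whose first row is (0.5, 1.0).
 *
 * Example invocation of the original Spark example:
 *
 * bin/run-example mllib.CosineSimilarity \
 *   --threshold 0.1 data/mllib/sample_svm_data.txt
 *
 * In this version the argument parser is commented out, so command-line arguments are
 * ignored and the defaults in Params are used.
 */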
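object CosineSimilarity {
  // Alternative input files: Spark's data/mllib/sample_svm_data.txt, or the NCEP file
  // sparkdata/ncep_BE/ncep_BE2015010403.DAT used as the default below.
  // AbstractParams is not included in this post; it is presumably a copy of the
  // AbstractParams class from Spark's mllib examples (which only adds a reflective
  // toString). Remove the `extends` clause if it is not on the classpath.
  case class Params(
      inputFile: String = "sparkdata/ncep_BE/ncep_BE2015010403.DAT",
      threshold: Double = 0.1)
    extends AbstractParams[Params]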

  def main(args: Array[String]): Unit = {
    val defaultParams = Params()

    // The scopt-based parser from the original Spark example is disabled,
    // so command-line arguments are ignored and defaultParams is always used.
    // val parser = new OptionParser[Params]("CosineSimilarity") {
    //   head("CosineSimilarity: an example app.")
    //   opt[Double]("threshold")
    //     .required()
    //     .text(s"threshold similarity: to tradeoff computation vs quality estimate")
    //     .action((x, c) => c.copy(threshold = x))
    //   arg[String]("<inputFile>")
    //     .required()
    //     .text(s"input file, one row per line, space-separated")
    //     .action((x, c) => c.copy(inputFile = x))
    //   note(
    //     """
    //       |For example, the following command runs this app on a dataset:
    //       |
    //       | ./bin/spark-submit --class org.apache.spark.examples.mllib.CosineSimilarity \
    //       |   examplesjar.jar \
    //       |   --threshold 0.1 data/mllib/sample_svm_data.txt
    //     """.stripMargin)
    // }
    //
    // parser.parse(args, defaultParams).map { params =>
    run(defaultParams)
    // } getOrElse {
    //   System.exit(1)
    // }
  }

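  def run(params: Params): Unit = {
    val conf = new SparkConf().setAppName("CosineSimilarity")
    val sc = new SparkContext(conf)

    // Load and parse the data file.
    // Keep only lines whose second character is a digit: apparently a heuristic for
    // skipping the header lines of the NCEP input file. The length check prevents an
    // exception on empty or one-character lines. This filter also drops rows such as
    // "0.5 1.0" (second character '.'), so it does not match the format in the Scaladoc.
    val rows = sc.textFile(params.inputFile)
      .filter(line => line.length > 1 && line(1) >= '0' && line(1) <= '9')
      .map { line =>
        // Split on spaces, ignoring the empty strings produced by repeated spaces.
        val values = line.split(' ').filter(_.nonEmpty).map(_.toDouble)
        Vectors.dense(values)
      }
      .cache()
    val mat = new RowMatrix(rows)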

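    // Compute exact column similarities by brute force.
    val exact = mat.columnSimilarities()

    // Estimate column similarities with DIMSUM sampling. The threshold trades computation
    // for accuracy: column pairs whose similarity exceeds it are estimated with
    // relative-error guarantees, and a threshold of 0 gives exact results.
    val approx = mat.columnSimilarities(params.threshold)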

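    // Both results are upper-triangular CoordinateMatrix values; key their entries by (i, j).
    val exactEntries = exact.entries.map { case MatrixEntry(i, j, u) => ((i, j), u) }
    val approxEntries = approx.entries.map { case MatrixEntry(i, j, v) => ((i, j), v) }
    // Mean absolute error over the exact entries; a pair missing from the
    // estimate counts as an estimate of 0.
    val MAE = exactEntries.leftOuterJoin(approxEntries).values.map {
      case (u, Some(v)) => math.abs(u - v)
      case (u, None) => math.abs(u)
    }.mean()

    println(s"Average absolute error in estimate is: $MAE")

    sc.stop()
  }
}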
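The header filter in `run` depends on the layout of the NCEP input file. It keeps a line only when its second character is a digit, so a row such as `0.5 1.0` from the Scaladoc example would be skipped. A more general option, sketched here as an assumption rather than the author's approach, is to keep a line only if every whitespace-separated token parses as a number. `ParseRowsSketch`, `parseRow` and `parseRows` are hypothetical names, and the sketch runs on plain Scala collections rather than on an RDD.

```scala
import scala.util.Try

object ParseRowsSketch {
  // Returns the numeric values of a line, or None for a blank line or a line
  // containing any token that does not parse as a Double (for example a text header).
  def parseRow(line: String): Option[Array[Double]] = {
    val tokens = line.trim.split("\\s+").filter(_.nonEmpty)
    val parsed = tokens.map(t => Try(t.toDouble).toOption)
    if (tokens.isEmpty || parsed.exists(_.isEmpty)) None
    else Some(parsed.map(_.get))
  }

  // Keeps only the lines that parse completely, in their original order.
  def parseRows(lines: Seq[String]): Seq[Array[Double]] =
    lines.flatMap(line => parseRow(line).toList)

  def main(args: Array[String]): Unit = {
    // The first line is a hypothetical text header.
    val lines = Seq("# hypothetical header line", "", "0.5 1.0", "2.0  3.0", " 4.0\t5.0")
    // Prints the three numeric rows: 0.5 1.0, 2.0 3.0 and 4.0 5.0
    parseRows(lines).foreach(row => println(row.mkString(" ")))
  }
}
```

Text headers are dropped, but a header line made only of numbers would still be kept, so the check may need tightening for a particular file format. The same `parseRow` function could be applied inside the RDD pipeline, for example with `flatMap`, before the rows are turned into dense vectors.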
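In the exact case, `columnSimilarities()` computes, for columns i and j, the sum over rows r of a_ri * a_rj divided by ||a_i|| * ||a_j||. This can be checked by hand on the 3-by-2 matrix in the Scaladoc. The plain-Scala sketch below uses no Spark, and `ExactColumnCosine` and `cosineColumns` are hypothetical names. It keeps only pairs with i < j, which matches the upper-triangular result that Spark documents for this method.

```scala
object ExactColumnCosine {
  // Exact cosine similarity for every column pair (i, j) with i < j.
  // rows(r)(c) is the entry in row r, column c; all rows must have the same length.
  def cosineColumns(rows: Seq[Array[Double]]): Map[(Int, Int), Double] = {
    val n = rows.head.length
    // Euclidean (L2) norm of each column.
    val norms = Array.tabulate(n)(c => math.sqrt(rows.map(r => r(c) * r(c)).sum))
    val pairs = for {
      i <- 0 until n
      j <- (i + 1) until n
      if norms(i) > 0 && norms(j) > 0 // an all-zero column has no defined cosine
    } yield {
      val dot = rows.map(r => r(i) * r(j)).sum
      (i, j) -> dot / (norms(i) * norms(j))
    }
    pairs.toMap
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(Array(0.5, 1.0), Array(2.0, 3.0), Array(4.0, 5.0))
    // Column norms are 4.5 and sqrt(35) and the dot product is 26.5,
    // so the only entry, (0, 1), is about 0.9954.
    println(cosineColumns(rows))
  }
}
```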
 
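The DIMSUM estimate behind `columnSimilarities(threshold)` can be sketched the same way. This is a simplified, single-machine rendering of the sampling idea as I understand it from Spark's `RowMatrix`, not a copy of that code. Each nonzero entry a_ri is kept with probability min(1, sqrt(gamma)/||a_i||), and a kept entry is scaled by 1/min(sqrt(gamma), ||a_i||). In both cases (||a_i|| below or above sqrt(gamma)), the expected contribution of a row to pair (i, j) works out to a_ri * a_rj / (||a_i|| * ||a_j||), so the summed estimate is unbiased. Spark appears to derive gamma from the threshold as 10 * ln(n) / threshold for n columns, using an infinite gamma when the threshold is 0; treat that as an assumption. `DimsumSketch` and `estimate` are hypothetical names.

```scala
import scala.util.Random
import scala.collection.mutable

object DimsumSketch {
  // Simplified DIMSUM-style estimate of column cosine similarities for pairs i < j.
  // rows(r)(c) is the entry in row r, column c. Larger gamma keeps more entries;
  // gamma = Double.PositiveInfinity keeps all of them and gives the exact result.
  def estimate(rows: Seq[Array[Double]], gamma: Double, rand: Random): Map[(Int, Int), Double] = {
    val n = rows.head.length
    val sg = math.sqrt(gamma)
    // Column norms; all-zero columns are mapped to 1.0 to avoid dividing by zero.
    val norms = Array.tabulate(n) { c =>
      val norm = math.sqrt(rows.map(r => r(c) * r(c)).sum)
      if (norm == 0) 1.0 else norm
    }
    val p = norms.map(norm => sg / norm)          // keep probability (>= 1 means always keep)
    val q = norms.map(norm => math.min(sg, norm)) // scale applied to kept entries
    val sums = mutable.Map.empty[(Int, Int), Double]
    for (row <- rows; i <- 0 until n) {
      if (row(i) != 0 && rand.nextDouble() < p(i)) {
        for (j <- (i + 1) until n) {
          // Independent draw for column j within this pair.
          if (row(j) != 0 && rand.nextDouble() < p(j)) {
            val key = (i, j)
            sums(key) = sums.getOrElse(key, 0.0) + (row(i) / q(i)) * (row(j) / q(j))
          }
        }
      }
    }
    sums.toMap
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(Array(0.5, 1.0), Array(2.0, 3.0), Array(4.0, 5.0))
    // Infinite gamma: every entry is kept, so (0, 1) matches the exact value of about 0.9954.
    println(estimate(rows, Double.PositiveInfinity, new Random(1)))
    // Small gamma: entries are sampled, so individual runs scatter around the exact value.
    println(estimate(rows, 1.0, new Random(1)))
  }
}
```

With an infinite gamma, every entry is kept with scale 1/||a_i||, which reproduces the exact similarity. Smaller gamma values skip more work at the price of noisier estimates, which is the trade-off the `threshold` parameter controls.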
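The error figure printed at the end of `run` treats a pair that the estimate did not emit as an estimate of 0, because the left outer join yields `None` for it. The sketch below writes the same rule over plain Scala `Map`s so the behavior is easy to check. `MaeSketch` and `meanAbsoluteError` are hypothetical names, and the keys are `Long` pairs, matching the indices of `MatrixEntry`.

```scala
object MaeSketch {
  // Mean absolute error over the exact entries; a pair missing from the
  // estimate counts as an estimate of 0, mirroring the leftOuterJoin in run().
  def meanAbsoluteError(exact: Map[(Long, Long), Double],
                        approx: Map[(Long, Long), Double]): Double = {
    require(exact.nonEmpty, "exact must contain at least one entry")
    val errors = exact.toSeq.map { case (key, u) =>
      approx.get(key) match {
        case Some(v) => math.abs(u - v)
        case None    => math.abs(u)
      }
    }
    errors.sum / errors.size
  }

  def main(args: Array[String]): Unit = {
    val exact = Map((0L, 1L) -> 0.9, (0L, 2L) -> 0.5)
    val approx = Map((0L, 1L) -> 0.8) // (0, 2) was not emitted by the estimate
    println(meanAbsoluteError(exact, approx)) // about 0.3
  }
}
```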

© 1997-2017 NetEase, Inc. All rights reserved.