Spark SQL 1.2.0 Test
1: Start the shell
MASTER=spark://feng02:7077 ./bin/spark-shell

(The launch scripts read the uppercase MASTER environment variable; a lowercase master= has no effect. Passing --master spark://feng02:7077 to spark-shell works as well.)
2: Run Spark SQL queries in the shell

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@30aceb27

scala> import sqlContext.createSchemaRDD
import sqlContext.createSchemaRDD

scala> case class Person(name: String, age: Int)
defined class Person

scala> val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))
15/02/27 16:41:19 INFO MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=280248975
15/02/27 16:41:19 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 267.1 MB)
15/02/27 16:41:19 INFO MemoryStore: ensureFreeSpace(22736) called with curMem=163705, maxMem=280248975
15/02/27 16:41:19 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.2 KB, free 267.1 MB)
15/02/27 16:41:19 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:54438 (size: 22.2 KB, free: 267.2 MB)
15/02/27 16:41:19 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/02/27 16:41:19 INFO SparkContext: Created broadcast 0 from textFile at <console>:17
people: org.apache.spark.rdd.RDD[Person] = MappedRDD[3] at map at <console>:17

scala> people.registerTempTable("people")

scala> val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
teenagers: org.apache.spark.sql.SchemaRDD = SchemaRDD[6] at RDD at SchemaRDD.scala:108
== Query Plan ==
== Physical Plan ==
Project [name#0]
 Filter ((age#1 >= 13) && (age#1 <= 19))
  PhysicalRDD [name#0,age#1], MapPartitionsRDD[4] at mapPartitions at ExistingRDD.scala:36

scala> teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
15/02/27 16:42:24 INFO FileInputFormat: Total input paths to process : 1
15/02/27 16:42:24 INFO SparkContext: Starting job: collect at <console>:18
15/02/27 16:42:24 INFO DAGScheduler: Got job 0 (collect at <console>:18) with 1 output partitions (allowLocal=false)
15/02/27 16:42:24 INFO DAGScheduler: Final stage: Stage 0(collect at <console>:18)
15/02/27 16:42:24 INFO DAGScheduler: Parents of final stage: List()
15/02/27 16:42:24 INFO DAGScheduler: Missing parents: List()
15/02/27 16:42:24 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[7] at map at <console>:18), which has no missing parents
15/02/27 16:42:24 INFO MemoryStore: ensureFreeSpace(6416) called with curMem=186441, maxMem=280248975
15/02/27 16:42:24 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 6.3 KB, free 267.1 MB)
15/02/27 16:42:24 INFO MemoryStore: ensureFreeSpace(4290) called with curMem=192857, maxMem=280248975
15/02/27 16:42:24 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 4.2 KB, free 267.1 MB)
15/02/27 16:42:24 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:54438 (size: 4.2 KB, free: 267.2 MB)
15/02/27 16:42:24 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/02/27 16:42:24 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:838
15/02/27 16:42:24 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (MappedRDD[7] at map at <console>:18)
15/02/27 16:42:24 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/02/27 16:42:24 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1349 bytes)
15/02/27 16:42:24 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/02/27 16:42:24 INFO HadoopRDD: Input split: file:/home/jifeng/hadoop/spark-1.2.0-bin-2.4.1/examples/src/main/resources/people.txt:0+32
15/02/27 16:42:24 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/02/27 16:42:24 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/02/27 16:42:24 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/02/27 16:42:24 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/02/27 16:42:24 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/02/27 16:42:24 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1734 bytes result sent to driver
15/02/27 16:42:24 INFO DAGScheduler: Stage 0 (collect at <console>:18) finished in 0.248 s
15/02/27 16:42:24 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 237 ms on localhost (1/1)
15/02/27 16:42:24 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/02/27 16:42:24 INFO DAGScheduler: Job 0 finished: collect at <console>:18, took 0.348078 s
Name: Justin

scala> val teenagers = sqlContext.sql("SELECT name FROM people ")
15/02/27 17:06:45 INFO BlockManager: Removing broadcast 1
15/02/27 17:06:45 INFO BlockManager: Removing block broadcast_1_piece0
15/02/27 17:06:45 INFO MemoryStore: Block broadcast_1_piece0 of size 4290 dropped from memory (free 280056118)
15/02/27 17:06:45 INFO BlockManagerInfo: Removed broadcast_1_piece0 on localhost:54438 in memory (size: 4.2 KB, free: 267.2 MB)
15/02/27 17:06:45 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/02/27 17:06:45 INFO BlockManager: Removing block broadcast_1
15/02/27 17:06:45 INFO MemoryStore: Block broadcast_1 of size 6416 dropped from memory (free 280062534)
15/02/27 17:06:45 INFO ContextCleaner: Cleaned broadcast 1
teenagers: org.apache.spark.sql.SchemaRDD = SchemaRDD[10] at RDD at SchemaRDD.scala:108
== Query Plan ==
== Physical Plan ==
Project [name#0]
 PhysicalRDD [name#0,age#1], MapPartitionsRDD[4] at mapPartitions at ExistingRDD.scala:36

scala> teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
15/02/27 17:06:50 INFO SparkContext: Starting job: collect at <console>:18
15/02/27 17:06:50 INFO DAGScheduler: Got job 1 (collect at <console>:18) with 1 output partitions (allowLocal=false)
15/02/27 17:06:50 INFO DAGScheduler: Final stage: Stage 1(collect at <console>:18)
15/02/27 17:06:50 INFO DAGScheduler: Parents of final stage: List()
15/02/27 17:06:50 INFO DAGScheduler: Missing parents: List()
15/02/27 17:06:50 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[11] at map at <console>:18), which has no missing parents
15/02/27 17:06:50 INFO MemoryStore: ensureFreeSpace(5512) called with curMem=186441, maxMem=280248975
15/02/27 17:06:50 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 5.4 KB, free 267.1 MB)
15/02/27 17:06:50 INFO MemoryStore: ensureFreeSpace(3790) called with curMem=191953, maxMem=280248975
15/02/27 17:06:50 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 3.7 KB, free 267.1 MB)
15/02/27 17:06:50 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:54438 (size: 3.7 KB, free: 267.2 MB)
15/02/27 17:06:50 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/02/27 17:06:50 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:838
15/02/27 17:06:50 INFO DAGScheduler: Submitting 1 missing tasks from Stage 1 (MappedRDD[11] at map at <console>:18)
15/02/27 17:06:50 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
15/02/27 17:06:50 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1349 bytes)
15/02/27 17:06:50 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
15/02/27 17:06:50 INFO HadoopRDD: Input split: file:/home/jifeng/hadoop/spark-1.2.0-bin-2.4.1/examples/src/main/resources/people.txt:0+32
15/02/27 17:06:50 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1763 bytes result sent to driver
15/02/27 17:06:50 INFO DAGScheduler: Stage 1 (collect at <console>:18) finished in 0.018 s
15/02/27 17:06:50 INFO DAGScheduler: Job 1 finished: collect at <console>:18, took 0.032411 s
Name: Michael
Name: Andy
Name: Justin

scala> 15/02/27 17:06:50 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 19 ms on localhost (1/1)
15/02/27 17:06:50 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
15/02/27 17:36:19 INFO BlockManager: Removing broadcast 2
15/02/27 17:36:19 INFO BlockManager: Removing block broadcast_2
15/02/27 17:36:19 INFO MemoryStore: Block broadcast_2 of size 5512 dropped from memory (free 280058744)
15/02/27 17:36:19 INFO BlockManager: Removing block broadcast_2_piece0
15/02/27 17:36:19 INFO MemoryStore: Block broadcast_2_piece0 of size 3790 dropped from memory (free 280062534)
15/02/27 17:36:19 INFO BlockManagerInfo: Removed broadcast_2_piece0 on localhost:54438 in memory (size: 3.7 KB, free: 267.2 MB)
15/02/27 17:36:19 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/02/27 17:36:19 INFO ContextCleaner: Cleaned broadcast 2
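The parsing and filtering logic in the session above can be checked outside Spark. The sketch below mirrors the same transformations on plain Scala collections; the three sample lines are the contents of examples/src/main/resources/people.txt shipped with the 1.2.0 distribution, and PeopleSketch is a hypothetical name used only for this illustration.

```scala
// Same case class as in the shell session.
case class Person(name: String, age: Int)

object PeopleSketch {
  // Contents of examples/src/main/resources/people.txt (name, age per line).
  val lines = Seq("Michael, 29", "Andy, 30", "Justin, 19")

  // Same parsing step as: sc.textFile(...).map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))
  val people: Seq[Person] =
    lines.map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))

  // Plain-collections equivalent of:
  //   SELECT name FROM people WHERE age >= 13 AND age <= 19
  val teenagers: Seq[String] =
    people.filter(p => p.age >= 13 && p.age <= 19).map(p => "Name: " + p.name)

  def main(args: Array[String]): Unit =
    teenagers.foreach(println) // prints: Name: Justin
}
```

This is exactly why only "Name: Justin" comes back from the first query while the unfiltered SELECT returns all three names: Justin (19) is the only row inside the 13-19 range.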
Summary
The session above covers the basic Spark SQL 1.2.0 workflow: create a SQLContext from the SparkContext, map a text file into an RDD of case-class objects, register it as a temporary table, and query it with SQL, collecting the results back to the driver.