Spark's Execution Flow as Seen Through Its Logs

I think it is worthwhile to trace how Spark runs a job by reading its logs, so this post walks through one.

I plan to keep adding to this post whenever I have time.

The code is as follows.

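val sparkSession = SparkUtil.getInstance()

// Load the file as an RDD of lines
val readme = sparkSession.sparkContext.textFile("./data/README.md")
readme.toDebugString      // returns the RDD lineage as a String; the result is discarded here
println(readme.count())   // count() is an action, so this line triggers job 0 in the log

sparkSession.stop()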
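
The SparkUtil helper is not shown in this post. To run the snippet on its own, a minimal stand-in could look like the sketch below. The app name is taken from the log ("Submitted application: Spark Basic Test"); the local[*] master is an assumption based on the log showing the executor running inside the driver on localhost, and the real helper may be configured differently.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical stand-in for the post's SparkUtil helper (the original is not shown).
object SparkUtil {
  def getInstance(): SparkSession =
    SparkSession.builder()
      .appName("Spark Basic Test") // app name as it appears in the log
      .master("local[*]")          // assumption: local mode using all available cores
      .getOrCreate()               // reuses an existing session if one is already running
}
```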

Below is the Spark log produced by the code above.

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/03/22 16:50:39 INFO SparkContext: Running Spark version 2.4.3
20/03/22 16:50:39 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/22 16:50:40 INFO SparkContext: Submitted application: Spark Basic Test
20/03/22 16:50:40 INFO SecurityManager: Changing view acls to: daeyunkim
20/03/22 16:50:40 INFO SecurityManager: Changing modify acls to: daeyunkim
20/03/22 16:50:40 INFO SecurityManager: Changing view acls groups to: 
20/03/22 16:50:40 INFO SecurityManager: Changing modify acls groups to: 
20/03/22 16:50:40 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(daeyunkim); groups with view permissions: Set(); users  with modify permissions: Set(daeyunkim); groups with modify permissions: Set()
20/03/22 16:50:40 INFO Utils: Successfully started service 'sparkDriver' on port 52540.
20/03/22 16:50:40 INFO SparkEnv: Registering MapOutputTracker
20/03/22 16:50:40 INFO SparkEnv: Registering BlockManagerMaster
20/03/22 16:50:40 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/03/22 16:50:40 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/03/22 16:50:40 INFO DiskBlockManager: Created local directory at /private/var/folders/qx/84bp85pn2y5gn6_96k267psc0000gn/T/blockmgr-fa3cf378-4049-4c69-8c24-7eaddb7af53e
20/03/22 16:50:40 INFO MemoryStore: MemoryStore started with capacity 2004.6 MB
20/03/22 16:50:40 INFO SparkEnv: Registering OutputCommitCoordinator
20/03/22 16:50:40 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/03/22 16:50:40 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.30.1.58:4040
20/03/22 16:50:40 INFO Executor: Starting executor ID driver on host localhost
20/03/22 16:50:40 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 52541.
20/03/22 16:50:40 INFO NettyBlockTransferService: Server created on 172.30.1.58:52541
20/03/22 16:50:40 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/03/22 16:50:41 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.30.1.58, 52541, None)
20/03/22 16:50:41 INFO BlockManagerMasterEndpoint: Registering block manager 172.30.1.58:52541 with 2004.6 MB RAM, BlockManagerId(driver, 172.30.1.58, 52541, None)
20/03/22 16:50:41 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.30.1.58, 52541, None)
20/03/22 16:50:41 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 172.30.1.58, 52541, None)
20/03/22 16:50:41 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 214.6 KB, free 2004.4 MB)
20/03/22 16:50:42 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 20.4 KB, free 2004.4 MB)
20/03/22 16:50:42 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.30.1.58:52541 (size: 20.4 KB, free: 2004.6 MB)
20/03/22 16:50:42 INFO SparkContext: Created broadcast 0 from textFile at TestInternalRDD.scala:12
20/03/22 16:50:42 INFO FileInputFormat: Total input paths to process : 1
20/03/22 16:50:42 INFO SparkContext: Starting job: count at TestInternalRDD.scala:14
20/03/22 16:50:42 INFO DAGScheduler: Got job 0 (count at TestInternalRDD.scala:14) with 2 output partitions
20/03/22 16:50:42 INFO DAGScheduler: Final stage: ResultStage 0 (count at TestInternalRDD.scala:14)
20/03/22 16:50:42 INFO DAGScheduler: Parents of final stage: List()
20/03/22 16:50:42 INFO DAGScheduler: Missing parents: List()
20/03/22 16:50:42 INFO DAGScheduler: Submitting ResultStage 0 (./data/README.md MapPartitionsRDD[1] at textFile at TestInternalRDD.scala:12), which has no missing parents
20/03/22 16:50:42 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.3 KB, free 2004.4 MB)
20/03/22 16:50:42 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2023.0 B, free 2004.4 MB)
20/03/22 16:50:42 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.30.1.58:52541 (size: 2023.0 B, free: 2004.6 MB)
20/03/22 16:50:42 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1161
20/03/22 16:50:42 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (./data/README.md MapPartitionsRDD[1] at textFile at TestInternalRDD.scala:12) (first 15 tasks are for partitions Vector(0, 1))
20/03/22 16:50:42 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
20/03/22 16:50:42 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7932 bytes)
20/03/22 16:50:42 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7932 bytes)
20/03/22 16:50:42 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
20/03/22 16:50:42 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
20/03/22 16:50:42 INFO HadoopRDD: Input split: file:/Users/daeyunkim/Documents/SparkCode/BasicSparkScala/data/README.md:0+2243
20/03/22 16:50:42 INFO HadoopRDD: Input split: file:/Users/daeyunkim/Documents/SparkCode/BasicSparkScala/data/README.md:2243+2244
20/03/22 16:50:42 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 875 bytes result sent to driver
20/03/22 16:50:42 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 875 bytes result sent to driver
20/03/22 16:50:42 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 141 ms on localhost (executor driver) (1/2)
20/03/22 16:50:42 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 122 ms on localhost (executor driver) (2/2)
20/03/22 16:50:42 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
20/03/22 16:50:42 INFO DAGScheduler: ResultStage 0 (count at TestInternalRDD.scala:14) finished in 0.238 s
20/03/22 16:50:42 INFO DAGScheduler: Job 0 finished: count at TestInternalRDD.scala:14, took 0.316393 s
108
20/03/22 16:50:42 INFO SparkUI: Stopped Spark web UI at http://172.30.1.58:4040
20/03/22 16:50:42 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/03/22 16:50:42 INFO MemoryStore: MemoryStore cleared
20/03/22 16:50:42 INFO BlockManager: BlockManager stopped
20/03/22 16:50:42 INFO BlockManagerMaster: BlockManagerMaster stopped
20/03/22 16:50:42 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/03/22 16:50:42 INFO SparkContext: Successfully stopped SparkContext
20/03/22 16:50:42 INFO ShutdownHookManager: Shutdown hook called
20/03/22 16:50:42 INFO ShutdownHookManager: Deleting directory /private/var/folders/qx/84bp85pn2y5gn6_96k267psc0000gn/T/spark-de3a49e8-e75a-4c6f-8013-a6829507e0f3
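
The two "Input split" lines in the log (0+2243 and 2243+2244) can be reproduced with a little arithmetic. The sketch below is a simplified, Spark-free imitation of the split rule in Hadoop's FileInputFormat, not the library code itself. The InputSplits object, its parameter names, and the 32 MB default block size are assumptions made for illustration; the total file size of 4487 bytes is the sum of the two split lengths in the log.

```scala
object InputSplits {
  // Slack factor: the last split may be up to 10% larger than splitSize.
  val SplitSlop = 1.1

  /** Returns (offset, length) pairs for a single file of fileLength bytes. */
  def compute(fileLength: Long,
              numSplits: Int,
              blockSize: Long = 32L * 1024 * 1024, // assumed local block size
              minSize: Long = 1L): Seq[(Long, Long)] = {
    val goalSize  = fileLength / math.max(numSplits, 1)
    val splitSize = math.max(minSize, math.min(goalSize, blockSize))
    val splits    = scala.collection.mutable.ArrayBuffer.empty[(Long, Long)]
    var remaining = fileLength
    // Cut full-size splits while clearly more than one split's worth of data remains.
    while (remaining.toDouble / splitSize > SplitSlop) {
      splits += ((fileLength - remaining, splitSize))
      remaining -= splitSize
    }
    // Whatever is left becomes the final (possibly slightly larger) split.
    if (remaining != 0) splits += ((fileLength - remaining, remaining))
    splits.toSeq
  }
}

object InputSplitsDemo extends App {
  // 4487 bytes requested as 2 splits, matching the "2 output partitions" in the log.
  InputSplits.compute(4487L, 2).foreach { case (offset, length) =>
    println(s"$offset+$length")
  }
}
```

With 4487 bytes and 2 requested splits, the goal size is 2243, so the first split is 0+2243 and the remaining 2244 bytes fall within the 10% slack and become the second split, 2243+2244.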
