A library for reading data from Google Cloud Pub/Sub using Spark Streaming.
Using SBT:
libraryDependencies += "org.apache.bahir" %% "spark-streaming-pubsub" % "2.2.0"
Using Maven:
<dependency>
    <groupId>org.apache.bahir</groupId>
    <artifactId>spark-streaming-pubsub_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
This library can also be added to Spark jobs launched through spark-shell or spark-submit by using the --packages command line option.
For example, to include it when starting the Spark shell:
$ bin/spark-shell --packages org.apache.bahir:spark-streaming-pubsub_2.11:2.2.0
Unlike using --jars, using --packages ensures that this library and its dependencies will be added to the classpath.
The --packages argument can also be used with bin/spark-submit.
First, create a credential with SparkGCPCredentials, which supports four types of credentials:
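Application default credentials: SparkGCPCredentials.builder.build()
JSON service account key: SparkGCPCredentials.builder.jsonServiceAccount(PATH_TO_JSON_KEY).build()
P12 service account key: SparkGCPCredentials.builder.p12ServiceAccount(PATH_TO_P12_KEY, EMAIL_ACCOUNT).build()
Metadata server service account (for example, when running on a Dataproc or Compute Engine VM): SparkGCPCredentials.builder.metadataServiceAccount().build()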
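The four variants differ only in which inputs they need. As a minimal sketch, assuming a driver that reads key paths and settings from its own configuration, the hypothetical helper below decides which builder call applies. CredentialChooser and CredentialKind are illustrative names and are not part of spark-streaming-pubsub; the precedence it uses (JSON key, then P12 key, then the metadata server, then application default credentials) is an assumption, not behavior defined by the library.

```java
/**
 * Hypothetical helper (NOT part of spark-streaming-pubsub): picks which of the four
 * SparkGCPCredentials builder variants to use from the configuration at hand.
 * ASSUMPTION: the precedence order (JSON key, P12 key, metadata server, application
 * default) is chosen for illustration only; the library does not define it.
 */
final class CredentialChooser {

    /** One constant per builder variant; each comment names the matching builder call. */
    enum CredentialKind {
        JSON_SERVICE_ACCOUNT,     // SparkGCPCredentials.builder.jsonServiceAccount(PATH_TO_JSON_KEY).build()
        P12_SERVICE_ACCOUNT,      // SparkGCPCredentials.builder.p12ServiceAccount(PATH_TO_P12_KEY, EMAIL_ACCOUNT).build()
        METADATA_SERVICE_ACCOUNT, // SparkGCPCredentials.builder.metadataServiceAccount().build()
        APPLICATION_DEFAULT       // SparkGCPCredentials.builder.build()
    }

    private CredentialChooser() {}

    /** Treats null or blank strings as "not configured". */
    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }

    /**
     * @param jsonKeyPath       path to a JSON service account key, or null
     * @param p12KeyPath        path to a P12 service account key, or null
     * @param emailAccount      service account email (needed with a P12 key), or null
     * @param useMetadataServer true to use the VM's metadata server service account
     */
    static CredentialKind choose(String jsonKeyPath, String p12KeyPath,
                                 String emailAccount, boolean useMetadataServer) {
        if (isSet(jsonKeyPath)) {
            return CredentialKind.JSON_SERVICE_ACCOUNT;
        }
        if (isSet(p12KeyPath)) {
            // p12ServiceAccount takes both the key path and the account email.
            if (!isSet(emailAccount)) {
                throw new IllegalArgumentException("a P12 key also requires the service account email");
            }
            return CredentialKind.P12_SERVICE_ACCOUNT;
        }
        if (useMetadataServer) {
            return CredentialKind.METADATA_SERVICE_ACCOUNT;
        }
        // Nothing explicit configured: fall back to application default credentials.
        return CredentialKind.APPLICATION_DEFAULT;
    }
}
```

A driver would then switch on the returned CredentialKind and make the matching SparkGCPCredentials.builder call listed above. Keeping the decision in a plain method like this makes it testable without any Google Cloud access.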
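Then pass the credential to PubsubUtils.createStream to create an input stream.
Scala API:
val lines = PubsubUtils.createStream(ssc, projectId, subscriptionName, credential, ...)
Java API:
JavaDStream<SparkPubsubMessage> lines = PubsubUtils.createStream(jssc, projectId, subscriptionName, credential, ...);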
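Each element of lines is a SparkPubsubMessage. As a minimal, Spark-free sketch of what a job might do with those elements, the class below decodes message payloads as UTF-8 text and counts words in one batch. It assumes the payload bytes are obtained from SparkPubsubMessage.getData() and that publishers send UTF-8 text; PayloadWordCount and its methods are hypothetical names, and nothing here calls the spark-streaming-pubsub API, so it runs without a cluster.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical, Spark-free sketch of per-batch processing of Pub/Sub payloads.
 * ASSUMPTION: each payload is UTF-8 text, e.g. the bytes from SparkPubsubMessage.getData().
 * Nothing here calls the spark-streaming-pubsub API.
 */
final class PayloadWordCount {

    private PayloadWordCount() {}

    /** Decodes one payload as UTF-8 and splits it on runs of whitespace. */
    static String[] tokens(byte[] payload) {
        String text = new String(payload, StandardCharsets.UTF_8).trim();
        // Trimming first avoids a leading empty token from split(); blank payloads yield no words.
        return text.isEmpty() ? new String[0] : text.split("\\s+");
    }

    /** Counts words across all payloads of one batch; TreeMap keeps keys sorted. */
    static Map<String, Long> countWords(List<byte[]> payloads) {
        Map<String, Long> counts = new TreeMap<>();
        for (byte[] payload : payloads) {
            for (String word : tokens(payload)) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }
}
```

In a streaming job the same decoding would typically sit inside DStream transformations on lines (for example, flatMap over the decoded words); the plain-Java version only makes that logic easy to check in isolation.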
See end-to-end examples at Google Cloud Pubsub Examples.