A library for reading data from Google Cloud Pub/Sub using Spark Streaming.
Using SBT:
libraryDependencies += "org.apache.bahir" %% "spark-streaming-pubsub" % "2.4.0-SNAPSHOT"
Using Maven:
<dependency>
<groupId>org.apache.bahir</groupId>
<artifactId>spark-streaming-pubsub_2.11</artifactId>
<version>2.4.0-SNAPSHOT</version>
</dependency>
This library can also be added to Spark jobs launched through spark-shell
or spark-submit
by using the --packages
command line option.
For example, to include it when starting the spark shell:
$ bin/spark-shell --packages org.apache.bahir:spark-streaming-pubsub_2.11:2.4.0-SNAPSHOT
Unlike using --jars
, using --packages
ensures that this library and its dependencies will be added to the classpath.
The --packages
argument can also be used with bin/spark-submit
.
First you need to create credential by SparkGCPCredentials, it support four type of credentials
SparkGCPCredentials.builder.build()
SparkGCPCredentials.builder.jsonServiceAccount(PATH_TO_JSON_KEY).build()
SparkGCPCredentials.builder.jsonServiceAccount(JSON_KEY_BYTES).build()
SparkGCPCredentials.builder.p12ServiceAccount(PATH_TO_P12_KEY, EMAIL_ACCOUNT).build()
SparkGCPCredentials.builder.p12ServiceAccount(P12_KEY_BYTES, EMAIL_ACCOUNT).build()
SparkGCPCredentials.builder.metadataServiceAccount().build()
val lines = PubsubUtils.createStream(ssc, projectId, subscriptionName, credential, ..)
JavaDStream<SparkPubsubMessage> lines = PubsubUtils.createStream(jssc, projectId, subscriptionName, credential...)
See end-to-end examples at Google Cloud Pubsub Examples
To run the PubSub test cases, you need to generate Google API service account key files and set the corresponding environment variable to enable the test.
Credentials
Tab> Create credentials
button> Service account key
Role> Pub/Sub> Pub/Sub Editor
and check the option Furnish a private key
to create one. You need to create one for JSON key file, another for P12.Service account ID
mvn clean package -DskipTests -pl streaming-pubsub
export ENABLE_PUBSUB_TESTS=1
export GCP_TEST_ACCOUNT="THE_P12_SERVICE_ACCOUNT_ID_MENTIONED_ABOVE"
export GCP_TEST_PROJECT_ID="YOUR_GCP_PROJECT_ID"
export GCP_TEST_JSON_KEY_PATH=/path/to/pubsub/credential/files/Apache-Bahir-PubSub-1234abcd.json
export GCP_TEST_P12_KEY_PATH=/path/to/pubsub/credential/files/Apache-Bahir-PubSub-5678efgh.p12
mvn test -pl streaming-pubsub