A library for reading social data from Twitter using Spark Streaming.
Using SBT:
libraryDependencies += "org.apache.bahir" %% "spark-streaming-twitter" % "2.4.0-SNAPSHOT"
Using Maven:
<dependency>
<groupId>org.apache.bahir</groupId>
<artifactId>spark-streaming-twitter_2.11</artifactId>
<version>2.4.0-SNAPSHOT</version>
</dependency>
This library can also be added to Spark jobs launched through spark-shell or spark-submit by using the --packages command line option.
For example, to include it when starting the spark shell:
$ bin/spark-shell --packages org.apache.bahir:spark-streaming-twitter_2.11:2.4.0-SNAPSHOT
Unlike using --jars, using --packages ensures that this library and its dependencies will be added to the classpath.
The --packages argument can also be used with bin/spark-submit.
This library is cross-published for Scala 2.11 and Scala 2.12, so users should substitute the appropriate Scala version (2.11 or 2.12) in the artifact names used in the commands listed above.
TwitterUtils uses Twitter4J to get the public stream of tweets using Twitter's Streaming API. Authentication information can be provided by any of the methods supported by the Twitter4J library. You can import the TwitterUtils class and create a DStream with TwitterUtils.createStream as shown below.
// Scala
import org.apache.spark.streaming.twitter._
TwitterUtils.createStream(ssc, None)
// Java
import org.apache.spark.streaming.twitter.*;
TwitterUtils.createStream(jssc);
You can consume either the full public stream or a stream filtered by keywords. See the end-to-end examples at Twitter Examples.
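One of the authentication methods Twitter4J supports is reading OAuth credentials from JVM system properties, which is what TwitterUtils.createStream relies on when no explicit authorization is passed. The sketch below shows one possible way to set those properties before creating the stream. The class name TwitterCredentials, the install method, and the TWITTER_* environment variable names are hypothetical and chosen only for illustration; only the four twitter4j.oauth.* property names come from Twitter4J itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of this library or of Twitter4J).
public class TwitterCredentials {

    // System property names that Twitter4J's default configuration reads.
    static final String[] KEYS = {
        "twitter4j.oauth.consumerKey",
        "twitter4j.oauth.consumerSecret",
        "twitter4j.oauth.accessToken",
        "twitter4j.oauth.accessTokenSecret"
    };

    // Validates that all four values are present, then copies them
    // into JVM system properties. Nothing is set if any value is missing.
    static void install(Map<String, String> values) {
        for (String key : KEYS) {
            String value = values.get(key);
            if (value == null || value.isEmpty()) {
                throw new IllegalArgumentException("Missing credential: " + key);
            }
        }
        for (String key : KEYS) {
            System.setProperty(key, values.get(key));
        }
    }

    public static void main(String[] args) {
        // Assumed environment variable names; adapt them to your deployment.
        Map<String, String> values = new HashMap<>();
        values.put(KEYS[0], System.getenv("TWITTER_CONSUMER_KEY"));
        values.put(KEYS[1], System.getenv("TWITTER_CONSUMER_SECRET"));
        values.put(KEYS[2], System.getenv("TWITTER_ACCESS_TOKEN"));
        values.put(KEYS[3], System.getenv("TWITTER_ACCESS_TOKEN_SECRET"));
        try {
            install(values);
            System.out.println("Twitter4J OAuth properties installed.");
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Calling install before TwitterUtils.createStream(jssc) should let the stream authenticate through Twitter4J's defaults. Validating first and setting afterwards avoids leaving a partially configured JVM when a value is missing.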
Running the integration tests requires registering a custom application at the Twitter Developer Portal and obtaining private OAuth credentials. The listing below shows how to run the complete test suite on a local workstation.
cd streaming-twitter
env ENABLE_TWITTER_TESTS=1 \
twitter4j.oauth.consumerKey=${consumer key} \
twitter4j.oauth.consumerSecret=${consumer secret} \
twitter4j.oauth.accessToken=${access token} \
twitter4j.oauth.accessTokenSecret=${access token secret} \
mvn clean test