A library for reading social data from twitter using Spark Streaming.
Using SBT:
libraryDependencies += "org.apache.bahir" %% "spark-streaming-twitter" % "2.3.1"
Using Maven:
<dependency>
<groupId>org.apache.bahir</groupId>
<artifactId>spark-streaming-twitter_2.11</artifactId>
<version>2.3.1</version>
</dependency>
This library can also be added to Spark jobs launched through spark-shell
or spark-submit
by using the --packages
command line option.
For example, to include it when starting the spark shell:
$ bin/spark-shell --packages org.apache.bahir:spark-streaming-twitter_2.11:2.3.1
Unlike using --jars
, using --packages
ensures that this library and its dependencies will be added to the classpath.
The --packages
argument can also be used with bin/spark-submit
.
This library is cross-published for Scala 2.10 and Scala 2.11, so users should replace the proper Scala version (2.10 or 2.11) in the commands listed above.
TwitterUtils
uses Twitter4j to get the public stream of tweets using Twitter’s Streaming API. Authentication information
can be provided by any of the methods supported by Twitter4J library. You can import the TwitterUtils
class and create a DStream with TwitterUtils.createStream
as shown below.
import org.apache.spark.streaming.twitter._
TwitterUtils.createStream(ssc, None)
import org.apache.spark.streaming.twitter.*;
TwitterUtils.createStream(jssc);
You can also either get the public stream, or get the filtered stream based on keywords. See end-to-end examples at Twitter Examples