A library for reading social data from Twitter using Spark Streaming.
libraryDependencies += "org.apache.bahir" %% "spark-streaming-twitter" % "2.3.0"
<dependency>
    <groupId>org.apache.bahir</groupId>
    <artifactId>spark-streaming-twitter_2.11</artifactId>
    <version>2.3.0</version>
</dependency>
This library can also be added to Spark jobs launched through spark-submit by using the --packages command line option.
For example, to include it when starting the spark shell:
$ bin/spark-shell --packages org.apache.bahir:spark-streaming-twitter_2.11:2.3.0
--packages ensures that this library and its dependencies will be added to the classpath.
The --packages argument can also be used with spark-submit.
This library is cross-published for Scala 2.10 and Scala 2.11, so users should substitute the appropriate Scala version (2.10 or 2.11) in the commands listed above.
TwitterUtils uses Twitter4J to get the public stream of tweets using Twitter’s Streaming API. Authentication information
can be provided by any of the methods supported by the Twitter4J library. You can import the
TwitterUtils class and create a DStream with
TwitterUtils.createStream as shown below.
import org.apache.spark.streaming.twitter._
TwitterUtils.createStream(ssc, None)
import org.apache.spark.streaming.twitter.*;
TwitterUtils.createStream(jssc);
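One of the authentication methods Twitter4J supports is reading OAuth credentials from JVM system properties. The sketch below shows that approach under a few assumptions: you already have the four OAuth credentials, and you set the properties before creating the stream. The class name Twitter4jAuthProps and its configure method are hypothetical helpers for illustration and are not part of this library or of Twitter4J.

```java
/**
 * Hypothetical helper (not part of Bahir or Twitter4J) that publishes
 * OAuth credentials as the system properties Twitter4J reads by default.
 */
final class Twitter4jAuthProps {
    private Twitter4jAuthProps() {}

    /** Sets the four twitter4j.oauth.* system properties. */
    static void configure(String consumerKey, String consumerSecret,
                          String accessToken, String accessTokenSecret) {
        System.setProperty("twitter4j.oauth.consumerKey", consumerKey);
        System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret);
        System.setProperty("twitter4j.oauth.accessToken", accessToken);
        System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret);
    }

    public static void main(String[] args) {
        if (args.length < 4) {
            System.err.println("Usage: Twitter4jAuthProps <consumerKey> "
                + "<consumerSecret> <accessToken> <accessTokenSecret>");
            System.exit(1);
        }
        configure(args[0], args[1], args[2], args[3]);
        // With the properties in place, the stream can be created without
        // passing explicit authorization, for example:
        //   TwitterUtils.createStream(jssc);
    }
}
```

In a real job you would call configure(...) at the start of the driver program, before TwitterUtils.createStream. Avoid hard-coding credentials; passing them as arguments or reading them from the environment keeps them out of source control.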
You can get either the public stream or a stream filtered by keywords. See the end-to-end examples at Twitter Examples.