UnknownHostException error in Spark Streaming

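.socketTextStream serves a completely different purpose: it connects to a TCP socket at a given hostname and port and reads lines of text from it. It does not fetch URLs, so passing a URL where it expects a hostname is a likely cause of the UnknownHostException. Spark Streaming does not ship with a receiver that fetches a URL periodically.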

You will need to write a separate program to fetch the URL periodically and feed it to Spark Streaming. You have many options:

  • Write a shell script to download the URL periodically to a directory, then use Apache Flume to read the files in that directory and send them to Spark Streaming. There is an integration guide: Spark Streaming + Flume Integration Guide
  • Write your own Spark Streaming receiver. The Spark Streaming Custom Receivers guide is a good place to start.
  • In your Spark app, start a thread that fetches the URL periodically and opens a socket to send the contents, then connect to that socket (e.g. .socketTextStream(, 9999)).
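
The third option can be sketched roughly as follows, in Python and using only the standard library. This is a minimal sketch under stated assumptions, not a definitive implementation: the names fetch_url and start_feeder, the "localhost" host, the port 9999 default and the 5-second interval are all illustrative choices, and the feeder serves only one client at a time.

```python
import socket
import threading
import urllib.request


def fetch_url(url, timeout=10.0):
    """Download `url` and return its body as text (assumes UTF-8 content)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def start_feeder(fetch, host="localhost", port=9999, interval=5.0):
    """Start a daemon thread that sends the output of fetch() to one client.

    `fetch` is any zero-argument callable returning text, for example
    lambda: fetch_url("http://example.com/data") (hypothetical URL).
    Returns (thread, stop_event, bound_port).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    server.settimeout(0.5)  # wake up regularly to check the stop flag
    stop = threading.Event()

    def run():
        try:
            while not stop.is_set():
                try:
                    conn, _ = server.accept()
                except socket.timeout:
                    continue  # nobody connected yet
                with conn:
                    while not stop.is_set():
                        try:
                            text = fetch()
                        except OSError:
                            text = ""  # fetch failed; skip this round
                        try:
                            # socketTextStream reads newline-delimited text
                            for line in text.splitlines():
                                conn.sendall((line + "\n").encode("utf-8"))
                        except OSError:
                            break  # client went away; accept a new one
                        stop.wait(interval)
        finally:
            server.close()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread, stop, server.getsockname()[1]


# Spark side (needs pyspark and a StreamingContext `ssc`; not run here):
#   start_feeder(lambda: fetch_url("http://example.com/data"))
#   lines = ssc.socketTextStream("localhost", 9999)
```

Note that the receiver for a socket stream may run on an executor rather than on the driver, so on a cluster the host passed to socketTextStream has to be an address at which the feeder thread is reachable from the worker machines; "localhost" is only likely to work in local mode.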

There are many variations and a few more advanced solutions, but I would say these are the most convenient ones.

