Most efficient way to find unique entries in a large data set

I would not use a sorted array. I would create a Map<String, Integer> where the key is your word and the value is the count of the number of occurrences of the word. As you read each word, do something like this:

Integer count = map.get(word);
if (count == null) {
    // first time we see this word
    count = 0;
}
map.put(word, count + 1);

Then just iterate over the map's entry set and do whatever you need to do with the counts.
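For reference, here is a minimal, self-contained sketch of the whole flow: counting, then walking the entry set. The class name WordCount, the method name countWords, and the sample words are made up for illustration. It uses Map.merge (Java 8+), which does the same thing as the get / null check / put sequence above in one call.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordCount {
    // Hypothetical helper: returns word -> number of occurrences.
    static Map<String, Integer> countWords(Iterable<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // Inserts 1 for a new word, otherwise adds 1 to the stored count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample input, assumed for illustration only.
        List<String> words = Arrays.asList("apple", "pear", "apple", "fig", "pear", "apple");
        Map<String, Integer> counts = countWords(words);

        // Iterate over the entry set; HashMap iteration order is unspecified.
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

If you only need the unique words and not the counts, counts.keySet() gives you exactly that.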

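If you know, or can estimate, the number of unique words, pass a matching initial capacity to the HashMap constructor so the map does not have to grow (and rehash) many times. Keep in mind that with the default load factor of 0.75 the map resizes once it is about three-quarters full, so an initial capacity of roughly the expected count divided by 0.75 avoids resizing altogether.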
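A rough sketch of pre-sizing, assuming the default load factor of 0.75. The class PresizedMap and its two methods are hypothetical names, not part of the JDK; only the HashMap(int initialCapacity) constructor is.

```java
import java.util.HashMap;
import java.util.Map;

class PresizedMap {
    // Hypothetical helper: smallest initial capacity that holds expectedUnique
    // entries without a resize, assuming HashMap's default load factor of 0.75.
    static int capacityFor(int expectedUnique) {
        return (int) Math.ceil(expectedUnique / 0.75);
    }

    // Hypothetical helper: builds a HashMap sized for the expected number of keys.
    static <K, V> Map<K, V> withExpectedSize(int expectedUnique) {
        return new HashMap<>(capacityFor(expectedUnique));
    }

    public static void main(String[] args) {
        // 750 expected unique words -> requested capacity of 1000.
        Map<String, Integer> counts = withExpectedSize(750);
        counts.put("example", 1);
        System.out.println(capacityFor(750));
    }
}
```

HashMap rounds the requested capacity up to a power of two internally, so an estimate that is somewhat off still works; it just costs a bit of memory or an occasional resize. On JDK 19 and later, HashMap.newHashMap(int numMappings) does this calculation for you.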

If you use a sorted array, your run time cannot be better than proportional to N log N (where N is the number of words in your list), because the words have to be sorted by comparison first. If you use a HashMap, you can achieve an expected run time that grows linearly with N (you save yourself the factor of log N).

Another advantage of using a Map is that the memory used is proportional to the number of unique words rather than the total number of words (assuming that you build the map while reading the words, rather than reading all the words into a collection first and then adding them to the map).
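To illustrate building the map while reading, here is a sketch that processes the input one line at a time, so only the current line and the map itself are held in memory. The class name StreamingWordCount, the count method, and the choice to split words on whitespace are assumptions; adapt the tokenizing to your data.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

class StreamingWordCount {
    // Hypothetical helper: counts words from any Reader without first
    // collecting them into a list.
    static Map<String, Integer> count(Reader source) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumption: words are separated by whitespace.
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) { // skip the empty token from blank lines
                        counts.merge(word, 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a file here; a FileReader would work the same way.
        Map<String, Integer> counts = count(new StringReader("a b a\n\nc a b\n"));
        System.out.println(counts.size() + " unique words");
    }
}
```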

