Collections
framework.On to Chapter 16! We've already touched on a couple of the ideas
at the beginning of the chapter— like
what ArrayList<String>
means, and how to sort
an ArrayList
. But in the chapter we'll learn a lot
more about these parts of Java.
So the code on p. 531 should be no big deal. The only maybe-new
thing is the technique for splitting a String
(using
the split()
method, not surprisingly). Similarly,
we've already had the "discovery" that ArrayList
doesn't have a sort()
method. The discussion on
p. 533–4 starts to give you an overview of what other kinds
of "collections" are available in the Java libraries; note that
most of the distinctions among these different collections are
based on what kinds of operations are fast, or not, for each
collection (e.g. in HashSet
you can find an element
quickly).
(Note the remark about "my Data Structures class from college"—obviously most people in this class haven't had that class yet! That's the class where we talk a LOT about what a "tree" is, what a "linked list" is, what "hash" means. So if you're not sure what those are, don't worry! It's not super-important to understand now, but this is a valuable preview of what you'll encounter when you get to data structures.
P. 536 is where things start to get interesting: rather than
working with String
s in our collection, we're going to work
with instances of a new class we've
created, Song
. There's a lot going on here: for the
first time, we're overriding the toString()
method
that is defined up in the Object
class (take a look
back at p. 208 in Chapter 8 if you need a refresher). And look at
the annotation—if you pass an ArrayList
to System.out.println()
, it will call
the toString()
method on each element in the list. (Put
another way, the Java API is set up to automatically produce a String
representation
of any ArrayList
, as long as the elements of the list have a toString()
method.
The official documentation for this
is here). Remember
this, about overriding toString()
!
So then on p. 538 we have another interesting problem: when we try
to sort our list of Song
s, Java says it can't find a
sort method—even though this worked just fine a minute ago,
when we were sorting String
s. If you want to see the
online version of the API documentation shown on p. 539,
it's here. And
the book very correctly notes that we have to "learn how to interpret
the documentation"—the notation in this method is a little
challenging.
P. 540 and 541 give a good overview of what "generics" means. But before you continue, make sure you understand why the situation illustrated on the bottom of 540 is better than the situation above it—exactly what kind of bad thing could happen "WITHOUT generics"? (Hint: we've seen this diagram before, on p. 211.)
So then a little weirdness in point (2) on p. 544. It's not clear
from this page why this is important (but this should look a lot
like the signature of the sort()
method that we're
trying to understand. Hang in there... on p. 545, don't worry
(yet) about the differences between the two lines at the top, but you should
feel OK about the differences between "taking only Animal
" and "taking any type that
is a type of Animal
.")
Finally, p. 547 points out the problem with trying to sort a list
of Song
s. Now, you can use the discussion from p. 544
on to help you unpack the code here. The main point is that if
you're going to sort a collection of some type, that type needs to
implement the Comparable<T>
interface. (Note that
in theory you could have Song
implement all kinds
of Comparable
interfaces—not
just Comparable<Song>
but
maybe Comparable<String>
, Comparable<Animal>
,
and so on. But does it make sense to compare a song to an animal? Or
even to a string? Almost always this doesn't make any sense, so
usually a class will only define the behavior for comparing two
objects of that class. But it's technically possible to compare two
objects of any two different classes, if you want.
So on to the next problem, on p. 551. (By the way, when the book
talks about "The horrible way...", who does that remind you of?). Be
sure you understand the difference between Comparable
and Comparator
. They both address the same problem, but
in different ways. And as this ongoing jukebox example shows, we
often need to use both.
Finally, note that the implementation on p. 553 makes
the Comparator
an inner class (and also suggests we
could define several inner Comparator
s. Why did they
use an inner class here? Would a non-inner class work? (By the way,
the paragraph
on this
page entitled "Why Use Nested Classes?" nicely summarizes why
inner classes can be good (and when).
OK, let's finish the chapter. The next issue with the jukebox (now that we've figured out how to sort songs by different characteristics) is that when we ask for a list of songs that have been played, we're not interested (right now, anyway) in duplicates. So, p. 557 summarizes the three main types of collections Java has: lists, sets, and maps. (It might have been nice to know this back on p. 533, but now you know, and you can re-read the chapter....) By the way, it's nice that book gives us this nice general overview of (parts of) the Collections
API, but it doesn't seem like any of the individual API docs have this information in it. How did the book authors figure this stuff out? Probably, they followed the link in one of those API documents to "The Collections Framework" and started reading the "Overview" they found there.
You should be sure you understand the basic differences among lists, sets, and maps. The diagram on p. 558 is useful for understanding how the most useful collection classes are related to each other. Don't worry too much right now about what exactly all these different classes are for, but DO notice the basic structure: each of the Set
, List
, and Map
interfaces have a few classes that implement them. The names of those classes (mostly) include the name of the interface they implement plus (mostly) something else, like "Tree" or "Linked." The idea here is that the interface tells you what behavior the class offers, and the rest of the name tells you something about the class is implemented. If you're paying attention, you might be thinking, "Wait, doesn't that, like, reveal crucial implementation details, which is a grievous violation of ENCAPSULATION?" In another context, it might be—especially in a context where we might want to change how we implement something in the future. But in the context of collections, knowing about the implementation gives us valuable information about performance—what operations this class can do quickly, and what operations will be more time-consuming. So these names are useful and important.
So anyway: we need to store our songs in a set, because sets automatically prevent duplicates from being stored—either something is stored in the set, or it's not. We have several classes to choose from but (as it turns out), HashSet
is the most straightforward. The code on p. 559 should be pretty easy to follow—note the HashSet.addAll()
method. But, of course, just switching to a HashSet
doesn't quite solve our duplicate problem (kinda like Java wouldn't "automatically" sort our Song
objects). What's the problem this time? Remember, earlier, we had to provide customized comparison functionality; now we have a similar problem: we need to provide customized equality functionality. When we left out the comparison code for songs, remember, the compiler complained—in that case, we actually had to implement an interface. But in the case of equality, the Object
class provides default behavior—so if we don't provide any custom code, then the default code will run. So, the code will compile and run, but, as we see on p. 559, it may not have the behavior we want.
What do we have to do? The relatively obvious part: we need to override the equals()
method: this method defines what it means for two Song
objects to be equal to each other (what is the book's definition of song-equality? it works for their examples, but do you think it's the most robust definition?) But there's a less-obvious thing we need to do, to: we need to override the hashCode()
method. (You can read the documentation here; the book summarizes it pretty well.) You'll discuss hash codes more in data structures; the main idea we need now is that hash codes provide a quick way to check if two objects are the same or not. The default behavior is for every object to have a unique hashcode (this is not a 100% guarantee, but it's quite close). What does this have to do with HashSet
? When we try to add an element to a HashSet
, it compares the hashcode of that element to the hashcodes of the elements already in the set. If the hashcode doesn't match any existing elements, then that element is treated a a new/unique one. That's why the code on p. 559 doesn't work the way we want it to: every object gets a unique value, so they all "look different" to the HashSet
. We need to override hashCode()
so that if two objects are equal, then they also have the same hash code. Note how the book does this—the equals()
and hashCode()
implementations are quite similar.
The box on p. 563 is a good summary of how to work with hash codes; the Q&A on that page gives a little more detail on what's going on with hashcodes inside HashSet
(or other "Hash"-flavored collections), but the details aren't important for this course.
And now, a quick look at two more collections. TreeSet
provides the same no-duplicate behavior, but it doesn't rely on hash codes. Instead, it keeps its elements sorted, so if you try to add an element that's already there, it's easy to discover that it's already there. Note that you MUST provide comparison behavior—either the objects you're adding implement Comparable
, or else you have to pass a Comparator
object to a constructor when you create the TreeSet
. And a Map
behaves kinda like a dictionary, except that you're not restricted to looking up words and definitions. The book's example is a nice one—we can create a map that allows us to quickly look up the score of a player.
The last few pages of the chapter go back and address some interesting issues around using generics: as usual, the issue is making sure that collections declared generically don't contain objects of an unexpected type. The book's treatment of this complex issue is pretty good, but you may need to read it a few times before you feel like you have a handle on the problem and the solution! As usual, the exercises at the end of the chapter are really good ways to check your understanding.