p3: pretty pictures (via genetic algorithms)
First version: 09-Nov-1999 by Dan S. Wallach; revised 2007 by {adamstep, derrley, dsandler}
Deadlines
Sign up for your milestone meetings!
| Mon 31 March | Announce your partnerships to the TAs |
| Tue 8 April | Spec due (12:01am) |
| Tue 15 April | Beta due (12:01am) |
| Wed 23 April | Final due (last day of class!) |
Prologue: Project teams
Momma always said, “Give ’em the good news first. It’ll soften the blow.” Project 3 will be completed by student groups of no more than two students, and most of you will be delighted to discover that we’re allowing each student to choose his or her partner for project 3. Each group must announce its membership by email to jeff.kilpatrick@ no later than Monday, 31 March. Any student who has not submitted a choice will be paired arbitrarily by the grading staff on Monday. Please make your submission as soon as you and your partner have entered into an agreement to work together. Also, please do not submit a choice before you have entered into an agreement with your submitted chosen partner. (If we receive conflicting information from different members of a team, we will attempt to notify the parties involved in a timely manner, but we might decide to pair you up randomly instead. You have been warned!)
Please choose your partner for this project wisely. Project 3 requires more design, thought, and lines of code than either of the other projects you have worked on in this class. Please be sure to choose someone with whose schedule significantly overlaps your own. While there are times you will want to split up (divide and conquer?) work for this project, more often—especially in the early design phase—you will want to work closely with your partner.
Introduction
Project 3 is called “Pretty Pictures” because you (the student) will be building an application that can be used to generate quite interesting pictures. The specifics of these pictures and of the application come in the following sections.
In order to be successful in project 3, you must be proficient in many of the areas covered in the course during the semester. What follows, here, is a general outline of what you’ll be required to implement in your version of Pretty Pictures. Details for each of these list items come later in this document. Consider each of these major features equally weighted with all others.
- User Interface: Your application must be driven by a usable, event-driven GUI interface written in Java’s Swing framework. This interface must behave much like any web-browser does. Namely, it must have a forward, a back, a stop, and a reload button, and each of these must have sensible behaviors. This means that different "views" the user comes into contact with must behave much like web pages do, and your application must behave much like a web browser does.
- Functions: Your application must allow the user to see the picture which represents a mathematical function that is composed only of specific intrinsic functions.
- Breeding: Your application must allow the user to choose any set of rendered functions and breed them. Breeding a set of functions should generate a newer "generation" of functions. Each member of this newer generation should have specific traits and characteristics taken from members of its parent generation.
- Animation: Your application must be able to generate a genetic animation between any two rendered functions. This genetic animation is a bit more complicated (and quite a bit more awesome) than a simple fade-out/fade-in.
- Files: Your application must be able to save both rendered images and animations to disk. (You’ll find these images make spiffy desktop backgrounds.)
- Concurrency: When the user asks your application to attempt to complete any of these required tasks, the application must (1) remain responsive and (2) give progress feedback. This means that your application will fail if it does not correctly make use of Java’s concurrency features.
User interface requirements
The user interface for your program will be modeled after that of a Web browser. (This is actually a gift in disguise; we’re saving you the burden of inventing an entirely new interface metaphor on top of all the other work you have to do.)
In particular, your program will present most information in a single, central content area. You must implement a history (again, like browser history) for the various views that appear in this content area; the user may navigate the history with Back and Forward buttons, and can consult a history list to jump to an arbitrary point forward or backward in time. (There are some tricky edge cases here: What if a user goes back, and then launches off in a different direction? When in doubt, fire up your favorite browser and see how it’s done there.)
You will also have a Stop button that aborts all computation immediately (even if in the middle of doing something), as well as a Restart button that flushes what you’ve got and starts over.
You must document your user interface in your spec. For the spec deadline we will expect preliminary diagrams (viz., “wireframes”) covering all major windows and views of your application, including a description of how all the pieces fit together to let the user accomplish tasks (“scenarios” or “use cases”). Where appropriate, defend your interface design in terms of principles and patterns of “good” UI (cite lecture or outside sources as necessary). As your interface design evolves over the course of the project, you must update your spec to reflect the changes, just as you do when the software design changes.
We encourage you to be creative in this project; if you feel you can make a more usable program by expanding beyond the Web browser model, go for it. However, if you deviate significantly from what is described here, you must defend your decisions in your specification. Please keep in mind that the minimum requirement (which you are not given liberty to change) is the basic feature set of a Web browser as described above.
We strongly encourage simple, informal user testing of the user interface. (That is, unit tests are required for the correctness of your code, and user tests are recommended for the “correctness” of your user interface.) Sit your test subjects (e.g. significant other, roommates, etc.) down in front of your program, explain its general purpose, and then silently watch them play. It should be an enlightening, frustrating, and fun experience, and it will give you a reality check on your design. (Remember: “the designer is a bad test subject: he or she knows too much.”) You might consider including any particularly illuminating user test anecdotes in your spec, too.
How to render functions
Here’s the general idea: a picture is really a function that maps points (X,Y) to colors (R,G,B). You evaluate this function once for each pixel on the screen and store the results.
Your application will be able to render any function built out of members of a rich set of intrinsic functions. Each function is defined as taking zero or more floating-point arguments and returning a floating-point value. This function will then be evaluated for each pixel (i.e., looping over X and Y) and for each color channel (R,G,B).
We will define an image over X and Y from -1 to 1. The upper left corner of the image will be (-1, 1), which is to say, it will be the usual Cartesian coordinates you do on paper rather than the backward geometry you see in many computer graphics libraries.
You must support everything below (and feel free to add additional stuff). You should scale things such that the minimum (R,G,B) value is (-1, -1, -1) and the maximum value is (1, 1, 1), mapping to pure black and pure white, respectively.
- Zero arguments
- Constants [-1, 1]
- The variables X and Y (the target pixel you’re evaluating)
- One argument
- Negate
- Round-down and round-up
- Sin, Cos, ATan (trigonometric functions)
- Exponentiate, Log
- Absolute value
- Clip (result is clipped to [-1,1])
- Clip(0.5) = 0.5
- Clip(1.5) = 1.0
- Clip(-1.5) = -1.0
- Clip(10) = 1.0
- Wrap (result is wrapped to [-1,1])
- Wrap(0.5) = 0.5
- Wrap(1.5) = -0.5
- Wrap(-1.5) = 0.5
- Wrap(10) = 0
- Two arguments
- Add, subtract, multiply, divide
- External image (X and Y inputs to this
function return the colors of the image)
- Clipping version (points outside the box are clipped back to the unit box)
- Tiling version (points outside the box are brought back inside the box using a modulo-like operation)
- Perlin noise (Java source code available)
- Grayscale noise (same value for R, G, and B)
- Color noise (different values for R, G, and B)
- Special one-argument
- RGB-To-YCrCb - this converts from the RGB color-space that you know to a luminance / chrominance space commonly used in digital television and other applications (see below).
- YCrCb-To-RGB - this is the inverse operation to the above.
Now, not all of these functions are defined continuously. You should have appropriate error handling (i.e., just define divide-by-zero to return zero and continue onward).
The color space converter intrinsic functions are implemented using the following system of equations:
Y = 0.2989 R + 0.5866 G + 0.1145 B Cb = -0.1687 R - 0.3312 G + 0.5000 B Cr = 0.5000 R - 0.4183 G - 0.0816 B
R = Y + 1.4022 Cr G = Y - 0.3456 Cb - 0.7145 Cr B = Y + 1.7710 Cb
When your genetic images start combining color space changes with piecewise multiplication, you’ll start getting some very pretty visual effects.
It is important to note that you’ll need to make a decision about how functions and constants are represented in your design. A naive approach may only allow color to be introduced into pictures via Perlin noise. This approach is not acceptable.
You have two options for how to represent functions and two options for how to represent constants:
- You may represent constants either as numbers or as ordered triples (R, G, B).
- You may design functions to operate on ordered triples (R, G, B) or on single numbers.
If functions operate on ordered triples, then constants must also be defined as ordered triples. In this situation, constants represent solid colors. If functions operate on single numbers, your constants must be single numbers as well. In this case, there must be three independent trees defined in a function (one to operate on each of (R, G, B). This second method allows for slightly more interesting pictures (different functions are operating in each of the color channels) but they also tend to be a bit more “busy.” Which of these methods you prefer is a matter of taste.
Breeding functions?
This portion of the assignment is based on a paper written in 1991 by Karl Sims, Artificial Evolution for Computer Graphics. Scroll down to the bottom of his Web page and check out the pretty pictures. For a whizzy Web implementation of the same thing, check out The Gallery of Random Art. You should consider the Sims paper to be a part of your assignment specification.
It's not as crazy as it sounds. While there are a number of techniques available, you will use the same technique as Sims does in section 4.3 ("Mating Symbolic Expressions"). The idea is that any function can be represented as a tree (where leafs of this tree are either variables {x,y} or constants, and these leaves are joined to a head by intrinsic functions.) You may find it useful to think of these functions as S-Expressions. For example, the function (Add x y) would be represented by a tree whose head is Add. Add's left child would be x (which is a leaf) and Add's right child would be y (which is also a leaf). Given two trees, Sims describes two forms of breeding which can be performed on them:
- To create a child from two parents which are similar in their genotype, recursively traverse both parents. So long as the parents are the same, the child will be the same as the parent. Once a difference is reached, randomly choose one parent or the other.
- If the two parents have unrelated genotypes, they won't be able to breed together as above. Instead, take a random clipping from one parent and splice it onto a copy of the other parent at a random location, replacing whatever was there before.
Your Pretty Pictures implementation must allow the user to successively create generations of pictures. In order to create a new generation, the user must select which pictures he or she would like to include in the parent set. Members in arbitrary pairs in this set are then bred with each other. The resulting genotypes create a new generation.
In addition to evolution through breeding, you should also support evolution through mutation. Mutation means replacing an existing node in the tree with something generated at random (or tweaking some existing constant within the tree). Cool points: you could try a variety of different mutation strategies. One option is simply pruning the tree with increasing probability as the depth increases. Or, you could do something akin to normal breeding, except your mutation might cross-breed a picture with itself. You could even look at doing tree rotations! In nature, real mutation is quite complex. You may find that different mutation strategies yield very different results.
Animations
You will implement genetic cross dissolves as described in section 4.5 of the Sims paper. The idea is that the user selects two images that are (hopefully) genetically similar to each other and the asks for them to be cross-dissolved. To interpolate to a genotype between the two source genotypes, recursively walk down from the top. While the source genotypes are the same, continue recursively. Where they differ, evaluate both sides and linearly interpolate between the values they return. When implemented correctly on genetically similar functions, the result is pretty cool.
Writing your pictures and animations to disk
The user should have the ability to save any picture or animation he or she generates in the program. As a minimum, Pretty Pictures should be able to export pictures to PNG files, and to be able to export animations to an ordered series of PNG files. Do not export JPEGs (or any other lossy format). Do not export bitmaps. After writing gzip, you should understand the value of compression.
The PNG (Portable Network Graphics) format [Wikipedia] is a capable image format. A PNG is organized into "chunks"; the simplest PNG is a header chunk, a DEFLATEd (!) image data chunk, and a footer chunk. You can support PNG in one of two ways:
- You can write the PNG format yourself, which is pretty easy if you poke around the spec—much easier than gzip, since you only need to write "type 2" (8-bit-per-channel RGB) images. You can make use of Java's DEFLATE classes here too.
- You can use the
javax.imageioclasses built into Java. (This is the easy way out.)
Concurrency requirements
There are two equally important aspects of Pretty Pictures that require concurrency.
- Because core functionality in Pretty Pictures requires the use of long running algorithms (like image rendering), it is very important that your implementation allows the the GUI to always be responsive. You must do this by keeping long running computation in non-event threads. These long running computations must yield control to the event thread often enough to allow the event thread to respond to user input. Naturally, then, the user should be able to kill computation running in the background (via the stop button), be able to start the computation over (via the reload button), and be kept up to date about the progress of said computation. For the purpose of image rendering, we require that this "feedback" be incremental presentation of the image itself. When Pretty Pictures is rendering an image, you should be reminded of what it was like to load a web page full of pictures on a dial-up internet connection.
- You're also required to concurrently render more than one image at a time in the portion of your implementation of Pretty Pictures that displays the image grid. Think carefully about what methods of synchronization are required to do this. More than one solution is acceptable. Rendering images in order (one at a time) is not one of them.
Specific requirements
Now that Project 3's conceptual features have been specified, we can specify how these requirements should translate into your user interface.
Your program must have the following sections ("pages," if you will):
- Your program must have a grid section which contains a set of thumbnails. There can be variable number of thumbnails on this page, but your implementation must support the simultaneous rendering of at least twenty five pictures. The set of thumbnails in this area represents the current generation. The user should be given controls to select members of this generation which will serve as parents for the next generation. In addition, the user should be able to set a mutation rate. The user should be able to save and load the state of this page (that is, the set of genotypes for the current generation). This is similar to the "favorites" or "bookmarks" feature offered in most browsers.
- Your program must have a section which can display any rendered image by itself at a resolution which is user defined. There must be a simple link system between a thumbnail on the grid and the image at full size. (If you're ever confused about how your program should behave, think carefully about the browser metaphor. On most websites, the user is offered an easy method of navigation from a thumbnail to a full sized image -- the mouse click. This area of the program should also give the user the option to save his rendered image to disk.
- Your program should have an area which allows the user to load arbitrary image files from disk for user in the "ExternalImage" intrinsic function. You are only required to support the PNG format.
- Your program should have an area which plays an animation much like youtube plays silly videos. It is important to note that while it is only required that you write animations to disk as successive image dumps, you are required to support live preview of these animations in your program. There should be an intuitive link between this area of the program and the grid area, which will allow the user to animate a transition between any two images he or she has bred. This area should be able to generate an animation of user defined resolution and frame rate.
Final submission
In the end, you will have created a program that lets you explore cool images. As part of your final submission, you will create a Web page for all the world to see, that has three of your favorite images (rendered at 600x600 pixel resolution) as well as a screen-dump of your GUI as it's running with the user selecting images and also a screen-dump of the animation functionality. For each of the three images, your Web page should also show the genotype of the image.
To help your T.A.s (and yourself), you should also generate a Web page with a bunch of test images. You should have one image per intrinsic in your genome to prove to yourself (and to us) that you are implementing that intrinsic function properly. For example, color Perlin noise should look different from black and white Perlin noise. To demonstrate your RGB-To-YCrCb and reverse functions, you should probably choose suitable inputs, such as sweeps from left to right with black to white or red to green. You may find you want to add other intrinsic functions to help you with this testing. That's fine, but consider that you've already got some fairly powerful intrinsic functions already in there.
As soon as you've got a URL, even if there's nothing there yet, e-mail your grading group. We'll set up a big web page so you can look at each others' pretty pictures.
Optional stuff
Pretty Pictures (with no additional requirements) requires a substantial time commitment to be done right. However, if you're confident that your implementation is entirely bug-free and feature-complete, and you're looking for some extra points (possibly because you made Kyle too lame of a character in your adventure game), try some of these additional features:
- Normally, computer graphics images are defined to be a two-dimensional
array of (R,G,B) tuples, where
each value is a single byte (usually from zero through 255).
The functions described above, however, are described "continuously"
from -1.0 to 1.0. When you convert from an integer pixel coordinate to
a floating-point phenotype coordinate, and then look up a value in
an embedded image (converting back to integer pixels again), it's possible for the image coordinates
to map somewhere in between the discrete pixel points.
The solution you need to use is called bilinear interpolation.
In one dimension, interpolation is easy. If x is a value between
zero and one, and you have P0 and P1 at each point, the
linear interpolation is (1-x)*P0 + x*P1.
In two dimensions, your sample point is usually inside a box with four
samples on each edge. The way to think of the solution is that you
first compute the linear interpolation in the x-axis for the
bottom points and the top points, then you interpolate those intermediate
results on the y-axis. Assuming points P00, P01, P10, and P11, the final equation would be something like:
- t0 = (1-x)*P00 + x*P10.
- t1 = (1-x)*P01 + x*P11.
- result = (1-y)*t0 + y*t1.
- If you're feeling really gonzo (not required for this assignment), you could investigate trilinear mipmapping. The above case takes care of zooming in to an image, but could generate aliasing artifacts when you zoom out (e.g., moire patterns). Modern texture mapping hardware does this and it looks great, but the software speeds would be even slower.
- Make it so that your animation export actually exports an mpeg instead of just a series of PNGs.
- There are many other color spaces out there. If you're going for cool points, you might want to look at other matrix multiplication operators on the RGB 3-vectors.
- A time parameter (t) that varies from zero to one (or any other range) turns any expression into an animation. If it goes zero to one and back to zero, then it makes for a loop that can repeat infinitely. These may not be quite as cool as the genetic cross-dissolves, but they're easy to implement.
- Different statistical biases on the random selection of different intrinsic gene functions can help you tune what kinds of images you'll get. For example, once you load a picture of your face, you can make it more likely to appear.
- When you crossbreed, particularly with only a handful of parents, you may get many children with identical genotypes. You only need to send one of them on to the next generation, and otherwise you might as well try to generate another child.
- One interesting option is bookmarks. Whenever you find an image that you like, you could bookmark it and assign it some probability of occuring in future generations just like any other intrinsic function.
- Once you've found a clever way to warp one image, you're going to want to be able to apply it to lots of other images. You could take a particular genotype, turn the embedded image name into an external variable, and make a command-line utility (or Photoshop plug-in!) suitable for general use.
- Once you've bookmarked a lot of favorites, you may be able to automatically identify common subexpressions that are popular among your bookmarks. Those would be even better candidates for keeping around for future generations. (Hint: if you had the right hash function, you could detect common subexpressions by looking for hash collisions.)
- While most interesting image functions will have dependencies on both x and y, many of their subexpressions will be invariant with respect to those variables. You may be able to optimize away a non-trivial amount of computation through partial evaluation of the expressions.