You get to pick your own dataset.
Here are some repositories with many, many datasets to choose from:
You do not have to pick a dataset from one of these places. These are just suggestions.
Your data must contain a mix of categorical and continuous variables and be complex enough that you can create 8 interesting graphs (or 6 graphs if your team only has 3 people). Datasets with only a few variables will not work.
You CANNOT use any dataset that was used in a previous assignment in this course or in any other course you have taken. Choose a dataset that no one in your group has worked with before.
We strongly encourage groups to pick different datasets, so that no two groups use the same one. If you choose a dataset that another group has already chosen, we might ask some groups to switch.
Be sure to read the guidelines on the graphics below. These will certainly influence what datasets you choose.
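One quick way to check the "mix of categorical and continuous variables" requirement above is to tabulate the column types of a candidate dataset. The following is only a minimal sketch: `summarize_variable_types` is a hypothetical helper name, and the built-in `iris` data is just a stand-in for whatever data frame you load (it is not a suggested dataset).

```r
# Hypothetical helper: list which columns look continuous vs. categorical.
# Assumption: numeric columns are treated as continuous, while factor,
# character, and logical columns are treated as categorical. Categories
# stored as integer codes will be reported as continuous, so inspect those
# columns by hand.
summarize_variable_types <- function(df) {
  is_continuous <- vapply(df, is.numeric, logical(1))
  is_categorical <- vapply(
    df,
    function(col) is.factor(col) || is.character(col) || is.logical(col),
    logical(1)
  )
  list(
    n_rows = nrow(df),
    n_cols = ncol(df),
    continuous = names(df)[is_continuous],
    categorical = names(df)[is_categorical]
  )
}

# `iris` is used here only as a placeholder data frame.
types <- summarize_variable_types(iris)
print(types)
```

A dataset where either list comes back empty, or where the total number of columns is very small, probably will not support the required number of graphs.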
Go to Lab with your group members and work together to submit the assignment, in which you will finalize your choice of dataset and start to plan out your poster.
You will also sign up for your group’s Tuesday 4/10 check-in with Jerzy.
If you cannot make it to Lab, coordinate with your group to ensure that your group members are turning in the assignment, and make plans to make up for the work you missed. In general, be a good group member. Part of your score on this project will be based on teammates’ assessment of each other’s contributions.
There will be no lecture on Monday 4/9.
Instead, by the end of class time (1:20pm), the first draft of your poster should be complete. Each group member should review the work of all other group members, give constructive feedback on others’ graphics, and help solve any issues that arise. Use the Lecture10_Checklist.pdf cheat-sheet to help review each graph.
Also be sure to coordinate themes, fonts, color palettes, etc. across your graphs. For instance, if you map the categories of some variable to a certain color palette or a certain set of line types, use the same mapping for every graph showing that variable on your poster.
Again, there is no lecture that day, but you are welcome to hold your group check-in meeting at our usual class time & place.
We will hold 15-minute meetings with each team to review your posters on Tuesday 4/10, between 10am and 5pm, in BH 140D (Mac computer lab).
Each group is required to:
When bringing the poster to these meetings, just bring a small printed copy (on an 8.5x11 sheet of paper) and a computer copy (.pdf). The printed copy does not need to be in color.
There will be no lecture on Wednesday 4/11.
Instead, by the end of class time (1:20pm), each group should submit a single file, named Group[X].pdf, containing their final poster. Jerzy and the TAs will handle poster printing. Late posters will lose a substantial portion of the project grade and may not be printed in time for the presentations.
Each group will present their posters to Jerzy, the TAs, other professors, Statistics graduate students, and others in a public poster presentation.
Each group member is required to speak about two graphs when presenting the posters.
Please have at least one group member (but preferably, all group members) arrive 15 minutes early to set up your poster!
You are welcome to use your own template/design. See Canvas / Assignments / Static Graphics Group Project for PowerPoint templates and a \(\LaTeX\) template.
Keep in mind that your poster will be printed 3 feet tall and 4 feet wide. As such, ensure that your graphs are high-resolution! See Lab 09 for how to save high-resolution graphs.
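As a reminder of what saving a high-resolution graph can look like, here is a minimal sketch using `ggsave()` from ggplot2. The approach in Lab 09 may differ; the file name, the 10 x 7 inch size, the 300 dpi setting, and the `mtcars` example plot are all assumptions chosen for illustration, not course requirements.

```r
library(ggplot2)

# Placeholder plot built from the built-in mtcars data.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(
    title = "Fuel efficiency vs. weight",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon"
  )

# Assumed output location; replace with a path inside your project.
out_file <- file.path(tempdir(), "example_graph.png")

# width/height are in inches here. At 300 dpi, a 10 x 7 inch graph is
# saved as a 3000 x 2100 pixel image.
ggsave(out_file, plot = p, width = 10, height = 7, units = "in", dpi = 300)
```

Choose the width and height to match the space the graph will occupy on the poster, so that text and points keep sensible proportions when printed.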
Each group member is required to create two graphics.
It’s nearly the end of the semester, so the graphs you make should be (nearly) perfect. Take care to create excellent graphs that are informative and easy to understand. All graphs should be properly labeled, titled, etc.
Additional restrictions (these will certainly influence your choice of dataset):
Your graphs should tell a somewhat cohesive story. Come up with some general questions you want to answer with your dataset, and use your graphs to walk the viewer through a comprehensive analysis of those questions. Use your graphs to demonstrate your findings and conclusions.
Each group is required to make several of the following types of graphs:
(Homework 10 will give you practice graphing networks and time series.)
Each group can have no more than a few graphs that show a single variable (e.g. one-variable bar charts, histograms, density estimates).
Each group should not have more than a few of the same type of graph. For example, replace some histograms with density plots, and replace some scatterplots with heat maps, contour plots, regression plots, etc.
Nonetheless, always try to think: Is there a simpler graph that can tell the same story? Do not make your graphs unnecessarily complicated either.
Finally, coordinate with your teammates to use consistent design across all of the different graphs. If a variable is used in several graphs, use the same color scheme for that variable in all graphs (don’t map Male/Female to Blue/Red in one plot but Green/Purple in another). All team members should use the same ggplot theme (don’t have different font styles in different graphs). Keep it looking clean and professional.
However, requirement 1 (tell a cohesive story) is the most important. Requirements 2-5 (variety in graphs) will not make up for a poster of diverse but pointless graphs that convey no message about what you learned from the dataset.
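The consistent-design requirement above (same colors for the same variable, same ggplot theme for every team member) can be met by defining the palette and theme once and reusing them in every graph. This is only one possible sketch: `species_colors` and `poster_theme` are hypothetical names, the hex colors and font size are arbitrary choices, and `iris` is a stand-in dataset.

```r
library(ggplot2)

# Shared definitions, e.g. kept in one script that every teammate sources.
# A named vector ties each category to a fixed color in every graph.
species_colors <- c(
  setosa     = "#1b9e77",
  versicolor = "#d95f02",
  virginica  = "#7570b3"
)

# One theme object for the whole poster (assumed settings).
poster_theme <- theme_bw(base_size = 18) +
  theme(legend.position = "bottom")

# Graph 1: scatterplot, with Species mapped to the color aesthetic.
scatter <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  scale_color_manual(values = species_colors) +
  labs(title = "Sepal width vs. sepal length") +
  poster_theme

# Graph 2: density plot, with the same variable mapped to fill.
density <- ggplot(iris, aes(x = Petal.Length, fill = Species)) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = species_colors) +
  labs(title = "Distribution of petal length") +
  poster_theme
```

Because both scales read from the same named vector, each category gets the same color whether the variable is mapped to `color` or to `fill`.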
Below each graph, you can have up to three bullet-points, each of which is no more than 1-2 sentences, describing the graph, takeaways, etc.
Each group should also designate a section of their poster to give a brief overview of the dataset. Cite the data source here (no need to put data-source captions on every graph separately), and describe important features (number of variables and observations; what do individual cases represent; any interesting aspects of the data collection process; etc.).
You are also permitted to have Introduction, Conclusions, and/or Acknowledgements sections that contain additional text. That said, written text should be kept to a minimum.
Each group member is required to speak about two graphics. These do not have to be the graphics they created, but it probably makes sense to do so.
Be professional and courteous to anyone visiting your poster. Assume that they do not know anything about your dataset, and be sure to explain the dataset / questions you’re trying to answer in a clear and concise way.
It should take about 5 minutes to go over your entire poster. Aim to summarize each graph and its main takeaways in less than a minute, if possible.
The Static Graphics Group Project is worth 15% of your final grade. This will be divided as follows:
You will earn a high grade on the check-in if you…
Your poster will earn a high grade if you…
Your presentation will earn a high grade if you…
Your group member evaluation will earn a high grade if you…
Remember to review the teamwork handouts and resources, from CMU’s Global Communication Center, to prevent or manage conflicts in group work: