Monthly Archives: October 2018

Need Statistics Project Ideas?

The Best Statistics Project Ideas

Or how about a data-backed equivalent of r/circlejerk? (The original use case was determining what domains are the most popular.) Your group should HAND IN ONE PROJECT PROPOSAL (with all group members’ names and section leaders on it) by the proposal due date given above. Do certain subpopulations get mammograms more frequently than others? I imagine that this could come in handy for coaches attempting to get an edge over opponent teams and, more generally, for that cross-section between geeks and gamblers attempting to build analytic models to make better bets. Web 2.0 websites (like Reddit) are sometimes gamed by “voting rings,” which are groups of people that intentionally vote up each other’s content, regardless of quality. The only practical application of neutrinos I’ve heard of, for instance, is because finance. Should your algorithm buy Indonesian sesame seed futures? With weather data, it might know.
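A question like the mammogram one above usually comes down to comparing two proportions. Here is a minimal Python sketch of a pooled two-proportion z-test, assuming you already have counts for two groups. The function name and the example counts are made up for illustration and do not come from any of the data sets mentioned here.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for equal proportions, using the pooled standard error.

    x1, x2: number of "successes" in each group (hypothetical counts).
    n1, n2: group sizes.
    Returns (z statistic, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Illustrative counts only: 60 of 100 in one group versus 40 of 100 in another.
z, p = two_proportion_z_test(60, 100, 40, 100)
print(f"z = {z:.3f}, p = {p:.4f}")
```

The normal approximation is only reasonable when each group has a decent number of successes and failures, and the function will fail with a division by zero if the pooled proportion is exactly 0 or 1.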

  • Build a logical argument in which the evidence is presented through specific facts.
  • A thorough description of how you will collect your data.
  • If you’re interested in building predictive models, Max Kuhn’s Applied Predictive Modeling is awesome and I highly recommend it. If you’re interested in visualizations, I’ve heard good things about Tufte’s The Visual Display of Quantitative Information, but I haven’t read it myself. (Edit: Commenter DivM recommends Visualize This and The Functional Art over Tufte.)

Stanford has 35 million Amazon reviews available for download.

Is Obama in bed with big oil? Or extremist environmentalists? Or the corn lobbies? And who was backing that Herman Cain dude, anyways? The 2012 Presidential Campaign Finance data is available for download. One way to start saving all those future lives might be by digging into this data set of every recorded meteor impact on Earth from 2500 BCE to 2012. Namely, if someone searches for something, what do they click on? Downsides: It’s a Russian search engine with Russian search results. Can we predict the order of the NFL draft based on characteristics of the players?

How will a statistics project sample help?

  • Clarity: Is it easy for your reader to understand what you did and the arguments you made?
  • Conclusions: Answer your question of interest.
  • State your question up front, and use statistics to help answer it. The statistics should not drive the question; the question should drive the statistics.
  • Statement of the problem: Describe the questions you address and any key issues surrounding the questions.
  • A purpose statement
  • Use this data set from Donors Choose to determine the characteristics that make the funding of projects more likely. You could send your results to the Donors Choose folks to help them improve the funding rate for their projects. (Difficulty: Mediumish; Effort: Mediumish)
  • Results: Present relevant descriptive statistics (e.g., number of men and women surveyed, if that is important). Include tables or graphs that support your analyses (be judicious here: too many tables and graphs hurt the clarity of your message).
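For the Donors Choose idea in the list above, a reasonable first descriptive step is the funding rate within each level of a characteristic. The Python sketch below assumes a list of records with a `subject` field and a boolean `funded` field. Both field names and the toy records are hypothetical, and the real data set will have its own columns.

```python
from collections import defaultdict

# Hypothetical toy records; swap in rows loaded from the real data set.
projects = [
    {"subject": "math", "funded": True},
    {"subject": "math", "funded": True},
    {"subject": "math", "funded": False},
    {"subject": "math", "funded": True},
    {"subject": "art", "funded": False},
    {"subject": "art", "funded": True},
    {"subject": "art", "funded": False},
    {"subject": "art", "funded": False},
]

def funding_rate_by(records, key):
    """Return the share of funded projects for each value of `key`."""
    totals = defaultdict(int)
    funded = defaultdict(int)
    for record in records:
        group = record[key]
        totals[group] += 1
        funded[group] += int(record["funded"])
    return {group: funded[group] / totals[group] for group in totals}

rates = funding_rate_by(projects, "subject")
for subject, rate in sorted(rates.items()):
    print(f"{subject}: {rate:.0%} funded")
```

A table like this only describes the sample. Working out which characteristics matter once the others are accounted for would take a model such as logistic regression, which this sketch does not attempt.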

Statistics Project Ideas

The Federal Railroad Administration keeps a list of “railroad safety information including accidents and incidents, inventory and highway-rail crossing data.” Someone (like the NY Times) could overlay this on a map of the United States and figure out if people in poor regions are more likely to be hit by trains or something.

  • Make use of relevant terminology
  • Be selective with computer output to help clarity.
  • Your research question (including a brief description of why this question is of interest).
  • Most importantly, talk to your instructor and TAs for advice. You can ask them, for example, about your planned methods of analysis and see what they think.
  • “The Texas Transportation Institute’s latest Urban Mobility Report puts the annual cost of congestion to the nation, including both travel delays and expenditures on fuel, at more than $100 billion.”
  • Analysis of the data


I think Steven Pinker’s data is in there somewhere. Here’s some supermarket data from there.

  • Building an aggregator for statistics papers across disciplines that can be the central resource for statisticians. Journals ranging from PLoS Genetics to Neuroimage now routinely publish statistical papers. But there is no one central resource that aggregates all the statistics papers published across disciplines. Such a resource would be hugely useful to statisticians. You could build it using blogging software like WordPress, so articles could be tagged and you could add the resource to your RSS reader. (Difficulty: Lowish; Effort: Mediumish)
  • Download data on State of the Union speeches from here and use the tm package in R to analyze the patterns of word use over time. (Difficulty: Lowish; Effort: Lowish)
  • Methods: describe the methodology you used to collect and analyze the data.
  • Discussion: What implications do your results have for the population you sampled from? What could be done to improve the study if it was done again? What types of biases might exist?
  • Analyses: Describe the analyses you did. Be ready to explain why you believe these methods are justified.
  • Data
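The State of the Union idea in the list above suggests the tm package in R. As a rough illustration of the same kind of analysis, here is a small Python sketch using only the standard library (not tm itself) that tracks one word's relative frequency by year. The two-entry `speeches` dictionary and the tiny stopword set are invented placeholders, not real speech text.

```python
import re
from collections import Counter

# Hypothetical stand-in for the corpus: year -> speech text.
speeches = {
    1990: "We must invest in freedom and peace. Freedom is our strength.",
    2010: "Jobs and the economy come first. We will create jobs.",
}

# A deliberately tiny stopword set for the toy corpus; a real analysis needs a fuller list.
STOPWORDS = {"we", "in", "and", "is", "our", "the", "will", "must", "come"}

def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def relative_frequencies(text):
    """Return each non-stopword's share of all non-stopword tokens."""
    tokens = [t for t in tokenize(text) if t not in STOPWORDS]
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def word_trend(word, corpus):
    """Relative frequency of `word` in each year, ordered by year."""
    return [
        (year, relative_frequencies(text).get(word, 0.0))
        for year, text in sorted(corpus.items())
    ]

for year, share in word_trend("freedom", speeches):
    print(year, round(share, 2))
```

Relative rather than raw counts are used so that long and short speeches can be compared. Stemming, which tm supports, is left out here to keep the sketch short.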

Interesting Data Sets

1) Send an e-mail to the instructor explaining how the group members have not contributed adequately.

Wondering what the internet really cares about? Well, I don’t know about that, but you could answer an easier question: What does Reddit care about? Someone has scraped the top 2.5 million Reddit posts and then placed them on GitHub. With this they write, “We find an accuracy of 86.7% in predicting the daily up and down changes in the closing values of the DJIA.” A number of Twitter data sets are freely available here.

Do FOCUS students at Duke eat, sleep, and go to parties with different frequencies than non-FOCUS students? While all of Wikipedia is freely available, DBpedia is an attempt to synthesize it into a more structured format. There is no formal write-up of your project, i.e., no term paper is written. Maybe you could make yourself a clone? Who is the United States killing with drones? If you’re content with Pakistan-specific data, there is a list of drone strikes available here. The Reference Energy Disaggregation Data Set has about 500 GB of compressed data on home energy use.