Data Science Statistics Cheat Sheet



By Ajay Ohri, May 2014.

This new cheat sheet will be included in my upcoming book Machine Learning: Foundations, Toolbox, and Recipes to be published in September 2019, and available (for free) to Data Science Central members exclusively. This cheat sheet is 14 pages long.


Over the past few years, as the buzz about (and apparently the demand for) data scientists has continued to grow, people have been eager to learn how to enter, advance and thrive in this seemingly lucrative profession. As someone who writes on analytics and occasionally teaches it, I am often asked: how do I become a data scientist?
Adding to the complexity of my answer is the fact that data science seems to be a multidisciplinary field, while the university departments of statistics, computer science and management each deal with data quite differently.
But cutting through the marketing jargon, a data scientist is simply a person who can write code in a few languages (primarily R, Python and SQL) for data querying, manipulation, aggregation and visualization, with enough statistical knowledge to give actionable insights back to the business for decision making.
Since this rather practical definition of a data scientist is reinforced by the words that accompany listings for “data scientists” on job websites, here are some tools for learning the primary languages in data science: Python, R and SQL. A cheat sheet or reference card is a compilation of the most commonly used commands, meant to help you learn a language’s syntax faster.
The inclusion of SQL may surprise some (isn’t this the NoSQL era?), but it is there for a logical reason. Both Pig and Hive Query Language are closely associated with SQL, the original Structured Query Language. In addition, one can rely solely on SQL, via the sqldf package within R (or the less widely used python-sql or python-sqlparse libraries for Pythonic data scientists), or even PROC SQL within that old champion language SAS, and do most of what a data scientist is expected to do (at least in data munging).
For Python, this is a rather partial list, since Python, the most general-purpose language in the data scientist’s quiver, can be used for many things. But for the data scientist, the numpy, scipy, pandas and scikit-learn packages seem the most pertinent.
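To make the point about SQL and data munging concrete, here is a minimal sketch that filters, groups and aggregates rows with plain SQL. It uses Python’s built-in sqlite3 module as a stand-in engine (a swapped-in tool, not sqldf or the Python libraries named above), and the `sales` table, the `revenue_by_region` helper and the sample rows are all hypothetical.

```python
import sqlite3


def revenue_by_region(rows):
    """Aggregate hypothetical (region, amount) rows using only SQL in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table, used only for illustration.
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # Filter, group and aggregate: typical data-munging steps expressed in SQL.
        query = """
            SELECT region, COUNT(*) AS n, SUM(amount) AS total
            FROM sales
            WHERE amount > 0
            GROUP BY region
            ORDER BY total DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("north", 120.0), ("south", 80.0), ("north", 30.0), ("south", -5.0)]
    for region, n, total in revenue_by_region(sample):
        print(region, n, total)
```

The negative amount is dropped by the WHERE clause before grouping, so the sample yields two rows for "north" and one for "south". The same query would run largely unchanged through sqldf or PROC SQL, which is the portability the paragraph above relies on.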
Are all of the thousands of R packages useful to the aspiring data scientist? No.
Accordingly, we chose the appropriate cheat sheets for you. Note that this is a curated list of lists. If anything can be assumed in the field of data science, it should be the null hypothesis that the data scientist is intelligent enough to make their own decisions based on the data and its context. Three printouts are all it takes to speed up the aspiring data scientist’s journey.
Please add additional cheat sheets in the comments below.

Cheat Sheets for Python
  • Python www.astro.up.pt/~sousasag/Python_For_Astronomers/Python_qr.pdf
  • NumPy, SciPy and Pandas s3.amazonaws.com/quandl-static-content/Documents/Quandl+-+Pandas,+SciPy,+NumPy+Cheat+Sheet.pdf

Cheat Sheets for R
  • Short Reference Card cran.r-project.org/doc/contrib/Short-refcard.pdf
  • R Functions for Regression Analysis cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf
  • Time Series cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf
  • Data Mining cran.r-project.org/doc/contrib/YanchangZhao-refcard-data-mining.pdf
  • Quandl s3.amazonaws.com/quandl-static-content/Documents/Quandl+-+R+Cheat+Sheet.pdf
  • Cross Reference between R, Python (and Matlab)

Cheat Sheets for SQL
  • SQL Joins www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins
  • SQL and Hive hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf
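
The SQL Joins link above is a visual guide to join types. As a small hedged companion, here is a sketch that contrasts INNER JOIN and LEFT JOIN, again using Python’s built-in sqlite3 module as a stand-in engine. The `customers` and `orders` tables, their rows and the `join_demo` helper are hypothetical. Only these two join types are shown because SQLite versions before 3.39 do not support RIGHT or FULL OUTER JOIN.

```python
import sqlite3


def join_demo():
    """Return (inner_rows, left_rows) for a tiny hypothetical customers/orders example."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript("""
            CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
            CREATE TABLE orders (customer_id INTEGER, item TEXT);
            INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
            INSERT INTO orders VALUES (1, 'book'), (1, 'pen'), (3, 'lamp');
        """)
        # INNER JOIN keeps only customers that have at least one matching order.
        inner = conn.execute("""
            SELECT c.name, o.item
            FROM customers AS c
            INNER JOIN orders AS o ON o.customer_id = c.id
            ORDER BY c.id, o.item
        """).fetchall()
        # LEFT JOIN keeps every customer; missing orders come back as NULL (None in Python).
        left = conn.execute("""
            SELECT c.name, o.item
            FROM customers AS c
            LEFT JOIN orders AS o ON o.customer_id = c.id
            ORDER BY c.id, o.item
        """).fetchall()
        return inner, left
    finally:
        conn.close()


if __name__ == "__main__":
    inner_rows, left_rows = join_demo()
    print("INNER:", inner_rows)
    print("LEFT: ", left_rows)
```

Ben has no orders, so he disappears from the INNER JOIN result but survives the LEFT JOIN with a None item.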

Additional
  • Cheat Sheets for Java introcs.cs.princeton.edu/java/11cheatsheet/
  • Linux Cheat Sheet www.linuxstall.com/linux-command-line-tips-that-every-linux-user-should-know/

Ajay Ohri is a popular writer and blogger on analytics and data mining, and the author of the book R for Business Analytics (Springer, 2012).

You’ve got data, you’ve got a hypothesis, you even gathered the courage to test it. But how do you choose the right test? Well, really you should have thought about this before you collected the data. Maybe you did and maybe you didn’t. I won’t judge (ok, maybe a little). Instead, I’ll offer you a handy cheat sheet that could help you navigate through the various tests and their assumptions whenever you need to do that.

When I first started my PhD program, I had very little knowledge of applied statistics in the behavioural sciences. I had a computer science degree and could do probability, but was not friends with t-tests and ANOVAs. Thankfully, statistics courses were a degree requirement, and I enjoyed them so much that I ended up teaching and tutoring undergraduates in the following years.

This cheat sheet came about while I was taking the introductory univariate statistics course back in 2009. It took a significant number of neurons to remember all the tests and their assumptions, so I decided to follow Einstein’s advice and “Never memorize something that you can look up.”

Yellow blocks show the questions to answer or decisions to make about your data, orange blocks show the transformations required to use the tests, and blue blocks show the tests and their assumptions. Don’t forget: this cheat sheet is just a start, not a complete guide. Once it has helped you pick a test, dig into that test’s details in your stats book or in online resources.
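As a rough illustration of the kind of branching such a flowchart encodes, here is a hypothetical Python helper that covers only a few common comparisons of a numeric outcome across groups. The function name, its parameters and the simplified mapping are assumptions for illustration, not the author’s cheat sheet. It also ignores checks such as independence of observations, outliers, sample size and sphericity.

```python
def suggest_test(n_groups: int, paired: bool, normal: bool, equal_variances: bool = True) -> str:
    """Suggest a common test for comparing a numeric outcome across groups.

    Hypothetical, simplified sketch. `normal` means the normality assumption
    looks reasonable (for paired designs, normality of the differences).
    `equal_variances` is only consulted for independent groups.
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if n_groups == 2:
        if paired:
            # Two measurements on the same subjects.
            return "paired t-test" if normal else "Wilcoxon signed-rank test"
        if not normal:
            # Rank-based alternative for two independent groups.
            return "Mann-Whitney U test"
        # Welch's t-test drops the equal-variance assumption.
        return "independent samples t-test" if equal_variances else "Welch's t-test"
    # Three or more groups.
    if paired:
        return "repeated measures ANOVA" if normal else "Friedman test"
    if not normal:
        return "Kruskal-Wallis test"
    return "one-way ANOVA" if equal_variances else "Welch's ANOVA"


if __name__ == "__main__":
    print(suggest_test(2, paired=False, normal=True, equal_variances=False))
```

For example, two independent groups with roughly normal data but clearly different spreads point to Welch’s t-test. Treat the output the way the cheat sheet is meant to be treated: as a starting point for reading up on the test, not a final answer.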