Seminar 1 Introduction to Data Science

Contents

1. Seminar 1 Introduction to Data Science
2. Grades
3. Definition
4. Data analysis techniques
5. Two cultures of data analysis
6. Data modeling culture
7. Algorithmic modeling culture
8. Why do you need to learn data analysis
9. Data manipulation by Tim Cook
10. Even academic superstars may be wrong
11. A lot of fraud in science (especially in social sciences)
12. Random chance plays a huge role in social sciences
13. Intuition might be wrong
14. Intuition might be wrong
15. Intuition might be wrong, part 2
16. R
17. P.S.

Slide 1: Seminar 1 Introduction to Data Science

Mikhail Kamrotov
Data Analysis in R

Slide 2: Grades
50% home assignments, 50% group project
Grading scale (score - grade): 96-100% - 10, 90-95% - 9, 80-89% - 8, 75-79% - 7, 65-74% - 6, 55-64% - 5, 45-54% - 4, 35-44% - 3, 25-34% - 2, 0-24% - 1
You can work in pairs
The best solutions may be presented in class (a 5-minute talk) for extra points
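
The grading scale can be read as a lookup from a percentage score to a 10-point grade. Below is a minimal sketch in Python. The function name `grade_from_percent` is hypothetical, and rounding fractional scores half up to a whole percent is an assumption, since the slide does not say how a score such as 95.5% is treated.

```python
import math

# Lower bound of each band (in percent) and the grade it earns, as on the slide.
GRADE_BANDS = [
    (96, 10), (90, 9), (80, 8), (75, 7), (65, 6),
    (55, 5), (45, 4), (35, 3), (25, 2), (0, 1),
]


def grade_from_percent(percent: float) -> int:
    """Map a 0-100 percentage score to a grade on the 10-point scale."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Assumption (not stated on the slide): round half up to a whole percent.
    whole = math.floor(percent + 0.5)
    for lower_bound, grade in GRADE_BANDS:
        if whole >= lower_bound:
            return grade
    raise AssertionError("unreachable: the last band starts at 0")


if __name__ == "__main__":
    for score in (100, 95, 79, 54, 24):
        print(score, "->", grade_from_percent(score))
```

Keeping the bands in one table means a change to the scale touches only `GRADE_BANDS`.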

Slide 3: Definition
Data analysis is the process of transforming raw data into usable information, often presented in the form of a published analytical article, in order to add value to the statistical output. (OECD)
Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. (Wikipedia)
Both definitions miss one important step: collecting data.
Most theory is about modeling, but a data scientist spends 80% of the time on data collection and cleansing.

Slide 4: Data analysis techniques
Data mining: automatic discovery of useful information in large data repositories
Descriptive statistics: summarizing features of data
Exploratory data analysis: finding new features in data
Confirmatory data analysis: hypothesis testing
Predictive analytics: deriving predictions from data
Text analytics: extracting information from textual (i.e. unstructured) data

Slide 5: Two cultures of data analysis
Data is generated by a black box
Input variables x (independent variables) go in on one side (e.g. the time you spend on your home assignments)
The response variables y come out on the other side (e.g. your grades)
Two main goals: prediction (what y will be for future x) and information (how y depends on x)
Two approaches: the data modeling culture and the algorithmic modeling culture

Slide 6: Data modeling culture
Starts by assuming a data model for the inside of the black box
The values of the parameters are estimated from the data, and the model is then used for information and/or prediction
Model validation: goodness-of-fit tests
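
As a rough illustration of this culture, the sketch below assumes the black box is a straight line, y = a + b*x, estimates a and b by least squares, and reports R² as a simple goodness-of-fit measure (standing in for formal tests). The data are made up for illustration (hours spent on home assignments and the resulting grade), and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical data, made up for this example.
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # x: time spent on home assignments
grades = [3.0, 4.0, 6.0, 6.0, 8.0, 9.0]  # y: grade received


def fit_line(x, y):
    """Estimate intercept a and slope b of y = a + b*x by least squares."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y that the fitted line explains."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    a, b = fit_line(hours, grades)
    print(f"grade = {a:.2f} + {b:.2f} * hours")
    print(f"R^2 = {r_squared(hours, grades, a, b):.3f}")
```

The estimated slope is the "information" part (how much the grade changes per extra hour), and plugging a new x into the fitted line is the "prediction" part.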

Slide 7: Algorithmic modeling culture
Considers the inside of the box complex and unknown
Tries to find a function f(x), an algorithm that operates on x to predict the responses y
Model validation: predictive accuracy
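
A small sketch of this culture: no model of the box is assumed. Instead, f(x) is a k-nearest-neighbours rule, and it is judged only by how well it predicts data it has not seen. The choice of k-nearest neighbours, the made-up (hours, grade) pairs, and the function names are all assumptions for illustration.

```python
def knn_predict(train, x_new, k=3):
    """Predict y for x_new as the mean y of the k nearest training points."""
    # sorted() is stable, so ties in distance keep their training order.
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mean_absolute_error(train, test, k=3):
    """Predictive accuracy: average absolute error on held-out points."""
    errors = [abs(y - knn_predict(train, x, k)) for x, y in test]
    return sum(errors) / len(errors)


# Hypothetical (hours, grade) pairs, split into training and held-out data.
train = [(1, 3), (2, 4), (3, 6), (5, 8), (6, 9), (8, 10)]
test = [(4, 7), (7, 9)]

if __name__ == "__main__":
    print("prediction for 4 hours:", knn_predict(train, 4))
    print("held-out mean absolute error:", mean_absolute_error(train, test))
```

The model is never checked for fit on the training data. Only the error on `test` counts.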

Slide 8: Why do you need to learn data analysis
It is a valuable and well-paid skill
Things are sometimes not as obvious as they seem at first sight
It lets you verify results produced by your colleagues
It is the only way to make a scientific contribution and test theories, especially in the social sciences

Slide 9: Data manipulation by Tim Cook
https://www.statschat.org.nz/2013/09/11/cumulative-totals-tend-to-increase/
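
As the title of the linked post suggests, cumulative totals tend to increase: a cumulative chart keeps rising even when the underlying figures fall. The sketch below shows this with made-up quarterly sales. The numbers are hypothetical and are not taken from the post.

```python
from itertools import accumulate

# Hypothetical quarterly sales (made-up numbers): falling every quarter.
quarterly_sales = [50, 40, 30, 20, 10]

# Running total of the same series: rising every quarter.
cumulative_sales = list(accumulate(quarterly_sales))

if __name__ == "__main__":
    print("per quarter:", quarterly_sales)
    print("cumulative: ", cumulative_sales)
```

A chart of `cumulative_sales` slopes upward throughout, although sales fell in every period.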

Slide 10: Even academic superstars may be wrong
http://theconversation.com/the-reinhart-rogoff-error-or-how-not-to-excel-at-economics-13646

Slide 11: A lot of fraud in science (especially in social sciences)
https://www.financial-math.org/blog/2015/10/is-research-in-finance-and-economics-reproducible/

Slide 12: Random chance plays a huge role in social sciences
http://www.tylervigen.com/spurious-correlations
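
One way to see how chance alone produces strong correlations is to compare a short random series against many other unrelated random series and keep the best match. This is a sketch under assumed settings (1000 candidate series of 10 points each, standard normal noise, a fixed seed), and the function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def strongest_chance_correlation(n_series=1000, n_points=10, seed=0):
    """Largest |correlation| between one random series and many unrelated ones."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_series):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    print("strongest correlation found by chance:",
          round(strongest_chance_correlation(), 3))
```

All the series are independent by construction, so any strong correlation the search turns up is spurious. The shorter the series and the more candidates tried, the stronger the best spurious match tends to be.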

Slide 13: Intuition might be wrong
Simpson’s paradox: graduate admissions to UC Berkeley (UCB)

Slide 14: Intuition might be wrong
Simpson’s paradox: graduate admissions to UC Berkeley (UCB)
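
The paradox can be reproduced numerically. The sketch below uses the admitted and rejected counts for the six largest departments from the `UCBAdmissions` dataset that ships with R (applications from 1973). Whether the slides showed exactly this table is an assumption, and the function names are hypothetical.

```python
# (admitted, rejected) counts by department and sex, from R's UCBAdmissions.
ADMISSIONS = {
    "A": {"men": (512, 313), "women": (89, 19)},
    "B": {"men": (353, 207), "women": (17, 8)},
    "C": {"men": (120, 205), "women": (202, 391)},
    "D": {"men": (138, 279), "women": (131, 244)},
    "E": {"men": (53, 138), "women": (94, 299)},
    "F": {"men": (22, 351), "women": (24, 317)},
}


def admission_rate(admitted, rejected):
    """Share of applicants who were admitted."""
    return admitted / (admitted + rejected)


def overall_rate(sex):
    """Admission rate for one sex, pooled over all six departments."""
    admitted = sum(dept[sex][0] for dept in ADMISSIONS.values())
    rejected = sum(dept[sex][1] for dept in ADMISSIONS.values())
    return admission_rate(admitted, rejected)


def departments_where_women_lead():
    """Departments in which women were admitted at a higher rate than men."""
    return [name for name, dept in ADMISSIONS.items()
            if admission_rate(*dept["women"]) > admission_rate(*dept["men"])]


if __name__ == "__main__":
    print("overall, men:  ", round(overall_rate("men"), 3))
    print("overall, women:", round(overall_rate("women"), 3))
    print("women ahead in departments:", departments_where_women_lead())
```

Pooled over departments, men are admitted at a clearly higher rate, yet women have the higher rate in four of the six departments. The reversal arises because women applied more often to departments with low admission rates for everyone.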

Slide 15: Intuition might be wrong, part 2
Monty Hall problem
https://en.wikipedia.org/wiki/Monty_Hall_problem
Humans vs birds: birds win (Herbranson, 2010)
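
The Monty Hall problem is easy to check by simulation. The sketch below assumes the standard rules: three doors, one car, and a host who always opens a door that hides no car and was not picked. The function names and the fixed seed are choices made for this example.

```python
import random


def play(switch, rng):
    """Play one round; return True if the player ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Fraction of rounds won with a fixed strategy (always or never switch)."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print("stay:  ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

Switching should win about two thirds of the time and staying about one third, which is the counterintuitive result the slide refers to.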

Slide 16: R
R is a language for statistical computing
Modern social science mostly speaks this language (and Python as well)
R download link: https://cran.r-project.org
RStudio download: https://www.rstudio.com/products/rstudio/download/#download

Slide 17: P.S.
Calling Bullshit, an online course from the University of Washington, is highly recommended: http://callingbullshit.org/syllabus.html#Introduction
