标签归档:机器学习算法

Deep Learning Specialization on Coursera

李航老师《统计学习方法(第二版)》出版及统计学习方法第一版PPT课件下载

李航老师《统计学习方法(第二版)》出版了,以下是相关介绍。

内 容 简 介

统计学习方法即机器学习方法,是计算机及其应用领域的一门重要学科。
本书分为监督学 习和无监督学习两篇,全面系统地介绍了统计学习的主要方法。
包括感知机、k 近邻法、朴素贝 叶斯法、决策树、逻辑斯谛回归与最大熵模型、
支持向量机、提升方法、EM 算法、隐马尔可夫 模型和条件随机场,以及聚类方法、
奇异值分解、主成分分析、潜在语义分析、概率潜在语义分 析、马尔可夫链蒙特卡罗法、
潜在狄利克雷分配和 PageRank 算法等。除有关统计学习、监督学 习和无监督学习的概
论和总结的四章外,每章介绍一种方法。叙述力求从具体问题或实例入手, 由浅入深,
阐明思路,给出必要的数学推导,便于读者掌握统计学习方法的实质,学会运用。
为满足读者进一步学习的需要,书中还介绍了一些相关研究,给出了少量习题,
列出了主要参 考文献。 本书是统计机器学习及相关课程的教学参考书,
适用于高等院校文本数据挖掘、信
息检索及自然语言处理等专业的大学生、
研究生,也可供从事计算机应用相关专业的研发人员 参考。

第二版序言

《统计学习方法》第一版于2012年出版,讲述了统计机器学习方法,主要是一些常用的监督学习方法.
第二版增加了一些常用的无监督学习方法,由此本书涵盖了传统统计机器学习方法的主要内容.
在撰写《统计学习方法》伊始,对全书内容做了初步规划.第一版出版之后,即着手无监督学习方法
的写作.由于写作是在业余时间进行,常常被主要工作打断,历经六年时间才使这部分工作得以完成.
犹未能加入深度学习和强化学习等重要内容,希望今后能够增补,完成整本书的写作计划.
《统计学习方法》的出版正值大数据和人工智能的热潮,生逢其时,截至2018年4月本书共印刷21次,
133,000册,得到了广大读者的欢迎和支持.有许多读者指出本书对学习和掌握机器学习技术有极大的
帮助.也有许多读者通过电子邮件、微博等方式指出书中的错误,提出改进的建议和意见.一些高校
将本书作为机器学习课程的教材或参考书.有的同学在网上发表了读书笔记.有的同学将本书介绍的
方法在计算机上实现.清华大学深圳研究生院袁春老师精心制作了第一版十二章的课件,在网上公布,
为大家提供教学之便.众多老师、同学、读者的支持和鼓励,让作者深受感动和鼓舞.在这里向所有
的老师、同学、读者致以诚挚的谢意!
能为中国的计算机科学、人工智能领域做出一点微薄的贡献,感到由衷的欣慰,同时也感受到作为知
识转播者的重大责任,让作者决意把本书写好.也希望大家今后不吝指教,多提宝贵意见,以帮助继
续提高本书的质量.在写作中作者也深切体会到教学相长的道理,经常发现自己对基础知识的掌握不
够扎实,整个过程帮助自己对相关知识进行了深入的学习,受益匪浅.
本书是一部机器学习的基本读物,要求读者拥有高等数学、线性代数和概率统计的基础知识.书中主
要讲述统计机器学习的方法,力求系统全面又简明扼要地阐述这些方法的理论、算法和应用,使读者
能对这些机器学习的基本技术有很好的掌握.针对每个方法,详细介绍其基本原理,基础理论,实际
算法,通常给出细致数学推导和具体实例,帮助读者理解,也便于日后复习.
第二版增加的无监督学习方法的初稿,王泉、陈嘉怡、柴琛林、赵程绮等帮助做了认真细致的校阅,
提出了许多宝贵意见,在此谨对他们表示衷心的感谢.清华大学出版社的薛慧编辑一直给予非常专业
的指导和帮助,在此也对她表示衷心的感谢.
由于作者水平有限,本书一定存在不少错误,恳请各位专家、老师、同学、读者批评指正.

李 航
2018年4月23日

目录

第一篇 监督学习
    
第二篇 无监督学习
第13章 无监督学习概论        
13.1.1 无监督学习基本原理
13.1.2 基本问题
13.1.3 机器学习三要素
13.1.4 无监督学习方法

第14章 聚类方法
14.1 聚类的基本概念
14.1.1 相似度或距离
14.1.2 类或簇
14.1.3 类与类之间的距离
14.2 层次聚类
14.3 k均值聚类
14.3.1 模型
14.3.2 策略
14.3.3 算法
14.3.4 算法特点
本章概要
继续阅读
习题
参考文献             

第15章 奇异值分解
15.1 奇异值分解的定义与性质
15.1.1 定义与定理
15.1.2 紧奇异值分解与截断奇异值分解
15.1.3 几何解释
15.1.4 主要性质
15.2 奇异值分解的计算
15.3 奇异值分解与矩阵近似
15.3.1 弗罗贝尼乌斯范数
15.3.2 矩阵的最优近似
15.3.3 矩阵的外积展开式
本章概要
继续阅读
习题
参考文献

第16章 主成分分析
16.1 总体主成分分析
16.1.1 基本想法
16.1.2 定义和导出
16.1.3 主要性质
16.1.4 主成分的个数
16.1.5 规范化变量的总体主成分
16.2 样本主成分分析
16.2.1 样本主成分的定义和性质
16.2.2 相关矩阵的特征值分解算法
16.2.3 数据局正的奇异值分解算法
本章概要
继续阅读
习题
参考文献

第17章 潜在语义分析
17.1 单词向量空间与话题向量空间
17.1.1 单词向量空间
17.1.2 话题向量空间
17.2 潜在语义分析算法
17.2.1 矩阵奇异值分解算法
17.2.2 例子
17.3 非负矩阵分解算法
17.3.1 非负矩阵分解
17.3.2 潜在语义分析模型
17.3.3 非负矩阵分解的形式化
17.3.4 算法
本章概要
继续阅读
习题
参考文献

第18章 概率潜在语义分析
18.1 概率潜在语义分析模型
18.1.1 基本想法
18.1.2 生成模型
18.1.3 共现模型
18.1.4 模型性质
18.2 概率潜在语义分析的算法
本章概要
继续阅读
习题
参考文献

第19章 马尔可夫链蒙特卡罗法
19.1 蒙特卡罗法
19.1.1 随机抽样
19.1.2 数学期望估计
19.1.3 积分计算
19.2 马尔可夫链
19.2.1 基本定义
19.2.2 离散状态马尔可夫链
19.2.3 连续状态马尔可夫链
19.2.4 马尔可夫链的性质
19.3 马尔可夫链蒙特卡罗法
19.3.1 基本想法
19.3.2 基本步骤
19.3.3 马尔可夫链蒙特卡罗法与统计学习
19.4 Metropolis-Hastings算法
19.4.1 基本原理
19.4.2 Metropolis-Hastings算法
19.4.3 单分量Metropolis-Hastings算法
19.5 吉布斯抽样
19.5.1 基本原理
19.5.2 吉布斯抽样算法
19.5.3 抽样计算
本章概要
继续阅读
习题
参考文献

第20章 潜在狄利克雷分配
20.1 狄利克雷分布
20.1.1 分布定义
20.1.2 共轭先验
20.2 潜在狄利克雷分配模型
20.2.1 基本想法
20.2.2 模型定义
20.2.3 概率图模型
20.2.4 随机变量序列的可交换性
20.2.5 概率公式
20.3 LDA的吉布斯抽样算法
20.3.1 基本想法
20.3.2 算法的主要部分
20.3.3 算法的后处理
20.3.4 算法
20.4 LDA的变分EM算法
20.4.1 变分推理
20.4.2 变分EM算法
20.4.3 算法推导
20.4.4 算法总结
本章概要
继续阅读
习题
参考文献

第21章 PageRank算法
21.1 PageRank的定义
21.1.1 基本想法
21.1.2 有向图和随机游走模型
21.1.3 PageRank的基本定义
21.1.4 PageRank的一般定义
21.2 PageRank的计算
21.2.1 迭代算法
21.2.2 幂法
21.3.3 代数算法
本章概要
继续阅读
习题
参考文献

第22章 无监督学习方法总结
22.1 无监督学习方法的关系和特点
22.1.1 各种方法之间的关系
22.1.2 无监督学习方法
22.1.3 基础及其学习方法
22.2 话题模型之间的关系和特点
参考文献

附录A 梯度下降法
附录B 牛顿法和拟牛顿法
附录C 拉格朗日对偶性
附录D 矩阵的基本子空间
附录E KL散度的定义和狄利克雷分布的性质
                  
索引

有同学将第一版相关的机器学习算法用python实现了一遍,可以参考:

https://github.com/WenDesi/lihang_book_algorithm (致力于将李航博士《统计学习方法》一书中所有算法实现一遍)

最后附上清华大学深圳研究生院袁春老师精心制作了第一版十二章的课件,关注公众号AINLP,回复”tongjixuexi”获取:

Coursera上机器学习课程(公开课)汇总推荐

Coursera上有很多机器学习课程,这里做个总结,因为机器学习相关的概念和应用很多,这里推荐的课程仅限于和机器学习直接相关的课程,虽然深度学习属于机器学习范畴,这里暂时也将其排除在外,后续会专门推出深度学习课程的系列推荐。

1. Andrew Ng 老师的 机器学习课程(Machine Learning)

机器学习入门首选课程,没有之一。这门课程从一开始诞生就备受瞩目,据说全世界有数百万人通过这门课程入门机器学习。课程的级别是入门级别的,对学习者的背景要求不高,Andrew Ng 老师讲解的又很通俗易懂,所以强烈推荐从这门课程开始走入机器学习。课程简介:

机器学习是一门研究在非特定编程条件下让计算机采取行动的学科。最近二十年,机器学习为我们带来了自动驾驶汽车、实用的语音识别、高效的网络搜索,让我们对人类基因的解读能力大大提高。当今机器学习技术已经非常普遍,您很可能在毫无察觉情况下每天使用几十次。许多研究者还认为机器学习是人工智能(AI)取得进展的最有效途径。在本课程中,您将学习最高效的机器学习技术,了解如何使用这些技术,并自己动手实践这些技术。更重要的是,您将不仅将学习理论知识,还将学习如何实践,如何快速使用强大的技术来解决新问题。最后,您将了解在硅谷企业如何在机器学习和AI领域进行创新。 本课程将广泛介绍机器学习、数据挖掘和统计模式识别。相关主题包括:(i) 监督式学习(参数和非参数算法、支持向量机、核函数和神经网络)。(ii) 无监督学习(集群、降维、推荐系统和深度学习)。(iii) 机器学习实例(偏见/方差理论;机器学习和AI领域的创新)。课程将引用很多案例和应用,您还需要学习如何在不同领域应用学习算法,例如智能机器人(感知和控制)、文本理解(网络搜索和垃圾邮件过滤)、计算机视觉、医学信息学、音频、数据库挖掘等领域。

这里有老版课程评论,非常值得参考推荐:Machine Learning

2. 台湾大学林轩田老师的 機器學習基石上 (Machine Learning Foundations)—Mathematical Foundations

如果有一定的基础或者学完了Andrew Ng老师的机器学习课程,这门机器学习基石上-数学基础可以作为进阶课程。林老师早期推出的两门机器学习课程口碑和难度均有:机器学习基石机器学习技法 ,现在重组为上和下,非常值得期待:

Machine learning is the study that allows computers to adaptively improve their performance with experience accumulated from the data observed. Our two sister courses teach the most fundamental algorithmic, theoretical and practical tools that any user of machine learning needs to know. This first course of the two would focus more on mathematical tools, and the other course would focus more on algorithmic tools. [機器學習旨在讓電腦能由資料中累積的經驗來自我進步。我們的兩項姊妹課程將介紹各領域中的機器學習使用者都應該知道的基礎演算法、理論及實務工具。本課程將較為著重數學類的工具,而另一課程將較為著重方法類的工具。]

3. 台湾大学林轩田老师的 機器學習基石下 (Machine Learning Foundations)—Algorithmic Foundations

作为2的姊妹篇,这个机器学习基石下-算法基础 更注重机器学习算法相关知识:

Machine learning is the study that allows computers to adaptively improve their performance with experience accumulated from the data observed. Our two sister courses teach the most fundamental algorithmic, theoretical and practical tools that any user of machine learning needs to know. This second course of the two would focus more on algorithmic tools, and the other course would focus more on mathematical tools. [機器學習旨在讓電腦能由資料中累積的經驗來自我進步。我們的兩項姊妹課程將介紹各領域中的機器學習使用者都應該知道的基礎演算法、理論及實務工具。本課程將較為著重方法類的工具,而另一課程將較為著重數學類的工具。

可参考早期的老版本课程评论:機器學習基石 (Machine Learning Foundations) 機器學習技法 (Machine Learning Techniques)

4. 华盛顿大学的 “机器学习专项课程(Machine Learning Specialization)

这个系列课程包含4门子课程,分别是 机器学习基础:案例研究 , 机器学习:回归 , 机器学习:分类, 机器学习:聚类与检索:

This Specialization from leading researchers at the University of Washington introduces you to the exciting, high-demand field of Machine Learning. Through a series of practical case studies, you will gain applied experience in major areas of Machine Learning including Prediction, Classification, Clustering, and Information Retrieval. You will learn to analyze large and complex datasets, create systems that adapt and improve over time, and build intelligent applications that can make predictions from data.

4.1 Machine Learning Foundations: A Case Study Approach(机器学习基础: 案例研究)

你是否好奇数据可以告诉你什么?你是否想在关于机器学习促进商业的核心方式上有深层次的理解?你是否想能同专家们讨论关于回归,分类,深度学习以及推荐系统的一切?在这门课上,你将会通过一系列实际案例学习来获取实践经历。

Do you have data and wonder what it can tell you? Do you need a deeper understanding of the core ways in which machine learning can improve your business? Do you want to be able to converse with specialists about anything from regression and classification to deep learning and recommender systems? In this course, you will get hands-on experience with machine learning from a series of practical case-studies. At the end of the first course you will have studied how to predict house prices based on house-level features, analyze sentiment from user reviews, retrieve documents of interest, recommend products, and search for images. Through hands-on practice with these use cases, you will be able to apply machine learning methods in a wide range of domains. This first course treats the machine learning method as a black box. Using this abstraction, you will focus on understanding tasks of interest, matching these tasks to machine learning tools, and assessing the quality of the output. In subsequent courses, you will delve into the components of this black box by examining models and algorithms. Together, these pieces form the machine learning pipeline, which you will use in developing intelligent applications. Learning Outcomes: By the end of this course, you will be able to: -Identify potential applications of machine learning in practice. -Describe the core differences in analyses enabled by regression, classification, and clustering. -Select the appropriate machine learning task for a potential application. -Apply regression, classification, clustering, retrieval, recommender systems, and deep learning. -Represent your data as features to serve as input to machine learning models. -Assess the model quality in terms of relevant error metrics for each task. -Utilize a dataset to fit a model to analyze new data. -Build an end-to-end application that uses machine learning at its core. -Implement these techniques in Python.

4.2 Machine Learning: Regression(机器学习: 回归问题)

这门课程关注机器学习里面的一个基本问题: 回归(Regression), 也通过案例研究(预测房价)的方式进行回归问题的学习,最终通过Python实现相关的机器学习算法。

Case Study – Predicting Housing Prices In our first case study, predicting house prices, you will create models that predict a continuous value (price) from input features (square footage, number of bedrooms and bathrooms,…). This is just one of the many places where regression can be applied. Other applications range from predicting health outcomes in medicine, stock prices in finance, and power usage in high-performance computing, to analyzing which regulators are important for gene expression. In this course, you will explore regularized linear regression models for the task of prediction and feature selection. You will be able to handle very large sets of features and select between models of various complexity. You will also analyze the impact of aspects of your data — such as outliers — on your selected models and predictions. To fit these models, you will implement optimization algorithms that scale to large datasets. Learning Outcomes: By the end of this course, you will be able to: -Describe the input and output of a regression model. -Compare and contrast bias and variance when modeling data. -Estimate model parameters using optimization algorithms. -Tune parameters with cross validation. -Analyze the performance of the model. -Describe the notion of sparsity and how LASSO leads to sparse solutions. -Deploy methods to select between models. -Exploit the model to form predictions. -Build a regression model to predict prices using a housing dataset. -Implement these techniques in Python.

4.3 Machine Learning: Classification(机器学习:分类问题)

这门课程关注机器学习里面的另一个基本问题: 分类(Classification), 通过两个案例研究进行学习:情感分析和贷款违约预测,最终通过Python实现相关的算法(也可以选择其他语言,但是强烈推荐Python)。

Case Studies: Analyzing Sentiment & Loan Default Prediction In our case study on analyzing sentiment, you will create models that predict a class (positive/negative sentiment) from input features (text of the reviews, user profile information,…). In our second case study for this course, loan default prediction, you will tackle financial data, and predict when a loan is likely to be risky or safe for the bank. These tasks are an examples of classification, one of the most widely used areas of machine learning, with a broad array of applications, including ad targeting, spam detection, medical diagnosis and image classification. In this course, you will create classifiers that provide state-of-the-art performance on a variety of tasks. You will become familiar with the most successful techniques, which are most widely used in practice, including logistic regression, decision trees and boosting. In addition, you will be able to design and implement the underlying algorithms that can learn these models at scale, using stochastic gradient ascent. You will implement these technique on real-world, large-scale machine learning tasks. You will also address significant tasks you will face in real-world applications of ML, including handling missing data and measuring precision and recall to evaluate a classifier. This course is hands-on, action-packed, and full of visualizations and illustrations of how these techniques will behave on real data. We’ve also included optional content in every module, covering advanced topics for those who want to go even deeper! Learning Objectives: By the end of this course, you will be able to: -Describe the input and output of a classification model. -Tackle both binary and multiclass classification problems. -Implement a logistic regression model for large-scale classification. -Create a non-linear model using decision trees. -Improve the performance of any model using boosting. -Scale your methods with stochastic gradient ascent. -Describe the underlying decision boundaries. -Build a classification model to predict sentiment in a product review dataset. -Analyze financial data to predict loan defaults. -Use techniques for handling missing data. -Evaluate your models using precision-recall metrics. -Implement these techniques in Python (or in the language of your choice, though Python is highly recommended).

4.4 Machine Learning: Clustering & Retrieval(机器学习:聚类和检索)

这门课程关注的是机器学习里面的另外两个基本问题:聚类和检索,同样通过案例研究进行学习:相似文档查询,一个非常具有实际应用价值的问题:

Case Studies: Finding Similar Documents A reader is interested in a specific news article and you want to find similar articles to recommend. What is the right notion of similarity? Moreover, what if there are millions of other documents? Each time you want to a retrieve a new document, do you need to search through all other documents? How do you group similar documents together? How do you discover new, emerging topics that the documents cover? In this third case study, finding similar documents, you will examine similarity-based algorithms for retrieval. In this course, you will also examine structured representations for describing the documents in the corpus, including clustering and mixed membership models, such as latent Dirichlet allocation (LDA). You will implement expectation maximization (EM) to learn the document clusterings, and see how to scale the methods using MapReduce. Learning Outcomes: By the end of this course, you will be able to: -Create a document retrieval system using k-nearest neighbors. -Identify various similarity metrics for text data. -Reduce computations in k-nearest neighbor search by using KD-trees. -Produce approximate nearest neighbors using locality sensitive hashing. -Compare and contrast supervised and unsupervised learning tasks. -Cluster documents by topic using k-means. -Describe how to parallelize k-means using MapReduce. -Examine probabilistic clustering approaches using mixtures models. -Fit a mixture of Gaussian model using expectation maximization (EM). -Perform mixed membership modeling using latent Dirichlet allocation (LDA). -Describe the steps of a Gibbs sampler and how to use its output to draw inferences. -Compare and contrast initialization techniques for non-convex optimization objectives. -Implement these techniques in Python.

5. 密歇根大学的 Applied Machine Learning in Python(在Python中应用机器学习)

Python机器学习应用课程,这门课程主要聚焦在通过Python应用机器学习,包括机器学习和统计学的区别,机器学习工具包scikit-learn的介绍,有监督学习和无监督学习,数据泛化问题(例如交叉验证和过拟合)等。这门课程同时属于”Python数据科学应用专项课程系列(Applied Data Science with Python Specialization)“。

This course will introduce the learner to applied machine learning, focusing more on the techniques and methods than on the statistics behind these methods. The course will start with a discussion of how machine learning is different than descriptive statistics, and introduce the scikit learn toolkit. The issue of dimensionality of data will be discussed, and the task of clustering data, as well as evaluating those clusters, will be tackled. Supervised approaches for creating predictive models will be described, and learners will be able to apply the scikit learn predictive modelling methods while understanding process issues related to data generalizability (e.g. cross validation, overfitting). The course will end with a look at more advanced techniques, such as building ensembles, and practical limitations of predictive models. By the end of this course, students will be able to identify the difference between a supervised (classification) and unsupervised (clustering) technique, identify which technique they need to apply for a particular dataset and need, engineer features to meet that need, and write python code to carry out an analysis. This course should be taken after Introduction to Data Science in Python and Applied Plotting, Charting & Data Representation in Python and before Applied Text Mining in Python and Applied Social Analysis in Python.

6. 俄罗斯国立高等经济学院和Yandex联合推出的 高级机器学习专项课程系列(Advanced Machine Learning Specialization)

该系列授课语言为英语,包括深度学习,Kaggle数据科学竞赛,机器学习中的贝叶斯方法,强化学习,计算机视觉,自然语言处理等7门子课程,截止目前前3门课程已开,感兴趣的同学可以关注:

This specialization gives an introduction to deep learning, reinforcement learning, natural language understanding, computer vision and Bayesian methods. Top Kaggle machine learning practitioners and CERN scientists will share their experience of solving real-world problems and help you to fill the gaps between theory and practice. Upon completion of 7 courses you will be able to apply modern machine learning methods in enterprise and understand the caveats of real-world data and settings.

以下是和机器学习直接相关的子课程,其他这里略过:

6.3 Bayesian Methods for Machine Learning(面向机器学习的贝叶斯方法)

该课程关注机器学习中的贝叶斯方法,贝叶斯方法在很多领域都很有用,例如游戏开发和毒品发现。它们给很多机器学习算法赋予了“超能力”,例如处理缺失数据,从小数据集中提取大量有用的信息等。当贝叶斯方法被应用在深度学习中时,它可以让你将模型压缩100倍,并且自动帮你调参,节省你的时间和金钱。

Bayesian methods are used in lots of fields: from game development to drug discovery. They give superpowers to many machine learning algorithms: handling missing data, extracting much more information from small datasets. Bayesian methods also allow us to estimate uncertainty in predictions, which is a really desirable feature for fields like medicine. When Bayesian methods are applied to deep learning, it turns out that they allow you to compress your models 100 folds, and automatically tune hyperparametrs, saving your time and money. In six weeks we will discuss the basics of Bayesian methods: from how to define a probabilistic model to how to make predictions from it. We will see how one can fully automate this workflow and how to speed it up using some advanced techniques. We will also see applications of Bayesian methods to deep learning and how to generate new images with it. We will see how new drugs that cure severe diseases be found with Bayesian methods.

7. 约翰霍普金斯大学的 Practical Machine Learning(机器学习实战)

这门课程从数据科学的角度来应用机器学习进修实战,课程将会介绍机器学习的基础概念譬如训练集,测试集,过拟合和错误率等,同时这门课程也会介绍机器学习的基本模型和算法,例如回归,分类,朴素贝叶斯,以及随机森林。这门课程最终会覆盖一个完整的机器学习实战周期,包括数据采集,特征生成,机器学习算法应用以及结果评估等。这门机器学习实践课程同时属于约翰霍普金斯大学的 数据科学专项课程(Data Science Specialization)系列:

One of the most common tasks performed by data scientists and data analysts are prediction and machine learning. This course will cover the basic components of building and applying prediction functions with an emphasis on practical applications. The course will provide basic grounding in concepts such as training and tests sets, overfitting, and error rates. The course will also introduce a range of model based and algorithmic machine learning methods including regression, classification trees, Naive Bayes, and random forests. The course will cover the complete process of building prediction functions including data collection, feature creation, algorithms, and evaluation.

8. 卫斯理大学 Regression Modeling in Practice(回归模型实战)

这门课程关注的是数据分析以及机器学习领域的最重要的一个概念和工具:回归(模型)分析。这门课程使用SAS或者Python,从线性回归开始学习,到了解整个回归模型,以及应用回归模型进行数据分析:

This course focuses on one of the most important tools in your data analysis arsenal: regression analysis. Using either SAS or Python, you will begin with linear regression and then learn how to adapt when two variables do not present a clear linear relationship. You will examine multiple predictors of your outcome and be able to identify confounding variables, which can tell a more compelling story about your results. You will learn the assumptions underlying regression analysis, how to interpret regression coefficients, and how to use regression diagnostic plots and other tools to evaluate the quality of your regression model. Throughout the course, you will share with others the regression models you have developed and the stories they tell you.

这门课程同时属于卫斯理大学的 数据分析与解读专项课程系列(Data Analysis and Interpretation Specialization)

9. 卫斯理大学的 Machine Learning for Data Analysis(面向数据分析的机器学习)

这门课程关注数据分析里的机器学习,机器学习的过程是一个开发、测试和应用预测算法来实现目标的过程,这门课程以 Regression Modeling in Practice(回归模型实战) 为基础,介绍机器学习中的有监督学习概念,同时从基础的分类算法到决策树以及聚类都会覆盖。通过完成这门课程,你将会学习如何应用、测试和解读机器学习算法用来解决实际问题。

Are you interested in predicting future outcomes using your data? This course helps you do just that! Machine learning is the process of developing, testing, and applying predictive algorithms to achieve this goal. Make sure to familiarize yourself with course 3 of this specialization before diving into these machine learning concepts. Building on Course 3, which introduces students to integral supervised machine learning concepts, this course will provide an overview of many additional concepts, techniques, and algorithms in machine learning, from basic classification to decision trees and clustering. By completing this course, you will learn how to apply, test, and interpret machine learning algorithms as alternative methods for addressing your research questions.

这门课程同时属于卫斯理大学的 数据分析与解读专项课程系列(Data Analysis and Interpretation Specialization)

10. 加州大学圣地亚哥分校的 Machine Learning With Big Data(大数据机器学习)

这门课程关注大数据中的机器学习技术,将会介绍相关的机器学习算法和工具。通过这门课程,你可以学到:通过机器学习过程来设计和利用数据;将机器学习技术用于探索和准备数据来建模;识别机器学习问题的类型;通过广泛可用的开源工具来使用数据构建模型;在Spark中使用大规模机器学习算法分析大数据。

Want to make sense of the volumes of data you have collected? Need to incorporate data-driven decisions into your process? This course provides an overview of machine learning techniques to explore, analyze, and leverage data. You will be introduced to tools and algorithms you can use to create machine learning models that learn from data, and to scale those models up to big data problems. At the end of the course, you will be able to: • Design an approach to leverage data using the steps in the machine learning process. • Apply machine learning techniques to explore and prepare data for modeling. • Identify the type of machine learning problem in order to apply the appropriate set of techniques. • Construct models that learn from data using widely available open source tools. • Analyze big data problems using scalable machine learning algorithms on Spark.

这门课程同时属于 加州大学圣地亚哥分校的大数据专项课程系列(Big Data Specialization)

11. 俄罗斯搜索巨头Yandex推出的 Big Data Applications: Machine Learning at Scale(大数据应用:大规模机器学习)

机器学习正在改变世界,通过这门课程,你将会学习到:识别实战中需要用机器学习算法解决的问题;通过Spark MLLib构建、调参、和应用线性模型;里面文本处理的方法;用决策树和Boost方法解决机器学习问题;构建自己的推荐系统。

Machine learning is transforming the world around us. To become successful, you’d better know what kinds of problems can be solved with machine learning, and how they can be solved. Don’t know where to start? The answer is one button away. During this course you will: – Identify practical problems which can be solved with machine learning – Build, tune and apply linear models with Spark MLLib – Understand methods of text processing – Fit decision trees and boost them with ensemble learning – Construct your own recommender system. As a practical assignment, you will – build and apply linear models for classification and regression tasks; – learn how to work with texts; – automatically construct decision trees and improve their performance with ensemble learning; – finally, you will build your own recommender system! With these skills, you will be able to tackle many practical machine learning tasks. We provide the tools, you choose the place of application to make this world of machines more intelligent.

这门课程同时属于Yandex推出的 面向数据工程师的大数据专项课程系列(Big Data for Data Engineers Specialization)

注:原创文章,转载请注明出处“课程图谱博客”:http://blog.coursegraph.com

本文链接地址:http://blog.coursegraph.com/coursera上机器学习课程公开课汇总推荐 http://blog.coursegraph.com/?p=696