Tag Archive: Coursera Downloader


Coursera Course Download and Archive Plan, Part 2: A Coursera Course Cheat Sheet

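Using the CourseGraph (课程图谱) Coursera crawler, I went through the course data currently on Coursera's old platform and pulled out a few useful fields to make a quick-reference sheet for downloading Coursera courses. You can use coursera-dl together with a course's short link (session slug) to download it. For the download procedure, see the previous post: Coursera Course Download and Archive Plan (Part 1).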

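I have also created a new GitHub project, Coursera Archive, which provides the list in two versions: Markdown and Excel. I plan to add the related cloud-drive links there as well. Everyone is welcome to help download and share.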

There are 516 courses in total. The details are in the table in the full post.

Coursera Course Download and Archive Plan, Part 1: The Coursera Downloader Tool

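Last Wednesday I received a mass email from Coursera. The gist is that Coursera will shut down its old course platform for good on June 30 and move everything to the new platform. Some old course resources (videos and course materials) will no longer be kept. If you have taken any of these courses before, or there are courses you care about, Coursera recommends downloading the resources as a backup.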

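To be honest, since Coursera began its gradual "commercial upgrade" over the past year or two, I have rarely taken open courses on it. Some edX courses appeal to me more, especially in quality, where edX clearly puts in much more care. Still, Coursera was the first MOOC platform and produced many classic courses, and it would be a real pity if they were lost in the platform switch. I once compiled a collection of "downloadable open course resources", much of it contributed and shared by readers. That was two or three years ago, though, and some of the cloud-drive links have since expired. This email prompted me to start checking those cloud-drive resources, especially the ones from Coursera. For some courses I had never downloaded the materials, or no cloud-drive copy existed, because I assumed that with a Coursera account I could log in and watch them online at any time, so I never felt the need to download them. Now things are different. Courses such as Stanford's Natural Language Processing course by Dan Jurafsky and Christopher Manning, or the Machine Learning course by the renowned Pedro Domingos, which never actually ran but could be watched in preview mode, must be downloaded and backed up.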

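As the saying goes, to do a good job one must first sharpen one's tools. There are many download tools for Coursera, including some browser extensions, but the one I recommend is the Python tool Coursera Downloader, coursera-dl for short. I used it several years ago and was impressed. Picking it up again now, it is still very convenient and powerful. The simplest way to install it is "pip install coursera"; see the installation instructions in the project's GitHub repository. Below, using Mac OS as an example, I briefly walk through installing and using it inside a virtualenv. The same steps should work on Linux systems such as Ubuntu. I have not tested them on Windows.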

First get the code from GitHub. Either git clone or downloading the source zip file works:

git clone https://github.com/coursera-dl/coursera-dl

Cloning into 'coursera-dl'...
remote: Counting objects: 3357, done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 3357 (delta 6), reused 0 (delta 0), pack-reused 3343
Receiving objects: 100% (3357/3357), 1.39 MiB | 75 KiB/s, done.
Resolving deltas: 100% (1852/1852), done.

cd coursera-dl/

virtualenv my-coursera

New python executable in /Users/xxxxxx/project/mooc/test/coursera-dl/my-coursera/bin/python
Installing setuptools, pip, wheel...done.

source my-coursera/bin/activate

pip install -r requirements.txt

Collecting beautifulsoup4>=4.1.3 (from -r requirements.txt (line 1))
…..
Installing collected packages: beautifulsoup4, six, html5lib, requests, urllib3, pyasn1, keyring
Successfully installed beautifulsoup4-4.4.1 html5lib-1.0b8 keyring-9.0 pyasn1-0.1.9 requests-2.10.0 six-1.10.0 urllib3-1.16

Installation is complete. Here is the detailed usage of coursera-dl:

General: coursera-dl -u <user> -p <pass> modelthinking-004
Multiple classes: coursera-dl -u <user> -p <pass> saas historyofrock1-001 algo-2012-002
Filter by section name: coursera-dl -u <user> -p <pass> -sf "Chapter_Four" crypto-004
Filter by lecture name: coursera-dl -u <user> -p <pass> -lf "3.1_" ml-2012-002
Download only ppt files: coursera-dl -u <user> -p <pass> -f "ppt" qcomp-2012-001
Use a ~/.netrc file: coursera-dl -n -- matrix-001
Get the preview classes: coursera-dl -n -b ni-001
Specify download path: coursera-dl -n --path=C:\Coursera\Classes\ comnetworks-002
Display help: coursera-dl --help

Maintain a list of classes in a dir:
Initialize: mkdir -p CURRENT/{class1,class2,..classN}
Update: coursera-dl -n --path CURRENT `\ls CURRENT`
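The "Maintain a list of classes in a dir" recipe treats every sub-directory name under CURRENT as a session slug. Below is a minimal Python sketch of the same idea, for readers who would rather drive coursera-dl from a script. The function names `init_class_dirs` and `build_update_command` are hypothetical. The only coursera-dl flags used are `-n` and `--path`, both shown above.

```python
from pathlib import Path


def init_class_dirs(root, slugs):
    """Create one sub-directory per session slug, like `mkdir -p CURRENT/{a,b}`."""
    for slug in slugs:
        (Path(root) / slug).mkdir(parents=True, exist_ok=True)


def build_update_command(root):
    """Return the argv for `coursera-dl -n --path ROOT <each sub-directory name>`.

    Unlike the shell's `ls CURRENT`, only directories are listed, so stray
    files in ROOT are not mistaken for session slugs.
    """
    slugs = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    return ["coursera-dl", "-n", "--path", str(root), *slugs]
```

To run it, call `subprocess.run(build_update_command("CURRENT"), check=True)` after `init_class_dirs("CURRENT", [...])`. Each time you re-run it, any directories you have added since the last run are picked up.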

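Take the University of Michigan's "Introduction to Natural Language Processing" course on Coursera as an example. On the old course homepage, "Introduction to Natural Language Processing", you first need to join (Enroll in) one session of the course. So far it has only run once, from October to December 2015. After joining that session, go to the course detail page. The URL will look like this: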

https://class.coursera.org/nlpintro-001/lecture

What Coursera Downloader mainly needs is this session short link (slug), "nlpintro-001". With it you can try the download. Here --path sets the download directory:

coursera-dl -u <your-email> -p <your-password> --path=../../coursera_backup/ nlpintro-001
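The session slug is simply the first path segment after class.coursera.org. The Coursera email quoted at the end of this post also notes that old-platform URLs begin with class.coursera.org, while the new platform uses coursera.org/learn. Here is a small Python sketch that applies both rules. The helper name `session_slug` is hypothetical and is not part of coursera-dl.

```python
from urllib.parse import urlparse


def session_slug(url):
    """Return the session slug of an old-platform course URL, else None.

    Old-platform pages look like https://class.coursera.org/<slug>/lecture;
    new-platform pages (www.coursera.org/learn/...) have no session slug.
    """
    parsed = urlparse(url)
    if parsed.hostname != "class.coursera.org":
        return None  # not an old-platform URL
    segments = [s for s in parsed.path.split("/") if s]
    return segments[0] if segments else None
```

For example, `session_slug("https://class.coursera.org/nlpintro-001/lecture")` gives `"nlpintro-001"`, the value passed to coursera-dl above.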

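Then the download begins... Possibly because of the network, it sometimes breaks off or stalls and appears frozen. coursera-dl provides a "Resuming downloads" mode, similar to resuming an interrupted transfer, which is very useful. The following command resumes an interrupted download: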

coursera-dl -u <your-email> -p <your-password> --path=../../coursera_backup/ --resume nlpintro-001
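Because the download can stall, it may help to rerun the resume command automatically. The following Python sketch assumes that coursera-dl is on PATH and that an exit status of zero means the run finished. `build_command` and `download_with_retries` are hypothetical helpers. The only coursera-dl flags used are `-u`, `-p`, `-n`, `--path` and `--resume`, all shown in this post. The one-hour timeout used to treat a run as "frozen" is an arbitrary choice.

```python
import subprocess
import time


def build_command(slug, path, user=None, password=None, resume=False):
    """Assemble a coursera-dl argv from the flags used in this post.

    With user and password it passes -u/-p; otherwise -n (read ~/.netrc).
    """
    cmd = ["coursera-dl"]
    if user and password:
        cmd += ["-u", user, "-p", password]
    else:
        cmd.append("-n")
    cmd.append(f"--path={path}")
    if resume:
        cmd.append("--resume")
    cmd.append(slug)
    return cmd


def download_with_retries(slug, path, user=None, password=None,
                          max_attempts=5, timeout_seconds=3600,
                          wait_seconds=30, run=subprocess.run):
    """Run coursera-dl, rerunning it with --resume after a failure or stall.

    A run that exits non-zero, or that exceeds timeout_seconds (subprocess.run
    then kills it and raises TimeoutExpired), counts as a failed attempt.
    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        cmd = build_command(slug, path, user, password, resume=attempt > 1)
        try:
            if run(cmd, timeout=timeout_seconds).returncode == 0:
                return attempt
        except subprocess.TimeoutExpired:
            pass  # stalled run: retry below with --resume
        if attempt < max_attempts:
            time.sleep(wait_seconds)
    raise RuntimeError(f"{slug}: still failing after {max_attempts} attempts")
```

The `run` parameter defaults to `subprocess.run` but can be replaced by a stub for testing. A timeout may also cut off a slow but healthy download. That is tolerable here because the next attempt continues with `--resume`.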

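Joining the course and then downloading its materials is the more complete method: besides the videos, you also get the course slides and subtitles. If you have not joined a course, Coursera Downloader offers a way to download preview courses. This only gets the videos, and you still need a Coursera account. Take the renowned Pedro Domingos's Machine Learning course, which never ran but can be previewed, as an example. Click the "Preview lectures" button on the course homepage, Machine Learning, to get the preview link "https://class.coursera.org/machlearning-001/lecture/preview". Following the Coursera Downloader instructions, you first need to create a ~/.netrc file in your home directory, in this format: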

machine coursera-dl login <your-email> password <your-password>

Crucially, you need to restrict the permissions of ~/.netrc:

chmod og-rw ~/.netrc

Otherwise you will hit the following error (I have already fallen into this trap):

~/.netrc access too permissive: access permissions must restrict access to only the owner

After that, you can download the preview videos with the following command:

coursera-dl -n -b --path=../../coursera_backup/ machlearning-001
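The permission error above matches the wording of Python's standard-library netrc module. When that module reads ~/.netrc, it rejects the file if group or others hold any permission bit, including execute bits, so `chmod 600 ~/.netrc` is the strictest form of the fix. Below is a minimal sketch that writes the entry with mode 0600, checks the same condition, and reads the credentials back. The helper names are hypothetical. `write_netrc` overwrites the whole file, so point it at a throwaway path, or at a ~/.netrc with no other entries.

```python
import netrc
import os
import stat


def write_netrc(path, email, password, machine="coursera-dl"):
    """Write a single netrc entry that only the owner can read or write (0600).

    Overwrites any existing content of `path`.
    """
    line = f"machine {machine} login {email} password {password}\n"
    # The mode given to os.open only applies when the file is newly created,
    # so chmod afterwards as well in case the file already existed.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(line)
    os.chmod(path, 0o600)


def too_permissive(path):
    """True if group or others hold any permission bit (what netrc objects to)."""
    return bool(os.stat(path).st_mode & (stat.S_IRWXG | stat.S_IRWXO))


def read_credentials(path, machine="coursera-dl"):
    """Return (login, password) for `machine`, or None if there is no entry."""
    entry = netrc.netrc(path).authenticators(machine)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password
```

Once `too_permissive(os.path.expanduser("~/.netrc"))` returns False, the `coursera-dl -n ...` commands should no longer hit the permission error.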

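I hope everyone uses this tool, or another one, to save the Coursera courses they care about as soon as possible. If you can, upload them to a cloud drive and share them: it gives you a backup and makes the learning resources available to others. Below are the five Coursera open courses I have collected so far. Some course materials are still being downloaded and uploaded and will be published later.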

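1. Machine Learning by Andrew Ng

This course has already been published on Coursera's new platform (https://www.coursera.org/learn/machine-learning), so its online resources will be kept. The Baidu Pan share here contains two versions, both from readers' earlier shares:

Link: http://pan.baidu.com/s/1miMZHQo Password: aeck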

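2. Neural Networks for Machine Learning by Geoffrey Hinton

Geoffrey Hinton's course on Coursera ran only once, in 2012, and will probably not be migrated this time. Some reviews:

"A must-take course for deep learning"

"Taught directly by the founding master and pioneer himself; it blows away every second-rate course"

Given reviews like these, anyone interested in deep learning should save it quickly. This share contains two versions, both from readers' earlier shares:

Link: http://pan.baidu.com/s/1sk9cgK9 Password: ndm9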

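3. Professor Daphne Koller's "Probabilistic Graphical Models" course

This one will probably not be migrated either. Think how many top researchers used to teach on Coursera... This share comes from a friend who shared it earlier:

Link: http://pan.baidu.com/s/1kVpRMKn Password: 244s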

4. Michael Collins's "Natural Language Processing" course

A course from an NLP master, so it has to be backed up. Shared earlier by a friend:

Link: http://pan.baidu.com/s/1kV72IhT Password: fxjw

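5. Stanford's "Natural Language Processing" course by Dan Jurafsky and Christopher Manning

The instructors are Stanford professors Dan Jurafsky and Christopher Manning, both giants in the NLP field. To say nothing else, their books are introductory reading for many NLP practitioners: the former co-wrote Speech and Language Processing (《自然语言处理综论》), and the latter co-wrote Foundations of Statistical Natural Language Processing (《统计自然语言处理基础》).

I downloaded a copy with coursera-dl and uploaded it to Baidu Pan as a backup. Save it soon if you need it:

Link: http://pan.baidu.com/s/1hrGMbkg Password: a2w5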

The Coursera email, for reference:

Save course materials for some courses by June 30

Dear XXX,

We wanted to inform you of an update to our technology platform that will affect access to some courses you previously joined.

In 2014, Coursera began developing a new technology platform to improve your learning experience, and to allow courses to run more frequently. The majority of our courses are now offered on the new platform.
This month, we are closing the old platform. One or more courses you joined are on the old platform.
Effective June 30, 2016, courses on the old platform will no longer be available. You should use this opportunity to save any relevant course materials or assignments.

How does this affect my courses?

Any courses and course materials on our old platform will no longer be accessible after June 30. Until that date, we encourage you to save any content you need for personal use and reference.
Any courses on the new platform will not be affected by this change.

Will this affect earned Certificates?
All Statements of Accomplishment (SoA) and Verified Certificates will remain accessible in your Accomplishments page, as long as you do not unenroll from courses you have completed on the old platform. You are also welcome to download a copy for your records at any time. Statements and Certificates that you have shared to LinkedIn will also be maintained on your LinkedIn profile after June 30.

How do I know if a course is on the “old platform”?

If you aren’t sure which platform a course is on currently, navigate to the course and check the URL in the browser bar – courses on the old platform have URLs that begin with class.coursera.org (rather than the new platform, which uses the URL coursera.org/learn.)

How do I save course materials?

To save course materials from the old platform for reference:
• Download any lecture slides or videos that you would like to save for reference
• Save a record of your quizzes and other assignments by taking screenshots

More questions?

If you have a technical issue with your account, please visit our Help Center.
Thank you for being a part of our learning community, and for your patience and understanding through this product transition! We are excited to continue to improve the learning experience on Coursera, and we look forward to bringing you more great courses on the new platform.

Note: This is an original article. If you repost it, please credit the source "CourseGraph Blog" (课程图谱博客): http://blog.coursegraph.com

Permalink: http://blog.coursegraph.com/coursera课程下载和存档计划一