<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>r | Dr. Hui Lin</title><link>https://www.huilin.site/tag/r/</link><atom:link href="https://www.huilin.site/tag/r/index.xml" rel="self" type="application/rss+xml"/><description>r</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><image><url>https://www.huilin.site/media/icon_hu849bfb60811b9f998366b9def6f35d6e_33320_512x512_fill_lanczos_center_3.png</url><title>r</title><link>https://www.huilin.site/tag/r/</link></image><item><title>Learning ggmap</title><link>https://www.huilin.site/post/2017-03-31-learning-ggmaps-package/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.huilin.site/post/2017-03-31-learning-ggmaps-package/</guid><description>&lt;p>These are my own notes for reviewing the ggmap package and getting started with it quickly.&lt;/p>
&lt;p>ggmap is a powerful map-plotting package for R. Compared with the maps package, the plots it produces are more elegant.&lt;/p>
&lt;p>Plotting a map takes two steps: download a base map raster, then decorate the base map with your own data.&lt;/p>
&lt;h3 id="step-1-download-your-base-map">Step 1: download your base map&lt;/h3>
&lt;p>First decide which location you want to plot. You can define a location in either of the two ways shown below:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">library(ggmap)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Location1 &amp;lt;- &amp;#34;University of Wisconsin, Milwaukee&amp;#34; # define the location by its address
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Location2 &amp;lt;- c(lon = -95.3632715, lat = 29.7632836) # or by longitude and latitude (a numeric vector, not a string)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Use the &lt;code>get_map&lt;/code> function to download the raster map for your location.
There are three map “sources” from which to obtain a map raster, and each source offers multiple “map types”:&lt;/p>
&lt;blockquote>
&lt;p>stamen: “watercolor”, “toner”, &amp;ldquo;terrain&amp;rdquo;&lt;/p>
&lt;/blockquote>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://www.huilin.site/image/stamen.jpg" alt="stamen" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;blockquote>
&lt;p>googlemap:&lt;/p>
&lt;/blockquote>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="googlemap" srcset="
/media/image/googlemap_huc4bda9b1694f7f0bbbdd37e40951441f_70802_d2db8c23fb92364459e52bea05a23e0e.webp 400w,
/media/image/googlemap_huc4bda9b1694f7f0bbbdd37e40951441f_70802_e8e865b4a09150fb49e4a7c952a591fc.webp 760w,
/media/image/googlemap_huc4bda9b1694f7f0bbbdd37e40951441f_70802_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://www.huilin.site/media/image/googlemap_huc4bda9b1694f7f0bbbdd37e40951441f_70802_d2db8c23fb92364459e52bea05a23e0e.webp"
width="760"
height="224"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;blockquote>
&lt;p>osm:(sometimes their servers are unavailable)&lt;/p>
&lt;/blockquote>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://www.huilin.site/image/osm.jpg" alt="osm" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">myMap &amp;lt;- get_map(location=Location1, source=&amp;#34;stamen&amp;#34;, maptype=&amp;#34;watercolor&amp;#34;, crop=FALSE) # Location2 works as well
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">ggmap(myMap)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">## zoom = integer from 3 to 21
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">## 3 = continent, 10 = city, 21 = building
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">## (OpenStreetMap has a limit of 18)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="step-2-decorate-your-map-with-data">Step 2: decorate your map with data&lt;/h3>
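&lt;p>As a rough sketch of this step, a point layer can be built from your own data and then added to the base map. The &lt;code>sites&lt;/code> data frame, its coordinates and the name &lt;code>site_layer&lt;/code> are made up for illustration; the only assumptions are that your data has longitude and latitude columns and that a base map object such as &lt;code>myMap&lt;/code> has already been downloaded.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">library(ggplot2)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># Hypothetical points of interest; lon and lat are in decimal degrees
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">sites &amp;lt;- data.frame(
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  name = c(&amp;#34;Site A&amp;#34;, &amp;#34;Site B&amp;#34;),
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  lon  = c(-87.885, -87.920),
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  lat  = c(43.076, 43.040)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># A point layer that can be added to any base map
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">site_layer &amp;lt;- geom_point(
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  data = sites, aes(x = lon, y = lat),
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  colour = &amp;#34;red&amp;#34;, size = 3
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># With a base map from Step 1 in hand, the overlay would be:
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># ggmap(myMap) + site_layer
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The layer is built separately so that the example runs without downloading any map tiles; the final line stays commented out for that reason.&lt;/p>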
&lt;hr>
&lt;p>In addition, one R package still under development is worth keeping an eye on: &lt;a href="http://rmaps.github.io/" target="_blank" rel="noopener">rMaps&lt;/a>.&lt;/p>
&lt;h3 id="references">References:&lt;/h3>
&lt;p>&lt;a href="https://github.com/dkahle/ggmap" target="_blank" rel="noopener">ggmap GitHub repository&lt;/a>&lt;/p>
&lt;p>&lt;a href="https://cran.r-project.org/web/packages/ggmap/ggmap.pdf" target="_blank" rel="noopener">ggmap documentation&lt;/a>&lt;/p>
&lt;p>&lt;a href="https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/ggmap/ggmapCheatsheet.pdf" target="_blank" rel="noopener">ggmap quick start&lt;/a>&lt;/p>
&lt;p>&lt;a href="https://dl.dropboxusercontent.com/u/24648660/ggmap%20useR%202012.pdf" target="_blank" rel="noopener">ggmap Introduction&lt;/a>&lt;/p></description></item><item><title>Learning ggplot2</title><link>https://www.huilin.site/post/2017-03-31-learning-ggplot2/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.huilin.site/post/2017-03-31-learning-ggplot2/</guid><description>&lt;p>This course introduces three plotting tools in R: &lt;code>plot&lt;/code>, &lt;code>qplot&lt;/code> and &lt;code>ggplot&lt;/code>. The three differ in both capability and syntax.
&lt;code>plot&lt;/code> is the plotting command built into R; its output is plain, which suits quick plots during data analysis.
&lt;code>qplot&lt;/code> is the entry-level plotting function of the ggplot2 package; it produces simple, clean and attractive plots of publication quality.&lt;/p>
&lt;h3 id="lesson-regression-models-introduction">Lesson: Regression Models Introduction&lt;/h3>
&lt;p>Plotting: &lt;code>plot(jitter(child,4) ~ parent, galton)&lt;/code>&lt;/p>
&lt;p>Fitting the regression model: use the function &lt;code>lm&lt;/code> (linear model), for example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">regrline &amp;lt;- lm(child~parent, galton)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Once the regression model is fitted, use the &lt;code>abline&lt;/code> function (add straight lines to a plot) to draw the regression line on the plot, for example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">abline(regrline, lwd = 3, col = &amp;#34;red&amp;#34;)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">#lwd = line width, col = line color
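&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>After drawing the line, use the &lt;code>summary&lt;/code> function to inspect the statistics of the fitted model, including the residuals, the coefficients, R-squared and so on.&lt;/p>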
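&lt;p>The whole fit, draw and inspect sequence might look like the sketch below. It assumes the built-in &lt;code>cars&lt;/code> data set as a stand-in for &lt;code>galton&lt;/code>, which needs an extra package; the names &lt;code>fit&lt;/code> and &lt;code>fit_summary&lt;/code> are illustrative.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl"># Fit a simple linear regression on the built-in cars data set
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">fit &amp;lt;- lm(dist ~ speed, data = cars)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># Draw the scatter plot and overlay the fitted line
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">plot(dist ~ speed, data = cars)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">abline(fit, lwd = 3, col = &amp;#34;red&amp;#34;)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># Pull individual pieces out of the model summary
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">fit_summary &amp;lt;- summary(fit)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">coef(fit)              # intercept and slope
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">fit_summary$r.squared  # proportion of variance explained
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">head(residuals(fit))   # first few residuals
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>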
&lt;hr>
&lt;h3 id="qplot">qplot&lt;/h3>
&lt;p>&lt;code>qplot&lt;/code> is a basic function in the ggplot2 package. It provides a few basic plot types (e.g. points, smooth, boxplot) that give users a quick overview of their data.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">qplot(hwy, displ, data = mpg)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>qplot has many adjustable parameters, as shown below:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">qplot(x, y, data=, color=, shape=, size=, alpha=, geom=, method=, formula=, facets=, xlim=, ylim=, xlab=, ylab=, main=, sub=)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The parameters x, y, data, color, shape and size are easy to understand. Different variables can be mapped to them to vary the plot.&lt;/p>
&lt;p>alpha adjusts transparency: 0 is fully transparent and 1 is fully opaque.&lt;/p>
&lt;p>geom sets the plot type and takes the following options: &amp;quot;point&amp;quot;, &amp;quot;smooth&amp;quot;, &amp;quot;boxplot&amp;quot;, &amp;quot;line&amp;quot;, &amp;quot;histogram&amp;quot;, &amp;quot;density&amp;quot;, &amp;quot;bar&amp;quot;, &amp;quot;jitter&amp;quot;.&lt;/p>
&lt;p>point draws a scatter plot.&lt;/p>
&lt;p>smooth draws a fitted curve.&lt;/p>
&lt;p>&lt;a href="http://docs.ggplot2.org/0.9.3.1/geom_boxplot.html" target="_blank" rel="noopener">boxplot&lt;/a> draws a box-and-whisker plot. You do not supply the maximum, minimum and middle values yourself; instead, use fill to define the groups and the statistics are computed automatically.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">qplot(year,averTLO, data = gt_sum, xlab = &amp;#34;Year&amp;#34;, ylab = &amp;#34;Land&amp;amp;Ocean Avg Temp&amp;#34;, geom = c(&amp;#34;boxplot&amp;#34;,&amp;#34;jitter&amp;#34;),fill=decade)
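&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="/image/Land&amp;amp;Ocean%20Avg%20Temp%20vs%20year_boxplot_jitter.jpeg" alt="boxplot" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>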
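&lt;p>line draws a line chart.&lt;/p>
&lt;p>&lt;a href="http://docs.ggplot2.org/0.9.3.1/geom_histogram.html" target="_blank" rel="noopener">histogram&lt;/a> draws a histogram of a single variable: the vertical axis is the count and the horizontal axis is the variable itself.&lt;/p>
&lt;p>&lt;a href="http://docs.ggplot2.org/0.9.3.1/geom_density.html" target="_blank" rel="noopener">density&lt;/a> likewise applies to a single variable and draws its density distribution: the vertical axis is the density and the horizontal axis is the variable itself.&lt;/p>
&lt;p>&lt;a href="http://docs.ggplot2.org/0.9.3.1/geom_bar.html" target="_blank" rel="noopener">bar&lt;/a> draws a conventional bar chart. With two variables defined, the fill variable can be used to produce many variations:&lt;/p>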
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">qplot(factor(cyl), data=mtcars, geom=&amp;#34;bar&amp;#34;, fill=factor(gear))
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="http://docs.ggplot2.org/0.9.3.1/geom_bar-18.png" alt="qplot" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>jitter adds small random offsets to the point positions so that overlapping points no longer hide each other. For example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">p &amp;lt;- ggplot(mpg, aes(displ, hwy))
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">p + geom_point()
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># the scatter plot without jitter
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="http://docs.ggplot2.org/0.9.3.1/geom_jitter-2.png" alt="points without jitter" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">p + geom_point(position = &amp;#34;jitter&amp;#34;)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># the same plot with jitter applied
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The method and formula options exist for the smooth geom. When smooth is used, the default fitting method is loess. Other fitting methods can be requested as well, such as &amp;quot;lm&amp;quot; (linear fit), &amp;quot;gam&amp;quot; (generalized additive models) and &amp;quot;rlm&amp;quot; (robust regression).&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">For example, to add simple linear regression lines, you&amp;#39;d specify geom=&amp;#34;smooth&amp;#34;, method=&amp;#34;lm&amp;#34;, formula=y~x. Changing the formula to y~poly(x,2) would produce a quadratic fit. Note that the formula uses the letters x and y, not the names of the variables.
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">For method=&amp;#34;gam&amp;#34;, be sure to load the mgcv package. For method=&amp;#34;rlm&amp;#34;, load the MASS package.
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Quoted from &lt;a href="http://www.statmethods.net/advgraphs/ggplot2.html" target="_blank" rel="noopener">Quick-R&lt;/a>&lt;/p>
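&lt;p>A minimal sketch of the linear and quadratic fits described above, written with &lt;code>ggplot()&lt;/code> and &lt;code>geom_smooth()&lt;/code> rather than &lt;code>qplot&lt;/code> because recent ggplot2 releases deprecate &lt;code>qplot&lt;/code>. The choice of the &lt;code>mpg&lt;/code> columns and the object names are assumptions made for illustration.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">library(ggplot2)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># Scatter plot of engine displacement against highway mileage
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">p &amp;lt;- ggplot(mpg, aes(displ, hwy)) + geom_point()
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># Straight-line fit; the formula is written in terms of x and y, not column names
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">p_linear &amp;lt;- p + geom_smooth(method = &amp;#34;lm&amp;#34;, formula = y ~ x)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># Quadratic fit, obtained by changing only the formula
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">p_quadratic &amp;lt;- p + geom_smooth(method = &amp;#34;lm&amp;#34;, formula = y ~ poly(x, 2))
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">print(p_linear)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">print(p_quadratic)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>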
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="http://docs.ggplot2.org/0.9.3.1/geom_jitter-4.png" alt="jitter" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
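&lt;p>The facets option splits the plot into panels according to one or two variables. It is written as &lt;code>facets=rowvar~colvar&lt;/code>; if rowvar or colvar is not needed, put “.” in its place, for example &lt;code>facets = .~colvar&lt;/code>.&lt;/p>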
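&lt;p>The same row and column notation carries over to &lt;code>facet_grid()&lt;/code> in layered ggplot code. The sketch below assumes the &lt;code>mpg&lt;/code> data set, with &lt;code>cyl&lt;/code> and &lt;code>drv&lt;/code> picked arbitrarily as faceting variables; the object names are illustrative.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">library(ggplot2)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">base_plot &amp;lt;- ggplot(mpg, aes(displ, hwy)) + geom_point()
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># One column of panels per value of cyl (no row variable, hence the &amp;#34;.&amp;#34;)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">by_column &amp;lt;- base_plot + facet_grid(. ~ cyl)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># One row of panels per value of drv (no column variable)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">by_row &amp;lt;- base_plot + facet_grid(drv ~ .)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># Rows by drv and columns by cyl
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">by_both &amp;lt;- base_plot + facet_grid(drv ~ cyl)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">print(by_both)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>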
&lt;p>Use coord_flip() to flip the chart by swapping the x and y axes.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">ggplot(diamonds, aes(color, fill=cut)) + geom_bar() + coord_flip()
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="http://docs.ggplot2.org/0.9.3.1/geom_bar-22.png" alt="ggplot_bar" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>To overlay several plot types, combine them with &lt;code>c()&lt;/code>, e.g. &lt;code>c(&amp;quot;point&amp;quot;, &amp;quot;smooth&amp;quot;)&lt;/code>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">qplot(year,averT, data = gt_sum, xlab = &amp;#34;Year&amp;#34;, ylab = &amp;#34;Land Avg Temp&amp;#34;, geom = c(&amp;#34;point&amp;#34;,&amp;#34;smooth&amp;#34;))
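&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="/image/land%20avg%20temp%20vs%20year_point_smooth.jpeg" alt="combine" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>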
&lt;p>xlim, ylim, xlab and ylab are self-explanatory.
main and sub set the main title and the subtitle.&lt;/p>
&lt;h3 id="ggplot2">ggplot2&lt;/h3>
&lt;p>&lt;code>ggplot&lt;/code> builds a plot layer by layer: you add whichever layers you need, which makes it well suited to large, complex figures.
Below is an example of the &lt;code>ggplot&lt;/code> syntax, where &lt;code>mpg&lt;/code> is a data set bundled with ggplot2 that describes car makes and their properties. &lt;code>hwy&lt;/code> and &lt;code>displ&lt;/code> are fields in &lt;code>mpg&lt;/code> giving the highway miles per gallon and the engine displacement respectively.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">g &amp;lt;- ggplot(mpg, aes(hwy, displ))
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">g + geom_point() + geom_smooth()
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>To export a data frame as a table:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">write.table(dataframe, &amp;#34;pathway&amp;#34;)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Learning R in Kaggle</title><link>https://www.huilin.site/post/2017-03-22-learning-r-in-kaggle/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.huilin.site/post/2017-03-22-learning-r-in-kaggle/</guid><description>&lt;p>I&amp;rsquo;m learning data analysis and exploration in R by following a tutorial posted on &lt;a href="https://www.kaggle.com/mrisdal/titanic/exploring-survival-on-the-titanic" target="_blank" rel="noopener">Kaggle&lt;/a>.&lt;/p>
&lt;p>Here are some statements I found useful.&lt;/p>
&lt;p>&lt;code>train &amp;lt;- read.csv('../input/train.csv', stringsAsFactors = F)&lt;/code>&lt;/p>
&lt;p>This is how to read CSV files. You can also use &lt;code>read.delim()&lt;/code> or &lt;code>read.delim2()&lt;/code> to read tab-delimited text files. See the &lt;a href="https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html" target="_blank" rel="noopener">read.table documentation&lt;/a>.&lt;/p>
&lt;p>&lt;code>stringsAsFactors&lt;/code> is a useful argument. Setting it to &lt;code>F&lt;/code> keeps character columns as plain strings instead of converting them to factors.&lt;/p>
&lt;p>Use &lt;code>str(dataframe)&lt;/code> to check the data. If you use RStudio, you can also use &lt;code>View(dataframe)&lt;/code> to browse it.&lt;/p>
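&lt;p>To see what &lt;code>stringsAsFactors&lt;/code> changes, here is a small sketch that reads the same made-up CSV text twice. The text and the names &lt;code>csv_text&lt;/code>, &lt;code>as_strings&lt;/code> and &lt;code>as_factors&lt;/code> are illustrative and not part of the Kaggle tutorial.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl"># A tiny CSV supplied inline so that no file is needed
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">csv_text &amp;lt;- &amp;#34;name,age\nAnn,30\nBob,25&amp;#34;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">as_strings &amp;lt;- read.csv(text = csv_text, stringsAsFactors = FALSE)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">as_factors &amp;lt;- read.csv(text = csv_text, stringsAsFactors = TRUE)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">class(as_strings$name)  # &amp;#34;character&amp;#34;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">class(as_factors$name)  # &amp;#34;factor&amp;#34;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># str() prints the column types and the first few values
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">str(as_strings)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>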
&lt;p>To inspect a data frame, we can also use the &lt;code>tbl_df&lt;/code> function from dplyr.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">full_df &amp;lt;- tbl_df(full)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">full_df
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Learning R using Swirl</title><link>https://www.huilin.site/post/2017-02-20-learning-r-using-swirl/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.huilin.site/post/2017-02-20-learning-r-using-swirl/</guid><description>&lt;h1 id="用swirl学习r语言learn-r-in-r">Learning R with Swirl: Learn R, in R&lt;/h1>
&lt;blockquote>
&lt;p>How to import data (read function)
How to manipulate data with dplyr&lt;/p>
&lt;/blockquote>
&lt;h3 id="lesson-getting-and-cleaning-data">Lesson: Getting and Cleaning Data&lt;/h3>
&lt;p>We can use the &lt;code>read.csv&lt;/code> function to import data. See &lt;code>?read.csv&lt;/code> for details. For example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">#set a path to csv file
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">path2csv &amp;lt;-&amp;#34;E:/R-3.3.2/library/swirl/Courses/Getting_and_Cleaning_Data/Manipulating_Data_with_dplyr/2014-07-08.csv&amp;#34;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">#use read.csv to read the csv file
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">mydf&amp;lt;-read.csv(path2csv,stringsAsFactors = FALSE)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">#stringsAsFactors: logical: should character vectors be converted to factors?
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Use &lt;code>dim()&lt;/code> to check the number of rows and columns in the data:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">dim(mydf)
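&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>To work on the data with the dplyr package, first call &lt;code>library(dplyr)&lt;/code>, then use the &lt;code>tbl_df()&lt;/code> (tibble) function to load the data frame. The lesson stresses this step before moving on to the functions below.
Use &lt;code>rm(&amp;quot;what_you_want_to_delete&amp;quot;)&lt;/code> (remove) to delete the original data frame.&lt;/p>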
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">library(dplyr)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">cran&amp;lt;-tbl_df(mydf)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">rm(&amp;#34;mydf&amp;#34;)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The five most basic and most frequently used functions are &lt;code>select()&lt;/code>, &lt;code>filter()&lt;/code>, &lt;code>arrange()&lt;/code>, &lt;code>mutate()&lt;/code> and &lt;code>summarize()&lt;/code>.&lt;/p>
&lt;hr>
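&lt;p>The &lt;code>select&lt;/code> function picks any columns of the data frame without needing the &lt;code>$&lt;/code> operator. Use &lt;code>:&lt;/code> to select a run of consecutive columns and &lt;code>-&lt;/code> to drop columns you do not need, for example:&lt;/p>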
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">select(cran,country:r_arch)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">select(cran, -(time:size))
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;p>The &lt;code>filter&lt;/code> function selects the rows that satisfy the given conditions:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="p">&amp;gt;&lt;/span> &lt;span class="nf">filter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">cran&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kn">package&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s">&amp;#34;swirl&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">&amp;gt;&lt;/span> &lt;span class="nf">filter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">cran&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="nx">country&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s">&amp;#34;IN&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">r_version&lt;/span> &lt;span class="o">&amp;lt;=&lt;/span> &lt;span class="s">&amp;#34;3.0.2&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="err">#&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="nx">AND&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">&amp;gt;&lt;/span> &lt;span class="nf">filter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">cran&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nx">country&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s">&amp;#34;US&amp;#34;&lt;/span> &lt;span class="p">|&lt;/span> &lt;span class="nx">country&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s">&amp;#34;IN&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="err">#&lt;/span>&lt;span class="p">|&lt;/span>&lt;span class="k">for&lt;/span> &lt;span class="nx">OR&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">&amp;gt;&lt;/span> &lt;span class="nf">filter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">cran&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="p">!&lt;/span>&lt;span class="nx">is&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="nf">na&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">r_version&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="err">#&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>is.na() returns TRUE if a value is missing and FALSE otherwise.&lt;/p>
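&lt;p>The three kinds of condition can be tried on a small data frame. The &lt;code>downloads&lt;/code> table below is a made-up stand-in for the lesson&amp;rsquo;s &lt;code>cran&lt;/code> log, so its values and the result names are illustrative only.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">library(dplyr)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># Hypothetical stand-in for the cran download log used in the lesson
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">downloads &amp;lt;- data.frame(
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  package   = c(&amp;#34;swirl&amp;#34;, &amp;#34;dplyr&amp;#34;, &amp;#34;swirl&amp;#34;, &amp;#34;ggplot2&amp;#34;),
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  country   = c(&amp;#34;US&amp;#34;, &amp;#34;IN&amp;#34;, &amp;#34;IN&amp;#34;, &amp;#34;DE&amp;#34;),
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  r_version = c(&amp;#34;3.1.0&amp;#34;, &amp;#34;3.0.2&amp;#34;, NA, &amp;#34;3.1.1&amp;#34;),
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  stringsAsFactors = FALSE
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># Comma-separated conditions must all hold (AND)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">swirl_in_india &amp;lt;- filter(downloads, package == &amp;#34;swirl&amp;#34;, country == &amp;#34;IN&amp;#34;)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># | keeps rows where either condition holds (OR)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">us_or_india &amp;lt;- filter(downloads, country == &amp;#34;US&amp;#34; | country == &amp;#34;IN&amp;#34;)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># Drop the rows whose r_version is missing
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">known_version &amp;lt;- filter(downloads, !is.na(r_version))
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">nrow(swirl_in_india)  # 1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">nrow(us_or_india)     # 3
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">nrow(known_version)   # 3
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>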
&lt;hr>
&lt;p>The &lt;code>arrange&lt;/code> function reorders the rows by the columns you specify.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">arrange(cran, ip_id)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">arrange(cran, desc(ip_id))
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;p>The &lt;code>mutate&lt;/code> function adds a new column derived from existing ones.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">mutate(cran3, correct_size = size + 1000)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">mutate(cran3, temperature = mean(AvgTemp, na.rm = TRUE)) # mean of AvgTemp, ignoring missing values
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">mutate(cran3, decade = trunc(year/10)) # trunc() drops the fractional part; related functions are floor(), round() and signif()
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h3 id="lesson2-grouping-and-chaining-with-dplyr">Lesson 2: Grouping and Chaining with dplyr&lt;/h3>
&lt;p>&lt;code>summarize&lt;/code> is a very important function. For example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-go" data-lang="go">&lt;span class="line">&lt;span class="cl">&lt;span class="nx">by_package&lt;/span> &lt;span class="o">&amp;lt;-&lt;/span> &lt;span class="nf">group_by&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">cran&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kn">package&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nx">pack_sum&lt;/span> &lt;span class="o">&amp;lt;-&lt;/span> &lt;span class="nf">summarize&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">by_package&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">count&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nf">n&lt;/span>&lt;span class="p">(),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">unique&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nf">n_distinct&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">ip_id&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">countries&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nf">n_distinct&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">country&lt;/span>&lt;span class="p">),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nx">avg_bytes&lt;/span> &lt;span class="p">=&lt;/span> &lt;span class="nf">mean&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nx">size&lt;/span>&lt;span class="p">))&lt;/span>
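&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>n() takes no arguments and returns the number of rows in the current group. n_distinct() returns the number of distinct values in the field given in the parentheses; it is a faster, more concise version of &lt;code>length(unique(x))&lt;/code>.&lt;/p>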
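&lt;p>The difference between the two counters is easiest to see on a handful of rows. The sketch below uses a made-up &lt;code>downloads&lt;/code> table in place of the lesson&amp;rsquo;s &lt;code>cran&lt;/code> data; the values are illustrative only.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">library(dplyr)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># Hypothetical stand-in for the cran download log
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">downloads &amp;lt;- data.frame(
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  package = c(&amp;#34;swirl&amp;#34;, &amp;#34;swirl&amp;#34;, &amp;#34;dplyr&amp;#34;, &amp;#34;dplyr&amp;#34;, &amp;#34;dplyr&amp;#34;),
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  ip_id   = c(1, 1, 2, 3, 3),
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  size    = c(100, 300, 200, 400, 600)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">pack_sum &amp;lt;- summarize(
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  group_by(downloads, package),
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  count     = n(),               # rows in each group
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  unique    = n_distinct(ip_id), # distinct ip_id values in each group
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  avg_bytes = mean(size)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">pack_sum
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Here dplyr has three rows but only two distinct &lt;code>ip_id&lt;/code> values, so &lt;code>count&lt;/code> is 3 while &lt;code>unique&lt;/code> is 2.&lt;/p>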
&lt;p>When functions need to be applied one after another, chain them together with the &lt;code>%&amp;gt;%&lt;/code> operator. For example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">cran %&amp;gt;%
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> select(ip_id, country, package, size) %&amp;gt;%
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> mutate(size_mb = size / 2^20) %&amp;gt;%
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> filter(size_mb &amp;lt;= 0.5) %&amp;gt;%
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> arrange(desc(size_mb))
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>In the example above, &lt;code>select&lt;/code> operates on the &lt;code>cran&lt;/code> data frame, &lt;code>mutate&lt;/code> operates on the data frame returned by &lt;code>select&lt;/code>, and so on. The &lt;code>%&amp;gt;%&lt;/code> operator at the end of each line is what passes each result on to the next function.&lt;/p></description></item><item><title>R basics</title><link>https://www.huilin.site/post/2017-02-19-r-basic/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.huilin.site/post/2017-02-19-r-basic/</guid><description>&lt;h1 id="r语言学习笔记">R Language Study Notes&lt;/h1>
&lt;hr>
&lt;p>These are some reminders about R syntax.
&lt;a href="https://campus.datacamp.com/" target="_blank" rel="noopener">Learning Website&lt;/a>&lt;/p>
&lt;h3 id="unit-1-assignment-and-basic-calculation">Unit 1 Assignment and basic calculation&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">myapples = 3
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>or&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">myapples &amp;lt;- 3
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>+&lt;/code>, &lt;code>-&lt;/code>, &lt;code>*&lt;/code>, &lt;code>/&lt;/code>, &lt;code>^&lt;/code>&lt;/p>
&lt;p>&lt;code>%%&lt;/code> gives the remainder of a division (modulo).&lt;/p>
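&lt;p>A few quick checks of these operators, together with &lt;code>%/%&lt;/code> (integer division), which pairs naturally with &lt;code>%%&lt;/code>. The helper &lt;code>is_even&lt;/code> is a made-up name for illustration.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">17 %% 5    # 2, the remainder
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">17 %/% 5   # 3, the integer quotient
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">2 ^ 10     # 1024
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># %% is handy for testing divisibility
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">is_even &amp;lt;- function(n) n %% 2 == 0
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">is_even(c(3, 4))  # FALSE TRUE
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>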
&lt;hr>
&lt;h3 id="unit-2-vectors">Unit 2 Vectors&lt;/h3>
&lt;p>combine function: &lt;code>c()&lt;/code>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">numeric_vector = c(1,2,3) #Or c(1:3)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>sum()&lt;/code> calculates the sum of all elements of a vector.
&lt;code>mean()&lt;/code> calculates the average of all elements of a vector.
Selection by comparison uses the logical comparison operators &lt;code>&amp;lt;&lt;/code>, &lt;code>&amp;gt;&lt;/code>, &lt;code>&amp;lt;=&lt;/code>, &lt;code>&amp;gt;=&lt;/code>, &lt;code>==&lt;/code>, &lt;code>!=&lt;/code>. A comparison returns a logical vector, which can then be used as an index:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">selection_vector &amp;lt;- poker_vector &amp;gt; 0 # TRUE where the value is positive
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">poker_vector[selection_vector]
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h3 id="unit-3-matrices">Unit 3 Matrices&lt;/h3>
&lt;p>Useful functions: &lt;code>matrix()&lt;/code>, &lt;code>colnames()&lt;/code>, &lt;code>rownames()&lt;/code>, &lt;code>rbind()&lt;/code>, &lt;code>cbind()&lt;/code>, &lt;code>rowSums()&lt;/code>, &lt;code>colSums()&lt;/code>, e.g.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">&amp;gt;matrix(1:9,byrow =TRUE, nrow = 3)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> [,1] [,2] [,3]
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">[1,] 1 2 3
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">[2,] 4 5 6
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">[3,] 7 8 9
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;gt;new_hope &amp;lt;- c(460.998, 314.4)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;gt;empire_strikes &amp;lt;- c(290.475, 247.900)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;gt;return_jedi &amp;lt;- c(309.306, 165.8)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;gt;star_wars_matrix &amp;lt;- matrix(c(new_hope, empire_strikes, return_jedi), nrow = 3, byrow = TRUE)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;gt;region &amp;lt;- c(&amp;#34;US&amp;#34;, &amp;#34;non-US&amp;#34;)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;gt;titles &amp;lt;- c(&amp;#34;A New Hope&amp;#34;, &amp;#34;The Empire Strikes Back&amp;#34;, &amp;#34;Return of the Jedi&amp;#34;)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">#Usage of colnames and rownames
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;gt;colnames(star_wars_matrix)&amp;lt;-region
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;gt;rownames(star_wars_matrix)&amp;lt;-titles
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> US non-US
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">A New Hope 460.998 314.4
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">The Empire Strikes Back 290.475 247.9
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Return of the Jedi 309.306 165.8
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># Usage of cbind and rbind
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">big_matrix &amp;lt;- cbind(matrix1, matrix2, vector1 ...)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">big_matrix = rbind(matrix1, ...)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;blockquote>
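&lt;p>Note the difference from MATLAB syntax here. MATLAB indexes a matrix with parentheses and uses &lt;code>:&lt;/code> to select an entire row or column, e.g. &lt;code>school(1,:)&lt;/code>. R indexes with &lt;code>[]&lt;/code> and simply leaves the position empty, e.g. &lt;code>school[1,]&lt;/code>.&lt;/p>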
&lt;/blockquote>
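&lt;p>A short sketch of row and column selection on the Star Wars matrix from this unit. The figures are the ones listed above; passing the names through &lt;code>dimnames&lt;/code> is just a compact alternative to calling &lt;code>rownames()&lt;/code> and &lt;code>colnames()&lt;/code> afterwards.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl"># Box-office figures from the example above (US, non-US)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">star_wars_matrix &amp;lt;- matrix(
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8),
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  nrow = 3, byrow = TRUE,
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  dimnames = list(
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    c(&amp;#34;A New Hope&amp;#34;, &amp;#34;The Empire Strikes Back&amp;#34;, &amp;#34;Return of the Jedi&amp;#34;),
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    c(&amp;#34;US&amp;#34;, &amp;#34;non-US&amp;#34;)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">  )
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">star_wars_matrix[1, ]      # first row, all columns
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">star_wars_matrix[, &amp;#34;US&amp;#34;]   # US column, all rows
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">star_wars_matrix[2, 1]     # a single element: 290.475
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">rowSums(star_wars_matrix)  # worldwide total per film
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">colSums(star_wars_matrix)  # total per region
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>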
&lt;hr>
&lt;h3 id="unit-4-factor">Unit 4 Factor&lt;/h3>
&lt;p>The term &lt;code>factor&lt;/code> refers to a statistical data type used to store categorical variables. The difference between a &lt;strong>categorical variable&lt;/strong> and a &lt;strong>continuous variable&lt;/strong> is that a categorical variable can belong to a limited number of categories. A continuous variable, on the other hand, can correspond to an infinite number of values.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">factor_speed_vector &amp;lt;-factor(speed_vector, ordered=TRUE, levels=c(&amp;#34;slow&amp;#34;,&amp;#34;fast&amp;#34;,&amp;#34;insane&amp;#34;))
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h3 id="unit-5-data-frame">Unit 5 Data Frame&lt;/h3>
&lt;p>Useful functions are shown below:
&lt;code>head(variables)&lt;/code> shows the first observations of a data frame
&lt;code>tail(variables)&lt;/code> shows the last observations of a data frame
&lt;code>str()&lt;/code> gives a quick overview of the data
&lt;code>data.frame(vectors1, vectors2, ...)&lt;/code> combines vectors into one data frame
&lt;code>$&lt;/code> sign: selects a column by name, e.g. &lt;code>planets_df$diameter&lt;/code>&lt;/p>
&lt;p>&lt;code>subset(my_df, subset = some_condition)&lt;/code>, e.g. &lt;code>subset(planets_df, diameter &amp;lt; 1)&lt;/code>
&lt;code>order()&lt;/code> is an interesting function: it returns the positions that would sort its argument, e.g.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">&amp;gt;a = c(100,10,1000)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;gt;order(a)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">[1] 2 1 3
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&amp;gt;a[order(a)]
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">[1] 10 100 1000
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">#for a data frame the comma inside the square brackets is crucial: my_df[order(my_df$x), ]
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h3 id="unit-6-list">Unit 6 List&lt;/h3>
&lt;p>A list can hold components of different kinds: vectors, matrices and data frames.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">my_list = list(my_vector, my_matrix, my_df)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Name the components of the list:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">names(my_list)=c(&amp;#34;vec&amp;#34;, &amp;#34;mat&amp;#34;, &amp;#34;df&amp;#34;)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Or&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">my_list = list(vec = my_vector, mat = my_matrix, df = my_df)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>To conveniently add elements to lists you can use the c() function, that you also used to build vectors:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">ext_list &amp;lt;- c(my_list , my_val)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This will simply extend the original list, my_list, with the component my_val. This component gets appended to the end of the list. If you want to give the new list item a name, you just add the name as you did before:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">ext_list &amp;lt;- c(my_list, my_name = my_val)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item></channel></rss>