<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>John Ramey &#187; data frames</title>
	<atom:link href="http://www.johnramey.net/tag/data-frames/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.johnramey.net</link>
	<description>Don&#039;t think...compute.</description>
	<lastBuildDate>Wed, 07 Jul 2010 20:34:40 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>A Little Bit About Data Frames in R</title>
		<link>http://www.johnramey.net/2009/03/14/a-little-bit-about-data-frames-in-r/</link>
		<comments>http://www.johnramey.net/2009/03/14/a-little-bit-about-data-frames-in-r/#comments</comments>
		<pubDate>Sat, 14 Mar 2009 19:49:01 +0000</pubDate>
		<dc:creator>johnramey</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[data frames]]></category>

		<guid isPermaLink="false">http://www.johnramey.net/?p=155</guid>
		<description><![CDATA[Data frames in R are much like DataSets in SAS, SPSS, .NET, etc. Really, they are just spreadsheets that feel like matrices. We can use these to look at numerical data along with any metadata or characteristics associated with the numbers, though numbers are not required. From the R Documentation, &#8220;a data frame [...]]]></description>
			<content:encoded><![CDATA[<p>Data frames in R are much like DataSets in SAS, SPSS, .NET, etc. Really, they are just spreadsheets that feel like matrices. We can use them to look at numerical data along with any metadata or characteristics associated with the numbers, though numbers are not required. From the R Documentation, &#8220;a data frame is a list of variables of the same length with unique row names&#8221;, and it is also &#8220;a matrix-like structure whose columns may be of differing types (numeric, logical, factor and character and so on)&#8221;.</p>
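<p>As a rough sketch of what those two quotes mean in practice, the snippet below builds a small data frame whose columns have different types and checks that it is a list underneath while still having matrix-like dimensions. The name <strong>people</strong> and its values are made up for illustration, and <strong>stringsAsFactors=FALSE</strong> is set explicitly because the default differs between R versions.</p>
<p>[code]<br />
# Hypothetical example data; three columns of three different types<br />
people = data.frame( height=c( 1.5, 1.8, 1.7 ), tall=c( FALSE, TRUE, TRUE ), name=c( "Ann", "Bob", "Cy" ), stringsAsFactors=FALSE )<br />
# A data frame is a list of columns...<br />
is.list( people )<br />
# ...each column has its own class...<br />
sapply( people, class )<br />
# ...and every column has the same length, so it has rows and columns like a matrix<br />
dim( people )<br />
[/code]</p>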
<p>Let&#8217;s take a look at an example.  First, we start with generating a 3 x 3 identity matrix and assigning the matrix to the variable, <strong>mat</strong>.</p>
<p>[code]<br />
mat = diag( 3 )<br />
[/code]</p>
<p>By typing <strong>mat</strong>, we can see the output.</p>
<pre><strong>     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
</strong></pre>
<p>Next, we convert this matrix to a data frame called <strong>mat_dataframe</strong>. Typing <strong>mat_dataframe</strong> afterwards prints it.</p>
<p>[code] mat_dataframe = data.frame( mat )[/code]</p>
<pre><strong>  X1 X2 X3
1  1  0  0
2  0  1  0
3  0  0  1
</strong></pre>
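<p>To see what the conversion changed, a few inspection functions from base R can help. This is only a sketch: it rebuilds <strong>mat</strong> and <strong>mat_dataframe</strong> so that it runs on its own, and the exact text printed by <strong>class( mat )</strong> depends on the R version.</p>
<p>[code]<br />
mat = diag( 3 )<br />
mat_dataframe = data.frame( mat )<br />
# Compare the class of the matrix with the class of the data frame<br />
class( mat )<br />
class( mat_dataframe )<br />
# Column and row names that data.frame() generated automatically<br />
names( mat_dataframe )<br />
rownames( mat_dataframe )<br />
# Unlike a plain matrix, a column can be pulled out by name with $<br />
mat_dataframe$X2<br />
[/code]</p>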
<p>Notice that the column names are <strong>X1</strong>, <strong>X2</strong>, and <strong>X3</strong> and that the row names are <strong>1</strong>, <strong>2</strong>, and <strong>3</strong>. Say we want to add more columns and rows to our data frame. Let&#8217;s start by appending a row to <strong>mat_dataframe</strong>. We do this with <strong>rbind</strong>.</p>
<p>[code]<br />
mat_dataframe = rbind(mat_dataframe, c(2,2,2))<br />
[/code]</p>
<p>We have added a vector of twos to the next row of the data frame.  Here&#8217;s what <strong>mat_dataframe</strong> looks like so far.</p>
<pre><strong>  X1 X2 X3
1  1  0  0
2  0  1  0
3  0  0  1
4  2  2  2</strong></pre>
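<p>A row can also be appended as a one-row data frame instead of a bare vector. In that case <strong>rbind</strong> matches columns by name rather than by position, which the sketch below tries to show by listing the columns out of order. The names <strong>example_dataframe</strong> and <strong>new_row</strong> and the values 10, 20, and 30 are made up for illustration.</p>
<p>[code]<br />
example_dataframe = data.frame( diag( 3 ) )<br />
# Columns are deliberately listed out of order<br />
new_row = data.frame( X3=30, X1=10, X2=20 )<br />
# rbind lines the new row up with X1, X2, X3 by name<br />
example_dataframe = rbind( example_dataframe, new_row )<br />
example_dataframe<br />
[/code]</p>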
<p>Now, let&#8217;s append 2 columns to <strong>mat_dataframe</strong> using 2 different methods. The first line uses <strong>data.frame</strong> to create a new data frame from the original one and append a column called <strong>City</strong> with <strong>Dallas</strong> as the entry for each row. The second uses <strong>cbind</strong> to take this data frame and add another column called <strong>Color</strong> with entries <strong>blue</strong> and <strong>green</strong>.</p>
<p>[code]<br />
mat_dataframe = data.frame( mat_dataframe, City="Dallas" )<br />
mat_dataframe = cbind( mat_dataframe, Color=c( "blue", "green" ) )<br />
[/code]</p>
<p>Now, the <strong>mat_dataframe</strong> looks like this.</p>
<pre><strong>  X1 X2 X3   City Color
1  1  0  0 Dallas  blue
2  0  1  0 Dallas green
3  0  0  1 Dallas  blue
4  2  2  2 Dallas green</strong></pre>
<p>Notice that because only two colors were supplied for four rows, R recycled them: <strong>blue</strong> and <strong>green</strong> were repeated in order to fill the column. This leads to a gotcha when adding columns. For the <strong>City</strong> column, I supplied a single value, <strong>Dallas</strong>, which was repeated for every row, and for the <strong>Color</strong> column, I supplied 2 different colors. What happens if we specify three values? Let&#8217;s try this with a new column called <strong>Country</strong>.</p>
<p>[code]<br />
mat_dataframe = data.frame( mat_dataframe, Country=c( "USA", "Canada", "Mexico" ) )<br />
[/code]</p>
<p>We get the following error&#8230;</p>
<p><strong>Error in data.frame(mat_dataframe, Country = c("USA", "Canada", "Mexico")) : arguments imply differing number of rows: 4, 3</strong></p>
<p>A rule of thumb: make sure the number of values being assigned divides evenly into the number of rows (or columns) of the data frame. Here, 3 does not divide 4, so R refuses to recycle the values. If our data frame had 6 rows (or 9 or 12 or &#8230; ), we could have used the above code.</p>
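<p>Here is a small sketch of that rule of thumb with a 6-row data frame, where the same three countries are accepted and repeated twice. The data frame <strong>six_rows</strong> and the helper <strong>fits_evenly</strong> are hypothetical names introduced only for this illustration; the helper is just one way to check the rule before adding a column.</p>
<p>[code]<br />
# Stack two identity matrices to get 6 rows<br />
six_rows = data.frame( rbind( diag( 3 ), diag( 3 ) ) )<br />
# 3 values divide evenly into 6 rows, so they are recycled twice<br />
six_rows = data.frame( six_rows, Country=c( "USA", "Canada", "Mexico" ) )<br />
six_rows<br />
# Hypothetical helper: TRUE when the values would fill the rows a whole number of times<br />
fits_evenly = function( df, values ) nrow( df ) %% length( values ) == 0<br />
fits_evenly( six_rows, c( "USA", "Canada", "Mexico" ) )<br />
[/code]</p>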
<p>Our data frame is essentially a matrix with a couple of attached column vectors containing strings. This may not seem very useful at first, but it is a wonderful data structure that makes some statistical methods, among other things, easier to use. Soon, I will post a basic ANOVA example using data frames.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johnramey.net/2009/03/14/a-little-bit-about-data-frames-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
