<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>John Ramey &#187; Statistics</title>
	<atom:link href="http://www.johnramey.net/category/statistics/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.johnramey.net</link>
	<description>Don&#039;t think...compute.</description>
	<lastBuildDate>Wed, 07 Jul 2010 20:34:40 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Economic Impact of Wal-Mart</title>
		<link>http://www.johnramey.net/2010/04/03/economic-impact-of-walmart/</link>
		<comments>http://www.johnramey.net/2010/04/03/economic-impact-of-walmart/#comments</comments>
		<pubDate>Sat, 03 Apr 2010 22:43:32 +0000</pubDate>
		<dc:creator>johnramey</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[antitrust]]></category>
		<category><![CDATA[monopolies]]></category>
		<category><![CDATA[spatialstatistics]]></category>
		<category><![CDATA[walmart]]></category>

		<guid isPermaLink="false">http://www.johnramey.net/?p=251</guid>
		<description><![CDATA[I remember discussing monopolies and antitrust law in my high school economics class. In fact, my major high school paper described the (evil) monopoly Microsoft, back when the Internet Explorer fight was occurring. I learned a lot about these big businesses at the time, and I was able to see the Wal-Mart corporation as a [...]]]></description>
			<content:encoded><![CDATA[<p>I remember discussing monopolies and antitrust law in my high school economics class.  In fact, my major high school paper described the (evil) monopoly Microsoft, back when the Internet Explorer fight was occurring. I learned a lot about these big businesses at the time, and I was able to see the Wal-Mart corporation as a potentially good thing, but also as a potentially bad one.</p>
<p>Some have hypothesized that Wal-Mart starves the Mom-n-Pop shops, leaving the little guys no chance to compete against a combined department store/electronics store/car repair shop/gas station/whatever else Wal-Mart does nowadays.  However, many other stores near a Wal-Mart profit from the customers going to and leaving it.  As a statistician, I find this problem exciting because I would like data to settle the matter.</p>
<p>I am new to spatial statistics, so I am not looking for the most complex model that will really answer this question.  The model I am considering is simplistic by design and has much room for improvement.  This is actually the project on which I am working for my spatial statistics class.</p>
<p>One local economic indicator is the state (or county) unemployment rate, and because this <a href="http://www.bls.gov/lau/">information is readily available</a>, I am using it as the response in my model.  For now, I am not considering a spatio-temporal model, where I might consider the unemployment rate over time: like I said, simplistic!  For predictor variables, I am going to first look at the number of local Wal-Marts in each state and in each county.  Eventually, I will look at more information about these stores, such as the number of &#8220;supercenters&#8221; as distinct from the number of &#8220;neighborhood markets.&#8221;  Also, the opening date for each Wal-Mart would be of interest in a spatio-temporal model, but this is ignored for now, again for simplicity.</p>
<p>Over the next few days, I will be posting the code for <a href="http://en.wikipedia.org/wiki/Web_scraping">scraping</a> the Wal-Mart covariate data as well as the R code for the spatial analysis.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johnramey.net/2010/04/03/economic-impact-of-walmart/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A Little Bit About Data Frames in R</title>
		<link>http://www.johnramey.net/2009/03/14/a-little-bit-about-data-frames-in-r/</link>
		<comments>http://www.johnramey.net/2009/03/14/a-little-bit-about-data-frames-in-r/#comments</comments>
		<pubDate>Sat, 14 Mar 2009 19:49:01 +0000</pubDate>
		<dc:creator>johnramey</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[data frames]]></category>

		<guid isPermaLink="false">http://www.johnramey.net/?p=155</guid>
		<description><![CDATA[Data frames in R are much like DataSets in SAS, SPSS, .NET, etc. Really, they are just spreadsheets that feel like matrices. We can use these to look at numerical data along with any metadata or characteristics associated with the numbers, though numbers are not required. From the R Documentation, &#8220;a data frame [...]]]></description>
			<content:encoded><![CDATA[<p>Data frames in R are much like DataSets in SAS, SPSS, .NET, etc.  Really, they are just spreadsheets that feel like matrices. We can use these to look at numerical data along with any metadata or characteristics associated with the numbers, though numbers are not required.  From the R Documentation, &#8220;a data frame is a list of variables of the same length with unique row names&#8221;, and also it is &#8220;a matrix-like structure whose columns may be of differing types (numeric, logical, factor and character and so on)&#8221;.</p>
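<p>As a quick aside, the &#8220;columns may be of differing types&#8221; part can be checked with a minimal sketch. The data frame <strong>example_df</strong> and its column names are made up for illustration, and <strong>stringsAsFactors = FALSE</strong> is set explicitly because the default handling of string columns differs between R versions.</p>
<p>[code]<br />
# example_df and its columns are hypothetical, for illustration only<br />
example_df = data.frame( x = c( 1, 2, 3 ), flag = c( TRUE, FALSE, TRUE ), name = c( "a", "b", "c" ), stringsAsFactors = FALSE )<br />
# Each column keeps its own type<br />
column_types = sapply( example_df, class )<br />
column_types<br />
# A data frame is also a list with one element per column<br />
is.list( example_df )<br />
length( example_df )<br />
[/code]</p>
<p>Here <strong>column_types</strong> should report numeric, logical, and character for the three columns, which a plain matrix could not hold side by side.</p>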
<p>Let&#8217;s take a look at an example.  First, we start with generating a 3 x 3 identity matrix and assigning the matrix to the variable, <strong>mat</strong>.</p>
<p>[code]<br />
mat = diag( 3 )<br />
[/code]</p>
<p>By typing <strong>mat</strong>, we can see the output.</p>
<pre><strong>     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
</strong></pre>
<p>Next, we are going to convert this matrix to a data frame called <strong>mat_dataframe</strong> and output it.</p>
<p>[code]<br />
mat_dataframe = data.frame( mat )<br />
mat_dataframe<br />
[/code]</p>
<pre><strong>  X1 X2 X3
1  1  0  0
2  0  1  0
3  0  0  1
</strong></pre>
<p>Notice that the column names are <strong>X1</strong>, <strong>X2</strong>, and <strong>X3</strong> and that the row names are <strong>1</strong>, <strong>2</strong>, and <strong>3</strong>. Say we want to add more columns and rows to our data frame. Let&#8217;s start by appending a row to <strong>mat_dataframe</strong>.  We do this with <strong>rbind</strong>.</p>
<p>[code]<br />
mat_dataframe = rbind(mat_dataframe, c(2,2,2))<br />
[/code]</p>
<p>We have added a vector of twos to the next row of the data frame.  Here&#8217;s what <strong>mat_dataframe</strong> looks like so far.</p>
<pre><strong>  X1 X2 X3
1  1  0  0
2  0  1  0
3  0  0  1
4  2  2  2</strong></pre>
<p>Now, let&#8217;s try appending 2 columns to <strong>mat_dataframe</strong> using 2 different methods, <strong>data.frame</strong> and <strong>cbind</strong>. The first line creates a new data frame from the original data frame and appends a column called <strong>City</strong> with <strong>Dallas</strong> as the entry for each row. The second takes this data frame and adds another column called <strong>Color</strong> with entries <strong>blue</strong> and <strong>green</strong>.</p>
<p>[code]<br />
mat_dataframe = data.frame( mat_dataframe, City="Dallas" )<br />
mat_dataframe = cbind( mat_dataframe, Color=c( "blue", "green" ) )<br />
[/code]</p>
<p>Now, the <strong>mat_dataframe</strong> looks like this.</p>
<pre><strong>  X1 X2 X3   City Color
1  1  0  0 Dallas  blue
2  0  1  0 Dallas green
3  0  0  1 Dallas  blue
4  2  2  2 Dallas green</strong></pre>
<p>Notice that once <strong>blue</strong> and <strong>green</strong> were both used, R repeated (recycled) them to fill the remaining rows. Before we move on, let me mention a gotcha when adding columns.  For the <strong>City</strong> column, I simply inserted <strong>Dallas</strong> for each row, but for the <strong>Color</strong> column, I supplied 2 different colors.  What happens if we specify three values? Let&#8217;s try this with a new column called <strong>Country</strong>.</p>
<p>[code]<br />
mat_dataframe = data.frame( mat_dataframe, Country=c( "USA", "Canada", "Mexico" ) )<br />
[/code]</p>
<p>We get the following error&#8230;</p>
<p><strong>Error in data.frame(mat_dataframe, Country = c("USA", "Canada", "Mexico")) : arguments imply differing number of rows: 4, 3</strong></p>
<p>A rule of thumb: make sure the number of values being assigned divides evenly into the number of rows (or columns) of the data frame.  If our data frame had 6 rows (or 9 or 12 or &#8230; ), we could have used the above code.</p>
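<p>As a quick check of that rule of thumb, here is a minimal sketch. The six-row data frame <strong>six_rows</strong> and its <strong>id</strong> and <strong>Letter</strong> columns are made up for illustration; they are not part of the example above.</p>
<p>[code]<br />
# six_rows is a hypothetical data frame with 6 rows<br />
six_rows = data.frame( id = 1:6 )<br />
# 3 values divide evenly into 6 rows, so they are recycled twice<br />
six_rows = data.frame( six_rows, Country=c( "USA", "Canada", "Mexico" ) )<br />
six_rows<br />
# 4 values do not divide evenly into 6 rows, so this call fails<br />
attempt = try( data.frame( six_rows, Letter=c( "a", "b", "c", "d" ) ), silent=TRUE )<br />
inherits( attempt, "try-error" )<br />
[/code]</p>
<p>The first call should succeed, with the three countries repeated twice down the <strong>Country</strong> column, and the last line should report <strong>TRUE</strong> because the second call raised an error that <strong>try</strong> caught.</p>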
<p>Our data frame is essentially a matrix with a couple of attached column vectors containing strings. This may not seem very useful at first, but it is a wonderful data structure that makes some statistical methods, among other things, easier to use. Soon, I will post a basic ANOVA example using data frames.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johnramey.net/2009/03/14/a-little-bit-about-data-frames-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Zero Factorial</title>
		<link>http://www.johnramey.net/2009/02/11/zero-factorial/</link>
		<comments>http://www.johnramey.net/2009/02/11/zero-factorial/#comments</comments>
		<pubDate>Wed, 11 Feb 2009 06:11:14 +0000</pubDate>
		<dc:creator>johnramey</dc:creator>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.johnramey.net/?p=89</guid>
		<description><![CDATA[I must have had this epiphany several times in the last few years, but I had it again this evening while being introduced to the dark side.  Many students are taught that [latex]n! = n \times (n-1) \times \ldots \times 2 \times 1[/latex] and that 0! is defined as 1.  But is there some higher-level [...]]]></description>
			<content:encoded><![CDATA[<p>I must have had this epiphany several times in the last few years, but I had it again this evening while being introduced to the <a href="http://en.wikipedia.org/wiki/Bayesian_probability">dark side</a>.  Many students are taught that</p>
<p style="text-align: center;">[latex]n! = n \times (n-1) \times \ldots \times 2 \times 1[/latex]</p>
<p>and that 0! is defined as 1.  But is there some higher-level mathematics in which this factorial idea is just a corollary, as with most pre-graduate school mathematics?  Well, there is!  This is the <a href="http://en.wikipedia.org/wiki/Gamma_function">epic Gamma function</a>.  It is defined as</p>
<p style="text-align: center;">[latex]\Gamma( \alpha ) = \int_0^{\infty} y^{\alpha - 1} e^{-y} dy[/latex].</p>
<p>If [latex]\alpha[/latex] is a positive integer, then [latex]\Gamma( \alpha ) = (\alpha - 1)![/latex].</p>
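<p>Why the factorial shows up can be sketched with integration by parts. This is the standard argument, added here as a reminder rather than a full proof. For positive [latex]\alpha[/latex], the boundary term vanishes, so</p>
<p style="text-align: center;">[latex]\Gamma( \alpha + 1 ) = \int_0^{\infty} y^{\alpha} e^{-y} dy = \left[ -y^{\alpha} e^{-y} \right]_0^{\infty} + \alpha \int_0^{\infty} y^{\alpha - 1} e^{-y} dy = \alpha \Gamma( \alpha )[/latex].</p>
<p>Applying this repeatedly for an integer [latex]n \geq 2[/latex] gives</p>
<p style="text-align: center;">[latex]\Gamma( n ) = (n-1) \times (n-2) \times \ldots \times 1 \times \Gamma( 1 ) = (n-1)! \, \Gamma( 1 )[/latex],</p>
<p>so everything hinges on the value of [latex]\Gamma( 1 )[/latex], which is also the one case, [latex]\alpha = 1[/latex], that involves 0!.</p>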
<p>Here is the main event.  Suppose [latex]\alpha = 1[/latex]; because [latex]\alpha[/latex] is a positive integer, [latex]\Gamma( \alpha ) = 0![/latex].  But then, notice that</p>
<p style="text-align: center;">[latex]\Gamma( 1 ) = \int_0^{\infty} y^{ 1 - 1 } e^{-y} dy = \int_0^{\infty} e^{-y} dy = 1[/latex].</p>
<p>Hence, our conclusion is that 0! = 1.</p>
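<p>For a numerical sanity check, here is a minimal sketch in R using the built-in <strong>gamma</strong>, <strong>factorial</strong>, and <strong>integrate</strong> functions. The variable names are chosen for illustration, and a numerical integral only approximates the exact value.</p>
<p>[code]<br />
# Numerically integrate e^(-y) from 0 to infinity<br />
integral = integrate( function( y ) exp( -y ), lower=0, upper=Inf )<br />
integral$value<br />
# Built-in Gamma function and factorial at the same point<br />
gamma( 1 )<br />
factorial( 0 )<br />
# The integer identity Gamma(n) = (n-1)! for n = 1, ..., 6<br />
n = 1:6<br />
all.equal( gamma( n ), factorial( n - 1 ) )<br />
[/code]</p>
<p>The first three quantities should all be 1 (the integral only up to numerical error), and the last line should return <strong>TRUE</strong>.</p>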
]]></content:encoded>
			<wfw:commentRss>http://www.johnramey.net/2009/02/11/zero-factorial/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WTB [Probability Distribution Charts] PST</title>
		<link>http://www.johnramey.net/2009/01/31/wtb-probability-distribution-charts-pst/</link>
		<comments>http://www.johnramey.net/2009/01/31/wtb-probability-distribution-charts-pst/#comments</comments>
		<pubDate>Sun, 01 Feb 2009 00:23:21 +0000</pubDate>
		<dc:creator>johnramey</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Random]]></category>

		<guid isPermaLink="false">http://www.johnramey.net/?p=50</guid>
		<description><![CDATA[My recent goal is to surround myself with things in my cubicle that will remind me of my work, so that when I inevitably become distracted, many &#8220;get back to work&#8221; signs act as a sometimes much needed slavedriver.  Among these signs are charts of probability distributions and their properties, but first, I needed to [...]]]></description>
			<content:encoded><![CDATA[<p>My recent goal is to surround myself with things in my cubicle that will remind me of my work, so that when I inevitably become distracted, many &#8220;get back to work&#8221; signs act as a sometimes much needed slavedriver.  Among these signs are charts of <a href="http://en.wikipedia.org/wiki/Probability_distribution">probability distributions</a> and their properties, but first, I needed to find a good set of them: concise, yet presenting all the information I need.  So, I Google&#8217;d for &#8220;<strong>lists of pdfs with properties</strong>,&#8221; forgetting momentarily that PDFs refer to more than just Probability Density (or Distribution) Functions.  I changed my query to &#8220;<strong>probability distributions with properties</strong>&#8221; and was stunned to see the following suggestion from Google.</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-51" title="google-search-for-probability-distributions" src="http://www.johnramey.net/wp-content/uploads/2009/01/google-search-for-probability-distributions.png" alt="google-search-for-probability-distributions" width="587" height="42" /></p>
<p style="text-align: left;">Curious, I clicked the 3rd of these links, and I have now manually turned on Google&#8217;s safe search feature for the first time.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johnramey.net/2009/01/31/wtb-probability-distribution-charts-pst/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
