Finished week 12 #8

Open · wants to merge 8 commits into master
143 changes: 141 additions & 2 deletions README.md
@@ -1,2 +1,141 @@
challenge-week-12
=================
# Challenge Week 12 Submission Template

# Reddit Data Challenges

## Challenge 1

![image](reddit-challenge-1.png)

## Challenge 2

I think it's interesting that you can find a list of the authors that have commented. This list can be used to identify the users most likely to comment on a given post.

![image](reddit-challenge-2.png)

## Challenge 3

You can also find the largest amount of Reddit gold given to any commenter. In our sample set, the maximum amount of gold given is 0. Cheap bastards.

![image](reddit-challenge-3.png)

## Challenge 4

We could track the number of gildings to understand wider trends in the Reddit community. We can find which posts and subreddits attract the most gildings, and why some posts are so successful while other relatively similar ones aren't.
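As a sketch, the gildings-per-subreddit tally could be computed with a `$group`/`$sum` pipeline. The pure-Python stand-in below shows what that pipeline would do; the `gilded` field name and the toy documents are assumptions about the comment schema, not taken from the dataset:

```python
from collections import defaultdict

# Toy comments standing in for the Reddit collection; the "gilded"
# field name is an assumed part of the schema.
comments = [
    {"subreddit": "askscience", "gilded": 1},
    {"subreddit": "askscience", "gilded": 0},
    {"subreddit": "funny", "gilded": 2},
]

def gildings_by_subreddit(docs):
    """Mirror of {$group: {_id: "$subreddit", total: {$sum: "$gilded"}}}."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc["subreddit"]] += doc.get("gilded", 0)
    return dict(totals)

print(gildings_by_subreddit(comments))  # {'askscience': 1, 'funny': 2}
```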

## Challenge 5

My strategy for this problem was to go subreddit by subreddit, get the distinct commenters for each, and then combine those sets across every pair of subreddits to find the overlap. I was unable to complete this due to time constraints, but my partial answer is at:
[challenge_5.py](challenge_5.py)

## Challenge 6

Using only comments with at least 10 upvotes underrepresents smaller subreddits. Because a smaller subreddit has a smaller pool of voting users, far fewer of its comments meet the voting cutoff.

## Challenge 7

We are also biased across times of day. Comments posted at certain times receive more upvotes than those posted at others, so a fixed cutoff like this prioritizes some posting times over the rest.

## Challenge 8

We can test for this bias by making the cutoff proportional to the subreddit's size and then comparing the differences between this new dataset and our current dataset.
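A sketch of that proportional cutoff on toy data (the 1% rate and the made-up subreddit sizes are arbitrary illustrations, not tuned values):

```python
from collections import Counter

def proportional_cutoff(comments, rate=0.01):
    """Keep a comment only if its score clears rate * (subreddit comment count)."""
    sizes = Counter(c["subreddit"] for c in comments)
    return [c for c in comments if c["score"] >= rate * sizes[c["subreddit"]]]

# Toy data: a 5-comment subreddit and a 2000-comment subreddit, equal scores.
small = [{"subreddit": "tinysub", "score": 3}] * 5
large = [{"subreddit": "bigsub", "score": 3}] * 2000

kept = proportional_cutoff(small + large)
# tinysub's cutoff is 0.05 so all 5 comments pass; bigsub's is 20 so none do.
print(len(kept))  # 5
```

Under a fixed cutoff of 10 both subreddits would lose everything; the proportional rule keeps the small subreddit represented, which is exactly the difference the comparison would measure.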

# Yelp and Weather

## Challenge 1
```
db.percipitation.aggregate([
    { $match: { "DATE": /20100425.*/ } },
    { $group: { _id: null, total: { $sum: "$HPCP" } } }
])

RESULT

{ "_id" : null, "total" : 62 }
```
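The pipeline above can be sanity-checked in pure Python. The field names below follow the query, but the documents are invented toy rows, not real station data:

```python
import re

# Toy precipitation documents; DATE and HPCP follow the query's field
# names, the values are made up for illustration.
docs = [
    {"DATE": "20100425 01:00", "HPCP": 20},
    {"DATE": "20100425 02:00", "HPCP": 42},
    {"DATE": "20100426 01:00", "HPCP": 7},
]

def total_hpcp(documents, date_prefix="20100425"):
    """Mirror of the $match on /20100425.*/ followed by $sum of $HPCP."""
    pattern = re.compile(date_prefix + ".*")
    return sum(d["HPCP"] for d in documents if pattern.match(d["DATE"]))

print(total_hpcp(docs))  # 62 for these toy rows
```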

![image](yelp-challenge-1.png)

## Challenge 2
```
db.normals.aggregate([
    { $match: { "STATION_NAME": "LAS VEGAS MCCARRAN INTERNATIONAL AIRPORT NV US" } },
    { $match: { "DATE": /20100425.*/ } },
    { $group: { _id: null, total: { $avg: "$HLY-WIND-AVGSPD" } } }
])

{ "_id" : null, "total" : 110.08333333333333 }
```
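The same kind of pure-Python stand-in works for the `$avg` pipeline, this time with both `$match` stages. Again the documents are toy rows, not real normals data:

```python
station = "LAS VEGAS MCCARRAN INTERNATIONAL AIRPORT NV US"

# Toy hourly-normals documents; field names follow the query.
docs = [
    {"STATION_NAME": station, "DATE": "20100425 01:00", "HLY-WIND-AVGSPD": 100},
    {"STATION_NAME": station, "DATE": "20100425 02:00", "HLY-WIND-AVGSPD": 120},
    {"STATION_NAME": "ELSEWHERE", "DATE": "20100425 01:00", "HLY-WIND-AVGSPD": 999},
]

def avg_wind(documents, name=station, prefix="20100425"):
    """Mirror of the two $match stages plus $avg of $HLY-WIND-AVGSPD."""
    vals = [d["HLY-WIND-AVGSPD"] for d in documents
            if d["STATION_NAME"] == name and d["DATE"].startswith(prefix)]
    return sum(vals) / len(vals)

print(avg_wind(docs))  # 110.0 on these toy rows
```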

![image](yelp-challenge-2.png)
## Challenge 3

```
db.business.aggregate([
    { $match: { "city": "Madison" } },
    { $group: { _id: null, total: { $sum: "$review_count" } } }
])

{ "_id" : null, "total" : 34410 }
```

![image](yelp-challenge-3.png)


## Challenge 4

```
db.business.aggregate([
    { $match: { "city": /.*Vegas.*/ } },
    { $group: { _id: null, total: { $sum: "$review_count" } } }
])

{ "_id" : null, "total" : 586381 }
```

![image](yelp-challenge-5.png)
## Challenge 5

```
db.business.aggregate([
    { $match: { "city": "Phoenix" } },
    { $group: { _id: null, total: { $sum: "$review_count" } } }
])

{ "_id" : null, "total" : 200089 }
```

![image](yelp-challenge-6.png)

## Challenge 7 [BONUS]

[Code]
[Answer]
29 changes: 29 additions & 0 deletions challenge_5.py
@@ -0,0 +1,29 @@
import pymongo

def to_list(input_cursor, limit=2):
    """Return up to `limit` author names from a cursor of comment documents."""
    return [doc.get('author') for doc in input_cursor.limit(limit)]

# Connection to MongoDB
try:
    conn = pymongo.MongoClient()
    print("Connected successfully!!!")
except pymongo.errors.ConnectionFailure as e:
    print("Could not connect to MongoDB: %s" % e)

db = conn.week12
reddit = db.reddit

subreddits = reddit.distinct('subreddit')
subreddit_user = {}

# Map each subreddit to a cursor over its comments (author field only).
for subreddit in subreddits:
    subreddit_user[subreddit] = reddit.find({'subreddit': subreddit}, {'author': 1, '_id': 0})

for i in subreddit_user:
    print("reddit is: " + i + " list is: " + str(to_list(subreddit_user[i])))

# IMPORTANT: To actually get the result we can combine these groups of distinct
# users across all of the different subreddits. This will give us the overlap
# between subreddits, which we can then compare to actually get an answer.

# Didn't have enough time to implement.

91 changes: 91 additions & 0 deletions mongo_scratch
@@ -0,0 +1,91 @@
mongoimport -d week12 -c normals --type csv --file 425247.csv --headerline

mongoimport -d week12 -c percipitation --type csv --file 425248.csv --headerline

db.percipitation.find({"DATE" : /20100425.*/})

1a QUERY

db.percipitation.aggregate([
{$match: {"DATE" : /20100425.*/}},
{
$group: {
_id: null,
total: {
$sum: "$HPCP"
}
}
} ] )

1b. RESULT

{ "_id" : null, "total" : 62 }


2a QUERY

db.normals.aggregate([
{$match: {"STATION_NAME": "LAS VEGAS MCCARRAN INTERNATIONAL AIRPORT NV US"}},
{$match: {"DATE" : /20100425.*/}},
{
$group: {
_id: null,
total: {
$avg: "$HLY-WIND-AVGSPD"
}
}
} ] )

2b RESULT
{ "_id" : null, "total" : 110.08333333333333 }

mongoimport -d week12 -c review --type json --file yelp_academic_dataset_review.json

3a

db.business.aggregate([
{$match: {"city": /.*Vegas.*/}},
{
$group: {
_id: null,
total: {
$sum: "$review_count"
}
}
}])


db.business.aggregate([
{$match: {"city": 'Madison'}},
{
$group: {
_id: null,
total: {
$sum: "$review_count"
}
}
}])



db.business.aggregate([
{$match: {"city": 'Phoenix'}},
{
$group: {
_id: null,
total: {
$sum: "$review_count"
}
}
}])

mongoimport -d week12 -c reddit --type json --file reddit_small.json









Binary file added reddit-challenge-1.png
Binary file added reddit-challenge-2.png
Binary file added reddit-challenge-3.png
Binary file added yelp-challenge-1.png
Binary file added yelp-challenge-2.png
Binary file added yelp-challenge-3.png
Binary file added yelp-challenge-5.png
Binary file added yelp-challenge-6.png