Web scraping, algorithms, and the end of Metis week 2!
Approaching the end of Metis Data Science Bootcamp, week 2.
I just hit go on downloading 5000 webpages from steampowered.com (a PC store/hosting service, “Steam”), which I will scrape tomorrow to get the data for my Metis “Luther” project. They have a surprisingly thorough ‘stats’ subdivision of their website for me to pull data from. I even set a delay of 5 seconds between page requests in hopes of not getting booted off the site. We will see in the morning how it worked out. [update: got denied after the first 300, but found a workaround Friday morning by randomizing my access interval]
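For anyone curious what a randomized access interval can look like, here is a minimal sketch in Python, not the actual script from that night. The function names, the 3 to 8 second window, and the injectable `fetch` and `sleep` arguments are all assumptions made for illustration.

```python
import random
import time
from urllib.request import urlopen


def fetch_page(url, timeout=30):
    """Download one page and return its raw bytes (standard library only)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_pages(urls, fetch=fetch_page, sleep=time.sleep,
                   min_wait=3.0, max_wait=8.0, rng=None):
    """Fetch each URL in turn, pausing a random interval between requests.

    min_wait and max_wait are illustrative values, not the ones used in the post.
    """
    rng = rng or random.Random()
    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            # Random rather than fixed spacing between consecutive requests.
            sleep(rng.uniform(min_wait, max_wait))
        try:
            pages[url] = fetch(url)
        except OSError as err:  # urllib's URLError and HTTPError subclass OSError
            print(f"skipping {url}: {err}")
    return pages
```

Calling `download_pages(["https://example.com/page/1", "https://example.com/page/2"])` (placeholder URLs) would return a dict mapping each URL to its page bytes. Passing `fetch` and `sleep` in as arguments makes it possible to try the loop out without touching the network or actually waiting.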
If all goes well, I will have quite a bit of data to use as possible ‘features’ (variables) in my prediction model. I’m at around 30 now and haven’t finished identifying them all, though I have code to pull almost all of them and will get the last couple Friday. I expect 2-4 of them will be much more meaningful than the others, but part of the exercise here is to explore, particularly with algorithms and models that can handle a large number of features.
I’m pretty excited about getting to apply some of the stochastic gradient descent and other algorithms we have been learning, and to find out about the new ones we will learn tomorrow and next week. As amusing as it might be, I don’t see much of a future in web scraping (though who knows?).
This morning Skip and I inadvertently solved our pair programming problem by ‘inventing’ a gradient descent algorithm, which, unbeknownst to us, was the basis of our morning lectures.
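As a minimal sketch of the general idea (not the actual pair programming solution, which isn't shown here), plain batch gradient descent can fit a straight line by repeatedly stepping against the gradient of the mean squared error. The data points, learning rate, and step count below are made up for illustration.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current line at every point.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of mean((w*x + b - y)^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step downhill, against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # made-up points lying exactly on y = 2x + 1
w, b = gradient_descent(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

The learning rate matters: too large and the updates overshoot and diverge, too small and convergence crawls. Stochastic gradient descent follows the same update rule but estimates the gradient from one point (or a small batch) at a time instead of the whole data set.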
Tomorrow I am slated to give a short presentation on self-driving cars. That turns out to be much too broad a topic for a 10-15 minute talk, so I had to focus on a few areas. It was surprisingly hard to get specifics on the algorithms used, partly because it isn’t really a solved problem and new things are always being tried, and partly because many of the groups having the most success aren’t about to give all their secrets away just yet. It would be a dream job to apply some of this data science to a system like that for a living, even more so for something with more freedom than a self-driving car, like a drone or ROV. I would probably need a PhD in machine vision for that kind of stuff, but who knows… [update: presentation went long, but had lots of thoughtful questions from the audience, and I think it went very well considering.]
Anyway, good night. I am going to get some sleep while my computer is scraping the night away…