Learning Python – Project 3: Scraping data from Steam’s Community Market

This Python project (using the Spyder Python environment) aims to scrape price data for in-game video game items from Steam’s Community Market. For the uninitiated, Steam is… a lot of things. Mainly, you can buy video games on Steam and easily play them with your friends. However, Steam has many other features. One is the “Workshop”, where players can make mods or in-game item “skins” for their favorite gear. Your favorite gun looking a bit dull? There are skins for that!


Semi-Automatic Pistol: boring default skin
Sunrise SAP: awesome rad skin

Skins do not change the item/gun/armor in any way aside from its visual appearance in game. Some games take player-made skins from the Workshop and introduce them into the official game. This can be done through Steam or through various other means specific to a game (such as players getting special chests to unlock). However, this process usually involves real money one way or another. For example, the developers of a game called Rust pick a handful of Workshop skins per week and sell them on Steam for a limited time for real-life money, usually $2 – 5 per skin. When players buy a skin, a portion goes to the skin’s creator, another slice to Rust, and a bit to Steam. Other games use chests that players can obtain for free by playing the game, but players need to purchase a key for real-life money (usually $2.50) to open the chest and get the skin inside.

Rust Workshop page for a rocket launcher skin by player |HypershoT|

What makes the skin economy interesting is how supply and demand play out. Continuing with the Rust example, skins are only available to purchase officially from Rust for one week. After that, a player who wants a skin has to buy it (with real money) from another player who has it, so the supply of a skin (the number of copies available) is fixed after the initial week. Games that rely on chests either offer a chest only for a limited time and/or set rarities for the items it contains, so some skins are very common while others may have less than a 1% chance of being found. Either way, if you want a cool skin, this is where the Steam Community Market comes in.

Steam Community Market

Players of a game can buy and sell their in-game items and skins to other players on the market. Again, all for real-life money. Keep in mind that, on the demand side of this economy, all skins do is make your items look cooler in game so you can show off to your friends and opponents. They never make your items stronger or better. That being said, some item skins are so sought after that they regularly sell for hundreds or even thousands of real-life dollars. People can make so much money selling skins that Steam will provide you with 1099s for income tax reporting. There’s money to be had out there!

Example Rust skin (Punishment Mask) that sells for hundreds of dollars

So whether you’re looking to diversify your investment portfolio with skins, do a bit of economics research on digital cosmetic items, or just want a programming project, this post will take you through how to pull market data from Steam for any game and its items.


The Code

So first we need some Python libraries to make our lives easier. Read the comments (#) if you’re interested in what they’re doing.

Next we need our login credentials. Some requests to the Market don’t need an account, but others do. I don’t know of a better way to get your login credentials than manually finding the cookie Steam puts on your computer when you log in. Please comment below if you know a better way. In your browser, log in to Steam. Then, in Chrome, go to Settings > Advanced > Content settings > Cookies > See all cookies and site data > find steamcommunity.com > find “steamLoginSecure” > and finally copy the “Content” string. Then we paste the string into Python and save it as a {dictionary} called cookie, with “steamLoginSecure” as the key.
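To make the cookie step concrete, here is a minimal sketch of sending the steamLoginSecure cookie with a request. The post’s code uses requests, where this is simply requests.get(url, cookies=cookie); to keep the example free of third-party installs, this version swaps in the standard library’s urllib and builds the Cookie header by hand. The cookie value is a placeholder, and cookie_header and get_json are hypothetical helper names.

```python
import json
import urllib.request

# Placeholder: paste the "Content" string of your steamLoginSecure cookie here.
cookie = {"steamLoginSecure": "PASTE-YOUR-COOKIE-CONTENT-HERE"}


def cookie_header(cookies):
    """Join a {name: value} dict into the value of an HTTP Cookie header."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def get_json(url, cookies=None, timeout=10):
    """GET a URL, sending the given cookies, and return (status_code, parsed JSON).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a status
    that is actually returned here is normally 200.
    """
    request = urllib.request.Request(url)
    if cookies:
        request.add_header("Cookie", cookie_header(cookies))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, json.loads(response.read())
```

Usage would look like status, data = get_json(some_market_url, cookies=cookie). Because urlopen raises on error codes, a failed request shows up as an exception rather than as a returned status.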

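Next we need to choose the game, or games, we want to get skins from. However, it’s not as easy as typing in the game’s name (because I’m lazy and didn’t program such a feature). Rather, we need the game’s ID in Steam. You can easily find this by going to the game’s page in Steam and looking at the URL: the ID is the number after /app/. For example, Rust’s store page is https://store.steampowered.com/app/252490/Rust/, so its ID is 252490. We keep the IDs we want in a [list] called [gameList].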
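A small sketch of pulling the ID out of a store URL instead of copying it by eye. It assumes the URL contains an /app/<number> segment, as Steam store pages do; app_id_from_url is a hypothetical helper.

```python
import re


def app_id_from_url(url):
    """Return the numeric app ID from a Steam URL containing /app/<id>."""
    match = re.search(r"/app/(\d+)", url)
    if match is None:
        raise ValueError(f"no /app/<id> segment in {url!r}")
    return int(match.group(1))


# Rust's store page, used as the example game
gameList = [app_id_from_url("https://store.steampowered.com/app/252490/Rust/")]
print(gameList)  # [252490]
```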

That’s all for the setup; now we can start to get some data. This process occurs in two parts. First we ask for the names of all the items on the Community Market for a game. This is done with a simple “search” request, which gives us item names and some basic information like each item’s current price. However, if we want more detailed information we need to use a different “pricehistory” request instead, which we can only do for one item at a time.
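The two kinds of request boil down to two URLs. The sketch below builds both with urllib.parse. The endpoint paths and parameters (norender=1 for JSON output, start, count, country, currency=1, market_hash_name) are the ones commonly used with these unofficial Market endpoints; treat them as assumptions, since Steam does not document them and could change them.

```python
from urllib.parse import quote, urlencode

# Assumed endpoints (undocumented by Steam)
SEARCH_URL = "https://steamcommunity.com/market/search/render/"
HISTORY_URL = "https://steamcommunity.com/market/pricehistory/"


def search_url(app_id, start=0, count=100):
    """URL for one page of a game's market listings (names, current prices)."""
    params = {
        "appid": app_id,
        "norender": 1,  # ask for JSON instead of rendered HTML
        "start": start,
        "count": count,
    }
    return SEARCH_URL + "?" + urlencode(params)


def price_history_url(app_id, hash_name, country="US", currency=1):
    """URL for the full price history of a single item."""
    params = {
        "country": country,
        "currency": currency,  # 1 is assumed to mean US dollars
        "appid": app_id,
        "market_hash_name": hash_name,
    }
    # quote (instead of the default quote_plus) turns spaces into %20, not +
    return HISTORY_URL + "?" + urlencode(params, quote_via=quote)
```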

Get all the item names:                   

We’ll start by writing a for loop to loop through all our game IDs in [gameList]. Then we’ll create an empty [list] called [allItemNames] to store all the item names. Next we’ll make our first request. Requests are the key to this program: they are how we ask Steam for information, via a specially constructed URL. This initial request is only to find the total number of items on the market for a given game. Steam will only return a maximum of 100 items at a time (even if you set count > 100), so we need to make multiple requests to get all the item names. We’ll request the page with “requests.get” and our search URL, then store the response as allItemGet. We’ll then use the requests library again to get the .content of the page. Next, we use json to parse the content of the page into something we can easily read and store it as allItems. Finally, to complete the finding-total-number-of-items step, we’ll read the total number of items from the [‘total_count’] key in allItems and store it as totalItems.
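A minimal sketch of the total-count request, assuming the search endpoint returns JSON with a ‘total_count’ key. To keep it testable, the actual HTTP call is passed in as fetch_json, a hypothetical stand-in for something like requests.get(url, cookies=cookie).json(). Asking for count=1 is also an assumption: since only the total is needed, one item should be enough.

```python
from urllib.parse import urlencode


def get_total_count(app_id, fetch_json):
    """Ask the market search for a single item just to read 'total_count'.

    fetch_json: any callable that takes a URL and returns the parsed JSON.
    """
    url = "https://steamcommunity.com/market/search/render/?" + urlencode(
        {"appid": app_id, "norender": 1, "start": 0, "count": 1}
    )
    allItems = fetch_json(url)
    return allItems["total_count"]
```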

Now that we have the total number of items, we need to iterate through all the listed items, starting from 0, and get all of their names. Since Steam caps each search at 100 items, this means one search request per block of items. However, items may change position from search to search, and we don’t want to miss any, so we loop in smaller batches of 50 instead of 100. To do this we’ll create another for loop. In the search URL you can specify which position in the market’s item list to start from with the “start=x” value. We’ll repeat the above process, but this time step through in batches of 50. For each batch we take the ‘results’ key from the {allItems} dictionary made when we convert the returned page into json, then loop through those results and pull out just the [‘hash_name’] to use for our pricehistory search, appending each new name to our allItemNames list. To recap, we loop to get the basic search info for 50 items [allItems], then loop again to get the name currItem[‘hash_name’] of each of those 50 items.

You’ll also notice I have a sleep timer in there. Steam limits you from making too many requests too quickly. I couldn’t find any hard data, but people seem to think the limit is 40 per minute or 200 in five minutes. I was too lazy to change the (working) code after I learned this, so as it stands the program pauses, via time.sleep(), for a random time between half a second and 2.5 seconds. If it ain’t broke, don’t fix it. Lastly, there is a printout showing which item batch the program is on and whether you’re still getting good results from Steam (code: 200).
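The batching loop could be sketched like this, again assuming the JSON layout described above (a ‘results’ list whose entries carry a ‘hash_name’). fetch_json is a hypothetical stand-in for the real request, and sleep is a parameter only so the pause can be skipped when testing; in normal use it is time.sleep.

```python
import random
import time
from urllib.parse import urlencode

SEARCH_URL = "https://steamcommunity.com/market/search/render/"


def get_all_item_names(app_id, total_items, fetch_json, batch=50, sleep=time.sleep):
    """Page through the market search in batches and collect every hash_name."""
    allItemNames = []
    for start in range(0, total_items, batch):
        url = SEARCH_URL + "?" + urlencode(
            {"appid": app_id, "norender": 1, "start": start, "count": batch}
        )
        allItems = fetch_json(url)
        results = allItems.get("results", [])
        for currItem in results:
            allItemNames.append(currItem["hash_name"])
        print(f"items {start} to {start + batch - 1}: {len(results)} names")
        # Random pause of 0.5 to 2.5 seconds to go easy on Steam's rate limit
        sleep(random.uniform(0.5, 2.5))
    return allItemNames
```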

Quick troubleshooting break:

It’s worth talking about these status codes and why I’m printing them. A few things could (and did) go wrong at this stage. First, your Steam cookie could expire. The cookie seems to be good for at least 24 hours, and possibly longer, but if you’re working on this over multiple days you may need to log in to Steam again, find your new cookie, and paste it into cookie at the top of the program. If your cookie has expired you will get code 400 (bad request). Second, the search URL could be malformed, which in my runs came back as code 500 (a server error). Double-check your item name if this happens (more on that below). Either way, the code has exception handling to deal with these errors and keep running.
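One way to act on these codes is sketched below. The mapping in status_advice just restates the troubleshooting notes in the paragraph above (it reflects observed behavior, not documented Steam behavior), and fetch_or_none is a hypothetical wrapper around urllib’s urlopen that turns an HTTP error into None so the main loop can skip the item and carry on. Cookies are left out for brevity.

```python
import json
import urllib.error
import urllib.request


def status_advice(status_code):
    """Likely cause for each status code seen while scraping (observed, not documented)."""
    if status_code == 200:
        return "ok"
    if status_code == 400:
        return "bad request: the steamLoginSecure cookie may have expired, log in again"
    if status_code == 500:
        return "server error: double-check the URL, especially the encoded item name"
    return f"unexpected status {status_code}"


def fetch_or_none(url, opener=urllib.request.urlopen):
    """Return the parsed JSON, or None (after printing advice) if the request fails."""
    try:
        with opener(url) as response:
            return json.loads(response.read())
    except urllib.error.HTTPError as err:
        print(err.code, status_advice(err.code))
        return None
```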

There’s just a little more to do for the item names. Sometimes I was getting duplicates. My guess is that item positions changed as we made our requests, e.g. item 100 moved to position 101 between fetching items 0 to 100 and 101 to 200. To remove duplicates from a [list] in Python, you can convert the [list] to a set and then back to a [list] (this does not preserve the original order). Lastly, we’ll use pickle to save our [allItemNames] to a file with a .txt extension in the current working directory, so we don’t need to run this bit of code again (unless new items are released). Despite the extension, pickle writes a binary file, so read it back with pickle rather than opening it as text.
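A sketch of the de-duplication and save step. It uses dict.fromkeys instead of the set round-trip because that also removes duplicates but keeps the names in the order Steam returned them; list(set(...)) works just as well if order doesn’t matter. The file name pattern <app_id>ItemNames.txt and the helper names are hypothetical.

```python
import os
import pickle


def dedupe(names):
    """Drop duplicate names, keeping the first occurrence of each."""
    return list(dict.fromkeys(names))


def save_names(names, app_id, directory="."):
    """Pickle the name list to <directory>/<app_id>ItemNames.txt and return the path."""
    path = os.path.join(directory, f"{app_id}ItemNames.txt")
    with open(path, "wb") as f:  # pickle needs binary mode
        pickle.dump(names, f)
    return path
```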

Get all the price data:

Now we have a large [list] stored as a txt file that contains all the names of each item that’s on the steam market for a given game. Now we will iterate through every item and make a more specific request about the price history for a given item. Steam will return a ton of price history data including the median sale price and number of sales (volume) for every day the item has been for sale.

First we’ll again loop through our gameList and read in our txt file with pickle.load(). Next we’ll create a Pandas dataframe. Pandas lets us build nice tables (dataframes) of data and easily do lots of math on them. At this point we are just creating a blank dataframe and labeling the columns we’ll fill in as we collect data. I put in the currRun variable to keep track of which item we’re on. Games with a lot of items can take hours (Dota 2 has over 30,000 items, mainly TI trading cards), so keeping track of progress is nice.
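Reading the names back and tracking progress might look like the sketch below. load_names and iterate_with_progress are hypothetical helpers; enumerate supplies the same running count that the currRun variable keeps.

```python
import pickle


def load_names(path):
    """Read back the pickled list of item names (binary mode, whatever the extension)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def iterate_with_progress(allItemNames):
    """Yield each item name while printing which item we're on."""
    total = len(allItemNames)
    for currRun, currItem in enumerate(allItemNames, start=1):
        print(f"item {currRun} of {total}: {currItem}")
        yield currItem
```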

Next, we will write a loop that goes through [allItemNames]. Python has a handy for loop feature when working with [lists]: we can just write for currItem in allItemNames: and currItem will take on each element of the list (our item names) in turn. In other words, currItem will be our item’s name as we loop through all the items. We need to do one more thing before we make our request to Steam. The item names are stored as normal strings, like “Punishment Mask”, and that pesky space won’t work in our URL. We need to percent-encode (URL-encode) all symbols, but not letters and numbers, before making our request: spaces become %20 and & becomes %26. For all the games I tested, this was sufficient. However, other games may use other symbols that you’ll need to compensate for. Finally, we make our requests.get() and, as before, store the response, get its .content, and parse it with json.loads, naming the resulting dict {item}.
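Rather than replacing symbols one at a time, the standard library’s urllib.parse.quote can percent-encode every character that isn’t a letter, a digit, or one of _.-~. Passing safe="" makes it encode / as well. This is a swapped-in technique rather than the original code’s approach, but it also covers symbols beyond space and &, including non-ASCII characters (encoded as UTF-8 bytes). encode_item_name is a hypothetical helper name.

```python
from urllib.parse import quote


def encode_item_name(name):
    """Percent-encode an item name for use in a URL query value."""
    return quote(name, safe="")


print(encode_item_name("Punishment Mask"))  # Punishment%20Mask
```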

At this point in the code, I’ve cobbled together some exception catches to keep the code running if an item has errors, such as a bad name or other weird things. Again, check the printed status_code for clues. The problems I saw were empty returns, and items that were listed but had never sold, so no price data came back. These cases are handled here: the program skips the problematic items and keeps on runnin’.
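The skip-the-bad-items logic can be sketched as two small functions. It assumes a usable price history response is a JSON object with a non-empty ‘prices’ list; anything else (an empty or non-object body, or an item that never sold) is treated as having no data. extract_prices and usable_items are hypothetical names.

```python
def extract_prices(item):
    """Return the item's list of price rows, or None if there is nothing usable."""
    if not isinstance(item, dict):  # empty body or an unexpected JSON shape
        return None
    prices = item.get("prices")
    if not prices:  # listed but never sold, or a failed lookup
        return None
    return prices


def usable_items(responses):
    """Yield (name, prices) for good responses, reporting and skipping the rest."""
    for name, item in responses:
        prices = extract_prices(item)
        if prices is None:
            print(f"skipping {name}: no price data returned")
            continue
        yield name, prices
```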

Everything else in the block below is a bit boring, but if you’re learning Python or programming it could be useful to go over. Mainly I’m turning the raw data into the data I want, in a more user-friendly structure. First I initialize some [lists]. Then I loop through all the itemPriceData to get the price, volume, and date for every time* there was a sale of the item. I store these in their own lists to make them easier to work with. For example, I have to convert the price and volume data from character strings into numbers to be able to do math on them.

Before I get to the math part, though, I needed to handle the dates data, which I did using the datetime library. *The problem arises because Steam returns data from older dates as a single entry for a whole day, e.g. 05-03-2017 volume 10 price $5.00, whereas for more recent dates you get more granular, hourly data. To make data analysis easier, I averaged the sale prices from the same day and summed their volumes. Once this is done, we can create a new list of “normalized” dates that is just the number of days the item has been on the market.

With the dates sorted out, I get some basic info like the max() price and the day on which it occurred (via index()). Then I did some basic math using the NumPy library, such as the overall average price, the standard deviation, and how much the price changed from day 1 (index [0]) to the most recent sale (the last value in our [list]). Whereas the first data point is at index [0], a trick in Python for accessing the last element of a list is to use index [-1]. I also computed the slope of a linear regression with the SciPy.stats library to see how the item price changed over time.
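Here is a sketch of the date clean-up and the summary numbers. It assumes each price row looks like ["Jul 02 2014 01: +0", 5.0, "10"], i.e. a date string whose first 11 characters are month, day, and year, then the price, then the volume as a string. To stay dependency-free it swaps the statistics module and a hand-written least-squares slope in for NumPy and SciPy: pstdev matches NumPy’s default (population) standard deviation, and the slope is the same quantity a linear regression such as scipy.stats.linregress reports. daily_series and summary_stats are hypothetical names.

```python
import statistics
from datetime import datetime


def daily_series(itemPriceData):
    """Collapse hourly rows into one row per day: mean price, summed volume."""
    by_day = {}  # date -> [list of prices, total volume]
    for date_str, price, volume in itemPriceData:
        day = datetime.strptime(date_str[:11], "%b %d %Y").date()
        entry = by_day.setdefault(day, [[], 0])
        entry[0].append(float(price))  # price and volume may arrive as strings
        entry[1] += int(volume)
    dates = sorted(by_day)
    prices = [statistics.mean(by_day[d][0]) for d in dates]
    volumes = [by_day[d][1] for d in dates]
    # "Normalized" dates: days since the first recorded sale
    day_numbers = [(d - dates[0]).days for d in dates]
    return day_numbers, prices, volumes


def summary_stats(day_numbers, prices):
    """Summary numbers for one item; needs at least one day of data."""
    max_price = max(prices)
    x_mean = statistics.mean(day_numbers)
    y_mean = statistics.mean(prices)
    sxx = sum((x - x_mean) ** 2 for x in day_numbers)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(day_numbers, prices))
    return {
        "max_price": max_price,
        "max_price_day": day_numbers[prices.index(max_price)],
        "mean_price": y_mean,
        "std_price": statistics.pstdev(prices),  # population std, like np.std
        "price_change": prices[-1] - prices[0],  # most recent minus first day
        "slope": sxy / sxx if sxx else 0.0,  # least-squares slope of price vs day
    }
```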

Lastly, we just need to save all of this particular item’s data. The easiest way I knew to do this was to make a {dict}, which can easily be turned into a Pandas dataframe. Then we can take the data from this one item and append it to the master allItemsPD dataframe, which stores the data from every single item as we loop through the names.
Don’t forget to let the program (and Steam’s servers) take a quick nap with time.sleep() again! Once all the items’ data have been collected, we can pickle the allItemsPD as a .pkl file so we don’t need to do this arduous data collection again (unless you need updated price information). Pickle will save your data in the Pandas dataframe format so when you load it again, all your data is still a dataframe.
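Finally, a sketch of collecting the per-item dicts and pickling the result with pandas (this one does need pandas installed). Appending each item’s dict to a plain list and building the dataframe once at the end is a swapped-in pattern: DataFrame.append was removed in pandas 2.0, so on current versions you would use this or pd.concat instead. The rows, the save_items helper, and the <app_id>PriceData.pkl file name are made up for illustration.

```python
import os
import tempfile

import pandas as pd


def save_items(rows, app_id, directory):
    """Turn a list of per-item dicts into a DataFrame and pickle it as <app_id>PriceData.pkl."""
    allItemsPD = pd.DataFrame(rows)
    path = os.path.join(directory, f"{app_id}PriceData.pkl")
    allItemsPD.to_pickle(path)
    return path


# Made-up values standing in for each item's currentItemDict
rows = []
for name, mean_price in [("item A", 1.5), ("item B", 3.0)]:
    currentItemDict = {"name": name, "mean_price": mean_price}
    rows.append(currentItemDict)  # cheap list append instead of DataFrame.append

with tempfile.TemporaryDirectory() as tmp:
    path = save_items(rows, 252490, tmp)
    loaded = pd.read_pickle(path)  # comes back as a DataFrame
print(loaded)
```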

Conclusions:

Once the program finishes you will have a dataframe with all the data we saved from currentItemDict for every item name we collected. To work with this data again just open the pickle file for a particular game and get analyzing. I always use a game’s ID in the file names to keep track of which file is which.
A second, less Python-centric post will follow with my analysis of the Rust skin market. I tried to run this on Dota 2, but there were so many items that it took many hours to get the data, and I kept getting errors, so I gave up. I did test PUBG as well and it worked fine. The code in its entirety is below and on GitHub, so you can just copy and paste it into Python and, assuming you have all the libraries installed, it should just run.

Analysis sneak peek: put all your savings into Rust skins?
