6 Mar 2019

Learning Python – Project 3: Scraping data from Steam’s Community Market

by Dr. Blake Porter | posted in: Computer Science, Learning Python, Video games

Note: As of September 30th, 2023, this code works for Counter Strike 2 with no modifications needed. The gameID for CS2 is 730, the same as CSGO.

This Python project (using the Spyder Python environment) aims to scrape price data for in-game items from Steam’s Community Market. For the uninitiated, Steam is… a lot of things. Mainly you can buy video games on Steam and easily play them with your friends. However, Steam has many other features. One is a “Workshop” where players of games can make mods or in-game item “skins” for their favorite gear. Your favorite gun looking a bit dull? There are skins for that!

Semi-Automatic Pistol — Boring default skin

Skins do not change the item/gun/armor in any way aside from its visual appearance in game. Some games will take player made skins from the workshop and introduce them to the official game. This can be done through Steam or various other means specific to a game (such as players getting special chests to unlock). However, this process is usually done using real money one way or another. For example, the developers from a game called Rust will pick a handful of Workshop skins per week and sell them on Steam for a limited time for real life money, usually $2 – 5 per skin. When players buy a skin, a portion goes to the skin’s creator, another slice to Rust, and a bit to Steam. Other games use chests players can obtain for free by playing the game, but need to purchase a key for real life money (usually $2.50) in order to open the chest and get the skin inside.

Rust Workshop page for a rocket launcher skin by player |HypershoT|

The skin economy is quite interesting from both a supply and a demand viewpoint. Continuing with the Rust example, skins are only available to purchase officially from Rust for one week. After that time, if a player wants a skin they have to buy it (with real money) from another player who has it. The supply of a skin (number of skins available) after the initial week is fixed. Other games that rely on chests either have the chest available for a limited time and/or set rarities for the items contained within chests, so some are very common while other skins may have less than a 1% chance of being found. Regardless, if you want a cool skin, this is where the Steam Community Market comes in (as opposed to Steam’s own store).

Players of a game can buy and sell their in-game items and skins on the market to other players. Again, all for real life money. Now, keep in mind for the demand side of this economy, all skins do is make your items in game look cooler so you can show off to your friends and opponents. They never make your items stronger or better. That being said, some item skins are so sought after they will regularly sell for hundreds to even thousands of real life dollars. People can make so much money off selling skins that Steam will provide you with 1099s for income tax reporting. There’s money to be had out there!

Example Rust skin (Punishment Mask) that sells for hundreds of dollars

So whether you’re looking to diversify your investment portfolio with skins, do a bit of economics research on digital cosmetic items, or just want a programming project, this post will take you through how to pull market data from Steam about any game and its items that you’d like.

The Code

So first we need some python libraries to make our lives easier. Read the comments (#) if you’re interested in what they’re doing.

import requests # make http requests
import json # make sense of what the requests return
import pickle # save our data to our computer

import pandas as pd # structure our data
import numpy as np # do a bit of math
import scipy.stats as sci # do a bit more math

from datetime import datetime # make working with dates 1000x easier 
import time # become time lords
import random # create random numbers (probably not needed)

Next we need to get our login credentials. Some requests to the Market don’t need you to have an account but others do. I don’t know of a better way to get your login credentials aside from manually finding the cookie Steam puts on your computer when you log in. Please comment below if you know a better way. First, log into Steam in your browser.
New method: for Chrome, as of March 2023, go to: more tools > developer tools > click the Application tab (top, may need to scroll to the right some), on the left panel, go to Storage > Cookies > https://store.steampowered.com/ > on the right panel find the steamLoginSecure and copy that huge block of random letters and numbers.
Old method: for Chrome (updated Feb 2021), go settings > privacy and security > cookies and other site data > see all cookies and site data (small text) > search steamcommunity.com > find “steamLoginSecure” > and finally copy the “Content” string.
Then we paste the string into Python and save it as a {dictionary} called cookie.

cookie = {'steamLoginSecure': '123451234512345%ABC%ABC%123%123456ABC12345'};

Next we need to pick the game, or games, we want to get skins from. However, it’s not as easy as putting in the game’s name (cause I’m lazy and didn’t program such a feature). Rather, we need the game’s ID in Steam. You can easily find this by going to the game’s page in Steam and looking for the number in the URL. The IDs go into a [list] like so:

gameList = ['252490','578080']; # this is for rust[0] and PUBG[1]
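
If you would rather not pick the number out of the URL by eye, a small helper can do it. This is only a sketch: the function name app_id_from_url is made up for this example, and it assumes two URL shapes, store pages that look like /app/252490/... and market links that carry appid=252490 in the query string.

```python
import re
from urllib.parse import urlparse, parse_qs

def app_id_from_url(url):
    """Return the Steam app ID found in a store or market URL, or None."""
    parsed = urlparse(url)
    # market-style links carry the ID in the query string: ...?appid=252490
    query = parse_qs(parsed.query)
    if 'appid' in query:
        return query['appid'][0]
    # store-style links carry it in the path: /app/252490/Rust/
    match = re.search(r'/app/(\d+)', parsed.path)
    return match.group(1) if match else None

gameList = [app_id_from_url('https://store.steampowered.com/app/252490/Rust/'),
            app_id_from_url('https://steamcommunity.com/market/search?appid=578080')]
print(gameList)
```

The IDs come back as strings, which is what the rest of the code expects when it glues them into URLs.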

That’s all for the setup, now we can start to get some data. This process occurs in two parts. First we ask for the names of all the items on the Community Market for a game. This is done with a simple “search” request. This request will give us item names and some basic information like each item’s current price. However, if we want more detailed information we need to use a different “pricehistory” request instead, which we can only do for one item at a time.

Get all the item names:

We’ll start first by writing a for loop to loop through all our game IDs in [gameList]. Then we’ll create an empty [list] called [allItemNames] to store all the item names. Next we’ll make our first request. Requests are the key to this program; they are how we ask Steam for information via a specially constructed URL. This initial request is only to find the total number of items on the market for a given game. Steam will only return a max of 100 items at a time (even if you make count > 100) so we need to make multiple requests to get all the item names. We’ll make a request for the webpage with “requests.get” and our search URL, then store the response as allItemsGet. We’ll then take the .content of the response. Next, we use json to parse the content of the page into something that we can easily read and store it as allItems. Finally, to complete the finding-total-number-of-items step, we’ll find the total number of items from the [‘total_count’] key in allItems and store it as totalItems.

for gameID in gameList:
    # initialize
    allItemNames = [];
    
    # find total number items
    allItemsGet = requests.get('https://steamcommunity.com/market/search/render/?search_descriptions=0&sort_column=default&sort_dir=desc&appid='+gameID+'&norender=1&count=100', cookies=cookie); # get page
    allItems = allItemsGet.content; # get page content
    
    allItems = json.loads(allItems); # convert to JSON
    totalItems = allItems['total_count']; # get total count
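
Gluing the URL together with + works, but it is easy to lose track of a parameter in a string that long. As an alternative sketch, the query string can be built from a dictionary with the standard library’s urlencode. The helper name build_search_url is an invention for this example; the parameter names are the ones from the search URL above.

```python
from urllib.parse import urlencode

SEARCH_URL = 'https://steamcommunity.com/market/search/render/'

def build_search_url(game_id, start=0, count=100):
    """Build the market search URL for one batch of items."""
    params = {
        'start': start,              # position in the item list to start from
        'count': count,              # how many items to ask for
        'search_descriptions': 0,
        'sort_column': 'default',
        'sort_dir': 'desc',
        'appid': game_id,
        'norender': 1,
    }
    return SEARCH_URL + '?' + urlencode(params)

url = build_search_url('252490')
print(url)
```

The requests library can also do this step for you: requests.get() accepts the same dictionary through its params= argument, and the response object has a .json() method that replaces the separate .content and json.loads() steps.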

Now that we have the total number of items, we need to iterate through the listings, starting from position 0, and collect the name of every item. This means making a search request for each block of items. However, I found out the hard way that items change position so often that, when stepping through in batches of 100, many items get missed between requests. We don’t want to miss these items, so we move the start position forward by 50 at a time instead of 100; each request still asks for up to 100 items, so consecutive requests overlap. We’ll create another for loop to handle this. In your search URL to Steam you can specify which position in the market’s item list to start from with the “start=x” value. We’ll repeat the above process, but this time looping through start positions in steps of 50. For each request we take the items from the ‘results’ key of the {allItems} dictionary made when we convert the returned page into json. Next, we loop through those results and pull out just the [‘hash_name’] of each item to use for our pricehistory search, appending each new name to our allItemNames list. To recap, we loop to get the basic search info for one batch of items [allItems], then loop again to get the name currItem[‘hash_name’] of each item in that batch.

You’ll also notice I have a sleep timer in there. Steam limits you from making too many requests too quickly. I couldn’t find any hard data but people seem to think the limit is 40 per minute or 200 in five minutes. I was too lazy to change the (working) code after I learned this so as it stands the program will pause, via time.sleep(), for a random time between half a second and 2.5 seconds. If it ain’t broke don’t fix it. Lastly there is a print out showing which item batch the program is on and if you’re still getting good results from Steam (code: 200).

    # you can only get 100 items at a time (despite putting in count= >100)
    # so we step the start position forward by 50 and let consecutive batches overlap so no item is missed
    for currPos in range(0,totalItems+50,50): # loop through all items
        time.sleep(random.uniform(0.5, 2.5)) # you cant make requests too quickly or steam gets mad
        
        # get item name of each
        allItemsGet = requests.get('https://steamcommunity.com/market/search/render/?start='+str(currPos)+'&count=100&search_descriptions=0&sort_column=default&sort_dir=desc&appid='+gameID+'&norender=1', cookies=cookie); # count was given twice (100 and 5000); Steam caps it at 100 anyway
        print('Items '+str(currPos)+' out of '+str(totalItems)+' code: '+str(allItemsGet.status_code)) # reassure us the code is running and we are getting good returns (code 200)
        
        allItems = allItemsGet.content;
        allItems = json.loads(allItems);
        allItems = allItems['results'];
        for currItem in allItems: 
            allItemNames.append(currItem['hash_name']) # save the names
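
To see what stepping by 50 while asking for 100 actually does, here is a small offline sketch. Nothing in it talks to Steam: fake_market and fake_fetch are stand-ins invented for this example, and the sleep timer is left out so it runs instantly.

```python
def collect_names(fetch_batch, total_items, step=50, count=100):
    """Walk the listing in overlapping windows and gather every hash_name."""
    names = []
    for start in range(0, total_items, step):
        for item in fetch_batch(start, count):
            names.append(item['hash_name'])
    return names

# stand-in for the real request: a made-up market with 230 items
fake_market = [{'hash_name': 'Item ' + str(i)} for i in range(230)]

def fake_fetch(start, count):
    """Return up to `count` items beginning at position `start`."""
    return fake_market[start:start + count]

names = collect_names(fake_fetch, total_items=len(fake_market))
print(len(names), 'names collected,', len(set(names)), 'unique')
```

Every item is seen at least once, and most are seen twice, which is the price of the overlap. As for the pause, a random wait between 0.5 and 2.5 seconds averages 1.5 seconds, which works out to about 40 requests per minute before counting the time the requests themselves take.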

Quick troubleshooting break:

It’s worth talking about these status codes and why I’m printing them. A few things could (did) go wrong at this stage. One, your cookie for Steam could expire. This cookie seems to be good for at least 24 hours, and I’m pretty sure even longer. But if you’re working on this over multiple days you may need to log in to Steam again, find your new cookie, and paste it into cookie at the top of the program. If your cookie has expired you will get code 400 for a bad request. Another thing that can go wrong is a bad search URL, which results in a failed request with code 500. Double check your item name if this happens (more on that below). If this does occur, the price data code further down has checks to handle it and keep running.
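
If you want the program to react to those codes rather than just print them, one option is a retry helper that waits longer after each failure. This is a sketch of a technique the original code does not use: get_with_retry, its parameters, and the choice to give up immediately on code 400 (since retrying will not revive an expired cookie) are all assumptions for this example. The fetch argument is any function that returns a (status_code, body) pair, so the demo below can run without touching Steam.

```python
import time

def get_with_retry(fetch, max_tries=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch() until it returns status 200 or we run out of tries.

    fetch must return a (status_code, body) pair. Returns the body on
    success and None if every try failed.
    """
    for attempt in range(max_tries):
        status, body = fetch()
        if status == 200:
            return body
        if status == 400:
            # per the notes above, 400 usually means the cookie expired
            raise RuntimeError('code 400: check your steamLoginSecure cookie')
        sleep(base_delay * 2 ** attempt)  # wait 2, 4, 8... seconds
    return None

# offline demo: two failures followed by a success
responses = iter([(500, None), (429, None), (200, {'total_count': 3})])
waits = []  # record the pauses instead of really sleeping
result = get_with_retry(lambda: next(responses), sleep=waits.append)
print(result, waits)
```

With the real thing, fetch could be a small function that calls requests.get() and returns the response’s status_code together with its parsed content.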

We just need a little more for the item names. Sometimes I was getting duplicates. I would guess it was because item positions changed as we made our requests, eg item 100 moved to position 101 between getting items 0 to 100 and 101 to 200. The overlapping batches will also return the same item more than once. To remove duplicates from a [list] in Python you can convert the [list] to a set and then back to a [list]. Lastly, we’ll use pickle to save our [allItemNames] as a txt file in your current working directory so we don’t need to run this bit of code anymore (unless new items are released). Despite the .txt extension the file is in pickle’s binary format, so read it back with pickle rather than a text editor.

    allItemNames = list(set(allItemNames))
    
    # Save all the name so we don't have to do this step anymore
    # use pickle to save all the names so i dont have to keep running above code
    with open(gameID+'ItemNames.txt', "wb") as file: # change the text file name to whatever you want
        pickle.dump(allItemNames, file)
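
One thing to know about the set trick is that it scrambles the order of the names. If you would like to keep the order the market returned them in, dict.fromkeys does the same job while preserving order. Here is a sketch with a short list of sample names and a throwaway file in the temp directory (the file name is arbitrary):

```python
import os
import pickle
import tempfile

# sample names, with one duplicate
allItemNames = ['Punishment Mask', 'Tempered AK47', 'Punishment Mask', 'Big Grin']

# dict keys are unique and keep insertion order (Python 3.7+)
uniqueNames = list(dict.fromkeys(allItemNames))

# save and reload, same as the code above but in a temp folder
path = os.path.join(tempfile.gettempdir(), 'exampleItemNames.pkl')
with open(path, 'wb') as file:
    pickle.dump(uniqueNames, file)
with open(path, 'rb') as file:
    loadedNames = pickle.load(file)

print(loadedNames)
os.remove(path)  # clean up the example file
```

What comes back from pickle.load() is a regular Python list again, ready to loop over.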

Get all the price data:

Now we have a large [list] stored as a txt file that contains all the names of each item that’s on the steam market for a given game. Now we will iterate through every item and make a more specific request about the price history for a given item. Steam will return a ton of price history data including the median sale price and number of sales (volume) for every day the item has been for sale.

First we’ll again loop through our gameList and read in our txt file with pickle.load(). Next we’ll create a Pandas dataframe. Pandas allows us to create nice tables (dataframe) of data and do lots of math on them easily. At this point we are just creating a blank dataframe and labeling our columns to add data to as we collect it. I put in the currRun variable to keep track of which item we’re on. Games with a lot of items can take hours (dota 2 has over 30,000 items, mainly TI trading cards) so keeping track of progress is nice.

for gameID in gameList:
    # open file with all names
    with open(gameID+'ItemNames.txt', "rb") as file:   # Unpickling
       allItemNames = pickle.load(file)
    
    # initialize our pandas dataframe with the data we want from each item
    allItemsPD = pd.DataFrame(data=None,index=None,columns = ['itemName','initial','timeOnMarket','priceIncrease','priceAvg','priceSD','maxPrice','maxIdx','minPrice','minIdx','swing','volAvg','volSD','slope','rr']);
    currRun = 1; # to keep track of the program running

Next, we will write a loop that goes through [allItemNames]. Python has a handy for loop feature when working with [lists] where we can just say, for currItem in allItemNames: and currItem will take the value of each element in [allItemNames] in turn. In other words, currItem will be our item’s name as we loop through all the items. We need to do one more thing before we make our request to Steam. The item names are stored as normal strings, like “Punishment Mask”. Now, that pesky space won’t work in our URL search. We need to URL-encode (percent-encode) any symbols (but not letters and numbers) before making our request. Spaces get converted to %20 and & to %26. For all the games I tested, this was sufficient. However, other games may use some other symbols that you’ll need to compensate for. Finally, we will make our requests.get() and, as before, store it, get the item.content, and parse it with json.loads, naming the resulting dict {item}.

    for currItem in allItemNames: # go through all item names
        # need to encode symbols into ASCII for http (https://www.w3schools.com/tags/ref_urlencode.asp)
        currItemHTTP = currItem.replace(' ','%20'); # convert spaces to %20
        currItemHTTP = currItemHTTP.replace('&','%26'); # convert & to %26
        # I was lazy there's probably others but I catch this below
        item = requests.get('https://steamcommunity.com/market/pricehistory/?appid='+gameID+'&market_hash_name='+currItemHTTP, cookies=cookie); # get item data
        print(str(currRun),' out of ',str(len(allItemNames))+' code: '+str(item.status_code));
        currRun += 1;
        item = item.content;
        item = json.loads(item);
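
For games whose item names use symbols beyond spaces and &, the standard library can encode everything in one go. This sketch uses urllib.parse.quote; the wrapper name encode_item_name and the second item name are made up for the example.

```python
from urllib.parse import quote

def encode_item_name(name):
    """Percent-encode an item name for use in a URL query string."""
    # safe='' means even "/" gets encoded; letters, digits and _.-~ are left alone
    return quote(name, safe='')

print(encode_item_name('Punishment Mask'))
print(encode_item_name('Salt & Pepper'))
```

This gives the same %20 and %26 as the two replace() calls, and it also covers characters like |, ( and ) or non-English letters without adding a new replace() for each one.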

At this point in the code, I’ve cobbled together some exception catches to keep the code running if an item has some errors, such as the name is bad or other weird things occur. Again, check the print out status_code for clues. Things I saw were just empty returns or there could be an item listed but there was never a sale of it so no data was returned. These things are taken care of here and the program is told to skip these problematic items and keep on runnin’.

        if item: # did we even get any data back
            itemPriceData = item['prices'] # is there price data?
            if itemPriceData == False or not itemPriceData: # if there was an issue with the request then data will return false and the for loop will just continue to the next item
                continue               # this could be cause the http item name was weird (eg symbol not converted to ASCII) but it will also occur if you make too many requests too fast (this is handled below)
            else:

Everything else in the block below is a bit boring, but if you’re learning Python or programming it could be useful to go over. Mainly I’m turning the raw data into the data that I want, in a more user-friendly structure. First I initialize some [lists]. Then I loop through all the itemPriceData to get the price, volume, and date for every entry Steam returns. I store these in their own lists to make them easier to work with. For example, I have to convert the price and volume data from character strings into numbers to be able to do math on them.

Before I get to the math part, though, I needed to handle the dates, which I did using the datetime library. The problem arises because Steam returns data from old dates as a single entry for a whole day, eg 05-03-2017 volume 10 price $5.00. However, for more recent dates you get more granular, hourly data. To make data analysis easier I averaged the sale prices from the same day and summed their volumes. Once this is done, we can create a new list of “normalized” dates that just numbers the days with recorded sales from 0 up to n.

With the dates sorted I get some basic info like the max() price and on which day (index()) it occurred. Then I did some basic math using the NumPy library, such as getting the overall average price, the standard deviation, and how much the price changed from day 1 (which is at index [0]) until the most recent sale (the last value in our [list]). Whereas the first data point is at index [0], a trick in Python to access the last element in a list is to index to [-1]. Lastly, I used the linear regression from the SciPy.stats library to get the slope of the price over time, to see how the item price changed.

                # initialize stuff
                itemPrices = []; # steam returns MEDIAN price for given time bin
                itemVol = [];
                itemDate = [];
                for currDay in itemPriceData: # pull out the actual data
                    itemPrices.append(currDay[1]); # idx 1 is price
                    itemVol.append(currDay[2]); # idx 2 is volume of items sold
                    itemDate.append(datetime.strptime(currDay[0][0:11], '%b %d %Y')) # idx 0 is the date
                
                # lists are strings, convert to numbers
                itemPrices = list(map(float, itemPrices));
                itemVol = list(map(int, itemVol));
                
                # combine sales that occurs on the same day
                # avg prices, sum volume
                # certainly not the best way to do this but, whatever
                for currDay in range(len(itemDate)-1,0,-1): # start from end (-1) and go down to index 1 so the first two entries are compared too
                    if itemDate[currDay] == itemDate[currDay-1]: # if current element's date same as the one before it
                        itemPrices[currDay-1] = np.mean([itemPrices[currDay],itemPrices[currDay-1]]); # average prices from the two days
                        itemVol[currDay-1] = np.sum([itemVol[currDay],itemVol[currDay-1]]); # sum volume
                        # delete the repeats
                        del itemDate[currDay] 
                        del itemVol[currDay] 
                        del itemPrices[currDay]
                
                # now that days are combined
                normTime = list(range(0,len(itemPrices))); # create a new list that "normalizes" days from 0 to n, easier to work with than datetime
                
                # some basic data
                timeOnMarket = (datetime.today()-itemDate[0]).days; # have to do this because if sales are sparse day[0] could be months/years ago
                priceIncrease = itemPrices[-1] -itemPrices[0]; # what was the price increase from day 0 to the most recent day [-1]
                maxPrice = max(itemPrices); # max price
                maxIdx = itemPrices.index(maxPrice); # when was the max price?
                minPrice = min(itemPrices);
                minIdx = itemPrices.index(minPrice);
                swing = maxPrice - minPrice; # greatest price swing
                
                # get some descriptive stats
                itemPriceAvg = np.mean(itemPrices); # average price
                if len(itemPrices) > 1: # make sure there is at least two days of sales
                    itemPriceInitial = itemPrices[1] - itemPrices[0]; # how much did the price jump from day 0 to 1? eg the first trading day
                else:
                    itemPriceInitial = itemPrices[0];
                itemVolAvg = np.mean(itemVol);
                
                itemPriceSD = np.std(itemPrices);
                itemVolSD = np.std(itemVol);
                
                
                # linear regression to find slope and fit
                fitR = sci.linregress(normTime,itemPrices); # slope intercept rvalue pvalue stderr
                RR = float(fitR[2]**2); # convert to R^2 value
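
The same-day merging is the fiddliest part of that block, so here is an alternative sketch of just that step. It groups the rows by calendar day first and only then takes the mean, instead of averaging neighbours two at a time. The function name combine_by_day and the sample rows are made up; the rows only imitate the [date, price, volume] shape the code above reads.

```python
from datetime import datetime

def combine_by_day(priceData):
    """Collapse [date string, price, volume] rows into one row per calendar day.

    Returns three parallel lists: dates, mean price per day, total volume per day.
    """
    days = {}  # dicts keep insertion order, so days stay in the order received
    for dateText, price, volume in priceData:
        day = datetime.strptime(dateText[0:11], '%b %d %Y')
        days.setdefault(day, {'prices': [], 'volume': 0})
        days[day]['prices'].append(float(price))
        days[day]['volume'] += int(volume)
    dates = list(days)
    prices = [sum(d['prices']) / len(d['prices']) for d in days.values()]
    volumes = [d['volume'] for d in days.values()]
    return dates, prices, volumes

# made-up rows: one daily entry, then three entries on the same day
sample = [
    ['Mar 01 2019 01: +0', 5.00, '10'],
    ['Mar 02 2019 01: +0', 6.00, '4'],
    ['Mar 02 2019 04: +0', 7.00, '2'],
    ['Mar 02 2019 07: +0', 8.00, '6'],
]
dates, prices, volumes = combine_by_day(sample)
print(prices, volumes)
```

For March 2nd this gives a plain mean of 7.0. Averaging the entries pair by pair from the end, as the loop above does, would give 6.75 for the same rows, because the later entries get averaged more than once. Whether that difference matters depends on what you want to do with the numbers.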

Lastly, we just need to save all of this particular item’s data. The easiest way I knew how to do this was to make a {dict} which could then be easily turned into a Pandas dataframe. Then we can take the data from this one item and append it to the master allItemsPD dataframe, which will store the data from every single item as we loop through the names.
Don’t forget to let the program (and Steam’s servers) take a quick nap with time.sleep() again! Once all the items’ data have been collected, we can pickle the allItemsPD as a .pkl file so we don’t need to do this arduous data collection again (unless you need updated price information). Pickle will save your data in the Pandas dataframe format so when you load it again, all your data is still a dataframe.

                
                
                # save data 
                currentItemDict = {'itemName':currItem,'initial':itemPriceInitial,'timeOnMarket':timeOnMarket,'priceIncrease':priceIncrease,'priceAvg':itemPriceAvg,'priceSD':itemPriceSD,'maxPrice':maxPrice,'maxIdx':maxIdx,'minPrice':minPrice,'minIdx':minIdx,'swing':swing,'volAvg':itemVolAvg,'volSD':itemVolSD,'slope':fitR[0],'rr':RR}
                currItemPD = pd.DataFrame(currentItemDict,index=[0]);
                allItemsPD = pd.concat([allItemsPD, currItemPD], ignore_index=True); # DataFrame.append was removed in pandas 2.0, concat does the same job
                
                
                time.sleep(random.uniform(0.5, 2.5))
        else:
            continue
    print('All item data collected for game '+gameID)

    # save the dataframe (indented so it runs once per game, not just for the last game in the list)
    allItemsPD.to_pickle(gameID+'PriceData.pkl');
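
Growing a dataframe one row at a time works, but it gets slow for games with tens of thousands of items. A common alternative, sketched here with made-up values and only three of the columns, is to collect plain dictionaries in a list and build the dataframe once at the end.

```python
import pandas as pd

rows = []  # one dict per item, collected inside the loop

# made-up numbers standing in for the values computed for each item
for name, avg, slope in [('Item A', 2.5, 0.01), ('Item B', 40.0, -0.2)]:
    rows.append({'itemName': name, 'priceAvg': avg, 'slope': slope})

# build the dataframe in one go once every item has been processed
allItemsPD = pd.DataFrame(rows, columns=['itemName', 'priceAvg', 'slope'])
print(allItemsPD.sort_values('priceAvg', ascending=False))
```

The result can be saved with to_pickle() exactly as above.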

Conclusions:

Once the program finishes you will have a dataframe with all the data we saved from currentItemDict for every item name we collected. To work with this data again just open the pickle file for a particular game and get analyzing. I always use a game’s ID in the file names to keep track of which file is which.
A second, less Python-centric post will follow with my analysis of the Rust skin market. I tried to run this on Dota 2 but there were so many items it took many hours to get the data and I kept getting errors so I gave up. I did test PUBG as well and it worked fine. The code in its entirety is below and on GitHub so you can just copy+paste it into Python and, assuming you have all the libraries installed, it should just run.

Analysis sneak peek; put all your savings into Rust skins?

Update August 11th 2019: So sorry! The widget I was using to display code was converting the code to HTML. Mainly, this caused the “&” to turn into “& amp;”. This made the search URL incorrect and caused errors, especially on line 64. I have fixed this now. But please make sure the URLs on lines 46, 59, and 100 only have “&” in the string and not “& amp;”. This is kinda funny since it’s the opposite of the problem we solve on lines 97 and 98! If you are getting other errors always try the GitHub code as a plan B. https://github.com/blakeporterneuro/steamMarket

# -*- coding: utf-8 -*-
"""
Created on Thu Feb 28 22:53:32 2019

@author: Blake Porter
www.blakeporterneuro.com

This code is made available with the CC BY-NC-SA 4.0 license 
https://creativecommons.org/licenses/by-nc-sa/4.0/

This script is broken into three parts
1) Item name collection
    • Get the name of every item on the steam market place for a given game
2) Item price data collection
    • Loop through the item names to get the price history for each item
3) Data analysis
"""


import requests # make http requests
import json # make sense of what the requests return
import pickle # save our data to our computer

import pandas as pd # structure our data
import numpy as np # do a bit of math
import scipy.stats as sci # do a bit more math

from datetime import datetime # make working with dates 1000x easier 
import time # become time lords
import random # create random numbers (probably not needed)

import matplotlib.pyplot as plt # plot data
import seaborn as sns # make plotted data pretty

# Login to steam on your browser and get your steam login cookie 
# For Chrome, settings > advanced > content settings > cookies > see all cookies and site data > find steamcommunity.com > find "steamLoginSecure" > copy the "Content" string and paste below
cookie = {'steamLoginSecure': '123451234512345%ABC%ABC%123%123456ABC12345'};

# gameList as a string or list of strings 
# rust, 252490, dota2, 570; CSGO, 730; pubg, 578080; TF2, 440; 
# you can find the app id by going to the community market and finding the appid=##### in the URL
gameList = ['252490','578080'];

for gameID in gameList:
    # initialize
    allItemNames = [];
    
    # find total number items
    allItemsGet = requests.get('https://steamcommunity.com/market/search/render/?search_descriptions=0&sort_column=default&sort_dir=desc&appid='+gameID+'&norender=1&count=100', cookies=cookie); # get page
    allItems = allItemsGet.content; # get page content
    
    allItems = json.loads(allItems); # convert to JSON
    totalItems = allItems['total_count']; # get total count
    
    
    # you can only get 100 items at a time (despite putting in count= >100)
    # so we step the start position forward by 50 and let consecutive batches overlap so no item is missed
    for currPos in range(0,totalItems+50,50): # loop through all items
        time.sleep(random.uniform(0.5, 2.5)) # you cant make requests too quickly or steam gets mad
        
        # get item name of each
        allItemsGet = requests.get('https://steamcommunity.com/market/search/render/?start='+str(currPos)+'&count=100&search_descriptions=0&sort_column=default&sort_dir=desc&appid='+gameID+'&norender=1', cookies=cookie); # count was given twice (100 and 5000); Steam caps it at 100 anyway
        print('Items '+str(currPos)+' out of '+str(totalItems)+' code: '+str(allItemsGet.status_code)) # reassure us the code is running and we are getting good returns (code 200)
        
        allItems = allItemsGet.content;
        allItems = json.loads(allItems);
        allItems = allItems['results'];
        for currItem in allItems: 
            allItemNames.append(currItem['hash_name']) # save the names
            
        
    # remove dupes by converting from list to set and back again
    allItemNames = list(set(allItemNames))
    
    # Save all the name so we don't have to do this step anymore
    # use pickle to save all the names so i dont have to keep running above code
    with open(gameID+'ItemNames.txt', "wb") as file: # change the text file name to whatever you want
        pickle.dump(allItemNames, file)
    

    """
    ~*~*~*~*~*~*~*~*~*~*~*~
    
    Step 2: Data collection
    
    ~*~*~*~*~*~*~*~*~*~*~*~
    don't forget to import libraries if you start here
    """
for gameID in gameList:
    # open file with all names
    with open(gameID+'ItemNames.txt', "rb") as file:   # Unpickling
       allItemNames = pickle.load(file)
    
    # initialize our pandas dataframe with the data we want from each item
    allItemsPD = pd.DataFrame(data=None,index=None,columns = ['itemName','initial','timeOnMarket','priceIncrease','priceAvg','priceSD','maxPrice','maxIdx','minPrice','minIdx','swing','volAvg','volSD','slope','rr']);
    currRun = 1; # to keep track of the program running
    
    for currItem in allItemNames: # go through all item names
        # need to encode symbols into ASCII for http (https://www.w3schools.com/tags/ref_urlencode.asp)
        currItemHTTP = currItem.replace(' ','%20'); # convert spaces to %20
        currItemHTTP = currItemHTTP.replace('&','%26'); # convert & to %26
        # I was lazy there's probably others but I catch this below
        item = requests.get('https://steamcommunity.com/market/pricehistory/?appid='+gameID+'&market_hash_name='+currItemHTTP, cookies=cookie); # get item data
        print(str(currRun),' out of ',str(len(allItemNames))+' code: '+str(item.status_code));
        currRun += 1;
        item = item.content;
        item = json.loads(item);
        if item: # did we even get any data back
            itemPriceData = item['prices'] # is there price data?
            if itemPriceData == False or not itemPriceData: # if there was an issue with the request then data will return false and the for loop will just continue to the next item
                continue               # this could be cause the http item name was weird (eg symbol not converted to ASCII) but it will also occur if you make too many requests too fast (this is handled below)
            else:
                # initialize stuff
                itemPrices = []; # steam returns MEDIAN price for given time bin
                itemVol = [];
                itemDate = [];
                for currDay in itemPriceData: # pull out the actual data
                    itemPrices.append(currDay[1]); # idx 1 is price
                    itemVol.append(currDay[2]); # idx 2 is volume of items sold
                    itemDate.append(datetime.strptime(currDay[0][0:11], '%b %d %Y')) # idx 0 is the date
                
                # lists are strings, convert to numbers
                itemPrices = list(map(float, itemPrices));
                itemVol = list(map(int, itemVol));
                
                # combine sales that occurs on the same day
                # avg prices, sum volume
                # certainly not the best way to do this but, whatever
                for currDay in range(len(itemDate)-1,0,-1): # start from end (-1) and go down to index 1 so the first two entries are compared too
                    if itemDate[currDay] == itemDate[currDay-1]: # if current element's date same as the one before it
                        itemPrices[currDay-1] = np.mean([itemPrices[currDay],itemPrices[currDay-1]]); # average prices from the two days
                        itemVol[currDay-1] = np.sum([itemVol[currDay],itemVol[currDay-1]]); # sum volume
                        # delete the repeats
                        del itemDate[currDay] 
                        del itemVol[currDay] 
                        del itemPrices[currDay]
                
                # now that days are combined
                normTime = list(range(0,len(itemPrices))); # create a new list that "normalizes" days from 0 to n, easier to work with than datetime
                
                # some basic data
                timeOnMarket = (datetime.today()-itemDate[0]).days; # have to do this because if sales are sparse day[0] could be months/years ago
                priceIncrease = itemPrices[-1] -itemPrices[0]; # what was the price increase from day 0 to the most recent day [-1]
                maxPrice = max(itemPrices); # max price
                maxIdx = itemPrices.index(maxPrice); # when was the max price?
                minPrice = min(itemPrices);
                minIdx = itemPrices.index(minPrice);
                swing = maxPrice - minPrice; # greatest price swing
                
                # get some descriptive stats
                itemPriceAvg = np.mean(itemPrices); # average price
                if len(itemPrices) > 1: # make sure there is at least two days of sales
                    itemPriceInitial = itemPrices[1] - itemPrices[0]; # how much did the price jump from day 0 to 1? eg the first trading day
                else:
                    itemPriceInitial = itemPrices[0];
                itemVolAvg = np.mean(itemVol);
                
                itemPriceSD = np.std(itemPrices);
                itemVolSD = np.std(itemVol);
                
                
                # linear regression to find slope and fit
                fitR = sci.linregress(normTime,itemPrices); # slope intercept rvalue pvalue stderr
                RR = float(fitR[2]**2); # convert to R^2 value
                
                
                # save data 
                currentItemDict = {'itemName':currItem,'initial':itemPriceInitial,'timeOnMarket':timeOnMarket,'priceIncrease':priceIncrease,'priceAvg':itemPriceAvg,'priceSD':itemPriceSD,'maxPrice':maxPrice,'maxIdx':maxIdx,'minPrice':minPrice,'minIdx':minIdx,'swing':swing,'volAvg':itemVolAvg,'volSD':itemVolSD,'slope':fitR[0],'rr':RR}
                currItemPD = pd.DataFrame(currentItemDict,index=[0]);
                allItemsPD = pd.concat([allItemsPD, currItemPD], ignore_index=True); # DataFrame.append was removed in pandas 2.0
                
                
                time.sleep(random.uniform(0.5, 2.5))
        else:
            continue
print('All item data collected')

# save the dataframe
allItemsPD.to_pickle(gameID+'PriceData.pkl');


"""
~*~*~*~*~*~*~*~*~*~*~*~

Step 3: Data analysis

~*~*~*~*~*~*~*~*~*~*~*~
    don't forget to import libraries if you start here
"""
gameID = '578080';
# open file with all names
with open(gameID+'PriceData.pkl', "rb") as file:   # Unpickling
   data = pickle.load(file)
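
The same-day merge in the collection loop is flagged in its own comment as "certainly not the best way to do this". A minimal alternative sketch, using only the standard library, is below. The function name `combine_same_day` and the sample values are made up for illustration. One difference in behavior: the loop averages entries pairwise, so a day with three or more entries ends up weighted toward the earliest one, whereas this sketch takes a plain mean of all entries for the day.

```python
from statistics import mean


def combine_same_day(dates, prices, volumes):
    """Collapse entries that share a date: mean price, summed volume."""
    grouped = {}  # dicts keep insertion order, so days stay chronological
    for day, price, vol in zip(dates, prices, volumes):
        day_prices, day_vols = grouped.setdefault(day, ([], []))
        day_prices.append(price)
        day_vols.append(vol)
    days = list(grouped)
    avg_prices = [mean(grouped[d][0]) for d in days]
    total_vols = [sum(grouped[d][1]) for d in days]
    return days, avg_prices, total_vols


# Made-up sample: three sales entries on the first day, one on the second
sample_dates = ["2019-03-01", "2019-03-01", "2019-03-01", "2019-03-02"]
sample_prices = [1.0, 2.0, 6.0, 4.0]
sample_vols = [10, 5, 5, 7]

days, avg_prices, total_vols = combine_same_day(sample_dates, sample_prices, sample_vols)
print(days, avg_prices, total_vols)
```

The three returned lists could stand in for itemDate, itemPrices, and itemVol before normTime is built.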

51 Responses

Jackreacher

May 21, 2019 | Reply

Does it also contain frequency of items sold?
- Dr. Blake Porter
  
  May 21, 2019 | Reply
  
  Hi Jackreacher,
  Yes, the pricehistory request will give you the median sale price and volume (number of items sold) every day. For closer time points (within the last few weeks) it will give more temporal details (eg median sale price and volume every 3 hours).
  Cheers,
  Blake
  - Vaibhav Barot
    
    May 21, 2019 | Reply
    
    Thanks for replying. One more question: will this work with trading cards? I wish to store card names of some 1000-2000 games in a database, and what is the request limit? (Since I am not requesting from the Steam API)
    
    Dr. Blake Porter
    
    May 21, 2019 | Reply
    
    Absolutely, any “thing” on the steam community market, trading cards, loot boxes, keys, wallpapers. You will need to do a bit of extra work after you collect all the item names from Part 1 to parse the names to only cards, though.
    
    You can certainly store any number of store card names on your computer. However, if you are using my method here to ask steam for the names or to get the price history, you will need to adhere to steam’s policy and API request limit which works out to be a request every 2.5 seconds.
    
    Vaibhav
    
    June 18, 2019 |
    
    How do I get info only on the games I want? Right now, if I want to get the card prices of a particular game, I have to first get info on every item on Steam (which is 3 lakh+) and then search through it. Writing appID = the gameid I want does not work; it just returns all Steam items.
    
    Dr. Blake Porter
    
    June 18, 2019 |
    
    I’m not sure about the specific part of the code you are referring to. You need to have your game ID stored in the gameList variable. You should NOT set gameID to anything. gameID is looping through gameList. What game are you trying to get info for?
    
    Vaibhav
    
    June 18, 2019 |
    
    For example consider this game – https://steamcommunity.com/market/search?category_753_Game%5B%5D=tag_app_899440&category_753_cardborder%5B%5D=tag_cardborder_0&category_753_item_class%5B%5D=tag_item_class_2&appid=753
    
    I want to get quantity sold for all these cards , and many more games like this. Keeping gameid – 899440 wont work, I will have to keep gameid 753 and market_hash_name = name of these cards
    
    Dr. Blake Porter
    
    June 18, 2019 |
    
    There appears to be an issue with Steam’s market place and this game. The easiest way to test is to make the same request in your browser, eg https://steamcommunity.com/market/search?q=&appid=899440
    
    The Steam market doesn’t return any items. When this happens in Python, the request just returns everything on the Market. I think this is more of a Steam issue. You may be able to get around it by writing your own search URLs (you will need to replace the ones in my Python code with one that works, like the one you posted here). I don’t know if it will work, but it should be your next option.
    
    Vaibhav
    
    June 18, 2019 |
    
    Yes in the above url if i change tag_app_ + gameid, I can get list of all cards of any game I want. But I was interested in quantity sold of each of those cards,so I found another url –
    http://steamcommunity.com/market/pricehistory/?country=DE&currency=3&appid=440&market_hash_name=Specialized%20Killstreak%20Brass%20Beast
    
    Here I will have to change country,currency,appid=753, and market_hash_name would be gameid + card_name
    
    But I don’t know whether making this many requests, for every card of every game, would be possible or not.
    And in your code at line 59, add &norender=1; otherwise it returns an HTML page and you get the error KeyError: ‘results’
    
    Dr. Blake Porter
    
    June 18, 2019 |
    
    Nice! It is certainly possible just make sure you keep the delay in the code. I collected all the items for Dota 2, CSGO, PUBG, and Rust which is over 100,000 items. Had no issues
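
A minimal sketch of how the pricehistory URL from this thread could be assembled, assuming the four query parameters visible in the URL posted above (country, currency, appid, market_hash_name). The helper name `build_pricehistory_url` and its defaults are made up for illustration, and the sketch only builds the string; it does not send a request.

```python
from urllib.parse import quote, urlencode

PRICE_HISTORY_URL = "https://steamcommunity.com/market/pricehistory/"


def build_pricehistory_url(app_id, market_hash_name, country="DE", currency=3):
    """Return a pricehistory URL with the item name percent-encoded."""
    params = {
        "country": country,
        "currency": currency,
        "appid": app_id,
        "market_hash_name": market_hash_name,
    }
    # quote_via=quote encodes spaces as %20 (the default would use +)
    return PRICE_HISTORY_URL + "?" + urlencode(params, quote_via=quote)


url = build_pricehistory_url(440, "Specialized Killstreak Brass Beast")
print(url)
```

Whatever builds the URL, the delay between requests still applies to every call made with it.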
Jeremie

May 26, 2019 | Reply

i get an error with [‘results’]

Items 0 out of 311823 code: 200
Traceback (most recent call last):
File “steamanalyst.py”, line 50, in
allItems = allItems[‘results’];
KeyError: ‘results’
- Dr. Blake Porter
  
  May 26, 2019 | Reply
  
  what is the contents of allItems ? There should be a results field
Carter

July 11, 2019 | Reply

When I attempt to run it I get the error, “name ‘data’ is not defined”. I also get the error, “TypeError: ‘NoneType’ object is not subscriptable”. Let me know if you need more information!
- Dr. Blake Porter
  
  July 12, 2019 | Reply
  
  Hey Carter sorry to hear that. It would be good to know what exact line number that error is coming from as I can’t figure out from your comment where it would be.
Sergey Grigorenko

December 21, 2019 | Reply

How I can get the current price of each object. If you have a file with data on the current price (or on some day in the last months), could you send it by mail?
- Dr. Blake Porter
  
  December 22, 2019 | Reply
  
  If you want to get the current price of items you will need to run this program constantly, or however often you want for “current”. I do not have these as I have not run this program since I wrote the post in March.
Taddy

August 7, 2020 | Reply

Hey there!

I believe I have run the “steamMarket_dataCollection.py” successfully but am confused by the first few lines of the following program. it shows several .pkl files that were not created using the steamMarket_dataCollection.py

# open file with all names
with open(gameID+'PriceData.pkl', "rb") as file: # Unpickling
    data = pickle.load(file)

with open(gameID+'PriceData_3.pkl', "rb") as file: # Unpickling
    combo = pickle.load(file)

with open(gameID+'PriceData_2.pkl', "rb") as file: # Unpickling
    combo2 = pickle.load(file)

with open(gameID+'marketDelta.pkl', "rb") as file: # Unpickling
    marketDelta = pickle.load(file)

with open(gameID+'marketPrice.pkl', "rb") as file: # Unpickling
    marketPrice = pickle.load(file)

with open(gameID+'marketVol.pkl', "rb") as file: # Unpickling
    marketVol = pickle.load(file)

I get this traceback when trying to run:
Traceback (most recent call last):
File “steam_mrkt_analysis.py”, line 36, in
with open(gameID+’PriceData_3.pkl’, “rb”) as file: # Unpickling
FileNotFoundError: [Errno 2] No such file or directory: ‘252490PriceData_3.pkl’

Were these files you already had? Or did I misuse the first part of this program? I am using Python 3.8.

Thanks !!
- Dr. Blake Porter
  
  August 7, 2020 | Reply
  
  Hi Taddy,
  Yes, those are some other data I collected (they’re how I made the stock-market-like index funds). steamMarket_dataAnalysis.py is really just a template for people who want to analyze their data, to give them ideas.
  
  All those lines are doing is loading the data you saved at the end of “steamMarket_dataCollection.py”. Whatever you named your files there, with allItemsPD.to_pickle(gameID+’PriceData.pkl’); is what you should then load, e.g.
  
  gameID = '578080';
  # open file with all names
  with open(gameID+'PriceData.pkl', "rb") as file: # Unpickling
      data = pickle.load(file)
  
  I just have multiple saves so I have appended ‘_3’ onto the file names, which is what you see in that error.
Kobe

January 14, 2021 | Reply

hi, do u have some kind of discord where could ask somethings?

thanks!
MagnusSapiens

February 22, 2021 | Reply

Hi Blake, I get the below error, any ideas?

On line 86, in

allItems = allItems[‘results’];
TypeError: ‘NoneType’ object is not subscriptable
- Dr. Blake Porter
  
  February 23, 2021 | Reply
  
  Hi Magnus,
  Sorry to hear you’re having issues.
  Can you please tell me what the print out (line 80) says? Specifically, there should be a part that says code: ###.
  
  If everything is working, you should see code: 200. If there is a code other than 200, please let me know what it is so I can help you figure out the issue.
  
  Also, before the code: ###, you should see “Items X out of Y”. These numbers should make sense, such as Items 100 out of 1000 or something. If it says Items 0 out of 0, there was an issue finding the names of all the items on the market.
  
  Please copy and paste what line 80 is printing out when you run the program.
  
  Cheers,
  Blake
  - Peder
    
    December 6, 2021 | Reply
    
    Hi there!
    
    Albeit late, I am working with your code now. Thank you for sharing.
    
    I am getting the same error and it is due to code “429”, which I see is too many requests 🙂
    
    Dr. Blake Porter
    
    December 7, 2021 | Reply
    
    Hi Peder,
    Yeah I used my code again a few weeks ago and it seems like Steam has gotten more strict. I changed the sleep time to a solid 3 seconds and it seems fine, though it is much slower.
    
    Peder
    
    December 10, 2021 |
    
    Hello again!
    
    Thanks for the swift reply. I feel like I tried both 5 and 10 seconds and still got the error. Will try again.
    
    Thank you!
    
    Peder
    
    December 10, 2021 |
    
    EDIT: I realised that I should try updating the cookie, which I’ve seen you note. However, since I was able to do 20-30 loops without issue, it did not occur to me as a cause for concern.
    
    Guess what. I updated my cookie, went with 4 seconds (possibly too high) and it worked.
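
The 429 responses in this thread were handled by lengthening the fixed sleep and refreshing the cookie. A minimal sketch of a retry loop with an increasing delay is below. It is not part of the original script: `fetch_with_backoff`, the `fetch` callable, and `FakeResponse` are hypothetical names, and a stand-in response object is used so the example runs without contacting Steam.

```python
import time


def fetch_with_backoff(fetch, max_tries=5, base_delay=3.0, sleep=time.sleep):
    """Call fetch() until the status is not 429 or the tries run out."""
    delay = base_delay
    response = None
    for attempt in range(max_tries):
        response = fetch()
        if response.status_code != 429:
            return response
        if attempt < max_tries - 1:
            sleep(delay)  # wait before retrying
            delay *= 2    # double the wait after every 429
    return response  # still 429 after max_tries attempts


class FakeResponse:
    """Stand-in for a real HTTP response; only carries a status code."""

    def __init__(self, status_code):
        self.status_code = status_code


# Simulate two rate-limited replies followed by a success
codes = iter([429, 429, 200])
waits = []  # record the delays instead of actually sleeping
result = fetch_with_backoff(lambda: FakeResponse(next(codes)), sleep=waits.append)
print(result.status_code, waits)
```

In the real script, `fetch` would wrap the existing request call. A longer wait will not help if the cookie has expired, which is what turned out to be the cause here.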
MagnusSapiens

February 22, 2021 | Reply

Hi Blake,

I’m getting the following error, any ideas?
j0nez

May 15, 2021 | Reply

Hey there,
is there a way to filter the items I get from a game?
E.g.: I only want to get CSGO Stickers from 2020 in AllItemsGet and not every CSGO Item

Thanks!^^
- Dr. Blake Porter
  
  May 15, 2021 | Reply
  
  There’s a few ways to do it. Either in step 1 you only save stickers in your itemNames.txt file, or when you get to step 2 you do something like ignore anything that isn’t a sticker.
  
  Both will require something like an if statement that looks for stickers. This is very easy to do since every sticker just has “sticker” in its name. You can do something like:
  
  if currItem.str.contains(‘sticker’)
  - j0nez
    
    May 15, 2021 | Reply
    
    But it would still have to go through every item. I wanted to use a filter to save time while crawling.
    Is that possible?
    
    Dr. Blake Porter
    
    May 15, 2021 | Reply
    
    Yup, so you can just do it in step 1 when you get the name of every item. Only save ones that say sticker in the name.
    
    j0nez
    
    May 15, 2021 |
    
    ok I put it in line 66 and get the error:
    AttributeError: ‘dict’ object has no attribute ‘str’
    
    Dr. Blake Porter
    
    May 15, 2021 |
    
    Sorry, my exact example would only work in part 2. But you’re close and have the right idea.
    
    For part one, add in the following if line at line 66. I have included the preceding and following lines so it’s clearer where it goes:
    
    for currItem in allItems:
        if "sticker" in currItem['hash_name'].lower(): # only save names with sticker in them
            allItemNames.append(currItem['hash_name'])
    
    Dr. Blake Porter
    
    May 15, 2021 |
    
    Sorry, formatting doesn’t work well in comments. Here is the code: https://www.blakeporterneuro.com/wp-content/uploads/2021/05/codeStickersOnly.png
    
    j0nez
    
    May 15, 2021 |
    
    ok i got one last question.
    if want to check if the name contains “sticker” and “2020”, is that:
    
    if “sticker” and “2020” in currItem …
    
    or would i have to do a second if statement for that?
    
    Dr. Blake Porter
    
    May 15, 2021 |
    
    Nooo sorry, Python doesn’t work like that. You will need to use all() (or any() if matching either word is enough), which gets a little more complicated.
    
    see here: https://stackoverflow.com/questions/3389574/check-if-multiple-strings-exist-in-another-string
    
    j0nez
    
    May 15, 2021 |
    
    ok so this should work:
    if all(x in currItem[‘hash_name’].lower() for x in (“sticker”, “2020”)):
    
    j0nez
    
    May 15, 2021 |
    
    Alright I got it to work.
    Thank you so much for helping me.^^
Dr. Blake Porter

May 15, 2021 | Reply

Awesome! glad to hear
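
To illustrate why the first attempt in this thread does not behave as intended: Python reads `"sticker" and "2020" in name` as `"sticker" and ("2020" in name)`, and a non-empty string is always truthy, so only the "2020" check matters. The sketch below contrasts that with the `all()` form that was settled on. The item names are made up for illustration.

```python
def has_all(name, words):
    """True only if every word appears in the lowercased name."""
    lowered = name.lower()
    return all(word in lowered for word in words)


# Hypothetical item names, for illustration only
sticker_2020 = "Sticker | Example Team | 2020"
rifle_2020 = "Example Rifle | 2020 Edition"

# The naive test only checks for "2020", so the rifle wrongly passes
naive_rifle = bool("sticker" and "2020" in rifle_2020.lower())

wanted = ("sticker", "2020")
print(naive_rifle, has_all(rifle_2020, wanted), has_all(sticker_2020, wanted))
```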
caterpie

February 6, 2022 | Reply

no matter what i do it still says module not named requests, I have tried everything from reinstalling the module, reinstalling pip, resetting my PATH, venv, conda, no matter what I do it will not run
ryan ryan

August 17, 2022 | Reply

hey can i have your contact ? like telegram or something , i need something similar but from a csgo store other than steam . i would like to talk to you .
- Dr. Blake Porter
  
  August 17, 2022 | Reply
  
  Hi Ryan,
  My email is blakeporterneuro@gmail.com where you can contact me.
  My expertise is in the Steam store, I’m not sure I can do something similar with another store/site.
  Cheers,
  Blake
Dennis Startsev

February 16, 2023 | Reply

Is there a way to do this not for all items, but rather for a specific one? For example ★ Butterfly Knife | Case Hardened (Well-worn). Maybe skip to the second step and make it parse the data from the link, as in this case (https://steamcommunity.com/market/listings/730/★%20Butterfly%20Knife%20%7C%20Case%20Hardened%20%28Well-Worn%29). I am very much a beginner in coding, so I don’t know what options we have.
- Dr. Blake Porter
  
  February 16, 2023 | Reply
  
  Hi Dennis,
  This is already what the code is doing, going one item at a time and getting its data. That is the only way to get historic skin data like price and volume. But this request is “nested” in a for loop (for currItem in allItemNames) that goes through all the item names I previously collected in Step 1. For your example, currItem would be “★ Butterfly Knife | Case Hardened (Well-worn)”. All you would need to do to get one item at a time is get rid of the for loop part and manually set the currItem variable to your skin of interest.
  
  If you’re not familiar with how for loops work, I would recommend using something like Codecademy (https://www.codecademy.com/learn/learn-python-3) to learn.
  Cheers,
  Blake
- Paul Ferrante
  
  March 12, 2023 | Reply
  
  Hey Dennis,
  
  I am trying to do the same thing, and I’m struggling as well. Would you possibly wish to collaborate? If so my email is paulferrante18@gmail.com.
  Best of luck!
Den

February 16, 2023 | Reply

Is there a way to do the data scraping for one item only? For example https://steamcommunity.com/market/listings/730/★%20Butterfly%20Knife%20%7C%20Case%20Hardened%20%28Well-Worn%29, it is ★ Butterfly Knife | Case Hardened (Well-Worn); its hash name is ★%20Butterfly%20Knife%20%7C%20Case%20Hardened%20%28Well-Worn%29.
So I was thinking, instead of collecting all item names or even just a category like sticker (as Dr. Blake explained above), to search for this item (or any other item) by using its hash name and then just continue from step 2 as usual.
I will appreciate any help, Dr. Blake!
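
A minimal sketch of the single-item route asked about here, assuming the last path segment of a market listing URL is the percent-encoded item name (as it is in the URL above) and that step 2 loops over `allItemNames`, as described in the earlier reply. The helper name `hash_name_from_listing_url` is made up for illustration.

```python
from urllib.parse import quote, unquote, urlsplit


def hash_name_from_listing_url(listing_url):
    """Decode the item name from the last path segment of a listing URL."""
    last_segment = urlsplit(listing_url).path.rsplit("/", 1)[-1]
    return unquote(last_segment)


listing_url = (
    "https://steamcommunity.com/market/listings/730/"
    "★%20Butterfly%20Knife%20%7C%20Case%20Hardened%20%28Well-Worn%29"
)

item_name = hash_name_from_listing_url(listing_url)
allItemNames = [item_name]     # step 2 would then loop over just this one item
encoded = quote(item_name)     # fully percent-encoded form, for use in a URL
print(item_name)
print(encoded)
```

The decoded name is the plain item title; the `%20` form is only how that name is written inside a URL.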
Frederik Vrålstad

December 31, 2023 | Reply

Hey,

Thanks for this guide, it was really helpful!

However, while implementing your code, I encountered an issue related to the deprecation of the append method in the latest version of Pandas. I found a solution on Stack Overflow that addresses this concern and provides a more efficient alternative.

In your code, where you use allItemsPD.append(currItemPD, ignore_index=True), the latest versions of Pandas have deprecated the append method. Instead, it’s recommended to collect DataFrames in a list and concatenate them afterward.

Here’s the improved part of the code:

# Before the loop, initialize an empty list
lst = []

for currItem in allItemNames:
    # … (your existing loop code)

    # Inside the loop, collect the new row (currItemPD) in the list
    lst.append(currItemPD)

# After the loop, concatenate all DataFrames in the list
allItemsPD = pd.concat(lst, ignore_index=True)

I found a detailed discussion about this on Stack Overflow: https://stackoverflow.com/questions/75956209/error-dataframe-object-has-no-attribute-append
- Dr. Blake Porter
  
  April 10, 2024 | Reply
  
  Hi Frederik,
  Thank you so much for your comment and fix!
  Cheers,
  Blake
Andreeesss

June 23, 2024 | Reply

Hey!

I have the following question: I made a small parser to collect data on prices and requests, I want to do flexible sorting by types of items, their quality and rarity, can I somehow get this data from Steam? Or will I have to collect them manually?
- Dr. Blake Porter
  
  July 19, 2024 | Reply
  
  Hi Andres,
  Sorry I didn’t respond sooner, I didn’t get a notification of your comment.
  
  It really depends on the game and how much you can leverage the item name. For example, in CS2 you have the wear rating in the title of the item, so you can parse those. Other things, like rarity, could be inferred by the volume of trades. Otherwise, you’ll need to do a lot manually. Basically if it isn’t returned from the pricehistory request, you’ll need to get it manually or find a way to estimate it.
  Hope that helps,
  Blake
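
A minimal sketch of the name parsing suggested in this reply, assuming item names follow the pattern seen elsewhere in these comments: parts separated by " | " and an optional wear rating in trailing parentheses. The function name `split_item_name` and the second sample name are made up for illustration, and other games may format names differently.

```python
import re

# Trailing "(...)" at the end of the name, e.g. "(Well-Worn)"
WEAR_PATTERN = re.compile(r"\s*\(([^()]+)\)\s*$")


def split_item_name(name):
    """Split a market name into item, skin, and wear (None when absent)."""
    match = WEAR_PATTERN.search(name)
    wear = match.group(1) if match else None
    base = name[:match.start()] if match else name
    parts = [part.strip() for part in base.split("|")]
    return {
        "item": parts[0],
        "skin": parts[1] if len(parts) > 1 else None,
        "wear": wear,
    }


knife = split_item_name("★ Butterfly Knife | Case Hardened (Well-Worn)")
plain = split_item_name("Sticker | Example")  # hypothetical name with no wear rating
print(knife)
print(plain)
```

The resulting fields could be added as columns next to the price data, which makes sorting by item type or wear straightforward. Rarity is not in the name and would still need to be estimated or collected separately.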
Costiinha

December 1, 2024 | Reply

Hi, nice work
Considering the four simple strategies that you mentioned in the repo, which one seems to be more profitable and “stable”?
I know it depends… but thinking in percentages how can you distribute between these methods?
And which one you mostly use?
