Learning Python – Project 3: Scraping data from Steam’s Community Market

Note: As of September 30th, 2023, this code works for Counter Strike 2 with no modifications needed. The gameID for CS2 is 730, the same as CSGO.

This Python project (using the Spyder Python environment) aims to scrape data from Steam’s Community Market for price data on in-game video game items. For the uninitiated, Steam is… a lot of things. Mainly you can buy video games on Steam and easily play them with your friends. However, Steam has many other features. One is a “Workshop” where players of games can make mods or in-game item “skins” for their favorite gear. Your favorite gun looking a bit dull? There’s a skin for that!


Semi-Automatic Pistol: the boring default skin vs. the Sunrise SAP, an awesome rad skin

Skins do not change the item/gun/armor in any way aside from its visual appearance in game. Some games will take player-made skins from the Workshop and introduce them to the official game. This can be done through Steam or various other means specific to a game (such as players getting special chests to unlock). However, this process usually involves real money one way or another. For example, the developers of a game called Rust will pick a handful of Workshop skins per week and sell them on Steam for a limited time for real life money, usually $2 – 5 per skin. When players buy a skin, a portion goes to the skin’s creator, another slice to Rust, and a bit to Steam. Other games use chests players can obtain for free by playing the game, but players need to purchase a key for real life money (usually $2.50) in order to open the chest and get the skin inside.

Rust Workshop page for a rocket launcher skin by player |HypershoT|

However, the skin economy is quite interesting from both a supply and demand viewpoint. Continuing with the Rust example, skins are only available to purchase officially from Rust for one week. After that time, if a player wants a skin they have to buy it (with real money) from another player who has it. The supply of a skin (number of skins available) after the initial week is fixed. Other games that rely on chests either have the chest available for a limited time and/or set rarities for the items contained within chests, so some are very common while other skins may have less than a 1% chance of being found. Regardless, if you want a cool skin, this is where the Steam Community Market comes in (as opposed to Steam’s own store).

Steam Community Market

Players of a game can buy and sell their in-game items and skins on the Market to other players. Again, all for real life money. Now, keep in mind for the demand side of this economy, all skins do is make your items in game look cooler so you can show off to your friends and opponents. They never make your items stronger or better. That being said, some item skins are so sought after they will regularly sell for hundreds or even thousands of real life dollars. People can make so much money off selling skins that Steam will provide you with 1099s for income tax reporting. There’s money to be had out there!

Example Rust skin (Punishment Mask) that sells for hundreds of dollars

So whether you’re looking to diversify your investment portfolio with skins, do a bit of economics research on digital cosmetic items, or just want a programming project, this post will take you through how to pull market data from Steam about any game and its items that you’d like.


The Code

So first we need some Python libraries to make our lives easier. Read the comments (#) if you’re interested in what each one is doing.

import requests # make http requests
import json # make sense of what the requests return
import pickle # save our data to our computer

import pandas as pd # structure our data
import numpy as np # do a bit of math
import scipy.stats as sci # do a bit more math

from datetime import datetime # make working with dates 1000x easier 
import time # become time lords
import random # create random numbers (probably not needed)

Next we need to get our login credentials. Some requests to the Market don’t require an account, but others do. I don’t know of a better way to get your login credentials aside from manually finding the cookie Steam puts on your computer when you log in. Please comment below if you know a better way. In your browser, log into Steam.
New method: for Chrome, as of March 2023, go to: more tools > developer tools > click the Application tab (top, may need to scroll to the right some), on the left panel, go to Storage > Cookies > https://store.steampowered.com/ > on the right panel find the steamLoginSecure and copy that huge block of random letters and numbers.
Old method: for Chrome (updated Feb 2021), go settings > privacy and security > cookies and other site data > see all cookies and site data (small text) > search steamcommunity.com > find “steamLoginSecure” > and finally copy the “Content” string.
Then we paste the string into Python and save it as a {dictionary} called cookie.

cookie = {'steamLoginSecure': '123451234512345%ABC%ABC%123%123456ABC12345'};

Next we need to get the game, or games, we want to get skins from. However, it’s not as easy as putting in the game’s name (cause I’m lazy and didn’t program such a feature). Rather, we need the game’s ID in Steam. You can easily find this by going to the game’s page in Steam, looking at the URL, and finding some numbers, like so:

gameList = ['252490','578080']; # this is for rust[0] and PUBG[1]
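As an aside, if you’d rather not eyeball the URL, a few lines of Python can pull the ID out for you. This little helper is my own sketch (not part of the project’s code) and just assumes the ID is the run of digits after “app/” in store URLs or “appid=” in market URLs:

```python
import re

def appid_from_url(url):
    # grab the digits after "app/" (store URLs) or "appid=" (market URLs)
    match = re.search(r'(?:app/|appid=)(\d+)', url)
    return match.group(1) if match else None

print(appid_from_url('https://store.steampowered.com/app/252490/Rust/'))        # 252490
print(appid_from_url('https://steamcommunity.com/market/search?appid=578080'))  # 578080
```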

That’s all for the setup, now we can start to get some data. This process occurs in two parts. First we ask for the names of all the items on the Community Market for a game. This is done with a simple “search” request. This request will give us item names and some basic information like each item’s current price. However, if we want more detailed information we need to use a different “pricehistory” search instead, which we can only do for one item at a time.

Get all the item names:                   

We’ll start first by writing a for loop to loop through all our game IDs in [gameList]. Then we’ll create an empty [List] called [allItemNames] to store all the item names. Next we’ll make our first request. Requests are the key to this program: they are how we ask Steam for information via a specially constructed URL. This initial request is only to find the total number of items on the market for a given game. Steam will only return a max of 100 items at a time (even if you make count > 100) so we need to make multiple requests to get all the item names. We’ll make a request for the webpage with “requests.get” and our search URL, then store it as allItemsGet. We’ll then use the requests library again to get the .content of the page. Next, we use json to parse the content of the page into something that we can easily read and store it as allItems. Finally, to complete the finding-total-number-of-items step, we’ll find the total number of items from the [‘total_count’] key in allItems and store it as totalItems.

for gameID in gameList:
    # initialize
    allItemNames = [];
    
    # find total number items
    allItemsGet = requests.get('https://steamcommunity.com/market/search/render/?search_descriptions=0&sort_column=default&sort_dir=desc&appid='+gameID+'&norender=1&count=100', cookies=cookie); # get page
    allItems = allItemsGet.content; # get page content
    
    allItems = json.loads(allItems); # convert to JSON
    totalItems = allItems['total_count']; # get total count

Now we have the total number of items; next we need to iterate through all the items listed, starting from 0, and get the names of all of these items. This means making a search request for each batch of items. However, I found out the hard way that items change position so often that between searches of 100-item batches, many items get missed. We don’t want to miss these items, so we loop in smaller batches (50) instead of 100. To do this we’ll create another for loop. In your search URL to Steam you can specify which position in the item list on the market you want to start with the “start=x” value. We’ll repeat the above process but this time looping through in batches of 50. Furthermore, we’ll get all 50 items’ names from the ‘results’ key in the {allItems} dictionary made when we convert the returned page into json. Next, we’ll loop through all the items in our allItems {dictionary} and pull out just the [‘hash_name’] to use for our pricehistory search. We’ll then keep appending our allItemNames list with the new names. To recap, we loop to get the basic search info for 50 items [allItems], then loop again to get the name currItem[‘hash_name’] of each of those 50 items. You’ll also notice I have a sleep timer in there. Steam limits you from making too many requests too quickly. I couldn’t find any hard data, but people seem to think the limit is 40 per minute or 200 in five minutes. I was too lazy to change the (working) code after I learned this, so as it stands the program will pause, via time.sleep(), for a random time between half a second and 2.5 seconds. If it ain’t broke, don’t fix it. Lastly, there is a printout showing which item batch the program is on and whether you’re still getting good results from Steam (code: 200).

    # you can only get 100 items at a time (despite putting in count= >100)
    # items also shift position between requests, so we loop in overlapping batches of 50, specifying the start position, to get every single item name
    for currPos in range(0,totalItems+50,50): # loop through all items
        time.sleep(random.uniform(0.5, 2.5)) # you cant make requests too quickly or steam gets mad
        
    # get item name of each
        allItemsGet = requests.get('https://steamcommunity.com/market/search/render/?start='+str(currPos)+'&count=100&search_descriptions=0&sort_column=default&sort_dir=desc&appid='+gameID+'&norender=1', cookies=cookie);
        print('Items '+str(currPos)+' out of '+str(totalItems)+' code: '+str(allItemsGet.status_code)) # reassure us the code is running and we are getting good returns (code 200)
        
        allItems = allItemsGet.content;
        allItems = json.loads(allItems);
        allItems = allItems['results'];
        for currItem in allItems: 
            allItemNames.append(currItem['hash_name']) # save the names
Quick troubleshooting break:

It’s worth talking about these status codes and why I’m printing them. A few things could (and did) go wrong at this stage. One, your cookie for Steam could expire. However, this cookie seems to be good for at least 24 hours, and I’m pretty sure even longer. But if you’re working on this over multiple days you may need to re-login to Steam, find your new cookie, and paste it into cookie at the top of the program. If your cookie expired you will get code 400 for a bad request. Another problem is a bad search URL, which results in a server error, code 500. Double check your item name if this happens (more on that below). However, if this does occur, the code has exceptions to handle it and keep running.
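If you’d rather the program react to bad codes instead of just printing them, one option is a small retry wrapper. This is purely a sketch of the idea, not part of the original program; get_with_retry and its back-off times are made up, and you’d hand it requests.get and your cookie dict:

```python
import time
import random

def get_with_retry(getter, url, cookie, tries=3):
    # getter is whatever callable makes the request (eg requests.get);
    # passing it in keeps this sketch testable without hitting Steam
    for attempt in range(tries):
        resp = getter(url, cookies=cookie)
        if resp.status_code == 200:
            return resp  # good return, hand it back
        # 400 often means an expired cookie, 500 a bad URL; wait and try again
        time.sleep(random.uniform(0.5, 2.5) * (attempt + 1))
    return None  # caller should skip this item and move on
```

You could then swap a plain requests.get(…) call for get_with_retry(requests.get, url, cookie) and skip the item whenever it returns None.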

One last thing for the item names: sometimes I was getting duplicates. I would guess it was because item positions changed as we made our requests, eg item 100 moved to position 101 between getting items 0 to 100 and 101 to 200. To remove duplicates in a [list] in Python you can set a [list] to a set then back to a [list]. Lastly, we’ll use pickle to save our [allItemNames] as a txt file in your current working directory so we don’t need to run this bit of code anymore (unless new items are released).

    allItemNames = list(set(allItemNames))
    
    # Save all the names so we don't have to do this step anymore
    # use pickle to save all the names so i dont have to keep running the above code
    with open(gameID+'ItemNames.txt', "wb") as file: # change the text file name to whatever you want
        pickle.dump(allItemNames, file)

Get all the price data:

Now we have a large [list] stored as a txt file that contains the names of every item on the Steam market for a given game. Next we will iterate through every item and make a more specific request about its price history. Steam will return a ton of price history data, including the median sale price and number of sales (volume) for every day the item has been for sale.

First we’ll again loop through our gameList and read in our txt file with pickle.load(). Next we’ll create a Pandas dataframe. Pandas allows us to create nice tables (dataframes) of data and do lots of math on them easily. At this point we are just creating a blank dataframe and labeling our columns to add data to as we collect it. I put in the currRun variable to keep track of which item we’re on. Games with a lot of items can take hours (Dota 2 has over 30,000 items, mainly TI trading cards) so keeping track of progress is nice.

for gameID in gameList:
    # open file with all names
    with open(gameID+'ItemNames.txt', "rb") as file:   # Unpickling
       allItemNames = pickle.load(file)
    
    # initialize our Pandas dataframe with the data we want from each item
    allItemsPD = pd.DataFrame(data=None,index=None,columns = ['itemName','initial','timeOnMarket','priceIncrease','priceAvg','priceSD','maxPrice','maxIdx','minPrice','minIdx','swing','volAvg','volSD','slope','rr']);
    currRun = 1; # to keep track of the program running

Next, we will write a loop that goes through [allItemNames]. Python has a handy for loop feature when working with [lists] where we can just say for currItem in allItemNames: and currItem will iterate through all of [allItemNames] as each element in the list (our item names). In other words, currItem will be our item’s name as we loop through all the items. We need to do one more thing before we make our request to Steam. The item names are stored as normal strings, like “Punishment Mask”. Now, that pesky space won’t work in our URL search. We need to percent-encode all symbols (but not letters and numbers) before making our request. Spaces get converted to %20 and & to %26. For all the games I tested, this was sufficient. However, other games may use some other symbols that you’ll need to compensate for. Finally, we will make our requests.get(), and as before, store it, get the item.content, and turn it into json.loads while naming the resulting dict {item}.

    for currItem in allItemNames: # go through all item names
        # need to encode symbols into ASCII for http (https://www.w3schools.com/tags/ref_urlencode.asp)
        currItemHTTP = currItem.replace(' ','%20'); # convert spaces to %20
        currItemHTTP = currItemHTTP.replace('&','%26'); # convert & to %26
        # I was lazy there's probably others but I catch this below
        item = requests.get('https://steamcommunity.com/market/pricehistory/?appid='+gameID+'&market_hash_name='+currItemHTTP, cookies=cookie); # get item data
        print(str(currRun),' out of ',str(len(allItemNames))+' code: '+str(item.status_code));
        currRun += 1;
        item = item.content;
        item = json.loads(item);
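As an aside, the standard library can handle the symbol conversion for every reserved character at once, so you don’t have to hand-list replacements. A sketch with a made-up item name:

```python
from urllib.parse import quote

# quote() percent-encodes every reserved symbol, not just spaces and '&';
# safe='' makes sure '/' inside an item name gets encoded too
currItemHTTP = quote('Box Trap & Friends #1', safe='')
print(currItemHTTP)  # Box%20Trap%20%26%20Friends%20%231
```

quote(currItem, safe='') would be a drop-in replacement for the two .replace() calls above.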

At this point in the code, I’ve cobbled together some exception catches to keep the code running if an item has some errors, such as a bad name or other weird things. Again, check the printed status_code for clues. Things I saw were empty returns, or an item that was listed but never sold, so no data was returned. These cases are taken care of here and the program is told to skip the problematic items and keep on runnin’.

        if item: # did we even get any data back
            itemPriceData = item.get('prices') # is there price data? (.get avoids a KeyError if the request failed and 'prices' is missing)
            if not itemPriceData: # if there was an issue with the request then data will return false and the for loop will just continue to the next item
                continue               # this could be cause the http item name was weird (eg symbol not converted) but it will also occur if you make too many requests too fast (this is handled below)
            else:

Everything else in the below block is a bit boring, but if you’re learning Python or programming it could be useful to go over. Mainly I’m turning all the raw data into data that I want and into a more user-friendly structure. First I initialize some [lists]. Then I loop through all the itemPriceData to get the price, volume, and date for every time* there was a sale of the item. I store these all in their own lists to make them easier to work with. For example, I have to convert the price and volume data from character strings into numbers to be able to do math on them. Before I get to the math part, though, I needed to handle the dates data. I did this using the datetime library. *The problem arises because Steam will return data from old dates as a single entry for a whole day, eg 05-03-2017 volume 10 price $5.00. However, for more recent dates you get more granular, hourly data. To make data analysis easier I averaged the sale prices from the same days and summed their volumes. Once this is done, we can create a new list of “normalized” dates that is just the number of days the item has been on the market. With the dates thing sorted, I get some basic info like the max() price and on which day (index()) it occurred. Then I did some basic math using the NumPy library, such as getting the overall average price, the standard deviation, and how much the price changed from day 1 (which is indexed to [0]) until the most recent sale (the last value in our [list]). Whereas the first data point is at index [0], a trick in Python to access the last element in a list is to index to [-1]. Other things included the slope of a linear regression from the SciPy.stats library to see how the item price changed over time.

                # initialize stuff
                itemPrices = []; # steam returns MEDIAN price for given time bin
                itemVol = [];
                itemDate = [];
                for currDay in itemPriceData: # pull out the actual data
                    itemPrices.append(currDay[1]); # idx 1 is price
                    itemVol.append(currDay[2]); # idx 2 is volume of items sold
                    itemDate.append(datetime.strptime(currDay[0][0:11], '%b %d %Y')) # idx 0 is the date
                
                # lists are strings, convert to numbers
                itemPrices = list(map(float, itemPrices));
                itemVol = list(map(int, itemVol));
                
                # combine sales that occur on the same day
                # avg prices, sum volume
                # certainly not the best way to do this but, whatever
                for currDay in range(len(itemDate)-1,0,-1): # start from the end and walk back to index 1, comparing each day to the one before it
                    if itemDate[currDay] == itemDate[currDay-1]: # if current element's date same as the one before it
                        itemPrices[currDay-1] = np.mean([itemPrices[currDay],itemPrices[currDay-1]]); # average prices from the two days
                        itemVol[currDay-1] = np.sum([itemVol[currDay],itemVol[currDay-1]]); # sum volume
                        # delete the repeats
                        del itemDate[currDay] 
                        del itemVol[currDay] 
                        del itemPrices[currDay]
                
                # now that days are combined
                normTime = list(range(0,len(itemPrices))); # create a new list that "normalizes" days from 0 to n, easier to work with than datetime
                
                # some basic data
                timeOnMarket = (datetime.today()-itemDate[0]).days; # have to do this because if sales are sparse day[0] could be months/years ago
                priceIncrease = itemPrices[-1] -itemPrices[0]; # what was the price increase from day 0 to the most recent day [-1]
                maxPrice = max(itemPrices); # max price
                maxIdx = itemPrices.index(maxPrice); # when was the max price?
                minPrice = min(itemPrices);
                minIdx = itemPrices.index(minPrice);
                swing = maxPrice - minPrice; # greatest price swing
                
                # get some descriptive stats
                itemPriceAvg = np.mean(itemPrices); # average price
                if len(itemPrices) > 1: # make sure there is at least two days of sales
                    itemPriceInitial = itemPrices[1] - itemPrices[0]; # how much did the price jump from day 0 to 1? eg the first trading day
                else:
                    itemPriceInitial = itemPrices[0];
                itemVolAvg = np.mean(itemVol);
                
                itemPriceSD = np.std(itemPrices);
                itemVolSD = np.std(itemVol);
                
                
                # linear regression to find slope and fit
                fitR = sci.linregress(normTime,itemPrices); # slope intercept rvalue pvalue stderr
                RR = float(fitR[2]**2); # convert to R^2 value
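For what it’s worth, the same-day combining above can also be done in a couple of lines with pandas’ groupby. This sketch uses made-up toy data; note that, unlike the pairwise loop, groupby takes the true mean when a day has more than two entries:

```python
import pandas as pd

# toy rows mimicking Steam's finer-grained recent data:
# two sales on May 03 that should collapse into one daily row
sales = pd.DataFrame({
    'date':   ['May 03 2019', 'May 03 2019', 'May 04 2019'],
    'price':  [5.00, 7.00, 6.50],
    'volume': [10, 4, 8],
})

# average price and sum volume within each day, keeping the original order
daily = (sales.groupby('date', sort=False)
              .agg({'price': 'mean', 'volume': 'sum'})
              .reset_index())
print(daily)
```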

Lastly, we just need to save all of this particular item’s data. The easiest way I knew how to do this was to make a {dict} which could then be easily turned into a Pandas dataframe. Then we could take the data from this one item and append it to the master allItemsPD dataframe, which will store the data from every single item as we loop through the names.
Don’t forget to let the program (and Steam’s servers) take a quick nap with time.sleep() again! Once all the items’ data have been collected, we can pickle the allItemsPD as a .pkl file so we don’t need to do this arduous data collection again (unless you need updated price information). Pickle will save your data in the Pandas dataframe format, so when you load it again, all your data is still a dataframe.

                
                
                # save data 
                currentItemDict = {'itemName':currItem,'initial':itemPriceInitial,'timeOnMarket':timeOnMarket,'priceIncrease':priceIncrease,'priceAvg':itemPriceAvg,'priceSD':itemPriceSD,'maxPrice':maxPrice,'maxIdx':maxIdx,'minPrice':minPrice,'minIdx':minIdx,'swing':swing,'volAvg':itemVolAvg,'volSD':itemVolSD,'slope':fitR[0],'rr':RR}
                currItemPD = pd.DataFrame(currentItemDict,index=[0]);
                allItemsPD = pd.concat([allItemsPD, currItemPD], ignore_index=True); # DataFrame.append was removed in pandas 2.0; concat does the same job
                
                
                time.sleep(random.uniform(0.5, 2.5))
        else:
            continue
    # save the dataframe (inside the gameID loop so each game gets its own file)
    allItemsPD.to_pickle(gameID+'PriceData.pkl');
print('All item data collected')

Conclusions:

Once the program finishes you will have a dataframe with all the data we saved from currentItemDict for every item name we collected. To work with this data again just open the pickle file for a particular game and get analyzing. I always use a game’s ID in the file names to keep track of which file is which.
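Here’s a quick sketch of that round trip using toy stand-in data (your real file would be named like 252490PriceData.pkl):

```python
import pandas as pd

# toy stand-in for a real gameID+'PriceData.pkl' file
demo = pd.DataFrame({'itemName': ['Punishment Mask', 'Sunrise SAP'],
                     'priceAvg': [250.0, 1.5],
                     'slope': [0.8, -0.01]})
demo.to_pickle('demoPriceData.pkl')

# read_pickle hands back the dataframe exactly as it was saved
data = pd.read_pickle('demoPriceData.pkl')
print(data.sort_values('slope', ascending=False)[['itemName', 'slope']])
```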
A second, less Python-centric post will follow with my analysis of the Rust skin market. I tried to run this on Dota 2 but there were so many items it took many hours to get the data, and I kept getting errors, so I gave up. I did test PUBG as well and it worked fine. The code in its entirety is below and on GitHub so you can just copy+paste it into Python and, assuming you have all the libraries installed, it should just run.

Analysis sneak peek; put all your savings into Rust skins?

Update August 11th 2019: So sorry! The widget I was using to display code was converting the code to HTML. Mainly, this caused the “&” to turn into “& amp;”. This made the search URLs incorrect and caused errors, esp on line 64. I have fixed this now. But please make sure the URLs on lines 46, 59, and 100 only have “&” in the string and not “& amp;”. This is kinda funny since it’s the opposite problem we solve on lines 97 and 98! If you are getting other errors always try the GitHub code as a plan B. https://github.com/blakeporterneuro/steamMarket

# -*- coding: utf-8 -*-
"""
Created on Thu Feb 28 22:53:32 2019

@author: Blake Porter
www.blakeporterneuro.com

This code is made available with the CC BY-NC-SA 4.0 license 
https://creativecommons.org/licenses/by-nc-sa/4.0/

This script is broken into three parts
1) Item name collection
    • Get the name of every item on the steam market place for a given game
2) Item price data collection
    • Loop through the item names to get the price history for each item
3) Data analysis
"""


import requests # make http requests
import json # make sense of what the requests return
import pickle # save our data to our computer

import pandas as pd # structure our data
import numpy as np # do a bit of math
import scipy.stats as sci # do a bit more math

from datetime import datetime # make working with dates 1000x easier 
import time # become time lords
import random # create random numbers (probably not needed)

import matplotlib.pyplot as plt # plot data
import seaborn as sns # make plotted data pretty

# Login to steam on your browser and get your steam login cookie 
# For Chrome (as of March 2023): more tools > developer tools > Application tab > Storage > Cookies > https://store.steampowered.com/ > copy the steamLoginSecure value and paste below
cookie = {'steamLoginSecure': '123451234512345%ABC%ABC%123%123456ABC12345'};

# gameList as a string or list of strings 
# rust, 252490, dota2, 570; CSGO, 730; pubg, 578080; TF2, 440; 
# you can find the app id by going to the community market and finding the appid=##### in the URL
gameList = ['252490','578080'];

for gameID in gameList:
    # initialize
    allItemNames = [];
    
    # find total number items
    allItemsGet = requests.get('https://steamcommunity.com/market/search/render/?search_descriptions=0&sort_column=default&sort_dir=desc&appid='+gameID+'&norender=1&count=100', cookies=cookie); # get page
    allItems = allItemsGet.content; # get page content
    
    allItems = json.loads(allItems); # convert to JSON
    totalItems = allItems['total_count']; # get total count
    
    
    # you can only get 100 items at a time (despite putting in count= >100)
    # items also shift position between requests, so we loop in overlapping batches of 50, specifying the start position, to get every single item name
    for currPos in range(0,totalItems+50,50): # loop through all items
        time.sleep(random.uniform(0.5, 2.5)) # you cant make requests too quickly or steam gets mad
        
    # get item name of each
        allItemsGet = requests.get('https://steamcommunity.com/market/search/render/?start='+str(currPos)+'&count=100&search_descriptions=0&sort_column=default&sort_dir=desc&appid='+gameID+'&norender=1', cookies=cookie);
        print('Items '+str(currPos)+' out of '+str(totalItems)+' code: '+str(allItemsGet.status_code)) # reassure us the code is running and we are getting good returns (code 200)
        
        allItems = allItemsGet.content;
        allItems = json.loads(allItems);
        allItems = allItems['results'];
        for currItem in allItems: 
            allItemNames.append(currItem['hash_name']) # save the names
            
        
    # remove dupes by converting from list to set and back again
    allItemNames = list(set(allItemNames))
    
    # Save all the names so we don't have to do this step anymore
    # use pickle to save all the names so i dont have to keep running the above code
    with open(gameID+'ItemNames.txt', "wb") as file: # change the text file name to whatever you want
        pickle.dump(allItemNames, file)
    

    """
    ~*~*~*~*~*~*~*~*~*~*~*~
    
    Step 2: Data collection
    
    ~*~*~*~*~*~*~*~*~*~*~*~
    don't forget to import libraries if you start here
    """
for gameID in gameList:
    # open file with all names
    with open(gameID+'ItemNames.txt', "rb") as file:   # Unpickling
       allItemNames = pickle.load(file)
    
    # initialize our Pandas dataframe with the data we want from each item
    allItemsPD = pd.DataFrame(data=None,index=None,columns = ['itemName','initial','timeOnMarket','priceIncrease','priceAvg','priceSD','maxPrice','maxIdx','minPrice','minIdx','swing','volAvg','volSD','slope','rr']);
    currRun = 1; # to keep track of the program running
    
    for currItem in allItemNames: # go through all item names
        # need to encode symbols into ASCII for http (https://www.w3schools.com/tags/ref_urlencode.asp)
        currItemHTTP = currItem.replace(' ','%20'); # convert spaces to %20
        currItemHTTP = currItemHTTP.replace('&','%26'); # convert & to %26
        # I was lazy there's probably others but I catch this below
        item = requests.get('https://steamcommunity.com/market/pricehistory/?appid='+gameID+'&market_hash_name='+currItemHTTP, cookies=cookie); # get item data
        print(str(currRun),' out of ',str(len(allItemNames))+' code: '+str(item.status_code));
        currRun += 1;
        item = item.content;
        item = json.loads(item);
        if item: # did we even get any data back
            itemPriceData = item.get('prices') # is there price data? (.get avoids a KeyError if the request failed and 'prices' is missing)
            if not itemPriceData: # if there was an issue with the request then data will return false and the for loop will just continue to the next item
                continue               # this could be cause the http item name was weird (eg symbol not converted) but it will also occur if you make too many requests too fast (this is handled below)
            else:
                # initialize stuff
                itemPrices = []; # steam returns MEDIAN price for given time bin
                itemVol = [];
                itemDate = [];
                for currDay in itemPriceData: # pull out the actual data
                    itemPrices.append(currDay[1]); # idx 1 is price
                    itemVol.append(currDay[2]); # idx 2 is volume of items sold
                    itemDate.append(datetime.strptime(currDay[0][0:11], '%b %d %Y')) # idx 0 is the date
                
                # lists are strings, convert to numbers
                itemPrices = list(map(float, itemPrices));
                itemVol = list(map(int, itemVol));
                
                # combine sales that occur on the same day
                # avg prices, sum volume
                # certainly not the best way to do this but, whatever
                for currDay in range(len(itemDate)-1,0,-1): # start from the end and walk back to index 1, comparing each day to the one before it
                    if itemDate[currDay] == itemDate[currDay-1]: # if current element's date same as the one before it
                        itemPrices[currDay-1] = np.mean([itemPrices[currDay],itemPrices[currDay-1]]); # average prices from the two days
                        itemVol[currDay-1] = np.sum([itemVol[currDay],itemVol[currDay-1]]); # sum volume
                        # delete the repeats
                        del itemDate[currDay] 
                        del itemVol[currDay] 
                        del itemPrices[currDay]
                
                # now that days are combined
                normTime = list(range(0,len(itemPrices))); # create a new list that "normalizes" days from 0 to n, easier to work with than datetime
                
                # some basic data
                timeOnMarket = (datetime.today()-itemDate[0]).days; # have to do this because if sales are sparse day[0] could be months/years ago
                priceIncrease = itemPrices[-1] -itemPrices[0]; # what was the price increase from day 0 to the most recent day [-1]
                maxPrice = max(itemPrices); # max price
                maxIdx = itemPrices.index(maxPrice); # when was the max price?
                minPrice = min(itemPrices);
                minIdx = itemPrices.index(minPrice);
                swing = maxPrice - minPrice; # greatest price swing
                
                # get some descriptive stats
                itemPriceAvg = np.mean(itemPrices); # average price
                if len(itemPrices) > 1: # make sure there is at least two days of sales
                    itemPriceInitial = itemPrices[1] - itemPrices[0]; # how much did the price jump from day 0 to 1? eg the first trading day
                else:
                    itemPriceInitial = itemPrices[0];
                itemVolAvg = np.mean(itemVol);
                
                itemPriceSD = np.std(itemPrices);
                itemVolSD = np.std(itemVol);
                
                
                # linear regression to find slope and fit
                fitR = sci.linregress(normTime,itemPrices); # slope intercept rvalue pvalue stderr
                RR = float(fitR[2]**2); # convert to R^2 value
                
                
                # save data 
                currentItemDict = {'itemName':currItem,'initial':itemPriceInitial,'timeOnMarket':timeOnMarket,'priceIncrease':priceIncrease,'priceAvg':itemPriceAvg,'priceSD':itemPriceSD,'maxPrice':maxPrice,'maxIdx':maxIdx,'minPrice':minPrice,'minIdx':minIdx,'swing':swing,'volAvg':itemVolAvg,'volSD':itemVolSD,'slope':fitR[0],'rr':RR}
                currItemPD = pd.DataFrame(currentItemDict,index=[0]);
                allItemsPD = pd.concat([allItemsPD, currItemPD], ignore_index=True); # DataFrame.append was removed in pandas 2.0
                
                
                time.sleep(random.uniform(0.5, 2.5))
        else:
            continue
print('All item data collected')

# save the dataframe
allItemsPD.to_pickle(gameID+'PriceData.pkl');
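As an aside, the same-day combining loop above (which, as its comment admits, is "certainly not the best way") can be done in a few lines with a pandas groupby. A sketch with made-up stand-in values for itemDate/itemPrices/itemVol:

```python
import pandas as pd

# Hypothetical stand-in data: two sales records fall on the same day
itemDate   = ["Jan 01 2020", "Jan 01 2020", "Jan 02 2020"]
itemPrices = [1.0, 3.0, 5.0]
itemVol    = [10, 20, 30]

sales = pd.DataFrame({"date": itemDate, "price": itemPrices, "vol": itemVol})

# Average price and total volume per day; sort=False keeps first-seen day order
daily = (sales.groupby("date", sort=False)
              .agg(price=("price", "mean"), vol=("vol", "sum"))
              .reset_index())
print(daily)  # Jan 01 collapses to price 2.0, vol 30
```

This also handles three or more sales on one day correctly, whereas pairwise averaging in the loop gives an average of averages.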


"""
~*~*~*~*~*~*~*~*~*~*~*~

Step 3: Data analysis

~*~*~*~*~*~*~*~*~*~*~*~
    don't forget to import libraries if you start here
"""
gameID = '578080';
# open file with all names
with open(gameID+'PriceData.pkl', "rb") as file:   # Unpickling
   data = pickle.load(file)
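What you analyze from here is up to you. As one minimal sketch (using hypothetical stand-in rows with the same columns built in Step 2; the 0.5 R² threshold is arbitrary), you could keep items whose linear fit is decent and rank them by fitted slope:

```python
import pandas as pd

# Hypothetical stand-in for the unpickled dataframe; the real columns come
# from the currentItemDict built in Step 2 (itemName, slope, rr, priceAvg, ...)
data = pd.DataFrame({
    "itemName": ["Sunrise SAP", "Boring Crate", "Rad Rocket"],
    "priceAvg": [2.50, 0.10, 12.00],
    "slope":    [0.05, -0.01, 0.30],
    "rr":       [0.90, 0.20, 0.75],
})

# Keep items whose linear fit explains most of the price variance (arbitrary
# 0.5 threshold), then rank the steadiest climbers by fitted slope
trending = data[data["rr"] > 0.5].sort_values("slope", ascending=False)
print(trending[["itemName", "slope"]])
```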
   

48 Responses

    • Dr. Blake Porter

      Hi Jackreacher,
      Yes, the pricehistory request will give you the median sale price and volume (number of items sold) every day. For closer time points (within the last few weeks) it will give more temporal details (eg median sale price and volume every 3 hours).
      Cheers,
      Blake

      • Vaibhav Barot

        Thanks for replying. One more question: will this work with trading cards? I wish to store the card names of some 1000-2000 games in a database, and what is the request limit? (Since I am not requesting from the Steam API)

        • Dr. Blake Porter

          Absolutely, any “thing” on the steam community market, trading cards, loot boxes, keys, wallpapers. You will need to do a bit of extra work after you collect all the item names from Part 1 to parse the names to only cards, though.

          You can certainly store any number of store card names on your computer. However, if you are using my method here to ask steam for the names or to get the price history, you will need to adhere to steam’s policy and API request limit which works out to be a request every 2.5 seconds.

          • Vaibhav

            How do I get info only on the games I want? Right now, if I want to get card prices for a particular game, I have to first get info on every item on Steam (which is 3 lakh+) and then search through it. Writing appID = the gameID I want does not work; it just returns all Steam items.

          • Dr. Blake Porter

            I’m not sure about the specific part of the code you are referring to. You need to have your game ID stored in the gameList variable. You should NOT set gameID to anything; gameID is looping through gameList. What game are you trying to get info for?

          • Dr. Blake Porter

            There appears to be an issue with Steam’s market place and this game. The easiest way to test is to make the same request in your browser, eg https://steamcommunity.com/market/search?q=&appid=899440

            The Steam market doesn’t return any items. When this happens in Python it just returns everything on the Market. I think this is more of a Steam issue. You may be able to get around it by writing your own search URLs (replacing the ones in my Python code with one that works, like the one you posted here). I don’t know if it will work, but it should be your next option.

          • Vaibhav

            Yes, in the above URL if I change tag_app_ + gameid, I can get a list of all the cards of any game I want. But I was interested in the quantity sold of each of those cards, so I found another URL –
            http://steamcommunity.com/market/pricehistory/?country=DE&currency=3&appid=440&market_hash_name=Specialized%20Killstreak%20Brass%20Beast

            Here I will have to change country, currency, appid=753, and market_hash_name would be gameid + card_name.

            But I don’t know if making this many requests for every game for every card would be possible or not.
            And in your code on line 59, add &norender=1, otherwise it returns an HTML page and you get KeyError: 'results'.

          • Dr. Blake Porter

            Nice! It is certainly possible just make sure you keep the delay in the code. I collected all the items for Dota 2, CSGO, PUBG, and Rust which is over 100,000 items. Had no issues

  1. Jeremie

    I get an error with ['results']:

    Items 0 out of 311823 code: 200
    Traceback (most recent call last):
    File "steamanalyst.py", line 50, in
    allItems = allItems['results'];
    KeyError: 'results'

  2. Carter

    When I attempt to run it I get the error, "name 'data' is not defined". I also get the error, "TypeError: 'NoneType' object is not subscriptable". Let me know if you need more information!

    • Dr. Blake Porter

      Hey Carter sorry to hear that. It would be good to know what exact line number that error is coming from as I can’t figure out from your comment where it would be.

  3. Sergey Grigorenko

    How I can get the current price of each object. If you have a file with data on the current price (or on some day in the last months), could you send it by mail?

    • Dr. Blake Porter

      If you want to get the current price of items you will need to run this program constantly, or however often you want for “current”. I do not have these files as I have not run this program since I wrote the post in March.

  4. Taddy

    Hey there!

    I believe I have run “steamMarket_dataCollection.py” successfully but am confused by the first few lines of the following program. It shows several .pkl files that were not created by steamMarket_dataCollection.py:

    # open file with all names
    with open(gameID+'PriceData.pkl', "rb") as file: # Unpickling
        data = pickle.load(file)

    with open(gameID+'PriceData_3.pkl', "rb") as file: # Unpickling
        combo = pickle.load(file)

    with open(gameID+'PriceData_2.pkl', "rb") as file: # Unpickling
        combo2 = pickle.load(file)

    with open(gameID+'marketDelta.pkl', "rb") as file: # Unpickling
        marketDelta = pickle.load(file)

    with open(gameID+'marketPrice.pkl', "rb") as file: # Unpickling
        marketPrice = pickle.load(file)

    with open(gameID+'marketVol.pkl', "rb") as file: # Unpickling
        marketVol = pickle.load(file)

    I get this traceback when trying to run:
    Traceback (most recent call last):
    File "steam_mrkt_analysis.py", line 36, in
    with open(gameID+'PriceData_3.pkl', "rb") as file: # Unpickling
    FileNotFoundError: [Errno 2] No such file or directory: '252490PriceData_3.pkl'

    Were these files you already had? Or did I misuse the first part of the program? I am using Python 3.8.

    Thanks !!

    • Dr. Blake Porter

      Hi Taddy,
      Yes, those are some other data I collected (they’re how I made the stock-market-like index funds). steamMarket_dataAnalysis.py is really just a template for people who want to analyze their data, to give them ideas.

      All those lines are doing is loading the data you saved at the end of “steamMarket_dataCollection.py”. Whatever you named your files there with allItemsPD.to_pickle(gameID+'PriceData.pkl'); is what you should then load, e.g.

      gameID = '578080';
      # open file with all names
      with open(gameID+'PriceData.pkl', "rb") as file: # Unpickling
          data = pickle.load(file)

      I just have multiple saves so I have appended ‘_3’ onto the file names, which is what you see in that error.

  5. MagnusSapiens

    Hi Blake, I get the below error, any ideas?

    On line 86, in

    allItems = allItems['results'];
    TypeError: 'NoneType' object is not subscriptable

    • Dr. Blake Porter

      Hi Magnus,
      Sorry to hear you’re having issues.
      Can you please tell me what the print out (line 80) says? Specifically, there should be a part that says code: ###.

      If everything is working, you should see code: 200. If there is a code other than 200, please let me know what it is so I can help you figure out the issue.

      Also, before the code: ###, you should see “Items X out of Y”. These numbers should make sense, such as Items 100 out of 1000 or something. If it says Items 0 out of 0, there was an issue finding the names of all the items on the market.

      Please copy and paste what line 80 is printing out when you run the program.

      Cheers,
      Blake

      • Peder

        Hi there!

        Albeit late, I am working with your code now. Thank you for sharing.

        I am getting the same error and it is due to code 429, which I see is too many requests 🙂

        • Dr. Blake Porter

          Hi Peder,
          Yeah I used my code again a few weeks ago and it seems like Steam has gotten more strict. I changed the sleep time to a solid 3 seconds and it seems fine, though it is much slower.

          • Peder

            Hello again!

            Thanks for the swift reply. I feel like I tried both 5 and 10 seconds and still got the error. Will try again.

            Thank you!

          • Peder

            EDIT: I realised that I should try updating the cookie, which I’ve seen you note. However, since I was able to do 20-30 loops without issue, it did not occur to me as a cause for concern.

            Guess what. I updated my cookie, went with 4 seconds (possibly too high) and it worked.

  6. j0nez

    Hey there,
    is there a way to filter the items I get from a game?
    E.g.: I only want to get CSGO Stickers from 2020 in AllItemsGet and not every CSGO Item

    Thanks!^^

    • Dr. Blake Porter

      There’s a few ways to do it. Either in step 1 you only save stickers in your itemNames.txt file, or when you get to step 2 you do something like ignore anything that isn’t a sticker.

      Both will require something like an if statement that looks for stickers. This is very easy to do since every sticker just has “sticker” in its name. You can do something like:

      if 'sticker' in currItem.lower():

  7. caterpie

    No matter what I do it still says “No module named ‘requests’”. I have tried everything, from reinstalling the module, reinstalling pip, resetting my PATH, venv, conda; no matter what I do it will not run.

  8. ryan ryan

    Hey, can I have your contact? Like Telegram or something. I need something similar but for a CSGO store other than Steam; I would like to talk to you.

    • Dr. Blake Porter

      Hi Dennis,
      This is already what the code is doing, going one item at a time and getting its data. That is the only way to get historic skin data like price and volume. But this request is “nested” in a for loop (for currItem in allItemNames) that goes through all the item names I previously collected in Step 1. For your example, currItem would be “★ Butterfly Knife | Case Hardened (Well-worn)”. All you would need to do to get one item at a time is get rid of the for loop part and manually set the currItem variable to your skin of interest.

      If you’re not familiar with how for loops work, I would recommend using something like code academy (https://www.codecademy.com/learn/learn-python-3) to learn.
      Cheers,
      Blake

  9. Den

    Is there a way to do the data scraping for one item only? For example https://steamcommunity.com/market/listings/730/★%20Butterfly%20Knife%20%7C%20Case%20Hardened%20%28Well-Worn%29 is ★ Butterfly Knife | Case Hardened (Well-Worn); its hash name is ★%20Butterfly%20Knife%20%7C%20Case%20Hardened%20%28Well-Worn%29.
    So I was thinking, instead of collecting all item names, or even just a category like sticker (as Dr. Blake explained above), to do the search for this item (or any other item) using its hash name and then just continue from Step 2 as usual.
    I will appreciate any help, Dr. Blake!

  10. Frederik Vrålstad

    Hey,

    Thanks for this guide, it was really helpful!

    However, while implementing your code, I encountered an issue related to the deprecation of the append method in the latest version of Pandas. I found a solution on Stack Overflow that addresses this concern and provides a more efficient alternative.

    In your code, where you use allItemsPD.append(currItemPD, ignore_index=True), the latest versions of Pandas have deprecated the append method. Instead, it’s recommended to collect DataFrames in a list and concatenate them afterward.

    Here’s the improved part of the code:

    # Before the loop, initialize an empty list
    lst = []

    for currItem in allItemNames:
        # ... (your existing loop code)

        # Inside the loop, collect the new row (currItemPD) in the list
        lst.append(currItemPD)

    # After the loop, concatenate all DataFrames in the list
    allItemsPD = pd.concat(lst, ignore_index=True)

    I found a detailed discussion about this on Stack Overflow: https://stackoverflow.com/questions/75956209/error-dataframe-object-has-no-attribute-append
