How to get raw contact pageviews analytics data from Hubspot

Feb 22, 2020 | 3 minute read

If you have tried to extract data from Hubspot into your data warehouse with tools like Stitch, Segment or Fivetran, you may have noticed that they only pull the most basic data.

More detailed data is hidden in the Hubspot API. In this blog post I'll show you how to get every pageview, with its full URL, for each contact in your Hubspot database.

You can use this data to properly model the user journey and recover information Hubspot hides from you by default (see "Attribution analytics in Hubspot is bad"). This is especially useful for Account Based Marketing, for example to get all pageviews for a whole company, which lets you build a complete attribution model from every stakeholder's activity.

Extract and transform

The trick is to use the optional property parameter of the Hubspot Contacts API, described at https://developers.hubspot.com/docs/methods/contacts/get_contacts. The contact property hs_analytics_last_url is overwritten with the new URL on every pageview, and Hubspot stores the full history of this property. That means you can get all historic pageviews with the following request params: property=hs_analytics_last_url&propertyMode=value_and_history.

Today we'll extract conversions, too. That property is called recent_conversion_event_name, so the full set of request params looks like this:

property=hs_analytics_last_url&property=recent_conversion_event_name&propertyMode=value_and_history
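Because the property key has to appear twice, a plain Ruby hash can't express these params: a duplicate key simply overwrites the first one. A minimal sketch of one way around it, assuming Ruby's standard URI.encode_www_form, which expands an array value into repeated key=value pairs. The helper name hubspot_query is hypothetical, not part of any Hubspot library:

```ruby
require 'uri'

# Hypothetical helper: builds the query string for the contacts endpoint.
# URI.encode_www_form turns an array value into repeated pairs,
# e.g. property=a&property=b, matching the params shown above.
def hubspot_query(properties, offset: 0, count: 100)
  URI.encode_www_form(
    "count" => count,
    "vidOffset" => offset,
    "property" => properties,
    "propertyMode" => "value_and_history"
  )
end

query = hubspot_query(%w[hs_analytics_last_url recent_conversion_event_name])
puts query
```

Appending the API key and the result to the endpoint URL gives the same request the script below sends.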

The following Ruby script puts it all together:

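require 'rest-client'
require 'json'
require 'csv'
require 'uri'

# Hubspot API key: first command-line argument or the HAPIKEY env variable
@hapikey = ARGV[0] || ENV['HAPIKEY']

def list_contacts(offset = 0)
  # A Ruby hash keeps only the last of two identical keys, so two "property"
  # entries would request only recent_conversion_event_name. URI.encode_www_form
  # expands the array into property=...&property=... instead.
  query = URI.encode_www_form(
    "hapikey" => @hapikey,
    "count" => 100,
    "vidOffset" => offset,
    "property" => ["hs_analytics_last_url", "recent_conversion_event_name"],
    "propertyMode" => "value_and_history"
  )
  RestClient.get "https://api.hubapi.com/contacts/v1/lists/all/contacts/all?#{query}"
end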

def parse_contacts(offset = 0)
  # iterate instead of recursing, so large databases don't exhaust the stack
  loop do
    response = JSON.parse(list_contacts(offset))

    # process the current batch of contacts
    response['contacts'].each do |contact|
      get_contact_interactions(contact)
    end

    # stop when there are no more batches
    break unless response['has-more']

    # print progress and fetch the next batch
    offset = response['vid-offset']
    puts offset
  end
end

def get_contact_interactions(contact)
  CSV.open("pageviews.csv", "a+") do |csv|
    if contact["properties"]["hs_analytics_last_url"]
      contact["properties"]["hs_analytics_last_url"]["versions"].each do |pageview|
        if pageview["value"] != ""
          csv << [contact["vid"], pageview["timestamp"], pageview["value"]]
        end
      end
    end
  end

  CSV.open("conversions.csv", "a+") do |csv|
    if contact["properties"]["recent_conversion_event_name"]
      contact["properties"]["recent_conversion_event_name"]["versions"].each do |conversion|
        if conversion["value"] != ""
          csv << [contact["vid"], conversion["timestamp"], conversion["value"]]
        end
      end
    end
  end
end

parse_contacts

This script goes through all your contacts, batch by batch, and saves every pageview to pageviews.csv and every conversion (form submission) to conversions.csv. You can then upload both files to your own database for further analysis.

Example lines of both files:

Pageviews:

10104,1439153747166,http://example.com

Conversions:

6768,1436775647277,Newsletter

The first value is the contact vid, the second is a Unix timestamp in milliseconds, and the last one is the URL for pageviews or the form name for conversions.
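Before loading the files into a database, you may want typed fields rather than plain strings. A small sketch, assuming the three-column layout described above; Interaction and parse_interactions are illustrative names, not anything Hubspot provides:

```ruby
require 'csv'

# One exported row: contact vid, Unix timestamp in milliseconds,
# and the URL (pageviews) or form name (conversions).
Interaction = Struct.new(:vid, :timestamp_ms, :value)

# Parses exported CSV text into Interaction records with integer vid and timestamp.
# Integer() raises on malformed numbers instead of silently returning 0.
def parse_interactions(csv_text)
  CSV.parse(csv_text).map do |vid, timestamp_ms, value|
    Interaction.new(Integer(vid), Integer(timestamp_ms), value)
  end
end

rows = parse_interactions("10104,1439153747166,http://example.com\n")
p rows.first
```

For the real export you could pass File.read("pageviews.csv"), or switch to CSV.foreach to stream large files row by row.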

If you want something more human-readable than a raw timestamp, you can convert it to a time like 2000-01-01 10:10:10 +0100 with this code:

Time.at(timestamp.to_i / 1000)
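Time.at returns local time (hence the +0100 in the example), and integer division throws away the milliseconds. If you'd rather store UTC with millisecond precision, here is a sketch assuming Ruby 2.5 or later, where Time.at accepts a subsecond value and a unit; hubspot_time is a hypothetical helper:

```ruby
# Hypothetical helper: converts a Hubspot millisecond timestamp (string or integer)
# to a UTC Time, keeping the millisecond part instead of truncating it.
def hubspot_time(timestamp_ms)
  ms = Integer(timestamp_ms)
  Time.at(ms / 1000, ms % 1000, :millisecond).utc
end

t = hubspot_time("1439153747166")
puts t.strftime("%Y-%m-%d %H:%M:%S.%L %z")
```

For the sample pageview above this should print 2015-08-09 20:55:47.166 +0000.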

Analysis

Once the standard contacts and companies data is loaded into your data warehouse, you can join it with your custom pageviews and conversions history. Grouping all pageviews by each contact's company gives you a much better picture of the whole customer journey, especially in B2B.
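As an illustration of that grouping step, here is a sketch in Ruby rather than SQL. It assumes you already have a mapping from contact vid to company (for example from the contacts table your ETL tool loaded); the mapping and sample rows below are made up, and pageviews_by_company is a hypothetical helper:

```ruby
require 'csv'

# Groups pageview rows [vid, timestamp_ms, url] by company, using an assumed
# vid => company mapping. Each company's pageviews are sorted by timestamp,
# giving one combined journey across all of its contacts.
def pageviews_by_company(pageview_rows, company_for_vid)
  journeys = Hash.new { |hash, key| hash[key] = [] }
  pageview_rows.each do |vid, timestamp_ms, url|
    company = company_for_vid[Integer(vid)]
    next unless company # skip contacts without a known company
    journeys[company] << [Integer(timestamp_ms), Integer(vid), url]
  end
  journeys.transform_values(&:sort)
end

# Illustrative data only
rows = CSV.parse("1,2000,http://example.com/pricing\n2,1000,http://example.com/\n3,1500,http://example.com/blog\n")
company_for_vid = { 1 => "Acme", 2 => "Acme" }
p pageviews_by_company(rows, company_for_vid)
```

Sorting by timestamp first interleaves the pageviews of different stakeholders, which is what an account-level attribution model needs.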
