If you have tried to extract data from Hubspot into your data warehouse using tools like Stitch, Segment or Fivetran, you may have noticed that they pull only the most basic data.
There is more detailed data hidden in the Hubspot API. In this blogpost I’ll show you how to get all pageviews with full URLs for every contact in your Hubspot database.
You can use this data to properly model the user journey and get information that is hidden from you by default (see Attribution analytics in Hubspot is bad). This can be very useful for Account Based Marketing, for example to collect all pageviews for a whole company. It lets you build a complete attribution model from the activity of every stakeholder.
The trick is to use the optional property parameter when calling the Hubspot Contacts API, described at https://developers.hubspot.com/docs/methods/contacts/get_contacts. One small property called hs_analytics_last_url is updated with the new URL on every pageview. Hubspot stores the full history of this property, which means you can get all historic pageviews using the following API request params:
property=hs_analytics_last_url&propertyMode=value_and_history
Today we’ll extract conversion data, too. That property is called recent_conversion_event_name, and the full request params look like this:
property=hs_analytics_last_url&property=recent_conversion_event_name&propertyMode=value_and_history
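As a minimal sketch of how such a query string can be assembled, the snippet below uses Ruby's standard URI.encode_www_form, which repeats the key for each element of an array value. The helper name history_query and the "demo" key are illustrative assumptions, not part of the Hubspot API. A plain Ruby hash literal keeps only the last value for a duplicated key, which is why the property names are passed as an array here.

```ruby
require 'uri'

# Hypothetical helper: builds the query string with one "property"
# entry per requested property plus the history mode.
def history_query(properties, hapikey)
  URI.encode_www_form(
    "hapikey"      => hapikey,
    "property"     => properties, # array value => key repeated per element
    "propertyMode" => "value_and_history"
  )
end

puts history_query(%w[hs_analytics_last_url recent_conversion_event_name], "demo")
```

The resulting string can be appended to the endpoint URL after a question mark.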
The following Ruby script does the full extraction:
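require 'rest-client'
require 'json'
require 'csv'

# API key comes from the first command-line argument or the HAPIKEY env variable
@hapikey = ARGV[0] || ENV['HAPIKEY']

# Fetch one page of contacts with the full history of both properties
def list_contacts(offset = 0)
  # A hash literal keeps only the last value for a repeated key, so the
  # repeated "property" parameter is passed as a ParamsArray instead
  # (available in rest-client 2.0 and later)
  params = RestClient::ParamsArray.new([
    [:hapikey, @hapikey],
    [:count, 100],
    [:vidOffset, offset],
    [:property, "hs_analytics_last_url"],
    [:property, "recent_conversion_event_name"],
    [:propertyMode, "value_and_history"]
  ])
  RestClient.get "https://api.hubapi.com/contacts/v1/lists/all/contacts/all",
                 { :params => params }
end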
# Walk through all contacts, one batch at a time
def parse_contacts(offset = 0)
  response = JSON.parse(list_contacts(offset))
  # get batch of contacts
  contacts = response['contacts']
  if contacts.size > 0
    contacts.each do |contact|
      get_contact_interactions(contact)
    end
  end
  # try another batch
  if response["has-more"]
    puts response["vid-offset"]
    parse_contacts(response["vid-offset"])
  end
end
# Append every non-empty historic value of both properties to the CSV files
def get_contact_interactions(contact)
  CSV.open("pageviews.csv", "a+") do |csv|
    if contact["properties"]["hs_analytics_last_url"]
      contact["properties"]["hs_analytics_last_url"]["versions"].each do |pageview|
        if pageview["value"] != ""
          csv << [contact["vid"], pageview["timestamp"], pageview["value"]]
        end
      end
    end
  end
  CSV.open("conversions.csv", "a+") do |csv|
    if contact["properties"]["recent_conversion_event_name"]
      contact["properties"]["recent_conversion_event_name"]["versions"].each do |conversion|
        if conversion["value"] != ""
          csv << [contact["vid"], conversion["timestamp"], conversion["value"]]
        end
      end
    end
  end
end
parse_contacts
This script goes through your contacts and saves all pageviews in the pageviews.csv file and all conversions (form submits) in the conversions.csv file. You can then upload them to your own database for further analysis.
Example lines of both files:
Pageviews:
10104,1439153747166,http://example.com
Conversions:
6768,1436775647277,Newsletter
The first value is the contact vid, the second is a Unix timestamp in milliseconds, and the last one is the URL for pageviews or the form name for conversions.
If you want something more user-friendly than a raw timestamp, you can convert it to a format like 2000-01-01 10:10:10 +0100 with this code:
Time.at(timestamp.to_i / 1000)
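For a slightly fuller sketch of the same conversion, the snippet below wraps it in a helper and formats the result explicitly. The helper name format_hubspot_timestamp is an assumption for illustration, and it converts to UTC so the output does not depend on the machine's time zone.

```ruby
# Hypothetical helper: converts a millisecond Unix timestamp (as stored
# in the CSV files) into a readable UTC string.
def format_hubspot_timestamp(ms)
  seconds = ms.to_i / 1000 # integer division drops the milliseconds
  Time.at(seconds).utc.strftime("%Y-%m-%d %H:%M:%S %z")
end

# Timestamp taken from the pageviews example line
puts format_hubspot_timestamp("1439153747166")
# prints 2015-08-09 20:55:47 +0000
```

Drop the .utc call if you prefer the local time zone of the machine running the script.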
Analysis
Once the standard contacts and companies data is loaded into your data warehouse, you can join it with your custom pageview and conversion history. Grouping all pageviews by the contact's company should give you a much better picture of the whole customer journey, especially in B2B.
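To illustrate the grouping idea outside the warehouse, here is a rough in-memory sketch. The contact-to-company mapping, the company names and the extra sample rows are made up for the example; in practice the mapping would come from the contacts and companies tables you already have, and the join would more likely be done in SQL.

```ruby
# Hypothetical mapping of contact vid to company name
company_by_vid = { "10104" => "Acme", "6768" => "Acme", "7001" => "Globex" }

# Rows in the same shape as pageviews.csv: vid, timestamp (ms), URL
pageviews = [
  ["10104", "1439153747166", "http://example.com"],
  ["6768",  "1439153000000", "http://example.com/pricing"],
  ["7001",  "1439154000000", "http://example.com/blog"]
]

# Group pageviews by company and order each company's URLs by time,
# so every company gets one combined journey across its contacts.
def journeys_by_company(pageviews, company_by_vid)
  pageviews
    .group_by { |vid, _ts, _url| company_by_vid[vid] }
    .transform_values do |rows|
      rows.sort_by { |_vid, ts, _url| ts.to_i }.map { |_vid, _ts, url| url }
    end
end

p journeys_by_company(pageviews, company_by_vid)
```

Contacts without a known company end up under the nil key, which makes gaps in the mapping easy to spot.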
Unfortunately, the data you get here is still not complete. The earliest pageview Hubspot saves in a contact profile is the one just before the first conversion. All earlier pageviews are ignored.
Also, be aware of an undocumented "feature": the full list of recorded pageviews is available only through the "get contact by id" endpoint. The contact list endpoint returns incomplete data without telling you.
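A per-contact request could be sketched roughly as follows. The endpoint path, the accepted parameters and the response shape are assumptions based on the legacy Contacts API and on the structure the list endpoint returns, so check them against the Hubspot documentation before relying on them. The helper names and the sample JSON are hand-written for illustration and are not real API output.

```ruby
require 'uri'
require 'json'

# Assumed legacy endpoint for fetching a single contact by vid
PROFILE_URL = "https://api.hubapi.com/contacts/v1/contact/vid/%d/profile"

# Hypothetical helper: builds the per-contact request URL
def contact_profile_url(vid, hapikey)
  query = URI.encode_www_form(
    "hapikey"      => hapikey,
    "property"     => %w[hs_analytics_last_url],
    "propertyMode" => "value_and_history"
  )
  "#{format(PROFILE_URL, vid)}?#{query}"
end

# Hypothetical helper: extracts [vid, timestamp, url] rows from a parsed
# profile response, skipping empty values
def pageview_rows(profile)
  versions = profile.dig("properties", "hs_analytics_last_url", "versions") || []
  versions.reject { |v| v["value"].to_s.empty? }
          .map { |v| [profile["vid"], v["timestamp"], v["value"]] }
end

# Hand-written sample in the assumed response shape
sample = JSON.parse(<<~JSON)
  {"vid": 10104, "properties": {"hs_analytics_last_url": {"versions": [
    {"value": "http://example.com", "timestamp": 1439153747166},
    {"value": "", "timestamp": 1439153000000}
  ]}}}
JSON

puts contact_profile_url(10104, "demo")
p pageview_rows(sample)
```

Calling this once per vid collected from the list endpoint is slower, since it needs one request per contact, but it is the only way described here to get the full pageview list.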