Categories
Technology

Google Spreadsheet & SQL Queries

Learning SQL
Creative Commons License photo credit: therefromhere
I am a big fan of Google Spreadsheets, they give you most of the features of a desktop “office” based spreadsheet but your data is stored online, no more need to carry your data on a USB stick.
I’m also a huge fan of databases, SQL is a fantastic tool for querying data.
Now you can combine the best of both worlds: store your data in a spreadsheet and then summarise the data using SQL.

I have uploaded a sample spreadsheet that shows this in action here.
As I’m training for the Edinburgh Marathon (feel free to donate here : http://www.justgiving.com/26miles4chas) I wanted a way to log my data and also summarise it the way that I wanted.

Data Entry

All data should be entered in the sheet entitled “Logger”, simply copy the last row with data, paste it on the row below and then enter your values. The cells in grey contain formula so should not be changed.
All data should be entered in KM, I also have a miles to KM converter on the page if you prefer imperial measures. Once you have logged your runs, you will want to see how they are summarised over the day, week, month or year.

Using SQL

When I was trying to get this to work I could not find an example that worked that’s why I’m writing this.
The basic syntax of a SQL statement is:

SELECT “the columns you want to see”
FROM “wherever the data is stored”
WHERE “you filter the rows of data to show”
GROUP BY “if you want to summarise the data”
ORDER BY “if you want your results sorted”

There are a couple of differences when you are using the Google Query syntax:
You don’t need the “FROM” clause in the query it’s self, it’s passed into the function as a parameter.
There’s an additional “LABEL” clause that allows you to change the column headings in your output.

Using The Query Function

A simple example would be:

=Query(Logger!A1:P999, “select C, sum(E) group by C order by C label sum(E) ‘Distance KM'”)

The first parameter defines the data that you will be querying (the from clause).
The second parameter defines the query, in this case, showing the week and the total KM run in that week.
Note you must always refer to the column letters, even if you have put titles in row 1.
These query statements can get quite long and hard to read. What I like to do is to store the queries in a separate workbook “Queries” and then refer to them:

=QUERY(Logger!A1:P999,Queries!B2)

Feel free to take a copy of my spreadsheet and put it to good use!

49 replies on “Google Spreadsheet & SQL Queries”

This is really interesting and has me thinking about a need it can fill for me. Any idea if one can do CROSSTAB queries in this way? For example, if I have 6 teams in a hockey league, and for each game there could be three outcomes: win, lose, draw. Can I set up a query that will list the six teams down the left column, list the three outcomes across the top and count the number of W, L, D for each team and fill in the table?

Thanks, again for this!

That’s a very nice offer. A little background: I’m the statistician for a local wiffle ball league, and I’m trying to get the whole process onto google docs and off MS Access. So here’s a link to the current working Google Spreadsheet:

https://spreadsheets.google.com/pub?key=0Apgq3e_CQsPCdHVsenhLMGJTaFdaUldQMldMQlBXUFE&hl=en&output=html

The ABs tab is the dummy data. Queries Tab has the three queries I have set up and they work, including the Batting query which uses the Pivot command. The three middle tabs are the query results. So I think I can get what I need so far, which is a tabulation of hits, outs, etc, grouped by batter.

What I need to do now is pull this data into another Sheet or another Google Spreadsheet, so I can do post processing to calculate batting averages, slugging averages, etc, and then sort the data to show the batting leaders.

Any tips, thoughts, advice would be very kind. Thanks for the offer.

Being based in Scotland I’m not too familiar with the stats or even the rules of Wiffle Ball.

What I would say is, any sheet in a spreadsheet can be used as the source of a query:
Sheet1 contains your raw data.
Sheet2 contains the results of a query on Sheet1
Sheet3 contains the results of a query on Sheet2
etc

Does that help?

Ross –

This is an excellent tutorial. Sharing the spreadsheet enabled your short tutorial to have so much more impact than just describing how to do the query. This is an outstanding example of the power of Google docs and learning.

Bless you. 10 minutes after reading this post, I accomplished a QUERY in a Google Docs spreadsheet that I had unsuccessfully spent 2 hours trying to do using their online documentation. It took me a bit of trial and error to discover the ability to create multi-line cells in a Google Docs spreadsheet using (Apple key – return) at the end of each line, but doing so ends all of the nonsense with punctuation marks to parse multi-part queries. Thanks again for posting your explanation and a public copy of your spreadsheet.

thanks very much for this – I’ve been looking everywhere for examples of a query that hides the label in a query result. Hopefully this will help other people on the same quest

My problem query was
=query(‘spreadsheet’!A$1:C$1501,”select sum(C) where B=’New Visitors’ and (A='”&B4&”‘ or A='”&B4&”/’) “)

which was creating “sum” and then the sum of C beneath it.
I tried sticking various flavours of “label” near the SUM, with no luck.

What I missed was that the label bit needs to be right at the end
=query(‘spreadsheet’!A$1:C$1501,”select sum(C) where B=’New Visitors’ and (A='”&B4&”‘ or A='”&B4&”/’) label sum(C) ””)

thankyou!

What about this problem in reverse? …using Google Spreadsheets (pivot tables) to do analysis and dynamic charting of SQL data. Any advice on a direction?

When you say “SQL Data” do you mean data held in a relational DB?
If so – I’m not aware of any way of querying data that is not already “in” Google.

What if I want to import Google Form to the Database instead. Would that work, too?

I’m not “really” a developer now so I’m not the best person to answer that question, maybe Matt has an opinion?
Based on the API calls in the documentation in his comment I would suggest it’s possible but, as to “how” to code it ……

Do you think it would be possible to do the opposite as well?

For example, I’d like to pump an SQL query results into a Google Spreadsheet so that I can use Google Charts and have them update automatically for reporting.

Does anyone know if this exists already?

If you have a look at the documentation that Matt links to above that shows you the API calls to access access a JDBC database. I don’t have a database at the moment that i can test this with unfortunately.

Have you been able to use the substring function in Google SQL queries? If so, can you provide an example?

I am trying to select only certain text from a column of my data.

thank you for posting an actual worksheet to work with.

i’ve been having issues referencing a ‘where’ as a dynamic cell.
i’m trying to create a large db for all my service invoice data and be able to call it based on client and date.

any help would be most appreciated.

Here is what I did, I have the query defined in one cell and then use a function to modify it.

select *
where Col1 > date ‘2008-01-01’
order by Col1 desc
limit **Limit**

Then

=substitute(B2, “**Limit**”, Dates!$B$4)

Where B2 is the cell with the query and Dates!$B$4 is my dynamic value
In my query I then refer to the result of the substitution.
This allows me to edit the contents of B4 and my results redraw !

Please watch out for those darned “smart-quotesâ€? re-type them in your editor !

Hi,

I created a query which “asks” the user to enter data into specific cells and the query then shows the result based on what the user wants (dynamic query, similar to advance filtering. For ex’ if my user fills in USA then the query will display the GDP of USA only).

I wonder how I can create an even more complex query that will ask the user if he seeks data greater than/less than/equal to a certain number.
For ex’- let’s say that cell A1 will have 3 options (less/greater/equal) and cell A2 will be a blank cell that the user will need to type a number.
I need to combine this in my query so that if my user will enter “Greater than” & “100” then only the data >100 will be shown, and if USA and “Greater than” & “100” then it will show only >100 data of USA only and so on…

I don’t know how to declare this feature in a query…

Any ideas?

Thanks in advance!

The easiest way to think about this is : can you create an excel formula that would build your SQL string for you?
eg
=”select a, b, c where a ” & A1 & ” ” & A2
would result in
select a, b, c where a > 100
You could add validation to A1 to restrict it to valid operators “><=". You can then make your query function reference the function you have built above!

Rob,

I would like to get your help in understanding the query function more.
I am trying to do statistics for a game I play online, but unable to get the query to produce the results I need.
Can you help with this?

basically the sheet gets its information from a form that was created.
so date is A
Name is B
the other information is C to AU
but I would like to get the following
select B, sum(C:D), max(E), sum(F:G), max(H), sum(I:J), max(K), sum(L:M), max(N), sum(O:P), max(Q), sum(R:S), max(T), sum(U:V), max(W), sum(X:Y), max(Z), sum(AA:AB), max(AC), sum(AD:AE), max(AF), sum(AG:AH), max(AI), sum(AJ:AK), max(AL), sum(AM:AN), max(AO), sum(AP:AU)

grouped by B

and I understand that ; 1 means to import the first row as the header

The easiest way to resolve this would be if you were to share the spreadsheet (or a copy of it) with me.
I can then see what errors you are getting and once you are happy you can unshare it.

I have each column explained on the Totals tab and would like to build separate tabs for grand totals like I did on the name tab as well, so when I build the charts to post on the webpage they auto update, instead of me going in and doing all the math and things by hand.

Was a pleasure working with you online – you may want to consider if you want the link to your spreadsheet above to remain public (or did you share it just with me)?
You may want to leave it public (read only) to show off your end result !

Ross,

Thank you for your help. You taught me a lot in a very short time frame.
I will be sure to recommend others to you if they are in the same situation as I was in.
And really words can’t say a lot, but you made my day and taught me so much.
If the Google contributors could teach people the same way as you then there would not be so many people complaining. But the only downfall is, you would have to find better ways to teach people.

Thanks
Rich

I will leave the spreadsheet public to show others more examples on how well Query works and how they can manage their results really easily

Hi Ross,
Need your help in following query
select J,
sum(Q), sum(R), sum(S), sum(T), sum(U), sum(V),
sum(W), sum(X), sum(Y), sum(Z), sum(AA), sum(AB),
sum(AC), sum(AD), sum(AE), sum(AF), sum(AG), sum(AH),
sum(AI), sum(AJ), sum(AK), sum(AL), sum(AM), sum(AN),
sum(AO), sum(AP), sum(AQ), sum(AR), sum(AS), sum(AT),
sum(AU), sum(AV), sum(AW), sum(AX), sum(AY), sum(AZ),
sum(BA), sum(BB), sum(BC), sum(BD), sum(BE), sum(BF),
sum(BG), sum(BH), sum(BI), sum(BJ), sum(BK), sum(BL),
sum(BM), sum(BN), sum(BO), sum(BP), sum(BQ), sum(BR),
sum(BS), sum(BT), sum(BU), sum(BV), sum(BW), sum(BX),
sum(BY), sum(BZ), sum(CA), sum(CB), sum(CC), sum(CD),
sum(CE), sum(CF), sum(CG), sum(CH), sum(CI), sum(CJ)
group by J
label J “Team”,
sum(Q) “W1”, sum(R) “W2”, sum(S) “W3”, sum(T) “W4”, sum(U) “W5”, sum(V) “April 2014”,
sum(W) “W1”, sum(X) “W2”, sum(Y) “W3”, sum(Z) “W4”, sum(AA) “W5”, sum(AB) “May 2014”,
sum(AC) “W1”, sum(AD) “W2”, sum(AE) “W3”, sum(AF) “W4”, sum(AG) “W5”, sum(AH) “June 2014”,
sum(AI) “W1”, sum(AJ) “W2”, sum(AK) “W3”, sum(AL) “W4”, sum(AM) “W5”, sum(AN) “July 2014”,
sum(AO) “W1”, sum(AP) “W2”, sum(AQ) “W3”, sum(AR) “W4”, sum(AS) “W5”, sum(AT) “Aug 2014”,
sum(AU) “W1”, sum(AV) “W2”, sum(AW) “W3”, sum(AX) “W4”, sum(AY) “W5”, sum(AZ) “Sept 2014”,
sum(BA) “W1”, sum(BB) “W2”, sum(BC) “W3”, sum(BD) “W4”, sum(BE) “W5”, sum(BF) “Oct 2014”,
sum(BG) “W1”, sum(BH) “W2”, sum(BI) “W3”, sum(BJ) “W4”, sum(BK) “W5”, sum(BL) “Nov 2014”,
sum(BM) “W1”, sum(BN) “W2”, sum(BO) “W3”, sum(BP) “W4”, sum(BQ) “W5”, sum(BR) “Dec 2014”,
sum(BS) “W1”, sum(BT) “W2”, sum(BU) “W3”, sum(BV) “W4”, sum(BW) “W5”, sum(BX) “Jan 2015”,
sum(BY) “W1”, sum(BZ) “W2”, sum(CA) “W3”, sum(CB) “W4”, sum(CC) “W5”, sum(CD) “Feb 2015”,
sum(CE) “W1”, sum(CF) “W2”, sum(CG) “W3”, sum(CH) “W4”, sum(CI) “W5”, sum(CJ) “Mar 2015”

Here column BY is treated as a keyword so google query is unable to parse this.
please help me out on this.

Would you be able to send me a link to the spreadsheet (or a copy of). It’s easier to troubleshoot when I can see the actual query in line with some data.

Hi Ross,
Thanks for finding time for the problem.
Please find below sample sheet
https://docs.google.com/a/cssoftsolutions.com/spreadsheet/ccc?key=0ArpOCIFSXY7IdC16NFpGeExhSzYtLWREejktbjJ1RFE&usp=sharing

Here you can see number of columns are too large and crossing column name BY as well.
When i tries to use this column in my query it’s unable to parse this due to the fact that BY is itself a keyword

Is there any way in which we can use column header instead of column name ?
for example Column Name J is named as Sales_team_member .

so want to use query like select Sales_team_member, sum(P) order by Sales_team_member

Regards
Vajinder

Ah yes, I see the problem now, the column reference BY is being interpreted as the keyword “by”.
This is a bug!
The only thing that I can think to do is to add a layer of abstraction.

Instead of using
=QUERY(WorkBook!A1:BZ999,”Select A, B”)
use
=QUERY(importrange(“YourSpreadsheetKey”, “Workbook!A1:BZ999”), “Select Col1, Col2”)

The second example pulls the data from a different spreadsheet so you need to supply the key for that spreadsheet.
Also, because it’s running on imported data rather than a spreadsheet directly all of the column references switch from A, B, C etc to Col1, Col2, Col3 etc

In response to Amit about constructing ‘dynamic’ queries, where the query changes based on the content of certain other cells: be aware that the whole query thing inside the function is essentially a string that gets parsed by the function. That means (and this works for many other functions as well) that you can ‘dynamically’ construct the string by just concatening the pieces:

=query(‘spreadsheet’!A$1:C$1501,”select sum(C) where B=’” & InputFromUser!B1 & “‘”)

You will no doubt have to do some fiddling with quotes, but this works perfectly ok. If you want to, you could do an operation on the input first; validating it, or maybe lowercasing it to get a more general search result. As long as what you construct ends up as a string (piece of text) and that string is valid syntax as a query, it works.

How to do division by query, considering the data from different spreadsheet.

How to do division by query, considering the data from different spreadsheets.

okay.. Thanks for your reply.. For further clarification, The data is in two separate sheets (Example sheet 1 and sheet 2). I will be happy, if you can help me with a query, to divide the sum of column A in sheet 1 with the sum of column A in sheet 2. I like to put the query in sheet 3 and want to use only query function.

Hope I am clear..

Thanks in Advance..

The simplest option (pseudo code) is:
Sheet3!A1 = query(sheet1, select sum(ColA))
Sheet3!A2 = query(sheet2, select sum(ColA))
Sheet3!A3 = A1 / A2

Ross, great article, trying to do some fun data mining. I’m inserting two different tables successfully with ImportHTML. However I want to identify records in rows to which table it belongs.

In this particular case (link below) for example the player Higuain plays both in Serie A (table,0) and UEFA League (table,3). How to add a record in the column ‘Competition’ that would identify the table when I’m sorting each game by date (Sorted_Higuain_Last5Games) sheet?

https://docs.google.com/spreadsheets/d/1Tmlj8x-yKzsEpKcJ338O-0WXEK9TobVmrkUN7Aq6JAY/edit?usp=sharing

There is no insert sql function that allows to do that. Do you think this is possible?
Many thanks

The only thing that springs to mind would be to name your columns explicitly.
Rather than
select *
use
select col1, col2, col3, …., “Serie A” as Competition

The point to note is the last column, this last column is a constant with the text that you specify and the column heading of Competition.
In each of your queries you will hard code a different value as appropriate.

Hi, I have a query =QUERY(‘Elite Transaction’!D3:H51,”select E, avg(H) group by E”). I am trying to divide AVG(H) by a number of cell value but I keep getting syntax error, even with brackets it doesn’t work. =QUERY(‘Elite Transaction’!D3:H51,”select E, avg(H) / 100 group by E”)

Can you confirm what error you are getting?
I have done a quick mock up which divides by a value held in a cell :

=query(A1:C9, “select A, avg(C) / ” & I1 & ” group by A”)

This seems to work OK for me.

Leave a Reply to Vajinder Cancel reply

Your email address will not be published. Required fields are marked *