We’ll be using data from Antarctica. Who doesn’t like penguins? The data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network.
To load the data into your console, do the following:
// this is the url that you need to fetch the data
let url = "https://raw.githubusercontent.com/seblammers/sebastianlammers.com/master/src/routes/api/data.json/penguins.json";

// this will load the data into your session under the name "data"
let data = await (await fetch(url)).json();
If you’d rather work in a REPL, I have a starter-template for you that already contains the data and it’s waiting for you over here ↗ .
With that out of the way, let’s dive into our dataset and see what we can learn about it!
Firstly, after loading our data, let’s check out which facts about the penguins are actually captured in the dataset by looking at the names of the columns. Remember that there is a way to access all keys in a given object? That’s exactly what we need to get our column names!
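let columnNames = Object.keys(data[0]);
console.log(columnNames);
// logs ["species", "island", "bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g", "sex", "year"]
Our column names are species, island, bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g, sex, and year.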
How did we get these?
We combined an object and an array operation:
- data[0] accesses the first row of our data, aka the first object in our array.
- This single object is passed into the Object.keys() function, which gets us the keys in that object.
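Looking only at the first row assumes that every row has the same keys. If that might not hold for your data, here is a minimal sketch that collects the keys across all rows instead. The helper name allColumnNames and the two sample rows are made up for illustration; they are not part of the penguin dataset.

```javascript
// Hypothetical sample rows: the first row is missing the "sex" key on purpose.
const keyRows = [
  { species: "Adelie", island: "Torgersen" },
  { species: "Gentoo", island: "Biscoe", sex: "female" },
];

// Collect every key that appears in any row, in first-seen order.
function allColumnNames(rows) {
  const names = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (!names.includes(key)) names.push(key);
    }
  }
  return names;
}

console.log(Object.keys(keyRows[0])); // logs ["species", "island"]
console.log(allColumnNames(keyRows)); // logs ["species", "island", "sex"]
```

For a clean dataset both approaches give the same answer, so Object.keys(data[0]) is usually all you need.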
The column names give us a rough impression of what the data might be able to tell us.
We know something about the type of penguin (species) and where they live (island).
We also have some measurements of their bodily features, such as how long their bills (aka beaks) and their flippers (aka wings) are.
Oh, and we know something about how much they weigh (body_mass_g).
I’m not a penguin expert, but I suspect that we’ll see differences in some of the bodily features if we compare the different species.
Maybe we’ll see that later?
Next, let’s see how many datapoints are actually included in our data. This is a simple sanity check that is useful for at least two things:
- If you already know how big your dataset is, this lets you check whether reading in the data actually worked as expected.
- If you don’t have any clue about the data, this gets you a feel for how large the dataset is that you’re about to work with.
You already know how to do this one: get the length of the array!
let observations = data.length;
console.log(observations); // logs 344
There are many reasons why you might want to get a subset of rows from your table. The two most frequent reasons might be these:
- You want to get some kind of subset to peek inside your data without being overwhelmed by rendering all of it.
- You want to get a specific subset to e.g. visualize/work on separately from the rest.
In the previous steps you learned about the column names and how many observations are included in total. Now, let’s actually render some of the data to the screen. But not all of it, because we don’t want to take up the whole screen or trap everything in a giant scrollable container. Instead, let’s take a glimpse at the first 3 rows of the data to get a feel for it.
We take a look at the head of the body of data…
To do that, we use a standard array method called slice():
let head = data.slice(0, 3);
The result looks like this:
Now you can view some sample data from a subset of our 344 observations for all of the 8 columns (if you scroll a little).
You could even log this to your console, if you just wanted to take a quick peek.
Or you can use slice() to create a nice and small subset to start sketching out a data visualization (for that, it might be a good idea to increase the number of rows to more than 3 when calling slice()).
Starting out with a subset where you more or less know the datapoints lets you create a draft viz and then iterate from there.
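To make the behavior of slice() a bit more concrete, here is a small sketch on a stand-in array (the letters are placeholders, not penguin data). It shows that the end index is not included, that a negative start counts from the end, and that the original array stays untouched.

```javascript
// Stand-in array; imagine each letter is one row of the dataset.
const letters = ["a", "b", "c", "d", "e"];

// From index 0 up to, but not including, index 3.
const firstThree = letters.slice(0, 3);

// A negative start counts from the end of the array.
const lastTwo = letters.slice(-2);

console.log(firstThree); // logs ["a", "b", "c"]
console.log(lastTwo); // logs ["d", "e"]
console.log(letters.length); // logs 5, slice() does not modify the original
```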
Further resources on slice()
But sometimes you might be interested in a particular subset of your data.
For that scenario, slice() is not useful, since it only picks rows by their position and returns whatever is currently listed on top.
If you want more control, you need to filter your data by a criterion or condition.
Let’s say you are interested in what penguins Rita hangs out with on their home island called Dream Island.
To find out more about Dream Island, let’s get all observations where island === "Dream".
let dreamIsland = data.filter(row => row.island === "Dream");
console.log(dreamIsland.length); // logs 124
You can read this as: take the data and filter it. For each row, check if the variable called island is equal to "Dream". If so, include that row in the output.
Nice and simple, right?
Now you can work with the data that is specifically about the observations from Dream Island.
I’m not rendering these to the screen, but you can see that 124 out of the 344 observations are about penguins from Dream Island.
Bonus: Filter by multiple conditions
Do you want to filter by multiple conditions in one call? Easy:
let dreamIslandAdelie = data.filter(row => row.island === "Dream" && row.species === "Adelie");
console.log(dreamIslandAdelie.length); // logs 56
Further information on the logical AND (&&) in the MDN web docs.
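Conditions can also be combined with the logical OR (||) or negated with !==. The following is a small sketch on four made-up rows (not the real dataset), so the counts are easy to verify by eye.

```javascript
// Hypothetical mini-dataset with the same column names as the penguin data.
const islandRows = [
  { species: "Adelie", island: "Dream" },
  { species: "Chinstrap", island: "Dream" },
  { species: "Adelie", island: "Torgersen" },
  { species: "Gentoo", island: "Biscoe" },
];

// OR: keep rows from either Dream or Biscoe.
const dreamOrBiscoe = islandRows.filter(
  row => row.island === "Dream" || row.island === "Biscoe"
);

// AND with a negation: Adelie penguins that do NOT live on Dream.
const adelieElsewhere = islandRows.filter(
  row => row.species === "Adelie" && row.island !== "Dream"
);

console.log(dreamOrBiscoe.length); // logs 3
console.log(adelieElsewhere.length); // logs 1
```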
Further resources on filter()
Especially for categorical data it is useful to know how many distinct values there are for a given variable.
In our case, let’s take a look at the distinct penguin species that the researchers observed.
We’ll use a combination of Set and map() to achieve that.
So, we’ll take advantage of the fact that a Set only retains unique values.
Let’s pull back our simple example array and then use it to create a Set from it, before we advance to an example that uses our big dataset.
// our original array has 3 entries, but only 2 of them are unique
let penguinSpecies = ["Adelie", "Adelie", "Chinstrap"];

// when we create a set from that array...
let speciesSet = new Set(penguinSpecies);
console.log(speciesSet); // logs a Set with 2 entries: "Adelie" and "Chinstrap"
Now, isn’t that convenient? All we need now is a way to go through all entries in our array and inside each entry, visit our variable of interest.
Sure, we can write a for-loop, but there is a nice array method to do this more succinctly for us.
This is where map() comes in to save our day!
Here is a super-duper simple example of what map() can do:
// starting out with our weights in grams
let bodyMassGrams = [3750, 3800, 3250];

// using map() to convert to kilograms
let bodyMassKiloGrams = bodyMassGrams.map(x => x / 1000);
console.log(bodyMassKiloGrams); // logs [ 3.75, 3.8, 3.25 ]
Simple: for each entry in our array of body mass values, divide it by 1000 and then save the result to a new array.
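Since our dataset is an array of objects rather than plain numbers, here is a sketch of the same idea applied to rows. The two sample rows and the new key body_mass_kg are assumptions made for this illustration; body_mass_kg is not a column in the original data.

```javascript
// Hypothetical sample rows shaped like the penguin data.
const massRows = [
  { species: "Adelie", body_mass_g: 3750 },
  { species: "Gentoo", body_mass_g: 5000 },
];

// For each row, copy its fields and add a derived field in kilograms.
const withKilograms = massRows.map(row => ({
  ...row,
  body_mass_kg: row.body_mass_g / 1000,
}));

console.log(withKilograms[0].body_mass_kg); // logs 3.75
console.log(withKilograms[1].body_mass_kg); // logs 5
console.log("body_mass_kg" in massRows[0]); // logs false, the original rows are unchanged
```

Copying each row with the spread syntax keeps the original array intact, which is handy if you want to go back to the raw data later.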
Nice, now we put Set and map() together to finally retrieve our distinct species values.
let distinctSpecies = [...new Set(data.map(row => row.species))];
console.log(distinctSpecies); // logs ["Adelie", "Gentoo", "Chinstrap"]
Yay! There is a little more going on here than we encountered before, so let’s unpack it a bit.
We take our array called data and use map() to visit each entry in it.
Each entry is an object that we can nickname row to refer to its relation in the spreadsheet-style image.
Inside each of the rows, we visit the species-variable via dot notation.
Next, we put whatever values we find inside the species-variable into a Set.
As we learned above, Set actually takes care of only storing unique values, so duplicate entries are simply not added to the resulting Set.
Lastly, the spread syntax (...) inside the square brackets turns the Set back into a regular array.
That’s it! We found the distinct values in our species-variable! Doing the same for the islands is trivial now… I’ll leave that to you, if you want to flex your new muscles.
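If you find yourself doing this for several variables, the pattern can be wrapped in a small helper. This is only a sketch: the function name distinctValues and the three sample rows are made up for illustration, so you can check the result by eye before trying it on the full dataset.

```javascript
// Hypothetical helper: distinct values of one column, in first-seen order.
function distinctValues(rows, key) {
  return [...new Set(rows.map(row => row[key]))];
}

// Made-up sample rows with the same column names as the penguin data.
const surveyRows = [
  { species: "Adelie", island: "Torgersen" },
  { species: "Adelie", island: "Dream" },
  { species: "Chinstrap", island: "Dream" },
];

console.log(distinctValues(surveyRows, "species")); // logs ["Adelie", "Chinstrap"]
console.log(distinctValues(surveyRows, "island")); // logs ["Torgersen", "Dream"]
```

Bracket notation (row[key]) is used instead of dot notation because the column name is passed in as a string.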