A Powerful Way to Explore Possible Identifications

With some prompting by @billdodd, I have once again started using the Global Biodiversity Information Facility (www.gbif.org) to help with identifications of my iNat observations. Because of the massive size of their database (1.6 million species; 700 million occurrences), it offers a vast pool of natural history occurrence records to search. Combining their search results with an image search on Google Images then offers a quick visual check of the possibilities. This one-two punch is often enough to narrow an identification considerably, and frequently to nail it exactly. This is particularly true for groups of organisms or particular areas that may not be well represented in the iNat database or other common resources.

But there’s a problem: How do you narrow a search in a database as huge as GBIF? I have found the method particularly powerful when I can begin by limiting my search to a certain family or genus (of plants, for instance). The combination of a “species” search (which can be tailored to a family or genus) and GBIF's mapping capabilities can really help you zoom in on likely possibilities. Here is an example from my recent efforts:

A Maidenhair Fern from Chiapas:

In January, I photographed a maidenhair fern in dry oak-pine woodland in the mountains surrounding San Cristóbal de las Casas, Chiapas, Mexico:
http://www.inaturalist.org/observations/5066655
I knew that there were several species in Mexico. A quick search of the iNat database alone indicates at least 13 species in Mexico:
http://www.inaturalist.org/observations?place_id=6793&taxon_id=48436&view=species
Even selecting for just the state of Chiapas indicated at least four possibilities:
http://www.inaturalist.org/observations?place_id=97003&taxon_id=48436&view=species
but there were only four iNat observations of the genus from that state, so that limited my confidence in this comparative set.

So I went to the GBIF website (www.gbif.org) and did this: from the home page, under the “Data” menu at the top, I chose “Explore Species”:
[Screenshot: GBIF Data menu, Search Species]

This brings up a simple search field: http://www.gbif.org/species. Next, I typed “Adiantum” into the search field. (I didn’t bother to limit my search to “Vascular Plants” although I could have.)
[Screenshot: GBIF search bar with “Adiantum” entered]

The search returned 1,307 results for “Adiantum”, the very first of which is the one I'm interested in: “Accepted Genus, Adiantum L.” Make sure you find the one you want; you may also see other names labeled "Doubtful Genus", "Doubtful Species", "Species Synonym", etc. Sometimes a family or genus name has two different entries spelled the same but with different original authors. You'll always want to pick the "Accepted ..." entry for whatever you're focused on. The taxonomic classification listed under each name also helps make sure you're in the right group of organisms; notice in the screen capture below that there is a genus of mammals spelled "Adianthus"!
[Screenshot: GBIF search results for Adiantum]
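If you'd rather script this step than click through it, here is a minimal sketch of the same "pick the accepted genus" filter against GBIF's public REST API. It is not part of the point-and-click method described here, and it rests on assumptions: the v1 endpoint `https://api.gbif.org/v1/species/search` with `q`, `rank`, and `status` filters, and result records carrying `canonicalName`, `taxonomicStatus`, and `kingdom` fields. The helper names and the sample records are hypothetical. The filter follows the advice above: keep only the accepted name in the expected kingdom, so look-alikes such as a mammal genus never slip through.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of GBIF's public v1 REST API.
GBIF_API = "https://api.gbif.org/v1"


def species_search_url(name, rank="GENUS"):
    """Build a species-search URL asking only for accepted names of one rank
    (endpoint and parameter names are assumptions about the v1 API)."""
    params = {"q": name, "rank": rank, "status": "ACCEPTED"}
    return f"{GBIF_API}/species/search?{urlencode(params)}"


def fetch_json(url):
    """Download a URL and parse its JSON body (needs network access)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def pick_accepted(results, name, kingdom):
    """Keep accepted names that match `name` exactly and sit in `kingdom`,
    dropping synonyms, doubtful names, and look-alikes from other groups."""
    return [
        r for r in results
        if r.get("taxonomicStatus") == "ACCEPTED"
        and r.get("canonicalName") == name
        and r.get("kingdom") == kingdom
    ]


# Hand-made records shaped like the assumed API output (not real GBIF data).
sample = [
    {"key": 1, "canonicalName": "Adiantum", "taxonomicStatus": "ACCEPTED", "kingdom": "Plantae"},
    {"key": 2, "canonicalName": "Adiantum", "taxonomicStatus": "DOUBTFUL", "kingdom": "Plantae"},
    {"key": 3, "canonicalName": "Adianthus", "taxonomicStatus": "ACCEPTED", "kingdom": "Animalia"},
]
print([r["key"] for r in pick_accepted(sample, "Adiantum", "Plantae")])  # [1]
```

With network access, something like `pick_accepted(fetch_json(species_search_url("Adiantum"))["results"], "Adiantum", "Plantae")` should return the accepted genus; its `key` is presumably the taxon key the Overview page and map are built on. Only the first page of results is assumed to come back by default, so a very common name may need paging.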

Ignoring all the ways to “Refine your search” (right-hand panel), I just clicked on that first result for “Adiantum L.”, which brings up an Overview page.
[Screenshot: GBIF species search results, Adiantum]

Below the Overview box you’ll find a map of the world. Next to the map, it indicates that this search found 63,779 georeferenced data points.
[Map: GBIF georeferenced records of Adiantum worldwide]

Adiantum is distributed on all continents except Antarctica. The next step offers the most powerful capability: just zoom into the area of interest! (With the little paint-roller icon in the lower right of the map, you can choose among various views such as Classic, Night, Terrain, Satellite, High Contrast, or Roads. Since I am a Google Earth fanatic, I often choose Satellite, but Roads is also useful when I need to navigate to my location by road. As you zoom, the specimen dots may seem to disappear; they are probably just too small. When that happens, go back to that handy paint-roller icon and choose a larger point size.) So I zoom and zoom and zoom, until I’m looking at just the area around San Cristóbal de las Casas, in the middle of Chiapas:
[Map: GBIF Adiantum records around San Cristóbal de las Casas]

Then clicking on “In viewable area” in the right panel of the map brings up a list of all the georeferenced records in my area of interest, that is, just the area I've zoomed into. A portion of that list is shown in this screen capture:
[Screenshot: GBIF georeferenced data, “In viewable area” results for Adiantum]
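The "zoom in, then click In viewable area" step amounts to filtering occurrences by a latitude/longitude box. Here is a minimal sketch of building such a query, assuming GBIF's v1 occurrence search endpoint accepts `taxonKey`, `hasCoordinate`, `basisOfRecord`, `limit`, and `"min,max"` ranges for `decimalLatitude` and `decimalLongitude`. The function names, the 0.1-degree box, and the taxon key `12345` are hypothetical placeholders (look up the real key with a species search). The optional `basis_of_record` argument is one way to leave out human observations such as iNat records.

```python
from urllib.parse import urlencode

# Assumed base URL of GBIF's public v1 REST API.
GBIF_API = "https://api.gbif.org/v1"


def bbox_around(lat, lon, half_width_deg):
    """Square box (min_lat, max_lat, min_lon, max_lon) centred on a point,
    a rough stand-in for the map area you zoom into."""
    return (lat - half_width_deg, lat + half_width_deg,
            lon - half_width_deg, lon + half_width_deg)


def occurrence_search_url(taxon_key, box, limit=300, basis_of_record=None):
    """Build an occurrence-search URL for georeferenced records of one taxon
    inside `box`. Coordinates use the assumed "min,max" range syntax;
    `basis_of_record` (e.g. "PRESERVED_SPECIMEN") optionally restricts the
    kind of record returned."""
    min_lat, max_lat, min_lon, max_lon = box
    params = {
        "taxonKey": taxon_key,
        "hasCoordinate": "true",
        "decimalLatitude": f"{min_lat:.4f},{max_lat:.4f}",
        "decimalLongitude": f"{min_lon:.4f},{max_lon:.4f}",
        "limit": limit,
    }
    if basis_of_record is not None:
        params["basisOfRecord"] = basis_of_record
    return f"{GBIF_API}/occurrence/search?{urlencode(params)}"


# Roughly San Cristóbal de las Casas; 12345 is a placeholder taxon key.
box = bbox_around(16.74, -92.64, 0.1)
print(occurrence_search_url(12345, box, basis_of_record="PRESERVED_SPECIMEN"))
```

A square box in degrees is a simplification; the map's viewable area is whatever rectangle your screen happens to show, and a real query would still need paging if more than one page of records comes back.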

Again, the "Refine your search" panel on the right side would let me subset this group of specimens further if the number of returns were still too large or if I wanted some other refinement. In this case I don't need to, since I have narrowed my possibilities from 60,847 records to just 32. These represent 32 museum specimens of Adiantum collected over the years right around San Cristóbal! Scanning down the list, I find that five species of Adiantum have been collected there: A. andicola, A. braunii, A. capillus-veneris, A. concinnum, and A. patens. In fact, an even closer zoom into San Cristóbal (where a lot of botanical collecting has been done) narrows my search to just A. andicola, A. capillus-veneris, and A. concinnum, and of these the first is by far the most numerous in the database.
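Scanning the list by eye is easy with 32 records, but for a larger area a short tally of species names helps. Here is a minimal sketch, assuming each occurrence record is a dict with a `species` field (absent when a specimen is identified only to genus) and that each page of results carries `results` and `endOfRecords` keys, the shape GBIF's occurrence search is assumed to return. `fetch_page` is a hypothetical callable standing in for whatever actually downloads a page, and the demo records are made up.

```python
from collections import Counter


def tally_species(records):
    """Count records per species name, most numerous first.
    Records without a species-level name are skipped."""
    counts = Counter(r["species"] for r in records if r.get("species"))
    return counts.most_common()


def fetch_all(fetch_page, page_size=300):
    """Collect every record by paging with offset/limit.
    `fetch_page(offset, limit)` must return a dict with "results"
    and "endOfRecords" keys."""
    records, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        records.extend(page["results"])
        if page.get("endOfRecords", True) or not page["results"]:
            return records
        offset += page_size


# Made-up records for illustration only (not the real San Cristóbal results).
demo = ([{"species": "Adiantum andicola"}] * 3
        + [{"species": "Adiantum concinnum"}] * 2
        + [{"species": "Adiantum capillus-veneris"}]
        + [{"genus": "Adiantum"}])


def fake_page(offset, limit):
    """Serve `demo` in slices, imitating a paged API."""
    chunk = demo[offset:offset + limit]
    return {"results": chunk, "endOfRecords": offset + limit >= len(demo)}


print(tally_species(fetch_all(fake_page, page_size=2)))
```

The most numerous name comes first, which mirrors how A. andicola stood out in the San Cristóbal list; a count is only a hint, though, since heavy collecting in one spot can inflate it.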

Now I can open a new tab in my browser and have Google Images ready. A quick search for each of the three relevant Adiantum species shows that my photos (especially the view of the underside of the fronds) are a good match only to Adiantum andicola. And I have my ID!

A few caveats with this method:

— No database is complete, even GBIF. Depending on how specific an area you zoom to on the map, there may or may not be a good selection of specimens. So gauge your zoom area to get a reasonable set of possibilities. If you zoom in too far, you may miss all records or you may accidentally exclude relevant ones. If your area of interest has no specimens right near it, back out to encompass the nearest county or state to gather more records. And keep in mind that you are widening your search, so the precise species you find in that larger area may or may not occur where you were.

— No database is infallible. There is always the possibility of misidentified museum specimens, or of changes in taxonomy not reflected in the GBIF search. [In particular, see the link to @rdmpage's blog in @kueda's comment, below.] Related to this, Google Images is a great resource for finding images of species, but it is often very “inclusive” when grabbing images off the internet, especially when it can’t find your specific search target. When scanning through Google Images, ALWAYS check the description and/or go to the original page to double-check that the picture you're looking at matches what you searched for.

— Interestingly, GBIF taps into iNaturalist observations. If any meet your criteria and fall within your geographic area of interest, they will show up in your search results. That's a double-edged sword: good for seeing local iNat observations alongside specimen records, but bad because of the pitfalls we are all familiar with in crowd-sourced identifications. In that helpful right-hand "Refine your search" panel, you can include or exclude "Human Observations" (the category iNat observations fall under); by default they are included.

— Most importantly, keep in mind that photos may not be sufficient for a solid identification. This may be due to the technical details of how similar species are distinguished, to the quality of your photos, or both. You can always help your cause by photographing all the important parts of a plant or critter. For plants in particular, it’s not always obvious which characters will help distinguish one species from another, and those characters may or may not be visible in a set of photos. Be sure to photograph the whole plant, the leaves, the flowers and fruit if present, and get good close-ups at various angles. This might include top and side views of flowers (e.g. for Composites), views of the underside of leaves, etc. If you’ve ever worked through dichotomous keys for a given group of organisms, you’ll have some sense of the characters to emphasize in images; this just comes with experience. If you have some science training, a little supplemental research with something like Google Scholar, Archive.org, or JSTOR may point you to useful technical papers to help with your ID. A particularly useful link to the Biodiversity Heritage Library is found in the right-hand panel of the Overview window in the GBIF search results. It brings up a list of very specific references to your species of interest; you can dive into that rabbit hole and follow it ad infinitum.
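The first caveat (back out until there are enough records) can be made mechanical. This minimal sketch doubles a square box around your location until it holds at least a chosen number of records or reaches a size cap. `count_in_box`, the starting size, and the thresholds are hypothetical choices, and the stand-in counter below merely imitates a sparse area. As the caveat says, a wider box means the species you find may not actually occur at your exact spot.

```python
def widen_until(count_in_box, lat, lon, start=0.05, factor=2.0,
                min_records=20, max_half_width=5.0):
    """Grow a square box around (lat, lon) until it holds at least
    `min_records` records or the next step would exceed `max_half_width`
    degrees. `count_in_box(box)` takes (min_lat, max_lat, min_lon, max_lon)
    and returns a record count. Returns (half_width, count)."""
    half = start
    while True:
        box = (lat - half, lat + half, lon - half, lon + half)
        n = count_in_box(box)
        if n >= min_records or half * factor > max_half_width:
            return half, n
        half *= factor


# Stand-in counter for illustration: pretend records only appear once the box
# is at least 0.3 degrees tall. A real counter might query GBIF with limit=0
# and read the reported total (an assumption about the API).
def fake_count(box):
    return 25 if box[1] - box[0] >= 0.3 else 0


print(widen_until(fake_count, 16.74, -92.64))
```

Doubling keeps the number of queries small; a gentler factor would track "nearest county, then state" more closely at the cost of more requests.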

Posted by gcwarbler, February 7, 2017 20:03

Observations

Observer: gcwarbler

Date: January 14, 2017 09:33 AM CST

Description: Since there are apparently several species that might occur in Chiapas, I'm only going to upload this at the genus level.

Comments

Really interesting post, Chuck!

Posted by muir almost 5 years ago

Given that the taxon pages suck so badly now, I figured I ought to try this out. Nice tool! Provided one has a starting place for the ID.
Up here on the panhandle, though, it tends to show me a lot of my own observations, which makes me feel like I am accomplishing something, but only if the IDs are correct. So that's a teleological dilemma.
We amateurs are stacking the deck now, in a major way.

Posted by ellen5 almost 5 years ago

@ellen5, I would think that TxTU or other regional institutions have substantial herbarium holdings. It would be of interest to know if GBIF taps into those records. There's probably some way to explore that, but I'm a relative novice on GBIF.

Posted by gcwarbler almost 5 years ago

"Provided one has a starting place for the ID" is sooooo true. I tried this GBIF method to double-check a Gecko observation I made in Hawaii a few years ago (http://www.inaturalist.org/observations/5051777). The critter had already been IDed by @ritt, so this was just to check against GBIF database. In that case, I searched for "Gekkonidae" and zoomed into the Hawaiian Islands and the possibilities were relatively finite in number. I'll have to practice to see if the GBIF-Google Images methodology is of any use for something like "Apoidea" in a well-collected area. [Just tried it and it doesn't offer refinement for anything between "Order" and "Family".]

Posted by gcwarbler almost 5 years ago

Great walkthrough, Chuck! Thanks for putting this together. I learned a couple of new tips.

Posted by billdodd almost 5 years ago

If you're interested in reading more about the reliability of GBIF data more broadly, @rdmpage has written quite a bit about it. Here's one of his blog posts on the subject: http://iphylo.blogspot.com/2014/08/seven-percent-of-gbif-data-is-usable.html. Summary (despite the title/url): it's complicated. It's too bad Google doesn't seem to have an API for their image search any more, or you could imagine making a tool that kind of wraps up your protocol, Chuck.

Best way to figure out if an herbarium contributes to GBIF is probably to either search GBIF for records you know to be in the herbarium, or just contact the herbarium. If they don't, they would need to a) get GBIF or the US GBIF node to agree to integrate their data, b) have their specimens databased, and c) publish that dataset in a format GBIF can consume. Lots more at http://www.gbif.org/publishing-data/quick-guide
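Following the first suggestion (search GBIF for records you know to be in the herbarium), a lookup by institution code and catalogue number can be sketched as below. It assumes GBIF's v1 occurrence search accepts `institutionCode` and `catalogNumber` filters and reports a total `count` in its JSON response; the code `XYZ` and number `12345` are hypothetical placeholders for a specimen you know the herbarium holds, and both function names are made up for illustration.

```python
from urllib.parse import urlencode

# Assumed base URL of GBIF's public v1 REST API.
GBIF_API = "https://api.gbif.org/v1"


def specimen_lookup_url(institution_code, catalog_number):
    """URL asking GBIF for occurrences with a given institution code and
    catalogue number; limit=1 because only the total count matters here."""
    params = {"institutionCode": institution_code,
              "catalogNumber": catalog_number,
              "limit": 1}
    return f"{GBIF_API}/occurrence/search?{urlencode(params)}"


def found_in_gbif(response):
    """True if a parsed occurrence-search response reports any matching record."""
    return response.get("count", 0) > 0


print(specimen_lookup_url("XYZ", "12345"))
```

A zero count for one specimen doesn't prove the herbarium is absent (its records may be coded differently), so it's worth trying a few specimens before concluding anything.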

Posted by kueda almost 5 years ago

@rdmpage's blog (above) is informative and sobering...and not unexpected. I briefly recite some of the same caveats in my post above. I think the value of my overview is, of course, not so much for conservation status (a much weightier examination of data) but rather just to give an iNatter some handle on how to narrow ID possibilities, with all the warts that come with such results. The methods above have the most utility if you have some sort of starting point of reference (like "Adiantum" or "Gekkonidae") and are in an area with limited iNat coverage at present. In those circumstances, GBIF is better than nothing!

Posted by gcwarbler almost 5 years ago

Yes, I think it's going to be fantastic for doing a reality check on an ID for something unfamiliar.

Posted by ellen5 almost 5 years ago

As usual, another really great and thought-provoking journal entry, Chuck! :) I also enjoy the comments and discussion.

I had used GBIF a little bit before, when I worked in the herbarium and we were databasing some of our specimens. I'd heard 'ooooh, it's such a messy database and not too reliable.' However, in all honesty, anyone who spends a considerable amount of time in a large herbarium will see how messy any collection is! At BRIT, there are approximately 1 million plant specimens. About 10-20 percent of these are TX specimens, and they're relatively well curated. However, there are bookoos of specimens from Mexico and South America that are riddled with old taxonomy and insufficient identifications. They're just waiting to be databased and examined to breathe some new life into their existence.

My two cents: I try my best to make iNaturalist as reliable a database as I can. Sure, I make lots of mistakes in IDs, but I do try to correct them. On my observations that lack the right photo angle (ugh, Asteraceae), I try my best to get better/more shots the next time I see the plant. Like in the herbarium, many of my observations of obscure, relatively unknown organisms wait and wait until an expert comes along, but that's ok. The data are functional simply because the observation exists in the first place (by 'functionality of the data' I mean 'getting people to care about the critters around them'). :)

"If you don't like something, change it. If you can't change it, change your attitude." ~ Maya Angelou

Also, Chuck, will you look at all of my moths again? OK, thanks! ;) (just kidding!!!)

Posted by sambiology almost 5 years ago

I'm gonna bookmark this. I'm sure it will prove valuable once I get the hang of it.
Question: is there a way to select for records having images?
I find Google searching to be really inefficient, as it prioritizes marketable stuff that doesn't even match the search string.

Posted by ellen5 almost 5 years ago

I don't think GBIF has the capability of searching only image databases. That would be awesome, but understandably, very few institutions have yet digitized any of their natural history collections--the U.T. herbarium being a wonderful exception!

Also, let me clarify: my image searches use "Google Images", available as a link in the upper right corner of the Google home page. (I have it bookmarked in my Favorites in Safari for easy access, along with "Google Scholar" and some other tools.)
As you've encountered, searching for images of a plant or animal can bring up some strange results. It ALWAYS requires a cautious review. It's always going to give you thousands of results, even if they are completely irrelevant. In fact, I rarely look past the first page of results, knowing that the remainder will have virtually zero relevancy to what I'm looking for.

Here are a couple of tricks to use to focus a search:

a. Try to use the most specific, least "common" word(s) in your search string. For plants, use the scientific genus name (or species if you want to be that detailed). But understand that Google Images is still going to give you thousands of results, no matter what you ask for. For instance, if you search, as I did, for "Adiantum andicola", it comes back with pages filled with maidenhair ferns. But only about the first 25 are actually of that species. Thereafter, it begins to throw in other species of Adiantum, and by the time I scroll toward the bottom of the first page, it's throwing in images of bromeliads and Ludwigias, probably because somewhere on the page with those images, the word "Adiantum" appears.

b. If you are searching for images of a species in a large genus and you get many images of non-target species--or to eliminate any other irrelevant images--make use of the "negation" capability in the search bar: That is, put a minus sign ("-") in front of terms or species names that you want to exclude from the search. If you're searching for, say, "Euphorbia albomarginata" but the results get photo-bombed by lots of Euphorbia prostrata and other species, just eliminate them from consideration by modifying the search to read, "Euphorbia albomarginata -prostrata -serpens -villifera -hirta", etc. Another example: Just try searching for "beetle" in Google Images. About the fourth or fifth line down, you begin to see images of Volkswagens. Then search for "beetle -VW -Volkswagen" and those are eliminated.
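The exclusion trick in tip b can also be scripted so a long exclusion list is easy to reuse. In this minimal sketch, `image_query` joins a target with `-term` exclusions (optionally quoting the target as an exact phrase, in the spirit of tip a), and `google_images_url` wraps the result in a search URL. Both helper names are hypothetical, and the `tbm=isch` parameter is an assumption based on how Google image-search URLs have long looked, not a documented API.

```python
from urllib.parse import urlencode


def image_query(target, exclude=(), exact=False):
    """Search string: the target (optionally quoted as an exact phrase)
    followed by '-term' for every term to exclude."""
    head = f'"{target}"' if exact else target
    return " ".join([head] + [f"-{term}" for term in exclude])


def google_images_url(query):
    """Image-search URL; tbm=isch is an assumed, undocumented parameter."""
    return "https://www.google.com/search?" + urlencode({"q": query, "tbm": "isch"})


print(image_query("Euphorbia albomarginata",
                  ["prostrata", "serpens", "villifera", "hirta"]))
print(google_images_url(image_query("beetle", ["VW", "Volkswagen"])))
```

Keeping the exclusions in a list makes it easy to add the next photo-bomber as it shows up, without retyping the whole query.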

If anyone has had different experiences with other image search engines, I'd like to hear about them.

Hope this helps!

Posted by gcwarbler almost 5 years ago
