How to Select Features by Location in GeoPandas

Problem statement

A common GIS task is selecting features based on where they are relative to another layer.

Typical examples include:

  • selecting points inside administrative boundaries
  • selecting parcels within a study area
  • selecting roads that intersect flood zones
  • filtering buildings within a buffer distance of a river

In GeoPandas, this means comparing geometries in one GeoDataFrame to geometries in another and keeping only the matching features. The key requirements are:

  • both layers must use the same CRS
  • you must choose the correct spatial predicate, such as within, intersects, or contains

Quick answer

The usual GeoPandas workflow for selecting features by location is:

  1. read both layers
  2. make sure both use the same CRS
  3. choose the correct spatial predicate
  4. use geopandas.sjoin() when matching one full layer to another

Example:

import geopandas as gpd

points = gpd.read_file("data/schools.shp")
polygons = gpd.read_file("data/city_boundary.shp")

polygons = polygons.to_crs(points.crs)

selected = gpd.sjoin(points, polygons, predicate="within", how="inner")
print(len(selected))

This keeps only points that fall inside the polygons.

Step-by-step solution

Load the layers you want to compare

Read source files into GeoDataFrames

Start by loading both datasets.

import geopandas as gpd

points = gpd.read_file("data/customer_locations.geojson")
boundaries = gpd.read_file("data/admin_areas.shp")

Check geometry types

It helps to confirm what kinds of features you are working with.

print(points.geom_type.unique())
print(boundaries.geom_type.unique())

Typical output might be:

  • points layer: Point
  • boundary layer: Polygon or MultiPolygon

Check for invalid geometries

Invalid geometries can cause spatial operations to fail or return unexpected results.

print(points.is_valid.value_counts())
print(boundaries.is_valid.value_counts())

If needed, you can try a quick repair for some polygon issues:

# Quick fix for some invalid polygon geometries; verify results afterward
boundaries["geometry"] = boundaries.buffer(0)

Make sure both layers use the same CRS

Check the CRS of each layer

print(points.crs)
print(boundaries.crs)

Reproject one layer if needed

If the CRS does not match, reproject one layer to the other.

boundaries = boundaries.to_crs(points.crs)

Why CRS mismatch causes wrong results

If one layer is in EPSG:4326 and the other is in a projected CRS such as EPSG:3857 or EPSG:32633, the coordinates are not directly comparable. Spatial selection may return no rows or incorrect rows.

Select features by spatial relationship

There are two main patterns:

  • compare a layer to one merged geometry
  • compare every feature in one layer to every feature in another with sjoin()

Use the first pattern for simple filtering against a single area. Use sjoin() for feature-to-feature matching between full layers.

Use within to select features inside polygons

This is a common pattern for points or parcels inside an area.

import geopandas as gpd

points = gpd.read_file("data/inspection_points.shp")
polygons = gpd.read_file("data/service_areas.shp").to_crs(points.crs)

study_area = polygons.union_all()
selected_points = points[points.within(study_area)]

print(f"Selected points: {len(selected_points)}")

This works well when you want to test one layer against the combined geometry of another.

Use intersects to select overlapping or touching features

Use intersects when features should be kept if they overlap or touch.

import geopandas as gpd

roads = gpd.read_file("data/roads.shp")
flood_zones = gpd.read_file("data/flood_zones.shp").to_crs(roads.crs)

flood_geom = flood_zones.union_all()
roads_in_flood = roads[roads.intersects(flood_geom)]

print(f"Roads intersecting flood zones: {len(roads_in_flood)}")

Use contains when the reference geometry must fully contain another

contains is the reverse of within. For example, if you want polygons that contain a specific point:

import geopandas as gpd

points = gpd.read_file("data/sample_points.shp")
polygons = gpd.read_file("data/zones.shp").to_crs(points.crs)

point = points.iloc[0].geometry
matching_polygons = polygons[polygons.contains(point)]

print(matching_polygons)

Choose the right predicate

Use:

  • within for features completely inside another geometry
  • contains for geometries that completely contain another
  • intersects for any overlap or touch
  • touches when only shared boundaries matter
  • crosses for line-crossing cases

For many GIS workflows, intersects is more inclusive than within.

Use spatial join for layer-to-layer selection

When geopandas.sjoin() is the simplest option

If you want to compare every feature in one layer against every feature in another and keep matching rows, sjoin() is usually the cleanest approach.

Select features from one layer based on matches in another

import geopandas as gpd

points = gpd.read_file("data/stores.shp")
districts = gpd.read_file("data/sales_districts.shp").to_crs(points.crs)

joined = gpd.sjoin(points, districts, how="inner", predicate="within")
print(joined.columns)
print(joined.head())

Keep only matching records from the target layer

how="inner" keeps only matches, which makes it useful for select-by-location tasks.

selected_points = joined.drop(columns=["index_right"], errors="ignore")

Code examples

Select points within administrative boundaries

import geopandas as gpd

points = gpd.read_file("data/schools.geojson")
admin = gpd.read_file("data/municipal_boundary.shp").to_crs(points.crs)

selected = gpd.sjoin(points, admin, predicate="within", how="inner")
print(f"Schools inside boundary: {len(selected)}")

Select roads that intersect hazard polygons

import geopandas as gpd

roads = gpd.read_file("data/roads.shp")
hazards = gpd.read_file("data/landslide_zones.shp").to_crs(roads.crs)

selected_roads = gpd.sjoin(roads, hazards, predicate="intersects", how="inner")
selected_roads = selected_roads.drop(columns=["index_right"], errors="ignore")

print(f"Roads intersecting hazards: {len(selected_roads)}")

Select parcels inside a study area boundary

import geopandas as gpd

parcels = gpd.read_file("data/parcels.shp")
study_area = gpd.read_file("data/study_area.geojson").to_crs(parcels.crs)

study_geom = study_area.union_all()
selected_parcels = parcels[parcels.within(study_geom)]

print(f"Parcels inside study area: {len(selected_parcels)}")

Save the selected features

Export results to GeoJSON or Shapefile

After filtering, write the result to a new file.

import geopandas as gpd

parcels = gpd.read_file("data/parcels.shp")
study_area = gpd.read_file("data/study_area.geojson").to_crs(parcels.crs)

study_geom = study_area.union_all()
selected_parcels = parcels[parcels.within(study_geom)]

selected_parcels.to_file("output/selected_parcels.geojson", driver="GeoJSON")
selected_parcels.to_file("output/selected_parcels.shp")

Keep only the columns you need before saving

This reduces file size and avoids carrying unnecessary fields into the output.

import geopandas as gpd

parcels = gpd.read_file("data/parcels.shp")
study_area = gpd.read_file("data/study_area.geojson").to_crs(parcels.crs)

study_geom = study_area.union_all()
selected_parcels = parcels[parcels.within(study_geom)]

output = selected_parcels[["parcel_id", "owner_name", "geometry"]]
output.to_file("output/selected_parcels.geojson", driver="GeoJSON")

If you save to Shapefile, remember that field names are limited in length.

Explanation

There are two practical ways to select features by location in GeoPandas.

The first is to use boolean geometry methods such as within() and intersects() directly on a GeoSeries. This is a good fit when you want to compare one layer against a single geometry, such as a study area created by merging polygons with union_all().

The second is to use gpd.sjoin() when you want feature-to-feature comparison between two full layers. This is usually the better option when each feature in one dataset may match one or more features in another.

The main decision is:

  • use geometry methods for one geometry or one merged area
  • use sjoin() for full layer-to-layer matching

Predicate choice also matters:

  • within excludes boundary-only cases
  • intersects includes touching and overlapping cases
  • contains depends on geometry direction

This pattern is useful in automation because it can be reused in scripts: load layers, align CRS, run the spatial filter, and export the result.

Edge cases and notes

  • CRS mismatch is the most common reason for empty results. Always compare gdf1.crs and gdf2.crs.
  • Invalid geometries can break spatial matching or give unreliable output. Check is_valid before joining or filtering.
  • Boundary-touching features may not be returned by within. Use intersects if touching the boundary should count.
  • Duplicate matches can occur with sjoin() if one feature matches multiple polygons.
  • Large datasets are usually faster with sjoin() than manual loops, but performance still depends on geometry complexity and data size.
  • Null geometries should be removed or handled before spatial operations.

Distance-based selection

For distance-based selection, use a projected CRS before buffering so the distance is measured in meters or feet rather than degrees.

import geopandas as gpd

rivers = gpd.read_file("data/rivers.shp")
buildings = gpd.read_file("data/buildings.shp").to_crs(rivers.crs)

# Reproject first if rivers is in a geographic CRS such as EPSG:4326
river_buffer = rivers.buffer(100).union_all()
nearby_buildings = buildings[buildings.intersects(river_buffer)]

print(f"Buildings within 100 units of rivers: {len(nearby_buildings)}")

For the broader workflow, see Spatial joins in GeoPandas.

Related tasks:

For exporting filtered results, see How to Export GeoJSON in Python with GeoPandas.

If your query returns nothing, see Why GeoPandas Spatial Join Returns No Results.

FAQ

How do I select points inside polygons in GeoPandas?

Use within() against a polygon geometry or use gpd.sjoin() with predicate="within".

selected = gpd.sjoin(points, polygons, how="inner", predicate="within")

Should I use within or intersects?

Use within when the feature must be fully inside another geometry. Use intersects when touching the boundary should count as a match.

Why does my spatial selection return no rows?

Common causes are:

  • CRS mismatch
  • invalid geometries
  • wrong predicate choice
  • null or empty geometries

Can I export selected features to a new file?

Yes. After filtering, save the result with to_file().

selected.to_file("output/selected_features.geojson", driver="GeoJSON")

Is sjoin() better than geometry methods?

Use geometry methods for simple comparisons against one geometry. Use sjoin() when comparing two layers feature by feature.