What happened?
Hi everyone!
I noticed that calls to table.cache() interfere with the schema reported by table.schema(), depending on the order of calls. For example, in the following code, the call to .cache() changes centroid's dtype from point:geometry to geospatial:geometry:
from pathlib import Path
from urllib.request import urlretrieve
import ibis
# Download upstream example geoparquet file
url = "https://github.com/opengeospatial/geoparquet/raw/refs/tags/v1.1.0+p1/examples/example.parquet"
parquet_path = Path("opengeospatial-example.parquet")
if not parquet_path.exists():
    urlretrieve(url, parquet_path)
# Ensure the spatial extension has been loaded (duckdb backend)
con = ibis.get_backend()
con.load_extension("spatial")
# Read data and compute centroids
data = ibis.read_parquet(parquet_path)  # Raises a KeyError with duckdb==1.5.0, see below
data_with_centroids = data.mutate(centroid=ibis._.geometry.centroid())
data_with_centroids_cached = data_with_centroids.cache()
# Compare schemas
print(
    "data schema:",
    data.schema(),
    "\ndata with centroids schema:",
    data_with_centroids.schema(),
    "\ndata with centroids cached schema:",
    data_with_centroids_cached.schema(),
    "\nschemas equivalent:",
    data_with_centroids.schema().equals(data_with_centroids_cached.schema()),
    sep="\n",
)
Output:
data schema:
ibis.Schema {
pop_est float64
continent string
name string
iso_a3 string
gdp_md_est int64
geometry geospatial:geometry
bbox struct<xmax: float64, xmin: float64, ymax: float64, ymin: float64>
}
data with centroids schema:
ibis.Schema {
pop_est float64
continent string
name string
iso_a3 string
gdp_md_est int64
geometry geospatial:geometry
bbox struct<xmax: float64, xmin: float64, ymax: float64, ymin: float64>
centroid point:geometry
}
data with centroids cached schema:
ibis.Schema {
pop_est float64
continent string
name string
iso_a3 string
gdp_md_est int64
geometry geospatial:geometry
bbox struct<xmax: float64, xmin: float64, ymax: float64, ymin: float64>
centroid geospatial:geometry
}
schemas equivalent:
False
Note how the two schemas are not considered equal. I would expect the reported schema to remain invariant under calls to table.cache().
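To pinpoint which column differs, here is a minimal sketch of a column-by-column comparison. It uses plain dicts filled with the dtype strings from the output above, so it runs without ibis or DuckDB. The helper name diff_schemas is hypothetical and not part of ibis.

```python
from typing import Mapping, Optional


def diff_schemas(
    left: Mapping[str, str], right: Mapping[str, str]
) -> list[tuple[str, Optional[str], Optional[str]]]:
    """Return (column, left dtype, right dtype) for every column that differs.

    A column missing on one side is reported with None for that side.
    """
    # Keep the column order of `left`, then append columns only in `right`.
    names = list(left) + [name for name in right if name not in left]
    return [
        (name, left.get(name), right.get(name))
        for name in names
        if left.get(name) != right.get(name)
    ]


# Dtype strings copied from the schemas printed above.
shared = {
    "pop_est": "float64",
    "continent": "string",
    "name": "string",
    "iso_a3": "string",
    "gdp_md_est": "int64",
    "geometry": "geospatial:geometry",
    "bbox": "struct<xmax: float64, xmin: float64, ymax: float64, ymin: float64>",
}
before_cache = {**shared, "centroid": "point:geometry"}
after_cache = {**shared, "centroid": "geospatial:geometry"}

differences = diff_schemas(before_cache, after_cache)
for column, before, after in differences:
    print(f"{column}: {before} -> {after}")
```

Assuming an ibis schema can be iterated like a mapping of column name to dtype, the dicts could probably be built from the real schemas with something like {name: str(dtype) for name, dtype in schema.items()}, but I have not verified that against every ibis version.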
What version of ibis are you using?
ibis-framework[duckdb,geospatial]==12.0.0
What backend(s) are you using, if any?
DuckDB, i.e. duckdb==1.4.4 (the current LTS version at the time of writing; I'm not using the latest version, 1.5.0, because it gives me a KeyError: 'OGC:CRS84' when reading the Parquet file)