An .xlsx file is a standard ZIP archive containing XML files and folders.
Excel reads and writes all data through these XML files — understanding their
structure is the prerequisite for any surgical, code-based repair.
This document is grounded in a direct analysis of BadPipeIn.xlsx and reflects
the specific layout of the PipeUpdUV workbook. It serves as the foundation for
developing FixPipe.py fixes step by step.
BadPipeIn.xlsx (ZIP archive)
│
├── [Content_Types].xml ← Registry of all parts and their MIME types
├── _rels/
│ └── .rels ← Root: points to xl/workbook.xml
│
├── docProps/
│ ├── app.xml ← Application metadata (app version, etc.)
│ ├── core.xml ← Core properties (author, created, modified)
│ └── custom.xml ← Custom document properties
│
└── xl/
├── workbook.xml ← Sheet list, names, active sheet
├── _rels/
│ └── workbook.xml.rels ← Maps rId→ file for each sheet, cache, style, etc.
│
├── worksheets/
│ ├── sheet1.xml … sheet12.xml ← Cell data, formulas, column/row definitions
│ └── _rels/
│ ├── sheet4.xml.rels ← Refs to pivot tables for "By Sales"
│ ├── sheet5.xml.rels ← (and other sheets that have rels)
│ └── …
│
├── pivotTables/
│ ├── pivotTable1.xml … pivotTable10.xml ← Layout, field config, filter state
│ └── _rels/
│ └── pivotTableN.xml.rels ← Each PT → its pivotCacheDefinition
│
├── pivotCache/
│ ├── pivotCacheDefinition1.xml … 4.xml ← Field metadata + shared-item lists
│ ├── pivotCacheRecords1.xml … 4.xml ← Actual cached data rows
│ └── _rels/
│ └── pivotCacheDefinitionN.xml.rels ← Def → its Records file
│
├── charts/
│ └── chart1.xml, chart2.xml, chart3.xml ← Chart definitions (type, series, style)
│
├── drawings/
│ ├── drawing1.xml ← Anchors charts to worksheet grid
│ └── _rels/
│ └── drawing1.xml.rels ← drawing → chart file(s)
│
├── sharedStrings.xml ← Deduplicated string table (cell text)
├── styles.xml ← Number formats, fonts, fills, borders, cell styles
├── calcChain.xml ← Formula recalculation order
└── theme/
└── theme1.xml ← Color/font theme
Every file in the ZIP must be registered here so Excel knows how to handle it.
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<!-- Default: applies to any file with this extension -->
<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
<Default Extension="xml" ContentType="application/xml"/>
<!-- Override: applies to a specific path -->
<Override PartName="/xl/workbook.xml"
ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>
<Override PartName="/xl/worksheets/sheet4.xml"
ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>
<Override PartName="/xl/charts/chart1.xml"
ContentType="application/vnd.openxmlformats-officedocument.drawingml.chart+xml"/>
<Override PartName="/xl/drawings/drawing1.xml"
ContentType="application/vnd.openxmlformats-officedocument.drawing+xml"/>
<Override PartName="/xl/pivotTables/pivotTable1.xml"
ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.pivotTable+xml"/>
<Override PartName="/xl/pivotCache/pivotCacheDefinition1.xml"
ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.pivotCacheDefinition+xml"/>
<Override PartName="/xl/pivotCache/pivotCacheRecords1.xml"
ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.pivotCacheRecords+xml"/>
</Types>Rules:
- Adding a new chart/drawing/cache without a
<Override>entry will cause Excel to ignore or corrupt the file on open. - Use string manipulation (not ElementTree) to edit this file. ET re-serializes
the namespace as
ns0:which breaks the file.
Every .rels file lives in a _rels/ subdirectory alongside its parent file.
<!-- xl/worksheets/_rels/sheet4.xml.rels -->
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId1" Type="…/pivotTable" Target="../pivotTables/pivotTable1.xml"/>
<Relationship Id="rId9" Type="…/drawing" Target="../drawings/drawing1.xml"/>
</Relationships>Key rules:
Idvalues are unique only within that .rels file —rId1insheet4.xml.relsis unrelated torId1indrawing1.xml.rels.- The
TypeURI identifies what kind of link this is (worksheet, chart, drawing, pivotTable, pivotCacheDefinition, sharedStrings, styles, theme, …). Targetis a relative path from the parent file's directory.- Relationship files are optional: only sheets that have charts, pivot tables, or
other linked objects have a
.relsfile.
Lists all sheets with their display names, internal sheet IDs, and relationship IDs:
<workbook>
<sheets>
<sheet name="Pipeline Sell Out" sheetId="1" r:id="rId1"/>
<sheet name="By Sales" sheetId="3" r:id="rId4"/>
<sheet name="Pipe Analysis" sheetId="4" r:id="rId5"/>
<sheet name="Pipe Log" sheetId="7" r:id="rId8"/>
<!-- hidden sheets: -->
<sheet name="Pipeline Run Rate" sheetId="2" state="hidden" r:id="rId3"/>
</sheets>
</workbook>xl/_rels/workbook.xml.rels maps each r:id to a file path, and also maps the
four pivot cache definitions to their workbook-level relationship IDs.
Contains cell data, column widths, row heights, and references to embedded objects.
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<sheetViews>…</sheetViews>
<sheetFormatPr …/>
<cols>…</cols>
<sheetData>
<row r="1">
<c r="A1" t="s"><v>42</v></c> <!-- string: index into sharedStrings -->
<c r="B1" t="n"><v>1234.5</v></c>
<c r="C1"><f>A1+B1</f><v>…</v></c>
</row>
</sheetData>
<!-- Reference to the drawing layer (charts): -->
<drawing r:id="rId10"/>
</worksheet>The <drawing r:id="…"/> element at the end of the worksheet XML links the
sheet to its drawingN.xml via the sheet's .rels file.
Removing or failing to add this element means charts are invisible even if all
other files are present.
Each pivot cache is three files: definition, records, and the definition's .rels.
Describes the data source and the field catalogue (schema + shared value lists).
<pivotCacheDefinition r:id="rId1" recordCount="936">
<cacheSource type="worksheet">
<!-- Option 1: direct range reference -->
<worksheetSource ref="A2:U1048576" sheet="Pipeline Sell Out"/>
<!-- Option 2: named range (DATASELLOUT in BadPipeIn cache4) -->
<worksheetSource name="DATASELLOUT"/>
</cacheSource>
<cacheFields count="21">
<!-- Field with shared-item list (used by <x> refs in records) -->
<cacheField name="Propriétaire de l'opportunité" numFmtId="0">
<sharedItems containsBlank="1" count="33">
<s v="Kajanan SHAN"/>
<s v="Charles TEZENAS"/>
<!-- … one <s> per unique owner value … -->
</sharedItems>
</cacheField>
<!-- Field whose values are stored inline in records (no shared items) -->
<cacheField name="Opportunity Number" numFmtId="0">
<sharedItems containsBlank="1" containsSemiMixedTypes="1" containsNonDate="1" containsString="1"/>
</cacheField>
<!-- Numeric field -->
<cacheField name="Quantité" numFmtId="0">
<sharedItems containsNumber="1" containsInteger="1" minValue="1" maxValue="10000000"/>
</cacheField>
</cacheFields>
</pivotCacheDefinition>Shared items are the unique values for a field, stored as an indexed list. Fields without shared items store their values directly in each record.
One <r> element per data row. Each child element corresponds to one field
in the same order as cacheFields:
<pivotCacheRecords count="936">
<r>
<x v="0"/> <!-- Owner: index 0 → "Kajanan SHAN" -->
<d v="2024-12-19T00:00:00"/> <!-- Created Date: inline date -->
<d v="2025-06-30T00:00:00"/> <!-- Close Date: inline date -->
<s v="Closed Won"/> <!-- Stage: inline string -->
<s v="OP0000218588"/> <!-- Opportunity Number: inline string -->
<s v="CFI"/> <!-- Revendeur: inline string -->
<s v="France Travail"/> <!-- Client Final: inline string -->
<n v="110000"/> <!-- Quantité: inline number -->
<n v="367"/> <!-- Prix de vente: inline number -->
<n v="40370000"/> <!-- Prix total: inline number -->
<!-- … more fields … -->
<x v="0"/> <!-- Product Family: index into its sharedItems -->
<x v="3"/> <!-- Quarter Invoice: index -->
<x v="2"/> <!-- Forecast: index -->
<m/> <!-- Blank / missing value -->
</r>
</pivotCacheRecords>Element types:
| Tag | Meaning |
|---|---|
<x v="N"/> |
Shared-item reference — index N into that field's <sharedItems> list |
<s v="…"/> |
Inline string |
<n v="…"/> |
Inline number |
<d v="…"/> |
Inline date (ISO 8601) |
<m/> |
Missing / null value |
<b v="0|1"/> |
Boolean |
<e v="…"/> |
Error value |
| Cache | Source | Records | Used by Pivot Tables | Notes |
|---|---|---|---|---|
| cache1 | Owner Opty Tracking sheet | — | PT5 (pivotTable5) | Owner tracking pivot; do not modify |
| cache2 | Pipeline Sell Out A2:U∞ |
4799 | PT1,2,4,6,7,9 | Main pipeline data |
| cache3 | — | — | PT10 | Used by Pipeline Close Lost sheet |
| cache4 | Named range DATASELLOUT |
936 | PT3, PT8 | "By Sales" main data; refreshes on load |
Defines how the cache data is arranged and displayed.
<pivotTableDefinition name="Tableau croisé dynamique1" cacheId="2">
<location ref="S6:V11" firstHeaderRow="1" firstDataRow="2" firstDataCol="1"/>
<pivotFields count="24">
<!-- One pivotField per cache field, in the same order -->
<!-- A row field (axis="axisRow") -->
<pivotField axis="axisRow" showAll="0">
<items count="40">
<item sd="0" x="0"/> <!-- cache index 0 → visible (no h attr) -->
<item h="1" sd="0" x="17"/> <!-- h="1" → hidden in pivot -->
<item t="default" sd="0"/> <!-- the Grand Total row marker -->
</items>
</pivotField>
<!-- A page/filter field (axis="axisPage") -->
<pivotField axis="axisPage" showAll="0" multipleItemSelectionAllowed="1">
<items count="25">
<item x="0"/> <!-- selected (no h attr) -->
<item h="1" x="1"/> <!-- hidden / unchecked in filter dropdown -->
</items>
</pivotField>
<!-- A data field (no axis) -->
<pivotField dataField="1" showAll="0"/>
</pivotFields>
<!-- Which fields go where -->
<rowFields> <field x="0"/> </rowFields>
<colFields> <field x="3"/> </colFields>
<pageFields> <pageField fld="18" item="4294967295"/> </pageFields>
<dataFields> <dataField name="Sum of Qty" fld="7" subtotal="sum"/> </dataFields>
</pivotTableDefinition>Key concepts:
<item x="N"/>— thexattribute is a cache sharedItems index for that field.h="1"— the item is hidden (unchecked) in the pivot/filter.item="4294967295"on<pageField>— sentinel value meaning "multiple items selected".- A pivot table references its cache via
cacheIdwhich maps to the workbook'spivotCacheDefinitionrelationship.
Defines chart type, title, series, and visual styling.
<c:chartSpace xmlns:c="http://schemas.openxmlformats.org/drawingml/2006/chart">
<c:chart>
<c:title><c:tx>…<a:t>My Chart Title</a:t>…</c:tx></c:title>
<c:plotArea>
<c:barChart> <!-- or lineChart, line3DChart, pieChart, … -->
<c:ser>
<c:idx val="0"/>
<c:order val="0"/>
<!-- Series label from a cell -->
<c:tx><c:strRef><c:f>'Pipe Analysis'!$B$2</c:f></c:strRef></c:tx>
<!-- Category axis values -->
<c:cat><c:numRef><c:f>'Pipe Analysis'!$A$3:$A$34</c:f></c:numRef></c:cat>
<!-- Data values -->
<c:val><c:numRef><c:f>'Pipe Analysis'!$C$3:$C$34</c:f></c:numRef></c:val>
</c:ser>
</c:barChart>
</c:plotArea>
</c:chart>
</c:chartSpace>In BadPipeIn.xlsx:
chart1.xml—line3DChart, title "Evolution OPTY 30 Derniers Jours", referencesPipe Analysissheet. Belongs todrawing1.xmlviarId1.chart2.xml—barChart, referencesPipe Analysissheet data.rId2.chart3.xml—barChart, referencesPipe Analysissheet data.rId3.- All three charts live in
drawing1.xmlwhich is anchored to a different sheet (not "By Sales" —drawing1.xmlis already present in another sheet's.rels).
Acts as the anchor layer that positions charts (or images/shapes) on the worksheet grid.
<xdr:wsDr xmlns:xdr="http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing"
xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
<xdr:twoCellAnchor> <!-- anchor defined by two cell corners -->
<xdr:from>
<xdr:col>4</xdr:col> <xdr:colOff>260350</xdr:colOff>
<xdr:row>2</xdr:row> <xdr:rowOff>63500</xdr:rowOff>
</xdr:from>
<xdr:to>
<xdr:col>14</xdr:col> <xdr:colOff>438150</xdr:colOff>
<xdr:row>22</xdr:row> <xdr:rowOff>107950</xdr:rowOff>
</xdr:to>
<xdr:graphicFrame macro="">
<xdr:nvGraphicFramePr>
<xdr:cNvPr id="2" name="Chart 1"/>
<xdr:cNvGraphicFramePr/>
</xdr:nvGraphicFramePr>
<xdr:xfrm><a:off x="0" y="0"/><a:ext cx="0" cy="0"/></xdr:xfrm>
<a:graphic>
<a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/chart">
<!-- This r:id links to chart1.xml via drawing1.xml.rels -->
<c:chart xmlns:c="…" xmlns:r="…" r:id="rId1"/>
</a:graphicData>
</a:graphic>
</xdr:graphicFrame>
<xdr:clientData/>
</xdr:twoCellAnchor>
<!-- Second chart anchor in the same drawing file -->
<xdr:twoCellAnchor>…r:id="rId2"…</xdr:twoCellAnchor>
</xdr:wsDr>colOff and rowOff are in EMUs (English Metric Units): 914400 EMU = 1 inch.
xl/workbook.xml
sheet name="By Sales" r:id="rId4"
↓
xl/_rels/workbook.xml.rels
rId4 → worksheets/sheet4.xml
↓
xl/worksheets/sheet4.xml
<drawing r:id="rId9"/> ← must be present inside the worksheet XML
↓
xl/worksheets/_rels/sheet4.xml.rels
rId9 → ../drawings/drawing2.xml (new drawing for By Sales)
↓
xl/drawings/drawing2.xml
<c:chart r:id="rId1"/> ← first chart
<c:chart r:id="rId2"/> ← second chart
↓
xl/drawings/_rels/drawing2.xml.rels
rId1 → ../charts/chart4.xml (new chart files)
rId2 → ../charts/chart5.xml
↓
xl/charts/chart4.xml (bar chart — By Sales pivot chart 1)
xl/charts/chart5.xml (bar chart — By Sales pivot chart 2)
And in [Content_Types].xml, each new file needs an <Override> entry.
Sheet file: xl/worksheets/sheet4.xml (mapped via rId4 in workbook.xml.rels)
Pivot tables on this sheet (9 total):
| File | Name | Cache | Location | Description |
|---|---|---|---|---|
| pivotTable1 | PivotTable5 | cache2 | J60:L64 | Small summary |
| pivotTable2 | TCD2 | cache2 | AT4:AV25 | Column pivot |
| pivotTable3 | PivotTable7 | cache4 | AA5:AE12 | By Sales main (DATASELLOUT) |
| pivotTable4 | TCD1 | cache2 | S6:V11 | Filter-heavy pivot |
| pivotTable5 | TCD4 | cache1 | AH7:AI9 | Owner Opty (skip) |
| pivotTable6 | TCD6 | cache2 | O6:P32 | Long list pivot |
| pivotTable7 | PivotTable6 | cache2 | O60:R69 | Summary |
| pivotTable8 | TCD3 | cache4 | B5:D13 | By Sales top-left (DATASELLOUT) |
| pivotTable9 | PivotTable1 | cache2 | B60:F78 | Large summary |
Charts: Currently drawing1.xml (which has chart1, chart2, chart3) is
NOT linked to sheet4. Sheet4 has no <drawing> element and no drawing .rels entry.
This is the corruption fix_by_sales_charts addresses.
Target pivot table for chart fix: pivotTable8 (TCD3, cache4, location B5:D13)
— this is the "By Sales" summary table that the new charts should visualize.
-
Use string operations (not
ElementTree.tostring) for.relsand[Content_Types].xml— ET addsns0:namespace prefixes that corrupt the file. -
Use
ElementTree.fromstring()freely for reading (parsing) XML — it is safe to parse without modifying the serialized form. -
Preserve ZIP metadata when rewriting: use
zf_out.writestr(item, new_bytes)whereitemis the originalZipInfoobject to keep compression method and other metadata consistent. -
XML-escape attribute values when building XML strings:
&→&,"→",<→<,>→> -
Orphaned file cleanup (planned): before injecting new chart/drawing files, scan all
.relsfiles to find whichchartN.xml/drawingN.xmlfiles are actually referenced. Remove unreferenced files from the ZIP and their[Content_Types].xmlentries. This prevents accumulation of dead files (chart4,chart6,chart8…) from repeated fix cycles.
| Prefix | URI | Usage |
|---|---|---|
| (default) | http://schemas.openxmlformats.org/spreadsheetml/2006/main |
Worksheets, pivot tables, cache |
r: |
http://schemas.openxmlformats.org/officeDocument/2006/relationships |
Relationship IDs in XML |
c: |
http://schemas.openxmlformats.org/drawingml/2006/chart |
Chart XML |
a: |
http://schemas.openxmlformats.org/drawingml/2006/main |
DrawingML (shapes, text) |
xdr: |
http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing |
Drawing anchors |
mc: |
http://schemas.openxmlformats.org/markup-compatibility/2006 |
Markup compatibility |
xr: |
http://schemas.microsoft.com/office/spreadsheetml/2014/revision |
Revision tracking |
| (rels) | http://schemas.openxmlformats.org/package/2006/relationships |
.rels files |
| (ct) | http://schemas.openxmlformats.org/package/2006/content-types |
Content_Types.xml |