on
Clickhouse issue while reading a no-header CSV coming from a structured s3 path containing named entities (e.g. `foo=bar`)
Another issue that I came accross was reading no-header CSV from a a s3 path that has some structure. Specifically an expression containing equal sign such as foo=bar.
I’ve previously included a sample CSV file with NO header in s3. It’s duplicated in two different paths:
- test/sample-no-header.csv
- test/sample/date=2025-10-12/sample-no-header.csv
Since they are identical files, I’m expecting to be able to read both of them fine.
As usual, everything is available in this repo and you can reproduce it for yourself.
Reading a CSV from a regular s3 path
As expected, clickhouse detects columns names as c1, c2.. and is able to read the data.
SELECT *
FROM s3('http://minio:9001/test/sample-no-header.csv', 'minio', 'minio123', 'CSV')
Query id: ec707ea1-2d54-461d-ac0e-95bed5fa86e3
┌─────────c1─┬──────c2─┬──────c3─┬──────c4─┬──────c5─┬───────c6─┬─c7─┐
1. │ 1984-09-07 │ 0.42388 │ 0.42902 │ 0.41874 │ 0.42388 │ 23220030 │ 0 │
2. │ 1984-09-10 │ 0.42388 │ 0.42516 │ 0.41366 │ 0.42134 │ 18022532 │ 0 │
3. │ 1984-09-11 │ 0.42516 │ 0.43668 │ 0.42516 │ 0.42902 │ 42498199 │ 0 │
4. │ 1984-09-12 │ 0.42902 │ 0.43157 │ 0.41618 │ 0.41618 │ 37125801 │ 0 │
5. │ 1984-09-13 │ 0.43927 │ 0.44052 │ 0.43927 │ 0.43927 │ 57822062 │ 0 │
6. │ 1984-09-14 │ 0.44052 │ 0.45589 │ 0.44052 │ 0.44566 │ 68847968 │ 0 │
7. │ 1984-09-17 │ 0.45718 │ 0.46357 │ 0.45718 │ 0.45718 │ 53755262 │ 0 │
8. │ 1984-09-18 │ 0.45718 │ 0.46103 │ 0.44052 │ 0.44052 │ 27136886 │ 0 │
9. │ 1984-09-19 │ 0.44052 │ 0.44566 │ 0.43157 │ 0.43157 │ 29641922 │ 0 │
10. │ 1984-09-20 │ 0.43286 │ 0.43668 │ 0.43286 │ 0.43286 │ 18453585 │ 0 │
11. │ 1984-09-21 │ 0.43286 │ 0.44566 │ 0.42388 │ 0.42902 │ 27842780 │ 0 │
└────────────┴─────────┴─────────┴─────────┴─────────┴──────────┴────┘
11 rows in set. Elapsed: 0.006 sec.
Reading a CSV from a structured s3 path e.g. foo=bar
The second file I wanted to read from has date=2025-10-12 in its s3 path. When I tried to read from it, clickhouse fails indicating that the file might have missing columns.
SELECT *
FROM s3('http://minio:9001/test/sample/date=2025-10-12/sample-no-header.csv', 'minio', 'minio123', 'CSV')
Query id: 712d8d31-203c-4a9a-91ed-e6df668ea78b
Elapsed: 0.008 sec.
Received exception from server (version 25.8.10):
Code: 27. DB::Exception: Received from localhost:9000. DB::Exception: Cannot parse input: expected ',' before: '\n1984-09-10,0.42388,0.42516,0.41366,0.42134,18022532,0\n1984-09-11,0.42516,0.43668,0.42516,0.42902,42498199,0\n1984-09-12,0.42902,0.43157,0.41618,0.41618,37125801': (at row 1)
:
Row 1:
Column 0, name: c1, type: Nullable(Date), parsed text: "1984-09-07"
Column 1, name: c2, type: Nullable(Float64), parsed text: "0.42388"
Column 2, name: c3, type: Nullable(Float64), parsed text: "0.42902"
Column 3, name: c4, type: Nullable(Float64), parsed text: "0.41874"
Column 4, name: c5, type: Nullable(Float64), parsed text: "0.42388"
Column 5, name: c6, type: Nullable(Int64), parsed text: "23220030"
Column 6, name: c7, type: Nullable(Int64), parsed text: "0"
ERROR: Line feed found where delimiter (,) is expected. It's like your file has less columns than expected.
And if your file has the right number of columns, maybe it has unescaped quotes in values.
: (in file/uri test/sample/date=2025-10-12/sample-no-header.csv): While executing ParallelParsingBlockInputFormat: While executing ReadFromObjectStorage. (CANNOT_PARSE_INPUT_ASSERTION_FAILED)
But it’s surprising enough that DESCRIBE TABLE function is able to detect columns just fine.
DESCRIBE TABLE s3('http://minio:9001/test/sample/date=2025-10-12/sample-no-header.csv', 'minio', 'minio123', 'CSV')
Query id: ec5d6649-1c39-40ca-902c-9048353382ba
┌─name─┬─type──────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
1. │ c1 │ Nullable(Date) │ │ │ │ │ │
2. │ c2 │ Nullable(Float64) │ │ │ │ │ │
3. │ c3 │ Nullable(Float64) │ │ │ │ │ │
4. │ c4 │ Nullable(Float64) │ │ │ │ │ │
5. │ c5 │ Nullable(Float64) │ │ │ │ │ │
6. │ c6 │ Nullable(Int64) │ │ │ │ │ │
7. │ c7 │ Nullable(Int64) │ │ │ │ │ │
└──────┴───────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
7 rows in set. Elapsed: 0.004 sec.
Workaround to read a CSV from structured s3 path e.g. foo=bar
After some tests, I’ve figured that I can force reading the file from this kind of path by enabling input_format_csv_allow_variable_number_of_columns setting.
SELECT *
FROM s3('http://minio:9001/test/sample/date=2025-10-12/sample-no-header.csv', 'minio', 'minio123', 'CSV')
SETTINGS input_format_csv_allow_variable_number_of_columns = 1
Query id: b42f5872-8e40-4711-ac47-2e1a577a572e
┌─────────c1─┬──────c2─┬──────c3─┬──────c4─┬──────c5─┬───────c6─┬─c7─┬───────date─┐
1. │ 1984-09-07 │ 0.42388 │ 0.42902 │ 0.41874 │ 0.42388 │ 23220030 │ 0 │ 2025-10-12 │
2. │ 1984-09-10 │ 0.42388 │ 0.42516 │ 0.41366 │ 0.42134 │ 18022532 │ 0 │ 2025-10-12 │
3. │ 1984-09-11 │ 0.42516 │ 0.43668 │ 0.42516 │ 0.42902 │ 42498199 │ 0 │ 2025-10-12 │
4. │ 1984-09-12 │ 0.42902 │ 0.43157 │ 0.41618 │ 0.41618 │ 37125801 │ 0 │ 2025-10-12 │
5. │ 1984-09-13 │ 0.43927 │ 0.44052 │ 0.43927 │ 0.43927 │ 57822062 │ 0 │ 2025-10-12 │
6. │ 1984-09-14 │ 0.44052 │ 0.45589 │ 0.44052 │ 0.44566 │ 68847968 │ 0 │ 2025-10-12 │
7. │ 1984-09-17 │ 0.45718 │ 0.46357 │ 0.45718 │ 0.45718 │ 53755262 │ 0 │ 2025-10-12 │
8. │ 1984-09-18 │ 0.45718 │ 0.46103 │ 0.44052 │ 0.44052 │ 27136886 │ 0 │ 2025-10-12 │
9. │ 1984-09-19 │ 0.44052 │ 0.44566 │ 0.43157 │ 0.43157 │ 29641922 │ 0 │ 2025-10-12 │
10. │ 1984-09-20 │ 0.43286 │ 0.43668 │ 0.43286 │ 0.43286 │ 18453585 │ 0 │ 2025-10-12 │
11. │ 1984-09-21 │ 0.43286 │ 0.44566 │ 0.42388 │ 0.42902 │ 27842780 │ 0 │ 2025-10-12 │
└────────────┴─────────┴─────────┴─────────┴─────────┴──────────┴────┴────────────┘
11 rows in set. Elapsed: 0.007 sec.
The query above runs without an error and adds an additional column coming from the s3 path.
Testing with a CSV having a header row
I’ve tested the same thing with a csv having a header row. And in that case, everything works fine.
SELECT *
FROM s3('http://minio:9001/test/sample.csv', 'minio', 'minio123', 'CSV')
LIMIT 1
Query id: dd464d01-89bf-4d0d-9d00-fd443bdf2bae
┌───────Date─┬────Open─┬────High─┬─────Low─┬───Close─┬───Volume─┬─OpenInt─┐
1. │ 1984-09-07 │ 0.42388 │ 0.42902 │ 0.41874 │ 0.42388 │ 23220030 │ 0 │
└────────────┴─────────┴─────────┴─────────┴─────────┴──────────┴─────────┘
1 row in set. Elapsed: 0.005 sec.
And from similar path:
SELECT *
FROM s3('http://minio:9001/test/sample/date=2025-10-12/sample.csv', 'minio', 'minio123', 'CSV')
LIMIT 1
Query id: c1f09583-0184-4c4c-8631-2311b8e6b21a
┌───────Date─┬────Open─┬────High─┬─────Low─┬───Close─┬───Volume─┬─OpenInt─┬───────date─┐
1. │ 1984-09-07 │ 0.42388 │ 0.42902 │ 0.41874 │ 0.42388 │ 23220030 │ 0 │ 2025-10-12 │
└────────────┴─────────┴─────────┴─────────┴─────────┴──────────┴─────────┴────────────┘
1 row in set. Elapsed: 0.008 sec.