Cast string to float is not supported in Linear Model

Introduction

This error means your linear model received string data where it expected numeric features. Linear models operate on numbers, so if a column contains raw strings, such as a category label like "red", a number stored as text like "42", or a missing value encoded as a word like "NA", the model cannot automatically turn them into a usable floating-point tensor.

The fix is not to force the model to cast blindly. The fix is to preprocess the input correctly: convert numeric-looking strings into numeric values, and encode categorical strings into numeric features before training.

Why Linear Models Need Numeric Input

A linear model computes a weighted sum of feature values. That requires arithmetic such as multiplication and addition, which only makes sense for numeric tensors.
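
To make the arithmetic concrete, here is a minimal, framework-free sketch of a linear prediction. The function name linear_predict and the weights, bias, and feature values are made up for illustration; a real library performs the same kind of computation on tensors rather than Python lists.

```python
def linear_predict(weights, bias, features):
    """Return bias + sum(w_i * x_i) for a single example."""
    return bias + sum(w * x for w, x in zip(weights, features))

# Hypothetical parameters, chosen only for illustration.
weights = [0.5, 2.0]
bias = 1.0

# Numeric features: the weighted sum is well defined.
numeric_row = [34.0, 3.0]
prediction = linear_predict(weights, bias, numeric_row)
print(prediction)

# The same row with one value left as text: multiplying a float by a
# string raises TypeError, which is the plain-Python analogue of the
# "cast string to float" failure.
string_row = ["34", 3.0]
try:
    linear_predict(weights, bias, string_row)
    string_error = None
except TypeError as exc:
    string_error = exc
print(type(string_error).__name__)
```

The string "34" looks like a number to a person, but nothing in the arithmetic knows how to interpret it until it has been parsed.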

If your dataset contains:

text
age,salary,city
"34","75000","Toronto"

then age and salary may be convertible to numbers, but city is categorical text. Those two cases need different preprocessing.

Convert Numeric Strings Explicitly

If the feature is conceptually numeric but stored as text, parse it before the model sees it.

Example with pandas:

python
import pandas as pd

# "age" holds numbers stored as text.
df = pd.DataFrame({
    "age": ["34", "28", "41"],
    "target": [1.2, 0.8, 1.9]
})

# Parse the text into numbers; errors="raise" fails loudly on bad values.
df["age"] = pd.to_numeric(df["age"], errors="raise")
print(df.dtypes)

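After conversion, age has a numeric dtype (int64 for these values) and can be used by a linear model. If your framework expects floating-point input, cast it explicitly, for example with astype("float32").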

If parsing fails, that is useful information. It means the column is not actually clean numeric data yet.
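
One way to act on a parsing failure is to locate the offending values rather than stop at the exception. The sketch below assumes a hypothetical age column with two bad entries; it uses errors="coerce" so that unparseable values become NaN, then compares against the raw column to list them.

```python
import pandas as pd

# Hypothetical column: mostly numeric text, plus two entries that will not parse.
raw_age = pd.Series(["34", "28", "forty", "41", "n/a"], name="age")

# errors="coerce" turns unparseable entries into NaN instead of raising.
parsed_age = pd.to_numeric(raw_age, errors="coerce")

# Entries that were present in the raw data but failed to parse.
bad_rows = raw_age[parsed_age.isna() & raw_age.notna()]
print(bad_rows.tolist())
```

Once the bad values are visible, you can decide whether to fix them at the source, map them to a missing-value marker, or drop the rows.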

Encode Categorical Strings Instead of Casting Them

If the feature is true category text such as a city name or product type, do not cast it to float. Encode it.

A simple TensorFlow preprocessing pipeline looks like this:

python
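import tensorflow as tf

# Learn the vocabulary from the known category values.
city_lookup = tf.keras.layers.StringLookup(output_mode="one_hot")
city_lookup.adapt(tf.constant(["Toronto", "Paris", "Tokyo"]))

# Shape (3, 1): one city string per example.
cities = tf.constant([["Toronto"], ["Paris"], ["Tokyo"]])
encoded = city_lookup(cities)

# One row per example; by default the first slot is reserved for unseen values.
print(encoded)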

Now the string category becomes a numeric one-hot vector that a linear layer can consume.
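
For intuition, the following library-free sketch shows what a one-hot encoding does. It is not how StringLookup is implemented, and the index order here is an assumption made for illustration (StringLookup may assign indices differently); the helper names build_vocab and one_hot are hypothetical.

```python
def build_vocab(values):
    """Map each distinct category to an index; index 0 is kept for unknowns."""
    vocab = {}
    for value in values:
        if value not in vocab:
            vocab[value] = len(vocab) + 1
    return vocab

def one_hot(value, vocab):
    """Return a float vector with a single 1.0 at the category's index."""
    vector = [0.0] * (len(vocab) + 1)
    vector[vocab.get(value, 0)] = 1.0  # unseen values fall back to index 0
    return vector

vocab = build_vocab(["Toronto", "Paris", "Tokyo", "Paris"])
toronto = one_hot("Toronto", vocab)
unseen = one_hot("Berlin", vocab)
print(toronto)
print(unseen)
```

Each category gets its own position in the vector, so the linear layer learns one weight per category instead of treating the text as a single number.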

You can combine numeric and categorical preprocessing in a Keras model:

python
import tensorflow as tf

# One input per raw feature, each with the dtype it actually arrives in.
age_input = tf.keras.Input(shape=(1,), dtype=tf.float32, name="age")
city_input = tf.keras.Input(shape=(1,), dtype=tf.string, name="city")

# Encode the string feature inside the model.
city_lookup = tf.keras.layers.StringLookup(output_mode="one_hot")
city_lookup.adapt(tf.constant(["Toronto", "Paris", "Tokyo"]))

city_features = city_lookup(city_input)
# Join the numeric and encoded features, then apply the linear layer.
all_features = tf.keras.layers.Concatenate()([age_input, city_features])
output = tf.keras.layers.Dense(1)(all_features)

model = tf.keras.Model(inputs=[age_input, city_input], outputs=output)

The important part is that the raw string never reaches the linear layer unprocessed.

Check the Input Pipeline, Not Just the Model

Many of these errors come from the data pipeline rather than the model definition. Common sources include:

  • CSV readers that infer every column as string
  • missing values represented by text such as "NA"
  • feature dictionaries where one field has the wrong dtype
  • training data and serving data using different schemas

So when you see this error, inspect the actual tensor dtypes entering the model. In TensorFlow, printing tensor.dtype (or dataset.element_spec for a tf.data.Dataset) and validating your dataset schema early usually saves time.
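
The same early check can be done on a pandas DataFrame before any tensors are built. In this sketch the CSV contents, the word "unknown" standing in for a missing value, and the helper non_numeric_columns are all assumptions made for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV text; "unknown" is a missing value encoded as a word,
# which forces the whole age column to be read as text.
csv_text = (
    "age,salary,city\n"
    "34,75000,Toronto\n"
    "unknown,62000,Paris\n"
    "41,58000,Tokyo\n"
)
df = pd.read_csv(io.StringIO(csv_text))

def non_numeric_columns(frame, columns):
    """Return the columns expected to be numeric that are not."""
    return [c for c in columns if not pd.api.types.is_numeric_dtype(frame[c])]

# Columns the model expects as numeric features.
offenders = non_numeric_columns(df, ["age", "salary"])
print(df.dtypes)
print(offenders)
```

Running a check like this right after loading the data points at the offending column directly, instead of surfacing later as a cast error inside the model.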

Separate Numeric Parsing from Categorical Encoding

Do not lump all strings together. Ask two questions for each column:

  1. Is this feature fundamentally numeric but stored as text?
  2. Or is it categorical text that needs encoding?

Numeric text should be parsed. Categorical text should be encoded. Those are different transformations with different meanings.
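
The two questions can be approximated with a small heuristic: try to parse every value in a column as a float, and treat the column as categorical if any value fails. The column data and the helper name is_numeric_text below are hypothetical.

```python
def is_numeric_text(values):
    """Return True if every value in the column parses as a float."""
    try:
        for value in values:
            float(value)
    except ValueError:
        return False
    return True

# Hypothetical columns, all read in as strings.
columns = {
    "age": ["34", "28", "41"],
    "salary": ["75000", "62000", "58000"],
    "city": ["Toronto", "Paris", "Tokyo"],
}

# Decide per column: parse numeric text, encode everything else.
plan = {
    name: "parse" if is_numeric_text(values) else "encode"
    for name, values in columns.items()
}
print(plan)
```

This is only a first pass. Identifiers such as postal codes parse as numbers but are conceptually categorical, so the final decision should rest on what the feature means, not only on whether it parses.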

Common Pitfalls

  • Trying to cast category labels such as "Toronto" directly to float.
  • Assuming the model will infer how to parse numeric-looking strings automatically.
  • Letting CSV import keep numeric columns as object or string types and never validating them.
  • Mixing training and inference schemas so the same feature arrives as float in one place and string in another.
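
The last pitfall can be caught with a simple comparison of one training example against one serving example. This is a minimal sketch using plain Python types; the example dictionaries and the helpers schema_of and schema_mismatches are hypothetical, and a production pipeline would more likely compare declared schemas than sampled values.

```python
def schema_of(example):
    """Map each feature name to the name of its Python type."""
    return {name: type(value).__name__ for name, value in example.items()}

def schema_mismatches(train_example, serving_example):
    """Return {feature: (train_type, serving_type)} for features that differ."""
    train = schema_of(train_example)
    serve = schema_of(serving_example)
    mismatches = {}
    for name in sorted(set(train) | set(serve)):
        if train.get(name) != serve.get(name):
            mismatches[name] = (train.get(name), serve.get(name))
    return mismatches

# Hypothetical examples: age is a float in training but a string at serving time.
train_example = {"age": 34.0, "city": "Toronto"}
serving_example = {"age": "34", "city": "Toronto"}

mismatches = schema_mismatches(train_example, serving_example)
print(mismatches)
```

A feature missing from one side shows up with None in place of its type, so the same check also flags dropped or renamed fields.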

Summary

  • A linear model requires numeric features, so raw strings must be preprocessed first.
  • Convert numeric-looking strings with explicit parsing.
  • Encode categorical strings with one-hot, embedding, or similar feature transformations.
  • Validate dtypes in the input pipeline, not only in the model code.
  • Treat numeric text and categorical text as different problems with different fixes.
