Building a Text Categorization API with TensorFlow.NET in ASP.NET Core
Introduction
Text categorization helps web applications organize, filter, and make sense of large volumes of text. This tutorial walks through building a text categorization API with TensorFlow.NET and ASP.NET Core. By the end, you'll have a working API that categorizes text using a pre-trained TensorFlow model.
Prerequisites
Before we begin, make sure you have the following:
- Basic knowledge of ASP.NET Core.
- A working installation of .NET SDK.
- Python (for training the TensorFlow model).
Step 1: Set Up Your ASP.NET Core Web API Project
Start by creating a new ASP.NET Core Web API project:
dotnet new webapi -n TextCategorizationApi
cd TextCategorizationApi
Open the project in your preferred IDE (e.g., Visual Studio or Visual Studio Code).
Step 2: Add Necessary Packages
Next, install TensorFlow.NET and the required dependencies:
dotnet add package TensorFlow.NET
dotnet add package SciSharp.TensorFlow.Redist
Step 3: Prepare Your TensorFlow Model
If you don’t have a pre-trained TensorFlow model, you can train one using Python. Here’s an example script to train and save a text classification model:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
import numpy as np
# Example training data
texts = ["This is positive", "This is negative", "Very happy", "Very sad"]
labels = [1, 0, 1, 0]
tokenizer = Tokenizer(num_words=1000)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
data = pad_sequences(sequences, maxlen=10)
labels = np.array(labels)
model = Sequential([
    Embedding(1000, 64, input_length=10),
    LSTM(64),
    Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(data, labels, epochs=5)
# Save the model in TensorFlow SavedModel format.
# Passing a directory path writes a SavedModel under TensorFlow 2.x with Keras 2;
# with Keras 3, use model.export('saved_model') instead.
model.save('saved_model')
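The saved model only contains the network, not the tokenizer, so the API needs the vocabulary from training to turn words into the same indices. One way to carry it over, sketched below, is to write it to a JSON file next to the model. The function name `export_vocabulary`, the file name `vocabulary.json`, and the JSON layout are assumptions made for illustration; the only thing taken from the training script is that a fitted Keras `Tokenizer` exposes its vocabulary as the `word_index` dictionary.

```python
import json


def export_vocabulary(word_index, num_words, max_len, path):
    """Write the tokenizer vocabulary and padding length to a JSON file.

    word_index is the dict found on a fitted Keras Tokenizer
    (tokenizer.word_index). Only indices below num_words are kept,
    mirroring the effect of Tokenizer(num_words=...).
    The JSON layout is a hypothetical one chosen for this sketch.
    """
    kept = {word: index for word, index in word_index.items() if index < num_words}
    payload = {"max_len": max_len, "word_index": kept}
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, ensure_ascii=False, indent=2)
    return payload


# At the end of the training script this would be called as:
#   export_vocabulary(tokenizer.word_index, 1000, 10, "vocabulary.json")
```

The C# service can then read this file at startup and look words up in it instead of guessing indices.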
Step 4: Implement the TensorFlow Service in ASP.NET Core
Create a service to load and use the TensorFlow model:
// Services/TensorFlowService.cs
// API names below follow recent TensorFlow.NET releases; check them against your installed version.
using System;
using System.Linq;
using NumSharp; // Newer TensorFlow.NET releases ship NDArray in Tensorflow.NumPy instead
using Tensorflow;
using static Tensorflow.Binding;

namespace TextCategorizationApi.Services
{
    public class TensorFlowService
    {
        private const int MaxSequenceLength = 10; // Must match maxlen in the training script
        private readonly Session _session;
        private readonly Graph _graph;

        public TensorFlowService(string modelPath)
        {
            // The Python script writes a SavedModel (saved_model.pb plus a variables folder),
            // not a TF1 checkpoint, so there is no .meta file to import.
            _session = Session.LoadFromSavedModel(modelPath);
            _graph = _session.graph;
        }

        public string PredictCategory(string text)
        {
            // Tokenize and pad the input text as in the training script
            var tokens = TokenizeAndPad(text);
            // Adjust both names to your model; `saved_model_cli show --dir saved_model --all` lists them
            var inputTensor = _graph.OperationByName("input_1").outputs[0];
            var outputTensor = _graph.OperationByName("dense_2/Sigmoid").outputs[0];
            var result = _session.run(outputTensor, new FeedItem(inputTensor, tokens));
            // The sigmoid output is a probability between 0 and 1, so threshold it at 0.5
            var probability = result.ToArray<float>()[0];
            return probability >= 0.5f ? "Positive" : "Negative";
        }

        private NDArray TokenizeAndPad(string text)
        {
            // Placeholder tokenizer: hashing does NOT reproduce the indices assigned by the Keras
            // Tokenizer. Replace it with a lookup in the training vocabulary (tokenizer.word_index).
            // string.GetHashCode() is randomized per process on .NET Core and can be negative,
            // so a small deterministic hash is used here; index 0 stays reserved for padding.
            var tokens = text.ToLowerInvariant()
                .Split(' ', StringSplitOptions.RemoveEmptyEntries)
                .Select(word => (float)(word.Aggregate(0, (hash, c) => (hash * 31 + c) % 999) + 1))
                .ToArray();
            // pad_sequences pads and truncates at the front by default, so right-align the tokens
            var paddedTokens = new float[MaxSequenceLength];
            var count = Math.Min(tokens.Length, MaxSequenceLength);
            Array.Copy(tokens, tokens.Length - count, paddedTokens, MaxSequenceLength - count, count);
            return np.array(new[] { paddedTokens });
        }
    }
}
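The service has to tokenize and pad text the same way the training script did, so it helps to have the target behavior written out explicitly before porting it to C#. The sketch below is a pure-Python approximation of what `Tokenizer` and `pad_sequences` do with their default settings, to the best of my understanding: lowercase the text, strip punctuation, number words by descending frequency starting at 1, drop unknown words, and pad and truncate at the front. The helper names (`split_words`, `build_word_index`, `text_to_padded`) are made up for this example and are not part of Keras.

```python
# Characters the Keras Tokenizer filters out by default (assumed from its
# documented defaults): punctuation except the apostrophe, plus tab and newline.
DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def split_words(text):
    """Lowercase, replace filtered characters with spaces, split on spaces."""
    table = str.maketrans({ch: " " for ch in DEFAULT_FILTERS})
    return [word for word in text.lower().translate(table).split(" ") if word]


def build_word_index(texts):
    """Number words by descending frequency, starting at 1 (0 is padding)."""
    counts = {}
    for text in texts:
        for word in split_words(text):
            counts[word] = counts.get(word, 0) + 1
    # sorted() is stable, so words with equal counts keep first-seen order
    ordered = sorted(counts, key=lambda word: counts[word], reverse=True)
    return {word: position + 1 for position, word in enumerate(ordered)}


def text_to_padded(text, word_index, num_words=1000, max_len=10):
    """Map words to indices, skip unknown words, then pad at the front."""
    ids = [
        word_index[word]
        for word in split_words(text)
        if word in word_index and word_index[word] < num_words
    ]
    ids = ids[-max_len:]  # front truncation keeps the last max_len tokens
    return [0] * (max_len - len(ids)) + ids  # front padding with zeros


texts = ["This is positive", "This is negative", "Very happy", "Very sad"]
word_index = build_word_index(texts)
print(text_to_padded("Very happy", word_index))
# prints [0, 0, 0, 0, 0, 0, 0, 0, 3, 6]
```

Comparing the output of a C# port against a few vectors produced this way (or by the real Keras tokenizer) is a cheap check that both sides agree before blaming the model for poor predictions.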
Step 5: Register the TensorFlow Service in Startup.cs
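Add the following code to register the TensorFlow service as a singleton, so the model is loaded once at startup. Projects created with .NET 6 or later have no Startup.cs by default; there, make the same registration on builder.Services in Program.cs.
public void ConfigureServices(IServiceCollection services)
{
    services.AddControllers();
    var modelPath = "path_to_your_model"; // Directory written by model.save in Step 3
    services.AddSingleton(new TensorFlowService(modelPath));
}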
Step 6: Create a Controller to Use the TensorFlow Service
Create a controller to handle predictions:
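// Controllers/PredictController.cs
using Microsoft.AspNetCore.Mvc;
using TextCategorizationApi.Models;
using TextCategorizationApi.Services;

namespace TextCategorizationApi.Controllers
{
    [Route("api/[controller]")]
    [ApiController]
    public class PredictController : ControllerBase
    {
        private readonly TensorFlowService _tensorFlowService;

        public PredictController(TensorFlowService tensorFlowService)
        {
            _tensorFlowService = tensorFlowService;
        }

        [HttpPost]
        public ActionResult Predict([FromBody] TextData input)
        {
            // Reject empty input instead of sending it to the model
            if (string.IsNullOrWhiteSpace(input?.Text))
            {
                return BadRequest("The 'text' field is required.");
            }
            var category = _tensorFlowService.PredictCategory(input.Text);
            return Ok(new { input.Text, Category = category });
        }
    }
}

// Models/TextData.cs
namespace TextCategorizationApi.Models
{
    // Request body for POST api/predict
    public class TextData
    {
        public string Text { get; set; } = string.Empty;
    }
}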
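The category the controller returns ultimately comes from a single sigmoid output, which is a probability between 0 and 1 rather than a hard 0 or 1. The sketch below shows the mapping from that score to the response body in Python, assuming a 0.5 cut-off and the camelCase property names that ASP.NET Core's default JSON serializer produces. The function name `build_response` and the threshold value are illustrative choices, not something the model dictates.

```python
def build_response(text, probability, threshold=0.5):
    """Turn a sigmoid score into the JSON-like body the endpoint returns.

    The 0.5 threshold is an assumption; pick a different cut-off if false
    positives and false negatives have different costs in your application.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    category = "Positive" if probability >= threshold else "Negative"
    return {"text": text, "category": category}


print(build_response("Very happy", 0.91))
# prints {'text': 'Very happy', 'category': 'Positive'}
```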
Step 7: Test Your API
Run your API:
dotnet run
Use a tool like Postman or cURL to send requests to your API endpoint. If dotnet run reports a different port than 5000, use that one instead.
Example Request Using Postman:
- URL: http://localhost:5000/api/predict
- Method: POST
- Body: { "text": "This is a sample text to categorize." }
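The same request can be scripted, which is handy for smoke tests. Below is a minimal Python sketch using only the standard library; the helper names `build_predict_request` and `predict` are made up for this example, and the base URL assumes the API listens on http://localhost:5000 as in the Postman example.

```python
import json
import urllib.request


def build_predict_request(text, base_url="http://localhost:5000"):
    """Build a POST request for the predict endpoint with a JSON body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, base_url="http://localhost:5000"):
    """Send the request and decode the JSON response (needs a running API)."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# With the API running:
#   print(predict("This is a sample text to categorize."))
```

Setting the Content-Type header to application/json matters here, because the controller binds the body with [FromBody] and would otherwise reject the request.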
Conclusion
In this article, we walked through building a text categorization API with TensorFlow.NET and ASP.NET Core. The pieces shown here (a saved TensorFlow model, a singleton inference service, and a prediction endpoint) give you a starting point for organizing user feedback, filtering content, or analyzing other text in your .NET applications.