Natural Language Processing (NLP) in JavaScript (series 2)

Natural Language Processing (NLP) is a field of artificial intelligence that focuses on enabling computers to understand, interpret, and generate human language.

It plays a critical role in modern applications, ranging from language translation and sentiment analysis to chatbots and search engines.

NLP techniques empower developers to extract insights from vast amounts of textual data, making it a powerful tool for data analysis and decision-making.

In the first part of this article, we learned how to set up an environment, gather data, and strip it of irrelevant words while preparing it to be used as samples.

In this session, we will see how to use that data. Find the previous session here.

Part-of-Speech (POS) Tagging in JavaScript

In Natural Language Processing (NLP), Part-of-Speech (POS) tagging is a crucial task that involves assigning specific parts of speech, such as nouns, verbs, adjectives, and more, to each word in a sentence.

Understanding Part-of-Speech Tagging

POS tagging plays a vital role in NLP, as it helps in understanding the grammatical structure of a sentence, which is essential for various language-related tasks.

The process involves analyzing each word in a sentence and assigning them their corresponding part of speech. For example, in the sentence "The quick brown fox jumps over the lazy dog," the words would be tagged as follows:

  • "The" -> Determiner (DT)

  • "quick" -> Adjective (JJ)

  • "brown" -> Adjective (JJ)

  • "fox" -> Noun (NN)

  • "jumps" -> Verb (VBZ)

  • "over" -> Preposition (IN)

  • "the" -> Determiner (DT)

  • "lazy" -> Adjective (JJ)

  • "dog" -> Noun (NN)

Working with POS Tagging in JavaScript

To perform POS tagging in JavaScript, we'll use the "natural" NLP library, which supports various NLP tasks, including tokenization, POS tagging, and more. Follow the previous session to see how to set up your environment.

Implementing POS Tagging

To implement this step, you must have the "natural" library set up; the steps are outlined in the first session of this series. Let's implement POS tagging in JavaScript.

// Import the "natural" library
const natural = require('natural');

// Create a tokenizer instance
const tokenizer = new natural.WordTokenizer();

// Create a POS tagger from the built-in English lexicon and rule set
const lexicon = new natural.Lexicon('EN', 'N', 'NNP');
const ruleSet = new natural.RuleSet('EN');
const posTagger = new natural.BrillPOSTagger(lexicon, ruleSet);

// Sample sentence for POS tagging
const sentence = "The quick brown fox jumps over the lazy dog";

// Tokenize the sentence into words
const words = tokenizer.tokenize(sentence);

// Perform POS tagging (tag() returns an object with a taggedWords array)
const { taggedWords } = posTagger.tag(words);

// Print each token with its tag
taggedWords.forEach((word) => {
  console.log(`${word.token} - ${word.tag}`);
});

Explanation

  • We import the "natural" library and create instances for tokenization and POS tagging: natural.WordTokenizer() for splitting text into words, and natural.BrillPOSTagger() built from an English Lexicon and RuleSet.

  • We define a sample sentence we want to tag with parts of speech.

  • The sentence is tokenized into individual words using the tokenizer.tokenize() function.

  • The posTagger.tag() function performs POS tagging on the tokenized words.

  • Finally, we iterate through the tagged words and print them along with their respective parts of speech.
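
As a quick follow-up, here is a minimal sketch that builds on the taggedWords array from the example above and keeps only the nouns:

// Minimal sketch: keep only nouns (tags starting with "NN", e.g. NN, NNS, NNP).
// Assumes the `taggedWords` array from the example above, with { token, tag } entries.
const nouns = taggedWords
  .filter((word) => word.tag.startsWith('NN'))
  .map((word) => word.token);

console.log(`Nouns found: ${nouns.join(', ')}`); // e.g. "fox, dog"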

Topic Modeling in JavaScript

Topic modeling is an unsupervised learning technique used to discover underlying themes or topics within a collection of text documents. We'll use a sample corpus of documents to extract meaningful topics.

Understanding Topic Modeling

Topic modeling is a statistical approach aiming to uncover latent topics in a large set of text documents.

It enables us to identify the main themes or subjects without any prior labeling or human supervision.

One of the popular algorithms for topic modeling is Latent Dirichlet Allocation (LDA).

LDA assumes that each document in the corpus is a mixture of various topics, and a distribution of words represents each topic.

The algorithm then works iteratively to assign words to different topics and determine the probability of each topic's presence in a given document.

By the end of the process, we get a list of topics and the words that contribute most to each topic.
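
To make that intuition concrete, here is a purely illustrative sketch (the numbers are invented) of the kind of structure LDA produces: a per-document topic mixture and a per-topic word distribution.

// Illustrative only: invented numbers showing the shape of LDA's output.

// Each document is modeled as a mixture of topics...
const documentTopicMixture = { topic1: 0.7, topic2: 0.3 };

// ...and each topic is a probability distribution over words.
const topic1WordDistribution = {
  javascript: 0.12,
  web: 0.09,
  node: 0.07,
  // ...the remaining probability mass is spread over the rest of the vocabulary
};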

Working with Topic Modeling in JavaScript

The "natural" library does not ship an LDA implementation, so we will use the standalone "lda" package from npm (installed with npm install lda) to perform topic modeling in JavaScript. We'll use a sample corpus of documents to demonstrate the process.

Implementing Topic Modeling


// Import the "lda" library (install it with: npm install lda)
const lda = require('lda');

// Sample corpus of documents (one string per document)
const documents = [
  "Machine learning is an exciting field in computer science.",
  "JavaScript is a versatile programming language used for web development.",
  "Data science involves extracting insights from data using various techniques.",
  "Node.js is a popular runtime environment for server-side JavaScript applications.",
  "Topic modeling helps in discovering latent themes from text documents.",
];

// Perform topic modeling
const numTopics = 2;      // Number of topics to discover
const termsPerTopic = 5;  // Number of representative terms to keep per topic
const topics = lda(documents, numTopics, termsPerTopic);

// Print the extracted topics (each topic is a list of { term, probability } entries)
console.log("Extracted Topics:");
topics.forEach((topic, index) => {
  console.log(`Topic ${index + 1}: ${topic.map((t) => t.term).join(", ")}`);
});

Explanation

  • We import the "lda" library, which provides a ready-made LDA implementation for JavaScript.

  • A sample corpus of documents is defined, representing the collection of text we want to analyze and extract topics from.

  • We set the number of topics (numTopics) we want the algorithm to discover and the number of representative terms to keep per topic (termsPerTopic); the library tokenizes the documents internally.

  • The lda() function performs topic modeling on the documents.

  • Finally, we print the extracted topics along with the most representative terms for each topic.

Text Classification with NLP in JavaScript

Text classification is a vital Natural Language Processing (NLP) task that involves categorizing textual data into predefined classes or categories.

Understanding Text Classification

Text classification is crucial in various real-world applications, including sentiment analysis, spam detection, language identification, and content categorization. The goal is to automatically assign a label or category to a given text document based on its content.

To achieve text classification, we can leverage machine learning algorithms that learn patterns and relationships between the textual data and their corresponding classes.

One commonly used text classification algorithm is the Naive Bayes classifier, which is simple yet effective for many NLP tasks.
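
Conceptually, a Naive Bayes classifier scores each candidate category c by combining the overall likelihood of the category, P(c), with the likelihood of each word w_i in the document given that category, treating the words as independent, and then picks the highest-scoring category:

\hat{c} = \arg\max_{c \in C} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)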

Implementing Text Classification


// Import the "natural" library
const natural = require('natural');

// Create a new Naive Bayes classifier instance
const classifier = new natural.BayesClassifier();

// Training data for text classification
const trainingData = [
  { text: "I love this product! It's fantastic.", category: "positive" },
  { text: "This movie was boring and disappointing.", category: "negative" },
  { text: "The weather is lovely today.", category: "positive" },
  { text: "The service at this restaurant was terrible.", category: "negative" },
  { text: "The new software update works perfectly.", category: "positive" },
];

// Training the classifier with the data
trainingData.forEach((data) => {
  classifier.addDocument(data.text, data.category);
});
classifier.train();

// Test data for text classification
const testText = "The hotel stay was wonderful! I had a great time.";

// Classify the test data
const predictedCategory = classifier.classify(testText);

// Print the predicted category
console.log(`Predicted Category: ${predictedCategory}`);

Explanation

  • We import the "natural" library, which provides the necessary tools for text classification.

  • We create a new instance of the Naive Bayes classifier using natural.BayesClassifier().

  • The training data contains labeled examples of text and corresponding categories (positive or negative in this case).

  • The classifier is trained on the training data using the classifier.addDocument() and classifier.train() functions.

  • We define a test text for which we want to predict the category.

  • The classifier.classify() function is used to classify the test text into a specific category.

  • The predicted category is printed on the console.
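
If you want to see how confident the classifier is rather than just the winning label, natural's classifier also exposes getClassifications(), which returns a score for each category:

// Inspect the scores behind the prediction for the test text.
classifier.getClassifications(testText).forEach((result) => {
  console.log(`${result.label}: ${result.value}`);
});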

Language Translation with NLP in JavaScript

Language translation is a crucial NLP application that enables communication and understanding across different languages. This session focuses on language translation techniques and demonstrates how to perform language translation in JavaScript using NLP libraries.

Language Translation Techniques

Language translation can be achieved using different techniques, including rule-based approaches, statistical machine translation, and neural machine translation. In this session, we'll utilize the power of NLP libraries to perform language translation.

Implementing Language Translation in JavaScript

To perform language translation in JavaScript, we can leverage libraries such as "translate-google" and "translate" (installed from npm with npm install translate-google or npm install translate) to access translation services.

Example: Translating Text Using "translate-google" Library


// Import the "translate-google" library
const translate = require('translate-google');

// Text to be translated
const text = "Hello, how are you?";

// Source and target languages
const sourceLanguage = 'en';
const targetLanguage = 'es';

// Translate the text
translate(text, { from: sourceLanguage, to: targetLanguage })
  .then((translation) => {
    console.log(`Translated Text: ${translation}`);
  })
  .catch((error) => {
    console.error('Translation Error:', error);
  });

Example: Translating Text Using "translate" Library


// Import the "translate" library
const translate = require('translate');

// Configure the library with the translation service
// (note: the "google" engine requires an API key)
translate.engine = 'google';
translate.key = process.env.GOOGLE_API_KEY; // assumes the key is stored in an environment variable

// Text to be translated
const text = "Good morning, how are you today?";

// Translate the text, passing the source and target languages per call
translate(text, { from: 'en', to: 'fr' })
  .then((translation) => {
    console.log(`Translated Text: ${translation}`);
  })
  .catch((error) => {
    console.error('Translation Error:', error);
  });

Explanation

  • We import the required NLP libraries for language translation: "translate-google" or "translate".

  • We define the text that needs to be translated.

  • We specify the source and target languages for translation, via the sourceLanguage and targetLanguage variables in the first example and the per-call options in the second.

  • The text is translated using the translate() function provided by the respective library.

  • The translated text is printed to the console.
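
As a usage note, both libraries return Promises, so the same call can also be written with async/await. Here is a minimal sketch using the "translate" library configured above:

// The same translation with async/await (assumes the configuration shown earlier).
(async () => {
  try {
    const translation = await translate("Good morning, how are you today?", { from: 'en', to: 'fr' });
    console.log(`Translated Text: ${translation}`);
  } catch (error) {
    console.error('Translation Error:', error);
  }
})();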

In the final session of this series, we will look at NLP use cases and future trends, and how its application in JavaScript has the potential to transform learning.

Follow us to see how we build the final project, as this is the second session of a three-part series. If you find this post exciting, find more exciting posts on Learnhub Blog; we write everything tech from Cloud computing to Frontend Dev, Cybersecurity, AI, and Blockchain.

Resource