Mistral AI Secures €1.7 Bn Funding Led by ASML, Valuation Reaches €11.7 Bn


French AI startup Mistral AI has raised €1.7 billion in a Series C funding round, led by Dutch semiconductor giant ASML, bringing its valuation to about €11.7 billion.

ASML invested €1.3 billion, becoming Mistral’s largest shareholder with an 11% stake and a seat on the company’s strategic committee.

“ASML is proud to enter a strategic partnership with Mistral AI, and to be the lead investor in this funding round,” said ASML CEO Christophe Fouquet. “The collaboration between Mistral AI and ASML aims to generate clear benefits for ASML customers through innovative products and solutions enabled by AI, and will offer potential for joint research to address future opportunities.”

The round also saw participation from major investors, including Andreessen Horowitz, DST Global, Bpifrance, General Catalyst, Index Ventures, Lightspeed Venture Partners and NVIDIA.

The deal cements Mistral AI’s position as Europe’s most valuable AI company, strengthening the region’s bid for technological sovereignty and providing a counterweight to US giants such as OpenAI and Anthropic.

Mistral AI said the investment will support scientific research and the development of decentralised frontier AI solutions aimed at addressing complex engineering and industrial problems.

“This investment brings together two technology leaders operating in the same value chain,” Arthur Mensch, CEO of Mistral AI, said. “We have the ambition to help ASML and its numerous partners solve current and future engineering challenges through AI, and ultimately to advance the full semiconductor and AI value chain.”

Fouquet described the tie-up as a strategic move to enhance chipmaking tools with advanced AI capabilities, stressing that the focus is on technological collaboration rather than geopolitics.

Founded in 2023 by former Google DeepMind and Meta researchers, Mistral has quickly risen to the forefront of Europe’s AI ecosystem. The latest funding builds on its €600 million Series B round in 2024, signalling growing momentum in Europe’s push to compete globally in artificial intelligence.





Uncommon Uses of Common Python Standard Library Functions



Image by Author | Ideogram

 

Introduction

 
You know the basics of Python’s standard library. You’ve probably used functions like zip() and groupby() to handle everyday tasks without fuss. But here’s what most developers miss: these same functions can solve surprisingly “uncommon” problems in ways you may never have considered. This article walks through several of these lesser-known uses of familiar Python functions.

🔗 Link to the code on GitHub

 

1. itertools.groupby() for Run-Length Encoding

 
While most developers think of groupby() as a simple tool for grouping data logically, it’s also useful for run-length encoding — a compression technique that counts consecutive identical elements. This function naturally groups adjacent matching items together, so you can transform repetitive sequences into compact representations.

from itertools import groupby

# Analyze user activity patterns from server logs
user_actions = ['login', 'login', 'browse', 'browse', 'browse',
                'purchase', 'logout', 'logout']

# Compress into pattern summary
activity_patterns = [(action, len(list(group)))
                    for action, group in groupby(user_actions)]

print(activity_patterns)

# Count the total number of actions across all activity phases
total_duration = sum(count for action, count in activity_patterns)
print(f"Session lasted {total_duration} actions")

 

Output:

[('login', 2), ('browse', 3), ('purchase', 1), ('logout', 2)]
Session lasted 8 actions

 

The groupby() function identifies consecutive identical elements and groups them together. By converting each group to a list and measuring its length, you get a count of how many times each action occurred in sequence.
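The inverse operation is just as compact: itertools can expand the (action, count) pairs back into the original sequence. A minimal sketch, reusing the activity_patterns list from the example above:

from itertools import chain, repeat

# Rebuild the original sequence from the run-length encoded pairs
decoded = list(chain.from_iterable(repeat(action, count)
                                   for action, count in activity_patterns))
print(decoded == user_actions)  # True — the encoding is lossless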

 

2. zip() with * for Matrix Transposition

 
Matrix transposition — flipping rows into columns — becomes simple when you combine zip() with Python’s unpacking operator.

The unpacking operator (*) spreads your matrix rows as individual arguments to zip(), which then reassembles them by taking corresponding elements from each row.

# Quarterly sales data organized by product lines
quarterly_sales = [
    [120, 135, 148, 162],  # Product A by quarter
    [95, 102, 118, 125],   # Product B by quarter
    [87, 94, 101, 115]     # Product C by quarter
]

# Transform to quarterly view across all products
by_quarter = list(zip(*quarterly_sales))
print("Sales by quarter:", by_quarter)

# Calculate quarterly growth rates
quarterly_totals = [sum(quarter) for quarter in by_quarter]
growth_rates = [(quarterly_totals[i] - quarterly_totals[i-1]) / quarterly_totals[i-1] * 100
                for i in range(1, len(quarterly_totals))]
print(f"Growth rates: {[f'{rate:.1f}%' for rate in growth_rates]}")

 

Output:

Sales by quarter: [(120, 95, 87), (135, 102, 94), (148, 118, 101), (162, 125, 115)]
Growth rates: ['9.6%', '10.9%', '9.5%']

 

We unpack the lists first, and then the zip() function groups the first elements from each list, then the second elements, and so on.
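One caveat: zip() silently truncates to the shortest row, so a ragged matrix produces a misleading transpose. On Python 3.10+ you can pass strict=True to fail loudly instead; a quick sketch with a deliberately ragged input:

ragged = [[1, 2, 3], [4, 5]]  # second row is missing a value

try:
    list(zip(*ragged, strict=True))  # Python 3.10+
except ValueError as err:
    print("Ragged input detected:", err)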

 

3. bisect for Maintaining Sorted Order

 
Keeping data sorted as you add new elements typically requires expensive re-sorting operations, but the bisect module maintains order automatically using binary search algorithms.

The module has functions that help find the exact insertion point for new elements in logarithmic time, then place them correctly without disturbing the existing order.

import bisect

# Maintain a high-score leaderboard that stays sorted
class Leaderboard:
    def __init__(self):
        self.scores = []
        self.players = []

    def add_score(self, player, score):
        # Insert maintaining descending order
        pos = bisect.bisect_left([-s for s in self.scores], -score)
        self.scores.insert(pos, score)
        self.players.insert(pos, player)

    def top_players(self, n=5):
        return list(zip(self.players[:n], self.scores[:n]))

# Demo the leaderboard
board = Leaderboard()
scores = [("Alice", 2850), ("Bob", 3100), ("Carol", 2650),
          ("David", 3350), ("Eva", 2900)]

for player, score in scores:
    board.add_score(player, score)

print("Top 3 players:", board.top_players(3))

 

Output:

Top 3 players: [('David', 3350), ('Bob', 3100), ('Eva', 2900)]

 

This is useful for maintaining leaderboards, priority queues, or any ordered collection that grows incrementally over time.
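If ascending order is all you need, bisect.insort() performs the search and the insertion in one call. A minimal sketch, independent of the leaderboard above:

import bisect

response_times = []
for t in [120, 45, 300, 87, 210]:
    bisect.insort(response_times, t)  # list stays sorted as it grows

print(response_times)  # [45, 87, 120, 210, 300]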

 

4. heapq for Finding Extremes Without Full Sorting

 
When you need only the largest or smallest elements from a dataset, full sorting is inefficient. The heapq module uses heap data structures to efficiently extract extreme values without sorting everything.

import heapq

# Analyze customer satisfaction survey results
survey_responses = [
    ("Restaurant A", 4.8), ("Restaurant B", 3.2), ("Restaurant C", 4.9),
    ("Restaurant D", 2.1), ("Restaurant E", 4.7), ("Restaurant F", 1.8),
    ("Restaurant G", 4.6), ("Restaurant H", 3.8), ("Restaurant I", 4.4),
    ("Restaurant J", 2.9), ("Restaurant K", 4.2), ("Restaurant L", 3.5)
]

# Find top performers and underperformers without full sorting
top_rated = heapq.nlargest(3, survey_responses, key=lambda x: x[1])
worst_rated = heapq.nsmallest(3, survey_responses, key=lambda x: x[1])

print("Excellence awards:", [name for name, rating in top_rated])
print("Needs improvement:", [name for name, rating in worst_rated])

# Calculate performance spread
best_score = top_rated[0][1]
worst_score = worst_rated[0][1]
print(f"Performance range: {worst_score} to {best_score} ({best_score - worst_score:.1f} point spread)")

 

Output:

Excellence awards: ['Restaurant C', 'Restaurant A', 'Restaurant E']
Needs improvement: ['Restaurant F', 'Restaurant D', 'Restaurant J']
Performance range: 1.8 to 4.9 (3.1 point spread)

 

The heap algorithm maintains a partial order that efficiently tracks extreme values without organizing all data.
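The same module also works on streaming data: keep a fixed-size heap, push each incoming value, and pop the smallest whenever the heap grows past k, so memory stays constant regardless of stream length. A minimal sketch with made-up readings:

import heapq

def top_k(stream, k=3):
    heap = []
    for value in stream:
        heapq.heappush(heap, value)
        if len(heap) > k:
            heapq.heappop(heap)  # discard the current smallest
    return sorted(heap, reverse=True)

readings = [4.8, 3.2, 4.9, 2.1, 4.7, 1.8, 4.6]
print(top_k(readings))  # [4.9, 4.8, 4.7]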

 

5. operator.itemgetter for Multi-Level Sorting

 
Complex sorting requirements often lead to convoluted lambda expressions or nested conditional logic. But operator.itemgetter provides an elegant solution for multi-criteria sorting.

This function creates key extractors that pull multiple values from data structures, enabling Python’s natural tuple sorting to handle complex ordering logic.

from operator import itemgetter

# Employee performance data: (name, department, performance_score, hire_date)
employees = [
    ("Sarah", "Engineering", 94, "2022-03-15"),
    ("Mike", "Sales", 87, "2021-07-22"),
    ("Jennifer", "Engineering", 91, "2020-11-08"),
    ("Carlos", "Marketing", 89, "2023-01-10"),
    ("Lisa", "Sales", 92, "2022-09-03"),
    ("David", "Engineering", 88, "2021-12-14"),
    ("Amanda", "Marketing", 95, "2020-05-18")
]

# Ascending sort by department, then by performance score within department
sorted_employees = sorted(employees, key=itemgetter(1, 2))

# For descending performance within each department, negate the score
dept_performance_sorted = sorted(employees, key=lambda x: (x[1], -x[2]))

print("Department performance rankings:")
current_dept = None
for name, dept, score, hire_date in dept_performance_sorted:
    if dept != current_dept:
        print(f"\n{dept} Department:")
        current_dept = dept
    print(f"  {name}: {score}/100")

 

Output:

Department performance rankings:

Engineering Department:
  Sarah: 94/100
  Jennifer: 91/100
  David: 88/100

Marketing Department:
  Amanda: 95/100
  Carlos: 89/100

Sales Department:
  Lisa: 92/100
  Mike: 87/100

 

The itemgetter(1, 2) function extracts the department and performance score from each tuple, creating composite sorting keys. Python’s tuple comparison naturally sorts by the first element (department), then by the second element (score) for items with matching departments.
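Negating values only works for numbers. When one key is a string and the directions differ, you can instead rely on the fact that Python's sort is stable and sort twice, secondary key first. A sketch reusing the employees list:

# Sort by score (descending) first, then by department (ascending);
# the stable second sort preserves the score order within each department
by_score = sorted(employees, key=itemgetter(2), reverse=True)
by_dept_then_score = sorted(by_score, key=itemgetter(1))

print([(name, dept, score) for name, dept, score, _ in by_dept_then_score[:3]])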

 

6. collections.defaultdict for Building Data Structures on the Fly

 
Creating complex nested data structures typically requires tedious existence checking before adding values, leading to repetitive conditional code that obscures your actual logic.

The defaultdict eliminates this overhead by automatically creating missing values using factory functions you specify.

from collections import defaultdict

books_data = [
    ("1984", "George Orwell", "Dystopian Fiction", 1949),
    ("Dune", "Frank Herbert", "Science Fiction", 1965),
    ("Pride and Prejudice", "Jane Austen", "Romance", 1813),
    ("The Hobbit", "J.R.R. Tolkien", "Fantasy", 1937),
    ("Foundation", "Isaac Asimov", "Science Fiction", 1951),
    ("Emma", "Jane Austen", "Romance", 1815)
]

# Create multiple indexes simultaneously
catalog = {
    'by_author': defaultdict(list),
    'by_genre': defaultdict(list),
    'by_decade': defaultdict(list)
}

for title, author, genre, year in books_data:
    catalog['by_author'][author].append((title, year))
    catalog['by_genre'][genre].append((title, author))
    catalog['by_decade'][year // 10 * 10].append((title, author))

# Query the catalog
print("Jane Austen books:", dict(catalog['by_author'])['Jane Austen'])
print("Science Fiction titles:", len(catalog['by_genre']['Science Fiction']))
print("1960s publications:", dict(catalog['by_decade']).get(1960, []))

 

Output:

Jane Austen books: [('Pride and Prejudice', 1813), ('Emma', 1815)]
Science Fiction titles: 2
1960s publications: [('Dune', 'Frank Herbert')]

 

The defaultdict(list) automatically creates empty lists for any new key you access, eliminating the need to check if key not in dictionary before appending values.
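The factory can itself be another defaultdict, which lets you build nested indexes on the fly with no setup code. A minimal sketch grouping the same books by genre and then by decade (a hypothetical layout, separate from the catalog above):

# Two-level index: genre -> decade -> list of titles
nested = defaultdict(lambda: defaultdict(list))

for title, author, genre, year in books_data:
    nested[genre][year // 10 * 10].append(title)

print(nested['Science Fiction'][1960])  # ['Dune']
print(nested['Romance'][1810])          # ['Pride and Prejudice', 'Emma']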

 

7. string.Template for Safe String Formatting

 
Standard string formatting methods like f-strings and .format() raise errors when an expected variable is missing. string.Template, by contrast, keeps your code running even with incomplete data: it leaves undefined variables in place rather than crashing.

from string import Template

report_template = Template("""
=== SYSTEM PERFORMANCE REPORT ===
Generated: $timestamp
Server: $server_name

CPU Usage: $cpu_usage%
Memory Usage: $memory_usage%
Disk Space: $disk_usage%

Active Connections: $active_connections
Error Rate: $error_rate%

${detailed_metrics}

Status: $overall_status
Next Check: $next_check_time
""")

# Simulate partial monitoring data (some sensors might be offline)
monitoring_data = {
    'timestamp': '2024-01-15 14:30:00',
    'server_name': 'web-server-01',
    'cpu_usage': '23.4',
    'memory_usage': '67.8',
    # Missing: disk_usage, active_connections, error_rate, detailed_metrics
    'overall_status': 'OPERATIONAL',
    'next_check_time': '15:30:00'
}

# Generate report with available data, leaving gaps for missing info
report = report_template.safe_substitute(monitoring_data)
print(report)
# Output shows available data filled in, missing variables left as $placeholders
print("\n" + "="*50)
print("Missing data can be filled in later:")
additional_data = {'disk_usage': '45.2', 'error_rate': '0.1'}
updated_report = Template(report).safe_substitute(additional_data)
print("Disk usage now shows:", "45.2%" in updated_report)

 
Output:

=== SYSTEM PERFORMANCE REPORT ===
Generated: 2024-01-15 14:30:00
Server: web-server-01

CPU Usage: 23.4%
Memory Usage: 67.8%
Disk Space: $disk_usage%

Active Connections: $active_connections
Error Rate: $error_rate%

${detailed_metrics}

Status: OPERATIONAL
Next Check: 15:30:00


==================================================
Missing data can be filled in later:
Disk usage now shows: True

 

The safe_substitute() method processes available variables while preserving undefined placeholders for later completion. This creates fault-tolerant systems where partial data produces meaningful partial results rather than complete failure.

This approach is useful for configuration management, report generation, email templating, or any system where data arrives incrementally or might be temporarily unavailable.
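For contrast, the strict substitute() method raises a KeyError on the first missing variable, which is the right behaviour when incomplete data should abort the run instead of producing a partial report. A quick sketch using the same template and data:

try:
    report_template.substitute(monitoring_data)
except KeyError as missing:
    print("Strict substitution failed on missing variable:", missing)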

 

Conclusion

 
The Python standard library contains solutions to problems you didn’t know it could solve. What we discussed here shows how familiar functions can handle non-trivial tasks.

Next time you start writing a custom function, pause and explore what’s already available. The tools in the Python standard library often provide elegant solutions that are faster, more reliable, and require zero additional setup.

Happy coding!
 
 

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.






Tendulkar-Backed RRP Electronics Gets 100 Acres in Maharashtra for Semiconductor Fab



The Maharashtra government has allocated 100 acres in Navi Mumbai to RRP Electronics for the establishment of a semiconductor fabrication facility. Chief Minister Devendra Fadnavis handed over a letter of comfort to the company, which plans to relocate a fab from Sherman, Texas, with a production capacity of 1.25 lakh (125,000) wafers per month.

The project is backed by former cricketer Sachin Tendulkar and marks a significant step for India’s semiconductor mission. The new fab is expected to boost industrial growth, generate employment opportunities and enhance supply chains in the state.

“This allotment of land firmly positions Maharashtra at the heart of the India Semiconductor Mission roadmap. Our government is fully committed to extending all necessary support, be it in infrastructure, policy facilitation or skill development, to ensure the success of this initiative,” Fadnavis said.

He added that the facility would accelerate industrial growth and reinforce Maharashtra’s role as a hub for high-technology manufacturing.

Rajendra Chodankar, chairman of RRP Electronics, said, “We are thankful to the Maharashtra government, the honourable chief minister and his team for the continued encouragement and support towards enabling the state to take pioneering initiatives for the semiconductor ecosystem. This acquisition is a landmark step in our journey to make India self-reliant in semiconductors.”

The move comes a year after Maharashtra launched its first outsourced semiconductor assembly and test (OSAT) facility in Navi Mumbai, which was established by RRP itself. With the new fab, the state strengthens its position in the global semiconductor value chain.

Earlier in May, HorngCom Technology of Taiwan entered into a strategic collaboration with RRP to expand its OSAT capabilities in India. The agreement followed a successful technical assessment of RRP’s semiconductor facility in Mahape, Navi Mumbai, and marked HorngCom’s latest move to scale its operations globally.




5 Tips for Building Optimized Hugging Face Transformer Pipelines



Image by Editor | ChatGPT

 

Introduction

 
Hugging Face has become the standard for many AI developers and data scientists because it drastically lowers the barrier to working with advanced AI. Rather than working with AI models from scratch, developers can access a wide range of pretrained models without hassle. Users can also adapt these models with custom datasets and deploy them quickly.

One of the Hugging Face API wrappers is Transformers Pipelines, which bundles a pretrained model with its tokenizer, pre- and post-processing, and the other components needed to make an AI use case work. These pipelines abstract away complex code behind a simple, consistent API.

However, working with Transformers Pipelines out of the box does not always yield an optimal setup. That is why we will explore five ways to optimize your Transformers Pipelines.

Let’s get into it.

 

1. Batch Inference Requests

 
Often, when using Transformers Pipelines, we do not fully utilize the graphics processing unit (GPU). Batch processing of multiple inputs can significantly boost GPU utilization and enhance inference efficiency.

Instead of processing one sample at a time, you can use the pipeline’s batch_size parameter or pass a list of inputs so the model processes several inputs in one forward pass. Here is a code example:

from transformers import pipeline

pipe = pipeline(
    task="text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device_map="auto"
)

texts = [
    "Great product and fast delivery!",
    "The UI is confusing and slow.",
    "Support resolved my issue quickly.",
    "Not worth the price."
]

results = pipe(texts, batch_size=16, truncation=True, padding=True)
for r in results:
    print(r)

 

By batching requests, you can achieve higher throughput with only a minimal impact on latency.

 

2. Use Lower Precision And Quantization

 

Many pretrained models are too large to run inference in memory-constrained development and production environments. Lowering numerical precision reduces memory usage and speeds up inference without sacrificing much accuracy.

For example, here is how to use half precision on the GPU in a Transformers Pipeline:

import torch
from transformers import AutoModelForSequenceClassification

# Any sequence-classification checkpoint works here; this one is an example
model_id = "distilbert-base-uncased-finetuned-sst-2-english"

model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    torch_dtype=torch.float16
)

 

Similarly, quantization techniques can compress model weights without noticeably degrading performance:

# Requires bitsandbytes for 8-bit quantization
from transformers import AutoModelForCausalLM

# model_id: any causal-LM checkpoint you intend to serve (example placeholder)
model_id = "gpt2"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,
    device_map="auto"
)

 

Using lower precision and quantization in production usually speeds up pipelines and reduces memory use without significantly impacting model accuracy.

 

3. Select Efficient Model Architectures

 
In many applications, you do not need the largest model to solve the task. Selecting a lighter transformer architecture, such as a distilled model, often yields better latency and throughput with an acceptable accuracy trade-off.

Compact models or distilled versions, such as DistilBERT, retain most of the original model’s accuracy but with far fewer parameters, resulting in faster inference.

Choose a model whose architecture is optimized for inference and suits your task’s accuracy requirements.
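As a rough illustration (not a benchmark), switching to a distilled model is usually a one-line change to the checkpoint name. The sketch below assumes the same sentiment-analysis task as earlier; DistilBERT has roughly 40% fewer parameters than BERT-base while retaining most of its accuracy:

from transformers import pipeline

# Distilled checkpoint: smaller and faster than BERT-base for the same task
small_pipe = pipeline(
    task="text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device_map="auto"
)

print(small_pipe("The checkout flow was quick and painless."))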

 

4. Leverage Caching

 
Many systems waste compute by repeating expensive work. For generative models, the most important built-in cache is the key-value cache: with use_cache=True, generate() reuses the attention keys and values from previous decoding steps instead of recomputing them for every new token.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # example checkpoint; substitute the model you actually serve
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
inputs = tokenizer("Caching makes generation faster because", return_tensors="pt")

with torch.inference_mode():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=120,
        do_sample=False,
        use_cache=True  # reuse attention keys/values across decoding steps
    )

 

Reusing the cache in this way avoids recomputing attention for earlier tokens at every decoding step, which reduces computation time and lowers latency in production systems.

 

5. Use An Accelerated Runtime Via Optimum (ONNX Runtime)

 
Many pipelines run in PyTorch eager mode, which adds Python overhead and extra memory copies. Using Optimum with Open Neural Network Exchange (ONNX) Runtime converts the model to a static graph and fuses operations, so the runtime can use faster kernels on a central processing unit (CPU) or GPU with less overhead. The result is usually faster inference, especially on CPU or mixed hardware, without changing how you call the pipeline.

Install the required packages with:

pip install -U transformers optimum[onnxruntime] onnxruntime

 

Then, convert the model with code like this:

from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint

ort_model = ORTModelForSequenceClassification.from_pretrained(
    model_id,
    from_transformers=True  # newer Optimum releases use export=True instead
)

 

By converting the pipeline to ONNX Runtime through Optimum, you can keep your existing pipeline code while getting lower latency and more efficient inference.
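Depending on your Optimum and Transformers versions, the converted model can usually be dropped straight into a regular pipeline together with its tokenizer. A sketch under that assumption, reusing the model_id and ort_model from above:

from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained(model_id)
ort_pipe = pipeline("text-classification", model=ort_model, tokenizer=tokenizer)

print(ort_pipe("Fast inference without changing the calling code."))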

 

Wrapping Up

 
Transformers Pipelines is an API wrapper in the Hugging Face framework that facilitates AI application development by condensing complex code into simpler interfaces. In this article, we explored five tips to optimize Hugging Face Transformers Pipelines, from batch inference requests, to selecting efficient model architectures, to leveraging caching and beyond.

I hope this has helped!
 
 

Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.


