kali-ai-scripts.blogspot.com: Local AI Mastery: Running Phi-3.5 and Qwen 2.5 on Kali Linux for Offline Code Generation

Running Phi-Silica Alternatives on Linux (Complete Gemini 3 Conversation)

"https://learn.microsoft.com/en-us/windows/ai/apis/phi-silica" How can I implement these in Kali Linux PC I don't have any paid subscriptions, I cannot afford any now, so can these steps be done with free or open source alternatives?

Gemini said

Implementing the specific Phi Silica APIs directly on Kali Linux is not possible because those APIs are built into the Windows App SDK and are hard-coded to utilize the Windows Copilot Runtime and specific Windows-compatible NPUs (like the Snapdragon X series).

However, Phi Silica is built on Microsoft's Phi family of small language models, and close relatives such as Phi-3.5 Mini are released as open weights. You can get comparable functionality (local, offline, free AI) on Kali Linux by running one of these open models with open-source tools.

Here is how you can implement this for free on your Kali Linux PC.

1. The Recommended Tool: Ollama

Ollama is one of the most popular ways to run Phi models on Linux. It is free and open-source, downloads ready-to-run quantized models, and uses a supported NVIDIA or AMD GPU automatically, falling back to the CPU otherwise.

Installation: Open your Kali terminal and run:

Bash

curl -fsSL https://ollama.com/install.sh | sh

Running the Model: Phi-Silica is a ~3.3B parameter model. The closest open-weights equivalent you can run is Phi-3.5 Mini (3.8B parameters), which is slightly larger and well suited to the same kinds of tasks (summarizing, rewriting, and reformatting text).

Bash

ollama run phi3.5

2. Implementation via Python (Developer Approach)

If you want to build an application similar to what the Microsoft Learn page describes (using APIs), you should use the transformers library or llama-cpp-python. This is the Linux "API" equivalent.

Step 1: Install Dependencies

Bash

pip install torch transformers accelerate

Step 2: Python Code (The "Linux API" version). Here is a script that replicates the "Summarize" skill described in the Phi-Silica documentation:

Python

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# 1. Setup Model (Using Phi-3.5-mini-instruct, the open sibling of Silica)
model_id = "microsoft/Phi-3.5-mini-instruct"

print("Loading model... This may take a moment.")
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    device_map="auto", 
    torch_dtype="auto", 
    trust_remote_code=True
)

# 2. Function to replicate "Text Intelligence Skills" (Summarization)
def summarize_text(input_text):
    messages = [
        {"role": "system", "content": "You are a helpful assistant that summarizes text concisely."},
        {"role": "user", "content": f"Summarize the following: {input_text}"},
    ]
    
    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
    generation_args = {
        "max_new_tokens": 150,
        "return_full_text": False,
        "temperature": 0.0,
        "do_sample": False,
    }
    
    output = pipe(messages, **generation_args)
    return output[0]['generated_text']

# 3. Example Usage
sample_text = """
The Phi Silica APIs are part of the Windows App SDK. They allow developers to 
access high-performance local language models on Copilot+ PCs using NPUs.
"""

print("\n--- Summary ---")
print(summarize_text(sample_text))

3. Open Source Alternatives for "Windows-Only" Features

The link you provided highlights three "Text Intelligence Skills." Here is how to do them for free on Kali:

Phi-Silica Skill	Linux Open Source Alternative
Summarize	Use `ollama run phi3.5` with the prompt: "Summarize this: [text]"
Text-to-table	Use `ollama run phi3.5` with: "Format this data into a Markdown table: [text]"
Rewrite	Use `ollama run phi3.5` with: "Rewrite this in a professional tone: [text]"
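The same three skills can also be called from your own scripts: Ollama serves a local REST API (by default on port 11434), and its /api/generate endpoint takes a model name and a prompt. The sketch below assumes the Ollama service is running and the phi3.5 model has been pulled; the skill names and the build_request/run_skill helpers are illustrative, not part of Ollama.

```python
import json
import urllib.request

# Prompt templates taken from the table above; "{text}" is filled in per request.
SKILL_PROMPTS = {
    "summarize": "Summarize this: {text}",
    "text_to_table": "Format this data into a Markdown table: {text}",
    "rewrite": "Rewrite this in a professional tone: {text}",
}

# Assumption: Ollama is listening on its default local port (11434).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(skill: str, text: str, model: str = "phi3.5") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    prompt = SKILL_PROMPTS[skill].format(text=text)
    # stream=False asks Ollama for one JSON object instead of a stream of chunks
    return {"model": model, "prompt": prompt, "stream": False}


def run_skill(skill: str, text: str, model: str = "phi3.5") -> str:
    """Send one skill request to the local Ollama server and return the reply text."""
    body = json.dumps(build_request(skill, text, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    # CPU generation can be slow, so allow a generous timeout
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())["response"]
```

For example, run_skill("rewrite", "the meeting got moved to friday") returns the model's rewritten sentence once the Ollama service is running.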

Hardware Note for Kali Linux

NPU vs GPU: Microsoft Phi-Silica is tuned for NPUs. On Kali, your best performance will come from your GPU (NVIDIA/AMD) or CPU (using AVX2 instructions).
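Memory: Phi-3.5 Mini is small. The 4-bit quantized build that Ollama downloads is about 2.2GB and runs smoothly on most modern PCs with 8GB+ RAM, with no paid subscription or expensive hardware. The full-precision weights used by the Python script are much larger (about 7.6GB), so plan on 16GB of RAM for that route.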
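These sizes follow from simple arithmetic: weight size ≈ parameter count × bits per weight ÷ 8. A quick sketch, assuming roughly 4.5 bits per weight for a typical 4-bit quantization once scale factors are included (the exact figure varies by quantization type, and real files are usually a little larger because some layers are kept at higher precision):

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


PHI35_MINI_PARAMS = 3.8e9  # Phi-3.5 Mini parameter count quoted above

full = weight_size_gb(PHI35_MINI_PARAMS, 16)    # bf16/fp16 weights
quant = weight_size_gb(PHI35_MINI_PARAMS, 4.5)  # assumption: ~4.5 bits/weight for a 4-bit quant
print(f"16-bit: {full:.1f} GB, ~4-bit: {quant:.1f} GB")  # prints: 16-bit: 7.6 GB, ~4-bit: 2.1 GB
```

The 16-bit estimate lines up with the 7.64G download that appears in the transformers log further down.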

┌──(kali㉿kali)-[~]
└─$ python Phi-Silica-Gemini.py
Traceback (most recent call last):
  File "/home/kali/Phi-Silica-Gemini.py", line 1, in <module>
    import torch
ModuleNotFoundError: No module named 'torch'

Gemini said

It looks like you're running into a classic Python environment issue. On Kali Linux, Python is pre-installed, but library packages like torch (PyTorch) are not.

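Additionally, Kali, like other modern Debian-based systems, marks its system Python as "externally managed" (PEP 668), so pip install outside a virtual environment is refused with an externally-managed-environment error to avoid breaking system tools. The best way to fix this is to use a Virtual Environment.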
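To see this from Python itself, the sketch below checks whether the interpreter is running inside a virtual environment and whether the system Python carries the PEP 668 EXTERNALLY-MANAGED marker file in its standard-library directory. It is a rough approximation of pip's own check (pip also honours the --break-system-packages override), not a replacement for it; the helper names are illustrative.

```python
import sys
import sysconfig
from pathlib import Path


def in_virtualenv() -> bool:
    """A venv changes sys.prefix, while sys.base_prefix still points at the system install."""
    return sys.prefix != sys.base_prefix


def system_python_is_externally_managed() -> bool:
    """PEP 668: distributions mark their Python with an EXTERNALLY-MANAGED file in the stdlib dir."""
    marker = Path(sysconfig.get_path("stdlib")) / "EXTERNALLY-MANAGED"
    return marker.exists()


def pip_install_allowed() -> bool:
    """Rough mirror of pip's rule: installs are refused only outside a venv on a marked Python."""
    return in_virtualenv() or not system_python_is_externally_managed()


print(f"In a virtual environment: {in_virtualenv()}")
print(f"System Python marked EXTERNALLY-MANAGED: {system_python_is_externally_managed()}")
print(f"pip install expected to work here: {pip_install_allowed()}")
```

Run with the system python3 on Kali, the last line is expected to print False; run again after activating ai_env, it should print True.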

The Fix: Setup and Complete Code

Follow these steps in your terminal to get the environment ready and run the script properly.

1. Install the Virtual Environment tool

Bash

sudo apt update
sudo apt install python3-venv -y

2. Create and Activate the Environment

This keeps your AI libraries isolated so they don't mess with Kali's system tools.

Bash

python3 -m venv ai_env
source ai_env/bin/activate

(You will see (ai_env) appear before your prompt).

3. Install the Required Libraries

Now that the environment is active, install the necessary packages:

Bash

pip install torch transformers accelerate sentencepiece
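Once the installs finish, a short check like the one below confirms that each library is importable from the active interpreter, which would have caught the missing torch module before any model loading. It is a sketch: the package list simply mirrors the pip command above, and missing_packages is an illustrative helper.

```python
import importlib.util

# pip name -> import name (they happen to match for these four packages)
REQUIRED = {
    "torch": "torch",
    "transformers": "transformers",
    "accelerate": "accelerate",
    "sentencepiece": "sentencepiece",
}


def missing_packages(required: dict) -> list:
    """Return the pip names whose import name cannot be found by this interpreter."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing (run inside ai_env): pip install " + " ".join(missing))
else:
    print("All required packages are importable.")
```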

4. The Complete Final Code

Here is the updated, robust version of the script. I have added a check for CUDA (NVIDIA GPU) support; if you have an NVIDIA card, it will run significantly faster. If not, it will default to your CPU.

File: Phi-Silica-Gemini.py

Python

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# 1. Configuration and Model Selection
# Using Phi-3.5-mini-instruct (The open-source sibling to Silica)
model_id = "microsoft/Phi-3.5-mini-instruct"

print("Checking hardware...")
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

print(f"Loading model '{model_id}'... This may take a few minutes on first run.")

# 2. Initialize Model and Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    device_map="auto", 
    torch_dtype="auto", 
    trust_remote_code=True
)

# 3. Define the Intelligence Skill (Summarization)
def run_phi_task(user_input, system_prompt="You are a helpful AI assistant."):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]
    
    # Setting up the pipeline
    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
    
    generation_args = {
        "max_new_tokens": 500,
        "return_full_text": False,
        "temperature": 0.7,
        "do_sample": True,
    }
    
    print("\n--- Processing ---")
    output = pipe(messages, **generation_args)
    return output[0]['generated_text']

# 4. Implementation Example
if __name__ == "__main__":
    # Example task: Replicating Phi-Silica's Summarization Skill
    context_text = """
    The user wants to run local AI on Kali Linux without paid subscriptions. 
    By using Phi-3.5 and the Hugging Face transformers library, they can 
    achieve similar results to the Windows-only Phi-Silica API.
    """
    
    prompt = f"Summarize the following text into a single sentence:\n{context_text}"
    
    result = run_phi_task(prompt)
    print("\n[AI RESPONSE]:")
    print(result)

How to run it now:

Once you have saved the code and your ai_env is active:

Bash

python Phi-Silica-Gemini.py

Note: The first time you run this, it will download several gigabytes of model weights from Hugging Face. Ensure you have a stable internet connection and about 10GB of free disk space.

Would you like me to show you how to turn this into a local web-based API so other tools on your Kali machine can use it?

┌──(ai_env)─(kali㉿kali)-[~]
└─$ python Phi-Silica-Gemini1.py
Checking hardware...
Using device: cpu
Loading model 'microsoft/Phi-3.5-mini-instruct'... This may take a few minutes on first run.
config.json: 3.45kB [00:00, 3.07MB/s]
This model config has set a `rope_parameters['original_max_position_embeddings']` field, to be used together with `max_position_embeddings` to determine a scaling factor. Please set the `factor` field of `rope_parameters`with this ratio instead -- we recommend the use of this field over `original_max_position_embeddings`, as it is compatible with most model architectures.
tokenizer_config.json: 3.98kB [00:00, 3.02MB/s]
tokenizer.json: 1.84MB [00:00, 20.1MB/s]
tokenizer.model: 100%|████████████████████████| 500k/500k [00:01<00:00, 359kB/s]
added_tokens.json: 100%|████████████████████████| 306/306 [00:00<00:00, 475kB/s]
special_tokens_map.json: 100%|█████████████████| 665/665 [00:00<00:00, 1.02MB/s]
configuration_phi3.py: 11.2kB [00:00, 18.0MB/s]
A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3.5-mini-instruct:
- configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
modeling_phi3.py: 73.8kB [00:00, 28.1MB/s]
A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3.5-mini-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
Current `flash-attention` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.
model.safetensors.index.json: 16.3kB [00:00, 56.7MB/s]
Fetching 2 files: 100%|██████████████████████████| 2/2 [10:05<00:00, 302.85s/it]
Download complete: 100%|███████████████████| 7.64G/7.64G [10:05<00:00, 12.6MB/s]
Loading weights: 100%|█| 195/195 [00:23<00:00, 8.35it/s, Materializing param=mo
generation_config.json: 100%|████████████████████| 195/195 [00:00<00:00, 862B/s]

--- Processing ---
Passing `generation_config` together with generation-related arguments=({'max_new_tokens', 'temperature', 'do_sample'}) is deprecated and will be removed in future versions. Please pass either a `generation_config` object OR all generation parameters explicitly, but not both.
Both `max_new_tokens` (=500) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Traceback (most recent call last):
  File "/home/kali/Phi-Silica-Gemini1.py", line 55, in <module>
    result = run_phi_task(prompt)
  File "/home/kali/Phi-Silica-Gemini1.py", line 41, in run_phi_task
    output = pipe(messages, **generation_args)
  File "/home/kali/ai_env/lib/python3.13/site-packages/transformers/pipelines/text_generation.py", line 293, in __call__
    return super().__call__(text_inputs, **kwargs)
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kali/ai_env/lib/python3.13/site-packages/transformers/pipelines/base.py", line 1274, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kali/ai_env/lib/python3.13/site-packages/transformers/pipelines/base.py", line 1281, in run_single
    model_outputs = self.forward(model_inputs, **forward_params)
  File "/home/kali/ai_env/lib/python3.13/site-packages/transformers/pipelines/base.py", line 1173, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/home/kali/ai_env/lib/python3.13/site-packages/transformers/pipelines/text_generation.py", line 397, in _forward
    output = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
  File "/home/kali/ai_env/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
  File "/home/kali/ai_env/lib/python3.13/site-packages/transformers/generation/utils.py", line 2638, in generate
    result = decoding_method(
        self,
    ...<5 lines>...
        **model_kwargs,
    )
  File "/home/kali/ai_env/lib/python3.13/site-packages/transformers/generation/utils.py", line 2833, in _sample
    outputs = self._prefill(input_ids, generation_config, model_kwargs)
  File "/home/kali/ai_env/lib/python3.13/site-packages/transformers/generation/utils.py", line 3821, in _prefill
    model_inputs = self.prepare_inputs_for_generation(input_ids, is_first_iteration=True, **model_kwargs)
  File "/home/kali/.cache/huggingface/modules/transformers_modules/microsoft/Phi_hyphen_3_dot_5_hyphen_mini_hyphen_instruct/2fe192450127e6a83f7441aef6e3ca586c338b77/modeling_phi3.py", line 1298, in prepare_inputs_for_generation
    past_length = past_key_values.seen_tokens
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'DynamicCache' object has no attribute 'seen_tokens'

Gemini said

This error occurs because of a version mismatch between the latest transformers library and the custom model code Microsoft ships with Phi-3.5. Specifically, the model script reads an attribute (seen_tokens) that newer releases of the library have removed from their cache classes.
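The API change can be illustrated with a version-tolerant helper: newer transformers cache objects report how many tokens they hold through get_seq_length(), while the seen_tokens attribute that modeling_phi3.py reads is gone. This is only an illustration of the difference (cached_token_count and the two stand-in classes are hypothetical), not a patch you need to apply to Microsoft's script.

```python
def cached_token_count(cache) -> int:
    """Number of tokens already stored in a transformers-style KV cache object.

    Newer caches expose get_seq_length(); very old ones only had seen_tokens,
    which is what Microsoft's modeling_phi3.py still reads.
    """
    if hasattr(cache, "get_seq_length"):
        return cache.get_seq_length()
    return cache.seen_tokens  # legacy attribute, absent in current releases


# Hypothetical stand-ins for an old-style and a new-style cache, for illustration only
class OldStyleCache:
    seen_tokens = 12


class NewStyleCache:
    def get_seq_length(self, layer_idx=0):
        return 12


print(cached_token_count(OldStyleCache()), cached_token_count(NewStyleCache()))
```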

To fix this, we need to:

Force the use of the standard implementation (avoiding the buggy custom script).
Clear the cache to ensure the fix takes effect.
Clean up the generation parameters to remove those "deprecated" warnings you saw.

The Fix

Run this command in your terminal first to ensure you have the einops library, which Phi often requires for tensor operations:

Bash

pip install einops

The Complete Corrected Code

Replace everything in Phi-Silica-Gemini1.py with this version. I have added attn_implementation="eager" and trust_remote_code=True logic to bypass the seen_tokens error.

Python

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# 1. Configuration
model_id = "microsoft/Phi-3.5-mini-instruct"

print("Checking hardware...")
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

print(f"Loading model '{model_id}'...")

# 2. Initialize Model and Tokenizer
# We use attn_implementation="eager" to bypass the 'DynamicCache' error
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    device_map="auto", 
    torch_dtype="auto", 
    trust_remote_code=True,
    attn_implementation="eager" 
)

# 3. Define the Intelligence Skill
def run_phi_task(user_input, system_prompt="You are a helpful AI assistant."):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]
    
    # We build the prompt manually to avoid pipeline/config conflicts
    prompt = tokenizer.apply_chat_template(
        messages, 
        tokenize=False, 
        add_generation_prompt=True
    )
    
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    
    print("\n--- Processing (Thinking...) ---")
    
    # Optimized generation settings
    outputs = model.generate(
        **inputs, 
        max_new_tokens=500,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )
    
    # Decode only the new tokens
    response = tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):], skip_special_tokens=True)
    return response

# 4. Implementation Example
if __name__ == "__main__":
    context_text = """
    The user wants to run local AI on Kali Linux without paid subscriptions. 
    By using Phi-3.5 and the Hugging Face transformers library, they can 
    achieve similar results to the Windows-only Phi-Silica API.
    """
    
    prompt = f"Summarize the following text into a single sentence:\n{context_text}"
    
    try:
        result = run_phi_task(prompt)
        print("\n[AI RESPONSE]:")
        print(result.strip())
    except Exception as e:
        print(f"\n[ERROR]: {e}")

Why this works:

attn_implementation="eager": This tells the library to use the standard PyTorch math instead of the optimized "Flash Attention" or "SDPA" paths that are currently triggering the AttributeError in the Phi-3.5 code.
Manual Generation: By using model.generate instead of the pipeline object, we gain direct control over the inputs and avoid the max_length vs max_new_tokens conflict that caused your warnings.
CPU Optimization: Since you are running on CPU, it will still be slow (it might take 30–60 seconds to "think"). If it feels stuck, just give it a minute!

Would you like me to show you how to optimize the model size (Quantization) so it runs 4x faster on your CPU?

┌──(ai_env)─(kali㉿kali)-[~]
└─$ python Phi-Silica-Gemini2.py
Checking hardware...
Using device: cpu
Loading model 'microsoft/Phi-3.5-mini-instruct'...
`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
Current `flash-attention` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.
Loading weights: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 195/195 [00:18<00:00, 10.38it/s, Materializing param=model.norm.weight]

--- Processing (Thinking...) ---

[ERROR]: 'DynamicCache' object has no attribute 'seen_tokens'

Gemini said

This error is happening because trust_remote_code=True tells Python to download and use Microsoft's custom model script (modeling_phi3.py), which hasn't been updated for the latest transformers library. In the newer library, the seen_tokens attribute was removed; the cache now reports its length through the get_seq_length() method.

Since Phi-3 models (including Phi-3.5) are now supported natively by the transformers library, you do not need that custom script anymore. By setting trust_remote_code=False, we force the model to use the library's built-in, maintained implementation.

The Fix

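Ignore the faulty local script: The custom code you already downloaded stays in ~/.cache/huggingface/modules/transformers_modules/, but transformers will not load it once trust_remote_code=False is set, because the library has its own Phi-3 implementation.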
Update the code: We will set trust_remote_code=False.
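If you want to remove the cached custom code anyway (optional; it is harmless once trust_remote_code=False), the sketch below locates it. It assumes transformers' default module cache (~/.cache/huggingface/modules, or HF_MODULES_CACHE / HF_HOME when set) and the directory naming visible in the earlier traceback; find_remote_code and remove_remote_code are hypothetical helpers, and the second one deletes whatever it is given, so review the printed paths first.

```python
import os
import shutil
from pathlib import Path


def modules_cache_dir() -> Path:
    """Assumed location of transformers' trust_remote_code module cache."""
    if "HF_MODULES_CACHE" in os.environ:
        return Path(os.environ["HF_MODULES_CACHE"])
    default_home = os.path.join(os.path.expanduser("~"), ".cache", "huggingface")
    return Path(os.environ.get("HF_HOME", default_home)) / "modules"


def find_remote_code(modules_dir: Path, org: str = "microsoft",
                     name_prefix: str = "Phi_hyphen_3_dot_5") -> list:
    """List cached copies of a repo's custom code (directory names as in the traceback)."""
    base = modules_dir / "transformers_modules" / org
    if not base.is_dir():
        return []
    return sorted(p for p in base.iterdir() if p.is_dir() and p.name.startswith(name_prefix))


def remove_remote_code(paths: list) -> None:
    """Delete the given directories; they return only if trust_remote_code=True is used again."""
    for p in paths:
        shutil.rmtree(p)


for path in find_remote_code(modules_cache_dir()):
    print("Cached remote code:", path)
```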

The Complete Final Code

Replace everything in your file with this version. This is the cleanest and most compatible way to run Phi-3.5 on Kali in 2026.

Python

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# 1. Configuration
model_id = "microsoft/Phi-3.5-mini-instruct"

print("Checking hardware...")
# Kali default is usually CPU unless you've installed specific NVIDIA drivers
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

print(f"Loading model '{model_id}'...")

# 2. Initialize Model and Tokenizer
# CRITICAL FIX: trust_remote_code=False avoids the 'seen_tokens' bug
# attn_implementation="eager" ensures it runs on CPU without FlashAttention
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    device_map="auto", 
    torch_dtype="auto", 
    trust_remote_code=False, 
    attn_implementation="eager"
)

# 3. Define the Execution Logic
def run_phi_task(user_input):
    # Using the official chat template for Phi-3.5
    messages = [
        {"role": "system", "content": "You are a helpful AI assistant that summarizes text."},
        {"role": "user", "content": user_input},
    ]
    
    prompt = tokenizer.apply_chat_template(
        messages, 
        tokenize=False, 
        add_generation_prompt=True
    )
    
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    
    print("\n--- AI is Thinking (This may take a minute on CPU) ---")
    
    # Generate response
    outputs = model.generate(
        **inputs, 
        max_new_tokens=200,
        temperature=0.1, # Lower temperature for more stable summaries
        do_sample=False, # Deterministic output
        pad_token_id=tokenizer.eos_token_id
    )
    
    # Extract only the newly generated text
    generated_text = tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):], skip_special_tokens=True)
    return generated_text

# 4. Main Execution
if __name__ == "__main__":
    text_to_summarize = """
    The Phi Silica APIs are built for Windows, but because the underlying 
    model is open-source, Kali Linux users can replicate the functionality 
    using the Hugging Face library. This allows for private, local AI 
    processing without any subscription costs.
    """
    
    try:
        result = run_phi_task(f"Summarize this in one sentence: {text_to_summarize}")
        print("\n[SUMMARY]:")
        print(result.strip())
    except Exception as e:
        print(f"\n[CRITICAL ERROR]: {e}")
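One detail in this script: with do_sample=False, generate() uses greedy decoding, so temperature has no effect and recent transformers versions warn that the flag is ignored. A small sketch that passes temperature only when sampling is enabled (generation_kwargs is a hypothetical helper):

```python
def generation_kwargs(deterministic: bool, temperature: float = 0.7,
                      max_new_tokens: int = 200) -> dict:
    """Keyword arguments for model.generate() without contradictory sampling flags."""
    kwargs = {"max_new_tokens": max_new_tokens, "do_sample": not deterministic}
    if not deterministic:
        # temperature only matters when tokens are sampled
        kwargs["temperature"] = temperature
    return kwargs


print(generation_kwargs(True))
print(generation_kwargs(False, temperature=0.3))
```

For deterministic summaries, the call in run_phi_task would become model.generate(**inputs, **generation_kwargs(True), pad_token_id=tokenizer.eos_token_id).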

Pro-Tip for Kali Performance:

Since you are on a "CPU" device:

The script will be slow. It will load the ~7GB model into your RAM.
If your PC has less than 16GB of RAM, this script might crash or trigger the "OOM Killer."
If it feels too slow, I can show you how to use the GGUF/Quantized version, which reduces the model size from 7GB down to 2GB and makes it run much faster on Linux.

Would you like to try the high-speed (Quantized) version instead?

┌──(ai_env)─(kali㉿kali)-[~]
└─$ python Phi-Silica-Gemini3.py
Checking hardware...
Using device: cpu
Loading model 'microsoft/Phi-3.5-mini-instruct'...
Loading weights: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 195/195 [00:15<00:00, 12.74it/s, Materializing param=model.norm.weight]

--- AI is Thinking (This may take a minute on CPU) ---
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.

[SUMMARY]:
Phi Silica APIs, originally for Windows, can be replicated on Kali Linux using Hugging Face for private, local AI processing without fees.

Gemini said

Success! Since your code is now running and producing correct summaries, you have successfully replicated the core "Text Intelligence" functionality of the Windows Phi-Silica API on Kali Linux for $0.

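However, as you noticed, running the full 7GB model on a CPU is slow. To make it as snappy as the Windows version, we can use Quantization (GGUF format). Quantization stores the weights in roughly 4 bits instead of 16, shrinking the model from about 7.6GB to roughly 2.2-2.4GB. Because CPU text generation is mostly limited by how fast the weights can be read from RAM, moving fewer bytes per token makes it noticeably faster.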
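As a rough rule of thumb (an approximation, not a benchmark), generating each token requires reading about all of the weights from RAM once, so memory bandwidth divided by model size gives an upper bound on tokens per second. The 20 GB/s bandwidth below is a hypothetical laptop figure; substitute your own, and expect real speeds to be lower.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on CPU decoding speed, assuming each token streams all weights once."""
    return bandwidth_gb_s / model_gb


BANDWIDTH_GB_S = 20.0  # hypothetical dual-channel DDR4 laptop

for label, size_gb in [("full 16-bit (~7.6 GB)", 7.6), ("4-bit GGUF (~2.4 GB)", 2.4)]:
    bound = max_tokens_per_second(BANDWIDTH_GB_S, size_gb)
    print(f"{label}: at most ~{bound:.1f} tokens/s")
```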

The High-Speed (Quantized) Implementation

This version uses llama-cpp-python, the Python bindings for llama.cpp. It is much faster on Kali Linux because llama.cpp is written in C/C++ and heavily optimized for CPU inference on quantized GGUF models.

1. Install the Faster Library

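Inside your ai_env, run the command below. llama-cpp-python compiles llama.cpp from source during installation, so a C/C++ compiler must be available (on Kali: sudo apt install build-essential). The huggingface-hub package is what Llama.from_pretrained uses to download the model.

Bash

pip install llama-cpp-python huggingface-hub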

2. The Complete Final Code (GGUF Version)

This script will automatically download the lightweight version of Phi-3.5 and run it.

Python

from llama_cpp import Llama

# 1. Configuration - Downloading the Q4_K_M version (Best balance of speed/quality)
print("Loading Optimized Phi-3.5 (GGUF)...")
llm = Llama.from_pretrained(
    repo_id="bartowski/Phi-3.5-mini-instruct-GGUF",
    filename="Phi-3.5-mini-instruct-Q4_K_M.gguf",
    n_ctx=2048,      # Context window
    n_threads=4,     # Adjust based on your CPU cores
    verbose=False
)

# 2. Define the summarization task
def fast_phi_task(user_input):
    print("\n--- AI is Thinking (Optimized) ---")
    
    # We use the chat completion API which handles the templates for us
    response = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": user_input}
        ],
        max_tokens=200,
        temperature=0.1
    )
    
    return response["choices"][0]["message"]["content"]

# 3. Execution
if __name__ == "__main__":
    text = "The user has successfully bypassed Windows-only restrictions to run Phi-3.5 on Kali Linux."
    
    prompt = f"Summarize this: {text}"
    
    result = fast_phi_task(prompt)
    print("\n[FAST SUMMARY]:")
    print(result.strip())
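The n_threads=4 above is a placeholder. One way to pick a starting value is to count the CPUs the process is allowed to run on; the sketch assumes Linux (os.sched_getaffinity, which respects taskset-style affinity) with a fallback to os.cpu_count(). On CPUs with hyper-threading, the physical core count (often half of this number) can be faster, so it is worth trying both. default_thread_count is a hypothetical helper.

```python
import os


def default_thread_count() -> int:
    """Number of CPUs this process may use, as a starting value for n_threads."""
    if hasattr(os, "sched_getaffinity"):
        return max(1, len(os.sched_getaffinity(0)))
    return max(1, os.cpu_count() or 1)


print("Suggested n_threads:", default_thread_count())
# e.g. Llama.from_pretrained(..., n_threads=default_thread_count(), ...)
```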

Key Improvements in this Version:

RAM Usage: Drops from ~8GB down to ~2.5GB.
Speed: You should see tokens appearing significantly faster than the previous script.
No "Seen Tokens" Error: llama.cpp has its own Phi-3 implementation and never runs Microsoft's custom Python script, so the seen_tokens incompatibility cannot occur.

Comparison of your current setup vs. Phi-Silica

Feature	Microsoft Phi-Silica (Windows)	Your Kali Implementation (Linux)
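Cost	No subscription, but requires a Copilot+ PC with a supported NPU	$0 (Free/Open Source) on your existing PC
Model	Phi Silica (Microsoft's NPU-tuned model, weights not published)	Phi-3.5 Mini (Open Weights)
Hardware	NPU (Snapdragon/Intel/AMD)	CPU or GPU
Privacy	Local (on-device)	Local (on-device; offline once the model is downloaded)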

Would you like me to help you wrap this into a Python GUI (a window with a text box) so you don't have to use the terminal every time?

yes

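To finalize your local AI implementation, we will build a desktop interface using tkinter and the optimized llama-cpp-python library. tkinter is part of Python's standard library, but on Kali it is packaged separately as python3-tk; if import tkinter fails, install it with sudo apt install python3-tk.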

This GUI will provide a dedicated "Input" area, a "Summary" area, and a progress status—replicating the look and feel of a native Windows AI app, but completely free and open source on Kali.

1. Prerequisites

Ensure you have the high-speed library installed in your ai_env:

Bash

pip install llama-cpp-python

2. The Complete Final Code (`Phi-Silica-GUI.py`)

This script uses threading so the GUI doesn't "freeze" while the AI is thinking.

Python

import queue
import threading
import tkinter as tk
from tkinter import scrolledtext, messagebox
from llama_cpp import Llama

# --- AI Backend Logic ---
class AIBackend:
    def __init__(self):
        print("Loading AI Model (GGUF)...")
        # Downloads (first run only) a ~2.2GB 4-bit quantized build of Phi-3.5
        self.llm = Llama.from_pretrained(
            repo_id="bartowski/Phi-3.5-mini-instruct-GGUF",
            filename="Phi-3.5-mini-instruct-Q4_K_M.gguf",
            n_ctx=2048,
            n_threads=4, # Adjust to your CPU core count
            verbose=False
        )

    def generate_summary(self, text):
        prompt = f"Summarize the following text into one concise paragraph:\n\n{text}"
        response = self.llm.create_chat_completion(
            messages=[
                {"role": "system", "content": "You are a professional summarization assistant."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=250,
            temperature=0.1
        )
        return response["choices"][0]["message"]["content"]

# --- GUI Application ---
class PhiGuiApp:
    def __init__(self, root):
        self.root = root
        self.root.title("Kali AI - Local Phi-3.5")
        self.root.geometry("700x600")
        self.ai = None # Loaded on a separate thread to keep GUI responsive
        # Tkinter widgets must only be touched from the main thread, so worker
        # threads post (callback, args) pairs here and poll_events() runs them.
        self.events = queue.Queue()

        # UI Elements
        tk.Label(root, text="Paste Text to Summarize:", font=("Arial", 10, "bold")).pack(pady=5)
        
        self.input_area = scrolledtext.ScrolledText(root, height=10, width=80)
        self.input_area.pack(padx=20, pady=5)

        # Disabled until the model has finished loading
        self.btn_run = tk.Button(root, text="Generate Summary", command=self.start_ai_thread, bg="#2ecc71", fg="white", font=("Arial", 10, "bold"), state="disabled")
        self.btn_run.pack(pady=10)

        self.status_label = tk.Label(root, text="Status: Loading Model (First time takes longer)...", fg="blue")
        self.status_label.pack()

        tk.Label(root, text="AI Summary Output:", font=("Arial", 10, "bold")).pack(pady=5)
        
        self.output_area = scrolledtext.ScrolledText(root, height=10, width=80, bg="#f0f0f0")
        self.output_area.pack(padx=20, pady=5)

        # Start loading AI in the background and begin polling for results
        threading.Thread(target=self.load_ai, daemon=True).start()
        self.root.after(100, self.poll_events)

    def poll_events(self):
        # Runs on the main thread: apply UI updates queued by worker threads
        try:
            while True:
                callback, args = self.events.get_nowait()
                callback(*args)
        except queue.Empty:
            pass
        self.root.after(100, self.poll_events)

    def load_ai(self):
        # Worker thread: never touch widgets here, only queue callbacks
        try:
            ai = AIBackend()
        except Exception as e:
            self.events.put((self.show_error, (f"Model failed to load: {e}", False)))
            return
        self.events.put((self.on_model_ready, (ai,)))

    def on_model_ready(self, ai):
        self.ai = ai
        self.status_label.config(text="Status: Model Loaded & Ready", fg="green")
        self.btn_run.config(state="normal")

    def start_ai_thread(self):
        input_text = self.input_area.get("1.0", tk.END).strip()
        if not input_text:
            messagebox.showwarning("Warning", "Please enter some text first!")
            return
        
        self.btn_run.config(state="disabled")
        self.status_label.config(text="Status: AI is thinking...", fg="orange")
        self.output_area.delete("1.0", tk.END)
        
        # Run AI in background thread so GUI doesn't hang
        threading.Thread(target=self.process_summary, args=(input_text,), daemon=True).start()

    def process_summary(self, text):
        # Worker thread: generate, then hand the result back to the main thread
        try:
            summary = self.ai.generate_summary(text)
        except Exception as e:
            self.events.put((self.show_error, (str(e), True)))
        else:
            self.events.put((self.show_summary, (summary,)))

    def show_summary(self, summary):
        self.output_area.insert(tk.END, summary.strip())
        self.status_label.config(text="Status: Success!", fg="green")
        self.btn_run.config(state="normal")

    def show_error(self, message, re_enable):
        self.status_label.config(text="Status: Error", fg="red")
        messagebox.showerror("Error", message)
        if re_enable:
            self.btn_run.config(state="normal")

if __name__ == "__main__":
    root = tk.Tk()
    app = PhiGuiApp(root)
    root.mainloop()

3. How to Run

Save the code as Phi-Silica-GUI.py.
In your terminal (with ai_env active), run:
Bash

python Phi-Silica-GUI.py
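Note: The first time you launch the app, it downloads the GGUF file (approx. 2.2GB) in the background; watch your terminal for the download progress. The "Generate Summary" button stays disabled until the model has loaded. Later launches skip the download, although the model still takes a few seconds to load from disk.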

Why this is better than the Windows API:

Total Control: You can change the "System Prompt" in the code to make it act like a translator, a code debugger, or a creative writer.
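Privacy: Inference runs entirely on your machine, so the text you paste is never sent anywhere. Note that Llama.from_pretrained may still contact the Hugging Face Hub at startup to locate the model file; for a fully offline setup, load the downloaded .gguf file directly with Llama(model_path=...).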
Efficiency: By using the GGUF format, you are saving ~5GB of RAM compared to your previous script.
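For a fully offline start, the sketch below finds the GGUF file that Llama.from_pretrained already downloaded and loads it with Llama(model_path=...). It assumes the standard Hugging Face Hub cache layout (~/.cache/huggingface/hub/models--ORG--NAME/snapshots/REVISION/FILE, or HF_HUB_CACHE / HF_HOME when set); hub_cache_dir, find_cached_gguf and load_offline are hypothetical helpers.

```python
import os
from pathlib import Path


def hub_cache_dir() -> Path:
    """Assumed Hugging Face Hub download cache location."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    default_home = os.path.join(os.path.expanduser("~"), ".cache", "huggingface")
    return Path(os.environ.get("HF_HOME", default_home)) / "hub"


def find_cached_gguf(cache_dir: Path, repo_id: str, filename: str):
    """Return the path of an already-downloaded file, or None if it is not cached."""
    repo_dir = cache_dir / ("models--" + repo_id.replace("/", "--"))
    matches = sorted((repo_dir / "snapshots").glob(f"*/{filename}"))
    return matches[-1] if matches else None


def load_offline(repo_id="bartowski/Phi-3.5-mini-instruct-GGUF",
                 filename="Phi-3.5-mini-instruct-Q4_K_M.gguf"):
    """Load the cached GGUF directly, without asking the Hub where the file is."""
    from llama_cpp import Llama  # imported here so the path helpers work without llama-cpp
    path = find_cached_gguf(hub_cache_dir(), repo_id, filename)
    if path is None:
        raise FileNotFoundError(f"{filename} is not cached yet; run once online first")
    return Llama(model_path=str(path), n_ctx=2048, n_threads=4, verbose=False)
```

In the GUI, AIBackend.__init__ could set self.llm = load_offline() once the model has been downloaded at least once.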

Would you like me to show you how to add a "Save to File" button to this GUI so you can export your summaries automatically?

┌──(ai_env)─(kali㉿kali)-[~]
└─$ python Phi-Silica-GUI.py
Loading AI Model (GGUF)...
/home/kali/ai_env/lib/python3.13/site-packages/huggingface_hub/utils/_validators.py:202: UserWarning: The `local_dir_use_symlinks` argument is deprecated and ignored in `hf_hub_download`. Downloading to a local directory does not use symlinks anymore.
  warnings.warn(
llama_context: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized

Gemini said

Those messages you're seeing in the terminal are harmless. The local_dir_use_symlinks warning is a deprecation notice: llama-cpp-python still passes that argument to Hugging Face's hf_hub_download(), and newer versions of huggingface_hub simply ignore it. The n_ctx message means the model was trained to read up to 131,072 tokens (n_ctx_train), but we've told it to use a smaller 2,048-token "workspace" (n_ctx) to save RAM, which is exactly what you want on a local machine.
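Because n_ctx=2048 has to hold the prompt and the reply together, very long input text can overflow the window. A sketch of a pre-check, assuming llama-cpp-python's Llama.tokenize (which takes bytes and returns a list of token ids) is passed in as the tokenize argument; the 64-token template_overhead allowance for the system prompt and chat template is a guess, not a measured value, and fits_in_context is a hypothetical helper.

```python
def fits_in_context(tokenize, text: str, n_ctx: int = 2048, max_tokens: int = 300,
                    template_overhead: int = 64) -> bool:
    """Check that prompt tokens + reserved reply tokens fit inside the n_ctx window.

    tokenize: callable mapping bytes -> list of token ids, e.g. llm.tokenize
    of a llama_cpp.Llama object.
    """
    n_prompt = len(tokenize(text.encode("utf-8")))
    return n_prompt + template_overhead + max_tokens <= n_ctx
```

For example, AIBackend.generate_summary could call fits_in_context(self.llm.tokenize, text, max_tokens=300) and warn the user before generating.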

Here is the complete final code with a new "Save Summary" button and a filter that silences the Hugging Face symlink warning (the n_ctx notice is informational and can be ignored).

The Complete Final Code (`Phi-Silica-GUI.py`)

Python

import tkinter as tk
from tkinter import scrolledtext, messagebox, filedialog
import threading
import os
import warnings
from llama_cpp import Llama

# Silence the symlink warnings for a cleaner terminal
warnings.filterwarnings("ignore", category=UserWarning, module="huggingface_hub")

# --- AI Backend Logic ---
class AIBackend:
    def __init__(self):
        print("Loading AI Model (GGUF)...")
        # n_ctx=2048 keeps RAM use low; the model itself was trained for up to 131072 tokens
        self.llm = Llama.from_pretrained(
            repo_id="bartowski/Phi-3.5-mini-instruct-GGUF",
            filename="Phi-3.5-mini-instruct-Q4_K_M.gguf",
            n_ctx=2048,      
            n_threads=4,     
            verbose=False
        )

    def generate_summary(self, text):
        prompt = f"Summarize the following text into one concise paragraph:\n\n{text}"
        response = self.llm.create_chat_completion(
            messages=[
                {"role": "system", "content": "You are a professional summarization assistant."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=300,
            temperature=0.1
        )
        return response["choices"][0]["message"]["content"]

# --- GUI Application ---
class PhiGuiApp:
    def __init__(self, root):
        self.root = root
        self.root.title("Kali AI - Local Phi-3.5 (Free & Offline)")
        self.root.geometry("800x700")
        self.ai = None

        # Top Section: Input
        tk.Label(root, text="Input Text:", font=("Arial", 10, "bold")).pack(pady=(10, 0))
        self.input_area = scrolledtext.ScrolledText(root, height=12, width=90)
        self.input_area.pack(padx=20, pady=5)

        # Middle Section: Controls
        self.btn_frame = tk.Frame(root)
        self.btn_frame.pack(pady=10)

        self.btn_run = tk.Button(self.btn_frame, text="✨ Generate Summary", command=self.start_ai_thread, 
                                 bg="#2ecc71", fg="white", font=("Arial", 10, "bold"), width=20)
        self.btn_run.grid(row=0, column=0, padx=10)

        self.btn_save = tk.Button(self.btn_frame, text="💾 Save Summary", command=self.save_to_file, 
                                  bg="#3498db", fg="white", font=("Arial", 10, "bold"), width=20)
        self.btn_save.grid(row=0, column=1, padx=10)

        self.status_label = tk.Label(root, text="Status: Initializing...", fg="blue")
        self.status_label.pack()

        # Bottom Section: Output
        tk.Label(root, text="AI Output:", font=("Arial", 10, "bold")).pack(pady=(10, 0))
        self.output_area = scrolledtext.ScrolledText(root, height=12, width=90, bg="#f8f9fa")
        self.output_area.pack(padx=20, pady=5)

        # Load AI in background
        threading.Thread(target=self.load_ai, daemon=True).start()

    def set_status(self, text, color):
        # Tkinter widgets should only be touched from the main loop, so schedule the update there
        self.root.after(0, lambda: self.status_label.config(text=text, fg=color))

    def load_ai(self):
        try:
            self.ai = AIBackend()
            self.set_status("Status: Model Ready", "green")
        except Exception as e:
            self.set_status("Status: Load Error", "red")
            # Bind the message now: 'e' is cleared when the except block ends
            self.root.after(0, lambda msg=str(e): messagebox.showerror("Model Error", msg))

    def start_ai_thread(self):
        if self.ai is None:
            messagebox.showinfo("Please Wait", "The model is not loaded yet.")
            return
        input_text = self.input_area.get("1.0", tk.END).strip()
        if not input_text:
            messagebox.showwarning("Empty Input", "Please paste some text to summarize.")
            return

        self.btn_run.config(state="disabled")
        self.status_label.config(text="Status: AI is processing...", fg="orange")
        self.output_area.delete("1.0", tk.END)
        threading.Thread(target=self.process_summary, args=(input_text,), daemon=True).start()

    def process_summary(self, text):
        # Runs on a worker thread: do the slow AI call here, hand widget updates to the main loop
        try:
            summary = self.ai.generate_summary(text).strip()
            self.root.after(0, lambda: self.output_area.insert(tk.END, summary))
            self.set_status("Status: Done!", "green")
        except Exception as e:
            self.root.after(0, lambda msg=str(e): messagebox.showerror("AI Error", msg))
            self.set_status("Status: Process Failed", "red")
        finally:
            self.root.after(0, lambda: self.btn_run.config(state="normal"))

    def save_to_file(self):
        content = self.output_area.get("1.0", tk.END).strip()
        if not content:
            messagebox.showwarning("Nothing to Save", "Generate a summary first!")
            return
        
        file_path = filedialog.asksaveasfilename(
            defaultextension=".txt",
            filetypes=[("Text files", "*.txt"), ("All files", "*.*")],
            title="Save Summary As"
        )
        
        if file_path:
            try:
                with open(file_path, "w", encoding="utf-8") as f:
                    f.write(content)
                messagebox.showinfo("Success", f"Summary saved to:\n{file_path}")
            except Exception as e:
                messagebox.showerror("Save Error", f"Could not save file: {e}")

if __name__ == "__main__":
    root = tk.Tk()
    app = PhiGuiApp(root)
    root.mainloop()

What’s New in This Version:

The Save Button: Added a "Save Summary" button that opens a Tk file-save dialog. You can choose where to save your .txt file on your Kali drive.
Warning Silencer: Added warnings.filterwarnings to stop the "symlinks" warning from cluttering your terminal.
Status Indicators: The status label now changes colors (Green for ready, Orange for thinking, Red for error) so you know exactly what the AI is doing.
Threading: Model loading and summary generation run on background threads, so the window never feels "frozen" while the AI works.
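The threading idea can also be written as a worker-plus-queue pattern, a common way to keep Tk responsive: the worker thread never touches widgets, and the GUI polls the queue from a root.after callback. This is a minimal, Tk-free sketch of that alternative (not what the script above does); run_in_background and slow_task are hypothetical names, with slow_task standing in for ai.generate_summary.

Python

```python
import queue
import threading
import time

def run_in_background(func, *args):
    """Run func(*args) on a daemon thread; the returned queue receives ("ok", result) or ("error", exc)."""
    results = queue.Queue()

    def worker():
        try:
            results.put(("ok", func(*args)))
        except Exception as exc:  # report errors instead of losing them inside the thread
            results.put(("error", exc))

    threading.Thread(target=worker, daemon=True).start()
    return results

def slow_task(text):
    # Hypothetical stand-in for ai.generate_summary(text)
    time.sleep(0.1)
    return text.upper()

if __name__ == "__main__":
    pending = run_in_background(slow_task, "hello")
    # In a GUI you would call pending.get_nowait() inside a root.after(100, ...) callback instead
    status, value = pending.get(timeout=5)
    print(status, value)
```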

You now have a fully functional, local, and free alternative to Windows Phi-Silica. Would you like to add a "Clear All" button to quickly reset the input and output areas for a new task?

How can I use this model to generate programs and code and build large projects?

Gemini said

To move from simple summaries to building large software projects with Phi-3.5 on Kali Linux, you need to transition from a standalone script to an Agentic Workflow. While Phi-3.5-mini is "small," it supports a context window of up to 128k tokens, so it can read a lot of code at once, provided you raise the context size (n_ctx in llama-cpp-python) and have the RAM for it.

Here is the professional strategy for using this model for large-scale development.

1. Integration: The "Local Copilot" Setup

Don't copy-paste code into your script. Instead, connect your local model directly to your IDE (VS Code or Codium) so it can see your files.

Tool: Continue.dev (Open-source VS Code Extension).
Setup:
1. Install the Continue extension in VS Code.
2. Open your config.json in Continue.
3. Add your local model to the "models" list:

JSON

{
  "models": [{
    "title": "Local Phi-3.5",
    "provider": "ollama", 
    "model": "phi3.5"
  }]
}

(Note: Continue talks to the Ollama server, so it must be running. Start it with ollama serve if it isn't already running in the background.)

2. Strategy for "Large Projects"

Phi-3.5 is a 3.8B parameter model. While smart, it can lose track if you ask it to "Build a whole banking app." Use the Modular Prompting method:

Step 1 (The Architect Phase): Ask the model to generate a file_structure.md.
Prompt: "I want to build a Python-based network scanner. Create a modular file structure including a backend, a CLI, and a requirements.txt."
Step 2 (The Skeleton Phase): Ask it to write the interfaces/classes first, with comments but no logic.
Step 3 (Implementation): Provide one file at a time to the model and ask it to fill in the logic.

3. Code Generation Script

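If you want to build a tool that generates project files for you, have your script save the model's reply straight to disk. The script below does this with a strict system prompt that asks for raw code only. Small models sometimes still wrap their answer in markdown fences, so the script strips those before saving. (llama-cpp-python also offers a JSON mode that constrains the reply to valid JSON.)

Complete Final Code: Project File Generator. This script takes a prompt and writes the generated code to a file automatically.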

Python

import os
from llama_cpp import Llama

# 1. Initialize the local backend (downloads the GGUF file on first run)
llm = Llama.from_pretrained(
    repo_id="bartowski/Phi-3.5-mini-instruct-GGUF",
    filename="Phi-3.5-mini-instruct-Q4_K_M.gguf",
    n_ctx=4096,   # Expanded context for code
    n_threads=6,  # Adjust to your CPU's core count
    verbose=False
)

def strip_code_fences(text):
    """Remove a wrapping markdown fence if the model added one despite the instructions."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith("```"):
        lines = lines[1:]
        # Only drop a closing fence when we also dropped an opening one
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]
    return "\n".join(lines)

def generate_project_file(file_purpose, filename):
    print(f"🚀 Generating {filename}...")

    # System prompt optimized for coding
    system_msg = (
        "You are an expert Senior Lead Developer. "
        "Provide ONLY the raw code for the requested file. "
        "Do not include explanations, markdown backticks, or 'Here is your code'."
    )

    response = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": system_msg},
            {"role": "user", "content": f"Write a complete, production-ready {file_purpose}."}
        ],
        temperature=0.2  # Lower temperature = more consistent, less "creative" output
    )

    code = strip_code_fences(response["choices"][0]["message"]["content"])

    # Save the file automatically
    with open(filename, "w", encoding="utf-8") as f:
        f.write(code + "\n")
    print(f"✅ Saved to {os.path.abspath(filename)}")

if __name__ == "__main__":
    # Example: Generating a modular project
    generate_project_file("Python script using Scapy for a port scanner", "scanner.py")
    generate_project_file("README.md file explaining how to use a network scanner", "README.md")
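The script above writes one file per request. If you want a single reply to produce several files, a common approach is to ask for JSON and let the script write each entry. The sketch below assumes the model returns an object shaped like {"files": [{"path": ..., "content": ...}]}; that shape is something you would have to request in your prompt, not a Phi-3.5 feature. write_project_files is a hypothetical helper, and it refuses any path that would land outside the project folder.

Python

```python
import json
from pathlib import Path

def write_project_files(raw_json: str, project_root: str) -> list[Path]:
    """Write files described by {"files": [{"path": ..., "content": ...}]} under project_root."""
    root = Path(project_root).resolve()
    data = json.loads(raw_json)  # raises json.JSONDecodeError if the model output is malformed
    written = []
    for entry in data.get("files", []):
        target = (root / entry["path"]).resolve()
        # Reject absolute paths and "../" tricks that would escape the project folder
        if root not in target.parents:
            raise ValueError(f"Refusing to write outside project: {entry['path']}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(entry["content"], encoding="utf-8")
        written.append(target)
    return written
```

With llama-cpp-python you could request JSON via create_chat_completion(..., response_format={"type": "json_object"}) and pass response["choices"][0]["message"]["content"] to write_project_files. Expect to retry occasionally, since a small model may still return JSON that doesn't match the shape you asked for.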

4. Advanced: RAG (Retrieval Augmented Generation)

For "Large Projects," the model needs to "remember" your other files.

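The Hack: Since Phi-3.5 supports up to a 128k-token context window, you can concatenate all your .py files into one big text block and paste it into the prompt. This only works if you raise the context size (n_ctx in llama-cpp-python, num_ctx in Ollama) high enough, and a very long context needs a lot of RAM.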
The Pro Way: Use a library like LangChain to index your local folder into a vector database (ChromaDB), so the model only "reads" the relevant files when you ask a question.
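The "concatenate everything" hack only works if the combined text fits in the context you actually configured (n_ctx=4096 in the generator script, far below the 128k maximum). Here is a minimal sketch that bundles a folder's .py files and skips any file that would blow the budget. It assumes a rough 4-characters-per-token heuristic instead of Phi-3.5's real tokenizer, and bundle_python_files is a hypothetical helper name.

Python

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic for English text and code, not an exact tokenizer count

def bundle_python_files(project_dir: str, max_tokens: int) -> tuple[str, list[str]]:
    """Concatenate .py files (sorted by path), skipping any that would exceed the token budget.

    Returns the combined prompt text and the relative paths that did not fit.
    """
    root = Path(project_dir)
    budget = max_tokens * CHARS_PER_TOKEN
    parts, skipped, used = [], [], 0
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root).as_posix()
        block = f"### File: {rel}\n{path.read_text(encoding='utf-8', errors='replace')}\n"
        if used + len(block) > budget:
            skipped.append(rel)
            continue
        parts.append(block)
        used += len(block)
    return "".join(parts), skipped
```

For example, bundle_python_files(".", max_tokens=3000) leaves some headroom inside n_ctx=4096 for your question and the model's answer; anything listed in the second return value is where the RAG approach starts to pay off.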

Summary of Tools for Kali

Terminal: Use ollama to serve models in the background.
IDE: Use VS Code + Continue for an "Autopilot" experience.
CLI: Use aider (pip install aider-chat)—it is a command-line tool that lets Phi-3.5 actually edit your local files directly.

Would you like me to show you how to set up Aider, so the AI can run your code and fix its own bugs on your Kali machine?

"Open your config.json in Continue." How to do it?

Gemini said

To open the configuration for the Continue extension on Kali Linux (or any OS), you have two main ways: the "Command Palette" way and the "Sidebar" way.

Option 1: The Command Palette (Fastest)

Open VS Code.
Press Ctrl + Shift + P.
Type "Continue: Open Config" (or just "Continue Config").
Select it, and it will automatically open the config.yaml or config.json file in your editor.

Option 2: The Sidebar (Visual)

Click the Continue icon in the left sidebar (looks like a small triangle/play button).
Look at the bottom left of the Continue panel for a Gear Icon (⚙️).
Click that icon to open the configuration file.

How to add your Local Phi-3.5

Once the file is open, search for the "models": [] section. To use the model we've been working with, you will need to have Ollama running in the background (as it's the easiest way for Continue to talk to local models).

Paste this into your models array:

JSON

{
  "title": "Local Phi-3.5",
  "provider": "ollama",
  "model": "phi3.5",
  "apiBase": "http://localhost:11434"
}

Why we are using Ollama here:

While our Python scripts used llama-cpp-python directly, the Continue extension is designed to work seamlessly with Ollama.

To set this up on your Kali machine:

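In your terminal, run: ollama pull phi3.5 (Ollama keeps its own model store, so the GGUF file llama-cpp-python downloaded is not reused). ollama run phi3.5 also works; it pulls the model and then opens an interactive chat.
Once the model is downloaded and the Ollama server is running, the VS Code extension will be able to "see" it at the apiBase address above.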

Complete Final Checklist for Projects:

Generate Code: Use Ctrl + L in VS Code to ask the model to write code.
Edit Code: Highlight a block of code and press Ctrl + I. Tell it "add error handling" or "refactor this."
Explain Code: Highlight a complex function and ask "What does this do?"

Would you like me to show you how to use Aider in your terminal? It's a tool that lets Phi-3.5 actually edit your project files directly without you having to copy-paste.

help which one to choose?

Gemini said

To open the configuration for the Continue extension in Visual Studio Code, you can use any of the following methods:

Command Palette: Press Ctrl+Shift+P (Windows/Linux) or Cmd+Shift+P (macOS), type "Continue: Open configuration file", and select it.
Continue Sidebar: Open the Continue sidebar by clicking its logo or pressing Ctrl+L (Cmd+L on Mac). Click the selector (often labeled "Local Assistant" or an agent name) above the chat input, hover over the agent, and click the gear (cog) icon.
Config Dropdown: In the top-right of the chat input box within the extension panel, open the configs dropdown and select the cog icon next to the "Local Config" option.

Which configuration format should you choose?

Continue currently supports multiple configuration formats, but for most users, local YAML is recommended.

Option	Best For...
config.yaml	The modern standard. It is easy to read and recommended for most users to define models, rules, and tools.
config.json	Use this if you are already comfortable with the older format. Note that this is technically a legacy/deprecated method, though still functional for many.
config.ts	Advanced users. Choose this if you need to programmatically extend your configuration using TypeScript.

Summary for your specific goal:

Since you are setting up local models like Phi-3.5 via Ollama on Kali Linux, your next step should be:

Open the config.yaml (or config.json) using one of the methods above.
Add your model details to the models section, ensuring the apiBase points to your local Ollama port (typically http://localhost:11434).

Would you like me to provide a specific YAML block to paste into your configuration for Phi-3.5?

yes

Gemini said

To add Phi-3.5 to your Continue configuration, you should follow the modern YAML standard. This format is cleaner and more resistant to the "missing brackets" errors often found in JSON.

1. The Configuration Block

Open your config.yaml (using the methods we discussed earlier) and paste this block into the models: section.

YAML

models:
  - name: Phi-3.5 Local
    provider: ollama
    model: phi3.5:latest
    apiBase: http://localhost:11434
    roles:
      - chat
      - edit
      - apply
    capabilities:
      - tool_use
    defaultCompletionOptions:
      temperature: 0.1
      contextLength: 4096

2. Pro-Tip: Setting up Autocomplete

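If you want the "ghost text" (autocomplete) feature similar to GitHub Copilot, add a second entry to your models: list with the autocomplete role (in config.yaml, autocomplete is assigned by role rather than with the older tabAutocompleteModel key from config.json). Note that this tag is the same 3.8B Phi-3.5 model at Q4_K_M quantization, not a smaller model:

YAML

  - name: Phi-3.5 Mini
    provider: ollama
    model: phi3.5:3.8b-mini-instruct-q4_K_M
    roles:
      - autocomplete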

3. Important Setup Step

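Before VS Code can use these, you must download them via Ollama in your Kali terminal. Pull the main model, the autocomplete tag, and the optimized embedding model (the embedding model lets the AI "read" your whole folder/codebase using the @codebase command):
Bash

```
ollama pull phi3.5
ollama pull phi3.5:3.8b-mini-instruct-q4_K_M
ollama pull nomic-embed-text
```

To use nomic-embed-text for @codebase, add it to your models: list with roles: [embed].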

4. How to use it for "Large Projects"

Once configured, you can use these special commands in the Continue chat sidebar:

@codebase: Type this before your question (e.g., “@codebase where is the login logic handled?”). The AI will search all your local files.
@file: Type this to point the AI to a specific file you want it to edit or explain.
@docs: Use this to let the AI read official documentation URLs while it writes your code.

Would you like me to provide a specialized "Rules" block for your config? This will force the AI to always follow specific coding standards (like "always use Python 3.13" or "always include unit tests").

Help: is my config.yaml file correct?

Gemini said

To determine if your config.yaml is correct for running Phi-3.5 via Ollama on Kali Linux, you should verify its structure against the following standard requirements.

1. Essential YAML Structure

Ensure your file includes these top-level fields, which are required for the configuration to load:

name: A display name for your configuration (e.g., Local Assistant).
version: Typically set to 1.0.0.
schema: Must be set to v1 to ensure compatibility with the modern YAML format.

2. Model Configuration for Phi-3.5

The models: section must be correctly formatted as a list (using hyphens) with the following specific properties for Ollama:

provider: Must be ollama.
model: Use the exact tag you downloaded (e.g., phi3.5:latest).
apiBase: For local Kali setups, this is usually http://localhost:11434.
roles: To enable all features, include chat, edit, and apply.
capabilities: Add - tool_use to enable "Agent Mode", but only if the model actually supports tool calling in Ollama; otherwise requests will fail.

3. Common "Gotchas" to Check

Indentation: YAML is strictly indentation-sensitive. Indent with spaces only and keep each level consistent (2 spaces per level is the usual convention); never use tabs, as they will cause the file to fail.
Spaces after Colons: Ensure there is a space after every colon (e.g., model: phi3.5, not model:phi3.5).
Model Availability: Your configuration will only work if the model is already downloaded. Open your Kali terminal and run ollama list to verify phi3.5 is present.

Troubleshooting

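If your models disappear from the UI after saving the file, there is likely a syntax error. You can validate your YAML with PyYAML by running this command in your terminal (Python's open() does not expand ~, so the path goes through os.path.expanduser): python3 -c "import os, yaml; yaml.safe_load(open(os.path.expanduser('~/.continue/config.yaml')))".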

Would you like me to analyze a specific snippet of your file to see if it matches these rules?

yes

Gemini said

To verify if your config.yaml is correct for 2026, check it against this standardized "Golden Template."

In the latest versions of Continue, the YAML structure has become more strict regarding schema and roles. If your models aren't showing up in the VS Code sidebar, it is usually because one of these fields is missing or the indentation is off.

The "Golden Template" for Phi-3.5 on Kali

Compare your file to this. Every space matters:

YAML

# Top-level required fields
name: Local Phi Setup
version: 1.0.0
schema: v1

models:
  - name: Phi-3.5 Local
    provider: ollama
    model: phi3.5:latest
    apiBase: http://localhost:11434
    # Roles tell VS Code where this model can be used
    roles:
      - chat
      - edit
      - apply
    # Capabilities enable 'Agent Mode' (needed for large projects)
    capabilities:
      - tool_use
    # Optional performance settings
    defaultCompletionOptions:
      temperature: 0.1
      contextLength: 4096

  # Recommended: a separate entry for instant typing/autocomplete
  - name: Phi-3.5 Autocomplete
    provider: ollama
    model: phi3.5:3.8b-mini-instruct-q4_K_M
    roles:
      - autocomplete

3-Point Checklist for Success

The "No-Tabs" Rule: YAML does not allow tab characters for indentation, so a single "Tab" will make the file fail to parse. Ensure your editor is using spaces (2 per level is the convention). You can check this in the bottom-right corner of VS Code (it should say Spaces: 2).
The "Role" Requirement: In 2026, if you don't list roles: [chat, edit], the model might appear in the chat box but refuse to help when you try to highlight code and press Ctrl + I.
The "Ollama Serve" Check: Before opening VS Code, make sure the Ollama engine is actually awake. Run this in your Kali terminal:
Bash

```
ollama serve
```
(If it says "address already in use," it's already running and you're good to go).
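If you prefer a scripted check over reading terminal messages, here is a small sketch that asks Ollama's HTTP API which models are installed. It assumes the default apiBase from the config (http://localhost:11434) and uses Ollama's /api/tags endpoint, which lists local models; list_ollama_models is a hypothetical helper that returns None when the server isn't reachable.

Python

```python
import json
import urllib.request

def list_ollama_models(api_base: str = "http://localhost:11434", timeout: float = 2.0):
    """Return installed model names from Ollama's /api/tags, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{api_base}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
    except OSError:  # covers URLError, refused connections and timeouts
        return None
    return [m["name"] for m in data.get("models", [])]

if __name__ == "__main__":
    models = list_ollama_models()
    if models is None:
        print("Ollama is not reachable; start it with: ollama serve")
    else:
        print("Installed models:", ", ".join(models) or "(none)")
        print("phi3.5 present:", any(name.startswith("phi3.5") for name in models))
```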

How to test it immediately

Once you save the file:

Look at the Continue Sidebar in VS Code.
Click the model dropdown (at the bottom of the chat window).
If you see "Phi-3.5 Local" with a green dot or no error message, it is perfectly configured.

If it's still not showing up, could you paste the first 10 lines of your file here? I can spot the specific syntax error for you.

help

Gemini said

The config.yaml file you are building for Continue is the modern standard for 2026. To ensure it works perfectly for your local Phi-3.5 setup on Kali Linux, it must follow a specific structure defined by the v1 schema.

Mandatory Configuration Fields

Your file must begin with these top-level properties to be valid:

name: A unique name for your project or setup (e.g., Kali Local AI).
version: The configuration version, typically 1.0.0.
schema: Must be set to v1 to use the YAML format correctly.

Model Settings for Phi-3.5

In the models section, you define the AI's properties. For an Ollama-powered model on your local machine, use this structure:

provider: Must be ollama.
model: Set this to the exact tag used in Ollama (e.g., phi3.5:latest).
roles: List the tasks this model can perform. Common roles include chat, edit, apply, and summarize.
capabilities: Add - tool_use if you want the model to be used in Agent Mode; the model itself must support tool calling for this to work.

Adding Autocomplete Support

To enable "ghost text" suggestions while you type, you can define a separate model specifically for the autocomplete role:

Add a model to your list with the role autocomplete.
Pro-Tip: A quantized tag like phi3.5:3.8b-mini-instruct-q4_K_M keeps suggestions responsive. Note that it is the same 3.8B model as phi3.5:latest with a different quantization, not a smaller model.

Verifying and Troubleshooting

Indentation: YAML requires strict indentation using spaces, not tabs.
Auto-Detection: If you are unsure of your local model names, you can set model: AUTODETECT for the ollama provider in your config, and Continue will list the models installed in your local Ollama.
Reloading: If changes do not appear in VS Code, open the Command Palette (Ctrl+Shift+P) and select "Developer: Reload Window" to refresh the extension.

Would you like me to look at your actual config.yaml code to check for any specific indentation or syntax errors?

yes

Gemini said

To confirm if your config.yaml is correct, check it against the mandatory structural rules for 2026. If any of these are missing or indented with tabs instead of spaces, the Continue extension will either fail to load your models or show a "Failed to parse config" error.

1. The "Golden" Syntax Check

Ensure your file follows this exact pattern. Pay close attention to the v1 schema, which is required for all new YAML-based configurations.

YAML

# Top-level required metadata
name: Local Phi Setup
version: 1.0.0
schema: v1

models:
  - name: Phi-3.5 Local
    provider: ollama
    model: phi3.5:latest
    apiBase: http://localhost:11434
    roles:
      - chat
      - edit
      - apply
    capabilities:
      - tool_use
    # Settings for this specific model
    defaultCompletionOptions:
      temperature: 0.1
      contextLength: 4096

2. Common Errors to Look For

Indentation: YAML is strictly sensitive to spacing. Use spaces only and keep each nesting level consistent (2 spaces is the convention). Never use tabs; if you do, the file will be invalid.
Missing Roles: In the current version, if you do not explicitly list roles: [chat, edit], the model may not appear in the "Edit" or "Refactor" menus.
Schema Version: If you omit schema: v1, the extension may try to parse it as an older format and fail.

3. How to Verify via Terminal

You can quickly check if your YAML is technically valid by running this command in your Kali terminal:

Bash

python3 -c "import yaml; yaml.safe_load(open('/home/kali/.continue/config.yaml'))" && echo "✅ YAML is Valid"

If this command returns an error, it will tell you the exact line number where your syntax is broken.
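That one-liner only proves the file is syntactically valid YAML. PyYAML silently keeps the last value when a key repeats, so a file with two models: sections passes it even though Continue rejects it with "Map keys must be unique". A stricter check can be sketched as a custom PyYAML loader; UniqueKeyLoader and check_config are hypothetical helpers, not part of Continue, and YAML merge keys (<<) are left to the normal constructor.

Python

```python
import os
import sys

import yaml

class UniqueKeyLoader(yaml.SafeLoader):
    """SafeLoader that rejects duplicate mapping keys instead of silently keeping the last one."""

def _construct_unique_mapping(loader, node, deep=False):
    seen = set()
    for key_node, _value_node in node.value:
        if key_node.tag == "tag:yaml.org,2002:merge":
            continue  # leave "<<" merge keys to the standard constructor
        key = loader.construct_object(key_node, deep=deep)
        if key in seen:
            raise yaml.constructor.ConstructorError(
                None, None, f"duplicate key {key!r}", key_node.start_mark)
        seen.add(key)
    return loader.construct_mapping(node, deep=deep)

UniqueKeyLoader.add_constructor(
    yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, _construct_unique_mapping)

def check_config(path: str) -> None:
    with open(path, encoding="utf-8") as f:
        yaml.load(f, Loader=UniqueKeyLoader)

if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else os.path.expanduser("~/.continue/config.yaml")
    try:
        check_config(target)
        print("✅ No syntax errors or duplicate keys")
    except (OSError, yaml.YAMLError) as err:
        print(f"❌ {err}")
```

Saved as, say, check_config.py, running python3 check_config.py reports the line of a repeated key such as a second models: section.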

4. Troubleshooting Models Not Appearing

If the YAML is valid but the model still doesn't show up in VS Code:

Check Ollama: Run ollama list to ensure phi3.5:latest is actually downloaded.
Toggle Developer Tools: In VS Code, press Ctrl+Shift+P, type "Toggle Developer Tools", and check the Console tab for red error messages related to "Failed to load config".
Reload: Use Ctrl+Shift+P and run "Developer: Reload Window" to force the extension to re-read your file.

If you paste your actual code here, I can identify the specific line that's causing trouble. Would you like me to do that?

(kali㉿kali)-[~]

└─$ python3 -c "import yaml; yaml.safe_load(open('/home/kali/.continue/config.yaml'))" && echo "✅ YAML is Valid"

✅ YAML is Valid

Gemini said

Great news! If the terminal says "✅ YAML is Valid", PyYAML could parse your file, so your indentation, colons, and structure are syntactically correct. (PyYAML does not flag duplicate keys, though; it silently keeps the last one.)

Since the file is technically correct, the final step is to ensure the Continue extension and Ollama are talking to each other.

Final Verification Steps

Is Ollama running? In your terminal, run:
Bash

```
ollama list
```
Confirm that phi3.5 appears in that list. If the list is empty, run ollama pull phi3.5 first.
Check the VS Code UI
- Open VS Code.
- Open the Continue sidebar (on the left).
- Look at the model dropdown at the bottom of the chat window.
- You should now see "Phi-3.5 Local" as an option. Select it.
The "Acid Test" (Coding Task): To see if your "Large Project" setup is working, try this command in the Continue chat:
@codebase How does this project work?
If it starts indexing your files, you are officially running a local, free, and private AI development environment on Kali!

One Last Recommendation: The "Rules"

Since you want to build large projects, you can add a rules: list to your config.yaml (a top-level section alongside models:) so the AI always writes code to your standards.

Would you like me to show you how to add a "Rules" section to your config so the AI always follows best security practices (crucial for Kali users)?

Error handling model response

There was an error handling the response from Phi-3.5 Local.

Please try to submit your message again, and if the error persists, let us know by reporting the issue using the buttons below.


"registry.ollama.ai/library/phi3.5:latest does not support tools"

Gemini said

This error comes from Ollama: the standard Phi-3.5 model in the Ollama library does not support "Tool Calling" (the ability for the AI to use search, index files, or run functions). registry.ollama.ai/library/phi3.5:latest is simply the model's full name. Because your config lists capabilities: [tool_use], Continue sends tool definitions with its requests, and Ollama rejects them for this model.

To fix this and keep building large projects on Kali, you have two choices: The Fallback Method or The Pro Swap.

Option 1: The Fallback Method (Keep Phi-3.5)

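You can remove the tool_use capability so Continue stops requesting native tool calls from Phi-3.5. Continue also offers a "System Message Tools" fallback, where the extension explains the tools to the AI in the chat instructions rather than relying on the model's built-in support; with a small model, results can be hit-or-miss.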

Update your config.yaml to this:

YAML

models:
  - name: Phi-3.5 Local
    provider: ollama
    model: phi3.5:latest
    apiBase: http://localhost:11434
    roles:
      - chat
      - edit
      - apply
    # REMOVE 'capabilities: [tool_use]'
    # Chat, edit and apply still work; tool-based agent features may be unavailable for this model

Option 2: The Pro Swap (Best for Large Projects)

If you want to build large projects, you need a model that supports tool calling. Llama 3.1 8B and Qwen 2.5 Coder are popular choices for local coding agents because their Ollama versions support tools natively.

Download a "Tool-Ready" model in your terminal:
Bash

ollama pull qwen2.5-coder:7b

Then add it to your config.yaml:

YAML

models:
  - name: Code Expert (Qwen)
    provider: ollama
    model: qwen2.5-coder:7b
    apiBase: http://localhost:11434
    roles:
      - chat
      - edit
      - apply
    capabilities:
      - tool_use  # This will now work perfectly!

Why this happened:

Phi-3.5: Great for reasoning and short summaries, but its Ollama build doesn't support tool calling, so it can't "talk" to external tools like your file system.
Qwen 2.5 Coder: Trained to use tools and manage multi-file projects. The 7B download is roughly 4.7 GB, so a machine with 8 GB or more of RAM should handle it, which makes it the better choice for "Building Large Projects."

What to do next:

1. Try Option 1 first to see if the error disappears.
2. If the AI still feels "limited" when you try to use @codebase, switch to Option 2.

Would you like me to help you verify if your PC has enough VRAM to run the 7B "Code Expert" model smoothly?

"Local Config

Failed to parse config: Map keys must be unique at line 32, column 1: models: ^ Error: Failed to parse config: Map keys must be unique at line 32, column 1: models: ^ at parseConfigYaml (/home/kali/.vscode/extensions/continue.continue-1.2.14-linux-x64/out/extension.js:261639:13) at parseYamlOrMarkdownRule (/home/kali/.vscode/extensions/continue.continue-1.2.14-linux-x64/out/extension.js:262034:18) at parseMarkdownRuleOrConfigYaml (/home/kali/.vscode/extensions/continue.continue-1.2.14-linux-x64/out/extension.js:262029:10) at unrollAssistantFromContent (/home/kali/.vscode/extensions/continue.continue-1.2.14-linux-x64/out/e"

Config.yaml file content:

# Top-level required fields
name: Local Phi Setup
version: 1.0.0
schema: v1

models:
  - name: Phi-3.5 Local
    provider: ollama
    model: phi3.5:latest
    apiBase: http://localhost:11434
    # Roles tell VS Code where this model can be used
    roles:
      - chat
      - edit
      - apply
    # Capabilities enable 'Agent Mode' (needed for large projects)
    capabilities:
      - tool_use
    # Optional performance settings
    completionOptions:
      temperature: 0.1
      contextLength: 4096

# Recommended: Small model for instant typing/autocomplete
tabAutocompleteModel:
  name: Phi-3.5 Autocomplete
  provider: ollama
  model: phi3.5:3.8b-mini-instruct-q4_K_M
  roles:
    - autocomplete

models:
  - name: Code Expert (Qwen)
    provider: ollama
    model: qwen2.5-coder:7b
    apiBase: http://localhost:11434
    roles:
      - chat
      - edit
      - apply
    capabilities:
      - tool_use # This will now work perfectly!

Gemini said

The error "Map keys must be unique" is happening because you have the word models: written twice in your file (once on line 6 and again on line 32, exactly where the error points). In YAML, you can only define the models: category once; all your different AI models must be listed under that single heading.

By having two models: sections, the Continue extension gets confused and crashes.

The Fix: Merged & Corrected Code

I have merged both Phi-3.5 and Qwen (Code Expert) into one single list under one models: header. I also removed the tool_use capability from Phi-3.5 to prevent the "does not support tools" error you saw earlier, while keeping it for Qwen.

Replace the entire content of your config.yaml with this:

YAML

# Top-level required fields
name: Local AI Setup
version: 1.0.0
schema: v1

models:
  - name: Phi-3.5 Local
    provider: ollama
    model: phi3.5:latest
    apiBase: http://localhost:11434
    roles:
      - chat
      - edit
      - apply
    # No tool_use here: phi3.5 in Ollama does not support tools
    defaultCompletionOptions:
      temperature: 0.1
      contextLength: 4096

  - name: Code Expert (Qwen)
    provider: ollama
    model: qwen2.5-coder:7b
    apiBase: http://localhost:11434
    roles:
      - chat
      - edit
      - apply
    capabilities:
      - tool_use  # qwen2.5-coder supports tool calling in Ollama

  # Autocomplete is another entry in the same list, with the autocomplete role
  - name: Phi-3.5 Autocomplete
    provider: ollama
    model: phi3.5:3.8b-mini-instruct-q4_K_M
    apiBase: http://localhost:11434
    roles:
      - autocomplete

Why this works now:

Unique Keys: There is now only one models: section.
Clean Structure: Each model starts with a - name:, which tells YAML these are items in the same list.
Conflict Resolved: By keeping both, you can use Phi-3.5 for quick questions and switch to Code Expert (Qwen) when you need to use @codebase or build large projects.

How to use it in VS Code:

Save the file.
In the Continue sidebar, click the model dropdown at the bottom.
You will now see both choices. Select Code Expert (Qwen) for your large project tasks.

Would you like me to show you how to set up a "Custom Command" so you can generate a whole project structure with one click?

You said

yes

Gemini said

To build large projects with one click, we will use Slash Commands. These are custom shortcuts that tell the AI: "Use this specific expert persona and follow these structural rules for every file you generate."

1. Update your `config.yaml`

Add this prompts section to the bottom of your file (in config.yaml, slash commands live under prompts:; customCommands is the older config.json name). I have designed this command specifically to generate professional, modular project structures.

YAML

# Add this to the bottom of your config.yaml
prompts:
  - name: project
    description: Generate a complete modular project structure
    prompt: |
      You are a Senior Software Architect. The user wants to build: {{{ input }}}
      
      Please provide:
      1. A directory tree structure (folder/file layout).
      2. A 'requirements.txt' or 'package.json' listing all dependencies.
      3. The core logic for the main entry point file.
      
      Use best practices: modularity, error handling, and clear comments. 
      For Kali Linux, ensure any networking code follows standard security protocols.

2. How to use it in VS Code

Once you save the file, the command is ready instantly:

Open the Continue sidebar (Ctrl + L).
Type /project followed by your idea.
- Example: /project a Python tool that scans a network and saves results to a PDF
The AI will now stop giving you "snippet" answers and start giving you a full Architectural Blueprint.

3. The "Power User" Workflow for Large Projects

For a large project, don't try to generate 20 files in one go (the output will overflow the model's context window and quality drops). Use this 3-step sequence:

Step 1 (The Blueprint): Run /project [your idea]. Copy the directory tree it gives you and create those empty folders/files in your Kali terminal.
Step 2 (The Implementation): Open one of the empty files, highlight the file name, and type:
@codebase Write the full implementation for this specific file based on our previous blueprint.
Step 3 (The Refactor): Once the code is written, highlight it all and press Ctrl + I, then type:
add logging and a try/except block for every network call
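Creating the folders and files from Step 1 by hand is tedious for big trees. A minimal Python sketch (not part of the original conversation) that reads a pasted tree and creates the empty layout: it assumes each nesting level is 4 characters wide, as the tree command prints it, and treats a name as a folder if it ends with / or has entries nested under it. The file name scaffold.py and the function names are hypothetical.

```python
import sys
from pathlib import Path

# Branch-drawing characters used by `tree` and most LLM answers, including the
# ASCII variants (|-- and `--) and the no-break space some tree builds emit.
TREE_CHARS = "│├└─|`-+ \t\u00a0"


def parse_tree(text: str, indent: int = 4) -> list[tuple[str, bool]]:
    """Turn a tree-style listing into (relative_path, is_dir) pairs."""
    entries = []  # (depth, name, ends_with_slash)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop trailing "# comments"
        name = line.lstrip(TREE_CHARS)
        if not name:
            continue  # blank or connector-only line
        depth = (len(line) - len(name)) // indent
        entries.append((depth, name.rstrip("/"), name.endswith("/")))

    result = []
    stack: list[str] = []
    for i, (depth, name, ends_with_slash) in enumerate(entries):
        stack = stack[:depth] + [name]
        # A folder either ends with "/" or the next entry is nested under it.
        has_children = i + 1 < len(entries) and entries[i + 1][0] > depth
        result.append(("/".join(stack), ends_with_slash or has_children))
    return result


def create_scaffold(tree_text: str, base: str = ".") -> list[Path]:
    """Create the empty folders/files under `base`; create nothing if any path escapes it."""
    root = Path(base).resolve()
    plan = []
    for rel, is_dir in parse_tree(tree_text):
        target = (root / rel).resolve()
        if target != root and root not in target.parents:
            raise ValueError(f"refusing to create {rel!r} outside {root}")
        plan.append((target, is_dir))

    for target, is_dir in plan:
        if is_dir:
            target.mkdir(parents=True, exist_ok=True)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.touch(exist_ok=True)  # never truncates; only updates the timestamp
    return [target for target, _ in plan]


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python3 scaffold.py blueprint.txt [target_dir]")
    else:
        blueprint = Path(sys.argv[1]).read_text(encoding="utf-8")
        target_dir = sys.argv[2] if len(sys.argv) > 2 else "."
        for path in create_scaffold(blueprint, target_dir):
            print(path)
```

Paste the tree from /project into blueprint.txt and run python3 scaffold.py blueprint.txt. Every path is checked before anything is written, so a model answer containing something like ../ cannot create files outside the target folder.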

4. Critical Pro-Tip for Kali

Since you are on Kali, you likely work with security tools. If you want the AI to act as a security reviewer, add this second entry to the same customCommands list, keeping its two-space indentation:

YAML

  - name: audit
    description: Check code for security vulnerabilities
    prompt: |
      Review the following code for OWASP top 10 vulnerabilities, 
      unsafe buffer handling, or hardcoded credentials. 
      Suggest a secure alternative for every issue found.
      {{{ input }}}
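To make the /audit output concrete, here is a hedged illustration (not from the original conversation) of two issues that prompt targets, a hardcoded credential and shell injection, together with the kind of secure alternative it should suggest. The environment variable name SCANNER_API_KEY and the nmap invocation are hypothetical examples.

```python
import os
import subprocess

# BEFORE: the kind of code /audit should flag.
#   API_KEY = "sk-live-1234"                           # hardcoded credential
#   subprocess.run(f"nmap -sV {target}", shell=True)   # shell injection via `target`
#
# AFTER: secure alternatives, sketched below.


def get_api_key(var: str = "SCANNER_API_KEY") -> str:
    """Read the secret from the environment instead of hardcoding it."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"set the {var} environment variable first")
    return key


def build_nmap_command(target: str) -> list[str]:
    """Build an argument list so no shell ever interprets `target`."""
    if not target or target.startswith("-"):
        # Stops a "target" such as "-oN notes.txt" from being read as an nmap option.
        raise ValueError("target must be a host name or address, not an option")
    return ["nmap", "-sV", target]


def run_nmap(target: str) -> str:
    """Run the scan without a shell and return nmap's text output."""
    result = subprocess.run(
        build_nmap_command(target), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Because the command is a list, input like "10.0.0.1; rm -rf ~" reaches nmap as one literal (and invalid) host name instead of running a second command.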

Would you like me to help you create a Bash script that automatically creates the folders and files based on the AI's blueprint?

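If you still use the older config.json format with a llama-cpp-python server, open config.json in Continue and point the model entry at that local server.

For the Ollama setup, pull Phi-3.5 first:

Bash

ollama pull phi3.5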

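Then pull the larger coding model for multi-file work. Note that qwen2.5-coder:7b is a code-generation model, not an embedding model; the ability of the @codebase command to "read" your whole folder comes from Continue's separate embeddings index.

Bash

ollama pull qwen2.5-coder:7b

Update the model entry in your config.yaml to use qwen2.5-coder:7b.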
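Before pointing config.yaml at these models, it helps to confirm both pulls actually succeeded, which saves debugging inside VS Code later. A minimal sketch, assuming Ollama is serving on its default port 11434: it reads the local model list from Ollama's /api/tags endpoint and reports anything missing. The function names are hypothetical.

```python
import json
import urllib.request

# Ollama's default local endpoint for listing pulled models.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"

# The two models used in this setup.
REQUIRED_MODELS = ["phi3.5", "qwen2.5-coder:7b"]


def installed_models(url: str = OLLAMA_TAGS_URL) -> list[str]:
    """Return model names such as 'phi3.5:latest' from Ollama's /api/tags."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        data = json.load(resp)
    return [model["name"] for model in data.get("models", [])]


def missing_models(required: list[str], installed: list[str]) -> list[str]:
    """List required models that are not installed.

    A name without a tag (e.g. 'phi3.5') also matches its ':latest' tag.
    """
    have = set(installed)
    missing = []
    for name in required:
        candidates = {name} if ":" in name else {name, f"{name}:latest"}
        if not candidates & have:
            missing.append(name)
    return missing


if __name__ == "__main__":
    try:
        gaps = missing_models(REQUIRED_MODELS, installed_models())
    except OSError as exc:  # urllib's URLError is a subclass of OSError
        print(f"Ollama is not reachable on localhost:11434 ({exc}); try `ollama serve`.")
    else:
        if gaps:
            for name in gaps:
                print(f"missing: run `ollama pull {name}`")
        else:
            print("all required models are installed")
```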

Bypassing Restrictions: Replicated Windows-only Phi-Silica features on Kali Linux for $0.

Optimized Performance: Switched to GGUF quantization to save roughly 5 GB of RAM and boost speed.

GUI Development: Built a custom Python-based desktop interface for local AI summarization.

Pro Coding Environment: Configured VS Code with the Continue extension and Ollama to manage large, multi-file software projects.

Agentic Workflows: Added custom slash commands (/project and /audit) to automate architectural blueprints and security audits.
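The roughly 5 GB saving can be sanity-checked with weights-only arithmetic: memory ≈ parameters × bits per weight ÷ 8. A rough sketch, assuming the earlier transformers script held Phi-3.5 Mini's 3.8B parameters in 16-bit precision and that a 4-bit GGUF file averages about 4.5 bits per weight (the exact figure depends on the quant type). KV cache and runtime overhead are ignored, so real RAM use is higher in both cases.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes.

    params_billion * 1e9 weights * bits / 8 bytes, divided by 1e9 bytes per GB.
    """
    return params_billion * bits_per_weight / 8


PHI35_MINI_PARAMS_B = 3.8
FP16_BITS = 16.0
Q4_GGUF_BITS = 4.5  # assumption: typical effective size of a 4-bit GGUF quant

if __name__ == "__main__":
    fp16 = weight_memory_gb(PHI35_MINI_PARAMS_B, FP16_BITS)
    q4 = weight_memory_gb(PHI35_MINI_PARAMS_B, Q4_GGUF_BITS)
    # Prints: FP16: 7.6 GB, 4-bit GGUF: 2.1 GB, saved: 5.5 GB
    print(f"FP16: {fp16:.1f} GB, 4-bit GGUF: {q4:.1f} GB, saved: {fp16 - q4:.1f} GB")
```

Under these assumptions the saving is about 5.5 GB, consistent with the figure above.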

#KaliLinux #LocalAI #Phi35 #QwenAI #Ollama #AICoding #LinuxDev #CyberSecurity #OpenSourceAI #PrivateAI #VSCode #PythonCoding 

Saturday, February 21, 2026

Local AI Mastery: Running Phi-3.5 and Qwen 2.5 on Kali Linux for Offline Code Generation

1. The Recommended Tool: Ollama

2. Implementation via Python (Developer Approach)

3. Open Source Alternatives for "Windows-Only" Features

Hardware Note for Kali Linux

The Fix: Setup and Complete Code

1. Install the Virtual Environment tool

2. Create and Activate the Environment

3. Install the Required Libraries

4. The Complete Final Code

How to run it now:

The Fix

The Complete Corrected Code

Why this works:

The Fix

The Complete Final Code

Pro-Tip for Kali Performance:

The High-Speed (Quantized) Implementation

1. Install the Faster Library

2. The Complete Final Code (GGUF Version)

Key Improvements in this Version:

Comparison of your current setup vs. Phi-Silica

Gemini said

1. Prerequisites

2. The Complete Final Code (Phi-Silica-GUI.py)

3. How to Run

Why this is better than the Windows API:

The Complete Final Code (Phi-Silica-GUI.py)

What’s New in This Version:

1. Integration: The "Local Copilot" Setup

2. Strategy for "Large Projects"

3. Code Generation Script

4. Advanced: RAG (Retrieval Augmented Generation)

Summary of Tools for Kali

Option 1: The Command Palette (Fastest)

Option 2: The Sidebar (Visual)

How to add your Local Phi-3.5

Why we are using Ollama here:

Complete Final Checklist for Projects:

Which configuration format should you choose?

Summary for your specific goal:

1. The Configuration Block

2. Pro-Tip: Setting up Autocomplete

3. Important Setup Step

4. How to use it for "Large Projects"

1. Essential YAML Structure

2. Model Configuration for Phi-3.5

3. Common "Gotchas" to Check

Troubleshooting

The "Golden Template" for Phi-3.5 on Kali

3-Point Checklist for Success

How to test it immediately

Mandatory Configuration Fields

Model Settings for Phi-3.5

Adding Autocomplete Support

Verifying and Troubleshooting

1. The "Golden" Syntax Check

2. Common Errors to Look For

3. How to Verify via Terminal

4. Troubleshooting Models Not Appearing

Final Verification Steps

One Last Recommendation: The "Rules"

Gemini said

Option 1: The Fallback Method (Keep Phi-3.5)

Option 2: The Pro Swap (Best for Large Projects)

Gemini said

The Fix: Merged & Corrected Code

Why this works now:

How to use it in VS Code:

1. Update your config.yaml

2. How to use it in VS Code

3. The "Power User" Workflow for Large Projects

4. Critical Pro-Tip for Kali
