Running Phi-Silica Alternatives on Linux (Complete Gemini 3 Conversation)
"https://learn.microsoft.com/en-us/windows/ai/apis/phi-silica" How can I implement these in Kali Linux PC I don't have any paid subscriptions, I cannot afford any now, so can these steps be done with free or open source alternatives?
Gemini said
Implementing the specific Phi Silica APIs directly on Kali Linux is not possible because those APIs are built into the Windows App SDK and are hard-coded to utilize the Windows Copilot Runtime and specific Windows-compatible NPUs (like the Snapdragon X series).
However, Phi-Silica belongs to Microsoft's Phi family, and its sibling Phi-3.5 is released with open weights. You can get comparable functionality (local, offline, free AI) on Kali Linux using open-source tools that run an open Phi model.
Here is how you can implement this for free on your Kali Linux PC.
1. The Recommended Tool: Ollama
Ollama is one of the most widely used tools for running "Phi" models on Linux. It is free, open-source, and manages the hardware acceleration for you.
Installation: Open your Kali terminal and run:
curl -fsSL https://ollama.com/install.sh | sh
Running the Model: Phi-Silica is a ~3.3B parameter model. The closest open-weights equivalent you can run is Phi-3.5 Mini (3.8B), a slightly larger model from the same family.
ollama run phi3.5
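Once Ollama is running, other programs can talk to it over its local HTTP endpoint instead of the interactive prompt. The sketch below assumes the default address (http://localhost:11434) and the non-streaming /api/generate endpoint; the helper names build_request and ask_ollama are made up for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def build_request(prompt, model="phi3.5"):
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(prompt, model="phi3.5", timeout=300):
    """Send the prompt to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]
```

With the server running and the model pulled, print(ask_ollama("Summarize: Kali Linux is a Debian-based distribution.")) should print the model's answer. Only the Python standard library is needed, so no pip install is required for this route.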
2. Implementation via Python (Developer Approach)
If you want to build an application similar to what the Microsoft Learn page describes (using APIs), you should use the transformers library or llama-cpp-python. This is the Linux "API" equivalent.
Step 1: Install Dependencies
pip install torch transformers accelerate
Step 2: Python Code (The "Linux API" version)
Since you requested the complete final code for corrections/implementations, here is a script that replicates the "Summarize" and "Chat" functionality mentioned in the Phi-Silica documentation:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# 1. Setup Model (Using Phi-3.5-mini-instruct, the open sibling of Silica)
model_id = "microsoft/Phi-3.5-mini-instruct"
print("Loading model... This may take a moment.")
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True
)

# 2. Function to replicate "Text Intelligence Skills" (Summarization)
def summarize_text(input_text):
    messages = [
        {"role": "system", "content": "You are a helpful assistant that summarizes text concisely."},
        {"role": "user", "content": f"Summarize the following: {input_text}"},
    ]
    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
    generation_args = {
        "max_new_tokens": 150,
        "return_full_text": False,
        "temperature": 0.0,
        "do_sample": False,
    }
    output = pipe(messages, **generation_args)
    return output[0]['generated_text']

# 3. Example Usage
sample_text = """
The Phi Silica APIs are part of the Windows App SDK. They allow developers to
access high-performance local language models on Copilot+ PCs using NPUs.
"""
print("\n--- Summary ---")
print(summarize_text(sample_text))
3. Open Source Alternatives for "Windows-Only" Features
The link you provided highlights three "Text Intelligence Skills." Here is how to do them for free on Kali:
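As a minimal sketch, assuming the three skills are summarize, rewrite, and text-to-table (worth checking against the linked page), each one can be approximated with a fixed system prompt. The prompt wording and the build_messages helper are illustrative assumptions, not part of any Microsoft API.

```python
# Hypothetical system prompts, one per skill (wording is an assumption).
SKILL_PROMPTS = {
    "summarize": "Summarize the user's text concisely.",
    "rewrite": "Rewrite the user's text so it is clearer, keeping its meaning.",
    "text_to_table": "Convert the user's text into a Markdown table.",
}


def build_messages(skill, text):
    """Return a chat-style message list for the requested skill."""
    if skill not in SKILL_PROMPTS:
        raise ValueError(f"Unknown skill: {skill!r}")
    return [
        {"role": "system", "content": SKILL_PROMPTS[skill]},
        {"role": "user", "content": text},
    ]
```

The returned list has the same role/content shape that the chat examples in this conversation pass to the model, so switching skills only means switching the key.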
Hardware Note for Kali Linux
NPU vs GPU: Microsoft Phi-Silica is tuned for NPUs. On Kali, your best performance will come from your GPU (NVIDIA/AMD) or CPU (using AVX2 instructions).
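Memory: A quantized Phi-3.5 Mini is small (approx. 2.2GB - 4GB depending on the quantization level), so it runs on almost any modern PC with 8GB+ RAM without needing a paid subscription or expensive hardware. The unquantized 16-bit weights are considerably larger and need correspondingly more memory.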
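The size figures follow from simple arithmetic: parameter count times bits per weight. Here is a rough estimate, assuming 3.8B parameters and treating about 4.5 bits per weight as a stand-in for a typical 4-bit quantization (real files also carry some overhead, so actual sizes differ a little).

```python
def weight_size_gb(n_params, bits_per_weight):
    """Approximate size of the model weights in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


PHI35_MINI_PARAMS = 3.8e9  # parameter count quoted earlier in this answer

print(f"16-bit weights:    {weight_size_gb(PHI35_MINI_PARAMS, 16):.1f} GB")
print(f"~4.5-bit weights:  {weight_size_gb(PHI35_MINI_PARAMS, 4.5):.1f} GB")
```

Runtime memory is higher than the weight size alone, because the context cache and the Python process itself also need room.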
┌──(kali㉿kali)-[~]
└─$ python Phi-Silica-Gemini.py
Traceback (most recent call last):
File "/home/kali/Phi-Silica-Gemini.py", line 1, in <module>
import torch
ModuleNotFoundError: No module named 'torch'
Gemini said
It looks like you're running into a classic Python environment issue. On Kali Linux, Python is pre-installed, but library packages like torch (PyTorch) are not.
Additionally, modern Debian-based systems (like Kali) often prevent you from using pip install globally to avoid breaking system tools. The best way to fix this is to use a Virtual Environment.
The Fix: Setup and Complete Code
Follow these steps in your terminal to get the environment ready and run the script properly.
1. Install the Virtual Environment tool
sudo apt update
sudo apt install python3-venv -y
2. Create and Activate the Environment
This keeps your AI libraries isolated so they don't mess with Kali's system tools.
python3 -m venv ai_env
source ai_env/bin/activate
(You will see (ai_env) appear before your prompt).
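If you are unsure whether the environment is really active, Python itself can tell you. This small check relies on the fact that inside a venv sys.prefix differs from sys.base_prefix; the function name in_virtualenv is just a label chosen for this example.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a virtual environment."""
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtualenv())
print("Interpreter in use:", sys.executable)
```

If the first line prints False, run source ai_env/bin/activate again before installing anything.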
3. Install the Required Libraries
Now that the environment is active, install the necessary packages:
pip install torch transformers accelerate sentencepiece
4. The Complete Final Code
Here is the updated, robust version of the script. I have added a check for CUDA (NVIDIA GPU) support; if you have an NVIDIA card, it will run significantly faster. If not, it will default to your CPU.
File: Phi-Silica-Gemini.py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# 1. Configuration and Model Selection
# Using Phi-3.5-mini-instruct (The open-source sibling to Silica)
model_id = "microsoft/Phi-3.5-mini-instruct"
print("Checking hardware...")
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
print(f"Loading model '{model_id}'... This may take a few minutes on first run.")

# 2. Initialize Model and Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True
)

# 3. Define the Intelligence Skill (Summarization)
def run_phi_task(user_input, system_prompt="You are a helpful AI assistant."):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]
    # Setting up the pipeline
    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
    generation_args = {
        "max_new_tokens": 500,
        "return_full_text": False,
        "temperature": 0.7,
        "do_sample": True,
    }
    print("\n--- Processing ---")
    output = pipe(messages, **generation_args)
    return output[0]['generated_text']

# 4. Implementation Example
if __name__ == "__main__":
    # Example task: Replicating Phi-Silica's Summarization Skill
    context_text = """
    The user wants to run local AI on Kali Linux without paid subscriptions.
    By using Phi-3.5 and the Hugging Face transformers library, they can
    achieve similar results to the Windows-only Phi-Silica API.
    """
    prompt = f"Summarize the following text into a single sentence:\n{context_text}"
    result = run_phi_task(prompt)
    print("\n[AI RESPONSE]:")
    print(result)
How to run it now:
Once you have saved the code and your ai_env is active:
python Phi-Silica-Gemini.py
Note: The first time you run this, it will download several gigabytes of model weights from Hugging Face. Ensure you have a stable internet connection and about 10GB of free disk space.
Would you like me to show you how to turn this into a local web-based API so other tools on your Kali machine can use it?
┌──(ai_env)─(kali㉿kali)-[~]
└─$ python Phi-Silica-Gemini1.py
Checking hardware...
Using device: cpu
Loading model 'microsoft/Phi-3.5-mini-instruct'... This may take a few minutes on first run.
config.json: 3.45kB [00:00, 3.07MB/s]
This model config has set a `rope_parameters['original_max_position_embeddings']` field, to be used together with `max_position_embeddings` to determine a scaling factor. Please set the `factor` field of `rope_parameters`with this ratio instead -- we recommend the use of this field over `original_max_position_embeddings`, as it is compatible with most model architectures.
tokenizer_config.json: 3.98kB [00:00, 3.02MB/s]
tokenizer.json: 1.84MB [00:00, 20.1MB/s]
tokenizer.model: 100%|████████████████████████| 500k/500k [00:01<00:00, 359kB/s]
added_tokens.json: 100%|████████████████████████| 306/306 [00:00<00:00, 475kB/s]
special_tokens_map.json: 100%|█████████████████| 665/665 [00:00<00:00, 1.02MB/s]
configuration_phi3.py: 11.2kB [00:00, 18.0MB/s]
A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3.5-mini-instruct:
- configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
modeling_phi3.py: 73.8kB [00:00, 28.1MB/s]
A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3.5-mini-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
Current `flash-attention` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.
model.safetensors.index.json: 16.3kB [00:00, 56.7MB/s]
Fetching 2 files: 100%|██████████████████████████| 2/2 [10:05<00:00, 302.85s/it]
Download complete: 100%|███████████████████| 7.64G/7.64G [10:05<00:00, 12.6MB/s]
Loading weights: 100%|█| 195/195 [00:23<00:00, 8.35it/s, Materializing param=mo
generation_config.json: 100%|████████████████████| 195/195 [00:00<00:00, 862B/s]
--- Processing ---
Passing `generation_config` together with generation-related arguments=({'max_new_tokens', 'temperature', 'do_sample'}) is deprecated and will be removed in future versions. Please pass either a `generation_config` object OR all generation parameters explicitly, but not both.
Both `max_new_tokens` (=500) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Traceback (most recent call last):
File "/home/kali/Phi-Silica-Gemini1.py", line 55, in <module>
result = run_phi_task(prompt)
File "/home/kali/Phi-Silica-Gemini1.py", line 41, in run_phi_task
output = pipe(messages, **generation_args)
File "/home/kali/ai_env/lib/python3.13/site-packages/transformers/pipelines/text_generation.py", line 293, in __call__
return super().__call__(text_inputs, **kwargs)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kali/ai_env/lib/python3.13/site-packages/transformers/pipelines/base.py", line 1274, in __call__
return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kali/ai_env/lib/python3.13/site-packages/transformers/pipelines/base.py", line 1281, in run_single
model_outputs = self.forward(model_inputs, **forward_params)
File "/home/kali/ai_env/lib/python3.13/site-packages/transformers/pipelines/base.py", line 1173, in forward
model_outputs = self._forward(model_inputs, **forward_params)
File "/home/kali/ai_env/lib/python3.13/site-packages/transformers/pipelines/text_generation.py", line 397, in _forward
output = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
File "/home/kali/ai_env/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
return func(*args, **kwargs)
File "/home/kali/ai_env/lib/python3.13/site-packages/transformers/generation/utils.py", line 2638, in generate
result = decoding_method(
self,
...<5 lines>...
**model_kwargs,
)
File "/home/kali/ai_env/lib/python3.13/site-packages/transformers/generation/utils.py", line 2833, in _sample
outputs = self._prefill(input_ids, generation_config, model_kwargs)
File "/home/kali/ai_env/lib/python3.13/site-packages/transformers/generation/utils.py", line 3821, in _prefill
model_inputs = self.prepare_inputs_for_generation(input_ids, is_first_iteration=True, **model_kwargs)
File "/home/kali/.cache/huggingface/modules/transformers_modules/microsoft/Phi_hyphen_3_dot_5_hyphen_mini_hyphen_instruct/2fe192450127e6a83f7441aef6e3ca586c338b77/modeling_phi3.py", line 1298, in prepare_inputs_for_generation
past_length = past_key_values.seen_tokens
^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'DynamicCache' object has no attribute 'seen_tokens'
Gemini said
This error occurs because of a version mismatch between the very latest transformers library and the custom model code provided by Microsoft for Phi-3.5. Specifically, the model script is looking for an attribute (seen_tokens) that was recently renamed in the library.
To fix this, we need to:
Force the use of the standard implementation (avoiding the buggy custom script).
Clear the cache to ensure the fix takes effect.
Clean up the generation parameters to remove those "deprecated" warnings you saw.
The Fix
Run this command in your terminal first to ensure you have the einops library, which Phi often requires for tensor operations:
pip install einops
The Complete Corrected Code
Replace everything in Phi-Silica-Gemini1.py with this version. I have added attn_implementation="eager" and trust_remote_code=True logic to bypass the seen_tokens error.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# 1. Configuration
model_id = "microsoft/Phi-3.5-mini-instruct"
print("Checking hardware...")
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
print(f"Loading model '{model_id}'...")

# 2. Initialize Model and Tokenizer
# We use attn_implementation="eager" to bypass the 'DynamicCache' error
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
    attn_implementation="eager"
)

# 3. Define the Intelligence Skill
def run_phi_task(user_input, system_prompt="You are a helpful AI assistant."):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]
    # We build the prompt manually to avoid pipeline/config conflicts
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    print("\n--- Processing (Thinking...) ---")
    # Optimized generation settings
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )
    # Decode only the new tokens
    response = tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):], skip_special_tokens=True)
    return response

# 4. Implementation Example
if __name__ == "__main__":
    context_text = """
    The user wants to run local AI on Kali Linux without paid subscriptions.
    By using Phi-3.5 and the Hugging Face transformers library, they can
    achieve similar results to the Windows-only Phi-Silica API.
    """
    prompt = f"Summarize the following text into a single sentence:\n{context_text}"
    try:
        result = run_phi_task(prompt)
        print("\n[AI RESPONSE]:")
        print(result.strip())
    except Exception as e:
        print(f"\n[ERROR]: {e}")
Why this works:
attn_implementation="eager": This tells the library to use the standard PyTorch math instead of the optimized "Flash Attention" or "SDPA" paths that are currently triggering the AttributeError in the Phi-3.5 code.
Manual Generation: By using model.generate instead of the pipeline object, we gain direct control over the inputs and avoid the max_length vs max_new_tokens conflict that caused your warnings.
CPU Optimization: Since you are running on CPU, it will still be slow (it might take 30–60 seconds to "think"). If it feels stuck, just give it a minute!
Would you like me to show you how to optimize the model size (Quantization) so it runs 4x faster on your CPU?
┌──(ai_env)─(kali㉿kali)-[~]
└─$ python Phi-Silica-Gemini2.py
Checking hardware...
Using device: cpu
Loading model 'microsoft/Phi-3.5-mini-instruct'...
This model config has set a `rope_parameters['original_max_position_embeddings']` field, to be used together with `max_position_embeddings` to determine a scaling factor. Please set the `factor` field of `rope_parameters`with this ratio instead -- we recommend the use of this field over `original_max_position_embeddings`, as it is compatible with most model architectures.
`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
Current `flash-attention` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.
Loading weights: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 195/195 [00:18<00:00, 10.38it/s, Materializing param=model.norm.weight]
--- Processing (Thinking...) ---
[ERROR]: 'DynamicCache' object has no attribute 'seen_tokens'
Gemini said
This error is happening because trust_remote_code=True tells Python to download and use Microsoft's custom model script, which hasn't been updated to work with the latest version of the transformers library. In the newer library, seen_tokens was replaced by a more general function.
Since Phi-3.5 is now officially supported directly by the transformers library, you do not need that custom script anymore. By setting trust_remote_code=False, we force the model to use the library's built-in, bug-free implementation.
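To make the mismatch concrete: the custom script reads past_key_values.seen_tokens, while recent cache objects in transformers report their length through a get_seq_length() method (treat that as an assumption to confirm against your installed version). The helper below is a hypothetical illustration of tolerating both styles; the two stand-in classes only mimic the old and new behaviour so the example runs without loading a model.

```python
def cached_length(cache):
    """Return how many tokens a key/value cache holds, for old or new cache objects."""
    if hasattr(cache, "get_seq_length"):
        return cache.get_seq_length()   # method-style API on newer cache objects
    return cache.seen_tokens            # attribute the older custom script expects


class OldStyleCache:
    """Stand-in for a cache that still exposes the seen_tokens attribute."""
    def __init__(self, n):
        self.seen_tokens = n


class NewStyleCache:
    """Stand-in for a cache that only offers get_seq_length()."""
    def __init__(self, n):
        self._n = n

    def get_seq_length(self):
        return self._n


print(cached_length(OldStyleCache(12)), cached_length(NewStyleCache(12)))
```

You do not need to patch anything yourself here. The point is only that code written against the attribute breaks once the attribute is gone, which is why switching to the built-in implementation is the cleaner route.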
The Fix
Clear the faulty local script: Since you already downloaded the custom code, we need to make sure the model uses the official library version instead.
Update the code: We will set trust_remote_code=False.
The Complete Final Code
Replace everything in your file with this version. This is the cleanest and most compatible way to run Phi-3.5 on Kali in 2026.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# 1. Configuration
model_id = "microsoft/Phi-3.5-mini-instruct"
print("Checking hardware...")
# Kali default is usually CPU unless you've installed specific NVIDIA drivers
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
print(f"Loading model '{model_id}'...")

# 2. Initialize Model and Tokenizer
# CRITICAL FIX: trust_remote_code=False avoids the 'seen_tokens' bug
# attn_implementation="eager" ensures it runs on CPU without FlashAttention
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=False,
    attn_implementation="eager"
)

# 3. Define the Execution Logic
def run_phi_task(user_input):
    # Using the official chat template for Phi-3.5
    messages = [
        {"role": "system", "content": "You are a helpful AI assistant that summarizes text."},
        {"role": "user", "content": user_input},
    ]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    print("\n--- AI is Thinking (This may take a minute on CPU) ---")
    # Generate response
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.1,  # Lower temperature for more stable summaries
        do_sample=False,  # Deterministic output
        pad_token_id=tokenizer.eos_token_id
    )
    # Extract only the newly generated text
    generated_text = tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):], skip_special_tokens=True)
    return generated_text

# 4. Main Execution
if __name__ == "__main__":
    text_to_summarize = """
    The Phi Silica APIs are built for Windows, but because the underlying
    model is open-source, Kali Linux users can replicate the functionality
    using the Hugging Face library. This allows for private, local AI
    processing without any subscription costs.
    """
    try:
        result = run_phi_task(f"Summarize this in one sentence: {text_to_summarize}")
        print("\n[SUMMARY]:")
        print(result.strip())
    except Exception as e:
        print(f"\n[CRITICAL ERROR]: {e}")
Pro-Tip for Kali Performance:
Since you are on a "CPU" device:
The script will be slow. It will load the ~7GB model into your RAM.
If your PC has less than 16GB of RAM, this script might crash or trigger the "OOM Killer."
If it feels too slow, I can show you how to use the GGUF/Quantized version, which reduces the model size from 7GB down to 2GB and makes it run much faster on Linux.
Would you like to try the high-speed (Quantized) version instead?
┌──(ai_env)─(kali㉿kali)-[~]
└─$ python Phi-Silica-Gemini3.py
Checking hardware...
Using device: cpu
Loading model 'microsoft/Phi-3.5-mini-instruct'...
This model config has set a `rope_parameters['original_max_position_embeddings']` field, to be used together with `max_position_embeddings` to determine a scaling factor. Please set the `factor` field of `rope_parameters`with this ratio instead -- we recommend the use of this field over `original_max_position_embeddings`, as it is compatible with most model architectures.
Loading weights: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 195/195 [00:15<00:00, 12.74it/s, Materializing param=model.norm.weight]
--- AI is Thinking (This may take a minute on CPU) ---
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
[SUMMARY]:
Phi Silica APIs, originally for Windows, can be replicated on Kali Linux using Hugging Face for private, local AI processing without fees.
Gemini said
Success! Since your code is now running and producing correct summaries, you have successfully replicated the core "Text Intelligence" functionality of the Windows Phi-Silica API on Kali Linux for free.
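One leftover from that run is the notice that the temperature flag may be ignored. It appears because the script passes temperature=0.1 together with do_sample=False, and temperature only has an effect when sampling is enabled. A small helper like the following (the name generation_kwargs is made up for this sketch) keeps the two settings consistent:

```python
def generation_kwargs(max_new_tokens, temperature=None):
    """Build generate() arguments that never mix greedy decoding with a temperature."""
    kwargs = {"max_new_tokens": max_new_tokens}
    if temperature is None or temperature <= 0:
        kwargs["do_sample"] = False          # greedy, deterministic output
    else:
        kwargs["do_sample"] = True           # sampling, so temperature applies
        kwargs["temperature"] = temperature
    return kwargs


print(generation_kwargs(200))
print(generation_kwargs(200, temperature=0.7))
```

In the script, the call would then read model.generate(**inputs, **generation_kwargs(200), pad_token_id=tokenizer.eos_token_id). The summary itself is unaffected either way; this only removes the notice.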
However, as you noticed, running the full 7GB model on a CPU is slow. To make this "snappy" like the Windows version, we can use Quantization (GGUF format). This reduces the model size from 7GB to ~2.2GB, so far less data has to be read from RAM for every generated token, which is usually the bottleneck on a CPU.
The High-Speed (Quantized) Implementation
This version uses llama-cpp-python. It is much faster on Kali Linux because it is written in C++ and optimized specifically for CPUs.
1. Install the Faster Library
Inside your ai_env, run:
pip install llama-cpp-python
2. The Complete Final Code (GGUF Version)
This script will automatically download the lightweight version of Phi-3.5 and run it.
from llama_cpp import Llama

# 1. Configuration - Downloading the Q4_K_M version (Best balance of speed/quality)
print("Loading Optimized Phi-3.5 (GGUF)...")
llm = Llama.from_pretrained(
    repo_id="bartowski/Phi-3.5-mini-instruct-GGUF",
    filename="Phi-3.5-mini-instruct-Q4_K_M.gguf",
    n_ctx=2048,    # Context window
    n_threads=4,   # Adjust based on your CPU cores
    verbose=False
)

def fast_phi_task(user_input):
    print("\n--- AI is Thinking (Optimized) ---")
    # We use the chat completion API which handles the templates for us
    response = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": user_input}
        ],
        max_tokens=200,
        temperature=0.1
    )
    return response["choices"][0]["message"]["content"]

# 3. Execution
if __name__ == "__main__":
    text = "The user has successfully bypassed Windows-only restrictions to run Phi-3.5 on Kali Linux."
    prompt = f"Summarize this: {text}"
    result = fast_phi_task(prompt)
    print("\n[FAST SUMMARY]:")
    print(result.strip())
Key Improvements in this Version:
RAM Usage: Drops from ~8GB down to ~2.5GB.
Speed: You should see tokens appearing significantly faster than the previous script.
No "Seen Tokens" Error: This library doesn't use the buggy Microsoft custom script, so it's much more stable on Linux.
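The n_threads=4 value in the script is only a starting point. As a rough sketch, you could derive it from the machine instead; note that os.cpu_count() reports logical cores, and leaving one core free for the desktop is an assumption of this example rather than a rule.

```python
import os


def pick_thread_count(reserve=1):
    """Suggest a thread count: logical cores minus a small reserve, never below 1."""
    logical_cores = os.cpu_count() or 1   # cpu_count() can return None
    return max(1, logical_cores - reserve)


print("Suggested n_threads:", pick_thread_count())
```

You would then pass n_threads=pick_thread_count() when creating the Llama object. On CPUs with hyper-threading, a value closer to the number of physical cores sometimes works better, so it is worth trying a couple of settings.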
Would you like me to help you wrap this into a Python GUI (a window with a text box) so you don't have to use the terminal every time?
yes
Gemini said
To finalize your local AI implementation, we will build a professional desktop interface using tkinter (which comes pre-installed with Python) and the optimized llama-cpp-python library.
This GUI will provide a dedicated "Input" area, a "Summary" area, and a progress status—replicating the look and feel of a native Windows AI app, but completely free and open source on Kali.
1. Prerequisites
Ensure you have the high-speed library installed in your ai_env:
pip install llama-cpp-python
2. The Complete Final Code (Phi-Silica-GUI.py)
This script uses threading so the GUI doesn't "freeze" while the AI is thinking.
import tkinter as tk
from tkinter import scrolledtext, messagebox
import threading
from llama_cpp import Llama

# --- AI Backend Logic ---
class AIBackend:
    def __init__(self):
        print("Loading AI Model (GGUF)...")
        # Downloads a lightweight 2.2GB version of Phi-3.5
        self.llm = Llama.from_pretrained(
            repo_id="bartowski/Phi-3.5-mini-instruct-GGUF",
            filename="Phi-3.5-mini-instruct-Q4_K_M.gguf",
            n_ctx=2048,
            n_threads=4,  # Adjust to your CPU core count
            verbose=False
        )

    def generate_summary(self, text):
        prompt = f"Summarize the following text into one concise paragraph:\n\n{text}"
        response = self.llm.create_chat_completion(
            messages=[
                {"role": "system", "content": "You are a professional summarization assistant."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=250,
            temperature=0.1
        )
        return response["choices"][0]["message"]["content"]

# --- GUI Application ---
class PhiGuiApp:
    def __init__(self, root):
        self.root = root
        self.root.title("Kali AI - Local Phi-3.5")
        self.root.geometry("700x600")
        self.ai = None  # Loaded on a separate thread to keep GUI responsive
        # UI Elements
        tk.Label(root, text="Paste Text to Summarize:", font=("Arial", 10, "bold")).pack(pady=5)
        self.input_area = scrolledtext.ScrolledText(root, height=10, width=80)
        self.input_area.pack(padx=20, pady=5)
        self.btn_run = tk.Button(root, text="Generate Summary", command=self.start_ai_thread, bg="#2ecc71", fg="white", font=("Arial", 10, "bold"))
        self.btn_run.pack(pady=10)
        self.status_label = tk.Label(root, text="Status: Ready", fg="blue")
        self.status_label.pack()
        tk.Label(root, text="AI Summary Output:", font=("Arial", 10, "bold")).pack(pady=5)
        self.output_area = scrolledtext.ScrolledText(root, height=10, width=80, bg="#f0f0f0")
        self.output_area.pack(padx=20, pady=5)
        # Start loading AI in the background
        threading.Thread(target=self.load_ai, daemon=True).start()

    def load_ai(self):
        self.status_label.config(text="Status: Loading Model (First time takes longer)...")
        self.btn_run.config(state="disabled")
        self.ai = AIBackend()
        self.status_label.config(text="Status: Model Loaded & Ready", fg="green")
        self.btn_run.config(state="normal")

    def start_ai_thread(self):
        input_text = self.input_area.get("1.0", tk.END).strip()
        if not input_text:
            messagebox.showwarning("Warning", "Please enter some text first!")
            return
        self.btn_run.config(state="disabled")
        self.status_label.config(text="Status: AI is thinking...", fg="orange")
        self.output_area.delete("1.0", tk.END)
        # Run AI in background thread so GUI doesn't hang
        threading.Thread(target=self.process_summary, args=(input_text,), daemon=True).start()

    def process_summary(self, text):
        try:
            summary = self.ai.generate_summary(text)
            self.output_area.insert(tk.END, summary.strip())
            self.status_label.config(text="Status: Success!", fg="green")
        except Exception as e:
            messagebox.showerror("Error", str(e))
            self.status_label.config(text="Status: Error", fg="red")
        finally:
            self.btn_run.config(state="normal")

if __name__ == "__main__":
    root = tk.Tk()
    app = PhiGuiApp(root)
    root.mainloop()
3. How to Run
Save the code as Phi-Silica-GUI.py.
In your terminal (with ai_env active), run:
python Phi-Silica-GUI.py
Note: The first time you launch the app, it will download the GGUF file (approx. 2.2GB) while the status line shows that the model is loading. Watch your terminal for the download progress. Later launches reuse the cached file, so the model loads much faster.
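One caveat about the script above: it changes widgets (the status label, the output box) directly from background threads, and tkinter does not promise that this is safe. A common workaround is to have worker threads put their updates on a queue and let the GUI thread drain it on a timer. The class below is a hypothetical sketch of that pattern (UiDispatcher is not part of tkinter); it only assumes that root has an after(ms, func) method, as tk.Tk does.

```python
import queue
import threading


class UiDispatcher:
    """Collect callbacks from worker threads and run them on the GUI thread."""

    def __init__(self, root, interval_ms=50):
        self.root = root                  # any object with after(ms, func), e.g. tk.Tk()
        self.interval_ms = interval_ms
        self.pending = queue.Queue()      # thread-safe hand-off between threads

    def post(self, func, *args):
        """Safe to call from any thread: it only touches the queue."""
        self.pending.put((func, args))

    def poll(self):
        """Call once from the GUI thread; drains the queue, then reschedules itself."""
        while True:
            try:
                func, args = self.pending.get_nowait()
            except queue.Empty:
                break
            func(*args)
        self.root.after(self.interval_ms, self.poll)
```

In the app you would create self.ui = UiDispatcher(root) in __init__, call self.ui.poll() once before mainloop starts, and replace direct widget calls inside the worker methods with lines such as self.ui.post(self.output_area.insert, tk.END, summary). The widgets are then only ever touched from the main thread.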
Why this is better than the Windows API:
Total Control: You can change the "System Prompt" in the code to make it act like a translator, a code debugger, or a creative writer.
Privacy: Inference runs entirely on your own machine, and you can verify that yourself. Once the model is downloaded, setting the HF_HUB_OFFLINE=1 environment variable stops the Hugging Face library from making any network requests.
Efficiency: By using the GGUF format, you are saving ~5GB of RAM compared to your previous script.
Would you like me to show you how to add a "Save to File" button to this GUI so you can export your summaries automatically?
┌──(ai_env)─(kali㉿kali)-[~]
└─$ python Phi-Silica-GUI.py
Loading AI Model (GGUF)...
/home/kali/ai_env/lib/python3.13/site-packages/huggingface_hub/utils/_validators.py:202: UserWarning: The `local_dir_use_symlinks` argument is deprecated and ignored in `hf_hub_download`. Downloading to a local directory does not use symlinks anymore.
warnings.warn(
llama_context: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
Gemini said
Those messages you're seeing in the terminal are harmless. The local_dir_use_symlinks warning is just a "housekeeping" alert from the Hugging Face library: llama-cpp-python still passes an argument to hf_hub_download that huggingface_hub has deprecated and now ignores. The n_ctx message simply means the model was trained to handle a very long context (131,072 tokens), but we've told it to use a smaller "workspace" (2048 tokens) to save your RAM, which is exactly what you want on a local machine.
Here is the complete final code with a new "Save Summary" button and the fixes to silence those warnings.
The Complete Final Code (Phi-Silica-GUI.py)
import tkinter as tk
from tkinter import scrolledtext, messagebox, filedialog
import threading
import warnings
from llama_cpp import Llama

# Silence the symlink warnings for a cleaner terminal
warnings.filterwarnings("ignore", category=UserWarning, module="huggingface_hub")

# --- AI Backend Logic ---
class AIBackend:
    def __init__(self):
        print("Loading AI Model (GGUF)...")
        # n_ctx=2048 is a deliberately small context window to keep RAM usage low
        self.llm = Llama.from_pretrained(
            repo_id="bartowski/Phi-3.5-mini-instruct-GGUF",
            filename="Phi-3.5-mini-instruct-Q4_K_M.gguf",
            n_ctx=2048,
            n_threads=4,
            verbose=False
        )

    def generate_summary(self, text):
        prompt = f"Summarize the following text into one concise paragraph:\n\n{text}"
        response = self.llm.create_chat_completion(
            messages=[
                {"role": "system", "content": "You are a professional summarization assistant."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=300,
            temperature=0.1
        )
        return response["choices"][0]["message"]["content"]

# --- GUI Application ---
class PhiGuiApp:
    def __init__(self, root):
        self.root = root
        self.root.title("Kali AI - Local Phi-3.5 (Free & Offline)")
        self.root.geometry("800x700")
        self.ai = None
        # Top Section: Input
        tk.Label(root, text="Input Text:", font=("Arial", 10, "bold")).pack(pady=(10, 0))
        self.input_area = scrolledtext.ScrolledText(root, height=12, width=90)
        self.input_area.pack(padx=20, pady=5)
        # Middle Section: Controls
        self.btn_frame = tk.Frame(root)
        self.btn_frame.pack(pady=10)
        self.btn_run = tk.Button(self.btn_frame, text="✨ Generate Summary", command=self.start_ai_thread,
                                 bg="#2ecc71", fg="white", font=("Arial", 10, "bold"), width=20)
        self.btn_run.grid(row=0, column=0, padx=10)
        self.btn_save = tk.Button(self.btn_frame, text="💾 Save Summary", command=self.save_to_file,
                                  bg="#3498db", fg="white", font=("Arial", 10, "bold"), width=20)
        self.btn_save.grid(row=0, column=1, padx=10)
        self.status_label = tk.Label(root, text="Status: Initializing...", fg="blue")
        self.status_label.pack()
        # Bottom Section: Output
        tk.Label(root, text="AI Output:", font=("Arial", 10, "bold")).pack(pady=(10, 0))
        self.output_area = scrolledtext.ScrolledText(root, height=12, width=90, bg="#f8f9fa")
        self.output_area.pack(padx=20, pady=5)
        # Load AI in background
        threading.Thread(target=self.load_ai, daemon=True).start()

    def load_ai(self):
        try:
            self.ai = AIBackend()
            self.status_label.config(text="Status: Model Ready", fg="green")
        except Exception as e:
            self.status_label.config(text="Status: Load Error", fg="red")
            messagebox.showerror("Model Error", str(e))

    def start_ai_thread(self):
        # The model loads in the background, so it may not be ready yet
        if self.ai is None:
            messagebox.showinfo("Please Wait", "The model is still loading.")
            return
        input_text = self.input_area.get("1.0", tk.END).strip()
        if not input_text:
            messagebox.showwarning("Empty Input", "Please paste some text to summarize.")
            return
        self.btn_run.config(state="disabled")
        self.status_label.config(text="Status: AI is processing...", fg="orange")
        self.output_area.delete("1.0", tk.END)
        threading.Thread(target=self.process_summary, args=(input_text,), daemon=True).start()

    def process_summary(self, text):
        try:
            summary = self.ai.generate_summary(text)
            self.output_area.insert(tk.END, summary.strip())
            self.status_label.config(text="Status: Done!", fg="green")
        except Exception as e:
            messagebox.showerror("AI Error", str(e))
            self.status_label.config(text="Status: Process Failed", fg="red")
        finally:
            self.btn_run.config(state="normal")

    def save_to_file(self):
        content = self.output_area.get("1.0", tk.END).strip()
        if not content:
            messagebox.showwarning("Nothing to Save", "Generate a summary first!")
            return
        file_path = filedialog.asksaveasfilename(
            defaultextension=".txt",
            filetypes=[("Text files", "*.txt"), ("All files", "*.*")],
            title="Save Summary As"
        )
        if file_path:
            try:
                # Explicit UTF-8 so non-ASCII characters in the summary are saved correctly
                with open(file_path, "w", encoding="utf-8") as f:
                    f.write(content)
                messagebox.showinfo("Success", f"Summary saved to:\n{file_path}")
            except Exception as e:
                messagebox.showerror("Save Error", f"Could not save file: {e}")

if __name__ == "__main__":
    root = tk.Tk()
    app = PhiGuiApp(root)
    root.mainloop()
What’s New in This Version:
The Save Button: Added a "Save Summary" button that opens a standard Linux file dialog. You can choose where to save your .txt file on your Kali drive.
Warning Silencer: Added warnings.filterwarnings to stop that "symlinks" text from cluttering your terminal.
Status Indicators: The status label now changes colors (Green for ready, Orange for thinking, Red for error) so you know exactly what the AI is doing.
Threading: Model loading and summary generation run on background threads, so the window never feels "frozen."
You now have a fully functional, local, and free alternative to Windows Phi-Silica. Would you like to add a "Clear All" button to quickly reset the input and output areas for a new task?
How can I use this model to generate programs and code and build large projects?
Gemini said
To move from simple summaries to building large software projects with Phi-3.5 on Kali Linux, you need to transition from a standalone script to an Agentic Workflow. While Phi-3.5-mini is "small," its 128k token context window makes it uniquely powerful for reading entire codebases.
Here is the professional strategy for using this model for large-scale development.
1. Integration: The "Local Copilot" Setup
Don't copy-paste code into your script. Instead, connect your local model directly to your IDE (VS Code or Codium) so it can see your files.
Tool: Continue.dev (Open-source VS Code Extension).
Setup:
Install the Continue extension in VS Code.
Open your config.json in Continue.
Point it to your local model server (Ollama in this example):
{ "models": [{ "title": "Local Phi-3.5", "provider": "ollama", "model": "phi3.5" }] }
(Note: You can run ollama serve on Kali to make this seamless.)
2. Strategy for "Large Projects"
Phi-3.5 is a 3.8B parameter model. While smart, it can lose track if you ask it to "Build a whole banking app." Use the Modular Prompting method:
Step 1: The Architect Phase. Ask the model to generate a file_structure.md.
Prompt: "I want to build a Python-based network scanner. Create a modular file structure including a backend, a CLI, and a requirements.txt."
Step 2: The Skeleton Phase. Ask it to write the interfaces/classes first, with comments but no logic.
Step 3: Implementation. Provide one file at a time to the model and ask it to fill in the logic.
3. Code Generation Script
If you want to build a tool that generates project files for you, give the model a strict system prompt that asks for raw code only. That way your script can save each reply straight to a file.
Complete Final Code: Project File Generator This script takes a prompt and generates a file with the code automatically.
import os
from llama_cpp import Llama

# 1. Initialize High-Performance Backend
llm = Llama.from_pretrained(
    repo_id="bartowski/Phi-3.5-mini-instruct-GGUF",
    filename="Phi-3.5-mini-instruct-Q4_K_M.gguf",
    n_ctx=4096,  # Expanded context for code
    n_threads=6,
    verbose=False
)

def generate_project_file(file_purpose, filename):
    print(f"🚀 Generating {filename}...")
    # System prompt optimized for coding
    system_msg = (
        "You are an expert Senior Lead Developer. "
        "Provide ONLY the raw code for the requested file. "
        "Do not include explanations, markdown backticks, or 'Here is your code'."
    )
    response = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": system_msg},
            {"role": "user", "content": f"Write a complete, production-ready {file_purpose}."}
        ],
        temperature=0.2  # Lower temperature = more deterministic output
    )
    code = response["choices"][0]["message"]["content"]
    # Save the file automatically
    with open(filename, "w", encoding="utf-8") as f:
        f.write(code.strip())
    print(f"✅ Saved to {os.path.abspath(filename)}")

if __name__ == "__main__":
    # Example: Generating a modular project
    generate_project_file("Python script using Scapy for a port scanner", "scanner.py")
    generate_project_file("README.md file explaining how to use a network scanner", "README.md")
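Small models often wrap their answer in markdown fences even when the system prompt forbids it. Here is a minimal sketch of a cleanup step that could run on the reply before it is written to disk. The helper name strip_code_fences is hypothetical and is not part of llama-cpp-python:

```python
import re

# Hypothetical helper (not part of llama-cpp-python): removes one outer
# markdown fence, with or without a language tag, from a model reply.
_FENCE = re.compile(r"^```[\w+-]*[ \t]*\n(.*?)\n?```\s*$", re.DOTALL)

def strip_code_fences(text: str) -> str:
    """Return the reply without its surrounding code fence, if it has one."""
    cleaned = text.strip()
    match = _FENCE.match(cleaned)
    return match.group(1) if match else cleaned
```

In the generator script, this would be called on the model reply in place of the plain .strip() before saving.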
4. Advanced: RAG (Retrieval Augmented Generation)
For "Large Projects," the model needs to "remember" your other files.
The Hack: Since Phi-3.5 has a 128k context window, you can actually concatenate all your .py files into one big text block and paste it into the prompt (as long as the context size you load the model with, such as n_ctx in the script above, is large enough to hold it).
The Pro Way: Use a library like LangChain to index your local folder into a vector database (ChromaDB), so the model only "reads" the relevant files when you ask a question.
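The concatenation "hack" can be sketched in a few lines of standard-library Python. The function name, the "### FILE:" label format, and the character budget are all assumptions for illustration. The budget is a rough stand-in for tokens (about 4 characters per token), so treat it as an estimate only:

```python
from pathlib import Path

def build_codebase_prompt(root: str, max_chars: int = 12000) -> str:
    """Concatenate .py files under root into one labeled text block.

    max_chars is a rough stand-in for the token budget (assumption:
    about 4 characters per token); files that do not fit are skipped.
    """
    parts = []
    used = 0
    for path in sorted(Path(root).rglob("*.py")):
        body = path.read_text(encoding="utf-8", errors="replace")
        label = path.relative_to(root).as_posix()
        chunk = f"### FILE: {label}\n{body}\n"
        if used + len(chunk) > max_chars:
            continue  # skip files that exceed the remaining budget
        parts.append(chunk)
        used += len(chunk)
    return "".join(parts)
```

The returned string can be placed ahead of your question in the user message. With n_ctx=4096, the default budget of 12000 characters leaves only a little room for the answer, so lower it or raise n_ctx as needed.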
Summary of Tools for Kali
Terminal: Use ollama for the fastest background serving.
IDE: Use VS Code + Continue for an "Autopilot" experience.
CLI: Use aider (pip install aider-chat). It is a command-line tool that lets Phi-3.5 actually edit your local files directly.
Would you like me to show you how to set up Aider, so the AI can run your code and fix its own bugs on your Kali machine?
"Open your config.json in Continue." How to do it?
Gemini said
To open the configuration for the Continue extension on Kali Linux (or any OS), you have two main ways: the "Command Palette" way and the "Sidebar" way.
Option 1: The Command Palette (Fastest)
Open VS Code.
Press Ctrl + Shift + P.
Type "Continue: Open Config" (or just "Continue Config").
Select it, and it will automatically open the config.yaml or config.json file in your editor.
Option 2: The Sidebar (Visual)
Click the Continue icon in the left sidebar (looks like a small triangle/play button).
Look at the bottom left of the Continue panel for a Gear Icon (⚙️).
Click that icon to open the configuration file.
How to add your Local Phi-3.5
Once the file is open, search for the "models": [] section. To use the model we've been working with, you will need to have Ollama running in the background (as it's the easiest way for Continue to talk to local models).
Paste this into your models array:
{
"title": "Local Phi-3.5",
"provider": "ollama",
"model": "phi3.5",
"apiBase": "http://localhost:11434"
}
Why we are using Ollama here:
While our Python scripts used llama-cpp-python directly, the Continue extension is designed to work seamlessly with Ollama.
To set this up on your Kali machine:
In your terminal, run: ollama run phi3.5 (This downloads the model specifically for the Ollama service).
Once it's running, the VS Code extension will be able to "see" it at the apiBase address above.
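Before pointing Continue at that address, it can help to confirm that the server answers and that the model is listed. This is a minimal sketch using only the standard library. It assumes Ollama's GET /api/tags endpoint, which returns the locally downloaded models. The function names are hypothetical:

```python
import json
import urllib.request

def model_names(tags_payload: dict) -> list:
    """Extract model tags from the JSON returned by GET /api/tags."""
    return [m["name"] for m in tags_payload.get("models", [])]

def list_local_models(api_base: str = "http://localhost:11434") -> list:
    """Ask a running Ollama server which models are downloaded.

    Raises OSError (for example, connection refused) if nothing is listening.
    """
    with urllib.request.urlopen(f"{api_base}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))
```

Calling list_local_models() and checking whether any entry starts with "phi3.5" tells you the same thing as ollama list, but from a script.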
Complete Final Checklist for Projects:
Generate Code: Use Ctrl + L in VS Code to ask the model to write code.
Edit Code: Highlight a block of code and press Ctrl + I. Tell it "add error handling" or "refactor this."
Explain Code: Highlight a complex function and ask "What does this do?"
Would you like me to show you how to use Aider in your terminal? It's a tool that lets Phi-3.5 actually edit your project files directly without you having to copy-paste.
To open the configuration for the Continue extension in Visual Studio Code, you can use any of the following methods:
Command Palette: Press Ctrl+Shift+P (Windows/Linux) or Cmd+Shift+P (macOS), type "Continue: Open configuration file", and select it.
Continue Sidebar: Open the Continue sidebar by clicking its logo or pressing Ctrl+L (Cmd+L on Mac). Click the selector (often labeled "Local Assistant" or an agent name) above the chat input, hover over the agent, and click the gear (cog) icon.
Config Dropdown: In the top-right of the chat input box within the extension panel, open the configs dropdown and select the cog icon next to the "Local Config" option.
Which configuration format should you choose?
Continue currently supports multiple configuration formats, but for most users, local YAML is recommended.
Summary for your specific goal:
Since you are setting up local models like Phi-3.5 via Ollama on Kali Linux, your next step should be:
Open the config.yaml (or config.json) using one of the methods above.
Add your model details to the models section, ensuring the apiBase points to your local Ollama port (typically http://localhost:11434).
Would you like me to provide a specific YAML block to paste into your configuration for Phi-3.5?
yes
Gemini said
To add Phi-3.5 to your Continue configuration, you should follow the modern YAML standard. This format is cleaner and more resistant to the "missing brackets" errors often found in JSON.
1. The Configuration Block
Open your config.yaml (using the methods we discussed earlier) and paste this block into the models: section.
models:
  - name: Phi-3.5 Local
    provider: ollama
    model: phi3.5:latest
    apiBase: http://localhost:11434
    roles:
      - chat
      - edit
      - apply
    capabilities:
      - tool_use
    defaultCompletionOptions:
      temperature: 0.1
      contextLength: 4096
2. Pro-Tip: Setting up Autocomplete
If you want the "ghost text" (autocomplete) feature similar to GitHub Copilot, you should use a smaller, faster version of the model. Add this to the end of your config.yaml:
tabAutocompleteModel:
  name: Phi-3.5 Mini
  provider: ollama
  model: phi3.5:3.8b-mini-instruct-q4_K_M
3. Important Setup Step
Before VS Code can use these, you must download them via Ollama in your Kali terminal. Run these two commands:
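Pull the main model:
ollama pull phi3.5
Pull the optimized embedding model (This allows the AI to "read" your whole folder/codebase using the @codebase command):
ollama pull nomic-embed-text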
4. How to use it for "Large Projects"
Once configured, you can use these special commands in the Continue chat sidebar:
@codebase: Type this before your question (e.g., "@codebase where is the login logic handled?"). The AI will search all your local files.
@file: Type this to point the AI to a specific file you want it to edit or explain.
@docs: Use this to let the AI read official documentation URLs while it writes your code.
Would you like me to provide a specialized "Rules" block for your config? This will force the AI to always follow specific coding standards (like "always use Python 3.13" or "always include unit tests").
To determine if your config.yaml is correct for running Phi-3.5 via Ollama on Kali Linux, you should verify its structure against the following standard requirements.
1. Essential YAML Structure
Ensure your file includes these top-level fields, which are required for the configuration to load:
name: A display name for your configuration (e.g., Local Assistant).
version: Typically set to 1.0.0.
schema: Must be set to v1 to ensure compatibility with the modern YAML format.
2. Model Configuration for Phi-3.5
The models: section must be correctly formatted as a list (using hyphens) with the following specific properties for Ollama:
provider: Must be ollama.
model: Use the exact tag you downloaded (e.g., phi3.5:latest).
apiBase: For local Kali setups, this is usually http://localhost:11434.
roles: To enable all features, include chat, edit, and apply.
capabilities: Add - tool_use to ensure "Agent Mode" functions correctly.
3. Common "Gotchas" to Check
Indentation: YAML is strictly indentation-sensitive. Use exactly 2 spaces per level; never use tabs, as they will cause the file to fail.
Spaces after Colons: Ensure there is a space after every colon (e.g., model: phi3.5, not model:phi3.5).
Model Availability: Your configuration will only work if the model is already downloaded. Open your Kali terminal and run ollama list to verify phi3.5 is present.
Troubleshooting
If your models disappear from the UI after saving the file, there is likely a syntax error. You can validate your YAML by running this command in your terminal:
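python3 -c "import os, yaml; yaml.safe_load(open(os.path.expanduser('~/.continue/config.yaml')))"
(Python's open() does not expand ~ on its own, which is why the path goes through os.path.expanduser.)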
Would you like me to analyze a specific snippet of your file to see if it matches these rules?
yes
Gemini said
To verify if your config.yaml is correct for 2026, check it against this standardized "Golden Template."
In the latest versions of Continue, the YAML structure has become more strict regarding schema and roles. If your models aren't showing up in the VS Code sidebar, it is usually because one of these fields is missing or the indentation is off.
The "Golden Template" for Phi-3.5 on Kali
Compare your file to this. Every space matters:
# Top-level required fields
name: Local Phi Setup
version: 1.0.0
schema: v1
models:
  - name: Phi-3.5 Local
    provider: ollama
    model: phi3.5:latest
    apiBase: http://localhost:11434
    # Roles tell VS Code where this model can be used
    roles:
      - chat
      - edit
      - apply
    # Capabilities enable 'Agent Mode' (needed for large projects)
    capabilities:
      - tool_use
    # Optional performance settings
    completionOptions:
      temperature: 0.1
      contextLength: 4096
# Recommended: Small model for instant typing/autocomplete
tabAutocompleteModel:
  name: Phi-3.5 Autocomplete
  provider: ollama
  model: phi3.5:3.8b-mini-instruct-q4_K_M
  roles:
    - autocomplete
3-Point Checklist for Success
The "No-Tabs" Rule: YAML does not allow tab characters for indentation, and a single tab is enough to make the file fail to parse. Ensure your editor is using 2 spaces for indentation. You can check this in the bottom-right corner of VS Code (it should say Spaces: 2).
The "Role" Requirement: In 2026, if you don't list roles: [chat, edit], the model might appear in the chat box but refuse to help when you try to highlight code and press Ctrl + I.
The "Ollama Serve" Check: Before opening VS Code, make sure the Ollama engine is actually awake. Run this in your Kali terminal:
ollama serve
(If it says "address already in use," it's already running and you're good to go).
How to test it immediately
Once you save the file:
Look at the Continue Sidebar in VS Code.
Click the model dropdown (at the bottom of the chat window).
If you see "Phi-3.5 Local" with a green dot or no error message, it is perfectly configured.
If it's still not showing up, could you paste the first 10 lines of your file here? I can spot the specific syntax error for you.
The config.yaml file you are building for Continue is the modern standard for 2026. To ensure it works perfectly for your local Phi-3.5 setup on Kali Linux, it must follow a specific structure defined by the v1 schema.
Mandatory Configuration Fields
Your file must begin with these top-level properties to be valid:
name: A unique name for your project or setup (e.g., Kali Local AI).
version: The configuration version, typically 1.0.0.
schema: Must be set to v1 to use the YAML format correctly.
Model Settings for Phi-3.5
In the models section, you define the AI's properties. For an Ollama-powered model on your local machine, use this structure:
provider: Must be ollama.
model: Set this to the exact tag used in Ollama (e.g., phi3.5:latest).
roles: List the tasks this model can perform. Common roles include chat, edit, apply, and summarize.
capabilities: Add - tool_use if you want the model to be compatible with Agent Mode.
Adding Autocomplete Support
To enable "ghost text" suggestions while you type, you can define a separate model specifically for the autocomplete role:
Add a model to your list with the role autocomplete.
Pro-Tip: Use a smaller model like phi3.5:3.8b-mini-instruct-q4_K_M for this role to ensure the suggestions appear instantly without lag.
Verifying and Troubleshooting
Indentation: YAML requires strict indentation using spaces, not tabs.
Auto-Detection: If you are unsure of your local settings, you can set your model to AUTODETECT in the GUI, and Continue will scan your local Ollama installation for you.
Reloading: If changes do not appear in VS Code, open the Command Palette (Ctrl+Shift+P) and select "Reload Window" to refresh the extension.
Would you like me to look at your actual config.yaml code to check for any specific indentation or syntax errors?
yes
Gemini said
To confirm if your config.yaml is correct, check it against the mandatory structural rules for 2026. If any of these are missing or indented with tabs instead of spaces, the Continue extension will either fail to load your models or show a "Failed to parse config" error.
1. The "Golden" Syntax Check
Ensure your file follows this exact pattern. Pay close attention to the v1 schema, which is required for all new YAML-based configurations.
# Top-level required metadata
name: Local Phi Setup
version: 1.0.0
schema: v1
models:
  - name: Phi-3.5 Local
    provider: ollama
    model: phi3.5:latest
    apiBase: http://localhost:11434
    roles:
      - chat
      - edit
      - apply
    capabilities:
      - tool_use
    # Settings for this specific model
    completionOptions:
      temperature: 0.1
      contextLength: 4096
2. Common Errors to Look For
Indentation: YAML is strictly sensitive to spacing. You must use exactly 2 spaces for each level. Never use tabs; if you do, the file will be invalid.
Missing Roles: In the current version, if you do not explicitly list roles: [chat, edit], the model may not appear in the "Edit" or "Refactor" menus.
Schema Version: If you omit schema: v1, the extension may try to parse it as an older format and fail.
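Two of the errors above, tabs in the indentation and missing top-level fields, can be caught with a plain-text check before opening VS Code. This is a minimal sketch using only the standard library. The function name lint_config_text is hypothetical, and the check looks at the text only, so it does not replace a real YAML parser:

```python
REQUIRED_KEYS = ("name", "version", "schema")  # top-level fields listed above

def lint_config_text(text: str) -> list:
    """Return human-readable problems found in a config.yaml string."""
    problems = []
    top_level = set()
    for number, line in enumerate(text.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append(f"line {number}: tab used for indentation")
        # A top-level key starts in column 0 and is not a comment or list item.
        if line and not line[0].isspace() and line[0] not in "#-" and ":" in line:
            top_level.add(line.split(":", 1)[0].strip())
    for key in REQUIRED_KEYS:
        if key not in top_level:
            problems.append(f"missing top-level key: {key}")
    return problems
```

An empty list means none of these specific problems were found. The file can still be invalid for other reasons.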
3. How to Verify via Terminal
You can quickly check if your YAML is technically valid by running this command in your Kali terminal:
python3 -c "import yaml; yaml.safe_load(open('/home/kali/.continue/config.yaml'))" && echo "✅ YAML is Valid"
If this command returns an error, it will tell you the exact line number where your syntax is broken.
4. Troubleshooting Models Not Appearing
If the YAML is valid but the model still doesn't show up in VS Code:
Check Ollama: Run ollama list to ensure phi3.5:latest is actually downloaded.
Toggle Developer Tools: In VS Code, press Ctrl+Shift+P, type "Toggle Developer Tools", and check the Console tab for red error messages related to "Failed to load config".
Reload: Use Ctrl+Shift+P and run "Developer: Reload Window" to force the extension to re-read your file.
If you paste your actual code here, I can identify the specific line that's causing trouble. Would you like me to do that?
(kali㉿kali)-[~]
└─$ python3 -c "import yaml; yaml.safe_load(open('/home/kali/.continue/config.yaml'))" && echo "✅ YAML is Valid"
✅ YAML is Valid
Gemini said
Great news! If the terminal says "✅ YAML is Valid", it means your syntax (indentation, colons, and structure) is perfect. Your Kali system can read the file correctly.
Since the file is technically correct, the final step is to ensure the Continue extension and Ollama are talking to each other.
Final Verification Steps
Is Ollama running? In your terminal, run:
ollama list
Confirm that phi3.5 appears in that list. If the list is empty, run ollama pull phi3.5 first.
Check the VS Code UI
Open VS Code.
Open the Continue sidebar (on the left).
Look at the model dropdown at the bottom of the chat window.
You should now see "Phi-3.5 Local" as an option. Select it.
The "Acid Test" (Coding Task): To see if your "Large Project" setup is working, try this command in the Continue chat:
@codebase How does this project work?
If it starts indexing your files, you are officially running a local, free, and private AI development environment on Kali!
One Last Recommendation: The "Rules"
Since you want to build large projects, you can add a system_message to your config.yaml inside the Phi-3.5 block to ensure it always writes high-quality code.
Would you like me to show you how to add a "Rules" section to your config so the AI always follows best security practices (crucial for Kali users)?
Error handling model response
There was an error handling the response from Phi-3.5 Local.
Please try to submit your message again, and if the error persists, let us know by reporting the issue using the buttons below.
Resubmit last message
View error output
"registry.ollama.ai/library/phi3.5:latest does not support tools"
Gemini said
This error occurs because the standard Phi-3.5 model in the Ollama library does not natively support "Tool Calling" (the ability for the AI to use search, index files, or run functions). When the Continue extension sees capabilities: [tool_use] in your config, it tries to use a feature the model doesn't have, leading to that "registry error."
To fix this and keep building large projects on Kali, you have two choices: The Fallback Method or The Pro Swap.
Option 1: The Fallback Method (Keep Phi-3.5)
You can force Continue to use "System Message Tools." This means the extension will explain the tools to the AI in the chat instructions rather than relying on the model's built-in code.
Update your config.yaml to this:
models:
  - name: Phi-3.5 Local
    provider: ollama
    model: phi3.5:latest
    apiBase: http://localhost:11434
    roles:
      - chat
      - edit
      - apply
    # REMOVE 'capabilities: [tool_use]'
    # Continue will automatically use its fallback system instead
Option 2: The Pro Swap (Best for Large Projects)
If you want to build large projects, you need an AI that is built for tools. The Llama 3.1 8B or Qwen 2.5 Coder models are the current industry standard for local coding agents because they have native tool support.
Download a "Tool-Ready" model in your terminal:
ollama pull qwen2.5-coder:7b
Update your config.yaml to use it:
models:
  - name: Code Expert (Qwen)
    provider: ollama
    model: qwen2.5-coder:7b
    apiBase: http://localhost:11434
    roles:
      - chat
      - edit
      - apply
    capabilities:
      - tool_use # This will now work perfectly!
Why this happened:
Phi-3.5: Great for reasoning and short summaries, but it doesn't know how to "talk" to external tools like your file system yet.
Qwen 2.5 Coder: Specifically trained to use tools and manage multi-file projects. Since you're on Kali and likely have enough RAM for a 7B model, this is the much better choice for "Building Large Projects."
What to do next:
1. Try Option 1 first to see if the error disappears.
2. If the AI still feels "limited" when you try to use @codebase, switch to Option 2.
Would you like me to help you verify if your PC has enough VRAM to run the 7B "Code Expert" model smoothly?
"Local Config
Failed to parse config: Map keys must be unique at line 32, column 1: models: ^ Error: Failed to parse config: Map keys must be unique at line 32, column 1: models: ^ at parseConfigYaml (/home/kali/.vscode/extensions/continue.continue-1.2.14-linux-x64/out/extension.js:261639:13) at parseYamlOrMarkdownRule (/home/kali/.vscode/extensions/continue.continue-1.2.14-linux-x64/out/extension.js:262034:18) at parseMarkdownRuleOrConfigYaml (/home/kali/.vscode/extensions/continue.continue-1.2.14-linux-x64/out/extension.js:262029:10) at unrollAssistantFromContent (/home/kali/.vscode/extensions/continue.continue-1.2.14-linux-x64/out/e" - Config/yaml file content: "# Top-level required fields
name: Local Phi Setup
version: 1.0.0
schema: v1
models:
  - name: Phi-3.5 Local
    provider: ollama
    model: phi3.5:latest
    apiBase: http://localhost:11434
    # Roles tell VS Code where this model can be used
    roles:
      - chat
      - edit
      - apply
    # Capabilities enable 'Agent Mode' (needed for large projects)
    capabilities:
      - tool_use
    # Optional performance settings
    completionOptions:
      temperature: 0.1
      contextLength: 4096
# Recommended: Small model for instant typing/autocomplete
tabAutocompleteModel:
  name: Phi-3.5 Autocomplete
  provider: ollama
  model: phi3.5:3.8b-mini-instruct-q4_K_M
  roles:
    - autocomplete
models:
  - name: Code Expert (Qwen)
    provider: ollama
    model: qwen2.5-coder:7b
    apiBase: http://localhost:11434
    roles:
      - chat
      - edit
      - apply
    capabilities:
      - tool_use # This will now work perfectly!"
Gemini said
The error "Map keys must be unique" is happening because you have the word models: written twice in your file (once on line 6 and again further down). In YAML, you can only define the models: category once; all your different AI models must be listed under that single heading.
By having two models: sections, the Continue extension gets confused and crashes.
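PyYAML's safe_load does not reject duplicate keys (the last one silently wins), so the terminal check used earlier would not flag this kind of mistake. Here is a minimal sketch of a text-level check for repeated top-level keys, using only the standard library. The function name is hypothetical:

```python
from collections import Counter

def duplicate_top_level_keys(text: str) -> list:
    """Return top-level YAML keys that appear more than once, in file order."""
    keys = []
    for line in text.splitlines():
        # Top-level keys start in column 0; skip blanks, comments, list items.
        if not line or line[0].isspace() or line[0] in "#-" or ":" not in line:
            continue
        keys.append(line.split(":", 1)[0].strip())
    counts = Counter(keys)
    return [key for key in dict.fromkeys(keys) if counts[key] > 1]
```

Run against the file contents, a result such as ["models"] points at the same problem the Continue error message describes.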
The Fix: Merged & Corrected Code
I have merged both Phi-3.5 and Qwen (Code Expert) into one single list under one models: header. I also removed the tool_use capability from Phi-3.5 to prevent the "registry" error you saw earlier, while keeping it for Qwen.
Replace the entire content of your config.yaml with this:
# Top-level required fields
name: Local AI Setup
version: 1.0.0
schema: v1
models:
  - name: Phi-3.5 Local
    provider: ollama
    model: phi3.5:latest
    apiBase: http://localhost:11434
    roles:
      - chat
      - edit
      - apply
    # No tool_use here to avoid the registry error
    completionOptions:
      temperature: 0.1
      contextLength: 4096
  - name: Code Expert (Qwen)
    provider: ollama
    model: qwen2.5-coder:7b
    apiBase: http://localhost:11434
    roles:
      - chat
      - edit
      - apply
    capabilities:
      - tool_use # Qwen supports this perfectly for large projects
tabAutocompleteModel:
  name: Phi-3.5 Autocomplete
  provider: ollama
  model: phi3.5:3.8b-mini-instruct-q4_K_M
  roles:
    - autocomplete
Why this works now:
Unique Keys: There is now only one models: section.
Clean Structure: Each model starts with a - name:, which tells YAML these are items in the same list.
Conflict Resolved: By keeping both, you can use Phi-3.5 for quick questions and switch to Code Expert (Qwen) when you need to use @codebase or build large projects.
How to use it in VS Code:
Save the file.
In the Continue sidebar, click the model dropdown at the bottom.
You will now see both choices. Select Code Expert (Qwen) for your large project tasks.
Would you like me to show you how to set up a "Custom Command" so you can generate a whole project structure with one click?
You said
yes
Gemini said
To build large projects with one click, we will use Slash Commands. These are custom shortcuts that tell the AI: "Use this specific expert persona and follow these structural rules for every file you generate."
1. Update your config.yaml
Add this customCommands section to the bottom of your file. I have designed this command specifically to generate professional, modular project structures.
# Add this to the bottom of your config.yaml
customCommands:
  - name: project
    description: Generate a complete modular project structure
    prompt: |
      You are a Senior Software Architect. The user wants to build: {{{ input }}}
      Please provide:
      1. A directory tree structure (folder/file layout).
      2. A 'requirements.txt' or 'package.json' listing all dependencies.
      3. The core logic for the main entry point file.
      Use best practices: modularity, error handling, and clear comments.
      For Kali Linux, ensure any networking code follows standard security protocols.
2. How to use it in VS Code
Once you save the file, the command is ready instantly:
Open the Continue sidebar (Ctrl + L).
Type /project followed by your idea.
Example: /project a Python tool that scans a network and saves results to a PDF
The AI will now stop giving you "snippet" answers and start giving you a full Architectural Blueprint.
3. The "Power User" Workflow for Large Projects
For a large project, don't try to generate 20 files in one go (it will hit memory limits). Use this 3-step sequence:
Step 1 (The Blueprint): Run /project [your idea]. Copy the directory tree it gives you and create those empty folders/files in your Kali terminal.
Step 2 (The Implementation): Open one of the empty files, highlight the file name, and type: @codebase Write the full implementation for this specific file based on our previous blueprint.
Step 3 (The Refactor): Once the code is written, highlight it all and press Ctrl + I, then type: add logging and a try/except block for every network call
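Step 1 asks you to create the empty folders and files by hand. Here is a minimal sketch of doing that from a list of relative paths copied out of the blueprint, using only the standard library. The function name scaffold and the example paths are hypothetical:

```python
from pathlib import Path

def scaffold(root: str, relative_paths: list) -> list:
    """Create empty files (and parent folders) for a project blueprint.

    Existing files are left untouched. Returns the paths that were created.
    """
    base = Path(root).resolve()
    created = []
    for rel in relative_paths:
        target = (base / rel).resolve()
        # Refuse entries such as "../x" that would land outside the project.
        if base != target and base not in target.parents:
            raise ValueError(f"path escapes project root: {rel}")
        target.parent.mkdir(parents=True, exist_ok=True)
        if not target.exists():
            target.touch()
            created.append(str(target.relative_to(base)))
    return created
```

For example, scaffold("netscan", ["scanner/cli.py", "requirements.txt"]) would create the netscan folder with one subfolder and two empty files, ready for Step 2.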
4. Critical Pro-Tip for Kali
Since you are on Kali, you likely deal with security tools. If you want the AI to be a Security Expert, add this command too:
  - name: audit
    description: Check code for security vulnerabilities
    prompt: |
      Review the following code for OWASP top 10 vulnerabilities,
      unsafe buffer handling, or hardcoded credentials.
      Suggest a secure alternative for every issue found.
      {{{ input }}}
Would you like me to help you create a Bash script that automatically creates the folders and files based on the AI's blueprint?
Bypassing Restrictions: Replicated Windows-only Phi-Silica features on Kali Linux for $0.
Optimized Performance: Switched to GGUF quantization to save 5GB of RAM and boost speed.
GUI Development: Built a custom Python-based desktop interface for local AI summarization.
Pro Coding Environment: Configured VS Code with the Continue extension and Ollama to manage large, multi-file software projects.
Agentic Workflows: Added custom slash commands (/project) to automate architectural blueprints and security audits.

